STRUCTURAL AND REGULATORY GENES CONTROLLING THE BIOSYNTHESIS OF ESSENTIAL OIL CONSTITUENTS IN LAVENDER

By

Lukman Syed Sarker

M.Sc., University of British Columbia, 2013
M.Sc., University of Dhaka, 2009
B.Sc., University of Dhaka, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in THE COLLEGE OF GRADUATE STUDIES (Biology)

THE UNIVERSITY OF BRITISH COLUMBIA

April 2020

© Lukman Sarker, 2020


The following individuals certify that they have read, and recommend to the College of Graduate Studies for acceptance, a thesis/dissertation entitled:

Structural and regulatory genes controlling the biosynthesis of essential oil constituents in lavender

Submitted by Lukman Sarker in partial fulfillment of the requirements of the degree of Doctor of Philosophy

Dr. Soheil S. Mahmoud, Biology, Irving K. Barber School of Arts and Sciences, UBC
Supervisor

Dr. Michael Deyholos, Biology, Irving K. Barber School of Arts and Sciences, UBC
Supervisory Committee Member

Dr. Mark Rheault, Biology, Irving K. Barber School of Arts and Sciences, UBC
Supervisory Committee Member

Dr. Frederic Menard, Chemistry, Irving K. Barber School of Arts and Sciences, UBC
University Examiner

Dr. Philipp Zerbe, Department of Plant Biology, University of California
External Examiner

Additional Committee Members include:

Dr. Thu-Thuy Dang, Chemistry, Irving K. Barber School of Arts and Sciences, UBC
Supervisory Committee Member


Abstract

This thesis describes research conducted to enhance our understanding of essential oil (EO) metabolism in lavender (Lavandula). Specific experiments were carried out in three areas.

First, we developed a comprehensive transcriptomic database to facilitate the discovery of novel structural and regulatory essential oil biosynthetic genes in lavender. The database includes 101,618 contigs (N50 = 831 bp), over 75% of which were successfully annotated.

Annotated sequences include full length transcripts for all previously reported genes involved in the MVA and MEP pathways of isoprenoid metabolism, all prenyltransferases involved in the isoprenoid biosynthesis, and all terpene synthase genes cloned from various lavender species. They also contain novel EO-related genes, as exemplified by the S-linalool synthase (Li(S)-LINS) gene we cloned using this database. Further, the database contains 1633 TFs, representing 2.1% of the lavenders' transcriptome, some of which have been shown to regulate secondary metabolism in plants. The most abundant TF families were bHLH (209), WRKY (190), and MYB (189) followed by AP2 (79), bZIP (77), and (19), NAC (13), and (9).

Second, we isolated the 5' upstream genomic DNA (promoter) sequences for linalool synthase (LiLINS) and 1,8-cineole synthase (LiCINS) genes from L. x intermedia and used them to conduct a Yeast One-Hybrid Assay in order to identify TFs that regulate the expression of LiLINS and LiCINS genes in this plant. The assay identified 96 proteins that interacted with one or both promoters. To elucidate the nature of this interaction further, the LiLINS and LiCINS promoter fragments were each fused to the E. coli gusA (GUS) reporter gene. The constructs were separately transformed into tobacco (Nicotiana benthamiana) leaves, co-expressing individually a subset of ten representative transcription factors. Six TFs induced expression from both promoters, two activated LiCINS promoter alone, and two did not induce expression from either promoter.

Finally, we isolated and functionally characterized genes for two acetyltransferases, LiLAT-3 & LiLAT-4, which convert some of the monoterpene EO constituents into their respective esters in lavender.

Our results, which have enhanced our understanding of EO biosynthesis in plants, will help improve EO yield and composition in lavender and other plants through plant breeding and biotechnology.


Lay Summary

Lavender is an economically important plant known for its essential oil (EO), which is dominated by a group of small, volatile biochemical compounds known as monoterpenoids. To understand the genetic basis of lavender EO constituent production, we developed a comprehensive transcriptome database, which facilitates the discovery of structural and regulatory genes involved in terpenoid biosynthesis, as well as other important processes such as defense against pests and herbivores. Further, we employed a Yeast One-Hybrid Assay to identify transcription factors that control the expression of two important genes responsible for the production of linalool and cineole in lavenders. Finally, we cloned genes for two acetyltransferase enzymes involved in the conversion of monoterpenes to their corresponding esters in lavender plants.


Preface

Chapter 2 has been published in the journal Planta, and the contents were reprinted/adapted by permission from Springer Nature: Planta, 249(1):271–290, RNA-Seq in the discovery of a sparsely expressed scent-determining monoterpene synthase in lavender (Lavandula), Ayelign M. Adal, Lukman S. Sarker, Radesh PN Malli, Ping Liang and Soheil S. Mahmoud, © 2018, License # 4718011260334. I designed the experiment to develop the lavender transcriptomic database. I also helped Ayelign Adal clone the S-linalool synthase. I contributed to the writing; however, Ayelign Adal led the manuscript preparation with the guidance of Dr. Soheil Mahmoud. Lukman Sarker and Ayelign Adal contributed equally to this manuscript. Mr. Radesh PN Malli and Dr. Ping Liang analyzed the genomic sequences and copy numbers of S-linalool synthase in the genome.

Chapter 3 has been published in Planta, and the contents were reprinted/adapted by permission from Springer Nature: Planta, (2020) 251: 5 (online version), Diverse transcription factors control monoterpene synthase expression in lavender (Lavandula). Lukman S. Sarker, Ayelign M. Adal, and Soheil S. Mahmoud, © 2019, License # 4718020130702. I designed and conducted the experiments. Ayelign Adal led the writing of the manuscript with help from me and Dr. Soheil Mahmoud.

Chapter 4 has been published in Planta, and the contents were reprinted/adapted by permission from Springer Nature: Planta, 242 (3): 709-19. Cloning and functional characterization of two monoterpene acetyltransferases from glandular trichomes of L. x intermedia. Lukman S. Sarker and Soheil S. Mahmoud, © 2015, License #4718020297429. I designed and conducted the experiments and wrote the manuscript with the guidance of Dr. Soheil Mahmoud.

During my Ph.D. thesis work, I also co-authored the following manuscripts:

a) Malli RPN, Adal AM, Sarker LS, Liang P, and Mahmoud SS (2019) De novo sequencing of the Lavandula angustifolia genome reveals highly duplicated and optimized features for essential oil production, Planta, 249(1):251–256. (I collected leaf tissues, prepared genomic DNA for sequencing and provided ESTs and transcriptomes for annotation).

b) Wells R, Truong F, Adal AM, Sarker LS, and Mahmoud SS (2018) Lavandula essential oils: a current review of applications in medicinal, food, and cosmetic industries of lavender. Natural Product Communications, 13(10): 1403-1417. (I wrote a section of the review and helped with reference management and manuscript preparation).

c) Adal AM, Sarker LS, Lemke A, Mahmoud SS (2017) Isolation and functional characterization of a methyl jasmonate-responsive 3-carene synthase from Lavandula x intermedia. Plant Molecular Biology, 93: 641-657. (I helped Ayelign Adal with qPCR experimentation and writing the manuscript).


Table of Contents

Abstract
Lay Summary
List of Tables
Table of Figures
List of Symbols
Acknowledgments
Dedication

1 Chapter: Introduction
  1.1 Biosynthesis of lavender terpenoids
    1.1.1 Isoprene biosynthesis
    1.1.2 Mono- and sesquiterpene synthesis
      1.1.2.1 Mono and sesquiterpene synthases
      1.1.2.2 EO Esters and ketones in lavender
  1.2 Storage and secretion of volatile terpenoids
  1.3 Regulation of terpenoid synthesis
      1.3.1.1 Trans-acting factors and cis-elements
      1.3.1.2 families
    1.3.2 Other modes of gene regulation
  1.4 Research objectives and outline

2 Chapter: Transcriptomic database
  2.1 Synopsis
  2.2 Materials and methods
    2.2.1 Plant Material and Nucleic Acid Extraction
    2.2.2 Illumina Sequencing and de novo Assembly
    2.2.3 Functional Annotation
    2.2.4 Quantification and gene expression levels
    2.2.5 Identification of Transcription Factor (TF) families
    2.2.6 qPCR to validate TF candidates
  2.3 Results
    2.3.1 RNA sequencing and de novo transcriptome assembly
    2.3.2 Annotation, gene ontology, and protein families
    2.3.3 DGE analysis and gene quantification
    2.3.4 Analysis of putative genes in terpene biosynthesis
    2.3.5 Selection of transcription factors involved in terpene biosynthesis
  2.4 Discussion
    2.4.1 Transcriptome sequencing, assembly, and annotation
    2.4.2 Identification of regulatory genes
    2.4.3 Identification of terpenoid biosynthesis genes

3 Chapter: Identification of TFs regulating Lavender terpene synthases
  3.1 Synopsis
  3.2 Materials and methods
    3.2.1 Candidate selection
    3.2.2 Promoter search
    3.2.3 Cloning and construct design
    3.2.4 Promoter localization assay in N. benthamiana leaves
    3.2.5 Yeast-one-hybrid system
      3.2.5.1 Screening yeast one hybrid
      3.2.5.2 Transient transactivation assay in N. benthamiana leaves
  3.3 Results
    3.3.1 Promoter search
    3.3.2 Construct design and GUS activity
    3.3.3 Yeast-one-hybrid assay to identify TFs interacting with LiLINSp/LiCINSp
    3.3.4 In vivo screening of LinSp/CinSp interaction with specific TF candidates
  3.4 Discussion

4 Chapter: Cloning of terpene acetyltransferase
  4.1 Synopsis
  4.2 Materials and methods
    4.2.1 Candidate selection
    4.2.2 Relative expression assay of LiAAT
    4.2.3 Recombinant protein expression and purification
    4.2.4 Enzymatic assay and kinetic study
    4.2.5 GCMS analysis
    4.2.6 Multiple sequence alignment and phylogenetic tree analysis
    4.2.7 Accession number
  4.3 Results
    4.3.1 Candidate selection
    4.3.2 Sequence analysis
    4.3.3 Heterologous protein expression and enzymatic assay
    4.3.4 LiAAT kinetic studies
    4.3.5 Tissue-specific regulation of LiAAT
    4.3.6 Phylogenetic tree analysis
  4.4 Discussion

5 Chapter: Conclusion
  5.1 Lavender transcriptomic database
  5.2 Identification of transcription factors regulating LiLINSp & LiCINSp
  5.3 Lavender acetyltransferases

References

Appendices
  Appendix A: KEGG pathways using KO
  Appendix B: KEGG pathways distribution during gene expression analysis
  Appendix C: Plasmids used in this study
  Appendix D: Media recipes used for N. benthamiana transformation and regeneration


List of Tables

Table 1.1: Major terpenoids in lavender species.
Table 2.1: List of primers used in TF qPCR study
Table 2.2: Summary of the Lavender Transcriptome
Table 2.3: Summary of annotations on unigenes against public databases.
Table 2.4: Species distribution of the top BLAST hits.
Table 2.5: Summary of the most common InterPro entries found in the Lavender transcriptome database.
Table 2.6: KEGG pathway related to the biosynthesis of secondary metabolites found in lavender transcriptome database.
Table 2.7: Summary of DGE sequencing and mapping
Table 2.8: Statistics of gene expression abundance in three lavender species
Table 3.1: Primers used in this study
Table 3.2: Predicted regulatory elements in the two promoter regions.
Table 3.3: L. x intermedia TFs tested in N. benthamiana for activating linalool synthase promoter (LiLINSp) / 1,8-cineole synthase promoter (LiCINSp).
Table 4.1: Oligonucleotides used in this study
Table 4.2: A Microarray analysis
Table 4.3: Kinetics data (0.2 mM acetyl CoA as a substrate).


Table of Figures

Figure 1.1 Biosynthesis of IPP and DMAPP via the mevalonate (MVA) and MEP independent pathway.
Figure 1.2 Terpene biosynthesis.
Figure 1.3 Schematic presentation of monoterpene synthesis.
Figure 1.4 Schematic presentation of sesquiterpene synthesis in lavender.
Figure 2.1 Bar graph distribution of unigenes from the lavender transcriptomic database.
Figure 2.2 Gene Ontology of lavender transcriptome database.
Figure 2.3 Comparisons of digital gene expressions (DGEs).
Figure 2.4 Heat map showing the expression patterns of genes involved in the biosynthesis of TPS precursors.
Figure 2.5 Heat map showing the expression patterns of TPSs that contain long DNA sequences (≥ 1kb).
Figure 2.6 Heat map showing the normalized expression of differentially regulated transcription factors (TFs) in flowers relative to leaf tissues.
Figure 2.7 qPCR analysis of selected TFs from Lavandula.
Figure 2.8 Phylogenetic tree of plant DXSs.
Figure 3.1 LiLINS and LiCINS promoter analysis.
Figure 3.2 Predicted promoter fragments and constructs in the pCambia1391z vector.
Figure 3.3 GUS assay different promoter constructs in the pCambia1391z binary vector.
Figure 3.4 TF constructs in the pGA482 plant binary vector.
Figure 3.5 GUS activity to determine the TFs interaction with LiLINSp/LiCINSp promoters in N. benthamiana leaf disc assay.
Figure 3.6 Transactivation of LiCINSp & LiLINSp promoters by TFs in N. benthamiana leaves.
Figure 4.1 Multiple sequence alignment of LiAAT-3 and LiAAT-4 with respect to their closest alcohol acetyltransferase homolog.
Figure 4.2 GCMS analysis of assay products formed in LiAAT-3 and LiAAT-4 enzymatic assays.
Figure 4.3 Kinetic assays of LiLAAT-3 and LiAAT-4.
Figure 4.4 Detection of transcripts for LiAAT-3 and LiAAT-4 by standard PCR.
Figure 4.5 Transcriptional activity of LiAAT candidates in different tissues of lavender species.
Figure 4.6 Phylogenetic tree analysis of BAHD acyltransferases including LiAAT-1-4.


List of Symbols

Aa               Amino acid
Bp               Base pair
BSA              Bovine serum albumin
CoA              Coenzyme A
cDNA             Complementary DNA
cv.              Cultivar
DMAPP            Dimethyl-allyl diphosphate
dNTPs            Deoxynucleotide-tri-phosphate mix
DOXP             1-deoxy-D-xylulose 5-phosphate
DTT              Dithiothreitol
DXP              Deoxy-xylulose-phosphate
EDTA             Ethylenediaminetetraacetic acid
EO               Essential oil
EST              Expressed sequence tag
FPP              Farnesyl diphosphate
GC/MS            Tandem gas chromatography mass spectrometry
GGPP             Geranylgeranyl diphosphate
GO               Gene Ontology
GOI              Gene of interest
GPP              Geranyl diphosphate
HMBPP            1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate
HMG-CoA          3-hydroxy-3-methylglutaryl CoA
IDT              Integrated DNA Technologies
IPP              Isopentenyl diphosphate
IPTG             Isopropyl β-D-1-thiogalactopyranoside
ISO              International Organization for Standardization
LA               L. angustifolia
LI               L. x intermedia
LL               L. latifolia
LiLINS           L. x intermedia linalool synthase
LiLINSp          L. x intermedia linalool synthase promoter
LiCINS           L. x intermedia cineole synthase
LiCINSp          L. x intermedia cineole synthase promoter
LiLAAT           L. x intermedia linalool alcohol acetyltransferase
MECP             2-C-methylerythritol-2,4-cyclodiphosphate
MEP              Methyl-erythritol phosphate
mg/gfWT          mg per gram fresh weight
MOPSO            3-morpholino-2-hydroxypropanesulfonic acid
mRNA             Messenger RNA
MU               4-methylumbelliferone
MUG              4-methylumbelliferyl β-D glucuronide
MVA              Mevalonic acid
NAD+             Nicotinamide adenine dinucleotide
Ni-NTA agarose   Nickel-charged affinity resin
NPP              Neryl diphosphate
ORF              Open reading frame
PCR              Polymerase chain reaction
-PO4 buffer      Phosphate buffer
PTV              Programmable Temperature Vaporizing
PVP-40           Polyvinylpyrrolidone (absorbs up to 40% of its weight in atmospheric water)
qRT-PCR          Quantitative Reverse-Transcriptase PCR
r.p.m            Rotations per minute
RE               Restriction enzyme
SDS-PAGE         Sodium dodecyl sulfate-polyacrylamide gel
TF               Transcription factor
Tm               Annealing temperature
TPS              Terpenoid synthase
U                Unit: the amount of the enzyme that catalyzes the conversion of one micromole of substrate per minute
UV               Ultraviolet


Acknowledgments

First, I would like to thank all the faculty, staff, and graduate students of UBC’s Okanagan campus who have helped me with my studies. In particular, I am grateful to my supervisor, Dr. Soheil Mahmoud, for providing me with the opportunity to conduct my research in his laboratory. His guidance and support helped me exceed all my expectations and raised my confidence to a new height. He has been a great help whenever I needed him, as a mentor or as a friend. I would like to express my heartfelt gratitude to my committee members, Dr. Kirsten Wolthers, Dr. Mike Deyholos, Dr. Mark Rheault and Dr. Thu-Thuy Dang, for their continuous support and technical guidance.

My sincere appreciation goes to the past and current members of Dr. Soheil Mahmoud's lab (Dr. Ayelign Adal, Dr. Zerihun Demissie, Dr. Lauren Erland, Rebecca Wells, Elaheh Najafian, Felisha Truong, Imel Khaleghi, Ashley Lemke, Katie Del Buono, Christopher Bitcon, Dakshita Ranatunga, Mike Tarnowycz, Michelle Hawkins, Sarah Mahmoud, and Gabriel Zavala) for their unconditional support throughout my study and kind friendship. This thesis would not have been possible without the generous assistance of Dr. Ayelign Adal and Dr. Dinesh Adhikary, who were open and there whenever I needed them. Special thanks to Dr. Melanie Jones and Justin Meeds for their help with the 4-methylumbelliferyl (MU) chemical, and to Dr. Kirsten Wolthers for providing the pGEX4T-1 vector. I would like to specially mention Barb Lucente for her administrative guidance throughout my stay at UBCO; she was quick to respond and solve any issues. I am very grateful to Rosemary Garner and Sunil Kainth for their support during my teaching assistantship, which made me feel comfortable and confident. Special thanks to my friends at UBCO: Mansak Tantikachoniket, Antreas Pogiatzis, Dr. Krin Mann, Dr. Eric Vukicevich, and Dr. Isadora Quintans. I also gratefully acknowledge the Bangladeshi community here in Kelowna for their unconditional help and support.

I would like to thank my wife and give due recognition to my in-laws for all their support and for constantly encouraging me to complete this thesis. Finally, my heartiest appreciation goes out to my parents and brothers for being the source of my inspiration and guiding me in my difficult times.


Dedication

To my family


1 Chapter: Introduction

Lavenders are perennial shrubs that belong to the genus Lavandula (Lamiaceae). Over 39 morphologically distinct species are identified and cultivated around the world for their unique composition of volatile compounds. However, only three species, L. angustifolia, L. latifolia, and L. x intermedia, are grown commercially for their essential oils (EO). Lavandin (Lavandula x intermedia), a natural cross of L. latifolia and L. angustifolia, is a popular species cultivated commercially because of its higher yield per acre compared to its parent species. Lavender EOs, a blend of mono- and sesquiterpenoid alcohols, esters, oxides, and ketones, are extensively used in cosmetics, hygiene products, and alternative medicines. Around 50-60 monoterpenes have been identified in different lavender varieties, although only a few components determine the characteristic EO of a given species (Upson & Andrews, 2004). The most abundant monoterpenes found in lavenders include linalool, linalool acetate, borneol, camphor, and 1,8-cineole. Among these, camphor, linalool, and linalool acetate are key determinants of lavender EO quality (Lis-Balchin, 2002; Upson & Andrews, 2004). EOs with a high ratio of linalool and linalool acetate to camphor are considered to be of “high quality”, and thus are used in cosmetic products and aromatherapy (Cavanagh & Wilkinson, 2002; Lane et al., 2010). EOs added to alternative medicines are typically rich in camphor and 1,8-cineole. Lavender EO also contains sesquiterpenes such as caryophyllene, bergamotene, and nerolidol, with trace amounts of other terpenoid compounds such as perillyl alcohol.

Lavender EO composition is greatly influenced by environmental factors and by the species from which it is collected (Cavanagh & Wilkinson, 2002). The oil compositions of the most common lavender species, Lavandula angustifolia (formerly L. officinalis, English lavender), L. latifolia (Spike lavender), and L. x intermedia (Lavandin), are listed in Table 1.1. Though it is smaller in size and its oil yield is relatively low, L. angustifolia has a better linalool and linalool acetate to camphor ratio compared to L. x intermedia and L. latifolia. L. latifolia contains large quantities of camphor while producing a small amount of linalool and linalool acetate, making it more useful for the alternative medicine industry. Lavandin, on the other hand, produces an EO with a less favorable linalool and linalool acetate to camphor ratio, but its overall oil yield is much higher, and the plant adapts better to cold weather. The choice of lavender variety therefore depends on the required oil yield and quality (higher quality oils for pure EOs, fragrance, and medical application; lower quality oils for soaps and detergents) and on the growth environment (Al-Badani et al., 2017; Aprotosoaie et al., 2017; Boeckelmann, 2008; Lis-Balchin, 2002).

Table 1.1: Major terpenoids in lavender species (English lavender, Lavandin, and Spike lavender) (Lis-Balchin, 2002).

Content (%) of major terpenes in lavender oil

Compound                 English lavender     Lavandin             Spike lavender
                         (L. angustifolia)    (L. x intermedia)    (L. latifolia)
Linalool                 10-50                20-23                26-44
Linalool acetate         12-54                19-26                0-1.5
Lavandulol and acetate   0.1-14               0.5-0.8              0.2-1.5
Camphor                  0-0.2                12                   5.3-14.3
1,8-Cineole              2.1-3.0              10                   25-36
Borneol                  1.0-4.0              2.9-3.7              0.8-4.9
Caryophyllene            3.0-8.0              2.7-6.0              0.1-0.3
Myrcene                  0.4-1.3              1.2-1.5              0.2-0.4
Farnesene                Trace                1.1                  0.2-0.3
Limonene                 0.2-0.4              0.9-1.5              1.0-2.2
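The quality criterion discussed above, the ratio of linalool plus linalool acetate to camphor, can be illustrated with the values in Table 1.1. The following is a minimal sketch only: it assumes that the midpoint of each reported range is a reasonable stand-in for a typical oil, which the source does not state, and the names TABLE_1_1, midpoint, and quality_ratio are hypothetical.

```python
# Ranges (% of oil) copied from Table 1.1 (Lis-Balchin, 2002).
# A single reported value is stored as (value, value).
TABLE_1_1 = {
    "English lavender": {
        "linalool": (10, 50), "linalool acetate": (12, 54), "camphor": (0, 0.2),
    },
    "Lavandin": {
        "linalool": (20, 23), "linalool acetate": (19, 26), "camphor": (12, 12),
    },
    "Spike lavender": {
        "linalool": (26, 44), "linalool acetate": (0, 1.5), "camphor": (5.3, 14.3),
    },
}


def midpoint(bounds):
    """Midpoint of a (low, high) range; an illustrative assumption, not a measured value."""
    low, high = bounds
    return (low + high) / 2


def quality_ratio(profile):
    """(linalool + linalool acetate) / camphor, computed from range midpoints."""
    numerator = midpoint(profile["linalool"]) + midpoint(profile["linalool acetate"])
    return numerator / midpoint(profile["camphor"])


ratios = {name: quality_ratio(profile) for name, profile in TABLE_1_1.items()}
for name, value in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.1f}")
```

Because the reported ranges are wide, midpoint-based ratios are only indicative; the sketch is meant to show how the criterion is computed, not to rank individual oils.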

Lavender EOs are extensively used in cosmetics and personal care products including colognes, perfumes, shampoos, and soaps. They are also used in aromatherapy to relieve stress, depression, and insomnia (Chu, 2005; Cathey et al., 2020; Wolfe & Herzberg, 1996), and in alternative medicine as antimicrobial agents (Adaszynska-Skwirzynska et al., 2019).

1.1 Biosynthesis of lavender terpenoids

Terpenoids, naturally occurring organic hydrocarbons, are the primary constituents of lavender EO, which plays a significant role in plant physiology. Terpenes are produced through the polymerization of a five-carbon unit called 'isoprene' and are classified based on the number of isoprene units they contain (McGarvey & Croteau, 1995). The smallest terpenes contain only a single five-carbon unit and are called hemiterpenes. The best-known hemiterpene is isoprene itself, which is released from photosynthetically active plant tissues. Monoterpenes are composed of two five-carbon units and are predominant in the volatile essences of flowers and the EO of spices and herbs. Sesquiterpenes contain three five-carbon units, and, like monoterpenes, they are volatile components in essential oils. In addition, sesquiterpenes act as phytoalexins, antibiotic compounds, and antifeedants. Diterpenes contain four five-carbon units and include phytols, gibberellin hormones, and phytoalexins. Some diterpenes, like Taxol and forskolin, are pharmacologically important in the treatment of cancer and glaucoma, respectively. Triterpenes contain six five-carbon units and include membrane components, certain phytoalexins, various toxins, and feeding deterrents. Tetraterpenes, which contain eight five-carbon units, are accessory pigments and are essential to photosynthesis (Cavanagh & Wilkinson, 2002; Croteau, Kutchan, & Lewis, 2000; Mahizan et al., 2019).
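The classification described above can be summarized as a lookup from the number of five-carbon units to the terpene class. The sketch below is illustrative only: the function name classify_by_carbons is hypothetical, and it assumes the carbon count refers to an unmodified terpene skeleton (an acetylated product such as linalool acetate, with 12 carbons, would not match any class).

```python
# Terpene classes keyed by the number of five-carbon (isoprene) units,
# as listed in the paragraph above.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    6: "triterpene",
    8: "tetraterpene",
}


def classify_by_carbons(n_carbons):
    """Return the terpene class for a skeleton carbon count, or None if no listed class fits."""
    units, remainder = divmod(n_carbons, 5)
    if remainder != 0:
        return None  # not a whole number of C5 units
    return TERPENE_CLASSES.get(units)


# Carbon counts of the parent skeletons: isoprene C5, linalool C10, caryophyllene C15.
for name, carbons in [("isoprene", 5), ("linalool", 10), ("caryophyllene", 15)]:
    print(name, carbons, classify_by_carbons(carbons))
```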

1.1.1 Isoprene biosynthesis

In plants, two independent but interacting pathways, the cytosolic mevalonate (MVA) pathway and the plastidial 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway, produce the general terpene precursors IPP and DMAPP (Arigoni et al., 1997; Bick & Lange, 2003; Gershenzon et al., 2000; Laule et al., 2003). The MVA pathway is the only pathway found in animals and fungi, and it also operates in the cytoplasm of phototrophic organisms. Precursors produced through this pathway are mainly converted to FPP to synthesize sesquiterpenes and triterpenes, among others (Chappell et al., 1995; McGarvey & Croteau, 1995). The MEP pathway, present in most bacteria and plant chloroplasts, provides precursors for the biosynthesis of GPP and GGPP, which are ultimately used to produce monoterpenes and diterpenes, respectively (Adal, 2019; Mahmoud & Croteau, 2002, 2003).

The MVA pathway is initiated by the condensation of three molecules of acetyl-coenzyme A (CoA) to yield 3-hydroxy-3-methylglutaryl CoA (HMG-CoA). The enzyme HMG-CoA reductase reduces HMG-CoA to mevalonic acid (MVA), which is then converted to mevalonate 5-diphosphate by mevalonate kinase and phosphomevalonate kinase. Mevalonate 5-diphosphate is subsequently decarboxylated to yield IPP (Figure 1.1) (Liu et al., 2005; Tholl, 2015).

The MEP pathway, also called the DXP pathway, is initiated by the condensation of pyruvate and glyceraldehyde-3-phosphate to 1-deoxy-D-xylulose 5-phosphate (DOXP), catalyzed by DOXP synthase (DXPS). DOXP is then reduced to 2-C-methyl-D-erythritol 4-phosphate (MEP) by DXP reductoisomerase (DXR). The cytidine 5'-diphosphate derivative is synthesized from MEP and then undergoes phosphorylation and cyclization to produce 2-C-methylerythritol-2,4-cyclodiphosphate (MECP). 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate (HMBPP) synthase converts MECP to HMBPP, which is then transformed to IPP and DMAPP (Figure 1.1) (Dewick, 2002; Liu et al., 2005).


Figure 1.1 Biosynthesis of IPP and DMAPP via the mevalonate (MVA) and the mevalonate-independent MEP pathways. The indicated enzymes are: AACT, acetyl-CoA/acetyl-CoA C-acetyl-thiolase; HMGS, 3-hydroxy-3-methylglutaryl-CoA synthase; HMGR, 3-hydroxy-3-methylglutaryl-CoA reductase; MVA kinase, mevalonate kinase; MVAP kinase, phosphomevalonate kinase; MVAPP decarboxylase, mevalonate-5-diphosphate decarboxylase; DXPS, 1-deoxyxylulose-5-phosphate synthase; DXR, 1-deoxyxylulose-5-phosphate reductoisomerase; MEP cytidyl transferase, 2-C-methylerythritol-4-phosphate cytidyltransferase; CDP-ME kinase, 4-(cytidine-5'-diphospho)-2-C-methylerythritol kinase; MECP synthase, 2-C-methylerythritol-2,4-cyclodiphosphate synthase; HMBPP synthase, 1-hydroxy-2-methyl-E-butenyl-4-diphosphate synthase; HMBPP reductase, 1-hydroxy-2-methyl-E-butenyl-4-diphosphate reductase; and IPP isomerase (IPPI). The pathway may give rise to IPP and DMAPP independently of the interconversion catalyzed by IPPI. A transfer of IPP/DMAPP between cytosol and plastid is possible but, as of yet, unproven. The figure is inspired by Malli et al. (2019), with permission from Planta © Authors 2018, under the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1.1.2 Mono- and sesquiterpene synthesis

Mono- and sesquiterpenes are derived from the precursors GPP and FPP, respectively, by the activity of various terpene synthases (sometimes called cyclases) (Figure 1.2). Some monoterpenes, such as borneol and linalool, are further modified through acetylation, oxidation, or reduction reactions. Camphor is produced from borneol by the action of a short-chain alcohol dehydrogenase, and linalool acetate is produced from linalool by the linalool acetyltransferase enzyme. Monoterpene synthases initiate the reaction by forming cationic intermediates such as the geranyl cation, linalyl diphosphate (transoid and cisoid), the linalyl cation, and finally the α-terpinyl cation. These intermediates individually undergo a number of cyclizations, hydride shifts, or other rearrangements until they form a stable compound. For example, the α-terpinyl cation, a critical branchpoint intermediate, gives rise to all cyclic monoterpenes, such as limonene, α-terpineol, pinene, phellandrene, sabinene, terpinene, borneol, camphor, and cineole. Geraniol, linalool, myrcene, and β-ocimene are derived from the geranyl and linalyl cations (Figure 1.3). Sesquiterpene synthases produce two cationic intermediates, the farnesyl cation and its isomer the nerolidyl cation, before any rearrangement to stable compounds occurs (Degenhardt et al., 2009) (Figure 1.4).


1.1.2.1 Mono and sesquiterpene synthases

To date, hundreds of different mono- and sesquiterpene synthases have been cloned from a number of plants, including mint, lemon, snapdragon, sage, and Arabidopsis, and even from gymnosperms like grand fir (for details, see Degenhardt et al., 2009). Interestingly, most terpene synthases possess similar properties, such as a native molecular mass of 50-100 kDa (either monomeric or dimeric), a requirement for a divalent metal ion as a cofactor for catalysis (usually Mg2+ or Mn2+ for angiosperms; K+, Mn2+, Fe2+ for gymnosperms), a pI near 5.0, and a pH optimum within a unit of neutrality (Bohlmann et al., 1998). In general, plant monoterpene synthases (600-650 amino acids) are larger than sesquiterpene synthases (550-580 amino acids) due to the N-terminal signal peptide sequences that target the protein to the plastids. The N-terminal signal peptides contain a high frequency of serine and threonine residues and few acidic amino acids; however, they do not share any common sequence similarities (Bohlmann et al., 1997).

Sequence analysis of terpene synthases from different plant species revealed four conserved motifs: the RR(x8)W motif, the LQLYEASFLL motif, the DDxxD motif, and the (N,D)D(L,I,V)X(S,T)XXXE motif (Bohlmann et al., 1998). The arginine-rich N-terminal RR(x8)W motif is essential for cyclization of GPP and for the enzymatic activity of many monoterpene synthases (Williams et al., 1998), while the LQLYEASFLL motif is thought to be part of the active site (Wise et al., 1998). The aspartate-rich DDxxD and (N,D)D(L,I,V)X(S,T)XXXE motifs coordinate divalent cations and are thus required for substrate binding and ionization (Christianson, 2006; Whittington et al., 2002). The DDxxD motif is more highly conserved than the (N,D)D(L,I,V)X(S,T)XXXE motif in almost all plant terpene synthases, and both motifs bind a trinuclear magnesium cluster involved in the fixation of the pyrophosphate substrate (Zhou & Peters, 2009). Site-directed mutagenesis studies revealed that this region is very important for terpene catalysis: mutations in it frequently lower or completely abolish catalytic activity, while other alterations lead to abnormal products (Cane et al., 1996; Cane et al., 1996; Degenhardt et al., 2009; Seemann et al., 2002). However, an NDxxD motif, a natural variant of the DDxxD motif found in (+) germacrene synthase from goldenrod, has no impact on catalytic activity, suggesting that the highly conserved DDxxD motif is not strictly necessary for catalytic activity in farnesyl diphosphate cyclization (Prosser et al., 2004). Moreover, some sesquiterpene synthases carry an additional DDxxD motif in place of the (N,D)D(L,I,V)X(S,T)XXXE motif, and this second motif is also involved in catalysis (Little & Croteau, 2002; Steele et al., 1998).
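The motif notation used above translates directly into regular expressions, which is one simple way to screen candidate terpene synthase sequences. The following is a minimal sketch only: the function name `find_motifs` and the toy sequence are hypothetical and were built for illustration, not taken from any lavender protein.

```python
import re

# Motifs named in the text; "x"/"X" (any residue) becomes "." in the regex.
TPS_MOTIFS = {
    "RR(x8)W": r"RR.{8}W",
    "DDxxD": r"DD..D",
    "(N,D)D(L,I,V)X(S,T)XXXE": r"[ND]D[LIV].[ST].{3}E",
}


def find_motifs(protein_seq):
    """Return {motif name: [1-based start positions]} for one protein sequence."""
    seq = protein_seq.upper()
    hits = {}
    for name, pattern in TPS_MOTIFS.items():
        # The lookahead reports overlapping occurrences as well.
        lookahead = f"(?=(?:{pattern}))"
        hits[name] = [m.start() + 1 for m in re.finditer(lookahead, seq)]
    return hits


# Hypothetical toy sequence that contains each motif exactly once.
toy = "MA" + "RR" + "AAAAAAAA" + "W" + "GGG" + "DDLYD" + "GGG" + "NDLGTAKEE"
print(find_motifs(toy))
```

A pattern match only flags a candidate; as the germacrene synthase example shows, natural variants of a motif can remain catalytically active, so sequence screening of this kind cannot replace functional characterization.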

Phylogenetic analysis has grouped terpenoid synthases (TPS) into eight gene subfamilies (designated TPS-a to TPS-h). TPS-a consists of sesquiterpene and diterpene synthases from angiosperms, and TPS-d comprises gymnosperm monoterpene, sesquiterpene, and diterpene synthases. Many monoterpene synthases, including those identified from Lamiaceae, belong to the TPS-b and TPS-g subfamilies (Bohlmann et al., 1998; Trapp & Croteau, 2001). TPS-c, TPS-e, and TPS-f are each represented by a single type of angiosperm terpene synthase: the diterpene synthases copalyl diphosphate synthase and kaurene synthase, and the angiosperm linalool synthase, respectively. The TPS-h subfamily is specific to Selaginella spp. (Chen et al., 2011).


1.1.2.2 EO Esters and ketones in lavender

Monoterpene ketones, such as camphor, are derived from their substrates by short-chain dehydrogenase/reductase (SDR) enzymes. The SDR family includes various oxidoreductases and some isomerases and lyases, which produce new monoterpenes from the regular monoterpenes; this family also exhibits a variety of substrate specificities, acting on retinoids, prostaglandins, sugars, alcohols, and other small molecules (Figure 1.5) (Moummou et al., 2012; Persson et al., 2008; Sarker et al., 2012). SDR coding sequences are usually 750-800 nucleotides long, encoding proteins of 250-275 amino acids with a molecular mass of ca. 28-30 kDa. Amino acid sequence comparisons have revealed two conserved motifs among SDR enzymes, even though pairwise identities are quite low (10%-30%) (Kallberg et al., 2002; Kallberg et al., 2010; Kavanagh et al., 2008). These two motifs are the coenzyme-binding motif GxxxGxG and the active site pattern YxxxK. The first SDR crystal structure revealed that the coenzyme- and substrate-binding sites fall into a single domain, which is clearly distinct from the structures of the medium-chain dehydrogenase/reductase (MDR) enzymes that are composed of two separate domains. The active site motif, YxxxK, positioned at residues 155-159 in ZSD1 from Zingiber zerumbet, is one of the most common conserved motifs in SDRs. A Ser residue (Ser142 in ZSD1, 13 residues upstream of the Tyr) is conserved in most SDR enzymes, and this Ser-Tyr-Lys triad is responsible for SDR catalysis. In addition, an Asp residue near the N-terminal end (Asp39 in ZSD1) plays a critical role in determining the coenzyme specificity for NAD(H) over NADP(H) (Okamoto et al., 2011; Sarker et al., 2012).
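The positional relationship within the Ser-Tyr-Lys triad can be checked programmatically once the YxxxK pattern has been located. The sketch below is illustrative: the function name `sdr_features` and the toy sequence are hypothetical, and the fixed Ser-to-Tyr spacing of 13 residues is simply the ZSD1 value quoted above, which may differ in other SDRs.

```python
import re


def sdr_features(protein_seq, ser_offset=13):
    """Locate GxxxGxG motifs and candidate Ser-Tyr-Lys triads (1-based positions).

    ser_offset is the Ser-to-Tyr distance reported for ZSD1 (Ser142, Tyr155);
    using it as a fixed spacing is an assumption of this sketch.
    """
    seq = protein_seq.upper()
    # Coenzyme-binding motif GxxxGxG.
    coenzyme = [m.start() + 1 for m in re.finditer(r"(?=G...G.G)", seq)]
    triads = []
    # Active site pattern YxxxK: Lys lies four residues after Tyr.
    for m in re.finditer(r"(?=Y...K)", seq):
        tyr = m.start() + 1
        lys = tyr + 4
        ser = tyr - ser_offset
        if ser >= 1 and seq[ser - 1] == "S":
            triads.append((ser, tyr, lys))
    return {"GxxxGxG": coenzyme, "Ser-Tyr-Lys": triads}


# Hypothetical toy sequence: GxxxGxG at 3, Ser at 20, Tyr at 33, Lys at 37.
toy_sdr = "MA" + "GAAAGAG" + "A" * 10 + "S" + "A" * 12 + "YAAAK" + "A"
print(sdr_features(toy_sdr))
```

Passing a different `ser_offset` allows the same check to be repeated for enzymes whose Ser lies at another distance from the catalytic Tyr.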

The monoterpene esters are derived from their respective parent monoterpenes via acetylation reactions. Although the vast majority of monoterpene acetyltransferases have not been described, four BAHD-type acetyltransferases were shown to mediate the formation of geranyl acetate in rose and oil grass (Sarker & Mahmoud, 2015; Shalit, 2003; Sharma et al., 2013). The BAHD acyltransferases are cytosolic enzymes with molecular masses ranging from 48 to 55 kDa. These enzymes, which require acetyl coenzyme A (acetyl-CoA) as the acetyl donor, contain two highly conserved motifs (D’Auria, 2006; D’Auria et al., 2007). The first is the HxxxD motif, which is located near the center of the molecule and is important for general base catalysis (D’Auria, 2006; D’Auria et al., 2007; St-Pierre & Luca, 2000). The histidine residue of the motif deprotonates the oxygen or nitrogen atom of the corresponding substrate, allowing a nucleophilic attack on the carbonyl carbon of the CoA thioester. This leads to the formation of a tetrahedral intermediate between the CoA and the acceptor substrate. The intermediate is subsequently reprotonated to produce free CoA and the acetylated ester or amide (D’Auria, 2006). The second highly conserved region is the DFGWG motif, which is located near the carboxyl end of the protein and is believed to have a structural role in enzymatic function (D’Auria, 2006; Garvey et al., 2009; Sarker & Mahmoud, 2015; Unno et al., 2007). Nearly all functionally characterized BAHD enzymes contain both motifs, and deletion or modification of one or both results in highly reduced enzyme activity (D’Auria, 2006; Unno et al., 2007).

1.2 Storage and secretion of volatile terpenoids

Volatile compounds produced by plants serve numerous functions. For example, they help attract pollinators, protect the plant from herbivore attack, and deter pathogens. In some plants, e.g., lavenders and mints, these volatile compounds are produced and accumulated in specialized secretory structures called glandular trichomes (Fahn, 1988; Lis-Balchin, 2002; Wang, 2014). Glandular trichomes are modified epidermal cells that cover leaves, stems, and parts of the flower. Lavender has two forms of glandular trichomes: capitate and peltate. The capitate glandular trichomes are smaller and simpler, having only a basal cell, a short stalk, and a head of one to two cells. Peltate glandular trichomes, on the other hand, are complex in structure and consist of secretory cells (usually eight disc cells), a stalk cell, and a basal cell anchoring the trichome in the epidermis. Essential oils are stored in the subcuticular space between the cuticle and the apical walls of the secretory cells. The exact secretory mechanism is not yet known; however, volatile compounds are believed to be released by diffusion through the cuticle (Fahn, 1988; Wang, 2014).

Volatile compound production is related to the size and age of the glandular trichomes as well as to the number of glands per area of tissue. Recent studies have shown that monoterpene synthesis and accumulation are directly controlled by the development of the oil glands during the growing season. For example, linalool content in lavender increases with flower developmental stage (Boeckelmann, 2008). Glandular trichome development is rapid, and the number of trichomes increases as the tissue matures, especially during vegetative growth (Ascensão et al., 1999; Fridman, 2005; Wang, 2014; Werker et al., 1993).

1.3 Regulation of terpenoid synthesis

The biosynthesis of terpenoids is initiated by a group of enzymes called terpene synthases. Although the biosynthesis and accumulation of terpenoids play an important role in the life of plants, terpenoids are not required at all times, but rather in response to biotic and abiotic stresses and during particular developmental stages. Plant defense systems have also evolved a complex signaling network, mediated mostly by phytohormones and elicitors, that triggers the transcriptional regulation of specific terpenoids in Arabidopsis, spruce, lima bean, etc. (Fäldt et al., 2003; Huang et al., 2010; Miller et al., 2005; Navia-Giné et al., 2009; Pieterse et al., 2009). However, most terpene synthases are spatiotemporally regulated during development (Farmer et al., 2003; Pieterse et al., 2009; Vranová et al., 2012). For example, β-ocimene and myrcene from snapdragon flowers were undetectable in unopened and one-day-old flowers but were strongly detected at anthesis and later stages (Dudareva et al., 2003). In peppermint, monoterpene content peaked between twelve and twenty days after leaf emergence and then rapidly declined at the full leaf expansion stage (Turner et al., 2000). In lavender, linalool production was developmentally regulated. More specifically, linalool content gradually increased during flower development, from bud to full bloom (Boeckelmann, 2008).

1.3.1 Regulation of terpenoids by gene regulation

Gene expression is initiated when DNA is transcribed into RNA, which is then translated into protein. The regulation of gene expression, and thus the rate of functional protein production in the cell, occurs primarily at the level of transcription but also at post-transcriptional levels. It is important to note that gene expression is a sequence of steps from transcription to post-translational protein modification, with a few interdependent and simultaneous mechanisms. Plant specialized metabolite biosynthesis is activated by a signal transduction cascade that involves transcription factors, i.e., DNA-binding proteins or protein complexes that interact with upstream promoter sequences (cis-regulatory elements) of a particular terpene synthase (Patra et al., 2013; Wray et al., 2003).


1.3.1.1 Trans-acting factors and cis-elements

A transcription factor (TF) is a DNA-binding protein that controls the rate of transcription by recognizing and binding specific cis-regulatory sequences of a particular gene. Cis-regulatory sequences are usually located adjacent to the promoter of the gene, but can also lie far upstream or downstream, or even within the introns of the target gene. In general, eukaryotic genomic DNA is organized into chromatin, which restricts the physical access of regulatory proteins needed to initiate transcription. Decondensation of chromatin around the core promoter region and the TF binding site(s) is a preliminary step that allows TF binding, followed by recruitment of the RNA polymerase II complex onto the basal promoter (Lee & Young, 2000). TFs carry a transcription regulatory domain and act as activators or repressors that can control multiple genes in a metabolic pathway; hence, they are ideal targets for engineering specialized metabolite accumulation (Grotewold, 2008; Iwase et al., 2009). Repressors and activators control transcription through various mechanisms, including binding to DNA binding sites and modifying chromatin structure. In a recent study in spearmint, an R2R3-MYB TF negatively regulated monoterpene biosynthesis by suppressing the GPPS large subunit (Lee & Young, 2000; Reddy et al., 2017). At the same time, there are numerous examples of TFs activating terpenoid biosynthesis in different plants (Chen et al., 2017; Li et al., 2017; Spyropoulou et al., 2014; Zhang et al., 2015).

1.3.1.2 Transcription factor families

Transcription factors can be classified into different families according to their DNA-binding domain. At least 64 TF families have been found in vascular plant genomes, and more than 40 families have been found in both the Arabidopsis and rice genomes (Riechmann et al., 2000; Rushton et al., 2008). A myeloblastosis (MYB) protein was the first plant TF identified; it is required for the synthesis of anthocyanins in maize kernels (Lloyd et al., 2017; Paz-Ares et al., 1987). MYB TFs are also involved in the regulation of an array of processes, including secondary metabolism (e.g., flavonoid biosynthesis), cell fate and identity (e.g., trichome formation), development (e.g., anther development), and abiotic and biotic stress responses (e.g., drought stress and disease resistance; Dubos et al., 2010). Different MYB proteins bind specifically to different DNA binding sites. However, very few of them have been functionally characterized to date (Dubos et al., 2010; Martin & Paz-Ares, 1997).

Apetala 2 (AP2) is an important TF group involved in Arabidopsis flower and seed development, abiotic stress acclimation, and hormone-dependent signaling (Dietz et al., 2010).

Basic helix-loop-helix (bHLH) proteins, one of the largest plant TF families, bind DNA through their N-terminal basic region, whereas the C-terminal HLH region functions as a dimerization domain; they are involved in processes such as anthocyanin biosynthesis, trichome differentiation, and light signaling (Heim et al., 2003; Toledo-Ortiz et al., 2003). Myc TFs, a subgroup of the bHLH superfamily, are key regulators of jasmonate (JA)-responsive genes, which contribute to plant defense through the synthesis of specialized metabolites (Alves et al., 2014). WRKY TFs are defined by a highly conserved region of 60 amino acids and are involved in seed development, dormancy, and germination, and in responses to biotic and abiotic stresses, including pathogen infection (Alves et al., 2014).

Only a limited number of TFs involved in the regulation of terpenoid biosynthesis have been identified to date. A methyl jasmonate (MeJA)-inducible TF of the Myc family was found to regulate sesquiterpene synthases from Catharanthus roseus and Arabidopsis (Hong et al., 2012a; Zhang et al., 2011). Two JA-responsive AP2 family transcription factors from Artemisia annua (AaERF1 and 2) were shown to regulate amorpha-4,11-diene synthase (ADS), a sesquiterpene synthase involved in the biosynthesis of artemisinin (Yu et al., 2012). Subsequently, another AP2/ERF TF from A. annua (AaORA) was identified. This trichome-specific TF was shown to positively regulate several genes in the artemisinin biosynthetic pathway, including AaERF1 (Lu et al., 2013).

1.3.2 Other modes of gene regulation

Post-translational modification (PTM) is an integral part of gene expression in all living cells: after translation, proteins can undergo further reversible or irreversible modifications of certain amino acid residues, as well as N- or C-terminal cleavage for processing and maturation. Many regulatory proteins and metabolic enzymes undergo a variety of PTMs, resulting notably in changes in oligomeric state, stabilization or degradation, and activation or deactivation (Huber & Hardin, 2004), which in turn optimize metabolic flux. HMGR and DXS, the rate-limiting enzymes of the MVA and DXP biosynthetic pathways, respectively, undergo positive and negative regulation at the post-transcriptional level to control terpenoid accumulation at different developmental stages (Rodríguez-Concepción & Boronat, 2015). Protein-protein interactions and post-translational modifications can also significantly affect the regulatory activity of TFs; for example, MYB44, MYB75, MYB41, MYB15, and MYB77 are phosphorylated to regulate different signaling cascades (Millard et al., 2019).

Epigenetic mechanisms, through histone modifications and DNA methylation, can alter chromatin structure and potentially determine the transcriptional state and gene expression without altering the underlying DNA sequence (Chinnusamy & Zhu, 2009).

1.4 Research objectives and outline

The principal objectives of my research were to:

• Develop lavender genomic resources

• Explore regulatory elements involved in terpene biosynthesis in lavender

• Clone and characterize lavender terpene synthases.

We hypothesized that:

1. Genomic resources can help identify and clone the transcription factors involved in terpenoid biosynthesis,

2. Transcription factors control the expression of linalool synthase and 1,8-cineole synthase genes,

3. Alcohol acetyltransferases convert monoterpenes to monoterpene esters.

In Chapter 2, we used Illumina sequencing to obtain mRNA sequence information from leaf and flower tissues of three economically important lavender species: L. angustifolia, L. x intermedia, and L. latifolia. We also had access to lavender EST databases corresponding to cDNA libraries of L. angustifolia leaf and flower (Lane et al., 2010) and L. x intermedia glandular trichome tissues (Demissie et al., 2012; Sarker et al., 2012). Raw transcript reads were assembled using CLC Genomics Workbench software and annotated against online databases. These transcriptome data were useful for identifying all the terpene synthases, various transcription factors, all the biosynthetic pathway genes, etc. S-linalool, the enantiomer of R-linalool (the most dominant compound in lavender EO), is rarely present in lavender, and the terpene synthase producing S-linalool was identified using this database and functionally characterized (Adal, 2019).

In Chapter 3, upstream genomic sequences of linalool synthase and 1,8-cineole synthase were identified by genome walking. Putative promoter sequences were cloned into the pCambia1391z plant binary vector and stably transformed into N. benthamiana to localize promoter activity. For the yeast one-hybrid (Y1H) system, a cDNA library was prepared from L. x intermedia flower tissues and used as prey against promoter bait sequences. This study identified 96 sequences encoding proteins that interact with one or both promoters. Subsequently, a total of 11 TFs were cloned into the pGA482 binary vector and co-transformed into tobacco leaf with either the linalool synthase or the 1,8-cineole synthase promoter. We identified combinations of TFs that were able to regulate the promoter sequences in tobacco leaf.

In Chapter 4, we demonstrated that two acetyltransferases, isolated from our genomic resources, were able to convert lavender terpenoids into their corresponding esters. LiLAT-3 and LiLAT-4 were isolated from L. x intermedia glandular trichomes and converted lavandulol, geraniol, and nerol into lavandulyl acetate, geranyl acetate, and neryl acetate, respectively. The catalytic efficiencies measured in the enzymatic assays were in the range of other known acetyltransferases.


Figure 1.2 Terpene biosynthesis. GPP is synthesized by the condensation of one molecule of IPP and one molecule of DMAPP, catalyzed by GPP synthase. FPP is the condensation product of GPP and one molecule of IPP, while GGPP is produced through the condensation of one molecule of GPP and two molecules of IPP. Monoterpenes result from the derivatization and rearrangement of GPP, while FPP and GGPP are the precursors to sesqui- and triterpenes, and di- and tetraterpenes, respectively. Inspired by Mahmoud and Croteau (2002), with permission from Trends in Plant Science.


Figure 1.3 Schematic presentation of monoterpene synthesis. The reaction mechanism starts with the ionization of the geranyl diphosphate substrate. The resulting carbocation can undergo a range of cyclizations, hydrogen shifts, and rearrangements before the reaction is terminated by deprotonation or water capture. Cyclic monoterpenes are synthesized from α-terpinyl cation. Acyclic monoterpenes are produced from either geranyl cation or linalyl cation. Reproduced from (Degenhardt et al., 2009) with permission from Phytochemistry.


Figure 1.4 Schematic presentation of sesquiterpene synthesis in lavender. The reaction mechanism for sesquiterpene synthases starts with the ionization of the FPP. The resulting carbocation can undergo a range of cyclizations, hydrogen shifts, and rearrangements before any stable compound is produced. Reproduced from (Degenhardt et al., 2009) with permission from Phytochemistry.


2 Chapter: Transcriptomic database

2.1 Synopsis

Next-generation RNA sequencing (RNA-Seq) has been widely used as a cost-effective and efficient approach for obtaining deep sequence information for plant transcriptomes (Wang et al., 2009; Grabherr et al., 2011). It has also enabled digital gene expression (DGE) profiling experiments that provide information on the expression patterns of genes. In this context, RNA-Seq coupled with DGE profiling is expected to be very helpful in identifying structural and regulatory genes that control the biosynthesis of essential oil constituents in lavenders.

Using this approach, a number of secondary metabolic pathways, including terpenoid biosynthesis, have been investigated in several plants, including Taxus (Zhao et al., 2011), S. grosvenorii (Lei et al., 2011), M. cochinchinensis (Hyun et al., 2012), grape hyacinth (Lou et al., 2014), and medicinal Cannabis (Braich et al., 2019). In Lavandula, EST databases have been exploited to identify highly expressed monoterpene and sesquiterpene biosynthesis genes. However, several important TPSs, regulatory genes, and genes involved in the trafficking and storage of EO constituents are yet to be identified in these plants. In this study, we used RNA-Seq to develop a database of sequences expressed in economically important lavender species. Our specific goals were to identify and functionally characterize weakly expressed TPS genes and to identify transcription factors (TFs) involved in isoprenoid biosynthesis in lavenders. Here, we report the de novo assembly of more than 28 million short sequencing reads, which resulted in 101,618 unique contigs.


2.2 Materials and methods

2.2.1 Plant Material and Nucleic Acid Extraction

Three economically important lavender species were used in this study: L. angustifolia cv. Lady, L. latifolia, and their natural hybrid L. x intermedia cv. Grosso. L. x intermedia and L. angustifolia were grown under natural conditions at a field site at the University of British Columbia, Okanagan campus (Kelowna, BC, Canada) (Boeckelmann, 2008). L. latifolia flower and leaf tissues were provided by Dr. Tim Upson from Cambridge University (UK). Leaf and flower tissues were flash-frozen in liquid nitrogen, and total RNA was extracted using an RNeasy plant mini kit (Qiagen, USA) and treated with RNase-free DNase I (Qiagen, USA) to degrade contaminating genomic DNA, following the manufacturer's protocol. Two EST libraries (L. angustifolia leaf and flower, and L. x intermedia glandular trichome) containing 22,418 unigenes were also used in this experiment (Lane et al., 2010; Sarker et al., 2013).

2.2.2 Illumina Sequencing and de novo Assembly

Library preparation and transcriptome sequencing were performed at Plant Biosis (Lethbridge, Alberta, Canada) using a standard protocol. Briefly, poly(A) mRNA was isolated using oligo(dT) beads and fragmented prior to cDNA synthesis. The short fragments were ligated to sequencing adapters and purified by agarose gel electrophoresis. Finally, suitable cDNA fragments from the different tissue libraries were PCR amplified and sequenced on the Illumina HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA). Raw sequence data were filtered to remove low-quality reads (limit 0.05 and length > 20 bp) and adaptor contamination using the CLC Genomics Workbench software prior to assembly.

Sequences from all EST libraries (L. angustifolia flower and leaf, and L. x intermedia cv. Grosso glandular trichomes) and Illumina reads from leaf and flower tissues of the three lavender species (L. angustifolia, L. x intermedia, and L. latifolia) were assembled into contigs by CLC Genomics Workbench using multiple word sizes (30-60) and bubble sizes (50-600). Trinity was also used to generate a de novo lavender transcriptome assembly on Compute Canada facilities (www.westgrid.ca). Out of 15 assembled transcriptome databases, only five were selected for further study, based on an initial homology search for known terpene-related genes from lavender. The selected assemblies were then combined, and redundant sequences were removed using CD-HIT-EST (Fu et al., 2012) with an identity threshold of 0.90. To further validate the de novo transcriptome assembly, pre-processed Illumina sequence reads were mapped back to the assembly using CLC Genomics Workbench. Most reads mapped back to the assemblies (97% uniquely mapped, with >92% properly paired), validating the de novo transcriptome assembly process.

2.2.3 Functional Annotation

The assembled sequences were aligned using the Basic Local Alignment Search Tool (BLASTx, BLASTp) against three databases: UniProtKB (https://www.uniprot.org/), the National Center for Biotechnology Information (NCBI) non-redundant (nr) protein database (ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nr.gz), and the Phytozome v9 database (https://phytozome.jgi.doe.gov/pz/portal.html). BLAST hits with significant matches (e-value ≤ 1e-5) were used for additional inference about gene function (i.e., molecular function, biological process, and cellular component) using plant-specific GO terms obtained from the TAIR database (https://www.arabidopsis.org/download_files/GO_and_PO_Annotations/Gene_Ontology_Annotations/ATH_GO_GOSLIM.txt) and validated using the pantherdb.org database. The output contigs were also functionally annotated using Trinotate (https://trinotate.github.io/). Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of well-referenced methods for functional annotation, including homology search against known sequence data (BLAST+/SwissProt), protein domain identification (HMMER), protein signal peptide and transmembrane domain prediction (signalP/tmHMM), and the use of various annotation databases (eggNOG/GO/Kegg databases). InterProScan analysis was also conducted to further classify the protein families (Compute Canada, Westgrid, CA). All functional annotation data derived from the analysis of transcripts were integrated into an SQLite database, which allows fast, efficient searching for terms with specific qualities related to a desired scientific hypothesis, as well as the creation of a whole annotation report for a transcriptome.
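The e-value cut-off applied to the BLAST results can be illustrated with a short filter over BLAST+ tabular output. This is a sketch under stated assumptions rather than the pipeline actually used: it assumes the default twelve-column tabular format (`-outfmt 6`), and keeping only the best-scoring hit per query is an added simplification; the function name `best_hits` and the example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-5):
    """Return {query: (subject, evalue, bitscore)} for hits with evalue <= max_evalue.

    When a query has several passing hits, the one with the highest bit score
    is kept (an assumption of this sketch).
    """
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # not a significant match
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[2]:
            best[row["qseqid"]] = (row["sseqid"], evalue, bitscore)
    return best


# Hypothetical rows: contig_2 has no hit passing the cut-off.
example = (
    "contig_1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t310\n"
    "contig_1\tprotB\t60.0\t150\t60\t2\t1\t450\t5\t154\t1e-10\t95\n"
    "contig_2\tprotC\t35.0\t40\t26\t1\t10\t129\t3\t42\t0.5\t28\n"
)
print(best_hits(io.StringIO(example)))
```

Contigs absent from the returned dictionary correspond to unigenes without a significant match, which would remain unannotated at this step.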

2.2.4 Quantification and gene expression levels

Differential expression analysis was performed using the CLC Genomics Workbench (Qiagen) and the Tuxedo suite (Bowtie, TopHat, and Cufflinks). Statistical analysis and normalization were performed according to developer guidelines, and contigs with fewer than five aligned reads were excluded from the analysis. Reads from each plant were mapped against the newly assembled de novo transcriptome library. Expression levels were compared in the CLC Genomics Workbench (Qiagen) using reads per kilobase of transcript per million reads mapped (RPKM).
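For reference, RPKM divides the read count of a contig by its length in kilobases and by the total number of mapped reads in millions. The sketch below shows this standard calculation together with the five-read minimum mentioned above; the function names and the example counts are hypothetical, and CLC Genomics Workbench may apply additional steps that are not reproduced here.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def expression_table(counts, lengths, min_reads=5):
    """Return {contig: RPKM}, skipping contigs with fewer than min_reads reads.

    The library size is taken as the sum of all counts, including contigs
    that are later excluded (an assumption of this sketch).
    """
    total = sum(counts.values())
    return {
        contig: rpkm(n, lengths[contig], total)
        for contig, n in counts.items()
        if n >= min_reads
    }


# Hypothetical counts and lengths for three contigs (library of 1,000,000 reads).
counts = {"contig_1": 500, "contig_2": 4, "contig_3": 999_496}
lengths = {"contig_1": 1000, "contig_2": 800, "contig_3": 2000}
print(expression_table(counts, lengths))
```

Because RPKM normalizes for both transcript length and sequencing depth, values can be compared between contigs of different lengths and between libraries of different sizes.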


2.2.5 Identification of Transcription Factor (TF) families

Transcription factors were identified and classified using the Arabidopsis database with a cut-off e-value of < 1e-5. Identified TF candidates were confirmed with the plant TF database (http://planttfdb.cbi.pku.edu.cn/). Subsequently, differentially expressed TFs were selected from the differentially expressed genes (DEGs) in leaf vs. flower of L. angustifolia, L. x intermedia, and L. latifolia.

2.2.6 qPCR to validate TF candidates

The relative abundance of transcription factor candidates was analyzed in young leaves and mature flowers of all three lavender species by quantitative PCR (qPCR), using the StepOnePlus Real-Time detection system (Applied Biosystems, Canada). Complementary DNA (cDNA) for relative transcript analysis was synthesized using the iScript cDNA synthesis kit (Bio-Rad) according to the manufacturer’s instructions. SYBR® Select Master Mix (Life Technologies, Canada), along with approximately 150 ng of cDNA as a template and 500 nM of each primer, was used in a 10 µL reaction volume. Gene-specific primers (Table 2.1) were used for quantitative real-time PCR, and primer sequences were analyzed for hairpin, self-dimer, and hetero-dimer formation using IDT PrimerQuest software (https://www.idtdna.com/pages/tools/primerquest). The following program was used for qPCR: an initial heat-labile uracil-DNA glycosylase (UDG) activation step at 50 °C for 2 min, denaturation at 95 °C for 2 min, and 50 cycles of 3 s at 95 °C and 30 s at 60 °C. Following threshold-dependent cycling, melting dissociation was performed from 60 to 95 °C at a melt rate of 0.3 °C/s with a smooth curve setting. PCR efficiency was calculated using LinRegPCR for all the primers used in this experiment and updated in the StepOnePlus data analysis software (Ruijter et al., 2009). Normalized expression values (ΔΔCT) of TF candidates were calculated by DataAssist™ software (Life Technologies, Canada) using β-actin and 18S rRNA as reference genes.
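The ΔΔCT normalization can be illustrated with the widely used 2^(-ΔΔCT) calculation. The sketch below is not a reproduction of the DataAssist algorithm: it assumes 100% PCR efficiency for every primer pair (whereas efficiencies were estimated with LinRegPCR in this study) and averages the CT values of the two reference genes arithmetically. The function name and all CT values are hypothetical.

```python
from statistics import mean


def relative_expression(ct_target_sample, ct_refs_sample,
                        ct_target_calibrator, ct_refs_calibrator):
    """Fold change of a target gene in a sample relative to a calibrator tissue.

    Uses 2 ** (-ddCT), which assumes perfect (100%) amplification efficiency.
    """
    # dCT: target CT minus the mean CT of the reference genes, per tissue.
    d_ct_sample = ct_target_sample - mean(ct_refs_sample)
    d_ct_calibrator = ct_target_calibrator - mean(ct_refs_calibrator)
    # ddCT: difference between sample and calibrator.
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2 ** (-dd_ct)


# Hypothetical CT values: a TF in flower (sample) vs. leaf (calibrator),
# with two reference genes measured in each tissue.
fold_change = relative_expression(
    ct_target_sample=22.0, ct_refs_sample=[18.0, 12.0],
    ct_target_calibrator=25.0, ct_refs_calibrator=[18.0, 12.0],
)
print(fold_change)
```

In this hypothetical case the target crosses the threshold three cycles earlier in flower than in leaf after normalization, which corresponds to an eightfold higher relative abundance.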

Table 2.1: List of primers used in TF qPCR study

TF candidate    Primer name    Sequence (5' ==> 3')
MYC_1           Forward        GGACATAGCCGGATTTCAAG
MYC_1           Reverse        CTCGGAGTCAAGATAGGAGAA
MYC_2           Forward        CCACCACAGCTAGATTCTTC
MYC_2           Reverse        GTTGCCAGTCCATGAGTT
MYB_186         Forward        GAGAACTGCACAGCTCAAA
MYB_186         Reverse        GGAGAGTAGGACTCAACTGTAT
MYB_56          Forward        CAGCTCAAACAGCAGAGAA
MYB_56          Reverse        GAGGGAAGGTGGAGAGTAG
MYB_63          Forward        CCGTTAGGAATGCCGTTATC
MYB_63          Reverse        GCAATTCCGTCACTCGATTA
MYB_64          Forward        AGGACGAGACCATCATCAA
MYB_64          Reverse        GGAGTTCCAGTGGTTCTTAAT
MYB_130         Forward        AGACTGAGATGGCTGAATTATC
MYB_130         Reverse        AAGGACCATCTGTTTCCTAAC
MYB_67          Forward        GCGATGATGAAGGAGATGAC
MYB_67          Reverse        TCCTCCTGCAACGAAGTA
MYB_70          Forward        GATACTTGTCTCCTTCCTTCAC
MYB_70          Reverse        TACGCCTTTCTTTCGTTCTT
MYB_139         Forward        GAGATCACTACCAGCCAATG
MYB_139         Reverse        CCTCTCCTCACATCTTCTCT
MYB_146         Forward        GGAGGAACAGCTCTTGATTATAG
MYB_146         Reverse        GTTCTTGATCTCGTTGTCTGT
MYB_88          Forward        TAGGATCATTAGTGGTGTTTGG
MYB_88          Reverse        CGAGACTGGTAGAGCAAATG
MYB_86          Forward        GAAGAGGATGAAGAGGGTTTG
MYB_86          Reverse        GCACATATAGCCAGCCATT
MYB_93          Forward        GGGATTACTACAGAGCAACAA
MYB_93          Reverse        CCACCAAACTCTTCCATCATA
AP2_38          Forward        GTATAGAGGAGTCACCAGACA
AP2_38          Reverse        CCAAGATAGACTTGCCTTCC
AP2_18          Forward        CTCCTCCTCGTTGAAGGT
AP2_18          Reverse        GTCGTCGCGGTTCAAAG
AP2_78          Forward        CATCTGCTTCATCTCCTCATC
AP2_78          Reverse        TGCAGCTAGAGCGTATGA
WRKY_15         Forward        GAACTCCCTTTGGCTTTGT
WRKY_15         Reverse        CCACATCCGGTTCGAATAAG
WRKY_23         Forward        CTTCACAGTTCAGACCACAG
WRKY_23         Reverse        GGCTACGTTATCTGTTCTTCTT
WRKY_4          Forward        TTGCATAACAGTACCTCTTCC
WRKY_4          Reverse        CAGATCAACTCTGGGTGATAAT
bHLH_167        Forward        GATCAAAGTGGTGGCTTGA
bHLH_167        Reverse        TTTAGGCAGTTCGGGTTTAC
bHLH_13         Forward        TGAAGTTGGCCTCAGTAAATC
bHLH_13         Reverse        CTGGTGCAGAGGAATCTAAAG
bHLH_146        Forward        CAACTAACAAGAACGACAAAGG
bHLH_146        Reverse        CGAGCTCGAACATGGATATAA
bHLH_4          Forward        TTGGAGTGTGAGAGAGAGAG
bHLH_4          Reverse        GCACGTATAAACCTCGATACC
bHLH_51         Forward        AAGCACCAGAAGGGTTTATC
bHLH_51         Reverse        CGCTTGTAGTAGCTTCATTCT
β-actin         Forward        TGTGGATTGCCAAGGCAGAGT
β-actin         Reverse        AATGAGCAGGCAGCAACAGCA
18S rRNA        Forward        GTGACGGGTGACGGAGAA
18S rRNA        Reverse        GACTCAATGAGCCCGGTA

2.3 Results

2.3.1 RNA sequencing and de novo transcriptome assembly

To investigate the formation and regulation of lavender essential oil, flower and leaf tissues were collected from three economically important lavender species (see Materials and methods) for RNA extraction and cDNA library synthesis. Approximately 29 million raw reads were generated. After removal of adaptor sequences, ambiguous nucleotides, and low-quality reads, 28.83 million high-quality reads, along with 22,592 previously generated EST sequences, were assembled using CLC Genomics Workbench and Trinity with a series of stepwise strategies. The best assembled transcripts were then combined, and redundant sequences were removed using CD-HIT-EST (Fu et al., 2012), which generated 101,618 unique contigs. An overview of the sequencing and assembly is given in Table 2.2. The N50 value, extensively used to evaluate de novo assemblies, was 831 bp (Table 2.2).
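For readers unfamiliar with the statistic, N50 is the contig length at which contigs of that length or longer contain at least half of the total assembled bases. A minimal sketch of the calculation follows; the function name `n50` and the example lengths are hypothetical and unrelated to the lavender assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest contig down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths given")


# Hypothetical contig lengths totalling 32 bp; the two longest cover half.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Because N50 weights contigs by their length, it is less sensitive to large numbers of very short contigs than the mean length, which is one reason both values are reported for an assembly.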

Table 2.2: Summary of the Lavender Transcriptome

Total number of raw reads        29,008,569
Total number of clean reads      28,830,705
GC content (%)                   45.92
Total number of unigenes         101,618
Mean length of unigenes (bp)     692.64
Min length of unigenes (bp)      201
Max length of unigenes (bp)      12,223
N50 value (bp)                   831

The assembly generated a number of larger unigenes: 3,081 unigenes longer than 2,000 bp, 15,211 unigenes of 1,001-2,000 bp, and 35,044 unigenes of 501-1,000 bp (Figure 2.1).


Figure 2.1 Bar graph distribution of unigenes from the lavender transcriptomic database.

2.3.2 Annotation, gene ontology, and protein families

To maximize the information obtained for the newly assembled unigenes, all unigene sequences were searched against seven public databases: the NCBI non-redundant protein (Nr) database, the SwissProt protein database, the Phytozome plant protein database, the protein family (Pfam) database, Gene Ontology (GO), euKaryotic Orthologous Groups (KOG), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (Table 2.3). Using this strategy, 75,484 unigenes were annotated, accounting for 74.6% of the total unigenes. Among the 26,337 unannotated unigenes, 20,152 (76.5%) were shorter than 500 bp, indicating the importance of unigene length for annotation.


Table 2.3: Summary of annotations on unigenes against public databases.

Annotations were prepared based on a) the Phytozome database and b) the Trinotate database.

a) Phytozome_Annotation

Database        Number of annotated unigenes    Percent of annotated unigenes (%)
NCBI (nr)       72081                           70.93
Swiss-prot      52749                           51.90
Pfam            60237                           59.27
KOG             27382                           26.94
KO              25481                           25.07
GO              40412                           39.76
ARTH            76607                           75.38
Phytozome       76825                           75.60

b) Trinotate_Annotation

Database        Number of annotated unigenes    Percent of annotated unigenes (%)
Pfam            36172                           35.59
eggnog          17224                           16.94
go              37637                           37.03
sprot_blastx    55755                           54.86
sprot_blastp    39225                           38.60
signalP         2500                            2.46
TmHMM           9319                            9.17

The majority of the top hits matched protein sequences from Solanum lycopersicum, Vitis vinifera, Theobroma cacao, and Populus trichocarpa (Table 2.4).


Table 2.4: Species distribution of the top BLAST hits.

Species                        %
Solanum lycopersicum           23.59213
Vitis vinifera                 15.45054
Theobroma cacao                7.71309
Populus trichocarpa            5.404478
Ricinus communis               4.750094
Prunus persica                 4.700268
Glycine                        3.140155
Fragaria vesca subsp. vesca    2.721616
Cucumis sativus                2.608676
Medicago truncatula            1.686339
Cicer arietinum                1.679695
Nicotiana tabacum              1.233474
                               1.14268
Rest of the species            24.18

Next, gene ontology (GO) analysis was performed to classify the unigenes. A single sequence can be assigned to more than one GO term. Using the Arabidopsis database and the tools at pantherdb.org, 40,412 unigenes were mapped to at least one GO term. Among them, 44.27% of unigenes were assigned to “biological process”, 32.4% to “molecular function”, and 23.33% to “cellular component”. In the biological process category, the dominant subcategories were “metabolic process” (36.4%) and “cellular process” (31.2%). Among molecular function terms, “catalytic activity” (45.5%) and “binding” (26.5%) were the most abundant classes. In the cellular component category, the subcategories “cell part” (41.1%) and “organelle” (26.4%) contained the highest percentages of unigenes (Figure 2.2).


Figure 2.2 Gene Ontology of lavender transcriptome database. a) biological process, b) cellular component, and c) molecular function.

Additional databases were searched for protein families, domains, regions, and sites via the InterProScan web server at EBI. The 50 top InterPro entries obtained are presented in Table 2.5. The most abundant classes were protein kinases and nucleoside triphosphate hydrolases. Clusters of orthologous groups (COG) were determined using the EggNOG database, which identified 16,670 (16.5%) unigenes, with serine/threonine protein kinase as the top hit (11.8%). To identify the biological pathways related to the unigenes, the unigenes were annotated with KEGG Orthology (KO) terms and then mapped to the reference pathways in KEGG. In total, 25,481 unigenes were annotated with KEGG pathways and were assigned to 140 metabolism pathways, 21 genetic information processing pathways, 34 environmental information processing pathways, and 26 cellular process pathways (Appendix A). Among the pathways identified, those related to secondary metabolism are shown in Table 2.6.


Table 2.5: Summary of the most common InterPro entries found in the Lavender transcriptome database.

InterPro Description Frequency

IPR027417 P-loop containing nucleoside triphosphate hydrolase 1947

IPR011009 Protein kinase-like domain 1884

IPR000719 Protein kinase domain 1657

IPR032675 Leucine-rich repeat domain, L domain-like 1080

IPR013083 Zinc finger, RING/FYVE/PHD-type 899

IPR008271 Serine/threonine-protein kinase, active site 889

IPR016024 Armadillo-type fold 729

IPR002885 Pentatricopeptide repeat 728

IPR017441 Protein kinase, ATP binding site 726

IPR016040 NAD(P)-binding domain 677

IPR011990 Tetratricopeptide-like helical domain 654

IPR009057 Homeodomain-like 639

IPR015943 WD40/YVTN repeat-like-containing domain 585

IPR011989 Armadillo-like helical 584

IPR001245 Serine-threonine/tyrosine-protein kinase catalytic domain 569

IPR017986 WD40-repeat-containing domain 556

IPR012677 Nucleotide-binding alpha-beta plait domain 549

IPR013320 Concanavalin A-like lectin/glucanase domain 525

IPR001841 Zinc finger, RING-type 523

IPR029058 Alpha/Beta hydrolase fold 512

IPR001611 Leucine-rich repeat 510

IPR001680 WD40 repeat 483

IPR000504 RNA recognition motif domain 462

IPR029063 S-adenosyl-L-methionine-dependent methyltransferase 436

IPR017853 Glycoside hydrolase superfamily 375

IPR001128 Cytochrome P450 370


IPR001005 SANT/Myb domain 362

IPR020846 Major facilitator superfamily domain 344

IPR012337 Ribonuclease H-like domain 324

IPR012336 Thioredoxin-like fold 316

IPR002182 NB-ARC 308

IPR017930 Myb domain 308

IPR011992 EF-hand domain pair 281

IPR013781 Glycoside hydrolase, catalytic domain 272

IPR003593 AAA+ ATPase domain 261

IPR011991 Winged helix-turn-helix DNA-binding domain 259

IPR013785 Aldolase-type TIM barrel 257

IPR001810 F-box domain 255

IPR023214 HAD-like domain 253

IPR019775 WD40 repeat, conserved site 251

IPR029044 Nucleotide-diphospho-sugar transferases 242

IPR002048 EF-hand domain 238

IPR002401 Cytochrome P450, E-class, group I 237

IPR023753 FAD/NAD(P)-binding domain 229

IPR013026 Tetratricopeptide repeat-containing domain 218

IPR011598 Myc-type, basic helix-loop-helix (bHLH) domain 209

IPR003439 ABC transporter-like 208

IPR013210 Leucine-rich repeat-containing N-terminal, plant-type 208


Table 2.6: KEGG pathway related to the biosynthesis of secondary metabolites found in the lavender transcriptome database.

EC number        Enzyme name                                                 No. of sequences

KEGG pathway: Phenylpropanoid biosynthesis
ec:1.1.1.7       Peroxidases                                                 20
ec:2.1.1.104     Caffeoyl-CoA O-methyltransferase                            7
ec:3.2.1.21      Beta-glucosidase                                            9
ec:2.1.1.68      Caffeate O-methyltransferase                                2
ec:1.14.13.11    Trans-cinnamate-CoA ligase                                  1
ec:6.2.1.12      4-coumarate-CoA ligase                                      6
ec:1.1.1.195     Cinnamyl alcohol dehydrogenase                              8
ec:2.3.1.133     Shikimate O-hydroxycinnamoyltransferase                     1
ec:4.3.1.24      Phenylalanine aminomutase                                   1
ec:6.2.1.12      4-coumarate--CoA ligase                                     6

KEGG pathway: Flavonoid biosynthesis
ec:1.1.1.219     Dihydroflavonol 4-reductase                                 2
ec:1.14.11.22    Flavone synthase                                            1
ec:1.14.11.23    Flavanone 3-hydroxylase                                     1
ec:1.14.11.9     Flavanol synthase                                           2
ec:1.14.13.11    Trans-cinnamate 4-monooxygenase                             1
ec:1.14.13.21    Flavonoid 3'-monooxygenase                                  1
ec:1.14.13.88    Flavonoid 3',5'-hydroxylase                                 1
ec:2.1.1.104     Caffeoyl-CoA O-methyltransferase                            7
ec:2.3.1.133     Shikimate O-hydroxycinnamoyltransferase                     1
ec:2.3.1.74      Chalcone synthase                                           2
ec:5.5.1.6       Chalcone--flavanone isomerase                               2

KEGG pathway: Terpenoid biosynthesis
ec:1.1.1.216     Farnesol dehydrogenase                                      2
ec:1.1.1.34      3-hydroxy-3-methylglutaryl-coenzyme A reductase             2
ec:1.1.1.354     Farnesol dehydrogenase                                      1
ec:1.1.1.88      3-hydroxy-3-methylglutaryl-coenzyme A reductase             2
ec:1.17.7.1      4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase        1
ec:1.17.7.3      4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase        1
ec:1.17.7.4      4-hydroxy-3-methylbut-2-enyl diphosphate reductase          2
ec:2.2.1.7       1-deoxy-D-xylulose-5-phosphate synthase                     1
ec:2.3.1.9       Acetyl-CoA acetyltransferase                                2
ec:2.3.3.10      Hydroxymethylglutaryl-CoA synthase                          1
ec:2.5.1.10      Geranylgeranyl pyrophosphate synthase                       3
ec:2.5.1.1       Dimethylallyltranstransferase                               3
ec:2.5.1.20      Cis-prenyltransferase                                       1
ec:2.5.1.29      Geranylgeranyl pyrophosphate synthase                       2
ec:2.5.1.68      (2Z,6E)-farnesyl diphosphate synthase                       1
ec:2.5.1.92      (2Z,6Z)-farnesyl diphosphate synthase                       1
ec:2.7.1.148     4-diphosphocytidyl-2-C-methyl-D-erythritol kinase           2
ec:2.7.1.185     Mevalonate-3-kinase                                         1
ec:2.7.1.186     Mevalonate-3-phosphate-5-kinase                             1
ec:2.7.1.36      Mevalonate kinase                                           1
ec:2.7.4.26      Isopentenyl phosphate kinase                                1
ec:2.7.4.2       Phosphomevalonate kinase                                    1
ec:2.7.7.60      2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase    1
ec:3.1.7.6       (E, E)-farnesol synthase                                    1
ec:4.1.1.33      Diphosphomevalonate decarboxylase                           1
ec:4.1.1.99      Phosphomevalonate decarboxylase                             1
ec:4.2.3.27      Isoprene synthase                                           1
ec:4.6.1.12      2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase       1
ec:5.3.3.2       IPP isomerase                                               2

KEGG pathway: Alkaloid biosynthesis
ec:1.14.11.20    Deacetoxyvindoline 4-hydroxylase                            1
ec:2.6.1.42      Branched-chain-amino-acid aminotransferase                  1
ec:4.1.1.28      Aromatic-L-amino-acid decarboxylase                         1
ec:4.3.3.2       Strictosidine synthase                                      3

KEGG pathway: Steroid biosynthesis
ec:1.1.1.145     3-beta-hydroxysteroid dehydrogenase                         1
ec:1.14.14.17    monooxygenase                                               1
ec:1.14.14.1     Cytochrome P450                                             1
ec:1.14.21.6     Lathosterol oxidase                                         1
ec:1.3.99.5      3-oxo-5-alpha- 4-dehydrogenase                              1
ec:2.1.1.41      Sterol 24-C-methyltransferase                               1
ec:2.1.1.6       Catechol O-methyltransferase                                2
ec:2.5.1.21      Squalene synthase                                           1
ec:5.3.3.5       Cholestenol Delta-isomerase                                 1

2.3.3 DGE analysis and gene quantification

Filtered Illumina sequences were mapped to the lavender reference database to quantify digital gene expression (DGE). After de novo assembly, filtered reads were mapped back to the reference transcriptome, with 89.75% and 88.33%, 91.21% and 91.06%, and 92.30% and 91.96% of clean reads mapping for leaf and flower tissues of L. angustifolia, L. x intermedia, and L. latifolia, respectively, indicating that the mapping was suitable for DGE analysis (Table 2.7).

Table 2.7: Summary of DGE sequencing and mapping

Tissue    Raw         Trimmed     Mapped (bp)    Mapped (%)
LA-LF     6284997     6236002     5597087        89.75
LA-FL     4877352     4837049     4272388        88.33
LI-LF     8392462     8322459     7590900        91.21
LI-FL     2226272     2209100     2011632        91.06
LL-LF     3444352     3443739     3178547        92.30
LL-FL     3783134     3782356     3478212        91.96
Total     29008569    28830705    26128766
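The mapping percentages in Table 2.7 can be re-derived from the trimmed and mapped read counts. The short Python sketch below does this as a consistency check; the counts are transcribed from the table, while the variable and function names are illustrative assumptions rather than part of the original analysis pipeline.

```python
# (trimmed reads, mapped reads) per library, transcribed from Table 2.7.
table_2_7 = {
    "LA-LF": (6236002, 5597087),
    "LA-FL": (4837049, 4272388),
    "LI-LF": (8322459, 7590900),
    "LI-FL": (2209100, 2011632),
    "LL-LF": (3443739, 3178547),
    "LL-FL": (3782356, 3478212),
}


def mapping_rate(trimmed, mapped):
    """Percentage of trimmed reads that mapped to the reference."""
    return 100.0 * mapped / trimmed


rates = {lib: mapping_rate(t, m) for lib, (t, m) in table_2_7.items()}

# Pooled rate across all six libraries.
total_trimmed = sum(t for t, _ in table_2_7.values())
total_mapped = sum(m for _, m in table_2_7.values())
overall = mapping_rate(total_trimmed, total_mapped)

for lib, rate in rates.items():
    print(f"{lib}: {rate:.2f}%")
print(f"All libraries: {overall:.2f}%")
```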


To quantify gene expression levels, the number of mapped reads for each gene was calculated and then normalized to reads per kilobase of exon model per million mapped reads (RPKM). Significant differential expression (medium to very high) was found in flower tissues compared to leaf tissues in all three lavender species (Table 2.8). In total, 19,691 unigenes showed significant differential expression in L. angustifolia flowers compared to leaves, whereas 11,298 and 11,554 unigenes did so in L. x intermedia and L. latifolia flowers, respectively. We also compared the flowers of the three lavender species with each other to identify differences in gene expression levels (Table 2.8).
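For clarity, the RPKM normalization named above can be written as RPKM = 10^9 x C / (N x L), where C is the number of reads mapped to a gene, N is the total number of mapped reads in the library, and L is the gene length in bp. The Python sketch below is a minimal illustration of that standard formula; the function name and the example numbers are hypothetical and do not come from the thesis data.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 combines the per-kilobase (1e3) and per-million (1e6) scalings.
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000 bp unigene
# in a library of 5 million mapped reads.
example_rpkm = rpkm(500, 2000, 5_000_000)
```

Dividing by both gene length and library size is what allows values to be compared between unigenes of different lengths and between libraries of different sequencing depths.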

Table 2.8: Statistics of gene expression abundance in three lavender species

RPKM value (log2)    Expression level    LA       LI       LL       LA vs LI    LA vs LL    LI vs LL
0-0.5                no                  34656    44036    43278    51002       41279       45792
0.5-2                low                 48191    47204    47706    41505       46920       46447
2.01-4               medium              17241    9991     10765    9498        12931       9676
4.01-6               high                2143     1117     685      509         1263        574
>6                   very high           307      190      104      24          145         49
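The expression classes in Table 2.8 amount to a simple binning of log2-scale values. The following Python sketch shows one way to express those bins and to recompute the number of unigenes in the medium to very high classes for the flower versus leaf comparisons. The counts are transcribed from Table 2.8; the function names and the treatment of values lying exactly on a bin boundary are assumptions made for illustration.

```python
# Upper bounds of the log2 bins, following the row labels of Table 2.8.
# Handling of values exactly on a boundary is an assumption.
BINS = [(0.5, "no"), (2.0, "low"), (4.0, "medium"), (6.0, "high")]


def expression_level(log2_value):
    """Map a log2-scale expression value to its Table 2.8 class."""
    for upper, label in BINS:
        if log2_value <= upper:
            return label
    return "very high"


# Unigene counts per class for the flower vs leaf comparisons (Table 2.8).
counts = {
    "LA": {"no": 34656, "low": 48191, "medium": 17241, "high": 2143, "very high": 307},
    "LI": {"no": 44036, "low": 47204, "medium": 9991, "high": 1117, "very high": 190},
    "LL": {"no": 43278, "low": 47706, "medium": 10765, "high": 685, "very high": 104},
}


def medium_or_higher(species):
    """Number of unigenes in the medium, high and very high classes."""
    return sum(counts[species][lvl] for lvl in ("medium", "high", "very high"))


totals = {species: medium_or_higher(species) for species in counts}
```

Summing the three upper classes gives the per-species totals of differentially expressed unigenes quoted in the text preceding the table.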

This information may help identify genes responsible for EO variation among lavender species. We found that 998 unigenes showed medium to very high expression levels in flower tissues compared to leaf tissues in all three species. However, only 189 unigenes showed significant expression differences in flower tissues when compared among the species (Figure 2.3).


Figure 2.3 Comparisons of digital gene expressions (DGEs). a) Venn diagram of a number of DGEs within and among species in flower compared to leaf tissues, and b) changes in gene expression profile in flowers compared to leaves of the three species. c) Venn diagram of a number of DGEs among species based on the flower tissues. The numbers of differentially expressed genes that are common in the two or three species are presented in overlapping sets in the Venn diagram. Most digitally expressed genes were up-regulated in flowers, with few down-regulated genes relative to leaf tissues. LA- L. angustifolia, LI- L. x intermedia, LL- L. latifolia; LF- leaf and FL- flower.

To investigate the biological events underlying EO variation among lavender species, we also analyzed the GO terms of unigenes with medium to very high expression values. In the biological process category, the GO terms metabolic process, cellular process, localization, biological regulation, response to stimulus, and cellular component organization or biogenesis were significantly enriched in all comparisons. For molecular function, the GO terms translation regulator activity, binding, catalytic activity, and transporter activity were the most significantly enriched. Similarly, for cellular component, the GO terms cell part, organelle, membrane, and macromolecular complex were the most prominent. To identify the pathways that were significantly altered between leaf and flower of the same species, and between flowers of different species, we used the PANTHER database (www.pantherdb.org) to assess the KEGG pathways. In flowers, 136 pathways were up-regulated and 89 were down-regulated in L. angustifolia, 102 were up-regulated and 65 were down-regulated in L. x intermedia, and 102 were up-regulated and 74 were down-regulated in L. latifolia (Appendix B).

2.3.4 Analysis of putative genes in terpene biosynthesis

To assess the depth and quality of sequences in the transcriptome, we searched the database for isoprenoid biosynthetic genes that were previously cloned from lavenders and/or other plants. The results demonstrated that the transcriptome database has sufficient depth and high-quality sequence information. It contains full-length transcript sequences for most of the genes involved in both the MVA and MEP pathways (Figure 2.4), as well as for short-chain prenyltransferases, including GPPS, FPPS, LPPS, and GGPPS, which are responsible for the conversion of IPP/DMAPP to the linear precursors of regular monoterpenes (GPP), irregular monoterpenes (LPP), sesquiterpenes (FPP), and diterpenes (GGPP) found in lavenders and other plants (Demissie et al., 2013; Tholl et al., 2011).


Figure 2.4 Heat map showing the expression patterns of genes involved in the biosynthesis of TPS precursors. Blue color indicates low expression; white indicates medium expression and red represents a high expression. LA- L. angustifolia, LI- L. x intermedia, LL- L. latifolia; LF- leaf and FL- flower.

Finally, the database contains all TPS genes previously cloned from the three lavender species (Figure 2.5), including trans-α-bergamotene synthase, cadinol synthase, germacrene synthase, limonene synthase, β-caryophyllene synthase, bornyl diphosphate synthase, caryophyllene synthase, R-linalool synthase, 3-carene synthase, 1,8-cineole synthase, β-phellandrene synthase, borneol dehydrogenase, and three monoterpene acetyltransferases (Adal et al., 2017; Demissie et al., 2012; Demissie et al., 2011; Despinasse et al., 2017; Jullien et al., 2014; Landmann et al., 2007; Sarker et al., 2013; Sarker et al., 2012; Sarker & Mahmoud, 2015). The presence of the full complement of previously reported genes involved in essential oil metabolism is indicative of the high quality and depth of the lavender transcriptome reported here.

We analyzed the expression patterns of the isoprenoid biosynthetic genes in silico (Figure 2.4). A total of 17 full-length genes encoding enzymes involved in the MEP pathway were identified in the lavender transcriptome, including five 1-deoxy-D-xylulose 5-phosphate synthase (DXS) genes, three 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) genes, one 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase (MCT) gene, two 4-(cytidine 5-diphospho)-2-C-methyl-D-erythritol kinase (CMK) genes, one 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (HDS) gene, and three 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase (HDR) genes. Three of the five DXS genes were up-regulated in flower compared to leaf tissue, but showed no significant differences among the tested species (Figure 2.4). Additionally, eight unigenes were found to be related to the MVA pathway, including two acetyl-CoA acetyltransferases (AACT), one hydroxymethylglutaryl-CoA synthase (HMGS), two hydroxymethylglutaryl-CoA reductases (HMGR), one phosphomevalonate kinase (PMK), and one mevalonate diphosphate decarboxylase (MDC). HMGR, which catalyzes the rate-limiting step in the MVA pathway, was found to be up-regulated in the flowers of L. angustifolia. A 4.5-fold up-regulation of HMGR was also found in L. latifolia flowers compared to both L. angustifolia and L. x intermedia flowers.


Figure 2.5 Heat map showing the expression patterns of TPSs that contain long DNA sequences (≥ 1kb). Blue color indicates low expression; white indicates medium expression and red represents a high expression. LA- L. angustifolia, LI- L. x intermedia, LL- L. latifolia; LF- leaf and FL- flower.

In addition to the MEP and MVA pathway genes, we identified a total of 65 TPS unigenes (>1 kb sequence cut-off), including full-length sequences corresponding to all previously reported lavender TPSs. However, a few unknown TPS sequences are not full length and are worth further investigation. As anticipated, TPSs corresponding to products found in flowers, e.g., (R)-linalool synthase (TPS-48 in the heat map), were up-regulated in flower compared to leaf tissues in all three species. However, no significant difference in the expression of these genes was detected among the flowers of the three species tested (Figure 2.5).

2.3.5 Selection of transcription factors involved in terpene biosynthesis

We identified 1,633 TFs, representing 2.1% of the lavender transcriptome. The most abundant TF families were bHLH (209), WRKY (190), and MYB (189), followed by AP2 (79), bZIP (77), zinc finger (19), NAC (13), and MYC (9), all of which have been shown to regulate secondary metabolism in plants (Hong et al., 2012a; Lu et al., 2013; Memelink & Gantet, 2007; Wang et al., 2016; Xu et al., 2004). DGE analysis demonstrated that many of these TF candidates were differentially expressed in lavender flowers (Figure 2.6) and might therefore represent good candidates for TFs controlling EO metabolism in lavenders.


Figure 2.6 Heat map showing the normalized expression of differentially regulated transcription factors (TFs) in flowers relative to leaf tissues. Of the TFs identified, which account for over 2% of the lavender transcriptome, bHLH, WRKY, and MYB were the most abundant families, followed by AP2, bZIP, and a few others. The color bar represents the Z-scores of normalized expression: blue indicates low expression, white indicates medium expression, and red indicates high expression. The expression patterns of TFs in flowers were compared to leaves within a species, and to flowers between species. LA- L. angustifolia, LI- L. x intermedia, LL- L. latifolia; LF- leaf and FL- flower.
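The Z-scores mentioned in the caption of Figure 2.6 are typically obtained by standardizing each gene's expression values across samples. The Python sketch below illustrates that row-wise standardization. It is an assumption about how such a color scale is commonly produced, not a description of the software used for the figure; the use of the population standard deviation and the toy values are also illustrative choices.

```python
from statistics import mean, pstdev


def row_z_scores(values):
    """Standardize one gene's expression values across samples."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD; a sample SD is an equally common choice
    if sigma == 0:
        # A gene with identical values in every sample carries no contrast.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical normalized expression of one TF in three samples.
toy_row = [1.0, 2.0, 3.0]
z = row_z_scores(toy_row)
```

Standardizing per gene emphasizes the relative pattern across tissues and species rather than absolute abundance, which is why genes with very different expression levels can share the same color scale.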

To validate the DGE results, a set of 25 candidates (including 3 AP2, 5 bHLH, 12 MYB, 3 WRKY, and 2 MYC TFs) was selected for qPCR analysis. The qPCR experiments (Figure 2.7) largely confirmed the DGE analysis, with a few differences. For example, DGE studies indicated that AP2-18 was up-regulated in L. angustifolia flowers and down-regulated in L. latifolia flowers compared to leaves. However, in the qPCR analysis, AP2-18 was down-regulated in L. angustifolia flowers and up-regulated in L. latifolia flowers compared to leaves. Similarly, DGE analysis indicated that bHLH-146 was down-regulated and bHLH-51 was up-regulated in L. latifolia flowers compared to leaves, whereas in the qPCR studies bHLH-146 was up-regulated and bHLH-51 was down-regulated. Likewise, based on the DGE studies, MYB-70 was down-regulated in L. x intermedia and L. latifolia flowers compared to leaves, but the qPCR data indicated that this gene is up-regulated in the flowers of both species. Lastly, WRKY-4 was up-regulated in L. latifolia flowers compared to leaves in the qPCR analysis, while it appeared to be down-regulated in the DGE analysis. In summary, the results of the DGE analysis were largely confirmed by qPCR, indicating that the DGE analysis provided a reasonable starting point for investigating TFs that might be involved in EO metabolism. However, given the deviations described above, it is best to confirm the DGE results for a given TF by qPCR before proceeding with additional experimentation.


Panels: a) AP2, b) bHLH, c) MYB, d) MYC, e) WRKY.

Figure 2.7 qPCR analysis of selected TFs from Lavandula. Transcript levels were normalized to the β-actin and 18S rRNA genes. Flower transcript levels of the genes encoding the selected TFs were compared relative to leaf transcript levels within each lavender species. All of the tested TFs belonging to the MYB subfamily were up-regulated in flowers, despite variations in transcript levels across the species. TFs in the other four subfamilies had different expression patterns, with some up-regulated and others down-regulated in flowers, depending on the species. The transcript level in leaf tissue, used as the control, was assigned an arbitrary value of 1.0. Bars represent mean values of biological replicates ± standard errors (n=3). LA- L. angustifolia; LI- L. x intermedia; LL- L. latifolia.
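The caption above states that transcripts were normalized to two reference genes and expressed relative to leaf tissue set to 1.0. One common way to obtain such values is the comparative Ct (2^-ΔΔCt) method, sketched below in Python. This is an assumption for illustration: the quantification method is not specified here, the sketch assumes 100% amplification efficiency, and all Ct values and names are hypothetical.

```python
def relative_expression(target_ct_flower, ref_cts_flower,
                        target_ct_leaf, ref_cts_leaf):
    """Fold change of a target gene in flower relative to leaf (2^-ddCt).

    Reference-gene Ct values are averaged, which corresponds to using the
    geometric mean of the reference quantities.
    """
    delta_flower = target_ct_flower - sum(ref_cts_flower) / len(ref_cts_flower)
    delta_leaf = target_ct_leaf - sum(ref_cts_leaf) / len(ref_cts_leaf)
    return 2.0 ** -(delta_flower - delta_leaf)


# Hypothetical Ct values: target gene plus two reference genes per tissue.
fold_flower = relative_expression(22.0, [18.0, 12.0], 25.0, [18.0, 12.0])
fold_leaf = relative_expression(25.0, [18.0, 12.0], 25.0, [18.0, 12.0])
```

In this toy case the target reaches threshold three cycles earlier in flower than in leaf, which corresponds to an eight-fold higher transcript level, while the leaf control evaluates to 1.0 by construction.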

2.4 Discussion

2.4.1 Transcriptome sequencing, assembly, and annotation

Traditional EST databases derived from cDNA libraries have been tremendously important in gene discovery. However, this approach suffers from certain limitations. For example, compared to RNA-Seq, building an EST database is time-consuming, tedious, costly, and low throughput. In addition, EST databases are often not deep enough to enable the discovery of genes for which transcripts are not abundant (Wang et al., 2009). These bottlenecks have now been resolved through RNA-Seq, which is widely applied to transcriptome profiling studies in various plants, providing valuable resources for functional genomics investigations. For example, this approach has been very effective in the discovery of biosynthetic genes from leaves of C. pictus and rhizomes of C. longa (Annadurai et al., 2012), and from numerous other plants (Han et al., 2016). It has also been useful in identifying genes involved in resistance to bacterial wilt in mango ginger, in different flower developmental stages in H. coronarium (Prasath et al., 2014; Yue et al., 2015), and in terpenoid biosynthesis in C. sativum (Galata et al., 2014). Applications of RNA-Seq extend well beyond gene discovery and are summarized in recent reviews (Han et al., 2015; Hrdlickova et al., 2017; Lowe et al., 2017; Wang et al., 2009).

In this study, over 28 million high-quality reads were generated from leaf and flower tissues of three economically important lavender species using the Illumina platform, and assembled into 101,618 contigs (N50 = 831 bp). Using various public databases, over 75% of the total unigenes were annotated, providing an important genomic resource that offers insight into various biological processes and facilitates the discovery of novel genes. In terms of content, i.e., the number and type of sequences represented, the lavender transcriptomic database closely resembles recently reported de novo assembled transcriptomic databases from different plant species (Liu et al., 2017; Zhan et al., 2016; Landi et al., 2017; Annadurai et al., 2012). The percentage of annotated transcripts for Lavandula (ca. 75%) also falls well within the range reported for other plants, for example, Curcuma amada (73.17%) and Zingiber officinale (76.55%) (Prasath et al., 2014).

As noted earlier, our transcriptome database contained most of the genes involved in the MVA and MEP pathways, as well as genes encoding prenyltransferases involved in isoprenoid metabolism. The database also contained sequences corresponding to all of the previously cloned TPSs from the three lavender species. Further, based on homology searches, the database included sequences for several putative (uncharacterized) TPS genes and transcription factors that could potentially control isoprenoid metabolism in lavenders. Altogether, our database is of very high quality with respect to the number and type of sequences it contains, as well as the proportion of annotated sequences, and provides a good resource for the identification of isoprenoid-related genes in lavenders.

2.4.2 Identification of regulatory genes

Over the last few decades, hundreds of TPS genes that mediate the biosynthesis of mono- and sesquiterpenes in diverse plants, including lavenders, have been described, but the regulatory genes (TFs) that control their expression have been poorly investigated. Recently, however, several studies have tackled this issue, and a few TFs controlling or potentially controlling TPS gene expression have been reported. For example, MYC- and WRKY-type TFs were found to transiently activate a monoterpene synthase promoter in N. benthamiana (Spyropoulou et al., 2014). In addition, NAC and Ethylene-insensitive3-like TFs have been reported to regulate the expression of TPSs that produce both cyclic and acyclic monoterpenes in kiwifruit (Nieuwenhuizen et al., 2015). Further, WRKY- and AP2-type TFs were shown to have regulatory roles in the biosynthetic pathway of artemisinin (a sesquiterpene) in A. annua (Lu et al., 2013; Tan et al., 2015; Chen et al., 2017; Hong et al., 2012).

In an attempt to identify TFs that might contribute to mono- and sesquiterpene (i.e., essential oil) metabolism in Lavandula, we investigated the expression patterns of TF genes in our database. A total of 1,633 TFs were identified in the Lavandula transcriptome database, which is close to the number of TFs detected in the Arabidopsis (1,533) and H. coronarium (1,741) transcriptome databases (Riechmann et al., 2000; Yue et al., 2015). Of these, MYB, AP2-EREBP, bHLH, and WRKY represent the most abundant TF families, as in most angiosperms. Indeed, members of these TF families play significant roles in regulating the expression of genes involved in terpenoid metabolism in plants (Patra et al., 2013; Lu et al., 2013; Tan et al., 2015; Chen et al., 2017). A large number of the TFs were differentially expressed in flowers of the three species we studied. More specifically, 658 were differentially expressed in L. angustifolia, 651 in L. x intermedia, and 659 in L. latifolia flowers compared to the corresponding leaf tissues. Among these, over 600 were common to all three species (i.e., they were differentially expressed in all three plants) and represent good candidates for TFs that control essential oil metabolism.


2.4.3 Identification of terpenoid biosynthesis genes

Transcriptome profiling has proven to be a powerful tool for identifying candidate genes and enzymes involved in the formation of secondary metabolites (Dhandapani et al., 2017; Yue Liu et al., 2017). Most TPS genes reported earlier (before RNA-Seq) were cloned using time-consuming and tedious approaches. In lavenders, many of the cloned genes were discovered from cDNA libraries and/or EST databases derived from leaf and floral tissues (Adal et al., 2017; Demissie et al., 2012; Demissie et al., 2011; Jullien et al., 2014; Landmann et al., 2007; Lane et al., 2010). However, EST databases typically do not contain genes for which transcripts are not highly abundant. For example, our lavender EST databases do not contain sequences for some of the MEP pathway genes, including 4-diphosphocytidyl-2-C-methyl-D-erythritol synthase (Demissie et al., 2012), or for Li(S)-LINS (this study). Since TPS genes are highly homologous, we cannot be certain that all 63 TPS genes in our database are distinct sequences. Further analysis is required to functionally characterize these TPS candidates. The de novo transcriptome database we report here, on the other hand, contains sequence information for most of the complement of genes involved in isoprenoid metabolism in lavenders. It also contains sequence information for numerous regulatory genes, in particular TFs that are important in the regulation of isoprenoid metabolism and were not detected in our EST databases. In this context, our database is of high quality (in terms of containing full-length transcripts) and has sufficient depth to allow cloning of scarcely expressed genes involved in EO metabolism in Lavandula. Furthermore, it is worth noting that the lavender transcriptome contains multiple isoforms for certain genes. For example, the database contains three isoforms for the DXS gene (DXS1-3) (Figure 2.8), whose product catalyzes the first step of the MEP pathway. Intriguingly, these isoforms exhibited distinct expression patterns. For example, transcript levels for LavDXS1 declined while transcript levels for LavDXS2 sharply increased in flowers compared to leaf tissues. The strong expression of the LavDXS2 isoform in EO-producing tissue implies that this gene might be dedicated to EO metabolism and is expressed only where large amounts of EO need to be produced (i.e., floral tissue). The other, less strongly expressed DXS isoforms might play housekeeping roles (Yue et al., 2015).

Figure 2.8 Phylogenetic tree of plant DXSs, constructed using the neighbor-joining method. The five DXSs from the lavender transcriptome database are designated LavDXS1-5. GenBank accession numbers are shown in parentheses. At, Arabidopsis thaliana; Mt, Medicago truncatula; Os, Oryza sativa; Pa, Picea abies; Zm, Zea mays.


3 Chapter: Identification of TFs regulating Lavender terpene synthases

3.1 Synopsis

Lavender EO is dominated by monoterpenes, especially linalool, linalool acetate, 1,8-cineole, borneol, and camphor. Since the first three lavender terpene synthases were cloned by Landmann et al. (2007), researchers have cloned many other genes responsible for lavender EO composition. Recently, Adal and Mahmoud (2019) isolated and characterized short-chain trans-isoprenyl diphosphate synthases involved in the biosynthesis of monoterpene precursors in lavender. However, very limited information is available on the upstream DNA sequences of terpene synthase genes and their regulation by transcription factors in lavender. As in all eukaryotes, such regulatory mechanisms involve sequences flanking the gene in question that contain cis-regulatory elements (promoters), and proteins that recognize and interact with these elements. These cis-regulatory elements are recognized by transcription factors, which regulate the expression of specific genes in a tissue-specific manner. It has recently been shown that MYB-, MYC-, bHLH-, ERF-, and WRKY-type TFs can regulate the expression of plant TPS genes.

Both in vitro and in vivo methods are available to study such promoter-TF interactions. In vivo studies have been found to provide a more comprehensive picture of these relationships than in vitro studies, and yeast one-hybrid (Y1H) studies have successfully identified DNA-binding proteins that regulate promoter sequences (Spyropoulou et al., 2014; Spyropoulou et al., 2014). In this study, we isolated two putative promoter sequences, Lp-LINS (lavender promoter of linalool synthase) and Lp-CINS (lavender promoter of cineole synthase), and cloned them into a yeast vector to conduct a Y1H assay with a lavender flower cDNA library. We identified 96 colonies for the two promoters combined and used PCR amplification for the initial screen. After the initial screening, 38 candidates were selected for further assessment. We also included three TFs homologous to previously characterized TFs. Six of these proteins induced expression from both promoters, three activated the LiCINS promoter alone, and two did not induce expression from either promoter. The TFs identified in this study could be used to improve essential oil yield and composition in plants through biotechnology.

3.2 Materials and methods

3.2.1 Candidate selection

The upstream genomic sequences of linalool synthase (Accession# DQ263741.1) and cineole synthase (Accession# JN701461.1) were selected because of the roles of these genes in the lavender EO profile. Linalool synthase is expressed in the flowers and glandular trichomes, whereas cineole synthase is expressed in the leaves of lavender plants.

3.2.2 Promoter search

Primers used in this study are listed in Table 3.1. DNA sequences immediately upstream of the linalool synthase and 1,8-cineole synthase genes were cloned from L. x intermedia genomic DNA using the modified Universal GenomeWalkerTM Kit according to procedures recommended by the manufacturer (Clontech, USA). Briefly, genomic DNA was extracted from L. x intermedia leaf tissue using the GeneAid Genomic DNA mini kit (plant) (Geneaid Biotech, Taiwan) as per the manufacturer's instructions. The DNA was digested with HindIII, EcoRI, NdeI (for cloning the 1,8-cineole synthase promoter), or with DraI, EcoRV, PvuII and StuI (for cloning the linalool synthase promoter) at 37°C overnight. The digested DNA was purified using the E.Z.N.A Gel Extraction Kit (Omega Bio-Tek) and ligated to adapters using T4 DNA ligase (NEB, Canada) at 16°C overnight. Promoter fragments were amplified by PCR using gene-specific primers (Table S1), adapter-specific primers from the GenomeWalkerTM Kit, and the Q5 Taq DNA polymerase (NEB, Canada). The PCR program consisted of denaturation at 95°C for 2 min; 20 cycles of 95°C for 25 sec, annealing from 72°C down to 52°C for 30 sec, and extension at 72°C for 3 min; and finally 20 cycles of 95°C for 25 sec, annealing at 52°C for 30 sec, and extension at 72°C for 3 min. Initial PCR products were diluted 50X and used as templates in the subsequent PCR assays. Final PCR products were resolved on a 1% agarose gel, and the desired DNA bands were excised and purified as before. Purified DNA was cloned into a pGEM-T vector and sequenced.
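The two-phase PCR program above can be sketched as a cycle-by-cycle schedule. This is a minimal illustration only: the text gives the 72°C to 52°C annealing range over 20 cycles but not the step size, so an even step-down per cycle is assumed here, and the function names are hypothetical.

```python
def touchdown_annealing(start_c=72.0, end_c=52.0, cycles=20):
    """Annealing temperature for each cycle.

    Assumption: the temperature falls in equal steps from start_c to end_c.
    The source only states the range, not the decrement per cycle.
    """
    if cycles < 2:
        raise ValueError("need at least two cycles to step between temperatures")
    step = (start_c - end_c) / (cycles - 1)
    return [round(start_c - i * step, 2) for i in range(cycles)]


def genome_walking_program():
    """Return (denature, anneal, extend) temperatures in °C for all 40 cycles."""
    # Phase 1: 20 cycles with the annealing temperature stepping down.
    phase1 = [(95.0, t, 72.0) for t in touchdown_annealing()]
    # Phase 2: 20 cycles at a fixed 52 °C annealing temperature.
    phase2 = [(95.0, 52.0, 72.0)] * 20
    return phase1 + phase2


if __name__ == "__main__":
    for cycle, (den, ann, ext) in enumerate(genome_walking_program(), start=1):
        print(f"cycle {cycle:2d}: {den} / {ann} / {ext}")
```

If the thermocycler was programmed with a different decrement (for example 1°C per cycle), only `touchdown_annealing` would need to change.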

Table 3.1: Primers used in this study

Promoter   Primer   Sequence (5’ ➔ 3’)
LiLINSp    LIN-F1   5’-TTTTTAGAATTCACTATAGGGCACGCGTGG-3’
           LIN-F2   5’-TTTGAATTCCCACGTGTCTTTTTACCGG-3’
           LIN-F3   5’-CGTGAATTCCCTCTCTCTCAAACAAGTAC-3’
           LIN-R    5’-ATGCCATGGTTTTTAGCTTGTTGGTTTGG-3’
LiCINSp    CIN-F1   5’-TTT GTC GAC CAT ATG CCG AAC TTA TTG-3’
           CIN-F2   5’-GGG GTC GAC GGA TTT TAT ATG TTG TTT GG-3’
           CIN-F3   5’-TTT GTC GAC CGA TTT TCT TCG CAA ACG-3’
           CIN-F4   5’-TTT GTC GAC CCT ATT TTC TCT TGC AAA CC-3’
           CIN-R    5’-TTA CCA TGG ATT TCT AGT CAA GTA TCA C-3’

3.2.3 Cloning and construct design

DNA fragments corresponding to the linalool synthase promoter (GenBank: MN435985) and the 1,8-cineole synthase promoter (GenBank: MN435986) were subcloned into the pCambia1391z vector (Marker Gene Technologies, Inc.) using NcoI and EcoRI / SalI restriction sites, upstream of the E. coli gusA gene, which encodes β-glucuronidase (GUS). The cineole synthase and linalool synthase promoters were truncated by 5’ deletion into fragments of -1087 bp, -856 bp, -553 bp, and -251 bp, and of -768 bp, -502 bp, and -249 bp upstream of the ATG start codon, respectively (primers in Table 3.1). To avoid leaky GUS expression, all constructs were modified either by removing the 35S promoter or by adding a Nos terminator between the 35S promoter and the experimental promoter sequence in the pCambia1391z vector.
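The 5’ deletion series can be pictured as progressively shorter suffixes of the cloned upstream sequence, all ending at the ATG start codon. The sketch below illustrates this with a random placeholder sequence; it is not the MN435985 sequence, and the function name is hypothetical.

```python
import random


def five_prime_deletions(upstream_seq, lengths):
    """Return {length: fragment}; each fragment is the last `length` bases
    of the upstream sequence, i.e. the bases closest to the ATG."""
    fragments = {}
    for n in sorted(lengths, reverse=True):
        if n > len(upstream_seq):
            raise ValueError(f"fragment of {n} bp exceeds the cloned sequence")
        fragments[n] = upstream_seq[-n:]
    return fragments


# Placeholder 768 bp sequence standing in for the linalool synthase promoter.
random.seed(0)
toy_lins_promoter = "".join(random.choice("ACGT") for _ in range(768))
lins_fragments = five_prime_deletions(toy_lins_promoter, [768, 502, 249])

if __name__ == "__main__":
    for length, frag in lins_fragments.items():
        print(f"-{length} bp fragment starts with {frag[:10]}...")
```

Because every fragment shares the same 3’ end, a single reverse primer (LIN-R or CIN-R in Table 3.1) can be paired with each forward primer.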

The resulting plasmids were transformed into Agrobacterium tumefaciens strain EHA105 using the freeze-thaw method (Jyothishwaran et al., 2007) and plated on LB agar medium supplemented with 50 mg L-1 of kanamycin and 25 mg L-1 of rifampicin. After confirmation by PCR, a positive colony was grown in 5 ml of LB with antibiotics as above at 28 °C with shaking at 200 rpm overnight in the dark. Then, 1 ml of the overnight culture was transferred into 100 ml of fresh LB with the same antibiotics and grown at 28 °C overnight at 100 rpm. The next day, OD600 was measured using an Ultrospec 2100 pro UV/Visible Spectrophotometer, and when OD600 reached ~0.4-0.5, the cells were spun at 4000 rpm for 10 min, resuspended in an equal volume of co-cultivation medium supplemented with 100 μM acetosyringone (PhytoTechnology Laboratories), and incubated at 28 °C with shaking at 50 rpm for 3 h. Then, the cells were spun again as above and resuspended in 5-10 mL of co-cultivation medium for leaf disc infiltration.

3.2.4 Promoter localization assay in N. benthamiana leaves

For stable transformation, young N. benthamiana leaves were washed with running water and soap and then soaked in 10 % (v/v) commercial bleach with a single drop of Triton X-100 for 10 min. The tissues were washed with sterile water three times for 5 min each. After sterilization, leaves were cut into ~1 cm2 discs using a surgical blade. The leaf discs were soaked, with gentle shaking for 5 min, in Agrobacterium suspension carrying the different pCambia1391z::LiLINSp/LiCINSp promoter constructs or the empty pCambia1391z vector, and then transferred to sterile Whatman filter paper for drying. The dried leaf discs were then transferred to solid co-cultivation medium, and the cultures were wrapped with aluminum foil and incubated at 28 °C.

After four days of co-cultivation, the leaf discs with overgrown bacteria were washed with sterile water containing 500 mg L-1 of TIMENTIN® (ticarcillin disodium and clavulanate potassium) (Gold Biotechnology), followed by air drying on sterile filter paper. Then, the discs were transferred to regeneration medium with antibiotics (200 mg L-1 of cefotaxime + 125 mg L-1 of TIMENTIN to kill the remaining Agrobacterium, and 50 mg L-1 of hygromycin for transgenic plant selection). The cultures were then incubated in a growth chamber in the dark for a week and moved to light conditions until new shoots were generated. The new shoots were detached and transferred to rooting medium supplemented with antibiotics. Well-rooted plants were transferred to pots for PCR detection using specific primers, and then grown to produce T0 seeds. All regeneration and rooting media compositions are described in Appendix - D. All plant cultures were grown at 25 °C under a 16/8 h day/night photoperiod with cool white fluorescent bulbs. Selected T1 N. benthamiana transgenic plants were grown in pots in a growth room for further analysis.

For transient expression, A. tumefaciens EHA105 cultures were grown overnight from a single colony and diluted in infiltration buffer (10 mM MES at pH 5.6, 10 mM MgCl2, 20 μM ascorbic acid, 150 μM acetosyringone; Sigma-Aldrich) to an OD600 of 0.3. Leaves of five-week-old N. benthamiana plants were then infiltrated with mixtures carrying the various promoter:GUS constructs. Four days later, leaves were collected, submerged in GUS buffer (50 mM NaH2PO4 buffer at pH 7.0, 1 mM EDTA, 10% glycerol, 0.1% Triton X-100, 1 mM X-gluc; Sigma-Aldrich), and destained after color development (Jefferson et al., 1987).
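The dilution of an overnight culture to an OD600 of 0.3 in the transient expression protocol above follows the usual C1V1 = C2V2 relationship. A minimal sketch is given below; the overnight OD600 of 1.5 and the 10 ml final volume are hypothetical values chosen for illustration, and OD600 is assumed to scale linearly with cell density over this range.

```python
def dilute_to_od(stock_od, target_od, final_volume_ml):
    """Return (culture_ml, buffer_ml) needed to reach target_od.

    Uses C1 * V1 = C2 * V2, assuming OD600 is proportional to cell density.
    """
    if not 0 < target_od <= stock_od:
        raise ValueError("target OD must be positive and not exceed the stock OD")
    culture_ml = target_od * final_volume_ml / stock_od
    return culture_ml, final_volume_ml - culture_ml


if __name__ == "__main__":
    # Hypothetical overnight culture at OD600 = 1.5, diluted to 0.3 in 10 ml.
    culture, buffer = dilute_to_od(stock_od=1.5, target_od=0.3, final_volume_ml=10.0)
    print(f"{culture:.2f} ml culture + {buffer:.2f} ml infiltration buffer")
```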

3.2.5 Yeast-one-hybrid system

The Matchmaker Gold yeast one-hybrid library screening kit was used in this study (Clontech, USA). Since the yeast construct uses only a short promoter section for the initial assay, we opted to use shorter promoter sections for both the CINS and LINS genes. Accordingly, the 251 bp CINS and 249 bp LINS promoter sections were cloned into the pAbAi binary vector using HindIII and XhoI restriction sites. Correct constructs were selected by PCR and sequencing.

Selected plasmids were linearized with BstBI and BbsI restriction enzymes for the pAbAi::LiLINSp and pAbAi::LiCINSp promoter constructs, respectively, and the linearized plasmids were transformed into Y1HGold yeast cells. Transformed yeast cells were selected on SD/-Ura media, and selected colonies were confirmed using Insert Check PCR Mix 1 and stored at -80 °C. The minimal inhibitory concentration of Aureobasidin A was determined for the baits (pAbAi::LiLINSp/pAbAi::LiCINSp) in the range of 100-1000 ng/ml Aureobasidin A in SD/-Ura media.

A cDNA library was created from L. x intermedia flower tissues using the SMART cDNA synthesis kit according to the supplied protocol. CDS III primers were used, and LD-PCR was conducted using the Advantage 2 PCR mix to amplify the cDNA targets. Since 1 µg of RNA was used, PCR was conducted at 95 °C for 30 s, followed by 20 cycles of 95 °C for 10 s and 68 °C for 6 min, with a final extension at 68 °C for 5 min, and products were stored at 4 °C until use. PCR products (5 µl) were run on an agarose gel to confirm cDNA synthesis. The rest of the PCR products were purified with the Chrom-spin+400TE spin column system and eluted in a final volume of 60 µl.

3.2.5.1 Screening yeast one hybrid

The prepared cDNA library and the pGADT7-RecAD vector were transformed into competent Y1HGold yeast cells harboring the pAbAi::LiLINSp/pAbAi::LiCINSp promoter constructs. Transformed cells were plated on SD/-Leu/AbA media with 200 ng/ml and 150 ng/ml Aureobasidin for the LiLINSp and LiCINSp constructs, respectively. Transformed yeast cells were plated on selective media at 1/10, 1/100, 1/1000, and 1/10,000 dilutions. After 3 days of incubation, selected colonies were confirmed by colony PCR using Advantage 2 PCR mix.

Selected colonies were subcultured on selective media, and a single colony was used to prepare glycerol stocks. Plasmid DNA was also extracted from the selected colonies using the Yeast plasmid extraction kit (Omega Biotek, USA). Purified plasmids were transformed into DH5α bacterial cells for propagation and sequencing. Putative DNA-binding candidates were searched by BLAST against the Plant Transcription Factor Database (http://planttfdb.cbi.pku.edu.cn/), the NCBI protein database, and lavender transcriptomic databases. Three transcription factor sequences homologous to known transcription factors that regulate terpenoid biosynthesis were isolated from the lavender transcriptomic database.

Selected putative DNA-binding candidates were cloned into the pGA482 plant transformation vector under the regulation of the CaMV 35S promoter using KpnI and EcoRI restriction sites. The resulting constructs were transformed into A. tumefaciens EHA105 competent cells.

3.2.5.2 Transient transactivation assay in N. benthamiana leaves

To explore TF-promoter interactions in planta, DNA-binding constructs were co-transformed and expressed in N. benthamiana leaves with the full-length pCambia1391z::LiLINSp/LiCINSp promoter constructs using infiltration buffer (10 mM MES at pH 5.6, 10 mM MgCl2, 20 μM ascorbic acid, 150 μM acetosyringone; Sigma-Aldrich). Four days later, leaves were collected, submerged in GUS buffer (50 mM NaH2PO4 buffer, pH 7.0, 1 mM EDTA, 10% glycerol, 0.1% Triton X-100, 1 mM X-gluc; Sigma-Aldrich), and destained after color development (Jefferson et al., 1987).

For enzymatic GUS activity, leaf discs were collected from the infiltrated area and ground to a powder in liquid nitrogen. Crude extracts were prepared in extraction buffer containing 50 mM sodium phosphate at pH 7.5, 10 mM DTT, 1 mM EDTA, 0.1% sodium lauryl sarcosine, and 0.1% Triton X-100, incubated at room temperature for 10 min, and centrifuged at 14,000 rpm at 4 °C for 5 min. Crude protein concentration was measured by Bradford assay (Bio-Rad, USA). Crude extract (20 µl) was incubated in 4-methylumbelliferyl β-D-glucuronide (MUG) assay buffer (1 mM MUG in GUS extraction buffer) prewarmed to 37 °C. The reaction was mixed thoroughly by vortexing and incubated at 37 °C for 2 min, after which 100 µl of the reaction was collected and mixed with 900 µl of stop buffer (0.2 M Na2CO3). The concentration of 4-methylumbelliferone (MU) was determined using a Varioskan LUX multimode reader (Thermofisher, CA) at 365 nm for excitation and 455 nm for emission. Readings were similarly recorded at 5 min intervals for 90 min. GraphPad Prism 6 (GraphPad Software, Inc.) was used for statistical analysis and bar graph representation.
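GUS activity in pmol MU per mg protein per min can be derived from the time course described above as the slope of MU produced against time, divided by the amount of protein in the reaction. The sketch below shows one way to do this with an ordinary least-squares slope. It assumes the fluorescence readings have already been converted to pmol MU with a standard curve, and all numbers are hypothetical; the statistical analysis in this study was carried out in GraphPad Prism.

```python
def least_squares_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two or more paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def gus_specific_activity(times_min, mu_pmol, protein_mg):
    """Specific activity in pmol MU / mg protein / min."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return least_squares_slope(times_min, mu_pmol) / protein_mg


if __name__ == "__main__":
    # Hypothetical readings: 2 pmol MU per min from 0.05 mg of crude protein.
    times = [0, 5, 10, 15, 20]
    mu = [1.0, 11.0, 21.0, 31.0, 41.0]
    print(gus_specific_activity(times, mu, protein_mg=0.05))
```

Fitting a slope over several time points, rather than using a single end point, makes it easier to check that the reaction is still in its linear range.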

3.3 Results

3.3.1 Promoter search

The nucleotide sequences upstream of linalool and cineole synthases, identified using genome walking, were 768 bp and 1087 bp, respectively. The transcription start site (TSS) (labeled +1) of each cloned promoter was predicted using PLACE (http://www.dna.affrc.go.jp/PLACE/) and TSSP (http://linux1.soft-berry.com/berry.phtml). Both programs predicted the regulatory elements of the promoter sequences, including the TATA and CAAT boxes (Figure 3.1).

Figure 3.1 LiLINS and LiCINS promoter analysis. Regulatory elements were predicted using PLACE. a) Linalool synthase promoter b) cineole synthase promoter. TSS (ATG) is noted with a grey shade.

Table 3.2: Predicted regulatory elements in the two promoter regions. TSSP, a software to predict plant promoters, was used to identify the putative binding factors of LiLINSp and LiCINSp promoter. Use the link for proper annotation. (http://www.softberry.com/berry.phtml?topic=regsitelist)

Thresholds (both promoters): TATA+ promoters - 0.02, TATA-/enhancers - 0.04

Putative LinS promoter: One (1) promoter/enhancer(s) are predicted
  Promoter Pos: 739    LDF- 0.09   TATA box at 702 (21.14)

Putative CinS promoter: Three (3) promoter/enhancer(s) are predicted
  Promoter Pos: 1015   LDF- 0.13   TATA box at 1001 (21.07)
  Enhancer Pos: 1004   LDF- 0.06
  Promoter Pos: 125    LDF- 0.06   TATA box at 89 (22.13)

Putative LinS promoter. Transcription factor binding sites/RegSite DB: for promoter at position - 739

Position  Strand  RegSite ID  Motif
579       (-)     RSP00016    caTGCAC
674       (-)     RSP00026    gcttttgaTGACtTcaaacac
556       (-)     RSP00092    TAACAAA
510       (-)     RSP00092    TAACAAA
462       (+)     RSP00161    WAAAG
602       (+)     RSP00161    WAAAG
607       (-)     RSP00305    CCTTTT
673       (-)     RSP00339    RTTTTTR
455       (+)     RSP00398    TTTGAA
482       (-)     RSP00470    GTGGNG
636       (+)     RSP00477    TTTAA
675       (-)     RSP00508    gcaTTTTTatca
608       (-)     RSP00508    gcaTTTTTatca
607       (-)     RSP00508    gcaTTTTTatca
439       (-)     RSP00508    gcaTTTTTatca

Putative CinS promoter. Transcription factor binding sites/RegSite DB: for promoter at position - 1015

Position  Strand  RegSite ID  Motif
1004      (+)     RSP00005    CTWWWWWWGT
997       (+)     RSP00046    atccattcTATATAAGaaacata
852       (+)     RSP00081    ttwCCWWWWnnGGbww
721       (+)     RSP00096    GGTTT
856       (-)     RSP00096    GGTTT
790       (-)     RSP00129    CACGAC
740       (-)     RSP00151    CAANNNNATC
796       (+)     RSP00161    WAAAG
755       (+)     RSP00308    CAACA
964       (+)     RSP00308    CAACA
980       (+)     RSP00308    CAACA
853       (+)     RSP00316    AACCAA
926       (+)     RSP00316    AACCAA
827       (+)     RSP00339    RTTTTTR
941       (+)     RSP00401    TAACGT
788       (+)     RSP00470    GTGGNG
819       (-)     RSP00470    GTGGNG
825       (+)     RSP00508    gcaTTTTTatca
881       (+)     RSP00508    gcaTTTTTatca
913       (+)     RSP00508    gcaTTTTTatca
914       (+)     RSP00508    gcaTTTTTatca
951       (+)     RSP00512    cttgtaacCATCAgccaatcgaccagccaatcattc
864       (+)     RSP00565    GTATTTT
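Several motifs in Table 3.2 are written with IUPAC degenerate bases (for example WAAAG, RTTTTTR, and GTGGNG). The sketch below shows how such a motif can be located on both strands of a sequence by exact pattern matching. It is an illustration only and does not reproduce the TSSP/RegSite scoring, which also allows weighted matches; the function names and the test sequences are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead lets overlapping hits be found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan_motif(seq, motif):
    """Return sorted (position, strand) hits; positions are 1-based on the
    forward strand and give the leftmost base covered by the motif."""
    seq = seq.upper()
    pattern = motif_regex(motif)
    n, k = len(seq), len(motif)
    hits = [(m.start() + 1, "+") for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        hits.append((n - m.start() - k + 1, "-"))
    return sorted(hits)


if __name__ == "__main__":
    print(scan_motif("CCTAAAGCC", "WAAAG"))
```

Note that a palindromic motif would be reported once on each strand at the same position.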

3.3.2 Construct design and GUS activity

Upstream genomic sequence fragments of the linalool synthase (LiLINSp) and cineole synthase (LiCINSp) promoters were cloned upstream of the gusA gene (encoding GUS) within the T-DNA of the pCambia1391z plant transformation vector. Three primers at 768 bp, 502 bp, and 249 bp, and four primers at 1087 bp, 856 bp, 553 bp, and 251 bp upstream of the ATG start codon were designed for LiLINSp and LiCINSp, respectively (Figure 3.2-a). All the constructs, along with the empty pCambia1391z vector (without any insert), were stably transformed into N. benthamiana (tobacco) plants (Figure 3.2-b). For quantification of GUS expression, three plants of each independent T1 line were grown in soil for 4 – 5 weeks. Leaves from each plant were assayed for GUS expression with X-gluc as a substrate. Trichome-specific GUS expression was observed for the shorter fragments of both the LinS and CinS promoters. For the larger fragments, however, GUS expression was located on the trichomes of tobacco leaves as well as in the leaf mesophyll. The empty pCambia1391z vector without any insert also showed GUS expression in the leaf mesophyll (Figure 3.2-b).

To eliminate the leaky GUS expression, the pCambia1391z::LiLINSp/LiCINSp constructs were modified either by removing the CaMV35S promoter entirely (Figure 3.2-c) or by introducing a NOS terminator between the CaMV35S promoter and the LiLINSp/LiCINSp promoter (Figure 3.2-d). These modifications eliminated the leaky GUS expression from the negative controls in transient tobacco experiments (Figure 3.3). To further study the TF-LiLINSp/LiCINSp interaction, we decided to use the pCambia1391z(-35S)::LiLINSp/LiCINSp constructs.

Figure 3.2 Predicted promoter fragments and constructs in the pCambia1391z vector. a) different fragments of LiLINSp/LiCINSp promoter candidates, b) promoter fragments were cloned into multiple cloning sites of pCambia1391z vector, c) CaMV 35S promoter was removed from the LiLINSp/LiCINSp constructs, d) Nos terminator was inserted in between CaMV 35S and LiLINSp/LiCINSp fragments.

Figure 3.3 GUS assay of different promoter constructs in the pCambia1391z binary vector. a) LiLINSp/LiCINSp fragments were cloned in the MCS of pCambia1391z binary vector, b) pCambia1391z::LiLINSp/LiCINSp constructs were modified by introducing Nos-terminator between CaMV35S promoter and LiLINSp/LiCINSp fragments, c) pCambia1391z::LiLINSp/LiCINSp constructs were modified by deleting CaMV35S promoter.

3.3.3 Yeast-one-hybrid assay to identify TFs interacting with LiLINSp/LiCINSp

Since the Y1H system was designed for smaller DNA fragments, we used the LiLINSp-F3 (249 bp) and LiCINSp-F4 (251 bp) fragments in this experiment. The LiLINSp-F3 and LiCINSp-F4 fragments were cloned into pAbAi binary vectors, linearized, and transformed into Y1HGold yeast cells. Colony PCR was conducted to screen successful candidates from transformed Y1HGold cells grown on SD/-Ura selective media. Selected candidates were screened for the Aureobasidin minimum inhibitory concentration for the promoter-TF interaction assay in yeast cells. Y1HGold[LiLINSp-F3 / LiCINSp-F4] cells were found to be sensitive at 150 ng/ml and 200 ng/ml Aureobasidin concentration. A prey library was constructed from cDNA of L. x intermedia flower tissues according to the manufacturer's protocol. The cDNA library, along with the pGADT7-RecAD prey vector, was transformed into Y1HGold[LiLINSp-F3/LiCINSp-F4] cells and selected on appropriate SD/-Leu/AbA media. A total of 96 colonies were recovered from the LiLINSp/LiCINSp-TF interaction assay. Colony PCR was conducted to confirm the candidates, and a total of 38 candidates with inserts of approximately 1000 bp were selected and annotated against different public protein databases. These included twenty-nine DNA binding proteins, three MYB-related TFs, two bHLH-related TFs, three WRKY-related TFs, and one unidentified protein.

3.3.4 In vivo screening of LinSp/CinSp interaction with specific TF candidates

To elucidate the function of a selection of the TF proteins in planta, a previously established transient assay was employed in N. benthamiana leaves (Spyropoulos et al., 2019). Briefly, the coding sequence of the gusA gene was placed under the control of either LiLINSp or LiCINSp in the pCambia1391z plant transformation vector to produce the reporter constructs LiLINSp:GUS or LiCINSp:GUS, respectively. To generate the TF constructs, the coding sequences of eight TFs selected from the Y1H assay, and of three lavender TFs orthologous to those known to regulate terpene (artemisinin) biosynthesis in Artemisia annua (see Table 3.3), were separately cloned into the pGA482 plant transformation vector (An, 1986) under the control of the CaMV35S promoter (Figure 3.4).

Figure 3.4 TF constructs in the pGA482 plant binary vector.

TF-19, TF-21, TF-24, TF-35, TF-37, and TF-38 were identified from the LiCINSp Y1H assay, and TF-06 and TF-07 were recovered from the LiLINSp Y1H assay. TF constructs were then individually co-transformed with either the LiLINSp:GUS or the LiCINSp:GUS construct into tobacco leaves. Three days after co-expression, total leaf protein was extracted from transiently transformed leaves and assayed for GUS activity using X-gluc as a substrate. Six TFs activated both promoters, three induced only the LiCINS promoter, and two did not induce either promoter (Figure 3.5). Our results indicate that LiMYB, LibZIP, LiGeBP, LiSBP-2, LiERF-1, and LiERF-2 (Table 3.3) might be activators of monoterpene synthases in lavenders.

Figure 3.5 GUS activity to determine the TFs interaction with LiLINSp/LiCINSp promoters in N. benthamiana leaf disc assay.

Table 3.3: L. x intermedia TFs tested in N. benthamiana for activating linalool synthase promoter (LiLINSp) / 1,8-cineole synthase promoter (LiCINSp). Some TFs act as an activator (+), while others do not (-).

Transcription   Activator                       Annotated TF    Reference
factors (TFs)   LiLINSp (OD)*   LiCINSp (OD)*   class/family
TF-06           -               +               B3-like         (Romanel et al., 2009)
TF-07           -               -               B3-like         (Romanel et al., 2009)
TF-19           +               +               MYB related     (Riechmann et al., 2000)
TF-21           +               +               bZIP related    (Riechmann et al., 2000)
TF-24           -               +               NAC related     (Riechmann et al., 2000)
TF-35           +               +               GeBP-like       (Chevalier et al., 2008)
TF-37           -               -               SBP related     (Riechmann et al., 2000)
TF-38           +               +               SBP related     (Riechmann et al., 2000)
TF-S24          -               +               SBP related     (Riechmann et al., 2000)
TF-9306         +               +               ERF1-like       (Yu et al., 2012)
TF-100583       +               +               ERF2-like       (Yu et al., 2012)
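The activation pattern in Table 3.3 can be tallied programmatically as a quick consistency check. The sketch below transcribes the +/- calls from the table and counts how many TFs fall into each category; the dictionary and function names are hypothetical.

```python
from collections import Counter

# (LiLINSp, LiCINSp) activation calls transcribed from Table 3.3.
ACTIVATION = {
    "TF-06": ("-", "+"), "TF-07": ("-", "-"), "TF-19": ("+", "+"),
    "TF-21": ("+", "+"), "TF-24": ("-", "+"), "TF-35": ("+", "+"),
    "TF-37": ("-", "-"), "TF-38": ("+", "+"), "TF-S24": ("-", "+"),
    "TF-9306": ("+", "+"), "TF-100583": ("+", "+"),
}


def classify(lins, cins):
    """Name the category for one pair of +/- calls."""
    if lins == "+" and cins == "+":
        return "both promoters"
    if cins == "+":
        return "LiCINSp only"
    if lins == "+":
        return "LiLINSp only"
    return "neither promoter"


tally = Counter(classify(lins, cins) for lins, cins in ACTIVATION.values())

if __name__ == "__main__":
    for category, count in tally.most_common():
        print(f"{category}: {count}")
```

With the calls as listed in the table, the eleven TFs split into six that activate both promoters, three that activate LiCINSp only, and two that activate neither.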

To confirm these results, crude extracts were assayed for GUS activity using MUG as a substrate, since MUG, a fluorogenic substrate, performs better than X-gluc in the GUS assay (Jefferson et al., 1987). For both promoters, TF-S9306 and TF-S100583 showed the highest activity compared to the others. TF-07, which was recovered from the LiLINSp Y1H assay, showed GUS activity of 30 pmol/mg/min, similar to that of TF-S9306. In contrast, TF-21, which was identified from the LiCINSp Y1H assay, showed the highest GUS activity (55 pmol/mg/min) of all TF candidates found in the Y1H assay (Figure 3.6).

Figure 3.6 Transactivation of LiCINSp & LiLINSp promoters by TFs in N. benthamiana leaves. Normalized GUS activity after co-infiltration with A. tumefaciens harboring the 35S:TF effector construct and the LiCINSp/LiLINSp promoter::GUS reporter constructs. Values from the respective 35S:TF assay were subtracted from those of the co-infiltration assay to express the true interaction of each promoter with the respective TF. The bars represent mean values, and the error bars represent the standard error (n = 3) (ANOVA, P < 0.05).

3.4 Discussion

Linalool and cineole are two important compounds in lavender EO, and the linalool synthase and cineole synthase genes responsible for the synthesis of these compounds were identified and characterized previously (Demissie et al., 2012; Landmann et al., 2007). Linalool adds a pleasant flavor to lavender oil and is used mostly in the cosmetic industry, whereas cineole is predominantly used in personal hygiene products (Cavanagh & Wilkinson, 2002; Demissie et al., 2012; Woronuk et al., 2011). Linalool and cineole synthases are regulated spatiotemporally, and identifying the regulatory elements involved in the biosynthesis of linalool and cineole in lavender EO was therefore timely.

The GenomeWalker kit yielded 768 bp and 1087 bp of upstream genomic sequence for linalool synthase and cineole synthase, respectively. Promoter analysis with consecutive deletion fragments allowed us to delimit 249 bp and 251 bp fragments that are sufficient for glandular trichome-specific expression of linalool synthase and cineole synthase, respectively. Similarly, a 207 bp promoter fragment was enough to direct linalool synthase expression in the glandular trichomes of S. lycopersicum (Spyropoulou et al., 2014).

A yeast one-hybrid screen identified transcription factors that actively transactivated LiLINSp/LiCINSp in planta. Relatively few terpene synthase promoter sequences have been identified and characterized to date (Spyropoulou et al., 2014; Spyropoulou et al., 2014; Tissier, 2012). Since yeast promoters are within the range of 150 bp – 400 bp, we needed to determine a suitable size of the LiLINSp/LiCINSp fragments for the Y1H assay (Dobi & Winston, 2007). Smaller promoter fragments were able to direct the expression of the GUS gene in the glandular trichomes of tobacco leaves. As with the SlMTS5 promoter regulating linalool synthase in tomato plants, we could not find any specific region of LiLINSp/LiCINSp related to repression of the LiLINS and LiCINS genes (Ennajdaoui et al., 2010; Spyropoulou et al., 2014).

The Y1H assay has been widely used to predict DNA-protein interactions in living organisms. Using the Y1H assay, we were able to identify a few transcription factors interacting with LiLINSp and LiCINSp; eight of these TF candidates, along with three TFs from the lavender transcriptomic database that were homologous to other known TFs, were co-expressed in tobacco leaves. Fluorogenic GUS activity revealed a few TFs specific to LiLINSp and LiCINSp. TF-07, a LiLINSp-specific candidate, was found to show the best activity compared to other TF candidates. Similarly, TF-21 from the LiCINSp Y1H assay, and TF-S9306 and TF-S100583 from the lavender database, showed better activity than the rest.

4 Chapter: Cloning of terpene acetyltransferase

4.1 Synopsis

Two monoterpene acetyltransferase cDNA clones (LiAAT-3 and LiAAT-4) were isolated from L. x intermedia glandular trichomes and expressed in bacteria, and the encoded proteins were functionally characterized in vitro. The recombinant LiAAT-3 and LiAAT-4 proteins had molecular weights of ca. 47 and 49 kDa, respectively, as evidenced by SDS-PAGE. The Km (mM) values for the recombinant LiAAT-3 and LiAAT-4 were 1.046 and 0.354 for lavandulol, 1.31 and 0.279 for geraniol, and 0.87 and 0.113 for nerol, respectively. The Vmax (pkat/mg) values for LiAAT-3 and LiAAT-4 were 92.13 and 105.1 for lavandulol, 81.07 and 52.17 for geraniol, and 15.02 and 15.8 for nerol, respectively. Catalytic efficiencies (mM-1 min-1) for LiAAT-3 and LiAAT-4 were 0.27 and 0.85 for lavandulol, 0.19 and 0.54 for geraniol, and 0.052 and 0.4 for nerol, respectively. These kinetic properties are in the range of those reported for other plant acetyltransferases and indicate that LiAAT-4 has better catalytic efficiency than LiAAT-3, with lavandulol serving as the preferred substrate for both enzymes. Transcripts for both genes were abundant in L. angustifolia and L. x intermedia flowers, where monoterpene acetates are produced, and were undetectable (or present in trace quantities) in L. latifolia flowers, which do not accumulate significant amounts of these metabolites.
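The Km and Vmax values reported above can be related to reaction rate through the Michaelis–Menten equation, v = Vmax [S] / (Km + [S]). The sketch below evaluates this equation with the reported constants. It is an illustration only: the dictionary and function names are hypothetical, and the calculation says nothing about how the constants were fitted.

```python
# (Km in mM, Vmax in pkat/mg) as reported in the synopsis.
KINETICS = {
    ("LiAAT-3", "lavandulol"): (1.046, 92.13),
    ("LiAAT-4", "lavandulol"): (0.354, 105.1),
    ("LiAAT-3", "geraniol"): (1.31, 81.07),
    ("LiAAT-4", "geraniol"): (0.279, 52.17),
    ("LiAAT-3", "nerol"): (0.87, 15.02),
    ("LiAAT-4", "nerol"): (0.113, 15.8),
}


def michaelis_menten_rate(substrate_mm, km_mm, vmax):
    """Rate predicted by v = Vmax * [S] / (Km + [S])."""
    if substrate_mm < 0 or km_mm <= 0:
        raise ValueError("substrate must be >= 0 and Km must be > 0")
    return vmax * substrate_mm / (km_mm + substrate_mm)


if __name__ == "__main__":
    for (enzyme, substrate), (km, vmax) in KINETICS.items():
        # At [S] = Km the predicted rate is half of Vmax by definition.
        v_half = michaelis_menten_rate(km, km, vmax)
        print(f"{enzyme} + {substrate}: v at [S]=Km is {v_half:.2f} pkat/mg")
```

A lower Km means that half-maximal rate is reached at a lower substrate concentration, which is why LiAAT-4 performs better than LiAAT-3 at low substrate levels even where its Vmax is smaller.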

4.2 Materials and methods

4.2.1 Candidate selection

The construction of a cDNA library and the corresponding expressed sequence tag (EST) database from the floral glandular trichomes of mature (30 % bloomed) L. x intermedia flowers, and a microarray-based transcript profiling experiment in various tissues of L. angustifolia, L. x intermedia, and L. latifolia, were recently reported (Demissie et al., 2012; Lane et al., 2010). Probe generation, array construction, RNA labeling, array hybridization, washing, scanning, signal quantification, and data analysis for the microarray experiment were performed at the University Health Network Microarray Centre (Toronto, Canada). After analysis of the microarray data and homology-based searches, a total of four candidates (LiAAT-1 to LiAAT-4) were selected for this study. Candidates were further analyzed for the presence of conserved motifs and active-site motifs using the Conserved Domains module of the NCBI protein structure analysis software (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi), and for transit peptides using the ChloroP 1.1 (http://www.cbs.dtu.dk/services/ChloroP/) and Signal-3L (http://www.csbio.sjtu.edu.cn/bioinf/Signal-3L/) online tools.

4.2.2 Relative expression assay of LiAAT

Total RNA was extracted from different lavender tissues using the RNeasy Plant Mini Kit and treated with DNase I to remove genomic DNA (Qiagen, Canada). The relative abundance of the four LiAAT candidates was analyzed in young leaves and mature flowers of all three lavender species, and in glandular trichomes of L. angustifolia and L. x intermedia, by quantitative PCR (qPCR) using the StepOnePlus Real-Time detection system (Applied Biosystems, Canada). Complementary DNA (cDNA) for relative transcript analysis was synthesized using the iScript cDNA synthesis kit (Bio-Rad) according to the manufacturer’s instructions. Reactions contained SYBR® Select Mastermix (Life Technologies, Canada), approximately 150 ng of cDNA as a template, and 500 nM of each primer in a 10 µl reaction volume. Gene-specific primers (Table 4.1) used in qPCR experiments were designed manually, as LiAAT-1, LiAAT-2, and LiAAT-3 share above 80% nucleotide sequence similarity. Primer sequences were analyzed for hairpin, self-dimer, and hetero-dimer formation using the IDT PrimerQuest software (http://www.idtdna.com/Scitools/Applications/Primerquest/). The following program was used for qPCR: an initial heat-labile Uracil-DNA Glycosylase (UDG) activation step at 50 °C for 2 min, denaturation at 95 °C for 2 min, and 50 cycles of 3 sec at 95 °C and 30 sec at 60 °C. Following threshold-dependent cycling, melting dissociation was performed from 60 to 95 °C at a melt rate of 0.3 °C/s with a smooth curve setting. PCR efficiency was calculated using LinRegPCR for all the primers used in this experiment and updated in the StepOnePlus data analysis software (Ruijter et al., 2009). Normalized expression values (ΔΔCT) of LiAAT candidates were calculated with DataAssistTM software (Life Technologies, Canada) using β-actin and 18S RNA as reference genes.
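The ΔΔCT normalization performed by the analysis software can be sketched as follows. This is a simplified illustration, not the DataAssist implementation: it averages the CT values of the two reference genes and assumes a single amplification efficiency for all primers (2.0, perfect doubling, unless stated otherwise), whereas primer-specific efficiencies from LinRegPCR were used in this study. All CT values shown are hypothetical.

```python
def mean(values):
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    return sum(values) / len(values)


def relative_expression(ct_target, ct_refs, ct_target_cal, ct_refs_cal,
                        efficiency=2.0):
    """Fold change of a target gene in a sample relative to a calibrator.

    delta CT       = CT(target) - mean CT(reference genes)
    delta delta CT = delta CT(sample) - delta CT(calibrator)
    fold change    = efficiency ** (-delta delta CT)
    """
    delta_sample = ct_target - mean(ct_refs)
    delta_calibrator = ct_target_cal - mean(ct_refs_cal)
    return efficiency ** (-(delta_sample - delta_calibrator))


if __name__ == "__main__":
    # Hypothetical CT values: flower sample versus leaf calibrator,
    # with two reference genes in each tissue.
    fold = relative_expression(
        ct_target=22.0, ct_refs=[18.0, 20.0],
        ct_target_cal=25.0, ct_refs_cal=[18.0, 20.0],
    )
    print(f"fold change: {fold:.1f}")
```

Averaging reference-gene CT values is equivalent to taking the geometric mean of their relative quantities, which is a common way to combine more than one reference gene.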

Treated RNA was reverse transcribed with Oligo-dT (80 µM) and random hexamers (40 µM) (Custom oligos, IDT Canada) using M-MuLV reverse transcriptase (New England Biolabs, Canada) following the manufacturer's protocol. Transcriptional activities of LiAAT-3 and LiAAT-4 were analyzed in the tissues described above by standard PCR using specific primers (Table 4.1) with Taq DNA Polymerase (New England Biolabs, Canada). The following PCR program was used: initial denaturation at 95 °C for 5 min, followed by 25/30 cycles of 95 °C for 1 min, 60 °C for 30 sec, and 72 °C for 30 sec, with a final elongation at 72 °C for 5 min.

Table 4.1: Oligonucleotides used in this study

Primer type   Target gene   Primers
Full length   LiAAT-3       F-5’ AAT TGA ATT CAT GGC ATC CAC CAA AAC C 3’
                            R-5’ AAA GCT CGA GCA ATG CTG AAA GAT TGA AAG 3’
              LiAAT-4       F-5’ CCC CGA ATT CAT GGC GAT GAT TAT TAC A 3’
                            R-5’ CCC CCC TCG AGA GTA TCC AAT TTA TTG TA 3’
qPCR          LiAAT-3       F-5’ GCC AAA GGC ACG ATT GAT TGG 3’
                            R-5’ GAG AGT CCT GGC AGC CGT AAT 3’
              LiAAT-4       F-5’ AGT TCG GCC TCT TCA TTC CG 3’
                            R-5’ CTC ACC CCG TCA CTA GAT GC 3’
              Actin         F-5’ TGT GGA TTG CCA AGG CAG AGT 3’
                            R-5’ AAT GAG CAG GCA GCA ACA GCA 3’
              18s RNA       F-5’ GTG ACG GGT GAC GGA GAA 3’
                            R-5’ GAC TCA ATG AGC CCG GTA 3’

4.2.3 Recombinant protein expression and purification

The coding regions of selected sequences were amplified from L. x intermedia glandular trichome cDNA using appropriate primers (Table 4.1) and iProof HiFi Taq polymerase (Bio-Rad, Canada). The PCR program used was: initial denaturation at 95 °C for 5 min, followed by 35 cycles of 95 °C for 1 min, 52 °C for 30 s, and 72 °C for 1:30 min, with a final elongation at 72 °C for 5 min. PCR products were purified using a Gel Extraction/PCR purification kit (Omega Bio-Tek, USA). The purified PCR products were inserted into the XhoI/EcoRI site of the pGEX4T-1 bacterial expression vector, fused to the coding sequence for glutathione S-transferase (GST). The resulting constructs were transformed into E. coli BL21(DE3)pLysS cells (EMD Chemicals, Darmstadt, Germany) to allow the production of the recombinant proteins. Cells were grown and induced at 20 °C with isopropyl-β-D-thiogalactopyranoside (IPTG) at a final concentration of 0.5 mM for 12–14 h in Luria–Bertani (LB) media supplemented with 100 mg/L ampicillin and 34 mg/L chloramphenicol.

The induced cells were chilled on ice for 15–20 min, collected by centrifugation at 3,220 g and 4 °C for 20 min, and stored at −80 °C overnight. The harvested cells were thawed and resuspended in GST wash buffer (43 mM Na2HPO4, 14.7 mM KH2PO4, 1.37 M NaCl, 27 mM KCl, pH 7.3) supplemented with 1 mM of the protease inhibitor phenylmethanesulfonylfluoride (PMSF). Cells were further sonicated on ice using a Sonic Dismembrator Model 100 (Fisher Scientific, Ottawa, ON, Canada) to complete bacterial membrane disruption after the freeze-thaw cycle. The lysate was centrifuged (10,000 g, 20 min) and incubated on ice for 90 min with GST bind resin previously equilibrated with GST wash buffer. After centrifugation (600 g, 5 min), the supernatant was discarded, and the resin was washed three times with 10 volumes of GST wash buffer. Thrombin (EMD Chemicals, Darmstadt, Germany) was applied to the resin containing recombinant protein at 1 U/mg for 12 h at 4 °C on a tilted rocking platform. The treated resin was centrifuged at 500 g for 2 min, and the supernatant containing the recombinant protein was collected carefully. GST elution buffer (50 mM Tris-Cl, pH 8.0, 10 mM reduced glutathione) was then applied to the resin to collect residual protein fragments for SDS-PAGE analysis. Protein samples were filtered through an Amicon column (30 K cut-off) to increase the protein concentration by lowering the total volume. Protein samples were stored in protein storage buffer (50 mM Tris, 1 mM DTT, 1 mM EDTA, 0.5 mg BSA, 10 mM NaCl, and 15% glycerol at pH 7.5). Protein concentration was determined by the Bradford method.

4.2.4 Enzymatic assay and kinetic study

Initial enzyme activity was studied in a 0.5 ml reaction volume containing protein storage buffer, 2 mM substrate (geraniol, lavandulol, or linalool), 0.2 mM acetyl-CoA, and 10 µg of protein. After overnight incubation at 30 °C with shaking at 80 rpm, assay products were extracted into 0.5 ml pentane and concentrated ~50-fold before analysis by GCMS (see below). Negative controls were performed using protein extracted from bacteria harboring the empty vector (without insert), with all other conditions kept identical to those of the regular experiment. To establish the linear range of the reaction, assays were performed at five time points: 10, 30, 60, 90, and 120 min. The optimum temperature was determined from a set of reactions performed at 25–37 °C for 30 min. The optimum pH was determined by performing assays at pH 5.5–10.0 in compatible buffers, using the method stated above. All assays were performed in duplicate or triplicate to allow statistical analysis. Reaction assays contained camphor (1 mg/ml) as an internal standard to quantify the amount of product formed.

A Michaelis–Menten saturation curve was constructed using enzyme assays (n = 3) performed at the optimum temperature (32 °C) and pH (8.0) for 30 min in a 0.5 ml reaction volume containing 50 mM sodium phosphate buffer, 250 nM enzyme, 0.2 mM acetyl-CoA, and substrate at concentrations from 10 µM to 5 mM. Reactions were terminated by simple mixing followed by freezing at −80 °C. Assay products were recovered in 0.5 ml pentane and analyzed by GCMS, as discussed below. Kinetic parameters were determined from the Michaelis–Menten saturation curve using SigmaPlot software version 10.00 (Systat Software, Germany).
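The kinetic parameters in this work were obtained with SigmaPlot. For readers without access to that software, the same hyperbolic model, v = Vmax·[S]/(Km + [S]), can be fitted by least squares in a few lines of Python. The sketch below is not the procedure used in this study: it is a stand-in that exploits the fact that, for a fixed Km, the best Vmax has a closed form, and then searches over Km by golden-section search. The substrate concentrations are hypothetical, and the rates are synthetic values generated from the Km and Vmax reported for LiAAT-3 with geraniol, not the raw assay data.

```python
import math


def michaelis_menten(s, vmax, km):
    """Rate predicted by v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def best_vmax(km, substrate, rates):
    """For a fixed Km the model is linear in Vmax, so least squares is closed-form."""
    f = [s / (km + s) for s in substrate]
    return sum(v * fi for v, fi in zip(rates, f)) / sum(fi * fi for fi in f)


def sum_squared_error(km, substrate, rates):
    vmax = best_vmax(km, substrate, rates)
    return sum((v - michaelis_menten(s, vmax, km)) ** 2
               for s, v in zip(substrate, rates))


def fit_michaelis_menten(substrate, rates, km_lo=1e-4, km_hi=1e2, iterations=200):
    """Golden-section search over log10(Km); returns (Vmax, Km)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = math.log10(km_lo), math.log10(km_hi)
    for _ in range(iterations):
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if (sum_squared_error(10 ** x1, substrate, rates)
                < sum_squared_error(10 ** x2, substrate, rates)):
            hi = x2  # minimum lies in [lo, x2]
        else:
            lo = x1  # minimum lies in [x1, hi]
    km = 10 ** ((lo + hi) / 2.0)
    return best_vmax(km, substrate, rates), km


# Hypothetical substrate concentrations (mM) within the assayed range.
substrate_mM = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5]
# Synthetic, noise-free rates (pkat/mg) generated from the reported parameters.
rates = [michaelis_menten(s, 81.07, 1.31) for s in substrate_mM]

vmax_fit, km_fit = fit_michaelis_menten(substrate_mM, rates)
print(f"Vmax = {vmax_fit:.2f} pkat/mg, Km = {km_fit:.3f} mM")
```

With real, noisy triplicate data the fitted values would carry uncertainty, which this sketch does not estimate.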


4.2.5 GCMS analysis

Assay products were analyzed using a Varian 3800 gas chromatograph coupled to a Saturn 2200 ion trap mass detector. The instrument was equipped with a 30 m × 0.25 mm capillary column coated with a 0.25 µm film of acid-modified polyethylene glycol (EC-1000, Alltech, Deerfield, IL, USA) and a CO2-cooled 1079 Programmable Temperature Vaporizing (PTV) injector (Varian Inc., USA). Samples were injected on-column at 40 °C. The oven temperature was initially maintained at 40 °C for 1 min, raised to 100 °C at a rate of 30 °C/min, then to 150 °C at a rate of 6 °C/min, then to 230 °C at a rate of 35 °C/min, and finally held at 230 °C for 2 min. The carrier gas (helium) flow rate was set to 1 ml/min.
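The oven program described above can be written down as data and its total run time computed, which is a convenient way to check a method before entering it on the instrument. This is a small illustrative sketch; the names are not taken from any instrument software.

```python
# Oven program from the text: start at 40 C, hold 1 min, then three ramps.
START_TEMP_C = 40.0
INITIAL_HOLD_MIN = 1.0
# Each ramp: (target temperature in C, rate in C/min, hold at target in min)
RAMPS = [(100.0, 30.0, 0.0), (150.0, 6.0, 0.0), (230.0, 35.0, 2.0)]


def oven_run_time_min(start_temp_c, initial_hold_min, ramps):
    """Total oven program time in minutes (cool-down not included)."""
    total = initial_hold_min
    temp = start_temp_c
    for target_c, rate_c_per_min, hold_min in ramps:
        total += (target_c - temp) / rate_c_per_min + hold_min
        temp = target_c
    return total


run_time = oven_run_time_min(START_TEMP_C, INITIAL_HOLD_MIN, RAMPS)
print(f"Oven program run time: {run_time:.2f} min")
```

The slow 6 °C/min ramp between 100 and 150 °C takes more than half of the run.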

The identity of the assay products was confirmed by comparing their retention times and mass spectra to those of authentic standards (from our collection) analyzed under the same conditions. EO constituents were identified by comparing the obtained mass spectra to those of authentic standards or to those in the NIST library.

4.2.6 Multiple sequence alignment and phylogenetic tree analysis

Sequences for LiAAT candidates were searched against the NCBI database using the BLASTp function to identify the closest homologs, which were analyzed by multiple sequence alignment using the default parameters of the ClustalW tool available on the EBI platform (http://www.ebi.ac.uk/Tools/msa/clustalw2/). The phylogenetic tree was created by neighbor-joining with bootstrap analysis using the Geneious Tree Builder module (Geneious 5.0.3 software, Auckland, New Zealand).

4.2.7 Accession number

The coding sequences for LiAAT-3 (Accession # KM275343) and LiAAT-4 (Accession # KM275344) have been submitted to NCBI.


4.3 Results

4.3.1 Candidate selection

Construction of a cDNA library and the corresponding expressed sequence tag (EST) database from the secretory cells of L. x intermedia flower glandular trichomes was previously reported (Demissie et al. 2012). The annotated EST database was initially searched using the keyword “acetyltransferase” to identify putative alcohol acetyltransferase homologs. The selected candidates were further screened by comparing their sequences to those of known alcohol acetyltransferases (AATs) present in online databases. The initial search yielded a total of 117 ESTs as putative acetyltransferases. Among these, two ESTs were singletons, while the remaining 115 ESTs formed eight contigs. A subsequent search, based on GO annotation, revealed that four contigs were potentially viable terpene alcohol acetyltransferases, while others corresponded to histone acetyltransferase, amino acid acyltransferase, and dihydrolipoamide acyltransferase, among others. The four selected contigs, LiAAT-1, LiAAT-2, LiAAT-3, and LiAAT-4, contained six, thirty-five, twenty, and thirty-five ESTs, respectively (Table 4.2), and produced transcripts with complete ORFs for the LiAAT-1 to LiAAT-4 candidates.

Given that the floral EOs of L. x intermedia and L. angustifolia contain substantial amounts of monoterpene acetates, we anticipated strong expression of EO-related acetyltransferases in the floral tissue of these plants. On the other hand, we did not expect to observe strong expression of these genes in flowers of L. latifolia, which do not accumulate significant amounts of monoterpene acetates (Herraiz-Peñalver et al., 2013; Lis-Balchin, 2002; Sarker, 2013). Our microarray analysis indicated that transcripts corresponding to LiAAT candidates were more abundant (by 1.9-, 1.4-, 18.5-, and 18-fold for LiAAT-1, LiAAT-2, LiAAT-3, and LiAAT-4, respectively) in L. x intermedia compared to L. latifolia floral tissue (Table 4.2). Similarly, transcript levels for all candidates were higher in newly opened flowers (anthesis stage), which accumulate monoterpene acetates, compared to unopened buds, which do not produce these compounds (Table 4.2). A similar expression pattern was observed for LiAAT-3 and LiAAT-4 in L. angustifolia flowers. However, transcripts for LiAAT-1 and LiAAT-2 were 52- and 11-fold down-regulated in L. angustifolia compared to L. latifolia floral tissue, indicating that these sequences are not likely to be involved in EO metabolism. Based on these observations, LiAAT-3 and LiAAT-4 were selected for further analysis.

Table 4.2: Microarray analysis.

Candidate   # of ESTs   LI vs LA    LI vs LL   LA vs LL    Bud vs Anth (LI-gland)
LiAAT-1     6           Up 44       Up 1.89    Down 52     Down 4
LiAAT-2     35          Up 2.4      Up 1.39    Down 11     Down 6
LiAAT-3     20          Up 1.47     Up 18.5    Up 20.79    Down 6
LiAAT-4     35          No change   Up 18      Up 18       Down 7

LA = L. angustifolia, LI = L. x intermedia, LL = L. latifolia, LI-gland = glandular trichomes from L. x intermedia bud and anthesis flower developmental stages. Unless otherwise stated, 30% of bloomed flower tissues were used for all other species. Up = up-regulation, Down = down-regulation (fold change) in their respective tissues.
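The selection logic described above (keep candidates whose transcripts are up-regulated in both acetate-producing species relative to L. latifolia) can be expressed as a simple filter over the fold changes in Table 4.2. The sketch below is illustrative only: the 2-fold cut-off is a hypothetical threshold chosen for the example, not a value stated in this work, and down-regulation is encoded as a negative number.

```python
# Signed fold changes from Table 4.2: positive = up, negative = down.
MICROARRAY = {
    "LiAAT-1": {"LI_vs_LL": 1.89, "LA_vs_LL": -52.0},
    "LiAAT-2": {"LI_vs_LL": 1.39, "LA_vs_LL": -11.0},
    "LiAAT-3": {"LI_vs_LL": 18.5, "LA_vs_LL": 20.79},
    "LiAAT-4": {"LI_vs_LL": 18.0, "LA_vs_LL": 18.0},
}
MIN_FOLD = 2.0  # hypothetical cut-off, for illustration only


def select_candidates(table, min_fold):
    """Keep candidates up-regulated by at least min_fold in every comparison."""
    return [
        name
        for name, fold_changes in table.items()
        if all(value >= min_fold for value in fold_changes.values())
    ]


selected = select_candidates(MICROARRAY, MIN_FOLD)
print("Selected for further analysis:", ", ".join(selected))
```

Under this assumed threshold the filter retains the same two candidates that were carried forward in the text.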

4.3.2 Sequence analysis

The open reading frames (ORFs) of LiAAT-3 and LiAAT-4 consist of 1344 and 1254 bp, coding for proteins of 447 and 417 amino acids with predicted molecular weights of 50.27 and 47.57 kDa, respectively. The proteins were predicted to be cytosolic, as neither ORF encoded a transit peptide. Intracellular trafficking of isoprenoid metabolites is known to occur in different organisms; for example, the intermediates involved in the biosynthesis of the monoterpene menthol in peppermint oil glands are believed to move around the cell for hydroxylation and isomerization reactions (Turner et al., 2000). LiAAT-3 and LiAAT-4 proteins were highly homologous to alcohol acetyltransferases from strawberry and apple fruit (FaSAAT, MpAAT1), and included all the conserved motifs present in typical plant alcohol acetyltransferases, specifically the HxxxD and DFGWG motifs (D’Auria, 2006) (Figure 4.1). However, in LiAAT-3 the aspartic acid (D) and phenylalanine (F) of the DFGWG motif were replaced with glutamic acid (E) and valine (V), respectively. LiAAT-4 contained two HxxxD motifs separated by 29 amino acids and an unaltered DFGWG motif at the C-terminal end (Figure 4.1).
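The relationship between the ORF and protein lengths reported above can be checked directly: an ORF of n bp that ends in a stop codon encodes n/3 − 1 residues. The sketch below also shows how the HxxxD and DFGWG motifs could be located with regular expressions. The peptide used is a made-up toy string, not a lavender sequence, and the function names are illustrative.

```python
import re


def protein_length(orf_bp):
    """Residues encoded by an ORF whose length includes the stop codon."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    return orf_bp // 3 - 1


# Lookahead patterns so that overlapping matches are also reported.
HXXXD = re.compile(r"(?=H...D)")
DFGWG = re.compile(r"(?=DFGWG)")


def find_motifs(protein):
    """Return 1-based start positions of the two conserved BAHD motifs."""
    return {
        "HxxxD": [m.start() + 1 for m in HXXXD.finditer(protein)],
        "DFGWG": [m.start() + 1 for m in DFGWG.finditer(protein)],
    }


print(protein_length(1344), protein_length(1254))

toy_peptide = "MKTHAAADLLSPDFGWGKP"  # hypothetical sequence for illustration
print(find_motifs(toy_peptide))
```

A variant motif such as the one described for LiAAT-3 would not be matched by the strict DFGWG pattern, so a relaxed pattern would be needed to detect it.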

Figure 4.1 Multiple sequence alignment of LiAAT-3 and LiAAT-4 with their closest alcohol acetyltransferase homologs. MpAAT1, Malus pumila (apple) alcohol acyltransferase; SAAT, strawberry alcohol acetyltransferase. The black bars indicate the conserved motifs HxxxD and DFGWG.


4.3.3 Heterologous protein expression and enzymatic assay

The coding sequences of the LiAAT-3 and LiAAT-4 candidates were cloned into the pGEX4T-1 vector using the EcoRI and XhoI restriction sites, which allowed for the production of N-terminal GST-tagged recombinant proteins. The GST tag was removed by thrombin digestion, and the purified enzymes were used in enzymatic assays with monoterpenoid alcohols as substrates. Initially, 10 µg of each enzyme was used in a reaction containing 2 mM linalool, geraniol, or lavandulol as substrate and incubated overnight at 30 °C. The reaction was terminated by storing the reaction vials at -80 °C, which also allowed volatile compounds to settle, and assay products were then extracted into 0.5 ml pentane. Assay products were concentrated around 50-fold before analysis by GCMS. LiAAT-3 and LiAAT-4 both produced significant amounts of the respective esters from geraniol and lavandulol, but not from linalool (Figure 4.2). A small amount of linalyl acetate was also present in all assays, including the negative control, indicating that it was most likely a contaminant of the substrate (linalool) preparation (not shown).

Figure 4.2 GCMS analysis of assay products formed in LiAAT-3 and LiAAT-4 enzymatic assays. As an example, the GCMS chromatogram of reaction products formed from lavandulol by LiAAT-4 is shown. The mass spectrum of lavandulyl acetate is also included.

4.3.4 LiAAT kinetic studies

The kinetic properties were determined at pH 8.0, a temperature of 32 °C, an acetyl-CoA concentration of 0.2 mM, and substrate concentrations from 10 µM to 5 mM. Significant substrate inhibition was observed at the 5 mM point, which was therefore excluded from the kinetic analysis reported in Figure 4.3.

The Michaelis–Menten enzyme saturation curve was generated using the hyperbolic enzyme kinetics analysis module of the SigmaPlot software. The Km of LiAAT-3 for geraniol, lavandulol, and nerol was calculated as 1.31, 1.05, and 0.87 mM, respectively. The Vmax of the enzyme for the same substrates was 81.07, 92.13, and 15.02 pkat/mg, respectively (Figure 4.3). The turnover number (kcat) of LiAAT-3 for geraniol, lavandulol, and nerol was 0.245, 0.278, and 0.045 min⁻¹, and the catalytic efficiency (kcat/Km) was 0.186, 0.266, and 0.052 mM⁻¹ min⁻¹, respectively. Kinetic parameters of LiAAT-4 were calculated for the same substrates: Km was 0.279, 0.354, and 0.113 mM, and Vmax was 52.17, 105.1, and 15.8 pkat/mg, respectively (Figure 4.3). The turnover number (kcat) of LiAAT-4 for geraniol, lavandulol, and nerol was 0.151, 0.30, and 0.045 min⁻¹, and the catalytic efficiency (kcat/Km) was 0.539, 0.848, and 0.40 mM⁻¹ min⁻¹, respectively (Table 4.3).


Figure 4.3 Kinetic assays of LiAAT-3 and LiAAT-4. LiAAT-3 with a) geraniol, b) lavandulol, c) nerol, and LiAAT-4 with d) geraniol, e) lavandulol, f) nerol. The substrate concentration is given in µM. Lavandulol was the preferred substrate for both candidates.


Table 4.3: Kinetics data (0.2 mM acetyl-CoA as a substrate).

Substrate    Parameter               LiAAT-3   LiAAT-4
Geraniol     Km (mM)                 1.31      0.279
             Vmax (pkat/mg)          81.07     52.17
             kcat (min⁻¹)            0.245     0.150
             kcat/Km (mM⁻¹ min⁻¹)    0.186     0.539
Lavandulol   Km (mM)                 1.04613   0.354
             Vmax (pkat/mg)          92.13     105.1
             kcat (min⁻¹)            0.278     0.3
             kcat/Km (mM⁻¹ min⁻¹)    0.266     0.848
Nerol        Km (mM)                 0.87      0.113
             Vmax (pkat/mg)          15.02     15.8
             kcat (min⁻¹)            0.045     0.045
             kcat/Km (mM⁻¹ min⁻¹)    0.052     0.4
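The text does not spell out how kcat was derived from Vmax. One reading that reproduces the tabulated values to within rounding is kcat = Vmax × molecular weight, using the predicted molecular weights from Section 4.3.2 (50.27 and 47.57 kDa). The sketch below works through that unit conversion under this assumption; it is an illustration, not a statement of the method used.

```python
# Predicted molecular weights (Da) from Section 4.3.2; using them here is an assumption.
MW_DA = {"LiAAT-3": 50270.0, "LiAAT-4": 47570.0}

# (Km in mM, Vmax in pkat/mg) taken from Table 4.3.
KINETICS = {
    ("LiAAT-3", "geraniol"): (1.31, 81.07),
    ("LiAAT-4", "lavandulol"): (0.354, 105.1),
}


def kcat_per_min(vmax_pkat_per_mg, mw_da):
    """Turnover number in min^-1 from Vmax in pkat per mg of enzyme."""
    mol_product_per_s = vmax_pkat_per_mg * 1e-12  # 1 pkat = 1e-12 mol/s
    mol_enzyme = 1e-3 / mw_da                     # mol of enzyme in 1 mg
    return mol_product_per_s / mol_enzyme * 60.0


def catalytic_efficiency(kcat, km_mM):
    """kcat/Km in mM^-1 min^-1."""
    return kcat / km_mM


results = {}
for (enzyme, substrate), (km, vmax) in KINETICS.items():
    kcat = kcat_per_min(vmax, MW_DA[enzyme])
    results[(enzyme, substrate)] = (kcat, catalytic_efficiency(kcat, km))
    print(f"{enzyme} / {substrate}: kcat = {kcat:.3f} min^-1, "
          f"kcat/Km = {results[(enzyme, substrate)][1]:.3f} mM^-1 min^-1")
```

Small differences in the third decimal place relative to Table 4.3 are expected, since the inputs used here are already rounded.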

4.3.5 Tissue-specific regulation of LiAAT

The expression pattern of LiAAT-3 and LiAAT-4 transcripts in leaves, flowers, and glandular trichomes of L. x intermedia and L. angustifolia, and in flower tissues of L. latifolia, was evaluated by standard PCR and qPCR. For qPCR, transcript levels were quantitated in flowers relative to leaves. The results of the standard PCR assay revealed that transcripts for both candidates were highly concentrated in L. x intermedia and L. angustifolia floral glandular trichomes (Figure 4.4).


Figure 4.4 Detection of transcripts for LiAAT-3 and LiAAT-4 by standard PCR. LI = L. x intermedia, LA = L. angustifolia, LL = L. latifolia; LF = leaf, FL = flower, GL = glandular trichomes. LiAAT-3 and LiAAT-4 transcripts were highly abundant in LA and LI glandular trichomes. β-actin was used as a reference gene.

The results of the qPCR experiment confirmed these findings, demonstrating that LiAAT-3 transcripts were up-regulated by 118- and 5866-fold in flowers and glandular trichomes of L. angustifolia, respectively, and by 1.3- and 49-fold in flowers and glandular trichomes of L. x intermedia, respectively. LiAAT-4 transcripts were up-regulated by 83- and 1601-fold in flowers and glandular trichomes of L. angustifolia, and by 7.5- and 345-fold in flowers and glandular trichomes of L. x intermedia, respectively. LiAAT-3 transcripts were more abundant (3-fold) in L. latifolia flowers compared to leaf tissues, and LiAAT-4 transcripts were not detected at all (Figure 4.5).
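Fold changes of this kind are commonly computed from qPCR threshold cycles (Ct) with the 2^(−ΔΔCt) method, normalizing the target to reference genes and then to a calibrator tissue (here, leaf). The text does not state which calculation was used, so the sketch below is an assumption-laden illustration: it assumes 100% amplification efficiency, averages the Ct values of the two reference genes, and uses invented Ct values rather than data from this study.

```python
from statistics import mean


def relative_expression(target_ct_sample, ref_cts_sample,
                        target_ct_calibrator, ref_cts_calibrator):
    """Fold change of the target in a sample relative to a calibrator (2^-ddCt)."""
    delta_sample = target_ct_sample - mean(ref_cts_sample)
    delta_calibrator = target_ct_calibrator - mean(ref_cts_calibrator)
    return 2.0 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values: target gene plus two reference genes per tissue.
flower = {"target": 24.0, "references": [20.0, 12.0]}
leaf = {"target": 30.0, "references": [20.0, 12.0]}

fold_change = relative_expression(
    flower["target"], flower["references"],
    leaf["target"], leaf["references"],
)
print(f"Flower vs leaf: {fold_change:.0f}-fold")
```

Because each cycle corresponds to a doubling, a six-cycle difference after normalization translates into a 64-fold difference in transcript abundance.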


Figure 4.5 Transcriptional activity of LiAAT candidates in different tissues of lavender species. β-actin and 18S rRNA were used as reference genes. Transcript levels in flower and glandular trichome tissues were normalized against those in leaf tissue of the respective plants (n = 6).

4.3.6 Phylogenetic tree analysis

An unrooted phylogenetic tree was developed for all four candidates to examine their relationship with other known BAHD acyltransferases. The tree shows all five clades of known BAHD acyltransferases (D’Auria, 2006) and a new clade containing one of the lavender acyltransferases (Landmann et al., 2011). LiAAT-3 falls into a subgroup of clade five (V), which consists of enzymes capable of forming the benzenoid ester benzyl benzoate. Benzyl alcohol benzoyl transferase from Clarkia breweri (CbBEBT) is the most closely related enzyme in this subgroup, with 67.34% identity to LiAAT-3. AtCHAT from Arabidopsis thaliana and MpAAT1 from Malus domestica are the two alcohol acetyltransferases in this subgroup, with 60.93% and 60.15% identity, respectively. LiAAT-4 belongs to clade three (III), which has 11 other members that accept a diverse range of alcohols as substrates, with acetyl-CoA as the major acyl donor. There are two subgroups in this clade, and LiAAT-4 belongs to the second subgroup, closest to RsVinS, the vinorine synthase of Rauvolfia serpentina, with only 40% identity. LaAT1, an L. angustifolia acyltransferase clone, shares around 40% sequence identity with all four LiAAT candidates and occupies clade five in the phylogram (Figure 4.6).
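Percent identity values such as those quoted above are computed from pairwise alignments. The sketch below shows one simple definition: identical residues divided by the number of aligned columns in which neither sequence has a gap. Alignment tools differ in the denominator they use, so values from this function need not match those reported by BLASTp or ClustalW. The two aligned fragments are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Hypothetical aligned fragments (not real LiAAT sequences).
fragment_a = "HTLGD-FGWG"
fragment_b = "HSLGDAFGWG"
identity = percent_identity(fragment_a, fragment_b)
print(f"Identity: {identity:.2f}%")
```

In this toy case, nine columns are compared and eight match, so the gapped column does not count against the score.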

Figure 4.6 Phylogenetic tree analysis of BAHD acyltransferases including LiAAT-1 to LiAAT-4. The tree was constructed by neighbor-joining and bootstrap distance analysis (adapted from D'Auria, 2006; Landmann et al., 2011). The candidates included have been characterized by either genetic mutant screening or biochemical assay. AsHHT1, Avena sativa hydroxycinnamoyl-CoA: hydroxyanthranilate-N-hydroxycinnamoyltransferase (BAC78633); AtAT, Arabidopsis thaliana acyltransferase (NP_197782); AtCER2, A. thaliana CER2 protein, biosynthesis of C30 waxes (AAM64817); AtCHAT, A. thaliana (Z)-3-hexen-1-ol-O-acetyltransferase (AAN09797); AtHCT, A. thaliana hydroxycinnamoyl-CoA: shikimate/quinate hydroxycinnamoyltransferase (NP_199704); CaPun1, Capsicum annuum Pun1 protein, biosynthesis of capsaicin (AAV66311); CbBEAT, Clarkia breweri benzyl alcohol-O-acetyltransferase (AAC18062); CbBEBT, C. breweri benzoyl-CoA: benzyl alcohol-O-benzoyltransferase (AAN09796); CmAAT1-4, Cucumis melo alcohol acyltransferase (CAA94432, AAL77060, AAW51125, AAW51126); CrDAT, Catharanthus roseus deacetylvindolin-4-O-acetyltransferase (AAC99311); CrMAT, C. roseus minovincinin-19-hydroxy-O-acetyltransferase (AAO13736); DcHCBT, Dianthus caryophyllus anthranilate-N-hydroxycinnamoyl/benzoyltransferase (CAB06430); Dm3MAT1-2, Dendranthema x morifolium anthocyanidine-3-O-glucosid-6''-O-malonyltransferase (AAQ63615, AAQ63616); Dv3MAT, Dahlia variabilis malonyl-CoA: anthocyanidine-3-O-glucosid-6''-O-malonyltransferase (AAO12206); FaSAAT, Fragaria x ananassa (strawberry) alcohol acyltransferase (AAG13130); FvVAAT, Fragaria vesca alcohol acyltransferase (CAC09062); Gt5AT, Gentiana triflora anthocyanin-5-aromatic acyltransferase (BAA74428); HcACT, Hordeum vulgare agmatine-coumaroyltransferase (AAO73071); LaAT1 and LaAT2, L. angustifolia acyltransferase 1 and 2 (ABI48360 and ABI48361); Lp3MAT1, Lamium purpureum malonyl-CoA: flavonol-3-O-glucosid-6''-O-malonyltransferase (AAS77404); LuaHMT/HLT, Lupinus albus tigloyl-CoA: 13α-hydroxymultiflorin/13α-hydroxylupanin-O-tigloyltransferase (BAD89275); MpAAT1, Malus pumila (apple) alcohol acyltransferase (AAU14879); MsBanAAT, Musa sapientum (banana) alcohol acyltransferase (CAC09063); NtBEBT, Nicotiana tabacum benzoyl-CoA: benzyl alcohol-O-benzoyltransferase (AAN09798); NtHCT, N. tabacum hydroxycinnamoyl-CoA: shikimate/quinate hydroxycinnamoyltransferase (CAD47830); NtHQT, N. tabacum hydroxycinnamoyl-CoA: quinate hydroxycinnamoyltransferase (CAE46932); NtMAT, N. tabacum malonyl-CoA: flavonoid/naphthol-glucosid-acyltransferase (BAD93691); Pf3AT, Perilla frutescens hydroxycinnamoyl-CoA: anthocyanin-3-O-glucosid-6''-O-acyltransferase (BAA93475); Pf5MAT, P. frutescens anthocyanin-5-O-glucosid-6''-O-malonyltransferase (AAL50565); PhAT, Petunia hybrida acyltransferase (BAA93453); PhBPBT, P. hybrida benzoyl-CoA: benzyl alcohol/phenylethanol-benzoyltransferase (AAU06226); PsSaIAT, Papaver somniferum salutaridinol-7-O-acetyltransferase (AAK73661); RhAAT1, Rosa hybrida alcohol acetyltransferase (AAW31948); RsVinS, Rauwolfia serpentina vinorinsynthase (CAD89104); Sc3MaT, Senecio cruentus malonyl-CoA: anthocyanidin-3-O-glucosid-6''-O-malonyltransferase (AAO38058); Ss5MAT1, Salvia splendens malonyl-CoA: anthocyanin-5-O-glucosid-6''-O-malonyltransferase (AAL50566); SsRAS, Solenostemon scutellarioides rosmarinic acid synthase (CAK55166); TcaDBTNBT, Taxus canadensis 3'-N-debenzoyltaxol-N-benzoyltransferase (AAM75818); TcBAPT, Taxus cuspidata baccatin-III-O-phenylpropanoyltransferase (AAL92459); TcDBAT, T. cuspidata 10-deacetylbaccatin-III-10-O-acetyltransferase (AAF27621); TcDBBT, T. cuspidata 2-debenzoyl-7,13-diacetylbaccatin-III-O-benzoyltransferase (Q9FPW3); TcTAT, T. cuspidata taxa-4(20),11(12)-dien-5α-ol-O-acetyltransferase (AAF34254); Vh3MAT1, Verbena x hybrida malonyl-CoA: flavonol-3-O-glucosid-6''-O-malonyltransferase (AAS77404); VIAMAT, Vitis labrusca anthranoyl-CoA: methanol acyltransferase (AAW22989); ZmGlossy2, Zea mays Glossy2 protein, biosynthesis of C32 waxes (CAA61258). Clade VI candidates have not been characterized.


4.4 Discussion

The EOs of most lavender species, in particular those of the commercially propagated L. angustifolia and L. x intermedia plants, contain substantial amounts of the monoterpene alcohols linalool, geraniol, lavandulol, and nerol, and their corresponding esters linalyl acetate, geranyl acetate, lavandulyl acetate, and neryl acetate (Aprotosoaie et al., 2017; Basch et al., 2004; Lawrence, 1990; Lis-Balchin, 2002; Upson, Tim; Andrew, 2004). However, the EOs of a few species, including L. latifolia, do not contain significant amounts of monoterpene acetates/esters (Herraiz-Peñalver et al., 2013; Lawrence, 1990). The monoterpene esters impart pleasant aromas to the EO and are among the key determinants of EO quality. For example, the finest lavender EOs, e.g., that of L. angustifolia, are highly enriched in linalyl acetate and other monoterpene esters and are mostly used in the cosmetic industry or aromatherapy. On the other hand, lavender EOs that lack monoterpene acetates, for example that of L. latifolia, are characterized by strong unpleasant scents (due to the presence of high levels of other compounds such as camphor) and are often used in alternative medicine (Cavanagh & Wilkinson, 2002). Despite the importance of monoterpene acetates, the genes responsible for the production of these metabolites in lavenders have not been described.

In search of alcohol acetyltransferase genes responsible for the synthesis of monoterpene esters in Lavandula, a cDNA library constructed from L. x intermedia floral glandular trichomes (Demissie et al., 2012) was screened for putative acetyltransferase homologs, and a number of cDNA candidates were identified. It is well established that the expression of genes involved in monoterpene biosynthesis is transcriptionally regulated (e.g., menthofuran synthase in peppermint leaves (Mahmoud & Croteau, 2003) and linalool synthase in L. angustifolia flowers (Lane et al., 2010)), and that this transcriptional activity directly correlates with product (monoterpene) formation. Given the above, and given that L. angustifolia and L. x intermedia plants accumulate monoterpene acetates but L. latifolia plants do not, we anticipated that the monoterpene acetyltransferases of interest would be exclusively (or more strongly) expressed in L. angustifolia and L. x intermedia flowers compared to those of L. latifolia plants. We therefore studied the expression pattern of the selected candidates by microarray analysis and PCR in these three closely related species. These experiments led to the identification of two differentially expressed candidates, LiAAT-3 and LiAAT-4, which exhibited strong transcriptional activity in the floral tissue (in particular glandular trichomes) of L. angustifolia and L. x intermedia as compared to L. latifolia plants (Table 4.2, Figure 4.4, Figure 4.5).

The coding sequences for these candidates were cloned into the pGEX4T-1 expression vector and expressed in bacteria to produce the corresponding proteins. The partially purified recombinant proteins were assayed for activity using linalool, geraniol, lavandulol, and nerol as substrates, and acetyl coenzyme A as a co-factor. The assay reactions yielded geranyl acetate, lavandulyl acetate, and neryl acetate, but not linalyl acetate, suggesting that linalyl acetate formation is catalyzed by a different enzyme. In this context, cell-free extracts of glandular trichomes of lemon plants were able to catalyze the conversion of linalool to linalyl acetate, indicating that an enzyme capable of catalyzing this reaction exists (Zaks et al., 2008). Given that linalool is a tertiary alcohol (the other monoterpenes studied here are primary or secondary alcohols), it is possible that the linalool acetyltransferase differs in structure and mechanism of action from the enzymes reported here (Aharoni et al., 2000; Beekwilder et al., 2004; Zaks et al., 2008).


The optimum pH and temperature for LiAAT-3 and LiAAT-4 were determined to be 8.0–8.5 and 30–32 °C, respectively, within the range reported for other alcohol acetyltransferases (Landmann et al., 2011; Sharma et al., 2013). Both enzymes showed linear catalytic activity from 10 min to 120 min. Based on the kinetics data, LiAAT-4 has a much lower Km than LiAAT-3 for all the substrates used in this experiment. However, both enzymes had similar Vmax values for all substrates except geraniol, for which Vmax was higher for LiAAT-3 (81.07 pkat/mg) than for LiAAT-4 (52.17 pkat/mg) (Table 4.3). Therefore, LiAAT-4 had better catalytic efficiency than LiAAT-3, by 2.9-, 3.19-, and 7.7-fold for geraniol, lavandulol, and nerol, respectively. Among these three substrates, lavandulol was preferred over geraniol and nerol: the catalytic efficiency for lavandulol was approximately 1.5-fold higher than that for geraniol for both enzymes. Geraniol was in turn preferred over nerol, as the catalytic efficiencies of LiAAT-3 and LiAAT-4 were 3.1- and 1.35-fold higher, respectively, for geraniol.

The kinetic parameters of LiAAT-3 and LiAAT-4 are in the range of those reported for previously cloned or biochemically characterized alcohol acyltransferases from a number of different plants (Aharoni et al., 2000; Beekwilder et al., 2004; El-Sharkawy et al., 2005; Shalit, 2003). In addition, the enzymatic activities of FvVAAT, CmAAT-4, CbBEBT, and VpAAT1 for geraniol and nerol were reported to be in the range of 10–3451 pkat/mg in the presence of acetyl-CoA (Balbontín et al., 2010; Beekwilder et al., 2004; D’Auria, Chen, & Pichersky, 2002; El-Sharkawy et al., 2005).

LiAAT-3 and LiAAT-4 were found to be associated with clade V and clade III, respectively, in an unrooted phylogenetic analysis with other characterized acyltransferases from different plants (Figure 4.6). Although these proteins contain both conserved motifs found in acyltransferases, the aspartic acid of the DFGWG motif was replaced by glutamic acid (E378) in LiAAT-3. It has been reported that the HxxxD and DFGWG motifs are highly conserved in all acyltransferase proteins, and that alterations of these motifs result in reduced enzyme activity (Bayer et al., 2004; D’Auria, 2006; Suzuki et al., 2002; Unno et al., 2007). This could be the reason for the lower enzymatic activity of LiAAT-3 compared to LiAAT-4. In this context, other poorly active or inactive acyltransferase homologs (e.g., AtCER2 and ZmGlossy2) containing glutamic acid instead of aspartic acid in the DFGWG motif have been previously reported (Tacke et al., 1995; Xia et al., 1996). In addition, a third conserved motif, LSxTLxxxYxxxG (Aharoni et al., 2000; El-Sharkawy et al., 2005; González et al., 2009; Sharma et al., 2013), which is less conserved among acyltransferases, is present in LiAAT-4 with threonine73 replaced by isoleucine73. In LiAAT-3, only leucine74, tyrosine78, and glycine82 of this third motif are present.


5 Chapter: Conclusion

Lavender is an economically important crop cultivated around the world for its valued EO, which is dominated by monoterpenes and their esters. More than 50 terpenoid compounds have been identified from different lavender species, and terpene synthases for most of these terpenes have been identified and functionally characterized. There was a need for a genomic resource to identify terpene synthases that are expressed at low levels and occur in only some species. Similarly, information on the transcription factors that regulate terpene synthesis in lavender is very limited. In this thesis, we have developed a comprehensive transcriptomic database from three economically important lavender species using Illumina-based sequencing. This database was useful in identifying the gene responsible for producing a very rare EO compound, S-linalool, in L. x intermedia flowers. We were also able to identify genes involved in lavender EO biosynthesis and in the regulation of secondary metabolism.

5.1 Lavender transcriptomic database

We have developed an extensive transcriptome database for Lavandula, derived from EO-producing tissues, and used it to conduct in silico transcript profiling (DGE profiles) for three economically important lavender species. The database comprises 101,618 unique contigs, over 75% of which were annotated against known public databases. Most of the genes involved in the MVA and MEP pathways, as well as genes encoding prenyltransferases involved in terpenoid biosynthesis, were identified. A total of 1633 TFs were identified in the Lavandula transcriptome database and classified based on their roles in regulating the expression of genes involved in terpenoid metabolism in plants. These investigations resulted in the identification of novel terpene synthases involved in essential oil biosynthesis, as well as regulatory genes potentially involved in controlling the expression of terpene synthases and other isoprenoid biosynthetic genes. As a proof-of-concept study, we cloned and functionally characterized one of the scarcely expressed terpene synthases, Li(S)-LINS, which is responsible for the production of (S)-linalool in lavenders. Lavender produces very little S-linalool, which is typically masked by R-linalool. The previously generated lavender EST database failed to identify any genes related to S-linalool biosynthesis. The Li(S)-LINS protein exhibits strong homology to Class III TPSs and, unlike most other monoterpene synthases, clusters within the TPS-g subfamily, which lacks the RR(x)8W conserved motif. The protein contains a transit peptide mediating chloroplast targeting, and the corresponding gene is mainly expressed in flowers (Adal et al. 2019).

5.2 Identification of transcription factors regulating LiLINSp & LiCINSp

The biosynthetic pathways of plant secondary metabolites are well known, and most of the genes involved have been identified and characterized to understand plant terpenoid metabolism. However, only a few regulatory elements involved in terpenoid biosynthesis, such as transcription factors and promoters, have been reported, and no such information is available for lavender. An important objective of this thesis was to identify promoters and TFs involved in the regulation of two important terpenes of lavender EO, linalool and cineole. Linalool and cineole are found mostly in flower and leaf EO, respectively, and the specific terpene synthases are spatiotemporally regulated. Using a plant genome walking kit, we identified upstream genomic sequences of 768 bp and 1087 bp for the linalool synthase and cineole synthase genes. Promoter analysis revealed TATA and CAAT regulatory elements along with hundreds of protein binding sites. We also employed the Y1H assay to identify 96 DNA binding proteins and opted to characterize a total of 11 TFs, including TFs from the lavender transcriptomic database, in an N. benthamiana leaf disc assay. We report the identification, cloning, and functional assay of a number of transcription factors that interact with one or both of the LiLINS and LiCINS promoters. Some of these TFs can activate one or both promoters in tobacco leaves, while others cannot. It is important to note that our results are only indications of the functions of these TFs. Additional tests are needed to firmly establish a function for these TFs as activators or repressors. For example, one or more of these TFs can be overexpressed or knocked out in transgenic lavender plants to assess their effects on essential oil (monoterpene) production in planta.

5.3 Lavender acetyltransferases

Terpenoid esters, such as linalyl acetate, lavandulyl acetate, and geranyl acetate, are important components of lavender EO. In this thesis, we have reported the cloning and functional characterization of two alcohol acetyltransferases from L. x intermedia glandular trichomes. Both enzymes are capable of synthesizing geranyl acetate, lavandulyl acetate, and neryl acetate from their respective monoterpene substrates, although both exhibit a preference for lavandulol as a substrate. Neither enzyme was able to convert linalool to linalyl acetate, which is a major EO constituent in some lavender species, indicating that a different acetyltransferase is likely involved in the production of linalyl acetate. The cloning of these acetyltransferases has significant implications for understanding the biology of essential oil formation in higher plants and enables future experiments aimed at improving EO quality and yield in lavenders through metabolic engineering and targeted breeding programs.


References

Adal, A. M. (2019). Development of molecular markers and cloning of genes involved in the biosynthesis of monoterpenes in Lavandula. Adal, A. M., Sarker, L. S., Lemke, A. D., & Mahmoud, S. S. (2017). Isolation and functional characterization of a methyl jasmonate-responsive 3-carene synthase from Lavandula x intermedia. Plant Molecular Biology, 93(6), 641–657. https://doi.org/10.1007/s11103- 017-0588-6 Adaszynska-Skwirzynska, M., Szczerbinska, D., & Zych, S. (2019). Antibacterial activity of lavender essential oil and linalool combined with gentamicin on selected bacterial strains. Medycyna Weterynaryjna, 76(2), 115–118. https://doi.org/10.21521/mw.6279 Aharoni, a, Keizer, L. C. P., Bouwmeester, H. J., Sun, Z., Alvarez-Huerta, M., Verhoeven, H. a, … O’Connell, a P. (2000). Identification of the SAAT gene involved in strawberry flavor biogenesis by use of DNA microarrays. Plant Cell, 12(5), 647–661. https://doi.org/10.2307/3870992 Al-Badani, R. N., Da Silva, J. K. R., Setzer, W. N., Awadh Ali, N. A., Muharam, B. A., & Al-Fahad, A. J. A. (2017). Variations in Essential Oil Compositions of Lavandula pubescens (Lamiaceae) Aerial Parts Growing Wild in Yemen. Chemistry and Biodiversity, 14(3). https://doi.org/10.1002/cbdv.201600286 Alves, M., Dadalto, S., Gonçalves, A., de Souza, G., Barros, V., & Fietto, L. (2014). Transcription Factor Functional Protein-Protein Interactions in Plant Defense Responses. Proteomes, 2(1), 85–106. https://doi.org/10.3390/proteomes2010085 An, G. (1986). Development of Plant Promoter Expression Vectors and Their Use for Analysis of Differential Activity of Nopaline Synthase Promoter in Transformed Tobacco Cells. Plant Physiology, 81(1), 86–91. https://doi.org/10.1104/pp.81.1.86 Annadurai, R. S., Jayakumar, V., Mugasimangalam, R. C., Katta, M. A., Anand, S., Gopinathan, S., … Rao, S. N. (2012). Next generation sequencing and de novo transcriptome analysis of Costus pictus D. Don, a non-model plant with potent anti- diabetic properties. 
BMC Bioinformatics, 13(663), 1–15. https://doi.org/10.1186/1471- 2164-13-663 99

Aprotosoaie, A. C., Gille, E., Trifan, A., Luca, V. S., & Miron, A. (2017). Essential oils of Lavandula genus: a systematic review of their chemistry. Phytochemistry Reviews, 16(4), 761–799. https://doi.org/10.1007/s11101-017-9517-1 Arigoni, D., Sagner, S., Latzel, C., Eisenreich, W., Bacher, A., & Zenk, M. H. (1997). Terpenoid biosynthesis from 1-deoxy-D-xylulose in higher plants by intramolecular skeletal rearrangement. Proceedings of the National Academy of Sciences of the United States of America, 94(20), 10600–10605. https://doi.org/10.1073/pnas.94.20.10600 Ascensão, L., Mota, L., & Castro, M. (1999). Glandular Trichomes on the Leaves and Flowers of Plectranthus ornatus: Morphology, Distribution and {…}. Annals of Botany, 84, 437–447. https://doi.org/10.1006/anbo.1999.0937 Balbontín, C., Gaete-Eastman, C., Fuentes, L., Figueroa, C. R., Herrera, R., Manriquez, D., … Moya-León, M. A. (2010). VpAAT1, a gene encoding an alcohol acyltransferase, is involved in ester biosynthesis during ripening of mountain papaya fruit. Journal of Agricultural and Food Chemistry, 58(8), 5114–5121. https://doi.org/10.1021/jf904296c Basch, E., Foppa, I., Liebowitz, R., Nelson, J., Smith, M., Sollars, D., & Ulbricht, C. (2004). Lavender (Lavandula angustifolia Miller). In Journal of Herbal Pharmacotherapy. https://doi.org/10.1300/J157v04n02_07 Bayer, A., Ma, X., & Stöckigt, J. (2004). Acetyltransfer in natural product biosynthesis - Functional cloning and molecular analysis of vinorine synthase. Bioorganic and Medicinal Chemistry, 12(10), 2787–2795. https://doi.org/10.1016/j.bmc.2004.02.029 Beekwilder, J., Alvarez-Huerta, M., Neef, E., Verstappen, F. W. A., Bouwmeester, H. J., & Aharoni, A. (2004). Functional characterization of enzymes forming volatile esters from strawberry and banana. Plant Physiology, 135(4), 1865–1878. https://doi.org/10.1104/pp.104.042580 Bick, J. A., & Lange, B. M. (2003). 
Metabolic cross talk between cytosolic and plastidial pathways of isoprenoid biosynthesis: Unidirectional transport of intermediates across the chloroplast envelope membrane. Archives of Biochemistry and Biophysics, 415(2), 146–154. https://doi.org/10.1016/S0003-9861(03)00233-9 Boeckelmann, A. (2008). Monoterpene production and regulation in Lavenders ( Lavandula angustifolia and Lavandula x intermedia ) by, (July), 96. Bohlmann, J, Meyer-Gauen, G., & Croteau, R. (1998). Plant terpenoid synthases: molecular 100

biology and phylogenetic analysis. Proceedings of the National Academy of Sciences of the United States of America, 95(8), 4126–4133. https://doi.org/10.1073/pnas.95.8.4126 Bohlmann, Jörg, Steele, C. L., & Croteau, R. (1997). Monoterpene synthases from grand fir (Abies grandis): cDNA isolation, characterization, and functional expression of myrcene synthase, (-)-(4S)- limonene synthase, and (-)-(1S,5S)-pinene synthase. Journal of Biological Chemistry, 272(35), 21784–21792. https://doi.org/10.1074/jbc.272.35.21784 Braich, S., Baillie, R. C., Jewell, L. S., Spangenberg, G. C., & Cogan, N. O. I. (2019). Generation of a Comprehensive Transcriptome Atlas and Transcriptome Dynamics in Medicinal Cannabis. Scientific Reports, 9(16583), 1–12. https://doi.org/10.1038/s41598- 019-53023-6 Cane, D. E., Xue, Q., & Fitzsimons, B. C. (1996). Trichodiene synthase. Probing the role of the highly conserved aspartate-rich region by site-directed mutagenesis. Biochemistry, 35(38), 12369–12376. https://doi.org/10.1021/bi961344y Cane, D. E., Xue, Q., Van Epp, J. E., & Tsantrizos, Y. S. (1996). Enzymatic formation of isochamigrene, a novel sesquiterpene, by alteration of the aspartate-rich region of trichodiene synthase. Journal of the American Chemical Society, 118(35), 8499–8500. https://doi.org/10.1021/ja961897w Cathey, K., Gunyon, N., Chung, N., Conway, N., Ames, D., Singh, M., … Rovin, R. A. (2020). A Feasibility Study of Lavender Aromatherapy in an Awake Craniotomy Environment. Journal of Patient-Centered Research and Reviews. https://doi.org/10.17294/2330-0698.1716 Cavanagh, H. M. A., & Wilkinson, J. M. (2002). Biological activities of lavender essential oil. Phytotherapy Research, 16(4), 301–308. https://doi.org/10.1002/ptr.1103 Chappell, J., Wolf, F., Proulx, J., Cuellar, R., & Saunders, C. (1995). Is the Reaction Catalyzed by 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase a Rate-Limiting Step for Isoprenoid Biosynthesis in Plants? Plant Physiology, 109(4), 1337–1343. 
https://doi.org/10.1104/pp.109.4.1337 Chen, Minghui; Yan, T. (2017). Glandular trichome specific WRKY1 promotes artemisinin biosynthesis. New Phytologist. Chen, F., Tholl, D., Bohlmann, J., & Pichersky, E. (2011). The family of terpene synthases in plants: A mid-size family of genes for specialized metabolism that is highly diversified 101

throughout the kingdom. Plant Journal, 66(1), 212–229. https://doi.org/10.1111/j.1365- 313X.2011.04520.x Chen, M., Yan, T., Shen, Q., Lu, X., Pan, Q., Huang, Y., … Tang, K. (2017). GLANDULAR TRICHOME-SPECIFIC WRKY 1 promotes artemisinin biosynthesis in Artemisia annua. New Phytologist. https://doi.org/10.1111/nph.14373 Chevalier, F., Perazza, D., Laporte, F., Le Hénanff, G., Hornitschek, P., Bonneville, J. M., … Vachon, G. (2008). GeBP and GeBP-like proteins are noncanonical leucine-zipper transcription factors that regulate cytokinin response in arabidopsis. Plant Physiology, 146(3), 1142–1154. https://doi.org/10.1104/pp.107.110270 Chinnusamy, V., & Zhu, J. (2009). Epigenetic regulation of stress responses in plants. Curr Opin Plant Biol, 12(2), 133–139. Christianson, D. W. (2006). Structural Biology and Chemistry of the Terpenoid Cyclases Structural Biology and Chemistry of the Terpenoid Cyclases. Chemical reviews (Vol. 106). https://doi.org/10.1021/cr050286w Chu, C. (2005). Lavender (Lavandula spp.). Task Force: Http://Www. Mep. Edu, 1–32. Retrieved from http://www.greenpeoplescentfree.com/GP_Documents/Lavandula _spp_monograph.pdf Croteau, R., Kutchan, T. M., & Lewis, N. G. (2000). Natural products (Secondary Metatolites). In Biochemistry & Molecular Biology of Plants (pp. 1250–1318). D’Auria, J. C. (2006). Acyltransferases in plants: a good time to be BAHD. Current Opinion in Plant Biology, 9(3), 331–340. https://doi.org/10.1016/j.pbi.2006.03.016 D’Auria, J. C., Chen, F., & Pichersky, E. (2002). Characterization of an acyltransferase capable of synthesizing benzylbenzoate and other volatile esters in flowers and damaged leaves of Clarkia breweri. Plant Physiology, 130(1), 466–476. https://doi.org/10.1104/pp.006460 D’Auria, J. C., Pichersky, E., Schaub, A., Hansel, A., & Gershenzon, J. (2007). Characterization of a BAHD acyltransferase responsible for producing the green leaf volatile (Z)-3-hexen-1-yl acetate in Arabidopsis thaliana. 
Plant Journal, 49(2), 194– 207. https://doi.org/10.1111/j.1365-313X.2006.02946.x Degenhardt, J., Köllner, T. G., & Gershenzon, J. (2009). Monoterpene and sesquiterpene synthases and the origin of terpene skeletal diversity in plants. Phytochemistry, 70(15– 102

16), 1621–1637. https://doi.org/10.1016/j.phytochem.2009.07.030 Demissie, Z. A., Cella, M. A., Sarker, L. S., Thompson, T. J., Rheault, M. R., & Mahmoud, S. S. (2012). Cloning, functional characterization and genomic organization of 1,8- cineole synthases from Lavandula. Plant Molecular Biology, 79(4–5), 393–411. https://doi.org/10.1007/s11103-012-9920-3 Demissie, Z. A., Erland, L. A. E., Rheault, M. R., & Mahmoud, S. S. (2013). The biosynthetic origin of irregular monoterpenes in lavandula: Isolation and biochemical characterization of a novel cis-prenyl diphosphate synthase gene, lavandulyl diphosphate synthase. Journal of Biological Chemistry, 288(9), 6333–6341. https://doi.org/10.1074/jbc.M112.431171 Demissie, Z. a, Sarker, L. S., & Mahmoud, S. S. (2011). Cloning and functional characterization of β-phellandrene synthase from Lavandula angustifolia. Planta, 233(4), 685–696. https://doi.org/10.1007/s00425-010-1332-5 Despinasse, Y., Fiorucci, S., Antonczak, S., Moja, S., Bony, A., Nicolè, F., … Jullien, F. (2017). Bornyl-diphosphate synthase from Lavandula angustifolia: A major monoterpene synthase involved in essential oil quality. Phytochemistry, 137, 24–33. https://doi.org/10.1016/j.phytochem.2017.01.015 Dewick, P. M. (2002). The biosynthesis of C5-C25 terpenoid compounds. Natural Product Reports, 19(2), 181–222. https://doi.org/10.1039/np9971400111 Dhandapani, S., Jin, J., Sridhar, V., Sarojam, R., Chua, N.-H., & Jang, I.-C. (2017). Integrated metabolome and transcriptome analysis of Magnolia champaca identifies biosynthetic pathways for floral volatile organic compounds. BMC Genomics, 18(1), 463. https://doi.org/10.1186/s12864-017-3846-8 Dietz, K. J., Vogel, M. O., & Viehhauser, A. (2010). AP2/EREBP transcription factors are part of gene regulatory networks and integrate metabolic, hormonal and environmental signals in stress acclimation and retrograde signalling. Protoplasma, 245(1), 3–14. https://doi.org/10.1007/s00709-010-0142-8 Dobi, K. C., & Winston, F. 
(2007). Analysis of Transcriptional Activation at a Distance in Saccharomyces cerevisiae. Molecular and Cellular Biology, 27(15), 5575–5586. https://doi.org/10.1128/mcb.00459-07 Dubos, C., Stracke, R., Grotewold, E., Weisshaar, B., Martin, C., & Lepiniec, L. (2010, 103

October). MYB transcription factors in Arabidopsis. Trends in Plant Science. https://doi.org/10.1016/j.tplants.2010.06.005 Dudareva, N., Martin, D., Kish, C. M., Kolosova, N., Gorenstein, N., Fäldt, J., … Bohlmann, J. (2003). (E)-beta-ocimene and myrcene synthase genes of floral scent biosynthesis in snapdragon: function and expression of three terpene synthase genes of a new terpene synthase subfamily. The Plant Cell, 15(5), 1227–1241. https://doi.org/10.1105/tpc.011015 El-Sharkawy, I., Manríquez, D., Flores, F. B., Regad, F., Bouzayen, M., Latché, A., & Pech, J. C. (2005). Functional characterization of a melon alcohol acyl-transferase gene family involved in the biosynthesis of ester volatiles. Identification of the crucial role of a threonine residue for enzyme activity. Plant Molecular Biology, 59(2), 345–362. https://doi.org/10.1007/s11103-005-8884-y Ennajdaoui, H., Vachon, G., Giacalone, C., Besse, I., Sallaud, C., Herzog, M., & Tissier, A. (2010). Trichome specific expression of the tobacco (Nicotiana sylvestris) cembratrien- ol synthase genes is controlled by both activating and repressing cis-regions. Plant Molecular Biology, 73(6), 673–685. https://doi.org/10.1007/s11103-010-9648-x Fahn, A. (1988). Tansley Review No. 14 Secretory Tissues in Vascular Plants. New Phytologist, 108(14), 229–257. https://doi.org/10.1111/j.1469-8137.1988.tb03729.x Fäldt, J., Arimura, G. I., Gershenzon, J., Takabayashi, J., & Bohlmann, J. (2003). Functional identification of AtTPS03 as (E)-β-ocimene synthase: A monoterpene synthase catalyzing jasmonate- and wound-induced volatile formation in Arabidopsis thaliana. Planta, 216(5), 745–751. https://doi.org/10.1007/s00425-002-0924-0 Farmer, E. E., Alméras, E., & Krishnamurthy, V. (2003). Jasmonates and related oxylipins in plant responses to pathogenesis and herbivory. Current Opinion in Plant Biology. Elsevier Ltd. https://doi.org/10.1016/S1369-5266(03)00045-1 Fridman, E. (2005). 
Metabolic, Genomic, and Biochemical Analyses of Glandular Trichomes from the Wild Tomato Species Lycopersicon hirsutum Identify a Key Enzyme in the Biosynthesis of Methylketones. The Plant Cell Online, 17(4), 1252–1267. https://doi.org/10.1105/tpc.104.029736 Fu, L., Niu, B., Zhu, Z., Wu, S., & Li, W. (2012). CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics, 28(23), 3150–3152. 104

https://doi.org/10.1093/bioinformatics/bts565 Galata, M., Sarker, L. S., & Mahmoud, S. S. (2014). Transcriptome profiling, and cloning and characterization of the main monoterpene synthases of Coriandrum sativum L. Phytochemistry, 102, 64–73. https://doi.org/10.1016/j.phytochem.2014.02.016 Garvey, G. S., McCormick, S. P., Alexander, N. J., & Rayment, I. (2009). Structural and functional characterization of TRI3 trichothecene 15-O-acetyltransferase from Fusarium sporotrichioides. Protein Science : A Publication of the Protein Society, 18(4), 747–761. https://doi.org/10.1002/pro.80 Gershenzon, J., McConkey, M. E., & Croteau, R. B. (2000). Regulation of Monoterpene Accumulation in Leaves of Peppermint. Plant Physiology, 122(1), 205–214. https://doi.org/10.1104/pp.122.1.205 González, M., Gaete-Eastman, C., Valdenegro, M., Figueroa, C. R., Fuentes, L., Herrera, R., & Moya-León, M. A. (2009). Aroma development during ripening of Fragaria chiloensis fruit and participation of an alcohol acyltransferase (FcAAT1) gene. Journal of Agricultural and Food Chemistry, 57(19), 9123–9132. https://doi.org/10.1021/jf901693j Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., … Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology. https://doi.org/10.1038/nbt.1883 Grotewold, E. (2008, April). Transcription factors for predictive plant metabolic engineering: are we there yet? Current Opinion in Biotechnology. https://doi.org/10.1016/j.copbio.2008.02.002 Han, R., Rai, A., Nakamura, M., Suzuki, H., Takahashi, H., Yamazaki, M., & Saito, K. (2016). De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products. In Methods in Enzymology (pp. 19–45). https://doi.org/10.1016/bs.mie.2016.03.001 Han, Y., Gao, S., Muegge, K., Zhang, W., & Zhou, B. (2015). Advanced applications of RNA sequencing and challenges. 
Bioinformatics and Biology Insights, 9, 29–46. https://doi.org/10.4137/BBI.S28991 Heim, M. A., Jakoby, M., Werber, M., Martin, C., Weisshaar, B., & Bailey, P. C. (2003). The basic helix-loop-helix transcription factor family in plants: A genome-wide study of 105

protein structure and functional diversity. Molecular Biology and Evolution, 20(5), 735– 747. https://doi.org/10.1093/molbev/msg088 Herraiz-Peñalver, D., Cases, M. Á., Varela, F., Navarrete, P., Sánchez-Vioque, R., & Usano- Alemany, J. (2013). Chemical characterization of Lavandula latifolia Medik. essential oil from Spanish wild populations. Biochemical Systematics and Ecology, 46, 59–68. https://doi.org/10.1016/j.bse.2012.09.018 Hong, G.-J., Xue, X.-Y., Mao, Y.-B., Wang, L.-J., & Chen, X.-Y. (2012a). Arabidopsis MYC2 interacts with DELLA proteins in regulating sesquiterpene synthase gene expression. The Plant Cell, 24(6), 2635–2648. https://doi.org/10.1105/tpc.112.098749 Hong, G.-J., Xue, X.-Y., Mao, Y.-B., Wang, L.-J., & Chen, X.-Y. (2012b). Arabidopsis MYC2 Interacts with DELLA Proteins in Regulating Sesquiterpene Synthase Gene Expression. The Plant Cell. https://doi.org/10.1105/tpc.112.098749 Hrdlickova, R., Toloue, M., & Tian, B. (2017). RNA-Seq methods for transcriptome analysis. Wiley Interdisciplinary Reviews: RNA, 8(1). https://doi.org/10.1002/wrna.1364 Huang, M., Abel, C., Sohrabi, R., Petri, J., Haupt, I., Cosimano, J., … Tholl, D. (2010). Variation of herbivore-induced volatile terpenes among arabidopsis ecotypes depends on allelic differences and subcellular targeting of two terpene synthases, TPS02 and TPS03. Plant Physiology, 153(3), 1293–1310. https://doi.org/10.1104/pp.110.154864 Huber, S. C., & Hardin, S. C. (2004, June). Numerous posttranslational modifications provide opportunities for the intricate regulation of metabolic enzymes at multiple levels. Current Opinion in Plant Biology. https://doi.org/10.1016/j.pbi.2004.03.002 Hyun, T. K., Rim, Y., Jang, H. J., Kim, C. H., Park, J., Kumar, R., … Kim, J. Y. (2012). De novo transcriptome sequencing of Momordica cochinchinensis to identify genes involved in the biosynthesis. Plant Molecular Biology, 79(4–5), 413–427. https://doi.org/10.1007/s11103-012-9919-9 Iwase, A., Matsui, K., & Ohme-Takagi, M. 
(2009). Manipulation of plant metabolic pathways by transcription factors. Plant Biotechnology, 26, 29–38. Retrieved from http://www.jspcmb.jp/ Jefferson, R. A., Kavanagh, T. A., & Bevan, M. W. (1987). GUS fusions: beta-glucuronidase as a sensitive and versatile gene fusion marker in higher plants. EMBO Journal, 6(13), 3901–3907. 106

Jullien, F., Moja, S., Bony, A., Legrand, S., Petit, C., Benabdelkader, T., … Magnard, J. L. (2014). Isolation and functional characterization of a τ-cadinol synthase, a new sesquiterpene synthase from Lavandula angustifolia. Plant Molecular Biology, 84(1–2), 227–241. https://doi.org/10.1007/s11103-013-0131-3 Jyothishwaran, G., Kotresha, D., Selvaraj, T., Srideshikan, S. M., Rajvanshi, P. K., & Jayabaskaran, C. (2007). A modified freeze-thaw method for efficient transformation of Agrobacterium tumefaciens [10]. Current Science, 93(6), 770–772. Kallberg, Y., Oppermann, U., Jörnvall, H., & Persson, B. (2002). Short-chain dehydrogenases/reductases (SDRs). Coenzyme-based functional assignments in completed genomes. European Journal of Biochemistry, 269(18), 4409–4417. https://doi.org/10.1046/j.1432-1033.2002.03130.x Kallberg, Y., Oppermann, U., & Persson, B. (2010). Classification of the short-chain dehydrogenase/reductase superfamily using hidden Markov models. FEBS Journal, 277(10), 2375–2386. https://doi.org/10.1111/j.1742-4658.2010.07656.x Kavanagh, K. L., Jörnvall, H., Persson, B., & Oppermann, U. (2008). Medium- and short- chain dehydrogenase/reductase gene and protein families: The SDR superfamily: Functional and structural diversity within a family of metabolic and regulatory enzymes. Cellular and Molecular Life Sciences, 65(24), 3895–3906. https://doi.org/10.1007/s00018-008-8588-y Landi, Lucia; Angelini, Rita M. De Miccolis; Pollastro, Stefania; Feliziani, Erica; Faretra, Franco; Romanazzi, G. (2017). Global transcriptome analysis and identification of differentially expressed genes in Strawberry after preharvest application of benzothiadiazole and chitosan. Frontiers in Plant Science, 8, 1–22. https://doi.org/10.3389/fpls.2017.00235 Landmann, C., Fink, B., Festner, M., Dregus, M., Engel, K. H., & Schwab, W. (2007). Cloning and functional characterization of three terpene synthases from lavender (Lavandula angustifolia). 
Archives of Biochemistry and Biophysics, 465(2), 417–429. https://doi.org/10.1016/j.abb.2007.06.011 Landmann, C., Hücherig, S., Fink, B., Hoffmann, T., Dittlein, D., Coiner, H. A., & Schwab, W. (2011). Substrate promiscuity of a rosmarinic acid synthase from lavender (Lavandula angustifolia L.). Planta, 234(2), 305–320. https://doi.org/10.1007/s00425- 107

011-1400-5 Lane, A., Boecklemann, A., Woronuk, G. N., Sarker, L., & Mahmoud, S. S. (2010). A genomics resource for investigating regulation of essential oil production in Lavandula angustifolia. Planta, 231(4), 835–845. https://doi.org/10.1007/s00425-009-1090-4 Laule, O., Fürholz, A., Chang, H.-S., Zhu, T., Wang, X., Heifetz, P. B., … Lange, M. (2003). Crosstalk between cytosolic and plastidial pathways of isoprenoid biosynthesis in Arabidopsis thaliana. Proceedings of the National Academy of Sciences of the United States of America, 100(11), 6866–6871. https://doi.org/10.1073/pnas.1031755100 Lawrence, B. (1990). Progress in Essential Oils. Perfume Flavor, 15, 57–60. Lee, T. I., & Young, R. A. (2000). Transcription of Eukaryotic Protein-Coding Genes. Annual Review of Genetics, 34(1), 77–137. https://doi.org/10.1146/annurev.genet.34.1.77 Lei, W., Yao, R. X., Kang, X. H., Tang, S. H., Qiao, A. M., & Sun, M. (2011). Isolation and characterization of the anthocyanidin genes PAL, F3H and DFR of Scutellaria viscidula (Lamiaceae). Genetics and Molecular Research : GMR, 10(4), 3385–3402. https://doi.org/10.4238/2011.November.22.7 Li, Xiang; Xu, Yaying; Shen, Shuling; Yin, Xueren; Klee, Harry; Zhang, Bo; Chen, K. (2017). Transcription factor CitERF71 activates the terpene synthase gene CitTPS16 involved in the synthesis of E-geraniol in sweet organge fruit. Journal of Experimental Biology, 1–10. https://doi.org/10.1093/jxb/erx316 Lis-Balchin, M. (2002). Chemical composition of essential oils from different species, hybrids and cultivars of Lavandula. In Lavender: the genus Lavandula (pp. 251–262). Taylor & Francis, London. Little, D. B., & Croteau, R. B. (2002). Alteration of product formation by directed mutagenesis and truncation of the multiple-product sesquiterpene synthases ??-selinene synthase and ??-humulene synthase. Archives of Biochemistry and Biophysics, 402(1), 120–135. https://doi.org/10.1016/S0003-9861(02)00068-1 Liu, Yan, Wang, H., Ye, H. C., & Li, G. F. 
(2005). Advances in the plant isoprenoid biosynthesis pathway and its metabolic engineering. Journal of Integrative Plant Biology, 47(7), 769–782. https://doi.org/10.1111/j.1744-7909.2005.00111.x Liu, Yue, Wang, Y., Guo, F., Zhan, L., Mohr, T., Cheng, P., … Gu, Y. Q. (2017). Deep 108

sequencing and transcriptome analyses to identify genes involved in secoiridoid biosynthesis in the Tibetan medicinal plant Swertia mussotii. Scientific Reports. https://doi.org/10.1038/srep43108 Lloyd, A., Brockman, A., Aguirre, L., Campbell, A., Bean, A., Cantero, A., & Gonzalez, A. (2017). Advances in the MYB-bHLH-WD Repeat (MBW) pigment regulatory model: Addition of a WRKY factor and co-option of an anthocyanin MYB for betalain regulation. Plant and Cell Physiology, 58(9), 1431–1441. https://doi.org/10.1093/pcp/pcx075 Lou, Q., Liu, Y., Qi, Y., Jiao, S., Tian, F., Jiang, L., & Wang, Y. (2014). Transcriptome sequencing and metabolite analysis reveals the role of delphinidin metabolism in flower colour in grape hyacinth. Journal of Experimental Botany, 65(12), 3157–3164. https://doi.org/10.1093/jxb/eru168 Lowe, R., Shirley, N., Bleackley, M., Dolan, S., & Shafee, T. (2017). Transcriptomics technologies. PLoS Computational Biology, 13(5), 1–23. https://doi.org/10.1371/journal.pcbi.1005457 Lu, X., Zhang, L., Zhang, F., Jiang, W., Shen, Q., Zhang, L., … Wang, G. (2013). AaORA , a trichome-specific AP2 / ERF transcription factor of Artemisia annua , is a positive regulator in the artemisinin biosynthetic pathway and in disease resistance to Botrytis cinerea, 2, 1191–1202. Mahizan, N. A., Yang, S., Moo, C.-L., & Song, A. A.-L. (2019). Terpene Derivatives as a Potential Agent against. Molecules, 24(2631), 1–21. Mahmoud, S. S., & Croteau, R. B. (2002). Strategies for transgenic manipulation of monoterpene biosynthesis in plants. Trends in Plant Science, 7(8), 366–373. https://doi.org/10.1016/S1360-1385(02)02303-8 Mahmoud, S. S., & Croteau, R. B. (2003). Menthofuran regulates essential oil biosynthesis in peppermint by controlling a downstream monoterpene reductase. Proceedings of the National Academy of Sciences, 100(24), 14481–14486. https://doi.org/10.1073/pnas.2436325100 Malli, R. P. N., Adal, A. M., Sarker, L. S., Liang, P., & Mahmoud, S. S. (2019). 
De novo sequencing of the Lavandula angustifolia genome reveals highly duplicated and optimized features for essential oil production. Planta, 249(1), 251–256. 109

https://doi.org/10.1007/s00425-018-3012-9 Martin, C., & Paz-Ares, J. (1997, February). MYB transcription factors in plants. Trends in Genetics. https://doi.org/10.1016/S0168-9525(96)10049-4 McGarvey, D. J., & Croteau, R. (1995). Terpenoid metabolism. Plant Cell, 7(7), 1015–1026. https://doi.org/10.1105/tpc.7.7.1015 Memelink, J., & Gantet, P. (2007). Transcription factors involved in terpenoid indole alkaloid biosynthesis in Catharanthus roseus. Phytochemistry Reviews, 6(2–3), 353–362. https://doi.org/10.1007/s11101-006-9051-z Millard, P. S., Weber, K., Kragelund, B. B., & Burow, M. (2019). Specificity of MYB interactions relies on motifs in ordered and disordered contexts. Nucleic Acids Research, 47(18), 9592–9608. https://doi.org/10.1093/nar/gkz691 Miller, B., Madilao, L. L., Ralph, S., & Bohlmann, J. (2005). Insect-induced conifer defense. White pine weevil and methyl jasmonate induce traumatic resinosis, de novo formed volatile emissions, and accumulation of terpenoid synthase and putative octadecanoid pathway transcripts in sitka spruce. Plant Physiology, 137(1), 369–382. https://doi.org/10.1104/pp.104.050187 Misawa, N. (2011). Pathway engineering for functional isoprenoids. Current Opinion in Biotechnology, 22, 627–633. https://doi.org/10.1016/j.copbio.2011.01.002 Moummou, H., Kallberg, Y., Tonfack, L. B., Persson, B., & van der Rest, B. (2012). The plant short-chain dehydrogenase (SDR) superfamily: genome-wide inventory and diversification patterns. BMC Plant Biology, 12, 219. https://doi.org/10.1186/1471- 2229-12-219 Navia-Giné, W. G., Yuan, J. S., Mauromoustakos, A., Murphy, J. B., Chen, F., & Korth, K. L. (2009). Medicago truncatula (E)-β-ocimene synthase is induced by insect herbivory with corresponding increases in emission of volatile ocimene. Plant Physiology and Biochemistry, 47(5), 416–425. 
https://doi.org/10.1016/j.plaphy.2009.01.008 Nieuwenhuizen, Neils J.; Chen, Xiuyin; Wang, Mindy Y.; Matich, Adam J.; Perez, Ramon L.; Allan, Andrew C.; Green, Sol A.; Atkinson, R. G. (2015). Natural variation in mTPS in kiwifruit: transcriptional regulation of terpene synthases by NAC and Ethylene insensitive3 like transcription factors. Plant Physiology, 167, 1243–1258. Okamoto, S., Yu, F., Harada, H., Okajima, T., Hattan, J. I., Misawa, N., & Utsumi, R. 110

(2011). A short-chain dehydrogenase involved in terpene metabolism from Zingiber zerumbet. FEBS Journal, 278(16), 2892–2900. https://doi.org/10.1111/j.1742- 4658.2011.08211.x Patra, B., Schluttenhofer, C., Wu, Y., Pattanaik, S., & Yuan, L. (2013). Transcriptional regulation of secondary metabolite biosynthesis in plants. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms, 1829(11), 1236–1247. https://doi.org/10.1016/j.bbagrm.2013.09.006 Paz-Ares, J., Ghosal, D., Wienand, U., Peterson, P. A., & Saedler, H. (1987). The regulatory c1 locus of Zea mays encodes a protein with homology to proto-oncogene products and with structural similarities to transcriptional activators. The EMBO Journal, 6(12), 3553–3558. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/3428265 Persson, B., Hedlund, J., & Jörnvall, H. (2008). Medium- and short-chain dehydrogenase/reductase gene and protein families: The MDR superfamily. Cellular and Molecular Life Sciences, 65(24), 3879–3894. https://doi.org/10.1007/s00018-008- 8587-z Pieterse, C. M. J., Leon-Reyes, A., Van Der Ent, S., & Van Wees, S. C. M. (2009). Networking by small-molecule hormones in plant immunity. Nature Chemical Biology. Nature Publishing Group. https://doi.org/10.1038/nchembio.164 Prasath, D., Karthika, R., Habeeba, N. T., Suraby, E. J., Rosana, O. B., Shaji, A., … Gibas, C. (2014). Comparison of the Transcriptomes of Ginger (Zingiber officinale Rosc.) and Mango Ginger (Curcuma amada Roxb.) in Response to the Bacterial Wilt Infection. https://doi.org/10.1371/journal.pone.0099731 Reddy, V. A., Wang, Q., Dhar, N., Kumar, N., Venkatesh, P. N., Rajan, C., … Sarojam, R. (2017). Spearmint R2R3-MYB transcription factor MsMYB negatively regulates monoterpene production and suppresses the expression of geranyl diphosphate synthase large subunit (MsGPPS.LSU). Plant Biotechnology Journal, 15(9), 1105–1119. https://doi.org/10.1111/pbi.12701 Riechmann, J. 
L., Heard, J., Martin, G., Reuber, L., Jiang, C.-Z., Keddie, J., … Yu, G.-L. (2000). Arabidopsis Transcription Factors: Genome-Wide Comparative Analysis Among Eukaryotes. Science, 290, 2105–2110. Retrieved from http://science.sciencemag.org/content/sci/290/5499/2105.full.pdf 111

Rodríguez-Concepción, M., & Boronat, A. (2015, June 1). Breaking new ground in the regulation of the early steps of plant isoprenoid biosynthesis. Current Opinion in Plant Biology. Elsevier Ltd. https://doi.org/10.1016/j.pbi.2015.04.001 Romanel, E. a. C., Schrago, C. G., Couñago, R. M., Russo, C. a. M., & Alves-Ferreira, M. (2009). Evolution of the B3 DNA binding superfamily: New insights into REM family gene diversification. PLoS ONE, 4(6). https://doi.org/10.1371/journal.pone.0005791 Ruijter, J. M., Ramakers, C., Hoogaars, W. M. H., Karlen, Y., Bakker, O., Van Den Hoff, M. J. B., & Moorman, a. F. M. (2009). Amplification efficiency: Linking baseline and bias in the analysis of quantitative PCR data. Nucleic Acids Research, 37(6). https://doi.org/10.1093/nar/gkp045 Rushton, P. J., Bokowiec, M. T., Han, S., Zhang, H., Brannock, J. F., Chen, X., … Timko, M. P. (2008). Tobacco transcription factors: Novel insights into transcriptional regulation in the Solanaceae. Plant Physiology, 147(1), 280–295. https://doi.org/10.1104/pp.107.114041 Sarker, L. S. (2013). Cloning of Lavandula Essential Oil Biosynthetic Genes. Sarker, L. S., Demissie, Z. A., & Mahmoud, S. S. (2013). Cloning of a sesquiterpene synthase from Lavandula x intermedia glandular trichomes. Planta, 238(5), 983–989. https://doi.org/10.1007/s00425-013-1937-6 Sarker, L. S., Galata, M., Demissie, Z. A., & Mahmoud, S. S. (2012). Molecular cloning and functional characterization of borneol dehydrogenase from the glandular trichomes of Lavandula x intermedia. Archives of Biochemistry and Biophysics, 528(2), 163–170. https://doi.org/10.1016/j.abb.2012.09.013 Sarker, L. S., & Mahmoud, S. S. (2015). Cloning and functional characterization of two monoterpene acetyltransferases from glandular trichomes of L. x intermedia. Planta. https://doi.org/10.1007/s00425-015-2325-1 Seemann, M., Zhai, G., de Kraker, J.-W., Paschall, C. M., Christianson, D. W., & Cane, D. E. (2002). Pentalenene synthase. 
Analysis of active site residues by site-directed mutagenesis. Journal of the American Chemical Society, 124(26), 7681–7689. https://doi.org/10.1021/ja026058q

Shalit, M. (2003). Volatile Ester Formation in Roses. Identification of an Acetyl-Coenzyme A:Geraniol/Citronellol Acetyltransferase in Developing Rose Petals. Plant Physiology, 131(4), 1868–1876. https://doi.org/10.1104/pp.102.018572

Sharma, P. K., Sangwan, N. S., Bose, S. K., & Sangwan, R. S. (2013). Biochemical characteristics of a novel vegetative tissue geraniol acetyltransferase from a monoterpene oil grass (Palmarosa, Cymbopogon martinii var. Motia) leaf. Plant Science, 203–204, 63–73. https://doi.org/10.1016/j.plantsci.2012.12.013

Spyropoulou, E. A., Haring, M. A., & Schuurink, R. C. (2014). Expression of Terpenoids 1, a glandular trichome-specific transcription factor from tomato that activates the terpene synthase 5 promoter. Plant Molecular Biology, 84(3), 345–357. https://doi.org/10.1007/s11103-013-0142-0

Spyropoulou, E. A., Haring, M. A., & Schuurink, R. C. (2014). RNA sequencing on Solanum lycopersicum trichomes identifies transcription factors that activate terpene synthase promoters. BMC Genomics, 15(1), 402. https://doi.org/10.1186/1471-2164-15-402

St-Pierre, B., & De Luca, V. (2000). Evolution of acyltransferase genes: Origin and diversification of the BAHD superfamily of acyltransferases involved in secondary metabolism. Recent Advances in Phytochemistry, 34, 285–315.

Steele, C. L., Crock, J., Bohlmann, J., & Croteau, R. (1998). Sesquiterpene Synthases from Grand Fir (Abies grandis). The Journal of Biological Chemistry, 273(4), 2078–2089. https://doi.org/10.1074/jbc.273.4.2078

Suzuki, H., Nakayama, T., Yonekura-Sakakibara, K., Fukui, Y., Nakamura, N., Yamaguchi, M. A., … Nishino, T. (2002). cDNA cloning, heterologous expressions, and functional characterization of malonyl-coenzyme A:anthocyanidin 3-O-glucoside-6′-O-malonyltransferase from dahlia flowers. Plant Physiology, 130(4), 2142–2151. https://doi.org/10.1104/pp.010447

Tacke, E., Korfhage, C., Michel, D., Maddaloni, M., Motto, M., Lanzini, S., … Döring, H. P. (1995). Transposon tagging of the maize Glossy2 locus with the transposable element En/Spm. The Plant Journal: For Cell and Molecular Biology, 8(6), 907–917. https://doi.org/10.1046/j.1365-313x.1995.8060907.x

Tan, H., Xiao, L., Gao, S., Li, Q., Chen, J., Xiao, Y., … Zhang, L. (2015). TRICHOME and ARTEMISININ REGULATOR 1 is required for trichome development and artemisinin biosynthesis in Artemisia annua. Molecular Plant, 8(9), 1396–1411. https://doi.org/10.1016/j.molp.2015.04.002

Tholl, Dorothea; Lee, S. (2011). Terpene specialized metabolism in Arabidopsis thaliana. The Arabidopsis Book. https://doi.org/10.1199/tab.0143

Tholl, D. (2015). Biosynthesis and biological functions of terpenoids in plants. In Biotechnology of isoprenoids (pp. 63–106).

Tissier, A. (2012, April). Glandular trichomes: What comes after expressed sequence tags? Plant Journal. https://doi.org/10.1111/j.1365-313X.2012.04913.x

Toledo-Ortiz, G., Huq, E., & Quail, P. H. (2003). The Arabidopsis basic/helix-loop-helix transcription factor family. Plant Cell, 15(8), 1749–1770. https://doi.org/10.1105/tpc.013839

Trapp, S. C., & Croteau, R. B. (2001). Genomic organization of plant terpene synthases and molecular evolutionary implications. Genetics, 158(2), 811–832.

Turner, G. W., Gershenzon, J., & Croteau, R. B. (2000). Development of Peltate Glandular Trichomes of Peppermint. Plant Physiology, 124(2), 665–679. https://doi.org/10.1104/pp.124.2.665

Unno, H., Ichimaida, F., Suzuki, H., Takahashi, S., Tanaka, Y., Saito, A., … Nakayama, T. (2007). Structural and mutational studies of anthocyanin malonyltransferases establish the features of BAHD enzyme catalysis. Journal of Biological Chemistry, 282(21), 15812–15822. https://doi.org/10.1074/jbc.M700638200

Upson, Tim; Andrew, S. (2004). The Genus Lavandula.

Vranová, E., Coman, D., & Gruissem, W. (2012). Structure and dynamics of the isoprenoid pathway network. Molecular Plant, 5(2), 318–333. https://doi.org/10.1093/mp/sss015

Wang, Qian; Reddy, Vaishanavi; Panicker, Deepa; Mao, Hui-Zhu; Kumar, Nadimuthu; Rajan, Chakravarthy; Venkatesh, Prasanna N.; Chua, Nam-Hai; Sarojam, R. (2016). Metabolic engineering of terpene biosynthesis in plants using trichome specific transcription factor MsYABBY5 from spearmint (Mentha spicata). Plant Biotechnology Journal, 14, 1619–1632.

Wang, G. (2014). Recent progress in secondary metabolism of plant glandular trichomes. Plant Biotechnology, 31(5), 353–361. https://doi.org/10.5511/plantbiotechnology.14.0701a

Wang, Z., Gerstein, M., & Snyder, M. (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews. Genetics, 10(1), 57–63.

Werker, E., Putievsky, E., Ravid, U., & Katzir, I. (1993). Glandular hairs and essential oil in developing leaves of Ocimum basilicum. Annals of Botany.

Whittington, D. A., Wise, M. L., Urbansky, M., Coates, R. M., Croteau, R. B., & Christianson, D. W. (2002). Bornyl diphosphate synthase: structure and strategy for carbocation manipulation by a terpenoid cyclase. Proceedings of the National Academy of Sciences of the United States of America, 99(24), 15375–15380. https://doi.org/10.1073/pnas.232591099

Williams, D. C., McGarvey, D. J., Katahira, E. J., & Croteau, R. (1998). Truncation of limonene synthase preprotein provides a fully active “pseudomature” form of this monoterpene cyclase and reveals the function of the amino-terminal arginine pair. Biochemistry, 37(35), 12213–12220. https://doi.org/10.1021/bi980854k

Wise, M., Savage, T., & Katahira, E. (1998). Monoterpene synthases from common sage (Salvia officinalis). Journal of Biological Chemistry, 273(24), 14891–14899. https://doi.org/10.1074/jbc.273.24.14891

Wolfe, N., & Herzberg, J. (1996). Can aromatherapy oils promote sleep in severely demented patients? International Journal of Geriatric Psychiatry. https://doi.org/10.1002/(SICI)1099-1166(199610)11:10<926::AID-GPS473>3.0.CO;2-1

Woronuk, G., Demissie, Z., Rheault, M., & Mahmoud, S. (2011). Biosynthesis and therapeutic properties of Lavandula essential oil constituents. Planta Medica, 77(1), 7–15. https://doi.org/10.1055/s-0030-1250136

Wray, G. A., Hahn, M. W., Abouheif, E., Balhoff, J. P., Pizer, M., Rockman, M. V., & Romano, L. A. (2003, September 1). The evolution of transcriptional regulation in eukaryotes. Molecular Biology and Evolution. Oxford University Press. https://doi.org/10.1093/molbev/msg140

Xia, Y., Nikolau, B. J., & Schnable, P. S. (1996). Cloning and characterization of CER2, an Arabidopsis gene that affects cuticular wax accumulation. Plant Cell, 8(8), 1291–1304. https://doi.org/10.1105/tpc.8.8.1291

Xu, Y.-H., Wang, J.-W., Wang, S., Wang, J.-Y., & Chen, X.-Y. (2004). Characterization of GaWRKY1, a cotton transcription factor that regulates the sesquiterpene synthase gene (+)-delta-cadinene synthase-A. Plant Physiology, 135(1), 507–515. https://doi.org/10.1104/pp.104.038612

Yu, Z. X., Li, J. X., Yang, C. Q., Hu, W. L., Wang, L. J., & Chen, X. Y. (2012). The jasmonate-responsive AP2/ERF transcription factors AaERF1 and AaERF2 positively regulate artemisinin biosynthesis in Artemisia annua L. Molecular Plant. https://doi.org/10.1093/mp/ssr087

Yue, Y., Yu, R., & Fan, Y. (2015). Transcriptome profiling provides new insights into the formation of floral scent in Hedychium coronarium. BMC Genomics, 16, 470. https://doi.org/10.1186/s12864-015-1653-7

Zaks, A., Davidovich-Rikanati, R., Bar, E., Inbar, M., & Lewinsohn, E. (2008). Biosynthesis of linalyl acetate and other terpenes in lemon mint (Mentha aquatica var. citrata, Lamiaceae) glandular trichomes. Israel Journal of Plant Sciences, 56(3), 233–244. https://doi.org/10.1560/IJPS.56.3.233

Zhan, X., Yang, L., Wang, D., Zhu, J. K., & Lang, Z. (2016). De novo assembly and analysis of the transcriptome of Ocimum americanum var. pilosum under cold stress. BMC Genomics, 17(1), 209. https://doi.org/10.1186/s12864-016-2507-7

Zhang, F., Fu, X., Lv, Z., Lu, X., Shen, Q., Zhang, L., … Tang, K. (2015). A basic transcription factor, AabZIP1, connects abscisic acid signaling with artemisinin biosynthesis in Artemisia annua. Molecular Plant, 8(1), 163–175. https://doi.org/10.1016/j.molp.2014.12.004

Zhang, H., Hedhili, S., Montiel, G., Zhang, Y., Chatel, G., Pré, M., … Memelink, J. (2011). The basic helix-loop-helix transcription factor CrMYC2 controls the jasmonate-responsive expression of the ORCA genes that regulate alkaloid biosynthesis in Catharanthus roseus. Plant Journal, 67(1), 61–71. https://doi.org/10.1111/j.1365-313X.2011.04575.x

Zhao, Q.-Y., Wang, Y., Kong, Y.-M., Luo, D., Li, X., & Hao, P. (2011). Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study. BMC Bioinformatics, 12(Suppl 14), S2. https://doi.org/10.1186/1471-2105-12-S14-S2

Zhou, K., & Peters, R. J. (2009). Investigating the conservation pattern of a putative second terpene synthase divalent metal binding motif in plants. Phytochemistry, 70(3), 366–369. https://doi.org/10.1016/j.phytochem.2008.12.022


Appendices

Appendix A: KEGG pathways using KO

Metabolism

Global and overview maps

01100 Metabolic pathways 818
01110 Biosynthesis of secondary metabolites 387
01120 Microbial metabolism in diverse environments 131
01130 Biosynthesis of antibiotics 187
01200 Carbon metabolism 90
01210 2-Oxocarboxylic acid metabolism 28
01212 Fatty acid metabolism 23
01230 Biosynthesis of amino acids 97
01220 Degradation of aromatic compounds 3

Carbohydrate metabolism

00010 Glycolysis / Gluconeogenesis 30
00020 Citrate cycle (TCA cycle) 19
00030 Pentose phosphate pathway 17
00040 Pentose and glucuronate interconversions 12
00051 Fructose and mannose metabolism 20
00052 Galactose metabolism 17
00053 Ascorbate and aldarate metabolism 15
00500 Starch and sucrose metabolism 28
00520 Amino sugar and nucleotide sugar metabolism 38
00620 Pyruvate metabolism 29
00630 Glyoxylate and dicarboxylate metabolism 27
00640 Propanoate metabolism 16
00650 Butanoate metabolism 10
00660 C5-Branched dibasic acid metabolism 5
00562 Inositol phosphate metabolism 21

Energy metabolism

00190 Oxidative phosphorylation 77
00195 Photosynthesis 42
00196 Photosynthesis - antenna proteins 12
00710 Carbon fixation in photosynthetic organisms 25
00720 Carbon fixation pathways in prokaryotes 13
00680 Methane metabolism 18
00910 Nitrogen metabolism 11
00920 Sulfur metabolism 14

Lipid metabolism

00061 Fatty acid biosynthesis 13
00062 Fatty acid elongation 7
00071 Fatty acid degradation 10
00072 Synthesis and degradation of ketone bodies 3
00073 Cutin, suberine and wax biosynthesis 9
00100 Steroid biosynthesis 18
00140 Steroid hormone biosynthesis 3
00561 Glycerolipid metabolism 24
00564 Glycerophospholipid metabolism 37
00565 Ether lipid metabolism 7
00600 Sphingolipid metabolism 15
00590 Arachidonic acid metabolism 7
00591 Linoleic acid metabolism 4
00592 alpha-Linolenic acid metabolism 13
01040 Biosynthesis of unsaturated fatty acids 9

Nucleotide metabolism

00230 Purine metabolism 86
00240 Pyrimidine metabolism 72

Amino acid metabolism

00250 Alanine, aspartate and glutamate metabolism 26
00260 Glycine, serine and threonine metabolism 33
00270 Cysteine and methionine metabolism 35
00280 Valine, leucine and isoleucine degradation 21
00290 Valine, leucine and isoleucine biosynthesis 10
00300 Lysine biosynthesis 9
00310 Lysine degradation 13
00220 Arginine biosynthesis 19
00330 Arginine and proline metabolism 23
00340 Histidine metabolism 11
00350 Tyrosine metabolism 17
00360 Phenylalanine metabolism 15
00380 Tryptophan metabolism 10
00400 Phenylalanine, tyrosine and tryptophan biosynthesis 22

Metabolism of other amino acids

00410 beta-Alanine metabolism 16
00430 Taurine and hypotaurine metabolism 3
00440 Phosphonate and phosphinate metabolism 3
00450 Selenocompound metabolism 9
00460 Cyanoamino acid metabolism 9
00471 D-Glutamine and D-glutamate metabolism 1
00480 Glutathione metabolism 16

Glycan biosynthesis and metabolism

00510 N-Glycan biosynthesis 30
00513 Various types of N-glycan biosynthesis 22
00515 Mannose type O-glycan biosynthesis 1
00514 Other types of O-glycan biosynthesis 3
00532 Glycosaminoglycan biosynthesis - chondroitin sulfate / dermatan sulfate 2
00534 Glycosaminoglycan biosynthesis - heparan sulfate / heparin 3
00531 Glycosaminoglycan degradation 5
00563 Glycosylphosphatidylinositol (GPI)-anchor biosynthesis 20
00601 Glycosphingolipid biosynthesis - lacto and neolacto series 1
00603 Glycosphingolipid biosynthesis - globo and isoglobo series 3
00604 Glycosphingolipid biosynthesis - ganglio series 2
00540 Lipopolysaccharide biosynthesis 8
00550 Peptidoglycan biosynthesis 1
00511 Other glycan degradation 9

Metabolism of cofactors and vitamins

00730 Thiamine metabolism 10
00740 Riboflavin metabolism 8
00750 Vitamin B6 metabolism 7
00760 Nicotinate and nicotinamide metabolism 13
00770 Pantothenate and CoA biosynthesis 17
00780 Biotin metabolism 7
00785 Lipoic acid metabolism 2
00790 Folate biosynthesis 11
00670 One carbon pool by folate 10
00830 Retinol metabolism 6
00860 Porphyrin and chlorophyll metabolism 31
00130 Ubiquinone and other terpenoid-quinone biosynthesis 20

Metabolism of terpenoids and polyketides

00900 Terpenoid backbone biosynthesis 29
00902 Monoterpenoid biosynthesis 3
00909 Sesquiterpenoid and triterpenoid biosynthesis 7
00904 Diterpenoid biosynthesis 8
00906 Carotenoid biosynthesis 18
00905 Brassinosteroid biosynthesis 8
00981 Insect hormone biosynthesis 1
00908 Zeatin biosynthesis 5
00903 Limonene and pinene degradation 2
00281 Geraniol degradation 1
01051 Biosynthesis of ansamycins 1
00523 Polyketide sugar unit biosynthesis 1
01053 Biosynthesis of siderophore group nonribosomal peptides 1

Biosynthesis of other secondary metabolites

00940 Phenylpropanoid biosynthesis 18
00945 Stilbenoid, diarylheptanoid and gingerol biosynthesis 5
00941 Flavonoid biosynthesis 13
00944 Flavone and flavonol biosynthesis 3
00942 Anthocyanin biosynthesis 3
00943 Isoflavonoid biosynthesis 1
00950 Isoquinoline alkaloid biosynthesis 8
00960 Tropane, piperidine and pyridine alkaloid biosynthesis 8
00232 Caffeine metabolism 2
00965 Betalain biosynthesis 1
00966 Glucosinolate biosynthesis 3
00261 Monobactam biosynthesis 6
00521 Streptomycin biosynthesis 4
00524 Neomycin, kanamycin and gentamicin biosynthesis 1
00401 Novobiocin biosynthesis 2
00254 Aflatoxin biosynthesis 1

Xenobiotics biodegradation and metabolism

00362 Benzoate degradation 2
00627 Aminobenzoate degradation 3
00364 Fluorobenzoate degradation 1
00625 Chloroalkane and chloroalkene degradation 3
00361 Chlorocyclohexane and chlorobenzene degradation 1
00623 Toluene degradation 1
00643 Styrene degradation 4
00791 Atrazine degradation 1
00363 Bisphenol degradation 1
00626 Naphthalene degradation 2
00624 Polycyclic aromatic hydrocarbon degradation 1
00980 Metabolism of xenobiotics by cytochrome P450 5
00982 Drug metabolism - cytochrome P450 4
00983 Drug metabolism - other enzymes 12

Chemical structure transformation maps

01062 Biosynthesis of terpenoids and steroids 1


Genetic Information Processing

Transcription

03020 RNA polymerase 28
03022 Basal transcription factors 28
03040 Spliceosome 101

Translation

03010 Ribosome 126
00970 Aminoacyl-tRNA biosynthesis 26
03013 RNA transport 95
03015 mRNA surveillance pathway 50
03008 Ribosome biogenesis in eukaryotes 55

Folding, sorting and degradation

03060 Protein export 26
04141 Protein processing in endoplasmic reticulum 77
04130 SNARE interactions in vesicular transport 18
04120 Ubiquitin mediated proteolysis 56
04122 Sulfur relay system 9
03050 Proteasome 33
03018 RNA degradation 50

Replication and repair

03030 DNA replication 30
03410 Base excision repair 25
03420 Nucleotide excision repair 36
03430 Mismatch repair 20
03440 Homologous recombination 31
03450 Non-homologous end-joining 8
03460 Fanconi anemia pathway 31

Environmental Information Processing

Membrane transport

02010 ABC transporters 8
03070 Bacterial secretion system 6

Signal transduction

02020 Two-component system 8
04014 Ras signaling pathway 10
04015 Rap1 signaling pathway 5
04010 MAPK signaling pathway 9
04013 MAPK signaling pathway - fly 10
04016 MAPK signaling pathway - plant 23
04011 MAPK signaling pathway - yeast 12
04012 ErbB signaling pathway 5
04310 Wnt signaling pathway 15
04330 Notch signaling pathway 8
04340 Hedgehog signaling pathway 4
04341 Hedgehog signaling pathway - fly 7
04350 TGF-beta signaling pathway 10
04390 Hippo signaling pathway 7
04391 Hippo signaling pathway - fly 6
04392 Hippo signaling pathway - multiple species 2
04370 VEGF signaling pathway 5
04371 Apelin signaling pathway 14
04630 Jak-STAT signaling pathway 3
04064 NF-kappa B signaling pathway 6
04668 TNF signaling pathway 3
04066 HIF-1 signaling pathway 15
04068 FoxO signaling pathway 17
04020 Calcium signaling pathway 7
04070 Phosphatidylinositol signaling system 19
04072 Phospholipase D signaling pathway 10
04071 Sphingolipid signaling pathway 19
04024 cAMP signaling pathway 8
04022 cGMP-PKG signaling pathway 8
04151 PI3K-Akt signaling pathway 24
04152 AMPK signaling pathway 24
04150 mTOR signaling pathway 26
04075 Plant hormone signal transduction 41

Cellular Processes

Transport and catabolism

04144 Endocytosis 52
04145 Phagosome 28
04142 Lysosome 34
04146 Peroxisome 38
04140 Autophagy - animal 34
04138 Autophagy - yeast 34
04137 Mitophagy - animal 13
04139 Mitophagy - yeast 13

Cell growth and death

04110 Cell cycle 58
04111 Cell cycle - yeast 52
04112 Cell cycle - Caulobacter 4
04113 Meiosis - yeast 40
04114 Oocyte meiosis 31
04210 Apoptosis 13
04214 Apoptosis - fly 11
04215 Apoptosis - multiple species 3
04115 p53 signaling pathway 12

Cellular community - eukaryotes

04510 Focal adhesion 7
04520 Adherens junction 6
04530 Tight junction 15
04540 Gap junction 5
04550 Signaling pathways regulating pluripotency of stem cells 5

Cellular community - prokaryotes

02024 Quorum sensing 13
05111 Biofilm formation - Vibrio cholerae 1
02025 Biofilm formation - Pseudomonas aeruginosa 2
02026 Biofilm formation - Escherichia coli 3

Cell motility

04810 Regulation of actin cytoskeleton 17

Organismal Systems

Immune system

04611 Platelet activation 2
04620 Toll-like receptor signaling pathway 5
04624 Toll and Imd signaling pathway 5
04621 NOD-like receptor signaling pathway 13
04622 RIG-I-like receptor signaling pathway 6
04623 Cytosolic DNA-sensing pathway 15
04650 Natural killer cell mediated cytotoxicity 4
04612 Antigen processing and presentation 11
04660 T cell receptor signaling pathway 5
04658 Th1 and Th2 cell differentiation 2
04659 Th17 cell differentiation 4
04657 IL-17 signaling pathway 6
04662 B cell receptor signaling pathway 5
04664 Fc epsilon RI signaling pathway 4
04666 Fc gamma R-mediated phagocytosis 14
04670 Leukocyte transendothelial migration 1
04062 Chemokine signaling pathway 5

Endocrine system

04911 Insulin secretion 1
04910 Insulin signaling pathway 19
04922 Glucagon signaling pathway 14
04920 Adipocytokine signaling pathway 6
03320 PPAR signaling pathway 8
04912 GnRH signaling pathway 5
04913 Ovarian steroidogenesis 1
04915 Estrogen signaling pathway 7
04914 Progesterone-mediated oocyte maturation 22
04917 Prolactin signaling pathway 4
04921 Oxytocin signaling pathway 10
04918 Thyroid hormone synthesis 5
04919 Thyroid hormone signaling pathway 16
04916 Melanogenesis 5
04924 Renin secretion 3
04614 Renin-angiotensin system 3
04925 Aldosterone synthesis and secretion 2

Circulatory system

04260 Cardiac muscle contraction 12
04261 Adrenergic signaling in cardiomyocytes 8
04270 Vascular smooth muscle contraction 5

Digestive system

04970 Salivary secretion 1
04971 Gastric acid secretion 1
04972 Pancreatic secretion 5
04976 Bile secretion 5
04973 Carbohydrate digestion and absorption 2
04974 Protein digestion and absorption 3
04975 Fat digestion and absorption 4
04977 Vitamin digestion and absorption 2
04978 Mineral absorption 6

Excretory system

04962 Vasopressin-regulated water reabsorption 6
04960 Aldosterone-regulated sodium reabsorption 2
04961 Endocrine and other factor-regulated calcium reabsorption 6
04964 Proximal tubule bicarbonate reclamation 2
04966 Collecting duct acid secretion 11

Nervous system

04724 Glutamatergic synapse 6
04727 GABAergic synapse 8
04725 Cholinergic synapse 3
04728 Dopaminergic synapse 9
04726 Serotonergic synapse 3
04720 Long-term potentiation 6
04730 Long-term depression 4
04723 Retrograde endocannabinoid signaling 3
04721 Synaptic vesicle cycle 24
04722 Neurotrophin signaling pathway 11

Sensory system

04744 Phototransduction 2
04745 Phototransduction - fly 1
04740 Olfactory transduction 2
04750 Inflammatory mediator regulation of TRP channels 2

Development

04320 Dorso-ventral axis formation 3
04360 Axon guidance 5
04380 Osteoclast differentiation 5

Aging

04211 Longevity regulating pathway 13
04212 Longevity regulating pathway - worm 16
04213 Longevity regulating pathway - multiple species 12

Environmental adaptation

04710 Circadian rhythm 6
04713 Circadian entrainment 3
04711 Circadian rhythm - fly 1
04712 Circadian rhythm - plant 21
04626 Plant-pathogen interaction 27


Appendix B: KEGG pathways distribution during gene expression analysis

Pathways Database Up-regulated Down-regulated Toll pathway-drosophila (P06217) SCW signaling pathway (P06216)

SCW signaling pathway (P06216) GBB signaling pathway (P06214) GBB signaling pathway (P06214) DPP signaling pathway (P06213) DPP signaling pathway (P06213) DPP-SCW signaling pathway (P06212) DPP-SCW signaling pathway (P06212) BMP/activin signaling pathway-drosophila (P06211) BMP/activin signaling pathway-drosophila Apoptosis signaling pathway (P00006) (P06211)

L. angustifolia L. Axon guidance mediated by Slit/Robo (P00008) Gonadotropin-releasing pathway (P06664) Axon guidance mediated by semaphorins Angiogenesis (P00005) (P00007) Apoptosis signaling pathway (P00006) Alzheimer disease-presenilin pathway (P00004) Pyridoxal-5-phosphate biosynthesis (P02759) Alzheimer disease-amyloid secretase pathway (P00003) Gonadotropin-releasing hormone receptor Methylmalonyl pathway (P02755) pathway (P06664) Angiogenesis (P00005) Lysine biosynthesis (P02751) Alzheimer disease-presenilin pathway (P00004) Lipoate_biosynthesis (P02750) O-antigen biosynthesis (P02757) Ubiquitin proteasome pathway (P00060) Alzheimer disease-amyloid secretase pathway Leucine biosynthesis (P02749) (P00003) N-acetylglucosamine metabolism (P02756) Isoleucine biosynthesis (P02748) Adrenaline and noradrenaline biosynthesis Histidine biosynthesis (P02747) (P00001) Methionine biosynthesis (P02753) Wnt signaling pathway (P00057) Mannose metabolism (P02752) Heme biosynthesis (P02746) Lysine biosynthesis (P02751) Glutamine glutamate conversion (P02745) CCKR signaling map (P06959) Fructose galactose metabolism (P02744) Ubiquitin proteasome pathway (P00060) Formyltetrahydroformate biosynthesis (P02743) Leucine biosynthesis (P02749) T cell activation (P00053) p53 pathway (P00059) TGF-beta signaling pathway (P00052) Isoleucine biosynthesis (P02748) Flavin biosynthesis (P02741) Wnt signaling pathway (P00057) TCA cycle (P00051) Heme biosynthesis (P02746) De novo pyrimidine deoxyribonucleotide biosynthesis (P02739) Glutamine glutamate conversion (P02745) Parkinson disease (P00049) Transcription regulation by bZIP transcription De novo purine biosynthesis (P02738) factor (P00055) Fructose galactose metabolism (P02744) Cysteine biosynthesis (P02737) Toll receptor signaling pathway (P00054) PDGF signaling pathway (P00047) Formyltetrahydroformate biosynthesis (P02743) Oxidative stress response (P00046) Tetrahydrofolate biosynthesis (P02742) Chorismate biosynthesis (P02734) T cell 
activation (P00053) Nicotinic acetylcholine receptor signaling pathway (P00044)


TGF-beta signaling pathway (P00052) Acetylcholine receptor 2 & 4 signaling pathway (P00043) Flavin biosynthesis (P02741) Biotin biosynthesis (P02731) TCA cycle (P00051) Metabotropic glutamate receptor group I pathway (P00041) De novo pyrimidine ribonucleotides biosynthesis Metabotropic glutamate receptor group II pathway (P00040) (P02740) De novo pyrimidine deoxyribonucleotide Ascorbate degradation (P02729) biosynthesis (P02739) Parkinson disease (P00049) Arginine biosynthesis (P02728) De novo purine biosynthesis (P02738) Metabotropic glutamate receptor group III pathway (P00039) PI3 kinase pathway (P00048) Androgen/estrogen/progesterone biosynthesis (P02727) Cysteine biosynthesis (P02737) Ionotropic glutamate receptor pathway (P00037) PDGF signaling pathway (P00047) Interferon-gamma signaling pathway (P00035) Coenzyme A biosynthesis (P02736) Integrin signalling pathway (P00034) Insulin/IGF pathway-protein kinase B signaling cascade Oxidative stress response (P00046) (P00033) Cobalamin biosynthesis (P02735) ATP synthesis (P02721) Notch signaling pathway (P00045) MAP kinase cascade (P00032) Chorismate biosynthesis (P02734) p53 pathway feedback loops 2 (P04398) Nicotinic acetylcholine receptor signaling Valine biosynthesis (P02785) pathway (P00044) Acetylcholine receptor 2 & 4 signaling pathway Inflammation mediated signaling pathway (P00031) (P00043) Acetylcholine receptor 1 and 3 signaling Tyrosine biosynthesis (P02784) pathway (P00042) Metabotropic glutamate receptor group I Hypoxia response via HIF activation (P00030) pathway (P00041) Asparagine and aspartate biosynthesis (P02730) Vitamin D metabolism and pathway (P04396) Metabotropic glutamate receptor group II Tryptophan biosynthesis (P02783) pathway (P00040) Thyrotropin-releasing hormone receptor signaling pathway Ascorbate degradation (P02729) (P04394) Metabotropic glutamate receptor group III Ras Pathway (P04393) pathway (P00039) Arginine biosynthesis (P02728) Oxytocin receptor mediated signaling pathway 
(P04391) Androgen/estrogen/progesterone biosynthesis Huntington disease (P00029) (P02727) Ionotropic glutamate receptor pathway (P00037) Heterotrimeric G-protein signaling pathway (P00028) Aminobutyrate degradation (P02726) p38 MAPK pathway (P05918) Interleukin signaling pathway (P00036) Heterotrimeric Gi & Gs protein signaling pathway (P00026) Alanine biosynthesis (P02724) Sulfate assimilation (P02778) Interferon-gamma signaling pathway (P00035) Glycolysis (P00024) Adenine and hypoxanthine salvage pathway Succinate to proprionate conversion (P02777) (P02723) Integrin signalling pathway (P00034) Serine glycine biosynthesis (P02776) Vitamin B6 metabolism (P02787) Salvage pyrimidine ribonucleotides (P02775) Protein kinase B signaling cascade (P00033) Enkephalin release (P05913) Vitamin B6 biosynthesis (P02786) FGF signaling pathway (P00021) ATP synthesis (P02721) Dopamine receptor mediated signaling pathway (P05912) MAP kinase cascade (P00032) Histamine H2 receptor mediated signaling pathway (P04386) p53 pathway feedback loops 2 (P04398) Pyruvate metabolism (P02772) 127

Valine biosynthesis (P02785) Histamine H1 receptor mediated signaling pathway (P04385) Inflammation mediated signaling pathway Endothelin signaling pathway (P00019) (P00031) p53 pathway by glucose deprivation (P04397) EGF receptor signaling pathway (P00018) Tyrosine biosynthesis (P02784) DNA replication (P00017) Hypoxia response via HIF activation (P00030) Cytoskeletal regulation by Rho GTPase (P00016) Vitamin D metabolism and pathway (P04396) Circadian system (P00015) Tryptophan biosynthesis (P02783) biosynthesis (P00014) Vasopressin synthesis (P04395) Cell cycle (P00013) Thyrotropin-releasing receptor signaling pathway Cadherin signaling pathway (P00012) (P04394) Threonine biosynthesis (P02781) Beta2 adrenergic receptor signaling pathway (P04378) Ras Pathway (P04393) B cell activation (P00010) P53 pathway feedback loops 1 (P04392) Beta1 adrenergic receptor signaling pathway (P04377) Oxytocin receptor mediated signaling pathway Peptidoglycan biosynthesis (P02763) (P04391) Huntington disease (P00029) Pentose phosphate pathway (P02762) Heterotrimeric G-protein signaling pathway 5HT2 type receptor mediated signaling pathway (P04374) (P00028) Heterotrimeric Gi & Gs-protein signaling 5HT1 type receptor mediated signaling pathway (P04373) pathway (P00027) p38 MAPK pathway (P05918) 5-Hydroxytryptamine degredation (P04372) Heterotrimeric Gi & Gs protein signaling pathway (P00026) Thiamin biosynthesis (P02779) Opioid proopiomelanocortin pathway (P05917) Hedgehog signaling pathway (P00025) Sulfate assimilation (P02778) Opioid prodynorphin pathway (P05916) Glycolysis (P00024) Opioid proenkephalin pathway (P05915) General transcription regulation (P00023) Serine glycine biosynthesis (P02776) Nicotine pharmacodynamics pathway (P06587) Salvage pyrimidine ribonucleotides (P02775) Enkephalin release (P05913) FGF signaling pathway (P00021) Salvage pyrimidine deoxyribonucleotides

(P02774) Dopamine receptor mediated signaling pathway

(P05912) FAS signaling pathway (P00020) S-adenosylmethionine biosynthesis (P02773) Histamine H2 receptor mediated signaling pathway (P04386) Pyruvate metabolism (P02772) Histamine H1 receptor mediated signaling pathway (P04385) Pyrimidine Metabolism (P02771)

Pyridoxal phosphate salvage pathway (P02770) Corticotrophin releasing factor signaling

pathway (P04380) Endothelin signaling pathway (P00019) EGF receptor signaling pathway (P00018) DNA replication (P00017) Cytoskeletal regulation by Rho GTPase (P00016) Proline biosynthesis (P02768) Cholesterol biosynthesis (P00014) Cell cycle (P00013) Phenylethylamine degradation (P02766) Cadherin signaling pathway (P00012) Beta3 adrenergic receptor signaling pathway

(P04379) Phenylalanine biosynthesis (P02765) Beta2 adrenergic receptor signaling pathway

(P04378) B cell activation (P00010) Beta1 adrenergic receptor signaling pathway

(P04377) Peptidoglycan biosynthesis (P02763) 5HT4 type receptor mediated signaling pathway

(P04376) Pentose phosphate pathway (P02762) 5HT3 type receptor mediated signaling pathway

(P04375) Pantothenate biosynthesis (P02761) 5HT2 type receptor mediated signaling pathway

(P04374) 5HT1 type receptor mediated signaling pathway

(P04373) 5-Hydroxytryptamine degredation (P04372) 5-Hydroxytryptamine biosynthesis (P04371) Pathways Database Up-regulated Down-regulated SCW signaling pathway (P06216) De novo pyrimidine deoxyribonucleotide biosynthesis (P02739) GBB signaling pathway (P06214) Apoptosis signaling pathway (P00006) DPP signaling pathway (P06213) De novo purine biosynthesis (P02738) DPP-SCW signaling pathway (P06212) Cysteine biosynthesis (P02737) BMP/activin signaling pathway-drosophila Interferon-gamma signaling pathway (P00035) (P06211) Axon guidance mediated by Slit/Robo (P00008) 5HT2 type receptor mediated signaling pathway (P04374)

L. xintermedia L. Apoptosis signaling pathway (P00006) Alzheimer disease-amyloid secretase pathway (P00003) Pyridoxal-5-phosphate biosynthesis (P02759) 5-Hydroxytryptamine degredation (P04372) Angiogenesis (P00005) Chorismate biosynthesis (P02734) Alzheimer disease-presenilin pathway (P00004) Phenylalanine biosynthesis (P02765) Alzheimer disease-amyloid secretase pathway MAP kinase cascade (P00032) (P00003)


Adrenaline and noradrenaline biosynthesis Inflammation mediated signaling pathway (P00031) (P00001) Methyl citrate cycle (P02754) Peptidoglycan biosynthesis (P02763) Methionine biosynthesis (P02753) Biotin biosynthesis (P02731) Lysine biosynthesis (P02751) Pentose phosphate pathway (P02762) Ubiquitin proteasome pathway (P00060) Asparagine and aspartate biosynthesis (P02730) Leucine biosynthesis (P02749) Ubiquitin proteasome pathway (P00060) p53 pathway (P00059) Pantothenate biosynthesis (P02761) Isoleucine biosynthesis (P02748) Huntington disease (P00029) mRNA splicing (P00058) Ascorbate degradation (P02729) Wnt signaling pathway (P00057) p53 pathway (P00059) Heme biosynthesis (P02746) p53 pathway feedback loops 2 (P04398) Glutamine glutamate conversion (P02745) Pyridoxal-5-phosphate biosynthesis (P02759) Transcription regulation by bZIP transcription Wnt signaling pathway (P00057) factor (P00055) Fructose galactose metabolism (P02744) Aminobutyrate degradation (P02726) Formyltetrahydroformate biosynthesis (P02743) Vasopressin synthesis (P04395) Tetrahydrofolate biosynthesis (P02742) Glycolysis (P00024) TGF-beta signaling pathway (P00052) Transcription regulation by bZIP transcription factor (P00055) Thyrotropin-releasing hormone receptor signaling pathway Flavin biosynthesis (P02741) (P04394) TCA cycle (P00051) General transcription regulation (P00023) De novo pyrimidine ribonucleotides biosynthesis Ras Pathway (P04393) (P02740) De novo pyrimidine deoxyribonucleotide Adenine and hypoxanthine salvage pathway (P02723) biosynthesis (P02739) Parkinson disease (P00049) Vitamin B6 biosynthesis (P02786) De novo purine biosynthesis (P02738) FGF signaling pathway (P00021) PI3 kinase pathway (P00048) TGF-beta signaling pathway (P00052) Cysteine biosynthesis (P02737) Oxytocin receptor mediated signaling pathway (P04391) PDGF signaling pathway (P00047) FAS signaling pathway (P00020) Oxidative stress response (P00046) ATP synthesis (P02721) Nicotinic acetylcholine receptor 
signaling Tyrosine biosynthesis (P02784) pathway (P00044) Acetylcholine receptor 2 & 4 signaling pathway Tryptophan biosynthesis (P02783) (P00043) Acetylcholine receptor 1 & 3 signaling pathway Lysine biosynthesis (P02751) (P00042) Metabotropic glutamate receptor group I Lipoate_biosynthesis (P02750) pathway (P00041) Asparagine and aspartate biosynthesis (P02730) Endothelin signaling pathway (P00019) Arginine biosynthesis (P02728) EGF receptor signaling pathway (P00018) Allantoin degradation (P02725) p38 MAPK pathway (P05918) Alanine biosynthesis (P02724) Parkinson disease (P00049) Interferon-gamma signaling pathway (P00035) DNA replication (P00017) Adenine and hypoxanthine salvage pathway Cytoskeletal regulation by Rho GTPase (P00016) (P02723) Integrin signalling pathway (P00034) Thiamin biosynthesis (P02779) Protein kinase B signaling cascade (P00033) Oxidative stress response (P00046) 130

Vitamin B6 biosynthesis (P02786) Histidine biosynthesis (P02747) ATP synthesis (P02721) Histamine H1 receptor mediated signaling pathway (P04385) MAP kinase cascade (P00032) Sulfate assimilation (P02778) p53 pathway feedback loops 2 (P04398) Heme biosynthesis (P02746) Valine biosynthesis (P02785) Cell cycle (P00013) Inflammation mediated signaling pathway Gamma-aminobutyric acid synthesis (P04384) (P00031) Tyrosine biosynthesis (P02784) Nicotinic acetylcholine receptor signaling pathway (P00044) Hypoxia response via HIF activation (P00030) Glutamine glutamate conversion (P02745) Vitamin D metabolism and pathway (P04396) Serine glycine biosynthesis (P02776) Tryptophan biosynthesis (P02783) Fructose galactose metabolism (P02744) Vasopressin synthesis (P04395) Salvage pyrimidine ribonucleotides (P02775) Thyrotropin-releasing receptor signaling pathway Salvage pyrimidine deoxyribonucleotides (P02774) (P04394) Threonine biosynthesis (P02781) Flavin biosynthesis (P02741) Ras Pathway (P04393) De novo pyrimidine ribonucleotides biosynthesis (P02740) P53 pathway feedback loops 1 (P04392) Gonadotropin-releasing hormone receptor pathway (P06664) Oxytocin receptor mediated signaling pathway

(P04391) Huntington disease (P00029) Heterotrimeric Gq & Go protein signaling pathway (P00027) p38 MAPK pathway (P05918) Heterotrimeric Gi & Gs protein signaling pathway (P00026) Thiamin biosynthesis (P02779) Sulfate assimilation (P02778) Glycolysis (P00024) General transcription regulation (P00023) Serine glycine biosynthesis (P02776) Nicotine pharmacodynamics pathway (P06587) Salvage pyrimidine ribonucleotides (P02775) FGF signaling pathway (P00021) Salvage pyrimidine deoxyribonucleotides

(P02774) Dopamine receptor mediated signaling pathway

(P05912) FAS signaling pathway (P00020) S-adenosylmethionine biosynthesis (P02773) Pyruvate metabolism (P02772) Histamine H1 receptor mediated signaling pathway (P04385) Pyrimidine Metabolism (P02771) Corticotrophin releasing factor signaling pathway (P04380) Endothelin signaling pathway (P00019) EGF receptor signaling pathway (P00018) DNA replication (P00017)

Cytoskeletal regulation by Rho GTPase (P00016) Proline biosynthesis (P02768) Cholesterol biosynthesis (P00014) Cell cycle (P00013) Phenylethylamine degradation (P02766) Cadherin signaling pathway (P00012) Phenylalanine biosynthesis (P02765) Peptidoglycan biosynthesis (P02763) Pentose phosphate pathway (P02762) Pantothenate biosynthesis (P02761) 5HT2 type receptor mediated signaling pathway

(P04374) 5-Hydroxytryptamine degredation (P04372) 5-Hydroxytryptamine biosynthesis (P04371)

Pathways Database Up-regulated Down-regulated

Toll pathway-drosophila (P06217) SCW signaling pathway (P06216)

SCW signaling pathway (P06216) GBB signaling pathway (P06214) GBB signaling pathway (P06214) DPP signaling pathway (P06213) DPP signaling pathway (P06213) DPP-SCW signaling pathway (P06212)

latifolia DPP-SCW signaling pathway (P06212) BMP/activin signaling pathway-drosophila (P06211) BMP/activin signaling pathway-drosophila L. L. Axon guidance mediated by Slit/Robo (P00008) (P06211) Axon guidance mediated by Slit/Robo (P00008) Apoptosis signaling pathway (P00006) Apoptosis signaling pathway (P00006) Alzheimer disease-presenilin pathway (P00004) Pyridoxal-5-phosphate biosynthesis (P02759) Alzheimer disease-amyloid secretase pathway (P00003) Gonadotropin-releasing hormone receptor Mannose metabolism (P02752) pathway (P06664) Angiogenesis (P00005) CCKR signaling map (P06959) Alzheimer disease-presenilin pathway (P00004) Lysine biosynthesis (P02751) Alzheimer disease-amyloid secretase pathway Ubiquitin proteasome pathway (P00060) (P00003) Adrenaline and noradrenaline biosynthesis Leucine biosynthesis (P02749) (P00001) Methyl citrate cycle (P02754) p53 pathway (P00059) Methionine biosynthesis (P02753) Isoleucine biosynthesis (P02748) Mannose metabolism (P02752) Histidine biosynthesis (P02747) CCKR signaling map (P06959) mRNA splicing (P00058) Lysine biosynthesis (P02751) Heme biosynthesis (P02746) Lipoate_biosynthesis (P02750) Wnt signaling pathway (P00057) Ubiquitin proteasome pathway (P00060) Glutamine glutamate conversion (P02745) p53 pathway (P00059) Fructose galactose metabolism (P02744) Isoleucine biosynthesis (P02748) Transcription regulation by bZIP transcription factor (P00055) mRNA splicing (P00058) Formyltetrahydroformate biosynthesis (P02743) Histidine biosynthesis (P02747) Tetrahydrofolate biosynthesis (P02742)


Wnt signaling pathway (P00057) T cell activation (P00053) Heme biosynthesis (P02746) Flavin biosynthesis (P02741) VEGF signaling pathway (P00056) TGF-beta signaling pathway (P00052) Glutamine glutamate conversion (P02745) De novo pyrimidine ribonucleotides biosynthesis (P02740) Transcription regulation by bZIP transcription TCA cycle (P00051) factor (P00055) Fructose galactose metabolism (P02744) Parkinson disease (P00049) Formyltetrahydroformate biosynthesis (P02743) De novo purine biosynthesis (P02738) TGF-beta signaling pathway (P00052) Cysteine biosynthesis (P02737) Flavin biosynthesis (P02741) PDGF signaling pathway (P00047) TCA cycle (P00051) Oxidative stress response (P00046) De novo pyrimidine ribonucleotides biosynthesis Chorismate biosynthesis (P02734) (P02740) De novo pyrimidine deoxyribonucleotide Nicotinic acetylcholine receptor signaling pathway (P00044) biosynthesis (P02739) Parkinson disease (P00049) Ascorbate degradation (P02729) De novo purine biosynthesis (P02738) Arginine biosynthesis (P02728) PI3 kinase pathway (P00048) Metabotropic glutamate receptor group III pathway (P00039) PDGF signaling pathway (P00047) Ionotropic glutamate receptor pathway (P00037) Oxidative stress response (P00046) Interferon-gamma signaling pathway (P00035) Nicotinic acetylcholine receptor signaling Integrin signalling pathway (P00034) pathway (P00044) Muscarinic acetylcholine receptor 2 and 4 Insulin/IGF pathway-protein kinase B signaling cascade signaling pathway (P00043) (P00033) Muscarinic acetylcholine receptor 1 and 3 ATP synthesis (P02721) signaling pathway (P00042) Biotin biosynthesis (P02731) MAP kinase cascade (P00032) Metabotropic glutamate receptor group I p53 pathway feedback loops 2 (P04398) pathway (P00041) Asparagine and aspartate biosynthesis (P02730) Inflammation mediated signaling pathway (P00031) Ascorbate degradation (P02729) Hypoxia response via HIF activation (P00030) Arginine biosynthesis (P02728) Tryptophan biosynthesis (P02783) Interleukin 
signaling pathway (P00036) Threonine biosynthesis (P02781) Interferon-gamma signaling pathway (P00035) Ras Pathway (P04393) Adenine and hypoxanthine salvage pathway Huntington disease (P00029) (P02723) Integrin signalling pathway (P00034) Heterotrimeric G-protein signaling pathway (P00028) Vitamin B6 metabolism (P02787) p38 MAPK pathway (P05918) Protein kinase B signaling cascade (P00033) Heterotrimeric Gi & Gs protein signaling pathway (P00026) Vitamin B6 biosynthesis (P02786) Glycolysis (P00024) ATP synthesis (P02721) Succinate to proprionate conversion (P02777) MAP kinase cascade (P00032) General transcription regulation (P00023) p53 pathway feedback loops 2 (P04398) Serine glycine biosynthesis (P02776) Inflammation mediated signaling pathway Salvage pyrimidine ribonucleotides (P02775) (P00031) p53 pathway by glucose deprivation (P04397) FGF signaling pathway (P00021) Tyrosine biosynthesis (P02784) Pyruvate metabolism (P02772) Hypoxia response via HIF activation (P00030) Endothelin signaling pathway (P00019)

Vitamin D metabolism and pathway (P04396) EGF receptor signaling pathway (P00018) Tryptophan biosynthesis (P02783) DNA replication (P00017) Vasopressin synthesis (P04395) Cytoskeletal regulation by Rho GTPase (P00016) Thyrotropin-releasing hormone signaling Circadian clock system (P00015) pathway (P04394) Threonine biosynthesis (P02781) Cholesterol biosynthesis (P00014) Ras Pathway (P04393) Proline biosynthesis (P02768) Oxytocin receptor mediated signaling pathway Cell cycle (P00013) (P04391) Huntington disease (P00029) B cell activation (P00010) Heterotrimeric Gq & G0 protein signaling Pentose phosphate pathway (P02762) pathway (P00027) p38 MAPK pathway (P05918) 5-Hydroxytryptamine degredation (P04372) Heterotrimeric Gi & Gs protein signaling pathway (P00026) Thiamin biosynthesis (P02779) Hedgehog signaling pathway (P00025) Glycolysis (P00024) General transcription regulation (P00023) Serine glycine biosynthesis (P02776) Nicotine pharmacodynamics pathway (P06587) Salvage pyrimidine ribonucleotides (P02775) FGF signaling pathway (P00021) Salvage pyrimidine deoxyribonucleotides

(P02774) Dopamine receptor mediated signaling pathway

(P05912) FAS signaling pathway (P00020) Pyruvate metabolism (P02772) Histamine H1 receptor mediated signaling pathway (P04385) Pyridoxal phosphate salvage pathway (P02770) Corticotrophin releasing factor receptor pathway

(P04380) Endothelin signaling pathway (P00019) EGF receptor signaling pathway (P00018) DNA replication (P00017) Cytoskeletal regulation by Rho GTPase (P00016) Cholesterol biosynthesis (P00014) Cell cycle (P00013) Phenylethylamine degradation (P02766) Cadherin signaling pathway (P00012) Pantothenate biosynthesis (P02761) 5HT2 type receptor mediated signaling pathway

(P04374) 5-Hydroxytryptamine degredation (P04372) 5-Hydroxytryptamine biosynthesis (P04371)


Appendix C : Plasmids used in this study.

Figure: Map of the pCambia1391z vector (https://www.markergene.com/).


Figure: Map of the pAbAi vector (Clontech).


Figure: Map of the pGADT7 AD vector (Clontech).


Figure: A map of pGA482 vector (A 1986).


Figure: A map of pGEX4T-1 vector. This vector was a gift from Kirsten Wolthers.


Appendix D : Media recipes used for N. benthamiana transformation and regeneration

a) Co-cultivation medium:
• 2.215 g L⁻¹ (half-strength) Murashige and Skoog salts with vitamins (PhytoTechnology Laboratories)
• 0.4 % Gellan gum (PhytoTechnology Laboratories)
• 3.0 % (w/v) sucrose (Fisher Scientific)
• 0.1 mg L⁻¹ indole butyric acid (Sigma Aldrich)
• 0.8 mg L⁻¹ 6-benzylaminopurine (Sigma Aldrich)
• pH adjusted to 5.7 and autoclaved
• 100 μM acetosyringone (PhytoTechnology Laboratories) (added after autoclaving)

b) Regeneration (shooting) medium:
• 4.43 g L⁻¹ Murashige and Skoog salts with vitamins
• 0.4 % Gellan gum
• 3.0 % (w/v) sucrose
• 0.1 mg L⁻¹ indole butyric acid
• 0.8 mg L⁻¹ 6-benzylaminopurine
• pH adjusted to 5.7 and autoclaved
• 200 mg L⁻¹ Timentin (ticarcillin and clavulanate) (Gold Biotechnology, USA)
• 250 mg L⁻¹ cefotaxime (Gold Biotechnology, USA)
• 100 mg L⁻¹ kanamycin (Bio Basic Inc.)

c) Rooting medium:
• 4.43 g L⁻¹ Murashige and Skoog salts with vitamins
• 0.8 % Gellan gum
• 3.0 % (w/v) sucrose
• 0.5 mg L⁻¹ indole butyric acid
• pH adjusted to 5.7 and autoclaved
• 200 mg L⁻¹ Timentin (ticarcillin and clavulanate)
• 250 mg L⁻¹ cefotaxime
• 200 mg L⁻¹ kanamycin

Note: all antibiotics used in the regeneration and rooting media were added after autoclaving.
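
The recipes above are stated per litre, so preparing a smaller batch only requires scaling each amount by the batch volume. The following is a minimal sketch of that arithmetic for the co-cultivation medium. The function name `scale_recipe` and the dictionary layout are illustrative choices, not part of the original protocol, and the gellan gum percentage is assumed to be w/v (1 % = 10 g per litre), which the recipe does not state explicitly. Acetosyringone and the antibiotics are left out because they are added from stocks after autoclaving.

```python
# Per-litre amounts (in grams) copied from the co-cultivation recipe.
# Assumption: percentages are w/v, i.e. 1 % corresponds to 10 g per litre.
CO_CULTIVATION_G_PER_L = {
    "MS salts with vitamins": 2.215,   # half-strength
    "gellan gum": 0.4 * 10,            # 0.4 % (assumed w/v)
    "sucrose": 3.0 * 10,               # 3.0 % (w/v)
    "indole butyric acid": 0.1e-3,     # 0.1 mg per litre
    "6-benzylaminopurine": 0.8e-3,     # 0.8 mg per litre
}


def scale_recipe(g_per_l, volume_ml):
    """Return the grams of each component needed for volume_ml of medium."""
    litres = volume_ml / 1000.0
    return {name: conc * litres for name, conc in g_per_l.items()}


if __name__ == "__main__":
    # Example: amounts for a 500 mL batch, printed in milligrams.
    for name, grams in scale_recipe(CO_CULTIVATION_G_PER_L, 500).items():
        print(f"{name}: {grams * 1000:.3f} mg")
```

The same function can be reused for the regeneration and rooting media by supplying a dictionary with their per-litre amounts.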
