Purdue University Purdue e-Pubs

Open Access Dissertations Theses and Dissertations

Spring 2015 Paralogous genes in Arabidopsis thaliana contribute to diversified phenylpropanoid metabolism Li Yi Purdue University

Follow this and additional works at: https://docs.lib.purdue.edu/open_access_dissertations Part of the Biochemistry Commons

Recommended Citation Yi, Li, "Paralogous genes in Arabidopsis thaliana contribute to diversified phenylpropanoid metabolism" (2015). Open Access Dissertations. 612. https://docs.lib.purdue.edu/open_access_dissertations/612

This document has been made available through Purdue e-Pubs, a service of the Purdue University Libraries. Please contact [email protected] for additional information. Graduate School Form 30 Updated 1/15/2015

PURDUE UNIVERSITY GRADUATE SCHOOL Thesis/Dissertation Acceptance

This is to certify that the thesis/dissertation prepared

By YI LI

Entitled PARALOGOUS GENES IN ARABIDOPSIS THALIANA CONTRIBUTE TO DIVERSIFIED PHENYLPROPANOID METABOLISM

For the degree of Doctor of Philosophy

Is approved by the final examining committee:

CLINT CHAPPLE Chair THOMAS.J.KAPPOCK

JOSEPH OGAS

BRIAN DILKES

To the best of my knowledge and as understood by the student in the Thesis/Dissertation Agreement, Publication Delay, and Certification Disclaimer (Graduate School Form 32), this thesis/dissertation adheres to the provisions of Purdue University’s “Policy of Integrity in Research” and the use of copyright material.

Approved by Major Professor(s): CLINT CHAPPLE

Approved by: CLINT CHAPPLE 4/23/2015 Head of the Departmental Graduate Program Date

PARALOGOUS GENES IN ARABIDOPSIS THALIANA CONTRIBUTE TO

DIVERSIFIED PHENYLPROPANOID METABOLISM

A Dissertation

Submitted to the Faculty

of

Purdue University

by

Yi Li

In Partial Fulfillment of the

Requirements for the Degree

of

Doctor of Philosophy

August 2015

Purdue University

West Lafayette, Indiana ii

For my husband, colleague and best friend, Nickolas Alan Anderson iii

ACKNOWLEDGEMENTS

I thank Dr. Clint Chapple for being my mentor and training me the scientist

I am today. I also thank my present and past committee members: Dr. Joe

Kappock, Dr. Brian Dilkes, Dr. Joe Ogas and Dr. David Salt for their support and scientific discussions every step of my graduate career. I can only hope to pass on the passion for science, the perpetual curiosity and rigorous training you have shared with me to the next generation of scientists.

I thank my parents, Li Fanghong and Dong Lianqi, for their unconditional love, for supporting me to come to America alone at the age of 19 to pursue my dreams, and for never stopping believing in me. I thank my husband, Nickolas

Anderson, for sharing and continuing on this journey with me that is filled with happiness, dreams, love even hardship ever since the first day we met at Purdue.

I thank my parents-in-law: Russell and Janet Anderson, for taking me in as their own daughter and giving me all the love and support. I also thank my friends, without whom I would never have made it through: Luyen Nguyen, Aubrey and

Christopher Schilling, Heather Piscatelli, Jeong Im Kim, Jo Cusumano, Jing-Ke

Weng, Sirius Li, Nickolas Bonawitz, and many others to whom I am forever indebted. iv

PREFACE

Chapter 1 is a literature review of the modes of gene duplication and how the fixation of duplicated genes contributes to the chemical diversity of specialized metabolism. Portions of Chapter 2 are published in Science in 2012.

Figures listed in Chapter 2 under the published results are my contributions to the publication. Chapter 3 is to be submitted for publication in Plant Physiology.

Len Pysh generated the promoters of 4CLs into entry clone constructs. v

TABLE OF CONTENTS

Page

ABSTRACT ...... viii

CHAPTER 1. GENE DUPLICATION EVENTS CONTRIBUTE TO DIVERSIFIED

PLANT SPECIALIZED METABOLISM ...... 1

Introduction ...... 1

Modes and consequences of gene duplication ...... 2

Plant specialized metabolism ...... 5

Divergence from primary metabolism ...... 5

The glucosinolate biosynthetic pathway ...... 7

The phenylpropanoid biosynthetic pathway ...... 8

Terpene biosynthetic pathway ...... 10

Conclusions and future perspectives ...... 10

References ...... 12

Figures ...... 18

CHAPTER 2. ARABIDOPYRONES ARE NEWLY IDENTIFIED

PHENYLPROPANOIDS IN ARABIDOPSIS...... 24

Introduction ...... 24 vi

Page

Results ...... 26

Summary of published results ...... 26

CYP84A4 is required for AP biosynthesis ...... 26

APs are derived from PHE ...... 27

CYP84A4 generates caffealdehyde ...... 28

LIGB is recruited in AP biosynthesis ...... 29

Assembly of the ap biosynthetic pathway ...... 30

PAL ...... 30

C4H...... 31

4CL ...... 32

CCR ...... 32

CAD C and CAD D ...... 33

Identification of other genes of AP biosynthesis ...... 33

Discussion ...... 35

Materials and methods...... 37

References ...... 40

Figures ...... 43

CHAPTER 3. ISOFORMS OF ARABIDOPSIS 4CL HAVE OVERLAPPING YET

DISTINCT ROLES IN PHENYLPROPANOIND METABOLISM...... 55 vii

Page

Introduction ...... 55

Results ...... 58

Phylogenetic analysis of 4CL homologs ...... 58

Promoter gus analysis of four 4CLs ...... 59

4CL activity ...... 60

Plant growth and lignin biosynthesis ...... 60

Lignin composition ...... 61

Sinapoylmalate biosynthesis ...... 62

Atypical soluble phenylpropanoid metabolites ...... 62

4CL3 in flavonoid biosynthesis ...... 63

Discussion ...... 65

Divergence of 4CL isoforms ...... 65

4CL isoforms in soluble metabolism ...... 66

Growth defects and lignin deposition ...... 68

Materials and methods...... 71

References ...... 75

Table 3-1. List of sequences in phylogenetic tree ...... 81

Figures ...... 83

VITA ...... 91 viii

ABSTRACT

Li, Yi. Ph.D., Purdue University, August 2015. Paralogous genes in Arabidopsis thaliana contribute to diversified phenylpropanoid metabolism. Major Professor: Clint Chapple.

Significant evidence supports the idea that gene duplication drives the evolution of new gene function. Besides being silenced, duplicated genes can either neofunctionalize or subfunctionalize under selective pressure or neutral drift. Understanding the trajectory of how each gene is fixed and presumably provides added fitness remains difficult. Plant specialized metabolism provides an attractive platform to study the fixation of genes post duplication and how this process leads to the chemical diversity seen today. Specialized metabolites by definition are thought to be dispensable under normal growth conditions. Thus deleterious mutations occurring in paralogous genes that otherwise would be selected against in primary metabolism may be more tolerated in specialized metabolism. Using the Arabidopsis phenylpropanoid pathway as a model, in this thesis I describe two cases where neofunctionalization and subfunctionalization of duplicated genes contributed to metabolite diversification. The phenylpropanoid pathway intersects with primary metabolism at phenylalanine. Recently we identified a new set of phenylalanine derived compounds in Arabidopsis which we named arabidopyrones (APs) which include arabidopyl alcohol, iso-arabidopyl alcohol, arabidopic acid and iso-arabidopic acid. CYP84A4 is a paralog of CYP84A1, a well-characterized enzyme in the phenylpropanoid pathway, and CYP84A4 has neofunctionalized relative to its ancestral function. CYP84A4 3-hydroxylates p-coumaraldehyde, a phenylpropanoid intermediate, to generate caffealdehyde. Caffealdehyde can be ix used by a conserved ring cleavage dioxygenase, AtLigB, in a step required to make the heterocyclic APs. Understanding AP biosynthesis may provide a unique opportunity to learn the broader biological function of LigB homologs. To do so, we tested the hypothesis that enzymes and intermediates in the phenylpropanoid pathway leading to p-coumaraldehyde are involved in AP biosynthesis. The general phenylpropanoid pathway gives rise to p-coumaryl CoA via phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H) and 4-coumarate: CoA ligase (4CL). Cinnamoyl CoA reductase (CCR) then converts p-coumaryl CoA to p-coumaraldehyde, the substrate of CYP84A4. Through the analyses of mutants that are defective in these genes, stable- isotope labeling studies, and chemical complementation experiments, we conclude that the activities of these enzymes are required for AP biosynthesis. In addition, we found that cinnamyl alcohol dehydrogenase C and D (CAD C and D), known enzymes in later steps of the phenylpropanoid pathway, are involved in AP biosynthesis in that they may convert caffealdehyde to caffeoyl alcohol which then can be used by AtLigB to generate arabidopyl alcohol and arabidopic acid. Four isoforms of 4CLs have been identified in Arabidopsis. 4CL generates p-coumaryl CoA and caffeoyl CoA from their respective acids which are required for the major products of this pathway. Phylogenetic analysis reveals that 4CL1, 4CL2 and 4CL4 are more closely related to one another than to 4CL3. Promoter- GUS analysis shows that 4CL1 and 4CL2 are expressed in lignifying cells. In contrast, 4CL3 is expressed in a broad range of cell types, indicating that 4CLs have subfunctionalized with regard to expression patterns. We found that 4cl3 mutants have an over-all reduction in flavonoid biosynthesis, suggesting that 4CL3 has acquired a distinct role in phenylpropanoid metabolism. Sinapoylmalate, the major hydroxycinnamoyl ester found in Arabidopsis is greatly reduced in a 4cl1 4cl3 mutant, showing that 4CL1 and 4CL3 function redundantly in its biosynthesis. The 4cl1 4cl2 double mutant and the 4cl1 4cl2 4cl3 triple mutant are both dwarf and contain less lignin than wild type, indicating that 4CL1 and 4CL2 are important for plant growth and that 4CL3 has a role in lignin x biosynthesis in addition to its function in soluble metabolism. We could not find an important role for 4CL4 in any of the organs examined, consistent with its limited expression profile. Together, these data show that the four paralogs of 4CLs in Arabidopsis diverged in their expression patterns, resulting in their overlapping yet distinct roles in phenylpropanoid metabolism.

1

CHAPTER 1. GENE DUPLICATION EVENTS CONTRIBUTE TO DIVERSIFIED PLANT SPECIALIZED METABOLISM

Introduction

It has been proposed that gene duplication drives the evolution of new gene functions and traits (Ohno, 1970). Post gene duplication, there are three possible evolutionary fates for duplicated gene pairs. First, one copy of the gene can become nonfunctional by being silenced or converted to a pseudogene. Second, it may acquire a novel function (neofunctionalization) under purifying selection or neutral genetic drift, or third, the original function of the two duplicates may be divided (subfunctionalization) (Rensing, 2014) (Fig. 1-1). The majority of duplicated genes are thought to be eliminated within the first couple of million years post duplication (Lynch and Conery, 2000). The fixation of neofunctionalized and subfunctionalized genes can take as little as 12 million years (Pegueroles et al., 2013). Although there is a significant body of knowledge on the selective process post duplication, the trajectory of exactly how each gene is fixed and presumably provides added fitness remains difficult. In fact, only an estimated 1% of the four billion survived over the course of 3.5 billion years, leaving us a small fraction to sample in present day (Barnosky et al., 2011) and making it extremely difficult to accurately document the selection process. Plant metabolism provides an attractive platform with which to study the fixation of genes post duplication. The fixation of the genes in turn leads to the chemical diversity existing within the complex metabolic networks. are known to contain many low-molecular weight compounds estimated to number anywhere between 100,000 to 500,000 (Wink, 2010; Wu, 2010; Chu et al., 2011). Only a small number of these compounds are involved in primary metabolism. 2

The majority of these compounds is termed specialized metabolites because they are thought to be dispensable under normal growth conditions and their biosynthesis is often restricted to certain plant lineages, cell types or is inducible under certain conditions (Ananthakrishnan, 1999). Nonetheless, the biological functions of some of these specialized metabolites have been shown to be crucial for plant fitness and play key roles in many aspects of plant life including color, fragrance, flavor, toxicity, physical rigidity, among others. Primary metabolism is required for survival and deleterious mutations in genes involved in these pathways are presumably more easily purged and selected against, however, specialized metabolism is more tolerant of mutations. In this chapter, I summarize the modes of gene duplication events and the general consequences of these events, then discuss pathway-specific examples of neofunctionalized or subfunctionalized genes in plant specialized metabolism, their origins of duplication, and their contributions to over-all plant fitness.

Modes and consequences of gene duplication

Over 40 years ago it was proposed that gene duplication contributes to genetic diversity (Ohno, 1970). Most evidence now points to gene duplication events as the driving force for evolution which leads to new gene functions and ultimately the complexity of different life forms. This concept is widely accepted and many efforts have been made to further understand the mechanism behind this process. Whole-genome duplications (WGD) occurred in many lineages of plants, animals and fungi (Wolfe and Shields, 1997; Jaillon et al., 2004; Kellis et al., 2004; Aury et al., 2006; Paterson et al., 2010). WGD is widespread in plants in that all angiosperms have experienced at least one WGD (Paterson et al., 2004; Tang et al., 2008). The model organism Arabidopsis and the crop rice, for example, both have experienced at least two WGDs (Tang et al., 2008; Tang et al., 2010). An estimated 80% of Populus genes, 54% of Arabidopsis, 18% of Vitis and 11% of Carica genes were generated by WGDs (Wang et al., 2012). 3

In addition to WGDs, other modes of gene duplication exist that are collectively termed small scale duplications (SSDs) or single gene duplications (Maere et al., 2005; Freeling, 2009; Rensing, 2014). Tandem duplication generates gene copies that are consecutive in physical positions on the chromosome. Similarly, proximal duplicates are gene copies that are close to one another but separated by a few genes. These two modes of duplication are presumed to arise from unequal chromosomal crossover or localized transposon activities (Zhao et al., 1998; Freeling, 2009). In Arabidopsis and rice, 10% of the total genes are the result of tandem duplications (Rizzon et al., 2006). Dispersed duplicated genes are neither adjacent on the chromosome or within any segment of homologous chromosomes. Distant single gene transposition can explain the wide spread of the dispersed duplicated genes and can be achieved by either a DNA or RNA based mechanism (Freeling et al., 2008; Freeling, 2009). DNA transposons insert DNA fragments into a new location on the chromosome via a cut-and-paste mechanism. Examples of these include but are not limited to Mutator-like transposable elements (MULEs) that are first identified in maize but also found in rice and Arabidopsis, helitrons that are well characterized in rice and maize, and CACTA elements that are found in sorghum but the mechanism of how they function remains elusive (Jiang et al., 2004; Brunner et al., 2005; Paterson et al., 2009). RNA based transposed duplication, also known as retrotransposition, translocates gene to a new location via an RNA intermediate in which the genomic DNA is first transcribed into a messenger RNA which is then reverse transcribed back to DNA and subsequently inserted into a new location on the same or different chromosome. Since the native promoter is often lost during this transposition process, the fixation of the gene relies on insertion adjacent to a functional promoter (Brosius, 1991; Kaessmann et al., 2009). It is assumed that immediately after gene duplication events such as WGD and tandem gene duplication, pairs of genes are fully redundant. Population genetic theory suggests that fully redundant genes are not stable during evolution unless the extra copy of the gene proves beneficial (Tautz, 1992; 4

Wagner, 1998; Zhang et al., 2003). In fact, it has been shown that the majority of the duplicated genes are silenced within a few million years (Lynch and Conery, 2000). Although these silenced genes are no longer functional individually, it has been hypothesized that they collectively contribute to biodiversity by promoting origin of reproduction barriers (Lynch and Conery, 2000; Briggs et al., 2006). Based on numerous analyses, it appears that duplicated genes that are fixed within the population have either neofunctionalized or subfunctionalized. Subfunctionalization, either in the form of partial redundancy and/or changes in expression pattern, is the most common means of gene retention. In fruit flies and yeast, duplicated genes are expressed at significantly different developmental stages than their single copy counterparts (Gu et al., 2004). A mouse model study shows that in almost all cases of duplicated gene pairs, the novel daughter copy exhibits a higher sequence divergence rate compared to single copy genes, regardless of the modes of duplication origin (Pegueroles et al., 2013). In Arabidopsis, 57% of more recently duplicated pairs and 73% of older duplicates have diverged in expression (Blanc and Wolfe, 2004). 115 pairs of post WGD genes in rice were shown to have diverged in expression patterns in grain, leaf and root (Throude et al., 2009). This prediction may be over- estimating the role of subfunctionalization because genome-wide studies may not necessary identify neofunctionalization without detailed biochemical studies. The modes of duplication may affect the rate of gene retention. WGD and tandem duplicates are more likely to show low asymmetric expression, whereas genes that are translocated show higher gene expression divergence (Wang et al., 2012), presumably because transposed genes gain new promoters upon translocation. In a more detailed study using rice and Arabidopsis microarray data sets, the trends of divergence of expression between duplicates are that DNA/RNA based transposed duplication > proximal duplication > WGD/tandem duplication (Wang et al., 2011). Further, different modes of gene duplication may result in differential gene contents. For examples, WGD often retains genes related to transcription factors, protein kinases and ribosomal proteins, whereas 5 genes involved in DNA metabolism, defense, tRNA ligation are often silenced (Freeling, 2009; Rensing, 2014). It is thought that the rise of transcriptional complexity is correlated with the increased number of cell types leading to complex morphology (Wang et al., 2012). Tandem duplication regions are often enriched for genes encoding membrane proteins and proteins involved in abiotic and biotic stress but lack genes encoding transcription factors (Rizzon et al., 2006). Although it is still unclear why each gene duplication mode retains genes in a biased manner, two models may provide some insights (Freeling, 2009). First, the balanced gene drive model (Freeling and Thomas, 2006) states that genes within the same network or genes encoding protein that interact in a complex may be retained stoichiometrically post gene duplication. Second, the conserved gene retention model posits that genes that are conserved before gene duplication events tend to be retained (Davis and Petrov, 2004). Gene duplication events, regardless of the modes of duplication, contribute to increased genetic diversity through the small fraction that are retained in the genome by either neofunctionalization or subfunctionalization. Genome scale studies provide great insights on the fixation process during genomic evolution. Nonetheless, to differentiate if a pair of genes is neofunctionalized or subfunctionalized, more detailed biochemical characterizations are required on a case by case basis. Below I present case studies on how the neofunctionalization and subfunctionalization of duplicated genes involved in plant specialized metabolism contribute to overall plant fitness.

Plant specialized metabolism

Divergence from primary metabolism

The sequencing of plant genomes has revealed that they contain 20,000 to 60,000 genes, and 15-25% of these genes encode enzymes involved in specialized metabolism (Bevan et al., 1998; Somerville and Somerville, 1999; Swarbreck et al., 2008; Kawahara et al., 2013). These genes provide a unique 6 platform to study how neofunctionalization and subfunctionalization contribute to the genetic and chemical diversity because of the plasticity and lineage specificity of specialized metabolism. The most important step towards the evolution of specialized metabolism is the neofunctionalization of enzymes that divert metabolites from primary metabolism to specialized metabolism. For example, the ability of phenylalanine ammonia lyase (PAL) to de-aminate phenylalanine from primary metabolism led to phenylpropanoid metabolism (Heller et al., 1979; Weng, 2014) (Fig. 1-2). Crystallography studies revealed that PAL is a close relative of histidine ammonia lyase (HAL), which is involved in the first de-amination step of histidine degradation (Schwede et al., 1999). The divergence of PAL from HAL likely happened when fungi and plants diverged from other lineages (Ferrer et al., 2008). Similarly, terpene biosynthesis starts with the primary metabolites isopentenyl pyrophosphate and dimethylallyl pyrophosphate, the evolved ability of enzymes to use these substrates enabled the downstream terpene metabolites. There are multiple enzymes that can catalyze the first step of the reaction. Nonetheless, the fixation of these genes is a key requirement for the emergence of this pathway (Chen et al., 2011). A number of key gene duplication events have been documented in detail. Pyrrolizidine alkaloid biosynthesis requires homospermidine synthase (HSS) as the first enzyme and it has been shown to have duplicated and further neofunctionalized from deoxyhypusine synthase (DHS) which is required for the biosynthesis of hypusine, a structurally related post-translational modification required for the activation of the translation initiation factor 5A (Ober and Hartmann, 1999) (Fig. 1-3). The committed step in the biosynthesis of the benzoxazinoid 2, 4-dihydroxy-2H-1, 4-benzoixazin-3(4H)- one (DIBOA), an indole alkaloid involved in plant defense, requires BX1, an enzyme whose duplicated counterpart is tryptophan synthase (Grun et al., 2005). DIBOA is made in all the species that encode BX1, in species lacking BX1, gramine, a different downstream metabolite, is synthesized instead of DIBOA (Fig. 1-4). The mutual exclusiveness of BXI and gramine suggests that the 7 neofunctionalization of BX1 from primary metabolism enabled the pathway that makes DIBOA. The emergence of these entry point enzymes provided opportunities for the emergence of new metabolic pathways. Additional duplicated genes that co- evolved and are fixed within these metabolic pathways often provide plants with added fitness through defense against biotic and abiotic stress as discussed below.

The glucosinolate biosynthetic pathway

The glucosinolate biosynthetic pathway exemplifies how multiple gene duplications followed by neofunctionalization or subfunctionalization co-evolve to generate diversified gene functions and downstream metabolites. Glucosinolates represent a large group of Brassicale specific metabolites that are involved in herbivore defense (Rask et al., 2000) and are thought to have anti-cancer properties (Hecht, 2000). The majority of glucosinolates are derived from amino acid methionine which is first de-aminated, the product of which can then be condensed with acetyl CoA by methylthioalkylmalate synthase (MAM) up to six times (Textor et al., 2004). The products of each round of condensation reaction then enter the main glucosinolate biosynthetic pathway (Fig. 1-5). Two MAMs have been characterized in Arabidopsis thaliana Col-0 ecotype where MAM1 catalyzes the first and second round of condensation, and MAM3 catalyzes the later rounds of condensation (Textor et al., 2004; Benderoth et al., 2006; Textor et al., 2007). Phylogenetic analysis shows that MAM3 arose recently after the divergence of Arabidopsis from the other Brassica members (Benderoth et al., 2006; Textor et al., 2007) while MAM1 represents the ancestral copy of the duplicate. These data together show that MAM3 neofunctionalized post duplication by evolving new enzymatic activities that are crucial for the downstream products found only in Arabidopsis species. 8

Later in the main glucosinolate pathway, amino acids including phenylalanine, tryptophan and chain elongated methionine derivatives are converted to their respective oximes by the CYP79 family. Two paralogs of this family CYP79F1 and CYP79F2, which most likely arose from a tandem gene duplication event because they are only 1kb apart and encoded on the same strand on chromosome 1, have catalytic activity towards long chain elongated methionine derivatives, however, CYP79F1 shows additional activity towards short chain elongated methionine derivatives (Chen et al., 2003), suggesting that two genes have subfunctionalized (Fig. 1-5). The ability of these genes to use long chain elongated methionine derivatives coincides with the neofunctionalization of MAM3 to generate their substrates, leading to downstream glucosinolates. Further, the expression pattern of CYP79F1 and F2 also diverged in that CYP79F1 is primarily expressed in rosette, stem and silique while CYP79F2 is expressed in root and silique to an extent (Chen et al., 2003). The subfunctionalization of CYP79F1 and F2 dictates the accumulation of different glucosinolates in an organ specific manner, and the co-evolution of MAM3 ensured the emergence of long chain methionine derived glucosinolates (Fig.1-5).

The phenylpropanoid biosynthetic pathway

Phenylpropanoids are derived from L- phenylalanine (Weisshaar and Jenkins, 1998). The downstream products of the pathway include but are not limited to flavonoids, hydroxycinnamic acids and lignin. These specialized metabolites play key roles in protection from herbivory, microbial infection, and UV damage, as well as attraction of pollinators, signaling and mechanical support (Landry et al., 1995; Winkel-Shirley, 2001; Dixon, 2002). In this pathway, ancient and recent gene duplication events also contribute to the biochemical diversity. There are two examples of neofunctionalization in which both substrate specificity and expression profiles 9 have diverged. Matsumo reported in 2009 that an RNA-based transposition duplication event of CYP98A3, which encodes an enzyme required for lignin and hydroxycinnamic acid biosynthesis, was followed by a tandem duplication event to give rise to CYP98A8 and CYP98A9 (Matsuno et al., 2009). By nature of the duplication event, the promoter was swapped upon transposition duplication, and in this case, the new promoters of CYP98A8 and A9 drive strong expression in pollen. In addition, the two genes neofunctionalized and now encode proteins that use different substrates. CYP98A3 uses p-coumaroyl shikimate, the product of which is required for the biosynthesis lignin and hydroxycinnamic acids, which accumulate in the inflorescence and leaves (Schoch et al., 2001). In contrast, CYP98A8 and A9 generates tricaffeoyl spermidine from tricoumaroyl spermidine, leading to the biosynthesis of N1,N5-di-(hydroxyferuoyl)-N10-sinapoyl spermidine, which is only found in the pollen (Fig. 1-2). Another example can be found in the Fabaceae family, in which the legume-microbe symbiosis is critical for plant fitness and legume-specific flavonoids such as vestitol mediate this interaction. The biosynthesis of these flavonoids relies on the ability of chalcone isomerase (CHI) to accept 6’- deoxychalcone. CHI in non-legume plant species exclusively uses 6’- hydroxychalcone. A tandem duplication event in the Fabaceae family led to the neofunctionalization of CHI2 which can take both 6’-deoxychalcone and 6’- hydroxychalone as substrate, enabling the biosynthesis of necessary metabolites to provide an opportunity to improve plant fitness via microbial interactions (Shimada et al., 2003) whereas the other copy of the gene CHI1 encodes an enzyme that still uses 6’-hydroxychalcone for the biosynthesis of other flavonoids (Fig. 1-2). The evolution of CHI2 represented an important step in the establishment of symbioses in the legume family.

10

Terpene biosynthetic pathway

Terpenes comprise the largest class of plant specialized metabolites, which are known for their roles in defense against pathogens and herbivores. Volatiles such as monoterpenes and sesquiterpenes are thought to attract or deter pollinators and herbivores (Dudareva et al., 2003; Gershenzon and Dudareva, 2007). Terpene synthases can convert either C10 intermediate geranyl diphosphate (GPP) to monoterpenes or C15 intermediate farnesyl diphosphate (FPP) to sesquiterpenes (Bohlmann and Croteau, 1999). There are over 30 potential terpene synthases in Arabidopsis, three of them At3g25810, At3g25820 and At3g25830 are tandem duplicates. At3g25830 and At3g25820 are expressed highly in the roots, whereas, At3g25810 is expressed in the flowers. At3g25830 and At3g25820 use GPP to make a major product 1,8- cineole and nine other minor products (Chen et al., 2004). At3g25810, reported to be involved in myrcene and (E)-β-ocimene biosynthesis, cannot generate 1,8- cineole, but can make the other nine minor products (Chen et al., 2003; Kollner et al., 2004) (Fig. 1-6). Although the three enzymes have some overlapping catalytic abilities, the proportions of their reaction products as well as the tissue distribution of these products have largely diverged. This tandem gene duplication event led to the neofunctionalization of both enzyme product and expression pattern, which then contributed to the overall diversity of terpenes.

Conclusions and future perspectives

The most common mechanism underlying the rise of complex specialized metabolite pathway is that in most reported cases of tandem gene duplication, one of the duplicated genes neofunctionalizes to gain additional biochemical function and/or a different expression pattern, generating downstream metabolites that potentially provide improved plant fitness. It is also possible that changes in expression patterns, rising from subfunctionalization, can also contribute to improved fitness. Future biochemical studies may reveal more 11 cases of how neofunctionalization and subfunctionalization contribute to diversified specialized metabolism. Improved technological advances may even further our analysis beyond tissue specificity, perhaps at the cell localization level where changes in the localization of proteins encoded by duplicated genes may affect the overall architecture of metabolic pathways.

12

References

Ananthakrishnan TN (1999) Induced responses, signal diversity and plant defense: Implications in insect phytophagy. Current Science 76: 285-290 Aury JM, Jaillon O, Duret L, Noel B, Jubin C, Porcel BM, Segurens B, Daubin V, Anthouard V, Aiach N, Arnaiz O, Billaut A, Beisson J, Blanc I, Bouhouche K, Camara F, Duharcourt S, Guigo R, Gogendeau D, Katinka M, Keller AM, Kissmehl R, Klotz C, Koll F, Le Mouel A, Lepere G, Malinsky S, Nowacki M, Nowak JK, Plattner H, Poulain J, Ruiz F, Serrano V, Zagulski M, Dessen P, Betermier M, Weissenbach J, Scarpelli C, Schachter V, Sperling L, Meyer E, Cohen J, Wincker P (2006) Global trends of whole-genome duplications revealed by the ciliate Paramecium tetraurelia. Nature 444: 171-178 Barnosky AD, Matzke N, Tomiya S, Wogan GO, Swartz B, Quental TB, Marshall C, McGuire JL, Lindsey EL, Maguire KC, Mersey B, Ferrer EA (2011) Has the Earth's sixth mass extinction already arrived? Nature 471: 51-57 Benderoth M, Textor S, Windsor AJ, Mitchell-Olds T, Gershenzon J, Kroymann J (2006) Positive selection driving diversification in plant secondary metabolism. Proc Natl Acad Sci U S A 103: 9118-9123 Bevan M, Bancroft I, Bent E, Love K, Goodman H, Dean C, Bergkamp R, Dirkse W, Van Staveren M, Stiekema W, Drost L, Ridley P, Hudson SA, Patel K, Murphy G, Piffanelli P, Wedler H, Wedler E, Wambutt R, Weitzenegger T, Pohl TM, Terryn N, Gielen J, Villarroel R, De Clerck R, Van Montagu M, Lecharny A, Auborg S, Gy I, Kreis M, Lao N, Kavanagh T, Hempel S, Kotter P, Entian KD, Rieger M, Schaeffer M, Funk B, Mueller-Auer S, Silvey M, James R, Montfort A, Pons A, Puigdomenech P, Douka A, Voukelatou E, Milioni D, Hatzopoulos P, Piravandi E, Obermaier B, Hilbert H, Dusterhoft A, Moores T, Jones JD, Eneva T, Palme K, Benes V, Rechman S, Ansorge W, Cooke R, Berger C, Delseny M, Voet M, Volckaert G, Mewes HW, Klosterman S, Schueller C, Chalwatzis N (1998) Analysis of 1.9 Mb of contiguous sequence from chromosome 4 of Arabidopsis thaliana. Nature 391: 485- 488 Blanc G, Wolfe KH (2004) Functional divergence of duplicated genes formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679-1691 Bohlmann J, Croteau R (1999) Diversity and variability of terpenoid defences in conifers: molecular genetics, biochemistry and evolution of the terpene synthase gene family in grand fir (Abies grandis). Novartis Found Symp 223: 132-145; discussion 146-139 Briggs GC, Osmont KS, Shindo C, Sibout R, Hardtke CS (2006) Unequal genetic redundancies in Arabidopsis--a neglected phenomenon? Trends Plant Sci 11: 492-498 Brosius J (1991) Retroposons - Seeds of Evolution. Science 251: 753-753 13

Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A (2005) Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17: 343- 360 Chen F, Ro DK, Petri J, Gershenzon J, Bohlmann J, Pichersky E, Tholl D (2004) Characterization of a root-specific Arabidopsis terpene synthase responsible for the formation of the volatile monoterpene 1,8-cineole. Plant Physiol 135: 1956-1966 Chen F, Tholl D, Bohlmann J, Pichersky E (2011) The family of terpene synthases in plants: a mid-size family of genes for specialized metabolism that is highly diversified throughout the kingdom. Plant J 66: 212-229 Chen F, Tholl D, D'Auria JC, Farooq A, Pichersky E, Gershenzon J (2003) Biosynthesis and emission of terpenoid volatiles from Arabidopsis flowers. Plant Cell 15: 481-494 Chen S, Glawischnig E, Jorgensen K, Naur P, Jorgensen B, Olsen CE, Hansen CH, Rasmussen H, Pickett JA, Halkier BA (2003) CYP79F1 and CYP79F2 have distinct functions in the biosynthesis of aliphatic glucosinolates in Arabidopsis. Plant J 33: 923-937 Chu HY, Wegel E, Osbourn A (2011) From hormones to secondary metabolism: the emergence of metabolic gene clusters in plants. Plant J 66: 66-79 Davis JC, Petrov DA (2004) Preferential duplication of conserved proteins in eukaryotic genomes. PLoS Biol 2: E55 Dixon A (2002) The phenylpropanoid pathway and plant defense- a genomics perspective. Molecular Plant Pathology 3: 371-390 Dudareva N, Martin D, Kish CM, Kolosova N, Gorenstein N, Faldt J, Miller B, Bohlmann J (2003) (E)-beta-ocimene and myrcene synthase genes of floral scent biosynthesis in snapdragon: function and expression of three terpene synthase genes of a new terpene synthase subfamily. Plant Cell 15: 1227-1241 Ferrer JL, Austin MB, Stewart C, Jr., Noel JP (2008) Structure and function of enzymes involved in the biosynthesis of phenylpropanoids. Plant Physiol Biochem 46: 356-370 Freeling M (2009) Bias in plant gene content following different sorts of duplication: tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol 60: 433-453 Freeling M, Lyons E, Pedersen B, Alam M, Ming R, Lisch D (2008) Many or most genes in Arabidopsis transposed after the origin of the order Brassicales. Genome Res 18: 1924-1937 Freeling M, Thomas BC (2006) Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res 16: 805-814 Gershenzon J, Dudareva N (2007) The function of terpene natural products in the natural world. Nat Chem Biol 3: 408-414

14

Grun S, Frey M, Gierl A (2005) Evolution of the indole alkaloid biosynthesis in the genus : distribution of gramine and DIBOA and isolation of the benzoxazinoid biosynthesis genes from Hordeum lechleri. Phytochemistry 66: 1264-1272 Gu Z, Rifkin SA, White KP, Li WH (2004) Duplicate genes increase gene expression diversity within and between species. Nat Genet 36: 577-579 Hecht SS (2000) Inhibition of carcinogenesis by isothiocyanates. Drug Metab Rev 32: 395-411 Heller W, Egin-Buhler B, Gardiner SE, Knobloch KH, Matern U, Ebel J, Hahlbrock K (1979) Enzymes of general phenylpropanoid metabolism and of flavonoid glycoside biosynthesis in parsley: Differential inducibility by light during the growth of cell suspension cultures. Plant Physiol 64: 371-373 Jaillon O, Aury JM, Brunet F, Petit JL, Stange-Thomann N, Mauceli E, Bouneau L, Fischer C, Ozouf-Costaz C, Bernot A, Nicaud S, Jaffe D, Fisher S, Lutfalla G, Dossat C, Segurens B, Dasilva C, Salanoubat M, Levy M, Boudet N, Castellano S, Anthouard V, Jubin C, Castelli V, Katinka M, Vacherie B, Biemont C, Skalli Z, Cattolico L, Poulain J, De Berardinis V, Cruaud C, Duprat S, Brottier P, Coutanceau JP, Gouzy J, Parra G, Lardier G, Chapple C, McKernan KJ, McEwan P, Bosak S, Kellis M, Volff JN, Guigo R, Zody MC, Mesirov J, Lindblad-Toh K, Birren B, Nusbaum C, Kahn D, Robinson-Rechavi M, Laudet V, Schachter V, Quetier F, Saurin W, Scarpelli C, Wincker P, Lander ES, Weissenbach J, Roest Crollius H (2004) Genome duplication in the teleost fish Tetraodon nigroviridis reveals the early vertebrate proto- karyotype. Nature 431: 946-957 Jiang N, Bao ZR, Zhang XY, Eddy SR, Wessler SR (2004) Pack-MULE transposable elements mediate gene evolution in plants. Nature 431: 569- 573 Kaessmann H, Vinckenbosch N, Long MY (2009) RNA-based gene duplication: mechanistic and evolutionary insights. Nature Reviews Genetics 10: 19-31 Kawahara Y, de la Bastide M, Hamilton JP, Kanamori H, McCombie WR, Ouyang S, Schwartz DC, Tanaka T, Wu J, Zhou S, Childs KL, Davidson RM, Lin H, Quesada-Ocampo L, Vaillancourt B, Sakai H, Lee SS, Kim J, Numa H, Itoh T, Buell CR, Matsumoto T (2013) Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice (N Y) 6: 4 Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617-624 Kollner TG, Schnee C, Gershenzon J, Degenhardt J (2004) The variability of sesquiterpenes emitted from two Zea mays cultivars is controlled by allelic variation of two terpene synthase genes encoding stereoselective multiple product enzymes. Plant Cell 16: 1115-1131 15

Landry LG, Chapple CC, Last RL (1995) Arabidopsis mutants lacking phenolic sunscreens exhibit enhanced ultraviolet-B injury and oxidative damage. Plant Physiol 109: 1159-1166 Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155 Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y (2005) Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci U S A 102: 5454-5459 Matsuno M, Compagnon V, Schoch GA, Schmitt M, Debayle D, Bassard JE, Pollet B, Hehn A, Heintz D, Ullmann P, Lapierre C, Bernier F, Ehlting J, Werck-Reichhart D (2009) Evolution of a novel phenolic pathway for pollen development. Science 325: 1688-1692 Ober D, Hartmann T (1999) Homospermidine synthase, the first pathway- specific enzyme of pyrrolizidine alkaloid biosynthesis, evolved from deoxyhypusine synthase. Proc Natl Acad Sci U S A 96: 14777-14782 Ohno S (1970) Evolution by gene duplication. Allen & Unwin; Springer-Verlag, London, New York Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang HB, Wang XY, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang LF, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob-ur-Rahman, Ware D, Westhoff P, Mayer KFX, Messing J, Rokhsar DS (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551-556 Paterson AH, Bowers JE, Chapman BA (2004) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci U S A 101: 9903-9908 Paterson AH, Freeling M, Tang H, Wang X (2010) Insights from the comparison of plant genome sequences. Annu Rev Plant Biol 61: 349-372 Pegueroles C, Laurie S, Alba MM (2013) Accelerated evolution after gene duplication: a time-dependent process affecting just one copy. Mol Biol Evol 30: 1830-1842 Rask L, Andreasson E, Ekbom B, Eriksson S, Pontoppidan B, Meijer J (2000) Myrosinase: gene family evolution and herbivore defense in Brassicaceae. Plant Mol Biol 42: 93-113 Rensing SA (2014) Gene duplication as a driver of plant morphogenetic evolution. Curr Opin Plant Biol 17: 43-48 Rizzon C, Ponger L, Gaut BS (2006) Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLoS Comput Biol 2: e115

16

Schoch G, Goepfert S, Morant M, Hehn A, Meyer D, Ullmann P, Werck- Reichhart D (2001) CYP98A3 from Arabidopsis thaliana is a 3'- hydroxylase of phenolic esters, a missing link in the phenylpropanoid pathway. J Biol Chem 276: 36566-36574 Schwede TF, Retey J, Schulz GE (1999) Crystal structure of histidine ammonia- lyase revealing a novel polypeptide modification as the catalytic electrophile. Biochemistry 38: 5355-5361 Shimada N, Aoki T, Sato S, Nakamura Y, Tabata S, Ayabe S (2003) A cluster of genes encodes the two types of chalcone isomerase involved in the biosynthesis of general flavonoids and legume-specific 5- deoxy(iso)flavonoids in Lotus japonicus. Plant Physiol 131: 941-951 Somerville C, Somerville S (1999) Plant functional genomics. Science 285: 380-383 Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E (2008) The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res 36: D1009-1014 Tang H, Bowers JE, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320: 486-488 Tang H, Bowers JE, Wang X, Paterson AH (2010) Angiosperm genome comparisons reveal early polyploidy in the monocot lineage. Proc Natl Acad Sci U S A 107: 472-477 Tang H, Wang X, Bowers JE, Ming R, Alam M, Paterson AH (2008) Unraveling ancient hexaploidy through multiply-aligned angiosperm gene maps. Genome Res 18: 1944-1954 Tautz D (1992) Redundancies, development and the flow of information. Bioessays 14: 263-266 Textor S, Bartram S, Kroymann J, Falk KL, Hick A, Pickett JA, Gershenzon J (2004) Biosynthesis of methionine-derived glucosinolates in Arabidopsis thaliana: recombinant expression and characterization of methylthioalkylmalate synthase, the condensing enzyme of the chain- elongation cycle. Planta 218: 1026-1035 Textor S, de Kraker JW, Hause B, Gershenzon J, Tokuhisa JG (2007) MAM3 catalyzes the formation of all aliphatic glucosinolate chain lengths in Arabidopsis. Plant Physiol 144: 60-71 Throude M, Bolot S, Bosio M, Pont C, Sarda X, Quraishi UM, Bourgis F, Lessard P, Rogowsky P, Ghesquiere A, Murigneux A, Charmet G, Perez P, Salse J (2009) Structure and expression analysis of rice paleo duplications. Nucleic Acids Res 37: 1248-1259 Wagner A (1998) The fate of duplicated genes: loss or new function? Bioessays 20: 785-788 Wang Y, Wang X, Paterson AH (2012) Genome and gene duplications and gene expression divergence: a view from plants. Ann N Y Acad Sci 1256: 1-14 17

Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH (2011) Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS One 6: e28150 Weisshaar B, Jenkins GI (1998) Phenylpropanoid biosynthesis and its regulation. Curr Opin Plant Biol 1: 251-257 Weng JK (2014) The evolutionary paths towards complexity: a metabolic perspective. New Phytol 201: 1141-1149 Wink M (2010) Biochemistry of plant secondary metabolism, Ed 2nd. Wiley- Blackwell, Chichester, West Sussex, U.K. ; Ames, Iowa Winkel-Shirley B (2001) Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol 126: 485-493 Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708-713 Wu J, and Baldwin, I.T. (2010) New insights into plant responses to the attack from insect herbivores. Annu Rev Genet 44: 1-24 Zhang P, Gu Z, Li WH (2003) Different evolutionary patterns between young duplicate genes in the human genome. Genome Biol 4: R56 Zhao XP, Si Y, Hanson RE, Crane CF, Price HJ, Stelly DM, Wendel JF, Paterson AH (1998) Dispersed repetitive DNA has spread to new genomes since polyploid formation in cotton. Genome Res 8: 479-492

18

Figures

Figure 1-1. Gene fates post duplication. Most genes are nonfunctionalized by silencing or becoming pseudogenes (shown in white at left). The genes that are retained either neofunctionalize or subfunctionalize. P, promoter. CDS, coding regions. Color change in P represents changes in promoter regions that lead to differential gene expression. Green in CDS represents new function. Lighter blues in CDS represent sub-divided gene functions.

19

Figure 1-2. The phenylpropanoid pathway and primary metabolism. The first step of the phenylpropanoid pathway catalyzed by phenylalanine ammonia lyase (PAL) is likely the result of a duplication of histidine ammonia lyase (HAL) from primary metabolism. The neofunctionalization of chalcone isomerase 2 (CHI2) from CHI1 is important in generating legume-specific flavonoids. The neofunctionalization of CYP98A8/9 led to the biosynthesis of pollen specific metabolites in Arabidopsis. Solid arrows are single step reactions. Dotted arrows represent multiple steps. 20

Figure 1-3. Neofunctionalization of homospermidine synthase (HSS) from deoxyhypusine synthase (DHS). The neofunctionalization of HSS from DHS in protein primary metabolism enabled the biosynthesis of a specific pyrrolizidine alkaloid. Solid arrows are single step reactions. Dotted arrow represents multiple steps.

21

Figure 1-4. Neofunctionalization of BX1 from tryptophan synthase. The neofunctionalization of BX1 from tryptophan biosynthesis in primary metabolism established the biosynthesis of benzoxazinoid 2, 4-dihydroxy-2H-1, 4-benzoixazin-3(4H)-one (DIBOA), an indole alkaloid. Solid arrows are single step reactions. Dotted arrow represents multiple steps. 22

Figure 1-5. The methionine-derived glucosinolate pathway. Methylthioalkylmalate synthase 1 (MAM1) and MAM3 neofunctionalized to generate long chain and short chain elongated methionine precursors. Then the subfunctionalized CYP79F2 enabled the later biosynthesis of downstream longer chain elongated methionine glucosinolates. Solid arrows are single step reactions. Dotted arrows indicate multiple steps. 23

Figure 1-6. The role of At3g25810, At3g25820 and At3g25830 in terpene biosynthesis. The three tandem duplicated genes have subfunctionalized in that although all three encode enzymes that can produce 9 minor products, their major products have diverged: At3g25810 produces myrcene and (E)-β-ocimene while the other two make 1,8-cineole as the major product. Dotted arrows indicate multiple steps.

24

CHAPTER 2. ARABIDOPYRONES ARE NEWLY IDENTIFIED PHENYLPROPANOIDS IN ARABIDOPSIS

Introduction

It is estimated that more than 500,000 specialized metabolites are present in the plant kingdom (Wu, 2010). Unlike primary metabolites, the range of distribution of these specialized compounds can range from widespread to highly species specific. For example, the phenylpropanoid pathway is thought to be common to all vascular plants, although some end products of this pathway are only synthesized in certain species. Phenylpropanoids are a group of natural products derived from L- phenylalanine (Weisshaar and Jenkins, 1998). The downstream products of the pathway include flavonoids, hydroxycinnamic acids and lignin among others. These specialized metabolites play a key role in processes such as protection from herbivory, microbial infection, UV damage, attraction of pollinators, signaling and mechanical support (Landry et al., 1995; Winkel-Shirley, 2001; Dixon, 2002). Sinapoylmalate, an end metabolite of the pathway that serves as a UV protectant, is found primarily in members of the Brassicaceae (Chapple et al., 1992; Landry et al., 1995). Using the UV-fluorescent nature of sinapoylmalate, a number of reduced epidermal fluorescence (ref) mutants were identified in Arabidopsis (Ruegger and Chapple, 2001), including the ferulic acid hydroxylase 1 (fah1) mutant (Chapple et al., 1992). The fah1 mutant is defective in the gene encoding ferulate 5- hydroxylase (F5H or CYP84A1), one of the three P450-dependent monooxygenases required for lignin biosynthesis. In a recent study we investigated the role of a paralog of CYP84A1, CYP84A4, in Arabidopsis (Weng et al., 2012). We found that CYP84A4 has diverged from its ancestral 5- 25 hydroxylase function and has acquired greatly enhanced 3-hydroxylase activity. Most species contain only one copy of the CYP84A subfamily gene whereas both CYP84A4 and CYP84A1 are present in the Arabidopsis genus, suggesting that one of these paralogs may have evolved recently. We also found that CYP84A4 is required for the biosynthesis of a set of four phenylalanine-derived α-pyrone compounds: arabidopyl alcohol, iso-arabidopyl alcohol, arabidopic acid and iso- arabidopic acid, which together we have named arabidopyrones (APs) (Weng et al., 2012). Like sinapate esters, the distribution of APs is restricted. To date, these compounds have only been found in Arabidopsis, possibly due to the recent neofunctionalization of CYP84A4. Although the biological function of APs is still unclear, some pyrone or pyrone substituted compounds have been shown to be cytotoxic (Shiono, 2010) and others can be chemically activated upon irradiation with UV light (Legrand et al., 2010). In plants, there are other known metabolites that include a nitrogen- or oxygen-containing heterocyclic moiety similar to that of APs. For example, betalamic acid is found in Portulaca grandiflora as a precursor to pigments (Christinet et al., 2004) and stizolobinic acid and stizolobic acid are found in Stizolobium hassjoo (Saito and Komamine, 1976). A common theme among these metabolites is that their synthesis involves an extradiol ring-cleavage reaction of the catechol-substituted precursor dihydroxyphenylalanine (DOPA), the products of which rearrange to form the heterocyclic ring found in the final metabolites. Similarly, we have found that the only extra-diol ring-cleavage dioxygenase gene (AtLigB) in Arabidopsis is required for AP biosynthesis (Weng et al., 2012). AtLigB and its homologs are well-conserved within the plant kingdom, suggesting that AtLigB may play a broader role in plant metabolism. The recent neofunctionalization of CYP84A4 in Arabidopsis led us to speculate that APs are produced in Arabidopsis because CYP84A4 generates a new metabolite that can be acted upon enzymes from a pre-existing pathway or pathways, including AtLigB. This hypothesis is supported by the fact that APs can accumulate in tissues where APs are not normally detected when CYP84A4 26 is over-expressed ectopically (Weng et al., 2012). We propose that we could learn more about phenylpropanoid metabolism in various plant species by dissecting the steps of AP biosynthesis. Perhaps more importantly, understanding AP biosynthesis in Arabidopsis may provide a unique opportunity to learn the broader biological function of AtLigB, and the pathway in which AtLigB otherwise participates.

Results

Summary of published results

CYP84A4 is required for AP biosynthesis

CYP84A4 is a paralog of CYP84A1 in Arabidopsis (Fig. 2-1). We have shown that CYP84A1 and CYP84A4 have distinct catalytic functions through a number of experiments. First, we identified a homozygous CYP84A4 T-DNA insertional mutant Salk_064406, which we later named apd1 for arabidopyrone deficient 1. While CYP84A1 mutants fluoresce red under UV light due to their inability to synthesize the fluorescent specialized metabolite sinapoylmalate, plants homozygous for apd1 do not, suggesting that CYP84A1 and CYP84A4 are not redundant genes. Second, to test if the non-redundancy of CYP84A1 and CYP84A4 is due to the tissue specificity of their expression, we over-expressed CYP84A4 under the control of the cinnamate 4-hydroxylase (C4H) promoter and used this construct to transform CYP84A1-deficient fah1 plants. When driving CYP84A1 expression, the C4H promoter is known to lead to complementation of the fah1 phenotype. If CYP84A4 catalyzes the same chemical reaction as CYP84A1, then fah1 plants expressing CYP84A4 under the C4H promoter should be complemented for the fah1 phenotype; however, we found that they were not, supporting the hypothesis that CYP84A4 has different catalytic properties than CYP84A1. Finally, we heterologously expressed CYP84A4 in vitro. In these experiments, CYP84A4 did not show detectable activity towards 27 either coniferyl alcohol or coniferyl aldehyde, both of which are substrates of CYP84A1. Together these data indicated that CYP84A4 biochemical function is distinct from that of CYP84A1. To learn more about the biochemical function of CYP84A4, we analyzed the soluble metabolites of apd1 plants and compared them to those of wild-type plants. Four unknown compounds were found in wild-type young stem tissues that were absent in apd1 plants. Furthermore, when apd1 plants were transformed with a construct encoding CYP84A4, the accumulation of the four compounds was restored whereas an experiment using apd1 plants transformed with a construct encoding CYP84A1 did not restore their accumulation. These data indicated that CYP84A4 is required for the biosynthesis of the four unknown compounds and provided further evidence that the catalytic function of CYP84A4 and CYP84A1 are distinct. The four unknown compounds were then HPLC- purified and analyzed by LC-MS and the structures of the four compounds were identified by MS and NMR. We named these newly discovered pyrone compounds arabidopyrones: arabidopyl alcohol, iso-arabidopyl alcohol, arabidopic acid and iso-arabidopic acid. The structure of arabidopyl alcohol was confirmed by synthesis. Finally, when the C4H-CYP84A4 fah1 plants described above were analyzed, it was found that arabidopyrones over-accumulated in stem tissues and also appeared in leaf tissues, suggesting that CYP84A4 may be the rate-limiting step of arabidopyrone biosynthesis.

APs are derived from PHE

Based upon work on related substances in other plants, we speculated that a compound with a catechol substitution pattern, such as dihydroxyphenylalanine (DOPA), is most likely to be an intermediate in AP biosynthesis. DOPA is derived from phenylalanine and/or tyrosine in vertebrates and only from tyrosine in plants (Joy et al., 1995). To test if either tyrosine or phenylalanine is the precursor of arabidopyrones, we grew Arabidopsis seedlings over-expressing CYP84A4 on media containing either deuterium ring-labeled 28 phenylalanine (2H-5-PHE) or tyrosine (2H-4-TYR) and harvested seedlings four days after germination. Arabidopyl alcohol was then HPLC-purified and analyzed by LC-MS. The molecular mass of arabidopyl alcohol is 196, so negative ion mode mass-spectrometry detects its M-1 form with an m/z ratio of 195 (Fig. 2-2). In unlabeled PHE- and TYR-fed seedlings as well as 2H-4-TYR -fed seedlings, LC-MS analysis detected arabidopyl alcohol with an m/z ratio of 195. In contrast, parallel analysis of extracts of 2H-5-PHE -fed seedlings revealed that arabidopyl alcohol from these plants had an m/z ratio of 197 (M-1+2), suggesting that phenylalanine is the precursor of arabidopyl alcohol and that two deuteriums from 2H-5-PHE are retained from the labeled precursor. Furthermore, the three main ring-derived fragments generated from arabidopyl alcohol are increased in mass by 2 units, indicating that the heterocyclic ring structure of arabidopyl alcohol originate from the phenyl ring of phenylalanine. Although it might be assumed that all nine carbons of arabidopyl alcohol are derived from the nine carbons of phenylalanine, there are examples of phenylalanine-derived plant metabolites in which the phenylalanine side-chain is shortened and then rebuilt from another source (Krizevski et al., 2010). To test the hypothesis that the ring and the side-chain from phenylalanine remain intact in AP biosynthesis, we repeated the feeding experiment with equal concentrations of 13C-1-PHE and 2H-5-PHE. We found equal ratio of arabidopyl alcohol derived from 13C-1-PHE and 2H-5-PHE, indicating that the side-chain of phenylalanine remains intact during AP biosynthesis (Fig. 2-3).

CYP84A4 generates caffealdehyde

Known metabolic routes that lead to heterocyclic compounds like APs often involve enzyme-catalyzed ring-cleavage reactions. For example, in Portulaca grandiflora, dihydroxyphenylalanine (DOPA) is cleaved by an extradiol ring-cleavage dioxygenase and the product of this reaction non-enzymatically rearranges to form betalamic acid (Christinet et al., 2004). Similarly, stizolobinic acid and stizolobic acid found in Stizolobium hassjoo (Saito and Komamine, 1976) 29 are synthesized from DOPA via a catechol dioxygenase. The common theme of the biosynthesis of these metabolites involves an extradiol ring-cleavage reaction of the catechol substituted precursor dihydroxyphenylalanine (DOPA). Knowing CYP84A1 catalyzes a 5-hydroxylation reaction, we speculated that its paralog, CYP84A4, catalyzes a 3-hydroxylation reaction that produces a 3,4- dihydroxylated intermediate that can be used by a ring-cleavage dioxygenase. To test this hypothesis, CYP84A4 was expressed in the WAT11 yeast strain and microsomal fractions were isolated for enzyme assays (Pompon et al., 1996). Various phenyl or hydroxy-phenyl substituted acids, aldehydes and alcohols were tested. CYP84A4 only showed 3-hydroxylation activity toward p- coumaraldehyde to form caffealdehyde with a Km in the hundred millimolar range. To test if caffealdehyde is involved in AP biosynthesis, wild-type plants and apd1-1 (CYP84A4 T-DNA insertional null) plants were grown on media containing either caffealdehyde, caffeic acid or caffeoyl alcohol and secondary metabolites of four-day-old seedlings were analyzed by HPLC. Caffealdehyde, but not caffeic acid or caffeoyl alcohol chemically complemented the AP-deficient phenotype of apd1-1 plants. These data strongly suggest that caffealdehyde is involved in the AP biosynthesis.

LigB is recruited in AP biosynthesis

Only one gene in the Arabidopsis genome, At4g15093 or AtLigB, is annotated to encode an extra-diol ring cleavage enzyme, which is homologous to the 4,5-DOPA extradiol dioxygenase in Portulaca grandiflora (Christinet et al., 2004). We identified two independent homozygous AtLigB T-DNA insertional lines: SALK_141715c (apd2-1) and CS800365 (apd2-2), in which the T-DNA insertions are in the promoter region and in the first intron, respectively. AP accumulation was near or below the limits of detection in apd2-1 and apd2-2 plants (Fig. 2-4); strongly suggesting that AtLigB is required for AP biosynthesis. To further test this hypothesis, apd2-1 plants were transformed with a construct over-expressing AtLigB under the control of 35S promoter. The apd2-1/35S- 30

AtLigB plants accumulated arabidopyrones at levels comparable to wild type, demonstrating that AtLigB is required for AP synthesis.

AtLigB contains a functional domain that is highly conserved in bacteria, yeast, plants and a few other organisms including silkworms and jellyfish. Bayesian phylogenetic analysis of the relationships among the AtLigB homologs in plants is presented in Figure 2-5. The protein sequences were obtained from “InParanoid6 Eukaryotic ortholog clusters” (Berglund et al., 2008) and NCBI (Geer et al., 2010). The conservation of AtLigB and its homologs suggests that these enzymes are under selective pressure and serve important functions.

Assembly of the AP biosynthetic pathway

PAL

We showed that APs are derived from phenylalanine and caffealdehyde is an intermediate in the AP biosynthetic pathway (Weng et al., 2012), indicating that earlier phenylpropanoid enzyme activities including PAL may be required for AP biosynthesis. Four PAL genes are present in the Arabidopsis genome and each is required to a greater or lesser degree for normal phenylpropanoid metabolism in Arabidopsis (Raes et al., 2003; Huang et al., 2010). To test if one or more of the four known PALs is/are specifically involved in AP biosynthesis, we analyzed AP content in the inflorescence stems of various PAL single and multiple mutants. AP content in the pal1 pal2 double mutant was not significantly different from wild type, whereas AP content was about 50% reduced in the pal1 pal2 pal3 pal4 quadruple mutant compared to wild type (Fig. 2-6b). Similar observations were made when we analyzed sinapoylmalate content in the same tissue (Fig. 2-6b). In contrast, when we analyzed the same mutants for leaf sinapoylmalate content, we observed a decrease of sinapoylmalate in the pal1 pal2 double mutant and a further reduction in the pal1 pal2 pal3 pal4 quadruple mutant compared to wild-type plants (Fig.2-6a). These data indicate that either there is sufficient residual PAL activity in the inflorescence stem of the quadruple 31 mutant to support the biosynthesis of APs and sinapoylmalate or that there is an alternative pathway for the synthesis of cinnamate in the quadruple mutant background. To test the first hypothesis, we measured PAL activity in these mutants. PAL activity was reduced by about 80% in the pal1 pal2 double mutant and 90% in the pal1 pal2 pal3 pal4 when compared to wild-type plants, suggesting that the 10% residual PAL activity may be sufficient for the biosynthesis of APs (Fig. 2-6c), although the source of the PAL activity remains unknown.

C4H

Given that a catechol substituted compound is required for AtLigB to act upon and that CYP84A4 catalyzes a 3-hydroxylation reaction using p- coumaraldehyde as a substrate, we postulated that a 4-hydroxylase is required for AP biosynthesis and the most likely candidate for that is C4H. To test whether AP biosynthesis is affected when C4H activity is perturbed in vivo, we examined if the C4H inhibitor piperonylic acid (PA) (Schalk et al., 1998) had an effect on AP biosynthesis. To do so we grew Arabidopsis seedlings in the presence of the inhibitor and to simplify AP detection we used CYP84A4 over-expressing seedlings that accumulate higher levels of APs than wild type. These seedlings were treated with PA alone, p-coumaric acid alone or with p-coumaric acid in addition to PA (Fig. 2-7a). AP content was unchanged in seedlings supplemented with p-coumaric acid alone, indicating that p-coumaric acid is neither toxic at this concentration nor does it increase AP biosynthesis when provided exogenously. AP content was reduced in seedlings treated with PA compared to the control group, consistent with a role of C4H in AP biosynthesis. In PA-treated seedlings, AP content was significantly (p<0.05) restored in seedlings supplemented with p- coumaric acid compared to that supplemented with PA alone, suggesting that p- coumaric acid is an intermediate of AP biosynthesis. 32

To test genetically the involvement of C4H in AP biosynthesis, we analyzed the AP content of mutants defective in the C4H gene. AP content was reduced in the two C4H mutants ref3-2 (strong allele, male sterile) and ref3-3 (weak allele) (Schilmiller et al., 2009) compared to that of wild type (Fig. 2-7b). These data indicate that disruptions of the C4H gene lead to defects in AP biosynthesis and that proper C4H function is required for normal AP biosynthesis.

4CL

4CL catalyzes the last step of the general phenylpropanoid pathway and is required for p-coumaraldehyde biosynthesis. We postulated that 4CL activity is required for AP biosynthesis since p-coumaraldehyde is a known substrate of CYP84A4. 4CL is encoded by a family of four genes in Arabidopsis (Lee et al., 1995; Ehlting et al., 1999; Soltani et al., 2006). To test if one or more of the 4CLs is/are specifically involved in AP biosynthesis, we measured AP content in available 4cl mutants. No single 4cl mutants showed any difference in AP biosynthesis compared to wild type, however, the AP content in a 4cl1 4cl2 4cl3 triple mutant is reduced by 50% compared to wild type (4cl2 4cl4, 4cl1 4cl2 4cl4 and 4cl2 4cl3 4cl4 could not be tested due to the close genetic linkage of 4CL2 and 4CL4) (Fig. 2-8), suggesting that 4CL activity is required for AP biosynthesis.

CCR

It has been shown that CYP84A4 can convert p-coumaraldehyde to caffealdehyde in vitro (Weng et al., 2012). In Arabidopsis, it is thought that p- coumaraldehyde is synthesized from p-coumaroyl CoA by the enzymes cinnamoyl-CoA reductase 1 (CCR1) and cinnamoyl-CoA reductase 2 (CCR2) (Lauvergeat et al., 2001). To test if these two CCR genes are involved in AP biosynthesis, we analyzed AP contents in ccr1 and ccr2 mutants. HPLC analysis revealed that the ccr1 mutant had about 60% of the wild-type levels of APs and that the ccr2 mutant had wild-type levels of APs (Fig. 2-9a). This is consistent with phenotypes associated with sinapoylmalate accumulation in that its content 33 in leaves is decreased in a ccr1 mutant background but not in a ccr2 mutant (Fig. 2-9b). These results indicate that CCR is involved in AP biosynthesis.

CAD C and CAD D

In attempts to identify other genes involved in AP biosynthesis, we took a candidate gene approach and hypothesized that one or more reductases can convert an aldehyde AP precursor to arabidopyl alcohol. The side-chain structures of APs resemble those of monolignol precursors, so we tested whether known reductases in the phenylpropanoid pathway are involved in AP biosynthesis. We found that the AP content was altered in a double mutant that is defective in both cinnamyl alcohol dehydrogenase C (CAD C) and cinnamyl alcohol dehydrogenase D (CAD D), encoding the two hydroxycinnamaldehyde reductases known to be required for monolignol biosynthesis (Sibout et al., 2005). In all the mutants that are defective in AP biosynthesis identified so far, levels of all four APs content are either reduced or eliminated. Surprisingly, the levels of only arabidopyl alcohol (and not the other three APs), the most abundant AP in a wild-type plant, were reduced in a cad c cad d double mutant, indicating that CAD C and CAD D are involved in arabidopyl alcohol biosynthesis. In contrast, the levels of iso-arabidopyl alcohol is increased in the double mutant (Fig. 2-10), indicating that a portion of the flux that normally goes to arabidopyl alcohol biosynthesis may be re-directed to iso-arabidopyl alcohol biosynthesis in the cad c cad d double mutant. Last, the level of iso-arabidopic acid was moderately increased, suggesting that CAD C and CAD D may convert caffealdehyde to caffeoyl alcohol that then can be used by the ring-cleavage dioxygenase to generate arabidopyl alcohol and arabidopic acid (Fig. 2-11).

Identification of other genes of AP biosynthesis

Although many of the key steps of AP biosynthesis have been identified, we still do not know all the enzymes required in AP biosynthesis. The minimum biosynthetic steps that are still unknown include a possible oxidase that 34 generates the lactone from the alcohol is adjacent to the heterocyclic oxygen after hemiacetal ring closure and the oxidases/reductases that generate the alcohol and acid on the AP sidechains. For the first enzyme, we tested two candidates. A well characterized bacterial enzyme LigC can catalyze such reactions following ring closure (Masai et al., 2000). Although the sequence similarities are low, we identified four putative homologs of LigC in Arabidopsis. There was no reduction of AP content in any single mutant of the four putative LigC genes. This could be due to the redundancy of gene function or that the oxidation step is catalyzed by an unknown enzyme. The second candidate is glucose-1-dehydrogenase which oxidizes the alcohol at the C1 position of the hemiacetal form of a glucose molecule to a ketone. This enzyme is found in bacteria (EC1.1.1.47) (Masai et al., 2000) and its homolog in Arabidopsis is ABA2. However, an allelic series of aba2 mutants did not show changes in AP content, indicating that ABA2 is not involved in AP biosynthesis. For the oxidase/reductases involved in generating the size chain, we tested two enzymes based chemical structure similarities: REF1 whose known function is generating sinapic acid from sinapaldehyde (Nair et al., 2004) and ABA3 which is known to catalyze the last oxidation step in ABA biosynthesis (Schwartz et al., 1997). In neither case, did we detect any difference in AP biosynthesis compared to wild type. Forward genetic screens using natural accessions may help us identify additional genes involved in AP biosynthesis. We analyzed 80 Arabidopsis natural accessions for their AP contents. In these experiments, accession Cvi-0 was found to be a natural apd mutant (Fig. 2-12), suggesting that a gene encoding an enzyme or regulatory factor involved in AP biosynthesis is defective in this accession. To determine whether the apd phenotype in Cvi-0 could be attributed to defects in known genes CYP84A4 or AtLigB, we mapped the phenotype in a population of Cvi-0 x Ler recombinant inbred lines (Alonso-Blanco et al., 1998). In this experiment, the locus controlling the apd phenotype mapped to a region on the upper arm of chromosome V containing At5g04330 35

(CYP84A4). When compared to the nucleotide sequences published on POLYMORPH (1001genomes.org), Cvi-0 has an A to C change at position 1213923 that results in a L389R substitution in CYP84A4 and this lysine residue is conserved among the CYP84A subfamily. This was confirmed independently by sequencing the genomic region of CYP84A4 from this accession. This nucleotide change is unique among all the sequenced accessions and is likely to explain the lack of APs in Cvi-0.

Discussion

The evolution of AP biosynthesis in Arabidopsis thaliana exemplifies the emergence of a new metabolic pathway in plant specialized metabolism. Post gene duplication, CYP84A4 neofunctionalized from its ancestral CYP84A1 5- hydroxylase activity, which is required for lignin and sinapoylmalate accumulation, and obtained a predominant 3-hydroxylase activity which is present in CYP84A1 at only low levels. Then by the recruitment of proteins of known function in the phenylpropanoid pathway including PAL, C4H, 4CL, CCR and CAD, and proteins of unknown but presumably important function such as AtLigB, APs are generated and accumulate as end products. Similar modes of generating new metabolic pathways are documented and are thought to contribute to the complex chemical diversity within plant species. For example, Matsumo reported in 2009 that a RNA-based transposition duplication event of CYP98A3, which encodes an enzyme required for lignin and sinapoylmalate biosynthesis, is followed by a tandem duplication event to give rise to CYP98A8 and CYP98A9 (Matsuno et al., 2009), which neofunctionalized both in substrate specificity and expression patterns. In this case, the new promoters of CYP98A8 and A9 post transposition event drive strong expression in pollen. As a result, new metabolites are generated in pollen, presumably also by the recruitment of enzymes of other functions. In AP biosynthesis, the expression patterns of the two paralogs also diverged. Although both genes are expressed in the inflorescence, CYP84A4 is 36 expressed in the phloem while CYP84A1 is expressed in the epidermis and the xylem (unpublished data), indicating that the accumulation of downstream products such as APs and sinapoylmalate may be restricted to the cell types in which the two genes are expressed. In both of these two cases, neofunctionalization including both enzyme activities and expression patterns led to the biosynthesis of new metabolites in a tissue/organ specific manner, contributing to the increased chemical diversity in Arabidopsis specialized metabolism. Upon further analysis in AP biosynthesis, we found that enzymes leading to p-coumaraldehyde (substrate of CYP84A4) biosynthesis including PAL, C4H, 4CL and CCR are all required for AP biosynthesis. When we analyzed the pal quadruple mutants, although there is only 10% PAL activity left, we only observed a 50% reduction of AP content, suggesting that a small amount of PAL activity, although insufficient for sinapoylmalate biosynthesis, can support the AP biosynthesis. Similarly, although the activities of the other enzymes are required for AP biosynthesis, we do not observe an AP null phenotype in ref3 and ccr1 mutants like we see in apd1-1 and apd2-2 mutants, suggesting that the rate limiting steps for AP biosynthesis is not at any of these steps but at CYP84A4 and AtLigB. The current assembly of the AP biosynthetic pathway also includes CAD, an enzyme of known function in the phenylpropanoid pathway and AtLigB, an enzyme that is highly conserved but is of unknown function in Arabidopsis and most other plant species. Our data is consistent with the model that caffealdehyde which is generated by CYP84A1 can be reduced to caffeoyl alcohol by CAD. We postulate that AtLigB preferentially cleaves at the distal position using caffeoyl alcohol to generate precursor to arabidopyl alcohol and arabidopic acid, whereas caffealdehyde is cleaved by AtLigB at the proximal position to generate the AP isomers. In this model (Fig. 2-11), when CAD is blocked, both arabidopyl alcohol and arabidopic acid levels decrease and the flux that normally goes to biosynthesis of these two compounds can be redirected to 37 the biosynthesis of the two isomers, consistent with the observation that the levels of the isomers increase slightly in the cad c cad d double mutant background. Further, AtLigB has homologs in most plant species, fungi and bacteria. The function of LigB is well characterized in the degradation of phenolic compounds in bacteria (Peng et al., 1998; Masai et al., 2000; Kasai et al., 2004). In plants, the function of characterized LigB enzymes is to cleave DOPA to generate betalamic acid which is a precursor to pigments whose presence is restricted to certain plant lineages (Christinet et al., 2004). AtLigB has the highest expression levels in pollen (Winter et al., 2007), suggesting it functions in that tissue. Understanding the function of AtLigB in Arabidopsis may provide insights in the functions of its homologs in plants that do not produce betalamic acid. In summary, the emergence of AP biosynthesis is attributed to the neofunctionalization of CYP84A4 which generates new metabolites that can be used by enzymes that normally participate in other metabolic steps. Enzymes required for the biosynthesis of CYP84A4 substrate are important for AP biosynthesis, however they are not the rate limiting step in this pathway. Further recruitment of AtLigB, which is conserved and presumably has other important functions, enables the generation of the α-pyrone moiety found in AP structures. Not only is AP biosynthesis a case study of how neofunctionalization of genes contributes to new plant specialized metabolic pathway, it is also a platform to further study the biological function of AtLigB, which may have broader impact on understanding its function in other plant species.

Materials and methods

Plant growth conditions Soil grown Arabidopsis thaliana plants were grown in growth chamber during a 16 hour light and 8 hour dark cycle under a light intensity of 120 µE sec-1 m-2 at 22oC ( Redi-Earth Plug and Seedling Mixture; Sun Gro Horticulture, Bellevue, WA, USA). The soil was supplemented with Scotts Osmocote Plus 38 controlled release fertilizer (Hummert International, Earth City, MO, USA). For plate grown Arabidopsis plants, seeds were surface sterilized at room temperature while shaking for 30 min in a mixture of 2 parts 0.1% Triton X-100 and 1 part household bleach. Seeds were washed five times with sterile water and plated onto ammonia-free Murashige and Skoog (MS) media (Murashige and Skoog, 1962). The plants were then cold treated for 2 days in the dark before placed in continuous light chamber or 16 hour light/ 8 hour dark chamber at 22oC.

Phylogenetic analysis The phylogenetic tree was constructed using MRBAYES version 3.1.1 (Huelsenbeck and Ronquist, 2001), and the protein sequences were aligned using MEGA version 5 (Tamura et al., 2011). The analysis parameters were set to: aamodelpr=mixed, nset=6, rates=invgamma. The Markov chain Monte Carlo (MCMC) analysis was set to run for 1,000,000 generations with a sampling frequency of every 1,000th generation.

HPLC analysis Sinapoylmalate and APs were extracted with 50% MeOH at 100 mg or 250 mg fresh weight mL-1 for 30 minutes at 65 oC. Sinapoylmalate was separated by HPLC using a Shim-pack XR-ODS column (Shimadzu, Kyoto, Japan; column dimensions 3.0mm I.D.x 75 mm, bead size 2.2µm). The gradient for HPLC separation is 2.0% to 25% acetonitrile during 29.5 min in 0.1% formic acid at a flow rate of 0.9 mL min-1. APs were separated on a Microsorb C18 column (Varian, Santa Clara, CA, USA) using a gradient from 2% acetonitrile in 0.1% formic acid to 10% acetonitrile in 0.1% formic acid at a flow rate of 1.2 ml min−1.

PAL enzyme assay The protocol was modified from (Huang et al., 2010). Briefly, samples were homogenized in 0.1 m Tris-HCl buffer, pH 8.5 with 2mM DTT, and 39 centrifuged for 15 min at 4°C at 14,000 rpm. The supernatant was then desalted using PD-10 column and protein content was determined using the Bradford procedure (Hammond and Kruger, 1988). The reaction mixture consisted of 100 µL protein extract and 4 mM L-phenylalanine in the same buffer. The reaction mixture was incubated at 37°C for 2 hrs. The reaction was stopped by adding 10 μL of glacial acetic acid. The mixture was then ethyl acetate extracted and dried down in a speed vac then re-dissolved in 50% MeOH and analyzed by HPLC.

40

References

Alonso-Blanco C, Peeters AJ, Koornneef M, Lister C, Dean C, van den Bosch N, Pot J, Kuiper MT (1998) Development of an AFLP based linkage map of Ler, Col and Cvi Arabidopsis thaliana ecotypes and construction of a Ler/Cvi recombinant inbred line population. Plant J 14: 259-271 Berglund AC, Sjolund E, Ostlund G, Sonnhammer EL (2008) InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res 36: D263- 266 Chapple CC, Vogt T, Ellis BE, Somerville CR (1992) An Arabidopsis mutant defective in the general phenylpropanoid pathway. Plant Cell 4: 1413- 1424 Christinet L, Burdet FX, Zaiko M, Hinz U, Zryd JP (2004) Characterization and functional identification of a novel plant 4,5-extradiol dioxygenase involved in betalain pigment biosynthesis in Portulaca grandiflora. Plant Physiol 134: 265-274 Dixon A (2002) The phenylpropanoid pathway and plant defense- a genomics perspective. Molecular Plant Pathology 3: 371-390 Ehlting J, Buttner D, Wang Q, Douglas CJ, Somssich IE, Kombrink E (1999) Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J 19: 9-20 Geer LY, Marchler-Bauer A, Geer RC, Han L, He J, He S, Liu C, Shi W, Bryant SH (2010) The NCBI BioSystems database. Nucleic Acids Res 38: D492-496 Hammond JB, Kruger NJ (1988) The Bradford method for protein quantitation. Methods Mol Biol 3: 25-32 Huang J, Gu M, Lai Z, Fan B, Shi K, Zhou YH, Yu JQ, Chen Z (2010) Functional analysis of the Arabidopsis PAL gene family in plant growth, development, and response to environmental stress. Plant Physiol 153: 1526-1538 Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755 Joy RWt, Sugiyama M, Fukuda H, Komamine A (1995) Cloning and characterization of polyphenol oxidase cDNAs of Phytolacca americana. Plant Physiol 107: 1083-1089 Kasai D, Masai E, Miyauchi K, Katayama Y, Fukuda M (2004) Characterization of the 3-O-methylgallate dioxygenase gene and evidence of multiple 3-O- methylgallate catabolic pathways in Sphingomonas paucimobilis SYK-6. J Bacteriol 186: 4951-4959 Krizevski R, Bar E, Shalit O, Sitrit Y, Ben-Shabat S, Lewinsohn E (2010) Composition and stereochemistry of ephedrine alkaloids accumulation in Ephedra sinica Stapf. Phytochemistry 71: 895-903 41

Landry LG, Chapple CC, Last RL (1995) Arabidopsis mutants lacking phenolic sunscreens exhibit enhanced ultraviolet-B injury and oxidative damage. Plant Physiol 109: 1159-1166 Lauvergeat V, Lacomme C, Lacombe E, Lasserre E, Roby D, Grima- Pettenati J (2001) Two cinnamoyl-CoA reductase (CCR) genes from Arabidopsis thaliana are differentially expressed during development and in response to infection with pathogenic bacteria. Phytochemistry 57: 1187-1195 Lee D, Ellard M, Wanner LA, Davis KR, Douglas CJ (1995) The Arabidopsis thaliana 4-coumarate:CoA ligase (4CL) gene: stress and developmentally regulated expression and nucleotide sequence of its cDNA. Plant Mol Biol 28: 871-884 Legrand YM, van der Lee A, Barboiu M (2010) Single-crystal X-ray structure of 1,3-dimethylcyclobutadiene by confinement in a crystalline matrix. Science 329: 299-302 Masai E, Momose K, Hara H, Nishikawa S, Katayama Y, Fukuda M (2000) Genetic and biochemical characterization of 4-carboxy-2- hydroxymuconate-6-semialdehyde dehydrogenase and its role in the protocatechuate 4,5-cleavage pathway in Sphingomonas paucimobilis SYK-6. J Bacteriol 182: 6651-6658 Matsuno M, Compagnon V, Schoch GA, Schmitt M, Debayle D, Bassard JE, Pollet B, Hehn A, Heintz D, Ullmann P, Lapierre C, Bernier F, Ehlting J, Werck-Reichhart D (2009) Evolution of a novel phenolic pathway for pollen development. Science 325: 1688-1692 Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiologia Plantarum 15: 473-497 Nair RB, Bastress KL, Ruegger MO, Denault JW, Chapple C (2004) The Arabidopsis thaliana REDUCED EPIDERMAL FLUORESCENCE1 gene encodes an aldehyde dehydrogenase involved in ferulic acid and sinapic acid biosynthesis. Plant Cell 16: 544-554 Peng X, Egashira T, Hanashiro K, Masai E, Nishikawa S, Katayama Y, Kimbara K, Fukuda M (1998) Cloning of a Sphingomonas paucimobilis SYK-6 gene encoding a novel oxygenase that cleaves lignin-related biphenyl and characterization of the enzyme. Appl Environ Microbiol 64: 2520-2527 Pompon D, Louerat B, Bronine A, Urban P (1996) Yeast expression of animal and plant P450s in optimized redox environments. Methods Enzymol 272: 51-64 Raes J, Vandepoele K, Simillion C, Saeys Y, Van de Peer Y (2003) Investigating ancient duplication events in the Arabidopsis genome. J Struct Funct Genomics 3: 117-129 Ruegger M, Chapple C (2001) Mutations that reduce sinapoylmalate accumulation in Arabidopsis thaliana define loci with diverse roles in phenylpropanoid metabolism. Genetics 159: 1741-1749 42

Saito K, Komamine A (1976) Biosynthesis of stizolobinic acid and stizolobic acid in higher plants. An enzyme system(s) catalyzing the conversion of dihydroxyphenylalanine into stizolobinic acid and stizolobic acid from etiolated seedlings of Stizolobium hassjoo. Eur J Biochem 68: 237-243 Schalk M, Cabello-Hurtado F, Pierrel MA, Atanossova R, Saindrenan P, Werck-Reichhart D (1998) Piperonylic acid, a selective, mechanism- based inactivator of the trans-cinnamate 4-hydroxylase: A new tool to control the flux of metabolites in the phenylpropanoid pathway. Plant Physiol 118: 209-218 Schilmiller AL, Stout J, Weng JK, Humphreys J, Ruegger MO, Chapple C (2009) Mutations in the cinnamate 4-hydroxylase gene impact metabolism, growth and development in Arabidopsis. Plant J 60: 771-782 Schwartz SH, Leon-Kloosterziel KM, Koornneef M, Zeevaart JA (1997) Biochemical characterization of the aba2 and aba3 mutants in Arabidopsis thaliana. Plant Physiol 114: 161-166 Shiono Y, Yokoi, M., Koseki, T., Murayama, T., Aburai, N., and Kimura, K.I. (2010) Allantopyrone A, a new alpha-pyrone metabolite with potent cytotoxicity from an endophytic fungus, Allantophomopsis lycopodina KS- 97. J Antibiot (Tokyo) Sibout R, Eudes A, Mouille G, Pollet B, Lapierre C, Jouanin L, Seguin A (2005) CINNAMYL ALCOHOL DEHYDROGENASE-C and -D are the primary genes involved in lignin biosynthesis in the floral stem of Arabidopsis. Plant Cell 17: 2059-2076 Soltani BM, Ehlting J, Hamberger B, Douglas CJ (2006) Multiple cis- regulatory elements regulate distinct and complex patterns of developmental and wound-induced expression of Arabidopsis thaliana 4CL gene family members. Planta 224: 1226-1238 Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28: 2731-2739 Weisshaar B, Jenkins GI (1998) Phenylpropanoid biosynthesis and its regulation. Curr Opin Plant Biol 1: 251-257 Weng JK, Li Y, Mo H, Chapple C (2012) Assembly of an evolutionarily new pathway for alpha-pyrone biosynthesis in Arabidopsis. Science 337: 960- 964 Winkel-Shirley B (2001) Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol 126: 485-493 Winter D, Vinegar B, Nahal H, Ammar R, Wilson GV, Provart NJ (2007) An "Electronic Fluorescent Pictograph" browser for exploring and analyzing large-scale biological data sets. PLoS One 2: e718 Wu J, and Baldwin, I.T. (2010) New insights into plant responses to the attack from insect herbivores. Annu Rev Genet 44: 1-2

43

Figures

Figure 2- 1. Bayesian phylogenetic analysis of the CYP84A subfamily. The phylogenetic tree is out-grouped to Selaginella CYP788A1. The colors on the illustration represent different plant families: , Salicaceae, Caricaceae, Vitaceae, Gentianaceae, Solanaceae, Cornaceae, Myrtaceae, Altingiaceae, Moraceae, Fabaceae, and Brassicaceae.

44

Figure 2- 2. Stable isotope study to identify if PHE or TYR is a precursor to AP biosynthesis. Plants transformed with an AtC4H::CYP84A4 construct were grown on media containing PHE, TYR, deuterium ring labeled PHE (2H-5-PHE) or deuterium ring labeled TYR (2H-4-TYR). Arabidopyl alcohol, the most abundant arabidopyrone, was analyzed by LC-MS (ESI negative mode) at day four. While 2H-4-TYR showed no mass shift compared to PHE or TYR fed seedling, 2H-5- PHE fed seedlings showed an m/z shift of 2. These data showed that arabidopyl alcohol is derived from PHE, not TYR.

45

Figure 2- 3. Stable isotope labeling study to test if the carbons on PHE remain intact during AP biosynthesis. Plants transformed with an AtC4H::CYP84A4 construct were grown on media containing PHE, TYR, equal molar ratio of 2H-5- PHE and 13C-1-PHE, or 2H-4-TYR. Arabidopyl alcohol, the most abundant arabidopyrone, was analyzed by LC-MS (ESI negative mode) at day four. Consistent with previous experiment, 2H-4-TYR showed no mass shift compared to PHE or TYR fed seedling, however, there was an equal ratio of arabidopyl alcohol derived from 2H-5-PHE and 13C-1-PHE, indicating that the side-chain of phenylalanine remains intact during AP biosynthesis.

46

Figure 2- 4. HPLC chromatograms of metabolites in Arabidopsis inflorescence tissues at 313 nm. Arabidopyrones are indicated as the following: green, arabidopyl alcohol; blue, arabidopic acid; maroon, iso-arabidopyl alcohol; and red, iso-arabidopic acid. Wild-type plants have all four of the compounds while the levels of arabidopyrones in the two AtLigB mutants (T-DNA line Salk141715 as apd2-1 and CS800365 as apb2-2) are close or under the detection limit. This AP deficiency phenotype is restored in the apd2-1/35S::AtLigB transgenic plants. These data indicate that AtLigB is required for AP biosynthesis. 47

Figure 2- 5. Bayesian phylogenetic analysis of extradiol ring cleavage dioxygenases. 3 of the 46 known bacterial homologs are included. Black and grey represent prokaryotes. The rest of the colors represent different orders of plants: Funariales, Selaginellales, Pinales, Fabales, Vitales, , Malpighiales, Caryophyllales, Solanales, Brassicales. Orthologs in the Caryophyllales are relatively well-characterized and many of them are involved in the biosynthesis of betalains.

48

Figure 2- 6. Evaluation of the involvement of PAL in AP biosynthesis. (a) Sinapoylmalate content quantified by HPLC in three-week old rosettes of the pal double and quadruple mutants. (b) Sinapoylmalate (SM) and arabidopyl alcohol (AP) contents were quantified by HPLC in five week old inflorescence of the same mutants. (c) Total PAL activity from six week old inflorescence. Error bars represent three biological replicates. The * indicates statistical difference (p<0.05) compared to wild type using student’s t-test, and **represents statistical difference than both wild type and * (p<0.05) using student’s t-test. 49

Figure 2- 7. Evaluation of the involvement of C4H in AP biosynthesis. (a) Plants over-accumulating APs were grown on media supplemented with MS (regular media), DMSO alone, p-coumarate alone, piperonylic acid (PA), or PA with p- coumarate. Arabidopyl alcohol levels were quantified in six day old seedlings. Supplementation with p-coumaric acid and PA partially rescued the AP deficient phenotype when compared with plants supplemented only with PA. (b) Arabidopyl alcohol contents were quantified in an allelic series of C4H mutants at five weeks of age. Error bars represent three biological replicates. The * indicates statistical difference (p<0.05) compared to wild type using student’s t-test, and **represents statistical difference than both wild type and * (p<0.05) using student’s t-test. 50

Figure 2- 8. Evaluation of the involvement of 4CL in AP biosynthesis. Arabidopyl alcohol content was quantified by HPLC in 4cl mutants at five weeks of age. Only the 4cl1 4cl2 4cl3 mutant showed reduction in AP content compared to wild type. Error bars represent three biological replicates. The * indicates statistical difference (p<0.05) compared to wild type using student’s t-test.

51

Figure 2- 9. Evaluation of the involvement of CCR in AP biosynthesis. (a) Arabidopyl alcohol content was quantified by HPLC in five week old inflorescence samples. (b) Sinapoylmalate content was quantified in three week old rosette extracts. Only the ccr1 mutant showed reduced AP content. Error bars represent three biological replicates. The * indicates statistical difference (p<0.05) compared to wild type using student’s t-test. 52

Figure 2- 10. Evaluation of the involvement of CAD in AP biosynthesis. The graph illustrates the LC-MS mass counts of the four arabidopyrones in five week old inflorescence samples. Both arabidopyl alcohol and acid showed reduction in the cad c cad d double mutant; whereas, both isomers showed an increase in content. Error bars represent three biological replicates. The * indicates statistical difference (p<0.05) compared to wild type using student’s t-test.

53

Figure 2- 11. Current model of AP biosynthesis. CYP84A4 (in red) neofunctionalized from CYP84A1, which is required for sinapoylmalate and S lignin biosynthesis (in purple), and obtained 3-hydroxylase activity toward p- coumaryl aldehyde, a phenylpropanoid intermediate. CAD, an enzyme of known function in the phenylpropanoid pathway and AtLigB, a conserved extra-diol ring cleavage dioxygenase are required to ultimately generate APs.

54

Figure 2- 12. Cvi-0 is a natural apd mutant defective in AP biosynthesis. Arabidopyl alcohol content was quantified in five week old inflorescence from 80 natural accessions. The accessions with error bars represent three biological replicate, the rest of the accessions represent one biological sample.

55

CHAPTER 3. ISOFORMS OF ARABIDOPSIS 4CL HAVE OVERLAPPING YET DISTINCT ROLES IN PHENYLPROPANOIND METABOLISM

Introduction

Plants are sessile organisms that cannot escape from natural predators or non-ideal environments. It is thought that selective pressures associated with defense and abiotic and biotic stresses led to the ability to accumulate the large number of specialized metabolites seen in plants today. Pathways that are required for the biosynthesis of specialized metabolites often intersect with primary metabolism and in some cases contribute to plant vigor and fitness. For example, 30% of the total inorganic carbon fixed by plants enters the phenylpropanoid pathway which produces major specialized metabolites including lignin, hydroxycinnamoyl esters and flavonoids (Weisshaar and Jenkins, 1998; Boerjan et al., 2003). The emergence of lignin enabled land plants the rigidity to stand up-right and transport water long-distances. In addition, lignin also serves as the first mechanical defense against pathogens (Boerjan et al., 2003; Bonawitz and Chapple, 2010). Although lignin accounts for the vast majority of the carbon flux through this pathway, soluble phenylpropanoids also play vital roles in plant growth and development. For example, sinapoylmalate, a hydroxycinnamoyl ester primarily found in members of the Brassicaceae, serves as a UV protectant (Chapple et al., 1992; Landry et al., 1995). More than 1000 flavonoids have been identified to date, some of which play key roles as pigments, in pathogen resistance, and in protection against oxidative stress (Dixon and Paiva, 1995; Landry et al., 1995; Winkel-Shirley, 2001; Broun, 2005; Tanaka et al., 2008). Certain sub-classes of flavonoids are better known for their antioxidant properties in the human diet; examples include isoflavonoids in 56 soybean and stilbenes in red wine (Winkel-Shirley, 2001). Harnessing and manipulation of this pathway is desirable for many applications, however, we still lack a complete understanding of the genetics and biochemistry underlying the biosynthesis of these compounds, proven by the fact that the biosynthetic pathway continues to be rewritten (Bonawitz and Chapple, 2010; Weng et al., 2012; Vanholme et al., 2013). Furthering our understanding of the phenylpropanoid pathway not only provides an improved basic understanding of plant specialized metabolism but also enhances our ability to rationally design plant metabolic pathways for future applications. The first three steps of the pathway, known as the general phenylpropanoid pathway, are conserved in almost all dicots (Koukol and Conn, 1961). Phenylalanine is first de-aminated by phenylalanine ammonia lyase (PAL), a step that is followed by a 4-hydroxylation reaction catalyzed by cinnamate 4- hydroxylase (C4H), the product of which is converted to its CoA ester by 4- coumarate: CoA ligase (4CL). These reactions have been described in detail since the 1960s (Russell and Conn, 1967; Russell, 1971; Knobloch and Hahlbrock, 1975; Hahlbrock and Grisebach, 1979). Genetic screens, especially in Arabidopsis thaliana, have led to the identification of many genes involved in this pathway. PAL and 4CL, however, are encoded by multiple paralogs in most plant species, consistent with the fact that only one known mutant in 4CL has been identified from a forward genetic screen (Saballos et al., 2012). In Arabidopsis, both PAL and 4CL are encoded by four genes (Lee et al., 1995; Rohde et al., 2004; Soltani et al., 2006; Huang et al., 2010). Although studies show that the PAL isoforms 1 and 2 are the major PALs and are redundant in function (Rohde et al., 2004; Huang et al., 2010), the functions of 4CL isoforms are less clear. The substrate specificities of the 4CL and 4CL-like proteins have been reported (Ehlting et al., 1999; Hamberger and Hahlbrock, 2004; Costa et al., 2005; Soltani et al., 2006) and four 4CLs have been shown to use p-coumaric acid, caffeic acid, and ferulic acid as substrates when heterologously expressed. Interestingly, 4CL4 is the only 4CL capable of activating sinapic acid (Hamberger 57 and Hahlbrock, 2004). The amino acid residues important for recognizing and accommodating substrates at the enzymes’ active sites have been identified and characterized (Ehlting et al., 2001; Schneider et al., 2003). Based on all of this evidence, At4CL1, 2, 3 and 4 are thought to participate in phenylpropanoid metabolism in Arabidopsis. Phylogenetic analysis reveals that 4CL1, 4CL2 and 4CL4 are more closely related to one another than to 4CL3 (Ehlting et al., 1999; Soltani et al., 2006), suggesting that 4CL3 may have evolved a distinct biological function. Although the phylogenetic relationship among the four Arabidopsis 4CLs is clear, it is difficult to further dissect the relationship between the duplication events and plant phylogeny because of the limited number of species and sequences that were included in these studies. Co-expression network analysis suggests that 4CL1, 4CL2 and 4CL4 may be associated with lignin and sinapoylmalate biosynthesis; whereas, 4CL3 is associated with flavonoid production (Koopman et al., 2012). In addition, promoter-GUS analysis shows that 4CL1 and 4CL2 are expressed in lignifying cells whereas 4CL3 is expressed in a broad range of cell types in Arabidopsis seedlings (Ehlting et al., 1999). It has been reported that 4CL1, 4CL2 and 4CL4 but not 4CL3 transcripts were induced upon wounding in Arabidopsis leaves (Soltani et al., 2006), consistent with the observation that physical wounding can induce lignification (Rittinger et al., 1987; Dixon and Paiva, 1995; Hawkins and Boudet, 1996). Based on this evidence, it has been proposed that 4CL1 and 4CL2 are the major 4CLs involved in lignin biosynthesis; whereas, 4CL3 has a more important role in flavonoid biosynthesis (Ehlting et al., 1999; Saito et al., 2013). Further, 4cl3 mutants were identified in a large scale genetic screen to find genes involved in pollen exine formation (Dobritsa et al., 2011). In this study, two mutant alleles of 4cl3 show defects in pollen exine as well as soluble flavonoid production, supporting the idea that 4CL3 may have a specific role in flavonoid biosynthesis, however, flavonoid biosynthesis in tissues other than pollen was not examined in this study. Lastly, antisense suppression of 4CL activity in Arabidopsis has been shown to alter lignin content and composition 58

(Lee et al., 1997), although it was unclear in this study whether the suppression of 4CL activity was specific to any isoforms of the 4CLs. A recent study shows that 4cl1 but not 4cl2 mutants contain less total lignin content than wild-type plants, suggesting that 4CL1 may have an important role in lignin biosynthesis (Van Acker et al., 2013). No 4CL has been implicated in sinapoylmalate production. In summary, although some evidence for the roles of 4CLs have been established in Arabidopsis, there has not been a comprehensive genetic study on the impact of these isoforms on overall phenylpropanoid metabolism. In this chapter, we provide experimental evidence that describes the biological functions of the Arabidopsis 4CL isoforms. We report that 4CL3 is the predominant 4CL involved in flavonoid biosynthesis. In addition, 4CL1 and 4CL3 are both important in sinapoylmalate biosynthesis and 4CL1, 4CL2 and 4CL3 all contribute to lignin biosynthesis. Lastly, 4CL4 contributes to the pathway in a manner that has yet to be determined and possibly not at all. These findings together show that although the four isoforms of 4CLs have similar catalytic efficiencies towards their substrate(s) when heterologously expressed, their biological functions are distinct but overlapping in the soluble and lignin branches of phenylpropanoid metabolism. These biological differences may be attributed to the tissue specificity of differential gene expressions for the biosynthesis of different metabolites.

Results

Phylogenetic analysis of 4CL homologs

As mentioned previously, there are four 4CL genes annotated in the Arabidopsis thaliana genome that encode enzymes that show catalytic activity towards hydroxycinnamic acids when heterologously expressed (Ehlting et al., 1999; Hamberger and Hahlbrock, 2004; Costa et al., 2005; Soltani et al., 2006). Previous phylogenetic analysis shows that the four Arabidopsis 4CLs belong to two distinct classes (Ehlting et al., 1999; Soltani et al., 2006; Chen et al., 2014), 59 however, the scope of these phylogenetic analyses was limited to a few plant species. Using the four Arabidopsis thaliana 4CLs as query, we identified 192 4CL homologous proteins from across the plant kingdom with at least 70% sequence identity and generated a new phylogenetic tree using Bayesian analysis (Fig.3-1). Our findings confirm that 4CL3 belongs to a different clade than the other 4CLs. Further, our analysis shows that this divergence happened before the split between monocots and dicots and that both 4CL3-like and proteins more similar to 4CL1, 4CL2, and 4CL4 are present in both of these lineages. It is also notable that 4CL1 and 4CL2 are more closely related to one another than to 4CL4, supporting the idea that 4CL1 and 4CL2 may share some functional redundancy.

Promoter GUS analysis of four 4CLs

Previous promoter-GUS transcriptional analysis (Soltani et al., 2006) showed that 4CL1 and 4CL2 are expressed predominantly in lignifying tissues; whereas,4CL3 is expressed more broadly, suggesting that the 4CL3 may have a different biological function than the other two isoforms. These findings are consistent with the hypothesis that the differential expression patterns of the four 4CLs lead to their different biological functions (Costa et al., 2005; Soltani et al., 2006; Bassard et al., 2012). To further test the expression profiles of the four 4CLs, we generated transgenic Arabidopsis plants carrying 4CL promoter-GUS constructs and analyzed their rosettes at two weeks of age and obtained data consistent with previous observations (Fig. 3-2B). The expression differences are even more distinct in three-week-old leaves: 4CL1 and 4CL3 both have strong over-all expression, but 4CL1 is expressed more in the vasculature, 4CL2 is expressed mainly in the mid rib, second order and to an extent the third order veins, and 4CL4 expression is very limited in the minor veins (Fig. 3-2C). In the inflorescence, we observed that both 4CL1 and 4CL2 have strong expression in the xylem, consistent with their proposed role in lignin biosynthesis (Fig. 3-2D). In 60 addition, 4CL1 is expressed in the cortex, phloem and interfasicular region. While 4CL4 does not show any detectable expression, 4CL3 is expressed in the epidermis, cortex, cambium, phloem and the pith (Fig. 3-2D). We examined the flowers of the same plants and found that 4CL1 is expressed almost everywhere, 4CL2 is expressed mainly in the stigma and the veins of the sepal, 4CL3 is expressed in the sepal, petal, filament and the stigma and 4CL4 is minimally expressed (Fig. 3-2E). Last, we found that all 4CLs are broadly expressed in the root, but 4CL2 and 4CL4 expressions are excluded from the root tip (Fig. 3-2B).

4CL activity

To further dissect the biological functions of the 4CL isoforms, we identified single and higher order mutants of the four 4CLs. We were unable to identify mutants with both the 4cl2 and 4cl4 mutations due to the tight genetic linkage between the two adjacent genes. We then measured the total 4CL activity in the set of mutants generated and found that the 4cl1 single mutant alone shows a 90% reduction in activity in both two-week-old rosettes and five- week-old inflorescences compared to wild type; whereas, the activity measured in extracts of the other single mutants is the same as that of wild type (Fig. 3-3D; 3- 4D). Further, in the higher order mutants we analyzed, all the double and triple mutants containing the 4cl1 mutation also show similar reductions in activity as the 4cl1 single mutant; the 4cl2 4cl3 and 4cl3 4cl4 mutants both have wild-type levels of 4CL activity. These data indicate that 4CL1 alone accounts for the majority of the total 4CL activity in the organs examined and the other three 4CLs contribute little to over-all 4CL activity, at least as measured in vitro.

Plant growth and lignin biosynthesis

Deficiency in lignin production often but not always leads to a dwarf phenotype (Bonawitz and Chapple, 2013). Despite the fact the 4CL1 alone accounts for the majority of the total 4CL activity, we did not observe any growth 61 defect in the 4cl1 mutant. In fact, in the whole set of 12 mutants we analyzed; only the 4cl1 4cl2 double mutant and the 4cl1 4cl2 4cl3 triple mutant exhibit a dwarf and bushy phenotype at the same developmental stage (Fig. 3-5C; 3-5 D). Unlike many other dwarf mutants in the phenylpropanoid pathway in which the plants are dwarf even at the rosette stage, the 4cl1 4cl2 mutant is indistinguishable from wild type in growth at the rosette stage and its growth defect only becomes apparent after the plants have bolted, but the 4cl1 4cl2 4cl3 triple mutant has a smaller rosette compared to wild type. Although still fertile, both mutants produce fewer seeds. When we analyzed the total lignin content using the acetyl bromide method, we found that only these dwarf plants (4cl1 4cl2 and 4cl1 4cl2 4cl3) have less total lignin content than wild type (p=0.018 and 0.016 using student’s t-test, respectively) (Fig. 3-5A). These data together suggest that 4CL1, 4CL2 and 4CL3 are important for proper plant growth and that 4CL1 and 4CL2 contribute most to lignin biosynthesis. 4CL4 does not appear to contribute to lignin biosynthesis, consistent with its expression profiles.

Lignin composition

Targeted engineering of the phenylpropanoid genes can result in both changes in lignin content and composition (Lee et al., 1997; Franke et al., 2002; Sibout et al., 2005; Coleman et al., 2008; Li et al., 2010; Van Acker et al., 2013). There are two studies on how 4CL affects lignin deposition in Arabidopsis: the 4CL antisense knock-down line generated almost two decades ago was reported to contain less lignin and altered lignin composition (Lee et al., 1997), and the 4cl1 mutant alone shows a similar phenotype (Van Acker et al., 2013). We measured lignin composition by the DFRC method and found that it is changed in the 4cl mutants: the mol% of guaiacyl (G) : mol% syringyl (S) lignin in a wild- type plant was changed from 75:25 to 60:30 in a 4cl1 single mutant alone (Fig. 3- 5B). Higher order mutants involving the 4cl1 mutation all show a similar trend in this change (Fig. 3-5B). Further, the mol% of guaiacyl (G): mol% syringyl (S) 62 lignin in a 4cl1 4cl2 4cl3 triple mutant was changed to 50:40 compared to wild type. These data together show that 4CL1 alone affects the deposition of lignin, and that 4CL2 and 4CL3 also contribute to the regulation of G and S lignin deposition. 4CL4 again does not appear to contribute to the regulation of lignin deposition.

Sinapoylmalate biosynthesis

Sinapoylmalate accumulates in the epidermis of Arabidopsis as a UV protectant and its biosynthesis presumably requires 4CL activity. To test whether one or more isoforms of 4CL have a major role in its biosynthesis, we measured sinapoylmalate content in rosette leaves and found that the 4cl1 single mutant shows a phenotype, with a 20% reduction in sinapoylmalate content (Fig. 3-3B). The reduction of sinapoylmalate is further exacerbated in the 4cl1 4cl3 (80%) and 4cl1 4cl2 4cl3 (90%) mutants (Fig. 3-3B), indicating that 4CL3 and 4CL2 contribute to sinapoylmalate biosynthesis. It is notable that in the mutants that contain less sinapoylmalate than the 4cl1 single mutant, none shows any less 4CL activity than that of the 4cl1 single mutant (Fig. 3-3D), suggesting that the epidermal cells in which sinapoylmalate is made may only contribute to a small portion of the total 4CL activity measured in leaf homogenates. Sinapoylmalate content was also quantified in the inflorescence and we found no significant change in any of the genotypes (Fig. 3-3C).

Atypical soluble phenylpropanoid metabolites

The Arabidopsis 4CLs can use p-coumarate, caffeate, ferulate, and in the case of 4CL4 sinapate as substrates when heterologously expressed (Ehlting et al., 1999; Hamberger and Hahlbrock, 2004; Costa et al., 2005). The canonical view of the phenylpropanoid pathway (Fig. 3-8) is that 4CL activity is directed towards p-coumarate in vivo, however, caffeoyl CoA is now also thought to be generated by 4CL using caffeic acid produced by caffeoyl shikimate esterase 63

(CSE) (Vanholme et al., 2013). To further evaluate this model, we analyzed soluble metabolites of two-week-old rosettes and the five-week-old inflorescences and found that substances with masses as determined by LC-MS that match those of p-coumaroyl glucose, p-coumaroyl malate, and feruloyl malate accumulate in 4cl1 mutants and that the accumulation of these atypical phenylpropanoids increases when the 4cl2 and 4cl3 mutations are combined with 4cl1 (Fig. 3-3A; 3-4A). The accumulation of atypical phenylpropanoid metabolites has been reported before. When p-coumaroyl shikimate 3′-hydroxylase (C3’H) is blocked in the ref8 mutant, the plants accumulate atypical soluble phenylpropanoids including p-coumaroyl glucose and p-coumaroyl malate (Schoch et al., 2001; Franke et al., 2002). The accumulation of these two compounds in the Arabidopsis 4cl1 mutant is consistent with the canonical view of the pathway. The accumulation of feruloyl malate in Arabidopsis supports the placement of CSE in the pathway and suggests that when 4CL is blocked, caffeic acid generated by CSE can be methylated by caffeic acid O-methyl transferase (COMT) to generate ferulate and used by the promiscuous enzymes of sinapoylmalate biosynthesis to make feruloyl malate (Do et al., 2007). Further, the 4cl1 4cl2 double mutant accumulates more atypical compounds than the 4cl1 4cl3 mutant in the inflorescence. In contrast, the 4cl1 4cl3 mutant contains more atypical compounds than the 4cl1 4cl2 mutant in the rosette, suggesting that 4CL1 and 4CL2 are the major 4CLs in the inflorescence and 4CL1 and 4CL3 are the major 4CLs in leaves (Fig. 3-3A; 3-4A).

4CL3 in flavonoid biosynthesis

Flavonoids are accumulated as flavonol glycosides in the aerial part of the Arabidopsis plant. 4CL3 has been shown to be important in flavonol biosynthesis in the flower (Dobritsa et al., 2011). To further test the role of 4CL3 in flavonoid biosynthesis, we analyzed the flavonol glycoside content of Arabidopsis two- week-old rosettes and found that among the single mutants, only in a 4cl3 single 64 mutant is flavonol glycoside content reduced compared to wild-type plants (Fig. 3-3C), showing that 4CL3 is the dominant 4CL in flavonoid production. Higher order mutants containing the 4cl3 mutation exhibit similar phenotypes as the 4cl3 single mutant (Fig. 3-3C). The 4cl1 4cl2 mutant accumulates less flavonols than wild type but more than the 4cl3 mutant alone or mutants containing the 4cl3 mutation (Fig. 3-3C), suggesting that 4CL1 and 4CL2 also contribute to flavonol biosynthesis, although surprisingly, the 4cl1 4cl2 4cl3 triple mutant does not accumulate less flavonols than the 4cl3 single mutant alone (Fig. 3-3C). Flavonol content was also quantified in the inflorescence and we found that there was no significant difference among the genotypes except where the total flavonol content is increased in a 4cl1 4cl2 double mutant, (Fig. 3-4C). It is possible that when both 4CL1 and 4CL2 are blocked in the inflorescence, the carbon flux that normally enters lignin and sinapoylmalate biosynthesis is re- redirected to flavonol biosynthesis, resulting in the hyper-accumulation of flavonols, however, when 4CL3 is blocked in this background, no more flux can enter the flavonol biosynthetic pathway, thus the flavonol content is not enhanced in the 4cl1 4cl2 4cl3 triple mutant. The flavonoid branch of the phenylpropanoid pathway generates a large number of metabolites, some of which accumulate in normal growth condition and others are induced under stress conditions. Anthocyanins are purple flavonoid pigments transiently induced under high light and/or sucrose induction conditions in Arabidopsis (Kubasek et al., 1992; Shirley et al., 1995). To further test the role of 4CL3 in flavonoid biosynthesis, we grew seedlings under anthocyanin inducing conditions. Although wild-type seedlings show the purple color typical for anthocyanin-containing seedlings, the 4cl3 mutant and mutants containing the 4cl3 mutation fail to do so. The same seedlings were then analyzed by HPLC for anthocyanin content and this evaluation confirmed the visual results (Fig. 3-6B). Together, these data show that 4CL3 is the predominant 4CL in flavonoid biosynthesis under a variety of conditions.

65

Discussion

The Arabidopsis genome contains a large number of genes encoding enzymes that can activate carboxylic acids including UDP-glucosyltransferases (UGTs) (Lim et al., 2001), long-chain fatty acyl-CoA synthetases (LACSs), acetate CoA ligases (ACSs), and 4CL and 4CL-like proteins (Cukovic et al., 2001; Kienow et al., 2008). For each class, there are a large number of paralogs within the genome. The generation of these paralogous genes is thought to be largely due to gene duplication events. Although the majority of the genes are silenced post gene duplication, the small portions of genes that are retained have either neofunctionalized or subfunctionalized (Lynch and Conery, 2000; Rensing, 2014). Many of these enzymes are promiscuous and have evolved diversified functions: many of the LACSs are involved in fatty acids biosynthesis, however, one LACS member has neofunctionalized to participate in jasmonate biosynthesis (Kienow et al., 2008). It has also been reported that one 4CL-like protein encoded by At4g19010 is important for ubiquinone biosynthesis (Block et al., 2014). Among the 4CL genes that are thought to be involved in phenylpropanoid metabolism in Arabidopsis, four encode proteins that can catalyze the reactions to generate the CoA esters required for downstream products (Ehlting et al., 1999; Hamberger and Hahlbrock, 2004). These 4CL isoforms appear to have subfunctionalized in that their enzyme activities remain similar, with the exception that 4CL4 has gained the ability to activate sinapate (Cukovic et al., 2001; Ehlting et al., 2001; Stuible and Kombrink, 2001; Hamberger and Hahlbrock, 2004), but their expression patterns have diverged (Fig. 3-2). The neofunctionalization and subfunctionalization of these enzymes contributed to the chemical diversity in plant primary and specialized metabolism we see today.

Divergence of 4CL isoforms

The presence of multiple 4CL isoforms is common in plant lineages and the divergence of expression patterns in 4CL isoforms has also been reported. 66

For example, four 4CLs identified in Sorghum exhibit similar catalytic efficiency towards p-coumaric acid, caffeic acid and ferulic acid, but their expression patterns have diverged in leaves, stems and roots (Saballos et al., 2012). In Kudzu, a member of the Fabaceae, two 4CLs have been isolated, although both could use p-coumaric acid as substrate, only one 4CL can activate caffeic acid. In addition, their expression patterns are different in that one is expressed highly in the root while the other is in the stem (Li et al., 2014). Functional analysis showed that Pinta4CL1 and Pinta4CL3 from Loblolly pine have similar catalytic efficiency towards p-coumaric acid and caffeic acid, however, Pinta4CL1 has the highest expression in xylem but Pinta4CL3 does not (Chen et al., 2014). The authors also demonstrated that Pinta4CL3 functions mainly in the biosynthesis of soluble metabolites. These examples together show that the enzymatic functions of 4CL isoforms often remain similar and that the divergence of expression patterns is widespread among plant lineages. This divergence may be the contributing factor to their different roles in phenylpropanoid metabolism. In Arabidopsis, our findings are consistent with this model in that although the four 4CL isoforms have similar enzymatic functions, 4CL1 and 4CL2 are highly expressed in lignifying cells while 4CL3 is expressed more broadly and 4CL4 has very limited expression. The phenotypical consequences are that 4CL3 is primarily involved in flavonoid and sinapoylmalate biosynthesis, 4CL1 and 4CL2 are involved in lignin biosynthesis and are required for proper plant growth, and 4CL4 has no detectable function in organs examined due to its limited expression profile.

4CL isoforms in soluble metabolism

It has been suggested that 4CL1 and 4CL2 are required for lignin biosynthesis while 4CL3 is involved in flavonoid biosynthesis (Ehlting et al., 1999; Soltani et al., 2006; Bassard et al., 2012). Our data show that 4CL3 is not only required for flavonoid biosynthesis, but is also involved in sinapoylmalate and to 67 some extent lignin biosynthesis. In addition, 4CL1 and 4CL2 not only participate in lignin biosynthesis, both are involved in soluble metabolism including sinapoylmalate and flavonoid (Fig. 3-7). These findings are consistent with their respective expression profiles in that 4CL3 has the broadest expression profiles while 4CL1 and 4CL2 are more expressed in the vasculature. Although most of the studies on 4CLs in other plant species focus on their roles in lignin biosynthesis, some have shown their roles in soluble metabolism. For example, using hybrid poplar as a model system, the authors observed a 50% reduction in 4CL activity using RNAi, while there was no change in lignin content in these plants, they observed an overall reduction in known flavonoids found in poplar (Voelker et al., 2010). In the study characterizing Pinta4CL1 and Pinta4CL3 from Loblolly pine, the authors reported that when ectopically expressing Pinta4CL3 in poplar, flavonoids and hydroxycinnamoyl esters were both affected, suggesting that Pinta4CL3 may have a specific role in soluble metabolism (Chen et al., 2014). Interestingly, Pinta4CL3 is more closely related to Arabidopsis 4CL3 than Pinta4CL1 (Chen et al., 2014), suggesting that 4CL3 and 4CL3 like 4CLs in other plant lineages may also function primarily in soluble metabolism. When we examined the total flavonol content in the inflorescence, we found that a 4cl1 4cl2 double mutant contains elevated level of flavonols (Fig. 3- 4C). It has been observed before that when earlier steps of the phenylpropanoid pathway are blocked, flux that normally enters lignin and sinapoylmalate biosynthesis is re-redirected to flavonol biosynthesis, resulting in the hyper- accumulation of flavonoids. For example, the ref8 mutant accumulates high levels of flavonols as well as anthocyanins (Schoch et al., 2001; Franke et al., 2002). In addition, the ref8 mutant accumulates atypical soluble metabolites derived from p-coumaric acid. We also noted the accumulation of atypical metabolites including p-coumaroyl glucose, p-coumaroyl malate, and feruloyl malate when 4CL1 is blocked (Fig. 3-3A; 3-4A). Interestingly, we found that these compounds accumulate to a much higher level in the inflorescence than the rosette (Fig. 3-3A; 3-4A). The 4cl1 4cl2 double mutant accumulates more of 68 these compounds in the inflorescence than the 4cl1 4cl3 mutant and the 4cl1 4cl3 mutant has a higher level of accumulation of the compounds in the rosette than the 4cl1 4cl2 mutant, suggesting that 4CL2 has more of a role in the inflorescence and 4CL3 is more important in the rosette. The different roles of the 4CLs may be a result of their relative expression patterns in these organs. Perhaps most importantly, the presence of these atypical metabolites, especially feruloyl malate provides another layer of experimental evidence that CSE generates caffeic acid that could be used by 4CL to make caffeoyl CoA as reported (Vanholme et al., 2013) (Fig. 3-8).

Growth defects and lignin deposition

Disruptions in the phenylpropanoid pathway often lead to plant growth defects. It was thought that the dwarf phenotype associated with lignin deficient mutants is due to the inability of these plants to transport water when lignin content is reduced. In general, plants defective in the earlier steps of the pathway have more severe consequences: the pal1 pal2 double mutant has global reduction of phenylpropanoids and has either a sterile or a dwarf phenotype depending on growth conditions (Rohde et al., 2004; Huang et al., 2010). A pal1 pal2 pal3 pal4 quadruple mutant is dwarf and sterile and contains only 30% wild- type levels of lignin and 10% total PAL activity (Huang et al., 2010). Similarly, an allelic series of C4H mutants show reduction of over-all soluble metabolites and cell wall components along with a corresponding dwarf and sometimes sterile phenotype (Schilmiller et al., 2009). In the most severe allele of C4H mutant ref3- 2, the plant has a similar dwarf phenotype to what we observed in a 4cl1 4cl2 mutant, however, the ref3-2 mutant has over 80% reduction in lignin content whereas the 4cl1 4cl2 and the 4cl1 4cl2 4cl3 mutants contain only about 20% less total lignin than wild type while there is only 10% total 4CL activity in these plants (Fig. 3-5C; 3-5A), suggesting that the degree of dwarfing observed is not directly proportional to the extent of lignin reduction and that 4CL activity in wild- 69 type plants considerably exceeds what is needed to support normal lignin biosynthesis. Consistent with what we observed in Arabidopsis, it has been reported in tobacco that even when the total 4CL activity is knocked down by 99%, there is only a 30% reduction of total lignin content (Kajita et al., 1996), demonstrating that the total 4CL activity is in excess for lignin biosynthesis in other plant lineages. There have been similar reports of the role of 4CL in lignin biosynthesis in pine, poplar, and switch grass. In these studies, when the total 4CL activity is reduced by 70% to 95%, lignin content is reduced by 20% to 40% and the plants are dwarf (except in one report that plants show enhanced growth) (Hu et al., 1999; Li et al., 2003; Wagner et al., 2009; Xu et al., 2011). Although we do not know yet the extent of 4CL activity reduction to see any lignin deficiency phenotype, it has been shown that a 50% reduction of 4CL activity in poplar does not lead to any change in lignin content (Voelker et al., 2010). Recent studies show that dwarfism may not always be a direct result of reduction in lignin content (Bonawitz et al., 2012; Bonawitz et al., 2014) but instead could involve the sensing (directly or indirectly) of metabolic or cell wall changes via the Mediator complex, a multi-subunit transcriptional co-regulator that mediates the recruitment of RNA Pol II during transcription. In this study it was demonstrated that the dwarf and lignin-deficient phenotype of the Arabidopsis ref8 mutant is almost completely rescued when MED5a and MED5b are knocked out (Bonawitz et al., 2014), indicating that dwarfism, at least in the context of this blockage in phenylpropanoid metabolism, is regulated through MED5 presumably via transcription and is not a direct effect of lignin deficiency. In an otherwise wild-type background, the loss of MED5a and MED5b leads to increased phenylpropanoid metabolism (Bonawitz et al., 2012) providing further evidence of the role of MED5 in phenylpropanoid homeostasis. In the dwarf 4cl mutants we only observed a 20% reduction in lignin content. It is possible that when 4CL is blocked, MED5 could sense either a metabolic or cell wall change via an unknown mechanism, and this signal then triggers phenotypical consequences such as dwarfism, presumably via transcriptional regulation. 70

In addition to the lignin deficiency phenotype, we observed changes in lignin composition in the 4cl mutants. In a wild-type plant, it is typical that the mol% of G lignin is 75% and S lignin is 25%. But in a 4cl1 single mutant we observed that mol% of G lignin becomes 60% and S becomes 30%. This change is more obvious in a 4cl1 4cl2 4cl3 mutant in that the plant has 50% G lignin and 40% S lignin (Fig. 3-3B). Although we do not understand the mechanism behind this change, similar phenotypes have been reported before. Antisense mutants generated in Arabidopsis based on homology to Parsley 4CL that contain 10-20% residue 4CL activity show a decrease in mol% of G lignin compared to wild type (Lee et al., 1997). A 4cl1 single mutant is also reported to show similar phenotype in a large scale study (Van Acker et al., 2013). It has been proposed that the antisense knockdown may be preferentially affecting one 4CL isoform over the others. Our results in the 4cl1 mutant are consistent with this hypothesis. G lignin is preferentially deposited in the xylem and S lignin is primarily deposited in the interfasicular tissues (Chen et al., 2002). It is possible that 4CL1 has a major role in the deposition of G lignin, consistent with its expression pattern in the xylem (Fig. 3-2E), although we still cannot rule out the possibility that there is an alternative pathway leading to S lignin deposition independent of 4CL. The change in lignin monomer deposition has been reported in other plant species. In a study where 4CL antisense knock-down lines were generated in cottonwood, not only did they observe a decrease in lignin content but also found that the mol% of G lignin monomer is decreased compared to wild type (Xiang et al., 2015). Similar observations have also been reported in switch grass (Xu et al., 2011). Although the lignin monomer composition was not changed in poplar 4cl1 mutant with 10% total 4CL activity and a 40% reduction in total lignin content (Li et al., 2003). Further, unlike 4CL which is encoded by four isoforms, the C4H mutants in Arabidopsis show not only total lignin content change but also a decreased in the mol% of G lignin monomer (Schilmiller et al., 2009). In this case, we cannot explain the change by the differential impact on different isoforms. Rather, it is possible that missense (not knockout) nature of the mutation in C4H mutants 71

leads to a differential deposition of G and S lignin. We still do not know if the lignin composition change in C4H and 4CL mutants is caused by the same mechanism or not, thus further understanding of the mechanism(s) is necessary and may be applicable in a broader range of plant species for future applications for bioengineering. In summary, the four isoforms of Arabidopsis 4CLs have overlapping yet distinctive roles in phenylpropanoid metabolism, probably due to their diverged expression patterns. The subfunctionalization of Arabidopsis 4CL isoforms adds further biochemical understanding that could be broadly applicable in other plant species. Further, this study revises previous predictions and extends our understanding of the roles of the 4CL isoforms in soluble metabolism and lignin biosynthesis. This knowledge may assist in further rational engineering of the phenylpropanoid pathway.

Materials and methods

Plant Material and Growth Conditions Soil grown Arabidopsis thaliana plants were grown in a growth chamber during a 16 hour light and 8 hour dark cycle under a light intensity of 120 µE sec-1 m-2 at 22 oC (Redi-Earth Plug and Seedling Mixture; Sun Gro Horticulture, Bellevue, WA, USA). The soil was supplemented with Scotts Osmocote Plus controlled release fertilizer (Hummert International, Earth City, MO, USA). For plate grown Arabidopsis plants, seeds were surface sterilized at room temperature while shaking for 30 min in a mixture of 2 parts 0.1 % Triton X-100 and 1 part household bleach. Seeds were washed five times with sterile water and plated onto ammonia-free Murashige and Skoog, MS, media (Murashige and Skoog, 1962). The plants were then cold treated for 2 days in the dark before placed in continuous light chamber or 16 hour light/ 8 hour dark chamber at 22 oC. For anthocyanin induction experiments, Arabidopsis seedlings were grown on MS media supplemented with 2% sucrose under continuous light for three days. 72

Phylogenetic analysis The phylogenetic tree of plant 4CL homologs was constructed using MRBAYES version 3.1.1 (Huelsenbeck and Ronquist, 2001), and the protein sequences were aligned using MEGA version 5 (Tamura et al., 2011). The protein sequences were obtained from the National Center for Biotechnology information (NCBI) using BLAST. The sequences are listed in Table 3-1.

GUS analysis 2548bp of 4CL1, 952bp of 4CL2, 1589bp of 4CL3 and 2416bp of 4CL4 promoters were cloned into pHGWFS7 destination vector that contains the GUS gene (Karimi et al., 2002) using Gateway® technology (Invitrogen).The plasmids were transformed into Arabidopsis using agrobacterium (Karimi et al., 2002). T3 generation plants were used for analysis. The histochemical GUS assay was performed at 37 °C for 24 hours in 50 mM sodium phosphate buffer at pH 7 which contains 0.5 mg mL-1 X-Gluc (5-bromo-4-chloro-3-indolyl β-D-glucuronide), 0.5 mM potassium ferricyanide, 0.5 mM potassium ferrocyanide, and 0.3% Triton X-100. Then the samples were washed with 70% ethanol overnight (Jefferson et al., 1987). The samples were visualized by either a dissecting microscope or light microscope.

Genotyping The T-DNA insertional lines were obtained from the Arabidopsis Biological Resource Center (ABRC). Homozygous T-DNA insertional mutants were identified by genotyping PCR and used for later analysis. WiscDsLox473B01 is designated to be the 4cl1 mutant. LP, RP and border primers used were cc 3991, cc 3992 and cc4208. Salk_110197 is designated to be the 4cl2 mutant. LP, RP and border primers used were cc 4123, cc 4124 and cc4125. Sail_636_b07 is designated to be the 4cl3 mutant. LP, RP and border primers used were cc 4036, cc 4364 and cc1973. Salk_063824 is designated to be the 4cl4 mutant. LP, RP and border primers used were cc 4365, cc 4366 and cc4125. 73

Soluble metabolite analysis Flavonol glycosides and sinapoylmalate were extracted from rosette or inflorescence with 50% MeOH at 100 mg fresh weight mL-1 for 30 minutes at 65 oC. The extracts were separated by HPLC using a Shim-pack XR-ODS column (Shimadzu, Kyoto, Japan; column dimensions 3.0 mm I.D.x 75 mm, bead size 2.2 µm). The gradient for HPLC separation was 2.0 % to 25 % acetonitrile during 29.5 min in 0.1% formic acid at a flow rate of 0.9 mL min-1. Anthocyanins in seedlings were extracted with 50 % methanol with 4% acetic acid solution (100 mg fresh weight mL-1) using an acetonitrile gradient from 10 % to 20 % for 10.5 min in 10 % formic acid at a flow rate of 0.7 mL min-1. Kaempferol, sinapic acid and cyanidin were used as standard for quantification of flavonol glycosides, sinapoylmalate and anthocyanins, respectively.

Lignin analysis For plant cell wall preparation, dried Arabidopsis inflorescence tissue was ground to powder in liquid nitrogen as described previously (Meyer et al., 1998). The acetyl bromide content was determined as described in (Fukushima and Hatfield, 2001). DFRC lignin content was determined as described in (Peng et al., 1998).

4CL enzyme assay 4CL activity assays were conducted according to conditions described in (Klempien et al., 2012) with slight modifications: 25 µL of final volume contained

100 mM Tris-HCl pH 8.0, 5mM MgCl2, 5mM ATP, 25 µM CoA and 4 mM p- coumaric acid. The reaction was started by adding 5 µL of desalted protein extract from plants and continued for 40 minutes at room temperature. The reaction was stopped by adding 4 µL of 40% TCA. Samples were then spun down and the supernatant was analyzed by HPLC using a SB-C18 column (4.6 x 250 mm) and a MeOH gradient 10%-40% over 20 minutes with 50 mM potassium phosphate buffer at pH 7.0 as mobile phase. Protein concentration 74 was determined by the Bicinchoninic Acid assay method (www.piercenet.com) using bovine serum albumin (Sigma Aldrich St. Louis MO) as standard.

75

References

Bassard JE, Richert L, Geerinck J, Renault H, Duval F, Ullmann P, Schmitt M, Meyer E, Mutterer J, Boerjan W, De Jaeger G, Mely Y, Goossens A, Werck-Reichhart D (2012) Protein-protein and protein-membrane associations in the lignin pathway. Plant Cell 24: 4465-4482 Block A, Widhalm JR, Fatihi A, Cahoon RE, Wamboldt Y, Elowsky C, Mackenzie SA, Cahoon EB, Chapple C, Dudareva N, Basset GJ (2014) The origin and biosynthesis of the benzenoid moiety of ubiquinone (Coenzyme Q) in Arabidopsis. Plant Cell 26: 1938-1948 Boerjan W, Ralph J, Baucher M (2003) Lignin biosynthesis. Annu Rev Plant Biol 54: 519-546 Bonawitz ND, Chapple C (2010) The genetics of lignin biosynthesis: connecting genotype to phenotype. Annu Rev Genet 44: 337-363 Bonawitz ND, Chapple C (2013) Can genetic engineering of lignin deposition be accomplished without an unacceptable yield penalty? Curr Opin Biotechnol 24: 336-343 Bonawitz ND, Kim JI, Tobimatsu Y, Ciesielski PN, Anderson NA, Ximenes E, Maeda J, Ralph J, Donohoe BS, Ladisch M, Chapple C (2014) Disruption of Mediator rescues the stunted growth of a lignin-deficient Arabidopsis mutant. Nature 509: 376-380 Bonawitz ND, Soltau WL, Blatchley MR, Powers BL, Hurlock AK, Seals LA, Weng JK, Stout J, Chapple C (2012) REF4 and RFR1, subunits of the transcriptional coregulatory complex Mediator, are required for phenylpropanoid homeostasis in Arabidopsis. J Biol Chem 287: 5434- 5445 Broun P (2005) Transcriptional control of flavonoid biosynthesis: a complex network of conserved regulators involved in multiple aspects of differentiation in Arabidopsis. Curr Opin Plant Biol 8: 272-279 Chapple CC, Vogt T, Ellis BE, Somerville CR (1992) An Arabidopsis mutant defective in the general phenylpropanoid pathway. Plant Cell 4: 1413- 1424 Chen HY, Babst BA, Nyamdari B, Hu H, Sykes R, Davis MF, Harding SA, Tsai CJ (2014) Ectopic expression of a loblolly pine class II 4- coumarate:CoA ligase alters soluble phenylpropanoid metabolism but not lignin biosynthesis in Populus. Plant Cell Physiol 55: 1669-1678 Chen L, Auh C, Chen F, Cheng X, Aljoe H, Dixon RA, Wang Z (2002) Lignin deposition and associated changes in anatomy, enzyme activity, gene expression, and ruminal degradability in stems of tall fescue at different developmental stages. J Agric Food Chem 50: 5558-5565 Coleman HD, Park JY, Nair R, Chapple C, Mansfield SD (2008) RNAi- mediated suppression of p-coumaroyl-CoA 3'-hydroxylase in hybrid poplar impacts lignin deposition and soluble secondary metabolism. Proc Natl Acad Sci U S A 105: 4501-4506 76

Costa MA, Bedgar DL, Moinuddin SG, Kim KW, Cardenas CL, Cochrane FC, Shockey JM, Helms GL, Amakura Y, Takahashi H, Milhollan JK, Davin LB, Browse J, Lewis NG (2005) Characterization in vitro and in vivo of the putative multigene 4-coumarate:CoA ligase network in Arabidopsis: syringyl lignin and sinapate/sinapyl alcohol derivative formation. Phytochemistry 66: 2072-2091 Cukovic D, Ehlting J, VanZiffle JA, Douglas CJ (2001) Structure and evolution of 4-coumarate:coenzyme A ligase (4CL) gene families. Biol Chem 382: 645-654 Dixon RA, Paiva NL (1995) Stress-induced phenylpropanoid metabolism. Plant Cell 7: 1085-1097 Do CT, Pollet B, Thevenin J, Sibout R, Denoue D, Barriere Y, Lapierre C, Jouanin L (2007) Both caffeoyl Coenzyme A 3-O-methyltransferase 1 and caffeic acid O-methyltransferase 1 are involved in redundant functions for lignin, flavonoids and sinapoyl malate biosynthesis in Arabidopsis. Planta 226: 1117-1129 Dobritsa AA, Geanconteri A, Shrestha J, Carlson A, Kooyers N, Coerper D, Urbanczyk-Wochniak E, Bench BJ, Sumner LW, Swanson R, Preuss D (2011) A large-scale genetic screen in Arabidopsis to identify genes involved in pollen exine production. Plant Physiol 157: 947-970 Ehlting J, Buttner D, Wang Q, Douglas CJ, Somssich IE, Kombrink E (1999) Three 4-coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. Plant J 19: 9-20 Ehlting J, Shin JJ, Douglas CJ (2001) Identification of 4-coumarate:coenzyme A ligase (4CL) substrate recognition domains. Plant J 27: 455-465 Franke R, Hemm MR, Denault JW, Ruegger MO, Humphreys JM, Chapple C (2002) Changes in secondary metabolism and deposition of an unusual lignin in the ref8 mutant of Arabidopsis. Plant J 30: 47-59 Franke R, Humphreys JM, Hemm MR, Denault JW, Ruegger MO, Cusumano JC, Chapple C (2002) The Arabidopsis REF8 gene encodes the 3- hydroxylase of phenylpropanoid metabolism. Plant J 30: 33-45 Fukushima RS, Hatfield RD (2001) Extraction and isolation of lignin for utilization as a standard to determine lignin concentration using the acetyl bromide spectrophotometric method. J Agric Food Chem 49: 3133-3139 Hahlbrock K, Grisebach H (1979) Enzymic controls in the biosynthesis of lignin and flavonoids. Annual Review of Plant Physiology and Plant Molecular Biology 30: 105-130 Hamberger B, Hahlbrock K (2004) The 4-coumarate:CoA ligase gene family in Arabidopsis thaliana comprises one rare, sinapate-activating and three commonly occurring isoenzymes. Proc Natl Acad Sci U S A 101: 2209- 2214 Hawkins S, Boudet A (1996) Wound-induced lignin and suberin deposition in a woody angiosperm (Eucalyptus gunnii Hook.): Histochemistry of early changes in young plants. Protoplasma 191: 96-104 77

Hu WJ, Harding SA, Lung J, Popko JL, Ralph J, Stokke DD, Tsai CJ, Chiang VL (1999) Repression of lignin biosynthesis promotes cellulose accumulation and growth in transgenic trees. Nat Biotechnol 17: 808-812 Huang J, Gu M, Lai Z, Fan B, Shi K, Zhou YH, Yu JQ, Chen Z (2010) Functional analysis of the Arabidopsis PAL gene family in plant growth, development, and response to environmental stress. Plant Physiol 153: 1526-1538 Huelsenbeck JP, Ronquist F (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754-755 Jefferson RA, Kavanagh TA, Bevan MW (1987) Gus fusions: beta- glucuronidase as a sensitive and versatile gene fusion marker in higher- plants. Embo Journal 6: 3901-3907 Kajita S, Katayama Y, Omori S (1996) Alterations in the biosynthesis of lignin in transgenic plants with chimeric genes for 4-coumarate: coenzyme A ligase. Plant Cell Physiol 37: 957-965 Karimi M, Inze D, Depicker A (2002) GATEWAYTM vectors for agrobacterium- mediated plant transformation. Trends in Plant Science 7: 193-195 Kienow L, Schneider K, Bartsch M, Stuible HP, Weng H, Miersch O, Wasternack C, Kombrink E (2008) Jasmonates meet fatty acids: functional analysis of a new acyl-coenzyme A synthetase family from Arabidopsis thaliana. Journal of Experimental Botany 59: 403-419 Klempien A, Kaminaga Y, Qualley A, Nagegowda DA, Widhalm JR, Orlova I, Shasany AK, Taguchi G, Kish CM, Cooper BR, D'Auria JC, Rhodes D, Pichersky E, Dudareva N (2012) Contribution of CoA Ligases to benzenoid biosynthesis in petunia flowers. Plant Cell 24: 2015-2030 Knobloch KH, Hahlbrock K (1975) Isoenzymes of p-coumarate:CoA ligase from soybean cell-suspension cultures. Planta Medica: 102-106 Koopman F, Beekwilder J, Crimi B, van Houwelingen A, Hall RD, Bosch D, van Maris AJ, Pronk JT, Daran JM (2012) De novo production of the flavonoid naringenin in engineered Saccharomyces cerevisiae. Microb Cell Fact 11: 155 Koukol J, Conn EE (1961) The metabolism of aromatic compounds in higher plants. IV. Purification and properties of the phenylalanine deaminase of Hordeum vulgare. J Biol Chem 236: 2692-2698 Kubasek WL, Shirley BW, McKillop A, Goodman HM, Briggs W, Ausubel FM (1992) Regulation of flavonoid biosynthetic genes in germinating Arabidopsis seedlings. Plant Cell 4: 1229-1236 Landry LG, Chapple CC, Last RL (1995) Arabidopsis mutants lacking phenolic sunscreens exhibit enhanced ultraviolet-B injury and oxidative damage. Plant Physiol 109: 1159-1166 Lee D, Ellard M, Wanner LA, Davis KR, Douglas CJ (1995) The Arabidopsis thaliana 4-coumarate:CoA ligase (4CL) gene: stress and developmentally regulated expression and nucleotide sequence of its cDNA. Plant Mol Biol 28: 871-884 78

Lee D, Meyer K, Chapple C, Douglas CJ (1997) Antisense suppression of 4- coumarate:coenzyme A ligase activity in Arabidopsis leads to altered lignin subunit composition. Plant Cell 9: 1985-1998 Li L, Zhou Y, Cheng X, Sun J, Marita JM, Ralph J, Chiang VL (2003) Combinatorial modification of multiple lignin traits in trees through multigene cotransformation. Proc Natl Acad Sci U S A 100: 4939-4944 Li X, Bonawitz ND, Weng JK, Chapple C (2010) The growth reduction associated with repressed lignin biosynthesis in Arabidopsis thaliana is independent of flavonoids. Plant Cell 22: 1620-1632 Li ZB, Li CF, Li J, Zhang YS (2014) Molecular cloning and functional characterization of two divergent 4-coumarate : coenzyme A ligases from Kudzu (Pueraria lobata). Biol Pharm Bull 37: 113-122 Lim EK, Li Y, Parr A, Jackson R, Ashford DA, Bowles DJ (2001) Identification of glucosyltransferase genes involved in sinapate metabolism and lignin synthesis in Arabidopsis. J Biol Chem 276: 4344-4349 Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155 Meyer K, Shirley AM, Cusumano JC, Bell-Lelong DA, Chapple C (1998) Lignin monomer composition is determined by the expression of a cytochrome P450-dependent monooxygenase in Arabidopsis. Proc Natl Acad Sci U S A 95: 6619-6623 Murashige T, Skoog F (1962) A revised medium for rapid growth and bio assays with tobacco tissue cultures. Physiologia Plantarum 15: 473-497 Peng J, Lu F, Ralph J (1998) The DFRC method for lignin analysis. 4. Lignin dimers Isolated from DFRC-degraded loblolly pine wood. J Agric Food Chem 46: 553-560 Rensing SA (2014) Gene duplication as a driver of plant morphogenetic evolution. Curr Opin Plant Biol 17: 43-48 Rittinger PA, Biggs AR, Peirson DR (1987) Histochemistry of lignin and suberin deposition in boundary-layers formed after wounding in various plant-species and organs. Canadian Journal of Botany-Revue Canadienne De Botanique 65: 1886-1892 Rohde A, Morreel K, Ralph J, Goeminne G, Hostyn V, De Rycke R, Kushnir S, Van Doorsselaere J, Joseleau JP, Vuylsteke M, Van Driessche G, Van Beeumen J, Messens E, Boerjan W (2004) Molecular phenotyping of the pal1 and pal2 mutants of Arabidopsis thaliana reveals far-reaching consequences on phenylpropanoid, amino acid, and carbohydrate metabolism. Plant Cell 16: 2749-2771 Russell DW (1971) The metabolism of aromatic compounds in higer plants. X. Properties of the cinnamic acid 4-hydroxylase of pea seedlings and some aspects of its metabolic and developmental control. J Biol Chem 246: 3870-3878 Russell DW, Conn EE (1967) The cinnamic acid 4-hydroxylase of pea seedlings. Arch Biochem Biophys 122: 256-258 79

Saballos A, Sattler SE, Sanchez E, Foster TP, Xin Z, Kang C, Pedersen JF, Vermerris W (2012) Brown midrib2 (Bmr2) encodes the major 4- coumarate:coenzyme A ligase involved in lignin biosynthesis in sorghum (Sorghum bicolor (L.) Moench). Plant J 70: 818-830 Saito K, Yonekura-Sakakibara K, Nakabayashi R, Higashi Y, Yamazaki M, Tohge T, Fernie AR (2013) The flavonoid biosynthetic pathway in Arabidopsis: structural and genetic diversity. Plant Physiol Biochem 72: 21-34 Schilmiller AL, Stout J, Weng JK, Humphreys J, Ruegger MO, Chapple C (2009) Mutations in the cinnamate 4-hydroxylase gene impact metabolism, growth and development in Arabidopsis. Plant J 60: 771-782 Schneider K, Hovel K, Witzel K, Hamberger B, Schomburg D, Kombrink E, Stuible HP (2003) The substrate specificity-determining amino acid code of 4-coumarate:CoA ligase. Proc Natl Acad Sci U S A 100: 8601-8606 Schoch G, Goepfert S, Morant M, Hehn A, Meyer D, Ullmann P, Werck- Reichhart D (2001) CYP98A3 from Arabidopsis thaliana is a 3'- hydroxylase of phenolic esters, a missing link in the phenylpropanoid pathway. J Biol Chem 276: 36566-36574 Shirley BW, Kubasek WL, Storz G, Bruggemann E, Koornneef M, Ausubel FM, Goodman HM (1995) Analysis of Arabidopsis mutants deficient in flavonoid biosynthesis. Plant J 8: 659-671 Sibout R, Eudes A, Mouille G, Pollet B, Lapierre C, Jouanin L, Seguin A (2005) CINNAMYL ALCOHOL DEHYDROGENASE-C and -D are the primary genes involved in lignin biosynthesis in the floral stem of Arabidopsis. Plant Cell 17: 2059-2076 Soltani BM, Ehlting J, Hamberger B, Douglas CJ (2006) Multiple cis- regulatory elements regulate distinct and complex patterns of developmental and wound-induced expression of Arabidopsis thaliana 4CL gene family members. Planta 224: 1226-1238 Stuible HP, Kombrink E (2001) Identification of the substrate specificity- conferring amino acid residues of 4-coumarate:coenzyme A ligase allows the rational design of mutant enzymes with new catalytic properties. J Biol Chem 276: 26893-26897 Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S (2011) MEGA5: Molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Molecular Biology and Evolution 28: 2731-2739 Tanaka Y, Sasaki N, Ohmiya A (2008) Biosynthesis of plant pigments: anthocyanins, betalains and carotenoids. Plant J 54: 733-749 Van Acker R, Vanholme R, Storme V, Mortimer JC, Dupree P, Boerjan W (2013) Lignin biosynthesis perturbations affect secondary cell wall composition and saccharification yield in Arabidopsis thaliana. Biotechnol Biofuels 6: 46

80

Vanholme R, Cesarino I, Rataj K, Xiao Y, Sundin L, Goeminne G, Kim H, Cross J, Morreel K, Araujo P, Welsh L, Haustraete J, McClellan C, Vanholme B, Ralph J, Simpson GG, Halpin C, Boerjan W (2013) Caffeoyl shikimate esterase (CSE) is an enzyme in the lignin biosynthetic pathway in Arabidopsis. Science 341: 1103-1106 Voelker SL, Lachenbruch B, Meinzer FC, Jourdes M, Ki C, Patten AM, Davin LB, Lewis NG, Tuskan GA, Gunter L, Decker SR, Selig MJ, Sykes R, Himmel ME, Kitin P, Shevchenko O, Strauss SH (2010) Antisense down-regulation of 4CL expression alters lignification, tree growth, and saccharification potential of field-grown poplar. Plant Physiol 154: 874-886 Wagner A, Donaldson L, Kim H, Phillips L, Flint H, Steward D, Torr K, Koch G, Schmitt U, Ralph J (2009) Suppression of 4-coumarate-CoA ligase in the coniferous gymnosperm Pinus radiata. Plant Physiol 149: 370-383 Weisshaar B, Jenkins GI (1998) Phenylpropanoid biosynthesis and its regulation. Curr Opin Plant Biol 1: 251-257 Weng JK, Li Y, Mo H, Chapple C (2012) Assembly of an evolutionarily new pathway for alpha-pyrone biosynthesis in Arabidopsis. Science 337: 960- 964 Winkel-Shirley B (2001) Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiol 126: 485-493 Xiang Z, Sen S, Roy A, Min D, Savithri D, Jameel H, Chiang V, Chang H (2015) Wood characteristics and enzymatic saccharification efficiency of field-grown transgenic black cottonwood with altered lignin content and structure. Cellulose 22: 683-693 Xu B, Escamilla-Trevino LL, Sathitsuksanoh N, Shen Z, Shen H, Zhang YH, Dixon RA, Zhao B (2011) Silencing of 4-coumarate:coenzyme A ligase in switchgrass leads to reduced lignin content and improved fermentable sugar yields for biofuel production. New Phytol 192: 611-625

81

Table 3-1. List of sequences in phylogenetic tree

species accession species accession Allium cepa AAS48417.1 Phaseolus vulgaris AGW47870.1 Amborella trichopoda ERN07293.1 Phaseolus vulgaris ESW22826.1 Arabidopsis lyrata CAP08804.1 Phaseolus vulgaris ESW26770.1 Arabidopsis lyrata subsp. lyrata XP_002888305.1 Phaseolus vulgaris ESW29061.1 Arabidopsis thaliana NP_176686.1 Physcomitrella patens ABV60447.1 Arabidopsis thaliana NP_175579.1 Physcomitrella patens ABV60448.1 Arabidopsis thaliana NP_188761.1 Physcomitrella patens ABV60449.1 Arabidopsis thaliana NP_188760.3 Physcomitrella patens ABV60450.1 Arnebia euchroma ABD59789.2 Picea sitchensis ABR17998.1 Betula luminifera ACJ38668.1 Pinus massoniana ACO40513.1 Betula platyphylla AAV65114.1 Pinus radiata ACF35279.1 Brachypodium distachyon XP_003570153.1 Pinus taeda AAA92669.1 Camellia sinensis ABA40922.1 Pinus taeda AGX45528.1 Cannabis sativa AHA42444.1 Platycodon grandiflorus AEM63673.1 Capsella rubella EOA40053.1 Populus deltoides AAZ30033.1 Capsella rubella XP_006297290.1 Populus nigra AEN02915.1 Capsella rubella XP_006297366.1 Populus tomentosa AAL02144.1 Capsella rubella XP_006302083.1 Populus tomentosa AAL56850.1 Capsella rubella XP_006307155.1 Populus tomentosa AAY84731.1 Capsicum annuum AAG43823.1 Populus tomentosa 3A9U Capsicum annuum ACF17632.1 Populus tomentosa AFC89538.1 Cicer arietinum XP_004486593.1 Populus tomentosa AFC89540.1 Cicer arietinum XP_004486593.1 Populus tomentosa AFC89541.1 Cicer arietinum XP_004507544.1 Populus tomentosa AFZ78520.1 Cicer arietinum XP_004511423.1 Populus tremuloides AAC24503.1 Cicer arietinum XP_004511424.1 Populus tremuloides AAC24504.1 Cinnamomum osmophloeum AFG26323.1 Populus trichocarpa XP_002297699.1 Cinnamomum osmophloeum AFG26324.1 Populus trichocarpa XP_002329649.1 Citrus clementina ESR40954.1 Populus trichocarpa XP_002324477.1 Citrus clementina ESR56437.1 Populus trichocarpa XP_002325815.1 Citrus clementina ESR66633.1 Populus trichocarpa EEF03042.2 Coffea arabica AFP49810.1 Populus trichocarpa ERP59614.1 Coffea arabica AFP49811.1 Populus trichocarpa EEE82504.2 Corchorus capsularis ADE95828.1 Populus trichocarpa EEE82504.2 Cucumis sativus XP_004135893.1 Populus trichocarpa EEE82504.2 Populus trichocarpa x Populus Cucumis sativus XP_004135894.1 deltoides AAC39365.1 Populus trichocarpa x Populus Cucumis sativus XP_004137646.1 deltoides AAC39366.1 Populus trichocarpa x Populus Cucumis sativus XP_004151908.1 deltoides AAK58908.1 Populus trichocarpa x Populus Cucumis sativus XP_004166853.1 deltoides AAK58909.1 Cucumis sativus XP_004166854.1 Prunus avium ADZ54779.1 Cucumis sativus XP_004172312.1 Prunus persica EMJ16718.1 Cunninghamia lanceolata AFX98059.1 Prunus persica EMJ23441.1 Eucalyptus camaldulensis AAZ79469.1 Pueraria montana var. lobata AGW16013.1 Eucalyptus camaldulensis ACX68559.1 Pueraria montana var. lobata AGW16014.1 Eucalyptus camaldulensis ACY66928.1 Punica granatum AFU90743.1 Eucalyptus globulus subsp. globulus BAI47543.1 Pyrus pyrifolia AFY97681.1 82

Eutrema salsugineum ESQ30298.1 Pyrus pyrifolia AFY97682.1 Eutrema salsugineum ESQ38855.1 Ricinus communis EEF49284.1 Eutrema salsugineum ESQ47749.1 Ricinus communis XP_002512781.1 Eutrema salsugineum ESQ47750.1 Ricinus communis XP_002520028.1 Eutrema salsugineum ESQ47751.1 Ricinus communis XP_002533186.1 Fragaria vesca subsp. vesca XP_004303212.1 Ricinus communis XP_002533186.1 Fragaria vesca subsp. vesca XP_004307864.1 Rubus idaeus AAF91308.1 Fragaria vesca subsp. vesca XP_004309949.1 Rubus idaeus AAF91309.1 Galega orientalis ACZ64784.1 Rubus idaeus AAF91310.1 Galtonia saundersiae AGW24660.1 Ruta graveolens ABY60842.1 Glycine max P31687.2 Ruta graveolens ABY60843.1 Glycine max NP_001236236.1 Ruta graveolens ABY81910.1 Glycine max NP_001237270.1 Ruta graveolens ABY81911.1 Glycine max XP_003538929.1 Salix arbutifolia AGO89320.1 Glycine max XP_003542129.1 Salix arbutifolia AGO89321.1 Glycine max XP_003550658.1 Salvia miltiorrhiza AAP68991.1 Glycine max XP_003550660.1 Salvia miltiorrhiza AGW27193.1 Gossypium hirsutum ACT32027.1 Selaginella moellendorffii XP_002969881.1 Gossypium hirsutum ACZ06243.1 Selaginella moellendorffii XP_002979073.1 Gossypium hirsutum ADU15550.1 Selaginella moellendorffii XP_002985214.1 Gynura bicolor BAJ17664.1 Selaginella moellendorffii XP_002986545.1 Hibiscus cannabinus ADK24217.1 Setaria italica XP_004953555.1 Hibiscus cannabinus AGJ84134.1 Setaria italica XP_004953556.1 Humulus lupulus ACM69363.1 Solanum lycopersicum XP_004235294.1 Humulus lupulus AGA17918.1 Solanum lycopersicum XP_004235870.1 Ipomoea batatas BAG82851.1 Solanum lycopersicum XP_004241647.1 Lilium hybrid cultivar ADO22687.1 Solanum lycopersicum XP_004242790.1 Linum usitatissimum AGO50638.1 Solanum tuberosum P31684.1 Lithospermum erythrorhizon BAA08365.1 Solanum tuberosum AAD40664.1 Lithospermum erythrorhizon BAA08366.2 Solanum tuberosum XP_006341437.1 Lolium perenne AAF37732.1 Solanum tuberosum XP_006347575.1 Lonicera japonica AGE10594.1 Solanum tuberosum XP_006366277.1 Malus hybrid cultivar AFG30056.1 Solanum tuberosum XP_006366955.1 Medicago truncatula XP_003604127.1 Solanum tuberosum XP_006366955.1 Medicago truncatula XP_003610843.1 Sorbus aucuparia ADE96996.1 Medicago truncatula XP_003637266.1 Sorbus aucuparia ADE96997.1 Medicago truncatula AFK47607.1 Sorbus aucuparia ADF30254.1 Melissa officinalis CBJ23825.1 Sorghum bicolor XP_002452704.1 Nicotiana tabacum BAA07828.1 Thellungiella halophila BAJ33983.1 Nicotiana tabacum AAB18637.1 Theobroma cacao EOY10077.1 Nicotiana tabacum O24145.1 Theobroma cacao EOY27424.1 Nicotiana tabacum O24146.1 Theobroma cacao EOY31810.1 Ocimum basilicum AGP02119.1 Vitis vinifera CAN69130.1 Ocimum tenuiflorum ADO16242.1 Vitis vinifera CAN73910.1 Oryza sativa AAA69580.1 Vitis vinifera XP_002272782.1 Oryza sativa Indica Group EAY87168.1 Vitis vinifera XP_002269945.1 Oryza sativa Japonica Group NP_001047819.1 Vitis vinifera XP_002274994.1 Panicum virgatum ADZ96250.1 Vitis vinifera CBI26520.3 Paulownia fortunei ACL31667.1 Vitis vinifera AEX32786.1 Petroselinum crispum P14913.1 Zea mays NP_001105258.1 Petunia x hybrida AEO52694.1 Zea mays ACF84437.1 Phaseolus vulgaris AGW47866.1 Zea mays AFW63475.1 83

Figures

Figure 3-1. Bayesian phylogenetic analysis of 192 4CL homologs. Selected bootstrap values are shown. The colors represent different plant lineages: green, dicots; orange, monocots; blue, pine; purple; bryophyte and lycophyte. The split between the groups including 4CL3 and 4CL1, 2, 4 happened before the divergence of monocots and dicots. 84

Figure 3-2. Promoter GUS reporter gene analysis of 4CL gene expression. (A) ten day old seeding, (B) root tip of ten day old seeding, (C) two week old rosette, (D) three week old rosette, (E) cross section of five week old inflorescence stem, (F) five week old flower. 85

Figure 3-3. Measurements of soluble metabolites and enzyme activity in two week old rosette. (A) HPLC chromatograms of rosette methanol extracts with UV absorbance measured at 313 nm. 1,2, p-coumaroyl glucose; 3,4, p-coumaroyl malate; 5,6, feruloyl malate. (B) sinapoylmalate content quantification by HPLC. (C) flavonol content quantification by HPLC. (D) 4CL enzyme activity measuring product formation. Error bars represent at least three biological replicates. Statistical differences are calculated by One-way Anova and corrected by Dunnett’s multiple comparisons test, indicated by letters “a,b,c”. 86

Figure 3-4. Measurements of soluble metabolites and enzyme activity in five week old inflorescence. (A) HPLC chromatograms of rosette methanol extracts with UV absorbance measured at 313 nm. 1,2, p-coumaroyl glucose; 3,4, p- coumaroyl malate; 5,6, feruloyl malate. (B) sinapoylmalate content quantification by HPLC. (C) flavonol content quantification by HPLC. (D) 4CL enzyme activity measuring product formation. Error bars represent at least three biological replicates. Statistical differences are calculated by One-way Anova and corrected by Dunnett’s multiple comparisons test, indicated by letters “a”. 87

Figure 3-5. Plant growth and lignin deposition. (A) Acetyl bromide quantification of lignin content. “a” shows student’s t-test p<0.02. (B) DFRC lignin composition. Total G+S are ≥ 94% in all samples. (C) Plant images of 4cl1 4cl2, and 4cl1 4cl2 4cl3 mutants compared to wild type at eight weeks of age. (D) Plant heights at six weeks of age. Error bars represent three biological replicates. Statistical differences in (B) and (D) are calculated by One-way Anova and corrected by Dunnett’s multiple comparisons test, indicated by “a,b,c”. 88

Figure 3-6. Anthocyanin induction of Arabidopsis seedlings. Arabidopsis seedlings were grown under 2% sucrose stress and 24 hour continuous light. Anthocyanin contents were quantified by HPLC at day four. Error bars represent three biological replicates. Statistical differences are calculated by One-way Anova and corrected by Dunnett’s multiple comparisons test, indicated by “a”.

89

Figure 3-7. Current model of the roles of 4CL isoforms. The four isoforms of the 4CLS have overlapping yet distinct roles in phenylpropanoid metabolism. The relative heights of the letters in each metabolic category represent their relative functional contributions based on metabolic analysis and expression patterns. In lignin producing cell types, 4CL1 and 4CL2 are the most important. In flavonoid producing cells, 4CL3 has a dominant role and in sinapoylmalate producing cells, 4CL1 and 4CL3 are both major functional 4CLs. We could not find a role for 4CL4, at least in the organs examined (illustrated by hollowed font). 90

Figure 3-8. The phenylpropanoid pathway. 4CL acts at two steps (indicated in red) of the pathway and is required for the biosynthesis of lignin and soluble metabolites. In mutants containing at least the 4cl1 mutation, p-coumaroyl glucose, p-coumaroyl malate and feruloyl malate accumulate as atypical phenylpropanoids as indicated gray. List of abbreviated enzymes: phenylalanine ammonia lyase (PAL), cinnamate 4-hydroxylase (C4H), hydroxycinnamoyl CoA shikimate:quinate hydroxycinnamoyltransferase (HCT), p-coumaroylshikimate 3’- hydroxylase (C3’H), caffeoyl shikimate esterase (CSE), caffeic acid/5- hydroxyferulic acid O-methyltransferase (COMT), sinapic acid:UDP-glucose dependent sinapoyltransferase (SGT), sinapoylglucose malate:sinapoyltransferase (SMT).

VITA 91

VITA

Yi Li

175 S. University Street, Purdue University, West Lafayette, IN, USA 47907-2063 Phone: 484-767-3444 Email: [email protected] Education

Ph.D., Biochemistry, Purdue University, West Lafayette, IN, August, 2015

B.S., Biochemistry (major) Business management (minor), cum laude, Moravian College, Bethlehem, PA, May 2008

Research

2008-2015, Doctoral research. Department of Biochemistry, Purdue University, West Lafayette, IN. Advisor: Dr. Clint Chapple, Head and Distinguished Professor of Biochemistry. Project I: Understanding the novel biosynthesis of arabidopyrones in Arabidopsis thaliana. Project II: Arabidopsis 4-coumarate CoA Ligases (4CLs) have distinct roles in phenylpropanoid metabolism

Techniques: HPLC, LC-MS, stable-isotope labeling, molecular cloning, phylogenetic analysis and bioinformatics, protein expression, enzyme kinetics, and plant genetics

2005-2007, Undergraduate research. Department of Biochemistry, Moravian College, Bethlehem, PA. Advisor: Dr. Diane Husic, Head and Professor of Biochemistry. Project I: Preliminary studies of heavy metal tolerant plant species at the Lehigh Gap Superfund remediation site. Project II: Bioinformatics and enzymatic studies associated with D-lactate and glycolate metabolism in Chlamydomonas reinhardtii

Publications

Yi Li, Jeong Im Kim, Len Pysh, and Clint Chapple. Arabidopsis isoforms of 4CL have overlapping yet distinct functions in phenylpropanoid metabolism (submitted)

92

Jing-Ke Weng, Yi Li, Huaping Mo, Clint Chapple, Assembly of an evolutionarily new pathway for α-pyrone biosynthesis in Arabidopsis. Science 337, 960 (2012)

Honors and Awards

Beach Travel Award, Department of Biochemistry, Purdue University, West Lafayette, IN, 2014

Best Poster Award, 2nd place, Department of Biochemistry Retreat, Purdue University, West Lafayette, IN, 2010 “Understanding the biosynthesis of arabidopyrones in Arabidopsis thaliana”

Best Poster Award, 1st place, Banff Conference on Plant Metabolism. Banff, AB, Canada, June 2010 “Arabidopyrones are newly-identified phenylalanine-derived secondary metabolites in Arabidopsis thaliana”

Ross Fellowship, Purdue University, West Lafayette, IN, 2008-2009

Student Opportunities for Academic Research (SOAR) Scholarship, Moravian College, Bethlehem, PA, 2005-2007

Fanny Harrar Scholarship, Moravian College, Bethlehem, PA, 2004- 2008 International Student Grant, Moravian College, Bethlehem, PA, 2004- 2008

Professional Development

Seminars

Invited speaker, A Hitchhiker’s guide to the protein galaxy, a Purdue Symposium on Protein, West Lafayette, IN, May 2015

Department of Biochemistry Graduate and Post-doc Seminar Series, Purdue University, West Lafayette, IN, 2009-2014

Plant Science Inter-Departmental Seminar, Purdue University, West Lafayette, IN, Spring 2010

National Conference of Undergraduate Research (NCUR), University of North Carolina, Asheville, NC, 2006

93

Moravian College 1st Annual Scholars Day, Moravian College, Bethlehem, PA, 2006

Posters

Gordon Conference on Plant Molecular Biology, Holderness, NH, July 2014

Department of Biochemistry Retreat, Purdue University. Turkey Run State Park, IN, 2010, 2011, 2012 and 2013

Banff Conference on Plant Metabolism. Banff, AB, Canada, June 2010

Biochemistry Department 75th Anniversary Celebration. Purdue University, West Lafayette, IN, October 2009

Service and Leadership

Coordinator and Host, Graduate Student and Post-Doctorate Seminar Series, Department of Biochemistry, Purdue University, 2012-2014

Coordinator and Host for Special Seminars, Department of Biochemistry, Purdue University. “Searching for Faculty Positions” in 2012 and “Alternative PhD Career as Consultant” in 2014

Peer Reviewer for preliminary-exam proposals, reviewed and edited 14 graduate students’ written and oral proposals. 2011-2014

Lafayette Regional Science Fair Judge, Lafayette, Indiana, 2011, 2013 and 2014

Committee Member, Biochemistry Department Annual Recognition Meeting, Purdue University, Fall 2009

Committee Member, CSREES Review Graduate Learning Committee, Department of Biochemistry, Purdue University, Summer 2009

Teaching and Other Career Experience

Teaching Assistant, Biochemistry 100, Purdue University, Fall 2012

Substitute Teacher, English and Math for ESL students, Notre Dame High School, Easton, PA, Spring 2008 94

Tutor, Math and Science for high school students, Bethlehem, PA, 2006- 2007 Resident Advisor, Student Affairs, Moravian College, Bethlehem, PA, 2005-2008

Languages

Native proficiency in English and Mandarin Chinese