Enzyme Structure, Function, and Evolution in Flavonoid Biosynthesis

by

Geoffrey Liou

B.A. Molecular and Cell Biology University of California, Berkeley, 2013

Submitted to the Department of Biology in Partial Fulfillment of the Requirements for the Degree of

DOCTOR OF PHILOSOPHY

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

June 2019

© 2019 Massachusetts Institute of Technology. All rights reserved.

Signature of Author ______Geoffrey Liou Department of Biology May 24, 2019

Certified by ______Jing-Ke Weng Assistant Professor of Biology Thesis Supervisor

Accepted by ______Amy E. Keating Professor of Biology Co-Director, Biology Graduate Committee


Enzyme Structure, Function, and Evolution in Flavonoid Biosynthesis

by

Geoffrey Liou

Submitted to the Department of Biology on May 24, 2019 in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biology

Abstract

Plant specialized metabolism is a key evolutionary adaptation that has enabled plants to migrate from water onto land and subsequently spread throughout terrestrial environments. Flavonoids are one particularly important class of plant specialized metabolites, playing a wide variety of roles in plant physiology including UV protection, pigmentation, and defense against herbivores and pathogens. Flavonoid diversity has increased in conjunction with land plant evolution over the past 470 million years.

This dissertation examines the structure, function, and evolution of enzymes in the flavonoid biosynthetic pathway. First, we structurally and biochemically characterized orthologs of chalcone synthase (CHS), the enzyme that catalyzes the first step of flavonoid biosynthesis, from diverse plant lineages. By doing so, we gained insight into the sequence changes that gave rise to increased reactivity of the catalytic cysteine residue in CHS orthologs in euphyllophytes compared to basal land plants. We then developed methods and transgenic plant lines to study the in vivo function of these CHS orthologs, as well as whether their functional differences play a role in redox-based regulation of flavonoid biosynthesis. Finally, we examined enzymes involved in the biosynthesis of galloylated catechins, a highly enriched class of flavonoids in tea that are thought to have health benefits in humans. These findings contribute to an understanding of the evolution of enzyme structure and function in flavonoid biosynthesis, and how it has facilitated the adaptation of plants to a wide variety of terrestrial habitats.

Thesis Supervisor: Jing-Ke Weng Title: Assistant Professor of Biology


Acknowledgements

First and foremost, I would like to thank my thesis advisor, Jing-Ke Weng. I feel very fortunate to have joined the lab as part of the first group of students and to have watched it grow over the years. I remember being instantly captivated by the research when I saw your presentation to the first-year graduate students. It was a perfect alignment of my interests at the time, structural biology and plant biology, but my experience has turned out to be more than I could ever have imagined. Your enthusiasm for and breadth of knowledge in all aspects of biology have been inspiring and have taught me to keep learning and discover my passions. Your insight and support have been invaluable in helping me think more broadly and work past the difficult parts of my research.

To the members of the Weng Lab, I can’t imagine a better group of people to work with. Tim Fallon, you have been a great classmate, seat neighbor, labmate, roommate, and most of all friend throughout the years. I’ll never forget the first time I saw fireflies when we went to New Jersey to collect Photinus pyralis. Olesya Levsh, I’m so glad to have joined the lab together with you. Your kindness set the tone of the lab from the beginning and has left a mark in the form of many lab traditions. Joe Jacobowitz, thanks for all the fun game nights and putting up with our teasing. Sophia Xu, it has been great sharing Asian snacks and bonding as baymates despite our differences (Go Bears!). Bena Chan, thank you for warmly welcoming me to the lab during my rotation, and for staying in touch and all your help in your current position in the Metabolomics Core. Valentina Carballo, thank you for everything you do to keep the lab running and making everyone feel like part of a family. Fu-Shuang Li, it has been inspiring to see your dedication to your family and your work. Mike Spence, thanks for imparting your wisdom in and life experience over the years. Bastien Christ, thank you for sharing your knowledge of plants, fondue, and sense of adventure. Tomáš Pluskal, I have learned so much about metabolomics, mind-bending films, and life from you, and I aspire to be an international man of mystery like you. Roland Kersten, thanks for your help over the years and bringing some California spirit to the lab. Chengchao Xu, Andrew Mitchell, Yasmin Chau, Matthew Hill, Chris Glinkerman, Menglong Xu, Anastassia Bobokalonova, Amy Zhang, Sheena Vazquez, Brian Levine, Jack Liu, Michael Gutierrez, Naoki Wada, Colin Kim, and others: thank you for making this lab a fun place to work. It has been a pleasure and privilege to get to know such an interesting, diverse group of people.

Thanks to Biograd 2013 for all the memories. In particular, Chetan, Aneesha, Rachit, Emir, Nicole, and Amelie, it’s been great sharing our journey through grad school and getting together to relax and unwind. Thank you to my friends at MIT Japanese Lunch Table, in the Cal Alumni Club of New England, in the Boston area, in New York, back home in California, and in Japan, and also my extended family in Taiwan, for all the fun times and keeping me connected to the things I love outside of school. Special thanks to MIT Japan Program Director Chris Pilcavage, Miyuki-san, Masako-san, the Baber family, Joey, Kristine, Matthew, Mark, Diana, Sherman, Roger, Yuzo, Heechan, Willie, Sam, David, Alex, Daniel, Tomoya, Moka, Chihhi, Yuihan, and Dahyun, among many others.

Finally, I want to thank my family. Kerry and Zachary, it’s been great to watch you follow your own paths as we become independent adults. To my mother and father, thank you for everything you have done for us, first and foremost valuing our education, which has brought us to where we are today. We are very fortunate to have so much opportunity here in the United States, and I am grateful for all your sacrifice and hard work that has brought us here.

Table of Contents

Abstract
Acknowledgements
Table of Contents

Chapter 1. Introduction
  Overview of land plant evolution
  Plant specialized metabolism
  Flavonoid biosynthesis and diversity
  Type III polyketide synthases
  Applications of plant metabolic and enzyme engineering
  Concluding remarks
  References

Chapter 2. Mechanistic basis for the evolution of chalcone synthase catalytic cysteine reactivity in land plants
  Abstract
  Introduction
  Results
    Basal-plant CHSs contain reduced catalytic cysteine in their crystal structures
    Basal-plant CHSs only partially complement the Arabidopsis CHS-null mutant
    The pKa of the catalytic cysteine is higher in basal-plant CHSs than in euphyllophyte CHSs
    Residues near the active-site cavity affect the pKa and reactivity of the catalytic cysteine
    Molecular dynamics simulations reveal differences in active-site interactions between basal-plant and euphyllophyte CHSs
  Discussion
  Materials and Methods
  References
  Supporting Information

Chapter 3. Regulation of chalcone synthase activity in vivo by oxidation of the catalytic cysteine
  Abstract
  Introduction
  Results
    tt5 mutant Arabidopsis thaliana accumulates naringenin and can be used for metabolic tracing to measure CHS activity in vivo
    tt5 mutant Arabidopsis thaliana accumulates both enantiomers of naringenin
    Generation of and metabolic tracing with tt5 and mbs1-1 mutant Arabidopsis crosses
    The catalytic cysteine in AtCHS is more sensitive to in vitro oxidation than in SmCHS
    FLAG-tag purification and western blotting of CHS
    Generation and characterization of transgenic Arabidopsis thaliana lines expressing FLAG-tagged CHS orthologs
  Discussion and Future Directions
  Materials and Methods
  References

Appendix. Investigation of galloylated catechin biosynthetic enzymes in tea
  Abstract
  Introduction
  Results
    CsUGGT expression in Nicotiana benthamiana produces β-glucogallin
    Identification of ECGT candidate genes
    Nicotiana benthamiana leaf protein extraction fails to show ECGT activity
  Discussion and Future Directions
  Materials and Methods
  References

Chapter 1

Introduction

Overview of land plant evolution

Terrestrial life as it exists today was seeded and shaped by the transition of plants from water to land. Embryophytes, or land plants, evolved from the charophycean green algae approximately 470 million years ago (Kenrick & Crane, 1997). Life on land brought numerous challenges to plants previously ameliorated by an aquatic environment: ultraviolet radiation, desiccation, lack of structural support, and gas exchange. As other clades of life adapted to land, plants additionally needed to fend off pathogens and herbivores. To adapt to these previously unencountered stresses, plants have evolved an extraordinary array of physiological innovations.

Departure from aquatic life required plants to develop a way to prevent desiccation on land. The evolution of the cuticle, a thin layer of lipids and waxes on the epidermal cells of plants, allowed for the control of water loss from transpiration and gas exchange (Riederer & Muller, 2008). This innovation also allowed plants to concentrate the resources necessary for photosynthesis, enabling more efficient carbon fixation and energy production. Sporopollenin, an extremely chemically inert polymer, evolved to protect spores and pollen, allowing plants to reproduce without dependence on water and to disseminate their progeny over long distances, further facilitating their spread across land (Li, Phyo, Jacobowitz, Hong, & Weng, 2019).

Without the buoyant forces provided by water, plants needed to develop new systems of structural support. Lignin, a hydrophobic polymer, provided this rigidity and also enabled the development of vascular tissues to transport water over long distances (Weng & Chapple, 2010). These physical features combined to allow plants to grow much larger than previously possible, furthering their dominance over the terrestrial landscape. Lignin is now the second most abundant organic polymer, surpassed only by cellulose, illustrating the outsized importance of this metabolic innovation in shaping life on Earth (Boerjan, Ralph, & Baucher, 2003). The rise of vascular plants further shaped the biosphere: lignin provided a stable sink for carbon, and the development of roots led to increased weathering of Ca-Mg rocks (Berner, 1993). These processes both contributed to an 8- to 20-fold decrease in atmospheric CO2 (Harrison & Morris, 2018). The subsequent increase in atmospheric O2 enabled the diversification of many clades of animals, as well as the evolution of physiological features such as flight in insects and megaphyll leaves in plants (Beerling, Osborne, & Chaloner, 2001; Graham, Aguilar, Dudley, & Gans, 1995).

The evolution of seeds, which occurred roughly 320 million years ago, granted a variety of advantages that allowed embryophytes to reproduce successfully even in difficult environments: the seed coat protects the embryo from physical damage and herbivores, the endosperm provides a source of nutrients, and dormancy allows germination to be delayed until environmental conditions are favorable (Linkies, Graeber, Knight, & Leubner-Metzger, 2010).

The emergence of flowers approximately 130 million years ago provided a more efficient method of pollination. During the Cretaceous, angiosperms subsequently underwent extraordinary diversification and became the dominant flora on Earth, outnumbering all other land plants in species abundance. The majority of angiosperms today are pollinated by insects, serving as a prominent example of symbiosis and its far-reaching impact on evolution (Friis, Crane, & Pedersen, 2011).

Plant specialized metabolism

Evolutionary adaptation to new ecological niches was facilitated by not only anatomical adaptations but also metabolic innovations. As sessile organisms, plants cannot move away from stressors and instead have developed a vast chemical arsenal to adapt to their ecological niches.

In contrast to primary metabolism, which is conserved across all kingdoms of life, these biosynthetic pathways are often restricted to particular lineages of plants. These metabolites are also not strictly essential for survival, although they can greatly enhance the fitness of organisms in a particular niche. This limited distribution and non-essentiality led to these pathways initially being dubbed secondary metabolism, but as scientists came to better understand the roles of these diverse chemical compounds, they were renamed specialized metabolism (Moghe & Last, 2015).

Specialized metabolites act as pigments, flavors, scents, and defense compounds that mitigate biotic and abiotic stresses, attract pollinators and seed dispersers, deter herbivores, and defend against pathogens, among many other functions. Many of these compounds also show evidence of therapeutic effects in humans, giving rise to various traditions of herbal medicine around the world. A prominent example is artemisinin from sweet wormwood, or Artemisia annua; Tu Youyou was awarded the 2015 Nobel Prize in Physiology or Medicine for her work in identifying the active compound of this traditional Chinese medicine. Even modern medicine relies heavily on plant natural products: at least 25% of drugs used today are derived from plant specialized metabolites (Schmidt, Ribnicky, Lipsky, & Raskin, 2007).

Terpenoids, encompassing over 36,000 different compounds, are the largest class of plant specialized metabolites (Roberts, 2018). They exhibit enormous structural diversity in both the basic carbon skeleton (from relatively simple branched alkyl chains to polycyclic, bridged compounds) and subsequent decorations such as hydroxylation and oxidation. Due to their hydrophobic and volatile nature, terpenoids often function in signaling, which encompasses defense against herbivores, attraction of pollinators, growth inhibition of nearby plants, and more (Roberts, 2018).

Terpenoids are terpenes modified with additional functional groups. Common among all these varied structures is the basic C5 isoprene building block, which is synthesized by the mevalonic acid pathway or the methylerythritol phosphate pathway. These isoprene units are joined together by various prenyltransferases to form a linear intermediate, which is then cyclized by terpene cyclases that fold the intermediate into a conformation that facilitates cyclization. The resulting products are classified by their size: monoterpene (C10), sesquiterpene (C15), or diterpene (C20). Longer products such as triterpenes (C30) are synthesized from the linear intermediate squalene and cyclized by oxidosqualene cyclase (Thimmappa, Geisler, Louveau, O’Maille, & Osbourn, 2014). Subsequent modifications of terpenoids are performed by many classes of enzymes, such as cytochrome P450 enzymes and 2-oxoglutarate-dependent dioxygenases (Roberts, 2018).

Alkaloids are another large class of plant specialized metabolites, totaling about 12,000 compounds (Ziegler & Facchini, 2008). They are highly diverse in structure, with a heterocyclic nitrogen being the only common functional group. Accordingly, the various classes of alkaloids have diverse biosynthetic origins, such as the benzylisoquinolines derived from tyrosine, or purine alkaloids derived from purine nucleotides. Many alkaloids show pharmacological effects in humans, from anticancer compounds like vinblastine to stimulants like caffeine and nicotine.

Phenylpropanoids are another major class of plant specialized metabolites. They are found in all land plants and function in mediating many biotic and abiotic stresses and interactions (Vogt, 2010). Phenylpropanoid biosynthesis derives from the shikimate pathway, which produces the aromatic amino acid phenylalanine (Chapter 2, Figure 1). Phenylalanine is then deaminated by phenylalanine ammonia-lyase (PAL) to form cinnamic acid. Cinnamic acid 4-hydroxylase (C4H) then produces 4-coumarate (or p-coumarate), which is conjugated to coenzyme A by 4-coumarate:CoA ligase (4CL) to produce 4-coumaroyl-CoA (or p-coumaroyl-CoA).

A key branching point occurs at this step, where p-coumaroyl-CoA can be used by chalcone synthase to feed into flavonoid biosynthesis. The other branch is catalyzed by hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyltransferase (HCT), which reacts p-coumaroyl-CoA with shikimic acid to form p-coumaroyl shikimate (Levsh et al., 2016). This compound serves as the precursor for many important phenylpropanoid metabolites, including monolignols, the monomeric units of lignin. Other phenylpropanoids function as pigments, flavor compounds, phytoalexins, and so on (Vogt, 2010).

Flavonoid biosynthesis and diversity

Flavonoids are a diverse class of plant specialized metabolites found in all extant land plants. They serve many of the aforementioned roles important for plants’ survival on land: UV protection, pigmentation, defense, and communication with symbiotic microbes (Winkel-Shirley, 2001). Flavonoids have also garnered considerable interest for numerous potential health benefits in humans, including antioxidant, anticancer, cardioprotective, and anti-aging effects, which have been observed in a wide variety of studies ranging from cell culture experiments to mouse models to epidemiological studies (Yao et al., 2004).

Flavonoids consist of a C6-C3-C6 core structure, formally called a phenylbenzopyran moiety, with the three rings named A, B, and C and carbon atoms numbered as shown in Figure 1. Flavonoid biosynthesis derives from phenylpropanoid biosynthesis. Chalcone synthase (CHS) catalyzes the first step by reacting p-coumaroyl-CoA with three molecules of malonyl-CoA to form naringenin chalcone (Heller & Hahlbrock, 1980). Chalcone isomerase (CHI) then rapidly and stereospecifically performs a ring closure to form (2S)-naringenin, the precursor to downstream flavonoid biosynthesis.

Over 6000 flavonoids have been identified in plants; this enormous diversity is achieved through various enzymatic modifications including hydroxylation, O-methylation, prenylation, glycosylation, oxidation, and reduction (Austin & Noel, 2003). In particular, different classes of flavonoids are named based on the degree of oxidation and saturation at the C-2, C-3, and C-4 positions of the C ring (Figure 1). These various tailoring enzymes can be expressed either constitutively or in response to developmental or environmental cues, and many are restricted to certain plant lineages that have evolved a class of flavonoids as a specific adaptation (R. A. Dixon & Paiva, 1995).

Naringenin, a flavanone, can be hydroxylated by flavanone 3-hydroxylase (F3H) to produce dihydrokaempferol (DHK), a dihydroflavonol. These first three enzymatic steps of CHS, CHI, and F3H likely evolved early in land plants, namely in bryophytes, liverworts, and hornworts (Markham, 1988). These early flavonoids likely acted as sunscreens against UV radiation damage, because they can absorb UV wavelengths, and because they are found in the upper leaf surface in the epicuticular wax due to their lipophilicity (Harborne & Williams, 2000). Another possible function, evolved even before UV protection, is that of regulating or chaperoning plant hormones, because the low amounts of flavonoids produced by early enzymes could have been sufficient for this role (Stafford, 1991).

Figure 1. Overview of flavonoid biosynthesis. Abbreviated enzyme names are written for each biosynthetic step. Classes of flavonoids are written in bold type, whereas the names of naringenin and individual dihydroflavonols are written in normal type. The structure of naringenin is labeled with the flavonoid ring naming and atom numbering system.

Another important step in flavonoid diversification likely evolved in these early land plants as well: the three major branches of flavonoid modification by differential B-ring hydroxylation (Rausher, 2006). The B-ring of DHK can be hydroxylated at the 3′ and/or 5′ position by flavonoid 3′-hydroxylase (F3′H) or flavonoid 3′,5′-hydroxylase (F3′5′H) to form dihydroquercetin (DHQ) or dihydromyricetin (DHM). These dihydroflavonols can then be converted by flavonol synthase (FLS) into flavonols, which are also found in all land plants (Rausher, 2006).

Vascular plants, beginning with pteridophytes, produce flavan-3,4-diols from dihydroflavonols using dihydroflavonol-4-reductase (DFR). Flavan-3,4-diols can polymerize to form condensed tannins (also known as proanthocyanidins, because their depolymerization produces anthocyanidins). Tannins function primarily as defense against bacterial and fungal pathogens, and their astringency also deters herbivores (Feeny, 1970).

In seed plants, flavan-3,4-diols (also known as leucoanthocyanidins) can also be oxidized by anthocyanidin synthase (ANS) to form anthocyanidins. Glycosylation of these aglycones, usually at the 3-O position by anthocyanin 3-O-glucosyltransferase (UFGT) using UDP-glucose, results in anthocyanins, the major red-blue pigments usually found in fruits and flowers but also in other parts of the plant. Different combinations of hydroxylation and methylation of the 3′ and 5′ positions, as well as the identity of the glycosyl group(s), determine the color of the final anthocyanin. Anthocyanins are stored in vacuoles, where the pH and complex formation with metals, malonic acid, or flavones may also modify the color (Austin & Noel, 2003). Flower color is critical in mediating interactions with pollinators, and evolutionary changes in color, underlain by evolution of flavonoid biosynthetic enzymes, often coincide with changes in flower morphology (Rausher, 2006).

Isoflavonoids are a class of flavonoids restricted mostly to legumes, except for a few gymnosperm lineages and one moss species (Dewick, 1988). Their structure differs from other flavonoids in that the B ring is shifted from C-2 to C-3 on the heterocyclic C ring, the result of an oxidative rearrangement catalyzed by isoflavone synthase (IFS) (Austin & Noel, 2003). Isoflavonoids function in communication with symbiotic rhizobia, a specialized feature of legumes.

As is the case with many specialized metabolic pathways, the enzymes catalyzing these various biosynthetic steps evolved from progenitor enzymes in primary metabolism (Weng, Philippe, & Noel, 2012). Most of the tailoring enzymes in flavonoid biosynthesis are members of one of three enzyme families: 2-oxoglutarate-dependent dioxygenases (2OGD, such as F3H and ANS), cytochrome P450 monooxygenases (P450, such as F3′H and F3′5′H), or NADPH-dependent reductases (such as DFR and LCR) (Richard A. Dixon & Steele, 1999; Winkel-Shirley, 2001). Catalytic promiscuity is also a common feature of specialized metabolic enzymes, and this feature is also critical for the diversity of flavonoids (Weng et al., 2012). For example, DHK, DHQ, and DHM can all be reduced by DFR or converted into flavonols by FLS, forming a metabolic grid. This extensive flavonoid diversity, produced by the myriad combinations of modifications by specialized metabolic enzymes, has enabled plants to adapt to a wide range of ecological niches.
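The metabolic grid described above can be made concrete with a small enumeration. The following Python sketch is illustrative only: the substrate and enzyme abbreviations come from the text, whereas the product names (the flavonols kaempferol, quercetin, and myricetin, and the corresponding leucoanthocyanidins) and the dictionary layout are additions made for this example and are not drawn from this chapter.

```python
from itertools import product

# Dihydroflavonol substrates named in the text.
SUBSTRATES = ["DHK", "DHQ", "DHM"]

# Promiscuous tailoring enzymes named in the text.
ENZYMES = ["FLS", "DFR"]

# Product names follow standard flavonoid nomenclature; they are supplied
# here for illustration and do not appear in the chapter text.
PRODUCTS = {
    ("DHK", "FLS"): "kaempferol",
    ("DHQ", "FLS"): "quercetin",
    ("DHM", "FLS"): "myricetin",
    ("DHK", "DFR"): "leucopelargonidin",
    ("DHQ", "DFR"): "leucocyanidin",
    ("DHM", "DFR"): "leucodelphinidin",
}


def metabolic_grid():
    """Return every (substrate, enzyme, product) combination in the grid."""
    return [(s, e, PRODUCTS[(s, e)]) for s, e in product(SUBSTRATES, ENZYMES)]


if __name__ == "__main__":
    for substrate, enzyme, prod in metabolic_grid():
        print(f"{substrate} --{enzyme}--> {prod}")
```

Three substrates and two promiscuous enzymes already give six products; each additional tailoring step that accepts all of them multiplies the count again, which is one way to picture how a small enzyme set yields thousands of flavonoids.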

Type III polyketide synthases

Like the aforementioned specialized metabolic enzymes, CHS has evolutionary origins in primary metabolism. CHS is a member of the type III polyketide synthase (PKS) superfamily, which evolved from β-ketoacyl-acyl carrier protein synthase III (KAS III), a type of fatty acid synthase in bacterial fatty acid biosynthesis (Austin & Noel, 2003). Type III PKS enzymes and KAS III share a conserved structural fold and an analogous catalytic mechanism. Both are homodimers, in which each monomer consists of an αβαβα thiolase fold domain and a bottom domain that together form the active-site cavity, and both contain the same catalytic residues (Ferrer, Jez, Bowman, Dixon, & Noel, 1999). Both enzymes perform the same basic catalysis of adding a two-carbon acetate unit to an acyl thioester starter molecule: KAS III uses malonyl-acyl carrier protein as the donor to lengthen a 2-carbon fatty acid thioester (e.g. acetyl-CoA) to a 4-carbon fatty acid in the first step of fatty acid biosynthesis in bacteria, whereas CHS uses three molecules of malonyl-CoA to iteratively extend p-coumaroyl-CoA by a total of six carbons to form a polyketide.

The CHS catalytic triad consists of C164, H303, and N336 (Chapter 2, Figure 1), as numbered in Medicago sativa CHS (MsCHS), the first CHS ortholog to be structurally characterized by X-ray crystallography (Ferrer et al., 1999). The catalytic cysteine is conserved in all thiolase-fold enzymes and is located at the N-terminus of an α-helix. This cysteine performs the first step of the CHS catalytic mechanism, a nucleophilic attack on the p-coumaroyl-CoA substrate to generate an acyl-enzyme intermediate. To perform this reaction, the cysteine must be present in the deprotonated thiolate state, suggesting that the active-site environment has evolved to lower the pKa of cysteine from 8.8, the value for free cysteine in solution, to a value below physiological pH. The helix dipole moment lowers the pKa of cysteine from 8.8 to 7.2 in model peptides, suggesting a mechanistic role for this conserved structural feature (Kortemme & Creighton, 1995). The Nε of H303 of the catalytic triad forms a stable imidazolium-thiolate ion pair with C164, and mutations of H303 to glutamine and alanine shift the cysteine pKa from 5.5 to 6.6 and 7.6, respectively (Jez & Noel, 2000; Suh, Kagami, Fukuma, & Sankawa, 2000).

The protonated nitrogens of H303 and N336 form an oxyanion hole that stabilizes multiple steps of the CHS catalytic mechanism (Austin & Noel, 2003). First, the tetrahedral transition state formed after Cys nucleophilic attack is stabilized. Second, the enol tautomer of malonyl-CoA is stabilized, promoting its decarboxylation and subsequent condensation with the p-coumaroyl moiety. A conserved phenylalanine residue (position 215 in MsCHS) also promotes the formation of a neutral CO2. These loading, decarboxylation, and condensation steps are performed three times until a linear tetraketide intermediate is formed. An intramolecular Claisen condensation then occurs between C-1 and C-6, followed by aromatization to form naringenin chalcone.
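To see why the pKa shifts quoted above matter for catalysis, the fraction of cysteine present as the reactive thiolate can be estimated with the Henderson-Hasselbalch relation. The sketch below is a simplification made for illustration: it treats the cysteine as an isolated monoprotic acid, which ignores the ion pairing with H303 described in the text, and the pH of 7.0 is an assumed value rather than one taken from this chapter. The function and variable names are hypothetical.

```python
def thiolate_fraction(pka: float, ph: float) -> float:
    """Fraction of a thiol in the deprotonated (thiolate) form at a given pH.

    Uses the Henderson-Hasselbalch relation for a single ionizable group:
    [S-] / ([SH] + [S-]) = 1 / (1 + 10**(pKa - pH)).
    """
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


# Assumed pH for illustration; not a value stated in the text.
ASSUMED_PH = 7.0

# pKa values quoted in the text for the contexts listed.
PKA_VALUES = {
    "free cysteine in solution": 8.8,
    "cysteine at helix N-terminus (model peptide)": 7.2,
    "catalytic cysteine with H303": 5.5,
    "H303 to alanine mutant": 7.6,
}

if __name__ == "__main__":
    for label, pka in PKA_VALUES.items():
        fraction = thiolate_fraction(pka, ASSUMED_PH)
        print(f"{label}: pKa {pka}, thiolate fraction {fraction:.3f}")
```

Under these assumptions, a cysteine with a pKa of 5.5 is almost entirely deprotonated at neutral pH, whereas free cysteine is almost entirely protonated, which is consistent with the argument that the active site is tuned to keep the nucleophile in its thiolate form.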

Many details of the CHS catalytic mechanism have been elucidated by comparing CHS to other members of the type III PKS superfamily that utilize different acyl donors and acceptors to produce a wide variety of polyketides. The number of malonyl-CoA units incorporated, which determines the length of the linear polyketide intermediate and thus the size of the final product, depends upon the volume of the active-site cavity. The enzyme 2-pyrone synthase (2-PS) from Gerbera hybrida, for example, uses a smaller acetyl-CoA starter molecule and performs only two malonyl-CoA additions to produce triacetic acid lactone. The overall structures of CHS and 2-PS are highly similar, except for a two-thirds reduction in active-site volume in 2-PS resulting from three key active-site residue substitutions compared to CHS (Jez et al., 2000). In a striking example, octaketide synthase (OKS) from Aloe arborescens, which uses eight molecules of malonyl-CoA to produce SEK4 and SEK4b, was subjected to site-directed mutagenesis to thoroughly investigate the effect of active-site volume on product profile. A series of substitutions of a key glycine residue, ranging from a small alanine to a large tryptophan, generated correspondingly smaller products ranging from heptaketides to tetraketides (Abe, Oguro, Utsumi, Sano, & Noguchi, 2005). Together, these examples illustrate how the steric bulk of residues lining the active-site cavity is responsible for limiting the iterative extension steps and directing the subsequent cyclization step of the type III PKS mechanism.
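The relationship between the number of malonyl-CoA additions and the size of the linear intermediate is simple arithmetic, sketched below in Python. The counting convention (the starter contributes one ketide unit and each malonyl-CoA condensation adds one more unit and two carbons) is an assumption chosen to match the tetraketide and six-carbon figures given in the text for CHS; the class and attribute names are hypothetical. OKS is left out because the text does not state which substrate serves as its starter.

```python
from dataclasses import dataclass

# Greek numerical prefixes for naming linear polyketide intermediates.
KETIDE_PREFIXES = {
    2: "di", 3: "tri", 4: "tetra", 5: "penta",
    6: "hexa", 7: "hepta", 8: "octa",
}


@dataclass(frozen=True)
class TypeIIIPKS:
    name: str
    starter: str
    extensions: int  # number of malonyl-CoA condensation steps

    @property
    def ketide_units(self) -> int:
        # Assumed convention: the starter counts as one unit,
        # and each malonyl-CoA extension adds one more.
        return 1 + self.extensions

    @property
    def carbons_added(self) -> int:
        # Each decarboxylative condensation adds a two-carbon acetate unit.
        return 2 * self.extensions

    @property
    def intermediate_class(self) -> str:
        return KETIDE_PREFIXES[self.ketide_units] + "ketide"


# Starter molecules and extension counts as stated in the text.
CHS = TypeIIIPKS("CHS", "p-coumaroyl-CoA", 3)
TWO_PS = TypeIIIPKS("2-PS", "acetyl-CoA", 2)

if __name__ == "__main__":
    for enzyme in (CHS, TWO_PS):
        print(
            f"{enzyme.name}: {enzyme.starter} + {enzyme.extensions} malonyl-CoA "
            f"-> {enzyme.intermediate_class} (+{enzyme.carbons_added} carbons)"
        )
```

Framed this way, the mutagenesis results amount to changing a single integer: a bulkier residue in the cavity lowers the number of extensions the enzyme can accommodate, and the product class follows from that count.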

The cyclization step is another avenue for diversification in type III PKS enzymes. Stilbene synthase (STS) catalyzes the formation of resveratrol, a compound that has garnered significant attention for its potential contribution to the health benefits of red wine consumption (Frémont, 2000). STS uses the same substrates as CHS but differs in its cyclization mechanism, which involves a C-2 to C-7 intramolecular aldol condensation. STS and CHS are 60-90% identical in amino acid sequence, and the difference in function is due to a few substitutions near a buried loop, which cause a subtle change in the hydrogen-bonding network of an active-site threonine. This small difference in the electronic environment of the active site is enough to favor one cyclization mechanism over another (Austin, Bowman, Ferrer, Schröder, & Noel, 2004). In summary, subtle changes in side chain positioning, even if caused by a mutation distant from the residue involved in the catalytic mechanism, can lead to large differences in type III PKS function.

Applications of plant metabolic and enzyme engineering

Metabolic engineering is the modification of existing biosynthetic pathways to change the amounts of metabolites produced in a particular organism, or to create new pathways and metabolites altogether. Metabolic flux of existing chemistries can be altered by deleting or overexpressing biosynthetic genes in an existing pathway, or by expressing biosynthetic genes in a heterologous host. Novel chemical reactions, however, require engineering of individual enzymes to perform new catalysis (Erb, Jones, & Bar-Even, 2017). To accomplish this, understanding the structure-function relationships of enzymes is critical.

Plant metabolic engineering has many potential applications in energy, pharmaceuticals, food, and agriculture. Biofuels have emerged as a fossil fuel alternative for transportation and other energy needs, but the use of food crops such as corn and sugarcane can conflict with the rising demand for food, especially in developing countries (Tyner, 2012). Second-generation biofuels, which use non-food crop plants (e.g. switchgrass or miscanthus) or residual biomass from food crops, may be an alternative source (Evans, Ramage, DiRocco, & Potts, 2015). The presence of lignin in plant cell walls, however, inhibits access to polysaccharides by enzymes used to produce fermentable sugars.

Given the important role of lignin in structural support and water transport, there have been extensive efforts to reduce or alter lignin content without causing growth defects. One recent effort in Arabidopsis thaliana involved engineering lignin biosynthesis to occur only in vessels, while increasing the thickness of cell walls in secondary cells to provide structural support (Yang et al., 2013). Lignin consisting of syringyl units (S lignin) is less condensed than lignin containing p-hydroxyphenyl or guaiacyl units (H or G lignin), resulting in enhanced chemical and enzymatic digestibility of S-rich lignocellulosic biomass (Renault, Werck-Reichhart, & Weng, 2019). There have been engineering efforts to increase S lignin content in several angiosperm species by overexpressing enzymes in the biosynthetic pathway (Franke et al., 2000; Meyer, Shirley, Cusumano, Bell-Lelong, & Chapple, 1998; Stewart, Akiyama, Chapple, Ralph, & Mansfield, 2009).

With a growing world population and the added effects of climate change, food security will become a greater challenge. Even in a low global warming scenario of 1.5 °C, crop yields and nutritional composition are predicted to change, and engineered crops with elevated stress tolerance or nutrient levels could be an important climate change mitigation strategy (Intergovernmental Panel on Climate Change, 2018). Golden Rice is a well-known example of a crop engineered to address nutritional deficiencies; three genes were heterologously expressed in rice endosperm to increase the content of beta-carotene, a precursor of vitamin A (Ye et al., 2000). Flavonoids, given their numerous potential health benefits, could be another target for metabolic engineering in food crops. Recently, fruit-specific expression of the A. thaliana transcription factor MYB12 was shown to increase phenylpropanoid content, including flavonols, in tomatoes (Y. Zhang et al., 2015).

Many small-molecule plant hormones, such as auxin, brassinosteroids, salicylic acid, and jasmonates, control abiotic stress responses. Abscisic acid (ABA), a terpenoid, is a particularly important secondary messenger in stress responses such as reducing transpiration during drought stress and enhancing root growth under nitrogen deficiency (Wani, Kumar, Shriram, & Sah, 2016). As such, many attempts at engineering ABA-mediated stress tolerance have been made. Overexpression of zeaxanthin epoxidase, an enzyme in ABA biosynthesis, in A. thaliana conferred elevated tolerance to drought and salt stress (Park et al., 2008). Transgenic overexpression in maize of A. thaliana LOS5, which activates a key enzyme in the last step of ABA biosynthesis, led to greater biomass accumulation under salt stress (J. Zhang et al., 2016).

Enzymes can also be expressed, purified, and used in large-scale industrial reactions as biocatalysts. This process, also known as chemoenzymatic synthesis, has the advantages of chemo-, regio-, stereo-, and enantiospecificity over traditional chemical synthesis. Enzyme engineering has also improved biocatalysis by expanding the substrate range and improving the catalytic rate and stability of enzymes (Bornscheuer et al., 2012). Initially, knowledge of an enzyme’s structure and/or catalytic mechanism allowed for rational design of mutations that could, for example, accommodate a new substrate. With the advent of new biotechnological tools for rapid DNA synthesis and high-throughput screening of enzyme activity, directed evolution enabled the identification of beneficial mutations in enzymes whose structures or mechanisms are unknown, or of simple amino acid substitutions that are difficult to rationalize due to epistasis (Tracewell & Arnold, 2009). Rational design and directed evolution can also be combined to perform smaller-scale saturating mutagenesis screening of amino acid positions thought to be important for the desired function (Strohmeier, Pichler, May, & Gruber-Khadjawi, 2011).

The majority of biocatalysts are bacterial or fungal enzymes, but one class of plant enzymes that has been particularly useful in biocatalysis is the hydroxynitrile lyases (HNLs). HNLs from Manihot esculenta (cassava) and Prunus amygdalus (almond) have been subjected to rational design and saturating mutagenesis to generate enzymes with improved specificity in producing the correct enantiomer of intermediates in the syntheses of the antiplatelet drug clopidogrel, vitamin B5, and other compounds (Strohmeier et al., 2011). Rational mutations have also been engineered to improve expression of plant HNLs in microbial hosts.

In addition to novel chemistry, biocatalysis also enables more efficient and environmentally friendly industrial chemical manufacturing. Enzymes are made from renewable sources and are biodegradable; their higher product purities lead to less waste production; and enzymatic reactions usually operate at ambient temperature, pressure, and pH, requiring less energy use (Bornscheuer et al., 2012). Recently, a novel CO2-fixation pathway was designed in vitro using engineered enzymes from all three domains of life, surpassing the efficiency of the Calvin cycle used by plants (Schwander, Schada von Borzyskowski, Burgener, Cortina, & Erb, 2016). This metabolic pathway could be engineered into an organism to produce valuable downstream chemicals using CO2 as the carbon feedstock. Plant metabolism, which has helped shape Earth’s biosphere, climate, and rich species diversity, could play a key role in creating a sustainable future for the planet.

Concluding remarks

Plants have evolved diverse specialized metabolism, facilitating their successful colonization of all but the most extreme corners of land on Earth. Flavonoids play particularly important roles in plant physiological adaptation, and derived plant clades grew their flavonoid arsenals as they diversified into increasingly challenging niches. To accommodate this increased demand for flavonoid production, the key enzyme chalcone synthase has also evolved. Chapter 2 investigates the structural features that enabled increased reactivity of the catalytic cysteine residue by comparing CHS orthologs from five diverse plant lineages. Chapter 3 establishes methods for investigating the in vivo consequences of this differential cysteine reactivity toward oxidation and its possible role in a redox regulation system to control flavonoid biosynthesis. In the appendix, I explore the function of enzymes in the biosynthesis of galloylated catechins, major flavonoids found in tea. Altogether, this thesis investigates the relationships among structure, function, and evolution of enzymes in flavonoid biosynthesis.

References

Abe, I., Oguro, S., Utsumi, Y., Sano, Y., & Noguchi, H. (2005). Engineered biosynthesis of plant polyketides: chain length control in an octaketide-producing plant type III polyketide synthase. Journal of the American Chemical Society, 127(36), 12709–12716.

Austin, M. B., Bowman, M. E., Ferrer, J.-L., Schröder, J., & Noel, J. P. (2004). An aldol switch discovered in stilbene synthases mediates cyclization specificity of type III polyketide synthases. Chemistry & Biology, 11(9), 1179–1194.

Austin, M. B., & Noel, J. P. (2003). The chalcone synthase superfamily of type III polyketide synthases. Natural Product Reports, 20(1), 79–110.

Beerling, D. J., Osborne, C. P., & Chaloner, W. G. (2001). Evolution of leaf-form in land plants linked to atmospheric CO2 decline in the Late Palaeozoic era. Nature, 410(6826), 352–354.

Berner, R. A. (1993). Paleozoic Atmospheric CO2: Importance of Solar Radiation and Plant Evolution. Science, 261(5117), 68–70.

Boerjan, W., Ralph, J., & Baucher, M. (2003). Lignin biosynthesis. Annual Review of Plant Biology, 54, 519–546.

Bornscheuer, U. T., Huisman, G. W., Kazlauskas, R. J., Lutz, S., Moore, J. C., & Robins, K. (2012). Engineering the third wave of biocatalysis. Nature, 485(7397), 185–194.

Dewick, P. M. (1988). Isoflavonoids. The Flavonoids. https://doi.org/10.1007/978-1-4899-2913-6_5

Dixon, R. A., & Paiva, N. L. (1995). Stress-Induced Phenylpropanoid Metabolism. The Plant Cell, 7(7), 1085–1097.

Dixon, R. A., & Steele, C. L. (1999). Flavonoids and isoflavonoids – a gold mine for metabolic engineering. Trends in Plant Science. https://doi.org/10.1016/s1360-1385(99)01471-5

Erb, T. J., Jones, P. R., & Bar-Even, A. (2017). Synthetic metabolism: metabolic engineering meets enzyme design. Current Opinion in Chemical Biology, 37, 56–62.

Evans, S. G., Ramage, B. S., DiRocco, T. L., & Potts, M. D. (2015). Greenhouse gas mitigation on marginal land: a quantitative review of the relative benefits of forest recovery versus biofuel production. Environmental Science & Technology, 49(4), 2503–2511.

Feeny, P. (1970). Seasonal Changes in Oak Leaf Tannins and Nutrients as a Cause of Spring Feeding by Winter Moth Caterpillars. Ecology, 51(4), 565–581.

Ferrer, J. L., Jez, J. M., Bowman, M. E., Dixon, R. A., & Noel, J. P. (1999). Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nature Structural Biology, 6(8), 775–784.

Franke, R., McMichael, C. M., Meyer, K., Shirley, A. M., Cusumano, J. C., & Chapple, C. (2000). Modified lignin in tobacco and poplar plants over-expressing the Arabidopsis gene encoding ferulate 5-hydroxylase. The Plant Journal: For Cell and Molecular Biology, 22(3), 223–234.

Frémont, L. (2000). Biological effects of resveratrol. Life Sciences, 66(8), 663–673.

Friis, E. M., Crane, P. R., & Pedersen, K. R. (2011). Early Flowers and Angiosperm Evolution. Cambridge University Press.

Graham, J. B., Aguilar, N. M., Dudley, R., & Gans, C. (1995). Implications of the late Palaeozoic oxygen pulse for physiology and evolution. Nature, 375(6527), 117–120.

Harborne, J. B., & Williams, C. A. (2000). Advances in flavonoid research since 1992. Phytochemistry, 55(6), 481–504.

Harrison, C. J., & Morris, J. L. (2018). The origin and early evolution of vascular plant shoots and leaves. Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences, 373(1739). https://doi.org/10.1098/rstb.2016.0496

Heller, W., & Hahlbrock, K. (1980). Highly purified “flavanone synthase” from parsley catalyzes the formation of naringenin chalcone. Archives of Biochemistry and Biophysics. https://doi.org/10.1016/0003-9861(80)90395-1

Intergovernmental Panel on Climate Change. (2018). Global Warming of 1.5°C: An IPCC Special Report on the Impacts of Global Warming of 1.5°C Above Pre-industrial Levels and Related Global Greenhouse Gas Emission Pathways, in the Context of Strengthening the Global Response to the Threat of Climate Change, Sustainable Development, and Efforts to Eradicate Poverty.

Jez, J. M., Austin, M. B., Ferrer, J., Bowman, M. E., Schröder, J., & Noel, J. P. (2000). Structural control of polyketide formation in plant-specific polyketide synthases. Chemistry & Biology, 7(12), 919–930.

Jez, J. M., & Noel, J. P. (2000). Mechanism of Chalcone Synthase: pKa of the Catalytic Cysteine and the Role of the Conserved Histidine in a Plant Polyketide Synthase. The Journal of Biological Chemistry, 275(50), 39640–39646.

Kenrick, P., & Crane, P. R. (1997). The origin and early evolution of plants on land. Nature, 389(6646), 33–39.

Kortemme, T., & Creighton, T. E. (1995). Ionisation of Cysteine Residues at the Termini of Model α-Helical Peptides. Relevance to Unusual Thiol pKa Values in Proteins of the Thioredoxin Family. Journal of Molecular Biology, 253(5), 799–812.

Levsh, O., Chiang, Y.-C., Tung, C. F., Noel, J. P., Wang, Y., & Weng, J.-K. (2016). Dynamic Conformational States Dictate Selectivity toward the Native Substrate in a Substrate-Permissive Acyltransferase. Biochemistry, 55(45), 6314–6326.

Li, F.-S., Phyo, P., Jacobowitz, J., Hong, M., & Weng, J.-K. (2019). The molecular structure of plant sporopollenin. Nature Plants, 5(1), 41–46.

Linkies, A., Graeber, K., Knight, C., & Leubner-Metzger, G. (2010). The evolution of seeds. The New Phytologist, 186(4), 817–831.

Markham, K. R. (1988). Distribution of flavonoids in the lower plants and its evolutionary significance. In J. B. Harborne (Ed.), The Flavonoids: Advances in Research since 1980 (pp. 427–468). Boston, MA: Springer US.

Meyer, K., Shirley, A. M., Cusumano, J. C., Bell-Lelong, D. A., & Chapple, C. (1998). Lignin monomer composition is determined by the expression of a cytochrome P450-dependent monooxygenase in Arabidopsis. Proceedings of the National Academy of Sciences of the United States of America, 95(12), 6619–6623.

Moghe, G. D., & Last, R. L. (2015). Something Old, Something New: Conserved Enzymes and the Evolution of Novelty in Plant Specialized Metabolism. Plant Physiology, 169(3), 1512–1523.

Park, H.-Y., Seok, H.-Y., Park, B.-K., Kim, S.-H., Goh, C.-H., Lee, B.-H., … Moon, Y.-H. (2008). Overexpression of Arabidopsis ZEP enhances tolerance to osmotic stress. Biochemical and Biophysical Research Communications, 375(1), 80–85.

Rausher, M. D. (2006). The Evolution of Flavonoids and Their Genes. In E. Grotewold (Ed.), The Science of Flavonoids (pp. 175–211). New York, NY: Springer New York.

Renault, H., Werck-Reichhart, D., & Weng, J.-K. (2019). Harnessing lignin evolution for biotechnological applications. Current Opinion in Biotechnology, 56, 105–111.

Riederer, M., & Muller, C. (2008). Annual Plant Reviews, Biology of the Plant Cuticle. John Wiley & Sons.

Roberts, J. A. (Ed.). (2018). Biochemistry of Terpenoids: Monoterpenes, Sesquiterpenes and Diterpenes. In Annual Plant Reviews online (Vol. 202, pp. 258–303). Chichester, UK: John Wiley & Sons, Ltd.

Schmidt, B. M., Ribnicky, D. M., Lipsky, P. E., & Raskin, I. (2007). Revisiting the ancient concept of botanical therapeutics. Nature Chemical Biology, 3(7), 360–366.

Schwander, T., Schada von Borzyskowski, L., Burgener, S., Cortina, N. S., & Erb, T. J. (2016). A synthetic pathway for the fixation of carbon dioxide in vitro. Science, 354(6314), 900–904.

Stafford, H. A. (1991). Flavonoid evolution: an enzymic approach. Plant Physiology, 96(3), 680–685.

Stewart, J. J., Akiyama, T., Chapple, C., Ralph, J., & Mansfield, S. D. (2009). The Effects on Lignin Structure of Overexpression of Ferulate 5-Hydroxylase in Hybrid Poplar. Plant Physiology. https://doi.org/10.1104/pp.109.137059

Strohmeier, G. A., Pichler, H., May, O., & Gruber-Khadjawi, M. (2011). Application of designed enzymes in organic synthesis. Chemical Reviews, 111(7), 4141–4164.

Suh, D.-Y., Kagami, J., Fukuma, K., & Sankawa, U. (2000). Evidence for Catalytic Cysteine–Histidine Dyad in Chalcone Synthase. Biochemical and Biophysical Research Communications, 275(3), 725–730.

Thimmappa, R., Geisler, K., Louveau, T., O’Maille, P., & Osbourn, A. (2014). Triterpene biosynthesis in plants. Annual Review of Plant Biology, 65, 225–257.

Tracewell, C. A., & Arnold, F. H. (2009). Directed enzyme evolution: climbing fitness peaks one amino acid at a time. Current Opinion in Chemical Biology, 13(1), 3–9.

Tyner, W. E. (2012). Biofuels and agriculture: a past perspective and uncertain future. International Journal of Sustainable Development and World Ecology, 19(5), 389–394.

Vogt, T. (2010). Phenylpropanoid biosynthesis. Molecular Plant, 3(1), 2–20.

Wani, S. H., Kumar, V., Shriram, V., & Sah, S. K. (2016). Phytohormones and their metabolic engineering for abiotic stress tolerance in crop plants. The Crop Journal, 4(3), 162–176.

Weng, J.-K., & Chapple, C. (2010). The origin and evolution of lignin biosynthesis. The New Phytologist, 187(2), 273–285.

Weng, J.-K., Philippe, R. N., & Noel, J. P. (2012). The rise of chemodiversity in plants. Science, 336(6089), 1667–1670.

Winkel-Shirley, B. (2001). Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiology, 126(2), 485–493.

Yang, F., Mitra, P., Zhang, L., Prak, L., Verhertbruggen, Y., Kim, J.-S., … Loqué, D. (2013). Engineering secondary cell wall deposition in plants. Plant Biotechnology Journal, 11(3), 325–335.

Yao, L. H., Jiang, Y. M., Shi, J., Tomás-Barberán, F. A., Datta, N., Singanusong, R., & Chen, S. S. (2004). Flavonoids in food and their health benefits. Plant Foods for Human Nutrition, 59(3), 113–122.

Ye, X., Al-Babili, S., Klöti, A., Zhang, J., Lucca, P., Beyer, P., & Potrykus, I. (2000). Engineering the Provitamin A (β-Carotene) Biosynthetic Pathway into (Carotenoid-Free) Rice Endosperm. Science, 287(5451), 303–305.

Zhang, J., Yu, H., Zhang, Y., Wang, Y., Li, M., Zhang, J., … Li, Z. (2016). Increased abscisic acid levels in transgenic maize overexpressing AtLOS5 mediated root ion fluxes and leaf water status under salt stress. Journal of Experimental Botany, 67(5), 1339–1355.

Zhang, Y., Butelli, E., Alseekh, S., Tohge, T., Rallapalli, G., Luo, J., … Martin, C. (2015). Multi-level engineering facilitates the production of phenylpropanoid compounds in tomato. Nature Communications, 6, 8635.

Ziegler, J., & Facchini, P. J. (2008). Alkaloid biosynthesis: metabolism and trafficking. Annual Review of Plant Biology, 59, 735–769.


Chapter 2

Mechanistic basis for the evolution of chalcone synthase catalytic cysteine reactivity in land plants

Authors

Geoffrey Liou (1,2), Ying-Chih Chiang (3), Yi Wang (3), and Jing-Ke Weng (1,2)

Author Affiliations

1. Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2. Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA
3. Department of Physics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong

Published As

Liou, G., Chiang, Y.-C., Wang, Y., & Weng, J.-K. (2018). Mechanistic basis for the evolution of chalcone synthase catalytic cysteine reactivity in land plants. Journal of Biological Chemistry 293: 18601–18612.

Author Contributions

G.L. and J.-K.W. performed crystallography, seed complementation, and phylogenetic analysis. G.L. performed enzyme assays. Y.-C.C. and Y.W. performed molecular dynamics simulations and wrote the relevant results and discussion. G.L. wrote the remaining sections under the supervision of J.-K.W.

Abstract

Flavonoids are important polyphenolic natural products, ubiquitous in land plants, that serve diverse functions in plants’ survival in their ecological niches, including UV protection, pigmentation for attracting pollinators, symbiotic nitrogen fixation, and defense against herbivores. Chalcone synthase (CHS) catalyzes the first committed step in plant flavonoid biosynthesis and is highly conserved in all land plants. In several previously reported crystal structures of CHSs from flowering plants, the catalytic cysteine is oxidized to sulfinic acid, indicating enhanced nucleophilicity in this residue associated with its increased susceptibility to oxidation. In this study, we report a set of new crystal structures of CHSs representing all five major lineages of land plants (bryophytes, lycophytes, monilophytes, gymnosperms, and angiosperms), spanning 500 million years of evolution. We reveal that the structures of CHS from a lycophyte and a moss species preserve the catalytic cysteine in a reduced state, in contrast to the cysteine sulfinic acid seen in all euphyllophyte CHS structures. In vivo complementation, in vitro biochemical and mutagenesis analyses, and molecular dynamics simulations identified a set of residues that differ between basal-plant and euphyllophyte CHSs and modulate catalytic cysteine reactivity. We propose that the CHS active-site environment has evolved in euphyllophytes to further enhance the nucleophilicity of the catalytic cysteine since the divergence of euphyllophytes from other vascular plant lineages 400 million years ago. These changes in CHS could have contributed to the diversification of flavonoid biosynthesis in euphyllophytes, which in turn contributed to their dominance in terrestrial ecosystems.

Introduction

In their transition from aquatic domains to terrestrial environments, early land plants faced several major challenges, including exposure to damaging UV-B radiation once screened by aquatic environments, lack of structural support once provided by buoyancy in water, drought, and novel pathogens and herbivores. To cope with many of these stresses, land plants have evolved a series of specialized metabolic pathways, among which phenylpropanoid metabolism was probably one of the most critical soon after the transition from water to land (Weng & Chapple, 2010).

Flavonoids are a diverse class of plant phenolic compounds found in all extant land plants, with important roles in many aspects of plant life, including UV protection, pigmentation for attracting pollinators and seed dispersers, defense, and signaling between plants and microbes (Winkel-Shirley, 2001). Some flavonoids are also of great interest for their anti-cancer and antioxidant activities as well as other potential health benefits to humans (Yao et al., 2004). After the core flavonoid biosynthetic pathway was established in early land plants, new branches of the pathway continued to evolve over the history of plant evolution, producing structurally and functionally diverse flavonoids to cope with changing habitats, co-evolving pathogens and herbivores, and other aspects of plants’ ecological niches. Basal bryophytes biosynthesize the three main classes of flavonoids, namely flavanones, flavones, and flavonols, which likely emerged as UV sunscreens (Rausher, 2006). The lycophyte Selaginella biosynthesizes a rich diversity of biflavonoids, many of which were shown to be cytotoxic and may function as phytoalexins (Weng & Noel, 2013). The ability to synthesize the astringent, polyphenolic tannins, which defend against bacterial and fungal pathogens, seems to have evolved in euphyllophytes (Rausher, 2006). Finally, seed plants, including gymnosperms and angiosperms, developed elaborate anthocyanin biosynthetic pathways to produce the vivid colors used to attract pollinators or ward off herbivores.

Chalcone synthase (CHS), a highly conserved plant type III polyketide synthase (PKS), is the first committed enzyme in the plant flavonoid biosynthetic pathway. CHS synthesizes naringenin chalcone from one molecule of p-coumaroyl-CoA and three molecules of malonyl-CoA (Weng & Noel, 2012) (Figure 1A). The proposed catalytic mechanism of CHS involves loading of the starter molecule p-coumaroyl-CoA onto the catalytic cysteine, which also serves as the attachment site of the growing polyketide chain during the iterative elongation steps (Austin & Noel, 2003). This initial reaction step requires the cysteine to be present as a thiolate anion before loading of the starter molecule (Figure 1B). Using thiol-specific inactivation and the pH dependence of the malonyl-CoA decarboxylation reaction, the pKa of the catalytic cysteine (Cys164) of Medicago sativa CHS (MsCHS) was measured to be 5.5, a value significantly lower than the 8.7 of free cysteine (Jez & Noel, 2000).

Interestingly, we observed that the catalytic cysteine residues in the previously reported MsCHS crystal structures appear to be oxidized to sulfinic acid (PDB ID 1BI5 and 1BQ6) (Ferrer et al., 1999). Furthermore, the same phenomenon was observed in the crystal structures of several other plant type III PKSs evolutionarily derived from CHS, including Gerbera hybrida 2-pyrone synthase (PDB ID 1QLV) (Jez et al., 2000) (Figure S1). The other, non-catalytic cysteines in these proteins do not appear to be oxidized. These findings suggest that the oxidation of the catalytic cysteine observed in several type III PKS crystal structures may not simply be an artifact of X-ray crystallography, but rather reflects the intrinsic redox potential and reactivity of the catalytic cysteine evolved in this family of enzymes. Indeed, the propensity for a particular cysteine residue to undergo oxidation has previously been reported to correlate with low pKa (Reddie & Carroll, 2008).

Here, we present a set of new crystal structures of orthologous CHSs representing five major lineages of land plants, namely bryophytes, lycophytes, monilophytes, gymnosperms, and angiosperms, spanning 500 million years of land plant evolution. Through comparative structural analysis, in vivo complementation, in vitro biochemistry, mutagenesis studies, and molecular dynamics simulations, we reveal that CHSs of basal land plants, i.e., bryophytes and lycophytes, contain a catalytic cysteine less reactive than that of the CHSs from higher plants, i.e., euphyllophytes. We probe the structure-function relationship of a set of residues that modulate the reactivity of the catalytic cysteine, which leads us to propose that euphyllophytes may have evolved a more catalytically efficient CHS to enhance flavonoid biosynthesis relative to their basal plant relatives.


Figure 1. A, Phenylpropanoid and flavonoid metabolism. PAL, phenylalanine ammonia-lyase; C4H, trans-cinnamate 4-monooxygenase; 4CL, 4-coumarate-CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; CoA, coenzyme A. Cyclization of naringenin chalcone to naringenin also proceeds spontaneously in aqueous solution. B, Reaction mechanism of CHS. The extension step is performed three times to extend the starter molecule with malonyl-CoA, forming a linear tetraketide intermediate, which then cyclizes to form naringenin chalcone.

Results

Basal-plant CHSs contain reduced catalytic cysteine in their crystal structures

To examine the structural basis for the evolution of CHS across major land plant lineages, we cloned, expressed, and solved the crystal structures of five CHS orthologs from the bryophyte Physcomitrella patens (PpCHS), the lycophyte Selaginella moellendorffii (SmCHS), the monilophyte Equisetum arvense (EaCHS), the gymnosperm Pinus sylvestris (PsCHS), and the angiosperm Arabidopsis thaliana (AtCHS) (Figure 2, Table 1). Like previously reported crystal structures of type III polyketide synthases, all five CHS orthologs form symmetric homodimers and share the same αβαβα thiolase fold, suggesting a common evolutionary origin (Ferrer, Jez, Bowman, Dixon, & Noel, 1999). The catalytic triad of cysteine, histidine, and asparagine is found in a conformation highly similar to that in other PKSs and in the related fatty acid biosynthetic β-ketoacyl-(acyl-carrier-protein) synthase III (KAS III) enzymes, suggesting that they share a similar general catalytic mechanism (Figure 2B).

Based on the previously proposed reaction mechanism for MsCHS, the catalytic cysteine is C169 in AtCHS and C159 in SmCHS. This residue initiates the reaction mechanism by performing nucleophilic attack on p-coumaroyl-CoA (Figure 1B). The other two members of the catalytic triad are H309 and N342 in AtCHS, and H302 and N335 in SmCHS. The catalytic histidine contributes to the lowered pKa of the catalytic cysteine by forming a stable imidazolium-thiolate ion pair (Jez & Noel, 2000). The histidine and asparagine also form the oxyanion hole that stabilizes the tetrahedral transition states formed during the initial nucleophilic attack by cysteine on p-coumaroyl-CoA and after malonyl-CoA decarboxylation (Figure 1B).


Figure 2. Structural and in vivo functional characterization of diverse CHS orthologs. A, A maximum-likelihood phylogenetic tree of CHSs from diverse land plant species, with clades indicated by color. The tree is rooted on a bacterial KAS III enzyme (EcFabH). The scale bar indicates evolutionary distance in substitutions per amino acid. The sequence near the differentially conserved cysteine/serine (position 347 in AtCHS) is shown for each CHS. B, Overall apo crystal structures and active site structures of CHSs from diverse plant lineages. Above, the homodimeric form of CHS is shown with a color gradient from blue at the N terminus to red at the C terminus of each monomer. Below, the backbone and side chains of the catalytic triad and the differentially conserved cysteine/serine are shown. The 2Fo−Fc electron density map contoured at 1.5σ is shown around the catalytic cysteine. CHSs from euphyllophytes show the catalytic cysteine oxidized to sulfinic acid, whereas CHSs from basal land plants have a reduced catalytic cysteine. The red or yellow dot next to the enzyme name indicates the presence of serine or cysteine, respectively, in position 347 (AtCHS numbering).


                                PpCHS                         SmCHS
PDB ID                          6DX7                          6DX8
Data collection
Total reflections               161385 (13620)                493910 (50217)
Unique reflections              82406 (7763)                  81737 (6434)
Multiplicity                    2.0 (1.8)                     6.0 (6.2)
Completeness (%)                98.95 (94.60)                 90.60 (79.18)
Mean I/sigma(I)                 14.31 (1.57)                  12.32 (1.62)
R-merge                         0.03961 (0.4442)              0.1427 (1.194)
CC1/2                           0.998 (0.747)                 0.995 (0.51)
Refinement
Resolution range (Å)            36.24 - 2.61 (2.703 - 2.61)   39.01 - 1.7 (1.761 - 1.7)
Space group                     P 2 21 21                     P 1 21 1
Unit cell (Å)                   71.6 192.83 195.51            55.1702 66.6703 102.55
Unit cell (°)                   90 90 90                      90 91.35 90
R-work                          0.1810 (0.3187)               0.1927 (0.3111)
R-free                          0.2627 (0.3951)               0.2362 (0.3567)
Non-hydrogen protein atoms      17592                         5776
Water molecules                 59                            549
RMSD bonds (Å)                  0.014                         0.012
RMSD angles (°)                 1.5                           1.32
Ramachandran favored (%)        94.07                         97.74
Ramachandran allowed (%)        5.58                          1.99
Ramachandran outliers (%)       0.35                          0.27
Average B-factor                63.61                         22.24

Table 1. Crystallographic data collection and refinement statistics for the five wild-type CHSs. The highest-resolution shell values are given in parentheses.


                                EaCHS                           PsCHS                           AtCHS
PDB ID                          6DX9                            6DXA                            6DXB
Data collection
Total reflections               1164176 (116600)                168567 (13442)                  430437 (39846)
Unique reflections              125911 (12431)                  46489 (4596)                    227280 (21992)
Multiplicity                    9.2 (9.4)                       3.6 (2.9)                       1.9 (1.8)
Completeness (%)                99.97 (99.90)                   99.31 (98.75)                   98.48 (95.43)
Mean I/sigma(I)                 13.26 (1.35)                    10.40 (2.62)                    10.17 (2.78)
R-merge                         0.1013 (1.58)                   0.1424 (0.6168)                 0.04372 (0.2437)
CC1/2                           0.998 (0.697)                   0.979 (0.406)                   0.996 (0.837)
Refinement
Resolution range (Å)            56.57 - 1.5 (1.554 - 1.5)       52.45 - 2.01 (2.082 - 2.01)     38.68 - 1.549 (1.604 - 1.549)
Space group                     P 21 21 21                      P 1 21 1                        P 1 21 1
Unit cell (Å)                   52.954 112.764 130.803          58.017 100.059 65.882           54.64 137.56 108.56
Unit cell (°)                   90 90 90                        90 110.807 90                   90 95.59 90
R-work                          0.1610 (0.4374)                 0.1591 (0.2512)                 0.1416 (0.1967)
R-free                          0.1796 (0.4500)                 0.2204 (0.3189)                 0.1640 (0.2151)
Non-hydrogen protein atoms      6054                            6005                            12052
Water molecules                 711                             684                             1680
RMSD bonds (Å)                  0.009                           0.012                           0.009
RMSD angles (°)                 1.26                            1.31                            1.27
Ramachandran favored (%)        97.8                            96.75                           97.91
Ramachandran allowed (%)        2.2                             3.25                            2.09
Ramachandran outliers (%)       0                               0                               0
Average B-factor                20.79                           17.9                            17.52

Table 1 continued.
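
As a quick consistency check on Table 1, multiplicity is the average number of observations per unique reflection, so it should be close to the ratio of total to unique reflections. The minimal Python sketch below recomputes that ratio from the overall (not highest-shell) values in Table 1; the dictionary and function names are hypothetical and are not part of the original analysis.

```python
# Overall values copied from Table 1:
# (total reflections, unique reflections, reported multiplicity)
TABLE_1 = {
    "PpCHS": (161385, 82406, 2.0),
    "SmCHS": (493910, 81737, 6.0),
    "EaCHS": (1164176, 125911, 9.2),
    "PsCHS": (168567, 46489, 3.6),
    "AtCHS": (430437, 227280, 1.9),
}


def multiplicity(total_reflections: int, unique_reflections: int) -> float:
    """Average number of observations per unique reflection."""
    return total_reflections / unique_reflections


for name, (total, unique, reported) in TABLE_1.items():
    computed = multiplicity(total, unique)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```

For all five data sets, the recomputed ratio rounds to the multiplicity reported in the table.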


Notably, SmCHS and PpCHS are the first CHSs for which a reduced catalytic cysteine has been observed in the crystal structure (Figure 2B). The catalytic cysteine in SmCHS can still become oxidized to sulfenic acid when the crystal is soaked in hydrogen peroxide, indicating that it remains susceptible to oxidation, albeit at a lower rate (Figure S2). Like most other euphyllophyte type III PKS crystal structures solved to date, AtCHS, PsCHS, and EaCHS contain a doubly oxidized catalytic cysteine sulfinic acid (Figure 2B). This interesting observation suggests a functional divide between basal-plant and euphyllophyte CHSs. Despite shared orthology, the redox potential of the catalytic cysteine in PpCHS and SmCHS may differ from that of the euphyllophyte CHSs, resulting in different levels of sensitivity to oxidation under similar crystallization conditions. This could be due to the evolution of some novel molecular features in euphyllophyte CHSs that are not present in the basal-plant CHSs.

Basal-plant CHSs only partially complement the Arabidopsis CHS-null mutant

CHS orthologs have been identified in all land plant species sequenced to date, suggesting a highly conserved biochemical function. To test whether the five CHSs from the five major plant lineages are functionally equivalent, we generated transgenic Arabidopsis thaliana lines expressing each of the five different CHSs driven by the Arabidopsis CHS promoter in the CHS-null mutant transparent testa 4-2 (tt4-2) background (Shirley et al., 1995) (Figure S3). Twenty independent T1 plants were selected for each construct. The phenotypes described below were observed in the majority of independent transgenic events for each construct. As the name indicates, the tt4-2 mutant is devoid of flavonoid biosynthesis and therefore lacks the accumulation of the brown condensed tannin pigments in seed coats, revealing the pale yellow color of the underlying cotyledons (Shirley et al., 1995). Whereas AtCHS, PsCHS, and EaCHS fully complement the tt phenotype of tt4-2, PpCHS and SmCHS only partially rescue the seed tt phenotype of tt4-2 (Figure S3), suggesting that PpCHS and SmCHS are likely less active than their higher-plant counterparts in vivo. This result also correlates with the crystallographic observation that the catalytic cysteines of basal-plant and euphyllophyte CHSs exhibit differential susceptibility to oxidation.

The pKa of the catalytic cysteine is higher in basal-plant CHSs than in euphyllophyte CHSs

To perform nucleophilic attack on the p-coumaroyl-CoA substrate, the catalytic cysteine must be present in the thiolate anion form. As shown previously in MsCHS, the pKa of the catalytic cysteine is lowered to 5.5, well below physiological pH, which favors this deprotonated state (Jez & Noel, 2000). Two factors could contribute to the depressed pKa of C164. First, H303, a member of the CHS catalytic triad located in the vicinity of C164, provides an ionic interaction with C164 that can further stabilize the cysteine thiolate anion. Second, C164 is positioned at the N-terminus of the MsCHS α-9 helix (Ferrer et al., 1999), which stabilizes the cysteine thiolate anion through the partial positive charge of the helix dipole (Kortemme & Creighton, 1995). The acidic pKa of the catalytic cysteine in CHS ensures the presence of a cysteine thiolate anion in the enzyme active site at physiological pH to serve as the nucleophile for starter molecule loading.
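To illustrate why a pKa of 5.5 matters, the Henderson-Hasselbalch relation gives the fraction of a thiol that is deprotonated at a given pH. The short sketch below is an illustration only and is not part of the original analysis; the pH of 7.5 and the pKa of 8.3 for an unperturbed thiol are assumptions chosen for comparison.

```python
def thiolate_fraction(ph, pka):
    """Fraction of a thiol present as thiolate at a given pH (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


ASSUMED_PH = 7.5  # assumption: illustrative near-neutral pH, not a value from the text

for label, pka in [
    ("catalytic cysteine, pKa 5.5 (MsCHS value from the text)", 5.5),
    ("unperturbed thiol, pKa 8.3 (assumed for comparison)", 8.3),
]:
    print(f"{label}: thiolate fraction = {thiolate_fraction(ASSUMED_PH, pka):.3f}")
```

Under these assumptions, nearly all of a cysteine with a pKa of 5.5 would be in the thiolate form, whereas only a minor fraction of an unperturbed thiol would be.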

To measure the pKa of the catalytic cysteine in the five land plant CHS orthologs, we performed pH-dependent inactivation of CHS using iodoacetamide, a thiol-specific compound that reacts with sulfhydryl groups that are sufficiently nucleophilic, followed by a CHS activity assay at the usual reaction pH. At pH values above the pKa, the catalytic cysteine is deprotonated and able to react with iodoacetamide, which inactivates CHS. At pH values below the pKa, the catalytic cysteine is protonated and protected from iodoacetamide modification, so CHS activity is retained in the subsequent enzyme assay. The amount of CHS activity remaining after iodoacetamide treatment was expressed as a ratio relative to the CHS activity of a control treatment at the same pH but without iodoacetamide. The pKa was calculated using nonlinear regression to fit a log(inhibitor) vs. response equation, which gave the pH at which 50% of maximal inhibition was obtained.
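The fitting step can be sketched in a few lines. The example below is a minimal illustration, not the procedure actually used (the fits in this work were performed in Prism). It assumes the three-parameter form Y = Bottom + (Top − Bottom)/(1 + 10^(X − pKa)) with X = pH, uses a simple grid scan in place of a general nonlinear optimizer, and runs on synthetic, noise-free points rather than measured data. All function names are hypothetical.

```python
def model(ph, bottom, top, pka):
    """Three-parameter sigmoid: activity ratio falls from top to bottom as pH rises past pKa."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (ph - pka))


def fit_plateaus(phs, ratios, pka):
    """For a fixed pKa the model is linear in (bottom, top); solve the 2x2 normal equations."""
    f = [1.0 / (1.0 + 10.0 ** (ph - pka)) for ph in phs]
    a11 = sum((1.0 - fi) ** 2 for fi in f)
    a12 = sum((1.0 - fi) * fi for fi in f)
    a22 = sum(fi * fi for fi in f)
    b1 = sum((1.0 - fi) * y for fi, y in zip(f, ratios))
    b2 = sum(fi * y for fi, y in zip(f, ratios))
    det = a11 * a22 - a12 * a12
    bottom = (b1 * a22 - b2 * a12) / det
    top = (a11 * b2 - a12 * b1) / det
    sse = sum((y - model(ph, bottom, top, pka)) ** 2 for ph, y in zip(phs, ratios))
    return sse, bottom, top


def fit_pka(phs, ratios, lo=4.0, hi=9.0, step=0.001):
    """Scan candidate pKa values and keep the one with the smallest squared error."""
    best = None
    for i in range(int(round((hi - lo) / step)) + 1):
        pka = lo + i * step
        sse, bottom, top = fit_plateaus(phs, ratios, pka)
        if best is None or sse < best[0]:
            best = (sse, pka, bottom, top)
    return best[1], best[2], best[3]


# Synthetic, noise-free points (NOT measured data) generated with pKa = 6.47.
phs = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5]
ratios = [model(ph, 0.1, 1.0, 6.47) for ph in phs]
pka, bottom, top = fit_pka(phs, ratios)
print(f"fitted pKa = {pka:.2f}, bottom = {bottom:.2f}, top = {top:.2f}")
```

Because the model is linear in the two plateaus once the pKa is fixed, a one-dimensional scan is sufficient here; a general-purpose least-squares routine would serve equally well for real, noisy data.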

The pKa for AtCHS was measured to be 5.428, which is close to the 5.5 measured for MsCHS (Figure 3A). The pKa for SmCHS was measured to be 6.468, approximately 1 pH unit higher than that of the two angiosperm CHS orthologs. This elevated pKa measured for SmCHS is consistent with the observation of a catalytic cysteine that is less reactive and less prone to oxidation. Also consistent with the crystallographic and plant complementation results, pKa values around 5.5 were measured for the euphyllophyte orthologs PsCHS and EaCHS, and around 6.5 for the basal-plant ortholog PpCHS (Figure S4).

Residues near the active-site cavity affect the pKa and reactivity of the catalytic cysteine

We next examined the sequence and structural differences between basal-plant and euphyllophyte CHSs that could play a role in modulating catalytic cysteine reactivity. This led us first to identify a residue near the active site that is conserved as C347 (AtCHS numbering) in AtCHS and other euphyllophyte sequences, and as S340 (SmCHS numbering) in SmCHS and other lycophyte and bryophyte sequences (Figure 2A).


Figure 3. pKa measurement of the catalytic cysteine and characterization of key residues that affect pKa. A, pKa measurement of AtCHS and SmCHS wild-type enzymes. CHS enzyme was pre-incubated at various pH values with or without the 25 µM iodoacetamide inhibitor for 30 s, and an aliquot was taken to run in a CHS activity assay. The ratio of naringenin product produced in the iodoacetamide treatment divided by the control treatment was calculated for each pH point. A nonlinear regression was performed to fit a log(inhibitor) vs. response curve to determine the pH at which 50% of maximal inhibition was achieved, which was taken to be the pKa of the catalytic cysteine residue. The pKa of AtCHS is close to the 5.5 determined for other euphyllophyte CHSs, whereas the pKa of SmCHS is over 1 pH unit higher. B, Overall structures and active site configurations of AtCHS C347S and SmCHS S340C single mutants. The 2Fo−Fc electron density map contoured at 1.5σ is shown around the catalytic cysteine. SmCHS S340C shows oxidation of C159, unlike the SmCHS WT. AtCHS C347S has an oxidized C169, like AtCHS WT. C, pKa measurements of AtCHS C347S and SmCHS S340C mutants.


                              SmCHS S340C                     AtCHS C347S                     AtCHS M7
PDB ID                        6DXC                            6DXD                            6DXE

Data collection
Total reflections             185024 (12834)                  368273 (17370)                  293345 (28831)
Unique reflections            105252 (8805)                   201521 (14085)                  102276 (7995)
Multiplicity                  1.8 (1.5)                       1.8 (1.2)                       2.9 (2.8)
Completeness (%)              95.55 (80.52)                   93.32 (65.39)                   91.60 (76.72)
Mean I/sigma(I)               9.66 (3.06)                     8.04 (1.51)                     11.10 (1.18)
R-merge                       0.04892 (0.1642)                0.05533 (0.3447)                0.04526 (0.5779)
CC1/2                         0.983 (0.891)                   0.994 (0.699)                   0.999 (0.778)

Refinement
Resolution range (Å)          30.48 - 1.54 (1.595 - 1.54)     32.92 - 1.59 (1.647 - 1.59)     60.11 - 1.608 (1.665 - 1.608)
Space group                   P 1 21 1                        P 1 21 1                        P 1 21 1
Unit cell (Å)                 55.22 66.38 103                 54.86 138.22 108.9              72.8 55.9 100.21
Unit cell (°)                 90 91.73 90                     90 95.73 90                     90 92.51 90
R-work                        0.1427 (0.1925)                 0.1455 (0.2608)                 0.1721 (0.2610)
R-free                        0.1725 (0.2471)                 0.1688 (0.2982)                 0.2023 (0.2675)
Non-hydrogen protein atoms    5800                            12028                           6058
Water molecules               881                             1825                            859
RMSD bonds (Å)                0.01                            0.01                            0.01
RMSD angles (°)               1.38                            1.39                            1.27
Ramachandran favored (%)      97.72                           97.92                           97.79
Ramachandran allowed (%)      2.28                            2.02                            2.08
Ramachandran outliers (%)     0                               0.07                            0.13
Average B-factor              16.52                           20.78                           21.36

Table 2. Crystallographic data collection and refinement statistics for the three mutant CHSs. The highest-resolution shell values are given in parentheses.
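As a quick consistency check on the data-collection statistics, multiplicity should equal the number of total reflections divided by the number of unique reflections. The sketch below recomputes it from the overall values in Table 2. It is offered only as an illustrative sanity check, and the dictionary layout and function name are hypothetical.

```python
# Overall (not highest-shell) values transcribed from Table 2.
table2 = {
    "SmCHS S340C": {"total": 185024, "unique": 105252, "reported": 1.8},
    "AtCHS C347S": {"total": 368273, "unique": 201521, "reported": 1.8},
    "AtCHS M7": {"total": 293345, "unique": 102276, "reported": 2.9},
}


def multiplicity(total, unique):
    """Average number of times each unique reflection was measured."""
    return total / unique


for name, row in table2.items():
    computed = multiplicity(row["total"], row["unique"])
    print(f"{name}: computed {computed:.2f}, reported {row['reported']}")
```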

To investigate the role of this residue in modulating catalytic cysteine reactivity, we generated the reciprocal mutations in SmCHS and AtCHS and first characterized these mutant proteins using X-ray crystallography (Figure 3B, Table 2). Under crystallization conditions identical to those used for wild-type SmCHS, the SmCHS S340C mutant exhibits a partially oxidized catalytic cysteine in its crystal structure, suggesting that the residue does play some role in determining cysteine reactivity. The AtCHS C347S mutant, however, still retains an oxidized catalytic cysteine in its crystal structure.

We then measured the pKa of the catalytic cysteine in both SmCHS S340C and AtCHS C347S mutants (Figure 3C). The pKa for SmCHS S340C decreases by about 0.25 pH units compared to wild-type SmCHS, consistent with the observation that the SmCHS S340C crystal structure contained a partially oxidized catalytic cysteine. The pKa for AtCHS C347S decreases by about 0.25 pH units compared to wild-type AtCHS, also consistent with the observation that the AtCHS C347S crystal structure retained an oxidized catalytic cysteine. Taken together, the crystallographic and pKa measurement results suggest that the reciprocal mutation at this position is not sufficient to act as a simple switch between the active-site environments of euphyllophyte and basal-plant CHSs to modulate catalytic cysteine reactivity. Additional sequence and structural features likely contribute to an active-site environment that lowers the pKa of the catalytic cysteine in AtCHS.

To identify these features, we examined a multiple sequence alignment of CHS orthologs from diverse plant species and identified residues that show conserved variations between euphyllophytes and basal-plant lineages (Figure 4A and Figure S5). Two residues, F170 and G173 in euphyllophyte CHSs, were found to be substituted by serine and alanine, respectively, in basal-plant lineages. Because of their positions in the alpha helix immediately C-terminal to the catalytic C169, we postulated that these two residues could play a role in determining the structure of the helix, which would affect the electronic environment of the active site through the helix dipole's contribution to lowering the catalytic cysteine pKa (Ferrer et al., 1999). Four additional residues near the active-site opening of CHS were also identified as differentially conserved between euphyllophytes and basal plants. We postulated that these positions might affect the dynamics of the active-site tunnel and solvent access to the active site.

Figure 4. Identification and characterization of additional key residues that affect CHS cysteine reactivity. A, Overlaid crystal structures of AtCHS and SmCHS showing the seven conserved residue differences between euphyllophyte and basal-plant CHSs. B, pKa measurement of AtCHS M7 and SmCHS M7 mutants. The pKa of each M7 mutant is about 0.5 pH units higher or lower, respectively, than the corresponding wild-type CHS. C, The active sites of the two monomers of the AtCHS M7 septuple mutant structure. The 2Fo−Fc electron density map contoured at 1.5σ is shown around the catalytic cysteine. The crystal structure shows oxidation to sulfenic acid in the catalytic cysteine of one chain (left) and a reduced cysteine in the other (right).

The six aforementioned residues were mutated in the SmCHS S340C background to their corresponding residues in AtCHS to generate the SmCHS I54M S160F A163G G203S A207Q V258T S340C septuple mutant, termed SmCHS M7. Likewise, the reciprocal mutations were made in the AtCHS C347S background to generate AtCHS M7.

Compared to SmCHS S340C, the six additional mutations in SmCHS M7 lower the pKa by nearly 0.7 pH units, from 6.429 to 5.738 (Figure 4B). Similarly, the six mutations of AtCHS M7 raise the pKa by almost 1 pH unit, from 5.181 to 6.167, compared to AtCHS C347S.
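The shifts quoted above follow directly from the reported values. The sketch below recomputes them and, as an added illustration, converts each shift into the implied fold change in the acid dissociation constant using Ka = 10^(−pKa). The conversion is a textbook identity and not an analysis performed in this work, and the function names are hypothetical.

```python
# pKa values as reported in the text for the single and septuple mutants.
reported_pka = {
    "SmCHS S340C": 6.429,
    "SmCHS M7": 5.738,
    "AtCHS C347S": 5.181,
    "AtCHS M7": 6.167,
}


def pka_shift(background, mutant):
    """pKa of the mutant minus pKa of its background construct."""
    return reported_pka[mutant] - reported_pka[background]


def ka_fold_change(shift):
    """Fold change in Ka implied by a pKa shift, since Ka = 10 ** (-pKa)."""
    return 10.0 ** (-shift)


for background, mutant in [("SmCHS S340C", "SmCHS M7"), ("AtCHS C347S", "AtCHS M7")]:
    shift = pka_shift(background, mutant)
    print(f"{mutant} vs {background}: shift = {shift:+.3f}, Ka fold change = {ka_fold_change(shift):.2f}")
```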

Consistent with the pKa observation, the dimeric crystal structure of AtCHS M7 has one monomer with a catalytic cysteine singly oxidized to sulfenic acid and one monomer with a reduced cysteine (Figure 4C). Sulfenic acid is less oxidized than the doubly oxidized sulfinic acid seen in other euphyllophyte crystal structures, indicating that these six mutations decreased the reactivity of the catalytic cysteine. These mutations represent part of a possible evolutionary path from ancestral basal-plant CHSs toward the stronger pKa-lowering properties of euphyllophyte CHSs. Any further attempts at engineering CHS to fully swap the pKa-lowering properties between AtCHS and SmCHS would likely require methods of searching for conserved sequence differences that go beyond visual inspection of structural differences. An analysis of the CHS multiple sequence alignment using ancestral sequence reconstruction with FastML (Ashkenazy et al., 2012) identified eight additional positions that are differentially conserved between euphyllophytes and basal plants and could affect CHS function based on their position in the CHS crystal structure (Figure S6).

Molecular dynamics simulations reveal differences in active-site interactions between basal-plant and euphyllophyte CHSs

Our crystal structures revealed a correlation between the pKa of the catalytic cysteine and a set of residues near the active site. To further investigate the mechanisms underlying these conserved differences between euphyllophyte and basal-plant CHSs, we employed molecular dynamics (MD) simulations to examine the interactions between these residues. We first surveyed the potential role of the C347S substitution (AtCHS numbering) in affecting the active-site environment in wild-type AtCHS and SmCHS (Figure 5A). In wild-type AtCHS, where the largest cluster represents 70.3% of all structures sampled in the simulation, the thiol group of C347 points away from the active site and cannot form any stable interaction with the catalytic H309 (distance 6.5 Å). In contrast, the corresponding S340 in SmCHS is 2.8 Å away from the histidine in the largest cluster, which represents 98.7% of all structures sampled in the SmCHS simulation.

Figure 5. Molecular dynamics simulations of CHS orthologs and mutants. A, The centroid structure of the largest cluster of the catalytic pair C169-H309. For visualization purposes, the sulfur atom in the ionic cysteine C169 is shown in a ball representation, and crystal structures are depicted as thin sticks. B, Distributions of inter-residue distances obtained from simulations.

Next, we determined the inter-residue distances between the ionic pair (C169-H309 in AtCHS or C159-H302 in SmCHS) and between residue C347 (AtCHS)/S340 (SmCHS) and the catalytic histidine (Figure 5B). In the wild-type SmCHS simulation, we observe a sharp peak at around 2.8 Å between S340 and H302, reflecting a stable hydrogen bond between the two residues. In contrast, no such short-distance peak is observed for wild-type AtCHS. These results suggest that the catalytic histidine is stabilized upon forming a hydrogen bond with S340 in SmCHS, whereas the corresponding interaction is relatively loose in AtCHS. Similar differences between euphyllophyte and basal-plant CHSs are also seen for PsCHS, EaCHS, and PpCHS (Figure S7).
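The kind of distance-distribution analysis described above can be sketched as follows. This is a minimal illustration on synthetic coordinates, not the analysis pipeline used for the simulations in this work. The atom labels, the 3.5 Å hydrogen-bond cutoff, and the 0.1 Å bin width are all assumptions introduced for the example.

```python
import math
import random
from collections import Counter


def distances(frames, atom_a, atom_b):
    """Distance (in Å) between two named atoms in every frame."""
    return [math.dist(frame[atom_a], frame[atom_b]) for frame in frames]


def fraction_within(dists, cutoff):
    """Fraction of frames in which the pair is closer than the cutoff."""
    return sum(d < cutoff for d in dists) / len(dists)


def histogram(dists, bin_width=0.1):
    """Count frames per distance bin; keys are bin indices (lower edge = index * bin_width)."""
    return Counter(int(d // bin_width) for d in dists)


# Synthetic frames (NOT simulation output): a donor-acceptor pair fluctuating around 2.8 Å.
rng = random.Random(0)
frames = [
    {"SER_OG": (0.0, 0.0, 0.0), "HIS_N": (2.8 + rng.gauss(0.0, 0.15), 0.0, 0.0)}
    for _ in range(1000)
]

d = distances(frames, "SER_OG", "HIS_N")
HBOND_CUTOFF = 3.5  # assumption: a commonly used heuristic donor-acceptor cutoff, in Å
peak_bin, _ = histogram(d).most_common(1)[0]
print(f"fraction of frames within {HBOND_CUTOFF} Å: {fraction_within(d, HBOND_CUTOFF):.2f}")
print(f"most populated bin starts at {peak_bin * 0.1:.1f} Å")
```

A sharp, well-populated bin at short distance is the signature of a stable hydrogen bond, whereas a broad distribution centered at longer distances indicates a loose interaction.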

To further investigate the motion of the catalytic histidine in various mutant enzyme active-site environments, we also performed MD simulations of AtCHS C347S, SmCHS S340C, AtCHS M7, and SmCHS M7 (Figure 5A). The largest cluster sizes were 86.0%, 66.6%, 96.7%, and 71.0%, respectively. S347 and H309 in the AtCHS mutants adopt conformations similar to those of the corresponding residues in wild-type SmCHS. In contrast, no stable hydrogen bond between C340 and H302 is formed in the largest cluster of the SmCHS mutant simulations.

Introducing point mutations dramatically changes the distributions of those key inter-residue distances. In the AtCHS C347S mutant, the S347-H309 distance shortens to a peak around 2.8 Å, and introducing the six additional mutations in AtCHS M7 further increases the height of the peak. This suggests that mutating these seven positions in AtCHS to the corresponding residues in SmCHS can allow the active-site residues to approximate the interactions of wild-type SmCHS. The opposite effect is seen in the SmCHS S340C and SmCHS M7 mutants, which recapitulate the weak interaction between C347 and H309 seen in wild-type AtCHS.

Based on these results, we hypothesize that the strong S340-H302 interaction facilitated by the SmCHS active-site environment may weaken the stabilizing effect of H302 on the catalytic cysteine thiolate compared to that in AtCHS, thus contributing to the higher pKa. Meanwhile, the inter-residue distance of the catalytic cysteine-histidine ionic pair is rather stable in all CHS simulations, ranging from 3 to 5 Å and centered at around 4.1 Å. This suggests that the C347S substitution (AtCHS numbering) does not directly break this ionic interaction but may subtly influence the charge distribution on the histidine imidazole ring to perturb the catalytic cysteine pKa (Figure 6). In addition, the presence of a cysteine appears to decrease solvent content in the active site compared to serine, which would increase the pKa-lowering effect of the ionic interaction between histidine and the catalytic cysteine (Figure S8 and Supporting Note).

Taken together, our results suggest that euphyllophyte CHSs have evolved to enhance the reactivity of the catalytic cysteine through the modification of specific interactions between active-site residues to allow for stronger stabilization of the thiolate.


Figure 6. Proposed model for differential modulation of catalytic cysteine nucleophilicity in basal-plant (left) and euphyllophyte (right) CHSs. In basal-plant CHSs (left), the serine (S340 in SmCHS) interacts more strongly with the histidine of the catalytic triad, weakening the ionic interaction that stabilizes the thiolate form of the catalytic cysteine. This is depicted as a shift of the equilibrium toward a state in which the positive charge on the histidine (blue) is shifted away from the catalytic cysteine (C159 in SmCHS) and the shared proton interacts more closely with cysteine. In euphyllophyte CHSs (right), this position is mutated to a cysteine (C347 in AtCHS), which interacts relatively loosely with the catalytic histidine, in turn strengthening the ionic interaction between the catalytic histidine and the activated thiolate of the catalytic cysteine. This is depicted as a shift of the equilibrium toward a state in which the positive charge on the histidine (blue) is shifted toward the catalytic cysteine (C169 in AtCHS).

Discussion

As early plants migrated from water to land and further radiated to occupy diverse terrestrial environmental niches, they continuously encountered new challenges from biotic and abiotic stresses. The greatly expanded diversity and increased abundance of flavonoids in certain plant lineages could have increased the demand for metabolic flux into flavonoid biosynthesis. One adaptive strategy to meet this demand, among many others, is to increase the enzymatic efficiency of chalcone synthase, the first committed enzyme of flavonoid biosynthesis, which gates flux from general phenylpropanoid metabolism. One property of CHS that affects its enzymatic efficiency is the reactivity of the first step, nucleophilic attack on p-coumaroyl-CoA. To investigate this, we performed structural, biochemical, mutagenesis, and molecular dynamics experiments on CHS orthologs from five major plant lineages. Our results suggest that euphyllophyte CHSs have indeed evolved new structural features to increase the reactivity of their catalytic cysteine compared to basal-plant CHSs.

To identify sequence and structural features that differ between euphyllophyte and basal-plant CHSs and lead to this difference in enzymatic properties, we generated mutants in the background of AtCHS and SmCHS at various positions with conserved sequence differences segregating euphyllophyte and basal-plant CHSs. AtCHS M7 and SmCHS M7 had pKa values raised and lowered, respectively, by about 0.7 pH units relative to the wild-type enzymes (5.428 to 6.167 for AtCHS and 6.468 to 5.738 for SmCHS). Furthermore, AtCHS M7 also exhibits a less oxidized catalytic cysteine in its crystal structure than wild-type AtCHS. These results indicate that we were able to identify residue changes that partially trace the evolutionary path from SmCHS to AtCHS that increased the reactivity of the catalytic cysteine. In the type III PKS family, the introduction of a large number of mutations to yield subtle changes in enzyme activity is not unprecedented. Stilbene synthase (STS) produces resveratrol, a tetraketide product whose biosynthetic mechanism differs from that of naringenin chalcone in only the final cyclization step. In a previous study, a total of 18 point mutations were required to convert CHS activity to STS activity, through small changes in the hydrogen-bonding network in the active site (Austin, Bowman, Ferrer, Schröder, & Noel, 2004).

To examine in detail the intramolecular interactions that lead to enhanced cysteine reactivity, we performed molecular dynamics simulations on CHS. In comparing different CHS orthologs and point mutants, we observed that the presence of a cysteine in position 347 (AtCHS numbering) leads to a weak interaction between that cysteine and histidine, as indicated by the broad distribution of inter-residue distances centered at a distance greater than 5 Å, too long for a stable hydrogen bond. In contrast, when a serine is present, the sharp peak of serine-histidine inter-residue distance around 2.75 Å suggests the presence of a strong hydrogen bond. This hydrogen bonding likely shifts the electron density of the histidine away from the catalytic cysteine, weakening the imidazolium-thiolate ion pair. This weakened ionic interaction would lead to less pKa depression compared to CHS orthologs and mutants containing a cysteine in the nearby position, where the histidine is able to maintain a stronger ion pair with the catalytic cysteine and lower the pKa to a greater degree. This is reminiscent of the role of aspartate 158 in papain, a cysteine protease that also uses a cysteine-histidine-asparagine catalytic triad for nucleophilic attack on its substrates (Storer & Ménard, 1994). Although D158 is not essential for papain activity, its side chain affects the pH-activity profile by forming a hydrogen bond with the backbone amide of the catalytic histidine. This interaction stabilizes the catalytic ionic pair and maintains an optimal orientation of active-site residues. A D158E mutant papain had a pH-activity profile shifted by 0.3 pH units, about the same magnitude as the effect we observed on pKa for CHS cysteine/serine mutants.

We propose a model of the role of position 347 in enhancing CHS reactivity (Figure 6). In the basal example of SmCHS, the serine interacts more strongly with the histidine of the catalytic triad, weakening the ionic interaction that stabilizes the thiolate form of the catalytic cysteine. In euphyllophyte CHSs, this position is mutated to a cysteine, which interacts more weakly with the histidine, strengthening the ionic interaction and stabilizing the activated thiolate of the catalytic cysteine.

While the mechanism by which the other six mutations in the M7 mutants affect the catalytic cysteine is not entirely clear, we noticed that AtCHS M7 has a surface helix in a slightly different conformation than wild-type AtCHS, possibly due to the smaller side chains introduced by the S213G and Q217A mutations, leading to a slightly wider active-site opening. There is also a newly solvent-accessible cavity, as determined by cavity-finding software (Figure S9). These structural differences could lead to subtle changes in the backbone dynamics near the active site and thus alter the active-site volume or electronic environment, which could alter the pKa of the catalytic cysteine (Jez & Noel, 2000).

Although cysteine sulfenic and sulfinic acids have been thought of as crystallographic artifacts, an increasing number of studies have shown that this type of cysteine oxidation can play an important functional role. In particular, cysteine sulfinic acid has been shown to play a regulatory role in reversible inhibition of the activity of enzymes such as protein tyrosine phosphatase 1B and glyceraldehyde-3-phosphate dehydrogenase, suggesting that cysteine redox potential can be an evolved trait (Peralta et al., 2015; van Montfort, Congreve, Tisi, Carr, & Jhoti, 2003).

Our results suggest that euphyllophytes could have evolved a CHS enzyme that is intrinsically more active, with increased cysteine reactivity as a component, as one adaptation to produce the larger suite of flavonoids needed to counter the various environmental stresses they face. Although it may seem counterintuitive for euphyllophytes, which encounter more oxidative environments than do basal plants, to rely on a CHS enzyme that is more susceptible to oxidation, this susceptibility may be an unavoidable trade-off resulting from the chemical nature of a more nucleophilic cysteine: a catalytic cysteine more reactive toward substrate is also more reactive toward oxidants like hydrogen peroxide. To compensate for this increased susceptibility to oxidation, euphyllophytes may have evolved other systems to better maintain the redox environment inside the cell, one of those systems being the antioxidant flavonoids themselves.

Materials and Methods

Cloning and site-directed mutagenesis of CHSs

Total RNA was obtained from Arabidopsis thaliana, Pinus sylvestris, Equisetum arvense, Selaginella moellendorffii, and Physcomitrella patens. Reverse transcription was performed to obtain cDNA. The open reading frames (ORFs) of the five CHS orthologs were amplified via PCR from cDNA, digested with NcoI and XhoI, and ligated into NcoI- and XhoI-digested pHis8-3 or pHis8-4B Escherichia coli expression vectors. Site-directed mutagenesis was performed according to the QuikChange II Site-Directed Mutagenesis protocol (Agilent Technologies).

Transgenic Arabidopsis

The AtCHS promoter (defined as 1328 bp of sequence upstream of the CHS transcription start site) was amplified via PCR from Arabidopsis genomic DNA, digested with HindIII and XhoI, and ligated into HindIII- and XhoI-digested pCC 1136, a promoterless Gateway cloning binary vector containing a BAR resistance gene marker, to generate pJKW 0152. The five CHS ORFs described above were then PCR amplified from cDNA and cloned into pCC 1155, an ampicillin-resistant version of the pDONR221 Gateway cloning vector, with BP clonase in the Gateway cloning method (Thermo-Fisher). The resulting vectors were recombined with pJKW 0152 using LR clonase in the Gateway cloning method to generate the final binary constructs. Agrobacterium tumefaciens-mediated transformation of Arabidopsis was performed using the floral dipping method (Weigel & Glazebrook, 2002).

Recombinant protein expression and purification

CHS genes were cloned into pHis8-3 or pHis8-4B, bacterial expression vectors containing an N-terminal 8×His tag followed by a thrombin or tobacco etch virus (TEV) cleavage site, respectively, for recombinant protein production in E. coli. Proteins were expressed in the BL21(DE3) E. coli strain cultivated in terrific broth (TB) and induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) overnight at 18 °C. E. coli cells were harvested by centrifugation, resuspended in 150 mL lysis buffer (50 mM Tris pH 8.0, 500 mM NaCl, 30 mM imidazole, 5 mM DTT), and lysed with five passes through an M-110L microfluidizer (Microfluidics). The resulting crude protein lysate was clarified by centrifugation (19,000 × g, 1 h) prior to QIAGEN nickel–nitrilotriacetic acid (Ni–NTA) gravity flow chromatographic purification. After loading the clarified lysate, the Ni–NTA resin was washed with 20 column volumes of lysis buffer and eluted with 1 column volume of elution buffer (50 mM Tris pH 8.0, 500 mM NaCl, 300 mM imidazole, 5 mM DTT). His-tagged thrombin or TEV protease (1 mg) was added to the eluted protein, followed by dialysis at 4 °C for 16 h in dialysis buffer (50 mM Tris pH 8.0, 500 mM NaCl, 5 mM DTT). After dialysis, the protein solution was passed through Ni–NTA resin to remove uncleaved protein and His-tagged TEV. The recombinant proteins were further purified by gel filtration on an ÄKTA Pure fast protein liquid chromatography (FPLC) system (GE Healthcare Life Sciences). The principal peaks were collected, verified by SDS–PAGE, and dialyzed into a storage buffer (12.5 mM Tris pH 8.0, 50 mM NaCl, 5 mM DTT). Finally, proteins were concentrated to >10 mg/mL using Amicon Ultra-15 Centrifugal Filters (Millipore).

Protein crystallization

All protein crystals were grown by hanging drop vapor diffusion at 4 °C, except for EaCHS, which was grown at 20 °C. For AtCHS wild-type and C347S crystals, 1 µL of 10 mg/mL protein was mixed with 1 µL of reservoir solution containing 0.1 M HEPES (pH 7.5), 0.3 M ammonium acetate, 14% (v/v) PEG 8000, and 5 mM DTT. For AtCHS M7 crystals, 1 µL of 16.33 mg/mL protein was mixed with 1 µL of reservoir solution containing 0.125 M NaSCN, 20% (v/v) PEG 3350, and 5 mM DTT; 0.2 µL of a crystal seed stock from previous rounds of crystal optimization was also added. For SmCHS wild-type and S340C crystals, 1 µL of 10 mg/mL protein was mixed with 1 µL of reservoir solution containing 0.1 M MOPSO (pH 6.6), 0.3 M Mg(NO3)2, 19% (v/v) PEG 4000, and 5 mM DTT. For EaCHS, 1 µL of 16.92 mg/mL protein was mixed with 1 µL of reservoir solution containing 0.15 M LiCl, 8% PEG 6000, and 5 mM DTT. For PsCHS, 1.66 µL of 14.65 mg/mL protein was mixed with 0.67 µL of reservoir solution containing 0.14 M NH4Cl, 20% (v/v) PEG 3350, and 5 mM DTT; 0.2 µL of a crystal seed stock from previous rounds of crystal optimization was also added. For PpCHS, 1 µL of 10 mg/mL protein was mixed with 1 µL of reservoir solution containing 0.1 M MES (pH 6.9), 18% (v/v) PEG 20000, and 5 mM DTT. Crystals were harvested within 1 week and transferred to a cryoprotection solution of 17% glycerol and 83% reservoir solution. H2O2 soaking of SmCHS crystals was performed by adding H2O2 to 1 mM to the cryoprotection solution and incubating at 4 °C for 75 min. Single crystals were mounted in a cryoloop and flash-frozen in liquid nitrogen.

X-ray diffraction and structure determination

X-ray diffraction data were collected at beamlines 8.2.1 and 8.2.2 of the Advanced Light Source at Lawrence Berkeley National Laboratory on ADSC Quantum 315 CCD detectors for AtCHS wild-type, AtCHS C347S, and SmCHS S340C crystals. X-ray diffraction data were collected at beamlines 24-ID-C and 24-ID-E of the Advanced Photon Source at Argonne National Laboratory on an ADSC Quantum 315 CCD detector, Eiger 16M detector, or Pilatus 6M detector for SmCHS wild-type, EaCHS, PsCHS, and AtCHS M7 crystals. Diffraction intensities were indexed and integrated with iMosflm (Battye, Kontogiannis, Johnson, Powell, & Leslie, 2011) and scaled with Scala under CCP4 (Evans, 2006; Winn et al., 2011). The phases were determined with molecular replacement using Phaser under Phenix (Adams et al., 2010). Further structural refinement utilized Phenix programs. Coot was used for manual map inspection and model rebuilding (Emsley & Cowtan, 2004). Crystallographic calculations were performed using Phenix.

Comparative sequence and structure analyses

CHS protein sequences were derived from NCBI and the 1000 Plants (1KP) Project (Matasci et al., 2014; NCBI Resource Coordinators, 2016). In all cases, AtCHS was used as the search query. Amino acid alignment of CHS orthologs was created using MUSCLE with default settings (Edgar, 2004). UCSF Chimera and ESPript were used to display the multiple-sequence alignments shown in Figure 2, Figure S5, and Figure S6 (Pettersen et al., 2004; Robert & Gouet, 2014). Phylogenetic analysis was performed using MEGA7 (Kumar, Stecher, & Tamura, 2016). All structural figures were created with the PyMOL Molecular Graphics System, version 1.3 (Schrödinger, LLC) (DeLano, 2016). Active site cavity measurements for the AtCHS and AtCHS M7 structures were determined using KVFinder (Oliveira et al., 2014).

Enzyme assays and pKa measurement

A 4CL-CHS coupled assay was used for kinetic analysis. A 4CL reaction master mix was made by incubating 917 nM Arabidopsis thaliana 4CL1 (NCBI accession number NP_175579.1) in 100 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 5 mM ATP, 100 µM p-coumaric acid, 100 µM coenzyme A, and 10 or 50 µM malonyl-CoA for 30 min at room temperature to generate p-coumaroyl-CoA at a final concentration of 70 µM. This 4CL reaction mix was divided into individual aliquots of 196 µL in Eppendorf tubes. CHS enzyme was incubated for 30 or 60 s in 16 µL volumes using a triple buffer system (50 mM AMPSO, 50 mM sodium phosphate, 50 mM sodium pyrophosphate, various pH) (Ellis & Morrison, 1982; Schlegel, Jez, & Penning, 1998) at room temperature in the presence of 25 µM iodoacetamide for the inactivation sample or water for the control sample. Aliquots (4 µL) were withdrawn from the incubation mixture and added to the standard coupled CHS assay system. The CHS reaction was run for 10 min at room temperature and stopped by addition of 200 µL methanol.
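The volumes above imply a large dilution of the pre-incubation mixture on transfer to the activity assay. The sketch below works through that arithmetic using C1·V1 = C2·V2. It assumes that each 4 µL aliquot is added to one 196 µL portion of the 4CL reaction mix, which the text implies but does not state outright. The function name is hypothetical.

```python
def diluted_concentration(stock_conc, volume_added, final_volume):
    """Solve C1 * V1 = C2 * V2 for C2 (any consistent units)."""
    return stock_conc * volume_added / final_volume


ALIQUOT_UL = 4.0          # volume withdrawn from the pre-incubation (from the text)
ASSAY_MIX_UL = 196.0      # volume of each 4CL reaction aliquot (from the text)
IODOACETAMIDE_UM = 25.0   # iodoacetamide in the pre-incubation (from the text)

# Assumption: the aliquot is added to exactly one 196 µL portion of assay mix.
final_volume = ALIQUOT_UL + ASSAY_MIX_UL
carryover = diluted_concentration(IODOACETAMIDE_UM, ALIQUOT_UL, final_volume)

print(f"dilution factor: {final_volume / ALIQUOT_UL:.0f}-fold")
print(f"iodoacetamide carried into the assay: {carryover:.2f} µM")
```

Under this assumption, the transfer both returns the enzyme to the usual assay pH and reduces the inhibitor concentration substantially.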

The assay samples were centrifuged and analyzed directly by liquid chromatography−mass spectrometry (LC−MS). LC was conducted on a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific), using water with 0.1% formic acid as solvent A and acetonitrile with 0.1% formic acid as solvent B. Reverse phase separation of analytes was performed on a Kinetex C18 column, 150 × 3 mm, 2.6 μm particle size (Phenomenex). The column oven was held at 30 °C. Samples were eluted with a gradient of 5–60% B for 9 min, 95% B for 3 min, and 5% B for 3 min, with a flow rate of 0.7 mL/min. MS analysis was performed on a TSQ Quantum Access Max mass spectrometer (Thermo Fisher Scientific) operated in negative ionization mode with a SIM scan centered at 271.78 m/z to detect naringenin chalcone.

The pH profiles (pH on the X-axis, ratio of naringenin chalcone produced with iodoacetamide treatment to control on the Y-axis) were determined by fitting raw data to the log(inhibitor) vs. response equation using nonlinear regression in Prism, version 6.0f (GraphPad Software).
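As an illustration of this fitting step, the following minimal Python sketch fits the three-parameter form of the log(inhibitor) vs. response curve, Y = Bottom + (Top − Bottom) / (1 + 10^(X − midpoint)), with pH as X and the iodoacetamide/control product ratio as Y. It is not the Prism procedure used in this work: the unit Hill slope, the grid-search strategy, the function names, and the synthetic data points are all assumptions made for illustration.

```python
def response(ph, bottom, top, midpoint):
    """Three-parameter log(inhibitor) vs. response curve (unit Hill slope assumed)."""
    return bottom + (top - bottom) / (1.0 + 10.0 ** (ph - midpoint))


def fit_midpoint(ph_values, ratios, lo=4.0, hi=10.0, step=0.001):
    """Scan candidate midpoints; for each, solve bottom/top by linear least squares.

    For a fixed midpoint the model is linear in f = 1 / (1 + 10**(pH - midpoint)):
    ratio = bottom + (top - bottom) * f, so ordinary regression gives both plateaus.
    """
    n = len(ph_values)
    mean_y = sum(ratios) / n
    best = None
    for i in range(int(round((hi - lo) / step)) + 1):
        mid = lo + i * step
        f = [1.0 / (1.0 + 10.0 ** (ph - mid)) for ph in ph_values]
        mean_f = sum(f) / n
        sxx = sum((fi - mean_f) ** 2 for fi in f)
        if sxx == 0.0:
            continue
        slope = sum((fi - mean_f) * (yi - mean_y) for fi, yi in zip(f, ratios)) / sxx
        intercept = mean_y - slope * mean_f
        sse = sum((intercept + slope * fi - yi) ** 2 for fi, yi in zip(f, ratios))
        if best is None or sse < best[0]:
            # intercept is the high-pH plateau (bottom); intercept + slope is top
            best = (sse, intercept, intercept + slope, mid)
    _, bottom, top, midpoint = best
    return bottom, top, midpoint


# Synthetic, noise-free example data (hypothetical, not measured values)
ph_points = [4.0 + 0.5 * i for i in range(11)]
ratio_points = [response(ph, 0.1, 1.0, 5.5) for ph in ph_points]
bottom, top, pka_estimate = fit_midpoint(ph_points, ratio_points)
```

In this sketch the fitted midpoint plays the role of the apparent pKa of the catalytic cysteine: the pH at which iodoacetamide inactivation is half-maximal.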

Molecular dynamics

All MD simulations were performed using the GROMACS 5.1.4 package (Abraham et al., 2015) and CHARMM force field (Best et al., 2012). The catalytic residues were modeled as protonated histidine (H309 in AtCHS numbering) and deprotonated cysteine (C169 in AtCHS numbering). All CHSs were constructed as dimers and were pre-aligned to the wild-type AtCHS crystal structure using the Multiseq plugin of VMD (Roberts, Eargle, Wright, & Luthey-Schulten, 2006). All CHS dimers were solvated with 0.1 M NaCl in a dodecahedron box. Before the production runs, all systems were subjected to energy minimization, followed by a 500-ps NVT and a 500-ps NPT run with heavy atoms constrained. This was followed by another 5-ns NPT simulation with the protein backbone constrained. In all simulations, an integration time step of 2 fs was used, with bonds involving hydrogens constrained using LINCS (Hess, 2008; Hess, Bekker, Berendsen, Fraaije, & Others, 1997). The van der Waals interaction was smoothly switched off starting from 10 Å, with a cut-off distance of 12 Å. The neighbor list was updated every 10 steps with the Verlet cutoff scheme. The electrostatic interaction was evaluated using Particle-Mesh-Ewald (PME) summation (Darden, York, & Pedersen, 1993) with a grid spacing of 1.5 Å to account for the long-range interaction, while its short-range interaction in real space had a cut-off distance of 12 Å. The velocity-rescaling thermostat (Bussi, Donadio, & Parrinello, 2007) and Parrinello-Rahman barostat (Nosé & Klein, 1983; Parrinello & Rahman, 1981) were employed to maintain the temperature at 300 K and the pressure at 1 bar.

For each CHS, three copies of 200-ns production runs were performed. The aggregated simulation time of all CHS wild-type and mutant systems is 5.4 μs. The two monomers of a given CHS were treated equivalently in the analysis; i.e., the three copies of trajectories of each monomer were combined after they were aligned to chain A of the associated crystal structure, resulting in a total of 1.2-μs trajectory for analysis of a given CHS system. Clustering analysis was carried out with GROMACS gmx cluster with an RMSD cutoff of 0.1 nm. The inter-residue distance was measured using the Tcl scripting abilities provided by VMD (Humphrey, Dalke, & Schulten, 1996). The minimum distance between the two nitrogen atoms of the catalytic histidine and the associated hydroxyl, thiol, or thiolate group of its serine or cysteine partner was taken as the inter-residue distance. Water occupancy calculation was performed using the volmap plugin of VMD (Humphrey et al., 1996).
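The trajectory bookkeeping and the inter-residue distance definition above can be sketched in a few lines of Python. This is only an illustration and not the Tcl/VMD script used for the analysis: the function names and the example coordinates are hypothetical, and the partner group is reduced to a single atom position (for example, the serine oxygen or cysteine sulfur) as a simplifying assumption.

```python
import math


def min_his_partner_distance(his_nitrogens, partner_atom):
    """Smallest distance (same units as the input coordinates) from either
    histidine ring nitrogen to the partner heteroatom."""
    return min(math.dist(n, partner_atom) for n in his_nitrogens)


def pooled_time_ns(replicas=3, run_ns=200, monomers=2):
    """Trajectory length available per CHS system once both monomers are pooled."""
    return replicas * run_ns * monomers


# Hypothetical coordinates in Å for a single frame
nd1 = (0.0, 0.0, 0.0)
ne2 = (3.0, 0.0, 0.0)
partner = (3.0, 4.0, 0.0)
distance = min_his_partner_distance([nd1, ne2], partner)

# 3 replicas x 200 ns x 2 monomers = 1200 ns = 1.2 μs per system
per_system_ns = pooled_time_ns()
# 5.4 μs aggregated over 3 x 200 ns per system implies 9 simulated systems
n_systems = 5400 / (3 * 200)
```

Applying the distance function to every frame of the pooled trajectories would give a distance distribution per CHS system.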

References

Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1-2, 19–25.
Adams, P. D., Afonine, P. V., Bunkóczi, G., Chen, V. B., Davis, I. W., Echols, N., … Zwart, P. H. (2010). PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallographica. Section D, Biological Crystallography, 66(Pt 2), 213–221.
Ashkenazy, H., Penn, O., Doron-Faigenboim, A., Cohen, O., Cannarozzi, G., Zomer, O., & Pupko, T. (2012). FastML: a web server for probabilistic reconstruction of ancestral sequences. Nucleic Acids Research, 40(Web Server issue), W580–W584.
Austin, M. B., Bowman, M. E., Ferrer, J.-L., Schröder, J., & Noel, J. P. (2004). An aldol switch discovered in stilbene synthases mediates cyclization specificity of type III polyketide synthases. Chemistry & Biology, 11(9), 1179–1194.
Austin, M. B., & Noel, J. P. (2003). The chalcone synthase superfamily of type III polyketide synthases. Natural Product Reports, 20(1), 79–110.
Battye, T. G. G., Kontogiannis, L., Johnson, O., Powell, H. R., & Leslie, A. G. W. (2011). iMOSFLM: a new graphical interface for diffraction-image processing with MOSFLM. Acta Crystallographica. Section D, Biological Crystallography, 67(Pt 4), 271–281.
Best, R. B., Zhu, X., Shim, J., Lopes, P. E. M., Mittal, J., Feig, M., & Mackerell, A. D., Jr. (2012). Optimization of the additive CHARMM all-atom protein force field targeting improved sampling of the backbone φ, ψ and side-chain χ(1) and χ(2) dihedral angles. Journal of Chemical Theory and Computation, 8(9), 3257–3273.
Bussi, G., Donadio, D., & Parrinello, M. (2007). Canonical sampling through velocity rescaling. The Journal of Chemical Physics, 126(1), 014101.
Darden, T., York, D., & Pedersen, L. (1993). Particle mesh Ewald: An N⋅log(N) method for Ewald sums in large systems. The Journal of Chemical Physics, 98(12), 10089–10092.
DeLano, W. L. (2016). The PyMOL Molecular Graphics System. DeLano Scientific; Palo Alto, CA: 2002.
Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.
Ellis, K. J., & Morrison, J. F. (1982). Buffers of constant ionic strength for studying pH-dependent processes. Methods in Enzymology, 87, 405–426.
Emsley, P., & Cowtan, K. (2004). Coot: model-building tools for molecular graphics. Acta Crystallographica. Section D, Biological Crystallography, 60(Pt 12 Pt 1), 2126–2132.
Evans, P. (2006). Scaling and assessment of data quality. Acta Crystallographica. Section D, Biological Crystallography, 62(Pt 1), 72–82.
Ferrer, J. L., Jez, J. M., Bowman, M. E., Dixon, R. A., & Noel, J. P. (1999). Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nature Structural Biology, 6(8), 775–784.
Harris, T. K., & Turner, G. J. (2002). Structural basis of perturbed pKa values of catalytic groups in enzyme active sites. IUBMB Life, 53(2), 85–98.
Hess, B. (2008). P-LINCS: A Parallel Linear Constraint Solver for Molecular Simulation. Journal of Chemical Theory and Computation, 4(1), 116–122.
Hess, B., Bekker, H., Berendsen, H. J. C., Fraaije, J. G., & Others. (1997). LINCS: a linear constraint solver for molecular simulations. Journal of Computational Chemistry, 18(12), 1463–1472.
Humphrey, W., Dalke, A., & Schulten, K. (1996). VMD: visual molecular dynamics. Journal of Molecular Graphics, 14(1), 33–38, 27–28.
Jez, J. M., Austin, M. B., Ferrer, J., Bowman, M. E., Schröder, J., & Noel, J. P. (2000). Structural control of polyketide formation in plant-specific polyketide synthases. Chemistry & Biology, 7(12), 919–930.
Jez, J. M., & Noel, J. P. (2000). Mechanism of Chalcone Synthase: pKa of the Catalytic Cysteine and the Role of the Conserved Histidine in a Plant Polyketide Synthase. The Journal of Biological Chemistry, 275(50), 39640–39646.
Kortemme, T., & Creighton, T. E. (1995). Ionisation of Cysteine Residues at the Termini of Model α-Helical Peptides. Relevance to Unusual Thiol pKa Values in Proteins of the Thioredoxin Family. Journal of Molecular Biology, 253(5), 799–812.
Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution, 33(7), 1870–1874.
Matasci, N., Hung, L.-H., Yan, Z., Carpenter, E. J., Wickett, N. J., Mirarab, S., … Wong, G. K.-S. (2014). Data access for the 1,000 Plants (1KP) project. GigaScience, 3, 17.
NCBI Resource Coordinators. (2016). Database resources of the National Center for Biotechnology Information. Nucleic Acids Research, 44(D1), D7–D19.
Nosé, S., & Klein, M. L. (1983). Constant pressure molecular dynamics for molecular systems. Molecular Physics, 50(5), 1055–1076.
Oliveira, S. H. P., Ferraz, F. A. N., Honorato, R. V., Xavier-Neto, J., Sobreira, T. J. P., & de Oliveira, P. S. L. (2014). KVFinder: steered identification of protein cavities as a PyMOL plugin. BMC Bioinformatics, 15, 197.
Parrinello, M., & Rahman, A. (1981). Polymorphic transitions in single crystals: A new molecular dynamics method. Journal of Applied Physics, 52(12), 7182–7190.
Peralta, D., Bronowska, A. K., Morgan, B., Dóka, É., Van Laer, K., Nagy, P., … Dick, T. P. (2015). A proton relay enhances H2O2 sensitivity of GAPDH to facilitate metabolic adaptation. Nature Chemical Biology, 11(2), 156–163.
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera--a visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13), 1605–1612.
Rausher, M. D. (2006). The Evolution of Flavonoids and Their Genes. In E. Grotewold (Ed.), The Science of Flavonoids (pp. 175–211). New York, NY: Springer New York.
Reddie, K. G., & Carroll, K. S. (2008). Expanding the functional diversity of proteins through cysteine oxidation. Current Opinion in Chemical Biology, 12(6), 746–754.
Roberts, E., Eargle, J., Wright, D., & Luthey-Schulten, Z. (2006). MultiSeq: unifying sequence and structure data for evolutionary analysis. BMC Bioinformatics, 7, 382.
Robert, X., & Gouet, P. (2014). Deciphering key features in protein structures with the new ENDscript server. Nucleic Acids Research, 42(Web Server issue), W320–W324.
Schlegel, B. P., Jez, J. M., & Penning, T. M. (1998). Mutagenesis of 3α-Hydroxysteroid Dehydrogenase Reveals a “Push-Pull” Mechanism for Proton Transfer in Aldo-Keto Reductases. Biochemistry, 37(10), 3538–3548.
Shirley, B. W., Kubasek, W. L., Storz, G., Bruggemann, E., Koornneef, M., Ausubel, F. M., & Goodman, H. M. (1995). Analysis of Arabidopsis mutants deficient in flavonoid biosynthesis. The Plant Journal: For Cell and Molecular Biology, 8(5), 659–671.
Storer, A. C., & Ménard, R. (1994). Catalytic mechanism in papain family of cysteine peptidases. In Methods in Enzymology (Vol. 244, pp. 486–500). Academic Press.
van Montfort, R. L. M., Congreve, M., Tisi, D., Carr, R., & Jhoti, H. (2003). Oxidation state of the active-site cysteine in protein tyrosine phosphatase 1B. Nature, 423(6941), 773–777.
Weigel, D., & Glazebrook, J. (2002). Arabidopsis: a laboratory manual. CSHL Press.
Weng, J.-K., & Chapple, C. (2010). The origin and evolution of lignin biosynthesis. The New Phytologist, 187(2), 273–285.
Weng, J.-K., & Noel, J. P. (2012). Structure-function analyses of plant type III polyketide synthases. Methods in Enzymology, 515, 317–335.
Weng, J.-K., & Noel, J. P. (2013). Chemodiversity in Selaginella: a reference system for parallel and convergent metabolic evolution in terrestrial plants. Frontiers in Plant Science, 4, 119.
Winkel-Shirley, B. (2001). Flavonoid biosynthesis. A colorful model for genetics, biochemistry, cell biology, and biotechnology. Plant Physiology, 126(2), 485–493.
Winn, M. D., Ballard, C. C., Cowtan, K. D., Dodson, E. J., Emsley, P., Evans, P. R., … Wilson, K. S. (2011). Overview of the CCP4 suite and current developments. Acta Crystallographica. Section D, Biological Crystallography, 67(Pt 4), 235–242.
Yao, L. H., Jiang, Y. M., Shi, J., Tomás-Barberán, F. A., Datta, N., Singanusong, R., & Chen, S. S. (2004). Flavonoids in food and their health benefits. Plant Foods for Human Nutrition, 59(3), 113–122.

Supporting Information

Figure S1. Active site structures of Medicago sativa CHS (A) (PDB ID 1BI5) and Gerbera hybrida 2-pyrone synthase (B) (PDB ID 1QLV) showing catalytic cysteine oxidized to sulfinic acid. The 2Fo−Fc electron density map contoured at 1.5σ is shown around the catalytic cysteine.

Figure S2. Active site structure of SmCHS crystals soaked in 1 mM hydrogen peroxide for 75 min. A, The 2Fo−Fc composite map to 1.55 Å resolution and contoured at 1.5σ is shown around the catalytic cysteine, modeled as oxidized to sulfenic acid. B, The 2Fo−Fc electron density map to 1.55 Å resolution and contoured at 1.5σ is shown as purple and the Fo−Fc difference map contoured at 3.0σ is shown as green around the catalytic cysteine, modeled as reduced cysteine, indicating clear residual electron density for the oxidized sulfenic acid.

SmCHS H2O2 75 min (PDB ID 6DXF)

Data collection
  Total reflections              404281 (37428)
  Unique reflections             108309 (10712)
  Multiplicity                   3.7 (3.5)
  Completeness (%)               98.71 (98.24)
  Mean I/sigma(I)                11.08 (1.66)
  R-merge                        0.08912 (0.825)
  CC1/2                          0.996 (0.532)

Refinement
  Resolution range (Å)           102.9 - 1.55 (1.605 - 1.55)
  Space group                    P 1 21 1
  Unit cell (Å)                  55.54 67.064 102.993
  Unit cell (°)                  90 91.719 90
  R-work                         0.1550 (0.2771)
  R-free                         0.1834 (0.3058)
  Non-hydrogen protein atoms     5807
  Water molecules                686
  RMSD bonds (Å)                 0.01
  RMSD angles (°)                1.25
  Ramachandran favored (%)       97.32
  Ramachandran allowed (%)       2.54
  Ramachandran outliers (%)      0.13
  Average B-factor               24.03

Table S1. Statistics for the crystal structure of SmCHS crystals soaked in 1 mM hydrogen peroxide for 75 minutes. The highest-resolution shell values are given in parentheses.

Figure S3. Complementation of the transparent testa seed phenotype of tt4-2 mutant Arabidopsis thaliana. CHS orthologs were expressed under the AtCHS promoter. CHS from euphyllophytes (AtCHS, PsCHS, EaCHS) fully complement the mutant phenotype, whereas CHS from basal land plants (SmCHS, PpCHS) only partially complement.

Figure S4. pKa measurement of PsCHS, EaCHS, and PpCHS wild-type enzymes. CHS enzyme was pre-incubated at various pH in the presence of 25 µM iodoacetamide inhibitor or water (control) for 30 s, and an aliquot was taken to run in a CHS activity assay. The ratio of naringenin product produced in the iodoacetamide treatment divided by the control treatment was calculated for each pH point. A nonlinear regression was performed to fit a log(inhibitor) vs. response curve to determine the pH at which 50% of maximal inhibition was achieved, which was taken to be the pKa of the catalytic cysteine residue. The pKa values of PsCHS and EaCHS are close to the 5.5 determined for other euphyllophyte CHSs, whereas the pKa of PpCHS is over 1 pH unit higher, similar to that of SmCHS.

Figure S5. Multiple sequence alignment of CHSs. Sequence numbers of the beginning of each block for each CHS sequence are indicated. Residues outlined in thin black boxes are conserved with > 70% similarity across all sequences. Residues with 100% conservation are in white text with a black background. Red boxes indicate the seven positions mutated in the AtCHS M7 and SmCHS M7 constructs; these positions are differentially conserved between euphyllophyte and basal-plant CHSs, which are divided by the horizontal red line.

Figure S6. CHS ancestral sequence reconstruction. Sequences and phylogenetic tree of CHSs shown in Figure 1 were used to perform ancestral sequence reconstruction with FastML. The most recent common ancestor (MRCA) sequences of all branches, euphyllophyte, and basal land plant clades are compared to AtCHS and SmCHS. Among the five sequences shown, absolutely conserved residues are shown in white text with red background. Residues with > 70% similarity are shown in red text on a white background with a blue outline. Other residues are shown in black text. Red arrows indicate the seven differentially conserved positions previously identified and mutated in the M7 CHS constructs. Black arrows indicate additional residue positions that are differentially conserved between euphyllophyte and basal-plant CHSs and determined to have possible functional impact based on their position in the CHS crystal structure. The catalytic triad residues are also labeled.

Figure S7. Distributions of inter-residue distances and the largest cluster conformations of EaCHS, PpCHS, PsCHS obtained from MD simulations. The observation of a serine forming a more stable hydrogen bond interaction than cysteine with the catalytic histidine is similar to the AtCHS and SmCHS wild-type and mutant simulations (Figure 5). Notably, with the rather weak interaction between the cysteine C346/C355 and the catalytic histidine, the latter moves more freely and often shows a much larger displacement from the corresponding position in the crystal structure (thin sticks).

Figure S8. Average occupancy of water molecules obtained from MD simulations. Black dots represent grid points with an average water occupancy greater than 0.2. SmCHS in general has more water inside the active site, while the wild-type AtCHS has fewer water molecules. AtCHS mutants gradually attract more water around S347. This pattern is also observed in PpCHS, which also attracts more water around its serine than CHS where the serine is replaced by a cysteine (EaCHS, PsCHS). See also Supporting Note below.

Figure S9. Comparison of wild-type AtCHS (yellow) and AtCHS M7 (magenta) crystal structures. The catalytic triad residues and two of the seven mutations from wild-type to M7 are modeled as sticks and labeled. The yellow and magenta surfaces represent the solvent-accessible cavities measured using the cavity-finding program KVFinder. The helix containing the two marked mutations is shifted in AtCHS M7 compared to wild type, leading to a larger active-site cavity.

Supporting Note

Our MD calculations show that the C347S substitution (AtCHS numbering) can significantly affect active-site solvation. The occupancy of water molecules within the active site was measured with a resolution of 1 Å³ (Figure S8). Interestingly, S347 in AtCHS C347S and M7 mutants attracts more water toward itself and H309. Similarly, the wild-type SmCHS is also considerably wetter than the wild-type AtCHS: employing a cylinder with a radius of 9 Å and a height of 13 Å to enclose the catalytic residues, we found that the average number of water molecules enclosed was 40.0 for SmCHS and 31.4 for AtCHS. The ability of a serine to attract more water is also observed in simulations of EaCHS, PpCHS, and PsCHS, although in SmCHS mutants the active site remains rather wet despite the mutation of serine to cysteine (Figure S8).

AtCHS M7 also showed a wider active-site opening than wild-type AtCHS, which may also affect solvent access to the active site, as shown by the large cavity found in cavity analysis. In addition to changing the hydrogen bonding network, the decreased solvation in euphyllophyte CHSs would enhance the pKa-lowering effect of the histidine, because ionic effects are enhanced as the dielectric constant decreases along with solvent polarity (Harris & Turner, 2002).
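The cylinder-based water count described in this note can be illustrated with a short Python sketch. It is not the analysis script used for the simulations: the placement of the cylinder (its center and axis), the use of one point per water molecule (for example, the oxygen position), and the function names are assumptions made here for illustration, and only the 9 Å radius and 13 Å height come from the text.

```python
import math


def in_cylinder(point, center, axis, radius=9.0, height=13.0):
    """Return True if a point lies inside a finite cylinder centered at `center`
    with its long axis along `axis` (any non-zero vector)."""
    norm = math.sqrt(sum(a * a for a in axis))
    unit = [a / norm for a in axis]
    delta = [p - c for p, c in zip(point, center)]
    along = sum(d * u for d, u in zip(delta, unit))       # axial offset
    radial_sq = sum(d * d for d in delta) - along * along  # squared radial offset
    return abs(along) <= height / 2.0 and radial_sq <= radius * radius


def mean_waters_enclosed(frames, center, axis):
    """Average number of water positions inside the cylinder over all frames."""
    counts = [sum(in_cylinder(w, center, axis) for w in frame) for frame in frames]
    return sum(counts) / len(counts)


# Hypothetical water positions (Å) for two frames
frames = [
    [(0.0, 0.0, 0.0), (8.0, 0.0, 6.0), (10.0, 0.0, 0.0), (0.0, 0.0, 7.0)],
    [(0.0, 0.0, 0.0)],
]
average = mean_waters_enclosed(frames, center=(0.0, 0.0, 0.0), axis=(0.0, 0.0, 1.0))
```

Averaging such per-frame counts over a trajectory is one way to arrive at values of the kind reported above (40.0 for SmCHS and 31.4 for AtCHS).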

Chapter 3

Regulation of chalcone synthase activity in vivo by oxidation of the catalytic cysteine

Authors
Geoffrey Liou (1,2) and Jing-Ke Weng (1,2)

Author Affiliations
1. Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2. Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA

Abstract

Oxidation of the cysteinyl side chain in proteins occurs widely in all living organisms, often as a result of increased levels of reactive oxygen species (ROS). However, the molecular mechanisms and biochemical consequences of cysteine oxidation are not well understood. We have previously observed that chalcone synthase (CHS), the first committed enzyme in plant flavonoid biosynthesis, has evolved a catalytic cysteine that is both more reactive and more sensitive to oxidation in euphyllophytes than in basal land plants. We aim to gain a better understanding of how CHS activity and flavonoid metabolism are controlled by oxidative inactivation at the systems level. Furthermore, we seek to understand how such an intricate molecular mechanism could have arisen evolutionarily, and what functional significance this trait might have contributed to the adaptation of land plants to challenging terrestrial environments over the past 450 million years. We have begun to develop several experimental methods that will allow us to address these questions in vivo.

Introduction

Post-translational modification is a process that changes the properties of proteins and extends their functions beyond those defined simply by their amino acid sequences. Compared to other post-translational modifications, oxidative thiol modifications of cysteine residues have received little attention and have often been disregarded as in vitro artifacts (Brandes, Schmitt, & Jakob, 2009). More recently, however, the role of cysteine oxidation as a dynamic post-translational modification to regulate protein function in signal transduction and other important physiological processes has become apparent (Kettenhofen & Wood, 2010). These advances in understanding have been facilitated by the advent of chemical probes and omics tools for trapping and detecting specific oxidized forms of cysteine (Seo & Carroll, 2011).

Reactive oxygen species (ROS), including hydrogen peroxide (H2O2), superoxide (O2•−), and the hydroxyl radical (•OH), are generally considered toxic byproducts of aerobic metabolism that must be eradicated to maintain cellular homeostasis (Paulsen & Carroll, 2010). Recent findings, however, suggest that ROS can act as secondary messengers to mediate cellular signaling and metabolism (D’Autréaux & Toledano, 2007). The cysteinyl side chain in proteins can undergo a series of oxidative transformations upon reaction with ROS: the cysteine thiol (Cys-SH) is reversibly oxidized to sulfenic acid (Cys-SOH), which can then be successively oxidized irreversibly to sulfinic (Cys-SO2H) and sulfonic acid (Cys-SO3H) (Kettenhofen & Wood, 2010).

These oxidative modifications of cysteine can alter the properties of proteins and regulate their activity in various ways, as has been studied in a few systems. For example, protein tyrosine phosphatases (PTPs) were among the first families of proteins found to be regulated by intracellular redox state through cysteine oxidation (Denu & Tanner, 1998). The catalytic mechanism of PTPs involves nucleophilic attack by a conserved catalytic cysteine on a phosphotyrosine substrate to remove the phosphate group, and PTP activity can be reversibly inhibited by oxidation of the catalytic cysteine to sulfenic acid in vivo (van Montfort, Congreve, Tisi, Carr, & Jhoti, 2003). PTP inactivation by oxidation has also been shown to occur in plants, which correlates with the previous observation that H2O2 activates the downstream mitogen-activated protein (MAP) kinase (Gupta & Luan, 2003). Similar to PTPs, glyceraldehyde-3-phosphate dehydrogenase (GAPDH), an enzyme in glycolysis, contains a catalytic cysteine that is prone to oxidation both in vitro and in vivo (Butterfield, Hardas, & Bader Lange, 2010; Ishii et al., 1999). Interestingly, oxidative modification of this catalytic cysteine to sulfenic acid, but not other forms, grants a novel acyl phosphatase activity to GAPDH while inhibiting its dehydrogenase activity (Schmalhausen, Nagradova, Boschi-Muller, Branlant, & Muronetz, 1999).

Although cysteine sulfinic and sulfonic acid are generally considered irreversible oxidative modifications, recent studies have discovered a family of eukaryotic enzymes, known as sulfiredoxins (Srx), that can specifically reduce the sulfinic acid form of the catalytic cysteine in the 2-Cys peroxiredoxins (Prx) back to the thiol form in an ATP-dependent manner (Basu & Koonin, 2005). Srx binds to Prx proteins by specific interactions with several critical surface-exposed residues of the Prx proteins and transfers the γ-phosphate of ATP to sulfinic acid, using its conserved cysteine as the phosphate carrier. The resulting sulfinic phosphoryl ester is then reduced to cysteine thiol after oxidation of four thiol equivalents (Rhee, Jeong, Chang, & Woo, 2007). Although it is largely unknown whether similar enzyme-mediated reduction of cysteine sulfinic and sulfonic acid in proteins occurs more widely using protein substrates other than Prxs or involves alternative classes of reductases other than Srx, reactivation of enzymes inactivated by cysteine oxidation could be a critical component of a system to regulate their activity.

A recent study on singlet oxygen responses in Chlamydomonas reinhardtii and Arabidopsis thaliana identified a conserved small zinc finger protein, MBS1, that is required for proper ROS signaling in response to oxidative stress caused by singlet oxygen (Shao, Duan, & Bock, 2013). Intriguingly, under high-light stress, which elicits singlet oxygen overproduction by photosystem II, the MBS1-overexpression lines showed a much stronger accumulation of anthocyanins than the wild type, while the mbs1-1 mutant lacked visible accumulation of anthocyanins (Shao et al., 2013). The CHS transcript and protein levels were unaltered in both the mbs1-1 mutant and the 35S:MBS1 lines compared to wild-type (Ning Shao, personal communication). We therefore postulate that the decreased anthocyanin accumulation in the mbs1-1 mutant could be due to CHS inhibition through catalytic cysteine oxidation, as the mbs1-1 mutant may fail to maintain proper redox homeostasis under high-light stress. On the other hand, the enhanced ROS signaling in the MBS1 overexpression lines may prevent CHS from being oxidized, resulting in enhanced flux into flavonoid biosynthesis, and consequently, the hyperaccumulation of anthocyanins. MBS1 may be part of a redox regulatory network that has evolved to modulate flavonoid biosynthesis. We have obtained these mbs1-1 mutant and MBS1 overexpression lines, and they can be crossed with our CHS transgenic lines to examine the differential effect of altered redox signaling on various CHS orthologs, and with tt5 mutant lines to measure flavonoid biosynthetic flux under different redox stress and light conditions.

In this chapter, we aimed to develop transgenic Arabidopsis thaliana lines and various methods to measure CHS activity and CHS redox state in vivo. To measure CHS activity and flavonoid production, we developed a metabolic tracing method using Arabidopsis seedlings grown in d6-p-coumaric acid and measured the production of labeled naringenin using a mutant that accumulates naringenin. We also crossed this mutant with the mbs1-1 mutant and performed metabolic tracing as a preliminary test of whether altered redox regulation affected flavonoid metabolic flux. To measure CHS redox state, we developed a proteomic mass spectrometric method to detect peptides containing different oxidized forms of cysteine, and this method was first tested with heterologously expressed CHS. To develop a purification workflow for isolating CHS from plant tissues, we generated transgenic Arabidopsis lines expressing 3×FLAG-tagged CHS and characterized their CHS gene expression by qRT-PCR. We also piloted a FLAG-based purification and detection method with heterologously expressed 3×FLAG-CHS.

Results

tt5 mutant Arabidopsis thaliana accumulates naringenin and can be used for metabolic tracing to measure CHS activity in vivo

Chalcone isomerase (CHI) is the enzyme immediately downstream of CHS in the flavonoid biosynthetic pathway. CHI catalyzes the intramolecular cyclization of naringenin chalcone to form (2S)-naringenin, which serves as the precursor to all downstream flavonoids (Jez, Bowman, Dixon, & Noel, 2000). In the absence of CHI, naringenin chalcone can spontaneously cyclize in aqueous solution to form a mixture of both enantiomers, (2S)- and (2R)-naringenin.

The tt5 mutant of Arabidopsis thaliana contains a knockout of the CHI gene and consequently lacks flavonoids in all tissues, resulting in the namesake transparent testa phenotype of pale yellow seeds lacking brown colored flavonoids in the seed coat, as well as the lack of hypocotyl flavonoid accumulation (Shirley et al., 1995). tt5 has been reported to accumulate naringenin chalcone (Peer, 2001). We observed an accumulation of naringenin, possibly due to spontaneous cyclization during metabolite extraction and LC/MS analysis. This accumulation serves as a useful readout for CHS activity in vivo.

tt5 Arabidopsis thaliana seedlings were incubated in Murashige-Skoog media containing 100 µM d6-p-coumaric acid for 2 to 24 hours, and the abundances of unlabeled and deuterium-labeled downstream phenylpropanoids and flavonoids were measured. Naringenin showed an increase in label incorporation that plateaued at 8 hours, whereas the endpoint phenylpropanoid sinapoyl malate and the endpoint flavonoid kaempferol 3-O-glucoside-7-O-rhamnoside showed a linear incorporation rate (Figure 1). This suggests that d6-p-coumaric acid tracing could be used as a measurement of flavonoid metabolic flux through CHS.

Figure 1. Incorporation of deuterium labeling into flavonoid (A, B) and phenylpropanoid (C) metabolites over time during Arabidopsis thaliana seedling incubation in d6-p-coumaric acid. The peak areas of the unlabeled (M) and d6- or d4-labeled compounds (M+6 or M+4, respectively) are plotted in blue and red, respectively. The labeled compound peak area as a percentage of the sum of the labeled and unlabeled compound peak areas is plotted in green.
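The labeled percentage plotted in Figure 1 is a simple ratio of peak areas. A minimal Python sketch of that calculation follows; the function name and the peak-area values are hypothetical and serve only to show the arithmetic, not to reproduce measured data.

```python
def percent_labeled(unlabeled_area, labeled_area):
    """Labeled peak area as a percentage of labeled plus unlabeled peak area."""
    total = unlabeled_area + labeled_area
    if total == 0:
        raise ValueError("no signal detected for either isotopologue")
    return 100.0 * labeled_area / total


# Hypothetical time course: hours -> (unlabeled M area, labeled M+4 or M+6 area)
time_course = {2: (900.0, 100.0), 8: (600.0, 400.0), 24: (590.0, 410.0)}
incorporation = {hours: percent_labeled(m, m_label)
                 for hours, (m, m_label) in time_course.items()}
```

A curve that rises and then flattens, as in this made-up example, corresponds to the plateau behavior described for naringenin, whereas a steadily increasing percentage corresponds to the linear incorporation seen for the endpoint metabolites.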

tt5 mutant Arabidopsis thaliana accumulates both enantiomers of naringenin

In addition to the pale seed color common to all transparent testa mutants, tt5 exhibits slightly stunted growth; male sterility; and shortened, curled siliques. This is somewhat surprising because one would expect that naringenin chalcone produced by CHS would spontaneously cyclize to form (2S)-naringenin, which could serve as the precursor to downstream flavonoid biosynthesis. We hypothesized that the accumulation of the incorrect R enantiomer of naringenin may somehow interfere with downstream flavonoid biosynthesis, possibly through competitive inhibition of the biosynthetic enzymes. To examine the metabolic changes that may lead to this phenotype, we prepared metabolic extracts from tt5 leaves and siliques. Chiral chromatography revealed that these tissues indeed accumulate both the R and S enantiomers of naringenin in approximately equal abundance (Figure 2).

Figure 2. Chiral chromatography shows that tt5 Arabidopsis accumulates both enantiomers of naringenin. Metabolic extracts from tt5 Arabidopsis silique and leaf were compared to a racemic naringenin standard. The total ion current (TIC) of the mass window used to detect the [M−H]− ion of naringenin is shown.

Generation of and metabolic tracing with tt5 and mbs1-1 mutant Arabidopsis crosses

After observing that metabolic tracing with tt5 could be used to measure flavonoid metabolic flux in vivo, we wanted to examine whether mutations affecting MBS1-mediated redox homeostasis would alter flavonoid metabolism in a way that could be measured by tracing. An mbs1-1 individual was used as the female and a tt5 individual was used as the male in a genetic cross (mbs1-1 × tt5). The F1 seeds from this cross were collected and allowed to germinate. The F2 seeds were collected and allowed to germinate; notably, only 64 F2 seeds germinated out of 270 total planted. The F2 individuals that exhibited lack of hypocotyl flavonoids and/or yellow seeds were assumed to be tt5/tt5 homozygous, and leaf tissue samples were collected for genotyping the MBS1 locus. Of these, approximately 75% were confirmed to have at least one wild-type copy of MBS1, indicating that they were not homozygous for mbs1-1. Because the genotyping reaction for the T-DNA insertion in mbs1-1 failed to produce a PCR product despite extensive efforts to optimize the reaction, we assumed that the lack of a PCR product in the MBS1 wild-type reaction indicated mbs1-1/mbs1-1 homozygosity. Double homozygous tt5/mbs1-1 lines were carried forward to the F3 generation.
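The genotype frequencies expected in this F2 population can be sketched numerically. The following minimal Python example assumes that tt5 and mbs1-1 behave as unlinked, recessive, single-locus mutations segregating in Mendelian ratios; this is an assumption made for illustration, and the reduced germination noted above is not modeled. Under that assumption, three quarters of tt5/tt5 individuals are expected to carry at least one wild-type MBS1 copy, which is in line with the approximately 75% observed.

```python
from fractions import Fraction
from itertools import product


def f2_genotype_fractions():
    """Expected F2 fractions for two unlinked loci, each segregating 1:2:1."""
    single_locus = {
        "WT/WT": Fraction(1, 4),
        "WT/mut": Fraction(1, 2),
        "mut/mut": Fraction(1, 4),
    }
    return {
        (tt5, mbs1): p_tt5 * p_mbs1
        for (tt5, p_tt5), (mbs1, p_mbs1) in product(
            single_locus.items(), single_locus.items()
        )
    }


f2 = f2_genotype_fractions()

# Fraction of all F2 plants that are homozygous mutant at both loci.
double_homozygous = f2[("mut/mut", "mut/mut")]

# Among tt5/tt5 plants, the fraction with at least one wild-type MBS1 copy.
tt5_homozygous = sum(p for (tt5, _), p in f2.items() if tt5 == "mut/mut")
tt5_with_wt_mbs1 = sum(
    p for (tt5, mbs1), p in f2.items() if tt5 == "mut/mut" and mbs1 != "mut/mut"
)
wt_mbs1_among_tt5 = tt5_with_wt_mbs1 / tt5_homozygous

print(double_homozygous, wt_mbs1_among_tt5)
```

Exact fractions are used rather than floating-point numbers so that the expected ratios can be compared directly with textbook Mendelian values.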

Metabolic tracing with d6-p-coumaric acid was performed with F3 seedlings. Two of the lines, 51 and 54, turned out not to be tt5 homozygous based on their accumulation of hypocotyl and cotyledon flavonoids (Figure 3A and 3B). Only one line, 14, showed the same naringenin accumulation as the tt5 control, whereas the remaining mbs1-1 × tt5 lines failed to accumulate naringenin despite lacking hypocotyl flavonoids (Figure 3C). We also did not observe any difference in labeled or unlabeled naringenin abundance between tt5 and mbs1-1 × tt5 line 14, possibly because the seedlings were grown under normal light conditions rather than high-light stress.

Figure 3. Metabolic tracing in tt5 and mbs1-1 mutant Arabidopsis seedlings. A, Wild-type Arabidopsis seedlings exhibit anthocyanin accumulation in cotyledons, hypocotyl, and seed coat. The other lines with brown seeds (mbs1-1/mbs1-1, tt5/tt5;mbs1-1/mbs1-1 lines 51 and 54) also exhibited this seedling phenotype. B, tt5/tt5;mbs1-1/mbs1-1 line 14 Arabidopsis seedlings exhibit lack of anthocyanin accumulation. The other lines with yellow seeds (tt5/tt5, tt5/tt5;mbs1-1/mbs1-1 lines 28, 33, 55, 13, 42, and 48) also exhibited this seedling phenotype. C, Unlabeled (M) and d6-labeled naringenin content in Arabidopsis lines measured at 2 and 8 hours of incubation in d6-p-coumaric acid. tt5/tt5;mbs1-1/mbs1-1 line 14 shows elevated naringenin accumulation like tt5/tt5.

The catalytic cysteine in AtCHS is more sensitive to in vitro oxidation than that in SmCHS

To develop a method of detecting oxidized cysteine species in CHS isolated from plant tissue samples, we first performed a pilot in vitro oxidation assay with CHS followed by proteomic mass spectrometry. Recombinantly expressed and purified AtCHS and SmCHS were incubated in aqueous buffer with one of the following oxidizing and reducing agents added: 5 mM or 1 mM hydrogen peroxide, 5 mM oxidized glutathione, water (no additional redox agent), 5 mM reduced glutathione, or 5 mM dithiothreitol (DTT). The protein samples were then run on a non-reducing SDS-PAGE to isolate the bands corresponding to CHS, which were then submitted for proteomic mass spectrometric analysis. We searched for the peptides containing the catalytic cysteine and determined the redox modification by the difference in mass from the peptide containing reduced cysteine (Figure 4A and 4B). Peptide abundance was determined using spectral counting.

Both AtCHS and SmCHS contain five non-catalytic cysteine residues located at various positions across the protein. In all ten of these non-catalytic cysteines, the side chains are buried and therefore inaccessible to the solvent. Consistent with this, the non-catalytic cysteines in both AtCHS and SmCHS were detected predominantly in the carbamidomethylated form (>99.9%), indicating that they existed as reduced thiols prior to iodoacetamide treatment. This proportion did not vary across samples incubated in the six different redox conditions.

In contrast, the peptide containing the catalytic cysteine shows large variations in the abundances of cysteine oxidation states across samples incubated in different redox conditions (Figure 4C). Under reducing conditions, the catalytic cysteine exists primarily as a reduced thiol. Under increasingly oxidizing conditions, however, cysteine sulfenic, sulfinic, and sulfonic acids were all detected in increasing proportions. Interestingly, this pilot proteomic experiment comparing recombinant AtCHS and SmCHS indicates that the catalytic cysteine in AtCHS is more sensitive to oxidation than that of SmCHS (Figure 4C). In the most oxidizing condition (5 mM H2O2, condition A in Figure 4C), less than 10% of the catalytic cysteine in AtCHS remained reduced and over 85% was oxidized to sulfonic acid. In contrast, about 30% of the catalytic cysteine in SmCHS remained reduced, together with a substantial portion of sulfinic acid not fully oxidized to sulfonic acid. This is consistent with our previous observations, made by X-ray crystallography and enzyme assays, that the catalytic cysteine is more reactive and nucleophilic in euphyllophyte CHSs than in basal land plant CHSs (Liou, Chiang, Wang, & Weng, 2018).

Figure 4. Proteomic profiling of the redox states of the catalytic cysteine in CHS shows that AtCHS is more sensitive to oxidation in vitro than SmCHS. A, Structures of oxidized and carbamidomethylated cysteine species detected in this experiment. B, A representative MS/MS spectrum showing cysteine sulfonic acid (trioxidation, C+48). Methionine oxidation is also frequently observed throughout all peptide samples. C, Relative quantification of oxidized cysteine species of the catalytic cysteine-containing peptide in AtCHS and SmCHS by spectral counting. The percentage of the total number of spectra identified as each species within each sample condition is shown. Conditions are labeled with letters A through F as follows: A. 5 mM hydrogen peroxide, B. 1 mM hydrogen peroxide, C. 5 mM oxidized glutathione, D. water, E. 5 mM reduced glutathione, and F. 5 mM DTT.

Although the catalytic cysteine in CHS could theoretically be oxidized through forming mixed disulfides with glutathione or DTT in some of the aforementioned incubation conditions, none of these species were detected in our analysis. This suggests that the CHS active site environment may preclude the entry of DTT or glutathione in effective orientations, and/or kinetically favors other oxidative products over mixed disulfide formation.

FLAG-tag purification and western blotting of CHS

To enable proteomic profiling of the catalytic cysteine redox state in vivo, we first attempted to develop a protocol for affinity purification and western blotting of StrepII-tagged CHS from Arabidopsis tissue. We used both Arabidopsis protein extracts and CHS recombinantly expressed in E. coli. We attempted detection with both a StrepTactin-HRP conjugate (StrepTactin is a modified streptavidin protein that binds specifically to the StrepII tag) and an anti-StrepII-tag mouse primary antibody followed by an anti-mouse IgG goat secondary antibody-HRP conjugate, but neither method specifically detected the StrepII-tagged CHS from either Arabidopsis or E. coli sources. We thus decided to try a different protein tag on CHS, an N-terminal 3×FLAG tag.

A 3×FLAG tag sequence was introduced onto the 5′ end of the AtCHS and SmCHS sequences, and the combined 3×FLAG-CHS insert was cloned into an E. coli expression vector that adds an 8×His tag N-terminal to the 3×FLAG sequence, separated from it by a tobacco etch virus (TEV) protease cleavage site. This His tag allows for an independent affinity purification method to first purify 3×FLAG-CHS for pilot testing of FLAG affinity purification and anti-FLAG western blotting. The presence of both His and FLAG tags could also be useful for tandem affinity purification from plant tissue.

3×FLAG-tagged AtCHS and SmCHS were expressed in BL21 E. coli. Cell lysis was performed as in our standard protein purification protocol (Liou et al., 2018). TEV protease digestion to remove the His tag was performed on half of the cleared protein lysate in order to assess whether the His tag would interfere with FLAG purification or western blotting. FLAG purification was then performed using ANTI-FLAG M2 magnetic beads following the manufacturer’s instructions (Sigma-Aldrich). SDS-PAGE of the bead-purified CHS samples showed approximately equal enrichment of the TEV-treated and untreated CHS, indicating that the His tag does not interfere with the FLAG tag for purification of CHS (Figure 5A). Proteomic mass spectrometric analysis of the excised gel pieces also confirmed the presence of AtCHS and SmCHS in their respective protein bands (data not shown).

Anti-FLAG western blotting was then performed on the purified 3×FLAG-CHS samples, as well as samples collected from various steps in the purification process. A monoclonal ANTI-FLAG M2 mouse antibody (Sigma-Aldrich) was used as the primary antibody, and an anti-mouse IgG goat antibody-HRP conjugate was used as the secondary antibody. Chemiluminescence detection was performed using the Pierce ECL substrate (Thermo Fisher).

Figure 5. Recombinantly expressed 3×FLAG-tagged CHS was enriched after ANTI-FLAG M2 magnetic bead purification. A, SDS-PAGE of E. coli cell lysate and elution from ANTI-FLAG M2 magnetic beads for 3×FLAG-AtCHS and 3×FLAG-SmCHS, either untreated or digested by TEV. B, Ponceau S-stained membrane of E. coli cell lysate, ANTI-FLAG M2 magnetic bead binding supernatant, and elution from beads for 3×FLAG-AtCHS and 3×FLAG-SmCHS, either untreated or digested by TEV. C, Western blot of the membrane in B, using an ANTI-FLAG M2 mouse primary antibody and goat anti-mouse secondary antibody conjugated to HRP, visualized by chemiluminescence.

Although some nonspecific blotting was observed in the E. coli lysate and ANTI-FLAG bead supernatant samples, the bead elution samples show mostly specific detection of 3×FLAG-CHS, indicating that ANTI-FLAG purification did indeed enrich for 3×FLAG-CHS, and that the ANTI-FLAG M2 antibody is suitable for western blot detection (Figure 5B and 5C).

Generation and characterization of transgenic Arabidopsis thaliana lines expressing FLAG-tagged CHS orthologs

We have previously generated transgenic Arabidopsis thaliana lines containing C-terminal StrepII-tagged CHS orthologs from different species, under the control of the Arabidopsis CHS promoter in the CHS-null mutant tt4-2 background. These lines were used to observe the capability of different CHS orthologs to complement in vivo the flavonoid-deficient phenotypes of tt4-2 (Chapter 2, Figure S3).

Because the FLAG tag-based purification and western blotting worked better than the StrepII tag-based system, we generated new transgenic Arabidopsis lines containing N-terminally 3×FLAG-tagged AtCHS or SmCHS under the control of the Arabidopsis CHS promoter in the tt4-2 background using Agrobacterium tumefaciens floral dipping. For each construct, 16 to 20 T1 individuals were selected based on hypocotyl purple coloration, indicating flavonoid accumulation and complementation of the CHS-null phenotype. Seeds and leaf tissue were collected from these individuals for propagation and gene expression analysis, respectively.

CHS gene expression analysis was performed by quantitative RT-PCR (qRT-PCR), with At1g13320 as a reference gene control. New gene-specific primer pairs were designed for both AtCHS and SmCHS, with efficiencies of 50.52% and 45.87%, respectively (Figure 6A). Although their amplification efficiencies are quite low, the primers can still be used to measure the relative expression levels of CHS in the different independent lines. The T1 individuals for both constructs exhibited a wide range of CHS expression levels (Figure 6B).

The T2 seeds from the T1 lines were planted and allowed to germinate, then subjected to Basta selection. The percentage of T2 seedlings surviving Basta selection was counted for each T1 line, and those with close to 75% survival were taken forward as single-insertion lines. T3 seeds were planted and subjected to Basta selection, and 100% seedling survival was taken as confirmation that the T2 parent was homozygous. These lines were further carried forward into the T4 generation, but some were discarded after a narrow, curled leaf phenotype emerged. The remaining T4 individuals had their CHS expression characterized by qRT-PCR. Two lines each for AtCHS and SmCHS were carried forward based on the consistency of CHS expression between T4 individuals and their overall health and morphology (Figure 6C).
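One possible way to formalize "close to 75% survival" is a chi-square goodness-of-fit test against the 3:1 resistant-to-sensitive ratio expected for a single dominant insertion. The sketch below is illustrative only: the text does not state that a statistical test was applied, the function name is hypothetical, and the seedling counts are invented for the example.

```python
import math


def chi_square_3_to_1(resistant, sensitive):
    """Chi-square goodness-of-fit test (1 degree of freedom) against a 3:1 ratio."""
    total = resistant + sensitive
    expected_resistant = 0.75 * total
    expected_sensitive = 0.25 * total
    chi2 = (
        (resistant - expected_resistant) ** 2 / expected_resistant
        + (sensitive - expected_sensitive) ** 2 / expected_sensitive
    )
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only (not data from this study).
chi2, p_value = chi_square_3_to_1(resistant=74, sensitive=26)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A large p-value means the counts give no evidence against single-locus segregation; it does not by itself prove a single insertion, since tightly linked insertions would segregate the same way.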

Figure 6. Quantitative RT-PCR expression measurement of 3×FLAG-CHS transgenic Arabidopsis lines. A, Standard curves for qRT-PCR on serial dilutions of a mix of T1 AtCHS or SmCHS cDNA samples. The primer efficiencies were calculated to be 50.52% and 45.87%, respectively. B, CHS expression levels of T1 plants, relative to an average of 3 control plants, showing a wide range of expression levels among independent transformants. C, CHS expression levels of T4 plants, relative to their respective T1 plants. Two T4 individuals from each of two or three different T3 individuals were tested for each T1 line. Most pairs of T4 individuals show similar expression levels.

Discussion and Future Directions

We have developed several transgenic lines and experimental methods to examine the in vivo activity and oxidation of CHS. Metabolic tracing with d6-p-coumaric acid in tt5 Arabidopsis showed linear accumulation of deuterium-labeled naringenin. This system could be used to compare Arabidopsis tt4 mutants complemented with CHS orthologs from different plant lineages to test whether the in vitro biochemical differences we observed previously also affect their in vivo activities. The mbs1-1 × tt5 line can be used to examine the effect of redox homeostasis on CHS activity and flavonoid biosynthesis. The various CHS transgenic lines could be crossed with tt5, and the plants could be grown under different light cycle or oxidative stress conditions to see whether their flavonoid biosynthetic flux is affected.

Like all biological processes, flavonoid metabolism is subject to regulation at many different levels. To observe changes in transcriptional regulation, we developed qRT-PCR primers to measure CHS expression, although their efficiencies could be improved by using a different method to design them, such as QuantPrime. We also generated multiple independent lines for each 3×FLAG-CHS ortholog to examine the effect of expression level on flavonoid biosynthesis, since the transgene is inserted at a random genomic position in each line and its transcription may be affected by the surrounding genomic sequence.

To observe the effect of post-translational oxidative modifications on cysteine, we aimed to develop a method of isolating CHS and measuring the different cysteine oxidation states. We were able to detect peptides containing these cysteine residues that were oxidized in an in vitro redox treatment. We have also shown preliminarily, using protein heterologously expressed in E. coli, that 3×FLAG-tagged CHS can be isolated and detected specifically, although this protocol has yet to be tested with plant tissue.

Our proteomic method can be further optimized to facilitate quantitative profiling of cysteine oxidation in CHSs in vivo. For example, dimedone (5,5-dimethyl-1,3-cyclohexanedione) has been used to derivatize cysteine sulfenic acid, which is the least stable cysteine oxidative product (Nelson et al., 2010). Our pilot experiment did not involve derivatization of any of the oxidized cysteine states, so an unstable state like cysteine sulfenic acid may have been underrepresented in our proteomic quantitation data. Addition of dimedone as soon as possible during the process of protein isolation from plant tissue could help us better quantify the redox state of CHS in vivo.

In plant cells, ROS are produced in the photosynthetic reaction centers of the chloroplast in excess light conditions. The chloroplast contains redox-active enzymes and small molecules to quench ROS, but some species like H2O2 can diffuse long distances out of the chloroplast (Asada, 2006). ROS are also produced as primary signaling molecules in response to pathogens and other biotic and abiotic stresses (Møller, Jensen, & Hansson, 2007). Flavonoids likely serve as a second line of ROS scavengers in plants, and flavonoid biosynthesis increases in oxidative stress conditions (Fini, Brunetti, Di Ferdinando, Ferrini, & Tattini, 2011). Although it may seem counterintuitive that euphyllophyte CHSs have evolved to become more sensitive to oxidation, this may be an unavoidable consequence of selection for higher enzyme activity through increased reactivity of the catalytic cysteine. These plants may have instead evolved other regulatory systems to subsequently increase the expression of CHS and other flavonoid biosynthetic genes to compensate for an initial inactivation by ROS.

Materials and Methods

Arabidopsis thaliana metabolic tracing

1 mg of tt5 Arabidopsis thaliana seeds was plated on top of a round piece of filter paper on a Murashige-Skoog (Caisson Labs) 1% agarose plate (6 cm diameter). Seeds were cold-stratified at 4 °C for 96 hours, then transferred to a plant growth chamber set to 25 °C with a 16 hour light and 8 hour dark cycle. After 5 days, the filter paper with germinated seedlings was removed and placed into a new petri dish (6 cm diameter) with 1 mL of liquid Murashige-Skoog media with 100 µM d6-p-coumaric acid and incubated in the growth chamber at 25 °C. Samples were taken at 2, 4, 8, and 24 hours of incubation by scraping the seedlings off the filter paper and patting them dry with a Kimwipe. Seedlings were then placed in a microcentrifuge tube, weighed, flash frozen in liquid nitrogen, and stored at -80 °C. Each seedling sample had a fresh weight of about 30 mg on average.

Metabolite extraction was performed by adding 5 µL of 50% methanol per mg of seedling tissue, then incubating for 2 hours at 50 °C. Samples were centrifuged at 21,000 × g for 15 min, and the supernatant was taken for analysis by liquid chromatography−high resolution mass spectrometry (LC−HRMS). LC was performed on a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific) using water with 0.1% formic acid (Solvent A) and acetonitrile with 0.1% formic acid (Solvent B) and a Kinetex 2.6 μm C18 100 Å column (Phenomenex) at 30 °C. The elution gradient was 5% B for 2 min, 5−95% B for 23 min, 95% B for 3 min, 95−5% B for 0.1 min, and 5% B for 2.9 min, at a flow rate of 0.8 mL/min. Compounds were detected on a high-resolution Q-Exactive benchtop Orbitrap mass spectrometer (Thermo Fisher Scientific) using a full scan range of 100−1250 m/z in negative ionization mode. The amount of label incorporation for each metabolite was calculated as the peak area of the deuterium-labeled metabolite (M+6 for naringenin; M+4 for sinapoyl malate and kaempferol 3-O-glucoside-7-O-rhamnoside) divided by the sum of the peak areas of the deuterium-labeled and unlabeled compounds.
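The label incorporation calculation described above can be written as a short function. This is a minimal sketch; the function name is hypothetical and the peak areas are invented for illustration, not taken from the data in this study.

```python
def label_incorporation(labeled_area, unlabeled_area):
    """Fraction of a metabolite pool that carries the deuterium label.

    labeled_area: peak area of the labeled ion (M+6 or M+4)
    unlabeled_area: peak area of the unlabeled ion (M)
    """
    total = labeled_area + unlabeled_area
    if total <= 0:
        raise ValueError("Total peak area must be positive")
    return labeled_area / total


# Hypothetical peak areas, for illustration only.
fraction = label_incorporation(labeled_area=2.0e6, unlabeled_area=6.0e6)
print(f"{fraction:.1%} labeled")
```

Multiplying the returned fraction by 100 gives the percentage plotted in green in Figure 1.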

Chiral chromatographic analysis of tt5 metabolic extracts

Approximately 10 mg each of tt5 leaf and silique tissue were incubated in 1 mL 50% methanol at 50 °C for 2 hours. The supernatant was taken for chiral LC−MS analysis. LC was performed on a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific) using water with 0.1% formic acid (Solvent A) and acetonitrile with 0.1% formic acid (Solvent B) and a Lux 3 µm Cellulose-4 column (Phenomenex) at 30 °C. The elution gradient was 5% B for 2 min, 5−95% B for 23 min, 95% B for 3 min, 95−5% B for 0.1 min, and 5% B for 2.9 min, at a flow rate of 0.8 mL/min. MS was performed on a TSQ Quantum Access MAX mass spectrometer (Thermo Fisher Scientific) using a full scan range of 100−800 m/z in negative ionization mode.
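The elution gradient used in both LC methods can be expressed as a small lookup that returns the Solvent B percentage at a given run time. This sketch assumes that each ramp is linear, which the text does not state explicitly, and the names used are hypothetical.

```python
# Each segment: (duration in min, starting %B, ending %B), as listed in the methods.
GRADIENT = [
    (2.0, 5.0, 5.0),
    (23.0, 5.0, 95.0),
    (3.0, 95.0, 95.0),
    (0.1, 95.0, 5.0),
    (2.9, 5.0, 5.0),
]


def percent_b(time_min):
    """Solvent B percentage at a given run time, assuming linear ramps."""
    if time_min < 0:
        raise ValueError("Time must be non-negative")
    elapsed = 0.0
    for duration, start, end in GRADIENT:
        if time_min <= elapsed + duration:
            fraction_done = (time_min - elapsed) / duration
            return start + fraction_done * (end - start)
        elapsed += duration
    raise ValueError("Time is beyond the end of the gradient")


total_run_time = sum(duration for duration, _, _ in GRADIENT)
print(percent_b(13.5), total_run_time)
```

Writing the gradient as data makes it easy to confirm the total run time of about 31 min and to check that the two LC methods share the same program.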

Genotyping

Two to three cauline leaves from each plant were placed in a microcentrifuge tube and ground with a pestle. 500 µL of extraction buffer (200 mM Tris-HCl pH 8, 250 mM NaCl, 25 mM EDTA, 0.5% SDS) was added, and the tube was vortexed. The sample was incubated at 50 °C for 10 min, vortexed, and centrifuged at max speed for 10 min in a tabletop centrifuge at room temperature. 300 µL of the supernatant was transferred to a new tube containing 300 µL isopropanol, and the tube was vortexed at low speed, then placed at -20 °C for 15 to 30 min. The samples were centrifuged at max speed for 10 min in a tabletop centrifuge at room temperature, and the supernatant was removed using a vacuum line. 500 µL of ice-cold 70% ethanol was added, and the sample was centrifuged at max speed for 10 min at 4 °C. The supernatant was removed using a vacuum line, and the pellet was dried on the benchtop or in a speed vac. The pellet was resuspended in 100 µL of TE Buffer (10 mM Tris pH 7, 1 mM EDTA) by pipetting and centrifuged at max speed for 1 min at 4 °C. The supernatant was transferred to a new tube, and 1 µL was used as a template for genotyping PCR. Genotyping primers were designed using the SIGnAL Salk T-DNA primer design tool (http://signal.salk.edu/tdnaprimers.2.html) (Table 1).

CHS in vitro redox treatment and proteomic mass spectrometry

About 10 µg of purified recombinant AtCHS or SmCHS protein was incubated in an assay buffer [6.25 mM Tris pH 8, 25 mM NaCl, and 0.2 mM dithiothreitol (DTT)], supplemented with one of the following six redox conditions: A. 5 mM hydrogen peroxide, B. 1 mM hydrogen peroxide, C. 5 mM oxidized glutathione, D. water (no additional redox agent), E. 5 mM reduced glutathione, and F. 5 mM DTT. After incubation for 15 min at room temperature, 10 µL of 2× non-reducing SDS sample buffer (250 mM Tris-HCl pH 6.8, 8% SDS, 40% glycerol, 0.02% bromophenol blue) was added to 10 µL of the protein samples. Samples were run on 12% SDS-PAGE. The gel was fixed for 15 min in 40% methanol/10% acetic acid, stained with BioSafe Coomassie (Bio-Rad), and destained with distilled water overnight. Gel bands were cut out with a razor blade and submerged in 100 µL 50% methanol. After dehydration, the samples were incubated in 100 mM iodoacetamide for 30 min at room temperature to label reduced thiols, without the usual previous step of adding DTT to reduce disulfide bonds. In-gel digestion with trypsin was performed overnight at 37 °C. Samples were then analyzed on a Thermo Fisher Orbitrap Elite hybrid ion trap-orbitrap mass spectrometer. The MS/MS spectra were compared using the Mascot search engine (Matrix Science) against computationally generated MS/MS spectra of simulated trypsinized peptides from AtCHS and SmCHS. Modifications were searched for by mass additions to the mass of whole peptides. The mass differences of the modified cysteine species relative to unmodified cysteine are listed as follows:

Modification                        Exact mass    Mass difference vs. Cys (121.02)
Carbamidomethyl (labeled thiol)     178.04        57.02
Sulfenic acid (monooxidation)       137.01        15.99
Sulfinic acid (dioxidation)         153.01        31.99
Sulfonic acid (trioxidation)        169.00        47.98
Glutathione (mixed disulfide)       426.09        305.07
DTT (mixed disulfide)               273.02        152.00
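The mass differences in this table can be recomputed from the exact masses, and an observed mass shift can be matched back to a modification. The sketch below is a minimal illustration and not the search procedure used by Mascot or Scaffold; the function names and the 0.02 matching tolerance are assumptions chosen for the example.

```python
CYS_MASS = 121.02  # unmodified cysteine, as given in the table

# Exact masses of the modified cysteine species, as given in the table.
MODIFIED_CYS_MASS = {
    "carbamidomethyl": 178.04,
    "sulfenic acid": 137.01,
    "sulfinic acid": 153.01,
    "sulfonic acid": 169.00,
    "glutathione mixed disulfide": 426.09,
    "DTT mixed disulfide": 273.02,
}


def mass_shifts():
    """Mass difference of each modified species relative to unmodified cysteine."""
    return {
        name: round(mass - CYS_MASS, 2) for name, mass in MODIFIED_CYS_MASS.items()
    }


def assign_modification(observed_shift, tolerance=0.02):
    """Return the modification whose shift matches within a hypothetical tolerance."""
    for name, shift in mass_shifts().items():
        if abs(observed_shift - shift) <= tolerance:
            return name
    return None


print(mass_shifts())
print(assign_modification(47.98))
```

In practice the acceptable tolerance depends on the mass accuracy of the instrument and on the search engine settings, so the value here should be treated as a placeholder.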

Data were analyzed using Scaffold (Proteome Software) and custom Python scripts. The total number of peptides carrying each modification was determined by spectral counting.
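Relative quantification by spectral counting, as plotted in Figure 4C, amounts to counting the spectra assigned to each cysteine species and dividing by the total for that sample. The following sketch illustrates the idea; it is not the custom script used in this study, and the function name and input list are hypothetical.

```python
from collections import Counter


def species_percentages(spectrum_assignments):
    """Percentage of spectra assigned to each cysteine species in one sample."""
    counts = Counter(spectrum_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: 100.0 * n / total for species, n in counts.items()}


# Hypothetical assignments for one sample, for illustration only.
assignments = ["reduced"] * 1 + ["sulfonic acid"] * 3
percentages = species_percentages(assignments)
print(percentages)
```

Because spectral counts are small integers, percentages from samples with few identified spectra carry large uncertainty and are best read as semi-quantitative.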

Cloning, expression, and purification of 3×FLAG-CHS

The 3×FLAG sequence was synthesized as a gBlocks® gene fragment (Integrated DNA Technologies). The fragment was cloned together with AtCHS or SmCHS into NcoI-digested pHis8-4 using Gibson assembly. Some synonymous mutations were introduced into AtCHS by PCR to fix a mis-annealing error encountered in Gibson assembly: in the AtCHS coding sequence, they are T15A, T18A, T19A, and C20G. Proteins were expressed in the BL21(DE3) E. coli strain cultivated in terrific broth (TB) and induced with 0.1 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) overnight at 18 °C. E. coli cells were harvested by centrifugation, resuspended in 150 mL lysis buffer (50 mM Tris pH 8.0, 500 mM NaCl, 30 mM imidazole, 5 mM DTT), and lysed with five passes through an M-110L microfluidizer (Microfluidics). The resulting crude protein lysate was clarified by centrifugation (19,000 × g, 1 h). TEV digestion was performed on half of the cleared lysate overnight at 4 °C.

ANTI-FLAG M2 magnetic bead purification was performed according to the manufacturer’s instructions for the batch format and elution with FLAG peptide (Sigma-Aldrich). Proteomic mass spectrometry to identify AtCHS and SmCHS from SDS-PAGE bands was performed as described above.

Western blotting of 3×FLAG-CHS

Tris-glycine SDS-PAGE was performed with the cleared E. coli lysate (1 µL), supernatant after ANTI-FLAG M2 bead binding (1 µL), and the bead elution (7.5 µL) for AtCHS and SmCHS, untreated or TEV cleaved. Electrophoretic transfer was performed using the Bio-Rad Criterion Blotter system with a transfer buffer consisting of 25 mM Tris, 192 mM glycine, and 20% analytical grade methanol adjusted to pH 8.3, at a constant current of 300 mA for 3 hours. Ponceau S staining was performed on the membrane for 5 min, followed by destaining with distilled water. Blocking was performed with 3% nonfat dry milk in TBS. Primary blotting was performed with the ANTI-FLAG M2 antibody (Sigma-Aldrich, Lot number SLBJ7864V) at a concentration of 1 µg/mL at room temperature for 30 min. The membrane was washed with TBS, and then secondary blotting was performed with a goat anti-mouse IgG antibody-HRP conjugate (Sigma-Aldrich) at 1:10,000 dilution with 3% nonfat dry milk. Chemiluminescence detection was performed with the Pierce ECL substrate (Thermo Fisher).

Generation of transgenic Arabidopsis lines

The AtCHS promoter (defined as 1328 bp of sequence upstream of the CHS transcription start site) was amplified via PCR from Arabidopsis genomic DNA, digested with HindIII and XhoI, and ligated into HindIII- and XhoI-digested pCC 1136, a promoterless Gateway cloning binary vector containing a BAR resistance gene marker, to generate pJKW 0152. The 3×FLAG-AtCHS and SmCHS ORFs were then PCR amplified from previously generated plasmid constructs and cloned into pCC 1155, an ampicillin-resistant version of the pDONR221 Gateway cloning vector, with BP clonase in the Gateway cloning method (Thermo Fisher). The resulting vectors were recombined with pJKW 0152 using LR clonase in the Gateway cloning method to generate the final binary constructs. Agrobacterium tumefaciens-mediated transformation of Arabidopsis was performed using the floral dipping method (19).

Basta selection was performed on T1 seedlings. T1 individuals exhibiting purple hypocotyl coloration were chosen to carry forward for gene expression analysis and seed collection: 16 for AtCHS, 20 for SmCHS. Among each set of transformants, 2 or 3 individuals that did not exhibit purple hypocotyl coloration were chosen as control plants for gene expression analysis.

Seed planting and Basta selection were performed for subsequent generations. The percentage of resistant T2 seedlings was counted for each T1 line, and those with close to 75% survival were taken forward as single-insertion lines. T3 seed batches with 100% seedling survival were taken as confirmation that the T2 parent was homozygous.

Quantitative RT-PCR

Total RNA was extracted using a Qiagen RNeasy Plant Mini Kit according to the manufacturer’s instructions, with on-column DNase treatment. The concentration and purity of RNA were determined by absorbance at 260 and 280 nm. First-strand cDNA was synthesized from 1 µg of RNA using SuperScript III Reverse Transcriptase with Oligo dT primers (Thermo Fisher). Reactions were run on a QuantStudio 6 system (Thermo Fisher) using SYBR Green Master Mix (Thermo Fisher) and primers listed in Table 1. Gene expression values were calculated using CT values and normalized using the reference gene At1g13320.
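The text does not specify which formula was used to convert CT values into expression levels. As an illustration, the sketch below shows two standard calculations: primer efficiency from the slope of a standard curve (CT against log10 of template amount), and an efficiency-corrected relative expression ratio normalized to a reference gene. The choice of an efficiency-corrected ratio, the function names, the CT values, and the reference-gene amplification factor are all assumptions made for this example.

```python
def amplification_factor(slope):
    """Per-cycle amplification factor from a standard-curve slope."""
    return 10 ** (-1.0 / slope)


def efficiency_percent(slope):
    """Efficiency in percent, where 100% corresponds to doubling every cycle."""
    return (amplification_factor(slope) - 1.0) * 100.0


def relative_expression(ct_target_sample, ct_target_control,
                        ct_ref_sample, ct_ref_control,
                        factor_target=2.0, factor_ref=2.0):
    """Efficiency-corrected expression of a sample relative to a control."""
    target_ratio = factor_target ** (ct_target_control - ct_target_sample)
    ref_ratio = factor_ref ** (ct_ref_control - ct_ref_sample)
    return target_ratio / ref_ratio


# Hypothetical CT values, for illustration only (not data from this study).
# 1.5052 is the amplification factor implied by the reported 50.52% efficiency
# of the AtCHS primers; the reference-gene factor of 2.0 is an assumption.
example = relative_expression(
    ct_target_sample=24.0, ct_target_control=27.0,
    ct_ref_sample=20.0, ct_ref_control=20.0,
    factor_target=1.5052, factor_ref=2.0,
)
print(efficiency_percent(-3.32), example)
```

With low-efficiency primers, assuming perfect doubling would overstate fold changes, which is why the amplification factor is passed in explicitly here rather than fixed at 2.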

Table 1. Primers used in this study.

Name       Sequence (5′ to 3′)                    Purpose
GL0039     TGACTGGAACTCCCTCTTCT                   AtCHS qRT-PCR forward
GL0049     GCCCTCATCTTCTCTTCCTTTAG                AtCHS qRT-PCR reverse
GL0049     CTCTCATCATCGGCTCCAATC                  SmCHS qRT-PCR forward
GL0050     TCCCAGAATTGCTCCATCAC                   SmCHS qRT-PCR reverse
GL0053     TTCCATTTTCTCACCGACCAA                  MBS1 genotyping LP
GL0054     TTCTTCAAGCTTCCCCTGAT                   MBS1 genotyping RP
JKW0054    GCCTTTTCAGAAATGGATAAATAGCCTTGCTTCC     MBS1 genotyping BP (SAIL LB1)
JKW0444    TAACGTGGCCAAAATGATGC                   At1g13320 genotyping LP and qRT-PCR
JKW0445    GTTCTCCACAACCGCTTGGT                   At1g13320 genotyping RP and qRT-PCR

References

Asada, K. (2006). Production and scavenging of reactive oxygen species in chloroplasts and their functions. Plant Physiology, 141(2), 391–396.
Basu, M. K., & Koonin, E. V. (2005). Evolution of Eukaryotic Cysteine Sulfinic Acid Reductase, Sulfiredoxin (Srx), from Bacterial Chromosome Partitioning Protein ParB. Cell Cycle. https://doi.org/10.4161/cc.4.7.1786
Brandes, N., Schmitt, S., & Jakob, U. (2009). Thiol-based redox switches in eukaryotic proteins. Antioxidants & Redox Signaling, 11(5), 997–1014.
Butterfield, D. A., Hardas, S. S., & Bader Lange, M. L. (2010). Oxidatively Modified Glyceraldehyde-3-Phosphate Dehydrogenase (GAPDH) and Alzheimer’s Disease: Many Pathways to Neurodegeneration. Journal of Alzheimer’s Disease. https://doi.org/10.3233/jad-2010-1375
D’Autréaux, B., & Toledano, M. B. (2007). ROS as signalling molecules: mechanisms that generate specificity in ROS homeostasis. Nature Reviews. Molecular Cell Biology, 8(10), 813–824.
Denu, J. M., & Tanner, K. G. (1998). Specific and reversible inactivation of protein tyrosine phosphatases by hydrogen peroxide: evidence for a sulfenic acid intermediate and implications for redox regulation. Biochemistry, 37(16), 5633–5642.
Fini, A., Brunetti, C., Di Ferdinando, M., Ferrini, F., & Tattini, M. (2011). Stress-induced flavonoid biosynthesis and the antioxidant machinery of plants. Plant Signaling & Behavior. https://doi.org/10.4161/psb.6.5.15069
Gupta, R., & Luan, S. (2003). Redox control of protein tyrosine phosphatases and mitogen-activated protein kinases in plants. Plant Physiology, 132(3), 1149–1152.
Ishii, T., Sunami, O., Nakajima, H., Nishio, H., Takeuchi, T., & Hata, F. (1999). Critical role of sulfenic acid formation of thiols in the inactivation of glyceraldehyde-3-phosphate dehydrogenase by nitric oxide. Biochemical Pharmacology, 58(1), 133–143.
Jez, J. M., Bowman, M. E., Dixon, R. A., & Noel, J. P. (2000). Structure and mechanism of the evolutionarily unique plant enzyme chalcone isomerase. Nature Structural Biology, 7(9), 786–791.
Kettenhofen, N. J., & Wood, M. J. (2010). Formation, reactivity, and detection of protein sulfenic acids. Chemical Research in Toxicology, 23(11), 1633–1646.
Liou, G., Chiang, Y.-C., Wang, Y., & Weng, J.-K. (2018). Mechanistic basis for the evolution of chalcone synthase catalytic cysteine reactivity in land plants. Journal of Biological Chemistry, 293, 18601–18612.
Møller, I. M., Jensen, P. E., & Hansson, A. (2007). Oxidative modifications to cellular components in plants. Annual Review of Plant Biology, 58, 459–481.
Nelson, K. J., Klomsiri, C., Codreanu, S. G., Soito, L., Liebler, D. C., Rogers, L. C., … Poole, L. B. (2010). Use of dimedone-based chemical probes for sulfenic acid detection methods to visualize and identify labeled proteins. Methods in Enzymology, 473, 95–115.
Paulsen, C. E., & Carroll, K. S. (2010). Orchestrating redox signaling networks through regulatory cysteine switches. ACS Chemical Biology, 5(1), 47–62.
Peer, W. A. (2001). Flavonoid Accumulation Patterns of Transparent Testa Mutants of Arabidopsis. Plant Physiology. https://doi.org/10.1104/pp.126.2.536
Rhee, S. G., Jeong, W., Chang, T.-S., & Woo, H. A. (2007). Sulfiredoxin, the cysteine sulfinic acid reductase specific to 2-Cys peroxiredoxin: its discovery, mechanism of action, and biological significance. Kidney International. Supplement, (106), S3–S8.
Schmalhausen, E. V., Nagradova, N. K., Boschi-Muller, S., Branlant, G., & Muronetz, V. I. (1999). Mildly oxidized GAPDH: the coupling of the dehydrogenase and acyl phosphatase activities. FEBS Letters, 452(3), 219–222.
Seo, Y. H., & Carroll, K. S. (2011). Quantification of protein sulfenic acid modifications using isotope-coded dimedone and iododimedone. Angewandte Chemie, International Edition, 50(6), 1342–1345.
Shao, N., Duan, G. Y., & Bock, R. (2013). A mediator of singlet oxygen responses in Chlamydomonas reinhardtii and Arabidopsis identified by a luciferase-based genetic screen in algal cells. The Plant Cell, 25(10), 4209–4226.
Shirley, B. W., Kubasek, W. L., Storz, G., Bruggemann, E., Koornneef, M., Ausubel, F. M., & Goodman, H. M. (1995). Analysis of Arabidopsis mutants deficient in flavonoid biosynthesis. The Plant Journal: For Cell and Molecular Biology, 8(5), 659–671.
van Montfort, R. L. M., Congreve, M., Tisi, D., Carr, R., & Jhoti, H. (2003). Oxidation state of the active-site cysteine in protein tyrosine phosphatase 1B. Nature, 423(6941), 773–777.

Appendix

Investigation of galloylated catechin biosynthetic enzymes in tea

Authors
Geoffrey Liou¹,² and Jing-Ke Weng¹,²

Author Affiliations
1. Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2. Whitehead Institute for Biomedical Research, Cambridge, MA 02142, USA

Abstract

Galloylated catechins, such as epigallocatechin gallate (EGCG), are highly abundant flavan-3-ols found in tea, Camellia sinensis. They function in pathogen defense in the tea plant, and the health benefits of tea consumption, such as antioxidative and anti-inflammatory effects, are attributed to EGCG and related flavonoid compounds. The final step of galloylated catechin biosynthesis, the addition of the galloyl moiety onto unmodified flavan-3-ols, was recently shown to involve two enzymatic steps: UDP-glucosyl:galloyl-1-O-β-D-glucosyltransferase (UGGT) and epicatechin:galloyl-1-O-β-D-glucose O-galloyltransferase (ECGT). The gene encoding the enzyme that performs the UGGT step has been identified and biochemically characterized, but the gene for the ECGT step has not. In this chapter, we investigated candidate genes from C. sinensis and developed a heterologous expression system in Nicotiana benthamiana to test for ECGT activity both in vivo and in vitro.

Introduction

Tea is the second most widely consumed beverage worldwide, surpassed only by water (Brody, 2019). It is prepared from the leaves of two varieties of Camellia sinensis, var. sinensis and var. assamica. Tea has been consumed for around 4000 years, originating in China, where it has been valued for its stimulative properties and other health benefits for millennia. Modern research has begun to shed light on the metabolites in tea that may contribute to these traditionally attested health effects, such as stress relief and improved memory.

The stimulative effect of tea is well known, as is the mechanism of caffeine, the compound responsible for it. Caffeine, which functions in plants as a toxic deterrent to insects and other herbivores, inhibits the adenosine A2A receptor in humans (Huang et al., 2005). Adenosine promotes sleep in the sleep-wake cycle, so caffeine promotes wakefulness by acting as an antagonist of this receptor. Consumption of caffeinated beverages has also been associated with reduced risk of Parkinson’s disease (Pluskal & Weng, 2018). Some of the beneficial effects of tea may result from the complex mixture of natural products found in the plant and the prepared beverage. Theanine, an amino acid, is highly abundant in tea and contributes to its umami flavor. It has been shown to improve memory and reaction time in combination with caffeine, more so than either theanine or caffeine alone (Haskell, Kennedy, Milne, Wesnes, & Scholey, 2008).

Flavanols are an extremely abundant class of polyphenols in tea, making up 25% of the dry weight of tea leaves (Balentine, Wiseman, & Bouwens, 1997). The majority of these flavanols are catechins, a subclass of flavan-3-ols that have di- or trihydroxy substitution on the B ring and dihydroxy substitution on the A ring. Tea contains mostly (−)-epicatechin (EC), (−)-epicatechin gallate (ECG), (−)-epigallocatechin (EGC), and (−)-epigallocatechin gallate (EGCG). Catechins, particularly EGCG, have been of interest for many potential health benefits in humans, such as antioxidant, anticancer, and anti-inflammatory properties (Cabrera, Artacho, & Giménez, 2006).

The biosynthetic pathways of the non-galloylated catechins epicatechin and epigallocatechin are well understood. The flavan-3,4-diols leucocyanidin and leucodelphinidin are converted by anthocyanidin synthase (ANS) to cyanidin and delphinidin, respectively. These are subsequently reduced by anthocyanidin reductase to form EC and EGC, respectively (Punyasiri et al., 2004). The enzymatic activities involved in galloylation of EC and EGC remained unknown for a long time, until they were elucidated by activity-guided fractionation of C. sinensis crude protein extract (Liu et al., 2012). First, a UDP-glucosyl:galloyl-1-O-β-D-glucosyltransferase (UGGT) step uses gallic acid and UDP-glucose to form β-glucogallin. This compound serves as an activated galloyl donor for the galloylation of a variety of phenolic compounds, such as gallotannins in Quercus robur (pedunculate oak) (Mittasch, Böttcher, Frolova, Bönn, & Milkowski, 2014). The second enzymatic step, epicatechin:galloyl-1-O-β-D-glucose O-galloyltransferase (ECGT), uses β-glucogallin and EC or EGC to form their respective galloylated versions, ECG and EGCG.

The UGGT gene was later identified as CsUGT84A22 (Cui et al., 2016), but the ECGT gene has yet to be completely characterized. The original study that identified the two enzymatic activities from tea protein extract reported several properties of the ECGT enzyme suggesting that it belongs to the serine carboxypeptidase-like (SCPL) family (Liu et al., 2012). SDS-PAGE analysis of ECGT activity-containing fractions indicated that the basic ECGT unit is a 58 or 60 kDa heterodimeric protein consisting of a 34 or 36 kDa and a 28 kDa subunit. These subunits, or larger assemblies of the 58 or 60 kDa units, may be held together by disulfide bonds, because β-mercaptoethanol inhibited ECGT activity. PMSF, an inhibitor of serine proteases, also inhibited ECGT activity, suggesting that ECGT shares a similar reaction mechanism. The uncertainty in the mass of the protein may stem from post-translational modifications; SCPL enzymes are often glycosylated. Proteomic mass spectrometry analysis also identified a peptide sequence from the ECGT sample that matched a protein that the researchers called SCPL1199, translated from the C. sinensis genome. A different group later cloned a gene they called CsSCPL and showed that it is highly expressed in leaf buds and young leaves; that its expression increases in response to heat stress and decreases in response to cold, salt, or drought stress; and that galloylated catechin content correlates with CsSCPL expression (Chiu, Chen, Tzen, & Yang, 2016). This gene had the same sequence as a candidate gene that we had identified from a C. sinensis EST database.

SCPL enzymes are a relatively recently identified class of acyltransferases. As their name implies, they share sequence homology with serine carboxypeptidases, including the same catalytic triad of serine, histidine, and aspartic acid residues (Milkowski & Strack, 2004). The first SCPL to be cloned and characterized was sinapoylglucose:malate sinapoyltransferase from Arabidopsis thaliana (AtSMT) (Lehfeldt et al., 2000). AtSMT is the gene mutated in Arabidopsis sng1 (sinapoylglucose accumulator 1) mutants, which cannot synthesize sinapoylmalate, a major phenylpropanoid. Other SCPL enzymes have since been identified in Arabidopsis and other plant species, such as sinapoylglucose:choline sinapoyltransferase (SCT) in A. thaliana, Brassica napus, and Avena strigosa (lopsided oat) (Milkowski, Baumert, Schmidt, Nehlin, & Strack, 2004; Mugford et al., 2009; Shirley, McMichael, & Chapple, 2001); sinapoylglucose:anthocyanin acyltransferase in A. thaliana (Fraser et al., 2007); and enzymes that form acyl sugars in Lycopersicon pennellii (Li & Steffens, 2000).

SCPL enzymes often undergo various post-translational modifications. N-glycosylation sites in the amino acid sequence of AtSMT may explain the differences between the mass observed in several studies and the mass predicted from the sequence alone (Ciarkowska, Ostrowski, Starzyńska, & Jakubowska, 2019). SCPL enzymes also possess an N-terminal signal sequence for translocation to the endoplasmic reticulum during translation, eventually directing them to the vacuole. AtSCT was also observed as a heterodimer formed by proteolytic cleavage of an internal loop; many SCPL sequences have this loop, which must be removed for the enzyme to exhibit proper activity (Ciarkowska et al., 2019). Homology modeling also suggested that AtSMT, BnSCT, and AtSCT require disulfide bonds to hold the subunits of the heterodimer together (Stehle, Brandt, Milkowski, & Strack, 2006).

The genomes and transcriptomes of C. sinensis var. assamica and var. sinensis were sequenced recently (Wei et al., 2018; Xia et al., 2017). The C. sinensis var. sinensis genome paper also investigated galloylated catechin biosynthesis, reporting that tea has 22 SCPL genes and presenting a comparative expression analysis of these genes across various tissues. To identify and characterize the SCPL from C. sinensis responsible for galloylation of EC and EGC, we took a candidate gene approach using these new genomic resources. We searched the C. sinensis var. sinensis young leaf transcriptome using the peptide sequence provided by Liu et al. (2012). We then attempted to express the resulting candidate genes in the heterologous hosts E. coli, Saccharomyces cerevisiae, and Nicotiana benthamiana.

Results

CsUGGT expression in Nicotiana benthamiana produces β-glucogallin

CsUGGT (CsUGT84A22) was cloned into the pEAQ-HT vector, and the resulting plasmid was used to transform Agrobacterium tumefaciens strain LBA4404. Transient protein expression was performed in Nicotiana benthamiana leaves by Agrobacterium-mediated transformation. Gallic acid and/or UDP-glucose substrate was also co-infiltrated with Agrobacterium. Control-protein (mGFP) and no-substrate controls were also included. Expression of CsUGGT with gallic acid infiltration, either with or without UDP-glucose infiltration, led to production of large amounts of β-glucogallin (Figure 1). A small amount of β-glucogallin (not visible at the scale of Figure 1) was also detected when CsUGGT was expressed and no substrate was co-infiltrated, suggesting that there may be a small amount of gallic acid present natively in Nicotiana benthamiana.

Other peaks were also detected in the same mass window as β-glucogallin but at different retention times when mGFP was expressed and gallic acid was infiltrated, either with or without UDP-glucose infiltration (Figure 1). This suggests that N. benthamiana possesses a UGT enzyme that can use gallic acid and UDP-glucose to produce an isomer of β-glucogallin.
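The mass window used for an extracted ion chromatogram of this kind can be estimated from the elemental composition of the analyte. The following is a minimal sketch, assuming the composition C13H16O10 for β-glucogallin (gallic acid plus glucose minus water) and a symmetric tolerance expressed in ppm. The 10 ppm value and the function names are illustrative assumptions, not the settings used for Figure 1.

```python
# Monoisotopic masses (u) of the most abundant isotopes; standard reference values.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647  # mass of H+ (hydrogen atom minus one electron)


def mz_m_plus_h(formula):
    """Return the m/z of the singly charged [M+H]+ ion for a composition dict."""
    neutral = sum(MONOISOTOPIC[element] * count for element, count in formula.items())
    return neutral + PROTON


def ppm_window(mz, ppm):
    """Return (low, high) bounds of a symmetric +/- ppm window around mz."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


# beta-glucogallin: gallic acid (C7H6O5) + glucose (C6H12O6) - H2O = C13H16O10
GLUCOGALLIN = {"C": 13, "H": 16, "O": 10}

if __name__ == "__main__":
    mz = mz_m_plus_h(GLUCOGALLIN)
    low, high = ppm_window(mz, ppm=10)  # 10 ppm is an illustrative tolerance
    print(f"[M+H]+ = {mz:.4f}; window {low:.4f} to {high:.4f}")
```

Because isomers share the same elemental composition, they fall inside the same window and can only be distinguished by retention time, which is consistent with the additional peaks described above.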

Figure 1. CsUGGT produces β-glucogallin when heterologously expressed in Nicotiana benthamiana. Protein expressed by Agrobacterium-mediated transformation and infiltrated substrates (GA, gallic acid; UDP-glc, UDP-glucose) are listed for each LC/MS trace. Each trace shows an extracted ion chromatogram for the mass range around the expected mass-to-charge ratio for the [M+H]⁺ ion of β-glucogallin. Peaks are labeled with their retention times in minutes.

Identification of ECGT candidate genes

The peptide sequence published in Liu et al. (2012) was used as the query in a tblastn search of a Camellia sinensis EST database. Blastp searches of the Arabidopsis thaliana and Vitis vinifera proteomes were also performed. A neighbor-joining tree was generated for the combined list of top hits from each search. Three closely clustered C. sinensis sequences were designated CsECGT1, CsECGT2, and CsECGT3. Primers were designed to amplify these genes from C. sinensis cDNA. These were the first candidate ECGT genes investigated.

Additional candidate genes were identified later, after the publication of the Camellia sinensis var. sinensis genome and transcriptome. This transcriptome was queried with the same peptide sequence. Of the resulting hits, 10 of the most highly and differentially expressed genes, as listed in the supplemental data of Wei et al. (2018), were selected for further analysis. Sequence alignment of these 10 sequences showed that they clustered into 7 groups with identical sequences at the extreme 5′ and 3′ ends (the first and last 15 nucleotides), so 7 pairs of primers were designed to amplify these genes from C. sinensis leaf cDNA. Only two of the most highly expressed genes were successfully cloned; they were named CsSCPL1 and CsSCPL3. The RNA and cDNA samples were prepared from relatively old leaf tissue, so expression of these SCPL candidate genes was likely lower than it would have been in young leaves, making it difficult to amplify the less highly expressed genes for cloning.
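The grouping of candidate sequences by their shared 5′ and 3′ termini, which determines how many primer pairs are needed, can be sketched as follows. This is an illustration only: the function name and the toy sequences are hypothetical, and the actual grouping in this work was done by inspecting a sequence alignment.

```python
from collections import defaultdict


def group_by_termini(seqs, n=15):
    """Group sequence names by their first and last n nucleotides.

    Sequences in the same group can be amplified with the same primer pair.
    """
    groups = defaultdict(list)
    for name, seq in seqs.items():
        s = seq.upper()
        groups[(s[:n], s[-n:])].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Toy coding sequences (hypothetical, not real candidate genes).
    toy = {
        "cand_A": "ATGTTTCCACCAAAG" + "GGG" + "TGGATTCATTACTAT",
        "cand_B": "ATGTTTCCACCAAAG" + "CCC" + "TGGATTCATTACTAT",
        "cand_C": "ATGGGCTCTGAATCA" + "GGG" + "TCACAACTACTACTG",
    }
    for (five_prime, three_prime), names in group_by_termini(toy).items():
        print(five_prime, three_prime, names)
```

One consequence of this design is that a single primer pair may amplify several closely related genes at once, so individual clones must be sequenced to determine which member of a group was recovered.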

Nicotiana benthamiana leaf protein extraction fails to show ECGT activity

Transient overexpression of CsECGT1 was attempted in N. benthamiana leaves along with co-infiltration of β-glucogallin and EGC substrates, but EGCG was not detected in the leaf metabolite extracts. We suspected that even if EGCG had formed, enzymes in N. benthamiana might be consuming the product and rendering it undetectable. To test this hypothesis and to search for any potential downstream metabolites by untargeted metabolomics, we infiltrated ECG and EGCG into N. benthamiana leaves with or without co-expression of mGFP or CsUGGT. Leaf samples were collected for metabolite extraction 1 day post infiltration. ECG and EGCG were not detected, and there were no obvious flavonoid-derived compounds enriched in the samples. These results suggested that metabolite extraction of leaves overexpressing CsECGT would not be the best way to detect EGCG formation.

We then prepared crude protein extracts from leaves overexpressing CsECGT and used them in in vitro enzyme assays to measure ECG or EGCG formation from β-glucogallin and EC or EGC. CsSCPL1 and CsSCPL3 were also tested, and mGFP and CsUGGT were used as negative controls. Leaf tissue was frozen and ground with a mortar and pestle, then extracted with a pH 6 sodium citrate buffer, without protease inhibitors for these initial experiments. A boiled protein control was also performed. Two distinct peaks on the EGCG SRM trace, with retention times different from that of the EGCG standard, were detected for boiled, but not unboiled, N. benthamiana leaf protein extract samples, regardless of which enzyme was expressed (Figure 2). This was a very unexpected result; it suggests both that there is enzymatic activity in N. benthamiana that can use β-glucogallin and EGC to make a compound with the same mass as EGCG and a similar fragmentation pattern, and that this activity requires activation by boiling of the protein extract. No activity was detected in any samples when EC was used as the substrate.

Figure 2. In vitro enzyme assay with N. benthamiana leaf protein extracts shows peaks detected by SRM for EGCG in boiled protein samples, regardless of which protein was expressed.

Discussion and Future Directions

To identify the genes responsible for catechin galloylation in tea, we found candidates for both the UGGT and ECGT enzymatic steps and attempted to heterologously express them. We were able to reconstitute UGGT activity in Nicotiana benthamiana. We have thus far failed to observe ECGT activity when any of the candidate genes were expressed in N. benthamiana or S. cerevisiae (data not shown), and attempts at CsECGT1 expression in E. coli also failed to produce detectable amounts of protein. The various post-translational modifications required by SCPL enzymes are the likely cause of these failures.

Our protein extraction methods from N. benthamiana and S. cerevisiae may also have been insufficient to detect activity even if the enzyme was being expressed and modified properly. We mostly used simple whole-cell lysis techniques and used the protein lysate with minimal purification. It may be helpful to use a previously studied SCPL enzyme, such as AtSMT or AtSCT, as a positive control and follow the published expression and purification methods. Some groups have optimized expression in S. cerevisiae by changing the N-terminal signal sequence to that of a yeast protein, for example (Stehle, Stubbs, Strack, & Milkowski, 2008). All published heterologous expression methods also seem to rely on activity-guided fractionation of the protein lysate (Mugford & Milkowski, 2012).

Because of the high sequence similarity between the ECGT and SCPL candidates we have investigated, it is possible that they have overlapping functions in planta. Thus, to show both necessity and sufficiency of these genes in catechin galloylation, a combinatorial approach to heterologous expression may be optimal. Others in our group have attempted this strategy, which involves expressing all candidate genes in N. benthamiana by infiltrating a mixture of Agrobacterium strains that each harbor one gene. If formation of the target compound is detected in N. benthamiana plants infiltrated with the mixture, the experiment can be repeated with one strain dropped out at a time to identify whether any single strain is necessary for activity. To clone the SCPL candidates that have not yet been successfully cloned, new cDNA can be prepared from young leaf samples of C. sinensis.
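The dropout design described above can be enumerated with a few lines of code. This is a minimal sketch under the assumption that each Agrobacterium strain carries exactly one candidate gene; the function name is hypothetical, and the strain list simply reuses the candidate names from this chapter for illustration.

```python
def dropout_mixtures(strains):
    """Return the full mixture followed by every leave-one-out mixture.

    Each item is (label, list_of_strains_to_co_infiltrate).
    """
    full = list(strains)
    mixtures = [("all", full)]
    for dropped in full:
        mixtures.append((f"minus {dropped}", [s for s in full if s != dropped]))
    return mixtures


if __name__ == "__main__":
    candidates = ["CsECGT1", "CsECGT2", "CsECGT3", "CsSCPL1", "CsSCPL3"]
    for label, mix in dropout_mixtures(candidates):
        print(f"{label:>15}: {', '.join(mix)}")
```

For n candidate strains this gives n + 1 infiltration mixtures. A strain whose removal abolishes product formation is necessary for activity; if no single dropout abolishes it, the candidates may be functionally redundant.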

Materials and Methods

RNA extraction and cDNA template preparation

Total RNA was extracted from C. sinensis var. sinensis adult leaf tissue using the RNeasy Plant Mini Kit (QIAGEN). First-strand cDNAs were synthesized by RT-PCR, using the total RNA samples as templates, with the SuperScript III First-Strand Synthesis System and the oligo(dT)₂₀ primer (Thermo Fisher Scientific).

Cloning of candidate genes from cDNA

Phusion High-Fidelity DNA Polymerase (Thermo Fisher Scientific) was used for PCR amplifications from C. sinensis var. sinensis cDNA. Gibson assembly was used to clone the amplified genes into target vectors. Restriction enzymes and Gibson assembly reagents were purchased from New England Biolabs. Oligonucleotide primers were purchased from Integrated DNA Technologies. All primers used for cloning are listed in Table 1.

Transcriptome assembly

RNA-seq datasets for Camellia sinensis var. assamica stem and young leaf (NCBI SRA accessions SRR5421032 and SRR5421035) were de novo assembled into a transcriptome using Trinity (Grabherr et al., 2011). Camellia sinensis var. assamica and var. sinensis genomes were also downloaded from their respective genome project websites (Wei et al., 2018; Xia et al., 2017). Transcriptome and genome mining were performed on a local BLAST server (Priyam et al., 2015).
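A search of this kind can also be scripted rather than run through a graphical server. The sketch below assumes the NCBI BLAST+ tblastn program and its tabular output format 6 with the default twelve columns. The file names, database name, E-value cutoff, and example hit lines are hypothetical and were not part of this study.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def tblastn_command(query_fasta, db_name, out_path, evalue="1e-5"):
    """Build an argument list for a protein-vs-translated-nucleotide search."""
    return ["tblastn", "-query", query_fasta, "-db", db_name,
            "-evalue", evalue, "-outfmt", "6", "-out", out_path]


def parse_outfmt6(text):
    """Parse tabular BLAST output into dicts, sorted by descending bit score."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)


if __name__ == "__main__":
    # Hypothetical file and database names; pass the list to subprocess.run to execute.
    print(" ".join(tblastn_command("peptide.fasta", "tea_transcriptome", "hits.tsv")))
    example = "\n".join([
        "\t".join(["pep1", "contig_1", "95.0", "20", "1", "0", "1", "20", "301", "360", "2e-08", "45.1"]),
        "\t".join(["pep1", "contig_2", "100.0", "20", "0", "0", "1", "20", "100", "159", "1e-10", "52.0"]),
    ])  # invented hit lines for illustration
    for hit in parse_outfmt6(example):
        print(hit["sseqid"], hit["pident"], hit["evalue"])
```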

Sequence alignment and phylogenetic analysis

Sequence alignments were performed using the MUSCLE algorithm (Edgar, 2004) in MEGA7 (Kumar, Stecher, & Tamura, 2016). Evolutionary histories were inferred using the Maximum Likelihood method based on the JTT matrix-based model (Jones, Taylor, & Thornton, 1992). Bootstrap values were calculated using 1,000 replicates. All phylogenetic analyses were conducted in MEGA7 (Kumar et al., 2016).
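For readers unfamiliar with bootstrapping in phylogenetics, each replicate is built by resampling alignment columns with replacement and inferring a tree from the resampled alignment. The following is a simplified sketch of the resampling step only. It is not MEGA7's implementation, and the toy alignment and function name are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment maps sequence names to aligned strings of equal length.
    The same sampled columns are used for every sequence, so the
    pseudo-replicate remains a valid alignment of the original length.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


if __name__ == "__main__":
    toy = {"seq1": "MFPPKSYS", "seq2": "MFPPRSYT", "seq3": "MGSESLVH"}  # invented
    rng = random.Random(1)  # fixed seed so the example is repeatable
    for replicate in range(3):
        print(replicate, bootstrap_alignment(toy, rng))
```

The bootstrap value of a branch is then the fraction of replicate trees (here, out of 1,000) that recover that branch.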

Transient expression in Nicotiana benthamiana

Candidate genes were cloned into the pEAQ-HT vector (Peyret & Lomonossoff, 2013; Sainsbury, Thuenemann, & Lomonossoff, 2009) and transformed into the ElectroMAX Agrobacterium tumefaciens strain LBA4404 (Invitrogen). Bacteria were cultivated at 30 °C to an OD₆₀₀ of 1.5 in 50 mL of YM medium (0.4 g/L yeast extract, 10 g/L mannitol, 0.1 g/L NaCl, 0.2 g/L MgSO₄·7H₂O, 0.5 g/L K₂HPO₄·3H₂O), washed with 0.5× PBS (68 mM NaCl, 1.4 mM KCl, 5 mM Na₂HPO₄, 0.9 mM KH₂PO₄), and resuspended in 0.5× PBS to an OD₆₀₀ of 0.8. Approximately 1 mL of the final culture was used to infiltrate the underside of 5- to 6-week-old N. benthamiana leaves. Leaves were harvested 3 days post infiltration for protein extraction, or 5 days post infiltration for metabolite extraction.
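The resuspension volume needed to reach the target optical density follows from a simple dilution relation, OD_start × V_start = OD_target × V_final. The sketch below assumes that no cells are lost during washing and that OD600 is proportional to cell density over this range; both are simplifying assumptions, and in practice the final OD would be checked on a spectrophotometer.

```python
def resuspension_volume_ml(od_start, volume_start_ml, od_target):
    """Volume (mL) in which to resuspend a pelleted culture to reach od_target.

    Assumes OD is proportional to cell density and no cells are lost.
    """
    if od_target <= 0:
        raise ValueError("target OD must be positive")
    return od_start * volume_start_ml / od_target


if __name__ == "__main__":
    # Values from the protocol above: 50 mL at OD600 1.5, target OD600 0.8.
    volume = resuspension_volume_ml(1.5, 50.0, 0.8)
    print(f"Resuspend in about {volume:.1f} mL of 0.5x PBS")
```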

Plant metabolite extraction

Approximately 100 mg of plant leaf tissue was dissected, transferred into grinding tubes containing approximately 15 zirconia/silica disruption beads (2 mm diameter; Research Products International), and snap-frozen in liquid nitrogen. The frozen samples were homogenized twice on a TissueLyser II (QIAGEN). Metabolites were extracted using 5 to 10 volumes (w/v) of 50% methanol at 50 °C for 1 hour. Extracts were centrifuged twice (13,000 × g, 20 min) and supernatants were collected for LC−MS analysis.

Plant protein extraction

Whole leaves (550 to 750 mg wet weight) were collected from N. benthamiana and frozen in liquid nitrogen. Leaves were pulverized to a fine powder with a mortar and pestle, 10 volumes (w/v) of 100 mM sodium citrate buffer, pH 6 (40.9 mM sodium citrate dihydrate, 59.03 mM citric acid, 0.1% Tween 20), were added, and the mixture was incubated at room temperature for 10 to 20 min. Samples were centrifuged in a JA-20 rotor at 19,000 rpm for 40 min at 4 °C. The supernatant was transferred to a 30,000 Da MWCO concentrator tube and concentrated to about 50% of the starting volume. Finally, the concentrated sample was centrifuged at 13,000 × g for 10 min, and the supernatant was stored at 4 °C for later use in enzyme assays.
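Because the first centrifugation step is specified in rpm while the others are given in × g, it can be useful to convert between the two, along with computing the buffer volume for a given tissue mass. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r (cm) × rpm². The rotor radius of 10.8 cm is our assumption for the maximum radius of the JA-20 rotor and should be checked against the rotor documentation; the function names are hypothetical.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) at a given speed and rotor radius."""
    return 1.118e-5 * radius_cm * rpm ** 2


def buffer_volume_ml(tissue_mg, volumes_per_weight=10.0):
    """Extraction buffer volume (mL) for a w/v ratio, taking 1 g of tissue as 1 mL."""
    return tissue_mg / 1000.0 * volumes_per_weight


if __name__ == "__main__":
    ASSUMED_RADIUS_CM = 10.8  # assumption: maximum radius of the rotor
    print(f"19,000 rpm is roughly {rcf_from_rpm(19000, ASSUMED_RADIUS_CM):,.0f} x g")
    for mass_mg in (550, 750):
        print(f"{mass_mg} mg tissue -> {buffer_volume_ml(mass_mg):.1f} mL buffer")
```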

In vitro ECGT enzyme activity assay

A 40 µL aliquot of N. benthamiana protein extract was added to a 210 µL reaction mix consisting of 50 mM sodium phosphate buffer pH 6, 1 mM β-glucogallin, and 0.4 mM EC or EGC. As a negative control, the protein extract was heated at 95 °C for 15 min and then centrifuged at 13,000 × g for 3 min before being added to the reaction. The reaction was stopped after approximately 18 hours by addition of 250 µL methanol, and the reaction mix was centrifuged at 13,000 × g for 20 min. The supernatant was transferred to new tubes and taken for LC−MS analysis.

LC−MS analysis

LC was conducted on a Dionex UltiMate 3000 UHPLC system (Thermo Fisher Scientific), using water with 0.1% formic acid as solvent A and acetonitrile with 0.1% formic acid as solvent B. Reverse phase separation of analytes was performed on a Kinetex C18 column, 150 × 3 mm, 2.6 μm particle size (Phenomenex). The column oven was held at 30 °C. Injections were eluted with 5% B for 2 min, a gradient of 5–36.3% B for 8 min, 95% B for 3 min, and 5% B for 2 min, with a flow rate of 0.7 mL/min. MS analyses for the in vitro enzyme assays were performed on a TSQ Quantum Access Max mass spectrometer (Thermo Fisher Scientific) operated in positive ionization mode with selected reaction monitoring (SRM) for β-glucogallin, EGC, and EGCG. MS analyses for plant metabolite extracts were performed on a high-resolution Q-Exactive benchtop Orbitrap mass spectrometer (Thermo Fisher Scientific) operated in positive ionization mode with a full scan range of 100−1250 m/z and top 5 data-dependent MS/MS scans. Raw LC−MS data were analyzed using XCalibur (Thermo Fisher Scientific).
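The elution program can be written out as a table of segments, which makes the total run time and the solvent composition at any retention time easy to compute. This is a sketch only: it assumes that the changes between segments (for example, from 36.3% to 95% B) are instantaneous steps, which the method description does not state, and the names are hypothetical.

```python
# (duration in min, %B at segment start, %B at segment end)
SEGMENTS = [
    (2.0, 5.0, 5.0),    # isocratic hold
    (8.0, 5.0, 36.3),   # linear gradient
    (3.0, 95.0, 95.0),  # column wash
    (2.0, 5.0, 5.0),    # re-equilibration
]
FLOW_RATE_ML_MIN = 0.7


def total_run_time_min():
    """Total programmed run time in minutes."""
    return sum(duration for duration, _, _ in SEGMENTS)


def percent_b(t_min):
    """Programmed %B at time t_min, interpolating linearly within a segment."""
    start = 0.0
    for duration, b_start, b_end in SEGMENTS:
        end = start + duration
        if start <= t_min < end:
            return b_start + (b_end - b_start) * (t_min - start) / duration
        start = end
    raise ValueError("time is outside the programmed run")


if __name__ == "__main__":
    print(f"Run time: {total_run_time_min():.0f} min")
    print(f"Solvent per injection: {total_run_time_min() * FLOW_RATE_ML_MIN:.1f} mL")
    for t in (1.0, 6.0, 11.0, 14.0):
        print(f"t = {t:4.1f} min -> {percent_b(t):.2f}% B")
```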

Table 1. Primers used in this study.

Name    Sequence (5′ to 3′)                                   Purpose
GL0045  ctgcccaaattcgcgaccggtATGGGCTCTGAATCACTTG              CsUGGT cloning to pEAQ-HT, forward
GL0031  ccagagttaaaggcctcgagctaTTAAACAACAACAGTAGTAGTTGTGATAA  CsUGGT cloning to pEAQ-HT, reverse
GL0047  ctgcccaaattcgcgaccggtATGTTTCCACCAAAGTCATAC            CsECGT1 cloning to pEAQ-HT, forward
GL0032  atgcatcaccatcaccatcatcccgggATGTTTCCACCAAAGTCATAC      CsECGT1 cloning to pEAQ-HT, reverse
GL0092  tattctgcccaaattcgcgaccggtATGTTTCCACCAAAGTCATACAGT     CsSCPL1 (SCPL023451) cloning to pEAQ-HT, forward
GL0093  tgaaaccagagttaaaggcctcgagCTAAATAGGATAGTAATGAATCCA     CsSCPL1 (SCPL023451) cloning to pEAQ-HT, reverse
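The primers in Table 1 appear to follow a common convention for Gibson assembly primers, with a lowercase 5′ overhang matching the vector and an uppercase 3′ region that anneals to the gene. That reading of the letter case is our assumption, as it is not stated explicitly in the table. Under this assumption, a primer can be split into its two parts as sketched below; the function names are hypothetical.

```python
import re


def split_primer(primer):
    """Split a primer into (lowercase overhang, uppercase annealing region)."""
    match = re.fullmatch(r"([a-z]*)([A-Z]+)", primer)
    if match is None:
        raise ValueError("expected a lowercase overhang followed by an uppercase region")
    return match.group(1), match.group(2)


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


if __name__ == "__main__":
    gl0045 = "ctgcccaaattcgcgaccggtATGGGCTCTGAATCACTTG"  # from Table 1
    overhang, anneal = split_primer(gl0045)
    print(f"overhang: {overhang} ({len(overhang)} nt)")
    print(f"annealing region: {anneal} ({len(anneal)} nt, GC {gc_fraction(anneal):.0%})")
```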

References

Balentine, D. A., Wiseman, S. A., & Bouwens, L. C. (1997). The chemistry of tea flavonoids. Critical Reviews in Food Science and Nutrition, 37(8), 693–704.

Brody, H. (2019). Tea. Nature, 566(7742), S1.

Cabrera, C., Artacho, R., & Giménez, R. (2006). Beneficial Effects of Green Tea—A Review. Journal of the American College of Nutrition, 25(2), 79–99.

Chiu, C.-H., Chen, G.-H., Tzen, J. T. C., & Yang, C.-Y. (2016). Molecular identification and characterization of a serine carboxypeptidase-like gene associated with abiotic stress in tea plant, Camellia sinensis (L.). Plant Growth Regulation. https://doi.org/10.1007/s10725-015-0138-7

Ciarkowska, A., Ostrowski, M., Starzyńska, E., & Jakubowska, A. (2019). Plant SCPL acyltransferases: multiplicity of enzymes with various functions in secondary metabolism. Phytochemistry Reviews, 18(1), 303–316.

Cui, L., Yao, S., Dai, X., Yin, Q., Liu, Y., Jiang, X., … Xia, T. (2016). Identification of UDP-glycosyltransferases involved in the biosynthesis of astringent taste compounds in tea (Camellia sinensis). Journal of Experimental Botany, 67(8), 2285–2297.

Edgar, R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797.

Fraser, C. M., Thompson, M. G., Shirley, A. M., Ralph, J., Schoenherr, J. A., Sinlapadech, T., … Chapple, C. (2007). Related Arabidopsis serine carboxypeptidase-like sinapoylglucose acyltransferases display distinct but overlapping substrate specificities. Plant Physiology, 144(4), 1986–1999.

Grabherr, M. G., Haas, B. J., Yassour, M., Levin, J. Z., Thompson, D. A., Amit, I., … Regev, A. (2011). Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology, 29(7), 644–652.

Haskell, C. F., Kennedy, D. O., Milne, A. L., Wesnes, K. A., & Scholey, A. B. (2008). The effects of L-theanine, caffeine and their combination on cognition and mood. Biological Psychology, 77(2), 113–122.

Huang, Z.-L., Qu, W.-M., Eguchi, N., Chen, J.-F., Schwarzschild, M. A., Fredholm, B. B., … Hayaishi, O. (2005). Adenosine A2A, but not A1, receptors mediate the arousal effect of caffeine. Nature Neuroscience, 8(7), 858–859.

Jones, D. T., Taylor, W. R., & Thornton, J. M. (1992). The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences: CABIOS, 8(3), 275–282.

Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular Evolutionary Genetics Analysis Version 7.0 for Bigger Datasets. Molecular Biology and Evolution, 33(7), 1870–1874.

Lehfeldt, C., Shirley, A. M., Meyer, K., Ruegger, M. O., Cusumano, J. C., Viitanen, P. V., … Chapple, C. (2000). Cloning of the SNG1 gene of Arabidopsis reveals a role for a serine carboxypeptidase-like protein as an acyltransferase in secondary metabolism. The Plant Cell, 12(8), 1295–1306.

Li, A. X., & Steffens, J. C. (2000). An acyltransferase catalyzing the formation of diacylglucose is a serine carboxypeptidase-like protein. Proceedings of the National Academy of Sciences of the United States of America, 97(12), 6902–6907.

Liu, Y., Gao, L., Liu, L., Yang, Q., Lu, Z., Nie, Z., … Xia, T. (2012). Purification and characterization of a novel galloyltransferase involved in catechin galloylation in the tea plant (Camellia sinensis). The Journal of Biological Chemistry, 287(53), 44406–44417.

Milkowski, C., Baumert, A., Schmidt, D., Nehlin, L., & Strack, D. (2004). Molecular regulation of sinapate ester metabolism in Brassica napus: expression of genes, properties of the encoded proteins and correlation of enzyme activities with metabolite accumulation. The Plant Journal. https://doi.org/10.1111/j.1365-313x.2004.02036.x

Milkowski, C., & Strack, D. (2004). Serine carboxypeptidase-like acyltransferases. Phytochemistry, 65(5), 517–524.

Mittasch, J., Böttcher, C., Frolova, N., Bönn, M., & Milkowski, C. (2014). Identification of UGT84A13 as a candidate enzyme for the first committed step of gallotannin biosynthesis in pedunculate oak (Quercus robur). Phytochemistry, 99, 44–51.

Mugford, S. T., & Milkowski, C. (2012). Serine carboxypeptidase-like acyltransferases from plants. Methods in Enzymology, 516, 279–297.

Mugford, S. T., Qi, X., Bakht, S., Hill, L., Wegel, E., Hughes, R. K., … Osbourn, A. (2009). A serine carboxypeptidase-like acyltransferase is required for synthesis of antimicrobial compounds and disease resistance in oats. The Plant Cell, 21(8), 2473–2484.

Peyret, H., & Lomonossoff, G. P. (2013). The pEAQ vector series: the easy and quick way to produce recombinant proteins in plants. Plant Molecular Biology, 83(1-2), 51–58.

Pluskal, T., & Weng, J.-K. (2018). Natural product modulators of human sensations and mood: molecular mechanisms and therapeutic potential. Chemical Society Reviews, 47(5), 1592–1637.

Priyam, A., Woodcroft, B. J., Rai, V., Munagala, A., Moghul, I., Ter, F., … Wurm, Y. (2015). Sequenceserver: a modern graphical user interface for custom BLAST databases. bioRxiv. https://doi.org/10.1101/033142

Punyasiri, P. A. N., Abeysinghe, I. S. B., Kumar, V., Treutter, D., Duy, D., Gosch, C., … Fischer, T. C. (2004). Flavonoid biosynthesis in the tea plant Camellia sinensis: properties of enzymes of the prominent epicatechin and catechin pathways. Archives of Biochemistry and Biophysics, 431(1), 22–30.

Sainsbury, F., Thuenemann, E. C., & Lomonossoff, G. P. (2009). pEAQ: versatile expression vectors for easy and quick transient expression of heterologous proteins in plants. Plant Biotechnology Journal, 7(7), 682–693.

Shirley, A. M., McMichael, C. M., & Chapple, C. (2001). The sng2 mutant of Arabidopsis is defective in the gene encoding the serine carboxypeptidase-like protein sinapoylglucose:choline sinapoyltransferase. The Plant Journal. https://doi.org/10.1046/j.1365-313x.2001.01123.x

Stehle, F., Brandt, W., Milkowski, C., & Strack, D. (2006). Structure determinants and substrate recognition of serine carboxypeptidase-like acyltransferases from plant secondary metabolism. FEBS Letters. https://doi.org/10.1016/j.febslet.2006.10.046

Stehle, F., Stubbs, M. T., Strack, D., & Milkowski, C. (2008). Heterologous expression of a serine carboxypeptidase-like acyltransferase and characterization of the kinetic mechanism. FEBS Journal. https://doi.org/10.1111/j.1742-4658.2007.06244.x

Wei, C., Yang, H., Wang, S., Zhao, J., Liu, C., Gao, L., … Wan, X. (2018). Draft genome sequence of Camellia sinensis var. sinensis provides insights into the evolution of the tea genome and tea quality. Proceedings of the National Academy of Sciences of the United States of America, 115(18), E4151–E4158.

Xia, E.-H., Zhang, H.-B., Sheng, J., Li, K., Zhang, Q.-J., Kim, C., … Gao, L.-Z. (2017). The Tea Tree Genome Provides Insights into Tea Flavor and Independent Evolution of Caffeine Biosynthesis. Molecular Plant, 10(6), 866–877.
