
UNIVERSIDADE ESTADUAL DE CAMPINAS

INSTITUTO DE BIOLOGIA

Nathalia Volpi e Silva

CHLOROGENIC ACID AND ITS RELATIONSHIP WITH LIGNIN BIOSYNTHESIS

ÁCIDO CLOROGÊNICO E SUA RELAÇÃO COM A BIOSSÍNTESE DE LIGNINA

CAMPINAS - SP

2019

NATHALIA VOLPI E SILVA

CHLOROGENIC ACID AND ITS RELATIONSHIP WITH LIGNIN BIOSYNTHESIS

ÁCIDO CLOROGÊNICO E SUA RELAÇÃO COM A BIOSSÍNTESE DE LIGNINA

Thesis presented to the Institute of Biology of the University of Campinas in partial fulfillment of the requirements for the degree of Doctor in Genetic and Molecular Biology in the area of Plant Genetics and Breeding.

Tese apresentada ao Instituto de Biologia da Universidade Estadual de Campinas como parte dos requisitos exigidos para obtenção do Título de Doutor em Genética e Biologia Molecular, na área de Genética Vegetal e Melhoramento.

Supervisor / Orientador: Prof. Dr. Paulo Mazzafera

Co-supervisor / Co-Orientador: Prof. Dr. Igor Cesarino

ESTE ARQUIVO DIGITAL CORRESPONDE À VERSÃO FINAL DA TESE DEFENDIDA PELA ALUNA NATHALIA VOLPI E SILVA, ORIENTADA PELO PROF. DR. PAULO MAZZAFERA.

CAMPINAS - SP

2019

Agência de fomento: FAPESP. N° Processo: 2014/17831-5, 2016/15834-2

Agência de fomento: Capes. N° Processo: 001

Campinas, 31 de julho de 2019

EXAMINATION COMMITTEE

Banca examinadora

Dr. Paulo Mazzafera (Supervisor/Orientador)

Dr. Paula Macêdo Nobile

Dra. Sara Adrian Lopez de Andrade

Dr. Douglas Silva Domingues

Dr. Michael dos Santos Brito

Os membros da Comissão Examinadora acima assinaram a Ata de Defesa, que se encontra no processo de vida acadêmica do aluno.

ACKNOWLEDGMENT

I would like to thank the São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP), grants (processos) nº 2014/17831-5 and nº 2016/15834-2, for the fellowship and all the financial support to develop this thesis. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001. I am profoundly grateful to my supervisor Dr. Paulo Mazzafera and my co-supervisor Dr. Igor Cesarino for all the knowledge they shared and for their patience in guiding me through this journey. I am also extremely grateful to Dra. Nicola J Patron for receiving me in her laboratory at the Earlham Institute (Norwich – UK) and teaching me about genome editing. My sincere thanks to Tatiane Gregório, Felipe Tolentino, Ewerton Ribeiro, Dr. Oleg Raitskin, Dra. Juliana Mayer and Dr. Eduardo Kiyota for helping me with my experiments whenever I needed it. I could not forget to thank Rafaela Bulgarelli, Dra. Sarah Caroline Ribeiro and Uiara Romero Souza for all their help in taking care of my plants while I was on maternity leave. Dr. Adilson Domingues Junior, Dr. Franklin Magnum Silva, Dr. Luciano Pereira, and Dra. Flávia Shimpl also gave me invaluable help with my writing and academic talks. I would also like to thank the whole LAFIMP team for their friendship and support. My research would have been impossible without the support of my family: my husband, my parents, my daughter, my sister. You have always been there for me with unfailing support and continuous encouragement, thank you. The help of my husband, my mother and my mother-in-law in taking care of my daughter Laura was essential while I was writing my thesis.

ABSTRACT

Phenylpropanoids are compounds derived from phenylalanine and are involved in several aspects of plant defense against biotic and abiotic stresses. One of the phenylpropanoids present in most plants is chlorogenic acid (CGA). CGA biosynthesis is mediated by hydroxycinnamoyl quinate transferase (HQT). Although it has never been proven, some papers have suggested that the CGA pool could be related to lignin biosynthesis in plants. Another enzyme, hydroxycinnamoyl shikimate transferase (HCT), appears to be involved in both the lignin and CGA pathways. HCT and HQT use p-coumaroyl CoA to form hydroxycinnamoyl shikimate or hydroxycinnamoyl quinate esters, respectively. In addition, the enzyme caffeoyl shikimate esterase (CSE) was recently described as involved in the conversion of caffeoyl shikimate to caffeic acid, which is subsequently converted to caffeoyl CoA in the lignin route. CSE shares its substrate with HCT, suggesting that a change in its expression may interfere not only with lignin metabolism but also with CGA metabolism. Because the two pathways have common intermediates, it is possible that CGA acts as a donor of carbon skeletons for lignin biosynthesis. In Chapter 1 we present a review discussing the interconnection among the main genes involved in the interdependency between CGA and lignin: HCT, HQT, and CSE. In Chapter 2 we focus on the relationship between the genes CSE and HCT, providing data that reinforce the importance of the shikimate shunt in both pathways. In Chapter 3 we built and validated CRISPR/Cas9 constructs to edit HCT, CSE, and CCoAOMT, aiming at the development of stable tobacco mutants. The construction of single and double mutants overexpressing and silencing the HCT, HQT and CSE genes may help to clarify the nature of this interdependence between the CGA pool and lignin, as well as to validate the role of the CSE enzyme as a common component in the lignin pathway. 
Bioinformatics analyses identified four putative isoforms of the HCT gene and two of CSE in Nicotiana tabacum, the species chosen for this study. To obtain mutants for these genes we designed several transformation constructs: pCaMV35S::CSE (CSE overexpression); pCaMV35S::HCT (HCT overexpression); pCaMV35S::HQT (HQT overexpression); pCaMV35S::amiRNACSE (CSE downregulation); pCaMV35S::HCT::pCsVMV::amiRNAHQT (HCT overexpression combined with HQT downregulation); pCaMV35S::HQT::pCsVMV::amiRNAHCT (HQT overexpression combined with HCT downregulation); and pCaMV35S::HCT::pCsVMV::amiRNACSE (HCT overexpression combined with CSE downregulation). CSE-silenced plants (amiCSE) showed a severely dwarfed phenotype and did not produce any descendants, indicating the importance of CSE for normal plant development. On the other hand, HCTamiCSE and CSE plants developed normally and were carried to the T1 generation, in which further analyses were conducted. The mutants were assayed for phenotype, gene expression, lignin, plant cell wall polysaccharides, saccharification, and phenolic profile. The plants analyzed showed no alteration in lignin composition but showed alterations in chlorogenic acid metabolism, especially the plants overexpressing CSE, indicating a probable role of CGA as a source of carbon skeletons for the lignin pathway. In addition, we successfully constructed and validated CRISPR/Cas9 vectors targeting the CCoAOMT, CSE and HCT genes in tobacco leaves. Although several studies suggest an interconnection between the lignin and chlorogenic acid routes, most of the analyses reported were done in vitro. The fact that the chlorogenic acid composition of our mutants is affected strongly suggests that these pathways are interconnected and that CSE may play a decisive role in the biosynthesis of chlorogenic acid in tobacco plants.

RESUMO

Fenilpropanóides são compostos derivados da fenilalanina e estão envolvidos em vários aspectos relacionados à defesa de plantas. Alguns desses fenilpropanóides são os ácidos clorogênicos (CGA). A biossíntese de CGA é mediada pela enzima hidroxicinamoil quinato transferase (HQT). Embora nunca provado, alguns trabalhos sugerem que o pool de CGA poderia estar relacionado com a biossíntese de lignina. Outra enzima, a hidroxiciamoil chiquimato esterase (HCT), parece estar envolvida nas rotas de biossíntese de lignina e ácido clorogênico. Assim como a HCT, a HQT utiliza p-coumaoil CoA para formação de ésteres de hidrocinamoil chiquimato ou hidroxicinamoil quinato, respectivamente. Além disso, a enzima cafeoil chiquimato esterase (CSE) foi descrita como envolvida na conversão de cafeoil chiquimato em ácido cafeico, o qual é convertido em cafeoil CoA. Dessa forma, CSE compartilha o substrato com a enzima HCT, sugerindo que uma mudança em sua expressão deva interferir não apenas no metabolismo de lignina, mas também no metabolismo de CGA. Por terem intermediários em comum, é possível que haja interdependência entre essas vias, e CGA possa atuar como doadora de esqueleto de carbono para a biossíntese de lignina. Desta forma, este trabalho objetiva trazer mais informações a fim de entender a relação entre estas duas vias de biossíntese. No Capítulo 1 trouxemos uma revisão com foco na relação entre os principais genes envolvidos na interdependência entre CGA e lignina, os genes HCT, HQT e CSE, com objetivo de conectar os dados disponíveis na literatura que tratam deste assunto. No Capítulo 2 focamos na relação entre estas vias e os genes HCT e CSE, trazendo dados importante que reforça a importância do braço da rota que utiliza chiquimato tanto para lignina como para CGA. No Capítulo 3 construímos e validamos vetores para edição de genoma os genes HCT, CSE e CCoAOMT com objetivo de futuramente desenvolvermos plantas mutantes para estes genes via CRISPR/Cas9. 
A construção de mutantes e duplos mutantes super- expressando e silenciando os genes HCT, HQT e CSE pode contribuir para esclarecer a natureza dessa interdependência entre o pool de CGA e lignina, assim como validar o papel da enzima CSE como um componente da via de lignina em Nicotiana tabacum. Análises de bioinformática identificaram quatro isoformas putativas do gene HCT e duas do gene CSE em N. tabacum, a espécie escolhida para estudo. Com objetivo de obtermos mutantes para estes genes foram desenhadas várias construções para transformação estável: pCaMV35S::CSE (CSE super- expressão), pCaMV35S::HCT (HCT super-expressão); pCaMV35S::HQT (HQT super- expressão); pCaMV35S::amiRNACSE (CSE silenciamento), pCaMV35S::HCT::pCsVMV::amiRNAHQT (HCT super-expressão combinada com HQT silenciamento); pCaMV35S::HQT::pCsVMV::amiRNAHCT (HQT super-expressão combinada com HCT silenciamento), e pCaMV35S::HCT::pCsVMV::amiRNACSE (HCT super-expressão combinada com CSE silenciamento). As plantas silenciadas para o gene CSE (amiCSE) apresentaram comprometimento severo do desenvolvimento e não produziram descendentes indicando a importância da CSE para o desenvolvimento normal em N. tabacum. Em contrapartida, as linhagens de HCTamiCSE e CSE não apresentaram alteração do fenótipo. Essas plantas foram cultivadas até T1 e então submetidos às seguintes análises: fenotípica, expressão gênica, lignina, polissacarídeos de parede celular, sacarificação e perfil fenólico. As plantas analisadas não apresentaram alteração da composição de lignina, mas apresentaram alteração no metabolismo de CGA, especialmente as plantas super-expressando o gene CSE, indicando um provável papel de CGA como esqueleto de carbono para o metabolismo de lignina. Além disso, nós também construímos e validamos vetores utilizando a ferramenta de edição de genoma via CRISPR/Cas9 para os genes HCT, CSE e CCoAOMT em folhas de N. tabacum. 
Apesar de vários estudos sugerirem uma interconexão entre as vias de lignina e CGA, a maior parte das análises foram feitas in vitro. O fato dos nossos mutantes terem a composição de CGA afetada sugere fortemente que essas vias são interconectadas, e que CSE tem um papel decisivo na via de biossíntese em N. tabacum.

ABBREVIATION LIST

CGA – chlorogenic acid

5-CQA – 5-caffeoyl quinate

di-CQA – 3,5-dicaffeoyl quinate

CCT – chlorogenate:chlorogenate transferase

PAL – phenylalanine ammonia-lyase

C4H – cinnamate 4-hydroxylase

C3H – p-coumarate 3-hydroxylase (ascorbate peroxidase)

4CL – 4-coumarate:CoA ligase

HCT – hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase

HQT – hydroxycinnamoyl-CoA:quinate hydroxycinnamoyl transferase

C3’H – p-coumaroyl shikimate 3′-hydroxylase (CYP98)

CSE – caffeoyl shikimate esterase

CCoAOMT – caffeoyl-CoA 3-O-methyltransferase

UGT84 – UDP-glucoside transferase

HCGQT – hydroxycinnamoyl D-glucose:quinate hydroxycinnamoyl transferase

H – p-hydroxyphenyl

G – guaiacyl

S – syringyl

CCR – cinnamoyl-CoA reductase

CRISPR/Cas9 – Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-associated protein 9 system

amiRNA – artificial microRNA

WT – Wild Type

pCaMV35S – Cauliflower Mosaic Virus 35S promoter

pCsVMV – Cassava vein mosaic virus promoter

UPLC-MS/MS – Ultra-Performance Liquid Chromatography coupled to a mass spectrometer

PCA – Principal Component Analysis

PC1 – Principal Component 1

PC2 – Principal Component 2

ORF – Open Reading Frame

CDS – Coding Sequence

HCBPT – Anthranilate N-hydroxycinnamoyl/benzoyltransferase

DAT – deacetylvindoline 4-O-acetyltransferase

CNRQ – Calibrated Normalized Relative Quantities

CA – Caffeic acid

LMID – Lignin modification induced dwarfism

PAM – Protospacer adjacent motif

gRNA – guide RNA

tracrRNA – trans-activating CRISPR RNA

crRNA – CRISPR RNA

trugRNA – truncated guide RNA

MoClo – Golden Gate Modular Cloning

PCR/RE – Restriction enzyme site loss-based PCR

NptII – neomycin phosphotransferase

pNOS – Nopaline synthase promoter

TNos – Nopaline synthase terminator

SpCas9h – Cas9 from Streptococcus pyogenes, human codon optimized

T35S – 35S terminator

OsCald5H1 – coniferaldehyde 5-hydroxylase

SNP – Single Nucleotide Polymorphism

SUMMARY

INTRODUCTION
Chlorogenic acids biosynthesis
Lignin biosynthesis
Relationship with lignin and chlorogenic acids biosynthetic pathways

Chapter 1
Abstract

Chapter 2
1. Introduction
2. Material and Methods
2.1. Sequence analysis
2.2. Vector assemble and amiRNA design
2.3. Plant Transformation
2.4. qRT-PCR analyses
2.5. Phenolic profiling
2.6. Lignin quantification
2.7. Plant cell wall polysaccharides and saccharification
2.8. Morphological and histochemical analyses
2.9. Statistical analyses
3. Results
3.1. Bioinformatic analysis
3.1.1. HCT
3.1.2. CSE
3.2. Vector assembly and amiRNA design
3.3. Plant Transformation
3.3. Gene expression in different tissues of WT and mutants of tobacco
3.3.1. Different tissues in Wild Type Plants
3.3.2. qRT-PCR screening from double and single mutants
3.5. Phenolic Profiling
3.6. Lignin content and composition
3.7. Plant cell wall polysaccharides and saccharification
3.8. Morphological and histochemical analyses
3.9. Pearson correlation and network analyses
3.10. Principal Components Analysis
4. Discussion
4.1. Search of HCT and CSE alleles and expression profile
4.2. Downregulation of CSE severely impact plant development
4.3. HCT overexpression overcome cse dwarfism and CSE mutants accumulate CGA without affecting lignin content

Chapter 3
1. Introduction
2. Material and Methods
2.1. Target locus selection and sgRNA design
2.2. Construct DNA assembly and multiplex targeting
2.3. Agroinfiltration: Development, test, and delivery of the constructions in tobacco leaves
2.4. Genotyping
3. Results and Discussion
3.1. Target locus selection and sgRNA design
3.1.1. HCT
3.1.2. CCoAOMT
3.1.3. CSE
3.2. Construct DNA assembly and multiplex targeting
3.3. Agroinfiltration and Genotyping
4. Discussion

CONCLUSION
PERSPECTIVES
REFERENCES
SUPPLEMENTARY INFORMATION
ATTACHMENT
Attachment 1
Attachment 2
Attachment 3

INTRODUCTION

Plant cell walls, or lignocellulosic biomass, account for 70% of the biomass produced worldwide and are therefore considered the most abundant renewable resource in the world (Pauly and Keegstra, 2008; Mottiar et al., 2016). The cell wall is a component of the raw material of various consumer goods such as wood, textiles, paper, films, explosives and biofuels (Cosgrove, 2000; Neutelings, 2011). Despite its high economic and productive potential, the use of lignocellulosic biomass as feedstock is hampered by its chemical recalcitrance (Pauly and Keegstra, 2008). The cell wall has a complex and dynamic structure and is composed mostly of high molecular weight polysaccharides, glycosylated proteins and compounds derived from the phenylpropanoid pathway (Somerville et al., 2004; Van de Wouwer et al., 2018; Lorenzo et al., 2019; Terrett and Dupree, 2019). The architecture of cell walls varies with cell type; in general, however, they are classified into two types: primary and secondary (Taiz and Zeiger, 2006). Primary walls are formed while cells are still differentiating, usually during expansion and stretching, and are generally non-lignified (Harris and Stone, 2008; Zeng et al., 2014). They are composed mainly of polysaccharides – cellulose, hemicellulose and pectin – and water (Figure 1A; Loqué et al., 2015). In contrast, secondary walls are deposited inside the primary walls after cell elongation has ceased and typically contain lignin (Harris and Stone, 2008). There are two types of secondary cell wall, the parenchyma type and the sclerenchyma type (Zeng et al., 2014). The parenchyma type is present in the parenchyma and collenchyma, which are living cells; the sclerenchyma type is thicker and highly differentiated and is found in tracheids and fibers, which are dead cells (Zeng et al., 2014). 
Although they have variable structures, secondary walls are organized so that the cellulose microfibrils are embedded in a complex matrix of hemicellulose, pectin, and lignin (Figure 1B) (Schubert, 2006; Loqué et al., 2015; Van de Wouwer et al., 2018; Lorenzo et al., 2019; Terrett and Dupree, 2019). Due to its structural complexity and the way it chemically binds to cellulose, lignin is the major contributor to the recalcitrance of the secondary cell wall to degradation (Li et al., 2008; Mottiar et al., 2016; Mahon and Mansfield, 2019).

Figure 1. Schematic model of plant cell wall composition. (A) Primary plant cell wall found in dicots. (B) Lignified secondary plant cell wall. This figure is from Loqué et al., 2015.

The world currently demands large amounts of energy, and the main source is still fossil fuels, which represent 80% of total consumption (Seh et al., 2017; Correa et al., 2019). Population growth has increased the demand for energy and, consequently, there are concerns about the depletion of oil and the environmental impacts caused by the emission of greenhouse gases (Mccann and Buckeridge, 2014; Seh et al., 2017). Studies indicate that by 2040 biofuel demand may increase to 4.7 mboe/d and that 6% of renewables will be used in transport (IEA, 2018). According to the International Energy Agency (2018), for the world to achieve the 2030 sustainable development scenario, biofuel production would have to triple, and for this reason more efficient biofuel production is needed (Correa et al., 2019). Thus, the development of renewable energy sources to replace fossil fuels has been a priority for many countries (Himmel et al., 2007). The use of biomass to generate energy is important for sustainable development since this source can provide liquid fuel for transportation (Alalwan et al., 2019; Correa et al., 2019). The production of biofuels such as biodiesel and bioethanol can contribute to meeting energy demand safely, promoting rural development and reducing the emission of noxious gases (Macrelli et al., 2014). Bioethanol production from biomass can be based on first-generation or second-generation technology (Mccann and Buckeridge, 2014). First-generation technology relies on direct fermentation of extracts from plants with a high sucrose content, such as sugarcane and beet, or on saccharification (splitting of starch into glucose) followed by fermentation, as for corn and wheat (Mccann and Buckeridge, 2014). Despite the relative success of ethanol production from sugarcane, corn, and wheat, the use of these biomass sources can generate competition with their use as food for humans and animals and with biodiverse landscapes (Alalwan et al., 2019; Correa et al., 2019). 
Second-generation biofuel production could optimize the use of these plant resources and help to reduce the emission of polluting gases from fossil fuel use (reviewed by Correa et al., 2019). Second-generation technology, in turn, uses enzymatic conversion of lignocellulosic material for ethanol production (Mccann and Buckeridge, 2014). In this case, the cell wall structural polysaccharides are broken down by hydrolysis or thermochemical processes into monomeric sugars that can be fermented by microorganisms (Alalwan et al., 2019). However, lignocellulosic biomass is underutilized because lignin restricts microbial access to cellulose (Mahon and Mansfield, 2019). Besides its negative effect on biofuel production, lignin in biomass also affects the efficiency of the paper industry, where it is undesirable because its removal increases costs (Ververis et al., 2004; Alalwan et al., 2019; Mahon and Mansfield, 2019). Thus, it is of great economic interest to produce plants that accumulate less lignin or contain lignin with an altered composition that facilitates cellulose extraction. Several studies have been carried out in recent years in an attempt to understand the most varied aspects of lignin metabolism, from the characterization of genes and enzymes involved in the biosynthesis pathway to the identification of transcriptional regulators and enzymes related to polymerization (Wang and Dixon, 2012; Faraji et al., 2018; Takeda et al., 2018; Pereira et al., 2018; Oyarce et al., 2019; Ralph et al., 2019; Takeda et al., 2019; Terrett and Dupree, 2019). Lignin is a phenylpropanoid and its biosynthetic route shares intermediates with another economically important group of phenylpropanoids, the chlorogenic acids (CGAs). Even though it is well known that the lignin and CGA biosynthesis pathways share intermediates and enzymes, it has not yet been proven whether CGAs can be used as substrates for lignin biosynthesis (Joët et al., 2009; Escamilla-Treviño et al., 2014; Valiñas et al., 2015). CGAs are commonly found in plants, and some species accumulate significant amounts of these phenols. Using the information available about each biosynthesis route, this work proposes to investigate this overlap.

Chlorogenic acids biosynthesis

Chlorogenic acids (caffeoylquinic acids – CGAs) belong to an important group of dietary antioxidants (Niggeweg et al., 2004; Lallemand et al., 2012a; Lallemand et al., 2012b). These metabolites are soluble esters formed from the conjugation of trans-cinnamic acids and quinic acid (Clifford, 1999; Lallemand et al., 2012a). CGAs have an important role in plant defense, since they act as antioxidants in plants. High levels of CGAs can increase pathogen resistance (Niggeweg et al., 2004; Leiss et al., 2009; Pu et al., 2017) and prevent damage caused by abiotic stresses (Clé et al., 2008; Comino et al., 2009). In fact, CGAs have been described as anti-herbivore compounds (Leiss et al., 2009; Kundu and Vadassery, 2019). Their pro-antioxidant effect gives CGAs an anti-nutritive property when consumed by insects (Kundu and Vadassery, 2019). For example, chrysanthemum plants (Dendranthema grandiflora) with a higher content of CGAs showed a higher level of resistance to Frankliniella occidentalis, an important agricultural insect pest (Leiss et al., 2009). In sweet potato, CGA content was associated with resistance to fungi and immature insects (Peterson et al., 2005). Gauthier et al. (2016) described an increase in CGA content as a strategy to defend small-grain cereals and maize from fungal attack. In addition, the nutraceutical value of CGAs has been described for several different human conditions (Yoshimoto et al., 2002; Islam, 2006; Thom, 2007; Yamaguchi et al., 2008; Van Dijk et al., 2009; Oboh et al., 2013; Ohkawara et al., 2017).

The biosynthetic pathway of CGA in plants has yet to be completely elucidated, although three routes have been proposed (Figure 2) (Escamilla-Treviño et al., 2014). The first CGA formed is 5-CQA, and little is known about how the more complex chemical structures of this group are formed.

Figure 2. Proposed biochemical routes for the biosynthesis of CGAs and lignin. The shikimate shunt is highlighted in blue and constitutes a common route towards both CGAs and lignin. The quinate shunt towards CGAs is highlighted in red. An alternative route employing cinnamoyl glucosides as activated intermediates is shown in green. For the biosynthesis of diCQAs, caffeoyl quinate is first transported to the vacuole where HQT catalyzes chlorogenate:chlorogenate transferase (CCT) activity at lower pH. The pathway from caffeoyl-CoA towards G and S lignin units is channeled by CCoAOMT and involves other downstream enzymes. Abbreviations: PAL, phenylalanine ammonia-lyase; C4H, cinnamate 4-hydroxylase; C3H, p-coumarate 3-hydroxylase (ascorbate peroxidase); 4CL, 4-coumarate:CoA ligase; HCT, hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase; HQT, hydroxycinnamoyl-CoA:quinate hydroxycinnamoyl transferase; C3’H, p-coumaroyl shikimate 3′-hydroxylase (CYP98); CSE, caffeoyl shikimate esterase; CCoAOMT, caffeoyl-CoA 3-O-methyltransferase; UGT84, UDP-glucoside transferase; HCGQT, hydroxycinnamoyl D-glucose:quinate hydroxycinnamoyl transferase. This figure is from Silva et al., 2019.


The first route described is also a branch of the lignin biosynthesis pathway. In this route, the shikimate shunt, the enzyme hydroxycinnamoyl CoA:shikimate hydroxycinnamoyl transferase (HCT) catalyzes the esterification of p-coumaroyl-CoA and shikimate into p-coumaroyl shikimate. The 3’-hydroxylation of p-coumaroyl shikimate by p-coumaroyl shikimate 3’-hydroxylase (C3’H) produces caffeoyl shikimate, which is further converted into caffeoyl-CoA by the activity of HCT (Hoffmann et al., 2004) or of caffeoyl shikimate esterase (CSE), with an intermediary step where caffeic acid is produced (Vanholme et al., 2013b). Up to this point, lignin and CGAs share the same pathway; only after caffeoyl-CoA is produced can this compound either be channeled into the biosynthesis of lignin by the enzyme CCoAOMT or be converted into CQA by HCT/HQT. The second pathway, the quinate shunt, involves the conversion of p-coumaroyl-CoA into p-coumaroyl quinate through the activity of HCT or HQT, followed by 3’-hydroxylation by C3’H to produce 5-CQA (Figure 2). In the last possibility, 5-CQA is produced by a transesterification reaction involving a caffeoyl glucoside as the activated intermediate instead of caffeoyl-CoA (Villegas and Kojima, 1986). It is important to highlight the interconnection between the shikimate and quinate shunts through caffeoyl CoA, since the conversion of caffeoyl CoA to CGA mediated by the HCT/HQT enzymes is reversible (Figure 2). The fact that HCT/HQT catalyze more than one reaction, and that these reactions are reversible, increases the complexity of these routes and makes it difficult to determine the importance of each reaction for the metabolic balance between the pathways. 
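The three proposed routes can be pictured as a small directed graph. The following is only an illustrative sketch, not part of the thesis methods: the node and enzyme labels follow the text and the Figure 2 caption, while the 4CL step from caffeic acid to caffeoyl-CoA and the HCGQT label on the glucoside route are assumptions taken from the abbreviation list and general knowledge of the pathway. The names `PATHWAY` and `simple_paths` are hypothetical.

```python
# Illustrative sketch: the routes described in the text as a directed graph.
# Each substrate maps to a list of (product, enzyme) pairs.
PATHWAY = {
    "p-coumaroyl-CoA": [
        ("p-coumaroyl shikimate", "HCT"),        # shikimate shunt
        ("p-coumaroyl quinate", "HCT/HQT"),      # quinate shunt
    ],
    "p-coumaroyl shikimate": [("caffeoyl shikimate", "C3'H")],
    "caffeoyl shikimate": [
        ("caffeoyl-CoA", "HCT"),                 # reverse HCT reaction
        ("caffeic acid", "CSE"),                 # CSE bypass
    ],
    "caffeic acid": [("caffeoyl-CoA", "4CL")],   # assumption: 4CL step
    "caffeoyl-CoA": [
        ("G/S lignin units", "CCoAOMT + downstream enzymes"),
        ("5-CQA", "HCT/HQT"),
    ],
    "p-coumaroyl quinate": [("5-CQA", "C3'H")],
    "5-CQA": [("caffeoyl-CoA", "HCT/HQT")],      # reversible reaction
    "caffeoyl glucoside": [("5-CQA", "HCGQT")],  # assumption: enzyme label
}


def simple_paths(graph, start, goal, path=None):
    """Yield every cycle-free route from start to goal (depth-first)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for product, _enzyme in graph.get(start, []):
        if product not in path:  # do not revisit a compound
            yield from simple_paths(graph, product, goal, path)


if __name__ == "__main__":
    for route in simple_paths(PATHWAY, "p-coumaroyl-CoA", "5-CQA"):
        print(" -> ".join(route))
    for route in simple_paths(PATHWAY, "5-CQA", "G/S lignin units"):
        print(" -> ".join(route))
```

Under these assumptions, the reversible HCT/HQT step is what allows a route from 5-CQA back to caffeoyl-CoA and on towards lignin, which is the connection this work sets out to test experimentally.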
Although the route through HQT has been considered the main pathway for CGA production (Niggeweg et al., 2004), the involvement of HCT in CGA metabolism has been demonstrated in several species such as tobacco, Populus trichocarpa, coffee and switchgrass (Hoffmann et al., 2003; Lallemand et al., 2012b; Escamilla-Treviño et al., 2014; Zhang et al., 2018), even though the affinity of HCT for quinate is much lower than that of HQT (Lallemand et al., 2012b; Walker et al., 2013). Moreover, it has been shown that not all species have HQT in their genome (Escamilla-Treviño et al., 2014), leading to the argument that different species may have different routes to produce CGAs. For example, an HCT displaying specificity for both substrates, quinate and shikimate, was described in coffee (Lallemand et al., 2012a). For fruits of this same species, Joët et al. (2009) suggested that HQT activity would be restricted to the perisperm while HCT would be responsible for CGA production in the endosperm, suggesting that different routes could operate in different tissues in other species too, i.e. the biosynthesis of CGAs in plants might be tissue-dependent. In this sense, the routes could be regulated by enzyme and substrate availability and/or cellular pH, since HQT and HCT seem to have optimal activity at different pH values (Lallemand et al., 2012b). In Panicum virgatum L. (switchgrass), biochemical characterization with recombinant PvHCT-like proteins showed that PvHCT-like 1a and PvHCT-like 2a prefer shikimic acid as acyl acceptor while PvHCT-like 1 prefers quinic acid as acyl acceptor, indicating that the latter has HQT activity (Escamilla-Treviño et al., 2014). The same assay showed that although PvHCT-like 1 and PvHCT-like 2 can catalyze the reverse reaction converting caffeoyl shikimate to caffeoyl CoA, this reaction occurs very inefficiently and is unlikely to occur in switchgrass. Apart from this in vitro evidence, no PvHCT-like mutant has so far been developed to confirm the correlation between CGA and PvHCT-like in switchgrass. The same work also suggested the involvement of CSE in the CGA pathway in switchgrass through the branch shared with lignin biosynthesis. When Escamilla-Treviño et al. (2014) tested stem protein extracts for the production of caffeoyl CoA from caffeoyl shikimate, the main compound produced was caffeic acid, indicating that CSE bypasses the second HCT step to produce caffeoyl CoA. Valiñas et al. (2015) found a positive correlation between CSE and HQT transcripts and 5-CQA accumulation, and the opposite pattern for HCT transcripts, in potato tuber. The authors attributed the negative correlation between HCT and 5-CQA to competition for substrate between CSE and HCT and indicated that HCT is probably involved in 5-CQA catabolism. Thus, in potato tuber the CGA content would be the product of a dynamic balance between 5-CQA production via p-coumaroyl CoA by C4H and HQT and 5-CQA catabolism by HCT and CSE. It is noteworthy that the CSE substrate (caffeoyl shikimate) is structurally similar to caffeoyl quinate (CQA) and, thus, it is plausible that CSE could use CQA as a substrate as well. 
Recombinant Arabidopsis CSE also showed a broader specificity (Vanholme et al., 2013b), suggesting that CSE might be able to use both shikimate and quinate esters as substrates, although with different efficiencies. Moreover, since the CSE steps end with caffeoyl CoA formation (see Figure 2) and HCT may also convert caffeoyl shikimate to caffeoyl CoA, CGA biosynthesis through HQT activity may change according to the pool size of caffeoyl CoA. In this way, considering that caffeoyl CoA can be used as a substrate both by CCoAOMT in lignin biosynthesis and by HQT/HCT in CGA biosynthesis, it is tempting to think that CGA is a metabolic reservoir of caffeoyl acid and plays a role in lignin metabolism (Días et al., 1997; Joët et al., 2009; Escamilla-Treviño et al., 2014). Under this hypothesis, CSE could have a key role in CGA accumulation in species that use caffeoyl CoA as a substrate to produce this antioxidant. However, CSE was discovered only recently and, although it has been added to the lignin/CGA shunt (Vanholme et al., 2013b), its metabolic role has not been completely elucidated. There are only a few studies with CSE mutants (Vanholme et al. 2013; Ha et al. 2016; Saleme et al. 2017; Vargas et al., 2016) and most of them have only evaluated its impact on lignin biosynthesis (Vanholme et al. 2013; Ha et al. 2016; Vargas et al., 2016). For this reason, further studies are necessary to understand the impact of CSE on the CGA biosynthesis pathway. Given its complexity, the study of CGA biosynthesis goes beyond its health benefits. Furthermore, the CGA route may act as a source of carbon skeletons for the lignin pathway, lignin being a compound with a large impact on biofuel production (Días et al., 1997; Comino et al., 2009; Joët et al., 2009; Escamilla-Treviño et al., 2014). This is a plausible idea regardless of which of the two main routes is used for CGA biosynthesis, considering that both pathways have common intermediates and branches to produce them. In Chapter 1 we discuss this interconnection in further detail, as well as how the dynamics between the lignin and CGA pathways possibly operate in different species. Even though several studies suggest this interconnection, how it happens still needs further study.

Lignin biosynthesis

Lignin is mainly deposited in vessels and fiber cells, enabling vascular plants to stand upright, endure mechanical stresses, transport water in the xylem and avoid vasculature collapse under the negative pressures caused by transpiration (Ferrer et al., 2008; Bonawitz and Chapple, 2010; Pereira et al., 2018). Lignin forms complex linkages with the cell wall polysaccharides (cellulose, hemicellulose, and pectin), making the plant cell wall recalcitrant to degradation (Lorenzo et al., 2019; Terrett and Dupree, 2019). Plant cell wall polysaccharides can be converted to fermentable sugars to produce second-generation biofuels (Mccann and Buckeridge, 2014; Alalwan et al., 2019), but the processing of lignocellulosic biomass is still hampered by the presence of lignin (Mahon and Mansfield, 2019). In most angiosperms, lignin is composed of two major subunits, guaiacyl (G) and syringyl (S), with only trace amounts of the p-hydroxyphenyl (H) subunit (Boerjan et al., 2003; Vanholme et al., 2019b). The S/G ratio of lignin determines the degree and nature of polymeric cross-linking (Ferrer et al., 2008). The lower methoxylation level of G subunits leads to a higher proportion of carbon-carbon linkages and, consequently, to greater chemical stability, so that G-rich lignin is more rigid and hydrophobic than lignin rich in S subunits (Ferrer et al., 2008). Radical coupling during lignin polymer deposition is not enzymatically controlled; for this reason, lignin structure is determined by monomer availability during polymerization (Mottiar et al., 2016; Ralph et al., 2019). For this same reason, several other phenylpropanoid derivatives can be naturally incorporated into the lignin polymer (Ralph et al., 2019; Vanholme et al., 2019b). Hydroxycinnamoyl alcohols, hydroxycinnamoyl esters, hydroxypropanols, hydroxycinnamic acids, hydroxycinnamaldehydes, hydroxystilbenes, and the flavone tricin are some examples of these non-canonical monomers (reviewed by Vanholme et al., 2019). Another factor that directly influences the complex nature of lignin structure is the way in which different interunit linkage types connect lignin monomers (Ralph et al., 2019). In addition, the way in which this complex polymer cross-links with cell wall polysaccharides may also influence plant cell wall recalcitrance (Terrett and Dupree, 2019). Given the high complexity of the lignin polymer and its interconnection with cell wall polysaccharides, several strategies have been proposed to reduce biomass recalcitrance: (A) changing lignin composition; (B) changing lignin structure; (C) changing lignin cross-linking with plant cell wall polysaccharides (Marriott et al., 2015; Ralph et al., 2019; Terrett and Dupree, 2019; Vanholme et al., 2019b).
Eleven main enzymes have been described as involved in the lignin biosynthetic pathway. The enzyme HCT is positioned at the beginning of the pathway and was first described as essential to produce G and S subunits in tobacco (Hoffmann et al., 2003). Hoffmann et al. (2004) first demonstrated HCT activity in planta through gene silencing in Arabidopsis thaliana and Nicotiana benthamiana. Since this finding, several studies have shown that silencing HCT results in altered lignin content and composition (Hoffmann et al., 2004; Chen et al., 2006; Shadle et al., 2007; Wagner et al., 2007; Gallego-Giraldo et al., 2014; Peng et al., 2014; Ponniah et al., 2017). In the shikimate shunt (see Figure 1), this enzyme transfers the p-coumaroyl group from p-coumaroyl CoA to shikimate, forming p-coumaroyl shikimate.
In another step of this same shunt, HCT uses caffeoyl shikimate to produce caffeoyl CoA (Hoffmann et al., 2003). However, in vitro enzyme assays showed that, after caffeoyl CoA, p-coumaroyl CoA was the second-best substrate, thus suggesting a reverse reaction (Hoffmann et al., 2003). Although the conversion of caffeoyl shikimate/quinate to caffeoyl CoA was demonstrated in vitro, its efficiency was low (Hoffmann et al., 2004; Lallemand et al., 2012b; Escamilla-Treviño et al., 2014; Wang et al., 2014a). Caffeic acid was recently described as the product of the conversion of caffeoyl shikimate by the enzyme caffeoyl shikimate esterase (CSE) in Arabidopsis thaliana (Vanholme et al., 2013b). The role of CSE in the lignin pathway has already been described in A. thaliana, Medicago truncatula (dicot, Leguminosae), poplar (Populus deltoides, dicot, Salicaceae), and switchgrass (Panicum virgatum, monocot, Poaceae) (Vanholme et al., 2013; Ha et al., 2016; Saleme et al., 2017; Vargas et al., 2016). Escamilla-Treviño et al. (2014) could not detect the formation of caffeoyl CoA from caffeoyl shikimate when they used crude protein extracts from switchgrass. Instead, they found caffeic acid as the main product of caffeoyl shikimate conversion. Taken together, and although it still needs to be proved, these results suggest that, at least for some species, the conversion of caffeoyl shikimate into caffeoyl CoA by HCT may not be the preferred reaction in vivo. It is plausible that CSE has an important role in the route, bypassing the reaction catalyzed by HCT. Caffeic acid would be converted by 4CL to caffeoyl CoA (Vanholme et al., 2013b), and the CSE steps would end with caffeoyl CoA formation (see Figure 1), which can be used in the lignin route through CCoAOMT/CCR (Bonawitz and Chapple, 2010) or in the CGA pathway by the action of HCT/HQT enzymes (Escamilla-Treviño et al., 2014). Although CSE may be the preferred branch for caffeoyl CoA production, it may not be the only one, since some important species such as Brachypodium distachyon, maize, sorghum, and sugarcane do not possess putative orthologues of this enzyme (Vicentini et al., 2015; Ha et al., 2016). Mutants with reduced CSE expression have less lignin and are enriched in H units (Vanholme et al., 2013b; Ha et al., 2016; Saleme et al., 2017), a phenotype usually found in hct (Chen et al., 2006; Shadle et al., 2007; Wagner et al., 2007; Gallego-Giraldo et al., 2011; Vanholme et al., 2013a) and c3h (Franke et al., 2002a; Takeda et al., 2019) mutants. This result is in line with the suggestion that CSE is part of the same branch of lignin biosynthesis. Thus, considering that the CSE steps end with caffeoyl CoA formation (see Figure 2) and that HCT may convert caffeoyl shikimate to caffeoyl CoA, CGA biosynthesis through HQT activity may change according to the pool size of caffeoyl CoA.
It is possible that different factors control the balance between these two metabolic alternatives, such as cellular localization (vacuolar or cytoplasmic), pH, and probably the concentration of CGA in the cell (Moglia et al., 2014). Since caffeoyl CoA can be used as a substrate both by CCoAOMT in lignin biosynthesis and by HQT/HCT in CGA biosynthesis, it is tempting to think that CGA is a metabolic reservoir of caffeic acid and plays a role in lignin metabolism (Días et al., 1997; Joët et al., 2009; Escamilla-Treviño et al., 2014).

Relationship between the lignin and chlorogenic acid biosynthetic pathways

Although the lignin route has been extensively studied, it remains unclear whether chlorogenic acids (monocaffeoylquinic acids, CGAs) can be used as a source of carbon skeletons for the lignin pathway. Several reports suggest that wounding stress in potato and carrot can induce the conversion of CGA into lignin, probably as a defense against pathogens (Gamborg, 1966; Friend et al., 1973; Becerra-Moreno et al., 2015; Jacobo-Velázquez et al., 2015). Joët et al. (2009) found that the time window of expression of genes related to CGA catabolism matched that of cell wall lignification in Coffea arabica, suggesting that this conversion would be mediated by CaHCT1. In agreement with these findings, Lepelley et al. (2007) observed that, although the CGA level remains constant during grain development, the level of quinic acid (a CGA precursor) falls during endosperm expansion and grain maturation in coffee, the same period in which HCT and CCoAOMT gene expression increases. A decrease in quinic acid (Rogers et al., 1999) and an increase in total CGA content (Bertrand et al., 2003; Castro and Marraccini, 2006) during coffee grain development had been previously reported. Also in coffee, a decrease in CGAs coincides with an increase in lignin deposition in seedlings (Aerts and Baumann, 1994). Similar results were described by Días et al. (1997) in Capsicum annuum L. during early plant development. Thus, the balance between the two routes might change depending on the cell type and on the cell's need to allocate carbon to new structures. For example, the role of HQT might be restricted to specific developmental stages or environmental conditions. The transcriptional profile obtained by Joët et al. (2009) also suggests the remobilization of CGA towards monolignol biosynthesis in coffee seeds, as the genes encoding the first three enzymes of the phenylpropanoid pathway (PAL, C4H, and 4CL) did not display a transcription profile that matched the lignification process associated with endosperm hardening during coffee seed development. Interestingly, Mondolot et al. (2006) reported the transport of CGA to the vascular tissue of old leaves in C. canephora, indicating its remobilization among tissues.
In coffee, most chlorogenic acid studies have focused on the endosperm (Rogers et al., 1999; Castro and Marraccini, 2006; Lepelley et al., 2007; Joët et al., 2009) because these compounds have an important role in beverage quality (Clifford, 1999; Mazzafera, 1999; Casas et al., 2017). Despite its importance to beverage quality, the coffee seed is physiologically very different from the stem, the main source of lignocellulosic biomass. Trees and grasses are the main biomass reservoirs and the focus of most studies that try to understand lignin biosynthesis. For example, maize and sugarcane have been gaining the attention of scientists who study lignin because of their potential to improve biofuel production (Jung et al., 2012; Vicentini et al., 2015; Fornalé et al., 2017). Thus, understanding the relationship between HCT, CSE, and HQT, and how they influence the lignin and CGA pathways, can be useful for developing plants with lower lignin content and higher antioxidant potential due to CGA accumulation. On the one hand, studies in several species such as tobacco, coffee, sorghum, poplar and switchgrass (Hoffmann et al., 2004; Lallemand et al., 2012b; Walker et al., 2013; Escamilla-Treviño et al., 2014; Wang et al., 2014a) suggest that HCT can act in both the lignin and CGA pathways. On the other hand, HCT has a lower in vitro affinity for quinate (Lallemand et al., 2012b) than HQT, which is described as the main enzyme responsible for CGA biosynthesis in coffee, artichoke, potato, tomato and tobacco (Niggeweg et al., 2004; Lepelley et al., 2007; Comino et al., 2009; Sonnante et al., 2010; Payyavula et al., 2015). In addition, only a few studies have explored the potential role of CCoAOMT in CGA accumulation (Campa et al., 2003; Jiang et al., 2014; Valiñas et al., 2015), despite its function in stress response (Senthil-Kumar et al., 2010; Giordano et al., 2016; Wang and Balint-Kurti, 2016). Notably, Valiñas et al. (2015) found a strong correlation between CCoAOMT expression and CGA production in potato. Similarly, in coffee, a CCoAOMT allele seems to be strongly related to CGA production (Campa et al., 2003). The correlation between CGAs and lignin during early plant development was studied in Capsicum annuum L. (Días et al., 1997). The authors suggested that CGAs were potential precursors for lignin biosynthesis, a hypothesis later endorsed by the work of Joët et al. (2009) in coffee. Considering that caffeoyl CoA can be used as a substrate both by CCoAOMT in lignin biosynthesis and by HQT/HCT in CGA biosynthesis, it is plausible that CCoAOMT regulation affects CGA content. It is possible that CGA is a storage compound that is subsequently re-routed towards lignin biosynthesis during specific developmental stages (Días et al., 1997).

This thesis: efforts to obtain stronger evidence that CGA is a substrate for lignin biosynthesis

To gain insight into the correlation between the two routes, we developed transgenic tobacco lines with altered expression of three key enzymes (CSE, HQT, and HCT), including lines in which the overexpression of one gene is combined with the repression of another. Briefly, we transformed plants with seven different constructs: pCaMV35S::CSE (CSE overexpression); pCaMV35S::HCT (HCT overexpression); pCaMV35S::HQT (HQT overexpression); pCaMV35S::amiRNACSE (CSE downregulation); pCaMV35S::HCT::pCsVMV::amiRNAHQT (HCT overexpression combined with HQT downregulation); pCaMV35S::HQT::pCsVMV::amiRNAHCT (HQT overexpression combined with HCT downregulation); and pCaMV35S::HCT::pCsVMV::amiRNACSE (HCT overexpression combined with CSE downregulation). Although we developed transgenic lines for all seven constructs, not all of them were analyzed because of technical problems. Powdery mildew contamination in our greenhouse destroyed half of our plants before flowering. Moreover, the double constructs pCaMV35S::HCT::pCsVMV::amiRNAHQT (HCT overexpression combined with HQT downregulation) and pCaMV35S::HQT::pCsVMV::amiRNAHCT (HQT overexpression combined with HCT downregulation) generated several lines (10 and 80, respectively), but in T0 we did not find any line that combined a high expression level of the overexpressed gene with downregulation of the targeted gene. When we screened the plants transformed with pCaMV35S::HCT (HCT overexpression) and pCaMV35S::HQT (HQT overexpression), we found only one line per construct with a high expression level of the overexpressed gene. For this reason, plants transformed with these five constructs were not used in Chapter 2, although the construction of the vectors and the production of the plant mutants took considerable time. In Chapter 3, we applied genome editing by CRISPR/Cas9 to induce mutations in CSE, CCoAOMT, and HCT in tobacco through Agrobacterium-mediated transient expression. We also generated 8 stable plants carrying the CRISPR HCT construct, but we did not find any mutation, which may be related to the low number of plants evaluated. Genome editing by CRISPR/Cas9 allows more reproducible and accurate results, in addition to the generation of “transgene-free” plants, which would imply higher acceptance by the general public (Belhaj et al., 2015; Lowder et al., 2015; Ma et al., 2015; Tong et al., 2015; Zhou et al., 2015). In Chapter 1 we review and discuss the biochemical and molecular evidence of the metabolic re-routing of CGAs towards lignin. Most studies of the relationship between CGA and lignin evaluated the carbon flow between these pathways only in vitro, by enzymatic assays, or by analysis of transcript levels. Although these approaches are important and helpful, they are only the first steps towards understanding the metabolic relationship between CGA and lignin. Studies using mutants for different genes of both pathways combined with a metabolomic approach would help to clarify the role of CGA in lignin biosynthesis and how it works in vivo.
This information can be used not only to improve biomass utilization for second-generation bioethanol and cellulose production, but also to improve food quality, since high levels of antioxidants such as CGAs have high nutraceutical value.


Chapter 1

This chapter was published as a review article:

Volpi e Silva N, Mazzafera P, Cesarino I. Should I stay or should I go: are chlorogenic acids mobilized towards lignin biosynthesis? Phytochemistry, v. 166, October 2019, 112063. DOI: https://doi.org/10.1016/j.phytochem.2019.112063

Should I stay or should I go: are chlorogenic acids mobilized towards lignin biosynthesis?

Nathalia Volpi e Silvaᵃ, Paulo Mazzaferaᵃ,ᵇ, Igor Cesarinoᶜ,*
ᵃ Department of Plant Biology, Institute of Biology, State University of Campinas, Campinas - SP, Brazil
ᵇ Department of Crop Science, College of Agriculture “Luiz de Queiroz”, University of São Paulo, Piracicaba - SP, Brazil
ᶜ Department of Botany, Institute of Biosciences, University of São Paulo, Rua do Matão 277, CEP 05508-090, São Paulo - SP, Brazil

*Corresponding author: Igor Cesarino, +55 11 3091 7550, [email protected]

Abstract

Chlorogenic acids (CGAs) and lignin are both products of the phenylpropanoid pathway. Whereas CGAs have been reported to play a role during stress responses, lignin is a major component of secondary cell walls, providing physical strength and hydrophobicity to supportive and water-conducting tissues. Because the chemical structure of CGAs largely resembles those of some lignin intermediates and because CGAs can be converted back to hydroxycinnamoyl-CoAs in vitro, CGAs have been considered authentic intermediates of the lignin biosynthetic pathway. However, it is still unclear whether and how the CGA pool can be channelled towards the production of lignin monomers in response to developmental or environmental signals. Comprehensive studies on the catalytic activity of recombinant enzymes together with functional characterizations in planta have been very useful in understanding the potential interdependence between these two metabolic routes. Here we present the current understanding of CGA metabolism and discuss the biochemical and molecular evidence of the metabolic re-routing of CGAs towards lignin.

Key words: chlorogenic acids; lignin; phenylpropanoids; shikimate; quinate; hydroxycinnamoyl CoA:shikimate/quinate hydroxycinnamoyl transferase; caffeoyl shikimate esterase

Chapter 1 - The full version of this article is available online at the link: https://www.sciencedirect.com/science/article/pii/S0031942219304868


Chapter 2


The role of CSE and HCT in chlorogenic acid and lignin biosynthesis in tobacco.

Nathalia Volpi e Silvaᵃ; Felipe Thadeu Tolentinoᵃ; Ewerton Ribeiroᵃ; Rafaela Gagetti Bulgarelliᵃ; Juliana Mayerᵃ; Franklin Magnum de Oliveira Silvaᵃ; Eduardo Kiyotaᵃ; Juan P.P. Llerenaᵃ; Igor Cesarinoᶜ; Paulo Mazzaferaᵃ,ᵇ.
ᵃ Department of Plant Biology, Institute of Biology, State University of Campinas, Campinas - SP, Brazil
ᵇ Department of Crop Science, College of Agriculture “Luiz de Queiroz”, University of São Paulo, Piracicaba - SP, Brazil
ᶜ Department of Botany, Institute of Biosciences, University of São Paulo, Rua do Matão 277, CEP 05508-090, São Paulo - SP, Brazil

Abstract

Phenylpropanoids are involved in several aspects of plant defense against biotic and abiotic stresses. Chlorogenic acids (CGAs) and lignin are both products of the phenylpropanoid pathway. While CGAs have been associated with tolerance to stresses and diseases, lignin is a major component of secondary cell walls, providing physical strength and hydrophobicity to supportive and water-conducting tissues. Although the metabolic pathways leading to the biosynthesis of CGAs and lignin have several common intermediates, it is still unclear whether and how the CGA pool can be used as a reservoir for lignin biosynthesis. Here we developed transgenic tobacco plants with altered expression of HCT and CSE, key genes of lignin metabolism, to understand the interconnection between these metabolic pathways and to evaluate the role of CGA as a source of carbon skeletons for the lignin pathway. Our results indicate that alterations in the lignin pathway affect CGA and plant cell wall content, especially in mutants overexpressing CSE, which showed a significant increase in CGA, indicating that this gene might also be related to CGA metabolism. In addition, a complex regulatory network within lignin biosynthesis seems to be affected, since other phenylpropanoid metabolites were also altered. In conclusion, our results strengthen the hypothesis that CGA provides carbon skeletons to the lignin pathway. In the opposite direction, an excess of lignin biosynthesis may redirect carbon to the CGA pathway.

1. Introduction

Chlorogenic acids (CGAs) belong to an important group of dietary antioxidants (Niggeweg et al., 2004; Lallemand et al., 2012a; Lallemand et al., 2012b). These metabolites are soluble esters formed from the conjugation of trans-cinnamic acids and quinic acid (Clifford, 1999; Lallemand et al., 2012a). High levels of CGA can increase pathogen resistance (Niggeweg et al., 2004; Leiss et al., 2009; Pu et al., 2017) and prevent damage caused by abiotic stresses (Clé et al., 2008; Comino et al., 2009). In addition, CGAs have great potential as nutraceuticals considering their benefits for human health, mainly as antioxidants (Olthof et al., 2001; Thom, 2007; Yamamoto and Obokata, 2008; Van Dijk et al., 2009; Oboh et al., 2013). The lignin and CGA biosynthetic pathways potentially share intermediates and enzymes, and several studies suggest a connection between these routes (Hoffmann et al., 2004; Lepelley et al., 2007; Joët et al., 2009; Escamilla-Treviño et al., 2014; Becerra-Moreno et al., 2015; Jacobo-Velázquez et al., 2015). Lignin is mainly deposited in xylem cells and fibers, enabling vascular plants to stand upright, endure mechanical stresses, transport water in the xylem and avoid xylem collapse under negative pressures during high transpiration rates (Ferrer et al., 2008; Bonawitz and Chapple, 2010; Pereira et al., 2018). On the other hand, lignin forms a complex matrix with the cell wall polysaccharides (cellulose, hemicellulose, and pectin) and is a major contributor to biomass recalcitrance (Li et al., 2008; Cesarino et al., 2012; Mottiar et al., 2016; Lorenzo et al., 2019). This recalcitrance has consequences for downstream applications such as chemical pulping in the paper industry, forage digestibility and biofuel production, since lignin removal demands chemical reagents, increasing process costs (Ververis et al., 2004; Schubert, 2006; Faraji et al., 2018; Figueiredo et al., 2019). However, manipulation of lignin biosynthesis has provided a basis for generating plants with reduced lignin content, altered composition, and increased saccharification efficiency (Hoffmann et al., 2004; Vanholme et al., 2013b; Tong et al., 2015; Vargas et al., 2016; Saleme et al., 2017).
The key enzymes linking lignin and CGA metabolism are hydroxycinnamoyl CoA:shikimate hydroxycinnamoyl transferase (HCT), hydroxycinnamoyl CoA:quinate hydroxycinnamoyl transferase (HQT) and caffeoyl shikimate esterase (CSE). Together, these enzymes are responsible for the balance among caffeoyl CoA, caffeoylquinic acid (CGA) and caffeoyl shikimic acid (Hoffmann et al., 2003; Niggeweg et al., 2004; Vanholme et al., 2013b). At this point in the route, caffeoyl CoA can be used either by the enzyme caffeoyl CoA 3-O-methyltransferase (CCoAOMT) towards the production of coniferyl and sinapyl alcohols, the precursors of the G and S units in the lignin backbone, respectively, or by HCT/HQT to produce CGA. Theoretically, an excess of CGA could be converted to caffeoyl CoA to be used for lignin production (Joët et al., 2009; Escamilla-Treviño et al., 2014). Conversely, overstimulated lignin biosynthesis could redirect precursors to CGA biosynthesis.

So far, most of the data supporting this interconnection were obtained from in vitro or transient assays (Hoffmann et al., 2004; Joët et al., 2009; Escamilla-Treviño et al., 2014; Valiñas et al., 2015). Here, we used genetically modified tobacco plants to provoke an imbalance between the CGA and lignin pathways and obtain evidence of their interconnection. We produced single and double mutants for the enzymes HCT and CSE. It is already accepted that there is an interconnection between CGA and lignin (Hoffmann et al., 2004; Lepelley et al., 2007; Joët et al., 2009; Escamilla-Treviño et al., 2014; Becerra-Moreno et al., 2015; Jacobo-Velázquez et al., 2015; Valiñas et al., 2015), but the extent to which it occurs and how it affects lignin biosynthesis in particular are still unknown. Here we aimed to study the interconnection between lignin and CGA through caffeoyl CoA and the role of CGA as a source of carbon skeletons for lignin biosynthesis. Moreover, CSE was only recently discovered in the lignin pathway (Vanholme et al., 2013b); before that, HCT was believed to be responsible for converting caffeoyl shikimate into caffeoyl CoA, even though in vitro reactions showed that HCT is more efficient in the reverse reaction (Hoffmann et al., 2004; Lallemand et al., 2012b; Wang et al., 2014a). The role of HCT in caffeoyl CoA production has only been demonstrated in enzymatic assays, and it has never been shown whether HCT is able to convert caffeoyl shikimate into caffeoyl CoA in planta. Here, we show that caffeoyl CoA is produced mainly via CSE in tobacco. Furthermore, we obtained strong evidence that HCT is capable of producing caffeoyl CoA in vivo in the absence of CSE, restoring normal lignin and CGA biosynthesis.

2. Material and Methods

2.1. Sequence analysis

The full-length sequences of all alleles of HCT, HQT and CSE were retrieved by keyword search and by alignment with sequences previously described in the literature [Hoffmann et al. (2003) (AJ5078251), Niggeweg et al. (2004) (AJ582651 and AJ582652) and Vanholme et al. (2013b) (AT1G52760)] in publicly available databases: NCBI (National Center for Biotechnology Information), SOL Genomics (Bombarely et al., 2011) and the Tobacco EST clones from BY-2 cells Database Search (Altschul et al., 1997). Tobacco was chosen as plant material for this study because it accumulates chlorogenic acids (Niggeweg et al., 2004). The sequences were aligned in BioEdit Sequence Alignment Editor v. 7.0.9.0 (Hall, 1990) using the following parameters: 85% minimum match percentage and 20 bp minimum overlap. The contigs obtained were compared with NCBI sequences using the BlastX algorithm (Altschul et al., 1990). Domain search was performed using ScanProsite (http://www.expasy.ch; Castro et al., 2006).
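The two assembly thresholds mentioned above (85% minimum match and 20 bp minimum overlap) can be illustrated with a minimal Python sketch. It is not the assembly routine implemented in BioEdit: the function name `best_overlap`, the ungapped suffix/prefix comparison and the example sequences are assumptions made only for illustration.

```python
def best_overlap(left, right, min_overlap=20, min_identity=0.85):
    """Return (length, identity) of the longest ungapped overlap between the
    end of `left` and the start of `right` that satisfies both thresholds,
    or None if no overlap qualifies."""
    max_len = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink towards the minimum.
    for length in range(max_len, min_overlap - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        matches = sum(a == b for a, b in zip(suffix, prefix))
        identity = matches / length
        if identity >= min_identity:
            return length, identity
    return None


# Hypothetical 24-bp reads (not sequences from this study); the second read
# differs from the first at its first and last positions.
read_a = "ACGTACGATCGATCGGCTAGCTAA"
read_b = "TCGTACGATCGATCGGCTAGCTAC"
print(best_overlap(read_a, read_b))
```

Two reads would be merged into a contig only when the function returns a result; reads shorter than the minimum overlap, or reads that share too few identical positions, return `None`.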

2.2. Vector assembly and amiRNA design

RNA was extracted from tobacco leaves following the protocol of Chang et al. (1993). DNase I (Bio-Rad™) was used to eliminate DNA, and RNA was quantified at 260 nm in a spectrophotometer. cDNA synthesis was performed with SuperScript III® (Invitrogen™). Full-length HCT and partial CSE (877 of 984 bp) were amplified by RT-PCR and the amplification products were cloned into the pDONR221® vector using the Gateway® System. Because the vectors used to create the expression cassettes are based on Gateway® technology, two rounds of PCR were required to add the attB recombination sites to the PCR product. The first PCR amplifies the gene of interest and adds part of the attB site. The second PCR completes the attB site, which allows recombination of the PCR product with the entry vector pDONR221. The PCR product was recombined with the entry vector (pDONR221™) using the BP clonase enzyme. After confirming the identity of the cloned genes in pDONR221™, the LR clonase reaction of the Gateway® system was performed to recombine the genes into the final expression vector pK7WG2. To design the amiRNA primers for the CSE gene we used the website http://wmd3.weigelworld.org/cgi-bin/webapp.cgi (Ossowski et al., 2008). The amiRNA was constructed by PCR-based mutagenesis using MIR319a as the precursor and the plasmid pRS300 (Schwab et al., 2006). The first step in the construction of the amiRNAs consists of three PCRs: (a), (b) and (c). In the second step, the products of these three reactions are used as the template for amiRNA production. After this step, we added the attB sites by PCR using the primers AttB1 and AttB2. The PCR product was recombined with the pDONR221™ entry vector using the BP clonase reaction, and the colonies obtained were subjected to colony PCR to confirm the presence of the amiRNAs in the vector.
After confirmation in the pDONR221™ vector, one colony was sequenced to confirm that the correct sequence had been inserted into the entry vector. Next, the amiRNACSE was transferred to the final vector pK7WG2 by LR clonase reaction. The insertion of the amiRNA into the pK7WG2 vector was confirmed by PCR. The vector was then transformed into A. tumefaciens, and colonies that were positive in colony PCR were used for tobacco transformation.

In total, three types of constructs were developed. First, we assembled by Gateway® the CSE amiRNA with the constitutive promoter CsVMV (Verdaguer et al., 1996) into the pK7WG2 vector to create the downregulation vector. Second, we produced the overexpression vectors for HCT and CSE, both using the CaMV35S promoter in pK7WG2. Third, we produced a double-cassette construct using the cassette from the Gateway® Multisite vector pXB2m43GW2 to insert pCsVMV::amiCSE and fuse it to pK7WG2-HCT. After cloning HCT into pK7WG2, we linearized the vector with XbaI and dephosphorylated it with alkaline phosphatase. The same restriction enzyme was used to excise the m43GW2 fragment from pXB2m43GW2. After purification of the fragments, the DNA was quantified using a NanoDrop spectrophotometer, and pK7WG2-HCT and the m43GW2 fragment were ligated with T4 ligase (Invitrogen) following the manufacturer's guidelines. The engineered vector was cloned into a ccdB-resistant E. coli strain. Using this strategy, a new Gateway Multisite® recombination site was created: pK7WG2-HCT::m43GW2. We used three building blocks to assemble the double-cassette vector at this new recombination site: pEN-L4-4-R1 to insert pCsVMV, pDONR221 to insert amiCSE, and pEN-R2-8-L3 to insert the tOCS terminator. After recombination, the vectors were transformed into E. coli DH10B by heat shock, and colony PCR was performed to confirm that the construct was correct. After confirmation, the vectors were introduced into A. tumefaciens and the positive colonies were used for tobacco transformation.

2.3. Plant Transformation

To generate stable transgenic tobacco plants, the binary vectors were introduced into Agrobacterium tumefaciens EHA105 and leaf tissues were transformed following the protocol described by Horsch et al. (1985). We used 3-month-old tobacco plants. Six rounds of transformation were performed for each experiment, and each experiment corresponded to the transformation of one of the constructs described in Table 1. Plants that showed normal root development were transferred to a growth chamber under a 12 h light photoperiod at 25 °C. T1 plants were germinated in vitro in a B.O.D. chamber set to 16 h of light and 25 °C on MS medium (Murashige and Skoog, 1962) supplemented with kanamycin (100 mg/L) and transferred to the greenhouse after one month.


Table 1. Experimental design to obtain transgenic plants.

Construction used    Expected result                                    Number of plants obtained
CSE 1 and 2          Overexpression of CSE                              18
amiCSE               Downregulation of CSE                              9
HCT:amiCSE           Overexpression of HCT and downregulation of CSE    28

2.4. qRT-PCR analyses

Six-month-old plants cultivated in a greenhouse in pots holding 5 kg of soil were used in the next steps of the experiments. To analyze the expression levels of CSE, HCT and HQT in different tissues of wild-type (WT) tobacco, we extracted RNA from the following tissues: flowers, old leaves (2 basal leaves), young leaves (4 apical leaves), old stem (10 cm from the basal stem), young stem (10 cm from the apical stem) and root, all in four biological replicates. The samples were immediately frozen in liquid nitrogen and stored at -80 °C until further analysis. For qPCR analyses of the T1 generation of the mutants, 1-year-old plants were collected. The stem was collected from 5 cm above the base up to the 7th internode, and all leaves from the 7th to the 10th internode were collected and stored at -80 °C for subsequent analyses. For RNA extraction we used Trizol (Life Technologies) and DNase I from Ambion. For cDNA synthesis, we used 1 µg of RNA and the iScript cDNA Synthesis Kit (Bio-Rad). The cDNA was diluted 50X and 3 µL were used for each qPCR reaction, which was carried out with the iTaq Universal SYBR Green Supermix (Bio-Rad). To screen and analyze mutants and double mutants, we extracted RNA from leaves and stem using the RNeasy Plant Mini Kit (Qiagen) and the RNase-free DNase Set (Qiagen). cDNA was produced from 500 ng of RNA using the iScript cDNA Synthesis Kit (Bio-Rad). The cDNA was diluted 25X and 3 µL were used for each qPCR reaction. All analyses were done using the qbase+ software, version 3.0 (Biogazelle, Zwijnaarde, Belgium - www.qbaseplus.com). The expression level was normalized with the reference genes PP2a and EF1a (Schmidt and Delaney, 2010). The primers used are shown in Table 2.
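As an illustration of normalization against two reference genes, the sketch below computes a relative expression value in which the target gene quantity is divided by the geometric mean of the reference gene quantities. It is a simplified model and not the qbase+ implementation: the function names, the assumption of 100% amplification efficiency (E = 2) and the Cq values are hypothetical.

```python
from statistics import geometric_mean


def relative_quantity(cq, calibrator_cq, efficiency=2.0):
    """Quantity relative to the calibrator sample: E ** (Cq_calibrator - Cq_sample)."""
    return efficiency ** (calibrator_cq - cq)


def normalized_expression(target, references, efficiency=2.0):
    """Normalize a target gene against the geometric mean of the reference genes.

    target     -- (sample_cq, calibrator_cq) for the gene of interest
    references -- list of (sample_cq, calibrator_cq), one pair per reference gene
    """
    target_rq = relative_quantity(target[0], target[1], efficiency)
    reference_rqs = [relative_quantity(cq, cal, efficiency) for cq, cal in references]
    return target_rq / geometric_mean(reference_rqs)


# Hypothetical Cq values (not data from this study): transgenic line vs. WT calibrator.
cse = (22.0, 24.0)                     # CSE in the line, CSE in WT
refs = [(20.0, 20.0), (21.0, 21.0)]    # PP2a and EF1a in the line and in WT
print(normalized_expression(cse, refs))
```

Using the geometric rather than the arithmetic mean keeps a single reference gene with an unusually high or low quantity from dominating the normalization factor.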


Table 2. Primers designed for qRT-PCR analyses.

Primer | Sequence 5' - 3'
HQTqPCR1_F | CAGATTTTGGATGGGGAAGG
HQTqPCR1_R | GCCAAACGCAAGTTCCTATC
HCTqPCR_F | AAACCAGCGTGTCCATCTTC
HCTqPCR_R | ACACATGTCCTGCCAACATC
CSEqPCR_F | TTGCCAGGCAGTGTGAATAC
CSEqPCR_R | GCTTGAGGGATTTGTCCTTG
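The normalization against two reference genes described in section 2.4 can be illustrated with a minimal sketch. It assumes a simplified relative-quantification model: the relative quantity of each gene is the amplification efficiency raised to the Cq difference from a calibrator sample, and the target gene is divided by the geometric mean of the reference genes. The Cq values, the fixed efficiency of 2.0 and the function names are hypothetical, and the sketch is not the qbase+ implementation, which may apply further corrections.

```python
import math


def relative_quantity(cq_sample, cq_calibrator, efficiency=2.0):
    """Relative quantity of one gene in a sample versus a calibrator sample.

    efficiency=2.0 assumes perfect doubling per cycle (an assumption).
    """
    return efficiency ** (cq_calibrator - cq_sample)


def normalized_relative_quantity(target, references, efficiency=2.0):
    """Normalize a target gene to the geometric mean of the reference genes.

    target and every item of references are (cq_sample, cq_calibrator) pairs.
    """
    rq_target = relative_quantity(target[0], target[1], efficiency)
    rq_refs = [relative_quantity(s, c, efficiency) for s, c in references]
    geometric_mean = math.exp(sum(math.log(rq) for rq in rq_refs) / len(rq_refs))
    return rq_target / geometric_mean


# Hypothetical Cq values: target gene, then two reference genes (e.g. PP2a, EF1a)
nrq = normalized_relative_quantity(
    target=(22.0, 24.0),
    references=[(20.0, 20.0), (17.0, 18.0)],
)
print(f"Normalized relative quantity: {nrq:.3f}")
```

Using the geometric rather than the arithmetic mean keeps one strongly varying reference gene from dominating the normalization factor.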

2.5. Phenolic profiling

Phenolic profiling was carried out by Ultra-Performance Liquid Chromatography coupled to a Mass Spectrometer (UPLC-MS/MS) using the protocol described by Torras-Claveria et al. (2012), with modifications. Samples of leaf and stem (30 mg) from 1-year-old mutant and WT plants were lyophilized and used for the analyses. The stem was collected up to the 7th internode, excluding the first 5 cm from the base, and all leaves from the 7th to the 10th internode were collected and stored at -80 °C for all biochemical analyses. The material was extracted with 500 µL of methanol:H2O (4:1, v/v). This mixture was vortexed for 30 s, sonicated for 10 min, and shaken for 2 h at room temperature. After centrifugation at 10,000 rpm for 15 min, the supernatant was collected and dried using a centrifugal vacuum evaporator.

The same volume of methanol:H2O (4:1, v/v) was added to the pellet and the procedure was repeated. Dried N. tabacum extracts were redissolved in 600 µL of ethanol:H2O (1:1, v/v) and filtered through a 0.2 µm syringe filter (PTFE Millex-LG, Merck).

2.6. Lignin quantification

Lignin was quantified using the acetyl bromide method (Foster et al., 2010). We also quantified lignin monomers and the S/G ratio by Ultra-Performance Liquid Chromatography coupled to a Mass Spectrometer (UPLC-MS), according to the protocol described by Mokochinski et al. (2015). After lyophilization of the stem samples, 80 mg of each sample was used to determine the S/G ratio. Initially, the samples were hydrolysed in 2 mL of 4 M NaOH at 95 °C for 24 hours. Next, the samples were acidified with 1.6 mL of 6 M HCl, mixed for neutralization and centrifuged at 13,000 rpm for 5 minutes. An aliquot of 500 µL of the supernatant was transferred to a 2 mL tube and 1 mL of ethyl acetate was added to extract the organic phase. The last step was repeated, and the samples were dried under a stream of N2. Lastly, each sample was resuspended in 1 mL of Milli-Q water and the solution was analyzed by UPLC-MS.

2.7. Plant cell wall polysaccharides and saccharification

Lyophilized stem tissue (100 mg) was used to determine cell wall polysaccharides, following the protocol described by Chen et al. (2002). The percentage of saccharification was determined using 30 mg of lyophilized tissue, following the protocol described by Llerena et al. (2019).

2.8. Morphological and histochemical analyses

The selected transgenic lines (three each for HCTamiCSE and CSE), together with WT, were grown in a randomized arrangement in the greenhouse for 1 year, by which time they were approximately 60-70 cm tall (Supplementary Figure 1). Height, internode length, leaf number and area, and leaf and stem fresh and dry weight were measured in these plants, but the number of replicates varied for each line according to the plants available for analysis. Histochemical analyses were performed as a first step to verify changes in lignin deposition and morpho-anatomical alterations. These analyses followed the protocol described by Vanholme et al. (2013b). The sections were made in the 7th internode of the plants. For each line, three plants were analyzed to ensure that the differences found were due to the presence of the transgene and not merely to biological variation. The fresh material was stained with the Wiesner stain (1 g of phloroglucinol in 100 mL of 95% EtOH and 16 mL of 37% HCl) by placing a drop on top of the section. For Mäule staining, the samples were incubated for 5 minutes in 1% KMnO4 solution, rinsed with water, incubated in 37% HCl, and one drop of NH4OH (14.8 M) was added. The materials were observed in a Zeiss microscope, model AXIOSKOPE, for documentation and later analysis.

2.9. Statistical analyses

The statistical analyses were carried out with RBio (Bhering, 2017) and R statistical software version 3.1.2 (Team, 2011). We performed one-way analysis of variance (ANOVA), and the means were compared by the Tukey test at a 5% significance level. To integrate the data, we performed multivariate analysis by Principal Component Analysis (PCA) with Minitab® 17 (Minitab 17 Statistical Software). For PCA, data were normalized to maximize the variance of each component. Moreover, to make the results easier to interpret, the metabolic profiling data were represented graphically as a heatmap (Howe et al., 2010), and a correlation network was built using the RBio software.
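The one-way ANOVA mentioned above compares the variation between group means with the variation within groups. The following is a minimal sketch of the F statistic only, written with the Python standard library; the function name and the numeric values are hypothetical, and it does not reproduce the RBio or R workflow used in this work. The p-value (from the F distribution) and the Tukey post hoc comparison are not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three lines with three replicates each
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value, df_b, df_w = one_way_anova_f(example_groups)
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

With only two or three replicates per line, as in several analyses here, the within-group degrees of freedom are small, which limits the power of the test.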

3. Results

3.1. Bioinformatic analysis

3.1.1. HCT

Four HCT genes (gene_27881, gene_45849, gene_29243, and gene_83292) were identified using the NCBI and SOL Genomics databases. These four genes are described in the SOL Genomics database and are referred to in this work as HCT1, HCT2, HCT3, and HCT4, respectively. HCT1 was previously described by Hoffmann et al. (2003). To confirm the presence of four putative isoforms, the sequences found by BlastN were aligned. We used ORF Finder (NCBI) to obtain the coding sequences (CDS), which were translated and aligned with each other, and an identity matrix was built using the BioEdit program (Supplementary Table 1). To verify the presence of the isoforms in the tobacco genome, a BlastN search was carried out in SOL Genomics against the N. tabacum TN90 genome database (Figure 1). The presence of four HCT genes in the tobacco genome was confirmed at the positions described in Supplementary Table 2.

Figure 1. BlastN search of HCT against the N. tabacum TN90 genome database in SOL Genomics to identify tobacco haplotypes.

The search for conserved domains was performed using the Batch CD-Search tool (Marchler-Bauer and Bryant, 2004) in the NCBI database, which allows conserved domains to be searched in multiple protein sequences. The search was done separately for each of the four putative isoforms of HCT. The first domain found, PLN02663, is a nonspecific domain described as hydroxycinnamoyl-CoA:shikimate/quinate hydroxycinnamoyltransferase. Moreover, a transferase domain (pfam02458) was also found (Yang et al., 1997). This domain is characteristic of the BAHD family of plant CoA-dependent acyltransferases and is also present in the enzymes anthranilate N-hydroxycinnamoyl/benzoyltransferase (HCBT), involved in the biosynthesis of phytoalexins (Yang et al., 1997; St-Pierre and Luca, 2000), and deacetylvindoline 4-O-acetyltransferase (DAT) (EC: 2.3.1.107), which catalyses the last step in the vindoline pathway (St-Pierre et al., 1998). The HXXXD motif is probably the active site and characterizes the BAHD superfamily, to which the enzymes HCBT, DAT, HCT and HQT belong (St-Pierre and Luca, 2000).
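A motif such as HXXXD (histidine, any three residues, aspartate) can be located in a protein sequence with a simple pattern search. The sketch below is only an illustration: the function name is hypothetical and the peptide is a made-up toy sequence, not one of the tobacco HCT isoforms.

```python
import re

# Lookahead so that overlapping occurrences are also reported
HXXXD_PATTERN = re.compile(r"(?=(H[A-Z]{3}D))")


def find_hxxxd(protein_seq):
    """Return (1-based position, matched residues) for every HXXXD occurrence."""
    seq = protein_seq.upper()
    return [(m.start() + 1, m.group(1)) for m in HXXXD_PATTERN.finditer(seq)]


# Toy peptide used only to demonstrate the search (hypothetical sequence)
toy_peptide = "MKIEVHTLADGSS"
for position, motif in find_hxxxd(toy_peptide):
    print(position, motif)
```

A pattern match only shows that the residues are present; whether they form the active site depends on the structural context of the protein.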

3.1.2. CSE

To find the tobacco homologue of CSE, we used the protein sequence of Arabidopsis thaliana CSE, previously described by Vanholme et al. (2013b), obtained from the TAIR database (AT1G52760). Through the analyses in the NCBI and SOL Genomics databases, it was possible to identify two possible CSE gene haplotypes (mRNA_119258_cds and mRNA_108581_cds; Supplementary Table 3). The CDS sequences were obtained using the ORF Finder program (NCBI) and compared to each other using the BioEdit program. The translated protein sequences were used to construct an identity matrix in BioEdit to verify their proximity, which showed an identity of 0.965 between the haplotypes. To confirm whether there are two isoforms of CSE in tobacco, a BlastN search was performed in SOL Genomics against the genomic database (N. tabacum TN90 Genome), and a separate locus was identified for each sequence, as shown in Figure 2 and Supplementary Table 4.

Figure 2. BlastN search of CSE against the N. tabacum TN90 genome database in SOL Genomics to identify tobacco haplotypes.
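The identity values reported for the HCT and CSE sequences come from an identity matrix built on aligned sequences. As an illustration of the idea, the sketch below computes the fraction of identical positions between pre-aligned sequences. The gap handling (columns where both sequences have a gap are skipped, a gap against a residue counts as a mismatch) is an assumption and may differ from the convention used by BioEdit; the sequences are short hypothetical examples.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column with gaps in both sequences carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared


def identity_matrix(aligned):
    """Identity for every ordered pair of names in a {name: aligned_seq} dict."""
    names = list(aligned)
    return {(x, y): pairwise_identity(aligned[x], aligned[y])
            for x in names for y in names}


# Hypothetical aligned fragments, for illustration only
toy_alignment = {"seqA": "MKIE-V", "seqB": "MKLEAV"}
matrix = identity_matrix(toy_alignment)
print(round(matrix[("seqA", "seqB")], 3))
```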

Caffeoyl shikimate esterase or lysophospholipase 2 (CSE - At1g52760) was first functionally described in phospholipid repair under stress conditions (Gao et al., 2010). This enzyme was described as having monoacylglycerol O-acyltransferase, monoacylglycerol lipase and lysophospholipase activities in vitro (Gao et al., 2010; Vijayaraj et al., 2012). Here, we searched for conserved domains using the Batch CD-Search tool (Marchler-Bauer and Bryant, 2004) in the NCBI database. The same domains were found for the two putative CSE haplotypes identified in tobacco. We found the Hydrolase_4 domain (pfam12146), part of the Esterase-lipase superfamily (cl21494), and the multidomains PLN02298 and PidB, described as lysophospholipases and alpha/beta hydrolases (Whayeb et al., 1996; Karlsson et al., 1997; Nardini and Dijkstra, 1999). Other multidomains found were: PHA02857, described as monoglyceride lipase (Esteban and Buller, 2005); Abhydrolase_6 (pfam12697), from the alpha/beta hydrolase family; PST-A (TIGR01607), found in Plasmodium falciparum and Plasmodium yoelii and closely related to the lysophospholipases and alpha/beta hydrolases of plants; and the multidomain PRK14875, described as the E2 subunit domain of acetoin dehydrogenase.

3.2. Vector assembly and amiRNA design

The sequences found (described in section 3.1) were used to design specific primers (Table 3) to clone the genes HCT1, CSE1 and CSE2. HCT1 was selected instead of the other haplotypes because it has already been characterized in the literature and its function in the lignin pathway has been demonstrated in vitro (Hoffmann et al., 2003; Hoffmann et al., 2004). Two rounds of PCR were performed to clone CSE and HCT into pDONR221. The first round of PCR is shown in Figures 3A and 3B, and the second round in Figure 3C. Because the first PCR performed to add the attB overhang to the CSE2 sequence yielded little amplification (Figure 3C), the reaction was repeated for this gene (Figure 3D).

Table 3. Primers designed to clone HCT1, CSE1, and CSE2.

Primer | Sequence 5' - 3'
AttB1 | 5'GGGGACAAGTTTGTACAAAAAAGCAGGCT3'
AttB2 | 5'GGGGACCACTTTGTACAAGAAAGCTGGGT3'
HCTTabGtw_R | 5'AGAAAGCTGGGTCTCAAAAGTCATACAAGAACTTCTC3'
HCTTabGtw_F | 5'AAAAAGCAGGCTTCATGAAGATCGAGGTGAAAGA3'
CSETab2_RGtw | 5'AGAAAGCTGGGTCTCAACGAGTGATACATTCCATC3'
CSETab2_FGtw | 5'AAAAAGCAGGCTTCATGGCGTCCGACGTACC3'
CSETab1Gtw_F | 5'AAAAAGCAGGCTTCATGGCGTCAGACGTGCC3'
CSETab1Gtw_R | 5'AGAAAGCTGGGTCATGATACATTCCATCATAAAGCTTGA3'
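In the two-round PCR strategy, the gene-specific primers of the first round carry a 5' tail that overlaps the 3' end of the AttB1 or AttB2 adapter primer used in the second round. The sketch below measures that overlap for the sequences listed in Table 3 and builds the full-length tailed primer that the two rounds are equivalent to. The function names are hypothetical, and the assignment of forward primers to AttB1 and reverse primers to AttB2 is inferred from the sequences themselves.

```python
ATTB1 = "GGGGACAAGTTTGTACAAAAAAGCAGGCT"
ATTB2 = "GGGGACCACTTTGTACAAGAAAGCTGGGT"

# (adapter, gene-specific primer) pairs, sequences copied from Table 3
PRIMERS = {
    "HCTTabGtw_F": (ATTB1, "AAAAAGCAGGCTTCATGAAGATCGAGGTGAAAGA"),
    "HCTTabGtw_R": (ATTB2, "AGAAAGCTGGGTCTCAAAAGTCATACAAGAACTTCTC"),
    "CSETab1Gtw_F": (ATTB1, "AAAAAGCAGGCTTCATGGCGTCAGACGTGCC"),
    "CSETab1Gtw_R": (ATTB2, "AGAAAGCTGGGTCATGATACATTCCATCATAAAGCTTGA"),
    "CSETab2_FGtw": (ATTB1, "AAAAAGCAGGCTTCATGGCGTCCGACGTACC"),
    "CSETab2_RGtw": (ATTB2, "AGAAAGCTGGGTCTCAACGAGTGATACATTCCATC"),
}


def overlap_length(adapter, primer):
    """Length of the longest adapter 3' end that is also the primer 5' start."""
    for n in range(min(len(adapter), len(primer)), 0, -1):
        if adapter.endswith(primer[:n]):
            return n
    return 0


def tailed_primer(adapter, primer):
    """Full attB-tailed primer obtained by merging adapter and primer."""
    n = overlap_length(adapter, primer)
    if n == 0:
        raise ValueError("primer does not overlap the adapter")
    return adapter + primer[n:]


OVERLAPS = {name: overlap_length(a, p) for name, (a, p) in PRIMERS.items()}
for name, n in OVERLAPS.items():
    print(f"{name}: {n} nt overlap with the attB adapter")
```

A check of this kind is a quick way to catch a transcription error in a primer tail before ordering oligonucleotides.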

Figure 3. PCR amplification of the full-length genes HCT1, CSE1 and CSE2. A) First round: gradient PCR of HCT1 and CSE1. M – 1 Kb Ladder; 1 – Negative Control 62°C; 2 – HCT1 62°C; 3 – Negative Control; 4 – HCT1 66°C; 5 – Negative Control 68°C; 6 – HCT1 68°C; 13 – Negative Control 62°C; 14 – CSE1 62°C; 15 – Negative Control 66°C; 16 – CSE1 66°C; 17 – Negative Control 68°C; 18 – CSE1 68°C. B) First round: PCR of CSE2. M – 100 bp Ladder; 1 – Negative Control; 2 – CSE2. C) Second round: addition of the attB overhang. M – 100 bp Ladder; 1 – HCT1; 3 – CSE1; 4 – CSE2. D) Second round: addition of the attB overhang. M – 100 bp Ladder; 1 – Negative Control; 2 – CSE2.

Figure 4. Colony PCR to confirm gene insertion into the pDONR221™ vector. A) HCT1 (1-6), M – 1 Kb Ladder; B) CSE1 (4-6), M – 1 Kb Ladder; C) CSE2 (2-14), 1 – Negative Control, M – 100 bp Ladder.

To confirm the insertion of CSE and HCT into the pDONR221™ vector, the positive colonies (Figure 4 A-C) were sequenced by the Sanger method to verify the insertion and identify the presence of CSE1, CSE2, and HCT1. Subsequently, CSE and HCT1 were cloned into the final vector pK7WG2.

The vectors were introduced into Agrobacterium tumefaciens strain EHA-105. Colony PCR was used to confirm the presence of the vectors in the agrobacteria before transformation (Figure 5). In addition, the vector carrying the HCT1 gene was used to produce the multisite vector for the double mutant.

Figure 5. Colony PCR to confirm insertion of the final vectors pK7GW2-CSE1 (1-5) and pK7GW2-CSE2 (6-10) into A. tumefaciens. M – 1 Kb Ladder; B – Negative control; V1 – Vector pK7GW2-CSE1; V2 – Vector pK7GW2-CSE2.

For the construction of the amiRNAs, we used the set of primers described in Table 4. The first step consisted of three PCRs, (a), (b) and (c) (Figure 6 A), whose products were used as the template for amiRNA production (Figure 6 B). After this step, we added the attB sites by PCR (Figure 6 C) using the primers AttB1 and AttB2 described in Table 3. The PCR product (Figure 6 C) was inserted into the pDONR221™ entry vector, and we performed colony PCR to confirm the presence of the amiRNAs in the vector (Figure 7). The insertion of the amiRNA into the pK7GW2 vector was confirmed by PCR (Figure 8 - V1). Moreover, the vector was transformed into A. tumefaciens, and colonies that were positive in colony PCR (Figure 8) were used for the transformation of tobacco.


Table 4. Primers used for amiRNA design.

Target | Primer | Sequence
amiRNA | MIR319aA Gtw | 5' GGGG ACA AGT TTG TAC AAA AAA GCA GGCTTC CTG CAA GGC GAT TAA GTT GGG TAA C 3'
amiRNA | MIR319aB Gtw | 5' GGGG AC CAC TTT GTA CAA GAA AGC TGG GTT GCG GAT AAC AAT TTC ACA CAG GAA ACA G 3'
CSE | CSETabmi Rs-I | 5' GATGTCATGTAAACAGTGCGCTTTCTCTCTTTTGTATTCC 3'
CSE | CSETabmi Ra-II | 5' GAAAGCGCACTGTTTACATGACATCAAAGAGAATCAATGA 3'
CSE | CSETabmi R*s-III | 5' GAAAACGCACTGTTTTCATGACTTCACAGGTCGTGATATG 3'
CSE | CSETabmi R*a-IV | 5' GAAGTCATGAAAACAGTGCGTTTTCTACATATATATTCCT 3'
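The four CSE-specific primers in Table 4 form two pairs (I/II and III/IV) whose 5' regions are reverse complements of each other, which is what allows the overlapping PCR products to be fused. The sketch below checks this property directly on the listed sequences; the helper names are hypothetical, and the check is an observation about the sequences rather than a description of the design tool that was used.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def complementary_5prime_length(primer_x, primer_y):
    """Longest n such that primer_y[:n] is the reverse complement of primer_x[:n]."""
    for n in range(min(len(primer_x), len(primer_y)), 0, -1):
        if primer_y[:n] == reverse_complement(primer_x[:n]):
            return n
    return 0


# Sequences copied from Table 4
PRIMER_I = "GATGTCATGTAAACAGTGCGCTTTCTCTCTTTTGTATTCC"
PRIMER_II = "GAAAGCGCACTGTTTACATGACATCAAAGAGAATCAATGA"
PRIMER_III = "GAAAACGCACTGTTTTCATGACTTCACAGGTCGTGATATG"
PRIMER_IV = "GAAGTCATGAAAACAGTGCGTTTTCTACATATATATTCCT"

print("I/II overlap:", complementary_5prime_length(PRIMER_I, PRIMER_II))
print("III/IV overlap:", complementary_5prime_length(PRIMER_III, PRIMER_IV))
```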

Figure 6. Construction of the amiRNA by PCR. A) First round: three reactions (a), (b), (c). M – 100 bp Ladder; 7-9 – CSE (a), (b), (c), respectively. B) Second round PCR, using the products of the first round as template. M – 1 Kb Ladder; 3 – amiRNA CSE; 4 – Negative control. C) Addition of the attB overhang for Gateway® cloning. M – 1 Kb; 1 – Negative Control; 4 – amiRNA CSE.

Figure 7. Colony PCR to confirm insertion of amiRNAs into pDONR221™ by Gateway®. M – 100 bp Ladder; 1 – Negative Control; 12-16 – amiRNACSE.

Figure 8. Colony PCR to confirm insertion of amiRNACSE into the final vector pK7GW2 in A. tumefaciens. M – 1 Kb; B – Negative Control; V1 – Vector pK7GW2 amiRNACSE; 1-5 – Colonies of A. tumefaciens carrying the construct pK7GW2 amiRNACSE.

We assembled a vector containing two expression cassettes, one to overexpress HCT1 and the other to silence CSE: p35S::HCT-pCsVMV::amiRNACSE. For this, the vector pK7GW2-HCT (p35S::HCT) was digested with XbaI (Figure 9 A), the vector pXB2m43GW2 was digested with the same enzyme, and the m43GW2 fragment was excised from the gel (Figure 9 B). After assembling both parts, we performed colony PCR using the primers in Table 5 to identify the colonies containing the insert in the desired orientation (Figure 10). Two positive colonies were obtained for the pK7GW2-HCT construct fused to the m43GW2 fragment (Figure 11). The multisite vector was used in the subsequent steps for insertion of the cassette pCsVMVamiRNACSE into the vector pK7GW2-HCT::m43GW2. The vectors were cloned in E. coli DH10B by heat shock, and colony PCR was done with the primers MIR319A and B (Table 4) to confirm that the construct was correct (Figure 11). After confirmation, the vectors were introduced into A. tumefaciens and the positive colonies were used for tobacco transformation.

Figure 9. Digestion with XbaI of the vectors used for the multisite assembly. A) Vector pK7GW2-HCT (1); B) Vector pXB2m43GW2 (1-9). M – 1 Kb Plus.

Table 5. Primers used to check the correct orientation of the insert m43GW into pK7GW2HCT:m43.

Primer | Sequence
KanF | 5' ACTCTAATTGGATACCGAGGGG 3'
m43R | 5' GAGCTCGTTTTCCCAGTCAC 3'

Figure 10. Colony PCR to check the correct orientation of m43GW2 in pK7GW2HCT-m43GW2; two E. coli colonies were tested. M – 1 Kb Plus; B – Negative Control; 1-4 – Colony 1 at Tm 55°C, 57°C, 60°C and 62°C, respectively; 5-8 – Colony 2 at Tm 55°C, 57°C, 60°C and 62°C, respectively.

Figure 11. Colony PCR of the final vector pK7GW2HCT-amiCSE to check whether pCsVMVamiRNACSE was inserted. M – 1 Kb Plus; 1-10 – pK7GW2HCT-amiCSE; 11 – Positive Control (vector pCSVMVamiCSE).

3.3. Plant Transformation

The activity of CSE in the lignin pathway has already been confirmed in several species (Vanholme et al., 2013b; Ha et al., 2016; Saleme et al., 2017). To confirm its importance in tobacco, we developed a series of mutants overexpressing and downregulating CSE. Moreover, we developed a double mutant that overexpresses HCT and downregulates CSE in an attempt to rescue the dwarf phenotype found in plants downregulating CSE. The number of plants obtained for each construction is described in Table 1.

Downregulation of the CSE gene had a severe impact on plant development: the plants were severely stunted. In culture media, we observed callus and leaf formation, followed by stagnation of plant development. To confirm the observed phenotype, we performed five extra rounds of transformation, each with around 200 explants, and in all cases the phenotype was the same. These data indicate that the product of CSE is essential for the normal development of tobacco plants. The cse dwarf plants were cloned by tissue culture to obtain enough material to perform qPCR and confirm the silencing of the CSE gene. In contrast, plants overexpressing HCT and downregulating CSE (transformed with the vector pK7GW2 HCT-amiCSE) showed normal development, indicating that the HCT enzyme can somehow reverse the cse phenotype and whatever is causing it. This suggests that caffeoyl-CoA production can be ensured in the absence of CSE.

3.3. Gene expression in different tissues of WT and mutants of tobacco

3.3.1. Different tissues in Wild Type Plants

To better understand the balance between lignin and CGA biosynthesis, we quantified the relative expression of the key genes of the pathway, HCT, HQT and CSE, in different tissues of 6-month-old WT tobacco plants (Figure 12 A-C). The expression level of CSE was 3.52-fold higher in young leaves and 3.29-fold higher in young stem than in old stem (the tissue with the lowest expression of this gene). Old leaves and roots showed an intermediate pattern of expression (Figure 12 A). HCT had its highest level of expression in the young stem and its lowest in old leaves. Roots and old stem also had a higher level of expression than old leaves (Figure 12 B). In contrast, HQT had its highest level of expression in young leaves, followed by old stem, and its lowest in flowers (Figure 12 C).

Figure 12. Gene expression in different tissues of WT tobacco plants. A) qPCR of the CSE gene; B) qPCR of the HCT gene; C) qPCR of the HQT gene. Tissue abbreviations: YS – young stem; OS – old stem; F – flower; YL – young leaf; OL – old leaf; R – root. The bars represent the standard error of 4 replicates. The letters above the bars represent the result of the ANOVA and Tukey test (p<0.05). CNRQ = Calibrated Normalized Relative Quantities.

3.3.2. qRT-PCR screening from double and single mutants

Plants downregulating CSE (both haplotypes at the same time) showed stunted growth and a severe dwarf phenotype, similar to the phenotype reported for tobacco plants downregulating HCT (Hoffmann et al., 2004). For this reason, these plants did not generate descendants and only T0 plants were analyzed. The expression of HCT, HQT and CSE in these plants is shown in Figure 13, and the relative expression level of each gene is given in Table 6. In general, these plants had low transcript levels of CSE and HQT and no change in HCT transcript levels.

Table 6. The transcript level of genes CSE, HCT, and HQT in cse mutants.

Tobacco line | CSE | HCT | HQT
WT | 1.00 | 1.00 | 1.00
amiCSE90 | 0.54 | 1.54 | 0.37
amiCSE102 | 0.45 | 0.67 | 0.15
amiCSE78 | 0.37 | 1.25 | 0.34
amiCSE103 | 0.43 | 0.47 | 0.41
amiCSE31 | 0.41 | 0.95 | 0.37
amiCSE19 | 0.61 | 0.68 | 0.32
amiCSE91 | 0.86 | 2.83 | 0.50
amiCSE75 | 0.42 | 0.77 | 0.42
amiCSE101 | 0.40 | 0.92 | 0.34
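The relative transcript levels in Table 6 can be turned into a percentage of downregulation with respect to WT (set to 1.00). The short sketch below does this for the CSE column; the values are copied from Table 6, while the function and variable names are hypothetical.

```python
# Relative CSE transcript levels of the amiCSE lines (WT = 1.00), from Table 6
TABLE6_CSE = {
    "amiCSE90": 0.54, "amiCSE102": 0.45, "amiCSE78": 0.37,
    "amiCSE103": 0.43, "amiCSE31": 0.41, "amiCSE19": 0.61,
    "amiCSE91": 0.86, "amiCSE75": 0.42, "amiCSE101": 0.40,
}


def percent_reduction(relative_level, wt_level=1.0):
    """Percentage decrease of a transcript level relative to WT."""
    return (1.0 - relative_level / wt_level) * 100.0


mean_cse = sum(TABLE6_CSE.values()) / len(TABLE6_CSE)
strongest_line = min(TABLE6_CSE, key=TABLE6_CSE.get)

for line, level in TABLE6_CSE.items():
    print(f"{line}: {percent_reduction(level):.0f}% lower than WT")
print(f"Mean relative CSE level: {mean_cse:.2f}")
print(f"Strongest downregulation: {strongest_line}")
```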

Figure 13. Gene expression of CSE, HCT, and HQT measured by qRT-PCR in cse downregulated tobacco plants. The bars represent the standard error of 3 replicates for WT plants. No statistical analysis or standard error was calculated for the amiCSE lines, since in the T0 generation each plant is considered an independent line. CNRQ = Calibrated Normalized Relative Quantities.

We screened the other transgenic plants obtained, CSE and HCTamiCSE (Table 1), by qRT-PCR to select lines for further analysis in the T1 generation (Supplementary Figure 2). For the double mutants (pK7WG2 HCTamiCSE) we screened 28 lines and selected lines 1, 12 and 18. Plants overexpressing CSE1 and CSE2 were also screened, 9 of each haplotype. Of the two haplotypes, only the transgenic plants overexpressing CSE2 showed high levels of CSE expression; for this reason, we selected lines 8, 9 and 12 from CSE2. In the T1 generation we analyzed the relative expression of the transgenic lines compared to wild type in two different tissues: stem and leaves. Although the transgenes were placed under the control of strong constitutive promoters, we decided to analyze both tissues because lignin is mainly found in the stem and CGA in the leaves. The relative expression in the two tissues differed considerably for some transgenic lines and, in general, the difference in expression between WT and the transgenic plants was larger in leaves than in the stem (Figure 14 A-D). As expected, and in contrast to CSE, HCT expression was increased in the transgenic tobacco plants, except in the event HCTamiCSE18 (Figure 14 A-B). Except in the leaves of the event HCTamiCSE1, CSE expression was clearly decreased in these plants (Figure 14 C-D). The lowest expression of CSE and the highest expression of HCT were observed in the mutant HCTamiCSE12 (Figure 14 A-D).

Figure 14. Gene expression of double mutants in the T1 generation. A) HCT in leaves; B) HCT in stem; C) CSE in leaves; D) CSE in stem. The letters and asterisk represent the result of the ANOVA and Tukey test (p<0.05); graphs without letters or asterisk showed no statistically significant difference. The bars represent the standard error of 4 replicates. CNRQ = Calibrated Normalized Relative Quantities.

As expected, the expression of CSE was increased in the tobacco plants transformed with pCaMV35S::CSE (CSE overexpression). Even though the leaves of the event CSE8 showed a large variation (Figure 15 A), expression was increased in the stem and leaves of all the events. CSE8 was the line with the highest level of expression, while CSE9 had the lowest in both tissues (Figure 15 A – B).

Figure 15. Gene expression of CSE overexpression mutants in the T1 generation. A) CSE in leaves; B) CSE in stem. The letters represent the result of the ANOVA and Tukey test (p<0.05); graphs without letters showed no statistically significant difference. The bars represent the standard error of 3 replicates for CSE9 and WT and of 2 replicates for CSE12 and CSE8. CNRQ = Calibrated Normalized Relative Quantities.

3.5. Phenolic Profiling

We also studied the phenolic profile in stems and leaves of CSE and HCTamiCSE mutants (Figures 16 and 17, respectively). In the stem, with the exception of quinic acid, the content of all other phenolics analyzed, shikimic acid, CGA and caffeic acid (CA), was affected in the mutants (Figure 16 A – D). CA content in the stem increased in all lines, by 33% to 51% relative to WT, with the exception of HCTamiCSE18, which did not differ from WT (Figure 16 C). Following the same pattern as CA, CGA content changed in all mutants, with the exception of HCTamiCSE18, with an increase of up to 53% relative to WT (Figure 16 D). In the leaves (Figure 17 A – D) there was a large variation for quinic and shikimic acids (Figure 17 A; B), but a slight increase was observed for caffeic acid (Figure 17 C) and a clear increase in CGA in both mutants (Figure 17 D). In general, the largest increases in CGA were observed in the CSE mutants, with an increase of up to 67% relative to WT (Figure 17 D).

Figure 16. Phenolic profiling in the stem of transgenic lines. A) Quinic Acid; B) Shikimic Acid; C) Caffeic Acid; D) Chlorogenic Acid. Values are given in µg/mg dry mass. The letters represent the result of the ANOVA and Tukey test (p<0.05); graphs without letters showed no statistically significant difference. The bars represent the standard error of 3 replicates for HCTamiCSE 1, 12 and 18, CSE9 and WT, and of 2 replicates for CSE12 and CSE8.

Figure 17. Phenolic profiling in leaves of transgenic lines. A) Quinic Acid; B) Shikimic Acid; C) Caffeic Acid; D) Chlorogenic Acid. Values are given in µg/mg dry mass. The letters represent the result of the ANOVA and Tukey test (p<0.05); graphs without letters showed no statistically significant difference. The bars represent the standard error of 3 replicates for HCTamiCSE 1, 12 and 18, CSE9 and WT, and of 2 replicates for CSE12 and CSE8.

3.6. Lignin content and composition

To understand how lignin production in tobacco stems was affected in the CSE and HCTamiCSE mutants, we determined the total lignin content by the acetyl bromide method and quantified the lignin monomers (S, G and H) to assess whether an alteration in the lignin pathway could change lignin structure. Even though no statistically significant difference was found, probably due to variation among the replicates, we noticed a tendency among mutants and double mutants. Compared to WT, lines HCTamiCSE12 and CSE12 had the most pronounced change in total lignin content, with increases of 14% and 15%, respectively (Figure 18 A). These same lines also had the most affected S/G ratio, 19% and 16% higher than WT, respectively (Figure 18 B). H monomers decreased in almost all lines, with CSE8 and CSE9 showing the lowest levels, 36% and 20% lower than WT. CSE12, HCTamiCSE1 and HCTamiCSE18 decreased by 12%, 14% and 11%, respectively, while HCTamiCSE12 increased by 9% (Figure 18 C).

Figure 18. Lignin quantification in CSE mutants and HCTamiCSE double mutants compared to WT. A) Total lignin content measured by acetyl bromide (% cell wall); B) S/G ratio; C) Lignin monomers S, G and H (nmol/100 mg dry mass). No statistically significant difference was found (ANOVA and Tukey test, p<0.05). The bars represent the standard error of 3 replicates for HCTamiCSE 1, 12 and 18, CSE9 and WT, and of 2 replicates for CSE12 and CSE8.
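The percentages reported in section 3.6 are changes relative to the WT mean, and the S/G ratio is the quotient of the syringyl and guaiacyl monomer amounts. A minimal sketch of both calculations follows; the monomer amounts and function names are hypothetical and are not measurements from this work.

```python
def sg_ratio(s_amount, g_amount):
    """Ratio of syringyl (S) to guaiacyl (G) monomers, in the same units."""
    if g_amount <= 0:
        raise ValueError("G amount must be positive")
    return s_amount / g_amount


def percent_change(value, wt_value):
    """Change of a line relative to WT, in percent (positive = increase)."""
    return (value - wt_value) / wt_value * 100.0


# Hypothetical monomer amounts (nmol/100 mg dry mass), for illustration only
wt_ratio = sg_ratio(30.0, 50.0)
line_ratio = sg_ratio(36.0, 50.0)
print(f"WT S/G = {wt_ratio:.2f}, line S/G = {line_ratio:.2f}")
print(f"Change relative to WT: {percent_change(line_ratio, wt_ratio):.0f}%")
```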

3.7. Plant cell wall polysaccharides and saccharification

To estimate the impact of altered CSE and HCT activities on the plant cell wall of the mutants, we analyzed the polysaccharides cellulose, hemicellulose and pectin. The largest increase in cellulose, 35%, was observed in CSE12. In the other mutants the increase was slight, and in line HCTamiCSE1 there was a decrease of 18% in cellulose (Figure 19 A). Hemicellulose content increased by 9% in all CSE lines compared to WT, while in double mutants the change ranged from 1% to 5% (Figure 19 B). Pectin levels changed in CSE9 and CSE12, which decreased by 14% and 12% relative to WT, and in HCTamiCSE1, which increased by 6% (Figure 19 C). Interestingly, all HCTamiCSE transgenic lines showed an increase in saccharification efficiency: 57% in HCTamiCSE1, 24% in HCTamiCSE18 and 19% in HCTamiCSE12 compared to WT. On the other hand, saccharification efficiency was 15% (±1) lower in all CSE lines (Figure 19 D).

Figure 19. Determination of plant cell wall polysaccharides and saccharification efficiency of CSE mutants and HCTamiCSE double mutants. A) Cellulose (mg/g dry mass); B) Hemicellulose (mg/g dry mass); C) Pectin (mg/g dry mass); D) Percentage of saccharification (efficiency, %). ANOVA and Tukey test were performed (p<0.05), but no statistically significant difference was found. The bars represent the standard error of 3 replicates for HCTamiCSE 1, 12 and 18, CSE9 and WT, and of 2 replicates for CSE12 and CSE8.

3.8. Morphological and histochemical analyses

To verify whether growth was altered in the mutants, we measured the following parameters in one-year-old T1 plants: plant height, internode length, leaf number, leaf area, and fresh and dry weight. The morphological characteristics did not change significantly, probably because of the large variation found within each group. For example, we counted the number of leaves in all mutant and WT plants: while almost all CSE and HCTamiCSE lines ranged between 30 and 41 leaves, the CSE8 line had 53 leaves, an increase of about 50% compared to WT, which had 35 leaves (Figure 20 A). Even though this difference is considerable, both groups (CSE8 and WT) showed a large variation among replicates: CSE8 ranged from 44 to 62 leaves, while WT ranged from 29 to 49. Leaf area was reduced in all mutants, most notably in the CSE8 line (40% relative to the WT average; Figure 20 B). Leaf fresh mass and dry mass followed the same pattern (Figure 20 C – D): they did not differ significantly among the groups but showed a tendency to decrease in the CSE8 line. The internode length (Figure 20 E) varied widely in all lines, ranging from 1 to 5 cm, while in double mutants it was more stable (an average of 3 cm) and tended to decrease compared to WT (an average of 3.75 cm). Stem height varied widely among replicates but, in general, did not differ from WT plants (Figure 20 F). The stem fresh and dry mass did not change significantly among the groups analyzed and tended to decrease in the CSE9 line (Figure 20 G – H). Interestingly, the CSE8 line, which showed a tendency to lose dry mass in leaves, did not show the same pattern in the stem.

Figure 20. Morphological analyses of CSE mutants and HCTamiCSE double mutants compared to WT. A) Leaf Number; B) Leaf Area (cm2); C) Leaf Fresh Mass (g); D) Leaf Dry Mass (g); E) Distance between internodes (cm); F) Stem Height (cm); G) Stem Fresh Mass; H) Stem Dry Mass. ANOVA and Tukey test were performed (p<0.05), but no statistically significant difference was found. The bars represent the standard error of 3 replicates for HCTamiCSE 1, 12 and 18, CSE9 and WT, and of 2 replicates for CSE12 and CSE8.

Plants downregulating CSE analyzed in T0 reached the maximum height after 1 month in culture and remained unchanged for the next 5 months (1 – 2 cm), as observed in figure 21 A.

Figure 21. In vitro growth of cse mutants and WT 2 months after transformation with A. tumefaciens. A) pK7GW2-amiRNACSE; B) Wild type (WT).

Phloroglucinol-HCl staining showed a similar lignin pattern in all plants analyzed. Mäule staining showed a color change from blood-red to brown-yellow in all transgenic lines compared to WT, indicating a decrease in S monomers (Figure 22 A – U), which is not consistent with the analyses of lignin content and composition.

Figure 22. Histochemical analysis of lignin in transgenic tobacco lines. Cross-sections of stems at the 7th internode from the top of one-year-old WT and transgenic plants. A) WT stained with Phloroglucinol-HCl reagent, 10X, 100 nm; B) WT stained with Phloroglucinol-HCl reagent, 20X, 20 nm; C) WT stained with Mäule reagent, 20X, 20 nm; D) CSE8 stained with Phloroglucinol-HCl reagent, 10X, 100 nm; E) CSE8 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; F) CSE8 stained with Mäule reagent, 20X, 20 nm; G) HCTamiCSE1 stained with Phloroglucinol-HCl reagent, 10X, 100 nm; H) HCTamiCSE1 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; I) HCTamiCSE1 stained with Mäule reagent, 20X, 20 nm; J) CSE12 stained with Phloroglucinol-HCl reagent, 10X, 100 nm; K) CSE12 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; L) CSE12 stained with Mäule reagent, 20X, 20 nm; M) HCTamiCSE12 stained with Phloroglucinol-HCl reagent, 10X, 100 nm; N) HCTamiCSE12 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; O) HCTamiCSE12 stained with Mäule reagent, 20 nm; P) HCTamiCSE18 stained with Phloroglucinol-HCl reagent, 100 nm; Q) HCTamiCSE18 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; R) HCTamiCSE18 stained with Mäule reagent, 20 nm; S) CSE9 stained with Phloroglucinol-HCl reagent, 10X, 100 nm; T) CSE9 stained with Phloroglucinol-HCl reagent, 20X, 20 nm; U) CSE9 stained with Mäule reagent, 20 nm.

3.9. Pearson correlation and network analyses

We also performed Pearson correlation and network analyses on the stem data of each mutant group to assess the level of association between the traits analyzed. The full set of correlation coefficients is shown as a heat map (Figure 23). In the stem of CSE mutants, cellulose was significantly correlated with the S/G ratio (R=0.99), the S monomer (R=0.95) and CGA (R=0.96). Caffeic acid and shikimic acid were also positively correlated (R=0.99), while pectin and the S/G ratio were negatively correlated (R=-0.97). In the HCTamiCSE double mutants, the metabolites were connected differently, and the only statistically significant correlation was between quinic acid and hemicellulose (R=0.97). In addition, while saccharification in CSE mutants was related to stem dry mass, the network analyses for the HCTamiCSE mutants indicated that changes in saccharification efficiency were positively associated with pectin, hemicellulose, shikimic acid and quinic acid.
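As an illustration of how a coefficient such as those reported above is obtained, the following minimal sketch computes the Pearson coefficient between two traits and the t statistic commonly used to test it against zero. The trait values and function names are hypothetical, chosen only for the example; they are not data or code from this study, and the software actually used for the analyses is not implied.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length series with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical line averages (illustrative values only).
cellulose = [30.0, 34.0, 41.0, 38.0]
sg_ratio = [0.80, 0.92, 1.10, 1.05]

r = pearson_r(cellulose, sg_ratio)
t = t_statistic(r, len(cellulose))
print(f"R = {r:.2f}, t = {t:.2f} with {len(cellulose) - 2} degrees of freedom")
```

Because the matrices are built from line averages, the number of points per correlation is small, so only coefficients very close to 1 or -1 reach significance at p<0.05.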


[Figure 23: network diagrams and correlation heat maps for CSE_Stem and HCTamiCSE_Stem. Legend classes for both positive and negative correlations: 0.73-0.99, 0.50-0.72, 0.23-0.49, 0.00-0.22. The lower-triangular matrices are listed below; each row gives the coefficients against the traits in the column order shown.]

Column order: % Lignin, Cellulose, Chlorogenic Acid, Caffeic Acid, Shikimic Acid, Quinic Acid, S, G, H, Pectins, Hemicellulose, Saccharification, S/G

CSE_Stem
Cellulose: 0.51
Chlorogenic Acid: 0.48 *0.96
Caffeic Acid: 0.67 0.55 0.71
Shikimic Acid: 0.61 0.62 0.78 *0.99
Quinic Acid: -0.47 0.45 0.54 0.14 0.26
S: 0.75 *0.95 0.91 0.66 0.69 0.18
G: 0.79 0.93 0.91 0.72 0.75 0.15 *1.00
H: 0.31 -0.28 -0.49 -0.48 -0.56 -0.85 -0.11 -0.13
Pectins: -0.65 -0.92 -0.80 -0.37 -0.41 -0.13 -0.94 -0.91 -0.11
Hemicellulose: 0.34 0.61 0.80 0.91 0.95 0.54 0.59 0.63 -0.79 -0.31
Saccharification: 0.12 0.46 0.21 -0.43 -0.39 -0.01 0.40 0.31 0.51 -0.68 -0.42
S/G: 0.58 *0.99 0.92 0.49 0.55 0.33 *0.97 0.94 -0.13 *-0.97 0.51 0.55
Stem DW: -0.31 0.35 0.14 -0.59 -0.51 0.34 0.16 0.06 0.20 -0.45 -0.40 0.89 0.39

HCTamiCSE_Stem
Cellulose: 0.56
Chlorogenic Acid: 0.65 -0.22
Caffeic Acid: 0.21 -0.68 0.86
Shikimic Acid: 0.23 -0.54 0.54 0.72
Quinic Acid: 0.20 -0.31 0.22 0.38 0.92
S: 0.84 0.14 0.71 0.50 0.72 0.68
G: 0.79 0.36 0.39 0.14 0.59 0.71 0.92
H: 0.72 0.49 0.60 0.17 -0.30 -0.49 0.31 0.13
Pectins: -0.71 -0.93 -0.11 0.41 0.51 0.43 -0.24 -0.33 -0.78
Hemicellulose: -0.03 -0.41 0.05 0.30 0.86 *0.97 0.49 0.56 -0.68 0.58
Saccharification: -0.23 -0.79 0.23 0.61 0.89 0.83 0.33 0.23 -0.64 0.83 0.88
S/G: 0.76 -0.11 0.94 0.78 0.71 0.50 0.91 0.67 0.46 -0.12 0.31 0.35
Stem DW: -0.18 -0.46 0.45 0.54 -0.19 -0.56 -0.31 -0.65 0.43 0.15 -0.57 -0.12 0.12

Figure 23. Correlation matrix based on Pearson coefficients derived from the average stem data of 1-year-old CSE and HCTamiCSE mutants. Significant correlation coefficients (p<0.05) are indicated by an asterisk, with positive and negative correlations distinguished by red and blue, respectively.

Pearson correlation and network analyses were also performed for leaves (Figure 24). For CSE mutants, no statistically significant correlation was found. Interestingly, in the HCTamiCSE double mutants, negative correlations were observed among almost all the compounds analyzed: caffeic acid and quinic acid (R=-1.00); CGA and quinic acid (R=-0.98); CGA and shikimic acid (R=-0.96); CGA and dry mass (R=-0.97). Positive correlations were found between shikimic acid and dry mass (R=0.98), and between CGA and caffeic acid (R=0.97). These strong correlations are highlighted in the network analyses (Figure 24).


[Figure 24: network diagrams and correlation heat maps for CSE_Leaf and HCTamiCSE_Leaf. Legend classes for both positive and negative correlations: 0.73-0.99, 0.50-0.72, 0.23-0.49, 0.00-0.22. The lower-triangular matrices are listed below; each row gives the coefficients against the traits in the column order shown.]

Column order: Chlorogenic Acid, Caffeic Acid, Quinic Acid, Shikimic Acid

CSE_Leaf
Caffeic Acid: 0.85
Quinic Acid: -0.58 -0.06
Shikimic Acid: 0.13 0.55 0.68
Leaf DW: -0.88 -0.85 0.28 -0.51

HCTamiCSE_Leaf
Caffeic Acid: 0.97
Quinic Acid: -0.98 -1.00
Shikimic Acid: -0.96 -0.91 0.94
Leaf DW: -0.97 -0.89 0.91 0.98

Figure 24. Correlation matrix based on Pearson coefficients derived from the average leaf data of 1-year-old CSE and HCTamiCSE mutants. Significant correlation coefficients (p<0.05) are indicated by an asterisk, with positive and negative correlations distinguished by red and blue, respectively.

3.10. Principal Components Analysis

To identify notable differences between mutants and WT plants, we analyzed all data by Principal Components Analysis (PCA; Figure 25). The first principal component (PC1) and the second principal component (PC2) accounted for 39.9% and 20.2% of the total variation, respectively. The PCA score plot revealed three groups: the WT formed the first group, contrasting with the profiles of the CSE mutants (second group) and the HCTamiCSE double mutants (third group). Interestingly, line CSE9 behaved differently from the other two CSE lines, as it grouped with the double mutants. The main contributors to the separation between the WT and CSE8/CSE12 groups were the higher levels of S and G monomers, chlorogenic acid (both organs), caffeic acid (in leaves) and hemicellulose in these two CSE mutant lines. On the other hand, saccharification level, quinic acid and shikimic acid, all from stem, contributed strongly to the separation of the HCTamiCSE double mutants from all other transgenic plants analyzed.
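The percentage of variation attributed to a principal component can be illustrated with a small sketch. Assuming, as stated in the legend of Figure 25, that PCA is performed on a correlation matrix, the share explained by PC1 is the largest eigenvalue divided by the number of traits. The code below uses plain power iteration to find that eigenvalue; the trait values and function names are hypothetical and this is not the software used in this study.

```python
import math


def standardize(column):
    """Center a trait to mean 0 and scale it to unit sample variance."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long trait columns."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector, found by power iteration."""
    p = len(matrix)
    vector = [float(i + 1) for i in range(p)]  # non-uniform start vector
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(p)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in product))
        vector = [v / norm for v in product]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    image = [sum(row[j] * vector[j] for j in range(p)) for row in matrix]
    eigenvalue = sum(v * w for v, w in zip(vector, image))
    return eigenvalue, vector


def pc1_explained_fraction(columns):
    """Share of total variance on PC1 (the trace of a correlation matrix is p)."""
    eigenvalue, _ = leading_eigenpair(correlation_matrix(columns))
    return eigenvalue / len(columns)


# Hypothetical trait values for five plant lines (not data from this study).
lignin = [21.0, 19.5, 22.3, 20.1, 18.7]
cga = [3.1, 4.8, 2.9, 4.0, 5.2]
cellulose = [38.0, 41.5, 36.2, 40.3, 43.0]
share = pc1_explained_fraction([lignin, cga, cellulose])
print(f"PC1 explains {share:.1%} of the total variance")
```

The eigenvector returned alongside the eigenvalue holds the loadings of each trait on PC1, which is the information displayed in a loading plot.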

Figure 25. Principal component analysis (PCA) of transgenic plants and WT. PCA was performed on the correlation matrix of least-squares means. The numbers in parentheses give the percentage of variation explained by the first (PC1, 39.9%) and second (PC2, 20.2%) components, which together account for 60.1% of the total variance. A) Score plot, in which circle colors indicate the clusters assigned by hierarchical clustering based on Pearson distance (red: WT; green: double mutants and CSE9; yellow: CSE8 and CSE12); B) Loading plot showing the distribution of the analysed parameters.

4. Discussion

4.1. Search of HCT and CSE alleles and expression profile

The enzymes CSE and HCT are positioned at the beginning of the lignin pathway and have been described as related to the production of G and S subunits (Vanholme et al., 2013; Hoffmann et al., 2003). Hoffmann et al. (2004) first demonstrated HCT activity in planta by producing gene-silenced A. thaliana and Nicotiana benthamiana. CSE activity was demonstrated more recently in A. thaliana by Vanholme et al. (2013b). Here we searched for HCT and CSE sequences in the available databases and found 4 haplotypes for HCT and 2 for CSE. Considering that N. tabacum is an allotetraploid (2n=4x=48) that evolved from interspecific hybridization between the ancestors Nicotiana sylvestris (2n=24) and Nicotiana tomentosiformis (2n=24) (Leitch et al., 2008), up to four different haplotypes were expected for each locus. We also analyzed the protein domains of all genes. All HCT haplotypes shared the same motifs, characteristic of the BAHD superfamily, to which HCT belongs. Similarly, both CSE haplotypes shared the same motifs, all in accordance with the characteristics of this enzyme as previously described (Gao et al., 2010; Vijayaraj et al., 2012).

It is well known that the CGA and lignin metabolic pathways share common intermediates, but the connection between them is still unclear. It has been suggested that the CGA route probably acts as a carbon donor to the lignin pathway (Días et al., 1997; Comino et al., 2009; Joët et al., 2009; Escamilla-Treviño et al., 2014). Another important question concerns the balance between the pathways, i.e., is there competition for intermediates between them? To answer this question, it is first necessary to investigate whether the key genes involved in both metabolic pathways are expressed in the same organ or have different expression patterns. For this reason, we analyzed the expression of the three genes (HQT, HCT, and CSE) involved in the biosynthesis of CGA and lignin in different organs of WT plants. According to Niggeweg et al. (2004), 98% of the CGA in tobacco is produced by HQT. They also found evidence that most of the CGA in tobacco is found in leaves. We found relatively high levels of HQT transcripts in leaves, followed by old stem (Figure 12 C). High levels of HQT were also found in potato leaves and tuber skin by Payyavula et al. (2015). Because CGA is an antioxidant, its levels in the old part of the stem may be related to the accumulation of reactive oxygen species in old tissues (Petrov et al., 2015). HCT transcripts were mostly found in the young stem (Figure 12 B), probably because of the high activity of lignin metabolism during the formation of the vascular tissue (F. Evert, 2006). HCT expression was also high in roots, where xylem formation is important for water transport. High lignin levels in roots and their importance to root development have been previously reported (Abiven et al., 2011; Naseer et al., 2012; Zhao et al., 2013). It is noteworthy that HCT and HQT showed opposite expression patterns, indicating a balance between the lignin and CGA routes. Interestingly, while HCT was more expressed in the stem (mostly in the young stem) and HQT in young leaves, CSE showed an intermediate pattern, i.e., it was mostly expressed in young leaves and young stem, respectively (Figure 12 A). A positive correlation between CSE and HQT transcripts has been previously described in potato tubers (Valiñas et al., 2015). Unlike what we found here, in potato tubers CSE seems to have a stronger relationship with HQT than with HCT, suggesting greater importance of CSE in the CGA pathway than in the lignin pathway in this species (Valiñas et al., 2015). In tobacco this relationship seems to be different, as our results suggest that the CSE expression pattern has similarities with those of both genes (HCT and HQT), suggesting an involvement in both pathways.

4.2. Downregulation of CSE severely impacts plant development

To better understand the role of CSE in tobacco, we developed tobacco cse mutants. The most significant characteristic of the cse mutants was a dwarf phenotype, with growth and development severely affected (Figure 21 A). Defects in plant growth and development caused by manipulation of lignin metabolism have been reported for several genes: CCR1 (Ruel et al., 2009); HCT (Hoffmann et al., 2004; Shadle et al., 2007); C3’H (Franke et al., 2002b; Takeda et al., 2018); C4H (Schilmiller et al., 2009); and CSE in Medicago truncatula mutants (Ha et al., 2016). Ruel et al. (2009) developed ccr1 Arabidopsis mutants with a severely dwarfed phenotype and, using transmission electron microscopy, observed a strong collapse of xylem cells. Recently, Meester et al. (2018) developed ccr1 ProSNBE:CCR1 plants, in which a vessel-specific promoter overcame this dwarfed phenotype and yielded viable plants. It has been argued that lignin-related dwarfed phenotypes may result from the collapse of conducting vessels, which makes the development of these plants impossible (Pereira et al., 2018). Lignin provides the vascular system with mechanical strength, stiffness and hydrophobicity to withstand gravity, mechanical stress and the negative pressure generated by transpiration, allowing the transport of water and solutes throughout the plant (Ferrer et al., 2008; Pereira et al., 2018). Recently, the dwarf phenotype was described as lignin modification-induced dwarfism (LMID; for a recent review see Muro-Villanueva, Mao and Chapple, 2019) and was related to 1) collapse of conducting vessels; 2) accumulation of an intermediate or derived compounds of the phenylpropanoid pathway; 3) changes in the integrity of plant cell wall structure. Considering the second hypothesis, downregulation of CSE may affect the flow of the phenylpropanoid pathway, and the biosynthesis of other secondary metabolites may be blocked or overstimulated. Our cse mutants have caffeic acid production blocked, and this compound has been described as the major regulator of monolignol biosynthesis (Wang et al., 2014a). Caffeic acid content regulates the expression of key enzymes of the phenylpropanoid pathway such as PAL and 4CL (Wang et al., 2014a). The inhibition of PAL activity could lead to an over-accumulation of its substrate, which in excess could lead to dwarfism (Vanholme et al., 2019a). A reduction in C4H activity leads to a dwarfed phenotype (Schilmiller et al., 2009), and the drastic reduction in lignin metabolism was associated with an increase in cinnamic acid derivatives (Van de Wouwer et al., 2016). Perturbations in auxin signaling have been associated with LMID, and trans-cinnamic acid can act as an anti-auxin compound (Schilmiller et al., 2009; Bonawitz and Chapple, 2013). The role of CSE in the lignin pathway has already been investigated in A. thaliana, M. truncatula (dicot, Leguminosae), poplar (Populus deltoides, dicot, Salicaceae), and switchgrass (Panicum virgatum, monocot, Poaceae) (Vanholme et al., 2013b; Ha et al., 2016; Vargas et al., 2016; Saleme et al., 2017). Although CSE has a clear role in the lignin pathway in all these species, its level of importance seems to differ among them. Poplar cse mutants did not show any phenotypic abnormality compared to WT, whereas M. truncatula cse mutants were severely dwarfed (Ha et al., 2016; Saleme et al., 2017), similar to the phenotype we obtained in tobacco.
These data indicate that CSE is the preferred route for the biosynthesis of caffeoyl CoA in the lignin pathway in these species. Ha et al. (2016) produced CSE loss-of-function mutants of M. truncatula by transposon insertion and observed severe dwarfing, altered development, reduced lignin content, and preferential accumulation of hydroxyphenyl (H) units. Although Arabidopsis cse mutants did not present severe dwarfism, the plants were 37% smaller than WT and had an increase of up to 30% in the accumulation of H units (Vanholme et al., 2013b). The same authors recovered the normal phenotype in Arabidopsis cse mutants by using a vessel-specific promoter to drive CSE expression, suggesting that, at least in this species, the phenotype was probably caused by the collapse of conducting vessels. The differences in the impact of CSE downregulation among species might be a consequence of the efficiency with which HCT produces caffeoyl CoA for lignin biosynthesis (Ha et al., 2016). Here, we could recover the WT phenotype by overexpressing HCT under a constitutive promoter (pCaMV35S). Furthermore, besides dwarf phenotypes, we also observed a reduction of H units, mainly in plants overexpressing CSE (CSE lines) and less in the HCTamiCSE lines (Figure 18 C), indicating that HCT might be partially restoring H biosynthesis. To gain more information about our mutants, we analyzed the expression of CSE, HCT, and HQT by qRT-PCR in the cse mutants. Although we did not find significant changes in HCT transcript level, HQT expression was on average 64% lower than in WT (Figure 13). As HQT has been suggested as the main enzyme in the biosynthesis of CGA in tobacco (Niggeweg et al., 2004), these data suggest that CGA metabolism was directly affected and might be involved in dwarfing induction in the cse mutants. Most of the known CGAs are conjugates of quinic acid and caffeic acid (Clifford, 1999), and an imbalance in caffeic acid could affect HQT expression through negative feedback. Furthermore, these results are in accordance with 1) our qRT-PCR analysis in different plant organs (Figure 12), which clearly showed an expression pattern that connects this gene to both the CGA and lignin metabolic pathways, and 2) the mutants overexpressing CSE, which accumulated CGA (Figure 17 D).

4.3. HCT overexpression overcomes cse dwarfism and CSE mutants accumulate CGA without affecting lignin content

Interestingly, when we combined downregulation of CSE with overexpression of HCT in the HCTamiCSE mutants, we obtained plants close to the normal WT phenotype. Despite a loss of up to 39% of dry mass (Figures 20 D and 20 H), our double mutants generated fertile plants, which flowered at the same time as WT plants. This is evidence that, at least in tobacco, HCT is able to convert caffeoyl shikimate into caffeoyl CoA in planta, a reaction that had only been shown in vitro, where the clear preference for the reverse reaction suggested that it could not happen in vivo (Hoffmann et al., 2004; Vanholme et al., 2013b; Escamilla-Treviño et al., 2014; Wang et al., 2014a). The phenolic profiling of these mutants showed that, although caffeic acid content in the stem was affected in both mutants (HCTamiCSE and CSE lines), CGA was only affected in leaves of the CSE mutants. Caffeic acid content increased in the stem of all CSE mutant lines, but the increase was only statistically significant for lines CSE9 and CSE12 (Figure 16 C). This compound also accumulated in the double mutants (lines HCTamiCSE1 and 12), but less than in the CSE mutants, even though these plants presented very low levels of CSE expression. Whether and at which rate HCT and CSE operate in modulating the formation of caffeoyl CoA is unknown in plants harbouring both enzyme activities. Tobacco hct mutants showed an increase in CGA content in the stem, indicating the involvement of HCT in CGA catabolism instead of biosynthesis (Hoffmann et al., 2004). The caffeic acid accumulation in our double mutants may be related to a lower efficiency of HCT in converting caffeoyl CoA to CGA, thus inhibiting the conversion of caffeic acid by 4CL. Escamilla-Treviño et al. (2014) showed in switchgrass that two recombinant HCTs (PvHCT1a and PvHCT2a) efficiently converted caffeoyl shikimate to caffeoyl CoA in vitro, while an HCT-like enzyme (PvHCT-Like1) was able to convert caffeoyl CoA to CGA, thus exhibiting HQT activity and preferring quinic acid as acyl acceptor. The recombinant PvHCT-Like1 catalysed the formation of CGA from caffeoyl CoA less efficiently than the formation of 4-coumaroyl quinate from 4-coumaroyl CoA and quinic acid. Thus, we speculate that, even though the double mutants overexpressed HCT and had CSE inhibited, the low affinity of HCT for caffeoyl CoA may have led to the inhibition of 4CL, since an accumulation of caffeoyl CoA was observed. This may also explain why CSE and HCTamiCSE mutants accumulate caffeic acid. The accumulation of caffeic acid also argues against a possible activity of CSE using CGA as substrate (see Figure 2 in the Introduction).

Our double mutants downregulating CSE and overexpressing HCT (HCTamiCSE) did not have their CGA content significantly affected (Figure 17 D). However, the mutants overexpressing CSE showed an increase in CGA content of up to 67% in leaves of the CSE9 line (Figure 17 D). This accumulation may result from a redirection of the excess caffeic acid produced by CSE overexpression, and CGA may serve as a reservoir for the excess carbon flowing through lignin metabolism. The increase in the stem was smaller, on average 31% compared with WT plants (Figure 16 D). The greater accumulation of caffeic acid in stem than in leaves indicates that CGA was more actively synthesized in the leaves, although the quinic acid level did not change. In fact, caffeic acid is used for both CGA and lignin biosynthesis. Additionally, the greater accumulation of CGA in leaves than in stem could be explained by the fact that more HQT transcripts were found in leaves than in stem (Figure 12 C). HQT is the preferred enzyme for CGA synthesis in tobacco (Niggeweg et al., 2004) and, for this reason, an excess of substrate could favor CGA production in leaves rather than in stem. It is noteworthy that caffeic acid concentration might be a key point in lignin metabolic flux. Excess caffeic acid has been associated with inhibition of PAL and 4CL (Wang et al., 2014a; Van de Wouwer et al., 2016). Once these enzymes are inhibited, carbon cannot flow into the lignin pathway until the excess is re-routed. These data may explain why we did not find differences in lignin content for either mutant and indicate that the flow of carbon in the lignin pathway is finely regulated. Our correlation and network analyses in the stem of CSE mutants found a positive correlation between caffeic acid and shikimic acid content (R=0.99; Figure 23 A). Shikimate also increased in the stem of the mutants. Whether or not the same protein catalyses the reaction caffeoyl CoA → CGA, the accumulation of this phenolic is a strong indication that CSE is the main route for lignin biosynthesis in tobacco, efficiently draining caffeoyl CoA for monomer synthesis. The excess caffeic acid produced in the stem leads to an overproduction of shikimic acid, which explains the over-accumulation of this intermediate (Vanholme et al., 2019b). Shikimate ester intermediates are not essential for the biosynthesis of monolignols but have been considered the preferred substrates for C3H; it is therefore possible that shikimic acid acts in monolignol regulation (Vanholme et al., 2019b). In spite of the changes in the levels of caffeic and shikimic acids, quinic acid did not change. The Pearson correlation and network analyses from leaves (Figure 24) identified a negative correlation between caffeic acid and quinic acid content (R=-1) in the double mutants, whereas this correlation was not significant in CSE mutants. These data indicate that, although both mutants have the same branch engineered, the quinate and shikimate branches were affected differently in leaves. It is noteworthy that the transgenic lines with the highest CSE transcript levels (lines CSE8 and 12, 23- and 36-fold higher than WT; Figure 15) contrast more with each other than with WT. They also tended to accumulate more caffeic acid and CGA (Figure 25).
Unlike these lines, CSE9 tended to accumulate more lignin than CGA and had lower CSE expression (9-fold that of WT; Figure 15). Thus, it seems that an overflow of the CSE branch favors CGA biosynthesis, probably because of an increase in the amounts of the intermediates. The balance between the two routes for the biosynthesis of caffeoyl CoA, however, seems to be more complex, as the double mutant with the highest HCT expression (19-fold that of WT) and the lowest CSE expression (0.1-fold that of WT; Figure 14), HCTamiCSE12, tended to be closer to the CSE lines and to accumulate more CGA, while the other two double mutants, with lower HCT expression, tended to accumulate more lignin. Contrary to expectation, our mutants did not show a massive increase in lignin content or a change in the S/G ratio (Figure 18 A – B). Lignin content varied widely among replicates of both mutants (CSE and HCTamiCSE) and did not differ significantly from WT. The same was true for the S/G ratio. Alteration in lignin content associated with an increase in H monomers due to C3H, HCT or CSE manipulation has been described in several species, but these works usually analyze downregulation of these genes (Hoffmann et al., 2004; Shadle et al., 2007; Vanholme et al., 2013a; Vanholme et al., 2013b; Tong et al., 2015; Ha et al., 2016; Ponniah et al., 2017; Saleme et al., 2017; Zhou et al., 2018). Thus, we expected a decrease in H units in our mutants, especially in the CSE mutants. Indeed, the relative amount of H units decreased slightly in CSE mutants, by 22% on average, although this was not statistically significant (Figure 18 C). Arabidopsis cse-2 loss-of-function mutants had a 36% decrease in total lignin content associated with a 30X increase in H units in the lignin polymer (Vanholme et al., 2013b). M. truncatula cse loss-of-function mutants had an 80% reduction in lignin with a 50X increase in the relative amount of H units (Ha et al., 2016). Downregulation of HCT in alfalfa led to a reduction in lignin of up to 50%, associated with an increase in H units in the lignin polymer (Shadle et al., 2007). Saleme et al. (2017) described a 25% reduction in lignin deposition associated with a 113% increase in H units in the lignin polymer of poplar cse mutants. Although the level of downregulation of these genes differs and there are differences among species, it is well accepted that downregulation of genes positioned at the beginning of the S and G metabolic branch can be compensated by an overproduction of H subunits. Counterintuitively, overexpression of F5H driven by the C4H promoter in poplar did not change total lignin content; instead, it changed the S/G ratio, and the lignin polymer consisted mainly of S units (97.5%) (Stewart et al., 2009).
Highlighting the complexity of the lignin pathway, our results showed that lignin biosynthesis is finely regulated in tobacco and that the excess carbon generated by overstimulation of the shikimate shunt in the mutants was reallocated into CGA. The correlation and network analysis in the stem of CSE mutants (Figure 23 A) showed a positive correlation between the S monomer and CGA content (R=0.96), supporting the hypothesis that the accumulation of CGA did not affect lignin biosynthesis, although a possible overstimulation of the pathway was partly channeled into CGA. Even though saccharification efficiency did not differ between mutants and WT, the CSE and HCTamiCSE plants showed opposite patterns of saccharification efficiency (Figure 19 D). Generally, CSE lines had lower saccharification efficiency than WT, whereas double mutants improved saccharification efficiency by up to 57% in HCTamiCSE1 (Figure 19 D). Furthermore, their cellulose content seems to follow the inverse pattern: while CSE lines increased cellulose content, double mutants decreased it. This is especially pronounced when comparing CSE12 and HCTamiCSE1: the former increased its cellulose content by 35%, whereas the latter decreased it by 19% (Figure 19 A). Generally, downregulation of C3H, HCT and CSE increased both saccharification efficiency and cellulose content (Tong et al., 2015; Ha et al., 2016; Vargas et al., 2016; Saleme et al., 2017; Zhou et al., 2018). Arabidopsis cse mutants, though, had increased saccharification efficiency and decreased cellulose content (Vanholme et al., 2013b), the same response found in our mutants, i.e. an inverse relation between saccharification and cellulose. Even though an increase in cellulose content is usually associated with an increase in saccharification, changes in its crystalline structure can reduce biomass degradability (Marriott et al., 2015; Van de Wouwer et al., 2016). Cellulose is a linear polymer formed by β-1,4-linked glucopyranosyl residues, and adjacent residues are rotated 180° to maintain linearity, leading to a highly rigid and insoluble crystalline region (Van de Wouwer et al., 2016). This crystallinity hinders the access of hydrolytic enzymes, increasing biomass recalcitrance (Marriott et al., 2015). Thus, increasing cellulose crystallinity could reduce cell wall saccharification. Marriott et al. (2014) analyzed several Brachypodium distachyon mutants with enhanced saccharification efficiency and showed that it is not necessarily associated with changes in lignin content. The sac1 and sac7 mutants did not change lignin content but instead had a decrease in crystalline cellulose content associated with their increase in saccharification. In addition, sac7 increased its total polysaccharide content, whereas sac1 had a decrease in a cell-wall-bound component, which could affect the cross-linking of cellulose with hemicellulose and lignin (Marriott et al., 2014).
The correlation and network analyses from the stem of CSE mutants showed a positive correlation between the S/G ratio and cellulose content (R = 0.99), suggesting that the alterations in cellulose content in our mutants may be caused by the perturbation in lignin metabolism. In summary, our data indicate that there is no competition for common intermediates between CGA and lignin biosynthesis: while the former seems to be produced mainly in leaves by HQT, the latter occurs mainly in the stem through HCT. These enzymes seem to be key to determining the flux; while both showed very characteristic expression patterns, CSE showed an expression pattern that suggests its involvement in both pathways. Supporting this hypothesis, in our CSE mutants the excess carbon generated by CSE overexpression was remobilized into CGA in leaves but not in stem. It is tempting to hypothesize that these differences in carbon remobilization between leaves and stem are part of a strategy to finely regulate the lignin branch, since excess caffeic acid can inhibit the lignin route. Additionally, overexpression of HCT in plants downregulating CSE completely rescued the dwarfed phenotype, indicating that HCT is capable of producing caffeoyl CoA from caffeoyl shikimate. This is remarkable, since it is the first time that the ability of HCT to convert caffeoyl shikimate has been demonstrated in planta. The differences found between the mutants might be due to the pool of caffeoyl CoA available. CSE is more efficient in the conversion of caffeoyl shikimate into caffeoyl CoA and is possibly the main route of production. For this reason, it accumulates more caffeoyl CoA, increasing its availability to the CGA pathway. On the other hand, HCT conversion of caffeoyl shikimate to caffeoyl CoA is less efficient, so lower levels of caffeoyl CoA would be available to be remobilized into CGA. Even though these genes are part of the same branch of the phenolic pathway, they may have different roles in plant metabolism and trigger different responses upon perturbation. Lignin metabolic flux seems to be finely regulated in tobacco, since overexpression of the shikimate branch does not drastically affect lignin content. The accumulation of caffeic acid in both mutants indicates that this compound might be a key point in the regulation of the lignin pathway in tobacco. To conclude, our data indicate that CSE is the preferred route for the biosynthesis of caffeoyl CoA in tobacco and that its overproduction favors the CGA pathway in leaves. On the other hand, HCT is critical for lignin metabolism, and its overexpression combined with cse downregulation does not seem to affect the CGA pool (probably due to its low affinity for quinate); this enzyme is also capable of producing caffeoyl CoA, even though it does not seem to be the preferred route of biosynthesis.

Acknowledgments NVS thanks São Paulo Research Foundation for a doctoral fellowship (Processo FAPESP n°2014/17831-5). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.


Chapter 3


CRISPR/Cas9 genome editing to modify lignin biosynthesis in Nicotiana tabacum

Nathalia Volpi e Silva1,2, Ewerton Ribeiro1, Felipe Thadeu Tolentino1, Oleg Raitskin2, Nicola J Patron2, Paulo Mazzafera1.

1 – Universidade Estadual de Campinas; 2 – Earlham Institute

Abstract

Genome editing using the clustered regularly interspaced short palindromic repeats/CRISPR-associated protein 9 system (CRISPR/Cas9) has proved to be a powerful tool and promises to revolutionize the use of biotechnology for crop breeding. CRISPR/Cas9 allows the development of biallelic and homozygous mutants in T0 and opens a new horizon in crop breeding through the possibility of developing "transgenic-free" genome-edited mutants. Here, we propose the genome editing of tobacco (Nicotiana tabacum) using CRISPR/Cas9 technology in order to understand the carbon flow in lignin metabolism through chlorogenic acid (CGA). We targeted key genes of the lignin and CGA pathways, caffeoyl CoA O-methyltransferase (CCoAOMT), hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyl transferase (HCT) and caffeoyl shikimate esterase (CSE), to induce mutagenesis. Besides cellulose, lignin is the main component of the plant cell wall and a major challenge in biomass processing, such as the production of second-generation bioethanol. On the other hand, CGA is part of a group of phenolic antioxidants highly important for the human diet and for plant defense responses. These two biosynthetic routes are interconnected by caffeoyl CoA, and CGA probably acts as a carbon skeleton donor to lignin metabolism. To test this hypothesis, we developed double and single mutants for HCT and CSE using traditional transformation methods. We successfully assembled constructs carrying two sgRNAs and Cas9 and validated them by agro-transient assay in tobacco leaves. In these transient assays, we successfully induced specific mutations in both haplotypes of CSE and in two HCT haplotypes. This approach enables the development of stable plants that may overcome the dwarfism of HCT-silenced plants by inducing mutations in only two of the four haplotypes present in tobacco. Due to the specificity of CRISPR/Cas9, it is possible to inactivate specific haplotypes and isoforms individually and very precisely. This approach would help us to understand the role of HCT in the lignin and CGA pathways. In conclusion, our vectors enable the development of stable CCoAOMT, CSE and HCT genome-edited plants, which will provide more information to confirm whether CGA and lignin are interconnected.

1. Introduction

The CRISPR system was discovered in prokaryotes as an antiviral defense mechanism. It is found in ~40% of bacteria and ~90% of archaea and acts as an adaptive immune system against phages or conjugative plasmids acquired through horizontal gene transfer (Waters and Storz, 2009; Kaya et al., 2016). In prokaryotes, CRISPR loci are clusters of short repeat sequences separated by short spacer sequences derived from invader DNA added during infections (Figure 1a). During a new infection, a new spacer corresponding to the foreign DNA is integrated into the endogenous CRISPR array, conferring resistance to subsequent infections (Waters and Storz, 2009; Marraffini, 2015). Adjacent to CRISPR loci is an operon of CRISPR-associated (CAS) genes that encode Cas proteins (Figure 1a). These endonucleases are guided by the spacer sequences in the CRISPR-RNA to cleave the invader genome (Figure 1a) (Waters and Storz, 2009; Belhaj et al., 2015; Marraffini, 2015). Several CRISPR/Cas immunity systems have been described, but type II has been the most extensively studied (Marraffini, 2015). Recently, the type II nuclease system from Streptococcus pyogenes was engineered to be used as a tool for genome editing in eukaryotes (Figure 1b) (Pan et al., 2016). The CRISPR/Cas type II system comprises a Cas9 endonuclease protein, two small RNAs, the CRISPR-RNA (crRNA) and the trans-activating crRNA (tracrRNA), and a protospacer adjacent motif (PAM sequence of 5’-NGG-3’) downstream of the target sequence (Figure 1a) (Li et al., 2013; Mahfouz et al., 2014; Belhaj et al., 2015; Marraffini, 2015; Pan et al., 2016). The Cas9 protein is guided by a duplex formed by the two small RNAs to cleave the complementary sequence (Figure 1a). Recently, the duplex has been fused to form a single chimeric guide RNA molecule (gRNA) containing a 20-nucleotide (nt) sequence that carries the targeting information (Figure 1b) (Mahfouz et al., 2014; Belhaj et al., 2015; Pan et al., 2016). This finding makes CRISPR/Cas9 technology a promising tool for genome editing, considering that theoretically any genomic sequence bearing a PAM could be simply and easily engineered (Li et al., 2013; Mahfouz et al., 2014; Ding et al., 2016). As PAM sequences are highly frequent in genomes, virtually any gene could be targeted (Ding et al., 2016). Due to the simple design of target specificity and its compact nature, CRISPR/Cas9 allows simultaneous targeting of multiple gene loci (Lowder et al., 2015; Ma et al., 2015). Lowder et al., (2015) described the simultaneous editing of up to eight different targets. The group developed a toolbox that allows the transcriptional activation or repression of plant endogenous genes in monocots and dicots. Similarly, Ma et al., (2015) edited 46 target sites in rice with a high percentage of mutations, which were mostly biallelic and homozygous. They also obtained loss-of-function mutations by simultaneously targeting all members of a given gene family, genes from the same pathway, or different targets in the same gene, in both rice and Arabidopsis.

Figure 1. CRISPR as a prokaryotic immunity system and as a biotechnological tool for genome editing. (a) CRISPR/Cas9 type II system in prokaryotes. The pre-crRNA is transcribed from the CRISPR region and is processed by Cas9, RNaseIII and the tracrRNA. The tracrRNA base-pairs with the repeat region of the pre-crRNA so that RNaseIII can cleave it. Cas9 is guided by the crRNA to recognize the protospacer sequence in the foreign DNA and induces a double-strand break. (b) The use of CRISPR/Cas9 as a tool, in which the crRNA and the tracrRNA are fused into a single guide RNA (sgRNA). In this remodeled system, the target is a specific region in the genome of the target species. To enable the movement of Cas9 into the eukaryotic nuclear compartment, a nuclear localization signal is added. This figure is from Belhaj et al., (2015).

Genome editing in plants using CRISPR/Cas9 has been successfully applied in Arabidopsis thaliana (Jiang et al., 2013; Li et al., 2013; Fauser et al., 2014; Lowder et al., 2015), Nicotiana tabacum (Gao et al., 2015), Nicotiana benthamiana (Jiang et al., 2013; Nekrasov et al., 2013; Lowder et al., 2015), rice (Jiang et al., 2013; Lowder et al., 2015), wheat (Wang et al., 2014b), maize (Feng et al., 2016), sorghum (Jiang et al., 2013), petunia (Zhang et al., 2016), barley (Lawrenson et al., 2015), tomato (Brooks et al., 2014; Pan et al., 2016), Populus (Zhou et al., 2015), soybean (Cai et al., 2015; Jacobs et al., 2015), Brassica oleracea (Lawrenson et al., 2015) and sweet orange (Jia and Nian, 2014). Tobacco plants (N. tabacum) edited via CRISPR/Cas9 were obtained with a mutation rate of 81.8%, a high level of biallelic mutations and no significant off-targets (Gao et al., 2015). An advantage of using CRISPR/Cas9 in basic and applied research is the possibility of producing homozygous and biallelic mutations already in the first generation (Xu et al., 2015; Zhou et al., 2015; Osakabe et al., 2016a), drastically reducing the time for functional analyses compared to conventional transgenic approaches. Because CRISPR/Cas9-induced mutations are transmitted to the next generation (Chen et al., 2001; Feng et al., 2014; Schiml et al., 2014; Belhaj et al., 2015; Ding et al., 2016; Osakabe et al., 2016b), this genome-editing tool can be used to accelerate the breeding of elite clones. Osakabe et al., (2016b) were able to increase the efficiency of heritable mutations in Arabidopsis thaliana using Cas9 guided by a truncated gRNA (tru-gRNA) and driven by a promoter with high expression in the germline. The use of tru-gRNA was a strategy to avoid off-targets, a usual issue in genome editing technologies. Another advantage of this technology is the development of “transgenic-free” plants carrying heritable biallelic mutations (Woo et al., 2015; Xu et al., 2015; Zhou et al., 2015). Woo et al., (2015) reported the generation of genome-edited Arabidopsis, tobacco and lettuce plants without introducing foreign DNA into plant cells, with up to 46% of mutants among regenerated plants. Even though the study shows an unpredictable pattern of biallelic mutations in T0 lines, the mutations in T1 lines were stably transmitted to the next generations. In addition, the transgene from transgenic plants containing CRISPR/Cas9 can be segregated out independently of the edited region, generating “transgene-clean” plants homozygous for the desired mutation (Xu et al., 2015). Zhou et al., (2015) obtained a poplar plant with a CRISPR-induced biallelic mutation in an important gene of the lignin biosynthesis pathway, 4-coumarate:CoA ligase (4CL). It was the first lignin-related gene engineered via CRISPR/Cas9. Having the advantage of no direct influence from the T-DNA insertion, the group was able to analyze the effect of the edited mutation in 30 independent lines. The results showed a reduction in lignin content and in S/G ratio in 4cl1 mutants with high levels of efficiency and phenotypic reproducibility, usually not found with the conventional techniques previously used, highlighting the robustness of genome editing by CRISPR/Cas9 (Zhou et al., 2015).

Here, we edited the genome of tobacco cells by an agro-transient method, targeting the lignin-related genes CSE, CCoAOMT, and HCT. The assembly and validation of these constructs open the possibility of developing stable plants carrying CRISPR/Cas9-induced mutations in lignin genes in tobacco. Moreover, we believe that in this way we could overcome the dwarfism previously described in HCT-silenced tobacco (Hoffmann et al., 2004). The N. tabacum genome is allotetraploid (2n=4x=48) (Leitch et al., 2008); therefore, it is expected to contain four different haplotypes for each locus. CRISPR/Cas9 shows high specificity, allowing the accurate targeting of only one haplotype. This strategy is particularly interesting if the idea is to understand the role of different haplotypes individually. Genome editing by CRISPR/Cas9 allows more reproducible and accurate results, besides the generation of “transgene-free” plants, which would imply higher acceptance by the general public (Belhaj et al., 2015; Lowder et al., 2015; Ma et al., 2015; Tong et al., 2015; Zhou et al., 2015). This technique can be applied to improve the production of second-generation ethanol and of food with high levels of antioxidants such as CGAs, in a safer way and with better public acceptance than conventional transgenic strategies.

2. Material and Methods

2.1. Target locus selection and sgRNA design

The sgRNAs were designed using gene sequences from Nicotiana tabacum obtained from the NCBI (National Center for Biotechnology Information) and SOL Genomics (Bombarely et al., 2011) databases. After the identification of the gene sequences, the design of the sgRNAs followed the criteria described by Parry et al., (2016). One important point to consider when designing an sgRNA is to have 100% identity between the seed sequence and the sgRNA. Another important point is to identify the PAM sequence, the motif required for CRISPR/Cas9 to cut the DNA and enable the mutation. This motif consists of the sequence NGG and was used to identify possible targets (Xie et al., 2014). Subsequently, the search for potential off-targets was done with Benchling (Benchling Inc., 2018 - https://benchling.com/) and CRISPR RGEN Tools (Bae et al., 2014). The absence of off-targets is crucial, considering that we want to target a specific locus. Moreover, it is also important to evaluate the presence of non-CpG-sensitive restriction sites in the target region, because they are predicted to be disrupted by Cas9-induced indels (Lawrenson et al., 2015).
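The PAM-based search for candidate targets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in this work (the searches here were done in Benchling and CRISPR RGEN Tools): the function names are hypothetical, the input is assumed to be a plain DNA string, and no off-target scoring is attempted.

```python
import re

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T, N only)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))

def find_targets(seq: str, protospacer_len: int = 20):
    """List every protospacer that is immediately followed by an NGG PAM.

    Returns tuples (strand, start, protospacer, pam). For the '-' strand,
    `start` is a 0-based index on the reverse-complemented sequence.
    """
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping candidates are all reported.
        pattern = rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))"
        for m in re.finditer(pattern, s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits
```

Each candidate returned by such a scan would still need to be checked for off-targets and for a suitable restriction site, as described in the text.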

2.2. Construct DNA assembly and multiplex targeting

Binary plasmid vector constructs were assembled using Golden Gate Modular Cloning (MoClo) as described by Engler et al., (2014) and Lawrenson et al., (2015). NtHCT, NtCCoAOMT, and NtCSE were targeted with constructs designed to edit either single or multiple haplotypes. Each sgRNA was designed following the criteria described by Parry et al., (2016).

2.3. Agroinfiltration: Development, test, and delivery of the constructions in tobacco leaves

To test the targeted mutagenesis of the tobacco genome, transient expression by agroinfiltration was employed. Two-month-old plants cultivated in a greenhouse in pots holding 5 kg of soil were used in the experiment. Tobacco leaf tissue was agroinfiltrated following the protocol described by Nekrasov et al., (2013) with modifications. After bacterial incubation, the OD600 was adjusted to 0.1. The leaves were infiltrated with the solution and the plants were incubated in the dark for 2 days in a B.O.D. incubator at 21 °C; after this period, the plants were transferred to a B.O.D. incubator with a photoperiod of 12 h light / 12 h dark at 25 °C. The leaves were collected 6 days after agroinfiltration and the genomic DNA was extracted using Qiagen’s DNeasy Plant DNA Extraction Mini Kit.

2.4. Genotyping

The genomic DNA from the leaf tissue was analyzed by the PCR/RE method described by Nekrasov et al., (2013) with modifications. Following the restriction enzyme site loss approach, the genomic DNA was first digested and then used as template in a PCR with primers flanking the target site. Because wild-type sequences are cut by the enzyme, edited sequences that lost the restriction site are enriched in the PCR product, which could then be sequenced to identify mutations.
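The restriction site loss approach only works when the recognition site overlaps the position where Cas9 cuts. A minimal sketch of that check is given below. It assumes the commonly reported SpCas9 cut position, 3 bp upstream of the PAM, and a target on the forward strand; the function name and the example enzyme site are hypothetical and are not taken from this work.

```python
def site_spans_cut(seq: str, protospacer_start: int, site: str,
                   protospacer_len: int = 20) -> bool:
    """Return True if a restriction site overlaps the predicted Cas9 cut.

    `protospacer_start` is the 0-based index of the first protospacer base
    on the forward strand, with the PAM assumed to follow the protospacer.
    The cut is assumed to fall between indices cut-1 and cut, i.e. 3 bp
    upstream of the PAM. Only the forward strand is searched, which is
    sufficient for palindromic recognition sites.
    """
    seq, site = seq.upper(), site.upper()
    cut = protospacer_start + protospacer_len - 3
    pos = seq.find(site)
    while pos != -1:
        # The site spans the cut when the cut boundary lies strictly inside it.
        if pos < cut < pos + len(site):
            return True
        pos = seq.find(site, pos + 1)
    return False
```

A target for which this check returns True is a candidate for PCR/RE genotyping, since small indels at the cut would be expected to destroy the site.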

3. Results and Discussion

3.1. Target locus selection and sgRNA design

3.1.1. HCT

To design the sgRNAs, we used the bioinformatic analysis from Chapter 2 as a reference. The analysis was done in the SOL Genomics database using the sequence described by Hoffmann et al., (2003) as the query. The results revealed the presence of four haplotypes of HCT in the N. tabacum genome, as shown in Supplementary Table 2. The next step was to identify the ancestor from which each haplotype originated. To achieve this goal, a BlastN search between the haplotypes was performed in the SOL Genomics database, and an identity matrix of the haplotypes and the HCTs found in the ancestors’ genomes was created using BioEdit (Supplementary Table 5). We used this matrix to identify the ancestor from which each allele probably came and, in this way, to sort the alleles into groups in order to design the specific sgRNAs. We expected to find two haplotypes closer to N. tomentosiformis and two haplotypes closer to N. sylvestris; instead, we found HCT 1, 2 and 4 closer to N. tomentosiformis and HCT 3 closer to N. sylvestris. While the haplotypes HCT 1 and 2 present high identity with gene 21152, HCT 4 has high identity with gene 29186, both from N. tomentosiformis. For this reason, we used the identity between the alleles from N. tabacum as the criterion to form two groups (Figure 2): Group 1 (HCTg1), formed by the haplotypes HCT 3 and 4 with an identity of 0.0473, and Group 2 (HCTg2), formed by HCT 1 and 2 with an identity of 0.0449. In addition, in order to select the best group to target, we also performed qPCR for both groups to see whether they were differentially expressed in leaves and stem; no significant variation was found (Supplementary Figure 3).
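To illustrate how a pairwise identity matrix of aligned haplotypes can be computed, a minimal Python sketch follows. It is not the BioEdit implementation used here, and its definition of identity (matching columns divided by compared columns, with a gap in only one sequence counted as a mismatch) is an assumption that may differ from the values reported in Supplementary Table 5. All names are hypothetical.

```python
from itertools import combinations

def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned sequences.

    Columns where both sequences have a gap ('-') are ignored; a gap in
    only one sequence counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return matches / compared

def identity_matrix(aligned: dict) -> dict:
    """Symmetric identity matrix as a dict keyed by (name_a, name_b)."""
    names = list(aligned)
    matrix = {(n, n): 1.0 for n in names}
    for p, q in combinations(names, 2):
        matrix[(p, q)] = matrix[(q, p)] = pairwise_identity(aligned[p], aligned[q])
    return matrix
```

With such a matrix, the haplotypes can be grouped by pairing each one with its most similar partner, which is the logic followed above to define HCTg1 and HCTg2.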

Figure 2. Alignment of HCT isoforms to identify the similarity among the haplotypes. The sequences were aligned in Benchling (Benchling Inc., 2018) using the MAFFT algorithm. The sequences shown inside the blue square represent Group 1, while the sequences inside the red square represent Group 2.

After defining the groups of haplotypes, their sequences were aligned and searched for the PAM (NGG) that defines CRISPR/Cas9 target sites. Although Benchling (Benchling Inc., 2018 - https://benchling.com/) offers a tool to design sgRNAs for N. tabacum, we decided to design the sgRNAs manually because we wanted to target two haplotypes per plant, whereas the program considers the corresponding haplotype as an off-target. As mentioned before, the region upstream of the PAM (20 nt) is the seed sequence and it needs to be 100% identical to the sgRNA recognition sequence. For this reason, we selected regions that were identical between the haplotypes within a given group. In addition, to avoid off-targets, we used CRISPR RGEN Tools Cas-OFFinder (Bae et al., 2014), which searches for off-targets in the N. tabacum genome.
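The manual selection of targets shared by the haplotypes of a group amounts to intersecting the sets of candidate protospacers found in each sequence. The sketch below shows this idea under simplifying assumptions: it scans the forward strand only, it requires an exact 20-nt match followed by an NGG PAM in every haplotype, and the function names are hypothetical.

```python
import re

def candidate_protospacers(seq: str, n: int = 20) -> set:
    """Protospacers of length n followed by an NGG PAM (forward strand only)."""
    pattern = rf"(?=([ACGT]{{{n}}})[ACGT]GG)"
    return {m.group(1) for m in re.finditer(pattern, seq.upper())}

def shared_protospacers(haplotypes, n: int = 20) -> set:
    """Protospacers present, each with its own NGG PAM, in every haplotype."""
    sets = [candidate_protospacers(h, n) for h in haplotypes]
    return set.intersection(*sets) if sets else set()
```

Any protospacer returned for the two haplotypes of a group is identical over the full 20 nt in both, so a single sgRNA could in principle target both copies; the off-target search in the remaining haplotypes is a separate step.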

Once the best target was selected, the next step was to select the sequence for the sgRNA forward primer scaffold. The scaffold is variable depending on which template is used for the next step and on the promoter used. In this case, the Arabidopsis U6-26 promoter (the template pICSL9002) was used. The forward primers are listed in Table 1:

Table 1. Primers and position in the gene to create the sgRNAs for all haplotypes of HCT.

| Gene | sgRNA target (20 nt) | Selected region (bp) | sgRNA F primer | Name |
| HCT3 and HCT4 | ACCCCAAGTGTTTACTTTTAC | 127-147 | tgtggtctca ATTG ACCCCAAGTGTTTACTTTTAC gttttagagctagaaatagcaag | sgRNA 1.1 |
| HCT3 and HCT4 | AACACTTGGGGTGTGGAAATT | 116-138 | tgtggtctca ATTG AACACTTGGGGTGTGGAAATT gttttagagctagaaatagcaag | sgRNA 1.2 |
| HCT3 and HCT4 | AAGACGCCGGAGTTCCAAAGT | 355-375 | tgtggtctca ATTG AAGACGCCGGAGTTCCAAAGT gttttagagctagaaatagcaag | sgRNA 1.4 |
| HCT3 and HCT4 | TTGGTGATTTTGCGCCTACTT | 338-358 | tgtggtctca ATTG TTGGTGATTTTGCGCCTACTT gttttagagctagaaatagcaag | sgRNA 1.6 |
| HCT1 and HCT2 | ACTTTTCCGTCGAAGAAATTT | 144-164 | tgtggtctca ATTG ACTTTTCCGTCGAAGAAATTT gttttagagctagaaatagcaag | sgRNA 2.2 |
| HCT1 and HCT2 | CACTTGTGCCGTTTTATCCTA | 188-208 | tgtggtctca ATTG CACTTGTGCCGTTTTATCCTA gttttagagctagaaatagcaag | sgRNA 2.3 |
| HCT1 and HCT2 | CAACGGCGGGGATGAGTTGA | 345-364 | tgtggtctca ATTG CAACGGCGGGGATGAGTTGA gttttagagctagaaatagcaag | sgRNA 2.6 |
| HCT1 and HCT2 | CCCGCCGTTGATTACTCACA | 355-374 | tgtggtctca ATTG CCCGCCGTTGATTACTCACA gttttagagctagaaatagcaag | sgRNA 2.7 |

The reverse primer is the same (tgtggtctca AGCGTAATGCCAACTTTGTAC) for all sgRNAs. The vector pICSL70001 contains the sgRNA scaffold. Amplification with the primer pair from the previous step resulted in a PCR product that is the specific sgRNA, as shown in Figure 3.
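The forward primers in the tables of this chapter follow a regular pattern: a 5' tail (tgtggtctca, which contains the BsaI recognition sequence ggtctc), the overhang ATTG, the target sequence, and the first bases of the sgRNA scaffold. The Python sketch below reproduces that pattern. The rule that targets starting with G receive ATT instead of ATTG is inferred from the primers listed for sgRNA 4.4 and sgRNA 9.1 and should be read as an assumption; the function name is hypothetical.

```python
PREFIX = "tgtggtctca"                          # 5' tail, includes a BsaI site (ggtctc)
SCAFFOLD_START = "gttttagagctagaaatagcaag"     # anneals to the sgRNA scaffold template
REVERSE_PRIMER = "tgtggtctca" + "AGCGTAATGCCAACTTTGTAC"  # common to all sgRNAs

def forward_primer(target: str) -> str:
    """Build an sgRNA forward primer following the pattern seen in the tables.

    Assumption: when the target already starts with G, only ATT is added so
    that the 4-nt ATTG overhang is completed by the first base of the target.
    """
    target = target.upper()
    if not target or set(target) - set("ACGT"):
        raise ValueError("target must contain only A, C, G and T")
    overhang = "ATT" if target.startswith("G") else "ATTG"
    return PREFIX + overhang + target + SCAFFOLD_START
```

For example, the target of sgRNA 1.1 (ACCCCAAGTGTTTACTTTTAC) yields the same sequence as the primer listed for it in Table 1, without the spaces used there for readability.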

Figure 3. PCR amplification with the vector pICSL70001 containing the sgRNA scaffold and the sgRNA primer designed for each sgRNA from HCT and CSE; Tm= 60°C. M - Marker NEB 2 -log; 1 - sgRNA 1.1; 2 – sgRNA 1.2; 3 – sgRNA 1.4; 4 – sgRNA 1.6; 5 – sgRNA 2.3; 6 – sgRNA 2.6; 7 – sgRNA 2.7; 10 – sgRNA 3.3; 11 – sgRNA 3.4; 12 – sgRNA 4.1; 13 – sgRNA 4.2; 14 – sgRNA 4.4; 15 – sgRNA 4.5.

The PCR amplifications that did not work were repeated with a different annealing temperature (58°C instead of 60°C) and are shown in Figure 4.

Figure 4. PCR amplification with the vector pICSL70001 containing the sgRNA scaffold and the sgRNA primer designed for each sgRNA from HCT and CSE; Tm= 58°C. M - Marker NEB 2 -log; 1 – sgRNA 1.2; 2 – sgRNA 1.4; 3 – sgRNA 2.2.; 4 – sgRNA 2.6; 7 – sgRNA4.1; 8 – sgRNA 4.4.

3.1.2. CCoAOMT

CCoAOMTs have been classified into three different classes (Maury et al., 2002). We decided to focus on class I because it is the first expressed during stem development (Maury et al., 2002). The presence of four isoforms of class I CCoAOMT has already been described by Martz et al., (1998). Therefore, this previous work was used as a reference for our analyses. The presence of extra isoforms or haplotypes in the tobacco genome was checked by BlastN searches in the SOL Genomics database, but no other sequences were found. The alignment of CCoAOMT isoforms showed a low level of identity among them, making it difficult to design an sgRNA with 100% identity to all isoforms. Consequently, we decided to design sgRNAs to target each isoform individually. Considering the different results found when the expression of different CCoAOMT isoforms was manipulated (Zhong et al., 1998; Pinçon et al., 2001), it would be interesting to determine the potential role of one specific isoform in both pathways: lignin and CGA. Accordingly, to design the sgRNA specific for each isoform, a tool offered by Benchling (Benchling Inc., 2018 - https://benchling.com/) was used. This tool identifies the best targeting region for each gene, taking into consideration the match between the sequence and the sgRNA. It also analyzes possible off-targets by screening the N. tabacum genome. We used the same strategy previously described to create the sgRNAs for the CCoAOMT isoforms; the selected targets and the forward primers are shown in Table 2.


Table 2. Primers and position in the gene to create the sgRNAs for all haplotypes of CCoAOMT.

| Gene | sgRNA target (20 nt) | Selected region | sgRNA F primer | Name |
| CCoAOMT 1 and 2 | TGCCATCATCGGGAAGAGCCA | 272-292 | tgtggtctca ATTG TGCCATCATCGGGAAGAGCCA gttttagagctagaaatagcaag | sgRNA5.1 |
| CCoAOMT 3 | TCATCAATGCCAAAAACACAA | 206-226 | tgtggtctca ATTG TCATCAATGCCAAAAACACAA gttttagagctagaaatagcaag | sgRNA5.5 |
| CCoAOMT 1 and 2 | TCATCGGCAGAGGTGGTCATG | 156-176 | tgtggtctca ATTG TCATCGGCAGAGGTGGTCATG gttttagagctagaaatagcaag | sgRNA5.6 |
| CCoAOMT 3 | TCATCAGCAGAGGTGGTCATG | 156-176 | tgtggtctca ATTG TCATCAGCAGAGGTGGTCATG gttttagagctagaaatagcaag | sgRNA5.7 |
| CCoAOMT 1 | CTTAGCTCTTTCATGGGCTC | -106 | tgtggtctca ATTG CTTAGCTCTTTCATGGGCTC gttttagagctagaaatagcaag | sgRNA11.1 |
| CCoAOMT 1 | AGATCACCGCAAAACACCCC | 147 | tgtggtctca ATTG AGATCACCGCAAAACACCCC gttttagagctagaaatagcaag | sgRNA11.2 |
| CCoAOMT 2 | TTCTTTCGTTGGTGTTGAAG | -32 | tgtggtctca ATTG TTCTTTCGTTGGTGTTGAAG gttttagagctagaaatagcaag | sgRNA10.1 |
| CCoAOMT 2 | AATGGAAGACATCAAGAAGT | 98 | tgtggtctca ATTG AATGGAAGACATCAAGAAGT gttttagagctagaaatagcaag | sgRNA10.2 |
| CCoAOMT 3 | GAGGTGGTCATGATGTTCCA | -157 | tgtggtctca ATT GAGGTGGTCATGATGTTCCA gttttagagctagaaatagcaag | sgRNA9.1 |
| CCoAOMT 3 | TGACCACCTCTGCTGATGAA | 186 | tgtggtctca ATTG TGACCACCTCTGCTGATGAA gttttagagctagaaatagcaag | sgRNA9.2 |
| CCoAOMT 3 | CATCAATGCCAAAAACACAA | 235 | tgtggtctca ATTG CATCAATGCCAAAAACACAA gttttagagctagaaatagcaag | sgRNA9.3 |
| CCoAOMT 4 | CACTGGTTTCAAGTATGTAC | -72 | tgtggtctca ATTG CACTGGTTTCAAGTATGTAC gttttagagctagaaatagcaag | sgRNA8.1 |
| CCoAOMT 4 | ATCCATGAAAGAGCTCAGGG | 130 | tgtggtctca ATTG ATCCATGAAAGAGCTCAGGG gttttagagctagaaatagcaag | sgRNA8.4 |
| CCoAOMT 4 | AGGTGACTGCTAAGCATCCA | 150 | tgtggtctca ATTG AGGTGACTGCTAAGCATCCA gttttagagctagaaatagcaag | sgRNA8.5 |

The vector pICSL70001 contains the sgRNA scaffold. Amplification with the primer pair from the previous step resulted in a PCR product that is the specific sgRNA, as shown in Figure 5.


Figure 5. PCR amplification with the vector pICSL70001 containing the sgRNA scaffold and the sgRNA primer designed for each sgRNA from CCoAOMT 1 -4; Tm= 60°C. M - Marker NEB 2 -log; 1 – sgRNA 5.1; 2 – sgRNA 5.5; 3 – sgRNA 5.6.; 4 – sgRNA 5.7; 5 – sgRNA 8.1; 6 – sgRNA 8.4; 7 – sgRNA 8.5; 8 – sgRNA 9.1; 9 – sgRNA 9.2; 10 – sgRNA 9.3; 11 – sgRNA 10.1; 12 – sgRNA 10.2; 13 – sgRNA 11.1 14 – sgRNA 11.2.

3.1.3. CSE

The caffeoyl shikimate esterase (CSE) is an enzyme recently discovered as part of the lignin pathway (Vanholme et al., 2013b). We have already characterized the role of CSE in lignin biosynthesis in tobacco in Chapter 2. Our previous results showed that strong CSE downregulation leads to a phenotype similar to that found in HCT-downregulated plants (Hoffmann et al., 2004) (results shown in Chapter 2). We wanted to confirm that the dwarfed phenotype found in Chapter 2 is due to the downregulation of CSE and is not related to the use of amiRNA. One of the advantages of genome editing by CRISPR/Cas9 is that the results are not affected by the position where the transgene was inserted, since the transgene can be easily removed by Mendelian segregation. For this reason, we decided to create CRISPR/Cas9 mutants to edit both CSE haplotypes and validate these results. The sgRNA design strategy was the same as previously described for HCT, although the goal was to target both CSE haplotypes (described in detail in Chapter 2) at once. Table 3 lists the forward primers designed for the sgRNAs:


Table 3. Primers and position in the gene to create the sgRNAs for all haplotypes of CSE.

| Gene | sgRNA target (20 nt) | Selected region | sgRNA F primer | Name |
| CSE | ATCTTACTTCGAAACACCCAA | 93-113 | tgtggtctca ATTG ATCTTACTTCGAAACACCCAA gttttagagctagaaatagcaag | sgRNA4.1 |
| CSE | ATTGAGTGAAGAGCTTGCCGT | 113-133 | tgtggtctca ATTG ATTGAGTGAAGAGCTTGCCGT gttttagagctagaaatagcaag | sgRNA4.2 |
| CSE | GGCTACGGTTCCGATACCGGT | 190-210 | tgtggtctca ATT GGCTACGGTTCCGATACCGGT gttttagagctagaaatagcaag | sgRNA4.4 |
| CSE | CCGGTATCGGAACCGTAGCCA | 189-209 | tgtggtctca ATTG CCGGTATCGGAACCGTAGCCA gttttagagctagaaatagcaag | sgRNA4.5 |

The vector pICSL70001 contains the sgRNA scaffold. Amplification with the primer pair from the previous step resulted in a PCR product that is the specific sgRNA, as shown in Figure 4.

3.2. Construct DNA assembly and multiplex targeting

To assemble CRISPR/Cas9 with the sgRNAs we used the Golden Gate Modular Cloning Toolkit (MoClo) (Addgene kit # 1000000044; Addgene kit # 1000000047). This toolkit speeds up the assembly of multiple genetic elements (Engler et al., 2008; Weber et al., 2011; Werner et al., 2012; Engler et al., 2014; Marillonnet and Werner, 2015). The method is based on the use of type IIS restriction enzymes, BsaI and BpiI, combined with restriction-ligation to create multigene constructs with just a few reactions, and relies on different vector levels: Level 0, Level 1 and Level 2 (Engler et al., 2008). Level 0 is the entry level, used to clone the promoter, terminator, and gene individually. The next level is Level 1, a binary vector into which the promoter, the gene of interest and the terminator are inserted together by combining Level 0 constructs. There are seven positions for Level 1 vectors, and the position chosen determines the position of each construct in the Level 2 vector, depending on how many constructs need to be assembled (Engler et al., 2008; Weber et al., 2011; Werner et al., 2012; Engler et al., 2014; Marillonnet and Werner, 2015). Consequently, it is important to design the final construct before starting to clone into Level 1 vectors. To assemble our sgRNAs with the hCas9 protein (Mali et al., 2013), the first step was to design the final construct to determine in which Level 1 position each gene or sgRNA should be cloned (Table 4). Then we had to determine which resistance gene would be used in the final construct; a Level 1 construct containing the nptII gene fused to the Nos promoter was used. Furthermore, a Level 1 construct with the Cas9 described by Mali et al., (2013) fused to the CaMV35S promoter (pICSL11021) was used. The Level 1 constructs containing the sgRNAs were made by fusing each sgRNA to the U6-26 promoter from Arabidopsis thaliana (Lawrenson et al., 2015). Level 0 modules containing the promoter and terminator were selected from the existing library and assembled with the sgRNA to create the Level 1 U6-26::sgRNA. The PCR confirming the insertion of U6-26::sgRNA into Level 1 can be seen in Figures 5 and 6.

Table 4. Experiment design to assemble Level 1 vectors.

| Position | Gene | Name of vector | sgRNA name |
| 1 | Kan, CCoAOMT3 | pICH47732 | 5.5; 9.3 |
| 2 | Cas9 | pICH47742 | - |
| 3 | HCT1, CSE, CCoAOMT1,2,3 | pICH47751 | 1.1; 1.4; 4.1; 5.1; 5.7; 9.1; 9.2; 11.1 |
| 4 | HCT1, CSE, CCoAOMT1,2,3 | pICH47761 | 1.2; 1.; 4.2; 5.5; 5.6; 9.3; 11.2 |
| 5 | HCT2, CCoAOMT2,4 | pICH47772 | 2.2; 2.6; 4.4; 8.1; 10.1 |
| 6 | HCT2, CCoAOMT4 | pICH47781 | 2.3; 2.7; 4.5; 8.4; 8.5; 10.2 |
| 7 | CCoAOMT3 | pICH47791 | 5.7; 9.1; 9.2 |
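Because the Level 1 position of each module fixes its place in the final construct, it can help to check a planned assembly before cloning. The Python sketch below takes the acceptor vector names from Table 4 and verifies that a plan uses consecutive positions starting at 1, which is our understanding of the MoClo requirement. The function name and the example plan are hypothetical; the example does not correspond to any specific construct of this work (those are listed in Supplementary Table 6).

```python
# Level 1 acceptor vectors by position, as listed in Table 4.
LEVEL1_ACCEPTORS = {
    1: "pICH47732", 2: "pICH47742", 3: "pICH47751", 4: "pICH47761",
    5: "pICH47772", 6: "pICH47781", 7: "pICH47791",
}

def check_level2_plan(plan: dict) -> list:
    """Validate a Level 2 plan given as {position: module label}.

    Returns (position, acceptor vector, module) tuples in assembly order.
    Raises ValueError if positions are not consecutive from 1 or exceed
    the seven positions available.
    """
    positions = sorted(plan)
    if not positions or positions != list(range(1, len(positions) + 1)):
        raise ValueError("positions must be consecutive and start at 1")
    if positions[-1] > max(LEVEL1_ACCEPTORS):
        raise ValueError("only seven Level 1 positions are available")
    return [(p, LEVEL1_ACCEPTORS[p], plan[p]) for p in positions]

# Illustrative plan: selection marker, Cas9 and two sgRNAs.
example_plan = {
    1: "pNOS::NptII",
    2: "35S::Cas9",
    3: "U6-26::sgRNA 4.1",
    4: "U6-26::sgRNA 4.2",
}
```

Calling check_level2_plan(example_plan) returns the four modules with their acceptor vectors in order, while a plan that skips a position is rejected before any cloning is attempted.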

Figure 5. Colony PCR amplification from Level 1 vectors contains the U6-26 fused with HCT and CSE. Tm= 60°C. M – NEB 2-log; 1 – sgRNA 1.1; 2 – sgRNA 1.2 colony 1; 3 – sgRNA 1.2 colony 2; 4 – sgRNA 1.4 colony 1; 5 – sgRNA 1.4 colony 2; 6 – sgRNA 1.6; 7 – sgRNA 2.2 colony 1; 8 – sgRNA 2.2 colony 2; 9 – sgRNA 2.3 colony 1; 10 – sgRNA 2.3 colony 2; 11 – sgRNA 2.6 colony1; 12 – sgRNA 2.6 colony 2; 13 – sgRNA 2.7 colony 1; 14 – sgRNA 2.7 colony 2; 22 – sgRNA 4.1 colony 1; 23 – sgRNA 4.1 colony 2; 24 – sgRNA 4.2 colony 1; 25 – sgRNA 4.4 colony 1; 26 – sgRNA 4.4 colony 2; 27 – sgRNA 4.5 colony 1; 28 – sgRNA 4.5 colony 2.


Figure 6. Colony PCR amplification from Level 1 vectors contains the U6-26 fused with CCoAOMT 1-4. Tm= 60°C. M – NEB 2-log; 1 – sgRNA 5.1; 2 – sgRNA 5.1 colony 1; 3 – sgRNA 5.5 colony 2; 4 – sgRNA 5.6; 5 – sgRNA 5.7 colony 1; 6 – sgRNA 5.7 colony 2; 7 – sgRNA 8.1; 8 – sgRNA 8.4; 9 – sgRNA 8.5; 10 – sgRNA 9.1 colony 1; 11 – sgRNA 9.1 colony 2; 12 – sgRNA 9.2 colony 1; 13 – sgRNA 9.2 colony 2; 14 – sgRNA 9.3 colony 1; 15 – sgRNA 9.3 colony 2; 16 – sgRNA 10.1; 17 – sgRNA 10.2; 18 – sgRNA 11.1; 19 – sgRNA 11.2; 22 – Blank.

The next level is Level 2. For the constructs containing only 4 sgRNAs, we used Level 2 as the final binary level (Figure 7). In Level 2 it is possible to join 6 different Level 1 modules to create one Level 2 construct (Engler et al., 2014; Marillonnet and Werner, 2015). To add more than 6 Level 1 modules, it is necessary to use Level M. Level M works exactly like Level 2, but it is not the final vector: there are 7 Level M vectors that can be combined to form one Level P, the final binary vector (Weber et al., 2011; Werner et al., 2012). We used a combination of two sgRNAs per haplotype, distant enough from each other (100 bp) to create deletions that produce truncated proteins. The sgRNAs were designed at the beginning of the mRNA to guarantee that the protein produced after the deletion would be non-functional. The design of Level 2 is shown in Figure 8 and the combination of sgRNAs used in each construct can be seen in Supplementary Table 6.
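The expected outcome of using two sgRNAs on the same haplotype can be estimated from the positions of the two targets. The sketch below is a simplified calculation that assumes both cuts occur 3 bp upstream of the respective PAM and that the fragment between them is lost cleanly. The coordinates in the usage note are hypothetical and the function names are not part of any tool used in this work.

```python
def cut_position(start: int, end: int, strand: str = "+") -> int:
    """1-based forward-strand coordinate of the base just 5' of the cut.

    `start` and `end` are the forward-strand coordinates of the protospacer.
    For a '+' target the PAM follows `end`; for a '-' target it precedes
    `start`. The cut is assumed to lie 3 bp from the PAM.
    """
    if strand == "+":
        return end - 3
    if strand == "-":
        return start + 2
    raise ValueError("strand must be '+' or '-'")

def deletion_outcome(site_a, site_b, amplicon_len=None) -> dict:
    """Predicted deletion between two (start, end, strand) target sites."""
    cuts = sorted(cut_position(*site) for site in (site_a, site_b))
    size = cuts[1] - cuts[0]
    outcome = {"deletion_bp": size, "frameshift": size % 3 != 0}
    if amplicon_len is not None:
        # Expected PCR product size if the fragment between the cuts is lost.
        outcome["edited_amplicon_bp"] = amplicon_len - size
    return outcome
```

For two forward-strand targets at positions 101-120 and 201-220 within a 500 bp amplicon, the sketch predicts a 100 bp deletion and a 400 bp edited band, which is the kind of size shift that can be detected by PCR. The frameshift flag is only meaningful when both cuts fall within the same exon.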

Figure 7. Schematic of binary vectors delivered in N. tabacum by agro-transient assay and target sequence. A) The tobacco construct has a neomycin phosphotransferase II (NptII) gene driven and terminated by the Nopaline Synthase promoter (pNOS) and terminator (T-NOS); a Cas9 expression cassette consisting of the human codon-optimized sequence of Cas9 from S. pyogenes (SpCas9h) driven and terminated by the 35S promoter and terminator; and two single guide RNAs (sgRNA) driven by the A. thaliana U6 promoter; B) The gene model is represented by two exons (blue boxes) and one intron (blue line); the sgRNAs are shown below the target region in exon one, in the same region as the recognition sites of the restriction enzymes (RE).


Figure 8. Schematic of all binary vectors delivered in N.tabacum by agro-transient assay. All constructs have a neomycin phosphotransferase II (NptII) gene driven and terminated by Nopaline Synthase promoter (pNOS) and terminator (T-NOS); a Cas9 expression cassette consisting in the sequence of Cas9 from S. pyogenes human codon-optimized (SpCas9h) driven and terminated by 35S promoter and terminator; and two single guide RNA (sgRNA) driven by A. thaliana U6 promoter. The figure (A – E) represent all constructions tested to target HCT gene; (F) represent the construction tested to target CSE genes and (G – N) represent all the constructions tested to target CCoAOMT.

Level 2 was assembled, and the positive colonies were mini-prepped and digested with the enzyme HindIII for 1 hour at 37°C (Figures 9, 10 and 11). The plasmids with the correct digestion pattern were sequenced and introduced into Agrobacterium tumefaciens for further tests.

Figure 9. Digestion of Level 2 constructs for HCT and CSE with the enzyme HindIII to confirm the assembly of Level 1. M – NEB 2-log; A1 – Level 2 A colony 1; A2 – Level 2 A colony 2; B1 – Level 2 B colony 1; C1 – Level 2 C colony 1; C2 – Level 2 C colony 2; D1 – Level 2 D colony 1; D2 – Level 2 D colony 2; E1 – Level 2 E colony 1; E2 – Level 2 E colony 2; F1 – Level 2 F colony 1; F2 – Level 2 F colony 2. The plasmids from construct E did not work and four new colonies were selected and digested (Figure 10).

Figure 10. Digestion of Level 2 constructs for HCT and CSE with the enzyme HindIII to confirm the assembly of Level 1. M – NEB 2-log; E3 – Level 2 E colony 3; E4 – Level 2 E colony 4; E5 – Level 2 E colony 5; E6 – Level 2 E colony 6.

Figure 11. Digestion of Level 2 constructs for CCoAOMT 1-4 with the enzyme HindIII to confirm the assembly of Level 1. M – NEB 2-log; M1 – Level 2 M colony 1; I1 – Level 2 I colony 1; I2 – Level 2 I colony 2; I3 – Level 2 I colony 3; J1 – Level 2 J colony 1; J2 – Level 2 J colony 2; K1 – Level 2 K colony 1; M2 – Level 2 M colony 2.

Figures 9, 10 and 11 show the Level 2 constructs for HCT and CSE that were properly assembled. According to Martz et al. (1998), CCoAOMT3 and CCoAOMT4 are the isoforms most highly expressed in vascular tissue, suggesting that these isoforms are likely the most important for lignification. CCoAOMT2 was more highly expressed in flowers (Martz et al., 1998). Accordingly, the subsequent experiments were carried out with the constructs that were successfully produced.

3.3. Agroinfiltration and Genotyping

To evaluate the efficiency of the designed sgRNAs, we performed agroinfiltration in tobacco leaves. The plants were cultivated under greenhouse conditions for 4 weeks before agroinfiltration, and 4 replicates were used per group. Before assembling the Level 2 vectors, we performed the infiltration with different mixtures of Level 1 vectors (Supplementary Table 7) to test which combination would be the most efficient for constructing the Level 2 vectors and to evaluate whether the sgRNAs were working properly. Three days after infiltration, we extracted the genomic DNA to perform PCR amplification and detect deletions.

Figure 12. PCR using as template the genomic DNA of leaves after the transient expression assay. The name of each group can be found in table 11; each sample corresponds to a mix of the four replicates. M – NEB 2-log; 1 – blank targeted with primers from HCT4; 2 – WT DNA targeted with primers from HCT4; 3 – TE 1 targeted with primers from HCT4; 4 – TE 2 targeted with primers from HCT4; 5 – Blank targeted with primers from HCT3; 6 – WT DNA targeted with primers HCT3; 7 – TE 1 targeted with primers HCT3; 8 – TE 2 targeted with primers HCT3; 9 – Blank targeted with primers HCT1; 10 – WT DNA targeted with primers HCT1; 11 – TE 3 targeted with primers HCT1; 12 – TE 4 targeted with primers HCT1; 13 - Blank targeted with primers HCT2; 14 – WT DNA targeted with primers HCT2; 15 – TE 3 targeted with primers HCT2; 16 – TE 4 targeted with primers HCT2; 17 – Blank targeted with primers CSE1; 18 – WT DNA targeted with primers CSE1; 19 – T5 targeted with primers CSE1; 20 – T6 targeted with primers CSE1; 21 – T7 targeted with primers CSE1; 22 – Blank targeted with primers CSE2; 23 – WT DNA targeted with primers CSE2; 24 – T5 targeted with primers CSE2; 25 – T6 targeted with primers CSE2; 26 – T7 targeted with primers CSE2.

We could not detect any mutation by PCR (Figure 12). Therefore, we decided to use the restriction enzyme site loss method (RE) to enrich the DNA for the edited sequence. In this strategy, we digested the PCR product with a restriction enzyme that cuts in the region to be edited (Supplementary Table 7). Previous studies showed that Cas9 cuts around 3 or 4 bp upstream of the PAM (Lawrenson et al., 2015). If editing is successful, the sequence loses the recognition site for the RE and is not cut (Nekrasov et al., 2013; Lawrenson et al., 2015). After digestion with the RE, the digested product was re-amplified by PCR. Even after the digestion, we could not identify any mutation (Supplementary Figure 4). From these results, it was impossible to conclude whether the sgRNAs were working, and for this reason we changed our strategy in three ways:

1) We had mixed Level 1 vectors to perform the agroinfiltration, which could affect the efficiency of the transient assay. To guarantee that the Cas9 protein and the sgRNAs were delivered to the same cell at the same time, the transient assay was performed using only Level 2 vectors (Figure 8).

2) We had harvested the plants only 3 days after the transient assay, as described in the literature (Sparkes et al., 2006; Nekrasov et al., 2013). Although this is the time of peak protein expression (Sparkes et al., 2006; Nekrasov et al., 2013), the protein may not have had enough time to induce the mutation, since in CRISPR/Cas9 genome editing we have to wait for the Cas9 protein to edit the genome. We therefore changed the harvest time from 3 days to 6 days (Figure 13) to provide enough time for genome editing.

3) We had performed the RE assay after the PCR, whereas it would have been more efficient to enrich the genomic DNA. We therefore changed the restriction enzyme site loss assay: the genomic DNA was digested before the first PCR, in order to enrich the genomic DNA for the edited sequences. The enzymes used for each sgRNA are described in Table 10.
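The logic of the restriction enzyme site loss assay can be sketched in a few lines: a site is informative only if the predicted Cas9 cut falls inside it, and an edited allele is enriched only if it no longer contains the site. The example assumes a forward-strand guide, a cut 3 bp upstream of the PAM, and a palindromic recognition site such as AgeI (ACCGGT), so only one strand is scanned. The function names and any sequences passed to them are illustrative assumptions, not data from this work.

```python
# Sketch: does a restriction site overlap the predicted Cas9 cut, and does an
# allele survive digestion? Forward-strand guide and palindromic site assumed.
def site_overlaps_cut(amplicon, protospacer, re_site):
    """True if the predicted cut (3 bp upstream of the PAM) lies inside an RE site."""
    start = amplicon.find(protospacer)
    if start == -1:
        raise ValueError("protospacer not found in amplicon")
    cut = start + len(protospacer) - 3  # index of the first base after the cut
    pos = amplicon.find(re_site)
    while pos != -1:
        if pos < cut < pos + len(re_site):
            return True
        pos = amplicon.find(re_site, pos + 1)
    return False


def resists_digestion(allele, re_site):
    """True if the allele lacks the recognition site and is therefore not cut."""
    return re_site not in allele
```

Digesting the genomic DNA before the first PCR, as in the revised protocol, removes templates for which `resists_digestion` would be false, so that edited alleles dominate the amplification.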

Figure 13. Transient assay 6 days after agroinfiltration; the plants were incubated for 2 days in the dark and for 4 days under a 12:12 dark:light photoperiod.

After these changes, we could detect which sgRNAs were working and sent the corresponding samples for sequencing to detect mutations (Figure 14 B, Figure 15 B). For HCT, only the construct targeting group 2 (haplotypes HCT1 and HCT2) worked properly (Figure 14). We sequenced the PCR samples to confirm the genome editing; in all sequences analyzed we obtained a mixture of sequences after the seed region, indicating the presence of both cells with mutations and cells without mutations (Figure 14 B). Considering that HCT haplotype 1 was previously characterized as important for lignin biosynthesis (Hoffmann et al., 2004), the production of stably edited plants will be performed with these sgRNAs. The use of sgRNAs that target only one specific group of haplotypes can probably overcome the dwarf phenotype found previously by Hoffmann et al. (2004).

Figure 14. Results from the agro-transient assay with the sgRNAs for HCT. A) PCR from the agro-transient assay with the sgRNAs for HCT after digestion with the enzymes Hpy166II and BseLI; the figure shows which samples were digested with each enzyme. The first row corresponds to the PCR using primers for HCT group 1: samples 1 – 13 and 27 – 37 were amplified with primers targeting HCT4, while samples 14 – 26 and 38 – 48 were amplified with primers targeting HCT3. M – NEB 2-log; 1 – WT DNAg digested with Hpy166II; 2 – 7 DNAg from TE using Construction A Replicate 1 – 6 digested with Hpy166II; 8 – 11 DNAg from TE using Construction B Replicate 1 – 4 digested with Hpy166II; 12 – WT DNAg without digestion; 13 – Blank; 14 – WT DNAg digested with Hpy166II; 15 – 20 DNAg from TE using Construction A Replicate 1 – 6 digested with Hpy166II; 21 – 24 DNAg from TE using Construction B Replicate 1 – 4 digested with Hpy166II; 25 – WT DNAg without digestion; 26 – Blank; 27 – WT DNAg digested with BseLI; 28 – 33 DNAg from TE using Construction A Replicate 1 – 6 digested with BseLI; 34 – 37 DNAg from TE using Construction B Replicate 1 – 4 digested with BseLI; 38 – WT DNAg digested with BseLI; 39 – 44 DNAg from TE using Construction A Replicate 1 – 6 digested with BseLI; 45 – 48 DNAg from TE using Construction B Replicate 1 – 4 digested with BseLI. The second row corresponds to the PCR using primers for HCT group 2: samples 49 – 61 were amplified with primers targeting HCT1, while samples 62 – 74 were amplified with primers targeting HCT2. M – NEB 2-log; 49 – Blank; 50 – WT DNAg; 51 – WT DNAg digested with Hpy166II; 52 – 57 DNAg from TE using Construction A Replicate 1 – 6 digested with Hpy166II; 58 – 61 DNAg from TE using Construction B Replicate 1 – 4 digested with Hpy166II; 62 – WT DNAg digested with Hpy166II; 63 – 68 DNAg from TE using Construction A Replicate 1 – 6 digested with Hpy166II; 69 – 72 DNAg from TE using Construction B Replicate 1 – 4 digested with Hpy166II; 73 – WT DNAg; 74 – Blank. B) PCR sequencing to detect mutations: the first line shows the WT sequence (indicated in red), followed by three hct-crispr mutated sequences (indicated in blue).

All CSE constructs worked as expected (Figure 15 A). Sequencing confirmed the presence of mutations: both CSE1 and CSE2 were mutated by CRISPR/Cas9. For CSE1 we obtained deletions, whereas for CSE2 we obtained a chromatogram with a mixture of sequences after the seed region, indicating the presence of cells with different sequences (Figure 15 B). In both cases, the mutations were sufficient to shift the reading frame of the encoded protein. Both constructs for CCoAOMT4 worked properly (Figure 16).

Figure 15. Results from the agro-transient assay with the sgRNAs for CSE. A) PCR from the agro-transient assay with the sgRNAs for CSE after digestion with the enzymes AgeI and BceAI; the figure shows which samples were digested with each enzyme. The first row corresponds to the PCR using primers for CSE1. M – NEB 2-log; 1 – Blank; 2 – 5 DNAg from TE using Construction F Replicate 1 – 4 digested with AgeI; 6 – WT DNAg digested with AgeI; 7 – WT DNAg without digestion; 8 – WT DNAg digested with BceAI; 9 – 12 DNAg from TE using Construction F Replicate 1 – 4 digested with BceAI. The second row corresponds to the PCR using primers for CSE2. M – NEB 2-log; 13 – Blank; 14 – WT DNAg without digestion; 15 – WT DNAg digested with AgeI; 16 – 19 DNAg from TE using Construction F Replicate 1 – 4 digested with AgeI; 20 – WT DNAg digested with BceAI; 21 – 24 DNAg from TE using Construction F Replicate 1 – 4 digested with BceAI. B) PCR sequencing to detect mutations. The CSE1 sequences are indicated in blue: the first line shows the WT sequence of CSE1 and the two lines below show two cse1-crispr mutated sequences. The CSE2 sequences are indicated in red: the first line shows the WT sequence and the two lines below show two cse2-crispr mutated sequences.

Figure 16. PCR from the agro-transient assay with the sgRNAs for CCoAOMT4 after digestion with the enzyme Hpy166II. M – NEB 2-log; 1 – Blank; 2 – WT DNAg without digestion; 3 – WT DNAg digested with Hpy166II; 4 – 7 DNAg from TE using Construction I Replicate 1 – 4 digested with Hpy166II; 8 – 11 DNAg from TE using Construction J Replicate 1 – 4 digested with Hpy166II.

4. Discussion

In the present study, we achieved targeted mutagenesis of three key genes of lignin metabolism (CSE, HCT, and CCoAOMT) in tobacco using the CRISPR/Cas9 system delivered by agro-transient expression. This achievement opens the possibility of using stable transformation of tobacco to edit the lignin/CGA pathway. The CRISPR/Cas9 system has previously been used to target genes of lignin metabolism such as 4-coumarate:CoA ligase (4CL), involved in the early steps of the lignin pathway, in poplar and in switchgrass (Zhou et al., 2015; Park et al., 2017). Recently, Takeda et al. (2019) targeted coniferaldehyde 5-hydroxylase (OsCAld5H1) in rice. All these studies showed a high level of accuracy and reproducibility in the study of lignin metabolism. In tobacco, previous studies have used CRISPR/Cas9 to target marker genes either in protoplasts or in stable plants (Jiang et al., 2013; Li et al., 2013; Nekrasov et al., 2013; Gao et al., 2015), but none of them targeted genes of lignin metabolism.

Here, we used databases and online design tools to find the best sgRNAs for the CSE and CCoAOMT genes; these online tools identify the best seed sequence near the PAM while already considering potential off-targets. For HCT, on the other hand, we designed the sgRNAs manually, since our goal was to distinguish between the four alleles found in the tobacco genome. Compared with conventional downregulation by transgenic approaches (RNAi, amiRNA), CRISPR/Cas9 has the advantage of allowing the transgene to be segregated away, thereby eliminating the effect of the transgene insertion (Xu et al., 2015; Zhou et al., 2015). To design sgRNAs for HCT, we gathered the alleles into two groups based on their similarity: one group closer to the ancestor N. sylvestris (2n=24) and the other closer to N. tomentosiformis (2n=24). The tobacco (N. tabacum) genome is allotetraploid (2n=4x=48) (Leitch et al., 2008) and, for this reason, the design of specific sgRNAs was more challenging, especially considering our goal of targeting only specific haplotypes. To achieve this goal, we excluded the regions where the sequences were similar and designed the sgRNAs manually in a region where the PAM was present in the alleles of interest and absent in the other two alleles.

Allele-specific targeting has been previously reported for genome editing in animals and humans owing to its applicability in gene therapy (Yoshimi et al., 2014; Shin et al., 2016). Shin et al. (2016) developed a CRISPR/Cas9 strategy using two sgRNAs to increase specificity in human cells. The dual-gRNA approach, using PAM-altering SNPs, improved allele discrimination. The same strategy was used in rats by Yoshimi et al. (2014). That group used a combination of sgRNAs, one targeting the mutant allele of interest and the other targeting the wild-type allele. In this way, it was possible to correct only the disease-associated phenotypes in rats. Here we used a combination of two sgRNAs, both specific to the alleles we wanted to target, in order to create a deletion.

To assemble the constructs containing the sgRNAs, Cas9 and the marker gene, we used the Golden Gate MoClo toolkit. We selected this methodology because it is the most widely used for multiplex assembly, enabling the simultaneous assembly of multiple sgRNAs (for a review see Volpi e Silva and Patron, 2017). Using this strategy, we designed vectors that enabled us to successfully edit the CSE, CCoAOMT and HCT group 2 (HCT1 and HCT2) alleles. RE-PCR was sufficient to identify the sequences with mutations in HCT1 and HCT2, since the sgRNAs were designed to induce a 100 bp deletion (Figure 14 A). Because we used an agro-transient assay to induce mutations, a mixture of cells was expected, only some of them carrying mutations. For this reason, we identified two bands in the agarose gel, one representing the sequences without mutation and/or with point mutations and another (100 bp smaller) carrying the expected deletion.
In accordance with these data, our sequencing showed the presence of mutations and deletions in the expected region (Figure 14 B). Therefore, we achieved our goal for this gene, which was to create a construct capable of targeting one group of HCT alleles. This construct enables us to evaluate the impact of HCT on lignin biosynthesis without interfering with plant development. Furthermore, we successfully targeted both alleles of CSE using a combination of two sgRNAs (Figure 15 A and B). Both alleles were sequenced, and we observed mutations in both CSE1 and CSE2 (Figure 15 B). The construct designed to induce mutations in CCoAOMT4 also worked as expected: the RE-PCR showed two bands, one indicating deletions and another with the WT sequence (Figure 16). The RE-PCR-based method is a common strategy to detect CRISPR/Cas9 mutations, since the use of an RE can enrich the DNA for edited sequences and facilitate mutation detection by PCR (Nekrasov et al., 2013; Lor et al., 2014; Wang et al., 2014b; Lawrenson et al., 2015). In N. benthamiana, this method was used to validate CRISPR/Cas9 efficiency via agro-transient expression assay (Nekrasov et al., 2013). Lawrenson et al. (2015) also used this methodology in barley and B. oleracea to screen stable plant transformants and find the expected CRISPR/Cas9 mutations. Thus, the methodology used here to identify CRISPR/Cas9 mutations in the agro-transient assay can also be applied to the screening of stable tobacco plants generated with the constructs we designed.
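The allele-specific design principle discussed in this section, choosing a region where the PAM is present in the alleles of interest and absent in the others, can be expressed as a simple scan over aligned allele sequences. The sketch assumes gap-free alignments of equal length, considers only forward-strand NGG PAMs, and uses a hypothetical function name; the manual design in this work also took sequence similarity and restriction sites into account, which this sketch does not.

```python
# Sketch: find forward-strand NGG PAM positions present in every target allele
# and absent from every other allele. Inputs are assumed to be gap-free,
# equal-length aligned sequences; all names here are illustrative.
def allele_specific_pams(alleles, targets, guide_len=20):
    """Return PAM start indices (the N of NGG) that discriminate the target alleles."""
    lengths = {len(seq) for seq in alleles.values()}
    if len(lengths) != 1:
        raise ValueError("aligned allele sequences must have equal length")
    length = lengths.pop()
    hits = []
    for i in range(guide_len, length - 2):
        has_pam = {name: seq[i + 1:i + 3] == "GG" for name, seq in alleles.items()}
        in_targets = all(has_pam[name] for name in targets)
        in_others = any(has_pam[name] for name in alleles if name not in targets)
        if in_targets and not in_others:
            hits.append(i)
    return hits
```

For each returned index `i`, the candidate protospacer is the `guide_len` bases immediately upstream, that is `seq[i - guide_len:i]` in a target allele.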

In conclusion, we successfully designed and tested sgRNAs for HCT, CSE, and CCoAOMT and induced mutations by transient expression in tobacco leaves using CRISPR/Cas9. Our next step is to create stably edited plants and evaluate whether the resulting loss-of-function mutants show any effect on lignin and/or CGA biosynthesis.

Acknowledgments

NVS thanks the São Paulo Research Foundation for a BEPE doctoral fellowship (Processo FAPESP n° 2016/15834-2). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

FINAL CONCLUSION

In conclusion, Chapter 1 reviewed and discussed the biochemical and molecular evidence for the metabolic re-routing of CGAs towards lignin, giving more support to the data found in Chapter 2. We developed transgenic tobacco plants by the traditional Agrobacterium methodology for 7 different constructs and carried out phenotypic, histochemical, biochemical and molecular analyses of the transgenic plants carrying the constructs HCTamiCSE, CSE and amiCSE (overexpressing HCT and silencing CSE; only overexpressing CSE; and only silencing CSE, respectively). Our qPCR analyses of different organs from WT plants indicate that there is no competition for the common intermediates between CGA and lignin biosynthesis. CSE showed an expression pattern that suggests its involvement in both pathways. The abundance of HCT in the stem seems to favor the lignin branch, probably owing to its low affinity for converting caffeoyl CoA into CGA. In leaves, on the other hand, where HQT is abundant, the higher affinity of this enzyme for quinate seems to favor CGA biosynthesis. In support of this hypothesis, in our CSE-overexpressing plants the excess carbon generated by overexpression of CSE was remobilized into CGA in leaves but not in the stem. These differences in carbon remobilization between leaves and stem might be part of a strategy to finely regulate the lignin branch, since an excess of caffeic acid can inhibit the lignin route. The development of cse mutants indicates that CSE is essential for plant development, since the plants were severely dwarfed, possibly owing to the collapse of vascular vessels. The downregulation of CSE leads to a reduction in the HQT transcript level, reinforcing the interconnection between both pathways. Interestingly, the overexpression of HCT in plants downregulating CSE completely rescued the dwarfed phenotype, indicating that HCT is capable of producing caffeoyl CoA from caffeoyl shikimate.
This finding is remarkable, since it is the first time that the ability of HCT to convert caffeoyl shikimate has been demonstrated in planta. Moreover, comparing both mutants, we found opposite patterns of cellulose content, which might be associated with the differences in saccharification efficiency found in these plants. Even though HCT and CSE are part of the same branch of the phenolic pathway, they have different roles in plant metabolism and trigger different responses upon perturbation. Lignin metabolic flux seems to be finely regulated in tobacco, since overexpression of the shikimate branch does not drastically affect lignin content. The accumulation of caffeic acid in both mutants indicates that this compound might be a key point in the regulation of the lignin pathway in tobacco. In Chapter 3 we developed constructs to edit the tobacco genome by CRISPR/Cas9 and tested them by agro-transient assay. Furthermore, to help clarify the data found in Chapter 2, we also developed constructs to partially silence the HCT gene, in an attempt to overcome the dwarfed phenotype previously described, and designed a construct to target CCoAOMT4, the isoform with the highest level of expression in the stem and the next step in the pathway after caffeoyl CoA formation by HCT or CSE. All these constructs were tested by agro-transient assay in tobacco, and we were successful in generating mutations. The development of these constructs enables us to generate mutated plants more specifically and with more accurate results, in order to understand the lignin/CGA connection. We were able to develop 8 stable plants with the HCT CRISPR/Cas9 constructs, but these plants did not show any mutation, probably because of the low number of plants analyzed. To summarize, our data indicate that CSE plays the main role in the production of caffeoyl CoA in the shikimate branch, since its downregulation severely affects plant development. Moreover, CSE overproduction favors the CGA pathway, indicating a role of the shikimate branch in the production of CGA in tobacco. On the other hand, HCT is critical for lignin metabolism, and its overexpression combined with cse downregulation does not seem to affect the CGA pool (probably owing to its low affinity for quinate). Our work also indicates that HCT is capable of producing caffeoyl CoA, even though this is probably not the main route of production.

PERSPECTIVES

Our next step to finalize Chapter 2 is to analyse by qPCR other genes of the lignin pathway to see how the route was affected. In addition, a biochemical analysis of the cse mutants would help us to better understand the dwarfed phenotype we observed. To finalize Chapter 3, we will develop stable mutants with the constructs we developed; at this stage we will be able to obtain transgene-free plants in order to analyse the mutations in the HCT, CSE and CCoAOMT genes with more accuracy.

REFERENCES

Abiven S, Heim A, Schmidt MWI (2011) Lignin content and chemical characteristics in maize and wheat vary between plant organs and growth stages: consequences for assessing lignin dynamics in soil. Plant Soil 343: 369–378 Aerts RJ, Baumann TW (1994) Distribution and utilization of chlorogenic acid in Coffea seedlings. J Exp Bot 45: 497–503 Alalwan HA, Alminshid AH, Aljaafari HAS (2019) Promising evolution of biofuel generations. Subject review. Renew Energy Focus 28: 127–139 Altschul SF, Madden TL, Schäffer a a, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389–402 Altschup SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic Local Alignment Search Tool. J Mol Biol 215: 403–410 Bae S, Park J, Kim JS (2014) Cas-OFFinder: A fast and versatile algorithm that searches for potential off-target sites of Cas9 RNA-guided endonucleases. Bioinformatics 30: 1473–1475 Becerra-Moreno A, Redondo-Gil M, Benavides J, Nair V, Cisneros-Zevallos L, Jacobo-Velázquez DA (2015) Combined effect of water loss and wounding stress on gene activation of metabolic pathways associated with phenolic biosynthesis in carrot. Front Plant Sci 6: 837 Belhaj K, Chaparro-Garcia A, Kamoun S, Patron NJ, Nekrasov V (2015) Editing plant genomes with CRISPR/Cas9. Curr Opin Biotechnol 32: 76–84 Benchling Inc. (2018) Benchling for Academics (Biology Software). Retrieved from https://benchling.com. Bertrand C, Noirot M, Doulbeau S, de Kochko A, Hamon S, Campa C (2003) Chlorogenic acid content swap during fruit maturation in Coffea pseudozanguebariae: Qualitative comparison with leaves. Plant Sci 165: 1355–1361 Bhering LL (2017) Rbio : A tool for biometric and statistical analysis using the R platform. Crop Breed Appl Biotechnol 17: 187–190 Boerjan W, Ralph J, Baucher M (2003) Lignin biosynthesis. 
Annu Rev Plant Biol 54: 519–46 Bombarely A, Menda N, Tecle IY, Buels RM, Strickler S, Fischer-York T, Pujar A, Leto J, Gosselin J, Mueller LA (2011) The Sol Genomics Network
(solgenomics.net): growing tomatoes using Perl. Nucleic Acids Res 39: D1149-55 Bonawitz ND, Chapple C (2010) The genetics of lignin biosynthesis: connecting genotype to phenotype. Annu Rev Genet 44: 337–363 Bonawitz ND, Chapple C (2013) Can genetic engineering of lignin deposition be accomplished without an unacceptable yield penalty? Curr Opin Biotechnol 24: 336–343 Brooks, C., Nekrasov, V., Lippman, Z.B. and Van Eck J (2014) Efficient Gene Editing in Tomato in the First Generation Using the Clustered Regularly Interspaced Short Palindromic Repeats/CRISPR-Associated9 System1. Plant Physiol 166: 1292–1297. Cai Y, Chen L, Liu X, Sun S, Wu C, Jiang B, Han T, Hou W (2015) CRISPR/Cas9-mediated genome editing in soybean hairy roots. PLoS One 10: 1–13 Campa C, Noirot M, Bourgeois M, Pervent M, Ky CL, Chrestin H, Hamon S, De Kochko A (2003) Genetic mapping of a caffeoyl-coenzyme a 3-0-methyltransferase gene in coffee trees. Impact on chlorogenic acid content. Theor Appl Genet 107: 751–756 Casas MI, Vaughan MJ, Bonello P, McSpadden Gardener B, Grotewold E, Alonso AP (2017) Identification of biochemical features of defective Coffea arabica L. beans. Food Res Int 95: 59–67 Castro E De, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-genevaux PS, Gasteiger E, Bairoch A, Hulo N (2006) ScanProsite : detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nuc. Acids Res. 34: 362–365 Castro RD De, Marraccini P (2006) Cytology , biochemistry and molecular changes during coffee fruit development. Brazilian J Plant Physiol 18: 175–199 Cesarino I, Araújo P, Domingues Júnior AP, Mazzafera P (2012) An overview of lignin metabolism and its effect on biomass recalcitrance. Brazilian J Bot 35: 303–311 Chang S, Puryear J, Cairney J (1993) A simple and efficient method for isolating RNA from pine trees. 
Plant Mol Biol Report 11: 113–116 Chen F, Dahal P, Bradford K (2001) Two tomato expansin genes show divergent expression and localization in embryos during seed development and germination. Plant Physiol 127: 928–936 Chen F, Srinivasa Reddy MS, Temple S, Jackson L, Shadle G, Dixon RA (2006) Multi-site genetic modulation of monolignol biosynthesis suggests new routes for formation of syringyl lignin and wall-bound ferulic acid in alfalfa (Medicago sativa L.). Plant J 48: 113–24 Chen L, Auh C, Chen F, Cheng X, Aljoe H, Dixon RA, Wang Z (2002) Lignin
deposition and associated changes in anatomy, enzyme activity, gene expression, and ruminal degradability in stems of tall fescue at different developmental stages. J Agric Food Chem. doi: 10.1021/jf020516x Clé C, Hill LM, Niggeweg R, Martin CR, Guisez Y, Prinsen E, Jansen M a K (2008) Modulation of chlorogenic acid biosynthesis in Solanum lycopersicum; consequences for phenolic accumulation and UV-tolerance. Phytochemistry 69: 2149–56 Clifford MN (1999) Chlorogenic acids and other cinnamates – nature , occurrence and dietary burden . J Sci Food Agric 79: 362–372 Comino C, Hehn A, Moglia A, Menin B, Bourgaud F, Lanteri S, Portis E (2009) The isolation and mapping of a novel hydroxycinnamoyltransferase in the globe artichoke chlorogenic acid pathway. BMC Plant Biol 9: 30 Correa DF, Beyer HL, Fargione JE, Hill JD, Possingham HP, Thomas-Hall SR, Schenk PM (2019) Towards the implementation of sustainable biofuel production systems. Renew Sustain Energy Rev 107: 250–263 Cosgrove DJ (2000) Loosening of plant cell walls by expansins. Nature 407: 321– 6 Días J, Barcelo RA, Cáceres M de F, Díaz J, Barceló AROS, De Cárceres FM (1997) Changes in shikimate dehydrogenase and the end products of the shikimate pathway, chlorogenic acid and , during the early development of seedlings of Capsicum annuum. New Phytol 136: 183–188 Van Dijk AE, Olthof MR, Meeuse JC, Seebus E, Heine RJ, Van Dam RM (2009) Acute effects of decaffeinated coffee and the major coffee components chlorogenic acid and trigonelline on glucose tolerance. Diabetes Care 32: 1023–1025 Ding Y, Li H, Chen L-L, Xie K (2016) Recent Advances in Genome Editing Using CRISPR/Cas9. Front Plant Sci 7: 1–12 Engler C, Kandzia R, Marillonnet S (2008) A one pot, one step, precision cloning method with high throughput capability. PLoS One. doi: 10.1371/journal.pone.0003647 Engler C, Youles M, Gruetzner R, Ehnert T-MM, Werner S, Jones JDG, Patron NJ, Marillonnet S (2014) A Golden Gate Modular Cloning Toolbox for Plants. 
ACS Synth Biol 3: 839–843 Escamilla-Treviño LL, Shen H, Hernandez T, Yin Y, Xu Y, Dixon RA (2014) Early lignin pathway enzymes and routes to chlorogenic acid in switchgrass (Panicum virgatum L.). Plant Mol Biol 84: 565–76 Esteban DJ, Buller RML (2005) Ectromelia virus: The causative agent of
mousepox. J Gen Virol 86: 2645–2659 F. Evert R (2006) Xylem: Cell Types and Developmental Aspects. pp 255–290 Faraji M, Fonseca LL, Escamilla-Treviño L, Barros-Rios J, Engle N, Yang ZK, Tschaplinski TJ, Dixon RA, Voit EO (2018) Mathematical models of lignin biosynthesis. Biotechnol Biofuels 11: 1–17 Fauser F, Schiml S, Puchta H (2014) Both CRISPR / Cas-based nucleases and nickases can be used efficiently for genome engineering in Arabidopsis thaliana. The Plant J. 79: 348–359 Feng C, Yuan J, Wang R, Liu Y, Birchler JA, Han F (2016) Efficient Targeted Genome Modification in Maize Using CRISPR/Cas9 System. J Genet Genomics 43: 37–43 Feng Z, Mao Y, Xu N, Zhang B, Wei P, Yang D-LL, Wang Z, Zhang Z, Zheng R, Yang L, et al (2014) Multigeneration analysis reveals the inheritance, specificity, and patterns of CRISPR/Cas-induced gene modifications in Arabidopsis. Proc Natl Acad Sci 111: 4632–4637 Ferrer J-L, Austin MB, Stewart C, Noel JP (2008) Structure and function of enzymes involved in the biosynthesis of phenylpropanoids. Plant Physiol Biochem 46: 356–70 Figueiredo R, Araújo P, Llerena JPP, Mazzafera P (2019) and hemicellulose in sugarcane cell wall architecture and crop digestibility: A biotechnological perspective. Food Energy Secur 1–24 Fornalé S, Rencoret J, García-Calvo L, Encina A, Rigau J, Gutiérrez A, Del Río JC, Caparros-Ruiz D (2017) Changes in cell wall polymers and degradability in maize mutants lacking 3' - and 5'-o-methyltransferases involved in lignin biosynthesis. Plant Cell Physiol 58: 240–255 Franke R, Hemm MR, Denault JW, Ruegger MO, Humphreys JM, Chapple C (2002a) Changes in secondary metabolism and deposition of an unusual lignin in the ref8 mutant of Arabidopsis. Plant J 30: 47–59 Franke R, Humphreys JM, Hemm MR, Denault JW, Ruegger MO, Cusumano JC, Chapple C (2002b) The Arabidopsis REF8 gene encodes the 3-hydroxylase of phenylpropanoid metabolism. 
Plant J 30: 33–45 Friend J, Reynolds SB, Aveyard MA (1973) Phenylalanine Ammonia-Lyase, Chlorogenic Acid and Lignin in Potato-Tuber Tissue Inoculated with Phytophthora-Infestans. Physiol Plant Pathol 3: 495–507 Gallego-Giraldo L, Bhattarai K, Pislariu CI, Nakashima J, Jikumaru Y, Kamiya Y, Udvardi MK, Monteros MJ, Dixon RA (2014) Lignin modification leads to
increased nodule numbers in alfalfa. Plant Physiol 164: 1139–50 Gallego-giraldo L, Escamilla-trevino L, Jackson LA, Dixon RA (2011) mediates the reduced growth of lignin down-regulated plants. PNAS 108: 20814–20819 Gamborg OL (1966) Aromatic metabolism in Plants. Biochem Biophys Acta 28: 483–491 Gao J, Wang G, Ma S, Xie X, Wu X, Zhang X, Wu Y, Zhao P, Xia Q (2015) CRISPR/Cas9-mediated targeted mutagenesis in Nicotiana tabacum. Plant Mol Biol 87: 99– 110 Gao W, Li HY, Xiao S, Chye ML (2010) Acyl-CoA-binding protein 2 binds lysophospholipase 2 and lysoPC to promote tolerance to cadmium-induced oxidative stress in transgenic Arabidopsis. Plant J 62: 989–1003 Gauthier L, Bonnin-Verdal MN, Marchegay G, Pinson-Gadais L, Ducos C, Richard-Forget F, Atanasova-Penichon V (2016) Fungal biotransformation of chlorogenic and caffeic acids by Fusarium graminearum: New insights in the contribution of phenolic acids to resistance to deoxynivalenol accumulation in cereals. Int J Food Microbiol 221: 61–68 Giordano D, Provenzano S, Ferrandino A, Vitali M, Pagliarani C, Roman F, Cardinale F, Castellarin SD, Schubert A (2016) Characterization of a multifunctional caffeoyl-CoA O-methyltransferase activated in grape berries upon drought stress. Plant Physiol Biochem 101: 23–32 Ha CM, Escamilla-Trevino L, Yarce JCS, Kim H, Ralph J, Chen F, Dixon RA (2016) An essential role of caffeoyl shikimate esterase in monolignol biosynthesis in Medicago truncatula. Plant J 86: 363–375 Hall TA (1990) BioEdit: a user-friendly biological sequence aligment editor and analysis program fro Windlows 95/98/NT. Nucl Acid Symp Ser 41: 95–98 Harris PJ, Stone BA (2008) Chemistry and Molecular Organization of Plant Cell Walls. Biomass Recalcitrance deconstructing Plant Cell Wall Bioenergy. Himmel ME, Ding S-Y, Johnson DK, Adney WS, Nimlos MR, Brady JW, Foust TD (2007) Biomass recalcitrance: engineering plants and enzymes for biofuels production. 
Science 315: 804–7 Hoffmann L, Besseau S, Geoffroy P, Ritzenthaler C, Meyer D, Lapierre C, Pollet B, Legrand M (2004) Silencing of Hydroxycinnamoyl-Coenzyme A Shikimate / Quinate Hydroxycinnamoyltransferase Affects Phenylpropanoid Biosynthesis. Plant Cell 16: 1446– 1465 Hoffmann L, Maury S, Martz F, Geoffroy P, Legrand M (2003) Purification, 112

cloning, and properties of an acyltransferase controlling shikimate and quinate ester intermediates in phenylpropanoid metabolism. J Biol Chem 278: 95–103 Horsch RB, Fry JE, Hoffman NL, Eichholtz D, Rogers SG, Fraley RT (1985) A Simple and General Method for Transferring Genes into Plants. Science 227: 1229–1231 Howe E, Holton K, Nair S, Schlauch D, Sinha R, Quackenbush J (2010) MeV: MultiExperiment viewer. Biomed Informatics Cancer Res. doi: 10.1007/978-1-4419-5714- 6_15 IEA IEA (2018) World Energy Outlook 2018 Good Standard of long-term energy analyses. Islam S (2006) Sweetpotato (Ipomoea batatas L .) Leaf: Its Potential Effect on Human Health and Nutrition. J Food Sci 71: 13–21 Jacobo-Velázquez D a, González-Agüero M, Cisneros-Zevallos L (2015) Cross- talk between signaling pathways: the link between plant production and wounding stress response. Sci Rep 5: 8608 Jacobs TB, LaFayette PR, Schmitz RJ, Parrott WA (2015) Targeted genome modifications in soybean with CRISPR/Cas9. BMC Biotechnol 15: 16 Jia H, Nian W (2014) Targeted genome editing of sweet orange using Cas9/sgRNA. PLoS One. doi: 10.1371/journal.pone.0093806 Jiang N-H, Zhang G-H, Zhang J-J, Shu L-P, Zhang W, Long G-Q, Liu T, Meng Z-G, Chen J-W, Yang S-C (2014) Analysis of the transcriptome of Erigeron breviscapus uncovers putative scutellarin and chlorogenic acids biosynthetic genes and genetic markers. PLoS One 9: e100357 Jiang W, Zhou H, Bi H, Fromm M, Yang B, Weeks DP (2013) Demonstration of CRISPR/Cas9/sgRNA-mediated targeted gene modification in Arabidopsis, tobacco, sorghum and rice. Nucleic Acids Res 41: e188–e188 Joët T, Laffargue A, Salmona J, Doulbeau S, Descroix F, Bertrand B, de Kochko A, Dussert S (2009) Metabolic pathways in tropical dicotyledonous albuminous seeds: Coffea arabica as a case study. 
New Phytol 182: 146–62 Jung JH, Fouad WM, Vermerris W, Gallo M, Altpeter F (2012) RNAi suppression of lignin biosynthesis in sugarcane reduces recalcitrance for biofuel production from lignocellulosic biomass. Plant Biotechnol J 10: 1067–1076 Karlsson M, Contreras JA, Hellman U, Tornqvist H, Holm C (1997) cDNA cloning, tissue distribution, and identification of the catalytic triad of monoglyceride lipase. Evolutionary relationship to esterases, lysophospholipases, and haloperoxidases. J Biol Chem 113

272: 27218–23 Kaya H, Mikami M, Endo A, Endo M, Toki S (2016) Highly specific targeted mutagenesis in plants using Staphylococcus aureus Cas9. Sci Rep 6: 26871 Kundu A, Vadassery J (2019) Chlorogenic acid-mediated chemical defence of plants against insect herbivores. Plant Biol 21: 185–189 Lallemand L a., McCarthy JG, McSweeney S, McCarthy A a. (2012a) Purification, crystallization and preliminary X-ray diffraction analysis of a hydroxycinnamoyl- CoA shikimate/quinate hydroxycinnamoyltransferase (HCT) from Coffea canephora involved in chlorogenic acid biosynthesis. Acta Crystallogr Sect F Struct Biol Cryst Commun 68: 824– 8 Lallemand L a., Zubieta C, Lee SG, Wang Y, Acajjaoui S, Timmins J, McSweeney S, Jez JM, McCarthy JG, McCarthy A a. (2012b) A structural basis for the biosynthesis of the major chlorogenic acids found in coffee. Plant Physiol 160: 249–60 Lawrenson T, Shorinola O, Stacey N, Li C, Østergaard L, Patron N, Uauy C, Harwood W (2015) Induction of targeted, heritable mutations in barley and Brassica oleracea using RNA-guided Cas9 nuclease. Genome Biol 16: 258 Leiss KA, Maltese F, Choi YH, Verpoorte R, Klinkhamer PGL (2009) Identification of Chlorogenic Acid as a Resistance Factor for Thrips in Chrysanthemum. Plant Physiol 150: 1567 LP – 1575 Leitch IJ, Hanson L, Lim KY, Kovarik A, Chase MW, Clarkson JJ, Leitch AR (2008) The ups and downs of genome size evolution in polyploid species of Nicotiana (Solanaceae). Ann. Bot. pp 805–814 Lepelley M, Cheminade G, Tremillon N, Simkin A, Caillet V, McCarthy J (2007) Chlorogenic acid synthesis in coffee: An analysis of CGA content and real-time RT- PCR expression of HCT, HQT, C3H1, and CCoAOMT1 genes during grain development in C. canephora. Plant Sci 172: 978–996 Li J-F, Norville JE, Aach J, McCormack M, Zhang D, Bush J, Church GM, Sheen J, Norville JE, McCormack M, et al (2013) Multiplex and homologous recombination–mediated genome editing in Arabidopsis and Nicotiana benthamiana using guide RNA and Cas9. 
Nat Biotechnol 31: 688–691 Li X, Weng J-K, Chapple C (2008) Improvement of biomass through lignin modification. Plant J 54: 569–81 Llerena JPP, Figueiredo R, Brito MDS, Kiyota E, Mayer JLS, Araujo P, Schimpl FC, Dama M, Pauly M, Mazzafera P (2019) Deposition of lignin in four species of 114

Saccharum. Sci Rep 9: 5877 Loqué D, Scheller H V., Pauly M (2015) Engineering of plant cell walls for enhanced biofuel production. Curr Opin Plant Biol 25: 151–161 Lor VS, Starker CG, Voytas DF, Weiss D, Olszewski NE (2014) Targeted mutagenesis of the tomato PROCERA gene using TALENs. Plant Physiol 166: 1288–1291 Lorenzo G De, Ferrari S, Giovannoni M, Mattei B, Cervone F (2019) Cell wall traits that influence plant development , immunity. The Plant J. 97: 134–147 Lowder LG, Zhang D, Baltes NJ, Paul JW, Tang X, Zheng X, Voytas DF, Hsieh T-F, Zhang Y, Qi Y (2015) A CRISPR/Cas9 Toolbox for Multiplexed Plant Genome Editing and Transcriptional Regulation. Plant Physiol. doi: 10.1104/pp.15.00636 Ma X, Zhang Q, Zhu Q, Liu W, Chen YY, Qiu R, Wang B, Yang Z, Li H, Lin Y, et al (2015) A Robust CRISPR/Cas9 System for Convenient, High-Efficiency Multiplex Genome Editing in Monocot and Dicot Plants. Mol Plant 8: 1274–1284 Macrelli S, Galbe M, Wallberg O (2014) Effects of production and market factors on ethanol profitability for an integrated first and second generation ethanol plant using the whole sugarcane as feedstock. Biotechnol Biofuels 7: 26 Mahfouz MM, Piatek A, Stewart CN (2014) Genome engineering via TALENs and CRISPR/Cas9 systems: Challenges and perspectives. Plant Biotechnol J 12: 1006–1014 Mahon EL, Mansfield SD (2019) Tailor-made trees: engineering lignin for ease of processing and tomorrow’s bioeconomy. Curr Opin Biotechnol 56: 147–155 Mali P, Yang L, Esvelt KM, Aach J, Guell M, DiCarlo JE, Norville JE, Church GM (2013) RNA-guided human genome engineering via Cas9. Science 339: 823–826 Marchler-Bauer A, Bryant SH (2004) CD-Search: protein domain annotations on the fly. Nucleic Acids Res 32: W327-31 Marillonnet S, Werner S (2015) Assembly of multigene constructs using golden gate cloning. Glyco-Engineering Methods Protoc. pp 269–284 Marraffini LA (2015) CRISPR-Cas immunity in prokaryotes. 
Nature 526: 55–61 Marriott PE, Gómez LD, Mcqueen-mason SJ (2015) Unlocking the potential of lignocellulosic biomass through plant science. New Phytol 209: 1366–1381 Marriott PE, Sibout R, Lapierre C, Fangel JU, Willats WGT, Hofte H, Gómez LD, McQueen-Mason SJ (2014) Range of cell-wall alterations enhance saccharification in Brachypodium distachyon mutants . Proc Natl Acad Sci 111: 14601–14606 Martz F, Maury S, Pinçon G, Legrand M (1998) cDNA cloning, substrate specificity and expression study of tobacco caffeoyl-CoA 3-O-methyltransferase, a lignin 115

biosynthetic enzyme. Plant Mol Biol 36: 427–437 Maury S, Geoffroy P, Legrand M (2002) Tobacco O -Methyltransferases Involved in Phenylpropanoid Metabolism. The Different Caffeoyl-Coenzyme A/5- Hydroxyferuloyl-Coenzyme A 3/5- O -Methyltransferase and Caffeic Acid/5-Hydroxyferulic Acid 3/5- O -Methyltransferase Classes Have Distinct Substrat. Plant Physiol 121: 215–224 Mazzafera P (1999) Chemical Composition of defective coffee beans. Food Chem 64: 547–554 Mccann MC, Buckeridge MS (2014) Plants and BioEnergy. doi: 10.1007/978-1- 4614-9329-7 Meester B De, Vries L De, Özparpucu M, Gierlinger N, Corneillie S, Open IB, Meester B De, Vries L De, Özparpucu M, Gierlinger N, et al (2018) Vessel-Specific Reintroduction of CINNAMOYL-COA REDUCTASE1 (CCR1) in Dwarfed ccr1 Mutants Restores Vessel and Xylary Fiber Integrity and Increases Biomass. 176: 611–633 Moglia A, Lanteri S, Comino C, Hill L, Knevitt D, Cagliero C, Rubiolo P, Bornemann S, Martin C (2014) Dual Catalytic Activity of Hydroxycinnamoyl-Coenzyme A Quinate Transferase from Tomato Allows It to Moonlight in the Synthesis of Both Mono- and Dicaffeoylquinic Acids. Plant Physiol 166: 1777–1787 Mokochinski JB, Bataglion GA, Kiyota E, de Souza LM, Mazzafera P, Sawaya ACHF (2015) A simple protocol to determine lignin S/G ratio in plants by UHPLC-MS. Anal Bioanal Chem 407: 7221–7227 Mondolot L, La Fisca P, Buatois B, Talansier E, Kochko A de, Campa C (2006) Evolution in Coffeoylquinic Acid Content and Histolocalization During Coffea canephora Leaf development. Ann Bot 98: 33–40 Mottiar Y, Vanholme R, Boerjan W, Ralph J, Mansfield SD (2016) Designer lignins: Harnessing the plasticity of lignification. Curr Opin Biotechnol 37: 190–200 Murashige T, Skoog F (1962) A Revised Medium for Rapid Growth and Bio Agsays with Tohaoco Tissue Cultures. Physiol Plant 15: 473–497 Muro-Villanueva F, Mao X, Chapple C (2019) Linking phenylpropanoid metabolism, lignin deposition, and plant growth inhibition. 
Curr Opin Biotechnol 56: 202–208 Nardini M, Dijkstra BW (1999) Alpha/beta hydrolase fold enzymes: the family keeps growing. Curr Opin Struct Biol 9: 732–7 Naseer S, Lee Y, Lapierre C, Franke R, Nawrath C, Geldner N (2012) Casparian strip diffusion barrier in Arabidopsis is made of a lignin polymer without suberin. Proc Natl Acad Sci U S A 109: 10101–6 116

Nekrasov V, Staskawicz B, Weigel D, Jones JDG, Kamoun S (2013) Targeted mutagenesis in the model plant Nicotiana benthamiana using Cas9 RNA-guided endonuclease. Nat Biotechnol 31: 691–693 Neutelings G (2011) Lignin variability in plant cell walls: contribution of new models. Plant Sci 181: 379–86 Niggeweg R, Michael AJ, Martin C (2004) Engineering plants with increased levels of the antioxidant chlorogenic acid. Nat Biotechnol 22: 746–54 Oboh G, Agunloye OM, Akinyemi AJ, Ademiluyi AO, Adefegha SA (2013) Comparative Study on the Inhibitory Effect of Caffeic and Chlorogenic Acids on Key Enzymes Linked to Alzheimer’s Disease and Some Pro-oxidant Induced Oxidative Stress in Rats’ Brain- In Vitro. Neurochem Res 38: 413–419 Ohkawara T, Takeda H, Nishihira J (2017) Protective effect of chlorogenic acid on the inflammatory damage of pancreas and lung in mice with l--induced pancreatitis. Life Sci 190: 91–96 Olthof MR, Hollman PCH, Katan MB (2001) Human Nutrition and Metabolism Chlorogenic Acid and Caffeic Acid Are Absorbed in Humans. J Nutr 22: 66–71 Osakabe Y, Sugano SS, Osakabe K (2016a) Genome engineering of woody plants: past, present and future. J Wood Sci 62: 1–9 Osakabe Y, Watanabe T, Sugano SS, Ueta R, Ishihara R, Shinozaki K, Osakabe K (2016b) Optimization of CRISPR/Cas9 genome editing to modify abiotic stress responses in plants. Sci Rep 6: 26685 Ossowski S, Schwab R, Weigel D (2008) Gene silencing in plants using artificial microRNAs and other small RNAs. Plant J 53: 674–90 Oyarce P, De Meester B, Fonseca F, de Vries L, Goeminne G, Pallidis A, De Rycke R, Tsuji Y, Li Y, Van den Bosch S, et al (2019) Introducing biosynthesis in Arabidopsis enhances lignocellulosic biomass processing. Nat Plants 5: 225–237 Pan C, Ye L, Qin L, Liu X, He Y, Wang J, Chen L, Lu G (2016) CRISPR/Cas9- mediated efficient and heritable targeted mutagenesis in tomato plants in the first and later generations. 
Nat Publ Gr 6: 2–10 Park JJ, Yoo CG, Flanagan A, Pu Y, Debnath S, Ge Y, Ragauskas AJ, Wang ZY (2017) Biotechnology for Biofuels Defined tetra ‑ allelic gene disruption of the 4 ‑ coumarate : coenzyme A ligase 1 ( Pv4CL1 ) gene by CRISPR / Cas9 in switchgrass results in lignin reduction and improved sugar release. Biotechnol Biofuels 10: 1–11 Parry G, Patron N, Bastow R, Matthewman C (2016) Meeting report: 117

GARNet/OpenPlant CRISPR-Cas workshop. Plant Methods. doi: 10.1186/s13007-016-0104-z Pauly M, Keegstra K (2008) Cell-wall carbohydrates and their modification as a resource for biofuels. Plant J 54: 559–68 Payyavula RS, Shakya R, Sengoda VG, Munyaneza JE, Swamy P, Navarre DA (2015) Synthesis and regulation of chlorogenic acid in potato: Rerouting phenylpropanoid flux in HQT -silenced lines. Plant Biotechnol J 13: 551–564 Peng X-P, Sun S-L, Wen J-L, Yin W-L, Sun R-C (2014) Structural characterization of lignins from hydroxycinnamoyl transferase (HCT) down-regulated transgenic poplars. Fuel 134: 485–492 Pereira L, Domingues-Junior AP, Jansen S, Choat B, Mazzafera P (2018) Is embolism resistance in plant xylem associated with quantity and characteristics of lignin? Trees v. 32: 10-358–2018 v.32 no.2 Peterson JK, Harrison HF, Snook ME, Jackson DM (2005) Chlorogenic acid content in sweetpotato germplasm: Possible role in disease and pest resistance. Allelopath. J. Petrov V, Hille J, Mueller-roeber B, Gechev TS (2015) ROS-mediated abiotic stress-induced programmed cell death in plants. Front Plant Sci 6: 1–16 Pinçon G, Maury S, Hoffmann L, Geoffroy P, Lapierre C, Pollet B, Legrand M (2001) Repression of O-methyltransferase genes in transgenic tobacco affects lignin synthesis and plant growth. Phytochemistry 57: 1167–1176 Ponniah SK, Shang Z, Akbudak MA, Srivastava V, Manoharan M (2017) Down-regulation of hydroxycinnamoyl CoA: shikimate hydroxycinnamoyl transferase, cinnamoyl CoA reductase, and cinnamyl alcohol dehydrogenase leads to lignin reduction in rice (Oryza sativa L. ssp. japonica cv. Nipponbare). Plant Biotechnol Rep 11: 17–27 Pu G, Zhou B, Xiang F (2017) Isolation and functional characterization of a Lonicera japonica hydroxycinnamoyl transferase involved in chlorogenic acid synthesis. Biol 72: 608–618 Ralph J, Lapierre C, Boerjan W (2019) Lignin structure and its engineering. 
Curr Opin Biotechnol 56: 240–249 Rogers WJ, Michaux S, Bastin M, Bucheli P (1999) Changes to the content of sugars, sugar alcohols, myo-inositol, carboxylic acids and inorganic anions in developing grains from different varieties of Robusta (Coffea canephora) and Arabica (C. arabica) coffees. Plant Sci 149: 115–123 Ruel K, Berrio-sierra J, Derikvand MM, Pollet B, Thévenin J, Lapierre C, Jouanin L, Joseleau J, Mir-derikvand M, Catherine L (2009) Impact of CCR1 silencing on 118

the assembly of lignified secondary walls in Arabidopsis thaliana. New Phyt 184: 99–113 Saleme M de LS, Cesarino I, Vargas L, Kim H, Vanholme R, Goeminne G, Van Acker R, Fonseca FC de A, Pallidis A, Voorend W, et al (2017) Silencing CAFFEOYL SHIKIMATE ESTERASE affects lignification and improves saccharification. Plant Physiol pp.00920.2017 Schilmiller AL, Stout J, Weng J-K, Humphreys J, Ruegger MO, Chapple C (2009) Mutations in the cinnamate 4-hydroxylase gene impact metabolism, growth and development in Arabidopsis. Plant J 60: 771–782 Schiml S, Fauser F, Puchta H (2014) The CRISPR/Cas system can be used as nuclease for in planta gene targeting and as paired nickases for directed mutagenesis in Arabidopsis resulting in heritable progeny. Plant J 80: 1139–1150 Schmidt GW, Delaney SK (2010) Stable internal reference genes for normalization of real-time RT-PCR in tobacco (Nicotiana tabacum) during development and abiotic stress. Mol Genet Genomics 283: 233–241 Schubert C (2006) Can biofuels finally take center stage? Nat Biotechnol 24: 777– 784 Schwab R, Ossowski S, Riester M, Warthmann N, Weigel D (2006) Highly Specific Gene Silencing by Artificial MicroRNAs in Arabidopsis. Plant Cell 18: 1121–1133 Seh ZW, Kibsgaard J, Dickens CF, Chorkendorff I, Nørskov JK, Jaramillo TF (2017) Combining theory and experiment in electrocatalysis: Insights into materials design. Science 355: eaad4998 Senthil-Kumar M, Hema R, Suryachandra TR, Ramegowda H V., Gopalakrishna R, Rama N, Udayakumar M, Mysore KS (2010) Functional characterization of three water deficit stress-induced genes in tobacco and Arabidopsis: An approach based on gene down regulation. Plant Physiol Biochem 48: 35–44 Shadle G, Chen F, Srinivasa Reddy MS, Jackson L, Nakashima J, Dixon R a (2007) Down-regulation of hydroxycinnamoyl CoA: shikimate hydroxycinnamoyl transferase in transgenic alfalfa affects lignification, development and forage quality. 
Phytochemistry 68: 1521–9 Shin JW, Kim K, Chao MJ, Atwal RS, Gillis T, Macdonald ME, Gusella JF, Lee J (2016) Permanent inactivation of Huntington ’ s disease mutation by personalized allele- specific CRISPR / Cas9. 25: 4566–4576 Silva N, Mazzafera P, Cesarino I (2019) Should I stay or should I go: are chlorogenic acids mobilized towards lignin biosynthesis? Phytochemistry 166: 112063 119

Somerville C, Bauer S, Brininstool G, Facette M, Hamann T, Milne J, Osborne E, Paredez A, Persson S, Raab T, et al (2004) Toward a systems approach to understanding plant cell walls. Science 306: 2206–11 Sonnante G, D’Amore R, Blanco E, Pierri CL, De Palma M, Luo J, Tucci M, Martin C (2010) Novel Hydroxycinnamoyl-Coenzyme A Quinate Transferase Genes from Artichoke Are Involved in the Synthesis of Chlorogenic Acid. Plant Physiol 153: 1224–1238 Sparkes IA, Runions J, Kearns A, Hawes C (2006) Rapid, transient expression of fluorescent fusion proteins in tobacco plants and generation of stably transformed plants. Nat Protoc 1: 2019–2025 St-Pierre B, Laflamme P, Alarco AM, De Luca V (1998) The terminal O- acetyltransferase involved in vindoline biosynthesis defines a new class of proteins responsible for coenzyme A-dependent acyl transfer. Plant J 14: 703–713 St-Pierre B, Luca V De (2000) Evolution of acyltransferase genes: Origin and diversification fo the BAHD superfamily of acyltransferases involved in secondary metabolism. Recent Adv Phytochem 34: 285–315 Stewart JJ, Akiyama T, Chapple C, Ralph J, Mansfield SD (2009) The Effects on Lignin Structure of Overexpression of Ferulate 5-Hydroxylase in Hybrid Poplar. Plant Physiol 150: 621–635 Taiz L, Zeiger E (2006) Cell Walls: Structure, Biogenesis, and Expansion. Plant Physiol. 313-338. Takeda Y, Suzuki S, Tobimatsu Y, Osakabe K, Osakabe Y, Ragamustari SK (2019) Lignin characterization of rice CONIFERALDEHYDE 5-HYDROXYLASE loss-of- function mutants generated with the CRISPR / Cas9 system. The Plant J. 97: 543–554 Takeda Y, Tobimatsu Y, Karlen SD, Koshiba T, Suzuki S, Yamamura M, (2018) Downregulation of p-COUMAROYL ESTER 3-HYDROXYLASE in rice leads to altered cell wall structures and improves biomass saccharification. The Plant J. 95: 796–811 Team RDC (2011) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 
doi: ISBN 3-900051-07-0 Terrett OM, Dupree P (2019) Covalent interactions between lignin and hemicelluloses in plant secondary cell walls. Curr Opin Biotechnol 56: 97–104 Thom E (2007) The effect of chlorogenic acid enriched coffee on glucose absorption in healthy volunteers and its effect on body mass when used long-term in overweight and obese people. J Int Med Res 35: 900–908 Tong Z, Li H, Zhang R, Ma L, Dong J, Wang T (2015) Co-downregulation of 120

the hydroxycinnamoyl-CoA:shikimate hydroxycinnamoyl transferase and coumarate 3- hydroxylase significantly increases cellulose content in transgenic alfalfa (Medicago sativa L.). Plant Sci 239: 230–237 Torras-claveria L, Jáuregui O, Codina C, Tiburcio AF, Bastida J, Viladomat F (2012) Plant Science Analysis of phenolic compounds by high-performance liquid chromatography coupled to electrospray ionization tandem mass spectrometry in senescent and water-stressed tobacco. Plant Sci 182: 71–78 Valiñas MA, Lanteri ML, Ten Have A, Andreu AB (2015) Chlorogenic Acid Biosynthesis Appears Linked with Suberin Production in Potato Tuber (Solanum tuberosum). J Agric Food Chem 63: 4902–4913 Vanholme B, Cesarino I, Goeminne G, Kim H, Marroni F, Van Acker R, Vanholme R, Morreel K, Ivens B, Pinosio S, et al (2013a) Breeding with rare defective alleles (BRDA): A natural Populus nigra HCT mutant with modified lignin as a case study. New Phytol 198: 765–776 Vanholme B, Houari I El, Boerjan W (2019a) Bioactivity : phenylpropanoids ’ best kept secret. Curr Opin Biotechnol 56: 156–162 Vanholme R, Cesarino I, Rataj K, Xiao Y, Sundin L, Goeminne G, Kim H, Cross J, Morreel K, Araujo P, et al (2013b) Caffeoyl shikimate esterase (CSE) is an enzyme in the lignin biosynthetic pathway in Arabidopsis. Science 341: 1103–6 Vanholme R, De Meester B, Ralph J, Boerjan W, Meester B De, Ralph J, Boerjan W (2019b) Lignin biosynthesis and its integration into metabolism. Curr Opin Biotechnol 56: 230–239 Vargas L, Cesarino I, Vanholme R, Voorend W, de Lyra Soriano Saleme M, Morreel K, Boerjan W (2016) Improving total saccharification yield of Arabidopsis plants by vessel-specific complementation of caffeoyl shikimate esterase (cse) mutants. Biotechnol Biofuels 9: 139 Verdaguer B, de Kochko a, Beachy RN, Fauquet C (1996) Isolation and expression in transgenic tobacco and rice plants, of the cassava vein mosaic virus (CVMV) promoter. 
Plant Mol Biol 31: 1129–39 Ververis C, Georghiou K, Christodoulakis N, Santas P, Santas R (2004) Fiber dimensions, lignin and cellulose content of various plant materials and their suitability for paper production. Ind Crops Prod 19: 245–254 Vicentini R, Bottcher A, Brito M dos S, dos Santos AB, Creste S, Landell MG de A, Cesarino I, Mazzafera P (2015) Large-Scale Transcriptome Analysis of Two Sugarcane 121

Genotypes Contrasting for Lignin Content. PLoS One 10: e0134909 Vijayaraj P, Jashal CB, Vijayakumar A, Rani SH, Rao DKV, Rajasekharan R (2012) A Bifunctional Enzyme That Has Both Monoacylglycerol Acyl Hydrolase Activities. Plant Physiol 160: 667–683 Villegas RJA, Kojima M (1986) Purification and Characterization of Hydroxycinnamoyl D-Glucose. J Biol Chem 261: 8729–8733 Volpi e Silva N, Patron NJ (2017) CRISPR-based tools for plant genome engineering. Emerg Top Life Sci 1: 135–149 Wagner A, Ralph J, Akiyama T, Flint H, Phillips L, Torr K, Nanayakkara B, Te Kiri L (2007) Exploring lignification in conifers by silencing hydroxycinnamoyl- CoA:shikimate hydroxycinnamoyltransferase in Pinus radiata. Proc Natl Acad Sci U S A 104: 11856–61 Walker AM, Hayes RP, Youn B, Vermerris W, Sattler SE, Kang C (2013) Elucidation of the Structure and Reaction Mechanism of Sorghum Hydroxycinnamoyltransferase and Its Structural Relationship to Other Coenzyme A-Dependent Transferases and Synthases. Plant Physiol 162: 640–651 Wang G-F, Balint-Kurti P (2016) Maize Homologs of CCoAOMT and HCT, Two Key Enzymes in Lignin Biosynthesis, Form Complexes with the NLR Rp1 Protein to Modulate the Defense Response. Plant Physiol 171: 2166–2177 Wang H-Z, Dixon R a (2012) On-off switches for secondary cell wall biosynthesis. Mol Plant 5: 297–303 Wang JP, Naik PP, Chen H-C, Shi R, Lin C-Y, Liu J, Shuford CM, Li Q, Sun Y-H, Tunlaya-Anukit S, et al (2014a) Complete Proteomic-Based Enzyme Reaction and Inhibition Kinetics Reveal How Monolignol Biosynthetic Enzyme Families Affect Metabolic Flux and Lignin in Populus trichocarpa. Plant Cell 26: 894 LP – 914 Wang Y, Cheng X, Shan Q, Zhang Y, Liu J, Gao C, Qiu J-L (2014b) Simultaneous editing of three homoeoalleles in hexaploid bread wheat confers heritable resistance to powdery mildew. Nat Biotechnol Advance on: 1–6 Waters LS, Storz G (2009) Regulatory RNAs in Bacteria. 
Cell 136: 615–628 Weber E, Engler C, Gruetzner R, Werner S, Marillonnet S (2011) A modular cloning system for standardized assembly of multigene constructs. PLoS One. doi: 10.1371/journal.pone.0016765 Werner S, Engler C, Weber E, Gruetzner R, Marillonnet S (2012) Fast track assembly of multigene constructs using golden gate cloning and the MoClo system. Bioeng 122

Bugs 3: 38–43 Whayeb S a, Yamamoto K, Castillo ME, Tojo H, Honda T (1996) Lysophospholipase L2 of Vibrio cholerae O1 affects cholera toxin production. FEMS Immunol Med Microbiol 15: 9–15 Woo JW, Kim JJ-SJJ-S, Kwon S Il, Corvalán C, Cho SW, Kim H, Kim S-TS- G, Kim S-TS-G, Choe S, Kim JJ-SJJ-S (2015) DNA-free genome editing in plants with preassembled CRISPR-Cas9 ribonucleoproteins. Nat Biotechnol 33: 1162–1164 Van de Wouwer D, Boerjan W, Vanholme B (2018) Plant cell wall sugars: sweeteners for a bio-based economy. Physiol Plantaruim 164: 27–44 Van de Wouwer D, Vanholme R, Decou R, Goeminne G, Audenaert D, Nguyen L, Höfer R, Pesquet E, Vanholme B, Boerjan W (2016) Chemical Genetics Uncovers Novel Inhibitors of Lignification, Including p -Iodobenzoic Acid Targeting CINNAMATE-4- HYDROXYLASE . Plant Physiol 172: 198–220 Xie S, Shen B, Zhang C, Huang X, Zhang Y (2014) SgRNAcas9: A software package for designing CRISPR sgRNA and evaluating potential off-target cleavage sites. PLoS One 9: 1–9 Xu R-F, Li H, Qin R-Y, Li J, Qiu C-H, Yang Y-C, Ma H, Li L, Wei P-C, Yang J-B (2015) Generation of inheritable and “transgene clean” targeted genome-modified rice in later generations using the CRISPR/Cas9 system. Sci Rep 5: 11491 Yamaguchi T, Chikama A, Mori K, Watanabe T, Shioya Y, Katsuragi Y, Tokimitsu I (2008) Hydroxyhydroquinone-free coffee: A double-blind, randomized controlled dose;response study of blood pressure. Nutr Metab Cardiovasc Dis 18: 408–414 Yamamoto YY, Obokata J (2008) Ppdb: a Plant Promoter Database. Nucleic Acids Res 36: D977-81 Yang Q, Reinhard K, Schiltz E, Matern U (1997) Characterization and heterologous expression of hydroxycinnamoyl/benzoyl-coA:anthranilate N- hydroxycinnamoyl/benzoyltransferase from elicited cell cultures of carnation, Dianthus caryophyllus L. Plant Mol Biol 35: 777–789 Yoshimi K, Kaneko T, Voigt B, Mashimo T (2014) Allele-specific genome editing and correction of disease-associated phenotypes in rats using the CRISPR–Cas platform. 
Nat Commun 5: 4240 Yoshimoto M, Yahara S, Okuno S, Islam MS, Ishiguro K, Yamakawa O (2002) Antimutagenicity of Mono-, Di-, and Tricaffeoylquinic Acid Derivatives Isolated from Sweetpotato ( Ipomoea batatas L.) Leaf. Biosci Biotechnol Biochem 66: 2336–2341 123

Zeng Y, Zhao S, Yang S, Ding SY (2014) Lignin plays a negative role in the biochemical process for producing lignocellulosic biofuels. Curr Opin Biotechnol 27: 38–45 Zhang B, Yang X, Yang C, Li M, Guo Y (2016) Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia. Sci Rep 6: 20315 Zhang J, Yang Y, Zheng K, Xie M, Feng K, Jawdy SS, Gunter LE, Ranjan P, Singan VR, Engle N, et al (2018) Genome-wide association studies and expression-based quantitative trait loci analyses reveal roles of HCT2 in caffeoylquinic acid biosynthesis and its regulation by defense-responsive transcription factors in Populus. New Phytol 220: 502–516 Zhao Q, Nakashima J, Chen F, Yin Y, Fu C, Yun J, Shao H, Wang X, Wang Z-Y, Dixon R a (2013) Laccase is necessary and nonredundant with peroxidase for lignin polymerization during vascular development in Arabidopsis. Plant Cell 25: 3976–87 Zhong R, Iii W, Negrel J, Ye Z (1998) Dual pathways in lignin biosynthesis. Plant Cell 10: 2033–2046 Zhou X, Jacobs TB, Xue LJ, Harding SA, Tsai CJ (2015) Exploiting SNPs for biallelic CRISPR mutations in the outcrossing woody perennial Populus reveals 4-coumarate: CoA ligase specificity and redundancy. New Phytol 208: 298–301 Zhou X, Ren S, Lu M, Zhao S, Chen Z, Zhao R, Lv J (2018) Preliminary study of Cell Wall Structure and its Mechanical Properties of C3H and HCT RNAi Transgenic Poplar Sapling. Sci Rep 8: 1–10

124

SUPPLEMENTARY INFORMATION

Supplementary Table 1. Identity matrix of the HCT isoforms.

Sequence  HCT1   HCT2   HCT3   HCT4
HCT1      -      0.956  0.814  0.813
HCT2      0.956  -      0.823  0.821
HCT3      0.814  0.823  -      0.954
HCT4      0.813  0.821  0.954  -
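The pairing visible in this matrix (HCT1 with HCT2, HCT3 with HCT4) can be recovered with a short sketch. The values are transcribed from Supplementary Table 1; the 0.90 cut-off and the helper names `pair_identity` and `group_by_identity` are assumptions chosen for illustration and are not taken from the thesis.

```python
# Pairwise identities transcribed from Supplementary Table 1.
NAMES = ["HCT1", "HCT2", "HCT3", "HCT4"]
IDENTITY = {
    ("HCT1", "HCT2"): 0.956,
    ("HCT1", "HCT3"): 0.814,
    ("HCT1", "HCT4"): 0.813,
    ("HCT2", "HCT3"): 0.823,
    ("HCT2", "HCT4"): 0.821,
    ("HCT3", "HCT4"): 0.954,
}
THRESHOLD = 0.90  # assumed cut-off, not a value from the thesis


def pair_identity(a, b):
    """Look up the identity of a pair regardless of the order of its members."""
    return IDENTITY.get((a, b), IDENTITY.get((b, a)))


def group_by_identity(names, threshold):
    """Single-linkage grouping: a sequence joins every group that already
    contains a member with identity >= threshold; such groups are merged."""
    groups = []
    for name in names:
        linked = [g for g in groups
                  if any(pair_identity(m, name) >= threshold for m in g)]
        merged = []
        for g in linked:
            merged.extend(g)
            groups.remove(g)
        merged.append(name)
        groups.append(merged)
    return groups


groups = group_by_identity(NAMES, THRESHOLD)
print(groups)
```

With any cut-off between 0.823 and 0.954 the result is the same two pairs, so the exact threshold is not critical for these four sequences.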


Supplementary Table 2. BlastN search for HCT in the N. tabacum TN90 genome in the Sol Genomics database.

Description             e-value  Ident  Region                        mRNA        Name
Ntab-TN90_AYMY-SS16269  0        100%   396784-397686, 401711-402118  gene_27881  HCT1
Ntab-TN90_AYMY-SS218    0        96%    92884-93786, 89665-90072      gene_45849  HCT2
Ntab-TN90_AYMY-SS16452  0        83%    476756-477647, 481209-481619  gene_29243  HCT3
Ntab-TN90_AYMY-SS9042   0        83%    50610-51501, 48086-48496      gene_83292  HCT4
Ntab-TN90_AYMY-SS10287  1e-127   91%    522424-522775                 -


Supplementary Table 3. BlastP search for tobacco CSE in the Sol Genomics database.

Query ID     Name  Sol Genomics ID  Description          Query cover  e-value  Ident
AT1G52760.1  CSE2  mRNA_119258_cds  Lysophospholipase 2  89%          e-153    80%
AT1G52760.1  CSE1  mRNA_108581_cds  Lysophospholipase 2  88%          e-153    80%


Supplementary Table 4. BlastN search for CSE in the Sol Genomics database (N. tabacum TN90 genome).

ID Sol Genomics        Sequence                         Chromosome localization
Ntab-TN90_AYMY-SS390   mRNA_119258 - gene_55941 (CSE2)  13652-14489, 22395-22962
Ntab-TN90_AYMY-SS2876  mRNA_108581 - gene_50887 (CSE1)  200285-201004, 198409-198999


Supplementary Figure 1. Example of the age and size of the tobacco plants used in the analyses. A) Region where the histological sections were made (7th internode); B) Height of the plants used in the analyses.


[Figure: three bar charts; y-axis: Relative Quantities (CNRQ). A) qPCR CSE - CSE mutants T0 (x-axis: WT and individual CSE1 and CSE2 T0 plants); B) qPCR CSE - HCTamiCSE mutants T0; C) qPCR HCT - HCTamiCSE mutants T0.]

Supplementary Figure 2. qPCR of CSE mutants and HCTamiCSE double mutants in the T0 generation, used to select the lines to be analyzed in the T1 generation. A) qPCR of the CSE gene in T0 CSE mutants; B) qPCR of the CSE gene in T0 HCTamiCSE double mutants; C) qPCR of the HCT1 gene in T0 HCTamiCSE double mutants.


Supplementary Table 5. Matrix of identity between the HCT haplotypes of N. tabacum and the haplotypes found in the genomes of the ancestors (N.sylv and N.tom). The numbers highlighted in red represent the closest pairs.

Gene               N.sylv gene_31158  N.tom gene_29186  N.tom gene_21152  HCT4    HCT1    HCT3    HCT2
N.sylv gene_31158  0.0000             0.0464            0.2056            0.0480  0.2068  0.0008  0.1949
N.tom gene_29186   0.0464             0.0000            0.2045            0.0015  0.2057  0.0456  0.1951
N.tom gene_21152   0.2056             0.2045            0.0000            0.2066  0.0008  0.2052  0.0457
HCT4               0.0480             0.0015            0.2066            0.0000  0.2078  0.0473  0.1972
HCT1               0.2068             0.2057            0.0008            0.2078  0.0000  0.2063  0.0449
HCT3               0.0008             0.0456            0.2052            0.0473  0.2063  0.0000  0.1944
HCT2               0.1949             0.1951            0.0457            0.1972  0.0449  0.1944  0.0000
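Because the red highlighting of the original table does not survive in plain text, the closest ancestral haplotype for each tobacco HCT gene can be looked up again from the values. The sketch below assumes the entries behave as pairwise distances (the diagonal is 0.0000, so smaller means closer); the variable and function names are illustrative only.

```python
# Values transcribed from Supplementary Table 5; smaller value = closer pair
# (assumption based on the zero diagonal).
LABELS = ["N.sylv gene_31158", "N.tom gene_29186", "N.tom gene_21152",
          "HCT4", "HCT1", "HCT3", "HCT2"]
MATRIX = [
    [0.0000, 0.0464, 0.2056, 0.0480, 0.2068, 0.0008, 0.1949],
    [0.0464, 0.0000, 0.2045, 0.0015, 0.2057, 0.0456, 0.1951],
    [0.2056, 0.2045, 0.0000, 0.2066, 0.0008, 0.2052, 0.0457],
    [0.0480, 0.0015, 0.2066, 0.0000, 0.2078, 0.0473, 0.1972],
    [0.2068, 0.2057, 0.0008, 0.2078, 0.0000, 0.2063, 0.0449],
    [0.0008, 0.0456, 0.2052, 0.0473, 0.2063, 0.0000, 0.1944],
    [0.1949, 0.1951, 0.0457, 0.1972, 0.0449, 0.1944, 0.0000],
]
ANCESTORS = LABELS[:3]                       # haplotypes from the ancestor genomes
TOBACCO = ["HCT1", "HCT2", "HCT3", "HCT4"]   # N. tabacum haplotypes


def closest_ancestor(gene):
    """Return the ancestral haplotype with the smallest value for this gene."""
    row = MATRIX[LABELS.index(gene)]
    return min(ANCESTORS, key=lambda a: row[LABELS.index(a)])


closest = {gene: closest_ancestor(gene) for gene in TOBACCO}
for gene, ancestor in closest.items():
    print(gene, "->", ancestor)
```

The same lookup extended to all columns, rather than only the ancestors, would also show the pairing among the tobacco haplotypes themselves.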


[Figure: bar chart of relative quantity (CNRQ) in stem and leaves for HCT Group 1 and HCT Group 2; x-axis: groups of HCT haplotypes.]

Supplementary Figure 3. qPCR of tobacco stem and leaves comparing the expression of the two groups of HCT alleles.


[Gel image; lanes M and 1-12.]

Supplementary Figure 4. Second round of PCR after BseLI digestion of the PCR products from the agro-transient assay using Level 1 vectors. M: NEB 2-log DNA ladder; 2: negative control; 3: digestion of genomic DNA used as control (HCT1); 4-7: PCR after digestion of samples 9-12 from Figure 11 (HCT1); 8: digestion of genomic DNA used as control (HCT2); 9-12: PCR after digestion of samples 13-16 from Figure 11 (HCT2).

Supplementary Table 6. Experimental design for the assembly of Level 2 vectors.

Name  Level 1 Position 1  Level 1 Position 2  Level 1 Position 3  Level 1 Position 4  Level 1 Position 5  Level 1 Position 6  Level 1 Position 7
A     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA1.1     U6-26::sgRNA1.6     U6-26::sgRNA2.6     U6-26::sgRNA2.3     pICH41822
B     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA1.4     U6-26::sgRNA1.2     U6-26::sgRNA2.6     U6-26::sgRNA2.3     pICH41822
C     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA1.1     U6-26::sgRNA1.6     pICH41780           -                   -
D     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA1.4     U6-26::sgRNA1.2     pICH41780           -                   -
E     NOS::Kanamycin      CaMV35S::Cas9       pICH54033           pICH54044           U6-26::sgRNA2.3     U6-26::sgRNA2.6     pICH41822
F     NOS::Kanamycin      CaMV35S::Cas9       pICH54033           U6-26::sgRNA4.2     U6-26::sgRNA4.4     pICH41800           -
G     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA5.1     U6-26::sgRNA5.6     pICH41780           -                   -
H     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA5.7     U6-26::sgRNA5.5     pICH41780           -                   -
I     NOS::Kanamycin      CaMV35S::Cas9       pICH54033           pICH54044           U6-26::sgRNA8.1     U6-26::sgRNA8.4     pICH41822
J     NOS::Kanamycin      CaMV35S::Cas9       pICH54033           pICH54044           U6-26::sgRNA8.1     U6-26::sgRNA8.5     pICH41822
K     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA9.1     U6-26::sgRNA9.3     pICH41780           -                   -
L     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA9.3     U6-26::sgRNA9.2     pICH41780           -                   -
M     NOS::Kanamycin      CaMV35S::Cas9       pICH54033           pICH54044           U6-26::sgRNA10.1    U6-26::sgRNA10.2    pICH41822
N     NOS::Kanamycin      CaMV35S::Cas9       U6-26::sgRNA11.1    U6-26::sgRNA11.2    pICH41780           -                   -
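One regularity in this design is that every assembly ends with a pICH418xx/pICH417xx plasmid placed directly after the last occupied position, and the plasmid used depends on how many positions are filled. The sketch below checks that regularity for rows C, F and E as read from the table. The mapping `CLOSING_PLASMID` is inferred from the table alone, and the interpretation of these plasmids as closing elements, like the function name `check_layout`, is an assumption made for illustration.

```python
# Rows C, F and E as read from Supplementary Table 6 (positions 1 to 7).
ASSEMBLIES = {
    "C": ["NOS::Kanamycin", "CaMV35S::Cas9", "U6-26::sgRNA1.1",
          "U6-26::sgRNA1.6", "pICH41780", "-", "-"],
    "F": ["NOS::Kanamycin", "CaMV35S::Cas9", "pICH54033",
          "U6-26::sgRNA4.2", "U6-26::sgRNA4.4", "pICH41800", "-"],
    "E": ["NOS::Kanamycin", "CaMV35S::Cas9", "pICH54033", "pICH54044",
          "U6-26::sgRNA2.3", "U6-26::sgRNA2.6", "pICH41822"],
}

# Pattern inferred from the table: number of filled positions before the
# closing plasmid -> closing plasmid used (assumption, not a kit reference).
CLOSING_PLASMID = {4: "pICH41780", 5: "pICH41800", 6: "pICH41822"}


def check_layout(positions):
    """True if empty slots ('-') only trail the layout and the last filled
    slot holds the closing plasmid expected for that number of modules."""
    filled = [p for p in positions if p != "-"]
    if len(filled) < 2:
        return False
    *modules, closer = filled
    no_gaps = positions[:len(filled)] == filled
    return no_gaps and CLOSING_PLASMID.get(len(modules)) == closer


results = {name: check_layout(row) for name, row in ASSEMBLIES.items()}
print(results)
```

A check of this kind is mainly useful for catching transcription slips when a design table is typed by hand before ordering or assembling parts.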


Supplementary Table 7. Experimental design for agro-infiltration. The table lists the gene or group of genes targeted, the sgRNA combination used, the expected deletion size, the enzyme used in the restriction enzyme (RE) site loss method, and the RE target.

Group name  Gene targeted  sgRNA  sgRNA  Deletion size  Restriction enzyme  RE target
TE 1        HCT group1     1.1    1.2    20 bp          Hpy166II            Deletion
TE 2        HCT group1     1.4    1.6    37 bp          None                -
TE 3        HCT group2     2.2    2.3    60 bp          BseLI               2.3
TE 4        HCT group2     2.6    2.7    29 bp          HincII              2.6
TE 5        CSE            4.1    4.2    30 bp          BceAI               Deletion
TE 6        CSE            4.1    4.5    100 bp         NcoI                4.5
TE 7        CSE            4.2    4.4    76 bp          AgeI                4.4
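The logic of the RE site loss readout can be illustrated with a toy example: if a deletion between two sgRNA cut sites removes a restriction site, the edited amplicon is no longer cut by that enzyme, while the wild-type amplicon still is. The amplicon sequence and the cut positions below are hypothetical and are not thesis data; only the NcoI recognition sequence (CCATGG) is a known property of the enzyme. The example places the site inside the deleted region, which is just one of the configurations listed in the table.

```python
# Toy illustration of the restriction-enzyme (RE) site loss readout.
# The amplicon and the cut positions are hypothetical, not thesis data.

NCOI_SITE = "CCATGG"  # NcoI recognition sequence


def delete_between(seq, cut1, cut2):
    """Return seq with the bases between two cut positions removed."""
    left, right = sorted((cut1, cut2))
    return seq[:left] + seq[right:]


def has_site(seq, site):
    """True if the recognition site occurs anywhere in seq."""
    return site in seq.upper()


# Hypothetical 60 bp amplicon with a single NcoI site between the two cuts.
amplicon = ("ATGCTAGCTAGGATCGATCG"
            "TTCCATGGAA"
            "GCTAGCTAGGCTAGCTAGCATCGATCGTAA")
cut_a, cut_b = 18, 32  # assumed Cas9 cut positions (0-based string indices)

edited = delete_between(amplicon, cut_a, cut_b)
deletion_size = len(amplicon) - len(edited)

wild_type_is_cut = has_site(amplicon, NCOI_SITE)  # digested by NcoI
edited_is_cut = has_site(edited, NCOI_SITE)       # resistant if site is lost

print(deletion_size, wild_type_is_cut, edited_is_cut)
```

In this toy case a second PCR after digestion would preferentially amplify the undigested, edited template, which is the kind of enrichment the RE site loss method relies on.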


ATTACHMENT

Attachment 1


Attachment 2


Attachment 3