University of Calgary PRISM: University of Calgary's Digital Repository

Graduate Studies The : Electronic Theses and Dissertations

2014-02-07 Comparative transcriptomics and proanthocyanidin metabolism in pea (Pisum sativum) seed coat

Ferraro, Kiva

Ferraro, K. (2014). Comparative transcriptomics and proanthocyanidin metabolism in pea (Pisum sativum) seed coat (Unpublished doctoral thesis). University of Calgary, Calgary, AB. doi:10.11575/PRISM/25368 http://hdl.handle.net/11023/1373 doctoral thesis

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. Downloaded from PRISM: https://prism.ucalgary.ca

UNIVERSITY OF CALGARY

Comparative transcriptomics and proanthocyanidin metabolism in pea (Pisum sativum) seed coat

by

Kiva Sage Ferraro

A THESIS

SUBMITTED TO THE FACULTY OF GRADUATE STUDIES

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF BIOLOGICAL SCIENCES

CALGARY, ALBERTA

JANUARY, 2014

© Kiva Ferraro 2014

Abstract

Plants produce a vast array of specialized compounds known as secondary metabolites, which were originally thought to be non-essential for plant survival. However, we now know that secondary metabolites play integral roles in plant defense, signalling, reproduction and more.

Proanthocyanidins (PAs) are a class of flavonoid polymers derived from the phenylpropanoid pathway. PAs accumulate in the seed coat, bark, and leaves of many plants and are believed to play a role in plant defense. Recent evidence of health benefits associated with PA consumption has spurred new research interests in PA biosynthesis. Many of the studies of seed coat PA biosynthesis have been conducted in non-crop species that produce a limited variety of PAs. Pea (Pisum sativum) offers a number of unique advantages for PA research. Peas produce large seed coats, which are easy to isolate. Dry peas are an important source of nutrition for both humans and livestock, enabling research integration into the human diet and commercial agriculture. Finally, centuries of breeding have produced a wide variety of pea cultivars, making pea a valuable phenotypic and genetic resource for plant research. Despite these advantages, PA biosynthesis in pea has not been well characterized. This work presents the seed coat PA chemical profile of three pea cultivars and the biochemical characterization of two PA branch point enzymes, anthocyanidin reductase and leucoanthocyanidin reductase, from pea. In addition, the seed coat transcriptomes of these three varieties were compared to those of two varieties lacking PAs in an effort to elucidate novel genetic mechanisms relating to PA biosynthesis. This comparative transcriptomic analysis was expanded to study general seed phenotypic differences between pea cultivars. Two target genes were identified: one related to seed weight and another to PAs, the latter of which was further characterized.

Acknowledgements

Scientific inquiry is not an individual endeavour. This work would not have been possible without the personal and academic support I have received from many people. I would like to thank my supervisor, Dr. Dae-Kyun Ro, for his advice and support in designing this project and mentoring me throughout the past six years. His guidance and enthusiasm for research have been an inspiration to me. I would also like to thank my supervisory committee members, Dr. Douglas Muench and Dr. Peter Facchini, for their advice and support. Parts of this work were done in collaboration with Dr. Jocelyn Ozga (University of Alberta), Dr. Alena L. Jin and Dr. Mark Wildung (Washington State University). Their contributions and expertise were invaluable.

Science has a steep learning curve and I would also like to thank all of the past and present members of the Ro Lab who have taught me much and helped with the inevitable troubleshooting, including Gillian MacNevin, Dr. Vince Yang Qu, Dr. T. Don Nguyen, Rod Mitchell, Dr. Hue Tran, Dr. Romit Chakrabarty, and Dr. Benjamin Pickel.

I wish to also extend my gratitude to the many friends with whom I have shared coffee walks and science talks, including Dr. Glen Urhig, Scott Farrow, Guillaume Beaudoin, and Donald Dinsmore. During my studies I was fortunate to receive financial support from the Province of Alberta (Queen Elizabeth II Scholarships) and from the University of Calgary.

Without the personal support of close friends and especially my family, I would not be writing this today. Thank you Annie, Phil, Nancy, Dave, Calina and Shaman for all your love and encouragement. And to my wife, Natasha, who has been with me every step of this journey, you have my eternal gratitude for all your love, sacrifice and support.

Dedication

To the giants of knowledge whose shoulders I stand upon,

And to all of those who helped lift me up.

Table of Contents

Abstract
Acknowledgements
Dedication
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Epigraph

CHAPTER ONE: INTRODUCTION
1.1 Plant secondary metabolism
1.2 Phenylpropanoid metabolism
  1.2.1 Plant polyphenols
  1.2.2 Biosynthesis of phenylpropanoid precursors: phenylalanine and tyrosine
  1.2.3 Core phenylpropanoid metabolism (PAL, C4H and 4CL)
  1.2.4 Flavonoid biosynthesis
    1.2.4.1 Isoflavonoids
    1.2.4.2 Flavanols, dihydroflavanols and flavanones
    1.2.4.3 Anthocyanin and proanthocyanidin branch point
1.3
  1.3.1 Proanthocyanidin biosynthesis
  1.3.2 Trans-flavan-3-ol biosynthesis by LAR
  1.3.3 Cis-flavan-3-ol biosynthesis by ANR
  1.3.4 Cis- to trans-flavan-3-ol epimerization by ANR
  1.3.5 Substrate channelling in the PA biosynthetic pathway
  1.3.6 Intracellular transport of PA precursors
    1.3.6.1 Glutathione S-transferase (GST) in PA metabolism
    1.3.6.2 Transporters in PA metabolism
    1.3.6.3 Proton (H+)-antiporter in PA metabolism
    1.3.6.4 Glycosylation of PA monomers
    1.3.6.5 Vesicle-mediated transport of PA monomers
  1.3.7 PA structures
  1.3.8 Condensation of PA monomers
  1.3.9 Transcriptional regulation of PA biosynthesis
    1.3.9.1 Transcriptional regulation of PA metabolism in Arabidopsis
    1.3.9.2 Hormonal regulation of Arabidopsis PA biosynthesis
    1.3.9.3 Feedback and RNAi-mediated regulation of flavonoid biosynthesis
1.4 Pea: Pisum sativum L.
1.5 Next generation sequencing (NGS) technology
  1.5.1 Roche/454-pyrosequencing
  1.5.2 Illumina Hi-Seq 2000 sequencing
  1.5.3 NGS data processing and analysis
1.6 Summary
1.7 Research Objectives

CHAPTER TWO: MATERIALS AND METHODS
2.1 Plant material and growth conditions
2.2 Genomic DNA extraction
2.3 Arabidopsis thaliana Nossen AT2G47115 T-DNA insertion genotyping
2.4 RNA isolation and cDNA preparation
  2.4.1 Next-generation sequencing (NGS) RNA extraction and cDNA preparation
2.5 Cloning of the Pisum sativum ANR, DFR and LAR
2.6 Characterization of Pisum sativum ANR, DFR and LAR
  2.6.1 Recombinant expression and purification of Pisum sativum ANR, DFR and LAR
  2.6.2 PsANR in vitro assays
  2.6.3 PsDFR-PsLAR coupled enzyme assays
2.7 Liquid chromatography mass spectrometry (LC-MS/MS)
2.8 Arabidopsis thaliana PsLAR transgenic plants
2.9 Quantitative reverse-transcription PCR
2.10 Extraction, purification and identification of proanthocyanidins
2.11 Transcriptome assembly and RNA-Seq analysis
2.12 AT2G47115 (AtPUP) subcellular localization

CHAPTER THREE: ANALYSIS OF PISUM SATIVUM PA CHEMICAL PROFILE AND CHARACTERIZATION OF ANTHOCYANIDIN REDUCTASE AND LEUCOANTHOCYANIDIN REDUCTASE
3.1 Introduction
3.2 Results
  3.2.1 Proanthocyanidin profile of Pisum sativum cultivars
  3.2.2 ‘Courier’ proanthocyanidin profile over seed coat development
  3.2.3 Cloning and characterization of P. sativum ANR
  3.2.4 Transcriptomics of P. sativum seed coat
  3.2.5 Cloning and characterization of P. sativum LAR
  3.2.6 Developmental regulation of PsANR, PsDFR and PsLAR in pea seed coat
  3.2.7 Heterologous expression of PsLAR in Arabidopsis
3.3 Discussion
  3.3.1 Proanthocyanidin in pea (Pisum sativum)
  3.3.2 Contribution of LAR to proanthocyanidin biosynthesis in pea and other plants

CHAPTER FOUR: COMPARATIVE TRANSCRIPTOMICS IN PEA SEED COAT
4.1 Introduction
  4.1.1 Comparative transcriptomics by RNA-sequencing (RNA-Seq)
4.2 Results and Discussion
  4.2.1 Phenotype of five P. sativum cultivars
  4.2.2 Down-regulation of key metabolic genes in phenylpropanoid pathway in PLC
  4.2.3 NGS sequencing and transcriptome assembly
    4.2.3.1 Further evaluation of the transcript assembly
4.3 Comparative Transcriptomics
  4.3.1 Differential expression analysis of known phenylpropanoid genes in PACs vs PLCs
  4.3.2 A point mutation in A/PsTT8 causes mis-splicing in ‘Alaska’ and ‘Canstar’
  4.3.3 Comparison between RNA-Seq and qRT-PCR
4.4 Differential gene expression analysis
  4.4.1 ‘Solido’ versus ‘LAN3017’/‘Courier’
  4.4.2 PACs versus PLCs
  4.4.3 Hormone metabolism in developing pea seed coat
    4.4.3.1 Gibberellin metabolic genes
    4.4.3.2 Abscisic acid metabolic gene expression in pea seed coat
    4.4.3.3 Auxin metabolic genes
  4.4.4 Carbohydrate and protein metabolism in pea seed coat
    4.4.4.1 Amino acid transporters
    4.4.4.2 Amino acid metabolism
    4.4.4.3 Polysaccharide metabolism
4.5 Summary

CHAPTER FIVE: CHARACTERIZATION OF PA-RELATED UNKNOWN PROTEIN
5.1 Introduction
  5.1.1 PA extraction and analysis
5.2 Results
  5.2.1 Identification of Arabidopsis PUP ortholog
  5.2.2 Expression profiles of PUP in P. sativum and Arabidopsis
  5.2.3 Subcellular localization of PUP
  5.2.4 Identification of transposon insertion pup mutants
  5.2.5 PA characterization in Arabidopsis pup mutants
5.3 Discussion

CHAPTER SIX: GENERAL DISCUSSION
6.1 Pisum sativum proanthocyanidins
6.2 PA-branch point enzymes: anthocyanidin reductase and leucoanthocyanidin reductase
  6.2.1 Anthocyanidin reductase
  6.2.2 Metabolic flux through leucoanthocyanidin reductase
  6.2.3 Protein-protein interactions and metabolite channelling in the phenylpropanoid pathway
    6.2.3.1 Does metabolite channelling occur between LAR and DFR?
6.3 Pea seed coat comparative transcriptomics
  6.3.1 RNA-Seq analysis uncovered a CYP78A gene family homolog in ‘Solido’
  6.3.2 A possible role for PA-Related Unknown Protein (PUP) in PA trafficking

APPENDIX A: COMPARATIVE TRANSCRIPTOMICS
A.1. Unigenes differentially expressed in ‘Solido’ (S) versus ‘Courier’ (Cr) and ‘LAN3017’ (L).
A.2. Unigenes differentially expressed in PACs versus PLCs. RPKM values: A, ‘Alaska’; Cn, ‘Canstar’; Cr, ‘Courier’; L, ‘LAN3017’; S, ‘Solido’.
A.3. Key genes involved in hormone, amino acid and polysaccharide metabolism. RPKM values: A, ‘Alaska’; Cn, ‘Canstar’; Cr, ‘Courier’; L, ‘LAN3017’; S, ‘Solido’.

List of Tables

Table 2.1 List of PCR primers used in this work

Table 3.1 PA chemical analyses of ‘Courier’, ‘Solido’ and ‘LAN3017’ pea seeds and seed coats

Table 3.2 Characterization of phloroglucinolysis products from pea seeds using LC-MS/MS analysis

Table 3.3 PA profiles in developing seed coats of ‘Courier’

Table 3.4 PsANR reaction kinetics using cyanidin, pelargonidin or delphinidin as a substrate

Table 3.5 Top 20 unigenes in ‘Courier’ 10-25 DAA seed coat transcriptome

Table 3.6 Relative transcript abundance of phenylpropanoid and PA pathway genes in ‘Courier’ 10-25 DAA seed coat transcriptome

Table 4.1 NGS reads summary

Table 4.2 Basic assembly metrics

Table 4.3 Number of unigenes matching to known pea gene sequences in different de novo assemblies

Table 4.4 Illumina RNA-Seq estimation of expression of known PA biosynthetic and regulatory genes in 10 DAA pea seed coats

Table 4.5 Summary of genes of interest identified through differential gene expression analysis

List of Figures

Figure 1.1 Phenylpropanoid pathway

Figure 1.2 PA chemical diversity

Figure 1.3 Naturally common isomers are shaded in green

Figure 1.4 Pisum sativum seed diversity

Figure 3.1 HPLC chromatograms of the phloroglucinol acid hydrolysis products from pea seeds of ‘Courier’, ‘LAN3017’, and ‘Canstar’

Figure 3.2 Expression of PsANR, PsLAR, and PsDFR in E. coli and purification of the recombinant proteins

Figure 3.3 In vitro characterization of PsANR recombinant enzyme

Figure 3.4 PsANR in vitro assay controls

Figure 3.5 Alignment of LAR protein sequences

Figure 3.6 In vitro PsDFR and PsLAR coupled assays

Figure 3.7 LC-MS/MS profiles of the PsDFR-PsLAR coupled assay and controls

Figure 3.8 Temporal profiles of PsANR, PsDFR and PsLAR transcript abundance, PA content and mean degree of polymerization in pea seed coats of ‘Courier’

Figure 3.9 RT-PCR and immunoblot analyses of PsLAR transcript and protein in Arabidopsis wild-type (WT) and Arabidopsis ANR knock-out (anr) lines

Figure 3.10 Analysis of seed coat color and PA chemical profile from PsLAR-transgenic Arabidopsis

Figure 4.1 Dry seed from Pisum sativum cultivars ‘Alaska’, ‘Canstar’, ‘Courier’, ‘LAN3017’, and ‘Solido’

Figure 4.2 Relative expression of PA biosynthetic genes in pea seed coats at 10 DAA determined by qRT-PCR

Figure 4.3 Comparison of de novo assembled pea transcriptomes

Figure 4.4 GOSlim biological process term assignments for 454NB transcriptome

Figure 4.5 PsA (bHLH transcription factor) mutations in ‘Alaska’ and ‘Canstar’

Figure 4.6 Comparison of gene expression determined by qRT-PCR and Illumina RNA-Seq relative to ‘Canstar’ or ‘Solido’

Figure 4.7 Microarray expression profiles of M. truncatula orthologs in different tissues and in transgenic hairy roots overexpressing β-glucuronidase (control) or A. thaliana transparent testa2 (AtTT2)

Figure 4.8 Gibberellin metabolism pathway and related gene expression in pea seed coat at 10 DAA

Figure 5.1 Comparison between P. sativum PA-Related Unknown Protein (PsPUP) and Arabidopsis AT2G47115 (AtPUP)

Figure 5.2 Expression profile of PUP orthologs in pea and Arabidopsis determined by qRT-PCR

Figure 5.3 Subcellular localization of AtPUP transiently expressed in tobacco leaf epidermal pavement cells and a stomatal guard cell

Figure 5.4 Subcellular localization of AtPUP by transient expression in a tobacco leaf stomatal guard cell and pavement cell

Figure 5.5 Transposon insertion position in AtPUP in A. thaliana Nossen

Figure 5.6 PCR screening of A. thaliana Nossen AtPUP transposon insertion homozygous mutants and wild-type segregant plants

Figure 5.7 Characterization of seed proanthocyanidins in A. thaliana Nossen wild-type segregants and Atpup transposon homozygous mutants

List of Abbreviations

454 Roche/454-pyrosequencing

454NB de novo assembly of Roche/454-pyrosequencing reads

ANR anthocyanidin reductase

bp base pairs

Cat catechin

cDNA complementary DNA

CLC CLC Genomics Workbench

CYP cytochrome P450-dependent monooxygenase

DAA days after anthesis

DFR dihydroflavonol 4-reductase

DHM dihydromyricetin

DHQ dihydroquercetin

EAZ epiafzelechin

EC epicatechin

EGC epigallocatechin

Gbp giga base pairs (billions)

GC gallocatechin

HPLC high performance liquid chromatography

hr hour

ILM Illumina Hi-Seq 2000 sequencing

kb kilo bases (thousands)

LAR leucoanthocyanidin reductase

LC-MS liquid chromatography-mass spectrometry

Mbp mega bases (millions)

mDP mean degree of polymerization

min minute

NB Newbler

NGS next generation sequencing

ORF open reading frame

PA proanthocyanidin

PAC proanthocyanidin-accumulating cultivar

PCR polymerase chain reaction

PUP PA-related Unknown Protein

PLC proanthocyanidin-lacking cultivar

qRT-PCR quantitative reverse transcription PCR

RACE rapid amplification of cDNA ends

RNA-Seq RNA sequencing

RPKM reads per kilobase pair per million reads mapped

RT-PCR reverse transcription PCR

TF transcription factor

Epigraph

“Somewhere, something incredible is waiting to be known.”

‒ Carl Sagan

Declarations

The work presented in Chapter 3 was performed in collaboration with Dr. Alena Jin in Dr. Jocelyn Ozga’s laboratory at the University of Alberta. Chemical analyses of proanthocyanidins in pea seed coats were performed by Alena Jin, and I conducted all other experiments in Chapter 3. The chemical analysis results are necessary for the interpretation of the data presented in Chapter 3 and Chapter 4, and were therefore included in this dissertation.

The content of Chapter 3 was submitted to BMC Plant Biology on December 1, 2013. The manuscript title and authors are as follows.

“Characterization of proanthocyanidin biosynthesis in pea (Pisum sativum) seeds” by Kiva Ferraro, Alena L. Jin, Dennis M. Reinecke, Jocelyn A. Ozga, and Dae-Kyun Ro

The de novo assembly of the Roche/454-sequencing reads by Newbler software was performed by Dr. Mark Wildung at Washington State University. I performed all evaluations of the de novo assemblies and conducted all other bioinformatics analyses presented in Chapter 4.


Chapter One: Introduction

1.1 Plant secondary metabolism

In plants, secondary (or specialized) metabolism is a term used to describe biochemical processes that were originally thought to produce compounds non-essential to the basic survival needs of a plant, in contrast to the products of primary metabolic pathways, such as glycolysis, the pentose phosphate pathway, the shikimate pathway, or the Krebs cycle. However, over the past several decades research has shown that secondary metabolites act in plant defense responses against microbial pathogens and herbivores, play roles as attractants for pollinators or seed-dispersing herbivores, and act as mediators of plant-plant and plant-microbe signalling (Pichersky and Gang, 2000; Theis and Lerdau, 2003; Landolino and Cook, 2009). Thus, secondary metabolism significantly increases the evolutionary fitness and success of plants in diverse niches.

Several tens of thousands of secondary metabolites have been identified, though this likely only represents a small percentage of the chemical diversity that naturally exists.

Phenylpropanoids are found in all vascular plants and are one of the most thoroughly described families of secondary metabolites. The family is made up of a number of major classes, such as lignin, lignans, coumarins, flavonoids, and . Over 7,000 phenylpropanoids have been identified, of which more than 5,000 are flavonoids (Wink, 2010).

Structurally, phenylpropanoids are C6-C3 compounds derived from L-phenylalanine or, to a lesser extent, L-tyrosine. A range of environmental and developmental factors can influence phenylpropanoid levels in plants, including light, developmental stage, herbivore and pathogen attack, and nutrient deficiency or excess (Tohge et al., 2013). The flavonoid branch of the phenylpropanoid pathway consists of several main subclasses of compounds: anthocyanins, proanthocyanidins, isoflavonoids, and flavanols. Due to the pigmentation properties of anthocyanins, much early plant research focused on this branch of the flavonoid pathway, as metabolic perturbations can be easily visualized in an induced mutant population. Over the past couple of decades, flavonoid biosynthesis has received renewed research interest due to the association between these compounds and a variety of health benefits (Dixon et al., 2005). Advances in DNA sequencing technology have enabled new approaches for examining the genetic control of flavonoid biosynthesis in non-model species.

1.2 Phenylpropanoid metabolism

1.2.1 Plant polyphenols

Plants produce an enormous variety of polyphenols from a few shikimate pathway intermediates through phenylpropanoid metabolism (Fig. 1.1). These compounds serve a wide array of purposes including structural integrity, defense, signalling and adaptation to new habitats (Vogt, 2010). For example, coumarins, which are produced from derivatives of cinnamic and 4-coumaric acids, function in UV-B protection, defense and neutralization of oxygen radicals (Tohge et al., 2013). Hydroxycinnamoyl alcohol derivatives can be polymerized to produce lignins, which are found in all vascular plants and are the second most abundant polymers on earth, containing roughly 30% of the organic carbon present in the biosphere (Boerjan et al., 2003). Lignins are a major structural component in plants and their biosynthesis has become an increased focal point in plant research due to their properties relating to timber quality and biofuel production (Li et al., 2008a). Lignans, not to be confused with lignins, are dimers of monolignols (hydroxycinnamoyl alcohol derivatives) linked by a central carbon (C8) and are widely distributed in plants. Lignans act primarily as antifeedants against insect pests (Harmatha and Dinan, 2003). In humans, dietary lignans are metabolized to phytoestrogens, which have been linked to a range of health benefits, including anticancer properties and protection against coronary heart disease (Adlercreutz, 2007). A number of volatile phenylpropanoid derivatives play important roles in fragrance, pollinator attraction, plant-plant signalling, and defense against insect pests and herbivores (Vogt, 2010). These compounds are also commercially important for the perfume and food fragrance industries. For example, the phenylpropenes eugenol and chavicol, derived from coniferyl alcohol, are important fragrance compounds that have a spicy, sweet odor and are major constituents of basil oil (Gang et al., 2001). Commercially, eugenol is used in vanillin production.

1.2.2 Biosynthesis of phenylpropanoid precursors: phenylalanine and tyrosine

Flux from primary metabolism to phenylpropanoid biosynthesis is primarily routed through the plant shikimate pathway, which produces L-tryptophan (Trp), L-phenylalanine (Phe) and L-tyrosine (Tyr) as the major end-products. These aromatic amino acids account for over a third of the photosynthetically-fixed carbon in vascular plants, of which Phe constitutes the highest percentage of carbon flux (Maeda and Dudareva, 2012).

The shikimate pathway exists in the plastid and converts the glycolysis and pentose phosphate pathway products, phosphoenolpyruvate and D-erythrose-4-phosphate, respectively, to the core intermediate chorismate, which is further metabolized to Phe by the arogenate pathway (Maeda and Dudareva, 2012). While some evidence exists to suggest that low levels of Phe can induce genes in the shikimate pathway, numerous studies have demonstrated induction of the pathway in response to abiotic stimuli, such as ozone or wounding, and biotic stimuli, such as pathogens (Maeda and Dudareva, 2012). Furthermore, spatial and temporal regulation of the shikimate pathway coincides with the developmentally regulated production of phenylpropanoids in various stages and tissues in plant development (Maeda and Dudareva, 2012).

1.2.3 Core phenylpropanoid metabolism (PAL, C4H and 4CL)

Entry into the phenylpropanoid pathway begins with phenylalanine ammonia lyase (PAL, EC 4.3.1.5), which catalyzes the formation of trans-cinnamic acid from Phe (Fig. 1.1) by non-oxidative deamination and the introduction of a trans-double bond between C7 and C8 of the side chain, a reaction which is unusual due to the lack of a required cofactor. PAL is highly homologous to the less common tyrosine ammonia lyase (TAL), which to date has been primarily studied in bacteria (Kyndt et al., 2002, Berner et al., 2006). The extent to which TAL exists in vascular plants is debatable, as some PALs possess TAL activity (Rosler et al., 1997). A single amino acid substitution of His89 to Phe is sufficient to convert TAL activity to PAL activity and vice versa (Watts et al., 2006). PAL exists as a gene family in plants, with as many as nine copies found in rice and as few as four in Arabidopsis thaliana (Hamberger et al., 2007). Individual PAL members display differential expression in response to abiotic and biotic stimuli, as well as unique spatial and developmental profiles (Vogt, 2010). Regulation of PAL occurs primarily through modulation of gene expression, though negative feedback by trans-cinnamic acid is also involved (Tohge et al., 2013).

Following deamination, 4-coumaric acid (Fig. 1.1) is generated by hydroxylation of trans-cinnamate at the C4 position by a cytochrome P450 (CYP) monooxygenase belonging to the CYP73 family, cinnamate 4-hydroxylase (C4H; EC 1.14.13.11). The CYP73 family is not found in fungi or bacteria and appears to be unique to higher plants (Tohge et al., 2013). C4H is a membrane (endoplasmic reticulum; ER)-bound enzyme and displays strict substrate acceptance, hydroxylating trans-cinnamate only (Petersen et al., 2010). C4H also exists as a gene family in some species; however, in Arabidopsis, C4H is encoded by a single gene, the mutation of which results in reduced content of multiple classes of phenylpropanoid compounds, dramatically impacting growth and development (Schilmiller et al., 2009).

4-Coumarate CoA-ligase (4CL, EC 6.2.1.12) converts the chemically inert 4-coumaric acid to a reactive coenzyme-A thioester, 4-coumaroyl-CoA, which is a central intermediate for numerous classes of phenylpropanoids (Fig. 1.1; Vogt, 2010). In vascular plants, 4CL is encoded by a gene family, the isoforms of which accept a small variety of substrates, including the hydroxycinnamic acids 4-coumaric, caffeic and sinapic acid (Petersen et al., 2010, Tohge et al., 2013).

Alternatively, hydroxycinnamic acids and alcohols can be activated by glucosylation at the aromatic hydroxyl group or the aliphatic side chain, which is characteristic of phenylpropanoid storage or transport compounds, or overlaps with downstream products of hydroxycinnamic:CoA esters, respectively (Petersen et al., 2010). This glucosylation is catalyzed by uridine diphosphate glucose (UDPG)-dependent glucosyltransferases, a large family of enzymes consisting of over 100 members in Arabidopsis with involvement in numerous processes including detoxification of xenobiotics, modulation of plant hormone bioactivity, as well as alteration of the inter- and intracellular transport properties of compounds (Ross et al., 2001). Substrate specificity of glucosyltransferases is often regio-specific, though a range of structurally similar substrates may be accepted, as exemplified by the grape (Vitis labrusca) glucosyltransferase, VLRSgt, which forms glucose esters of cinnamic, caffeic, coumaric and benzoic acids at an acidic pH (Hall and De Luca, 2007). Interestingly, at a basic pH the substrate preference of VLRSgt changes to accept the flavonoids kaempferol, quercetin, naringenin and dihydroquercetin, demonstrating a bi-functional role of the enzyme (Hall and De Luca, 2007).

PAL, C4H and 4CL constitute the core phenylpropanoid pathway (Fig. 1.1) in vascular plants. These gene products produce the initial precursors from which a wide range of phenylpropanoids are produced.

1.2.4 Flavonoid biosynthesis

Flavonoids are found in a wide variety of foods and beverages, such as fruits and vegetables, tea, chocolate, coffee and wine, and are associated with numerous health benefits, including anti-inflammatory, antimicrobial, anti-cancer and neuroprotective properties, which has generated considerable research into the biosynthetic and regulatory processes responsible for their accumulation (Tohge et al., 2013). In plants, flavonoids function as UV protectants, antimicrobial and herbivore defense compounds, and as signalling molecules in symbiotic relationships (Dixon and Pasinetti, 2010). Flavonoids can be divided into two groups based on their structure (Fig. 1.1): 2-phenylchromans (e.g. flavones, flavonols, flavan-3-ols, anthocyanidins) and 3-phenylchromans (e.g. isoflavonoids).

The first committed step in the flavonoid biosynthetic pathway is catalyzed by chalcone synthase (CHS; EC 2.3.1.74), a type III polyketide synthase, which catalyzes the biosynthesis of naringenin chalcone (also known as tetrahydroxychalcone; Fig. 1.1) through the condensation of the acetate residues from three malonyl-CoA with p-coumaroyl-CoA (4-coumaroyl-CoA). CHS is a member of a diverse gene family with high homology (60-95% amino acid identity) in Embryophyta (land plants), but is also found in bacteria, fungi, and lycophytes (e.g. ferns) (Tohge et al., 2013). Naringenin chalcone is an immediate precursor to a diverse variety of flavonoid compounds separated into a number of subclasses, including aurones, stilbenes, isoflavonoids, anthocyanins, and proanthocyanidins (PAs; Fig. 1.1).
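The carbon bookkeeping of the CHS reaction can be checked with a short calculation. The sketch below is illustrative only and its constant and function names are hypothetical. It relies on one fact not spelled out in the text: each malonyl group (three carbons) loses one carbon as CO2 during condensation, so it contributes a two-carbon acetate unit. Together with the nine-carbon C6-C3 unit of 4-coumaroyl-CoA, this gives the C6-C3-C6 flavonoid skeleton.

```python
# Carbon count for the chalcone synthase (CHS) condensation.
# Names are hypothetical; the decarboxylation of malonyl-CoA is an added
# assumption stated in the lead-in, not taken from the thesis text.
COUMAROYL_CARBONS = 9      # C6-C3 unit supplied by 4-coumaroyl-CoA
MALONYL_CARBONS = 3        # carbons in the malonyl group of malonyl-CoA
CO2_LOST_PER_MALONYL = 1   # decarboxylation leaves a C2 acetate unit
N_MALONYL = 3              # three malonyl-CoA per chalcone


def chalcone_carbons() -> int:
    """Number of skeleton carbons in naringenin chalcone."""
    acetate_unit = MALONYL_CARBONS - CO2_LOST_PER_MALONYL
    return COUMAROYL_CARBONS + N_MALONYL * acetate_unit


if __name__ == "__main__":
    print(chalcone_carbons())
```

The result, 9 + 3 × 2 = 15 carbons, matches the 6 + 3 + 6 carbons of the C6-C3-C6 skeleton.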

Chalcone isomerase (CHI; EC 5.5.1.6) catalyzes an intramolecular cyclization of naringenin chalcone yielding the flavanone (2S)-naringenin (Fig. 1.1), which is a central intermediate in the biosynthesis of isoflavonoids and flavan-4-ols, as well as for downstream processes that produce anthocyanins and PAs. Two types of CHI enzymes exist and are distinguished based on their products. Type I 6'-hydroxychalcone CHIs are found in non-leguminous plants whereas type II CHIs also convert 5-deoxychalcone to liquiritigenin as a branch of the legume-specific 5-deoxyflavonoid pathway (Tohge et al., 2013). Consistent with these roles, soybean (Glycine max) type I CHIs are co-regulated with other flavonoid biosynthetic genes whereas type II CHIs are coordinately regulated with isoflavonoid-specific genes (Ralston et al., 2005).

1.2.4.1 Isoflavonoids

Isoflavonoids are important signalling molecules in a process called nodulation, which involves the formation of a symbiotic relationship between legumes and nitrogen-fixing bacteria (Rhizobiaceae), in which the bacteria infect the roots of the host plant, which in turn forms containment structures around the bacteria (Crespi and Gálvez, 2000). Isoflavonoid synthase (IFS; EC 1.14.13.136), also known as 2-hydroxyisoflavanone synthase, is a cytochrome P450 that transfers the aryl ring from the C2 to the C3 position of (2S)-naringenin and liquiritigenin (Fig. 1.1), forming genistein and daidzein, respectively (Dixon and Pasinetti, 2010). Transformation of rice (Oryza sativa) with soybean IFS resulted in genistein synthesis in rice, which allowed the activation of microbial nodulation-specific genes (Sreevidya et al., 2006). This result suggests that a novel trait can be conferred to rice by manipulating its secondary metabolism.

1.2.4.2 Flavanols, dihydroflavanols and flavanones

Numerous modifications to the C6-C3-C6 carbon skeleton generate substantial diversity

in plant flavonoids. One such modification is hydroxylation of the B-ring (Fig. 1.1). Naringenin has a 4'-hydroxyl group on the B-ring, a product of C4H activity, but additional hydroxylation can be introduced by the action of two cytochrome P450s (CYP75), flavonoid 3'-hydroxylase

(F3'H; EC 1.14.13.21) and flavonoid 3',5'-hydroxylase (F3'5'H; EC 1.14.13.88), which add either one or two hydroxyl groups, respectively (Stotz et al., 1985, Holton et al., 1993). In addition to naringenin, these enzymes have been shown to catalyze the hydroxylation of flavonols, dihydroflavonols, and leucoanthocyanidins (Kaltenbach et al., 1999, Dixon et al., 2013).

Furthermore, a degree of functional redundancy may also exist as F3'5'H from Petunia hybrida

(petunia) and Catharanthus roseus (Madagascar periwinkle) is able to catalyze both 3'- and 3',5'-hydroxylation (Kaltenbach et al., 1999).

The presence and level of F3'H or F3'5'H activity have visible consequences in plants as the degree of B-ring hydroxylation changes the colour properties of the resulting anthocyanins, which are pigment molecules responsible for blue-purple (e.g. delphinidin) and pink/red-orange colours (e.g. cyanidin or pelargonidin) of flowers and fruits (Fig. 1.1). Production of delphinidin requires the activity of F3'5'H, whereas cyanidin is produced from flavonoids hydroxylated by

F3'H. Hydroxylation at the 4'-position only, yielding pelargonidin, is the default pathway that occurs in the absence of both F3'H and F3'5'H activity.
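
The dependence of anthocyanidin type on B-ring hydroxylase activity described above can be summarized as a simple decision rule. The following Python sketch is illustrative only: the function name and the boolean flags are hypothetical, and the rule deliberately ignores enzyme levels, DFR substrate preferences and the dual 3'/3',5'-hydroxylation reported for some F3'5'H enzymes.

```python
def expected_anthocyanidin(f3prime_h: bool, f3prime5prime_h: bool) -> str:
    """Return the anthocyanidin expected from B-ring hydroxylase activity.

    Simplifying assumption: F3'5'H activity, when present, dominates
    and leads to the trihydroxylated product.
    """
    if f3prime5prime_h:
        return "delphinidin"   # 3',4',5'-hydroxylated, blue-purple
    if f3prime_h:
        return "cyanidin"      # 3',4'-hydroxylated, pink/red
    return "pelargonidin"      # 4'-hydroxylated only, default pathway


if __name__ == "__main__":
    # A plant lacking F3'5'H but expressing F3'H, as described for rose.
    print(expected_anthocyanidin(f3prime_h=True, f3prime5prime_h=False))
```

Under this rule, a species without F3'5'H cannot be expected to accumulate delphinidin regardless of F3'H activity.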

Anthocyanins are unique among flavonoids in that their colour is pH-dependent. A shift

from a strongly acidic (pH < 2) to a weakly acidic or neutral pH (pH 6-7) changes the colour from a reddish-orange to a purple-violet by way of a colourless intermediate (He and Giusti,

2010). The influence of vacuolar pH and degree of B-ring hydroxylation on floral colour is well demonstrated in rose (Rosa hybrida). Despite centuries of extensive breeding, traditional methods were not able to create rose flowers with blue or purple petals. Roses lack F3'5'H and have an acidic vacuole, meaning the species fundamentally lacks the two primary components required for blue-purple pigmentation (Biolley and Jay, 1993). Only recently have blue roses been produced through transgenic modification (Katsumoto et al., 2007).

Fe2+/2-oxoglutarate-dependent dioxygenases (2-ODD) make up a key enzyme family in pigment biosynthesis. A group of 2-ODD enzymes catalyze a number of reactions that can generate taxon- and species-specific variations in flavonoid profiles (Tohge et al., 2013). These include flavanone 3-hydroxylase (F3H; EC 1.14.11.9), flavonol synthase (FLS; EC 1.14.11.23), and anthocyanidin synthase (ANS; EC 1.14.11.19), sometimes also referred to as leucoanthocyanidin dioxygenase (LDOX). C3-hydroxylation of flavanones by F3H yields dihydroflavonols, and subsequent 2,3-dehydration by FLS produces flavonols (Fig. 1.1).

Interestingly, mandarin (Citrus unshiu) FLS can catalyze both of these steps in vitro, converting the flavanone substrate (naringenin) to a flavonol (kaempferol; Lukačin et al., 2003, Martens et al., 2003). The bi-functional nature of FLS may be isoform-specific or be the result of incomplete evolution of the enzyme. However, Arabidopsis FLS displays broad substrate specificity, indicating the latter explanation may be more accurate (Turnbull et al., 2004).

Stereospecific reduction of dihydroflavonols to leucoanthocyanidins (flavan-3,4-diols) is performed by dihydroflavonol 4-reductase (DFR; EC 1.1.1.219; Fig. 1.1), a member of the short

chain dehydrogenase/reductase (SDR) superfamily. DFR exists as a gene family in some species, such as Lotus japonicus (Shimada et al., 2005) and M. truncatula (Xie et al., 2004a), while it is present as a single-copy gene in others, including Arabidopsis (Shirley et al., 1992) and grape

(Sparvoli et al., 1994). Species-specific substrate preferences of DFR can limit the diversity of downstream flavonoids in species such as orchids (Cymbidium hybrida) and petunia, which do not reduce dihydrokaempferol efficiently (Johnson et al., 1999). Even among species with multiple copies of DFR, differences in isoform substrate acceptance and spatial expression profiles have been observed (Xie et al., 2004a, Shimada et al., 2005).

1.2.4.3 Anthocyanin and proanthocyanidin branch point

Leucoanthocyanidins, the enzymatic products of DFR, represent the first of two branch points between anthocyanin and proanthocyanidin biosynthetic pathways. Anthocyanidin synthase

(ANS) is a key enzyme in both pathways and was first biochemically characterized from a plant belonging to the mint family (Perilla frutescens), though clones had been previously identified and characterized in a number of species, including grape, M. truncatula, Gerbera hybrida, and

Arabidopsis (Saito et al., 1999, Abrahams et al., 2003, Wellmann et al., 2006, Pang et al., 2007,

Wang et al., 2010). ANS catalyzes a stereoselective C3-hydroxylation of leucoanthocyanidins, yielding a colourless 2-flaven-3,4-diol, which spontaneously converts to a pigmented flavylium ion (anthocyanidin, 3-hydroxy-anthocyanidin; Fig. 1.1) in the presence of an acid (Wilmouth et al., 2002). Interestingly, Arabidopsis ANS is also reported to produce both cis- and trans-dihydroquercetin (a dihydroflavonol), quercetin (a flavonol) and cyanidin (an anthocyanidin) when provided leucocyanidin as a substrate (Turnbull et al., 2000). Remarkably, cyanidin, the anticipated natural product of the reaction, was formed in a relatively minor amount, and the two

dihydroflavonols were further converted to quercetin (Turnbull et al., 2000). Furthermore, another study found that naringenin can be converted to dihydrokaempferol (a dihydroflavonol) by ANS, with trace amounts of a 3-deoxyanthocyanidin and a flavonol accumulating (Welford et al., 2001). These studies support an initial C3-hydroxylation of the substrate by ANS, followed by non-enzymatic enolization, and point to a high degree of in vitro promiscuity with respect to substrate acceptance. Whether this promiscuity exists in vivo is uncertain. If it does, it could prove problematic, as indiscriminate ANS activity may redirect flux towards unintended pathways and reduce the overall efficiency of biosynthesis. Substrate channelling via a multi-enzyme complex has been proposed as a possible mechanism to overcome this issue (Winkel,

2004).

1.3 Proanthocyanidin

1.3.1 Proanthocyanidin biosynthesis

Tannins are natural polyphenols historically associated with the tanning of animal skins to make leather. The name derives from the French ‘tanin’, meaning ‘tanning substance’.

Tannins were initially classified based on their ability to be hydrolyzed in hot water. Hydrolysable tannins consist of gallotannins, composed of one galloyl unit and a polyol (typically a D-glucose derivative), and ellagitannins, formed from coupling of two galloyl units from gallocatechins

(Khanbabaee and Ree, 2001). Ellagitannins, which are only classified as hydrolysable tannins for historical purposes but are in fact not hydrolysable, are the largest group of known tannins with more than 500 types identified (Khanbabaee and Ree, 2001). Complex tannins, which are composed of a unit of gallotannin or ellagitannin and the flavan-3-ol, catechin, are an intermediate group that is only partially hydrolysable (Khanbabaee and Ree, 2001).

Proanthocyanidins (PAs; also known as condensed tannins), non-hydrolyzable tannins, are flavonoid oligomers and polymers typically composed of cis- and trans-flavan-3-ols (Fig. 1.1 and 1.2; Khanbabaee and Ree, 2001). The name, proanthocyanidin, derives from the release of anthocyanidins upon acid-catalyzed hydrolysis of PAs. PAs are often described in terms of soluble/extractable and insoluble/unextractable fractions, which refer to the fraction of total PAs that can be solubilized in an organic solvent versus the fraction that remains bound to the plant tissue.

Most people are unknowingly familiar with PAs due to the astringent flavour they produce in wine, tea and fruit juices. PAs are also found in a variety of other foods including chocolate, nuts, fruits and berries (Rasmussen et al., 2005). Renewed research into the biosynthesis of PAs has been stimulated by correlations between human consumption and reduced risks of cardiovascular disease, impairment of certain cancers and improved management of diabetes (Dixon et al., 2005, Lee et al., 2008, Chung et al., 2009). PAs are also

of interest in animal agriculture as moderate accumulation (2-4% dry weight) of PAs in forage

crops can protect ruminants from pasture bloat and also promote animal growth by increasing

absorption of amino acids due to the ability of PAs to bind proteins, thereby slowing their

breakdown (Aerts et al., 1999). However, at higher PA concentrations (6-12% dry weight), the

effects are reversed due to reduced palatability and digestion of proteins (Aerts et al., 1999).

Conventional breeding has been unsuccessful at introducing PA into forage crops and so far

genetic engineering attempts have not achieved sufficiently high PA concentrations (Dixon et al.,

2013). A sophisticated understanding of PA biosynthesis and regulation is required to develop

finely tuned engineering strategies to achieve optimal PA content in forage crops.
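
The concentration ranges quoted above from Aerts et al. (1999) can be expressed as a small classification helper. This is a minimal sketch: the function name and return labels are hypothetical, and concentrations outside the two reported ranges are returned as unclassified rather than guessed.

```python
def forage_pa_effect(pa_percent_dry_weight: float) -> str:
    """Classify forage PA content (% dry weight) using the ranges in the text."""
    if pa_percent_dry_weight < 0:
        raise ValueError("PA content cannot be negative")
    if 2.0 <= pa_percent_dry_weight <= 4.0:
        # Moderate accumulation: bloat protection, improved amino acid absorption.
        return "beneficial"
    if 6.0 <= pa_percent_dry_weight <= 12.0:
        # High accumulation: reduced palatability and protein digestion.
        return "detrimental"
    # The text does not describe effects outside these two ranges.
    return "unclassified"


if __name__ == "__main__":
    for content in (1.0, 3.0, 5.0, 8.0):
        print(content, forage_pa_effect(content))
```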

In plants, PAs primarily accumulate in the seed coat where they are believed to play a

role in protecting the embryo and endosperm (Lepiniec et al., 2006). Upon maturation and desiccation of the seed coat, PAs are oxidized producing a characteristic brown pigmentation.

The loss of PAs leads to seeds in which the colour of the embryo is visible due to the lack of pigmentation in the seed coat, a phenotype referred to as transparent testa (‘tt’ for short), which is also the origin of the names of many of the genes involved in biosynthesis and regulation of PA production. PAs also harden the seed coat, thereby affecting germination rates (Debeaujon et al., 2000). In addition to the seed coat, PAs are found in bark, in the leaves of certain plants (notably tea), and in the skin and/or flesh of many fruits such as grapes, apples, cranberries and persimmon (Xie and Dixon, 2005), where PAs are believed to play a role in defense (Iriti et al.,

2005, Miranda et al., 2007, Singh et al., 2009, Panjehkeh et al., 2010, Yuan et al., 2012).

Most PAs are composed of flavan-3-ols; however, it is worth noting that PA polymers containing other flavonoid subunits, such as flavan-4-ols (e.g. apiforol and luteoforol) or anthocyanin derivatives, are found in some species (Xie and Dixon, 2005).

1.3.2 Trans-flavan-3-ol biosynthesis by LAR

Early biochemical studies using crude cell extracts demonstrated the conversion of dihydroflavonols to 2,3-trans-flavan-3-ols, a process attributed to the consecutive actions of DFR and leucoanthocyanidin reductase (LAR; EC 1.17.1.3; Stafford and Lester, 1985). LAR was first characterized from Desmodium uncinatum and shown to catalyze one of two metabolic branch points leading to PA biosynthesis (Tanner et al., 2003). LAR is an SDR superfamily member that

catalyzes the NADPH-dependent C4-reduction of 2R,3S,4S-flavan-3,4-diols

(leucoanthocyanidins) to 2R,3S-trans-flavan-3-ols (Fig. 1.1; Tanner et al., 2003). LAR is

frequently encoded by a small gene family with the notable exception of M. truncatula, in which

only one gene has been identified (Bogs et al., 2005, Tsai et al., 2006, Almeida et al., 2007, Pang

et al., 2007, Paolocci et al., 2007). Interestingly, grape LAR isoforms are relatively divergent,

with LAR1 and LAR2 only 58% identical at the amino acid level (Pfeiffer et al.,

2006).

Kinetic analysis of LAR is very limited, likely owing to the reported instability and lack

of commercial availability of the leucoanthocyanidin substrate. To date, a detailed kinetic study

of LAR has only been accomplished in Desmodium uncinatum (Tanner et al., 2003). DuLAR

displays comparable affinity for leucodelphinidin (3',4',5'-hydroxylated leucoanthocyanidin) and

leucocyanidin (3',4'-hydroxylated), whereas affinity for leucopelargonidin (4'-hydroxylated) was

about 5-fold lower (Tanner et al., 2003). The substrate preference may be due to the interactions

between the B-ring hydroxyl groups of the substrate and NADPH, which is believed to bind

LAR first prior to substrate docking (Maugé et al., 2010). Furthermore, Tanner et al. (2003)

found that DuLAR was stereoselective for the 3,4-cis-leucocyanidins and did not reduce the 3,4-trans-isomers, though the presence of these unnatural substrates was inhibitory, as was the presence of anthocyanidins and dihydroflavonols. However, grape LAR is also able to reduce the flavan-4-ol, luteoforol, suggesting a limited degree of promiscuity with respect to substrate acceptance (Pfeiffer et al., 2006). In the absence of a leucoanthocyanidin substrate, many studies have opted to use DFR-LAR coupled assays to test the activity of LAR protein products

(Paolocci et al., 2007, Gagné et al., 2009, Maugé et al., 2010).

Not all species studied accumulate 2,3-trans-flavan-3-ols. No LAR gene has been identified in Arabidopsis, which produces PAs composed exclusively of the 2,3-cis-flavan-3-ol, epicatechin (EC; Lepiniec et al., 2006). Although the M. truncatula genome contains a LAR

gene, the gene product displays relatively low in vitro activity and M. truncatula PAs are

composed almost exclusively of EC, indicating that the LAR gene in M. truncatula does not

contribute to trans-flavan-3-ol biosynthesis (Pang et al., 2007).

1.3.3 Cis-flavan-3-ol biosynthesis by ANR

Initial models of PA biosynthesis assumed condensation of flavan-3-ols with an

electrophile, such as leucoanthocyanidin; however, the 2,3-cis-stereochemistry of most extension

units demonstrated this model was incomplete (Xie and Dixon, 2005). The BANYULS (BAN)

gene in Arabidopsis was initially thought to encode LAR (Devic et al., 1999), but enzymatic

characterization revealed that its product was a novel enzyme, anthocyanidin reductase (ANR;

EC 1.3.1.77), responsible for the formation of 2,3-cis-flavan-3-ols (Fig. 1.1; Xie et al., 2003). ANR acts downstream of ANS, reducing anthocyanidins to 2R,3R-cis-flavan-3-ols via an NADPH-dependent double reduction at C3 and C4, which is unusual as it requires two cofactor equivalents to reduce one equivalent of substrate. Substrate acceptance is generally strict with species-specific kinetic preferences, based on the degree of hydroxylation of the B-ring of the substrate (Xie et al., 2004b).

ANR is also an SDR family member and displays high amino acid homology to DFR

(Devic et al., 1999, Xie et al., 2003, Gargouri et al., 2010). Both ANR and LAR have been characterized in several species, including Arabidopsis (ANR only), M. truncatula, tea (Camellia sinensis), grape (Vitis vinifera), poplar (Populus spp.) and apple (Malus x domestica Borkh.; Xie et al., 2004b, Pfeiffer et al., 2006, Pang et al., 2007, Gagné et al., 2009, Singh et al., 2009, Wang et al., 2013). Unlike LAR, ANR is encoded by a single gene in many species investigated, though two putative copies were found in strawberry (Fragaria x ananassa), three putative copies in

poplar and two functional copies exist in tea, suggesting a degree of genetic diversity may exist

(Tsai et al., 2006, Almeida et al., 2007, Pang et al., 2013a). Expression of ANR in Arabidopsis is

restricted to flower tissues and immature seed coat (Devic et al., 1999). In other species, such as

M. truncatula, grape and tea, ANR is also expressed in the leaves and fruit where PAs also

accumulate (Xie et al., 2004b, Pfeiffer et al., 2006, Singh et al., 2009).

1.3.4 Cis- to trans-flavan-3-ol epimerization by ANR

One controversial finding in recent years is that ANR can synthesize trans-flavan-3-ols

(LAR product) from cis-flavan-3-ols by inherent epimerization activity. In grape, ANR has been

observed to produce both 2S,3R-trans-flavan-3-ols and 2S,3S-cis-flavan-3-ols in vitro (Fig. 1.2;

Gargouri et al., 2009b, Gargouri et al., 2010). This reaction occurs with high efficiency in the presence of excess NADP+, generating roughly a 1:1 ratio of cis- and trans-products from

cyanidin (3',4'-hydroxylated anthocyanidin; Gargouri et al., 2010). Further examination revealed

that the 2S-flavan-3-ol was a product of C3-epimerization, a process which only occurs with 2R-

flavan-3-ols (Gargouri et al., 2010). No conversion of 2S- to 2R-stereochemistry was observed,

which is strange as the 2R,3R-stereochemistry is considered to be the natural product of ANR.

Tea ANR1 and 2 are highly homologous to grape ANR (79% and 83% protein identity,

respectively) and when provided with an anthocyanidin substrate, both tea ANRs produced a

mixture of cis- and trans-flavan-3-ols (Pang et al., 2013a). The proportion of the cis- versus trans-product varied between isoforms and with the substrate provided, further indicating that epimerization is enzymatic (Pang et al., 2013a).

For both tea and grape ANRs, the major products formed from the in vitro reactions display (+)-cis-stereochemistry, which has only been identified in a few species including palm

(Palmae spp.), spotted knapweed (Centaurea maculosa) and guarana (Paullinia cupana), and

(-)-trans-stereochemistry, which is only known to be present as a minor constituent in Chamaebatia

foliolosa (Delle Monache et al., 1972, Nahrstedt et al., 1987, Perry et al., 2005, Yamaguti-Sasaki

et al., 2007). Neither compound has been reported as a component of grape or tea PAs, the latter

of which is believed to be composed of (-)-cis- and (+)-trans-flavan-3-ols as these are the major flavan-3-ols reported in tea (Henning et al., 2003, Hodgson, 2008). This raises the question of whether these products are artefacts of the in vitro reaction conditions. Non-enzymatic epimerization of (-)-EC to (-)-catechin (Cat; a 2,3-trans-flavan-3-ol) has been reported before

(Xie et al., 2004b). In the cases of grape and tea ANR, enzymatic epimerization was clearly indicated; however, this does not rule out the possibility that the in vitro conditions unnaturally

influence the activity of grape and tea ANRs. Unfortunately, many of the studies of ANR did not

examine the chirality of the products, and so knowledge of the frequency with which

epimerization activity occurs remains limited. Nonetheless, the confirmation of ANR epimerase

activity in two distinct species raises the possibility of in vivo bi-functionality for ANR.
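
Because this section moves between absolute configuration (2R/2S, 3R/3S) and the (+)/(-) cis/trans notation, a small lookup can help keep the four stereoisomers straight. The Python sketch below is illustrative and covers only the catechin/epicatechin series (3',4'-hydroxylated B-ring); the table and function names are hypothetical, while the configuration assignments are the standard ones for these compounds.

```python
# (C2, C3) absolute configuration -> (sign and ring geometry, common name)
FLAVAN_3_OLS = {
    ("R", "R"): ("(-)-cis", "(-)-epicatechin"),
    ("R", "S"): ("(+)-trans", "(+)-catechin"),
    ("S", "S"): ("(+)-cis", "(+)-epicatechin"),
    ("S", "R"): ("(-)-trans", "(-)-catechin"),
}

_FLIP = {"R": "S", "S": "R"}


def epimerize(config: tuple, position: int) -> tuple:
    """Invert a single stereocentre (C2 or C3) of a flavan-3-ol."""
    c2, c3 = config
    if position == 2:
        return (_FLIP[c2], c3)
    if position == 3:
        return (c2, _FLIP[c3])
    raise ValueError("position must be 2 or 3")


if __name__ == "__main__":
    start = ("R", "R")  # (-)-epicatechin, the usual ANR product
    product = epimerize(start, position=2)
    print(FLAVAN_3_OLS[start][1], "->", FLAVAN_3_OLS[product][1])
```

Inverting C2 of (-)-epicatechin gives (-)-catechin, which matches the non-enzymatic epimerization mentioned above, whereas inverting C3 would give (+)-catechin, the LAR product.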

1.3.5 Substrate channelling in the PA biosynthetic pathway

The involvement of phenylpropanoid biosynthetic genes in multi-enzyme complexes

(metabolons) has been suggested as a possible mechanism for controlling metabolic flux in the early phenylpropanoid pathway (Winkel, 2004). Isoforms of PAL display differential spatial expression profiles that correlate with the accumulation of compounds produced by different branches of the phenylpropanoid pathway (Kao et al., 2002). For example, in aspen (Populus tremuloides), PtmPAL1 expression correlates with proanthocyanidin biosynthesis, whereas

PtmPAL2 is primarily expressed in lignifying tissues (Kao et al., 2002). A study of tobacco

(Nicotiana tabacum) NtPAL subcellular localization demonstrated that NtPAL1 is localized to both the cytosol and the endomembrane, whereas NtPAL2 is only cytosolic (Achnine et al.,

2004). However, when co-expressed with membrane-bound NtC4H, localization of NtPAL2 shifted to the endomembrane (Achnine et al., 2004).

The enzymatic promiscuity of ANS (Section 1.2.4.3) raises a number of questions about how flux through ANS is controlled in planta. The ability of ANS to catalyze both the ‘forward’ conversion of a substrate (e.g. leucoanthocyanidin) to the expected natural product (e.g. anthocyanidin), while also catalyzing the ‘reversal’ of the substrate to an upstream compound

(e.g. dihydroflavanol) is perplexing from a biological standpoint (Fig. 1.1). One proposed explanation is that cis-leucoanthocyanidins, which are less stable than the trans- form produced by DFR, favour the production of the anticipated natural product of ANS, cyanidin (Wilmouth et al., 2002). While this model can account for the in planta activity of ANS as established by mutant analysis (Abrahams et al., 2003), it does not explain how hydroxylation of the unintended substrate, naringenin, is avoided in vivo. Substrate channelling is an appealing explanation, and a complex between ANS and anthocyanidin 3-glucosyltransferase, which catalyzes the first committed step in anthocyanin biosynthesis, has also been proposed (Saito et al., 1999).

Recently, the localization of grape ANS to the ER was demonstrated by immunohistochemistry

(Wang et al., 2010). As ANS lacks an apparent transmembrane domain, this further suggests that the enzyme interacts with other proteins localized on the ER.

1.3.6 Intracellular transport of PA precursors

Biosynthesis of PA intermediates up to flavan-3-ols is believed to occur on the cytoplasmic face of the ER, whereas PAs accumulate in the central vacuole (Zhao et al., 2010).

Plant vacuoles are large organelles that can account for the majority of a cell’s volume in

vegetative tissues and are involved in an array of functions, including lytic processes analogous

to animal lysosomes, storage of ions and metabolites, structural integrity through turgor pressure,

and maintenance of cellular homeostasis (Rosado and Raikhel, 2010). Different types of

vacuoles develop in different tissues or in some cases within the same cell. Vacuole types can be

distinguished based on the presence of different tonoplast intrinsic proteins (TIPs) found in their

membrane. Trafficking of proteins and metabolites to vacuoles occurs via anterograde transport

from the ER (secretory pathway), or from the plasma membrane (endocytic pathway), by Golgi-dependent

and independent means (Marty, 1999). Glycosylation or acylation is a common feature in the

trafficking of phenylpropanoids, and is catalyzed by a large group of UDP-glycosyltransferases and acyltransferases, respectively (Tohge et al., 2013).

1.3.6.1 Glutathione S-transferase (GST) in PA metabolism

TRANSPARENT TESTA19 (TT19) encodes a glutathione S-transferase (GST), which

when defective results in a significant reduction in anthocyanins and soluble PAs in Arabidopsis

(Kitamura et al., 2004, Kitamura et al., 2010). Vanillin stain reacts with leucoanthocyanidins,

flavan-3-ols, and the terminal units of PAs, making it a useful tool for studying the localization

of these compounds (Debeaujon et al., 2000). In tt19 plants, vanillin staining revealed

accumulated flavan-3-ols or PAs in small vacuole-like structures (Kitamura et al., 2010). Despite the lack of soluble PAs, the transparent testa phenotype of tt19 seeds gradually disappears during long-term storage and desiccation, eventually producing a seed coat colour comparable to wild-type seed (Kitamura et al., 2004). This is likely due to the oxidation of insoluble, cell wall-bound

PAs, which accumulate in tt19 plants (Kitamura et al., 2010). Thus, TT19 is believed to play an

important role in the trafficking and vacuolar accumulation of anthocyanins and soluble PAs, or

PA precursors.

Processing of anthocyanins versus PA-derivatives appears to be unique. An Arabidopsis

activation tagging mutant, tt19-4, overexpresses a TT19 protein with a C-terminal point mutation

and accumulates anthocyanins at wild-type levels, but lacks a brown seed coat indicative of a

lack of PA production (Li et al., 2011). This single amino acid change appears to result in a

protein with an altered C-terminal domain that is significantly less soluble (Li et al., 2011).

Similarly, constitutive expression of a TT19 ortholog from petunia, anthocyanin 9 (AN9), is able

to rescue anthocyanin accumulation in the vacuoles of loss-of-function tt19 plants, but proanthocyanidin accumulation remains impaired (Kitamura et al., 2004).

Despite being classified as a GST, TT19 does not appear to produce flavonoid-glutathione conjugates, but rather it appears to form a protein-flavonoid complex, as has been shown with coumaric and sinapic acids (Li et al., 2011). TT19 may function by binding phenylpropanoids and directing their intra- or intercellular transport (Zhao et al., 2010).

Arabidopsis tt19-7 loss-of-function mutants contain increased flavonol levels but very little anthocyanin and produce seeds with pale yellow coats, suggesting a lack of PAs (Sun et al.,

2012). In vitro assays demonstrated that TT19 binds cyanidin (an anthocyanidin), and to a lesser extent cyanidin-3-O-glucoside, significantly increasing the water solubility of these compounds

(Sun et al., 2012). Overall, TT19 appears to be involved primarily in anthocyanin biosynthesis; however, its function may overlap with PA biosynthesis at the point of anthocyanidins, which serve as immediate precursors to flavan-3-ols.

1.3.6.2 Transporters in PA metabolism

Further evidence of active intracellular transport of PA-precursors comes from studies of the multidrug and toxic compound extrusion (MATE) genes TRANSPARENT TESTA12 (TT12) from Arabidopsis and the M. truncatula homolog, MATE1, both of which encode a flavonoid/H+ (proton) antiporter (Debeaujon et al., 2001, Marinova et al., 2007, Zhao and

Dixon, 2009). Arabidopsis tt12 loss-of-function mutants produce transparent testa seeds that show reduced dormancy during germination, both indicators of reduced PA levels (Debeaujon et al., 2001). Furthermore, vanillin staining and metabolite analysis indicate a reduction in vacuolar sequestration and overall accumulation of free flavan-3-ols and PA polymers, respectively, in tt12 plants (Debeaujon et al., 2001, Marinova et al., 2007). Instead, vanillin reactive compounds accumulate in clusters around small vacuoles in tt12 plants (Kitamura et al., 2010). Consistent

with its involvement in PA biosynthesis, TT12 is localized to the tonoplast (vacuolar membrane)

of PA producing cells (Marinova et al., 2007).

Like TT12, MATE1 is localized to the tonoplast and is able to rescue the tt12 phenotype

(Zhao and Dixon, 2009). Transport assays in yeast found that both TT12 and MATE1 are able to catalyze uptake of epicatechin 3'-O-glucoside (EC3'OG), and to a lesser degree cyanidin-3-O-

glucoside (Cy3OG), though only the former is considered a PA precursor (Marinova et al., 2007,

Zhao and Dixon, 2009). Furthermore, an EC hexoside, possibly EC3'OG, accumulates at higher

levels in Arabidopsis tt12 mutants (Kitamura et al., 2010). No transport of a variety of flavonoid substrates, including cyanidin and EC aglycones, flavonol glycosides and dimers, by

TT12 was observed (Marinova et al., 2007). In addition, catechin-3-O-glucoside inhibited the transport of Cy3OG, suggesting TT12/MATE1 substrate preference is limited to flavan-3-ols

(Marinova et al., 2007). Together, these results clearly demonstrate the involvement of highly specific transporters in PA-precursor trafficking.

1.3.6.3 Proton (H+)-antiporter in PA metabolism

Consistent with the involvement of a H+-antiporter, a mutation in a P-type H+-ATPase, autoinhibited H+-ATPase isoform 10 (AHA10), results in Arabidopsis plants with light-coloured

seeds and delayed seed coat vacuole development (Baxter et al., 2005). Plant P-type H+-ATPases

typically generate a proton gradient across the plasma membrane in an ATP-dependent manner,

which makes AHA10 somewhat unique as it generates a gradient across the tonoplast.

Significantly, aha10 plants accumulate wild-type levels of anthocyanins but contain PA levels

more than 100-fold lower than wild-type Arabidopsis (Baxter et al., 2005). Detailed examination

by Baxter et al. (2005) revealed accumulation of epicatechin, which is not normally detectable in

wild-type Arabidopsis, and slightly elevated levels of cyanidin, demonstrating a tight association

between AHA10 expression and flavan-3-ol levels.

1.3.6.4 Glycosylation of PA monomers

The role of flavan-3-ol glucosylation in PA biosynthesis was further established by the

identification of a uridine 5'-diphospho-glucuronosyltransferase (UGT) in M. truncatula

(UGT72L1; Pang et al., 2013b). Seeds of ugt72L1 loss-of-function plants retain a dark brown

seed coat; however, soluble PA levels are significantly lower (Pang et al., 2013b). Furthermore,

constitutive expression of UGT72L1 resulted in a 200-fold increase in the concentration of

EC3'OG, indicating preferential glucosylation of EC, but surprisingly the amount of soluble PAs

was reduced in plants overexpressing UGT72L1 while insoluble PAs were not affected (Pang et

al., 2013b). A similar phenomenon is seen in tt19 loss-of-function mutants, which accumulate a much higher percentage of insoluble PAs relative to soluble PAs (Li et al., 2011). This may suggest that a bottleneck at the point of flavan-3-ols in the PA pathway shifts flux to insoluble

PAs through a separate process. Alternatively, excess flavan-3-ols may spontaneously form insoluble oligomers. Analysis of insoluble PAs is typically conducted using a butanol:HCl method that hydrolyzes the monomeric subunits to anthocyanidins that can be quantified spectrophotometrically. However, this method does not give any structural information that could distinguish insoluble PAs formed in wild-type plants from those in UGT72L1-expressing or tt19 loss-of-function lines.

UGT72L1 may be only one of a number of UGTs involved in flavan-3-ol trafficking.

While M. truncatula PAs are composed of EC, PAs in many other plants contain a variety of flavan-3-ols displaying different degrees of B-ring hydroxylation and chiral chemistry.

UGT72L1 is unable to glucosylate (+)-Cat, suggesting either that additional UGTs exist in other plant species or UGT72L1 may have uniquely strict substrate specificity (Pang et al., 2013b).

1.3.6.5 Vesicle-mediated transport of PA monomers

Despite the considerable progress made in uncovering transport processes of PA-precursors, the means by which these compounds are trafficked from their point of synthesis, presumably the surface of the ER, to the vacuole remains unknown. Studies of TT19 suggest a complex with a protein-partner may be involved. However, whether this complex is actively trafficked (i.e. vesicle-mediated) or passively diffuses to the tonoplast has yet to be determined.

Some intriguing studies of anthocyanin trafficking suggest active mechanisms may be involved.

Anthocyanins can be readily visualized in a cell due to their fluorescent properties, making for

relatively easy examination of the subcellular localization in Arabidopsis seedlings grown under

anthocyanin-induction conditions (Poustka et al., 2007). Using this approach, Poustka et al.

(2007) identified anthocyanin-containing vesicles associated with the ER that appeared to be

trafficked to the storage vacuole in a Golgi-independent manner. Furthermore, broad chemical-

inhibition of GSTs and ATP-binding cassette (ABC)-class transporters increased the size and possibly the number of anthocyanin-containing vesicles, indicating a block in trafficking

(Poustka et al., 2007). Metabolite analysis identified Cy3OG as the major anthocyanin present in

Arabidopsis seedlings grown under inductive conditions (Pourcel et al., 2010). Arabidopsis lines carrying mutations in important genes displayed reduced anthocyanin levels, further suggesting a role for membrane-bound vesicles in trafficking (Pourcel et al., 2010). Given the parallels in biosynthesis and chemical similarities between PA-precursors and anthocyanins, it is tempting to speculate that PA-precursors share a similar vesicle-mediated transport mechanism.

However, a key question remains. In mature seeds, PAs are associated with the cell wall yet much evidence indicates that PA-precursors accumulate in the vacuole, where polymerization is thought to occur (Zhao et al., 2010). How these monomers and/or polymers exit the vacuole and cross the plasma membrane is a mystery. Mechanical forces produced by the expanding embryo may simply lead to cell death causing release of soluble PAs, which then interact with cell wall material.

Recently, evidence of vesicle-trafficking of PA-like material was demonstrated in grape, as well as a number of other vascular plants (Brillouet et al., 2013). Brillouet et al. (2013) took advantage of the protein-binding property of PAs and localized PAs using gelatin coupled to

Oregon Green, a fluorophore excitable by a 488 nm laser. In doing so, the researchers were able to visualize PAs inside of membrane-bound structures, which they termed tannosomes. However,

the tannosomes were reported to be of plastidial origin, which conflicts with the

widely reported ER or cytosolic localization of flavonoid biosynthetic enzymes (He et al., 2008,

Brillouet et al., 2013). While the evidence presented by Brillouet et al. (2013) is compelling,

acceptance of vesicle-mediated trafficking in PA metabolism will require resolution of this

apparent conflict and further characterization of tannosomes.

1.3.7 PA structures

PA oligomers and polymers are commonly formed by polymerization of 2,3-cis- and 2,3-

trans-flavan-3-ol monomeric subunits producing a natural polymer (PAs), a process that is

believed to occur in the vacuole of the cell (Zhao et al., 2010). PA structural diversity arises

through variations in the type(s) of flavan-3-ol subunits, the inter-flavan-3-ol linkage between

successive subunits (Fig. 1.2), the average length of the polymers (mean degree of

polymerization), and derivatizations, such as O-methylation, O-acetylation, and C-/O-glycosylation (He et al., 2008).

(+)-Cat was originally considered the ‘starter’ or terminal subunit; however, extensive analysis of PAs from a wide variety of species found that (-)-EC was also frequently present as a terminal unit (Xie and Dixon, 2005). The most common PA polymers and dimers are ‘B-type’, consisting of C4-C8 or C4-C6 linkages between the ‘upper’ and ‘lower’ subunits, respectively, in which the inter-flavan-3-ol bond can be either α- or β-orientation (Xie and Dixon, 2005).

Different combinations of C4-C8 or C4-C6 linkages, α- or β-orientations, and (-)-cis- or (+)-trans-flavan-3-ol terminal units produce a complex array of different B-type PAs, termed B1 to B8 (Fig. 1.2; He et al., 2008). Additionally, less common linkages have been found in certain species, such as the double linkage (A-type PA) between C2 and C4 of the ‘upper’ unit and the oxygen at C7 and C6 or C8 of the ‘lower’ unit (Fig. 1.2), respectively, in plants such as peanut (Arachis hypogaea) and cranberry (Vaccinium oxycoccos; Dixon et al., 2005).

The expanding structural diversity of PAs has rendered the conventional nomenclature system obsolete. A new system, derived from polysaccharide nomenclature, has been introduced, though conventional terminology is still widely used (He et al., 2008). The new system designates PAs based on their constituent subunits and inter-flavan-3-ol bonds. For example, under the old system, a C4-C8 EC-Cat dimer with a β-bond was referred to as a procyanidin B1 dimer, whereas the new nomenclature for this compound is epicatechin-(4β→8)-catechin, with polymers denoted by [subunit-(bond)]n.
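The subunit-and-bond naming scheme described above can be illustrated with a minimal Python sketch. The class and function names (Linkage, name_oligomer, name_polymer) are hypothetical and introduced only for illustration, and the arrow between carbon positions is assumed to follow the usual written form of this nomenclature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Linkage:
    """One inter-flavan-3-ol bond (hypothetical helper type for illustration)."""
    upper_carbon: int   # carbon on the 'upper' unit, e.g. 4
    orientation: str    # "α" or "β"
    lower_carbon: int   # carbon on the 'lower' unit, e.g. 8 or 6

    def label(self) -> str:
        return f"({self.upper_carbon}{self.orientation}→{self.lower_carbon})"


def name_oligomer(subunits: List[str], linkages: List[Linkage]) -> str:
    """Join subunits from 'upper' to 'lower' with the bond between each pair."""
    if len(linkages) != len(subunits) - 1:
        raise ValueError("need exactly one linkage between each pair of subunits")
    parts = [subunits[0]]
    for bond, unit in zip(linkages, subunits[1:]):
        parts.append(bond.label())
        parts.append(unit)
    return "-".join(parts)


def name_polymer(subunit: str, linkage: Linkage) -> str:
    """Repeat-unit form, [subunit-(bond)]n, for a homopolymer."""
    return f"[{subunit}-{linkage.label()}]n"


# The C4-C8 EC-Cat dimer with a β-bond (procyanidin B1 under the old system).
b1 = name_oligomer(["epicatechin", "catechin"], [Linkage(4, "β", 8)])
print(b1)
print(name_polymer("epicatechin", Linkage(4, "β", 8)))
```

Keeping the bond as its own small record makes it straightforward to describe mixed oligomers in which successive linkages differ.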

1.3.8 Condensation of PA monomers

The mechanism of PA polymerization/condensation remains the great mystery of

proanthocyanidin metabolism (Zhao et al., 2010). Parallels between PAs and the biosynthesis of

lignins have been drawn due to their common origin and roles in plant defense (Stafford, 1988),

and lignin biochemistry has informed a number of hypotheses for non-enzymatic PA

polymerization (Dixon et al., 2005). However, PAs display a species-specific stereochemistry, a

feature not shared with lignins (Xie and Dixon, 2005). Lignans are produced by dimerization of a

hydroxycinnamoyl alcohol derivative (E-coniferyl alcohol), a process that readily occurs in vitro

in the presence of an oxidative laccase enzyme (Davin et al., 1997). However, this reaction leads to a racemic mixture of products, whereas strict stereochemistry is found in planta [i.e. only (+)-pinoresinol accumulates]. Davin et al. (1997) discovered that the specific stereo-configuration is guided by an accessory protein, termed a dirigent protein, which lacks catalytic activity and instead simply maintains proper orientation between oxidized radical subunits during the bimolecular phenoxy radical coupling reaction.

An intriguing clue comes from an Arabidopsis transparent testa10 (tt10) loss-of-function

mutant, which accumulates increased levels of free EC (Pourcel et al., 2005). TT10 encodes a laccase-type oxidase that is expressed early in seed development and overlaps with PA-accumulating cell layers (Pourcel et al., 2005). An oxidase or laccase could generate flavan-3-ol derived reactive intermediates for polymerization.

Altering the accumulation of PA-precursors in TT10 and tt10 Arabidopsis provided some interesting results. In plants constitutively expressing UGT72L1, insoluble PAs were relatively unchanged regardless of the presence of a functional TT10 gene. However, the amount of soluble

PAs doubled when UGT72L1 was constitutively expressed in a tt10 genetic background and levels of EC-aglycone and EC3’OG were also elevated (Pang et al., 2013b).

Interestingly, in vitro incubation of TT10 and EC yields unnatural oxidation products, which strongly suggests that if TT10 is involved in polymerization, additional mechanisms to specify stereochemistry may be required (Pourcel et al., 2005). TT10 may be involved in the conversion of soluble PAs into insoluble PAs, rather than as an absolutely required partner in PA polymerization. This would mean that insoluble PAs can still accumulate in the absence of TT10, either through a redundant enzyme function or a spontaneous reaction, as hypothesized in the case of Arabidopsis tt19 mutants and M. truncatula hairy-root cultures constitutively expressing

UGT72L1 (see sections 1.3.6.1 and 1.3.6.4).

Curiously, when UGT72L1 was constitutively expressed in Arabidopsis tt10 lines, PA polymers with a lower mean degree of polymerization were produced compared to UGT72L1-overexpression in a wild-type background, though the same result was not seen when UGT72L1 was overexpressed in M. truncatula hairy-roots (Pang et al., 2013b). However, as hairy-roots do not naturally accumulate PAs, the required oxidase partner may not be sufficiently expressed

(Pang et al., 2008). Pang et al. (2013b) suggest a model in which EC acts as the starter unit and

EC3’OG serves as the extension units. Thus, the ratio of the aglycone to glucoside may influence the mean degree of polymerization (Marinova et al., 2007, Zhao and Dixon, 2009).

Despite near complete saturation mutagenesis in Arabidopsis, no gene encoding a protein conclusively responsible for PA polymerization has been identified. This may be due to functional redundancy or the existence of a gene family, both of which are known to exist for other genes in the PA biosynthetic pathway (as discussed above). Also, conventional screening for PA pathway perturbation relies on examining seed coat colour. A single mutation in a dirigent-like protein may result in abnormal polymerization, yet still produce a seed coat that appears wild-type in colour. New screening methods are required to address this potential gap.

Alternatively, polymerization may be a spontaneous reaction. In vitro condensation between Cat or EC and leucoanthocyanidins provided early evidence to support this model (Creasy and

Swain, 1965). However, the apparent species-specific subunit linkage and the predominant 2,3-cis-stereochemistry of most extension units complicate this explanation.

1.3.9 Transcriptional regulation of PA biosynthesis

Numerous biotic and abiotic stimuli, such as attack by microbial pathogens, changes in plant hormone homeostasis, light stress, and nutrient deficiencies, have been connected to changes in PA biosynthesis (He et al., 2008). Along with developmental influences, these signals are primarily mediated by six families of transcription factors (TFs): MYC or basic-helix-loop-helix (bHLH) TFs, MYB TFs, WD40-repeat proteins (WDR), WRKY TFs, MADS homeodomain genes and WIP (TFIIIA-like) proteins, of which the bHLH and MYB TF families are the most direct regulators of PA accumulation (He et al., 2008, Tohge et al., 2013). These TF families are typically named in accordance with the conserved domain(s) that define each family.

For example, bHLH TFs contain a basic-helix-loop-helix domain composed of a conserved basic region involved in DNA binding and a helix-loop-helix structure that directs homo- or heterodimerization (Hichri et al., 2011). MYB TFs are characterized based on R1, R2, and R3 N-terminal repeat domains consisting of roughly 50 amino acids. DNA binding and dimerization are directed by this N-terminal domain, whereas the C-terminal domain modulates gene expression (Hichri et al., 2011). WDR proteins acquire their name from an 11-24 residue motif that may be tandemly repeated. Interestingly, WDRs are not known to have any DNA binding or gene regulatory properties. Instead, they are thought to act as stabilizers of TF complexes by interacting with several proteins at once (Hichri et al., 2011). The name of WRKY TFs derives from the N-terminal DNA-binding domain, of which the flanking Trp (W) and Tyr (Y) are strictly conserved whereas a limited degree of variation occurs with the Arg (R) and Lys (K) residues (Rushton et al., 2010).

Many of the TFs regulating flavonoid biosynthesis were first identified in maize, but have since been studied in a variety of species, such as Arabidopsis, Petunia, snapdragons

(Antirrhinum spp.) and grape (Hichri et al., 2011). As the regulation of PA biosynthesis has been most thoroughly characterized in Arabidopsis, discussion will focus on this species though important distinctions or similarities between species will be noted.

1.3.9.1 Transcriptional regulation of PA metabolism in Arabidopsis

In Arabidopsis, a group of eight genes have been identified as regulators of PA

biosynthesis, which includes TRANSPARENT TESTA2 (TT2, MYB), TT8 (bHLH), TT1 (WIP),

TT GLABRA1 (TTG1; WDR), TTG2 (WRKY), GLABRA3 (GL3; bHLH), and ENHANCER OF

GLABRA3 (EGL3; bHLH; Fig. 1.3; He et al., 2008). Of these TFs, a ternary complex of a MYB

(TT2), bHLH (TT8) and WDR (TTG1) primarily regulates PA accumulation (Fig. 1.3; Baudry et

al., 2004).

PA production in Arabidopsis is directed by TT2, which encodes a R2R3-MYB TF that

directly activates ANR (BAN) expression (Nesi et al., 2001). TT2 is specifically expressed in PA

accumulating tissues, and Arabidopsis tt2 loss-of-function mutants display impaired expression

of DFR, ANS, ANR and TT12, whereas CHS, CHI, F3H, F3’H and FLS are all expressed

normally (Nesi et al., 2001). Similarly, constitutive expression of AtTT2 is sufficient to induce

ectopic MtANR expression, resulting in the accumulation of high levels of PAs in M. truncatula

hairy-roots, which do not normally produce PAs (Pang et al., 2008). Importantly, ectopic

induction of AtANR by AtTT2 is dependent on the presence of a functional copy of TTG1 (Baudry

et al., 2004). Similarly, the MYB Proanthocyanidin Regulator (MtPAR) specifically regulates

PA biosynthesis in M. truncatula (Verdier et al., 2012). Expression of MtPAR induced PA

production in M. truncatula hairy-roots and in the related legume M. sativa, while par loss-of-

function mutants produce normal levels of anthocyanins and other flavonoids (Verdier et al.,

2012). Three L. japonicus MYB TT2-homologs, which vary in tissue expression, are able to form a complex with AtTT8 (bHLH) and AtTTG1 (WDR), leading to the activation of AtANR and

LjLAR promoters (Yoshida et al., 2008). Like AtTT2, grape VvMYBPA1 promotes the expression of PA-specific genes in developing fruits and seeds, while VvMYB2 induces production in young

berries and leaves (Bogs et al., 2007, Terrier et al., 2009). Conservation of TT2 between species

appears to be high as AtTT2-expression in grape led to broad PA production in grape seedlings

and expression of LjTT2s was able to drive AtANR expression in Arabidopsis (Bogs et al., 2007,

Yoshida et al., 2008). MYB-mediated signalling also appears to be involved in the induction of

PA biosynthesis in response to stress. UVB-light, herbivore feeding activities, and fungal infection trigger the expression of poplar MYB134, an AtTT2 homolog, and a subsequent increase in PA content (Mellway et al., 2009). In addition to the regulation of structural genes, TT2 also influences the downstream WRKY regulator, TTG2, which is required for proper trichome development, differentiation of root hairless cells and seed coat pigmentation (Ishida et al., 2007,

Gonzalez et al., 2009). Overall, of the three TFs in the MYB-bHLH-WDR complex, the MYB

protein (e.g. TT2) is a central regulator of the PA pathway.

Several bHLH TFs influence PA pathway genes and are characterized by N- and C-

terminal domains involved in protein-protein interactions with the MYB and WDR complex

partners, respectively (Hichri et al., 2011). In Arabidopsis, a deficiency in TT8, a bHLH TF,

results in a loss of PA accumulation (Nesi et al., 2000). AtTT8 expression overlaps with late PA

pathway biosynthetic genes while having little or no direct impact on early phenylpropanoid

pathway genes. For example, PAL1, 4CL1/3 and C4H are all expressed at normal levels in tt8

loss-of-function mutants, whereas expression of ANR and DFR is significantly inhibited (Nesi

et al., 2000). Interestingly, AtTT8 plays a positive role in its own expression, and is a required

partner for the binding of TT2 to the TT8 promoter (Baudry et al., 2006). However, the bHLH

homologs GL3 and EGL3 are able to substitute for TT8 in this role (Baudry et al., 2006). Though seed coat mucilage production, root hair development, and anthocyanin biosynthesis are perturbed in Arabidopsis gl3 egl3 double mutants, the seeds of these plants have brown, wild-

type coloured coats (Zhang et al., 2003). When overexpressed, GL3 and EGL3 are able to

partially suppress the light coloured phenotype of tt8 seed coat (Zhang et al., 2003). This may be

a result of increased flux through the late flavonoid biosynthetic pathway as F3’H, DFR, ANS,

and TT8 are all direct targets of GL3 expression (Gonzalez et al., 2008). Furthermore, both GL3

and EGL3 can substitute for TT8, and activate the ANR promoter in conjunction with AtTT2 and

TTG1 (Baudry et al., 2004). In contrast, AtTT2 cannot be substituted with any closely related

MYBs in Arabidopsis (Baudry et al., 2004). However, AtTT2 partially complements loss-of-

function mutations in AtMYB5, a major regulator of outer seed coat differentiation and trichome

development (Gonzalez et al., 2009). Thus, regulation of PA production in Arabidopsis displays

a higher degree of redundancy with respect to the bHLH TFs compared to the MYBs, which may be the product of broader specificity among some bHLHs.

WDR homologs of TTG1 have been identified in numerous species, such as petunia, maize (Zea mays), M. truncatula, and grape. However, unlike the MYB and bHLH genes which

exist as gene families, the WDR genes appear to be a single copy in most species studied (Hichri

et al., 2011). M. truncatula, however, does contain two similar copies of a TTG1-like gene,

MtWD40-1 and -2, (Pang et al., 2009). But as the expression of MtWD40-2 is substantially lower

than MtWD40-1 and knockout of MtWD40-1 alone results in a lack of PA accumulation,

MtWD40-2 does not appear to be functionally redundant with MtWD40-1 (Pang et al., 2009).

TTG1 functions more broadly than TT8 or TT2, affecting the production of seed mucilage, the

accumulation of anthocyanins and PAs, and formation of hairs on leaves, stems and roots in

Arabidopsis (Walker et al., 1999). TTG1 appears to function as a promoter of transcriptional activation. While TT2 and TT8 can activate the ANR promoter in the absence of TTG1, the degree of activation is significantly higher when TTG1 is present (Baudry et al., 2004). AtDFR,

AtANS, and AtTT8 also appear to be direct targets of AtTTG1, demonstrating a role for TTG1 in regulating the expression of both structural and regulatory genes (Gonzalez et al., 2008). In

Arabidopsis ectopically expressing AtTT2, ANR expression is dependent on the co-expression of

TTG1, which thereby conveys spatial and temporal control of ANR expression in the transgenic plants (Baudry et al., 2004). Overall, in flavonoid biosynthesis the WDR protein appears to act by promoting gene activation via MYB-bHLH-WDR complexes composed of different MYB and/or bHLH partners, and by determining the tissue specificity and timing of gene expression.

In addition to regulating PA structural genes, members of the MYB-bHLH-WDR ternary complex influence the expression of each other (i.e., auto-regulations; Fig. 1.3). While

Arabidopsis tt2 plants express normal levels of TT8 and TTG1, the constitutive expression of

TT2 induces TT8 in Arabidopsis roots, suggesting a degree of involvement of TT2 in TT8 expression (Nesi et al., 2001, Baudry et al., 2006). However, neither TT8 nor TTG1 is required for TT2 expression (Nesi et al., 2001). Though expression of TT8 is not dependent on TT2, TT8 expression is impaired in ttg1 mutants (Baudry et al., 2006). TTG1 expression also affects GL2, further indicating a broad regulatory role for the WDR protein (Gonzalez et al., 2009).

In addition to the three primary partners of the MYB-bHLH-WDR complex, an additional WIP-type zinc finger protein, TT1, is required for seed coat pigmentation in

Arabidopsis (Fig. 1.3; Appelhagen et al., 2011). Arabidopsis tt1 siliques display lower levels of

CHS, ANS, ANR and TT12 expression, whereas wild-type levels of CHS expression were observed in tt1 flower buds (Appelhagen et al., 2011). Appelhagen et al. (2011) noted that TT1 expression preceded both TT2 and TT8 expression in developing siliques, and that expression of known PA-related TFs was not significantly altered in tt1 mutants, suggesting that TT1 does not act through mediation of other TFs. However, TT1 is able to dimerize with TT2 and possibly

TT16, a MADS box protein (Appelhagen et al., 2011). Overall, TT1 displays a rather unique combination of broad gene regulation combined with spatial specificity, making it highly specific for PA biosynthesis.

The hierarchy of PA regulation remains unclear. While TT2-overexpression can partially recover the transparent testa phenotypes of tt16 and tt1, expression of both TT2 and TT16 was normal in tt1 mutants (Debeaujon et al., 2003, Appelhagen et al., 2011). Furthermore, TT1, like

TT16, shows a cell-type specific influence on PA biosynthesis (Debeaujon et al., 2003,

Appelhagen et al., 2011). A lack of change in the expression of CHS and CHI in Arabidopsis tt2 mutants suggests that other regulatory factors influence the expression of early flavonoid biosynthetic genes (Nesi et al., 2001). Appelhagen et al. (2011) proposed a model whereby TT1 functions to promote TT2-TT8-TTG1 induction of PA biosynthesis specifically in the seed coat endothelium (Fig. 1.3).

1.3.9.2 Hormonal regulation of Arabidopsis PA biosynthesis

Recently, new research has shed light on the regulation of PA biosynthesis by hormone status, specifically abscisic acid (ABA) homeostasis. ABA is a major plant hormone involved in regulating developmental changes, such as transitions from vegetative to reproductive growth, fruit ripening, and mediating responses to biotic and abiotic stress (Finkelstein et al., 2002). Early studies have shown a positive correlation between anthocyanin accumulation and ABA (Pirie and Mullins, 1976, Loreti et al., 2008). In grapes, ABA and content display an overlapping profile, decreasing early in berry development and remaining relatively constant with the exception of a significant peak during véraison (Lacampagne et al., 2010). Lacampagne et al. (2010) found that application of ABA to developing grapes decreased VvANR and VvLAR

activity early in development relative to untreated plants. VvANR and VvLAR undergo two

induction peaks during development. VvANR and VvLAR1 peak early in development, with a

second VvANR peak and strong VvLAR2 induction during véraison (Bogs et al., 2005).

Application of ABA shifted the first activation peak to a later time point relative to untreated plants and abolished the second activation peak for VvANR (Lacampagne et al., 2010). However,

Lacampagne et al. (2010) noted a disconnect between gene expression levels and enzyme activity, suggesting the possible involvement of post-translational regulation. Thus, the influence of ABA on PA production appears to be indirect, possibly functioning through alterations in the developmental status of the tissue.

1.3.9.3 Feedback and RNAi-mediated regulation of flavonoid biosynthesis

In addition to TF-mediated regulation, feedback-regulation by pathway intermediates and

RNA-interference (RNAi) have been implicated in regulation of flavonoid biosynthesis. In

Arabidopsis, deficiency in TT19 results in a slight increase in the expression of other flavonoid structural and regulatory genes, which suggests that pathway intermediates may act in a positive feedback loop influencing flavonoid gene expression (Li et al., 2011, Sun et al., 2012).

Soybean (G. max) encodes two I locus alleles that contain inverted repeats of three

GmCHS genes, the expression of which results in the production of short-interfering RNAs and the silencing of all GmCHS gene expression (Tuteja et al., 2009). In fact, the observation of

RNAi-silencing was first noted in petunia when constitutive expression of native CHS was attempted (Napoli et al., 1990). Interestingly, this natural silencing of CHS in soybean occurs only in the seed coat and represents a rare example of tissue specific RNAi-mediated silencing of secondary metabolism. RNAi-mediated regulation is also involved in a number of developmental

and defense processes, such as leaf patterning, flowering time, and responses to microbial or

viral pathogens (de Alba et al., 2013). However, examples of tissue specific regulation of

secondary metabolism by RNAi are not well documented.

1.4 Pea: Pisum sativum L.

Pea (Pisum sativum) is a member of the Leguminosae, or Fabaceae, family (subfamily

Faboideae). As one of the oldest domesticated crop species, human consumption and cultivation of pea predates modern history. Modern pea varieties are descendants of a wild population that originated in the Mediterranean basin and southwest Asia (Zohary et al., 2012). In the Middle

East region known as the Fertile Crescent, archaeologists have uncovered evidence in farming

villages of wild peas and cultivated peas dating back to 23,000 years before present era (BP) and

10,500-10,200 BP, respectively (Zohary et al., 2012). Its adaptability to both warm and temperate climates has led to wide incorporation of pea into agricultural regimes across the world (Zohary et al., 2012). While dry pea seeds were traditionally consumed, the advent of modern technology has shifted primary consumption to either fresh or frozen immature seeds.

Legumes are an economically important crop species, both for seed and for their ability

to fix atmospheric nitrogen, which allows for low-input farming methods to be used (Graham

and Vance, 2003). Cultivation of grain and forage legumes occupies 12-15% of the arable land on earth, with grain legumes alone providing a third of human dietary protein nitrogen requirements (Graham and Vance, 2003). Dry peas account for the fourth largest legume crop in the world, of which Canada produces roughly a third of global yield¹. Behind beans (e.g.

Phaseolus, Vicia and Vigna spp.), peas are the second most prominent dietary legume in our

1 FAOSTAT (http://www.fao.org/) Accessed October 22, 2013.

diets and provide a rich source of protein, slowly digestible starch, sugars, fibre, minerals and

vitamins (Smýkal et al., 2012). Legumes also produce several classes of secondary metabolites,

such as isoflavonoids and proanthocyanidins, which have been linked to human health benefits

(Griffiths, 1981, Dixon and Sumner, 2003, Dixon et al., 2005).

Pea is arguably one of the founding models for genetics. In the mid to late 1800s,

Gregor Mendel observed the segregation of phenotypic features following the crossings of different pea lines leading to his establishment of the laws of inheritance. Centuries, if not millennia, of pea breeding have produced a large number of different types of pea (e.g. green, yellow, marrowfat, maple, etc.; Fig. 1.4) and various cultivated varieties of each type, referred to as cultivars (e.g. yellow pea cultivars ‘Cutlass’ and ‘CDC Golden’ or maple pea cultivars ‘CDC

Acer’ and ‘CDC Rocket’). Though there is no international repository of pea seeds, a number of collections exist around the world and contain an estimated 2,000 different cultivars of pea

(Smýkal et al., 2012). The true genetic diversity of pea contained within seed banks is difficult to determine as duplication likely exists between repositories and large scale DNA sequencing of different pea cultivars remains limited.

Seed development is a carefully orchestrated process that involves a complex network of genetic and metabolic signals that direct the differentiation of specialized tissues. Seeds are comprised of a mosaic of maternal and paternal tissues. Development of the filial endosperm and embryo is preceded by growth in the maternally-derived seed coat, which both protects and nourishes the developing embryo and endosperm within the seed (Weber et al., 2005). The pea seed coat functions as a transient storage organ, retaining reserves of sugars and producing amino acids that are then transported into the developing seed (Rochat and Boutin, 1991, Van Dongen et al.,

2003). In certain varieties and cultivars of pea, significant amounts of PAs accumulate in the

seed coat. Peas provide two advantages for studying PA biosynthesis. They produce large seeds

with seed coats that are easy to isolate, and a natural diversity of PAs exists between different

cultivars and types of pea (Section 3.2.1; Griffiths, 1981, Wang et al., 1998). Despite the

importance of the seed coat in seed development and the potential health benefits of PAs, a broad

genetic analysis of the pea seed coat has not been done and knowledge of flavonoid metabolism

in the seed coat is also limited.

1.5 Next generation sequencing (NGS) technology

Exploration of the genetic diversity of Pisum sativum has been hampered by the

relatively large size of the pea genome, which is diploid and estimated to be 4300 Mb². In

comparison, the human genome is estimated to be 3300 Mb³ and the model legume, Medicago

truncatula, has a genome 5-10 fold smaller than pea (Kaló et al., 2004). Furthermore, 20% of the

pea genome consists of a large proportion of repetitive sequences (long terminal repeat

retrotransposons), making genome sequencing difficult (Macas et al., 2007). However, recent

advances in DNA sequencing technology provide new possibilities to explore the vast genetic

diversity contained within pea cultivars by examining only the sequences of expressed genes

(gene transcripts), thereby greatly reducing complexity (Franssen et al., 2011, Kaur et al., 2012).

Since the first plant genome, Arabidopsis thaliana, was sequenced in 2000, twenty-seven plant

genomes have become publicly available⁴, due primarily to the high-throughput nature and reduced cost of NGS versus Sanger sequencing (Delseny et al., 2010).

2 Kew Royal Botanic Garden (http://data.kew.org/cvalues/) Accessed October 22, 2013.
3 Archive EnsEMBL release 68 - July 2012
4 http://plantgdb.org/prj/GenomeBrowser/. Accessed October 22, 2013.

1.5.1 Roche/454-pyrosequencing

Released in 2004, Roche/454-pyrosequencing⁵ (454-sequencing) was the first widely adopted NGS technology (Rothberg and Leamon, 2008). 454-sequencing involves fragmenting a

DNA library (~300-800 bp), followed by denaturation and ligation of each fragment first to adapters and then to small beads under conditions that limit one fragment per bead. PCR amplification then copies each DNA fragment several million times. The beads are placed in a

PicoTiterPlate that only allows one bead per well, and the plate is placed in a flow cell that sequentially applies a single nucleotide base to all the wells. A DNA-polymerase extends the adapter primers of each DNA fragment. When a nucleotide is added, a pyrophosphate molecule is released. Sulfurylase then converts the pyrophosphate to ATP, which is used by luciferase to produce a chemiluminescent signal that is recorded by the instrument. This overall process is referred to as ‘sequencing by synthesis after amplification’. The flow cell is then washed and the cycle is repeated with a different nucleotide base. The signals recorded for each well are combined to produce a DNA sequence, referred to as a ‘read’, for each original DNA fragment.

Over the course of a 10-hour run, 454-sequencing can produce up to 600 Mbp of data with an average length of 400 bp, though newer machines boast average read lengths up to 1 kb (Delseny

et al., 2010).
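The cyclic nucleotide flows described above can be sketched in Python as a toy decoder that turns one well's recorded light signals into a read. This is a simplified illustration, not the instrument's actual software: the flow order "TACG", the function name decode_flowgram, and the assumption that signal strength scales with the number of identical bases incorporated in one flow are all stated here as assumptions, and the example signals are invented.

```python
from typing import Sequence


def decode_flowgram(signals: Sequence[float], flow_order: str = "TACG") -> str:
    """Convert per-flow light signals for one well into a base sequence.

    Each flow applies a single nucleotide to every well. A signal near 0
    means no incorporation; a signal near k is taken to mean k identical
    bases were added in that flow (assumption of this sketch).
    """
    read = []
    for i, signal in enumerate(signals):
        base = flow_order[i % len(flow_order)]   # nucleotide applied in this flow
        count = int(max(0.0, signal) + 0.5)      # round to the nearest whole base
        read.append(base * count)
    return "".join(read)


# Invented signals for two full cycles of the four flows (T, A, C, G, T, A, C, G).
example_signals = [1.02, 0.05, 0.97, 2.10, 0.03, 1.95, 0.00, 1.10]
read = decode_flowgram(example_signals)
print(read)
```

Because runs of identical bases are inferred from signal magnitude rather than from separate cycles, small errors in the measured intensity translate directly into insertions or deletions within such runs.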

1.5.2 Illumina Hi-Seq 2000 sequencing

Like 454-sequencing, Illumina⁶ sequencing is ‘sequencing by synthesis after

amplification’. A fragmented DNA library is ligated to 5ˈ- and 3ˈ-adapters, which both adhere to

a series of primers bound to a glass slide, thereby forming a ‘bridge’ out of the DNA fragment.

5 http://www.454.com
6 http://www.illumina.com

Subsequent rounds of PCR amplification across this bridge generate clusters of ~1000 amplicons

of each initial DNA fragment. A universal primer is applied, after which sequencing by synthesis

is conducted in a flow cell using modified nucleotides that act as terminators. Each of the four

nucleotide bases applied possesses a unique fluorochrome that is cleaved chemically after each

round of synthesis. The machine measures the fluorescence of the flow cell in four channels,

thereby recording the base incorporated by each amplicon cluster. Once the fluorochrome is

cleaved another round of sequencing is conducted. Current Illumina instruments produce >100

Gbp of data in a 4-9 day run, with an average read length of 75-150 bp (Delseny et al., 2010).
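The four-channel readout described above can be illustrated with a minimal Python sketch in which the base recorded for a cluster at each cycle is simply the channel with the strongest fluorescence. The channel order, the function name call_bases, and the intensity values are assumptions made for illustration; real base callers also correct for effects such as signal cross-talk, which this sketch ignores.

```python
from typing import Sequence

# Hypothetical assignment of the four fluorescence channels to bases.
CHANNELS = "ACGT"


def call_bases(cycles: Sequence[Sequence[float]]) -> str:
    """Call one base per sequencing cycle for a single amplicon cluster.

    `cycles` holds, for each cycle, the fluorescence measured in the four
    channels; the brightest channel is taken as the incorporated base.
    """
    read = []
    for intensities in cycles:
        if len(intensities) != len(CHANNELS):
            raise ValueError("expected one intensity per channel")
        brightest = max(range(len(CHANNELS)), key=lambda i: intensities[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Invented intensities for four cycles of one cluster.
cluster = [
    [0.9, 0.1, 0.0, 0.1],
    [0.1, 0.0, 0.8, 0.2],
    [0.0, 0.1, 0.1, 0.7],
    [0.2, 0.9, 0.1, 0.0],
]
read = call_bases(cluster)
print(read)
```

Since each cycle adds exactly one terminated nucleotide, read length in this scheme equals the number of cycles run.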

1.5.3 NGS data processing and analysis

Analysis of NGS data presents its own set of challenges. Each NGS platform generates

millions of individual reads, each of which only represents a fragment of a complete gene

sequence. One NGS application, RNA-sequencing (RNA-Seq), provides a snapshot of essentially all the expressed genes (the transcriptome) in a tissue sample (Trapnell et al., 2010), and new protocols are able to assess gene expression in specific cell-types (Martin et al., 2013). The ability to conduct gene expression studies in non-model plant species has led to RNA-Seq replacing traditional methods for quantifying gene expression, such as DNA microarrays (Martin et al., 2013). Additionally, RNA-Seq offers greater sensitivity and an increased linear range compared to microarrays (Martin et al., 2013). A typical RNA-Seq experiment is conducted by first isolating RNA and then converting the messenger RNA (mRNA) to complementary DNA

(cDNA), forming a cDNA library, which can be sequenced using Illumina or 454-sequencing.

When conducting RNA-Seq in a species that lacks a reference genome, these reads must be assembled de novo to generate a reference transcriptome. De novo assembly is an in silico

process that combines overlapping reads, theoretically derived from the same transcript, into longer sequences, called ‘contigs’. In some cases, lowly expressed genes do not yield a sufficient number of reads to form contigs representative of complete mRNAs. Also, highly similar reads produced by alternative mRNA splicing, or by transcripts from gene families, may fail to assemble properly. In such cases, these genes may end up being represented by single reads, referred to as ‘singletons’, or by more than one contig. Theoretically, a contig or singleton represents a single mRNA, and the term ‘unigene’ is used to describe a contig or singleton corresponding to a single locus. However, in practice redundancy often occurs in de novo assembled transcriptomes. Gene expression is determined by mapping the unassembled reads against the reference transcriptome, or a reference genome if one is available. The number of reads mapped to a unigene in the reference is proportional to the expression of that gene in the tissue sampled.
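The two steps described above, combining overlapping reads into contigs and then counting reads mapped back to each unigene, can be illustrated with a toy Python sketch. It is not the method of any particular assembler: the greedy suffix-prefix merging, the exact-substring "mapping", the function names, and the short invented reads are all assumptions chosen to keep the example small.

```python
from typing import Dict, List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    seqs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(seqs):
            for j, b in enumerate(seqs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            # Nothing left to merge: longest sequences first.
            return sorted(seqs, key=len, reverse=True)
        merged = seqs[best_i] + seqs[best_j][best_n:]
        seqs = [s for k, s in enumerate(seqs) if k not in (best_i, best_j)]
        seqs.append(merged)


def count_reads(reads: List[str], unigenes: Dict[str, str]) -> Dict[str, int]:
    """Count reads per unigene, using exact substring match as a stand-in for mapping."""
    counts = {name: 0 for name in unigenes}
    for read in reads:
        for name, seq in unigenes.items():
            if read in seq:
                counts[name] += 1
                break
    return counts


# Invented reads: three overlap into one contig, one remains a singleton.
reads = ["ATGGCT", "GCTTCA", "TCAGGA", "ATGGCT", "CCCCGGGG"]
assembled = greedy_assemble(reads)
unigenes = {f"unigene_{i + 1}": seq for i, seq in enumerate(assembled)}
counts = count_reads(reads, unigenes)
print(unigenes)
print(counts)
```

In this toy data set the first unigene is a contig built from three distinct reads and the second is a singleton; the raw counts would normally be normalized (for example by unigene length and sequencing depth) before comparing expression between genes or samples.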

To improve de novo assembly and the accuracy of read mapping, paired-end (PE) sequencing can be employed. In PE sequencing, a cDNA library of variable fragment sizes is generated and both ends of the fragments are sequenced as opposed to only one end in conventional NGS. This provides sequence information for the 5ˈ- and 3ˈ-ends of each cDNA fragment and leaves a non-sequenced region of a defined size range in the middle. Using PE sequencing, reads can be assembled into contigs and then the contigs into ‘scaffolds’ as associations between unassembled contigs can be inferred based on the linkage of paired reads common to separate contigs. During de novo assembly, these scaffolds are collapsed, yielding longer unigenes. Furthermore, PE sequencing can increase mapping accuracy as each read effectively has a linked partner read that can also be mapped.

A detailed discussion of the different strategies for de novo assembly of RNA-Seq reads and software available for handling NGS data is beyond the scope of this work and has been reviewed elsewhere (Kumar and Blaxter, 2010, Robertson et al., 2010, Strickler et al., 2012, Lu

et al., 2013).

1.6 Summary

Flavonoids and PAs hold a number of potential benefits for human health and agriculture.

However, to fully realize these benefits a complete understanding of PA biosynthesis is required.

Despite significant achievements in uncovering the enzymes and regulatory factors involved in

PA biosynthesis, a number of important questions remain. The process by which flavan-3-ol

monomers move from their point of origin to the vacuole remains unclear and the mechanism

of polymerization continues to be elusive. Understanding how polymerization occurs may allow

us to modify the length of polymers produced. This holds significant implications as the

absorption of PA polymers (bioavailability) is inversely proportional to the length of the

polymers (Rasmussen et al., 2005). Degradation of polymers in the gut may allow for increased

bioavailability; however, stability of different types of PA polymers (e.g. A-type and B-type linkages) may prevent efficient breakdown in the gut (Rasmussen et al., 2005).

Pea is an economically important crop species, which represents a significant source of nutrition in human and animal diets. The seed coat plays a vital role in the development of pea seeds, which are the only part of the plant consumed. Despite this importance, an unbiased analysis of seed coat gene expression early in development has not been done. Furthermore, the natural PA diversity present in pea makes it an excellent species in which to investigate the biochemistry and genetics of PA biosynthesis.

1.7 Research Objectives

The research results described in chapters 3, 4 and 5 of this dissertation address three primary objectives.

1. Despite the potential benefits of integrating PA into the human diet via the widely

cultivated crop species pea, chemical and biochemical analysis of pea PAs has not been

performed. The PA chemical profile of a sample of pea cultivars will be evaluated and

linked with biochemical characterization of the two major branch point enzymes, ANR

and LAR.

2. Extensive breeding has produced a wide variety of pea cultivars. The phenotypic

diversity within these cultivars is postulated to be influenced by differential gene

expression between cultivars. The seed coat plays an important role in the development

of seeds, which are the primary component of peas consumed by humans or used in

animal feed. Comparative transcriptomics will be used to generate datasets from

developing seed coats of five distinct pea cultivars to interrogate gene expression

differences relating to seed phenotypic variation between the five cultivars.

3. Differential gene expression will be used to identify a candidate gene related to PA

biosynthesis, and its ortholog in Arabidopsis will be characterized by a reverse genetics

approach.

Figure 1.1 Phenylpropanoid pathway. PAL, phenylalanine ammonia lyase; TAL, tyrosine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate:CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; IFS, isoflavonoid synthase; F3’5’H, flavonoid 3’5’-hydroxylase; F3’H, flavonoid 3’-hydroxylase; FLS, flavonol synthase; F3H, flavanone 3-hydroxylase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; ANR, anthocyanidin reductase; LAR, leucoanthocyanidin reductase. Adapted from Dixon et al., 2013.

Figure 1.2 PA chemical diversity. Naturally common isomers are shaded in green.

Figure 1.3 Transcriptional regulation of PA biosynthesis. A) A representation of the current model for the complex network of transcriptional control of PA metabolism. This is limited to the primary factors and should not be considered a complete representation of all the influential components. Blue arrows, metabolic flux. Red arrows, direct enzymatic or protein function in PA metabolism. Green arrows, evidence of direct induction of gene expression. Black arrows, functional relationship. Dashed arrows, relationship not fully elucidated. B) Model of the ternary MYB-bHLH-WDR complex at the core of PA transcriptional regulation. Adapted from Baudry et al., 2004 and He et al., 2008.

Figure 1.4 Pisum sativum seed diversity. Representative images of seeds from different varieties of pea. Phenotypic features such as seed coat pigmentation, weight and wrinkling can further vary with each variety. Images from Pulse Canada (www.pulsecanada.com; used with permission).

Chapter Two: Materials and methods

2.1 Plant material and growth conditions

Mature air-dried seeds of the pea (Pisum sativum L.) cultivars, ‘Alaska’, ‘Canstar’,

‘Courier’, ‘Solido’ and ‘LAN3017’ (grown in Lethbridge, Alberta, Canada in 2007 or in

Barrhead or Namao, Alberta, Canada in 2008) were used for PA extraction or growth chamber

studies. For growth chamber studies, seeds were planted at an approximate depth of 2.5 cm in 3-

L plastic pots (3 seeds per pot) in Sunshine no. 4 potting mix (Sun Gro Horticulture, Vancouver,

Canada) and sand at 1:1. Plants were grown in a climate-controlled growth chamber with a 16 h-

light/8 h-dark photoperiod (19°/17°C) with an average photon flux density of 383.5 µE/m2/s

(measured with a LI-188 photometer, Li-Cor Biosciences, Lincoln, Nebraska). Flowers were

tagged at anthesis, and seeds were harvested at selected stages as identified by days after anthesis

(DAA). Seeds were harvested directly onto ice at 6, 8, 10, 12, 15, 20, 25 and 30 DAA and dissected immediately; the seed coats were stored at -80°C.

Arabidopsis thaliana were grown at the University of Calgary in growth chambers with a

16 h-light/8 h-dark photoperiod (22°/20°C). Wild-type and transgenic seeds were sterilized by soaking in 50x volume of 70% ethanol for 1 min, followed by immersion in 50x volume of 50% bleach/50% water/0.05% Tween for 10 minutes (vortexed every 2 min) and then rinsed 3x with sterile distilled water (Zhang et al., 2006). Sterilized seeds were germinated on Murashige and

Skoog (MS) basal medium (premixed with sucrose at 30 g/L and agar at 8 g/L; Sigma) prepared at half the specified concentration (plates contained 21.2 g/L MS media, pH 5.6, adjusted with

0.1 M KOH) supplemented with 0.4 g/L Phytoblend agar (Caisson Laboratories, USA). For transgenic lines 25 µg/mL hygromycin or kanamycin was added for selection. Plates were sealed with micropore surgical tape. Seeds were vernalized on plates for 3 days at 4°C then germinated

at room temperature in the dark for 3-4 days, after which time the plates were moved to a

climate-controlled growth chamber with a 16 h-light/8 h-dark photoperiod (24°/20°C) and 45%

relative humidity until seedlings were large enough to transfer to soil (5-7 days). Seedlings were planted in soil (1 part peat, 1 part vermiculite, 1 part perlite, 1 part shale) in 75 mm round pots and placed in growth chambers. The Arabidopsis were watered (sub-irrigation for seedlings) as needed and provided with 0.5 g/L 20-20-20 fertilizer once per week. At the bolting stage, each pot was wrapped with cellophane supported by 12-inch wooden dowels (barbeque skewers) to keep plants separated.

2.2 Genomic DNA extraction

Genomic DNA was extracted using a cetrimonium bromide (CTAB) extraction method.

Briefly, frozen plant tissue was ground with a mortar and pestle in CTAB buffer (27.5 mM

CTAB, 50 mM Tris-HCl pH 8, 20 mM EDTA, 700 mM NaCl, 0.4% (v/v) 2-mercaptoethanol)

and then heated at 65°C for 10 min. The solution was mixed 1:1 with chloroform, gently vortexed for

1 min and centrifuged at 14,000 g at room temperature for 5 min. The aqueous fraction was

removed and extracted twice more with chloroform. The aqueous fraction was then mixed with

isopropanol at a ratio of 1:0.75 (aqueous fraction to isopropanol) and incubated on ice for 10

min. The solution was centrifuged at 14,000 g at room temperature for 10 min, the supernatant

removed, the pellet washed with 70% ethanol and centrifuged again. The supernatant was then

removed, the pellet partially air-dried and the DNA resuspended in 10 mM Tris-Cl, pH 8.5.

2.3 Arabidopsis thaliana Nossen at2g47115 T-DNA insertion genotyping

Genotyping of at2g47115 transposon insertion lines was conducted by screening gDNA from F1 plants from a self-cross of the heterozygous Arabidopsis Nossen line 53-3618-1

(resource number pst00267; Riken BioResource Centre, Japan) using primers 49 and 50 (see

Table 2.1 for primers), which flank the transposon insertion site, and primer 51, which is specific

for the transposon. A lack of amplicon produced by primers 49 and 50 combined with an

amplicon produced by primers 49 and 51 confirmed the presence of homozygous transposon

insertion in AT2G47115.
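
The genotyping logic can be summarized as a small decision function. This is a hedged sketch: only the homozygous rule (no amplicon from primers 49 and 50, amplicon from primers 49 and 51) is stated above. The wild-type and heterozygous calls are the standard inferences for a three-primer screen and are assumptions here, as are the function and argument names.

```python
def call_genotype(flanking_amplicon: bool, insertion_amplicon: bool) -> str:
    """Classify a plant from two PCR outcomes.

    flanking_amplicon:  product obtained with primers 49 and 50,
                        which flank the insertion site.
    insertion_amplicon: product obtained with primers 49 and 51,
                        where primer 51 is specific to the transposon.
    """
    if insertion_amplicon and not flanking_amplicon:
        # Rule stated in the text.
        return "homozygous insertion"
    if insertion_amplicon and flanking_amplicon:
        # Assumed: one intact allele and one disrupted allele.
        return "heterozygous"
    if flanking_amplicon:
        # Assumed: only the intact allele is present.
        return "wild type"
    # Neither reaction worked, so no call is possible.
    return "no call"
```

A plant would only be carried forward as a knockout when the function returns "homozygous insertion".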

2.4 RNA isolation and cDNA preparation

Pea seeds were freshly collected and the seed coats immediately isolated on dry ice then

snap frozen in liquid nitrogen. Arabidopsis immature siliques were removed from the plants and

immediately frozen in liquid nitrogen. Frozen plant tissue was ground with a mortar and pestle.

For qPCR analysis and gene cloning, total RNA was extracted using an RNeasy Plant Mini Kit

(Qiagen). Polyvinylpolypyrrolidone (PVP-40) was added to the RLT extraction buffer at 2%

(w/v) to reduce precipitation of RNA by phenolic compounds (Wang et al., 2000). First-strand cDNA was synthesized using Superscript II reverse transcriptase and an Oligo (dT)12-18 primer

(Invitrogen).

2.4.1 Next-generation sequencing (NGS) RNA extraction and cDNA preparation

Two rounds of Roche/454-pyrosequencing were performed. For the first round (half plate sequencing), total RNA was isolated from 10-25 DAA ‘Courier’ seed coats (4-5 g) using an

E.Z.N.A. Plant RNA kit (Omega Bio-tek, Georgia, USA), with the addition of 2% PVP-40 in

the RB (extraction) buffer. Total RNA was concentrated by precipitation in 2.5x volume cold

ethanol (-20°C) and 0.3 M sodium acetate [prepared with diethylpyrocarbonate (DEPC) treated water]. Precipitation was carried out at 4°C overnight after which the RNA was pelleted by centrifugation at 12,000 g at 4°C for 20 min. The pellet was washed with 75% cold ethanol (25%

DEPC sterile water) after which the pellet was partially air-dried and resuspended in DEPC sterile water. Total RNA was quantified using a Nanodrop ND-1000 (Thermo Scientific) and integrity evaluated by examination of the 16S and 18S RNA bands by gel electrophoresis. mRNA was purified from the total RNA preparation using an Oligotex mRNA Mini kit

(Qiagen). Double stranded (ds)-cDNA was synthesized from the mRNA according to the Joint

Genome Institute (US Department of Energy) cDNA Library Creation Protocol (Zhao and Ng,

2007) and quantified using the Quant-iT PicoGreen™ dsDNA assay (Invitrogen). Approximately

2 µg of cDNA was sent to the National Research Council Plant Biotechnology Institute (NRC-

PBI, Saskatoon, Canada) for Roche/454 Titanium sequencing. Transcriptome assembly was performed by NRC-PBI personnel using GS De Novo Assembler version 2.6 (Roche, Branford, Connecticut).

For the second round (full plate), ds-cDNA was prepared from total RNA isolated from approximately 5 g of 10 DAA seed coat tissue from each of the five pea cultivars as described above, with the exception that the total RNA was concentrated by precipitation with 0.25x volume of 10 M lithium chloride (prepared with sterile DEPC water) for 2 h on ice.

Approximately 2 µg of cDNA from each cultivar was sequenced using Roche 454 GS-FLX

Titanium technology at the NRC-PBI.

For Illumina sequencing, total RNA was isolated from approximately 0.1 g of 10 DAA

seed coat tissue from each of the five pea cultivars as described above. An on-column DNase I

digestion was performed. Total RNA concentration was estimated using a NanoDrop ND-1000

and approximately 5 µg was sent for cDNA synthesis and sequencing by Illumina Hi-Seq 2000

technology at the McGill University and Génome Québec Innovation Centre (McGill University,

Canada).

2.5 Cloning of the Pisum sativum ANR, DFR and LAR

PsANR (GenBank KF516483) was cloned from cDNA prepared from 20 DAA ‘Courier’

seed coat tissue using degenerate primers based on conserved amino acid regions of ANR in

Medicago truncatula (AAN77735.1), Malus x domestica (AAZ79363.1), Fragaria ananassa

(ABG76843.1), Arabidopsis thaliana (AAF23859.1) and Vitis vinifera (AAZ82409.1). PCR

reactions (50 µL) were run for a total of thirty-five cycles using 1 unit of Phusion polymerase

(New England Biolabs), GC buffer (New England Biolabs), 0.2 mM dNTPs and 2 pmol primers

1 and 2 (see Table 2.1 for primers), and 250 ng of 20 DAA Courier seed coat cDNA.
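
Because the degenerate primers contain IUPAC ambiguity codes, each one represents a pool of distinct oligonucleotides. The sketch below counts and enumerates that pool for primers 1 and 2 as listed in Table 2.1. It uses the standard IUPAC nucleotide code; the function names are illustrative assumptions.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(len(IUPAC[base]) for base in primer.upper())


def expand(primer: str) -> list[str]:
    """List every non-degenerate sequence encoded by the primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# Primers 1 and 2 from Table 2.1.
PSANR_DEGENERATE_F = "ATAGTCTAGAGAYATGATHAARCCNGC"
PSANR_DEGENERATE_R = "CGATTCTAGAGCRCARCADATRTAYCT"

forward_pool_size = degeneracy(PSANR_DEGENERATE_F)
reverse_pool_size = degeneracy(PSANR_DEGENERATE_R)
```

Each position contributes a factor equal to the number of bases it can represent, so a primer with Y, H, R and N positions encodes 2 x 3 x 2 x 4 sequences.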

Amplified fragments were gel purified using a QIAquick Gel Extraction Kit (Qiagen),

cloned into pBlueScript II SK (-) (Stratagene) and sequenced. The full-length sequence of

PsANR was recovered by 3',5'-rapid amplification of cDNA ends (3'5'-RACE) using a SMART

RACE cDNA Amplification kit (Clontech) and primers 3 to 6. 3'- and 5'-RACE ready cDNA

was produced according to the manufacturer’s protocol and used as a template for further

reactions. RACE fragments were ligated into pGEM-T Easy vector (Promega) and sequenced.

Full-length PsANR was cloned into the Gateway donor vector pDONR221 (Invitrogen) using primers 7 and 8, and the universal Gateway primers 47 and 48.

The complete coding sequence of PsDFR (GenBank KF516484) was retrieved from the

‘Courier’ Roche/454-pyrosequencing data and the gene cloned using primers 13 and 14. The

PsLAR (GenBank KF516485) sequence in the 454-sequencing data lacked the 5'-end; therefore,

5'-RACE was performed using a SMART RACE cDNA Amplification kit (Clontech) to obtain the complete coding sequence. The 5'-RACE fragment was cloned using primers 9 and 10, and ligated into pGEM-T Easy vector (Promega) for sequencing. Full-length PsLAR and PsDFR were cloned into pDONR221 (Invitrogen) vector using primers 11, 12 and 13, 14, respectively, and the universal Gateway primers 47 and 48.

2.6 Characterization of Pisum sativum ANR, DFR and LAR

2.6.1 Recombinant expression and purification of Pisum sativum ANR, DFR and LAR

Using Gateway cloning, PsANR, PsDFR and PsLAR were cloned into the bacterial overexpression vector pDEST17 (Invitrogen), which contains an N-terminal 6x-histidine tag, and the vector was transformed into E. coli BL21-AI (Invitrogen). The bacteria were grown in LB

media with ampicillin (100 µg mL-1) in a shaker (200 rpm) at 37°C to an OD600 of 0.4-0.6 at

which point the cultures were transferred to a refrigerated shaker for an additional 20-30 minutes

at 12°C (ANR) or 15°C (DFR and LAR). Expression was then induced by adding L-arabinose

(Sigma) to a final concentration of 0.2% (w/v). Following overnight incubation at 12°C or 15°C, the bacteria were pelleted by centrifugation at 4°C at 8000 g for 20 min. Pellets were

resuspended in 1% of original culture volume in lysis/wash buffer 1 (100 mM Tris-HCl pH 8, 10 mM imidazole, 10% glycerol, 0.1% Triton X-100, 10 mM β-mercaptoethanol). Cells were lysed by sonication on ice using a Microson Ultrasonic Cell Disrupter XL (Misonix, Farmingdale, NY) at 6 Watts using 10-second bursts followed by 10 seconds of cooling, repeated 8-10 times.

The lysate was centrifuged at 12,000 g at 4°C for 20 min. The supernatant was applied to a 1 mL

Bio-Scale Mini Profinity IMAC Ni-NTA column (BioRad), equilibrated with wash buffer 1,

using a BioLogic DuoFlow (Biorad) fast protein liquid chromatography (FPLC) machine. The

column was washed at 1 mL min-1 with 6 mL lysis/wash buffer 1 followed by 6 mL wash buffer

2 (100 mM Tris-HCl pH 8, 20 mM imidazole, 10% glycerol, 0.1% Triton X-100, 10 mM β- mercaptoethanol, 1 M KCl). Recombinant protein was eluted at 1 mL/min with 7.5 mL of elution buffer (100 mM Tris-HCl pH 8, 250 mM imidazole, 10% glycerol, 300 mM KCl). The eluent was collected in 15 mm glass test tubes (in ice-water bath) using an automated fraction collector.

The eluent was concentrated using an Amicon Ultra-30 column (Millipore). Protein concentration was determined by the Bradford assay using Bradford reagent (Biorad; Bradford,

1976). Aliquots of concentrated protein were frozen in liquid nitrogen and stored at -80°C.

2.6.2 PsANR in vitro assays

To determine the linear range of PsANR activity, 5-100 µg of purified protein was assayed in a final reaction volume of 250 µL. The assays were run in 100 mM Tris-HCl containing 20 mM NADPH and 100 µM cyanidin-chloride (Extrasynthese, Genay, France).

Coumarin was added to a final concentration of 25 µM as an internal standard; product peak areas were normalized to the average coumarin peak area across all samples. The reactions were incubated at 30°C for 30 min and stopped by extracting twice with 500 µL ethyl acetate, vortexing for 1 min, and centrifuging for 1 min. The ethyl acetate was evaporated at room temperature under a nitrogen stream. The organic fraction was resuspended in 50 µL of 50% methanol, passed through a 0.45 µm filter, and analyzed by high performance liquid chromatography (HPLC; Waters 2795 Separations Module) using a Sunfire C18 3.5 µm 4.6x150

mm column (Waters) and a photodiode array detector (Waters, model 2996) scanning between

100-400 nm. Peak area was quantified at 280 nm (epicatechin, epiafzelechin) and 270 nm

(epigallocatechin). An injection volume of 10 µL was used for all standards while 20 µL was used for all in vitro assays. For the cyanidin and pelargonidin assays, the column was eluted

using a linear gradient consisting of solvent A (100% H2O) and solvent B (100% acetonitrile) at a flow rate of 1.2 mL min-1 as follows: 0-8 min 20-50% B, 8.5-10 min 100% B. The

chromatography gradient for delphinidin assays was 0-17 min 5-50% B, 17-18.5 min 100% B.

Epicatechin (Extrasynthese), epigallocatechin (Extrasynthese) and epiafzelechin (MicroSource,

Gaylordsville, Connecticut) were used as authentic standards. The optimum temperature for

PsANR activity was determined using 50 µg of purified protein in the above reaction mixture at

temperatures between 25-60°C for 30 minutes. Incubation time linearity was determined using

the above reaction mixture incubated at 40°C for 15, 30, 60, 90, 120 and 240 min. The optimum

pH for PsANR activity was determined using the above reaction mixture incubated at 40°C for

30 min. Three buffers were used to test a pH range from 4-8.5 (50 mM citrate/phosphate: pH 4,

5, 6, 7; 50 mM MES (2-(N-morpholino)ethanesulfonic acid): pH 5, 6, 6.5, 7; 100 mM Tris-HCl:

pH 7, 7.5, 8, 8.5). Reactions were stopped and analysis carried out as described above.
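
The internal standard correction used in these assays can be sketched as follows. This reflects one plausible reading of the normalization (each product peak area is scaled by the ratio of the mean coumarin area to that sample's coumarin area); the function name and the peak areas in the example are hypothetical and are not measured values.

```python
from statistics import mean


def normalize_to_internal_standard(
    product_areas: list[float], standard_areas: list[float]
) -> list[float]:
    """Scale product peak areas by (mean standard area / sample standard area).

    A sample that recovered less internal standard than average is scaled
    up, and one that recovered more is scaled down.
    """
    if len(product_areas) != len(standard_areas):
        raise ValueError("each sample needs one product and one standard area")
    reference = mean(standard_areas)
    return [
        product * reference / standard
        for product, standard in zip(product_areas, standard_areas)
    ]


# Hypothetical peak areas (arbitrary units) for three samples.
corrected = normalize_to_internal_standard(
    product_areas=[120.0, 90.0, 110.0],
    standard_areas=[1000.0, 800.0, 1200.0],
)
```

The correction compensates for sample-to-sample differences in extraction recovery and injection, provided the internal standard behaves like the product during extraction.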

Kinetics assays were carried out in triplicate in a total reaction volume of 250 µL

containing 100 mM Tris-HCl pH 7, using 40 µg purified PsANR, 20 mM NADPH, at 42°C for

20 minutes. Substrates (Extrasynthese) tested were pelargonidin-chloride (5-100 µM), cyanidin-

chloride (5-200 µM) and delphinidin-chloride (50-750 µM). Acetosyringone (37 µM) was added immediately prior to ethyl acetate extraction as an internal HPLC standard using the same method as described above for coumarin. Extraction and resuspension were conducted as described above.

2.6.3 PsDFR-PsLAR coupled enzyme assays

In vitro coupled enzyme activity assays were carried out using 100 mM Tris-HCl, 50 µg purified PsDFR, 25 µg purified PsLAR, 100 µM dihydromyricetin (DHM), 2 mM NADPH at temperatures from 22°C to 68°C for 30 minutes to determine the optimum temperature for the coupled reaction. The optimum pH of the coupled assay was determined using 50 mM MES (pH

5, 6, 7), 100 mM sodium phosphate (pH 5.5, 6, 7) and 100 mM Tris-HCl buffers (pH 7, 7.5, 8,

8.5). 100 µM DHM, 50 µg PsDFR and 25 µg PsLAR were added to each 250 µL reaction.

NADPH was added to a final concentration of 2 mM. Conversion efficiency was determined using 1-1000 µM DHM (Chromadex, USA) or 1-1000 µM dihydroquercetin (DHQ;

Extrasynthese), 50 µg PsDFR and 25 µg PsLAR in 50 mM MES buffer, pH 6 at 40°C for 30 min. Reactions were stopped and extracted as described for the ANR in vitro assays, except that the organic fraction was resuspended in 100 µL of 50% MeOH. Catechin-hydrate (> 95.0%;

Tokyo Chemical Company, Japan) and gallocatechin (≥ 98.0%; Sigma) were used as LC-MS/MS standards. Acetosyringone was used as an internal standard as described above.

2.7 Liquid chromatography mass spectrometry (LC-MS/MS)

PsANR and PsDFR-PsLAR reaction products were confirmed using an Agilent

Technologies (Santa Clara, California) 6410 Triple Quad LC-MS/MS with a 1200 Series liquid chromatography system equipped with an electrospray ionization source and an Eclipse Plus

C18 1.8 µm 2.1x50 mm column (Agilent). Samples were extracted and prepared as described above. Products and standards were detected in positive ion mode using product ion scan. Mass-to-charge ratios (m/z) selected for fragmentation were: 291 for epicatechin (EC)/catechin (Cat),

307 for epigallocatechin (EGC)/gallocatechin (GC) and 275 for epiafzelechin (EAZ). Fragmentor

energies were: EC, 85 V; EGC, 50 V; EAZ, 80 V. Collision energies were: EC/Cat, 12 and 25

eV; EGC/GC, 0 and 20 eV; EAZ, 8 eV and 72 eV. Liquid chromatography solvents were A) 1%

(v/v) aqueous acetic acid and B) 100% acetonitrile. Gradients for the samples were: EAZ, 0-8 min 10-50% B, 8-10 min 100% B; EC/Cat, 0-0.5 min 5% B, 0.5-8 min 5-30% B, 8-10 min 100%

B; EGC/GC, 0-8 min 30-100% B, 8-10 min 100% B. Flow rate for all samples was 0.4 mL min-1.
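
The gradient tables above can be read as piecewise-linear programs of solvent B over time. The sketch below encodes the EC/Cat gradient and returns the programmed percentage of B at a given time. The data structure and function name are assumptions, and the change from 30% to 100% B at 8 min is modelled as an instantaneous step, which is a simplification of how a pump would behave.

```python
# Each segment: (start_min, end_min, percent_B_at_start, percent_B_at_end).
EC_CAT_GRADIENT = [
    (0.0, 0.5, 5.0, 5.0),      # 0-0.5 min, 5% B
    (0.5, 8.0, 5.0, 30.0),     # 0.5-8 min, 5-30% B
    (8.0, 10.0, 100.0, 100.0), # 8-10 min, 100% B
]


def percent_b(time_min: float, gradient: list[tuple]) -> float:
    """Return the programmed % solvent B at `time_min`.

    Segments are treated as half-open intervals [start, end), except that
    the final end point belongs to the last segment.
    """
    _, last_end, _, last_b = gradient[-1]
    if time_min == last_end:
        return last_b
    for start, end, b_start, b_end in gradient:
        if start <= time_min < end:
            fraction = (time_min - start) / (end - start)
            return b_start + fraction * (b_end - b_start)
    raise ValueError(f"time {time_min} min is outside the gradient program")


midpoint_b = percent_b(4.25, EC_CAT_GRADIENT)
```

The same function could be applied to the EAZ and EGC/GC programs by supplying their segments in the same form.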

2.8 Arabidopsis thaliana PsLAR transgenic plants

A seed coat specific expression cassette (PANR::FLAG::PsLAR::T35S) was generated by

PCR-stitching, consisting of a 1367-bp portion of the native A. thaliana ANR (BANYULS)

promoter, shown to be sufficient to drive seed coat specific expression of a GUS reporter

(Debeaujon et al., 2003), and the construct was cloned into pKGWFS7 (Karimi et al., 2002).

Briefly, full-length PsLAR was cloned from ‘Courier’ 10 DAA cDNA using primers 15 and 16

(see Table 2.1 for primers), the PANR fragment was cloned from Arabidopsis Columbia-0 genomic DNA using primers 20 and 21, and the cauliflower mosaic virus 35S terminator (T35S) was cloned from pKWG2D (Karimi et al., 2002) using primers 18 and 19, which also introduced a 5'-region that overlapped with 3'-PsLAR. All fragments generated throughout this stitching procedure were gel purified prior to subsequent PCR-stitching reactions. PsLAR and T35S were stitched together using primers 17 and 19, which also added a 5'-FLAG tag to PsLAR. A 3'-region overlapping 5'-FLAG::PsLAR was added to the PANR fragment using primers 20 and

22. The PANR::FLAG and FLAG::PsLAR::T35S constructs were stitched together using primers

19 and 23. PCR reactions (50 µL) were run for a total of thirty cycles using 1 unit of Phusion polymerase (New England Biolabs), HF buffer (New England Biolabs), 0.2 mM dNTPs and 0.4-

0.5 pmol forward and reverse primers. Touchdown PCR was used for the stitching reactions with

an initial five cycles at an annealing temperature (Tm) calculated based on the overlap regions

involved. The Tm of the subsequent twenty-five cycles was dependent on the PCR primer Tm.

Additionally, when stitching, the templates were added in equimolar amounts and PCR primers

were added after the initial five cycles.

The vector was transformed into Agrobacterium tumefaciens GV3101. Arabidopsis

Columbia-0 and an anr transposon knockout line (SALK_040250C; Arabidopsis Biological

Resource Centre, Ohio State University) were transformed by floral dip as previously described

(Zhang et al., 2006). F1 seeds were tested for kanamycin resistance and positive transformants were confirmed by genomic PCR using primers 16 and 20.

2.9 Quantitative reverse-transcription PCR

Total RNA was extracted from 6-20 DAA pea seed coat tissues or immature Arabidopsis

siliques and cDNA synthesis performed as described above. Quantitative reverse-transcription

PCR (qRT-PCR; Step One Real-Time PCR System and 7300 Real-Time PCR System, Applied

Biosystems, Carlsbad, CA) was done using Power SYBR Green PCR mix (Applied Biosystems),

0.3 pmol primers, and 1-3 ng (from mRNA) or 5-25 ng (from total RNA) of cDNA template in a

reaction volume of 20 μL. The reaction conditions for qRT-PCR were: 1 cycle of 95°C for 10

min and 40 cycles of 95°C for 15 s and 58°C for 1 min. Actin and ubiquitin were used as

references for analysis of pea and Arabidopsis genes, respectively. Relative transcript abundance

was determined using the ∆∆CT analysis method with three to four technical replicates (Pfaffl,

2001). See Table 2.1 for qRT-PCR primers.
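
The relative abundance calculation can be sketched as below. This is a minimal illustration of the basic ∆∆CT arithmetic, assuming that the amplicon doubles every cycle for both the target and the reference gene; no efficiency correction is included. The function name and the CT values in the example are hypothetical.

```python
from statistics import mean


def relative_abundance(
    target_ct: list[float],
    reference_ct: list[float],
    calibrator_target_ct: list[float],
    calibrator_reference_ct: list[float],
) -> float:
    """Return the fold change of a target gene relative to a calibrator sample.

    Technical replicates are averaged first. The target CT is normalized to
    the reference gene (actin or ubiquitin in this work) in both samples,
    and the difference of those two values is the exponent.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calibrator_target_ct) - mean(calibrator_reference_ct)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical CT values from three technical replicates each.
fold_change = relative_abundance(
    target_ct=[22.0, 22.1, 21.9],
    reference_ct=[18.0, 18.0, 18.0],
    calibrator_target_ct=[24.0, 24.0, 24.0],
    calibrator_reference_ct=[18.0, 18.0, 18.0],
)
```

A target that crosses the threshold two cycles earlier than in the calibrator, after normalization, is about four-fold more abundant under the doubling assumption.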

2.10 Extraction, purification and identification of proanthocyanidins

To estimate total soluble PA concentration in pea seed coat, approximately 25 mg subsamples of seed coat tissue [lyophilized and ground to a fine powder using a Retsch ZM

200 mill (PA, USA) with 0.5 mm screen filter] were weighed into 15 mL Falcon tubes. The samples were extracted with 10 mL of 80% methanol for 24 h with shaking. After vortexing the slurry and centrifuging for 5 min at 4000 rpm, the supernatants were used for PA analysis as previously described (Porter et al., 1985). In brief, 2 mL of the butanol:HCl reagent and 66.75

µL of iron reagent were added into a 15 mL glass culture tube. Then, 0.5 mL of clear sample extract was added to the tube and the mixture was vortexed. Two 350 µL aliquots of the above solution were removed for use as sample blanks, and the remaining solution was placed into a

95°C water bath. After 40 min at 95°C, the solution was allowed to cool at room temperature for

30 min. The reaction products, sample blanks, and a PA standard curve dilution series were monitored for absorbance at 550 nm using a 96 well UV plate reader (Spectra Max 190,

Molecular Devices, CA, USA). The PA standard solution used was an extract from ‘CDC Acer’ pea seed coats purified as described previously (Jin et al., 2012). PA subunit composition and degree of polymerization were characterized and quantified using a method of acid-catalyzed cleavage of the PAs followed by phloroglucinol derivatization (phloroglucinolysis), as described by Jin et al. (2012). Identification of the PA subunits was confirmed by LC-MS/MS also using the method of Jin et al. (2012).
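
Estimating PA concentration from the butanol:HCl assay relies on a standard curve relating absorbance at 550 nm to known PA concentrations. The sketch below shows one way such a curve could be fitted and applied, assuming a linear response and subtraction of the unheated sample blank. The function names, concentration units and all numerical values are hypothetical and are not data from this work.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_absorbance(
    absorbance: float, blank: float, slope: float, intercept: float
) -> float:
    """Convert a blank-corrected absorbance into a PA concentration."""
    return (absorbance - blank - intercept) / slope


# Hypothetical dilution series of the PA standard (µg/mL) and A550 readings.
standard_conc = [0.0, 25.0, 50.0, 100.0]
standard_a550 = [0.02, 0.27, 0.52, 1.02]
slope, intercept = fit_line(standard_conc, standard_a550)

# Hypothetical heated sample reading and its unheated blank.
sample_conc = concentration_from_absorbance(0.62, 0.10, slope, intercept)
```

Readings that fall outside the range of the dilution series would need dilution and re-measurement rather than extrapolation.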

To estimate the total soluble and insoluble PA concentration of Arabidopsis seeds, approximately 30 mg of mature, air-dried seeds were ground in liquid nitrogen and the powder weighed prior to being placed in 3 mL of 75% aqueous acetone, and the mixture was shaken overnight at 4°C. The next day the slurry was vortexed and centrifuged for 5 min at 21,000 g.

The supernatant was used for PA analysis using a procedure derived from the method

described above. Briefly, the supernatant was removed and the pellets rinsed twice with an

additional 1 mL of 75% aqueous acetone with vortexing and centrifugation after each step. The

supernatants were pooled and the acetone removed at room temperature under a nitrogen gas

stream after which the samples were freeze-dried. The soluble PA extract was resuspended in

200 µL of 75% (v/v) aqueous acetonitrile. The butanol:HCl assay was then used to estimate PA

concentration. 50 µL of the soluble PA extract or the entire insoluble pellet were mixed with

95% butanol/5% HCl to a final volume of 3 mL and 0.1 mL of iron reagent (2 N HCl, 2% ferric

ammonium sulphate) was added. Concentration of soluble anthocyanins was measured at 550

nm, using the butanol:HCl:iron reagent as a blank. The samples were then heated at 100°C for 1

hour and then briefly cooled on ice, after which the absorbance at 550 nm was taken for all the

samples. For the estimation of the soluble PA concentration, the absorbance prior to heating was

subtracted from the post-heating absorbance. Relative PA concentrations in Arabidopsis wild- type and at2g47115 knockout seeds were compared based on OD550.

2.11 Transcriptome assembly and RNA-Seq analysis

Illumina paired-end sequencing as well as initial read quality trimming and filtering was

performed by personnel at McGill University and Génome Québec Innovation Centre (Montreal,

Canada). CLC Genomics Workbench™ v5.5.1 was used for de novo assembly and

bioinformatics analysis. 454 reads from the second round of Roche/454 NGS were also

separately de novo assembled with Newbler 2.5 using default parameters.

For CLC assemblies, duplicate reads were removed using the CLC Duplicate Reads

Removal plugin (ver. 0.3 beta 3). Illumina reads were assembled using the CLC default

parameters. Pooled 454 and Illumina reads were assembled using default CLC parameters with the following exceptions: mismatch cost = 1, insertion cost = 2, deletion cost = 2, length fraction

= 0.4, and with the ‘updated contigs’ option selected.

All assemblies were trimmed by removing all unigenes <150 bp prior to evaluation.
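
The length trimming step can be illustrated with a short sketch that parses FASTA-formatted unigenes and discards those shorter than 150 bp. The function names and the toy records are assumptions; this is not the tool used for the assemblies in this work.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_short_unigenes(
    records: Iterable[tuple[str, str]], min_length: int = 150
) -> list[tuple[str, str]]:
    """Keep only unigenes of at least `min_length` bp."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Hypothetical toy assembly: 160 bp, 149 bp and 150 bp unigenes.
toy_fasta = [
    ">unigene_1", "A" * 100, "C" * 60,
    ">unigene_2", "G" * 149,
    ">unigene_3", "T" * 150,
]
kept = drop_short_unigenes(read_fasta(toy_fasta))
kept_names = [name for name, _ in kept]
```

Because the text specifies removal of unigenes <150 bp, a sequence of exactly 150 bp is retained.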

Mapping to Medicago truncatula genes CDS 4.0v1 (downloaded 2013-07-31) was conducted using CLC and the following parameters: mismatch cost = 1, insertion cost = 2, deletion cost = 2, length fraction = 0.5, similarity fraction = 0.8, non-specific match handling = map randomly, and default settings for other parameters. Mapping to TAIR10 (version 20110103) used the same parameters, except similarity fraction = 0.55. The Roche/454 Newbler assembly (454NB) was ultimately used for RNA-Seq analysis. The 454NB assembly was annotated by BLASTx against the TAIR10 (version date 20110103) representative gene model set and the UniProt

Viridiplantae protein database (downloaded Aug. 31, 2013).

RNA-Seq analysis was performed using CLC default parameters, except for the following adjustments: min length fraction = 0.75, min similarity fraction = 0.95, and the

‘include broken pairs’ counting scheme was selected. Paired-end read distance was set at 150 to 600 bp.

The results were filtered by removing all unigenes with < 10 total mapped reads and an RPKM value <1 for each cultivar in one of the two cultivar groups being compared. To target highly expressed genes, a second filter was used to select only unigenes with ≥ 5-fold higher expression and a minimum RPKM of 50.
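
For reference, RPKM (reads per kilobase of transcript per million mapped reads) and the second, high-expression filter can be sketched as follows. The RPKM formula is the standard definition. The filter function is an assumption about how the two thresholds combine: here the minimum RPKM of 50 is applied to the more highly expressed group, and the function and argument names are illustrative.

```python
def rpkm(mapped_reads: int, unigene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of unigene per million mapped reads."""
    return mapped_reads * 1e9 / (unigene_length_bp * total_mapped_reads)


def is_highly_expressed(
    rpkm_high: float,
    rpkm_low: float,
    min_fold: float = 5.0,
    min_rpkm: float = 50.0,
) -> bool:
    """Apply the fold-change and minimum-RPKM thresholds to one unigene.

    rpkm_high is the value in the group with higher expression and
    rpkm_low the value in the comparison group.
    """
    if rpkm_high < min_rpkm:
        return False
    if rpkm_low == 0:
        # Fold change is undefined; treat as exceeding any finite threshold.
        return True
    return rpkm_high / rpkm_low >= min_fold


# Hypothetical example: 500 reads on a 1000 bp unigene, 10 million mapped reads.
example_rpkm = rpkm(500, 1000, 10_000_000)
```

Normalizing by unigene length and by sequencing depth allows expression values to be compared across unigenes and across cultivar libraries of different sizes.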

2.12 AT2G47115 (AtPUP) subcellular localization

The AT2G47115 coding sequence without a stop codon was cloned into pSITE-2NA

(GenBank: EF212299), or with a stop codon into pSITE-2CA (GenBank: EF212294;

Chakrabarty et al., 2007) using Gateway cloning technology (Invitrogen) and primers 52 and 53 as well as the universal Gateway primers 47 and 48 (Table 2.1). pBIN20 vectors with a series of and structural RFP-markers (Hennegan and Danna, 1998) and the pSITE constructs were separately transformed into Agrobacterium tumefaciens GV3101. Agrobacteria were used for co-infection of Nicotiana benthamiana by leaf infiltration. Briefly, liquid agrobacteria culture

[28°C, Lysogeny broth (LB), 25 mg/L each of spectinomycin (vector selection), rifampicin, and gentamicin] was used to inoculate solid selection media, which was incubated overnight at 28°C.

The Agrobacterium was scraped off the plates and resuspended (OD600 0.8-1) in 2 mL of buffer

[10 mM 2-(N-morpholino)ethanesulfonic acid (MES), pH 5.6, 10 mM MgCl2]. The solution was

diluted 1:20 using the same buffer with 150 µM acetosyringone added. After a 1-hr incubation at

room temperature, the bacterial solution was injected into N. benthamiana leaves using a 1 mL

syringe. Three days later, subcellular localization was studied using a spinning disk confocal

microscope (UltraVIEW VoX, Perkin Elmer, MA, USA) and with a confocal microscope (Leica

TCS SP5 II, Leica Microsystems Inc., Concord, ON, Canada). EGFP was excited at 488 nm and

the emission was collected from 500-530 nm. RFP or mCherry was excited at 543 nm and the

emission was collected from 590-650 nm. Subcellular localization of AtPUP-GFP was studied in

12 stomatal guard cells and eight pavement cells in tobacco leaf epidermis.


Table 2.1 List of PCR primers used in this work

No.  Primer name (a)                     Sequence (5’ > 3’)
1    PsANR degenerate F                  ATAGTCTAGAGAYATGATHAARCCNGC
2    PsANR degenerate R                  CGATTCTAGAGCRCARCADATRTAYCT
3    PsANR RACE 5’-gsp                   CGGCGATCAAAGGTGTGTTGAATGTGTTG
4    PsANR RACE 5’-nested                CTGGTCTGATGTTGAGTTTCTGAATGCGGC
5    PsANR RACE 3’-gsp                   TGAGCTCGGCAAATATCCTCGACATGA
6    PsANR RACE 3’-nested                GTGGCAGAGAAAGAATCAGCTTCTGG
7    PsANR Gateway F                     AAAAAGCAGGCTTAATGGCTAGTATCAAAGAAG
8    PsANR Gateway R                     AGAAAGCTGGGTATTACTTCTTCAAGACCCCC
9    PsLAR RACE 5’-gsp                   CTGATCCAACGGTGGAGGAAGC
10   PsLAR RACE 5’-nested                CCAGATTCCTCGATCACACGTCTAACC
11   PsLAR Gateway F                     AAAAAGCAGGCTTCATGGCACCAACTTCATCAC
12   PsLAR Gateway R                     AGAAAGCTGGGTCTCAACAGGAAGCTGTGATTATTAC
13   PsDFR Gateway F                     AAAAAGCAGGCTTCATGGGTTCGGTGTCG
14   PsDFR Gateway R                     AGAAAGCTGGGTCTTATTTCTTCATGGTGTCATTAACTTCGGTC
15   PsLAR F                             ATGGCACCAACTTCATCACCACCAACCAC
16   PsLAR R                             TCAACAGGAAGCTGTGATTATTACTGGTTCTACC
17   5’ FLAG tag+PsLAR overlap F         ATGGATTACAAGGATGACGACGATAAGATCATGGCACCAACTTCATCACCACC
18   T35S overlap with 3'-LAR F          CCAGTAATAATCACAGCTTCCTGTTGACGGCCATGCTAGAGTCCGC
19   T35S Gateway R                      AGAAAGCTGGGTCAGGTCACTGGATTTTGGTTTTAGG
20   Ath PANR F                          CCAGGAGGTTTTCAAAGACTATGGAGTG
21   Ath PANR R                          CATAACAACTAAATCTCTATCTCTGTAATTTCAAAAGTACAATC
22   Ath PANR FLAG overlap R             TTATCGTCGTCATCCTTGTAATCCATGATTGTACTTTTGAAATTACAGAGATAGAG
23   Ath PANR Gateway F                  AAAAAGCAGGCTTCCCAGGAGGTTTTCAAAGACTATGGA
24   PsActin qRT-PCR F                   TTCTCACTGAAGCTCCGCTTAACC
25   PsActin qRT-PCR R                   CAATACCAGTTGTACGGCCACTAGC
26   PsANR qRT-PCR F                     TCAGAATACCTGTGTTCCCGAGCTTG
27   PsANR qRT-PCR R                     CCTTGCGGCAATCCTCGAATTTAGT
28   PsLAR qRT-PCR F                     TCCTGTGGAGCCAGGTTTAGCAAT
29   PsLAR qRT-PCR R                     AGTAAGGCCAAGATGCGATGGAGT
30   PsDFR qRT-PCR F                     CGTTCGCGATCCAGATAACGTGAA
31   PsDFR qRT-PCR R                     ACCCTCTTCAGCAAGATCAGCCTT
32   Ath ubiquitin qRT-PCR F             GGCCTTGTATAATCCCTGATGAATAAG
33   Ath ubiquitin qRT-PCR R             AAAGAGATAACAGGAACGGAAACATA
34   Ath ANR qRT-PCR F                   ACCGGGAAAGAAATGCATGTGACC
35   Ath ANR qRT-PCR R                   ATGGGCACGACGTAAATCGTCTAC
36   Ath ANS qRT-PCR F                   TGGGTTGGTGAATAAGGAGAAG
37   Ath ANS qRT-PCR R                   GGCAACGGCTTAAGAACAATC
38   Ath CHS qRT-PCR F                   TGACTGGAACTCCCTCTTCT
39   Ath CHS qRT-PCR R                   GCCCTCATCTTCTCTTCCTTTAG
40   Ath DFR qRT-PCR F                   GGTCGGTCCATTCATCACAA
41   Ath DFR qRT-PCR R                   CGTTGCATAAGTCGTCCAAATG
42   Ath AT2G47115 Gateway F             AAAAAGCAGGCTTCATGGCATTGATCG
43   Ath AT2G47115 Gateway R             AGAAAGCTGGGTCCTAAGTTACAAGAGC
44   Ath AT2G47115 Gateway R (no STOP)   AGAAAGCTGGGTCAGTTACAAGAGCATTTGGGAATAGATATGG
45   Ath AT2G47115 F                     ATGGCATTGATCGTTTATTGGTATGAC
46   Ath AT2G47115 R                     CTAAGTTACAAGAGCATTTGGGAATAGATATG
47   attB1 universal Gateway primer      GGGGACAAGTTTGTACAAAAAAGCAGGCTTC
48   attB2 universal Gateway primer      GGGGACCACTTTGTACAAGAAAGCTGGGTC
49   Ath AT2G47115 gDNA F                GTTCTTGGCTTCTGATGATTCG
50   Ath AT2G47115 gDNA R                GTAAAGAGTAGCCAGCCGG
51   Ath AT2G47115 insert R              TACCTCGGGTTCGAAATCGAT
52   Ath AT2G47115 cDNA F                GCAATCGTCACGTCCTTGTGGTTT
53   Ath AT2G47115 cDNA R                GCAGATCGTCAACGTGTTTAACCCAAAG
54   Ath TIM44-2 F                       GCTACAGAAGAGGTCAAGGAGTCATTC
55   Ath TIM44-2 R                       TCTCTCTGAGCCTCCAAATGGGAT
56   PsTT8/PsA F                         GGTCCAATCGAAGAACCTCTCGATG
57   PsTT8/PsA R                         GAGAGTAGTGTGTGTCTTCTTGTGTTAAGTC
58   AT2G47115 qRT-PCR F                 TTCACTTGGTGGCCATATCCGTTC
59   AT2G47115 qRT-PCR R                 GCTCCATAGCAAGGGATATGAACAAT
60   PsTT2 qRT-PCR F                     ACGAGACAGAGACAGAGACAGCAA
61   PsTT2 qRT-PCR R                     TGTTACTCGTCGAATCACTCGCCA
62   PsTT8/PsA qRT-PCR F                 TCCTGTGTTGGATGGTGTCGTTGA
63   PsTT8/PsA qRT-PCR R                 TGGTGGCAAAGAGTGGTGGTCTAT
64   PsTTG1/PsA2 qRT-PCR F               ACTCCTTTGCTTCGTTTGGCTTGG
65   PsTTG1/PsA2 qRT-PCR R               GCAATCGCATTAACACCAGCACGA
66   PsTTG2 qRT-PCR F                    GTCTAGTAGAGCCACGGATCGTGAT
67   PsTTG2 qRT-PCR R                    TCGATTGCTCTCTCGACGTGCT
68   PsPAR qRT-PCR F                     AGCTCCAAGGCCAAGGAGATTGAA
69   PsPAR qRT-PCR R                     ATCGTTGTCGATGAGGAAATCGGC
70   PsPAL qRT-PCR F                     CTGACGATCCTTGTCTAGCTACATACCC
71   PsPAL qRT-PCR R                     ACCTTCCACATTCACCAACGCA
72   PsContig03511 qRT-PCR F             TCACATACCCTGCTACGGCATCT
73   PsContig03511 qRT-PCR R             CGAGAAGGCATGAGGGAAGAATCTAGG
74   PsContig11789 qRT-PCR F             TCATGGTCTTGGTCCTTACACTGG
75   PsContig11789 qRT-PCR R             GCATAGTTCTGACCTGTTGGATGAGG
76   PsContig13014 qRT-PCR F             GGTGGCTGTGCTGAAACAAGAACT
77   PsContig13014 qRT-PCR R             AATGACCTCTCTGCTGCTTCCTGT
78   PsContig12490 qRT-PCR F             GGTGGCGGTTGAAGTTGAAGTGTC
79   PsContig12490 qRT-PCR R             CCTTCTCTCGCATCACTCGATTCAAG
80   PsContig10591 qRT-PCR F             TTTACCTCCGATTCGTGTCCGTGT
81   PsContig10591 qRT-PCR R             CAGGAATTGCAGCCATAGCCACAT
82   PsContig07172 qRT-PCR F             GTTCTTGTCCTGCTGTGTCCCATA
83   PsContig07172 qRT-PCR R             AAACCAGGAACCAACCATAGGAGC
84   PsContig14173 qRT-PCR F             GGCTGAAATGGATTTGGAAGG
85   PsContig14173 qRT-PCR R             CACCTAGTCCACCTTTGATTCC
86   PsContig19292 qRT-PCR F             TCCAAGGGCTTCCATTCATCCTCA
87   PsContig19292 qRT-PCR R             ATCCATCAAGCAGCTATACGGGCA
88   PsContig11924 qRT-PCR F             ATCTATGCACAGCAGCTTGTCCGA
89   PsContig11924 qRT-PCR R             ATGTCACAATTGGTGGCACCTTCC
90   PsContig12107 qRT-PCR F             AGGAGCAAAGCCAGTGTTCTCTGT
91   PsContig12107 qRT-PCR R             TGAGAGCTGTCCAGTGCCATCAAT
92   PsContig12117 qRT-PCR F             TGATTGGACCTTCCTTTCGAGGCA
93   PsContig12117 qRT-PCR R             TGTTGGAGAGGAACAGAAGTGCGA
94   PsContig09978 qRT-PCR F             TGGATTGATGCCTTCCGATC
95   PsContig09978 qRT-PCR R             CAGTTTGACATGGATTTGCCG

(a) Forward and reverse primers are abbreviated as F and R; gsp means gene-specific primer.


Chapter Three: Analysis of Pisum sativum PA chemical profile and characterization of anthocyanidin reductase and leucoanthocyanidin reductase

3.1 Introduction

Pisum sativum L. (pea) seeds are a rich source of minerals, proteins, starch and

antioxidants in the human diet. Dry pea seeds are widely used in agriculture as feed for livestock

and are gaining interest as feed in aquaculture. Pea seeds are also gaining wide recognition as a

health food due to the low glycemic index of the starches (Guillon and Champ, 2002).

Furthermore, certain varieties of pea accumulate significant quantities of proanthocyanidins

(PAs) in their seed coats (Wang et al., 1998). Historically, PAs were considered anti-nutritional compounds because they can precipitate proteins and reduce bioavailability of some minerals

(Griffiths, 1981, Gatel and Grosjean, 1990). However, recent research suggests that PAs have

considerable potential for use as a novel therapy or treatment for a range of human health

conditions, including cardiovascular disease, cancer establishment and progression, and bacterial infections (Dixon et al., 2005). The inclusion of PAs as plant-based health-beneficial components in the human diet has led to renewed interest in this class of flavonoids in food crops

(Rasmussen et al., 2005, Lee et al., 2008). Specifically, studies indicate that PA polymer length

is inversely related to bioavailability, the ability to be absorbed by the body (Manach et al., 2005).

Therefore, identification of variations in PA content and polymer length within Pisum sativum,

as well as the mechanisms responsible for this variation would be a great benefit for breeding

new cultivated varieties (cultivars) with additional beneficial health properties.

PAs are derived from the flavonoid branch of the phenylpropanoid pathway (Fig. 1.1).

Chemical diversity can be introduced early in the pathway by two regioselective cytochrome

P450 monooxygenases, flavonoid 3'-hydroxylase and flavonoid 3',5'-hydroxylase (Fig 1.1),


which hydroxylate the 3'- or 3',5'-positions, respectively, of the B-ring of naringenin (Holton et

al., 1993, Brugliera et al., 1999). Two consecutive reactions by flavanone 3-hydroxylase (Britsch

and Grisebach, 1986) and dihydroflavonol 4-reductase synthesize colorless flavan-3,4-diols

(leucoanthocyanidins; Reddy et al., 1987), which are further converted to 2,3-cis-flavan-3-ols through the sequential reactions of anthocyanidin synthase (ANS; Pelletier et al., 1997) and anthocyanidin reductase (ANR; Xie et al., 2003), or to 2,3-trans-flavan-3-ols by leucoanthocyanidin reductase (LAR; Fig. 1.1; Tanner et al., 2003). Flavan-3-ols are believed to be the monomeric precursors from which PA polymers are formed (Dixon et al., 2005).

The ability of LAR to synthesize trans-flavan-3-ols in vitro is well established (Tanner et al., 2003, Bogs et al., 2005), and the lack of trans-flavan-3-ols in Arabidopsis thaliana, in which no LAR gene has been identified, is consistent with the same role in vivo (Lepiniec et al., 2006).

However, a number of questions remain regarding the in vivo metabolic flux through LAR and attempts to express LAR in heterologous species have produced mixed results. Constitutive expression of Desmodium uncinatum LAR in tobacco (Nicotiana tabacum) and white clover

(Trifolium repens) yielded only trace levels of trans-flavan-3-ols, despite high in vitro activity of

DuLAR (Tanner et al., 2003). Similarly, transgenic tobacco constitutively expressing Medicago truncatula LAR failed to accumulate trans-flavan-3-ols or PAs (Pang et al., 2007). In contrast, the constitutive expression of black cottonwood (Populus trichocarpa) PtcLAR3 in a related species, Chinese white poplar (P. tomentosa Carr.), led to significant ectopic increase in PAs

(Yuan et al., 2012). Similarly, transgenic black cottonwood constitutively expressing PtcLAR1 also showed increased PA levels (Wang et al., 2013). M. truncatula hairy roots constitutively expressing tea (Camellia sinensis) LAR unexpectedly produced high levels of anthocyanins with no evidence of increased PA content (Pang et al., 2013a). However, Pang et al. (2013) were


successful in increasing levels of PA-like compounds in transgenic tobacco co-expressing

CsLAR and an Arabidopsis transcription factor, PRODUCTION OF ANTHOCYANIN

PIGMENTS1 (PAP1), known to dramatically increase anthocyanin accumulation, and therefore flux through the flavonoid pathway. Unexpectedly though, levels of epicatechin (EC), a cis-flavan-3-ol, were increased to a greater degree than levels of catechin (Cat), a trans-flavan-3-ol.

Overall, though the role of LAR as the source of endogenous trans-flavan-3-ols is well accepted, our understanding of metabolic flux through LAR is incomplete.

Much of the PA research in seed coat has been conducted using the non-crop species

Arabidopsis and M. truncatula. However, both of these species produce PA polymers composed almost exclusively of the cis-flavan-3-ol, EC (Abrahams et al., 2003, Pang et al., 2007). Pea offers unique advantages for studying PA biosynthesis. Pea seeds are substantially larger than those of

Arabidopsis and M. truncatula, allowing for easy isolation of the seed coat, the primary site of expression of the core PA pathway enzyme ANR (Debeaujon et al., 2003). A long history of agricultural breeding of pea has produced a wide variety of pea cultivars (Smýkal et al., 2012).

Variations in the quantity of phenolics exist between pea cultivars (Wang et al., 1998), and it is reasonable to predict that PA flavan-3-ol composition and polymer length also vary within Pisum sativum, which may provide a valuable resource for improving desirable PAs by breeding or by biotechnological means.

Despite the importance of pea in the human diet and the possible value of understanding pea PA metabolism, comprehensive chemical and biochemical studies of PA in pea have not been reported to date. As the first step in advancing knowledge of PA metabolism in pea, analyses of the PA profile of three PA-accumulating cultivars (PACs; ‘Courier’, ‘LAN3017’ and ‘Solido’)


and biochemical characterizations of the two key PA branch point enzymes, ANR and LAR (Fig.

1.1), from the PA-rich cultivar, ‘Courier’, were conducted.

This work was performed in collaboration with Dr. Alena Jin, at the time a doctoral student in the lab of Dr. Jocelyn Ozga at the University of Alberta. Dr. Jin conducted the PA chemical analyses in pea seed coat. The results of her work are presented here to better place the enzyme characterization and gene expression data in a biological context.

3.2 Results

3.2.1 Proanthocyanidin profile of Pisum sativum cultivars

In pea seed coat, PAs primarily accumulate in the epidermal and ground parenchyma cell

layers (Alena Jin, unpublished data). The PA content and subunit composition of the seed coats

from the three PACs and one PA-lacking cultivar, ‘Canstar’ (Fig. 3.1) were determined by acid-

catalyzed cleavage followed by phloroglucinol derivatization (phloroglucinolysis; Table 3.1;

Kennedy and Jones, 2001). This method allows the determination of PA subunit composition and

concentration by the comparison of the retention properties of reaction products with those of

flavan-3-ol standards and other well characterized PA phloroglucinol reaction products. Flavan-3-ol PA extension units form phloroglucinol adducts at their C4 position while terminal flavan-3-ol units are released as flavan-3-ol monomers (Fig. 1.1), the ratio of which allows determination of the mean degree of polymerization (mDP; average length of the polymers). No PA subunits were detected in the reverse phase-high performance liquid chromatography (RP-HPLC) of

‘Canstar’ seed extracts (Fig. 3.1). This lack of PA results in a clear-coloured seed coat in

‘Canstar’, in contrast to the solid brown or brown-speckled seed coats of the three PACs, in which PAs were detected (Fig. 4.1 and Table 3.1).
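The mDP calculation implied by phloroglucinolysis can be illustrated with a short sketch. It assumes the commonly used relation mDP = (extension units + terminal units) / terminal units on a molar basis; the function names and the subunit amounts in the usage note are hypothetical and are not values from Table 3.1.

```python
def mean_degree_of_polymerization(extension_nmol, terminal_nmol):
    """mDP from molar amounts of extension units (detected as phloroglucinol
    adducts) and terminal units (released as free flavan-3-ol monomers)."""
    total_extension = sum(extension_nmol.values())
    total_terminal = sum(terminal_nmol.values())
    if total_terminal <= 0:
        raise ValueError("at least one terminal unit is required")
    return (total_extension + total_terminal) / total_terminal


def molar_percent(subunits_nmol):
    """Express each subunit as a percentage of the summed molar amount."""
    total = sum(subunits_nmol.values())
    return {name: 100.0 * amount / total
            for name, amount in subunits_nmol.items()}
```

For example, hypothetical extension amounts of 30, 48 and 2 nmol (GC-P, EGC-P, EC-P) with terminal amounts of 12 and 8 nmol (GC, EGC) give 100 nmol of subunits over 20 nmol of chain ends, that is, an mDP of 5.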


The three PACs accumulated similar levels of solvent soluble PAs, though the amount in

‘LAN3017’ appeared slightly higher than in ‘Solido’ and ‘Courier’ when the butanol-HCl method was used (Table 3.1). The total soluble PA yield from whole seed extracts was also calculated using the PA extract yield values and the conversion yield of PAs to known subunits with data from the phloroglucinolysis method (Table 3.1; Porter et al., 1985). The lower PA content values obtained in the whole seed extracts compared to the seed coat extracts were the result of: 1) PA localization in the seed coat and not the embryo of the seeds for all cultivars, 2) a larger ratio of embryo to seed coat tissue in the seeds of ‘Solido’, and 3) decreased solubility of the longer PA polymers of ‘LAN3017’ in the extraction solvent used in the phloroglucinolysis procedure compared to the shorter PA polymers present in ‘Courier’ and ‘Solido’. Therefore, the butanol-HCl assay of total soluble PA content in the seed coat was the method of choice for determining PA content differences among these cultivars.

The composition of the PAs differed among the three PACs. The extension subunits in

‘Courier’ and ‘Solido’ were similar, being mainly composed of prodelphinidin-derived flavan-3-ols (3',4',5'-hydroxylated B-ring; Fig. 1.1; Table 3.1), gallocatechin (GC-P; -P denoting the 4α-2-phloroglucinol adduct; peak 3) and epigallocatechin (EGC-P; peak 4, Fig 3.1; Table 3.1), with the latter being more abundant, and only minor amounts of the procyanidin-derived (3',4'-hydroxylated) flavan-3-ol, epicatechin (EC-P; peak 8, Fig 3.1; Table 3.1). The terminal subunits

of ‘Courier’ and ‘Solido’ PAs were primarily GC (peak 5, Fig 3.1; Table 3.1) and EGC (peak 9,

Fig 3.1; Table 3.1), with the former being more abundant. Overall, the PA profile of ‘Solido’ was

very similar to ‘Courier’, with the addition of a trace amount of EC terminal units (peak 11, Fig

3.1; Table 3.1) and Cat extension units (Cat-P; peak 7, Fig 3.1; Table 3.1). On the other hand, the

PA profile of ‘LAN3017’ was qualitatively different from those of ‘Courier’ and ‘Solido’.


‘LAN3017’ extension and terminal units were almost exclusively composed of the procyanidin

flavan-3-ols, EC and Cat (Cat terminal unit, peak 10, Fig 3.1; Table 3.1), with the former being more abundant. Only minor amounts of GC and EGC extension units were detected.

Remarkably, the mDP of PA from ‘LAN3017’ was approximately 2-3 fold greater than those from ‘Courier’ and ‘Solido’ (Table 3.1). The identities of the PA subunits detected in the HPLC analysis were further confirmed by liquid chromatography–tandem mass spectrometry (LC-MS/MS; Table 3.2). The PA extension and terminal subunits in the PA-containing pea cultivars were assumed to be linked in a B-type configuration (C4→C8 or C4→C6; Fig 1.2), as the PA

interflavonoid bonds were readily cleaved under the acidic conditions.

No PAs were detected in ‘Canstar’ in these analyses. These results were consistent

with the phenotypic observation that the ‘Canstar’ seed coat does not accumulate dark pigments

(Fig. 4.1).

3.2.2 ‘Courier’ proanthocyanidin profile over seed coat development

To acquire further understanding of PA accumulation in pea seed coat, the content and composition of ‘Courier’ PAs were examined during seed coat development. The comparative abundance of EC and EC-P to EGC and EGC-P remained relatively constant from 12-30 days

after anthesis (DAA), the point at which the flower fully opens (Table 3.3). However, between

12-25 DAA, GC-P showed an approximately 50% increase in molar percentage concomitant

with an approximately 10% decrease in EGC-P. ‘Courier’ PA mDP increased between 12-15

DAA, but then remained relatively constant at a value of approximately 5 (Fig. 3.8D) until

increasing to approximately 7 at maturity (Table 3.1). The quantity of solvent-soluble PAs in


‘Courier’ seed coat increased to a maximum at 20 DAA and then declined until maturity (Fig.

3.8D).

3.2.3 Cloning and characterization of P. sativum ANR

In pea seed coat, high quantities of both cis-flavan-3-ols and trans-flavan-3-ols were identified (Table 3.1). In contrast, the closely related legume species M. truncatula and model plant A. thaliana are not reported to accumulate trans-flavan-3-ols (Lepiniec et al., 2006, Pang et

al., 2007). This implied that both LAR and ANR are highly active in pea seed coat (Fig. 1.1). No biochemical studies of these two key branch enzymes have been conducted in the crop species pea, and thus we pursued biochemical studies of PsANR and PsLAR.

‘Courier’ was chosen as the source of a PsANR clone as this cultivar displayed high PA accumulation as well as significant quantities of cis-flavan-3-ols (Table 3.1). A full-length

PsANR clone was retrieved from ‘Courier’ seed coat complementary DNA (cDNA) by polymerase chain reaction (PCR) with degenerate primers, followed by rapid amplification of cDNA ends

(5'-, 3'-RACE). PsANR contains a 1,017-bp open reading frame (ORF), and the encoded protein shares 84% and

60% amino acid identity with MtANR and AtANR, respectively. PsANR is highly conserved between ‘Courier’, ‘LAN3017’ and ‘Solido’ cultivars, differing by only a single amino acid in

‘LAN3017’ (position 28, Gln to Glu) and in ‘Solido’ (position 327, Ile to Val).

To examine the catalytic activity of PsANR, PsANR was expressed as an N-terminal six-histidine-tagged recombinant protein and purified using a Ni-NTA column (Fig. 3.2A). Based on the pea PA metabolite data, the primary in planta substrate of PsANR in ‘Courier’ was expected to be the 3',4',5'-hydroxylated anthocyanidin, delphinidin (Table 3.1; Fig. 1.1). Therefore, delphinidin as well as two related compounds, 3',4'-hydroxylated cyanidin and 4'-hydroxylated pelargonidin, were assessed as substrates of recombinant PsANR. When the PsANR enzymatic


products were analyzed by LC-MS/MS, they showed identical MS fragmentation and co-

chromatographic patterns with authentic standards (Fig. 3.3, right panel). No flavan-3-ol product was detected when NADPH was omitted or when the protein was boiled prior to the assay (Fig.

3.4). These results showed that all three compounds can be efficiently used as substrates to produce cis-flavan-3-ols. The optimal pH (using citrate/phosphate and Tris-HCl buffers) and temperature for PsANR activity were determined to be 7.0 and 40°C, respectively. Under the optimized reaction conditions, the kinetic properties of PsANR for the three substrates were further determined (Fig.

3.3; Table 3.4). The rates of the respective product formation (i.e., cis-flavan-3-ols) from substrates fit well to the Michaelis-Menten kinetics model with minor variations in affinity and turnover number. PsANR showed comparable kcat values for all three substrates, ranging from 0.5-1.2 × 10⁻³ sec⁻¹. However, the Km values for pelargonidin and cyanidin as substrates were approximately 5-fold lower than for delphinidin, making the overall kinetic efficiency of PsANR for delphinidin 3-7 fold lower than for pelargonidin and cyanidin.
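The relationship between Km, kcat and kinetic efficiency used in this comparison can be made explicit with a small sketch. The Michaelis-Menten rate law and the kcat/Km efficiency term are standard; the numerical values below are hypothetical placeholders chosen only to show a 5-fold Km difference, and are not the measured parameters in Table 3.4.

```python
def michaelis_menten_rate(vmax, km, substrate):
    """Initial rate v = Vmax * [S] / (Km + [S])."""
    return vmax * substrate / (km + substrate)


def catalytic_efficiency(kcat, km):
    """Kinetic (catalytic) efficiency, kcat / Km."""
    return kcat / km


# Hypothetical parameters: equal kcat, 5-fold difference in Km.
KCAT_PER_SEC = 1.0e-3
KM_LOW_UM = 10.0    # placeholder for a substrate with the lower Km
KM_HIGH_UM = 50.0   # placeholder for a substrate with a 5-fold higher Km

fold_difference = (catalytic_efficiency(KCAT_PER_SEC, KM_LOW_UM)
                   / catalytic_efficiency(KCAT_PER_SEC, KM_HIGH_UM))
```

With identical kcat values the efficiency ratio equals the Km ratio (5-fold here). Allowing kcat to vary between substrates, as it does across the range reported above, shifts that ratio up or down.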

Interestingly, it was recently reported that ANRs from grape (Vitis vinifera) and tea have an intrinsic epimerase activity, synthesizing trans-flavan-3-ols as well as cis-flavan-3-ols

(Gargouri et al., 2010, Pang et al., 2013a). In this case, ANR can contribute to the formation of trans-flavan-3-ols. However, we observed no evidence for PsANR epimerase activity, as trans-flavan-3-ol products were not observed using cis-flavan-3-ols (EC and EGC) as substrates in the

PsANR recombinant enzyme assays.

3.2.4 Transcriptomics of P. sativum seed coat

Although PsANR activity could be evaluated using commercial substrates, LAR substrates (leucoanthocyanidins; Fig. 1.1) are not stable or commercially available. Due to the


lack of substrate, PsLAR activity was examined using substrates enzymatically synthesized by

recombinant PsDFR. However, neither PsDFR nor PsLAR clones were initially present in the publicly available sequence databases, nor were sufficiently similar homologs available to enable degenerate PCR. During the progress of this work, two transcriptome shotgun assemblies (TSA) from garden and field pea, generated by next-generation sequencing (NGS; Roche/454 sequencing platform), were released to NCBI (Franssen et al., 2011, Kaur et al., 2012). In these datasets, a full-length PsDFR could be identified, but a full-length PsLAR clone was still missing.

To facilitate further studies of PA metabolism, pea seed coats were physically isolated and pooled from ripening fruits between 10 and 25 DAA. A small-scale NGS run (a quarter plate) was performed using the Roche/454 sequencing method (Section 1.5). A total of

40,903 reads with an average read length of 392 bp were generated, and these individual reads were assembled using the MIRA algorithm to yield 16,272 unigenes (5,766 contigs and 10,506 singletons; Chevreux et al., 1999). These unigenes were annotated by BLASTx against TAIR and UniProt protein sets through the FIESTA bioinformatics pipeline (National Research

Council Plant Biotechnology Institute, Canada). With an E-value cut-off of 10⁻², the unigenes showed 9,702 and 9,420 hits against the TAIR and UniProt protein sets, respectively.

The number of reads comprising a unigene gives an approximate estimation of the level of expression of the transcript represented by the unigene (Section 1.5). The annotated unigenes were ranked according to the number of reads composing each unigene (Table 3.5). The unigenes representing the top 20 most highly expressed genes included 1-aminocyclopropane-1-carboxylate oxidase (ethylene biosynthesis; ranked 2nd), indole-3-acetic acid amido synthetase

(auxin sequestering; ranked 3rd), methionine synthase (ethylene biosynthetic precursor; ranked


4th), and gibberellin 2β dioxygenase (gibberellin catabolism; ranked 16th). These results are consistent with gene expression changes observed in other studies, including an increase in PsGA2ox1 in the pea seed coat during a similar stage of development (Nadeau et al., 2011) and other hormonal regulation of seed development processes (Sreenivasulu and Wobus, 2013), which provided confidence in the quality of the transcript library and Roche/454 sequencing. Notably, two unigenes annotated as ANR and F3'5' hydroxylase were ranked as the 5th and 6th most abundant

contigs in the database, indicating that PA biosynthesis is a major metabolic pathway in pea seed

coat.

Next, we assessed the coverage of PA metabolic genes represented in our TSA dataset.

The protein sequences of the characterized enzymes involved in PA biosynthesis were curated

from Arabidopsis, alfalfa (Medicago sativa) and petunia (Petunia spp.) and were used as BLASTx queries. The identified contigs and singletons with significant E-value (<10⁻⁵) hits were manually inspected to determine the numbers of reads for each gene. This quantitative analysis revealed that all 12 genes for PA biosynthesis are present in the pea TSA dataset, but their read numbers vary widely, from 2 to 222 out of approximately 40,000 total reads (Table 3.6). In agreement with the PA chemical phenotype of ‘Courier’ (i.e. mostly 3',4',5'-hydroxy flavan-3-ols), F3'5'H transcripts were abundant (186 reads) while F3'H had only two reads. As these two enzymes compete for the common substrate naringenin, this relative transcript abundance supported the delphinidin-derived PA profile of ‘Courier’. In this TSA dataset, DFR was represented by 71 reads and present as a full-length clone, but LAR had only 13 reads and was present as a partial clone.
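The read-count comparison above can be reproduced with a few lines of Python. The counts are those quoted in the text (186, 2, 71 and 13 reads out of 40,903); the function and variable names are illustrative only. Raw read counts are not normalized for transcript length, so this remains an approximate measure of expression.

```python
TOTAL_READS = 40_903  # total Roche/454 reads reported for the library

READ_COUNTS = {
    "F3'5'H": 186,
    "F3'H": 2,
    "DFR": 71,
    "LAR": 13,
}


def reads_per_ten_thousand(count, total=TOTAL_READS):
    """Scale a raw read count to reads per 10,000 library reads."""
    return count * 10_000 / total


def rank_by_reads(counts):
    """Gene names ordered from most to fewest supporting reads."""
    return sorted(counts, key=counts.get, reverse=True)


# Ratio of the two competing B-ring hydroxylase transcripts.
hydroxylase_ratio = READ_COUNTS["F3'5'H"] / READ_COUNTS["F3'H"]
```

On these numbers the F3'5'H to F3'H read ratio is 93, in line with the prodelphinidin-dominated profile of ‘Courier’.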


3.2.5 Cloning and characterization of P. sativum LAR

The deduced protein encoded by the full-length PsDFR (1,029-bp ORF) was estimated to be approximately 38.4 kDa, and it shows 89% and 70% amino acid identity to M. truncatula and Arabidopsis DFR, respectively. Contigs representing PsLAR lacked a portion of the 5'-sequence, and hence the full-length PsLAR (1,056-bp ORF) was recovered by 5'-RACE.

The encoded PsLAR protein sequence, predicted to be approximately 38.8 kDa, is 85% and 67% identical to M. truncatula LAR and D. uncinatum LAR, respectively. The LAR characteristic amino acid motifs RFLP, ICCN, and THD were conserved in the PsLAR protein sequence (Fig.

3.5; Bogs et al., 2005).
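As a rough consistency check on the ORF lengths and predicted protein sizes given above, the protein mass can be approximated from the ORF length alone. This sketch assumes that the stated ORF lengths include the stop codon and uses the rule-of-thumb average of about 110 Da per amino acid residue; both are assumptions, and a mass computed from the actual sequence will differ somewhat.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def residues_from_orf(orf_length_bp, includes_stop_codon=True):
    """Number of encoded amino acids for an ORF of the given length."""
    if orf_length_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_length_bp // 3
    return codons - 1 if includes_stop_codon else codons


def approximate_mass_kda(orf_length_bp, includes_stop_codon=True):
    """Approximate protein mass in kDa from ORF length."""
    residues = residues_from_orf(orf_length_bp, includes_stop_codon)
    return residues * AVERAGE_RESIDUE_MASS_DA / 1000.0
```

For the 1,029-bp PsDFR and 1,056-bp PsLAR ORFs this gives 342 and 351 residues, and approximate masses within a few percent of the 38.4 and 38.8 kDa predictions.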

To examine PsLAR catalytic activities, PsLAR and PsDFR were expressed as recombinant proteins and purified using the same method as for PsANR (Fig. 3.2B). Purified

PsDFR was used to provide PsLAR substrate in vitro. PsDFR and PsLAR were mixed at a 2:1 molar ratio, and a DFR substrate, dihydroquercetin (DHQ) or dihydromyricetin (DHM), was added to the reaction under optimized conditions (40˚C and a slightly acidic pH of 6). The formation of the predicted trans-flavan-3-ol (Cat or GC, respectively) was then analyzed by LC-MS/MS in comparison to the authentic standards (Fig 3.6 right panel). Various negative controls

(i.e., boiled enzyme, minus NADPH, PsDFR or PsLAR alone) did not synthesize the predicted compounds (Fig. 3.7A); only co-incubation of PsDFR and PsLAR synthesized products displaying the [M+H]+ ion for Cat (m/z = 291) or GC (m/z = 307). In addition to Cat and GC, early eluting compounds (peak 3 and 7 in Fig 3.6) were detected in each assay. These compounds showed parent ion m/z values identical to the expected products but were not detected in any of the negative controls. However, commercial Cat and GC standards also displayed these compounds

(peak 1 and 5 in Fig 3.6), suggesting that the early eluting compounds originated from Cat and


GC after the products were synthesized by PsLAR. While the early eluting compounds were

almost always observed, they were not present in every sample, allowing positive identification

of the authentic product peak. The identities of these compounds were not further investigated.

Coupled assays with increased molar ratios of DFR to LAR (up to 4:1) were tested, but showed

no difference in final product formation (Fig. 3.7B). This result indicated that PsLAR catalytic

activity was the rate-limiting step in the coupled assay.
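The m/z values used above to identify the Cat and GC products follow directly from their molecular formulas. The sketch below assumes the standard formulas C15H14O6 (catechin) and C15H14O7 (gallocatechin) and singly protonated ions; the function name is illustrative.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC_MASS = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON_MASS = 1.00727646

CATECHIN = {"C": 15, "H": 14, "O": 6}
GALLOCATECHIN = {"C": 15, "H": 14, "O": 7}


def protonated_mz(formula):
    """m/z of the singly charged [M+H]+ ion for an elemental composition."""
    neutral_mass = sum(MONOISOTOPIC_MASS[element] * count
                       for element, count in formula.items())
    return neutral_mass + PROTON_MASS
```

Rounded to unit resolution, these give the nominal values of 291 for Cat and 307 for GC.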

Overall, the coupled assays showed efficient conversions of the substrates DHQ and

DHM. When the coupled assays were performed at 65 μM (peak efficiency), 36% conversion of DHQ to Cat and 12% conversion of DHM to GC were observed. Due to the inaccuracy of calculating kinetic properties from the coupled assays, only ‘pseudo-kinetics’ could be estimated

by plotting product formation rate in relation to varying substrate (DHM or DHQ) concentrations

(Fig. 3.6 left panel). Conventional determination of Michaelis–Menten kinetics cannot be

conducted as the concentration of the reaction intermediate (leucoanthocyanidin) has to be

inferred from the concentration of DFR substrate (dihydroflavonol) used. Thus, the term pseudo-kinetics is used here to refer to the rate of conversion of dihydroflavonol to flavan-3-ol when PsLAR is the rate-limiting step. In these pseudo-kinetic analyses, the product synthesis rates (Vmax) of the coupled assays were 2 to 3-fold lower than those from PsANR but still

comparable (compare Fig. 3.3 and Fig. 3.6; Table 3.4). As observed for PsANR activity, the

substrate with a lower degree of B-ring hydroxylation (DHQ) was converted more efficiently, even though DHM is the expected native substrate in ‘Courier’. The confirmation of an active

PsLAR protein when coupled to PsDFR in vitro is therefore consistent with the abundance of 2,3-trans-flavan-3-ols found in the PACs.


3.2.6 Developmental regulation of PsANR, PsDFR and PsLAR in pea seed coat

Coordinated expression of PsANR and PsLAR influences the PA content and composition

in pea seed coat. To understand the developmental regulation of these two key genes, temporal

expression of PsANR, PsLAR, and PsDFR, from 6-20 DAA in ‘Courier’ seed coat was

determined by quantitative reverse-transcription PCR (qRT-PCR). The transcript abundance of

both PsANR and PsLAR was high during the earlier stages of pea seed coat development (Fig.

3.8A and B). Both genes displayed a decline in expression as the seed coat matured, but PsLAR

transcripts decreased significantly faster than PsANR transcripts. Seed coat transcript levels of PsDFR

(which encodes the enzyme responsible for producing the substrate used by LAR and ANS)

were stable from 6 to 20 DAA, except for a 2-fold increase at 10 DAA (Fig. 3.8C). Maximal PA accumulation in ‘Courier’ seed coat did not immediately follow the transcriptional induction of PsANR and PsLAR, but was reached at 20 DAA (Fig. 3.8D). PA mDP increased to

five by 15 DAA, and it remained at this level to 20 DAA (Fig. 3.8D).

3.2.7 Heterologous expression of PsLAR in Arabidopsis

Arabidopsis lacks LAR and does not accumulate trans-flavan-3-ols (Lepiniec et al.,

2006). In Arabidopsis, all leucoanthocyanidins are channelled to cis-flavan-3-ols by AtANS and

AtANR (Fig. 1.1). To examine whether the expression of PsLAR in Arabidopsis seed coat can re-direct metabolic flux to trans-flavan-3-ols, PsLAR with a FLAG-epitope tag was expressed under the control of an approximately 1.3-kb fragment of the Arabidopsis ANR promoter (Debeaujon et al., 2003). This chimeric construct (PANR-PsLAR-35S terminator) was transformed into wild-type Arabidopsis as

well as an Arabidopsis ANR knockout mutant (anr) identified from a T-DNA knockout database

(Fig. 3.9A). The wild-type Arabidopsis expressing PsLAR was predicted to synthesize PAs


comprised of a mixture of cis- and trans-flavan-3-ols while the PAs of the anr mutant expressing

PsLAR were expected to be exclusively composed of trans-flavan-3-ols.

Arabidopsis PsLAR transformants were screened by kanamycin selection and by PCR of genomic DNA. Subsequently, the presence of PsLAR transcript and its recombinant enzyme were confirmed by reverse transcription (RT)-PCR and immunoblot analysis using anti-FLAG antibodies (Fig. 3.9B and C). The Arabidopsis transgenic lines validated by RT-PCR and immunoblot were further examined for the expected alteration to the PA chemical phenotypes.

The anr mutant expressing PsLAR was used to observe the restoration of seed coat color (brown pigmentation) and to detect the presence of DMACA-reactive products (i.e. PAs or flavan-3-ols); the seeds from wild-type Arabidopsis expressing PsLAR were used to profile the flavan-3-ol units composing the seed coat PAs after phloroglucinol derivatization. Despite the clear evidence of successful transformation, transgene expression and PsLAR accumulation, no complementation of the seed coat color or presence of DMACA-reactive products was observed in the anr mutant background, nor was the presence of a trans-flavan-3-ol (Cat) or its phloroglucinol derivative observed in wild-type Arabidopsis (Fig. 3.10).

3.3 Discussion

3.3.1 Proanthocyanidin in pea (Pisum sativum)

To investigate the PA diversity of pea seeds, an analysis of four pea cultivars was conducted, which demonstrated that there is both qualitative and quantitative PA chemical diversity within Pisum sativum. All three PACs contained PA levels comparable to those found in blueberries, cranberries, sorghum (high tannin whole grain extrudate) and hazelnuts (Gu et al.,

2004). ‘Courier’ and ‘Solido’ PAs are composed primarily of prodelphinidin subunits (tri-

hydroxylated B-ring; Fig. 1.1; Table 3.1), similar to tea (Lin et al., 1996). In contrast,

‘LAN3017’ PAs were composed of procyanidin-type subunits (di-hydroxylated B-ring; Fig. 1.1;

Table 3.1). These differences in PA subunit composition may impact the nutritional quality as

tri-hydroxylated flavan-3-ols (e.g. GC and EGC) have a higher antioxidant potential than di-

hydroxylated forms (e.g. Cat and EC; Rice-Evans et al., 1997). Additionally, ‘LAN3017’ PA

polymers are 2-3 fold longer than those in ‘Courier’ or ‘Solido’. The mechanism controlling PA

polymerization remains unclear, but it is particularly relevant as PA bioavailability after

consumption by animals and humans is inversely related to polymer length (Rasmussen et al.,

2005). In this regard, pea offers a valuable system to investigate the molecular basis for subtle

biochemical differences in PA biosynthesis, and the pea cultivars with different PA profiles can

be integrated into animal and human nutritional studies.

3.3.2 Contribution of LAR to proanthocyanidin biosynthesis in pea and other plants

Since the discoveries of ANR (also known as BANYULS) from Arabidopsis and LAR from

D. uncinatum, it has become well accepted that cis-flavan-3-ols are synthesized by the consecutive reactions of ANS and ANR, whereas trans-flavan-3-ols are synthesized by LAR, from the common substrates flavan-3,4-diols (leucoanthocyanidins; Fig. 1.1; Tanner et al., 2003, Xie et al., 2003). For LAR activity, biochemical data using purified recombinant enzyme from this work and studies with D. uncinatum (Tanner et al., 2003), grape (Bogs et al., 2005), and tea

(Pang et al., 2013a) have shown that LAR efficiently catalyzes the synthesis of trans-flavan-3-ols from leucoanthocyanidins (e.g., leucocyanidin). In the PsDFR-PsLAR coupled assays shown here, the synthesis rates of the LAR products, Cat and GC, from the respective substrates (i.e.,

DFR substrates) were slightly lower than, but still comparable to, the rates of cis-flavan-3-ols

production by PsANR. This is consistent with the pea PA monomer profile of comparable

amounts of trans- and cis-flavan-3-ols, with the latter being slightly more abundant (Table 3.1).

Although LAR has been isolated from several plants, LAR kinetic data is scarce, likely due to the

instability and inaccessibility of its substrates. The only Km values reported are 5-26 μM for three types of leucoanthocyanidins using native D. uncinatum LAR (Tanner et al., 2003). In the coupled assays, the amount of PsLAR substrates (i.e., intermediates in the coupled reaction) is expected to be very low. Thus, the efficient synthesis of PsLAR products observed in the coupled assays implies rapid consumption of low-abundance intermediates by PsLAR and may reflect a high affinity of PsLAR for these substrates. Evidence of this efficiency was seen in two previous studies that employed a DFR-LAR coupled assay (Maugé et al., 2010) and a crude

plant protein extract (Singh et al., 1997), in which no leucoanthocyanidin intermediates were observed.

The biochemical data for PsLAR strongly support its role in production of trans-flavan-3-

ols in pea. However, data from LAR overexpression studies in heterologous plants suggest that

production of trans-flavan-3-ols through LAR in planta may require more than the presence of

enzymatically competent LAR protein. Previously, the expression of D. uncinatum and M.

truncatula LAR by a constitutive cauliflower mosaic virus (CaMV) 35S promoter in two LAR-

lacking (thus, trans-flavan-3-ol free) plants, white clover and tobacco, did not lead to the

synthesis of trans-flavan-3-ols (Tanner et al., 2003, Pang et al., 2007). In the present study,

instead of the constitutive CaMV35S promoter, the Arabidopsis ANR promoter, known to drive

strong gene expression in seed coat (Debeaujon et al., 2003), was used to drive PsLAR

expression in Arabidopsis seed coat. Precise expression of PsLAR by the ANR promoter at the

appropriate developmental stages of seed coat cells, where substrates for PsLAR would be

abundant, was expected to result in the production of trans-flavan-3-ols (Cat). Accordingly,

PsLAR transcript and protein were clearly detected by RT-PCR and immunoblot analysis from the transgenic Arabidopsis siliques (Fig. 3.9); nonetheless, no evidence of trans-flavan-3-ol synthesis was observed in either wild-type Arabidopsis or anr mutant backgrounds (Fig. 3.10).

Very recently, the isolation of C. sinensis LAR was reported and it was constitutively expressed in transgenic tobacco co-expressing Arabidopsis PAP1 transcription factor (Pang et al., 2013a).

PAP1 is known to massively increase the flux of LAR substrates. By employing such flux-enhanced conditions, transgenically produced Cat was detected for the first time in planta.

Further studies are required to build a working model to fully describe the metabolic flux through LAR for the production of trans-flavan-3-ols in planta.

Both PsLAR and PsANR transcript abundance was high early in pea seed coat development (Fig. 3.8A and B). PsLAR showed the highest expression at 6 DAA but its transcripts were rapidly reduced to a basal level by 10-12 DAA. On the other hand, PsANR displayed a longer duration of expression with a substantial level of transcription up to 15 DAA, which overlapped with half-maximal PA accumulation and preceded peak PA accumulation at

20 DAA (Fig. 3.8D). Consistent with this result, the Roche/454-sequencing read number of

PsLAR from the 10-25 DAA seed coat sample was 13-fold lower than that of PsANR (Table 3.6).

Curiously, the molar percent of GC extension units (LAR product) steadily increased in the seed coat PAs from 12-30 DAA (Table 3.3), yet PsLAR transcript was minimal by 12 DAA and remained unchanged until 20 DAA (Fig. 3.8B). This may suggest that a pool of flavan-3-ols (e.g.

GC) is produced early during seed development from which substrates are drawn for PA polymerization later in development. Alternatively, PsLAR may have an unusually long half-life.

A better understanding of PA polymerization is required to address this discrepancy.

Intriguingly, it was reported that VvANR can synthesize not only EC (cis-flavan-3-ol) but also Cat (trans-flavan-3-ol) in a 50:50 molar ratio by intrinsic epimerase activity (Gargouri et

al., 2010), which was also recently observed with CsANR (Pang et al., 2013a). However,

epimerase activity could not be detected from PsANR in our study, suggesting the trans-flavan-

3-ols in pea were not derived from PsANR epimerase activity.

In light of this work and that of others, LARs from different plants have displayed competent biochemical activity to transform leucoanthocyanidins to trans-flavan-3-ols in vitro; however, the expression of LAR in heterologous LAR-free plants still raises questions. In addition, the possibility of an in vivo contribution by ANR epimerase activity to trans-flavan-3-ol production complicates our understanding of PA biosynthesis and requires more meticulous investigation. It appears that the analysis of LAR knockouts in trans-flavan-3-ol accumulating plants (e.g., pea, tea, or grape) is necessary to firmly establish the in vivo function of LAR.

Figure 3.1 HPLC chromatograms of the phloroglucinol acid hydrolysis products from pea seeds of ‘Courier’, ‘LAN3017’, and ‘Canstar’. 1. L-Ascorbic acid; 2. Phloroglucinol; 3. Gallocatechin-(4α-2)-phloroglucinol; 4. Epigallocatechin-(4β-2)-phloroglucinol; 5. Gallocatechin; 6. Putative Catechin-(4β-2)-phloroglucinol; 7. Catechin-(4α-2)-phloroglucinol; 8. Epicatechin-(4β-2)-phloroglucinol; 9. Epigallocatechin; 10. Catechin; 11. Epicatechin.

Figure 3.2 Expression of PsANR, PsLAR, and PsDFR in E. coli and purification of the recombinant proteins. A) Recombinant PsANR. Crude extract (lane 1), washes 1 and 2 (lanes 2 and 3, respectively) and eluate (lane 4). B) Recombinant PsDFR and PsLAR. Lanes 1, 2 and 3 represent crude soluble protein, crude insoluble protein and purified concentrated protein, respectively, in PsDFR-expressing culture. Lanes 4, 5 and 6 represent the corresponding samples from PsLAR-expressing culture. Each lane contains approximately 12-15 µg of protein visualized by Coomassie staining.

Figure 3.3 In vitro characterization of PsANR recombinant enzyme. A-C: PsANR reaction kinetics were explored using cyanidin (A), delphinidin (B) and pelargonidin (C). Left: Michaelis-Menten kinetics plots. Each data point represents means ± SE (n=3). Right: product identification in reference to authentic standards by ESI-LC-MS/MS.

Figure 3.4 PsANR in vitro assay controls. A, minus NADPH control (blue), boiled control (green), complete reaction (black). B, authentic epicatechin. Peak 1, 6.30 min; peak 2, 6.31 min. C, mass spectrum of peak 1. D, mass spectrum of peak 2. Positive ESI product ion scan, EIC 291. Mass spectrum CID (collision-induced dissociation) energy, 12 eV (electron volts).

Figure 3.5 Alignment of LAR protein sequences. Pisum sativum (Ps; KF516485), Medicago truncatula (Mt; XP_003591830.1), Lotus corniculatus (Lc; LAR2-1, ABC71328.1; LAR2-2, ABC71331.1), Desmodium uncinatum (Du; Q84V83.1), Phaseolus coccineus (Pc; CAI56322.1), Vitis vinifera (Vv; CAI26309.1). LAR characteristic amino acid motifs RFLP, ICCN, and THD marked by black bars above the Pisum sativum sequence.

Figure 3.6 In vitro PsDFR and PsLAR coupled assays. Product synthesis rates from the coupled assays were measured using DFR substrates, dihydroquercetin (A) and dihydromyricetin (B). Left: pseudo-kinetics plots were inferred from the coupled assays. Each data point represents means ± SE (n=3); Right: product identification in reference to authentic standards by ESI-LC-MS/MS. Peaks 1 and 3, unknown. Peaks 2 and 4, catechin. Peaks 5 and 7, unknown. Peaks 6 and 8, gallocatechin. Unknown peaks were seen in most, but not all, chromatograms of commercial standards, allowing for positive confirmation of the authentic peak.

Figure 3.7 LC-MS/MS profiles of the PsDFR-PsLAR coupled assay and controls. A) Coupled assay controls. LC-MS/MS extracted ion chromatogram of coupled assay using DHM for detection of gallocatechin ion [M+H]+ m/z 307. 1, PsDFR and PsLAR; 2, minus NADPH control; 3, minus PsDFR control; 4, minus PsLAR control; 5, boiled protein control. B) Comparison of 2:1, 3:1 and 4:1 DFR:LAR ratios. 2:1, black line; 3:1, green line; 4:1, purple line. Arrow, gallocatechin peak.

Figure 3.8 Temporal profiles of PsANR, PsDFR and PsLAR transcript abundance, PA content and mean degree of polymerization in pea seed coats of ‘Courier’. Relative transcript abundance of ‘Courier’ A) PsANR B) PsDFR and C) PsLAR from 6 to 20 DAA using qRT-PCR. Transcript abundance values of PsANR and PsDFR were normalized to the 20 DAA samples; PsLAR values were normalized to the 12 DAA samples. Actin was used as the reference gene in all experiments. Data are means ± SE (PsLAR and PsANR, n=4; PsDFR n=3). D) PA content (black circles) and mean degree of polymerization (mDP; white circles) in developing ‘Courier’ seed coats from 12 to 30 DAA; data are means ± SE (n=3).

Figure 3.9 RT-PCR and immunoblot analyses of PsLAR transcript and protein in Arabidopsis wild-type (WT) and Arabidopsis ANR knock-out (anr) lines. A) Confirmation of anr knockout. Primers flanking the T-DNA insertion site are only able to amplify the wild-type allele from genomic DNA (left) and AtANR transcript is only present in wild-type cDNA (right). TIM44-2 used as loading control. B) RT-PCR of PsLAR transcript (above) and AtDFR (below). C) Western blot detection of FLAG-tagged PsLAR (above) and Coomassie stained SDS-PAGE gels (below).

Figure 3.10 Analysis of seed coat color and PA chemical profile from PsLAR-transgenic Arabidopsis. A) Dry mature seeds (above) and p-dimethylaminocinnamaldehyde (DMACA) stained seeds (below) from GFP vector control lines (Col GFP and ANR KO GFP) and ANR KO-PsLAR transgenic lines (13, 14, and 15) show a lack of proanthocyanidins or flavan-3-ols in the seed coat. During maturation and desiccation, proanthocyanidins oxidize, producing a golden brown colour. DMACA reacts with free flavan-3-ols and proanthocyanidins, producing a dark colour. B) HPLC chromatogram of the phloroglucinol acid hydrolysis products of proanthocyanidins extracted from the mature seeds of Col PsLAR 5. No catechin-phloroglucinol (PA extension units; solid arrow) or catechin (PA terminal units; dashed arrow) was detected in Col LAR5. ‘LAN3017’ seed coat proanthocyanidins were used as a control for catechin- and epicatechin-phloroglucinol. Peak 1, catechin-phloroglucinol. Peak 2, epicatechin-phloroglucinol. For catechin (peak 3) and epicatechin (peak 4), commercial standards were used.

Table 3.1 PA chemical analyses of ‘Courier’, ‘Solido’ and ‘LAN3017’ pea seeds and seed coats.

PA analysis using phloroglucinolysis and RP-HPLC-DAD in mature pea seeds

Peak ID   Compound             ‘Courier’        ‘Solido’         ‘LAN3017’
3         GC-P                 29.23 ± 0.73ᵃ    28.99 ± 0.88     1.36 ± 0.02
4         EGC-P                55.38 ± 1.05     51.16 ± 1.27     0.79 ± 0.05
5         GC                   9.88 ± 0.29      10.53 ± 0.14     ndᵇ
6         Cat-P isomer         nd               nd               4.98 ± 0.00
7         Cat-P                nd               0.22 ± 0.01      21.41 ± 0.01
8         EC-P                 0.37 ± 0.01      0.60 ± 0.02      65.37 ± 0.03
9         EGC                  5.13 ± 0.04      8.24 ± 0.21      nd
10        Cat                  nd               nd               0.94 ± 0.00
11        EC                   nd               0.27 ± 0.02      5.15 ± 0.02

          mDP                  6.7 ± 0.2        5.3 ± 0.1        16.4 ± 0.0
          Conversion yieldᶜ    83.9 ± 1.6       78.3 ± 4.9       59.1 ± 0.9
          Total seed PAᵈ       416.0 ± 7.7      264.1 ± 14.6     96.7 ± 13.2

Butanol-HCl quantification of PA content from pea seed coats

          Total seed coat PA (%)ᵉ   4.57 ± 0.03   4.51 ± 0.09   5.10 ± 0.07

ᵃ Molar % ± SE (n=2); ᵇ nd, not detected; ᶜ Yield of PA extract calculated. ᵈ Total seed PA content based on characterized PA subunits, expressed as mg/100 g dry weight of whole seeds. GC-P, gallocatechin-(4α→2)-phloroglucinol; GC, gallocatechin; EGC-P, epigallocatechin-(4β→2)-phloroglucinol; EGC, epigallocatechin; Cat-P, catechin-(4α→2)-phloroglucinol; Cat, catechin; EC-P, epicatechin-(4β→2)-phloroglucinol; EC, epicatechin. PAs were extracted using 66% acetone. ᵉ Total seed coat PA content expressed as % = mg/100 mg dry weight of seed coat sample using 80% methanol extraction. Proanthocyanidin extract from ‘CDC Acer’ pea seed coats purified as described by Jin et al. (2012) was used as a standard for the butanol-HCl assay. mDP, mean degree of polymerization. Data are means ± SE (n=3).
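The mDP values in Table 3.1 can be cross-checked against the molar percentages in the same table. The sketch below assumes the usual phloroglucinolysis definition, in which mDP is the sum of all subunits (extension units recovered as phloroglucinol adducts plus terminal units recovered as free flavan-3-ols) divided by the terminal units. The names SUBUNITS and mean_degree_of_polymerization are hypothetical, the putative Cat-P isomer is counted as an extension unit, and "nd" entries are treated as zero; these are assumptions of the illustration rather than the procedure documented in this thesis.

```python
# Molar % of each phloroglucinolysis product, copied from Table 3.1.
# "nd" (not detected) entries are treated as 0.0 (an assumption of this sketch).
# Extension units are recovered as phloroglucinol adducts (-P);
# terminal units are recovered as free flavan-3-ols.
SUBUNITS = {
    "Courier": {
        # Order: GC-P, EGC-P, Cat-P isomer, Cat-P, EC-P
        "extension": [29.23, 55.38, 0.0, 0.0, 0.37],
        # Order: GC, EGC, Cat, EC
        "terminal": [9.88, 5.13, 0.0, 0.0],
    },
    "Solido": {
        "extension": [28.99, 51.16, 0.0, 0.22, 0.60],
        "terminal": [10.53, 8.24, 0.0, 0.27],
    },
    "LAN3017": {
        "extension": [1.36, 0.79, 4.98, 21.41, 65.37],
        "terminal": [0.0, 0.0, 0.94, 5.15],
    },
}


def mean_degree_of_polymerization(extension, terminal):
    """Return mDP = (extension units + terminal units) / terminal units."""
    total_terminal = sum(terminal)
    if total_terminal <= 0:
        raise ValueError("at least one terminal unit is required")
    return (sum(extension) + total_terminal) / total_terminal


MDP = {
    cultivar: mean_degree_of_polymerization(units["extension"], units["terminal"])
    for cultivar, units in SUBUNITS.items()
}

for cultivar, value in MDP.items():
    print(f"{cultivar}: mDP = {value:.1f}")
```

Under these assumptions the recomputed values round to 6.7, 5.3 and 16.4 for ‘Courier’, ‘Solido’ and ‘LAN3017’, respectively, matching the mDP row of Table 3.1.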

Table 3.2 Characterization of phloroglucinolysis products from pea seeds using LC-MS-MS analysis.

Compoundᵃ       Cultivar     tR (min)ᵇ   [M-H]⁻ᶜ   Fragment ions
GC-P            ‘Solido’     7.4         429       303, 261, 177
EGC-P           ‘Solido’     9.6         429       303, 261, 177
GC              ‘Solido’     16.8        305       219, 137
GC              Standard     18.1        305       231, 219, 179
Cat-P isomer    ‘LAN3017’    17.2        413       287, 261, 161, 135
Cat-P           ‘LAN3017’    21.3        413       287, 261, 217, 175
EC-P            ‘LAN3017’    22.1        413       287, 261, 175
EGC             ‘Solido’     33.4        305       219, 137
EGC             Standard     34.4        305       219, 179, 137
Cat             ‘LAN3017’    33.3        289       245, 173, 137
Cat             Standard     33.2        289       245, 205, 137
EC              ‘LAN3017’    44.3        289       245, 137

ᵃ GC-P, gallocatechin-(4α→2)-phloroglucinol; EGC-P, epigallocatechin-(4β→2)-phloroglucinol; EC-P, epicatechin-(4β→2)-phloroglucinol; Cat-P, catechin-(4α→2)-phloroglucinol; Cat, catechin; GC, gallocatechin; EGC, epigallocatechin; EC, epicatechin. ᵇ Retention time on LC-MS. The gallocatechin and epigallocatechin standards were run at different times from the legume samples, leading to some variation in retention time between the same compounds in the standards and samples. ᶜ MS was run in the negative mode and all the molecular ions are [M-H]⁻.

Table 3.3 PA profiles in developing seed coats of ‘Courier’. PA content was determined using the phloroglucinolysis and RP-HPLC-DAD analysis method.

           GC-P           EGC-P         EC-P           GC            EGC            EC
12 DAAᵃ    20.5 ± 1.9ᵇ    55.8 ± 1.9    0.42 ± 0.02    17.3 ± 0.4    5.65 ± 0.26    0.38 ± 0.03
15 DAA     24.8 ± 0.7     55.2 ± 0.1    0.38 ± 0.03    15.0 ± 0.3    4.35 ± 0.36    0.32 ± 0.10
20 DAA     27.5 ± 0.6     52.9 ± 0.7    0.36 ± 0.04    14.4 ± 0.7    4.47 ± 0.19    0.29 ± 0.07
25 DAA     29.2 ± 0.6     51.6 ± 1.0    0.22 ± 0.19    14.3 ± 0.4    4.51 ± 0.16    0.21 ± 0.01
30 DAA     30.8 ± 1.7     49.8 ± 2.2    0.31 ± 0.04    14.1 ± 0.4    4.76 ± 0.21    0.19 ± 0.05

ᵃ DAA, days after anthesis; ᵇ Molar % ± SE (n=3). GC-P, gallocatechin-(4α→2)-phloroglucinol; EGC-P, epigallocatechin-(4β→2)-phloroglucinol; EC-P, epicatechin-(4β→2)-phloroglucinol; GC, gallocatechin; EGC, epigallocatechin; EC, epicatechin.

Table 3.4 PsANR reaction kinetics using cyanidin, pelargonidin or delphinidin as a substrate.

Substrate       Km (µM)         Vmax (nmol mg⁻¹ min⁻¹)   kcat (sec⁻¹)   kcat/Km (M⁻¹ sec⁻¹)
Pelargonidin    39.0 ± 0.1ᵃ     72.5 ± 2.2               1.2 × 10⁻³     30.7
Cyanidin        37.0 ± 0.2      30.2 ± 5.0               0.5 × 10⁻³     13.5
Delphinidin     183.6 ± 0.2     47.3 ± 6.4               0.8 × 10⁻³     4.3

ᵃ Means ± SE (n=3).
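The catalytic efficiencies in Table 3.4 follow from dividing kcat by Km after converting Km from µM to M. The sketch below repeats that arithmetic using the rounded table values; the names PSANR_KINETICS and catalytic_efficiency are hypothetical and serve only this illustration.

```python
# kcat (sec^-1) and Km (µM) for recombinant PsANR, copied from Table 3.4.
PSANR_KINETICS = {
    "Pelargonidin": {"kcat": 1.2e-3, "km_uM": 39.0},
    "Cyanidin": {"kcat": 0.5e-3, "km_uM": 37.0},
    "Delphinidin": {"kcat": 0.8e-3, "km_uM": 183.6},
}


def catalytic_efficiency(kcat_per_sec, km_micromolar):
    """Return kcat/Km in M^-1 sec^-1 from kcat (sec^-1) and Km (µM)."""
    if km_micromolar <= 0:
        raise ValueError("Km must be positive")
    km_molar = km_micromolar * 1e-6  # convert µM to M
    return kcat_per_sec / km_molar


EFFICIENCY = {
    substrate: catalytic_efficiency(values["kcat"], values["km_uM"])
    for substrate, values in PSANR_KINETICS.items()
}

for substrate, value in EFFICIENCY.items():
    print(f"{substrate}: kcat/Km = {value:.2f} M^-1 sec^-1")
```

Because the tabulated kcat values are rounded to one significant decimal, the recomputed efficiencies agree with the kcat/Km column only to within about 0.1 M⁻¹ sec⁻¹.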

Table 3.5 Top 20 unigenes in ‘Courier’ 10-25 DAA seed coat transcriptome. Following assembly and UniProt annotation, contigs were sorted based on the number of reads comprising each contig.

Sequence ID      Read number   E-value     Hit ID-UniProt   Functional description
CL1Contig1039    218           1.00E-173   Q9FT05_CICAR     Cationic peroxidase
CL1Contig946     217           1.00E-141   ACCO_PEA         1-aminocyclopropane-1-carboxylate oxidase
CL1Contig433     216           0           A2Q639_MEDTR     Indole-3-acetic acid-amido synthetase
CL1Contig362     195           0           C3TS12_CICAR     Methionine synthase
CL1Contig1430    187           1.00E-157   Q84XT1_MEDTR     Anthocyanidin reductase
CL1Contig1488    186           0           A5Y5L3_SOYBN     Flavonoid 3',5' hydroxylase
CL1Contig753     165           1.00E-130   Q8RU51_ORYSJ     Putative Glucan 1,3-beta-glucosidase
CL1Contig1170    143           0           AMYB_VIGUN       1,4-alpha-D-glucan maltohydrolase
CL1Contig898     125           4.00E-19    Q8VYY0_PEA       Dehydrin-related protein
CL1Contig383     124           0           VPE_VICSA        Vacuolar-processing enzyme, Proteinase B
CL1Contig1280    115           0           Q2HRK7_MEDTR     Protease-associated protein
CL1Contig610     105           7.00E-52    Q9MB25_VIGUN     Pathogenesis-related protein
CL1Contig12      100           3.00E-45    Q0GPF8_SOYBN     BZIP transcription factor bZIP124
CL1Contig1745    94            4.00E-64    B9SRU3_RICCO     Carboxylic ester hydrolase, putative
CL1Contig780     87            3.00E-71    P93332_MEDTR     Nodulin MtN3 family protein
CL1Contig1844    83            1.00E-135   G2OX1_PEA        Gibberellin 2-beta-dioxygenase 1
CL1Contig59      79            0           Q6UDA0_TRIPR     Actin
CL1Contig1264    78            1.00E-178   PRS7_ARATH       26S protease regulatory subunit 7
CL1Contig78      77            0           Q9FEU4_PEA       Putative serine carboxypeptidase

Table 3.6 Relative transcript abundance of phenylpropanoid and PA pathway genes in ‘Courier’ 10-25 DAA seed coat transcriptome.

Gene                             Read number
Phenylalanine ammonia lyase      27
Cinnamate 4-hydroxylase          9
4-coumarate:CoA ligase           14
Chalcone synthase                113
Chalcone isomerase               17
Flavonoid 3'-hydroxylase         2
Flavonoid 3'5'-hydroxylase       186
Flavanone 3-hydroxylase          67
Dihydroflavonol 4-reductase      71
Anthocyanidin synthase           10
Anthocyanidin reductase          222
Leucoanthocyanidin reductase     13

Chapter Four: Comparative transcriptomics in pea seed coat

4.1 Introduction

Legumes provide a rich source of nutrients to both humans and livestock. Economic and

nutritional importance has made pea (Pisum sativum) an attractive species for plant research.

Peas produce large seeds from which the seed coat is easy to isolate, making pea an ideal plant

for studying seed coat genetics. Furthermore, due to a long breeding history, an estimated 2,000

cultivated varieties of pea now exist in seed banks around the world (Smýkal et al., 2012). The

genomes of P. sativum and related species, such as P. elatius, P. abyssinicum, P. humile and P.

fulvum, are similar in size, and there is a high degree of genomic stability within the Pisum genus

(Baranyi and Greilhuber, 1996). Similar genetic backgrounds combined with the phenotypic

diversity within P. sativum, and more broadly within Pisum, provide a useful resource for

comparative genomics studies.

The seed coat plays an essential role in flowering plant biology by providing nourishment and protection to the developing embryo. Seed development is a highly orchestrated process.

One of the first steps in seed development is the differentiation of the maternal ovule integument into the seed coat tissue (Le et al., 2007). Seed development in pea proceeds in three stages:

initial growth primarily occurs in the endosperm and seed coat (pre-storage), followed by

embryogenesis (storage), and after a lag phase, maturation and desiccation (Hedley and

Ambrose, 1980). Transition through the early stages is mediated at least in part by signalling via

the seed coat (Weber et al., 2005). During early development, the seed coat also acts as a transient

storage organ. Sugars and amino acids are transported via the phloem to the seed coat, which

sequesters, metabolizes and diffuses these nutrients to the developing embryo and endosperm

(Rochat and Boutin, 1991, Van Dongen et al., 2003). Improving our understanding of pea seed

coat genetics and biochemistry may help plant breeders develop new varieties with improved

seed quality traits.

In addition to playing a regulatory role, the seed coat acts as a protective barrier,

shielding the developing embryo and endosperm from pathogens and environmental damage.

Certain pea cultivars accumulate large amounts of proanthocyanidins (PAs), which are believed

to play a role in defense (Section 3.2.1; Dixon et al., 2005, Jin et al., 2012). Lately, research interest in PAs has increased as human consumption of PAs has been associated with anti-inflammatory activities, reduced risk of cardiovascular disease, and inhibition or prevention of certain cancers (He et al., 2008). However, our understanding of PA biosynthesis and accumulation remains incomplete, specifically the processes associated with intracellular trafficking and polymerization/condensation. A natural qualitative and quantitative PA diversity exists within P. sativum (Section 3.2.1), providing a useful genetic spectrum to interrogate for novel genes and regulatory pathways relating to PA composition, as well as for other phenotypic differences, such as seed weight (Fig. 4.1; Jin et al., 2012).

4.1.1 Comparative transcriptomics by RNA-sequencing (RNA-Seq)

Analysis of RNA-sequencing (RNA-Seq) data involves mapping sequence reads against reference sequences, which can be a sequenced genome, a reference transcriptome or a de novo assembled transcriptome (see section 1.5.3 for details). Gene expression can be estimated by mapping the reads against the reference. The number of reads mapped to a unigene in the reference is proportional to the expression of that gene in the tissue sampled. However, preparation of cDNA for NGS involves a fragmentation step. Longer transcripts generate more cDNA fragments and thus more reads, and therefore normalization of the data is required. Gene

expression is typically represented as reads per kilobase of a transcript per million reads mapped

(RPKM; Mortazavi et al., 2008). This normalization accounts for differences in transcript

lengths and differences in the total number of reads between libraries.
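The normalization described above can be written as RPKM = 10⁹ × C / (N × L), where C is the number of reads mapped to a transcript, N is the total number of mapped reads in the library and L is the transcript length in bp. The following is a minimal sketch of that calculation; the function name rpkm and the read counts are hypothetical and are not taken from the datasets in this work.

```python
def rpkm(reads_mapped, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return reads_mapped * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: two transcripts in a library of 10 million mapped reads.
# The 2-kb transcript collects twice as many reads as the 1-kb transcript
# simply because it yields twice as many fragments.
library_size = 10_000_000
long_transcript = rpkm(reads_mapped=500, transcript_length_bp=2000,
                       total_mapped_reads=library_size)
short_transcript = rpkm(reads_mapped=250, transcript_length_bp=1000,
                        total_mapped_reads=library_size)

print(long_transcript)   # 25.0
print(short_transcript)  # 25.0
```

In this example the raw read counts differ 2-fold, but after length normalization both transcripts have the same RPKM, which is the behaviour the normalization is intended to provide.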

The differences between NGS technologies can impact the choice of sequencing platform

based on the experimental goals. Illumina technology is advantageous over 454-sequencing

(Section 1.5) for RNA-Seq experiments as it produces far more reads, thereby providing better sequencing depth (Ratan et al., 2013). However, in the absence of a reference genome, de novo

assembly of the reads is necessary, in which case the inclusion of technologies that generate

longer reads is often favourable (this work and Hornett and Wheat, 2012).

Processing and analysis of NGS data requires significant computational power and

sophisticated software. Most de novo assembly software pipelines are optimized for specific

NGS data (i.e. 454-reads or Illumina reads). CLC Genomics Workbench⁷ is a commercial

software package that effectively handles NGS data composed of separate or mixed 454 and

Illumina reads (Hornett and Wheat, 2012). Due to its flexibility, low computation requirements

and ease of use, CLC was chosen for the majority of the NGS data analysis conducted here.

4.2 Results and Discussion

4.2.1 Phenotype of five P. sativum cultivars

Seed coats from five pea cultivars (‘Alaska’, ‘Canstar’, ‘Courier’, ‘LAN3017’, and

‘Solido’) were subjected to comparative transcriptomics analysis in this study. Of these, ‘Alaska’

(green pea) and ‘Canstar’ (yellow pea) are PA-lacking cultivars (PLC), whereas ‘Courier’ and

‘Solido’ (marrowfat pea) and ‘LAN3017’ (maple pea) are PA-accumulating cultivars (PAC;

7 http://www.clcbio.com/

Fig. 4.1). ‘Alaska’ and ‘Canstar’ seeds had clear (non-pigmented) seed coats, and ‘Alaska’ seeds

were the only ones that remained visibly green after desiccation (Fig. 4.1), which could be due to

the presence of an ii mutation affecting chlorophyll in the cotyledons (Ellis et al., 2011).

‘Courier’, ‘LAN3017’ and ‘Solido’ all displayed speckled or solid brown seed coats, due to PA accumulation and oxidization (Section 3.2.1; Fig. 4.1). ‘Alaska’ and ‘Solido’ seeds were also slightly wrinkled, a trait in pea associated with the r allele affecting starch content in the seed

(Ellis et al., 2011). Finally, ‘Solido’ seeds were 33-43% heavier than seeds from the other four cultivars (Fig. 4.1). These phenotypic variations suggest different gene expression profiles in the coat of these cultivars.

As the majority of seed coat growth and development occurs early in seed development, ten days after anthesis (DAA) was selected as the point at which to interrogate seed coat gene expression across the five pea cultivars. Pea seed coat growth rate peaks between eight and ten

DAA, and then stabilizes around 12 DAA (Nadeau et al., 2011). Ten DAA also marks the peak expression of the key PA biosynthetic gene, anthocyanidin reductase (PsANR), in ‘Courier’ (Fig.

3.8A). Together, ANR and leucoanthocyanidin reductase (LAR) catalyze the committed steps in

PA biosynthesis (Fig. 1.1), forming the monomeric subunits, 2,3-cis-flavan-3-ol and 2,3-trans-flavan-3-ol, respectively (He et al., 2008). While examples of LAR-deficient, PA-producing species exist in nature, all known PA-producing species display ANR activity, indicating ANR is an essential gene for PA biosynthesis (Lepiniec et al., 2006, Peng et al., 2012). Therefore, we decided to use pea seed coats collected at 10 DAA from five cultivars for comparative transcriptomics analysis.

4.2.2 Down-regulation of key metabolic genes in phenylpropanoid pathway in PLC

One of the goals of this study is to better understand PA metabolism in pea by linking PA

chemical profiles with the gene expression patterns. The lack of the pigment in the two PLCs

(‘Alaska’ and ‘Canstar’) can arise simply from mutation of biosynthetic genes in the phenylpropanoid pathway in the seed coat or from mutations in other genes (e.g., transcription factors, transporters, etc.). Furthermore, ‘Alaska’ and ‘Canstar’ may carry mutations of different origins. In order to probe the nature of the PLC phenotype, the relative expression of four genes in the phenylpropanoid pathway was initially analyzed by quantitative reverse-transcription

PCR (qRT-PCR) using RNA from pea seed coats isolated at 10 DAA. The selected genes were phenylalanine ammonia lyase (PAL; encodes an entry point enzyme of phenylpropanoid biosynthesis), dihydroflavonol 4-reductase (DFR; encodes an enzyme for flavonoid biosynthesis) and anthocyanidin reductase/leucoanthocyanidin reductase (ANR/LAR; encode enzymes for PA biosynthesis; Fig. 1.1).

qRT-PCR results clearly showed that transcript abundance of all four genes at 10 DAA was negligible in PLCs in comparison to PACs (Fig. 4.2A). Among PACs, ‘LAN3017’ showed higher expression of all four genes than ‘Courier’ and ‘Solido’. These results indicated that the PLC phenotype was not due to a mutation of a biosynthetic gene but likely due to a mutation in a regulatory gene. Therefore, further comparison of gene expression between the pea cultivars should help to explain PA metabolism as well as other phenotypic differences among the five pea cultivars.

4.2.3 NGS sequencing and transcriptome assembly

The focus of this research was to generate a dataset to examine differential gene expression in young pea seed coat. To achieve this, total RNA-derived cDNA from 10 DAA seed coat tissue from the five pea cultivars was sequenced using Illumina Hi-Seq 2000 and Roche/454

GS-FLX Titanium sequencing. Illumina sequencing yielded approximately 295 million paired-end reads with an average length of 99 bp and Roche/454-sequencing produced approximately

1.1 million reads with an average length of 336-bp (see Table 4.1 for details).

Reliable reference sequences for quantitative read mapping are essential, but a complete

P. sativum genome sequence was not available. Therefore, transcripts from all five cultivars were assembled de novo and used as a reference dataset for quantitative transcriptomics.

The advantages and disadvantages of 454 and Illumina sequencing were taken into consideration. 454 can generate longer reads but is prone to sequencing errors in homopolymers; Illumina can produce significantly more reads (>100-fold more than 454), but such depth can result in the sequencing of unprocessed transcripts and extremely low-abundance contamination (genomic

DNA, for example), which risks overestimating unique transcripts and can make de novo assembly difficult. To select the most suitable assembly for the purposes of the project, assemblies by different methods and datasets were generated and evaluated.

Two different assembly methods (Newbler or CLC; see section 2.12) were used with either 454 or Illumina reads alone or with the reads pooled together (Table 4.2). As a result, four different assemblies were generated (referred to as 454NB, 454CLC, ILM, 454-ILM; see Table

4.2 for detailed description and basic metrics). Two basic metrics for evaluating assemblies are the N50 and the average length of contigs (in silico generated sequences composed of two or more assembled reads). The N50 is the length (N) for which 50% of all the bases in the assembly

are contained in sequences of N length or greater. These two metrics provide a rough estimate of

the completeness of assembled unigenes [contigs and singletons (single unassembled reads)].

Given an estimated average mature transcript length of 1-1.5 kb (Alexandrov et al.,

2006), N50 and average contig lengths within this range are desirable, as they suggest the presence of completely assembled transcripts. The N50 and average contig lengths were similar across all four assemblies (Table 4.2).
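The two assembly metrics defined above can be computed directly from a list of contig lengths. The sketch below is illustrative only: the function names n50 and mean_length and the example lengths are hypothetical and do not correspond to any assembly in Table 4.2.

```python
def n50(lengths):
    """Return the length N such that sequences of length >= N hold
    at least 50% of all bases in the assembly."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running_total += length
        if running_total >= half_total:
            return length


def mean_length(lengths):
    """Return the average sequence length."""
    if not lengths:
        raise ValueError("at least one sequence length is required")
    return sum(lengths) / len(lengths)


# Hypothetical contig lengths in bp (1,500 bases in total).
example_contigs = [100, 200, 300, 400, 500]
print(n50(example_contigs))          # 400
print(mean_length(example_contigs))  # 300.0
```

In this example the two longest contigs (500 and 400 bp) together hold 900 of the 1,500 bases, so the N50 is 400 bp, whereas the average length is 300 bp. The N50 is weighted towards longer sequences, which is why both metrics are usually reported together.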

However, the number of unigenes and BLASTx hits from assemblies of only 454 reads differed significantly from the assemblies containing Illumina reads. 454 assemblies (454NB and

454CLC) were comparable, containing approximately 20,000 unigenes, and 77% to 84%

BLASTx hits to Arabidopsis thaliana TAIR10 and Medicago truncatula 4.0 protein databases, respectively, whereas assemblies containing Illumina reads (ILM and 454-ILM) consisted of roughly 90,000 unigenes, and only 34 to 45% BLASTx hits (Table 4.2). The substantially higher number of unigenes combined with the much lower percentage of BLASTx hits is indicative of incomplete transcript assembly, sequencing of immature mRNA, or possibly chimeric unigenes.

To determine which assembly best represented the pea seed coat transcriptome, and hence would be most suitable for comparative transcriptomics analysis, further evaluation was conducted.

4.2.3.1 Further evaluation of the transcript assembly

Recently, two P. sativum de novo transcriptomes were published (Franssen et al., 2011,

Kaur et al., 2012). The Franssen et al. (2011) assembly (hereafter referred to as the Franssen assembly) and the Kaur et al. (2012) assembly (the Kaur assembly) were composed of reads generated from transcripts from numerous tissue types, but only the Kaur assembly included immature seed tissue. Despite the use of 454-sequencing similar to that employed by this work,

the Franssen and Kaur assemblies contained 81,449 and 70,682 unigenes, respectively, which is

more comparable to the results of the Illumina assemblies produced here (Table 4.2). As the

Franssen assembly lacked seed coat transcripts and reportedly contained a significant number of

chimeras, it was not suitable as a reference for seed coat transcriptomic analysis.

The four de novo assemblies and the Kaur assembly were further evaluated by two

different methods. First, five known single copy genes from pea were searched against the

assemblies using BLASTn (Table 4.3). Splitting of a single transcript into multiple unigenes is a

common problem in de novo transcript assembly and can interfere with RNA-Seq analysis as

RPKM values may be artificially low if reads from a transcript are diluted across multiple unigenes all representing a single transcript. This analysis allowed an assessment of the level of artificial (in silico) transcript redundancy in the assembled transcriptomes. PsANR, PsDFR, and

PsLAR sequences, which were previously determined (Section 3.2.3 and 3.2.5), were also

included to assess the presence of PA pathway genes. Second, the unigenes from each assembly were mapped to the gene models predicted from M. truncatula whole genome sequences using high stringency conditions (Section 2.12). Although this method is similar to the BLASTn analysis presented in Table 1, higher stringency was applied (e.g. min. 80% amino acid similarity; Section 2.12) and hence this analysis will identify direct orthologs between these two legume plants.

First, for an assessment of split transcripts, five known single copy pea genes previously used to evaluate de novo assembly by Franssen et al. (2011) were selected. BLASTn revealed the presence of unigenes matching these genes in each of the assemblies compared (Table 4.3). All five genes were represented by single unigenes in the 454CLC assembly, while only the HMG-IY transcript was split in the 454NB assembly. On the other hand, the assemblies from the Illumina

reads and the published assemblies showed a higher degree of transcript splitting. This analysis indicated that assemblies containing only 454 reads produced more complete representative transcripts compared to assemblies that included Illumina reads.
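The split-transcript check described above amounts to counting how many distinct unigenes match each single copy gene. The following is a minimal sketch, assuming the BLASTn results have already been reduced to (query gene, unigene, percent identity) tuples. The identity cut-off, the unigene names and the second gene name are hypothetical and do not reflect the parameters or results of this work.

```python
from collections import defaultdict


def unigenes_per_gene(hits, min_identity=95.0):
    """Count distinct unigenes matching each query gene.

    hits: iterable of (query_gene, unigene_id, percent_identity).
    min_identity: hypothetical cut-off, not the value used in the thesis.
    """
    matched = defaultdict(set)
    for query_gene, unigene_id, percent_identity in hits:
        if percent_identity >= min_identity:
            matched[query_gene].add(unigene_id)
    return {gene: len(ids) for gene, ids in matched.items()}


# Hypothetical hits: one gene split over two unigenes, one gene intact.
example_hits = [
    ("HMG-IY", "contigA", 99.0),
    ("HMG-IY", "contigB", 98.0),
    ("geneX", "contigC", 99.5),
    ("geneX", "contigD", 80.0),  # below the cut-off, ignored
]
print(unigenes_per_gene(example_hits))
```

A count of one indicates that a single copy gene is represented by a single unigene; a count above one indicates that the transcript was split during assembly.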

As the copy number of PsDFR, PsANR, and PsLAR is unknown, single unigenes were not expected. However, a comparison between the assemblies of the number of unigenes representing these gene transcripts is still informative. Most of these transcripts were present as a small number of unigenes (1-3) in all assemblies except the Franssen assembly. As expected, the Franssen assembly lacked any unigenes representing these genes, confirming the inadequacy of this assembly for seed coat genetics.

Second, orthologous unigenes from the four new assemblies and Kaur assembly that mapped to M. truncatula gene models were identified and analyzed. The total number of unigenes mapped from the Kaur assembly was markedly higher than from the other four assemblies; among the four new pea assemblies, the number of mapped unigenes from assemblies using the Illumina reads was approximately 2-fold higher than those using only 454 reads, closely reflecting the data in Table 4.1 (Fig. 4.3A). The high number of Kaur unigenes is not surprising as this assembly contained transcripts from multiple tissues, whereas the four new pea assemblies consisted of only seed coat tissues. These results were transformed to the percentage of total unigenes mapped (Fig. 4.3B). About 60% of the unigenes from the assemblies containing only 454 reads were orthologous to M. truncatula gene models, but only

20% of unigenes from the assemblies containing Illumina reads had orthologous partners in M. truncatula.

To gain more insight into the unigenes mapped to the M. truncatula gene models, the average lengths of the mapped unigenes from each assembly were compared (Fig. 4.3C). The

average length of mapped unigenes from the four new assemblies was between approximately 1-

1.2 kb. However, the average length of mapped unigenes from the Kaur assembly was only about

400-bp. The length of the orthologous unigenes combined with the total number of unigenes is

an important consideration. Assemblies with a low number of unigenes and a short length may

result in a low percentage of mapped reads when RNA-Seq analysis is conducted. However, despite the significantly lower number of unigenes in the 454NB and 454CLC assemblies, both contain relatively long orthologous unigenes.

As the objective of de novo assembly was to produce a reference transcriptome for the analysis of seed coat gene expression (i.e. RNA-Seq), mapping of ‘Courier’ Illumina reads to the five assemblies was examined (Fig 4.3D). Approximately 80% of the reads were mapped to all of the assemblies, demonstrating a sufficient level of similarity between the unigenes produced by de novo assembly and unassembled Illumina reads for RNA-Seq analysis. In the case of the Kaur assembly, the relatively short length of orthologous unigenes raised the possibility of a reduced percentage of mapped ‘Courier’ reads. The high total number of unigenes in the Kaur assembly may have compensated for the short average length; however, this raised concerns about redundancy or split transcripts.

There is still no standard metric for evaluating de novo assembly of transcript data.

Instead, holistic analyses of multiple metrics are required. Overall, the 454NB and 454CLC assemblies were very similar. Both assemblies had very similar numbers of unigenes mapped to reference datasets (Table 4.1 and Fig 4.3). Assemblies containing Illumina reads contained over

4-fold more unigenes and indicated much more transcript splitting (Table 4.3) relative to assemblies of only 454 reads. The Kaur assembly performed well in almost all comparisons, with the exception of the short length of orthologous unigenes mapped to M. truncatula gene models

(Fig. 4.3C). In the end, assemblies composed of only 454 reads performed best, but the 454NB

assembly was ultimately chosen over 454CLC as 454NB had a slightly higher percentage of

mapped Illumina reads (Fig 4.3D). A final quality check was performed by comparing the

percentages of Illumina reads mapped to the 454NB assembly from each of the five cultivars

(Fig. 4.3E). Approximately 80% of all reads from each cultivar were mapped using the same

parameters subsequently employed for RNA-Seq analyses, indicating consistent similarity

between the 454NB unigenes and the Illumina reads from each cultivar. Therefore, the 454NB

assembly was used for all data analysis in this chapter.

The 454NB assembly was annotated by BLASTx (E-value ≤ 1e-5) against the TAIR10 representative gene model set [14,811 unigenes (77%) annotated] and the UniProt Viridiplantae sequence set [16,001 unigenes (83%) annotated]. Of the 19,199 unigenes in the 454NB assembly, 6,932 were categorized into one or more gene ontology (GO) categories representing twenty-one active biological process groups in pea seed coat (Fig. 4.4). Biosynthesis and related metabolic processes were among the most abundant represented, with a number of organogenesis processes also included. A number of processes that have important biological and human interests were examined in further detail.

4.3 Comparative Transcriptomics

Relative transcript abundance is proportional to the number of Illumina reads

(approximately 56 – 69 million reads per cultivar) mapped to the representative unigenes. By individually mapping Illumina reads from each of the five cultivars to the 454NB reference, an RPKM (reads per kilobase per million reads) value was determined for each unigene. RPKM represents the relative expression of a gene, normalized to the length of the unigene (per

kilobase) in the reference assembly and to the total number of reads in the library (per million

reads). Since RNAs were isolated from the same tissue type at a comparable developmental stage

(10 DAA), RPKM values will accurately reflect the expression levels of the genes in the five different pea cultivars. The RPKM value of each unigene in each cultivar was compared to identify genes differentially expressed between cultivars, which were grouped based on phenotypes.
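The RPKM normalization described above can be written out as a short sketch. The read counts, unigene length and library size below are hypothetical, and the choice of the total library read count as the denominator follows the description in the text; some pipelines use the number of mapped reads instead.

```python
def rpkm(mapped_reads, unigene_length_bp, library_reads):
    """Reads per kilobase of unigene per million reads in the library.

    RPKM = mapped_reads / (length in kb) / (library size in millions)
         = mapped_reads * 1e9 / (unigene_length_bp * library_reads)
    """
    if unigene_length_bp <= 0 or library_reads <= 0:
        raise ValueError("length and library size must be positive")
    return mapped_reads * 1e9 / (unigene_length_bp * library_reads)


# Hypothetical example: 1,200 reads mapped to a 1,000-bp unigene
# in a library of 60 million reads.
example_rpkm = rpkm(mapped_reads=1200, unigene_length_bp=1000,
                    library_reads=60_000_000)
print(example_rpkm)
```

Because both the unigene length and the library size appear in the denominator, a long unigene or a deeply sequenced library does not by itself inflate the value.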

4.3.1 Differential expression analysis of known phenylpropanoid genes in PACs vs PLCs

Previous PA chemical analysis showed that ‘Alaska’ and ‘Canstar’ (PLCs) lack PAs while ‘Courier’, ‘LAN3017’, and ‘Solido’ (PACs) accumulate PAs (Section 3.2.1; Fig. 4.1).

Therefore, to validate the quality of the RNA-Seq dataset, RPKM values of known biosynthetic and regulatory genes involved in PA biosynthesis were compared between the five cultivars. With the exception of F3H, all of the PA biosynthetic genes were down-regulated > 2-fold in PLCs

relative to PACs, including two related transcription factors (PsTT8 and PsTTG2; Table 4.4).

These results demonstrated that PA metabolic genes, including PsTT8 and PsTTG2, are systematically down-regulated in PLCs.

Interestingly, higher expression of PA-biosynthetic genes was found in ‘LAN3017’ compared to ‘Courier’ and ‘Solido’ (Table 4.4). The RPKM fold differences were minimal, but the pattern was consistent, with the exception of 4CL and F3’5’H, the latter of which was expected as ‘LAN3017’ accumulates procyanidin (3', 4'-hydroxylated B-ring) PAs, whereas

‘Courier’ and ‘Solido’ contain prodelphinidin (3', 4', 5'- hydroxylated B-ring) PAs (Table 3.1;

Fig. 1.1). The elevated expression of PA biosynthetic genes is noteworthy as the mean degree of

polymerization of ‘LAN3017’ PAs was approximately 2-3 fold greater than that of ‘Courier’ and

‘Solido’ (Table 3.1). Further research is required to determine if these expression differences

translate into differences in flux through the PA pathway, and whether flux plays a role in

determining the mean degree of polymerization of pea seed coat PAs.

Three transcription factors, TRANSPARENT TESTA2 (TT2, a R2R3-MYB protein),

TRANSPARENT TESTA GLABRA1 (TTG1, a WD-repeat protein), and TRANSPARENT

TESTA8 (TT8, a bHLH protein) act as a ternary complex to promote the expression of genes in the early and late PA pathway (Walker et al., 1999, Nesi et al., 2000, Nesi et al., 2001, Baudry et al., 2004). In pea, A and A2 were recently identified as pea orthologs of TT8 and TTG1, respectively (Hellens et al., 2010). TRANSPARENT TESTA GLABRA2 (TTG2; a WRKY

transcription factor), which acts downstream of TTG1, is also involved in PA accumulation,

though it appears to exert this function indirectly as a regulator of cellular differentiation

(Johnson et al., 2002). Based on RPKM values, A/PsTT8 (bHLH) and PsTTG2 (WRKY) were

differentially expressed between the PACs and PLCs, while PsTT2 (MYB) and A2/PsTTG1

(WDR) were not (Table 4.4). This discrepancy suggested that differences in these transcription factors may account for the PA-lacking phenotype of ‘Alaska’ and ‘Canstar’.

4.3.2 A point mutation in A/PsTT8 causes mis-splicing in ‘Alaska’ and ‘Canstar’

PLCs produce white flowers, whereas PACs produce pink-purple coloured flowers.

During the progress of this work, it was reported that a point mutation at the exon/intron junction of A (pea TT8) causes mis-splicing of A, producing a premature truncation of the protein resulting in a loss of flower colour in pea (Hellens et al., 2010). However, Hellens et al. (2010) did not include ‘Alaska’ or ‘Canstar’ in their study. To determine if the same type of mutant allele for a can be found in ‘Alaska’ and ‘Canstar’, a region of the A genomic locus was

amplified and sequenced, and the assembled unigenes from the Illumina sequences (from individual cultivar assemblies) were also investigated. Sequencing the genomic locus of A from the five cultivars demonstrated that the same point mutation (G to A) reported previously was present only in ‘Alaska’ and ‘Canstar’ but not in the other three PACs (Fig. 4.5A). When present, this mutation causes the spliceosome to skip the wild-type GT motif at the 5'-end of intron 6, which normally triggers splicing of the immature transcript. Instead, the spliceosome recognizes the subsequent downstream GT in the intron, and eight intron nucleotides (ATAAATCG) are inserted in the transcript, generating a frameshift and a premature stop codon. Assembled A transcripts from the five cultivars showed that this 8-bp insertion and premature stop codon are indeed present, which leads to a premature truncation of the A protein (Fig 4.5B). The transcript and genome sequence analysis of A in the five cultivars unambiguously showed that ‘Alaska’ and

‘Canstar’ contain the a mutant allele.
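The effect of retaining eight intron nucleotides can be illustrated with a small sketch. Only the inserted sequence (ATAAATCG) is taken from the text; the flanking exon sequences are entirely hypothetical and were chosen so that the example is short, and they do not represent the actual A coding sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the zero-based codon index of the first in-frame stop codon,
    or None if the reading frame contains no stop codon."""
    for start in range(0, len(cds) - 2, 3):
        if cds[start:start + 3] in STOP_CODONS:
            return start // 3
    return None


# Hypothetical exon sequences flanking the splice junction.
upstream_exon = "ATGGCTGAAGG"
downstream_exon = "TCTGCACTTGCATTGA"
retained_intron = "ATAAATCG"  # the eight nucleotides named in the text

wild_type = upstream_exon + downstream_exon
mis_spliced = upstream_exon + retained_intron + downstream_exon

# Eight is not a multiple of three, so the downstream frame is shifted.
print(len(retained_intron) % 3)
print(first_stop_codon(wild_type), first_stop_codon(mis_spliced))
```

In this hypothetical sequence the stop codon moves from the final codon of the wild-type transcript to a position within the retained nucleotides, which mirrors the premature truncation described above.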

While the expression of two components of the ternary PA regulatory complex,

A2/PsTTG1 and PsTT2, was equal across all five cultivars, A/PsTT8 is also required for production of PAs (Baudry et al., 2004). According to the current model, TT2 and TT8 recognize specific DNA sequences (Johnson et al., 2002). TTG1 does not appear to be required for recognition or binding of specific DNA sequences, but rather acts as an enhancer of expression, possibly by stabilizing the complex or preventing degradation of TT8 (Johnson et al., 2002). Thus, despite the high level of PsTT2 and A2/PsTTG1 expression in the PLCs, lack of PsTT8/A can result in the PA-deficient phenotype of the PLCs.

PsTTG2 expression was approximately 15-fold lower in PLCs relative to PACs (Table

4.4). TTG2 is involved in tissue development in a variety of tissues, and its expression in the seed coat is dependent on TTG1 (Johnson et al., 2002, Ishida et al., 2007). TT2 has been shown to

also bind to the TTG2 promoter (Ishida et al., 2007). Given that both PsTT2 and PsTTG1/A2

showed high levels of expression in PLCs, the lack of TTG2 expression was unexpected. This

suggests that TT8 is needed to form a bHLH-MYB-WD40 (TT8-TT2-TTG1) complex that drives expression of PsTTG2 in pea seed coats. Taken together, the results indicate a pivotal role of

A/PsTT8 in regulating expression of PA biosynthetic and regulatory genes in pea seed coats.

Interestingly, the expression of PA biosynthetic genes differed between ‘Alaska’ and

‘Canstar’. For example, PsANR expression is approximately 100-fold higher in ‘Canstar’ relative to ‘Alaska’, whereas PsLAR and PsANS are approximately 14-fold and 5-fold greater in

‘Alaska’, respectively (Table 4.4). PsANR and PsLAR expression differences between ‘Alaska’ and ‘Canstar’ were confirmed by qRT-PCR (Fig. 4.2B). These findings suggest that beyond the a mutant allele, additional regulatory differences may exist between the five cultivars.

4.3.3 Comparison between RNA-Seq and qRT-PCR

Preparation of cDNA libraries for NGS and in silico gene expression analysis can introduce biases; therefore, a sample of the RNA-Seq data was confirmed by qRT-PCR. A total of 21 genes were selected for qRT-PCR analysis, including known PA biosynthetic genes and unigenes representing unknown genes (a number of which are discussed in later sections of this chapter), some of which appeared differentially expressed between PACs and PLCs or were equally expressed across all five cultivars based on RPKM values (Fig. 4.6). The correlation between the fold difference in relative transcript abundance determined by qRT-PCR and RNA-Seq was high (relative to ‘Canstar’, R² = 0.683; relative to ‘Solido’, R² = 0.7583). Some discrepancies between the two methods were noted; however, these occurred primarily with unigenes that displayed a low degree of differential expression (less than approximately 5-fold)

across all five cultivars. Overall, the comparison justified a high confidence in the accuracy of

RNA-Seq estimations of unigenes differentially expressed between the pea cultivars.
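The type of comparison summarized by the R² values above can be sketched as follows. The fold differences below are invented for illustration and are not the data shown in Fig. 4.6, and the log2 transformation applied before the correlation is an assumption, since the text does not state whether the fold differences were transformed.

```python
import math


def r_squared(x, y):
    """Square of the Pearson correlation coefficient between x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    return sxy ** 2 / (sxx * syy)


# Hypothetical fold differences relative to a reference cultivar.
rnaseq_fold = [1.0, 2.0, 8.0, 30.0, 100.0]
qpcr_fold = [1.2, 1.5, 12.0, 20.0, 150.0]

# Assumed log2 transformation so large fold changes do not dominate.
log_rnaseq = [math.log2(v) for v in rnaseq_fold]
log_qpcr = [math.log2(v) for v in qpcr_fold]
print(round(r_squared(log_rnaseq, log_qpcr), 3))
```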

4.4 Differential gene expression analysis

Examination of the well characterized PA pathway and identification of distinct gene expression differences between the pea cultivars provided an excellent basis to identify novel genes contributing to phenotype differences. The most obvious comparison was between the

PACs and PLCs. However, the heavier weight of ‘Solido’ seeds warranted a comparison between ‘Solido’ and the other cultivars. A stringent filter was developed to concentrate on unigenes that were reasonably expressed (RPKM ≥ 50 in each cultivar belonging to at least one of the two groups being compared) and highly differentially expressed (≥ 5-fold expression difference between the groups).
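The two filter criteria above can be expressed as a short sketch. The thresholds (RPKM ≥ 50 and ≥ 5-fold) are taken from the text. The use of group means to calculate the fold difference and the pseudocount that avoids division by zero are assumptions made for illustration, as are the RPKM values in the example.

```python
def passes_filter(rpkm_by_cultivar, group_a, group_b,
                  min_rpkm=50.0, min_fold=5.0, pseudocount=1.0):
    """Return True if a unigene is reasonably expressed in every cultivar
    of at least one group and differs >= min_fold between the groups."""
    values_a = [rpkm_by_cultivar[c] for c in group_a]
    values_b = [rpkm_by_cultivar[c] for c in group_b]

    # Criterion 1: RPKM >= min_rpkm in each cultivar of at least one group.
    expressed = (all(v >= min_rpkm for v in values_a)
                 or all(v >= min_rpkm for v in values_b))

    # Criterion 2: fold difference between group means (assumed summary).
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    high, low = max(mean_a, mean_b), min(mean_a, mean_b)
    fold = (high + pseudocount) / (low + pseudocount)

    return expressed and fold >= min_fold


# Hypothetical RPKM values for one unigene.
example = {"Courier": 120.0, "Solido": 90.0, "Alaska": 5.0, "Canstar": 3.0}
print(passes_filter(example, ["Courier", "Solido"], ["Alaska", "Canstar"]))
```

Requiring the expression threshold in every cultivar of a group, rather than in the group mean, prevents a single highly expressing cultivar from carrying a unigene through the filter.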

4.4.1 ‘Solido’ versus ‘LAN3017’/‘Courier’

A comparison between ‘Solido’ and ‘Courier’/‘LAN3017’ identified 116 unigenes highly expressed in ‘Solido’ (Appendix A.1). ‘Alaska’ and ‘Canstar’ were excluded from the initial comparison to avoid the obvious differences related to phenylpropanoid metabolism. However,

RPKM values from these two cultivars were taken into consideration when evaluating the 116 unigenes identified through the initial comparison.

The 116 unigenes were further investigated based on BLASTx homology and functional annotation. Of these, a unigene (contig12605) annotated as a cytochrome P450 (CYP) belonging to the CYP78A family was of particular interest as it was expressed 119-fold higher in ‘Solido’ versus ‘Courier’ and ‘LAN3017’ (Appendix A.1), and 113- to 144-fold higher in ‘Solido’

relative to ‘Alaska’ and ‘Canstar’, respectively (Table 4.5). The Arabidopsis CYP78A family

contains six members, three of which (KLUH/78A5, EOD3/78A6 and CYP78A9) affect seed size

(Adamski et al., 2009, Fang et al., 2012). Contig12605 is highly homologous to CYP78A9 (70%

amino acid identity), EOD3/CYP78A6 (68%), CYP78A8 (65%) and KLUH/CYP78A5 (55%).

The expression of KLUH/78A5 in the inner integument of the seed coat of wild-type Arabidopsis

positively correlates with seed size, and its effect is independent of phytohormones or plant

resource status (Adamski et al., 2009). Similarly, EOD3/CYP78A6 is thought to function

independently of other known seed growth regulators (Fang et al., 2012). While CYP78A9 and

EOD3/CYP78A6 are functionally redundant in Arabidopsis (Fang et al., 2012), CYP78A9 and

KLUH/CYP78A5 appear to act independently (Adamski et al., 2009), which suggests that

CYP78A family members may have distinct roles in modulating seed development and size. A

number of other genes have been linked to seed size in Arabidopsis, including TTG2, IKU (haiku

mutants), MINISEEDS3, APETALA2, AUXIN RESPONSE FACTOR2, SHORT HYPOCOTYL

UNDER BLUE1, and DA1 (Li et al., 2008b, Zhou et al., 2009, North et al., 2010). Putative

orthologs of these genes were either not found in the pea seed coat transcriptome or showed

comparable expression across all five cultivars (data not shown). In these analyses, contig12605 represented a single, compelling target gene differentially expressed in ‘Solido’. It should be noted, however, that the biochemical function (or substrate) of any of the CYP78A enzymes is not known.

4.4.2 PACs versus PLCs

Analysis of ‘Courier’/‘Solido’ compared to ‘Alaska’/‘Canstar’ identified 76 unigenes highly expressed in the PACs relative to the PLCs (Appendix A.2). ‘LAN3017’ PA polymers are significantly longer than ‘Courier’ and ‘Solido’ PAs, and are composed primarily of procyanidin

subunits, whereas ‘Courier’ and ‘Solido’ PAs are derived from prodelphinidin subunits (Table 3.1).

Therefore, to minimize intra-group variation with respect to PA profile, ‘LAN3017’ was excluded from the PAC group for the initial screen. Twenty-five of the 76 unigenes differentially expressed in the two PACs belong to the PA biosynthetic pathway (Appendix A.2). The remaining unigenes were further evaluated based on functional annotation and the expression profiles of putative orthologs in publicly available databases. Of these 49, three unigenes

(contig11789, 03509, 03511) representing two different transcripts were of particular interest.

Contig11789 was annotated as a strictosidine synthase-like (SSL) protein with homology to AT1G08470 (Table 4.5). Strictosidine synthase is known to be involved in alkaloid biosynthesis (Treimer and Zenk, 1979). Although an SSL gene family is present in Arabidopsis, this plant is not known to produce complex alkaloids, and the predicted SSL proteins lack a conserved catalytic residue, indicating none are likely to have strictosidine synthase activity

(Kibble et al., 2009). However, the reaction catalyzed by strictosidine synthase shares similar properties with proposed mechanisms for non-enzymatic PA oligomerization, such as a positively charged intermediate, non-stereospecific oligomerization under acidic conditions, as well as enzymatic oligomerization of two aromatic compounds in a stereoselective orientation

(Dixon et al., 2005, Maresh et al., 2007).

Contig03511 and contig03509 encode a partial coding sequence with homology to

AT2G47115 (Table 4.5). A consensus sequence containing a 915-bp open reading frame (304 amino acids) was produced by assembly of Illumina reads. This clone was named pea PA-related

Unknown Protein (PsPUP). PsPUP lacks any known functional domains but appears to be conserved in a number of diverse plant species based on BLASTx homology, including M. truncatula (E-value = 4e-176, 78% amino acid identity), Cicer arietinum (chickpea; 2e-154,

74%), Phaseolus vulgaris (common bean; 6e-136, 63%), Theobroma cacao (cocoa tree; 3e-126,

60%), Populus trichocarpa (black cottonwood; 1e-123, 58%), Prunus persica (peach; 9e-123,

59%), Glycine max (soybean; 2e-121, 58%), Vitis vinifera (grape; 1e-116, 55%) and Arabidopsis

(2e-72, 55%).

Significantly, the M. truncatula ortholog (Mtr.51818.1.S1_at) is down-regulated 23-fold in Mtwd40 (TTG1-like) mutant plants and up-regulated 14-fold in AtTT2-overexpressing M. truncatula hairy roots (Pang et al., 2008, Pang et al., 2009). These results indicated that this gene is regulated by the same transcription factors that control known PA biosynthetic genes. Using publicly available M. truncatula microarray data⁸, expression patterns of PA biosynthetic genes

and unigenes of unknown function can be compared. M. truncatula orthologs of both contig03511 and contig11789 (SSL) showed expression patterns very similar to that of MtANR, though

only the contig03511 ortholog displayed induction by the Arabidopsis PA-specific transcription

factor TT2 (Fig. 4.7; Nesi et al., 2001). These results suggest that the genes encoded in these

unigenes (contig03511 and contig11789) may have implications in PA metabolism. Further

characterization of PsPUP is shown in chapter 5.

4.4.3 Hormone metabolism in developing pea seed coat

The success of differential gene expression analyses in identifying novel genes possibly

related to pea phenotypic traits prompted a broader investigation of gene expression in

developing pea seed coat. Gene expression relating to the metabolism of several major hormones

was examined in an effort to uncover potential hormonal differences between the cultivars.

⁸ Medicago truncatula Gene Expression Atlas (http://mtgea.noble.org/v3/)

4.4.3.1 Gibberellin metabolic genes

Two different pathways for gibberellin (GA) biosynthesis exist in plants, which differ

based on the presence or absence of a hydroxyl group at C13 in the precursor GA12; pea seeds primarily produce 13-hydroxylated GAs (Sponsel, 1995, Nadeau et al., 2011). Three genes,

PsGA20ox, PsGA3ox and PsGA2ox, each consisting of at least two isoforms in pea, are responsible for producing biologically active GAs and the ‘deactivation’ of active GAs (Fig.

4.8A). PsGA20ox catalyzes the formation of GA20/GA9 (i.e. C13-OH/C13-H), which are

converted to the biologically active GA1/GA4 by PsGA3ox (Lester et al., 1996, García-Martínez

et al., 1997, Lester et al., 1997, Weston et al., 2008). GA1/GA4 and GA20/GA9 can be

‘deactivated’ by PsGA2ox, yielding GA8/GA34 and GA29/GA51, respectively (Lester et al.,

1999, Martin et al., 1999).

PsGA20ox, PsGA3ox, and PsGA2ox were each represented by a small group of unigenes

(2-5; Appendix A.3). PsGA20ox and PsGA3ox displayed comparable expression levels across all five pea cultivars, though PsGA20ox expression was dramatically higher than PsGA3ox (Fig.

4.8B). In contrast, expression of PsGA2ox varied significantly between the cultivars, particularly in ‘Solido’ where PsGA2ox expression exceeded that of PsGA20ox (Fig. 4.8B). Conversely, in

‘Alaska’ PsGA2ox expression was comparable to that of PsGA3ox (Fig. 4.8B).

The expression of these three GA metabolic genes provides insight into the developmental stage of pea seed coat at 10 DAA. Early in ‘Alaska’ seed coat development, expression of PsGA20ox and PsGA3ox increases sharply between 8-10 and 10-14 DAA, respectively, followed by rapid declines (Nadeau et al., 2011). Thus, the similar expression levels of PsGA20ox and of PsGA3ox across the seed coats of all five cultivars indicate a comparable developmental stage at 10 DAA (Fig. 4.8B).

The high expression of ‘Solido’ PsGA2ox is curious. While correlating hormone levels to

gene expression can be hazardous, with respect to GA metabolism in pea the correlation appears

consistent (Nadeau et al., 2011). GAs have been suggested to influence pea seed coat growth rate

as well (Nadeau et al., 2011). In ‘Alaska’, growth of the branched parenchyma cell layer of the seed coat is preceded by the peak in PsGA3ox expression (Nadeau et al., 2011). Thus, it is interesting to consider whether high PsGA2ox expression influences the rate at which ‘Solido’ seed coat develops.

4.4.3.2 Abscisic acid metabolic gene expression in pea seed coat

Abscisic acid (ABA) is a mediator of plant environmental stress responses and regulates

a wide array of developmental processes. Unigenes annotated as ABA biosynthetic enzymes

zeaxanthin epoxidase, 9-cis-epoxycarotenoid dioxygenase (NCED3), ABA2, and abscisic aldehyde oxidase (AAO) as well as an ABA catabolic cytochrome P450, CYP707A3 (Nambara and Marion-Poll, 2005), were all identified in the pea seed coat transcriptome (Appendix A.3).

The biosynthetic genes displayed significantly higher expression compared to CYP707A3, consistent with the role of ABA in early seed development to promote embryo growth and prevent abortion (Nambara and Marion-Poll, 2005). Overall, the expression of ABA metabolic genes was comparable across all five cultivars and indicated a state of ABA biosynthesis in pea seed coat at 10 DAA.

4.4.3.3 Auxin metabolic genes

Auxins play a major role in modulating plant growth and development. Amino acid conjugation is an important process in auxin homeostasis as free auxins are predominantly more

bioactive than auxin-conjugates (Zhao, 2010). Conjugation is thought to be involved in auxin catabolism, though it also serves as a mechanism to sequester auxins in seeds for subsequent use during germination (Normanly et al., 2010). As elucidation of de novo biosynthesis of auxins in plants remains incomplete, flux through the auxin pathway cannot be accurately predicted based only on transcript abundance. However, a number of auxin biosynthetic genes, including

ALDEHYDE OXIDASE1 (AAO1), two YUCCA family members (YUC10 and YUC11) and two

auxin-related transcription factors (STY1 and NPY1), were expressed at least 20-fold lower than

two auxin-amido synthetases (GH3.5 and GH3.6) across all five cultivars, suggesting active

catabolism or storage of auxins at 10 DAA in pea seed coat (Appendix A.3; Staswick et al.,

2005, Cheng et al., 2006, Sohlberg et al., 2006, Cheng et al., 2007a, Cheng et al., 2007b, Zhao,

2010). Consistent with this, the expression of the GH3 synthetases was higher than that of a number of unigenes (contig02455, 02456, 02459, 02460) annotated as auxin-conjugate hydrolases (Appendix A.3).

Overall, hormone-related gene expression was generally similar among the cultivars, which is consistent with a shared developmental stage in all five cultivars at 10 DAA.

However, the high expression of PsGA2ox in ‘Solido’, and to a lesser degree in ‘Canstar’, was an exception. Further study is required to determine the impact of this difference on GA metabolism in ‘Solido’ seed coat.

4.4.4 Carbohydrate and protein metabolism in pea seed coat

Early in seed development, the seed coat functions as a transient storage organ, maintaining reserves of amino acids and sugars while also modifying and transferring these nutrients to the embryo (Weber et al., 2005). Consistent with this biological role, protein

metabolism (19%), transport (12%) and carbohydrate metabolism (9%) were among the most

prominent GO processes represented in the pea seed coat transcriptome (Fig. 4.4).

4.4.4.1 Amino acid transporters

Protein accumulation in the seed requires sufficient uptake of amino acids delivered from the phloem and dispersed to the embryo/endosperm by the cells of the seed coat (Weber et al.,

2005). Amino acid permeases (AAPs) are a major group of transporters involved in cellular uptake of amino acids, and are of particular interest as overexpression of fava bean AAP1 was able to increase protein content in pea seeds (Weigelt et al., 2008). Putative pea orthologs of

AtAAP1 and AtAAP6 were among the most highly expressed amino acid transporters identified (Appendix A.3). Interestingly, a putative pea ortholog of Siliques Are Red1 (SIAR1), a bidirectional Arabidopsis amino acid transporter involved in amino acid accumulation in developing seeds (Ladwig et al., 2012), displayed the highest expression of all the amino acid transporters examined. Furthermore, SIAR1 expression was approximately 2-4 fold higher in

‘Canstar’ and ‘Solido’ compared to the other three cultivars (Appendix A.3). Overall, the results were consistent with the role of the seed coat in the uptake of nitrogen compounds from phloem sap and dispersion to the embryo.

4.4.4.2 Amino acid metabolism

The seed coat also metabolizes amino acids received from the phloem prior to their transport to the embryo (Weber et al., 2005). The major amino acids transported in pea phloem sap to developing seeds are glutamine and asparagine, with minor amounts of glutamate, aspartate and the non-protein amino acid homoserine (Rochat and Boutin, 1991). However, the

major amino acids released by young pea seed coats are glutamine, alanine and threonine, with

virtually no asparagine detectable, demonstrating significant amino acid metabolism by the cells

of the seed coat (Rochat and Boutin, 1991, Delgado-Alvarado et al., 2007). Unigenes annotated

as genes related to glutamine, glutamate, aspartate, asparagine, threonine and alanine metabolism

all showed comparably high levels of expression in all five pea cultivars, supporting the role of

the seed coat in metabolizing amino acids destined for the embryo and endosperm (Appendix A.3). However, expression of one type of gene in amino acid metabolism was noticeably different in ‘Solido’. Two unigenes (contig08742, 15427) annotated as asparagine synthetase,

which displayed approximately 10- to 50-fold higher expression in ‘Solido’ compared to the

other four cultivars, were notable as expression of asparagine synthetase correlates with soluble

protein levels in Arabidopsis seeds (Lam et al., 2003).

4.4.4.3 Polysaccharide metabolism

During the first stage of legume seed development, extracellular seed coat invertases metabolize sucrose from the phloem into hexose, which is supplied to the developing embryo (Sturm and Tang, 1999). As development progresses, a transition from a high hexose to a high sucrose state triggers a shift from the pre-storage phase (seed coat growth) to the storage phase of seed development (Weber et al., 2005). Invertases and sucrose synthases are generally expressed sequentially, with increasing sucrose synthase activity later in seed development (Weber et al., 2005). Consistent with this, eight unigenes annotated as sucrose synthases displayed significantly higher expression than seven invertase unigenes across all five cultivars (Appendix A.3).

Interestingly, a pectin methylesterase inhibitor (PMEI; contig16364) was identified that was predominantly expressed in the three PACs (~20-fold higher than in the PLCs; Table 4.5; Appendix A.3). Pectin is a major component of cell walls and seed mucilage. Pectin methylesterase (PME) activity is of commercial interest as PMEs degrade pectin during storage, resulting in softer fruit and vegetable products (Giovane et al., 2004). Constitutive expression of a PMEI in Arabidopsis and wheat (Triticum durum ‘Svevo’) increased resistance to a fungal pathogen due to modifications in cell wall pectin (Lionetti et al., 2007, Volpi et al., 2011). Thus, pea cultivars with higher PMEI content may be better suited for commercial processing and storage, and may also have increased resistance to microbial pathogens. However, the reason for the differential expression of PMEI (20-fold higher in PAC than in PLC) is not clear.

Overall, the analyses of protein and carbohydrate metabolic genes were consistent with the biological role of the seed coat early in development, supporting a high degree of confidence in the accuracy of the RNA-Seq dataset. While the expression of the genes examined was mostly consistent between the five cultivars, the high expression of SIAR1 and an asparagine synthetase was interesting as these may have biological implications relating to seed nutritional value.

4.5 Summary

Whole transcriptome sequencing of five pea cultivars enabled high-resolution examination of the metabolic processes active in early pea seed coat development and investigation of gene expression differences between the cultivars. The wide variety of pea cultivars, which possess clear phenotypic differences, provides an ideal system with which seed coat physiology can be studied.

The work presented here demonstrates the feasibility of exploiting the diversity within P. sativum to identify target genes related to phenotypic variations. A number of putative molecular breeding and biotechnology targets that may be useful in developing new pea varieties were identified. For example, ‘Solido’ produced significantly heavier seeds than the other four cultivars, and displayed a number of gene expression differences, of which the high expression of a CYP78A gene family member was the most interesting (Table 4.5).

Analysis of transcriptional differences relating to PA biosynthesis identified three interesting features. ‘LAN3017’ produced longer PA polymers than ‘Courier’ or ‘Solido’ (Table 3.1) and also showed slightly higher expression of the general PA biosynthetic pathway compared to the other two PACs (Table 4.4), raising the interesting possibility that pathway flux may influence the mean degree of polymerization of seed coat PAs. Additionally, two novel genes, contig11789 and PsPUP, were identified (Table 4.5). Contig11789 remains uncharacterized, yet is an interesting target given its annotation as an SSL protein and the chemistry of the strictosidine synthase reaction. However, as expression of the M. truncatula PUP is under the control of two known PA-related transcription factors (Section 4.4.2), PsPUP was chosen for further characterization in the next chapter.

Figure 4.1 Dry seed from Pisum sativum cultivars ‘Alaska’, ‘Canstar’, ‘Courier’, ‘LAN3017’, and ‘Solido’. Mean dry weight per seed ± standard error (n = 200 seeds per cultivar).

Figure 4.2 Relative expression of PA biosynthetic genes in pea seed coats at 10 DAA determined by qRT-PCR. A) PAL, phenylalanine ammonia lyase; DFR, dihydroflavonol 4-reductase; ANR, anthocyanidin reductase; LAR, leucoanthocyanidin reductase. B) Scale adjusted to visualize ‘Canstar’ and ‘Alaska’. Relative expression set as R=1 for ‘Solido’. Actin used as a reference gene. Data are means ± SD (n=3).

Figure 4.3 Comparison of de novo assembled pea transcriptomes. De novo transcriptome assemblies were mapped to the Medicago truncatula 4.0 representative gene model database. NB, Newbler. CLC, CLC Genomics Workbench. Previously published pea transcriptomes: Franssen et al. (2011) and Kaur et al. (2012). A) Total number of unigenes mapped. B) Percentage of total unigenes mapped. C) Average length of mapped unigenes. D) Mapping of ‘Courier’ Illumina PE reads to different assemblies. E) Mapping of Illumina PE reads from each pea cultivar to the 454NB assembly.

Figure 4.4 GOSlim biological process term assignments for 454NB transcriptome. Percentage of total 454NB unigenes with GO annotation (6,932) is shown. Only processes representing ≥ 2% are included.

Figure 4.5 PsA (bHLH transcription factor) mutations in ‘Alaska’ and ‘Canstar’. A) Genomic DNA sequence, recovered by PCR cloning and Sanger sequencing, at the border of intron 6 (green). Exon sequence shown in blue. A G-to-A conversion (arrow) disrupts the GT splice site motif. B) Complementary DNA sequence, recovered from de novo assembly of Illumina paired-end reads, shows an eight-nucleotide insertion in ‘Alaska’ and ‘Canstar’, before the spliceosome recognizes the next GT splice site motif, which creates a premature stop codon (yellow) in these two cultivars. PsA (GU13294).

Figure 4.6 Comparison of gene expression determined by qRT-PCR and Illumina RNA-Seq relative to ‘Canstar’ or ‘Solido’. ‘Canstar’, black squares. ‘Solido’, blue dots. Data points represent the expression of 21 genes (PAL, DFR, LAR, ANR, PAR putative, TT2, TT8, TTG1, TTG2, contig03511, contig11789 STR-like, contig13014, contig12490, contig10591, contig14173, Rab-5C, contig19292, contig11924, contig12107, contig12117, contig09978) in 10 DAA seed coat from the five cultivars relative to ‘Canstar’ (R² = 0.683) and ‘Solido’ (R² = 0.758). Expression of DFR and CL139 for ‘Canstar’ and ‘Alaska’ omitted from comparison because qRT-PCR CT values >30.

Figure 4.7 Microarray expression profiles of M. truncatula orthologs in different tissues and in transgenic hairy roots overexpressing β-glucuronidase (vector control) or A. thaliana transparent testa2 (AtTT2). Gene and probe ID: F3'H, flavonoid 3'-hydroxylase (Mtr.36333.1.S1_at). DFR, dihydroflavonol 4-reductase (Mtr.38073.1.S1_at). ANR, anthocyanidin reductase (Mtr.44985.1.S1_at). UGT72L1, UDP-glucosyltransferase72L1 (Mtr.21996.1.S1_x_at). Contig03511, Mtr.51818.1.S1_at. Contig11789, Mtr.14895.1.S1_at. dap, days after pollination. Y-axis, arbitrary units for relative expression. Data from M. truncatula Gene Expression Atlas (http://mtgea.noble.org/v3).

Figure 4.8 Gibberellin metabolism pathway and related gene expression in pea seed coat at 10 DAA. A) Schematic displaying the formation of biologically active GAs (shaded in yellow) and inactive GAs (shaded in grey). Enzymatic steps are colour coordinated with the enzymes responsible. B) Relative transcript abundance (RPKM) of unigenes representing GA20ox, GA3ox and GA2ox in each of the pea cultivars.

Table 4.1 NGS reads summary.

            Illumina Hi-Seq 2000              Roche/454-Titanium
Cultivar    # reads       Avg. length (bp)    # reads    Avg. length (bp)
Alaska      51,412,519    98.8                205,584    334.5
Canstar     61,467,949    98.7                387,083    336.3
Courier     68,300,520    98.8                154,610    337.1
LAN3017     53,523,294    98.8                173,974    338.2
Solido      60,234,719    98.8                203,361    338

Table 4.2 Basic assembly metrics. NB, Newbler. CLC, CLC Genomics Workbench. Assemblies were blasted against the Arabidopsis TAIR10 and Medicago truncatula 4.0 protein databases.

Assembly   Reads            Assembler   N50    Avg contig    # contigs   BLASTx hits
                                               length (bp)               TAIR   Mt
454NB      454              Newbler     1272   829           19,199      77%    83%
454CLC     454              CLC         1142   892           20,596      77%    84%
ILM        Illumina         CLC         1291   778           88,122      34%    41%
454-ILM    Illumina & 454   CLC         1184   748           92,931      35%    45%

Table 4.3 Number of unigenes matching to known pea gene sequences in different de novo assemblies. Single copy genes: LH, HMG-IY, plastocyanin, Fed-1, and ApxI. Pisum sativum flavonoid and proanthocyanidin biosynthetic genes: PsANR, PsDFR, PsLAR.

Gene           Accession   454NB   454CLC   ILM   ILM-454   Franssen   Kaur
LH             AY245442    1       1        5     5         2          5
HMG-IY         X99373      3       1        6     5         1          3
plastocyanin   X16082      1       1        1     1         5          1
Fed-1          M31713      1       1        2     2         5          1
ApxI           X62077      1       1        1     3         5          1
PsANR          KF516483    2       1        1     2         0          1
PsDFR          KF516484    3       3        2     2         0          1
PsLAR          KF516485    1       1        1     1         0          3

Table 4.4 Illumina RNA-Seq estimation of expression of known PA biosynthetic and regulatory genes in 10 DAA pea seed coats. PAC, proanthocyanidin accumulation cultivars (‘Courier’, ‘LAN3017’, ‘Solido’). PLC, proanthocyanidin lacking cultivars (‘Alaska’, ‘Canstar’). PAL, phenylalanine ammonia lyase; C4H, cinnamate 4-hydroxylase; 4CL, 4-coumarate:CoA ligase; CHI, chalcone isomerase; CHS, chalcone synthase; F3'5'H, flavonoid 3',5'-hydroxylase; F3'H, flavonoid 3'-hydroxylase; F3H, flavanone 3-hydroxylase; DFR, dihydroflavonol 4-reductase; ANS, anthocyanidin synthase; ANR, anthocyanidin reductase; LAR, leucoanthocyanidin reductase; TT12, MATE transporter; TT15, UDP-glucose:sterol-glucosyltransferase; TT2, MYB transcription factor; PsA (TT8 homolog), bHLH transcription factor; PsA2 (transparent testa glabra 1 homolog), WD40 transcription factor; TTG2, transparent testa glabra 2 (WRKY transcription factor).

Gene       Courier   LAN3017   Solido    Alaska    Canstar   Fold difference (PAC v PLC)
PAL*       356.1     445.1     244.3     70.6      38.7      6.4
C4H        313.8     344.1     264.7     86.5      25.3      5.5
4CL*       206.4     168.8     144.2     70.9      71.2      2.4
CHI*       306.6     458.3     449.0     142.3     122.0     3.1
CHS*       2543.2    3668.9    2214.0    11.1      4.0       371.7
F3'H*      196.6     281.3     168.8     86.6      18.8      4.1
F3'5'H*    3834.0    146.2     2843.5    116.6     634.2     6.1
F3H*       3248.8    5412.7    3361.6    2208.1    2827.6    1.6
DFR*       1534.7    1819.4    1527.7    757.1     726.3     2.2
ANS*       573.3     725.9     485.3     287.6     55.1      3.5
LAR        283.9     366.9     84.5      33.0      2.3       13.9
ANR*       4552.4    6615.6    5721.2    3.9       386.3     28.9
TT12       235.1     303.9     97.8      2.6       1.1       115.5
TT15       45.4      48.2      42.1      6.7       7.3       6.5
TT2*       45.6      70.9      42.9      72.2      45.7      0.9
TT8/A      44.4      39.2      40.6      3.7       1.6       15.6
TTG1/A2    36.4      24.0      22.8      29.1      25.7      1.0
TTG2       82.6      83.6      66.0      7.7       2.6       15.0

*RPKM values of genes represented by more than one unigene were normalized based on the combined number of reads mapped to the longest unigene per million mapped reads.
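The fold differences in Table 4.4 can be approximately reproduced from the tabulated RPKM values. The following is a minimal sketch, assuming that the fold difference is the mean PAC RPKM divided by the mean PLC RPKM, and that the footnoted normalization pools the reads of all unigenes of a gene and divides by the length of the longest unigene. Both are interpretations of the table rather than the thesis's stated procedure, and the function names are hypothetical. Small deviations from the printed values (e.g. for CHS) are expected because the inputs are rounded.

```python
from statistics import mean


def rpkm(reads, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads * 1e9 / (length_bp * total_mapped_reads)


def combined_rpkm(read_counts, lengths_bp, total_mapped_reads):
    """Assumed reading of the table footnote: pool the reads of all
    unigenes of a gene and normalize by the longest unigene."""
    return rpkm(sum(read_counts), max(lengths_bp), total_mapped_reads)


def fold_difference(pac_values, plc_values):
    """Assumed definition: mean PAC expression over mean PLC expression."""
    return mean(pac_values) / mean(plc_values)


# RPKM values copied from Table 4.4:
# PAC = (Courier, LAN3017, Solido), PLC = (Alaska, Canstar)
table = {
    "PAL": ([356.1, 445.1, 244.3], [70.6, 38.7]),
    "C4H": ([313.8, 344.1, 264.7], [86.5, 25.3]),
    "LAR": ([283.9, 366.9, 84.5], [33.0, 2.3]),
}

fold = {gene: round(fold_difference(pac, plc), 1)
        for gene, (pac, plc) in table.items()}
print(fold)
```

For these three genes the computed ratios round to the values printed in the table, which is consistent with (but does not prove) the assumed definition.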

Table 4.5 Summary of genes of interest identified through differential gene expression analysis. RPKM values: A, ‘Alaska’; Cn, ‘Canstar’; Cr, ‘Courier’; L, ‘LAN3017’; S, ‘Solido’.

Unigene     E-value    AGI/UniProt   Description                              Pea cultivar RPKM
(contig#)                                                                     A      Cn     Cr       L       S
12605       8.7E-132   AT3G61880/    CYP78A9                                  1.4    1.1    1.4      1.3     158.4
03509       7.26E-17   AT2G47115     unknown protein                          0.1    0.1    86.3     156.0   96.5
03511       4.88E-52   AT2G47115                                              0.1    0.0    60.5     128.4   69.6
11789       2.3E-171   AT1G08470     strictosidine synthase-like 3            0.4    0.7    164.9    221.4   140.0
16364       4.31E-55   AT5G62350     Plant invertase/pectin methylesterase    58.8   68.5   1188.2   858.8   1273.2
                                     inhibitor superfamily protein

Chapter Five: Characterization of PA-related Unknown Protein

5.1 Introduction

Differential gene expression analysis identified one gene, referred to as PA-related Unknown Protein (PUP), which encodes a protein of unknown function and shows an expression pattern exclusive to PA-accumulating cultivars (PACs) of pea compared to PA-lacking cultivars (PLCs; section 4.4.2). In addition, in silico analysis indicated that the Medicago truncatula PUP ortholog is regulated by known PA transcription factors (i.e., 23-fold decrease in the MtTTG1 mutant and 14-fold increase by AtTT2 overexpression; see section 4.4.2). These results, together with the comparative transcriptomics data, strongly suggested that PUP is involved in PA metabolism, and thus PUP was selected for further characterization.

The genomic resources available for Arabidopsis thaliana far surpass those for pea, and knockout lines for genes of interest can be identified in Arabidopsis knockout databases. For these reasons, further characterization of P. sativum PUP (PsPUP) was conducted using the Arabidopsis ortholog, AT2G47115 (AtPUP).

5.1.1 PA extraction and analysis

Analysis of PAs requires their extraction from plant material. While flavan-3-ol monomers are soluble in water, extraction of oligomeric and polymeric PAs requires organic solvents (Hümmer and Schreier, 2008). Typically, aqueous methanol, acetonitrile or acetone is used for PA extraction (Routaboul et al., 2006, Hümmer and Schreier, 2008). However, no method exists to exhaustively extract PAs; thus analyses of PAs are conducted based on solvent soluble (extractable) and insoluble (non-extractable) fractions. Soluble PA fractions roughly consist of short oligomers (e.g. mDP less than approximately eight), whereas insoluble fractions contain long polymers and oxidized PAs bound to cellular material (i.e. cell wall; Routaboul et al., 2006). The composition of the fractions is further contingent on the choice of organic solvent and acidification of the solvent or addition of antioxidants, all of which can affect the efficiency of solubilisation (Hümmer and Schreier, 2008).

Several spectrophotometric methods exist for analyzing PA content, including the acid butanol, vanillin, and p-dimethylaminocinnamaldehyde (DMACA) assays, of which the acid butanol assay is the most widely used for combined analysis of soluble and insoluble PAs (Porter et al., 1985, Hümmer and Schreier, 2008). After grinding the plant material and extracting soluble PAs in an organic solvent, the soluble and insoluble fractions are separated by centrifugation. The two fractions are then mixed separately with butanol:HCl and heated to drive acid hydrolysis. This process depolymerizes the PAs and produces anthocyanidins, which are detected with a spectrophotometer (Porter et al., 1985).

5.2 Results

5.2.1 Identification of Arabidopsis PUP ortholog

The PsPUP protein sequence was used for protein BLAST analysis against the TAIR10 Arabidopsis protein database to identify the most homologous gene in Arabidopsis. AT2G47115 (AtPUP) was identified from this search. AtPUP shares an overall 55% amino acid identity (E-value 2e-72) with PsPUP (Fig. 5.1A). However, the C-terminal half of PsPUP is highly conserved with the C-terminal half of AT2G47115 (75% amino acid identity). The next closest homolog was AT1G10660, which encodes a protein with 44% amino acid identity to PsPUP.

Structural prediction by TMHMM⁹ identified six or seven regions likely to form helical transmembrane domains in PsPUP and AT2G47115 (Fig. 5.1B). The transmembrane domain pattern was entirely conserved between the two protein sequences. Therefore, the closest Arabidopsis homolog of PsPUP was predicted to be AT2G47115 (AtPUP). Computational annotation of this gene in the database indicated that AT2G47115 encodes a protein of unknown function.

To more carefully assess the possible redundancy of AtPUP, the Arabidopsis genome was analyzed by BLAST search using AT2G47115 (AtPUP) as a query (E-value cut-off 1e-10). AT1G10660, AT5G62960, AT3G27770, and AT1G70550 were additionally identified, sharing 35-45% amino acid identity with AtPUP. Therefore, there is no apparent gene redundancy of AT2G47115 (AtPUP) by recent duplication, although the possibility that distantly related homologs still perform the same function cannot be excluded. Overall, these in silico analyses support AT2G47115 as the most likely PsPUP ortholog in Arabidopsis, if PUP function is important and conserved in all PA-producing plants.

5.2.2 Expression profiles of PUP in P. sativum and Arabidopsis

Differential gene expression analysis provided an initial basis upon which to consider an association between PsPUP and the PA pathway. To gain further evidence, the temporal expression profile of PsPUP in ‘Courier’ seed coat from 6-20 days after anthesis (DAA) was examined by quantitative real-time PCR (qRT-PCR), along with the spatial expression profile of AtPUP in various Arabidopsis tissues (Fig. 5.2). Except for a slight increase at 10 DAA, the temporal expression of PsPUP was relatively stable during early seed coat development in ‘Courier’ (Fig. 5.2A), and is reminiscent of dihydroflavonol 4-reductase (DFR) expression, which is also generally stable during seed coat development except for a small peak at 10 DAA in ‘Courier’ seed coat (Section 3.2.6, Fig. 3.8C). It is also worth noting that anthocyanidin reductase (ANR) expression peaks at 10 DAA in ‘Courier’ seed coat (Section 3.2.6, Fig. 3.8A). A slight increase in PsPUP expression is seen late in development; however, this may be an artifact of declining expression of actin (the reference gene) due to sclerification of the seed coat.

9 http://www.cbs.dtu.dk/services/TMHMM-2.0/

The spatial expression profile of AtPUP in Arabidopsis flower, leaf, root, stem and immature silique was examined in comparison to ANR, chalcone synthase (CHS), and DFR (Fig. 5.2B). Expression of AtPUP was limited to tissues in which DFR and ANR were also expressed. Significantly, AtPUP was not expressed in leaf, where only CHS transcripts were detected, which may implicate AtPUP in the flavonoid branch of the phenylpropanoid pathway. There was no indication of AtPUP expression in root or stem. Together, these results point to the co-regulation of PUP with flavonoid and PA pathway genes. This is consistent with the influence of known PA transcription factors on the expression of the M. truncatula PUP ortholog (section 4.4.2).
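The reference-gene caveat raised for the late-development increase in PsPUP expression can be illustrated numerically. The sketch below assumes the common 2^-ΔΔCt model with 100% amplification efficiency; this section does not state which quantification model was used, and all Ct values are hypothetical. It shows that if the target Ct is unchanged but the reference (actin) Ct rises by one cycle because actin transcripts decline, the calculated relative expression of the target doubles.

```python
def relative_expression(ct_target, ct_reference,
                        ct_target_calibrator, ct_reference_calibrator):
    """Relative expression by the 2^-ddCt model (assumes 100% efficiency)."""
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return 2 ** -(delta_sample - delta_calibrator)


# Hypothetical Ct values; the calibrator sample defines R = 1.
calibrator = relative_expression(25.0, 20.0, 25.0, 20.0)

# Same target Ct, but the reference gene is detected one cycle later,
# i.e. reference transcript abundance has roughly halved.
late_stage = relative_expression(25.0, 21.0, 25.0, 20.0)

print(calibrator, late_stage)
```

Under these assumptions, an apparent two-fold increase arises without any change in the target transcript, which is why a declining reference gene can mimic induction.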

5.2.3 Subcellular localization of PUP

Full-length PUP was isolated from cDNAs of Arabidopsis siliques, and this clone was used for subcellular localization. Localization of PUP was studied by Agrobacterium-mediated transient expression of GFP-fusion constructs in tobacco (Nicotiana benthamiana) leaf epidermal pavement and stomatal guard cells. Fluorescence was visualized by confocal laser scanning microscopy. N-terminal GFP-tagged AtPUP (GFP-AtPUP) was localized to the cytosol, similar to free GFP (Fig. 5.3A and B). In contrast, the C-terminal fusion product (AtPUP-GFP) was localized to small, bright punctate formations that appeared cytosolic (Fig. 5.3C). Closer examination revealed that in addition to the punctate formations, GFP fluorescence appeared to be present on small vesicle-like structures (Fig. 5.3D). As PAs accumulate in the central vacuole, subcellular localization of these vesicles was studied by co-expression of a tonoplast (vacuolar membrane) marker, γ-TIP-RFP (Fig. 5.3E, H and K). Interestingly, these vesicle-like structures and several punctate formations appeared to be present on the lumenal side of the tonoplast (Fig. 5.3F). As stomatal guard cells are physiologically more active than pavement cells, stomatal cells co-transformed with AtPUP-GFP and γ-TIP-RFP were also observed. In stomatal cells, AtPUP-GFP was present on numerous vesicle-like structures of varying size (Fig. 5.3G and J). Consistent with the observed localization in pavement cells, the vesicle-like features appeared to be present on the lumenal side of the tonoplast in stomatal cells (Fig. 5.3I and L).

Merging the focal planes provided a three-dimensional image of the vesicle-like structures and punctate formations present in AtPUP-GFP-expressing cells (Fig. 5.4A and D). The central vacuole (γ-TIP-RFP) occupied the majority of the intracellular space (Fig. 5.4B and E). The nucleus was visible as an indentation in the tonoplast. Numerous transvacuolar strands, presumably cytoplasmic streaming channels, were observed, particularly in the pavement cells. These channels are delimited by the tonoplast and connect on the peripheries of the cell. The vesicle-like structures and punctate formations appeared to associate with the cytosol, or possibly the lumenal side of the tonoplast, often within the vicinity of these transvacuolar strands (Fig. 5.4F).

Interestingly, a video of AtPUP-GFP taken with a spinning-disc confocal microscope revealed movement of the punctate structures, indicating that they may be mobile (data not shown). Overall, the subcellular localization of AtPUP-GFP supported the in silico transmembrane prediction (Fig. 5.1) and indicated that AtPUP-GFP localizes to punctate particles and vesicles, with the vesicles appearing to be located on the lumenal side of the tonoplast, within the vacuole.

5.2.4 Identification of transposon insertion pup mutants

Involvement of AtPUP in PA biosynthesis was further investigated using A. thaliana Nossen homozygous mutants containing a transposon insert in the 5'-exon of AtPUP (Fig. 5.5), along with segregant wild-type lines from the self-crossing of the heterozygous transposon insert parental line. Homozygous knockout (KO) and wild-type segregant (WT) lines were screened by genomic PCR using primers flanking the transposon insertion site and a third primer specific for the transposon. PCR conditions were set such that when primers flanking the transposon site were used, only the wild-type allele was able to amplify; the large size of the transposon prevented amplification of alleles containing the insert. A second PCR reaction was run using a primer specific for the transposon. When used together, these two PCR reactions were able to discern wild-type, heterozygous and homozygous plants. Screening of ten progeny of a heterozygous self-cross identified three wild-type segregants (WT: #4, 6 and 10) and six homozygous transposon insertion lines (KO: #3, 5, 7, 8, 9 and 12; Fig. 5.6A). Line #11 produced an unexpected AtPUP amplicon size and was therefore excluded from further analysis. The genomic DNA screening results were confirmed by reverse transcription (RT)-PCR using cDNA generated from immature silique RNA. AtPUP transcripts were only detected in lines #4, 6 and 10 (Fig. 5.6B), in agreement with the genomic PCR screen.
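The logic of the two-reaction genomic PCR screen can be summarized as a simple decision rule. The following is an illustrative sketch only: the function name, the plant labels and the band patterns are hypothetical and are not the gel results shown in Fig. 5.6.

```python
def call_genotype(wt_amplicon, transposon_amplicon):
    """Classify a plant from the presence/absence of the two amplicons.

    wt_amplicon: band from the primers flanking the insertion site
                 (amplifies only the uninterrupted wild-type allele).
    transposon_amplicon: band from the transposon-specific primer
                 (amplifies only alleles carrying the insert).
    """
    if wt_amplicon and transposon_amplicon:
        return "heterozygous"
    if wt_amplicon:
        return "wild-type"
    if transposon_amplicon:
        return "homozygous insertion"
    return "no call"  # neither reaction amplified; repeat the PCR


# Hypothetical band patterns: (flanking-primer band, transposon band)
bands = {
    "plant_a": (True, False),
    "plant_b": (True, True),
    "plant_c": (False, True),
    "plant_d": (False, False),
}
calls = {plant: call_genotype(*pattern) for plant, pattern in bands.items()}
print(calls)
```

Running both reactions on every plant matters because a missing wild-type band alone cannot distinguish a homozygous insertion from a failed reaction.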

5.2.5 PA characterization in Arabidopsis pup mutants

Chemical and visible phenotypes of KO and WT plants were examined. KO and WT plants cultivated in identical conditions showed no visible phenotypic differences, including in seed coat colour (Fig. 5.7A). The PA profile of WT and KO seeds was examined by butanol:HCl and phloroglucinol acid hydrolysis, which provide information about PA content and the length of the PA polymers (mDP, mean degree of polymerization), respectively (Section 3.2.1). In butanol:HCl analysis, the soluble PA content of KO seeds was 31% higher than that of WT seeds (Fig. 5.7B). However, the insoluble PA content as well as the mDP of PAs in WT and KO seeds were comparable (Fig. 5.7B and C). In conclusion, loss of AtPUP function increased the soluble PA content of Arabidopsis seeds, but insoluble PA content and mDP were not affected in the pup mutant.
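For readers unfamiliar with how mDP is derived from phloroglucinol acid hydrolysis, the calculation can be sketched as follows. This assumes the standard convention that extension subunits are released as phloroglucinol adducts and terminal subunits as free flavan-3-ols, so that mDP is the total molar amount of subunits divided by the molar amount of terminal subunits. The function name and the molar amounts are hypothetical and are not data from this thesis.

```python
def mean_degree_of_polymerization(extension_units, terminal_units):
    """mDP = (extension + terminal subunits) / terminal subunits.

    Both arguments map subunit names to molar amounts in the same units.
    """
    total_extension = sum(extension_units.values())
    total_terminal = sum(terminal_units.values())
    if total_terminal <= 0:
        raise ValueError("at least one terminal subunit is required")
    return (total_extension + total_terminal) / total_terminal


# Hypothetical molar amounts (nmol) recovered after hydrolysis.
extension = {"EC-phloroglucinol": 8.0, "EGC-phloroglucinol": 1.0}
terminal = {"Cat": 1.0}

mdp = mean_degree_of_polymerization(extension, terminal)
print(mdp)
```

Because every polymer chain contributes exactly one terminal subunit, a higher ratio of extension to terminal subunits indicates longer chains on average.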

5.3 Discussion

Expression of PUP appears tightly associated with the flavonoid pathway (Fig. 5.2 and section 4.4.2). The protein was predicted to be membrane-bound, which was consistent with the observed subcellular localization (Fig. 5.3). Analysis of an Arabidopsis pup loss-of-function mutant found an increase in soluble PA content relative to wild-type (Fig. 5.7), which provided further indication of an association between PUP and the PA pathway. The difference in soluble PAs between the WT and Atpup mutant was subtle, and insoluble PA content was not affected in the pup mutant (Fig. 5.7). The higher soluble relative to insoluble PA content is likely a result of the strong organic solvent (75% acetone) used in this work, which is consistent with other studies that employed high concentration acetone versus acetonitrile for soluble PA extraction (Pourcel et al., 2005, Routaboul et al., 2006, Zhao and Dixon, 2009, Li et al., 2011).

The slight increase of soluble PAs in the Arabidopsis pup mutant combined with the subcellular localization suggests that AtPUP is involved in transport of soluble PAs to the central vacuole (also see section 6.3.3.1 for further discussion). However, the apparent lack of a change in the insoluble PA content in seeds of the Atpup mutant raises the question of whether AtPUP alone plays an essential role in PA metabolism in Arabidopsis. AtPUP may partially contribute to the vesicle-mediated trafficking of PAs.

Despite subtle differences in the PA profile of the pup mutant, the localization of AtPUP to vesicle-like structures is an intriguing observation. Several studies indicate that anthocyanins, a group of flavonoids closely related to PAs, are transported by membrane-bound vesicles to the vacuole, which is also believed to be the site of soluble PA accumulation (Poustka et al., 2007, Kitamura et al., 2010, Gomez et al., 2011). Furthermore, there is evidence that anthocyanins are sequestered, at least temporarily, within membrane-bound structures in the vacuolar lumen (Poustka et al., 2007). PA-like materials have been previously reported to accumulate in provacuoles that later merge with the central vacuole (Chafe and Durzan, 1973, Baur and Walkinshaw, 1974). Recently, evidence of vesicles (tannosomes) containing PA-like material was found in grape (Vitis vinifera) and several other species (Brillouet et al., 2013). This body of experimental evidence supports a possible role of PUP in vesicle-mediated trafficking of PAs.

Interestingly, the appearance of the vesicles in plant cells overexpressing AtPUP-GFP is similar to that of vesicles formed by overexpression of the ARA7 Q69L gain-of-function mutant in plants (Jia et al., 2013). ARA7 is a Rab5 GTPase that is localized to prevacuolar vesicles. This similarity also suggests that AtPUP may be involved in vesicle-mediated trafficking inside the cell.

Figure 5.1 Comparison between P. sativum PA-related Unknown Protein (PsPUP) and Arabidopsis AT2G47115 (AtPUP). A) ClustalW protein alignment. Conserved amino acids are shaded in black, similar amino acids are shaded in gray. B) Transmembrane (TM) domain prediction of the probability a residue is in a TM helix (red), or its relative position with respect to the interior (blue) or exterior (pink) of the membrane bound structure by TMHMM.

Figure 5.2 Expression profile of PUP in pea and Arabidopsis determined by qRT-PCR. A) Expression of PsPUP in developing ‘Courier’ seed coat (DAA, days after anthesis). Actin was used as reference. R=1 at 15 DAA. B) Expression of AtPUP in Arabidopsis tissues. ANR, anthocyanidin reductase. CHS, chalcone synthase. DFR, dihydroflavonol 4-reductase. Ubiquitin10 was used as a reference. Silique expression was set as R=1.

Figure 5.3 Subcellular localization of AtPUP transiently expressed in tobacco leaf epidermal pavement cells and a stomatal guard cell. A) GFP only in pavement cell. B) GFP-AtPUP in tobacco pavement cell. AtPUP-GFP in pavement cells (C and D) and a stomatal cell (G and J). γ-TIP-RFP localized to the tonoplast in a pavement (E) and stomatal cell (H and K). F) D and E merged. I) G and H merged. L) J and K merged. Images G-L are from a single cell taken at different z-axis positions. Arrow denotes punctate formation. ex, extracellular region. n, nucleus.

Figure 5.4 Subcellular localization of AtPUP by transient expression in a tobacco leaf stomatal guard cell (left) and pavement cell (right). AtPUP-GFP (A and D). γ-TIP-RFP localized to the tonoplast (B and E). C, A and B merged. F, D and E merged. All images are merged focal planes. n, nucleus.

Figure 5.5 Transposon insertion position in AtPUP in Arabidopsis thaliana Nossen. Exons denoted by thick bars, introns by lines. Line 53-3618-1 was obtained from Riken BioResource Centre.

Figure 5.6 PCR screening of Arabidopsis thaliana Nossen AtPUP transposon insertion homozygous mutants and wild-type segregant plants. A) PCR genomic DNA screen. Top, AtPUP amplicon. Bottom, transposon amplicon. Pos ctrl, DNA from plants previously determined to be homozygous for transposon insert. B) Reverse transcription PCR screen for AtPUP transcripts. Top, AtPUP amplicon. Bottom, loading control (TIM44-2). Neg, no template PCR control.

Figure 5.7 Characterization of seed proanthocyanidins in Arabidopsis thaliana ‘Nossen’ wild-type segregants and Atpup transposon homozygous mutants. A) Mature, air-dried seeds. B) Butanol:HCl assay comparing relative levels of soluble and insoluble PAs in WT and Atpup seeds (WT n=3, KO n=6; *P < 0.05). C) Mean degree of polymerization of PAs in WT and Atpup seeds as determined by phloroglucinol acid hydrolysis (n=3).

Chapter Six: General Discussion

6.1 Pisum sativum proanthocyanidins

Pea (Pisum sativum) was one of the original model organisms for genetics research, used by Mendel in the mid to late 1800s in establishing the laws of inheritance. As a nutritional and commercial crop, pea is one of the most widely cultivated plants on Earth. However, the use of pea in plant genetics lagged behind until recently. A long history of breeding has produced a wide array of pea cultivars with different phenotypic traits. This diversity is a valuable resource that is only now beginning to be exploited due to the advent of next-generation sequencing technology. The work conducted here was designed to demonstrate the feasibility of pea-based comparative transcriptomics and to provide new resources for seed coat genetics and biochemistry studies, particularly the investigation of proanthocyanidin (PA) biosynthesis.

PAs have received renewed research interest due to the potential health benefits associated with their consumption (Serrano et al., 2009). Despite the importance of pea in human nutrition and as a component of livestock feed, no detailed characterization of PA biosynthesis in pea seed coat has been conducted. In pea, PAs primarily accumulate in the vacuoles of the epidermal and ground parenchyma cells of the seed coat (Ferraro et al. submitted). PA accumulation in five pea cultivars was examined and the PA chemical profiles of four were analyzed in detail, demonstrating significant quantitative and qualitative differences. Cultivars ‘Alaska’ and ‘Canstar’ (PA-lacking cultivars, PLCs) did not accumulate PAs (Fig. 3.1 and 4.1), whereas the PAs of ‘Courier’, ‘Solido’, and ‘LAN3017’ (PA-accumulating cultivars, PACs) differed in composition and quantity (Table 3.1). Of the PACs, ‘LAN3017’ contained the highest amount of PAs. ‘Courier’ and ‘Solido’ PAs were composed primarily of delphinidin-derived flavan-3-ol subunits (3',4',5'-hydroxylated B-ring), epigallocatechin (EGC) and gallocatechin (GC), similar to grape (Vitis vinifera; Bogs et al., 2005) and tea (Camellia sinensis; Graham, 1992). In contrast, ‘LAN3017’ PAs were composed of procyanidin flavan-3-ols (3',4'-hydroxylated B-ring), epicatechin (EC) and catechin (Cat). Qualitative PA diversity is a useful trait for nutritional research as differences in PA subunit composition may impact antioxidant potential, which increases with flavan-3-ol B-ring hydroxylation (Rice-Evans et al., 1997). Additionally, the mean degree of polymerization (mDP; average polymer length) of ‘LAN3017’ PAs was 2-3 fold higher than that of ‘Courier’ or ‘Solido’ PAs. The mechanism controlling polymerization remains obscure, but is particularly relevant as bioavailability is inversely related to the mDP (Rasmussen et al., 2005). In this regard, pea offers a valuable system to investigate the molecular basis for subtle biochemical differences that can be related to human diet and health.

6.2 PA-branch point enzymes: anthocyanidin reductase and leucoanthocyanidin reductase

To better understand PA biosynthesis in pea, anthocyanidin reductase (ANR) and leucoanthocyanidin reductase (LAR), which catalyze the PA pathway branch points from the flavonoid pathway (Fig. 1.1), were cloned and biochemically characterized.

6.2.1 Anthocyanidin reductase

PsANR was highly conserved among the three PACs. In vitro biochemical

characterization confirmed that PsANR is highly active towards anthocyanidins (Table 3.4), with

kinetic properties comparable to ANRs from Arabidopsis thaliana, Medicago truncatula, and

grape (Vitis vinifera; Xie et al., 2004b, Gargouri et al., 2009a). With respect to kinetic

preferences between substrates based on their degree of B-ring hydroxylation, PsANR was

similar to MtANR, using the 3'- and 2',3'-hydroxylated substrates, pelargonidin and cyanidin, respectively, more efficiently than 2',3',4'-hydroxylated delphinidin. This was opposite the preference of AtANR, suggesting structural differences influence substrate acceptance. Overall, the high in vitro activity of PsANR and accumulation of significant levels of PAs in pea seed coat were consistent with the central role of ANR in pea PA biosynthesis.
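
Substrate preference of this kind is commonly expressed as catalytic efficiency, kcat/Km. The sketch below ranks substrates by that ratio. The kinetic constants are placeholders chosen only to illustrate the calculation; they are not the measured values reported in Table 3.4, and the function name is hypothetical.

```python
def rank_by_efficiency(kinetics):
    """Sort substrates by kcat/Km, highest first.

    kinetics: dict mapping substrate -> (kcat in s^-1, Km in uM).
    Returns a list of (substrate, kcat/Km in s^-1 uM^-1) tuples.
    """
    efficiency = {s: kcat / km for s, (kcat, km) in kinetics.items()}
    return sorted(efficiency.items(), key=lambda item: item[1], reverse=True)


# Placeholder constants, not measured data.
example = {
    "pelargonidin": (1.2, 20.0),
    "cyanidin": (1.0, 25.0),
    "delphinidin": (0.5, 50.0),
}
ranking = rank_by_efficiency(example)
for substrate, value in ranking:
    print(f"{substrate}: {value:.3f}")
```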

Recently, tea (Camellia sinensis) ANR and VvANR were reported to catalyze the in vitro epimerization of 2,3-cis-flavan-3-ols to 2,3-trans-flavan-3-ols (Gargouri et al., 2009b, Pang et al., 2013a). However, the products of this reaction were the naturally rare stereoisomers 2S,3S-(+)-cis- and 2S,3R-(-)-trans-flavan-3-ols (Fig. 1.2), which have not been reported in either species. Whether epimerization by CsANR and VvANR occurs in planta remains unknown. If it does not, additional factors in planta presumably prevent formation of these products. If it does, other factors may limit the accumulation of these products to levels below the detection limits of current methods. No epimerase activity was detected with PsANR, indicating that PsLAR is the source of trans-flavan-3-ols in pea seed coat.

6.2.2 Metabolic flux through leucoanthocyanidin reductase

The ability of LAR to efficiently synthesize 2,3-trans-flavan-3-ols in vitro is well established (Fig. 3.6; Tanner et al., 2003, Bogs et al., 2005, Pfeiffer et al., 2006). Furthermore, the lack of LAR in Arabidopsis and the low activity of Medicago truncatula LAR support the role of LAR as the endogenous source of trans-flavan-3-ols in PA biosynthesis, as both of these species produce PAs composed almost exclusively of cis-flavan-3-ols (Lepiniec et al., 2006,

Pang et al., 2007). However, the picture of the metabolic flux through LAR remains murky.

In an attempt to learn more about flux through PsANR and PsLAR, the flavan-3-ol

subunit composition of soluble PAs and the relative transcript abundance of these two genes

were examined during pea seed development (Table 3.3). In ‘Courier’, the increase in the molar

percent of GC in PAs from 12 to 25 days after anthesis (DAA) appeared to conflict with the expression

profile of PsLAR (Fig. 3.8B). PsLAR is highly expressed early in development but declines

quickly and reaches negligible levels by 12 DAA. Thus, the source of GC incorporated into PAs

late in development is unclear. One possibility is that a pool of trans-flavan-3-ols accumulates

early in seed coat development, which later provides substrates for polymerization.

Numerous attempts to produce trans-flavan-3-ols in vivo by heterologous expression of

LAR have been unsuccessful (Section 3.2.7; Tanner et al., 2003, Pang et al., 2007, Pang et al.,

2013a). This is surprising as heterologous expression of MtANR, AtANR, CsANR, and two apple

(Malus x domestica Borkh.) ANRs in tobacco (Nicotiana benthamiana) or M. truncatula hairy

roots resulted in a significant increase in monomeric EC and PA accumulation (Xie et al., 2003,

Han et al., 2012, Pang et al., 2013a). Unexpectedly, ectopic MdANR expression also increased

accumulation of Cat in tobacco (Han et al., 2012). In contrast, the co-expression of CsLAR and

Production of Anthocyanin Pigment1 (PAP1), a transcription factor that positively influences

anthocyanin production, in tobacco increased levels of EC substantially more than did co-expression of CsANR and PAP1 (Pang et al., 2013a). Similarly, silencing of anthocyanidin

synthase (ANS), which provides the substrate for ANR and competes for substrate with LAR, in

apple (Malus sp.) led to an expected increase in Cat as well as an unexpected increase in EC

(Szankowski et al., 2009). Thus, there is a disconnect between expected shifts in metabolic flux

through ANR and LAR, and the in planta metabolite accumulation of flavan-3-ols. Tobacco

constitutively expressing CsLAR and PAP1 did accumulate Cat, representing a rare success in

heterologous expression of LAR, but the need to unnaturally increase flux through the flavonoid

pathway using PAP1 raises questions about the biological relevance of these results.

Previous attempts at heterologous LAR expression have employed a common constitutive

cauliflower mosaic virus 35S (CaMV35S) promoter. The activity of the CaMV35S promoter in

seed coat may be inefficient in some species, despite reports of ectopic function of this promoter

(Odell et al., 1985, Young et al., 2008, Wu et al., 2010). In an attempt to synchronize

heterologous expression of PsLAR with the PA pathway in Arabidopsis seed coat, a portion of

the Arabidopsis ANR promoter was fused to FLAG-epitope tagged PsLAR, and transformed into

wild-type Arabidopsis and an Arabidopsis anr T-DNA insertion mutant (Section 3.2.7). Despite

detectable PsLAR in developing siliques of the transgenic plants, no accumulation of Cat or PAs

was observed in mature seeds of wild-type and anr lines, respectively (Fig. 3.9 and 3.10). This

apparent lack of in vivo PsLAR activity is puzzling, considering that PsLAR was highly active in

vitro (Fig. 3.6). Thus, while our understanding of the in vitro properties of LAR is consistent with its accepted role in planta, the metabolic flux through this branch point enzyme is less clear.

In confronting this discrepancy, it is curious to note that in backgrounds with natural flavonoid pathway flux, ectopic expression of LAR has only been successful in increasing flavan-3-ol and

PA content when the source of the LAR clone (i.e. Populus trichocarpa) belonged to the same genus as the species in which LAR was expressed (i.e. P. tomentosa Carr.; Yuan et al., 2012,

Wang et al., 2013).

6.2.3 Protein-protein interactions and metabolite channelling in the phenylpropanoid pathway

The phenylpropanoid pathway is highly branched, and there is considerable potential for competition between enzymes for intermediates. Additionally, some intermediates in

flavonoid biosynthesis are labile, notably leucoanthocyanidins, the substrate for LAR and ANS.

Metabolite channelling has been proposed as a way for plants to control flux to different branches

of phenylpropanoid biosynthesis and also to protect labile intermediates from degradation or

cellular machinery from toxic intermediates (Winkel, 2004).

Specific isoforms of biosynthetic genes may be involved in different branches of

phenylpropanoid metabolism, and may be the key to channelling intermediates toward different

branches of the phenylpropanoid pathway. For example, quaking aspen (Populus tremuloides)

phenylalanine ammonia lyase1 (PAL1) is primarily associated with PA production whereas

PtrPAL2 appears to be mainly involved in lignin biosynthesis (Kao et al., 2002). Similarly,

Arabidopsis 4-coumarate:CoA ligase1 (4CL1) and 4CL2 correlate with lignin and non-flavonoid

phenylpropanoid biosynthesis whereas 4CL3 is believed to primarily be involved in flavonoid

biosynthesis (Ehlting et al., 1999). Interactions between enzymes in the phenylpropanoid pathway may differ based on specific isoforms. Co-expression of tobacco (N. tabacum) cinnamate 4-hydroxylase (C4H) with NtPAL2-GFP caused a shift of the fluorescence signal

from the cytosol to the endoplasmic reticulum (ER), whereas NtPAL1-GFP localized to both

sites independent of NtC4H (Achnine et al., 2004).

In addition to mediating co-localization, these protein-protein interactions may also

enable metabolite channelling between enzymes catalyzing sequential reactions. Feeding of 3H-L-phenylalanine to tobacco cell culture produced 3H-4-coumaric acid (Fig. 1.1). However, when exogenous trans-cinnamic acid was added, the proportion of radiolabelled 4-coumaric acid remained relatively unchanged, indicating that trans-cinnamic acid produced by PAL was channelled directly to C4H (Rasmussen and Dixon, 1999). Overall, there is good evidence

indicating interactions between enzymes in the phenylpropanoid pathway and of metabolite

channelling.

6.2.3.1 Does metabolite channelling occur between LAR and DFR?

LAR is encoded by a small gene family in grape (V. vinifera) and poplar (Populus spp.).

VvLAR1 and VvLAR2 both encode active enzymes but display different developmental expression profiles (Bogs et al., 2005). Poplar contains three copies of LAR that display different levels of induction in response to wounding and microbial pathogen attack, though their tissue expression profiles largely overlap (Tsai et al., 2006, Yuan et al., 2012). Thus, like PAL and

4CL, isoforms of LAR may be specialized toward different developmental and environmental

cues.

Several studies suggest possible metabolite channelling between DFR and LAR.

Incubation of 14C-dihydromyricetin (DHM) with sainfoin (Onobrychis viciifolia) leaf protein

isolates produced 14C-labelled GC, with only trace amounts of the radiolabelled intermediate,

leucodelphinidin (LD), detected (Singh et al., 1997). Similarly, no leucocyanidin intermediate was detected when VvDFR and VvLAR1 were co-incubated with dihydroquercetin (Maugé et al., 2010). Rapid degradation of leucoanthocyanidins could explain these observations; however, when 14C-DHM and 3H-LD were co-incubated with sainfoin protein isolates, the DHM-derived

radiolabel was incorporated into gallocatechin at more than a 5:1 ratio over the LD-derived label

(Fig. 1.1), which indicates that LD produced by OvDFR does not freely exchange with 3H-LD in solution (Singh et al., 1997). Finally, both the DFR and ANR substrates, dihydroflavonols and anthocyanidins, respectively, inhibit LAR at physiologically relevant concentrations (Tanner et al., 2003). This indirectly suggests plants have a mechanism for separating these compounds

away from LAR in vivo. Together, these results indicate metabolite channelling between DFR

and LAR.
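
The dual-label comparison described above can be made explicit by converting the radioactivity recovered in GC into molar incorporation from each precursor. This is a rough sketch under the assumption that the specific activity of each supplied precursor is known. The function and all numbers are hypothetical and do not reproduce the data of Singh et al. (1997).

```python
def incorporation_ratio(dpm_14c, dpm_3h, sa_14c, sa_3h):
    """Molar ratio of product derived from the 14C precursor vs the 3H precursor.

    dpm_14c, dpm_3h: radioactivity recovered in the product for each isotope (dpm).
    sa_14c, sa_3h:   specific activity of each supplied precursor (dpm per nmol).
    """
    if sa_14c <= 0 or sa_3h <= 0:
        raise ValueError("specific activities must be positive")
    nmol_from_14c = dpm_14c / sa_14c
    nmol_from_3h = dpm_3h / sa_3h
    if nmol_from_3h == 0:
        raise ValueError("no incorporation from the 3H precursor")
    return nmol_from_14c / nmol_from_3h


# Hypothetical counts and specific activities, for illustration only.
ratio = incorporation_ratio(dpm_14c=12000, dpm_3h=4000, sa_14c=100.0, sa_3h=200.0)
print(f"14C-derived : 3H-derived = {ratio:.1f} : 1")
```

How such a ratio is interpreted depends on the relative amounts of each precursor available to the enzyme, which this sketch does not model.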

Interaction between DFR and LAR could explain why heterologous expression of LAR in

distantly related species failed to result in an increase of trans-flavan-3-ols while expression in a closely related species was successful (Tanner et al., 2003, Pang et al., 2007, Yuan et al., 2012,

Wang et al., 2013). This could account for the consistent in vitro activity of LAR and DFR-LAR

coupled assays, where protein-protein interaction is not required (Fig. 3.6; Tanner et al., 2003,

Pfeiffer et al., 2006, Paolocci et al., 2007, Maugé et al., 2010). Conversely, this may explain why

increased flux through the flavonoid pathway by co-expression of PAP1 with CsLAR in tobacco

successfully produced trans-flavan-3-ols, as elevated substrate levels could have negated the lack

of efficient substrate channelling between NtDFR and CsLAR (Pang et al., 2013a). The co-

localization of LAR or ANR with other flavonoid biosynthetic enzymes has not been reported.

On the contrary, localization of MtLAR, MtANS, and MtANR is reportedly cytosolic (Pang et al., 2007). However, localization was determined by expression of a GFP-fusion construct, which has previously been shown to disrupt multi-enzyme complex formation (Nielsen et al.,

2008). Recently, immunolocalization, which does not involve a bulky fusion protein, indicated that VvANS is largely localized to the ER in grape berry skin (Wang et al., 2010), in contrast to the reported cytosolic localization of MtANS-GFP (Pang et al., 2007).

Despite considerable research to date, many questions remain unanswered regarding the metabolite flux through LAR. Immunolocalization of LAR and DFR could shed further light on the possibility of a DFR-LAR interaction. Silencing of LAR in species that contain a significant quantity of polymeric trans-flavan-3-ols could also be very informative with respect to PA pathway flux. Similarly, heterologous expression of LAR from different species within the same host could help address

why previous attempts at heterologous expression were unsuccessful. Pea is amenable to

transformation (Krejčí et al., 2007) and ‘LAN3017’ PAs contain high levels of Cat (Table 3.1),

making it an ideal species in which to conduct further studies of LAR.

6.3 Pea seed coat comparative transcriptomics

In addition to providing a protective envelope for the embryo, the pea seed coat functions

as a transient storage organ and mediator of signals influencing seed development (Weber et al.,

2005). To examine the genetic basis for the varied seed phenotypes of the five pea cultivars (Fig.

4.1), comparative transcriptomic analyses of the seed coats early in development were performed using complementary next-generation sequencing technologies.

The comparable expression of two gibberellin-related genes, PsGA3ox and PsGA20ox

(Fig. 4.8B), both of which display substantial fluctuations in expression during early pea seed coat development (Nadeau et al., 2011), the parallel growth of the plants, and harvesting of tissues at the same developmental time point (10 DAA) collectively indicate that the seed coats of all five cultivars were in a common stage of development (pre-storage phase) when analyzed.

Thus, the gene expression differences identified could have biological significance.

RNA-Seq analysis showed a metabolically active state in pea seed coat at 10 DAA.

Protein metabolism, macromolecular compound metabolism and transport processes were among the most prominent gene ontology categories identified (Fig. 4.4). The most obvious phenotypic difference was between PACs and PLCs, the latter of which displayed broad down-regulation of

PA biosynthetic and regulatory genes (Table 4.4). A point mutation in A, a pea bHLH transcription factor (TT8 ortholog) that regulates the PA pathway (Hellens et al., 2010), was found only in the two PLCs and is the most likely cause of this down-regulation (Section 4.3.2).

Additionally, ‘Solido’ produced seeds that were 33-43% heavier than those of the other four

cultivars (Fig. 4.1). Comparative transcriptomic analyses revealed gene expression differences in

‘Solido’ related to gibberellin and amino acid metabolism (Fig. 4.8B; Appendix A1); however,

the uniquely high expression of a cytochrome P450 (CYP) CYP78A family member related to

seed size was particularly interesting (Table 4.5).

6.3.1 RNA-Seq analysis uncovered a CYP78A gene family homolog in ‘Solido’

Analysis of differential gene expression between ‘Solido’ and ‘Courier’/‘LAN3017’ was

conducted in an attempt to identify novel genes or processes related to seed dry weight (Section

4.4.1). Among the 116 unigenes that displayed higher expression in ‘Solido’, a unigene

(contig12605) annotated as a CYP78A family member was identified. Three members of the

CYP78A family, KLUH (KLU)/78A5, EOD3/78A6 and CYP78A9, have been linked to seed size

and weight in Arabidopsis (Adamski et al., 2009, Fang et al., 2012). Overall, the expression of

contig12605 was more than 100-fold greater in ‘Solido’ relative to the other four cultivars (Table

4.5). A number of other genes associated with seed size have been described, including

MINISEED3, HAIKU1, HAIKU2, APETALA2, AUXIN RESPONSE FACTOR2, KIP-RELATED

PROTEIN2, the DNA methyltransferase MET1, and TRANSPARENT TESTA GLABRA2, which all either lacked a homolog in the pea seed coat transcriptome or displayed comparable expression in all five cultivars (see Luo et al., 2005, Adamski et al., 2009, and references therein). However, the involvement of these genes in pea seed size through their expression in non-seed coat tissues cannot be ruled out.
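
A fold difference such as the one reported for contig12605 can be computed from normalized expression values. The sketch below compares one cultivar against the mean of the others and adds a small pseudocount so that near-zero expression does not cause a division by zero. The expression values are invented for illustration, and the pseudocount is an assumption rather than the procedure used in Section 4.4.1.

```python
import math


def fold_change(target, others, pseudocount=1.0):
    """Fold change of `target` over the mean of `others`, and its log2.

    target: normalized expression in the cultivar of interest.
    others: normalized expression values in the comparison cultivars.
    """
    if not others:
        raise ValueError("at least one comparison value is required")
    baseline = sum(others) / len(others)
    fc = (target + pseudocount) / (baseline + pseudocount)
    return fc, math.log2(fc)


# Invented normalized expression values for one contig in five cultivars.
solido = 511.0
other_four = [3.0, 1.0, 0.0, 0.0]
fc, log2_fc = fold_change(solido, other_four)
print(f"fold change = {fc:.0f}, log2 = {log2_fc:.1f}")
```

Reporting the log2 value alongside the ratio keeps large differences readable and treats increases and decreases symmetrically.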

KLU was the first member of the CYP78A family identified and is the most extensively characterized (Zondlo and Irish, 1999). In Arabidopsis, KLU expression occurs in the embryonic

and vegetative shoot apical meristem, at the base of developing leaves, in developing floral tissues, and at the shoot-lateral organ boundary (Zondlo and Irish, 1999, Anastasiou et al., 2007).

A modest increase in expression limited to endogenous KLU domains increased the cell number in leaves, sepals, and petals, consequently increasing the size of these structures (Anastasiou et al., 2007). In contrast, klu loss-of-function mutants develop smaller leaves, sepals and petals than wild-type or heterozygous plants (Anastasiou et al., 2007). Growth in these tissues arrests earlier in klu mutants, whereas the growth period was extended in plants with elevated KLU expression

(Anastasiou et al., 2007). KLU has been proposed to produce a mobile growth factor that temporally regulates the growth period and prevents premature termination, independent of known phytohormones (Anastasiou et al., 2007, Eriksson et al., 2010, Kazama et al., 2010). The nature of this growth factor remains hypothetical, but based on sequence homology, it has been proposed to be a modified fatty acid (Anastasiou et al., 2007).

CYP78A9, identified by activation tagging in Arabidopsis, is expressed in floral tissues where it is associated with carpel development (Ito and Meyerowitz, 2000). An Arabidopsis cyp78a9 loss-of-function mutant produced lighter, smaller seeds and shorter siliques; however, the leaves, petals, stems, number of siliques per plant and number of ovules per silique were all indistinguishable from wild-type plants (Fang et al., 2012). Furthermore, a cyp78a9 eod3 double-knockout displayed a more severe phenotype that included a reduced number of siliques per plant, rounding of the seeds, and reduced petal and leaf size, which together indicates that

CYP78A9 and EOD3 act redundantly to regulate specific tissue development (Fang et al., 2012).

However, the reduction in petal and leaf size was substantially less than that seen in klu mutants, suggesting that the function of CYP78A9 and EOD3 may be more tissue specific than KLU.

Significantly, with respect to seed development, the functions of EOD3 and CYP78A9 are based in maternal tissues. The size and weight of seeds produced by an eod3 loss-of-function mutant and an eod3 cyp78a9 double-knockout were independent of the genotype of the embryo and endosperm (Fang et al., 2012). Furthermore, the Arabidopsis eod3 cyp78a9 double-knockout developed smaller outer integuments – an outer layer of the ovule (maternal tissue) that differentiates into the seed coat – whereas eod3 and cyp78a9 gain-of-function mutants formed larger ovules with increased cell numbers in the outer integument as well as larger embryos

(Fang et al., 2012, Sotelo-Silveira et al., 2013). In contrast, outer integument development was arrested in an Arabidopsis cyp78a8 cyp78a9 loss-of-function mutant (Sotelo-Silveira et al.,

2013).

These results indicate that the CYP78A family plays a significant role in seed development, which makes the high expression of contig12605 in ‘Solido’ seed coat all the more compelling. The correlation between elevated KLU expression and an extended period of growth

(Anastasiou et al., 2007) is similar to the observation that embryos of an eod3 gain-of-function mutant progressed slightly more slowly through the initial stages of development compared to wild-type (Fang et al., 2012). Interestingly, the relationship of elevated CYP78A expression with a reduced developmental rate is consistent with the high expression of contig12605 in ‘Solido’ and the suggestion of prolonged early seed coat development in ‘Solido’, relative to the other four cultivars (Section 4.4.3.1).

The uniquely high expression of a CYP78A-like gene in ‘Solido’ is very intriguing and may be useful as a molecular marker for future pea breeding. The developmental expression profile of contig12605 in pea seed coat needs to be examined to determine if peak expression is in fact higher in ‘Solido’ or if the duration of expression is longer. As well, the expression of

contig12605 in additional pea cultivars that produce a range of seed weights is needed before contig12605 can be established as a molecular marker of seed weight in pea.

6.3.2 A possible role for PA-Related Unknown Protein (PUP) in PA trafficking

Evidence of vesicle-mediated transport of PA precursors has accumulated (see Section 1.3.6 for details), and the characterization of PUP in this work (Chapter 5) provides additional support. Analysis of differential gene expression between PACs and PLCs identified a transcript, referred to as PA-related Unknown Protein (PUP), represented by contig03511 and contig03509, which was expressed almost exclusively in PACs (Section 4.4.2; Table 4.5). At the amino acid level, PsPUP appeared conserved among diverse plant species that produce PAs. Tissue-specific expression of the Arabidopsis ortholog, AT2G47115 (AtPUP; Fig. 5.2B), and coordination of the

M. truncatula ortholog by known PA transcription factors (Section 4.4.2), further indicated that expression of PUP is associated with the flavonoid pathway. The subcellular localization of

AtPUP was reminiscent of the localization of ARA7 (Section 5.3), an Arabidopsis Rab5-GTPase localized to pre-vacuolar vesicles (Jia et al., 2013). Furthermore, transiently expressed AtPUP-GFP was observed on mobile cytosolic punctate structures in tobacco leaf pavement cells.

Subcellular localization of AtPUP-GFP also indicated that the protein may be located on large vesicle- or provacuole-like structures (Fig. 5.3G and J). Earlier work with white spruce (Picea glauca [Moench] Voss), slash pine (Pinus elliottii), Douglas-fir (Pseudotsuga menziesii) and loblolly pine (Pinus taeda) cell cultures noted the formation of granules, vesicles and provacuoles containing PA-like material (Chafe and Durzan, 1973, Baur and Walkinshaw, 1974,

Parham and Kaustinen, 1977). Significantly, these structures were also observed within the vacuole, and at times associated with the lumenal side of the tonoplast (vacuole membrane).

Within this context, PUP may be located on the membrane of vesicles or provacuoles associated

with PAs.

The increase of soluble PAs (e.g. non-oxidized or short polymers, and flavan-3-ol dimers) in an Arabidopsis pup knockout suggests an association of AtPUP with PA metabolism (Fig. 5.7). It also helps to deduce a potential role for PUP in the PA pathway.

Soluble PAs have been shown to be increased by several different genetic modifications in plants. The constitutive expression of MtANR in M. truncatula hairy roots increased flux through flavan-3-ols, which greatly increased soluble PAs, as well as insoluble PAs though to a far lesser degree (Pang et al., 2013b). Similarly, constitutive expression of UGT72L1, an epicatechin (EC) glucosyltransferase, increased the level of soluble PAs as well as EC-3'-O-glucoside (EC3'OG;

Pang et al., 2013b) and seeds of an Arabidopsis tt12 (MATE transporter) loss-of-function mutant accumulated an EC-hexoside, presumably EC3'OG (Kitamura et al., 2010). Also, free EC, which does not normally accumulate in mature Arabidopsis seeds, accumulated in an Arabidopsis aha10 loss-of-function mutant (Baxter et al., 2005). Finally, seeds of an Arabidopsis tt10 (a laccase-like enzyme) loss-of-function mutant accumulated higher levels of soluble PAs and free

EC compared to wild-type (Pourcel et al., 2005). Together, these observations demonstrate that soluble PA levels can be increased with elevated flux through the PA pathway, a lack of precursor transport to the vacuole, and/or a lack of oxidation.

Since PUP lacks an obvious catalytic domain, a function in biosynthesis, oxidation, or polymerization is unlikely. Instead, the increased level of soluble PAs in Arabidopsis pup knockout seeds and the subcellular localization of AtPUP suggest that transport of soluble PAs to the central vacuole is less efficient in the mutant. Pre-vacuolar PAs in vesicles originating in the vicinity of the ER were reported in previous work with Douglas-fir and slash pine cell cultures (Baur and

Walkinshaw, 1974, Parham and Kaustinen, 1977). Chafe and Durzan (1973) also reported PA-

containing small vacuole-like structures that appeared to progress toward the central vacuole. A

similar observation was recently made in grape, where tannosomes (PA-containing vesicles)

were reported crossing or merging with the tonoplast (Brillouet et al., 2013). In this context, PUP

may be localized to vesicles or provacuoles involved in transporting soluble PAs to the central

vacuole. It is worth noting that similar membrane-bound structures containing vanillin-reactive

products (Section 1.3.6.1) were observed in seeds of an Arabidopsis tt19 loss-of-function mutant

(Kitamura et al., 2010). Significantly, TT12-GFP was observed on the surface of these structures in the tt19 mutants (Kitamura et al., 2010).

However, in the Arabidopsis pup mutant, the increase of soluble PA was subtle, although statistically significant, and no decrease of insoluble PA content was detected. Therefore, there may be redundant mechanisms for PA-related transport in planta. If vesicles containing PUP do in fact transport soluble PAs, this protein may prove a useful marker for elucidating the origin and route of transport. However, despite these advances, our understanding of the trafficking of

PA precursors is hampered by the fact that the substrates for and the process of PA polymerization remain unknown.

Vesicle-mediated trafficking in the PA pathway requires further investigation. Brillouet et al. (2013) developed a clever method for visualizing PAs in vivo by exploiting the protein binding property of PAs. PA-containing tissue sections were treated with gelatin fused to Oregon

Green, a fluorophore excitable at 488 nm, allowing the visualization of PA-filled vesicles by confocal microscopy. However, the vesicles bearing PUP may transport PA-precursors, not polymers. Thus, the use of immunolocalization of PUP in fixed tissue stained with vanillin or p-dimethylaminocinnamaldehyde (DMACA) combined with the method developed by Brillouet et

al. (2013) could aid in determining the nature of the compounds within PUP-containing vesicles (Feucht and Schmid, 1983).

References

Abrahams, S, Lee E, Walker AR, Tanner GJ, Larkin PJ and Ashton AR (2003). The Arabidopsis TDS4 gene encodes leucoanthocyanidin dioxygenase (LDOX) and is essential for proanthocyanidin synthesis and vacuole development. Plant J 35(5): 624-636.

Achnine, L, Blancaflor EB, Rasmussen S and Dixon RA (2004). Colocalization of L-Phenylalanine Ammonia-Lyase and Cinnamate 4-Hydroxylase for Metabolic Channeling in Phenylpropanoid Biosynthesis. The Plant Cell Online 16(11): 3098-3109.

Adamski, NM, Anastasiou E, Eriksson S, O'Neill CM and Lenhard M (2009). Local maternal control of seed size by KLUH/CYP78A5-dependent growth signaling. Proceedings of the National Academy of Sciences 106(47): 20115-20120.

Adlercreutz, H (2007). Lignans and Human Health. Critical Reviews in Clinical Laboratory Sciences 44(5-6): 483-525.

Aerts, RJ, Barry TN and McNabb WC (1999). Polyphenols and agriculture: beneficial effects of proanthocyanidins in forages. Agriculture, Ecosystems & Environment 75(1-2): 1-12.

Alexandrov, N, Troukhan M, Brover V, Tatarinova T, Flavell R and Feldmann K (2006). Features of Arabidopsis Genes and Genome Discovered using Full-length cDNAs. Plant Molecular Biology 60(1): 69-85.

Almeida, JR, D'Amico E, Preuss A, Carbone F, de Vos CH, Deiml B, Mourgues F, Perrotta G, Fischer TC, Bovy AG, Martens S and Rosati C (2007). Characterization of major enzymes and genes involved in flavonoid and proanthocyanidin biosynthesis during fruit development in strawberry (Fragaria x ananassa). Arch Biochem Biophys 465(1): 61-71.

Anastasiou, E, Kenz S, Gerstung M, MacLean D, Timmer J, Fleck C and Lenhard M (2007). Control of Plant Organ Size by KLUH/CYP78A5-Dependent Intercellular Signaling. Developmental Cell 13(6): 843-856.

Appelhagen, I, Lu G-H, Huep G, Schmelzer E, Weisshaar B and Sagasser M (2011). TRANSPARENT TESTA1 interacts with R2R3-MYB factors and affects early and late steps of flavonoid biosynthesis in the endothelium of Arabidopsis thaliana seeds. The Plant Journal 67(3): 406-419.

Baranyi, M and Greilhuber J (1996). Flow cytometric and Feulgen densitometric analysis of genome size variation in Pisum. Theoretical and Applied Genetics 92(3-4): 297-307.

Baudry, A, Caboche M and Lepiniec L (2006). TT8 controls its own expression in a feedback regulation involving TTG1 and homologous MYB and bHLH factors, allowing a strong and cell-specific accumulation of flavonoids in Arabidopsis thaliana. The Plant Journal 46(5): 768-779.

Baudry, A, Heim MA, Dubreucq B, Caboche M, Weisshaar B and Lepiniec L (2004). TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. The Plant Journal 39(3): 366-380.

Baur, PS and Walkinshaw CH (1974). Fine structure of tannin accumulations in callus cultures of Pinus elliotti (slash pine). Canadian Journal of Botany 52(3): 615-619.

Baxter, IR, Young JC, Armstrong G, Foster N, Bogenschutz N, Cordova T, Peer WA, Hazen SP, Murphy AS and Harper JF (2005). A plasma membrane H+-ATPase is required for the formation of proanthocyanidins in the seed coat endothelium of Arabidopsis thaliana. PNAS 102(7): 2649-2654.

Berner, M, Krug D, Bihlmaier C, Vente A, Müller R and Bechthold A (2006). Genes and Enzymes Involved in Caffeic Acid Biosynthesis in the Actinomycete Saccharothrix espanaensis. Journal of Bacteriology 188(7): 2666-2673.

Biolley, JP and Jay M (1993). Anthocyanins in Modern Roses: Chemical and Colorimetric Features in Relation to the Colour Range. Journal of Experimental Botany 44(11): 1725-1734.

Boerjan, W, Ralph J and Baucher M (2003). Lignin biosynthesis. Annu. Rev. Plant Biol. 54(1): 519-546.

Bogs, J, Downey MO, Harvey JS, Ashton AR, Tanner GJ and Robinson SP (2005). Proanthocyanidin Synthesis and Expression of Genes Encoding Leucoanthocyanidin Reductase and Anthocyanidin Reductase in Developing Grape Berries and Grapevine Leaves. Plant Physiol. 139(2): 652-663.

Bogs, J, Jaffe FW, Takos AM, Walker AR and Robinson SP (2007). The grapevine transcription factor VvMYBPA1 regulates proanthocyanidin synthesis during fruit development. Plant Physiol 143(3): 1347-1361.

Bradford, MM (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Analytical Biochemistry 72(1): 248-254.

Brillouet, J-M, Romieu C, Schoefs B, Solymosi K, Cheynier V, Fulcrand H, Verdeil J-L and Conéjéro G (2013). The tannosome is an organelle forming condensed tannins in the chlorophyllous organs of Tracheophyta. Ann Bot 112(6): 1003-1014.

Britsch, L and Grisebach H (1986). Purification and characterization of (2S)-flavanone 3-hydroxylase from Petunia hybrida. Eur. J. Biochem. 156(3): 569-577.

Brugliera, F, Barri-Rewell G, Holton TA and Mason JG (1999). Isolation and characterization of a flavonoid 3′-hydroxylase cDNA clone corresponding to the Ht1 locus of Petunia hybrida. Plant J. 19(4): 441-451.

Chafe, SC and Durzan DJ (1973). Tannin inclusions in cell suspension cultures of white spruce. Planta 113(3): 251-262.

Chakrabarty, R, Banerjee R, Chung S-M, Farman M, Citovsky V, Hogenhout SA, Tzfira T and Goodin M (2007). pSITE Vectors for Stable Integration or Transient Expression of Autofluorescent Protein Fusions in Plants: Probing Nicotiana benthamiana-Virus Interactions. Molecular Plant-Microbe Interactions 20(7): 740-750.

Cheng, Y, Dai X and Zhao Y (2006). Auxin biosynthesis by the YUCCA flavin monooxygenases controls the formation of floral organs and vascular tissues in Arabidopsis. Genes & Development 20(13): 1790-1799.

Cheng, Y, Dai X and Zhao Y (2007a). Auxin Synthesized by the YUCCA Flavin Monooxygenases Is Essential for Embryogenesis and Leaf Formation in Arabidopsis. The Plant Cell Online 19(8): 2430-2439.

Cheng, Y, Qin G, Dai X and Zhao Y (2007b). NPY1, a BTB-NPH3-like protein, plays a critical role in auxin-regulated organogenesis in Arabidopsis. Proceedings of the National Academy of Sciences 104(47): 18825-18829.

Chevreux, B, Wetter T and Suhai S (1999). Genome sequence assembly using trace signals and additional sequence information. German Conference on Bioinformatics, Hannover, Germany. Chung, WG, Miranda CL, Stevens JF and Maier CS (2009). Hop proanthocyanidins induce apoptosis, protein carbonylation, and disorganization in human colorectal adenocarcinoma cells via reactive oxygen species. Food Chem Toxicol 47(4): 827-836. Creasy, LL and Swain T (1965). Structure of condensed tannins. Nature 208: 151-153. Crespi, M and Gálvez S (2000). Molecular Mechanisms in Root Nodule Development. Journal of Plant Growth Regulation 19(2): 155-166. Davin, LB, Wang H-B, Crowell AL, Bedgar DL, Martin DM, Sarkanen S and Lewis NG (1997). Stereoselective bimolecular phenoxy radical coupling by an auxiliary (dirigent) protein without an active center. Science 275(5298): 362-367. de Alba, AEM, Parent J-S and Vaucheret H (2013). Small RNA-Mediated Control of Development in Plants. Epigenetic Memory and Control in Plants. G. Grafi and N. Ohad, Springer Berlin Heidelberg. 18: 177-199. Debeaujon, I, Léon-Kloosterziel KM and Koornneef M (2000). Influence of the Testa on Seed Dormancy, Germination, and Longevity in Arabidopsis. Plant Physiology 122(2): 403-414. Debeaujon, I, Nesi N, Perez P, Devic M, Grandjean O, Caboche M and Lepiniec L (2003). Proanthocyanidin-accumulating cells in Arabidopsis testa: regulation of differentiation and role in seed development. Plant Cell 15(11): 2514-2531. Debeaujon, I, Peeters AJM, Léon-Kloosterziel KM and Koornneef M (2001). The TRANSPARENT TESTA12 Gene of Arabidopsis Encodes a Multidrug Secondary Transporter-like Protein Required for Flavonoid Sequestration in Vacuoles of the Seed Coat Endothelium. The Plant Cell Online 13(4): 853-871. Delgado-Alvarado, A, Walker RP and Leegood RC (2007). Phosphoenolpyruvate carboxykinase in developing pea seeds is associated with tissues involved in solute transport and is nitrogen-responsive. 
Plant, Cell & Environment 30(2): 225-235. Delle Monache, F, Ferrari F, Poce-Tucci A and Marini-Bettolo GB (1972). Catechins with (+)- epi-configuration in nature. Phytochemistry 11(7): 2333-2335. Delseny, M, Han B and Hsing YI (2010). High throughput DNA sequencing: the new sequencing revolution. Plant Science 179(5): 407-422. Devic, M, Guilleminot J, Debeaujon I, Bechtold N, Bensaude E, Koornneef M, Pelletier G and Delseny M (1999). The BANYULS gene encodes a DFR-like protein and is a marker of early seed coat development. Plant J 19(4): 387-398. Dixon, RA, Liu C and Jun JH (2013). Metabolic engineering of anthocyanins and condensed tannins in plants. Current Opinion in Biotechnology 24(2): 329-335. Dixon, RA and Pasinetti GM (2010). Flavonoids and Isoflavonoids: From Plant Biology to Agriculture and Neuroscience. Plant Physiology 154(2): 453-457. Dixon, RA and Sumner LW (2003). Legume Natural Products: Understanding and Manipulating Complex Pathways for Human and Animal Health. Plant Physiology 131(3): 878-885. Dixon, RA, Xie DY and Sharma SB (2005). Proanthocyanidins--a final frontier in flavonoid research? New Phytol 165(1): 9-28.

Ehlting, J, Büttner D, Wang Q, Douglas CJ, Somssich IE and Kombrink E (1999). Three 4- coumarate:coenzyme A ligases in Arabidopsis thaliana represent two evolutionarily divergent classes in angiosperms. The Plant Journal 19(1): 9-20. Ellis, THN, Hofer JMI, Timmerman-Vaughan GM, Coyne CJ and Hellens RP (2011). Mendel, 150 years on. Trends in Plant Science 16(11): 590-596. Eriksson, S, Stransfeld L, Adamski NM, Breuninger H and Lenhard M (2010). KLUH/CYP78A5-Dependent Growth Signaling Coordinates Floral Organ Growth in Arabidopsis. Current biology 20(6): 527-532. Fang, W, Wang Z, Cui R, Li J and Li Y (2012). Maternal control of seed size by EOD3/CYP78A6 in Arabidopsis thaliana. The Plant Journal 70(6): 929-939. Feucht, W and Schmid P (1983). Selektiver histochemischer Nachweis von Flavanen (Catechinen) mit p-Dimethylaminozimtaldehyd in Sprossen einiger Obstgeholze. Gartenbauwissenschaft (Horticultural science). Finkelstein, RR, Gampala SSL and Rock CD (2002). Abscisic Acid Signaling in Seeds and Seedlings. The Plant Cell Online 14(suppl 1): S15-S45. Franssen, S, Shrestha R, Bräutigam A, Bornberg-Bauer E and Weber A (2011). Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC Genomics 11(12): 227-242. Gagné, S, Lacampagne S, Claisse O and Gény L (2009). Leucoanthocyanidin reductase and anthocyanidin reductase gene expression and activity in flowers, young berries and skins of Vitis vinifera L. cv. Cabernet-Sauvignon during development. Plant Physiology and Biochemistry 47(4): 282-290. Gang, DR, Wang J, Dudareva N, Nam KH, Simon JE, Lewinsohn E and Pichersky E (2001). An Investigation of the Storage and Biosynthesis of Phenylpropenes in Sweet Basil. Plant Physiology 125(2): 539-555. García-Martínez, J, López-Diaz I, Sánchez-Beltrán M, Phillips A, Ward D, Gaskin P and Hedden P (1997). Isolation and transcript analysis of gibberellin 20-oxidase genes in pea and bean in relation to fruit development. 
Plant Molecular Biology 33(6): 1073-1084. Gargouri, M, Chaudière J, Manigand C, Mauge C, Bathany K, Schmitter J-M and Gallois B (2010). The epimerase activity of anthocyanidin reductase from Vitis vinifera and its regiospecific hydride transfers. Biol. Chem. 391(2/3): 219-227. Gargouri, M, Gallois B and Chaudière J (2009a). Binding-equilibrium and kinetic studies of anthocyanidin reductase from Vitis vinifera. Archives of Biochemistry and Biophysics 491(1-2): 61-68. Gargouri, M, Manigand C, Mauge C, Granier T, Langlois d'Estaintot B, Cala O, Pianet I, Bathany K, Chaudiere J and Gallois B (2009b). Structure and epimerase activity of anthocyanidin reductase from Vitis vinifera. Acta Crystallographica Section D 65(9): 989-1000. Gatel, F and Grosjean F (1990). Composition and nutritive value of peas for pigs: A review of European results. Livestock Production Science 26(3): 155-175. Giovane, A, Servillo L, Balestrieri C, Raiola A, D'Avino R, Tamburrini M, Ciardiello MA and Camardella L (2004). Pectin methylesterase inhibitor. Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics 1696(2): 245-252. Gomez, C, Conejero G, Torregrosa L, Cheynier V, Terrier N and Ageorges A (2011). In vivo grapevine anthocyanin transport involves vesicle-mediated trafficking and the

contribution of anthoMATE transporters and GST. The Plant Journal 67(6): 960-970. Gonzalez, A, Mendenhall J, Huo Y and Lloyd A (2009). TTG1 complex MYBs, MYB5 and TT2, control outer seed coat differentiation. Developmental Biology 325(2): 412-421. Gonzalez, A, Zhao M, Leavitt JM and Lloyd AM (2008). Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. The Plant Journal 53(5): 814-827. Graham, HN (1992). Green tea composition, consumption, and polyphenol chemistry. Preventive Medicine 21(3): 334-350. Graham, PH and Vance CP (2003). Legumes: Importance and Constraints to Greater Use. Plant Physiology 131(3): 872-877. Griffiths, DW (1981). The polyphenolic content and enzyme inhibitory activity of testas from bean (Vicia faba) and pea (Pisum spp.) varieties. Journal of the Science of Food and Agriculture 32(8): 797-804. Gu, LW, Kelm MA, Hammerstone JF, Beecher G, Holden J, Haytowitz D, Gebhardt S and Prior RL (2004). Concentrations of proanthocyanidins in common foods and estimations of normal consumption. J. Nutr. 134(3): 613-617. Guillon, F and Champ MM-J (2002). Carbohydrate fractions of legumes: uses in human nutrition and potential for health. Br. J. Nutr. 88: 293-306. Hall, D and De Luca V (2007). Mesocarp localization of a bi-functional resveratrol/hydroxycinnamic acid glucosyltransferase of Concord grape (Vitis labrusca). The Plant Journal 49(4): 579-591. Hamberger, B, Ellis M, Friedmann M, de Azevedo Souza C, Barbazuk B and Douglas CJ (2007). Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Canadian Journal of Botany 85(12): 1182-1201. Han, Y, Vimolmangkang S, Soria-Guerra RE and Korban SS (2012). 
Introduction of apple ANR genes into tobacco inhibits expression of both CHI and DFR genes in flowers, leading to loss of anthocyanin. Journal of Experimental Botany. Harmatha, J and Dinan L (2003). Biological activities of lignans and stilbenoids associated with plant-insect chemical interactions. Phytochemistry Reviews 2(3): 321-330. He, F, Pan Q-H, Shi Y and Duan C-Q (2008). Biosynthesis and Genetic Regulation of Proanthocyanidins in Plants. Molecules 13(10): 2674-2703. He, J and Giusti MM (2010). Anthocyanins: Natural Colorants with Health-Promoting Properties. Annual Review of Food Science and Technology 1(1): 163-187. Hedley, CL and Ambrose MJ (1980). An Analysis of Seed Development in Pisum sativum L. Ann Bot 46(1): 89-105. Hellens, RP, Moreau C, Lin-Wang K, Schwinn KE, Thomson SJ, Fiers MWEJ, Frew TJ, Murray SR, Hofer JMI, Jacobs JME, Davies KM, Allan AC, Bendahmane A, Coyne CJ, Timmerman-Vaughan GM and Ellis THN (2010). Identification of Mendel's White Flower Character. PLoS ONE 5(10): e13230.

Hennegan, KP and Danna KJ (1998). pBIN20: An Improved Binary Vector for Agrobacterium-mediated Transformation. Plant Molecular Biology Reporter 16(2): 129-131. Henning, SM, Fajardo-Lira C, Lee HW, Youssefian AA, Go VLW and Heber D (2003). Catechin Content of 18 Teas and a Green Tea Extract Supplement Correlates With the Antioxidant Capacity. Nutrition and Cancer 45(2): 226-235. Hichri, I, Barrieu F, Bogs J, Kappel C, Delrot S and Lauvergeat V (2011). Recent advances in the transcriptional regulation of the flavonoid biosynthetic pathway. Journal of Experimental Botany 62(8): 2465-2483. Hodgson, JM (2008). Tea flavonoids and cardiovascular disease. Asia Pac J Clin Nutr 17(Suppl 1): 288-290. Holton, TA, Brugliera F, Lester DR, Tanaka Y, Hyland CD, Menting JGT, Lu C-Y, Farcy E, Stevenson TW and Cornish EC (1993). Cloning and expression of cytochrome P450 genes controlling flower colour. Nature 366(6452): 276-279. Hornett, EA and Wheat CW (2012). Quantitative RNA-Seq analysis in non-model species: assessing transcriptome assemblies as a scaffold and the utility of evolutionary divergent genomic reference species. BMC genomics 13(1): 361. Hümmer, W and Schreier P (2008). Analysis of proanthocyanidins. Molecular Nutrition & Food Research 52(12): 1381-1398. Iriti, M, Rossoni M, Borgo M, Ferrara L and Faoro F (2005). Induction of Resistance to Gray Mold with Benzothiadiazole Modifies Amino Acid Profile and Increases Proanthocyanidins in Grape: Primary versus Secondary Metabolism. Journal of Agricultural and Food Chemistry 53(23): 9133-9139. Ishida, T, Hattori S, Sano R, Inoue K, Shirano Y, Hayashi H, Shibata D, Sato S, Kato T, Tabata S, Okada K and Wada T (2007). Arabidopsis TRANSPARENT TESTA GLABRA2 Is Directly Regulated by R2R3 MYB Transcription Factors and Is Involved in Regulation of GLABRA2 Transcription in Epidermal Differentiation. Plant Cell 19(8): 2531-2543. Ito, T and Meyerowitz EM (2000). 
Overexpression of a Gene Encoding a Cytochrome P450, CYP78A9, Induces Large and Seedless Fruit in Arabidopsis. Plant Cell 12(9): 1541- 1550. Jia, T, Gao C, Cui Y, Wang J, Ding Y, Cai Y, Ueda T, Nakano A and Jiang L (2013). ARA7(Q69L) expression in transgenic Arabidopsis cells induces the formation of enlarged multivesicular bodies. Journal of Experimental Botany 64(10): 2817-2829. Jin, A, Ozga JA, Lopes-Lutz D, Schieber A and Reinecke DM (2012). Characterization of proanthocyanidins in pea (Pisum sativum L.), lentil (Lens culinaris L.), and faba bean (Vicia faba L.) seeds. Food Res. Int. 46(2): 528-535. Johnson, CS, Kolevski B and Smyth DR (2002). TRANSPARENT TESTA GLABRA2, a Trichome and Seed Coat Development Gene of Arabidopsis, Encodes a WRKY Transcription Factor. The Plant Cell Online 14(6): 1359-1375. Johnson, ET, Yi H, Shin B, Oh B-J, Cheong H and Choi G (1999). Cymbidium hybrida dihydroflavonol 4-reductase does not efficiently reduce dihydrokaempferol to produce orange pelargonidin-type anthocyanins. The Plant Journal 19(1): 81-85.

Kaló, P, Seres A, Taylor SA, Jakab J, Kevei Z, Kereszt A, Endre G, Ellis THN and Kiss GB (2004). Comparative mapping between Medicago sativa and Pisum sativum. Molecular Genetics and Genomics 272(3): 235-246. Kaltenbach, M, Schröder G, Schmelzer E, Lutz V and Schröder J (1999). Flavonoid hydroxylase from Catharanthus roseus: cDNA, heterologous expression, enzyme properties and cell-type specific expression in plants. The Plant Journal 19(2): 183-193. Kao, Y-Y, Harding SA and Tsai C-J (2002). Differential Expression of Two Distinct Phenylalanine Ammonia-Lyase Genes in Condensed Tannin-Accumulating and Lignifying Cells of Quaking Aspen. Plant Physiology 130(2): 796-807. Karimi, M, Inzé D and Depicker A (2002). GATEWAY™ vectors for Agrobacterium-mediated plant transformation. Trends in Plant Sci. 7(5): 193-195. Katsumoto, Y, Fukuchi-Mizutani M, Fukui Y, Brugliera F, Holton TA, Karan M, Nakamura N, Yonekura-Sakakibara K, Togami J, Pigeaire A, Tao G-Q, Nehra NS, Lu C-Y, Dyson BK, Tsuda S, Ashikari T, Kusumi T, Mason JG and Tanaka Y (2007). Engineering of the Rose Flavonoid Biosynthetic Pathway Successfully Generated Blue-Hued Flowers Accumulating Delphinidin. Plant and Cell Physiology 48(11): 1589-1600. Kaur, S, Pembleton LW, Cogan NOI, Savin KW, Leonforte T, Paull J, Materne M and Forster JW (2012). Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers. BMC Genomics 13(1): 104-115. Kazama, T, Ichihashi Y, Murata S and Tsukaya H (2010). The Mechanism of Cell Cycle Arrest Front Progression Explained by a KLUH/CYP78A5-dependent Mobile Growth Factor in Developing Leaves of Arabidopsis thaliana. Plant and Cell Physiology 51(6): 1046-1054. Kennedy, JA and Jones GP (2001). Analysis of proanthocyanidin cleavage products following acid-catalysis in the presence of excess phloroglucinol. J. Agric. Food Chem. 49(4): 1740-1746. Khanbabaee, K and Ree Tv (2001). Tannins: classification and definition. Natural Product Reports 18(6): 641-649. 
Kibble, NAJ, Sohani MM, Shirley N, Byrt C, Roessner U, Bacic A, Schmidt O and Schultz CJ (2009). Phylogenetic analysis and functional characterisation of strictosidine synthase-like genes in Arabidopsis thaliana. Functional Plant Biology 36(12): 1098- 1109. Kitamura, S, Matsuda F, Tohge T, Yonekura-Sakakibara K, Yamazaki M, Saito K and Narumi I (2010). Metabolic profiling and cytological analysis of proanthocyanidins in immature seeds of Arabidopsis thaliana flavonoid accumulation mutants. Plant J. 62(4): 549-559. Kitamura, S, Shikazono N and Tanaka A (2004). TRANSPARENT TESTA 19 is involved in the accumulation of both anthocyanins and proanthocyanidins in Arabidopsis. Plant J. 37(1): 104-114. Krejčí, P, Matušková P, Hanáček P, Reinöhl V and Procházka S (2007). The transformation of pea (Pisum sativum L.): applicable methods of Agrobacterium tumefaciens- mediated gene transfer. Acta Physiologiae Plantarum 29(2): 157-163. Kumar, S and Blaxter ML (2010). Comparing de novo assemblers for 454 transcriptome data. BMC genomics 11(1): 571.

Kyndt, JA, Meyer TE, Cusanovich MA and Van Beeumen JJ (2002). Characterization of a bacterial tyrosine ammonia lyase, a biosynthetic enzyme for the photoactive yellow protein. FEBS Letters 512(1–3): 240-244. Lacampagne, S, Gagné S and Gény L (2010). Involvement of Abscisic Acid in Controlling the Proanthocyanidin Biosynthesis Pathway in Grape Skin: New Elements Regarding the Regulation of Tannin Composition and Leucoanthocyanidin Reductase (LAR) and Anthocyanidin Reductase (ANR) Activities and Expression. Journal of Plant Growth Regulation 29(1): 81-90. Ladwig, F, Stahl M, Ludewig U, Hirner AA, Hammes UZ, Stadler R, Harter K and Koch W (2012). Siliques Are Red1 from Arabidopsis Acts as a Bidirectional Amino Acid Transporter That Is Crucial for the Amino Acid Homeostasis of Siliques. Plant Physiology 158(4): 1643-1655. Lam, H-M, Wong P, Chan H-K, Yam K-M, Chen L, Chow C-M and Coruzzi GM (2003). Overexpression of the ASN1 Gene Enhances Nitrogen Status in Seeds of Arabidopsis. Plant Physiology 132(2): 926-935. Landolino, AB and Cook DR (2009). Phenylpropanoid Metabolism in Plants: Biochemistry, Functional Biology, and Metabolic Engineering. Plant Phenolics and Human Health, John Wiley & Sons, Inc.: 489-563. Le, BH, Wagmaister JA, Kawashima T, Bui AQ, Harada JJ and Goldberg RB (2007). Using genomics to study legume seed development. Plant Physiol 144(2): 562-574. Lee, YA, Cho EJ and Yokozawa T (2008). Effects of proanthocyanidin preparations on hyperlipidemia and other biomarkers in mouse model of type 2 diabetes. J. Agric. Food Chem. 56(17): 7781-7789. Lepiniec, L, Debeaujon I, Routaboul JM, Baudry A, Pourcel L, Nesi N and Caboche M (2006). Genetics and biochemistry of seed flavonoids. Annu Rev Plant Biol 57: 405-430. Lester, D, Ross J, Ait-Ali T, Martin D and Reid J (1996). A gibberellin 20-oxidase cDNA (Accession No. U58830) from pea (Pisum sativum L.) seed (PGR 96-050). Plant Physiol 111(4): 1353. Lester, DR, Ross JJ, Davies PJ and Reid JB (1997). 
Mendel's stem length gene (Le) encodes a gibberellin 3 beta-hydroxylase. The Plant Cell Online 9(8): 1435-1443. Lester, DR, Ross JJ, Smith JJ, Elliott RC and Reid JB (1999). Gibberellin 2-oxidation and the SLN gene of Pisum sativum. The Plant Journal 19(1): 65-73. Li, X, Gao P, Cui D, Wu L, Parkin I, Saberianfar R, Menassa R, Pan H, Westcott N and Gruber MY (2011). The Arabidopsis tt19-4 mutant differentially accumulates proanthocyanidin and anthocyanin through a 3′ amino acid substitution in glutathione S-transferase. Plant, Cell & Environment 34(3): 374-388. Li, X, Weng J-K and Chapple C (2008a). Improvement of biomass through lignin modification. The Plant Journal 54(4): 569-581. Li, Y, Zheng L, Corke F, Smith C and Bevan MW (2008b). Control of final seed and organ size by the DA1 gene family in Arabidopsis thaliana. Genes & Development 22(10): 1331-1336. Lin, Y-L, Juan IM, Chen Y-L, Liang Y-C and Lin J-K (1996). Composition of Polyphenols in Fresh Tea Leaves and Associations of Their Oxygen-Radical-Absorbing Capacity with Antiproliferative Actions in Fibroblast Cells. J Agric Food Chem 44(6): 1387- 1394.

Lionetti, V, Raiola A, Camardella L, Giovane A, Obel N, Pauly M, Favaron F, Cervone F and Bellincampi D (2007). Overexpression of Pectin Methylesterase Inhibitors in Arabidopsis Restricts Fungal Infection by Botrytis cinerea. Plant Physiology 143(4): 1871-1880. Loreti, E, Povero G, Novi G, Solfanelli C, Alpi A and Perata P (2008). Gibberellins, jasmonate and abscisic acid modulate the sucrose-induced expression of anthocyanin biosynthetic genes in Arabidopsis. New Phytologist 179(4): 1004-1016. Lu, B, Zeng Z and Shi T (2013). Comparative study of de novo assembly and genome-guided assembly strategies for transcriptome reconstruction based on RNA-Seq. Science China Life Sciences 56(2): 143-155. Lukačin, R, Wellmann F, Britsch L, Martens S and Matern U (2003). Flavonol synthase from Citrus unshiu is a bifunctional dioxygenase. Phytochemistry 62(3): 287-292. Luo, M, Dennis ES, Berger F, Peacock WJ and Chaudhury A (2005). MINISEED3 (MINI3), a WRKY family gene, and HAIKU2 (IKU2), a leucine-rich repeat (LRR) KINASE gene, are regulators of seed size in Arabidopsis. PNAS 102(48): 17531-17536. Macas, J, Neumann P and Navrátilová A (2007). Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics 8(1): 427. Maeda, H and Dudareva N (2012). The Shikimate Pathway and Aromatic Amino Acid Biosynthesis in Plants. Annual Review of Plant Biology 63(1): 73-105. Manach, C, Williamson G, Morand C, Scalbert A and Rémésy C (2005). Bioavailability and bioefficacy of polyphenols in humans. I. Review of 97 bioavailability studies. Am. J. Clin. Nutr. 81(1): 230S-242S. Maresh, JJ, Giddings L-A, Friedrich A, Loris EA, Panjikar S, Trout BL, Stöckigt J, Peters B and O'Connor SE (2007). Strictosidine Synthase: Mechanism of a Pictet−Spengler Catalyzing Enzyme†. Journal of the American Chemical Society 130(2): 710-723. 
Marinova, K, Pourcel L, Weder B, Schwarz M, Barron D, Routaboul JM, Debeaujon I and Klein M (2007). The Arabidopsis MATE transporter TT12 acts as a vacuolar flavonoid/H+ -antiporter active in proanthocyanidin-accumulating cells of the seed coat. Plant Cell 19(6): 2023-2038. Martens, S, Forkmann G, Britsch L, Wellmann F, Matern U and Lukačin R (2003). Divergent evolution of flavonoid 2-oxoglutarate-dependent dioxygenases in parsley. FEBS Letters 544(1–3): 93-98. Martin, DN, Proebsting WM and Hedden P (1999). The SLENDER Gene of Pea Encodes a Gibberellin 2-Oxidase. Plant Physiology 121(3): 775-781. Martin, L, Fei Z, Giovannoni J and Rose JKC (2013). Catalyzing Plant Science Research with RNA-seq. Frontiers in Plant Science 4. Marty, F (1999). Plant Vacuoles. The Plant Cell 11(4): 587-599. Maugé, C, Granier T, d'Estaintot BL, Gargouri M, Manigand C, Schmitter J-M, Chaudière J and Gallois B (2010). Crystal Structure and Catalytic Mechanism of Leucoanthocyanidin Reductase from Vitis vinifera. Journal of Molecular Biology 397(4): 1079-1091. Mellway, RD, Tran LT, Prouse MB, Campbell MM and Constabel CP (2009). The Wound-, Pathogen-, and Ultraviolet B-Responsive MYB134 Gene Encodes an R2R3 MYB Transcription Factor That Regulates Proanthocyanidin Synthesis in Poplar. Plant Physiol. 150(2): 924-941.

Miranda, M, Ralph SG, Mellway R, White R, Heath MC, Bohlmann J and Constabel CP (2007). The Transcriptional Response of Hybrid Poplar (Populus trichocarpa x P. deltoids) to Infection by Melampsora medusae Leaf Rust Involves Induction of Flavonoid Pathway Genes Leading to the Accumulation of Proanthocyanidins. Molecular Plant-Microbe Interactions 20(7): 816-831. Mortazavi, A, Williams BA, McCue K, Schaeffer L and Wold B (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Meth 5(7): 621-628. Nadeau, CD, Ozga JA, Kurepin LV, Jin A, Pharis RP and Reinecke DM (2011). Tissue-specific regulation of gibberellin biosynthesis in developing pea seeds. Plant Physiol. 156(2): 897-912. Nahrstedt, A, Proksch P and Conn EE (1987). Dhurrin, (−)-catechin, flavonol glycosides and flavones from Chamaebatia foliolosa. Phytochemistry 26(5): 1546-1547. Nambara, E and Marion-Poll A (2005). ABSCISIC ACID BIOSYNTHESIS AND CATABOLISM. Annual Review of Plant Biology 56(1): 165-185. Napoli, C, Lemieux C and Jorgensen R (1990). Introduction of a Chimeric Chalcone Synthase Gene into Petunia Results in Reversible Co-Suppression of Homologous Genes in trans. The Plant Cell Online 2(4): 279-289. Nesi, N, Debeaujon I, Jond C, Pelletier G, Caboche M and Lepiniec L (2000). The TT8 Gene Encodes a Basic Helix-Loop-Helix Domain Protein Required for Expression of DFR and BAN Genes in Arabidopsis Siliques. The Plant Cell Online 12(10): 1863-1878. Nesi, N, Jond C, Debeaujon I, Caboche M and Lepiniec L (2001). The Arabidopsis TT2 Gene Encodes an R2R3 MYB Domain Protein That Acts as a Key Determinant for Proanthocyanidin Accumulation in Developing Seed. The Plant Cell Online 13(9): 2099-2114. Nielsen, KA, Tattersall DB, Jones PR and Møller BL (2008). Metabolon formation in dhurrin biosynthesis. Phytochemistry 69(1): 88-98. Normanly, J, Slovin JP and Cohen JD (2010). Auxin biosynthesis and metabolism. Plant Hormones, Springer: 36-62. 
North, H, Baud S, Debeaujon I, Dubos C, Dubreucq B, Grappin P, Jullien M, Lepiniec L, Marion-Poll A, Miquel M, Rajjou L, Routaboul J-M and Caboche M (2010). Arabidopsis seed secrets unravelled after a decade of genetic and omics-driven research. The Plant Journal 61(6): 971-981. Odell, JT, Nagy F and Chua N-H (1985). Identification of DNA sequences required for activity of the cauliflower mosaic virus 35S promoter. Nature 313:810-812. Pang, Y, Abeysinghe ISB, He J, He X, Huhman D, Mewan KM, Sumner LW, Yun J and Dixon RA (2013a). Functional characterization of proanthocyanidin pathway enzymes from tea and their application for metabolic engineering. Plant Physiol. 161(3): 1103- 1116. Pang, Y, Cheng X, Huhman D, Ma J, Peel G, Yonekura-Sakakibara K, Saito K, Shen G, Sumner L, Tang Y, Wen J, Yun J and Dixon R (2013b). Medicago glucosyltransferase UGT72L1: potential roles in proanthocyanidin biosynthesis. Planta 238(1): 139-154. Pang, Y, Peel GJ, Sharma SB, Tang Y and Dixon RA (2008). A transcript profiling approach reveals an epicatechin-specific glucosyltransferase expressed in the seed coat of Medicago truncatula. PNAS 105(37): 14210-14215.

Pang, Y, Peel GJ, Wright E, Wang Z and Dixon RA (2007). Early steps in proanthocyanidin biosynthesis in the model legume Medicago truncatula. Plant Physiol 145(3): 601-615. Pang, Y, Wenger JP, Saathoff K, Peel GJ, Wen J, Huhman D, Allen SN, Tang Y, Cheng X, Tadege M, Ratet P, Mysore KS, Sumner LW, Marks MD and Dixon RA (2009). A WD40 Repeat Protein from Medicago truncatula Is Necessary for Tissue-Specific Anthocyanin and Proanthocyanidin Biosynthesis But Not for Trichome Development. Plant Physiol. 151(3): 1114-1129. Panjehkeh, N, Backhouse D and Taji A (2010). Role of Proanthocyanidins in Resistance of the Legume Swainsona formosa to Phytophthora cinnamomi. Journal of Phytopathology 158(5): 365-371. Paolocci, F, Robbins MP, Madeo L, Arcioni S, Martens S and Damiani F (2007). Ectopic expression of a basic helix-loop-helix gene transactivates parallel pathways of proanthocyanidin biosynthesis. structure, expression analysis, and genetic control of leucoanthocyanidin 4-reductase and anthocyanidin reductase genes in Lotus corniculatus. Plant Physiol 143(1): 504-516. Parham, R and Kaustinen H (1977). On the site of tannin synthesis in plant cells. Botanical Gazette: 465-467. Pelletier, MK, Murrell JR and Shirley BW (1997). Characterization of flavonol synthase and leucoanthocyanidin dioxygenase genes in Arabidopsis (further evidence for differential regulation of "early" and "late" genes). Plant Physiol. 113(4): 1437-1445. Peng, Q-Z, Zhu Y, Liu Z, Du C, Li K-G and Xie D-Y (2012). An integrated approach to demonstrating the ANR pathway of proanthocyanidin biosynthesis in plants. Planta 236(3): 901-918. Perry, LG, Thelen GC, Ridenour WM, Weir TL, Callaway RM, Paschke MW and M. Vivanco J (2005). Dual role for an allelochemical: (±)-catechin from Centaurea maculosa root exudates regulates conspecific seedling establishment. Journal of Ecology 93(6): 1126-1135. Petersen, M, Hans J and Matern U (2010). Biosynthesis of Phenylpropanoids and Related Compounds. 
Annual Plant Reviews Volume 40: Biochemistry of Plant Secondary Metabolism, Wiley-Blackwell: 182-257. Pfaffl, MW (2001). A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 29(9): e45. Pfeiffer, J, Kühnel C, Brandt J, Duy D, Punyasiri PAN, Forkmann G and Fischer TC (2006). Biosynthesis of flavan 3-ols by leucoanthocyanidin 4-reductases and anthocyanidin reductases in leaves of grape (Vitis vinifera L.), apple (Malus x domestica Borkh.) and other crops. Plant Physiology and Biochemistry 44(5-6): 323-334. Pichersky, E and Gang D (2000). Genetics and biochemistry of secondary metabolites in plants: an evolutionary perspective. Trends in Plant Science 5(10): 439-445. Pirie, A and Mullins MG (1976). Changes in Anthocyanin and Phenolics Content of Grapevine Leaf and Fruit Tissues Treated with Sucrose, Nitrate, and Abscisic Acid. Plant Physiology 58(4): 468-472. Porter, LJ, Hrstich LN and Chan BG (1985). The conversion of procyanidins and prodelphinidins to cyanidin and delphinidin. Phytochemistry 25(1): 223-230.

Pourcel, L, Irani NG, Lu Y, Riedl K, Schwartz S and Grotewold E (2010). The Formation of Anthocyanic Vacuolar Inclusions in Arabidopsis thaliana and Implications for the Sequestration of Anthocyanin Pigments. Molecular Plant 3(1): 78-90. Pourcel, L, Routaboul J-M, Kerhoas L, Caboche M, Lepiniec L and Debeaujon I (2005). TRANSPARENT TESTA10 encodes a laccase-like enzyme involved in oxidative polymerization of flavonoids in Arabidopsis seed coat. Plant Cell 17(11): 2966-2980. Poustka, F, Irani NG, Feller A, Lu Y, Pourcel L, Frame K and Grotewold E (2007). A trafficking pathway for anthocyanins overlaps with the endoplasmic reticulum-to- vacuole protein-sorting route in Arabidopsis and contributes to the formation of vacuolar inclusions. Plant Physiol. 145(4): 1323-1335. Ralston, L, Subramanian S, Matsuno M and Yu O (2005). Partial reconstruction of flavonoid and isoflavonoid biosynthesis in yeast using soybean type I and type II chalcone isomerases. Plant Physiol 137(4): 1375-1388. Rasmussen, S and Dixon RA (1999). Transgene-Mediated and Elicitor-Induced Perturbation of Metabolic Channeling at the Entry Point into the Phenylpropanoid Pathway. The Plant Cell Online 11(8): 1537-1551. Rasmussen, SE, Frederiksen H, Struntze Krogholm K and Poulsen L (2005). Dietary proanthocyanidins: Occurrence, dietary intake, bioavailability, and protection against cardiovascular disease. Mol. Nutr. Food Res. 49(2): 159-174. Ratan, A, Miller W, Guillory J, Stinson J, Seshagiri S and Schuster SC (2013). Comparison of Sequencing Platforms for Single Nucleotide Variant Calls in a Human Sample. PLoS ONE 8(2): e55089. Reddy, AR, Britsch L, Salamini F, Saedler H and Rohde W (1987). The A1 (anthocyanin-1) locus in Zea mays encodes dihydroquercetin reductase. Plant Sci. 52(1–2): 7-13. Rice-Evans, C, Miller N and Paganga G (1997). Antioxidant properties of phenolic compounds. Trends Plant Sci. 2(4): 152-159. 
Robertson, G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM and Qian JQ (2010). De novo assembly and analysis of RNA-seq data. Nature methods 7(11): 909-912. Rochat, C and Boutin J-P (1991). Metabolism of phloem-borne amino acids in maternal tissues of fruit of nodulated or nitrate-fed pea plants (Pisum sativum L.). Journal of Experimental Botany 42(2): 207-214. Rosado, A and Raikhel NV (2010). Understanding Plant Vacuolar Trafficking from a Systems Biology Perspective. Plant Physiology 154(2): 545-550. Rosler, J, Krekel F, Amrhein N and Schmid J (1997). Maize Phenylalanine Ammonia-Lyase Has Tyrosine Ammonia-Lyase Activity. Plant Physiology 113(1): 175-179. Ross, J, Li Y, Lim E and Bowles DJ (2001). Higher plant glycosyltransferases. Genome Biol 2(2): REVIEWS3004. Rothberg, JM and Leamon JH (2008). The development and impact of 454 sequencing. Nature biotechnology 26(10): 1117-1124. Routaboul, J-M, Kerhoas L, Debeaujon I, Pourcel L, Caboche M, Einhorn J and Lepiniec L (2006). Flavonoid diversity and biosynthesis in seed of Arabidopsis thaliana. Planta 224(1): 96-107. Rushton, PJ, Somssich IE, Ringler P and Shen QJ (2010). WRKY transcription factors. Trends in plant science 15(5): 247-258.

Saito, K, Kobayashi M, Gong Z, Tanaka Y and Yamazaki M (1999). Direct evidence for anthocyanidin synthase as a 2-oxoglutarate-dependent oxygenase: molecular cloning and functional expression of cDNA from a red forma of Perilla frutescens. The Plant Journal 17(2): 181-189. Schilmiller, AL, Stout J, Weng J-K, Humphreys J, Ruegger MO and Chapple C (2009). Mutations in the cinnamate 4-hydroxylase gene impact metabolism, growth and development in Arabidopsis. The Plant Journal 60(5): 771-782. Serrano, J, Puupponen-Pimiä R, Dauer A, Aura A-M and Saura-Calixto F (2009). Tannins: Current knowledge of food sources, intake, bioavailability and biological effects. Molecular Nutrition & Food Research 53(S2): S310-S329. Shimada, N, Sasaki R, Sato S, Kaneko T, Tabata S, Aoki T and Ayabe S-i (2005). A comprehensive analysis of six dihydroflavonol 4-reductases encoded by a gene cluster of the Lotus japonicus genome. Journal of Experimental Botany 56(419): 2573-2585. Shirley, BW, Hanley S and Goodman HM (1992). Effects of ionizing radiation on a plant genome: analysis of two Arabidopsis transparent testa mutations. The Plant Cell Online 4(3): 333-347. Singh, K, Rani A, Paul A, Dutt S, Joshi R, Gulati A, Ahuja PS and Kumar S (2009). Differential display mediated cloning of anthocyanidin reductase gene from tea (Camellia sinensis) and its relationship with the concentration of epicatechins. Tree Physiol 29(6): 837-846. Singh, S, McCallum J, Gruber MY, Towers GHN, Muir AD, Bohm BA, Koupai-Abyazani MR and Glass ADM (1997). Biosynthesis of flavan-3-ols by leaf extracts of Onobrychis viciifolia. Phytochemistry 44(3): 8. Smýkal, P, Aubert G, Burstin J, Coyne CJ, Ellis NTH, Flavell AJ, Ford R, Hýbl M, Macas J, Neumann P, McPhee KE, Redden RJ, Rubiales D, Weller JL and Warkentin TD (2012). Pea (Pisum sativum L.) in the Genomic Era. Agronomy 2(2): 74-115. Sohlberg, JJ, Myrenås M, Kuusk S, Lagercrantz U, Kowalczyk M, Sandberg G and Sundberg E (2006). 
STY1 regulates auxin homeostasis and affects apical–basal patterning of the Arabidopsis gynoecium. The Plant Journal 47(1): 112-123. Sotelo-Silveira, M, Cucinotta M, Chauvin A-L, Chávez Montes RA, Colombo L, Marsch- Martínez N and de Folter S (2013). Cytochrome P450 CYP78A9 Is Involved in Arabidopsis Reproductive Development. Plant Physiology 162(2): 779-799. Sparvoli, F, Martin C, Scienza A, Gavazzi G and Tonelli C (1994). Cloning and molecular analysis of structural genes involved in flavonoid and stilbene biosynthesis in grape (Vitis vinifera L.). Plant Molecular Biology 24(5): 743-755. Sponsel, V (1995). The biosynthesis and metabolism of gibberellins in higher plants. Plant Hormones: Physiology, Biochemistry and Molecular Biology. P. Davies. Dordrecht, The Netherlands, Kluwer Academic Publishers: 66–97. Sreenivasulu, N and Wobus U (2013). Seed-Development Programs: A Systems Biology– Based Comparison Between Dicots and Monocots. Annu Rev Plant Biol 64(1): 189- 217. Sreevidya, V, Srinivasa Rao C, Sullia S, Ladha JK and Reddy PM (2006). Metabolic engineering of rice with soybean isoflavone synthase for promoting nodulation gene expression in rhizobia. Journal of Experimental Botany 57(9): 1957-1969.

179

Stafford, HA (1988). Proanthocyanidins and the lignin connection. Phytochemistry 27(1): 1-6. Stafford, HA and Lester HH (1985). Flavan-3-ol Biosynthesis : The Conversion of (+)- Dihydromyricetin to Its Flavan-3,4-Diol (Leucodelphinidin) and to (+)- Gallocatechin by Reductases Extracted from Tissue Cultures of Ginkgo biloba and Pseudotsuga menziesii. Plant Physiol. 78(4): 791-794. Staswick, PE, Serban B, Rowe M, Tiryaki I, Maldonado MT, Maldonado MC and Suza W (2005). Characterization of an Arabidopsis Enzyme Family That Conjugates Amino Acids to Indole-3-Acetic Acid. The Plant Cell Online 17(2): 616-627. Stotz, G, Vlaming P, Wiering H, Schram AW and Forkmann G (1985). Genetic and biochemical studies on flavonoid 3′-hydroxylation in flowers of Petunia hybrida. Theoretical and Applied Genetics 70(3): 300-305. Strickler, SR, Bombarely A and Mueller LA (2012). Designing a transcriptome next- generation sequencing project for a nonmodel plant species1. American Journal of Botany 99(2): 257-266. Sturm, A and Tang G-Q (1999). The sucrose-cleaving enzymes of plants are crucial for development, growth and carbon partitioning. Trends in Plant Science 4(10): 401- 407. Sun, Y, Li H and Huang J-R (2012). Arabidopsis TT19 Functions as a Carrier to Transport Anthocyanin from the Cytosol to Tonoplasts. Molecular Plant 5(2): 387-400. Szankowski, I, Flachowsky H, Li H, Halbwirth H, Treutter D, Regos I, Hanke M-V, Stich K and Fischer T (2009). Shift in polyphenol profile and sublethal phenotype caused by silencing of anthocyanidin synthase in apple (Malus sp.). Planta 229(3): 681-692. Tanner, GJ, Francki KT, Abrahams S, Watson JM, Larkin PJ and Ashton AR (2003). Proanthocyanidin biosynthesis in plants. Purification of legume leucoanthocyanidin reductase and molecular cloning of its cDNA. J Biol Chem 278(34): 31647-31656. Terrier, N, Torregrosa L, Ageorges A, Vialet S, Verries C, Cheynier V and Romieu C (2009). 
Ectopic Expression of VvMybPA2 Promotes Proanthocyanidin Biosynthesis in Grapevine and Suggests Additional Targets in the Pathway. Plant Physiol. 149(2): 1028-1041. Theis, N and Lerdau, M. The Evolution of Function in Plant Secondary Metabolites. International Journal of Plant Sciences 164(S3):S93-S102. Tohge, T, Watanabe M, Hoefgen R and Fernie AR (2013). The evolution of phenylpropanoid metabolism in the green lineage. Critical Reviews in Biochemistry and Molecular Biology 48(2): 123-152. Trapnell, C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ and Pachter L (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology 28(5): 511-515. Treimer, JF and Zenk MH (1979). Purification and Properties of Strictosidine Synthase, the Key Enzyme in Indole Alkaloid Formation. European Journal of Biochemistry 101(1): 225-233. Tsai, C-J, Harding SA, Tschaplinski TJ, Lindroth RL and Yuan Y (2006). Genome-wide analysis of the structural genes regulating defense phenylpropanoid metabolism in Populus. New Phytologist 172(1): 47-62.

180

Turnbull, JJ, Nakajima J, Welford RW, Yamazaki M, Saito K and Schofield CJ (2004). Mechanistic studies on three 2-oxoglutarate-dependent oxygenases of flavonoid biosynthesis: anthocyanidin synthase, flavonol synthase, and flavanone 3beta- hydroxylase. J Biol Chem 279(2): 1206-1216. Turnbull, JJ, Sobey WJ, Aplin RT, Hassan A, Schofield CJ, Firmin JL and Prescott AG (2000). Are anthocyanidins the immediate products of anthocyanidin synthase? Chemical Communications(24): 2473-2474. Tuteja, JH, Zabala G, Varala K, Hudson M and Vodkin LO (2009). Endogenous, Tissue- Specific Short Interfering RNAs Silence the Chalcone Synthase Gene Family in Glycine max Seed Coats. Plant Cell 21(10): 3063-3077. Van Dongen, JT, Ammerlaan AMH, Wouterlood M, Van Aelst AC and Borstlap AC (2003). Structure of the developing pea seed coat and the post‐phloem transport pathway of nutrients. Ann. Bot. 91(6): 729-737. Verdier, J, Zhao J, Torres-Jerez I, Ge S, Liu C, He X, Mysore KS, Dixon RA and Udvardi MK (2012). MtPAR MYB transcription factor acts as an on switch for proanthocyanidin biosynthesis in Medicago truncatula. Proceedings of the National Academy of Sciences 109(5): 1766-1771. Vogt, T (2010). Phenylpropanoid Biosynthesis. Molecular Plant 3(1): 2-20. Volpi, C, Janni M, Lionetti V, Bellincampi D, Favaron F and D'Ovidio R (2011). The Ectopic Expression of a Pectin Methyl Esterase Inhibitor Increases Pectin Methyl Esterification and Limits Fungal Diseases in Wheat. Molecular Plant-Microbe Interactions 24(9): 1012-1019. Walker, AR, Davison PA, Bolognesi-Winfield AC, James CM, Srinivasan N, Blundell TL, Esch JJ, Marks MD and Gray JC (1999). The TRANSPARENT TESTA GLABRA1 Locus, Which Regulates Trichome Differentiation and Anthocyanin Biosynthesis in Arabidopsis, Encodes a WD40 Repeat Protein. The Plant Cell Online 11(7): 1337- 1350. Wang, H, Wang W, Zhang P, Pan Q, Zhan J and Huang W (2010). 
Gene transcript accumulation, tissue and subcellular localization of anthocyanidin synthase (ANS) in developing grape berries. Plant Science 179(1-2): 103-113. Wang, L, Jiang Y, Yuan L, Lu W, Yang L, Karim A and Luo K (2013). Isolation and Characterization of cDNAs Encoding Leucoanthocyanidin Reductase and Anthocyanidin Reductase from Populus trichocarpa. PLoS ONE 8(5): e64664. Wang, SX, Hunter W and Plant A (2000). Isolation and purification of functional total RNA from woody branches and needles of sitka and white spruce. BioTechniques 28(2): 292-296. Wang, X, Warkentin T, Briggs C, Oomah BD, Campbell C and Woods S (1998). Total phenolics and condensed tannins in field pea (Pisum sativum L.) and grass pea (Lathyrus sativus L.). Euphytica 101(1): 97-102. Watts, KT, Mijts BN, Lee PC, Manning AJ and Schmidt-Dannert C (2006). Discovery of a Substrate Selectivity Switch in Tyrosine Ammonia-Lyase, a Member of the Aromatic Amino Acid Lyase Family. Chemistry & Biology 13(12): 1317-1326. Weber, H, Borisjuk L and Wobus U (2005). Molecular physiology of legume seed development. Annu Rev Plant Biol 56: 253-279.

181

Weigelt, K, Küster H, Radchuk R, Müller M, Weichert H, Fait A, Fernie AR, Saalbach I and Weber H (2008). Increasing amino acid supply in pea embryos reveals specific interactions of N and C metabolism, and highlights the importance of mitochondrial metabolism. The Plant Journal 55(6): 909-926. Welford, RWD, Turnbull JJ, Claridge TDW, Prescott AG and Schofield CJ (2001). Evidence for oxidation at C-3 of the flavonoid C-ring during anthocyanin biosynthesis. Chemical Communications(18): 1828-1829. Wellmann, F, Griesser M, Schwab W, Martens S, Eisenreich W, Matern U and Lukačin R (2006). Anthocyanidin synthase from Gerbera hybrida catalyzes the conversion of (+)-catechin to cyanidin and a novel procyanidin. FEBS Letters 580(6): 1642-1648. Weston, DE, Elliott RC, Lester DR, Rameau C, Reid JB, Murfet IC and Ross JJ (2008). The Pea DELLA Proteins LA and CRY Are Important Regulators of Gibberellin Synthesis and Root Growth. Plant Physiology 147(1): 199-205. Wilmouth, RC, Turnbull JJ, Welford RW, Clifton IJ, Prescott AG and Schofield CJ (2002). Structure and mechanism of anthocyanidin synthase from Arabidopsis thaliana. Structure 10(1): 93-103. Wink, M (2010). Introduction: Biochemistry, Physiology and Ecological Functions of Secondary Metabolites. Annual Plant Reviews Volume 40: Biochemistry of Plant Secondary Metabolism, Wiley-Blackwell: 1-19. Winkel, BS (2004). Metabolic channeling in plants. Annu Rev Plant Biol 55: 85-107. Wu, L, El-mezawy A, Duong M and Shah S (2010). Two seed coat-specific promoters are functionally conserved between Arabidopsis thaliana and Brassica napus. In Vitro Cellular & Developmental Biology - Plant 46(4): 338-347. Xie, D-Y and Dixon RA (2005). Proanthocyanidin biosynthesis – still more questions than answers? Phytochemistry 66(18): 2127-2144. Xie, D-Y, Jackson LA, Cooper JD, Ferreira D and Paiva NL (2004a). Molecular and Biochemical Analysis of Two cDNA Clones Encoding Dihydroflavonol-4-Reductase from Medicago truncatula. Plant Physiol. 
134(3): 979-994. Xie, DY, Sharma SB and Dixon RA (2004b). Anthocyanidin reductases from Medicago truncatula and Arabidopsis thaliana. Arch Biochem Biophys 422(1): 91-102. Xie, DY, Sharma SB, Paiva NL, Ferreira D and Dixon RA (2003). Role of anthocyanidin reductase, encoded by BANYULS in plant flavonoid biosynthesis. Science 299(5605): 396-369. Yamaguti-Sasaki, E, Ito L, Canteli V, Ushirobira T, Ueda-Nakamura T, Filho B, Nakamura C and Palazzo de Mello J (2007). Antioxidant Capacity and In Vitro Prevention of Dental Plaque Formation by Extracts and Condensed Tannins of Paullinia cupana. Molecules 12(8): 1950-1963. Yoshida, K, Iwasaka R, Kaneko T, Sato S, Tabata S and Sakuta M (2008). Functional Differentiation of Lotus japonicus TT2s, R2R3-MYB Transcription Factors Comprising a Multigene Family. Plant and Cell Physiology 49(2): 157-169. Young, RE, McFarlane HE, Hahn MG, Western TL, Haughn GW and Samuels AL (2008). Analysis of the in Arabidopsis Seed Coat Cells during Polarized Secretion of Pectin-Rich Mucilage. The Plant Cell Online 20(6): 1623-1638. Yuan, L, Wang L, Han Z, Jiang Y, Zhao L, Liu H, Yang L and Luo K (2012). Molecular cloning and characterization of PtrLAR3, a gene encoding leucoanthocyanidin

182

reductase from Populus trichocarpa, and its constitutive expression enhances fungal resistance in transgenic plants. Journal of Experimental Botany. Zhang, F, Gonzalez A, Zhao M, Payne CT and Lloyd A (2003). A network of redundant bHLH proteins functions in all TTG1-dependent pathways of Arabidopsis. Development 130(20): 4859-4869. Zhang, X, Henriques R, Lin S-S, Niu Q-W and Chua N-H (2006). Agrobacterium-mediated transformation of Arabidopsis thaliana using the floral dip method. Nat. Protoc. 1(2): 641-646. Zhao, J and Dixon RA (2009). MATE transporters facilitate vacuolar uptake of epicatechin 3'-O-glucoside for proanthocyanidin biosynthesis in Medicago truncatula and Arabidopsis. Plant Cell 21(8): 2323-2340. Zhao, J, Pang Y and Dixon RA (2010). The Mysteries of Proanthocyanidin Transport and Polymerization. Plant Physiol.: pp.110.155432. Zhao, Y (2010). Auxin Biosynthesis and Its Role in Plant Development. Annual Review of Plant Biology 61(1): 49-64. Zhao, Z and Ng D (2007). US Department of Energy Joint Genome Institute: cDNA Library Creation Protocol. US Department of Energy Joint Genome Institute. E. Lindquist, P. Richardson and F. Chen. Zhou, Y, Zhang X, Kang X, Zhao X, Zhang X and Ni M (2009). SHORT HYPOCOTYL UNDER BLUE1 Associates with MINISEED3 and HAIKU2 Promoters in Vivo to Regulate Arabidopsis Seed Development. The Plant Cell Online 21(1): 106-117. Zohary, D, Hopf M and Weiss E (2012). Domestication of Plants in the Old World: The origin and spread of domesticated plants in Southwest Asia, Europe, and the Mediterranean Basin, Oxford University Press. Zondlo, SC and Irish VF (1999). CYP78A5 encodes a cytochrome P450 that marks the shoot apical meristem boundary in Arabidopsis. Plant J 19(3): 259-268.

183

APPENDIX A: COMPARATIVE TRANSCRIPTOMICS

A.1. Unigenes differentially expressed in ‘Solido’ (S) versus ‘Courier’ (Cr) and ‘LAN3017’ (L). Expression values for the pea cultivars are RPKM.

Unigene (contig#) | TAIR E-value | TAIR AGI | TAIR description | UniProt E-value | UniProt AC/ID | UniProt description | S | L | Cr | Fold diff. (S vs L+Cr)
01179 | NA | not available |  | NA | not available |  | 156.7 | 0.0 | 0.0 | Absolute
03878 | NA | not available |  | NA | not available |  | 103.8 | 0.0 | 0.0 | Absolute
03256 | NA | not available |  | NA | not available |  | 77.4 | 0.0 | 0.0 | Absolute
02084 | 3.71E-05 | AT1G55810 | uridine kinase-like 3 | 0.0003 | D9I4C0 | Uridine kinase | 69.9 | 0.0 | 0.0 | Absolute
05128 | NA | not available |  | NA | not available |  | 67.5 | 0.0 | 0.0 | Absolute
03174 | 5.63E-07 | AT3G52370 | FASCICLIN-like arabinogalactan protein 15 precursor | 6.01E-10 | Q2PF38 | Putative uncharacterized protein | 58.2 | 0.0 | 0.0 | Absolute
01365 | 5.29E-28 | AT3G08580 | AAC1 ADP/ATP carrier 1 | 1.6E-28 | I3T6D5 | Uncharacterized protein | 54.2 | 0.0 | 0.0 | Absolute
04027 | NA | not available |  | NA | not available |  | 225.8 | 0.0 | 0.0 | 6206.2
01447 | 1.61E-11 | AT2G36325 | GDSL-like Lipase/Acylhydrolase superfamily protein | 4.37E-19 | G7I351 | GDSL esterase/lipase | 1589.3 | 0.3 | 0.2 | 6092.9
01435 | 2.46E-27 | AT1G63120 | RHOMBOID-like 2 | 9.18E-26 | D7KU18 | Putative uncharacterized protein | 361.1 | 0.1 | 0.0 | 4974.0
03276 | 1.19E-76 | AT3G08030 | Protein of unknown function | 3.2E-109 | I3SH19 | Uncharacterized protein | 466.8 | 0.0 | 0.2 | 4935.1
03432 | NA | not available |  | 2.05E-08 | G7ZZU7 | Acetyl-CoA acetyltransferase cytosolic | 100.7 | 0.0 | 0.0 | 4547.7
05285 | NA | not available |  | NA | not available |  | 180.2 | 0.0 | 0.1 | 3662.3
02927 | 3.41E-27 | AT4G16760 | acyl-CoA oxidase 1 | 1.9E-31 | I3SB18 | Uncharacterized protein | 131.3 | 0.1 | 0.0 | 3002.8
03244 | 3.57E-43 | AT1G20950 | Phosphofructokinase family protein | 1.9E-48 | I3S558 | Uncharacterized protein | 161.8 | 0.0 | 0.1 | 2977.1
03904 | 2.01E-17 | AT1G22520 | Domain of unknown function | 2.68E-25 | I3T1N1 | Uncharacterized protein | 156.0 | 0.0 | 0.1 | 2737.9
07936 | NA | not available |  | NA | not available |  | 126.6 | 0.0 | 0.1 | 2269.0
00821 | 2.56E-31 | AT1G43800 | Plant stearoyl-acyl-carrier-protein desaturase family protein | 2.04E-39 | K7LY05 | Uncharacterized protein | 142.2 | 0.1 | 0.1 | 2255.4
03204 | 8.43E-22 | AT3G53110 | LOS4 P-loop containing nucleoside triphosphate hydrolases superfamily protein | 3.04E-39 | I3SPA4 | Uncharacterized protein | 61.1 | 0.0 | 0.1 | 1900.6
06184 | NA | not available |  | NA | not available |  | 65.2 | 0.0 | 0.1 | 1878.5
02256 | 1.23E-09 | AT2G02380 | glutathione S-transferase (class zeta) 2 | 4.62E-18 | I3T1N4 | Uncharacterized protein | 71.4 | 0.0 | 0.1 | 1742.4
03584 | 1.25E-79 | AT5G46110 | APE2, TPT Glucose-6-phosphate/phosphate translocator-related | 1.92E-87 | I1KLP7 | Uncharacterized protein | 236.7 | 0.2 | 0.1 | 1632.8
07883 | NA | not available |  | NA | not available |  | 101.0 | 0.0 | 0.1 | 1538.3
02513 | NA | not available |  | NA | not available |  | 82.3 | 0.1 | 0.0 | 1536.5
01020 | NA | not available |  | NA | not available |  | 171.5 | 0.2 | 0.1 | 1416.0
13154 | 7.15E-131 | AT4G25810 | xyloglucan endotransglycosylase 6 | 2.7E-146 | C6TEP4 | Putative uncharacterized protein | 70.5 | 0.0 | 0.1 | 1324.1
13629 | NA | not available |  | NA | not available |  | 68.0 | 0.1 | 0.1 | 1156.9
03822 | 2.02E-44 | AT2G45640 | SIN3 associated polypeptide P18 | 2E-61 | C6T2R6 | Uncharacterized protein | 97.3 | 0.2 | 0.0 | 1124.3
03504 | 8.85E-16 | AT1G09660 | RNA-binding KH domain-containing protein | 1.88E-26 | Q1SL18 | KH | 109.3 | 0.1 | 0.1 | 1019.2
01222 | NA | not available |  | 9.28E-07 | G7L0L2 | BRI1-KD interacting protein | 52.9 | 0.0 | 0.1 | 825.0
01307 | 1.17E-53 | AT5G05170 | Cellulose synthase family protein | 1.3E-55 | Q3SA40 | Putative cellulose synthase | 120.8 | 0.2 | 0.1 | 823.7
03140 | 1.06E-36 | AT3G53260 | PAL2 | 1.67E-46 | A0PBZ8 | Phenylalanine ammonia-lyase | 62.3 | 0.1 | 0.1 | 672.0
16526 | 1.99E-49 | AT2G18550 | homeobox protein 21 | 1.99E-72 | G7KC76 | Homeodomain-leucine zipper protein | 133.1 | 0.1 | 0.4 | 487.1
07426 | 1.58E-24 | AT2G46540 | 25 plant structures | 3.44E-30 | I1JS10 | Uncharacterized protein | 135.1 | 0.4 | 0.2 | 455.5
03592 | 1.28E-68 | AT5G43830 | Aluminium induced protein with YGL and LRDR motifs | 6.72E-83 | I3SFA9 | Uncharacterized protein | 132.0 | 0.4 | 0.3 | 401.4
04166 | 1.93E-75 | AT1G60390 | polygalacturonase 1 | 1.06E-84 | I1KXG3 | Uncharacterized protein | 1000.1 | 2.2 | 2.8 | 398.1
18681 | 6.98E-37 | AT4G08300 | nodulin MtN21 /EamA-like transporter family protein | 4.14E-61 | G7J311 | Auxin-induced protein 5NG4 | 72.4 | 0.0 | 0.4 | 386.9
15888 | 1.22E-98 | AT5G50260 | Cysteine proteinases superfamily protein | 2.6E-133 | O82708 | Pre-pro-TPE4A protein | 90.0 | 0.4 | 0.2 | 311.1
04849 | NA | not available |  | NA | not available |  | 64.1 | 0.4 | 0.3 | 187.4
13339 | 9.65E-84 | AT5G07050 | nodulin MtN21 /EamA-like transporter family protein | 1.5E-141 | I3RZD8 | Uncharacterized protein | 219.0 | 0.2 | 2.3 | 177.4
12605 | 8.75E-132 | AT3G61880 | CYP78A9 | 7E-169 | Q2MJ07 | Cytochrome P450 78A3 | 158.4 | 1.3 | 1.4 | 118.7
00141 | NA | not available |  | NA | not available |  | 134.9 | 2.0 | 0.9 | 93.7
15314 | 7.47E-102 | AT5G02230 | Haloacid dehalogenase-like hydrolase (HAD) superfamily protein | 4.3E-138 | I3S857 | Uncharacterized protein | 99.1 | 1.4 | 1.4 | 70.7
04601 | 2.78E-13 | AT5G03795 | Exostosin family protein | 1.44E-17 | I3S3Z0 | Uncharacterized protein | 50.9 | 0.6 | 0.9 | 68.4
14672 | 1.54E-58 | AT4G27410 | RD26 NAC (No Apical Meristem) domain transcriptional regulator superfamily protein | 2.1E-102 | I1KE01 | Uncharacterized protein | 72.4 | 1.2 | 0.9 | 68.3
00142 | NA | not available |  | 1.5E-05 | O22612 | Dormancy-associated protein | 943.0 | 11.9 | 15.9 | 67.9
00155 | NA | not available |  | NA | not available |  | 137.6 | 1.8 | 2.7 | 61.8
07362 | 8.69E-35 | AT5G62360 | Plant invertase/pectin methylesterase inhibitor superfamily protein | 2.14E-54 | I3T947 | Uncharacterized protein | 135.1 | 1.5 | 3.2 | 58.1
15101 | 3.48E-103 | AT5G64260 | EXORDIUM like 2 | 5.6E-138 | G7JCS2 | Uncharacterized protein | 143.1 | 2.7 | 2.5 | 55.0
12670 | 2.97E-132 | AT3G48530 | KING1 SNF1-related protein kinase regulatory subunit gamma 1 | 2.2E-171 | Q7XYY0 | AKIN gamma | 53.5 | 1.1 | 1.0 | 51.1
08742 | 0 | AT3G47340 | glutamine-dependent asparagine synthase 1 | 0 | P93618 | Asparagine synthetase | 183.7 | 5.0 | 4.6 | 38.4
05372 | 5.94E-25 | AT5G63530 | farnesylated protein 3 | 4.41E-24 | M0T4K4 | Uncharacterized protein | 183.5 | 7.1 | 7.0 | 26.0
13191 | 3.74E-125 | AT2G28930 | PK1B protein kinase 1B | 1.3E-156 | B7FJL9 | Uncharacterized protein | 59.9 | 2.8 | 1.9 | 25.6
18899 | 1.15E-57 | AT3G10870 | methyl esterase 17 | 8.32E-83 | I3SIX8 | Uncharacterized protein | 104.1 | 5.0 | 3.9 | 23.5
00871 | 1.91E-44 | AT2G47140 | NAD(P)-binding Rossmann-fold superfamily protein | 1E-107 | Q76KV8 | Short-chain alcohol dehydrogenase A | 171.6 | 10.2 | 4.7 | 23.1
11247 | 1.52E-57 | AT3G23160 | Protein of unknown function (DUF668) | 0 | G7JTD5 | Avr9/Cf-9 rapidly elicited protein | 69.4 | 3.1 | 2.9 | 23.0
00218 | NA | not available |  | 1.31E-05 | G7JYW4 | Putative uncharacterized protein | 79.4 | 1.4 | 5.5 | 22.8
05374 | 7.58E-30 | AT5G50740 | Heavy metal transport/detoxification superfamily protein | 1.31E-30 | G7J3E1 | Putative uncharacterized protein | 140.3 | 5.8 | 7.1 | 21.7
20292 | 3.14E-06 | AT5G13020 | Emsy N Terminus (ENT)/ plant Tudor-like domains-containing protein | 7.38E-06 | K7K984 | Uncharacterized protein | 51.9 | 2.6 | 2.4 | 21.0
00873 | 1.39E-11 | AT3G29260 | NAD(P)-binding Rossmann-fold superfamily protein | 1.49E-31 | Q76KV8 | Short-chain alcohol dehydrogenase A | 156.9 | 9.4 | 5.9 | 20.5
19135 | 4.86E-15 | AT5G10530 | Concanavalin A-like lectin protein kinase family protein | 2.16E-94 | Q41069 | Vegetative lectin | 305.6 | 29.6 | 2.4 | 19.1
01045 | NA | not available |  | 5.87E-08 | I1KDD1 | Uncharacterized protein | 80.2 | 0.0 | 9.5 | 16.9
18083 | NA | not available |  | 2.33E-65 | G7KF23 | Putative uncharacterized protein | 682.4 | 13.5 | 69.3 | 16.5
11963 | 3.78E-97 | AT1G78440 | GA2OX1 Arabidopsis thaliana gibberellin 2-oxidase 1 | 4.2E-153 | I1M2Q0 | Uncharacterized protein | 894.9 | 37.1 | 75.9 | 15.8
00876 | 8.25E-17 | AT2G47140 | NAD(P)-binding Rossmann-fold superfamily protein | 3.68E-46 | Q9SQJ2 | Short-chain alcohol dehydrogenase | 95.0 | 9.3 | 3.3 | 15.0
00457 | 1.76E-85 | AT4G39480 | CYP96A9 | 4.9E-124 | I1NCY9 | Uncharacterized protein | 201.7 | 11.1 | 15.8 | 15.0
16257 | 2.26E-39 | AT4G20260 | PCAP1 plasma-membrane associated cation-binding protein 1 | 9.07E-57 | B7FH09 | Uncharacterized protein | 56.1 | 4.4 | 3.5 | 14.1
19912 | NA | not available |  | 1.83E-21 | G7LAG4 | Putative uncharacterized protein | 90.8 | 8.2 | 5.1 | 13.6
12491 | 1.32E-93 | AT4G37890 | EDA40 Zinc finger (C3HC4-type RING finger) family protein | 0 | A9YWQ9 | Zinc finger protein | 59.4 | 3.8 | 5.3 | 13.1
00346 | 3.09E-62 | AT1G75280 | NmrA-like negative transcriptional regulator family protein | 4.38E-78 | I3SC37 | Uncharacterized protein | 706.8 | 108.3 | 0.9 | 12.9
15427 | 2.74E-101 | AT4G27450 | Aluminium induced protein with YGL and LRDR motifs | 3.3E-146 | G7IP18 | Asparagine synthetase | 197.6 | 13.3 | 17.5 | 12.8
12565 | 5.23E-66 | AT1G52340 | ABA2 | 3.3E-126 | I3SBG4 | Uncharacterized protein | 815.6 | 60.1 | 71.1 | 12.4
19131 | NA | not available |  | 5.35E-58 | A3FF19 | Phloem specific protein | 161.1 | 22.7 | 3.7 | 12.2
08973 | 0 | AT4G02280 | sucrose synthase 3 | 0 | K7KGC7 | Sucrose synthase | 154.2 | 15.0 | 10.3 | 12.2
09328 | 0 | AT5G67360 | ARA12 Subtilase family protein | 0 | Q2HRK7 | Protease-associated PA; Proteinase inhibitor I9, subtilisin propeptide | 103.0 | 8.8 | 8.5 | 12.0
19499 | NA | not available |  | 2.69E-41 | B7FGG0 | Uncharacterized protein | 58.7 | 4.8 | 5.4 | 11.6
11472 | 1.60E-102 | AT3G30340 | nodulin MtN21 /EamA-like transporter family protein | 0 | G7IBE3 | Auxin-induced protein 5NG4 | 50.3 | 2.6 | 6.4 | 11.3
18585 | 1.74E-11 | AT3G09390 | metallothionein 2A | 5.51E-31 | Q75NH7 | Type 1 metallothionein | 868.1 | 113.7 | 54.1 | 10.3
03332 | 4.17E-100 | AT1G09640 | Translation elongation factor EF1B, gamma chain | 7.2E-109 | I1LTA5 | Uncharacterized protein | 213.1 | 18.2 | 24.0 | 10.1
03764 | 9.12E-10 | AT2G19570 | cytidine deaminase 1 | 1.46E-17 | G7KJU9 | Cytidine deaminase | 87.3 | 0.1 | 17.5 | 9.9
14175 | 1.09E-70 | AT1G20850 | xylem cysteine peptidase 2 | 1.7E-172 | G7LJF1 | Cysteine proteinase | 88.5 | 14.5 | 3.4 | 9.9
14459 | 1.82E-44 | AT5G66040 | sulfurtransferase protein 16 | 5.36E-87 | B7FGV4 | Senescence-associated protein DIN1 | 97.3 | 8.3 | 12.0 | 9.6
12827 | 5.63E-146 | AT5G64260 | EXORDIUM like 2 | 0 | G7JCS2 | Uncharacterized protein | 147.5 | 11.8 | 19.9 | 9.3
18644 | NA | not available |  | 7.38E-12 | O64396 | Peaci11.8 | 139.0 | 4.2 | 28.4 | 8.5
14750 | 3.94E-31 | AT5G02020 | SIS (Salt Induced Serine rich) | 1.48E-67 | G7L5L0 | Uncharacterized protein | 185.3 | 39.0 | 9.7 | 7.6
15162 | 9.60E-44 | AT5G05250 | Archae - 12; Bacteria | 2.82E-65 | I1JYB9 | Uncharacterized protein | 59.7 | 8.1 | 7.6 | 7.6
11719 | 0 | AT5G49630 | AAP6 amino acid permease 6 | 0 | G7I283 | Amino acid transporter | 165.7 | 27.0 | 16.9 | 7.6
00836 | 3.95E-79 | AT4G35540 | zinc ion binding;transcription regulators | 0 | G7IML9 | Putative uncharacterized protein | 68.1 | 8.9 | 9.3 | 7.5
08664 | 0 | AT2G24520 | AHA5, HA5 H(+)-ATPase 5 | 0 | Q9AR52 | P-type H+-ATPase | 397.4 | 58.2 | 48.6 | 7.4
01175 | 1.79E-16 | AT5G12380 | annexin 8 | 9.1E-23 | G7LF88 | Annexin | 304.5 | 46.3 | 36.6 | 7.3
17489 | 1.02E-72 | AT1G76690 | OPR2, 12-oxophytodienoate reductase 2 | 8.26E-82 | G7K3S2 | 12-oxophytodienoate reductase | 56.3 | 11.3 | 5.1 | 6.9
14489 | 1.29E-128 | AT5G19140 | AILP1 Aluminium induced protein with YGL and LRDR motifs | 3.5E-147 | I3T590 | Uncharacterized protein | 262.5 | 50.7 | 25.8 | 6.9
16564 | 2.19E-13 | AT3G14280 | NCB | 1.4E-69 | B7FMF3 | Uncharacterized protein | 372.9 | 60.1 | 48.7 | 6.9
12439 | NA | not available |  | NA | not available |  | 196.7 | 5.6 | 54.5 | 6.5
14873 | 1.60E-87 | AT2G32150 | Haloacid dehalogenase-like hydrolase (HAD) superfamily protein | 1.1E-138 | I3SNY9 | Uncharacterized protein | 298.9 | 23.6 | 68.9 | 6.5
15667 | 2.42E-109 | AT1G76690 | OPR2, 12-oxophytodienoate reductase 2 | 1.7E-146 | Q76FS1 | 12-oxophytodienoic acid 10, 11-reductase | 72.3 | 14.4 | 8.8 | 6.2
19631 | 1.41E-80 | AT1G76680 | OPR1 12-oxophytodienoate reductase 1 | 1.1E-116 | Q76FR8 | Putative uncharacterized protein PsOPR4 | 50.1 | 11.1 | 5.0 | 6.2
18394 | NA | not available |  | NA | not available |  | 55.0 | 9.9 | 7.9 | 6.2
17295 | 2.81E-62 | AT3G03990 | alpha/beta-Hydrolases superfamily protein | 3.25E-80 | G7ICD6 | Sigma factor sigB regulation protein rsbQ | 57.6 | 13.1 | 6.3 | 5.9
01174 | NA | not available |  | 3.02E-19 | G7J6I5 | BURP domain-containing protein | 175.5 | 7.5 | 52.8 | 5.8
00746 | 9.85E-07 | AT1G80920 | J8 Chaperone DnaJ-domain superfamily protein | 3.46E-29 | E0A8T4 | DnaJ | 235.1 | 34.3 | 47.3 | 5.8
12625 | 8.54E-34 | AT3G21690 | MATE efflux family protein | 3.4E-64 | K7KYA0 | Uncharacterized protein | 59.0 | 7.9 | 12.6 | 5.8
13392 | 7.57E-93 | AT2G37460 | nodulin MtN21 /EamA-like transporter family protein | 5.7E-136 | G7L0M3 | Auxin-induced protein 5NG4 | 74.8 | 3.9 | 22.4 | 5.7
19639 | 8.90E-18 | AT4G33355 | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein | 8.26E-22 | K4AYX6 | Non-specific lipid-transfer protein | 368.7 | 53.7 | 76.6 | 5.7
11772 | 2.27E-154 | AT3G61510 | ACC synthase 1 | 0 | G7L6P5 | 1-aminocyclopropane-1-carboxylate synthase | 92.1 | 8.7 | 24.0 | 5.6
11454 | 6.24E-160 | AT5G10930 | CIPK5, SnRK3.24 CBL-interacting protein kinase 5 | 0 | G7I7M2 | CBL-interacting protein kinase | 86.6 | 15.7 | 15.2 | 5.6
06182 | 3.56E-47 | AT1G65870 | Disease resistance-responsive (dirigent-like protein) family protein | 3E-114 | G7KWC7 | Disease resistance response protein | 227.9 | 26.4 | 55.3 | 5.6
09557 | 1.24E-118 | AT1G22170 | Phosphoglycerate mutase family protein | 8.1E-175 | G8A376 | Phosphatidylinositol transfer protein | 70.0 | 13.7 | 12.0 | 5.4
13100 | 6.77E-158 | AT1G08630 | threonine aldolase 1 | 0 | G7I6S9 | L-allo-threonine aldolase | 178.4 | 32.7 | 33.1 | 5.4
00750 | 6.23E-09 | AT1G80920 | J8 Chaperone DnaJ-domain superfamily protein | 6.79E-13 | Q5QJD5 | Chloroplast outer envelope protein translocator Toc12 | 53.7 | 9.7 | 10.6 | 5.3
14847 | 2.58E-46 | AT1G03790 | SOM Zinc finger C-x8-C-x5-C-x3-H type family protein | 1.49E-70 | G7JTW1 | Zinc finger CCCH domain-containing protein | 55.0 | 9.0 | 11.9 | 5.3
08943 | 0 | AT4G19710 | aspartate kinase-homoserine dehydrogenase ii | 0 | G7JAT7 | Aspartokinase-homoserine dehydrogenase | 87.5 | 16.4 | 17.3 | 5.2
10109 | 0 | AT4G16480 | inositol transporter 4 | 0 | G7IP93 | Inositol transporter | 58.4 | 11.8 | 10.7 | 5.2
05730 | 9.11E-14 | AT1G49320 | unknown seed protein like 1 | 8.2E-100 | G7J6I1 | BURP domain-containing protein | 93.4 | 3.5 | 32.7 | 5.1
13424 | 1.42E-102 | AT1G08650 | PPCK1, phosphoenolpyruvate carboxylase kinase 1 | 1E-150 | G7I696 | Calcium-dependent protein kinase | 204.6 | 41.2 | 39.0 | 5.1
18920 | 2.58E-76 | AT3G32930 | unknown | 2.49E-91 | G7J9K0 | Putative uncharacterized protein | 91.0 | 19.9 | 16.4 | 5.0
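The fold-difference column in Table A.1 appears consistent with dividing the ‘Solido’ RPKM by the mean RPKM of the two comparison cultivars; this is an inference from the tabulated numbers, not a statement of the method used in the thesis. The minimal Python sketch below illustrates that arithmetic. The function name `fold_difference` and the handling of the "Absolute" label (returned when the comparison mean is zero) are assumptions made for illustration.

```python
from statistics import mean


def fold_difference(target_rpkm, comparison_rpkms):
    """Ratio of one cultivar's RPKM to the mean RPKM of the comparison cultivars.

    Returns the string "Absolute" when the comparison mean is zero
    (an assumed reading of the table's "Absolute" label).
    """
    baseline = mean(comparison_rpkms)
    if baseline == 0:
        return "Absolute"
    return target_rpkm / baseline


# Rounded values for unigene 08742 in Table A.1: S = 183.7, L = 5.0, Cr = 4.6
print(fold_difference(183.7, [5.0, 4.6]))

# Unigene 01179: no expression recorded in L or Cr
print(fold_difference(156.7, [0.0, 0.0]))
```

Because the table reports RPKM rounded to one decimal place, ratios recomputed this way only approximate the printed fold differences (about 38.3 here versus the tabulated 38.4). For the same reason, rows whose comparison values round to 0.0 but carry a numeric fold difference, such as unigene 04027, cannot be reproduced from the rounded values.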


A.2. Unigenes differentially expressed in PACs versus PLCs. RPKM values. A, ‘Alaska’; Cn, ‘Canstar’; Cr, ‘Courier’; L, ‘LAN3017’; S, ‘Solido’.

Unigene (contig#) | TAIR10 E-value | TAIR10 AGI | TAIR10 description | UniProt E-value | UniProt AC/ID | UniProt description | Cr | S | A | Cn | Fold diff. (PACs vs PLCs)
01370 | 5.29E-28 | AT3G08580 | AAC1 ADP/ATP carrier 1 | 1.60E-28 | I3T6D5 | Uncharacterized protein | 187.3 | 69.3 | 0.0 | 0.0 | Absolute
04312 | 7.57E-108 | AT5G20940 | Glycosyl hydrolase family protein | 2.58E-124 | I1MW70 | Uncharacterized protein | 216.4 | 194.0 | 0.0 | 0.0 | 11281.8
01367 | 5.11E-51 | AT3G08580 | AAC1 ADP/ATP carrier 1 | 2.63E-49 | R0I357 | Uncharacterized protein | 288.1 | 155.3 | 0.0 | 0.1 | 7691.9
21057 |  | not available |  | 8.52E-12 | A0N069 | Glycine-rich protein | 160.9 | 189.8 | 0.0 | 0.1 | 6478.5
03364 | 9.56E-80 | AT5G13420 | Aldolase-type TIM barrel family protein | 2.90E-91 | I1KZJ2 | Uncharacterized protein | 432.3 | 374.8 | 0.2 | 0.1 | 3499.1
06202 | 8.08E-62 | AT5G10220 | annexin 6 | 3.80E-77 | Q42922 | Annexin | 197.2 | 175.7 | 0.1 | 0.1 | 3430.5
03567 | 3.60E-41 | AT5G08640 | FLS, ATFLS1, FLS1 flavonol synthase 1 | 9.56E-101 | G7J3J7 | Flavonol synthase/flavanone 3-hydroxylase | 246.8 | 320.0 | 0.0 | 0.2 | 3026.4
11412 | 1.61E-166 | AT4G22880 | ANS | 0 | A8RRU3 | Anthocyanidin synthase | 465.9 | 341.1 | 0.1 | 0.2 | 2902.6
03511 | 4.88E-52 | AT2G47115 | unknown | 2.16E-65 | G7IBT3 | Putative uncharacterized protein | 60.5 | 69.6 | 0.1 | 0.0 | 2414.3
03400 | 8.19E-37 | AT2G30880 | Pleckstrin homology (PH) domain-containing protein | 1.98E-48 | G7K3B9 | Putative uncharacterized protein | 79.7 | 73.8 | 0.0 | 0.1 | 2129.0
18651 |  | not available |  |  | not available |  | 1915.8 | 2581.5 | 1.3 | 0.9 | 2036.8
20371 | 9.20E-39 | AT3G55280 | ribosomal protein L23AB | 2.02E-38 | G7JTN9 | 60S ribosomal protein L23a | 244.4 | 188.4 | 0.2 | 0.1 | 1931.0
03410 | 2.30E-39 | AT5G48230 | acetoacetyl-CoA thiolase 2 | 1.58E-41 | G7K0L4 | Acetyl-CoA acetyltransferase, cytosolic | 74.0 | 64.1 | 0.1 | 0.0 | 1821.9
03509 | 7.26E-17 | AT2G47115 | unknown | 4.90E-64 | G7IBT3 | Putative uncharacterized protein | 86.3 | 96.5 | 0.1 | 0.1 | 1469.7
00993 | 1.45E-131 | AT5G42800 | DFR | 8.93E-168 | F8UWD2 | Dihydroflavonol 4-reductase | 989.5 | 941.7 | 1.0 | 0.4 | 1407.2
01292 |  | not available |  |  | not available |  | 343.2 | 397.9 | 0.3 | 0.2 | 1355.6
01908 | 1.52E-12 | AT2G23970 | Class I glutamine amidotransferase-like superfamily protein | 1.20E-26 | B7FHL2 | Putative uncharacterized protein | 61.5 | 63.3 | 0.0 | 0.1 | 1336.5
04748 | 6.39E-44 | AT3G27740 | CARA carbamoyl phosphate synthetase A | 6.92E-58 | G7JAX1 | Carbamoyl-phosphate synthase small chain | 134.4 | 137.0 | 0.0 | 0.2 | 1290.6
00991 | 2.43E-26 | AT5G42800 | DFR | 3.63E-48 | F8UWD2 | Dihydroflavonol 4-reductase | 996.7 | 396.9 | 0.9 | 0.3 | 1219.0
03756 | 2.10E-44 | AT5G10360 | Ribosomal protein S6e | 2.88E-46 | Q2HTS1 | 40S ribosomal protein S6 | 145.6 | 114.8 | 0.1 | 0.1 | 1201.3
03444 |  | not available |  |  | not available |  | 72.9 | 63.3 | 0.1 | 0.0 | 1183.4
12295 | 0 | AT5G13930 | CHS | 0 | G7JA71 | Chalcone synthase | 757.9 | 503.1 | 0.7 | 0.4 | 1151.9
06579 |  | not available |  |  | not available |  | 88.5 | 60.9 | 0.1 | 0.1 | 1059.1
00311 | 7.29E-12 | AT5G13930 | CHS | 6.79E-12 | B6S399 | Chalcone synthase | 319.9 | 319.7 | 0.7 | 0.0 | 949.7
10774 | 9.15E-77 | AT1G30530 | UDP-glucosyl transferase 78D1 | 2.82E-146 | I1KL63 | Uncharacterized protein | 233.2 | 84.9 | 0.3 | 0.1 | 940.4
03198 |  | not available |  |  | not available |  | 114.7 | 101.1 | 0.0 | 0.3 | 699.2
03696 |  | not available |  | 1.49E-10 | Q43877 | HMG-I/Y | 126.3 | 74.1 | 0.2 | 0.1 | 653.1
03232 | 5.13E-86 | AT3G30775 | ERD5, PRODH, AT-POX, ATPOX, ATPDH, PRO1 Methylenetetrahydrofolate reductase family protein | 8.77E-128 | I3S7A2 | Uncharacterized protein | 83.1 | 165.0 | 0.4 | 0.2 | 447.9
00324 | 0 | AT5G13930 | CHS | 0 | F2Y9R4 | Chalcone synthase 1 | 366.7 | 1285.5 | 3.1 | 0.6 | 437.2
11767 | 0 | AT5G13930 | CHS | 0 | G7JA70 | Chalcone synthase | 533.0 | 467.6 | 1.0 | 2.1 | 326.1
00325 | 3.34E-42 | AT5G13930 | CHS | 1.02E-45 | A1E5S9 | Chalcone synthase 3 | 558.4 | 133.2 | 1.8 | 0.3 | 324.0
11789 | 2.28E-171 | AT1G08470 | SSL3 strictosidine synthase-like 3 | 0 | G7IRH8 | Adipocyte plasma membrane-associated protein | 164.9 | 140.0 | 0.4 | 0.7 | 268.2
00322 | 2.01E-28 | AT5G13930 | CHS | 2.19E-30 | B6S399 | Chalcone synthase | 896.6 | 802.5 | 7.7 | 1.4 | 186.6
03137 | 0 | AT2G37040 | PAL1 | 0 | Q1AJZ5 | Phenylalanine ammonia-lyase | 129.7 | 69.1 | 0.4 | 0.7 | 177.7
00316 | 3.98E-60 | AT5G13930 | CHS | 4.83E-73 | A1E5S9 | Chalcone synthase 3 | 916.6 | 177.4 | 5.7 | 0.8 | 169.3
00314 | 1.69E-64 | AT5G13930 | CHS | 7.38E-83 | Q45NH7 | Chalcone synthase | 1448.9 | 285.6 | 11.7 | 1.2 | 134.7
10268 | 5.13E-106 | AT1G09850 | xylem bark cysteine peptidase 3 | 0 | G7ZXJ6 | Cysteine proteinase | 150.7 | 97.3 | 1.7 | 0.2 | 130.9
10920 | 0 | AT3G59030 | TT12 | 0 | C9WSQ2 | MATE transporter | 235.1 | 97.8 | 2.6 | 1.1 | 90.6
03816 |  | not available |  |  | not available |  | 351.6 | 185.3 | 0.2 | 5.7 | 90.0
16927 | 1.39E-28 | AT5G48810 | cytochrome B5 isoform D | 3.40E-66 | G8A2A6 | Cytochrome B5 | 846.3 | 763.1 | 9.0 | 9.3 | 87.9
12550 | 1.56E-46 | AT5G12890 | UDP-Glycosyltransferase superfamily protein | 4.14E-164 | G7K854 | Cis-zeatin O-glucosyltransferase | 59.8 | 60.5 | 1.0 | 0.5 | 80.5
13359 | 4.53E-74 | AT5G01450 | RING/U-box superfamily protein | 2.31E-143 | K7MTI8 | Uncharacterized protein | 80.4 | 65.0 | 0.7 | 1.8 | 58.1
03184 | 8.54E-28 | AT5G61790 | CNX1, ATCNX1 | 4.13E-28 | K7TK23 | Uncharacterized protein | 594.2 | 471.9 | 19.5 | 10.5 | 35.5
12490 |  | not available |  | 1.85E-40 | G7K3D3 | Putative uncharacterized protein | 152.9 | 92.2 | 4.8 | 3.8 | 28.5
00181 | 1.14E-65 | AT1G61720 | ANR (BAN) | 6.26E-98 | H6S0H6 | Predicted anthocyanidin reductase | 3393.3 | 4262.3 | 3.0 | 283.1 | 26.8
00198 | 3.84E-06 | AT1G61720 | ANR (BAN) | 1.95E-13 | H6S0H6 | Predicted anthocyanidin reductase | 539.5 | 684.3 | 0.6 | 46.1 | 26.2
13176 |  | not available |  |  | not available |  | 70.9 | 76.4 | 5.7 | 0.0 | 25.8
00182 | 7.27E-16 | AT1G61720 | ANR (BAN) | 1.59E-22 | H6S0H6 | Predicted anthocyanidin reductase | 2917.1 | 3661.5 | 1.9 | 263.4 | 24.8
13442 | 5.54E-64 | AT4G10490 | 2-oxoglutarate (2OG) and Fe(II)-dependent oxygenase superfamily protein | 8.59E-173 | G7JXT3 | 1-aminocyclopropane-1-carboxylate oxidase | 115.0 | 74.4 | 5.2 | 2.7 | 23.9
00193 |  | not available |  |  | not available |  | 751.2 | 977.7 | 0.6 | 86.2 | 19.9
07172 | 8.26E-68 | AT4G19640 | ARA7, Ras-related small GTP-binding family protein | 1.61E-106 | G7II40 | Ras-related protein Rab-5C | 123.4 | 170.9 | 13.9 | 0.9 | 19.9
16364 | 4.31E-55 | AT5G62350 | Plant invertase/pectin methylesterase inhibitor superfamily protein | 7.94E-90 | G7IYJ3 | 21 kDa protein | 1188.2 | 1273.2 | 58.8 | 68.5 | 19.3
02523 | 1.22E-19 | AT5G45130 | RAB5A | 6.47E-49 | G7L8X6 | Ras-related protein Rab-5C | 84.2 | 87.2 | 7.8 | 2.1 | 17.4
02347 | 0 | AT1G55320 | AAE18 acyl-activating enzyme 18 | 0 | G7LHZ3 | Acetyl-coenzyme A synthetase | 200.3 | 170.3 | 19.0 | 3.7 | 16.3
19639 | 8.90E-18 | AT4G33355 | Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein | 8.26E-22 | K4AYX6 | Non-specific lipid-transfer protein | 76.6 | 368.7 | 10.8 | 16.7 | 16.2
02346 | 1.88E-66 | AT1G55320 | AAE18 acyl-activating enzyme 18 | 9.33E-87 | G7LHZ3 | Acetyl-coenzyme A synthetase | 172.1 | 150.2 | 20.2 | 1.0 | 15.2
12565 | 5.23E-66 | AT1G52340 | ABA2 | 3.30E-126 | I3SBG4 | Uncharacterized protein | 71.1 | 815.6 | 0.2 | 60.1 | 14.7
10371 | 7.05E-86 | AT2G37260 | TTG2 | 0 | G7L1N8 | WRKY transcription factor | 82.6 | 66.0 | 7.7 | 2.6 | 14.4
10954 | 7.97E-166 | AT4G01070 | UGT72B1 UDP-Glycosyltransferase superfamily protein | 0 | A4F1R9 | Putative glycosyltransferase | 65.4 | 61.6 | 7.2 | 2.9 | 12.6
00423 | 1.59E-61 | AT1G50010 | TUA2 tubulin alpha-2 chain | 4.58E-64 | Q5UMY4 | Alpha tubulin | 1347.7 | 1092.3 | 7.9 | 208.0 | 11.3
12994 | 1.59E-59 | AT4G39230 | NmrA-like negative transcriptional regulator family protein | 0 | Q3KN75 | Leucanthocyanidin reductase | 283.9 | 84.5 | 33.0 | 2.3 | 10.4
04844 | 9.20E-58 | AT5G07990 | TT7, CYP75B1, (F3'5'H) | 4.08E-157 | F2VPT1 | Flavonoid 3' 5' hydroxylase | 3642.1 | 2655.0 | 92.0 | 541.9 | 9.9
03548 | 3.19E-13 | AT5G50740 | Heavy metal transport/detoxification superfamily protein | 1.20E-23 | E7E1K2 | Metal ion binding protein | 62.2 | 83.8 | 0.0 | 15.0 | 9.8
11616 | 6.58E-97 | AT1G19640 | JMT jasmonic acid carboxyl methyltransferase | 0 | I1MNB5 | Uncharacterized protein | 141.8 | 53.1 | 15.1 | 7.3 | 8.7
04845 | 2.57E-82 | AT5G07990 | TT7, CYP75B1, (F3'5'H) | 1.18E-146 | F2VPT1 | Flavonoid 3' 5' hydroxylase | 3325.4 | 2523.8 | 93.1 | 586.5 | 8.6
18083 |  | not available |  | 2.33E-65 | G7KF23 | Putative uncharacterized protein | 69.3 | 682.4 | 6.8 | 89.6 | 7.8
01607 | 1.38E-83 | AT1G62290 | Saposin-like aspartyl protease family protein | 1.71E-98 | G7JJA2 | Aspartic proteinase | 571.9 | 157.3 | 57.5 | 46.1 | 7.0
12778 | 1.68E-66 | AT4G24380 | 15 growth st | 2.12E-156 | G7L2Q8 | Dihydrofolate reductase | 132.1 | 75.8 | 25.7 | 3.9 | 7.0
06727 | 7.58E-26 | AT4G12470 | AZI1 azelaic acid induced 1 | 8.48E-32 | G7I725 | 14 kDa proline-rich protein DC2.15 | 412.3 | 325.0 | 36.3 | 68.9 | 7.0
19151 | 1.68E-13 | AT2G25060 | early nodulin-like protein 14 | 6.24E-60 | Q2HT27 | Blue (Type 1) copper domain | 117.9 | 148.9 | 20.6 | 17.9 | 6.9
11963 | 3.78E-97 | AT1G78440 | ATGA2OX1, GA2OX1 Arabidopsis thaliana gibberellin 2-oxidase 1 | 4.18E-153 | I1M2Q0 | Uncharacterized protein | 75.9 | 894.9 | 12.6 | 132.5 | 6.7
09290 | 0 | AT2G37040 | PAL1 | 0 | G7IBI3 | Phenylalanine ammonia-lyase | 120.6 | 74.8 | 24.7 | 7.4 | 6.1
05940 | 6.50E-135 | AT1G26520 | Cobalamin biosynthesis CobW-like protein | 1.29E-167 | G7JWS4 | COBW domain-containing protein | 368.8 | 104.6 | 44.2 | 40.3 | 5.6
10524 | 0 | AT1G65060 | CoA ligase 3 | 0 | G7K6G3 | 4-coumarate CoA ligase | 112.4 | 67.5 | 20.0 | 13.0 | 5.4
11345 | 0 | AT2G30490 | C4H | 0 | B2LSE0 | Cinnamic acid 4-hydroxylase (Fragment) | 313.8 | 264.7 | 86.5 | 25.3 | 5.2

196

Fold Diff. Unigene TAIR10 UniProt Pea Cultivar RPKM E- E- PACs vs (contig#) Cr S A Cn value AGI Description value AC/ID Description PLCs 10689 0 AT1G51680 CoA ligase 1 0 G7JVI7 4-coumarate CoA 84.0 64.4 11.2 17.7 5.1 ligase
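The "Fold Diff. PACs vs PLCs" column can be approximated from the four RPKM columns of the table above. The sketch below is a reading of the table, not the author's documented procedure. It assumes that PACs are 'Courier' (Cr) and 'Solido' (S), that PLCs are 'Alaska' (A) and 'Canstar' (Cn), and that the fold difference is the ratio of the two group means. The function name `fold_difference` is hypothetical.

```python
def fold_difference(cr, s, a, cn):
    """Ratio of mean RPKM in (Cr, S) to mean RPKM in (A, Cn).

    Assumption: PACs = Cr and S, PLCs = A and Cn, inferred from the
    column order of the table rather than stated in it.
    """
    pac_mean = (cr + s) / 2
    plc_mean = (a + cn) / 2
    if plc_mean == 0:
        # No expression in either PLC cultivar: the ratio is undefined,
        # reported here as infinity.
        return float("inf")
    return pac_mean / plc_mean


# RPKM values (Cr, S, A, Cn) copied from three rows of the table.
rows = {
    "10689": (84.0, 64.4, 11.2, 17.7),    # tabulated fold difference: 5.1
    "11345": (313.8, 264.7, 86.5, 25.3),  # tabulated fold difference: 5.2
    "16927": (846.3, 763.1, 9.0, 9.3),    # tabulated fold difference: 87.9
}

for contig, rpkm in rows.items():
    print(contig, round(fold_difference(*rpkm), 1))
```

Under this assumption the three rows reproduce the tabulated values. Rows in which A and Cn are close to zero (for example contig 01908) only agree roughly, presumably because the printed RPKM values are rounded to one decimal place while the tabulated ratio was computed from unrounded values.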


A.3. Key genes involved in hormone, amino acid and polysaccharide metabolism. RPKM values: A, ‘Alaska’; Cn, ‘Canstar’; Cr, ‘Courier’; L, ‘LAN3017’; S, ‘Solido’.

| Unigene (contig#) | TAIR10 E-value | TAIR10 AGI | TAIR10 Description | UniProt E-value | UniProt AC/ID | UniProt Description | RPKM A | RPKM Cn | RPKM Cr | RPKM L | RPKM S |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ABA metabolism | | | | | | | | | | | |
| 09035 | 0 | AT5G67030 | zeaxanthin epoxidase (ZEP/ABA1) | 0 | G8A346 | zeaxanthin epoxidase | 16.7 | 19.5 | 19.9 | 17.6 | 30.4 |
| 09951 | 0 | AT5G67030 | | 0 | G0Z350 | zeaxanthin epoxidase | 41.7 | 55.4 | 41.6 | 47.6 | 120.9 |
| 09237 | 0 | AT3G14440 | 9-cis-epoxycarotenoid dioxygenase3 (NCED3) | 0 | Q8LP15 | 9-cis-epoxycarotenoid dioxygenase3 | 21.6 | 80.3 | 20.4 | 19.2 | 40.1 |
| 03539 | 1.4E-126 | AT1G52340 | ABA2 | 1.40E-160 | G7IW74 | short-chain dehydrogenase/reductase | 243.4 | 645.3 | 594.6 | 303.6 | 512.9 |
| 03925 | 1.54E-23 | AT1G52340 | | 2.36E-51 | I3S6M3 | uncharacterized protein | 898.4 | 859.6 | 646.4 | 1263.7 | 942.0 |
| 12565 | 5.23E-66 | AT1G52340 | | 3.30E-126 | I3SBG4 | uncharacterized protein | 0.2 | 60.1 | 71.1 | 60.1 | 815.6 |
| ABA2* | | | | | | | 496.2 | 844.1 | 767.1 | 728.9 | 1529.1 |
| 08531 | 0 | AT2G27150 | abscisic aldehyde oxidase3 (AAO3) | 0 | B0LB01 | aldehyde oxidase 3 | 50.2 | 41.8 | 51.0 | 35.9 | 47.8 |
| 11191 | 3.1E-173 | AT2G27150 | | 0 | B0LB00 | aldehyde oxidase 2 | 46.0 | 20.1 | 25.0 | 12.5 | 30.8 |
| 14964 | 1.8E-126 | AT5G45340 | CYP707A3 | 1.62E-160 | G7K418 | abscisic acid 8'-hydroxylase | 1.0 | 2.9 | 3.1 | 1.9 | 3.5 |
| GA metabolism | | | | | | | | | | | |
| 06029 | 3.26E-36 | AT2G30840 | 2OG-Fe(II) oxygenase superfamily | 1.81E-77 | G8A030 | gibberellin 20 oxidase 1 | 75.6 | 157.2 | 76.7 | 102.4 | 207.8 |
| 01813 | 4.18E-81 | AT3G19000 | 2OG-Fe(II) oxygenase superfamily | 4.50E-136 | G7JMJ1 | gibberellin 20 oxidase | 298.4 | 223.7 | 275.6 | 190.0 | 176.5 |
| 12856 | 6.43E-96 | AT5G07200 | gibberellin 20-oxidase3 | 0 | G7LE51 | gibberellin 20-oxidase | 556.0 | 381.2 | 387.9 | 504.7 | 489.1 |
| 20425 | 1.18E-44 | AT5G07200 | | 2.34E-64 | G7LE46 | gibberellin 20-oxidase | 6.5 | 4.0 | 4.2 | 6.8 | 5.1 |
| 14433 | 6.9E-139 | AT5G51810 | gibberellin 20-oxidase2 | 0 | Q9LD21 | gibberellin 20-oxidase | 18.8 | 4.7 | 20.7 | 25.1 | 12.6 |
| GA20ox* | | | | | | | 816.0 | 612.6 | 633.2 | 706.0 | 716.7 |
| 06761 | 3.55E-12 | AT2G36690 | 2OG-Fe(II) oxygenase superfamily | 6.90E-42 | G7KAK1 | gibberellin 3-beta-dioxygenase | 7.0 | 6.4 | 9.2 | 5.2 | 7.5 |
| 06762 | 2.99E-26 | AT5G24530 | 2OG-Fe(II) oxygenase superfamily | 2.09E-52 | G7KAK1 | gibberellin 3-beta-dioxygenase | 3.5 | 4.5 | 6.3 | 2.8 | 8.1 |
| 20887 | 6.71E-17 | AT1G52810 | 2OG-Fe(II) oxygenase superfamily | 2.13E-37 | G7JTE6 | gibberellin 2-beta-dioxygenase | 12.1 | 3.7 | 0.1 | 2.1 | 1.9 |
| 11963 | 3.78E-97 | AT1G78440 | gibberellin 2-oxidase1 | 4.18E-153 | I1M2Q0 | uncharacterized protein | 12.6 | 132.5 | 75.9 | 37.1 | 894.9 |
| Auxin metabolism | | | | | | | | | | | |
| 17438 | 1.93E-33 | AT3G51060 | STY1 | 8.70E-66 | G7KGA2 | short internode related sequence | 21.9 | 20.0 | 16.0 | 20.9 | 12.2 |
| 18479 | 8.39E-51 | AT4G31820 | NPY1 | 2.14E-116 | G7I878 | BTB/POZ domain-containing protein | 14.6 | 5.4 | 4.5 | 7.4 | 3.9 |
| 11824 | 5.8E-99 | AT1G48910 | YUC10 flavin-containing monooxygenase | 0 | I3SWC6 | uncharacterized protein | 34.2 | 34.0 | 38.7 | 44.0 | 53.1 |
| 12224 | 1.5E-114 | AT1G48910 | | 0 | I1JQG5 | uncharacterized protein | 33.2 | 2.5 | 3.3 | 2.0 | 24.3 |
| 18625 | 3.44E-70 | AT1G48910 | | 5.93E-92 | K7KM38 | uncharacterized protein | 10.8 | 8.3 | 8.7 | 8.1 | 7.6 |
| 20725 | 5.09E-17 | AT1G21430 | YUC11 flavin-binding monooxygenase | 1.30E-25 | I1KAH8 | uncharacterized protein | 11.0 | 6.7 | 7.9 | 10.4 | 6.5 |
| 18615 | 4.17E-48 | AT5G20960 | AAO1, aldehyde oxidase1 | 3.52E-114 | B0LB00 | Aldehyde oxidase2 | 19.4 | 9.5 | 13.7 | 5.9 | 18.3 |
| 02460 | 8.67E-43 | AT5G56660 | | 3.87E-61 | G7IRW2 | IAA-amino acid hydrolase ILR1-like protein | 1130.8 | 398.7 | 655.7 | 758.0 | 638.5 |
| 02455 | 1.4E-123 | AT1G51760 | IAR3, JR3 peptidase M20/M25/M40 family protein | 4.92E-137 | Q0GXX4 | Auxin conjugate hydrolase | 866.5 | 504.1 | 566.8 | 622.1 | 434.5 |
| 02459 | 2.2E-149 | AT1G51760 | | 0 | G7IHX1 | IAA-amino acid hydrolase ILR1-like protein | 937.7 | 353.8 | 537.3 | 607.2 | 534.0 |
| 02456 | 2.18E-13 | AT1G51780 | IAA-leucine resistant (ILR)-like gene 5, ILL5 | 1.41E-29 | B7FI00 | Uncharacterized protein | 423.8 | 249.2 | 263.0 | 282.2 | 201.0 |
| 00497 | 1.1E-167 | AT4G27260 | GH3.5, WES1 Auxin-responsive GH3 family protein | 0 | A2Q639 | GH3 auxin-responsive promoter | 2249.9 | 1909.9 | 2071.9 | 1874.1 | 2020.9 |
| 21119 | 1.32E-33 | AT4G27260 | | 2.58E-35 | A2Q639 | | 1462.9 | 1244.5 | 1033.9 | 1053.4 | 1036.1 |
| 00495 | 1.51E-35 | AT5G54510 | GH3.6, DFL1 Auxin-responsive GH3 family protein | 1.65E-45 | A2Q639 | | 1860.4 | 1551.0 | 1857.7 | 1644.4 | 1697.5 |
| 00499 | 5.04E-96 | AT5G54510 | | 6.17E-105 | A2Q639 | | 2414.0 | 2101.4 | 1921.3 | 1862.0 | 1964.0 |
| Amino acid transporters | | | | | | | | | | | |
| 11342 | 1.6E-147 | AT1G44800 | SILIQUES ARE RED 1 | 0 | I3T3M3 | uncharacterized protein | 585.0 | 950.2 | 413.1 | 355.9 | 1369.4 |
| 01124 | 1.39E-32 | AT1G58360 | amino acid permease 1 (AAP1) | 3.025E-34 | Q7Y077 | Amino acid permease 1 | 110.2 | 113.9 | 337.5 | 82.2 | 320.6 |
| 10762 | 6.7E-154 | AT1G58360 | | 0 | G7J3I7 | Amino acid permease | 170.6 | 94.0 | 138.6 | 177.3 | 126.5 |
| AAP1* | | | | | | | 217.8 | 142.8 | 283.0 | 212.5 | 263.8 |
| 12315 | 5.3E-164 | AT5G09220 | amino acid permease 2 (AAP2) | 0 | Q9ZR62 | Amino acid transporter | 9.6 | 10.3 | 10.1 | 15.6 | 21.5 |
| 09737 | 0 | AT5G09220 | | 0 | Q93X14 | Amino acid permease | 42.9 | 46.5 | 32.8 | 34.3 | 42.6 |
| AAP2* | | | | | | | 49.2 | 53.4 | 39.5 | 44.6 | 56.9 |
| 20713 | 3.81E-66 | AT1G77380 | amino acid permease 3 (AAP3) | 5.26E-91 | Q56H85 | Amino acid transporter | 5.0 | 3.5 | 7.7 | 5.0 | 6.5 |
| 11719 | 0 | AT5G49630 | amino acid permease 6 (AAP6) | 0 | G7I283 | Amino acid transporter | 27.7 | 33.9 | 16.9 | 27.0 | 165.7 |
| 01119 | 6.9E-145 | AT5G49630 | | 6.55E-152 | G7I283 | Amino acid transporter | 356.0 | 295.9 | 407.1 | 306.8 | 359.3 |
| 01120 | 2.77E-34 | AT5G49630 | | 4.223E-37 | M1A1P1 | Uncharacterized protein | 381.3 | 194.3 | 17.2 | 284.8 | 16.2 |
| AAP6* | | | | | | | 258.7 | 204.7 | 224.3 | 218.9 | 306.1 |
| 05520 | 7.36E-98 | AT5G23810 | amino acid permease 7 (AAP7) | 9.51E-134 | G7K4U9 | Amino acid permease | 4.2 | 18.4 | 28.8 | 27.3 | 31.9 |
| 05521 | 9.51E-15 | AT5G23810 | | 3.392E-21 | K7M4A3 | Uncharacterized protein | 0.6 | 1.5 | 5.5 | 3.8 | 4.4 |
| 05519 | 3.78E-62 | AT5G23810 | | 1.204E-94 | G7K4U9 | Amino acid permease | 4.8 | 13.0 | 25.7 | 21.6 | 26.5 |
| 11122 | 2E-136 | AT5G23810 | | 0 | K7MQE6 | Uncharacterized protein | 15.9 | 33.5 | 7.5 | 12.8 | 10.2 |
| 12799 | 2.7E-110 | AT5G23810 | | 0 | G7K4V0 | Amino acid permease | 75.0 | 60.8 | 42.9 | 50.7 | 37.9 |
| AAP7* | | | | | | | 79.7 | 96.3 | 66.8 | 75.6 | 67.0 |
| 09732 | 0 | AT1G58030 | cationic amino acid transporter 2 (CAT2) | 0 | G7K4U9 | Amino acid/polyamine transporter I | 14.1 | 18.7 | 17.6 | 15.5 | 37.2 |
| 09535 | 0 | AT1G58030 | | 0 | I1K1U1 | Uncharacterized protein | 25.5 | 20.7 | 42.9 | 43.1 | 37.1 |
| 20891 | 3.1E-05 | AT1G58030 | | 1.224E-06 | I1K1U1 | | 0.6 | 3.2 | 0.9 | 4.8 | 0.8 |
| 05293 | 2.13E-61 | AT1G58030 | | 5.423E-74 | G7L1J3 | CCP (Amino acid/polyamine transporter I) | 1.7 | 5.7 | 5.2 | 4.5 | 34.7 |
| 18986 | 7.64E-52 | AT1G58030 | | 1.245E-76 | G7L1J3 | CCP (Amino acid/polyamine transporter I) | 3.5 | 6.3 | 6.0 | 5.0 | 40.7 |
| CAT2* | | | | | | | 40.3 | 41.9 | 62.5 | 61.0 | 90.7 |
| 10500 | 0 | AT5G41800 | Transmembrane amino acid transporter family protein | 0 | I1K6A7 | Uncharacterized protein | 49.5 | 40.4 | 39.1 | 26.1 | 39.4 |
| 02489 | 6.42E-82 | AT3G30390 | Transmembrane amino acid transporter family protein | 1.809E-97 | K7MWE5 | Uncharacterized protein | 17.9 | 20.3 | 21.0 | 21.8 | 43.8 |
| 09906 | 9.1E-180 | AT3G30390 | | 0 | G7KS18 | Amino acid transporter | 75.8 | 63.7 | 74.1 | 74.6 | 98.8 |
| 02490 | 1.25E-61 | AT3G30390 | | 1.504E-72 | I1KWM4 | Uncharacterized protein | 20.5 | 26.9 | 20.9 | 17.6 | 21.6 |
| 12633 | 1.4E-153 | AT3G30390 | | 0 | G7LG76 | Sodium-coupled neutral amino acid transporter | 27.4 | 20.9 | 16.6 | 20.2 | 21.8 |
| AT3G30390* | | | | | | | 109.5 | 96.4 | 102.3 | 104.6 | 142.0 |
| 11128 | 1.4E-150 | AT3G56200 | Transmembrane amino acid transporter family protein | 0 | G7K7B8 | Sodium-coupled neutral amino acid transporter | 30.7 | 54.4 | 57.7 | 46.7 | 121.7 |
| 05187 | 4.67E-84 | AT3G56200 | | 1.92E-131 | G7J2G3 | Sodium-coupled neutral amino acid transporter | 39.1 | 29.6 | 19.2 | 25.5 | 38.0 |
| 05186 | 4.46E-15 | AT3G56200 | | 1.194E-26 | K4N2J8 | Amino acid transporter protein | 34.4 | 9.6 | 11.4 | 21.7 | 17.1 |
| 17996 | 5.52E-42 | AT3G56200 | | 2.916E-56 | G7J2G4 | Sodium-coupled neutral amino acid transporter | 48.2 | 10.0 | 10.8 | 23.7 | 10.1 |
| AT3G56200* | | | | | | | 87.3 | 83.5 | 79.2 | 80.2 | 158.8 |
| 02492 | 4.2E-52 | AT5G38820 | Transmembrane amino acid transporter family protein | 4.591E-71 | K7MWE5 | Uncharacterized protein | 11.9 | 18.1 | 19.4 | 16.9 | 39.1 |
| 02493 | 8.26E-16 | AT5G38820 | | 1.255E-17 | K7MWE5 | | 4.6 | 4.9 | 7.1 | 5.9 | 6.0 |
| 15754 | 9.12E-26 | AT5G38820 | | 5.844E-28 | I3SRL7 | Uncharacterized protein | 25.8 | 28.2 | 20.7 | 19.8 | 20.5 |
| AT5G38820* | | | | | | | 34.3 | 40.3 | 34.3 | 31.5 | 45.0 |
| 14657 | 3.7E-135 | AT5G40780 | lysine histidine transporter 1 (LHT1) | 1.38E-142 | I1N7F7 | Uncharacterized protein | 2.3 | 0.1 | 4.5 | 7.3 | 5.0 |
| 11612 | 0 | AT5G40780 | | 0 | K7MFH4 | Uncharacterized protein | 89.3 | 67.3 | 72.2 | 79.2 | 44.5 |
| LHT1* | | | | | | | 90.9 | 67.4 | 75.2 | 84.1 | 47.8 |
| 13031 | 0 | AT5G64410 | oligopeptide transporter 4 (OPT4) | 0 | A7UQT0 | Oligopeptide transporter | 20.9 | 21.9 | 10.9 | 18.3 | 15.2 |
| 18627 | 1.97E-57 | AT5G64410 | | 5.18E-68 | I1ME24 | Uncharacterized protein | 23.1 | 19.6 | 8.9 | 17.3 | 14.9 |
| OPT4* | | | | | | | 32.6 | 31.9 | 15.4 | 27.1 | 22.7 |
| 16763 | 1.38E-96 | AT5G53510 | oligopeptide transporter 9 (OPT9) | 3.12E-97 | I1KAW1 | Uncharacterized protein | 14.0 | 14.6 | 15.2 | 8.6 | 12.4 |
| 19771 | 1.63E-23 | AT5G53520 | oligopeptide transporter 8 (OPT8) | 1.87E-26 | K7KLV6 | Uncharacterized protein | 13.8 | 13.2 | 11.8 | 9.6 | 9.1 |
| Amino acid metabolism | | | | | | | | | | | |
| 05012 | 7.8E-134 | AT3G16150 | N-terminal nucleophile aminohydrolases superfamily protein | 4.84E-156 | Q2HTR7 | L-asparaginase | 1639.2 | 2726.2 | 2329.7 | 2901.2 | 1525.8 |
| 05013 | 6.18E-45 | AT3G16150 | | 2.93E-47 | B4UWB5 | L-asparaginase | 3.2 | 2243.2 | 2175.6 | 2662.7 | 1449.6 |
| 05014 | 5.33E-46 | AT3G16150 | | 2.46E-48 | B4UWB5 | | 2142.7 | 1.8 | 1.9 | 1.4 | 1.0 |
| L-asparaginase* | | | | | | | 2290.0 | 3840.8 | 3410.8 | 4224.0 | 2246.1 |
| 09770 | 0 | AT4G29840 | pyridoxal-5'-phosphate-dependent enzyme family protein | 0 | L0EK99 | threonine synthase | 332.6 | 373.4 | 384.0 | 265.6 | 490.3 |
| 00414 | 1.9E-143 | AT1G08630 | THA1 threonine aldolase 1 | 3.85E-166 | I3SIF2 | uncharacterized protein | 652.8 | 487.9 | 507.6 | 721.3 | 881.3 |
| 11394 | 0 | AT5G11520 | aspartate aminotransferase 3 | 0 | I3SJC8 | aspartate aminotransferase | 343.7 | 341.8 | 282.8 | 421.3 | 273.3 |
| 11508 | 0 | AT4G31990 | aspartate aminotransferase 5 | 0 | Q40325 | aspartate aminotransferase | 270.8 | 305.6 | 316.8 | 276.7 | 266.6 |
| 15396 | 2.5E-101 | AT2G03667 | asparagine synthase family protein | 6.64E-177 | G7JHJ3 | asparagine synthetase domain-containing protein | 5.3 | 5.8 | 4.8 | 5.1 | 4.1 |
| 15427 | 2.7E-101 | AT4G27450 | aluminium induced protein with YGL and LRDR motifs | 3.34E-146 | G7IP18 | asparagine synthetase | 8.3 | 8.7 | 17.5 | 13.3 | 197.6 |
| 08742 | 0 | AT3G47340 | glutamine-dependent asparagine synthase 1 | 0 | P93618 | asparagine synthetase | 9.6 | 3.9 | 4.6 | 5.0 | 183.7 |
| 18012 | 8.27E-23 | AT3G47340 | | 1.50E-59 | G7K282 | asparagine amidase A | 11.3 | 9.6 | 10.2 | 8.1 | 8.9 |
| 16732 | 1.22E-87 | AT3G47340 | | 9.72E-110 | G7JZK0 | asparagine synthetase | 0.2 | 0.4 | 1.2 | 0.4 | 16.6 |
| glutamine-dependent asparagine synthase 1* | | | | | | | 12.1 | 6.1 | 7.1 | 6.8 | 189.9 |
| 10544 | 0 | AT5G10240 | asparagine synthetase 3 | 0 | G7J3R9 | asparagine synthetase | 62.7 | 59.8 | 53.1 | 41.2 | 52.6 |
| 09466 | 0 | AT5G49570 | peptide-N-glycanase 1 | 0 | G7L1U8 | asparagine amidase | 32.7 | 30.8 | 35.1 | 32.4 | 30.7 |
| Polysaccharide metabolism | | | | | | | | | | | |
| 10278 | 0 | AT3G13790 | Glycosyl hydrolases family 32 protein | 0 | Q43856 | Cell wall invertase II; beta-furanofructosidase | 53.1 | 103.7 | 131.3 | 149.1 | 147.7 |
| 09850 | 0 | AT4G09510 | cytosolic invertase 2 | 0 | G7IAG4 | Neutral invertase-like protein | 81.5 | 71.9 | 78.5 | 76.2 | 60.0 |
| 11144 | 0 | AT2G36190 | cell wall invertase 4 | 0 | Q43855 | Beta-fructofuranosidase; cell wall invertase I | 46.6 | 69.0 | 30.9 | 35.2 | 36.5 |
| 09138 | 0 | AT5G22510 | A/N-InvE alkaline/neutral invertase | 0 | G7LBD0 | Alkaline/neutral invertase | 23.3 | 20.5 | 22.6 | 25.0 | 20.2 |
| 04916 | 0 | AT3G06500 | Plant neutral invertase family protein | 0 | I1KMM2 | Uncharacterized protein | 11.2 | 7.4 | 11.0 | 8.7 | 14.2 |
| 04917 | 4.41E-52 | AT3G05820 | A/N-InvH invertase H | 3.19E-58 | I1KMM2 | Uncharacterized protein | 10.0 | 7.5 | 9.4 | 9.6 | 10.2 |
| 12669 | 0 | AT1G56560 | Plant neutral invertase family protein | 0 | G7I9I6 | Neutral invertase | 9.0 | 10.3 | 10.0 | 8.2 | 9.2 |
| 02029 | 0 | AT5G20830 | sucrose synthase 1 | 0 | I1SUZ1 | sucrose synthase | 1842.2 | 2182.9 | 1077.3 | 1360.3 | 828.3 |
| 08973 | 0 | AT4G02280 | sucrose synthase 3 | 0 | K7KGC7 | sucrose synthase | 18.0 | 22.3 | 10.3 | 15.0 | 154.2 |
| 04058 | 0 | AT3G43190 | sucrose synthase 4 | 0 | Q9AVR8 | sucrose synthase | 721.7 | 651.1 | 424.3 | 370.0 | 373.5 |
| 04059 | 2.44E-23 | AT3G43190 | | 2.43E-28 | Q9XG55 | sucrose synthase | 571.1 | 461.0 | 243.3 | 265.7 | 230.0 |
| 04060 | 1.13E-21 | AT3G43190 | | 1.11E-24 | I1L1U2 | Sucrose synthase | 134.3 | 124.4 | 67.7 | 72.6 | 70.5 |
| sucrose synthase 4* | | | | | | | 863.0 | 767.1 | 485.8 | 437.0 | 432.4 |
| 19967 | 2.98E-29 | AT5G37180 | sucrose synthase 5 | 8.01E-41 | G7KFT7 | Sucrose synthase | 4.9 | 6.6 | 2.5 | 6.1 | 6.6 |
| 16398 | 1.45E-69 | AT1G73370 | sucrose synthase 6 | 1.51E-104 | G7KM39 | Sucrose synthase | 8.5 | 9.7 | 5.0 | 10.4 | 9.0 |
| 04254 | 0 | AT1G73370 | | 0 | G7J800 | Sucrose synthase | 5.8 | 8.3 | 2.9 | 8.4 | 8.6 |
| sucrose synthase 6* | | | | | | | 11.5 | 14.7 | 6.2 | 15.3 | 14.5 |
| 16258 | 6.03E-56 | AT5G62350 | Plant invertase/pectin methylesterase inhibitor superfamily protein | 3.72E-99 | G7IYJ3 | 21 kDa protein | 250.0 | 173.1 | 242.4 | 291.2 | 375.5 |
| 16364 | 4.31E-55 | AT5G62350 | | 7.94E-90 | G7IYJ3 | | 58.8 | 68.5 | 1188.2 | 858.8 | 1273.2 |
| PMEI* | | | | | | | 308.0 | 240.6 | 1415.2 | 1138.9 | 1632.1 |
