Molecular Mapping and Characterization of Phenylpropanoid Pathway Genes in Common Bean (Phaseolus vulgaris L.)

By

Zeinab Yadegari

A Thesis Presented to The University of Guelph

In partial fulfillment of requirements for the degree of Doctor of Philosophy in Agriculture

Guelph, Ontario, Canada © Zeinab Yadegari, June 2013

ABSTRACT

MOLECULAR MAPPING AND CHARACTERIZATION OF PHENYLPROPANOID PATHWAY GENES IN COMMON BEAN (PHASEOLUS VULGARIS L.)

Zeinab Yadegari, University of Guelph, 2013

Advisor: Professor K. P. Pauls

Common bean is a nutritionally and economically important food crop and a major source of dietary protein in many developing countries throughout the world. Seed coat colour and size in this crop are the main factors determining its marketability in different parts of the world. Flavonoid compounds that are responsible for seed coat colour in Phaseolus vulgaris have been shown to have anti-oxidant, anti-proliferative, anti-tumor, anti-inflammatory, and pro-apoptotic activities. They may also enhance the resistance of beans to pests and diseases. A better understanding of the relationships between seed coat colour and flavonoid metabolism in the seed coat may help breeders to select for more nutritionally beneficial bean varieties. The objective of this research was to test the hypothesis that the genes determining colour in beans are structural and regulatory genes of the phenylpropanoid pathway.

The map positions of phenylpropanoid genes were determined in two recombinant inbred line (RIL) populations. Segregation patterns of 18 phenylpropanoid pathway genes in the BAT93 × Jalo EEP 558 RIL population and five phenylpropanoid pathway genes in the OAC Rex × SVM Taylor population were used to place them on the linkage maps for these populations. Five of the 18 genes were mapped within 2-17 cM of colour gene loci in the BAT93 × Jalo EEP 558 RIL population.

The sequences of central genes of the phenylpropanoid pathway were determined by sequencing 6 BAC clones selected with probes for two PAL genes, two CHS genes, DFR, and Myb. The functional annotations of the BAC clones were determined and the similarities between bean phenylpropanoid genes and their corresponding orthologs in other plant species were investigated.

A recently developed approach of whole genome sequence comparison was utilized to compare the microsynteny of the sequenced BAC clones with regions of the soybean genome. The physical locations of BAC clones were verified on the bean genome and their counterpart locations on the soybean genome were confirmed. The results agreed with previous studies that indicated that bean genome segments have two homologous segments in soybean and confirmed the high degree of microsynteny that is shared between bean and soybean.

ACKNOWLEDGEMENTS

I would like to express my deep and sincere gratitude to my advisor Dr. Peter Pauls for his excellent support and guidance throughout this project. I am thankful for having the opportunity to work within his group.

Special thanks to my advisory committee, Dr. Judith Strommer, Dr. Istvan Rajcan and Dr. Annette Nassuth for their valuable input during the course of this research, its writing and defense.

I am deeply grateful to Dr. Matthew Blair, the external examiner of this thesis, for his thorough review and constructive comments. My special thanks extend to Dr. Alireza Navabi for his great comments on this work.

I would like to acknowledge the help of members of Peter’s lab, especially Jan Brazolot, Dr. Yarmilla Reinprecht and Dr. Loo-Sar Chia. Special thanks go to Tom Smith for his help with greenhouse work.

I am grateful for the love and support of my family, especially my husband Ali, who helped me a lot during the bioinformatics analysis of the data. My sincere thanks also go to my parents for their encouragement and support throughout my life.


TABLE OF CONTENTS

I. INTRODUCTION
  The bean crop in the world and Ontario
  Centres of origin for P. vulgaris

II. LITERATURE REVIEW
  Nutritional importance of dry bean
  The genetics of seed coat colour in bean
  Association between colour genes and the phenylpropanoid pathway in common bean
  Phenylpropanoid pathway and colour
  Phenylalanine ammonia-lyase (PAL)
  Chalcone synthase (CHS)
  Dihydroflavonol 4-reductase (DFR)
  Myb Transcription factor
  Phenylpropanoid compounds and disease resistance in common bean
  Linkage mapping in common bean
  Physical map in common bean
  Genomics and bioinformatics in Legumes
  Genomic features of common bean
  Available Common Bean Genomic Resources
  Thesis hypothesis and objectives

III. MAPPING OF PHENYLPROPANOID PATHWAY GENES IN COMMON BEAN (PHASEOLUS VULGARIS L.)
  Abstract
  Introduction
  Materials and methods
    Plant materials
    DNA extraction, digestion, and electrophoresis
    Southern blotting, hybridization and autoradiography
    SSR and SNP genotyping in the OAC population
    Statistical and linkage analysis
  Results
    Marker polymorphism and segregation
      Core population
      OAC Rex × SVM Taylor RI population
    Construction of the linkage map
      Core population
      OAC Rex × SVM Taylor RI population
    Integrated genetic map and genome sequence
  Discussion
    Phenylpropanoid genes in core population
    Cosegregation of DFR1 and flower colour in bean

IV. MOLECULAR CHARACTERIZATION OF PHENYLPROPANOID GENES IN COMMON BEAN (PHASEOLUS VULGARIS L.)
  Abstract
  Introduction
  Materials and Methods
    Extraction of total RNA
    Amplification of partial genomic sequence
    PCR products detection and sequence analysis
    RT-PCR characterization of PvCHS-A, PvCHS-B and PvDFR expression in seed coat of P. vulgaris
    BAC library screening
    DIG Probe Synthesis
    Analysis of BAC clones
    Colony Selection and PCR confirmation
    Plasmid Purification and quantification for DNA Sequencing
    Sequence assembly and annotation
    Sequence comparison and phylogenetic analysis
    Structure prediction of sequenced genes
  Results
    Expression of PvCHS-A and PvCHS-B genes in the seed coat of common bean
    454-Sequencing of PV-GBa 0083H05 Clone (PvCHS-A)
    Functional annotation of the BAC clone PV-GBa 0083H05
    454-Sequencing of PV-GBa 0005G03 Clone (PvCHS-B)
    Functional annotation of the BAC clone PV-GBa 0005G03
    PvCHS-A and PvCHS-B sequence analysis
    PvCHS protein analysis
    Three-Dimensional Model Analysis
    In silico amino acid analysis of PvCHS-B gene family
    Expression of PvDFR gene in the seed coat of common bean
    454-Sequencing of PV-GBa 0072I22 Clone (PvDFR3)
    Functional annotation of the BAC clone PV-GBa 0072I22
    PvDFR3 and PvDFR1 nucleotide alignment
    PvDFR3 protein analysis
    Three-Dimensional Model Analysis
    454-Sequencing of PV-GBa 0043K12 Clone (PvMyb15 transcription factor)
    Functional annotation of the BAC clone PV-GBa 0043K12
    PvMyb15 sequence analysis
    Three-Dimensional Model Analysis
    In silico upstream cis-acting elements of PvMyb15
    454-Sequencing of PV-GBa 0079P22 Clone (PvPAL2)
    Functional annotation of the BAC clone PV-GBa 0079P22
    PvPAL2 sequence analysis
    Three-Dimensional Model Analysis
    454-Sequencing of PV-GBa 0061D18 Clone (PvPAL3)
    Functional annotation of the BAC clone PV-GBa 0061D18
    PvPAL3 sequence analysis
    Three-Dimensional Model Analysis
  Discussion
    PvCHS-A and PvCHS-B
    PvDFR gene
    PvMyb15 transcription factor
    PvPAL2 and PvPAL3

V. STRUCTURAL CHARACTERIZATION OF SIX BAC CLONE SEQUENCES IN COMMON BEAN AND THEIR SYNTENIC RELATIONSHIP WITH SOYBEAN
  Abstract
  Introduction
  Materials and methods
    BAC isolation and DNA preparation
    Analysis of microsynteny
  Results
    Comparison of orthologous regions of common bean PV-GBa 0005G03 BAC clone and soybean
    Comparison of orthologous regions of common bean PV-GBa 0083H05 BAC clone and soybean
    Molecular analysis of chalcone synthase gene in bean and soybean
    Comparison of orthologous regions of common bean PV-GBa 0072I22 BAC clone and soybean
    Comparison of orthologous regions of common bean PV-GBa 0043K12 BAC clone and soybean
    Comparison of orthologous regions of common bean PV-GBa 0079P22 BAC clone and soybean
    Comparison of orthologous regions of common bean PV-GBa 0061D18 BAC clone and soybean
    Molecular analysis of Phenylalanine ammonia-lyase genes in bean and soybean
  Discussion

VI. CONCLUSION AND FINAL REMARKS

VII. REFERENCES

IX. APPENDICES
  Appendix 1
  Appendix 2

LIST OF TABLES

Table 1. Seed coat colour and pattern genes and their effects on the seed coat of common bean based on the core map linkage group designation (Adapted from Bassett, 2007)
Table 2. Primers for genes that were mapped as PCR markers
Table 3. Polymorphisms of the phenylpropanoid genes in the Core population
Table 4. Chi-square test of segregation ratios for phenylpropanoid gene and linkage map position in the BJ population
Table 5. Polymorphisms for genotyping phenylpropanoid genes in OAC population
Table 6. Chi-square test of segregation ratios for phenylpropanoid gene
Table 7. Cosegregation of flower colour with DFR1
Table 8. Single marker analysis of DFR1 and flower colour in OAC Rex × SVM Taylor population
Table 9. Primer pairs used to amplify PvCHS-A, PvCHS-B, PvDFR, PvMyb, PvPAL2 and PvPAL3
Table 10. Next generation sequencing of PV-GBa 0083H05 clone
Table 11. Predicted genes in the BAC clone PV-GBa 0083H05 and their putative function
Table 12. Next generation sequencing of PV-GBa 0005G03 clone
Table 13. Predicted genes in the BAC clone PV-GBa 0005G03 and their putative identities
Table 14. Next generation sequencing of PV-GBa 0072I22 clone
Table 15. Predicted genes in the BAC clone PV-GBa 0072I22 and their putative function
Table 16. Next generation sequencing of PV-GBa 0043K12 clone
Table 17. Predicted genes in the BAC clone PV-GBa 0043K12 and their putative function
Table 18. Potential cis-acting elements associated with the PvMYB15 gene 1500 bp upstream of the gene start codon
Table 19. Next generation sequencing of PV-GBa 0079P22 clone
Table 20. Predicted genes in the BAC clone PV-GBa 0079P22 and their putative function
Table 21. Next generation sequencing of PV-GBa 0061D18 clone
Table 22. Predicted genes in the BAC clone PV-GBa 0061D18 and their putative function
Table 23. Annotations and positions of genes in PV-GBa 0005G03 BAC clone of common bean and its syntenic regions in soybean
Table 24. Annotations and positions of genes in PV-GBa 0083H05 BAC clone of common bean and its syntenic regions in soybean
Table 25. Annotation and position of genes in PV-GBa 0072I22 BAC clone of common bean and its syntenic region in soybean
Table 26. Annotation and position of genes in PV-GBa 0043K12 BAC clone of common bean and its syntenic region in soybean
Table 27. Annotation and position of genes in PV-GBa 0079P22 BAC clone of common bean and its syntenic region in soybean
Table 28. Annotation and position of genes in PV-GBa 0061D18 BAC clone of common bean and its syntenic region in soybean

LIST OF FIGURES

Figure 1. Summary of the work of Beninger and Hosfield showing the phenylpropanoid pathway leading to the flavonoid pigments found in seed coats of P. vulgaris
Figure 2. Scheme of branch pathways of phenylpropanoid metabolism in plants leading to the synthesis of anthocyanins, flavonols, PAs, and lignin
Figure 3. General scheme of the flavonoid pathway
Figure 4. PAL and TAL conversions, showing substrates and products
Figure 5. Typical reactions catalyzed by plant Chalcone synthase
Figure 6. Sequence and related features of the bean CHS15 promoter
Figure 7. Bean CHS15 promoter and regulators
Figure 8. A schematic diagram showing the reaction catalyzed by dihydroflavonol 4-reductase
Figure 9. Syntenic relationships of soybean relative to the common bean genetic map
Figure 10. Five phenylpropanoid genes were mapped by PCR in the core RIL population
Figure 11. 13 phenylpropanoid genes were mapped as RFLP markers in the core RIL population
Figure 12. Five phenylpropanoid genes were mapped by PCR in OAC Rex × SVM Taylor RIL population
Figure 13. Core Linkage map of the common bean recombinant inbred population
Figure 14. ‘OAC Rex’ × ‘SVM Taylor’ linkage map of the common bean recombinant inbred population
Figure 15. Integrated genetic map and sequence assembly
Figure 16. Chalcone synthase (CHS) and chalcone isomerase (CHI) expression in seed coats of bean
Figure 17. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0083H05 BAC clone
Figure 18. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0005G03 BAC clone
Figure 19. DNA sequence comparison of PvCHS-A and PvCHS-B
Figure 20. Multi-alignment of DNA sequences for individual PvCHS-B sequences in the PV-GBa 0005G03 BAC clone
Figure 21. Alignment tree of coding sequences of PvCHS-B multigene family, PvCHS-A and previously known CHS genes of common bean
Figure 22. Multi-alignment of deduced amino acid sequences of PvCHS-B and other leguminosae CHSs
Figure 23. The three-dimensional structure of the predicted protein encoded by the PvCHS-B-1 gene
Figure 24. Phylogenetic tree of deduced amino acid sequences of PvCHS-B multigene family
Figure 25. Multi-alignment of deduced amino acid sequences of PvCHS-B family members
Figure 26. Dihydroflavonol-4-reductase (PvDFR) expression in seed coats of bean
Figure 27. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0072I22 BAC clone containing the PvDFR3 gene
Figure 28. Genomic DNA sequence comparison of PvDFR1, PvDFR2, PvDFR3 and PvDFR4 of P. vulgaris
Figure 29. Alignment of deduced amino acid sequences for DFR genes of common bean
Figure 30. A Neighbor-joining phylogenetic tree of dihydroflavonol 4-reductase (DFR) amino acids sequences
Figure 31. Deduced amino acid sequence of PvDFR3
Figure 32. Three dimensional structure of PvDFR3
Figure 33. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0043K12 BAC clone
Figure 34. Phylogenetic relationship of few R2R3-MYB subgroups MYB transcription factors of Arabidopsis and anthocyanin-related of other plants with bean MYB15
Figure 35. Comparison of predicted PvMyb15 protein sequence of P. vulgaris with homolog MYB proteins of other species
Figure 36. Three dimensional structure of PvMyb15 using phyre2 server
Figure 37. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0079P22 BAC clone
Figure 38. Deduced amino acid sequence of PvPAL2
Figure 39. The three-dimensional structure of the predicted protein of the PvPAL2 gene
Figure 40. Schematic of the distribution of putative genes identified in P. vulgaris PV-GBa 0061D18 BAC clone
Figure 41. Deduced amino acid sequence of PvPAL3
Figure 42. The three-dimensional structure of the predicted protein of the PvPAL3 gene
Figure 43. Genomic DNA sequence comparison of PvPAL2 and PvPAL3
Figure 44. Comparison of amino acid sequences of PvPAL2 and PvPAL3 with PALs from soybean and Arabidopsis thaliana PAL3
Figure 45. A Neighbor-joining phylogenetic tree of Phenylalanine ammonia-lyase (PAL) amino acids sequences
Figure 46. Overview of shared microsynteny between common bean BAC PV-GBa 0005G03
Figure 47. Overview of shared microsynteny between common bean BAC PV-GBa 0083H05
Figure 48. Overview of shared microsynteny between common bean BAC PV-GBa 0072I22
Figure 49. Overview of shared microsynteny between common bean BAC PV-GBa 0043K12
Figure 50. Overview of shared microsynteny between common bean BAC PV-GBa 0079P22
Figure 51. Overview of shared microsynteny between common bean BAC PV-GBa 0061D18

LIST OF ABBREVIATIONS

4CL1, 4-coumarate CoA ligase

AS1, Glutathione S-transferase

CAD, Cinnamyl alcohol dehydrogenase

CCR, Cinnamyl CoA reductase

CHI, Chalcone isomerase

CHR, Chalcone reductase

CHS, Chalcone synthase

DFR, Dihydroflavonol 4-reductase

F3’H, Flavonoid 3’ hydroxylase

F5H, Ferulate 5-hydroxylase

FBP4, Lignin peroxidase

IFR, Isoflavanone reductase

IFS, Isoflavanone synthase

IOMT, 7-O-methyltransferase

LAC, Laccase

LAR, Leucoanthocyanidin reductase

Myb, Myb transcription factor

PAL, Phenylalanine ammonia lyase

RT, Rhamnosyl transferase

VT, Vacuolar transporter


I. Introduction

The bean crop in the world and Ontario

Common bean (Phaseolus vulgaris L.) is a nutritionally and economically important food crop grown around the world. It is a major source of dietary protein in Latin America and a source of income for specialty crop farmers in Canada and other countries. Brazil, India, Myanmar, China and USA are the top producers of dry beans in the world (FAOSTAT, 2011).

Canada produced 162 thousand tonnes of dry beans in 2011-12 (AAFC, 2013), with an average price of $970-1000 per tonne. Eighteen thousand tonnes of this production were used domestically, whereas 144 thousand tonnes were exported. Ontario, Manitoba and Alberta are the major dry bean producing regions (AAFC, 2011). Ontario produces approximately 60% of Canada's dry beans. In 2010, bean exports were worth $236 million, of which $130 million came from Ontario production (AAFC, 2011).

Centres of origin for P. vulgaris

P. vulgaris (common bean) is the most important member of the Phaseolus genus. Common beans were domesticated in two distinct geographical areas, Middle America and the southern Andes, which resulted in two gene pools. The Middle American region encompasses southern Mexico and northern Central America, and the Andean region includes southern Peru, Bolivia and Argentina (Gepts, 1988). Several lines of evidence, including anthropological, molecular, proteomic, metabolomic and transcriptomic data, support the existence of different gene pools (Gepts, 1988; Mensack et al. 2010). DNA sequence analysis of different genes provides clear evidence of a Mesoamerican origin of P. vulgaris, most likely in Mexico. From there, different migration events extended the distribution of P. vulgaris into South America (Bitocchi et al. 2012). Differences in molecular markers, morphology, reproductive isolation, and geographical and ecological adaptations support the idea that divergence of the wild ancestors occurred before domestication of the Middle American and Andean landraces (Singh, 1991).

The first gene pool, known as the Mesoamerican gene pool, contains small- and medium-seeded beans, whereas the Andean gene pool contains mainly large-seeded beans (Gepts, 1988). The Middle American landrace can be further subdivided into three races: Mesoamerica, Guatemala, and Durango and Jalisco (the latter two grouped together). The Andean landrace can likewise be subdivided into three races: Chile, Nueva Granada and Peru (Singh et al. 1991; Beebe et al. 2000, 2001; Blair et al. 2009).

Among the North American market classes of dry beans, the small-seeded navy, small red and black beans belong to race Mesoamerica, while the medium-seeded pinto, great northern, pink, and red Mexican beans belong to the Durango race of the Mesoamerican gene pool. Moreover, the large-seeded kidney and cranberry beans belong to the race Nueva Granada and the cranberry beans belong to the race Chile of the Andean gene pool (Singh, 1991). These ecogeographical races can be distinguished from one another using molecular markers, as well as morphological and physical characteristics (Singh, 1991).


II. Literature review

Nutritional importance of dry bean

Common bean is a major source of calories and protein in many developing countries throughout the world. In South and Central America as well as Eastern and Central Africa, dry bean is the main source of dietary protein (Reyes-Moreno and Paredes-Lopez, 1993). Common beans are also good sources of vitamins (thiamine, riboflavin, niacin, vitamin B6) and minerals such as Ca, Fe, Cu, Zn, P, K, and Mg. They are excellent sources of complex carbohydrates and polyunsaturated free fatty acids (linoleic, linolenic) (Reyes-Moreno and Paredes-Lopez, 1993).

Legumes and cereals are nutritionally complementary, since legumes are a rich source of lysine but are low in S-containing amino acids, whereas cereals contain more methionine (Gepts and Bliss, 1985).

Apart from being an important source of protein, beans have many therapeutic benefits. Flavonoids that are present in beans have been shown to have anti-oxidant, anti-cell proliferation, anti-tumor, anti-inflammatory, and pro-apoptotic activities (Williams et al. 2004; Taylor and Grotewold 2005; Singh et al. 2008). These compounds interact with key enzymes, as well as signaling cascades involving cytokines and transcription factors, or antioxidant systems with health-promoting effects (Polya, 2003). On the other hand, some flavonoids may cause beans to darken in colour upon aging and also make them hard to cook and digest (Aw and Swanson, 1985).

Beans also contain antinutritional factors that interfere with the biological utilization of nutrients. These factors include oligosaccharides (raffinose, stachyose and verbascose), tannins, saponins, phytic acid and lectins. Oligosaccharides in bean have limited digestibility and are responsible for gastrointestinal discomfort in some individuals (Price et al. 1988). Saponins, which are primarily anti-predation compounds in plants, cause the lysis of erythrocytes and make the intestinal mucous membrane permeable (Khalil and El-Adawy, 1994). Phytic acid binds to proteases and amylases in the digestive tract, which can interfere with protein and carbohydrate digestion. It can also block the absorption of minerals such as zinc, calcium and iron (Tabekhia and Luh, 1980). Lectins are sugar-binding proteins and have the strongest antinutritional effect. They can cause bloating, nausea and diarrhea (Liener, 1982). Hernandez-Infante et al. (1979) found that the digestibility of bean protein was lowest for black-seeded beans, followed by red beans, and highest for white beans.

The genetics of seed coat colour in bean

Considerable variability exists in common bean (P. vulgaris L.) for seed characteristics, and consumers have specific preferences for various combinations of size, shape, and colour of the dry bean seeds. Seed coat colour and seed size are the two main characteristics that identify the numerous market classes recognized throughout the world (Beninger et al. 1998a).

Genetic analyses have identified specific genes that control seed coat pattern (T, Z, L, J, Bip, and Ana) and colour (P, C, R, J, D, G, B, V, and Rk) (Prakken, 1970, 1972). In Phaseolus, C, D, and J are the colour genes, whereas G, B, V and Rk are modifying genes that have intensifying or darkening effects upon pale colours (Prakken, 1970). Many of these genes exhibit epistatic interactions with other genes, which define the many seed coat patterns and colours observed within the species (Beninger et al. 1998a). Table 1 shows the phenotype caused by each colour gene and the linkage group on which it is located.


Table 1. Seed coat colour and pattern genes and their effects on the seed coat of common bean based on the core map linkage group designation (adapted from Bassett, 2007).

Colour and pattern genes of common bean seed coat

Map linkage group   Symbol   Trait name
B1                           (none found)
B2                  B        Brown seed coat
B2                  Rk       Red kidney colours (recessive red)
B3                  Z        Hilum ring colour or partly coloured seed coat
B4                  G        Yellow seed coat
B5                           (none found)
B6                  V        Violet (to blue or black) seed coat
B7                  P        Seed coat colour ground factor
B8                  Gy       Greenish yellow seed coat
B8                  R        Oxblood red seed coat (dominant red)
B8                  C        Seed coat pattern (vs. cartridge buff)
B8                  Prp      Pod colours (purple to red) and patterns
B9                  T        Partly coloured seed coat (ground factor)
B10                 Bip      Partly coloured seed coat (bipunctata)
B10                 J        Immature seed coat colours (and afterdarkening) or partly coloured seed coats
B11                          (none found)
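The linkage-group assignments in Table 1 can be held in a small lookup structure, which makes it easy to ask which colour and pattern genes share a linkage group. The following is a minimal sketch, assuming each table row is read as (trait, symbol, linkage group); the names `SEED_COAT_GENES` and `genes_by_linkage_group` are illustrative and are not part of the thesis.

```python
from collections import defaultdict

# Gene symbol -> core-map linkage group, transcribed from Table 1
# (adapted from Bassett, 2007). Groups B1, B5 and B11 have no entries.
SEED_COAT_GENES = {
    "B": "B2", "Rk": "B2", "Z": "B3", "G": "B4", "V": "B6", "P": "B7",
    "Gy": "B8", "R": "B8", "C": "B8", "Prp": "B8", "T": "B9",
    "Bip": "B10", "J": "B10",
}


def genes_by_linkage_group(gene_to_group):
    """Invert the symbol -> group mapping into group -> sorted symbols."""
    grouped = defaultdict(list)
    for symbol, group in gene_to_group.items():
        grouped[group].append(symbol)
    return {group: sorted(symbols) for group, symbols in grouped.items()}


groups = genes_by_linkage_group(SEED_COAT_GENES)
print(groups["B8"])  # symbols of the genes listed on linkage group B8
```

Grouping by linkage group in this way highlights clusters such as the four genes listed on B8, which is the kind of co-location that the mapping work in this thesis compares against phenylpropanoid gene positions.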

The P locus is known as the ground factor for all seed coat colour genotypes. The p allele causes white seed coat and flower (almost) regardless of the genotype for any other gene in the complex genetic system controlling seed coat colour. In the presence of the P allele, flower and seed coat colour are determined by the plant genotype for the other colour genes. There are several alleles at the P locus. The pgri allele is for grayish white colour. This allele causes (with V) grayish white seed coat and a distinctive pattern of violet and white flower colour. The pstp (stippled) allele causes fine dotting on the seed coat. Another allele at this locus is phbw (half banner white), which controls a distinctive seed coat and flower pattern. The presence of the allele pmic (micropyle stripe) results in a unique phenotype in the seed coat but has no effect on the flower. On the basis of allelism test results (Bassett, 2007), the dominance order of the six known alleles at the P locus is as follows:

P> pmic> phbw> pstp> pgri> p
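The dominance series above can be read as a ranking: in a plant carrying two different alleles at the P locus, the allele that appears earlier in the series is the one expressed. The sketch below illustrates that reading only. It assumes complete dominance along a strictly linear series, and the names `P_DOMINANCE_ORDER` and `expressed_allele` are hypothetical helpers introduced here for illustration.

```python
# Dominance order of the six known alleles at the P locus (Bassett, 2007),
# listed from most dominant to fully recessive.
P_DOMINANCE_ORDER = ["P", "pmic", "phbw", "pstp", "pgri", "p"]


def expressed_allele(allele_a, allele_b):
    """Return whichever of two P-locus alleles ranks higher in the series.

    Assumes complete dominance: the heterozygote shows the phenotype of
    the higher-ranking allele.
    """
    rank = {allele: i for i, allele in enumerate(P_DOMINANCE_ORDER)}
    for allele in (allele_a, allele_b):
        if allele not in rank:
            raise ValueError(f"unknown P-locus allele: {allele}")
    # A lower index means the allele is more dominant.
    return min(allele_a, allele_b, key=rank.__getitem__)


print(expressed_allele("pstp", "pmic"))
print(expressed_allele("pgri", "p"))
```

Under these assumptions a pstp/pmic plant would be scored as pmic (micropyle stripe), and only a p/p plant would be scored as p (white seed coat and flower).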

Association between colour genes and the phenylpropanoid pathway in common bean

In common bean, a number of flavonoids have been isolated and identified (Feenstra, 1960). It is generally accepted that the pigments responsible for seed coat colour in P. vulgaris are flavonoids and related chemical substances (Beninger et al. 1998b). In particular, several glycosidic forms of quercetin and kaempferol have been identified in seeds of P. vulgaris (Beninger et al. 1998b; Beninger et al. 1999; Clifford, 1996; Hertog et al. 1993; Hertog et al. 1992; Romani et al. 2004; Scalbert and Williamson, 2000; Vinson et al. 1998). The seed coat colour of dry beans is determined by the presence and amounts of flavonol glycosides, anthocyanins, and condensed tannins (proanthocyanidins) (Feenstra, 1960). Beninger et al. (1998b, 1999) obtained a variety of seed coat colour genotypes and identified the isolated phenolic compounds that were responsible for colour.

The relationship between colour genes and pigment synthesis in common bean has long been of interest. Feenstra (1960) found an association between the C, J and V genes and flavonoid compounds but was unable to determine the biochemical effects of single genes. The analysis of flavonoids present in ‘Prim’, a manteca type dry bean with a yellow seed coat colour, was the beginning of renewed efforts to establish the relationship between the Mendelian genes controlling seed coat colour in P. vulgaris and the pigments present in the seed coat (Beninger et al. 1998b). Two flavonol glycosides and no proanthocyanidins (condensed tannins) were found in the methanol extracts of ‘Prim’ seed coat. This work showed that only flavonol monomers were produced when the dominant alleles C and G were present and the remaining loci were recessive, but it did not determine the biochemical actions of any of the genes responsible for seed coat colour (Beninger et al. 1998b).

Feenstra (1960) concluded from his studies that C promotes the formation of anthocyanins and flavonol glycosides, and Leakey (1988) showed that the genes G and B control the production of a 3,5-diglycoside on the quinoid ring of the flavonol and the hydroxylation of the B-ring of the flavonoid nucleus, respectively. Beninger et al. (1999, 2000) showed that anthocyanin production is dependent on V, and that the B gene may act to regulate a common precursor to all anthocyanins (Fig 1). The precursor is then converted to anthocyanins (Beninger et al. 2000). Substitution of allele b for B caused a decline in the amount of the main flavonoid monomer, astragalin (kaempferol 3-O-glucoside) (Fig 1). Dihydrokaempferol is needed for synthesis of both anthocyanins and flavonol glycosides; thus B may act to promote synthesis of a common precursor, either at or before conversion of the flavanone, naringenin, to dihydrokaempferol (chalcone synthase or chalcone isomerase steps) in the flavonoid biosynthetic pathway (Beninger et al. 1999) (Fig 1). Beninger et al. (1999, 2000) provided experimental evidence supporting Feenstra’s (1960) hypothesis that dominant J in P. vulgaris promotes the production of proanthocyanidins (condensed tannins) and showed that only genotypes with a dominant allele at the J locus had proanthocyanidins (Fig 1). They also found three flavonol glycosides and proanthocyanidins but no anthocyanins in ‘Montcalm’ (dark red kidney) and concluded that with P and cu, three flavonol monomers are produced, but the alleles rk and rkd interact with cu to restrict the production of the flavonol glycosides to one kaempferol (astragalin) compound, the other two flavonols being quercetins (Beninger et al. 1999).


Figure 1. Summary of the work of Beninger and Hosfield showing the phenylpropanoid pathway leading to the flavonoid pigments found in seed coats of P. vulgaris and the genes involved in the various compound inter-conversions (Hosfield, 2001).


To date, our knowledge of the flavonoids resulting from the action of the seed coat colour determining genes has been for the most part inferred from other plant systems such as maize (Zea mays L.), petunia (Petunia hybrida Hort. Vilm-Andr.) and snapdragon (Antirrhinum majus L.) (Koes et al. 1994). However, the identities of the seed coat colour genes in P. vulgaris, the pathways they function in and the products resulting from their action have not been elucidated and remain largely speculative.


Phenylpropanoid pathway and colour

Phenylpropanoids are a diverse group of plant secondary metabolites, including anthocyanins, flavonols, proanthocyanidins (PAs), and lignins (Fig 2). These compounds accumulate in a wide variety of plant tissues (Deluc et al. 2006).

Figure 2. Scheme of branch pathways of phenylpropanoid metabolism in plants leading to the synthesis of anthocyanins, flavonols, PAs, and lignin. Enzymes that function in multiple or specific pathways are indicated. Abbreviations are as follows: ANR, anthocyanidin reductase; ANS/LDOX, anthocyanidin synthase; CAD, cinnamyl alcohol dehydrogenase; C4H, cinnamate 4-hydroxylase; CCR, cinnamyl-CoA reductase; C3H, 4-coumarate 3-hydroxylase; 4CL, 4- coumarate-CoA ligase; CHS, chalcone synthase; CHI, chalcone isomerase; COMT, caffeic acid O-methyltransferase; DFR, dihydroflavonol 4-reductase; F3H, flavanone 3-hydroxylase; F3'H, flavonoid 3'-hydroxylase; F5H, ferulate 5-hydroxylase; FLS, flavonol synthase; 3GT (UFGT), UDPG-flavonoid-3-O-glucosyltransferase; LAR, leucoanthocyanidin reductase; LDOX, leucoanthocyanidin dioxygenase; PAL, Phe ammonia-lyase; 3RT, anthocyanidin-3-glucoside rhamnosyl transferase. (From Deluc et al. 2006)


Flavonoids are low molecular weight secondary metabolites that are not essential for plant survival. These bioactive compounds are widely distributed throughout the plant kingdom. Over 9000 structural variants are known (Williams and Grayer 2004) and their biosynthetic pathways have been characterized in detail in numerous plant species (Springob et al. 2003). Flavonoids have different roles in plants, including effects on auxin transport (Peer and Murphy, 2007), plant defense (Treutter, 2005), allelopathy (Bais et al. 2006), and the levels of reactive oxygen species (ROS) (Taylor and Grotewold 2005; Bais et al. 2006). Flavonoids attract pollinators by providing flower colour (Mol et al. 1998) and in many species they are required for pollen viability (Mo et al. 1992). However, mutants lacking flavonoids have viable pollen in Arabidopsis thaliana (Burbulis et al. 1996), which suggests that other molecules can compensate for their absence in these plants. Flavonoids protect plants against UV radiation (Winkel-Shirley, 2002) and flavonoid-deficient mutants are susceptible to UV irradiation (Li et al. 1993).

Flavonoids have also evolved particular roles in legumes, where they signal symbiotic bacteria during establishment of the legume-Rhizobium symbiosis (Wasson et al. 2006). They have important direct roles in root nodule development (Zhang et al. 2009a). Flavonoids also accumulate in the different legume organs and influence in vitro root formation (Imin et al. 2007). Moreover, PAs subsequently lead to condensed tannins (CTs) and these in turn can protect plants against microbial and fungal growth (Dixon et al. 2005).

The first step in the general phenylpropanoid pathway is catalyzed by phenylalanine ammonia lyase (PAL), and produces cinnamic acid from phenylalanine. Subsequent steps in the phenylpropanoid pathway channel compounds into diverse branches, such as the flavonoid and isoflavonoid branch, leading to the synthesis of a wide variety of metabolites.


Chalcone synthase (CHS) is the first enzyme leading to the flavonoid/anthocyanin biosynthetic pathway. It belongs to the type III polyketide synthase (PKS) family of enzymes and catalyzes the stepwise condensation of one p-coumaroyl-coenzyme A (CoA) with three malonyl-CoAs to produce naringenin chalcone and isoliquiritigenin that have the C15 flavonoid skeleton. These central intermediates go through isomerization and further substitution to produce a set of flavonoid phytoalexins, anthocyanin pigments, and isoflavonoids (Fig 3). In the isoflavonoid biosynthesis branch, CHS supplies the chalcone to downstream enzymes that synthesize a diverse set of isoflavonoids (Du et al. 2010).
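
The C15 skeleton can be checked with simple carbon bookkeeping. As a rough sketch (CoA carbons are ignored, and each malonyl unit is assumed to lose one carbon as CO2 during condensation), the p-coumaroyl starter supplies nine carbons and each of the three malonyl extenders supplies two:

```latex
\[
\underbrace{9}_{p\text{-coumaroyl}}
+ 3 \times \underbrace{(3 - 1)}_{\text{malonyl} \,-\, \mathrm{CO_2}}
= 15
\]
```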

Chalcone isomerase (CHI) catalyzes the cyclization of chalcone to form flavanone. The first step for isoflavonoid biosynthesis is a reaction catalyzed by isoflavone synthase (IFS). The immediate substrate of the IFS reaction is (2S)-flavanone (Tian and Dixon, 2006).

Dehydration of (2S)-flavanone results in the production of isoflavone, whereas its oxidation, which is accomplished by flavone synthase I or II (FNS I and II), yields flavones.

Dihydroflavonols arise from (2S)-flavanones by the action of flavanone 3β-hydroxylase (F3H/FHT). F3H has a key position in flavonoid metabolism and competes with FNS I or II to control the flux of flavanones into branch pathways for end products that have distinct physiological functions, such as signaling to pollinators and other organisms, participating in plant hormone signaling, facilitating pollen-tube growth and protecting plants from UV-B radiation (Owens et al. 2008). Colourless or yellowish flavonols may be formed from dihydroflavonols by the action of flavonol synthase (FLS). Jointly acting dihydroflavonol 4-reductase (DFR) and leucoanthocyanidin dioxygenase (LDOX; synonym anthocyanidin synthase, ANS) compete with FLS in utilizing dihydroflavonols for the formation of anthocyanidins or proanthocyanidins (Davies et al. 2003). Flavonoids and particularly anthocyanidins may be glucosylated by UDP-glucose:flavonoid O-glucosyltransferases (UFGT) for stable storage of pigments (Davies and Schwinn, 2007).

Figure 3. General scheme of the flavonoid pathway. CHS, chalcone synthase; CHI, chalcone isomerase; FHT (F3H), flavanone 3β-hydroxylase; DFR, dihydroflavonol 4-reductase; LDOX (ANS), anthocyanidin synthase; FGT, flavonoid glycosyltransferase; FNS, flavone synthase; FLS, flavonol synthase; LAR, leucoanthocyanidin reductase; ANR, anthocyanidin reductase; IFS, isoflavone synthase; HID, 2-hydroxyisoflavanone dehydratase; F2H, flavanone 2-hydroxylase; HFD, 2-hydroxyflavanone dehydratase (Martens et al. 2010).

Phenylalanine ammonia lyase (PAL)

Phenylalanine ammonia-lyase (PAL; EC 4.3.1.5) is the first enzyme leading from primary metabolism into the important secondary phenylpropanoid metabolism in plants (Hahlbrock and Scheel, 1989). PAL catalyzes the nonoxidative elimination of ammonia from L-Phe to give trans-cinnamate (Fig 4), which is the precursor of numerous phenylpropanoid compounds (Dixon and Paiva, 1995). In monocots, PAL can utilize L-tyrosine (L-Tyr) as a substrate, producing p-coumaric acid and an ammonium ion (Rosler et al. 1997).


Figure 4. PAL and TAL conversions, showing substrates and products (Taken from Cochrane et al. 2004).

PAL is also a key enzyme in plant stress response. Pathogenic attack, tissue wounding, UV irradiation, low temperature, or low levels of nitrogen, phosphate, or iron can stimulate its biosynthesis (Dixon and Paiva, 1995; Sarma and Sharma, 1999). Furthermore, PAL is involved in the biosynthesis of the signaling molecule salicylic acid, which is required for plant systemic resistance (Nugroho et al. 2002; Chaman et al. 2003). PAL activity is regulated by different mechanisms, such as allosteric effects, inactivation by specific proteases and end-product inhibition (Tena et al. 1984). Studies have shown that the product trans-cinnamic acid (trans-CA) also plays a major role in the down-regulation of the phenylpropanoid pathway by inhibiting PAL mRNA transcription (Edwards et al. 1990).

Because PAL is ubiquitously distributed in plants and absent in animals, it is a promising target for herbicides (Appert et al. 2003). This enzyme lacks a cofactor and, because of its unusual nonoxidative deamination reaction, it requires an electrophilic group that is not available among the 20 standard amino acid residues (Hanson and Havir, 1970). It was therefore suggested that the electrophile is produced by a posttranslational dehydration of a serine to form a dehydroalanine (Hanson and Havir, 1970). The serine was later identified as Ser203 (Schuster and Retey, 1994).


PAL also has a medical application for the human genetic disease phenylketonuria, which leads to severe mental retardation (Levy, 1999). It has been shown in mice that PAL can be used to convert poisonous excess Phe in the blood into the harmless compounds trans-cinnamate and ammonia. Therefore, PAL may become a useful cure for phenylketonuria (Safos, 1995).

Purified PAL from various sources has a molecular mass of 270–330 kDa (Watanabe et al. 1992). The enzyme is tetrameric with identical subunits and pairs of monomers form a protomer with a single active site (Camm and Towers, 1973). The PAL enzyme is widely distributed in higher plants (Koukol and Conn, 1961), fungi (Sikora and Marzluff, 1982), yeasts (Orndorff et al. 1988), and in a single prokaryote, Streptomyces (Bezanson et al. 1970), but it is absent in true bacteria and animal tissues. The enzyme is soluble and cytoplasmic in origin in the majority of the cases (MacDonald and D’Cunha, 2007). Deshpande et al. (1993) showed that in potato tubers it may be present as a multi-enzyme complex with two other enzymes of the phenylpropanoid pathway.

The isolation of PAL and the characterization of a reaction it catalyzed were first reported by Koukol and Conn (1961) for Hordeum vulgare and the enzyme was given the name phenylalanine deaminase. Other major breakthroughs include the identification of a carbonyl group in the active site of PAL (Hodgins 1968) and the enhancement of PAL activity in plants upon exposure to gamma radiation (Riov et al. 1968; Oufedjikh et al. 2000).

PAL is usually encoded by a small multigene family (Cramer et al. 1989; Bolwell et al. 1985). Arabidopsis has four putative isoenzymes (Cochrane et al. 2004). In raspberry and bean it is encoded by a multi-gene family with few members (Kumar and Ellis, 2001; Cramer et al. 1989). Potato appears to be an exception with more than 40 copies reported (Joos and Hahlbrock, 1992).

PAL genomic sequences were isolated from bean (P. vulgaris L.) genomic libraries (Bolwell et al. 1985). Three divergent classes of PAL genes were found in the bean genome and polymorphic forms were observed within each class (Cramer et al. 1989). The nucleotide sequences of two PAL genes showed that PAL2 contains an open reading frame encoding a polypeptide of 712 amino acids, interrupted by a 1720 bp intron in the codon for amino acid 130 (Cramer et al. 1989). On the other hand, PAL3 contains a 447 bp intron at the same location and encodes a polypeptide of 710 amino acids with 72% similarity with the protein encoded by PAL2 (Cramer et al. 1989). At the nucleotide level, PAL2 and PAL3 showed 59% sequence identity in exon I, 74% identity in exon II, and extensive sequence divergence in the intron and the 5' and 3' flanking regions (Cramer et al. 1989). Transcription start sites of PAL2 and PAL3 are located 99 bp and 35 bp upstream of the initiation codon ATG, respectively. Both PAL2 and PAL3 were activated by wounding of hypocotyls whereas only PAL2 was activated by elicitor (Cramer et al. 1989). The 5' flanking regions of both genes contain TATA and CAAT boxes, and PAL2 contains a 40 bp palindromic sequence and a 22 bp motif that are also found at similar positions in other elicitor-induced bean genes (Cramer et al. 1989). PAL1 from Bambusa oldhamii was the first intronless PAL gene found in angiosperm plants (Hsieh et al. 2011).
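
The gene structures reported for PAL2 and PAL3 imply approximate sizes that are easy to check. The sketch below is illustrative only: it assumes a single intron, counts one stop codon, and uses a rule-of-thumb average residue mass of 0.110 kDa. None of these assumptions is taken from Cramer et al. (1989), and the function names are hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # assumed rule-of-thumb average mass per amino acid


def gene_span_bp(protein_aa: int, intron_bp: int, include_stop: bool = True) -> int:
    """Length in bp from the ATG to the end of the ORF for a one-intron gene."""
    codons = protein_aa + (1 if include_stop else 0)
    return codons * 3 + intron_bp


def codon_nt_range(codon_number: int) -> tuple:
    """1-based coding-sequence positions (first, last) covered by a codon."""
    first = (codon_number - 1) * 3 + 1
    return first, first + 2


def approx_mass_kda(protein_aa: int) -> float:
    """Crude polypeptide mass estimate from the residue count."""
    return protein_aa * AVG_RESIDUE_KDA


print(gene_span_bp(712, 1720))         # PAL2: 3859
print(gene_span_bp(710, 447))          # PAL3: 2580
print(codon_nt_range(130))             # intron interrupts coding nt 388-390
print(round(approx_mass_kda(712), 1))  # 78.3
```

Under these assumptions, four PAL2-sized subunits would total roughly 313 kDa, which falls within the 270 to 330 kDa range reported for the purified tetrameric enzyme.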

From a physiological perspective, PAL is expressed constitutively and can also be induced upon exposure to various stresses (Ozeki et al. 1990; Olsen et al. 2008). The expression pattern also differs among plant organs (Kao et al. 2002). Studies in Arabidopsis have shown that Atpal1 and Atpal2 are preferentially induced by stimuli such as light, wounding, heavy metals (e.g. mercuric chloride) and pathogen attack (Ohl et al. 1990) whereas Atpal3 is expressed mainly in roots and leaves (Wanner et al. 1995) and Atpal4 is more abundant in developing seed tissue (Costa et al. 2003).

In tomato there are at least five different classes of PAL genes with PAL5 as the most strongly expressed form (Lee et al. 1992). Studies have shown that in at least one member of this group, transcription is initiated from two sites that appear to be differentially regulated in response to changes in light or wounding or to infection by a plant pathogen (Chang et al. 2008).

Expression analysis of bean PAL2 has shown that this gene is expressed in the early stages of vascular development at the inception of xylem differentiation and is associated with the synthesis of lignin precursors. Studies of PAL2 promoter-β-glucuronidase gene fusions in transgenic tobacco plants showed that cis elements located between nucleotides -289 and -74 relative to the transcription start site were essential for xylem expression (Leyva et al. 1992). The -135 to -119 region implicated in xylem expression contains a negative element that suppresses the activity of a cis element for phloem expression located between -480 and -289. Interaction between these elements of the PAL2 promoter provides a flexible mechanism for modifying tissue specific expression (Leyva et al. 1992).

To provide a basis for detailed structure–function studies, the PAL enzyme from parsley (Petroselinum crispum) was crystallized (Ritter and Schulz, 2004). The structure of PAL shows that this key plant enzyme contains a shielding domain which restricts access to the active center so that the risk of inactivation by nucleophiles in conjunction with dioxygen is minimized (Ritter and Schulz, 2004). This may help PAL to function in stressed plant tissue. It has also been shown that PAL forms its electrophilic prosthetic group autocatalytically from its own polypeptide, which makes it independent of any cofactor and facilitates its upregulation (Ritter and Schulz, 2004).

Chalcone synthase (CHS)

CHS is a well-studied enzyme that catalyzes sequential decarboxylative condensation of p-coumaroyl-CoA with three molecules of malonyl-CoA to produce a new aromatic ring system, naringenin chalcone, the key intermediate in the biosynthesis of flavonoids (Fig 5) (Abe and Morita, 2010; Martin, 1993). Isomerization of the chalcone is accomplished in vivo by the enzyme chalcone isomerase (CHI) to produce flavanone. The product of CHS may be further modified in a number of subsequent biochemical steps to yield many different end products (Fig 3) (Durbin et al. 2000).

Figure 5. Typical enzyme reactions catalyzed by plant chalcone synthase (Abe and Morita, 2010).

Chalcone synthase functions as a homodimer (Martin, 1993) and can use other substrates besides 4-coumaroyl-CoA (Ibrahim and Varin, 1993). One of the end products is anthocyanin, which is a pigment responsible for flower colour in plants. CHS was thought to be a cytosolic enzyme (Beerhues and Wiermann, 1988) although there is now evidence that it is associated with the cytoplasmic side of the endoplasmic reticulum (Hrazdina et al. 1987).


A cysteine at amino acid 169 was thought to be part of the site that binds 4-coumaroyl-CoA and is required for enzyme activity (Martin, 1993). However, studies of the three-dimensional structure of alfalfa CHS2 have provided a view of the active site that catalyzes chalcone formation (Ferrer et al. 1999). Four residues (Cys164, His303, Asn336, and Phe215) form the catalytic center of CHS and are strictly conserved in all CHS sequences (Ferrer et al. 1999). The high sequence similarity and conserved gene structure suggest that CHS genes may originate from a common ancestor (Yang and Gu, 2006).

Chalcone synthase is thought to have arisen from an enzyme involved in fatty acid synthesis (Stafford, 1991). All CHS genes studied so far in flowering plants contain one intron at a conserved site, with the exception of the CHS in Antirrhinum majus, which has a second intron (Sommer and Saedler, 1986). Generally, loss of CHS function results in a lack of anthocyanin and an albino flower colour phenotype (Durbin et al. 2000). However, it is now known that CHS is encoded by a small multigene family in many species (Martin, 1993), including those species containing mutations that result in loss of CHS activity. As many as eight CHS genes are found in bean (Ryder et al. 1987) and eight or more are found in petunia (Koes et al. 1989).

Studies on the expression pattern of different members of the CHS gene family showed that different duplicate copies of CHS have acquired specialized functional roles over the course of evolution which comes from differentiation in gene expression (Durbin et al. 2000).

In bean (P. vulgaris), CHS comprises a gene family of at least seven members with different transcription levels in response to external stimuli and internal cues (Ryder et al. 1987). Each member of the gene family is differentially transcribed in a tissue- and development-specific manner, and the overall pattern of CHS expression reflects the sum of differential transcriptional activation of the individual CHS genes (Schmid et al. 1990). CHS1, CHS4, CHS14 and CHS17 share 96% to 98% sequence identity but are expressed differently. Ryder et al. (1987) determined that the CHS1 transcript was markedly induced by illumination but the CHS4, CHS14 and CHS17 transcripts did not accumulate in response to illumination. The same study showed that the CHS1, CHS4 and CHS14 transcripts were strongly induced by fungal elicitor, while the CHS17 transcript was only slightly induced (Ryder et al. 1987). Studies of the promoter regions of two members of the bean CHS gene family, CHS8 and CHS15, in transgenic tobacco, have shown that a 1.4 kb CHS8 promoter fragment is activated in the inner epidermal cells of petals, and in root and lateral root meristems (Schmid et al. 1990). The CHS8 promoter is inducible in leaves by both abiotic and biotic elicitors, including UV light, wounding, mercuric chloride, fungal elicitor, or infection with Pseudomonas syringae (Doerner et al. 1990; Schmid et al. 1990; Stermer et al. 1990; Sun et al. 2011). In contrast, a 490 bp CHS15 promoter fragment was activated by UV light and mercuric chloride, but not by P. syringae (Stermer et al. 1990).

Most functional studies of the CHS15 promoter have shown that it is responsive to fungal elicitor or glutathione (Choudhary et al. 1990; Dron et al. 1988; Harrison et al. 1991; Lawton et al. 1991). Studies of the promoter regions of bean CHSs have defined several functional regulatory elements that serve as the binding sites for transcription factors (Harrison et al. 1991; Lawton et al. 1991; Yu et al. 1993; Liu et al. 2011). These elements include a G-box (CACGTG) and three H-boxes (CCTACC(N)7CT) (Yu et al. 1993), which are necessary and sufficient for stimulation of transcription by trans-p-coumaric acid (a phenylpropanoid pathway intermediate) (Loake et al. 1991; Loake et al. 1992) and are also implicated in activation of the promoter in response to fungal elicitor and glutathione.
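
The G-box and H-box consensus sequences quoted above can be located in a promoter with a simple pattern search. The following is a minimal sketch, assuming plain regular-expression matching on one strand only; the function names and the toy sequence are hypothetical and are not the real CHS15 promoter.

```python
import re

# Consensus motifs as written in the text; N stands for any base.
MOTIFS = {
    "G-box": "CACGTG",
    "H-box": "CCTACC" + "N" * 7 + "CT",
}


def to_regex(consensus: str) -> str:
    """Turn a consensus string into a regex, expanding N to any base."""
    return consensus.replace("N", "[ACGT]")


def find_motifs(seq: str) -> list:
    """Return (name, 1-based start, matched text) for each hit on the given strand."""
    seq = seq.upper()
    hits = []
    for name, consensus in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported as well.
        pattern = re.compile("(?=(" + to_regex(consensus) + "))")
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy sequence built from the two motifs plus filler bases.
toy = "TT" + "CACGTG" + "AA" + "CCTACC" + "A" * 7 + "CT" + "GG"
for hit in find_motifs(toy):
    print(hit)
```

A real analysis would also scan the reverse strand and would treat matches only as candidate sites, since binding was established experimentally in the studies cited.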

Distal to the H- and G-boxes are three sequence elements, designated box I, II, and III, which are involved in the quantitative induction of CHS15 transcription by light (Dron et al. 1988; Harrison et al. 1991). These three boxes each contain one or two copies of the consensus motif GGTTAA(A/T)(A/T)(A/T) (Lawton et al. 1991) (Figure 6) in which the first six bases are identical to the GT-1 recognition motif GGTTAA (Green et al. 1988). A bean nuclear factor, designated SBF-1 (Silencer Box Factor-1) due to the silencing effect of the box I, II, III region (Dron et al. 1988), has been identified that binds in vitro to each of box I, II, and III, with highest affinity for box III (Lawton et al. 1991). Phosphorylation of SBF-1 is required for binding activity, and may also be required to maintain stable binding in a preformed SBF-1/DNA complex (Harrison et al. 1991). Studies of promoter activation in initiating tobacco lateral roots and in developing seeds demonstrated that specific deletions within the box I, II, III region modulate expression (Hotter et al. 1995). Figure 7 shows a schematic view of the CHS15 promoter in bean.
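
Because the box I, II, III consensus GGTTAA(A/T)(A/T)(A/T) is degenerate and may lie on either strand, a scan for it needs IUPAC expansion and reverse-complement handling. The sketch below is one way to do this under stated assumptions: the consensus is written with the IUPAC code W for A or T, coordinates are reported as the leftmost forward-strand base, and the function names and toy sequence are hypothetical.

```python
import re

BOX_CONSENSUS = "GGTTAAWWW"  # W = A or T (IUPAC)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_both_strands(seq: str, consensus: str = BOX_CONSENSUS) -> list:
    """Return (strand, 1-based leftmost forward position, matched text) per hit."""
    seq = seq.upper()
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in consensus) + "))")
    hits = [("+", m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    for m in pattern.finditer(rc):
        # Map the match back to forward-strand coordinates.
        end_in_rc = m.start() + len(consensus)
        hits.append(("-", len(seq) - end_in_rc + 1, m.group(1)))
    return sorted(hits, key=lambda h: h[1])


# Hypothetical toy sequence with one site on each strand.
toy = "CC" + "GGTTAAATA" + "CC" + reverse_complement("GGTTAATTT") + "GG"
for hit in scan_both_strands(toy):
    print(hit)
```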


Figure 6. Sequence and related features of the bean CHS15 promoter. DNase I-footprinted regions using bean suspension cell nuclear extracts (SBF-1) are underlined and labelled Box I, Box II, and Box III. The positions of the 6 bp sequences (large dots) with consensus homology to the GT-1 core GGTTAA, and 3 bp extended regions of consensus homology (A/TA/TA/T, small dots) are indicated above the sequence. Also indicated with small dots is a sequence with lesser homology to these core and extended sequences. The three H boxes and the G box are underlined and labelled H Box and G Box. Also underlined are two sequence elements with the consensus GPuTPuGAGATG and the TATA box. Additional regions with sequence similarity to regions of the bean PAL2 promoter are indicated with a dashed overline. Two of these regions show sequence similarity to the PAL2 promoter element CCACCAACCCC (Leyva et al. 1992), while the third region overlaps SBF-1 box III. The deletion endpoints of the CHS15 promoter-deletion constructs examined in this study are also shown (Hotter et al. 1995).


Figure 7. Bean CHS15 promoter and regulators. SBF, silencer binding factor; H, H-box (CCTACC); G, G-box (CACGTG); a/a2, regulation loci (Dao et al. 2011).

In soybean, the I (Inhibitor) locus has two dominant forms (I and ii) that inhibit the pigmentation of the seed coat. The I allele results in a colourless or light yellow seed on the entire seed coat, whereas the ii allele produces a yellow seed coat with a pigmented hilum (where the seed coat attaches to the pod). Black or brown anthocyanin pigments have undesirable effects on protein and oil extractions of soybean, therefore, most cultivated soybean varieties have been selected for a yellow, nonpigmented seed coat (homozygous I or ii alleles) (Tuteja et al. 2009).

The CHS gene family in the soybean genome contains nine members (CHS1–CHS9) with 80 to 99% sequence identity within the coding regions and a duplicate copy of CHS1 (Tuteja and Vodkin, 2008). Five (CHS1, CHS3, CHS4, CHS5, and CHS9) of the nine non-identical CHS gene family members are clustered in a 200- to 300-kb region (Clough et al. 2004; Tuteja and Vodkin, 2008) on Gm08. Three of these five genes, CHS1, CHS3, and CHS4, are grouped as two 10.91-kb perfect, inverted repeats separated by 5.87 kb of intervening sequence that define the I locus (Todd and Vodkin, 1996). This structure silences the expression of all CHS genes, including CHS7 and CHS8, located on other chromosomes (Tuteja et al. 2009). In plants with the recessive i allele, deletion of CHS promoter sequences from CHS4 or CHS1 results in increased CHS7/CHS8 transcript accumulation and pigmented soybean seed coats (Todd and Vodkin, 1996; Tuteja et al. 2004). CHS2, CHS6, CHS7, and CHS8 are mapped on Gm05, Gm09, Gm01, and Gm11, respectively (Matsumura et al. 2005). Matsumura et al. (2005) studied the coding sequences of the eight genes in this family and showed that genes CHS1–CHS6 were closely related and formed a close cluster in a phylogenetic tree, whereas CHS7 and CHS8 genes grouped into a separate clade.

Expression studies of CHS genes in soybean indicated a tissue specific pattern (Tuteja et al. 2004). The expression of both CHS7 and CHS8 genes was highest in roots (Tuteja et al. 2004) and these genes have a role in seed isoflavonoid biosynthesis (Dhaubhadel et al. 2007). The structural diversity within CHS7 and CHS8 promoters may lead to differential activation of these genes by different inducers as well as to developmental stage and tissue-specific differences in gene expression (Yi et al. 2010).

Dihydroflavonol 4-reductase (DFR)

Dihydroflavonol 4-reductase (DFR) is a main enzyme of the flavonoid biosynthesis pathway which catalyzes the NADPH-dependent reduction of 2R,3R-trans-dihydroflavonols to leucoanthocyanidins (Trabelsi et al. 2008). There are three different classes of anthocyanidins responsible for the primary shade of the flower colour: pelargonidin (orange to brick red), cyanidin (red to pink) and delphinidin (purple to blue) (Tanaka et al. 1998). DFR catalyzes an essential reaction in each of the three primary branches of anthocyanin synthesis, reducing colourless dihydrokaempferol (DHK), dihydroquercetin (DHQ), and dihydromyricetin (DHM) to produce pelargonidin-, cyanidin-, and delphinidin-based anthocyanins, respectively (Holton and Cornish 1995) (Fig 8). DFRs from many species can utilize all three substrates since the three substrates of DFR are very similar in structure, differing only in the number of hydroxyl groups on the B phenyl ring, which is not the site of enzymatic action (Helariutta et al. 1993; Heller et al. 1985; Tanaka et al. 1995). Therefore, the synthesis of the three different anthocyanidins is mainly determined by the enzyme activities of two hydroxylases: flavonoid 3′-hydroxylase (F3′H) and flavonoid 3′,5′-hydroxylase (F3′5′H). F3′H converts DHK to DHQ and F3′5′H converts DHK to DHM or DHQ to DHM (Brugliera et al. 1999) (Fig 8). DFR competes with flavonol synthase and the hydroxylases for the dihydroflavonols to synthesize corresponding leucoanthocyanidins, which are precursors of anthocyanidins (Winkel-Shirley, 1999). Subsequent modifications occur to the anthocyanin structure such as acylation, glycosylation and methylation which, in combination with the relative abundance of these anthocyanidins, determine flower colour. Vacuolar pH and copigmentation are other important factors (Holton and Tanaka, 1994). DFRs from some species, such as Petunia and Cymbidium, cannot reduce DHK efficiently and these species cannot produce pelargonidin-based orange flower colour even if both F3′H and F3′5′H are absent (Forkmann and Ruhnau, 1987). The introduction of maize DFR, however, successfully overcame the inability of Petunia DFR to reduce DHK and orange flowers were produced (Meyer et al. 1987).
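
The branching logic described in this section can be summarized as a small qualitative model. This is only a sketch under simplifying assumptions: it tracks which dihydroflavonols are available and which of them a given DFR accepts, and it ignores flux, competition with flavonol synthase, and partial activities. All names are hypothetical.

```python
# Dihydroflavonol -> anthocyanidin class, as described in the text.
PRODUCT = {"DHK": "pelargonidin", "DHQ": "cyanidin", "DHM": "delphinidin"}

ALL_SUBSTRATES = ("DHK", "DHQ", "DHM")
PETUNIA_LIKE = ("DHQ", "DHM")  # a DFR assumed unable to reduce DHK


def dihydroflavonol_pool(f3prime_h: bool, f3prime5prime_h: bool) -> set:
    """Substrates available given which B-ring hydroxylases are active."""
    pool = {"DHK"}  # simplification: DHK is always treated as available
    if f3prime_h:
        pool.add("DHQ")
    if f3prime5prime_h:
        pool.add("DHM")
    return pool


def anthocyanidin_classes(f3prime_h: bool, f3prime5prime_h: bool,
                          dfr_accepts=ALL_SUBSTRATES) -> set:
    """Anthocyanidin classes that this simple model allows."""
    pool = dihydroflavonol_pool(f3prime_h, f3prime5prime_h)
    return {PRODUCT[s] for s in pool if s in dfr_accepts}


# Both hydroxylases absent: a DHK-accepting DFR gives pelargonidin,
# whereas a Petunia-like DFR gives nothing.
print(anthocyanidin_classes(False, False))
print(anthocyanidin_classes(False, False, PETUNIA_LIKE))
```

In this model, swapping a Petunia-like DFR for one that accepts DHK is what restores the pelargonidin branch, mirroring the maize DFR experiment cited above.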


Figure 8. A schematic diagram showing the chemical reaction catalyzed by dihydroflavonol 4-reductase (DFR). Other abbreviations used are: CHS: chalcone synthase, CHI: chalcone isomerase, F3H: flavanone 3-hydroxylase, F3’H: flavonoid 3’-hydroxylase, F3’5’H: flavonoid 3’,5’-hydroxylase, ANS: anthocyanidin synthase, 3GT: flavonoid 3-glucosyltransferase (Johnson et al. 2001).

DFR activity was reported for the first time in the case of a synthesized enzyme from Zea mays (O'Reilly et al. 1985). Afterwards, the activities of various DFRs were deduced from plant protein extracts or phytochemical analyses of mutants and/or transgenic plants, until the successful expression of a functional DFR sequence from Gerbera in eukaryotic cells (Martens et al. 2002). The purified DFR protein was described as very unstable (Forkmann and Ruhnau, 1987), which may explain why there is little information about the crystal structure of the enzyme and why the only structural information concerning DFRs has been deduced from sequence analysis. The DFR sequence contains most of the motifs for the short-chain dehydrogenase/reductase (SDR) superfamily (Kallberg et al. 2002). Therefore DFR, like any other member of the superfamily, should have a large single conserved α/β domain named the “Rossmann fold”.

However, sequence analyses cannot determine why most DFRs prefer to accept dihydroflavonols with different hydroxylation patterns as substrates. De Jong et al. (2003) and Zhang et al. (2009) showed that the ‘red allele’ of potato DFR is capable of reducing DHK, while other potato DFR alleles cannot. Studies of chimeric DFRs of Petunia and Gerbera showed that there is a region (amino acids at position 132-157) in the protein that determines the substrate specificity of DFR. Furthermore, changing a single amino acid in this region caused the DFR enzyme of Petunia to preferentially reduce dihydrokaempferol (Johnson et al. 2001). Recent investigations indicated that DFR sequences presenting an asparagine residue at position 134 (Gerbera numbering) reduce DHK more readily than DHQ, whereas those presenting an aspartic acid show a marked preference for DHQ (Shimada et al. 2005; Johnson et al. 2001). The assumption about the amino acid region determining the substrate binding site was only deduced from an analysis of sequence alignment and no detailed 3D description of the interactions between the substrate and the enzyme is available to date (Petit et al. 2007).

Expression of the DFR gene can be affected by many physiological and environmental factors. A study of the promoter of this gene in grape showed that a specific sequence located between −725 and −233 might be involved in the expression of the DFR gene in fruits (Gollop et al. 2002). Environmental stimuli such as light, calcium and sucrose can induce or suppress DFR gene expression (Gollop et al. 2002). DFR expression can affect the metabolites of the whole plant (Takahashi et al. 2006) and is correlated with the anthocyanin accumulation pattern in florets: it is basipetally induced, epidermally specific and restricted to the ligular part of the corolla (Helariutta et al. 1993). The activity of the DFR enzyme can also be affected by other factors. Trabelsi et al. (2008) showed that the activity of the enzyme is inhibited by its substrate.

DFR genes have been isolated from maize (Reddy et al. 1987), snapdragon (Martin et al. 1985) and petunia (Beld et al. 1989) and it has been shown that mutation in this gene leads to a colourless aleurone layer in maize and uncoloured or partially coloured flowers in snapdragon. Five DFR genes were found in a cluster within a 38 kb region in the Lotus japonicus genome. In the same plant, a transversion at a splicing acceptor site of one DFR gene resulted in six cDNAs including two splicing variants (Shimada et al. 2005). The expression of each member of the family, however, is regulated independently (Yoshida et al. 2010). Lo Piero et al. (2006) showed that in Citrus fruit a mutation in a regulatory gene controlling the expression of DFR might play a role in the phenotypic change from blood to blond orange. In another study, variegation in the flower colour of Linaria was attributed to an unstable mutant allele of the DFR gene. This allele carries an insertion of a transposon belonging to the CACTA family which blocks its expression and results in an ivory flower colour phenotype. The transposon is occasionally excised in dividing epidermal cells to produce clonal patches of red tissue on the ivory background, and in cells giving rise to gametes to generate reversion alleles conferring a fully coloured phenotype (Galego and Almeida, 2007). In soybean the W4 locus encodes DFR2, which governs flower colour. Mutations in this locus, which are the result of an insertion of a CACTA-like transposable element, have been characterized by variegated and pale flowers (Xu et al. 2010). In yellow onion (Allium cepa L.) cultivars, two novel inactive alleles of DFR were reported; in one allele the 3′ portion of the coding sequence was deleted and the other contained a premature stop codon (Kim et al. 2009).


In common bean (P. vulgaris L.), variation in the first intron of the DFR gene was investigated by sequencing it in 92 genotypes that represent both landraces and cultivars. The allele-specific primers that were developed in this study were used to map this gene on linkage group B8 (McClean et al. 2004).

Myb Transcription factor

In plants, the structural genes of the flavonoid biosynthetic pathway are largely regulated at the transcription level (Sainz et al. 1997). In all species studied to date, the expression of anthocyanin biosynthetic genes is regulated by complexes of MYB transcription factors (TF), basic helix-loop-helix (bHLH) TFs and WD-repeat proteins (the MYB-bHLH-WD40 "MBW" complex) (Baudry et al. 2004). In a proposed model for the activation of structural pigmentation genes, regulators interact with each other to form transcriptional complexes in conjunction with the promoters of structural genes (Koes et al. 2005). In maize, ZmC1MYB and ZmBbHLH bind to the promoter of the dihydroflavonol 4-reductase (DFR) gene, which is a flavonoid structural gene (Goff et al. 1992). In petunia PhAN2 and PhJAF13 bind to the promoters of spinach DFR and ANS (Schwinn et al. 2006). The WD-repeat protein is also involved in this complex. The petunia WD-repeat protein AN11 acts upstream of AN2, which is a MYB transcription factor (De Vetten et al. 1997).

MYB TFs are characterized by a conserved DNA-binding domain of 52 amino acids, called the MYB domain, consisting of single or multiple imperfect repeats that bind to DNA in a sequence-specific manner. A total of 126 R2R3 MYB TF-encoding genes were studied in Arabidopsis (Stracke et al. 2007) and based on the number of highly conserved imperfect repeats in the DNA-binding domain, the MYB TFs were classified into three subfamilies including R3 MYB (MYB1R) with one repeat, R2R3 MYB with two repeats, and R1R2R3 MYB (MYB3R) with three repeats (Rosinski and Atchley, 1998; Jin and Martin, 1999). Those associated with the anthocyanin pathway are of the two-repeat (R2R3) class (Allan et al. 2008).

Anthocyanin-regulating MYBs have been isolated from many species, including Arabidopsis (AtMYB90 or PAP2, AtMYB113 and AtMYB114; Gonzalez et al. 2008), Solanum lycopersicum (ANT1; Mathews et al. 2003), Petunia hybrida (AN2; Quattrocchio et al. 1999), Capsicum annuum (A; Borovsky et al. 2004), Vitis vinifera (VvMYB1a; Kobayashi et al. 2004), Zea mays (P; Grotewold et al. 1991), Oryza sativa (C1; Saitoh et al. 2004), Ipomoea batatas (IbMYB1; Mano et al. 2007), Antirrhinum majus (ROSEA1, ROSEA2 and VENOSA; Schwinn et al. 2006), Gerbera hybrida (GhMYB10; Elomaa et al. 2003), Picea mariana (MBF1; Xue et al. 2003), Garcinia mangostana (GmMYB10; Palapol et al. 2009), and Gentian (GtMYB3; Nakatsuka et al. 2008).

PRODUCTION OF ANTHOCYANIN PIGMENT 1 (PAP1, or AtMYB75) is a MYB transcription factor of Arabidopsis whose protein sequence shows a high degree of amino acid conservation with other known MYB regulators of anthocyanin production (Allan et al. 2008). Overexpression of PAP1 caused the upregulation of several genes in the anthocyanin biosynthesis pathway and produced purple plants (Borevitz et al. 2000; Tohge et al. 2005).

PyMYB10, isolated from Asian pear (Pyrus pyrifolia) cv. 'Aoguan', is an ortholog of the MdMYB10 gene, which regulates anthocyanin biosynthesis in red fleshed apple (Malus x domestica) cv. 'Red Field'. This gene has three exons and its upstream sequence contains core sequences of cis-acting regulatory elements involved in light responsiveness (Feng et al. 2010). In V. vinifera four MYB genes have been identified, with VvMYBA1 and/or VvMYBA2 regulating colouration (Walker et al. 2007). In apples, three MYB genes that belong to the R2R3 class have been identified and are responsible for anthocyanin accumulation (Takos et al. 2006; Espley et al. 2007).

Chinese bayberry (Myrica rubra) is a fruit crop with cultivars producing fruit ranging from white (Shuijing, SJ) to red (Dongkui, DK) and dark red-purple (Biqi, BQ). The transcript levels of MrMYB1 and anthocyanin biosynthetic genes were strongly associated with the anthocyanin content in the ripe fruit of the three cultivars and were induced by light. The MrMYB1d allele in this fruit had a 1 bp deletion at nucleotide 30 of the coding sequence. This nonsense mutation, which was present in the SJ and DK fruit, may be responsible for the absent or low expression of the MYB1 protein, affecting multiple biosynthetic genes involved in anthocyanin accumulation in the white and red fruit (Niu et al. 2010).

An interesting and unique Purple (Pr) gene mutation in cauliflower (Brassica oleracea var botrytis) gives the phenotype of intense purple colour in the flower heads (a.k.a. curds) and a few other tissues. The Pr gene encodes an R2R3 MYB transcription factor, and up-regulation of this gene activated a basic helix-loop-helix transcription factor and a subset of anthocyanin structural genes in the purple cauliflower. Studies showed that a DNA transposon insertion in the upstream regulatory region of this gene is responsible for the up-regulation of the Pr gene and induction of the phenotypic changes in the plant (Chiu et al. 2010). Myb transcription factors not only have an important role in the induction of transcription but can also repress the expression of flavonoid pathway genes. FaMYB1, an R2R3 MYB from ripening strawberry, was found to act as a repressor of the transcription of anthocyanin-related genes (Aharoni et al. 2001).

Phenylpropanoid compounds and disease resistance in common bean

Tannins are produced via the phenylpropanoid pathway and have the capacity to bind and coagulate nutrients such as protein, starches, and iron (Bate-Smith, 1973). These compounds have negative impacts on protein and organic matter digestibility and also decrease digestibility of forage (Makkar et al. 1997). Makkar et al. (1997) showed that the elimination of tannins from the seeds of Vicia faba beans improved their nutritional value. Tannins are located in the testa of coloured common bean seeds while the testa of white seeded beans generally has been shown to contain no or little tannins (Ma and Bliss, 1978; Diaz et al. 2010). Tannins are a subset of polyphenolics which are involved in seed colour expression and are often associated with plant resistance to pathogens or insects. Statler (1970) reported that common beans with more total phenols had greater resistance to root-rot disease. Harris and Burns (1973) reported that tannins were beneficial in the field since they provided resistance to fungi and seed vivipary.

Tannins are also known for their negative effects on protein digestibility and Fe availability; therefore, one possible way to improve the nutritional value of bean is to reduce tannins. However, reducing tannins might have adverse effects on bean disease resistance. These effects depend on the gene pool, the disease or pest, and the seed colour class (Islam et al. 2003).

Studies on a variety of common beans with different seed coat colours showed that the black and red bean classes had higher amounts of tannins in their seed coats and that these compounds were more abundant in beans from the Middle American gene pool than in those from the Andean gene pool (Islam et al. 2002). Different seed classes had different reactions to disease and pest infestations, which were correlated with the tannin content of their coat extract. The black seeded beans in both gene pools showed significantly higher levels of disease resistance. However, overall the relationship between seed coat extractable tannin levels and disease resistance was complex in that study, suggesting that selection for resistance might affect coat extract levels in the breeding program.

Linkage mapping in common bean

Lamprecht (1961) published the first genetic linkage map for common bean, which incorporated previous linkage reports. There were 26 naturally occurring traits on this linkage map, including genes controlling the colour of flowers, seeds or pods. Further work was done to extend this map (Awuma and Bassett, 1988; Gepts 1988; Koenig and Gepts 1989; Vallejos and Chase 1991). The revised map published by Bassett (1991) consisted of 13 linkage groups and 47 marker genes.

The availability of DNA-based markers in the mid-1980s facilitated further development of the common bean linkage map. Molecular linkage mapping of common bean was initiated by Vallejos et al. (1992) and Nodari et al. (1993), whose work subsequently evolved into two major bean mapping populations. The parental lines for both populations were widely divergent to maximize polymorphisms, phenotypic diversity and variability in disease resistance and other traits. The Florida map, which was developed by Vallejos et al. (1992), was based on a backcross (BC1) population. Later, a common bacterial blight resistance QTL was mapped using this mapping population (Yu et al. 1998). This map consists of 294 markers, including the pigmentation gene P, and covers 900 cM (Vallejos et al. 2001).

The Davis map is based on an F2 population derived from the cross between BAT 93 of the Middle American gene pool and Jalo EEP558 of the Andean gene pool (Nodari et al. 1993). The Paris BC1 map was the next significant molecular map and it was published by Adam-Blondon et al. in 1994. The population used in that study was developed primarily to localize specific anthracnose resistance genes.

A CIAT map was developed from another Andean x Mesoamerican cross (DOR364 x G19833) which has been used for mapping microsatellite and single nucleotide polymorphism (SNP) markers and for QTL analysis of tannin concentration (Caldas and Blair, 2009; Galeano et al. 2012).

Many recombinant inbred (RI) populations have been developed and Kelly et al. (2003) provided a list of 14 common bean populations. Most maps have eventually been integrated with the CIAT, Davis (BJ) or Florida map.

Physical map in common bean

A large insert BAC library based on the Andean genotype G19833 was utilized to construct a common bean physical map (Schlueter et al. 2008). This library contained over 44,000 BAC clones and had ~12x genome coverage (M. Blair, pers. communication). A total of 41,717 BAC clones were fingerprinted from this library to create the physical map, which assembled into 708 scaffolds (Schlueter et al. 2008). In total, 540 markers derived from RFLPs, genes, ESTs and other sequences have been physically placed on this map. Of these 540 markers, 84 are genetically mapped and provide links between the physical and genetic maps. The physical map is publicly available at http://phaseolus.genomics.purdue.edu/. Initial analysis of BAC end sequences showed that ~49% of the genome was repetitive and 29% genic (Schlueter et al. 2008). Newly generated information is deposited in the Legume Information System (http://www.comparative-legumes.org/).

Genomics and bioinformatics in Legumes

Model plants are a source of accumulated knowledge that can be transferred to crop plants by means of common gene sequences and DNA markers, and in legumes some of the information from model plants has been utilized in crops (Sato et al. 2007). This information can be used either to identify synteny among legumes, followed by the development of selection markers, or for direct gene transfer from a model plant to a crop (Deavours and Dixon, 2005).

The legume family (Leguminosae) is the third largest family of higher plants and includes >19,000 species (Lewis et al. 2005). The legume family is divided into three subfamilies: Caesalpinioideae, Mimosoideae, and Papilionoideae. Most of the economically important legumes are members of the subfamily Papilionoideae. Lotus japonicus and Medicago truncatula have the two best-characterized legume genomes, which share a remarkably high level of conserved macrosynteny (Choi et al. 2004). Both of the model legumes are herbaceous plants of limited agricultural use with relatively small genomes. The estimated size of the genome of Medicago is between 471 and 583 Mb (Medicago Genome Sequence Consortium, 2007) and that of the Lotus genome is 472 Mb (Sato et al. 2007). Lotus and Medicago are equally closely related to bean (Doyle and Luckow, 2003), which has a larger genome size of 588 Mbp (Mercado-Ruaro and Kenton, 1993) to 650 Mbp (Arumuganathan and Earle, 1991). Macrosynteny between bean and Lotus was described by Zhu et al. (2005) in a circle diagram which includes 8 of 11 bean linkage groups, and the relations to Lotus were inferred through positions shared between Lotus and Medicago. Meanwhile, Galeano et al. (2010) showed more direct synteny with both model species through marker homology.

In another study, the extent of macrosynteny between Lotus and bean was evaluated by positioning nodulation-related Leg markers (Hougaard et al. 2008) on the genetic linkage maps of Lotus and bean. A total of 99 unique positions were shared between bean and Lotus with an average of 9 common markers per linkage group. The positions of shared markers on Lotus and bean chromosomes indicate that only a limited number of large-scale rearrangements occurred during the evolution of legume chromosomes. There are blocks of syntenic loci which are only partly collinear, and smaller internal chromosome rearrangements have occurred in addition to the large-scale rearrangements (Hougaard et al. 2008). A comparison of the positions of 75 shared Leg markers between bean and Medicago showed substantial synteny between these two species. Two of the bean linkage groups have markers mapping to a single Medicago chromosome, whereas six of the 11 bean linkage groups have markers that map to >2 different Medicago linkage groups. In both situations, blocks of syntenic loci show partial collinearity, indicating that small internal chromosome rearrangements have occurred (Hougaard et al. 2008).

On the other hand, soybean has been a model for studies of seed development (Vodkin et al. 2008), root hair development and early nodulation responses, mineral uptake and protein and oil biosynthesis (Cannon et al. 2009). The estimated size of the soybean genome is 1,115 Mb. The current assembly for the soybean genome consists of 950 Mb in 20 chromosome pseudomolecule sequences, and 23 Mb in additional smaller, unanchored scaffold sequence assemblies (Soybean Genome Sequencing Consortium, http://www.phytozome.net/soybean.php).

The soybean genome underwent polyploidy approximately 13 Mya (Shoemaker et al. 2006) and some evidence indicates that soybean and some other legumes underwent another duplication, which is estimated to have occurred approximately 59 Mya (Schlueter et al. 2007). A consequence of the more recent duplication event in soybean is that any given genomic region in bean, Medicago or Lotus is likely to correspond well with two genomic regions in Glycine (Schmutz et al. 2010). To test this assumption in bean, McClean et al. (2010) compared 300 mapped gene-based loci from bean against all of the scaffold sequences from the initial build of the soybean genome. Their results showed that not only are nearly all of the P. vulgaris genes duplicated in soybean but also a single P. vulgaris linkage group is syntenic with multiple soybean chromosomes (Fig 9). Galeano et al. (2010) reported the same pattern in macrosynteny comparisons between the common bean and soybean genomes. This indicates that the soybean genome was fractionated and the fragments rearranged following the duplication event, to construct the modern soybean genome with 20 chromosomes.

Figure 9. Syntenic relationships of soybean relative to the common bean genetic map. A common bean genetic map anchors corresponding syntenic regions of the soybean 1.01 genome build. The location (in megabase pairs) of each soybean fragment on both sides of the common bean linkage group is noted at the beginning and end of the homology (McClean et al. 2010).

Genomic features of common bean

P. vulgaris L. is a diploid species with 11 chromosomes and its genome size is estimated to be 588 Mbp (Mercado-Ruaro and Kenton, 1993).

Available Common Bean Genomic Resources

Ten BAC libraries are available in P. vulgaris (Gepts et al. 2008). Most libraries have 5–12x genome coverage, whereas the BAT93 library has 20x coverage. BAT93 has been designated as the standard genotype for Phaseolus genomics (Broughton et al. 2003) but was not selected for immediate sequencing by DOE-JGI. G19833, which was selected for genomic sequencing, is an Andean landrace from Peru for which a BAC library was made at Clemson University (Blair and Muñoz, pers. communication). To date, the full sequences of only two BAC clones have been published, one around the Co-4 locus for resistance to anthracnose (Melotto et al. 2004), and the other around the APA locus (Kami et al. 2006).

A preview release of the initial version of the common bean genome sequence is available on Phytozome (www.phytozome.net). The main genome assembly is approximately 486.9 Mb arranged in 10,132 scaffolds. Of these, 1,601 scaffolds are larger than 50 kb, representing approximately 87.4% of the genome. There are 26,374 total loci containing protein-coding transcripts and 4,347 total alternatively spliced transcripts.

Thesis hypothesis and objectives

This research aimed to test the hypothesis that seed coat colour genes are structural or regulatory genes in the phenylpropanoid pathway.

The objectives of this study were to:

1) map phenylpropanoid pathway genes in two common bean RIL populations, BAT93 × Jalo EEP 558 and OAC Rex × SVM Taylor Horticulture, in order to find possible associations between colour genes and phenylpropanoid genes in bean,

2) fully characterize a few main genes from the phenylpropanoid pathway in common bean, and

3) uncover collinearity of phenylpropanoid genes between the bean and soybean genomes with the help of a new approach to whole-genome comparison.

III. MAPPING OF PHENYLPROPANOID PATHWAY GENES IN COMMON BEAN (Phaseolus vulgaris L.)

Abstract

Seed coat colour is one of the main characteristics that define market classes in beans. Previous genetic analyses identified 15 genes that control seed coat pattern and colour in common bean and some of them have been positioned on the common bean linkage map. It has been hypothesized that genes involved in the phenylpropanoid pathway correspond to some of the classical seed coat colour genes in bean. In a previous study we cloned and sequenced fragments of thirty-five phenylpropanoid pathway genes from common bean. The purpose of this study was to position the phenylpropanoid genes on the common bean linkage map and determine whether their positions correspond to those determined for any of the classical seed coat colour genes.

Polymerase chain reaction (PCR) and restriction fragment length polymorphisms (RFLP) were used to map the phenylpropanoid pathway genes in two RI populations derived from BAT93 × Jalo EEP 558 and OAC Rex × SVM Taylor Horticulture. The segregation patterns of 18 phenylpropanoid pathway genes (IFR, FBP4, CHR, LAC, IFS, 4CL, AS, CAD, F5H, Myb transcription factor, RT, IOMT, VT, CCR, F3’H, LAR, PAL1, PAL2) in the BAT93 × Jalo EEP 558 recombinant inbred line population were analysed and their locations in the bean linkage map were determined using JoinMap analysis. Five out of 18 genes were mapped within 2-17 cM of colour gene loci. In particular, PAL1 (13.2 cM), PAL2 (17.1 cM) and the Myb transcription factor (7.8 cM) were associated with P (seed coat colour ground factor), CHR (7 cM) with G (yellow seed coat) and CAD (2.1 cM) with B (brown seed coat). Polymorphisms for five phenylpropanoid pathway genes in OAC Rex × SVM Taylor were used to place them on the linkage map of this population. DFR1 was mapped 13.9 cM from a flower colour locus and the marker for this gene was significantly associated with this trait. Further studies are needed to confirm the roles of the phenylpropanoid genes as potential colour genes.

Ten SSR markers and 89 SNP markers were also scored for the 89 inbred lines derived from the cross between ‘OAC Rex’ and ‘SVM Taylor’. Their map locations were added to the existing genetic map of this population.

Introduction

Considerable variability exists in common bean (Phaseolus vulgaris L.) for seed characteristics, and consumers have specific preferences for various combinations of size, shape, and colour of the dry bean seeds. Seed coat colour and seed size are the two main characteristics that identify the numerous market classes recognized throughout the world (Beninger et al. 1998a). Genetic analyses have identified specific genes that control seed coat pattern (T, Z, L, J, Bip, and Ana) and colour (P, C, R, J, D, G, B, V, and Rk) (Prakken, 1970; Prakken, 1972). In Phaseolus, C, D, and J are the colour genes, whereas G, B, V and Rk are modifying genes that have intensifying or darkening effects upon pale colours (Prakken, 1970). The distinction between these two groups is based on their interaction with the ground factor P. In addition, many of the seed coat colour genes exhibit epistatic interactions with other genes for partly coloured patterns (e.g., T, Z, Bip, Fib) that define the many seed coat patterns observed within the species (Bassett, 2007).

The seed coat colours of dry beans are determined by the presence and amounts of flavonol glycosides, anthocyanins, and condensed tannins (proanthocyanidins) (Feenstra, 1960; Beninger and Clifford 1998). To date, our knowledge of the flavonoids resulting from the action of the seed coat colour determining genes has been inferred from other plant systems such as maize (Zea mays L.), petunia (Petunia hybrida Hort. Vilm-Andr.), and snapdragon (Antirrhinum majus L.) (Koes et al. 1994). However, the pathways and products resulting from the seed coat colour genes in P. vulgaris have not been elucidated and remain speculative.

Many of the flavonoid pigments that give rise to seed coat colour in beans may also impart positive health benefits as antioxidants (Hagerman et al. 1998; Beninger and Hosfield, 2003; Romani et al. 2004; Takeoka et al. 1997). On the other hand, some flavonoids may cause beans to darken in colour upon aging and make them hard to cook and digest (Aw and Swanson, 1985). It has also been shown that flavonoid compounds make the plant and seeds resistant to some diseases (Islam et al. 2003) and pests (Onyilagha et al. 2004).

Resolution of the genes responsible for flavonoids and tannin formation, along with the antioxidant activity of these compounds may enable breeders to select for varieties that have a range of antioxidant activities and also, perhaps, balance antioxidant activity with antinutritional effects. This relationship is complex and better understanding of it will help breeders to select germplasm with improved nutritional quality without adversely affecting disease or pest resistance.

Materials and methods

Plant materials

A recombinant inbred population of 75 individuals was obtained from the University of California and used for mapping the phenylpropanoid genes in this study. This population was developed by single seed descent from a cross between ‘BAT93’ and ‘Jalo EEP558’ (Nodari et al. 1992). The BJ RI population is considered to be the common bean core mapping population (Freyre et al. 1998) and has been used to map more than two thousand molecular markers (http://phaseolusgenes.bioinformatics.ucdavis.edu/). BAT93 is a multiple disease resistant breeding line belonging to the Mesoamerican gene pool from the Centro Internacional de Agricultura Tropical (CIAT), Cali, Colombia. Jalo EEP558 is a Brazilian genotype released by the Estação Experimental de Patos de Minas (Nodari et al. 1993) from the Andean gene pool (Freyre et al. 1998). The parents have different morphologies and agronomic traits such as seed size, flower colour and the presence or absence of seed corona (Freyre et al. 1998).

The second population used in this study is an F4 recombinant inbred (RI) population derived from a cross between OAC Rex and SVM Taylor. Eighty-nine individuals of this population were used to map the phenylpropanoid genes and SSR and SNP markers. OAC Rex is a CBB (Common Bacterial Blight) resistant cultivar that was developed by the bean breeding program at the University of Guelph (Michaels et al. 2006). It is a small white seeded cultivar from the Mesoamerican gene pool and has an indeterminate growth habit. SVM Taylor Horticulture (AAFC, 2000) is a large seeded cranberry bean from the Andean gene pool. The seed coat of this cultivar has red mottled flecks on a white to beige background. It has a determinate growth habit and is susceptible to CBB.

DNA extraction, digestion, and electrophoresis

Total genomic DNA from frozen leaf tissue (2-3 g) that had been harvested prior to flowering was extracted using the Cetyl Trimethyl Ammonium Bromide (CTAB) method as described by Doyle and Doyle (1987). The target fragments in parental lines (BAT93, JaloEEP558, OAC Rex and SVM Taylor) were amplified using polymerase chain reaction (PCR) with Taq DNA polymerase (New England Biolabs, Ipswich, MA). PCR reaction mixtures contained approximately 25 ng of total DNA, 0.2 mM of dNTP, 0.2 µM of forward and reverse primers, standard Taq buffer with 1.5 mM MgCl2, and 1 unit of Taq polymerase in a total reaction volume of 20 µl. The PCR cycle consisted of 2 min at 94 ºC and 35 cycles of 30 s at 94 ºC, 45 s at 55 ºC and 1 min at 72 ºC followed by a 7 min extension at 72 ºC. The amplicon was run on a 1% agarose gel.

Table 2 shows the list of gene-specific primers that were used to amplify phenylpropanoid genes. They were designed based on sequences for the genes in OAC Rex as determined by Dr. Yarmilla Reinprecht (unpublished).

Table 2. Primers for genes that were mapped as PCR markers and the GenBank accession number from which the primers were designed. E value is an indicator of similarity between the bean sequences cloned and sequenced by Yarmilla Reinprecht and the gene sequence source.

Gene          Primer                        Gene sequence source   Bean accession number   E value
IFR           F: CAGTGCCATCTCTGGTGTTC       P. coccineus           CV670750                e-139
              R: AGAGAGGCTGCCAGCAAA
FBP4 (caps)   F: TGACGGTTCTGTTCTCATTTCC     P. vulgaris            CW652095                0.0
              R: TGCCCATCTTCACCATAGCATTT                           CW652096                0.0
CHR           F: TGCCTTTGAGGTTGGCTACAGA     G. max                 CW652099                6e-55
              R: ACAGCAGGAGGGATGGTTGC                              CW652100                2e-78
LAC           F: CCACTCCCTGCTTACAACGACA     G. max                 CV670743                e-142
              R: CCCGAAACCCTCTGCAACAA
IFS           F: GGCGAGGCTGAGGAGATCAGA      G. max                 CV670762                6e-23
              R: CTGGGAGTGGTGGGTGCATT

For FBP4, which was mapped as a CAPS marker in the core population, 10 µl of PCR product was digested with 3 units of MboI restriction endonuclease for one hour at 37 ºC.

To select the restriction endonuclease that showed polymorphisms with most of the genes, the probes were hybridized to blots containing genomic DNA samples (5-10 µg) from parents that were digested with EcoRI, EcoRV, HaeIII, HindIII, DraI and PstI restriction endonucleases. Restriction digestions (5 U enzyme/µg of genomic DNA) were performed according to the manufacturer's recommendations (Invitrogen) for 10-12 h at 37 ºC. The digested DNA was electrophoresed in 0.8% agarose gels (prepared with TAE: 40 mM TRIS-acetate pH 7.4, 1 mM EDTA) for 16-20 h (1 V/cm of gel) in TAE running buffer.

The same procedure was used with individuals in the population.

Southern blotting, hybridization and autoradiography

Hybridization was performed according to standard protocols (Maniatis et al. 1989). Partial sequences of phenylpropanoid pathway genes were labelled for Southern hybridization using the PCR DIG Probe Synthesis kit (Roche). Approximately 20-30 µg of digested genomic DNA was separated on a 0.8% agarose gel and transferred to positively charged nylon membranes by capillary transfer. Hybridization and detection were performed using the DIG Easy Hybridization, DIG Wash and Block Buffer Set, and DIG Luminescent Detection kit (Roche) according to the manufacturer’s instructions. Probes detecting restriction fragment length polymorphisms (RFLPs) were then hybridized to Southern blots of RIL DNAs digested with the appropriate enzyme.

SSR and SNP genotyping in the OAC population

Ten SSR markers were genotyped in the RIL population of OAC Rex × SVM Taylor. PCR reaction mixtures contained approximately 25 ng of total DNA, 0.2 mM of dNTP, 0.2 µM of forward and reverse primers, standard Taq buffer with 1.5 mM MgCl2, and 1 unit of Taq polymerase in a total reaction volume of 20 µl. The PCR cycle consisted of 2 min at 94 ºC and 35 cycles of 30 s at 94 ºC, 45 s at 50-56 ºC and 1 min at 72 ºC followed by a 7 min extension at 72 ºC. Annealing temperature and product size-fractionation on agarose gel for SSR markers can be found in Appendix 3. The amplicon was run on a 3% MetaPhor® gel.

SNP genotyping of this population was carried out by Genome Quebec using the Sequenom MassARRAY iPLEX platform. In this assay a locus-specific primer which anneals immediately upstream of the polymorphic site of interest is used in a PCR reaction. Extension of the primer is carried out with mass-modified dideoxynucleotide terminators and, through the use of MALDI-TOF mass spectrometry, the mass of the extended primer is determined. This mass indicates the sequence and, therefore, the alleles present at the polymorphic site of interest.

A list of primers and the single nucleotide polymorphism for each marker can be found in Appendices 1 and 2.

Statistical and linkage analysis

Segregation data were collected for each gene either as a PCR marker or by hybridizations with multiple probes. Linkage analysis was performed using a total of 680 and 172 markers in the core and OAC populations, respectively. For the former, marker data were obtained from Paul Gepts. The linkage relationships of the segregating markers were analyzed with the multipoint linkage analysis software JoinMap 3.0 (Stam and Van Ooijen 1995). Markers were sorted into distinct groups using a LOD of 4.0, a distance of 30 cM and the Kosambi mapping function as default linkage criteria.
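For reference, the Kosambi mapping function converts a recombination fraction r into a map distance in centiMorgans, d = 25 ln((1 + 2r)/(1 − 2r)). The following is a minimal Python sketch of that conversion and its inverse, given for illustration only; JoinMap performs this calculation internally, and the function names here are hypothetical rather than part of any software used in this study.

```python
import math


def kosambi_cm(r):
    """Kosambi map distance (cM) for a recombination fraction r in [0, 0.5)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


def kosambi_r(d_cm):
    """Inverse Kosambi function: recombination fraction for a distance in cM."""
    return 0.5 * math.tanh(d_cm / 50.0)


if __name__ == "__main__":
    # Illustrative value only: r = 0.10 is not a result from this study.
    print(f"r = 0.10 corresponds to {kosambi_cm(0.10):.2f} cM")
    # Recombination fraction implied by the 30 cM distance criterion.
    print(f"30 cM corresponds to r = {kosambi_r(30.0):.3f}")
```

Because the Kosambi function allows for crossover interference, the distances it returns are shorter than Haldane distances for the same recombination fraction, and the difference grows as r approaches 0.5.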

Analysis of variance (ANOVA) and correlation were performed to determine the association between flower colour and DFR1 gene. PROC GLM and PROC CORR Spearman protocols (SAS Institute Inc, 1999) were used respectively.

Results

The PCR amplifications using degenerate primers designed in a previous study (Reinprecht et al., unpublished) were successful in identifying homologous sequences potentially coding for phenylpropanoid pathway genes in common bean. BLAST searches of GenBank revealed that the sequences obtained in the previous study are very similar to the respective genes in other legumes. Of the 35 genes that were previously identified for the phenylpropanoid pathway in bean, this study placed 18 on the core (BAT93 × JaloEEP558) map and 5 on the OAC Rex × SVM Taylor genetic map.

Marker polymorphism and segregation

Core population

When gene-specific primers were used to amplify fragments from the phenylpropanoid pathway genes in ‘BAT93’ and ‘Jalo EEP558’ four genes (IFR, IFS, CHR and LAC) gave polymorphic bands (Fig 10, Table 3). The individuals in the RI population were scored for the polymorphisms observed in the band sizes amplified with the primers for the 4 gene markers (Fig 10) and their distribution did not differ from the expected 1:1 distribution (Table 4).

Table 3. Polymorphisms of the phenylpropanoid genes in the Core population. Genes and corresponding restriction enzymes are specified with polymorphic band sizes in the right two columns.

Gene          Enzyme                           Band size (bp)
                                               Jalo          Bat
Mapped by PCR
IFR           Isoflavanone reductase           1000          1200
IFS           Isoflavanone synthase            1100          -
CHR           Chalcone reductase               600           600, 700
LAC           Laccase                          1100          1200
FBP4 (caps)   Lignin peroxidase                1100          850
RFLP (EcoRI)
4CL1          4-coumarate CoA ligase           8000          15000
AS1           Glutathione S-transferase        20000         23000
CAD           Cinnamyl alcohol dehydrogenase   -             4800
F5H           Ferulate 5-hydroxylase           4500          -
Myb           Myb transcription factor         5000          6500
RT            Rhamnosyl transferase            6000          -
IOMT          7-O-methyltransferase            6500          5500
VT            Vacuolar transporter             -             4300
RFLP (HindIII)
CCR           Cinnamoyl CoA reductase          6200, 6400    6200, 7000
F3’H          Flavonoid 3’ hydroxylase         7000          4000
LAR           Anthocyanidin reductase          4000          4300
PAL1          Phenylalanine ammonia lyase      2500          2900
PAL2          Phenylalanine ammonia lyase      2700          3000

[Figure 10 image: gel panels for IFR, IFS, CHR, LAC and FBP4; lanes in each panel are Jalo, Bat and RIL individuals.]

Figure 10. Five phenylpropanoid genes (IFR, IFS, CHR, LAC and FBP4) were mapped by PCR in the core RIL population. Gel electrophoresis of amplicons from parental lines and RI individuals from this population of P. vulgaris is shown. FBP4 was mapped as CAPS marker and the image shows the migration pattern of digested PCR product. Migration positions of a linear DNA marker ladder are shown at left. Arrow shows the polymorphic band which was scored and used in JoinMap analysis.

FBP4 was scored as a CAPS (Cleaved Amplified Polymorphic Sequence) marker since the PCR products that were amplified with the FBP4 primers were the same size (1100 bp) with both parental DNA samples. However, after treating FBP4 amplicon with MboI restriction endonuclease an 850 bp band was observed in the Bat sample, whereas, the Jalo fragment remained at 1100 bp (Fig 10; Table 3). Distribution of alleles in the RIL population was not significantly different from 1:1 (Table 4).

To map genes as RFLP markers, Southern blotting was carried out using cloned and partially sequenced phenylpropanoid pathway genes as probes. Six different restriction endonucleases (EcoRI, EcoRV, HaeIII, HindIII, DraI and PstI) were used to digest genomic DNA to test their ability to generate polymorphic markers between the parents and to determine which endonuclease resulted in polymorphisms for most of the genes. On the basis of these preliminary tests, HindIII and EcoRI were selected to screen the individuals in the RI population. The resulting RFLP markers based on the phenylpropanoid pathway genes were sized between 300 - 1000 bp (Fig 11; Table 3). Eight of these genes (4CL, AS, CAD, F5H, Myb, RT, IOMT, VT) showed polymorphisms using EcoRI and five (CCR, F3’H, LAR, PAL1, PAL2) showed polymorphisms when HindIII was used as the restriction endonuclease (Fig 11, Table 4).

[Figure 11 image: Southern blot panels for CAD1, AS1, 4CL1, F5H, Myb, F3’H, CCR, IOMT and RT, each with lanes for Jalo, Bat and RIL individuals. Labelled band sizes: 20000, 8000 and 4800 bp (first row); 4500, 6500 and 4000 bp (second row); 7000, 7500 and 6000 bp (third row).]

Figure 11. 13 phenylpropanoid genes (4CL1, AS1, CAD, F5H, Myb, RT, IOMT, VT, CCR, F3’H, LAR, PAL1 and PAL2) were mapped as RFLP markers in the core RIL population. Southern blot of genomic DNA of parental lines and RI individuals from the core population of P. vulgaris is shown. Migration positions of a linear DNA marker ladder are shown at left. Arrow shows the polymorphic band which was scored and used in JoinMap analysis.

The allele frequencies for 13 of the 14 markers mapped as RFLPs fit a 1:1 segregation ratio in the core mapping population (Table 4). Only the allele frequency for the Myb TF marker deviated significantly (P < 0.05) from the expected 1:1 ratio. Segregation of this marker was 1:2.3, biased toward the BAT93 allele.

Table 4. Chi-square test of segregation ratios for phenylpropanoid genes and their linkage map positions in the BJ population.

Common bean    Segregation ratio observed              χ2 (1:1)    Map location
homolog        Bat 93 allele    Jalo EEP558 allele                 (linkage group)
LAR            36               28                     1.00        Pv02
4CL1           36               29                     0.75        Pv03
IFR            33               40                     0.67        Pv03
IFS            33               38                     0.35        Pv05
F3’H           34               24                     1.72        Pv09
IOMT           27               33                     0.60        Pv06
LAC            40               34                     0.49        Pv01
PAL1           29               24                     0.47        Pv07
PAL2           35               30                     0.38        Pv07
CAD1           36               31                     0.37        Pv02
VT             27               23                     0.32        Pv08
CHR            39               31                     0.91        Pv04
AS1            35               32                     0.13        Pv08
RT1            35               33                     0.06        Pv03
CCR            30               29                     0.02        Pv10
Myb            47               20                     10.88*      Pv07
F5H            32               34                     0.06        Pv03
FBP4           29               31                     0.07        Pv06
* Significantly different from segregation ratio of 1:1 at P = 0.05.
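The χ2 values in Table 4 can be reproduced from the allele counts with a goodness-of-fit test against a 1:1 expectation. The following is a minimal sketch rather than the analysis script used in this study; the function names are illustrative, and 3.841 is the standard critical value for one degree of freedom at P = 0.05.

```python
# Chi-square goodness-of-fit test for a 1:1 segregation ratio (1 degree of freedom).
CHI2_CRIT_DF1_P05 = 3.841  # standard critical value for df = 1, P = 0.05


def chi_square_1to1(count_a, count_b):
    """Return the chi-square statistic for two allele counts against a 1:1 expectation."""
    expected = (count_a + count_b) / 2
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


def is_distorted(count_a, count_b, critical=CHI2_CRIT_DF1_P05):
    """True when the observed counts deviate significantly from 1:1."""
    return chi_square_1to1(count_a, count_b) > critical


# Allele counts (Bat 93, Jalo EEP558) copied from Table 4 for three markers.
markers = {"LAR": (36, 28), "4CL1": (36, 29), "Myb": (47, 20)}

for name, (bat, jalo) in markers.items():
    chi2 = chi_square_1to1(bat, jalo)
    flag = "*" if is_distorted(bat, jalo) else ""
    print(f"{name}: chi2 = {chi2:.2f}{flag}")
```

Only markers whose statistic exceeds the critical value, such as Myb, are flagged as showing segregation distortion.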

OAC Rex × SVM Taylor RI population

Five phenylpropanoid pathway genes were polymorphic when genomic DNA of ‘OAC Rex’ and ‘SVM Taylor’ was amplified using the gene-specific primers (Fig 12). PCRs with IFR, DFR1, CHR, CAD and PAL3 primers gave polymorphic amplicons between 300 and 1200 bp with parental DNA (Table 5) and the individuals in the RI population. The distributions of the alleles were not significantly different from 1:1 except for CHR, which was biased 2:1 in favour of the SVM Taylor allele (Table 6).

Table 5. Polymorphisms for genotyping phenylpropanoid genes in the OAC population and the amplicon sizes for the parental lines of the population.

Gene    Enzyme                            Band size (bp)
                                          Rex         Taylor
IFR     Isoflavanone reductase            1200        1000
DFR1    Dihydroflavonol 4-reductase       300         500
CHR     Chalcone reductase                600, 700    600
CAD     Cinnamyl alcohol dehydrogenase    1900        1400
PAL3    Phenylalanine ammonia lyase-3     650         750

[Figure 12 image: gel panels for PAL3, CHR, DFR-B, IFR and CAD, each with lanes for OAC, SVM and RIL individuals. Labelled band sizes: 700, 650 and 300 bp (upper panels); 1400 and 1200 bp (lower panels).]

Figure 12. Five phenylpropanoid genes (PAL3, CHR, DFR1, IFR and CAD) were mapped by PCR in the OAC Rex × SVM Taylor RIL population. Gel electrophoresis of amplicons from parental lines and RI individuals from this population of P. vulgaris is shown. Migration positions of a linear DNA marker ladder are shown at left. The arrow shows the polymorphic band that was scored and used in the JoinMap analysis.

Table 6. Chi-square test of segregation ratios for phenylpropanoid genes and their linkage map positions in the OAC population.

Common bean    Segregation ratio observed              χ2 (1:1)    Map location
homolog        OAC Rex allele    SVM Taylor allele                 (linkage group)
IFR            40                27                    2.52        Pv03
DFR1           40                28                    2.12        Pv01
CHR            25                48                    7.25*       Pv04
CAD            21                32                    2.28        Pv02
PAL3           33                31                    0.06        Pv08
* Significantly different from segregation ratio of 1:1 at P = 0.05.

Construction of the linkage map

Core population

The locations of the phenylpropanoid pathway genes in the common bean linkage map, based on the segregation of the markers developed above, are shown in Fig 13. The original linkage map that was used to build the current map was obtained from Paul Gepts and had 680 markers. However, for clarity the linkage map in Fig 13 shows only 282 markers. The locations of the colour genes are indicated by boxes. The data used to place these genes were obtained from Bassett (2007), McClean et al. (2002) and http://bic.css.msu.edu/_pdf/Bean_Core_map_2009.pdf.

The 18 phenylpropanoid genes (in boldface text on the current map) are distributed throughout the bean genome: F3’H and LAC are on linkage group 1 (LG1), LAR and CAD on LG2, F5H, IFR and RT1 on LG3, CHR on LG4, IFS on LG5, FBP4 and IOMT on LG6, PAL1, PAL2 and Myb TF on LG7, VT and AS1 on LG8 and CCR on LG11 (Table 4). Five out of 18 genes were mapped within 2-17 cM of colour gene loci. In particular, PAL1, PAL2 and the Myb transcription factor mapped 13.2 cM, 17.1 cM and 7.8 cM, respectively, from the basic colour gene P (Lamprecht 1939; Smith 1939) on linkage group 7. CHR mapped 7 cM from G, the yellow-brown factor of Prakken (1970), on Pv04. CAD mapped 2.1 cM from B, the greenish brown factor of Prakken (1970), on Pv02, based on linkage with the dominant I gene for bean common mosaic virus (BCMV) resistance (Vallejos et al. 2006).

OAC Rex × SVM Taylor RI population

The LOD 3.0 linkage map obtained from the OAC Rex × SVM Taylor RI population (Fig 14) covered 750 cM and was based on an original linkage map (Larsen RJ, 2005) with 68 markers, plus 129 markers that were added in the current study (phenylpropanoid genes are in boldface text; SSR, SNP and previously mapped markers are in regular font). It currently contains five phenylpropanoid gene markers (DFR1, PAL3, CAD, IFR and CHR), ten SSR markers and 114 SNP markers that were scored for the 89 inbred lines derived from a cross between ‘OAC Rex’ and ‘SVM Taylor’. Of the 114 SNP markers that were screened, 89 (78%) were polymorphic between OAC Rex and SVM Taylor (Appendix 1). The orientation of the map is consistent with the recently established standard (Pedrosa-Harand et al. 2008).

DFR1 mapped 13.9 cM from a flower colour gene on linkage group 1 that was previously mapped by Larsen et al. (2005). Four flower colours were observed in the OAC Rex × SVM Taylor RI population, namely purple, violet, pink and white. On the basis of its chromosomal location, this trait was predicted to be determined by the colour gene V. Cosegregation analysis of this trait and the DFR1 gene is shown in Table 7. The segregation ratio of coloured versus white flowers was 1:1, in accordance with the ratio expected for gene segregation in an RI population. The DFR1 segregation ratio was also not significantly different from 1:1 (Table 7).

Table 7. Cosegregation of flower colour with DFR1 in the OAC Rex × SVM Taylor population.

                     Coloured flower vs white flower       DFR1
Segregation ratios   Coloured flower    White flower       A     H     B
Observed             43                 45                 40    19    29
Expected             44:44                                 44:44
Ratio                1:1                                   1:1
χ2                   0.05                                  2.12
χ2: not significant (P = 0.05).

Single marker analysis (ANOVA) showed significant linkages between DFR1 gene and flower colour (Table 8).

Table 8. Single marker analysis of DFR1 and flower colour in OAC Rex × SVM Taylor population.

Source of Variation    SS          df    MS          F           P-value     F crit
Between Groups         3.970819    1     3.970819    20.34568    2.77E-05    3.98856
Within Groups          12.6859     65    0.195168
Total                  16.65672    66

Rank correlation analysis was performed between flower colour and the DFR1 gene. The result identified a positive correlation between the two (r = 0.41442, P < 0.0001). Seventeen percent of the variation in flower colour can be explained by DFR1.
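The F statistic in Table 8 and the proportion of variation explained by DFR1 follow directly from the reported sums of squares and the correlation coefficient. The sketch below only re-derives those two numbers from the tabulated values; it is not the software used for the original analysis, and the function names are illustrative.

```python
# Recompute the F statistic of Table 8 from its sums of squares and degrees of freedom.
def mean_square(sum_of_squares, degrees_of_freedom):
    return sum_of_squares / degrees_of_freedom


def f_statistic(ss_between, df_between, ss_within, df_within):
    """F = MS(between groups) / MS(within groups)."""
    return mean_square(ss_between, df_between) / mean_square(ss_within, df_within)


# Values copied from Table 8.
ss_between, df_between = 3.970819, 1
ss_within, df_within = 12.6859, 65

f_value = f_statistic(ss_between, df_between, ss_within, df_within)

# Squared correlation coefficient: the share of variation in flower colour
# associated with the DFR1 genotype.
r = 0.41442
variance_explained = r ** 2

print(f"F = {f_value:.2f}")
print(f"r^2 = {variance_explained:.3f}")
```

The squared coefficient of about 0.17 corresponds to the seventeen percent quoted in the text.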

Figure 13. Core linkage map of the common bean recombinant inbred population. Symbols for colour genes are boxed and in boldface, and those for phenylpropanoid pathway genes are in boldface text. Centimorgan (cM) distances between markers ordered at a LOD score of 4.0 are shown to the left of each linkage group.

Figure 14. ‘OAC Rex’ × ‘SVM Taylor’ linkage map of the common bean recombinant inbred population. Phenylpropanoid pathway genes are in boldface text. Centimorgan (cM) distances between markers ordered at a LOD score of 3.0 are shown to the left of each linkage group.

Integrated genetic map and genome sequence

The initial version of the genome sequence of the common bean is available on Phytozome. This genome assembly is approximately 521.1 Mb arranged in 708 scaffolds, and 98.8% of the assembled sequence is contained in 11 chromosomes. Phenylpropanoid genes were positioned on the chromosomes based on sequence similarity, which adds confidence to the positioning of these genes on the bean genetic map. The positions of seven genes that were mapped close to colour genes were determined on the bean chromosomes (Fig 15).

Colinearity between the genetic linkage maps and the physical map is very clear, as shown in Fig 15. There are minor disagreements in ordering, which are easily identified on the integrated genetic map by intersecting transverse lines connecting corresponding loci on the linkage groups and chromosomes. Such differences could be due to problems with markers, a misassembly in the reference sequence, or an actual polymorphism between the mapping population and the sequenced individual. However, without additional sequencing data from the sequenced individual it is not possible to distinguish between a misassembly and a true polymorphism.

Figure 15. Integrated genetic map and sequence assembly. Vertical bar in the center represents chromosomes in the preview release of the assembly on Phytozome (physical map). Bars on the right side represent the genetic linkage map of the core population. The left bar shows the genetic linkage map of OAC Rex × SVM Taylor. All bars are drawn to scale. Transverse lines link the location of each marker on the genetic and physical maps.

Discussion

Phenylpropanoid genes in the core population

The results of the current study indicate that markers for a structural gene (PAL) and a regulatory gene (Myb) in the phenylpropanoid pathway map close to the P locus on Pv07. This is interesting because both are good candidates for the P gene. The P gene controls the presence or absence of colour production in the bean seed (Bassett, 2007). Plants that are homozygous “pp” have white seed coats and flowers regardless of the genotype at any other gene in the complex genetic system controlling seed coat colour. Only plants with PP or P_ genotypes will express the colour genes located at other loci determining seed coat colour. Several P alleles have been identified and, on the basis of an allelism test, the dominance order of the six known alleles at the P locus is as follows: P > p^mic > p^hbw > p^stp > p^gri > p, where P = dominant; p^mic = micropyle stripe; p^hbw = half banner white; p^stp = stippled; p^gri = grayish white colour; and p = recessive (Bassett, 2007). A QTL for tannin content was also found at the P locus in the DOR364 × G19833 population, suggesting that additional alleles for this trait exist even in coloured beans (Caldas and Blair, 2009).

Erdmann and colleagues (2002) speculated that the P gene could be: 1) a transcription factor necessary to initiate the activation of all of the other genes (such as C, J, V, G, B or Rk) or key individual genes responsible for colour expression in seed coats and flowers or, 2) it could be involved in a key biosynthetic complex consisting of enzymes in the flavonoid biosynthetic pathway.

Phenylalanine ammonia-lyase (PAL; EC 4.3.1.5) is the first enzyme leading from primary metabolism into the phenylpropanoid pathway in plants, which produces colour compounds such as anthocyanins and flavonoids and colour-affecting compounds such as tannins (Hahlbrock and Scheel, 1989). The cosegregation of two PAL loci with the P locus therefore raises the question of whether PAL, and in particular PAL1, is a candidate gene for the P locus. PAL catalyzes the nonoxidative elimination of ammonia from L-Phe to give trans-cinnamate, which is the precursor of all of the phenylpropanoid compounds (Dixon and Paiva, 1995). PAL is also a key enzyme in the plant stress response. Pathogenic attack, tissue wounding, UV irradiation, low temperature, or low levels of nitrogen, phosphate, or iron can stimulate its biosynthesis (Dixon and Paiva, 1995; Sarma and Sharma 1999). PAL is also involved in the biosynthesis of the signaling molecule salicylic acid, which is required for plant systemic resistance (Nugroho et al. 2002). If the P locus encodes the PAL gene, plants with a homozygous recessive condition may have a completely inactive phenylpropanoid pathway because the starting substrate (trans-cinnamate) is lacking.

Perhaps more intriguing was the finding that Myb15 was linked to the P locus at a recombination distance of 7.8 cM, slightly closer than the PAL genes, making it a better candidate. In plants, the structural genes of the flavonoid biosynthetic pathway are largely regulated at the transcription level (Sainz et al. 1997). In all species studied to date, the expression of anthocyanin biosynthetic genes is regulated through a complex of MYB transcription factors (TFs), basic helix-loop-helix (bHLH) TFs and WD-repeat proteins (the MYB-bHLH-WD40 "MBW" complex) (Baudry et al. 2004). In a proposed model for the activation of structural pigmentation genes (Koes et al. 1994), regulators interact with each other to form transcriptional complexes in conjunction with the promoters of structural genes. Myb TFs are characterized by a conserved DNA-binding domain consisting of single or multiple imperfect repeats (Allan et al. 2008). The MYB domain is a region of about 52 amino acids that binds DNA in a sequence-specific manner (Allan et al. 2008). Cis-acting elements that bind Mybs have been identified in several genes for enzymes in the phenylpropanoid pathway, including F3H, DFR, ANS, UFGT, F3′H and FLS (Schwinn et al. 2006). The MYB and bHLH TFs interact and form transcription complexes that regulate anthocyanin biosynthesis in plants (Schwinn et al. 2006; Holton and Cornish, 1995; Borovsky et al. 2004). These complexes can bind to the promoters of flavonoid structural genes to promote or repress the expression of specific genes (Winkel-Shirley, 2001). The suggestion that Myb15 is a candidate gene for P is consistent with the suggestion by Erdmann and colleagues (2002) that the P factor could be a transcription factor that controls the expression of many phenylpropanoid pathway genes in a coordinated fashion.

Another interesting result of this study is that chalcone reductase (CHR) was mapped close to the marker for the G gene (7 cM from OAP3) on Pv04. G is the “yellow brown factor” of Prakken (1970) that, when dominant, intensifies the brown colour in the seed coat. This gene shows little to no interaction with most other colour genes (Prakken, 1972). CHR is an enzyme that co-acts with CHS to produce a branch in the first step of the flavonoid pathway. In most plants, the product of CHS is the 2’,4’,6’,4-tetrahydroxylated naringenin chalcone, which is the substrate for most of the common flavonoids (Austin and Noel, 2003). In the Leguminosae, chalcones without a hydroxyl at the 6’ position (6’-deoxychalcones) are produced (Ayabe et al. 1988). These compounds are substrates for other flavonoids involved in plant-microbe interactions (Dixon and Paiva, 1995). Before the production of flavonoids and isoflavonoids there is a biosynthetic route that leads to the production of aurones (Farag et al. 2009). Aurones are a yellow-coloured flavonoid subclass with significant roles in flower pigmentation (Nakayama et al. 2001) and defense responses (Paré et al. 1991). The activity of the CHR enzyme may shift primary metabolites toward the production of aurones and thus give a yellow-brown colour to the seed coat.

Another interesting observation concerns cinnamyl alcohol dehydrogenase (CAD), which was mapped close to the B locus (2.1 cM) on Pv02. B is the gene for (Greenish) Gray Brown seed coat (Prakken, 1972) that changes chamois to (greenish) gray brown (Bassett, 2007). The effect of this gene on seed coat phenotype is very environmentally sensitive (Bassett, 2007). Gene B was associated with I, which is a disease resistance gene (Kyle and Dixon, 1988), and was fine mapped to Pv02 (Vallejos et al. 2006). CAD catalyses the last step in the biosynthesis of the lignin monomers (Baucher et al. 1999). Down-regulation of the CAD enzyme was associated with a red colouration of the stem. Although the lignin quantity remained unchanged, the lignin composition was altered (Baucher et al. 1999). The reason for this phenotype is unknown, but it cannot be excluded that the colouration might be due to the presence of a pigment. The latter might be indicative of a switch toward the anthocyanin branch in the phenylpropanoid pathway.

Cosegregation of DFR1 and flower colour in bean

Cosegregation analysis of the flower colour phenotype (coloured vs white) and the DFR1 gene (Tables 6 and 7) implies a strong link between the gene and the trait. DFR1 was mapped on linkage group 1 of the OAC Rex × SVM Taylor RIL population, 13.9 cM from the flower colour locus. OAC Rex × SVM Taylor is a RIL population that segregates for flower and seed coat colour. SVM Taylor has a crème beige background with red striping and a yellow hilum ring. OAC Rex has an all-white seed coat. Four flower colours (purple, violet, pink and white) were observed in this population, and this trait was phenotypically mapped on Pv01 of this population (Larsen JR, 2005).

DFR1 also segregates in this population. The long distance between flower colour and DFR1 might be explained by a lack of markers on this linkage group. In soybean, there are two loci associated with flower colour, W3 and W4. DFR2 cosegregates with the W4 locus and DFR1 is most likely associated with the W3 locus (Yang et al. 2010). In another study, variegation in the flower colour of Linaria was attributed to an unstable mutant allele of the DFR gene. This allele carries an insertion of a transposon belonging to the CACTA family, which blocks its expression and results in an ivory flower colour phenotype. The transposon is occasionally excised in dividing epidermal cells to produce clonal patches of red tissue on the ivory background, and in cells giving rise to gametes to generate reversion alleles conferring a fully coloured phenotype (Galego and Almeida, 2007). Previous studies have shown that overexpression of the DFR genes from cranberry and M. truncatula in tobacco (N. tabacum) resulted in an increase in anthocyanin accumulation and a change in flower colour (Aida et al. 2000; Xie et al. 2004; Polashock et al. 2002; Davies et al. 2003). Transgenic tobacco plants harboring the 35S:PtrDFR1 transcription cassette from poplar also produced much darker pink flowers than were observed on wild-type control plants. Further, compared to wild-type control plants, a significantly higher accumulation of anthocyanins was detected in the PtrDFR1 overexpression transgenic tobacco plants (Huang et al. 2012).

Although the evidence shows a strong correlation between the flower colour phenotype and the DFR1 gene of the phenylpropanoid pathway, the fact that four different colour phenotypes are observed in bean flowers (purple, violet, pink and white) suggests that at least one other gene is involved in this trait. Further genetic studies are needed to resolve this ambiguity.

IV. MOLECULAR CHARACTERIZATION OF PHENYLPROPANOID GENES IN COMMON BEAN (Phaseolus vulgaris L.)

Abstract

Six phenylpropanoid genes were studied in detail in common bean: two members of the chalcone synthase (CHS) gene family (PvCHS-A and PvCHS-B), phenylalanine ammonia lyase 2 (PvPAL2) and phenylalanine ammonia lyase 3 (PvPAL3), dihydroflavonol 4-reductase (PvDFR3) and a Myb transcription factor (PvMyb15). Partial sequences of these genes were used as probes to screen the common bean (G19833) genomic DNA BAC library, and one positive clone for each gene was fully sequenced using the 454 next-generation sequencing method. The similarities between these genes and their corresponding orthologs in other plant species (especially soybean), the copy number for PvCHS-B, the expression patterns for PvCHS-A, PvCHS-B and PvDFR, and the cis-acting elements of PvMyb15 are discussed in detail.

Introduction

Phenylalanine ammonia-lyase (PAL; EC 4.3.1.5) is the first enzyme leading from primary metabolism into the important secondary phenylpropanoid metabolism in plants (Hahlbrock and Scheel, 1989). PAL catalyzes the nonoxidative elimination of ammonia from L-Phe to give trans-cinnamate, which is the precursor of numerous phenylpropanoid compounds (Dixon and Paiva, 1995). PAL is usually encoded by a small multigene family (Cramer et al. 1989; Bolwell et al. 1985). Arabidopsis has four putative isoenzymes (Cochrane et al. 2004). In raspberry and bean, it is encoded by a multigene family with few members (Kumar and Ellis, 2001; Cramer et al. 1989). Potato appears to be an exception, with more than 40 copies reported (Joos and Hahlbrock, 1992).

Chalcone synthase (CHS) is a well-studied plant-specific type III polyketide synthase (PKS; a family of enzymes or enzyme complexes that produce polyketides) that catalyzes the sequential decarboxylative condensation of p-coumaroyl-CoA with three molecules of malonyl-CoA. The product of this reaction is a new aromatic ring system, naringenin chalcone, the key intermediate in the biosynthesis of flavonoids (Watanabe et al. 2007). This product may be further modified in a number of subsequent biochemical steps to yield many different end products (Durbin et al. 2000). It is now known that CHS is encoded by a small multigene family in many species, and most of the CHS genes studied so far in flowering plants contain one intron at a conserved site (Martin, 1993). Studies of the expression patterns of different members of the CHS gene family showed that different copies of CHS have acquired specialized functional roles over the course of evolution and are differentially expressed temporally and spatially (Durbin et al. 2000). One end product of the pathway is anthocyanin, a pigment responsible for seed coat and flower colour in plants (Martin, 1993). Generally, the loss of CHS function results in a lack of anthocyanin and a white flower colour phenotype (Durbin et al. 2000).

Dihydroflavonol 4-reductase (DFR; EC 1.1.1.219) is a key enzyme of the flavonoid pathway leading to common anthocyanins and condensed tannins (CTs). This enzyme reduces dihydroflavonols, which are also intermediates of flavonol biosynthesis through the flavonol synthase reaction, to flavan-3,4-diols. The DFR gene has been isolated from various plants, and single or multiple genes encoding DFR proteins have been reported in a few plant genomes (Helariutta et al. 1993; Tanaka et al. 1995; Himi et al. 2004; Inagaki et al. 1999). Flower colours in ornamental plants have been modified by controlling the expression levels of DFR genes (Aida et al. 2000), and CT levels have been altered by introducing a DFR gene into the forage legume Lotus corniculatus (Robbins et al. 1998).

Three different groups of transcription factors, namely MYB, basic helix-loop-helix (bHLH) and WD40 proteins, have been shown in different species to regulate the phenylpropanoid pathway (Koes et al. 2005). Several plant species such as Arabidopsis, maize, rice (Oryza sativa), petunia (Petunia hybrida), snapdragon (Antirrhinum majus), grapevine (Vitis vinifera L.), poplar (Populus tremuloides) and apple (Malus domestica) have been investigated to elucidate the functions of MYB proteins using both genetic and molecular analyses. As a result of these studies, the mechanisms that control MYB protein activities, gene expression profiles and several target genes have been determined (Dubos et al. 2010). Myb transcription factors are characterized by a structurally conserved DNA-binding domain consisting of single or multiple imperfect repeats. Myb TFs associated with the anthocyanin pathway are of the two-repeat (R2R3) class (Stracke et al. 2001).

Materials and Methods

Extraction of total RNA

Three tester lines (5-593, Pcdj BC3 5-593 and p BC3 5-593) were selected for gene expression studies. Plants were grown in a growth room at 28/22 °C (day/night) with a 16/8 h (day/night) photoperiod, and immature seeds (3-5 mm) were harvested from green pods. Seeds were removed from the pods, and the seed coats of 30-40 seeds per line were harvested and placed into liquid nitrogen until the samples were transferred to an ultra-low temperature freezer (-80 °C). Total RNA samples were prepared from the seed coats using a Qiagen Plant RNA extraction kit according to the manufacturer’s instructions.

Amplification of partial genomic sequence

For genomic sequence amplification, 1 μL (25 ng) of genomic DNA was used in a 20 μL PCR reaction containing 2 μL of 10X PCR Buffer (Invitrogen), 1.2 μL of a stock 25 mM MgCl2 buffer (Invitrogen), 20 μM nucleotide mix, 5 U of Platinum Taq (NEB Laboratory), and 10 pmol each of forward and reverse primers. Cycling conditions were as follows: pre-denaturation at 94 °C for 2 min, followed by 35 cycles of amplification (94 °C for 30 s, 60 °C for 45 s and 72 °C for 1 min) and a final extension at 72 °C for 10 min.
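The final concentrations implied by this reaction setup follow from the dilution relation C1 × V1 = C2 × V2. The sketch below is only a worked check, reading the listed volumes as microlitres (an assumption wherever the unit symbol is unclear in the text); the function and variable names are illustrative.

```python
def final_concentration(stock_conc, stock_volume_ul, reaction_volume_ul):
    """Dilution relation C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock_conc * stock_volume_ul / reaction_volume_ul


REACTION_VOLUME_UL = 20.0  # total PCR volume stated in the text

# 1.2 uL of 25 mM MgCl2 stock in a 20 uL reaction (result in mM).
mgcl2_mm = final_concentration(25.0, 1.2, REACTION_VOLUME_UL)

# 2 uL of 10X buffer in a 20 uL reaction (result in X).
buffer_x = final_concentration(10.0, 2.0, REACTION_VOLUME_UL)

# 10 pmol of each primer in 20 uL; 1 pmol/uL equals 1 uM.
primer_um = 10.0 / REACTION_VOLUME_UL

# 25 ng of template DNA in 20 uL (result in ng/uL).
template_ng_per_ul = 25.0 / REACTION_VOLUME_UL

print(f"MgCl2: {mgcl2_mm:.2f} mM, buffer: {buffer_x:.1f}X")
print(f"primers: {primer_um:.2f} uM each, template: {template_ng_per_ul:.2f} ng/uL")
```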

Table 9. Primer pairs used to amplify PvCHS-A, PvCHS-B, PvDFR, PvMyb, PvPAL2 and PvPAL3.

Gene        Primer
PvCHS-A     F: 5’-CAGATGGTGACAGTCGAGGA-3’
            R: 5’-TGGAACGTTAGTCCCACCTC-3’
PvCHS-B     F: 5’-GGAAAGAGGCTGCAGTCAAG-3’
            R: 5’-GGGCATGGTGGGTGTTATAC-3’
PvDFR       F: 5’-TACAGGGGCTTCTGGTTTCA-3’
            R: 5’-CCTGTGATAGGGGAAAGTGC-3’
PvDFR1      F: 5’-TACAGGGGCTTCTGGTTTCA-3’
            R: 5’-GAGGTACTTCAGGGTATCCTAATCC-3’
PvDFR3      F: 5’-TACAGGGGCTTCTGGTTTCA-3’
            R: 5’-ACGAAATACATCCAACCAGTCATC-3’
PvMyb15     F: 5’-GAGTGATGGCGTGCGGAGTA-3’
            R: 5’-CCCAGCGTCTCATGCAGCTT-3’
PvPAL2      F: 5’-CCCACCGCCGTACCAAACAA-3’
            R: 5’-GCAACTCAAAGAACTCAGAACCAA-3’
PvPAL3      F: 5’-GCAACTCAAAGAACTCAGAACCAA-3’
            R: 5’-GCAGCAATGTAGGACAGAGGAA-3’
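As a quick sanity check on primers such as those in Table 9, GC content and a rough melting temperature can be estimated from the sequence alone. The sketch below uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C); this is a generic approximation for short oligonucleotides and an assumption here, not the method used to design these primers.

```python
def gc_content(seq):
    """Fraction of G and C bases in a primer sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace rule estimate of melting temperature in degrees Celsius."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# PvCHS-A primer pair copied from Table 9.
primers = {
    "PvCHS-A F": "CAGATGGTGACAGTCGAGGA",
    "PvCHS-A R": "TGGAACGTTAGTCCCACCTC",
}

for name, seq in primers.items():
    print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.0%}, Tm ~ {wallace_tm(seq)} C")
```

Estimates of this kind are only a first approximation; nearest-neighbour models give more reliable values when annealing temperatures need to be optimised.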

PCR product detection and sequence analysis

The PCR products were separated by electrophoresis through 1% agarose in 1× TBE, stained with ethidium bromide and visualized with UV. The DNA of target bands was recovered with a Gel Extraction Kit (Qiagen, Canada) and ligated to a TOPO vector (Invitrogen). Transformed DH5α cells were selected on regular LB media with ampicillin and X-gal, and white colonies were cultured in liquid LB with ampicillin. DNA fragment insertion was confirmed by PCR amplification with the amplification primers, and positive clones were sequenced on a Beckman CEQ 2000 using the TOPO5 sequencing primer. BLAST analyses of the nucleotide and deduced amino acid sequences were performed on the NCBI website (http://www.ncbi.nlm.nih.gov/).

RT–PCR characterization of PvCHS-A, PvCHS-B and PvDFR expression in seed coat of P.vulgaris

Semi-quantitative reverse transcription-PCR (RT–PCR) was performed to evaluate the expression of PvCHS-A, PvCHS-B and PvDFR in the developing seed coats of common bean. One microgram of each total RNA sample was used for Oligo(dT)20-directed reverse transcription using a SuperScript III First-Strand Synthesis SuperMix kit (Invitrogen). An aliquot of 1 μL total first strand cDNA from each sample was used as template in a 20 μL standard Taq PCR reaction (described above) using the primer pairs PvCHS-A-F and PvCHS-A-R for PvCHS-A, PvCHS-B-F and PvCHS-B-R for PvCHS-B and PvDFR-F and PvDFR-R for PvDFR (Table 9). The annealing temperature was 60 °C and the amplifications were taken to 35 cycles. For an internal control, primers CHI-F (5′-TCATTTCAGGACCCTTTGAA-3′) and CHI-R (5′-CAACTTTAGTGAGAAGAAAGAGAGAAA-3′) were used to simultaneously amplify a chalcone isomerase gene fragment from the cDNA using the same amplification conditions.

BAC library screening

Filters of the P. vulgaris BAC library, constructed from HindIII-digested and size-selected genomic DNA of cultivar G19833, were obtained from the Clemson University Genomics Institute. This library has 12 genome equivalents and an average insert size of 145 kb, with a total of over 44 thousand clones in the full library (https://www.genome.clemson.edu/).
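The stated library depth can be checked with a back-of-the-envelope calculation. The sketch below assumes 44,000 clones (the lower bound given above) and uses the 521.1 Mb assembly size reported earlier in this thesis as a stand-in for the genome size, which is an assumption. The probability that a given single-copy sequence is present is estimated with the Clarke and Carbon formula, P = 1 - (1 - f)^N, where f is the insert size divided by the genome size.

```python
def genome_equivalents(n_clones, insert_bp, genome_bp):
    """Library depth expressed as multiples of the genome size."""
    return n_clones * insert_bp / genome_bp


def probability_of_recovery(n_clones, insert_bp, genome_bp):
    """Clarke-Carbon estimate: P = 1 - (1 - insert/genome)^N."""
    fraction = insert_bp / genome_bp
    return 1.0 - (1.0 - fraction) ** n_clones


N_CLONES = 44_000       # "over 44 thousand clones" (lower bound)
INSERT_BP = 145_000     # average insert size of 145 kb
GENOME_BP = 521.1e6     # assumption: assembly size used as genome size

coverage = genome_equivalents(N_CLONES, INSERT_BP, GENOME_BP)
p_found = probability_of_recovery(N_CLONES, INSERT_BP, GENOME_BP)

print(f"coverage = {coverage:.1f} genome equivalents")
print(f"P(sequence represented) = {p_found:.6f}")
```

Under these assumptions the calculated depth is close to the 12 genome equivalents quoted for the library, so screening should identify several positive clones per single-copy gene.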

DIG Probe Synthesis

The library was screened with DIG-labeled probes of the PvCHS-A, PvCHS-B, PvDFR3, PvMyb, PvPAL2 and PvPAL3 genes amplified from cloned genomic DNA. The hybridization probes were synthesized using a digoxigenin (DIG) PCR Labeling kit (Roche) according to the manufacturer’s instructions. Using the PCR primers listed in Table 1, each reaction tube contained 20 ng of DNA from JaloEEP558 as a template, 5 μL of 10X DIG PCR Buffer (Invitrogen), 10 mM DIG nucleotide mix and 10 pmol each of forward and reverse primer. Cycling conditions were as follows: pre-denaturation at 94 °C for 2 min, followed by 35 cycles of amplification (94 °C for 30 s, 60 °C for 45 s and 72 °C for 1 min) and a final extension at 72 °C for 10 min. Quantification of the probe solutions showed that they contained ~20 ng DIG-labeled DNA/μL. When the labeling reactions were separated by electrophoresis, they contained slightly larger bands than the fragments amplified by regular PCR, which correspond to the DIG-labeled fragments.

Analysis of BAC clones

Membranes prepared by the Clemson Genomic Institute containing 55296 BAC clones of P. vulgaris cultivar G19833 genomic DNA were screened by Southern hybridization with the DIG-labelled probes. Pre-hybridization was performed overnight at 42 °C in an Eazy-Hyb bag (Roche) containing 30 ml of Eazy-Hyb solution (Roche). The following day, this solution was replaced with 30 ml of Eazy-Hyb solution containing 20 μL of probe, preheated to 42 °C. Prior to mixing with hybridization solution, the probe was heated to 95 °C for 5 min to denature the DNA. The hybridization was conducted at 42 °C overnight.

Following hybridization, the membrane was washed twice at room temperature with 2X SSC and 0.1% SDS for 5 min per wash. A second round of washing was performed at 65 °C with 0.5X SSC and 0.1% SDS for 10 min per wash. The membrane was rinsed with 1X Washing Buffer (Roche) and the bag was filled with 50 ml of Blocking Solution (Roche) and placed on a shaker for 30 min at room temperature. The Blocking Solution was replaced with a second 50 ml aliquot of Blocking Solution containing 12.5 μL of anti-DIG secondary antibody (Roche). The secondary antibody was centrifuged at 13,000 × g for 5 min before adding it to the Blocking Solution. The membrane was incubated at room temperature for 30 min, and washed twice for 15 min each time with 1X Washing Buffer. The membrane was submerged in Detection Buffer for 2 min and Kodak X-Ray film was exposed to the membrane for 10-15 min.

Colony Selection and PCR confirmation

The BAC clones to which the probes strongly hybridized were identified on the basis of the membrane positions and patterns of the hybridization spots seen in the images. BAC clones that strongly hybridized with the probes were requested from the Clemson Genomic Institute. Plasmid DNA was extracted from the clones using a plasmid extraction kit (Qiagen) and the presence of probe sequences was verified by PCR amplification using the primers listed in Table 9. The thermocycler conditions were the same as for probe synthesis. The plasmid preps were diluted 1:1000 with water and 1 μL aliquots of the diluted samples were used as templates in 25 μL of PCR reaction mix (see above).

After electrophoresis, staining and visualization of the PCR products they were compared to the original genomic PCR band. One clone per gene from a few clones with the appropriate size was selected for sequencing.

Plasmid Purification and quantification for DNA Sequencing

Escherichia coli cells carrying individual BAC clones were grown on LB plates supplemented with 12.5 µg/ml of chloramphenicol at 37 °C. For each BAC clone, one colony was picked and cultured in 2 ml LB supplemented with 12.5 µg/ml chloramphenicol overnight at 37 °C on a shaker at 200 rpm. The following day, 200 µl aliquots of each of these cultures were transferred into 200 ml of CircleGrow media (MP Biomedicals, Solon OH) containing 12.5 µg/ml of chloramphenicol. The cultures were grown in 2.8 L flasks and incubated overnight at 37 °C and 200 rpm.

The cells were collected by centrifugation at 4000 × g and 4 °C for 15 min and the pellets were stored on ice for DNA extraction. Twenty ml of buffer P1 (Qiagen) was added to the pellets to resuspend the cells and the suspension was divided between two 50 ml Oakridge tubes, each containing 10 ml. The cells were lysed by adding 10 ml of buffer P2 (Qiagen) to the tubes and mixing twice by rotating the tubes, with 2 min between each mixing. After 5 min, 10 ml of buffer P3 (Qiagen) was added to the tubes and the tubes were rotated to mix the samples and placed on ice for 10-15 min. The lysates were centrifuged at 20,000 × g for 15 min and the supernatants were filtered through filter paper pre-wetted with distilled water.

Approximately 12.5-15 ml (0.6 volumes) of room-temperature isopropanol was added to the filtrates to precipitate the DNA. The DNA was collected by centrifuging the samples at 20,000 × g for 30 min at 4 °C. The pellets were washed with 5 ml of 70% (v/v) ethanol/water and centrifuged at 15,000 × g for 15 min at 4 °C. The pellets were allowed to air dry for no more than 10 min.
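The isopropanol volume follows directly from the 0.6-volume rule. The short Python sketch below is only an illustration: the function name and the example filtrate volumes are assumptions, chosen because filtrates of roughly 21-25 ml correspond to the 12.5-15 ml range quoted above.

```python
def isopropanol_volume(filtrate_ml, ratio=0.6):
    """Return the isopropanol volume (ml) for a given filtrate volume (ml)."""
    if filtrate_ml <= 0:
        raise ValueError("filtrate volume must be positive")
    return ratio * filtrate_ml


# Hypothetical filtrate volumes; the protocol does not state them explicitly.
for filtrate in (21.0, 25.0):
    volume = isopropanol_volume(filtrate)
    print(f"{filtrate:.1f} ml filtrate -> {volume:.1f} ml isopropanol")
```

The same function covers the later 0.7-volume precipitation by passing ratio=0.7.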

To eliminate chromosomal DNA, the DNA pellets were dissolved in 4.25 ml of buffer EX (Qiagen) containing 200 µl of exonuclease solution. The samples were incubated at 37 °C for 1 h and during that time Qiagen columns were equilibrated according to the manufacturer’s instructions (Qiagen). After the exonuclease digestions were complete, 10 ml of buffer QS (Qiagen) was added to each sample and the mixtures were applied to the columns. The columns were washed twice with 30 ml of wash buffer and the plasmid DNA samples were eluted from the columns into new 50 ml Oakridge tubes using three 5 ml aliquots of buffer QF (Qiagen) heated to 65 °C.

The plasmid DNA samples were precipitated using 0.7 volumes of isopropanol. After centrifugation at 20,000 × g for 20 min, the pellets were washed with 5 ml of 70% (v/v) ethanol/water and spun at 20,000 × g for 15 min. The pellets were allowed to air dry for 10 min and were dissolved in 600 µl of TE at 4 °C overnight. The following day, the DNA solution was incubated at 65 °C for 10 min. The DNA concentrations were determined with a Qbit DNA analyzer (BioRad) according to the manufacturer’s instructions. This method measures only double-stranded DNA and is not confounded by interference from residual RNA in the sample. For each clone, 10-15 µg of DNA was used for 454 next generation sequencing at the National Research Centre in Saskatoon.

Sequence assembly and annotation

Sequence reads were trimmed to remove contaminating bacterial (E. coli) genomic DNA, BAC clone backbone (pBeloBAC11) and vector sequences in the UniVec database using CLC Genomics Workbench. The trimmed reads were then assembled using the CLC Genomics Workbench reference assembly (http://www.clcbio.com). The genome sequence of bean (http://www.Phytozome.org) was used as the reference. The consensus sequence was analyzed for the presence of coding regions with two different computer programs, GENSCAN (http://genes.mit.edu/GENSCAN.html) and FGENESH (http://www.softberry.com/), using Arabidopsis and Medicago truncatula, respectively, as the model organisms. The coding regions were compared to known genes in GenBank to uncover their putative functions.

Sequence comparison and phylogenetic analysis

Orthologous protein and nucleotide sequences were retrieved from NCBI (http://www.ncbi.nlm.nih.gov) and aligned with common bean sequences using the ClustalW multiple sequence alignment program (Larkin et al. 2007) with default parameters. The resulting alignments were imported into the Molecular Evolutionary Genetics Analysis (MEGA) software (Tamura et al. 2007). Phylogenetic trees were constructed using the neighbor-joining method (Saitou and Nei, 1987) with 1000 bootstrap replicates.
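Neighbor-joining works from a matrix of pairwise distances between the aligned sequences. The Python sketch below shows one simple way such a matrix can be computed, using the proportion of differing sites (p-distance) and skipping gapped columns. It is not the procedure implemented in MEGA, whose distance model is not stated here; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Columns in which either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment):
    """Return {(name1, name2): p-distance} for every pair of sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Toy alignment (hypothetical sequences, not data from this study).
toy = {
    "seqA": "ATGGTGAGTG",
    "seqB": "ATGGTGACAG",
    "seqC": "ATGG-GAGTG",
}
for pair, distance in distance_matrix(toy).items():
    print(pair, round(distance, 3))
```

In a bootstrap analysis the alignment columns are resampled with replacement and the distance matrix and tree are rebuilt for each replicate.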


Structure prediction of sequenced genes

Crystal structures for the proteins PvDFR3 and PvMYB15 are not available in the Protein Data Bank (http://www.rcsb.org/pdb/home/home.do); therefore, the target proteins were modelled. The Phyre2 server (Kelley and Sternberg, 2009) was used to generate probable three-dimensional structures of these proteins. This server uses protein threading, or fold recognition, to model proteins that have the same fold as proteins with known structures but have no homologous proteins with known structures. Crystal structures are known for plant PAL and CHS and therefore the SWISS-MODEL Workspace (Arnold, K. 2006) was used for the PvCHS and PvPAL proteins.


Results

Expression of PvCHS-A and PvCHS-B genes in the seed coat of common bean

PvCHS-A was strongly expressed in the seed coat of the black-seeded tester line 5-593 (dominant for all colour genes) but was not expressed in the white-seeded Pcdj BC3 5-593 (which is dominant for P but recessive for c, d and j), nor was it expressed in p BC3 5-593 (recessive for P but dominant for all other colour genes). PCRs with the same cDNA samples and primers for CHI resulted in equal-intensity amplicons from all three samples (Fig 16b). PCRs with primers for PvCHS-B resulted in an intense band for the cDNA from the black-seeded tester line but gave much fainter bands for the cDNA samples from the white-seeded lines.

[Figure 16 image: gel panel a with lanes for PvCHS-A and PvCHS-B and size markers at 1500 bp, 800 bp and 600 bp; gel panel b.]

Figure 16. Chalcone synthase (CHS) and chalcone isomerase (CHI) expression in seed coats of bean seeds developing on black-seeded (5-593) and white-seeded (Pcdj BC3 5-593; p BC3 5-593) tester lines five days after pollination. a. PvCHS gene expression in the seed coats of three tester lines with different seed colours. The first three lanes show the expression of PvCHS-A and the second three lanes the expression of PvCHS-B. b. CHI gene amplified from the seed coats of the three tester lines as a positive control for cDNA concentrations in the three samples.


454-Sequencing of PV-GBa 0083H05 Clone (PvCHS-A)

The clone PV-GBa 0083H05 was identified by screening the Clemson Genomic Institute P. vulgaris G19833 BAC library membranes with the PvCHS-A probe. 454 sequencing resulted in 18,332 reads with an average length of 324.27 bp, giving 5,944,484 bp of sequence information (Table 10). An assembly of these sequences with the bean genome sequence as a reference (Pv01:13570000..15269999) resulted in one contig with a length of 162,772 bp (Figure 17) and a read depth of 9X.

Table 10. Next generation sequencing of PV-GBa 0083H05 clone.

              Count     Average length   Total bases
Reads         11,105    264.13           2,933,208
Matched       10,157    265.01           2,691,723
Not matched   948       254.73           241,485

Functional annotation of the BAC clone PV-GBa 0083H05

An analysis of the 162,772 bp sequence with GENSCAN and FGENESH identified 26 putative genes (Table 11, Figure 17). A BLASTX analysis identified one copy of the CHS gene, a brefeldin A-inhibited guanine nucleotide-exchange protein, six novel (No hit) putative gene sequences with no similarity to any genes currently described, genes that encode retrotransposon elements, a protein kinase and a reverse transcriptase.


[Figure 17 image: map of the putative genes along the BAC clone, with a position scale from 0 to 140. The key distinguishes retrotransposon elements, chloroplast DNA, homology-identified coding regions, mitochondrial DNA, hypothetical proteins, phenylpropanoid genes and no-hit sequences. Arrows point in the proposed direction of transcription.]

Figure 17. Schematic representation of the distribution of putative genes identified in P. vulgaris PV-GBa 0083H05 BAC clone.


Table 11. Predicted genes in the BAC clone PV-GBa 0083H05 and their putative function.

Position on BAC  Length (bp)  BLASTX                                  Accession number (organism)    E value
284-8342         8058         Chalcone synthase-like                  gi|356571645 (G. max)          0
8686-15344       6658         Brefeldin A-inhibited guanine           gi|356545802 (G. max)          5E-67
                              nucleotide-exchange protein 1-like
15441-17690      2249         Hypothetical protein                    gi|147775903 (V. vinifera)     1E-44
17720-20315      2595         Putative receptor-like protein kinase   gi|396582329 (P. vulgaris)     2E-77
20479-22493      2014         Hypothetical protein                    gi|147834092 (V. vinifera)     6E-62
23654-26424      2770         Uncharacterized protein                 gi|356560773 (G. max)          8E-76
26451-33017      6566         Hypothetical protein                    gi|357515063                   3E-24
33072-39477      6405         Gag-pol polyprotein                     gi|38194929 (P. vulgaris)      4E-46
40132-43766      3634         No hit
44279-50853      6574         Putative receptor-like protein kinase   gi|396582329 (P. vulgaris)     3E-129
55394-58596      3202         Protein kinase family protein           gi|351727579 (G. max)          0
58798-62453      3655         Hypothetical protein                    gi|147818189 (V. vinifera)     4E-05
64765-75891      11126        Uncharacterized protein                 gi|356570105 (G. max)          2E-94
75911-79665      3754         Putative polyprotein                    gi|50511382 (O. sativa)        4E-68
81743-87453      5710         No hit
88043-91348      3305         Hypothetical protein                    gi|147783627 (V. vinifera)     7E-27
91385-92801      1416         No hit
93328-107433     14105        Putative gag/pol polyprotein            gi|353685495 (P. vulgaris)     0
107553-109531    1978         No hit
109717-111294    1577         No hit
111475-118100    6625         Uncharacterized protein                 gi|356519395 (G. max)          2E-11
118168-122848    4680         No hit
123299-124574    1275         Uncharacterized protein                 gi|356554765 (G. max)          2E-27
125305-138093    12788        Putative receptor-like protein kinase   gi|396582329 (P. vulgaris)     3E-142
138157-143172    5015         Reverse transcriptase                   gi|8778340 (A. thaliana)       1E-80
143273-149410    6137         Gag-pol polyprotein                     gi|38194929 (P. vulgaris)      6E-127
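An annotation table of this kind is easy to sanity-check programmatically. The Python sketch below is illustrative only: it copies a few rows of Table 11 into a list (the tuple layout and function names are assumptions) and checks that each listed length equals the difference between the end and start coordinates, then counts the "No hit" entries in that subset.

```python
# (start, end, length_bp, blastx_hit) for a subset of rows from Table 11.
rows = [
    (284, 8342, 8058, "Chalcone synthase-like"),
    (8686, 15344, 6658,
     "Brefeldin A-inhibited guanine nucleotide-exchange protein 1-like"),
    (15441, 17690, 2249, "Hypothetical protein"),
    (17720, 20315, 2595, "Putative receptor-like protein kinase"),
    (40132, 43766, 3634, "No hit"),
]


def inconsistent_rows(table):
    """Return rows whose listed length differs from end - start."""
    return [row for row in table if row[1] - row[0] != row[2]]


def count_hits(table, label):
    """Count rows whose BLASTX description equals the given label."""
    return sum(1 for row in table if row[3] == label)


print("inconsistent rows:", inconsistent_rows(rows))
print("No hit entries in this subset:", count_hits(rows, "No hit"))
```

Extending the list to all rows of the table would allow the same check to be applied to the full annotation.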

454-Sequencing of PV-GBa 0005G03 Clone (PvCHS-B)

The PV-GBa 0005G03 clone was identified by probing the Clemson Genomic Institute BAC library membranes with the PvCHS-B gene clone (data not shown). 454 sequencing of the clone resulted in 22,772 reads with an average length of 332 bp, giving 7,562,301 bp of sequence information (Table 12). Assembly of these sequences with a reference (Pv02:3000000..4719999) resulted in one contig with a length of 138,143 bp and 29X coverage.


Table 12. Next generation sequencing of PV-GBa 0005G03 clone.

              Count     Average length   Total bases
Reads         16,633    253.99           4,224,534
Matched       15,964    255.56           4,079,733
Not matched   669       216.44           144,801
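Mean read depth can be estimated as the number of assembled bases divided by the contig length. The Python sketch below assumes that the "Matched" total in Table 12 is the appropriate numerator, which is an assumption about how the coverage figure was derived; under that assumption the estimate is about 29.5, in line with the 29X coverage reported for this clone.

```python
def mean_depth(total_bases, contig_length):
    """Estimate mean read depth as assembled bases per contig position."""
    if contig_length <= 0:
        raise ValueError("contig length must be positive")
    return total_bases / contig_length


matched_bases = 4_079_733  # "Matched" total bases, Table 12
contig_length = 138_143    # assembled contig length in bp

print(f"{mean_depth(matched_bases, contig_length):.1f}X")
```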

Functional annotation of the BAC clone PV-GBa 0005G03

Annotation of the 138,143 bp sequence identified 20 putative genes (Table 13, Figure 18). In particular, the annotation showed that the clone contained eight complete copies and one partial copy of chalcone synthase in a 100 kb section. The BLASTX analysis identified two novel putative genes with no similarity to any currently described genes. Other putative genes identified in this clone are low-copy genes, including cellulose synthase, 4-coumarate:CoA ligase, transcription factor DP, E3 ubiquitin ligase, poly(ADP-ribose) glycohydrolase, transposon protein (Mutator sub-class), ADP-ribosylation factor and exportin.


[Figure 18 image: map of the putative genes along the BAC clone, including CHSB-1 to CHSB-8 and the partial CHSB copy, with a position scale from 0 to 120. The key distinguishes retrotransposon elements, chloroplast DNA, homology-identified coding regions, mitochondrial DNA, hypothetical proteins, phenylpropanoid genes and no-hit sequences. Arrows point in the proposed direction of transcription.]

Figure 18. Schematic representation of the distribution of putative genes identified in P. vulgaris PV-GBa 0005G03 BAC clone


Table 13. Predicted genes in the BAC clone PV-GBa 0005G03 and their putative identities.

Position on BAC  Length (bp)  BLASTX                                        Accession number (organism)    E value
763-4059         3296         Transposon protein, Mutator sub-class         gi|77551327 (O. sativa)        1E-24
4730-8557        3827         Uncharacterized protein                       gi|356540363 (G. max)          0
8753-16546       7793         ADP-ribosylation factor                       gi|255557975 (R. communis)     1E-101
16639-34460      17821        Exportin-1-like isoform 2                     gi|356538753 (G. max)          0
34721-36510      1789         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
37003-41482      4479         No hit
41528-45032      3504         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
46090-48439      2349         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       1E-115
50144-53788      3644         No hit
53797-57363      3566         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
59365-62908      3543         Uncharacterized protein                       gi|356546458 (G. max)          3E-13
63339-67220      3881         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
72389-74705      2316         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
77115-79636      2521         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       5E-99
79668-84983      5315         Chalcone synthase 17                          gi|1345810 (P. vulgaris)       0
85406-90478      5072         E3 ubiquitin-protein ligase synoviolin-like   gi|356538686 (G. max)          1E-113
91245-99688      8443         Uncharacterized protein                       gi|351726614 (G. max)          9E-89
99915-113296     13381        Uncharacterized protein                       gi|356495735 (G. max)          0
113576-118987    5411         Transcription factor-like protein DPB-like    gi|356497355 (G. max)          4E-95
119123-137952    18829        Poly(ADP-ribose) glycohydrolase 1-like        gi|356538602 (G. max)          0
137715-138036    321          4-coumarate-CoA ligase 2                      gi|18266852 (G. max)           9E-25

PvCHS-A and PvCHS-B sequence analysis

The coding sequence of PvCHS-A, which exists as a single copy gene in clone PV-GBa 0083H05, is 1700 bp long and contains one intron that is 534 bp long. The PvCHS-A gene codes for a protein with 389 amino acids.


In contrast, the eight copies of PvCHS-B that occur in PV-GBa 0005G03 have an average size of 1270 bp for the coding sequence; each includes one intron of 104 bp and codes for a protein with 389 amino acid residues.

The alignment of PvCHS-A and PvCHS-B-1 (Fig 19) shows that the two genes are very similar in their exons and that the major difference occurs in the single intron.


PvCHS-B    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAAGGCCCAGCAACCATCCTT
PvCHS-A    1 ATGGTGACAGTCGAGGAAATCCGCAACGCCCAGCGCTCCCATGGCCCCGCCACCATCCTC

PvCHS-B   61 GCCATTGGAACTGCAACCCCATCTAACTGTGTTGATCAGAGCACATATCCCGATTACTAC
PvCHS-A   61 GCCTTTGGCACTGCCACCCC-TCCAACTGTATCTCCCAAGCGGATTACCCTGACTACTAC

PvCHS-B  121 TTCAGAATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCGCATGTGT
PvCHS-A  120 TTCCGCATTACCAACAGCGAACACATGACCGACCTCAAGGAGAAGTTCAAGCGCATGTGT

PvCHS-B  181 a------agtcctctcatcttctcatatct
PvCHS-A  180 acgttctatcaacacctcaactccgtttatcgttattagtcttttaattgtctcagattt

PvCHS-B  205 acca------
PvCHS-A  240 cccatttttgcttattcagtttttataatcagttaagaaactgattttctgtctgaaatt

PvCHS-B  209 ------
PvCHS-A  300 taggttcacttaaagcttaccctcactgttaggtgaaacttgacctacccgcagagttca

PvCHS-B  209 ------aaatcagaaatctaacaagtttatgatgttgaatga------
PvCHS-A  360 tggaacttttaattatgataatatgaatttaagaggtttttgttttttagttatactcag

PvCHS-B  245 ------
PvCHS-A  420 aaaatactttttacacttctactctcttgaatcatctctttaagcaaaatgaaaaattat

PvCHS-B  245 ------aggtgattaacattca------
PvCHS-A  480 ggctgtgagattattactcttcactaatgacgcataaaatcattttcatcgtgtttatcc

PvCHS-B  261 ------
PvCHS-A  540 tcacaagaactcaattgtaaagttctgtgccttattgcaccagtgattttcaaacaattt

PvCHS-B  261 ------tattggtttacaa------
PvCHS-A  600 gtggcaaaggttttcttgtagctctttgttagcagtttagaaaagcgatttttgtggctg

PvCHS-B  274 ---tgttt------caggtGATAAG
PvCHS-A  660 atgtatttacttttttaagttcaaaataagaaaagtgatttgaattagacaggtGAAAAG

PvCHS-B  290 TCGATGATAAAGAAGAGATATATGCACCTGAACGAGGAGATACTGAAGGAGAATCCTAAC
PvCHS-A  720 TCGATGATAAAGAAGCGTTACATGCACCTGACGGAGGAGTTTCTGAAGGAGAATCCAAAC

PvCHS-B  350 ATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGGCAAGACATAGTGGTGGTAGAGGTA
PvCHS-A  780 ATGTGTGCGTACATGGCGCCGTCGCTGGACGCGAGGCAGGACATAGTGGTGGTGGAAGTG

PvCHS-B  410 CCAAAGCTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGACAGCCAAAGTCA
PvCHS-A  840 CCGAAGCTGGGAAAAGAAGCAGCGAGGAAGGCGATAAAGGAGTGGGGTCAACCCAAGTCA

PvCHS-B  470 AAGATTACACACTTGATATTTTGCACCACCAGTGGTGTGGACATGCCTGGTGCTGATTAC
PvCHS-A  900 AAGATCACACACCTGGTGTTCTGCACCACTTCAGGCGTGGACATGCCTGGAGCCGATTAC

PvCHS-B  530 CAGCTCACCAAACTCTTGGGACTTCGGCCCTATGTGAAGAGGTACATGATGTACCAACAA
PvCHS-A  960 CAGCTTACCAAGCTTCTAGGGCTGAGGTCCTCCGTGAAGCGCCTCATGATGTACCAGCAG

PvCHS-B  590 GGATGCTTTGCAGGAGGCACGGTTCTTCGATTGGCCAAGGATTTGGCTGAGAACAACAAG
PvCHS-A 1020 GGCTGCTTTGCCGGCGGCACCGTCCTCCGCCTCGCCAAGGACCTTGCCGAGAACAATAAG

PvCHS-B  650 GGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTGACTTTCCGTGGGCCAAGT
PvCHS-A 1080 GGCGCCCGTGTTCTAGTAGTCTGCTCCGAAATCACTGCCGTGACGTTCCGCGGCCCGTCC

PvCHS-B  710 GACACCCACCTAGACAGTCTTGTGGGACAGGCATTGTTTGGAGATGGAGCAGCTGCAGTG
PvCHS-A 1140 GATGCCCACCTTGACTCCCTCGTCGGTCAGGCACTGTTTGGGGACGGAGCTGCCGCGATG

PvCHS-B  770 ATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGTTGGTTTGGACT
PvCHS-A 1200 ATCATAGGAGCGGATCCTGACAGGAGTGTAGAACGGCCTATATTTGAATTGGTATCGGCC

PvCHS-B  830 GCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGTCACCTTCGTGAAGTTGGA
PvCHS-A 1260 GCCCAGACCATTCTGCCGGACTCTGATGGTGCCATCGACGGGCACTTAAGGGAGGTGGGA

PvCHS-B  890 CTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTGGAAAGGCA
PvCHS-A 1320 CTAACGTTCCATCTTCTAAAAGATGTGCCTGGAATCATCTCGAAGAACATTGAGAAGAGT

PvCHS-B  950 CTTTTTGAGGCCTTCAACCCATTGAACATATCTGATTACAACTCCATCTTCTGGATTGCA
PvCHS-A 1380 CTGACAGAGGCGTTTGCGCCGATTGGGATTAATGACTGGAACTCGATCTTCTGGGTGGCA

PvCHS-B 1010 CACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTCTGAAACCTGAA
PvCHS-A 1440 CACCCGGGTGGACCGGCGATTCTGGACCAGGTTGAGGAGAAGTTACGGCTGAAACCGGAG

PvCHS-B 1070 AAGATGAAGGCCACTAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTGCATGTGTG
PvCHS-A 1500 AAACTCCGGTCCACCCGGCACGTGCTGAGCGAGTATGGAAACATGTCAAGTGCATGTGTT

PvCHS-B 1130 CTATTCATCTTGGATGAGATGAGGAGGAAATCAGTTGAAAATGGACTTAAAACGACAGGT
PvCHS-A 1560 TTGTTCATTCTTGATGAAATGAGGAAGAAGTCGAAGGAGGAAGAGAAGGGCAGCACAGGA

PvCHS-B 1190 GAAGGACTTGAATGGGGTGTTTTGTT-GGTTTTGGACCTGGACTTACCATTGAGACCGTT
PvCHS-A 1620 GAAGGGCTAGAATGGGGGGTGTTATTCGGGTTCGGGCCGGGTCTAACCGTTGAGACGGTT

PvCHS-B 1249 GTTCTCCACAGTGTCGCAGTA---
PvCHS-A 1680 GTGCTGCACAGCGTTCCCTTGGAG

Figure 19. DNA sequence comparison of PvCHS-A and PvCHS-B (PvCHS-B-1 was used as a representative of bean PvCHS-B family) from sequenced BAC clones. Lower case indicates the intron. Dark shading indicates positions where both sequences are identical and lighter shading represents positions where there is a lower conservation.
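Because the intron is marked in lower case in this alignment, exon and intron lengths can be read directly from a case-coded sequence. The Python sketch below illustrates the idea with a short hypothetical sequence (not taken from the figure); the function name is an assumption, and gap characters are ignored.

```python
def segment_lengths(seq):
    """Return (kind, length) runs for a case-coded gene sequence.

    Upper-case bases are counted as exon, lower-case bases as intron,
    and alignment gaps ('-') are skipped.
    """
    segments = []
    for base in seq:
        if base == "-":
            continue
        kind = "intron" if base.islower() else "exon"
        if segments and segments[-1][0] == kind:
            segments[-1][1] += 1
        else:
            segments.append([kind, 1])
    return [(kind, length) for kind, length in segments]


# Hypothetical case-coded fragment for illustration only.
toy = "ATGGTGaagt--cctcagGTGATAAG"
print(segment_lengths(toy))
```

For the toy fragment the function reports a 6 bp exon, a 10 bp intron and an 8 bp exon.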

Members of the PvCHS-B gene family had minor differences in sequence compared to each other and one partial CHS was present between two differently oriented CHS genes (Fig 18). Fig 20 is a comparison among the eight complete PvCHS-B sequences from PV-GBa 0005G03. The nucleotide alignment of the PvCHS-B gene family shows that, besides single-nucleotide differences between the genes, there are also large deletions in some members. For example, PvCHS-B-3 is missing more than 380 bp, whereas PvCHS-B-7, PvCHS-B-5 and PvCHS-B-4 have deletions of approximately 180 bp, 70 bp and 30 bp, respectively. All deletions occur in the second exon and will affect the protein sequence, structure and possibly function (Fig 20).
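How a deletion within an exon affects the protein depends on whether its length is a multiple of three. The Python sketch below shows that check; the function name and the example lengths are hypothetical, and because the deletion sizes quoted above are approximate, exact lengths would have to be measured from the alignment before drawing any conclusion for a particular gene copy.

```python
def deletion_effect(length_bp):
    """Classify an exonic deletion by its effect on the reading frame."""
    if length_bp <= 0:
        raise ValueError("deletion length must be positive")
    if length_bp % 3 == 0:
        return f"in-frame (equivalent to {length_bp // 3} codons removed)"
    return "frameshift (downstream reading frame changes)"


# Hypothetical exact lengths, used only to illustrate the two outcomes.
for length in (30, 70):
    print(length, "bp:", deletion_effect(length))
```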


PvCHS-B-5    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAA----GGCCCAGCAAC-CA
PvCHS-B-7    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAA----GGCCCAGCAAC-CA
PvCHS-B-1    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAA----GGCCCAGCAAC-CA
PvCHS-B-3    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAA----GGCCCAGCAAA-CA
PvCHS-B-8    1 ATGGTGAGTGTATCCGAGATTCGTCAGGTTCAAAGGGCAGAA----GGTCCAGCAAC-CA
PvCHS-B-4    1 ATGGTGAGTGTATCCGAGATCCGCCAGGCTCAAAGGGCAGAA----GGCCCAGCAAC-CA
PvCHS-B-6    1 ATGGTGAGTGTATCTGAGATCCGACAGGCTCAAAGGGCAGAA----GGTCCAGCAAC-CA
PvCHS-B-2    1 ATGGTGAGTGTATCCGAGATTCGTCAGGCTCAAAGGGCAGAA----GGCCCAGCAAC-CA
PvCHS-B-p    1 ------AAGCACAAATGTTAGGATTCTGCTTCAGTATCTCA

PvCHS-B-5   56 TCCTTGCCATTGGAACTGCAACCCCATCTAACTGTGTTGATCAGAGCACATATCCCGATT
PvCHS-B-7   56 TCCTTGCCATTGGAACTGCAACCCCATCAAACTGTGTTGATCAGAGCACAT--CCCGATT
PvCHS-B-1   56 TCCTTGCCATAGGAACTGCAACCCCATCAAACTGTGTTGATCAAAGCACATATCCTGATT
PvCHS-B-3   56 TCCTTGCCATTGGAACTGCAACCCCATCAAACTGTGTTGATCAGAGCACATATCCCGATT
PvCHS-B-8   56 TCCTTGCCATTGGAACTGCAACCCCATCGAACTGTGTTGATCAGAGCACATATCCTGATT
PvCHS-B-4   56 TCCTTGCCATTGGAACTGCAACCCCATCTAACTGTGTTGATCAGAGCACGTATCCCGATT
PvCHS-B-6   56 TCCTTGCCATTGGAACTGCAACCCCATCTAACTGTGTTGATCAGAGCACATATCCCGATT
PvCHS-B-2   56 TCCTTGCCATTGGAACTGCAACCCCATCTAACTGTGTTGATCAGAGCACATATCCTGATT
PvCHS-B-p   36 TCGTT-CAGTTGCA--TATATCTCTTCTTTATTAT--CGATT------TGTCACCTGA--

PvCHS-B-5  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-7  114 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-1  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-3  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-8  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-4  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-6  116 ACTACTTCAGA--ATCACAAACAGTGAGCACATGACCGACCTCAAAGAGAAGTTTCAGCG
PvCHS-B-2  116 ACTACTTCAGA--ATCACAAACAGTGAACACATGACCGACCTCAAAGAGAAGTTCCAGCG
PvCHS-B-p   83 AACATTGTAAACCAATATGAATGTTAATCACAT--TCG--TTCAA------CAACA

PvCHS-B-5  174 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCAGAAATCTAA---CAAGTT
PvCHS-B-7  172 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCCTAAATTTATTACCAAGTT
PvCHS-B-1  174 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCCTAAATTTAA---CAAGTT
PvCHS-B-3  174 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCAGAAATCTAA---CAAGTT
PvCHS-B-8  174 CATGTGTAAGTCCCTCTCATCTTCTCATATCTACAAAAATCAGAAATCTAA---CAAGTT
PvCHS-B-4  174 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCCTAAATTTAA---CAAGTT
PvCHS-B-6  174 CATGTGTAAGTCC-TCTCATCTTCTCATATCTACCAAAATCCTAAATTTAA---CAAGTT
PvCHS-B-2  174 CATGTGTAAGTCCCTCTCATCTTTTGATATCTACCAAAATCCAAAATTTAA---CAAGTT
PvCHS-B-p  129 ------TAAA--CTTCTTAAATTTTGGATTTTGATAGAAT-GAAAATTTAA---GAGGTT

PvCHS-B-5  230 TATGATGTTGAATGAAGGTGAT---TAACATTCATATTGGTTTACAATGTTTCAGGTGAT
PvCHS-B-7  231 TATGATGTTGAATGAAGATGAT---TAATATTGATATTGGTTTAAAATGTTTCAGGTGAC
PvCHS-B-1  230 TATGATGTTCAATGAAGATGAT---TAACATTCATATTGGTTTAAAATATTTCAGGTGAC
PvCHS-B-3  230 TATGATGTTGAATGAAGGTGAT---TAACATTCATATTGGTTTACAATGTTTCAGGTGAT
PvCHS-B-8  231 TATGATGTTGAATGAAGGTGATGATTAACATTGATATTGGTTTACAATGTTTCAGGTGAT
PvCHS-B-4  230 TACGATGTTGAATGAAGATGAT---TAACATTCATATTGGTTTACAATGTTTCAGGTGAT
PvCHS-B-6  230 TATGATGTTGAATGAAGGTGAT---TAACATTGATATTGGTTTACAATGTTTCAGGTGAC
PvCHS-B-2  231 TACGATGTTGAATGAAGATGAT---TAAAATTCATATTGGTTTACAATGTTTCAGGTGAC
PvCHS-B-p  177 TCTTTTGTTTAATGAAGATGAT---TAAGATTGATATTGGTTTACAATGTTTCAGGTTAC

Figure 20. Multiple alignment of individual PvCHS-B sequences in the PV-GBa 0005G03 BAC clone. PvCHS-B-p stands for partial PvCHS-B. Dark shading indicates positions where all sequences are identical and lighter shading represents positions where there is lower conservation. Lower case indicates the intron. The comparison was done using CLUSTAL 2.1 multiple sequence alignment. Continued below.

PvCHS-B-5 287 AAGT-CGATGATAAAGAAGAGATATATGCACCTGAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-7 288 AAGT-CGATGATAAAGAAGAGATATATGCACCTCAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-1 287 AAGT-CGATGATAAAGAAGAGATATATGCACCTGGACGAGGAGATACTGAAGGAGAATCC PvCHS-B-3 287 AAGT-CGATGATAAAGAAGAGATATATGCACCTGAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-8 291 AAGT-CGATGATAAAGAAGAGATATATGCACCTCAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-4 287 AAGT-CGATGATAAAGAAGAGATATATGCACCTGAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-6 287 AAGT-CGATGATAAAGAAGAGATATATGCACTTGAACGAGGAGATACTGAAGGAGAATCC PvCHS-B-2 288 AAGT-CGATGATAAAGAAGAGATATATGCACCTGGACGAGGAGATACTGAAGGAGAATCC PvCHS-B-p 234 AATTTCAATGATAAAGAAGAGATATATGCACCGGAACGAGGAGA------

PvCHS-B-5 346 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGGCAAGACATAGTGGTGGTAGA PvCHS-B-7 347 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGACAAGACATAGTGGTGGTAGA PvCHS-B-1 346 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGGCAAGACATAGTGGTGGTAGA PvCHS-B-3 346 TAACATGTGTGCTTACATGGCACCTTCTTT------PvCHS-B-8 350 TAACATGTGTGCTTACATGGCACCTTCTTTAGATGCGAGGCAAGACATAGTTGTGGTAGA PvCHS-B-4 346 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGGCAAGACATAGTGGTGGTAGA PvCHS-B-6 346 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGACAAGACATAGTGGTGGTAGA PvCHS-B-2 347 TAACATGTGTGCTTACATGGCACCTTCTTTGGATGCGAGACAAGACATAGTGGTGGTAGA PvCHS-B-p ------

PvCHS-B-5 406 GGTACCAAAGCTAGGGAAAGAGGCTGCAGTCAAGGCCATAAAGGAGTGGGGGCAGCCAAA PvCHS-B-7 407 GGTACCAAAGTTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGACAGCCAAA PvCHS-B-1 406 GGTACCAAAGCTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGGCAGCCAAA PvCHS-B-3 376 ------PvCHS-B-8 410 GGTACCAAAGCTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGACAGCCAAA PvCHS-B-4 406 AGTACCAAAGCTAGGGAAAGAGGCTGCAGTCAAGGCCATAAAGGAGTGGGGGCAGCCAAA PvCHS-B-6 406 GGTACCAAAGTTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGACAGCCAAA PvCHS-B-2 407 GGTACCAAAGCTAGGGAAAGAGGCTGCAGTGAAGGCCATAAAGGAGTGGGGGCAGCCAAA PvCHS-B-p ------

PvCHS-B-5 466 GTCAAAGATTACACACTTGATATTTTGCACCACCAGTGGCGTGGACATGCCTGGTGCTGA PvCHS-B-7 467 GTCAAAGATTACACACTTGATATTTTGCACCACTAGTGGCGTGGACATGCCTGGTGCTGA PvCHS-B-1 466 GTCAAAGATTACACACTTGATATTTTGCACCACCAGTGGCGTGGACATGCCCGGTGCTGA PvCHS-B-3 376 ------PvCHS-B-8 470 GTCAAAGATTACACACTTGATATTTTGCACCACTAGTGGCGTGGACATGCCTGGTGCTGA PvCHS-B-4 466 GTCAAAGATTACACACTTGATATTTTGCACCACCAGTGGCGTGGACATGCCTGGTGCTGA PvCHS-B-6 466 GTCAAAGATTACACACTTGATATTTTGCACCACCAGTGGCGTGGACATGCCTGGTGCTGA PvCHS-B-2 467 ATCAAAGATTACACACTTGATATTTTGCACCACCAGTGGTGTGGACATGCCTGGTGCTGA PvCHS-B-p ------

PvCHS-B-5 526 TTACCAGCTCACCAAACTCTTGGGACTTCGGCCCTATGTGAAGAGGTACATGATGTACCA PvCHS-B-7 527 TTACCAGCTCACCAAACTCTTGGGACTTCGGCCCTATGTGAAGAGGTACATGATGTACCA PvCHS-B-1 526 TTACCAACTCACCAAACTCTTGGGACTTCGGCCCTATGTGAAGAGATACATGATGTACCA PvCHS-B-3 376 ------PvCHS-B-8 530 TTACCAGCTCACAAAACTCTTGGGACTTCGACCCTATGTGAAGAGGTACATGATGTACCA PvCHS-B-4 526 TTACCAGCTCACCAAAC------ACATGATGTACCA PvCHS-B-6 526 TTACCAGCTCACCAAACTCTTGGGACTTCGGCCCTATGTGAAGAGGTACATGATGTACCA PvCHS-B-2 527 TTACCAGCTCACCAAACTCTTGGGACTTCGGCCTTATGTGAAGAGGTACATGATGTACCA PvCHS-B-p ------

PvCHS-B-5 586 ACAAGGG-TGCTTTGCAGGAGGCACGG-TTCTTCGAAT--GGCCAAGGATTTGGCTGAGA PvCHS-B-7 587 ACAAGGGGTGCTTTGCAGGAGGCACGGGTTCTTCAATTTGGGCCAAGGATTTGGCTGAG- PvCHS-B-1 586 ACAAGGG-TGCTTTGCAGGAGGCACGG-TTCTTCGATT--GGCCAAGGATTTGGCTGAGA PvCHS-B-3 376 ------PvCHS-B-8 590 ACAAGGG-TGCTTTGCAGGAGGCACGG-TTCTTCGAAT--GGCCAAGGATTTGGCTGAGA PvCHS-B-4 556 ACAAGGA-TGCTTTGCAGGAGGCACGG-TTCTTCGATT--GGCCAAGGATTTGGCTGAGA PvCHS-B-6 586 ACAAGGA-TGCTTTGCAGGAGGCACGG-TTCTTCGATT--GGCCAAGGATTTGGCTGAGA

98

PvCHS-B-2 587 ACAAGGA-TGCTTTGCTGGAGGCACGG-TTCTTCGATT--GGCCAAGGATTTGGCTGAGA PvCHS-B-p ------

PvCHS-B-5 642 ACAACAAGGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTGACTTTCCGTG PvCHS-B-7 646 ------PvCHS-B-1 642 ACAACAAGGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATAACTGCGGTGACCTTCCGTG PvCHS-B-3 376 ------PvCHS-B-8 646 ACAACAAGGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTGACTTTCCGTG PvCHS-B-4 612 ACAACAAGGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTCACCTTCCGTG PvCHS-B-6 642 ACAACAAAGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTGACTTTCCGTG PvCHS-B-2 643 ACAACAAGGGTGCCCGTGTGCTTGTGGTGTGTTCTGAGATCACTGCAGTGACTTTCCGTG PvCHS-B-p ------

PvCHS-B-5 702 GGCCAAGTGA------PvCHS-B-7 646 ------PvCHS-B-1 702 GGCCAAGTGACACCCACCTAGACAGTCTTGTGGGACAGGCATTGTTTGGAGATGGAGCAG PvCHS-B-3 376 ------PvCHS-B-8 706 GGCCAAGTGACACCCACCTAGACAGTCTTGTGGGACAGGCATTGTTTGGAGATGGAGCAG PvCHS-B-4 672 GGCCAAGTGACACCCACCTAGACAGTCTTGTGGGACAGGCATTGTTTGGAGATGGAGCAG PvCHS-B-6 702 GGCCAAGTGACACCCACCTAGACAGTCTTGTGGGACAGGCATTGTTTGGAGATGGAGCAG PvCHS-B-2 703 GGCCAAGTGACACCCACCTAGACAGTCTTGTGGGTCAGGCATTGTTTGGAGATGGAGCAG PvCHS-B-p ------

PvCHS-B-5 712 ------GTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGTTGG PvCHS-B-7 646 ------PvCHS-B-1 762 CTGCAGTGATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGCTGG PvCHS-B-3 376 --GCAGTGATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGTTGG PvCHS-B-8 766 CTGCAGTGATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGCTGG PvCHS-B-4 732 CTGCAGTGATTGTTGGTTCTGATCCTGTTCCACAGATTGAGAAGCCTTTGTTTGAGTTGG PvCHS-B-6 762 CTGCAGTGATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAGTTGG PvCHS-B-2 763 CTGCAGTGATTGTTGGTTCTGACCCAATTCCACAGATTGAGAAGCCTTTGTTTGAACTGG PvCHS-B-p ------

PvCHS-B-5 757 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGTCACCTTCGTG PvCHS-B-7 646 --TGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGCCACCTTCGTG PvCHS-B-1 822 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGTCACCTTCGTG PvCHS-B-3 434 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGCCACCTTCGTG PvCHS-B-8 826 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGCCACCTTCGTG PvCHS-B-4 792 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGCCACCTTCGTG PvCHS-B-6 822 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGTCACCTTCGTG PvCHS-B-2 823 TTTGGACTGCACAGACCATTGCTCCAGACAGTGATGGTGCTATTGATGGCCACCTACGTG PvCHS-B-p ------

PvCHS-B-5 817 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-7 704 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-1 882 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-3 494 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-8 886 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-4 852 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-6 882 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-2 883 AAGTTGGACTCACGTTTCACCTCCTTAAGGATGTTCCTGGGATTGTCTCAAAGAACATTG PvCHS-B-p ------

PvCHS-B-5 877 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATATCTGATTACAACTCCATCTTCT PvCHS-B-7 764 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT PvCHS-B-1 942 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT PvCHS-B-3 554 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT

99

PvCHS-B-8 946 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT PvCHS-B-4 912 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT PvCHS-B-6 942 GAAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATATCTGATTACAACTCCATCTTCT PvCHS-B-2 943 AGAAGGCACTTTTTGAGGCCTTCAACCCATTGAACATCTCTGATTACAACTCCATCTTCT PvCHS-B-p ------

PvCHS-B-5 937 GGATTGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTCTGA PvCHS-B-7 824 GGATTGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTCTGA PvCHS-B-1 1002 GGATTGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTTTGA PvCHS-B-3 614 GGATCGCACACCCTGGTGGACCTGCAATTTTGGACCAAGTTGAGCAAAAGTTGGGTCTGA PvCHS-B-8 1006 GGATCGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTCTGA PvCHS-B-4 972 GGATTGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGATCTGA PvCHS-B-6 1002 GGATTGCACACCCTGGTGGACCTGCAATTCTGGACCAAGTTGAGCAAAAGTTGGGTCTGA PvCHS-B-2 1003 GGATTGCACACCCTGGTGGACCTGCAATTCTAGACCAAGTTGAGCAAAAATTGGGTCTGA PvCHS-B-p ------

PvCHS-B-5 997 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTG PvCHS-B-7 884 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTTAGTGATTATGGGAACATGTCAAGTG PvCHS-B-1 1062 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTTAGCAATTATGGGAACATGTCAAGTG PvCHS-B-3 674 AACCTGAAAAGATGAAGGCCACAAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTG PvCHS-B-8 1066 AACCTGAAAAGATGAAGGCGACTAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTG PvCHS-B-4 1032 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTG PvCHS-B-6 1062 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTGAGTGATTATGGGAACATGTCAAGTG PvCHS-B-2 1063 AACCTGAAAAGATGAAGGCCACTAGAGATGTGCTGAGTGATTACGGAAACATGTCAAGTG PvCHS-B-p ------

PvCHS-B-5 1057 CATGTGTGCTATTCATCTTGGATGAGATGAGGAGGAAATCAGTTGAAAATGGACTTAAAA PvCHS-B-7 944 CATGTGTGCTTTTCATCTTGGATGAGATGAGGAGGAAATCAGTTGAAAATGGACTTAAAA PvCHS-B-1 1122 CATGTGTGCTATTCATCTTGGATGAGATGAGGAGGAAATCAGTTGAAAATGGACTTAAAA PvCHS-B-3 734 CATGTGTGCTATTCATCTTGGATGAGATGAGGAGGAAATCAGCTGAAAAGGGACTTAAAA PvCHS-B-8 1126 CATGTGTGCTATTCATCTTGGATGAGATGAGGAGAAAATCAGCTGAAAAGGGACTTAAAA PvCHS-B-4 1092 CATGTGTGCTATTCATCTTGGATGAAATGAGGAGGAAATCAGCTGAAAATGGACTTAAAA PvCHS-B-6 1122 CATGTGTGCTATTCATCTTGGATGAGATGAGGAGGAAATCAGCTGAAAAGGGACTTAAAA PvCHS-B-2 1123 CATGTGTGCTTTTCATCTTGGATGAGATGAGAAGGAAATCAGCTGAAAATGGACTTAAAA PvCHS-B-p ------

PvCHS-B-5 1117 CGACAGGTGAAGGACTTGAATGGGGTGTTTTGTTTGGTTTTGGACCTGGACTTACCATTG PvCHS-B-7 1004 CAACAGGTGAAGGACTTGAATGGGGTGTTTTGTTTGGTTTTGGACCTGGTCTTACCATCG PvCHS-B-1 1182 CGACAGGTGAAGGACTTGAATGGGGTGTTTTGTT-GGGTTTGGACCTGGACTTACCATTG PvCHS-B-3 794 CAACAGGTGAAGGACTTGAATGGGGTGTTTTGTTTGGTTTTGGACCCGGACTTACCATCG PvCHS-B-8 1186 CAACAGGTGAAGGACTTGAATGGGGTGTTTTGTTTGGTTTTGGACCCGGACTTACTATCG PvCHS-B-4 1152 CGACAGGTGAAGGACTTGAATGGGGTGTCTTGTTTGGTTTTGGACCTGGACTTACCATTG PvCHS-B-6 1182 CAACAGGTGAAGGACTTGAATGGGGTGTTTTGTTTGGTTTTGGACCTGGACTTACCATCG PvCHS-B-2 1183 CCACAGGTGAAGGACTTGAATGGGGTGTTTTATTTGGTTTCGGACCTGGACTTACCATTG PvCHS-B-p ------

PvCHS-B-5 1177 AGACCGTTGTTCTCCACAGTGTC------PvCHS-B-7 1064 AGACTGTTGTTCTCCACAGTGT------PvCHS-B-1 1241 AGACCGTTGTTCTCCACAGTGTC------PvCHS-B-3 854 AAACAGTCGTTCTCCATAGTGTCGCAATA PvCHS-B-8 1246 AAACAGTCGTTCTCCACAGTGTCGCAATA PvCHS-B-4 1212 AGACTGTTGTTCTCCACAGTGTCTCAATA PvCHS-B-6 1242 AGACCGTCGTTCTCCACAGTGTCGCAATA PvCHS-B-2 1243 AGACCGTTGTTCTCCACAGTGTC------PvCHS-B-p ------

Figure 20. Cont’d.

The partial sequence of PvCHS-B (PvCHS-B-p) is located between the two complete genes PvCHS-B-7 and PvCHS-B-8. This fragment is 277 bp in length and is missing sequence from both the 5’ and 3’ ends.

A nucleotide sequence comparison between PvCHS-A and PvCHS-B from this study and previously known bean CHSs (Ryder et al. 1987) is shown in Figure 21. CHS1, CHS4, CHS5, CHS14 and CHS17 are previously published cDNA sequences from common bean (Ryder et al. 1987). To align the cDNA and genomic DNA sequences more accurately, the single intron was removed from each PvCHS gene. The alignment highlights the similarity between the previously published bean CHSs and the PvCHS-B family members, whereas PvCHS-A forms a separate group by itself.

[Figure 21: tree diagram; leaf labels PvCHSB-1 to PvCHSB-8, PvCHSA, CHS1, CHS4, CHS5, CHS14 and CHS17]

Figure 21. Alignment tree for coding sequences of the PvCHS-B (PvCHSB-1 to PvCHSB-8) multigene family, PvCHS-A and previously reported CHS genes (CHS1, CHS4, CHS5, CHS14 and CHS17) (Ryder et al. 1987) of common bean. Numbers above the branches are bootstrap probabilities (1,000 replicates).

PvCHS protein analysis

There are four motifs in plant CHS amino acid sequences that include almost all the conserved residues necessary for protein function (Ferrer et al. 1999). Motif I contains the Cys residue and at least seven other residues that are highly conserved among plant CHSs. Motif II contains the active Phe residue and the highly conserved Asp and Gly residues. Motif III includes the active His and Asn residues and four other conserved residues (Trp and three Gly residues). A Ser residue, which forms the coumaroyl-binding pocket, is also in this motif. Motif IV includes the residues at positions 372–376, which are involved in substrate-specific recognition (Seshime et al. 2005; Ferrer et al. 1999). These residues, with the sequence G(F/L)GPG, are known as the CHS-family signature sequence (Lu et al. 2009). In Figure 22, active sites, catalytic residues and CoA-binding sites are each marked with a distinct symbol.
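The signature search described above can be sketched in a few lines of Python. This is only an illustration: the function name `find_chs_signature` and the `first_residue` parameter are hypothetical, and the test segment is the C-terminal stretch of the PvCHS-B row in Figure 22, which begins at residue 361.

```python
import re

# CHS-family signature as given in the text: G(F/L)GPG.
CHS_SIGNATURE = re.compile(r"G[FL]GPG")


def find_chs_signature(segment, first_residue=1):
    """Return (start, end, motif) tuples in 1-based residue numbering.

    `first_residue` is the residue number of the first character of
    `segment` (hypothetical helper parameter, not from the thesis).
    """
    hits = []
    for match in CHS_SIGNATURE.finditer(segment):
        start = first_residue + match.start()
        end = first_residue + match.end() - 1
        hits.append((start, end, match.group()))
    return hits


# Last alignment block of the PvCHS-B row in Figure 22 (starts at residue 361).
segment = "TGEGLEWGVLFGFGPGLTIETVVLHSVAI"
print(find_chs_signature(segment, first_residue=361))
```

With this segment the match falls on residues 372 to 376, in agreement with the positions quoted for motif IV.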

An amino acid alignment of PvCHS-B (with PvCHS-B-1 as representative of the family) and PvCHS-A with other CHS proteins of the Leguminosae shows a high level of conservation, especially for the important amino acids in the motifs. This supports the view that PvCHS-B-1 codes for a functional CHS enzyme (Fig 22). PvCHS-A shows minor differences from the other CHS proteins, but the essential and functionally important amino acids are conserved.

PvCHS-B 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC Pv-CHS17 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC GmCHS 1 MVSVAEIRQAQRAEGPATILAIGTANPPNCVAQSTYPDYYFRITNSEHMTELKEKFQRMC GsCHS 1 MVSVAEIRQAQRAEGPATILAIGTANPPNCVDQSTYPDYYFRITNSEHMTELKEKFQRMC MsCHS 1 MVSVSEIRKAQRAEGPATILAIGTANPANCVEQSTYPDFYFKITNSEHKTELKEKFQRMC PvCHS-A 1 MVTVEEIRNAQRSHGPATILAFGTATPSNCISQADYPDYYFRITNSEHMTDLKEKFKRMC

PvCHS-B 61 DKSMIKKRYMHLDEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP Pv-CHS17 61 DKSMIKKRYMHLNEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP GmCHS 61 DKSMIKRRYMYLNEEILKENPNMCAYMAPSLDARQDMVVVEVPKLGKEAAVKAIKEWGQP GsCHS 61 DKSMIKRRYMYLNEEILKENPNMCAYMAPSLDARQDMVVVEVPKLGKEAAVKAIKEWGQP MsCHS 61 DKSMIKRRYMYLTEEILKENPNVCEYMAPSLDARQDMVVVEVPRLGKEAAVKAIKEWGQP PvCHS-A 61 EKSMIKKRYMHLTEEFLKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAARKAIKEWGQP

PvCHS-B 121 KSKITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN Pv-CHS17 121 KSKITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN GmCHS 121 KSKITHLIFCTTSGVDMPGADYQLTKQLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN GsCHS 121 KSKITHLIFCTTSGVDMPGADYQLTKQLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN MsCHS 121 KSKITHLIVCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAEN PvCHS-A 121 KSKITHLVFCTTSGVDMPGADYQLTKLLGLRSSVKRLMMYQQGCFAGGTVLRLAKDLAEN

Motif I    

PvCHS-B 181 NKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFELV Pv-CHS17 181 NKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFELV GmCHS 181 NKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQVEKPLYELV GsCHS 181 NKGARVLVVCSEITAVTFRGPSDTRLDSLVGQALFGDGAAAVIVGSDPIPQVEKPLYELV MsCHS 181 NKGARVLVVCSEVTAVTFRGPSDTHLDSLVGQALFGDGAAALIVGSDPVPEIEKPIFEMV PvCHS-A 181 NKGARVLVVCSEITAVTFRGPSDAHLDSLVGQALFGDGAAAMIIGADPDRSVERPIFELV

    Motif II

PvCHS-B 241 WTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIFW Pv-CHS17 241 WTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIFW GmCHS 241 WTAQTIAPDSEGAIDGHLREVGLTFHLLKDVPGIVSKNIDKALFEAFNPLNISDYNSIFW GsCHS 241 WTAQTIAPDSEGAIDGHLREVGLTFHLLKDVPGIVSKNIDKALFEAFNPLNISDYNSIFW MsCHS 241 WTAQTIAPDSEGAIDGHLREAGLTFHLLKDVPGIVSKNITKALVEAFEPLGISDYNSIFW PvCHS-A 241 SAAQTILPDSDGAIDGHLREVGLTFHLLKDVPGIISKNIEKSLTEAFAPIGINDWNSIFW

   Motif III

PvCHS-B 301 IAHPGGPAILDQVEQKLGLKPEKMKATRDVLSNYGNMSSACVLFILDEMRRKSVENGLKT Pv-CHS17 301 IAHPGGPAILDQVEQKLGLKPEKMKATRDVLSDYGNMSGACVLFILDEMRRKSAEKGLKT GmCHS 301 IAHPGGPAILDQVEQKLGLKPEKMKATRDVLSEYGNMSSACVLFILDEMRRKSAENGLKT GsCHS 301 IAHPGGPAILDQVEQKLGLKPEKMKATRDVLSEYGNMSSACVLFILDEMRRKSAENGLKT MsCHS 301 IAHPGGPAILDQVEQKLALKPEKMNATREVLSEYGNMSSACVLFILDEMRKKSTQNGLKT PvCHS-A 301 VAHPGGPAILDQVEEKLRLKPEKLRSTRHVLSEYGNMSSACVLFILDEMRKKSKEEEKGS   

PvCHS-B 361 TGEGLEWGVLFGFGPGLTIETVVLHSVAI-- Pv-CHS17 361 TGEGLEWGVLFGFGPGLTIETVVLHSVAI-- GmCHS 361 TGEGLEWGVLFGFGPGLTIETVVLRSVAI-- GsCHS 361 TGEGLEWGVLFGFGPGLTIETVVLRSVAI-- MsCHS 361 TGEGLEWGVLFGFGPGLTIETVVLRSVAI-- PvCHS-A 361 TGEGLEWGVLFGFGPGLTVETVVLHSVPLEG

Motif IV

Figure 22. Multi-alignment of deduced amino acid sequences of PvCHS-B (PvCHSB-1 was used as representative of the family) and other Leguminosae CHSs. The aligned CHSs are from G. max (AAO67373), Medicago sativa (CAC20725) and Glycine soja (ACT32034). Highly conserved residues are shown in white on a black background and partially conserved residues are shown on a white background. The protein sequence of PvCHS-B has four CHS-specific conserved motifs (marked Motif I, II, III and IV). Active sites, catalytic residues and CoA-binding sites are each marked with a distinct symbol.

Three-Dimensional Model Analysis

Comparative modeling of the 3D structure of the PvCHS-B protein was performed using SWISS-MODEL (Arnold, 2006). The template for modeling was the 3D structure of alfalfa MsCHS (PDB ID 1BI5). The model covered amino acids 1-389 of PvCHS-B and consisted of α-helices and β-turns.

Figure 23. The three-dimensional structure of the predicted protein encoded by the PvCHS-B-1 gene (100% confidence). The structure was derived by homology-based 3-D structural modeling using SWISS-MODEL Workspace (Arnold, 2006). Active site and CoA binding sites are specified on the picture.

In silico amino acid analysis of PvCHS-B gene family

The deduced amino acid sequences of the eight members of PvCHS-B were aligned and compared with one another (Fig 24). As expected, the amino acid sequences of PvCHS-B-3, PvCHS-B-5 and PvCHS-B-7 differed from those of the rest of the group, to an extent that depended on the size of the deletion at the nucleotide level. PvCHS-B-3 has the largest deletion in its second exon, followed by PvCHS-B-7 and PvCHS-B-5. The dendrogram based on the amino acid sequences of the members of this gene family indicates that PvCHS-B-7, PvCHS-B-5 and PvCHS-B-3 are separate from the other family members (Fig 24). Therefore, these sequences were excluded from the amino acid alignment in Fig 25.

[Figure 24: dendrogram; leaf labels PvCHS-B-1 to PvCHS-B-8; scale bar 0.1]

Figure 24. Dendrogram of deduced amino acid sequences of the PvCHS-B (PvCHSB-1 to PvCHSB-8) multigene family. Numbers above the branches are bootstrap probabilities (1,000 replicates).

PvCHS-B-4 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC PvCHS-B-6 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC PvCHS-B-8 1 MVSVSEIRQVQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC PvCHS-B-2 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC PvCHS-B-1 1 MVSVSEIRQAQRAEGPATILAIGTATPSNCVDQSTYPDYYFRITNSEHMTDLKEKFQRMC

PvCHS-B-4 61 DKSMIKKRYMHLNEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP PvCHS-B-6 61 DKSMIKKRYMHLNEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP PvCHS-B-8 61 DKSMIKKRYMHLNEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP PvCHS-B-2 61 DKSMIKKRYMHLDEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP PvCHS-B-1 61 DKSMIKKRYMHLDEEILKENPNMCAYMAPSLDARQDIVVVEVPKLGKEAAVKAIKEWGQP

PvCHS-B-4 121 KSK-ITHLIFCTTSGVDMPGADYQLT------KHMMYQQGCFAGGTVLRLAKDLAE PvCHS-B-6 121 KSK-ITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAE PvCHS-B-8 121 KSK-ITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRMAKDLAE PvCHS-B-2 121 KSK-ITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAE PvCHS-B-1 121 KSK-ITHLIFCTTSGVDMPGADYQLTKLLGLRPYVKRYMMYQQGCFAGGTVLRLAKDLAE

Motif I     PvCHS-B-4 170 NNKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPVPQIEKPLFEL PvCHS-B-6 180 NNKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFEL PvCHS-B-8 180 NNKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFEL PvCHS-B-2 180 NNKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFEL PvCHS-B-1 180 NNKGARVLVVCSEITAVTFRGPSDTHLDSLVGQALFGDGAAAVIVGSDPIPQIEKPLFEL

    Motif II PvCHS-B-4 230 VWTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIF PvCHS-B-6 240 VWTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIF PvCHS-B-8 240 VWTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIF PvCHS-B-2 240 VWTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIEKALFEAFNPLNISDYNSIF PvCHS-B-1 240 VWTAQTIAPDSDGAIDGHLREVGLTFHLLKDVPGIVSKNIGKALFEAFNPLNISDYNSIF

   Motif III PvCHS-B-4 290 WIAHPGGPAILDQVEQKLDLKPEKMKATRDVLSDYGNMSSACVLFILDEMRRKSAENGLK PvCHS-B-6 300 WIAHPGGPAILDQVEQKLGLKPEKMKATRDVLSDYGNMSSACVLFILDEMRRKSAEKGLK PvCHS-B-8 300 WIAHPGGPAILDQVEQKLGLKPEKMKATRDVLSDYGNMSSACVLFILDEMRRKSAEKGLK PvCHS-B-2 300 WIAHPGGPAILDQVEQKLGLKPEKMKATRDVLSDYGNMSSACVLFILDEMRRKSAENGLK PvCHS-B-1 300 WIAHPGGPAILDQVEQKLGLKPEKMKATRDVLSNYGNMSSACVLFILDEMRRKSVENGLK    PvCHS-B-4 350 TTGEGLEWGVLFGFGPGLTIETVVLHSVSI PvCHS-B-6 360 TTGEGLEWGVLFGFGPGLTIETVVLHSVAI PvCHS-B-8 360 TTGEGLEWGVLFGFGPGLTIETVVLHSVAI PvCHS-B-2 360 TTGEGLEWGVLFGFGPGLTIETVVLHSV-- PvCHS-B-1 360 TTGEGLEWGVLLGLDLDLPLR-PLFSTV--

Motif IV

Figure 25. Multi-alignment of deduced amino acid sequences of PvCHS-B family members (PvCHS-B-1, PvCHS-B-2, PvCHS-B-4, PvCHS-B-6 and PvCHS-B-8). Highly conserved residues are shown in white on a black background and partially conserved residues are shown on a white background. The protein sequence of PvCHS-B has four CHS-specific conserved motifs (marked Motif I, II, III and IV). Active sites, catalytic residues and CoA-binding site residues are each marked with a distinct symbol.

Expression of PvDFR gene in the seed coat of common bean

PvDFR was strongly expressed in the seed coat of the black-seeded tester line 5-593 (dominant for all colour genes) but was not expressed in the seed coat of the white-seeded line Pcdj BC3 5-593 (dominant for P but recessive for c, d and j), nor was it expressed in p BC3 5-593 (recessive for p but dominant for all other colour genes). PCRs with the same cDNA samples and primers for CHI produced amplicons of equal intensity from all three samples (Fig 26b).

Figure 26. Dihydroflavonol-4-reductase (PvDFR) expression in seed coats of bean seeds developing on black-seeded (Florida line 5-593) and white-seeded (Pcdj BC3 5-593; p BC3 5-593) tester lines five days after pollination. a. PvDFR1 gene expression in the seed coats of three tester lines with different colours. The three lanes show the expression of PvDFR1 in the black-seeded line, the white-seeded line dominant for P but recessive for c, d and j, and the white-seeded line recessive at the p locus, respectively. b. The CHI gene amplified from the seed coats of the three tester lines as a positive control for cDNA concentration.

454-Sequencing of PV-GBa 0072I22 Clone (PvDFR3)

PV-GBa 0072I22, the clone containing the PvDFR3 gene, was selected from the Clemson Genomic Institute BAC library and sequenced using 454 sequencing. The raw data consisted of 47,810 reads with an average length of 334.19 bp, giving a total of 15,977,827 bp of sequence information (Table 14). Reference-guided assembly of these reads (reference region Pv07:47680000..49689999) resulted in a 192,189 bp contig with an average depth of coverage of 41X.

Table 14. Next generation sequencing of PV-GBa 0072I22 clone.

              Count     Average length   Total bases
Reads         32,372    263.38           8,526,056
Matched       31,480    264.6            8,329,699
Not matched   892       220.13           196,357
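The relationship between the three columns of Table 14 (average length equals total bases divided by read count) can be checked with a short Python sketch. The dictionary layout and the function name `average_length` are illustrative assumptions; the numbers are copied from the table.

```python
# Values copied from Table 14: (count, reported average length, total bases).
TABLE_14 = {
    "Reads": (32372, 263.38, 8526056),
    "Matched": (31480, 264.6, 8329699),
    "Not matched": (892, 220.13, 196357),
}


def average_length(count, total_bases):
    """Average read length implied by a read count and a base total."""
    return total_bases / count


for label, (count, reported_avg, total_bases) in TABLE_14.items():
    implied = average_length(count, total_bases)
    # Print the implied average next to the value reported in the table.
    print(f"{label}: implied {implied:.2f} bp, reported {reported_avg} bp")

# The matched and unmatched rows should add up to the total row.
matched, unmatched, total = TABLE_14["Matched"], TABLE_14["Not matched"], TABLE_14["Reads"]
print(matched[0] + unmatched[0] == total[0], matched[2] + unmatched[2] == total[2])
```

Each implied average agrees with the tabulated value to two decimal places, and the matched and unmatched rows sum to the total row.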

Functional annotation of the BAC clone PV-GBa 0072I22

Analysis of the 192,189 bp sequence revealed the presence of 20 putative genes (Table 15, Figure 27). The predicted gene sequences were compared with the GenBank database using the BLASTX algorithm. Two novel putative genes were identified with no similarity to any currently described gene. The other putative genes identified in this clone are low-copy genes, including serine carboxypeptidase, dihydrokaempferol 4-reductase, amine oxidase (FLD), nodulin-like protein, pyruvate kinase, glycerol-3-phosphate dehydrogenase, F-box domain, methyltransferase-like protein and serine-threonine protein kinase.

[Figure 27: gene map of the BAC clone with scale marks from 0 to 180; gene labels as listed in Table 15. Key categories: retrotransposon elements, chloroplast DNA, homology-identified coding region, mitochondrial DNA, hypothetical proteins, phenylpropanoid gene, no hit. Arrows point in the proposed direction of transcription.]

Figure 27. Schematic representation of the distribution of putative genes identified in the P. vulgaris PV-GBa 0072I22 BAC clone containing the PvDFR3 gene.

Table 15. Predicted genes in the BAC clone PV-GBa 0072I22 and their putative function.

Position on BAC   Length (bp)   BLASTX hit                                                 Accession number (organism)      E value
29-5220           5191          Nodule-specific protein Nlj70                              gi|356500351 (M. truncatula)     0
7065-24382        17317         Transcriptional corepressor LEUNIG-like                    gi|356500353 (G. max)            0
24495-28691       4196          No hit
29905-35023       5118          F-box protein                                              gi|356537561 (G. max)            2E-79
38153-39767       1614          Gag-pol polyprotein                                        gi|38194929 (P. vulgaris)        4E-106
40014-57190       17176         Hypothetical protein                                       gi|147839415 (V. vinifera)       2E-160
57451-60134       2683          Predicted protein                                          gi|224142551 (P. trichocarpa)    8E-15
60855-69871       9016          Pre-mRNA-processing ATP-dependent RNA helicase prp5-like   gi|356500419 (G. max)            0
70525-79477       8952          Glycerol-3-phosphate dehydrogenase SDP6                    gi|356500417 (G. max)            0
79854-83469       3615          No hit
83537-92382       8845          Serine/threonine-protein kinase CTR1-like                  gi|356537525 (G. max)            1E-40
92454-98444       5990          Hypothetical protein                                       gi|147772908 (V. vinifera)       0
99151-107466      8315          Serine/threonine-protein kinase CTR1-like                  gi|356500413 (G. max)            7E-103
108730-123453     14723         Erythroid differentiation-related factor                   gi|356502908 (M. truncatula)     0
123460-139352     15892         Serine carboxypeptidase-like 49-like isoform 1             gi|356534720 (G. max)            0
140033-145659     5626          Uncharacterized protein                                    gi|356559510 (G. max)            7E-22
145781-156978     11197         Uncharacterized protein                                    gi|356549106 (G. max)            2E-33
157279-170324     13045         Dihydroflavonol-4-reductase DFR1                           gi|358248856 (G. max)            6E-132
172121-176030     3909          Methyltransferase-like protein 13-like                     gi|356537555 (G. max)            3E-93
176156-191772     15616         Lysine-specific histone demethylase 1 homolog 3-like       gi|356502918 (G. max)            0
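The Length column of Table 15 can be reproduced from the Position column. The sketch below is illustrative only: the function name `span_length` is hypothetical, and it assumes the convention that appears to hold in the table, namely end coordinate minus start coordinate (without the +1 of an inclusive interval).

```python
def span_length(position):
    """Length of a 'start-end' interval, computed as end minus start."""
    start, end = (int(part) for part in position.split("-"))
    return end - start


# A few (position, tabulated length) pairs copied from Table 15.
rows = [
    ("29-5220", 5191),
    ("7065-24382", 17317),
    ("157279-170324", 13045),   # Dihydroflavonol-4-reductase
    ("176156-191772", 15616),
]

for position, tabulated in rows:
    # Compare the computed span with the value given in the table.
    print(position, span_length(position), tabulated)
```

The same function can be applied to every row to check a transcribed table for copying errors.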

PvDFR3 and PvDFR1 nucleotide alignment

PvDFR3 was sequenced from the BAC clone by next generation sequencing. The gene sequence is 2793 bp long and contains six exons and five introns. BLAST searches showed that its nucleotide sequence is similar to that of DFR3 of soybean.

PvDFR1 was partially sequenced from a PCR amplicon using the Sanger method (this study). The total length of the sequenced portion is 1385 bp, which includes 2.5 exons and two introns. A BLAST search showed that its nucleotide sequence is similar to that of DFR1 of soybean. A full-length sequence of this gene, containing six exons and five introns, was retrieved from Phytozome and used for the alignment (Pv01:1086988..1090236). This gene is located on Pv01, where two DFR genes sit back to back. The second gene of the pair was named PvDFR2; it also has six exons and five introns, and its full-length sequence was used in the alignment (Pv01:1093757..1097036). Another gene annotated as DFR (PvDFR4) is located on Pv11 (Pv11:49726238..49727361) and has only four exons and three introns. Figure 28 compares the four DFR gene sequences from the start to the stop codons, with lower case letters designating the introns. PvDFR1 shows high similarity to PvDFR3 and PvDFR2, with 82% and 90% identical nucleotides, respectively (Fig 28). Although a search of the NCBI protein database finds PvDFR4 to be very similar to DFR enzymes in other plant species, its nucleotide sequence shows very low similarity to the other members of this gene family in common bean.
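Percent identity between two aligned sequences can be computed as sketched below. The text does not state which convention was used for the reported values, so this sketch assumes one common choice: columns containing a gap in either sequence are skipped, and identity is the fraction of the remaining columns that match. The function name `percent_identity` and the toy sequences are illustrative, not taken from the alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over ungapped columns of a pairwise alignment.

    Assumption: columns with a gap in either sequence are ignored.
    Comparison is case-insensitive so that lower-case intron letters
    are treated like upper-case exon letters.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: five ungapped columns, four of which match.
print(percent_identity("ATG-CA", "ATGGCT"))
```

Other conventions (for example, counting gapped columns as mismatches) give lower values, so the convention should be stated alongside any reported percentage.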

PvDFR2 1 ATGGGTTCAACTTCCGAATCCGTTTGCGTCACCGGAGCTTCTGGTTTCATAGGATCATGG PvDFR4 1 ATGGAAGAAAGCAAAGGAAGAGTGTGTGTGACAGGAGGTACAGGTTTTATTGGTTCATGG PvDFR3 1 ATGGGTTCAGAATCCTTAACCGTTTGCGTTACAGGGGCTTCTGGTTTCATCGGATCATGG PvDFR1 1 ATGGGTTCAGTGTCTGAAACTGTTTGCGTCACCGGAGCTTCTGGTTTCATCGGTTCATGG

PvDFR2 61 CTTGTCATGAGGCTAATGGAGCGTGGCTACACCGTTCGAGCCACCGTACGCGACCCAGgt PvDFR4 61 ATTATCAAGACCCTCCTTCAAGATGGTTACTCTGTTAACACCACTGTGAGAAAC------PvDFR3 61 CTTGTCATGTCACTCATCCAACGTGGCTATACTGTTCGAGCCACTGTTATTGACCCAGgt PvDFR1 61 CTTGTTATGAGACTCCTCGAGCGTGGCTACACCGTTCGAGCCACCGTTCGTGACCCAGgt

PvDFR2 121 ctcattcttacatcacttttttttttaatttctaagaattatctacaagtatttcaaatt PvDFR4 115 ------PvDFR3 121 tc------PvDFR1 121 ctcattctt------

PvDFR2 181 ttgatacattcttacaattgagaagttcggacactcatcttaaatggtgtgtcggtgtcc PvDFR4 115 ------PvDFR3 123 ------PvDFR1 130 ------

PvDFR2 241 gatacgaatatcggacaccgacactcgtatgacacatgtaggacatgtatccgtgaaatg PvDFR4 115 ------AATCCAGgtaacacaaaagcttctcacatg PvDFR3 123 ------atctcactctctaaattcatctttttcatc PvDFR1 130 ------aaccatgttaattctttctttttatatca

PvDFR2 301 tcaaatttaaaaagtatttgttagatttctgacaattctagtacggttctaacacaattt PvDFR4 145 ctaaacatttatagaatc------PvDFR3 153 ttcctcataccctaaact------PvDFR1 159 tgcaacctaacatataggtc------

PvDFR2 361 taaaaaagaaaaatacattaattttctaaaaattcaaactttattgtataaattgttatt PvDFR4 163 ------PvDFR3 171 ------PvDFR1 179 ------

PvDFR2 421 atgattataaaaaaaaaaaaaatctttgtgaaccaatcatgaaaaacatttttctgctct PvDFR4 163 ------agaacattctccttcttt PvDFR3 171 ------aggtctttttgttcac PvDFR1 179 ------gaaggttgtggggtttct

PvDFR2 481 aaaaaataatctgtaacatacttatgcaaataaatctttattgtcaatttatataattta PvDFR4 181 ttattataagcttt------tcatgttataggatttc PvDFR3 187 ccatttgaatttctggtttcattctactctcaagtttttaaatttttatctgggtatcta PvDFR1 197 gtatctagtattttatacgtgtttttgggtagtgtcatatagggtctggtggatctttgt

PvDFR2 541 taattatataatatatatagatctgtgtcctcgtgtcctacatttcaaagattttacgta PvDFR4 212 aaatcaaatgctacactgttcttagttgtttcctgaacttaatttgac------PvDFR3 247 aaatgtagctcagatctctctcactttaccatgtcaacttgattatagatgtatgaagat PvDFR1 257 tggtgcaagaaaacttgtttttgattttttataagaattgggtatccc------

PvDFR2 601 tattcgtgttcgtgtcgtatgaatgttgtatgagtgtctgtgtcagtgtttgtactgcat PvDFR4 260 ------catatatattgaat PvDFR3 307 tgtgagtgatgcttatgtgaactagtgttgtggttagttcaagaatttgactgaaaagtt PvDFR1 305 ------tgaag

PvDFR2 661 agactaagactccgtcatggttttacatagttttaatttatactaaacatatataactta PvDFR4 274 attcatattgattcccagatacccttttggtgtgggttcatactagacattgctgcttga

PvDFR3 367 aactttatctgtgagagacactttattatatattattaaagaaaatctttagatagtaga PvDFR1 310 tatctcattttaaacttgttttttatatcatgcaacctaacatataggttgaaggttgta

PvDFR2 721 aatttattgataatgataaattccatgaatatttggtatcaatatacagttataaaaaaa PvDFR4 334 agggtgggaatttttttacttgtttttggcatttagtttcaatg------PvDFR3 427 gaccatgttgaaataaaaaattatacacaactttttattcaacattcaataaattatttt PvDFR1 370 gggtctctgtatctcatattttatacgtgtttttggatattgtc------atgtagg

PvDFR2 781 atcagtttggaagctctacaacgatggtatcataaaaagt--actgttatacgagtattg PvDFR4 378 atttgtgcataagctgtgtgttgaaattgatgtttaaatg--attgtggtgtgtttctga PvDFR3 487 cttaacttttgctttctgtatttttatttatttttgtacttagttattgtcatcgcatta PvDFR1 421 gtctggtccttgctgatgcaagaaaacttgtttttggtttt--ttataaggattaggata

PvDFR2 839 tgtagaattgttagatgagattcagcacgtgaagaagaagggagatatatttttgtttgt PvDFR4 436 tatggtggtccttttggaattgcc------aaagaaacaacattcttggttga PvDFR3 547 tgtgtgctgacatgacatgttatgataaaatagttagaaatgaaaatagttttcttaaaa PvDFR1 479 ccctgaagtacctcattttattttatttattctttgatgataaaaaaaaatcatgcaacc

PvDFR2 899 tttacactgaatacaaccctccatcgtaaagtcaaacagagaaactatcttccattcaat PvDFR4 483 tttgagtaagattgccccc------aatggaaaaaagaacagcactcttaatctctct PvDFR3 607 cattacacagattattacttaaaattttcaggaaccgaatacagaatttataactatttt PvDFR1 539 taacactttcatgttatcactgttttgctgataaaaaaaaaaagttatcaatgtttctga

PvDFR2 959 gcactttcttttggatgaacatcctcattttataatatataaataaatacaaatcttatt PvDFR4 535 aaatgttcttt------PvDFR3 667 aaagataaaaattaatagaa------PvDFR1 599 aagttggtatt------

PvDFR21019 tttaagttattttttaagatttaattaggtttattttttaatacggtacgatagtctttt PvDFR4 546 ------PvDFR3 687 ------PvDFR1 610 ------

PvDFR21079 agagtttaagattttattgaatttattaagtcatccgttttcagacatctcttaaatata PvDFR4 546 ------PvDFR3 687 ------PvDFR1 610 ------

PvDFR21139 tagtcttgcatctgaattgattggtttcgatgtaagagtgtgtattaaaaatcatatatc PvDFR4 546 ------PvDFR3 687 ------taaatacaaaattca PvDFR1 610 ------

PvDFR21199 attagaaataaaatatttcatagtatataagtgaatgtaaaattcggtttattaggttat PvDFR4 546 ------atttgatttagtttgcaatttgattcactgaaccaa PvDFR3 702 gaaatttttcaaagactacaattcattcactataaaataaaaattatttttgaaaacaat PvDFR1 610 ------ttgaaggagtattatgtgttattttttgattatt

PvDFR21259 aattaaatccattttctaataaatttcaacccaaaatcttgattacaaaataaatgttat PvDFR4 582 aactgca------PvDFR3 762 aaatatagttacttggtgaattaatttagattctgagattg------PvDFR1 644 acatgcat------

PvDFR21319 ctaacctgctatgaaactattgaaaatatataattatgtagtgttttatgttaattatta PvDFR4 589 ------PvDFR3 803 ------PvDFR1 652 ------

PvDFR21379 tatacacgtgcatgtataccctgcagAAAACATGAAGAAGGTGAAGCATTTGGTGGAACT PvDFR4 589 -----attttgatgtttaaaacaaagAACACAAGAAGGATCTTAGCTTTCTCACCGGCTT PvDFR3 803 ----tattctatatgatgtggtgcagATGATACGAAGGAGGTGAAGCATTTGCTGGAGAT PvDFR1 652 ------gtgtagGGAACATGAAGAAGGTGAAGCATTTGGTGGAGCT

PvDFR21439 TCCAGGTGCAAAGACGAAACTGTCTCTATGGAAAGCTGACCTTGCTGAAGAGGGAAGCTT PvDFR4 644 ACCAGGAGCATCTCAAAGGCTACAAATTCTGAGTGCTGATCTCAGCAATCCAGAAAGCTT PvDFR3 859 AGGAGGTGCAAAGAGCAAGCTTTCACTGTGGAAGGCTAACCTTGAAGAAGAGGGAAGCTT PvDFR1 692 GCCTGGTGCAAAGACGAAACTGTCTCTGTGGAAGGCTGATCTTGGTGAAGAAGGAAGCTT

PvDFR21499 TGACGAAGCCATTAAAGGCTGCACAGGAGTTTTCCACGTCGCAACCCCCATGGATTTCGA PvDFR4 704 CAATGCAGTCATTGAAGGATGTGTTGGAGTTTTCCATGTTGCTACCCCAGTTGACTTTGA PvDFR3 919 TGATGAAGCCATTAAAGGGTGCATTGGAGTTTTCCACTTGGCCACCCCCATTAACTTTGA PvDFR1 752 TGATGAAGCCATTAAAGGGTGTACTGGAGTTTTTCATGTGGCAACTCCTATGGACTTTGA

PvDFR21559 GTCCAAGGACCCTGAGgtaccaattgttcttttataactacttctc------PvDFR4 764 ACTAAGAGAACCAGAAGAA------PvDFR3 979 ATCCAAAGACCCTGAGgtcttcattctatcatactttttctttc------PvDFR1 812 GTCTAAGGACCCTGAGgtacaacataacgtatatattacagtcaaaatatctgatgatca

PvDFR21605 ------PvDFR4 783 ------PvDFR31023 ------PvDFR1 872 tcgacgatattttataatatataagtaaatacaaactttattttataaacatatttaaag

PvDFR21605 ------PvDFR4 783 ------PvDFR31023 ------PvDFR1 932 ttcacttcttaatatgatatatacagttatttagaatttatattaatgagagtttgttag

PvDFR21605 ------PvDFR4 783 ------PvDFR31023 ------PvDFR1 992 acttatcaagttacgtgttatcgagactcataatatatagtcttatgagttgtcagtctc

PvDFR21605 ------tgaaaattttgaattaaggtaccaa PvDFR4 783 ------PvDFR31023 ------ttctcatttttctcaaatt PvDFR11052 tgcatgagcatgaacgagtgtgttaaagatccacatcgaataaaaatttatagtgtatat

PvDFR21630 ttgttgttttataactacttctctggaaattttgaattaagttgttataatctatatagt PvDFR4 783 ------PvDFR31042 tattaaggcaaagactaataccatcaaataacttcaatggtgatctttttctgttttaca PvDFR11112 aagtgaatgtaaactttatcttataaatcgattttataaagttgaattaagtttaaatta

PvDFR21690 acaaatacgacatggtatatactatttaagatatgtgtctgagtgttttataattataac PvDFR4 783 ------PvDFR31102 ttagatctgagtaaagtcttctttgtggtatatgatttagagtgaagtatgataaattaa PvDFR11172 cattttttttaataatcttttgaaactatagagaagaagttatgaaaatgatggattgaa

PvDFR21750 acggttagaaaaattgagtgaaaatgcagAATGAAGTGATAAAGCCTACAATAAAGGGGT PvDFR4 783 ------GTAGTGACCAAAAGATCCATTGATGGTG PvDFR31162 tctggtttaaaatttgattggaaatgcagAATGAAGTGATAAAGCCTGCAATAAGGGGAG PvDFR11232 tctgagaaagttgatggcatgtggtgcagAATGAAGTGATAAATCCTACAGTGAACGGAA

PvDFR21810 TAGTGGATATCATGAAAGCATGCGTGAAGGCCAAAAGTGTGAGAAGGATTGTCTTCACAT

PvDFR4 811 CACTTGGCATTTTGAAGGCATGCCTGAATTGCAAGACTGTGAAACGAGTTGTTTACACCT PvDFR31222 TAATAGATATCATGAAAGCATGCTTGAAGGCAAAAAGTGTGAGAAGGCTTGTATTCACAT PvDFR11292 TTCTAGACATCATGAAAGCATGCATGAAGACAAAGACAGTGCGAAGGCTTATATTTACAT

PvDFR21870 CCTCAGCTGGAACTGTGG---ATGTTGCCGAGAAGTCAAAGCCTTTTTATGATGAGAACT PvDFR4 871 CTAGTGCCTCTGCTGTGGTTCATGGTGGCACAGAAGAACAACAAGTGATGGATGAAAGCT PvDFR31282 CCTCAGCCATAACCACCC---AAATTTCTCATCACCAAAAGCCTCTGTATGATGAGACCT PvDFR11352 CCTCAGCAGGAACCCTTA---ATGTTTTTGAGCACCAAAAGCCTGTTATGGATGAAACCT

PvDFR21927 GTTGGAGTGATGTTGAGTTCTGCAGAAGGGTCAAAATGACTGGTTGGgttagttcattct PvDFR4 931 CTTGGACTGATGTGGATCTTCTTAGAACTTCAAAGGCATTTGGTTGGAGTTATGCA---- PvDFR31339 GTTGGACTGATGTTGAGTTATGCAGAGCAGCCAAGATGACTGGTTGGgttagtacttttt PvDFR11409 GTTGGAGTGACGTTGACTTTTGTAGGAGAGTCAAGATGACTGGTTGGgttagtcattttt

PvDFR21987 atc------PvDFR4 987 ------PvDFR31399 ctcaccaactacgtgaggatcaattatcgtatttgagattgtccgtttcggagtcgatgt PvDFR11469 tttctcacaatgtcaatttttttttattaaaaaagaataaataatataaaacaaaatatt

PvDFR21990 ------PvDFR4 987 ------PvDFR31459 tttgtcacagttttgtttttaaaaggcacgtggattaaggaatgtatttaaactgttatt PvDFR11529 tgagatatatcccaattcttataaaaaaaactaaaacaaattttcttgcacctgattagt

PvDFR21990 ------PvDFR4 987 ------PvDFR31519 gac------ctc PvDFR11589 taaagatctattggaccttacacgatattatccaaaatcttatattagaacaaaatacag

PvDFR21990 --tatttatcttggtgttgttttttttcatacgacattaaggtttaaggtttagtgttta PvDFR4 987 ------PvDFR31525 gactcctaaatgtcgtgtaactatattttgaccggaacaataatcatttatttttcaatg PvDFR11649 aattcctaacatcttgtaccattcaaattttcatatttgaacacaaaactagagcaaatt

PvDFR22048 agatttttattgattttctatcgtgtttctttgtagATGTATTTTGTTTCAAAGACACTG PvDFR4 987 ------GTTTCAAAGACATTG PvDFR31585 gataaagtttttatttttatctttgtttgaatgcagATGTATTTCGTTTCCAAAACTCTG PvDFR11709 aaactacaccgtgcatttttttttcttgatgtgtagATGTATTTCGTTTCTAAAACACTG

PvDFR22108 GCGGAGAAAGAAGCTTGGAAATTTGCCAAAGAGCATAACATAGACTTTGTCTCAATCATT PvDFR41002 ACAGAGAAGGCAGTGCTTGAATTTGGAGAACAAAATGGATTGGAAGTTGTGACTCTGATT PvDFR31645 GCAGAGCAAGAAGCATGGAAATTTGCCAAAGAAAAAGGAATGGACTTTGTCACTATCCTT PvDFR11769 GCTGAGAAAGAAGCTTGGAAATTTGCCAAAGAGCATGGCATGGACTTCATCACTATCATT

PvDFR22168 CCACCTCTTGTTGTTGGTCCTTTTCTCATGCCTACAATGCCACCAAGCCTAATCACTGCT PvDFR41062 CCAACTTTTGTTTTTGGACCCTTCATTTGTCCAAAGCTTCCTGGCTCAGTTCAAGCTTCA PvDFR31705 CCAACTCTCGTTGTTGGCCCTTTTCTGCTCCCATCAATGCCATCTAGCTTAATCACTGCA PvDFR11829 CCACCTCTTGTTGTTGGTCCCTTTCTCATGCCAACAATGCCACCTAGCCTAATCACTGCT

PvDFR22228 CTTTCACTCATCACAGgtgcattacac------PvDFR41122 TTGAAATTCTCATTTGGTCAgtgatgc------PvDFR31765 CTTTCCCCTATCACAGgt------PvDFR11889 CTTTCGCCCATCACAGgtaacttatatcactacatatttttttatgagcaacgaattaaa

PvDFR22255 ------PvDFR41149 ------PvDFR31783 ------

PvDFR11949 tgaataaattgaaacatacatttcagatataagtaatgatactttaacaatctgctccaa

PvDFR22255 ------PvDFR41149 ------PvDFR31783 ------PvDFR12009 tacaccatctataattaataatacgaagaaaggttatttttgtcacaacttgctccaata

PvDFR22255 ------PvDFR41149 ------PvDFR31783 ------PvDFR12069 caccactcggtgcaagttttaaacaagaaaatggaattataaactgatacaagttgtcaa

PvDFR22255 ------PvDFR41149 ------PvDFR31783 ------PvDFR12129 tatatcatttctctaagaaatattctaacatttataaaaaaaagtttgagagatatattt

PvDFR22255 ------ga PvDFR41149 ------aa PvDFR31783 ------PvDFR12189 acctcaataaacattacaaaaagaaccctatactccattgtcactaaacacaatataaaa

PvDFR22257 ggagaaccaacaaacattcagaatcaatcacacataatttaatcactttctgtttacgat PvDFR41151 tttgattatcctattttctaccatttatcgtgtaacagatagttaagaaaatgtaaatat PvDFR31783 --aaatttacaaaaattctgaaacacattcttattgatttgttcaaattattgttaaaat PvDFR12249 ccttacaaaaagaatctataaaataattaaaacgtaaattagttagtgaaattgtaacta

PvDFR22317 ttcaacttgcattaatatgttggaatgatacttggtttttgaa------PvDFR41211 atagaaagaaatttaca------PvDFR31841 ttgtgaagaaattttgatat------PvDFR12309 ataaactttgttttaaatgtgattgatacatttatcactatgaagttatgtgaatagatt

PvDFR22360 ------PvDFR41228 ------PvDFR31861 ------PvDFR12369 gttcttactttttcacattcttatataataagaatatgaattttatgagtggcactgaaa

PvDFR22360 --ttagcagGAAATGAAGGGCATTACCATATCATAAAGCAAGGCCAGTTCGTGCACTTAG PvDFR41228 ------cagGAGAAAAAAGTGGATTTGATTCCTTGCTTGAGACACCAATGGTGCATGTGG PvDFR31861 -----gcagGAAAAGAGCAGCATTATTCGATCATAAGACAAGGTCAATTGGTGCACGTAG PvDFR12429 atttggcagGAAATGAGGGCCACTATTCGATCATAAAGCAAGGCCAATTCGTGCACTTGG

PvDFR22418 ATGACCTTTGTCTTGCTCATATATTTTTGTTCGAGAATCCAAAAGCAGAAGGGAGGTACA PvDFR41282 ATGATGTGGCTAGAGCACATATATTTCTGCTGGAGAATCCTAATTCAAAAGGGAGGTATA PvDFR31916 AAGATGTTTGTCGTGCTCATATATTTCTATTTGAAGAGCCAAAAGCTGAAGGAAGATACA PvDFR12489 ATGATCTATGTCTTGCTCACATATTTCTGTTTGAAGAACCAAAAGTTGAAGGGAGGTACA

PvDFR22478 TATGCTGTTCGCATGAGGCTACCATTCATGACATTGCAAAACTGCTGAACCAGAAATACC PvDFR41342 ATTGTTCAAAATGTTTGGTTACTTATGAAAGGATCTCTGAAATTGTTTCTGCCAAATACC PvDFR31976 TATGCAATGCATGTGATGTCACTATTCATCACATTGCAAAATTAATTAACAAAAAATACC PvDFR12549 TATGCAGTGCATGTGACACTACTATTCATCACATTGCCAAACTAATCAATGAAAAATACC

PvDFR22538 CTCACTATAATATCCCCACAAAgtaagacttctcaatttttgtcatttctatcaattgaa PvDFR41402 A------PvDFR32036 CAGAATACAAGGTTCCAACTAGgtaactctc------tcaa PvDFR12609 CTGATTACAACATCCCCACCAAgtaagtctcc------

PvDFR22598 ttcattgcatgtagagaatgttatagaaataataataatacaagttagtagactttaaac PvDFR41403 ------AGAATTTAAGC PvDFR32071 caaatttcttacatcataactaatttaattctcaaatattttttataaatgggtttataa PvDFR12641 ------caattaaccaataag

PvDFR22658 ctaacacaattttgtaaaattgacttataagatgaggtttgcagtctcacttgtatatta PvDFR41414 CAGAGACAGTAGAgtgag------PvDFR32131 ttaacttactatcacaattttaacttcttaatttaaagtt------tgtatatta PvDFR12656 ctttcaattttcacatattttatcacctaattttttctctacatcttttaccattaggtt

PvDFR22718 tgaaatatttttatctccagtcgatatggaatctccatcatactcctttacatcgagact PvDFR41432 ------tttctgttgaacttcatgttgtgacacatttgacagttcttcttgatttgagc PvDFR32180 taatttgattatatatttagttcgtatacaatcttcattatatttcttcatactttaaca PvDFR12716 tttcaaatttatacattttactatcataaaaataccattttgtaaaattcttgttcataa

PvDFR22778 atctagtcgatatgggattttcattacacccctttacgttgagtctgtttaatcgatgtg PvDFR41485 atttggtttttctttgatttcattgcaacaagtgtttggtttatttttttatttattgta PvDFR32240 ttaaatatgaaaatcaatataaattcggtagtagatgatatgatcatcttaacaaactta PvDFR12776 ataatgttgtttatgtacatcaaacatatcaataaattaatgtgcaaaaatataaacatg

PvDFR22838 ggatatacatcaacacccctttacgtcgagac------PvDFR41545 gATGTTTCAACAAAAT------PvDFR32300 atataaaagataaattattaagaagagaacacattccatctaaattaaattcactcttaa PvDFR12836 catgcataaaaaattcacaatagtatttgcaa------

PvDFR22870 ------tgtaactcgtgtgtatgactatatttattagtagtctgata PvDFR41561 ------PvDFR32360 attttactattttaactttcattaataatattcttaaaattagttctagtaaaattagaa PvDFR12868 ------aatgataatatt

PvDFR22911 acggtacgacagaaggaaagagacctaataaagccaataaatcttgttagaataggttct PvDFR41561 ------PvDFR32420 attagattttacaattaactgaacctctaaaaccatgttaaaatctaccattactataac PvDFR12880 attttttttatcagtaaaaaataataaataaatgagaaccacttaaggatgatccaattc

PvDFR22971 aaatagttttaatatcatattaagaagttgattttaagtctaattcaattttacaaaatc PvDFR41561 ------PvDFR32480 atttttaaataaaaattaattagttattatcgattaaactagtttttattatagataaat PvDFR12940 ttaaaaaaacgtcagtcttttcctaacaagacgaacatctcctaccaaccgaaaacacaa

PvDFR23031 ggtttataagatgagatttgtatagtataatattagatgaatatgtatggatcatggatg PvDFR41561 ------PvDFR32540 aatttatagattaacatctaattaattaccaaaattttagatatcaattatttagattct PvDFR13000 tttctgaaaatcttaaaccttaaaccctgaatcttaaaccctaaatatacaactaaagca

PvDFR23091 agttaaaagtgtatatatgctgtactaattgtggtcattttgatgcaGGTTCAAGGATAT PvDFR41561 ------PvDFR32600 aaagttaaaaactaaaattctaatacacttatatctgatt---tttaGGTTTGAGAAGAT PvDFR13060 taacattaatatagggttgacctaatattgtacacattggatttctaGGTTCAAGAATAT

PvDFR23151 TCCAGATGAATTGGAAATTATTAGATTTTCTTCGAAGAAGATCACAGACATGGGCTTCAA PvDFR41561 ---AAAAGGTGTGAAGATACCAGATTTATCATCAAAGAAGCTCATAGATGCTGGATTTGT PvDFR32657 TCCTGATGAATTGGAGTTGGTGAGGTTATCTTCCAAGAAAATAAGAGAGTTGGGATTCAA PvDFR13120 TCCAGATGAATTGGAGCTTGTGAGATTTTCTTCAAAGAAGATCAGAGACTTGGGATTCCA

PvDFR23211 GTTTGAGTACAGCTTAGAGGATATGTTCGCAGGAGCTGTTGAGACCTGCAGAGAAAAAGG PvDFR41618 GTTCAAGTATGGAATTGAGGAGATGCTTGATGATGCAATCCAATGCTGCAAGGAAAAAGG


PvDFR32717 ATTCAAATACAGCTTGGAGGATATGTATACAGAAGCAATTGATGCATGCAAAGAAAAAGG PvDFR13180 ATTTAAGTACAGCTTAGAGGACATGTACTGTGGAGCAATTGACACATGCAGAGACAAAGG

PvDFR23271 GCTTCTTCCTCAACCTGCTCAAACTCCTGTTAATGGCACCATGCACAAATAG------PvDFR41678 GTTACCTCTTAAGTAA------PvDFR32777 GTTTCTTCCTAAACATGCTTAG------PvDFR13240 GCTTCTTCCTAAACCTAAACCTAAACCTGCAGAAACTCCTCTCAGTAGTATCATTCAGAA

PvDFR2 ------PvDFR4 ------PvDFR3 ------PvDFR13300 TGCAGAAACCTCCATGAATGGCATCATTCAGAACTAA

Figure 28. Genomic DNA sequence comparison of PvDFR1, PvDFR2, PvDFR3 and PvDFR4 of P. vulgaris using the ClustalW2 program. Lower case indicates intron sequence. Dark shading indicates positions where the sequences are identical and lighter shading represents positions with lower conservation.

Figure 29 shows an alignment of the deduced amino acid sequences of bean DFR family members. The NADP-binding domain and the region determining substrate specificity of DFR enzymes are indicated (Beld et al, 1989; Johnson et al, 2001). PvDFR2 has an aspartic acid residue at position 134, as was observed for the Petunia and Populus proteins (Johnson et al, 2001), whereas PvDFR1 is similar to Gerbera and some Lotus DFRs with asparagine at the same position (Johnson et al, 2001).

PvDFR1   1 MGSVSETVCVTGASGFIGSWLVMRLLERGYTVRATVRD-PGNMKKVKHLVELPGAKTKLS
PvDFR2   1 MGSTSESVCVTGASGFIGSWLVMRLMERGYTVRATVRD-PENMKKVKHLVELPGAKTKLS
PvDFR3   1 MGSESLTVCVTGASGFIGSWLVMSLIQRGYTVRATVID-PDDTKEVKHLLEIGGAKSKLS
PvDFR4   1 MEESKGRVCVTGGTGFIGSWIIKTLLQDGYSVNTTVRNNPEHKKDLSFLTGLPGASQRLQ

PvDFR1  60 LWKADLGEEGSFDEAIKGCTGVFHVATPMDFESKDPENEVINPTVNGILDIMKACMKTKT
PvDFR2  60 LWKADLAEEGSFDEAIKGCTGVFHVATPMDFESKDPENEVIKPTIKGLVDIMKACVKAKS
PvDFR3  60 LWKANLEEEGSFDEAIKGCIGVFHLATPINFESKDPENEVIKPAIRGVIDIMKACLKAKS
PvDFR4  61 ILSADLSNPESFNAVIEGCVGVFHVATPVDFELREPEEVVTKRSIDGALGILKACLNCKT

PvDFR1 120 VRRLIFTSSAGTLNVFEHQ-KPVMDETCWSDVDFCRRVKMTGWMYFVSKTLAEKEAWKFA
PvDFR2 120 VRRIVFTSSAGTVDVAEKS-KPFYDENCWSDVEFCRRVKMTGWMYFVSKTLAEKEAWKFA
PvDFR3 120 VRRLVFTSSAITTQISHHQ-KPLYDETCWTDVELCRAAKMTGWMYFVSKTLAEQEAWKFA
PvDFR4 121 VKRVVYTSSASAVVHGGTEEQQVMDESSWTDVDLLRTSKAFGWSYAVSKTLTEKAVLEFG

PvDFR1 179 KEHGMDFITIIPPLVVGPFLMPTMPPSLITALSPITGNEGHYSIIKQGQFVHLDDLCLAH
PvDFR2 179 KEHNIDFVSIIPPLVVGPFLMPTMPPSLITALSLITGNEGHYHIIKQGQFVHLDDLCLAH
PvDFR3 179 KEKGMDFVTILPTLVVGPFLLPSMPSSLITALSPITGKEQHYSIIRQGQLVHVEDVCRAH
PvDFR4 181 EQNGLEVVTLIPTFVFGPFICPKLPGSVQASLKFSFGEKSGFDSLLETPMVHVDDVARAH

PvDFR1 239 IFLFEEPKVEGRYICSACDTTIHHIAKLINEKYPDYN-IPTKFKNIPDELELVRFSSKKI
PvDFR2 239 IFLFENPKAEGRYICCSHEATIHDIAKLLNQKYPHYN-IPTKFKDIPDELEIIRFSSKKI
PvDFR3 239 IFLFEEPKAEGRYICNACDVTIHHIAKLINKKYPEYK-VPTRFEKIPDELELVRLSSKKI
PvDFR4 241 IFLLENPNSKGRYNCSKCLVTYERISEIVSAKYQEFKPETVECFNKIKGVKIPDLSSKKL

PvDFR1 298 RDLGFQFKYSLEDMYCGAIDTCRDKGLLPKPKPKPAETPLSSIIQNAETSMNGIIQN
PvDFR2 298 TDMGFKFEYSLEDMFAGAVETCREKGLLPQPAQTPVNGTMHK------
PvDFR3 298 RELGFKFKYSLEDMYTEAIDACKEKGFLPKHA------
PvDFR4 301 IDAGFVFKYGIEEMLDDAIQCCKEKGLPLK------

Figure 29. Alignment of deduced amino acid sequences for DFR genes of common bean. Identical amino acids are highlighted in dark gray and similar amino acids in light gray. The NADP- binding domain is underlined. Boxed amino acids have been considered to control the substrate specificity of DFR enzyme and the amino acid residue that is especially important for this specificity is indicated by an arrowhead.

[Figure 30 image: neighbor-joining tree of DFR proteins. Tip labels: LjDFR1, LjDFR2, LjDFR3, LjDFR4, LjDFR5, LcDFR3, GmDFR1, GmDFR2, GmDFR3, PvDFR1, PvDFR2, PvDFR3, PvDFR4, MtDFR2, VvDFR, MdDFR, FaDFR, RhDFR, DcDFR, VmDFR1, VmDFR2, CsDFR, CcDFR, GhDFR, GtDFR, ThDFR, PfDFR, AmDFR, FiDFR, IpDFR, InDFR, PhDFR, StDFR, SlDFR, AtDFR, TmDFR, HvDFR, OsDFR, LhDFR, ChDFR and BfDFR. The monocot clade is labeled and bootstrap values are shown at the nodes. Scale bar: 0.1.]

Figure 30. A neighbor-joining phylogenetic tree of dihydroflavonol 4-reductase (DFR) amino acid sequences. Sequences were aligned using ClustalW and the tree was inferred using the neighbor-joining method (MEGA4) with 1,000 bootstrap replicates.

PvDFR3 protein analysis

The deduced amino acid sequence of PvDFR3 was compared to the protein database to determine the presence of conserved domains (Fig 31). There is a conserved flavonoid reductase (FR) domain spanning amino acids 8 through 302. FRs act in the NADP-dependent reduction of flavonoids in plant secondary metabolism. They have a characteristic active site and an NADP-binding motif. A characteristic of this domain is the presence of a Rossmann fold core region (an alpha/beta folding pattern with a central beta-sheet) (Rao and Rossmann, 1973) (Fig 31). The Rossmann fold is in the N-terminal domain, whereas the C-terminal domain participates in substrate binding (Petit et al, 2007).

MGSESLTVCVTGASGFIGSWLVMSLIQRGYTVRATVIDPDDTKEVKHLLEIGGAKSKLSLWKAN
LEEEGSFDEAIKGCIGVFHLATPINFESKDPENEVIKPAIRGVIDIMKACLKAKSVRRLVFTSSAITT
QISHHQKPLYDETCWTDVELCRAAKMTGWMYFVSKTLAEQEAWKFAKEKGMDFVTILPTLVV
GPFLLPSMPSSLITALSPITGKEQHYSIIRQGQLVHVEDVCRAHIFLFEEPKAEGRYICNACDVTIHH
IAKLINKKYPEYKVPTRFEKIPDELELVRLSSKKIRELGFKFKYSLEDMYTEAIDACKEKGFLPKHA

Figure 31. Deduced amino acid sequence of PvDFR3. Highlighted region shows the flavonoid reductase (Rossmann fold) conserved domain.

Three-Dimensional Model Analysis

Comparative modeling of the 3D structure of the PvDFR3 protein was performed using the Phyre2 server (Kelley and Sternberg, 2009). The model covered amino acids 6-326 of PvDFR3 and consisted of α-helices and β-turns (Fig 32).

Figure 32. Predicted three-dimensional structure of PvDFR3 using the Phyre2 server (100% confidence) (Kelley and Sternberg, 2009). The position of the Rossmann fold is indicated in the picture.

454-Sequencing of PV-GBa 0043K12 Clone (PvMyb15 transcription factor)

The PV-GBa 0043K12 clone was identified by probing the Clemson Genomic Institute BAC library membranes with the Myb gene clone (data not shown) and was then sequenced. The raw data consisted of 19,025 reads with an average length of 333.32 bp, giving a total of 6,341,420 bp of sequence information (Table 16). Reference-guided assembly of these reads against Pv07:12620000..14429999 resulted in a 175,524 bp contig with 19X coverage.

Table 16. Next generation sequencing of PV-GBa 0043K12 clone.

              Count    Average length   Total bases
Reads         14,197   263.43           3,739,956
Matched       13,913   264.52           3,680,290
Not matched   284      210.09           59,666
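The internal consistency of a read-mapping table such as Table 16 can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming only the counts and base totals printed in the table; the dictionary layout and the function name `mapping_summary` are illustrative and are not part of the original analysis.

```python
# Counts and base totals as printed in Table 16 (PV-GBa 0043K12).
table_16 = {
    "reads":       {"count": 14197, "total_bases": 3739956},
    "matched":     {"count": 13913, "total_bases": 3680290},
    "not_matched": {"count": 284,   "total_bases": 59666},
}


def mapping_summary(table):
    """Derive simple consistency checks and ratios from a mapping table."""
    reads = table["reads"]
    matched = table["matched"]
    unmatched = table["not_matched"]
    return {
        # Matched and unmatched reads should partition the read set.
        "counts_add_up": matched["count"] + unmatched["count"] == reads["count"],
        "bases_add_up": matched["total_bases"] + unmatched["total_bases"]
        == reads["total_bases"],
        # Fraction of reads that matched the reference.
        "matched_fraction": matched["count"] / reads["count"],
        # Mean read length recomputed from the totals.
        "mean_read_length": reads["total_bases"] / reads["count"],
    }


summary = mapping_summary(table_16)
for key, value in summary.items():
    print(key, value)
```

The same function can be applied to the mapping tables of the other BAC clones by substituting their values.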

Functional annotation of the BAC clone PV-GBa 0043K12

Analysis of the 175,524 bp sequence revealed the presence of 21 putative genes (Table 17, Figure 33), including three novel putative genes with no similarity to any genes currently described. The other putative genes identified in this clone include retroelement-associated genes (an integrase core domain containing protein, a retrotransposon protein of the Ty3-gypsy sub-class and gag-pol polyproteins) and low-copy genes: MYB29 protein, actin-depolymerizing factor 6-like, cryptochrome-2-like, poly(ADP-ribose) glycohydrolase 1-like, basic 7S globulin-like and uncharacterized proteins.

[Figure 33 image: map of the putative genes listed in Table 17 along the BAC insert, with scale marks every 20 units from 0 to 160. Key: retrotransposon elements; chloroplast DNA; homology identified coding region; mitochondrial DNA; hypothetical proteins; phenylpropanoid gene; no hit. Arrows point in the proposed direction of transcription.]

Figure 33. Schematic representation of the distribution of putative genes identified in P. vulgaris PV-GBa 0043K12 BAC clone

Table 17. Predicted genes in the BAC clone PV-GBa 0043K12 and their putative function

Position on BAC  Length (bp)  BLASTX                                                  Accession number (organism)  E value
2081-7802        5721         Uncharacterized protein                                 gi|351720911 (G. max)        5E-67
8091-21301       13210        Integrase core domain containing protein                gi|396582343 (P. vulgaris)   0
21809-32441      10632        Retrotransposon protein, putative, Ty3-gypsy sub-class  gi|14018077 (O. sativa)      4E-73
33932-37874      3942         Putative retrotransposon                                gi|353685479 (P. vulgaris)   1E-106
37940-39772      1832         No hit
40147-56328      16181        Putative polyprotein                                    gi|22213212 (O. sativa)      3E-146
56517-65916      9399         MYB29 protein                                           gi|359806618 (G. max)        2E-156
68101-75535      7434         Uncharacterized protein                                 gi|356518961 (G. max)        2E-39
75577-76505      928          Gag-pol polyprotein                                     gi|38194929 (P. vulgaris)    8E-117
76560-85327      8767         Gag-pol polyprotein                                     gi|38194929 (P. vulgaris)    0
85417-99721      14304        Uncharacterized protein                                 gi|356515310 (G. max)        3E-114
100083-112840    12757        Gag-pol polyprotein                                     gi|38194929 (P. vulgaris)    4E-32
113261-121559    8298         Gag-pol polyprotein                                     gi|38194929 (P. vulgaris)    0
121592-134038    12446        Poly(ADP-ribose) glycohydrolase 1-like                  gi|147778450 (V. vinifera)   2E-166
135134-144654    9520         No hit
144694-149819    5125         Putative gag polyprotein                                gi|353685476 (P. vulgaris)   7E-52
150084-158068    7984         Actin-depolymerizing factor 6-like                      gi|356518048 (G. max)        9E-88
162769-169128    6359         Cryptochrome-2-like                                     gi|356576533 (G. max)        0
169186-170721    1535         Basic 7S globulin-like                                  gi|356576537 (G. max)        0
171173-173502    2329         No hit
173556-175262    1706         Uncharacterized protein                                 gi|356495639 (G. max)        3E-96
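The relationship between the "Position on BAC" and "Length" columns can be illustrated with a short sketch. It assumes, as the first rows of Table 17 suggest, that length is reported as end minus start; the helper `parse_interval` and the three-row sample are illustrative only.

```python
# The first three "start-end" positions from Table 17 (bp on the BAC contig).
positions = ["2081-7802", "8091-21301", "21809-32441"]


def parse_interval(text):
    """Split a 'start-end' string into a (start, end) tuple of integers."""
    start, end = (int(part) for part in text.split("-"))
    if end < start:
        raise ValueError(f"end precedes start in {text!r}")
    return start, end


intervals = [parse_interval(p) for p in positions]

# Length as reported in the table: end minus start.
lengths = [end - start for start, end in intervals]

# Spacing between consecutive predicted genes.
gaps = [nxt[0] - cur[1] for cur, nxt in zip(intervals, intervals[1:])]

print(lengths)  # [5721, 13210, 10632]
print(gaps)     # [289, 508]
```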

PvMyb15 sequence analysis

The PvMyb15 ORF sequence is 1781 bp in size and includes two introns that are 176 bp and 804 bp in length. The three exons are 133, 130 and 538 bp in length and code for a protein with 272 amino acid residues. A homology search with the deduced amino acid sequence of this gene showed that it is similar to a group of transcription factors from other plant species (Fig 34). It has two R domains (R2 and R3) and a helix-turn-helix motif, which facilitates its binding to target DNA. Kranz et al (1998) classified the MYB proteins of Arabidopsis into 22 subgroups based on their amino acid sequence similarity. AtMYB13, AtMYB14 and AtMYB15 belong to subgroup 2, whose members contain the amino acid sequences IDxSFWSE and MxFWFD (black boxes in Fig 35) in their C-terminal region. PvMyb15 shows high similarity to these Arabidopsis proteins and may belong to the same subgroup.

Figure 34. Phylogenetic relationships among several R2R3-MYB proteins from Arabidopsis, anthocyanin-related MYB transcription factors of other plants and bean MYB15. Sequences were aligned using CLUSTALX2 and the tree was inferred using the neighbor-joining method (phyml) with 1,000 bootstrap replicates.

GmMYB29   1 MVRAPCCEKMGLKKGPWAPEEDQILTSYIDKHGHGNWRALPKQAGLLRCGKSCRLRWINY
PvMYB15   1 MVRAPCCEKMGLKKGPWAPEEDQILTSYIQKHGHGNWRALPKQA-LLRCGKSCRLRWINY
LjMYB14   1 MVRAPCCEKMGLKKGPWAAEEDEILTSYIQKHGHGNWRALPKQAGLLRCGKSCRLRWINY
LjMYB15   1 MVRAPCCEKIGLKKGPWTSEEDQILISYIQKHGHGNWRALPKHAGLLRCGKSCRLRWINY
AtMYB15   1 MGRAPCCEKMGLKRGPWTPEEDQILVSFILNHGHSNWRALPKQAGLLRCGKSCRLRWMNY
AtMYB14   1 MGRAPCCEKMGVKRGPWTPEEDQILINYIHLYGHSNWRALPKHAGLLRCGKSCRLRWINY

R2 domain    bHLH motif
GmMYB29  61 LRPDIKRGNFTIEEEETIIKLHDMLGNRWSAIAAKLPGRTDNEIKNVWHTNLKKRLLKSD
PvMYB15  60 LKPDIKRGNFTSEEEEIILKLHETLGN-WSAIAAKLPGRTDNEIKNVWHTNLKKRLLKAD
LjMYB14  61 LRPDIKRGNFTAEEEESIIKLHEMLGNRWSAIAAKLPGRTDHEIKNVWHTHLKKKLPKTE
LjMYB15  61 LRPDIKRGNFTAEEEELIIKMHELLGNRWSAIAAKLPNRTDNEIKNVWHTHLKKRLQKTN
AtMYB15  61 LKPDIKRGNFTKEEEDAIISLHQILGNRWSAIAAKLPGRTDNEIKNVWHTHLKKRLEDYQ
AtMYB14  61 LRPDIKRGNFTPQEEQTIINLHESLGNRWSAIAAKLPGRTDNEIKNVWHTHLKKRLSKNL

R3 domain
GmMYB29 121 --QSKSKPSSKRAIKPK----IERSDSN--SSIITQS-EPDNFNFREMDTITSSACTTSS
PvMYB15 119 --QSSS--NSRRVTKPK----IKRSDSN--SSIVTQS-EQTNFNFREMD--STSACTTSS
LjMYB14 121 HQQQPSSKPKIRVSKSK----IKRSDSN--SSTITQSSEPVTLSFRDME--SSSACTTTT
LjMYB15 121 ---NQSNSDRKRVSKPK----IKRSDSS--SSTLTSS------SDFSSSV
AtMYB15 121 P-AKPKTSNKKKGTKPKSESVITSSNSTRSESELADSSNPSGESLFSTSP-STSEVSSMT
AtMYB14 121 N----NGGDTKDVNGIN-----ETTNEDK-GSVIVDT------ASLQQFS

GmMYB29 172 S-----DFSSVTVGD--SKNIKSE-DTESTET-MPVIDESFWSEAAIDDETPTMSSSQSL
PvMYB15 166 ------DFSSVTVGD--SKNIKCE-DIDSLET-MPVIDESFWSEPAID-ETPSMSS-QSM
LjMYB14 173 SS----DFSSVTVGDESQKNAKSEEDTESMET-MPEIDESFWSEAAMDDEIETPSL-PSL
LjMYB15 156 N------EGVEIMD---NSIKSEEDIESLETIMPVIDESFWSEEAMDDESSTMPS-NSL
AtMYB15 179 LISHDGYSNEINMDNKPGDISTIDQECVSFETFGADIDESFWKETLYSQDEHNYVS-NDL
AtMYB14 155 N------SITTFD---ISNDNKDDIMSYEDISALIDDSFWSDVISVDNSNKNEK----

GmMYB29 223 TISNEMRLQYPFANYEETFQQGHHAYDSNFDDGMDFWYDIFT---RTNDSIELLEF-
PvMYB15 214 TFADQMPLQYPFTNYEETFQP-SHAYDSNFDDGMDFWYDIFT---RTADSTELPDF-
LjMYB14 227 TVSNELPLEEPFN-YDETFKQ-SYGSNSNFDDGMDFWYDIFI---KTEDPVELPEF-
LjMYB15 205 TVSNELQPQCSVN-SVETFQV--QSNGSKIDDGMDFWYDLYI---RSGESTELPEL-
AtMYB15 238 EVAGLVEIQQEFQ------NLGSANNEMIFDSEMDFWFDVLA---RTGGEQDLLAGL
AtMYB14 200 KIEDWEGLIDRNS------KKCSYSNSKLYNDDMEFWFDVFTSNRRIEEFSDIPEF-

Figure 35. Comparison of the predicted PvMyb15 protein sequence of P. vulgaris with homologous MYB proteins of other species. Identical amino acids are highlighted in dark gray and similar amino acids in light gray. The R2R3-binding domain is underlined. The blue box indicates specific residues that form the motif implicated in bHLH co-factor interaction in Arabidopsis (Zimmermann et al. 2004). The accession numbers of these proteins, or translated products, in the GenBank database are as follows: Arabidopsis thaliana (AtMYB14: AEC08504; AtMYB15: AEE76741), Lotus japonicus (LjMYB14: chr5.CM0071.410.nd; LjMYB15: chr6.CM1613.30.nc) and Glycine max (GmMYB29B2: BAA81736).

Three-Dimensional Model Analysis

Comparative modeling of the 3D structure of the PvMYB15 protein was performed using the Phyre2 server (Kelley and Sternberg, 2009). The model covered amino acids 12-116 of PvMYB15 and consisted of α-helices and β-turns (Fig 36).

Figure 36. Predicted three-dimensional structure of PvMyb15 using the Phyre2 server (100% confidence) (Kelley and Sternberg, 2009). R2 and R3 domains are specified.

Figure 36 illustrates the predicted 3D structure of PvMyb15 based on its homology to the ternary protein-DNA complex1 structure of human and mouse. This figure shows only residues 10 to 111.

In silico upstream cis-acting elements of PvMyb15

Motif analysis of the 1.5-kb region upstream of the start codon, which probably corresponds to the promoter, suggested that this region contains 11 types of cis-acting regulatory elements (Table 18). Interestingly, five of these (ATCT-motif, G-Box, GA-motif, Sp1 and chs-CMA1a) are involved in light responsiveness.

Table 18. Potential cis-acting elements associated with the PvMYB15 gene 1500 bp upstream of the gene start codon.

Motif                 Strand  Distance from ATG  Sequence        Function
5UTR Py-rich stretch  -       1256               TTTCTCTCTCTCTC  Cis-acting element conferring high transcription levels
                      -       1262               TTTCTCTCTCTCTC
                      -       1258               TTTCTCTCTCTCTC
                      -       1267               TTTCTTCTCT
ABRE                  +       1121               ACGTGGC         Cis-acting element involved in the abscisic acid responsiveness
ATCT-motif            -       205                AATCTAATCC      Part of a conserved DNA module involved in light responsiveness
                      -       529                AATCTAATCC
G-Box                 -       1120               CACGTT          Cis-acting regulatory element involved in light responsiveness
GA-motif              -       1471               AAGGAAGA        Part of a light responsive element
Sp1                   -       524                CC(G/A)CCC      Light responsive element
TC-rich repeats       -       167                ATTTTCTCCA      Cis-acting element involved in defense and stress responsiveness
                      +       1461               ATTTTCTTCA
TCA-element           +       586                CCATCTTTTT      Cis-acting element involved in salicylic acid responsiveness
                      -       686                CCATCTTTTT
circadian             -       206                CAANNNNATC      Cis-acting regulatory element involved in circadian control
                      -       935                CAANNNNATC
chs-CMA1a             -       551                TTACTTAA        Part of a light responsive element
ARE                   -       708                TGGTTT          Cis-acting regulatory element essential for the anaerobic induction
                      +       1299               TGGTTT
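How a consensus such as CAANNNNATC or CC(G/A)CCC (CCRCCC in IUPAC notation) can be located on both strands of an upstream region is sketched below. This is only an illustration and not the tool used to produce Table 18: the function names, the toy sequence and the distance convention (number of bases from the plus-strand 5' end of the site to the ATG) are all assumptions made for this example.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_motif(upstream, motif):
    """List (strand, distance) for every occurrence of an IUPAC motif.

    `upstream` is the sequence immediately 5' of the ATG, written 5'->3',
    so its last base lies one position upstream of the start codon.
    """
    upstream = upstream.upper()
    n = len(upstream)
    body = "".join(IUPAC[base] for base in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=(" + body + "))")
    hits = []
    for match in pattern.finditer(upstream):
        hits.append(("+", n - match.start()))
    for match in pattern.finditer(reverse_complement(upstream)):
        # Convert the minus-strand match back to plus-strand coordinates.
        hits.append(("-", match.start() + len(match.group(1))))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 15-bp upstream fragment, not taken from PvMyb15.
toy_upstream = "AACAACCGGATCTTT"
print(find_motif(toy_upstream, "CAANNNNATC"))  # [('+', 13)]
print(find_motif(toy_upstream, "AAAGA"))       # [('-', 5)]
```

A palindromic motif is reported once per strand under this convention, so duplicate hits may need to be merged depending on how the results are tabulated.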


454-Sequencing of PV-GBa 0079P22 Clone (PvPAL2)

The PV-GBa 0079P22 BAC clone was selected with the PAL2 gene clone and sequenced. The raw data consisted of 14,110 reads with an average length of 338.18 bp, giving a total of 4,771,698 bp of sequence information (Table 19). Reference-guided assembly of these reads against Pv07:36300000..37849999 resulted in a 109,205 bp contig with 21X coverage.

Table 19. Next generation sequencing of PV-GBa 0079P22 clone.

              Count    Average length   Total bases
Reads         13,071   262.61           3,432,626
Matched       12,260   262.8            3,221,920
Not matched   811      259.81           210,706

Functional annotation of the BAC clone PV-GBa 0079P22

Analysis of the 109,205 bp sequence of PV-GBa 0079P22 revealed the presence of 18 putative genes (Table 20, Figure 37), including four novel putative genes with no similarity to any genes currently described. Other putative genes identified in this clone include three gag-pol retrotransposon protein genes, an ethylene-insensitive protein, phenylalanine ammonia-lyase (PAL), phytoalexin-deficient 4-2 protein and putative receptor-like protein kinase.

The deduced amino acid sequence of PvPAL2 was compared to the protein database to determine the presence of conserved domains.

[Figure 37 image: map of the putative genes listed in Table 20 along the BAC insert, with scale marks every 20 units from 0 to 100. Key: retrotransposon elements; chloroplast DNA; homology identified coding region; mitochondrial DNA; hypothetical proteins; phenylpropanoid gene; no hit. Arrows point in the proposed direction of transcription.]

Figure 37. Schematic representation of the distribution of putative genes identified in P. vulgaris PV-GBa 0079P22 BAC clone.

Table 20. Predicted genes in the BAC clone PV-GBa 0079P22 and their putative function

Position on BAC  Length (bp)  BLASTX                                 Accession number (organism)  E value
486-7435         6949         Putative receptor-like protein kinase  gi|396582329 (P. vulgaris)   2E-116
9692-14157       4465         Phytoalexin-deficient 4-2 protein      gi|396582331 (P. vulgaris)   4E-16
14376-18606      4230         Phenylalanine ammonia-lyase class 2    gi|129585 (P. vulgaris)      0
18899-25440      6541         No hit
30101-35330      5229         Putative retrotransposon               gi|353685481 (P. vulgaris)   5E-41
35454-37134      1680         Putative receptor-like protein kinase  gi|396582329 (P. vulgaris)   4E-53
38616-42938      4322         Putative receptor-like protein kinase  gi|396582329 (P. vulgaris)   6E-128
43022-44194      1172         Putative receptor-like protein kinase  gi|396582329 (P. vulgaris)   4E-141
44753-46393      1640         Putative receptor-like protein kinase  gi|396582329 (P. vulgaris)   4E-92
46863-62675      15812        Gag-pol polyprotein                    gi|38194929 (P. vulgaris)    0
63697-66626      2929         Gag-pol polyprotein                    gi|38194929 (P. vulgaris)    4E-118
66637-71015      4378         No hit
72272-86411      14139        Gag-pol polyprotein                    gi|38194929 (P. vulgaris)    4E-160
86427-92994      6567         Hypothetical protein                   gi|147784677 (V. vinifera)   9E-138
93036-99950      6914         Hypothetical protein                   gi|147828431 (V. vinifera)   4E-33
100929-102072    1143         No hit
103339-104441    1102         No hit
104823-109127    4304         Ethylene-insensitive protein 2-like    gi|356548291 (G. max)        1E-85

PvPAL2 sequence analysis

The PvPAL2 ORF spans 3835 bp and contains one intron, which is 1696 bp long. The two exons are 389 and 1750 bp long and code for a protein with 712 amino acid residues. Fig 38 shows the amino acid sequence of PvPAL2. A conserved domain database search in NCBI with the PvPAL2 amino acid sequence supports the contention that PvPAL2 has a lyase domain (highlighted).
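The exon and intron lengths reported for PvPAL2 can be tied to the total length and the protein size with simple arithmetic. The sketch below assumes that the exon lengths include the stop codon, so the number of residues is the number of codons minus one; the function `gene_structure` is an illustrative helper, not part of the original analysis.

```python
def gene_structure(exons, introns):
    """Summarise a gene model from its exon and intron lengths (in bp)."""
    coding_length = sum(exons)
    if coding_length % 3 != 0:
        raise ValueError("exon lengths do not add up to whole codons")
    return {
        # Start codon to stop codon, introns included.
        "genomic_span": coding_length + sum(introns),
        "coding_length": coding_length,
        # Assumes the final codon is a stop codon and is not translated.
        "residues": coding_length // 3 - 1,
    }


# Exon and intron lengths as reported for PvPAL2.
pvpal2 = gene_structure(exons=[389, 1750], introns=[1696])
print(pvpal2)
# {'genomic_span': 3835, 'coding_length': 2139, 'residues': 712}
```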

MDATPNGKDAFCVTAANAAGDPLNWAAAAEALSGSHLDEVKRMVAEYRKPAVRLGGQTLTIA
QVAATAAHDQGLKVELAESARACVKASSDWVMESMDKGTDSYGVTTGFGATSHRRTKQGGA
LQKELIRFLNAGIFGNGTESNCTLPHTATRAAMLVRVNTLLQGYSGIRFEILEAITKLLNNNITPCL
PLRGTITASGDLVPLSYIAGLLTGRPNSKAVGPSGEILNAKEAFELANIGSEFFELQPKEGLALVNG
TAVGSGLASIVLFEANILAVLSEVISAIFAEVMQGKPEFTDHLTHKLKHHPGQIEAAAIMEHILDG
SSYIKAAKKLHEIDPLQKPKQDRYALRTSPQWLGPQIEVIRFSTKSIEREINSVNDNPLIDVSRNKA
LHGGNFQGTPIGVSMDNTRLAIASIGKLMFAQFSELVNDYYNNGLPSNLTASRNPSLDYGFKGAE
IAMASYCSELQYLANPVTSHVQSAEQHNQDVNSLGLISSRKTNEAIEILKLMSSTFLVALCQAIDL
RHLEENLKNTVKNVVSQVAKRTLTTGVNGELHPSRFCEKDLLKVVEREYTFAYIDDPCSGTYPL
MQKLRQVLVDYALANGENEKNLNTSIFQKIASFEEELKTLLPKEVEGARLAYENDQCAIPNKIKD
CRSYPLYKFVREELGTSLLTGEKVISPGEECDKVFSAMCQGKIIDPLLECLGEWNGAPLPIC

Figure 38. Deduced amino acid sequence of PvPAL2. Highlighted region shows the phenylalanine ammonia-lyase conserved domain.

Three-Dimensional Model Analysis

To better understand the deduced PvPAL2 protein, comparative modeling of its 3D structure was performed using SWISS-MODEL (Arnold, 2006). The template for modeling was the 3D structure of parsley PcPAL (PDB No. 1w27). The model covered amino acids 21-712 of PvPAL2 and consisted of α-helices, β-turns, random coils and extended strands (Fig 39).

Figure 39. The three-dimensional structure of the predicted protein of the PvPAL2 gene (100% confidence) (Arnold, 2006). PAL active site is shown.

454-Sequencing of PV-GBa 0061D18 Clone (PvPAL3)

The PV-GBa 0061D18 BAC clone was selected with the PAL3 gene fragment and sequenced. The raw data consisted of 50,304 reads with an average length of 337.37 bp, giving a total of 16,970,878 bp of sequence information (Table 21). Reference-guided assembly of these reads against Pv08:57942533..59662532 resulted in a 162,475 bp contig with 59X coverage.

Table 21. Next generation sequencing of PV-GBa 0061D18 clone.

              Count    Average length   Total bases
Reads         37,270   264.35           9,852,489
Matched       36,867   265.53           9,789,228
Not matched   403      156.98           63,261

Functional annotation of the BAC clone PV-GBa 0061D18

An analysis of the 162,475 bp PV-GBa 0061D18 insert sequence revealed the presence of 24 putative genes (Table 22, Figure 40), including two novel putative genes with no similarity to any genes currently described. Other putative genes identified in this clone include: phenylalanine ammonia-lyase class 3, transmembrane 9 superfamily member 4-like, chlorophyll a-b binding protein 3, coproporphyrinogen oxidase, ATPase WRNIP1-like, anthocyanidin 3-O-glucosyltransferase 1-like protein, VQ motif-containing protein, RING/U-box domain-containing protein, UDP-glycosyltransferase 82A1-like protein, putative serine/threonine-protein kinase, MATE efflux family protein 3 (chloroplastic-like), isocitrate dehydrogenase NAD regulatory subunit 1, zinc finger protein ZAT3-like and Niemann-Pick C1 protein-like.

[Figure 40 image: map of the putative genes listed in Table 22 along the BAC insert, with scale marks every 20 units from 0 to 160. Key: retrotransposon elements; chloroplast DNA; homology identified coding region; mitochondrial DNA; hypothetical proteins; phenylpropanoid gene; no hit. Arrows point in the proposed direction of transcription.]

Figure 40. Schematic representation of the distribution of putative genes identified in P. vulgaris PV-GBa 0061D18 BAC clone

Table 22. Predicted genes in the BAC clone PV-GBa 0061D18 and their putative function

Position on BAC  Length (bp)  BLASTX                                                                 Accession number (organism)  E value
1685-9978        8293         Uncharacterized protein                                                gi|356553440 (G. max)        0
10170-14782      4612         Phenylalanine ammonia-lyase class 3                                    gi|129586 (P. vulgaris)      0
15412-19466      4054         Transmembrane 9 superfamily member 4-like                              gi|356501751 (G. max)        0
19611-26997      7386         Chlorophyll a-b binding protein 3, chloroplastic-like isoform 1        gi|356554088 (G. max)        2E-141
27605-31487      3882         Coproporphyrinogen oxidase                                             gi|462260 (G. max)           3E-160
31636-34251      2615         ATPase WRNIP1-like                                                     gi|356553438 (G. max)        0
34412-38178      3766         Pentatricopeptide repeat-containing protein At3g22150, chloroplastic   gi|356499507 (G. max)        0
38888-40891      2003         Anthocyanidin 3-O-glucosyltransferase 1-like protein                   gi|396582354 (P. vulgaris)   0
41797-52860      11063        Anthocyanidin 3-O-glucosyltransferase 1-like protein                   gi|396582355 (P. vulgaris)   0
53188-60042      6854         Choline-phosphate cytidylyltransferase B-like protein                  gi|396582358 (P. vulgaris)   3E-163
60441-63548      3107         VQ motif-containing protein                                            gi|396582359 (P. vulgaris)   2E-78
63965-69970      6005         RING/U-box domain-containing protein                                   gi|396582360 (P. vulgaris)   0
70082-77492      7410         UDP-glycosyltransferase 82A1-like protein                              gi|396582346 (P. vulgaris)   0
77637-80311      2674         No hit
80693-83314      2621         Putative serine/threonine-protein kinase                               gi|396582348 (P. vulgaris)   3E-118
85526-89182      3656         Putative serine/threonine-protein kinase                               gi|396582349 (P. vulgaris)   0
89406-107493     18087        Uncharacterized protein                                                gi|356499513 (G. max)        0
109274-117209    7935         MATE efflux family protein 3, chloroplastic-like                       gi|356553429 (G. max)        0
117223-119052    1829         Isocitrate dehydrogenase NAD regulatory subunit 1, mitochondrial-like  gi|356552735 (G. max)        0
119103-121052    1949         Uncharacterized protein                                                gi|351725127 (G. max)        7E-111
122087-123547    1460         Zinc finger protein ZAT3-like                                          gi|356499523 (G. max)        2E-23
123902-125893    1991         No hit
125955-130752    4797         Protein NLP2-like                                                      gi|356499521 (G. max)        0
130884-162402    31518        Niemann-Pick C1 protein-like                                           gi|147770431 (G. max)        0


PvPAL3 sequence analysis

The ORF for PvPAL3 is 2576 bp in length and has two introns, 445 bp and 70 bp long. The three exons, which are 389 bp, 412 bp and 1260 bp in length, code for a protein with 686 amino acid residues. The deduced amino acid sequence of PvPAL3 was compared to the protein database to determine the presence of conserved domains. Figure 41 shows the amino acid sequence of PvPAL3, with the conserved lyase domain highlighted. Figure 44 specifies in detail all residues important for the active site, homotetramer interaction and phosphorylation.

MATITPQKGSQSEVGVSNTDPLNWGNAVESLKGSHLEEVKGMVAEYREAVIHVGGGETLTVSK
VAAVANQYLQAKVDLSESAREGVDSSCKWIVDNIDKGIPIYGVTTGFGANSNRQTQEGLALQKE
MVRFLNCAIFGYQTELSHTLPKSATRAAMLVRVNTLLQGYSGIRFEILEAITKLLNHNVTPILPLR
GTITASGDLIPLSYIAALLIGRRNSKAVGPSGESLNAKEAFHLAGVDGGFFELKPKEGLALVNGTA
VGSGVASMGKPEFTDHLIHKLKYHPGQIEAAAIMEHILDGSSYVKNAKLQQPDPLQKPRKDRYA
LVTSPQWLGPQIEIIRFSTKSIEREINSVNDNPLIDVTRNKAVSGGNFQGTPIGVSMDNARLAVASI
GKLIFAQFTELANDLYNNGLPSNLSVGRNPSLDYGFKASEVAMAAYCSELQYLANPVTSHVQST
EQHNQDVNSLGLISALKTVEAIEILKLMSSTYLVALCQAIDLRHLEEIFKNTVKNTVSRVALKTLT
TEDKEETNPFRFSEEELLKVVDREYVFSYIDDPLNVRYPLMPKLKQVLYEQAHTSVINDKNVSLL
VFEKIGAFEDELKSLLPKEVESARVAYENGNPATPNRIKECRSYPLYKFVREELGIRLLTGEKALS
PDEEFEKVYTAMCQAKIIDPILECLEDWNGVPIPI

Figure 41. Deduced amino acid sequence of PvPAL3. Highlighted region shows the phenylalanine ammonia-lyase conserved domain.

Three-Dimensional Model Analysis

Comparative modeling of the 3D structure of PvPAL3 was performed using SWISS-MODEL (Arnold, 2006). The template for modeling was the 3D structure of parsley PcPAL (PDB No. 1w27). The model covered amino acids 20-710 of PvPAL3 and consisted of α-helices, β-turns, random coils and extended strands (Fig 42).

Figure 42. The three-dimensional structure of the predicted protein of the PvPAL3 gene (100% confidence) (Arnold, 2006).

The PvPAL3 gene from the G19833 cultivar (Clemson Genomic Institute BAC library) encodes a polypeptide of 686 amino acids with 71% similarity to the protein encoded by PvPAL2 from the same source. At the nucleotide level, PvPAL2 and PvPAL3 showed 70% sequence similarity in exon I and extensive sequence divergence in the intron and in the 5' and 3' flanking regions (Fig 43). The transcription start sites of PvPAL2 and PvPAL3 are located 129 bp and 396 bp upstream of the initiation codon ATG, respectively (Cramer et al, 1989).
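A percentage such as those quoted above can be derived from a gapped pairwise alignment by counting matching columns. The sketch below computes percent identity over the columns in which neither sequence has a gap; this is one common convention and is an assumption here, since the scoring used for the reported similarity values is not stated. The two short aligned strings are invented for illustration and are not PvPAL sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # columns containing a gap are not scored
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments (10 columns, one gap in each sequence).
seq_a = "ATGGCA-ACC"
seq_b = "ATGG-ACGCC"
print(percent_identity(seq_a, seq_b))  # 87.5
```

Counting gapped columns as mismatches, or scoring conservative amino acid substitutions as similar, would give different values, so the convention should be stated alongside any reported percentage.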


PvPAL3 1 ATGGCAACCATTACTCCACAAAAGGGTAGCCAAA-GTGAGGTGGGTGTGAGCAACACAG- PvPAL2 1 ATGG-ACGCAACGCCCAATGGAAAAGACGCTTTCTGCGTCACTGCCGCCAATGCCGCGGG

PvPAL3 59 --ACCCCCTCAACTGGGGTAATGCAGTGGAGTCACTGAAGGGTAGCCACTTGGAGGAAGT PvPAL2 60 GGACCCGCTCAACTGGGCCGCTGCGGCGGAGGCGCTCTCGGGCAGTCACCTCGACGAGGT

PvPAL3 117 AAAAGGGATGGTGGCGGAGTACCGGGAGGCAGTGATCCATGTGGGGGGAGGGGAGACACT PvPAL2 120 CAAGCGCATGGTCGCCGAGTACCGCAAGCCCGCGGTGCGCCTCGGCGGTC---AGACGCT

PvPAL3 177 TACTGTATCAAAGGTGGCTGCTGTTGCAAACCAGTACTTGCAGGCTAAGGTTGACCTCTC PvPAL2 177 CACCATCGCTCAGGTAGCGGCCACTGCGGCGCACGACCAGGGTCTCAAGGTGGAGCTGGC

PvPAL3 237 TGAGTCTGCAAGAGAAGGCGTTGACTCCAGCTGCAAGTGGATCGTGGATAACATCGACAA PvPAL2 237 GGAGTCCGCCAGGGCCTGCGTCAAGGCCAGCAGTGACTGGGTGATGGAGAGCATGGACAA

PvPAL3 297 AGGCATTCCCATTTATGGTGTCACCACTGGTTTTGGTGCAAACTCCAACAGGCAAACTCA PvPAL2 297 GGGGACTGACAGTTACGGTGTCACCACCGGGTTCGGTGCAACCTCCCACCGCCGTACCAA

PvPAL3 357 AGAAGGCCTTGCTCTTCAGAAGGAAATGGTTAGgtga------ggatc------PvPAL2 357 ACAAGGTGGCGCCTTGCAGAAGGAGCTTATCAGgtcactataatctacaaaatctctctc

PvPAL3 399 ------actcaact----ctgct-ttgc PvPAL2 417 tctctctctctttctgtatgcattacattattacacatgctcatctgcaactggtgttgc

PvPAL3 416 ---ttgtgaa------gtcgtacactc------cttttaa-atgtcgtg PvPAL2 477 acctcgtgaatcaacgcaagaaaatagcatattttctaccacatctttcaacatgacatg

PvPAL3 449 tctgtat------cgaca---tatgtatcggatatcga------PvPAL2 537 tatatatataaacacaacaatttgtgtattacatattaaatttgttttttcttgtcttgc

PvPAL3 478 ------cattcgta------PvPAL2 597 gatcttacgaatgatgttaaaacttatttcactagtaacatctctttcttgtatggttct

PvPAL3 486 ---gaac------agtg------atgtgtc---- PvPAL2 657 atcgaacctgtaactgtagtgcagcatcgtgaaggaaaattgcatgatcatgtatcataa

PvPAL3 501 ------taattt------PvPAL2 717 aatgaaaaaaaaacatattattttacataaattaatttcacaacaagattgttttaacat

PvPAL3 507 ------aatatttg------ttgaa------PvPAL2 777 cgtttgttgtaatgtttgaaaaataagaaatgtgtgaaaaaaattgaaggatgttgtgtt

PvPAL3 520 ------tttctg------PvPAL2 837 taatatataatatatactaatagtaaaatttgttttttctactttctctgaacttttctt

PvPAL3 526 ------gtaatttaaaac-- PvPAL2 897 ccctctttgccttgtacccatcccacaccctctttcattatgagaaataattaaagatgg

PvPAL3 538 ------acattttt------PvPAL2 957 acgatctaattaatattatatgaacgttttttattattcttcttttattcccttatatat

PvPAL3 546 ------PvPAL2 1017 tattattccgttgtatttttctgtttctgttgcccattcatcattttctattgcccacca

PvPAL3 546 ------aaa------PvPAL2 1077 tgcgtcttatactttattttattactgctttcatttctcttaagtaaattattgttgctc


PvPAL3 549 ------aaaata--ca------gggtttgtat------cagtgttt PvPAL2 1137 ggctacctaaaatattcaccacctaattaacgaatttaaatgaaatattattcattgttt

PvPAL3 575 gtca--gt------acc------agtg PvPAL2 1197 ctcatagttgagaaaagaggtggataagttggctcacctaacacaggaaatttggtaata

PvPAL3 588 ttt------gtttctt------PvPAL2 1257 ttttcttttattagagagagttgtttcttatactaaaaaaattagtttagtatgctatat

PvPAL3 598 -tgtttttgctac------gtagatgtatttttc------tagt------PvPAL2 1317 ttatttttacttttaattctagttatatatgtttttttcctaatatagtaagaattttga

PvPAL3 629 ------ctgtaggaattatccaa--cttcata------PvPAL2 1377 aaagtttctggaacaactgttcaaaccttcatacaattttagataaattttttataaaaa

PvPAL3 653 gtgaaaacact------aattat----- PvPAL2 1437 ataaaaactcttcagacaaataattgatataaaaatatattataataataattatgtgta

PvPAL3 670 tctgagat--tatc------actttt---ctctttt------PvPAL2 1497 tctataataatatttatgataatttctgagctattttaaaatatatttggtgaacaaata

PvPAL3 695 ------gtctt------gtaa------PvPAL2 1557 ataaacgcggttttaataatttgttgggtggtaaattaaaagaacacgtgctggcattca

PvPAL3 704 ------gttgtatcatc--aa------cttatct---ct-----ttcaactga- PvPAL2 1617 aagagagttgtattgtctgaatcggatactaccctgatctggactcgagatcctattgag

PvPAL3 735 -taaaa------gaatcacttcat------atacaacac------PvPAL2 1677 tcgaaacggccgagtggcgagttaactcattaaatgataacacgacgcacgtgcgcgagg

PvPAL3 761 -----gaaactgc------PvPAL2 1737 ttggtgagactgccaagccctccaacctttccaagtcctaacactccatctttgacttct

PvPAL3 769 ------attgattatat------PvPAL2 1797 tagcttctcaattttctaatttattttttgtcctaatttattacacccaatcaattcaat

PvPAL3 780 ------ataat------PvPAL2 1857 tttttaattttttgtatttttttatgacatcagcacactaataatgtcaattatttattt

PvPAL3 785 ------tcttct-----t PvPAL2 1917 ttcgtttttcccctattttatcatggtgtctgttccaaaagaattatgtcttttcttggt

PvPAL3 792 tttg---gttgatgt------att PvPAL2 1977 tttgtatgttgatgtcctcacctaattccccacccaactacatcatatcttttattaatt

PvPAL3 807 catgga------tttacctctttt-----CCttcccaagGTTTTTAAACTGC PvPAL2 2037 tataaaaaagttcataattcttaatcttttttttgttcttctttcagGTTTTTGAATGCT

PvPAL3 848 GCCATATTTGGCTACCAAACGGAGTTATCTCATACACTGCCAAAATCAGCAACCAGAGCA PvPAL2 2097 GGGATATTTGGCAATGGTACAGAGTCCAACTGCACCCTACCCCACACTGCAACTAGAGCA

PvPAL3 908 GCAATGCTTGTGAGGGTTAATACCCTTCTTCAAGGGTATTCAGGTATTAGATTTGAAATC PvPAL2 2157 GCTATGCTTGTGAGAGTGAACACTCTTCTCCAAGGGTACTCAGGAATTAGATTTGAAATT

PvPAL3 968 CTAGAAGCTATCACCAAACTCCTCAACCACAATGTCACCCCCATCTTGCCCTTACGTGGT PvPAL2 2217 TTGGAGGCCATCACCAAGCTTTTGAACAACAACATTACTCCATGTTTGCCACTAAGGGGT


PvPAL3 1028 ACAATCACTGCTTCTGGTGATCTGATTCCTCTGTCCTACATTGCTGCATTGCTAATTGGT PvPAL2 2277 ACAATTACAGCATCTGGTGATCTTGTACCTTTGTCATACATTGCCGGTTTGCTAACTGGT

PvPAL3 1088 AGAAGAAACAGTAAAGCTGTTGGACCCTCTGGAGAGTCCCTCAATGCTAAGGAAGCTTTC PvPAL2 2337 AGGCCAAACTCCAAGGCTGTTGGTCCCTCTGGAGAGATTCTGAATGCTAAGGAAGCCTTT

PvPAL3 1148 CACTTAGCAGGTGTAGATGGTGGGTTCTTTGAGTTGAAGCCTAAGGAAGGTCTTGCCCTT PvPAL2 2397 GAATTGGCCAACATTGGTTCTGAGTTCTTTGAGTTGCAACCTAAGGAAGGTCTTGCCCTT

PvPAL3 1208 GTAAATGGCACAGCCGTTGGGTCTGGTGTAGCTTCTATGGTGCT-TTTGAGGCAAACATA PvPAL2 2457 GTGAATGGAACGGCTGTTGGCTCTGGCTTGGCCTCTATTGTTCTCTTTGAAGCAAACATC

PvPAL3 1267 TTAGCT-TATTGGCAGAAGTTCTATCAGCAGTTTTTGCTGAAGTGATGCAGGGAAAGCCA PvPAL2 2517 CTTGCCGTCTTGTCTGAAGTTATTTCAGCAATTTTTGCTGAAGTGATGCAAGGAAAGCCC

PvPAL3 1326 GAGTTCACAGATCACCTTATACATAAGCTGAAGTACCATCCTGGTCAAATTGAAGCAGCT PvPAL2 2577 GAGTTCACTGATCATTTGACTCATAAGCTAAAGCACCACCCTGGCCAGATTGAGGCTGCT

PvPAL3 1386 GCTATTATGGAACATATTCTAGATGGAAGCTCTTATGTCAAGAATGCTAAA---CTGCAA PvPAL2 2637 GCTATTATGGAGCACATTTTGGATGGAAGCTCTTACATCAAAGCTGCTAAGAAGTTGCAT

PvPAL3 1443 CAGCCAGATCCATTGCAGAAGCCTAGAAAAGATCGTTATGCTCTTGTAACTTCTCCTCAA PvPAL2 2697 GAGATTGATCCTTTGCAGAAACCCAAACAAGATCGCTATGCCCTTAGAACTTCACCCCAA

PvPAL3 1503 TGGCTTGGTCCACAGATTGAAATCATCAGGTTTTCGACCAAATCAATTGAAAGGGAAATA PvPAL2 2757 TGGCTTGGTCCTCAAATTGAAGTGATCAGGTTTTCTACCAAGTCAATTGAGAGGGAGATC

PvPAL3 1563 AACTCAGTAAATGACAATCCCTTGATTGATGTCACAAGGAACAAGGCTGTGAGTGGTGGT PvPAL2 2817 AACTCAGTCAATGACAACCCTTTGATTGATGTGTCTAGGAACAAGGCCTTGCATGGTGGT

PvPAL3 1623 AATTTCCAAGGCACCCCAATTGGAGTTTCCATGGATAATGCACGTTTAGCTGTTGCTTCA PvPAL2 2877 AACTTCCAAGGAACTCCAATTGGAGTCTCCATGGATAACACCCGTTTGGCTATTGCTTCA

PvPAL3 1683 ATTGGCAAACTCATCTTTGCACAATTTACTGAACTAGCCAATGATTTGTATAATAATGGG PvPAL2 2937 ATTGGAAAACTCATGTTTGCTCAATTCTCTGAGCTTGTCAATGACTATTACAATAATGGG

PvPAL3 1743 CTACCATCAAACCTCTCTGTTGGTAGAAATCCAAGTCTGGATTATGGGTTCAAGGCATCT PvPAL2 2997 TTGCCTTCTAATCTCACTGCCAGCAGAAACCCCAGCTTGGATTATGGTTTCAAGGGAGCT

PvPAL3 1803 GAAGTTGCCATGGCTGCTTATTGTTCTGAACTTCAGTATCTAGCAAATCCAGTAACCAGC PvPAL2 3057 GAAATTGCCATGGCATCTTACTGCTCTGAACTCCAATATTTGGCTAACCCCGTAACCAGC

PvPAL3 1863 CATGTGCAAAGTACTGAGCAGCACAACCAAGATGTGAATTCTTTGGGCTTAATTTCGGCT PvPAL2 3117 CATGTCCAAAGCGCCGAGCAACACAACCAAGATGTGAACTCTTTGGGATTGATTTCATCT

PvPAL3 1923 TTGAAAACTGTGGAAGCCATAGAGATATTAAAGCTTATGTCTTCCACTTATCTGGTTGCA PvPAL2 3177 AGGAAAACCAATGAGGCTATTGAGATCCTCAAGTTAATGTCTTCAACTTTCCTCGTTGCA

PvPAL3 1983 CTCTGCCAAGCTATTGATTTGAGGCATTTGGAGGAAATTTTCAAGAACACTGTCAAGAAT PvPAL2 3237 CTTTGCCAGGCCATTGACTTGAGGCATTTGGAGGAGAATTTGAAGAACACTGTGAAGAAT

PvPAL3 2043 ACTGTGAGCAGAGTTGCACTGAAAACATTAACTACTGAAGACAAAGAAGAAACTAACCCA PvPAL2 3297 GTTGTGAGCCAGGTTGCCAAGAGGACTCTCACCACAGGTGTCAATGGAGAACTCCACCCT

PvPAL3 2103 TTTCGATTCAGTGAGGAAGAACTGCTTAAAGTGGTGGATAGAGAATATGTATTTTCATAC PvPAL2 3357 TCAAGATTTTGTGAGAAAGACTTGTTGAAAGTTGTTGAAAGGGAGTACACATTTGCCTAC


PvPAL3 2163 ATTGATGATCCCTTAAATGTAAGGTACCCATTGATGCCAAAACTAAAGCAGGTACTTTAT PvPAL2 3417 ATTGACGACCCTTGCAGTGGCACATACCCCTTGATGCAAAAACTAAGGCAAGTGCTTGTG

PvPAL3 2223 GAGCAAGCACATACCAGTGTCATTAATGACAAGAATGTGAGTTTGTTGGTTTTTGAGAAG PvPAL2 3477 GACTATGCATTGGCCAATGGAGAGAACGAGAAAAACTTGAACACGTCAATCTTCCAAAAG

PvPAL3 2283 ATTGGAGCTTTTGAGGATGAGTTGAAGTCTCTCTTGCCAAAGGAAGTAGAAAGTGCACGG PvPAL2 3537 ATTGCATCATTTGAGGAGGAGTTGAAGACCCTCTTGCCTAAGGAAGTGGAAGGTGCAAGA

PvPAL3 2343 GTAGCTTATGAGAATGGTAATCCAGCAACTCCAAACAGAATCAAGGAGTGCAGGTCATAT PvPAL2 3597 CTTGCATATGAGAACGACCAATGTGCAATTCCCAACAAGATCAAGGATTGTAGATCTTAC

PvPAL3 2403 CCACTGTACAAATTTGTGAGGGAGGAGTTAGGGATACGGTTGCTCACCGGCGAAAAAGCT PvPAL2 3657 CCCTTGTACAAGTTTGTGAGAGAGGAGTTAGGGACATCGTTGCTGACTGGTGAAAAGGTG

PvPAL3 2463 CTCTCTCCAGATGAGGAATTTGAAAAGGTTTATACAGCCATGTGTCAAGCAAAGATAATT PvPAL2 3717 ATCTCACCGGGTGAAGAGTGTGACAAAGTGTTCAGTGCTATGTGCCAAGGAAAAATCATT

PvPAL3 2523 GATCCAATTCTGGAATGTCTAGAAGATTGGAACGGGGTTCCCATCCCAATA--- PvPAL2 3777 GATCCTCTCTTGGAATGCCTTGGAGAGTGGAATGGTGCTCCTCTTCCAATTTGT

Figure 43. Genomic DNA sequence comparison of PvPAL2 and PvPAL3 using the ClustalW2 program. Lower case indicates the intron. Dark shading indicates positions where both sequences are identical and lighter shading represents positions where there is lower conservation.

Comparison of the protein sequences of PvPAL2 and PvPAL3 with soybean and Arabidopsis PAL sequences (Fig 44) showed that PvPAL2 is 93% identical to GmPAL2 (NP_001236956.1). In addition, PvPAL3 has 90% identity to GmPAL3 (XP_003518580.1) and 67% to AtPAL3 (P45725). Alignment of the sequences revealed the presence of a PAL–HAL (phenylalanine ammonia lyase–histidine ammonia lyase) domain with all amino acids of the active site (Fig 44) (Rother et al, 2001). Cyclization of the conserved Ala-Ser-Gly tripeptide (219-221) is also important for the activity of the cofactor 4-methylidene-imidazole-5-one (MIO). A posttranslational phosphorylation site (Thr-545) (Alwood et al, 1999) was also conserved in the PvPAL2 and PvPAL3 sequences.
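As a minimal sketch of how an identity figure like those quoted above can be derived from a pairwise alignment, the Python function below counts identical residues over the columns in which neither sequence has a gap. The function name and the choice of denominator are assumptions made for illustration. Alignment programs differ in how they treat gapped columns, and the percentages reported here were not produced with this code.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of a pairwise alignment.

    Assumption for this sketch: columns containing a gap in either
    sequence are ignored, so the denominator is the number of
    residue-to-residue columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (hypothetical strings, not taken from Figure 44).
toy_identity = percent_identity("ACGT-A", "ACCTTA")
```

In the toy alignment, five columns are compared and four match, so the function returns 80.0.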


AtPAL3 1 ------MEFRQPNATALSDPLNWNVAAEALKGSHLEEVKKMVKDYRKGTVQLG GmPAL2 1 MASEANAANTNFCVNVSNNGYISANDPLNWGAAAEAMAGSHLDEVKRMLEEYRRPVVKLG GmPAL3 1 ------MATIILENDPLNWSHAADSLKGSHFEEVKRMVAEYRKPLISLG PvPAL2 1 MDATPN-GKDAFVV----TAANAAGDPLNWAAAAEALSGSHLDEVKRMVAEYRKPAVRLG PvPAL3 1 MATIT------PQKGSQSEVGVSNTDPLNWGNAVESLKGSHLEEVKGMVAEYREAVIHVG

AtPAL3 48 -GETLTI GQVAAVAS------GGPTVELSEEARGGVKASSDWVMESMNRDTDTYGITTGF GmPAL2 61 -GETLTISQVAAIA----AHDQGVKVELAESSRAGVKASSDWVMESMNKGTDSYGVTTGF GmPAL3 44 GGETLTISQVAAVAVANANHNLQAKVDLSESARAGVDASCDWITQNINKGTPIYGVTTGF PvPAL2 56 -GQTLTIAQVAATA----AHDQGLKVELAESARACVKAISDWVMESMDKGTDSYGITTGF PvPAL3 55 GGETLTVSKVAAV----ANQYLQAKVDLSESAREGVDSSCKWIVDNIDKGIPIYGVTTGF *** * * ##

AtPAL3 101 G SSSRRRTDQGAALQKELIRYLNAGIFATGNEDDDRSNTLPRPATRAAMLIRVNTLLQGY GmPAL2 116 GATSHRRTKQGAALQKELIRFLNAGIFGNGTE---SNCTLPHTATRAAMLVRINTLLQGY GmPAL3 104 GAASHRQTQQGLALQKEMVRFLNCAIFGYQTE---LSHTLPKSATRAAMLVRVNTLLQGY PvPAL2 111 GA TSHRRTKQGGALQKELIRFLNAGIFGNGTE---SNCTLPHTATRAAMLVRVNTLLQGY PvPAL3 111 GANSNRQTQEGLALQKEMVRFLNCAIFGYQTE---LSHTLPKSATRAAMLVRVNTLLQGY * * # ######## #

AtPAL3 161 SGIRFEILEAIT TLLNCKITPLLPLRGTITASGDLVPLSYIAGFLIGRPNSRSVGPSGEI GmPAL2 173 SGIRFEILEAITKLLN NNITPCLPLRGTITASGDLVPLSYIAGLLTGRPNSKAVGPSGEI GmPAL3 161 SGIRFEILEAITKLLNHNVTPILPLRGTVTASGDLIPLSYIVALLTGRRNSKAVGPSGES PvPAL2 168 SGIRFEILEAITKLLN NNITPCLPLRGTITASGDLVPLSYIAGLLTGRPNSKAVGPSGEI PvPAL3 168 SGIRFEILEAITKLLNHNVTPILPLRGTITASGDLIPLSYIAALLIGRRNSKAVGPSGES

## MIO # #

AtPAL3 221 L TALEAFKLAGVS-SFFELRPKEGLALVNGTAVGSALASTVLYDANILVVFSEVASAMFA GmPAL2 233 LNAKEAF ELANIGAEFFELQPKEGLALVNGTAVGSGLASIVLFEANIIAVLSEVISAIFA GmPAL3 221 LNAKEAFHLAGLHSGFFELKPKEGLALVNGTAVGSGVASTVLFEANILALLSEVLSAVFA PvPAL2 228 LNAKEAF ELANIGSEFFELQPKEGLALVNGTAVGSGLASIVLFEANILAVLSEVISAIFA PvPAL3 228 LNAKEAFHLAGVDGGFFELKPKEGLALVNGTAVGSGVASMVLFEANILALLAEVLSAVFA ***** *

AtPAL3 280 EVMQGKPEFTDHLTHKLKHHPGQIEAAAIMEHILDGSSYVK EALHLHKIDPLQKPKQDRY GmPAL2 293 EVMQGKPEFTDHLTHKLKHHPGQIEAAAIMEHIL EGSSYVKAAKKLHEIDPLQKPKQDRY GmPAL3 281 EVMQGKPEFTHHLIHKLKYHPGQIEAAAIMEHILDGSSYVKDA-KLQQPDPLQKPRKDRY PvPAL2 288 EVMQGKPEFTDHLTHKLKHHPGQIEAAAIMEHILDGSSY IKAAKKLHEIDPLQKPKQDRY PvPAL3 288 EVMQGKPEFTDHLIHKLKYHPGQIEAAAIMEHILDGSSYVKNA-KLQQPDPLQKPRKDRY * ## ## # # # ##

AtPAL3 340 ALRTSPQWLGPQIEVIR AATKMIEREINSVNDNPLIDVSRNKAIHGGNFQGTPIGVAMDN GmPAL2 353 ALRTSPQWLGP LIEVIRFSTKSIEREINSVNDNPLIDVSRNKALHGGNFQGTPIGVSMDN GmPAL3 340 ALVTSPQWLGPQIEIIRYSTKSIEREINSVNDNPLIDVTRNKALNGGNFQGTPIGVSMDN PvPAL2 348 ALRTSPQWLGPQIEVIRFSTKSIEREINSVNDNPLI SVSRNKALHGGNFQGTPIGVSMDN PvPAL3 347 ALVTSPQWLGPQIEIIRFSTKSIEREINSVNDNPLIDVTRNKAVSGGNFQGTPIGVSMDN * * * # ## ## ## # ## ## ## ### # ####### ## # #


AtPAL3 400 TRLA LASIGKLMFAQFTELVNDFYNNGLPSNLSGGRNPSLDYGLKGAEVAMASYCSELQF GmPAL2 413 TRLALASIGKLMFAQFSELVNDYYNNGLPSNLTASRNPSLDYGFKGAEIAMASYCSELQY GmPAL3 400 ARLAVASIGKLIFAQFTELVNDLYNNGLPSNLSAGRNPSLDYGFKASEVAMAAYCSELQY PvPAL2 408 TRLA IASIGKLMFAQFSDLVNDYYNNGLPSNLTASRNPSLDYGFKGAEIAMASYCSELQY PvPAL3 407 ARLAVASIGKLIFAQFTELANDLYNNGLPSNLSVGRNPSLDYGFKASEVAMAAYCSELQY

# # # # # # ## # #### # ###### ##

AtPAL3 460 LANPVT NHVESASQHNQDVNSLGLISSRTTAEAVVILKLMSTTYLVALCQAFDLRHLEEI GmPAL2 473 LANPVTSHVQSAEQHNQDVNSLGLISSRKT HEAIEILKLMSSTFLVALCQAIDLRHLEEN GmPAL3 460 LANPVTSHVQSAEQHNQDVNSLGLISALKTVEAVEILKLMSSTYLVALCQAIDLRHLEEN PvPAL2 468 LANPVTSHVQSAEQHNQDVNSLGLISSRKT NEALEILKLMSSTFLVALCQAIDLRHLEEN PvPAL3 467 LANPVTSHVQSTEQHNQDVNSLGLISALKTVEAIEILKLMSSTYLVALCQAIDLRHLEEI

## ## #### # ## # ##

AtPAL3 520 LKKAVNEVVSHTAKSVLA------IEPFRK-HDDILGVVNREYVFSYVDDPSSLTNPLMQ GmPAL2 533 LKNTVKNVVS QVAKRTLTTGVNGELHPSRFCEKDLLKVVDREYTFAYIDDPCSGTYPLMQ GmPAL3 520 FKSTVKNTVSRVAQKTLITEGKEEINPFRLCEKDLLKVVDREYVFSYIDDPSNVTYPLMP PvPAL2 528 LKNTVKNVVSQVAKRTLTTGVNGELHPSRFCEKALLKVVEREYTFAYIDDPCSGTYPLMQ PvPAL3 527 FKNTVKNTVSRVALKTLTTEDKEETNPFRFSEEELLKVVDREYVFSYIDDPLNVRYPLMP

AtPAL3 573 KLRHVLFDKALAEPEGE---TDTVFRKIGAFEAELKFLLPKEVERVRTEYENGTFNVANR GmPAL2 593 KLRQVLVDYALANGENEKNTSTSIFQKIATFEEELKTLLPKEVEGARVAYENDQCAIPNK GmPAL3 580 KLKQVLYEKAHISAINDKNVSLLIFEKIGAFEDELKSLLPKEVENARVAYENGNPAIPNR PvPAL2 588 KLRQVLVDYALANGENEKNLNTSIFQKIASFEEELKTLLPKEVEGARLAYENDQCAIPNK PvPAL3 587 KLKQVLYEQAHTSVINDKNVSLLVFEKIGAFEDELKSLLPKEVESARVAYENGNPATPNR

AtPAL3 630 IKKCRSYPLYRFVRNELETRLLTGEDVRSPGEDFDKVFRAISQGKLIDPLFECLKEWNGA GmPAL2 653 IKECRSYPLYKFVREELGTALLTGERVISPGEECDKVFTALCQGKIIDPLLECLGEWNGA GmPAL3 640 IKECRSYPLYKFVREELEIGLLTGEKNLSPDEEFEKVYTAMCQAKIVDPILECLGDWKGS PvPAL2 648 IKDCRSYPLYKFVREELGTSLLTGEKVISPGEECDKVFSAMCQGKIIDPLLECLGEWNGA PvPAL3 647 IKECRSYPLYKFVREELGIRLLTGEKALSPDEEFEKVYTAMCQAKIIDPILECLEDWNGV

AtPAL3 690 PISIC GmPAL2 713 PLPIC GmPAL3 700 PIPI- PvPAL2 708 PLPIC PvPAL3 707 PIPI-

Figure 44. Comparison of amino acid sequences of PvPAL2 and PvPAL3 with PALs from soybean (GmPAL2: NP_001236956.1; GmPAL3: XP_003518580.1) and Arabidopsis thaliana PAL3 (P45725). Black and grey boxes indicate identical and similar residues, respectively. The PAL–HAL domain is indicated with a black box. Symbols: *: active site residues; #: amino acids that interact in the homotetramer; ▲: putative phosphorylation site.


To study the evolutionary relationships among PAL proteins, PvPAL2 and PvPAL3 were compared to a set of 20 plant PALs using MEGA 4.0. PvPAL2 and PvPAL3 clustered with GmPAL2 and GmPAL3, respectively (Fig 45).

[Figure 45 image: phylogenetic tree. Taxa shown: NtPAL, NaPAL, SlPAL5, IbPAL, PcPAL3, PcPAL2, AtPAL2, AtPAL1, BrPAL2, PtPAL3, PsPAL2, PsPAL1, MsPAL, LjPAL, GmPAL2, PvPAL2, VuPAL, CaPAL, AtPAL3, AtPAL4, GmPAL3, PvPAL3. Bootstrap values are given at the nodes. Scale bar: 0.02.]

Figure 45. A neighbor-joining phylogenetic tree of phenylalanine ammonia-lyase (PAL) amino acid sequences. Sequences were aligned using ClustalW and the tree was inferred using the neighbor-joining method (MEGA4) with 1,000 bootstrap replicates.
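To illustrate the neighbor-joining criterion behind a tree such as Figure 45, the sketch below computes the standard Q matrix, Q(i,j) = (n - 2) d(i,j) - sum_k d(i,k) - sum_k d(j,k), and picks the pair of taxa that would be joined first. It shows only this first selection step, not the full algorithm or the bootstrap procedure implemented in MEGA4. The distance matrix is a hypothetical toy example and is not derived from the PAL sequences.

```python
def q_matrix(dist):
    """Neighbor-joining Q matrix for a symmetric distance matrix."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
    return q


def first_join(dist):
    """Return the index pair (i, j), i < j, with the smallest Q value."""
    q = q_matrix(dist)
    n = len(dist)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])


# Hypothetical pairwise distances among five taxa (toy data).
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair = first_join(toy_dist)
```

For the toy matrix, taxa 0 and 1 are joined first even though taxa 3 and 4 have the smallest raw distance. This shows that neighbor joining corrects for each taxon's overall divergence and does not simply cluster the closest pair.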


Discussion

The gene identity of the P factor (Bassett, 2007), which plays an important role in determining seed coat colour in common bean, is unknown, but work described in the previous chapter placed several phenylpropanoid genes, including PAL1, PAL2 and Myb15, close to its map location on Pv07 (Erdmann et al. 2002). The current work tested the possibility that the linked genes were responsible for the P locus phenotype by studying their expression in a series of tester lines developed by Bassett (1992) to facilitate seed coat colour genotyping of unknown lines. One of these lines is Florida dry-bean breeding line 5-593, which has black seed and Bishop’s violet flowers (Bassett, 1998). This line has dominant alleles for all colour genes (PPCCDDJJ).

The tester Pcdj BC3 5-593 is dominant for P but recessive for c, d and j (PPccddjj) and has a white seed coat. The line pBC3 5-593 is recessive for P but dominant for all other colour genes (ppCCDDJJ) and is also white seeded. Based on the genetic description of the tester lines, any candidate gene for the P locus should be highly expressed in 5-593 and Pcdj BC3 5-593 but not in pBC3 5-593. The expression patterns of the structural phenylpropanoid genes in the three tester lines differed only for CHS (chalcone synthase), DFR (dihydroflavonol 4-reductase) and ANS (anthocyanidin synthase). However, none of these structural phenylpropanoid genes showed an expression pattern consistent with that expected for a P candidate gene.
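The screening logic described above can be sketched as a simple comparison of observed and expected expression across the three tester lines. The dictionary keys, the function name and the reduction of expression to a yes/no value are assumptions made for illustration; the study itself compared expression levels, not binary calls.

```python
# Expected seed coat expression of a P candidate gene, following the
# reasoning in the text: expressed in both lines that carry dominant P,
# silent in the line that is recessive for p.
EXPECTED_P_PATTERN = {
    "5-593": True,            # PPCCDDJJ, black seed
    "Pcdj BC3 5-593": True,   # PPccddjj, white seed
    "pBC3 5-593": False,      # ppCCDDJJ, white seed
}


def is_consistent_with_p(observed):
    """True if a gene's observed pattern matches the expected P pattern.

    `observed` maps tester line name -> bool (expressed or not).
    """
    return all(
        observed.get(line) == expected
        for line, expected in EXPECTED_P_PATTERN.items()
    )


# Hypothetical gene expressed only in the black-seeded line.
black_only = {"5-593": True, "Pcdj BC3 5-593": False, "pBC3 5-593": False}
candidate = is_consistent_with_p(black_only)
```

A gene expressed only in the black-seeded line fails the test, because a P candidate must also be expressed in the white-seeded line that carries dominant P.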

PvCHS-A and PvCHS-B

The structure of the PvCHS-A gene in the PV-GBa 0083H05 clone, consisting of two exons and a single intron, is typical of CHS genes seen in a number of species (Yang et al. 2002). As observed in most comparisons of homologous genes, the differences in structure between the genomic sequences for CHS-A and a representative of the CHS-B form were most evident in the intron, with CHS-A having a considerably larger intron sequence. Exon two was more conserved than exon one in length and encoded almost all the amino acid residues of the CHS active site.

Nucleotide and amino acid sequences of PvCHS-B-3, PvCHS-B-5 and PvCHS-B-7 were different from the rest of the group. PvCHS-B-3 had the largest deletion in its second exon, followed by PvCHS-B-7 and PvCHS-B-5. A similar pattern was seen at the amino acid level, with PvCHS-B-3, PvCHS-B-5 and PvCHS-B-7 retaining 92, 203 and 100 amino acids, respectively, of the conserved 389 residues of the bean CHS17 protein.
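The residue counts above can be expressed as the fraction of the 389-residue CHS17 protein retained by each truncated copy. The short calculation below is only a worked restatement of those numbers; the variable and function names are illustrative.

```python
FULL_LENGTH = 389  # conserved residues in the bean CHS17 protein

# Residues retained by each truncated PvCHS-B copy, as reported in the text.
retained = {"PvCHS-B-3": 92, "PvCHS-B-5": 203, "PvCHS-B-7": 100}


def percent_retained(n_residues, full_length=FULL_LENGTH):
    """Percentage of the full-length protein that is retained."""
    return round(100.0 * n_residues / full_length, 1)


summary = {name: percent_retained(n) for name, n in retained.items()}
```

This gives roughly 23.7% for PvCHS-B-3, 52.2% for PvCHS-B-5 and 25.7% for PvCHS-B-7, so none of the three copies retains much more than half of the full-length protein.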

Differences in the sequence occurred throughout the exons between the homologs, but because of splicing and codon redundancy, the amino acid sequence identity between the CHS-A and CHS-B forms was quite high (84%), especially in the four CHS-specific conserved motifs (Motif I, II, III and IV), the active site, and catalytic residues and the CoA-binding sites (Seshime et al. 2005; Ferrer et al. 1999) in PvCHS-B-1, PvCHS-B-2, PvCHS-B-4, PvCHS-B-6 and PvCHS-B-8. The three-dimensional model obtained for PvCHS-B was very similar to previous structures obtained for alfalfa (Ferrer et al. 1999) and hops (Matouseka et al, 2002).

Beside the PvCHS-A gene in the PV-GBa 0083H05 clone, there were a number of retrotransposable element genes, including several genes for putative gag/pol polyproteins.

Retrotransposons are a class of mobile elements (MEs) that are divided into five groups based on their structural organization and transposition mechanism (Sormacheva and Blinov, 2011). These five groups are: retrotransposons with long terminal repeats (LTR retrotransposons); non-LTR retrotransposons, which do not contain LTRs and are also called LINEs (long interspersed nuclear elements); DIRS-like elements, which contain a tyrosine recombinase gene instead of an integrase gene (INT) and have split direct repeats (SDRs) or inverted repeats; Penelope-like elements (PLEs), encoding reverse transcriptase (RT); and SINEs (short interspersed nuclear elements) (Sormacheva and Blinov, 2011).


LTR retrotransposons are usually divided into two superfamilies, Ty1/copia and Ty3/gypsy. The LTR retrotransposon body contains two gene sequences: gag and pol (Wicker et al. 2007). The gag gene encodes a protein similar to the nucleocapsid protein of viruses. The pol gene encodes a protein that has the protease (PR), reverse transcriptase (RT), and ribonuclease H (RH) activities required for ME replication and transposition. The integrase (INT) activity needed to integrate the ME into a new target site is also encoded by the pol gene (Wicker et al. 2007).

Schlueter et al. (2008) estimated that about 50% of the common bean genome is composed of repetitive sequences, but most of these sequences are unknown. To date, only four groups of repetitive DNA have been characterized in P. vulgaris (Ribeiro et al. 2011).

Transposable elements and their remnants constitute 50-90% of higher plant genomes (Wawrzynski et al. 2008). These repetitive DNAs have had a major impact on genome structure by promoting mutations of genes, affecting gene regulatory sequences and creating new genes by “exon-shuffling” and retrotransposition.

Analysis of approximately 3.7 megabases (Mb) of genomic sequence of soybean (Glycine max), including 0.87 Mb of pericentromeric sequence, uncovered 45 intact long terminal repeat (LTR)-retrotransposons. Similar analysis of 0.94 Mb of sequence from Phaseolus vulgaris (common bean) uncovered some intact retroelements with no mutations accumulated in their LTRs, indicating very recent insertion (Wawrzynski et al. 2008). Thus, retrotransposons appear to be abundant and active in both Glycine and Phaseolus. It seems that the G. max genome has been heavily impacted by the activity of retroelements and continues to be shaped by their replication. The relatively low frequency of insertion/deletion events observed in the LTRs of G. max compared to M. truncatula suggests that the G. max genome is likely still expanding. In addition, identification of a retroelement carrying an NB-LRR disease resistance-like gene provides a potential new mechanism for the rapid evolution of new resistance genes (Wawrzynski et al. 2008).

Out of six BAC clones sequenced and analyzed in this study, one (PV-GBa 0043K12: PvMYB15) is heavily populated with predicted gag-pol polyproteins (eight genes) and two (PV-GBa 0083H05: PvCHS-A and PV-GBa 0079P22: PvPAL2) have a moderate number of retrotransposons, with three and four genes, respectively. These three clones are located around the centromeric regions of their respective chromosomes, which have abundant repetitive DNA. In contrast, the other three BAC clones (PV-GBa 0061D18: PvPAL3; PV-GBa 0005G03: PvCHS-B; and PV-GBa 0072I22: PvDFR-A) have one or no retrotransposons in their sequences and are located toward the telomeric ends of their respective chromosomes. The presence and number of retrotransposons in a BAC clone have a significant effect on its gene number and the extent of its shared synteny with soybean, which will be discussed in the next chapter.

The PV-GBa 0005G03 clone contains less repetitive DNA but has eight full copies of CHS, two of them in reverse orientation compared to the rest of the group, and one partial copy of CHS. Ryder and colleagues (1987) demonstrated that the haploid genome of bean contains a family of about six to eight CHS genes, some of which are tightly clustered. The clustering of CHS genes seen in clone PV-GBa 0005G03 is similar to a cluster of CHS genes in the I (Inhibitor) locus of soybean that has been associated with seed coat colour development in this species (Senda et al. 2002). Two naturally occurring dominant alleles, I and i^i, inhibit pigmentation in the seed coat of that crop (Tuteja and Vodkin, 2008). The homozygous recessive i allele, however, results in a dark brown or black seed coat. Analysis of spontaneous mutations from I to i has shown that these mutations are closely related to the deletion of one of the CHS genes (Tuteja et al, 2009).


Genetic analysis of the CHS genes in the cultivar Williams, which contains the i^i allele, showed that five (CHS1, CHS3, CHS4, CHS5, and CHS9) of the nine nonidentical CHS gene family members were clustered in a 200- to 300-kb region of Gm08 (Clough et al., 2004; Tuteja and Vodkin, 2008). CHS2 and CHS6 are located on Gm05 and Gm09, respectively, while CHS7 and CHS8 were found on Gm01 and Gm11, respectively (Tuteja et al, 2009). Three of these five genes, CHS1, CHS3, and CHS4, were revealed to occur as two 10.91-kb perfect, inverted repeat clusters separated by 5.87 kb of intervening sequence. This 5.87 kb defines the I locus and it is absent in recessive i mutations (Tuteja et al, 2009).

The inhibition of seed coat pigmentation in soybean is due to naturally occurring posttranscriptional gene silencing (PTGS) of the chalcone synthase (CHS) genes (Clough et al. 2004; Tuteja and Vodkin, 2008). However, the regulation mechanism and tissue specificity of this silencing have not been fully characterized. Studies showed that silencing from the complex CHS loci in soybean I and i^i varieties occurs because these loci contain clusters of CHS sequences in inverse orientation, which generate RNA transcripts that could fold and create aberrant double-stranded RNA (dsRNA). The latter are diced to generate primary siRNAs that target other CHS family member genes (Eckardt 2009). This naturally occurring instance of gene silencing is similar in some ways to co-suppression of CHS in petunia, in which plant transformation with extra copies of CHS causes silencing of the endogenous CHS genes in floral tissue (De Paoli et al. 2009).
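The reason inverted copies can yield double-stranded RNA can be sketched with a reverse-complement check: if one arm of a transcript is the reverse complement of another, the two arms can base-pair and fold back on each other. The sequences and function names below are hypothetical and use DNA letters for simplicity. This is a toy illustration of the folding argument, not an analysis of the soybean or bean CHS clusters.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def forms_inverted_repeat(left_arm, right_arm):
    """True if the right arm is the exact reverse complement of the left arm.

    A transcript that reads through both arms could then fold back on
    itself into a perfectly paired hairpin.
    """
    return right_arm.upper() == reverse_complement(left_arm).upper()


# Hypothetical short arms, for illustration only.
left = "ATGGC"
right = reverse_complement(left)   # "GCCAT"
paired = forms_inverted_repeat(left, right)
```

Real inverted repeats are rarely perfect over their whole length, so a practical search would tolerate mismatches. The exact-match test is used here only to keep the example short.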

The clustering and arrangement of the PvCHS-B genes, including the occurrence of a partial CHS sequence in clone PV-GBa 0005G03, are similar to the arrangement in the I locus of soybean. Whether or not similar posttranscriptional gene silencing (PTGS) occurs in common bean needs more investigation.


Although CHS is the first committed enzyme of the flavonoid pathway and its expression can have strong effects on downstream genes, PvCHS-B and PvCHS-A cannot be candidates for the P gene because they are located on Pv02 and Pv01, respectively, whereas the P locus has been mapped on Pv07. Also, although PvCHS-B and PvCHS-A were strongly expressed in black seeded (dominant P) beans, PvCHS-A was not expressed in the white dominant P line (nor in the white recessive p line), and PvCHS-B was weakly expressed in both white lines. These results suggest that PvCHS-B and PvCHS-A are under the control of other unknown regulatory factors.

PvDFR gene

Dihydroflavonol 4-reductase (DFR) is involved in the biosynthesis of the flavonoids, anthocyanidin and proanthocyanidin (Trabelsi et al. 2008). This enzyme catalyzes the reduction of dihydroflavonols to leucoanthocyanidins, which are precursors for anthocyanin and proanthocyanidin biosynthesis. Leucoanthocyanidins are the colourless precursors of anthocyanins, which are the major water-soluble pigments found in flowers and fruits (Holton and Cornish 1995). DFR is an important gene in the phenylpropanoid pathway and its spatial expression and substrate specificity can affect colour in different plant organs.

Genetic and molecular studies in soybean have shown that DFR is a multigene family with three members. Markers developed from the DFR1, DFR2, and DFR3 genes were mapped on B2 (Gm14), D2 (Gm17), and D1b (Gm02), respectively (Yang et al, 2010). A survey of the Phytozome database indicated that common bean has four DFR genes. They are located on Pv01 (PvDFR1 and PvDFR2), Pv07 (PvDFR3) and Pv11 (PvDFR4). The two genes on Pv01 are arranged in tandem. PvDFR3 and PvDFR1 were sequenced in this study prior to the availability of the bean genome sequence.


A number of the analyses suggest that PvDFR4 is different from the three other PvDFR genes. In particular, PvDFR3, PvDFR1 and PvDFR2 have six exons and five introns, whereas PvDFR4 has only four exons and three introns. PvDFR3, PvDFR1 and PvDFR2 have very similar nucleotide and amino acid sequences (78-84% and 72-81%, respectively), whereas PvDFR4 is similar to the rest of the gene family only at the protein level. The presence of multiple copies of DFR with different structures and sequences allows differences in the spatial and temporal expression patterns of the gene to evolve.

Another interesting aspect of the DFR enzyme is that it catalyses the reduction of three dihydroflavonols, dihydromyricetin (DHM), dihydroquercetin (DHQ), and dihydrokaempferol (DHK), into colourless leucoanthocyanidins. These are further converted to coloured delphinidin (purple to blue), cyanidin (red to pink), and pelargonidin (orange to brick red), respectively. F3′H and F3′5′H have important roles in the synthesis of anthocyanidins. F3′H converts DHK to DHQ and F3′5′H converts DHK to DHM (Holton and Cornish, 1995). DFR enzymes from different plants show substrate specificities, based on differences among the hydroxylation patterns of anthocyanin molecules (Nakatsuka et al, 2007). The alignment of Petunia DFR with DFRs from other plants indicated a variable region that controls substrate recognition. Petunia hybrida does not produce orange flowers because its DFR enzyme has an aspartic acid residue at the 134th position and cannot use dihydrokaempferol as a substrate to produce pelargonidin (Nakatsuka et al, 2007; Beld et al, 1989), but it can convert dihydroquercetin to leucocyanidin and dihydromyricetin to leucodelphinidin (Nakatsuka et al, 2007; Forkmann and Ruhnau, 1987).

In contrast, some Gerbera genotypes have an asparagine residue at this same position in their DFRs and can utilize all three dihydroflavonols as substrates to produce orange to red coloured flowers. Thus, it is hypothesized that flower colour is partly determined by alteration of a single amino acid that changes the substrate specificity of the DFR enzyme (Helariutta et al, 1993; Nakatsuka et al, 2007).
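The substrate-specificity rule described above can be written as a small lookup. It encodes only the two cases given in the text (asparagine or aspartic acid at position 134) and returns no prediction for any other residue. The function names and the one-letter residue codes as inputs are assumptions made for illustration, and the rule is a simplification of the hypothesis, not a validated predictor.

```python
# Anthocyanidin ultimately derived from each dihydroflavonol (from the text).
PIGMENT_FROM_SUBSTRATE = {
    "DHK": "pelargonidin",  # orange to brick red
    "DHQ": "cyanidin",      # red to pink
    "DHM": "delphinidin",   # purple to blue
}


def accepted_substrates(residue_134):
    """Dihydroflavonols a DFR is expected to accept, given residue 134.

    'N' (asparagine, as in some Gerbera DFRs): all three substrates.
    'D' (aspartic acid, as in Petunia DFR): DHQ and DHM but not DHK.
    Any other residue: None, because the text gives no rule for it.
    """
    if residue_134 == "N":
        return {"DHK", "DHQ", "DHM"}
    if residue_134 == "D":
        return {"DHQ", "DHM"}
    return None


def possible_pigments(residue_134):
    """Sorted anthocyanidins reachable through the accepted substrates."""
    substrates = accepted_substrates(residue_134)
    if substrates is None:
        return None
    return sorted(PIGMENT_FROM_SUBSTRATE[s] for s in substrates)


petunia_like = possible_pigments("D")
gerbera_like = possible_pigments("N")
```

Under this rule, a Petunia-like enzyme yields cyanidin and delphinidin only, while a Gerbera-like enzyme can also reach pelargonidin, in line with the absence of orange flowers in Petunia hybrida.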

Alignment of the deduced DFR amino acid sequences from common bean demonstrates that they possess different amino acids at the substrate recognition site. Of the four genes, at least one (PvDFR1) encodes an enzyme with an asparagine residue at the 134th amino acid position, which should allow it to use all three dihydroflavonols as substrates. Thus, it may not be a limiting factor in the range of colours that can be seen in flowers. PvDFR1 was mapped on Pv01 and is a candidate for a flower colour gene. The flower colour locus was mapped at the very proximal end of Pv01 in the current study (chapter 2), and PvDFR1 is located at 1,090 kb on this chromosome. This is a novel observation, and the close map association between the flower colour trait and the DFR1 gene suggests that it controls flower colour in bean.

PvDFR1 is highly expressed in the seed coat of the black tester line (Florida line 5-593), which carries dominant P and has a Bishop’s violet flower phenotype. This gene is not expressed in the seed coats of the tester lines pBC3 5-593, with a white seed coat and pure white flowers, and PcdjBC3 5-593, with a white seed coat and purple flowers, indicating that it is not related to the P gene. Given its association with flower colour in the current results, it would be interesting to test its expression in the flowers of the tester lines.

In soybean, five loci (W1, W3, W4, Wm, and Wp) control purple pigmentation in flowers and hypocotyls (Palmer et al. 2004). W3, which controls the purple-throat flower phenotype, cosegregates with a DFR1 gene (AF167556) based on restriction fragment length polymorphism (RFLP) analysis (Fasoula et al. 1995). The W4 locus codes for a dihydroflavonol-4-reductase gene (DFR2), and its mutated allele, which harbors an active CACTA-like transposable element, gives dilute purple and pale flower phenotypes (Xu et al. 2010). DFR2 has two genomic allele sequences (EF187612 and DQ026299), and its interruption by transposable elements causes a variegated flower phenotype in soybean (Xu et al. 2010). A survey of the bean genome using the soybean DFR1 and DFR2 sequences shows that these genes are similar to the two tandem DFR genes on Pv01. This region of Pv01 is syntenic with the regions of soybean Gm14 and Gm17 where DFR1 and DFR2 are located. We speculate that GmDFR1 and GmDFR2 originated from tandem PvDFR1 and PvDFR2 and were shuffled and rearranged to their current positions in the soybean genome.

The involvement of DFR genes in flower colour development in soybean supports the association between PvDFR1 and flower colour in common bean shown in this study.

PvMyb15 transcription factor

The PvMyb15 transcription factor homolog identified by BLAST in clone PV-GBa 0043K12 in the current work is the first Myb gene described for P. vulgaris. The name for the gene was chosen because of its similarity to AtMYB15. Phylogenetic analysis placed PvMYB15 with the Arabidopsis MYB transcription factors in subgroup 2, close to AtMYB14 and AtMYB15.

This similarity is significant for the current study because a homolog of AtMyb15 in Lotus (LjMyb15) has been shown to activate isoflavonoid biosynthesis in response to biotic and abiotic stresses (Shelton et al, 2012), and AtMYB13, AtMYB14 and AtMYB15 of Arabidopsis, which are involved in the phenylpropanoid pathway, all have a conserved C-domain containing the sequences IDxSFWSE –MxFWFD. This sequence is also found in PvMyb15.

The MYB TF superfamily is the most abundant TF group in plants (Riechmann et al, 2000), with diverse functions in plant development, hormone response (Shin et al, 2007), growth (Petroni et al, 2008), epidermal cell fate and formation of trichomes (Wang et al, 2008), stomatal movements and development (Cominelli et al, 2005), seed development (Gonzalez et al, 2009), response to drought (Abe et al, 2003; Ding et al, 2009) and cold (Zhu et al, 2005), pathogen response (Vailleau et al, 2002), light-sensing responses (Kawamura et al, 2008), sugar-related responses (Lu et al, 2002), and modulation of secondary metabolites such as glucosinolates (Sonderby et al, 2007) and phenylpropanoids (Stracke et al, 2007). There are 133 R2R3 MYBs in Arabidopsis and they are clustered into 24 subgroups (Stracke et al 2001).

MYB transcription factors are characterized by the presence of three imperfect repeats of the DNA Binding Domain (MYB domain). Each repeat is characterized by a motif of 52 amino acids containing three evenly spaced tryptophan residues that give a triple helix-turn-helix three-dimensional conformation to the protein (Jin and Martin, 1999). The third α-helix of the MYB DNA binding domain recognizes and binds the cis-regulatory motifs. These motifs are called MYB Binding Sites (MBS) and are highly enriched in adenosine and cytosine residues (Jiang et al. 2004). In 1998, Romero and collaborators defined three different MBS signatures recognized by R2R3 MYB proteins: MBSI, (T/C)AAC(G/T)G(A/C/T)(A/C/T); MBSII, TAACTAAC; and MBSIIG, (T/C)ACC(A/T)A(C/A)C. Key residues in the N-terminal MYB domain are necessary for trans-activation efficiency and regulate and specify DNA binding and interactions with bHLHs (Li et al, 2006). Shared conserved motifs in the C-terminal domains may indicate similarities in function (Stracke et al., 2001).
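A promoter sequence can be scanned for the three MBS signatures with regular expressions, as sketched below. The patterns assume that each slash in the signatures separates alternative bases at a single position, so that MBSI reads (T/C)AAC(G/T)G(A/C/T)(A/C/T) and MBSIIG reads (T/C)ACC(A/T)A(C/A)C. The function name and test sequences are hypothetical, and only the forward strand is searched.

```python
import re

# MBS signatures written as regular expressions (one character class per
# degenerate position; this reading of the slash notation is an assumption).
MBS_PATTERNS = {
    "MBSI": "[TC]AAC[GT]G[ACT][ACT]",
    "MBSII": "TAACTAAC",
    "MBSIIG": "[TC]ACC[AT]A[CA]C",
}


def find_mbs(sequence):
    """Return sorted (start, name) hits for each MBS signature.

    Positions are 0-based. A lookahead is used so that overlapping
    occurrences are reported as well.
    """
    seq = sequence.upper()
    hits = []
    for name, pattern in MBS_PATTERNS.items():
        for match in re.finditer("(?=(" + pattern + "))", seq):
            hits.append((match.start(), name))
    return sorted(hits)


# Hypothetical promoter fragments, for illustration only.
hits_a = find_mbs("GGTAACTAACGG")
hits_b = find_mbs("AATACCAACCTT")
```

A fuller scan would also search the reverse complement, since a binding site can lie on either strand.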

PvMYB15 has sequence similarity to GmMYB29 of soybean, which is expressed when the plant is exposed to UV-B light (Shimizu et al, 2000). GmMYB29 activates genes in the flavonoid pathway, increasing the accumulation of flavonoid compounds in the epidermal cell layer (Shimizu et al, 2000). Flavonoids are considered to be among the most effective protectants against ultraviolet-B radiation (Shimizu et al, 2000). Cis-acting elements identified in a stretch of 1500 bp upstream of the PvMYB15 start codon are mostly predicted to be active in response to light and abiotic stress. Hence, its expression apparently increases under light, which in turn triggers the expression of flavonoid pathway genes.

PvMYB15 was mapped as an RFLP marker close (7.8 cM) to the pigmentation locus (P) on Pv07 in the common bean Core population in this study. This locus was previously mapped phenotypically to a linkage group that is collinear with Pv07 (Vallejos et al. 1992). The closest marker to this locus is Bng204, an RFLP marker that was converted to an STS marker by Murray et al (2002). Bng204 is 12.2 cM from PvMYB15 and 11 cM from OU32300, which is a RAPD marker for the P locus in the Core population (Erdmann et al. 2002). The physical position of Bng204 is at 37,401 kb on Pv07. There is also a homolog of AtMYB15 at 43,061 kb on Pv07.

LjMyb15 is a homolog of AtMyb15 in Lotus japonicus and has been shown to activate isoflavonoid biosynthesis in this plant (Shelton et al, 2012). No Myb TF has been previously reported in common bean. In other plants Myb proteins interact with other transcription factors, including basic helix-loop-helix (bHLH) TFs and WD-repeat proteins (the MYB-bHLH-WD40 "MBW" complex) (Baudry et al. 2004) to control the transcription of genes for enzymes in the phenylpropanoid pathway leading to anthocyanin synthesis in seed coats (Koes et al. 2005).

PvPAL2 and PvPAL3

Previous work on the PAL gene family in bean identified three members coding for proteins with 506, 712, and 710 amino acids (Cramer et al, 1989). The PAL genes characterized in the current study may be the same genes as those identified previously by Cramer et al. (1989) since PAL3 codes for a protein of 710 amino acids and PvPAL2 codes for a 712 amino acid polypeptide, and PvPAL3 codes for a protein with 686 residues. Also, all the PAL genes described so far have consisted of two exons and an intron. Differences in sequence between the previous and current results might be attributed to differences among the bean classes and varieties used for the current and previous work. The previous work used a genomic library of a red kidney bean named Canadian Wonder, whereas the G19833 cultivar, which was used in this study, has a patterned seed coat with thin red mottles on a dark yellow background.

Phenylalanine ammonia-lyase (PAL) is a member of the lyase class I-like superfamily that catalyzes a beta-elimination reaction. It is the first committed step of the phenylpropanoid pathway (the conversion of L-phenylalanine to E-cinnamic acid) and is the link between primary and secondary metabolism. Different PAL isoforms are active as homotetramers and are thought to have different metabolic roles (Huang et al. 2010).

PAL is usually encoded by a small multigene family in plants (Cramer et al. 1989; Bolwell et al. 1985). Arabidopsis has four putative isoenzymes (Cochrane et al. 2004) while in raspberry and bean it is encoded by a multi-gene family with few members (Kumar and Ellis, 2001; Cramer et al. 1989). There are five PAL genes in poplar and nine in rice (Hamberger et al., 2007). Potato appears to be an exception with more than 40 copies reported (Joos and Hahlbrock, 1992). The individual genes may respond differently to biotic and abiotic stresses and their expression is developmentally and spatially controlled (Lillo et al., 2008). In poplar, PAL genes show organ-specific expression (Kao et al., 2002). One is involved in lignin formation, while the second PAL gene is specifically targeted to condensed tannin formation and the third gene is associated with flowering (Hamberger et al., 2007). However, in Lycopersicon esculentum, with an estimated 20 putative PAL genes, only a single gene appears to be strongly expressed in all tissues, whereas the remaining genes appear to be effectively silenced (Chang et al, 2008). This phenomenon is unusual and the precise genetic mechanisms of this extensive gene silencing remain to be established. The transient induction of expression by elicitors has been studied in bean (Mavandad et al, 1990); three PAL genes were expressed with different kinetics in response to an elicitor, characterized by a more rapid initial appearance of PAL1 transcripts. These studies supported a potential role for CA (trans-cinnamic acid; the product of the PAL reaction) as a regulator of the expression of phenylpropanoid biosynthetic genes at the level of transcription (Mavandad et al, 1990). Exposure of cell cultures to CA affected the levels of transcripts from the three PAL genes of bean in a similar manner. However, PAL2 transcripts were the most sensitive to inhibitory effects and PAL1 transcripts showed a delayed response. The physiological relevance of these small differences remains to be determined (Mavandad et al, 1990).

The analysis and comparison of the proteins encoded by PvPAL2 and PvPAL3 with other plant PAL proteins revealed that the bean genes contained the PAL–HAL domain that is characteristic of this enzyme and is conserved in bacteria, plants and animals. This domain contains all residues that are part of the active site (Rother et al, 2001; Schwede et al, 1999) and the amino acids that interact in the characteristic homo-tetramer formation (Ritter and Schulz, 2004). These data suggested that PvPAL2 and PvPAL3 code for functional enzymes. Furthermore, the PvPAL2 and PvPAL3 sequences included a phosphorylation site that is conserved in plant PALs, suggesting a post-translational modulation of the enzyme activity (Alwood et al, 1999). The phylogenetic analysis of these proteins showed a close relationship with PAL enzymes from soybean and other legumes. Three-dimensional structure analysis of the PvPAL2 and PvPAL3 proteins showed that both are similar to the crystal structure of the parsley PAL enzyme.

V. STRUCTURAL CHARACTERIZATION OF SIX BAC CLONE SEQUENCES IN COMMON BEAN AND THEIR SYNTENIC RELATIONSHIP WITH SOYBEAN

Abstract

Six BAC clones containing phenylpropanoid pathway genes were sequenced and their syntenies were analyzed in bean and soybean. BAC clone PV-GBa 0005G03 is located on Pv02 of bean and is most similar to segments of soybean Gm01 and Gm11. This clone contains eight tandemly repeated chalcone synthase genes whereas its related region in soybean has only one copy of this gene. BAC clone PV-GBa 0083H05 with PvCHS-A is located on Pv01 and is similar to segments of soybean Gm19 and Gm16. BAC clones PV-GBa 0072I22 (PvDFR-A), PV-GBa 0043K12 (PvMYB15) and PV-GBa 0079P22 (PvPAL2) are all located on Pv07 and are similar to segments of soybean Gm10 and Gm02, Gm20 and Gm10, and Gm10 and Gm13, respectively. BAC clone PV-GBa 0061D18 has the PvPAL3 gene, is located on Pv08 of common bean and is similar to segments of soybean Gm02 and Gm14. Although a new approach to genome sequence comparison was taken for this analysis, the results are in full agreement with previous studies where molecular markers were utilized to examine the shared synteny between common bean and soybean.

Introduction

Synteny is the co-localization of genes, and preserved syntenic relationships across species provide a valuable framework for inferring the shared ancestry of genes. But what is it that keeps genes in the same order over time? It is speculated that functionally related genes are kept in close physical proximity as they are often co-regulated (Stechmann, 2004). Studies with yeasts have shown that essential genes are highly clustered. These clusters are usually located in regions with low recombination rates (Stechmann, 2004).

Besides the apparent microsynteny or microcolinearity between many plant genomes, there are also many small exceptions. Frequent insertions of transposable elements and duplications or deletions of genes are examples of small rearrangements in the genome. These changes do not have significant effects on most adjacent sequences and are related to the environmental or ploidy status of the plant (SanMiguel et al. 1998). Small rearrangements are more frequent than the large relocations detected by traditional cytogenetics and low-resolution genetic maps (Bennetzen, 2000).

Syntenic analyses allow the findings from model organisms to be translated to less-well-understood systems (Tang et al. 2008). An understanding of syntenic relationships among species also facilitates the investigation of genome evolution and dynamics (Mudge et al. 2005).

This knowledge allows genetic information from one species to be applied to gene isolation and molecular tagging experiments of a related species. An example of this sort of transfer of knowledge is given by the work to identify the genes for determinacy in soybean. These genes were identified by homology to the Dt1 gene in common bean, which was in turn identified as a candidate for dwarfing via its homology to the Tfl1 (terminal flower 1) gene in Arabidopsis thaliana (Kwak et al. 2008, Tian et al. 2010).

Investigations of syntenic relationships and comparative mapping have been conducted with closely related plant species in the Solanaceae (pepper, tomato, and potato) (Bonierbale et al, 1988; Park et al, 2011; Wu and Tanksley, 2010), Gramineae (Ahn and Tanksley, 1993; Gale and Devos, 1998), Leguminosae (Kalo et al. 2004; Yan et al. 2004; Cannon et al. 2006; McClean et al. 2010), Brassicaceae (Cheng et al. 2012), and Rosaceae (Prunus spp.) (Jung et al. 2012).

Synteny analysis is based on comparisons of genetic maps, which usually requires whole genome sequence data. In legume genomics, soybean, medicago (Medicago truncatula) and lotus (Lotus japonicus) are three legumes that have complete or almost complete genome sequence information. These genome sequences are useful for whole genome comparisons and information transfer from model genome sequences to crop species. The evolutionary distances between these species, as well as the rates and nature of the changes in the genomes over time, are the main factors that affect the ability to transfer knowledge between species (Cannon et al. 2009).

Analyses that have compared common bean and sequenced legumes reported syntenic blocks of various sizes (Galeano et al. 2009; Hougaard et al. 2008; McConnell et al. 2010; McClean et al. 2010). However, in these studies the common bean information came from low or medium saturated linkage maps developed using single bi-parental populations.

Common bean and soybean are the two most important members of the Phaseoleae legumes. These species diverged nearly 20 million years ago around the time of the major duplication event in soybean (Lavin et al. 2005). Synteny analysis indicates that most segments of any one chromosome of common bean are highly similar to segments of two chromosomes of soybean (Galeano et al. 2009). Since P. vulgaris is a true diploid with an estimated genome size of 588-637 mega base pairs (Mbp) (Arumuganathan and Earle, 1991), it has been suggested as a model for understanding the ~1,150 Mbp soybean genome (McConnell et al. 2010).

The soybean genome sequence was assembled in late 2008 from ~8.5-fold whole-genome shotgun coverage (Schmutz et al. 2010). The soybean genome is moderately large compared to most other plant genomes that have been sequenced. It is ~1,150 million base pairs (Mbp), which is more than eight times the size of the Arabidopsis genome (125 Mbp), almost two and a half times the size of the genomes of the model legumes Medicago truncatula and Lotus japonicus or the grape genome (each is ~450-470 Mbp) and roughly double the size of common bean and poplar (625 and 550 Mbp, respectively) (Cannon and Shoemaker, 2012). The number of predicted protein-encoding genes in soybean is ~46,400, which is relatively high in comparison to ~27,400 genes in Arabidopsis, ~26,400 in grape and ~41,300 in poplar (Schmutz et al. 2010, Sterck et al. 2007). Both the relatively large genome size and high gene count might have been caused by a recent polyploidization event in soybean. Soybean underwent a whole-genome duplication (WGD), which has been dated to between ~5 and ~13 Mya (Doyle and Egan 2009, Schmutz et al. 2010). However, there is still uncertainty about whether the event was autopolyploidy (derived from a single species) or allopolyploidy (derived from different species). The latter is more likely to have occurred due to the existence of two divergent sets of centromeric repeats (Walling et al. 2006). Besides the WGD in Glycine, its genome has undergone at least two other rounds of genome duplication: one at around 58 million years ago, near the origin of the papilionoid legume subfamily; and a genomic triplication before the radiation of the Rosid or Fabid clade, more than 130 million years ago (Schmutz et al. 2010). Because of these changes, any given genomic region of soybean has up to 12 homoeologous genomic copies. Large, distinct pericentromeres in all of the chromosomes are another well-known feature of the soybean genome; they comprise approximately 57% of the current assembly (Schmutz et al. 2010). These regions are repeat-dense and gene-poor, and have extremely suppressed rates of recombination (Cannon and Shoemaker, 2012).
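
The figure of up to 12 homoeologous copies follows from multiplying the copy-number factors of the three polyploidy events described in this paragraph. A minimal sketch of that arithmetic is given here; the dictionary labels and variable names are shorthand chosen for illustration and are not taken from Schmutz et al. (2010).

```python
from math import prod

# Copy-number factor contributed by each polyploidy event named in the text.
# The labels are illustrative shorthand, not terms from the cited papers.
POLYPLOIDY_EVENTS = {
    "triplication before the Rosid/Fabid radiation (>130 Mya)": 3,
    "duplication near the origin of papilionoid legumes (~58 Mya)": 2,
    "Glycine whole-genome duplication (~5-13 Mya)": 2,
}

# This is an upper bound, matching the "up to" in the text:
# not every copy need be retained after each event.
max_homoeologous_copies = prod(POLYPLOIDY_EVENTS.values())
print(max_homoeologous_copies)
```

The product 3 x 2 x 2 gives the maximum of 12 copies quoted for any given soybean region.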

Few studies have been published on the synteny between bean and soybean. In 2009, Galeano and colleagues placed a total of 118 new marker loci into an integrated molecular map for common bean consisting of 288 markers. Of these, 218 were used for synteny analysis and 186 presented homology with segments of the soybean genome. Their synteny analysis with soybean showed a mosaic pattern of syntenic blocks and most segments of any one common bean linkage group associated with segments of two soybean chromosomes. The analysis with Medicago truncatula and Lotus japonicus presented fewer syntenic regions, consistent with the more distant phylogenetic relationship between these species (Galeano et al. 2009).

McClean et al. (2010) mapped the genetically-positioned transcript loci of common bean to the recent soybean genome assembly. In almost all cases, each common bean locus mapped to two loci in soybean. Synteny blocks averaging 32 cM in common bean and 4.9 Mb in soybean were observed for all 11 common bean chromosomes, which were mapped to all 20 soybean chromosomes. Based on physical distances in soybean, the average physical-to-genetic distance ratio in common bean was estimated at ~120 kb/cM. Using the shared syntenic blocks as reference points, ~15,000 common bean sequences (primarily EST contigs and EST singletons) were electronically positioned onto the common bean map. The results of this study support the duplication history of soybean and also provide evidence that the soybean genome was fractionated and reassembled at some point following the duplication event (McClean et al. 2010).

In an even more recent study, a total of 772 marker sequences were compared with the soybean genome (Galeano et al. 2011). From this comparison, a total of 44 syntenic blocks were identified. Of the 11 linkage groups of bean, Pv6 presented the most diverse pattern of synteny, with seven syntenic blocks, and Pv9 showed the most consistent relationship with soybean, with just two syntenic blocks.

Common bean is also related to other legume species including cowpea (Vigna unguiculata) and mung bean (Vigna radiata). A better understanding of the common bean genome will lead to a better knowledge of other important legumes as well as the development of comparative genomics resources. It should be mentioned that the common bean genome is currently being sequenced (McClean et al. 2010) and its first genome assembly has been released.

Materials and methods

BAC isolation and DNA preparation

The P. vulgaris BAC library constructed from HindIII-digested and size-selected P. vulgaris cultivar G19833 genomic DNA was obtained from the Clemson University Genomics Institute. The library was screened with DIG-labeled probes of PvCHS-A, PvCHS-B, PvDFR1, PvMyb, PvPAL2 and PvPAL3 genes amplified from cloned genomic DNA. The hybridization probes were synthesized using a digoxigenin (DIG) PCR Labeling kit (Roche) according to the manufacturer’s instructions. Membranes prepared by the Clemson University Genomics Institute containing 55296 BAC clones of P. vulgaris cultivar G19833 genomic DNA were screened by Southern hybridization with the DIG-labeled probes. The BAC clones to which the probes strongly hybridized were identified on the basis of the membrane positions and patterns of the hybridization spots seen in the images. BAC clones that strongly hybridized with the probes were requested from the Clemson University Genomics Institute. Plasmid DNA was extracted from the clones using a plasmid extraction kit (Qiagen) and the presence of probe sequences was verified by PCR amplification using gene-specific primers. After electrophoresis, staining and visualization, the PCR products were compared to the original genomic PCR band. For each gene, one clone was selected for sequencing from the few clones that gave a product of the appropriate size. Escherichia coli cells carrying individual BAC clones were grown on LB plates supplemented with 12.5 µg/ml of chloramphenicol at 37°C. For each BAC clone, one colony was picked and cultured in 2 ml LB supplemented with 12.5 µg/ml chloramphenicol. From each clone, 10-15 µg of DNA was used for 454 next generation sequencing at the National Research Centre in Saskatoon.

Analysis of microsynteny

Mercator and MAVID are two programs that can be combined to carry out whole genome alignments. Mercator takes multiple whole genomes as input and constructs an orthology map, which is then used to guide nucleotide-level multiple alignments produced by MAVID. These programs are both fast and freely available, and genome alignments can be performed on a single laptop (Dewey, 2007). The whole genome sequences of bean and soybean and their gene annotations were provided to these programs for alignment, and the result was visualized using Gbrowse_syn 2.0 from GMOD (http://gmod.org/wiki/GBrowse_syn).

Results

An analysis of genome synteny in common bean and soybean for the six sequenced BAC clones in this study showed that gene order and orientation were largely maintained, especially for the regions most similar between the two species. However, several structural rearrangements were observed between syntenic regions, particularly for the second homologous region in soybean.

Comparison of orthologous regions of common bean PV-GBa 0005G03 BAC clone and soybean

BAC clone PV-GBa 0005G03 is located on Pv02 at 3,691,740..3,830,946 bp of the v1.0 assembly (Phytozome) and is most similar to segments of the soybean Gm01 and Gm11. Bean regions 3691..3779 kb, 3779..3819 kb and 3819..3830 kb share synteny with regions 790..894 kb and 738..785 kb of Gm11 (in reverse) and 54962..54974 kb on Gm01, respectively (Fig 46).

The second syntenic region between this BAC clone and soybean is located on Gm01 (Table 23).
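
The block correspondences listed above can be treated as a small lookup table from a bean position to its soybean block. The sketch below is one way to do that under stated assumptions: the block boundaries are the kb-rounded values quoted in the text, the half-open interval convention and the function name `soybean_block` are choices made here for illustration, and none of this is part of the Mercator/MAVID output.

```python
# Syntenic blocks for BAC PV-GBa 0005G03, using the kb-rounded boundaries
# quoted in the text: (bean_start_kb, bean_end_kb, soybean_chr,
# soybean_start_kb, soybean_end_kb). The tuple layout is hypothetical.
BLOCKS = [
    (3691, 3779, "Gm11", 790, 894),
    (3779, 3819, "Gm11", 738, 785),
    (3819, 3830, "Gm01", 54962, 54974),
]


def soybean_block(bean_pos_bp):
    """Return (chromosome, start_kb, end_kb) of the soybean block that is
    syntenic to a Pv02 position given in bp, or None if outside all blocks.

    Intervals are treated as half-open [start, end) so that a shared
    boundary such as 3779 kb is assigned to exactly one block.
    """
    pos_kb = bean_pos_bp / 1000
    for bean_start, bean_end, chrom, soy_start, soy_end in BLOCKS:
        if bean_start <= pos_kb < bean_end:
            return chrom, soy_start, soy_end
    return None


# Gene 1 of Table 23 starts at 3,697,429 bp on Pv02.
print(soybean_block(3697429))
```

For gene 1 the lookup returns the Gm11 block at 790..894 kb, which contains the Gm11 position listed for that gene in Table 23 (823,636 bp).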

Based on homology to Arabidopsis, there are 19 genes in this region of bean with a gene density of one gene per 7.3 kb and 23 genes on the syntenic part of Gm11 with a gene density of one gene per 6.5 kb. Eleven of these genes are shared between the two species: ten are common to the BAC clone and soybean Gm11 and one is common to the BAC clone and Gm01. Gene functions and their positions in bean and soybean are described in detail in Table 23. PvCHS-B comprises a family of eight complete chalcone synthase genes in bean whereas there is only one CHS gene in the similar region of soybean (Fig 46, Table 23). PPR repeat, Translin-associated protein X and Poly (ADP-ribose) glycohydrolase are three genes that are not present in this BAC clone but are present in its similar region in soybean. They are located on Pv07, Pv03 and Pv07 of bean, respectively (Table 23).
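
The gene density quoted for the bean BAC can be reproduced from the clone coordinates and the gene count given above. The following is a minimal sketch of that calculation; the helper names `parse_interval` and `kb_per_gene` are hypothetical and introduced only for this example.

```python
def parse_interval(text):
    """Parse a 'start..end' coordinate string (thousands separators allowed)
    into a (start, end) tuple of integers in bp."""
    start, end = (int(part.replace(",", "")) for part in text.split(".."))
    return start, end


def kb_per_gene(interval, gene_count):
    """Average spacing between genes, in kb, over the given interval."""
    start, end = parse_interval(interval)
    return (end - start) / gene_count / 1000


# BAC PV-GBa 0005G03 on Pv02 (v1.0 assembly) with the 19 genes reported above.
density = kb_per_gene("3,691,740..3,830,946", 19)
print(f"one gene per {density:.1f} kb")
```

The interval is 139,206 bp long, so 19 genes give roughly one gene per 7.3 kb, matching the value stated in the text.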

Figure 46. Overview of shared microsynteny between common bean BAC PV-GBa 0005G03 located on Pv02 and regions of soybean Gm11. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv02 of bean. The middle panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome). The lower panel is an annotation of the corresponding syntenic regions in Gm11 of soybean used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean and soybean genes correspond to the numbers in the annotation list (Table 23).

Table 23. Annotations and positions of genes in PV-GBa 0005G03 BAC clone of common bean and its syntenic regions in soybean.

No. | Gene annotation | Gene position, Pv02 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | Pre-mRNA processing protein PRP39-related | 3697429 - 3700028 | Gm11: 823636 - 826477 | Gm01: 54775780 - 54778022 | 89.3%
2 | GTP-binding ADP-ribosylation factor Arf1 | 3701961 - 3706685 | Gm11: 818871 - 822582 | Gm01: 54769621 - 54772941 | 100.0%
3 | Nuclear transport receptor CRM1/MSN5 | 3709434 - 3722523 | Gm11: 803979 - 817244 | Gm01: 54755183 - 54767332 | 97.4%
- | PPR repeat | Pv07: 41367903 - 41372194 | Gm11: 798178 - 802346 | Gm01: 54749672 - 54753731 | 90.4%
4 | Chalcone synthase | 3726784 - 3728413 | Gm11: 793459 - 795686 | Gm01: 54742696 - 54744656 | 99.0%
5 | Chalcone synthase | 3733338 - 3734917 | - | - | -
6 | Chalcone synthase | 3739327 - 3740751 | - | - | -
7 | Chalcone synthase | 3748310 - 3749652 | - | - | -
8 | Chalcone synthase | 3756821 - 3758400 | - | - | -
9 | Chalcone synthase | 3765494 - 3767075 | - | - | -
10 | Chalcone synthase | 3770982 - 3772252 | - | - | -
11 | Chalcone synthase | 3775079 - 3776552 | - | - | -
- | Translin-associated protein X | Pv03: 50621603 - 50631602 | Gm11: 785499 - 789342 | Gm01: 54738244 - 54740007 | 87.9%
12 | E3 ubiquitin-protein ligase | 3779474 - 3785159 | Gm11: 779151 - 784231 | Gm01: 54732549 - 54737347 | 85.6%
13 | NADH dehydrogenase | 3788518 - 3792528 | Gm11: 774814 - 778042 | Gm01: 54729403 - 54732431 | 98.7%
14 | Uncharacterized conserved protein | 3794218 - 3797902 | Gm11: 770955 - 774071 | Gm01: 54725552 - 54728813 | 87.3%
15 | RNA polymerase III transcription initiation factor b | 3799414 - 3806048 | Gm11: 757277 - 766424 | Gm01: 54715371 - 54722476 | 74.1%
16 | Transcription factor DP | 3807199 - 3812706 | Gm11: 751050 - 757246 | Gm01: 54706739 - 54713869 | 93.5%
- | Poly (ADP-ribose) glycohydrolase | Pv07: 6775537 - 6780989 | Gm11: 746640 - 749324 | - | 37.7%
17 | Poly(ADP-ribose) glycohydrolase | 3814805 - 3819844 | Gm11: 746640 - 749324 | - | 36.1%
- | 4-coumarate-CoA ligase | 3830262 - 3833723 | Gm01: 54974200 - 54978159 | Gm11: 727649 - 732167 | 94.1%

*The protein similarity shown is between bean and soybean syntenic region 1.

Comparison of orthologous regions of common bean PV-GBa 0083H05 BAC clone and soybean

BAC clone PV-GBa 0083H05 is located on Pv01 at 14,526,848..14,805,105 bp of the v1.0 assembly (Phytozome) and is similar to segments of Gm19 and Gm16. Bean regions 14526..14547 kb and 14547..14804 kb are most similar to 34823..35285 kb of Gm19 and 4541..4544 kb of Gm16 of soybean, respectively (Figure 47). The second syntenic region between this BAC clone and soybean is located on Gm16 (Table 24).

There are three genes in this region of the bean genome with a gene density of one gene per 90 kb and five genes on the similar part of the soybean genome with a gene density of one gene per 16 kb. All three genes are shared between the two species: chalcone synthase, serine/threonine protein kinase and a protein of unknown function (DUF677), with 91.3%, 89.7% and 72.2% similarity at the amino acid level, respectively. The two additional genes in the soybean segment of Gm19 are UTP--glucose-1-phosphate uridylyltransferase, which is located on Pv08 of bean with another homolog on Gm18 of soybean, and a gene that has no known function and no homolog in bean. Neither of these genes is found in the second syntenic region on Gm16.

Figure 47. Overview of shared microsynteny between common bean BAC PV-GBa 0083H05 located on Pv01 and region of soybean Gm19. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv01 of bean. The middle panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome). The lower panel is an annotation of the corresponding syntenic regions in Gm19 soybean used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean and soybean genes correspond to the numbers in the annotation list (Table 24).

Table 24. Annotations and positions of genes in PV-GBa 0083H05 BAC clone of common bean and its syntenic regions in soybean.

No. | Gene annotation | Gene position, Pv01 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | Chalcone synthase A | 14547185 - 14549317 | Gm19: 35280341 - 35283392 | Gm09: 8193450 - 8195784 | 91.3%
- | UTP--glucose-1-phosphate uridylyltransferase | Pv08: 55496379 - 55503818 | Gm19: 35245344 - 35245843 | Gm18: 3811642 - 3815577 | 66.7%
2 | Serine/threonine protein kinase | 14604051 - 14608321 | Gm19: 35220930 - 35225636 | Gm16: 4439167 - 4443832 | 89.2%
3 | Protein of unknown function (DUF677) | 14754096 - 14756087 | Gm19: 35204659 - 35206868 | Gm16: 4447529 - 4449665 | 72.2%
- | no functional annotations | - | Gm19: 35202145 - 35203368 | - | -

*The protein similarity shown is between bean and soybean syntenic region 1.

Molecular analysis of chalcone synthase gene in bean and soybean

A survey of Phytozome identifies nine loci annotated as CHS genes in bean, which are located on Pv01, Pv02, Pv08, Pv09 and Pv11. There are also 15 CHS loci in soybean on Gm01, Gm02, Gm05, Gm06, Gm08, Gm09, Gm11, Gm12, Gm13 and Gm19 (clustered CHS genes are counted as one locus in both bean and soybean). Out of the nine CHS loci in bean, seven have a homolog in soybean. The clustered CHS genes in bean are eight genes arranged in tandem on Pv02 within a 50 kb stretch of genomic DNA, and the region with shared synteny on Gm11 has only one copy of CHS. There is also a CHS cluster in soybean, which is located on Gm08 and spans about 100 kb of the genome. This part of the soybean genome shares synteny with Pv03 of bean, where no CHS gene can be found.
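
The 50 kb extent quoted for the Pv02 cluster can be checked against the coordinates listed for chalcone synthase genes 4 to 11 in Table 23. A minimal sketch of that check follows; the variable names are chosen here for illustration.

```python
# (start, end) positions in bp on Pv02 for the eight tandem CHS genes,
# copied from Table 23 (genes 4 to 11).
CHS_GENES = [
    (3726784, 3728413),
    (3733338, 3734917),
    (3739327, 3740751),
    (3748310, 3749652),
    (3756821, 3758400),
    (3765494, 3767075),
    (3770982, 3772252),
    (3775079, 3776552),
]

genes = sorted(CHS_GENES)

# Span from the start of the first gene to the end of the last gene.
cluster_span_bp = genes[-1][1] - genes[0][0]

# Intergenic gaps between the end of each gene and the start of the next.
gaps_bp = [nxt[0] - cur[1] for cur, nxt in zip(genes, genes[1:])]

print(f"{len(genes)} genes span {cluster_span_bp / 1000:.1f} kb")
print(f"largest intergenic gap: {max(gaps_bp)} bp")
```

The span from the start of gene 4 to the end of gene 11 is 49,768 bp, in line with the 50 kb figure given in the text.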

Comparison of orthologous regions of common bean PV-GBa 0072I22 BAC clone and soybean

BAC clone PV-GBa 0072I22 is located on Pv07 at 48,408,886..48,606,955 bp of the v1.0 assembly (Phytozome) and is similar to segments of the soybean Gm10 and Gm02. Bean regions 48408..48420, 48457..48484, 48484..48492, 48492..48524, 48524..48538, 48592..48606, 48420..48457 and 48538..48592 kb are syntenic to 21588..21740, 29353..29645, 29348..29353, 27950..29350, 16880..17350 and 17348..17358 kb of Gm02 and 22690..23250 and 20260..22690 kb of Gm10 of soybean (Fig 48). The second syntenic region between this BAC clone and soybean is scattered over different chromosomes (Table 25).

There are 19 genes in this region of bean and seven of them have unknown function.

Gene density in this BAC clone is one gene per 11.5 kb. Eleven of these genes are shared between bean and soybean (Table 25).

Figure 48. Overview of shared microsynteny between common bean BAC PV-GBa 0072I22 located on Pv07 and regions of soybean Gm10 and Gm02. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv07 of bean. The middle panel is an annotation of the corresponding syntenic regions in Gm10 and Gm02 of soybean. The lower panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome) used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean genes correspond to the numbers in the annotation list (Table 25).

Table 25. Annotation and position of genes in PV-GBa 0072I22 BAC clone of common bean and its syntenic region in soybean.

No. | Gene annotation | Gene position, Pv07 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | Monocarboxylate transporter | 48408758 - 48410861 | Gm02: 25135324 - 25137093 | Gm16: 31483740 - 31487529 | 91.7%
2 | WD40 repeat-containing protein | 48422403 - 48435427 | Gm10: 22973808 - 22991989 | Gm02: 25590103 - 25615786 | 96.3%
3 | protein binding | 48445332 - 48448728 | Gm10: 23044729 - 23050861 | Gm03: 36110882 - 36118717 | 85.3%
4 | Pyruvate kinase | 48449913 - 48466884 | Gm02: 25907469 - 25910473 | Gm20: 41672482 - 41681921 | 95.3%
5 | Early growth response protein | 48472183 - 48472893 | Gm02: 21543508 - 21544242 | Gm10: 24291415 - 24292154 | 66.1%
6 | ATP-dependent RNA helicase | 48475170 - 48482872 | Gm02: 29516490 - 29548584 | Gm01: 1008759 - 1013501 | 71.9%
7 | Glycerol-3-phosphate dehydrogenase | 48483955 - 48492672 | Gm02: 29345160 - 29353913 | Gm20: 40994842 - 41005860 | 96.7%
- | no functional annotations | 48497603 - 48498260 | Gm10: 20166065 - 20170817 | Gm02: 29264728 - 29275074 | 45.7%
8 | Serine-threonine protein kinase | 48499824 - 48506705 | Gm02: 29013058 - 29050207 | Gm10: 6360210 - 6370081 | 71.1%
- | no functional annotations | 48514313 - 48521880 | Gm02: 29013058 - 29050207 | Gm10: 6360210 - 6370081 | 89.7%
- | no functional annotations | 48524254 - 48532892 | Gm02: 16999167 - 17010787 | - | 91.0%
9 | protein self binding | 48535709 - 48538176 | Gm02: 17021550 - 17027132 | - | 86.0%
10 | Serine carboxypeptidases | 48538724 - 48549790 | Gm10: 20263575 - 20290098 | - | 91.4%
- | no functional annotations | 48558930 - 48559345 | - | - | -
- | no functional annotations | 48560702 - 48563000 | Gm10: 39367540 - 39392259 | Gm20: 45031228 - 45051516 | 54.7%
- | no functional annotations | 48563008 - 48563915 | - | - | -
11 | Flavonol reductase/cinnamoyl-CoA reductase | 48573352 - 48576358 | Gm02: 17169065 - 17174480 | Gm14: 6000488 - 6004812 | 87.3%
- | no functional annotations | 48582832 - 48588851 | - | - | -
12 | Amine oxidase | 48592000 - 48597895 | Gm02: 17350445 - 17357305 | - | 89.5%

*Protein similarity shown is between bean and soybean syntenic region 1.

Comparison of orthologous regions of common bean PV-GBa 0043K12 BAC clone and soybean

Microsynteny analysis of bean BAC clone PV-GBa 0043K12 and soybean revealed high similarity between the genomes. This BAC clone is located at 13377553..13576830 bp of Pv07 of common bean (Fig 49) and is similar to segments of the soybean Gm20. Bean region 13377553..13576830 bp of Pv07 is syntenic to 43,445,000..43,494,000 bp of Gm20 of soybean (Fig 49). The second syntenic region between this BAC clone and soybean is located on Gm10 (Table 26).

Based on homology to Arabidopsis, there are seven genes in this region of bean with gene density of one gene per 25 kb and six genes on the syntenic part of soybean genome with gene density of one gene per 8.2 kb. Four of these genes are shared between the two species including: Myb Transcription factor, actin depolymerizing factor, DNA photolyase and aspartyl protease with 92.2%, 75.5%, 96.4% and 87.9% similarity at amino acid level. One gene annotated as Cotton fibre expressed protein is not present in this BAC clone but can be found on both syntenic regions (Table 26).

The Myb transcription factor in this clone is homologous to Myb15 of Arabidopsis, which is involved in drought tolerance (Ding et al, 2009). AtMyb15 is also a homolog of LjMyb15 of Lotus, which has been shown to activate isoflavonoid biosynthesis in response to biotic and abiotic stresses (Shelton et al, 2012). The homolog of PvMyb15 in soybean is GmMyb29, which is expressed when the plant is exposed to UV-B light (Shimizu et al, 2000). GmMyb29 activates the genes in the flavonoid pathway in order to increase the accumulation of flavonoid compounds in the epidermal cell layer, which protect the plant against ultraviolet-B radiation (Shimizu et al, 2000).

Figure 49. Overview of shared microsynteny between common bean BAC PV-GBa 0043K12 located on Pv07 and regions of soybean Gm20. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv07 of bean. The middle panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome). The lower panel is an annotation of the corresponding syntenic regions in Gm20 of soybean used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean and soybean genes correspond to the numbers in the annotation list (Table 26).

Table 26. Annotation and position of genes in PV-GBa 0043K12 BAC clone of common bean and its syntenic region in soybean.

No. | Gene annotation | Gene position, Pv07 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | no functional annotations | 13400341 - 13401324 | Gm20: 43447555 - 43448723 | Gm10: 40909453 - 40911886 | 75.1%
- | Cotton fibre expressed protein | - | Gm20: 43452291 - 43452785 | Gm10: 40909453 - 40911886 | -
2 | Transcription factor, Myb superfamily | 13461806 - 13464239 | Gm20: 43456228 - 43459054 | Gm10: 40893555 - 40895977 | 91.2%
3 | no functional annotations | 13493709 - 13494086 | - | - | -
4 | no functional annotations | 13498440 - 13499400 | - | - | -
5 | Actin depolymerizing factor | 13554177 - 13558424 | Gm20: 43470755 - 43475344 | Gm10: 40877292 - 40881497 | 75.5%
6 | DNA photolyase | 13562731 - 13568701 | Gm20: 43477591 - 43483821 | Gm10: 40869589 - 40875360 | 96.5%
7 | Aspartyl protease | 13570178 - 13571494 | Gm20: 43489941 - 43492214 | Gm10: 40867207 - 40869164 | 87.4%

*Protein similarity shown is between bean and soybean syntenic region 1.


Comparison of orthologous regions of common bean PV-GBa 0079P22 BAC clone and soybean

BAC clone PV-GBa 0079P22 is located at 36990886..37135402 bp on Pv07 of the v1.0 assembly (Phytozome) and is similar to segments of soybean Gm10. The region 36990..37135 kb of Pv07 in common bean is syntenic to 5,303..5,344 kb of Gm10 of soybean (Fig 50). The second syntenic region between this BAC clone and soybean is located on Gm13 (Table 27).

Based on homology to Arabidopsis and bean ESTs, there are two complete genes and one incomplete gene at the 5’ end of the BAC clone, which gives a gene density of one gene per 57.8 kb. By contrast, there is one complete gene and one incomplete gene at the 5’ end of the syntenic region of the soybean genome, with a gene density of one gene per 27 kb. Both genes are shared between the two species, with 97.1% similarity for phenylalanine and histidine ammonia-lyase and 85.1% similarity for the incomplete ethylene-insensitive protein 2 at the amino acid level.
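The gene densities quoted above can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that each incomplete gene is counted as half a gene, a weighting that is not stated in the text but that recovers the reported values from the interval coordinates.

```python
def gene_density_kb(start_bp, end_bp, complete_genes, partial_genes=0, partial_weight=0.5):
    """Return the interval length in kb divided by the weighted gene count.

    Assumption (not stated in the thesis): an incomplete gene counts as
    `partial_weight` of a gene.
    """
    length_kb = (end_bp - start_bp) / 1000
    gene_count = complete_genes + partial_weight * partial_genes
    return length_kb / gene_count


# Bean BAC PV-GBa 0079P22 on Pv07: 36990886..37135402 bp
bean_density = gene_density_kb(36_990_886, 37_135_402, complete_genes=2, partial_genes=1)

# Syntenic soybean interval on Gm10: 5,303..5,344 kb
soy_density = gene_density_kb(5_303_000, 5_344_000, complete_genes=1, partial_genes=1)

print(f"bean: one gene per {bean_density:.1f} kb")
print(f"soybean: one gene per {soy_density:.1f} kb")
```

Under this assumption the bean interval gives about 57.8 kb per gene and the soybean interval about 27 kb per gene, in line with the figures in the text.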


Figure 50. Overview of shared microsynteny between common bean BAC PV-GBa 0079P22 located on Pv07 and regions of soybean Gm10. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv07 of bean. The middle panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome). The lower panel is an annotation of the corresponding syntenic regions in Gm10 of soybean used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean and soybean genes correspond to the numbers in the annotation list (Table 27).


Table 27. Annotation and position of genes in PV-GBa 0079P22 BAC clone of common bean and its syntenic region in soybean.

No. | Gene annotation | Gene position on Pv07 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | Phenylalanine and histidine ammonia-lyase (PvPAL2) | 37020027 - 37024239 | Gm10: 5306846 - 5311432 | Gm13: 24273026 - 24278160 | 96.4%
2 | Ethylene-insensitive protein 2 | 37134085 - 37142031 | Gm10: 5342873 - 5351900 | Gm13: 24307360 - 24316489 | 85.7%

* Protein similarity shown is between bean and Soybean syntenic region 1.


Comparison of orthologous regions of common bean PV-GBa 0061D18 BAC clone and soybean

BAC clone PV-GBa 0061D18 is located at 59,298,741..59,462,821 bp on Pv08 of the v1.0 assembly (Phytozome) in common bean and is similar to segments of soybean Gm02 and Gm14. Bean regions 59307..59313, 59298..59307 and 59313..59463 kb are syntenic to 51,260..51,368 kb of Gm02 and 232..328 kb of Gm14 of soybean respectively (Fig 51). The second syntenic region between this BAC clone and soybean is located on Gm02 (Table 28).

Based on homology to Arabidopsis, there are 27 genes in this region of bean, with a gene density of one gene per 6 kb, and 21 genes in the syntenic part of soybean Gm14, with a gene density of one gene per 9 kb. This BAC clone shares only one gene (PAL) with its syntenic part in Gm02. There is no PAL gene on Gm14 of soybean.

Seventeen of the annotated genes in this BAC clone are shared between bean and soybean, one on Gm02 and 16 on Gm14, with similarity ranging between 56-96% at the amino acid level (Table 28). Genes with no functional annotation were excluded from the comparison.

Two genes, an Exocyst complex gene and a gene with no functional annotation, are present on Gm14 of soybean. These genes are missing from the PV-GBa 0061D18 BAC clone of bean and from the second syntenic region on Gm02 of soybean (Table 28).


Figure 51. Overview of shared microsynteny between common bean BAC PV-GBa 0061D18 located on Pv08 and regions of soybean Gm14. Genes are represented by boxes and arrows (indicate transcription orientation). The upper panel indicates the position of the BAC clone in Pv08 of bean. The middle panel is the BAC clone region in the JGI v1.0 bean genome assembly (Phytozome). The lower panel is an annotation of the corresponding syntenic regions in Gm14 of soybean used as reference genome in synteny browser. Stacked boxes of gene annotation show alternative expression of the same gene. Numbers above bean and soybean genes correspond to the numbers in the annotation list (Table 28).


Table 28. Annotation and position of genes in PV-GBa 0061D18 BAC clone of common bean and its syntenic region in soybean.

No. | Gene annotation | Gene position on Pv08 of common bean | Soybean syntenic region 1 | Soybean syntenic region 2 | Protein similarity*
1 | Phenylalanine and histidine ammonia-lyase | 59310665 - 59313444 | Gm02: 51366270 - 51369028 | Gm19: 43914585 - 43918615 | 90.8%
2 | Endosomal membrane proteins | 59314392 - 59317165 | Gm14: 313533 - 317222 | Gm02: 51369934 - 51373222 | 96.8%
3 | Chlorophyll A-B binding protein | 59318664 - 59320347 | Gm14: 310973 - 312695 | Gm02: 51374315 - 51376273 | 93.8%
4 | N-acetyltransferase | 59322048 - 59325184 | Gm14: 305587 - 308887 | Gm02: 51377292 - 51380459 | 95.2%
5 | Coproporphyrinogen III oxidase | 59327355 - 59330911 | Gm14: 299732 - 303586 | - | 84.4%
6 | Helicase - related | 59331222 - 59332910 | Gm14: 297805 - 299412 | - | 86.4%
7 | PPR repeat | 59333334 - 59336378 | Gm14: 294412 - 296905 | Gm02: 51383107 - 51385405 | 89.8%
8 | UDP-glucosyl transferase | 59337695 - 59339086 | - | - |
9 | Clathrin adaptor complex | 59344467 - 59346775 | - | - |
10 | Zinc-iron transporter | 59348009 - 59351799 | - | - |
11 | Phosphorylcholine transferase | 59354912 - 59358570 | Gm14: 289660 - 293516 | Gm02: 51389714 - 51393366 | 95.6%
12 | VQ motif | 59359458 - 59361716 | Gm14: 286885 - 288230 | - | 78.3%
13 | Zinc finger, C3HC4 type | 59365409 - 59370251 | Gm14: 277970 - 285104 | Gm02: 51394513 - 51401303 | 80.1%
14 | no functional annotations | 59371755 - 59373431 | Gm14: 273695 - 275630 | Gm02: 51405726 - 51407282 | 68.5%
15 | UDP-glucosyl transferase | 59373855 - 59376743 | Gm14: 270271 - 273193 | Gm02: 51407505 - 51409648 | 72.2%
16 | Serine/threonine protein kinase | 59380947 - 59387873 | Gm14: 256121 - 263469 | Gm02: 51416872 - 51427204 | 88.3%
17 | Protein of unknown function (DUF632) | 59391474 - 59396975 | Gm14: 247241 - 252173 | Gm02: 51431035 - 51436258 | 84.0%
  | no functional annotations | - | Gm14: 244995 - 245584 | - | -
18 | GDP-fucose protein O-fucosyltransferase | 59402390 - 59407236 | Gm14: 230771 - 235135 | Gm02: 51455008 - 51458085 | 82.5%
19 | Multidrug resistance pump | 59410953 - 59418200 | Gm14: 220008 - 225633 | Gm02: 51460574 - 51466588 | 75.6%
20 | Isocitrate dehydrogenase | 59417674 - 59419529 | Gm14: 218370 - 219993 | - | 96.7%
21 | Phosphoglycerate mutase family | 59419441 - 59421216 | Gm14: 216645 - 218368 | - | 72.4%
22 | no functional annotations | 59422403 - 59422999 | Gm14: 214656 - 215105 | Gm02: 51536800 - 51538308 | 59.1%
23 | Protein binding | 59428378 - 59431166 | Gm14: 195872 - 198137 | Gm02: 51519759 - 51522866 | 70.7%
24 | Patched-related | 59432373 - 59453526 | Gm14: 154072 - 194033 | Gm02: 51471260 - 51518254 | 92.9%
  | Exocyst complex, subunit SEC15 | - | Gm14: 146858 - 153075 | - | -
25 | Serine/threonine protein kinase | 59456714 - 59459935 | Gm14: 139224 - 151498 | Gm02: 51540269 - 51543752 | 77.0%

* Protein similarity shown is between bean and Soybean syntenic region 1.


Molecular analysis of Phenylalanine ammonia-lyase genes in bean and soybean

A Phytozome survey showed six genes annotated as phenylalanine ammonia-lyase on Pv01, Pv07 and Pv08 of bean and eight genes on Gm02, Gm03, Gm10, Gm19 and Gm20 of soybean. Of the six PAL genes in bean, five have a homolog in soybean. PvPAL2 and PvPAL3 are homologs of the PAL genes on Gm10 at 5306..5311 kb and Gm02 at 51,366..51,369 kb respectively.


Discussion

Previous comparative studies of gene order in bean and soybean have used marker positions in the two genomes to reveal conservation of synteny between them (Galeano et al. 2009; McClean et al. 2010; Galeano et al. 2011). As in this study, they concluded that most loci in bean have two counterparts in the soybean genome and that the order of genes has been preserved in the two genomes in a segmental pattern, so that each bean chromosome segment is related to two soybean chromosome segments (Galeano et al. 2009; McClean et al. 2010; Galeano et al. 2011). These results fit well with proposals that there was a whole genome duplication event in soybean after the divergence of beans and soybeans (REF). Based on these results, a few maps of common bean chromosomes have been developed with corresponding markers of soybean incorporated into bean linkage groups.

In this study a different approach was used to compare bean and soybean genome organization. In particular, sections of bean and soybean genomes were aligned using Mercator and Mavid alignment tools, instead of physical positioning of molecular markers. The important criteria for these comparisons are the order and orientation of genes on the chromosomes, but the output from the analyses only identifies the best match to the subject genome. Therefore, in this study only one syntenic soybean match was found for each bean clone. In order to find the second syntenic region, the common molecular markers between bean and soybean were identified.

In spite of the different approach taken here, there was good correspondence between the regions of similarity between the bean and soybean genomes identified in the current study and those identified previously. For example, syntenic regions for the bean clones PV-GBa 0043K12, PV-GBa 0079P22 and PV-GBa 0072I22, which contain PvMYB15, PvPAL2 and PvDFR3, were found to be on soybean chromosomes Gm20, Gm10 and both Gm02 and Gm10 respectively.

This result is in agreement with Galeano et al (2011), who found that Pv07 shares its sequences mainly with Gm10 and Gm20 and partly with Gm13 and Gm02 of soybean. Gm10 and Gm20 of soybean seem to be partial duplications of each other, whereas Gm10 shares only some sequences with Gm02 (Cannon and Shoemaker, 2012). Gm02 is syntenic to the end of Pv07, which is in accordance with the physical localization determined in the current study for PV-GBa 0072I22 at the end of Pv07 (48,408,886..48,606,955 bp). The total length of Pv07 is 51,769,999 bp. The BAC clone PV-GBa 0072I22 has shared synteny with the centromeric and pericentromeric regions of Gm02, which are located at ~21 Mbp and 16 – 41 Mbp respectively.

The pericentromeric region is packed with repetitive DNA, which can explain the dispersed pattern of both primary and secondary syntenic regions for this clone. Although all genes of the BAC clone are present in the region with shared synteny in soybean, transposable elements may cause transposition of genes, which might change their order and complicate finding shared genes at their presumed positions.
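The statement that PV-GBa 0072I22 lies at the end of Pv07 can be checked by expressing the clone coordinates as a fraction of the chromosome length. The following is a minimal sketch using only the coordinates quoted in this discussion; the function name is hypothetical.

```python
def relative_position(start_bp, end_bp, chrom_len_bp):
    """Return the start and end of an interval as fractions of chromosome length."""
    return start_bp / chrom_len_bp, end_bp / chrom_len_bp


PV07_LENGTH_BP = 51_769_999                        # total length of Pv07 as given in the text
CLONE_START_BP, CLONE_END_BP = 48_408_886, 48_606_955  # PV-GBa 0072I22 on Pv07

start_frac, end_frac = relative_position(CLONE_START_BP, CLONE_END_BP, PV07_LENGTH_BP)
distance_to_end_bp = PV07_LENGTH_BP - CLONE_END_BP

print(f"clone spans {start_frac:.1%} to {end_frac:.1%} of Pv07")
print(f"distance from clone end to chromosome end: {distance_to_end_bp / 1e6:.2f} Mbp")
```

With these coordinates the clone falls in the last tenth of the chromosome, roughly 3 Mbp from its end.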

PV-GBa 0083H05, which contains PvCHS-A, is syntenic mainly to Gm19 and partly to Gm16. This clone shares all three of its genes with Gm19 and all except CHS with Gm16 of soybean as the second syntenic region. This clone is located on Pv01, and the result is in agreement with the finding of Galeano et al (2011) that the beginning of Pv01 in bean is syntenic to Gm19 of soybean. Gm19 shares some sequences in the middle of the chromosome with Gm16 (Cannon and Shoemaker, 2012). The fact that CHS is missing from the second syntenic region in soybean can be explained by the birth and death evolution of multigene families. In this model, some duplicate genes stay in the genome for a long time, whereas others may degenerate into pseudogenes or may be deleted from the genome through unequal crossing-over (Nei and Rooney, 2005).


The positioning of PV-GBa 0005G03 at the beginning of Pv02 of bean and the identification of shared synteny over 11 common genes with Gm11 and one shared gene with Gm01 of soybean is in agreement with previous results (Galeano et al. 2011). Interestingly, although there are eight CHS genes in this clone, there is only one CHS gene at this position in the Gm11 syntenic region. The difference could be the result of gene duplication through illegitimate crossing over or retroposition in bean after the divergence between bean and soybean 19 MYA. Such gene amplification allows each member of a gene family to acquire new spatial and temporal expression patterns, as was shown by Ryder et al (1987).

The location of clone PV-GBa 0061D18 containing PvPAL3, a member of the gene family for the first committed enzyme of the phenylpropanoid pathway, on the distal end of Pv08 of bean and the identification of shared syntenic regions with Gm14 and partially with Gm02 is in agreement with results from Galeano et al (2011), who showed that the distal end of Pv08 is syntenic to Gm14 and Gm02 of soybean. The clone shares 16 of its annotated genes with Gm14 but only one gene with Gm02. In soybean, Gm02 has similarity at its distal end to Gm14, and Gm14 shares its sequences at the distal and proximal ends with Gm02 (Cannon and Shoemaker, 2012).

It is notable that, except for two clones located on Pv07 that were similar to Gm10 and Gm20, the rest of the clones were colinear to two chromosomes in soybean which seem to be duplicates of each other. Gm10 and Gm20 also share a large stretch of sequence at their distal ends (Cannon and Shoemaker, 2012). These results indicate that the soybean genome has undergone some relocation and rearrangement after duplication. The Glycine lineage has undergone two rounds of whole genome duplication (WGD), one around 59 million years ago and the second between 5 and 13 mya (Schmutz et al., 2010). Analysis of satellite repeats suggested that both events may have been allotetraploidy events (Gill et al., 2009). In the soybean genome sequence (cv Williams 82), 31,264 genes (15,632 gene pairs) are recent paralogs and 15,166 are singletons (REFS). The paralogs are believed to have been duplicated and retained after the 13-mya duplication event, whereas the rest have reverted to singletons (Schmutz et al., 2010).
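The paralog and singleton counts quoted above imply a retention rate that is easy to work out. The short calculation below is only a sketch of that arithmetic; the variable names are illustrative, and the total is simply the sum of the two counts given in the text.

```python
# Counts quoted in the text for the soybean genome sequence (cv Williams 82)
genes_in_recent_pairs = 31_264
singletons = 15_166

gene_pairs = genes_in_recent_pairs // 2          # two genes per paralogous pair
total_genes = genes_in_recent_pairs + singletons
retained_fraction = genes_in_recent_pairs / total_genes

print(f"gene pairs: {gene_pairs}")
print(f"total genes considered: {total_genes}")
print(f"fraction of genes still in recent pairs: {retained_fraction:.1%}")
```

On these numbers, about two thirds of the genes remain in duplicate pairs and about one third have reverted to single copy.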

Common bean and soybean diverged ~19 MYA, thus every gene in bean should have at least two homologs in the soybean genome except where orthologs have been lost due to evolutionary forces of divergence, inactivation or other birth and death processes.

More specifically, factors such as recombination and transposon movement change genomes over time. “Biased fractionation” is an evolutionary process in which genes are preferentially removed from one of the two homologs derived from WGD (Thomas et al., 2006; Woodhouse et al., 2010). This elimination usually happens through illegitimate recombination (Woodhouse et al., 2010). The PvPAL3 homolog, which is missing from the first syntenic region of soybean on Gm14, is an example of such an event in this study. The gene balance hypothesis (GBH) also describes non-random gene loss (Freeling, 2009). The hypothesis is that highly connected genes are in balance with their respective interactions, cascades, and complexes and are subject to purifying selection. On the other hand, poorly connected genes may have redundant functions and are subject to random loss (Freeling, 2009). The results of this study show that most of the analyzed phenylpropanoid genes are preserved in both genomes and nearly all of them are at least duplicated in tetraploid soybean.
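The pattern of retention and loss can be tallied for a single clone. The sketch below uses the seven numbered bean genes of PV-GBa 0043K12 from Table 26 and counts, for each, how many of the two soybean syntenic regions (Gm20 and Gm10) carry a counterpart. The data structure and variable names are illustrative, and presence is read directly from the table entries rather than from any re-analysis.

```python
from collections import Counter

# (bean gene no., annotation, counterpart on Gm20?, counterpart on Gm10?), as listed in Table 26
table26_bean_genes = [
    (1, "no functional annotations", True, True),
    (2, "Transcription factor, Myb superfamily", True, True),
    (3, "no functional annotations", False, False),
    (4, "no functional annotations", False, False),
    (5, "Actin depolymerizing factor", True, True),
    (6, "DNA photolyase", True, True),
    (7, "Aspartyl protease", True, True),
]

# Number of soybean syntenic copies (0, 1 or 2) for each bean gene
copies_per_gene = Counter(int(on_gm20) + int(on_gm10)
                          for _, _, on_gm20, on_gm10 in table26_bean_genes)

for n_copies in sorted(copies_per_gene, reverse=True):
    print(f"{copies_per_gene[n_copies]} bean gene(s) with {n_copies} syntenic soybean copies")
```

For this clone, every annotated gene with a known function is present in both soybean regions, while the two bean genes without soybean counterparts have no functional annotation.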


VI. CONCLUSION AND FINAL REMARKS

The objective of this thesis was to test the hypothesis that colour genes are structural and regulatory genes of the phenylpropanoid pathway. Better understanding of this complex relationship will help breeders to select germplasm with improved nutritional quality without adversely affecting disease or pest resistance.

The map locations of 18 phenylpropanoid genes were determined using the Core (BAT93 × Jalo EEP 558) recombinant inbred line population, and five genes were placed close to markers for colour loci. The PvPAL1 and PvPAL2 structural genes and the PvMYB15 transcription factor were mapped close to the P locus on Pv07, at distances of 13.2 cM, 17.1 cM and 7.8 cM respectively. The P locus controls the presence and absence of colour production in the bean seed coat (Bassett, 2007) and was speculated to have a regulatory function in the phenylpropanoid pathway. The gene mapping and gene characterization work in the current study suggests that PvMYB15 is a good candidate for the pigmentation (P) gene in bean.

Chalcone reductase (CHR) was mapped close to a marker of the G locus (7 cM from OAP3) on Pv04. The G gene is the “yellow brown factor” of Prakken (Prakken, 1970), and CHR is an enzyme that co-acts with CHS to produce a branch in the first step of the flavonoid pathway. In legumes, there is a biosynthetic route before the production of flavonoids and isoflavonoids that leads to the production of aurones (Farag, 2009). Aurones are yellow-coloured flavonoid compounds that have significant roles in flower pigmentation (Nakayama, 2001). The current results raise the possibility that an increase in the activity of the CHR enzyme may shift primary metabolites towards the production of aurones and, thus, give a yellow brown colour to the seed coat.

Cinnamyl alcohol dehydrogenase (CAD) was mapped on Pv02 close to the B locus (2.1 cM), which is the gene for (Greenish) Gray Brown seed coat (Prakken, 1972). CAD catalyses the last step in the biosynthesis of the lignin monomers (Baucher, 1999) and its down-regulation was associated with a red colouration of the stem (Baucher, 1999); therefore, the presence of colour in the stem might indicate a switch toward the anthocyanin branch.

The map locations of five phenylpropanoid genes were established in the OAC (OAC Rex × SVM Taylor) recombinant inbred line population, and PvDFR1 was mapped close (13.9 cM) to a flower colour locus on Pv01. Flower colour in bean is controlled by several different loci, including P, T, V, C, Rk, Am, Beg, bic, blu, No, Prp, Sal, wb, F and Nud (Yarnell, 1965). The blu locus was shown to be linked to the Fin locus, which controls indeterminate vs. determinate plant growth (Bassett, 1992), and the Fin locus is located on Pv01 (Kwak et al. 2008). Except for P, T, V, C and Rk, which are also the main colour loci for seed coat, the map positions of the other flower colour loci are unknown.

A significant contribution to the OAC molecular marker database was made. Ten SSR markers and 89 SNP markers were scored for the 89 inbred lines derived from the cross between ‘OAC Rex’ and ‘SVM Taylor’. These markers were placed on an existing OAC genetic linkage map (Larsen, 2005), which segregates for the main seed coat colour loci, especially P, and is an excellent resource for studying colour genes. An even more saturated linkage map will help to locate the colour loci more accurately.


The Core population used in this study was initially developed for mapping studies (Nodari et al. 1992) and not all colour genes segregate in this population. Thus, different populations need to be studied in order to reveal potential relationships between any colour locus and phenylpropanoid genes. Another limiting factor is that colour loci have epistatic interactions, in which the expression of some of the colour genes depends on the presence of other genes in dominant states. For example, C is required for the expression of the V alleles (Feenstra, 1960).

Also, despite considerable effort, there is no routine transformation system yet available for common bean and complementation or knockout studies are not feasible in this species.

Therefore, the best way to study the identity of colour loci is to work with backcross populations in which only one colour gene segregates in the offspring.

Genetic tester stocks for the colours and patterns of common bean seed coats have been developed by backcrossing selected recessive alleles, singly and in combination, into a recurrent parent 5-593 in order to facilitate seed coat colour genotyping of unknown lines (Bassett, 1994).

Florida dry-bean breeding line 5-593, with black seed and Bishop’s violet flowers, has dominant alleles for all colour genes (Bassett, 1998). Since the P locus was of particular interest in this study, the expression patterns of phenylpropanoid genes were studied in three tester lines with different alleles at the P locus: 5-593, pBC3 5-593 (white seed coat and flower) and cdjBC3 5-593 (white seed coat and purple flower). Only the expression patterns of Chalcone synthase (CHS) and Dihydroflavonol reductase (DFR) differed among these lines. However, the expression pattern of neither gene was consistent with what is expected for the P gene. In the future, expression studies (RNAseq) of all the colour genes in tester lines will be very useful.


Sequencing of the complete coding region is the best approach for gene comparisons; hence, it was decided to fully sequence the candidates for P, including PAL and MYB15 (based on their genetic location on Pv07 close to the P locus), as well as CHS and DFR for their important functions in the phenylpropanoid pathway and specifically in the anthocyanin branch. The P. vulgaris BAC library constructed from HindIII digested P. vulgaris cultivar G19833 genomic DNA (obtained from the Clemson University Genomics Institute) was screened for these genes, and positive, PCR-confirmed clones were picked for plasmid extraction and high throughput sequencing. Two BAC clones for CHS (PvCHS-A and PvCHS-B), one for DFR (PvDFR3), one for MYB TF (PvMYB15) and three for PAL (PvPAL1, PvPAL2 and PvPAL3) were sequenced. For unknown reasons, sequencing of PvPAL1 failed. BAC sequences were assembled using the bean genome assembly v1.0 (Phytozome) as reference and their physical positions on the bean genome were determined. The gene composition of each BAC clone and its shared synteny with soybean were studied in detail.

The PV-GBa 0083H05 and PV-GBa 0005G03 BAC clones contain PvCHS-A and PvCHS-B and are located on Pv01 and Pv02 respectively. Like typical CHS genes, the ones characterized in the current study consisted of two exons and a single intron (Yang et al. 2002), but PvCHS-A had a considerably larger intron than PvCHS-B. Although PvCHS-A was present as a single copy gene, the clone PV-GBa 0005G03 contained eight full copies and one partial copy of PvCHS-B. The clustering observed for the PvCHS-B genes is similar to a cluster of CHS genes previously identified in the I (Inhibitor) locus of soybean, which has been associated with seed coat colour development through posttranscriptional gene silencing (PTGS) (Eckardt et al. 2009). Future studies to sequence the region containing the PvCHS-B gene cluster in bean lines with different seed coat phenotypes, and also microRNA and siRNA studies, would be useful to determine if the gene silencing events observed in soybean also occur in common bean. PV-GBa 0005G03 has shared synteny with Gm11 and Gm01 of soybean, where there is only one CHS as the syntenic counterpart. This could be the result of gene duplication in bean after its divergence from soybean 19 MYA. PV-GBa 0083H05 has shared synteny with Gm19 and Gm16, but the CHS gene is missing from the Gm16 syntenic region, possibly due to the high number of retrotransposons observed in this region in both bean and soybean. A more detailed study of gene evolution in bean and soybean will help to estimate the time of gene duplication and deletion in this case. Also, a more detailed study of the promoter regions for each gene in the family will determine if any of them have acquired different spatial and temporal expression patterns. Studying the expression patterns of these genes in different tissues would also be helpful, but since they have such similar coding sequences, designing gene specific primers would be a challenge.

Clones PV-GBa 0043K12, PV-GBa 0079P22 and PV-GBa 0072I22, containing PvMYB15, PvPAL2 and PvDFR3, are located on Pv07 and have shared synteny with Gm20, Gm10 and both Gm02 and Gm10 of soybean respectively. All three clones contained single copies of the related genes, and the structures of the genes are consistent with those in other plant species. Also, the syntenic regions in bean and soybean had more or less the same genes, which indicates high sequence and gene conservation in the two species.

PV-GBa 0061D18 is the BAC clone with the PvPAL3 gene, which is physically located on Pv08 of common bean. The PvPAL3 gene contains two exons and one intron and codes for a protein with 686 amino acid residues. This gene is placed at the very end of Pv08, on which four colour loci are located (Gy, R, C and Prp). Sequence variation analysis of full length PvPAL3 in bean lines with dominant and recessive alleles at these loci will clarify any potential function of this gene in seed coat colour development.


REFERENCES

Abe H, Urao T, Ito T, Seki M, Shinozaki K, Yamaguchi-Shinozaki K (2003) Arabidopsis AtMYC2 (bHLH) and AtMYB2 (MYB) function as transcriptional activators in abscisic acid signaling. Plant Cell 15: 63-78

Abe I, Morita H (2010) Structure and function of the chalcone synthase superfamily of plant type III polyketide synthases. Nat Prod Rep 27: 809-838

Adam-Blondon AF, Sevignac M, Dron M, Bannerot H (1994) A genetic map of common bean to localize specific resistance genes against anthracnose. Genome 37: 915-924

Agriculture and Agri-Food Canada (AAFC) (2000) Dry Beans: Situation and Outlook. Biweekly Bulletin 13 (16)

Agriculture and Agri-Food Canada (AAFC) (2013) Canada: Outlook for Principal Field Crops

Aharoni A, De Vos C, Wein M, Sun Z, Greco R, Kroon A, Mol JN, O'Connell A (2001) The strawberry FaMYB1 transcription factor suppresses anthocyanin and flavonol accumulation in transgenic tobacco. Plant J 28: 319-332

Ahn S, Tanksley SD (1993) Comparative linkage maps of rice and maize genomes. Proc Natl Acad Sci 90: 7980-7984

Aida R, Yoshida K, Kondo T, Kishimoto S, Shibata M (2000) Copigmentation gives bluer flowers on transgenic torenia plants with the antisense dihydroflavonol-4-reductase gene. Plant Science 160: 49-56

Allan AC, Hellens RP, Laing WA (2008) MYB transcription factors that colour our fruit. Trends Plant Sci 13: 99-102

Allwood EG, Davies DR, Gerrish C, Ellis BE, Bolwell GP (1999) Phosphorylation of phenylalanine ammonia-lyase: evidence for a novel protein kinase and identification of the phosphorylated residue. FEBS Lett 457: 47-52

Appert C, Zon J, Amrhein N (2003) Kinetic analysis of the inhibition of phenylalanine ammonia-lyase by 2-aminoindan-2-phosphonic acid and other phenylalanine analogues. Phytochemistry 62: 415-422

Aragão FJL, Ribeiro SG, Barros LMG, Brasileiro ACM, Maxwell DP, Rech EL, Faria JC (1998) Transgenic beans (Phaseolus vulgaris L.) engineered to express viral antisense RNAs show delayed and attenuated symptoms to bean golden mosaic geminivirus. Mol Breed 4: 491-499


Arnold K, Bordoli L, Kopp J, Schwede T (2006) The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics 22: 195-201

Arumuganathan K, Earle ED (1991) Nuclear DNA content of some important plant species. Plant Mol Biol Rep 9: 208-218

Austin M, Noel J (2003) The chalcone synthase superfamily of type III polyketide synthases. Nat Prod Rep 20(1): 79-110

Aw TL, Swanson BG (1985) Influence of tannin on Phaseolus vulgaris protein digestibility and quality. J Food Sci 50: 67-71

Awuma K, Bassett MJ (1988) Addition of genes for dwarf seed (ds) and spindly branch (sb) to the linkage map of common bean. J Amer Soc Hort Sci 113: 464-467

Ayabe SI, Udagawa A, Furuya T (1988) NAD(P)H-dependent 6'-deoxychalcone synthase activity in Glycyrrhiza echinata cells induced by yeast extract. Arch Biochem Biophys 261: 458-462

Bais HP, Weir TL, Perry LG, Gilroy S, Vivanco JM (2006) The role of root exudates in rhizosphere interactions with plants and other organisms. Annu Rev Plant Biol 57: 233-266

Bassett MJ (1992) An induced mutant for blue flowers in common bean that is not allelic to V or Sal and is linked to Fin. J Amer Soc Hort Sci 117: 317-320

Bassett MJ (1991) A revised linkage map of common bean. HortScience 26: 834-836

Bassett MJ (1994) The griseoalbus (gray-white) seed coat colour is controlled by an allele (pgri) at the P locus in common bean. HortScience 29:1178–1179

Bassett MJ (1998) A third recessive allele, stpmic, for seedcoat pattern at the Stp locus in common bean. J Amer Soc Hort Sci 123: 404-406

Bassett MJ (2007) Genetics of seed coat colour and pattern in common bean. In: Plant Breeding Reviews. John Wiley & Sons, Inc, pp 239-315

Bassett MJ (2003) Inheritance of the cartridge buff micropyle stripe expressed in the genetic stock, p BC3 5-593, of common bean maintained at Pullman, WA. BIC 46: 35-36

Bate-Smith E (1973) Haemanalysis of tannins: The concept of relative astringency. Phytochemistry 12: 907-912

Baucher M, Bernard-Vailhé M, Chabbert B, Besle J, Opsomer C, Montagu M, Botterman J (1999) Down-regulation of cinnamyl alcohol dehydrogenase in transgenic alfalfa (Medicago sativa L.) and the effect on lignin composition and digestibility. Plant Molecular Biology 39 (3): 437-447

Baudry A, Heim MA, Dubreucq B, Caboche M, Weisshaar B, Lepiniec L (2004) TT2, TT8, and TTG1 synergistically specify the expression of BANYULS and proanthocyanidin biosynthesis in Arabidopsis thaliana. Plant J 39: 366-380

Beebe S, Gonzalez AV, Rengifo J (2000) Research on trace minerals in the common bean. Food Nutr Bull 21:387–391

Beebe S, Rengifo J, Gaitan E, Duque MC, Tohme J (2001) Diversity and Origin of Andean Landraces of Common Bean. Crop Sci 41:854-862

Beerhues L, Wiermann R (1988) Chalcone synthases from spinach (Spinacia oleracea L.). Planta 173: 532-543

Beld M, Martin C, Huits H, Stuitje AR, Gerats AGM (1989) Flavonoid synthesis in Petunia hybrida: partial characterization of dihydroflavonol-4-reductase genes. Plant Mol Biol 13: 491-502

Beninger CW, Hosfield GL (2003) Antioxidant Activity of Extracts, Condensed Tannin Fractions, and Pure Flavonoids from Phaseolus vulgaris L. Seed Coat Colour Genotypes. J Agric Food Chem 51: 7879-7883

Beninger CW, Hosfield GL, Bassett MJ (1999) Flavonoid composition of three genotypes of dry bean (Phaseolus vulgaris) differing in seedcoat colour. J Amer Soc Hort Sci 124: 514-518

Beninger CW, Hosfield GL, Bassett MJ, Owens S (2000) Chemical and morphological expression of the B and Asp seedcoat genes in Phaseolus vulgaris. J Amer Soc Hort Sci 125: 52-58

Beninger CW, Hosfield GL, Nair MG (1998a) Physical characteristics of dry beans in relation to seedcoat colour genotype. HortScience 33: 328-329

Beninger CW, Hosfield GL, Nair MG (1998b) Flavonol glycosides from the seed coat of a new manteca-type dry bean (Phaseolus vulgaris L.). J Agric Food Chem 46: 2906-2910

Bennetzen JL (2000) Comparative sequence analysis of plant nuclear genomes: microcolinearity and its many exceptions. Plant Cell 12:1021-1029

Bezanson GS, Desaty D, Emes AV, Vining LC (1970) Biosynthesis of cinnamamide and detection of phenylalanine ammonia lyase in Streptomyces verticillatus. Can J Microbiol 16: 147-151


Bitocchi E, Nanni L, Bellucci E, Rossi M, Giardini A, Zeuli PS, Logozzo G, Stougaard J, McClean P, Attene G, Papa R (2012) Mesoamerican origin of the common bean (Phaseolus vulgaris L.) is revealed by sequence data. PNAS 109: E788-E796

Blair M, Diaz LM, Buendía HF, Duque MC (2009) Genetic diversity, seed size associations and population structure of a core collection of common beans (Phaseolus vulgaris L.). Theor Appl Genet 119:955–972

Bolwell GP, Bell JN, Cramer CL, Schuch W, Lamb CJ, Dixon RA (1985) l-Phenylalanine ammonia-lyase from Phaseolus vulgaris. European Journal of Biochemistry 149: 411-419

Bonierbale MW, Plaisted RL, Tanksley SD (1988) RFLP maps based on a common set of clones reveal modes of chromosomal evolution in potato and tomato. Genetics 120: 1095-1103

Borevitz JO, Xia Y, Blount J, Dixon RA, Lamb C (2000) Activation Tagging Identifies a Conserved MYB Regulator of Phenylpropanoid Biosynthesis. Plant Cell 12: 2383-2394

Borovsky Y, Oren-Shamir M, Ovadia R, De Jong W, Paran I (2004) The A locus that controls anthocyanin accumulation in pepper encodes a MYB transcription factor homologous to Anthocyanin2 of Petunia. Theor Appl Genet 109: 23-29

Broughton WJ, Hernandez G, Blair M, Beebe S, Gepts P, Vanderleyden J (2003) Beans (Phaseolus spp.) - model food legumes. Plant and Soil 252: 55-128

Brugliera F, Barri-Rewell G, Holton TA, Mason JG (1999) Isolation and characterization of a flavonoid 3'-hydroxylase cDNA clone corresponding to the Ht1 locus of Petunia hybrida. The Plant Journal 19: 441-451

Burbulis IE, Iacobucci M, Shirley BW (1996) A null mutation in the first enzyme of flavonoid biosynthesis does not affect male fertility in Arabidopsis. Plant Cell 8: 1013-1025

Caldas GV, Blair MW (2009) Inheritance of seed condensed tannins and their relationship with seed-coat color and pattern genes in common bean (Phaseolus vulgaris L.). Theor Appl Genet 119(1): 131-142

Camm EL, Towers GHN (1973) Phenylalanine ammonia lyase. Phytochemistry 12: 961-973

Cannon S, Shoemaker R (2012) Evolutionary and comparative analyses of the soybean genome. Breeding Science 61: 437-444

Cannon SB, May GD, Jackson SA (2009) Three sequenced legume genomes and many crop species: rich opportunities for translational genomics. Plant Physiol 151: 970-977

Cannon SB, Sterck L, Rombauts S, Sato S, Cheung F, Gouzy J, Wang X (2006) Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci 103: 14959-14964

Chaman ME, Copaja SV, Argandona VH (2003) Relationships between Salicylic Acid Content, Phenylalanine Ammonia-lyase (PAL) Activity, and Resistance of Barley to Aphid Infestation. J Agric Food Chem 51: 2227-2231

Chang A, Lim M, Lee S, Robb EJ, Nazar RN (2008) Tomato Phenylalanine Ammonia-Lyase Gene Family, Highly Redundant but Strongly Underutilized. J Biol Chem 283: 33591-33601

Cheng F, Wu J, Fang L, Wang X (2012) Syntenic gene analysis between Brassica rapa and other Brassicaceae species. Front Plant Sci 3:198

Chiu L, Zhou X, Burke S, Wu X, Prior RL, Li L (2010) The Purple Cauliflower Arises from Activation of a MYB Transcription Factor. Plant Physiol 154: 1470-1480

Choi H, Mun J, Kim D, Zhu H, Baek J, Mudge J, Roe B, Ellis N, Doyle J, Kiss GB, Young ND, Cook DR (2004) Estimating genome conservation between crop and model legume species. Proceedings of the National Academy of Sciences of the United States of America 101: 15289-15294

Choudhary AD, Kessmann H, Lamb CJ, Dixon RA (1990) Stress responses in alfalfa (Medicago sativa L.) IV. Expression of defense gene constructs in electroporated suspension cell protoplasts. Plant Cell Rep 9: 42-46

Clifford MN (1996) Anthocyanins in foods. In: Symposium on polyphenols and anthocyanins as food colourants and antioxidants. Brussels, Belgium. pp 1-19

Clough SJ, Tuteja JH, Li M, Marek LF, Shoemaker RC, Vodkin LO (2004) Features of a 103-kb gene-rich region in soybean include an inverted perfect repeat cluster of CHS genes comprising the I locus. Genome 47: 819-831

Cochrane FC, Davin LB, Lewis NG (2004) The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry 65: 1557-1564

Cominelli E, Galbiati M, Vavasseur A, Conti L, Sala T, Vuylsteke M, Leonhardt N, Dellaporta S, Tonelli C (2005) A guard-cell-specific MYB transcription factor regulates stomatal movements and plant drought tolerance. Curr Biol 15: 1196-1200

Costa MA, Collins RE, Anterola AM, Cochrane FC, Davin LB, Lewis NG (2003) An in silico assessment of gene function and organization of the phenylpropanoid pathway metabolic networks in Arabidopsis thaliana and limitations thereof. Phytochemistry 64: 1097-1112

Cramer C, Edwards K, Dron M, Liang X, Dildine S, Bolwell G, Dixon R, Lamb C, Schuch W (1989) Phenylalanine ammonia-lyase gene organization and structure. Plant Mol Biol 12: 367-383

Dao TTH, Linthorst HJM, Verpoorte R (2011) Chalcone synthase and its functions in plant resistance. Phytochemistry Reviews 10: 397-412

Davies KM, Schwinn KE (2007) Molecular biology and biotechnology of flavonoid biosynthesis. In: O. M. Andersen and K. R. Markham (eds) Flavonoids – Chemistry, Biochemistry and Applications. Taylor and Francis Group, Boca Raton pp 143-218

Davies KM, Schwinn KE, Deroles SC, Manson DG, Lewis DH, Bloor SJ, Bradley JM (2003) Enhancing anthocyanin production by altering competition for substrate between flavonol synthase and dihydroflavonol 4-reductase. Euphytica 131: 259-268

De Jong WS, De Jong DM, De Jong H, Kalazich J, Bodis M (2003) An allele of dihydroflavonol 4-reductase associated with the ability to produce red anthocyanin pigments in potato (Solanum tuberosum L.). Theor Appl Genet 107: 1375-1383

De Paoli E, Dorantes-Acosta A, Zhai J, Accerbi M, Jeong D, Park S, Meyers BC, Jorgensen RA, Green PJ (2009) Distinct extremely abundant siRNAs associated with cosuppression in petunia. RNA 15: 1965-1970

De Vetten N, Quattrocchio F, Mol J, Koes R (1997) The an11 locus controlling flower pigmentation in petunia encodes a novel WD-repeat protein conserved in yeast, plants, and animals. Genes Dev 11: 1422-1434

Deavours BE, Dixon RA (2005) Metabolic Engineering of Isoflavonoid Biosynthesis in Alfalfa. Plant Physiol 138: 2245-2259

Deluc L, Barrieu F, Marchive C, Lauvergeat V, Decendit A, Richard T, Carde J, Merillon J, Hamdi S (2006) Characterization of a grapevine R2R3-MYB transcription factor that regulates the phenylpropanoid pathway. Plant Physiol 140: 499-511

Deshpande AS, Surendranathan KK, Nair PM (1993) The phenyl propanoid pathway enzymes in Solanum tuberosum exist as a multienzyme complex. Indian J Biochem Biophys 30: 36-41

Dewey CN (2007) Aligning multiple whole genomes with Mercator and MAVID. Methods Mol Biol 395: 221-236

Dhaubhadel S, Gijzen M, Moy P, Farhangkhoee M (2007) Transcriptome analysis reveals a critical role of CHS7 and CHS8 genes for isoflavonoid synthesis in soybean seeds. Plant Physiol 143: 326-338

Diaz AM, Caldas GV, Blair MW (2010) Concentrations of condensed tannins and anthocyanins in common bean seed coats. Food Res Int 43: 595-601

Ding Z, Li S, An X, Liu X, Qin H, Wang D (2009) Transgenic expression of MYB15 confers enhanced sensitivity to abscisic acid and improved drought tolerance in Arabidopsis thaliana. Journal of Genetics and Genomics 36: 17-29

Dixon RA, Paiva NL (1995) Stress-induced phenylpropanoid metabolism. Plant Cell 7: 1085-1097

Dixon RA, Xie D, Sharma SB (2005) Proanthocyanidins: a final frontier in flavonoid research? New Phytol 165: 9-28

Doerner PW, Stermer B, Schmid J, Dixon RA, Lamb CJ (1990) Plant defense gene promoter-reporter gene fusions in transgenic plants: tools for identification of novel inducers. Nat Biotech 8: 845-848

Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh leaf tissue. Phytochemical Bulletin 19: 11-15

Doyle JJ, Egan AN (2010) Dating the origins of polyploidy events. New Phytol 186: 73-85

Doyle JJ, Luckow MA (2003) The Rest of the Iceberg. Legume Diversity and Evolution in a Phylogenetic Context. Plant Physiol 131: 900-910

Dron M, Clouse SD, Dixon RA, Lawton MA, Lamb CJ (1988) Glutathione and fungal elicitor regulation of a plant defense gene promoter in electroporated protoplasts. Proc Natl Acad Sci 85: 6738-6742

Du H, Huang Y, Tang Y (2010) Genetic and metabolic engineering of isoflavonoid biosynthesis. Applied Microbiology and Biotechnology 86: 1293-1312

Dubos C, Stracke R, Grotewold E, Weisshaar B, Martin C, Lepiniec L (2010) MYB transcription factors in Arabidopsis. Trends Plant Sci 15: 573-581

Durbin ML, McCaig B, Clegg MT (2000) Molecular evolution of the chalcone synthase multigene family in the morning glory genome. Plant Mol Biol 42: 79-92

Eckardt N (2009) Tissue-Specific siRNAs that silence CHS genes in soybean. Plant Cell 21: 2983-2984

Edwards R, Mavandad M, Dixon RA (1990) Metabolic fate of cinnamic acid in elicitor treated cell suspension cultures of Phaseolus vulgaris. Phytochemistry 29: 1867-1873

Elomaa P, Uimari A, Mehto M, Albert VA, Laitinen RA, Teeri TH (2003) Activation of anthocyanin biosynthesis in Gerbera hybrida (Asteraceae) suggests conserved protein-protein and protein-promoter interactions between the anciently diverged monocots and eudicots. Plant Physiol 133: 1831-1842

Erdmann PM, Lee RK, Bassett MJ, McClean PE (2002) A molecular marker tightly linked to P, a gene required for flower and seedcoat colour in common bean (Phaseolus vulgaris L.), contains the Ty3-gypsy retrotransposon Tpv3g. Genome 45(4): 728-736

Espley RV, Hellens RP, Putterill J, Stevenson DE, Kutty-Amma S, Allan AC (2007) Red colouration in apple fruit is due to the activity of the MYB transcription factor, MdMYB10. Plant J 49: 414-427

Estrada-Navarrete G, Alvarado-Affantranger X, Olivares JE, Diaz-Camino C, Santana O, Murillo E, Guillen G, Sanchez-Guevara N, Acosta J, Quinto C, Li D, Gresshoff PM, Sanchez F (2006) Agrobacterium rhizogenes Transformation of the Phaseolus spp.: A Tool for Functional Genomics. Mol Plant-Microbe Interact 19: 1385-1393

FAOSTAT. 2011. Agricultural Production. http://faostat.fao.org.

Farag MA, Deavours BE, de Fátima Â, Naoumkina M, Dixon RA, Sumner LW (2009) Integrated metabolite and transcript profiling identify a biosynthetic mechanism for hispidol in Medicago truncatula Cell Cultures. Plant Physiology 151: 1096-1113

Fasoula DA, Stephens PA, Nickell CD, Vodkin LO (1995) Cosegregation of purple-throat flower colour with dihydroflavonol reductase polymorphism in soybean. Crop Sci 35: 1028-1031

Feenstra WJ (1960) Biochemical aspects of seedcoat colour inheritance in Phaseolus vulgaris L. Meded Landbouwhogeschool Wageningen 60: 1-53

Feng S, Wang Y, Yang S, Xu Y, Chen X (2010) Anthocyanin biosynthesis in pears is regulated by a R2R3-MYB transcription factor PyMYB10. Planta 232: 245-255

Ferrer J, Jez JM, Bowman ME, Dixon RA, Noel JP (1999) Structure of chalcone synthase and the molecular basis of plant polyketide biosynthesis. Nat Struct Mol Biol 6: 775-784

Forkmann G, Ruhnau B (1987) Distinct substrate specificity of dihydroflavonol 4-reductase from flowers of Petunia hybrida. Z Naturforsch 42C: 1146-1148

Freeling M (2009) Bias in plant gene content following different sorts of duplication: Tandem, whole-genome, segmental, or by transposition. Annu Rev Plant Biol 60: 433–453

Freyre R, Skroch PW, Geffroy V, Adam-Blondon AF, Shirmohamadali A, Johnson WC, Llaca V, Nodari RO, Pereira PA, Tsai SM (1998) Towards an integrated linkage map of common bean. 4. Development of a core linkage map and alignment of RFLP maps. TAG Theoretical and Applied Genetics 97(5/6): 847-856

Gale MD, Devos KM (1998) Comparative genetics in the grasses. Proc Natl Acad Sci 95: 1971-1974

Galeano CH, Fernandez AC, Franco-Herrera N, Cichy KA, McClean PE, Vanderleyden J, Blair MW (2011) Saturation of an intra-gene pool linkage map: towards a unified consensus linkage map for fine mapping and synteny analysis in common bean. PLoS ONE 6: e28135

Galeano CH, Fernandez AC, Gomez M, Blair MW (2009) Single strand conformation polymorphism based SNP and indel markers for genetic mapping and synteny analysis of common bean (Phaseolus vulgaris L.). BMC Genomics 10: 629

Galego L, Almeida J (2007) Molecular genetic basis of flower colour variegation in Linaria. Genet Res 89: 129-134

Gepts P (1988) Phaseolin as an evolutionary marker. In: P. Gepts (ed) Genetic Resources of Phaseolus Beans. Kluwer Academic Publishers, Dordrecht, Holland pp 215-241

Gepts P, Aragão FJL, de Barros E, Blair M, Brondani R, Broughton W, Galasso I, Hernández G, Kami J, Lariguet P, McClean P, Melotto M, Miklas P, Pauls P, Pedrosa-Harand A, Porch T, Sánchez F, Sparvoli F, Yu K (2008) Genomics of Phaseolus Beans, a Major Source of Dietary Protein and Micronutrients in the Tropics. In: Moore PH and Ming R (eds) Genomics of tropical crop plants. Springer, New York, N.Y. pp 113-143

Gepts P, Bliss FA (1985) F1 hybrid weakness in the common bean. J Hered 76: 447-450

Gill N, Findley S, Walling JG, Hans C, Ma J, Doyle J, Stacey G, Jackson SA (2009) Molecular and chromosomal evidence for allopolyploidy in soybean. Plant Physiology 151: 1167–1174

Goff SA, Cone KC, Chandler VL (1992) Functional analysis of the transcriptional activator encoded by the maize B gene: evidence for a direct functional interaction between two classes of regulatory proteins. Genes Dev 6: 864-875

Gollop R, Even S, Colova-Tsolova V, Perl A (2002) Expression of the grape dihydroflavonol reductase gene and analysis of its promoter region. Journal of Experimental Botany 53: 1397-1409

Gonzalez A, Mendenhall J, Huo Y, Lloyd A (2009) TTG1 complex MYBs, MYB5 and TT2, control outer seed coat differentiation. Dev Biol 325: 412-421

Gonzalez A, Zhao M, Leavitt JM, Lloyd AM (2008) Regulation of the anthocyanin biosynthetic pathway by the TTG1/bHLH/Myb transcriptional complex in Arabidopsis seedlings. Plant J 53: 814-827

Green PJ, Yong MH, Cuozzo M, Kano-Murakami Y, Silverstein P, Chua NH (1988) Binding site requirements for pea nuclear protein factor GT-1 correlate with sequences required for light-dependent transcriptional activation of the rbcS-3A gene. EMBO J 7: 4035-4044

Grotewold E, Athma P, Peterson T (1991) Alternatively spliced products of the maize P gene encode proteins with homology to the DNA-binding domain of myb-like transcription factors. Proc Natl Acad Sci USA 88: 4587-4591

Hagerman AE, Riedl KM, Jones GA, Sovik KN, Ritchard NT, Hartzfeld PW, Riechel TL (1998) High molecular weight plant polyphenolics (tannins) as biological antioxidants. J Agric Food Chem 46: 1887-1892

Hahlbrock K, Scheel D (1989) Physiology and molecular biology of phenylpropanoid metabolism. Annu Rev Plant Physiol Plant Mol Biol 40: 347-369

Hamberger B, Ellis M, Friedmann M, de Azevedo Sousa C, Barbazuk B, Douglas C (2007) Genome-wide analyses of phenylpropanoid-related genes in Populus trichocarpa, Arabidopsis thaliana, and Oryza sativa: the Populus lignin toolbox and conservation and diversification of angiosperm gene families. Can J Bot 85: 1182–1201

Hanson KR, Havir EA (1970) L-Phenylalanine ammonia-lyase: IV. Evidence that the prosthetic group contains a dehydroalanyl residue and mechanism of action. Arch Biochem Biophys 141: 1-17

Harris HB, Burns RE (1973) Relationship between tannin content of sorghum grain and preharvest seed molding. Agron J 65: 957-959

Harrison MJ, Choudhary AD, Dubery I, Lamb CJ, Dixon RA (1991) Stress responses in alfalfa (Medicago sativa L.). 8. Cis-elements and trans-acting factors for the quantitative expression of a bean chalcone synthase gene promoter in electroporated alfalfa protoplasts. Plant Mol Biol 16: 877-890

Harrison MJ, Lawton MA, Lamb CJ, Dixon RA (1991) Characterization of a nuclear protein that binds to three elements within the silencer region of a bean chalcone synthase gene promoter. Proc Natl Acad Sci USA 88: 2515-2519

Helariutta Y, Elomaa P, Kotilainen M, Seppänen P, Teeri TH (1993) Cloning of cDNA coding for dihydroflavonol-4-reductase (DFR) and characterization of dfr expression in the corollas of Gerbera hybrida var. Regina (Compositae). Plant Molecular Biology 22: 183-193

Heller W, Forkmann G, Britsch L, Grisebach H (1985) Enzymatic reduction of (+)-dihydroflavonols to flavan-3,4-cis-diols with flower extracts from Matthiola incana and its role in anthocyanin biosynthesis. Planta 165: 284-287

Hernandez-Infante M, Herrador-Pena G, Sotelo-Lopez A (1979) Nutritive value of two different beans (Phaseolus vulgaris) supplemented with methionine. J Agric Food Chem 27: 965-968

Hertog MGL, Feskens EJM, Hollman PCH, Katan MB, Kromhout D (1993) Dietary antioxidant flavonoids and risk of coronary heart disease: the Zutphen Elderly Study. Lancet 342: 1007-1011

Hertog MGL, Hollman PCH, Katan MB (1992) Content of potentially anticarcinogenic flavonoids in 28 vegetables and 9 fruits commonly consumed in the Netherlands. J Agric Food Chem 40: 2379-2383

Himi E, Noda K (2004) Isolation and location of three homoeologous dihydroflavonol-4-reductase (DFR) genes of wheat and their tissue-dependent expression. J Exp Bot 55: 365-375

Hodgins DS (1968) The presence of a carbonyl group at the active site of L-phenylalanine ammonia-lyase. Biochem Biophys Res Commun 32: 246-253

Holton TA, Cornish EC (1995) Genetics and biochemistry of anthocyanin biosynthesis. Plant Cell 7: 1071-1083

Holton TA, Tanaka Y (1994) Blue roses - a pigment of our imagination? Trends Biotechnol 12: 40-42

Hosfield GL (2001) Seed coat colour in Phaseolus vulgaris L.: Its chemistry and associated health benefits. Annual Report of the Bean Improvement Cooperative 44:1–6

Hotter GS, Kooter J, Dubery IA, Lamb CJ, Dixon RA, Harrison MJ (1995) Cis elements and potential trans-acting factors for the developmental regulation of the Phaseolus vulgaris CHS15 promoter. Plant Mol Biol 28: 967-981

Hougaard BK, Madsen LH, Sandal N, Moretzsohn MC, Fredslund J (2008) Legume anchor markers link syntenic regions between Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics 179: 2299-2312

Hrazdina G, Zobel AM, Hoch HC (1987) Biochemical, immunological, and immunocytochemical evidence for the association of chalcone synthase with endoplasmic reticulum membranes. Proceedings of the National Academy of Sciences of the United States of America 84: 8966-8970

Hsieh L, Hsieh Y, Yeh C, Cheng C, Yang C, Lee P (2011) Molecular characterization of a phenylalanine ammonia-lyase gene (BoPAL1) from Bambusa oldhamii. Mol Biol Rep 38: 283-290

Huang J, Gu M, Lai Z, Fan B, Shi K, Zhou Y, Yu J, Chen Z (2010) Functional Analysis of the Arabidopsis PAL Gene Family in Plant Growth, Development, and Response to Environmental Stress. Plant Physiol 153: 1526-1538

Huang Y, Gou J, Jia Z, Yang L, Sun Y, Xiao X, Song F, Luo K (2012) Molecular Cloning and Characterization of Two Genes Encoding Dihydroflavonol-4-Reductase from Populus trichocarpa. PLoS ONE 7: e30364

Ibrahim RK, Varin L (1993) Flavonoid enzymology. Meth Plant Biochem 9: 99-131

Imin N, Nizamidin M, Wu T, Rolfe BG (2007) Factors involved in root formation in Medicago truncatula. J Exp Bot 58: 439-451

Inagaki Y, Johzuka-Hisatomi Y, Mori T, Takahashi S, Hayakawa Y, Peyachoknagul S, Ozeki Y, Iida S (1999) Genomic organization of the genes encoding dihydroflavonol 4-reductase for flower pigmentation in the Japanese and common morning glories. Gene 226: 181-188

Islam FMA, Basford KE, Jara C, Redden RJ, Beebe S (2002) Seed compositional and disease resistance differences among gene pools in cultivated common bean. Genetic Resources and Crop Evolution 49: 285-293

Islam FMA, Rengifo J, Redden RJ, Basford KE, Beebe SE (2003) Association between seed coat polyphenolics (tannins) and disease resistance in common bean. Plant Foods for Human Nutrition (Formerly Qualitas Plantarum) 58: 285-297

Jiang C, Gu J, Chopra S, Gu X, Peterson T (2004) Ordered origin of the typical two- and three-repeat Myb genes. Gene 326: 13-22

Jin H, Martin C (1999) Multifunctionality and diversity within the plant MYB-gene family. Plant Mol Biol 41: 577-585

Johnson ET, Ryu S, Yi H, Shin B, Cheong H, Choi G (2001) Alteration of a single amino acid changes the substrate specificity of dihydroflavonol 4-reductase. The Plant Journal 25: 325-333

Joos HJ, Hahlbrock K (1992) Phenylalanine ammonia-lyase in potato (Solanum tuberosum L.). European Journal of Biochemistry 204: 621-629

Jung S, Cestaro A, Troggio M, Main D, Zheng P, Cho I, Folta KM, Sosinski B, Abbott AG, Celton JM, Arús P, Shulaev V, Verde I, Morgante M, Rokhsar DS, Velasco R, Sargent DJ (2012) Whole genome comparisons of Fragaria, Prunus and Malus reveal different modes of evolution between Rosaceous subfamilies. BMC Genomics 13:129

Kallberg Y, Oppermann U, Jornvall H, Persson B (2002) Short-chain dehydrogenases/reductases (SDRs). European Journal of Biochemistry 269: 4409-4417

Kalo P, Seres A, Taylor SA, Jakab J, Kevei Z, Kereszt A, Endre G, Ellis TH, Kiss GB (2004) Comparative mapping between Medicago sativa and Pisum sativum. Mol Genet Genomics 272: 235-246

Kami J, Poncet V, Geffroy V, Gepts P (2006) Development of four phylogenetically-arrayed BAC libraries and sequence of the APA locus in Phaseolus vulgaris. TAG Theoretical and Applied Genetics 112: 987-998

Kao Y, Harding SA, Tsai C (2002) Differential expression of two distinct phenylalanine ammonia-lyase genes in condensed tannin-accumulating and lignifying cells of quaking aspen. Plant Physiol 130: 796-807

Kawamura M, Ito S, Nakamichi N, Yamashino T, Mizuno T (2008) The function of the clock-associated transcriptional regulator CCA1 (CIRCADIAN CLOCK-ASSOCIATED 1) in Arabidopsis thaliana. Biosci Biotechnol Biochem 72: 1307-1316

Kelley LA, Sternberg MJE (2009) Protein structure prediction on the web: a case study using the Phyre server. Nature Protocols 4: 363-371

Kelly JD, Gepts P, Miklas PN, Coyne DP (2003) Tagging and mapping of genes and QTL and molecular marker-assisted selection for traits of economic importance in bean and cowpea. Field Crops Res 82: 135-154

Khalil AH, El-Adawy T (1994) Isolation, identification and toxicity of saponin from different legumes. Food Chem 50: 197-201

Kim S, Baek D, Cho D, Lee E, Yoon M (2009) Identification of two novel inactive DFR-A alleles responsible for failure to produce anthocyanin and development of a simple PCR-based molecular marker for bulb colour selection in onion (Allium cepa L.). TAG Theoretical and Applied Genetics 118: 1391-1399

Kobayashi S, Goto-Yamamoto N, Hirochika H (2004) Retrotransposon-induced mutations in grape skin colour. Science 304: 982

Koenig R, Gepts P (1989) Segregation and Linkage of Genes for Seed Proteins, Isozymes, and Morphological Traits in Common Bean (Phaseolus vulgaris). Journal of Heredity 80: 455-456

Koes R, Verweij W, Quattrocchio F (2005) Flavonoids: a colourful model for the regulation and evolution of biochemical pathways. Trends Plant Sci 10: 236-242

Koes RE, Quattrocchio F, Mol JNM (1994) The flavonoid biosynthetic pathway in plants: Function and evolution. Bioessays 16: 123-132

Koes RE, Spelt CE, van den Elzen PJM, Mol JNM (1989) Cloning and molecular characterization of the chalcone synthase multigene family of Petunia hybrida. Gene 81: 245-257

Koukol J, Conn EE (1961) Metabolism of aromatic compounds in higher plants. IV. Purification and properties of phenylalanine deaminase of Hordeum vulgare. J Biol Chem 236: 2692-2698

Kranz HD, Denekamp M, Greco R, Jin H, Leyva A, Meissner RC, Petroni K, Urzainqui A, Bevan M, Martin C, Smeekens S, Tonelli C, Paz-Ares J, Weisshaar B (1998) Towards functional characterisation of the members of the R2R3-MYB gene family from Arabidopsis thaliana. The Plant Journal 16: 263-276

Kumar A, Ellis BE (2001) The phenylalanine ammonia-lyase gene family in raspberry. Structure, expression, and evolution. Plant Physiol 127: 230-239

Kwak M, Velasco DM, Gepts P (2008) Mapping homologous sequences for determinacy and photoperiod sensitivity in common bean (Phaseolus vulgaris). J Hered 99:283-291

Kyle M, Dickson M (1988) Linkage of hypersensitivity to five viruses with the B locus in Phaseolus vulgaris L. J Hered 79: 308-311

Lamprecht H (1939) Zur Genetik von Phaseolus vulgaris. XIV. Über die Wirkung der Gene P, C, J, Ins, Can, G, B, V, Vir, Och und Flav. Hereditas 25: 255-288

Lamprecht H (1961) Weitere Koppelungsstudien an Phaseolus vulgaris mit einer Übersicht über die Koppelungsgruppen. Agri Hort Genet 19: 319-332

Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948

Larsen J (2005) Discovery and utilization of molecular markers for genetic studies of common bacterial blight resistance and seed coat colour in P. vulgaris L. M.Sc. thesis, University of Guelph, Guelph, ON.

Lavin M, Herendeen PS, Wojciechowski MF (2005) Evolutionary rate analysis of Leguminosae implicates a rapid diversification of lineages during the Tertiary. Syst Biol 54: 575-594

Lawton MA, Dean SM, Dron M, Kooter JM, Kragh KM, Harrison MJ, Yu L, Tanguay L, Dixon RA, Lamb CJ (1991) Silencer region of a chalcone synthase promoter contains multiple binding sites for a factor, SBF-1, closely related to GT-1. Plant Mol Biol 16: 235-249

Leakey CLA (1988) Genotypic and phenotypic markers in common bean. In: P. Gepts (ed) Genetic Resources of Phaseolus Beans. Kluwer Academic Publishers, pp 245-327

Lee SW, Robb J, Nazar RN (1992) Truncated phenylalanine ammonia-lyase expression in tomato (Lycopersicon esculentum). J Biol Chem 267: 11824-11830

Levy HL (1999) Phenylketonuria: Old disease, new approach to treatment. Proc Natl Acad Sci 96: 1811-1813

Lewis G, Schrire B, Mackinder B, Lock M (2005) Legumes of the World. Royal Botanic Gardens, Kew, UK

Leyva A, Liang X, Pintor-Toro JA, Dixon RA, Lamb CJ (1992) Cis-element combinations determine phenylalanine ammonia-lyase gene tissue-specific expression patterns. Plant Cell 4: 263-271

Li J, Ou-Lee TM, Raba R, Amundson RG, Last RL (1993) Arabidopsis flavonoid mutants are hypersensitive to UV-B irradiation. Plant Cell 5: 171-179

Li J, Yang X, Wang Y, Li X, Gao Z, Pei M, Chen Z, Qu LJ, Gu H (2006) Two groups of MYB transcription factors share a motif which enhances trans-activation activity. Biochem Biophys Res Commun 341: 1155-1163

Liener IE (1982) Toxic constituents in legumes. In: Arora S.K. (ed) Chemistry and biochemistry of legumes. Oxford and IBH, New Delhi, India pp 217

Lillo C, Lea US, Ruoff P (2008) Nutrient depletion as a key factor for manipulating gene expression and product formation in different branches of the flavonoid pathway. Plant Cell Environ 31: 582–601

Liu Z, Park B, Kanno A, Kameya T (2005) The novel use of a combination of sonication and vacuum infiltration in Agrobacterium-mediated transformation of kidney bean (Phaseolus vulgaris L.) with lea gene. Mol Breed 16: 189-197

Lo Piero AR, Puglisi I, Petrone G (2006) Gene characterization, analysis of expression and in vitro synthesis of dihydroflavonol 4-reductase from [Citrus sinensis (L.) Osbeck]. Phytochemistry 67: 684-695

Loake GJ, Choudhary AD, Harrison MJ, Mavandad M, Lamb CJ, Dixon RA (1991) Phenylpropanoid pathway intermediates regulate transient expression of a chalcone synthase gene promoter. Plant Cell 3: 829-840

Loake GJ, Faktor O, Lamb CJ, Dixon RA (1992) Combination of H-box [CCTACC(N)7CT] and G-box (CACGTG) cis elements is necessary for feed-forward stimulation of a chalcone synthase promoter by the phenylpropanoid-pathway intermediate p-coumaric acid. Proc Natl Acad Sci U S A 89: 9230-9234

Lu CA, Ho TH, Ho SL, Yu SM (2002) Three novel MYB proteins with one DNA binding repeat mediate sugar and hormone regulation of alpha-amylase gene expression. Plant Cell 14: 1963-1980

Lu X, Zhou W, Gao F (2009) Cloning, characterization and localization of CHS gene from blood orange, Citrus sinensis (L.) Osbeck cv. Ruby. Mol Biol Rep 36: 1983-1990

Ma Y, Bliss FA (1978) Tannin content and inheritance in common bean. Crop Sci 18: 201-204

MacDonald MJ, D'Cunha GB (2007) A modern view of phenylalanine ammonia lyase. Biochemistry & Cell Biology 85: 273-282

Makkar HPS, Becker K, Abel H, Pawelzik E (1997) Nutrient contents, rumen protein degradability and antinutritional factors in some colour- and white-flowering cultivars of Vicia faba beans. J Sci Food Agric 75: 511-520

Maniatis T, Fritsch EF, Sambrook J (1989) Molecular cloning: a laboratory manual. Cold Spring Harbor Laboratory Press, New York

Mano H, Ogasawara F, Sato K, Higo H, Minobe Y (2007) Isolation of a regulatory gene of anthocyanin biosynthesis in tuberous roots of purple-fleshed sweet potato. Plant Physiol 143: 1252-1268

Martens S, Preuß A, Matern U (2010) Multifunctional flavonoid dioxygenases: Flavonol and anthocyanin biosynthesis in Arabidopsis thaliana L. Phytochemistry 71: 1040-1049

Martens S, Teeri T, Forkmann G (2002) Heterologous expression of dihydroflavonol 4-reductases from various plants. FEBS Lett 531: 453-458

Martin C, Carpenter R, Sommer H, Saedler H, Coen ES (1985) Molecular analysis of instability in flower pigmentation of Antirrhinum majus, following isolation of the pallida locus by transposon tagging. EMBO J 4: 1625-1630

Martin CR (1993) Structure, function, and regulation of the chalcone synthase. Int Rev Cytol 147: 233-284

Mathews H, Clendennen SK, Caldwell CG, Liu XL, Connors K, Matheis N, Schuster DK, Menasco DJ, Wagoner W, Lightner J, Wagner DR (2003) Activation tagging in tomato identifies a transcriptional regulator of anthocyanin biosynthesis, modification, and transport. Plant Cell 15: 1689-1703

Matousek J, Novak P, Bříza J, Patzak J, Niedermeierova H (2002) Cloning and characterisation of chs-specific DNA and cDNA sequences from hop (Humulus lupulus L.). Plant Science 162: 1007-1018

Matsumura H, Watanabe S, Harada K, Senda M, Akada S, Kawasaki S, Dubouzet EG, Minaka N, Takahashi R (2005) Molecular linkage mapping and phylogeny of the chalcone synthase multigene family in soybean. Theoretical and Applied Genetics 110: 1203-1209

Mavandad M, Edwards R, Liang X, Lamb CJ, Dixon RA (1990) Effects of trans-cinnamic acid on expression of the bean phenylalanine ammonia-lyase gene family. Plant Physiol 94(2):671-680

McClean PE, Cannon S, Gepts P, Hudson M, Jackson S, Rokhsar D, Schmutz J, Vance C Towards a whole genome sequence of common bean (Phaseolus vulgaris): Background, Approaches, Applications. Executive Summary http://www.csrees.usda.gov/business/reporting/stakeholder/pdfs/pl_common_bean.pdf

McClean PE, Lee RK, Miklas PN (2004) Sequence diversity analysis of dihydroflavonol 4-reductase intron 1 in common bean. Genome 47: 266-280

McClean PE, Lee RK, Otto C, Gepts P, Bassett MJ (2002) Molecular and Phenotypic Mapping of Genes Controlling Seed Coat Pattern and Color in Common Bean (Phaseolus vulgaris L.). J Hered 93: 148-152

McClean PE, Mamidi S, McConnell M, Chikara S, Lee R (2010) Synteny mapping between common bean and soybean reveals extensive blocks of shared loci. BMC Genomics 11: 184

McConnell M, Mamidi S, Lee R, Chikara S, Rossi M, Papa R, McClean P (2010) Syntenic relationships among legumes revealed using a gene-based genetic linkage map of common bean (Phaseolus vulgaris L.). Theor Appl Genet 121: 1103-1116

Melotto M, Coelho M, Pedrosa-Harand A, Kelly J, Camargo L (2004) The anthracnose resistance locus Co-4 of common bean is located on chromosome 3 and contains putative disease resistance-related genes. TAG Theoretical and Applied Genetics 109: 690-699

Mensack M, Fitzgerald V, Ryan E, Lewis M, Thompson H, Brick M (2010) Evaluation of diversity among common beans (Phaseolus vulgaris L.) from two centers of domestication using 'omics' technologies. BMC Genomics 11:686

Mercado-Ruaro P, Kenton AY (1993) http://data.kew.org/cvalues/homepage.html.

Meyer P, Heidmann I, Forkmann G, Saedler H (1987) A new petunia flower colour generated by transformation of a mutant with a maize gene. Nature 330: 677-678

Michaels TE, Smith TW, Larsen J, Beattie AD, Pauls KP (2006) OAC Rex common bean. Can J Plant Sci 86: 733–736

Mo Y, Nagel C, Taylor LP (1992) Biochemical complementation of chalcone synthase mutants defines a role for flavonols in functional pollen. Proc Natl Acad Sci USA 89: 7213-7217

Mol J, Grotewold E, Koes R (1998) How genes paint flowers and seeds. Trends Plant Sci 3: 212-217

Mudge J, Cannon SB, Kalo P, Oldroyd GE, Roe BA, Town CD, Young ND (2005) Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana. BMC Plant Biol 5: 15

Murray J, Larsen J, Michaels TE, Schaafsma A, Vallejos CE, Pauls KP (2002) Identification of putative genes in bean (Phaseolus vulgaris) genomic (Bng) RFLP clones and their conversion to STSs. Genome 45(6): 1013-1024

Nakatsuka T, Abe Y, Kakizaki Y, Yamamura S, Nishihara M (2007) Production of red-flowered plants by genetic engineering of multiple flavonoid biosynthetic genes. Plant Cell Reports 26: 1951-1959

Nakatsuka T, Haruta KS, Pitaksutheepong C, Abe Y, Kakizaki Y, Yamamoto K, Shimada N, Yamamura S, Nishihara M (2008) Identification and characterization of R2R3-MYB and bHLH transcription factors regulating anthocyanin biosynthesis in gentian flowers. Plant Cell Physiol 49: 1818-1829

Nakayama T, Sato T, Fukui Y, Yonekura-Sakakibara K, Hayashi H, Tanaka Y, Kusumi T, Nishino T (2001) Specificity analysis and mechanism of aurone synthesis catalyzed by aureusidin synthase, a polyphenol oxidase homolog responsible for flower colouration. FEBS Letters 499(1): 107-111

Nei M, Rooney AP (2005) Concerted and birth-and-death evolution of multigene families. Annu Rev Genet 39: 121-152

Niu S, Xu C, Zhang W, Zhang B, Li X, Lin-Wang K, Ferguson I, Allan A, Chen K (2010) Coordinated regulation of anthocyanin biosynthesis in Chinese bayberry (Myrica rubra) fruit by a R2R3 MYB transcription factor. Planta 231: 887-899

Nodari RO, Koinange EMK, Kelly JD, Gepts P (1992) Towards an integrated linkage map of common bean. TAG Theoretical and Applied Genetics 84: 186-192

Nodari RO, Tsai SM, Gilbertson RL, Gepts P (1993) Towards an integrated linkage map of common bean 2. Development of an RFLP-based linkage map. Theoretical and Applied Genetics 85: 513-520

Nugroho LH, Verberne MC, Verpoorte R (2002) Activities of enzymes involved in the phenylpropanoid pathway in constitutively salicylic acid-producing tobacco plants. Plant Physiology and Biochemistry 40: 755-760

Ohl S, Hedrick SA, Chory J, Lamb CJ (1990) Functional Properties of a Phenylalanine Ammonia-Lyase Promoter from Arabidopsis. Plant Cell 2: 837-848

Olsen KM, Lea US, Slimestad R, Verheul M, Lillo C (2008) Differential expression of four Arabidopsis PAL genes; PAL1 and PAL2 have functional specialization in abiotic environmental-triggered flavonoid synthesis. J Plant Physiol 165: 1491-1499

Onyilagha JC, Lazorko J, Gruber MY, Soroka JJ, Erlandson MA (2004) Effect of flavonoids on feeding preference and development of the crucifer pest Mamestra configurata Walker. J Chem Ecol 30(1): 109-124

O'Reilly C, Shepherd NS, Pereira A, Schwarz-Sommer Z, Bertram I, Robertson DS (1985) Molecular cloning of the a1 locus of Zea mays using the transposable elements En and Mu1. EMBO J 4: 877-882

Orndorff SA, Costantino N, Stewart D, Durham DR (1988) Strain improvement of Rhodotorula graminis for production of a novel l-phenylalanine ammonia-lyase. Appl Environ Microbiol 54: 996-1002

Oufedjikh H, Mahrouz M, Amiot MJ, Lacroix M (2000) Effect of γ-irradiation on phenolic compounds and phenylalanine ammonia-lyase activity during storage in relation to peel injury from peel of Citrus clementina Hort. Ex. Tanaka. J Agric Food Chem 48: 559-565

Owens DK, Crosby KC, Runac J, Howard BA, Winkel BSJ (2008) Biochemical and genetic characterization of Arabidopsis flavanone 3[beta]-hydroxylase. Plant Physiology and Biochemistry 46: 833-843

Palapol Y, Ketsa S, Lin-Wang K, Ferguson I, Allan A (2009) A MYB transcription factor regulates anthocyanin biosynthesis in mangosteen (Garcinia mangostana L.) fruit during ripening. Planta 229: 1323-1334

Palmer RG, Pfeiffer TW, Buss GR, Kilen TC (2004) Qualitative genetics, pp. 137–234 in Soybeans: Improvement, Production, and Uses, edited by J. E. Specht and H. R. Boerma. American Society of Agronomy, Madison, WI.

Paré PW, Dmitrieva N, Mabry TJ (1991) Phytoalexin aurone induced in Cephalocereus senilis liquid suspension-culture. Phytochemistry 30: 1133-1135

Park M, Jo SH, Kwon JK, Park J, Ahn JH, Kim S, Lee YH, Yang TJ, Hur CG, Kang BC, Kim SD, Choi D (2011) Comparative analysis of pepper and tomato reveals euchromatin expansion of pepper genome caused by differential accumulation of Ty3/Gypsy-like elements. BMC Genomics 12: 85

Pedrosa-Harand A, Porch TG, Gepts P (2008) Standard nomenclature for common bean chromosomes and linkage groups. Bean Improvement Cooperative Annual Report 51: 106-107

Peer WA, Murphy AS (2007) Flavonoids and auxin transport: modulators or regulators? Trends Plant Sci 12: 556-563

Petit P, Granier T, d'Estaintot BL, Manigand C, Bathany K, Schmitter JM, Lauvergeat V, Hamdi S, Gallois B (2007) Crystal structure of grape dihydroflavonol 4-reductase, a key enzyme in flavonoid biosynthesis. J Mol Biol 368: 1345-1357

Petroni K, Falasca G, Calvenzani V, Allegra D, Stolfi C, Fabrizi L, Altamura MM, Tonelli C (2008) The AtMYB11 gene from Arabidopsis is expressed in meristematic cells and modulates growth in planta and organogenesis in vitro. J Exp Bot 59: 1201-1213

Polashock J, Griesbach R, Sullivan R, Vorsa N (2002) Cloning of a cDNA encoding the cranberry dihydroflavonol-4-reductase (DFR) and expression in transgenic tobacco. Plant Science 163: 241-251

Polya GM (2003) Biochemical targets of plant bioactive compounds: A pharmacological reference guide to sites of action and biological effects. Taylor & Francis, New York

Prakken R (1970) Inheritance of colour in Phaseolus vulgaris L. II. A critical review. Meded Landbouwhogeschool Wageningen 23: 1-38

Prakken R (1972) Inheritance of colours in Phaseolus vulgaris L. III. On genes for red seed coat colour and a general synthesis. Meded Landbouwhogeschool Wageningen 29: 1-82

Price KR, Eagles J, Fenwick GR (1988) Saponin composition of 13 varieties of legume seed using fast atom bombardment mass spectrometry. J Sci Food Agric 42: 183-193

Quattrocchio F, Wing J, Woude K, Souer E, de Vetten N, Mol J, Koes R (1999) Molecular analysis of the anthocyanin2 gene of petunia and its role in the evolution of flower colour. Plant Cell 11: 1433-1444

Ozeki Y, Matsui K, Sakuta M, Matsuoka M, Ohashi Y, Kano-Murakami Y, Yamamoto N, Tanaka Y (1990) Differential regulation of phenylalanine ammonia-lyase genes during anthocyanin synthesis and by transfer effect in carrot cell suspension cultures. Physiol Plantarum 80: 379-387

Rao S, Rossmann M (1973) Comparison of super-secondary structures in proteins. J Mol Biol 76 (2): 241-56

Reddy AR, Britsch L, Salamini F, Saedler H, Rohde W (1987) The A1 (Anthocyanin-1) locus in Zea mays encodes dihydroquercetin reductase. Plant Sci 52: 7-13

Reyes-Moreno C, Paredes-López O (1993) Hard-to-cook phenomenon in common beans--a review. Crit Rev Food Sci Nutr 33: 227-286

Ribeiro T, dos Santos K, Fonsêca A, Pedrosa-Harand A (2011) Isolation and characterization of a new repetitive DNA family recently amplified in the Mesoamerican gene pool of the common bean (Phaseolus vulgaris L., Fabaceae). Genetica 139: 1135-1142

Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290: 2105-2110

Riov J, Monselise SP, Kahan RS (1968) Effect of gamma radiation on phenylalanine ammonialyase activity and accumulation of phenolic compounds in citrus fruit peel. Radiation Botany 8: 463-466

Ritter H, Schulz GE (2004) Structural basis for the entrance into the phenylpropanoid metabolism catalyzed by phenylalanine ammonia-lyase. Plant Cell 16: 3426-3436

Robbins MP, Bavage AD, Strudwicke C, Morris P (1998) Genetic Manipulation of Condensed Tannins in Higher Plants. II. Analysis of Birdsfoot Trefoil Plants Harboring Antisense Dihydroflavonol Reductase Constructs. Plant Physiol 116: 1133-1144

Romani A, Vignolini P, Galardi C, Mulinacci N, Benedettelli S, Heimler D (2004) Germplasm characterization of zolfino landraces (Phaseolus vulgaris L.) by flavonoid content. J Agric Food Chem 52: 3838-3842

Romero I, Fuertes A, Benito MJ, Malpica JM, Leyva A, Paz-Ares J (1998) More than 80 R2R3-MYB regulatory genes in the genome of Arabidopsis thaliana. Plant Journal 14(3): 273-284

Rosinski JA, Atchley WR (1998) Molecular evolution of the Myb family of transcription factors: evidence for polyphyletic origin. J Mol Evol 46: 74-83

Rosler J, Krekel F, Amrhein N, Schmid J (1997) Maize phenylalanine ammonia-lyase has tyrosine ammonia-lyase activity. Plant Physiol 113: 175-179

Rother D, Poppe L, Viergutz S, Langer B, Retey J (2001) Characterization of the active site of histidine ammonia-lyase from Pseudomonas putida. European Journal of Biochemistry 268: 6011-6019

Ryder TB, Hedrick SA, Bell JN, Liang X, Clouse SD, Lamb CJ (1987) Organization and differential activation of a gene family encoding the plant defense enzyme chalcone synthase in Phaseolus vulgaris. Molecular and General Genetics MGG 210: 219-233

Safos S (1995) Enzyme replacement therapy in ENU2 phenylketonuric mice using oral microencapsulated phenylalanine ammonia-lyase: a preliminary report. Artif Cells Blood Substit Immobil Biotechnol 23: 681-692

Sainz MB, Grotewold E, Chandler VL (1997) Evidence for direct activation of an anthocyanin promoter by the maize C1 protein and comparison of DNA binding by related Myb domain proteins. Plant Cell 9: 611-625

Saitoh K, Onishi K, Mikami I, Thidar K, Sano Y (2004) Allelic diversification at the C (OsC1) locus of wild and cultivated rice: nucleotide changes associated with phenotypes. Genetics 168: 997-1007

Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4(4):406–425

SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL (1998) The paleontology of intergene retrotransposons of maize. Nature Genet. 20: 43–45

Sarma AD, Sharma R (1999) Purification and characterization of UV-B induced phenylalanine ammonia-lyase from rice seedlings. Phytochemistry 50: 729-737

SAS Institute. 2008. SAS Version 9.1.3. SAS Inst., Cary, NC.

Sato S, Nakamura Y, Asamizu E, Isobe S, Tabata S (2007) Genome Sequencing and Genome Resources in Model Legumes. Plant Physiol 144: 588-593

Scalbert A, Williamson G (2000) Dietary intake and bioavailability of polyphenols. J Nutr 130: 2073-2085

Schlueter J, Goicoechea J, Collura K, Gill N, Lin J, Yu Y, Kudrna D, Zuccolo A, Vallejos C, Munoz-Torres M, Blair M, Tohme J, Tomkins J, McClean P, Wing R, Jackson S (2008) BAC-end sequence analysis and a draft physical map of the common bean (Phaseolus vulgaris L.) genome. Tropical Plant Biology 1: 40-48

Schlueter J, Lin J, Schlueter S, Vasylenko-Sanders I, Deshpande S, Yi J, O'Bleness M, Roe B, Nelson R, Scheffler B, Jackson S, Shoemaker R (2007) Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing. BMC Genomics 8: 330

Schmid J, Doerner PW, Clouse SD, Dixon RA, Lamb CJ (1990) Developmental and environmental regulation of a bean chalcone synthase promoter in transgenic tobacco. Plant Cell 2: 619-631

Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J (2010) Genome sequence of the paleopolyploid soybean. Nature 463: 178-183

Schuster B, Rétey J (1994) Serine-202 is the putative precursor of the active site dehydroalanine of phenylalanine ammonia lyase: site-directed mutagenesis studies on the enzyme from parsley (Petroselinum crispum L.). FEBS Lett 349: 252-254

Schwede TF, Retey J, Schulz GE (1999) Crystal Structure of Histidine Ammonia-Lyase Revealing a Novel Polypeptide Modification as the Catalytic Electrophile. Biochemistry (NY) 38: 5355-5361

Schwinn K, Venail J, Shang Y, Mackay S, Alm V, Butelli E, Oyama R, Bailey P, Davies K, Martin C (2006) A small family of MYB-regulatory genes controls floral pigmentation intensity and patterning in the genus Antirrhinum. Plant Cell 18: 831-851

Senda M, Kasai A, Yumoto S, Akada S, Ishikawa R, Harada T, Niizeki M (2002) Sequence divergence at chalcone synthase gene in pigmented seed coat soybean mutants of the Inhibitor locus. Genes Genet Syst 77: 341-350

Seshime Y, Juvvadi PR, Fujii I, Kitamoto K (2005) Discovery of a novel superfamily of type III polyketide synthases in Aspergillus oryzae. Biochem Biophys Res Commun 331: 253-260

Shelton D, Stranne M, Mikkelsen M, Pakseresht N, Welham T, Hiraka H, Tabata S, Sato S, Paquette S, Wang T, Martin C, Bailey P (2012) Transcription factors of Lotus japonicus: regulation of isoflavonoid biosynthesis requires co-ordinated changes in transcription factor activity. Plant Physiology 159: 531–547

Shimada N, Sasaki R, Sato S, Kaneko T, Tabata S, Aoki T, Ayabe S (2005) A comprehensive analysis of six dihydroflavonol 4-reductases encoded by a gene cluster of the Lotus japonicus genome. J Exp Bot 56: 2573-2585

Shimizu T, Fujibe R, Senda M, Ishikawa R, Harada T, Niizeki M, Akada S (2000) Molecular Cloning and Characterization of a Subfamily of UV-B Responsive MYB genes from Soybean. Breed Sci 50: 81-90

Shin R, Burch A, Huppert K, Tiwari S, Murphy A, Guilfoyle T, Schachtman D (2007) The Arabidopsis transcription factor MYB77 modulates auxin signal transduction. Plant Cell 19: 2440-2453

Shoemaker RC, Schlueter J, Doyle JJ (2006) Paleopolyploidy and gene duplication in soybean and other legumes. Curr Opin Plant Biol 9: 104-109

Sikora LA, Marzluf GA (1982) Regulation of L-phenylalanine ammonia-lyase by L-phenylalanine and nitrogen in Neurospora crassa. J Bacteriol 150: 1287-1291

Singh RP, Gu M, Agarwal R (2008) Silibinin inhibits colorectal cancer growth by inhibiting tumor cell proliferation and angiogenesis. Cancer Res 68: 2043-2050

Singh SP, Gepts P, Debouck DG (1991) Races of common bean (Phaseolus vulgaris, Fabaceae). Economic botany 45: 379-396

Sommer H, Saedler H (1986) Structure of the chalcone synthase gene of Antirrhinum majus. Molecular and General Genetics MGG 202: 429-434

Sonderby IE, Hansen BG, Bjarnholt N, Ticconi C, Halkier BA, Kliebenstein DJ (2007) A systems biology approach identifies a R2R3 MYB gene subfamily with distinct and overlapping functions in regulation of aliphatic glucosinolates. PLoS ONE 2: e1322

Sormacheva ID, Blinov AG (2011) LTR Retrotransposons in Plants. Russian Journal of Genetics: Applied Research 1(6): 540–564

Springob K, Nakajima J, Yamazaki M, Saito K (2003) Recent advances in the biosynthesis and accumulation of anthocyanins. Nat Prod Rep 20: 288-303

Stafford HA (1991) Flavonoid Evolution: An Enzymic Approach. Plant Physiol 96: 680-685

Stam P, Van Ooijen JW (1995) JoinMap Version 2.0: Software for the Calculation of Genetic Linkage Maps. Wageningen, The Netherlands.

Statler GD (1970) Resistance of bean plants to Fusarium solani f. phaseoli. Plant Dis Rep 54: 698-699

Stechmann A (2004) Genome evolution: the dynamics of static genomes. Curr Biol 14:473-474

Sterck L, Rombauts S, Vandepoele K, Rouze P, Van de Peer Y (2007) How many genes are there in plants (... and why are they there)? Curr Opin Plant Biol 10: 199-203

Stermer BA, Schmid J, Lamb CJ, Dixon RA (1990) Infection and stress activation of bean chalcone synthase promoters in transgenic tobacco. Molecular plant-microbe interactions 3: 381-388

Stracke R, Ishihara H, Huep G, Barsch A, Mehrtens F, Niehaus K, Weisshaar B (2007) Differential regulation of closely related R2R3-MYB transcription factors controls flavonol accumulation in different parts of the Arabidopsis thaliana seedling. Plant J 50: 660-677

Stracke R, Werber M, Weisshaar B (2001) The R2R3-MYB gene family in Arabidopsis thaliana. Curr Opin Plant Biol 4: 447-456

Sun Y, Tian Q, Yuan L, Jiang Y, Huang Y, Sun M, Tang S, Luo K (2011) Isolation and promoter analysis of a chalcone synthase gene PtrCHS4 from Populus trichocarpa. Plant Cell Rep 30: 1661-1671

Tabekhia MM, Luh BS (1980) Effect of germination, cooking and canning on phosphorus and phytate retention in dry beans. J Food Sci 45: 406-408

Takahashi H, Hayashi M, Goto F, Sato S, Soga T, Nishioka T, Tomita M, Kawai-Yamada M, Uchimiya H (2006) Evaluation of metabolic alteration in transgenic rice overexpressing dihydroflavonol-4-reductase. Annals of Botany 98: 819-825

Takeoka GR, Dao LT, Full GH, Wong RY, Harden LA, Edwards RH, Berrios JDJ (1997) Characterization of Black Bean (Phaseolus vulgaris L.) Anthocyanins. J Agric Food Chem 45: 3395-3400

Takos AM, Jaffe FW, Jacob SR, Bogs J, Robinson SP, Walker AR (2006) Light-induced expression of a MYB gene regulates anthocyanin biosynthesis in red apples. Plant Physiol 142: 1216-1232

Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24:1596-1599

Tanaka Y, Fukui Y, Fukuchi-Mizutani M, Holton TA, Higgins E, Kusumi T (1995) Molecular cloning and characterization of Rosa hybrida dihydroflavonol 4-reductase gene. Plant and Cell Physiology 36: 1023-1031

Tanaka Y, Tsuda S, Kusumi T (1998) Metabolic engineering to modify flower colour. Plant and Cell Physiology 39: 1119-1126

Tang H, Bowers EJ, Wang X, Ming R, Alam M, Paterson AH (2008) Synteny and collinearity in plant genomes. Science 320: 486-488

Taylor LP, Grotewold E (2005) Flavonoids as developmental regulators. Curr Opin Plant Biol 8: 317-323

Tena M, Lopez-Valbuena R, Jorrin J (1984) Induction of phenylalanine ammonia-lyase in hypocotyls of sunflower seedlings by light, excision and sucrose. Physiol Plantarum 60: 159-165

Thomas BC, Pedersen B, Freeling M (2006) Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res 16: 934–946

Tian L, Dixon RA (2006) Engineering isoflavone metabolism with an artificial bifunctional enzyme. Planta 224: 496-507

Tian ZX, Wang XB, Lee R, Li YH, Specht JE, Nelson RL, McClean PE, Qiu LJ, Ma JX (2010) Artificial selection for determinate growth habit in soybean. Proc Natl Acad Sci USA 107: 8563–8568

Todd JJ, Vodkin LO (1996) Duplications that suppress and deletions that restore expression from a chalcone synthase multigene family. Plant Cell 8: 687-699

Tohge T, Nishiyama Y, Hirai MY, Yano M, Nakajima J, Awazuhara M, Inoue E, Takahashi H, Goodenowe DB, Kitayama M, Noji M, Yamazaki M, Saito K (2005) Functional genomics by integrated analysis of metabolome and transcriptome of Arabidopsis plants over-expressing an MYB transcription factor. Plant J 42: 218-235

Trabelsi N, Petit P, Manigand C, Langlois d'Estaintot B, Granier T, Chaudière J, Gallois B (2008) Structural evidence for the inhibition of grape dihydroflavonol 4-reductase by flavonols. Acta Crystallographica Section D 883-891

Treutter D (2005) Significance of flavonoids in plant resistance and enhancement of their biosynthesis. Plant Biol 7: 581-591

Tuteja J, Zabala G, Varala K, Hudson M, Vodkin L (2009) Endogenous, tissue-specific short interfering RNAs silence the chalcone synthase gene family in G. max seed coats. Plant Cell 21: 3063-3077

Tuteja JH, Clough SJ, Chan W, Vodkin LO (2004) Tissue-specific gene silencing mediated by a naturally occurring chalcone synthase gene cluster in Glycine max. Plant Cell 16: 819-835

Tuteja JH, Vodkin LO (2008) Structural features of the endogenous CHS silencing and target loci in the soybean genome. Crop science 48: 49-68

Vailleau F, Daniel X, Tronchet M, Montillet JL, Triantaphylides C, Roby D (2002) A R2R3-MYB gene, AtMYB30, acts as a positive regulator of the hypersensitive cell death program in plants in response to pathogen attack. Proc Natl Acad Sci USA 99: 10179-10184

Vallejos CE, Chase C (1991) Extended map for the phaseolin linkage group of Phaseolus vulgaris L. Theoretical and Applied Genetics 82: 353-357

Vallejos CE, Astua-Monge G, Jones V, Plyler TR, Sakiyama NS, Mackenzie SA (2006) Genetic and molecular characterization of the I locus of Phaseolus vulgaris. Genetics 172: 1229-1242

Vallejos CE, Sakiyama NS, Chase CD (1992) A molecular marker-based linkage map of Phaseolus vulgaris L. Genetics 131: 733-740

Vallejos CE, Skroch PW, Nienhuis J (2001) Phaseolus vulgaris: The common bean, integration of RFLP and RAPD-based linkage maps, pp 300-317 in DNA-based Markers in Plants edited by Phillips RL and Vasil IK. Kluwer Academic Publishers, Dordrecht. 2nd edition.

Vinson JA, Hao Y, Su X, Zubik L (1998) Phenol antioxidant quantity and quality in foods: vegetables. J Agric Food Chem 46: 3630-3634

Vodkin L, Jones S, Gonzales OD, Thibaud-Nissen F, Tutega ZG (2008) Genomics of soybean seed development. In: Stacey G (ed) Genetics and Genomics of Soybean. Springer, New York pp 163-184

Walker AR, Lee E, Bogs J, McDavid DAJ, Thomas MR, Robinson SP (2007) White grapes arose through the mutation of two similar and adjacent regulatory genes. The Plant Journal 49: 772-785

Walling JG, Shoemaker R, Young N, Mudge J, Jackson S (2006) Chromosome-level homeology in paleopolyploid soybean (Glycine max) revealed through integration of genetic and chromosome maps. Genetics 172: 1893-1900

Wang S, Hubbard L, Chang Y, Guo J, Schiefelbein J, Chen JG (2008) Comprehensive analysis of single-repeat R3 MYB proteins in epidermal cell patterning and their transcriptional regulation in Arabidopsis. BMC Plant Biol 8: 81

Wanner LA, Li G, Ware D, Somssich IE, Davis KR (1995) The phenylalanine ammonia-lyase gene family in Arabidopsis thaliana. Plant Mol Biol 27: 327-338

Wasson AP, Pellerone FI, Mathesius U (2006) Silencing the flavonoid pathway in Medicago truncatula inhibits root nodule formation and prevents auxin transport regulation by rhizobia. Plant Cell 18: 1617-1629

Watanabe K, Praseuth AP, Wang CCC (2007) A comprehensive and engaging overview of the type III family of polyketide synthases. Curr Opin Chem Biol 11: 279-286

Watanabe SK, Hernandez-Velazco G, Iturbe-Chinas F, Lopez-Mungia A (1992) Phenylalanine ammonia lyase from Sporidiobolus pararoseus and Rhodosporidium toruloides: application for phenylalanine and tyrosine deamination. World J Microbiol Biotechnol 8: 406-410

Wawrzynski A, Ashfield T, Chen NW, Mammadov J, Nguyen A, Podicheti R, Cannon SB, Thareau V, Ameline-Torregrosa C, Cannon E, et al. (2008) Replication of nonautonomous retroelements in soybean appears to be both recent and common. Plant Physiol 148: 1760–1771

Wicker T, Sabot F, Hua-Van A (2007) A unified classification system for eukaryotic transposable elements. Nat Rev Genet 8: 973–982

Williams CA, Grayer RJ (2004) Anthocyanins and other flavonoids. Nat Prod Rep 21: 539-573

Williams RJ, Spencer JPE, Rice-Evans C (2004) Flavonoids: antioxidants or signalling molecules? Free Radical Biology and Medicine 36: 838-849

Winkel-Shirley B (1999) Evidence for enzyme complexes in the phenylpropanoid and flavonoid pathways. Physiologia Plantarum 107: 142-149

Winkel-Shirley B (2001) Flavonoid biosynthesis: a colorful model for genetics, biochemistry, cell biology and biotechnology. Plant Physiol 126:485–493

Winkel-Shirley B (2002) Biosynthesis of flavonoids and effects of stress. Curr Opin Plant Biol 5: 218-223

Woodhouse MR, Schnable JC, Pedersen BS, Lyons E, Lisch D, Subramaniam S, Freeling M (2010) Following tetraploidy in maize, a short deletion mechanism removed genes preferentially from one of the two homeologs. PLoS Biol 8: e1000409

Wu F, Tanksley SD (2010) Chromosomal evolution in the plant family Solanaceae. BMC Genomics 11: 182

Xie D, Jackson L, Cooper J, Ferreira D, Paiva N (2004) Molecular and Biochemical Analysis of Two cDNA Clones Encoding Dihydroflavonol-4-Reductase from Medicago truncatula. Plant Physiol 134: 979-994

Xu M, Brar HK, Grosic S, Palmer RG, Bhattacharyya MK (2010) Excision of an active CACTA-like transposable element from DFR2 causes variegated flowers in soybean [Glycine max (L.) Merr.]. Genetics 184: 53-63

Xue B, Charest PJ, Devantier Y, Rutledge RG (2003) Characterization of a MYBR2R3 gene from black spruce (Picea mariana) that shares functional conservation with maize C1. Mol Genet Genomics 270: 78-86

Yan HH, Mudge J, Kim DJ, Shoemaker RC, Cook DR, Young ND (2004) Comparative physical mapping reveals features of microsynteny between Glycine max, Medicago truncatula, and Arabidopsis thaliana. Genome 47: 141-155

Yang J, Gu H (2006) Duplication and divergent evolution of the CHS and CHS-like genes in the chalcone synthase (CHS) superfamily. Chinese Science Bulletin 51: 505-509

Yang J, Huang J, Gu H, Zhong Y, Yang Z (2002) Duplication and adaptive evolution of the chalcone synthase genes of Dendranthema (Asteraceae). Mol Biol Evol 19: 1752-1759

Yang K, Jeong N, Moon J, Lee Y, Lee S, Kim HM, Hwang CH, Back K, Palmer RG, Jeong S (2010) Genetic Analysis of Genes Controlling Natural Variation of Seed Coat and Flower Colors in Soybean. J Hered 101: 757-768

Yarnell SH (1965) Cytogenetics of the vegetable crops. IV. Legumes. Bot Rev 31: 247-330

Yi J, Derynck M, Chen L, Dhaubhadel S (2010) Differential expression of CHS7 and CHS8 genes in soybean. Planta 231: 741-753

Yoshida K, Iwasaka R, Shimada N, Ayabe S, Aoki T, Sakuta M (2010) Transcriptional control of the dihydroflavonol 4-reductase multigene family in Lotus japonicus. Journal of Plant Research 123: 801-805

Yu LM, Lamb CJ, Dixon RA (1993) Purification and biochemical characterization of proteins which bind to the H-box cis-element implicated in transcriptional activation of plant defense genes. The Plant Journal 3: 805-816

Yu ZH, Stall RE, Vallejos CE (1998) Detection of genes for resistance to common bacterial blight of beans. Crop science 38: 1290-1296

Zambre M, Goossens A, Cardona C, Van Montagu M, Terryn N, Angenon G (2005) A reproducible genetic transformation system for cultivated Phaseolus acutifolius (tepary bean) and its use to assess the role of arcelins in resistance to the Mexican bean weevil. TAG Theoretical and Applied Genetics 110: 914-924

Zhang C, Bradshaw JD, Whitham SA, Hill JH (2010) The development of an efficient multipurpose bean pod mottle virus viral vector set for foreign gene expression and RNA silencing. Plant Physiol 153: 52-65

Zhang J, Subramanian S, Stacey G, Yu O (2009) Flavones and flavonols play distinct critical roles during nodulation of Medicago truncatula by Sinorhizobium meliloti. The Plant J 57: 171-183

Zhang Y, Cheng S, De Jong D, Griffiths H, Halitschke R, De Jong W (2009) The potato R locus codes for dihydroflavonol 4-reductase. TAG Theoretical and Applied Genetics 119: 931-937

Zhu H, Choi H, Cook DR, Shoemaker RC (2005) Bridging model and crop legumes through comparative genomics. Plant Physiol 137: 1189-1196

Zhu J, Verslues PE, Zheng X, Lee BH, Zhan X, Manabe Y, Sokolchik I, Zhu Y, Dong CH, Zhu JK, Hasegawa PM, Bressan RA (2005) HOS10 encodes an R2R3-type MYB transcription factor essential for cold acclimation in plants. Proc Natl Acad Sci USA 102: 9966-9971

Zimmermann IM, Heim MA, Weisshaar B, Uhrig JF (2004) Comprehensive identification of Arabidopsis thaliana MYB transcription factors interacting with R/B-like BHLH proteins. Plant J 40: 22-34

APPENDICES

Appendix 1

Table 1. List of all markers tested, type, expected size, sequence, source and polymorphism between OAC Rex and SVM Taylor Hort.

| Locus | LG | Type | Size | Tm (°C) | Polymorphism (OAC Rex/SVM Taylor) | Forward PCR primer (5’ to 3’) | Reverse PCR primer (5’ to 3’) | Ref |
|---|---|---|---|---|---|---|---|---|
| BM185 | 7 | SSR | 105 | 52.0 | - | AAGGAGGTTTCTACCTAATTCC | AAAGCAGGGATGTAGTTGC | Gaitan-Solis et al. 2002 |
| BM160 | 7 | SSR | 211 | 52.0 | - | CGTGCTTGGCGAATAGCTTTG | CGCGGTTCTGATCGTGACTTC | Gaitan-Solis et al. 2002 |
| BM210 | 7 | SSR | 166 | 52.0 | - | ACCACTGCAATCCTCATCTTTG | CCCTCATCCTCCATTCTTATCG | Gaitan-Solis et al. 2002 |
| BM183 | 7 | SSR | 149 | 52.0 | - | CTCAAATCTATTCACTGGTCAGC | TCTTACAGCCTTGCAGACATC | Gaitan-Solis et al. 2002 |
| PVBR269 | 7 | SSR | 167 | 56.0 | - | TCGCCCCATATTCACTTTTC | TGGTGTGCAGAAAGTCTGTGA | Grisi et al., 2007 |
| BM201 | 7 | SSR | 102 | 50.0 | - | TGGTGCTACAGACTTGATGG | TGTCACCTCTCTCCTCCAAT | Gaitan-Solis et al. 2002 |
| PVBR35 | 7 | SSR | 214 | 56.0 | - | TCTACGCGTTCCCTCTGTCT | AGTGGATGTGTGGGAAAAGC | Grisi et al., 2007 |
| PVBR173 | 8 | SSR | 219 | 56.0 | - | TCGAGATGGATTGAAAACGA | CTCTCCCCGCAAAACACAC | Grisi et al., 2007 |
| BM211 | 8 | SSR | 186 | 52.0 | - | ATACCCACATGCACAAGTTTGG | CCACCATGTGCTCATGAAGAT | Gaitan-Solis et al. 2002 |
| PVBR45 | 8 | SSR | 155 | 56.0 | - | CGATTGAACGCACTCTACGA | GAGGCTGGTTCCTTCAAACA | Grisi et al., 2007 |
| g2562 | 1 | SNP | - | - | T/A | ACGTTGGATGATATTTGACGCCAAGGCAGG | ACGTTGGATGTGGTGTAGGACCATTACCTG | Shi et al., 2011 |
| g1404 | 1 | SNP | - | - | G/C | ACGTTGGATGTGGTGTGATGAGGAGGTATG | ACGTTGGATGGTGACACTAGCATAAAACTC | Shi et al., 2011 |
| g1886 | 1 | SNP | - | - | T/C | ACGTTGGATGCCGGGAACAGTATTTTGAGG | ACGTTGGATGGATAAACCTCCGACCTCTTC | Shi et al., 2011 |
| g724 | 1 | SNP | - | - | G/A | ACGTTGGATGACAGAGTGATCTGTGACCTG | ACGTTGGATGACCCCTTTTAGTCAATTCGC | Shi et al., 2011 |
| g1959 | 1 | SNP | - | - | G/C | ACGTTGGATGCAGTGCTAGCAATGATGCAG | ACGTTGGATGACGCAGGAAAGTTGGGTTTG | Shi et al., 2011 |
| g934 | 1 | SNP | - | - | G/C | ACGTTGGATGACCGTCCAACTAGAAACTCC | ACGTTGGATGAAGACCCTAAGCTGTTCGAG | Shi et al., 2011 |
| g1645 | 1 | SNP | - | - | C/T | ACGTTGGATGGATCCAATTCCAGAGACACC | ACGTTGGATGGAAGAAAGCTCGTAAAGCCC | Shi et al., 2011 |
| g1795 | 1 | SNP | - | - | C/G | ACGTTGGATGAGAAGGAGTATGTGGTGGAC | ACGTTGGATGATAATCTGCGTGAGCTCACC | Shi et al., 2011 |
| g774 | 2 | SNP | - | - | C/T | ACGTTGGATGAAGAAGATCGATCCGTGAGC | ACGTTGGATGATGACCAGAGGGATGAAACC | Shi et al., 2011 |
| g680 | 2 | SNP | - | - | C/T | ACGTTGGATGGCCACAGATCCTCAGAAATC | ACGTTGGATGGAAGAGAAACAAAAGTAGCAC | Shi et al., 2011 |
| g457 | 2 | SNP | - | - | G/A | ACGTTGGATGCCTCCCGGTTAGTTACATAC | ACGTTGGATGGTCTTGAACACTCAAACCTG | Shi et al., 2011 |
| g321 | 2 | SNP | - | - | C/T | ACGTTGGATGTGGTGACTCAGCTGAGGGA | ACGTTGGATGAATCCACCACCATCTTCACC | Shi et al., 2011 |
| g2581 | 2 | SNP | - | - | T/C | ACGTTGGATGAGTCACCTAAGCAACCTCTC | ACGTTGGATGAGAGACCTGTCCCATTGTTG | Shi et al., 2011 |
| g2020 | 2 | SNP | - | - | C/T | ACGTTGGATGTTGATTGGAGTAAGGCACCC | ACGTTGGATGATTAGAGGCACAGTTGGCAG | Shi et al., 2011 |
| g2540 | 2 | SNP | - | - | AT | ACGTTGGATGCTAGGCTAATGGGAACAGAC | ACGTTGGATGATCTACAACAGGGACCATGC | Shi et al., 2011 |
| g1296 | 3 | SNP | - | - | A/G | ACGTTGGATGGGGATGAGAATGGTAAAGCC | ACGTTGGATGAATTCTTGCAAGGCACACCC | Shi et al., 2011 |
| g1808 | 3 | SNP | - | - | C/T | ACGTTGGATGATGTCCTCTGCCACGTAAAC | ACGTTGGATGCCAAGAATCAACTGGCTGTG | Shi et al., 2011 |
| g2476 | 3 | SNP | - | - | C/G | ACGTTGGATGCACTTTCCTTGATGCTGCTC | ACGTTGGATGGCTTGAGTTCAGTCTCTTCC | Shi et al., 2011 |
| g1656 | 3 | SNP | - | - | G/A | ACGTTGGATGACCACTTCCCATGTGAAGTC | ACGTTGGATGCAACCTTGATCTGAAGAGGG | Shi et al., 2011 |
| g586 | 3 | SNP | - | - | T/C | ACGTTGGATGGGAAAAATCATGCACCTATC | ACGTTGGATGACCAGGTATATAACCGCATC | Shi et al., 2011 |
| g2108 | 3 | SNP | - | - | C/T | ACGTTGGATGAAATGTTCACGCCGAAGAGC | ACGTTGGATGTGGAAGGCGCGGAATAATAG | Shi et al., 2011 |
| g2274 | 3 | SNP | - | - | C/T | ACGTTGGATGCCCCACATGTTTGTGAATGC | ACGTTGGATGCCTTCAGATACTCCTTGACC | Shi et al., 2011 |
| g968 | 4 | SNP | - | - | A/C | ACGTTGGATGGAATTCGTGCATGCTAAACC | ACGTTGGATGTCTGCAACTTCCACTCTCTC | Shi et al., 2011 |
| g2595 | 4 | SNP | - | - | C/G | ACGTTGGATGTGGAGCATGCTAGCCTTTTG | ACGTTGGATGGGCATTACACACTCAAACAC | Shi et al., 2011 |
| g128 | 4 | SNP | - | - | G/C | ACGTTGGATGCCCCCTTCTCCATATAGTTC | ACGTTGGATGGGAAGACTTCAAATATGCTC | Shi et al., 2011 |
| g483 | 4 | SNP | - | - | G/A | ACGTTGGATGACCCAAATTCGCAGAAATCC | ACGTTGGATGCGGGTTTGAGAAGTTTAGGG | Shi et al., 2011 |
| g1375 | 4 | SNP | - | - | G/C | ACGTTGGATGTGAACCACTCCGATGCAATC | ACGTTGGATGCTACAAGAAGCCTTGGAGAG | Shi et al., 2011 |
| g2467 | 4 | SNP | - | - | T/C | ACGTTGGATGGATGCAGGCCAAAGTTAAGG | ACGTTGGATGTGAGAGATGGCTTGGTGAAC | Shi et al., 2011 |
| g1188 | 5 | SNP | - | - | A/T | ACGTTGGATGCTCCATGTTGGTCTATCTCC | ACGTTGGATGGATTGTGTGAGAGCAGAACC | Shi et al., 2011 |
| g1968 | 5 | SNP | - | - | G/T | ACGTTGGATGTAGTGCTAACTCTTGCTAGG | ACGTTGGATGGGGCTGCGAAGTGAAAAAAG | Shi et al., 2011 |
| g1333 | 5 | SNP | - | - | C/T | ACGTTGGATGGATGTGAAAGTGGAATAGGC | ACGTTGGATGGCTTTCATGTCTGCAAGGTC | Shi et al., 2011 |
| g1689 | 5 | SNP | - | - | G/C | ACGTTGGATGTGCTGTGATGTGTCATGGTC | ACGTTGGATGAGAAGCAGAAAAACACGTGG | Shi et al., 2011 |
| g1664 | 5 | SNP | - | - | C/G | ACGTTGGATGACACGTCTCAGGTTCCTAAG | ACGTTGGATGTGGGATAACGAACACTCAGC | Shi et al., 2011 |
| g1883 | 5 | SNP | - | - | T/A | ACGTTGGATGGTGGCAGGTAGTCAACTTTG | ACGTTGGATGGGTTTCCAGCGAAGGAATTG | Shi et al., 2011 |
| g1676 | 5 | SNP | - | - | A/T | ACGTTGGATGCCCTTGAATTTCCGTTCTAC | ACGTTGGATGGGCTTTGGCTTTTACCATTG | Shi et al., 2011 |
| g2410 | 5 | SNP | - | - | G/T | ACGTTGGATGGGTCTAACCTTTCAATCGTC | ACGTTGGATGAGACACGGTACGTATTTCCC | Shi et al., 2011 |
| g1757 | 6 | SNP | - | - | A/G | ACGTTGGATGTCCACAATGGCTCAATCTCC | ACGTTGGATGAGATGGAGCCGGAGACAATG | Shi et al., 2011 |
| g2208 | 6 | SNP | - | - | C/T | ACGTTGGATGGCAAAAATCATGCAGCAGCC | ACGTTGGATGAGGTGCAACTGCATCACAAC | Shi et al., 2011 |
| g1998 | 6 | SNP | - | - | C/T | ACGTTGGATGTGCCCACTGAAAAGATCGCC | ACGTTGGATGTACCCAACACAGAGACTAAC | Shi et al., 2011 |
| g471 | 6 | SNP | - | - | G/A | ACGTTGGATGTGAGGAAACTAGAGGTGTGC | ACGTTGGATGTGCAGTTACAGTCTTCCTCC | Shi et al., 2011 |
| g1436 | 6 | SNP | - | - | G/A | ACGTTGGATGGGGTTGCAAGGTTTCACTTA | ACGTTGGATGAATCAGAGCCATCACAACCC | Shi et al., 2011 |
| g2538 | 6 | SNP | - | - | A/G | ACGTTGGATGACGAGGAGGTTGATGAGATG | ACGTTGGATGTTACTCCTCACTTGGCCATC | Shi et al., 2011 |
| g1174 | 6 | SNP | - | - | AG | ACGTTGGATGCACTCGTCAAAGAAAAACCAG | ACGTTGGATGGATTGTGACTCACAAGGAGG | Shi et al., 2011 |
| g503 | 7 | SNP | - | - | G/C | ACGTTGGATGTTGCCAACTGGAAGATCTCG | ACGTTGGATGATGCTCAGCTGCAGAGCTTC | Shi et al., 2011 |
| g134 | 7 | SNP | - | - | G/C | ACGTTGGATGTTGCCAACTGGAAGATCTCG | ACGTTGGATGATTCTCAGCTGCAGAGCTTC | Shi et al., 2011 |
| g1615 | 7 | SNP | - | - | A/C | ACGTTGGATGACATGGCGGTGCTTTACTTG | ACGTTGGATGTTCCTCTTGCTGTCCTTCAC | Shi et al., 2011 |
| g2129 | 7 | SNP | - | - | G/A | ACGTTGGATGGGTGCTCCAAGAATGGTATG | ACGTTGGATGCACTCAATCCAAACCAAGCC | Shi et al., 2011 |

229

Table 1. Continued polymorphism Locus LG type size Tm(C) Forward PCR primer (5’— 3’) Reverse PCR primer (5’— 3’) Ref OAC Rex/SVM Taylor g2531 7 SNP - - A/C ACGTTGGATGTCTTTCTCGGTCCTGAATCC ACGTTGGATGAGGTGAGTGTAGTGTCTTTG Shi et al., 2011 g1065 7 SNP - - A/C ACGTTGGATGGCTGAGTCAACAAGTGCAAC ACGTTGGATGTTCTGGCAAACAACTACCCG Shi et al., 2011 g290 7 SNP - - T/C ACGTTGGATGGGACGTGAAAGATCACATTG ACGTTGGATGGGTGGTGCACACAATTATCC Shi et al., 2011 g2357 7 SNP - - AT ACGTTGGATGTTGCTTATTGCCTTCCTGCC ACGTTGGATGATGTGGGTTGCTGGGTTTAG Shi et al., 2011 g2311 8 SNP - - T/G ACGTTGGATGCTGTGACATGACAACTTCGG ACGTTGGATGGTCTTACAACTTCAGCCTGC Shi et al., 2011 g2393 8 SNP - - G/C ACGTTGGATGAGGGTGATGTGGACACAATG ACGTTGGATGGCATGTAAGGTGTTCATGGG Shi et al., 2011 g1119 8 SNP - - C/A ACGTTGGATGGGTCCATGTTGAGTGAAAGC ACGTTGGATGCCGAGAAGAACCATTCTGAG Shi et al., 2011 g696 8 SNP - - C/T ACGTTGGATGTCTTTTTGCTTCCGCGGATG ACGTTGGATGCTAAGATCCCCTTCGAGGAG Shi et al., 2011 g580 8 SNP - - T/C ACGTTGGATGCAACACAGTCTCGTAAACCC ACGTTGGATGCGTATGCAGGAAAAGTACGG Shi et al., 2011 g1713 8 SNP - - C/T ACGTTGGATGACAGGGCAAAACTGGATGAC ACGTTGGATGTAAGTGCCAAGTCCTTGGTC Shi et al., 2011 g796 8 SNP - - A/G ACGTTGGATGCAGAACGGTCTTAACTACGC ACGTTGGATGCCTCCAAAGTGTTGGGATTG Shi et al., 2011 g195 9 SNP - - C/T ACGTTGGATGTGAGAAGGTGTCAACTTTCG ACGTTGGATGCTCTTGGACAGTACCACTAC Shi et al., 2011 g1379 9 SNP - - C/T ACGTTGGATGTTGCTGGTGACCTTGGCAAC ACGTTGGATGAAATGAGCAACGCACAGACG Shi et al., 2011 g1206 9 SNP - - C/T ACGTTGGATGGCAGTTCGGTTACTTCAAGC ACGTTGGATGTCTTCTTCTCGGCGATCTTG Shi et al., 2011 g792 9 SNP - - G/T ACGTTGGATGTGACATTGGTTGATCCCCTG ACGTTGGATGTTTCTCTGAGTCTGTCTGCC Shi et al., 2011 g2498 9 SNP - - T/C ACGTTGGATGAAACTCTGATCCCGTAGCAC ACGTTGGATGTGCCACAAGCAGAATTACCC Shi et al., 2011 g544 9 SNP - - C/G ACGTTGGATGGATAACAGATCCACCACTGC ACGTTGGATGGGGACCTTTTTGCAAGTGG Shi et al., 2011 g1286 9 SNP - - T/A ACGTTGGATGAACTGAGTGGCACTGGATAC ACGTTGGATGTGATAATACCCGGTTGGAGG Shi et al., 2011 g2521 10 SNP - - T/A ACGTTGGATGGGTGTTGGTAATCATGTGCC ACGTTGGATGAGCTAGAGTTCATGCTTGTG Shi et 
al., 2011 g1320 10 SNP - - T/C ACGTTGGATGTGTTGTCAGTGGCATTGGTG ACGTTGGATGTGTCCTTCCCATGAACTTCC Shi et al., 2011 g1029 10 SNP - - T/A ACGTTGGATGAAAGCATGGGGTTACAACTG ACGTTGGATGGCGCCTCTGCAAATTGTATG Shi et al., 2011 g1994 10 SNP - - G/C ACGTTGGATGATCAGAAGCAGCCAAGTGAC ACGTTGGATGCAGATCACACTAGACCTACC Shi et al., 2011 g2600 10 SNP - - C/T ACGTTGGATGGGAAGGAACAACAATTCAAG ACGTTGGATGCTCTGAAAGTGAGACCTTCC Shi et al., 2011 g2560 10 SNP - - C/A ACGTTGGATGTGGGAGAAGTTCATTGCCAG ACGTTGGATGAGTATATGGCTGGATCCCTG Shi et al., 2011 g1724 10 SNP - - C/T ACGTTGGATGAGAAACTCCCTCGCAATGCC ACGTTGGATGCTAGAAGATATGTCAAGGTG Shi et al., 2011 g2260 10 SNP - - T/A ACGTTGGATGATGGCATCAAAAGGAGAAGG ACGTTGGATGGTGGCATCTTTGCTTATGCG Shi et al., 2011 g1383 10 SNP - - CT ACGTTGGATGTGACATAAGTGCACAACACC ACGTTGGATGAGTTGATTTCTTTGTTGGG Shi et al., 2011 g2273 11 SNP - - C/T ACGTTGGATGTCACGATTCACGACTGCTTC ACGTTGGATGCACCAATAACCAATTGCAGG Shi et al., 2011 g1215 11 SNP - - C/G ACGTTGGATGTTCACGACGGGATTCTCCCT ACGTTGGATGACCTTTTCTGCGCTGAGCAC Shi et al., 2011

230

Table 1. Continued polymorphism Locus LG type size Tm(C) Forward PCR primer (5’— 3’) Reverse PCR primer (5’— 3’) Ref OAC Rex/SVM Taylor g1415 11 SNP - - T/A ACGTTGGATGACAAGCAAACTGATCAGTCG ACGTTGGATGACGCAGGAACCTTTTTAGTG Shi et al., 2011 g1438 11 SNP - - C/T ACGTTGGATGATGTCCCCATTCCCATATCC ACGTTGGATGCTTCCACAAACCCAGGTTTC Shi et al., 2011 g1168 11 SNP - - C/T ACGTTGGATGTAATCCATAGGAGGAGCCAG ACGTTGGATGCAGAGTGACTCAGATTCCTC Shi et al., 2011 g156 11 SNP - - T/C ACGTTGGATGCCTTTTGAGGAGTCCTTGTG ACGTTGGATGTTCCAGCTCCAGTAAACACC Shi et al., 2011 g188 11 SNP - - CA ACGTTGGATGAGAGTGTGTATAAGAGTGTG ACGTTGGATGTCCGGTTCTAGTGATAAGGG Shi et al., 2011


Appendix 2

Table 1. SNP marker spreadsheet for OAC Rex × SVM Taylor population:


Locus Taylor Rex 6-1 6-3 6-4 6-7 6-13 6-16 6-17W 6-18 6-19 12-3 13-1 13-2 13-4W 13-1- 13-11 13-12 13-15 13-16W 13-19 13-22 13-23 13-25 13-26 13-27 13-29 51-1 51-3 51-4 51-5 51-6 51-9 51-1- 51-11
G680_B A B B B B B B B B B B A B B B A B B B A A B B B A B B B B A B B B B B
g1029_H A B B B A B B B B A H A B B H B H B B B H A A B A B B B B H A A B B B
g1031 ------
g1065 A B A A A A A B B B A B A A B B A A B B A A A A A A A B B B B B A A B
g1084 ------
g1084_B ------
g1119 A B B A A B A B B A H H A B A A B B B B A B B A A A B A B B B B B B A
g1168 A B A A A A A A B H A B B B B B B B B A A A A A A B H B B B A A H A A
g1174 A B A B B B B B B B B A A B B A B A B B A A B A A B A B B A A A B B B
g1188_B A B B B A B A A B A B A B B A B A B B B A B A A A A A B B B A H B B B
g1206 A B H A A H H B B B B B B B B B B B B B A A A A A A A B B H B B B B B
g1215 A B A A B A B A B A A B B B B B A B B B B B A A A A A A B B B B H B A
g1249_B ------
g1286 A B B B A H B B B A B B B H B A A B B A A H H A A A A H B H B B B B B
g128_H A B A B B A A A B B B B B A B B B A B B A A B A A A B A B B B B A B A
g1296 A B B A A H B A B B B A A A A A B H B A A H B A A B B A B A B B B B B
g1320 A B B B A B B B B A H A B B H B H B B B H A A B A B B B B H H A B B A
g1333 A B B A A B A A B B B A B B A A A B B A A B B A A B A B B B A B B B A
g1333_B ------
g134 A B B A A B A H B B B A B A A B B A B B A A A B A B A A A A B B B B A
g1375 A B A B B A H A B H B B B A A B B A B B A A A A A B A A B A B B A A B
g1379 A B A A A A A B B A B H B B B B B A B B A B A A A A B B B A B B B B A
g1383 A B B B A - B B B H H A B A H B H B B B H A B B A B B B B H H A B B A
g1383_B ------
g14-4 A B B A B B B B B A H B B B A A B B B A H B B A A A B B B A B B A A B
g1415 A B A A A A A A B B B B B B B B B H B A B A A A A B H B B B A A H A H
g1436 A B B B A B B H B B A A B A A A A B B A B A B B A A A A B A A A H B B
g1438 A B A A A A A A B B B B B B B B B B B A A A A A A B H B B B A A H A H
g156 A B A A A A A A B H A B B B B B B B B A A A A A A B H B B B A A H A A
g1615 ------
g1615_B A B A A H H A B B B A B A A A B A H B B A A A A A B A B B A B B B H B
g1645 A B A A A A B A B B B B B A B A H B B B B B A B A A B A B H A A B B H
g1656 A B B A A B B H B A A A A A A B B B B B B B A A A A B A B A A A B B B
g1664 A B B A A B A A B B B A B B A A A B B A A B A A A B A A B H B A A A A
g1676 A B B B A B A A B B B A B B A B A B B B A B A A A A A B B B A A B B B


Locus 51-12 51-13 51-14 51-15W 51-18W 51-23 51-24 51-26 51-28C1 51-29 51-33W 51-4- 52-2 52-4 52-13 52-14 52-18 52-23C1 52-24 52-25W 52-25C 52-28 52-29 52-3-W 52-3-C 53-5 53-9 53-12 53-16 53-23 53-25 53-27 53-28 53-29 53-31
G680_B B B A A - B A A B B B A B B B A B B B B B B A B A B B B A B B B B B A
g1029_H B B A H B H A B B B B B A B A B B B B B B B A B B A A A B B B B H B A
g1031 ------
g1065 A A A A A A B A A B B A A A B A B A B B A B B B A B A A B A B A A B B
g1084 ------
g1084_B ------
g1119 A B A B H B B A B H H A A B B B A B B A A A B A B H B A A B A A B A B
g1168 B B B A A B A A A B A A A A A B B A A A H A A A B A A A A A B A B A B
g1174 B A B B A A B B B A A B B A B B A B B A A B A B B B A B B B B B A B A
g1188_B A B B A A H B B A A A B B A A A A B A A A A B B A B B B A A A B H A B
g1206 A B A B A A B A A A A A A A A A B B B A A B A A A B B B A A B B B B A
g1215 B A B B A A H A B B A A A A B B B A A A B B H B B A A A A A B A H B A
g1249_B ------
g1286 H B B B B A B B H A A A A A B A B B B A A B B A A B B B H B A B B A A
g128_H A A B A - B A B A A B A B A A B B B B B B B A A A B A H A B A B A B A
g1296 B B B B A A B B A A A A B B B B B B B H B H B A A A A B A A B B B B A
g1320 B B A H B H A B B B B B A B A B B B B B B B A B B A A A B B B B H B A
g1333 A B B A A A A B B A A B B A H A A B A A B B B B B A B B A A B B B A A
g1333_B ------
g134 A B B B A A B B A A A A B A B A A H A A A A B A A B B A A B H B A B A
g1375 A B B A B B A B B A B B A B A B B A B B B B A A H B B A B B A B A A A
g1379 A A A A A A B A A B A A A H A A A B B B A B A A B A B B B A B B B H A
g1383 B B A H B H A B B B B B A B A B B B B B B B A B B A A A B B B B H B B
g1383_B ------
g14-4 B A B A A B A A B A A A A B A B B A A B B B B B B B A A B B B A B B B
g1415 B B B A A A B A A B A A A A H B B A B A H A A A A B B A A B B A B H B
g1436 B A A A A B A A H A A A A B A B B B A A A B A B B A A B H B A A A A B
g1438 B B B A A B B A A B A A A A H B B A B A H A A A B B B A A B B A B H B
g156 B B B A A B A A A B A A A A A B B A A A H B A A B A A A A A B A B B B
g1615 ------
g1615_B A A A B A A B H A A B A A A B A B B A H A A B B B B B B A A A A B B B
g1645 A A A B B A H B B A B A A A A A B A B A A A A A A A B B A A B B A H A
g1656 B H B A B A H B A A B A B A A B B A B A A A A H A H B B A B B A A B A
g1664 A A B B A A A B A A A A B B B A A B A B B B B A B H B A A B B A B A A
g1676 B B B B - B B B A A A B B A A A A B A A A A B B A B B B A A A B B A B


Locus 53-34 53-35 53-36 53-38 54-1 54-6 54-8 54-1-W 54-18W 54-18C 54-19 54-2-W 54-29 54-31 55-2 55-7 55-9 55-12 55-14 55-15 55-17W
G680_B B A B A B A A B B B B B B B B B A B A A A
g1029_H H B A A B B B B B B B A B B B A B B B B B
g1031 ------
g1065 A B A B B B B B A A A B A B A B B B A A B
g1084 ------
g1084_B ------
g1119 A B A A A A A B B B H B B A A B A A A A A
g1168 A A A B A A H A A H B B A B A A B B A A A
g1174 B B A B B A B A A A B B B B B B B A B B A
g1188_B A B B A B A B A A A A A A B B A B B B B H
g1206 B B A B A B A A A A A A B A B B B B A A H
g1215 B A A B A B A - B B B B A H B B A B B B H
g1249_B ------
g1286 A B A B B B A B A H A B B B A A A B A A B
g128_H A A A B A A B A A A A B B B B H B A A A A
g1296 A B B A A A A A A A A A A B A B A B B B B
g1320 H B A A B B A B B B B A B B B A B B B B B
g1333 B A B A A B A A B B A A A B B H B B B B H
g1333_B ------
g134 B A A A A A B B B B A A A B A A A B B B B
g1375 A B A A B B B A B B A B A B B B B A B B B
g1379 B A A A B B A A A A A A B A B H H H B B H
g1383 H B A A B B A - B B B A B B B A B B B B B
g1383_B ------
g14-4 B A H A A A A B A A A B A A A A A H A A H
g1415 A A A B B A H B A B B B A A B A B B A A A
g1436 A B A A A A A B H A H H A A B A A B B B H
g1438 A A A B A A H B A H B B A B A A B B A A A
g156 A A A B A A H A A H B B A B A A B B A A A
g1615 ------
g1615_B H B H B B B B B B H A A A B A A B B B B B
g1645 A B A A B A A B A A H A A A A A B A A A B
g1656 A A A H B A H B B B B A B B A B B B B B A
g1664 B A B A A A A A B B A A B B B A A B A A B
g1676 A B B - B B B A A A A A H B B B B B B B B


Locus Taylor Rex 6-1 6-3 6-4 6-7 6-13 6-16 6-17W 6-18 6-19 12-3 13-1 13-2 13-4W 13-1- 13-11 13-12 13-15 13-16W 13-19 13-22 13-23 13-25 13-26 13-27 13-29 51-1 51-3 51-4 51-5 51-6 51-9 51-1- 51-11
g1689 A B B A A B A A B B B A B B A A A B B A A B A A A B A B B B A B B B A
g1713 A B H B A H A B B B B A A B A H B A B H B A B A A A A A B A B H B A A
g1719 ------
g1724 ------
g1731 A B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
g1757 A B A B B H B B B B B A A B B A B A B B A A B A A A A B B A A A B B B
g1795 A B A A A A B A B B B B B A B A H B B B B B A B A A B A B H A A B B H
g1801 ------
g1808 A B B A A B B B B B B B B A B A B B B A H B B B A B H B B B A B B H B
g1837 ------
g188 A B A - B - B B B A A B B B B B A B B B B B A A A A A A B B B B - B A
g1883 A B B A A B A A B B B A B B A A A B B A A B A A A B A A B H B A A A A
g1886 A B B A B B B B B A H B B A A A B H B A H B B A A A A B B A B B A A B
g1925 ------
g195 A B B A A B A B B A B H B A B A B A B A A B A A A A B B B A A A B B A
g1959 A B A A A A A B B B H B B A B B B B B A B B A B A B B B B A A B B B B
g195_B A B B A A B A B B A B H B A B A B A B A A B A A A A B B B A A A B B A
g1968 A B B H A B A A B B B A B B A B A H B A A B B A A A H B B B A A B B A
g1968_B A B B H A B A A B B B A B B A B A H B A A B B A A A H B B B A A B B A
g1994 A B B H A B B B B H H A B A A B A B B B H A B A A B B B B H H B B B A
g1998 A B B B B B B B B B B A B B B A A A B B A A A A A B A A B A A A B B B
g2020 A B B B B A A A B A A B A A B B B H B H A B A B A A A A B A A A A H A
g2108 A B B A A B B A B A H B A B B B B B B B B H B A A H A A B A A A A H B
g2129 A B B A H B H B B B A B A A B B A H B B A A A A A B A B B A B A B H H
g2135 ------
g2208_B A B B B B B B B B B B A B B B A A A B B A A A A A B A A B A A A B B B
g2221 ------
g2260 A B A A A A B B B B H B B B B B B A B B B A B A A H A B B H A A B B A
g2273 A B A A A A B B B B B B B B B B B H B B B B A A A B H B B B A A A A A
g2274 A B A A A A B A B A B B A B B B B B B B A H B A A H A A B A A A A A B
g2303 ------
g2311 A B A A A A A B B A A A B A B A B B B B A B B B A B B B B B B B B B A
g2357 - B A A A - A A B B A B A H B B A A B B A A B A A A A B B B B B A A B
g2393 A B B H A B A B B A H H B A B A B B B B A B B B A B B B B B B B B B A
g2410 A B B H A B A A B B B A B B A B A H B A A B B A A A A B B B A A B B B


Locus 51-12 51-13 51-14 51-15W 51-18W 51-23 51-24 51-26 51-28C1 51-29 51-33W 51-4- 52-2 52-4 52-13 52-14 52-18 52-23C1 52-24 52-25W 52-25C 52-28 52-29 52-3-W 52-3-C 53-5 53-9 53-12 53-16 53-23 53-25 53-27 53-28 53-29 53-31
g1689 A B B A A A A B B A A B B A H A A B A A B B B A A H B H A A B B B A A
g1713 B B A B B A B B A A B B B B A A B A B B B B B A A H A B B A A A B B H
g1719 ------
g1724 ------
g1731 B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B B
g1757 B A A B A A B A B A A H B A B B A B B A A B A B B B A A B B A B A B A
g1795 A A A B B A H B B A B A A A A A B A B A A A A A A A B B H A B B A H A
g1801 ------
g1808 B B B B H B B B B B B B A B H H B B B B A B B B B B H H B H B B B A B
g1837 ------
g188 B H A B A A H A B B A A A A B B B A - A A B - B B A A A A B B A H B A
g1883 A A B B A A A B A A A A B B B A A B A B B B B A B H B A A B B A B A A
g1886 B A B A A B A H B A A A A B A B B A A B B B B B B B A A B B B A B B B
g1925 ------
g195 A A A A A B B A A B A B B B A A A A B H A B B B B A B B B A B B B H B
g1959 B A B A A B B B B A A B A B A B H B B B B B A A A B A B B A B A A H A
g195_B A A A A A B B A A B A B B B A A A A B H A B B B B A B B B A B B B H B
g1968 A B B A A A B B B A A B B B A A A B A A A B B B B A B B A A B B B A B
g1968_B A B B A A A B B B A A B B B A A A B A A A B B B B A B B A A B B B A B
g1994 B B B A B A A B B H B B A B A A B B B B B B A B A A B A B B B B H A B
g1998 B A A A A A B A B B A H B A B B B B A A A B B B B B A A B B A A A B A
g2020 A A A B A A B A B A B A A B B A A B A A A A B A A A A A A B A A B A B
g2108 A A B A H A A B A A B B B B B B B B H A A A A B A B A A A B H B A B H
g2129 A A A B A A B H A A B A B A B A B A A B A A B B B B B B A A A A A B B
g2135 ------
g2208_B B A A A A A B A B B A H B A B B B B A A A B B B B B A A B B A A A B A
g2221 ------
g2260 B A B A A A B B H B A B A B H B B B B B B B B B B A B B B B B B A A B
g2273 B B B A A H B A A B A A A A H B B A B A A A A A A B B A A B B A A A B
g2274 A A B A H A A B A A B B B B B A B B H A A A B B A B A A A B A B A B B
g2303 ------
g2311 B B A B B B A A B B A B B B B B A A B A A A B A B B B A A B A A A A B
g2357 A A A B B A B A A B B A A A B A B A B B A B B B A B - A B A B A A B B
g2393 B B A B B B A A B B H A B B B B A B B A A A B A B B B A A A A A B A B
g2410 H B B A A A B B H A A B B B A A A B A A A B B B B A B B A A A B B A B


Locus 53-34 53-35 53-36 53-38 54-1 54-6 54-8 54-1-W 54-18W 54-18C 54-19 54-2-W 54-29 54-31 55-2 55-7 55-9 55-12 55-14 55-15 55-17W
g1689 B A B H A B A A B B A A B B B H B B A A H
g1713 A B B H A A H H A A B B B A B B A B B B A
g1719 ------
g1724 ------
g1731 B B B B B B B B B B B B B B B B B B A A A
g1757 B B A H B A B B A A H B B A B B B A B B A
g1795 A B A A B A A - A A H A A A A A A A A A B
g1801 ------
g1808 B B H H B B B B B H B A H H B H B B H H A
g1837 ------
g188 B A A B A B A A A A B B A H - B A B - B B
g1883 B A B A A A A A B H A A B B B A A B A A B
g1886 B A A A A A A B A A H B A A A A A A A A H
g1925 ------
g195 B B A B A B H A A A A A B A A H B A A A B
g1959 B A A A B A A A A A B B A A A B A A A A A
g195_B B B A B A B H A A A A A B A A H B A A A B
g1968 A B B A B B A A A A A A A B B H B B B B H
g1968_B A B B A B B A A A A A A A B B H B B B B H
g1994 B B A A B B A B B B B A B B B B H B B B B
g1998 B B A A B A B B A A H B A A B A A A B B A
g2020 H A B A B B A A A B B B A A B B B B A A A
g2108 A A A A B A A B B A A A B A A A B A A A A
g2129 H B A A B A B B B H A A A B A A B B B B B
g2135 ------
g2208_B B B A A B A B B A A H B A A B A A A B B A
g2221 ------
g2260 A B H A B A A A B B A A A B A A B B A A B
g2273 A B A B B A H B A A B H A A B B B B A A A
g2274 A B A H A A A B B B B B B A A A B A A A H
g2303 ------
g2311 A A A A A A A - B B B B H A A B B A A A A
g2357 A B A B B B B - B H A B A B B B A B A A B
g2393 H B A A A A A B B B B B B A A B B A A A A
g2410 A B B B B B B A A A A A A B B H B B B B H


Locus Taylor Rex 6-1 6-3 6-4 6-7 6-13 6-16 6-17W 6-18 6-19 12-3 13-1 13-2 13-4W 13-1- 13-11 13-12 13-15 13-16W 13-19 13-22 13-23 13-25 13-26 13-27 13-29 51-1 51-3 51-4 51-5 51-6 51-9 51-1- 51-11
g2467 A B A B B A A A B B B B B A A B A A B A H A B A A B A H B A B B A A B
g2476 A B B A A B A H B A A A A A A B B B B B B B A A A A B A B A A A B B B
g2498 A B B B A H B B B A B B B A B A B B B A A H A A A A A H B B B B B B B
g2521 A B B B A B B B B A H A B B H B H B B B H A B B - B B B B H B B B B B
g2521_B A B B B A B B B B A B A B B B B B B B B B A B B A B B B B B B B B B B
g2531 A B B A B B H A B B A B A A B B A A B B A A B A A A A B B A B A A H H
g2538 A B B B A B B H B B A A B A A A A B B A B A B B A A A A B A A A H B B
g2540 A - B - B - B - B B B A B B B A - A B A A - B B A A B A B A B - B B B
g2558 ------
g2560 A B B H A A B B B A H A B A A B B A B B A A B A A B B B B H A A B B A
g2562 A B B A B B B B B B A H B A A H B B B A B B B A A A A B B B A B A A B
g2562_B A B B A B B B B B B A H B A A H B B B A B B B A A A A B B B A B A A B
g2581 A B H B A A B B B A B A B B A A B A B A A A A B A A B B B A A A A H A
g2595 A B A H B A B A B A B A B A B B B A B B A B B A A B B A B B B B A A A
g2600_B A B B H A B B B B A H A B A A B B A B B A A B A A B B B B H A A B B A
g290 A B A A A A A B B B A B A A B A B A B A A A A A A A A B B A B B B B B
g290_B A B A A A - A B B B A B A A B A B A B A A A A A A A A B B A B B B B B
g292 ------
g321 A B H B A A A B B A B A B B A A B A B A A A A B A A B B B A A A A H A
g385 ------
g385_B ------
g457 A B A A A A A B B H B A B B A A B A B A A A A B A A B B B A H A A A A
g457_B A B A A A A A B B H B A B B A A B A B A A A A B A A B B B A H A A A A
g471 A B B B A B B H B B A A B A A A A B B A B A B B A A A A B A A A A B B
g483 A B A B B A A A B B B B B A B B B A B B A A B B A A B A B B B B A B A
g487 ------
g503 A B B A A B A H B B B A B A A B B A B B A A A B A B A A A A B B B B A
g510_B ------
g544 A B A B A A B B B H B B B A B A A B B A A H H A A A A H B B A A B B B
g580 A B B B A B A B B A B H A A A A B A B H B H A A A A A A B B A A B A A
g586 ------
g586_B A B B A B B B H B A A A A A A B B B B B B B A A A A B A B A A A B B B
g680 A B B B B B B B B B B A B B B A B H B A A B B B A H B H B A B B B B B
g696 A B B H A B A B B A H H A A A B B A B B A B B A A A B A B B B B B B A
g724 A B A A A A A B B B H B B A B B B B B A B B A B A B B B B A A B B B B


Locus 51-12 51-13 51-14 51-15W 51-18W 51-23 51-24 51-26 51-28C1 51-29 51-33W 51-4- 52-2 52-4 52-13 52-14 52-18 52-23C1 52-24 52-25W 52-25C 52-28 52-29 52-3-W 52-3-C 53-5 53-9 53-12 53-16 53-23 53-25 53-27 53-28 53-29 53-31
g2467 A B B A B B A B B A B H A B A B A A B B B B A A B A B A B B A B A A A
g2476 B H B A B A H B A A B A B A A B B A B A A A A H A A B B A A B A A B A
g2498 A B B B - A B A A A A A A A B A B B B A A B B A A B B B H B A B B A A
g2521 B B A - B A A B B B B B A B B B B B B B B B B B B H A A B B B B H B A
g2521_B B B A B B A A B B B B B A B B B B B B B B B B B B B A A B B B B B B A
g2531 A A B A B A B B A A B A B A B A B A B B A A B B B B A B A A A A A B H
g2538 B A A A A B A A H A A B A B A B B B A A A B A B B A A B H B A A A A B
g2540 B A A A A A A A B - - A B B B A B A B A B A A A A B - - A - - B B B A
g2558 ------
g2560 B A B A B A B B B H B B A B A B B B B B B B B B B A B B B B B B A A B
g2562 A A B A A A A A B B A B B A A B B A A H H B A B B B B A A B B A B B B
g2562_B A A B A A A A A B B A B B A A B B A A H H B A B B B B A A B B A B B B
g2581 A A B A A B A A A B H A B B B A A A A A B H B A A B B A A H B B A A H
g2595 A A B A B B B B A B H B B A A A B B B B B B B A B B A H A B A A A B A
g2600_B B A B A B A B B B H B B A B A B B B B B A B H B B A B A B B B B A A B
g290 A A A A A A B B B B A A A A B B A A B B B B B B B B B A B A B A A B B
g290_B A A A A - A B B B B A A A A B B A A B B B B B B B B B A B A B A A B B
g292 ------
g321 A A B A A B A A A B H A B B B A A A A A B H B A A B B A A H B B A A A
g385 ------
g385_B ------
g457 A A B A A B H A B B H A B B A A A A A A H A A A A B B B A H B B A A A
g457_B A A B A - B H A B B H A B B A A A A A A H A A A A B B B A H B B A A A
g471 B A A A A A A A H A A A A B B B B B A A A B B B B A A B H B A A A A B
g483 A A B A B B A B A A B A B A A B B B B B B B A A A B A H A B A B A B A
g487 ------
g503 A B B B A A B B A A A A B A B A A H A A A A B A A B B A A B H B A B A
g510_B ------
g544 H B A B B A B B H A A A A A B A B B A A A B B A A B B B H B A B B A A
g580 B B A - A A B B A H A A B B B A B A B B B B B A A A A A B A A H B B H
g586 ------
g586_B B H B A - A H B A A B A B A A B B A B A A A A H A H B B A B B B A B A
g680 B B A A A H A A B B B A B B B A B H B H B H A B A B B B A B B B B B A
g696 A B A B H B B H B H H A A B B B B B B A A A A A H A A A B B A A B A B
g724 B A B B A B B B B A A B A B A B H B A B B B A A B B A B B A B A A H A


Locus 53-34 53-35 53-36 53-38 54-1 54-6 54-8 54-1-W 54-18W 54-18C 54-19 54-2-W 54-29 54-31 55-2 55-7 55-9 55-12 55-14 55-15 55-17W
g2467 A B A A B B A B B B A A A H H B B A B B B
g2476 A A A H B A H B B B B A B B A B B A B B A
g2498 A H A B B B B B A H A B B B B A A B A A B
g2521 H B A A B B B B B B B A B H B A B B B B A
g2521_B B B A A B B B B B B B A B B B A B B B B A
g2531 A B A A B A B B B H A B A B A A H B B B B
g2538 A B B A A A B B B A H H A A B A A B B B A
g2540 B A B A B A A B A A A B B B B B A - A A A
g2558 ------
g2560 B B A A B A A B B B B A H B A A B B A A B
g2562 B A A A B B A B A A A A A A B B A H A A B
g2562_B B A A A B B A B A A A A A A B B A H A A B
g2581 B B A A A A A A A A B A A A A B A B B B A
g2595 A A A B A A B A A A B B A B B B B A B B A
g2600_B B B A A B H A B B B B A H B B A H B A A B
g290 B A A B A B B B A - A B A B A B B B B B B
g290_B B A A B A B B B A H A B A B A B B B B B B
g292 ------
g321 B B A B A A A A A A B A A A A B A B B B A
g385 ------
g385_B ------
g457 B B A A A B B B A A A A A A A B A B B B A
g457_B B B A A A B - B A A A A A A A B A B B B A
g471 A B A A A A A B H A A A A A B A A A B B H
g483 A A A A A A B A A A A B B B B H B A A A A
g487 ------
g503 B A A A A A B B B B A A A B A A A B B B B
g510_B ------
g544 B B A B A B B B A B A B A B A A A A A A A
g580 A B A B A B A B A A B B A A B B A A A A A
g586 ------
g586_B A A A B B A H B B B B A B B A B B B B B A
g680 B A B A B A A B H H H B B B B B A B A A A
g696 A B A H A A A B B B H B B A A B A H A A A
g724 B A A A B A A - A A B B A A A B A A A A A


Locus Taylor Rex 6-1 6-3 6-4 6-7 6-13 6-16 6-17W 6-18 6-19 12-3 13-1 13-2 13-4W 13-1- 13-11 13-12 13-15 13-16W 13-19 13-22 13-23 13-25 13-26 13-27 13-29 51-1 51-3 51-4 51-5 51-6 51-9 51-1- 51-11
g732 ------
g774 A B B B B B B B B B B A B B B A B H B A A B B B A H B H B A B B B B B
g792 A B H A A H H B B B B B B A B B B B B B A H A A A A A B B H B B B H H
g796 A B H B B A B A B B B A A H H H B A B A B A B A A B B A B A B A A A A
g811 ------
g893 ------
g893_B ------
g934 A B A A A A A B B B B B B A B B B B B A A B A B A A B A B A A B B B A
g968 A B A H B A B A B B A H B A B A B B B A A B B A A B B A B H B A B A B

Locus 51-12 51-13 51-14 51-15W 51-18W 51-23 51-24 51-26 51-28C1 51-29 51-33W 51-4- 52-2 52-4 52-13 52-14 52-18 52-23C1 52-24 52-25W 52-25C 52-28 52-29 52-3-W 52-3-C 53-5 53-9 53-12 53-16 53-23 53-25 53-27 53-28 53-29 53-31
g732 ------
g774 B B A - A H A A B B B A B B B A B H B H B H A B A B B B A B B B B B A
g792 A B A B B A B A A A A A A A A A B B B A A B A A A B B B A B A B B H A
g796 B H A B B B B B B A B B B B B A B A A B B B B A A B B B A A A B A B B
g811 ------
g893 ------
g893_B ------
g934 B A A B A B A B B A A B A H A A H B A A A B A A A A B B A A B A A H A
g968 B B B A A B B B B B A B B B A A B A B B B B H B B A A A A A B B A B A

Locus 53-34 53-35 53-36 53-38 54-1 54-6 54-8 54-1-W 54-18W 54-18C 54-19 54-2-W 54-29 54-31 55-2 55-7 55-9 55-12 55-14 55-15 55-17W
g732 ------
g774 B A B A B A A - H H A B B H B B A B A A A
g792 A H A B B B A A A A A A B B B B B B B B H
g796 A B B B A A A - A A A B B A A A A B H H A
g811 ------
g893 ------
g893_B ------
g934 A B A A B A A A A A B B A A A A B A A A A
g968 A A A - B A B A A A B H A A B A B B A A A
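
The genotype rows in this spreadsheet can be tallied per locus to check marker segregation before mapping. The following is a minimal sketch only, not the procedure used in this thesis. It assumes that A denotes the SVM Taylor allele, B the OAC Rex allele, H a heterozygous call and - a missing call (an interpretation inferred from the two parental columns), and that a 1:1 ratio of A to B is expected in a recombinant inbred population. The function names tally_calls and chi_square_1to1 are hypothetical. The example row is the g968 row from the first panel of the table.

```python
from collections import Counter


def tally_calls(row, n_parents=2):
    """Split one table row into (locus, Counter of genotype calls).

    Assumption: the first field is the locus name and the next
    `n_parents` fields are the parental calls (Taylor, Rex), which are
    excluded from the tally.
    """
    fields = row.split()
    locus = fields[0]
    calls = fields[1 + n_parents:]
    return locus, Counter(calls)


def chi_square_1to1(n_a, n_b):
    """Chi-square statistic (1 d.f.) for a 1:1 ratio of A to B calls.

    Heterozygous (H) and missing (-) calls are ignored by the caller.
    """
    total = n_a + n_b
    if total == 0:
        raise ValueError("no homozygous calls to test")
    expected = total / 2
    return ((n_a - expected) ** 2 + (n_b - expected) ** 2) / expected


# g968 row, first panel (Taylor, Rex, then 33 lines)
row = ("g968 A B A H B A B A B B A H B A B A B B B A A B B A A "
       "B B A B H B A B A B")
locus, counts = tally_calls(row)
chi2 = chi_square_1to1(counts["A"], counts["B"])

# 3.841 is the chi-square critical value for 1 d.f. at the 0.05 level
verdict = "distorted" if chi2 > 3.841 else "consistent with 1:1"
print(locus, dict(counts), round(chi2, 3), verdict)
```

For a continuation panel that has no parental columns, the same function can be called with n_parents=0, and the counts from the three panels of a locus can be summed before testing.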