Naturally occurring variation in the promoter of the chromoplast-specific Cyc-B gene in tomato can be used to modulate levels of β-carotene in ripe tomato fruit

THESIS

Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By

Caleb James Orchard

Graduate Program in Horticulture and Crop Science

The Ohio State University

2014

Thesis Committee:

Dr. David Francis, Advisor

Dr. Leah McHale

Dr. Joseph Scheerens

Dr. Steven Schwartz

Copyright by

Caleb James Orchard

2014

Abstract

β-carotene in tomato is an important carotenoid for human health due to its provitamin A activity. Although many carotenoids exhibit provitamin A activity and are available in food crops, Vitamin A deficiency remains the leading cause of preventable blindness in many developing countries. In tomato (S. lycopersicum, L.), the B gene (Cyc-B) encodes a chromoplast-specific lycopene-β-cyclase that converts trans-lycopene to β-carotene. Prior research suggests that DNA sequence variation in the promoter of B may be responsible for the high β-carotene Beta phenotype. We examined the carotenoid profiles of vintage and contemporary tomato varieties to identify sources of high β-carotene. Red tomatoes ranged from 0.2 to 0.97 mg/100 g fresh weight of β-carotene, while several orange-fruited varieties had 1.67 to 4.0 mg/100 g. The non-transcribed region 5′ to the B gene (promoter) contains significant nucleotide variation, with eleven unique haplotypes across 1850 bp of sequence. Sequence analysis suggested that the B promoter was derived from wild tomato species. Association mapping and non-parametric statistical approaches suggest two single nucleotide polymorphisms (SNPs) as the most likely cause(s) of high β-carotene, presumably through their influence on transcription of the gene. A marker-assisted backcross breeding scheme leveraging genome-wide SNPs was used to rapidly develop a series of genetic resources containing different alleles of the B promoter in a uniform genetic background. Replicated field trials demonstrated that distinct alleles can be used to modulate the levels of β-carotene in tomato. These genetic resources are available to develop β-carotene enriched food products or to study dietary absorption and utilization of carotenoids in the food matrix.

Furthermore, studying the basis of variation in carotenoid biosynthesis in general, and specifically β-carotene, provides a clearer understanding of biochemical regulation and phenotypic variation in .


Acknowledgments

Many people contributed to the formation of this thesis and the findings presented therein. I would like to thank the following people for their kind support and advice: my lab group including Deborah Liabeuf, Nancy Haurachi-Morejon, Nico Lara, Andrew Kruezman, Brayton Orchard, Gabriel Abud, Marcela Andrade, Eka Sari and Benard Eriku for their help in data generation, visualization and advice; Troy Aldrich and Jiheun Cho for taking care of my plants in the greenhouse and field; Kesia Harztler and Mike Devault for overseeing greenhouse operations; the field crews in Wooster and in Fremont led by Bruce Williams and Matt Hoeflich for maintaining my plants in the field; Jessica Cooperstone and the entire Schwartz Food Science lab for their hospitality and assistance in carotenoid profiling; members of the MCIC at OARDC for their assistance in sequencing and genotyping; my committee members Drs. Leah McHale, Steven Schwartz and Joseph Scheerens for their willingness to serve on my committee and for their time and effort in reviewing this thesis; Sung-Chur Sim for the excellent mentorship he provided early in my Master’s experience; and my family for their constant support and encouragement. Finally, I would like to thank my advisor, Dr. David Francis, for his continued support, mentorship and kindness throughout my time in graduate school. This thesis was only made possible by the kind support and advice of these people.


Vita

June 2007 ...... Loudonville High School

May 2011 ...... B.S. Biology, Political Science, Grove City College

2012 to present ...... Graduate Research Assistant, Department of Horticulture and Crop Science, The Ohio State University

Fields of Study

Major Field: Horticulture and Crop Science


Table of Contents

Abstract

Acknowledgments

Vita

List of Tables

List of Figures

Chapter 1: Introduction
  Carotenoids
  β-carotene
  Carotenoid biosynthesis in tomato
  Classical characterization of the B locus
  Molecular characterization of the B locus
  Exploring allelic variation through sequencing
  Backcrossing
  Marker-assisted backcrossing
  Rationale and Significance
  Approach
  Project Aim

Chapter 2: Sequencing the promoter of the chromoplast-specific Cyc-B gene in tomato reveals novel high β-carotene alleles and putative functional mutations
  Acknowledgments
  Abstract
  Introduction
  Materials and Methods
  Results
  Discussion

Chapter 3: Naturally occurring variation in the promoter of the chromoplast-specific Cyc-B gene in tomato can be used to modulate levels of β-carotene in ripe tomato fruit
  Acknowledgments
  Abstract
  Introduction
  Materials and Methods
  Results
  Discussion

Chapter 4: Conclusion

References

Appendix A: SNPs used for background genome selection in BC1

Appendix B: SNPs used for background genome selection in BC2

List of Tables

Table 2.1: Carotenoid profiling by HPLC of 29 orange and red-fruited tomato accessions

Table 3.1: Distribution of percent recurrent parent genome (OH8245) in sub-populations in BC1 and BC2 generations

Table 3.2: Sources of variation in β-carotene content in ripe tomato fruit over two years

List of Figures

Figure 1.1: Simplified depiction of the carotenoid biosynthetic pathway in tomato (Solanum lycopersicum L.)

Figure 1.2: Illustration of marker-assisted backcrossing at the BC1 generation

Figure 2.1: Variation in the DNA sequence directly 5′ to the Cyc-B gene

Figure 2.2: Phylogenetic tree of the Cyc-B promoter region (1600 bp)

Figure 2.3: Putative functional SNPs within the B promoter identified by association analysis

Figure 2.4: First putative functional single nucleotide polymorphism (SNP) occurring in the Cyc-B promoter

Figure 2.5: Second putative functional single nucleotide polymorphism (SNP) occurring in the Cyc-B promoter

Figure 3.1: Distribution of progeny versus percent recurrent parent genome in the BC1 generation

Figure 3.2: Distribution of progeny versus percent recurrent parent genome in the BC2 generation

Figure 3.3: β-carotene content in ripe tomato fruit according to the source of the B allele

Chapter 1: Introduction

Tomato (Solanum lycopersicum, L.) is an excellent source of carotenoids and is a globally prevalent food crop. The vivid color changes that occur in tomato fruit during ripening make tomato a model to study carotenoid biosynthesis. Carotenoids that accumulate in the skin and flesh of tomato fruit impart the wide range of colored tomatoes found in local markets and grocery stores. The predominant carotenoid in red tomatoes is lycopene; however, in some tomatoes, lycopene is supplanted by β-carotene, turning the fruit orange. This shift in carotenoid composition is due to the Beta (B) allele of the chromoplast-specific Cyc-B gene (Ronen et al., 2000). Previous research suggested that DNA sequence variation in the sequences 5′ to the start of transcription of the B gene, a region commonly referred to as the promoter, may be responsible for the high β-carotene Beta phenotype (Dalal et al., 2010; Ronen et al., 2000). Until the work described in this thesis, the functional mutation(s) leading to high β-carotene were unknown and the effects of different high β-carotene alleles on carotenoid content in tomatoes remained to be determined.

Carotenoids

Carotenoids are isoprenoid pigments that in higher plants aid in light-harvesting and photo-protection (Frank and Cogdell, 1996; Goodwin, 1980). The colorful properties of carotenoids attract and entice animals that then disperse seed. Carotenoids also play an important role in the diet. Humans do not naturally produce carotenoids; therefore, they must enter the body through food sources. Fruits and vegetables are rich sources of carotenoids and the primary contributors of these pigments in the human diet (Johnson, 2002; Maiani et al., 2009). The effects of carotenoids on human health have been extensively documented (Johnson, 2002; Rao and Rao, 2007). Consumption of carotenoid-rich foods has been associated with decreased mortality and disease risk (Diplock et al., 1998; Giovannucci et al., 1995). These health benefits may be due to the antioxidant activity of carotenoids, including their ability to effectively quench singlet oxygen and other reactive oxygen species (Fiedor and Burda, 2014). Although the health benefits of carotenoids are well known, questions concerning the bioavailability, absorption and metabolism of carotenoids are still being investigated.

β-carotene

Provitamin A carotenoids are precursors to Vitamin A that can be converted into Vitamin A once in the body. Provitamin A carotenoids are found in many orange and yellow fruits and vegetables. They are also present in some green leafy vegetables, where their orange color is masked by chlorophyll. β-carotene is a naturally occurring provitamin A carotenoid and has the highest vitamin A activity of compounds frequently found in foods (Haskell, 2012; Olson, 1989; Yeum and Russell, 2002). β-carotene belongs to a group of carotenoids called carotenes, which are hydrocarbon compounds. In nature, β-carotene is most commonly found in the all-trans isomer form (Grune et al., 2010). Once in the body, all-trans-β-carotene is converted into all-trans-retinal by β-carotene monooxygenase and then into all-trans-retinoic acid (Germain et al., 2006; Grune et al., 2010; Nagao and Olson, 1994).

Vitamin A is a nutrient essential for cellular development, immune function, vision and reproduction (Grune et al., 2010; Ross et al., 2000; Tanumihardjo, 2011). Vitamin A is important during the development of the embryo and into adolescence and adulthood. In embryos, differentiation and vascularization cease without sufficient Vitamin A (Ross et al., 2000). Vitamin A deficiency in young children and adults can result in vision loss and increased mortality (Tanumihardjo, 2011). Although provitamin A carotenoids are bioavailable in food crops, Vitamin A deficiency remains the leading cause of preventable blindness in many developing countries. Low serum retinol levels occur in approximately 100-140 million pre-school age children and 19.1 million pregnant women worldwide (Black et al., 2008; Iannotti et al., 2013; West, 2002). Dietary β-carotene is an effective provitamin A carotenoid and can be safely consumed in the diet at physiologic levels (1-10 mg/d) (Haskell, 2012; Novotny et al., 2010). Supplementation of β-carotene from sources can improve Vitamin A status in at-risk populations; however, the homeostatic nature of serum retinol concentrations makes it difficult to estimate the effects of supplementation (Haskell, 2012; Ribaya-Mercado et al., 2007; Strobel et al., 2007). More research is needed to determine the most effective way that β-carotene can be used to counteract low retinol supply.

Beyond its role as a precursor to an essential nutrient, β-carotene has generated interest due to its antioxidant behavior. However, supplementation with β-carotene has not been proven to reduce the risk of diseases such as cancer (Druesne-Pecollo et al., 2010; Jeon et al., 2011; Omenn et al., 1996). Studies have also indicated that high doses (20-30 mg/d) of supplemental β-carotene may have negative health effects, particularly in smokers (Omenn et al., 1996). Recent research suggests that cleavage products of β-carotene interfere with retinoic acid receptors, in turn affecting endogenous gene expression (Eroglu et al., 2010). This antagonistic behavior by β-carotene metabolites may be responsible for the negative health effects observed in human trials involving supplementation.

Carotenoid biosynthesis in tomato

The production of carotenoids in fruit is developmentally regulated and this control plays an important role in the structure and concentration of these compounds. Tomato serves as a model for understanding the biosynthesis of carotenoids. The existence of fruit color variation has provided insight into the genetic and molecular controls of carotenoid biosynthesis. In tomato, carotenoids are synthesized via the DOXP isoprenoid pathway (Bramley, 2002; Lichtenthaler, 1999). Figure 1.1 shows a simplified representation of the carotenoid pathway in tomato. The colorless compound phytoene is synthesized from geranylgeranyl pyrophosphate (GGPP) by the enzyme phytoene synthase (PSY) (Fraser et al., 1994). Two isoforms of the PSY gene are present in tomato, Psy-1 and Psy-2. Only Psy-1 contributes to carotenogenesis in tomato fruit (Fraser et al., 1999). The red carotenoid lycopene is formed following the four desaturations of phytoene, which proceed through phytofluene, ζ-carotene and neurosporene (Fraser and Bramley, 2004; Namitha et al., 2011). If the tangerine variant of the Carotenoid Isomerase (CRTISO) is present, an alternate lycopene isomer, tetra-cis-lycopene, will accumulate (Isaacson et al., 2002; Kachanovsky et al., 2012). When cis-lycopene is isomerized into trans-lycopene, the pathway branches, and lycopene is cyclized into either β-carotene or δ-carotene (Ronen et al., 2000; Ronen et al., 1999). In one branch, lycopene ε-cyclase (CRTL-e) converts lycopene into the orange carotenoid δ-carotene (Ronen et al., 1999). In the other branch, the enzyme lycopene-β-cyclase converts lycopene to β-carotene.


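[Pathway diagram. Main pathway: geranylgeranyl pyrophosphate → (PSY) → phytoene → (PDS) → phytofluene → (PDS) → ζ-carotene → (ZDS) → neurosporene → (ZDS) → tetra-cis-lycopene → (CRTISO) → lycopene. Branch 1: lycopene → (LCY-E) → δ-carotene → (Cyc-B) → α-carotene → (CrtR-B1) → lutein. Branch 2: lycopene → (Cyc-B) → γ-carotene → (Cyc-B) → β-carotene → (CrtR-B1) → β-cryptoxanthin → (CrtR-B1) → zeaxanthin → (ZEP) → antheraxanthin, followed by further ZEP and NXS steps.]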

Figure 1.1: Simplified depiction of the carotenoid biosynthetic pathway in tomato (Solanum lycopersicum L.).

Major enzymes are shown in italics near the reaction arrows. PSY, phytoene synthase; PDS, phytoene desaturase; ZDS, ζ-carotene desaturase; CRTISO, carotenoid isomerase; Cyc-B, chromoplast-specific lycopene β-cyclase; LCY-E, lycopene ε-cyclase; CrtR-B1, β-carotene hydroxylase 1; ZEP, zeaxanthin epoxidase; NXS, neoxanthin synthase. Modified from (Namitha et al., 2011; Stigliani et al., 2011).


Classical characterization of the B locus

Three key mutations affect β-carotene levels in tomato fruit: B, a semi-dominant allele on chromosome 6 that increases β-carotene levels in the fruit, and old-gold (og) and old-gold crimson (ogc), recessive mutations that cause high levels of lycopene accumulation in fruit by eliminating β-carotene production. The B locus has been characterized using classical genetics and was described as a semi-dominant allele (Tomes et al., 1956; Tomes et al., 1954). Within cultivated tomato, natural variation in high β-carotene accessions traces to multiple sources. Since B was first introgressed into cultivated tomato in the 1950s, reports have described B from the green-fruited S. habrochaites (L. hirsutum) and, more recently, S. pennellii (Dalal et al., 2010; Lincoln and Porter, 1950; Ronen et al., 2000; Tomes et al., 1954). Similar sources of the B allele trace to the closely related wild relatives S. galapagense and S. cheesmanii (Miller and Tanksley, 1990; Rick, 1956; Stommel, 2001). Little is known about other sources of B from S. glandulosum, S. minutum, S. chilense, and S. peruvianum (Chmielewski, 1965; Lesley, 1943; Lesley, 1947; Soost, 1956). The widely accepted theory is that tomatoes with the B allele descended from red-fruited species. However, the occurrence of the B allele in a wide variety of wild tomato species suggests that B may predate red-fruited species and was a mutation out of green-fruited species (Chmielewski, 1965). Currently, high β-carotene varieties are available as breeding lines or commercial seed from both public and commercial sources. Recently released breeding lines homozygous for the B allele from the Galapagos wild tomatoes include 97L63, 97L66 and 97L97 (Stommel, 2001). Several “heirloom” or vintage varieties also contain high β-carotene, and the origin of this variation, until recently, was unknown.


Molecular characterization of the B locus

The Cyc-B gene encodes a chromoplast-specific lycopene-β-cyclase enzyme with amino acid homology to capsanthin-capsorubin synthase (Ronen et al., 2000) and neoxanthin synthase (Al-Babili et al., 2000). The activity of the Cyc-B enzyme decreases during fruit ripening, resulting in lycopene accumulation and the characteristic red color in ripe tomato fruit (Ronen et al., 2000). In tomatoes with the B allele, β-carotene is accumulated in ripe fruit at the expense of lycopene and fruit are orange colored (Ronen et al., 2000). Fruit of the typical cultivated red-fruited variety primarily contain the carotenoid lycopene (90%) with low levels of β-carotene (5-10%) (Harris and Spurr, 1969). Conversely, in plants with B, β-carotene may account for up to 45% of the total carotenoids. Varieties with og and ogc contain more than 99% lycopene and very little β-carotene (Ronen et al., 2000). While the coding sequence is conserved between plants with B and the wild-type allele b (with the exception of og, ogc, and silent SNP variation), the 5′ untranscribed region (promoter) is highly polymorphic and may be responsible for increased transcription (Dalal et al., 2010; Ronen et al., 2000). The original description of B promoter variation from S. pennellii outlined polymorphisms in a figure, but did not provide sequence data for the promoter region (Ronen et al., 2000). More recently, 908 bp of sequence upstream of the B gene was released for S. habrochaites and several red-fruited accessions, revealing sequence polymorphisms (Dalal et al., 2010).

Exploring allelic variation through sequencing

Variation in the DNA of organisms may be confirmed by sequencing specific genomic regions or the entire genome. Sequencing involves mapping the genetic code of an organism into a string of nucleotide base pairs: A, T, G, C. Sequencing technology is rapidly advancing with increasing throughput and cost-effectiveness. Currently, there are approximately 440 sequenced genomes available for tomato (Tomato Genome Consortium, 2012; Aflitos et al., 2014; Lin et al., 2014). Sequences may be compared to investigate genetic relationships between multiple individuals or species. Sequence data may also be used to identify single nucleotide polymorphisms (SNPs) that can be assayed efficiently and in a highly parallel fashion, allowing them to be used as DNA-based markers. Currently, SNPs are the most common DNA-based marker due to their widespread availability throughout the genome. With the advancement of sequencing technology, the availability of SNP resources is increasing for tomato (Hirakawa et al., 2013; Sim et al., 2012a; Sim et al., 2012b). SNP resources may be leveraged for basic research and for practical breeding purposes (Collard and Mackill, 2008). Approaches to marker-assisted crop improvement that were once considered theoretical are now becoming practical due to the revolution in sequencing technology.

With nearly 440 sequenced genomes available for tomato and efficient tools for assaying genetic variation, rapid approaches to identify new functional alleles with phenotypes that complement or expand existing allelic variation are desirable. Variation in B may serve as a demonstration of how sequencing can reveal new alleles while marker-assisted breeding strategies can rapidly allow the assessment of those alleles in a common genetic background.


Backcrossing

In plant breeding, it is common to transfer a beneficial trait from a variety into the genetic background of an advanced-bred line (Allard, 1960; Reyes-Valdés, 2000). The process of introgressing a trait is commonly performed using a strategy known as backcrossing. Backcrossing consists of an initial cross between the donor parent with the beneficial trait and the recurrent parent with the desired agronomic characteristics. The resulting progeny are then crossed back to the recurrent parent. In subsequent generations, the trait of interest is selected and the progeny contain less of the donor parent’s genetics. After several generations of backcrossing, a plant is produced containing the beneficial trait along with the desirable agronomic qualities of the recurrent parent. For a single-locus trait and n generations of backcrossing, the proportion of recurrent parent genome is given as $(2^{n+1} - 1)/2^{n+1}$, assuming an infinite population size (Collard et al., 2005). Without selecting for recurrent parent alleles, it takes approximately 6-8 generations of backcrossing to fully recover the recurrent parent genome (Jiang, 2013). With the advent of efficient genotyping strategies, selecting for the recurrent parent genome using molecular markers aims to reduce the number of generations required for backcrossing.
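As a worked illustration of the expected recurrent parent proportion after n backcross generations, the short Python sketch below evaluates the formula for BC1 through BC6. The function name is hypothetical and the calculation assumes no selection on the background genome; it is an arithmetic aid, not part of the methods used in this thesis.

```python
from fractions import Fraction


def expected_recurrent_fraction(n: int) -> Fraction:
    """Expected share of recurrent parent genome after n backcrosses.

    Implements (2^(n+1) - 1) / 2^(n+1); n = 0 corresponds to the F1.
    Assumes no marker-based selection on the background genome.
    """
    if n < 0:
        raise ValueError("n must be a non-negative number of backcrosses")
    return Fraction(2 ** (n + 1) - 1, 2 ** (n + 1))


if __name__ == "__main__":
    for generation in range(1, 7):
        share = expected_recurrent_fraction(generation)
        print(f"BC{generation}: {float(share) * 100:.2f}% recurrent parent genome")
```

Under these assumptions the expectation is 75% at BC1 and 87.5% at BC2, and it only exceeds 99% at BC6, which is consistent with the 6-8 generations cited above for unassisted backcrossing.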

Marker-assisted backcrossing

Selecting for the trait of interest may be done solely based on phenotype, but in recent decades selecting for the trait has been possible using DNA-based molecular markers as indicators of the presence of the allele (Collard and Mackill, 2008; Semagn, 2006). Selection based on markers may be direct when the functional mutations are characterized or indirect when polymorphisms are associated with the trait due to genetic linkage. Selecting using a marker has advantages when the trait of interest is recessive, or not expressed until late in development. Marker-assisted selection is also useful when phenotyping the trait is time-consuming or expensive. In a backcrossing program, developing a single gene marker for a trait of interest can allow for the selection of the trait when the marker is linked to the gene of interest. This step is commonly referred to as “foreground selection” (Hospital et al., 2003). High-throughput assays, utilizing many markers at once, can then be used to select plants that contain the highest percentage of recurrent parent genetics (Herzog and Frisch, 2011; Hospital et al., 1992). Selecting for recurrent parent alleles on non-target chromosomes is referred to as “background selection” (Figure 1.2) (Hospital and Charcosset, 1997; Hospital et al., 2003). Finally, markers can be used to select for recombination events between the target locus and flanking regions. This approach can be used to reduce the size of the donor chromosome segment to limit the effects of linked donor genes, a phenomenon referred to as linkage drag (Frisch et al., 1999; Hospital et al., 2005). Markers may also be used to select for desirable combinations of genes or loci in coupling phase (Yang and Francis, 2005).
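The two selection steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the genotype codes ("RR" for homozygous recurrent parent, "RD" for heterozygous, "DD" for homozygous donor), the plant names and the function names are all hypothetical, and the percent recurrent parent genome is estimated simply as the share of recurrent parent alleles across scored background markers.

```python
# Hypothetical genotype codes: RR = homozygous recurrent parent,
# RD = heterozygous, DD = homozygous donor. Other calls are treated as missing.
ALLELE_DOSAGE = {"RR": 2, "RD": 1, "DD": 0}


def percent_recurrent_parent(background_calls):
    """Percent of recurrent parent alleles across scored background markers."""
    scored = [ALLELE_DOSAGE[call] for call in background_calls if call in ALLELE_DOSAGE]
    if not scored:
        raise ValueError("no scorable background markers")
    return 100.0 * sum(scored) / (2 * len(scored))


def select_for_backcross(progeny, top_n=1):
    """Foreground selection followed by background selection.

    progeny maps a plant name to {"foreground": call, "background": [calls]}.
    Plants lacking the donor allele at the target locus are discarded first;
    the remainder are ranked by percent recurrent parent genome.
    """
    carriers = {
        name: calls
        for name, calls in progeny.items()
        if calls["foreground"] in ("RD", "DD")
    }
    ranked = sorted(
        carriers,
        key=lambda name: percent_recurrent_parent(carriers[name]["background"]),
        reverse=True,
    )
    return ranked[:top_n]


# Toy BC1 progeny (illustrative values only).
example_progeny = {
    "plant_1": {"foreground": "RD", "background": ["RR", "RR", "RD", "RR"]},
    "plant_2": {"foreground": "RR", "background": ["RR", "RR", "RR", "RR"]},
    "plant_3": {"foreground": "RD", "background": ["RD", "RD", "RR", "RD"]},
}

if __name__ == "__main__":
    print(select_for_backcross(example_progeny))
```

In this toy data set, plant_2 has the most recurrent parent genome but is dropped at the foreground step because it does not carry the donor allele at the target locus, which is the reason the two steps are applied in this order.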


Figure 1.2: Illustration of marker-assisted backcrossing at the BC1 generation.

BC1 progeny are shown as graphical genotypes containing both donor alleles (black) and recurrent parent alleles (white). Selection is based on the proportion of recurrent parent genome, with the selected individual highlighted with a hatched box (Modified from IRRI, 2006).


Several factors influence the efficiency of marker-assisted backcrossing, including linkage between the target trait and the marker, population size, the existence of undesirable linkage drag and marker number and density (Collard and Mackill, 2008; Collard et al., 2005; Jiang, 2013). Simulations suggest that high marker density is not required, but that equal spacing of markers on each chromosome is beneficial (Frisch and Melchinger, 2005; Herzog and Frisch, 2011; Servin and Hospital, 2002). Simulations further suggest that gains in time and cost can be made if large populations are used in early generations (i.e., BC1) (Frisch and Melchinger, 2005; Herzog and Frisch, 2011). Simulations also suggest that simultaneously selecting for the trait of interest as well as for the recurrent parent genome during marker-assisted backcrossing can reduce the number of backcross generations needed to recover the desirable agronomic traits (Frisch et al., 1999).

Marker-assisted backcrossing is widely employed in plant breeding programs. The technique has been used to introgress beneficial genes and quantitative trait loci (QTL) in a variety of crops including maize, rice and tomato (Benchimol et al., 2005; Iftekharuddaula et al., 2012; Jiang, 2013; Lecomte et al., 2004). However, most published studies focus on wide-crosses. The focus on crosses between distant relatives has been due to a scarcity of markers for use within cultivated populations and the predominance of wide-crosses in academic research. Simulations and theoretical work suggest benefits to background genome selection; however, there is little empirical research published that demonstrates the gains per generation that are possible using the approach (Frisch et al., 1999; Herzog and Frisch, 2011; Hospital et al., 2005; Hospital et al., 2003).


Recently developed SNP resources for tomato provide access to polymorphisms for crosses between cultivated varieties (Hirakawa et al., 2013; Sim et al., 2012a; Sim et al., 2012b). Thus, selecting for recurrent parent alleles on background chromosomes is now a feasible approach in backcrossing schemes within the context of applied plant breeding. The availability of these resources provides tools that can now be used to test theory and gain practical experience with the rapid introgression of novel alleles.

Rationale and Significance

The main goal of this research was to determine if natural variation in the promoter of the Cyc-B gene leads to differences in β-carotene content in tomato fruit. We hypothesized that different alleles of the B promoter would result in distinct levels of β-carotene in ripe tomato fruit. In addition, exploring the extent of natural variation in β-carotene content offers an approach for developing plant genetic resources that can be used to address questions about human absorption, nutrition, and health (Tan et al., 2010). Understanding the basis of natural variation in β-carotene content may be crucial for modulating β-carotene in tomato. The B gene controls the high-β-carotene phenotype, but the number of alleles of B is unknown. Novel alleles could be used to modulate carotenoid levels in tomato fruit. The widespread consumption of tomato as well as its nutrient profile gives it the potential to be used as a food for combating nutritional deficiencies. Bioavailability and antioxidant activity of carotenoids can be increased during food processing (Dewanto et al., 2002; Reboul et al., 2005; Van Het Hof et al., 2000). Therefore, processing tomato varieties containing elevated β-carotene levels could be an effective vehicle for delivering β-carotene in the diet and an alternative to supplementation with β-carotene (Stommel, 2001). In addition, consumption of tomato products has been shown to reduce the risk of certain cancers and diseases, suggesting an important role for this fruit in contributing to health (Chichili et al., 2006; Er et al., 2014; Ferruzzi et al., 1998; Rao and Rao, 2007). We used an accelerated backcrossing approach in which alleles of the B promoter were introgressed into the same processing tomato variety. The resulting progeny were used to test hypotheses concerning β-carotene modulation. While markers are routinely used to introgress traits from a wild donor, this thesis is one of the first examples describing the utility of background genome selection for crosses containing cultivated donor parents. The specific objectives for the research described in this document were:

Objective 1: Survey variation in carotenoid content among orange-fruited accessions of tomato.

Objective 2: Identify novel sequence variation in the B promoter.

Objective 3: Associate phenotypic variation with sequence variation in order to uncover functional variation in the B promoter.

Objective 4: Use background genome selection to introgress distinct alleles of the B promoter into an elite processing tomato variety.

Objective 5: Determine if differences in B promoter alleles are associated with varying amounts of β-carotene in tomato fruit.


Approach

Objective 1: Survey variation in carotenoid content among orange-fruited accessions of tomato.

Substantial variation exists for color within tomato. Orange fruit can result from the accumulation of β-carotene, δ-carotene or cis-lycopene in the flesh of ripe fruit. It is difficult to distinguish between orange-colored phenotypes with the naked eye. Variation in carotenoid content in the fruit may be due to the source of key carotenoid genes and/or interactions between these genes and the genetic backgrounds of each variety. We surveyed 29 tomatoes from a variety of seed sources based on descriptions of flesh-color and carotenoid content in order to determine which tomato varieties contained β-carotene as the predominant pigment. Carotenoid extractions were performed using pureed ripe fruit samples and profiled using high performance liquid chromatography (HPLC). Varieties containing elevated levels of β-carotene (>1 mg/100 g fw) were selected for sequencing of the B promoter.

Objective 2: Identify novel sequence variation in the B promoter.

Preliminary evaluation of the B promoter revealed natural sequence variation between high β-carotene tomato varieties. A custom Perl script was used to extract the sequence of the B promoter from the reference tomato genome (Heinz 1706 Version SL 1.5) (Tomato Genome Consortium, 2012). Three pairs of nested primers were developed to span 1600 bp of the B promoter. We sequenced 1600 bp directly upstream (5′) of the B coding sequence in three high β-carotene tomato varieties: 97L97 (S. galapagense), LA0716 (S. pennellii) and Jaune Flamme (S. lycopersicum). Comparing the promoter regions with the Heinz 1706 reference genome identified novel mutations including both single nucleotide polymorphisms (SNPs) and insertion/deletion (InDel) mutations. Several SNPs were conserved between the high β-carotene varieties. The presence of polymorphisms throughout the sequenced regions suggested the existence of distinct alleles of the B promoter. Further sequencing was performed to determine if additional alleles existed and to examine the inheritance of the B promoter. Sequencing identified eleven unique haplotypes in 26 additional varieties with either B or the wild-type (b) allele.
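The extraction of a promoter region from a reference sequence can be illustrated with a short Python sketch. The script actually used in this work was written in Perl and is not reproduced here; the function names, the coordinate convention (1-based, inclusive, on the forward strand) and the toy sequence below are assumptions made only for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chromosome_seq, cds_start, cds_end, strand, length=1600):
    """Return up to `length` bp directly 5' of a coding sequence.

    cds_start and cds_end are 1-based inclusive forward-strand coordinates.
    The result is reported in the orientation of the gene, so for a
    minus-strand gene the flanking forward-strand bases are reverse
    complemented. The region is truncated at the ends of the sequence.
    """
    if strand == "+":
        begin = max(0, cds_start - 1 - length)
        return chromosome_seq[begin:cds_start - 1]
    if strand == "-":
        end = min(len(chromosome_seq), cds_end + length)
        return reverse_complement(chromosome_seq[cds_end:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy_chromosome = "ACGTAACCGGTTTTGA"  # illustrative sequence only
    print(upstream_region(toy_chromosome, 9, 12, "+", length=4))
    print(upstream_region(toy_chromosome, 5, 8, "-", length=3))
```

With a real genome, the chromosome string would be read from the reference FASTA file and the coordinates taken from the gene annotation; both steps are omitted here to keep the sketch self-contained.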

Objective 3: Associate phenotypic variation with sequence variation in order to uncover functional variation in the B promoter.

Prior research suggests that the B promoter might contain functional mutation(s) responsible for high β-carotene content in tomato (Dalal et al., 2010; Ronen et al., 2000).

We observed natural variation in both β-carotene content and promoter sequences for the tomatoes in our collection. Therefore, a naïve association analysis and an association analysis corrected for population structure were both performed to discover putative functional mutation(s) within the B promoter. Levels of β-carotene of each variety were compared to 52 SNP or InDel polymorphisms for which the minor allele frequency was greater than 0.05. Two populations were used for this analysis. The first was a large population consisting of accessions from the SolCAP tomato collection (Sim et al., 2012a; Sim et al., 2012b) for which data on both lycopene and β-carotene were available and B locus genotypes could be inferred from pedigree and sequencing (De Nardo et al., 2009; Rubio-Diaz et al., 2011; Rubio-Diaz et al., 2010). The second was a core set in which B loci that were identical by descent based on pedigree were removed from the analysis in order to avoid oversampling the same mutations and confounding genetic replication with technical replication.
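The minor allele frequency filter described above can be illustrated with a minimal Python sketch. This is not the code used in this study; the function names and the toy genotypes are hypothetical, and each accession is assumed to be an inbred line contributing a single allele call per marker.

```python
from collections import Counter


def minor_allele_frequency(calls):
    """Return the minor allele frequency for one polymorphism.

    `calls` holds one allele call per accession; missing calls are
    given as None and ignored.
    """
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    counts = Counter(observed)
    if len(counts) < 2:
        return 0.0  # monomorphic site
    # The second most common allele is the minor allele at a biallelic site.
    minor_count = counts.most_common()[1][1]
    return minor_count / len(observed)


def filter_by_maf(genotypes, threshold=0.05):
    """Keep polymorphisms whose minor allele frequency exceeds `threshold`."""
    return {name: calls for name, calls in genotypes.items()
            if minor_allele_frequency(calls) > threshold}


# Hypothetical toy data: three markers scored on 20 accessions.
toy = {
    "snp_common": ["C"] * 12 + ["T"] * 8,  # minor allele in 8 of 20
    "snp_rare": ["G"] * 19 + ["A"],        # minor allele in 1 of 20
    "indel_mono": ["ins"] * 20,            # monomorphic
}
kept = filter_by_maf(toy)
print(sorted(kept))
```

Because the threshold is strict (greater than 0.05), a marker whose minor allele occurs in exactly 1 of 20 accessions is dropped in this sketch.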

Objective 4: Introgress alleles of the B promoter into an elite processing tomato variety.

In order to test the functionality of different B promoters, the alleles must be tested in a uniform genetic background. The red-fruited processing variety OH8245 (S. lycopersicum) was selected as the recurrent parent in the backcrossing scheme. OH8245 is still used as a parent for commercial hybrids and contains desirable agronomic traits for a processing tomato, including a compact growth habit and determinate flowering (Berry et al., 1991). LA3501, LA3502 and Jaune Flamme were selected as donor parents due to the sequence diversity in the B promoters. All of the varieties contain the B allele and display the high β-carotene phenotype. LA3501 and LA3502 are introgression lines containing different segments of the S. pennellii B allele from LA0716 in the background of processing variety M82 (Eshed and Zamir, 1995). Jaune Flamme is an orange-fruited variety from France and is commonly described as an “heirloom” (Kingsolver, 2007).

Sequencing of the Jaune Flamme B promoter suggests that this locus is an introgression from S. habrochaites (Chapter 2). The donor parents represent cultivated varieties with the B allele, which removed the inefficiency inherent in making backcrosses between a wild species and cultivated tomato. The line 97L97 is a high β-carotene tomato previously derived by introgressing an allele of B from S. galapagense into OH8245 as a recurrent parent, and was used as a benchmark in evaluating our high β-carotene backcrossing lines (Stommel, 2001).

A combination of a simple PCR marker and high-throughput marker assays was chosen to accelerate the introgression of the B alleles. I developed a PCR molecular marker to distinguish alleles of the B promoter using a gel electrophoresis assay. This marker was used to select plants with high β-carotene alleles at the seedling stage. Because selection for the B allele preceded background genome selection, fewer plants required genome-wide genotyping. A high-throughput array designed to detect 7,720 SNPs was used to compare polymorphic markers between our donor parents and the recurrent parent OH8245 (Sim et al., 2012a; Sim et al., 2012b). From the large array, we selected custom marker sets spaced across the 12 chromosomes of tomato. These background markers were used to perform background genome selection, in which the proportion of recurrent parent genome served as an additional criterion for choosing plants for evaluation and for the next generation of crossing, thereby accelerating the backcrossing scheme.
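The background selection criterion can be sketched in Python as follows. This is an illustration only, not the procedure used in this study: the genotype codes ('RR', 'RD', 'DD'), the equal weighting of markers and the plant names are assumptions made for the example.

```python
def recurrent_parent_fraction(marker_calls):
    """Estimate the proportion of recurrent-parent genome from markers.

    Each call is 'RR' (homozygous recurrent parent), 'RD' (heterozygous)
    or 'DD' (homozygous donor). Heterozygous markers count as one half.
    Markers are weighted equally, which assumes roughly even spacing.
    """
    weights = {"RR": 1.0, "RD": 0.5, "DD": 0.0}
    scored = [weights[c] for c in marker_calls if c in weights]
    if not scored:
        raise ValueError("no scorable markers")
    return sum(scored) / len(scored)


def expected_fraction(n_backcrosses):
    """Expected recurrent-parent proportion after n backcrosses, no selection."""
    return 1.0 - 0.5 ** (n_backcrosses + 1)


# Hypothetical BC1 plants already confirmed to carry the donor B allele.
plants = {
    "plant_01": ["RR"] * 30 + ["RD"] * 18,
    "plant_02": ["RR"] * 22 + ["RD"] * 26,
    "plant_03": ["RR"] * 36 + ["RD"] * 12,
}
ranked = sorted(plants, key=lambda p: recurrent_parent_fraction(plants[p]),
                reverse=True)
best = ranked[0]
print(best, recurrent_parent_fraction(plants[best]), expected_fraction(1))
```

In this toy example the top-ranked BC1 plant carries a larger share of recurrent-parent genome than the 75% expected without selection, which is the sense in which marker-based background selection can save backcross generations.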

Objective 5: Determine if differences in the B promoter lead to varying amounts of β-carotene in tomato fruit.

Following each round of background selection, the Beta plants containing the highest proportion of recurrent parent genetics were both backcrossed and self-pollinated. The progeny from self-pollination were again genotyped with the B marker, segregated into three genotypic classes, and grown in field trials. Field trials were conducted in the summers of 2013 (BC1S1) and 2014 (BC2S1) at two locations: Wooster, Ohio and Fremont, Ohio. Fruit were harvested from each plot and frozen as pureed samples. β-carotene and lycopene content were measured for each sample using high performance liquid chromatography (HPLC).

Carotenoid levels in each of the field samples were compared to the B promoter genotypes. Analysis of variance (ANOVA) using a linear regression model was performed to examine the effects of the B promoter source, allele state and specific cross on the levels of β-carotene and lycopene in fruit samples. The experimental model was designed to partition the genetic and environmental effects that might influence carotenoid levels. Genetic effects included promoter source, allele state, and background genome effects. Environmental effects included within-field variation, location variation and year-to-year differences. The purpose of the experimental model was to separate genetic effects from environmental effects and to examine potential residual effects from other genomic regions.

Project Aim

By manipulating β-carotene production using the B allele, I aim to further tomato’s utility as a bio-fortified crop. Furthermore, studying the basis of variation in carotenoid biosynthesis in general, and specifically β-carotene, can provide a clearer understanding of biochemical regulation and phenotypic variation in plants.

Chapter 2:

Sequencing the promoter of the chromoplast-specific Cyc-B gene in tomato reveals novel high β-carotene alleles and putative functional mutations

Acknowledgments

This chapter is intended as a multi-author publication with the following authors: Caleb Orchard, Marcela Andrade, Jessica Cooperstone, Gabriel Abud, Brayton Orchard, Steven Schwartz and David Francis. These authors all made important contributions to the project and without their support, the project would not have been possible. I (Caleb Orchard) designed the sequencing strategy and completed sequencing of 15 accessions. I selected germplasm for carotenoid profiling, performed extractions and analysis of the germplasm, conducted analysis and wrote the manuscript (chapter). Marcela Andrade sequenced the B promoter in 11 S. cheesmanii and S. galapagense accessions. Jessica Cooperstone provided training and assistance during carotenoid profiling of the tomato collection. Gabriel Abud created a custom Perl script used to extract the B promoter from the reference tomato genome, and initiated sequencing. Brayton Orchard assisted in the design and creation of the figures used in this chapter. Dr. Steven Schwartz kindly provided supplies, equipment and lab space used for carotenoid profiling. Dr. David Francis provided input and direction in all facets of the experiment and writing of the chapter.

Abstract

β-carotene in tomato is an important carotenoid for human health due to its pro-vitamin A activity. The B gene encodes a chromoplast-specific lycopene-β-cyclase, the enzyme responsible for converting trans-lycopene to β-carotene in the carotenoid biosynthesis pathway. Prior research suggests that variation in the promoter of the B gene may modulate β-carotene levels in tomato (Solanum lycopersicum, L.) fruit. The objective of this study was to determine if additional sequence variation exists in the region 5′ to the B gene in high β-carotene tomato varieties. We examined the carotenoid content of 29 vintage and contemporary tomato varieties to identify sources of high β-carotene. We sequenced the promoter region of the B gene in 26 accessions and discovered eleven unique haplotypes, nine of which occurred in high β-carotene varieties. Sequence analysis suggested that the B promoter was derived from a wild tomato species. Variation that exists in tomato with respect to β-carotene may allow for the modulation of carotenoid production and further enhancement of tomato as a bio-fortified crop. Furthermore, studying the basis of variation in carotenoid biosynthesis in general, and specifically β-carotene, provides a clearer understanding of biochemical regulation and phenotypic variation in plants.

Introduction

Tomato (Solanum lycopersicum, L.) is a model organism for studying carotenoid biosynthesis due to changes in fruit color during ripening and an abundance of natural variation affecting color. Lycopene is the predominant carotenoid in red tomatoes, but in some orange-colored tomatoes, lycopene is replaced by β-carotene. β-carotene is a yellow-orange carotenoid that is important for human health as a source of pro-vitamin A. The B allele of the chromoplast-specific Cyc-B gene has been shown to cause elevated levels of β-carotene in ripe tomato fruit (Lincoln and Porter, 1950; Ronen et al., 2000; Tomes et al., 1956; Tomes et al., 1954). Despite the cloning of the B allele, the functional mutation(s) leading to high β-carotene tomatoes are unknown. Sequence-based approaches have been widely used to assess genetic variation in plants and to pinpoint specific polymorphisms as candidates for functional analysis. Characterizing the natural variation in B can identify polymorphisms as candidates for the functional mutation(s) as well as yield a more detailed understanding of carotenoid biosynthesis. Novel variation can also be used to generate genetic resources for the enhancement of β-carotene in the diet.

The B allele has been characterized using classical genetics and was described as a semi-dominant allele (Stommel and Haynes, 1994; Tomes et al., 1954). Within cultivated tomato, natural variation in high β-carotene accessions traces to multiple wild species relatives. The earliest reports, dating to its introgression into cultivated tomato in the 1950s, described B from the green-fruited S. habrochaites (L. hirsutum) (Lincoln and Porter, 1950; Tomes et al., 1954). Other sources of the B allele trace to S. galapagense, S. glandulosum, S. minutum, S. chilense and S. peruvianum (Chmielewski, 1965; Lesley, 1943; Lesley, 1947; Miller and Tanksley, 1990; Rick, 1956; Soost, 1956; Stommel, 2001). Several “heirloom” or vintage varieties also contain high β-carotene, and the origin of this variation is unknown.

Molecular characterization showed that the expression level of the B gene is increased in plants with the mutant Beta phenotype (Ronen et al., 2000). Instead of primarily accumulating the red carotenoid pigment lycopene, fruit from Beta mutant plants accumulate β-carotene and appear orange when ripe. This compositional change occurs at a key branch in the carotenoid pathway where lycopene is cyclized to β-carotene by the enzyme lycopene-β-cyclase (Ronen et al., 1999). Cyc-B is a chromoplast-specific lycopene-β-cyclase with homology to neoxanthin synthase (Bouvier et al., 2000). B is one of four alleles of the chromoplast-specific Cyc-B gene; the other three are old-gold (og), old-gold-crimson (ogc) and the wild-type b allele (Ronen et al., 2000). Both og and ogc are loss-of-function frame-shift mutations in the structural gene (Ronen et al., 2000). The high β-carotene content imparted by the B allele appears to be related to variation in the promoter region (Ronen et al., 2000). The coding sequence is conserved between plants with B and the wild-type allele b, but the region 5′ to the gene in both S. pennellii and S. habrochaites is polymorphic and is correlated with an increase in transcript production in tomato fruit (Dalal et al., 2010; Ronen et al., 2000). Thus, multiple alleles capable of increasing levels of β-carotene in tomato fruit exist. Polymorphism in the promoter region also suggests that sequence variation may be an effective tool for tracing different sources of the B allele in cultivated tomato.

The objective of this study was to determine if additional sequence variation exists in the region 5′ to the B gene in high β-carotene tomato varieties. The original description of the B allele from S. pennellii suggested sequence polymorphism in the promoter was responsible for the high β-carotene phenotype, but the authors did not provide sequence data (Ronen et al., 2000). More recently, 908 bp of sequence upstream of the B gene was released for S. habrochaites and polymorphism(s) were described (Dalal et al., 2010; Mishra et al., 2002). The recent completion of the genomes for two red-fruited tomato accessions, Heinz 1706 (S. lycopersicum) and S. pimpinellifolium accession LA1589, provides sequence resources to further investigate variation at the B locus (Tomato Genome Consortium, 2012). We hypothesized that the diverse sources of wild species used in breeding have contributed sequence variation to high β-carotene varieties and that novel sequence variation could be used to distinguish different sources of the B promoter. Here we describe carotenoid profiling of orange-fruited tomato accessions, sequence diversity across 1600 bp 5′ to the B gene, and genetic analysis to pinpoint specific polymorphisms associated with high β-carotene content in tomato.

Materials and Methods

Plant Materials. Tomato seeds from academic and commercial sources were included in the experiments. Twenty-nine tomato accessions were included for carotenoid profiling, and an additional 15 accessions were included for sequencing. Twenty of the 29 tomato varieties included in the carotenoid profiling were donated by the Tomato Genetics Resource Center (TGRC) in Davis, California or purchased from Tomato Grower’s Supply Company in Ft. Myers, Florida. The nine remaining varieties originated from both academic and commercial sources, including seeds that were kindly provided by Jay W. Scott and Sam Hutton of the Gulf Coast Research and Education Center at the University of Florida, Gainesville. In the sequencing collection, we included tomato varieties exhibiting the Beta phenotype in carotenoid profiles or containing the B allele from distinct genetic backgrounds. Seeds for a Galapagos Islands tomato core collection consisting of 11 S. cheesmanii and S. galapagense accessions were donated by the TGRC. The accessions of S. cheesmanii and S. galapagense contain the B allele and are orange-fruited. LA0716 (S. pennellii) is a green-fruited accession from the TGRC and was used in previous publications due to its functional B allele (Ronen et al., 2000). LA0716 is also the source of the high β-carotene introgression lines LA3501 and LA3502 (Eshed and Zamir, 1995). Jaune Flamme (S. lycopersicum) is an orange-fruited variety from France with high β-carotene content and is frequently described as an heirloom (Kingsolver, 2007). 97L97 is an orange-fruited processing tomato variety developed by introgressing the B allele from S. galapagense accession LA0317 into Ohio 8245 (S. lycopersicum) (Stommel, 2001). Fla. 456 and Purdue 89-28-1 are orange-fruited varieties. Fla. 456 contains the B allele from S. chilense. Purdue 89-28-1 is believed to contain the B allele and originated from Dr. Ed Tigchelaar’s breeding program at Purdue University. Eight orange-fruited varieties from the 29 profiled were included due to high β-carotene content.

Carotenoid Profiling. Two replicates of each tomato variety were grown in the greenhouse at temperatures between 21 °C and 30 °C. Mature fruit were harvested and pureed in a commercial blender (Waring Blender, Stamford, CT). For each variety, two 50 mL tubes of fruit puree were stored at -80 °C for carotenoid extraction. Carotenoids were extracted from fruit puree samples using a modified hexane extraction method (Ferruzzi et al., 1998). Five milliliters of MeOH was added to approximately 1.5 g of blended fruit tissue in a glass test tube. Tubes were probe sonicated (Dismembrator, Fisher Scientific, Waltham, MA) for 8 seconds and stored on ice. Samples were centrifuged at 1600 x g for 5 minutes and the supernatant was decanted and kept in a separate test tube. Next, 5 mL of 1:1 hexane:acetone was added to the pellet, which was sonicated and centrifuged as described above. Three extractions with 1:1 hexane:acetone were performed and hexane supernatants were pooled. Water was added to the pooled hexane layers to induce phase separation. The hexane layer was removed and brought up to 25 mL, and 2 mL aliquots were dried under nitrogen and stored at -80 °C until HPLC analysis. Carotenoid extracts were resuspended in 2 mL of 1:1 MTBE:MeOH and filtered through a 13 mm, 0.2 μm nylon filter. Carotenoid content was quantified using high-performance liquid chromatography (HPLC) on a Waters Alliance 2695 HPLC with a Waters 996 photodiode array detector (Waters, Milford, MA) using a C30 column, 4.6 x 150 mm, 5 μm pore size (YMC Inc., Wilmington, NC) at 30 °C. A binary gradient was used (solvent A: 88% MeOH, 5% MTBE, 5% H2O and 2% of a 2% solution of aqueous ammonium acetate, solvent B: 78% MTBE, 20% MeOH and 2% of a 2% solution of aqueous ammonium acetate), flowing at 1.7 mL/min with an injection volume of 10 μL. External controls (Sigma, St. Louis, MO) were used to generate standard curves used to quantify all-trans-lycopene (471 nm) and β-carotene (450 nm). A ratio of the molar extinction coefficients for β-carotene and tetra-cis-lycopene was used to obtain the relative slope for tetra-cis-lycopene (440 nm). Varieties that contained high levels (>1 mg/100 g fresh weight) of β-carotene were used for DNA extraction and subsequent sequence analysis.
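The conversion from an HPLC peak area to carotenoid content per 100 g fresh weight follows from the volumes given above. The Python sketch below works through that arithmetic under stated assumptions: a linear standard curve, complete recovery during extraction, and a hypothetical slope and peak area chosen only for illustration. It is not the calculation script used in this study.

```python
def peak_area_to_conc(peak_area, slope, intercept=0.0):
    """Invert a linear standard curve (area = slope * conc + intercept).

    Returns the concentration in the injected sample in ug/mL.
    """
    return (peak_area - intercept) / slope


def mg_per_100g(conc_ug_per_ml, sample_mass_g, total_extract_ml=25.0,
                aliquot_ml=2.0, resuspension_ml=2.0):
    """Convert injected concentration to mg carotenoid per 100 g fresh weight.

    Default volumes follow the text: hexane brought to 25 mL, a 2 mL
    aliquot dried down and resuspended in 2 mL before injection.
    """
    # Concentration in the 25 mL hexane extract, correcting for any
    # dilution or concentration between drying and resuspension.
    conc_in_extract = conc_ug_per_ml * resuspension_ml / aliquot_ml
    total_ug = conc_in_extract * total_extract_ml
    return total_ug / 1000.0 / sample_mass_g * 100.0


# Hypothetical standard curve slope and peak area, for illustration only.
slope = 100000.0          # area units per (ug/mL)
area = 120000.0           # integrated beta-carotene peak at 450 nm
conc = peak_area_to_conc(area, slope)
content = mg_per_100g(conc, sample_mass_g=1.5)
print(round(conc, 3), round(content, 3))
```

With these made-up inputs the sample would exceed the 1 mg/100 g fresh weight threshold used here to classify a variety as high β-carotene.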

Sequencing Strategy. Tissue was collected from seedlings for each variety and stored at -80 °C for DNA extraction. A PCR amplification strategy was used for sequencing. Genome sequences provided by the Tomato Genome Consortium for the red-fruited tomato accessions Heinz 1706 and LA1589 were searched using stand-alone BLAST, and a custom Perl script was used to retrieve approximately 2500 bp of sequence 5′ of the putative ATG start of translation for the chromoplast-specific Cyc-B gene on chromosome 6 (Tomato Genome Consortium, 2012; Tao, 2010). We developed three pairs of tiled primers, each amplifying approximately 600 bp, to span 1600 bp 5′ to the structural gene. Primer pairs (5′ to 3′) included 1) TGTTCCTAGGTTCGGTGAGC (F) and TAAGGTCTTCCGTGCCTGTT (R), 2) CAAAGCCTGTTGCCTTTCTC (F) and TAATTTTGCAGTTGGGCACA (R), and 3) AATATACCTGCCGTCCATGC (F) and TCTTCTCAAGCCTTTTCCATC (R).
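Retrieving the sequence 5′ of a start codon from a reference chromosome can be sketched in Python as below. This is not the Perl script used in this study; the function names and the 16 bp toy sequences are hypothetical. The sketch handles a gene on either strand, and treats `atg_pos` as the 1-based reference coordinate of the A of the ATG.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_of_start(chrom_seq, atg_pos, length, strand):
    """Return `length` bases 5' of the start codon, read 5'->3' on the gene strand.

    `atg_pos` is the 1-based reference coordinate of the A of the ATG.
    """
    if strand == "+":
        first = atg_pos - length  # 1-based coordinate of the first base
        if first < 1:
            raise ValueError("window runs off the start of the sequence")
        return chrom_seq[first - 1:atg_pos - 1]
    if strand == "-":
        last = atg_pos + length   # 1-based coordinate of the last base
        if last > len(chrom_seq):
            raise ValueError("window runs off the end of the sequence")
        # On the minus strand, upstream bases lie at larger coordinates.
        window = chrom_seq[atg_pos:last]
        return reverse_complement(window)
    raise ValueError("strand must be '+' or '-'")


# Hypothetical toy reference: 6 bp "promoter" followed by ATG on the plus strand.
plus_ref = "GGGG" + "TTAACC" + "ATG" + "CCC"
# The same locus placed on the minus strand of another toy reference.
minus_ref = reverse_complement(plus_ref)

promoter_plus = upstream_of_start(plus_ref, atg_pos=11, length=6, strand="+")
promoter_minus = upstream_of_start(minus_ref, atg_pos=6, length=6, strand="-")
print(promoter_plus, promoter_minus)
```

Both calls return the same promoter string, which is a quick way to confirm that the minus-strand branch reads the window in the gene's own 5′ to 3′ orientation.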

Procedures for DNA extraction, polymerase chain reaction (PCR), and gel electrophoresis used in the analysis of these samples have been previously described (Robbins et al., 2009). Briefly, the PCR conditions were as follows: denaturation at 94 °C for 3.5 min, annealing at 56 °C for 30 s, and elongation at 72 °C for 45 s, repeated for 40 cycles. Amplified products were purified by precipitation using a 9:1 ethanol:sodium acetate mixture. Sequencing was performed at the Molecular and Cellular Imaging Center in Wooster, Ohio. Each amplicon was Sanger sequenced using the ABI Prism Sequencer (3100x1; Grand Island, NY, USA), with sequence generated for each amplicon in both the forward and reverse directions.

Sequence Alignments and Marker Development. Sequence data were quality checked and trimmed prior to alignment using the collection of programs in the STADEN software package (Bonfield et al., 1995). Contigs were assembled using PreGap4 and refined manually in Gap4. All new contigs generated in this study were submitted to the GenBank sequence database (Accession #’s upon publication).

Using the contigs generated in Gap4, sequences were aligned using MUSCLE (version 3.8), T-Coffee (version 10.00.r1613), and Clustal Omega (version 1.2.0). Using the alignment files from the three software programs, neighbor-joining phylogenetic trees were created in MEGA 6.06 (Tamura et al., 2011) to visualize sequence divergence in the Cyc-B promoter region.

Following the comparison of alignment software, the phylogenetic tree created in MEGA 6.06 using the sequence alignment from MUSCLE was chosen due to MUSCLE’s alignment of the large insertion from LA4380. Briefly, the MUSCLE alignment file was loaded into MEGA 6.06 and the Tamura-Nei model (Tamura and Nei, 1993) and 10,000 bootstrap replications were used in the phylogeny reconstruction.

Association Analysis. Association analysis was conducted in R version 3.1.0 (R Core Team, 2014). Several approaches were used based on different assumptions about the independence of data, correction for population structure, and the distribution of trait data. Polymorphisms were scored relative to the initiation of transcription, and insertion and deletion polymorphisms were treated as a single mutation.

Naïve association analysis was performed based on a simple linear model with no correction for population structure. For this analysis, levels of β-carotene (mg/g) were treated as a quantitative value and regressed on SNPs and insertion/deletions detected through sequencing or analysis based on the SolCAP Infinium SNP array (Sim et al., 2012a; Sim et al., 2012b). The statistical model was: $Y_{i,n} = uX + \mathrm{SNP}_{n,k} + \mathrm{Error}$, where Y was the vector of phenotypic values for n individuals estimated from HPLC data, X was the population mean, and SNPs were the SNP alleles for k polymorphisms scored on n individuals. Analyses were also conducted using a non-parametric association model based on the Kruskal-Wallis test. For this analysis β-carotene levels were simply expressed as “high” or “low” due to the bi-modal distribution of data.
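The non-parametric test can be illustrated with a minimal pure-Python sketch of the Kruskal-Wallis statistic for a single biallelic marker. The analyses in this study were run in R; the code below is only an illustration, the function names and trait values are hypothetical, and the P-value relies on the usual chi-square approximation with one degree of freedom. Trait values coded as 1 ("high") and 0 ("low") can be passed in the same way, since tied values are handled by average ranks and a tie correction.

```python
import math
from collections import Counter, defaultdict


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_two_alleles(trait, alleles):
    """Kruskal-Wallis H and approximate P-value for a biallelic marker."""
    groups = defaultdict(list)
    for rank, allele in zip(average_ranks(trait), alleles):
        groups[allele].append(rank)
    if len(groups) != 2:
        raise ValueError("expected exactly two allele classes")
    n = len(trait)
    h = 12.0 / (n * (n + 1)) * sum(
        sum(g) ** 2 / len(g) for g in groups.values()) - 3.0 * (n + 1)
    # Correct H for tied trait values.
    ties = sum(t ** 3 - t for t in Counter(trait).values())
    correction = 1.0 - ties / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all trait values are identical")
    h = max(h / correction, 0.0)
    # Chi-square survival function with 1 df: P = erfc(sqrt(H / 2)).
    return h, math.erfc(math.sqrt(h / 2.0))


# Hypothetical beta-carotene values (mg/100 g) and SNP calls for 11 accessions.
beta_carotene = [3.1, 2.8, 4.0, 1.9, 2.2, 0.3, 0.5, 0.2, 0.9, 0.6, 0.4]
snp_alleles = ["T"] * 5 + ["C"] * 6
h_stat, p_value = kruskal_wallis_two_alleles(beta_carotene, snp_alleles)
print(round(h_stat, 3), round(p_value, 4))
```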

Finally, an association analysis with a correction for population structure was also performed. For this analysis we used the R package rrBLUP and the GWAS utilities package (Endelman, 2011). The model tested was: $Y_{i,n} = uX + \mathrm{SNP}_{n,k} + K_{n,n} + \mathrm{Error}$, where Y, X and SNP were as described above and K was a matrix to correct for local structure in the B locus. K was derived from the MUSCLE alignment and represents a symmetrical matrix of pairwise DNA sequence identity along the B locus.
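A symmetric matrix of pairwise sequence identity can be computed from an alignment as in the Python sketch below. It is an illustration rather than the procedure used to build K in this study: the scoring rule (a gap against a base counts as a mismatch in every column, and columns gapped in both sequences are skipped), the function names and the toy alignment are all assumptions.

```python
def pairwise_identity(seq_a, seq_b):
    """Fraction of identical columns between two aligned sequences.

    Columns where both sequences carry a gap are skipped; a gap
    against a base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Return accession names and the symmetric pairwise identity matrix."""
    names = list(alignment)
    matrix = [[pairwise_identity(alignment[r], alignment[c]) for c in names]
              for r in names]
    return names, matrix


# Hypothetical 10-column alignment of three promoter haplotypes.
toy_alignment = {
    "red_1": "ACGTCC--GA",
    "red_2": "ACGTCC--GA",
    "beta_1": "ATGTCCTTAA",
}
names, k_matrix = identity_matrix(toy_alignment)
for name, row in zip(names, k_matrix):
    print(name, [round(v, 2) for v in row])
```

Identical haplotypes receive an identity of 1 in such a matrix, which is how accessions sharing a B locus by descent are down-weighted as independent observations in a structure-corrected model.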

Two populations were used for this analysis. The first was a large population consisting of accessions from the SolCAP tomato collection (Sim et al., 2012a; Sim et al., 2012b) for which data on both lycopene and β-carotene were available (De Nardo et al., 2009; Rubio-Diaz et al., 2011; Rubio-Diaz et al., 2010) and B locus genotypes could be inferred from pedigree and genotyping. This population consisted of 84 individuals. The second was a core set of the larger population of 84 in which B loci that were identical by descent based on pedigree were removed from the analysis in order to avoid oversampling the same mutations and confounding genetic replication with technical replication. This core collection consisted of 31 individuals.

Results

We surveyed tomatoes from a variety of seed sources based on descriptions of flesh color and carotenoid content in order to determine which tomato varieties contained β-carotene as the predominant pigment. We measured absorbance using HPLC of hexane extractions from pureed fruit samples for 29 varieties (Table 2.1). Nine varieties contained >1 mg β-carotene/100 g fresh weight tissue and were considered high β-carotene accessions. The amount of β-carotene within these nine varieties ranged from 1.60 to 4.06 mg/100 g. Of the remaining samples, nine varieties were classified as primarily containing all-trans-lycopene. The variety ‘Caro Red’ (LA2374) was unique in that it was classified as high β-carotene (2.97 mg/100 g fresh weight), but also contained all-trans-lycopene (3.64 mg/100 g fresh weight) as the predominant carotenoid. Eleven varieties contained tetra-cis-lycopene as the predominant carotenoid. The presence of natural variation in β-carotene levels suggested that differences in genetic background, the source of the B allele, or both might play a role in determining β-carotene content.

Sanger sequencing was performed for 27 tomato accessions consisting of nine high β-carotene accessions, the green-fruited S. pennellii LA0716 accession (source of high β-carotene in LA3502), the tangerine variety ‘Verna Orange’, five red-fruited accessions, as well as six S. cheesmanii and five S. galapagense accessions with the B allele. These 27 sequences were compared with the references Heinz 1706 (S. lycopersicum) and LA1589 (S. pimpinellifolium) as well as recently published sequences from three orange-fruited S. galapagense accessions (LA0483, LA1044, LA1401) (Aflitos et al., 2014). Contigs assembled from sequencing reads covered approximately 1600 bp 5′ to the chromoplast-specific Cyc-B gene. Sequence data showed that the promoter region of Cyc-B was highly variable. Sequences from Sungold appeared to be heterozygous and were therefore not included in the phylogenetic tree or alignments. We observed 157 SNPs and 19 indels relative to the Heinz 1706 reference sequence (Figure 2.1). LA4380 contained a 258 bp insertion, and we therefore sequenced over 1800 bp for this accession. It appears that all the B promoters are derived from wild species.

We created alignments and phylogenetic trees from the sequence data to visualize the relationship between alleles. The trees using MUSCLE and T-Coffee alignments resulted in identical clustering in MEGA, but the tree using Clustal Omega’s alignment shifted the clustering of LA4380. This shift is most likely due to how Clustal Omega aligns the large insertion in the LA4380 sequence. This insertion occurs within a poly-C string. Because Clustal Omega is less consistent in positioning flanking sequences relative to the insertion, the differences between accessions were magnified in the alignment. While the Clustal Omega tree places LA4380 as the furthest outlier from the red-fruited species, MUSCLE and T-Coffee alignments place this accession within a cluster of sequences originating from green-fruited relatives of the cultivated tomato. Visual inspection suggests that MUSCLE and T-Coffee minimize differences associated with the position of insertions and deletions.

Eleven distinct B promoter alleles were identified as a result of the clustering analysis (Figure 2.2). Six of these represent alleles that have not previously been described (Dalal et al., 2010; Lesley, 1943; Lincoln and Porter, 1950; Ronen et al., 2000; Stommel, 2001). Clustering also suggested clades separating the B promoter alleles along the classical split between the subgenera Eulycopersicon and Eriopersicon, corresponding to red-fruited and green-fruited species. Accessions in which the B promoter allele originated from known green-fruited species formed the furthest out-group from the red-fruited sequences. The promoter sequence from Jaune Flamme clustered with promoter sequences from the green-fruited wild species S. habrochaites. The sequence for this variety was identical to that of LA0316. The Galapagos Islands tomatoes formed an independent clade from both the red and green-fruited species. The high β-carotene variety 97L97 derives its B allele from S. galapagense and clustered in the Galapagos group. The variety Western Seed 862978 appeared to cluster closely with the red-fruited accessions, though independently of the larger group. Purdue 89-28-1 also formed a distinct group near the green-fruited clade. The high β-carotene variety ‘Caro Red’ contained a B promoter allele that was identical to the majority of the red-fruited accessions, suggesting an alternative mechanism for high β-carotene in this variety.

Association analysis was conducted using 52 SNP or InDel polymorphisms for which the minor allele frequency was greater than 0.05. Differences among the analyses were noted primarily in the number of polymorphisms detected as significantly associated with high β-carotene and the P-value accompanying the SNP or InDel in each analysis. Analysis based on the larger population and the smaller core set identified the same two SNPs as the most likely candidates for the functional mutations conferring high β-carotene (Figure 2.3). The first SNP, C (red) > T (high beta), is located 401 bp upstream of the ATG (45899723) at position 45900124 on the H1706 reference sequence (Figure 2.4). The second SNP, G (red) > A (high beta), occurs 506 bp 5′ to the putative Cyc-B start codon, at position 45900229 on the H1706 reference (SLch62.50) sequence (Figure 2.5). These two SNPs occur in all the high β-carotene accessions. The majority (36/52) of SNP or InDel markers were significant at P < 0.01 using the naïve regression model on the full population (n = 84) for which both phenotypic values and sequence data were available. The SNPs at position -401 and -506 were significant at P = 3.74 × 10⁻³⁵. We explored more conservative analyses to adjust for both phenotypic distribution and sub-structure within the population. Using a Kruskal–Wallis (K-W) test to account for a non-parametric distribution of phenotypic values, SNPs at position -401 and -506 were significant at P < 2.2 × 10⁻¹⁶. Again, a high proportion of SNP or InDel markers were significantly associated with β-carotene content using the K-W test, with 52 significant at P < 0.05 and 34 significant at P < 0.01. When the association model was corrected for local structure using the similarity matrix derived from sequence alignment, SNPs -401 and -506 were the only markers associated with β-carotene content (P = 8.64 × 10⁻⁴).
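The reference coordinates reported for the two candidate SNPs are consistent with their stated distances from the ATG, as the short Python check below shows. The ATG coordinate and the two offsets come from the text; the function names are hypothetical, and the sketch assumes, as the reported positions imply, that bases upstream of the start codon have larger reference coordinates than the ATG.

```python
ATG_POSITION = 45899723  # reference coordinate of the putative start codon (from the text)


def upstream_offset_to_reference(offset_bp, atg_position=ATG_POSITION):
    """Convert a distance upstream of the ATG (in bp) to a reference coordinate."""
    if offset_bp <= 0:
        raise ValueError("offset must be a positive number of base pairs")
    return atg_position + offset_bp


def reference_to_upstream_offset(position, atg_position=ATG_POSITION):
    """Convert a reference coordinate to its distance upstream of the ATG."""
    offset = position - atg_position
    if offset <= 0:
        raise ValueError("position is not upstream of the ATG")
    return offset


for offset in (401, 506):
    print(-offset, upstream_offset_to_reference(offset))
```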

Because a large number of red-fruited accessions derived the B locus by descent, we narrowed the population to a smaller core set in order to minimize the effect of treating identical alleles as separate genotypes in the analysis. Using the smaller core population (n = 31), the SNPs at position -401 and -506 were significant at P = 4.32 × 10⁻⁸ with the K-W model, P = 7.4 × 10⁻¹⁰ with the naïve regression and P = 0.001 in the K-corrected model.

Discussion

Plant germplasm repositories and horticultural seed catalogs contain a wide array of diversity with respect to tomato fruit color. We profiled 29 tomato accessions and determined that of the 20 with orange fruit, nine had elevated β-carotene levels and eleven contained tetra-cis-lycopene as the predominant carotenoid. Tetra-cis-lycopene is an alternate isomer and imparts a characteristic “tangerine” color to tomatoes (Zechmeister, 1944). Of the nine high β-carotene tomatoes, five were known to be derived from wide crosses in which the green-fruited allele from the B donor imparted orange fruit when introgressed into a red-fruited genetic background.

Sequencing confirmed that the region 5′ to Cyc-B is highly diverse and identified new alleles. We observed known polymorphisms in S. pennellii and S. habrochaites and also identified additional variation that was not described previously (Dalal et al., 2010; Ronen et al., 2000). This novel variation can be used to develop markers for high β-carotene content from various wild tomato species.

Clustering results from this wider panel of B promoters support previous statements that the B alleles found in vintage varieties, contemporary breeding lines and hybrids were originally derived from wild tomato species (Lincoln and Porter, 1950; Ronen et al., 2000). Clustering of the high β-carotene varieties in our promoter tree implies that the high β-carotene alleles are ancestral. Although the origin of some of the sequence variation can be obtained from pedigree records, the source of variation in vintage varieties is more ambiguous. For example, within the tomato seed saving community, Jaune Flamme is described as an orange-fruited “heirloom” originating from France (Kingsolver, 2007). Sequence comparison shows that its promoter is identical to that of S. habrochaites LA0316. This identity suggests that Jaune Flamme derived its B allele following the introgression of an S. habrochaites B allele into cultivated germplasm. This introgression may have occurred as early as the 1950s, when studies of this variation were initiated (Lincoln and Porter, 1950; Tomes et al., 1956; Tomes et al., 1954). For most of the high β-carotene germplasm we evaluated, the B allele appears to have been introgressed from a wild relative, with the possible exception of Western Seed 862978.

In addition, the variety ‘Caro Red’ was the only high β-carotene variety to contain a B allele identical to red-fruited accessions. Previous studies have mentioned the presence of the additional gene Beta-modifier (MoB) that interacts with the B allele to influence β-carotene content (Tomes et al., 1954). In the case of ‘Caro Red,’ this modifier or other unknown modifiers may be present and cause elevated levels of β-carotene without a sharp decline in lycopene content.

It has been proposed that sequence elements upstream of the promoter may be responsible for regulating transcription in B or suppressing transcription in b (Ronen et al., 2000). Association analysis using different statistical models and two populations identified the SNPs at -401 and -506 as the most significantly associated with high β-carotene. These two SNPs were also present in B loci described in the PhD dissertation of Falcone (2009). They are also found in the S. cheesmanii and S. galapagense accessions included as part of the “150 genome project” (Aflitos et al., 2014) and the “360 genome project” (Lin et al., 2014), further strengthening the argument that these SNPs are the most likely candidates for the functional mutations responsible for high β-carotene. One or both of these SNPs are the most likely changes affecting expression of the β-carotene phenotype associated with B alleles. One SNP, located at -506 bp relative to the ATG, is within a putative CAAT box and is possibly involved in binding transcription factors (Dalal et al., 2010). Understanding how the two SNPs contribute to phenotypic variation will require further analysis. In addition, more complete analysis will be required to determine whether the multiple other SNPs and indels also play a role in modulating the Beta phenotype.
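As a check on the coordinates reported for these two SNPs, the following minimal Python sketch converts promoter offsets to reference positions. It assumes that positions 5′ of the ATG have larger reference coordinates than the start codon at 45899723 (Heinz 1706, SL2.50), which is what the reported SNP positions imply. The function and constant names are illustrative and are not part of the original analysis.

```python
# Position of the Cyc-B start codon (ATG) on the Heinz 1706 reference (SL2.50).
ATG_POSITION = 45899723


def promoter_offset_to_reference(offset_bp, atg_position=ATG_POSITION):
    """Convert a promoter offset (e.g. -401) to a reference coordinate.

    Assumption: positions 5' of the ATG have larger reference coordinates,
    as implied by the SNP positions reported in the text.
    """
    if offset_bp > 0:
        raise ValueError("promoter offsets are expected to be zero or negative")
    return atg_position - offset_bp


for offset in (-401, -506):
    print(offset, promoter_offset_to_reference(offset))
```

The two values printed can be compared with the positions given for the SNPs in the legends of Figures 2.4 and 2.5.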

The presence of eleven distinct B promoter haplotypes not only illustrates the role of wild species in contributing new genetic variation but also suggests the possibility of additional variation within the genus Lycopersicon. Breeding efforts involving wild tomato species clearly contributed to sequence diversity 5′ to the B coding sequence.
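Counting distinct haplotypes from an alignment amounts to grouping accessions whose aligned promoter sequences are identical. The sketch below shows one way to do this in Python. The function name and the toy sequences are hypothetical and are not data from this study; the sketch assumes that gaps are coded as "-" and that all sequences come from the same alignment.

```python
from collections import defaultdict


def group_haplotypes(aligned):
    """Group accessions that share an identical aligned sequence.

    `aligned` maps accession name -> aligned sequence (gaps as '-').
    Returns a list of accession-name lists, one per distinct haplotype.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    groups = defaultdict(list)
    for name, seq in aligned.items():
        groups[seq.upper()].append(name)
    return list(groups.values())


# Toy 12-column alignment for illustration only; not data from this study.
toy_alignment = {
    "acc_1": "GAAAGTTCACCG",
    "acc_2": "GAAAGTTCACCG",
    "acc_3": "GAAAGTTCACCA",
    "acc_4": "GAAA--TCACCA",
}
haplotypes = group_haplotypes(toy_alignment)
print(len(haplotypes), haplotypes)
```

Treating every column equally is a design choice made for simplicity; an analysis could instead mask ambiguous bases or terminal gaps before grouping.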


Both transgenic approaches and traditional breeding have been successful at increasing β-carotene content in tomato fruit; however, more precise control may be possible by utilizing Cyc-B promoter variation (Apel and Bock, 2009; D’Ambrosio et al., 2004; Dalal et al., 2010; Ronen et al., 2000; Stommel, 2001). The presence of distinct alleles in the varieties we sequenced suggests the possibility of modulating levels of β-carotene through allele choice. Within the subset of Beta accessions, we observed a range of β-carotene levels. It is possible that diverse genetic backgrounds or modifying alleles such as MoB are responsible for this variation, but it is equally likely that sequence variation directly 5′ to B is manifested in the observed phenotypic variation. The novel mutations identified in this study may be used for marker-assisted breeding of tomato varieties in order to test this hypothesis. The identification of nine haplotypes associated with high β-carotene may provide plant breeders with increased diversity to modulate β-carotene and carotenoid production through selection.


Table 2.1: Carotenoid profiling by HPLC of 29 orange and red-fruited tomato accessions. Carotenoid values are in mg/100 g fresh weight.

Name                Accession       Seed Source (z)                         Class      β-carotene  trans-lycopene  tetra-cis-lycopene
LA0316              LA0316          TGRC                                    High β     4.06        0.96            NA
862978              862978          Western Seed Americas Inc. (via OARDC)  High β     3.24        0.84            NA
Purdue 89-28-1      Purdue 89-28-1  Harris Moran Seed Co.                   High β     3.19        0.94            NA
Fla. 456            Fla. 456        GCREC                                   High β     3.06        1.62            NA
LA4380              LA4380          TGRC                                    High β     3.00        2.63            NA
Caro Red            LA2374          TGRC                                    High β     2.97        3.64            NA
97L97               97L97           John Stommel (via OARDC)                High β     2.68        0.14            NA
Jaune Flamme        N/A             Tomato Grower’s Supply (via OARDC)      High β     1.78        1.19            NA
Sungold             N/A             FedCo Seeds                             High β     1.60        1.08            NA
LA4421              LA4421          TGRC                                    Red        0.97        3.35            NA
Purple Calabash     LA2377          TGRC                                    Red        0.79        8.04            NA
German Giant        N/A             Tomato Grower’s Supply                  Red        0.31        4.25            NA
Georgia Streak      LA2969          TGRC                                    Red        0.29        3.30            NA

(z) Seed sources from academic institutions included the Tomato Genetics Resource Center at the University of California, Davis, in Davis, California (TGRC); the Gulf Coast Research and Education Center at the University of Florida, in Wimauma, Florida (GCREC); and the Ohio Agricultural Research and Development Center at The Ohio State University, in Wooster, OH (OARDC). Commercial seed sources included Western Seed Americas, in Plant City, Florida; Harris Moran Seed Co. in Modesto, California; FedCo Seeds in Waterville, Maine; and Tomato Grower’s Supply in Fort Myers, Florida. Seed sources designated “via OARDC” indicate that seeds were already present in the collections at the OARDC in Wooster, OH.

Table 2.1: Continued

Name                Accession       Seed Source (z)                         Class      β-carotene  trans-lycopene  tetra-cis-lycopene
OH8245              SCT_0116        OARDC                                   Red        0.26        6.08            NA
M82                 LA3475          OARDC                                   Red        0.24        5.95            NA
Georgia Streak      N/A             Tomato Grower’s Supply                  Red        0.23        2.14            NA
OH7530              OH7530          OARDC                                   Red        0.22        6.96            NA
BR03-7264           BR03-7264       OARDC                                   Red        0.20        11.46           NA
Hawaiian Pineapple  N/A             Tomato Grower’s Supply                  Tangerine  NA          0.06            1.82
Kellogs Breakfast   N/A             Tomato Grower’s Supply                  Tangerine  NA          0.01            3.17
Kentucky Beefsteak  N/A             Tomato Grower’s Supply                  Tangerine  NA          0.08            3.37
Orange Strawberry   SCT_0306        Tomato Grower’s Supply                  Tangerine  NA          0.21            4.79
Persimmon           N/A             Tomato Grower’s Supply                  Tangerine  NA          0.06            3.15
Sunray F            N/A             Tomato Grower’s Supply                  Tangerine  NA          0.04            7.90
Tangella            N/A             Tomato Grower’s Supply                  Tangerine  NA          0.14            3.37
Valencia            N/A             Tomato Grower’s Supply                  Tangerine  NA          0.03            3.19
Verna Orange        N/A             Tomato Grower’s Supply                  Tangerine  NA          0.04            4.40
Orange Minsk        N/A             Tomato Grower’s Supply                  Tangerine  NA          0.08            5.84
LA2971              LA2971          TGRC                                    Tangerine  NA          0.12            2.69


Figure 2.1: Variation in the DNA sequence directly 5′ to the Cyc-B gene.

The figure illustrates sequence variation among 31 tomato accessions with either (a) red-fruited promoters or (b) high β-carotene promoters. Each ring represents a unique tomato variety; from top to bottom, the rings are: H1706, S. lyc.; LA1589, S. pim.; M82, S. lyc.; OH8245, S. lyc.; BR03-7264, S. lyc.; Caro Red, S. lyc.; Georgia Streak, S. lyc.; Purple Calabash, S. lyc.; Verna Orange, S. lyc.; LA4380, unknown; Western Seed 862978, unknown; Jaune Flamme, unknown; Purdue 89-28-1, unknown; LA0716, S. pen.; FL456, S. chil.; LA0316, S. hab.; 97L97, S. gal.; LA0531, S. che.; LA1039, S. che.; LA1406, S. che.; LA1407, S. che.; LA1409, S. che.; LA1412, S. che.; LA0438, S. gal.; LA0483, S. gal.; LA0526, S. gal.; LA1044, S. gal.; LA1136, S. gal.; LA1141, S. gal.; LA1401, S. gal.; and LA1410, S. gal. Nucleotide differences, either insertion/deletion mutations or single nucleotide polymorphisms, are shown in black. “0” represents the start codon ATG at position 45899723 according to the Heinz 1706 reference genome SL2.50. Sequence alignment was created using MUSCLE version 3.8 (Edgar, 2004).

Figure 2.2: Phylogenetic tree of the Cyc-B promoter region (1600 bp).

Circles indicate tomato varieties that exhibit the Beta phenotype or contain the B allele. Squares indicate varieties in which β-carotene is not the predominant carotenoid in ripe fruit. Neighbor-joining tree was constructed using MEGA 6.06 (Tamura et al., 2013). Ten thousand bootstrap replications were performed using the Tamura-Nei model (Tamura and Nei, 1993) and bootstrap values are given for each branch.


Figure 2.3: Putative functional SNPs within the B promoter identified by association analysis.

“Manhattan plot” showing position of SNP or insertion/deletion variation relative to the Cyc-B start codon (ATG = 0) vs –LogP based on statistical analysis for association with β-carotene levels in fruit. Grey bar indicates black bar, P = 0.001. “0” represents position 45899723 according to the Heinz 1706 reference genome SL2.50. Top graph is an association analysis conducted using the R statistical program (R Core Team, 2014) with the rrBLUP package (Endelman, 2011). A correction for structure was based on local sequence variation by converting the distance matrix from MUSCLE to a similarity matrix. Bottom graph is a naïve regression with no correction for structure.

H1706_S.LYC.            GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1589_S.PIM.           GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
M82_S.LYC.              GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
OH8245_S.LYC.           GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
BR03_7264_S.LYC.        GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
CARO_RED_S.LYC.         GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
GEORGIA_STREAK_S.LYC.   GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
PURPLE_CALABASH_S.LYC.  GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
VERNA_ORANGE_S.LYC.     GAAAGTTCACCGAGAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA4380_UNKNOWN          AAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTATCGAAG--TATAGTGC
WESTERN_SEED_UNKNOWN    GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
JAUNE_FLAMME_UNKNOWN    GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTATCGAAGTATATAGTGC
PURDUE_89281_UNKNOWN    GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTATCGAAGTATATAGTGC
LA0716_S.PEN.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTATCGAAGTATATAGTGC
FL456_S.CHIL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA0316_S.HAB.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTATCGAAGTATATAGTGC
97L97_S.CHE.            GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA0531_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1039_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1406_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1407_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1409_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1412_S.CHE.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA0438_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA0483_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA0526_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1044_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1136_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1141_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1401_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
LA1410_S.GAL.           GAAAGTTCACCGAAAATAATTTTCTATTTGTGGCATAACTAGTA--------TATAGTGC
                         ************ ******************************        ********

H1706_S.LYC.            AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1589_S.PIM.           AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACTGAATG
M82_S.LYC.              AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
OH8245_S.LYC.           AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
BR03_7264_S.LYC.        AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
CARO_RED_S.LYC.         AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
GEORGIA_STREAK_S.LYC.   AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
PURPLE_CALABASH_S.LYC.  AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
VERNA_ORANGE_S.LYC.     AAAGTACTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA4380_UNKNOWN          AAAATATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTGAAACCGAATG
WESTERN_SEED_UNKNOWN    AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
JAUNE_FLAMME_UNKNOWN    AAAATATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTGAAACCGAATG
PURDUE_89281_UNKNOWN    AAAATATTTTTCATTTTCTTGTCATCGAAAATTATTTATAATTGAAATTGAAATCAAATG
LA0716_S.PEN.           AAAATATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTGAAACCGAATG
FL456_S.CHIL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA0316_S.HAB.           AAAATATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTGAAACCGAATG
97L97_S.CHE.            AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA0531_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1039_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1406_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1407_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1409_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1412_S.CHE.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA0438_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA0483_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA0526_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1044_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1136_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1141_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1401_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
LA1410_S.GAL.           AAAGTATTTTTCATTTTCTTGTCACCGAAAATTATTTATAATTGAAATTAAAACCGAATG
                        *** ** ***************** ************************ ***   ****

Figure 2.4: First putative functional single nucleotide polymorphism (SNP) occurring in the Cyc-B promoter.

SNP alleles from tomato varieties with the B mutation are outlined in black. The SNP, C (red) > T (high β-carotene), is located 401 bp 5′ of the ATG (45899723) at position 45900124 on the H1706 reference sequence (SL2.50ch06).

Figure 2.5: Second putative functional single nucleotide polymorphism (SNP) occurring in the Cyc-B promoter.

SNP alleles from tomato varieties with the B mutation are outlined in black. The SNP, G (red) > A (high β-carotene), occurs 506 bp 5′ to the putative Cyc-B start codon (45899723), at position 45900229 on the H1706 reference (SL2.50).

Chapter 3:

Naturally occurring variation in the promoter of the chromoplast-specific Cyc-B gene in tomato can be used to modulate levels of β-carotene in ripe tomato fruit

Acknowledgments

This chapter is intended as a multi-author publication with the following authors: Caleb Orchard, Jessica Cooperstone, Steven Schwartz and David Francis. These authors all made important contributions to the project and without their support, the project would not have been possible. I (Caleb Orchard) designed the B-specific PCR marker, selected markers for background genome selection, and performed selection. I designed and organized the field trials, collected fruit, and performed carotenoid analysis. I conducted all statistical analysis and wrote the manuscript (Chapter 2). Jessica Cooperstone provided training and assistance during carotenoid profiling of the tomato samples from field trials. Dr. Steven Schwartz kindly provided supplies, equipment and lab space used for carotenoid profiling. Dr. David Francis is the PI for the project and provided input and direction in all facets of the experiment and writing of the chapter.


Abstract

Carotenoids are naturally occurring pigments found in a variety of colors in plants, fungi and . In the human diet, carotenoids serve as precursors for essential vitamins and nutrients. Although many carotenoids exhibit pro-vitamin A activity and are available in food crops, vitamin A deficiency remains the leading cause of preventable blindness in many developing countries. Tomato (Solanum lycopersicum, L.) is a globally prevalent food crop and an excellent source of carotenoids. Ripe tomatoes normally contain the red-colored pigment lycopene and low levels of β-carotene. Levels of β-carotene can be increased in tomatoes by breeding varieties to contain B, an allele of the chromoplast-specific Cyc-B gene. The objective of this research was to determine if differences in the promoter region of the B gene lead to varying amounts of β-carotene in tomato fruit. A marker-assisted backcross breeding scheme leveraging genome-wide SNPs was used to rapidly develop a series of genetic resources containing different alleles of the B promoter in a uniform genetic background. Replicated field trials demonstrated that distinct alleles can be used to modulate the levels of β-carotene in tomato. These genetic resources are available to develop β-carotene enriched food products or to study dietary absorption and utilization of carotenoids in the food matrix.


Introduction

Recent interest in selecting tomato (Solanum lycopersicum, L.) varieties with diverse colors has been driven by interest in market diversification and the nutritional properties of pigments found in tomato fruit. Tomatoes with pigments ranging across the color spectrum are routinely visible at markets and grocery stores. As consumers become more health-conscious, the use of tomato as a vehicle for delivering nutritional compounds becomes increasingly relevant.

Carotenoids play an important role in the biology of plants and provide nutritional benefits to animals that consume plants. Considerable documentation is available concerning the health benefits of carotenoids in humans (Johnson, 2002; Rao and Rao, 2007). Consuming fruits and vegetables, rich sources of carotenoids, can provide nutritional and health benefits such as decreased risk of certain diseases. Lycopene is a red pigment in tomatoes that is associated with a reduced risk for prostate cancer, among other diseases (Er et al., 2014). The red-orange pigment β-carotene is the most common form of pro-vitamin A and plays a role in cellular development and immune function (Grune et al., 2010). Other carotenoids such as lutein may protect against macular degeneration, and cis-lycopene is of interest due to its increased bioavailability compared to all-trans-lycopene (Armoza et al., 2013; Hadley et al., 2003; Unlu et al., 2007a; Unlu et al., 2007b).

Tomato accumulates carotenoids at different stages of the ripening process. Green tomatoes primarily contain β-carotene, lutein and violaxanthin, while breaker stage tomatoes begin to accumulate lycopene. The activity of lycopene β-cyclase (Cyc-B), the enzyme that converts lycopene to β-carotene, decreases, and by the ripe stage lycopene is the predominant carotenoid in red tomatoes (Bramley, 2002; Namitha et al., 2011). The Cyc-B gene encodes a chromoplast-specific lycopene β-cyclase that is active in tomato fruit (Ronen et al., 2000). Natural variation among alleles of the Cyc-B gene causes elevated levels of either lycopene or β-carotene in the fruit. Frame-shift mutations in the structural gene occurring in the old-gold (og) and old-gold-crimson (ogc) alleles result in high lycopene tomatoes. In contrast, plants with distinct B alleles have elevated levels of β-carotene in the fruit due to increased cyclization of lycopene by lycopene β-cyclase (Dalal et al., 2010; Chapter 2).

The regulation of B is not fully understood, but prior research suggests that the promoter contains sequence variation important for transcriptional control (Dalal et al., 2010; Ronen et al., 2000). Multiple alleles of the B promoter exist, including several from S. pennellii, S. habrochaites, S. galapagense and S. chilense. Recently released processing tomato varieties with the B allele from S. galapagense and containing high β-carotene include 97L63, 97L66 and 97L97 (Stommel, 2001). Due to natural variation at the B locus, controlling levels of β-carotene through allele choice may be feasible.

In order to provide new genetic resources for nutritional research on β-carotene and to add to the existing collection of high β-carotene processing varieties, we elected to introgress B alleles into a uniform processing tomato background using marker-assisted backcross breeding. Marker-assisted backcross breeding has been widely used in plants as a means to introgress desirable traits (Collard and Mackill, 2008; Jiang, 2013). In a backcrossing strategy, markers linked to the trait of interest can be used to select individuals containing the desirable trait. This first selection step is often referred to as “foreground selection” and is especially useful when phenotyping the trait is expensive, difficult or time-consuming (Collard and Mackill, 2008). Foreground selection is also advantageous when the phenotype of interest is recessive or is not displayed until the later stages of plant maturity. Next, plants with recombination events between the target locus and flanking markers are selected, thus reducing the size of the donor chromosome segment and limiting linkage drag (Frisch et al., 1999; Hospital, 2005).

Background markers located on other chromosomes are then used to recover desirable alleles from the recurrent parent (Hospital et al., 1992). The selection for recurrent parent alleles is often referred to as “background selection”. Simulations suggest that repeated rounds of background selection can eliminate nearly all remnants of the donor genome and drive individuals to homozygosity after three to four generations of backcrossing (Frisch et al., 1999). While high marker density is not required, working with large populations, especially in early generations (i.e., BC1), is suggested to further increase gains in time and cost (Frisch and Melchinger, 2005; Herzog and Frisch, 2011). This increase in efficiency is a marked improvement over conventional backcrossing, which may take six to eight generations to complete introgression. Spacing markers equally across chromosomes has been proposed as a strategy to further improve background selection efficiency (Herzog and Frisch, 2011; Servin and Hospital, 2002). Marker spacing depends upon chromosome size, the total number of markers per chromosome and the generation in which selection is performed (Servin and Hospital, 2002). Simulation studies recommend equally spaced markers at approximately 10-20 cM (Frisch and Melchinger, 2005; Herzog and Frisch, 2011; Servin and Hospital, 2002).
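For comparison with these simulation results, the expected proportion of recurrent parent genome in the absence of marker selection can be computed directly. The sketch below assumes the standard expectation of 1 - (1/2)^(n+1) for generation BCn at loci unlinked to the target. It is an illustration of why unselected backcrossing needs many generations, not a reproduction of the cited simulations, and the function name is hypothetical.

```python
def expected_recurrent_fraction(n_backcrosses):
    """Expected recurrent parent genome fraction in generation BC_n.

    Assumes no marker selection and loci unlinked to the target gene.
    n_backcrosses = 0 corresponds to the F1 (half of each parent).
    """
    if n_backcrosses < 0:
        raise ValueError("number of backcrosses must be non-negative")
    return 1.0 - 0.5 ** (n_backcrosses + 1)


for n in range(1, 9):
    print(f"BC{n}: {expected_recurrent_fraction(n):.4f}")
```

Under this expectation an unselected BC1 plant carries on average 75% recurrent parent genome, so background selection aims to identify the individuals in the upper tail of that distribution.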

Despite a theoretical advantage, the practice of background selection has not been widely reported in tomato. Its use has been restricted due to lack of polymorphic markers and few examples of crosses between cultivated varieties within published research.

Recently, the availability of SNP resources has increased the opportunity for background selection (Hirakawa et al., 2013; Sim et al., 2012a; Sim et al., 2012b). A primary objective of this study was to determine if marker-assisted backcross breeding could accelerate trait introgression in tomato.

Wild tomato species often harbor important alleles for cultivar improvement. We introgressed B alleles into the same processing variety. Although these alleles originated from the wild species S. pennellii and S. habrochaites, the genetic backgrounds of the donors were the cultivated varieties ‘M82’ and ‘Jaune Flamme’. Our goal was to determine if differences in the promoter region of B (alleles of B) lead to varying amounts of β-carotene in tomato fruit. A molecular marker for the B promoter was developed to select for high β-carotene content and a custom marker array facilitated recovery of the recurrent parent genome. Finally, replicated field trials were conducted to determine if altered levels of β-carotene could be achieved. Variation that exists in tomato with respect to β-carotene may allow for the modulation of carotenoid production and further enhancement of tomato as a bio-fortified crop.


Materials and Methods

Plant Materials and Crossing. The Tomato Genetics Resource Center (TGRC) in Davis, California contributed the seeds for the donor parents LA3501 and LA3502. LA3501, also known as IL 6-2, and LA3502, also known as IL 6-3, contain the B allele introgressed into the M82 genetic background from S. pennellii (Eshed and Zamir, 1995). Seeds for ‘Jaune Flamme’ and the recurrent parent, OH8245, were in the germplasm collection at the Ohio Agricultural Research and Development Center in Wooster, Ohio. ‘Jaune Flamme’ is described as an orange-fruited “heirloom” tomato variety from France and was originally purchased from Seed Savers Exchange (Kingsolver, 2007). The processing variety OH8245 was developed by Stan Berry at the Ohio Agricultural Research and Development Center in Wooster, Ohio (Berry et al., 1991). 97L97 is an orange-fruited processing tomato variety developed by introgressing the B allele from S. galapagense accession LA0317 into Ohio 8245 (Stommel, 2001). Purdue 89-28-1 has the B allele from an unknown source and originated from Dr. Ed Tigchelaar’s breeding program at Purdue University. Tomato varieties were grown in the greenhouse at temperatures between 21 °C and 30 °C. At each generation of recurrent backcrossing, progeny were crossed to OH8245.

Marker Development. A molecular marker that distinguishes S. pennellii, ‘Jaune Flamme’, and red-fruited B alleles was developed using sequence variation in a region 655 bp upstream from the initiation codon of the B gene. We designed a primer pair flanking a region with two unique insertion-deletion mutations. The primer pair (5′ to 3′) is as follows: CGTCTTAGGCTTGGGTTAGTTG (forward) and TGAGCTTCGCAACTTTCTCA (reverse). This PCR-based marker was used to select for B alleles in both the heterozygous and homozygous states.
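Basic properties of the two primers can be checked with a few lines of Python. The sketch below computes length, GC content and a rough melting temperature using the Wallace rule (2 °C per A or T, 4 °C per G or C). This rule is only a coarse approximation for short oligonucleotides, and the helper names are illustrative rather than part of the original marker design.

```python
# Primer sequences (5' to 3') as given in the text.
PRIMERS = {
    "forward": "CGTCTTAGGCTTGGGTTAGTTG",
    "reverse": "TGAGCTTCGCAACTTTCTCA",
}


def gc_content(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(name, len(seq), f"GC={gc_content(seq):.2f}", f"Tm~{wallace_tm(seq)} C")
```

More accurate estimates would use a nearest-neighbor model and the actual salt and primer concentrations of the reaction, which are not given here.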

For the initial round of background selection, a custom panel of 96 single nucleotide polymorphism (SNP) markers polymorphic between OH8245 and either LA3501, LA3502 (M82) or ‘Jaune Flamme’ was chosen from the 7,720 SNP markers available on the SolCAP Infinium SNP Array (Sim et al., 2012a; Sim et al., 2012b). According to the SolCAP Infinium data, each background (Jaune Flamme or M82) had 28 unique markers that were polymorphic versus OH8245. Forty additional markers overlapped between all sources, totaling 68 polymorphic markers per source. Markers were selected on both arms of the twelve tomato chromosomes with a minimum distance of 5 cM between each SNP.
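The text does not describe how markers satisfying the 5 cM spacing criterion were picked. One simple possibility, shown here purely as an assumption, is a greedy pass along each chromosome that keeps a marker only if it lies at least 5 cM beyond the last marker kept. Marker names and positions in the example are hypothetical.

```python
def select_spaced_markers(markers, min_gap_cm=5.0):
    """Greedy choice of markers at least `min_gap_cm` apart per chromosome.

    `markers` is an iterable of (name, chromosome, position_cM) tuples.
    Returns the names of the retained markers in chromosome/position order.
    """
    chosen = []
    last_kept = {}  # chromosome -> position of the last retained marker
    for name, chrom, pos in sorted(markers, key=lambda m: (m[1], m[2])):
        if chrom not in last_kept or pos - last_kept[chrom] >= min_gap_cm:
            chosen.append(name)
            last_kept[chrom] = pos
    return chosen


# Hypothetical markers for illustration only; not SolCAP data.
toy_markers = [
    ("snp_a", 1, 0.0),
    ("snp_b", 1, 3.2),
    ("snp_c", 1, 5.0),
    ("snp_d", 1, 9.9),
    ("snp_e", 1, 10.0),
    ("snp_f", 2, 1.5),
    ("snp_g", 2, 4.0),
]
print(select_spaced_markers(toy_markers))
```

A greedy pass guarantees the minimum spacing but not even coverage; a panel design would normally also check that both chromosome arms are represented, as described above.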

Using the same criteria as in the first marker design process, we developed a custom panel of 72 SNP markers for the second round of background selection. According to the SolCAP Infinium data, the M82 background had 18 unique polymorphic markers versus OH8245, while the Jaune Flamme background had 29 unique markers. Twenty-five additional markers were polymorphic between OH8245 and both backgrounds. To further reduce the segments of donor parent genome, these markers were chosen in chromosomal regions that were not fixed after the first round of background selection.
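The composition of the two panels can be tallied as follows. This small sketch only restates the arithmetic in the text (source-specific markers plus markers shared by both backgrounds); the function name is illustrative.

```python
def panel_summary(unique_per_source, shared):
    """Return (panel size, informative markers per donor source)."""
    total = sum(unique_per_source.values()) + shared
    per_source = {src: n + shared for src, n in unique_per_source.items()}
    return total, per_source


# First round: 28 unique markers per background plus 40 shared markers.
round1 = panel_summary({"M82": 28, "Jaune Flamme": 28}, shared=40)
# Second round: 18 (M82) and 29 (Jaune Flamme) unique markers plus 25 shared.
round2 = panel_summary({"M82": 18, "Jaune Flamme": 29}, shared=25)

print(round1)
print(round2)
```

The totals agree with the stated panel sizes of 96 and 72 markers, and the per-source counts give the number of markers informative for each donor background.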


Marker-assisted Selection. Prior to genotyping, leaf tissue was collected from seedlings in the greenhouse and stored at -80 °C until DNA extraction. DNA extraction and polymerase chain reaction (PCR) were performed according to procedures described in Robbins et al. (2009). Briefly, DNA extraction was performed in 96-well cluster tubes (Corning, Inc., Corning, NY) using a CTAB procedure and a GenoGrinder (OPS Diagnostics, Lebanon, NJ). The PCR conditions included denaturation at 94 °C for 3.5 min, annealing at 56 °C for 30 s, and elongation at 72 °C for 60 s, repeated for 40 cycles. PCR products were run on a two percent agarose gel at 180 V. We genotyped backcross 1 (BC1) progeny with the B PCR marker and selected individuals heterozygous for the B promoter allele. Heterozygous individuals and the parental controls were genotyped with a custom 96 SNP marker array using a ligation-extension SNP assay as implemented on the Illumina BeadXpress platform (Illumina, Inc., San Diego, CA, USA). Following genotyping, data were quality-checked for correct clustering using Genome Studio v2011.1 (Illumina, Inc., San Diego, CA, USA). A custom R script was used to facilitate selection of individuals containing the highest percentage of recurrent parent genome (R Core Team, 2014). The R script utilized the data.table v1.92 package for genotype calling and a custom script for allele frequency counting (Chan, 2012; Dowle et al., 2014). Selected progeny were then subjected to a further round of backcrossing.

BC2 progeny were again genotyped with the B PCR marker and individuals heterozygous for the B promoter were identified for a second round of background selection. These BC2 progeny and parental controls were genotyped with 72 SNP markers using the Kompetitive Allele Specific PCR (KASP) assay (http://www.lgcgenomics.com). Genotyping and quality analysis were performed by K-bio (LGC Genomics) in Beverly, Massachusetts. Selection was again performed using the custom R script to identify individuals with the highest proportion of OH8245 alleles.
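The background selection step described above amounts to ranking heterozygous progeny by the share of scored marker alleles that match the recurrent parent. The following is a minimal Python sketch of that idea, not the custom R script used in this study. It assumes a hypothetical coding in which each SNP call is a two-letter string ("AA" or "AB"), "A" is the OH8245 allele, and missing calls are stored as None; the plant names and calls are made up for illustration.

```python
def recurrent_parent_fraction(calls, recurrent="A"):
    """Fraction of scored alleles that match the recurrent parent allele."""
    scored = [c for c in calls if c is not None]  # skip missing calls
    if not scored:
        return float("nan")
    matches = sum(c.count(recurrent) for c in scored)
    return matches / (2 * len(scored))  # two alleles per diploid call


def rank_progeny(genotypes, top_n):
    """Return the top_n (plant, fraction) pairs, highest fraction first."""
    scores = {plant: recurrent_parent_fraction(calls)
              for plant, calls in genotypes.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical calls for three heterozygous progeny at four SNPs.
example = {
    "plant_1": ["AA", "AB", "AA", "AA"],
    "plant_2": ["AB", "AB", "AA", None],
    "plant_3": ["AA", "AA", "AA", "AA"],
}
best = rank_progeny(example, top_n=2)
```

In this toy example the two plants with the most OH8245 alleles are retained for the next backcross. A real panel would also need the quality filtering and monomorphic-marker removal described in the text.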

Field Trials. Following the first round of background selection, BC1 progeny with >83% recurrent parent markers and the B allele were self-pollinated to produce BC1S1 plants. Multiple BC1S1 individuals were retained for the Jaune Flamme and LA0716 sources. These were maintained as separate families within each of the three donor populations. Within each population and family, BC1S1 plants were genotyped with the B PCR marker and separated into three classes relative to the state of the B locus: BB, Bb, and bb. Following genotyping, the BC1S1 plants were transplanted in the field. In the summer of 2013, BC1S1 plants were grown in the field in a randomized complete block design at two locations: Wooster, Ohio and Fremont, Ohio. Each location contained two replications. Plants were grown in plots ten feet long, with one and one half feet spacing between plants. Most plots contained between eight and ten plants. Replicated checks were included at each location. Check varieties included Jaune Flamme, M82, OH8245, Purdue 8928-1, LA3501, LA3502, and 97L97. Ten ripe fruit were harvested from each plot. To avoid differences in ripeness, harvest was staged to collect fruit from the second cluster at full ripeness. Over a period of 4 weeks, a total of five harvests at the Wooster location and seven at the Fremont location were performed to account for differences in plant maturity. Fruit from each plot were pureed in a Waring blender (Waring Blender, Stamford, CT, U.S.A.) and stored in 50 mL tubes at -20 °C until carotenoid extraction.


Field trials were conducted again during the summer of 2014. Following the second round of background selection, selected BC2 progeny were self-pollinated and separated according to population and family, with population linking the selections back to the B source and family linking back to the BC1 selection. BC2S1 plants were genotyped with the B PCR marker and divided into BB, Bb, and bb allele classes. Plants were grown on the same farms in Wooster and Fremont, Ohio. The experimental design was the same as that used in 2013, except that the LA3501 population was dropped because of the extreme plant necrosis observed in 2013. Checks for the 2014 field trial were PS696 and the hybrid formed by crossing 97L97 and Purdue 8928-1. Fruit samples were treated as described above.

Carotenoid Profiling. Carotenoid extractions and high performance liquid chromatography (HPLC) were performed as described previously (Ferruzzi et al., 1998; Chapter 2). Briefly, carotenoid extractions used a modified hexane extraction method (Ferruzzi et al., 1998). Carotenoid extracts were divided into 2 mL aliquots, dried using nitrogen and stored at -80 °C until HPLC analysis. Carotenoid extracts were resuspended in 1 mL/250 μL of 1:1 MTBE:MeOH and filtered through a 13 mm, 0.2 μm nylon filter. HPLC analysis was conducted on a Waters Alliance 2695 HPLC with a Waters 996 photodiode array detector (Waters, Milford, MA) using a C30 column, 4.6 x 150 mm, 5 μm pore size (YMC Inc., Wilmington, NC) at 30 °C. A binary gradient consisting of two solvents was used (solvent A: 88% MeOH, 5% MTBE, 5% H2O and 2% of a 2% solution of aqueous ammonium acetate; solvent B: 78% MTBE, 20% MeOH and 2% of a 2% solution of aqueous ammonium acetate). The injection volume was 10 μL and the flow rate was 1.7 mL/min. Standard curves were generated using external controls (Sigma, St. Louis) and used to quantify all-trans-lycopene (471 nm) and β-carotene (450 nm).
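Quantification against an external standard curve can be illustrated with a short Python sketch. This is a generic least-squares calibration under stated assumptions, not the procedure or software used in this study: the detector response is assumed linear over the working range, and the standard concentrations and peak areas below are hypothetical values chosen only to show the arithmetic.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate concentration from peak area."""
    return (area - intercept) / slope


# Hypothetical standards: concentration (arbitrary units) vs. peak area.
standard_conc = [1.0, 2.0, 4.0, 8.0]
standard_area = [10.0, 20.0, 40.0, 80.0]

slope, intercept = fit_line(standard_conc, standard_area)
sample_conc = concentration_from_area(30.0, slope, intercept)
```

A separate curve would be fitted for each compound at its own detection wavelength, and the estimated concentration would then be scaled by the resuspension volume and sample mass to express it per 100 g fresh weight.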

Data Analysis. An analysis of variance (ANOVA) was performed using a linear regression model to examine the effects of promoter source (population), BC1 selection (family) and B allele state (BB, Bb, bb) on β-carotene content in tomato fruit. Analysis was performed using the lmer function of the lme4 package in the R statistical program (Bates, 2014). The experimental model was as follows:

y = u + POP + FAM(POP) + Allele_B + B*POP + LOC + REP(LOC):YEAR + YEAR + Error

The variable y is a vector of phenotype values measured, specifically β-carotene and lycopene content; u is the grand mean for the variable in question. All terms in the model were considered random. POP denotes population effect due to the source of the B allele (Jaune Flamme or LA0716). FAM(POP) is the effect due to family within population, and is an indication of allelic effects other than B that may be segregating. Allele_B is the effect due to having either the B, wild-type b, or heterozygous allele state. The interaction B*POP is the effect of the allele in the population. LOC denotes the experimental location, while REP(LOC):YEAR is the term describing variation within a particular environment. YEAR denotes the year in which the experiment was conducted. Finally, Error is the residual variation unaccounted for in the experimental design. The proportion of variance explained by each term was estimated from the variance components reported by the lme4 summary function. Significance was determined for model effects by ANOVA using a fixed effects model containing the same variables as the random effects model.

Results

We used a combination of PCR-based and high-throughput marker assays to accelerate the introgression of the B locus into a processing tomato background. We developed a PCR-based molecular marker to distinguish alleles of the B promoter. The primer pair amplified a region 655 bp upstream of the B gene in which both S. pennellii and ‘Jaune Flamme’ have different length deletions relative to the red cultivated varieties. The marker identified the deletion mutations CCTTGTTCGAGTTCCTTTATAAA in ‘Jaune Flamme’ and GTCTAATACTCTAACTA in S. pennellii and allowed three alleles of B to be distinguished.

We genotyped 1444 backcross 1 (BC1) progeny from three populations with the B PCR marker and selected 384 individuals heterozygous for the B promoter allele. The three populations were derived from donor parents Jaune Flamme, LA3502 and LA3501 backcrossed to recurrent parent OH8245. The populations contained 137, 140 and 94 individuals heterozygous for B, respectively. These 384 heterozygous plants were genotyped with a custom set of 96 SNP markers to select those individuals containing the highest percentage of recurrent parent genome. The 96 SNP marker array we selected for the BC1 stage averaged 8 markers per chromosome, with a minimum of 6 markers and a maximum of 10 markers per chromosome (Appendix A). The average proportion of recurrent parent genetics for each population and the selected individuals is displayed in Table 3.1. All three populations averaged approximately 75% recurrent parent genome, based on the markers included in the assay (Figure 3.1). The Jaune Flamme population averaged 75.1%, LA3501 averaged 74.9% and LA3502 averaged 73.7% recurrent parent genome. Fourteen individuals, 4-5 from each donor, with between 83% and 88% recurrent parent alleles, were selected for further backcrossing. These individuals represent plants that contain a proportion of recurrent parent alleles expected at the BC2 stage.

The 14 selected BC1 plants were backcrossed to OH8245 to produce BC2 progeny. Forty-nine individuals heterozygous for the B promoter were identified following PCR screening of 95 BC2 plants with the B PCR marker. For the second round of background selection, we chose 72 SNP markers located in areas of the genome that were not yet fixed after the first round of background selection (Appendix B). Marker number averaged 6 SNPs per chromosome, but additional SNPs were selected on chromosomes with multiple segments that were not fixed according to the BC1 stage marker data. Nineteen markers were monomorphic and were removed from the dataset. The remaining 51 markers were used for background genome selection. One individual was discarded due to missing data, reducing the dataset to 48 individuals. The average proportion of recurrent parent genome for the three populations was 89.3%, 92% and 92.5% for Jaune Flamme, LA3501 and LA3502, respectively (Table 3.1). These averages are near the expected range of 91.5%-94% recurrent parent genome based on the BC1 selections. The total population of 48 individuals averaged 90.9% recurrent parent genome (Figure 3.2). Background genotyping identified 6 BC2 plants, two for each population (Jaune Flamme, LA3501 and LA3502), containing between 93% and 98% OH8245 genetics.
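The expected values quoted above follow from simple backcross arithmetic: without selection, each backcross to the recurrent parent halves the remaining donor genome, and a selected plant carrying a known recurrent parent fraction passes on, on average, half of its remaining donor fraction. A short Python sketch of that arithmetic is given below; the function names are illustrative and are not taken from the analysis scripts used in this study.

```python
def expected_rpg(n_backcrosses):
    """Expected recurrent parent fraction after n backcrosses, no selection."""
    return 1 - 0.5 ** (n_backcrosses + 1)


def expected_after_backcross(parent_rpg):
    """Expected recurrent parent fraction of progeny from one more backcross."""
    return (1 + parent_rpg) / 2


bc1 = expected_rpg(1)  # 0.75, matching the BC1 population averages
bc2 = expected_rpg(2)  # 0.875, the range reached by the selected BC1 plants
bc4 = expected_rpg(4)  # 0.96875, comparable to the selected BC2 plants

# BC1 selections carried 83% to 88% recurrent parent alleles, so their BC2
# progeny are expected to average between 91.5% and 94%.
low = expected_after_backcross(0.83)
high = expected_after_backcross(0.88)
```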

Field trials consisting of segregating populations for the B promoter were conducted to determine if the source of the B promoter and the state of the B allele affected β-carotene production in the fruit. Segregating populations derived from BC1 selections were planted at two locations in 2013, and segregating populations derived from BC2 selections were planted at the same locations in 2014. Populations from B sources LA3501, LA3502 and Jaune Flamme were planted in 2013. In 2013, severe necrosis was observed in plants that were homozygous for the introgression from LA3501. Therefore, all plots from the LA3501 population were excluded from statistical analyses and the population was dropped from the 2014 field trials.

Each year, fully mature fruit were harvested from individual plots and the levels of β-carotene in ripe fruit samples were quantified using HPLC. Variance partitioning by ANOVA using a linear regression model revealed that allele state explained the largest proportion of total variation in β-carotene content (Table 3.2). For field data combining the 2013 and 2014 trials, the allele state was responsible for 52.1% (p < .001) of the total variation for β-carotene content. The source of the B promoter explained 7.7% of the total variation for β-carotene content (p < .001). The effect of the allele within each promoter source explained 16.44% (p < .001) of the total variation for β-carotene content. Family within promoter source explained less than 1% of the total variation for β-carotene content, but was significant (p = .002). Year was also significant (p = .01), but explained less than 1% of the total variation for β-carotene. Location, replication within location by year and the interaction terms location by year and population by year did not explain a significant portion (p > .05) of the total variation for β-carotene. Residual error explained 11.1% of the variation for β-carotene.
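The percentages above are each variance component expressed as a share of the summed components. A minimal Python sketch of that calculation follows, using the rounded variance estimates printed in Table 3.2. Because those inputs are rounded to two decimals, the shares computed here differ slightly from the published percentages, which were presumably derived from unrounded estimates.

```python
# Rounded variance components as printed in Table 3.2.
variance_components = {
    "POP": 0.16,
    "Allele": 1.09,
    "LOC": 0.00,
    "YEAR": 0.02,
    "FAM(POP)": 0.01,
    "Allele:POP": 0.35,
    "POP:YEAR": 0.01,
    "LOC:YEAR": 0.00,
    "REP(LOC):YEAR": 0.23,
    "Residual": 0.23,
}


def percent_of_total(components):
    """Express each variance component as a percentage of their sum."""
    total = sum(components.values())
    return {term: 100 * value / total for term, value in components.items()}


shares = percent_of_total(variance_components)
```

With these rounded inputs, allele state accounts for roughly 52% of the total and the allele by population interaction for roughly 17%, in line with the values reported in the text.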

Plants with the wild-type (bb) allele averaged 0.54 mg/100 g fw β-carotene while plants that were homozygous for B had 2.79 mg/100 g fw β-carotene. Overall, when the B allele was in the heterozygous state, β-carotene levels (2.16 mg/100 g fw) nearly reached those of the homozygous B plants. We compared plants with the B allele introgressed into OH8245 from three donor sources (S. pennellii, Jaune Flamme and S. galapagense). Across both years, the Jaune Flamme and LA3502 BC1 populations had 3.36 and 2.12 mg/100 g fw β-carotene, respectively, compared to the 4.15 mg/100 g fw β-carotene levels of 97L97 in 2013 (Figure 3.3). These results demonstrate that the alleles of B inherited from S. galapagense, S. pennellii, and Jaune Flamme are able to differentially modulate the level of β-carotene in fruit.

Discussion

Theoretical and simulation studies suggest that background genome selection will be effective with few markers and population sizes that are reasonably maintained in a breeding program. Combining single marker assays with high-throughput marker assays is proposed as a strategy to further reduce costs and optimize the background selection process (Herzog and Frisch, 2011). Recent simulations have suggested genotyping large populations in early generations of backcrossing (Herzog and Frisch, 2011). We adopted this approach in the BC1 stage and used single marker PCR assays to reduce the number of individuals required for high-throughput genotyping. The B PCR marker that we developed could differentiate the heterozygotes and homozygous alleles of the B promoter. We used this marker to distinguish alleles of the B promoter from S. lycopersicum, S. pennellii and S. habrochaites. Additional variation exists in the B promoter that could be used as either PCR or SNP markers for other B alleles (Chapter 2).

The availability of recently developed SNP resources permits background genome selection for tomato, even when the crosses involve cultivated donor and recurrent parents (Sim et al., 2012a; Sim et al., 2012b). High-throughput SNP arrays can be used as a cost-effective tool for facilitating population-specific background genome selection strategies (Herzog and Frisch, 2011). Although several thousand SNPs are available throughout the tomato genome, simulations suggest that 2-4 markers per 100 cM chromosome are sufficient to maintain a high level of background selection efficiency (Hospital et al., 1992; Servin and Hospital, 2002; Visscher et al., 1996). Increasing marker density beyond this level may not yield significant gains in the ability to recover the recurrent parent genome (Hospital et al., 1992). We used custom sets of 96 and 72 SNP markers to select backcross individuals with the most recurrent parent genetics at the BC1 and BC2 stages. Because there were two donor backgrounds for the B allele, M82 and Jaune Flamme, and some markers were polymorphic for only one of them, the average marker number per chromosome on the BC1 array was twice the minimum number recommended in simulation studies. In addition, the input requirements of the BeadXpress platform (Illumina, San Diego, CA) dictated that 96 SNPs be used once we determined that 48 would provide insufficient coverage for the two donor genomes. In the BC1 generation, the average recurrent parent genetics for all three populations was approximately 75 percent. This percentage matches the expected proportion at BC1 in a conventional backcrossing scheme. We were able to select individuals containing between 83 percent and 88 percent recurrent parent genetics, essentially skipping a generation of backcrossing. These gains were repeated at the BC2 stage. After two rounds of foreground and background selection, the final BC2 plants were approximately equivalent to BC4 individuals, containing between 93 percent and 98 percent recurrent parent genetics. Simulation studies suggest that increasing marker density and population size each generation can reduce the total number of marker data points needed for background selection (Prigge et al., 2009). While this approach may be optimal, we still observed expected gains with relatively small population sizes and marker number at the BC2 stage. Targeting specific regions of the genome that were not fixed after initial rounds of background genome selection may allow for reduced marker numbers in later rounds of selection. The alternate approach of saturating these targeted regions with markers, thus increasing the overall marker number, may also allow for efficient recurrent parent genome recovery in later generations. Overall, the results from two rounds of background genome selection confirm that marker-assisted backcrossing with relatively small marker sets can be used to make generational gains.

There are few empirical examples of background genome selection in tomato. Of these studies, most did not utilize SNPs and focused on introgressed traits from wide crosses with wild tomato species (Goodstal et al., 2005; Lecomte et al., 2004). The present study is one of the first examples demonstrating the practical use of background genome selection in crosses between cultivated germplasm. While the B allele was derived from wild species in both Jaune Flamme and the S. pennellii introgression lines, the donor genomes were both cultivated (Chapter 2). Furthermore, M82, the donor genome in LA3501 and LA3502, is in the same market class as OH8245. Both represent processing tomato germplasm, with OH8245 in the Midwest U.S. clade and M82 in the California clade as a subselection from UC82. As SNP resources for tomato continue to expand, background genome selection for cultivated crosses may become routine. Even when working with cultivated donor genomes, there are still challenges to performing background genome selection. The necrosis observed in the 2013 field trials in the LA3501 backcross populations may have been due to additional sequence elements associated with the LA3501 introgression. A study describing the fine-mapping of RXopJ4, a S. pennellii resistance locus for bacterial spot of tomato caused by Xanthomonas perforans (Xp), observed severe leaf necrosis in populations using LA3501 as a parent, suggesting negative effects from linkage drag (Sharlach et al., 2013).

With respect to the B locus, natural variation can be exploited to modulate levels of β-carotene in ripe tomato fruit. Cultivated varieties with elevated β-carotene and lycopene content are currently available on the market; however, further modulation of carotenoid content may be required when developing new biofortified foods (Hirschi, 2008; Silletti et al., 2013; Welch, 2002). In addition, plant genetic resources with assorted carotenoid contents can be used to test hypotheses about human absorption, utilization and metabolism of nutrients in the food matrix (Unlu et al., 2007a; Unlu et al., 2007b).


We introgressed promoter alleles for high β-carotene content into the same cultivated processing variety genetic background. Three B promoter alleles were compared, each with distinct sequence differences from the donor parents. Marker-assisted backcrossing in two generations led to the selection of plants that shared 93-98% of the recurrent parent genome and had elevated β-carotene content due to alleles from S. galapagense, S. pennellii, and S. habrochaites. Field trials containing progeny from these selections showed that modulation of β-carotene content could be achieved through allele choice. The state of the B allele (bb, Bb, BB) accounted for the most variation in β-carotene content, because plants with the wild-type allele (bb) do not exhibit increased cyclization of lycopene by lycopene-β-cyclase (Ronen et al., 2000). The donor source of the B allele was more important than family within a specific source, demonstrating that the three alleles had the effect of modulating β-carotene levels. This result suggests that diversifying donor sources of the B allele can also diversify the concentration of β-carotene. Interestingly, in 2013, family within population explained approximately 15% of the total variation in β-carotene content. Following selection and crossing, this variation was reduced to 0.65% in the two-year dataset. Backcrossing the BC1 selections to form the BC2 populations may have removed other influential loci outside of the B locus.

Results from replicated field trials support the classical definition of B as a semi-dominant allele. Levels of β-carotene in heterozygotes were nearly as high as those found in homozygous BB plants. Thus, further modulation of β-carotene content may be possible by using these alleles in the heterozygous condition.
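The semi-dominant behavior can be put on a numerical scale with the standard degree of dominance ratio d/a, where a is half the difference between the two homozygote means and d is the deviation of the heterozygote from their midpoint. A ratio of 0 indicates purely additive action and a ratio of 1 indicates complete dominance. The Python sketch below applies this textbook calculation to the genotype class means reported in the Results; it is an illustration and was not part of the original analysis.

```python
def degree_of_dominance(bb, Bb, BB):
    """Return d/a from the three genotype class means."""
    midpoint = (BB + bb) / 2   # mean of the two homozygotes
    additive = (BB - bb) / 2   # a: half the homozygote difference
    dominance = Bb - midpoint  # d: heterozygote deviation from midpoint
    return dominance / additive


# Mean β-carotene (mg/100 g fw) for bb, Bb and BB plants from the Results.
ratio = degree_of_dominance(bb=0.54, Bb=2.16, BB=2.79)
```

The resulting ratio of about 0.44 lies between additive and fully dominant gene action, which is consistent with describing B as semi-dominant.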


Fine-tuning of carotenoid content in tomato is possible using natural allelic variation. The findings of the present study have implications for the development of functional foods with various carotenoid profiles. This study also provides a model for rapidly introgressing natural variation into an elite tomato background, and assessing the effects of that variation. As plant breeders continue to fix beneficial alleles of traits in crops, using existing allelic variation may provide further enhancements for important traits.


Generation   BC1                                       BC2
             Jaune Flamme  LA3501        LA3502        Jaune Flamme  LA3501        LA3502
N            136 (5)z      93 (5)        140 (4)       21 (2)        22 (2)        5 (2)
Minimum      61.2 (84.5)   61.3 (84.7)   58.5 (83.1)   84.3 (94.1)   87.3 (96.1)   90.0 (93.0)
Maximum      87.1 (87.1)   88.7 (88.7)   84.6 (84.6)   98.0 (98.0)   96.1 (96.1)   96.1 (94.1)
Mean         75.1 (85.9)   74.9 (86.5)   73.7 (83.8)   89.3 (96.1)   92.0 (96.1)   92.9 (93.6)

Table 3.1: Distribution of percent recurrent parent genome (OH8245) in sub-populations in BC1 and BC2 generations. z Values in parentheses denote selections made following background genome selection.


Sources of Variationz   Variance   Std. Dev.   % Total Variance
POP                     0.16       0.40        7.70x
Allele                  1.09       1.05        52.13x
LOC                     0.00       0.00        0.00w
YEAR                    0.02       0.12        0.70y
FAM(POP)                0.01       0.12        0.65x
Allele:POP              0.35       0.59        16.44x
POP:YEAR                0.01       0.07        0.22w
LOC:YEAR                0.00       0.00        0.00w
REP(LOC):YEAR           0.23       0.48        11.07w
Residual                0.23       0.48        11.09

Table 3.2: Sources of variation in β-carotene content in ripe tomato fruit over two years. z POP denotes source of B promoter (Jaune Flamme, LA3502); Allele is state of B promoter (BB, Bb, bb); LOC denotes location; YEAR denotes the year of field trials (2013, 2014); FAM(POP) denotes specific cross within each B population; Allele:POP is the interaction between state of the B promoter and the source of the B allele; POP:YEAR is the interaction between the source of the B allele and the year of field trials; LOC:YEAR denotes the interaction between location and the year of field trials; REP(LOC):YEAR is the effect of replication within each location, across each year of field trials. y Significant at p = 0.01. x Significant at p = 0.001. w Not significant


[Plot: Distribution of Progeny vs. Percent Recurrent Parent Genome in BC1. Y-axis: Number of Progeny (0 to 60). X-axis: Percent (%) Recurrent Parent Genome (OH8245) (55 to 90).]

Figure 3.1: Distribution of progeny versus percent recurrent parent genome in the BC1 generation.

Y-axis indicates the number of BC1 progeny. X-axis is the percentage of recurrent parent genome (OH8245). 369 BC1 progeny containing the B allele were genotyped with a 96 SNP marker array for background genome selection. Individuals containing between 83 percent and 88 percent recurrent parent genome were selected for further backcrossing.


[Plot: Distribution of Progeny vs. Percent Recurrent Parent Genome in BC2. Y-axis: Number of Progeny (0 to 8). X-axis: Percent (%) Recurrent Parent Genome (OH8245) (84 to 98).]

Figure 3.2: Distribution of progeny versus percent recurrent parent genome in the BC2 generation.

Y-axis indicates the number of BC2 progeny. X-axis is the percentage of recurrent parent genome (OH8245). 49 BC2 progeny containing the B allele were genotyped with a 72 SNP marker array for background genome selection. Individuals with at least 93 percent recurrent parent genome were selected for further backcrossing.


Figure 3.3: β-carotene content in ripe tomato fruit according to the source of the B allele.

Y-axis indicates the level of β-carotene in mg/100 g fresh weight. X-axis is the donor source of the B allele. Boxes indicate levels of β-carotene from BC1 and BC2 plants across two years of field trials. The three high β-carotene populations (97L97, Jaune Flamme, LA3502) contain homozygous B alleles backcrossed into OH8245. OH8245 contains a red-fruited b allele.


Chapter 4: Conclusion

Classical plant breeding has traditionally focused on improving varieties by adding new genetic variation. New genetic variation often comes from wild relatives. Historically, plant breeders would observe variation in the form of a novel phenotypic trait and attempt to trace this trait to a specific chromosomal region. They would then perform crosses to introgress the beneficial alleles of the trait into their germplasm. As genomic information is routinely incorporated into breeding programs, the processes of allele discovery and trait introgression are rapidly changing. Sequencing technology has generated massive amounts of data for crop plants and sequence archives will continue to expand as more varieties are sequenced. Currently, there are approximately 440 tomato genomes available (2012; Aflitos et al., 2014; Lin et al., 2014). With this newly generated sequence data comes the opportunity to discover novel allelic variation. By comparing existing alleles of genetic loci with newly acquired sequence data, breeders can screen for additional alleles and determine in which germplasm sources those alleles are present.

However, a novel allele discovered in sequence data may not be a functional element affecting a phenotypic trait. Therefore, breeders must determine the effect of the novel allele on the trait of interest. Transgenic technology and/or traditional breeding can be used to transfer the target allele into cultivated germplasm. Trait introgression via traditional breeding utilizes backcrossing, and recently markers have facilitated both foreground and background selection for target alleles. Recent expansion of SNP resources in tomato has given breeders increased access to polymorphisms between both cultivated and wild germplasm to use in trait introgression (Hirakawa et al., 2013; Sim et al., 2012a; Sim et al., 2012b). However, there are few published models of trait introgression via background genome selection for novel alleles.

With over 440 tomato genomes available, and more in development, enormous quantities of data are now available to breeders. Thus, the question arises of “how can we use existing allelic variation to enhance crops?” The aim of this thesis was to provide a model to rapidly identify new genetic variation, introgress that variation into an elite background and then assess the effect of that variation. We examined the carotenoid profiles of 29 vintage and contemporary tomato varieties and identified sources of high β-carotene. Generating new sequence data for the B promoter in 26 tomato accessions allowed us to identify novel alleles as well as two putative functional mutations for high β-carotene content. B promoter alleles found in contemporary and vintage varieties appear to have been derived from wild species. Distinct clades separating red, orange and green-fruited species were apparent from clustering analysis. We then demonstrated that background genome selection using SNPs is a viable method for trait introgression in tomato, and one that can save crossing generations in the greenhouse. Background genome selection with relatively few markers produced selections containing at least 93% recurrent parent genome after only two backcrosses. Finally, we showed that modulation of high β-carotene content in tomato is possible through allele choice.


Replicated field trials conducted over two years showed that the allele state of B is the most important factor in determining β-carotene content, but the source of the B allele can also provide further modulation. These findings may be used both to nutritionally enhance tomato fruit as well as accelerate the trait introgression process.

Future work involving our findings could be directed towards basic and applied areas. Functional characterization of the two promoter SNPs may involve site-directed mutagenesis and a transgenic system. Other genetic elements that may influence B, such as the B modifier (MoB), could also be targeted for sequencing and characterization. MoB is linked to B and is suggested as an additional gene that may affect β-carotene content in tomato fruit (Aflitos et al., 2014; Tomes et al., 1954; Zhang and Stommel, 2000). The B gene and the region 3′ to B could also be sequenced to search for further variation that may act alone or in concert with the B promoter to control enzyme activity. Additionally, the high β-carotene lines produced in this thesis could be subjected to further rounds of backcrossing until the plants reach homozygosity. Introgressing B alleles from other wild and cultivated sources may provide insight into the action of B as well as produce breeding lines with elevated carotenoid content. For example, Caro Red exhibited increased β-carotene and lycopene content. In the future, tomatoes with elevated levels of multiple nutritional compounds may be desired. Identifying varieties that naturally contain this phenotype can give breeders a head start in creating bio-fortified foods.

Plant breeders strive to fix the beneficial alleles of traits into crops. The effects of alleles in particular environments are constantly changing. Once crop varieties are fixed with the beneficial alleles for important traits, what is the next step in looking for new variation? This project provides an example of how classical breeding can incorporate allelic variation to quickly assess new phenotypes. In the future, genomic information will become plentiful for crop plants and breeders will have to decide how best to use these data to improve crops. This thesis provides an approach to consider when facing this challenge.


References

Aflitos, S., E. Schijlen, H. De Jong, D. De Ridder, S. Smit, R. Finkers, J. Wang, G. Zhang, N. Li, L. Mao, F. Bakker, R. Dirks, T. Breit, B. Gravendeel, H. Huits, D. Struss, R. Swanson-Wagner, H. Van Leeuwen, R.C. Van Ham, L. Fito, L. Guignier, M. Sevilla, P. Ellul, E. Ganko, A. Kapur, E. Reclus, B. De Geus, H. Van De Geest, B. Te Lintel Hekkert, J. Van Haarst, L. Smits, A. Koops, G. Sanchez-Perez, A.W. Van Heusden, R. Visser, Z. Quan, J. Min, L. Liao, X. Wang, G. Wang, Z. Yue, X. Yang, N. Xu, E. Schranz, E. Smets, R. Vos, J. Rauwerda, R. Ursem, C. Schuit, M. Kerns, J. Van Den Berg, W. Vriezen, A. Janssen, E. Datema, T. Jahrman, F. Moquet, J. Bonnet, and S. Peters, 2014. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing. Plant J 80:136-148.
Al-Babili, S., P. Hugueney, M. Schledz, R. Welsch, H. Frohnmeyer, O. Laule, and P. Beyer, 2000. Identification of a novel gene coding for neoxanthin synthase from Solanum tuberosum. FEBS Lett 485:168-172.
Allard, R.W., 1960. Principles of plant breeding. Wiley, New York.
Apel, W. and R. Bock, 2009. Enhancement of Carotenoid Biosynthesis in Transplastomic Tomatoes by Induced Lycopene-to-Provitamin A Conversion. Plant Physiology 151:59-66.
Armoza, A., Y. Haim, A. Bashiri, T. Wolak, and E. Paran, 2013. Tomato extract and the carotenoids lycopene and lutein improve endothelial function and attenuate inflammatory NF-kappaB signaling in endothelial cells. J Hypertens 31:521-529; discussion 529.
Benchimol, L.L., C.L.D. Souza Jr, and A.P.D. Souza, 2005. Microsatellite-assisted backcross selection in maize. Genetics and Molecular Biology 28:789-797.
Berry, S.Z., W.A. Gould, and K.L. Wiese, 1991. 'Ohio 8245' Processing Tomato. HortScience 26:1093.
Black, R.E., L.H. Allen, Z.A. Bhutta, L.E. Caulfield, M. De Onis, M. Ezzati, C. Mathers, and J. Rivera, 2008. Maternal and child undernutrition: global and regional exposures and health consequences. Lancet 371:243-260.
Bonfield, J.K., K.F. Smith, and R.A. Staden, 1995. A new DNA sequence assembly program. Nucleic Acids Res 23:4992-4999.


Bouvier, F., A. D'harlingue, R.A. Backhaus, M.H. Kumagai, and B. Camara, 2000. Identification of neoxanthin synthase as a carotenoid cyclase paralog. Eur J Biochem 267:6346-6352.
Bramley, P.M., 2002. Regulation of carotenoid formation during tomato fruit ripening and development. Journal of Experimental Botany 53:2107-2113.
Chan, E. 2008. R Scripts. July 5, 2014. .
Chichili, G.R., D. Nohr, J. Frank, A. Flaccus, P.D. Fraser, E.M. Enfissi, and H.K. Biesalski, 2006. Protective effects of tomato extract with elevated beta-carotene levels on oxidative stress in ARPE-19 cells. Br J Nutr 96:643-649.
Chmielewski, T.M. 1965. Davis, California 15).
Collard, B.C. and D.J. Mackill, 2008. Marker-assisted selection: an approach for precision plant breeding in the twenty-first century. Philos Trans R Soc Lond B Biol Sci 363:557-572.
Collard, B.C.Y., M.Z.Z. Jahufer, J.B. Brouwer, and E.C.K. Pang, 2005. An introduction to markers, quantitative trait loci (QTL) mapping and marker-assisted selection for crop improvement: The basic concepts. Euphytica 142:169-196.
Consortium, T.T.G., 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635-641.
D’ambrosio, C., G. Giorio, I. Marino, A. Merendino, A. Petrozza, L. Salfi, A.L. Stigliani, and F. Cellini, 2004. Virtually complete conversion of lycopene into β-carotene in fruits of tomato plants transformed with the tomato lycopene β-cyclase (tlcy-b) cDNA. Plant Science 166:207-214.
Dalal, M., V. Chinnusamy, and K. Bansal, 2010. Isolation and functional characterization of Lycopene beta-cyclase (CYC-B) promoter from Solanum habrochaites. BMC Plant Biology 10:61.
De Nardo, T., C. Shiroma-Kian, Y. Halim, D. Francis, and L.E. Rodriguez-Saona, 2009. Rapid and simultaneous determination of lycopene and beta-carotene contents in tomato juice by infrared spectroscopy. J Agric Food Chem 57:1105-1112.
Dewanto, V., X. Wu, K.K. Adom, and R.H. Liu, 2002. Thermal processing enhances the nutritional value of tomatoes by increasing total antioxidant activity. J Agric Food Chem 50:3010-3014.
Diplock, A.T., J.L. Charleux, G. Crozier-Willi, F.J. Kok, C. Rice-Evans, M. Roberfroid, W. Stahl, and J. Vina-Ribes, 1998. Functional food science and defence against reactive oxidative species. Br J Nutr 80 Suppl 1:S77-112.
Druesne-Pecollo, N., P. Latino-Martel, T. Norat, E. Barrandon, S. Bertrais, P. Galan, and S. Hercberg, 2010. Beta-carotene supplementation and cancer risk: a systematic review and metaanalysis of randomized controlled trials. Int J Cancer 127:172-184.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32:1792-1797.
Endelman, J.B., 2011. Ridge regression and other kernels for genomic selection with R package rrBLUP. The Plant Genome 4:250-255.

Er, V., J.A. Lane, R.M. Martin, P. Emmett, R. Gilbert, K.N. Avery, E. Walsh, J.L. Donovan, D.E. Neal, F.C. Hamdy, and M. Jeffreys, 2014. Adherence to Dietary and Lifestyle Recommendations and Prostate Cancer Risk in the Prostate Testing for Cancer and Treatment (ProtecT) Trial. Cancer Epidemiol Biomarkers Prev 23:2066-2077.
Eroglu, A., D.P. Hruszkewycz, R.W. Curley, Jr., and E.H. Harrison, 2010. The eccentric cleavage product of beta-carotene, beta-apo-13-carotenone, functions as an antagonist of RXRalpha. Arch Biochem Biophys 504:11-16.
Eshed, Y. and D. Zamir, 1995. An introgression line population of Lycopersicon pennellii in the cultivated tomato enables the identification and fine mapping of yield-associated QTL. Genetics 141:1147-1162.
Falcone, G., 2009. Genetic evolution of the pathway controlling fruit carotenoid content in tomato (S. lycopersicum) and its wild relatives. University of Verona, Verona, Italy, PhD Diss. 129.
Ferruzzi, M.G., L.C. Sander, C.L. Rock, and S.J. Schwartz, 1998. Carotenoid determination in biological microsamples using liquid chromatography with a coulometric electrochemical array detector. Anal Biochem 256:74-81.
Fiedor, J. and K. Burda, 2014. Potential role of carotenoids as antioxidants in human health and disease. Nutrients 6:466-488.
Frank, H.A. and R.J. Cogdell, 1996. Carotenoids in photosynthesis. Photochemistry and Photobiology 63:257-264.
Fraser, P.D. and P.M. Bramley, 2004. The biosynthesis and nutritional uses of carotenoids. Prog Lipid Res 43:228-265.
Fraser, P.D., J.W. Kiano, M.R. Truesdale, W. Schuch, and P.M. Bramley, 1999. Phytoene synthase-2 enzyme activity in tomato does not contribute to carotenoid synthesis in ripening fruit. Plant Mol Biol 40:687-698.
Fraser, P.D., M.R. Truesdale, C.R. Bird, W. Schuch, and P.M. Bramley, 1994. Carotenoid Biosynthesis during Tomato Fruit Development (Evidence for Tissue-Specific Gene Expression). Plant Physiol 105:405-413.
Frisch, M., M. Bohn, and A.E. Melchinger, 1999. Comparison of selection strategies for marker-assisted backcrossing of a gene. Crop Science 39:1295-1301.
Frisch, M. and A.E. Melchinger, 2005. Selection theory for marker-assisted backcrossing. Genetics 170:909-917.
Germain, P., P. Chambon, G. Eichele, R.M. Evans, M.A. Lazar, M. Leid, A.R. De Lera, R. Lotan, D.J. Mangelsdorf, and H. Gronemeyer, 2006. International Union of Pharmacology. LX. Retinoic acid receptors. Pharmacol Rev 58:712-725.
Giovannucci, E., A. Ascherio, E.B. Rimm, M.J. Stampfer, G.A. Colditz, and W.C. Willett, 1995. Intake of carotenoids and retinol in relation to risk of prostate cancer. J Natl Cancer Inst 87:1767-1776.
Goodstal, F.J., G.R. Kohler, L.B. Randall, A.J. Bloom, and D.A. St. Clair, 2005. A major QTL introgressed from wild Lycopersicon hirsutum confers chilling tolerance to cultivated tomato (Lycopersicon esculentum). Theoretical and Applied Genetics 111:898-905.

Goodwin, T.W., 1980. The Biochemistry of the Carotenoids, Vol 1. Chapman and Hall, London.
Grune, T., G. Lietz, A. Palou, A.C. Ross, W. Stahl, G. Tang, D. Thurnham, S.A. Yin, and H.K. Biesalski, 2010. Beta-carotene is an important vitamin A source for humans. J Nutr 140:2268s-2285s.
Hadley, C.W., S.K. Clinton, and S.J. Schwartz, 2003. The consumption of processed tomato products enhances plasma lycopene concentrations in association with a reduced lipoprotein sensitivity to oxidative damage. J Nutr 133:727-732.
Harris, W.M. and A.R. Spurr, 1969. Chromoplasts of Tomato Fruits. II. The Red Tomato. American Journal of Botany 56:380-389.
Haskell, M.J., 2012. The challenge to reach nutritional adequacy for vitamin A: beta-carotene bioavailability and conversion--evidence in humans. Am J Clin Nutr 96:1193S-1203S.
Herzog, E. and M. Frisch, 2011. Selection strategies for marker-assisted backcrossing with high-throughput marker systems. Theor Appl Genet 123:251-260.
Hirakawa, H., K. Shirasawa, A. Ohyama, H. Fukuoka, K. Aoki, C. Rothan, S. Sato, S. Isobe, and S. Tabata, 2013. Genome-wide SNP genotyping to infer the effects on gene functions in tomato. DNA Res 20:221-233.
Hirschi, K., 2008. Nutritional improvements in plants: time to bite on biofortified foods. Trends in plant science 13:459-463.
Hospital, F., 2005. Selection in backcross programmes. Philos Trans R Soc Lond B Biol Sci 360:1503-1511.
Hospital, F. and A. Charcosset, 1997. Marker-assisted introgression of quantitative trait loci. Genetics 147:1469-1485.
Hospital, F., C. Chevalet, and P. Mulsant, 1992. Using markers in gene introgression breeding programs. Genetics 132:1199-1210.
Iannotti, L.L., I. Trehan, and M.J. Manary, 2013. Review of the safety and efficacy of vitamin A supplementation in the treatment of children with severe acute malnutrition. Nutr J 12:125.
Iftekharuddaula, K.M., M.A. Salam, M.A. Newaz, H.U. Ahmed, B.C. Collard, E.M. Septiningsih, D.L. Sanchez, A.M. Pamplona, and D.J. Mackill, 2012. Comparison of phenotypic versus marker-assisted background selection for the SUB1 QTL during backcrossing in rice. Breed Sci 62:216-222.
International Rice Research Institute (IRRI). 2006. Molecular Breeding, Marker assisted breeding for rice improvement. Knowledgebank. 21 November 2014.
Isaacson, T., G. Ronen, D. Zamir, and J. Hirschberg, 2002. Cloning of tangerine from tomato reveals a carotenoid isomerase essential for the production of beta-carotene and xanthophylls in plants. The Plant Cell 14:333-342.
Jeon, Y.J., S.K. Myung, E.H. Lee, Y. Kim, Y.J. Chang, W. Ju, H.J. Cho, H.G. Seo, and B.Y. Huh, 2011. Effects of beta-carotene supplements on cancer prevention: meta-analysis of randomized controlled trials. Nutr Cancer 63:1196-1207.

Jiang, G.-L., 2013. Molecular Markers and Marker-Assisted Breeding in Plants.
Johnson, E.J., 2002. The role of carotenoids in human health. Nutr Clin Care 5:56-65.
Kachanovsky, D.E., S. Filler, T. Isaacson, and J. Hirschberg, 2012. Epistasis in tomato color mutations involves regulation of phytoene synthase 1 expression by cis-carotenoids. Proc Natl Acad Sci U S A 109:19021-19026.
Kingsolver, B., S.L. Hopp, and C. Kingsolver, 2007. Animal, Vegetable, Miracle: A Year of Food Life. HarperCollins, NY.
Lecomte, L., P. Duffe, M. Buret, B. Servin, F. Hospital, and M. Causse, 2004. Marker-assisted introgression of five QTLs controlling fruit quality traits into three tomato lines revealed interactions between QTLs and genetic backgrounds. Theor Appl Genet 109:658-668.
Lesley, M.M. and J.W. Lesley, 1943. Hybrids of the Chilean Tomato: Sterile and Fertile Plants of Lycopersicum Peruvianum var. Dentatum Dun. (L. Chilense Dun.) and Diploid and Tetraploid Hybrids with Cultivated Tomatoes. The Journal of Heredity 34:199-205.
Lesley, M.M. and J.W. Lesley, 1947. Flesh Color in Hybrids of Tomato: Sub-Generic Crosses Indicate that Three or More Genes Determine Red-Yellow Color Series. Journal of Heredity 38:245-251.
Lichtenthaler, H.K., 1999. The 1-deoxy-D-xylulose-5-phosphate pathway of isoprenoid biosynthesis in plants. Annu Rev Plant Physiol Plant Mol Biol 50:47-65.
Lin, T., G. Zhu, J. Zhang, X. Xu, Q. Yu, Z. Zheng, Z. Zhang, Y. Lun, S. Li, X. Wang, Z. Huang, J. Li, C. Zhang, T. Wang, Y. Zhang, A. Wang, Y. Zhang, K. Lin, C. Li, G. Xiong, Y. Xue, A. Mazzucato, M. Causse, Z. Fei, J.J. Giovannoni, R.T. Chetelat, D. Zamir, T. Stadler, J. Li, Z. Ye, Y. Du, and S. Huang, 2014. Genomic analyses provide insights into the history of tomato breeding. Nat Genet.
Lincoln, R.E. and J.W. Porter, 1950. Inheritance of Beta-Carotene in Tomatoes. Genetics 35:206-211.
Maiani, G., M.J. Caston, G. Catasta, E. Toti, I.G. Cambrodon, A. Bysted, F. Granado-Lorencio, B. Olmedilla-Alonso, P. Knuthsen, M. Valoti, V. Bohm, E. Mayer-Miebach, D. Behsnilian, and U. Schlemmer, 2009. Carotenoids: actual knowledge on food sources, intakes, stability and bioavailability and their protective role in humans. Mol Nutr Food Res 53 Suppl 2:S194-218.
Miller, J.C. and S.D. Tanksley, 1990. RFLP analysis of phylogenetic relationships and genetic variation in the genus Lycopersicon. Theoretical and Applied Genetics 80:437-448.
Mishra, R.N., S.L. Singla-Pareek, S. Nair, S.K. Sopory, and M.K. Reddy, 2002. Directional genome walking using PCR. Biotechniques 33:830-832, 834.
Nagao, A. and J.A. Olson, 1994. Enzymatic formation of 9-cis, 13-cis, and all-trans retinals from isomers of beta-carotene. FASEB J 8:968-973.
Namitha, K.K., S.N. Archana, and P.S. Negi, 2011. Expression of carotenoid biosynthetic pathway genes and changes in carotenoids during ripening in tomato (Lycopersicon esculentum). Food Funct 2:168-173.

Novotny, J.A., D.J. Harrison, R. Pawlosky, V.P. Flanagan, E.H. Harrison, and A.C. Kurilich, 2010. Beta-carotene conversion to vitamin A decreases as the dietary dose increases in humans. J Nutr 140:915-918.
Olson, J.A., 1989. Provitamin A function of carotenoids: the conversion of beta-carotene into vitamin A. J Nutr 119:105-108.
Omenn, G.S., G.E. Goodman, M.D. Thornquist, J. Balmes, M.R. Cullen, A. Glass, J.P. Keogh, F.L. Meyskens, B. Valanis, J.H. Williams, S. Barnhart, and S. Hammar, 1996. Effects of a combination of beta carotene and vitamin A on lung cancer and cardiovascular disease. N Engl J Med 334:1150-1155.
Prigge, V., A.E. Melchinger, B.S. Dhillon, and M. Frisch, 2009. Efficiency gain of marker-assisted backcrossing by sequentially increasing marker densities over generations. Theoretical and applied genetics 119:23-32.
R Core Team, 2014. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Rao, A.V. and L.G. Rao, 2007. Carotenoids and human health. Pharmacol Res 55:207-216.
Reboul, E., P. Borel, C. Mikail, L. Abou, M. Charbonnier, C. Caris-Veyrat, P. Goupy, H. Portugal, D. Lairon, and M.J. Amiot, 2005. Enrichment of tomato paste with 6% tomato peel increases lycopene and beta-carotene bioavailability in men. J Nutr 135:790-794.
Reyes-Valdés, M.H., 2000. A Model for Marker-Based Selection in Gene Introgression Breeding Programs. Crop Sci. 40:91-98.
Ribaya-Mercado, J.D., C.C. Maramag, L.W. Tengco, G.G. Dolnikowski, J.B. Blumberg, and F.S. Solon, 2007. Carotene-rich plant foods ingested with minimal dietary fat enhance the total-body vitamin A pool size in Filipino schoolchildren as assessed by stable-isotope-dilution methodology. Am J Clin Nutr 85:1041-1049.
Rick, C. 1956. Davis, California 6).
Robbins, M.D., A. Darrigues, S.C. Sim, M.A. Masud, and D.M. Francis, 2009. Characterization of hypersensitive resistance to bacterial spot race T3 (Xanthomonas perforans) from tomato accession PI 128216. Phytopathology 99:1037-1044.
Ronen, G., L. Carmel-Goren, D. Zamir, and J. Hirschberg, 2000. An alternative pathway to beta-carotene formation in plant chromoplasts discovered by map-based cloning of beta and old-gold color mutations in tomato. Proc Natl Acad Sci U S A 97:11102-11107.
Ronen, G., M. Cohen, D. Zamir, and J. Hirschberg, 1999. Regulation of carotenoid biosynthesis during tomato fruit development: expression of the gene for lycopene epsilon-cyclase is down-regulated during ripening and is elevated in the mutant Delta. Plant J 17:341-351.
Ross, S.A., P.J. McCaffery, U.C. Drager, and L.M. De Luca, 2000. Retinoids in embryonal development. Physiol Rev 80:1021-1054.

Rubio-Diaz, D.E., D.M. Francis, and L.E. Rodriguez-Saona, 2011. External calibration models for the measurement of tomato carotenoids by infrared spectroscopy. Journal of food composition and analysis 24:121-126.
Rubio-Diaz, D.E., A. Santos, D.M. Francis, and L.E. Rodriguez-Saona, 2010. Carotenoid stability during production and storage of tomato juice made from tomatoes with diverse pigment profiles measured by infrared spectroscopy. J Agric Food Chem 58:8692-8698.
Semagn, K., Å. Bjørnstad, and M.N. Ndjiondjop, 2006. Progress and prospects of marker assisted backcrossing as a tool in crop breeding programs. African Journal of Biotechnology 5:2588-2603.
Servin, B. and F. Hospital, 2002. Optimal positioning of markers to control genetic background in marker-assisted backcrossing. J Hered 93:214-217.
Sharlach, M., D. Dahlbeck, L. Liu, J. Chiu, J.M. Jiménez-Gómez, S. Kimura, D. Koenig, J.N. Maloof, N. Sinha, G.V. Minsavage, J.B. Jones, R.E. Stall, and B.J. Staskawicz, 2013. Fine genetic mapping of RXopJ4, a bacterial spot disease resistance locus from Solanum pennellii LA716. Theoretical and Applied Genetics 126:601-609.
Silletti, M.F., A. Petrozza, A.L. Stigliani, G. Giorio, F. Cellini, C. D'Ambrosio, and F. Carriero, 2013. An increase of lycopene content in tomato fruit is associated with a novel Cyc-B allele isolated through TILLING technology. Molecular Breeding 31:665-674.
Sim, S.C., G. Durstewitz, J. Plieske, R. Wieseke, M.W. Ganal, A. Van Deynze, J.P. Hamilton, C.R. Buell, M. Causse, S. Wijeratne, and D.M. Francis, 2012a. Development of a large SNP genotyping array and generation of high-density genetic maps in tomato. PLoS One 7:e40563.
Sim, S.C., A. Van Deynze, K. Stoffel, D.S. Douches, D. Zarka, M.W. Ganal, R.T. Chetelat, S.F. Hutton, J.W. Scott, R.G. Gardner, D.R. Panthee, M. Mutschler, J.R. Myers, and D.M. Francis, 2012b. High-density SNP genotyping of tomato (Solanum lycopersicum L.) reveals patterns of genetic variation due to breeding. PLoS One 7:e45520.
Soost, R.K. 1956. Davis, California 6).
Stigliani, A.L., G. Giorio, and C. D'Ambrosio, 2011. Characterization of P450 carotenoid beta- and epsilon-hydroxylases of tomato and transcriptional regulation of xanthophyll biosynthesis in root, leaf, petal and fruit. Plant Cell Physiol 52:851-865.
Stommel, J.R., 2001. USDA 97L63, 97L66, and 97L97: Tomato Breeding Lines with High Fruit Beta-carotene Content. HortScience 36:387-388.
Stommel, J.R. and K.G. Haynes, 1994. Inheritance of Beta Carotene Content in the Wild Tomato Species Lycopersicon cheesmanii. Journal of Heredity 85:401-404.
Strobel, M., J. Tinz, and H.K. Biesalski, 2007. The importance of beta-carotene as a source of vitamin A with special regard to pregnant and breastfeeding women. Eur J Nutr 46 Suppl 1:I1-20.

Tamura, K. and M. Nei, 1993. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10:512-526.
Tamura, K., D. Peterson, N. Peterson, G. Stecher, M. Nei, and S. Kumar, 2011. MEGA5: Molecular Evolutionary Genetics Analysis using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Molecular Biology and Evolution 28:2731-2739.
Tamura, K., G. Stecher, D. Peterson, A. Filipski, and S. Kumar, 2013. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and Evolution 30:2725-2729.
Tan, H.L., J.M. Thomas-Ahner, E.M. Grainger, L. Wan, D.M. Francis, S.J. Schwartz, J.W. Erdman, Jr., and S.K. Clinton, 2010. Tomato-based food products for prostate cancer prevention: what have we learned? Cancer Metastasis Rev 29:553-568.
Tanumihardjo, S.A., 2011. Vitamin A: biomarkers of nutrition for development. Am J Clin Nutr 94:658s-665s.
Tao, T., 2010. Standalone BLAST Setup for Unix. BLAST® Help. National Center for Biotechnology Information, Bethesda, Maryland.
Tomato Genome Consortium, 2012. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485:635-641.
Tomes, M.L., F.W. Quackenbush, and T.E. Kargl, 1956. Action of the Gene B in Biosynthesis of Carotenes in the Tomato. Botanical Gazette 117:248-253.
Tomes, M.L., F.W. Quackenbush, and M. McQuistan, 1954. Modification and Dominance of the Gene Governing Formation of High Concentrations of Beta-Carotene in the Tomato. Genetics 39:810-817.
Unlu, N.Z., T. Bohn, D. Francis, S.K. Clinton, and S.J. Schwartz, 2007a. Carotenoid absorption in humans consuming tomato sauces obtained from tangerine or high-beta-carotene varieties of tomatoes. J Agric Food Chem 55:1597-1603.
Unlu, N.Z., T. Bohn, D.M. Francis, H.N. Nagaraja, S.K. Clinton, and S.J. Schwartz, 2007b. Lycopene from heat-induced cis-isomer-rich tomato sauce is more bioavailable than from all-trans-rich tomato sauce in human subjects. Br J Nutr 98:140-146.
Van Het Hof, K.H., B.C. De Boer, L.B. Tijburg, B.R. Lucius, I. Zijp, C.E. West, J.G. Hautvast, and J.A. Weststrate, 2000. Carotenoid bioavailability in humans from tomatoes processed in different ways determined from the carotenoid response in the triglyceride-rich lipoprotein fraction of plasma after a single consumption and in plasma after four days of consumption. J Nutr 130:1189-1196.
Visscher, P.M., C.S. Haley, and R. Thompson, 1996. Marker-assisted introgression in backcross breeding programs. Genetics 144:1923-1932.
Welch, R.M., 2002. Breeding strategies for biofortified staple plant foods to reduce micronutrient malnutrition globally. The Journal of nutrition 132:495S-499S.
West, K.P., Jr., 2002. Extent of vitamin A deficiency among preschool children and women of reproductive age. J Nutr 132:2857s-2866s.

Yang, W. and D.M. Francis, 2005. Marker-assisted Selection for Combining Resistance to Bacterial Spot and Bacterial Speck in Tomato. Journal of the American Society for Horticultural Science 130:716-721.
Yeum, K.J. and R.M. Russell, 2002. Carotenoid bioavailability and bioconversion. Annu Rev Nutr 22:483-504.
Zechmeister, L., 1944. Cis-trans Isomerization and Stereochemistry of Carotenoids and Diphenyl-polyenes. Chemical Reviews 34:267-344.
Zhang, Y. and J.R. Stommel, 2000. RAPD and AFLP tagging and mapping of Beta (B) and Beta modifier (MoB), two genes which influence β-carotene accumulation in fruit of tomato (Lycopersicon esculentum Mill.). Theoretical and Applied Genetics 100:368-375.

Appendix A: SNPs used for background genome selection in BC1

                        Physical Map              Genetic Map (EXPEN 2012)
Locus                   Phy_Chr     Phy_Pos (bp)  Gen_Chr  Gen_Pos (cM)
solcap_snp_sl_8656      SL2.40ch01  2446525       CHR01    20.8427
solcap_snp_sl_8655      SL2.40ch01  2446729       CHR01    20.8427
solcap_snp_sl_456       SL2.40ch01  71090567      CHR01    46.3464
solcap_snp_sl_12359     SL2.40ch01  72800389      CHR01    46.3464
solcap_snp_sl_1814      SL2.40ch01  74289606      Unknown  Unknown
solcap_snp_sl_9758      SL2.40ch01  76901865      CHR01    62.5107
solcap_snp_sl_2202      SL2.40ch01  77916027      CHR01    67.2503
CL009293-0681           SL2.40ch01  86883070      CHR01    105.5407
C2_At2g15890_430_b      SL2.40ch01  87798243      CHR01    108.6871
solcap_snp_sl_53864     SL2.40ch01  89251742      CHR01    113.0908
solcap_snp_sl_12647     SL2.40ch02  21285067      CHR02    3.1582
solcap_snp_sl_36443     SL2.40ch02  30745716      CHR02    19.6423
solcap_snp_sl_20325     SL2.40ch02  31821502      CHR02    21.8513
solcap_snp_sl_20361     SL2.40ch02  32311842      CHR02    24.3783
solcap_snp_sl_8464      SL2.40ch02  33122579      CHR02    30.3639
solcap_snp_sl_66052     SL2.40ch02  34867654      CHR02    41.1056
241_2F_264_241_2b_60_b  SL2.40ch02  37376716      CHR02    55.3068
solcap_snp_sl_42346     SL2.40ch02  43978137      CHR02    80.8888
solcap_snp_sl_20052     SL2.40ch02  48684980      CHR02    106.7147
solcap_snp_sl_9703      SL2.40ch03  2372139       CHR03    35.2309
solcap_snp_sl_19508     SL2.40ch03  7988073       CHR03    39.6983
solcap_snp_sl_18967     SL2.40ch03  47216892      CHR03    45.9933
solcap_snp_sl_21685     SL2.40ch03  53165426      Unknown  Unknown
solcap_snp_sl_14714     SL2.40ch03  55799963      CHR03    51.9487
solcap_snp_sl_62467     SL2.40ch03  57379407      CHR03    61.1268
solcap_snp_sl_20723     SL2.40ch03  61134967      CHR03    87.9554
solcap_snp_sl_22017     SL2.40ch03  64397149      CHR03    105.0781
solcap_snp_sl_9856      SL2.40ch04  845964        CHR04    10.6127
solcap_snp_sl_21335     SL2.40ch04  2050817       CHR04    24.5454
solcap_snp_sl_34669     SL2.40ch04  2282123       CHR04    27.3758
solcap_snp_sl_21376     SL2.40ch04  2927162       CHR04    33.3951
solcap_snp_sl_25082     SL2.40ch04  3504202       CHR04    36.8932
solcap_snp_sl_1698      SL2.40ch04  7217389       CHR04    49.9849
solcap_snp_sl_14104     SL2.40ch04  53220275      CHR04    55.3543
solcap_snp_sl_47277     SL2.40ch04  61090901      CHR04    89.8029
solcap_snp_sl_47540     SL2.40ch04  62129471      CHR04    95.7946
solcap_snp_sl_29326     SL2.40ch04  63478301      Unknown  Unknown
solcap_snp_sl_19102     SL2.40ch05  1860363       CHR05    15.5909
solcap_snp_sl_48900     SL2.40ch05  3948419       CHR05    33.2853
solcap_snp_sl_23721     SL2.40ch05  4058965       CHR05    35.7991
solcap_snp_sl_16137     SL2.40ch05  59787643      CHR05    57.8982
solcap_snp_sl_37209     SL2.40ch05  61181080      CHR05    64.8383
solcap_snp_sl_12213     SL2.40ch05  61680264      CHR05    66.7953
solcap_snp_sl_16204     SL2.40ch05  62624468      CHR05    76.2602
solcap_snp_sl_15515     SL2.40ch06  1605168       CHR06    0.6210
solcap_snp_sl_34975     SL2.40ch06  3502385       CHR06    8.4762
solcap_snp_sl_25660     SL2.40ch06  33357839      CHR06    27.6403
solcap_snp_sl_24361     SL2.40ch06  34952415      CHR06    32.9887
Bcyc_868                SL2.40ch06  42288756      CHR06    63.4169
solcap_snp_sl_32223     unknown     unknown       CHR06    76.6629
solcap_snp_sl_11231     SL2.40ch07  681821        CHR07    0.0000
solcap_snp_sl_11171     SL2.40ch07  1890817       CHR07    0.0000
solcap_snp_sl_11082     SL2.40ch07  3460132       CHR07    7.6026
solcap_snp_sl_22065     SL2.40ch07  3745280       CHR07    11.7655
solcap_snp_sl_22770     SL2.40ch07  53094348      CHR07    30.4672
solcap_snp_sl_51838     SL2.40ch07  55250659      CHR07    34.8804
solcap_snp_sl_5840      SL2.40ch07  57417315      CHR07    42.1563
solcap_snp_sl_6370      SL2.40ch07  58657417      CHR07    43.4198
solcap_snp_sl_7028      SL2.40ch07  60957030      CHR07    44.3687
solcap_snp_sl_37030     SL2.40ch07  65051548      CHR07    74.7518
solcap_snp_sl_24384     SL2.40ch08  216751        CHR08    0.0000
solcap_snp_sl_5428      SL2.40ch08  3261235       CHR08    10.0912
solcap_snp_sl_4374      SL2.40ch08  53146592      CHR08    24.9273
solcap_snp_sl_29413     SL2.40ch08  54785908      CHR08    34.1149
solcap_snp_sl_21400     SL2.40ch08  56129389      CHR08    36.3947
solcap_snp_sl_34763     SL2.40ch08  58879636      CHR08    50.9402
solcap_snp_sl_15432     SL2.40ch08  60016886      CHR08    57.9484
solcap_snp_sl_28404     SL2.40ch09  651775        CHR09    5.3634
solcap_snp_sl_16579     SL2.40ch09  4645709       CHR09    40.8246
solcap_snp_sl_31883     SL2.40ch09  17881465      CHR09    45.2383
solcap_snp_sl_3429      SL2.40ch09  63185156      CHR09    56.9057
solcap_snp_sl_36856     SL2.40ch09  65167467      CHR09    70.8639
SGN-U574631_snp51901    SL2.40ch09  67274497      CHR09    93.1560
solcap_snp_sl_13202     SL2.40ch10  1163895       CHR10    7.5697
solcap_snp_sl_25001     SL2.40ch10  4142126       CHR10    35.0838
solcap_snp_sl_8000      SL2.40ch10  46999047      CHR10    38.2340
SL10386_455             SL2.40ch10  61574054      CHR10    55.9923
solcap_snp_sl_61192     SL2.40ch10  62487651      CHR10    62.3075
CL017176-0241           SL2.40ch10  63002657      CHR10    68.9189
solcap_snp_sl_8825      SL2.40ch10  63847892      CHR10    77.1731
solcap_snp_sl_8787      SL2.40ch10  64541757      CHR10    84.4528
solcap_snp_sl_15631     SL2.40ch11  746762        CHR11    3.1695
solcap_snp_sl_20987     SL2.40ch11  3763874       CHR11    28.9030
solcap_snp_sl_15284     SL2.40ch11  7241522       CHR11    48.4175
solcap_snp_sl_14367     SL2.40ch11  7715560       CHR11    48.7330
solcap_snp_sl_29505     SL2.40ch11  8626324       CHR11    49.9952
solcap_snp_sl_12406     SL2.40ch11  11934896      CHR11    51.8839
solcap_snp_sl_17549     SL2.40ch11  52053533      CHR11    87.3251
solcap_snp_sl_17550     SL2.40ch11  52053611      CHR11    87.3251
solcap_snp_sl_17571     SL2.40ch11  52801188      CHR11    93.4002
solcap_snp_sl_12656     SL2.40ch12  1971483       CHR12    24.8000
solcap_snp_sl_1572      SL2.40ch12  4038565       CHR12    42.3857
solcap_snp_sl_16803     SL2.40ch12  14055893      CHR12    51.0890
solcap_snp_sl_19345     SL2.40ch12  47221804      CHR12    58.0409
solcap_snp_sl_14428     SL2.40ch12  62436776      CHR12    69.1114
solcap_snp_sl_19393     SL2.40ch12  64479141      CHR12    92.7642

Appendix B: SNPs used for background genome selection in BC2

                        Physical Map              Genetic Map (EXPEN 2012)
Locus                   Phy_Chr     Phy_Pos (bp)  Gen_Chr  Gen_Pos (cM)
solcap_snp_sl_33745     SL2.40ch01  534448        Unknown  Unknown
CL009286-0792           SL2.40ch01  2442427       Unknown  Unknown
solcap_snp_sl_2234      SL2.40ch01  78525073      CHR01    70.7058
solcap_snp_sl_13404     SL2.40ch01  87977912      CHR01    109.6292
solcap_snp_sl_12647     SL2.40ch02  21285067      CHR02    3.1582
solcap_snp_sl_8464      SL2.40ch02  33122579      CHR02    30.3639
solcap_snp_sl_36224     SL2.40ch02  46994941      Unknown  Unknown
solcap_snp_sl_21862     SL2.40ch02  47981517      CHR02    101.3615
solcap_snp_sl_9703      SL2.40ch03  2372139       CHR03    35.2309
solcap_snp_sl_23192     SL2.40ch03  8663868       Unknown  Unknown
solcap_snp_sl_12718     SL2.40ch03  8904650       CHR03    39.6983
solcap_snp_sl_7942      SL2.40ch03  55982142      CHR03    52.8905
solcap_snp_sl_62495     SL2.40ch03  57216874      CHR03    59.5510
solcap_snp_sl_9856      SL2.40ch04  845964        CHR04    10.6127
solcap_snp_sl_21335     SL2.40ch04  2050817       CHR04    24.5454
solcap_snp_sl_21372     SL2.40ch04  2915842       CHR04    33.3951
solcap_snp_sl_27167     SL2.40ch04  4832005       CHR04    45.5694
solcap_snp_sl_1698      SL2.40ch04  7217389       CHR04    49.9849
solcap_snp_sl_5211      SL2.40ch04  16131381      CHR04    50.6219
solcap_snp_sl_18755     SL2.40ch04  18163912      CHR04    50.6219
solcap_snp_sl_53136     SL2.40ch04  53215634      Unknown  Unknown
solcap_snp_sl_36809     SL2.40ch04  57939715      CHR04    68.9419
solcap_snp_sl_23589     SL2.40ch04  63490336      Unknown  Unknown
solcap_snp_sl_19102     SL2.40ch05  1860363       CHR05    15.5909
solcap_snp_sl_13481     SL2.40ch05  3898119       CHR05    32.9747
solcap_snp_sl_29477     SL2.40ch05  4244941       Unknown  Unknown
solcap_snp_sl_29473     SL2.40ch05  4829681       CHR05    42.4247
solcap_snp_sl_22565     SL2.40ch05  59035173      CHR05    56.6364
solcap_snp_sl_12213     SL2.40ch05  61680264      CHR05    66.7953
solcap_snp_sl_15515     SL2.40ch06  1605168       CHR06    0.6210
solcap_snp_sl_65677     SL2.40ch06  1674079       CHR06    0.9315
solcap_snp_sl_25160     SL2.40ch06  2104973       CHR06    4.0790
solcap_snp_sl_2629      SL2.40ch06  30039946      Unknown  Unknown
solcap_snp_sl_12638     SL2.40ch06  36178958      Unknown  Unknown
solcap_snp_sl_27197     SL2.40ch06  36959813      Unknown  Unknown
solcap_snp_sl_17019     SL2.40ch06  37553460      Unknown  Unknown
solcap_snp_sl_11231     SL2.40ch07  681821        CHR07    0.0000
solcap_snp_sl_22109     SL2.40ch07  1819971       Unknown  Unknown
solcap_snp_sl_22065     SL2.40ch07  3745280       CHR07    11.7655
solcap_snp_sl_7025      SL2.40ch07  60967654      CHR07    44.3687
solcap_snp_sl_19759     SL2.40ch08  393921        Unknown  Unknown
solcap_snp_sl_7305      SL2.40ch08  711380        CHR08    0.0000
solcap_snp_sl_7386      SL2.40ch08  2849019       CHR08    4.7347
solcap_snp_sl_21461     SL2.40ch08  58492892      CHR08    50.0087
solcap_snp_sl_25111     SL2.40ch08  59008932      Unknown  Unknown
solcap_snp_sl_15432     SL2.40ch08  60016886      CHR08    57.9484
solcap_snp_sl_17525     SL2.40ch09  259326        Unknown  Unknown
solcap_snp_sl_17481     SL2.40ch09  629861        Unknown  Unknown
solcap_snp_sl_7775      SL2.40ch09  1221555       CHR09    10.7171
solcap_snp_sl_26683     SL2.40ch09  2389435       CHR09    23.0565
solcap_snp_sl_69978     SL2.40ch09  66989065      CHR09    89.5929
solcap_snp_sl_63641     SL2.40ch09  67362854      CHR09    95.7237
solcap_snp_sl_63588     SL2.40ch09  67653341      CHR09    96.6659
solcap_snp_sl_13202     SL2.40ch10  1163895       CHR10    7.5697
solcap_snp_sl_34373     SL2.40ch10  3991802       CHR10    35.0838
solcap_snp_sl_61192     SL2.40ch10  62487651      CHR10    62.3075
CL017176-0241           SL2.40ch10  63002657      CHR10    68.9189
solcap_snp_sl_8835      SL2.40ch10  63664772      CHR10    75.9148
solcap_snp_sl_15094     SL2.40ch10  63928029      CHR10    78.7486
solcap_snp_sl_8787      SL2.40ch10  64541757      CHR10    84.4528
solcap_snp_sl_21829     SL2.40ch11  496383        CHR11    1.2743
solcap_snp_sl_21035     SL2.40ch11  4173515       CHR11    31.4029
solcap_snp_sl_6905      SL2.40ch11  7843871       Unknown  Unknown
solcap_snp_sl_706       SL2.40ch11  10741171      Unknown  Unknown
SL10890_654             SL2.40ch11  51606430      Unknown  Unknown
solcap_snp_sl_3163      SL2.40ch12  621614        CHR12    5.6645
solcap_snp_sl_12656     SL2.40ch12  1971483       CHR12    24.8000
solcap_snp_sl_1572      SL2.40ch12  4038565       CHR12    42.3857
solcap_snp_sl_9707      SL2.40ch12  5718726       CHR12    49.1957
solcap_snp_sl_16795     SL2.40ch12  10579861      CHR12    51.0890
solcap_snp_sl_19345     SL2.40ch12  47221804      CHR12    58.0409
solcap_snp_sl_23507     SL2.40ch12  54473238      Unknown  Unknown
