Exploring metabolic and genetic diversity in secondary metabolites

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

in the Graduate School of The Ohio State University

By

Michael Paul Dzakovich

Graduate Program in Horticulture and Crop Science

The Ohio State University

2020

Dissertation Committee

Dr. Jessica Cooperstone, Advisor

Dr. David Francis

Dr. Ken Lee

Dr. Dave Mackey

Dr. Leah McHale


Copyrighted by

Michael Paul Dzakovich

2020


Abstract

Tomatoes (Solanum lycopersicum) are an economically and nutritionally important crop that biosynthesizes a multiplicity of metabolites, also known as phytochemicals. Through epidemiological studies, it has been posited that certain phytochemicals present in tomatoes, such as carotenoids, may be responsible for observed health benefits, including reduced risk for certain cancers. However, whole tomato consumption offers additional benefits over supplementation with individual phytochemicals. The overarching focus of my dissertation research is to develop novel methodologies to measure tomato phytochemicals of interest, determine how tomato phytochemicals affect mammalian tissues where they are deposited, and explore the diversity and genetic basis for biosynthesis of tomato steroidal glycoalkaloids.

The specific objectives of my dissertation are: 1) Develop a high-throughput extraction and analysis workflow suitable for plant breeders to determine how carotenoid pathway intermediates are affected by natural variation; 2) Determine how dietary tomato consumption affects metabolism by quantifying transcriptome and metabolome alterations in mouse liver tissue; 3a) Develop and validate a high-throughput method to extract and quantify potentially health-promoting tomato steroidal alkaloids; and 3b) Quantify natural phenotypic variation of potentially health-promoting tomato steroidal alkaloids and describe the underlying genetic architecture.


For objective 1, I hypothesized that a high-throughput extraction and analysis method for tomato carotenoids could be developed that would produce data comparable to traditional methods more quickly. The methods developed can extract 12 samples/hour and separate phytoene, phytofluene, β-carotene (and isomers), and all-trans-lycopene (and isomers) in 4.2 minutes (Dzakovich et al., 2019). The novel methodology developed as part of this objective resulted in the fastest available chromatographic method that separates typical tomato carotenoids. Subpopulations of tomatoes with varying carotenoid profiles could be discerned using both methods, making this workflow suitable for breeders who need to make fast, data-driven decisions.

In objective 2, I hypothesized that tomato consumption would affect gene expression by way of altering the chemical composition of the liver. Overall, tomato consumption had a small effect on gene transcription, though among the differentially expressed genes, those related to Phase I/II xenobiotic metabolism were differentially regulated by tomato consumption and type. Moreover, the chemical profiles of liver tissue were altered by tomato consumption, and I confirmed the identities of two steroidal alkaloids with authentic standards. Additionally, I tentatively identified 17 masses of Phase I/II tomato steroidal alkaloid metabolites. These compounds may be bioactive in vivo and many are reported for the first time.

For objective 3a, I developed and validated a comprehensive, quantitative extraction and mass spectrometry-based analysis method for tomato steroidal alkaloids and analyzed a variety of tomato-based products commonly consumed by Americans and available in grocery stores. The method I developed was able to extract 16 samples in 20 minutes and separate 18 steroidal glycoalkaloids derived from 9 masses in 13 minutes.

In objective 3b, I assembled a diversity panel composed of 107 accessions of red-fruited tomato species selected to maximize genetic diversity, and applied my high-throughput steroidal alkaloid extraction and analysis workflow. I hypothesized that steroidal alkaloids would be most diverse in wild tomato germplasm. A genome-wide association study (GWAS) revealed quantitative trait loci (QTL) associated with various steroidal alkaloids, and these QTL were confirmed in a separate biparental mapping population. Early and late steroidal alkaloid pathway intermediates were differentially regulated by QTL on chromosome 3 (early) and chromosomes 10 and 11 (late).

Information generated from this hypothesis added to the incomplete literature surrounding tomato steroidal alkaloids and created genetic resources useful for creation of germplasm designed for nutritional intervention studies.

The findings in this dissertation have led to the development of new analytical methodologies, revealed novel information about the chemodiversity of tomato phytochemicals in both fruits and mammalian tissue, identified wild sources of diversity in tomato steroidal alkaloids, and produced genetic material that can be used both for fine-mapping experiments and as tomato material in clinical trials testing hypotheses about tomato consumption and human health.


Dedication

“My dear [committee],

(…)

But I am very poorly today & very stupid & hate everybody & everything. One lives only to make blunders. I am going to write a little book (…) on [tomatoes] & today I hate them worse than everything so farewell & in a sweet frame of mind, I am ever yours.”

-Charles Darwin, October 1st, 1861 (with slight modification)

Dedicated to those having a bad day, feeling stupid, or living only to make blunders…


Acknowledgments

Firstly, a big thank you to my advisor, Dr. Jessica Cooperstone, for being a constant source of encouragement, kindness, reason, and in some parts of the year, Long Island bagels. Although my PhD program has been one of the most challenging periods of my life, I could not have asked for a better advisor. Your advice and experience have helped shape me into a better scientist. Perhaps more importantly, consistently being someone who is genuinely empathetic and a good listener has also helped shape me into a better person. I would also like to thank the other members of my committee: Dr. David Francis, Dr. Ken Lee, Dr. David Mackey, and Dr. Leah McHale for their time and advice over the years. A particular thank you to Dr. David Francis, who advised me in the early part of my PhD program and helped teach me valuable time management, data analysis, and experimental design skills. I would not have been able to reach this point without his guidance.

A special thanks to collaborators on many of these projects, with an emphasis on Dr. Jennifer Thomas-Ahner, who helped me design my mouse liver transcriptomics/metabolomics project as well as craft a funded proposal for that work. I would also like to thank the hardworking members of the North Central Agriculture Research Station who helped manage literally thousands of tomato plants that were integral to my dissertation research.


I would also like to thank my lab mates Emma Bilbrey, Mallory Goggans, Jordan Hartman, Jenna Miller, and Matt Teegarden for their support, humor, and individual personalities. Particularly during the late stages of our degrees, we all came together and helped each other in various ways. All of you brought something special to the table and I look forward to watching you all continue to flourish.

While I spent most of my time in the lab or working, my time outside of my PhD program was critical to helping me stay balanced and centered. A special thanks to Johnny Jackson, who helped me regain the use of my left arm/shoulder after my bike accident, and Adam Zevchik, who, relatively gently, introduced me to the world of powerlifting. I appreciate both of you helping me tap into a side of myself I ignored for most of my life. An extra special thanks to Bruce Becker, a flamenco cantaor I have been accompanying since 2009, for giving me an extra reason to visit home and immersing me in a culture and way of thinking that’s different from my own. Te extraño todos los días.

Lastly, I would like to thank my family, friends, and partner Christine Adams for their unwavering support during some of the most difficult parts of my dissertation. Without these people, I am certain that I would not have made it to the end (and would have quit in spring 2016, in all honesty). I appreciate all of you more than you know and will always be willing to return the favor.


Vita

2019-present – Graduate Research Associate, Dept. of Horticulture and Crop Science, The Ohio State University

2017-2019 – USDA National Needs Research Fellow, Dept. of Horticulture and Crop Science, The Ohio State University

2015-2017 – Graduate School Fellow, Dept. of Horticulture and Crop Science, The Ohio State University

2015 – M.S. Horticultural Science – Purdue University

2013-2015 – Graduate Research Fellow, Dept. of Horticulture and Landscape Architecture, Purdue University

2013 – B.S. Horticultural Science – Purdue University

Publications

Dzakovich, M.P., Hartman, J.L., Cooperstone, J.L. (2020). A High-Throughput Extraction and Analysis Method for Steroidal Glycoalkaloids in Tomato. Front. Plant Sci. 11. doi: 10.3389/fpls.2020.00767

Shetge, S.A., Dzakovich, M.P., Cooperstone, J.L., Kleinmeier, D., and Redan, B.W. (2020). Concentrations of the Opium Alkaloids Morphine, Codeine, and Thebaine in Poppy Seeds Are Reduced After Thermal and Washing Treatments but Are Not Affected When Incorporated in a Model Baked Product. Ag. & Food Chem. 68(18):5241-5248. doi: 10.1021/acs.jafc.0c01681

Dzakovich, M.P., Gas-Pascual, E., Orchard, C.J., Sari, E.N., Riedl, K.M., Schwartz, S.J., Francis, D.M., Cooperstone, J.L. (2019). Analysis of tomato carotenoids: comparing extraction and chromatographic methods. J of AOAC Int. 102(4):1069-1079. doi: 10.5740/jaoacint.19-0017

Dzakovich, M.P., Gómez, C., Ferruzzi, M.G., Mitchell, C.A. (2017). Chemical and sensory properties of greenhouse tomatoes remain unchanged in response to red, blue, and far-red supplemental light from light emitting diodes. Hort. Sci. 52(12):1734-1741. doi: 10.21273/HORTSCI12469-17

Dzakovich, M.P., Ferruzzi, M.G., Mitchell, C.A. (2016). Manipulating sensory and phytochemical profiles of greenhouse tomatoes using environmentally relevant doses of ultraviolet radiation. Ag. & Food Chem. 64(36):6801-6808. doi: 10.1021/acs.jafc.6b02983

Dzakovich, M.P., Gómez, C., Mitchell, C.A. (2015). Tomatoes grown with light-emitting diodes or high-pressure sodium supplemental lights have similar fruit-quality attributes. Hort. Sci. 50(10):1498-1502. doi: 10.21273/HORTSCI.50.10.1498

Mitchell, C.A., Burr, J.F., Dzakovich, M.P., Gómez, C., Lopez, R., Hernández, R., Kubota, C., Currey, C.J., Meng, Q., Runkle, E.S., Bourget, C.M., Morrow, R.C., Both, A.J. (2015). Light-emitting diodes in horticulture. Hort. Reviews 43(1):1-87. doi: 10.1002/9781119107781.ch01

Fields of Study

Major Field: Horticulture and Crop Science


Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures
Chapter 1. Literature Review
Tomato History at a Glance
1.1 Tomato Phytochemicals and Their Biosynthesis
1.1.1 Carotenoid Biosynthesis
1.1.2 Phenolic Acids and Flavonoids
1.1.3 Tomato Steroidal Glycoalkaloid Biosynthesis
1.2 Tomato Phytochemicals and Human Health
1.2.1 Carotenoid Absorption and Potential Health Benefits
1.2.2 Phenolic Acids and Flavonoids
1.2.3 Tomato Steroidal Glycoalkaloids
1.2.4 Liver Diseases
1.3 Quantifying Tomato Phytochemicals and Assessing their Biological Activity
1.3.1 Targeted Assays
1.3.2 Untargeted Metabolomics
1.3.3 RNA-seq
1.3.4 Integration of Metabolomic and Transcriptomic Data
1.4 Objectives:
1.4.1 Objective 1: Develop a high-throughput extraction and analysis workflow suitable for plant breeders to determine how carotenoid pathway intermediates are affected by alleles of Beta and tangerine
1.4.2 Objective 2: Determine how dietary tomato consumption affects metabolism by quantifying transcriptome and metabolome alterations in mouse liver tissue.
1.4.3 Objective 3a: Develop and validate a high-throughput method to extract and quantify potentially health-promoting tomato steroidal alkaloids.
1.4.4 Objective 3b: Quantify the range of natural phenotypic variation of potentially health-promoting tomato steroidal alkaloids and describe the underlying genetic architecture.
Chapter 2. Analysis of Tomato Carotenoids: Comparing Extraction and Chromatographic Methods
2.1 Abstract
2.1 Introduction
2.2 Methods
2.2.1 Plant Material
2.2.2 Experimental Design
2.2.3 Chemical Reagents
2.2.4 Standard Carotenoid Extraction
2.2.5 Rapid Carotenoid Extraction
2.2.6 Standard HPLC-DAD Analysis
2.2.7 UHPLC-DAD Analysis
2.2.8 Statistical Analysis
2.3 Results and Discussion
2.3.1 Extraction Methods
2.3.2 HPLC-DAD and UHPLC-DAD Analysis Methods
2.4 Conclusion
Chapter 3. The effects of tomato consumption on the transcriptome and metabolome of murine liver
3.1 Abstract
3.2 Introduction
3.3 Materials and Methods
3.3.1 Reagents and standards
3.3.2 Animal Diets and Experimental Design
3.3.3 Experimental Design
3.3.4 RNA Extraction
3.3.5 cDNA Library Preparation and RNA-Seq Data Acquisition
3.3.6 Analysis of RNA-Seq Data
3.3.7 Extraction of Polar Metabolites
3.3.8 Untargeted Metabolomics Data Collection
3.3.9 Analysis of Untargeted Metabolomics Data
3.3.10 Dataset Availability
3.4 Results and Discussion
3.4.1 Animal weights and tissue mass
3.4.2 Tomato consumption and type influence the transcriptome of mouse liver tissue
3.4.3 The chemical landscape of mouse liver tissue is affected by tomato consumption and less so by type
Chapter 4. A High-Throughput Extraction and Analysis Method for Steroidal Glycoalkaloids in Tomato
4.1 Abstract
4.2 Introduction
4.3 Materials and Methods
4.3.1 Chemical and reagents
4.3.2 Sample material
4.3.3 Extraction of tSGAs
4.3.4 UHPLC-MS/MS Quantification of tSGAs
4.3.5 UHPLC-QTOF/MS Confirmation of tSGA Identities
Limit of Detection (LOD) and Limit of Quantification (LOQ)
4.3.6 Spike Recovery Experiments
4.3.7 Intra/Interday Variability Experiments
4.4 Results and Discussion:
4.4.1 Development of high-throughput extraction method
4.4.2 Selection of Precursor Ions
4.4.3 Use of Internal Standards
4.4.4 Optimization of MS parameters
4.4.5 Development of Chromatographic Gradient
4.4.6 Confirmation of Analytes using High-Resolution Mass Spectrometry
4.4.7 LOD and LOQ
4.4.8 Spike Recovery
4.4.9 Intra/Interday Variability:
4.4.10 12-hour Stability Experiment:
4.4.11 Grocery Store Survey
4.5 Conflict of Interest
4.6 Author Contributions
4.7 Funding
4.8 Acknowledgements
4.9 Data Availability Statement
Chapter 5. Regulation of Steroidal Alkaloid Pathway Intermediates Differs Among Tomatoes in The Red-Fruited Clade
5.1 Abstract
5.2 Introduction
5.3 Results
5.3.1 Wild tomato species exhibit steroidal alkaloid chemical diversity
5.3.2 Cultivated material lacks diversity in steroidal alkaloids relative to wild accessions and early pathway intermediates drive separation in our diversity panel
5.3.3 Concentrations of steroidal alkaloids correlate and define early and late stages of the pathway
5.3.4 Diversity in tomato steroidal glycoalkaloids is largely under genetic control
5.3.5 Association determined by GWAS are validated in a biparental population
5.4 Discussion
5.5 Materials and Methods
5.5.6 Plant Material
5.5.7 Field Trial
5.5.8 Inbred Backcross Population Development
5.5.9 Chemical Reagents
5.5.10 Steroidal Alkaloid Profiling
5.5.11 Genotyping
5.5.12 Statistical Analysis
5.5.13 GWAS
5.5.14 QTL Analysis
Bibliography
Appendix A. Supplemental Figures for Chapter 5
Appendix B. Supplemental Tables for Chapter 5


List of Tables

Table 2.1 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in tomatoes grown in multiple locations as a function of background, Beta allele, and extraction method.

Table 2.2 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in processing tomatoes grown in multiple locations as a function of tangerine allele and extraction method.

Table 2.3 Proportion of variance explained by genetics, environment, and methodology for all carotenoids measured.

Table 2.4 Regression analysis for both extraction and analysis methods.

Table 2.5 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in tomatoes grown in multiple locations as a function of background, Beta allele, and analysis method.

Table 3.1 Mean ± standard deviation of mouse body and liver mass at sacrifice (10 wk).

Table 3.2 Gene set overlaps enriched in differentially expressed genes from mice fed tomato supplemented diets (Continued).

Table 3.3 Select differentially expressed genes related to xenobiotic metabolism and mammalian circadian rhythm that were differentially expressed (Q < 0.1) among treatment groups (Continued).

Table 3.4 Partial least squares discriminant analysis model performance statistics.

Table 3.5 Tomato steroidal alkaloid-derived metabolites present in liver tissue from tomato-fed (red and tangerine) mice that were absent in control animals (Continued).

Table 4.1 LC-MS/MS MRM parameters of steroidal glycoalkaloids quantified by our method.

Table 4.2 UHPLC-QTOF-MS confirmation of tSGA identities (Continued).

Table 4.3 Extraction efficiency of commercially available tSGAs and -derived internal standards.

Table 4.4 Intraday and interday coefficient of variation values for analytes quantified by our UHPLC-MS/MS method.

Table 4.5 Survey of tSGAs in common tomato-based products reported in µg per serving size.

Table 5.1 Percentage of total variance due to the contribution of genetics and environment for steroidal alkaloid content.

Table 5.2 Markers associated (P < 0.01) with tomato steroidal alkaloids in the BC1S1 validation population. Marker means reported in µg/100g fresh weight (Continued).

B.1. Tomato steroidal alkaloid concentrations (in µg/100g fresh weight, ± standard deviation) in parental material and wide cross hybrids (F1 generation).

B.2. Metadata information for diversity panel germplasm (Continued).

B.3. Means plus or minus standard deviations of steroidal glycoalkaloids for each genotype represented in the diversity panel (Continued).


List of Figures

Figure 1.1 Graphical representation of the gene encoding CRTISO. The tangerine alleles t3183 and t3002 are featured. Based on Isaacson et al. (2002).

Figure 2.1 Graphical representation of data analysis strategy. Linear models, detailed in the “Statistical Analysis” section, were used to determine if sub-populations should be analyzed separately due to inherent differences in carotenoid composition or concentrations. Arrows with solid black tails indicate comparisons made between extraction or analysis methods.

Figure 2.2 Principal components analysis (PCA) of tomatoes extracted using the standard or rapid method (A) and tomatoes analyzed by HPLC-DAD or UHPLC-DAD (B). Individuals with alleles of tangerine were not included in Figure 2.2B due to an inability to resolve -carotene, neurosporene, and tetra-cis-lycopene by UHPLC-DAD. Sub-population clustering was similar regardless of extraction or analysis method.

Figure 2.3 Chromatograms of tomatoes carrying an allele of Beta (LA716) generated by HPLC-DAD and UHPLC-DAD (inset and scaled for difference in run time). Carotenoids quantified included: 1.) Phytoene; 2.) Phytofluene; 3.) β-carotene; 3a.) β-carotene isomers; 4.) all-trans-lycopene; 4a.) cis-lycopene isomers. Traces indicate DAD wavelengths 286 nm, 348 nm, and 471 nm.

Figure 3.1 Boxplots of log2 transformed and TMM normalized RNA-Seq (A); Multidimensional scaling analysis of RNA-Seq data before (B) and after (C) merging data generated on two sequencing lanes; and scatter plot of calculated dispersion of the design matrix used in the analysis of RNA-Seq data (D).

Figure 3.2 Venn diagrams displaying counts of overlapping upregulated (A) and downregulated (B) genes following differential expression analysis of 14,951 genes.

Figure 3.3 Boxplots of log2 transformed and Pareto scaled untargeted metabolomics data (including 7 QC samples on the right side) (A) and principal components analysis scores plots visualizing data structure from untargeted metabolomics (ESI+) data including quality control samples (B). Principal components analysis loading plots with samples colored by assigned k-means cluster groups for red, tangerine, and control (C), and tomato and control (D).

Figure 3.4 Volcano plot of features detected by untargeted metabolomics (ESI+) comparing mice fed diets enriched with tomatoes to control. Features with a -log10 FDR-adjusted P-value above 2 and a log2 fold change greater than 1 were colored red (i.e., higher in mice fed tomatoes). Features with a -log10 FDR-adjusted P-value above 2 and a log2 fold change less than -1 were colored green (i.e., higher in mice fed control diets).

Figure 3.5 Venn diagram displaying counts of significantly different features overlapping between treatment groups following the analysis of 2,492 detected features.

Figure 3.6 Random forest model tuning parameters used for untargeted metabolomics (ESI+) feature selection. Model error was estimated as a function of number of trees (A) and number of variables randomly selected at each node (B) when classifying red, tangerine, and control samples. The same parameters were tested for a random forest classification model with tomato and control samples (C and D).

Figure 3.7 Heatmap of features detected by untargeted metabolomics (ESI+) with variable importance scores > 1.0 generated by a PLS-DA model. Hierarchical clustering was used to horizontally categorize samples and vertically group features using Euclidean distances and Ward’s linkage method.

Figure 3.8 Spectra of dihydroxytomatidine (A) and sulfated hydroxytomatidine (B) derived from MSMS experiments conducted on a UHPLC-QTOF/MS using a collision energy of 30 eV. Double bonds and functional groups are provisionally assigned.

Figure 4.1 Structural and isomeric variation in selected tomato steroidal alkaloids. Steroidal glycoalkaloids found in tomato (tSGAs) are spirosolane-type with variations in a singular double-bond (C5:6), F-ring decorations (C22-C27), F-ring rearrangement (resulting in a change in stereochemistry at C22), and C3 glycosylation (typically a four-sugar tetrasaccharide, lycotetraose). The undecorated SA steroidal alkaloid backbone is shown first with relevant carbons numbered and ring names (A-F). Steroidal alkaloids were grouped based on structural similarity with bonds of varying stereochemistry denoted by wavy bonds and varying C5:6 saturation status denoted by a dashed bond. Structural variation, along with the monoisotopic mass, molecular formula, and common name are displayed alongside structures for each group. R-groups were used to denote status of C3 glycosylation (R1 and R2) and possible positions of glucosylation on glucosylated (dehydro)acetoxytomatine (R3, R4). All possible isomers and derivatives are not shown, just those quantitated in this method.

Figure 4.2 Chromatogram of tSGAs found in red ripe tomatoes measured by our UHPLC-MS/MS method. Peaks are identified as follows: 1a–c: Esculeoside B and isomers; 2a–d: Hydroxytomatine; 3: Dehydrolycoperoside F, G, or Dehydroesculeoside A; 4a,b: Lycoperoside F, G, or Esculeoside A; 5a–c: Acetoxytomatine; 6a,b: Dehydrotomatine; 7: Alpha-; 8: Alpha-solanine; 9: Solanidine; 10: Tomatidine; 11: Dehydrotomatidine.

Figure 5.1 Box and whisker plots of alpha-tomatine and lycoperoside F, G, or esculeoside A. Each dot represents an individual observation. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study. Distinct sub-groups of wild cherry tomatoes can be observed for lycoperoside F, G, or esculeoside A.

Figure 5.2 Principal components analysis scores plot (A) and corresponding loading plots (B) for 107 genotypes represented in the diversity panel. Loadings represent vectors of steroidal alkaloids phenotyped in the population and their direction/magnitude indicate their influence on a given principal component. Wild accessions (e.g. S. pimpinellifolium) exhibited greater diversity in steroidal alkaloids relative to processing accessions. Two distinct subgroups of wild cherry tomatoes appeared to separate based on compounds found in different halves of the tomato steroidal alkaloid pathway.

Figure 5.3 Correlation matrix of all tomato steroidal alkaloids quantified in diversity panel. Size and darkness of circle indicate intensity of correlation coefficient (see legend on right) and *, **, and *** indicate statistical significance at P<0.05, 0.01, and 0.001, respectively. Cells with no significance indicator were found to be P>0.05. Pathway intermediates tended to correlate strongly with neighboring metabolites in the proposed biosynthetic pathway. All analytes correlated with “total” tomato steroidal alkaloids to varying degrees.

Figure 5.4 Manhattan plots of steroidal alkaloids used in GWAS. Red colored bars indicate SNPs above a significance threshold of -log(P-values) of 2.

Figure 5.5 Manhattan plots of steroidal alkaloids used in QTL analysis on the BC1S1 validation population. Red colored bars indicate SNPs above a significance threshold of -log(P-values) of 2.

A.1. Map of South and Central America displaying locations where Solanum pimpinellifolium and Solanum lycopersicum var. cerasiforme were collected.

A.2. Additional box and whisker plots of steroidal alkaloids measured in diverse germplasm. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study.

A.3. Additional box and whisker plots of steroidal alkaloids measured in diverse germplasm. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study.

A.4. Correlation matrix of all tomato steroidal alkaloids and their isomers quantified in diversity panel. Size and darkness of circle indicate intensity of correlation coefficient (see legend on right) and *, **, and *** indicate statistical significance at P<0.05, 0.01, and 0.001, respectively. Cells with no significance indicator were found to be P>0.05.


Chapter 1. Literature Review

Tomato History at a Glance

The modern tomato (Solanum lycopersicum) is descended from the wild currant tomato (Solanum pimpinellifolium) (Blanca et al., 2015, 2012). The primary domestication event likely occurred in the Andes region of Southern Ecuador and Northern Peru with a subsequent event in Mexico (Blanca et al., 2015). How the tomato arrived in Central America is poorly understood, but some have hypothesized that various herbivores ranging from birds to turtles facilitated the spread of tomato (Smith, 1994), though human movement cannot be ruled out. It has long been postulated that Solanum lycopersicum var. cerasiforme, a genetic admixture, was the ancestor to the domesticated Solanum lycopersicum (Ranc et al., 2008). However, recent reports challenge that assumption by presenting evidence that Solanum lycopersicum var. cerasiforme may have predated domestication (Razifard et al., 2020).

During the 16th century, tomatoes were brought to Europe by Spanish explorers (Jenkins, 1948). Within this time period, tomatoes became integrated within cuisines of the Mediterranean as well as Southeast Asia (Smith, 1994). The botanist Pietro Andrea Matthioli is credited with coining “mala aurea” (golden apple) as an early common name for the tomato. Matthioli classified tomatoes and other Solanaceous crops that were brought to Europe as being related to mandrakes, which were known for aphrodisiac properties. Unfortunately, mandrakes also had the reputation of being poisonous, which tarnished the reputation of tomatoes. In the mid-16th century, Matthioli used the term pomi d’oro, which is the root of the common name for a tomato in modern Italy (Smith, 1994).

The modern tomato would likely be unrecognizable to the explorers who brought this plant to Europe. Tomato breeding programs in the 20th century focused largely on yield, disease resistance, ease of shipping, changes in plant habit to fit production systems, and ripening uniformity (Georgelis et al., 2004; Powell et al., 2012). Modern tomatoes have only approximately 5% of the genetic diversity of wild species because of the major bottlenecks that occurred during domestication (Bai and Lindhout, 2007; Blanca et al., 2015; Miller and Tanksley, 1990). Restricted genetic variation has been recognized as an issue for nearly 100 years, and tomato breeders have therefore used wild relatives to introgress unique alleles since the early 20th century (Hawkes, 1977; Zamir, 2001). Likely because of this breeding emphasis, cultivated tomatoes have greater genetic diversity than heirlooms or vintage varieties (Blanca et al., 2015; S.-C. Sim et al., 2012b). In addition to serving as a source of disease resistance loci, wild species have provided alleles that affect fruit quality, including sugar content (Matsukura, 2016) and nutritional value (Dalal et al., 2010; Lincoln and Porter, 1950; Ronen et al., 2000; Stommel, 2001; Stommel and Haynes, 1994).


1.1 Tomato Phytochemicals and Their Biosynthesis

Tomatoes are a rich source of vitamin C, manganese, provitamin A, vitamin K, and potassium as well as vitamin E, folate, niacin, magnesium, vitamin B6, copper, thiamin, and phosphorus (USDA-ARS, 2014). It should be noted that approximately 75% of tomato-based foods are consumed after some form of processing (e.g. tomato paste, tomato sauce, etc.) (USDA National Agricultural Statistics, 2012), which has shaped the priorities of tomato breeding programs. While recent consumer trends indicate that less-processed foods are preferred, processing steps such as heating can destroy cell walls and alter the physical properties of various phytochemicals, thereby increasing their bioaccessibility and bioavailability (Gärtner et al., 1997; Stahl et al., 1992; Tonucci et al., 1995). However, compounds such as vitamin C and various flavonoids can be destroyed concurrently (Georgé et al., 2011; Vallverdú-Queralt et al., 2012). Thus, research into processing techniques that preserve biological compounds beneficial to human health and nutrition is critical.

Tomato fruits accumulate carotenoids (including provitamin A carotenoids), vitamin C, phenolic acids, and flavonoids, and these phytochemicals are frequent targets for tomato breeding programs. These compounds have been associated through epidemiological studies with preventing an array of chronic diseases (Frusciante et al., 2007; Giovannucci et al., 1995; Raiola et al., 2014) and improving brain health (Spencer, 2009a, 2009b). From a plant genetics perspective, these compounds have been positively selected for in some cases as a means to defend the plant, protect future generations of progeny, and provide a dietary incentive for seed dispersers (Lewinsohn and Gijzen, 2009). Additionally, these compounds are produced to ameliorate the effects of oxidative stress in plants, which can come from normal metabolic processes as well as high light and temperatures (Gautier et al., 2005; Giliberto et al., 2005; Liu et al., 2004; Torres et al., 2006). To satisfy the demand for healthy and nutritious tomatoes by consumers, research has focused both on environmental manipulation and breeding strategies to improve the health benefits of tomato-based food.

1.1.1 Carotenoid Biosynthesis

Carotenoids are the most recognizable class of phytochemicals present in tomatoes. Responsible for the red, orange, and yellow coloration of tomato fruits, these non-polar compounds have been associated with benefiting human health, potentially through antioxidant mechanisms, pro-vitamin A activity (Story et al., 2010), and yet-to-be-defined mechanisms. However, carotenoids are not only present within tomato fruits. Carotenoids are pivotal for the function of the light harvesting complex and protect lipid membranes and proteins from photo-oxidation (Demmig-Adams et al., 1996). Depending on the light environment that a plant is suited for, carotenoid profiles in above-ground biomass can vary tremendously (Demmig-Adams and Adams, 1992). In plants, as well as some species of algae and fungi, over 600 carotenoids have been characterized (H. Gerster, 1997) and there are thousands of theoretical isomers (Zechmeister et al., 1941). Understanding the biosynthesis of carotenoids is one way for breeders to increase the bioactive properties of crops through manipulating carotenoid content.

Carotenoids are assembled from C5 isoprene molecules derived from two pathways: the cytosol-based mevalonic acid pathway (MVA) or the plastid-localized methylerythritol 4-phosphate (MEP) pathway (Logan et al., 2000). While both pathways generate isoprene-based molecules and are spatially independent, some cross-talk occurs (Hemmerlin et al., 2003). However, substrates used in the biosynthesis of carotenoids are primarily derived from the MEP pathway (Moise et al., 2014). A key study published in 1995 found that plants overexpressing 3-hydroxy-3-methylglutaryl coenzyme A reductase (HMGR), an enzyme early in the MVA pathway, had altered levels of sterols, but not carotenoids, chlorophylls, or sesquiterpene defense compounds (Chappell et al., 1995). This study provided key evidence of the existence of two separate pathways.

Isoprene biosynthesis via the MEP pathway begins with the condensation of pyruvate and glyceraldehyde-3-phosphate, both products of glycolysis (Estévez et al., 2001). This reaction is catalyzed by 1-deoxy-D-xylulose 5-phosphate synthase (DXS) and generates 1-deoxy-D-xylulose 5-phosphate (Grassi et al., 2013). Through the action of 1-deoxy-D-xylulose 5-phosphate reductoisomerase (DXR), 2-C-methyl-D-erythritol 4-phosphate is produced (Grassi et al., 2013). From there, 2-C-methyl-D-erythritol 4-phosphate cytidyltransferase (MCT) catalyzes the reaction that generates 4-diphosphocytidyl-2-C-methyl-D-erythritol (Grassi et al., 2013). This product is converted into 4-diphosphocytidyl-2-C-methyl-D-erythritol 2-phosphate via 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase (CMK) (Grassi et al., 2013). 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase (MDS) activity then generates 2-C-methyl-D-erythritol 2,4-cyclodiphosphate (Grassi et al., 2013). 4-hydroxy-3-methylbut-2-en-1-yl diphosphate synthase (HDS) then yields 4-hydroxy-3-methylbut-2-en-1-yl diphosphate, which can be converted to either dimethylallyl diphosphate or isopentenyl-diphosphate (IPP) via 4-hydroxy-3-methylbut-2-enyl diphosphate reductase (HDR) (Grassi et al., 2013). Importantly, IPP can be shunted into the MVA pathway, or additional IPP can be taken from the MVA pathway, via isopentenyl-diphosphate δ-isomerase (IDI) or diphosphomevalonate decarboxylase (PMD) activity (Grassi et al., 2013). Regardless, IPP is a substrate for geranylgeranyl diphosphate synthase (GGPS), which yields geranylgeranyl pyrophosphate (GGPP) (Grassi et al., 2013). GGPP is a crucial pathway intermediate as it is a precursor for gibberellins, tocopherols, chlorophylls, plastoquinones, and carotenoids (Grassi et al., 2013). For the sake of brevity, only carotenoids will be the focus here.

Through the action of phytoene synthase 1 (PSY1), a chromoplast-specific phytoene synthase, two GGPP molecules are condensed to form the first carotenoid, a 40-carbon compound called phytoene (Grassi et al., 2013). This step in the pathway is also considered to be rate-limiting (Nisar et al., 2015) and is partially controlled by the red/far-red photoreceptor phytochrome (Toledo-Ortiz et al., 2010). Phytoene is then converted to phytofluene via phytoene desaturase (PDS) (Grassi et al., 2013). Additionally, PDS can convert phytofluene into an isomer of ζ-carotene (9,15,9′-tri-cis-ζ-carotene) (Grassi et al., 2013). From there, this compound is converted to another isomer via ζ-carotene isomerase (Z-ISO) (Grassi et al., 2013). ζ-carotene desaturase (ZDS) converts ζ-carotene into 7,9,9′-tri-cis-neurosporene, which is converted into 9′-cis-neurosporene via carotenoid isomerase (CRTISO) (Grassi et al., 2013). ZDS converts 9′-cis-neurosporene to 7,9,7′,9′-tetra-cis-lycopene (prolycopene) and CRTISO then converts this compound into trans-lycopene (Grassi et al., 2013). Tomatoes with the tangerine mutation do not have functional CRTISO enzymes and accumulate prolycopene as well as other lycopene precursors such as neurosporene and ζ-carotene (Isaacson et al., 2002; Kachanovsky et al., 2012).

The carotenoid pathway diverges at this point. One branch converts all-trans-lycopene into δ-carotene via lycopene epsilon cyclase (LCYE) (Grassi et al., 2013). This compound can be converted into α-carotene, zeinoxanthin, and lutein via lycopene beta cyclase (LCYB), β-carotene hydroxylase (CHYB), and ε-carotene hydroxylase (CHYE), respectively (Grassi et al., 2013). The second branch converts all-trans-lycopene into γ-carotene and then β-carotene by LCYB (Grassi et al., 2013). In tomatoes, the expression of genes encoding enzymes in the branch that yields lutein, as well as enzymes beyond β-carotene, is downregulated during ripening (Ronen et al., 1999). However, the rest of this pathway is particularly important in leafy tissues as xanthophyll carotenoids are critical for quenching excess energy from photosynthesis (Demmig-Adams and Adams, 1992; Demmig-Adams et al., 1996). Briefly, β-carotene is converted into β-cryptoxanthin and then zeaxanthin by CHYB (Grassi et al., 2013). Zeaxanthin can be converted into antheraxanthin and violaxanthin via zeaxanthin epoxidase (ZEP) (Grassi et al., 2013). These can also be converted backwards through the pathway by violaxanthin deepoxidase (VDE), which is critical for the function of the xanthophyll cycle (Demmig-Adams et al., 1996; Demmig-Adams and Adams, 1992; Niyogi et al., 1998). Violaxanthin can be converted to neoxanthin or to 9-cis-violaxanthin by neoxanthin synthase (NXS) and 9-cis-epoxycarotenoid dioxygenase (NCED) enzymes (Grassi et al., 2013). Eventually, 9-cis-violaxanthin is converted into the phytohormone abscisic acid (ABA), which is crucial for many plant processes including seed dormancy and stomatal aperture regulation (Nambara and Marion-Poll, 2005; Schwartz, 1997; M. Zhang et al., 2009). While carotenoids are important pigments due to their many functions within plants, they are also important for human health and this aspect will be covered in a forthcoming section.

Carotenoid Biosynthesis Mutants

Because of their vibrant color, carotenoids have long been subject to selection by tomato breeding programs to diversify germplasm and increase marketability. Breeders could visually confirm Mendelian inheritance patterns by simply tracking fruit color in breeding populations. Several carotenoid biosynthesis mutants have been characterized and used for a variety of breeding outcomes (Bramley, 2002; Liu et al., 2015). Here, I will review two specific carotenoid biosynthesis mutants. Beta affects the production of the provitamin A carotenoid β-carotene while tangerine affects the conversion of tetra-cis-lycopene to all-trans-lycopene. Coincidentally, both of these mutations result in orange-colored tomato fruits, indistinguishable from each other visually, but for different reasons. These mutants, as well as their alleles, are of particular interest because of their potential to positively influence human health and facilitate testing nutritional hypotheses about carotenoids.

1.1.1.1 Beta

Beta (B) is an allele of the gene encoding the chromoplast-specific lycopene beta cyclase (CYC-B), derived from the green-fruited tomato Solanum habrochaites, and was first characterized in 1950 (Lincoln and Porter, 1950). In typical red tomato fruits, about 80-90% of total carotenoids are trans-lycopene and 7-10% are β-carotene (Frusciante et al., 2007). However, tomatoes with the B allele have approximately 45-50% of their carotenoid content as β-carotene and potentially over 90% in the presence of a Beta-modifier gene (Ronen et al., 2000). Crosses made with low and high β-carotene tomatoes as well as individuals with the red flesh (R) allele demonstrated how β-carotene content segregated in the progeny as a function of lycopene content (Lincoln and Porter, 1950). This group found that the function of B depends on the state of R, implicating a requirement for lycopene in β-carotene biosynthesis (Lincoln and Porter, 1950). The authors also noted the possibility of B existing as an allele of the gene required to convert lycopene into β-carotene, but did not have sufficient evidence to justify their speculation. In later experiments, individuals heterozygous for the Beta allele were selfed and segregation patterns indicated that Beta displays complete dominance in the absence of a “modifier” (e.g. R), which is present in many high-lycopene tomato varieties and is not strongly linked to Beta (Tomes et al., 1954). The Beta allele was further studied along with other carotenoid biosynthesis mutations such as tangerine (Jenkins and Mackinney, 1955).
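
Complete dominance at a single locus implies an expected 3:1 phenotypic ratio among the selfed progeny of a heterozygote. The following is a minimal sketch of how such a segregation pattern could be checked with a chi-square goodness-of-fit test. The progeny counts are hypothetical and are not taken from Tomes et al. (1954); the function name `chi_square_gof` is likewise illustrative.

```python
def chi_square_gof(observed, expected_ratio):
    """Return the chi-square statistic for observed counts against an expected ratio."""
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    # Scale the ratio to the observed sample size to get expected counts
    expected = [total * part / ratio_sum for part in expected_ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical F2 counts (illustrative only): high beta-carotene vs. red fruit
observed_counts = [152, 48]
statistic = chi_square_gof(observed_counts, expected_ratio=[3, 1])

# Critical value for alpha = 0.05 with 1 degree of freedom (2 classes - 1)
CRITICAL_VALUE_DF1 = 3.841
fits_3_to_1 = statistic < CRITICAL_VALUE_DF1
print(f"chi-square = {statistic:.3f}; consistent with 3:1: {fits_3_to_1}")
```

A statistic below the critical value means the data do not reject the 3:1 hypothesis; it does not prove single-gene inheritance on its own.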

While B is known for conferring fruits with high proportions of β-carotene, the exact mechanism was not understood until the B locus was cloned (Ronen et al., 2000). With prior knowledge that B mapped to chromosome 6 (Lincoln and Porter, 1950), Ronen and others crossed introgression lines (IL 6-2 and IL 6-3) to M82, a common model tomato plant. The F1 population was selfed and the F2 individuals that segregated for B were assessed using markers that flanked the Beta locus. Yeast artificial chromosome (YAC) libraries were compared to markers that co-segregated with B, and YAC 310 and 271 were found to match a probe marker (TM16) which also co-segregated with B. Fine mapping revealed B’s location on chromosome 6 and sequencing revealed homology to capsanthin-capsorubin synthase (CCS) (Bouvier et al., 1994) as well as neoxanthin synthase (Al-Babili et al., 2000; Ronen et al., 2000).

Sequence variation 5′ to the coding region for CYC-B was hypothesized to cause differences in mRNA transcription between B and b (Ronen et al., 2000). This hypothesis could partially explain the differences in β-carotene accumulation reported in the literature (Miller and Tanksley, 1990; Stommel and Haynes, 1994; Stommel, 2001; Ronen et al., 2000). However, it is important to consider environmental conditions, genetic backgrounds, and means of measuring carotenoid content. Interestingly, the exonic regions of B are largely conserved among tomato species, but the promoter regions tend to have many single nucleotide polymorphisms (SNPs) that alter the transcription of B (Dalal et al., 2010; Orchard, 2014; Ronen et al., 2000). Despite the description of 5′ variation by Ronen et al. (2000), it was not until a decade later that sequence data for the promoter region of B were published (Dalal et al., 2010). Their findings echoed previous literature suggesting that sequence variation in the promoter region of B was responsible for phenotypic variation. While it is generally understood that variation in the promoter region of B leads to altered B transcript levels, many knowledge gaps exist pertaining to how this variation alters the rest of the carotenoid pathway, carotenoid breakdown products (apocarotenoids), or pathways outside of carotenoid biosynthesis.

1.1.1.2 Tangerine

Tangerine tomatoes first appeared in the literature in the early 1930s in a report published by J.W. MacArthur (MacArthur, 1934). This report summarized field and greenhouse studies spanning 1924 to 1933 that involved over 48,000 individual plants and attempted to characterize the inheritance patterns of a large number of phenotypic traits ranging from seedling to ripe-fruit characteristics. MacArthur asserted that the tangerine allele mapped to chromosome 7 based on these experiments. In the 1940s, Zechmeister and colleagues analyzed the unique blend of carotenoids found in tangerine tomatoes. They were the first to report the presence of prolycopene, a poly-cis isomer of lycopene, in tangerine tomatoes (Zechmeister et al., 1941). These authors noted that the orange color in tangerine tomatoes is only found when the tangerine allele is in a homozygous state, confirming the recessive nature of the tangerine mutation. Although their study relied on thin layer chromatography and spectrophotometry, they were able to separate many of the lycopene precursors found in tangerine tomatoes. Many of these compounds were not formally identified until later in the 20th century, when analytical chemistry technology was advanced enough to separate, identify, and quantify these metabolites. Still, the unusual pigmentation of tangerine tomatoes compelled many researchers to investigate the genetic underpinnings of the mutation.

Experiments in the 1950s used classical genetic approaches to understand how tangerine segregates in a population. Crosses were made using tomatoes with yellow flesh (r and ry), tangerine, and beta (B) to determine epistatic ratios in F2 populations based on fruit pigment characteristics (Tomes et al., 1953). Using open-column chromatography and spectrophotometry, the authors were able to identify phytofluene, ζ-carotene, protetrahydrolycopene (ostensibly neurosporene), prolycopene, trans-lycopene, and β-carotene in the fruits of the F2 population. The abundance and ratios of these individual carotenoids were determined primarily by the genetic characteristics of the parents. In particular, yellow flesh reduced overall carotenoid content in fruit flesh. Yellow flesh was shown to be epistatic to beta, as it determined the phenotype regardless of beta’s allele state (BB, Bb, or bb) (Tomes et al., 1953). Additionally, when tangerine was present in a homozygous recessive state, the allele state of beta did not alter the pigmentation of the fruit. Therefore, tangerine was posited to affect biosynthetic processes upstream of beta. Although yellow flesh was thought to affect a process early in carotenogenesis, the authors were perplexed that tangerine was apparently epistatic to yellow flesh, which goes against the norm that genes in a common pathway are epistatic only to those downstream.

Using MacArthur’s 1934 publication as a guide, Jenkins and Mackinney (1953) tried to confirm previously established epistasis ratios for tomatoes with both the tangerine mutation and yellow flesh. Unlike previous reports, they used objective chemical data to classify progeny into epistatic categories. Earlier studies asserted that yellow × orange crosses yielded progenies that segregated in a 9:3:4 fashion (red: yellow: orange). Jenkins and Mackinney were able to replicate the 9:3:4 ratio that was previously reported (MacArthur, 1934). However, they noted that the orange progeny could be further subdivided into 3:1 orange: light orange. Thus, tangerine and yellow flesh crosses yield progeny exhibiting a 9:3:3:1 ratio (red: yellow: orange: light orange). They also reported that the light orange phenotype was particularly influenced by environmental variation and that clonal propagules of one individual plant yielded fruits of varying degrees of fruit color depending on the time of year. This finding highlights the need for carefully planned and properly replicated experiments when working with phenotypes that are controlled in part by the environment. Their overall conclusion was that yellow flesh abolishes most carotenoid production in tomato fruits, tangerine interrupts the carotenoid pathway and prevents the formation of trans-lycopene, and that tangerine was epistatic to yellow flesh (Jenkins and Mackinney, 1953). Again, the notion that tangerine was epistatic to yellow flesh was difficult for the authors to explain. It would not be until early in the 21st century that these experiments would be revisited in an effort to characterize the genetic origin of the enzymes controlling the carotenoid biosynthetic pathway.
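
The relationship between the 9:3:4 and 9:3:3:1 ratios can be illustrated by enumerating a dihybrid cross. The sketch below assumes two unlinked loci, with `t` standing for the recessive tangerine allele and `r` for the recessive yellow flesh allele. The genotype-to-phenotype mapping in `phenotype` is an assumption inferred from the ratios described above, not a model taken from Jenkins and Mackinney (1953).

```python
from collections import Counter
from itertools import product


def gametes(genotype):
    """Enumerate gametes for a two-locus genotype such as ("Tt", "Rr")."""
    return [a + b for a, b in product(*genotype)]


def phenotype(offspring):
    """Assumed mapping: tt masks the red/yellow distinction (epistasis)."""
    tangerine = "T" not in offspring      # homozygous recessive tt
    yellow_flesh = "R" not in offspring   # homozygous recessive rr
    if tangerine:
        return "light orange" if yellow_flesh else "orange"
    return "yellow" if yellow_flesh else "red"


# Self a double heterozygote and tally the 16 equally likely F2 combinations
parent = ("Tt", "Rr")
counts = Counter(
    phenotype(g1 + g2) for g1, g2 in product(gametes(parent), repeat=2)
)

for label in ("red", "yellow", "orange", "light orange"):
    print(label, counts[label])
```

Pooling the two orange classes recovers the 9:3:4 ratio, which is why visual scoring alone and chemically informed scoring can lead to different ratios for the same cross.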

In 2012, the experiments of Jenkins and Mackinney (1953) and Tomes and others (1953) were revisited to better understand the interaction between tangerine and yellow flesh (Kachanovsky et al., 2012). Using relatively modern genetic approaches coupled with high-performance liquid chromatography (HPLC), this group confirmed that tangerine is indeed epistatic to yellow flesh. It should be noted that, unlike at the time of the experiments conducted in the 1950s, yellow flesh is now known to be caused by faulty TOM5 sequences on chromosome 3 (Fray and Grierson, 1993). Kachanovsky and others (2012) used multiple alleles of tangerine and yellow flesh generated by exposing M82 tomatoes to ethyl methanesulfonate (EMS) mutagenesis. They made crosses to generate F1s containing both tangerine and yellow flesh and then selfed these individuals to generate segregating F2s. The expression of genes encoding enzymes in the carotenoid pathway was quantified using real-time PCR (qPCR), which confirmed that tangerine rescues, to a large extent, the expression of phytoene synthase 1 (PSY1), the gene that is knocked out in individuals with the yellow flesh allele. PSY1 gene expression was also shown to increase in plants with the tangerine allele, indicating that tangerine either directly or indirectly regulates PSY1. This study generated many important questions about the interplay within the carotenoid pathway. The authors suspect that PSY1 is regulated by tangerine via cis-carotene species and/or cleavage products generated by carotenoid cleavage dioxygenase (CCD) enzymes (Kachanovsky et al., 2012). To ensure that the effect was not due to the lack of trans-lycopene or β-carotene, the authors generated one zeta line using EMS mutagenesis. This line lacked a functional zeta carotene desaturase (ZDS), which prevented the accumulation of neurosporene and prolycopene in fruits. However, their data showed that yellow flesh was epistatic to zeta, indicating that the lycopene precursors neurosporene and prolycopene and/or their cleavage products may very well be responsible for tangerine’s effect on yellow flesh. While many questions remain about the tangerine/yellow flesh interaction, the existence of epistasis suggests that the carotenoid pathway is under complex regulation which is not yet fully understood.

In the most basic sense, tangerine is a recessive mutation that impairs the ability of carotenoid isomerase (CRTISO) to function normally (converting tetra-cis-lycopene to all-trans-lycopene) (Isaacson et al., 2002). It should be noted that tangerine is a category of CRTISO mutations that represents several alleles that differentially impair the function of this gene. Common alleles of tangerine are t3183, tmic, and t3002 (also known as “tangerine virescent”). The locations of t3183 and t3002 can be seen in Figure 1.1. Others, which were generated by EMS mutagenesis, include t3406, t4838, and t9776 (Kachanovsky et al., 2012). The sequence of tangerine was determined using map-based cloning (Isaacson et al., 2002). They found that tangerine maps to a region on chromosome 10, and crossed tangerine plants to an introgression line (IL) known as IL10-2 and analyzed the F2 individuals. After finding that their marker CT57 co-segregated with tangerine, they screened a tomato genomic bacterial artificial chromosome (BAC) library and identified BAC 21O12. Using PCR to amplify the sequences on the tails of BAC 21O12, they verified that the BAC contained the entire tangerine locus. The team sequenced BAC 21O12 and reported that the gene, CRTISO, has 13 exons and 12 introns and exists as one copy (Isaacson et al., 2002). Importantly, Isaacson and others characterized the nature of tmic and t3183 as being a 282 bp deletion that spans the first exon and intron and a 348 bp deletion in the 5′ nontranscribed region, respectively. Strangely, the carotenoid profiles of tmic and t3183 are not identical even though they both knock out CRTISO. This finding raises the question as to whether the truncated mRNA transcript produced by tmic serves a signaling role or if there is simply a difference in transcriptional rates. Additionally, tmic displays yellow leaves during its early development, partly due to a lack of leaf xanthophylls. However, this phenotype is temporary even though CRTISO is non-functional. Thus, photoisomerization has been hypothesized to be the alternate way in which limited amounts of lycopene precursors are able to continue through the carotenoid pathway in tangerine tomatoes (Isaacson et al., 2002).

Figure 1.1 Graphical representation of the gene encoding CRTISO. The tangerine alleles t3183 and t3002 are featured. Based on Isaacson et al. (2002).

1.1.2 Phenolic Acids and Flavonoids

While not as abundant as carotenoids, phenolic acids and flavonoids are two major classes of polar phytochemicals that can be found in tomatoes. These compounds are produced primarily for plant defense against herbivores and for structural support, and can function as “sunscreen” for above-ground tissues (Kotilainen et al., 2010; Li et al., 1993, 2010). Both phenolic acids and flavonoids are products of the phenylpropanoid pathway, which is derived from the shikimate pathway (Fraser and Chapple, 2011; Vogt, 2010). Within the shikimate pathway, amino acids such as tyrosine, tryptophan, and phenylalanine are produced. Phenylalanine can then undergo various reactions to generate various phenolic acids.

Phenolic acids can be classified into either hydroxycinnamic or hydroxybenzoic acids (Raiola et al., 2014). The most prominent phenolic acids in tomato are ferulic, p-coumaric, caffeic, and chlorogenic acid (Luthria et al., 2006). These compounds function as antioxidants and potentially have anti-cancer roles (Raiola et al., 2014; Rajendra Prasad et al., 2011; Silva et al., 2000). Additionally, tomatoes can use phenylalanine to biosynthesize flavonoids.

Over 10,000 flavonoids have been discovered and this number is steadily increasing (Martens et al., 2010). Flavonoids are defined by their C6-C3-C6 structure, indicating two aromatic rings that are connected by three carbons. Because of the many hydroxyl groups present on these compounds, flavonoids can function as electron donors. Furthermore, many flavonoids are glycosylated, which allows for the biosynthesis of many unique compounds (Chahar et al., 2011; Hodek et al., 2002). Ultimately, tomatoes accumulate naringenin chalcone, kaempferol-3-O-rutinoside, and quercetin-3-O-rutinoside in their fruit peels (Giuntini et al., 2008; Le Gall et al., 2003; Muir et al., 2001; Slimestad et al., 2008). In leaves, flavonoids are accumulated in the upper epidermis and are glycosylated with a variety of sugar moieties (Stewart et al., 2000).

Flavonoid biosynthesis is partly regulated by MYB transcription factors (Zoratti et al., 2014). MYB transcription factors interact with basic helix-loop-helix (bHLH) as well as WD40-repeat proteins to form a protein complex (Falcone Ferreyra et al., 2012; Jaakola, 2013). Proteins involved in light perception also participate, such as the E3 ubiquitin ligase COP1, which negatively regulates the pathway by degrading MYB transcription factors (Li et al., 2012).

1.1.3 Tomato Steroidal Glycoalkaloid Biosynthesis

Alkaloids are nitrogenous secondary metabolites produced ubiquitously across Plantae and fungi. Within the plant kingdom, it has been estimated that there are over 12,000 unique alkaloid species (Yonekura-Sakakibara et al., 2009). This broad class is highly structurally diverse (Afendi et al., 2012). While not necessary for plant growth and development, these compounds are an important component in plant defense responses to a variety of pathogens and predators spanning bacteria to humans (Koh et al., 2013; Milner et al., 2011). The diversity and abundance of these compounds vary by plant species and can be heavily influenced by environmental conditions, such as soil nitrogen availability (Koh et al., 2013). Within this expansive class of phytochemicals, tomatoes produce a unique class of alkaloids known as steroidal alkaloids. Tomato steroidal alkaloids were discovered to be derived from cholesterol in the mid 20th century (Heftmann et al., 1967). Alpha-tomatine is the most well studied tomato steroidal alkaloid and its discovery was made over 70 years ago (Fontaine et al., 1948). Reports of alpha-tomatine concentrations in tomato plants vary considerably depending on the tissue being measured or the ripeness stage of fruits (Friedman et al., 1994; Friedman and Levin, 1995; Kozukue et al., 2004). However, values in the literature also vary greatly due to differences in genetic backgrounds and cultural conditions. To date, there is little information regarding how steroidal alkaloids are affected by genetics and the environment.

Some of the earliest studies that examined natural variation in glycoalkaloid concentrations were restricted to leaf and stem tissues. Crosses between wild species such as Solanum pimpinellifolium (LA1335) and Solanum lycopersicum var. cerasiforme (LA1310) to cultivated varieties were made to determine segregation patterns of foliar alpha-tomatine (Juvik and Stevens, 1982). Due to the bimodal and trimodal segregation patterns seen in the backcross and F1 populations, respectively, they concluded that foliar alpha-tomatine concentration is controlled by two alleles of a gene that exhibit additivity (Juvik and Stevens, 1982). The inheritance of genes controlling alpha-tomatine in fruits was later investigated by Charles Rick and others when 88 accessions of Solanum lycopersicum var. cerasiforme from Colombia, Ecuador, Peru, and Bolivia were surveyed for alpha-tomatine content (Rick et al., 1994). Two accessions from the Alto Mayo region of Peru (LA2213 and LA2262) were identified and crosses were made with sweet tomato varieties (LA2295 and LA490) to determine the inheritance of the bitter trait. Rick and colleagues postulated that the high alkaloid accessions had a recessive mutation in a single gene that controls alpha-tomatine conversion. Furthermore, they concluded that because the bitter accessions were from a small area of the total geographic survey, this was a random mutation that did not have a clear evolutionary purpose (Rick et al., 1994). It would not be until the advent of next generation sequencing technologies that scientists would learn that alpha-tomatine biosynthesis is controlled by at least a dozen genes spread across multiple chromosomes (Itkin et al., 2013).

Researchers have used association analysis, comparative transcriptomics, and genetic modification as well as high resolution mass spectrometry to describe the steroidal alkaloid biosynthetic pathway and its chemical constituents (Abdelkareem et al., 2017; Alseekh et al., 2015; Ballester et al., 2016; Bednarz et al., 2019; Cárdenas et al., 2019; Iijima et al., 2013; Itkin et al., 2011; Mintz-Oron et al., 2008; Schwahn et al., 2014; Sonawane et al., 2018; Zhu et al., 2018). One of the first genes in tomato’s glycoalkaloid biosynthesis pathway to be discovered was GLYCOALKALOID METABOLISM1 (GAME1) (Itkin et al., 2011). This gene encodes a galactosyltransferase that acts on tomatidine. In a later study, co-expression analyses in both tomato and potato revealed that glycoalkaloid biosynthesis is controlled by at least ten genes (Itkin et al., 2013). Interestingly, six of these genes are located in close proximity to each other on chromosome 7 and another two form a cluster on chromosome 12. The locations of these genes were similar in potato, suggesting a common evolutionary event that permitted the formation of these gene clusters (Itkin et al., 2013). The existence of these clusters also has important implications from a plant breeder’s perspective: linked genes violate Mendel’s second law of independent assortment. Thus, multiple GAME genes can be inherited together in the same allelic combination. It has been hypothesized that these allelic combinations may be advantageous as they prevent the over-accumulation of glycoalkaloid biosynthetic pathway intermediates that are phytotoxic (Itkin et al., 2013, 2011). Other groups have reported similar metabolic gene clusters on chromosomes 4 and 12 in other solanaceous crops such as eggplant (Barchi et al., 2019).
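
The consequence of linkage for inheritance can be made concrete with gamete frequencies. The sketch below is a textbook two-locus calculation, not an analysis from the cited studies: the loci `G` and `H` are generic placeholders rather than specific GAME genes, and the recombination fraction of 0.01 is a hypothetical value chosen only to represent tightly clustered genes.

```python
def gamete_frequencies(recombination_fraction):
    """Gamete frequencies for a double heterozygote GH/gh (coupling phase)."""
    if not 0.0 <= recombination_fraction <= 0.5:
        raise ValueError("recombination fraction must be between 0 and 0.5")
    parental = (1.0 - recombination_fraction) / 2.0
    recombinant = recombination_fraction / 2.0
    return {"GH": parental, "gh": parental, "Gh": recombinant, "gH": recombinant}


# r = 0.5 reproduces independent assortment (all four gametes equally likely)
unlinked = gamete_frequencies(0.5)

# Hypothetical tight linkage: parental allele combinations dominate
tightly_linked = gamete_frequencies(0.01)

print("unlinked:", unlinked)
print("tightly linked:", tightly_linked)
```

Under tight linkage, nearly all gametes carry one of the two parental allele combinations, which is the sense in which clustered genes tend to be inherited together as a block.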

The structures of lycoperoside A-D, esculeoside A, and lycoperoside F-H were determined by using 1H and 13C nuclear magnetic resonance spectroscopy (NMR) (Yahara et al., 2004). A few years later, another group characterized other tomato steroidal alkaloids by purifying a methanolic extract from Solanum lycopersicum var. cerasiforme and using 1H NMR to characterize differences between esculeoside A and B (Yamanaka et al., 2009). Another study published around this time expanded upon this work by using both targeted and untargeted metabolomics approaches to determine changes in fruit-localized metabolites throughout fruit development (Mintz-Oron et al., 2008). This group found that mature green tomatoes contained high levels of alpha-tomatine, which decreased as fruits ripened. During ripening, acetyl glycosylated metabolites such as esculeoside A, lycoperoside F, and lycoperoside G increased, indicating that esculeosides and lycoperosides are products from an alpha-tomatine conversion pathway (Mintz-Oron et al., 2008). This observation was also reported in another untargeted metabolomics study, further supporting the notion that esculeosides and lycoperosides are alpha-tomatine conversion products (Moco et al., 2006). While the alpha-tomatine conversion process is likely under tight genetic control, production and perception is inextricably linked to fruit alpha-tomatine content.

One report determined that initial and final alpha-tomatine concentrations in tomato fruits can be affected by ripening mutations such as rin and nor (Eltayeb and Roddick, 1984). This phenomenon was later attributed to ethylene

production when comparing tomatoes with rin (LA3012), nor (LA3013), and Nr

(LA3001) to wild type tomatoes (LA1090) (Iijima et al., 2009). During ripening, alpha-tomatine concentration decreases due to conversion to other steroidal alkaloids. In ripe tomatoes with the rin, nor, or Nr mutation, alpha-tomatine concentrations were on average 12.4x, 20.3x, and 22.1x higher, respectively, than in wild type tomatoes (Iijima et al., 2009). A putative alpha-tomatine conversion pathway was outlined by Iijima and colleagues

(2008), and this group also utilized rin and nor ripening mutants to assess which steps were most affected by these mutations. Not surprisingly, wild type fruits had low levels of alpha-tomatine and modified intermediates prior to esculeoside A. Conversely, rin and nor fruits had high levels of alpha-tomatine and modified intermediates, but no detectable amounts of lycoperoside F or esculeoside A (Iijima et al., 2008). This provides further evidence that the conversion of alpha-tomatine to downstream metabolites is partly under the control of ethylene. Given that the rin and nor mutations are sometimes present in modern tomato cultivars in a heterozygous state to increase shelf life (Giovannoni, 2007), it could be hypothesized that these long shelf life tomatoes may intrinsically have altered alkaloid profiles compared to tomatoes with normal ripening processes. Care must be taken when designing populations to test hypotheses about glycoalkaloid accumulation in tomatoes to ensure that mutations in ripening processes are not confounded with effects from genes in the glycoalkaloid biosynthesis or conversion pathway.

Recent efforts have been made to elucidate the genes and enzymes involved in additional steps of the steroidal alkaloid biosynthetic pathway and its regulatory mechanisms (Ballester et al., 2016; Cárdenas et al., 2019; Itkin et al., 2013, 2011;


Sonawane et al., 2018; Yu et al., 2020). GLYCOALKALOID METABOLISM (GAME) enzymes 4, 6, 7, 8, 11, and 12 have been shown to catalyze a series of hydroxylation, oxidation, and transamination reactions on the aliphatic tail of cholesterol to generate the

E and nitrogenous F rings characteristic of solanaceous steroidal alkaloids (Itkin et al.,

2013). This series of reactions converts cholesterol into dehydrotomatidine, the first steroidal alkaloid in the proposed pathway. Dehydrotomatidine is then converted into tomatidine by GAME25, SlS5αR2 (a C5-alpha reductase), and Sl3βHSD1 (a C3-dehydrogenase/reductase) (Akiyama et al., 2019; Lee et al., 2019; Sonawane et al., 2018).

From here, it has been proposed that desaturated (derived from dehydrotomatidine) and saturated (derived from tomatidine) steroidal alkaloids are biosynthesized in parallel using the same enzymes at each step. Dehydrotomatidine and tomatidine are then converted to dehydrotomatine and alpha-tomatine, respectively, by a series of glycosylations catalyzed by GAME1, 17, 18, and 2 (Itkin et al., 2013, 2011). Both of these compounds can then be hydroxylated into hydroxy-dehydrotomatine and hydroxytomatine by 2-oxoglutarate-dependent dioxygenases (GAME31/32) (Cárdenas et al., 2019). The next steps are presumed to follow the order hydroxytomatine to acetoxytomatine, lycoperoside F/G/esculeoside A, and esculeoside B (and their desaturated counterparts), based on chemical structure.

While many structural genes related to steroidal alkaloid biosynthesis have been discovered, the regulatory mechanisms that control their expression are less understood.

GAME9 (aka JRE4) is among the first regulatory genes to be discovered for this pathway

(Cárdenas et al., 2016). This gene encodes an APETALA2/ethylene responsive

transcription factor that can work in conjunction with SlMYC2, another transcription factor, to modulate the expression of genes within the steroidal alkaloid pathway as well as those that are part of the preceding cholesterol biosynthesis pathway.

Allelic variation for GAME9 can influence the expression of its target genes (Yu et al.,

2020). As previously mentioned, steroidal alkaloids can be affected by ripening mutations such as rin, nor, and Nr, strengthening the idea that ethylene is an important hormone for this pathway. More recent evidence suggests that GAME9 is able to influence the expression of genes as far upstream of steroidal alkaloids as the MVA pathway (Nakayasu et al., 2018). Jasmonates, another class of compounds related to defense, have also been shown to influence the expression of genes in the steroidal alkaloid biosynthesis pathway with the involvement of CORONATINE INSENSITIVE 1

(Abdelkareem et al., 2017). Lastly, transcription factors involved in signal transduction cascades related to light such as ELONGATED HYPOCOTYL 5 (HY5) and

PHYTOCHROME INTERACTING FACTOR 3 (SlPIF3) have been shown to bind to the promoter regions of GAME1, 4, and 17, which are responsible for adding various sugars onto tomatidine prior to alpha-tomatine formation (Wang et al., 2018). While parts of the steroidal alkaloid biosynthesis pathway and its regulatory mechanisms have been resolved, more information is needed in order to effectively manipulate these compounds to test nutritional hypotheses.


1.2 Tomato Phytochemicals and Human Health

Tomatoes are a good source of carotenoids, flavonoids, and vitamin C (USDA-

ARS, 2014). Because of their health benefits, good taste, and ability to be easily processed into a variety of products, tomatoes are widely consumed. American consumers use almost 13.6 pounds of fresh and 25.3 pounds of processed tomatoes each year

(USDA-ERS, 2018). To investigate the effects of tomatoes on human health, many studies have formulated tomato beverages as a way to ensure compliance and examine the effects of tomato phytochemicals in a realistic food matrix (Porrini et al., 2007; Stahl and Sies, 1992; Sutherland et al., 1999). Epidemiological studies relying on dietary records from large populations of individuals have highlighted an association between lycopene and health benefits such as a reduced risk for developing prostate cancer

(Giovannucci et al., 2002, 1995). However, tomatoes produce thousands of phytochemicals and it is unlikely that a single compound would be entirely responsible for observed health benefits. Here, I will broadly summarize the absorption and potential health benefits associated with major phytochemical classes found in tomato.

1.2.1 Carotenoid Absorption and Potential Health Benefits

Carotenoids are non-polar compounds that generally accumulate in crystalline structures in plastids. Because of their non-polar nature and deposition form, their absorption is limited in the human body. Most carotenoids are found in the trans configuration in plant tissues (Chandler and Schwartz, 1987). However, the cis form is generally more bioavailable, notably in the case of lycopene (Clinton et al., 1996;


Cooperstone et al., 2015a). Red tomatoes generally contain all-trans lycopene as the primary pigment and only about 5% of the total lycopene content is in the cis form

(Porrini et al., 1998). However, 58-73% of lycopene found in human blood is in the cis form, indicating that this conformation is better absorbed (Clinton et al., 1996).

Carotenoid absorption can be broken into five steps: 1) extraction from a food matrix; 2) incorporation into lipid micelles in the intestinal lumen; 3) uptake by intestinal epithelial cells; 4) integration with cholesterol-rich chylomicrons; and 5) deposition into the lymph and target tissues (Harrison, 2012). The efficiency with which carotenoids are incorporated into lipid micelles can vary greatly depending on the amount and composition of fat in the diet (Arranz et al., 2015; During and Harrison, 2004; Huo et al., 2007; Kim et al., 2015). A study was recently performed in which participants were given meals with or without avocados (Persea americana). It was demonstrated that the unsaturated lipids naturally present in avocados enhanced the absorption of β-carotene by

2.4-6.6 fold depending on the source of β-carotene (high β-carotene tomato sauce or carrots) (Kopec et al., 2014). In the case of tangerine tomatoes, carotenoids accumulate in lipid droplets, often in the cis conformation, and can be substantially more bioavailable than their all-trans counterparts (Cooperstone et al., 2015). In the case of tetra-cis-lycopene, bioavailability was determined to be 8.5x higher than all-trans lycopene

(Cooperstone et al., 2015).

After incorporation into micelles, carotenoids are transported into intestinal epithelial cells either by a transporter such as Scavenger Receptor class B type I (SR-BI) or possibly through diffusion (Harrison, 2012). Once inside the enterocyte, they can be

acted upon by β-carotene oxygenase (BCO1) proteins, and both the parent compounds and cleavage products are incorporated into chylomicrons, which also contain cholesterol, phospholipids, triglycerides, and apolipoproteins such as apolipoprotein B (apoB) (Harrison, 2012). The chylomicrons are then transported into the lymph where they interact with other tissues. Preferential deposition of carotenoids in the liver, adrenal gland, and testes has been hypothesized to result from the greater number of LDL receptors in these tissues (Erdman et al., 1993). Thus, potential carotenoid bioactivity aligns well with the tissues in which they accumulate. While the link between carotenoids and prostate cancer has been well studied, less is known about the role of carotenoids in other tissues such as the liver.

Although many carotenoids are not essential for health and wellbeing, others are necessary for an array of biological processes. For instance, the retinal macula has an apparent region that is rich in zeaxanthin and lutein, which prevent damage to the retina from excess light (Bone et al., 1985). Additionally, many carotenoids such as β-carotene and β-cryptoxanthin exhibit pro-vitamin A activity and can be enzymatically cleaved to produce retinol (Harrison, 2012). The bioactivity of carotenoids with regard to cardiovascular disease and cancer prevention is well reviewed (Ciccone et al., 2013;

González-Vallinas et al., 2013; Story et al., 2010). Of the over 700 known carotenoids

(Britton et al., 2008), tomatoes primarily contain lycopene and β-carotene (Frusciante et al., 2007).

Lycopene’s function in vivo is still poorly understood. It is thought to benefit human health in part through decreasing oxidative stress within target tissues (Friedman,


2013). Oxidative stress can lead to toxic metabolic byproducts that can damage DNA molecules as well as proteins and lipid membranes (Halliwell and Chirico, 1993).

Ultimately, this damage can result in various diseases including cancers, cataracts, rheumatoid arthritis, autoimmune diseases, atherosclerosis, as well as cardiovascular and neurodegenerative diseases (Raiola et al., 2014). Lycopene has been shown to synergistically act with 1,25-dihydroxyvitamin D3, a form of vitamin D, and alter the progression of the cell cycle (Amir et al., 1999). Lycopene has also been shown to impact diseases related to the prostate and liver (Erhardt et al., 2011; Ip and Wang, 2014; Wan et al., 2014; Wang et al., 2010; Wu et al., 2004). Thus, it appears likely that beneficial effects of lycopene extend beyond its antioxidant activity.

While lycopene accounts for the majority of carotenoids in tomato, 7-10% of the total is represented by β-carotene (Frusciante et al., 2007). β-carotene is of particular interest because it can be cleaved into two retinol molecules (Novotny et al., 2010).

Vitamin A is crucial for human vision and its absorption can be enhanced by dietary lipids (Kopec et al., 2014; Rando, 1990). Additionally, dietary supplements of β-carotene and α-tocopherol (Vitamin E) were found to protect human skin from UV-B radiation in a synergistic manner (Stahl et al., 2000). This effect was also observed in food products containing multiple carotenoids such as tomato paste (Stahl et al., 2001). While carotenoids play an important role in human health, other tomato phytochemicals like flavonoids, phenolic acids, and ascorbic acid also serve important purposes.

Other than fruit-derived carotenoids, flavonoids have been implicated in a multitude of roles in human health. For example, flavonoids have exhibited positive

effects on memory as well as preventing apoptosis and improving blood flow to nascent nervous system tissues (Spencer, 2009a, 2009b). Flavonoids have also been linked to ameliorating coronary artery disease, and the mechanism is likely due to reduced oxidation of LDL cholesterol and impaired formation of arterial plaque (Naderi et al., 2003). The effects of flavonoids on cardiovascular health are reviewed elsewhere

(De Pascual-Teresa et al., 2010; Peterson et al., 2012). In addition to direct effects, flavonoids may also alter gene expression. Quercetin-3-O-rutinoside (rutin) was shown to reduce NF-κB, IL1, and IL6 expression and reduce the severity of arthritis (Kauss et al., 2008; Raiola et al., 2014). Because of the evidence surrounding the bioactive role of flavonoids, these compounds have been targeted by plant breeders.

While the scientific consensus supports the consumption of tomatoes because of their various phytochemical classes, plant scientists and breeders need to be aware of the metabolic cost of increasing these compounds in plant tissues. For example, overexpressing regulatory components of the carotenoid pathway leads to dwarfed plants

(Giliberto et al., 2005; Sun and Kamiya, 1994). This effect is due to a depletion in GGPP pools that would otherwise be used to generate hormones such as gibberellins. Therefore, plant scientists and breeders need to take entire metabolic networks into consideration when breeding for increased nutritional quality to ensure that there is no major yield penalty associated with manipulating a certain pathway.

1.2.2 Phenolic Acids and Flavonoids

Tomato fruits biosynthesize many commonly observed phenolics and flavonoids found in fruits and vegetables such as caffeic acid, chlorogenic acid, ferulic acid,

naringenin, rutin, and kaempferol (Del Giudice et al., 2015; Luthria et al., 2006;

Slimestad et al., 2008). Phenolic acids and flavonoids are absorbed by brush border cells in the small intestine and are acted upon by the body’s xenobiotic defense system (e.g.

Phase I and II metabolism) (Cassidy and Minihane, 2017; Crozier et al., 2010). In many cases, these compounds can also be acted upon by the gut microbiome and their catabolites can be absorbed by the colon (Cassidy and Minihane, 2017; Stalmach et al.,

2010; Williamson and Clifford, 2010). In the case of flavonoids, which are commonly glycosylated in planta, aglycone species tend to be the only forms detected in human plasma and tissues (Day and Williamson, 2001; Hollman, 2004). Moreover, it is important to consider potential Phase I and II metabolic processes (e.g. hydroxylation, sulfonation, glucuronidation) when searching for these compounds in vivo. These metabolites as well as bacterial catabolites are hypothesized to be the most relevant forms from a health outcomes perspective (Williamson and Clifford, 2010). Here, I will briefly highlight some of the potential health benefits associated with phenolic and flavonoid compounds commonly found in tomato.

In a murine model, the phenolic acids mentioned above were shown to suppress skin tumor growth (Huang et al., 1988). Ferulic acid has also been cited as a health beneficial compound in a number of contexts including neurological diseases, cancer, and inflammatory processes (Srinivasan et al., 2007). While these compounds likely act through a variety of mechanisms, their antioxidant capacity has been studied most heavily and is thought to be a major contributor to their biological activity (Raiola et al.,

2014; Srinivasan et al., 2007).


Flavonoids, structurally more complex relatives of phenolic acids, have been implicated in a multitude of roles in human health. Typical flavonoid profiles in tomato fruits tend to be dominated by naringenin, rutin, and kaempferol (Slimestad et al., 2008).

Flavonoids have been shown to positively influence memory through direct interaction with target sites in the brain (Spencer, 2009a). Furthermore, flavonoids have been posited to influence the survival of nervous cell tissue by delaying or preventing apoptosis as well as influencing blood flow and eventually the proliferation of new nerve cells in the hippocampus (Spencer, 2009b). In terms of their anti-cancer activity, many in vivo experiments have been performed with these compounds and their generally promising outcomes have been well-reviewed (Chahar et al., 2011). Flavonoids have also been linked to ameliorating coronary artery disease, and the mechanism is likely due to reduced oxidation of LDL cholesterol and impaired formation of arterial plaque (De Pascual-Teresa et al., 2010; Naderi et al., 2003; Peterson et al., 2012). The role flavonoids play in cardiovascular health is well reviewed elsewhere (De Pascual-Teresa et al., 2010; Peterson et al., 2012). In addition to direct effects, flavonoids may also alter gene expression. Rutin

(quercetin-3-O-rutinoside), a prominent flavonol glycoside in tomatoes, is associated with reducing the severity of arthritis through a reduction in tumor necrosis factor-alpha and interleukins 1 and 6 (Kauss et al., 2008; Raiola et al., 2014). Lowered cytokines in a rat model used by these researchers supported this finding. Lastly, a variety of flavonoids have been associated with reducing intestinal inflammation, and their biological activity is largely due to structural motifs that determine their bioavailability and functionality

(González et al., 2011). The amounts of these health-promoting flavonoids present in

plant tissues can be a direct reflection of their growing environment, but have also been targeted by breeders (Bovy et al., 2002; Muir et al., 2001; Rein et al., 2006).

1.2.3 Tomato Steroidal Glycoalkaloids

Tomato’s reputation as a health-promoting food is often attributed to the diverse array of carotenoids, flavonoids, and phenolic compounds mentioned in previous sections. Epidemiological studies have particularly highlighted lycopene as one of the primary drivers of health benefits (Giovannucci et al., 1995). However, evidence suggests that whole tomato consumption offers additional benefits over consuming individual phytochemicals, such as lycopene (Boileau et al., 2003). Recent studies have identified tomato steroidal glycoalkaloids as chemical features that distinguish animals that consumed tomato-fortified diets and had positive health outcomes (Cooperstone et al.,

2017). Additionally, these compounds are of interest as potential biomarkers of tomato consumption since they are chemically unique to tomatoes (Cárdenas et al., 2015; Cichon et al., 2017b; Hövelmann et al., 2019). To many plant scientists, the word “alkaloid” tends to evoke thoughts of “toxicity”, “poison”, “bitter”, and other negative attributes. A small, but growing, body of evidence suggests that tomato steroidal alkaloids may have a positive influence on human health. While little is known about their bioaccessibility, bioavailability, and bioactivity within tissues in which they are deposited, I will briefly review what is currently known and highlight remaining knowledge gaps.

In vitro assays have demonstrated that alpha-tomatine, as well as its immediate precursors such as beta, gamma, and delta-tomatine, have varying degrees of

anticarcinogenic effects against colon and liver cancer cells (Lee et al., 2004). Similar outcomes have also been reported in prostate cancer cell lines (Choi et al., 2012).

Concentrations used in the previous study ranged from approximately 1 to 100 μM and it is unclear if these concentrations can be achieved in vivo. More importantly, the smallest effects seen in these in vitro studies tend to be with the aglycone tomatidine. However, sugar-free steroidal alkaloids such as tomatidine are the forms most likely to be present in tissues or circulating in blood plasma.
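To put the molar concentrations above into mass units, the conversion can be sketched as follows. This is a minimal illustration and is not taken from the cited studies. The molecular formulas (C50H83NO21 for alpha-tomatine and C27H45NO2 for tomatidine), the rounded standard atomic masses, and the function names are assumptions introduced here for the example.

```python
# Illustrative sketch only: convert micromolar concentrations to mg/L.
# Assumed molecular formulas (not stated in the surrounding text):
#   alpha-tomatine C50H83NO21, tomatidine C27H45NO2
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}  # g/mol

FORMULAS = {
    "alpha-tomatine": {"C": 50, "H": 83, "N": 1, "O": 21},
    "tomatidine": {"C": 27, "H": 45, "N": 1, "O": 2},
}


def molar_mass(formula):
    """Return the molar mass (g/mol) for a dict of element counts."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def micromolar_to_mg_per_l(conc_um, formula):
    """Convert a concentration in micromol/L to mg/L."""
    # micromol/L * g/mol = microgram/L; divide by 1000 to obtain mg/L
    return conc_um * molar_mass(formula) / 1000.0


if __name__ == "__main__":
    for name, formula in FORMULAS.items():
        low = micromolar_to_mg_per_l(1, formula)
        high = micromolar_to_mg_per_l(100, formula)
        print(f"{name}: {low:.2f} to {high:.1f} mg/L")
```

Under these assumptions, 1 to 100 μM corresponds to roughly 1 to 103 mg/L of alpha-tomatine and roughly 0.4 to 42 mg/L of tomatidine, which gives a sense of the concentrations that would need to be reached in vivo.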

Aside from potential anticancer activity, alpha-tomatine has also been shown to reduce plasma lipoprotein cholesterol (Cayen, 1971; Friedman et al., 2000a, 2000b).

This effect is hypothesized to be due to the poor absorption of cholesterol-alpha-tomatine complexes that form between the tetrasaccharide moiety on α-tomatine and ring structures of sterols (Keukens et al., 1995). Esculeogenin A, the aglycone of esculeoside

A, has been shown to reduce atherosclerosis and hyperlipidemia in ApoE-deficient mice when provided in the diet at 50 or 100 mg/kg bodyweight (Fujiwara et al., 2007). Similar findings have also been reported with tomatidine, the aglycone of alpha-tomatine, provided at the same dose (Fujiwara et al., 2012). Although these findings are promising, it is unclear if this dose is representative of what a human would consume or have circulating within the body.
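One way to gauge whether such doses are realistic is to scale them to a human-equivalent dose. The sketch below uses body-surface-area scaling with Km factors, a common convention in dose translation. This approach, the Km values (3 for mouse, 6 for rat, 37 for an adult human), the 60 kg reference body mass, and the function names are all assumptions introduced here. None of them come from the studies cited above.

```python
# Illustrative sketch only: body-surface-area scaling of an animal dose.
# The Km factors are an assumed, commonly used convention; the cited
# studies did not report this conversion.
KM = {"mouse": 3.0, "rat": 6.0, "human": 37.0}


def human_equivalent_dose(animal_dose_mg_per_kg, species):
    """Scale an animal dose (mg/kg) to a human-equivalent dose (mg/kg)."""
    return animal_dose_mg_per_kg * KM[species] / KM["human"]


def total_daily_dose_mg(dose_mg_per_kg, body_mass_kg=60.0):
    """Total daily dose (mg) for an assumed adult body mass."""
    return dose_mg_per_kg * body_mass_kg


if __name__ == "__main__":
    for mouse_dose in (50, 100):
        hed = human_equivalent_dose(mouse_dose, "mouse")
        total = total_daily_dose_mg(hed)
        print(f"{mouse_dose} mg/kg in mouse: {hed:.1f} mg/kg, {total:.0f} mg/day")
```

Under these assumptions, the 50 and 100 mg/kg doses scale to roughly 4 and 8 mg/kg in a human, or about 240 and 490 mg per day for a 60 kg adult. Whether intakes of this size are achievable from tomato products remains an open question.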

Lastly, computational biology approaches for small molecule discovery have been used to identify potential therapeutic agents for muscle atrophy and wasting (Adams et al., 2015). This strategy revealed that tomatidine is one such biomolecule that may have a beneficial impact on muscle tissue (Dyle et al., 2014). The transcription factor activating

transcription factor 4 (ATF4) was found to be a regulator of multiple mRNAs associated with tomatidine (Ebert et al., 2015). Tomatidine may be a candidate for the development of therapies to reduce ATF4 activity and the negative effects this transcription factor elicits on muscle tissue.

While not well studied compared to other classes of phytochemicals, tomato steroidal alkaloids have been shown to play a role in attenuating a variety of diseases related to cardiovascular health, muscle function, and various cancers. However, baseline information about the bioavailability, deposition forms, and concentrations within tissues/fluids of tomato steroidal alkaloids is still lacking. Further research is needed to better understand the role these compounds play in human health.

1.2.4 Liver Diseases

Because of the growing body of epidemiological literature praising the potential benefits of carotenoids to human health, many studies have been conducted at the cellular, animal, and human level to better understand the molecular and physiological mechanisms that are affected by these compounds (Clinton, 2009; Story et al., 2010;

Wertz, 2009). Of primary interest is cancer, which is one of the leading causes of death worldwide and is second only to heart disease (National Center for Health Statistics,

2015). Cancers include a range of specific diseases that can affect virtually any tissue. By the number of cases, liver cancer (hepatocellular carcinoma (HCC)) ranks sixth globally but is considered the second deadliest cancer (National Center for Health Statistics,

2015). Individuals with HCC tend to have low plasma carotenoid levels (Yu et al., 1999).

Although carotenoids have been shown to affect many different cancers by interfering

directly with signaling pathways (Clinton, 2009; Ip and Wang, 2014; Kotake-Nara et al.,

2001), this review will be focused on the liver.

Because HCC is often detected in its late stage (>80% of cases), few treatment options have shown promise for extending the lifespan of patients. Mortality rates from

HCC continue to increase in the US and most people diagnosed with HCC die within a year of detection (Siegel et al., 2012). Risk factors for liver cancer include cigarette smoking, aflatoxin exposure, heavy drinking, hepatitis B or C infection, and high-fat diets

(Bosch et al., 1999). Additionally, liver cancer incidence is two to four times higher in men than in women (Bosch et al., 2004). Hypotheses about this discrepancy have been tested and the most widely accepted conclusion is based on strong evidence in mouse models that demonstrated estrogen’s ability to reduce the secretion of interleukin-6 (IL-6)

(Naugler et al., 2007). When IL-6 production was knocked out, there was no association between gender and liver cancer incidence (Naugler et al., 2007). Gender differences aside, over 90% of HCC cases are largely associated with inflammation in the liver, which can be caused by abnormally high concentrations of liver fat (steatosis) (Nakagawa and Maeda, 2012).

Non-alcoholic fatty liver disease (NAFLD) is found in 75-100% of obese individuals and is characterized by fat deposition in the liver without being directly caused by alcohol consumption (Baffy et al., 2012; Page and Harrison, 2009). Of those who have HCC, 30-40% also exhibit NAFLD (El-Serag and Rudolph, 2007). If dietary and exercise regimes remain unchanged, NAFLD can progress into non-alcoholic steatohepatitis (NASH) which significantly increases the risk of HCC (Baffy et al., 2012;


Cohen et al., 2011). In this disease state, regions of the liver become inflamed and exhibit cirrhosis which is one of the primary risk factors associated with HCC (Cohen et al.,

2011; Sun and Karin, 2012). When enough cirrhosis occurs, the liver hardens due to collagen accumulation and loses much of its functionality. Ultimately, cirrhosis leads to the formation of neoplastic lesions and HCC (Park et al., 2010).

Cancer in humans is regulated by many complex signaling cascades that exhibit homology with other animals. These include VEGF, FGF, MAPK, PI3K/AKT/mTOR, EGFR, IGF, TGF-β, and HGF, which are reviewed by Wu and Li (2012). One notable pathway related to HCC is the Wnt signaling pathway, which is normally involved in regulation of cell growth and development, embryogenesis, and cellular homeostasis

(Nusse, 2005; Yardy and Brewster, 2005). Irregularities in this pathway through mutations in protein-encoding genes or inhibition of specific steps have been shown to lead to carcinogenesis (Cadigan and Nusse, 1997; Clevers, 2006). The disruption of this pathway and its implications in HCC have been well-reviewed (Pez et al., 2013). Some evidence of carotenoids disrupting Wnt signaling exists (Ip and Wang, 2014; Kavitha et al., 2013; Liu et al., 2008; Preet et al., 2013; Tanaka et al., 2012), but many HCC cases are caused by mutations in β-catenin that render the Wnt pathway permanently disrupted

(Huang et al., 1999). Thus, dietary interventions are of little benefit in these cases. Drugs like sorafenib are usually recommended for late stage HCC. While sorafenib can extend patient life by 8-11 months (Llovet et al., 2008), quality of life is low due to its multi-tyrosine kinase inhibitory activity, which disrupts many normal metabolic processes. While it is virtually impossible that tomato consumption would have a positive clinical effect on late

stage HCC, there is promising evidence suggesting that diet-related liver diseases that can lead to HCC can be attenuated with tomatoes and their constituent phytochemicals

(Ip and Wang, 2014).

Gene expression of mice fed tomato-supplemented diets (10% red tomato powder by weight) or diets enriched with lycopene (0.25%) over a period of 3 weeks was previously investigated (Tan et al., 2014). Both lycopene and tomato altered gene expression in a manner that indicated decreased lipid uptake, cell proliferation, and decreased expression of retinoid X receptor (RXR) activation genes (Tan et al., 2014).

Data from this study showed that lycopene appears to be a major component of tomato’s ability to alter hepatic gene expression. Mongolian gerbils that were on a high fat diet with 0.05% lycopene had lower levels of lipid peroxide in their livers than gerbils on a high fat diet without lycopene supplementation (Choi and Seo, 2013). This finding indicates either fewer lipid radicals in the liver due to lycopene quenching or that antioxidant pathways to cope with lipid radicals were upregulated by lycopene. In another study, rats were fed a high fat diet (>70% calories from fat) in order to induce

NASH, and lycopene and tomato extracts were supplemented into the diet and administered ad libitum for 6 weeks (Wang et al., 2010). Rats were injected with diethylnitrosamine (DEN) to quickly allow for NASH to develop into HCC. The authors found that both lycopene and tomato extract were able to reduce the severity of HCC in rats that were on the high fat diet by reducing cyclin D1 protein and NF-κB expression as well as extracellular signal-regulated kinase (ERK) expression. While both lycopene and tomato extract supplemented diets were able to decrease HCC progression in high fat diet rats,

some differences between the two supplementations were reported. Primarily, the treatment group that was given a high fat diet with tomato extract supplemented feed had significantly fewer inflammatory foci per area than rats fed only a high fat diet or a high fat diet with lycopene. This trend also carried through for TNF-α, IL-1β, and IL-12, where lycopene alone had little effect but tomato extract supplemented rats had significantly lower expression (Wang et al., 2010). This finding indicates that phytochemicals other than lycopene may be important for tomato’s bioactivity in the context of HCC.

Rats on a control or high-fat diet (35 and 71% fat, respectively) that either included or did not include red tomato extract were studied during the course of HCC development (Melendez-Martinez et al., 2013). Six weeks after HCC induction by DEN, rats given tomato extract had fewer inflammatory foci than those only fed high fat diets.

Genes related to lipid metabolism such as peroxisome proliferator-activated receptor gamma (PPARγ) and sterol regulatory element-binding protein 1 (SREBP-1) were altered by tomato extract. Phytoene, phytofluene, and trace lycopene were found in the livers of rats supplemented with tomato extracts, and the changes in metabolism reported in this study could be due to any of these carotenoids or other compounds that were not measured. A few studies have specifically focused on the effects of carotenoid breakdown products on HCC development.

Apo-10’-lycopenoic acid (APO10LA; 10 mg/kg) was provided to mice for 24 weeks after having HCC induced by DEN (Ip et al., 2013). This study showed that

APO10LA reduced tumor incidence by 50% and decreased tumor size by 65% in mice on a high fat diet that also had HCC. APO10LA inhibited the activation of Akt, TNFα, NF-κB, IL6, and STAT3, as well as many other biomarkers for HCC (Ip et al., 2013). Since

APO10LA is generated by cleaving lycopene with BCO2 enzymes, increased lycopene in tomatoes could potentially achieve similar benefits. It is also important to note that the doses and plasma levels achieved in this study are considered realistic, and this aspect was expanded upon in the materials and methods section of that report. This study was followed by another in which BCO2 knockout (BCO2-) mice were used to examine how lycopene supplementation and its breakdown products affected HCC development (Ip et al.,

2014). Lycopene supplementation was able to reduce HCC progression in both wild type and BCO2- mice; however, the mechanisms by which this effect occurred differed. Wild type mice supplemented with lycopene had reduced inflammatory foci and exhibited phosphorylation of NF-κB, p65, and STAT3 as well as IL6. BCO2- mice exhibited lower mTOR complex 1 activation, Met mRNA, and β-catenin (Ip et al., 2014). These results demonstrate that lycopene itself can affect HCC signaling independently of one of its breakdown products, APO10LA. However, it is important to bear in mind that this apolycopenoid has never been seen in circulating blood plasma before.

Six-week-old BCO1-/-BCO2-/- mice were induced to develop HCC using DEN and then put on a high fat diet (60% calories from fat) that did or did not contain tomato powder for 24 weeks (Xia et al., 2018). Mice that had tomato powder enriched diets showed a reduction in various biomarkers associated with inflammation such as interleukins 1, 6, and 12. Interestingly, genes associated with circadian rhythm including period 2, cryptochrome-2, and circadian locomotor output cycles kaput

(CLOCK) were increased from tomato consumption. Lastly, gut microflora diversity was

39 positively affected by tomato consumption, although the implications of this result are unclear.

While the literature supports the role of tomatoes and tomato phytochemicals in modulating molecular processes that regulate HCC and diseases that lead to HCC, questions remain as to what other phytochemicals could elicit more pronounced effects.

Tangerine tomatoes contain a unique blend of carotenoids not found in red tomatoes and many of these are more bioavailable due to their chemical structure (Clinton et al., 1996; Cooperstone et al., 2015a; Isaacson et al., 2002). Pre-clinical animal studies comparing effects of dietary red and tangerine tomatoes on the liver transcriptome could yield valuable information that guides future studies aimed at using tangerine tomatoes for nutritional interventions. More broadly, it is still poorly understood how tomato consumption affects the chemical and transcriptional landscape of the liver in general.

1.3 Quantifying Tomato Phytochemicals and Assessing their Biological Activity

“To measure is to know” is a quote, albeit cliché, attributed to Lord Kelvin. While broadly applicable to many situations, this quote resonates especially well with those studying phytochemicals. In order to understand the biological role that phytochemicals play in planta or in vivo, it is crucial to have effective methodology to extract and quantify compounds of interest. Here, I will briefly review targeted and untargeted (hypothesis driven and hypothesis generating) approaches used to extract and quantify tomato phytochemicals. Approaches such as next generation sequencing will also be addressed as a means to determine the effect that phytochemicals may have on the transcriptional landscape of relevant tissues.

1.3.1 Targeted Assays

To measure phytochemicals, a plethora of assays have been created ranging from spectrophotometry to tandem mass spectrometry. Phytochemical assays vary based on their ease of execution, the time they take to conduct, their reproducibility, and their sensitivity. Given the diversity in phytochemicals present in tomato, there are seemingly countless assays that could be reviewed. For the sake of focus, I will discuss those relevant to carotenoids and steroidal alkaloids.

There are many methods available to partition and analyze tomato carotenoids that take care to minimize enzymatic and oxidative degradation (Britton, 1996; Ferruzzi et al., 2001, 1998; Kopec et al., 2012; Rodríguez-Amaya and Kimura, 2004). While the solvent systems differ slightly, a common theme is that samples are first extracted with a water-miscible solvent to dehydrate the sample and then re-extracted multiple times with nonpolar solvents. Each successive extraction requires centrifugation and liquid-handling steps which are time consuming. Extracts are then phase separated to isolate the nonpolar components of the supernatant. Carotenoid extractions can be extremely time intensive, and even methods developed to streamline this process might be able to generate only 20 samples every three hours (Kopec et al., 2014). After drying an aliquot of supernatant and re-dissolving in a known volume of solvent, carotenoids and their isomers are typically separated and quantified using liquid chromatography which can take 15 to over 100 minutes per sample (Bijttebier et al., 2014; Bramley, 1992; Cooperstone et al., 2016, 2015b; Daood et al., 2014; Ferruzzi et al., 1998; Kean et al., 2008; Lesellier et al., 1993; Ronen et al., 2000). On the other hand, spectrophotometric methods offer an alternative to lengthy chromatographic runs (Nagata and Yamashita, 1992). The disadvantage, however, is a decrease in sensitivity and an inability to separate carotenoids and their isomers which may have overlapping absorbance characteristics.

From the perspective of a plant breeder, data needs to be collected and analyzed as soon as possible in order to make informed decisions about selections. When handling hundreds or thousands of samples from breeding populations, lengthy analysis times can create a backlog that could result in a breeder missing a narrow planting window. Rapid methods such as those using spectrophotometry might be helpful in some cases, but the inability to separate carotenoids with similar absorbance characteristics might render them useless. Clearly, there is a need to develop rapid extraction and analysis methods for carotenoids that do not sacrifice data quality for speed.
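To put these run times in perspective, the following minimal Python sketch estimates the continuous instrument time needed for a breeding population. The 15 and 100 minute values are the per-sample chromatography bounds quoted above; the population size of 500 samples and the function name are assumptions chosen only for illustration.

```python
def instrument_days(n_samples, minutes_per_sample):
    """Continuous instrument time, in days, to run n_samples back to back."""
    return n_samples * minutes_per_sample / (60 * 24)

n_samples = 500  # assumed population size, for illustration only
for minutes in (15, 100):  # per-sample run-time bounds quoted in the text
    days = instrument_days(n_samples, minutes)
    print(f"{minutes} min/sample: {days:.1f} days of instrument time")
```

Under these assumptions the chromatography step alone occupies an instrument for roughly five days to five weeks, before any extraction time is counted.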

Unlike carotenoids, tomato steroidal alkaloids are considered to be polar or semi-polar compounds. As such, their extraction from tomato, which is primarily water, is relatively straightforward. Tomato steroidal alkaloids are typically extracted by grinding individual tomato tissue samples with a mortar and pestle, or blender, and then releasing analytes using polar solvents. This approach is time consuming because each sample is handled individually. Additionally, this technique has been used for relative profiling, and has not been evaluated for its ability to quantitatively extract steroidal alkaloids. Tomato steroidal alkaloids, such as alpha-tomatine, have previously been quantified using gas and liquid chromatography (Kozukue and Friedman, 2003; Lawson et al., 1992; Rick et al., 1994) as well as a number of bioassays such as cellular agglutination (Schlösser and Gottlieb, 1966) and radioligand assays using radioactive cholesterol (E.A. Eltayeb and Roddick, 1984). These methods are unreliable, suffer from poor sensitivity, cannot differentiate among different alkaloids, and are time consuming.

Previous chromatography-based methods to quantify both potato and tomato steroidal alkaloids relied on photodiode array detectors set to 208 nm (Del Giudice et al., 2015; Kozukue and Friedman, 2003; Lee et al., 2004; Tajner-Czopek et al., 2014). Given that the molar extinction coefficient for alpha-tomatine is only 5000 M⁻¹ cm⁻¹ (Keukens et al., 1994), photodiode array detectors are often not sensitive enough to detect low quantities of these compounds or to distinguish between different alkaloids. Moreover, photodiode array detectors are often set to 208 nm to quantify tSGAs, which is a non-specific wavelength where many compounds (including mobile phases) can absorb light (Friedman and Levin, 1998, 1992; Keukens et al., 1994).
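The sensitivity limitation can be made concrete with the Beer-Lambert law, A = εcl. The Python sketch below uses the extinction coefficient cited above; the absorbance of 0.01, the 1 cm path length, and the molar mass of about 1034.2 g/mol (from the formula C50H83NO21) are assumptions introduced for illustration and do not describe any particular detector.

```python
def concentration_molar(absorbance, epsilon, path_cm=1.0):
    """Beer-Lambert law, A = epsilon * c * l, solved for c in mol/L."""
    return absorbance / (epsilon * path_cm)

EPSILON_TOMATINE = 5000.0  # M^-1 cm^-1, value cited in the text
MW_TOMATINE = 1034.2       # g/mol, assumed molar mass for C50H83NO21

# Assumed absorbance of 0.01 in an assumed 1 cm path.
c = concentration_molar(0.01, EPSILON_TOMATINE)
print(f"{c * 1e6:.1f} uM, or {c * MW_TOMATINE * 1e3:.2f} mg/L")
```

With these assumed values, an absorbance of 0.01 corresponds to a concentration in the low micromolar range, which illustrates why less abundant alkaloids are difficult to quantify by absorbance alone.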

Recent advances in analytical chemistry have enabled researchers to discover other steroidal alkaloid species in tomato fruits using high resolution mass spectrometry (Iijima et al., 2008; Zhu et al., 2018); however, these methods are qualitative. A small number of quantitative methods using mass spectrometry have been developed, but only for individual or a few steroidal alkaloids (Baldina et al., 2016; Caprioli et al., 2014). Previously reported limits of quantification for alpha-tomatine are 0.005 mg/kg (estimated to be 0.5 µg in a standard 100 g tomato) (Caprioli et al., 2014). While this is an improvement over previous PDA based methods, this level of sensitivity may not be sufficient for other steroidal alkaloids that are less abundant than alpha-tomatine. There is an apparent gap in methodology for tomato steroidal alkaloids. Robust, sensitive, and efficient extraction and quantification methods are needed in order to study the function these compounds serve in planta and in vivo.

1.3.2 Untargeted Metabolomics

Unlike the targeted assays mentioned above, untargeted metabolomics is considered to be an unbiased, hypothesis generating technique. However, bias is intrinsically present due to the extraction used or means of detection. Untargeted metabolomics can use a variety of analytical platforms including GC, LC, and NMR (Alseekh and Fernie, 2018). Each of these platforms offers unique benefits and drawbacks. For example, GC based metabolomics boasts the most well curated MS libraries due to the predictable fragmentation reactions generated with these instruments. However, analytes need to be volatile, or made to be volatile through derivatization, which involves extensive sample preparation. NMR can provide detailed structural information about analytes, but lacks sensitivity compared to GC or LC based platforms (Wishart, 2008). LC based metabolomics allows for analytes to be separated prior to being detected by the MS. Additionally, LC based metabolomics is amenable to relatively simple sample preparation (e.g. diluting urine, centrifuging, and running), can be chromatographically optimized to increase the number of features detected, and can detect a wide range of analytes (Wishart, 2008). Moving forward, I will focus my brief review of metabolomics on LC based platforms only.


Another departure from the targeted assays previously reviewed is that it is highly recommended to minimize sample handling for untargeted metabolomics experiments and to ensure that all samples are treated the same. Failure to do so can result in artifacts in the dataset that can negatively influence downstream statistical analyses. One of the most important, and sometimes overlooked, aspects of untargeted metabolomics is the way in which sample sets are arranged and run. In targeted assays, it might be suitable to simply run a blank and a quality control (QC) sample at the beginning of the run. In an untargeted metabolomics context, the analyst would potentially lose information about retention time drift or other issues that may occur throughout the run.

It has been recommended that prior to starting an untargeted metabolomics run, a blank sample is run with no injection to determine if there are compounds present in mobile phases or bound to the column from previous use that may influence collected data (Broadhurst et al., 2018). Afterwards, running multiple solvent blanks, made from the same solvents used to dissolve samples, can help equilibrate the column and avoid buildup of impurities that may occur when idling. Samples should also be randomized so that potential instrument issues that arise affect all treatments equally. It is suggested that a pooled QC, which is made from a small aliquot of every sample that will be run, be injected at regular intervals (Broadhurst et al., 2018; Dunn et al., 2012). Given that pooled QC samples are identical, these samples can be used to assess data quality and monitor the batch throughout the run. Quality control samples can also be used to correct for changes that occur over the duration of the run (Broadhurst et al., 2018; Luan et al., 2018).


Once an untargeted metabolomics experiment has been completed, data generated by the instrument needs to be converted into an analyzable form. Peak picking algorithms are utilized to find features of interest. Peaks are then aligned to account for shifts in retention time throughout the batch by utilizing a QC for reference (Alonso et al., 2015). Peaks are then binned to combine signals from different adducts of the same mass occurring at a given retention time. For example, a charged mass (M+H) at a given retention time might also exist as M+Na (sodium adduct) or M+FA-H (formic acid adduct) and the algorithm can combinatorially search for all these possible masses, within a set tolerance, at a given time to collapse ions that result from the same metabolite into one ‘feature’ (Alonso et al., 2015). Integration software can then automatically determine peak heights or areas for each feature and generate a data matrix that can be analyzed.
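The adduct-collapsing step can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any specific software package: it considers only the M+H and M+Na pair, and the tolerances, function name, and peak list are hypothetical.

```python
PROTON = 1.007276   # mass shift for [M+H]+
SODIUM = 22.989218  # mass shift for [M+Na]+

def collapse_adducts(peaks, ppm_tol=10.0, rt_tol=0.05):
    """Group (m/z, rt) peaks that look like [M+H]+ and [M+Na]+ of one metabolite.

    Returns a list of features; each feature is the list of peaks it absorbed.
    """
    peaks = sorted(peaks)  # ascending m/z, so [M+H]+ is visited before [M+Na]+
    used = set()
    features = []
    for i, (mz, rt) in enumerate(peaks):
        if i in used:
            continue
        used.add(i)
        feature = [(mz, rt)]
        expected_na = mz - PROTON + SODIUM  # where the sodium adduct should fall
        for j, (mz2, rt2) in enumerate(peaks):
            if j in used:
                continue
            ppm = abs(mz2 - expected_na) / expected_na * 1e6
            if ppm <= ppm_tol and abs(rt2 - rt) <= rt_tol:
                feature.append((mz2, rt2))
                used.add(j)
        features.append(feature)
    return features

# Hypothetical peak list: (m/z, retention time in minutes)
peaks = [(1034.5530, 4.20), (1056.5350, 4.21), (416.3523, 6.80)]
features = collapse_adducts(peaks)
print(len(features), "features from", len(peaks), "peaks")
```

In this example the first two peaks differ by the sodium-minus-proton mass shift and co-elute, so they collapse into a single feature, while the third peak remains its own feature.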

Before analysis, data generated as described above needs to be curated. First, features with more than a certain percent of missing values need to be removed. There is no defined value, but a common starting point is that a feature must be present in 70% of QC samples. Additionally, removing features with > 30% coefficient of variation eliminates features that may have been poorly integrated and would hinder statistical analysis.

Transformation and scaling are necessary next steps to reduce differences in error among features and account for differences in abundance that can often vary by multiple orders of magnitude (van den Berg et al., 2006). Log2 transformations and Pareto scaling are two popular and generally effective transformation and scaling approaches, respectively (van den Berg et al., 2006).
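The curation and scaling steps described above can be sketched as follows. The 70% presence and 30% coefficient of variation thresholds come from the text; applying both filters to QC injections, treating missing values as `None`, and the example intensities are assumptions made for illustration.

```python
import math
import statistics

def keep_feature(qc_values, min_present=0.70, max_cv=0.30):
    """Apply two QC-based filters: presence rate, then coefficient of variation."""
    present = [v for v in qc_values if v is not None and v > 0]
    if len(present) / len(qc_values) < min_present:
        return False
    if len(present) < 2:
        return False  # a standard deviation needs at least two values
    cv = statistics.stdev(present) / statistics.mean(present)
    return cv <= max_cv

def log2_pareto(values):
    """Log2-transform, then mean-center and divide by sqrt(standard deviation)."""
    logged = [math.log2(v) for v in values]
    mean = statistics.mean(logged)
    scale = math.sqrt(statistics.stdev(logged))
    return [(x - mean) / scale for x in logged]

stable = [100.0, 110.0, 95.0, 105.0]  # hypothetical QC intensities
noisy = [100.0, 300.0, 20.0, 150.0]   # CV well above 30%
sparse = [100.0, None, None, 105.0]   # present in only 50% of QCs
print(keep_feature(stable), keep_feature(noisy), keep_feature(sparse))
```

Pareto scaling divides by the square root of the standard deviation rather than the standard deviation itself, which reduces the dominance of abundant features without inflating noise in low-abundance ones as strongly as unit-variance scaling.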


Once data have been curated, transformed, and normalized, analysis can begin using univariate and multivariate approaches. T-tests and ANOVA are two common univariate approaches that are followed with an FDR correction to account for multiple comparisons (Benjamini et al., 2001; van den Berg et al., 2006). Multivariate approaches such as principal components analysis (PCA), k-means clustering, partial least squares discriminant analysis (PLS-DA), and random forests are all commonplace in untargeted metabolomics.
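As an illustration of an FDR correction, the sketch below implements the widely used Benjamini-Hochberg step-up adjustment. Choosing this particular procedure is an assumption made here for illustration, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p = [0.001, 0.04, 0.03, 0.20]  # hypothetical per-feature p-values
q = benjamini_hochberg(p)
print([round(x, 4) for x in q])
```

Each raw p-value is multiplied by the number of tests divided by its rank, so features that pass an uncorrected 0.05 threshold (here 0.04 and 0.03) may no longer pass after adjustment.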

Principal components analysis is an unsupervised data dimensionality reduction approach that seeks to visualize the natural structure present within a dataset. PCA uses linear combinations of variables in a way that maximizes explained variance in the first component, with subsequent components explaining decreasing amounts. Analysts commonly use this technique as a first-pass view of their data and to discover potential issues. If QC samples are not clustering together, for example, there may be a run-order or batch effect issue that needs to be addressed before moving forward.
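The idea of a first component that maximizes explained variance can be shown with a small, dependency-free Python sketch. It uses power iteration on the sample covariance matrix, which is one of several ways to obtain the leading component and is chosen here only for brevity; the data are hypothetical, and the starting vector is assumed not to be orthogonal to the leading component.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (loading vector, fraction of variance explained) for PC1."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix of the features.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0 / math.sqrt(p)] * p  # starting guess for power iteration
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total

# Hypothetical samples with two strongly correlated features.
rows = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9]]
loadings, explained = first_principal_component(rows)
print(f"PC1 explains {explained:.1%} of the variance")
```

Because the two features rise together, nearly all of the variance lies along a single direction and the first component captures it.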

Another unsupervised approach is k-means clustering. First, the analyst must decide how many clusters to use. This task can be accomplished by creating a scree plot and looking for an “elbow” at which point the in-cluster sum of squares is no longer rapidly decreasing. The algorithm then goes through multiple iterations of calculating the positions of clusters until the centroids stabilize. K-means clustering can be combined with PCA as a way to check if the algorithm is able to correctly classify samples. The outcome of this approach can be informative about differences (or lack thereof) within a dataset.
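The elbow heuristic can be illustrated with a basic k-means written in plain Python. The deterministic centroid initialization and the toy data (three tight groups of points) are assumptions made so that the example is reproducible; production implementations typically use randomized restarts.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(cluster) for axis in zip(*cluster))

def kmeans_wss(points, k, iterations=50):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Deterministic start: spread initial centroids across the sorted data.
    ordered = sorted(points)
    step = len(ordered) / k
    centroids = [ordered[int(step * i + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # centroids have stabilized
            break
        centroids = updated
    return sum(dist2(p, centroids[i])
               for i, c in enumerate(clusters) for p in c)

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
          (20.0, 0.0), (20.0, 1.0), (21.0, 0.0)]
for k in range(1, 5):
    print(k, round(kmeans_wss(points, k), 2))
```

For these data the within-cluster sum of squares drops sharply up to k = 3 and changes little afterwards, which is the “elbow” an analyst would look for on the scree plot.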

Partial least squares discriminant analysis was developed in the late 1960s, but first used in a chemistry application in the late 1980s (Wold et al., 1987). PLS-DA attempts to find the maximum covariance between a given variable and the entire dataset (Alonso et al., 2015). This supervised analysis method is widely used to find features of interest that differentiate sample groups. However, PLS-DA is prone to overfitting and has been shown to be able to separate random data into groups (Kjeldahl and Bro, 2010; Westerhuis et al., 2008). Care must be taken when interpreting the results of PLS-DA and it is recommended to use cross validation to assess model performance (Szymańska et al., 2012; Westerhuis et al., 2008).

Although there are many other approaches that can be used for feature selection in untargeted metabolomics, I will focus this final analysis paragraph on random forests. This supervised learning approach was conceived almost two decades ago (Breiman, 2001). Random forests are an expansion of bagging (bootstrap aggregation), an earlier machine learning technique (Hastie et al., 2009). Random forests are considered to be a collection of weak learners that, when averaged together, can capture interactions among variables in an unbiased way (Hastie et al., 2009). This algorithm can identify features that most strongly contribute to the overall predictability of the model and, as such, explain the differences between treatment groups. Using random forests in conjunction with other multivariate analyses (e.g. PLS-DA) can allow an analyst to more confidently focus on features that are deemed important in multiple models. Variable importance (VIP) scores are generated in both of these approaches and features with VIP >1 are considered to be truly important (Chong and Jun, 2005; Lazraq et al., 2003; Sun et al., 2012). Once features of interest have been selected, they need to be identified.

The advantage of high resolution mass spectrometry is that molecular mass is measured accurately enough that a relatively small number of candidate molecular formulas can be predicted, possibly leading to a tentative identification. Databases such as the Human Metabolome DataBase (Wishart et al., 2013) can also be leveraged to query masses of interest and possibly match with previously annotated compounds. However, tandem MS (MS/MS) experiments (where ions of interest are subject to voltages resulting in the breaking of chemical bonds, and the masses of the resulting fragments are determined, providing information about the chemical structure and connectivity of the compound) are often needed to generate structural information about a feature of interest. Comprehensive MS/MS spectra can also be uploaded to databases such as the Global Natural Products Social Molecular Networking (GNPS) platform as a way of determining if observed spectra are similar to those of other compounds that have been previously annotated (Wang et al., 2016). Ideally, an authentic standard would provide at least two orthogonal pieces of information, such as retention time and accurate mass, in order to fully identify a feature. Despite the challenges of planning and executing an untargeted metabolomics experiment, the data generated can provide invaluable information about the chemistry and biology of a system.
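Accurate-mass matching, as described above, amounts to comparing an observed m/z against a theoretical value and expressing the difference in parts per million. In the Python sketch below, the molecular formula of alpha-tomatine (C50H83NO21) and the observed m/z are supplied as assumptions for illustration and are not taken from the text; the monoisotopic atomic masses are rounded standard values.

```python
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON = 1.007276  # mass added when forming [M+H]+

def protonated_mz(formula):
    """Theoretical m/z of [M+H]+ for a formula given as {element: count}."""
    neutral = sum(MONOISOTOPIC[el] * n for el, n in formula.items())
    return neutral + PROTON

def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6

alpha_tomatine = {"C": 50, "H": 83, "N": 1, "O": 21}  # assumed formula
theoretical = protonated_mz(alpha_tomatine)
observed = 1034.5560  # hypothetical measured m/z
print(f"theoretical {theoretical:.4f}, error {ppm_error(observed, theoretical):.1f} ppm")
```

A candidate formula is usually retained only if the error falls within an instrument-dependent tolerance, often a few ppm, which is what keeps the list of plausible formulas short.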


1.3.3 RNA-seq

RNA-seq is an evolving technology within the “Next-Generation Sequencing” umbrella that allows researchers to profile the transcriptome of multiple organisms, a single organism, specific tissue, or even a single cell (Baginsky et al., 2010; Khan et al., 2014; Ramsköld et al., 2012; Westermann et al., 2012; Wu et al., 2014). RNA-seq can be used to discover new genes and splice variants, characterize small RNAs and other non-coding species, characterize transcriptional start and stop sites, and globally quantify expression of each unique mRNA transcript (Trapnell et al., 2012; Wang et al., 2009). Prior to RNA-seq, microarrays were commonly used to simultaneously quantify large sets of genes. However, the dynamic range of a microarray is about 10², whereas the range for RNA-seq is 10⁵, and RNA-seq is considered to be more reproducible and to have significantly less background noise (Khan et al., 2014; Wang et al., 2009). Thus, RNA-seq has proven itself invaluable to researchers in a broad range of fields. Because RNA-seq is about a decade old, advances in instrumentation, preparation and reaction chemistry, and computational capabilities are occurring at a rapid pace.

Although RNA-seq might imply that sequencing occurs at the RNA level, RNA-seq actually works with cDNA libraries generated from mRNA (Wang et al., 2009). For RNA-seq experiments aimed at quantifying gene expression, total RNA is isolated from a tissue of interest and ribosomal RNA (rRNA), which accounts for approximately 95-98% of total RNA in a sample, is separated from mRNA, which has a characteristic poly(A) tail (Peano et al., 2013). This process retains only mRNA (coding RNA) and drastically improves coverage. Genomic DNA present in the sample can also be removed through DNase treatment. Identifying sequence tags are added to cDNA fragments from specific sample libraries, allowing for multiple samples to be sequenced in the same lane (Shishkin et al., 2015). Adapter sequences are then added to each end of the cDNA fragments and these correspond to oligos that are bonded to the flow cell surface (Wang et al., 2009). DNA polymerase activity and bridge amplification lead to the formation of clusters on the flow cell surface (Mardis, 2013). From there, strands are sequenced by synthesis and the addition of each nucleotide is accompanied by a fluorescent light signal that corresponds with the addition of a specific base (Mardis, 2013; Wang et al., 2009). These signals are then processed by the sequencer and are used to determine the sequence of each of the cDNA fragments. Modern sequencing platforms, particularly those from Illumina, can generate 300 bp paired-end reads; however, for simple expression analyses researchers have found little difference in detection of gene expression at various read lengths between 50 and 100 bp paired-end reads (Chhangawala et al., 2015). However, the detection of splice-junctions drastically improved with longer reads (Chhangawala et al., 2015). Additionally, trimming processes that occur after sequencing undoubtedly decrease the resolution to detect expression differences, splice-junctions, and other features within the data set (Williams et al., 2016). Therefore, it is often wise to take advantage of longer, paired-end reads, particularly if alignment to a reference genome is difficult (Katz et al., 2010; Trapnell et al., 2012).
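The role of sample-specific sequence tags in multiplexing can be illustrated with a minimal Python sketch. It assumes, for simplicity, that each read begins with an exact-match tag of fixed length; the tag sequences, sample names, and reads are hypothetical, and real demultiplexing software typically reads tags from dedicated index reads and tolerates mismatches.

```python
def demultiplex(reads, barcodes):
    """Assign each read to a sample by its leading tag and trim the tag off."""
    length = len(next(iter(barcodes)))  # assumes all tags share one length
    by_sample = {sample: [] for sample in barcodes.values()}
    unassigned = []
    for read in reads:
        sample = barcodes.get(read[:length])
        if sample is None:
            unassigned.append(read)  # tag not recognized
        else:
            by_sample[sample].append(read[length:])
    return by_sample, unassigned

# Hypothetical tags for two libraries pooled in one lane.
barcodes = {"ACGT": "control_liver", "TGCA": "tomato_liver"}
reads = ["ACGTGGATCCA", "TGCATTAGGC", "NNNNACGTAC"]
by_sample, unassigned = demultiplex(reads, barcodes)
print(by_sample, unassigned)
```

Reads whose tag cannot be matched are set aside rather than guessed, which mirrors the conservative behavior one would want before quantifying expression per sample.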

While RNA-seq is considered to have fewer biases than microarrays, sequence characteristics like GC content and repeated nucleotides can be problematic and mathematical corrections have been developed to account for these issues (Khan et al., 2014; Li et al., 2015). Bioinformaticists have developed many software packages to help researchers refine and interrogate the large data sets that are inherent to RNA-seq experiments (Shifman et al., 2016; Trapnell et al., 2012, 2010, 2009; Yu et al., 2014). One caveat is that these software packages often require knowledge of the Unix command line or programming languages such as Perl, as they lack user-friendly graphical interfaces (Leipzig, 2016). A popular series of RNA-seq workflow software, humorously named after various items of formal evening wear, comprises: Bowtie (Langmead et al., 2009), a short read aligner; TopHat (Trapnell et al., 2009), which aligns reads with a reference genome; Cufflinks (Trapnell et al., 2010), a multifaceted package that facilitates transcript assembly; and CummeRbund (Trapnell et al., 2012), a data visualization package. CummeRbund is particularly important as it converts processed files from Bowtie, TopHat and Cufflinks into R files that can be manipulated with a variety of R-based packages (Trapnell et al., 2012). The packages mentioned here are mainly for analyzing RNA-seq data. However, there is great interest developing in the area of merging –omics data sets. Integrated –omics data could provide more information than the individual contributions of individual –omics datasets. In order to integrate data from multiple –omics platforms, mathematics-based software programs are essential. Methods to integrate multi–omics data include correlation-based, concatenation-based, pathway-based, and multivariate-based integration (Cavill et al., 2015).


1.3.4 Integration of Metabolomic and Transcriptomic Data

Correlation-based integration is one of the most popular ways to interrogate data collected from multiple –omics platforms. Typically this procedure is done by calculating Pearson’s coefficients for every gene-metabolite pair. After integrating gene expression and metabolite data, many authors have reported problems ranging from low correlation between metabolites and transcripts for the enzymes that produce them to correlations changing drastically depending on experimental conditions (Bradley et al., 2009; Moxley et al., 2009). Lastly, it should be noted that related parts of a pathway are often found to not correlate with one another while strong correlations that span metabolism can occur (Cavill et al., 2015). This is not to say that correlation-based –omics integration is invalid, but more refined mathematical methods need to be researched in order to gain meaningful information from these results.
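Computing Pearson’s coefficient for every gene-metabolite pair can be sketched as follows. The gene and metabolite names and their values are hypothetical, and the sketch omits the multiple-testing correction that a real analysis of thousands of pairs would require.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlate_all(genes, metabolites):
    """Return {(gene, metabolite): r} for every pairwise combination."""
    return {(g, m): pearson(gv, mv)
            for g, gv in genes.items()
            for m, mv in metabolites.items()}

# Hypothetical expression and abundance values across four samples.
genes = {"gene_A": [1.0, 2.0, 3.0, 4.0], "gene_B": [4.0, 3.0, 2.0, 1.0]}
metabolites = {"metabolite_X": [2.0, 4.0, 6.0, 8.0]}
r = correlate_all(genes, metabolites)
for pair, value in r.items():
    print(pair, round(value, 3))
```

The number of pairs grows as the product of the two feature counts, which is one practical reason the resulting correlation tables are difficult to interpret without further filtering.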

Concatenation is another relatively simple way to integrate multiple –omics data sets. Afterwards, analyses such as co-clustering using k-means or random forests can be used to define gene-metabolite relationships (Acharjee et al., 2011; Jozefczuk et al., 2010). One of the main problems with dataset concatenation is that components from each data set generally cluster well with components from the same data set, limiting the applications for concatenation in multi–omics integration (Cavill et al., 2015). This is especially a problem with metabolomics data which remains unannotated. There are tools available to help minimize these problems (Shen et al., 2012), but interpretation of concatenated data should be approached with caution.


Pathway-based integration is a popular way to integrate multi–omics data because it logically fits well with the thought processes commonly used by biochemists and systems biologists. Existing programs such as Ingenuity IPA, PathVisio, Paintomics, Integrative Meta-analysis of Expression data (INMEX), InCroMAP, and Integrated Molecular Pathway Level Analysis (IMPaLA) are among the more commonly used programs for integrating –omics data and analyzing these data within the context of metabolic pathways (Cavill et al., 2015). The biggest downside of pathway-based integration is that these programs are constructed using current knowledge of metabolic pathways and require metabolite annotations for metabolomics data. Therefore, it is impossible to discover previously unknown gene-metabolite relationships. While pathway analyses may be useful for many applications, their limitations could hinder discoveries and other methods are needed.

The last category of integration strategies for multi–omics data is multivariate approaches. Arguably, multivariate approaches are the least intuitive strategy, as it is difficult to conceptualize data in a multidimensional space, but they have the potential to generate robust information that can guide future research questions. Two common forms of multivariate approaches are principal components analysis (PCA) and partial least squares (PLS). Multivariate methods separate the two data sets into different dimensions (e.g. expression is the x-axis and metabolites is the y-axis) (Cavill et al., 2015). For PLS, the choice of which data set defines the x and y axes has great implications for the information gained from the analysis. Thus, bioinformaticists have turned to two-way orthogonal partial least squares (O2PLS), a method that makes the x and y axes symmetrical and has been used to integrate metabolomics, transcriptomics, and proteomics data (Eveillard et al., 2009; Grimplet et al., 2009; Rantalainen et al., 2006). Importantly, O2PLS can partition variance from each of the integrated data sets, which is not possible in some of the other methods mentioned above. O2PLS software packages, such as STATegRa, are available within the Bioconductor suite of packages. Simulations to test the robustness of O2PLS have shown that current O2PLS algorithms exhibit problems in the presence of noisy data (Bouhaddani et al., 2016). However, O2PLS appears to be a promising mathematical strategy to integrate metabolomics and transcriptomic data sets to better understand complex metabolic networks.

More recent developments in multi–omics integration include hierarchical community network (HiCoNet) (Li et al., 2017) and integration using linear models (IntLIM) approaches (Siddiqui et al., 2018). HiCoNet works by exploring datasets generated by multi–omics experiments and finds communities within each dataset. Communities from different datasets are then tested against each other to find associations. The communities of genes and metabolites are generated from correlation networks (Li et al., 2017). IntLIM, on the other hand, uses a linear regression based approach in which the contributions of a given gene, phenotype (treatment classification), and their interaction to the amount of a given metabolite are modeled (Siddiqui et al., 2018). Ultimately, both of these approaches use linear modelling in different ways and may be promising methods for discovering novel gene-metabolite relationships.
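The regression described for IntLIM can be written compactly as a linear model with an interaction term. The symbols below are illustrative notation introduced here and are assumptions rather than the notation used by Siddiqui et al. (2018).

```latex
m = \beta_0 + \beta_1 g + \beta_2 p + \beta_3 (g \times p) + \varepsilon
```

Here m is the abundance of a given metabolite, g is the expression level of a given gene, p is the phenotype (treatment classification), and ε is the residual error. Under this formulation, a non-zero interaction coefficient β3 would indicate that the gene-metabolite relationship differs between phenotypes.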


1.4 Objectives:

1.4.1 Objective 1: Develop a high-throughput extraction and analysis workflow suitable for plant breeders to determine how carotenoid pathway intermediates are affected by alleles of Beta and tangerine

Carotenoids have long been implicated as a health beneficial phytochemical class produced by tomatoes. One of the major obstacles to quantifying these compounds is the slow pace of current extraction and analytical methods. Here, I developed a rapid extraction and analysis method and compared its performance to established methods by phenotyping a population of tomatoes with unique carotenoid profiles. I anticipated that these rapid methods would be suitable for plant breeders and other scientists who require fast turnaround times in the lab to make data driven decisions.

1.4.2 Objective 2: Determine how dietary tomato consumption affects metabolism by quantifying transcriptome and metabolome alterations in mouse liver tissue.

Studies that have reported health benefits associated with tomato phytochemicals in animal models point to alterations in gene expression as the primary driver of observed outcomes. Although many researchers have devoted considerable effort to understanding how individual phytochemicals mediate these changes, it is still unclear how tomato consumption, in general, affects the functionality of tissues and organs where these compounds are deposited. One major organ of interest is the liver, where absorbed chemical components of our diet are metabolized before being distributed to the rest of the body. In this study, I combined transcriptomics and untargeted metabolomics to investigate how inclusion of tomato in the diet affects mammalian liver tissue on both a gene expression and chemical level. I hypothesized that tomato consumption would affect gene expression by way of altering the chemical composition of the liver. Information from this study will be used to inform future studies investigating the use of tomatoes and tomato products for liver disease prevention research.

1.4.3 Objective 3a: Develop and validate a high-throughput method to extract and quantify potentially health-promoting tomato steroidal alkaloids.

While carotenoids have arguably received most of the attention in the literature regarding tomatoes and human health benefits, studies have repeatedly shown that whole tomato consumption offers increased benefits over carotenoids alone. Therefore, there are likely other components of tomato that are working individually or in concert with carotenoids. Untargeted metabolomics experiments have revealed that steroidal alkaloids are a major differentiating factor in animal studies where health benefits were observed in groups consuming tomato rich diets compared to controls. These compounds are biologically active in planta as fungicides and have been shown to inhibit the growth of various cancer cells in vitro. Thus, steroidal alkaloids are candidates as a novel class of potentially bioactive tomato phytochemicals and merit further study. Currently, there are no sensitive and quantitative methods to measure tomato steroidal alkaloids. I developed a rapid extraction and ultra-high performance liquid chromatography tandem mass spectrometry method, validated its performance, and utilized it to survey tomato-based grocery store products to test its suitability for a variety of applications. I anticipate that this method will facilitate more research on these compounds by rendering their analysis straightforward and routine.


1.4.4 Objective 3b: Quantify the range of natural phenotypic variation of potentially health-promoting tomato steroidal alkaloids and describe the underlying genetic architecture.

In addition to a lack of adequate methodology to measure tomato steroidal alkaloids, little is known about their concentration range or the genetic architecture controlling their production. Now that we have a high-throughput and effective quantification method available for use, I developed a diversity panel of 107 accessions of red fruited tomato species designed to maximize genetic variation. This population was grown in three environments over two years and I phenotyped red fruit for steroidal alkaloids. I hypothesized that steroidal alkaloids would be most diverse in wild tomato germplasm. A genome-wide association study (GWAS) was conducted, and quantitative trait loci (QTL) associated with various steroidal alkaloids were identified. In parallel, a biparental mapping population was created to validate QTL discovered by GWAS and to make available tomato germplasm that could be used to test nutritional hypotheses in clinical trials. Fine mapping studies will be needed to confirm the identities of genes within detected QTL. Information generated from this study will advance our understanding of steroidal alkaloid diversity and biosynthesis.

Chapter 2. Analysis of Tomato Carotenoids: Comparing Extraction and Chromatographic Methods

Michael P Dzakovich1, Elisabet Gas-Pascual2, Caleb J Orchard2, Eka N Sari2, Ken M Riedl3, Steven J Schwartz3, David M Francis2, Jessica L Cooperstone1,3

Affiliations 1The Ohio State University, Department of Horticulture and Crop Science, 2001 Fyffe Court, Columbus, OH 43210. 2The Ohio State University, Ohio Agricultural Research and Development Center, Department of Horticulture and Crop Science, 1680 Madison Ave, Wooster, OH 44691. 3The Ohio State University, Department of Food Science and Technology, 2015 Fyffe Court, Columbus, OH 43210.

Available at: doi: 10.5740/jaoacint.19-0017

2.1 Abstract

Tomatoes (Solanum lycopersicum) are an economically and nutritionally important crop colored by carotenoids such as lycopene and β-carotene. Market diversification and interest in the health benefits of carotenoids have created a desire among plant, food, and nutritional scientists for improved extraction and quantification protocols that avoid the analytical bottlenecks caused by current methods. Our objective was to compare standard and rapid extraction as well as chromatographic separation methods for tomato carotenoids.

Comparison was based on accuracy and the ability to discriminate alleles and genetic backgrounds. Estimates of the contribution to variance in the presence of genetic and environmental effects were further used for comparison. Selections of cherry and processing tomatoes with varying carotenoid profiles were assessed using both established extraction and high-performance liquid chromatography diode array detector (HPLC-DAD) methods and rapid extraction and ultra-high-performance liquid chromatography (UHPLC-DAD) protocols. Discrimination of alleles in samples extracted rapidly (<5 minutes/sample) was similar to samples extracted using a standard method (10 minutes/sample), although carotenoid concentrations were lower due to reduced extraction efficiency. Quantification by HPLC-DAD (21.5 minutes/sample) and UHPLC-DAD (4.2 minutes/sample) were comparable, but the UHPLC-DAD method could not separate all carotenoids and isomers of tangerine tomatoes. Random effects modeling indicated that extraction and chromatographic methods explained a small proportion of variance compared to genetic and environmental sources. The rapid extraction and UHPLC-DAD methods could enhance throughput for some applications compared to standard protocols.

2.1 Introduction

The tomato (Solanum lycopersicum) is an economically important and nutritious horticultural crop containing a range of nutrients, vitamins, micronutrients, and phytochemicals with potential health benefits (Abate-Pella et al., 2017; Arathi et al., 2015). Epidemiological evidence suggests that consumption of tomatoes and tomato products is associated with a reduced risk for development of prostate cancer and other chronic diseases, an outcome which is often ascribed to the presence of lycopene, the predominant pigment in red tomatoes (Clinton, 2009; Friedman, 2013; Giovannucci et al., 1995; Wei and Giovannucci, 2012; Wu et al., 2004). Another pigment found in tomatoes, β-carotene, is an important pro-vitamin A carotenoid. Carotenoids are also responsible for the vibrant red and orange colors of tomatoes and tomato products, which partly drive consumer acceptability (Stommel et al., 2005).

In order to modulate tomato fruit carotenoid profiles, sources of natural variation need to be identified and accurately characterized. Examples of natural variation for carotenoid concentration and profiles include but are not limited to the chromoplast-specific allele of lycopene beta cyclase (CYC-B) known as Beta (B) (Lincoln and Porter, 1950; Ronen et al., 2000) and alleles of carotenoid isomerase (CRTISO) responsible for tangerine (t) (Isaacson et al., 2002; MacArthur, 1934). For both genes, specific alleles lead to orange-colored tomato fruits. However, Beta alleles allow for the accumulation of β-carotene in ripe fruits while tangerine alleles prevent the biosynthesis of all-trans-lycopene, and its precursors (phytoene, phytofluene, ζ-carotene, neurosporene, and tetra-cis-lycopene) predominate the carotenoid landscape. Plant breeders have created tomato germplasm with these alleles which has been utilized by seed companies and food scientists to produce various tomato-based products (Aust et al., 2005; Bohn et al., 2013; Grainger et al., 2008; Misra et al., 2006). In each of these contexts, accurate quantification of carotenoids and fast turnaround time are essential for making decisions about selection, quality control, and dosage delivered.

In order to partition and analyze carotenoids while minimizing enzymatic and oxidative degradation, analytical chemists have developed a wide array of extraction and analysis methods for tomato (Britton, 1996; Ferruzzi et al., 2001, 1998; Kopec et al., 2012; Rodríguez-Amaya and Kimura, 2004). While the solvent systems differ slightly, a common element is that samples are first extracted with a water-miscible solvent and then re-extracted multiple times with nonpolar solvents. Each successive extraction requires centrifugation and liquid-handling steps which are time consuming. Extracts are then phase separated to remove water and water-miscible solvents prior to sample dry-down. After re-dissolving in a known volume of solvent, carotenoids and their isomers are typically separated and quantified using liquid chromatography which can take 15 to over 100 minutes per sample (Bijttebier et al., 2014; Bramley, 1992; Cooperstone et al., 2015b; Daood et al., 2014; Ferruzzi et al., 1998; Kean et al., 2008; Lesellier et al., 1993). Genetic treatments, environmental factors, and sample processing can profoundly influence carotenoid profiles and these sources of variation speak to the need for rapid and accurate methods to measure carotenoids and their isomers.

To mitigate the bottlenecks created by lengthy extraction and analysis procedures, we developed a rapid extraction and an ultra-high performance liquid chromatography diode array detector (UHPLC-DAD) method focused on tomato. Our goal was to compare and contrast standard extraction and chromatography methods with our rapid protocols by their ability to accurately discriminate between tomatoes with different genetic backgrounds and carotenoid profiles as well as model the amount of variance these methods contribute in the presence of genetic and environmental effects. To test our methods, we phenotyped selected accessions from populations of tomatoes encompassing the natural range of tomato carotenoids with distinct alleles of Beta and tangerine in processing and/or cherry tomato backgrounds (De Jesus, 2005; Orchard, 2014; Sari, 2016). The tomatoes were grown with replication at two locations in order to estimate variance contributed by the effects of genetics, environment, extraction method, and analysis method. Here, we present our results emphasizing the strengths and weaknesses of the standard and rapid extraction and chromatographic analysis approaches. This analysis also provided comprehensive carotenoid profiles for important sub-populations of tomato germplasm.

2.2 Methods

2.2.1 Plant Material

Thirty partially-inbred tomato lines were assembled to represent two major loci affecting carotenoid content and two genetic backgrounds. These lines were considered genetic treatments, or “genotypes.” The thirty lines were divided into sub-populations based on genetic background and major loci (genes) affecting carotenoid biosynthesis. The first sub-population consisted of 11 BC2S3 lines of cherry tomatoes in a Tainan (PI 647556) genetic background containing one of four alleles of Beta (B) in the homozygous state (Sari, 2016). Beta codes for a fruit-specific beta-cyclase, CYC-B, with the high beta-carotene alleles conditioned by sequence variation in the 5′ untranscribed region generally associated with the promoter (Ronen et al., 2000). The Beta alleles were derived from Purdue 89-28-1 (three independent sibling lines), Jaune Flamme (two independent sibling lines), 97L97 (three independent sibling lines), and Tainan (three independent sibling lines). The second sub-population consisted of 12 BC1S3 selections of processing tomatoes in an OH8245 background containing one of three alleles of Beta in the homozygous state (Orchard, 2014). These alleles were derived from LA3502 (four independent sibling lines), Jaune Flamme (four independent sibling lines), and OH8245 (S. lycopersicum; four independent sibling lines). Based on sequence comparisons, the alleles of Beta in Purdue 89-28-1, Jaune Flamme, 97L97 and LA3502 are independent accessions of wild species (Orchard, 2014; Stommel, 2001). The third sub-population consisted of seven F5 lines in an OH9242 processing tomato genetic background (S. lycopersicum) with alleles of tangerine. Tangerine codes for carotenoid isomerase (CRTISO) which converts tetra-cis-lycopene to all-trans-lycopene (Isaacson et al., 2004, 2002). Lines contained the tangerine (t) allele from NC99471-4 (three independent sibling lines) or the tangerine virescent (tv) allele from LA0351 (four independent sibling lines) (De Jesus, 2005).

2.2.2 Experimental Design

Plants were grown in field sites located in Fremont and Wooster, OH during the summer of 2016. At both field sites, each plot contained 6-10 plants of the same line, and plots were arranged in a randomized complete block design with two blocks per location.

Samples represented an aggregate of fruits from all plants within a plot with the exception of the first and last plant, which were not sampled. Fruits were harvested when ripe and stored whole at -40 °C until analysis. Prior to extraction, fruits were thawed at room temperature, blended while partially frozen into homogenate, and partitioned into 50 mL tubes. Carotenoid extractions were then performed as described below and analyzed by HPLC-DAD. Carotenoids extracted from tomato fruits using a standard extraction method were also analyzed using a rapid UHPLC-DAD method, and phenotypic data were statistically compared (Figure 2.1). Statistical models used for comparisons are described below.

Figure 2.1 Graphical representation of data analysis strategy. Linear models, detailed in the “Statistical Analysis” section, were used to determine if sub-populations should be analyzed separately due to inherent differences in carotenoid composition or concentrations. Arrows with solid black tails indicate comparisons made between extraction or analysis methods.

2.2.3 Chemical Reagents

Acetone, ammonium acetate, hexanes, methanol (MeOH), methyl tert-butyl ether (MtBE), and water were purchased from Fisher Scientific (Pittsburgh, PA) and were of HPLC grade. β-carotene (≥95%) was purchased from Sigma Aldrich (St. Louis, MO) and lycopene was purified as previously described (Kopec et al., 2010).

2.2.4 Standard Carotenoid Extraction

Carotenoids were extracted in near darkness (≤1 μmol m–2 s–1) as previously described (Kopec et al., 2014). Approximately 1 g of tomato puree was weighed into an 11 mL glass vial and extracted with 5 mL of methanol, briefly vortexed, probe sonicated (Branson Fisher Scientific 150E Sonic Dismembrator) for 8 seconds, and centrifuged for 5 minutes at 2000 x g. The supernatant was decanted and the pellet was re-extracted with 5 mL of 1:1 hexanes:acetone, briefly vortexed, probe sonicated for 8 seconds, and centrifuged for 5 minutes at 2000 x g. The supernatant was added to the methanolic extract and the extraction was repeated two more times or until the pellet was colorless. To induce phase separation, 10 mL of water was added to the combined supernatants, and 1 mL aliquots of the organic layer (experimentally determined to be on average 7.94 mL (2.69% coefficient of variation)) were dried under nitrogen gas and stored at -20 °C until analysis. Twenty samples were processed in parallel and each extraction took an average of 10 minutes per sample (200 minutes per batch).

2.2.5 Rapid Carotenoid Extraction

In near darkness, approximately 0.5 g of tomato puree was weighed into a 44 mL glass vial and extracted with 5 mL of methanol, briefly vortexed, and probe sonicated twice in 8 second bursts to disperse tomato tissue. Fifteen mL of 1:1 hexanes:acetone was added and samples were probe sonicated three times in 8 second bursts. To induce phase separation, 10 mL of water was added and 1 mL aliquots of the organic layer (experimentally determined to be on average 7.68 mL (1.25% coefficient of variation)) were dried under nitrogen gas and stored at -20 °C until analysis. Twenty samples were processed in parallel and each extraction took an average of 5 minutes per sample (100 minutes per batch).
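The back-calculation from a dried 1 mL aliquot to a fresh-weight concentration can be sketched as follows. This is an illustrative sketch only, not the calculation script used in this study: the function name, the example reading of 10 µg/mL, and the assumption that the aliquot is redissolved in 1 mL before injection (as described in the analysis sections) are introduced here for illustration. The organic layer volumes are the averages reported above.

```python
# Average organic layer volumes (mL) reported for each extraction method.
ORGANIC_LAYER_ML = {"standard": 7.94, "rapid": 7.68}


def concentration_mg_per_100g(extract_ug_per_ml, organic_layer_ml, sample_mass_g,
                              aliquot_ml=1.0, redissolve_ml=1.0):
    """Convert a measured extract concentration to mg/100 g fresh weight.

    extract_ug_per_ml: concentration read from the calibration curve (ug/mL),
                       a hypothetical input for this sketch.
    """
    # Mass of carotenoid in the dried aliquot after redissolving.
    ug_in_aliquot = extract_ug_per_ml * redissolve_ml
    # Scale from the aliquot to the whole organic layer.
    ug_in_layer = ug_in_aliquot * (organic_layer_ml / aliquot_ml)
    # Normalize to the mass of puree extracted.
    ug_per_g = ug_in_layer / sample_mass_g
    # ug/g -> mg/100 g: multiply by 100 g and divide by 1000 ug/mg.
    return ug_per_g * 100.0 / 1000.0


# Hypothetical reading of 10 ug/mL for each method.
standard = concentration_mg_per_100g(10.0, ORGANIC_LAYER_ML["standard"], 1.0)
rapid = concentration_mg_per_100g(10.0, ORGANIC_LAYER_ML["rapid"], 0.5)
print(round(standard, 2), round(rapid, 2))
```

Because the rapid method starts from roughly half the sample mass, the same extract reading corresponds to about twice the fresh-weight concentration.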

2.2.6 Standard HPLC-DAD Analysis

Carotenoids were analyzed as previously described and each run lasted 21.5 minutes (Cooperstone et al., 2015b). Briefly, dried extracts were redissolved in 1 mL of 1:1 MtBE:MeOH, filtered with a 0.22 µm nylon filter (CellTreat; Shirley, MA), and 20 µL was injected into a Waters Alliance 2695 HPLC (Waters Corp.; Milford, MA) fitted with a 996 DAD. Carotenoids were separated on a 4.6 x 250 mm, 3 µm particle size, C30 column (YMC Inc., Wilmington, NC) maintained at 35 °C. A gradient using solvent A: 60% MeOH, 35% MtBE, 3% water, and 2% (w/v) aqueous ammonium acetate and B: 78% MtBE, 20% MeOH, and 2% (w/v) aqueous ammonium acetate at a flow of 1.3 mL/min was used as follows: 100% A to 64.4% A over 9 minutes, 64.4% A to 0% A over 5.5 minutes, a hold at 0% A for an additional 3.5 minutes, and a switch to 100% A for the remaining 3.5 minutes to recondition the column. Quantification was achieved using a 6-point external calibration curve of lycopene and β-carotene. Adjusted slopes were calculated for other carotenoids based on ratios of their molar extinction coefficient to lycopene, as done previously (Cooperstone et al., 2016).
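The gradient program described above can be written out as a piecewise-linear function of time, which also confirms that the four segments add up to the stated 21.5 minute run. This is a minimal sketch: the names `GRADIENT` and `percent_a` are hypothetical, and it assumes linear ramps within each segment and an immediate step back to 100% A at 18 minutes.

```python
# Each segment: (start_min, end_min, %A at start, %A at end).
GRADIENT = [
    (0.0, 9.0, 100.0, 64.4),     # 100% A to 64.4% A over 9 minutes
    (9.0, 14.5, 64.4, 0.0),      # 64.4% A to 0% A over 5.5 minutes
    (14.5, 18.0, 0.0, 0.0),      # hold at 0% A for 3.5 minutes
    (18.0, 21.5, 100.0, 100.0),  # recondition at 100% A for 3.5 minutes
]
RUN_TIME_MIN = GRADIENT[-1][1]


def percent_a(t_min):
    """Return the percentage of solvent A at time t_min (minutes)."""
    for start, end, a_start, a_end in GRADIENT:
        # Segments are half-open, except that the final time point is included.
        if start <= t_min < end or t_min == end == RUN_TIME_MIN:
            fraction = (t_min - start) / (end - start)
            return a_start + fraction * (a_end - a_start)
    raise ValueError("time is outside the 0 to 21.5 minute run")


for t in (0.0, 4.5, 9.0, 14.5, 18.0, 21.5):
    a = percent_a(t)
    print(f"{t:5.1f} min: {a:5.1f}% A, {100.0 - a:5.1f}% B")
```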

2.2.7 UHPLC-DAD Analysis

Dried extracts were redissolved in 1 mL of 1:1 MtBE:MeOH, filtered with a 0.22 µm nylon filter (CellTreat; Shirley, MA) and 5 µL was injected into a 1290 Infinity II UHPLC-DAD (Agilent; Santa Clara, CA). Carotenoids were separated on a C18 Acquity BEH column (Waters Corp.; Milford, MA) 2.1 x 150 mm, 1.7 µm particle size, maintained at 55 °C. An isocratic flow using 42% solvent A (80% MeOH, 20% water, and 2% (w/v) aqueous ammonium acetate) and 58% solvent B (78% MtBE, 20% MeOH, and 2% (w/v) aqueous ammonium acetate) at a flow rate of 0.45 mL/min was used, and each run lasted 4.2 minutes. Quantification was achieved by 6-point external calibration curves as described above. Carotenoid identities were confirmed by authentic standards, spectral characteristics, and tandem mass spectrometry using a 6495 triple quadrupole mass spectrometer (Agilent; Santa Clara, CA) with an atmospheric pressure chemical ionization source operated in positive mode. Source parameters and multiple reaction monitoring experiments were adapted from those previously reported (Cooperstone et al., 2015b) and were as follows: phytoene: 545.5>463.6, 421.6, 395.6, 327.4; phytofluene: 543.5>461.6, 393.6, 325.4; β-carotene: 537.5>455.3, 269.2, 69.0; lycopene: 537.5>455.3, 269.2, 69.0.

2.2.8 Statistical Analysis

All statistical analyses were conducted in R version 3.3.1 (R Development Core Team, 2016). Analysis of variance (ANOVA) was used to determine significance of model parameters and their contribution to total variance. Prior to analyzing data, visual inspection of histograms, Q-Q plots, and the output of Levene’s tests revealed that our data violated the assumptions of ANOVA. Log10, log2, and natural log transformations were tested. Visual inspection of Q-Q plots and non-significant outcomes from Levene’s tests determined that natural log transformation of our data satisfied the assumptions of ANOVA. Natural log transformed data were subsequently analyzed, while untransformed means and standard deviations are presented for ease of interpretation.
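To illustrate why a log transformation can bring data into line with the equal-variance assumption, the sketch below computes Levene's W statistic before and after a natural log transformation. It is a toy example, not the analysis performed in this study (which was run in R): the data are made-up values whose spread scales with the mean, the function `levene_w` is hypothetical, and it uses the mean-centered form of the test. The p-value, which would come from comparing W against an F distribution with (k - 1, N - k) degrees of freedom, is omitted.

```python
import math


def levene_w(groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    # Z values: absolute deviation of each observation from its group mean.
    z = []
    for group in groups:
        mean = sum(group) / len(group)
        z.append([abs(value - mean) for value in group])

    n_total = sum(len(g) for g in z)
    k = len(z)
    z_means = [sum(g) / len(g) for g in z]
    z_grand = sum(sum(g) for g in z) / n_total

    # Between-group and within-group sums of squares of the Z values.
    between = sum(len(g) * (zm - z_grand) ** 2 for g, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for g, zm in zip(z, z_means) for v in g)
    return (n_total - k) / (k - 1) * between / within


# Made-up data: the second group is the first multiplied by 10,
# so its standard deviation is 10 times larger on the raw scale.
raw = [[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]]
logged = [[math.log(v) for v in group] for group in raw]

w_raw = levene_w(raw)
w_log = levene_w(logged)
print(f"W (raw) = {w_raw:.2f}, W (natural log) = {w_log:.2e}")
```

On the log scale a multiplicative difference becomes a constant shift, so the two groups have the same spread and W falls to essentially zero.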

Linear models were used to determine if the population should be subdivided based on presence/absence of tangerine alleles, genetic background, and analysis method. Variance components were estimated considering each model parameter a random effect using the R package “lme4” (Bates et al., 2015). Analysis was first conducted in a hierarchical process on data generated by HPLC-DAD and UHPLC-DAD. A graphical representation of the data analysis workflow is presented in Figure 2.1. The initial linear model used for the entire population was as follows:

$$Y = \mu + G + G{:}L + L + \mathrm{AM} + \mathrm{AM}{:}G + \mathrm{BLK}(L) + \varepsilon$$

Where “Y” represented the concentration of a given carotenoid, “G” represented genotype, or specific partially inbred line as a measure of genetic variation, “L” represented location as a measure of environmental variation, “AM” represented analysis method, and “BLK(L)” represented block nested within location as a measure of environmental variation due to within field variation. To investigate major effects due to presence of allelic variation at Beta or tangerine or due to genetic background, the datasets were split to test the significance of these effects. First, within the Beta material we tested for differences between cherry and processing tomatoes using the following linear model:

$$Y = \mu + \mathrm{BG} + \mathrm{BG}{:}L + L + \mathrm{AM} + \mathrm{AM}{:}\mathrm{BG} + \mathrm{BLK}(L) + \varepsilon$$

Where “BG” represented genetic background (cherry or processing) and all other terms held the same meaning as the previous model. After main effects suggested significance due to genetic background, cherry and processing tomatoes were analyzed separately. Further, processing populations were split based on tangerine and Beta sub-populations. The following linear model was used to test for allele differences, analysis method differences and potential interactions for each of the three sub-populations:

$$Y = \mu + A + A{:}L + L + \mathrm{AM} + A{:}\mathrm{AM} + \mathrm{BLK}(L) + \varepsilon$$

Where “A” represented allele of either Beta or tangerine depending on the sub-population being analyzed and all other terms held the same meaning as the previous models. Finally, data from each sub-population were separated by analysis method and compared using the linear model below:

$$Y = \mu + A + A{:}L + L + \mathrm{BLK}(L) + \varepsilon$$

Means separation tests were carried out using Tukey’s honestly significant difference (HSD; α=0.05) test with the R package “agricolae” (Mendiburu, 2009). Means and significance patterns generated from Tukey’s HSD tests were then compared by extraction method. Moreover, data for each sub-population generated by standard extraction and HPLC-DAD analysis were compared to data from the same samples analyzed using the rapid UHPLC-DAD method, using the previous model followed by Tukey’s HSD (α=0.05) post-hoc tests. Finally, phenotypic data and population structure were visualized by principal components analysis (PCA) with the R packages “FactoMineR” and “factoextra” using covariance matrices (Lê et al., 2008).

A similar statistical analysis workflow was performed for phenotypic data generated by the two extraction methods compared in this study. However, it was determined from the previous dataset that BLK(L) was not significant for any carotenoid measured in sub-populations with allelic variation for Beta. Therefore, samples from only one block per location were used for extraction method comparison in the two sub-populations containing alleles of Beta.

2.3 Results and Discussion

Extraction and chromatographic methods were assessed on the basis of ability to separate treatments, accuracy, variance partitioning, and throughput. While uncommon, variance partitioning assesses the contribution to total variance of the individual factors or sources of variation.

2.3.1 Extraction Methods

Unsupervised learning using PCA was performed to visualize population structure based on carotenoid profiles as a function of genetic background, allele of Beta or tangerine, and extraction method (Figure 2.2A). Similar clustering patterns were observed regardless of extraction method. In PC1, processing tomatoes with alleles of tangerine clearly separate from other types of tomatoes. This is due to the presence of carotenoids unique to tangerine tomatoes (e.g. ζ-carotene, neurosporene, tetra-cis-lycopene) and altered concentrations of those normally found in red tomatoes (e.g. phytoene and phytofluene). In PC2, tomatoes with various alleles of Beta clustered separately. Red tomatoes (OH8245 and Tainan) separated from orange tomatoes (JF, PU, and 97L97) with alleles of Beta. Orange tomatoes with the LA716 allele of Beta bridged the clusters of red tomatoes and orange tomatoes high in β-carotene in PC2 (Figure 2.2A).

Figure 2.2 Principal components analysis (PCA) of tomatoes extracted using the standard or rapid method (A) and tomatoes analyzed by HPLC-DAD or UHPLC-DAD (B). Individuals with alleles of tangerine were not included in Figure 2.2B due to an inability to resolve ζ-carotene, neurosporene, and tetra-cis-lycopene by UHPLC-DAD. Sub-population clustering was similar regardless of extraction or analysis method.

Estimates of carotenoid concentrations for both sub-populations are presented in Table 2.1. In a cherry tomato background, concentrations of phytoene and phytofluene were similar regardless of extraction method (Table 2.1). However, concentrations of all-trans-lycopene and total lycopene measured in this sub-population were lower in samples extracted with the rapid method. Patterns of significance across the four alleles of Beta present in the cherry tomato sub-population were identical between extraction methods for phytoene, all-trans-β-carotene, all-trans-lycopene, and total β-carotene indicating that either extraction method would yield the same outcome in terms of differentiating alleles of Beta. Significance trends for other carotenoids measured, such as phytofluene and total lycopene, tended to be similar between extraction methods. Most trends of allelic variation were similar regardless of extraction method. Similar to cherry tomatoes, concentrations of all-trans-lycopene and total lycopene were lower in processing tomato samples extracted rapidly (Table 2.1). Carotenoid concentration data indicate that the amounts of lycopene and β-carotene in fruits with the LA716 allele of Beta are intermediate between red tomatoes and high β-carotene accumulating tomatoes such as those with the JF allele (Table 2.1), providing a basis for the separation seen in Figure 2.2A. Concentrations of phytoene, phytofluene, cis-β-carotene, all-trans-β-carotene, and total β-carotene (all-trans-β-carotene + cis isomer species) were similar regardless of extraction method used (Table 2.1).

Table 2.1 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in tomatoes grown in multiple locations as a function of background, Beta allele, and extraction method.

| Background | Method | Beta Allele | Sample Size | Phytoene | Phytofluene | All-trans-Lycopene | cis-Lycopene | Total Lycopene^z | All-trans-β-Carotene | cis-β-Carotene | Total β-Carotene |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cherry | Standard | 97L97 | n=6 | 0.46±0.14b | 0.12±0.05c | 0.06±0.02c | 0.03±0.03b | 0.09±0.04c | 3.46±0.87a | 0.66±0.09a | 4.11±0.81a |
| Cherry | Standard | JF | n=4 | 0.63±0.26b | 0.22±0.08b | 1.02±0.40b | 0.29±0.11ab | 1.32±0.51b | 3.83±1.85a | 0.65±0.18a | 4.47±2.02a |
| Cherry | Standard | PU | n=6 | 0.44±0.08b | 0.13±0.03bc | 0.19±0.21c | 0.07±0.07ab | 0.26±0.27c | 3.46±1.03a | 0.74±0.25a | 4.19±1.12a |
| Cherry | Standard | Tainan | n=6 | 2.10±0.24a | 0.87±0.13a | 8.92±1.73a | 1.01±0.10a | 9.93±1.83a | 0.77±0.17b | 0.28±0.07b | 1.05±0.21b |
| Cherry | Rapid | 97L97 | n=6 | 0.51±0.09b | 0.13±0.04b | 0.15±0.19c | 0.06±0.09a | 0.20±0.28b | 4.53±1.51a | 0.63±0.17a | 5.16±1.43a |
| Cherry | Rapid | JF | n=4 | 0.56±0.15b | 0.15±0.05b | 0.34±0.14b | 0.10±0.05a | 0.44±0.18b | 2.86±0.89a | 0.47±0.09a | 3.33±0.97a |
| Cherry | Rapid | PU | n=6 | 0.61±0.40b | 0.10±0.04b | 0.08±0.07c | 0.04±0.03a | 0.12±0.10b | 3.43±1.14a | 0.81±0.50a | 4.24±1.56a |
| Cherry | Rapid | Tainan | n=6 | 1.58±0.35a | 0.61±0.17a | 2.43±1.66a | 0.23±0.05a | 2.65±1.69a | 1.25±1.14b | 0.16±0.06b | 1.41±1.19b |
| Processing | Standard | JF | n=8 | 0.52±0.18b | 0.20±0.08b | 1.26±0.59b | 0.34±0.15b | 1.59±0.72b | 4.92±2.09a | 0.46±0.18a | 5.38±2.18a |
| Processing | Standard | LA716 | n=7 | 1.89±0.60a | 0.60±0.26a | 4.05±1.07a | 0.84±0.28a | 4.89±1.32a | 3.32±1.35a | 0.29±0.12a | 3.61±1.44a |
| Processing | Standard | OH8245 | n=8 | 1.62±0.75a | 0.62±0.25a | 6.53±2.74a | 0.80±0.37a | 7.33±3.03a | 1.12±0.96b | 0.29±0.18a | 1.41±0.85b |
| Processing | Rapid | JF | n=8 | 0.51±0.13b | 0.16±0.05b | 0.35±0.47b | 0.15±0.06b | 0.51±0.22b | 4.91±1.81a | 0.39±0.15a | 5.30±1.85a |
| Processing | Rapid | LA716 | n=7 | 1.94±0.61a | 0.59±0.19a | 0.94±0.71a | 0.28±0.16a | 1.21±0.88a | 3.34±0.49a | 0.20±0.04a | 3.54±0.51a |
| Processing | Rapid | OH8245 | n=8 | 1.60±0.86a | 0.53±0.28a | 1.97±1.24a | 0.22±0.11ab | 2.18±1.89a | 1.42±1.42b | 0.15±0.12a | 1.57±1.51b |

z “Total” represents the sum of cis isomers and all-trans configurations for a given carotenoid.
Values with different letters within an extraction method are statistically different as determined by a Tukey’s honestly significant difference (HSD) test (α = 0.05).

All-trans-lycopene and β-carotene accumulate as crystalline structures in chromoplasts in tomato fruits (Harris and Spurr, 1969; Rosso, 1968). However, all-trans-lycopene and β-carotene crystals have different structures. We hypothesize that lower extraction efficiency of all-trans-lycopene using the rapid method is due to a decreased ability to disrupt and solubilize tightly packed H-aggregates of all-trans-lycopene (Schweiggert and Carle, 2017). β-Carotene extraction was not affected to the same extent. We hypothesize that its non-planar structure, which does not lend itself well to H-aggregate formation (Schweiggert and Carle, 2017), allowed for similar solubility of β-carotene between extraction methods.

For processing tomatoes with allelic variation at the tangerine locus, concentrations of phytoene, phytofluene, neurosporene, and all-trans-lycopene were similar between the two extraction methods (Table 2.2). Lower extraction efficiency in the rapid extraction method influenced the estimated concentrations of tetra-cis-lycopene and ζ-carotene. Separation between alleles of tangerine was identical for phytoene, phytofluene, tetra-cis-lycopene, cis-lycopene, and all-trans-lycopene. Differences between t and tv for ζ-carotene, neurosporene, and total lycopene trended towards significance (0.05 < P < 0.07) with the standard extraction, and were significant only when samples were extracted using the rapid method (Table 2.2).

Table 2.2 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in processing tomatoes grown in multiple locations as a function of tangerine allele and extraction method.

| Method | tangerine Allele | Sample Size | Phytoene | Phytofluene | ζ-Carotene | Neurosporene | Tetra-cis-Lycopene | All-trans-Lycopene | Other cis-Lycopene^z | Total Lycopene^y |
|---|---|---|---|---|---|---|---|---|---|---|
| Standard | t | n=12 | 5.85±2.64b | 1.86±0.87b | 3.58±1.99a | 0.91±0.65a | 3.32±1.22b | 0.06±0.02b | 0.67±0.22b | 4.79±1.78a |
| Standard | tv | n=16 | 8.16±2.59a | 2.72±0.83a | 5.04±2.03a | 1.26±0.48a | 4.56±1.41a | 0.08±0.03a | 0.98±0.26a | 5.34±1.83a |
| Rapid | t | n=12 | 4.71±2.38b | 1.49±0.75b | 2.53±1.58b | 0.87±0.49b | 2.14±0.83b | 0.05±0.04b | 0.37±0.13b | 2.63±0.96b |
| Rapid | tv | n=16 | 7.14±2.56a | 2.37±0.83a | 3.82±1.53a | 1.31±0.49a | 2.98±0.85a | 0.09±0.05a | 0.64±0.21a | 3.71±1.07a |

z “Other cis Lycopene” indicates the sum of all cis-lycopene isomers excluding tetra-cis-lycopene.
y “Total” represents the sum of cis isomers and all-trans configurations for a given carotenoid.
Values with different letters within an extraction method are statistically different as determined by a Tukey’s honestly significant difference (HSD) test (α = 0.05).

Beyond contrasting the accuracy of the extraction methods, we also took a novel approach of modeling the contribution of genetic, environmental, and extraction effects to total variance. For most carotenoids, genotypic differences (“G”) tended to account for the majority of variance (up to 96.7%). The contribution of extraction method (“EX”) to total variance depended on the carotenoid (Table 2.3). The proportion of variance due to extraction was between 15.6 and 28.3% for all-trans-lycopene, other cis-lycopene, and total lycopene, compared to between 0.0 and 1.3% for all other carotenoids (Table 2.3). Genotype by extraction method (“G:EX”) contributed a relatively high proportion of variance for these three carotenoids and may reflect differences in extraction efficiency between the two methods as discussed above.
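The percentages in Table 2.3 express each variance component as a share of the total. A minimal sketch of that arithmetic is shown below; the function name and the variance component estimates are hypothetical and are not values from this study, where the components were estimated with the R package “lme4”.

```python
def variance_proportions(components):
    """Express each variance component as a percentage of the total variance."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


# Hypothetical variance component estimates on the natural log scale.
example = {
    "G": 0.50,         # genotype
    "G:EX": 0.10,      # genotype by extraction method
    "G:L": 0.05,       # genotype by location
    "EX": 0.10,        # extraction method
    "L": 0.05,         # location
    "Residual": 0.20,
}

for source, percent in variance_proportions(example).items():
    print(f"{source:>8}: {percent:5.1f}%")
```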

The concentrations of carotenoids extracted using the standard method correlated with those extracted using the rapid method (Table 2.4). Correlations for phytoene and phytofluene were strong (r=0.937 for both) and statistically significant (P<0.001). Most other carotenoids measured followed similar trends. Tetra-cis-lycopene and other cis-lycopene isomers had moderately strong (r=0.604 and 0.672, respectively) and statistically significant relationships (Table 2.4). cis-β-Carotene estimates generated by the two extractions were significantly (P<0.05) but weakly (r=0.336) correlated. Linear models could be utilized to partially compensate for loss of extraction efficiency.
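As an illustration of how such a correction might be applied, the sketch below uses a few of the extraction-method models listed in Table 2.4 to convert a rapid-extraction value to its estimated standard-extraction equivalent. The dictionary and function names are hypothetical, and the sketch assumes the models apply to untransformed concentrations in mg/100 g fresh weight, which the table does not state explicitly.

```python
# (slope, intercept) pairs taken from the "Extraction Methods" models in Table 2.4.
EXTRACTION_MODELS = {
    "phytoene": (1.063, 0.15),
    "phytofluene": (1.055, 0.09),
    "all-trans-lycopene": (2.1, 0.82),
    "total lycopene": (2.108, 0.81),
    "all-trans-beta-carotene": (0.81, 0.42),
}


def rapid_to_standard(carotenoid, rapid_value):
    """Estimate the standard-extraction value from a rapid-extraction value."""
    slope, intercept = EXTRACTION_MODELS[carotenoid]
    return slope * rapid_value + intercept


# Mean all-trans-lycopene in Tainan cherry tomatoes by rapid extraction (Table 2.1).
estimate = rapid_to_standard("all-trans-lycopene", 2.43)
print(round(estimate, 2))  # about 5.92
```

For this example the adjusted value (about 5.92 mg/100 g) moves toward, but does not reach, the 8.92 mg/100 g mean measured with the standard extraction, consistent with only partial compensation.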

Table 2.3 Proportion of variance explained by genetics, environment, and methodology for all carotenoids measured.

| Method | Source | Phytoene | Phytofluene | ζ-Carotene | Neurosporene | Tetra-cis-Lycopene | All-trans-Lycopene | Other cis-Lycopene^z | Total Lycopene^y | All-trans-β-Carotene | cis-β-Carotene | Total β-Carotene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Extraction Methods | G^x | 68.2% | 61.5% | 65.7% | 68.9% | 73.8% | 33.8% | 28.0% | 33.2% | 72.7% | 96.7% | 75.8% |
| Extraction Methods | G:EX^w | 0.0% | 0.0% | 0.7% | 0.0% | 5.0% | 36.0% | 19.0% | 36.3% | 0.0% | 0.04% | 0.0% |
| Extraction Methods | G:L^v | 0.0% | 1.6% | 2.7% | 0.9% | 0.5% | 7.5% | 4.6% | 7.7% | 3.4% | 0.7% | 2.9% |
| Extraction Methods | EX | 0.2% | 0.6% | 0.9% | 0.0% | 1.3% | 15.6% | 28.3% | 16.2% | 0.0% | 0.0% | 0.0% |
| Extraction Methods | L | 0.0% | 0.0% | 0.9% | 0.7% | 0.3% | 0.6% | 4.9% | 0.8% | 0.0% | 0.0% | 0.0% |
| Extraction Methods | Residual | 31.6% | 25.4% | 29.0% | 29.6% | 19.1% | 6.5% | 15.3% | 5.8% | 23.9% | 2.48% | 21.4% |
| Analysis Methods | G | 60.9% | 31.1% | NA^u | NA | NA | 82.1% | 10.3% | 81.0% | 54.1% | 40.4% | 35.7% |
| Analysis Methods | G:AN^t | 0.0% | 3.5% | NA | NA | NA | 0.0% | 5.2% | 0.0% | 0.0% | 4.6% | 0.0% |
| Analysis Methods | G:L | 2.2% | 2.7% | NA | NA | NA | 1.4% | 2.8% | 1.7% | 5.5% | 3.5% | 4.4% |
| Analysis Methods | AN | 0.0% | 11.5% | NA | NA | NA | 0.0% | 1.0% | 0.0% | 1.2% | 14.9% | 1.9% |
| Analysis Methods | L | 3.2% | 2.9% | NA | NA | NA | 0.6% | 0.1% | 0.8% | 0.0% | 0.4% | 0.0% |
| Analysis Methods | BLK(L)^s | 7.0% | 29.9% | NA | NA | NA | 4.7% | 61.2% | 3.5% | 19.2% | 12.1% | 43.5% |
| Analysis Methods | Residual | 26.7% | 18.5% | NA | NA | NA | 11.2% | 19.4% | 13.0% | 20.1% | 24.0% | 14.3% |

z “Other cis Lycopene” indicates the sum of all cis-lycopene isomers excluding tetra-cis-lycopene.
y “Total” represents the sum of cis isomers and all-trans configurations for a given carotenoid.
x G = Genotype
w G:EX = Genotype by extraction method
v G:L = Genotype by location
u NA = Not applicable
t G:AN = Genotype by analysis method
s BLK(L) = Block within location

Table 2.4 Regression analysis for both extraction and analysis methods.

| Comparison | Statistic | Phytoene | Phytofluene | ζ-Carotene | Neurosporene | Tetra-cis-Lycopene | All-trans-Lycopene | Other cis-Lycopene^z | Total Lycopene^y | All-trans-β-Carotene | cis-β-Carotene | Total β-Carotene |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Extraction Methods | PCC^x | 0.937*** | 0.937*** | 0.853** | 0.724* | 0.604** | 0.735*** | 0.672*** | 0.708*** | 0.775*** | 0.336* | 0.713*** |
| Extraction Methods | Model^w | 1.063x + 0.15 | 1.055x + 0.09 | 1.13x + 0.56 | 0.818x + 0.15 | 0.995x + 1.29 | 2.1x + 0.82 | 1.11x + 0.3 | 2.108x + 0.81 | 0.81x + 0.42 | 0.26x + 0.36 | 0.636x + 1.12 |
| Analysis Methods | PCC | 0.948*** | 0.906*** | NA^v | NA | NA | 0.985*** | 0.777*** | 0.984*** | 0.961*** | 0.874*** | 0.946*** |
| Analysis Methods | Model | 1.083x - 0.04 | 1.129x - 0.19 | NA | NA | NA | 0.93x + 0.04 | 1.31x + 0.2 | 0.974x + 0.16 | 0.858x + 0.04 | 0.653x + 0.12 | 0.828x + 0.12 |

z “Other cis Lycopene” indicates the sum of all cis-lycopene isomers excluding tetra-cis-lycopene.
y “Total” represents the sum of cis isomers and all-trans configurations for a given carotenoid.
x PCC = Pearson Correlation Coefficient (r)
w Model = linear model to convert carotenoid values from a rapid extraction or UHPLC-DAD method to values obtained using standard methods
v NA = Not applicable
*, **, or *** Statistically significant at P ≤ 0.05, 0.01, or 0.001, respectively

The rapid extraction method could be used in contexts where a large number of samples need to be extracted and profiled for carotenoids. In the event that specific samples require additional accuracy or confirmation, they could be extracted using the standard method.

While rapid extractions exist for tomatoes and tomato products (Sadler et al., 1990; Sérino et al., 2009), our extraction method could be completed faster, at less than 5 minutes per sample (2 times faster than the standard method we utilized). Typically, tomato carotenoid extractions maximize mass transfer by subjecting samples to multiple rounds of extraction (Ferruzzi et al., 2001, 1998; Kopec et al., 2014, 2012; Rodríguez-Amaya and Kimura, 2004). Our rapid method aimed to capitalize on the time savings of using a bulk extraction and probe sonication while eliminating time-consuming steps involved with centrifuging and liquid transfer. While omitting multiple steps of solvent addition reduced the duration of the rapid extraction, the capacity to partition analytes into solvent was diminished. This effect explains the difference in carotenoid concentration estimates between the standard and rapid extraction methods (Tables 2.1 and 2.2), particularly for less soluble carotenoids like all-trans-lycopene.

2.3.2 HPLC-DAD and UHPLC-DAD Analysis Methods

PCA was utilized to visualize overall similarities and differences in carotenoid phenotypic data generated by HPLC-DAD or UHPLC-DAD (Figure 2.2B). The phenotypic data generated by both analysis methods display similar patterns of clustering.

Different alleles of Beta clustered into three distinct groups containing red tomatoes (OH8245 and Tainan), orange tomatoes (JF, PU, and 97L97), and LA716 as an intermediate between highly pigmented orange and red tomatoes. Tomatoes with alleles of tangerine were excluded from PCA in this context because not all carotenoids in tangerine tomatoes could be quantified using the rapid UHPLC-DAD method.

Concentrations were similar for all carotenoids measured and significance trends were similar between the two analysis methods (Table 2.5). Discrimination of alleles of Beta in a cherry tomato background was almost completely unaffected by chromatographic analysis method used. The only deviation in significance trends in this sub-population can be seen in cis-β-carotene and cis-lycopene isomers. This observation is likely due to lack of chromatographic resolution for these minor carotenoid species in contrast to the longer HPLC method (Figure 2.3). Similar trends were observed in the processing tomato sub-population with allelic variation for Beta (Table 2.5). Concentrations of all carotenoids measured were similar between analysis methods and significance trends among different alleles of Beta were identical between analysis methods. The exception was cis-lycopene isomers. An inability to separate cis isomers was also observed in tomatoes with tangerine alleles using the rapid UHPLC-DAD analysis method.


Figure 2.3 Chromatograms of tomatoes carrying an allele of Beta (LA716) generated by HPLC-DAD and UHPLC-DAD (inset and scaled for difference in run time). Carotenoids quantified included: 1.) Phytoene; 2.) Phytofluene; 3.) β-carotene; 3a.) β-carotene isomers; 4.) all-trans-lycopene; 4a.) cis-lycopene isomers. Traces indicate DAD wavelengths 286 nm, 348 nm, and 471 nm.


Table 2.5 Carotenoid concentration (mg/100g fresh weight; ± standard deviation) in tomatoes grown in multiple locations as a function of background, Beta allele, and analysis method.

Background | Method | Beta Allele | Sample Size | Phytoene | Phytofluene | All-trans-Lycopene | Other cis-Lycopene^z | Total Lycopene^y | All-trans-β-Carotene | cis-β-Carotene | Total β-Carotene
Cherry | HPLC-DAD | 97L97 | n=12 | 0.49±0.16bc | 0.13±0.05c | 0.12±0.21c | 0.05±0.07c | 0.17±0.28c | 3.51±0.66a | 0.63±0.05a | 4.15±0.65a
Cherry | HPLC-DAD | JF | n=8 | 0.66±0.18b | 0.21±0.07b | 0.95±0.38b | 0.28±0.08ab | 1.23±0.46b | 3.59±1.25a | 0.56±0.07a | 4.15±1.38a
Cherry | HPLC-DAD | PU | n=12 | 0.43±0.08c | 0.12±0.03c | 0.21±0.22c | 0.07±0.06bc | 0.27±0.28c | 3.57±0.75a | 0.66±0.02a | 4.22±0.78a
Cherry | HPLC-DAD | Tainan | n=12 | 2.01±0.47a | 0.80±0.21a | 8.71±2.02a | 0.95±0.24a | 9.66±2.23a | 0.91±0.36b | 0.25±0.21b | 1.16±0.32b
Cherry | UHPLC-DAD | 97L97 | n=12 | 0.47±0.14bc | 0.32±0.05b | 0.08±0.03c | 0.06±0.04b | 0.14±0.05c | 4.44±0.74a | 0.41±0.06a | 4.84±0.76a
Cherry | UHPLC-DAD | JF | n=8 | 0.70±0.38b | 0.37±0.14b | 1.41±1.60b | 0.12±0.16b | 1.53±1.76b | 3.75±0.40a | 0.46±0.16a | 4.21±0.53a
Cherry | UHPLC-DAD | PU | n=12 | 0.42±0.09c | 0.33±0.03b | 0.22±0.24c | 0.08±0.02b | 0.30±0.24c | 4.42±0.77a | 0.45±0.08a | 4.87±0.83a
Cherry | UHPLC-DAD | Tainan | n=12 | 1.87±0.30a | 0.79±0.20a | 9.13±2.02a | 0.60±0.26a | 9.73±2.13a | 0.98±0.22b | 0.07±0.02b | 1.04±0.24b
Processing | HPLC-DAD | JF | n=16 | 0.65±0.36b | 0.25±0.15b | 2.14±2.49c | 0.45±0.28c | 2.59±2.77c | 4.88±2.08a | 0.38±0.16a | 5.25±2.04a
Processing | HPLC-DAD | LA716 | n=15 | 1.88±0.66a | 0.62±0.25a | 4.33±2.54b | 0.82±0.29b | 5.16±2.78b | 3.23±1.29a | 0.26±0.12a | 3.48±1.38a
Processing | HPLC-DAD | OH8245 | n=16 | 1.77±0.61a | 0.68±0.23a | 7.81±2.86a | 0.94±0.34a | 8.75±3.15a | 0.92±0.69b | 0.27±0.14a | 1.19±0.63b
Processing | UHPLC-DAD | JF | n=16 | 0.66±0.66b | 0.38±0.14b | 2.09±2.48c | 0.15±0.12b | 2.24±2.59c | 5.59±2.28a | 0.52±0.24a | 6.12±2.47a
Processing | UHPLC-DAD | LA716 | n=15 | 1.70±0.48a | 0.73±0.20a | 4.59±2.68b | 0.28±0.26b | 4.88±2.83b | 3.47±1.31a | 0.38±0.32a | 3.84±1.51a
Processing | UHPLC-DAD | OH8245 | n=16 | 1.78±0.47a | 0.76±0.18a | 8.57±3.00a | 0.50±0.17a | 9.07±3.13a | 1.04±0.55b | 0.08±0.11b | 1.11±0.63b

^z “Other cis Lycopene” indicates the sum of all cis-lycopene isomers excluding tetra-cis-lycopene.
^y “Total” represents the sum of cis isomers and all-trans configurations for a given carotenoid.
Values with different letters within an analysis method are statistically different as determined by Tukey’s honestly significant difference (HSD) test (α = 0.05).

The sub-population of processing tomatoes with allelic variation at the tangerine locus was also analyzed by both the standard HPLC-DAD and the rapid UHPLC-DAD analysis methods. We were unable to adequately resolve ζ-carotene, neurosporene, and tetra-cis-lycopene found in tangerine tomatoes using the rapid UHPLC-DAD method. Tetra-cis-lycopene, ζ-carotene, neurosporene, and their geometrical isomers differ by minor structural alterations that result from desaturation events during their biosynthesis (Isaacson et al., 2004, 2002). C30 columns were invented to separate geometrical isomers of carotenoids (Sander et al., 1994). However, no UHPLC columns are currently available with this stationary phase. To separate and quantify these carotenoids using HPLC, long separations (often between 15 and 100 minutes) are generally employed using C18 or C30 stationary phases packed into 250 mm columns (Bijttebier et al., 2014; Bramley, 1992; Cooperstone et al., 2016, 2015b; Daood et al., 2014; Ferruzzi et al., 1998; Kean et al., 2008; Lesellier et al., 1993; Ronen et al., 2000). This rapid, isocratic UHPLC-DAD method was not able to separate carotenoids from tangerine tomatoes during its 4.2 minute run time (Figure 2.3). Thus, the rapid UHPLC-DAD method is best suited for tomatoes that primarily contain lycopene and β-carotene.

Similar to the comparisons of extraction methods, we utilized random effects modeling to estimate contributions to variance. Random effects modeling indicated that analysis method (“AN”) contributed between 0 and 14.9% of the variation for all carotenoids measured by UHPLC-DAD. Genotype (“G”) and environmental conditions (genotype by location (“G:L”), location (“L”), and block within location (“BLK(L)”)) were stronger and influenced carotenoid profiles substantially more than analysis method. Genotypic contributions to total variance were as high as 82.1% for carotenoids measured by both HPLC-DAD and UHPLC-DAD, demonstrating that variance due to biological conditions often overwhelms variance from analytical sources. Analysis method explained almost 15% of the total variance for other cis-β-carotene isomers (Table 2.3), which is reflected in higher values when measured by HPLC-DAD as compared to UHPLC-DAD (Table 2.5). We hypothesized that this difference is due to an inability to fully resolve cis-β-carotene isomers from all-trans-β-carotene using the rapid UHPLC-DAD method, as discussed above. Ultimately, these isomers constitute a small proportion of the total carotenoid content in most tomato fruits and may not be of importance in many contexts. Overall, genetic and environmental factors overwhelmed the effects of chromatographic method.
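The comparison of variance sources described above can be sketched in a few lines of Python. This is only an illustration, not the random effects model itself: the percentages are transcribed from the "Analysis Methods" rows of Table 2.3, the assignment of values to carotenoids assumes the columns follow the same order as Table 2.4, and the grouping of terms into "biological" and "analytical" (including where the G:AN interaction is placed) is our own assumption for illustration.

```python
# Percent of total variance by source, transcribed from the "Analysis Methods"
# rows of Table 2.3. ASSUMPTION: columns follow the order used in Table 2.4,
# so the 1st, 6th, and 10th values are phytoene, all-trans-lycopene, and
# cis-beta-carotene, respectively.
VARIANCE_PCT = {
    "phytoene": {"G": 60.9, "G:AN": 0.0, "G:L": 2.2, "AN": 0.0,
                 "L": 3.2, "BLK(L)": 7.0, "Residual": 26.7},
    "all_trans_lycopene": {"G": 82.1, "G:AN": 0.0, "G:L": 1.4, "AN": 0.0,
                           "L": 0.6, "BLK(L)": 4.7, "Residual": 11.2},
    "cis_beta_carotene": {"G": 40.4, "G:AN": 4.6, "G:L": 3.5, "AN": 14.9,
                          "L": 0.4, "BLK(L)": 12.1, "Residual": 24.0},
}

# Illustrative grouping (an assumption, not part of the original analysis).
BIOLOGICAL = ("G", "G:L", "L", "BLK(L)")
ANALYTICAL = ("AN", "G:AN")


def summarize(components):
    """Return (biological %, analytical %, residual %) for one carotenoid."""
    biological = sum(components[k] for k in BIOLOGICAL)
    analytical = sum(components[k] for k in ANALYTICAL)
    return biological, analytical, components["Residual"]


for name, comps in VARIANCE_PCT.items():
    bio, ana, res = summarize(comps)
    print(f"{name}: biological {bio:.1f}%, analytical {ana:.1f}%, residual {res:.1f}%")
```

Under these assumptions, even for cis-β-carotene, where analysis method matters most, the genotype and environment terms together account for more variance than the terms involving analysis method.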

To explore the two datasets further, we again used linear regression to determine their relationship (Table 2.4). We found that all carotenoid concentrations in our Beta processing and cherry tomato sub-populations measured by UHPLC-DAD were strongly related to those measured by HPLC-DAD. The regression equations presented in Table 2.4 could be used as a starting point to convert values generated by the UHPLC-DAD method to those generated by the HPLC-DAD method we used.
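As a minimal sketch of such a conversion, the Python snippet below applies the Table 2.4 "Analysis Methods" models for phytoene and phytofluene, where x is the UHPLC-DAD value and the result is the estimated HPLC-DAD equivalent. The dictionary and function names are hypothetical, and only the two carotenoids whose column assignment is unambiguous in the table are included.

```python
# (slope, intercept) pairs transcribed from the "Analysis Methods" Model row
# of Table 2.4. x = UHPLC-DAD value, y = estimated HPLC-DAD value
# (mg/100 g fresh weight). Names below are illustrative only.
UHPLC_TO_HPLC = {
    "phytoene": (1.083, -0.04),
    "phytofluene": (1.129, -0.19),
}


def to_hplc_equivalent(carotenoid, uhplc_value):
    """Convert a UHPLC-DAD concentration to its HPLC-DAD equivalent."""
    slope, intercept = UHPLC_TO_HPLC[carotenoid]
    return slope * uhplc_value + intercept


# Example: mean phytoene in cherry 'Tainan' measured by UHPLC-DAD (Table 2.5).
estimate = to_hplc_equivalent("phytoene", 1.87)
print(f"Estimated HPLC-DAD phytoene: {estimate:.2f} mg/100 g")
```

For this example the estimate is close to the mean of 2.01 mg/100 g reported for the same genotype by HPLC-DAD in Table 2.5, although the models were fit across all samples and should be re-validated before use in a different laboratory or population.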

Due to the structural and chemical similarity of tomato carotenoids as well as the presence of geometrical isomers, chromatographic separation methods tend to be time consuming (Bijttebier et al., 2014; Bramley, 1992; Cooperstone et al., 2016, 2015b; Daood et al., 2014; Ferruzzi et al., 1998; Kean et al., 2008; Lesellier et al., 1993; Ronen et al., 2000). The UHPLC-DAD method presented here was able to resolve major tomato carotenoids and, to an extent, some cis isomers of lycopene and β-carotene in only 4.2 minutes (Figure 2.3). Other HPLC-DAD and UHPLC-DAD methods have been recently developed to separate carotenoids, but are considerably longer and do not resolve lycopene precursors such as phytoene and phytofluene (Abate-Pella et al., 2017; Arathi et al., 2015; Gupta et al., 2015). Furthermore, the rapid UHPLC-DAD method requires 93% less solvent compared to the standard method due to the reduction in sample run time and lower flow rate. Given the data we presented here, this UHPLC-DAD analysis method could greatly enhance the analytical throughput in many applications including but not limited to breeding, quality control, and food product development. The UHPLC-DAD method was unable to resolve the complex mixture of carotenoids and geometrical carotenoid isomers found in tangerine tomatoes. However, it can still be used to rapidly determine if a sample is from a tangerine tomato, and those samples could then be profiled using the standard method we presented in this work.

2.4 Conclusion

Here, we developed new carotenoid extraction and analysis methods and applied them to assess diverse selections of tomatoes grown in multiple environments. The rapid extraction protocol was able to distinguish alleles of Beta and tangerine similarly to standard methods, although extraction efficiency was lower for some carotenoids. This extraction method may be best suited for qualitative, high-throughput phenotyping where rapid turnaround time is required. The novel UHPLC-DAD method presented here separates carotenoids 5 times faster compared to the standard method. The UHPLC-DAD method was able to separate genetic background, allele effects, and environmental effects as well as the standard method. While the UHPLC-DAD method is the fastest tomato carotenoid separation protocol to date, carotenoids and geometrical isomers unique to tangerine tomatoes could not be separated and quantified. If a high degree of accuracy is required for carotenoid phenotyping, a subset of samples could be extracted using a standard method and analyzed using the rapid UHPLC-DAD method to capitalize on its time and resource savings. In many cases, genetic and environmental effects tended to contribute more to the variation in our samples than that of the extraction or chromatography methods. The rapid carotenoid extraction and analysis platform we outlined here could be adopted by plant breeders and food product developers interested in making fast, data driven decisions.

Acknowledgements

We thank the North Central Agricultural Research Station and the Ohio Agricultural Research and Development Center farm crews for their assistance in maintaining research plots. We also thank Troy Aldrich and Jiheun Cho for maintaining tomato germplasm stocks. This work was supported by the National Institute of Food and Agriculture award number 2014-38420-21844 and Foods for Health, a focus area of the Discovery Themes Initiative at The Ohio State University. Samples were processed and analyzed in the Nutrient and Phytochemical Analytic Shared Resource of The Ohio State University’s Comprehensive Cancer Center (NIH P30 CA016058).

Abbreviations: ANOVA: analysis of variance; B: fruit-specific lycopene β-cyclase; BEH: bridged ethylene hybrid; CRTISO: carotenoid isomerase; CYC-B: lycopene β-cyclase; DAD: diode array detector; HPLC: high performance liquid chromatography; JF: Jaune Flamme; MeOH: methanol; MtBE: methyl tert-butyl ether; PCA: principal components analysis; PU: Purdue 89-28-1; UHPLC: ultra-high performance liquid chromatography


Chapter 3. The effects of tomato consumption on the transcriptome and metabolome of murine liver

Michael P Dzakovich1, Jennifer M. Thomas-Ahner2, Steven K. Clinton2, David M Francis3, Jessica L Cooperstone1,4

Affiliations 1The Ohio State University, Department of Horticulture and Crop Science, 2001 Fyffe Court, Columbus, OH 43210. 2The Ohio State University, Division of Medical Oncology, Department of Internal Medicine 3The Ohio State University, Ohio Agricultural Research and Development Center, Department of Horticulture and Crop Science, 1680 Madison Ave, Wooster, OH 44691. 4The Ohio State University, Department of Food Science and Technology, 2015 Fyffe Court, Columbus, OH 43210.

3.1 Abstract

Tomato consumption is associated with many health benefits including lowered risk for developing certain cancers. After absorption from the diet, tomato phytochemicals are transported to the liver, the primary metabolizing organ, where they are thought to alter gene expression in ways that lead to favorable health outcomes. However, the effects of tomato consumption on cellular processes and the chemical profile of mammalian liver are not well understood. We hypothesized that tomato consumption would differentially alter the metabolome and gene expression signature of mouse liver tissue compared to a control diet and that further differentiation would be attributed to the type of tomato. C57BL/6 mice (n=12/group) were fed a macronutrient matched diet containing either 10% red tomato, 10% tangerine tomato, or no tomato powder for 6 weeks after weaning. RNA-Seq analyses revealed that tomato type and consumption, in general, had subtle effects on the transcriptional profiles of mouse livers, altering between 0.02 and 5.6% of total transcripts measured. Although expression profiles varied by treatment, gene set enrichment analyses indicated that Phase I and II xenobiotic metabolism were modulated by tomato consumption. Untargeted metabolomics experiments revealed that mice consuming diets containing tomatoes had altered chemical profiles in liver tissue. Among the features detected, 56-119 (2.2-4.8% of total features) were statistically significant between treatment groups. We confirmed the identity of two steroidal alkaloids and tentatively identified 17 other Phase I and II metabolites. Steroidal alkaloids strongly differentiated mice that consumed diets enriched with tomatoes from those that did not. Our findings indicate that tomato consumption can modestly impact transcriptional signatures within the liver due to the presence of phytochemicals and that steroidal alkaloids derived from tomatoes differentiate the liver metabolome of animals fed tomato diets.

3.2 Introduction

Epidemiological studies indicate that consuming tomatoes and tomato products is associated with a reduction of risk of developing many chronic diseases including some common cancers. The principal hypothesis is that the carotenoid lycopene conveys this effect (Clinton et al., 1996; Etminan et al., 2004; Gann et al., 1999; Gerster, 1997; Giovannucci et al., 2002, 1995; Ip et al., 2014, 2013; Melendez-Martinez et al., 2013; Mikhak et al., 2008; Rao et al., 1999; Wang et al., 2010). This is partially because lycopene circulates in plasma and deposits in tissue after tomato consumption (Kaplan et al., 1980; Moran et al., 2013; Parker, 1989) and those who eat more tomatoes have higher plasma lycopene (Grainger et al., 2018). However, studies comparing consumption of feed containing tomato with consumption of feed containing the same amount of lycopene as a purified compound show that benefits are greater with consumption of the whole food (Boileau et al., 2003; Burton-Freeman and Sesso, 2014). These findings have led us to hypothesize that phytochemicals other than lycopene contribute to the bioactivity of tomato.

Genetic diversity present in tomato provides a tool with which scientists can manipulate phytochemicals to ask nutritional questions. Tangerine tomato fruits have a naturally occurring genetic mutation that leads to the accumulation of cis-lycopene isomers, which are more bioavailable in both rodents (Cooperstone et al., 2017) and humans (Cooperstone et al., 2015a) than lycopene present in the all-trans form, as found in red tomatoes. In addition, these orange-colored tomatoes contain elevated levels of the lycopene precursors phytoene and phytofluene, while also having considerable amounts of zeta-carotene and neurosporene, both of which are essentially absent in red tomatoes (Cooperstone et al., 2015; Isaacson et al., 2004, 2002). While tomato carotenoids have been extensively studied in the context of human health, there are thousands of other phytochemicals in tomato that may influence health outcomes of those who consume them (Cichon et al., 2017a; Moco et al., 2006; Zhu et al., 2018). An untargeted metabolomics approach could illuminate tomato-derived compounds absorbed from the diet and reveal how they are metabolized by organs such as the liver.

After absorption from the diet, the liver is one of the primary destinations for chemical components present within food. Within the liver, chemical compounds can be further metabolized prior to distribution to different parts of the body or tagged for excretion via urine or bile. Evidence from rodent models demonstrates that dietary supplementation with specific tomato phytochemicals and tomato extracts can attenuate diseases of the liver such as nonalcoholic fatty liver disease, nonalcoholic steatohepatitis, and hepatocellular carcinoma (Ip et al., 2014, 2013; Melendez-Martinez et al., 2013; Tan et al., 2016; Wang et al., 2010). The mechanism of this action has been proposed to be alterations in gene expression mediated by tomato phytochemicals. However, only PCR and microarray based studies using curated genes have been conducted. Next generation sequencing approaches offer an attractive way to discover novel relationships between diet and gene expression since they are not biased by a limited selection of genes.

This study aimed to define the impact of tomato consumption (and type) on the transcriptional and chemical profiles of mouse liver tissue using next generation sequencing and untargeted, mass spectrometry-based metabolomics. We utilized two unique tomato varieties to determine if quantified outcomes were the result of tomato consumption, in general, or due to altering tomato phytochemical profiles (which in this case, differed in carotenoid profile). We focused on the more diverse polar/semi-polar compounds since previous studies have not shown a benefit from increased plasma lycopene levels from tangerine tomatoes (Cooperstone et al., 2017; Geraghty et al., 2020). We hypothesized that dietary tomato consumption will alter the chemical composition and gene expression signature of mouse liver tissue compared to control diets. Further, we hypothesized that the effects of tangerine tomato and traditional red tomato diet consumption will be distinct due to their different genetic backgrounds.


3.3 Materials and Methods

3.3.1 Reagents and standards

LC-MS grade acetonitrile (0.1% formic acid), isopropanol, methanol, and water (0.1% formic acid), and 99.5% butylated hydroxytoluene (BHT) were purchased from Fisher Scientific (Pittsburgh, PA).

3.3.2 Animal Diets and Experimental Design

Tangerine processing tomato variety FG04-167 and the red processing varieties OH8243, OH8245, and FG99-218 were grown at The Ohio State University’s North Central Agricultural Research Station in Fremont, OH. FG04-167 contains the tangerine (t3183) allele of CAROTENOID ISOMERASE and is enriched in cis-structured lycopene and its precursors (Cooperstone et al., 2015a; Isaacson et al., 2002). FG99-218 contains the old gold crimson (ogc) null allele of Beta and produces elevated levels of all-trans-lycopene (Ronen et al., 2000). Tomato fruits were processed into juices at The Ohio State University Food Industries Center pilot plant as previously described (Cooperstone et al., 2015a). Tomato juices were then freeze dried and stored at −20 °C as previously described (Cooperstone et al., 2017). Animal diets were manufactured by Dyets (Bethlehem, PA, USA) by combining tomato powders into a purified AIN-93G diet at 10% (w/w) prior to pelleting (Kopec et al., 2015). To maintain macronutrient composition across all treatments, control diets were supplemented with corn starch and dextrose to compensate for the lack of carbohydrates naturally present in tomato (Boileau et al., 2003). Animal diets were maintained at −20 °C until use and replaced every 2-3 days to minimize phytochemical degradation.


3.3.3 Experimental Design

Male C57BL/6 mice were reared in a vivarium at The Ohio State University according to standards set by the American Association for Accreditation of Laboratory Animal Care (protocol #2010A00000095). At four weeks of age, mice were randomly assigned to treatment groups and fed either a control diet, a diet supplemented with 10% (w/w) red tomato powder, or a diet supplemented with 10% (w/w) tangerine tomato powder as previously described (n=12/group). Mouse body weights and food intake were determined on a weekly basis to ensure consistent growth and development throughout the duration of the experiment. Final body and liver masses are reported in Table 3.1. After 6 weeks of dietary intervention (10 weeks of age), mice were sacrificed, and livers were removed for analysis. Part of one lobe was preserved in RNAlater (Thermo Fisher Scientific; Waltham, MA, USA) for transcriptomics and the other part was reserved for chemical profiling. Mice were sacrificed at the same time and the same lobes were collected to avoid confounding effects due to circadian rhythm and tissue heterogeneity, respectively. Samples were flash frozen in liquid nitrogen and stored at −80 °C until analysis.

3.3.4 RNA Extraction

Mouse liver RNA was extracted from approximately 10 mg (± 2 mg) of previously preserved and frozen tissue with an RNeasy kit and QIAshredder following the manufacturer’s directions (QIAGEN; Germantown, MD, USA). Extracts were treated with DNase I (QIAGEN) to remove genomic DNA present in samples. Aliquots of RNA from each sample were run on an Agilent 2100 Bioanalyzer (Agilent Technologies; Santa Clara, CA, USA) and samples had RNA Integrity Number scores between 8.7 and 9.3. Based on RNA concentration as determined by NanoDrop (Thermo Fisher Scientific), an aliquot containing 1 μg of RNA was taken from each sample and sent to The Cleveland Clinic Lerner Genomics Core for library preparation and sequencing.

3.3.5 cDNA Library Preparation and RNA-Seq Data Acquisition

cDNA libraries were prepared using the NEXTflex Rapid Directional RNA-Seq kit (PerkinElmer; Waltham, MA, USA) according to the manufacturer’s directions. cDNA libraries were then sequenced on two lanes of an Illumina HiSeq2500 (Illumina; San Diego, CA, USA) using 100 bp paired-end reads at a depth of 40-50 million reads/sample. The PhiX quality control library was concurrently run on each lane to ensure proper cluster generation, sequencing, phasing, and pre-phasing (Illumina).

3.3.6 Analysis of RNA-Seq Data

Sample data in the form of FASTQ files were checked for quality with FastQC software using the Ohio Supercomputer Center (Andrews, 2015). Adapter sequences as well as low quality reads were trimmed using Trimmomatic (Bolger et al., 2014). The following trimming parameters were used: adapter clip seed mismatches of 2, an adapter palindrome clip threshold of 30, an adapter simple clip threshold of 10, the maximum information trimming algorithm with a target length of 40 and a strictness value of 0.7 (to favor read “correctness”), a minimum length of 36, and an average quality score of 20. These settings resulted in the deletion of 1-2% of reads in each FASTQ file and trimmed files were once again examined for quality using FastQC. An alignment index was constructed with the mouse genome (GRCm38.p6) using Rsubread (Liao et al., 2019) and reads from lane 1 and lane 2 were aligned. Multidimensional scaling (MDS) revealed a lack of lane effects and reads from lanes 1 and 2 were combined using BAMtools (Barnett et al., 2011). Genes with read counts above 0.23 counts per million (10/minimum library size) in at least 12 samples were retained and log2 transformed (14,951 genes remaining after filtering). An additional analysis was performed by retaining genes with read counts above 0.23 counts per million in at least six samples. Data were then trimmed mean of M-values (TMM) normalized and subsequently analyzed for differential expression using the edgeR pipeline (Robinson et al., 2010; Robinson and Oshlack, 2010). Genewise negative binomial generalized linear models were used to compare red vs. control, tangerine vs. control, red vs. tangerine, and tomato (red and tangerine) vs. control (false discovery rate corrected significance threshold of P<0.1). Gene set overlaps were computed using differentially expressed genes from the aforementioned contrasts and querying the Molecular Signatures Database using the Hallmark and KEGG gene sets (MSigDB; https://www.gsea-msigdb.org/gsea/msigdb/annotate.jsp).
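The counts-per-million filter described above can be illustrated with a short, dependency-free Python sketch. It is not the edgeR code used in this work; the function names and the toy library sizes are hypothetical, and only the rule itself (threshold = 10/minimum library size in millions, met in at least a set number of samples) is taken from the text.

```python
def cpm_threshold(min_count, library_sizes):
    """CPM cutoff equal to `min_count` reads in the smallest library."""
    return min_count / (min(library_sizes) / 1e6)


def keep_gene(gene_counts, library_sizes, threshold, min_samples):
    """True if the gene exceeds the CPM threshold in at least `min_samples` samples."""
    n_pass = sum(
        1
        for count, size in zip(gene_counts, library_sizes)
        if count / size * 1e6 > threshold
    )
    return n_pass >= min_samples


# Toy example with three samples (hypothetical library sizes).
library_sizes = [43_500_000, 47_000_000, 50_000_000]
threshold = cpm_threshold(10, library_sizes)
print(f"CPM threshold: {threshold:.2f}")

expressed = keep_gene([60, 75, 80], library_sizes, threshold, min_samples=3)
barely_detected = keep_gene([2, 0, 30], library_sizes, threshold, min_samples=3)
print(expressed, barely_detected)
```

A cutoff of 0.23 counts per million corresponds to a smallest library of roughly 43.5 million reads, which is consistent with the sequencing depth of 40-50 million reads/sample reported above.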

3.3.7 Extraction of Polar Metabolites

Polar metabolites were extracted from 50 mg (± 5 mg) of frozen liver tissue with methanol using a modified extraction method (Masson et al., 2010). Five 2 mm zirconium oxide beads (Union Process; Akron, OH, USA) were placed in 1.5 mL tubes containing sample tissue and 500 μL of chilled methanol with 0.1% BHT was immediately added. Process blanks were created using the same process, without the addition of liver, and served to identify compounds present in the analysis that did not originate from liver samples. Tubes were then placed in a chilled Cryo-Block (SPEX Sample Prep; Metuchen, NJ, USA) and samples were homogenized in a SPEX Geno/Grinder 2010 for 30 seconds at 1400 RPM. Samples were then centrifuged at 21,130 x g for 5 min at 4 °C, and 475 μL of supernatant fluid was removed from each sample tube and passed through an Oasis PRiME HLB 1cc/30mg SPE cartridge (Waters; Milford, MA, USA) to remove phospholipids. Cartridges were then flushed with 100 μL of acetonitrile. Three separate 135 μL aliquots were reserved for analysis and a final 75 μL aliquot from each sample was combined to create a pooled QC. Samples, process blanks, and QCs were dried down under a stream of nitrogen and stored at −80 °C until analysis. Whenever possible, samples were kept on ice to slow unwanted chemical reactions/degradation.

3.3.8 Untargeted Metabolomics Data Collection

Full scan UHPLC-QTOF-MS: Dried extracts were redissolved in 100 μL of 1:1 methanol:water and centrifuged at 21,130 x g for 3 min to precipitate undissolved solids. The clear supernatant was transferred into a vial and run on an Agilent 1290 Infinity II ultra-high performance liquid chromatography (UHPLC) system coupled with an Agilent 6545 quadrupole time of flight mass spectrometer (QTOF-MS) (Agilent Technologies). Analytes were separated using a Waters C18 Acquity bridged ethylene hybrid (BEH) column (2.1 × 100 mm, 1.7 µm particle size) maintained at 40 °C using water (A) and acetonitrile (B), both with 0.1% formic acid at 0.5 mL/min. The gradient was as follows: 0% B held for 1 min, a linear gradient to 100% B over 10 min, 100% B held for 2 min, and a return to 0% B for an additional 2 min (total run time of 15 min). Between each injection, the needle and sample loop were washed with acetonitrile for 8 sec, isopropanol for 10 sec, and water for 5 sec to reduce carryover. Samples were maintained at 4 °C and the injection volume was 5 μL. The QTOF-MS was operated in electrospray positive ionization mode and data were collected from 50 – 1700 m/z. The gas temperature was 350 °C with a flow of 10 L/min. Nebulizer gas was maintained at 35 psig. Sheath gas temperature and flow were 375 °C and 11 L/min, respectively. To condition the instrument, a QC was repeatedly injected until base peak chromatograms were stable. QC samples were injected every 6 samples to monitor instrument performance and intra-run variability.
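The gradient program described above can be written as a simple piecewise function of time, which makes the 15 min total run time easy to verify. This Python sketch is illustrative only: the function name is hypothetical, and it assumes the return to 0% B at 13 min is an immediate step followed by a hold, which the text does not specify.

```python
def percent_b(t_min):
    """Percent mobile phase B (acetonitrile, 0.1% formic acid) at time t_min.

    Segments (from the method description):
      0-1 min    hold at 0% B
      1-11 min   linear ramp from 0% to 100% B
      11-13 min  hold at 100% B
      13-15 min  0% B (ASSUMED to be a step change followed by a hold)
    """
    if not 0 <= t_min <= 15:
        raise ValueError("time must be within the 15 min run")
    if t_min <= 1:
        return 0.0
    if t_min <= 11:
        return (t_min - 1) * 10.0  # 100% B over 10 min = 10% B per min
    if t_min <= 13:
        return 100.0
    return 0.0


for t in (0.5, 1, 6, 11, 12, 14):
    print(f"{t:>4} min: {percent_b(t):5.1f}% B")
```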


Data dependent UHPLC-QTOF-MS/MS: Fragmentation data for features of interest were collected using an Agilent 1290 Infinity II UHPLC in tandem with an Agilent 6545 QTOF-MS using combined pooled extracts of red and tangerine fed mouse livers (n=4/group). Column and chromatographic conditions enumerated previously were used for MS/MS confirmation. The electrospray ionization source was operated in positive mode and data were collected over 50-1700 m/z for full-scan, data dependent, and data independent MS/MS experiments. Source gas temperatures and flow rates were identical to those described previously. A collision energy of 30 eV was used for all masses and injection volume was 10 or 20 μL.

3.3.9 Analysis of Untargeted Metabolomics Data

Peak integration and feature alignment were done in Agilent MassHunter Profinder 10.0 (Agilent Technologies). Data deconvolution was conducted using the batch recursive feature extraction program within MassHunter Profinder using the following parameters: retention time restricted to 0.5-11.0 min to exclude the solvent front and re-equilibration period; peak height ≥ 500 counts; allowed positive ions: +H, +Na, +K; common organic (no halogens) isotope grouping model; charge state assignments limited to 1-2; and a compound ion count threshold requiring two or more ions. Retention time tolerance was 0% + 0.3 min with a mass tolerance of 20 ppm + 2.0 mDa. The absolute height threshold was set to 1000 counts by determining the noise threshold, and minimum filter matches required that a given compound be present in 75% of files in at least one sample group. Mass tolerance was ± 10 ppm with a retention time threshold of ± 0.3 minutes. Expansion of values for chromatogram extraction used a symmetric window of ± 35 ppm around each possible m/z, and extracted ion chromatogram ranges were limited to the expected retention time ± 1.5 minutes (symmetric). Integration was accomplished using the Agile 2 algorithm with Gaussian smoothing using 9 points with a width of 5 points. Peaks were filtered based on height with an absolute area threshold ≥ 1000 counts. Chromatogram data format was set to profile when available but used centroid data otherwise. Spectra with average scans > 10 were used and TOF spectra were excluded if above 20% saturation in the m/z ranges used in the chromatogram. Empty spectra were not returned. No MS peak spectrum background was used. For centroiding, a maximum spike width of 2 with a required valley of 0.7 was selected. Mass spectral data format was set to profile when available. Find by ion filters were limited to a target score ≥ 60.0, and minimum filter matches were required in 75% of files in at least one sample group. This process resulted in 2492 features for ESI+ data.

Datasets were filtered by requiring that features be present in all 7 of the QC samples, and by removing features with a variance of >30% in the QC samples or a maximum ion intensity/average intensity less than 10 times that of the process blank. Data were statistically analyzed using R 3.6.3 (R Development Core Team, 2018). Features present in fewer than 8 individuals within a treatment group were removed and remaining missing values were imputed with half of the lowest peak area value in the dataset. Data were then log2 transformed and Pareto scaled. Principal components analysis (PCA) was used to reduce data dimensionality and for visualization using the packages FactoMineR and Factoextra (Lê et al., 2008). K-means clustering was conducted using R 3.6.3 (nstart = 20, iter.max = 200), and scree plots allowed for the visualization of the natural number of clusters within our data structure. P-values from univariate statistics such as t-tests and analysis of variance (ANOVA) were corrected for multiple testing using a Benjamini-Hochberg procedure (Benjamini et al., 2001). Partial least squares discriminant analysis (PLS-DA) and random forest-based feature selection were conducted using the ropls and randomForest packages, respectively (Liaw and Wiener, 2002; Thévenot et al., 2015). For PLS-DA models, the number of components was automatically selected by ropls. Components were added if the percentage of dispersion explained by the new component was > 5% and if the predicted residual sum of squares for the model containing the new component was less than the residual sum of squares for the previous model (Eriksson et al., 2001). PLS-DA model performance was determined by 7-fold cross validation and significance was calculated with 100 permutation tests (Szymańska et al., 2012).

Random forest models were tuned by selecting the minimum number of trees and variables at each split point (mtry) to minimize prediction error. Random forest models were run on test sets (randomly selected individuals comprising 30% of the total dataset) to assess prediction accuracy.
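The imputation, log2 transformation, and Pareto scaling steps described above can be illustrated with a minimal Python sketch. This is not the R code used in this work; the function names and the toy peak areas are hypothetical, and Pareto scaling is assumed here to mean mean-centering each feature and dividing by the square root of its standard deviation.

```python
import math
from statistics import mean, stdev


def impute_half_min(matrix):
    """Replace missing values (None) with half of the lowest observed value in the dataset."""
    observed = [v for row in matrix for v in row if v is not None]
    fill = min(observed) / 2.0
    return [[fill if v is None else v for v in row] for row in matrix]


def log2_transform(matrix):
    """Apply a log2 transformation to every value."""
    return [[math.log2(v) for v in row] for row in matrix]


def pareto_scale(matrix):
    """Mean-center each feature (column) and divide by the square root of its standard deviation."""
    n_samples = len(matrix)
    n_features = len(matrix[0])
    scaled = [[0.0] * n_features for _ in range(n_samples)]
    for j in range(n_features):
        column = [matrix[i][j] for i in range(n_samples)]
        mu = mean(column)
        sd = stdev(column)
        denom = math.sqrt(sd) if sd > 0 else 1.0  # leave constant features unscaled
        for i in range(n_samples):
            scaled[i][j] = (column[i] - mu) / denom
    return scaled


# Hypothetical peak areas: rows are samples, columns are features; None marks a missing value.
peak_areas = [
    [4000.0, 1600.0],
    [16000.0, None],
    [64000.0, 6400.0],
]

imputed = impute_half_min(peak_areas)
scaled = pareto_scale(log2_transform(imputed))
```

In this toy example the missing value is replaced by 800 (half of the smallest observed peak area, 1600), and each scaled feature is centered on zero.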

3.3.10 Dataset Availability

Raw datafiles from RNA-Seq and untargeted metabolomics, as well as deconvoluted metabolomics data, will be made publicly available in the Gene Expression Omnibus and Metabolomics Workbench, respectively.

3.4 Results and Discussion

3.4.1 Animal weights and tissue mass

Animals used in our experiments were part of a larger study investigating the effects of dietary carotenoids on the development of prostate cancer (Geraghty et al., 2020). Consumption of tomatoes and the carotenoid lycopene has been associated with a decreased risk of developing prostate cancer in epidemiological studies (Giovannucci et al., 1995), and this evidence is further supported in pre-clinical trials (Boileau, 2003; Tan et al., 2017; Wan et al., 2014). Tangerine tomatoes were included as part of this larger study due to their cis-carotenoid rich profiles (Isaacson et al., 2002), and to allow us to ask questions about how differences in carotenoids affect the transcriptome and metabolome of the liver. Evidence from human clinical trials indicates that tetra-cis-lycopene, the predominant form of lycopene in tangerine tomatoes, is 8.5x more bioavailable than its all-trans counterpart found in red tomatoes (Cooperstone et al., 2015a).

Therefore, it was hypothesized that tangerine tomatoes might impart additional benefits due to the abundance of more bioavailable carotenoids. However, additional health benefits from consuming tangerine instead of red tomatoes have not been observed, suggesting that compounds outside of the carotenoid pathway may be influencing health outcomes (Cooperstone et al., 2017). It is unclear to what extent lycopene or other non-carotenoid phytochemicals are responsible for this effect. In order to understand which phytochemicals might be plausibly bioactive, we aimed to determine which metabolites from tomato are deposited in liver tissue.

Additionally, we sought to understand how tomato consumption affects the liver transcriptome and metabolome in mice. Testing multiple types of tomatoes allowed us to determine if effects seen in the transcriptome and/or metabolome were specific to an individual variety or an effect from tomato in general. In our experiments, only wildtype C57BL/6 mice were used to avoid any confounding effects of prostate cancer development on the rest of the body. It is crucial that animals in different treatment groups are comparable so that the effects of tomato consumption can be isolated. Mice in this study were fed one of three diets over the course of six weeks and matured to a statistically similar weight (P<0.35) (Table 3.1). Mice were permitted to consume diets ad libitum. Palatability and overall compliance with diets were similar among treatments (Table 3.1).

At sacrifice (10 wk), liver masses were recorded, and analysis of variance determined that there was no effect due to diet (P<0.43). Previous studies have also shown that tomato supplementation does not significantly alter weight gain in rodents (Cooperstone et al., 2017; White et al., 1988).

Given that our control and tomato containing diets were macronutrient matched, differences seen in RNA-Seq or untargeted metabolomics datasets can be attributed to tomato phytochemicals.

Table 3.1 Mean ± standard deviation of mouse body and liver mass at sacrifice (10 wk).

Treatment    Body Mass (g)    Liver Mass (g)
Control      28.13±3.54       1.09±0.16
Red          27.21±4.08       1.20±0.22
Tangerine    25.91±3.27       1.15±0.19

ANOVA among treatments determined that the effect of diet type was non-significant for body mass (P<0.35) and liver mass (P<0.43).

3.4.2 Tomato consumption and type influence the transcriptome of mouse liver tissue

RNA-Seq was utilized to gain a broad perspective on the effects of tomato consumption and type on the transcriptional landscape of murine livers. Gene counts were log2 transformed and TMM normalized (Figure 3.1A). Multidimensional scaling (MDS) was then used to determine if lane effects were apparent (Figure 3.1B) and to visualize the data structure after merging (Figure 3.1C). One control mouse and one red tomato-fed mouse were determined to be outliers based on their relatively high variability in differentially expressed genes as compared to other samples in their group. This attribute was visible in their spatial relationship with other samples in the same treatment groups by MDS and these individuals were subsequently removed. For data quality control, design matrix dispersion was visualized (Figure 3.1D). The apparent lack of trend between average log counts per million and the biological coefficient of variation indicated that genewise negative binomial models were suitable for use in our study and were not biased at different magnitudes of expression (Yoon and Nam, 2017). After removing lowly expressed genes, 14,951 genes were suitable for differential expression analysis. Due to the small effects observed among treatments, we used an FDR cutoff of 0.1 to increase the probability of finding genes of interest that would otherwise have been excluded due to Type II error. This cutoff value is frequently used in RNA-Seq studies for the reasons previously outlined (McCabe et al., 2012; Pimentel et al., 2017; Sheng et al., 2017). An additional analysis was performed using the same steps as outlined previously, but with less stringent filtering criteria (reads must be above 0.23 counts per million and present in at least 6 samples), resulting in a total of 15,432 genes detected. Contrasts performed between red vs. control, tangerine vs. control, red vs. tangerine, and tomato (red and tangerine) vs. control resulted in either the same or fewer differentially expressed genes; the latter ostensibly due to increased statistical noise. Therefore, results presented below were generated by filtering genes to be present in at least 12 samples.
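To make the effect of the FDR cutoff concrete, the following minimal Python sketch adjusts a set of P-values and applies two thresholds. It assumes a Benjamini-Hochberg style adjustment; whether the RNA-Seq pipeline used exactly this adjustment is an assumption here, and the function name and P-values are hypothetical rather than taken from our data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values (Q-values) in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value to the smallest so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for five genes.
p_values = [0.001, 0.03, 0.04, 0.20, 0.60]
q_values = benjamini_hochberg(p_values)

passing_at_0_05 = sum(q <= 0.05 for q in q_values)
passing_at_0_10 = sum(q <= 0.10 for q in q_values)
```

With these illustrative values, one gene passes a cutoff of 0.05 while three pass a cutoff of 0.1, which is the kind of trade-off between Type I and Type II error that motivated the more permissive threshold.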

After passing our filtering parameters, 380 (2.54%) and 458 (3.06%) total genes were up and downregulated, respectively, in response to diet. Venn diagrams illustrating common and unique up (Figure 3.2A) and downregulated (Figure 3.2B) genes in different treatment groups indicated that there was little overlap in differentially expressed genes among the different treatment comparisons. The majority of differentially expressed genes were found when comparing transcriptional profiles of mice fed tangerine and red tomato enriched diets. This finding indicates that tomato type, ostensibly due to differences in phytochemical composition (because macronutrient composition is matched), is a contributor to gene expression alterations.

Figure 3.1 Boxplots of log2 transformed and TMM normalized RNA-Seq data (A); Multidimensional scaling analysis of RNA-Seq data before (B) and after (C) merging data generated on two sequencing lanes; and scatter plot of calculated dispersion of the design matrix used in the analysis of RNA-Seq data (D).

Figure 3.2 Venn diagrams displaying counts of overlapping upregulated (A) and downregulated (B) genes following differential expression analysis of 14,951 genes.

To contextualize any physiological and metabolic alterations occurring as a result of dietary difference, gene set enrichment analysis (GSEA) with the molecular signatures database (MSigDB) was leveraged. When classifying differentially expressed genes from comparing tangerine to control samples, gene sets related to linoleic acid metabolism, retinol metabolism, and xenobiotic and drug metabolism by cytochrome P450 enzymes were significantly enriched (Table 3.2). Although relatively few genes were differentially expressed in this comparison, the GSEA outcomes were attributed to the statistically significant upregulation of the cytochrome P450s CYP2C29 and CYP1A2 (Table 3.3).

These genes encode cytochrome P450 enzymes involved in Phase I metabolism and have been shown to be downregulated by high-fat diets and environmental pollutants (Hardesty et al., 2019), suppressed in fatty liver disease (Schuck et al., 2014), and overexpressed in mouse models of liver fibrosis (Collino et al., 2018). Given that expression of these genes is controlled by physiological and environmental factors, expression patterns seen in our study are likely due to tomato consumption. However, relatively small fold changes in gene expression make it difficult to speculate about clinical significance of these changes outside of a response to tomato derived xenobiotics present in liver tissue (Table 3.3).

When comparing gene expression profiles from mice fed red tomato enriched diets to those fed control diets, numerous gene sets were enriched including drug metabolism by cytochrome P450s (similar to tangerine vs. control), xenobiotic metabolism, endocytosis, bile acid metabolism, UV response, IL2 STAT5 signaling, glycolysis, heme metabolism, interferon gamma response, and glutathione metabolism (Table 3.2). These broad responses represent many metabolic pathways within the liver. IL2 and STAT5, for example, are important members of signaling cascades involved in liver cancer development (Gabeen et al., 2014; Rani and Murphy, 2016). Glutathione metabolism has long been implicated in Phase II detoxification metabolism and has also been shown to bioactivate xenobiotics (Dekant and Vamvakas, 1993; Liu et al., 2018).

Table 3.2 Gene set overlaps enriched in differentially expressed genes from mice fed tomato supplemented diets.

                                                            Tangerine vs. Control   Red vs. Control   Tangerine vs. Red
Number of Genes in Comparison                               3                       319               423
Gene Set Name                                               k^z                     k                 k
KEGG Linoleic Acid Metabolism                               2                       —^y               —
KEGG Retinol Metabolism                                     2                       —                 —
KEGG Metabolism of Xenobiotics by Cytochrome P450           2                       —                 —
KEGG Drug Metabolism by Cytochrome P450                     2                       5                 5
KEGG Endocytosis                                            —                       10                —
HALLMARK Xenobiotic Metabolism                              —                       10                16
HALLMARK Bile Acid Metabolism                               —                       7                 —
HALLMARK UV Response (Up)                                   —                       8                 7
HALLMARK IL2 STAT5 Signaling                                —                       8                 —
HALLMARK Glycolysis                                         —                       8                 8
HALLMARK Heme Metabolism                                    —                       8                 —
HALLMARK Interferon Gamma Response                          —                       8                 8
KEGG Glutathione Metabolism                                 —                       4                 —
KEGG Circadian Rhythm Mammal                                —                       —                 6
KEGG Alanine Aspartate and Glutamate Metabolism             —                       —                 5
HALLMARK KRAS Signaling (Down)                              —                       —                 10
KEGG Arginine and Proline Metabolism                        —                       —                 5
KEGG SNARE Interactions in Vesicular Transport              —                       —                 4
HALLMARK Coagulation                                        —                       —                 7
KEGG Endocytosis                                            —                       —                 8
KEGG Arrhythmogenic Right Ventricular Cardiomyopathy ARVC   —                       —                 5
KEGG Glycerophospholipid Metabolism                         —                       —                 5
KEGG Peroxisome                                             —                       —                 5
KEGG Selenoamino Acid Metabolism                            —                       —                 3
KEGG Tight Junction                                         —                       —                 6

^z Number of differentially expressed genes present in a given gene set
^y Gene set overlap not detected
Tomato vs. Control was omitted due to a lack of significant gene set overlaps

Table 3.3 Select genes related to xenobiotic metabolism and mammalian circadian rhythm that were differentially expressed (Q < 0.1) among treatment groups.

Treatment Comparison    Gene Symbol   Q-value^z   Log2 Fold Change   Fold Change^y
Tangerine vs. Control   CYP2C29       3.17E-3     0.58               1.49
                        CYP1A2        4.96E-3     0.52               1.44
Red vs. Control         CYP2C29       3.08E-7     0.88               1.84
                        CYP4F17       1.56E-2     0.57               1.49
                        EPHX1         8.05E-2     0.36               1.29
                        GCLC          6.03E-2     0.46               1.38
                        AOX1          2.90E-2     0.41               1.33
                        GSS           7.31E-2     0.40               1.32
                        CDA           8.45E-2     0.81               1.76
                        ABCC3         40.2E-2     0.58               1.49
                        ARG1          1.85E-2     -0.34              0.79
                        DHRS7         5.61E-2     -0.25              0.84
                        RAI14         4.07E-2     0.28               1.21
                        RARRES1       8.14E-2     0.36               1.29
Tangerine vs. Red       ACOX2         3.15E-2     -0.26              0.84
                        CYP1A2        2.35E-2     0.37               1.29
                        CYP2A5        4.63E-2     0.94               1.92
                        AOX1          5.13E-2     -0.34              0.79
                        GCLC          4.02E-2     -0.46              0.73
                        GCH1          1.73E-2     0.32               1.25
                        PAPSS2        6.91E-2     1.32               7.75
                        SERPINA6      9.96E-2     0.41               1.33
                        IGFBP4        9.17E-2     0.24               1.18
                        TMEM97        9.56E-2     0.33               1.26
                        PDK4          9.12E-2     -0.93              1.18
                        GSS           8.74E-2     -0.35              0.78
                        RAI14         8.84E-3     -0.35              0.79
                        RARRES1       9.19E-2     -0.32              0.80
                        CRY1          7.29E-2     -0.46              0.73
                        ARNTL         7.03E-2     -1.35              0.39
                        NPAS2         2.27E-2     -2.45              0.18
                        BHLHE41       7.65E-2     -0.46              0.73
                        PER2          8.45E-2     0.54               1.45
                        CLOCK         6.17E-2     -0.48              0.72
Tomato vs. Control      CYP2C29       2.86E-5     0.73               1.66
                        OSGIN1        4.69E-2     1.15               2.21
                        CYP2C53-ps    3.43E-5     1.22               2.33
                        DLC1          7.85E-5     0.31               1.24

^z FDR corrected (α=0.1) P-value
^y Log2 fold change converted to linear scale
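The relationship between the last two columns of Table 3.3 can be sketched in a few lines of Python. The function names and the tolerance are hypothetical choices for illustration; the only assumption about the table is that fold change is the log2 fold change raised as a power of two and rounded to two decimal places.

```python
def log2fc_to_fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


def is_consistent(log2_fc, reported_fc, tol=0.02):
    """Check whether a reported fold change matches its log2 fold change within a rounding tolerance."""
    return abs(log2fc_to_fold_change(log2_fc) - reported_fc) <= tol


# Values for CYP2C29 (tangerine vs. control and red vs. control) and NPAS2 (tangerine vs. red).
cyp2c29_tangerine = round(log2fc_to_fold_change(0.58), 2)
cyp2c29_red = round(log2fc_to_fold_change(0.88), 2)
npas2 = round(log2fc_to_fold_change(-2.45), 2)
```

A negative log2 fold change maps to a fold change between 0 and 1, so downregulated genes such as NPAS2 appear as fractions of the reference group's expression.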

The cytochrome P450s CYP2C29 and CYP4F17, as well as EPHX1, GCLC, AOX1, GSS, CDA, and ABCC3, were all significantly upregulated in liver tissue from mice fed red tomato enriched diets compared to controls (Table 3.3). Conversely, ARG1 and DHRS7 were both downregulated as a result of red tomato consumption. These genes all overlapped with the Hallmark xenobiotic metabolism gene set due to their broad involvement in various detoxification reactions related to Phase I or II metabolism (Lu et al., 2013; Schuck et al., 2014). GCLC and GSS, for example, work in tandem to produce glutathione, which can be conjugated to xenobiotic molecules to aid in their excretion (Franklin et al., 2009). While not classified under a specific gene set, expression of RAI14 and RARRES1 (retinoic acid induced and retinoic acid responder proteins) was significantly higher in mice fed red tomato supplemented diets compared to control. The increased expression of RAI14 and RARRES1 was unique to the red tomato group. RAI14 in particular is induced by retinoic acid (Kutty et al., 2001) and its increased expression has been associated with gastric and prostate cancer (He et al., 2018; Paez et al., 2016).

Mice fed diets containing tangerine tomatoes did not have elevated levels of RAI14 or RARRES1, perhaps due to a lack of provitamin A carotenoids that are normally present in red tomatoes (Isaacson et al., 2002). Regardless, changes in expression were subtle (0.28 and 0.36 Log2 fold change for RAI14 and RARRES1, respectively) and the clinical significance of this observation is unknown.

When comparing liver transcriptomes from mice that ate tangerine or red tomato supplemented diets, GSEA indicated an enrichment for genes associated with drug metabolism by cytochrome P450s, xenobiotic metabolism, glycolysis, interferon gamma response, circadian rhythm in mammals, alanine, aspartate, and glutamate metabolism, downregulation of genes related to KRAS signaling, arginine and proline metabolism, SNARE interactions in vesicular transport, coagulation, endocytosis, arrhythmogenic right ventricular cardiomyopathy (ARVC), glycerophospholipid metabolism, peroxisome functions, selenoamino acid metabolism, and tight junction transmembrane proteins (Table 3.2). The overlap found in our dataset with the ARVC (a heart-specific condition) gene set points to a pitfall of GSEA and similar analyses. Care needs to be taken when making inferences from these outputs as gene set categories may not be relevant to the tissue of interest. Many of the other overlaps found in this comparison point to alterations in xenobiotic metabolism and other prominent biological processes that occur in the liver.

The cytochrome P450s CYP1A2 and CYP2A5, as well as GCH1, PAPSS2, SERPINA6, IGFBP4, TMEM97, and PDK4, had higher expression in mice fed tangerine tomato enriched diets than those fed diets enriched with red tomatoes. CYP1A2 and CYP2A5 are part of enzyme families that process xenobiotic as well as endogenous compounds (Hoffman et al., 2001; Mori et al., 2007), while PAPSS2 encodes phase II metabolism enzymes responsible for sulfonation reactions (Nowell and Falany, 2006). By contrast, other phase I and II related genes such as ACOX2, AOX1, GCLC, GSS, and retinol metabolism related genes RAI14 and RARRES1 were all downregulated in this comparison (Table 3.3). RAI14 and RARRES1 were upregulated in mice consuming diets enriched with red tomatoes relative to control animals (Table 3.3) and we hypothesized that this is due to the presence of provitamin A carotenoids that are absent in tangerine tomatoes. The downregulation of RAI14 and RARRES1 observed in tangerine compared to red supports that hypothesis. Lastly, several genes related to mammalian circadian rhythm were differentially expressed including CRY1, ARNTL, NPAS2, BHLHE41, PER2, and CLOCK. Tomato consumption has been previously shown to alter canonical pathways related to circadian rhythm in BCO2-/- mice (Tan et al., 2014). More specifically, the expression of genes such as PER2, CLOCK, and CRY2 in BCO1-/-BCO2-/- mice has been reported to be altered by tomato consumption (Xia et al., 2018), which corroborates our findings. There may be implications for xenobiotic metabolism given that the expression of genes encoding Phase I and II enzymes is tightly regulated by circadian rhythm (Y. K. J. Zhang et al., 2009). Strangely, these genes were only found to be differentially expressed when comparing mice fed tangerine and red tomato enriched diets and not in any of the other tomato-based comparisons.

Given that diet type only modestly altered liver gene expression, we also approached our analysis by reclassifying our data to increase statistical power. Mice that consumed diets containing red or tangerine tomato were reclassified as one group (“tomato”). In this contrast, only 12 genes were differentially expressed compared to mice fed a control diet. Due to the low number of differentially expressed genes, GSEA was not able to find any significant gene set overlaps. Of the genes that were differentially expressed, two were related to xenobiotic metabolism (CYP2C29 and pseudogene CYP2C53-ps) while OSGIN1 and DLC1 have been shown to be downregulated or knocked out entirely during the development of hepatocellular carcinoma (HCC) (Jeng et al., 2015; Wong et al., 2003).

Previous studies in similar models have shown that tomato consumption and/or the consumption of lycopene have a marked effect on the development of diseases leading to and including HCC by way of altering gene expression signatures (Ip et al., 2014; Melendez-Martinez et al., 2013; Wang et al., 2010). However, these studies relied on PCR-based assays of a limited number of curated genes. RNA-Seq provides a more holistic snapshot of the transcriptome by measuring the totality of genes expressed in a sample given ample sequencing depth and read alignment. Although we observed a relatively small number of genes change in response to diet compared to other studies that focus on vastly different biological statuses (e.g. cancer/non-cancer), our outcome is reasonable given that we were measuring the effects of including or excluding a specific vegetable in the diet.

While each group had unique transcriptional signatures in our study, genes associated with mammalian xenobiotic metabolism tended to be differentially expressed in all comparisons to varying degrees. To better contextualize how tomato consumption affected the liver and xenobiotic metabolism, we conducted an untargeted metabolomics analysis in parallel.

3.4.3 The chemical landscape of mouse liver tissue is affected by tomato consumption and less so by type

Log2 transformed and Pareto scaled data and their distribution can be seen in Figure 3.3A, demonstrating their suitability for analysis. Principal components analysis was run on the dataset with quality control samples to visualize data structure (Figure 3.3B). Quality control samples were run every six injections throughout the batch, and tight clustering of these samples indicates that identical samples cluster together and that our data are not confounded by run order (Figure 3.3B). Distinct separation can be seen between control samples and those from mice fed diets supplemented with red and tangerine tomato (Figure 3.3B and Figure 3.3C). However, samples from mice consuming diets supplemented with red and tangerine tomato were not visually separated using PCA, speaking to the similarity of these two tomato types relative to their comparison to control.

When comparing mice fed diets supplemented with tomato to control, 119 features were determined to be statistically different by an FDR corrected t-test. On the other hand, 56 features were different when comparing mice fed diets supplemented with red or tangerine tomatoes. This outcome reinforces why control samples separated from tomato samples, but samples from mice that ate diets supplemented with either red or tangerine tomatoes were not as easily distinguished.

Figure 3.3 Boxplots of log2 transformed and Pareto scaled untargeted metabolomics data (including 7 QC samples on the right side) (A) and principal components analysis scores plots visualizing data structure from untargeted metabolomics (ESI+) data including quality control samples (B). Principal components analysis loading plots with samples colored by assigned k-means cluster groups for red, tangerine, and control (C), and tomato and control (D).

Data were analyzed both by comparison of control vs. red vs. tangerine fed animals, and by combining red and tangerine into one group representing tomato intervention and comparing features to control animals (control vs. tomato). A combination of univariate (t-tests and ANOVA) and multivariate (k-means clustering, PLS-DA, and random forests) statistical procedures was utilized to prioritize features of interest for identification. Univariate and multivariate procedures were used concurrently as a strategy to increase our confidence in features we detected as significant.

Partial least squares discriminant analysis, for example, is prone to overfitting and has been shown to be able to separate random data into groups (Kjeldahl and Bro, 2010; Westerhuis et al., 2008). By cross referencing our results from individual analyses, we were able to focus our attention on features that were consistently determined to be important.

We used k-means clustering as an unsupervised analysis to determine if we could delineate samples from different treatments. In Figure 3.3C and Figure 3.3D, PCA loading plots display samples colored by their assigned cluster. In Figure 3.3C, misclassification can be clearly observed. Out of 11 red and 12 tangerine tomato samples, only two tangerine tomato samples are correctly classified, indicating a high degree of similarity between samples from either tomato type. Recategorizing red and tangerine samples into “tomato” resulted in 100% correct classification (Figure 3.3D), highlighting more distinguishing features that differ between mice fed a control diet and those fed diets enriched with tomato.

The output from t-tests run on all features comparing tomato to control can be seen in the form of a volcano plot in Figure 3.4. The majority of differentially present phytochemicals, of which most were increasing (colored red), were likely derived from tomato. This outcome aligns with our hypothesis that tomato consumption will alter the chemical profile of the liver through the deposition of phytochemicals and their metabolites.

Figure 3.4 Volcano plot of features detected by untargeted metabolomics (ESI+) comparing mice fed diets enriched with tomatoes to control. Features with a -log10 FDR-adjusted P-value above 2 and a log2 fold change greater than 1 were colored red (i.e., higher in mice fed tomatoes). Features with a -log10 FDR-adjusted P-value above 2 and a log2 fold change less than -1 were colored green (i.e. higher in mice fed control diets).

Partial least squares discriminant analysis was used as a supervised approach to determine if individual mice could be classified by diet treatment, and to discover which features were most important in explaining this variation. Models were created for both red vs. tangerine vs. control and tomato vs. control, and performance statistics are enumerated in Table 3.4. Models performed best when red and tangerine samples were combined into one group (tomato vs. control). While PCA was unable to visually separate red from tangerine using two PCs, using PLS-DA we were able to generate good models (Table 3.4) and rank features that delineated treatment groups. Previously, t-tests run on red and tangerine samples indicated 56 features were significant between the two treatments (Figure 3.5), and PLS-DA models were able to leverage these variables to classify red and tangerine samples. Random forests were used as an additional supervised learning technique to prioritize features that differentiated our treatment groups. Random forest model tuning parameters are displayed in Figure 3.6 and were manipulated to maximize prediction accuracy on a test data set. Variable importance scores were assigned to features in all PLS-DA and random forest models and the top 20 features for each model tended to be similar. Similarity among supervised analyses and univariate statistics provides additional confidence that these features are truly relevant.

Table 3.4 Partial least squares discriminant analysis model performance statistics.

Treatment Comparison              Components Used^z   R2Y     P R2Y   Q2      P Q2    RMSEE
Red vs. Tangerine vs. Control     5                   0.981   0.01    0.808   0.01    0.072
Tomato vs. Control                3                   0.989   0.01    0.907   0.01    0.052

^z Number of components used automatically estimated by “ropls” package

Figure 3.5 Venn diagram displaying counts of significantly different features overlapping between treatment groups following the analysis of 2,492 detected features.

Figure 3.6 Random forest model tuning parameters used for untargeted metabolomics (ESI+) feature selection. Model error was estimated as a function of number of trees (A) and number of variables randomly selected at each node (B) when classifying red, tangerine, and control samples. The same parameters were tested for a random forest classification model with tomato and control samples (C and D).

Variable importance scores from PLS-DA models comparing red vs. tangerine vs. control were calculated and features whose VIP score exceeded 1.0 (Chong and Jun, 2005; Lazraq et al., 2003; Sun et al., 2012) were used to create a heatmap (Figure 3.7). A large section at the bottom left corner of the heatmap contains 68 features distinctly absent in control but present in either red or tangerine samples. Of these features, 61 (or almost 90%) are believed to be metabolites of tomato steroidal alkaloids (Table 3.5). Identities were confirmed based on accurate mass matching, database searches (https://hmdb.ca), and MS/MS fragmentation experiments. Tomatidine and dehydrotomatidine were confirmed by MS/MS spectra and retention time with authentic standards (Level 1 according to the Metabolomics Standards Initiative) (Sumner et al., 2007). Other alkaloids include level 2 and 3 tentatively identified desaturated, hydroxylated, sulfonated, and glucuronidated forms that could be products of many of the Phase I and II enzymes encoded by genes that were differentially expressed in our study (e.g. CYP2C29, PAPSS2). Common fragments enumerated in Table 3.5 match those previously reported for steroidal alkaloids and their metabolites (Caprioli et al., 2015; Cichon et al., 2017b; Dzakovich et al., 2020; Hövelmann et al., 2019). In many cases, multiple peaks were present for the same m/z, indicating a series of isomers with functional groups at different locations (Table 3.5). While the 273 and 255 or 271 and 253 fragments have been previously reported for saturated and desaturated steroidal alkaloids, respectively (Dzakovich et al., 2020; Sonawane et al., 2018), we also observed masses such as 269 and 251. These masses represent fragments with an additional desaturation, ostensibly from the loss of a hydroxyl group. However, there is not always a perfect relationship between desaturation and the presence of the 271 and 253 fragments, as previously demonstrated in hydroxytomatine (Caprioli et al., 2015). The additional desaturation was also observed in fragments derived from the E and F rings such as 157 and 124. Example spectra for dihydroxytomatidine and sulfated hydroxytomatidine can be found in Figure 3.8. Positions of double bonds and functional groups were provisionally assigned based on previous reports (Caprioli et al., 2015; Cichon et al., 2017b). The presence of multiple isomers for each alkaloid and unique fragmentation patterns for each isomer indicates that some enzymes involved in Phase I metabolism, such as cytochrome P450s, are able to interact with multiple parts of the alkaloid molecule (Guengerich, 2018).
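Accurate mass matching of the kind described above reduces to a parts-per-million comparison between a theoretical and an observed m/z. The Python sketch below is illustrative only: the function names are hypothetical, and the sign convention (theoretical minus observed) and the hydrogen mass of 1.0078 added to the monoisotopic mass are assumptions chosen because they reproduce the mass errors listed for tomatidine in Table 3.5. The 10 ppm default mirrors the mass tolerance stated in the methods.

```python
HYDROGEN_MASS = 1.0078  # assumed mass added to the monoisotopic mass to obtain [M+H]


def ppm_error(theoretical_mz, observed_mz):
    """Mass error in parts per million, signed as theoretical minus observed."""
    return (theoretical_mz - observed_mz) / theoretical_mz * 1e6


def within_tolerance(theoretical_mz, observed_mz, tol_ppm=10.0):
    """Return True when the absolute mass error does not exceed the tolerance."""
    return abs(ppm_error(theoretical_mz, observed_mz)) <= tol_ppm


# Tomatidine (C27H45NO2): monoisotopic mass 415.3450, two observed [M+H] values from Table 3.5.
tomatidine_mh = 415.3450 + HYDROGEN_MASS
error_peak_1 = round(ppm_error(tomatidine_mh, 416.3529), 2)
error_peak_2 = round(ppm_error(tomatidine_mh, 416.3536), 2)
```

Under these assumptions both tomatidine peaks fall well within a 10 ppm window, whereas an ion differing by even 0.01 m/z at this mass would be rejected.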

Figure 3.7 Heatmap of features detected by untargeted metabolomics (ESI+) with variable importance scores > 1.0 generated by a PLS-DA model. Hierarchical clustering was used to horizontally categorize samples and vertically group features using Euclidean distances and Ward’s linkage method.

Steroidal alkaloids have been previously reported in plasma (Cichon et al., 2017b) and skin (Cooperstone et al., 2017) of mice that consumed tomato enriched diets; however, the results here considerably expand this list of metabolites. More recently, metabolites of some tomato steroidal alkaloids have been detected in human urine (Hövelmann et al., 2019). Although previously deemed unabsorbable since steroidal (glyco)alkaloids can bind to dietary cholesterol and be excreted (Friedman, 2002), both in vitro and in vivo studies have demonstrated a variety of bioactive roles these compounds play (Choi et al., 2012; Dyle et al., 2014; Ebert et al., 2015; Lee et al., 2004). In our study, we have tentatively identified 17 masses (with level 2 or 3 confidence) and positively identified two masses of metabolites derived from tomato steroidal alkaloids, indicating that they are absorbed and then metabolized in a variety of ways (Table 3.5). Given that these features were unique to tomato-fed animals, some might be excellent candidates for biomarkers of tomato consumption. The contribution of steroidal glycoalkaloids to the bioactivity of tomatoes against chronic diseases in humans warrants further investigation.

Here, we found that tomato consumption is able to subtly alter the transcriptional signature of liver tissue and that these alterations are unique based on the type of tomato. Genes related to Phase I and II xenobiotic metabolism were modestly altered by tomato consumption in all comparisons. While red vs. tangerine had the highest number of differentially expressed genes, an untargeted metabolomics analysis revealed that only 56 features were different between these two groups, whereas 119 features were different between the tomato groups and control, highlighting a disconnect between these two omics approaches. Differences in chemical profiles did not necessarily guarantee the biggest differences in transcriptional signatures.

Additionally, we confirmed the identity of two steroidal alkaloids and tentatively identified 17 masses of Phase I and II steroidal alkaloid metabolites. This finding greatly expands upon reports of steroidal alkaloids found in mouse skin and plasma where only a few chemical species have been tentatively identified. While steroidal alkaloids were once thought not to be absorbed from the diet, we have demonstrated that they are indeed absorbed and metabolized. Given the reported bioactivity of steroidal alkaloids, their presence in the liver indicates that they may contribute to the health benefits associated with tomato consumption. More research is needed to determine the role (if any) steroidal glycoalkaloids play in human health.


Table 3.5 Tomato steroidal alkaloid-derived metabolites present in liver tissue from tomato-fed (red and tangerine) mice that were absent in control animals (Continued).

| Tentative Identification | Molecular Formula | Retention Time (min) | Monoisotopic Mass | Observed Mass [M+H] | Mass Error (Δ ppm) | Distinguishing MS/MS Fragments | Identification Level^z |
|---|---|---|---|---|---|---|---|
| Didehydrotomatidine | C27H41NO2 | 5.912 | 411.3137 | 412.3216 | -0.24 | 394.3157, 271.2053, 161.1325, 124.1123, 112.0760 | 2 |
| | | 6.079 | | 412.3213 | 0.49 | 394.3082, 269.1908, 161.1321, 124.1105, 112.0752 | 2 |
| | | 6.33 | | 412.3208 | 1.70 | 394.3116, 269.1893, 126.1278, 124.1126 | 2 |
| | | 6.623 | | 412.3207 | 1.94 | 394.3074, 269.1886, 255.2088, 161.1322, 126.1241 | 2 |
| | | 7.092 | | 412.3219 | -0.97 | 394.3122, 269.1909, 159.1129, 126.1278, 124.1125 | 2 |
| Dehydrotomatidine | C27H43NO2 | 5.474 | 413.3294 | 414.3366 | 1.45 | 396.2886, 273.2223, 255.2135, 161.1321, 124.1134 | 2 |
| | | 6.06 | | 414.3372 | 0.00 | 396.3253, 273.2205, 255.2107, 253.1953, 124.1127 | 2 |
| | | 6.604 | | 414.3372 | 0.00 | 396.3264, 271.2069, 253.1956, 161.1325, 126.1280 | 2 |
| | | 6.855 | | 414.3352 | 4.83 | 396.3234, 271.2044, 253.1944, 161.1318, 126.1283 | 2 |
| | | 7.189 | | 414.3369 | 0.72 | 396.3274, 271.2059, 253.1958, 161.1327, 126.1295 | 1 |
| Tomatidine | C27H45NO2 | 6.223 | 415.3450 | 416.3529 | -0.24 | 398.3423, 273.2219, 255.2112, 161.1329, 126.0986 | 1 |
| | | 6.516 | | 416.3536 | -1.92 | 398.3429, 273.2214, 255.2120, 161.1335, 126.1271 | 2 |
| Didehydrohydroxytomatidine | C27H41NO3 | 5.182 | 427.3086 | 428.3158 | 1.40 | 410.3040, 271.2062, 253.1940, 161.1330, 126.1268 | 2 |
| | | 5.266 | | 428.3167 | -0.70 | 410.3058, 287.2003, 273.1349, 253.1902, 161.1329 | 2 |
| | | 5.391 | | 428.3152 | 2.80 | 410.3093, 287.1992, 273.2251, 271.2043, 253.1952, 161.1320, 128.0700 | 2 |
| | | 5.475 | | 428.3167 | -0.70 | 410.3052, 381.2842, 287.2003, 271.2061, 253.1945, 161.1331, 128.0707, 116.0704 | 2 |
| | | 5.767 | | 428.3164 | 0.00 | 410.3050, 287.2014, 271.2058, 269.1900, 251.1805, 161.1327, 126.1280, 124.1122 | 2 |
| | | 5.851 | | 428.3158 | 1.40 | 410.3044, 269.1885, 253.2011, 162.1278, 124.1104 | 2 |
| | | 5.976 | | 428.3166 | -0.47 | 410.3047, 273.2165, 267.1742, 253.1941, 161.1322 | 2 |
| Dehydrohydroxytomatine | C27H43NO3 | 4.847 | 429.3243 | 430.3318 | 0.70 | 412.3257, 289.2182, 271.2064, 253.0629, 161.1327, 124.1137 | 2 |
| | | 5.056 | | 430.3324 | -0.70 | 412.3193, 283.2360, 273.2206, 255.2106, 161.1317, 128.0703 | 2 |
| | | 5.265 | | 430.332 | 0.23 | 412.3213, 285.1847, 269.1899, 251.1794, 159.1159, 126.1276 | 2 |
| | | 5.391 | | 430.3318 | 0.70 | 412.3194, 273.2206, 269.1892, 255.2108, 161.1314, 156.1024, 128.0699 | 2 |
| | | 5.767 | | 430.3326 | -1.16 | 412.3212, 287.2014, 269.1895, 251.1805, 161.1329, 126.1280 | 2 |
| | | 5.683 | | 430.3312 | 2.09 | 412.3208, 383.2933, 273.2219, 271.2055, 253.1935, 161.1322, 124.1112 | 2 |
| | | 5.892 | | 430.3317 | 0.93 | 412.3187, 271.2061, 253.1993, 161.1336, 124.1100 | 2 |
| | | 6.018 | | 430.3315 | 1.39 | 412.3209, 383.2971, 287.2016, 271.2063, 253.1949, 161.1328, 159.1174, 124.1125 | 2 |
| Hydroxytomatidine | C27H45NO3 | 5.057 | 431.3399 | 432.3473 | 0.93 | 414.2974, 289.2159, 271.2051, 253.1954, 161.1362 | 2 |
| | | 5.183 | | 432.3481 | -0.93 | 414.3370, 289.2133, 271.2016, 253.1952 | 2 |
| | | 5.225 | | 432.3491 | -3.24 | 414.3342, 289.2191, 273.2206, 271.2055, 253.1944, 161.1322, 124.1129 | 2 |
| | | 5.434 | | 432.3471 | 1.39 | 414.3351, 289.2144, 271.2059, 253.1951, 161.1316, 124.1121 | 2 |
| | | 5.643 | | 432.3476 | 0.23 | 414.3364, 289.2183, 271.0619, 253.1963, 162.1267, 124.1086 | 2 |
| | | 5.726 | | 432.3481 | -0.93 | 414.3342, 289.8590, 273.2197, 255.2142, 161.1344, 159.1171, 124.1130 | 2 |
| | | 5.894 | | 432.347 | 1.62 | 414.3339, 289.2166, 273.2196, 253.1955, 161.1323 | 2 |
| | | 6.061 | | 432.348 | -0.69 | 414.3374, 271.8424, 253.1957, 162.1281, 126.1292, 124.1101 | 2 |
| | | 6.228 | | 432.348 | -0.69 | 414.3360, 255.2095, 251.1796, 161.1321, 126.1251 | 2 |
| | | 6.312 | | 432.3465 | 2.78 | 414.3375, 255.2136, 251.1761, 161.1337, 159.1169, 126.1281, 124.1129 | 2 |
| Tridehydrohydroxytomatidine | C27H39NO4 | 6.604 | 441.2879 | 442.2949 | 1.81 | 424.3126, 289.1929, 267.1182, 253.1974, 161.1321, 159.1151, 157.1020 | 2 |
| | | 6.771 | | 442.2965 | -1.81 | 424.3288, 289.2167, 271.2054, 253.1941, 157.1026 | 2 |
| | | 6.897 | | 442.2952 | 1.13 | 424.3831, 289.2153, 271.2056, 253.1955, 159.1151, 157.1020 | 2 |
| Didehydrohydroxytomatidine | C27H41NO4 | 5.3 | 443.3036 | 444.3102 | 0.68 | | 3 |
| | | 4.51 | | 444.3107 | -0.45 | | 3 |
| | | 5.03 | | 444.3108 | -0.68 | | 3 |
| Dehydrodihydroxytomatidine | C27H43NO4 | 4.258 | 445.3192 | 446.3273 | -0.67 | 428.3205, 289.2156, 253.1958, 161.1332, 157.0996, 128.0707 | 2 |
| | | 4.551 | | 446.3257 | 2.91 | 428.3150, 271.2049, 128.0688 | 2 |
| | | 4.76 | | 446.326 | 2.24 | 428.3148, 287.2012, 269.1891, 251.1771 | 2 |
| | | 4.885 | | 446.3279 | -2.02 | 428.3157, 287.1998, 269.1903, 255.1137, 128.0711 | 2 |
| | | 5.052 | | 446.3283 | -2.91 | 428.3171, 289.2133, 287.2058, 271.2036, 269.1903 | 2 |
| | | 5.136 | | 446.3273 | -0.67 | 428.3153, 287.2010, 271.2053, 251.1794, 162.1267, 161.1314 | 2 |
| | | 5.345 | | 446.3259 | 2.46 | 428.3158, 287.2046, 271.2062, 253.1932, 161.1343, 124.1092 | 2 |
| Dihydroxytomatidine or Esculeogenin B | C27H45NO4 | 4.347 | 447.3348 | 448.3423 | 0.67 | 430.3316, 287.2013, 269.1903, 162.1265, 126.1284, 124.1092 | 2 |
| | | 4.639 | | 448.3419 | 1.56 | 430.3332, 412.3235, 289.2135, 271.2012, 253.1916, 157.0994, 162.1265, 126.1253 | 2 |
| | | 4.807 | | 448.3415 | 2.45 | 430.3315, 412.3180, 287.2002, 159.1174, 124.1104 | 2 |
| | | 4.932 | | 448.3426 | 0.00 | 430.3317, 412.3177, 383.2954, 289.2135, 287.2002, 273.2214, 271.2051, 269.1875, 255.2095, 126.1268 | 2 |
| | | 5.141 | | 448.3432 | -1.34 | 430.3315, 383.2940, 273.2199, 255.2126, 253.1896, 251.1796, 161.1323, 156.1020 | 2 |
| | | 5.308 | | 448.3435 | -2.01 | 430.3306, 287.202, 271.1150, 253.2007, 251.1795, 162.1237, 126.1278 | 2 |
| | | 5.434 | | 448.3423 | 0.67 | 430.3319, 412.3163, 383.2908, 273.2215, 255.2090, 161.1331, 156.0977, 128.0703 | 2 |
| | | 5.517 | | 448.342 | 1.34 | 430.3325, 273.2211, 255.2101, 161.1320, 156.1010, 128.0706 | 2 |
| | | 5.643 | | 448.3416 | 2.23 | 430.3338, 383.2903, 289.2134, 273.2203, 271.2114, 253.0177, 161.1327, 128.0725 | 2 |
| | | 5.768 | | 448.3421 | 1.12 | 430.3319, 412.3208, 383.2950, 273.2211, 255.2106, 161.1321, 156.1012, 128.0699 | 2 |
| Trihydroxydehydrotomatidine | C27H43NO5 | 3.76 | 461.3141 | 462.3213 | 1.30 | | 3 |
| Trihydroxytomatidine | C27H45NO5 | 4.221 | 463.3298 | 464.3396 | -4.31 | 446.3268, 428.3145, 289.2185, 271.2035, 253.1955, 156.1025, 128.0707 | 2 |
| | | 4.388 | | 464.3373 | 0.65 | 446.3261, 428.3156, 289.2155, 271.2063, 253.1935, 156.1015 | 2 |
| | | 4.555 | | 464.3409 | -7.11 | 446.3274, 428.3163, 289.2165, 273.1262, 253.1962, 161.1330, 156.1004 | 2 |
| | | 4.722 | | 464.3384 | -1.72 | 446.3274, 428.3259, 289.2150, 271.2055, 253.1957 | 2 |
| | | 4.89 | | 464.3351 | 5.38 | 446.3267, 428.3201, 289.2132, 271.2062, 140.1060 | 2 |
| | | 5.015 | | 464.3374 | 0.43 | 446.3279, 271.2033, 253.1968, 161.1321 | 2 |
| | | 5.099 | | 464.3363 | 2.80 | 446.3265, 289.2156, 271.2057, 255.2080, 253.1941, 161.1326, 128.0180 | 2 |
| Acetoxytomatidine | C29H47NO4 | 4.75 | 473.3505 | 474.3582 | 0.21 | | 3 |
| Tetrahydroxytomatidine | C27H45NO6 | 3.33 | 479.3247 | 480.3321 | 0.00 | | 3 |
| Acetoxyhydroxytomatidine | C29H47NO5 | 5.62 | 489.3454 | 490.3529 | 0.00 | | 3 |
| Sulfated hydroxytomatidine | C27H45NO5S | 5.52 | 495.3018 | 496.3099 | -0.60 | 416.3528, 298.3426, 273.2207, 255.2117, 161.1339, 126.1281 | 2 |
| Glucuronidated dehydrotomatidine | C33H51NO7 | 6.059 | 573.3666 | 574.3736 | 1.39 | 422.6832, 273.2215, 255.1923, 244.1200, 161.1328 | 2 |
| Glucuronidated hydroxytomatidine | C33H53NO9 | 5.139 | 607.3720 | 608.379 | 0.66 | 432.3437, 289.2188, 271.2056, 253.1944, 161.1318, 126.1279 | 2 |
| Sulfated glucuronidated dehydrotomatidine | C33H51NO10S | 4.972 | 653.3234 | 654.331 | 0.31 | 574.3741, 556.3635, 274.1166, 273.2192, 269.1901, 255.2149 | 2 |

^z Identification level based on the Metabolomics Standards Initiative (Sumner et al., 2007).
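The mass errors reported in Table 3.5 can be reproduced with a short calculation. The sketch below is illustrative only and rests on two assumptions inferred from the tabulated values rather than stated in the text: that the error is computed as theoretical minus observed, and that the theoretical [M+H] is the monoisotopic mass plus approximately 1.0078. The function names are hypothetical.

```python
# Assumed mass added to the neutral monoisotopic mass to obtain [M+H].
# This value is inferred from the tabulated numbers, not stated in the text.
H_MASS = 1.0078


def theoretical_mh(monoisotopic_mass: float) -> float:
    """Theoretical [M+H] m/z under the assumption described above."""
    return monoisotopic_mass + H_MASS


def ppm_error(theoretical_mz: float, observed_mz: float) -> float:
    """Signed mass error in ppm, taken as theoretical minus observed."""
    return (theoretical_mz - observed_mz) / theoretical_mz * 1e6


if __name__ == "__main__":
    # Didehydrotomatidine (second isomer) and tomatidine (first isomer).
    for mono, observed in [(411.3137, 412.3213), (415.3450, 416.3529)]:
        print(round(ppm_error(theoretical_mh(mono), observed), 2))
```

Errors within a few ppm, as seen for most entries, are what would be expected for a well-calibrated QTOF instrument.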


Figure 3.8 Spectra of dihydroxytomatidine (A) and sulfated hydroxytomatidine (B) derived from MSMS experiments conducted on a UHPLC-QTOF/MS using a collision energy of 30 eV. Double bonds and functional groups are provisionally assigned.


Chapter 4. A High-Throughput Extraction and Analysis Method for Steroidal Glycoalkaloids in Tomato

Michael P Dzakovich1, Jordan L. Hartman1, Jessica L Cooperstone1,2

Affiliations 1The Ohio State University, Department of Horticulture and Crop Science, 2001 Fyffe Court, Columbus, OH 43210. 2The Ohio State University, Department of Food Science and Technology, 2015 Fyffe Court, Columbus, OH 43210.

Available at: doi:10.3389/fpls.2020.00767

4.1 Abstract

Tomato steroidal glycoalkaloids (tSGAs) are a class of cholesterol-derived metabolites uniquely produced by the tomato clade. These compounds provide protection against biotic stress due to their fungicidal and insecticidal properties. Although commonly reported as being anti-nutritional, both in vitro and pre-clinical animal studies have indicated that some tSGAs may have a beneficial impact on human health. However, the paucity of quantitative extraction and analysis methods presents a major obstacle for determining the biological and nutritional functions of tSGAs. To address this problem, we developed and validated the first comprehensive extraction and ultra-high-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) quantification method for tSGAs. Our extraction method allows for up to 16 samples to be extracted simultaneously in 20 minutes with 93.0 ± 6.8% and 100.8 ± 13.1% recovery rates for tomatidine and alpha-tomatine, respectively. Our UHPLC-MS/MS method was able to chromatographically separate analytes derived from 18 tSGAs representing 9 different tSGA masses, as well as two internal standards, in 13 minutes. Tomato steroidal glycoalkaloids that did not have available standards were annotated using high resolution mass spectrometry as well as product ion scans that provided fragmentation data. Lastly, we utilized our method to survey a variety of commonly consumed tomato-based products. Total tSGA concentrations ranged from 0.7 to 3.4 mg/serving and represent some of the first reported tSGA concentrations in tomato-based products. Our validation studies indicate that our method is sensitive, robust, and able to be used for a variety of applications where concentrations of biologically relevant tSGAs need to be quantified.

4.2 Introduction

Solanaceous plants produce a spectrum of cholesterol-derived compounds called steroidal glycoalkaloids. While each solanaceous clade produces its own unique assortment of steroidal glycoalkaloids, these metabolites share commonality in their role as phytoanticipins and anti-herbivory agents (Etalo et al., 2015; Fontaine et al., 1948; Irving et al., 1945; Ökmen et al., 2013). Tomato (Solanum lycopersicum and close relatives) is no exception, and over 100 tomato steroidal glycoalkaloids (tSGAs, Figure 4.1) have been suggested. Although these compounds are typically reported as anti-nutritional (Ballester et al., 2016; Cárdenas et al., 2015; Itkin et al., 2013), other studies suggest a health-promoting role. In fact, emerging evidence suggests that some tSGAs may play a role in positive health outcomes associated with tomato consumption (Cayen, 1971; Choi et al., 2012; Cooperstone et al., 2017; Lee et al., 2004). While these compounds continue to be evaluated both in planta and in vivo, there is a lack of quantitative and validated methods to extract and measure tSGAs from tomatoes, a critical need for additional research in this area.


Figure 4.1 Structural and isomeric variation in selected tomato steroidal alkaloids. Steroidal glycoalkaloids found in tomato (tSGAs) are spirosolane-type saponins with variations in a singular double-bond (C5:6), F-ring decorations (C22-C27), F-ring rearrangement (resulting in a change in stereochemistry at C22), and C3 glycosylation (typically a four-sugar tetrasaccharide, lycotetraose). The undecorated steroidal alkaloid (SA) backbone is shown first with relevant carbons numbered and ring names (A-F). Steroidal alkaloids were grouped based on structural similarity with bonds of varying stereochemistry denoted by wavy bonds and varying C5:6 saturation status denoted by a dashed bond. Structural variation, along with the monoisotopic mass, molecular formula, and common name are displayed alongside structures for each group. R-groups were used to denote status of C3 glycosylation (R1 and R2) and possible positions of glucosylation on glucosylated (dehydro)acetoxytomatine (R3, R4). Not all possible isomers and derivatives are shown, only those quantitated in this method.


Tomato steroidal glycoalkaloids are typically extracted by grinding individual samples using a mortar and pestle or blender and then solubilizing analytes with polar solvent systems, typically methanol. This approach is time consuming because each sample is handled individually. Additionally, this technique has been used for relative profiling, and has not been evaluated for its ability to extract tSGAs quantitatively. Tomato steroidal glycoalkaloids such as alpha-tomatine have previously been quantified using gas and liquid chromatography (Kozukue and Friedman, 2003; Lawson et al., 1992; Rick et al., 1994), as well as a number of bioassays including cellular agglutination (Schlösser and Gottlieb, 1966) and radioligand assays using radioactive cholesterol (E.A. Eltayeb and Roddick, 1984). These methods are unreliable, suffer from poor sensitivity, have poor selectivity for different alkaloids, and are time consuming. Recent advances in analytical chemistry have enabled researchers to discover other tSGA species in tomato fruits using high resolution mass spectrometry (Iijima et al., 2008, 2013; Zhu et al., 2018); however, these methods are qualitative. A small number of quantitative methods using mass spectrometry have been developed, but only for individual or a few tSGAs (Caprioli et al., 2014; Baldina et al., 2016). Thus, there is a need to develop validated extraction and quantification methods in order to continue to study the role these compounds have in both plant and human health.

To address the lack of suitable approaches to extract and quantify tSGAs, we developed and validated a high-throughput extraction and ultra-high-performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) method suitable for tomato and tomato-based products. Our extraction method is able to process 16 samples in parallel in 20 minutes (1.25 min/sample) and our UHPLC-MS/MS method can chromatographically separate, detect, and quantify 18 tSGAs (using two external and two internal standards) representing 9 different tSGA masses (Figure 4.2) in 13 minutes per sample. This is the first comprehensive targeted method to quantify a broad panel of tSGAs. Here, we present the experiments used to develop and validate our method as well as an application providing baseline information of tSGA concentrations in tomatoes and commonly consumed tomato products.


Figure 4.2 Chromatogram of tSGAs found in red ripe tomatoes measured by our UHPLC-MS/MS method. Peaks are identified as follows: 1a–c: Esculeoside B and isomers; 2a–d: Hydroxytomatine; 3: Dehydrolycoperoside F, G, or Dehydroesculeoside A; 4a,b: Lycoperoside F, G, or Esculeoside A; 5a–c: Acetoxytomatine; 6a,b: Dehydrotomatine; 7: Alpha-tomatine; 8: Alpha-solanine; 9: Solanidine; 10: Tomatidine; 11: Dehydrotomatidine.


4.3 Materials and Methods

4.3.1 Chemicals and reagents

Acetonitrile (LC-MS grade), formic acid (LC-MS grade), isopropanol (LC-MS grade), methanol (HPLC grade), and water (LC-MS grade) were purchased from Fisher Scientific (Pittsburgh, PA). Alpha-tomatine (≥90% purity) and solanidine (≥99% purity) were purchased from Extrasynthese (Genay, France). Alpha-solanine (≥95% purity) and tomatidine (≥95% purity) were purchased from Sigma Aldrich (St. Louis, MO). Stock solutions were prepared by weighing each analyte into glass vials and dissolving into methanol prior to storage at -80 °C. Standard curves were prepared by mixing 15 nmol of alpha-tomatine and 1 nmol of tomatidine in methanol. The solution was evaporated to dryness under a stream of ultra-high purity (5.0 grade) nitrogen gas. The dried residue was then resuspended in 900 µL of methanol, briefly sonicated (~5 s), and then diluted with an additional 900 µL of water. An 8-point dilution series was then prepared, and analyte concentrations ranged from 3.81 pmol/mL to 8.34 nmol/mL (11.14 femtomoles to 25 picomoles injected).
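The concentration range of the dilution series follows from the amounts and volumes given above. The sketch below assumes a threefold serial dilution, which is not stated in the text but is consistent with the reported alpha-tomatine endpoints over eight points (the top concentration divided by 3 to the 7th power is approximately 3.81 pmol/mL). The function name and the fold value are illustrative assumptions.

```python
def dilution_series(amount_nmol, volume_ml, points=8, fold=3.0):
    """Concentrations (nmol/mL) of a serial dilution, highest first.

    `fold` is an assumed dilution factor inferred from the reported range.
    """
    top = amount_nmol / volume_ml
    return [top / fold ** i for i in range(points)]


if __name__ == "__main__":
    # 15 nmol alpha-tomatine and 1 nmol tomatidine in 1.8 mL
    # (900 uL methanol + 900 uL water).
    tomatine = dilution_series(15.0, 1.8)
    tomatidine = dilution_series(1.0, 1.8)
    for name, series in (("alpha-tomatine", tomatine), ("tomatidine", tomatidine)):
        pretty = ", ".join(f"{c * 1000:.3g}" for c in series)
        print(f"{name} (pmol/mL): {pretty}")
```

Under the same assumption, the lowest tomatidine calibrant works out to roughly 0.254 pmol/mL.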

To utilize alpha-solanine and solanidine as internal standards (IS), 1.25 nmol and 22.68 pmol of alpha-solanine and solanidine, respectively, were spiked into each vial of the alpha-tomatine/tomatidine external standard dilution series described above. The spike amounts of alpha-solanine and solanidine were determined by calculating the amount needed to achieve the target peak areas of tSGAs typically seen in tomato samples.

4.3.2 Sample material

For UHPLC-MS/MS and UHPLC-quadrupole time-of-flight mass spectrometry (UHPLC-QTOF-MS) method development experiments, 36 unique accessions of tomato including Solanum lycopersicum, Solanum lycopersicum var. cerasiforme, and Solanum pimpinellifolium were combined and pureed to create a tomato reference material expected to span the diversity of tSGAs reported in nature. For spike-in recovery experiments, red-ripe processing-type tomatoes (OH8245; courtesy of David M. Francis) were diced, mixed together by hand, and stored at -20 °C until analysis. Items used for the tomato product survey were purchased from supermarkets in Columbus, OH in July 2019. Three unique brands of tomato paste, tomato juice, diced tomatoes, whole peeled tomatoes, ketchup, pasta sauce, and tomato soup were analyzed for tSGAs. Additionally, four heirloom, two fresh-market, one processing, and one cherry variety of unprocessed tomatoes were also analyzed.

4.3.3 Extraction of tSGAs

Five grams of diced OH8245 tomato (± 0.05 g) were weighed in 50 mL falcon tubes. Five grams was selected to balance between sampling enough tissue to allow homogeneity in sampling, and to keep extraction volumes contained to a 50 mL tube. Two 3/8” x 7/8” angled ceramic cutting stones (W.W. Grainger: Lake Forest, IL; Item no.: 5UJX2) were placed on top of the tomato sample and 100 µL of internal standard was added, followed by 15 mL of methanol. Samples were then extracted for 5 minutes at 1400 RPM using a Geno/Grinder 2010 (SPEX Sample Prep: Metuchen, NJ). Sample tubes were immediately centrifuged for 5 minutes at 3000 x g and 4 °C. Two mL aliquots of supernatant from each sample were then transferred to glass vials and diluted with 1 mL of water. Samples were then filtered into LC vials using a 0.22 µm nylon filter (CELLTREAT Scientific Products: Pepperell, MA).

Tomato products sourced from grocery stores were extracted as described above except fresh fruits of each type were blended in a coffee grinder prior to extraction. To account for differences in water content among the tomato products, 500 µL aliquots from each sample were dried down under nitrogen gas, re-dissolved in 1.5 mL of 50% methanol, and filtered using a 0.22 µm filter prior to analysis.

4.3.4 UHPLC-MS/MS Quantification of tSGAs

Tomato steroidal glycoalkaloids were chromatographically separated on a Waters (Milford, MA) Acquity UHPLC H-Class System using a Waters C18 Acquity bridged ethylene hybrid (BEH) 2.1 x 100 mm, 1.7 µm particle size column maintained at 40 °C. The autosampler compartment was maintained at 20 °C. A gradient method with Solvent A (water + 0.1% (v/v) formic acid) and Solvent B (acetonitrile + 0.1% (v/v) formic acid) at a flow rate of 0.4 mL/min was utilized as follows: 95% A for 0.25 minutes, 95% A to 80% A for 1.0 minute, 80% A to 75% A for 2.5 minutes, 75% A held for 0.5 minutes, 75% A to 68% A for 1.7 minutes, 38% A to 15% A for 1.7 minutes, 0% A held for 3.0 minutes, and back to 95% A for 2.35 minutes to re-equilibrate the column. Each run lasted 13 minutes and the sample needle was washed for 10 seconds with 1:1 methanol:isopropanol before and after each injection to minimize carryover. Column eluent was directed into a Waters TQ Detector tandem mass spectrometer and source parameters and transitions can be found in Table 4.1. Dwell times were optimized for each analyte to allow for 12-15 points across each peak. Quantification was carried out using 6-8 point external calibration curves, depending on the extent of linearity for a given analyte. Relative quantification was used for tSGAs (quantified using alpha-tomatine) and their aglycones (quantified using tomatidine) that did not have commercially available standards. Additionally, signals were normalized to alpha-solanine and solanidine for glycosylated and aglycone analytes, respectively, to correct for instrument variability.
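As a quick consistency check, the gradient segment durations listed above can be summed to confirm the stated 13-minute run time. The sketch below uses only the durations given in the text; the segment labels and variable names are illustrative.

```python
# Segment durations (minutes) in the order given in the gradient description.
segments = [
    ("initial hold at 95% A", 0.25),
    ("95% A to 80% A", 1.0),
    ("80% A to 75% A", 2.5),
    ("hold at 75% A", 0.5),
    ("fifth segment (ramp)", 1.7),
    ("sixth segment (ramp)", 1.7),
    ("hold at 0% A", 3.0),
    ("return to 95% A and re-equilibrate", 2.35),
]

total_minutes = sum(duration for _, duration in segments)

if __name__ == "__main__":
    elapsed = 0.0
    for label, duration in segments:
        elapsed += duration
        print(f"{label}: ends at {elapsed:.2f} min")
    print(f"total run time: {total_minutes:.2f} min")
```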


Table 4.1 LC-MS/MS MRM parameters of steroidal glycoalkaloids quantified by our method.

| Analyte | Retention Time (min) | Parent Mass [M+H]^a | Product Ions | Cone Voltage (V) | Collision Energy (eV) |
|---|---|---|---|---|---|
| Esculeoside B | 2.55, 2.67, 2.74 | 1228.6 | 254.9*, 1048.8 | 75 | 65, 40 |
| Hydroxytomatine | 3.02, 3.28, 3.37, 3.57 | 1050.6 | 254.9, 1032.7 | 55 | 55, 30 |
| Dehydrolycoperoside F, G, or Dehydroesculeoside A | 3.08 | 1268.6 | 252.9*, 1208.9 | 80 | 65, 35 |
| Lycoperoside F, G, or Esculeoside A | 3.11, 3.26 | 1270.6 | 1048.8, 1210.9 | 70 | 60, 30 |
| Acetoxytomatine^b (I) | 4.28 | 1092.6 | 84.7, 1032.7 | 40 | 65, 35 |
| Dehydrotomatine^c | 5.09, 5.49 | 1032.5 | 84.7, 252.9* | 70 | 80, 50 |
| Acetoxytomatine (II) | 5.42, 5.66 | 1092.6 | 144.7, 162.8 | 40 | 50, 45 |
| Alpha-tomatine^c | 5.45 | 1034.6 | 84.7*, 160.8 | 70 | 85, 60 |
| Alpha-solanine^c,d | 5.64 | 869.1 | 97.8*, 399.1 | 70 | 85, 65 |
| Solanidine^c,d | 7.22 | 398.7 | 80.7, 97.8* | 70 | 55, 35 |
| Tomatidine^c | 7.30 | 416.4 | 160.8, 254.9 | 50 | 30, 30 |
| Dehydrotomatidine^c | 7.36 | 414.3 | 125.8, 270.7 | 40 | 30, 20 |

^a Analytes were quantified using the following settings: mass span: 0.3 Da, capillary voltage: 0.5 kV, extractor voltage: 5 V, RF lens voltage: 0.5 V, source temperature: 150 °C, desolvation temperature: 500 °C, desolvation flow rate: 1000 L/hr, cone gas flow rate: 50 L/hr.
^b Commonly referred to as lycoperosides A, B, or C.
^c Indicates that analyte was confirmed by authentic standard.
^d Indicates analyte used as an internal standard.
* Indicates quantifying ion; other ions used for qualifying purposes. Compounds with no indicated quantifying ion were quantified using the sum of both MRM transitions.


4.3.5 UHPLC-QTOF/MS Confirmation of tSGA Identities

We verified the identities of our tSGA analytes using an Agilent 1290 Infinity II UHPLC coupled with an Agilent 6545 QTOF-MS. Identical column and chromatographic separation conditions were used as described above for our MS/MS method. The QTOF-MS used an electrospray ionization source operated in positive mode and data were collected from 50-1700 m/z for both full-scan and MS/MS experiments. Gas temperature was set to 350 °C, drying gas flow was 10 L/min, nebulizer gas flow was 10 L/min, nebulizer pressure was 35 psig, and sheath gas flow and temperature were 11 L/min and 375 °C, respectively. For MS/MS experiments on the QTOF-MS, identical parameters were used except for the selection of tSGA masses of interest and a two-minute retention time window around each analyte to maximize duty cycle of the instrument. Collision energy for all tSGAs was set to 70 eV and all aglycones were fragmented with 45 eV.

Limit of Detection (LOD) and Limit of Quantification (LOQ)

LOD and LOQ were calculated using six replicates of the lowest concentration standard curve calibrant sample (3.81 and 0.254 femtomoles on column for alpha-tomatine and tomatidine, respectively) and determining their signal to noise ratios. Moles on column at 3/1 and 10/1 signal to noise were then determined for alpha-tomatine and tomatidine to calculate LOD and LOQ.
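The extrapolation from a measured signal to noise ratio to the LOD and LOQ can be sketched as follows. This is a minimal illustration that assumes signal scales linearly with the amount on column near the low end of the curve and that the replicate signal to noise ratios are averaged; neither detail is specified in the text. The function names and example values are hypothetical.

```python
from statistics import mean


def amount_at_sn(amount_on_column, observed_sn, target_sn):
    """Amount expected to give `target_sn`, assuming S/N scales linearly with amount."""
    return amount_on_column * target_sn / observed_sn


def lod_loq(amount_on_column, sn_replicates):
    """Return (LOD, LOQ) as the amounts at S/N = 3 and S/N = 10."""
    observed = mean(sn_replicates)
    return (
        amount_at_sn(amount_on_column, observed, 3.0),
        amount_at_sn(amount_on_column, observed, 10.0),
    )


if __name__ == "__main__":
    # Hypothetical S/N values for six replicate injections of the lowest
    # calibrant (3.81 fmol on column); these are not measured data.
    example_sn = [28.0, 31.0, 30.0, 29.5, 30.5, 31.0]
    lod, loq = lod_loq(3.81, example_sn)
    print(f"LOD = {lod:.3f} fmol, LOQ = {loq:.3f} fmol")
```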


4.3.6 Spike Recovery Experiments

Ten, 5 g (± 0.01 g) replicates of diced OH8245 processing tomatoes were weighed into 50 mL falcon tubes. Five samples were extracted as outlined previously with the addition of a 100 µL methanolic solution containing 1.67 nmol of alpha-tomatine, 1.25 nmol of alpha-solanine, 12.4 pmol of tomatidine, and 22.68 pmol of solanidine (spiked tomato) while another five samples were extracted without IS solution (non-spiked tomato; 100 µL of methanol used in its place). The IS was allowed to integrate into the sample matrix for 30 min. Another set of five samples was prepared by substituting tomato with 5 mL of water and extracted with the addition of 100 µL of the methanolic IS solution mentioned previously (spiked mock sample). Percent recovery was estimated using the following equation:

\[
\text{Recovery (\%)} = \frac{\text{Spiked Tomato}}{\text{Non-spiked Tomato} + \text{Spiked Mock Sample}} \times 100
\]
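The recovery calculation divides the analyte signal from the spiked tomato by the sum of the signals from the non-spiked tomato and the spiked mock sample. A minimal sketch is shown below; the function name and the example values are hypothetical and do not represent measured data.

```python
def percent_recovery(spiked_tomato, non_spiked_tomato, spiked_mock):
    """Percent recovery of a spiked analyte.

    All three arguments must be in the same units (for example, mean
    concentration or mean peak area across replicates).
    """
    return spiked_tomato / (non_spiked_tomato + spiked_mock) * 100.0


if __name__ == "__main__":
    # Hypothetical mean values for the three sample types.
    print(f"{percent_recovery(186.0, 100.0, 100.0):.1f}%")
```

A value near 100% indicates that the analyte added to the tomato matrix is recovered as efficiently as the same amount extracted from water alone.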

4.3.7 Intra/Interday Variability Experiments

Eight OH8245 tomato fruits were blended together and 5 g aliquots (± 0.05 g) were distributed among 18, 50 mL tubes, and frozen at -20 °C. Over three days, six tubes were randomly selected from the freezer each day and tSGAs were extracted and quantified as outlined above by a single individual. Intraday variability was determined by computing the coefficient of variation for an analyte within a day. Interday variability was calculated by taking the coefficient of variation of all samples run over the three-day period.
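The two variability measures described above can be sketched as follows. The example assumes the sample standard deviation is used for the coefficient of variation, which the text does not specify. The function names and the example data are hypothetical.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%) using the sample standard deviation."""
    return stdev(values) / mean(values) * 100.0


def intra_inter_day(days):
    """Return ({day: intraday CV}, interday CV) for a dict of day -> replicates."""
    intraday = {day: cv_percent(values) for day, values in days.items()}
    pooled = [value for values in days.values() for value in values]
    return intraday, cv_percent(pooled)


if __name__ == "__main__":
    # Hypothetical concentrations for six extractions per day over three days.
    example = {
        "day 1": [10.1, 9.8, 10.4, 10.0, 9.9, 10.2],
        "day 2": [10.6, 10.3, 10.8, 10.5, 10.4, 10.7],
        "day 3": [9.7, 9.9, 9.6, 10.0, 9.8, 9.5],
    }
    intraday, interday = intra_inter_day(example)
    for day, cv in intraday.items():
        print(f"{day}: intraday CV = {cv:.2f}%")
    print(f"interday CV = {interday:.2f}%")
```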


Autosampler Stability Experiments: A quality control sample containing multiple tomato species, as described above, was extracted with the addition of 100 µL of IS solution as outlined previously. Over a period of 12 hours, the quality control sample was injected and analyzed by UHPLC-MS/MS at hourly intervals. The vial cap was replaced after each injection to prevent sample evaporation between injections and the autosampler compartment was maintained at 20 °C.

4.4 Results and Discussion:

4.4.1 Development of high-throughput extraction method

Generally, tomato samples are pulverized in a mortar in the presence of liquid nitrogen or homogenized using a blender prior to extracting tSGAs. Tomato steroidal glycoalkaloids are considered semi-polar metabolites and are typically extracted via physical disruption in a methanolic solvent system (Moco et al., 2006; Iijima et al., 2008, 2013; Mintz-Oron et al., 2008; Ballester et al., 2016). Current methods are time consuming since each sample needs to be processed individually. Our protocol features a combined homogenization/extraction step using a Geno/Grinder system that can process up to 16 samples at once. Given a five-minute homogenization/extraction, five-minute centrifugation, and an approximately ten-minute dilution/filtration step, our extraction method can process 16 samples every 20 minutes (1.25 min/sample), making it ideal for screening large tomato populations or large sample sets of tomato products. Moreover, the tomato sample is able to stay frozen until the extraction begins, which prevents potential enzymatic modification and degradation of analytes.

4.4.2 Selection of Precursor Ions

Over 100 tSGAs have been tentatively identified in tomato using high-resolution mass spectrometry and some with MS/MS fragmentation (Iijima et al., 2008, 2013; Zhu et al., 2018). However, we do not know the specific concentrations of tSGAs accumulating in fruits. To study tSGAs further, quantitative analysis methods are necessary. In order to maximize the number of tSGAs detected and separated in our method, we first compiled a target list of biologically relevant tSGAs by surveying the literature (Fujiwara et al., 2004; Iijima et al., 2009; Alseekh et al., 2015; Cichon et al., 2017; Cooperstone et al., 2017; Zhu et al., 2018; Hövelmann et al., 2019). Tomato steroidal glycoalkaloid species were prioritized based on their perceived abundance in the tomato clade, previous structural characterization, and having an established record of being impacted by or a part of biological processes such as ripening or plant defense. Using this process, 18 masses covering at least 25 different tSGA species were selected for chromatographic separation and quantification.

A 50% aqueous methanolic extract from a reference material comprised of red-ripe Solanum lycopersicum, Solanum lycopersicum var. cerasiforme, and Solanum pimpinellifolium fruits was used for method development on a Waters Acquity UHPLC H-Class System connected to a TQ Detector triple quadrupole mass spectrometer with electrospray ionization operated in positive ion mode. A gradient progressing from 5% to 100% acetonitrile over 15 minutes run on a Waters 2.1 x 100 mm (1.7 µm particle size) column at 0.4 mL/min was used to separate as many potential analytes as possible. Selected Ion Recordings (SIRs) of masses of interest were utilized to identify potential tSGA species. Since only two alkaloids of interest are available commercially (alpha-tomatine and tomatidine), elution order, accurate mass, and fragmentation patterns were used to assign identities to all other tSGAs. Source parameters of the MS were then adjusted to maximize the signal of both identified and tentatively identified tSGAs. Those tSGAs which were readily detectable in our pooled tomato quality control samples were used in our final method. Although beta-, gamma-, and delta-tomatine have been studied more extensively than many other tSGAs, we were not able to detect and quantify them in our reference material, and thus they are not included in our panel.

4.4.3 Use of Internal Standards

We tested three commercially available potato-derived alkaloids for their suitability as internal standards to correct for inter- and intraday variability created in the MS. Alpha-solanine, alpha-chaconine, and solanidine (aglycone of alpha-solanine) were selected based on their similarity in structure, ionization efficiencies, and retention times to tomato-derived alkaloids. However, alpha-chaconine was excluded due to co-elution with alpha-tomatine.

We determined that 1.25 nmol and 22.68 pmol of alpha-solanine and solanidine, respectively, should be added to each sample (41.7 femtomoles of alpha-solanine and 0.756 femtomoles of solanidine on column) to achieve comparable peak areas to those observed for tSGAs and their aglycones such as tomatidine and dehydrotomatidine (Figure 4.2). Alpha-solanine and solanidine multiple reaction monitoring (MRM) experiments were then optimized in tandem with tSGAs of interest as follows.
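The on-column amounts quoted above can be approximated from the spike amounts and the extraction volumes in Section 4.3.3. The sketch below is a back-calculation under assumptions that are not stated in the text: a total liquid volume of about 20 mL (15 mL of methanol plus roughly 5 mL contributed by 5 g of tomato), the 2 mL aliquot diluted with 1 mL of water, and a 1 µL injection. Under these assumptions the calculation reproduces the reported values. The function name and default parameters are illustrative.

```python
def on_column_fmol(spike_pmol, extract_ml=20.0, aliquot_ml=2.0,
                   diluent_ml=1.0, injection_ul=1.0):
    """Approximate amount of internal standard on column, in femtomoles.

    extract_ml and injection_ul are assumed values (see the lead-in);
    aliquot_ml and diluent_ml follow the extraction description.
    """
    conc_pmol_per_ml = spike_pmol / extract_ml
    diluted = conc_pmol_per_ml * aliquot_ml / (aliquot_ml + diluent_ml)
    # 1 pmol/mL is numerically equal to 1 fmol/uL.
    return diluted * injection_ul


if __name__ == "__main__":
    print(f"alpha-solanine: {on_column_fmol(1250.0):.1f} fmol")
    print(f"solanidine: {on_column_fmol(22.68):.3f} fmol")
```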


4.4.4 Optimization of MS parameters

Desolvation temperature, desolvation gas flow rate, and cone voltage were experimentally optimized. All other source parameters remained at their recommended default settings and are reported in the footnotes of Table 4.1. For all experiments, vial caps were replaced after each injection to prevent any possible effects from evaporation through the pierced septa.

To optimize the desolvation temperature, a 50% aqueous methanolic solution of alpha-tomatine and tomatidine was injected and desolvation temperatures ranging from 350 °C to 500 °C at 25 °C increments were tested. A 500 °C desolvation temperature resulted in the highest signal. Desolvation gas flow was tested in a similar manner starting from 600 L/hr to 1000 L/hr in 100 L/hr increments. Likewise, the 1000 L/hr flow rate resulted in the most signal for both analytes. Alpha-tomatine and tomatidine were used in these experiments because of their commercial availability, their structural similarity to other tSGAs of interest, and their intended use for relative quantification of all other tSGAs and their aglycones. Finally, cone voltage was optimized by injecting a 50% aqueous methanolic extract of our tomato reference material and measuring the signal of each SIR. Cone voltages ranged from 20 to 90 V and successive injections were made in 5 V increments. Optimal cone voltages were specific to each mass and are noted in Table 4.1.

With source parameters set to optimize the signal of all precursor ions of interest, product ion scans were then conducted to tentatively identify tSGAs and aid in the development of MRM experiments, which were ultimately used for quantification.


Since each SIR yielded multiple peaks, information from product ion scans was leveraged to determine if each peak was actually a tSGA. Product ion scan experiments were created for each mass of interest and multiple collision energies (20, 45, and 65 eV) were tested. The resulting spectra generated for each peak allowed us to eliminate peaks that were isobaric with tSGAs of interest, but had product ions inconsistent with proposed structures. Masses such as 255 and 273 m/z were particularly useful in identifying alkaloids as they are likely derived from the fragmentation of the steroidal backbone characteristic of all tSGAs (Supplementary Information) and have been previously reported in the literature (Iijima et al., 2013; Caprioli et al., 2014; Cichon et al., 2017; Sonawane et al., 2018; Zhu et al., 2018). Additionally, tSGAs with the prefix “dehydro” exhibit a desaturation on the B ring of the steroidal backbone between carbons 5 and 6 (Ono et al., 1997; Itkin et al., 2011; Iijima et al., 2013; Sonawane et al., 2018). We observed that common fragments derived from the steroidal backbone of these alkaloids, such as 253 and 271, were accordingly 2 m/z less than their saturated counterparts. The 273 fragment corresponds to the A-D rings of the steroidal backbone and the 255 fragment to its water loss product (Sonawane et al., 2018). Elution order of analytes was used to help tentatively identify tSGAs detected in our reference sample based on previous reports (Alseekh et al., 2015; Zhu et al., 2018). Multiple collision energies allowed us to select product ions that were abundant and consistently produced under different conditions. These product ions then became candidate ions for MRM development.
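
As a worked check of the fragment relationships described above, the sketch below uses the accurate-mass backbone fragments reported for alpha-tomatine and dehydrotomatine in Table 4.2. The monoisotopic masses of hydrogen and oxygen are standard values; the matching tolerance and all variable names are illustrative assumptions.

```python
# Monoisotopic atomic masses (Da), standard values.
H = 1.007825
O = 15.994915
WATER = 2 * H + O  # mass of H2O
H2 = 2 * H         # mass of H2 (one desaturation)

# Backbone fragments (m/z) as listed in Table 4.2.
saturated = {"A-D rings": 273.2217, "water loss": 255.2112}    # alpha-tomatine
desaturated = {"A-D rings": 271.2054, "water loss": 253.1951}  # dehydrotomatine

TOLERANCE = 0.005  # Da; illustrative matching window, not from the source


def close(a, b, tol=TOLERANCE):
    """True when two masses agree within the tolerance."""
    return abs(a - b) <= tol


# 273 -> 255 and 271 -> 253 should each differ by one water.
print(close(saturated["A-D rings"] - saturated["water loss"], WATER))
print(close(desaturated["A-D rings"] - desaturated["water loss"], WATER))

# Each "dehydro" fragment should sit one H2 below its saturated counterpart.
for name in saturated:
    print(name, close(saturated[name] - desaturated[name], H2))
```

All four comparisons hold within the chosen window, which is consistent with the 2 m/z shift and the water loss described in the text.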

MRM experiments allowed us to confidently detect and quantify tSGAs of interest and increase sensitivity by minimizing interference of co-eluting compounds. We created MRM experiments for each mass using optimized source conditions and four product ions with the highest signal-to-noise ratio. Initially, our 50% aqueous methanolic reference sample extract was injected and each transition was tested at 5 eV. The experiments were rerun at increasing collision energies at 15 eV increments up to 95 eV. Afterwards, a 20 eV window broken into 5 eV increments was determined for each transition and the experiments were rerun. Optimized MRMs are displayed in Table 4.1. To maximize duty cycle, the two transitions with the best signal-to-noise ratio were retained. The gradient was then optimized to chromatograph each analyte. All tSGAs were quantified using a standard curve generated with alpha-tomatine while aglycone species used tomatidine. Due to the structural similarity among tSGA species quantified in our method, we hypothesize that ionization efficiencies will be similar amongst our analytes. Lastly, MRMs were developed for the potato-derived alkaloids alpha-solanine and solanidine used as IS. These IS allowed us to correct for instrument-derived variability that normally occurs with mass spectrometers.
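
The coarse-then-fine collision energy search described above can be sketched as below. The grid values follow the text (5 eV start, 15 eV steps up to 95 eV, then a 20 eV window in 5 eV steps). Centring the fine window on the best coarse energy, the function names, and the response curve are assumptions made for illustration.

```python
def coarse_energies(start=5, stop=95, step=15):
    """Coarse collision energy grid (eV): 5, 20, ..., 95."""
    return list(range(start, stop + 1, step))


def fine_energies(best_coarse, window=20, step=5, lowest=5):
    """5 eV steps across a 20 eV window around the best coarse energy.

    Centring the window on the coarse optimum is an assumption; the text only
    states that a 20 eV window in 5 eV increments was used per transition.
    """
    half = window // 2
    start = max(lowest, best_coarse - half)
    return list(range(start, best_coarse + half + 1, step))


def pick_best(energies, response):
    """Return the energy with the highest response (e.g. signal-to-noise)."""
    return max(energies, key=response)


def example_response(energy_ev):
    """Hypothetical response curve peaking near 42 eV for one transition."""
    return -(energy_ev - 42) ** 2


coarse_best = pick_best(coarse_energies(), example_response)
fine_best = pick_best(fine_energies(coarse_best), example_response)
print(coarse_best, fine_best)
```

In practice `example_response` would be replaced by the measured signal-to-noise ratio of each transition at each collision energy.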

4.4.5 Development of Chromatographic Gradient

Method development related to the MS was initially carried out using a simple 13-minute gradient outlined above. While this run time is shorter than many of the previously published studies characterizing tSGAs using high-resolution MS (Iijima et al., 2008, 2013; Zhu et al., 2018), we aimed to create a more efficient method that would be able to accommodate large sample sets. Of the two columns tested (Waters C18 Acquity bridged ethylene hybrid (BEH) 2.1 x 100 mm, 1.7 µm and Waters C18 Acquity high strength silica (HSS) 2.1 x 100 mm, 1.8 µm), the BEH column was able to better resolve analytes of interest with a particular benefit observed in the nonpolar aglycone steroidal alkaloids. We adjusted our gradient conditions in such a way that all separation of analytes occurred within a six-minute window with an additional five minutes devoted to cleaning and re-equilibrating the column to reduce carryover (Figure 4.2). Additionally, the needle wash was set to rinse the needle and injection port for ten seconds before and after an injection with 1:1 methanol:isopropanol to further reduce carryover. We observed multiple peaks for many of our masses, indicating the presence of multiple isobaric tSGAs (likely including structural isomers) (Figure 4.2). In the case of esculeoside B, multiple diastereomers have been previously reported in tomato products, which explains our observation of multiple peaks for this analyte (Manabe et al., 2013; Nohara et al., 2015; Hövelmann et al., 2019).

Validation experiments, including confirmation of peak identities using high-resolution mass spectrometry, were next carried out using the finalized chromatographic gradient.

4.4.6 Confirmation of Analytes using High-Resolution Mass Spectrometry

Accurate mass spectrometry was used to confirm the identities of analytes quantified by our UHPLC-MS/MS method. We transferred our method to an Agilent 1290 Infinity II connected to an Agilent 6545 QTOF and profiled tSGAs both in high resolution full scan mode (50-1700 m/z) and through targeted fragmentation experiments. Both types of experiments were consistent with our identifications of all tSGAs and aglycones in our UHPLC-MS/MS method (Table 4.2). Retention times differed slightly between the UHPLC-MS/MS method and the UHPLC-QTOF-MS experiments due to differences in dead volume between the two instruments. However, relative elution order remained the same.

Table 4.2 UHPLC-QTOF-MS confirmation of tSGA identities. Rows marked with " are additional peaks of the analyte listed above them.

Tentative                                          Molecular    Retention   Monoisotopic  Observed Mass  Mass Error  Common MS/MS
Identification                                     Formula      Time (min)  Mass          [M+H]          (Δ ppm)     Fragments^a
Esculeoside B                                      C56H93NO28   2.24        1227.5884     1228.5989      2.20        1048.5380, 273.2120, 255.2016, 163.0509, 145.0404, 85.0205
"                                                  "            2.34        "             1228.5967      0.41
"                                                  "            2.45        "             1228.5966      0.33
Hydroxytomatine                                    C50H83NO22   2.84        1049.5407     1050.5500      1.43        1032.5385, 273.2213, 255.2203, 161.1318, 145.0489, 85.0279
"                                                  "            3.22        "             1050.5513      2.67
"                                                  "            3.29        "             1050.5506      2.00
"                                                  "            3.50        "             1050.5501      1.52
Dehydrolycoperoside F, G, or Dehydroesculeoside A  C58H93NO29   2.41        1267.5828     1268.5930      1.89        1208.5714, 1046.5175, 271.2054, 253.1951, 163.0600, 85.0284
Lycoperoside F, G, or Esculeoside A                C58H95NO29   2.44        1269.5985     1270.6076      1.02        1210.5900, 1048.5324, 273.2213, 255.2108, 163.0600, 85.0285
"                                                  "            3.06        "             1270.6095      2.52
Acetoxytomatine (I)                                C52H85NO23   4.22        1091.5507     1092.5614      2.65        1032.5386, 273.2216, 255.2112, 161.1326, 145.0497, 85.0287
Dehydrotomatine^b                                  C50H81NO21   4.99        1031.5301     1032.5388      0.87        1014.5274, 271.2054, 253.1951, 145.0495, 85.0284, 57.0337
"                                                  "            5.20        "             1032.5373      0.58
Acetoxytomatine (II)                               C52H85NO23   5.32        1091.5507     1092.5619      3.11        1032.5404, 273.2216, 255.2114, 161.1328, 145.0499, 85.0288
"                                                  "            5.39        "             1092.5608      2.11
Alpha-tomatine^b                                   C50H83NO21   5.35        1033.5457     1034.5557      2.13        1016.5449, 416.3523, 273.2217, 255.2112, 145.0498, 85.0287
Tomatidine^b                                       C27H45NO2    6.95        415.3450      416.3531       0.72        398.3414, 273.2208, 255.2101, 161.1318, 126.1271, 81.0693
Dehydrotomatidine^b                                C27H43NO2    6.98        413.3294      414.3371       0.24        396.3260, 271.2053, 253.1949, 161.1322, 126.1275, 81.0695

^a MS/MS product ions generated at 70 eV and 45 eV for glycosylated and aglycone species, respectively. Other source parameters were previously enumerated.
^b Identification confirmed by authentic standard.

Targeted MS/MS experiments using the UHPLC-QTOF-MS allowed us to determine common spectral characteristics for each tSGA (Supplementary Information). Using commercially available alpha-tomatine and tomatidine and exploiting the presence of dehydrotomatine and dehydrotomatidine (tomatidenol) as impurities within these standards, we were able to collect MS/MS fragmentation data on these four analytes. We found that all tSGAs and aglycones fragmented in predictable ways that allow for identification. Common masses produced by each tSGA in our method can be found in Table 4.2. These data allow us to tentatively identify all analytes in our UHPLC-MS/MS method with a high degree of confidence.
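
The mass errors in Table 4.2 can be reproduced with a short calculation. In the sketch below, the monoisotopic and observed masses are taken from Table 4.2; the value of 1.0078 Da added for the hydrogen is inferred from the tabulated errors rather than stated in the text, and the function name is hypothetical.

```python
H_MASS = 1.0078  # Da; inferred because it reproduces the tabulated errors (assumption)


def ppm_error(monoisotopic_mass, observed_mz, adduct_mass=H_MASS):
    """Mass error in ppm of an observed [M+H] ion against its expected m/z."""
    expected_mz = monoisotopic_mass + adduct_mass
    return (observed_mz - expected_mz) / expected_mz * 1e6


# (monoisotopic mass, observed [M+H]) pairs from Table 4.2
examples = {
    "Esculeoside B (2.24 min)": (1227.5884, 1228.5989),
    "Alpha-tomatine": (1033.5457, 1034.5557),
    "Tomatidine": (415.3450, 416.3531),
}

for name, (mass, observed) in examples.items():
    print(f"{name}: {ppm_error(mass, observed):.2f} ppm")
```

Under this assumption the three examples return 2.20, 2.13, and 0.72 ppm, matching the corresponding table entries. The function is signed, whereas the table appears to report absolute values.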

4.4.7 LOD and LOQ

Previous chromatography-based methods to quantify both potato steroidal glycoalkaloids and tSGAs relied on photodiode array detectors set to 208 nm (Kozukue and Friedman, 2003; Kozukue et al., 2004; Tajner-Czopek et al., 2014; Del Giudice et al., 2015). Given that the molar extinction coefficient for alpha-tomatine is only 5000 M⁻¹ cm⁻¹ (Keukens et al., 1994), photodiode array detectors are often not sensitive enough to detect low quantities of these compounds or to distinguish between different alkaloids. Moreover, 208 nm is a non-specific wavelength at which many compounds (including mobile phases) can absorb light (Friedman and Levin, 1992, 1998; Keukens et al., 1994). Mass spectrometers offer substantial gains in sensitivity through the use of MRM experiments and the ability to differentiate numerous analytes in a single run.

Our UHPLC-MS/MS method for quantifying tSGAs was able to detect and quantify alpha-tomatine and tomatidine in the low femtomole-on-column range (Table 4.3). Given our extraction method, tSGAs could be present in picomolar concentrations in tomato and still be quantified. A previously reported limit of quantification for alpha-tomatine is 0.005 mg/kg (estimated to be 0.5 µg in a standard 100 g tomato) (Caprioli et al., 2014). Given our reported LOQ of 1.10 femtomoles injected, we estimate that 681 picograms of alpha-tomatine can be quantified in tomatoes, making our method almost three orders of magnitude more sensitive. This sensitivity could also be useful for situations when fruit quantity is lacking.
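
The sensitivity comparison above can be checked with simple unit arithmetic. The sketch assumes that the 681 pg figure and the literature value both refer to the same 100 g of fruit; the variable names are illustrative.

```python
import math

# Literature LOQ: 0.005 mg/kg (Caprioli et al., 2014), scaled to a 100 g tomato.
literature_loq_mg_per_kg = 0.005
tomato_mass_kg = 0.100
literature_loq_ug = literature_loq_mg_per_kg * tomato_mass_kg * 1000  # mg -> µg

# Amount quantifiable with the present method, as stated in the text.
this_method_pg = 681
this_method_ug = this_method_pg * 1e-6  # pg -> µg

fold_improvement = literature_loq_ug / this_method_ug
orders_of_magnitude = math.log10(fold_improvement)

print(f"{literature_loq_ug:.1f} µg vs {this_method_ug:.6f} µg")
print(f"{fold_improvement:.0f}-fold, {orders_of_magnitude:.2f} orders of magnitude")
```

The ratio is roughly 730-fold, or just under three orders of magnitude, in line with the statement above.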

4.4.8 Spike Recovery

Spike addition experiments were conducted to assess the performance of our high-throughput extraction method. Both tomato and potato derived external alkaloid standards were used to determine if our chosen internal standards would behave similarly to analytes native to tomato. Tomato alkaloids alpha-tomatine (100.8% ± 13.1) and tomatidine (93.0% ± 6.8) as well as the potato-derived internal standards alpha-solanine (94.3% ± 3.4) and solanidine (99.7% ± 7.1) were efficiently extracted using our method (Table 4.3). These data indicate that our method is able to effectively extract aglycone and glycosylated steroidal alkaloid species from tomato and that our internal standards extract similarly to native analytes.
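
For readers unfamiliar with spike recovery, a minimal sketch of the conventional calculation follows. The formula shown (spiked minus unspiked, divided by the amount added) is the standard definition and is an assumption here, since the text does not spell out the calculation; all measurements are invented.

```python
from statistics import mean, stdev


def percent_recovery(spiked, unspiked, amount_added):
    """Conventional spike recovery: (spiked - unspiked) / added * 100."""
    if amount_added <= 0:
        raise ValueError("amount_added must be positive")
    return (spiked - unspiked) / amount_added * 100


# Hypothetical measurements (arbitrary concentration units) for n = 6 spiked
# extracts of one analyte; the endogenous level and spike amount are invented.
unspiked_level = 2.0
amount_added = 10.0
spiked_levels = [11.8, 12.3, 11.5, 12.6, 12.1, 11.9]

recoveries = [percent_recovery(s, unspiked_level, amount_added) for s in spiked_levels]
print(f"{mean(recoveries):.1f}% ± {stdev(recoveries):.1f}")
```

The printed mean and standard deviation have the same form as the extraction efficiencies reported in Table 4.3.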

Table 4.3 Extraction efficiency of commercially available tSGAs and potato-derived internal standards.

Analyte             Sample Size   Extraction Efficiency (%)   LOQ (femtomoles injected)   LOD (femtomoles injected)
Alpha-tomatine      n=6           100.8 ± 13.1                1.0988                      0.3296
Alpha-solanine^a    n=6           94.3 ± 3.4                  N/A^b                       N/A
Tomatidine          n=6           93.0 ± 6.8                  0.3354                      0.1006
Solanidine^a        n=6           99.7 ± 7.1                  N/A                         N/A

^a Analyte used as an internal standard with no calibration curve.
^b Not applicable due to its use as an internal standard.

4.4.9 Intra/Interday Variability

Experiments to determine intra/interday variability were conducted to determine analytical variability in our extraction and analysis methods. A single operator extracted six tomato samples and analyzed them by UHPLC-MS/MS. This experiment was repeated twice more by the same operator. Our data indicate that our methods are reliable, with most analytes having intraday coefficients of variation below 5% and all analytes having intraday and interday coefficients of variation below 15% (Table 4.4). As expected, interday variability was higher than intraday variability for nearly all analytes, reflecting day-to-day variability in the MS.
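
A minimal sketch of how intraday and interday coefficients of variation can be computed is given below. It follows our reading of the Table 4.4 footnotes (intraday as the average of within-day values, interday as the value across all 18 extracts); the concentrations are hypothetical and the function name is illustrative.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation: sample standard deviation / mean * 100."""
    return stdev(values) / mean(values) * 100


# Hypothetical concentrations of one analyte: six extracts on each of three days.
days = {
    "day 1": [10.1, 9.8, 10.3, 10.0, 9.7, 10.2],
    "day 2": [10.6, 10.9, 10.4, 10.8, 10.5, 10.7],
    "day 3": [9.5, 9.9, 9.6, 9.4, 9.8, 9.7],
}

# Intraday: CV within each day, averaged over the three days.
intraday_cv = mean(cv_percent(v) for v in days.values())

# Interday: CV across all 18 extracts pooled over the three days.
all_values = [x for v in days.values() for x in v]
interday_cv = cv_percent(all_values)

print(f"intraday {intraday_cv:.2f}%, interday {interday_cv:.2f}%")
```

Because the invented day means drift slightly, the pooled value exceeds the within-day average, which is the pattern expected when day-to-day instrument variability is present.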

Table 4.4 Intraday and interday coefficient of variation values for analytes quantified by our UHPLC-MS/MS method.

Analyte                                            Intraday Coefficient of Variation (%)^a   Interday Coefficient of Variation (%)^b
Esculeoside B                                      4.46                                      6.84
Hydroxytomatine                                    4.00                                      5.60
Dehydrolycoperoside F, G, or Dehydroesculeoside A  8.42                                      8.03
Lycoperoside F, G, or Esculeoside A                3.35                                      4.21
Acetoxytomatine (I)                                3.56                                      3.89
Dehydrotomatine                                    4.25                                      7.11
Acetoxytomatine (II)                               7.57                                      7.70
Alpha-tomatine                                     3.92                                      6.42
Tomatidine                                         11.78                                     13.73
Dehydrotomatidine                                  11.69                                     13.61

^a Average coefficient of variation within a day of six samples extracted and run by a single operator. The experiment was repeated over three days.
^b Average coefficient of variation over a three-day period of 18 samples extracted and run by a single operator.

4.4.10 12-hour Stability Experiment

Tomato phytochemicals typically analyzed, such as carotenoids, are subject to oxidation and need to be run in small batches to minimize experimental error due to degradation (Kopec et al., 2012). However, relatively little is known about the stability of tSGAs compared to the above phytochemical classes. We hypothesized that due to the known heat stability of chemically analogous potato steroidal glycoalkaloids, extracted tSGAs would be stable over time. A 12-hour stability study demonstrated that both alpha-tomatine and tomatidine did not degrade over time in an autosampler maintained at 20 °C. This stability enables tSGA analyses to be run in large batches of approximately 50 samples at a time, or approximately 100 tomato extracts per day. Analysis of large numbers of samples is critical for plant breeders and large-scale diversity analyses. While there is currently no published literature investigating the stability of tSGAs, some data exist for chemically analogous potato glycoalkaloids. Potato glycoalkaloids are often extracted at 100 °C to disrupt cell walls and otherwise weaken the sample matrix (Rodriguez-Saona et al., 1999), and processing studies have shown that these compounds are stable up to 180 °C (Chungcharoen, 1988). Therefore, tSGAs may also have similar heat tolerance attributes and we speculate that these analytes may remain unchanged in autosamplers well beyond the 12-hour time period we tested.

4.4.11 Grocery Store Survey

To test our extraction and quantification method, we surveyed several commonly consumed tomato-based products available at grocery stores. The purpose was twofold: to test applicability of our method, and to report comprehensive and quantitative values of tSGAs in commonly consumed tomato products. These products included an assortment of fresh tomatoes, ketchup, pasta sauce, pizza sauce, tomato soup, tomato paste, tomato juice, and whole peeled tomatoes (Table 4.5). Values are reported per serving to normalize between tomato products subjected to varying degrees of concentration. While there are some reports of tSGA concentrations in fresh tomatoes using modern methods (Baldina et al., 2016), concentrations in tomato-based products are not well reported in the literature.

Though alpha-tomatine degrades during ripening (Kozukue and Friedman, 2003), other tSGAs, including lycoperoside F, G, or esculeoside A and esculeoside B, increase during this period, keeping total concentrations of tSGAs more or less constant (Yamanaka et al., 2009). We found that tSGAs varied depending on type of product. High standard deviations likely reflect differences in geographic origin, harvest time, and processing conditions. Of note, tSGA concentrations varied by up to three orders of magnitude among different analytes and tomato products. This finding indicates a broad range of tSGA concentrations in tomato-based products.

Alpha-tomatine, the first tSGA in the biosynthesis pathway, was found to be in the highest concentration in processed tomato products such as paste, pasta sauce, and soup (Table 4.5). The discrepancy between fresh and whole peeled tomatoes is hypothesized to be due to genetic and environmental conditions that influenced the chemical profile of the tomatoes prior to processing. Analyte groups like dehydrolycoperoside F, G, or dehydroesculeoside A, lycoperoside F, G, or esculeoside A, and acetoxytomatine (commonly referred to as lycoperosides A, B, or C) were not detectable in most tomato products except for some fresh varieties and ketchup. Interestingly, lycoperoside F, G, or esculeoside A is typically the most abundant tSGA in fresh tomatoes (Iijima et al., 2013), though there is reported variation among different cultivars (Baldina et al., 2016). This observation raises questions about the effects of processing on tSGAs, where few studies have been conducted to date (Tomas et al., 2017). While the chemically analogous potato glycoalkaloids are considered to be heat stable, high temperatures, pressures, and any combination thereof might be detrimental to some tSGAs or cause shifts in chemical profiles.

Concentrations of tSGAs in tomato products were normalized for serving size to contextualize how much might be ingested in a given meal. Other tomato phytochemicals, such as lycopene, tend to be found in concentrations ranging from 0.09 to 9.93 mg/100 g FW in fresh tomatoes (Dzakovich et al., 2019). Compared to major carotenoids found in tomato, tSGA concentrations were comparable (0.7 to 3.4 mg/serving) (Cooperstone, 2020). This finding contradicts a long-standing misconception that tSGAs are degraded during ripening (Friedman, 2002). Rather, tSGAs such as alpha-tomatine are biochemically transformed during ripening into glycosylated and acetylated forms. Overall, our methods were able to efficiently extract and analyze many types of tSGAs and generate the first quantitative concentration reports of these analytes in commonly consumed tomato products. Moreover, we found that tSGAs can be found in similar concentrations to other major phytochemicals in tomatoes such as carotenoids.
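
Because carotenoid values are typically expressed per 100 g while Table 4.5 is expressed per serving, a simple conversion helps when comparing the two. The sketch below uses serving sizes and mean total tSGA values from Table 4.5 for a subset of products; the function name is hypothetical, and the calculation ignores differences in water content between products.

```python
# Serving sizes (g) and mean total tSGAs (µg per serving) from Table 4.5.
products = {
    "Fresh market": (126, 3376.0),
    "Juice": (228.5, 1307.7),
    "Ketchup": (17, 191.7),
    "Paste": (33, 1135.1),
}


def mg_per_100g(ug_per_serving, serving_size_g):
    """Convert µg per serving to mg per 100 g of product."""
    return ug_per_serving / serving_size_g * 100 / 1000


for name, (serving_g, total_ug) in products.items():
    print(f"{name}: {total_ug / 1000:.2f} mg/serving, "
          f"{mg_per_100g(total_ug, serving_g):.2f} mg/100 g")
```

For example, the fresh market mean of 3376.0 µg per 126 g serving corresponds to roughly 2.7 mg/100 g.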

Table 4.5 Survey of tSGAs in common tomato-based products reported in µg per serving size.

Analyte                                            Fresh market    Juice           Ketchup         Pasta sauce     Paste           Pizza sauce     Soup            Whole peeled
Sample size (n)                                    7               3               3               3               3               3               3               3
Serving size (g)                                   126             228.5           17              126             33              62              126             126
Esculeoside B                                      4.3±9.7^a       3.3±3.00        0.3±0.6         5.9±7.2         3.6±0.8         1.8±0.6         2.0±2.1         21.8±10.3
Hydroxytomatine                                    297.9±248.1     54.3±25.0       12.1±1.3        80.4±14.5       57.9±13.2       26.4±4.9        42.0±7.3        50.4±3.7
Dehydrolycoperoside F, G, or Dehydroesculeoside A  7.0±12.1        N.D.^b          N.D.            N.D.            N.D.            N.D.            N.D.            N.D.
Lycoperoside F, G, or Esculeoside A                1589.4±1738.3   N.D.            N.D.            N.D.            N.D.            N.D.            N.D.            N.D.
Acetoxytomatine                                    30.6±31.6       17.4±8.6        3.1±2.0         20.3±15.8       25.0±4.1        9.4±3.0         10.0±4.8        1.8±3.2
Dehydrotomatine                                    4.1±3.0         41.0±29.0       5.7±0.7         41.3±14.9       28.2±3.9        19.4±2.0        31.6±7.8        11.3±5.9
Alpha-tomatine                                     64.5±56.0       1083.5±747.4    156.3±9.7       1109.9±390.8    889.5±119.4     524.7±85.5      964.3±62.5      338.4±156.5
Tomatidine                                         N.Q.^c          N.Q.            0.4±0.3         1.7±1.3         0.8±0.0         0.8±0.1         1.5±0.5         0.42±0.2
Dehydrotomatidine                                  N.Q.            N.Q.            N.Q.            0.2±0.1         0.1±0.0         N.Q.            N.Q.            N.Q.
Total                                              3376.0±2886.3   1307.7±823.7    191.7±23.4      1541.9±410.3    1135.1±285.9    736.5±166.6     1126.3±34.4     1101.3±116.5

^a Mean ± standard deviation.
^b Not detected.
^c Not quantified.

We have developed and described the first comprehensive extraction and analysis method for tSGAs. Our extraction method was able to quickly and efficiently extract tSGAs and allowed for high-throughput workflows (16 samples per ~20 min) to be utilized. Our UHPLC-MS/MS method was able to separate and quantify 18 tSGAs representing 9 different tSGA masses, as well as two internal standards, in 13 minutes. Limits of quantification for commercially available tSGAs were 1.10 and 0.34 femtomoles on column for alpha-tomatine and tomatidine, respectively. This corresponds to 0.8 and 0.25 µg/100 g of alpha-tomatine and tomatidine in tomato, respectively, given our extraction procedures. Relative quantification for tSGAs and aglycones that did not have commercially available standards was performed using alpha-tomatine and tomatidine, respectively. Our methods were able to successfully profile tSGAs in a comprehensive array of commonly available tomato-based products. These values are among the first to be reported in the literature and can serve as benchmarks for future studies investigating tSGAs in a variety of contexts. Our extraction and UHPLC-MS/MS method will allow researchers to rapidly and accurately generate data about tSGAs, overcoming a major limitation that has hampered this field.

4.5 Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

4.6 Author Contributions

MD and JC contributed to the project ideation. MD contributed to the method development and drafted the manuscript. MD and JH analyzed samples. MD, JH, and JC analyzed and interpreted the data, and edited the manuscript. All authors have read and approved the final manuscript. JC has responsibility for final content.

4.7 Funding

Financial support was provided by the USDA-NIFA National Needs Fellowship (2014-38420-21844), USDA Hatch (OHO01470), Foods for Health, a focus area of the Discovery Themes Initiative at The Ohio State University, and the Ohio Agricultural Research and Development Center.

4.8 Acknowledgements

We thank David Francis, Jiheun Cho, Troy Aldrich (The Ohio State University, Ohio Agricultural Research and Development Center), and the North Central Agricultural Research Station crews for assistance with selecting, planting, and harvesting tomatoes used in this study.

This manuscript has been released as a pre-print at bioRxiv https://www.biorxiv.org/content/10.1101/2019.12.23.878223v1.

4.9 Data Availability Statement

The raw data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researchers.

Chapter 5. Regulation of Steroidal Alkaloid Pathway Intermediates Differs Among Tomatoes in The Red-Fruited Clade

Michael P Dzakovich1, David M Francis2, and Jessica L Cooperstone1,3

Affiliations
1The Ohio State University, Department of Horticulture and Crop Science, 2001 Fyffe Court, Columbus, OH 43210.
2The Ohio State University, Ohio Agricultural Research and Development Center, Department of Horticulture and Crop Science, 1680 Madison Ave, Wooster, OH 44691.
3The Ohio State University, Department of Food Science and Technology, 2015 Fyffe Court, Columbus, OH 43210.

5.1 Abstract

The tomato clade of Solanaceae produce a variety of cholesterol-derived steroidal glycoalkaloids. These compounds offer a means of chemical defense against bacteria, fungi, and herbivores. While often deemed anti-nutritional to humans, pre-clinical evidence suggests that these compounds may contribute to the health benefits associated with tomato consumption. However, little is known about the chemical diversity, concentration, and genetic architecture controlling most steroidal alkaloids. In particular, genes encoding enzymes that biosynthesize steroidal alkaloids found later in the pathway have yet to be elucidated. We hypothesized that genetic variation would influence the profile and concentration of tSGAs. To test this hypothesis, we assembled a panel of 107 genetically diverse tomato accessions including 25 accessions of Solanum pimpinellifolium (a close relative of the modern tomato), 32 accessions of Solanum lycopersicum var. cerasiforme (cherry tomatoes), and 45 accessions of commercial processing and fresh market germplasm. This germplasm was grown in a randomized complete block design in three environments over two years (total n=642) and profiled for 9 different steroidal alkaloids using a recently developed ultra-high performance liquid chromatography tandem mass spectrometry method. Average total amounts of steroidal alkaloids in ripe fruits were found in concentrations up to 23 mg/100 g FW in all wild material and germplasm could be differentiated from processing material on the basis of genetic background. A genome-wide association analysis revealed that QTL on chromosome 3 were controlling early steroidal alkaloid pathway intermediates while QTL on chromosomes 10 and 11 were controlling late pathway intermediates. These findings were validated in segregating progeny. Further investigation is needed to fine map QTL of interest and identify candidate genes.

5.2 Introduction

The tomato clade of Solanaceae produce a unique assortment of cholesterol-derived steroidal alkaloids (Cárdenas et al., 2015). These secondary metabolites are thought to be produced for plant defense, due to their reported fungicidal and insecticidal properties (Cipollini and Levey, 1997; Irving et al., 1945). Alpha-tomatine, the most well studied tomato steroidal alkaloid, was hypothesized to impart resistance to Fusarium in the mid 20th century (Gottlieb, 1943; Irving et al., 1945) and was isolated and characterized shortly thereafter (Fontaine et al., 1948). While steroidal alkaloids are frequently considered anti-nutritional compounds (Ballester et al., 2016; Cárdenas et al., 2016; Itkin et al., 2013), a growing body of literature suggests that tomato steroidal alkaloids may impart positive health benefits as part of the diet (Choi et al., 2012; Dyle et al., 2014; Friedman et al., 2000c; Lee et al., 2004). Researchers have used association analysis, comparative transcriptomics, and genetic modification as well as high resolution mass spectrometry to describe the steroidal alkaloid biosynthetic pathway and its chemical constituents (Abdelkareem et al., 2017; Alseekh et al., 2015; Ballester et al., 2016; Bednarz et al., 2019; Cárdenas et al., 2019; Iijima et al., 2013; Itkin et al., 2011; Mintz-Oron et al., 2008; Schwahn et al., 2014; Sonawane et al., 2018; Zhu et al., 2018). These recent efforts to elucidate the steroidal alkaloid biosynthetic pathway and its regulatory mechanisms in tomato are bearing fruit (Ballester et al., 2016; Cárdenas et al., 2019; Itkin et al., 2013, 2011; Sonawane et al., 2018; Yu et al., 2020).

GLYCOALKALOID METABOLISM (GAME) enzymes 4, 6, 7, 8, 11, and 12 have been shown to catalyze a series of hydroxylation, oxidation, and transamination reactions on the aliphatic tail of cholesterol to generate the E and nitrogenous F rings characteristic of solanaceous steroidal alkaloids (Itkin et al., 2013). This series of reactions results in the biosynthesis of dehydrotomatidine, the first steroidal alkaloid in the proposed pathway. Dehydrotomatidine can be converted into tomatidine by GAME25, SlS5αR2 (a C5-alpha reductase), and Sl3βHSD1 (a C3-dehydrogenase/reductase) (Akiyama et al., 2019; Lee et al., 2019; Sonawane et al., 2018). From here, it has been proposed that desaturated (derived from dehydrotomatidine) and saturated (derived from tomatidine) steroidal alkaloids are biosynthesized in parallel using the same enzymes at each step. Dehydrotomatidine and tomatidine are then converted to dehydrotomatine and alpha-tomatine, respectively, by a series of glycosylations catalyzed by GAME1, 17, 18, and 2 (Itkin et al., 2013, 2011). Both of these compounds can then be hydroxylated into hydroxy-dehydrotomatine and hydroxytomatine by 2-oxoglutarate-dependent dioxygenases (GAME31 and 32) (Cárdenas et al., 2019). The next steps are presumed to follow the order hydroxytomatine to acetoxytomatine, lycoperoside F/G/esculeoside A, and esculeoside B (and their desaturated counterparts), based on chemical structure. While biosynthesis of dehydrotomatidine to hydroxytomatine has been elucidated, the structural and regulatory mechanisms governing the production of hydroxytomatine to esculeoside B have yet to be described.

Alpha-tomatine, the primary steroidal alkaloid in mature green tomatoes, is converted into various downstream lycoperosides and esculeosides as the fruit matures (Cárdenas et al., 2015; Iijima et al., 2009). Natural variation in this process has been observed in a small subset of wild cherry (Solanum lycopersicum var. cerasiforme) accessions including LA2213, LA2256, and LA2262 (Rick et al., 1994). Fruits from these accessions retain high levels of alpha-tomatine throughout ripening, contrary to most members of the red-fruited clade, and are commonly consumed in the Alto Mayo region of Peru (Rick et al., 1994). These accessions provide genetic variability that may provide insight into the regulation of the tomato steroidal alkaloid biosynthetic pathway.

We recently developed a tandem mass spectrometry method that quantifies dehydrotomatidine, tomatidine, alpha-tomatine, dehydrotomatine, hydroxytomatine, acetoxytomatine, dehydrolycoperoside F, G, or dehydroesculeoside A, lycoperoside F, G, or esculeoside A, and esculeoside B (Dzakovich et al., 2020). Equipped with these tools, we sought to describe the chemical diversity of tomato steroidal alkaloids through a detailed and quantitative analysis of cultivated and wild red-fruited relatives of tomato. Our populations were selected based on known patterns of genomic variation and were cultivated in multiple environments to separate genetic and environmental effects. We quantified nine unique tomato steroidal alkaloids in 107 red-fruit tomato accessions and determined their associated quantitative trait loci (QTL) by a genome-wide association analysis (GWAS). To estimate genetic effects, we also generated structured bi-parental populations of high-steroidal alkaloid germplasm crossed to cultivated parents. Results from GWAS were validated in subsequent segregating progeny.

5.3 Results

5.3.1 Wild tomato species exhibit steroidal alkaloid chemical diversity

We applied our recently developed method that extracts and quantifies 9 members of the proposed tomato steroidal alkaloid pathway (Dzakovich et al., 2020) to a population of 107 tomato accessions grown in three environments using a randomized complete block design. A complete list of tomato germplasm can be found in supplemental information (Appendix B, Supp. Table 2). Based on their position in the proposed steroidal alkaloid biosynthesis pathway, we are defining early pathway intermediates as dehydrotomatidine to acetoxytomatine and late pathway intermediates as dehydrolycoperoside F, G, or dehydroesculeoside A to esculeoside B.

In the case of hydroxytomatine, acetoxytomatine, dehydrolycoperoside F, G, or dehydroesculeoside A, lycoperoside F, G, or esculeoside A, and esculeoside B, multiple peaks are observed, indicating the presence of structural isomers. For example, lycoperoside F, G, and esculeoside A are isobaric and exhibit similar fragmentation patterns since their molecular differences are limited to subtle changes in stereochemistry on the F ring (Dzakovich et al., 2020). Isolation followed by nuclear magnetic resonance experiments would be needed to definitively characterize individual isobaric peaks and unequivocally assign identity. Currently, the biological impacts associated with different stereochemistry are unknown. Therefore, values reported here are the sum of all isomers within a given mass, and compounds are annotated as precisely as their identities are known. We report the “total”, or sum of all steroidal alkaloids in a sample, as means plus or minus standard deviation for each alkaloid. Genotype means and standard deviation for all steroidal alkaloids and their individual isomers can be found in the supplemental information (Appendix B, Supp. Table 3). For each tomato steroidal alkaloid, differences of up to four orders of magnitude in concentration were observed across the accessions selected. An example of this variation is shown in Figure 5.1, illustrating the distribution of concentrations for features identified as alpha-tomatine and lycoperoside F, G, or esculeoside A, representing early and late tomato steroidal alkaloid pathway metabolites, respectively. Multiple sub-groups can be observed for lycoperoside F, G, or esculeoside A within the wild cherry tomato accessions. Box and whisker plots with log10 scaled y-axes detailing concentrations of tomato steroidal alkaloids across five classes of tomato germplasm present in our collection can be found in Supplemental Figure 2. Each dot within the box and whisker plots represents an individual observation. For most steroidal alkaloids, average concentrations in cultivated material tended to be lower relative to wild accessions.

Figure 5.1 Box and whisker plots of alpha-tomatine and lycoperoside F, G, or esculeoside A. Each dot represents an individual observation. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study. Distinct sub-groups of wild cherry tomatoes can be observed for lycoperoside F, G, or esculeoside A.

The predominant steroidal alkaloid in most classes tended to be lycoperoside F, G, or esculeoside A, a late pathway intermediate. Lycoperoside F, G, or esculeoside A comprised 82% of the total steroidal alkaloids in Solanum pimpinellifolium accessions measured with our method, with similar proportions in our wide-cross hybrids (72%) and cultivated cherry (72%). The steroidal alkaloid profiles of wild cherry tomatoes and cultivated processing tomatoes comprised 37% and 48% lycoperoside F, G, or esculeoside A, respectively. For wild cherry tomatoes, acetoxytomatine was on average the second most abundant steroidal alkaloid (28%). However, hydroxytomatine was the second most abundant (22%) steroidal alkaloid in processing tomatoes. On average, concentrations of individual steroidal alkaloids were between 0.1 µg/100 g fresh weight and 1 mg/100 g fresh weight, commensurate with lower ranges of major phytochemical classes such as carotenoids (Dzakovich et al., 2019). For total alkaloids, average values were between 1 and 5 mg/100 g fresh weight for processing and cultivated cherry, respectively. Solanum pimpinellifolium and wild cherry had average steroidal alkaloid values around 23 mg/100 g fresh weight.

5.3.2 Cultivated material lacks diversity in steroidal alkaloids relative to wild accessions and early pathway intermediates drive separation in our diversity panel

Variation in the concentrations of steroidal alkaloids extracted from ripe fruit of the diversity panel, including parents and F1 progeny, was visualized using principal components analysis (PCA) (Figure 5.2). The first PC explained 30.1% of the phenotypic variance and the second PC, 26.8%. There appeared to be two sub-groups of wild cherry, both of which separated distinctly from cultivated cherry (Figure 5.2A). One group was exemplified by LA2213, LA2256, and LA2262 and several other accessions primarily from the San Martin and La Libertad regions of Peru. The second group included LA2183, LA1683, and LA1668. A loadings plot (Figure 5.2B) of the PCA indicates that early tomato steroidal glycoalkaloids (tomatidine to acetoxytomatine) are driving separation to the left, while later alkaloids (dehydrolycoperoside F, G, or dehydroesculeoside A to esculeoside B) drive separation to the right. Total alkaloids separated along PC2.

Cultivated material tended to cluster together, indicating similarity of tomato steroidal alkaloid metabolites and concentrations, and had decreased concentrations of all alkaloids as compared to wild material (Figure 5.2A).


Figure 5.2 Principal components analysis scores plot (A) and corresponding loading plots (B) for 107 genotypes represented in the diversity panel. Loadings represent vectors of steroidal alkaloids phenotyped in the population and their direction/magnitude indicate their influence on a given principal component. Wild accessions (e.g. S. pimpinellifolium) exhibited greater diversity in steroidal alkaloids relative to processing accessions. Two distinct subgroups of wild cherry tomatoes appeared to separate based on compounds found in different halves of the tomato steroidal alkaloid pathway.


5.3.3 Concentrations of steroidal alkaloids correlate and define early and late stages of the pathway

Correlation analyses revealed relationships between different tomato steroidal alkaloids (Figure 5.3). All steroidal alkaloids associated strongly with adjacent pathway metabolites (e.g., concentrations of lycoperoside F, G, or esculeoside A are highly correlated with esculeoside B) and all analytes were positively correlated with total alkaloids. Negative, but statistically significant, correlations were observed among some early and late pathway steroidal alkaloids, including dehydrotomatine, alpha-tomatine, and acetoxytomatine with dehydrolycoperoside F, G, or dehydroesculeoside A, lycoperoside F, G, or esculeoside A, and esculeoside B. Correlations among all individual peaks (i.e., individual isomers of tomato steroidal alkaloids) can be found in Supplementary Figure 3. Different subclasses of the germplasm in the diversity panel displayed unique correlation patterns among steroidal alkaloids.


Figure 5.3 Correlation matrix of all tomato steroidal alkaloids quantified in diversity panel. Size and darkness of circle indicate intensity of correlation coefficient (see legend on right) and *, **, and *** indicate statistical significance at P<0.05, 0.01, and 0.001, respectively. Cells with no significance indicator were found to be P>0.05. Pathway intermediates tended to correlate strongly with neighboring metabolites in the proposed biosynthetic pathway. All analytes correlated with “total” tomato steroidal alkaloids to varying degrees.


5.3.4 Diversity in tomato steroidal glycoalkaloids is largely under genetic control

Multiple growing environments allowed us to partition effects due to genetics, environment, and their interaction. Variance partitioning using random effects models demonstrated that genetic sources of variation were the major contributors to concentrations of steroidal alkaloids (Table 5.1). For tomato steroidal alkaloids with “genotype” explaining a large portion of total variance (alpha-tomatine, hydroxytomatine, acetoxytomatine, and lycoperoside F, G, or esculeoside A), broad sense heritability and reliability ranged from 0.93 to 0.96 and from 0.46 to 0.68, respectively (Table 5.1). Dehydrotomatidine and tomatidine, which are at least an order of magnitude lower in concentration, on average, compared to their saturated counterparts, exhibited relatively low proportions of variance explained by “genotype” effects attributed to variation within a single field location. As such, estimates of broad sense heritability and reliability decreased accordingly (Table 5.1).

Table 5.1 Percentage of total variance due to the contribution of genetics and environment for steroidal alkaloid content. Columns Genotype through Residual are model parameters^a.

Steroidal alkaloid                                 Genotype  Environment  Within Environment  Genotype by Environment  Residual  Heritability  Reliability
Dehydrotomatidine                                  9.0%      2.0%         3.0%                43.3%                    42.7%     0.21          0.03
Tomatidine                                         3.0%      3.3%         26.9%               0.0%                     66.8%     0.29          0.09
Dehydrotomatine                                    25.0%     0.1%         69.6%               0.0%                     5.3%      0.97          0.25
Alpha-tomatine                                     67.7%     0.2%         16.3%               0.0%                     15.8%     0.96          0.68
Hydroxytomatine                                    52.2%     1.5%         23.4%               0.0%                     22.9%     0.93          0.52
Acetoxytomatine                                    46.5%     0.3%         34.3%               0.1%                     18.9%     0.94          0.46
Dehydrolycoperoside F, G, or dehydroesculeoside A  33.7%     0.1%         56.1%               0.0%                     10.1%     0.97          0.34
Lycoperoside F, G, or esculeoside A                66.6%     0.0%         16.9%               0.0%                     16.5%     0.96          0.67
Esculeoside B                                      42.1%     2.4%         26.1%               0.0%                     29.3%     0.90          0.42
Total                                              59.1%     0.4%         15.0%               0.0%                     25.5%     0.93          0.59

^a Model parameter refers to components of random effects models used to calculate variance for each steroidal alkaloid. A full model statement and details of its execution are enumerated in the “statistical analysis” section of the materials and methods.


When LA2213, LA2256, and LA2262 were each crossed to OH8243 (a processing tomato cultivar) and Tainan (a commercial fresh market cherry tomato), F1 individuals were not statistically different from the cultivated parent in terms of alpha-tomatine content, suggesting dominant gene action for low levels (Supplemental Table 1). While LA2213, LA2256, and LA2262 were selected for their high alpha-tomatine content (Rick et al., 1994), other steroidal alkaloids in these accessions varied independently. Hydroxytomatine concentrations were statistically similar when comparing OH8243 or Tainan to any of the wild parental lines (Supplemental Table 1). Although not significant, OH8243 and Tainan were numerically higher in the late pathway intermediates dehydrolycoperoside F, G, or dehydroesculeoside A and lycoperoside F, G, or esculeoside A. The LA2213 x OH8243 cross was selected to advance to BC1S1 for further analysis.

Association analysis was conducted on the germplasm collection. Comparing marker-trait associations with different numbers of PC to correct for structure, the naïve model provided the best fit based on BIC in 12.5% of the tests with significant marker-trait associations. In these cases, the fit was not significantly improved by adding PC1. Models with PC1 provided the best fit (P = 1.02E-05 to 2.2E-16) for 62.5% of the tests, and in these cases no improvement was seen by adding a second PC. Models with PC1 and PC2 provided the best fit based on BIC and were significantly improved relative to the model with a single PC (P = 0.00134 to 0.00274) for 25% of the tests. The marker-trait models with three PC never produced a best fit for a significant marker-trait interaction. These comparisons demonstrated that population structure affected trait- and locus-specific effects, and that the model with a single PC provided the most parsimonious test across traits and loci.

Genome wide association analysis using the QK model, adjusted for structure with a single PC used for Q and a kinship matrix (K), identified 61 putative marker-trait associations across the ten steroidal alkaloid traits at P < 0.01. We utilized a critical value of 0.01 as a low-stringency cut-off to protect against type-II error, knowing that QTL would be subsequently validated in a segregating population.

These associations were based on 32 markers and defined 9 chromosomes. There were 14 chromosome regions for which trait associations were detected based on windows of 10 cM and non-significant markers separating these windows. For 57 marker-trait associations, the allele most frequently identified in wild germplasm was associated with an increase in the concentration of steroidal alkaloids. A major QTL was detected on chromosome 3. Figure 5.4 shows Manhattan plots of the “early” (dehydrotomatidine to hydroxytomatine) and “late” (acetoxytomatine to esculeoside B) pathway intermediates, highlighting different genetic associations between these two groups of analytes.

Significant associations at P < 0.01 were detected for alpha-tomatine on chromosome 3, with detection of this locus independent of population structure. Compounds produced later in the pathway through modification of alpha-tomatine were associated with a locus on chromosome 4 (P < 0.01) in a population structure independent manner. Further associations were detected on chromosomes 10 and 12 (P < 0.01), but detection of these associations was dependent on population structure and how many principal components were used in the model.

Linkage disequilibrium (LD) decays in cultivated tomato over several cM (Robbins et al., 2011) and in Solanum pimpinellifolium over blocks as small as 18 kb (Lin et al., 2019). To account for differences in LD, we also used a sliding window haplotype-based association analysis approach, changing the point of analysis to pairs of markers (Sim et al., 2015). Associations discovered using the traditional single marker approach, such as the major QTL on chromosome 3 associated with alpha-tomatine, were concurrently found using the haplotype-based association analysis method.


Figure 5.4 Manhattan plots of steroidal alkaloids used in GWAS. Red colored bars indicate SNPs above a significance threshold of -log10(P) = 2 (P < 0.01).


5.3.5 Associations determined by GWAS are validated in a biparental population

The BC1S1 material generated from crossing OH8243 (cultivated processing; low alpha-tomatine in ripe fruit) with LA2213 (wild cherry; high alpha-tomatine in ripe fruit) provided a validation population for the GWAS. Analysis of these backcross progeny confirmed a role for a locus on chromosome 3 controlling steroidal alkaloid concentration in tomato (Table 5.2). Markers on chromosome 3 (solcap_snp_sl_7942, solcap_snp_sl_7919, solcap_snp_sl_5761, and solcap_snp_sl_5656) were associated with tomato steroidal alkaloids from the first half of the pathway (dehydrotomatine, alpha-tomatine, hydroxytomatine, and acetoxytomatine), as well as total tomato steroidal alkaloids. Conversely, later pathway metabolites (dehydrolycoperoside F, G, or dehydroesculeoside A, lycoperoside F, G, or esculeoside A, and esculeoside B) were associated with markers on chromosomes 10 (solcap_snp_sl_13202 and solcap_snp_sl_34373) and 11 (solcap_snp_sl_21035 and SL10890_654). Manhattan plots for all steroidal alkaloids can be seen in Figure 5.5. Similar outcomes were also achieved using a haplotype-based approach.

Table 5.2 Markers associated (P < 0.01) with tomato steroidal alkaloids in the BC1S1 validation population. Marker means^a (last three numeric columns) are reported in µg/100 g fresh weight.

Marker               Position^b  Chr^c  Steroidal alkaloid                                 P-value    r2    Recurrent parent (OH8243)  Het.^d   Donor parent (LA2213)  Donor allele
solcap_snp_sl_2629   30.04       6      Tomatidine                                         1.17E-4    0.10  0.58                       0.21     4.56                   T
solcap_snp_sl_24428  42.04       6      Tomatidine                                         2.25E-4    0.10  0.52                       0.39     4.68                   G
solcap_snp_sl_7942   55.98       3      Dehydrotomatine                                    2.61E-11   0.25  38.97                      65.91    169.19                 C
solcap_snp_sl_7919   56.44       3      Dehydrotomatine                                    7.03E-12   0.26  39.28                      62.75    184.22                 A
solcap_snp_sl_7942   55.98       3      Alpha-tomatine                                     8.92E-12   0.26  553.90                     1245.10  2697.90                C
solcap_snp_sl_7919   56.44       3      Alpha-tomatine                                     5.61E-12   0.26  618.40                     1045.60  2948.80                A
solcap_snp_sl_5761   45.59       3      Alpha-tomatine                                     3.14E-10   0.22  510.00                     1708.00  1492.10                C
solcap_snp_sl_5656   46.04       3      Alpha-tomatine                                     2.46E-11   0.25  433.80                     1828.30  1686.50                G
solcap_snp_sl_7942   55.98       3      Hydroxytomatine                                    <2.2E-16   0.58  216.38                     227.73   391.71                 C
solcap_snp_sl_7919   56.44       3      Hydroxytomatine                                    <2.2E-16   0.58  219.29                     219.00   413.24                 A
solcap_snp_sl_7942   55.98       3      Acetoxytomatine                                    7.36E-10   0.22  147.41                     219.83   558.40                 C
solcap_snp_sl_7919   56.44       3      Acetoxytomatine                                    3.05E-10   0.22  160.95                     181.35   609.80                 A
SL10890_654          51.61       11     Dehydrolycoperoside F, G, or Dehydroesculeoside A  <2.2E-16   0.46  9.52                       5.99     18.49                  C
solcap_snp_sl_13202  1.16        10     Dehydrolycoperoside F, G, or Dehydroesculeoside A  <2.2E-16   0.46  8.11                       9.99     19.54                  C
solcap_snp_sl_46386  1.73        10     Dehydrolycoperoside F, G, or Dehydroesculeoside A  <2.2E-16   0.45  7.67                       11.09    16.519                 C
solcap_snp_sl_13202  1.16        10     Lycoperoside F, G, or Esculeoside A                <2.2E-16   0.65  574.92                     663.05   1089.11                C
solcap_snp_sl_46386  1.73        10     Lycoperoside F, G, or Esculeoside A                <2.2E-16   0.66  540.83                     690.37   1085.38                C
solcap_snp_sl_21394  55.66       8      Lycoperoside F, G, or Esculeoside A                <2.2E-16   0.65  780.61                     526.02   431.65                 G
SL10890_654          51.61       11     Lycoperoside F, G, or Esculeoside A                <2.2E-16   0.68  615.31                     487.75   1189.26                C
SL10890_654          51.61       11     Esculeoside B                                      <2.2E-16   0.55  53.831                     50.94    116.15                 C
solcap_snp_sl_13202  1.16        10     Esculeoside B                                      <2.2E-16   0.53  49.72                      70.35    93.99                  C
solcap_snp_sl_46386  1.73        10     Esculeoside B                                      <2.2E-16   0.55  45.42                      70.62    104.95                 C
solcap_snp_sl_7942   55.98       3      Total                                              <2.2E-16   0.46  1683.90                    2372.90  4695.30                C
solcap_snp_sl_7919   56.44       3      Total                                              <2.2E-16   0.46  2158.60                    1759.80  5008.00                A
solcap_snp_sl_5761   45.59       3      Total                                              <2.2E-16   0.42  1605.00                    3074.60  3085.80                C
solcap_snp_sl_5656   46.04       3      Total                                              <2.2E-16   0.44  1528.30                    3205.80  3269.90                G

^a Average value obtained for individuals containing marker alleles associated with the recurrent parent (OH8243), donor parent (LA2213), or heterozygotes. All quantities shown are measured in µg/100 g fresh weight.
^b The physical location (measured in megabases (Mb)) of a marker.
^c Chromosome number.
^d Heterozygote.


Figure 5.5 Manhattan plots of steroidal alkaloids used in QTL analysis on the BC1S1 validation population. Red colored bars indicate SNPs above a significance threshold of -log10(P) = 2 (P < 0.01).


5.4 Discussion

Steroidal alkaloids were found in higher concentrations and were more variable in wild accessions compared to cultivated cherry and processing classes (Figure 5.1). We quantified individual concentrations of steroidal alkaloids up to 60 mg/100 g fresh weight, but average values for individual alkaloids within our germplasm collection tended to be around 1 mg/100 g fresh weight (Supplemental Table 3). By comparison, prominent tomato phytochemical classes like carotenoids are often found between 0.09 and 9.93 mg/100 g fresh weight (Dzakovich et al., 2019), suggesting steroidal alkaloids are in higher concentrations than previously thought. Most variation in tomato steroidal alkaloids was present in wild species and relatively little variation was observed in cultivated material (Figure 5.2). Like many other species, wild tomato accessions tend to be phenotypically and genetically diverse for a variety of traits and have long been exploited for breeding purposes (Rick, 1960). The genomes of wild accessions used in this study likely predate one or both genetic bottlenecks that occurred during domestication and restricted diversity at the nucleotide level (Blanca et al., 2015; Razifard et al., 2020; Sato et al., 2012). These compounds may have been inadvertently selected against by breeders, resulting in the relatively low concentrations and amount of diversity seen in cultivated material within our populations.

A noticeable bifurcation in our diversity panel PCA can clearly be seen in Figures 5.2A and 5.2B. Ten Solanum lycopersicum var. cerasiforme individuals clustered separately from other Solanum lycopersicum var. cerasiforme and Solanum pimpinellifolium accessions. The loadings plot in Figure 5.2B confirms that loadings for early and late steroidal alkaloid biosynthetic pathway intermediates are driving the separation seen in the PCA plot (Figure 5.2A). Outcomes of correlation analyses among different tomato steroidal alkaloids reflect this pattern as well (Figure 5.3). Specifically, late-pathway tomato steroidal alkaloids tend to be negatively correlated with early-pathway metabolites in the wild cherry accessions but not in other germplasm groups. This finding suggests differential regulation of the early and late parts of the pathway in some, but not all, germplasm.

Heritability and reliability estimates confirmed that tomato steroidal alkaloids such as alpha-tomatine, hydroxytomatine, acetoxytomatine, and lycoperoside F, G, or esculeoside A were under strong genetic control (Table 5.1). Although publications containing estimates of broad sense heritability in tomato glycoalkaloids are scant, data generated from the Eshed-Zamir introgression line population (Solanum lycopersicum x Solanum pennellii) as well as more comprehensive surveys of the tomato clade have reported these values to be between 0.35 and 0.95, with the vast majority being well above 0.50 (Alseekh et al., 2015; Zhu et al., 2018). These reports are consistent with our findings.

All crosses between high alpha-tomatine wild cherry tomatoes (LA2213, LA2256, and LA2262) and low alpha-tomatine cultivars (OH8243 and Tainan) confirmed that low alpha-tomatine in ripe fruits is a dominant trait (Table 5.2). This outcome aligns with previous findings where LA2213, LA2256, and LA2262 were initially characterized and crossed with low alpha-tomatine varieties (Rick et al., 1994), and suggests that high levels of alpha-tomatine in ripe fruits reflect a loss of function in the shift from early to late steroidal alkaloids. This observation could be due to changes in a regulatory gene, such as a transcription factor, or allelic variation in a promoter binding region that would affect the expression of steroidal alkaloid biosynthesis genes.

A GWAS on the diversity panel indicated major loci for alpha-tomatine and total steroidal alkaloids on chromosome 3, which was confirmed by QTL analysis in the biparental backcross population. The identification of QTL associated with the concentration of early metabolites (chromosome 3) and late metabolites (chromosomes 10 and 11) parallels the findings in our validation population and mirrors separation of germplasm based on metabolite loading (Figure 5.2B), suggesting that the coordinate regulation of early and late pathway steps is under separate genetic control.

Previously, we demonstrated that known loci and QTL could be detected in tomato using GWAS with low-density marker panels (Sim et al., 2015). In Solanum lycopersicum, genetic recombination has been shown to occur over large windows (3-16 cM) and can vary depending on market class (Robbins et al., 2011; S.-C. Sim et al., 2012b; Van Berloo et al., 2008). By contrast, LD decay in wild species such as Solanum pimpinellifolium can occur in intervals as small as 18 kb (Lin et al., 2019). The marker panel we used was optimized for polymorphic information content and genomic coverage. Markers were deliberately selected in euchromatic segments of the genome, which exhibit higher rates of recombination relative to centromeric regions. We then used both single marker and haplotype approaches where the point of analysis was pairs of markers organized by physical position. Ultimately, both the single marker and haplotype approaches yielded similar results, suggesting that marker coverage was sufficient to discover major effect QTL given the limitations of population size and recombination. The detection of similar major QTL on chromosomes 3, 10, and 11 in a segregating population provides additional confidence and validates our initial findings.

Previous studies seeking to map genes involved in the biosynthesis of tomato steroidal alkaloids have found QTL on all chromosomes of the tomato genome (Alseekh et al., 2015), but the majority of reports highlight chromosomes 2, 3, 7, 10, and 12 (Baldina et al., 2016; Ballester et al., 2016; Itkin et al., 2013; Zhu et al., 2018). The GAME genes have been found primarily on chromosomes 7 and 12 (Itkin et al., 2013), along with more recent reports highlighting chromosomes 1 (Cárdenas et al., 2016) and 2 (Cárdenas et al., 2019). A previous mQTL study identified multiple candidate genes in a sweep region at the end of chromosome 3 (Zhu et al., 2018). Among these candidates were two ethylene-responsive transcription factors. Many of the steps in the steroidal alkaloid biosynthetic pathway that have been elucidated are modulated by hormones, such as ethylene, during ripening (Iijima et al., 2009, 2008; Itkin et al., 2011). Given that multiple steroidal alkaloids in our population appear to be affected by the same QTL, such as the one on chromosome 3, a transcription factor that can affect multiple steps in the biosynthetic pathway may be responsible. The quantitative trait loci discovered on chromosome 10 in our validation population are near the beginning of that chromosome. A sweep region associated with an acetyltransferase, a cytochrome P450, an acyl-CoA dehydrogenase, and seven UDP-glucosyltransferases has been previously defined on the opposite end of chromosome 10 (Zhu et al., 2018). Therefore, we hypothesize that our QTL is unique to our population, and additional efforts are needed to fine map this region on chromosome 10.

Tomato steroidal alkaloids exhibit a wide range of diversity in the red-fruited germplasm we examined. While once thought to degrade after ripening (Rick et al., 1994), we calculated average concentrations of total steroidal alkaloids upwards of 24 mg/100 g fresh weight, indicating that not only are they present in ripe fruit, but they can be found in high concentrations relative to other secondary metabolites. Correlation analyses demonstrated that the relationship among steroidal alkaloids throughout the pathway is unique to each sub-population in our study. In particular, wild cherry accessions used in our study exhibited a tradeoff between early and late pathway intermediates. Genetic analyses revealed that biosynthesis of these compounds is under distinct genetic control, where the concentrations of early and late intermediates in the pathway are associated with unique regions of the tomato genome. Some of the QTL discovered in this study appear to be unique to the population studied, and fine mapping experiments are needed to determine the exact gene(s) contributing to phenotypes such as accumulating high amounts of alpha-tomatine in ripe fruits.

5.5 Materials and Methods

5.5.6 Plant Material

A panel of 107 tomato accessions from public breeding programs, the C.M. Rick Tomato Genetics Resource Center (TGRC), and the United States Department of Agriculture (USDA) National Plant Germplasm System (NPGS) was assembled. Selection criteria included known patterns of genetic diversity as determined by genotyping using a 7,720 SNP array (Blanca et al., 2015; S.-C. Sim et al., 2012b), previous phenotypic information for alkaloid content (Rick et al., 1994), and geographic origin. Geographic distribution of wild tomato species included in this diversity panel is visualized in Supplemental Figure 1. To maximize genetic diversity within the red-fruited Solanum species, selections included Solanum pimpinellifolium, Solanum lycopersicum var. cerasiforme (both wild and cultivated cherry tomatoes), and processing varieties of Solanum lycopersicum. Sub-population sizes were chosen based on rarefaction analysis (Blanca et al., 2015). These data allowed for the calculation of minimum sample sizes needed to capture at least 85% of the genetic variation in each sub-population. This approach resulted in four groups of previously classified material: 25 accessions of S. pimpinellifolium, 37 accessions of S. lycopersicum var. cerasiforme (including six cultivated cherry varieties), and 39 accessions of S. lycopersicum (Blanca et al., 2015). The S. lycopersicum selections included processing varieties representing the diversity present in this market class (Blanca et al., 2015; S.-C. Sim et al., 2012b). Within each species, samples were selected from each of the different sub-groups previously identified (Blanca et al., 2015). Heinz 1706 (LA4345) and M82 (LA3475) were included as part of the cultivated processing tomatoes to serve as standard reference material due to their widespread and historical use in plant biology research. Finally, six S. lycopersicum x S. lycopersicum var. cerasiforme hybrids (wide-cross hybrids) were included that served as the progenitors of an inbred backcross mapping population.

5.5.7 Field Trial

Plants were grown in three field environments during the summers of 2017 and 2018 at the North Central Agricultural Research Station in Fremont, OH. The first field environment was planted in summer 2017 (5/24/2017), while the remaining two (early planting, 5/29/2018; late planting, 6/18/2018) were grown during 2018. Environments are differentiated by time (year and/or planting date) and geophysical conditions such as soil composition. In all three environments, plants were grown in a randomized complete block design with two blocks per environment (total n=642). Within each environment, plants were grown in plots comprised of six to ten plants, and samples were an aggregate of fruits from all plants within a plot, excluding individuals at the ends of each plot. Red ripe fruits were harvested over a three-week period from each environment and stored whole at -40 °C until analysis.
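The plot count above follows from the design: 107 genotypes, each appearing once per block, in two blocks within each of three environments. The sketch below generates such a layout in Python. The genotype labels, environment names, seed, and the `rcbd_layout` helper are hypothetical and used only for illustration; the randomization procedure actually used in the field is not described in the text.

```python
import random

def rcbd_layout(genotypes, environments, n_blocks, seed=0):
    """Return (environment, block, plot, genotype) tuples for a randomized
    complete block design: every genotype appears exactly once per block,
    in an independently shuffled order."""
    rng = random.Random(seed)  # fixed seed so the illustrative layout is reproducible
    layout = []
    for env in environments:
        for block in range(1, n_blocks + 1):
            order = list(genotypes)
            rng.shuffle(order)
            for plot, genotype in enumerate(order, start=1):
                layout.append((env, block, plot, genotype))
    return layout

# Hypothetical labels standing in for the 107 accessions and three environments.
genotypes = [f"G{i:03d}" for i in range(1, 108)]
environments = ["2017", "2018_early", "2018_late"]

layout = rcbd_layout(genotypes, environments, n_blocks=2)
print(len(layout))  # 107 genotypes x 3 environments x 2 blocks
```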

5.5.8 Inbred Backcross Population Development

Three accessions of tomato containing high levels of alpha-tomatine in red-ripe fruits (LA2213, LA2256, and LA2262) as well as one processing (OH8243) and one cherry (Tainan) variety were cultivated under glasshouse conditions during spring 2017 and crossed. Seedlings from the resulting 6 sets of F1 progeny as well as parental material were included in the 2017 and 2018 field trials and profiled for steroidal alkaloids as described below. Of all crosses made, F1 individuals from OH8243 x LA2213 were backcrossed to the processing tomato recurrent parent (OH8243) and 200 individual BC1 plants were grown out under glasshouse conditions during spring 2018. BC1S1 progeny were then field grown during summer 2018, and BC1S1 individuals were genotyped and profiled for steroidal alkaloids as described below.

5.5.9 Chemical Reagents

Acetonitrile, formic acid, isopropanol, methanol, and water were of LC-MS grade and purchased from Fisher Scientific (Pittsburgh, PA). Alpha-tomatine (≥90% purity) and solanidine (≥99% purity) were purchased from Extrasynthese (Genay, France). Alpha-solanine (≥95% purity) and tomatidine (≥95% purity) were purchased from Sigma Aldrich (St. Louis, MO).

5.5.10 Steroidal Alkaloid Profiling

Steroidal alkaloids were extracted and quantified, as previously described, using a rapid extraction and ultra-high performance liquid chromatography tandem mass spectrometry (UHPLC-MS/MS) (Dzakovich et al., 2020).

5.5.11 Genotyping

DNA was obtained from immature true leaves of seedlings using a hexadecyltrimethylammonium bromide (CTAB) procedure optimized for tomato (Sim et al., 2015) and scaled to accommodate 96-tube racks. Leaf samples were homogenized with 60 µL of 5% Sarkosyl, 150 µL of CTAB lysis buffer (2 M NaCl, 0.2 M Tris, 0.05 M EDTA, and 2% CTAB maintained at pH 7.5), and 150 µL of extraction buffer (0.35 M sorbitol, 0.1 M Tris, 25 mM sodium bisulfite, and 5 mM EDTA maintained at pH 7.5) in 1.2 mL tubes. Two 4 mm metal beads were added to each tube prior to sealing and the rack was shaken at 200 rpm in a GenoGrinder (OPS Diagnostics, Lebanon, NJ) for two min. Racks were then placed in a 65 °C incubator for 20 min and then left to cool at room temperature for an additional 10 min. Each tube was then spiked with 350 µL of chloroform:isoamyl alcohol (24:1) and phase separation was induced by centrifuging plates for 10 min at 5,000 x g. Aqueous phases from each sample were transferred to a corresponding well on a fresh 96-well plate and DNA was precipitated by adding 110 µL of isopropyl alcohol to each well and centrifuging for an additional 15 min at 5,000 x g. Plates were then uncovered and left to air dry for 30 min and DNA was resuspended in 100 µL of Tris-EDTA buffer (10 mM Tris and 0.1 mM EDTA).

To genotype both the diversity panel and BC1S1 population, 384 single nucleotide polymorphism (SNP) markers created by the Solanaceae Coordinated Agricultural Project (SolCAP) were utilized (Hamilton et al., 2012; S.-C. Sim et al., 2012a, 2012b). Genetic positions based on Sim and colleagues (2012a) and established patterns of linkage disequilibrium decay in Solanum lycopersicum (Robbins et al., 2011) were used to select SNPs distributed across all 12 chromosomes in regions of high recombination. Gaps in genome coverage were filled in by selecting high PIC markers based on physical position. Genotyping was conducted on 97 members of the diversity germplasm and 179 BC1S1 individuals using the PlexSeq™ platform (Agriplex Genomics, Cleveland, OH).

5.5.12 Statistical Analysis

All statistical analyses and data visualization were conducted using R version 3.5.1 (R Development Core Team, 2018). Based on our experimental design, the following linear model was used to determine the effects of genetics and environment on steroidal alkaloid concentrations:

$$Y_{ijk} = \mu + G_i + E_j + B_k(E_j) + G_iE_j + \varepsilon_{ijk}$$

Analysis of variance (ANOVA) using both fixed and random effects models was used to determine significance of model parameters and calculate variance estimates, respectively. In the model above, $Y_{ijk}$ represents the estimate for a given analyte, $\mu$ represents the population mean of a given analyte, $G_i$ represents the contribution due to genetic factors, $E_j$ represents the contribution due to environmental factors, $B_k(E_j)$ represents within environment effects, $G_iE_j$ represents the interaction between genetic and environmental factors, and $\varepsilon_{ijk}$ represents the residual error. Variance estimates were calculated using the package “lme4” (Bates et al., 2015). Broad sense heritability ($H^2$) was calculated according to previously described methods (Cotterill, 1987):

$$H^2 = \frac{\sigma^2_{G_i}}{\sigma^2_{G_i} + \dfrac{\sigma^2_{\varepsilon_{ijk}}}{n \cdot r} + \dfrac{\sigma^2_{G_iE_j}}{n}}$$

where $\sigma^2$ represents the variance, $n$ is equal to the number of environments (three), $r$ is the number of reps within an environment (two), and all other variables are as described above.

Reliability ($i^2$) was calculated with the previously defined model (Bernardo, 2020):

$$i^2 = \frac{\sigma^2_{G_i}}{\sigma^2_{G_i} + \sigma^2_{E_j} + \sigma^2_{B_k(E_j)} + \sigma^2_{G_iE_j} + \sigma^2_{\varepsilon_{ijk}}}$$
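As a rough numerical check of the two definitions above, the sketch below plugs the alpha-tomatine variance percentages from Table 5.1 into the heritability and reliability formulas. Because both quantities are ratios, percentages of total variance can stand in for the variance components themselves. The function and argument names are illustrative assumptions; the original analysis was run in R with lme4, and Python is used here only for illustration.

```python
def broad_sense_heritability(var_g, var_ge, var_resid, n_env, n_rep):
    """H^2 = var_G / (var_G + var_resid / (n * r) + var_GxE / n)."""
    return var_g / (var_g + var_resid / (n_env * n_rep) + var_ge / n_env)

def reliability(var_g, var_e, var_within, var_ge, var_resid):
    """i^2 = var_G divided by the sum of all variance components."""
    return var_g / (var_g + var_e + var_within + var_ge + var_resid)

# Alpha-tomatine row of Table 5.1 (percent of total variance).
alpha_tomatine = {
    "var_g": 67.7,       # Genotype
    "var_e": 0.2,        # Environment
    "var_within": 16.3,  # Within Environment
    "var_ge": 0.0,       # Genotype by Environment
    "var_resid": 15.8,   # Residual
}

h2 = broad_sense_heritability(
    alpha_tomatine["var_g"],
    alpha_tomatine["var_ge"],
    alpha_tomatine["var_resid"],
    n_env=3,  # three field environments
    n_rep=2,  # two blocks per environment
)
i2 = reliability(**alpha_tomatine)

print(round(h2, 2), round(i2, 2))
```

Under these assumptions the computed values round to the heritability and reliability reported for alpha-tomatine in Table 5.1.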

To account for environmental factors and missing data, regression coefficients were extracted from each fixed effects linear model for every genotype and used to generate covariance matrices used for principal components analysis (PCA) and correlation analyses. Population structure and phenotypic variation were visualized using PCA with the packages “FactoMineR” and “factoextra” using a correlation matrix (each variable scaled to a mean of zero and standard deviation of one) as the input data (Lê et al., 2008). Boxplots and maps displaying the geographic origin of germplasm studied were generated using ggplot2, ggmap, and maptools (Bivand and Lewin-Koh, 2018; Kahle and Wickham, 2013; Wickham, 2009). Correlations among analytes were visualized using corrplot (Wei and Simko, 2017).

5.5.13 GWAS

Association analysis was conducted using the R package “rrBLUP” with GWAS utilities (Endelman, 2011). SNP calls were converted to numerical scores, where 1 was homozygous for the common allele of our reference variety OH8243 (Berry and Gould, 1988), -1 was homozygous for the alternative allele, and 0 was heterozygous. We used the unified mixed model to test for marker-trait associations (Yu et al., 2006). This model contains a kinship matrix (K), considered a random effect, and fixed effects for population structure (Q). The K matrix was obtained from the A.mat function in rrBLUP. The Q matrix was developed from principal components analysis (PCA), as described (Price et al., 2006). Briefly, a covariance matrix was calculated from the standardized n × M (number of lines × number of markers) matrix (N) by multiplying it by its transpose ($NN^{\top}$). The resulting n × n covariance matrix was then used as the input for PCA. We examined the effects of using one to three principal components (PCs) by comparing marker-trait models fitted with the lm() function and by comparing the Bayesian information criterion (BIC) across models. For example, the naïve model $Y = \mathrm{Marker}\ (M)$ was compared to $Y = PC1 + M$, and the model with PC1 was in turn compared to $Y = PC1 + PC2 + M$ and $Y = PC1 + PC2 + PC3 + M$ to test the significance of adding additional PCs to the model. The results of genome-wide analysis with QK models were visualized by graphing -log(P) against genome position using ggplot2 (Wickham, 2016).
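The marker scoring and the line-by-line matrix used as input for PCA can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the rrBLUP workflow used in the study: the genotype calls and reference alleles are hypothetical, all function names are invented for the example, and the product described above is read as the standardized matrix multiplied by its transpose.

```python
from statistics import mean, pstdev


def code_genotype(call, ref_allele):
    """Score a diploid SNP call relative to the reference allele.

    1 = homozygous reference, -1 = homozygous alternative, 0 = heterozygous.
    """
    first, second = call[0], call[1]
    if first != second:
        return 0
    return 1 if first == ref_allele else -1


def standardize_columns(matrix):
    """Scale every marker (column) to mean zero and unit standard deviation."""
    scaled_columns = []
    for column in zip(*matrix):
        center, spread = mean(column), pstdev(column)
        # A monomorphic marker has zero spread and is set to zeros.
        scaled_columns.append(
            [(x - center) / spread if spread > 0 else 0.0 for x in column]
        )
    return [list(row) for row in zip(*scaled_columns)]


def line_by_line_matrix(matrix):
    """Multiply an n x M matrix by its transpose to obtain an n x n matrix."""
    return [
        [sum(a * b for a, b in zip(row_i, row_j)) for row_j in matrix]
        for row_i in matrix
    ]


# Hypothetical calls for three lines at four markers (illustration only).
ref_alleles = ["A", "C", "G", "T"]
calls = [
    ["AA", "CC", "GG", "TT"],
    ["AA", "CT", "AA", "TT"],
    ["GG", "CC", "GA", "CC"],
]

coded = [
    [code_genotype(call, ref) for call, ref in zip(row, ref_alleles)]
    for row in calls
]
covariance = line_by_line_matrix(standardize_columns(coded))
for row in covariance:
    print([round(value, 3) for value in row])
```

The resulting matrix is symmetric with one row and one column per line, which is the shape required for extracting principal components that describe relationships among lines rather than among markers.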

In addition to the single-marker approach described above, we used a haplotype-based association analysis to shift the unit of analysis to pairs of markers (Sim et al., 2015). Briefly, markers were ordered by physical position, and haplotypes were created by iteratively grouping each pair while moving across the chromosome in a sliding window. Linear models for each pair were run by modeling the interaction between the two markers in a haplotype and accounting for population structure by including PC1.
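The sliding-window construction of marker pairs can be sketched as follows. This Python example assumes a window of two neighboring markers that advances one marker at a time, which is one plausible reading of the description above. Marker names, positions, and scores are hypothetical, and the subsequent linear model (marker interaction plus PC1) is not reproduced here.

```python
def adjacent_marker_pairs(marker_positions):
    """Order markers by physical position and pair each with its next neighbor."""
    ordered = sorted(marker_positions, key=marker_positions.get)
    return list(zip(ordered, ordered[1:]))


def haplotypes_for_pair(pair, scores_by_line):
    """Return the two-marker haplotype (as a tuple of scores) for every line."""
    left, right = pair
    return {
        line: (scores[left], scores[right])
        for line, scores in scores_by_line.items()
    }


# Hypothetical markers on one chromosome (positions in base pairs).
marker_positions = {"snp_c": 3_200_000, "snp_a": 1_050_000, "snp_b": 1_900_000}

# Hypothetical numerical scores (1, 0, -1) for three lines.
scores_by_line = {
    "line_1": {"snp_a": 1, "snp_b": 1, "snp_c": -1},
    "line_2": {"snp_a": 1, "snp_b": -1, "snp_c": -1},
    "line_3": {"snp_a": -1, "snp_b": 0, "snp_c": 1},
}

pairs = adjacent_marker_pairs(marker_positions)
for pair in pairs:
    print(pair, haplotypes_for_pair(pair, scores_by_line))
```

With M ordered markers on a chromosome, this construction yields M - 1 overlapping pairs, so each interior marker contributes to two haplotype tests.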

5.5.14 QTL Analysis

BC1S1 progeny were analyzed using an ANOVA approach with the following model:

$$Y_i = \mu + M_i + \varepsilon_i$$

Where $Y_i$ is the vector of steroidal alkaloid concentrations and $M_i$ represents the effect of a given marker. Significant marker-trait associations were defined at P < 0.01. This critical value was chosen to protect against type II error, given that identified QTL were being cross-validated against a separate population. The effect of an allele substitution, the proportion of variance explained ($r^2$), and P-values were summarized as a description of the QTL identified.
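The single-marker model above amounts to a one-way ANOVA of phenotype on marker genotype class. The Python sketch below computes the F statistic and the proportion of variance explained for one marker. The genotype scores and concentrations are hypothetical, the function name is invented for the example, and the P-value is omitted because it requires the F distribution, which is not available in the Python standard library. It is not the software used in this study.

```python
from collections import defaultdict


def single_marker_anova(genotypes, phenotypes):
    """One-way ANOVA of phenotype on marker genotype class.

    Returns (F statistic, r^2), where r^2 = SS_marker / SS_total.
    Assumes at least two genotype classes and nonzero residual variation.
    """
    groups = defaultdict(list)
    for genotype, value in zip(genotypes, phenotypes):
        groups[genotype].append(value)

    n = len(phenotypes)
    grand_mean = sum(phenotypes) / n
    ss_total = sum((y - grand_mean) ** 2 for y in phenotypes)
    ss_marker = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    ss_resid = ss_total - ss_marker

    df_marker = len(groups) - 1
    df_resid = n - len(groups)
    f_stat = (ss_marker / df_marker) / (ss_resid / df_resid)
    r2 = ss_marker / ss_total
    return f_stat, r2


# Hypothetical marker scores and analyte concentrations for six progeny.
genotypes = [1, 1, 0, 0, -1, -1]
phenotypes = [10.0, 12.0, 7.0, 9.0, 4.0, 6.0]

f_stat, r2 = single_marker_anova(genotypes, phenotypes)
print(f"F = {f_stat:.2f}, r2 = {r2:.3f}")
```

In this toy data set the class means are 11, 8, and 5, so the marker sum of squares is 36 of a total of 42, giving F = 9 on 2 and 3 degrees of freedom.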

Acknowledgments

We thank the crew at the North Central Agriculture Research Station, particularly Matt Hoffelich and Frank Thayer. We would also like to thank the Ohio State Department of Food Science and Technology for providing freezer space to facilitate this study, as well as Troy Aldrich and Jiheun Cho for helping coordinate planting and seed saving.

Financial support was provided by an Ohio Agricultural Research and Development Center Early Career Investigator Award, a USDA National Needs Fellowship (2014-38420-21844) and Hatch funds (OHO01470), a Foundation for Food and Agricultural Research New Innovator Award, and Foods for Health, a focus area of the Discovery Themes at OSU.

Bibliography

Abate-Pella, D., Freund, D.M., Slovin, J.P., Hegeman, A.D., Cohen, J.D., 2017. An

improved method for fast and selective separation of carotenoids by LC–MS. J.

Chromatogr. B 1067, 34–37. doi:10.1016/J.JCHROMB.2017.09.039

Abdelkareem, A., Thagun, C., Nakayasu, M., Mizutani, M., Hashimoto, T., Shoji, T.,

2017. Jasmonate-induced biosynthesis of steroidal glycoalkaloids depends on COI1

proteins in tomato. Biochem. Biophys. Res. Commun. 94.

doi:10.1016/j.bbrc.2017.05.132

Acharjee, A., Kloosterman, B., de Vos, R.C.H., Werij, J.S., Bachem, C.W.B., Visser,

R.G.F., Maliepaard, C., 2011. Data integration and network reconstruction with

~omics data using Random Forest regression in potato. Anal. Chim. Acta 705, 56–

63. doi:10.1016/j.aca.2011.03.050

Adams, C.M., Ebert, S.M., Dyle, M.C., 2015. Use of mRNA expression signatures to

discover small molecule inhibitors of skeletal muscle atrophy. Curr. Opin. Clin.

Nutr. Metab. Care. doi:10.1097/MCO.0000000000000159

Afendi, F.M., Okada, T., Yamazaki, M., Hirai-Morita, A., Nakamura, Y., Nakamura, K.,

Ikeda, S., Takahashi, H., Altaf-Ul-Amin, M., Darusman, L.K., Saito, K., Kanaya, S.,

2012. KNApSAcK Family Databases: Integrated Metabolite–Plant Species

Databases for Multifaceted Plant Research. Plant Cell Physiol. 53, e1–e1.

doi:10.1093/PCP/PCR165

Akiyama, R., Lee, H.J., Nakayasu, M., Osakabe, K., Osakabe, Y., Umemoto, N., Saito,

K., Muranaka, T., Sugimoto, Y., Mizutani, M., 2019. Characterization of steroid 5α-

reductase involved in α-tomatine biosynthesis in tomatoes. Plant Biotechnol.

doi:10.5511/plantbiotechnology.19.1030a

Al-Babili, S., Hugueney, P., Schledz, M., Welsch, R., Frohnmeyer, H., Laule, O., Beyer,

P., 2000. Identification of a novel gene coding for neoxanthin synthase from

Solanum tuberosum. FEBS Lett. 485, 168–172. doi:10.1016/S0014-5793(00)02193-1

Alonso, A., Marsal, S., Julià, A., 2015. Analytical methods in untargeted metabolomics:

State of the art in 2015. Front. Bioeng. Biotechnol. doi:10.3389/fbioe.2015.00023

Alseekh, S., Fernie, A.R., 2018. Metabolomics 20 years on: what have we learned and

what hurdles remain? Plant J. 94, 933–942. doi:10.1111/tpj.13950

Alseekh, S., Tohge, T., Wendenberg, R., Scossa, F., Omranian, N., Li, J., Kleessen, S.,

Giavalisco, P., Pleban, T., Mueller-Roeber, B., Zamir, D., Nikoloski, Z., Fernie,

A.R., 2015. Identification and mode of inheritance of quantitative trait loci for

secondary metabolite abundance in tomato. Plant Cell 27, 485–512.

doi:10.1105/tpc.114.132266

Amir, H., Karas, M., Giat, J., Danilenko, M., Levy, R., Yermiahu, T., Levy, J., Sharoni,

Y., 1999. Lycopene and 1,25-dihydroxyvitamin D3 cooperate in the inhibition of cell

cycle progression and induction of differentiation in HL-60 leukemic cells. Nutr.

Cancer 33, 105–112. doi:10.1080/01635589909514756

Andrews, S., 2015. FASTQC A Quality Control tool for High Throughput Sequence

Data. Babraham Inst.

Arathi, B.P., Sowmya, P.R.R., Vijay, K., Dilshad, P., Saikat, B., Gopal, V.,

Lakshminarayana, R., 2015. An Improved Method of UPLC-PDA-MS/MS Analysis of Lycopene

Isomers. Food Anal. Methods 8, 1962–1969. doi:10.1007/s12161-014-0083-5

Arranz, S., Martínez-Huélamo, M., Vallverdu-Queralt, A., Valderas-Martinez, P., Illán,

M., Sacanella, E., Escribano, E., Estruch, R., Lamuela-Raventos, R.M., 2015.

Influence of olive oil on carotenoid absorption from tomato juice and effects on

postprandial lipemia. Food Chem. 168, 203–10.

doi:10.1016/j.foodchem.2014.07.053

Aust, O., Stahl, W., Sies, H., Tronnier, H., Heinrich, U., 2005. Supplementation with

Tomato-Based Products Increases Lycopene, Phytofluene, and Phytoene Levels in

Human Serum and Protects Against UV-light-induced Erythema. Int. J. Vitam. Nutr.

Res. 75, 54–60. doi:10.1024/0300-9831.75.1.54

Baffy, G., Brunt, E.M., Caldwell, S.H., 2012. Hepatocellular carcinoma in non-alcoholic

fatty liver disease: an emerging menace. J. Hepatol. 56, 1384–91.

doi:10.1016/j.jhep.2011.10.027

Baginsky, S., Hennig, L., Zimmermann, P., Gruissem, W., 2010. Gene expression

analysis, proteomics, and network discovery. Plant Physiol. 152, 402–10.

doi:10.1104/pp.109.150433

Bai, Y., Lindhout, P., 2007. Domestication and breeding of tomatoes: what have we

gained and what can we gain in the future? Ann. Bot. 100, 1085–94.

doi:10.1093/aob/mcm150

Baldina, S., Picarella, M.E., Troise, A.D., Pucci, A., Ruggieri, V., Ferracane, R., Barone,

A., Fogliano, V., Mazzucato, A., 2016. Metabolite Profiling of Italian Tomato

Landraces with Different Fruit Types. Front. Plant Sci. 7, 664. doi:10.3389/fpls.2016.00664

Ballester, A.-R., Tikunov, Y., Molthoff, J., Grandillo, S., Viquez-Zamora, M., de Vos, R.,

de Maagd, R.A., van Heusden, S., Bovy, A.G., 2016. Identification of Loci

Affecting Accumulation of Secondary Metabolites in Tomato Fruit of a Solanum

lycopersicum × Solanum chmielewskii Introgression Line Population. Front. Plant

Sci. 7, 1428. doi:10.3389/fpls.2016.01428

Barchi, L., Pietrella, M., Venturini, L., Minio, A., Toppino, L., Acquadro, A., Andolfo,

G., Aprea, G., Avanzato, C., Bassolino, L., Comino, C., Molin, A.D., Ferrarini, A.,

Maor, L.C., Portis, E., Reyes-Chin-Wo, S., Rinaldi, R., Sala, T., Scaglione, D.,

Sonawane, P., Tononi, P., Almekias-Siegl, E., Zago, E., Ercolano, M.R., Aharoni,

A., Delledonne, M., Giuliano, G., Lanteri, S., Rotino, G.L., 2019. A chromosome-

anchored eggplant genome sequence reveals key events in Solanaceae evolution.

Sci. Rep. 9, 1–13. doi:10.1038/s41598-019-47985-w

Barnett, D.W., Garrison, E.K., Quinlan, A.R., Strömberg, M.P., Marth, G.T., 2011.

Bamtools: A C++ API and toolkit for analyzing and managing BAM files.

Bioinformatics. doi:10.1093/bioinformatics/btr174

Bates, D., Mächler, M., Bolker, B., Walker, S., 2015. Fitting Linear Mixed-Effects

Models Using lme4. J. Stat. Softw. 67, 1–48. doi:10.18637/jss.v067.i01

Bednarz, H., Roloff, N., Niehaus, K., 2019. Mass Spectrometry Imaging of the Spatial

and Temporal Localization of Alkaloids in Nightshades. J. Agric. Food Chem.

doi:10.1021/acs.jafc.9b01155

Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N., Golani, I., 2001. Controlling the false

discovery rate in behavior genetics research. Behav. Brain Res. 125, 279–284. doi:10.1016/S0166-4328(01)00297-2

Bernardo, R., 2020. Reinventing quantitative genetics for plant breeding: something old,

something new, something borrowed, something BLUE. Heredity (Edinb).

doi:10.1038/s41437-020-0312-1

Berry, S.Z., Gould, W.A., 1988. “Ohio 8243” processing tomato. Hortscience 23, 930.

Bijttebier, S., D’Hondt, E., Noten, B., Hermans, N., Apers, S., Voorspoels, S., 2014.

Ultra high performance liquid chromatography versus high performance liquid

chromatography: Stationary phase selectivity for generic carotenoid screening. J.

Chromatogr. A 1332, 46–56. doi:10.1016/J.CHROMA.2014.01.042

Bivand, R., Lewin-Koh, N., 2018. maptools: Tools for Handling Spatial Objects.

Blanca, J., Cañizares, J., Cordero, L., Pascual, L., Diez, M.J., Nuez, F., 2012. Variation

revealed by SNP genotyping and morphology provides insight into the origin of the

tomato. PLoS One 7, e48198. doi:10.1371/journal.pone.0048198

Blanca, J., Montero-Pau, J., Sauvage, C., Bauchet, G., Illa, E., Díez, M.J., Francis, D.,

Causse, M., van der Knaap, E., Cañizares, J., 2015. Genomic variation in tomato,

from wild ancestors to contemporary breeding accessions. BMC Genomics 16, 257.

doi:10.1186/s12864-015-1444-1

Bohn, T., Blackwood, M., Francis, D.M., Tian, Q., Schwartz, S.J., Clinton, S.K., 2013.

Bioavailability of Phytochemical Constituents From a Novel Soy Fortified Lycopene

Rich Tomato Juice Developed for Targeted Cancer Prevention Trials. Nutr. Cancer 65.

doi:10.1080/01635581.2011.630156

Boileau, T.W.-M., 2003. Prostate Carcinogenesis in N-methyl-N-nitrosourea (NMU)-

Testosterone-Treated Rats Fed Tomato Powder, Lycopene, or Energy-Restricted

Diets. CancerSpectrum Knowl. Environ. 95, 1578–1586. doi:10.1093/jnci/djg081

Boileau, T.W.M., Liao, Z., Kim, S., Lemeshow, S.A., Erdman Jr., J.W., Clinton, S.K.,

2003. Prostate carcinogenesis in N-methyl-N-nitrosourea (NMU)-testosterone-

treated rats fed tomato powder, lycopene, or energy-restricted diets. J. Natl. Cancer

Inst. 95, 1578–1586. doi:10.1093/jnci/djg081

Bolger, A.M., Lohse, M., Usadel, B., 2014. Trimmomatic: A flexible trimmer for

Illumina sequence data. Bioinformatics. doi:10.1093/bioinformatics/btu170

Bone, R.A., Landrum, J.T., Tarsis, S.L., 1985. Preliminary identification of the human

macular pigment. Vision Res. 25, 1531–1535. doi:10.1016/0042-6989(85)90123-3

Bosch, F.X., Ribes, J., Borràs, J., 1999. Epidemiology of primary liver cancer. Semin.

Liver Dis. 19, 271–85. doi:10.1055/s-2007-1007117

Bosch, F.X., Ribes, J., Díaz, M., Cléries, R., 2004. Primary liver cancer: Worldwide

incidence and trends. Gastroenterology 127, S5–S16.

doi:10.1053/j.gastro.2004.09.011

Bouhaddani, S. El, Houwing-Duistermaat, J., Salo, P., Perola, M., Jongbloed, G., Uh, H.-

W., 2016. Evaluation of O2PLS in Omics data integration. BMC Bioinformatics 17

Suppl 2, 11. doi:10.1186/s12859-015-0854-z

Bouvier, F., Hugueney, P., d’Harlingue, A., Kuntz, M., Camara, B., 1994. Xanthophyll

biosynthesis in chromoplasts: isolation and molecular cloning of an enzyme

catalyzing the conversion of 5,6-epoxycarotenoid into ketocarotenoid. Plant J. 6, 45–54. doi:10.1046/j.1365-313X.1994.6010045.x

Bovy, A., Vos, R. De, Kemper, M., Schijlen, E., Pertejo, M.A., Muir, S., Collins, G.,

Robinson, S., Verhoeyen, M., Hughes, S., Santos-Buelga, C., Tunen, A. Van, 2002.

High-Flavonol Tomatoes Resulting from the Heterologous Expression of the Maize

Transcription Factor Genes. Plant Cell 14, 2509–2526. doi:10.1105/tpc.004218

Bradley, P.H., Brauer, M.J., Rabinowitz, J.D., Troyanskaya, O.G., 2009. Coordinated

concentration changes of transcripts and metabolites in Saccharomyces cerevisiae.

PLoS Comput. Biol. 5, e1000270. doi:10.1371/journal.pcbi.1000270

Bramley, P.M., 2002. Regulation of carotenoid formation during tomato fruit ripening

and development. J. Exp. Bot. 53, 2107–2113. doi:10.1093/jxb/erf059

Bramley, P.M., 1992. Analysis of carotenoids by high performance liquid

chromatography and diode-array detection. Phytochem. Anal. 3, 97–104.

doi:10.1002/pca.2800030302

Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.

doi:10.1023/A:1010933404324

Britton, G., 1996. Carotenoids, in: Natural Food Colorants. Springer US, Boston, MA,

pp. 197–243. doi:10.1007/978-1-4615-2155-6_7

Britton, G., Liaaen-Jensen, S., Pfander, H., 2008. Carotenoids, Vol. 4: Natural Functions.

Springer Science & Business Media.

Broadhurst, D., Goodacre, R., Reinke, S.N., Kuligowski, J., Wilson, I.D., Lewis, M.R.,

Dunn, W.B., 2018. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted

clinical metabolomic studies. Metabolomics. doi:10.1007/s11306-018-1367-3

Burton-Freeman, B.M., Sesso, H.D., 2014. Whole Food versus Supplement: Comparing

the Clinical Evidence of Tomato Intake and Lycopene Supplementation on

Cardiovascular Risk Factors. Adv. Nutr. 5, 457–485. doi:10.3945/an.114.005231

Cadigan, K.M., Nusse, R., 1997. Wnt signaling: a common theme in

animal development. Genes Dev. 11, 3286–3305. doi:10.1101/gad.11.24.3286

Caprioli, G., Cahill, M., Logrippo, S., James, K., 2015. Elucidation of the mass

fragmentation pathways of tomatidine and β1-hydroxytomatine using orbitrap mass

spectrometry. Nat. Prod. Commun. 10, 575–576. doi:10.1177/1934578x1501000409

Caprioli, G., Cahill, M.G., James, K.J., 2014. Mass Fragmentation Studies of α-Tomatine

and Validation of a Liquid Chromatography LTQ Orbitrap Mass Spectrometry

Method for Its Quantification in Tomatoes. Food Anal. Methods 7, 1565–1571.

doi:10.1007/s12161-013-9771-9

Cárdenas, P.D., Sonawane, P.D., Heinig, U., Bocobza, S.E., Burdman, S., Aharoni, A.,

2015. The bitter side of the nightshades: Genomics drives discovery in Solanaceae

steroidal alkaloid metabolism. Phytochemistry 113, 24–32.

doi:10.1016/j.phytochem.2014.12.010

Cárdenas, P.D., Sonawane, P.D., Heinig, U., Jozwiak, A., Panda, S., Abebie, B.,

Kazachkova, Y., Pliner, M., Unger, T., Wolf, D., Ofner, I., Vilaprinyo, E., Meir, S.,

Davydov, O., Gal-on, A., Burdman, S., Giri, A., Zamir, D., Scherf, T., Szymanski,

J., Rogachev, I., Aharoni, A., 2019. Pathways to defense metabolites and evading

fruit bitterness in genus Solanum evolved through 2-oxoglutarate-dependent dioxygenases. Nat. Commun. 10, 5169. doi:10.1038/s41467-019-13211-4

Cárdenas, P.D., Sonawane, P.D., Pollier, J., Vanden Bossche, R., Dewangan, V.,

Weithorn, E., Tal, L., Meir, S., Rogachev, I., Malitsky, S., Giri, A.P., Goossens, A.,

Burdman, S., Aharoni, A., 2016. GAME9 regulates the biosynthesis of steroidal

alkaloids and upstream isoprenoids in the plant mevalonate pathway. Nat. Commun.

7, 10654. doi:10.1038/ncomms10654

Cassidy, A., Minihane, A.M., 2017. The role of metabolism (and the microbiome) in

defining the clinical efficacy of dietary flavonoids. Am. J. Clin. Nutr. 105, 10–22.

doi:10.3945/ajcn.116.136051

Cavill, R., Jennen, D., Kleinjans, J., Briedé, J.J., 2015. Transcriptomic and metabolomic

data integration. Brief. Bioinform. bbv090. doi:10.1093/bib/bbv090

Cayen, M., 1971. Effect of dietary tomatine on cholesterol metabolism in the rat. J. Lipid

Res. 12, 482–490.

Chahar, M.K., Sharma, N., Dobhal, M.P., Joshi, Y.C., 2011. Flavonoids: A versatile

source of anticancer drugs. Pharmacogn. Rev. 5, 1–12. doi:10.4103/0973-7847.79093

Chandler, L.A., Schwartz, S.J., 1987. HPLC Separation of Cis-Trans Carotene Isomers in

Fresh and Processed Fruits and Vegetables. J. Food Sci. 52, 669–672.

doi:10.1111/j.1365-2621.1987.tb06700.x

Chappell, J., Wolf, F., Proulx, J., Cuellar, R., Saunders, C., 1995. Is the Reaction

Catalyzed by 3-Hydroxy-3-Methylglutaryl Coenzyme A Reductase a Rate-Limiting

Step for Isoprenoid Biosynthesis in Plants? Plant Physiol. 109, 1337–1343.

Chhangawala, S., Rudy, G., Mason, C.E., Rosenfeld, J.A., 2015. The impact of read length on quantification of differentially expressed genes and splice junction

detection. Genome Biol. 16, 131. doi:10.1186/s13059-015-0697-y

Choi, S.-K., Seo, J.-S., 2013. Lycopene supplementation suppresses oxidative stress

induced by a high fat diet in gerbils. Nutr. Res. Pract. 7, 26–33.

doi:10.4162/nrp.2013.7.1.26

Choi, S.H., Ahn, J.-B.B., Kozukue, N., Kim, H.-J.J., Nishitani, Y., Zhang, L., Mizuno,

M., Levin, C.E., Friedman, M., 2012. Structure–Activity Relationships of α-, β1-, γ-,

and δ-Tomatine and Tomatidine against Human Breast (MDA-MB-231), Gastric

(KATO-III), and Prostate (PC3) Cancer Cells. J. Agric. Food Chem. 60, 3891–3899.

doi:10.1021/jf3003027

Chong, I.G., Jun, C.H., 2005. Performance of some variable selection methods when

multicollinearity is present. Chemom. Intell. Lab. Syst. 78, 103–112.

doi:10.1016/j.chemolab.2004.12.011

Ciccone, M.M., Cortese, F., Gesualdo, M., Carbonara, S., Zito, A., Ricci, G., De Pascalis,

F., Scicchitano, P., Riccioni, G., 2013. Dietary Intake of Carotenoids and Their

Antioxidant and Anti-Inflammatory Effects in Cardiovascular Care. Mediators

Inflamm. 2013, e782137. doi:10.1155/2013/782137

Cichon, M.J., Riedl, K.M., Schwartz, S.J., 2017a. A metabolomic evaluation of the

phytochemical composition of tomato juices being used in human clinical trials.

Food Chem. 228, 270–278. doi:10.1016/j.foodchem.2017.01.118

Cichon, M.J., Riedl, K.M., Wan, L., Thomas-Ahner, J.M., Francis, D.M., Clinton, S.K.,

Schwartz, S.J., 2017b. Plasma Metabolomics Reveals Steroidal Alkaloids as Novel

Biomarkers of Tomato Intake in Mice. Mol. Nutr. Food Res. 61, 1700241. doi:10.1002/mnfr.201700241

Cipollini, M.L., Levey, D.J., 1997. Antifungal activity of Solanum fruit glycoalkaloids:

Implications for frugivory and seed dispersal. Ecology 78, 799–809.

doi:10.2307/2266059

Clevers, H., 2006. Wnt/beta-catenin signaling in development and disease. Cell 127,

469–80. doi:10.1016/j.cell.2006.10.018

Clinton, S.K., 2009. Lycopene: Chemistry, Biology, and Implications for Human Health

and Disease. Nutr. Rev. 56, 35–51. doi:10.1111/j.1753-4887.1998.tb01691.x

Clinton, S.K., Emenhiser, C., Schwartz, S.J., Bostwick, D.G., Williams, A.W., Moore,

B.J., Erdman Jr., J.W., 1996. cis-trans lycopene

isomers, carotenoids, and retinol in the human prostate 5, 823–833.

Cohen, J.C., Horton, J.D., Hobbs, H.H., 2011. Human Fatty Liver Disease: Old Questions

and New Insights. Science 332, 1519–1523. doi:10.1126/science.1204265

Collino, A., Termanini, A., Nicoli, P., Diaferia, G., Polletti, S., Recordati, C., Castiglioni,

V., Caruso, D., Mitro, N., Natoli, G., Ghisletti, S., 2018. Sustained activation of

detoxification pathways promotes liver carcinogenesis in response to chronic bile

acid-mediated damage. PLOS Genet. 14, e1007380.

doi:10.1371/journal.pgen.1007380

Cooperstone, J.L., Francis, D.M., Schwartz, S.J., 2016. Thermal processing differentially

affects lycopene and other carotenoids in cis-lycopene containing, tangerine

tomatoes. Food Chem. 210, 466–472. doi:10.1016/J.FOODCHEM.2016.04.078

Cooperstone, J.L., Ralston, R.A., Riedl, K.M., Haufe, T.C., Schweiggert, R.M., King,

S.A., Timmers, C.D., Francis, D.M., Lesinski, G.B., Clinton, S.K., Schwartz, S.J., 2015a. Enhanced bioavailability of lycopene when consumed as cis-isomers from

tangerine compared to red tomato juice, a randomized, cross-over clinical trial. Mol.

Nutr. Food Res. 59, 658–669. doi:10.1002/mnfr.201400658

Cooperstone, J.L., Ralston, R.A., Riedl, K.M., Haufe, T.C., Schweiggert, R.M., King,

S.A., Timmers, C.D., Francis, D.M., Lesinski, G.B., Clinton, S.K., Schwartz, S.J.,

2015b. Enhanced bioavailability of lycopene when consumed as cis-isomers from

tangerine compared to red tomato juice, a randomized, cross-over clinical trial. Mol.

Nutr. Food Res. 59, 658–669. doi:10.1002/mnfr.201400658

Cooperstone, J.L., Tober, K.L., Riedl, K.M., Teegarden, M.D., Cichon, M.J., Francis,

D.M., Schwartz, S.J., Oberyszyn, T.M., 2017. Tomatoes protect against

development of UV-induced keratinocyte carcinoma via metabolomic alterations.

Sci. Rep. 7, 5106. doi:10.1038/s41598-017-05568-7

Cotterill, P., 1987. Short note: on estimating heritability according to practical

applications. Silvae Genet. 36, 46–48.

Crozier, A., Del Rio, D., Clifford, M.N., 2010. Bioavailability of dietary flavonoids and

phenolic compounds. Mol. Aspects Med. 31, 446–467.

doi:10.1016/j.mam.2010.09.007

Dalal, M., Chinnusamy, V., Bansal, K.C., 2010. Isolation and functional characterization

of lycopene beta-cyclase (CYC-B) promoter from Solanum habrochaites. BMC

Plant Biol. 10, 61. doi:10.1186/1471-2229-10-61

Daood, H.G., Bencze, G., Palotas, G., Pek, Z., Sidikov, A., Helyes, L., 2014. HPLC

Analysis of Carotenoids from Tomatoes Using Cross-Linked C18 Column and MS

Detection. J. Chromatogr. Sci. 52, 985–991. doi:10.1093/chromsci/bmt139

Day, A.J., Williamson, G., 2001. Biomarkers for exposure to dietary flavonoids: a review

of the current evidence for identification of quercetin glycosides in plasma. Br. J.

Nutr. 86, S105–S110. doi:10.1079/bjn2001342

De Jesus, S., 2005. Genetic alteration of plant secondary metabolism: modification,

enhancement and characterization of pigments in Tomato (Lycopersicon

Esculentum) fruit. The Ohio State University.

De Pascual-Teresa, S., Moreno, D.A., García-Viguera, C., 2010. Flavanols and

Anthocyanins in Cardiovascular Health: A Review of Current Evidence. Int. J. Mol.

Sci. 11, 1679–1703. doi:10.3390/ijms11041679

Dekant, W., Vamvakas, S., 1993. Glutathione-dependent bioactivation of xenobiotics.

Xenobiotica 23, 873–887. doi:10.3109/00498259309059415

Del Giudice, R., Raiola, A., Tenore, G.C., Frusciante, L., Barone, A., Monti, D.M.,

Rigano, M.M., 2015. Antioxidant bioactive compounds in tomato fruits at different

ripening stages and their effects on normal and cancer cells. J. Funct. Foods 18, 83–

94. doi:10.1016/J.JFF.2015.06.060

Demmig-Adams, B., Adams, W.W., 1992. Carotenoid composition in sun and shade

leaves of plants with different life forms. Plant. Cell Environ. 15, 411–419.

doi:10.1111/j.1365-3040.1992.tb00991.x

Demmig-Adams, B., Gilmore, A.M., Adams, W.W., 1996. Carotenoids 3: in vivo

function of carotenoids in higher plants. FASEB J. 10, 403–412.

Dunn, W.B., Wilson, I.D., Nicholls, A.W., Broadhurst, D., 2012. The importance of

experimental design and QC samples in large-scale and MS-driven untargeted

metabolomic studies of humans. Bioanalysis 4, 2249–2264. doi:10.4155/bio.12.204

During, A., Harrison, E.H., 2004. Intestinal absorption and metabolism of carotenoids:

insights from cell culture. Arch. Biochem. Biophys. 430, 77–88.

doi:10.1016/j.abb.2004.03.024

Dyle, M.C., Ebert, S.M., Cook, D.P., Kunkel, S.D., Fox, D.K., Bongers, K.S., Bullard, S.A.,

Dierdorff, J.M., Adams, C.M., 2014. Systems-based discovery of tomatidine as a

natural small molecule inhibitor of skeletal muscle atrophy. J. Biol. Chem. 289,

14913–14924. doi:10.1074/jbc.M114.556241

Dzakovich, M.P., Gas-Pascual, E., Orchard, C.J., Sari, E.N., Riedl, K.M., Schwartz, S.J.,

Francis, D.M., Cooperstone, J.L., 2019. Analysis of Tomato Carotenoids:

Comparing Extraction and Chromatographic Methods. J. AOAC Int. 102, 1069–

1079. doi:10.5740/jaoacint.19-0017

Dzakovich, M.P., Hartman, J.L., Cooperstone, J.L., 2020. A High-Throughput Extraction

and Analysis Method for Steroidal Glycoalkaloids in Tomato. Front. Plant Sci. 11,

767. doi:10.3389/fpls.2020.00767

Ebert, S.M., Dyle, M.C., Bullard, S.A., Dierdorff, J.M., Murry, D.J., Fox, D.K., Bongers,

K.S., Lira, V.A., Meyerholz, D.K., Talley, J.J., Adams, C.M., 2015. Identification

and small molecule inhibition of an activating transcription factor 4 (ATF4)-

dependent pathway to age-related skeletal muscle weakness and atrophy. J. Biol.

Chem. doi:10.1074/jbc.M115.681445

El-Serag, H.B., Rudolph, K.L., 2007. Hepatocellular carcinoma: epidemiology and

molecular carcinogenesis. Gastroenterology 132, 2557–76.

doi:10.1053/j.gastro.2007.04.061

Eltayeb, Elsadig A., Roddick, J.G., 1984. Changes in the Alkaloid Content of Developing Fruits of Tomato (Lycopersicon esculentum Mill.) II. J. Exp. Bot. 35, 261–267.

doi:10.1093/jxb/35.2.261

Eltayeb, E.A., Roddick, J.G., 1984. Changes in the Alkaloid Content of Developing

Fruits of Tomato (Lycopersicon esculentum Mill.) I. J. Exp. Bot. 35, 252–260.

doi:10.1093/jxb/35.2.252

Endelman, J.B., 2011. Ridge Regression and Other Kernels for Genomic Selection with

R Package rrBLUP. Plant Genome 4, 250–255.

doi:10.3835/plantgenome2011.08.0024

Erdman, J.W., Bierer, T.L., Gugger, E.T., 1993. Absorption and transport of carotenoids.

Ann. N. Y. Acad. Sci. 691, 76–85. doi:10.1111/j.1749-6632.1993.tb26159.x

Erhardt, A., Stahl, W., Sies, H., Lirussi, F., Donner, A., Häussinger, D., 2011. Plasma

levels of vitamin E and carotenoids are decreased in patients with Nonalcoholic

Steatohepatitis (NASH). Eur. J. Med. Res. 16, 76–8.

Eriksson, L., Johansson, J., Kettaneh-Wold, N., Wold, S., 2001. Multi- and Megavariate

Data Analysis Basic Principles and Applications. Umetrics Academy.

Estévez, J.M., Cantero, A., Reindl, A., Reichler, S., León, P., 2001. 1-Deoxy-D-xylulose-

5-phosphate synthase, a limiting enzyme for plastidic isoprenoid biosynthesis in

plants. J. Biol. Chem. 276, 22901–9. doi:10.1074/jbc.M100854200

Etalo, D.W., De Vos, R.C.H., Joosten, M.H.A.J., Hall, R.D., 2015. Spatially resolved

plant metabolomics: Some potentials and limitations of laser-ablation electrospray

ionization mass spectrometry metabolite imaging. Plant Physiol. 169, 1424–1435.

doi:10.1104/pp.15.01176

Etminan, M., Takkouche, B., Caamaño-Isorna, F., 2004. The Role of Tomato Products and Lycopene in the Prevention of Prostate Cancer: A Meta-Analysis of

Observational Studies. Cancer Epidemiol. Prev. Biomarkers 13.

Eveillard, A., Lasserre, F., de Tayrac, M., Polizzi, A., Claus, S., Canlet, C., Mselli-

Lakhal, L., Gotardi, G., Paris, A., Guillou, H., Martin, P.G.P., Pineau, T., 2009.

Identification of potential mechanisms of toxicity after di-(2-ethylhexyl)-phthalate

(DEHP) adult exposure in the liver using a systems biology approach. Toxicol.

Appl. Pharmacol. 236, 282–92. doi:10.1016/j.taap.2009.02.008

Falcone Ferreyra, M.L., Rius, S.P., Casati, P., 2012. Flavonoids: biosynthesis, biological

functions, and biotechnological applications. Front. Plant Sci. 3.

doi:10.3389/fpls.2012.00222

Ferruzzi, M.G., Nguyen, M.L., Sander, L.C., Rock, C.L., Schwartz, S.J., 2001. Analysis

of lycopene geometrical isomers in biological microsamples by liquid

chromatography with coulometric array detection. J. Chromatogr. B Biomed. Sci.

Appl. 760, 289–299. doi:10.1016/S0378-4347(01)00288-2

Ferruzzi, M.G., Sander, L.C., Rock, C.L., Schwartz, S.J., 1998. Carotenoid determination

in biological microsamples using liquid chromatography with a coulometric

electrochemical array detector. Anal. Biochem. 256, 74–81.

doi:10.1006/abio.1997.2484

Fontaine, T.D., Irving, G.W., Ma, R., Poole, J.B., Doolittle, S.P., 1948. Isolation and

partial characterization of crystalline tomatine, an antibiotic agent from the tomato

plant. Arch. Biochem. 18, 467–75.

Franklin, C.C., Backos, D.S., Mohar, I., White, C.C., Forman, H.J., Kavanagh, T.J., 2009.

Structure, function, and post-translational regulation of the catalytic and modifier subunits of glutamate cysteine ligase. Mol. Aspects Med.

doi:10.1016/j.mam.2008.08.009

Fraser, C.M., Chapple, C., 2011. The Phenylpropanoid Pathway in Arabidopsis.

Arabidopsis Book 9. doi:10.1199/tab.0152

Fray, R.G., Grierson, D., 1993. Identification and genetic analysis of normal and mutant

phytoene synthase genes of tomato by sequencing, complementation and co-

suppression. Plant Mol. Biol. 22, 589–602. doi:10.1007/BF00047400

Friedman, M., 2013. Anticarcinogenic, Cardioprotective, and Other Health Benefits of

Tomato Compounds Lycopene, α-Tomatine, and Tomatidine in Pure Form and in

Fresh and Processed Tomatoes. J. Agric. Food Chem. 61, 9534–9550.

doi:10.1021/jf402654e

Friedman, M., 2002. Tomato glycoalkaloids: role in the plant and in the diet. J. Agric.

Food Chem. 50, 5751–5780.

Friedman, M., Fitch, T.E., Yokoyama, W.E., 2000a. Lowering of plasma LDL

cholesterol in hamsters by the tomato glycoalkaloid tomatine. Food Chem. Toxicol.

38, 549–553. doi:10.1016/S0278-6915(00)00050-8

Friedman, M., Fitch, T.E., Levin, C.E., Yokoyama, W.H., 2000b. Feeding Tomatoes to

Hamsters Reduces their Plasma Low-density Lipoprotein Cholesterol and

Triglycerides. Sens. Nutr. Qual. Food 65, 897–900.

Friedman, M., Fitch, T.E., Levin, C.E., Yokoyama, W.H., 2000c. Feeding

Tomatoes to Hamsters Reduces their Plasma Low-density Lipoprotein Cholesterol

and Triglycerides. Sens. Nutr. Qual. Food 65, 897–900. doi:10.1111/j.1365-2621.2000.tb13608.x

Friedman, M., Levin, C.E., 1998. Dehydrotomatine content in tomatoes. J. Agric. Food

Chem. 46, 4571–4576. doi:10.1021/jf9804589

Friedman, M., Levin, C.E., 1992. Reversed-phase high-performance liquid

chromatographic separation of potato glycoalkaloids and hydrolysis products on

acidic columns. J. Agric. Food Chem. 40, 2157–2163. doi:10.1021/jf00023a023

Friedman, M., Levin, C.E., McDonald, G.M., 1994. α-Tomatine Determination in

Tomatoes by HPLC using Pulsed Amperometric Detection. J. Agric. Food Chem.

42, 1959–1964. doi:10.1021/jf00045a024

Friedman, M., Levin, C.E., 1995. α-Tomatine Content in Tomato and Tomato

Products Determined by HPLC with Pulsed Amperometric Detection. J. Agric. Food

Chem. 43, 1507–1511. doi:10.1021/jf00054a017

Frusciante, L., Carli, P., Ercolano, M.R., Pernice, R., Di Matteo, A., Fogliano, V.,

Pellegrini, N., 2007. Antioxidant nutritional quality of tomato. Mol. Nutr. Food Res.

51, 609–17. doi:10.1002/mnfr.200600158

Fujiwara, Y., Kiyota, N., Hori, M., Matsushita, S., Iijima, Y., Aoki, K., Shibata, D.,

Takeya, M., Ikeda, T., Nohara, T., Nagai, R., 2007. Esculeogenin A, a new tomato

sapogenol, ameliorates hyperlipidemia and atherosclerosis in ApoE-deficient mice

by inhibiting ACAT. Arterioscler. Thromb. Vasc. Biol.

doi:10.1161/ATVBAHA.107.147405

Fujiwara, Y., Kiyota, N., Tsurushima, K., Yoshitomi, M., Horlad, H., Ikeda, T., Nohara,

T., Takeya, M., Nagai, R., 2012. Tomatidine, a tomato sapogenol, ameliorates

hyperlipidemia and atherosclerosis in ApoE-deficient mice by inhibiting acyl-CoA:cholesterol acyl-transferase (ACAT). J. Agric. Food Chem. 60, 2472–2479.

doi:10.1021/jf204197r

Gabeen, A., Fathy, S., El-Houseini, M., Abdel Hamid, F., 2014. Potential

immunotherapeutic role of interleukin-2 and interleukin-12 combination in patients

with hepatocellular carcinoma. J. Hepatocell. Carcinoma 1, 55.

doi:10.2147/jhc.s56012

Gann, P.H., Ma, J., Giovannucci, E., Willett, W., Sacks, F.M., Hennekens, C.H.,

Stampfer, M.J., 1999. Lower Prostate Cancer Risk in Men with Elevated Plasma

Lycopene Levels. Cancer Res. 59.

Gärtner, C., Stahl, W., Sies, H., 1997. Lycopene is more bioavailable from tomato paste

than from fresh tomatoes. Am. J. Clin. Nutr. 66, 116–122.

Gautier, H., Rocci, A., Buret, M., Grasselly, D., Causse, M., 2005. Fruit load or fruit

position alters response to temperature and subsequently cherry tomato quality. J.

Sci. Food Agric. 85, 1009–1016. doi:10.1002/jsfa.2060

Georgé, S., Tourniaire, F., Gautier, H., Goupy, P., Rock, E., Caris-Veyrat, C., 2011.

Changes in the contents of carotenoids, phenolic compounds and vitamin C during

technical processing and lyophilisation of red and yellow tomatoes. Food Chem.

124, 1603–1611. doi:10.1016/j.foodchem.2010.08.024

Georgelis, N., Scott, J.W., Baldwin, E.A., 2004. Relationship of Tomato Fruit Sugar

Concentration with Physical and Chemical Traits and Linkage of RAPD Markers. J.

Am. Soc. Hortic. Sci. 129, 839–845.

Geraghty, C., Thomas-Ahner, J., Powell, R., Schmidt, N., Chitchumroonchokchai, C.,

Riedl, K., Solden, L., Bailey, M., Hussan, H., Francis, D., Cooperstone, J., Mo, X.,

Young, G., Freitas, M., Schwartz, S., Moran, N., Clinton, S., 2020. Dietary Tomato

Varieties Similarly Inhibit Prostate Carcinogenesis in the TRAMP Model in

Association with Distinct Transcriptomic and Metabolomic Profiles, in: Current

Developments in Nutrition. Oxford Academic, pp. 326–326.

doi:10.1093/cdn/nzaa044_025

Gerster, H., 1997. The potential role of lycopene for human health. J. Am. Coll. Nutr. 16,

109–126. doi:10.1080/07315724.1997.10718661

Giliberto, L., Perrotta, G., Pallara, P., Weller, J.L., Fraser, P.D., Bramley, P.M., Fiore, A.,

Tavazza, M., Giuliano, G., 2005. Manipulation of the blue light photoreceptor

cryptochrome 2 in tomato affects vegetative development, flowering time, and fruit

antioxidant content. Plant Physiol. 137, 199–208. doi:10.1104/pp.104.051987

Giovannoni, J.J., 2007. Fruit ripening mutants yield insights into ripening control. Curr.

Opin. Plant Biol. 10, 283–289. doi:10.1016/j.pbi.2007.04.008

Giovannucci, E., Ascherio, A., Rimm, E.B., Stampfer, M.J., Colditz, G.A., Willett, W.C.,

1995. Intake of carotenoids and retinol in relation to risk of prostate cancer. J. Natl.

Cancer Inst. 87, 1767–1776. doi:10.1093/jnci/87.23.1767

Giovannucci, E., Rimm, E.B., Liu, Y., Stampfer, M.J., Willett, W.C., 2002. A

prospective study of tomato products, lycopene, and prostate cancer risk. J. Natl.

Cancer Inst. 94, 391–8. doi:10.1093/jnci/94.5.391

Giuntini, D., Lazzeri, V., Calvenzani, V., Dall’Asta, C., Galaverna, G., Tonelli, C.,

Petroni, K., Ranieri, A., 2008. Flavonoid profiling and biosynthetic gene expression

in flesh and peel of two tomato genotypes grown under UV-B-depleted conditions

during ripening. J. Agric. Food Chem. 56, 5905–5915. doi:10.1021/jf8003338

González-Vallinas, M., González-Castejón, M., Rodríguez-Casado, A., Molina, A.R. de,

2013. Dietary phytochemicals in cancer prevention and therapy: a complementary

approach with promising perspectives. Nutr. Rev. 71, 585–599.

doi:10.1111/nure.12051

González, R., Ballester, I., López-Posadas, R., Suárez, M.D., Zarzuelo, A., Martínez-

Augustin, O., Medina, F.S. De, 2011. Effects of Flavonoids and other Polyphenols

on Inflammation. Crit. Rev. Food Sci. Nutr. 51, 331–362.

doi:10.1080/10408390903584094

Gottlieb, D., 1943. Expressed sap of tomato plants in relation to wilt resistance.

Phytopathology 33, 1111.

Grainger, E., Schwartz, S., Wang, S., Unlu, N., Boileau, T., Ferketich, A., Monk, J.P.,

Gong, M., Bahnson, R., DeGroff, V., Clinton, S., 2008. A Combination of Tomato

and Soy Products for Men With Recurring Prostate Cancer and Rising Prostate

Specific Antigen. Nutr. Cancer 60, 145–154. doi:10.1080/01635580701621338

Grainger, E.M., Moran, N.E., Francis, D.M., Schwartz, S.J., Wan, L., Thomas-Ahner, J.,

Kopec, R.E., Riedl, K.M., Young, G.S., Abaza, R., Bahnson, R.R., Clinton, S.K.,

2018. A Novel Tomato-Soy Juice Induces a Dose-Response Increase in Urinary and

Plasma Phytochemical Biomarkers in Men with Prostate Cancer. J. Nutr.

doi:10.1093/jn/nxy232

Grassi, S., Piro, G., Lee, J.M., Zheng, Y., Fei, Z., Dalessandro, G., Giovannoni, J.J.,

Lenucci, M.S., 2013. Comparative genomics reveals candidate carotenoid pathway

regulators of ripening watermelon fruit. BMC Genomics 14, 781. doi:10.1186/1471-

2164-14-781

Grimplet, J., Cramer, G.R., Dickerson, J.A., Mathiason, K., Van Hemert, J., Fennell,

A.Y., 2009. VitisNet: “Omics” integration through grapevine molecular networks.

PLoS One 4, e8365. doi:10.1371/journal.pone.0008365

Guengerich, F.P., 2018. Mechanisms of Cytochrome P450-Catalyzed Oxidations. ACS

Catal. doi:10.1021/acscatal.8b03401

Gupta, P., Sreelakshmi, Y., Sharma, R., 2015. A rapid and sensitive method for

determination of carotenoids in plant tissues by high performance liquid

chromatography. Plant Methods 11, 5. doi:10.1186/s13007-015-0051-0

Halliwell, B., Chirico, S., 1993. Lipid peroxidation: its mechanism, measurement, and

significance. Am. J. Clin. Nutr. 57, 715S-724S.

Hamilton, J.P., Sim, S.-C., Stoffel, K., Van Deynze, A., Buell, C.R., Francis, D.M., 2012.

Single Nucleotide Polymorphism Discovery in Cultivated Tomato via Sequencing

by Synthesis. Plant Genome. doi:10.3835/plantgenome2011.12.0033

Hardesty, J.E., Wahlang, B., Falkner, K.C., Shi, H., Jin, J., Zhou, Y., Wilkey, D.W.,

Merchant, M.L., Watson, C.T., Feng, W., Morris, A.J., Hennig, B., Prough, R.A.,

Cave, M.C., 2019. Proteomic Analysis Reveals Novel Mechanisms by Which

Polychlorinated Biphenyls Compromise the Liver Promoting Diet-Induced

Steatohepatitis. J. Proteome Res. 18, 1582–1594.

doi:10.1021/acs.jproteome.8b00886

Harris, W.M., Spurr, A.R., 1969. Chromoplasts of tomato fruits. I. Ultrastructure of

low-pigment and high-beta mutants. Carotene analyses. Am. J. Bot. 56, 369–379.

doi:10.1002/j.1537-2197.1969.tb07546.x

Harrison, E.H., 2012. Mechanisms involved in the intestinal absorption of dietary vitamin

A and provitamin A carotenoids. Biochim. Biophys. Acta 1821, 70–7.

doi:10.1016/j.bbalip.2011.06.002

Hastie, T., Tibshirani, R., Friedman, J., 2009. Random Forests. pp. 587–604.

doi:10.1007/978-0-387-84858-7_15

Hawkes, J.G., 1977. The importance of wild germplasm in plant breeding. Euphytica 26,

615–621. doi:10.1007/BF00021686

He, X.-Y., Zhao, J., Chen, Z.-Q., Jin, R., Liu, C.-Y., 2018. High Expression of Retinoic

Acid Induced 14 (RAI14) in Gastric Cancer and Its Prognostic Value. Med. Sci.

Monit. 24, 2244–2251. doi:10.12659/MSM.910133

Heftmann, E., Lieber, E.R., Bennett, R.D., 1967. Biosynthesis of tomatidine from

cholesterol in Lycopersicon pimpinellifolium. Phytochemistry 6, 225–229.

doi:10.1016/S0031-9422(00)82767-3

Hemmerlin, A., Hoeffler, J.-F., Meyer, O., Tritsch, D., Kagan, I.A., Grosdemange-

Billiard, C., Rohmer, M., Bach, T.J., 2003. Cross-talk between the Cytosolic

Mevalonate and the Plastidial Methylerythritol Phosphate Pathways in Tobacco

Bright Yellow-2 Cells. J. Biol. Chem. 278, 26666–26676.

doi:10.1074/jbc.M302526200

Hodek, P., Trefil, P., Stiborová, M., 2002. Flavonoids-potent and versatile biologically

active compounds interacting with cytochromes P450. Chem. Biol. Interact. 139, 1–21. doi:10.1016/S0009-2797(01)00285-X

Hoffman, S.M.G., Nelson, D.R., Keeney, D.S., 2001. Organization, structure and

evolution of the CYP2 gene cluster on human chromosome 19. Pharmacogenetics

11, 687–698. doi:10.1097/00008571-200111000-00007

Hollman, P.C.H., 2004. Absorption, bioavailability, and metabolism of flavonoids.

Pharm. Biol. 42, 74–83. doi:10.1080/13880200490893492

Hövelmann, Y., Jagels, A., Schmid, R., Hübner, F., Humpf, H.-U., 2019. Identification of

potential human urinary biomarkers for tomato juice intake by mass spectrometry-

based metabolomics. Eur. J. Nutr. 1–13. doi:10.1007/s00394-019-01935-4

Huang, H., Fujii, H., Sankila, A., Mahler-Araujo, B.M., Matsuda, M., Cathomas, G.,

Ohgaki, H., 1999. Beta-catenin mutations are frequent in human hepatocellular

carcinomas associated with hepatitis C virus infection. Am. J. Pathol. 155, 1795–

801.

Huang, M.T., Smart, R.C., Wong, C.Q., Conney, A.H., 1988. Inhibitory Effect of

Curcumin, Chlorogenic Acid, Caffeic Acid, and Ferulic Acid on Tumor Promotion

in Mouse Skin by 12-O-Tetradecanoylphorbol-13-Acetate. Cancer Res.

Huo, T., Ferruzzi, M.G., Schwartz, S.J., Failla, M.L., 2007. Impact of fatty acyl

composition and quantity of triglycerides on bioaccessibility of dietary carotenoids.

J. Agric. Food Chem. 55, 8950–7. doi:10.1021/jf071687a

Iijima, Y., Fujiwara, Y., Tokita, T., Ikeda, T., Nohara, T., Aoki, K., Shibata, D., 2009.

Involvement of Ethylene in the Accumulation of Esculeoside A during Fruit

Ripening of Tomato (Solanum lycopersicum). J. Agric. Food Chem. 57, 3247–3252.

doi:10.1021/jf8037902

Iijima, Y., Nakamura, Y., Ogata, Y., Tanaka, K., Sakurai, N., Suda, K., Suzuki, T.,

Suzuki, H., Okazaki, K., Kitayama, M., Kanaya, S., Aoki, K., Shibata, D., 2008.

Metabolite annotations based on the integration of mass spectral information. Plant

J. 54, 949–62. doi:10.1111/j.1365-313X.2008.03434.x

Iijima, Y., Watanabe, B., Sasaki, R., Takenaka, M., Ono, H., Sakurai, N., Umemoto, N.,

Suzuki, H., Shibata, D., Aoki, K., 2013. Steroidal glycoalkaloid profiling and

structures of glycoalkaloids in wild tomato fruit. Phytochemistry 95, 145–157.

doi:10.1016/j.phytochem.2013.07.016

Ip, B.C., Hu, K.-Q., Liu, C., Smith, D.E., Obin, M.S., Ausman, L.M., Wang, X.-D., 2013.

Lycopene metabolite, apo-10’-lycopenoic acid, inhibits diethylnitrosamine-initiated,

high fat diet-promoted hepatic inflammation and tumorigenesis in mice. Cancer

Prev. Res. (Phila). 6, 1304–16. doi:10.1158/1940-6207.CAPR-13-0178

Ip, B.C., Liu, C., Ausman, L.M., von Lintig, J., Wang, X.-D., 2014. Lycopene attenuated

hepatic tumorigenesis via differential mechanisms depending on carotenoid cleavage

enzyme in mice. Cancer Prev. Res. (Phila). 7, 1219–27. doi:10.1158/1940-

6207.CAPR-14-0154

Ip, B.C., Wang, X.-D., 2014. Non-alcoholic steatohepatitis and hepatocellular carcinoma:

implications for lycopene intervention. Nutrients 6, 124–62. doi:10.3390/nu6010124

Irving, G.W., Fontaine, T.D., Doolittle, S.P., 1945. Lycopersicin, a fungistatic agent from

the tomato plant. Science 102, 9–11. doi:10.1126/science.102.2636.9

Isaacson, T., Ohad, I., Beyer, P., Hirschberg, J., 2004. Analysis in vitro of the enzyme

CRTISO establishes a poly-cis-carotenoid biosynthesis pathway in plants. Plant

Physiol. 136, 4246–55. doi:10.1104/pp.104.052092

Isaacson, T., Ronen, G., Zamir, D., Hirschberg, J., 2002. Cloning of tangerine from

tomato reveals a carotenoid isomerase essential for the production of beta-carotene

and xanthophylls in plants. Plant Cell 14, 333–42. doi:10.1105/tpc.010303.2001

Itkin, M., Heinig, U., Tzfadia, O., Bhide, A.J., Shinde, B., Cardenas, P.D., Bocobza, S.E.,

Unger, T., Malitsky, S., Finkers, R., Tikunov, Y., Bovy, A., Chikate, Y., Singh, P.,

Rogachev, I., Beekwilder, J., Giri, A.P., Aharoni, A., 2013. Biosynthesis of

antinutritional alkaloids in solanaceous crops is mediated by clustered genes.

Science 341, 175–9. doi:10.1126/science.1240230

Itkin, M., Rogachev, I., Alkan, N., Rosenberg, T., Malitsky, S., Masini, L., Meir, S.,

Iijima, Y., Aoki, K., de Vos, R., Prusky, D., Burdman, S., Beekwilder, J., Aharoni,

A., 2011. GLYCOALKALOID METABOLISM1 Is Required for Steroidal Alkaloid

Glycosylation and Prevention of Phytotoxicity in Tomato. Plant Cell 23, 4507–4525.

doi:10.1105/tpc.111.088732

Jaakola, L., 2013. New insights into the regulation of anthocyanin biosynthesis in fruits.

Trends Plant Sci. 18, 477–483. doi:10.1016/j.tplants.2013.06.003

Jeng, K.S., Chang, C.F., Jeng, W.J., Sheen, I.S., Jeng, C.J., 2015. Heterogeneity of

hepatocellular carcinoma contributes to cancer progression. Crit. Rev. Oncol.

Hematol. doi:10.1016/j.critrevonc.2015.01.009

Jenkins, J.A., Mackinney, G., 1955. Carotenoids of the Apricot Tomato and Its Hybrids

with Yellow and Tangerine. Genetics 40, 715–720.

Jenkins, J.A., 1948. The origin of the cultivated tomato. Econ. Bot. 2, 379–392.

doi:10.1007/BF02859492

Jenkins, J.A., Mackinney, G., 1953. Inheritance of carotenoid differences in the tomato

hybrid yellow x tangerine. Genetics 38, 107–116.

Jozefczuk, S., Klie, S., Catchpole, G., Szymanski, J., Cuadros-Inostroza, A., Steinhauser,

D., Selbig, J., Willmitzer, L., 2010. Metabolomic and transcriptomic stress response

of Escherichia coli. Mol. Syst. Biol. 6, 364. doi:10.1038/msb.2010.18

Juvik, J.A., Stevens, M.A., 1982. Inheritance of foliar α-tomatine content in tomatoes. J.

Am. Soc. Hortic. Sci. 107, 1061–1065.

Kachanovsky, D.E., Filler, S., Isaacson, T., Hirschberg, J., 2012. Epistasis in tomato

color mutations involves regulation of phytoene synthase 1 expression by cis-

carotenoids. Proc. Natl. Acad. Sci. U. S. A. 109, 19021–6.

doi:10.1073/pnas.1214808109

Kahle, D., Wickham, H., 2013. ggmap: Spatial Visualization with ggplot2. R J. 5, 144–

161.

Kaplan, L.A., Lau, J.M., Stein, E.A., 1980. Carotenoid Composition, Concentrations, and

Relationships in Various Human Organs. Clin. Physiol. Biochem. 8, 1–10.

Katz, Y., Wang, E.T., Airoldi, E.M., Burge, C.B., 2010. Analysis and design of RNA

sequencing experiments for identifying isoform regulation. Nat. Methods 7, 1009–

15. doi:10.1038/nmeth.1528

Kauss, T., Moynet, D., Rambert, J., Al-Kharrat, A., Brajot, S., Thiolat, D., Ennemany, R.,

Fawaz, F., Mossalayi, M.D., 2008. Rutoside decreases human macrophage-derived

inflammatory mediators and improves clinical signs in adjuvant-induced arthritis.

Arthritis Res. Ther. 10, R19. doi:10.1186/ar2372

Kavitha, K., Kowshik, J., Kishore, T.K.K., Baba, A.B., Nagini, S., 2013. Astaxanthin

inhibits NF-κB and Wnt/β-catenin signaling pathways via inactivation of

Erk/MAPK and PI3K/Akt to induce intrinsic apoptosis in a hamster model of oral

cancer. Biochim. Biophys. Acta 1830, 4433–44. doi:10.1016/j.bbagen.2013.05.032

Kean, E.G., Hamaker, B.R., Ferruzzi, M.G., 2008. Carotenoid Bioaccessibility from

Whole Grain and Degermed Maize Meal Products. J. Agric. Food Chem. 56, 9918–

9926. doi:10.1021/jf8018613

Keukens, E.A.J., de Vrije, T., van den Boom, C., de Waard, P., Plasman, H.H., Thiel, F.,

Chupin, V., Jongen, W.M.F., de Kruijff, B., 1995. Molecular basis of glycoalkaloid

induced membrane disruption. Biochim. Biophys. Acta - Biomembr. 1240, 216–228.

doi:10.1016/0005-2736(95)00186-7

Keukens, E.A.J., Hop, M.E.C.M., Jongen, W.M.F., 1994. Rapid High-Performance

Liquid Chromatographic Method for the Quantification of α-Tomatine in Tomato.

J. Agric. Food Chem.

Khan, S.R., Baghdasarian, A., Fahlman, R.P., Michail, K., Siraki, A.G., 2014. Current

status and future prospects of toxicogenomics in drug discovery. Drug Discov.

Today 19, 562–78. doi:10.1016/j.drudis.2013.11.001

Kim, J.E., Gordon, S.L., Ferruzzi, M.G., Campbell, W.W., 2015. Effects of egg

consumption on carotenoid absorption from co-consumed, raw vegetables. Am. J.

Clin. Nutr. 102, 75–83. doi:10.3945/ajcn.115.111062

Kjeldahl, K., Bro, R., 2010. Some common misunderstandings in chemometrics. J.

Chemom. 24, 558–564. doi:10.1002/cem.1346

Koh, E., Kaffka, S., Mitchell, A.E., 2013. A long-term comparison of the influence of

organic and conventional crop management practices on the content of the

glycoalkaloid α-tomatine in tomatoes. J. Sci. Food Agric. 93, 1537–1542.

doi:10.1002/jsfa.5951

Kopec, R.E., Cooperstone, J.L., Cichon, M.J., Schwartz, S.J., 2012. Analysis Methods of

Carotenoids, in: Analysis of Antioxidant-Rich Phytochemicals. Wiley-Blackwell,

Oxford, UK, pp. 105–148. doi:10.1002/9781118229378.ch4

Kopec, R.E., Cooperstone, J.L., Schweiggert, R.M., Young, G.S., Harrison, E.H.,

Francis, D.M., Clinton, S.K., Schwartz, S.J., 2014. Avocado Consumption Enhances

Human Postprandial Provitamin A Absorption and Conversion from a Novel High–

β-Carotene Tomato Sauce and from Carrots. J. Nutr. 144, 1158–1166.

doi:10.3945/jn.113.187674

Kopec, R.E., Riedl, K.M., Harrison, E.H., Curley, R.W., Hruszkewycz, D.P., Clinton,

S.K., Schwartz, S.J., 2010. Identification and quantification of apo-lycopenals in

fruits, vegetables, and human plasma. J. Agric. Food Chem. 58, 3290–3296.

doi:10.1021/jf100415z

Kopec, R.E., Schick, J., Tober, K.L., Riedl, K.M., Francis, D.M., Young, G.S., Schwartz,

S.J., Oberyszyn, T.M., 2015. Sex differences in skin carotenoid deposition and acute

UVB-induced skin damage in SKH-1 hairless mice after consumption of tangerine

tomatoes. Mol. Nutr. Food Res. doi:10.1002/mnfr.201500317

Kotake-Nara, E., Kushiro, M., Zhang, H., Sugawara, T., Miyashita, K., Nagao, A., 2001.

Carotenoids Affect Proliferation of Human Prostate Cancer Cells. J. Nutr. 131,

3303–3306.

Kotilainen, T., Tegelberg, R., Julkunen-Tiitto, R., Lindfors, A., O’Hara, R.B., Aphalo,

P.J., 2010. Seasonal fluctuations in leaf phenolic composition under UV

manipulations reflect contrasting strategies of alder and birch trees. Physiol. Plant.

140, 297–309. doi:10.1111/j.1399-3054.2010.01398.x

Kozukue, N., Friedman, M., 2003. Tomatine, chlorophyll, Beta-carotene and lycopene

content in tomatoes during growth and maturation. J. Sci. Food Agric. 83, 195–200.

doi:10.1002/jsfa.1292

Kozukue, N., Han, J.-S., Lee, K.-R., Friedman, M., 2004. Dehydrotomatine and α-

Tomatine Content in Tomato Fruits and Vegetative Plant Tissues. J. Agric. Food

Chem. 52, 2079–2083. doi:10.1021/jf0306845

Kutty, R.K., Kutty, G., Samuel, W., Duncan, T., Bridges, C.C., El-Sherbeeny, A.,

Nagineni, C.N., Smith, S.B., Wiggert, B., 2001. Molecular Characterization and

Developmental Expression of NORPEG, a Novel Gene Induced by Retinoic Acid. J.

Biol. Chem. 276, 2831–2840. doi:10.1074/jbc.M007421200

Langmead, B., Trapnell, C., Pop, M., Salzberg, S.L., 2009. Ultrafast and memory-

efficient alignment of short DNA sequences to the human genome. Genome Biol.

10, R25. doi:10.1186/gb-2009-10-3-r25

Lawson, D.R., Erb, W.A., Miller, A.R., 1992. Analysis of Solanum Alkaloids Using

Internal Standardization and Capillary Gas Chromatography. J. Agric. Food Chem.

40, 2186–2191.

Lazraq, A., Cléroux, R., Gauchi, J.P., 2003. Selecting both latent and explanatory

variables in the PLS1 regression model. Chemom. Intell. Lab. Syst. 66, 117–126.

doi:10.1016/S0169-7439(03)00027-3

Le Gall, G., Dupont, M.S., Mellon, F.A., Davis, A.L., Collins, G.J., Verhoeyen, M.E.,

Colquhoun, I.J., 2003. Characterization and content of flavonoid glycosides in

genetically modified tomato (Lycopersicon esculentum) fruits. J. Agric. Food Chem.

51, 2438–2446. doi:10.1021/jf025995e

Lê, S., Josse, J., Husson, F., 2008. FactoMineR: An R Package for Multivariate

Analysis. J. Stat. Softw. 25, 1–18. doi:10.18637/jss.v025.i01

Lee, H.J., Nakayasu, M., Akiyama, R., Kobayashi, M., Miyachi, H., Sugimoto, Y.,

Umemoto, N., Saito, K., Muranaka, T., Mizutani, M., 2019. Identification of a 3β-

hydroxysteroid dehydrogenase/3-ketosteroid reductase involved in α-tomatine

biosynthesis in tomato. Plant Cell Physiol. 60, 1304–1315. doi:10.1093/pcp/pcz049

Lee, K.-R., Kozukue, N., Han, J.-S., Park, J.-H., Chang, E., Baek, E.-J., Chang, J.-S.,

Friedman, M., 2004. Glycoalkaloids and Metabolites Inhibit the Growth of Human

Colon (HT29) and Liver (HepG2) Cancer Cells. J. Agric. Food Chem. 52, 2832–

2839. doi:10.1021/jf030526d

Leipzig, J., 2016. A review of bioinformatic pipeline frameworks. Brief. Bioinform.

bbw020. doi:10.1093/bib/bbw020

Lesellier, E., Tchapla, A., Marty, C., Lebert, A., 1993. Analysis of carotenoids by high-

performance liquid chromatography and supercritical fluid chromatography. J.

Chromatogr. A 633, 9–23. doi:10.1016/0021-9673(93)83133-D

Lewinsohn, E., Gijzen, M., 2009. Phytochemical diversity: The sounds of silent

metabolism. Plant Sci. 176, 161–169. doi:10.1016/j.plantsci.2008.09.018

Li, J., Ou-Lee, T.M., Raba, R., Amundson, R.G., Last, R.L., 1993. Arabidopsis Flavonoid

Mutants Are Hypersensitive to UV-B Irradiation. Plant Cell 5, 171–179.

doi:10.1105/tpc.5.2.171

Li, S., Sullivan, N.L., Rouphael, N., Yu, T., Banton, S., Maddur, M.S., McCausland, M.,

Chiu, C., Canniff, J., Dubey, S., Liu, K., Tran, V.L., Hagan, T., Duraisingham, S.,

Wieland, A., Mehta, A.K., Whitaker, J.A., Subramaniam, S., Jones, D.P., Sette, A.,

Vora, K., Weinberg, A., Mulligan, M.J., Nakaya, H.I., Levin, M., Ahmed, R.,

Pulendran, B., 2017. Metabolic Phenotypes of Response to Vaccination in Humans.

Cell 169, 862-877.e17. doi:10.1016/j.cell.2017.04.026

Li, X., Bonawitz, N.D., Weng, J.-K., Chapple, C., 2010. The Growth Reduction

Associated with Repressed Lignin Biosynthesis in Arabidopsis thaliana Is

Independent of Flavonoids. Plant Cell 22, 1620–1632. doi:10.1105/tpc.110.074161

Li, X., Nair, A., Wang, S., Wang, L., 2015. Quality control of RNA-seq experiments.

Methods Mol. Biol. 1269, 137–46. doi:10.1007/978-1-4939-2291-8_8

Li, Y.-Y., Mao, K., Zhao, C., Zhao, X.-Y., Zhang, H.-L., Shu, H.-R., Hao, Y.-J., 2012.

MdCOP1 Ubiquitin E3 Ligases Interact with MdMYB1 to Regulate Light-Induced

Anthocyanin Biosynthesis and Red Fruit Coloration in Apple. Plant Physiol. 160,

1011–1022. doi:10.1104/pp.112.199703

Liao, Y., Smyth, G.K., Shi, W., 2019. The R package Rsubread is easier, faster, cheaper

and better for alignment and quantification of RNA sequencing reads. Nucleic Acids

Res. doi:10.1093/nar/gkz114

Liaw, A., Wiener, M., 2002. Classification and Regression by randomForest. R News.

Lin, Y.P., Liu, C.Y., Chen, K.Y., 2019. Assessment of genetic differentiation and linkage

disequilibrium in Solanum pimpinellifolium using genome-wide high-density SNP

markers. G3 Genes, Genomes, Genet. 9, 1497–1505. doi:10.1534/g3.118.200862

Lincoln, R.E., Porter, J.W., 1950. Inheritance of beta-carotene in tomatoes. Genetics 35,

206–211.

Liu, L., Shao, Z., Zhang, M., Wang, Q., 2015. Regulation of carotenoid metabolism in

tomato. Mol. Plant 8, 28–39. doi:10.1016/j.molp.2014.11.006

Liu, R., Zhou, F., He, H., Wei, J., Tian, X., Ding, L., 2018. Metabolism and Bioactivation

of Corynoline With Characterization of the Glutathione/Cysteine Conjugate and

Evaluation of Its Hepatotoxicity in Mice. Front. Pharmacol. 9, 1264.

doi:10.3389/fphar.2018.01264

Liu, X., Allen, J.D., Arnold, J.T., Blackman, M.R., 2008. Lycopene inhibits IGF-I signal

transduction and growth in normal prostate epithelial cells by decreasing DHT-

modulated IGF-I production in co-cultured reactive stromal cells. Carcinogenesis

29, 816–823. doi:10.1093/carcin/bgn011

Liu, Y., Roof, S., Ye, Z., Barry, C., van Tuinen, A., Vrebalov, J., Bowler, C.,

Giovannoni, J., 2004. Manipulation of light signal transduction as a means of

modifying fruit nutritional quality in tomato. Proc. Natl. Acad. Sci. U. S. A. 101,

9897–9902.

doi:10.1073/pnas.0400935101

Llovet, J.M., Di Bisceglie, A.M., Bruix, J., Kramer, B.S., Lencioni, R., Zhu, A.X.,

Sherman, M., Schwartz, M., Lotze, M., Talwalkar, J., Gores, G.J., 2008. Design and

endpoints of clinical trials in hepatocellular carcinoma. J. Natl. Cancer Inst. 100,

698–711. doi:10.1093/jnci/djn134

Logan, B.A., Monson, R.K., Potosnak, M.J., 2000. Biochemistry and physiology of foliar

isoprene production. Trends Plant Sci. 5, 477–481. doi:10.1016/S1360-

1385(00)01765-9

Lu, H., Gunewardena, S., Cui, J.Y., Yoo, B., Zhong, X.B., Klaassen, C.D., 2013. RNA-sequencing quantification of hepatic ontogeny and tissue distribution of mRNAs of

phase II enzymes in mice. Drug Metab. Dispos. 41, 844–857.

doi:10.1124/dmd.112.050211

Luan, H., Ji, F., Chen, Y., Cai, Z., 2018. statTarget: A streamlined tool for signal drift

correction and interpretations of quantitative mass spectrometry-based omics data.

Anal. Chim. Acta 1036, 66–72. doi:10.1016/j.aca.2018.08.002

Luthria, D.L., Mukhopadhyay, S., Krizek, D.T., 2006. Content of total phenolics and

phenolic acids in tomato (Lycopersicon esculentum Mill.) fruits as influenced by

cultivar and solar UV radiation. J. Food Compos. Anal. 19, 771–777.

doi:10.1016/j.jfca.2006.04.005

MacArthur, J.W., 1934. Linkage groups in the tomato. J. Genet. 29, 123–133.

doi:10.1007/BF02981789

Mardis, E.R., 2013. Next-generation sequencing platforms. Annu. Rev. Anal. Chem.

(Palo Alto. Calif). 6, 287–303. doi:10.1146/annurev-anchem-062012-092628

Martens, S., Preuß, A., Matern, U., 2010. Multifunctional flavonoid dioxygenases:

Flavonol and anthocyanin biosynthesis in Arabidopsis thaliana L. Phytochemistry

71, 1040–1049. doi:10.1016/j.phytochem.2010.04.016

Masson, P., Alves, A.C., Ebbels, T.M.D., Nicholson, J.K., Want, E.J., 2010. Optimization

and Evaluation of Metabolite Extraction Protocols for Untargeted Metabolic

Profiling of Liver Samples by UPLC-MS. Anal. Chem. 82, 7779–7786.

doi:10.1021/ac101722e

Matsukura, C., 2016. Functional Genomics and Biotechnology in Solanaceae and

Cucurbitaceae Crops, Biotechnology in Agriculture and Forestry. Springer Berlin

Heidelberg, Berlin, Heidelberg. doi:10.1007/978-3-662-48535-4

McCabe, M., Waters, S., Morris, D., Kenny, D., Lynn, D., Creevey, C., 2012. RNA-seq

analysis of differential gene expression in liver from lactating dairy cows divergent

in negative energy balance. BMC Genomics 13, 193. doi:10.1186/1471-2164-13-

193

Melendez-Martinez, A.J., Nascimento, A.F., Wang, Y., Liu, C., Mao, Y., Wang, X.-D.,

2013. Effect of tomato extract supplementation against high-fat diet-induced hepatic

lesions. Hepatobiliary Surg. Nutr. 2, 198–208.

Mendiburu, F., 2009. Una herramienta de análisis estadístico para la investigación

agrícola. UNI. FIIS.

Mikhak, B., Hunter, D.J., Spiegelman, D., Platz, E.A., Wu, K., Erdman, J.W.,

Giovannucci, E., 2008. Manganese superoxide dismutase (MnSOD) gene

polymorphism, interactions with carotenoid levels and prostate cancer risk.

Carcinogenesis 29, 2335–40. doi:10.1093/carcin/bgn212

Miller, J.C., Tanksley, S.D., 1990. RFLP analysis of phylogenetic relationships and

genetic variation in the genus Lycopersicon. Theor. Appl. Genet. 80, 437–48.

doi:10.1007/BF00226743

Milner, S.E., Brunton, N.P., Jones, P.W., O’Brien, N.M., Collins, S.G., Maguire, A.R.,

2011. Bioactivities of Glycoalkaloids and

Their Aglycones from Solanum Species. J. Agric. Food Chem. 59, 3454–3484.

doi:10.1021/jf200439q

Mintz-Oron, S., Mandel, T., Rogachev, I., Feldberg, L., Lotan, O., Yativ, M., Wang, Z.,

Jetter, R., Venger, I., Adato, A., Aharoni, A., 2008. Gene Expression and

Metabolism in Tomato Fruit Surface Tissues. Plant Physiol. 147, 823–851.

doi:10.1104/pp.108.116004

Misra, R., Mangi, S., Joshi, S., Mittal, S., Gupta, S.K., Pandey, R.M., 2006. LycoRed as

an alternative to hormone replacement therapy in lowering serum lipids and

oxidative stress markers: A randomized controlled clinical trial. J. Obstet. Gynaecol.

Res. 32, 299–304. doi:10.1111/j.1447-0756.2006.00410.x

Moco, S., Bino, R.J., Vorst, O., Verhoeven, H.A., de Groot, J., van Beek, T.A., Vervoort,

J., de Vos, C.H.R., 2006. A liquid chromatography-mass spectrometry-based

metabolome database for tomato. Plant Physiol. 141, 1205–18.

doi:10.1104/pp.106.078428

Moise, A.R., Al-Babili, S., Wurtzel, E.T., 2014. Mechanistic aspects of carotenoid

biosynthesis. Chem. Rev. 114, 164–193. doi:10.1021/cr400106y

Moran, N.E., Erdman, J.W., Clinton, S.K., 2013. Complex interactions between dietary

and genetic factors impact lycopene metabolism and distribution. Arch. Biochem.

Biophys. 539, 171–180. doi:10.1016/j.abb.2013.06.017

Mori, K., Blackshear, P.E., Lobenhofer, E.K., Parker, J.S., Orzech, D.P., Roycroft, J.H.,

Walker, K.L., Johnson, K.A., Marsh, T.A., Irwin, R.D., Boorman, G.A., 2007.

Hepatic transcript levels for genes coding for enzymes associated with xenobiotic

metabolism are altered with age. Toxicol. Pathol. 35, 242–251.

doi:10.1080/01926230601156286

Moxley, J.F., Jewett, M.C., Antoniewicz, M.R., Villas-Boas, S.G., Alper, H., Wheeler,

R.T., Tong, L., Hinnebusch, A.G., Ideker, T., Nielsen, J., Stephanopoulos, G., 2009.

Linking high-resolution metabolic flux phenotypes and transcriptional regulation in

yeast modulated by the global regulator Gcn4p. Proc. Natl. Acad. Sci. U. S. A. 106,

6477–82. doi:10.1073/pnas.0811091106

Muir, S.R., Collins, G.J., Robinson, S., Hughes, S., Bovy, A., Ric De Vos, C.H., van

Tunen, A.J., Verhoeyen, M.E., 2001. Overexpression of petunia chalcone isomerase

in tomato results in fruit containing increased levels of flavonols. Nat. Biotechnol.

19, 470–474. doi:10.1038/88150

Naderi, G.A., Asgary, S., Sarraf-Zadegan, N., Shirvany, H., 2003. Anti-oxidant effect of

flavonoids on the susceptibility of LDL oxidation. Mol. Cell. Biochem. 246, 193–

196.

Nagata, M., Yamashita, I., 1992. Simple method for simultaneous determination of

chlorophyll and carotenoids in tomato fruit. Soc. Food Sci. Technol. (Nippon

Shokuhin Kogyo Gakkaishi) 39, 925–928.

Nakagawa, H., Maeda, S., 2012. Inflammation- and stress-related signaling pathways in

hepatocarcinogenesis. World J. Gastroenterol. 18, 4071–4081.

Nakayasu, M., Shioya, N., Shikata, M., Thagun, C., Abdelkareem, A., Okabe, Y.,

Ariizumi, T., Arimura, G.I., Mizutani, M., Ezura, H., Hashimoto, T., Shoji, T., 2018.

JRE4 is a master transcriptional regulator of defense-related steroidal glycoalkaloids

in tomato. Plant J. 11. doi:10.1111/tpj.13911

Nambara, E., Marion-Poll, A., 2005. Abscisic Acid Biosynthesis and Catabolism. Annu.

Rev. Plant Biol. 56, 165–185. doi:10.1146/annurev.arplant.56.032604.144046

National Center for Health Statistics, 2015. Health, United States, 2014: With Special

Feature on Adults Aged 55-64.

Naugler, W.E., Sakurai, T., Kim, S., Maeda, S., Kim, K., Elsharkawy, A.M., Karin, M.,

2007. Gender disparity in liver cancer due to sex differences in MyD88-dependent

IL-6 production. Science 317, 121–4. doi:10.1126/science.1140485

Nisar, N., Li, L., Lu, S., Khin, N.C., Pogson, B.J., 2015. Carotenoid metabolism in plants.

Mol. Plant 8, 68–82. doi:10.1016/j.molp.2014.12.007

Niyogi, K.K., Grossman, A.R., Björkman, O., 1998. Arabidopsis mutants define a central

role for the xanthophyll cycle in the regulation of photosynthetic energy conversion.

Plant Cell 10, 1121–1134.

Novotny, J.A., Harrison, D.J., Pawlosky, R., Flanagan, V.P., Harrison, E.H., Kurilich,

A.C., 2010. Beta-carotene conversion to vitamin A decreases as the dietary dose

increases in humans. J. Nutr. 140, 915–8. doi:10.3945/jn.109.116947

Nowell, S., Falany, C.N., 2006. Pharmacogenetics of human cytosolic sulfotransferases.

Oncogene 25, 1673–1678. doi:10.1038/sj.onc.1209376

Nusse, R., 2005. Wnt signaling in disease and in development. Cell Res. 15, 28–32.

doi:10.1038/sj.cr.7290260

Ökmen, B., Etalo, D.W., Joosten, M.H.A.J., Bouwmeester, H.J., de Vos, R.C.H.,

Collemare, J., De Wit, P.J.G.M., 2013. Detoxification of α-tomatine by

Cladosporium fulvum is required for full virulence on tomato. New Phytol. 198,

1203–1214. doi:10.1111/nph.12208

Orchard, C., 2014. Naturally Occurring Variation in the Promoter of the Chromoplast-

specific Cyc-B Gene in Tomato can be Used to Modulate Levels of β-carotene in

Ripe Tomato Fruit. The Ohio State University.

Paez, A. V., Pallavicini, C., Schuster, F., Valacco, M.P., Giudice, J., Ortiz, E.G.,

Anselmino, N., Labanca, E., Binaghi, M., Salierno, M., Martí, M.A., Cotignola,

J.H., Woloszynska-Read, A., Bruno, L., Levi, V., Navone, N., Vazquez, E.S.,

Gueron, G., 2016. Heme oxygenase-1 in the forefront of a multi-molecular network

that governs cell–cell contacts and filopodia-induced zippering in prostate cancer.

Cell Death Dis. 7. doi:10.1038/cddis.2016.420

Page, J.M., Harrison, S.A., 2009. NASH and HCC. Clin. Liver Dis. 13, 631–47.

doi:10.1016/j.cld.2009.07.007

Park, E.J., Lee, J.H., Yu, G.-Y., He, G., Ali, S.R., Holzer, R.G., Osterreicher, C.H.,

Takahashi, H., Karin, M., 2010. Dietary and genetic obesity promote liver

inflammation and tumorigenesis by enhancing IL-6 and TNF expression. Cell 140,

197–208. doi:10.1016/j.cell.2009.12.052

Parker, R.S., 1989. Carotenoids in Human Blood and Tissues. J. Nutr. 119, 101–104.

Peano, C., Pietrelli, A., Consolandi, C., Rossi, E., Petiti, L., Tagliabue, L., De Bellis, G.,

Landini, P., 2013. An efficient rRNA removal method for RNA sequencing in GC-

rich bacteria. Microb. Inform. Exp. 3, 1. doi:10.1186/2042-5783-3-1

Peterson, J.J., Dwyer, J.T., Jacques, P.F., McCullough, M.L., 2012. Do Flavonoids

Reduce Cardiovascular Disease Incidence or Mortality in US and European

Populations? Nutr. Rev. 70, 491–508. doi:10.1111/j.1753-4887.2012.00508.x

Pez, F., Lopez, A., Kim, M., Wands, J.R., Caron de Fromentel, C., Merle, P., 2013. Wnt

signaling and hepatocarcinogenesis: molecular targets for the development of

innovative anticancer drugs. J. Hepatol. 59, 1107–17.

doi:10.1016/j.jhep.2013.07.001

Pimentel, H., Bray, N.L., Puente, S., Melsted, P., Pachter, L., 2017. Differential analysis

of RNA-seq incorporating quantification uncertainty. Nat. Methods 14, 687. doi:10.1038/nmeth.4324

Porrini, M., Riso, P., Brusamolino, A., Berti, C., Guarnieri, S., Visioli, F., 2007. Daily

intake of a formulated tomato drink affects carotenoid plasma and lymphocyte

concentrations and improves cellular antioxidant protection. Br. J. Nutr. 93, 93.

doi:10.1079/BJN20041315

Porrini, M., Riso, P., Testolin, G., 1998. Absorption of lycopene from single or daily

portions of raw and processed tomato. Br. J. Nutr. 80, 353–361.

doi:10.1017/S000711459800141X

Powell, A.L.T., Nguyen, C. V, Hill, T., Cheng, K.L., Figueroa-Balderas, R., Aktas, H.,

Ashrafi, H., Pons, C., Fernández-Muñoz, R., Vicente, A., Lopez-Baltazar, J., Barry,

C.S., Liu, Y., Chetelat, R., Granell, A., Van Deynze, A., Giovannoni, J.J., Bennett,

A.B., 2012. Uniform ripening encodes a Golden 2-like transcription factor

regulating tomato fruit chloroplast development. Science 336, 1711–5.

doi:10.1126/science.1222218

Preet, R., Mohapatra, P., Das, D., Satapathy, S.R., Choudhuri, T., Wyatt, M.D., Kundu,

C.N., 2013. Lycopene synergistically enhances quinacrine action to inhibit Wnt-

TCF signaling in breast cancer cells through APC. Carcinogenesis 34, 277–86.

doi:10.1093/carcin/bgs351

Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A., Reich, D.,

2006. Principal components analysis corrects for stratification in genome-wide

association studies. Nat. Genet. 38, 904–909. doi:10.1038/ng1847

R Development Core Team, 2018. R: A language and environment for statistical

computing.

R Development Core Team, 2016. R: A language and environment for statistical

computing.

Raiola, A., Rigano, M.M., Calafiore, R., Frusciante, L., Barone, A., 2014. Enhancing the

Health-Promoting Effects of Tomato Fruit for Biofortified Food. Mediators

Inflamm. 2014. doi:10.1155/2014/139873

Rajendra Prasad, N., Karthikeyan, A., Karthikeyan, S., Reddy, B.V., 2011. Inhibitory

effect of caffeic acid on cancer cell proliferation by oxidative mechanism in human

HT-1080 fibrosarcoma cell line. Mol. Cell. Biochem. 349, 11–19.

doi:10.1007/s11010-010-0655-7

Ramsköld, D., Luo, S., Wang, Y.-C., Li, R., Deng, Q., Faridani, O.R., Daniels, G.A.,

Khrebtukova, I., Loring, J.F., Laurent, L.C., Schroth, G.P., Sandberg, R., 2012. Full-

length mRNA-Seq from single-cell levels of RNA and individual circulating tumor

cells. Nat. Biotechnol. 30, 777–82. doi:10.1038/nbt.2282

Ranc, N., Muños, S., Santoni, S., Causse, M., 2008. A clarified position for Solanum

lycopersicum var. cerasiforme in the evolutionary history of tomatoes (Solanaceae).

BMC Plant Biol. 8, 130. doi:10.1186/1471-2229-8-130

Rando, R.R., 1990. The Chemistry of Vitamin A and Vision. Angew. Chemie Int. Ed.

English 29, 461–480. doi:10.1002/anie.199004611

Rani, A., Murphy, J.J., 2016. STAT5 in Cancer and Immunity. J. Interf. Cytokine Res.

36, 226–237. doi:10.1089/jir.2015.0054

Rantalainen, M., Cloarec, O., Beckonert, O., Wilson, I.D., Jackson, D., Tonge, R.,

Rowlinson, R., Rayner, S., Nickson, J., Wilkinson, R.W., Mills, J.D., Trygg, J.,

Nicholson, J.K., Holmes, E., 2006. Statistically integrated metabonomic-proteomic

studies on a human prostate cancer xenograft model in mice. J. Proteome Res. 5,

2642–55. doi:10.1021/pr060124w

Rao, A.V., Fleshner, N., Agarwal, S., 1999. Serum and Tissue Lycopene and Biomarkers

of Oxidation in Prostate Cancer Patients: A Case-Control Study. Nutr. Cancer 33,

159–164. doi:10.1207/S15327914NC330207

Razifard, H., Ramos, A., Della Valle, A.L., Bodary, C., Goetz, E., Manser, E.J., Li, X.,

Zhang, L., Visa, S., Tieman, D., van der Knaap, E., Caicedo, A.L., 2020. Genomic

Evidence for Complex Domestication History of the Cultivated Tomato in Latin

America. Mol. Biol. Evol. 37, 1118–1132. doi:10.1093/molbev/msz297

Rein, D., Schijlen, E., Kooistra, T., Herbers, K., Verschuren, L., Hall, R., Sonnewald, U.,

Bovy, A., Kleemann, R., 2006. Transgenic flavonoid tomato intake reduces C-

reactive protein in human C-reactive protein transgenic mice more than wild-type

tomato. J. Nutr. 136, 2331–2337. doi:136/9/2331 [pii]

Rick, C.M., 1960. Hybridization between Lycopersicon esculentum and Solanum

pennellii: Phylogenetic and Cytogenetic Significance. Proc. Natl. Acad. Sci. 46, 78–

82. doi:10.1073/pnas.46.1.78

Rick, C.M., Uhlig, J.W., Jones, A.D., 1994. High alpha-tomatine content in ripe fruit of

Andean Lycopersicon esculentum var. cerasiforme: developmental and genetic

aspects. Proc. Natl. Acad. Sci. U. S. A. 91, 12877–81.

Robbins, M.D., Sim, S.-C., Yang, W., Van Deynze, A., van der Knaap, E., Joobeur, T.,

Francis, D.M., 2011. Mapping and linkage disequilibrium analysis with a genome-

wide collection of SNPs that detect polymorphism in cultivated tomato. J Exp Bot

62, 1831–1845.

Robinson, M.D., McCarthy, D.J., Smyth, G.K., 2010. edgeR: a Bioconductor package for

differential expression analysis of digital gene expression data. Bioinformatics 26,

139–40. doi:10.1093/bioinformatics/btp616

Robinson, M.D., Oshlack, A., 2010. A scaling normalization method for differential

expression analysis of RNA-seq data. Genome Biol. doi:10.1186/gb-2010-11-3-r25

Rodríguez-Amaya, D.B., Kimura, M., 2004. HarvestPlus Handbook for Carotenoid

Analysis. International Food Policy Research Institute (IFPRI) and International

Center for Tropical Agriculture (CIAT).

Ronen, G., Carmel-Goren, L., Zamir, D., Hirschberg, J., 2000. An alternative pathway to

beta-carotene formation in plant chromoplasts discovered by map-based cloning of

beta and old-gold color mutations in tomato. Proc. Natl. Acad. Sci. U. S. A. 97,

11102–7. doi:10.1073/pnas.190177497

Ronen, G., Cohen, M., Zamir, D., Hirschberg, J., 1999. Regulation of carotenoid

biosynthesis during tomato fruit development: expression of the gene for lycopene

epsilon cyclase is down-regulated during ripening and is elevated in the mutant

Delta. Plant J. 17, 341–351. doi:10.1046/j.1365-313X.1999.00381.x

Rosso, S.W., 1968. The ultrastructure of chromoplast development in red tomatoes. J.

Ultrastruct. Res. 25, 307–322. doi:10.1016/S0022-5320(68)80076-0

Sadler, G., Davis, J., Dezman, D., 1990. Rapid Extraction of Lycopene and β-Carotene

from Reconstituted Tomato Paste and Pink Grapefruit Homogenates. J. Food Sci.

55, 1460–1461. doi:10.1111/j.1365-2621.1990.tb03958.x

Sander, L.C., Sharpless, K.E., Craft, N.E., Wise, S.A., 1994. Development of Engineered

Stationary Phases for the Separation of Carotenoid Isomers. Anal. Chem. 66, 1667–1674.

doi:10.1021/ac00082a012

Sari, E., 2016. The Effects of CYC-B Introgressions on Cherry Tomato Fruit Quality.

The Ohio State University.

Sato, S., Tabata, S., Hirakawa, H., Asamizu, E., Shirasawa, K., Isobe, S., Kaneko, T.,

Nakamura, Y., Shibata, D., Aoki, K., Egholm, M., Knight, J., Bogden, R., Li,

Changbao, Shuang, Y., Xu, X., Pan, S., Cheng, S., Liu, X., Ren, Y., Wang, J.,

Albiero, A., Dal Pero, F., Todesco, S., Van Eck, J., Buels, R.M., Bombarely, A.,

Gosselin, J.R., Huang, M., Leto, J.A., Menda, N., Strickler, S., Mao, L., Gao, S.,

Tecle, I.Y., York, T., Zheng, Y., Vrebalov, J.T., Lee, J., Zhong, S., Mueller, L.A.,

Stiekema, W.J., Ribeca, P., Alioto, T., Yang, W., Huang, Sanwen, Du, Y., Zhang,

Z., Gao, Jianchang, Guo, Y., Wang, Xiaoxuan, Li, Y., He, J., Li, Chuanyou, Cheng,

Z., Zuo, J., Ren, J., Zhao, J., Yan, L., Jiang, H., Wang, B., Li, H., Li, Z., Fu, F.,

Chen, B., Han, B., Feng, Q., Fan, D., Wang, Ying, Ling, H., Xue, Y., Ware, D.,

Richard McCombie, W., Lippman, Z.B., Chia, J.M., Jiang, K., Pasternak, S., Gelley,

L., Kramer, M., Anderson, L.K., Chang, S. Bin, Royer, S.M., Shearer, L.A., Stack,

S.M., Rose, J.K.C., Xu, Y., Eannetta, N., Matas, A.J., McQuinn, R., Tanksley, S.D.,

Camara, F., Guigó, R., Rombauts, S., Fawcett, J., Van De Peer, Y., Zamir, D.,

Liang, C., Spannagl, M., Gundlach, H., Bruggmann, R., Mayer, K., Jia, Z., Zhang,

J., Ye, Z., Bishop, G.J., Butcher, S., Lopez-Cobollo, R., Buchan, D., Filippis, I.,

Abbott, J., Dixit, I.R., Singh, M., Singh, A., Pal, J.K., Pandit, A., Singh, P.K.,

Mahato, A.K., Dogra, V., Gaikwad, K., Sharma, T.R., Mohapatra, T., Singh, N.K.,

Causse, M., Rothan, C., Noirot, C., Bellec, A., Klopp, C., Delalande, C., Berges, H.,

Mariette, J., Frasse, P., Vautrin, S., Zouine, T.M., Latché, A., Rousseau, C., Regad,

F., Pech, J.C., Philippot, M., Bouzayen, M., Pericard, P., Osorio, S., Del Carmen,

A.F., Monforte, A., Granell, A., Fernandez-Muñoz, R., Conte, M., Lichtenstein, G.,

Carrari, F., De Bellis, G., Fuligni, F., Peano, C., Grandillo, S., Termolino, P.,

Pietrella, M., Fantini, E., Falcone, G., Fiore, A., Giuliano, G., Lopez, L., Facella, P.,

Perrotta, G., Daddiego, L., Bryan, G., Orozco, B.M., Pastor, X., Torrents, D., Van

Schriek, M.G.M., Feron, R.M.C., Van Oeveren, J., De Heer, P., Da Ponte, L.,

Jacobs-Oomen, S., Cariaso, M., Prins, M., Van Eijk, M.J.T., Janssen, A., Van

Haaren, J.J., Jo, S.-H., Kim, J., Kwon, S.Y., Kim, Sangmi, Koo, D.H., Lee, S.,

Clouser, C., Rico, A., Hallab, A., Gebhardt, C., Klee, K., Jöcker, A., Warfsmann, J.,

Göbel, U., Kawamura, S., Yano, K., Sherman, J.D., Fukuoka, H., Negoro, S.,

Bhutty, S., Chowdhury, P., Chattopadhyay, D., Datema, E., Smit, S., Schijlen,

E.G.W.M., Van De Belt, J., Van Haarst, J.C., Peters, S.A., Van Staveren, M.J.,

Henkens, M.H.C., Mooyman, P.J.W., Hesselink, T., Van Ham, R.C.H.J., Jiang, G.,

Droege, M., Choi, D., Kang, B.C., Kim, B.D., Park, M., Kim, Seungill, Yeom, S.I.,

Lee, Y.H., Choi, Y. Do, Li, G., Gao, Jianwei, Liu, Y., Huang, Shengxiong,

Fernandez-Pedrosa, V., Collado, C., Zuñiga, S., Wang, G., Cade, R., Dietrich, R.A.,

Rogers, J., Knapp, S., Fei, Z., White, R.A., Thannhauser, T.W., Giovannoni, J.J.,

Botella, M.A., Gilbert, L., Gonzalez, F.R., Goicoechea, J.L., Yu, Y., Kudrna, D.,

Collura, K., Wissotski, M., Wing, R., Meyers, B.C., Gurazada, A.B., Green, P.J.,

Mathur, S., Vyas, S., Solanke, A.U., Kumar, R., Gupta, V., Sharma, A.K., Khurana,

P., Khurana, J.P., Tyagi, A.K., Dalmay, T., Mohorianu, I., Walts, B., Chamala, S.,

Barbazuk, W.B., Li, J., Guo, H., Lee, T.H., Wang, Yupeng, Zhang, D., Paterson,

A.H., Wang, Xiyin, Tang, H., Barone, A., Chiusano, M.L., Ercolano, M.R.,

D’Agostino, N., Di Filippo, M., Traini, A., Sanseverino, W., Frusciante, L.,

Seymour, G.B., Elharam, M., Fu, Y., Hua, A., Kenton, S., Lewis, J., Lin, S., Najar,

F., Lai, H., Qin, B., Shi, R., Qu, C., White, D., White, J., Xing, Y., Yang, K., Yi, J.,

Yao, Z., Zhou, L., Roe, B.A., Vezzi, A., D’Angelo, M., Zimbello, R., Schiavon, R.,

Caniato, E., Rigobello, C., Campagna, D., Vitulo, N., Valle, G., Nelson, D.R., De

Paoli, E., Szinay, D., De Jong, H.H., Bai, Y., Visser, R.G.F., Lankhorst, R.K.,

Beasley, H., McLaren, K., Nicholson, C., Riddle, C., Gianese, G., 2012. The tomato

genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641.

doi:10.1038/nature11119

Schlösser, E., Gottlieb, D., 1966. Mode of hemolytic action of the antifungal polyene

antibiotic filipin. Zeitschrift fur Naturforsch. - Sect. B J. Chem. Sci. 21, 74–77.

doi:10.1515/znb-1966-0120

Schuck, R.N., Zha, W., Edin, M.L., Gruzdev, A., Vendrov, K.C., Miller, T.M., Xu, Z.,

Lih, F.B., DeGraff, L.M., Tomer, K.B., Jones, H.M., Makowski, L., Huang, L.,

Poloyac, S.M., Zeldin, D.C., Lee, C.R., 2014. The cytochrome p450 epoxygenase

pathway regulates the hepatic inflammatory response in fatty liver disease. PLoS

One 9. doi:10.1371/journal.pone.0110162

Schwahn, K., de Souza, L.P., Fernie, A.R., Tohge, T., 2014. Metabolomics-assisted

refinement of the pathways of steroidal glycoalkaloid biosynthesis in the tomato

clade. J. Integr. Plant Biol. 56, 864–875. doi:10.1111/jipb.12274

Schwartz, S.H., 1997. Specific Oxidative Cleavage of Carotenoids by VP14 of Maize.

Science 276, 1872–1874. doi:10.1126/science.276.5320.1872

Schweiggert, R.M.M., Carle, R., 2017. Carotenoid deposition in plant and animal foods

and its impact on bioavailability. Crit. Rev. Food Sci. Nutr. 57, 1807–1830.

doi:10.1080/10408398.2015.1012756

Sérino, S., Gomez, L., Costagliola, G., Gautier, H., 2009. HPLC Assay of Tomato

Carotenoids: Validation of a Rapid Microextraction Technique. J. Agric. Food

Chem. 57, 8753–8760. doi:10.1021/jf902113n

Shen, R., Mo, Q., Schultz, N., Seshan, V.E., Olshen, A.B., Huse, J., Ladanyi, M., Sander,

C., 2012. Integrative subtype discovery in glioblastoma using iCluster. PLoS One 7,

e35236. doi:10.1371/journal.pone.0035236

Sheng, K., Cao, W., Niu, Y., Deng, Q., Zong, C., 2017. Effective detection of variation in

single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270.

doi:10.1038/nmeth.4145

Shifman, A.R., Johnson, R.M., Wilhelm, B.T., 2016. Cascade: an RNA-seq visualization

tool for cancer genomics. BMC Genomics 17, 75. doi:10.1186/s12864-016-2389-8

Shishkin, A.A., Giannoukos, G., Kucukural, A., Ciulla, D., Busby, M., Surka, C., Chen,

J., Bhattacharyya, R.P., Rudy, R.F., Patel, M.M., Novod, N., Hung, D.T., Gnirke,

A., Garber, M., Guttman, M., Livny, J., 2015. Simultaneous generation of many

RNA-seq libraries in a single reaction. Nat. Methods 12, 323–5.

doi:10.1038/nmeth.3313

Siddiqui, J.K., Baskin, E., Liu, M., Cantemir-Stone, C.Z., Zhang, B., Bonneville, R.,

McElroy, J.P., Coombes, K.R., Mathé, E.A., 2018. IntLIM: integration using linear

models of metabolomics and gene expression data. BMC Bioinformatics 19, 81.

doi:10.1186/s12859-018-2085-6

Siegel, R., Naishadham, D., Jemal, A., 2012. Cancer statistics, 2012. CA. Cancer J. Clin.

62, 10–29. doi:10.3322/caac.20138

Silva, F.A., Borges, F., Guimarães, C., Lima, J.L., Matos, C., Reis, S., 2000. Phenolic

acids and derivatives: studies on the relationship among structure, radical

scavenging activity, and physicochemical parameters. J. Agric. Food Chem. 48,

2122–2126.

Sim, S.-C., Durstewitz, G., Plieske, J., Wieseke, R., Ganal, M.W., Van Deynze, A.,

Hamilton, J.P., Buell, C.R., Causse, M., Wijeratne, S., Francis, D.M., 2012a.

Development of a large SNP genotyping array and generation of high-density genetic

maps in tomato. PLoS One 7, e40563. doi:10.1371/journal.pone.0040563

Sim, S.-C., Van Deynze, A., Stoffel, K., Douches, D.S., Zarka, D., Ganal, M.W.,

Chetelat, R.T., Hutton, S.F., Scott, J.W., Gardner, R.G., Panthee, D.R., Mutschler,

M., Myers, J.R., Francis, D.M., 2012b. High-Density SNP Genotyping of Tomato

(Solanum lycopersicum L.) Reveals Patterns of Genetic Variation Due to Breeding.

PLoS One 7, e45520. doi:10.1371/journal.pone.0045520

Sim, S.C., Robbins, M.D., Wijeratne, S., Wang, H., Yang, W., Francis, D.M., 2015.

Association analysis for bacterial spot resistance in a directionally selected complex

breeding population of tomato. Phytopathology. doi:10.1094/PHYTO-02-15-0051-R

Slimestad, R., Fossen, T., Verheul, M.J., 2008. The Flavonoids of Tomatoes. J. Agric.

Food Chem. 56, 2436–2441. doi:10.1021/jf073434n

Smith, A.F., 1994. The Tomato in America: Early History, Culture, and Cookery.

University of Illinois Press.

Sonawane, P.D., Heinig, U., Panda, S., Gilboa, N.S., Yona, M., Pradeep Kumar, S.,

Alkan, N., Unger, T., Bocobza, S., Pliner, M., Malitsky, S., Tkachev, M., Meir, S.,

Rogachev, I., Aharoni, A., 2018. Short-chain dehydrogenase/reductase governs

steroidal specialized metabolites structural diversity and toxicity in the genus

Solanum. Proc. Natl. Acad. Sci. U. S. A. 115, E5419–E5428.

doi:10.1073/pnas.1804835115

Spencer, J.P.E., 2009a. Flavonoids and brain health: multiple effects underpinned by

common mechanisms. Genes Nutr. 4, 243–250. doi:10.1007/s12263-009-0136-3

Spencer, J.P.E., 2009b. The impact of flavonoids on memory: physiological and

molecular considerations. Chem. Soc. Rev. 38, 1152–1161. doi:10.1039/B800422F

Srinivasan, M., Sudheer, A.R., Menon, V.P., 2007. Ferulic Acid: Therapeutic Potential

Through Its Antioxidant Property. J. Clin. Biochem. Nutr. 40,

92–100. doi:10.3164/jcbn.40.92

Stahl, W., Heinrich, U., Jungmann, H., Sies, H., Tronnier, H., 2000. Carotenoids and

carotenoids plus vitamin E protect against ultraviolet light–induced erythema in

humans. Am. J. Clin. Nutr. 71, 795–798.

Stahl, W., Heinrich, U., Wiseman, S., Eichler, O., Sies, H., Tronnier, H., 2001. Dietary

Tomato Paste Protects against Ultraviolet Light–Induced Erythema in Humans. J.

Nutr. 131, 1449–1451.

Stahl, W., Schwarz, W., Sundquist, A.R., Sies, H., 1992. cis-trans isomers of lycopene

and β-carotene in human serum and tissues. Arch. Biochem. Biophys. 294, 173–177.

doi:10.1016/0003-9861(92)90153-N

Stahl, W., Sies, H., 1992. Uptake of lycopene and its geometrical isomers is greater from

heat-processed than from unprocessed tomato juice in humans. J. Nutr. 122, 2161–6.

Stalmach, A., Steiling, H., Williamson, G., Crozier, A., 2010. Bioavailability of

chlorogenic acids following acute ingestion of coffee by humans with an ileostomy.

Arch. Biochem. Biophys. 501, 98–105. doi:10.1016/j.abb.2010.03.005

Stewart, A.J., Bozonnet, S., Mullen, W., Jenkins, G.I., Lean, M.E.J., Crozier, A., 2000.

Occurrence of flavonols in tomatoes and tomato-based products. J. Agric. Food

Chem. 48, 2663–2669. doi:10.1021/jf000070p

Stommel, J., Abbott, J.A., Saftner, R.A., Camp, M.J., 2005. Sensory and Objective

Quality Attributes of Beta-carotene and Lycopene-rich Tomato Fruit. J. Am. Soc.

Hortic. Sci. 130, 244–251.

Stommel, J.R., 2001. USDA 97L63, 97L66, and 97L97: Tomato breeding lines with high

fruit beta-carotene content. HortScience 36, 387–388.

Stommel, J.R., Haynes, K.G., 1994. Inheritance of beta carotene content in the wild

tomato species Lycopersicon cheesmanii. J. Hered. 85, 401–404.

Story, E.N., Kopec, R.E., Schwartz, S.J., Harris, G.K., 2010. An update on the health

effects of tomato lycopene. Annu. Rev. Food Sci. Technol. 1, 189–210.

doi:10.1146/annurev.food.102308.124120

Sumner, L.W., Amberg, A., Barrett, D., Beale, M.H., Beger, R., Daykin, C.A., Fan,

T.W.-M., Fiehn, O., Goodacre, R., Griffin, J.L., Hankemeier, T., Hardy, N., Harnly,

J., Higashi, R., Kopka, J., Lane, A.N., Lindon, J.C., Marriott, P., Nicholls, A.W.,

Reily, M.D., Thaden, J.J., Viant, M.R., 2007. Proposed minimum reporting

standards for chemical analysis Chemical Analysis Working Group (CAWG)

Metabolomics Standards Initiative (MSI). Metabolomics 3, 211–221.

doi:10.1007/s11306-007-0082-2

Sun, B., Karin, M., 2012. Obesity, inflammation, and liver cancer. J. Hepatol. 56, 704–

13. doi:10.1016/j.jhep.2011.09.020

Sun, T.P., Kamiya, Y., 1994. The Arabidopsis GA1 locus encodes the cyclase ent-

kaurene synthetase A of gibberellin biosynthesis. Plant Cell 6, 1509–1518.

doi:10.1105/tpc.6.10.1509

Sun, X.M., Yu, X.P., Liu, Y., Xu, L., Di, D.L., 2012. Combining bootstrap and

uninformative variable elimination: Chemometric identification of metabonomic

biomarkers by nonparametric analysis of discriminant partial least squares.

Chemom. Intell. Lab. Syst. 115, 37–43. doi:10.1016/j.chemolab.2012.04.006

Sutherland, W.H., Walker, R.J., De Jong, S.A., Upritchard, J.E., 1999. Supplementation

with tomato juice increases plasma lycopene but does not alter susceptibility to

oxidation of low-density lipoproteins from renal transplant recipients. Clin. Nephrol.

52, 30–6.

Szymańska, E., Saccenti, E., Smilde, A.K., Westerhuis, J.A., 2012. Double-check:

validation of diagnostic statistics for PLS-DA models in metabolomics studies.

Metabolomics 8, 3–16. doi:10.1007/s11306-011-0330-3

Tajner-Czopek, A., Rytel, E., Aniołowska, M., Hamouz, K., 2014. The influence of

French fries processing on the glycoalkaloid content in coloured-fleshed potatoes.

Eur. Food Res. Technol. 238, 895–904. doi:10.1007/s00217-014-2163-6

Tan, H.-L., Moran, N.E., Cichon, M.J., Riedl, K.M., Schwartz, S.J., Erdman, J.W., Pearl,

D.K., Thomas-Ahner, J.M., Clinton, S.K., 2014. β-Carotene-9’,10’-oxygenase status

modulates the impact of dietary tomato and lycopene on hepatic nuclear receptor-,

stress-, and metabolism-related gene expression in mice. J. Nutr. 144, 431–9.

doi:10.3945/jn.113.186676

Tan, H.-L., Thomas-Ahner, J.M., Moran, N.E., Cooperstone, J.L., Erdman, J.W., Young,

G.S., Clinton, S.K., 2017. β-Carotene 9′,10′ Oxygenase Modulates the Anticancer

Activity of Dietary Tomato or Lycopene on Prostate Carcinogenesis in the TRAMP

Model. Cancer Prev. Res. 10.

Tan, H.-L., Thomas-Ahner, J.M., Moran, N.E., Cooperstone, J.L., Erdman, J.W., Young,

G.S., Clinton, S.K., 2016. β-carotene 9’,10’ oxygenase Modulates the Anti-cancer

Activity of Dietary Tomato or Lycopene on Prostate Carcinogenesis in the TRAMP

Model. Cancer Prev. Res.

Tanaka, T., Shnimizu, M., Moriwaki, H., 2012. Cancer chemoprevention by carotenoids.

Molecules 17, 3202–42. doi:10.3390/molecules17033202

Thévenot, E.A., Roux, A., Xu, Y., Ezan, E., Junot, C., 2015. Analysis of the Human

Adult Urinary Metabolome Variations with Age, Body Mass Index, and Gender by

Implementing a Comprehensive Workflow for Univariate and OPLS Statistical

Analyses. J. Proteome Res. doi:10.1021/acs.jproteome.5b00354

Toledo-Ortiz, G., Huq, E., Rodríguez-Concepción, M., 2010. Direct regulation of

phytoene synthase gene expression and carotenoid biosynthesis by phytochrome-

interacting factors. Proc. Natl. Acad. Sci. U. S. A. 107, 11626–11631.

doi:10.1073/pnas.0914428107

Tomes, M., Quackenbush, F., Jr, O.N., North, B., 1953. The inheritance of carotenoid

pigment systems in the tomato. Genetics 117–127.

Tomes, M.L., Quackenbush, F.W., McQuistan, M., 1954. Modification and Dominance

of the Gene Governing Formation of High Concentrations of BETA-Carotene in the

Tomato. Genetics 39, 810–7.

Tonucci, L.H., Holden, J.M., Beecher, G.R., Khachik, F., Davis, C.S., Mulokozi, G.,

1995. Carotenoid Content of Thermally Processed Tomato-Based Food Products. J.

Agric. Food Chem. 43, 579–586. doi:10.1021/jf00051a005

Torres, C.A., Andrews, P.K., Davies, N.M., 2006. Physiological and biochemical

responses of fruit exocarp of tomato (Lycopersicon esculentum Mill.) mutants to

natural photo-oxidative conditions. J. Exp. Bot. 57, 1933–1947.

doi:10.1093/jxb/erj136

Trapnell, C., Pachter, L., Salzberg, S.L., 2009. TopHat: discovering splice junctions with

RNA-Seq. Bioinformatics 25, 1105–11. doi:10.1093/bioinformatics/btp120

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H.,

Salzberg, S.L., Rinn, J.L., Pachter, L., 2012. Differential gene and transcript

expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat.

Protoc. 7, 562–78. doi:10.1038/nprot.2012.016

Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J.,

Salzberg, S.L., Wold, B.J., Pachter, L., 2010. Transcript assembly and quantification

by RNA-Seq reveals unannotated transcripts and isoform switching during cell

differentiation. Nat. Biotechnol. 28, 511–5. doi:10.1038/nbt.1621

USDA-ERS, 2018. USDA ERS - Food Availability (Per Capita) Data System [WWW

Document]. URL https://www.ers.usda.gov/data-products/food-availability-per-

capita-data-system/ (accessed 6.26.20).

USDA National Agricultural Statistics, 2012. Statistics of vegetables and melons.

Vallverdú-Queralt, A., Medina-Remón, A., Casals-Ribes, I., Andres-Lacueva, C.,

Waterhouse, A.L., Lamuela-Raventos, R.M., 2012. Effect of tomato industrial

processing on phenolic profile and hydrophilic antioxidant capacity. LWT - Food

Sci. Technol. 47, 154–160. doi:10.1016/j.lwt.2011.12.020

Van Berloo, R., Zhu, A., Ursem, R., Verbakel, H., Gort, G., Van Eeuwijk, F.A., 2008.

Diversity and linkage disequilibrium analysis within a selected set of cultivated

tomatoes. Theor. Appl. Genet. 117, 89–101. doi:10.1007/s00122-008-0755-x

van den Berg, R.A., Hoefsloot, H.C.J., Westerhuis, J.A., Smilde, A.K., van der Werf,

M.J., 2006. Centering, scaling, and transformations: Improving the biological

information content of metabolomics data. BMC Genomics 7, 142.

doi:10.1186/1471-2164-7-142

Vogt, T., 2010. Phenylpropanoid Biosynthesis. Mol. Plant 3, 2–20.

doi:10.1093/mp/ssp106

Wan, L., Tan, H.L., Thomas-Ahner, J.M., Pearl, D.K., Erdman, J.W., Moran, N.E.,

Clinton, S.K., 2014. Dietary tomato and lycopene impact androgen signaling- and

carcinogenesis-related gene expression during early TRAMP prostate

carcinogenesis. Cancer Prev. Res. 7, 1228–1239. doi:10.1158/1940-6207.CAPR-14-0182

Wang, C.C., Meng, L.H., Gao, Y., Grierson, D., Fu, D.Q., 2018. Manipulation of light

signal transduction factors as a means of modifying steroidal glycoalkaloids

accumulation in tomato leaves. Front. Plant Sci. doi:10.3389/fpls.2018.00437

Wang, M., Carver, J.J., Phelan, V. V., Sanchez, L.M., Garg, N., Peng, Y., Nguyen, D.D.,

Watrous, J., Kapono, C.A., Luzzatto-Knaan, T., Porto, C., Bouslimani, A., Melnik,

A. V., Meehan, M.J., Liu, W.T., Crüsemann, M., Boudreau, P.D., Esquenazi, E.,

Sandoval-Calderón, M., Kersten, R.D., Pace, L.A., Quinn, R.A., Duncan, K.R., Hsu,

C.C., Floros, D.J., Gavilan, R.G., Kleigrewe, K., Northen, T., Dutton, R.J., Parrot,

D., Carlson, E.E., Aigle, B., Michelsen, C.F., Jelsbak, L., Sohlenkamp, C., Pevzner,

P., Edlund, A., McLean, J., Piel, J., Murphy, B.T., Gerwick, L., Liaw, C.C., Yang,

Y.L., Humpf, H.U., Maansson, M., Keyzers, R.A., Sims, A.C., Johnson, A.R.,

Sidebottom, A.M., Sedio, B.E., Klitgaard, A., Larson, C.B., Boya, C.A.P., Torres-

Mendoza, D., Gonzalez, D.J., Silva, D.B., Marques, L.M., Demarque, D.P., Pociute,

E., O’Neill, E.C., Briand, E., Helfrich, E.J.N., Granatosky, E.A., Glukhov, E.,

Ryffel, F., Houson, H., Mohimani, H., Kharbush, J.J., Zeng, Y., Vorholt, J.A.,

Kurita, K.L., Charusanti, P., McPhail, K.L., Nielsen, K.F., Vuong, L., Elfeki, M.,

Traxler, M.F., Engene, N., Koyama, N., Vining, O.B., Baric, R., Silva, R.R.,

Mascuch, S.J., Tomasi, S., Jenkins, S., Macherla, V., Hoffman, T., Agarwal, V.,

Williams, P.G., Dai, J., Neupane, R., Gurr, J., Rodríguez, A.M.C., Lamsa, A.,

Zhang, C., Dorrestein, K., Duggan, B.M., Almaliti, J., Allard, P.M., Phapale, P.,

Nothias, L.F., Alexandrov, T., Litaudon, M., Wolfender, J.L., Kyle, J.E., Metz, T.O.,

Peryea, T., Nguyen, D.T., VanLeer, D., Shinn, P., Jadhav, A., Müller, R., Waters,

K.M., Shi, W., Liu, X., Zhang, L., Knight, R., Jensen, P.R., Palsson, B., Pogliano,

K., Linington, R.G., Gutiérrez, M., Lopes, N.P., Gerwick, W.H., Moore, B.S.,

Dorrestein, P.C., Bandeira, N., 2016. Sharing and community curation of mass

spectrometry data with Global Natural Products Social Molecular Networking. Nat.

Biotechnol. doi:10.1038/nbt.3597

Wang, Y., Ausman, L.M., Greenberg, A.S., Russell, R.M., Wang, X.-D., 2010. Dietary

lycopene and tomato extract supplementations inhibit nonalcoholic steatohepatitis-

promoted hepatocarcinogenesis in rats. Int. J. Cancer 126, 1788–96.

doi:10.1002/ijc.24689

Wang, Z., Gerstein, M., Snyder, M., 2009. RNA-Seq: a revolutionary tool for

transcriptomics. Nat. Rev. Genet. 10, 57–63. doi:10.1038/nrg2484

Wei, M.Y., Giovannucci, E.L., 2012. Lycopene, Tomato Products, and Prostate Cancer

Incidence: A Review and Reassessment in the PSA Screening Era. J. Oncol. 2012,

e271063. doi:10.1155/2012/271063

Wei, T., Simko, V., 2017. R package “corrplot”: Visualization of a Correlation Matrix.

Wertz, K., 2009. Lycopene effects contributing to prostate health. Nutr. Cancer 61, 775–

783. doi:10.1080/01635580903285023

Westerhuis, J.A., Hoefsloot, H.C.J., Smit, S., Vis, D.J., Smilde, A.K., Velzen, E.J.J.,

Duijnhoven, J.P.M., Dorsten, F.A., 2008. Assessment of PLSDA cross validation.

Metabolomics 4, 81–89. doi:10.1007/s11306-007-0099-6

Westermann, A.J., Gorski, S.A., Vogel, J., 2012. Dual RNA-seq of pathogen and host.

Nat. Rev. Microbiol. 10, 618–30. doi:10.1038/nrmicro2852

White, W.S., Kim, C.I., Kalkwarf, H.J., Bustos, P., Roe, D.A., 1988. Ultraviolet light-

induced reductions in plasma carotenoid levels. Am. J. Clin. Nutr. 47, 879–883.

doi:10.1093/ajcn/47.5.879

Wickham, H., 2016. ggplot2: Elegant Graphics for Data Analysis (Use R!).

Springer-Verlag, New York, NY. doi:10.1007/978-0-387-98141-3

Wickham, H., 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New

York, NY.

Williams, C.R., Baccarella, A., Parrish, J.Z., Kim, C.C., 2016. Trimming of sequence

reads alters RNA-Seq gene expression estimates. BMC Bioinformatics 17, 103.

doi:10.1186/s12859-016-0956-2

Williamson, G., Clifford, M.N., 2010. Colonic metabolites of berry polyphenols: The

missing link to biological activity? Br. J. Nutr. 104, S48–S66.

doi:10.1017/S0007114510003946

Wishart, D.S., 2008. Metabolomics: applications to food science and nutrition research.

Trends Food Sci. Technol. doi:10.1016/j.tifs.2008.03.003

Wishart, D.S., Jewison, T., Guo, A.C., Wilson, M., Knox, C., Liu, Y., Djoumbou, Y.,

Mandal, R., Aziat, F., Dong, E., Bouatra, S., Sinelnikov, I., Arndt, D., Xia, J., Liu,

P., Yallou, F., Bjorndahl, T., Perez-Pineiro, R., Eisner, R., Allen, F., Neveu, V.,

Greiner, R., Scalbert, A., 2013. HMDB 3.0-The Human Metabolome Database in

2013. Nucleic Acids Res. doi:10.1093/nar/gks1065

Wold, S., Geladi, P., Esbensen, K., Öhman, J., 1987. Multi-way principal components-

and PLS-analysis. J. Chemom. 1, 41–56. doi:10.1002/cem.1180010107

Wong, C.-M., Man-Fong Lee, J., Ching, Y.-P., Jin, D.-Y., Oi-lin Ng, I., 2003. Genetic

and Epigenetic Alterations of DLC-1 Gene in Hepatocellular Carcinoma.

Wu, A.R., Neff, N.F., Kalisky, T., Dalerba, P., Treutlein, B., Rothenberg, M.E., Mburu,

F.M., Mantalas, G.L., Sim, S., Clarke, M.F., Quake, S.R., 2014. Quantitative

assessment of single-cell RNA-sequencing methods. Nat. Methods 11, 41–6.

doi:10.1038/nmeth.2694

Wu, K., Erdman, J.W., Schwartz, S.J., Platz, E. a, Leitzmann, M., Clinton, S.K., DeGroff,

V., Willett, W.C., Giovannucci, E., 2004. Plasma and Dietary Carotenoids, and the

Risk of Prostate Cancer: A Nested Case-Control Study. Cancer Epidemiol.

Biomarkers Prev. 13, 260–269. doi:10.1158/1055-9965.EPI-03-0012

Wu, X., Li, Y., 2012. Signaling Pathways in Liver Cancer, in: Liver Tumors. pp. 37–58.

Xia, H., Liu, C., Li, C.C., Fu, M., Takahashi, S., Hu, K.Q., Aizawa, K., Hiroyuki, S., Wu,

G., Zhao, L., Wang, X.D., 2018. Dietary tomato powder inhibits high-fat diet-

promoted hepatocellular carcinoma with alteration of gut microbiota in mice lacking

carotenoid cleavage enzymes. Cancer Prev. Res. doi:10.1158/1940-6207.CAPR-18-

0188

Yahara, S., Uda, N., Yoshio, E., Yae, E., 2004. Steroidal Alkaloid Glycosides from

Tomato (Lycopersicon esculentum). J. Nat. Prod. 67, 500–502.

doi:10.1021/np030382x

Yamanaka, T., Vincken, J.-P., Zuilhof, H., Legger, A., Takada, N., Gruppen, H., 2009.

C22 Isomerization in α-Tomatine-to-Esculeoside A Conversion during Tomato

Ripening Is Driven by C27 Hydroxylation of Triterpenoidal Skeleton. J. Agric. Food

Chem. 57, 3786–3791. doi:10.1021/jf900017n

Yardy, G.W., Brewster, S.F., 2005. Wnt signalling and prostate cancer. Prostate Cancer

Prostatic Dis. 8, 119–26. doi:10.1038/sj.pcan.4500794

Yonekura-Sakakibara, K., Saito, K., 2009.

Functional genomics for plant biosynthesis. Nat. Prod. Rep. 26.

Yoon, S., Nam, D., 2017. Gene dispersion is the key determinant of the read count bias in

differential expression analysis of RNA-seq data. BMC Genomics 18, 408.

doi:10.1186/s12864-017-3809-0

Yu, G., Li, C., Zhang, L., Zhu, G., Munir, S., Shi, C., Zhang, H., Ai, G., Gao, S., Zhang,

Y., Yang, C., Zhang, J., Li, H., Ye, Z., 2020. An allelic variant in GAME9

determines its binding capacity to the GAME17 promoter in the regulation of steroidal glycoalkaloid biosynthesis in tomato. J. Exp. Bot. 9, 2527–2536.

doi:10.1093/jxb/eraa014

Yu, J., Pressoir, G., Briggs, W.H., Bi, I.V., Yamasaki, M., Doebley, J.F., McMullen,

M.D., Gaut, B.S., Nielsen, D.M., Holland, J.B., Kresovich, S., Buckler, E.S., 2006.

A unified mixed-model method for association mapping that accounts for multiple

levels of relatedness. Nat. Genet. 38, 203–208. doi:10.1038/ng1702

Yu, M.-W., Chiu, Y.-H., Chiang, Y.-C., Chen, C.-H., Lee, T.-H., Santella, R.M., Chern,

H.-D., Liaw, Y.-F., Chen, C.-J., 1999. Plasma Carotenoids, Glutathione S-

Transferase M1 and T1 Genetic Polymorphisms, and Risk of Hepatocellular

Carcinoma: Independent and Interactive Effects. Am. J. Epidemiol. 149, 621–629.

doi:10.1093/oxfordjournals.aje.a009862

Yu, Y., Fuscoe, J.C., Zhao, C., Guo, C., Jia, M., Qing, T., Bannon, D.I., Lancashire, L.,

Bao, W., Du, T., Luo, H., Su, Z., Jones, W.D., Moland, C.L., Branham, W.S., Qian, F.,

Ning, B., Li, Y., Hong, H., Guo, L., Mei, N., Shi, T., Wang, K.Y., Wolfinger, R.D.,

Nikolsky, Y., Walker, S.J., Duerksen-Hughes, P., Mason, C.E., Tong, W.,

Thierry-Mieg, J., Thierry-Mieg, D., Shi, L., Wang, C., 2014. A rat RNA-Seq transcriptomic BodyMap across 11 organs

and 4 developmental stages. Nat. Commun. 5, 514–524. doi:10.1038/ncomms4230

Zamir, D., 2001. Improving plant breeding with exotic genetic libraries. Nat. Rev. Genet.

2, 983–989. doi:10.1038/35103590

Zechmeister, L., LeRose, A.L., Went, F.W., Pauling, L., 1941. Prolycopene, a Naturally

Occurring Stereoisomer of Lycopene. Proc. Natl. Acad. Sci. 27, 468–474.

Zhang, M., Yuan, B., Leng, P., 2009. Cloning of 9-cis-epoxycarotenoid dioxygenase

(NCED) gene and the role of ABA on fruit ripening. Plant Signal. Behav. 4, 460–3.

doi:10.1093/jxb/erp026

Zhang, Y.K.J., Yeager, R.L., Klaassen, C.D., 2009. Circadian expression profiles of

drug-processing genes and transcription factors in mouse liver. Drug Metab. Dispos.

37, 106–115. doi:10.1124/dmd.108.024174

Zhu, G., Wang, S., Huang, Z., Zhang, S., Liao, Q., Zhang, C., Lin, T., Qin, M., Peng, M.,

Yang, C., Cao, X., Han, X., Wang, X., van der Knaap, E., Zhang, Z., Cui, X., Klee,

H., Fernie, A.R., Luo, J., Huang, S., 2018. Rewiring of the Fruit Metabolome in

Tomato Breeding. Cell 172, 249-261.e12. doi:10.1016/j.cell.2017.12.019

Zoratti, L., Karppinen, K., Luengo Escobar, A., Häggman, H., Jaakola, L., 2014. Light-controlled

flavonoid biosynthesis in fruits. Front. Plant Sci. 5.

doi:10.3389/fpls.2014.00534

Appendix A. Supplemental Figures for Chapter 5


A.1. Map of South and Central America displaying locations where Solanum pimpinellifolium and Solanum lycopersicum var. cerasiforme were collected.


A.2. Additional box and whisker plots of steroidal alkaloids measured in diverse germplasm. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study.


A.3. Additional box and whisker plots of steroidal alkaloids measured in diverse germplasm. The y-axis was log transformed to visually condense the large amount of variation observed in the concentrations of all tomato steroidal alkaloids measured in this study.
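The reason for log-transforming the y-axis in A.2 and A.3 can be illustrated with a minimal sketch. It uses two mean alpha-tomatine values reported in Table B.1; the base-10 logarithm and the helper names `log10_values` and `spread` are assumptions made for illustration, since the captions do not state which base was used.

```python
import math

# Mean alpha-tomatine concentrations (µg/100 g fresh weight) from Table B.1.
alpha_tomatine = {"OH8243": 283.81, "LA2256": 28912.39}


def log10_values(values):
    """Return the base-10 logarithm of each strictly positive value."""
    return {name: math.log10(v) for name, v in values.items()}


def spread(values):
    """Ratio of the largest to the smallest value."""
    return max(values) / min(values)


logged = log10_values(alpha_tomatine)

# On the raw scale the two genotypes differ by roughly two orders of magnitude.
raw_ratio = spread(alpha_tomatine.values())

# On the log10 scale the same difference is about two axis units.
log_gap = max(logged.values()) - min(logged.values())

print(f"raw ratio: {raw_ratio:.1f}, log10 gap: {log_gap:.2f}")
```

A difference that would flatten the low-concentration genotypes against the axis on a linear scale occupies only a couple of units after transformation, which is why all genotypes remain readable in one panel.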


A.4. Correlation matrix of all tomato steroidal alkaloids and their isomers quantified in diversity panel. Size and darkness of circle indicate intensity of correlation coefficient (see legend on right) and *, **, and *** indicate statistical significance at P<0.05, 0.01, and 0.001, respectively. Cells with no significance indicator were found to be P>0.05.
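The significance coding described for A.4 can be written as a small lookup. This is a sketch only: the function name `significance_stars` is hypothetical, and the strict inequalities follow the thresholds as worded in the caption rather than any code from the original analysis.

```python
def significance_stars(p_value):
    """Map a P value to the indicator used in A.4.

    Thresholds follow the caption: P<0.001 -> '***', P<0.01 -> '**',
    P<0.05 -> '*', and no indicator otherwise.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Example P values chosen purely for illustration.
labels = [significance_stars(p) for p in (0.0004, 0.004, 0.04, 0.4)]
```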

Appendix B. Supplemental Tables for Chapter 5

B.1. Tomato steroidal alkaloid concentrations (in µg/100 g fresh weight, ± standard deviation) in parental material and wide cross hybrids (F1 generation).

| Background | Dehydrotomatidine | Tomatidine | Dehydrotomatine | Alpha-tomatine | Hydroxytomatine | Acetoxytomatine |
|---|---|---|---|---|---|---|
| OH8243 | 0.41±0.48a | 0.02±0.06a^z | 20.36±17.72c | 283.81±270.36b | 182.26±66.31b | 62.84±60.21c |
| LA2213 | 23.41±21.85a | 1.16±2.07a | 1770.84±281.33a | 31338.28±6919.53a | 3021.71±1654.69a | 6984.22±2488.19a |
| LA2213xOH8243 | 0.37±0.28a | 0.03±0.04a | 17.74±20.87c | 227.71±289.34b | 308.08±151.34b | 89.46±70.06c |
| LA2256 | 29.57±53.14a | 0.56±0.56a | 1220.39±186.68b | 28912.39±5010.84a | 829.82±250.64b | 2374.46±537.27b |
| LA2256xOH8243 | 1.03±1.69a | 0.24±0.43a | 16.30±16.61c | 300.87±274.90b | 373.96±320.18b | 116.13±139.07c |
| LA2262 | 21.60±21.56a | 0.90±1.44a | 1225.29±374.50b | 26257.14±8638.81a | 724.10±295.03b | 2070.27±594.28b |
| LA2262xOH8243 | 0.27±0.25a | 0.03±0.04a | 18.77±9.39c | 293.59±131.38b | 246.23±113.73b | 115.51±55.10c |
| | | | | | | |
| Tainan | 0.41±0.47a | 0.04±0.10a | 34.25±31.68c | 297.09±320.24b | 696.85±421.10b | 161.45±148.01d |
| LA2213 | 23.41±21.85a | 1.16±2.07a | 1770.84±281.33a | 31338.28±6919.53a | 3021.71±1654.69a | 6984.22±2488.19a |
| LA2213xTainan | 1.89±3.99a | 0.00±0.00a | 108.86±93.15c | 1162.90±1007.23b | 1624.08±1069.87ab | 608.00±539.37bcd |
| LA2256 | 29.57±53.14a | 0.56±0.56a | 1220.39±186.68b | 28912.39±5010.84a | 829.82±250.64b | 2374.46±537.27b |
| LA2256xTainan | 0.30±0.22a | 0.00±0.00a | 31.88±19.31c | 440.24±267.67b | 477.19±208.10b | 160.57±121.45d |
| LA2262 | 21.60±21.56a | 0.90±1.44a | 1225.29±374.50b | 26257.14±8638.81a | 724.10±295.03b | 2070.27±594.28bc |
| LA2262xTainan | 0.44±0.44a | 0.06±0.15a | 50.94±44.03c | 712.66±692.40b | 634.74±255.77b | 218.92±162.62cd |

B.1. cont’d.

| Background | Dehydrolycoperoside F, G, or Dehydroesculeoside A | Lycoperoside F, G, or Esculeoside A | Esculeoside B | Total |
|---|---|---|---|---|
| OH8243 | 14.66±8.08a | 926.22±463.79ab | 54.60±43.92ab | 1545.18±772.13b |
| LA2213 | 2.15±2.12a | 395.80±177.88b | 14.79±6.02b | 38426.37±17528.83a |
| LA2213xOH8243 | 22.44±19.80a | 1788.87±1262.99ab | 134.42±156.75ab | 2587.24±1587.99b |
| LA2256 | 0.12±0.28a | 141.22±64.71b | 10.19±6.02b | 33518.71±5956.70a |
| LA2256xOH8243 | 62.90±103.68a | 3571.24±3690.86a | 253.70±282.38a | 4696.36±4329.32b |
| LA2262 | 0.00±0.00a | 129.12±54.92b | 8.01±7.14b | 30436.43±9805.27a |
| LA2262xOH8243 | 21.49±13.04a | 2320.54±909.83ab | 93.22±110.11ab | 3106.40±1131.36b |
| | | | | |
| Tainan | 63.56±40.31ab | 6144.48±3658.53ab | 249.53±138.80a | 7645.36±4472.67bc |
| LA2213 | 2.15±2.12b | 395.80±177.88bc | 14.79±6.02a | 38426.37±17528.83a |
| LA2213xTainan | 142.16±119.74a | 11295.61±7589.26a | 392.72±187.25a | 15336.22±10420.81bc |
| LA2256 | 0.12±0.28b | 141.22±64.71c | 10.19±6.02a | 33518.71±5956.70a |
| LA2256xTainan | 54.81±13.49ab | 5547.22±1484.53abc | 275.49±240.49a | 6987.70±1464.70c |
| LA2262 | 0.00±0.00b | 129.12±54.92c | 8.01±7.14a | 30436.43±9805.27ab |
| LA2262xTainan | 72.67±34.73ab | 7503.96±1434.45a | 423.58±517.93a | 9617.99±1990.63c |

^z Values with different letters are statistically different as determined by a Tukey’s honestly significant difference (HSD) test (α = 0.05). Linear models were run separately for crosses made with processing (top half of table) and cherry (bottom half of table) material.

B.2. Metadata information for diversity panel germplasm.

| Genotype | Species | Class | Origin | Province | Blanca_Cluster1 | Blanca_Cluster2 |
|---|---|---|---|---|---|---|
| CULBPT_05_11 | Processing | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 |
| CULBPT_05_15 | Processing | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 |
| CULBPT_05_22 | Processing | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 |
| CULBPT04_1 | Processing | Cultivated Processing | USA | NY | SLL_processing_2 | SLL_processing_2 |
| E6203 | Processing | Cultivated Processing | USA | CA | SLL_processing_2 | SLL_processing_2 |
| F06-2041 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_3 |
| F06-2058 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_3 |
| FG02_188 | Processing | Cultivated Processing | USA | OH | mixture | mixture |
| FG16-511 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| FG16-513 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| FG16-515 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| FG16-517 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| FG16-519 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| FG16-521 | Hybrid | Wide Cross Hybrid | USA | OH | Not_Classified | Not_Classified |
| Gold Ball | Cultivated Cherry | Cultivated Cherry | USA | OH | SLL_vintage | SLL_vintage_1 |
| Heinz 1706 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_1 |
| Hunt 100 | Processing | Cultivated Processing | USA | CA | mixture | mixture |
| LA0373 | Wild | S. pimpinellifolium | Peru | Ancash | SP_Peru | SP_Peru_5 |
| LA0400 | Wild | S. pimpinellifolium | Peru | Piura | SP_Peru | SP_Peru_2 |
| LA0411 | Wild | S. pimpinellifolium | Ecuador | Los Rios | Not_Classified | Not_Classified |
| LA0722 | Wild | S. pimpinellifolium | Peru | La Libertad | SP_Peru | SP_Peru_5 |
| LA1237 | Wild | S. pimpinellifolium | Ecuador | Esmeraldas | Not_Classified | Not_Classified |
| LA1261 | Wild | S. pimpinellifolium | Ecuador | Los Rios | Not_Classified | Not_Classified |
| LA1269 | Wild | S. pimpinellifolium | Peru | Lima | SP_Peru | SP_Peru_6 |
| LA1279 | Wild | S. pimpinellifolium | Peru | Lima | Not_Classified | Not_Classified |
| LA1301 | Wild | S. pimpinellifolium | Peru | Ica | SP_Peru | SP_Peru_8 |
| LA1314 | Wild Cherry | Wild Cherry | Peru | Cusco | SLC_Peru | SLC_Peru_3 |
| LA1335 | Wild | S. pimpinellifolium | Peru | Arequipa | Not_Classified | Not_Classified |
| LA1338 | Wild Cherry | Wild Cherry | Ecuador | Napo | SLC_Ecuador | SLC_Ecuador_1 |
| LA1371 | Wild | S. pimpinellifolium | Peru | Lima | SP_Peru | SP_Peru_7 |
| LA1464 | Wild Cherry | Wild Cherry | Honduras | Unknown | SLC_non_Andean | SLC_Mesoamerica |
| LA1512 | Wild Cherry | Wild Cherry | El Salvador | Unknown | SLC_non_Andean | SLC_Mesoamerica |
| LA1542 | Wild Cherry | Wild Cherry | Costa Rica | Unknown | SLC_non_Andean | SLC_Costa_Rica |
| LA1545 | Wild | S. pimpinellifolium | Mexico | Campeche | SLC_non_Andean | SLC_Mesoamerica |
| LA1549 | Wild Cherry | Wild Cherry | Peru | Pasco | SLC_Peru | SLC_Peru_2 |
| LA1569 | Wild Cherry | Wild Cherry | Mexico | Vera Cruz | SLC_Peru | SLC_Peru_2 |
| LA1576 | Wild | S. pimpinellifolium | Peru | Lima | SP_Peru | SP_Peru_7 |
| LA1582 | Wild | S. pimpinellifolium | Peru | Lambayeque | SP_Peru | SP_Peru_3 |
| LA1584 | Wild | S. pimpinellifolium | Peru | Lambayeque | SP_Peru | SP_Peru_3 |
| LA1589 | Wild | S. pimpinellifolium | Peru | La Libertad | SP_Peru | SP_Peru_4 |
| LA1590 | Wild | S. pimpinellifolium | Peru | La Libertad | SP_Peru | SP_Peru_4 |
| LA1602 | Wild | S. pimpinellifolium | Peru | Lima | SP_Peru | SP_Peru_7 |
| LA1606 | Wild | S. pimpinellifolium | Peru | Ica | SP_Peru | SP_Peru_7 |
| LA1617 | Wild | S. pimpinellifolium | Peru | Tumbes | SP_Montane | SP_Montane_2 |
| LA1620 | Wild Cherry | Wild Cherry | Brazil | Bahia | SLC_non_Andean | SLC_Mesoamerica |
| LA1621 | Wild Cherry | Wild Cherry | Mexico | Hidalgo | SLC_non_Andean | SLC_Mesoamerica |
| LA1623 | Wild Cherry | Wild Cherry | Mexico | Yucatan | SLC_non_Andean | SLC_Mesoamerica |
| LA1632 | Wild Cherry | Wild Cherry | Peru | Madre de Dios | SLL_vintage | SLL_vintage_2 |
| LA1654 | Wild Cherry | Wild Cherry | Peru | San Martin | SLC_Peru | SLC_Peru_2 |
| LA1668 | Wild Cherry | Wild Cherry | Mexico | Guerrero | SLC_non_Andean | SLC_Asia |
| LA1683 | Wild | S. pimpinellifolium | Peru | Piura | SP_Peru | SP_Peru_1 |
| LA1701 | Wild Cherry | Wild Cherry | Peru | La Libertad | mixture | mixture |
| LA1936 | Wild | S. pimpinellifolium | Peru | Arequipa | SP_Peru | SP_Peru_9 |
| LA1953 | Wild Cherry | Wild Cherry | Peru | Arequipa | SLC_Peru | SLC_Peru_3 |
| LA2076 | Wild Cherry | Wild Cherry | Bolivia | Unknown | conflicting classification | conflicting classification |
| LA2077 | Wild Cherry | Wild Cherry | Bolivia | La Paz | SLC_non_Andean | SLC_world |
| LA2078 | Wild Cherry | Wild Cherry | Brazil | Rio Grande de Sol | SLC_Peru | SLC_Peru_2 |
| LA2126 | Wild Cherry | Wild Cherry | Ecuador | Zamora-Chinchipe | SLC_Ecuador | SLC_Ecuador_3 |
| LA2131 | Wild Cherry | Wild Cherry | Ecuador | Zamora-Chinchipe | SLC_Ecuador | SLC_vintage |
| LA2135 | Wild Cherry | Wild Cherry | Ecuador | Santiago-Morona | SLC_LA2135 | SLC_LA2135 |
| LA2137 | Wild Cherry | Wild Cherry | Ecuador | Santiago-Morona | SLC_mixture | SLC_mixture |
| LA2183 | Wild | S. pimpinellifolium | Peru | Amazonas | SP_Peru | SP_Peru_1 |
| LA2184 | Wild | S. pimpinellifolium | Peru | Amazonas | SP_Peru | SP_Peru_1 |
| LA2213 | Wild Cherry | Wild Cherry | Peru | San Martin | Not_Classified | Not_Classified |
| LA2256 | Wild Cherry | Wild Cherry | Peru | San Martin | Not_Classified | Not_Classified |
| LA2262 | Wild Cherry | Wild Cherry | Peru | San Martin | Not_Classified | Not_Classified |
| LA2308 | Wild Cherry | Wild Cherry | Peru | San Martin | SLC_Peru | SLC_Peru_2 |
| LA2312 | Wild Cherry | Wild Cherry | Peru | Amazonas | SLC_Peru | SLC_Peru_1 |
| LA2393 | Wild Cherry | Wild Cherry | Costa Rica | Guanacaste | SLC_mixture | SLC_mixture |
| LA2533 | Wild | S. pimpinellifolium | Peru | Unknown | SP_Peru | SP_Peru_6 |
| LA3137 | Wild Cherry | Wild Cherry | Cuba | Holguin | mixture | mixture |
| M82 | Processing | Cultivated Processing | USA | CA | SLL_processing_1 | SLL_processing_1_2 |
| NC2C | Cultivated Cherry | Cultivated Cherry | USA | NC | mixture | mixture |
| OH03-6439 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_3 |
| OH05-8022 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH05-8044 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH05-8142 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH05-8197 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH05-8210 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH08-5201 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH08-5202 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH08-5204 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH08-5205 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH08-5207 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_1 |
| OH08-5216 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_1 |
| OH08-7466 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_1 |
| OH7536 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH7663 | Processing | Cultivated Processing | USA | OH | Not_Classified | Not_Classified |
| OH8243 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_3 |
| OH8245 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH8446 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_1 |
| OH8556 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH88119 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH9242 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH981049 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH981067 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| OH981136 | Processing | Cultivated Processing | USA | CA | SLL_processing_1 | SLL_processing_1_2 |
| OR11 | Cultivated Cherry | Cultivated Cherry | USA | Unknown | SLC_1 | SLC_1 |
| OX325 | Processing | Cultivated Processing | USA | OH | SLL_processing_1 | SLL_processing_1_2 |
| Peto95-43 | Processing | Cultivated Processing | USA | CA | SLL_processing_1 | SLL_processing_1_2 |
| PI129128 | Wild Cherry | Wild Cherry | Panama | Unknown | Not_Classified | Not_Classified |
| PI155372 | Wild Cherry | Wild Cherry | NA | Unknown | Not_Classified | Not_Classified |
| Principe Borghese | Cultivated Cherry | Cultivated Cherry | Italy | Unknown | SLL_vintage | SLL_vintage_1 |
| Sun1642 | Processing | Cultivated Processing | USA | CA | SLL_processing_1 | SLL_processing_1_2 |
| Tainan | Cultivated Cherry | Cultivated Cherry | Taiwan | Unknown | mixture | mixture |
| Unilever265 | Processing | Cultivated Processing | USA | CA | SLL_processing_1 | SLL_processing_1_2 |
| VFNT Cherry | Cultivated Cherry | Cultivated Cherry | Unknown | Unknown | Not_Classified | Not_Classified |

B.3. Means plus or minus standard deviations of steroidal glycoalkaloids for each genotype represented in the diversity panel.

| Genotype | Esculeoside B | Hydroxytomatine | Dehydrolycoperoside F, G, or Dehydroesculeoside A | Lycoperoside F, G, or Esculeoside A | Dehydrotomatine | Acetoxytomatine | Alpha-tomatine | Dehydrotomatidine | Tomatidine |
|---|---|---|---|---|---|---|---|---|---|
| CULBPT_05_11 | 11.5±4.8 | 64.5±26.9 | 3.3±3.7 | 299.0±101.2 | 5.7±7.2 | 21.2±28.0 | 120.6±73.3 | 0.4±0.3 | 0.1±0.1 |
| CULBPT_05_15 | 38.4±26.2 | 241.4±109.4 | 12.5±11.0 | 794.1±382.5 | 12.4±13.4 | 63.0±60.1 | 294.0±283.2 | 0.3±0.2 | 0.0±0.0 |
| CULBPT_05_22 | 21.7±13.5 | 98.0±39.9 | 6.1±4.3 | 590.5±311.0 | 7.5±9.3 | 17.9±10.1 | 174.6±200.9 | 0.3±0.3 | 0.1±0.1 |
| CULBPT04_1 | 46.1±43.5 | 125.0±68.8 | 4.7±2.8 | 497.0±164.6 | 3.3±3.5 | 20.0±14.2 | 89.9±71.2 | 0.4±0.2 | 0.0±0.0 |
| E6203 | 34.5±22.4 | 80.1±29.5 | 4.6±3.2 | 448.2±155.5 | 8.5±9.0 | 34.4±23.7 | 208.3±185.6 | 0.3±0.2 | 0.0±0.0 |
| F06-2041 | 33.6±35.0 | 106.2±31.6 | 15.4±8.5 | 596.9±340.1 | 4.3±7.3 | 24.7±27.2 | 76.0±117.1 | 0.2±0.2 | 0.0±0.0 |
| F06-2058 | 24.4±17.9 | 100.9±32.9 | 5.0±4.2 | 512.4±157.9 | 7.8±5.1 | 23.4±16.0 | 112.7±106.0 | 0.2±0.2 | 0.1±0.1 |
| FG02_188 | 101.6±100.1 | 138.2±125.2 | 10.8±7.7 | 742.8±602.0 | 20.9±37.8 | 57.1±73.0 | 475.3±870.9 | 0.2±0.2 | 0.0±0.0 |
| FG16-511 | 253.7±282.4 | 374.0±320.2 | 62.9±103.7 | 3571.2±3690.9 | 16.3±16.6 | 116.1±139.1 | 300.9±274.9 | 1.0±1.7 | 0.2±0.4 |
| FG16-513 | 275.5±240.5 | 477.2±208.1 | 54.8±13.5 | 5547.2±1484.5 | 31.9±19.3 | 160.6±121.4 | 440.2±267.7 | 0.3±0.2 | 0.0±0.0 |
| FG16-515 | 93.2±110.1 | 246.2±113.7 | 21.5±13.0 | 2320.5±909.8 | 18.8±9.4 | 115.5±55.1 | 293.6±131.4 | 0.3±0.2 | 0.0±0.0 |
| FG16-517 | 423.6±517.9 | 634.7±255.8 | 72.7±34.7 | 7504.0±1434.4 | 50.9±44.0 | 218.9±162.6 | 712.7±692.4 | 0.4±0.4 | 0.1±0.1 |
| FG16-519 | 134.4±156.8 | 308.1±151.3 | 22.4±19.8 | 1788.9±1263.0 | 17.7±20.9 | 89.5±70.1 | 227.7±289.3 | 0.4±0.3 | 0.0±0.0 |
| FG16-521 | 392.7±187.2 | 1624.1±1069.9 | 142.2±119.7 | 11295.6±7589.3 | 108.9±93.2 | 608.0±539.4 | 1162.9±1007.2 | 1.9±4.0 | 0.0±0.0 |
| Gold Ball | 29.0±22.7 | 111.5±41.1 | 5.1±2.0 | 658.1±209.6 | 2.2±3.2 | 20.1±6.3 | 43.2±63.2 | 0.4±0.5 | 0.0±0.0 |
| Heinz 1706 | 133.9±135.8 | 132.8±52.0 | 22.4±14.4 | 1143.6±727.2 | 9.8±8.0 | 35.0±25.6 | 73.5±54.0 | 0.2±0.2 | 0.0±0.0 |
| Hunt 100 | 19.5±13.4 | 75.4±33.6 | 8.1±6.2 | 678.7±417.4 | 9.3±8.3 | 38.6±34.2 | 267.0±256.1 | 0.6±0.8 | 0.3±0.5 |
| LA0373 | 665.0±491.6 | 1116.8±388.4 | 201.1±53.8 | 14213.5±2790.8 | 36.2±19.5 | 279.4±81.4 | 612.1±277.1 | 15.4±20.5 | 4.9±10.7 |
| LA0400 | 1104.8±563.0 | 1337.3±478.3 | 782.9±306.3 | 28351.7±7259.6 | 84.8±79.4 | 351.7±104.6 | 791.4±686.2 | 2.2±1.8 | 0.6±1.1 |
| LA0411 | 670.0±472.0 | 713.6±253.6 | 278.9±196.2 | 15325.8±6199.6 | 26.5±14.0 | 187.5±65.3 | 261.4±116.4 | 8.3±12.7 | 2.1±3.4 |
| LA0722 | 1032.1±214.1 | 895.5±153.1 | 471.0±220.7 | 27123.9±9176.2 | 44.8±19.9 | 340.7±101.4 | 995.8±447.5 | 7.2±9.7 | 1.0±1.2 |
| LA1237 | 669.0±256.6 | 979.4±225.9 | 221.9±112.6 | 15197.5±5022.1 | 20.3±15.3 | 243.5±130.5 | 203.2±139.8 | 4.6±4.4 | 0.3±0.2 |
| LA1261 | 868.5±551.4 | 1035.6±537.8 | 241.5±110.2 | 14506.2±3903.0 | 46.3±39.0 | 379.6±246.8 | 439.8±353.3 | 2.9±1.3 | 0.5±0.3 |
| LA1269 | 810.5±350.0 | 1631.7±387.6 | 473.2±224.0 | 24193.4±10497.5 | 132.0±117.9 | 559.6±240.4 | 1503.6±1170.5 | 4.8±7.8 | 0.2±0.2 |
| LA1279 | 470.1±275.8 | 647.0±155.3 | 164.7±98.7 | 9301.1±3648.8 | 27.4±7.1 | 163.9±80.0 | 285.1±109.6 | 2.0±1.9 | 0.3±0.3 |
| LA1301 | 1270.5±954.9 | 888.5±156.1 | 326.3±176.7 | 20939.4±6453.1 | 23.2±6.1 | 219.0±104.8 | 692.5±279.6 | 11.6±15.7 | 4.0±5.9 |
| LA1314 | 278.1±194.8 | 1019.4±440.7 | 113.4±32.4 | 7910.2±2769.7 | 70.9±93.9 | 453.2±431.7 | 728.1±897.3 | 1.4±2.0 | 0.2±0.4 |
| LA1335 | 536.5±294.4 | 820.9±327.7 | 245.6±198.3 | 14083.9±6729.1 | 19.5±13.6 | 252.4±141.2 | 391.3±318.3 | 3.7±4.6 | 0.6±1.1 |
| LA1338 | 0.0±0.0 | 24192.4±4676.2 | 0.0±0.0 | 0.5±1.0 | 178.9±120.7 | 4.3±4.0 | 1912.4±1192.6 | 12.4±25.4 | 0.3±0.5 |
| LA1371 | 564.6±445.4 | 156.3±38.8 | 133.1±96.0 | 9613.3±6810.2 | 21.5±14.6 | 97.4±63.7 | 470.1±213.1 | 4.9±4.0 | 0.6±0.7 |
| LA1464 | 1737.9±1064.1 | 2014.4±725.1 | 466.8±177.7 | 18892.5±3313.1 | 129.5±73.6 | 464.4±328.4 | 1190.5±628.2 | 1.1±1.0 | 0.1±0.2 |
| LA1512 | 1232.2±844.0 | 1447.2±340.8 | 396.4±237.1 | 19248.6±7741.2 | 25.9±7.4 | 296.4±79.5 | 327.1±106.8 | 2.0±1.9 | 0.1±0.1 |
| LA1542 | 917.8±497.4 | 1972.0±567.3 | 500.9±309.1 | 25887.4±11070.9 | 83.1±58.8 | 612.0±271.7 | 883.0±579.0 | 8.3±11.8 | 0.4±0.6 |
| LA1545 | 1692.2±862.7 | 2001.5±632.2 | 700.6±480.6 | 24867.9±8536.7 | 138.1±107.2 | 400.1±221.5 | 1025.1±749.8 | 3.4±6.7 | 0.2±0.4 |
| LA1549 | 213.3±80.3 | 906.9±253.1 | 225.6±91.0 | 6443.5±2190.1 | 92.5±46.4 | 453.2±243.4 | 528.9±331.4 | 0.8±0.8 | 0.1±0.2 |
| LA1569 | 254.3±132.7 | 872.4±312.3 | 219.9±115.4 | 7810.2±3431.3 | 77.9±51.9 | 346.2±225.4 | 497.4±362.7 | 3.1±2.5 | 0.6±0.7 |
| LA1576 | 1505.8±1168.5 | 958.4±318.3 | 445.4±278.7 | 20913.2±7906.8 | 45.4±35.1 | 170.2±100.4 | 416.0±323.8 | 1.7±1.4 | 1.2±1.5 |
| LA1582 | 552.5±166.0 | 816.3±211.9 | 427.7±171.1 | 21606.4±4657.3 | 242.7±203.6 | 393.2±181.7 | 1549.8±966.5 | 6.3±12.0 | 0.2±0.1 |
| LA1584 | 535.2±225.6 | 1221.2±530.5 | 505.8±347.1 | 21594.7±11939.4 | 73.9±89.7 | 378.1±343.2 | 764.6±849.3 | 1.5±1.4 | 0.1±0.1 |
| LA1589 | 812.8±468.5 | 720.4±367.0 | 353.8±170.0 | 16788.4±5862.9 | 16.5±11.5 | 175.6±116.0 | 411.4±454.6 | 11.4±24.0 | 0.5±0.6 |
| LA1590 | 938.3±484.6 | 754.0±168.2 | 254.8±88.5 | 16548.8±3128.5 | 15.5±13.5 | 176.1±84.8 | 260.2±186.3 | 3.1±2.5 | 0.3±0.4 |
| LA1602 | 831.4±556.3 | 198.7±37.4 | 194.3±98.7 | 10932.9±3742.3 | 10.5±10.2 | 68.4±55.2 | 341.2±350.7 | 3.1±3.1 | 0.4±0.3 |
| LA1606 | 997.9±827.7 | 1048.6±369.0 | 318.8±146.3 | 18053.8±6499.8 | 36.8±25.3 | 314.7±142.8 | 719.1±393.0 | 21.3±33.5 | 3.6±6.1 |
| LA1617 | 1543.1±968.6 | 1189.5±101.9 | 766.1±385.9 | 24734.3±8643.7 | 125.4±60.2 | 415.4±182.0 | 1085.2±491.1 | 3.2±5.3 | 1.8±3.6 |
| LA1620 | 556.6±376.2 | 1993.2±1323.3 | 246.9±113.0 | 14672.3±4032.5 | 136.6±98.9 | 739.4±429.4 | 1260.5±786.1 | 2.4±3.3 | 0.1±0.1 |
| LA1621 | 1460.6±701.3 | 1956.6±512.1 | 380.4±187.5 | 19376.2±7435.3 | 34.8±14.3 | 371.6±97.5 | 463.3±169.9 | 1.5±1.0 | 0.1±0.1 |
| LA1623 | 619.8±288.3 | 2038.2±1069.0 | 635.5±357.2 | 15909.4±5596.0 | 182.3±97.4 | 775.2±387.6 | 1027.0±779.5 | 3.8±3.0 | 2.3±5.0 |
| LA1632 | 392.9±222.8 | 826.2±714.8 | 98.4±20.1 | 8035.8±3417.8 | 26.0±36.3 | 250.6±342.0 | 281.6±377.7 | 1.3±1.3 | 0.2±0.4 |
| LA1654 | 20.9±16.6 | 1220.3±609.2 | 5.4±4.9 | 712.8±902.8 | 1496.6±869.4 | 4667.3±2472.5 | 21190.2±13927.8 | 3.8±3.9 | 0.2±0.2 |
| LA1668 | 1585.0±1048.6 | 1334.4±619.5 | 429.8±175.4 | 19925.8±2777.0 | 72.7±81.0 | 262.1±227.4 | 652.7±650.9 | 1.5±0.7 | 0.1±0.2 |
| LA1683 | 955.5±596.7 | 2244.6±1200.9 | 587.4±217.9 | 27374.2±8321.4 | 187.6±145.6 | 700.6±424.5 | 1513.8±1017.8 | 38.8±48.2 | 0.4±0.4 |
| LA1701 | 0.0±0.0 | 7612.7±5741.7 | 0.0±0.0 | 1.9±2.2 | 563.7±259.6 | 46917.7±24633.1 | 2290.9±1588.6 | 0.5±0.1 | 0.2±0.2 |
| LA1936 | 765.5±347.9 | 712.8±188.1 | 258.9±131.4 | 16652.2±5715.6 | 12.2±6.5 | 220.9±85.1 | 266.1±122.3 | 1.6±1.4 | 0.4±0.6 |
| LA1953 | 207.3±106.2 | 717.3±331.2 | 138.3±125.1 | 6068.2±2926.5 | 40.1±32.2 | 192.1±127.5 | 268.0±248.8 | 2.2±2.5 | 0.2±0.4 |
| LA2076 | 357.5±259.8 | 861.9±461.4 | 261.5±279.1 | 8142.8±4192.4 | 42.0±38.6 | 252.2±170.9 | 271.2±198.2 | 1.3±1.7 | 0.3±0.4 |
| LA2077 | 412.3±273.5 | 972.7±499.7 | 651.8±299.1 | 10005.3±3026.2 | 47.0±21.5 | 210.2±99.5 | 165.4±89.9 | 1.2±0.6 | 0.5±0.6 |
| LA2078 | 455.5±237.7 | 1044.0±367.7 | 198.9±92.8 | 8192.2±3009.6 | 47.1±30.6 | 204.5±100.6 | 377.3±279.0 | 1.7±2.3 | 0.2±0.1 |
| LA2126 | 0.0±0.0 | 3007.8±1358.8 | 0.0±0.0 | 0.4±0.9 | 447.2±200.0 | 26257.5±9361.9 | 1482.5±1008.5 | 8.6±11.1 | 2.4±4.7 |
| LA2131 | 888.5±1031.2 | 642.8±831.9 | 122.9±129.8 | 11450.3±8039.7 | 21.9±12.5 | 105.9±71.2 | 316.8±430.5 | 20.5±30.9 | 4.1±7.2 |
| LA2135 | 584.9±429.7 | 1468.8±601.5 | 209.7±165.4 | 11373.8±5879.3 | 139.7±181.1 | 608.4±804.1 | 1136.8±1743.9 | 3.5±3.2 | 0.5±0.4 |
| LA2137 | 656.0±182.3 | 523.0±195.2 | 281.1±150.2 | 18635.4±6582.3 | 23.1±21.8 | 168.5±81.2 | 303.8±228.5 | 1.1±1.0 | 0.2±0.2 |
| LA2183 | 764.3±654.1 | 1308.6±217.9 | 675.4±262.2 | 29482.8±8590.5 | 55.0±22.6 | 375.3±104.2 | 506.4±143.8 | 2.4±2.3 | 0.3±0.4 |
| LA2184 | 1058.1±565.7 | 2595.3±2307.0 | 377.5±185.5 | 19752.2±6800.5 | 235.8±350.7 | 17436.9±29240.1 | 1156.7±1352.7 | 2.1±2.3 | 0.4±0.4 |
| LA2213 | 14.8±6.0 | 3021.7±1654.7 | 2.1±2.1 | 395.8±177.9 | 1770.8±281.3 | 6984.2±2488.2 | 26212.3±14584.2 | 23.4±21.8 | 1.2±2.1 |
| LA2256 | 10.2±6.0 | 829.8±250.6 | 0.1±0.3 | 141.2±64.7 | 1220.4±186.7 | 2374.5±537.3 | 28912.4±5010.8 | 29.6±53.1 | 0.6±0.6 |
| LA2262 | 8.0±7.1 | 724.1±295.0 | 0.0±0.0 | 129.1±54.9 | 1225.3±374.5 | 2070.3±594.3 | 26257.1±8638.8 | 21.6±21.6 | 0.9±1.4 |
| LA2308 | 16.0±17.4 | 1772.3±914.1 | 0.9±1.4 | 174.7±74.5 | 1410.5±550.7 | 3983.7±1400.5 | 17970.4±7519.8 | 90.3±120.4 | 0.1±0.1 |
| LA2312 | 275.8±198.0 | 810.1±507.8 | 147.4±96.4 | 7218.1±4192.8 | 37.2±26.1 | 263.0±174.5 | 313.8±210.0 | 1.3±0.6 | 0.2±0.2 |
| LA2393 | 570.6±233.1 | 726.4±428.4 | 218.2±111.3 | 12576.7±3673.8 | 27.0±20.2 | 183.0±68.2 | 253.2±185.3 | 0.9±1.2 | 0.2±0.3 |
| LA2533 | 948.9±566.8 | 893.2±291.9 | 236.5±123.4 | 15036.3±6171.1 | 10.8±5.0 | 222.5±130.0 | 284.5±120.8 | 6.7±9.5 | 0.3±0.5 |
| LA3137 | 621.4±558.6 | 1893.2±922.9 | 524.2±196.1 | 14775.7±5401.5 | 94.6±73.0 | 496.2±320.4 | 641.7±550.7 | 0.5±0.4 | 0.1±0.1 |
| M82 | 51.1±74.2 | 174.6±107.0 | 25.9±32.4 | 1437.8±2360.9 | 15.9±14.2 | 53.8±40.6 | 207.1±193.8 | 1.6±1.6 | 0.3±0.3 |
| NC2C | 210.7±234.6 | 419.7±297.9 | 70.7±114.2 | 3406.8±3685.7 | 20.1±13.6 | 95.3±93.3 | 235.7±206.4 | 0.4±0.4 | 0.0±0.0 |
| OH03-6439 | 35.8±15.4 | 98.1±28.0 | 9.9±6.1 | 651.8±315.2 | 6.2±7.9 | 10.5±8.4 | 114.8±192.3 | 0.5±0.7 | 0.1±0.1 |
| OH05-8022 | 73.8±73.5 | 115.2±47.3 | 12.1±9.3 | 714.3±518.4 | 20.6±17.0 | 40.3±37.0 | 339.0±316.0 | 0.5±0.5 | 0.0±0.0 |
| OH05-8044 | 18.0±12.9 | 92.9±60.4 | 6.9±9.3 | 380.8±329.3 | 10.4±10.7 | 27.8±23.6 | 186.2±187.2 | 0.2±0.1 | 0.0±0.0 |
| OH05-8142 | 57.2±49.2 | 200.7±126.1 | 10.0±7.8 | 1186.1±442.8 | 25.7±27.6 | 39.9±32.5 | 432.2±467.4 | 0.3±0.3 | 0.0±0.0 |
| OH05-8197 | 13.5±13.4 | 7308.5±17702.4 | 2.4±2.7 | 270.5±161.7 | 64.6±142.1 | 22.8±29.8 | 621.5±1158.4 | 0.2±0.2 | 0.0±0.0 |
| OH05-8210 | 56.0±42.5 | 166.3±47.1 | 8.3±5.9 | 751.5±300.1 | 45.0±37.1 | 99.2±80.9 | 884.0±814.6 | 0.2±0.2 | 0.0±0.0 |
| OH08-5201 | 39.1±36.9 | 116.8±86.9 | 4.8±3.9 | 550.6±388.9 | 7.8±4.6 | 37.1±35.9 | 112.0±61.7 | 0.2±0.2 | 0.0±0.0 |
| OH08-5202 | 23.7±12.7 | 98.1±32.9 | 5.3±4.9 | 640.8±232.3 | 18.2±10.8 | 36.4±29.0 | 449.7±317.2 | 0.2±0.2 | 0.0±0.0 |
| OH08-5204 | 42.6±37.0 | 178.8±65.6 | 7.9±8.7 | 854.7±558.4 | 21.4±19.6 | 69.2±47.3 | 473.4±385.6 | 0.3±0.3 | 0.0±0.0 |
| OH08-5205 | 57.1±47.1 | 207.3±106.6 | 12.6±10.5 | 1103.7±632.1 | 31.1±35.0 | 121.3±116.1 | 641.0±696.7 | 1.1±1.5 | 0.0±0.0 |
| OH08-5207 | 105.9±165.0 | 149.4±72.2 | 8.0±7.5 | 1035.1±642.2 | 14.9±13.7 | 61.6±53.2 | 424.4±422.8 | 1.3±2.5 | 0.0±0.0 |
| OH08-5216 | 41.0±26.7 | 97.9±42.6 | 6.5±4.1 | 562.6±195.1 | 8.9±8.3 | 20.4±18.1 | 150.2±164.1 | 4.5±10.4 | 2.5±5.9 |
| OH08-7466 | 106.9±70.6 | 281.3±258.7 | 26.3±12.6 | 1881.9±1152.7 | 39.8±34.6 | 73.0±57.7 | 654.1±645.6 | 0.3±0.3 | 0.0±0.0 |
| OH7536 | 40.5±36.0 | 111.3±51.5 | 3.7±3.7 | 463.8±162.1 | 8.1±5.1 | 33.9±25.4 | 137.3±79.8 | 0.6±0.7 | 0.2±0.3 |
| OH7663 | 51.0±37.7 | 84.6±31.3 | 3.5±3.0 | 471.5±223.9 | 5.4±4.7 | 34.5±41.8 | 88.3±91.6 | 0.2±0.2 | 0.0±0.0 |
| OH8243 | 54.6±43.9 | 182.3±66.3 | 14.7±8.1 | 926.2±463.8 | 20.4±17.7 | 62.8±60.2 | 283.8±270.4 | 0.4±0.5 | 0.0±0.0 |
| OH8245 | 74.8±77.5 | 184.1±164.0 | 12.6±8.0 | 1169.6±945.0 | 43.6±44.3 | 67.8±65.6 | 749.5±778.1 | 0.5±0.7 | 0.0±0.0 |
| OH8446 | 46.4±51.9 | 79.3±38.1 | 9.9±8.1 | 428.0±273.3 | 4.9±4.4 | 13.5±14.9 | 24.1±14.6 | 0.2±0.2 | 0.0±0.0 |
| OH8556 | 37.4±37.7 | 58.6±15.2 | 5.1±1.9 | 317.0±59.2 | 6.0±6.4 | 23.0±23.0 | 136.5±169.9 | 0.3±0.3 | 0.0±0.0 |
| OH88119 | 47.9±28.0 | 201.4±210.5 | 5.1±3.6 | 766.1±487.9 | 23.9±30.6 | 68.5±63.9 | 575.8±926.1 | 0.5±0.9 | 0.2±0.5 |
| OH9242 | 31.8±15.7 | 112.2±45.8 | 10.2±5.3 | 473.8±359.2 | 13.0±15.4 | 47.8±38.2 | 226.8±281.5 | 0.3±0.3 | 0.0±0.0 |
| OH981049 | 61.6±35.8 | 147.8±59.0 | 11.9±16.5 | 568.9±369.3 | 24.6±21.5 | 74.9±58.3 | 556.9±592.2 | 0.3±0.4 | 0.0±0.0 |
| OH981067 | 84.7±92.3 | 154.5±58.7 | 6.3±5.7 | 617.1±362.9 | 39.4±39.6 | 149.2±147.6 | 865.9±802.7 | 0.5±0.6 | 0.1±0.2 |
| OH981136 | 31.6±23.5 | 103.4±50.3 | 10.8±10.3 | 660.6±449.2 | 14.0±12.4 | 38.0±31.4 | 230.5±190.5 | 1.9±3.2 | 1.1±2.3 |
| OR11 | 72.1±66.4 | 229.2±118.0 | 17.1±10.4 | 1577.6±931.9 | 9.0±7.2 | 57.9±31.6 | 136.0±81.6 | 2.1±3.2 | 0.4±0.8 |
| OX325 | 31.1±35.7 | 84.1±58.9 | 6.3±4.0 | 360.7±249.0 | 16.1±12.4 | 29.1±36.4 | 155.3±130.9 | 0.3±0.2 | 0.0±0.0 |
| Peto95-43 | 63.6±41.5 | 116.8±45.6 | 8.3±8.0 | 670.0±404.7 | 13.5±9.8 | 54.8±41.1 | 450.4±345.2 | 0.3±0.1 | 0.0±0.0 |
| PI129128 | 0.0±0.0 | 4955.4±3622.2 | 0.0±0.0 | 0.4±1.0 | 413.8±248.8 | 37010.8±21463.6 | 1467.5±953.6 | 12.7±26.1 | 0.7±1.2 |
| PI155372 | 0.0±0.0 | 6446.5±2522.2 | 0.0±0.0 | 0.0±0.0 | 801.6±487.3 | 60254.5±30003.6 | 2995.5±2360.8 | 3.5±6.9 | 2.2±5.3 |
| Principe Borghese | 413.6±157.2 | 1009.1±402.8 | 127.9±149.8 | 7728.9±5268.7 | 39.8±33.7 | 254.7±152.2 | 470.6±335.5 | 0.4±0.3 | 0.0±0.0 |
| Sun1642 | 23.6±17.3 | 116.2±46.5 | 6.9±6.9 | 640.0±588.8 | 7.8±2.0 | 38.8±15.6 | 272.0±115.4 | 1.5±2.4 | 0.6±1.3 |
| Tainan | 249.5±138.8 | 696.9±421.1 | 63.6±40.3 | 6144.5±3658.5 | 34.2±31.7 | 161.5±148.0 | 297.1±320.2 | 0.4±0.5 | 0.0±0.0 |
| Unilever265 | 26.3±22.2 | 105.0±47.6 | 8.5±5.2 | 461.5±142.9 | 36.2±24.4 | 118.4±104.7 | 797.4±544.3 | 0.4±0.5 | 0.0±0.0 |
| VFNT Cherry | 124.7±58.1 | 462.9±512.7 | 21.2±8.5 | 2413.7±1217.4 | 14.6±15.9 | 116.1±142.5 | 264.6±346.6 | 10.2±21.4 | 1.9±4.6 |