<<

GENETICS AND EVOLUTION OF MULTIPLE DOMESTICATED SQUASHES AND (, )

By

HEATHER ROSE KATES

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

© 2017 Heather Rose Kates

To Patrick and Tomás

ACKNOWLEDGMENTS

I am grateful to my advisors Douglas E. Soltis and Pamela S. Soltis for their encouragement, enthusiasm for discovery, and generosity. I thank the members of my committee, Nico Cellinese, Matias Kirst, and Brad Barbazuk, for their valuable feedback and support of my dissertation work. I thank my first mentor Michael J. Moore for his continued support and for introducing me to botany and to hard work. I am thankful to

Matt Johnson, Norman Wickett, Elliot Gardner, Fernando Lopez, Guillermo Sanchez,

Annette Fahrenkrog, Colin Khoury, and Daniel Barrerra for their collaborative efforts on the dissertation work presented here. I am also thankful to my lab mates and colleagues at the University of Florida, especially Mathew A. Gitzendanner for his patient helpfulness. Finally, I thank Rebecca L. Stubbs, Andrew A. Crowl, Gregory W. Stull,

Richard Hodel, and Kelly Speer for everything.

4

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ...... 4

LIST OF TABLES ...... 9

LIST OF FIGURES ...... 10

ABSTRACT ...... 12

CHAPTER

1 GENERAL INTRODUCTION ...... 14

2 ORIGINS, AGRICULTURAL IMPORTANCE, AND BREEDING CHALLENGES OF PUMPKINS, SQUASHES, AND (Cucurbita) AND THE USEFULNESS OF STUDYING Cucurbita CROP WILD RELATIVES ...... 20

Introduction ...... 20 Origin and Historical Use of the Pumpkins, Squashes, and Gourds ...... 20 ...... 23 ...... 24 ...... 24 ...... 24 ...... 25 Current Agricultural and Economic Importance of Pumpkins, Squashes, and Gourds ...... 26 Cucurbita pepo ...... 27 Cucurbita argyrosperma ...... 29 Cucurbita maxima ...... 30 Cucurbita moschata ...... 31 Cucurbita ficifolia ...... 31 Challenges in Cultivation of Squashes, Pumpkins, and Gourds ...... 32 Viral, bacterial, and fungal diseases ...... 32 Non-disease challenges to cultivation ...... 35 The effect of climate change on challenges to cultivation ...... 36 Crop Wild Relatives of Pumpkins, Squashes, and Gourds ...... 37 Primary Genepool CWR ...... 38 Cucurbita argyrosperma ssp. sororia (L. H. Bailey) L. Merrick & D. M. Bates ...... 38 Cucurbita pepo L. (wild) ...... 39 Secondary Genepool CWR ...... 41 Cucurbita lundelliana L. H. Bailey ...... 41 Cucurbita okeechobeensis (Small) L. H. Bailey ...... 42 Cucurbita pedatifolia L.H. Bailey and C. radicans Naudin ...... 43 Tertiary Genepool CWR ...... 44

5

Cucurbita foetidissima Kunth ...... 45 Cucurbita palmata S. Watson, C. digitata A. Gray, C. cordata S. Watson . 45 Wild Utilized Species ...... 47 Conservation Status of CWR and WUS ...... 48 In situ ...... 48 Ex situ...... 52

3 EVOLUTIONARY AND HISTORY OF Cucurbita ( AND SQUASH) SPECIES INFERRED FROM 44 NUCLEAR LOCI ...... 65

Introduction ...... 65 Materials and Methods...... 69 Material and DNA Extraction ...... 69 Selection of Nuclear Loci ...... 70 Microfluidic PCR Amplification and Illumina Sequencing ...... 70 Assembly of Sequencing Reads into Targeted Nuclear Loci ...... 72 Sequence Alignment and Phylogenetic Analyses ...... 73 Ancestral State Reconstruction ...... 75 Results ...... 76 Sequence Data ...... 76 Phylogeny of Cucurbita Based on CA-ML and Species Tree Methods ...... 77 Ancestral Character State Reconstruction...... 79 Differences in Topology and Support among the Gene Trees, CA-ML Tree, and Species Trees ...... 80 Discussion ...... 81 New Cucurbita Species Topology ...... 81 Evolution of Mesophytic Habit in Cucurbita Species ...... 84 Phylogenetic Resolution within Clades of Crops and their Wild Relatives ...... 85 Conclusions ...... 87

4 ALLELE PHASING HAS MINIMAL IMPACT ON PHYLOGENETIC RECONSTRUCTION FROM TARGETED NUCLEAR GENE SEQUENCES IN A CASE STUDY OF () ...... 97

Introduction ...... 97 Methods ...... 100 Datasets ...... 100 Phylogenetic Analyses ...... 103 Assessing Gene Tree Resolution, Topological Incongruence, and Allele Coalescence ...... 106 Results ...... 108 Assemblies and Alignments ...... 108 Allele Coalescence, Sequence Heterozygosity, and Gene Tree Resolution .. 109 Levels of Gene Tree/Species Tree Discordance across Methods of Data Assembly ...... 110 Topological Incongruence among Data Sets and Phylogenetic Inference ..... 111

6

Comparison of Support for Relationships Reconstructed Using Different Datasets ...... 113 Phylogenetic Analysis of Genes with High Levels of Deep Coalescence ...... 114 Discussion ...... 115 Highly Heterozygous Genes and Individuals Do Not Appear Disproportionately Responsible for Low Resolution or Topological Uncertainty in Gene Trees or Species/CAML Trees ...... 116 Using Allele Sequences does not Improve Phylogenetic Resolution or Produce Meaningfully Different Topologies in Enigmatic Areas of the Artocarpus Phylogeny ...... 116 Limitations of Our Results ...... 118 Conclusions ...... 119

5 COMPARATIVE CHARACTERIZATION OF THE DOMESTICATION PROCESS IN TWO INDEPENDENTLY DOMESTICATED PUMPKIN SPECIES 130

Introduction ...... 130 Materials and Methods...... 134 Plant Material and DNA Extraction ...... 134 Targeted Enrichment and DNA Sequencing ...... 135 Read Filtering, Mapping, and SNP Calling ...... 137 Population Structure and Genetic Diversity ...... 137 Results ...... 139 SNP Data ...... 139 Population Structure and Pairwise Population Differentiation...... 140 Phylogenetic Analysis ...... 141 Genetic Diversity ...... 142 Discussion ...... 143 Origins of Domesticated Buttercup Squash and Cushaw ...... 143 Origin of Gigantism in Buttercup Squash...... 144 Domestication Bottlenecks in Buttercup Squash and Cushaw: Considering Breeding System and Gene Flow ...... 145 Contrasting the Genome-Wide Signatures of Selection and Domestication Histories of Buttercup Squash and Cushaw ...... 152 Conclusions ...... 154

6 GENERAL CONCLUSIONS ...... 164

APPENDIX

A SAMPLE INFORMATION FOR SAMPLES USED IN CHAPTERS 3 AND 5 ...... 170

B PRIMERS USED FOR MICROFLUIDIC PCR IN CHAPTER 3 ...... 184

C METHODS USED IN CHAPTER 3 ...... 188

D ADDITIONAL PHYLOGENETIC TREES FROM CHAPTER 3 ...... 190

7

E ALLELE DEEP COALESCENCE AND PHYLOGENETIC RESOLUTION IN CHAPTER 4 ...... 192

F PHYLOGENETIC TREES FROM CHAPTER 4 ...... 193

G SUPPORT FOR ALL HIGH CONFLICT AREAS DISCUSSED IN CHAPTER 4 ACROSS DATASETS AND PHYLOGENETIC ANALYSES ...... 213

LIST OF REFERENCES ...... 214

BIOGRAPHICAL SKETCH ...... 236

8

LIST OF TABLES

Table page

2-1 Cultivated North American pumpkins and squashes ...... 56

2-2 Cucurbita CWR ...... 58

3-1 Cucurbita species and their within-species resolution in the CA-ML tree ...... 89

4-1 Sample information, locus recovery, and statistics for allelic variation for 23 species of Artocarpus and one outgroup...... 121

4-2 Topological variation among three datasets (within analyses)...... 123

4-3 Topological variation within each dataset (among analyses)...... 124

5-1 Summary information about each domesticated ...... 156

5-2 Summary of genetic diversity statistics ...... 157

A-1 Sample information for samples used in the phylogenetic study for Chapter 3. 170

A-2 Sample information for samples used in the domestication genetics study for Chapter 5...... 178

B-1 Primer information for primers used in the phylogenetic study for Chapter 3 ... 184

9

LIST OF FIGURES

Figure page

2-1 Species richness map of modeled potential distributions of North American Cucurbita taxa...... 60

2-2 Modeled potential distribution maps of Cucurbita primary CWR species...... 61

2-3 Modeled potential distribution maps of Cucurbita secondary CWR species...... 62

2-4 Modeled potential distribution maps of Cucurbita tertiary CWR species...... 63

2-5 Number of Cucurbita CWR accessions in the NPGS...... 64

3-1 Distribution map of wild species of Cucurbita with occurrences based on ~7000 records from GBIF (www.gbif.org)...... 91

3-2 Success of microfluidic PCR amplification of loci across taxa...... 92

3-3 Best maximum-likelihood tree reconstructed with RAxML using a matrix of concatenated ambiguity-coded gene sequences under the GTRCAT model... .. 93

3-4 Comparison of alternative branching patterns among seven mesophytic Cucurbita species resolved by four phylogenetic studies of Cucurbita...... 94

3-5 Comparison of the ML phylogeny inferred from the concatenated matrix and ASTRAL species tree estimated from ambiguity-coded gene trees...... 95

3-6 Ancestral character-state reconstruction of habit using maximum likelihood. Possible states are mesophytic habit (black) and xerophytic habit (gray)...... 96

4-1 Read-backed phasing of targeted loci from high-throughput sequences ...... 125

4-2 Frequency of deep coalescence of intraindividual alleles ...... 126

4-3 Relationship between deep coalescence and gene tree resolution...... 127

4-4 ASTRAL-II trees for Artocarpus...... 128

4-5 Incongruence between analyses for all three datasets for Areas A and B...... 129

5-1 Map of putative range of wild C. maxima and C. argyrosperma with sampling localities (if known) for landrace and wild samples included in this study...... 158

5-2 Bayesian clustering (STRUCTURE) results for C. maxima and C. argyrosperma and ancestry coefficients mapped onto sampling localities...... 159

10

5-3 Principal component analysis of improved (blue), landrace (red), and wild () (a) C. argyrosperma accessions and (b) C. maxima accessions...... 160

5-4 Results of phylogenetic analysis of (a) C. argyrosperma accessions and (b) C. maxima accessions...... 161

5-5 By-locus values of within-population nucleotide diversity (θw) across 1-20 in (a) C. argyrosperma and (b) C. maxima...... 162

5-6 By-locus values of within-population nucleotide diversity (π) across chromosomes 1-20 in (a) C. argyrosperma and (b) C. maxima...... 163

C-1 Illustration of bioinformatics pipeline for locus selection in Chapter 3...... 188

C-2 Comparison of success of microfluidic PCR amplification of loci between herbarium-sheet and fresh tissue samples and across primer lengths...... 189

D-1 ASTRAL greedy consensus tree estimated from 44 majority-rule consensus gene trees (RAxML) that were reconstructed from ambiguity sequences...... 190

D-2 ASTRAL greedy consensus tree estimated from 44 majority-rule consensus gene trees (RAxML) that were reconstructed from allele sequences...... 191

E-1 Square-root pairwise distances (sqrt PWD) in the data and its relationship to phylogenetic resolution...... 192

F-1 Phylogenetic trees reconstructed using three datasets and multiple methods of phylogenetic analyses...... 193

G-1 Incongruence between analyses for all three datasets for Areas A-D...... 213

11

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

GENETICS AND EVOLUTION OF MULTIPLE DOMESTICATED SQUASHES AND PUMPKINS (Cucurbita, Cucurbitaceae)

By

Heather Rose Kates

December 2017

Chair: Douglas E. Soltis Cochair: Pamela S. Soltis Major: Genetics and Genomics

Domestication has radically altered wild species to meet human needs, producing morphologies and traits dramatically different from those in the wild. By reconstructing the domestication history of crop species and determining the genetic changes underlying the divergence of crops from wild species, we not only clarify the domestication process, but also learn more about human history, identify genetic resources in wild species for crop improvement, and examine rapid evolution and the genetic underpinnings of adaptation. The multiple domestication events from diverse wild ancestors in Cucurbita serve as replicates of domestication and therefore make

Cucurbita an especially valuable system in which to study crop evolution. Using a multidisciplinary approach, I explore the history, genetics, and evolution of multiple domesticated squashes and pumpkins and their wild ancestors. I provide a review of the origins, uses, and cultivation challenges of domesticated Cucurbita species along with an overview of Cucurbita crop wild relatives to highlight the importance of studying and conserving wild Cucurbita species. I use a phylogenetic approach to reconstruct relationships among wild and domesticated Cucurbita species and discovered novel

12

relationships among Cucurbita species that clarify how wild Cucurbita species are related to the domesticated taxa. To explore the relationship between phylogenetic resolution and sequence heterozygosity that I discovered in my phylogenetic study, I study the effect of assembling heterozygous sequences by allele phasing on phylogenetic reconstruction in a case study of another group of , Artocarpus

(Moraceae) and find that allele phasing has minimal impact on phylogenetic reconstruction in this case. I use population genomic analyses to explore the genetic structure and domestication bottlenecks in two independently domesticated pumpkin species and find that different wild origins and breeding practices produce strikingly different current crop diversity. Taken together, the results of this study improve our understanding of the pumpkin and squash crop wild relatives and how domesticated species have diverged from them. The phylogenetic methods developed for this study offer novel advice to other researchers on how to address heterozygosity in phylogenetic datasets.

13

CHAPTER 1 GENERAL INTRODUCTION

With an incredibly rich and long history of domestication and cultivation, the squashes and pumpkins (Cucurbita, n=20) are unrivalled in morphological diversity and wide range of adaptation to cultivation. In 2013, Americans consumed nearly $600 million worth of squash and pumpkin (Minor and Bond 2017); while the origins of many similarly valuable crops are known (e.g. cassava: Olsen et al. 1999; kiwi: Atkinson et al.

1997; common : Gepts 1996; eggplant: Meyer et al. 2012; pea: Jing et al. 2011), the ancestors of the most economically important squashes and pumpkins have not been identified (Zheng et al. 2013).

Cucurbita (Cucurbitaceae) comprises 13 species (Nesom 2011), although alternative taxonomies are also used (e.g. Gong et al. 2012). The wild taxa occupy diverse elevational and latitudinal habitats and have a distribution that ranges from the northern to southern with a center of diversity in . The six cultivated taxa are believed to be of independent origin from different wild progenitors in different geographic regions (Nee 1990). The wild ancestors of two of the six crops are known (Sanjur et al. 2002, Zheng et al. 2013), while those of four are not determined (Zheng et al. 2013). The co-occurrence of many independent domestication events within a single is unusual and matched perhaps only by Brassica

(cabbage, broccoli, cauliflower, Brussels sprouts, kale, collards, etc.; Dixon 2007). In most crop groups, one or two wild species or event(s) gave rise to all cultivated lines (e.g. Wendel 1989, Rieseberg and Seiler 1990, Olsen and Purugganan

2002, Matsuoka 2002). The multiple domestication events from diverse wild ancestors

14

in Cucurbita serve as replicates of domestication and therefore make Cucurbita an especially valuable system in which to study crop evolution.

Domestication of Cucurbita is among the earliest domestications on record with archaeological evidence indicating that Cucurbita species were used by humans as early as 10,000 years ago (Smith 2006). Hunter-gatherers may have first collected

Cucurbita for their edible , the bitter, inedible flesh for detergent, and the dried rinds for use as containers (Nee 1990). Discoveries of rare non-bitter fruits likely led to efforts to intentionally cultivate Cucurbita species, and the ultimate value of squashes and pumpkins was their edible flesh. This transition from bitter to non-bitter flesh represents one of the major domestication traits in squashes and pumpkins. Other traits of cultivated squashes and pumpkins are enlarged fruit size, considerable of fruit color and shape, fewer and larger seeds, bush habit (rather than ), and increased disease susceptibility. Among all plants cultivated for edible fruit, squashes and pumpkins have the most dramatic increase in fruit size compared with their wild relatives (Neves et al. 2012). While each of the six groups of cultivated squashes and pumpkins exhibits these domestication traits, different genetic changes could underlie their convergent morphologies in that each originated from a different wild species.

Furthermore, the cultivated taxa exhibit different adaptations to cultivation (Nee 1990).

Figleaf (C. ficifolia) is adapted to cool temperatures and short days while (C. moschata) grows best in the humid tropics. Cucurbita pepo subsp. pepo has the widest adaptation, reflected in its greatest economic importance.

For my Ph.D. at the University of Florida, I developed Cucurbita as a system for investigating adaptive response to selection under domestication using population

15

genomic and novel phylogenetic approaches to reconstruct the evolutionary relationships and domestication histories of pumpkins and squashes. The three major goals of my project are to: 1) Identify the wild ancestors of the cultivated squashes and pumpkins by reconstructing the most comprehensive and robust molecular-based phylogeny for Cucurbita to date based on 44 loci derived from introns of single-copy nuclear genes; 2) Suggest geographic origins of domestication for the cultivated squashes and pumpkins based on phylogenetic results; and 3) Characterize and survey the effect of domestication bottlenecks in two crop-wild subspecies pairs through a population genomic study. Below, I discuss the chapters of my dissertation and describe the roles they play in answering the questions outlined above.

In Chapter 2, I review of the cultivation history of domesticated pumpkins and squashes and the natural history of Cucurbita crop wild relatives (CWR) to provide novel insight into the limitations and potential for leveraging Cucurbita CWR genetic resources for crop improvement. I describe that because domesticated pumpkins and squashes can be crossed with each other and with diverse primary genepool relatives, contributions from Cucurbita CWR have enabled the development of disease-resistant and represent a vast pool of untapped genetic variability underlying traits including drought tolerance and disease resistance. Despite the potential usefulness of

Cucurbita CWR, I found that thorough evaluations of these wild species for agronomically important traits are limited. I caution that the twelve Cucurbita CWR of

North America are more narrowly distributed than they were in the past because of the extinction of megafaunal dispersers and because of habitat loss, and the genetic diversity of wild Cucurbita species may be decreasing; one North American wild relative,

16

C. okeechobeensis ssp. okeechobeensis, is nearly extinct and some others are rare. I advise that ex situ and in situ conservation of these species that includes phenotypic assessments is needed to better utilize the wealth of genetic resources available for pumpkin and squash crop improvement.

In Chapter 3, I investigate the evolution of wild and domesticated Cucurbita using the most comprehensive and robust molecular-based phylogeny for Cucurbita to date based on 44 loci derived from introns of single-copy nuclear genes. I discovered novel relationships among Cucurbita species and recovered the first Cucurbita tree with well- supported resolution within species. Cucurbita comprises a clade of mesophytic annual species that includes all six crop taxa and a grade of xerophytic perennial species that represent the ancestral xerophytic habit of the genus. Based on phylogenetic resolution within species I hypothesize that the magnitude of domestication bottlenecks varies among Cucurbita crop lineages. The phylogeny clarifies how wild Cucurbita species are related to the domesticated taxa. I find close relationships between two wild species and crop lineages not previously identified. I conclude that expanded geographic sampling of key wild species is needed for improved understanding of the evolution of domesticated Cucurbita.

In Chapter 4, I more closely examine a question considered in Chapter 3: whether untapped information about allelic diversity within populations and individuals

(i.e. heterozygosity) could improve phylogenetic resolution and accuracy. I review the common methods of including heterozygosity in phylogenetic studies and present a novel method for assembling allele sequences from Illumina sequencing reads. Using

Artocarpus (Moraceae) phylogenomics as a case study, I perform supermatrix

17

phylogeny reconstruction and species tree estimation of Artocarpus based on three methods of accounting for heterozygous sequences: a consensus method based on de novo sequence assembly, the use of ambiguity characters, and a novel method for incorporating read information to phase alleles within a locus. I characterize the extent to which highly heterozygous sequences impeded phylogeny reconstruction and determine whether the use of allele sequences improves phylogenetic resolution or decreases topological uncertainty. I find that highly heterozygous sequences in the datasets do not contribute disproportionately to poor phylogenetic resolution and that the use of allele sequences for phylogeny reconstruction does not have a clear effect on phylogenetic resolution or topological consistency.

In Chapter 5, I use a population genomics approach to explore the genetic structure and domestication bottlenecks in two pumpkin species, buttercup squash

(Cucurbita maxima) and cushaw (C. argyrosperma). I compare how different wild origins and breeding practices produce current crop diversity of these two independently domesticated pumpkin crops that share a domestication syndrome, but have differing domestication histories and breeding practices. I perform targeted genomic sequencing of 48 unrelated accessions for each species including wild, landrace, and improved lines and identify over 15,000 SNPs for each species. Population analysis of allelic diversity of SNP data shows a single domestication event in each species and suggests specific geographic regions where wild relatives are most similar to the respective domesticate. Results of these analyses also reveal that the locally important Mexican domesticate C. argyrosperma experienced a domestication bottleneck consistent with our expectations of the genetic consequences of domestication. In contrast, the

18

phenotypically diverse and economically important C. maxima ssp. maxima does not exhibit a reduction in genetic diversity relative to its wild ancestor. These results improve our knowledge about the domestication of two Cucurbita species and add to our understanding of how the reduction of genetic diversity during the processes of domestication and trait improvement impacts the breeding potential and utility of current crops.

19

CHAPTER 2 ORIGINS, AGRICULTURAL IMPORTANCE, AND BREEDING CHALLENGES OF PUMPKINS, SQUASHES, AND GOURDS (Cucurbita) AND THE USEFULNESS OF STUDYING Cucurbita CROP WILD RELATIVES

Introduction

Origin and Historical Use of the Pumpkins, Squashes, and Gourds

Domesticated Cucurbita include "" called pumpkins, summer and , and gourds, and are among the most important crops native to

North America (Small 2014). The incredibly long and rich domestication histories of multiple North American Cucurbita species produced pumpkin, squash, and gourd crops that are unrivalled in fruit morphological diversity (Duchesne 1786; Naudin 1856) and wide range of adaptation to cultivation. Three pumpkin and squash crop subspecies are native to North America: the widely cultivated and economically important Cucurbita pepo L. ssp. pepo (pumpkin, vegetable , cocozelle, ) and C. pepo ssp. ovifera (L.) D. S. Decker (scallop, acorn, crookneck, straightneck), and C. argyrosperma

C. Huber ssp. argyrosperma (silver- gourd, green-stripe cushaw, pipiana), a less widely cultivated crop that is important in traditional Mesoamerican agricultural systems (Montez-Hernandez and Eguiarte 2002). The three pumpkin and squash crop species and subspecies likely domesticated outside of North America are the widely cultivated C. maxima Duchesne ssp. maxima (buttercup squash, Hubbard squash, , squash) and C. moschata Duchesne (butternut squash), and C. ficifolia Bouché (figleaf gourd), a Cucurbita crop that is relatively unknown in the United States but widely cultivated in and regionally important in some regions of Asia.

20

Because fruit and varietal terms including "pumpkin", "", "winter squash", "gourd", and "cushaw" have been inconsistently applied to the diverse subspecies and varieties of domesticated Cucurbita, it is sometimes difficult to distinguish between accounts of different crop subspecies. For clarity, we will use

"pumpkins and squashes" to refer to Cucurbita crops generally. To refer to individual crop subspecies we will use the botanical name or "pepo pumpkin and squash" for C. pepo ssp. pepo, "ovifera pumpkin and squash" for C. pepo ssp. ovifera, "cushaw" for C. argyrosperma ssp. argyrosperma, "buttercup squash" for C. maxima ssp. maxima,

"figleaf gourd" for C. ficifolia, and "butternut squash" for C. moschata, although many common and varietal names can be used to refer to these crop subspecies.

The wild ancestors of the Cucurbita crops likely appealed to ancient semi- nomadic native peoples because of their large and conspicuous easily collected fruits

(Paris 2016). Wild Cucurbita plants are monoecious, multi-branched that grow along the ground or over trees and other plants or structures (Erwin 1931; Bailey 1943) and bear large (12-15cm) alternately arranged palmate on long petioles (Paris

2016). , , and roots are all borne at the leaf axil. The fruits of wild

Cucurbita vary somewhat among species but are generally round, 3.5-8.0 cm in diameter, with a green exocarp that may be striped or unstriped and may be yellow or green at maturity (Nee 1990). The rinds of wild Cucurbita are hard and lignified

(Robinson and Decker-Walters 1997), and their flesh contains that render them inedible unless repeatedly boiled (Nabhan 1985). Therefore, Cucurbita plants were likely initially selected by native North Americans for their edible, high- seeds as well as for the use of their durable rinds as containers; the latter was of utmost

21

importance to semi-nomadic people prior in the preceramic era (Small 2014). Discovery of rare, non-bitter or less-bitter Cucurbita fruits likely led to the eventual non-bitter

Cucurbita crops we know today. Native North Americans also ate the stems and flowers of Cucurbita and used the -containing flesh as soap, and archaeological evidence suggests many of the undomesticated wild species were used by humans

(Nabhan 1985). All Cucurbita domesticates likely moved outside of their initial range of domestication into other areas of the before European contact (Fritz 1994;

Smith 2001).

The traits that define the domestication syndrome of pumpkins and squashes include more uniform , a reduction in size and abundance of that interfere with harvesting, an increase in the size of fruits and seeds, and a reduction in the bitter taste of the flesh (Lira Saade and Montez Hernandez 1994). Some cultivated

Cucurbita varieties have a bush habit, and most domesticated subspecies exhibit a much wider range of fruit color, shape, and size than their wild relatives. Domesticated species in general have a decreased resistance to drought and disease. Pumpkins and squashes have been important in Mexico for millennia, where fruits are processed and consumed in a variety of ways. The seeds of pumpkins and squashes are a popular snack food and are also ground into a meal used to make sauces. Pumpkin and squash flowers are eaten stuffed or fried and are used to color and flavor soups and salads

(Paris 1989). The origin, current extent of cultivation, most common uses, and groups for each domesticated species are described below and summarized in Table 2-

1.

22

Cucurbita pepo

The domestication of pepo pumpkin and squash is among the earliest plant domestications in human history (Smith 2006). Archaeological evidence indicates that pepo pumpkin and squash was domesticated from an unidentified wild species in

Mexico around ~10,000 years B.P. (Smith 2006). Prior to the 1980s, ancient remains of

C. pepo from ~5,000 years B.P. discovered in Eastern North America were thought to represent the spread of domesticated pepo pumpkin and squash from Mexico (Smith

2006). However, there is now strong support for the independent domestication of ovifera pumpkin and squash from a different C. pepo subspecies (C. pepo ssp. ovifera) in what is currently the United States (Decker-Walters et al. 1990; Decker-Walters et al.

1993). The domestication of C. pepo ssp. ovifera (which includes cultivated scallop, acorn, crookneck, and straightneck squashes) is one of a small number of domestications that confirms the status of Eastern North America as an independent center of plant domestication (Smith 2006).

Domesticated pepo and ovifera pumpkins and squashes were introduced into

Europe and Asia Minor in the late 1400s and secondary diversification of these crops occurred in Asia Minor (Robinson and Decker-Walters 1997). Detailed drawings, paintings, and writings from Europe provide evidence that two ancient lineages of domesticated C. pepo were initially brought to Europe from America: "pumpkin" (C. pepo ssp. pepo var. pepo) and "scallop" (C. pepo ssp. ovifera var. clypeata) and that additional cultivar groups were developed from these lineages in Europe or Asia and were subsequently brought to America (Paris 1989). Among these varieties with probable European origins are the varieties of C. pepo that are most economically important today: "zucchini" and "" (Paris 1989).

23

Cucurbita argyrosperma

C. argyrosperma ssp. argyrosperma includes cultivated silver-seed gourd, green- stripe cushaw, and Calabaza pipian. The domestication of cushaw apparently occurred in Mexico, and archaeological remains of domesticated cushaw have been dated to as early as ~7,000 years B.P. (Smith 2005). In contrast to the other five Cucurbita crops, cushaw did not leave its origin of domestication during the and today is still cultivated primarily in Mexico and Central America with minor cultivation in

Asia (Robinson and Decker-Walters 1997). It is unclear why cushaw was not brought to

Europe during the Columbian exchange, but the reason for its historical and current lack of cultivation compared with other Cucurbita crops is likely due to the inferior quality of its fruit (Lira Saade and Montez Hernandez 1994).

Cucurbita maxima

Domesticated C. maxima ssp. maxima is among the most widely cultivated and morphologically diverse Cucurbita crop subspecies (Chigmura Ngewerume and

Grubben 2004). Cucurbita maxima ssp. maxima was domesticated ~4,000 years ago from the South American subspecies C. maxima ssp. andreana that occurs in Argentina and Uruguay and more rarely in Bolivia (Decker-Walters and Walters 2000). Cucurbita maxima ssp. maxima was brought to the Old World during the Columbian exchange

(Decker-Walters and Walters 2000) and is now cultivated all over the world, with a secondary center of diversity in Asia (Zhou et al. 2016) where extensive breeding and improvement of new varieties have occurred.

Cucurbita moschata

The origin of butternut squash is unclear. As recently as the early 1900s butternut squash was thought to be of Asian origin (Lira-Saade 1994) although all wild

24

Cucurbita species are native to the New World. Multiple lines of evidence now confirm that C. moschata was domesticated somewhere in Mexico, Central America, or South

America. The oldest archaeological remains of C. moschata were found in the Ocampo caves of northwestern Mexico and date from ~5,000 years B.P. More recent archaeological remains have also been found in northern (2,000 B.P.),

Guatemala (2,000 B.P.), and (3,000 B.P.) (Cohen 1978).

Cucurbita moschata moved outside of its area of domestication prior to European contact. For example, the C. moschata landrace "Seminole pumpkin" was first grown by indigenous groups of Florida in the United States before Europeans arrived (Lira-Saade

1994). By the end of the 1800s, C. moschata was cultivated in Asia and

(Robinson and Decker-Walters 1997).

Cucurbita ficifolia

The origin of figleaf gourd is also unclear and, like C. moschata, the figleaf gourd was thought to have had its origin in Asia (Lira-Saade 1994). Some authors have suggested Central America or southern Mexico as the site of domestication for figleaf gourd based on etymological evidence (Andres 1990), but archaeological evidence points instead to the Andean region of (Andres 1990). Phylogenetic studies have been unable to support any of these hypotheses as C. ficifolia is not closely related to any single wild Cucurbita species (e.g. Sanjur et al. 2002; Zheng et al.

2012; Kistler et al. 2015; Kates et al. 2017). Figleaf gourd spread from the New World to

Europe and Asia as early as the 1600s and its cultivation has since spread to many other parts of the world (Andres 1990).

25

Current Agricultural and Economic Importance of Pumpkins, Squashes, and Gourds

The current agricultural and economic importance of the domesticated lineages of Cucurbita differ substantially. Because of the inconsistent and ambiguous nomenclature of cultivated forms it is impossible to determine the precise contribution of individual subspecies to total Cucurbita agricultural production. Cucurbita moschata, C. maxima ssp. maxima, C. pepo ssp. pepo, and C. pepo ssp. ovifera are the most economically important, and references to production and consumption of "pumpkins, squashes, and gourds" refer to all of these subspecies. The terms "pumpkin" and

"squash" are also used; in this case, pumpkin still refers to all subspecies, but squash is more likely to refer to varieties of C. pepo ssp. pepo or C. pepo ssp. ovifera.

China and are the largest producers of pumpkins, squashes, and gourds today, and in these countries C. moschata and C. maxima ssp. maxima are the most commonly grown Cucurbita crops (Yang and Walters 1992; Sharma and Tarsam 1998).

This contrasts with the relative importance of Cucurbita crop subspecies consumed in the United States, where pepo and ovifera pumpkins and squashes are the most popular Cucurbita crops. The "pumpkin" types of Cucurbita consumed in the United

States are mostly produced domestically (Minor and Bond 2017, and in 2014 750K tons of pumpkins were produced in the United States (Minor and Bond 2017). In contrast, most of the "squash" types of Cucurbita consumed in the United States are imported

(FAOSTAT 2017), and the United States is the largest importer of squash worldwide.

Cushaw and figleaf gourd are of regional importance rather than worldwide economic importance. Cushaw is rarely grown outside of the Western Hemisphere and is not widely grown outside of its origin of domestication in Mexico (Robinson and

26

Decker-Walters 1997). Figleaf gourd is regionally popular in some areas outside of its likely area of domestication, but is still relatively rare outside of Mexico, Central

America, and South America (Andres 1990).

There is a wide range of modern uses of pumpkins and squashes that often vary by species and by variety. The commercial uses of pumpkins and squashes include (in order of importance): immature and mature fruit as food (fresh market and processed); seeds for direct consumption; seeds for vegetable oil; mature fruit as animal feed; seeds for meal; and non-edible types for ornamental use. In general, round-fruited types are usually grown for mature fruits or seed (Paris 2016), and the long or flat-fruited types are primarily grown for consumption of immature fruits (Paris 1989). Cucurbita fruits are a good source of A, with levels comparable to that of avocados, asparagus, musk-, Brussels sprouts, artichokes, and green olives (Whitaker 1962) and are an excellent source of starch. The and oil content of Cucurbita seeds is very high and is comparable to sunflower and soybean oil in its fatty-acid profile (Whitaker 1962). In some countries including the United States, most varieties of "pumpkins" are seasonal crops and up to ninety percent of annual consumption occurs between October and

January.

Cucurbita pepo

The popularity and importance of edible varieties of pepo and ovifera pumpkins and squashes has increased tremendously since the 1970s (Small 2014). The United

States produces a modest amount of pepo and ovifera pumpkins and squashes (less than 900K tons in 2014 compared with 7M tons in and 5M tons in India)

(FAOSTAT 2017), but is the largest importer of pumpkins, squash, and gourds in the world. Ninety-five percent of squash consumed in the United States is grown in Mexico

27

(FAOSTAT 2017). There are four cultivated varieties of edible pepo pumpkins and squash and four cultivated varieties of edible ovifera pumpkin and squash.

Currently, the zucchini variety of pepo pumpkin and squash (C. pepo ssp. pepo var. cylindrica), also known as "summer squash", is the most popular Cucurbita crop in the United States (Paris 2008). This pepo pumpkin and squash variety is consumed as a fresh vegetable in its immature state (Robinson and Decker-Walters 1997). Prior to the 1980s, yellow and green "summer squash" consumed in the United States and around the world included immature fruits of nearly all eight edible varieties of pepo and ovifera pumpkin and squash. These diverse varieties of "summer squash" have been replaced with uniform-inbred and highly improved types of C. pepo ssp. pepo var. cylindrica, reflective of a pattern of reduced genetic and morphologic variety in C. pepo produced worldwide (Paris 1989). Breeding and cultivation of pepo pumpkin and squash is increasingly dependent on inbreeding, as the importance of virus-resistant transgenic zucchini increases. Approval for production in Mexico is under consideration and is likely to exacerbate this trend (Reyes et al. 2015). The other three varieties of edible pepo pumpkin and squash are: "pumpkin" (C. pepo ssp. pepo var. pepo), which includes creeping cultivars that produce round, flat-ended fruits (most famously the

Halloween "jack-o-lantern" type); "cocozzelle" (C. pepo ssp. pepo var. longa), a variety that produces long, cylindrical fruits eaten in the unripe state; and "vegetable marrow"

(C. pepo ssp. pepo var. fastigata), a semi-creeping variety that bears short, cylindrical fruits most commonly eaten when mature.

The four varieties of ovifera pumpkin and squash are: the semi-shrubby "scallop"

(C. pepo ssp. ovifera var. clypeata), which is eaten in its immature state and is the

28

cultivated ovifera pumpkin and squash variety that most resembles a wild ancestor based on ; "acorn" (C. pepo ssp. ovifera var. turbinata), a variety that can be shrubby or creeping and bears soft-rinded fruit that can be eaten in its mature state; and

"crookneck" (C. pepo ssp. ovifera var. torticollia) and "straightneck" (C. pepo ssp. ovifera var. recticollis), two similar varieties that include shrubby plants that produce yellow fruits eaten in their immature state.

Cucurbita argyrosperma

Three varieties of cushaw are grown, primarily in Mexico where they are cultivated near the range of their wild ancestor, C. argyrosperma ssp. sororia. Cucurbita argyrosperma ssp. argyrosperma var. argyrosperma (silver seed gourd) likely represents the initial domesticate from which the other cultivated varieties were subsequently developed in different parts of its range of cultivation in Mexico. Limited genetic data reveals a high proportion of wild ancestry for samples of silver seed gourd

(Kates et al., unpublished), but more extensive sampling at the varietal level is needed to determine whether this variety truly represents the initial domesticate of this species.

The large seed-size of this variety suggests that its seeds, rather than its flesh, were the target of initial domestication (Lira Saade and Montez Hernandez 1994). Silver-seed gourd is grown infrequently by home gardeners in the United States as a curiosity (Lira

Saade and Montez Hernandez 1994). Cucurbita argyrosperma ssp. argyrosperma var. callicarpa (green stripe and white cushaw; Japanese pie pumpkin) is considered the most recent or specialized variety of cushaw, and the diversity of shapes, colors, and size of the fruits and seeds suggests it was domesticated for its flesh and its seeds (Lira

Saade and Montez Hernandez 1994).

29

Outside of Mexico, cushaw is a crop of minor importance in South America and in the United States (Lira Saade and Montez Hernandez 1994). A third cultivated variety, C. argyrosperma ssp. argyrosperma var. stenosperma (calabaza pipiana), is another more recently derived variety of cushaw, and although it also has diverse fruit morphology (Lira Saade and Montez Hernandez 1994), it is now mostly grown in

Mexico and Central America for its seeds (Merrick 1990). A fourth variety of cushaw, C. argyrosperma ssp. argyrosperma var. palmeri, is wild and thought to be a feral escape from cultivation (Lira Saade and Montez Hernandez 1994). Genetic data resolve a close relationship between C. argyrosperma ssp. argyrosperma var. palmeri and the cultivated varieties, and support its status as a feral rather than a truly wild taxon (see

Chapter 5).

Cucurbita maxima

Outside of the , C. maxima ssp. maxima is one of the two most common Cucurbita crops consumed. The many cultivars of C. maxima ssp. maxima, known generally as "pumpkins" (Sharma and Lal 1998), are very popular in Asia and

Africa where their mature fruits are widely used in and as cattle feed. In contrast to the minor nutritional importance of squash and pumpkin consumption in the

United States, in Africa and Asia, C. maxima ssp. maxima pumpkins may serve as a staple food when grain production is limited (Sharma and Lal 1998).

Attempts to classify the diverse C. maxima ssp. maxima into cultivar groups are inconsistent, but popular named varieties include banana squash, delicious squash, buttercup squash, hubbard squash, show pumpkins, , and kabocha

(Decker-Walters and Walters 2000). Varieties of C. maxima ssp. maxima, especially the

"buttercup" variety, are the most popular type of pumpkins and squashes consumed in

30

Africa. Cucurbita maxima ssp. maxima also includes all "buttercup squash", varieties that produce the largest fruits in the world. Buttercup squash is popular as ornamental pumpkins in the United States for and are celebrated in many festivals and competitions.

Cucurbita moschata

Cucurbita moschata is one of the two most popular Cucurbita species outside of the Americas. Like C. maxima, its many regional varieties, which were developed in

Central and South America, Africa, and the United States (Lira-Saade 1994), are often referred to as "pumpkins". Among these, the three cultivar groups commercially recognized in North America are cheese, crookneck, and bell (including the popular

"butternut" cultivar) (Robinson and Decker-Walters 1997). In the United States, all canned pumpkin is C. moschata, and the most popular variety of canned pumpkin is

Libby's Select Dickinson (Geisler 2014).

Cucurbita ficifolia

Figleaf gourd is not common in industrialized countries (Robinson and Decker-

Walters 1997) but is used as food and cattle feed in Mexico, Central America, and

South America. Cucurbita ficifolia is also popular in some regions of China where it is known as "shark fin melon" because of its use in a soup that resembles shark fin soup.

Cucurbita ficifolia is also used as a rootstock for grafting (Robinson and

Decker-Walters 1997). Research has demonstrated the potential of the proteolytic enzymes in the pulp of C. ficifolia fruit to treat waste water from the industrial processing of derived from fish (Illanes et al. 1985), but this use of C. ficifolia has never been implemented.

31

Challenges in Cultivation of Squashes, Pumpkins, and Gourds

Viral, bacterial, and fungal diseases

Disease susceptibility is common in all Cucurbitaceae crops (e.g. Citrullus

() and Cucumis (cucumber and melon)), but is comparatively understudied in Cucurbita. For example, fewer disease resistance genes have been reported for pumpkins, squashes, and gourds than for the other cucurbit crops (Robinson and

Decker-Walters 1997). Insufficient funding is mostly responsible for the relative lack of research to screen germplasm and identify disease resistance in Cucurbita, but the large genome size of Cucurbita compared to other cucurbits and the fact that certain cucurbit diseases, including , are less of a problem in Cucurbita than in other cucurbit crops may also play a role.

Diseases attack pumpkins and squash at every stage of development, from germinating seeds to mature fruits (Robinson and Decker-Walters 1997). The major challenge in the cultivation of the North American squashes, pumpkins, and gourds are viral diseases, especially those transmitted by aphids, whiteflies, and other insects

(Paris 2016); virus resistance is one of the most important goals of Cucurbita breeding.

Virus susceptibility in Cucurbita is a moving target; the most damaging viruses vary by region and change over time, and newly damaging viruses are reported frequently

(Paris 2016). Some of the viruses that infect pumpkins and squashes are the cucumber (CMV), (WMV), zucchini yellow mosaic virus

(ZYMV), and ringspot virus (PRV) (Robinson and Decker-Walters 1997). Among these, ZYMV has been the most destructive since the 1970s (Paris 2016). Because there are no chemical tools to control these viruses, the only way to limit the damage is

32

through breeding resistant crops or management of the insect vectors using

(Molinar et al. 2012).

Fungal and bacterial diseases that are the most damaging to pumpkin and squash crops include ( cichoracearum DC.), downy mildew

(Pseudoperonospora cubensis (Berkeley & M. A. Curtis) Rostovzev), gummy stem blight ( (Fuckel) Rehm), charcoal rot (Macrophomina phaseoli

(Tassi) Goid), root and fruit rot caused by capsica Leonian, P. spp., and

Fusarium solani f. sp. cucurbitae (Mart.) Sacc, (Erwinia tracheiphila (Smith)

Bergey), and bacterial leaf spot (Xanthomonas campestris pv. cucurbitae (Pammel)

Dowson) (Robinson and Decker-Walters 1997). Fungal and bacterial diseases of squash are controlled by long-term rotation out of cucurbits (4 years or more), the use of clean seed, chemical treatment (Molinar et al. 2012), and generally reducing environmental stress whenever possible.

Susceptibility to diseases varies among domesticated Cucurbita species. With a few exceptions (see: Walkey and Pink 1984; Lebeda and Kristkoa 1996; Kristkoa and

Lebeda 2000), pepo and ovifera pumpkins and squashes and wild C. pepo do not naturally contain resistance to diseases (Paris 2016). Susceptibility of buttercup squash to major cucurbit diseases is apparently similar to pepo and ovifera pumpkins and squashes (Provvidenti et al. 1978), though this observation is based on a small number of studies. Disease resistance in cushaw and wild C. argyrosperma has not been as well-studied and also appears to be rare (Provvidenti et al. 1978; Luietl 2016; Wessel-

Beaver 1998), but some C. argyrosperma germplasm exhibits resistance to downy mildew and offers a potential source for breeding downy mildew-resistant squash and

33

pumpkins (Lebeda et al. 2016). Butternut squash is resistant to a greater number of diseases than other domesticated species (e.g. Provvidenti et al. 1978: ZYMV, PRV, ringspot virus, , squash curl leaf virus; Chavez et al. 2011: crown rot; Zhou et al. 2010: downy mildew). The prospects and limitations of utilizing pest and disease-resistant crop wild relatives (CWR) for crop enhancement are discussed below in "Crop wild relatives of pumpkins, squashes, and gourds". To avoid the loss of susceptible crops, farmers apply pesticides to target insect vectors, rotate pumpkin and squash crops, and avoid practices that excessively wet squash leaves or create standing water (Sharma and Lal 1998). To address the need for disease resistant C. pepo crops, geneticists and breeders in the United States created a transgenic zucchini with virus-resistance genes.

The transgenic variety of zucchini called "Freedom II" was the second transgenic crop to be deregulated for commercial use in the United States in 1995 (Tricoli et al.

1995). There are currently at least six transgenic cultivars of C. pepo ssp. cylindrica being sold in the United States that are resistant to WMV, ZYMV, and (Gaba and Zelcer 2004). The impact of transgenic zucchini on worldwide pepo pumpkin and squash production is very limited for several reasons. Resistance to the three viruses listed above apparently increases the vulnerability of the resistant squash to other viruses and to some insect pests (Sasu et al. 2009). Additionally, the United

States is not a major producer of pepo pumpkin and squash; is the only country that allows the import of transgenic pepo pumpkin and squash from the United States

(CBAN 2015), and nearly all zucchini sold in the United States market is imported from

Mexico (FAOSTAT 2017), where transgenic squash has not been deregulated (Cruz

34

and Reyes 2015). The potential deregulation of transgenic pepo pumpkin and squash cultivars in Mexico is discussed below in "Conservation status of CWR and WUS".

Non-disease challenges to cultivation

The severity of some non-disease challenges to cultivation differs among the species and varieties of ovifera and pepo pumpkins and squashes and cushaw due to the phenotypic diversity of the cultivated types. All pumpkin and squash crops are sensitive to very low temperatures and continuous frost (Lira-Saade and Montes-

Hernandez 1998; Sharma and Tarsam 1998) and to heavy precipitation and standing water, which can cause fruit to rot. Cultivation of varieties of pepo and ovifera pumpkins and squashes and of cushaw that are grown for mature fruits (e.g. the acorn variety of ovifera pumpkin and squash; most important cultivars of cushaw) require high water use. Pumpkins and winter squash are among the highest water-using vegetable crops

(Daniello 2003) in contrast to summer squashes (e.g. zucchini, straightneck, crookneck), which have the lowest water requirements. Pumpkins and winter squash also require large amounts of arable land to support their vine or semi-bush habit. The bush habit has been selected for in nearly all summer squash varieties (Paris 2016), but these types require larger applications of fertilizer (Sharma and Tarsam 1998).

Insect pests that damage pumpkin and squash crops include seedcorn maggot (Delia platura Meigen), which are associated with high amounts of decaying organic matter in the soil, wireworms (Limonius spp. and others), squash bugs ( DeGeer), whiteflies (Bemisia argentifolii Gennadius and Trialeurodes vaporariorum Westwood), aphids ( Glover and Myzus persicae Sulzer), which are also disease vectors, cucumber beetles (Diabrotica undecimpunctata

Mannerheim and Acalymma trivittatum Mannerheim), larvae of several species of

35

armyworm (Spodoptera spp.), and the cabbage looper (Trichoplusia ni Hübner) (Molinar et al. 2012; Robinson and Walters 1997). Bitter cucurbitacins present in Cucurbita (see sect. 16.2.3) attract squash bugs, corn rootworm, and , and wild and domesticated Cucurbita species are used as perimeter trap crops in integrated pest management (e.g. Addler and Hazzard 2009; Metcalf et al. 1979).

The effect of climate change on challenges to cultivation

Some of the challenges that Cucurbita faces in cultivation may intensify under climate change (e.g. Liu et al. 2011). Although the impacts of climate change on cucurbit crops have not been widely modelled and studied, some plant pathogens are expected to spread and infect plants more readily under current climate change scenarios (Pautasso et al. 2012), and environmentally stressed pumpkins and squash are more susceptible to initial infection and subsequent disease development (Robinson and Decker-Walters 1997). Changes in precipitation also pose a threat to Cucurbita, especially to winter squash and pumpkin cultivars that require high water input and bear fruits that are vulnerable to rot while they mature. In 2015, unusually heavy summer rains drastically reduced pepo pumpkin and winter squash yield (Kennedy 2015). The previous year, drought in reduced yield of Cucurbita grown in that state and increased pest-damage (CBS News 2014).

There are also characteristics of Cucurbita cultivation that may make it resilient to climate change. For example, compared with crops that rely entirely on , Cucurbita crops are also pollinated by native (Giannini et al. 2011).

Though the potential effects of climate change on pollination of Cucurbita crops has not yet been studied, pollination by native bees was found to be a potential buffer against climate change in watermelon. (Radar et al. 2013). Pumpkin and squash production as

36

a whole may be less affected by climate change than that of some other crops due to the diversity of cultivated types and the wide range of elevations and temperatures at which they can be grown.

Crop Wild Relatives of Pumpkins, Squashes, and Gourds

The wild relatives of pumpkins and squashes that occur in North America include six arid-adapted (xerophytic) perennial species and seven non-arid-adapted

(mesophytic) annual species, subspecies, or varieties. The greatest species diversity occurs in Mexico (Figure 2-1), although phylogenetic data suggests the genus likely originated in central or South America (Schaefer et al. 2009). The wild xerophytic perennial Cucurbita species grow in the deserts and dry scrub of Mexico and the southwestern United States, and wild mesophytic annual species grow in the moist or dry forests and plains of Mexico and the southern and central United States. All domesticated Cucurbita species and subspecies are derived from the mesophytic annual species group; therefore, all mesophytic annual species are more closely related to cultivated pumpkins and squashes than are any of the xerophytic perennial species.

Table 2-2 lists the habitat, distribution, genepool classification, and potential agronomic traits for 13 Cucurbita CWR native to North America. Likely due to the recent divergence of the wild Cucurbita species (Zheng et al. 2013), all Cucurbita CWR can be crossed with one or more other species in the genus. Introgression of traits from wild to cultivated Cucurbita can be made directly from a CWR with desirable traits or using another CWR as a genetic bridge (Yi-Hong Wang et al. 2011).

CWR are designated as primary, secondary, and tertiary genepool CWR based on the ease with which they can exchange genes with the crop to which they are related

(Harlan and de Wet 1971); crosses between the primary CWR and crop readily produce

37

fertile offspring, crosses between secondary CWR and the crop are more challenging but possible, and crosses between tertiary CWR and the crop require sophisticated breeding techniques such as embryo rescue or using a secondary CWR as a genetic bridge (Harlan and de Wet 1971). If sufficient crossing experiments between a CWR and crop have not been performed to assign a CWR to a genepool classification, evolutionary relationships among CWR are used to predict the genepool of the CWR

(Maxted et al. 2006).

Because there are multiple domesticated Cucurbita species, each Cucurbita

CWR may be a primary, secondary, and/or tertiary genepool CWR. Below, each CWR is grouped into the genepool rank that denotes its closest relationship to a domesticated

Cucurbita crop species or subspecies. If a CWR is also in a subsequent genepool group, this is noted.

Primary Genepool CWR

Cucurbita argyrosperma ssp. sororia (L. H. Bailey) L. Merrick & D. M. Bates

Cucurbita argyrosperma ssp. sororia is the putative wild ancestor of cushaw

(Merrick 1990), and the two subspecies form fully fertile offspring when hybridized

(Merrick 1990). Cucurbita argyrosperma ssp. sororia is also a secondary genepool

CWR of the pepo and ovifera pumpkins and squashes and butternut squash and a tertiary genepool CWR of buttercup squash and figleaf gourd (Table 2-2). Cucurbita argyrosperma ssp. sororia is locally widespread and occurs in the lowland thorn-scrub vegetation of the Pacific and, less often, the Gulf of Mexico coasts and from southern

Sonora in Mexico south to (Nee 1990) (Figure 2-2). Like most Cucurbita

CWR, C. argyrosperma ssp. sororia has a weedy growth habit and is most common along roadsides, near agricultural fields, and other disturbed areas (Merrick 1990). It

38

thrives in high-light environments and is often found growing over other plants or manmade structures (Merrick 1990).

It is possible to distinguish between C. argyrosperma ssp. sororia and cultivated cushaw by the wild subspecies' larger and more deeply lobed leaves, later flowering time (Jones 1992), and smaller fruits and seeds. The fruits of C. argyrosperma ssp. sororia are ovate and relatively small (about 8 cm in diameter), and contain seeds that are around 50-80% smaller by weight than seeds of the cultivated subspecies (Merrick

1990). Cucurbita argyrosperma ssp. sororia commonly grows near fields of cultivated cushaw in some areas in Mexico, and gene flow between the two subspecies can introduce bitterness into the fruit of the crop (Nabhan 1985; Montes-Hernandez et al.

2005). Gene flow between these two subspecies also introduces characteristics of the crop into wild populations; wild squashes are found with green and white striped rinds, thickened stems, and non-bitter flesh (Nabhan 1985).

Cucurbita argyrosperma ssp. sororia is resistant to two viruses: bean yellow mosaic virus (BYMV) and tomato ringspot virus (TmRSV) (Table 2-2). Neither of these viruses is a major threat to Cucurbita crop production, and resistance to these viruses is common in Cucurbita. Resistance to other viruses including cucumber mosaic virus and watermelon mosaic virus has not been found in C. argyrosperma ssp. sororia

(Provvidenti et al. 1978). There are accounts of wild C. argyrosperma ssp. sororia being used in rural areas of Mexico medicinally and for its edible seeds (Montes-Hernandez et al. 2005).

Cucurbita pepo L. (wild)

Cucurbita pepo ssp. fraterna (L. H. Bailey) Lira et al. and the two wild varieties of

C. pepo ssp. ovifera are primary genepool CWR of both ovifera and pepo pumpkin and

39

squash. These three taxa are also secondary genepool CWR of cushaw and butternut squash and tertiary genepool CWR of buttercup squash and figleaf gourd (Table 2-2).

Wild populations of C. pepo ssp. ovifera that occur in the are classified as C. pepo ssp. ovifera (L.) var. ozarkana D. S. Decker (Ozark gourd). Those that occur in are classified C. pepo ssp. ovifera (L.) D. S. Decker var. texana

(Scheele) Filov (Texas gourd) (Figure 2-2). Ozark gourd occurs along riverbanks and in other disturbed lowland habitats throughout the Ozark plateau and Greater Mississippi

Valley (Smith et al. 1992) and is a persistent weed in agricultural fields (Decker and

Wilson 1987), and Texas gourd occurs along riverbanks and in moist thickets in Texas

(Erwin 1938). Ozark gourd has been proposed as the wild ancestor of ovifera pumpkin and squash based on isozyme studies (Decker-Walters 1993). Modern molecular phylogenetic analyses have not found support for the separation of Ozark gourd and

Texas gourd and thus have not been able to clearly suggest either as the ancestor of ovifera pumpkins and squashes (Kates et al. 2017). However, Ozark gourd and Texas gourd are geographically distinct and can be distinguished from each other based on fruit color and germination time (Decker-Walters 1993), and the relationships of these two groups of wild C. pepo to ovifera pumpkin and squash should be studied further.

Cucurbita pepo ssp. fraterna is known from a small number of populations in the upland, seasonally dry thornscrub in northeastern Mexico (Nee 1990) (Figure 2-2).

Although C. pepo ssp. fraterna was initially considered as a possible wild ancestor of C. pepo ssp. pepo, multiple phylogenetic studies of Cucurbita do not support that hypothesis (Sanjur et al. 2002; Zheng et al. 2013; Kates et al. 2017). Phylogenetic

40

analysis indicates that C. pepo ssp. fraterna is highly diverged from the rest of wild C. pepo (Kates et al. 2017).

Few cases of disease resistance have been documented in wild C. pepo (Paris

2016). To introduce virus resistance into cultivated pepo and ovifera pumpkins and squashes, virus resistance has been introgressed from other less closely related CWR and cultivated Cucurbita subspecies (Paris 2016), but these hybridizations are challenging for breeders.

Secondary Genepool CWR

Cucurbita lundelliana L. H. Bailey

Cucurbita lundelliana is a secondary genepool CWR of both ovifera and pepo pumpkin and squash and of butternut squash, buttercup squash, and figleaf gourd, and a tertiary genepool CWR of cushaw. Cucurbita lundelliana is a mesophytic annual species that is native to southern Mexico and to parts of Central America where it occurs at low elevations in tropical deciduous forests and as a weed in agricultural fields

(Lira 2009) (Figure 2-3). Before the multiple independent domestications in Cucurbita were hypothesized, C. lundelliana was considered as a possible ancestor of all domesticated Cucurbita because it is interfertile with all the cultivated Cucurbita species other than C. pepo (Nee 1990). The ability of C. lundelliana to hybridize with many wild and domesticated Cucurbita species has led to its use as a bridge to transfer genes between species that are difficult to cross (Whitaker and Robinson 1986). Cucurbita lundelliana exhibits resistance to Squash Leaf Curl Virus, Cucumber Mosaic Virus, and

Powdery Mildew (Grubben 2004) (Table 2-2). Resistance to Squash Leaf Curl Virus has been transferred from C. lundelliana to the cultivated species C. moschata (butternut squash), but not to the North American crop subspecies. In addition to its importance as

41

a genetic bridge species and as a source of virus resistance, C. lundelliana produces fruit that are sometimes used locally as a soap substitute and as a container (Lira

2009).

Cucurbita okeechobeensis (Small) L. H. Bailey

Cucurbita okeechobeensis (Okeechobee gourd) includes two geographically disjunct subspecies. These subspecies are secondary genepool CWR of pepo and ovifera pumpkins and squash and of butternut squash, and they are tertiary genepool

CWR of cushaw, buttercup squash, and figleaf gourd. Cucurbita okeechobeensis ssp. martinezii (L. H. Bailey) T. C. Andres & Nabhan ex T. W. Walters & D. S. Decker is native to Mexico where it grows at a wide range of elevations from sea level to the mountain cloud forest and is a weed in coffee plantations and agricultural fields (Lira

2009) (Figure 2-3). Cucurbita okeechobeensis (Small) L. H. Bailey ssp. okeechobeensis occurs in only two locations in central Florida in the United States where it grows in the permanently wet soil of riverbanks or lakeshores (Figure 2-3). The Okeechobee gourd is one of only two federally-listed endangered species native to the United States that were identified as high priority for their value as genetic resources for important food crops (Khoury et al. 2013). The conservation status of the Okeechobee gourd is discussed in " Conservation status of CWR and WUS". Both subspecies of C. okeechobeensis are resistant to CMV, BYMV, the TRSV, powdery mildew (Lira 2009;

Jahn et al. 2002; Formisano et al. 2010), and bacterial leaf spot (Robinson and Decker-

Walters 1997) (Table 2-2). Resistance to powdery mildew has been introgressed from wild C. okeechobeensis to C. pepo, and the resistant offspring are commercially produced (Jahn et al. 2002; Formisano et al. 2010).

42

Cucurbita pedatifolia L.H. Bailey and C. radicans Naudin

Cucurbita pedatifolia and C. radicans are secondary genepool CWR to figleaf gourd and butternut squash, respectively, and tertiary genepool CWR to all other

Cucurbita crops. Of the six xerophytic perennial Cucurbita species, the closely related and interfertile xerophytic perennial species C. pedatifolia and C. radicans (along with tertiary genepool CWR C. foetidissima) are the most closely related to the cultivated pumpkins and squashes (Kates et al. 2017). Cucurbita pedatifolia and C. radicans occur in dry forests of Mexico from Zacatecas at the northern end of their range south to

Chiapas (Figure 2-3). Collections of C. radicans suggest that it occurs between and to the west of disjunct northern and southern distributions of C. pedatifolia (Figure 2-3), but the morphological, ecological, and genetic differences of these two species are not documented, and additional work is needed to clarify whether they are truly different species.

Because of their closer relationship to the mesophytic species and because they are sometimes referred to as semixerophytic (though it is unclear how their adaptation to aridity differs from the other xerophytic species), it has been proposed that any of C. pedatifolia or C. radicans could be used as a possible genetic bridge between the xerophytic and perennial species (Bemis and Whitaker 1969). However, crosses between these species and mesophytic species have been unsuccessful (Robinson and

Decker-Walters 1997).

The development of multiple potato-size tuberous roots by C. pedatifolia rather than the huge taproots produced by other xerophytic species is a trait that may be beneficial for the development of a drought-tolerant Cucurbita starch crop (Andres

1987) (Table 2-2). Cucurbita radicans is resistant to CMV, TRSV, BYMV, but it is

43

susceptible to WMV (Provvidenti et al. 1978) (Table 2-2). Cucurbita pedatifolia has not been screened for virus-resistance, but considering its very close relationship to C. radicans, it may harbor similar traits.

Tertiary Genepool CWR

The four xerophytic perennial Cucurbita species that are tertiary genepool CWR for all Cucurbita crops occur in the hottest and driest regions in North America: the deserts of northwestern Mexico and the southwestern United States (Bemis and

Whitaker 1969) (Figure 2-4). Previously it was thought that the xerophytic species of

Cucurbita were derived from non-xerophytic species and had subsequently evolved an adaptation to hot, dry climates. However, recent studies suggest that wild squashes tolerant to prolonged drought and extreme temperatures represent the ancestral state of pumpkins and squashes, and that domesticated pumpkins, squashes, and gourds and their non-arid-adapted CWR are derived from a drought-tolerant ancestor (Zheng et al.

2013; Kates et al. 2017).

The xerophytic Cucurbita CWR species grow at low elevations (generally less than 1300m) in coarse, dry, sandy soils. They are most common along roadsides and washes where water accumulates during rare periods of precipitation. Compared with the wild species of Cucurbita that grow in temperate to tropical grasslands and forests, the species of pumpkins and squashes that grow wild in the deserts and dry forests of

North America are conspicuous, and some are quite frequent throughout their native ranges due to the lack of competition from other plant species. The xerophytic Cucurbita

CWR can be divided into two groups, one of which is more closely related to cultivated pumpkins and squashes, though more research is needed to understand the relationships among the xerophytic Cucurbita CWR. The Spanish or English common

44

names chichicoyota, calabaza de coyote, calabacilla, and coyote melon/gourd, may refer to any wild, arid-adapted Cucurbita (and rarely to non-arid-adapted wild

Cucurbita), so here we only use common names that refer to a single species.

Cucurbita foetidissima Kunth

Among the xerophytic Cucurbita CWR, C. foetidissima (buffalo gourd) may be the most well-known due to its weedy, common occurrence, conspicuous folded leaves, and typically abundant, round fruit that is golden-yellow at maturity (DeVeaux and

Shultz 1985). Buffalo gourd is native to the deserts of the southwestern United States and northern Mexico (Bailey 1943) (Figure 2-4) and grows as a weed, producing dense groundcover on disturbed soils (DeVeaux and Schultz 1985). "Fetid gourd" is another common name for buffalo gourd and refers to the unpleasant smell of its leaves and flesh (Nabhan 1985). Like other xerophytic Cucurbita CWR, buffalo gourd reproduces primarily asexually (DeVeaux and Schultz 1985); although its vines may bear many fruits, both germination and survival of young seedlings are uncommon. Like other xerophytic perennial species, buffalo gourd has low water requirements (DeVeaux and

Schultz 1985). It is also highly resistant to many insects and diseases that threaten cultivated pumpkins and squashes (Curits 1964; Shahani et al. 1951; Paur 1952) (Table

2-2), though it is susceptible to (Rosemeyer et al. 1982). Buffalo gourd is resistant to some insect pests, including , but cucumber beetles are attracted to the high level of cucurbitacins found in its fruit, roots, and (DeVeaux and Schultz 1985) (Table 2-2).

Cucurbita palmata S. Watson, C. digitata A. Gray, C. cordata S. Watson

This group of tertiary genepool CWR includes the three wild Cucurbita species commonly known as "coyote gourd" or "coyote melon", which are the most distantly

45

related to cultivated pumpkins and squashes (Kates et al. 2017). Although there are morphological differences among these interfertile and partly sympatric species, at some point they were reclassified as three subspecies of C. digitata (Scheerens et al.

1991). It does not appear that this was ever adopted. All three species are native to the lowland deserts of the southwestern United States and western Mexico and typically occur in disturbed, gravelly soils (Shcheerens et al. 1991). Cucurbita cordata is narrowly distributed in Baja California in Mexico (Figure 2-4). The distribution of Cucurbita palmata extends from northeastern Baja California through California into the San Joaquin Valley and the southern part of the Salinas Valley and east to near the

Colorado River (Bemis and Whitaker 1969) (Figure 2-4). The southern end of the distribution of C. digitata ranges from northern Sonora, Mexico into southern Arizona and New Mexico (Figure 2-4). Cucurbita palmata occurs between the disjunct ranges of

C. digitata in southern Arizona and northern Baja California, and interspecific hybridization is common where the two species are sympatric at the periphery of their ranges (Bemis and Whitaker 1969).

Species in this group of xerophytic perennials exhibit extreme drought tolerance and resistance to many of the viruses that infect cultivated Cucurbita (Table 2-2), but because of the distant relationships between the xerophytic perennials and the domesticated Cucurbita, introgression of these resistances into the cultivated Cucurbita has not been possible (Provvidenti 1990). Although desirable traits from these species may not be directly introgressed into cultivated pumpkins and squashes, modern studies of the genetics underlying these traits could aid in the identification of genes

46

important for conferring disease resistance and drought tolerance in the cultivated species and subspecies.

Wild Utilized Species

Nearly all Cucurbita CWR are utilized in some way by rural people who live within the native ranges of the CWR. The diverse uses for wild pumpkins and squash are based on the bitter chemicals in their flesh, nutritious seeds, and hard rinds, which are the same traits that initially attracted hunter-gatherers to wild pumpkins and squashes.

There are accounts of rural desert-dwelling people in the southwestern United States and Mexico using the undried fruit of coyote melon (C. digitata, C. palmata, C. cordata) for soap due to its saponin-rich flesh (Nabhan 1985). The dried roots of buffalo gourd

(C. foetidissima) are sold in medicinal herb markets in the southwestern United States

(Nabhan 1985). The non-xerophytic subspecies C. argyrosperma ssp. sororia is also utilized by rural Mexicans who eat the seeds and sometimes sell them in markets

(Merrick 1990). Rural farmers and their families report using the bitter flesh (which contains and cucurbitacins) to treat intestinal worms and as a biocide to purify water (Merrick 1990). Oil from cultivated Cucurbita subspecies has recently been shown to have pharmacological properties (Bardaa et al. 2016) that are likely also present and exploitable in the wild Cucurbita species and subspecies. Although there are many potential uses for Cucurbita CWR, the only wild Cucurbita species that was developed for commercial production is buffalo gourd (C. foetidissima).

Interest in domestication of buffalo gourd as a dryland oilseed crop emerged following the vegetable oil shortages during the Second World War (DeVeaux and

Shultz 1985). Buffalo gourd was considered as a potential oilseed crop as early as

1946, and in the decades that followed, it was the subject of several studies (e.g. Bolley

47

et al. 1950; Shahani et al. 1951) and preliminary cultivation and domestication efforts

(Paur 1952; Curtis and Rebeiz 1974; Havener 1974). During the late 1970s and early

1980s, scientists worked to rapidly domesticate the wild species as a dryland oilseed and starch crop (Gatham and Bemis 1990). Researchers determined that the buffalo gourd required 150 mm of water annually if grown for its root and 250 mm annually if grown for seed (DeVeaux and Shultz 1985) and that it could yield up to 3,000 kg/ha of seed (Bemis et al. 1978). The oil of seeds produced by buffalo gourd is similar to sunflower oil (DeVeaux and Shultz 1985), and the oil yield was predicted to be up to two times that of sunflower at 91 gal/acre (DeVeaux and Shultz 1985). Root starch from cultivated buffalo gourd was considered as a potential source of ethanol; researchers estimated alcohol yield from buffalo gourd root starch was superior to corn or grain sorghum at around 400 gal/a (DeVeax and Shultz 1985).

Despite the promise of buffalo gourd as an oilseed and ethanol crop for arid lands, interest in the development of the crop waned by 1990 (Small 2013).

Commercialization of buffalo gourd failed, apparently due to its lack of unique qualities needed for breeders and farmers to shift their current practices. However, in the decades since interest buffalo gourd declined, reduced supplies of water and arable land that will only grow scarcer are increasing the demand for drought-tolerant crops and biofuel production, and interest in buffalo gourd may be renewed.

Conservation Status of CWR and WUS

In situ

The Okeechobee gourd (C. okeechobeensis ssp. okeechobeensis) is one of only two federally endangered plants native to the United States that have been determined as high priority as genetic resources of important food crops (Khoury et al. 2013), and it

48

faces a continued threat of extinction due to development, competition from invasive species, and climate change. The Okeechobee gourd was historically more widespread in Florida, but by 1930 95% of its habitat was destroyed when pond apple (Annona glabra L.) forests were cleared in attempts to develop the Everglades and other regions of Florida for agricultural fields. The Okeechobee gourd now only occurs along the shore of Lake Okeechobee and a short stretch of riverbank along the St. John's River in central Florida. The St. John's River populations are somewhat protected as they occur on State Parks Land, but the Lake Okeechobee populations face imminent threat by development, recreation, and water management practices. Furthermore, the populations of Okeechobee gourd that do exist are not robust. Two other CWR, the native vine Vitis rotundifolia Michx. and the invasive exotic Dioscorea bulbifera L., both appear to out-compete the Okeechobee gourd along the St. John's River. Some plants surveyed in 2015 by the author did not reemerge in 2016, and though the vines of the

Okeechobee gourd plants are extensive, they produced very little, if any, fruit, suggesting a possible lack of suitable or limitation by other environmental factors.

There are currently no active in-situ Okeechobee gourd conservation projects, though multiple grant proposals have been submitted to various public and private organizations (Minnow pers. comm.). Public awareness of this nearly extinct CWR is also lacking. Displays throughout the state parks where it occurs warn visitors of the threat of invasive plant and animal species and provide information about the parks' resident federally-listed mammal species, the Florida manatee, but do not mention the

Okeechobee gourd. Local landowners and park rangers are also currently unaware that

49

a federally listed endangered plant occurs in this area. Local accounts that the vine is regarded by some as a weed suggests some populations may be at risk of removal.

Although the Okeechobee gourd is the only listed endangered Cucurbita CWR, many Cucurbita CWR are geographically restricted. The current distributions of

Cucurbita CWR are likely more restricted than they were in the past (Kistler et al. 2015).

Disjunct species distributions in C. okeechobeensis, C. digitata, C. pedatifolia, and C. pepo and low genetic divergence between geographically distant occurrences of some species (Kates et al. 2017) suggest that the areas where Cucurbita CWR occur today represent only a fraction of the areas where they were historically distributed. The narrow present-day distribution of Cucurbita CWR may be due to ecological shifts and the extinctions of megafauna that consumed the bitter Cucurbita fruit and dispersed

Cucurbita seeds (Kistler et al. 2015). Although this ancient shift cannot be reversed, the hypothesis that the extinction of large herbivores led to drastic decline in Cucurbita

CWR populations highlights the importance of dispersal in maintaining the genetic diversity of extant CWR. Commonly, Cucurbita CWR are regarded as an agricultural nuisance in Mexico and Central America (Nabhan 1985) and appear regionally vigorous, so in situ conservation efforts are not pursued.

The seeds of Cucurbita CWR are now most commonly dispersed by water

(Nabhan 1985). After the fruits of Cucurbita CWR mature, the flesh inside dries out and the seeds are preserved inside the lignified rind (Nabhan 1985). These dried fruits may stay in the same place for months, but will eventually be carried by flashfloods (Nabhan

1985). As the buoyant Cucurbita CWR fruits are carried by fast-moving water, they hit against rocks and banks and their dry rinds break open allowing the seeds to be

50

scarified and then scattered in a new area. Because of the importance of seasonal flooding for the dispersal of Cucurbita CWR, unusually prolonged periods of drought in the deserts of the southwestern United States and Mexico inhibit the dispersal of

Cucurbita CWR. Dispersal by water is also important for non-xerophytic Cucurbita CWR that occur along rivers and lakes (C. okeechobeensis ssp. okeechobeensis; wild C. pepo ssp. ovifera), and lower water levels due to drought and water management practices could negatively impact the dispersal of these Cucurbita CWR.

Studies modeling the effect of future climate change on the distribution of

Cucurbita CWR show that the distributions of all Cucurbita CWR are expected to decrease substantially in the next sixty years (Lira et al. 2009). The specialized of Cucurbita CWR, spp., the , also faces threats due to climate change and the widespread use of agricultural pesticides (Watanabe 2013).

More research is needed on these pollinator species. Because of the large vine habit of

Cucurbita CWR, populations of Cucurbita CWR that grow as weeds in or near agricultural fields are removed to prevent the introduction of undesirable traits into

Cucurbita crops or to limit their competition with non-Cucurbita crops (Nabhan 1985).

Similarly, populations of Cucurbita CWR that grow on private, non-agricultural lands are also often removed because they are local weeds.

The deregulation of transgenic zucchini (C. pepo ssp. pepo var. cylindrica) in

Mexico is currently under consideration (Cruz and Reyes 2015) and has been the subject of numerous studies to examine the potential for transgenes to escape (see

Cruz and Reyes 2015; Sasu et al. 2009; Arriaga et al. 2005). Gene flow between populations of wild Cucurbita and cultivated Cucurbita is well-documented in multiple

51

species, including C. pepo (Wessel-Beaver 2000; Wilson et al. 1994; Montes-

Hernandez and Eguiarte 2002). The viability of F1 seed has been experimentally confirmed for some crosses (Cruz-Reyes et al. 2015). Experimental crosses of transgenic squash lines to wild squash have been made to assess the relative competitiveness of the hybrids compared with the parents (Cruz-Reyes et al. 2015).

Although transgenic x wild hybrids thus far do not appear to have a competitive advantage compared with wild Cucurbita, additional studies are needed (Cruz-Reyes et al. 2015; Arriaga et al. 2005). The need to preserve the genetic variation in wild

Cucurbita in Mexico is especially high, as these populations likely represent the ancestors of pepo pumpkin and squash and C. argyrosperma ssp. sororia and still harbor the genetic diversity lost in the crops during their domestication.

Ex situ

Maintenance and regeneration of genetically diverse Cucurbita CWR germplasm resources is critical for identifying the genes that underlie agronomically important traits in Cucurbita CWR. Large collections of Cucurbita CWR are held in seed banks (also referred to as "genebanks") all over the world (e.g. United States, Mexico, ,

Russia, Italy, , Colombia, Bolivia, Czech Republic, , Turkey, and )

(Clark et al. 1991; Nuez et al. 2002; Diez et al. 2002; Lebeda et al. 2007; Ferriol and

Pico 2008; Karlove 2008). Collections of Cucurbita are particularly common in seed banks in Mexico, such as the 25 community seed banks, established in 2005 to preserve in-situ conservation (Vera Sanchez et al. 2015), and in the country's largest wild plant seed bank at the Faculty of Higher Studies of Iztacala, UNAM (FESI-UNAM seed bank) (Rodríguez-Arévalo et al. 2016). However, a comprehensive understanding of genebank coverage and gaps for Cucurbita CWR in Mexico and other countries is

52

lacking in part due to insufficient collaboration among seedbanks around the world that each have unique systems of cataloguing and distributing germplasm.

Collections of Cucurbita CWR from throughout their native ranges are the first step in conserving and increasing ex situ collections of Cucurbita CWR. In over 100 years of the USDA plant exploration program, there has been only one exploration for wild Cucurbita in the United States that has resulted in deposition of accessions into the

National Plant Germplasm System. Collection of North American Cucurbita CWR germplasm from outside the United States for deposition into the USDA National Plant

Germplasm System is limited due to the phytosanitary and political issues described below. American researchers and botanists made many collections of Cucurbita CWR in Mexico in the decades prior to strict regulation; although some of these collections were deposited in the National Plant Germplasm System in the late 1980s and early

1990s and greatly increased the holdings of Cucurbita CWR from Mexico, many of these collections were never deposited (Robinson 1995), and seeds from these collections are unlikely to be viable.

Concerns regarding access and benefit sharing, phytosanitary issues, and a lack of funding limit or prevent sharing of germplasm resources among seedbanks and distributing germplasm to geneticists and breeders internationally. Cucurbita is not currently listed in Annex 1 under the Multilateral System of Access and Benefit Sharing of the International Treaty on Plant Genetic Resources for Food and Agriculture (FAO

2002), and thus does not benefit from pre-arranged facilitated access negotiations.

Distribution of germplasm outside of the country where it is held can be critical to ex situ conservation of genetic resources, because Cucurbita CWR collected in a climate that

53

differs from its storage location may not or fruit in the conditions of its seedbank location (Jarret pers. comm.). For seedbanks in the United States, this can be a problem for Cucurbita CWR that are adapted to tropical or subtropical climates.

Although this issue certainly affects the ex situ conservation of other CWR, it may be especially problematic for Cucurbita. In contrast to the most economically important crops like corn and rice, the infrastructure required to grow and pollinate the large, monoecious Cucurbita plants is not in place (Paris 2016), and investing these resources is risky when the regeneration efforts are not likely to succeed.

Limited funding for seedbank activities can restrict the regeneration efforts needed to make collections available for distribution to geneticists and breeders (Jarret pers. comm.). A high proportion of Cucurbita CWR held in the United States National

Plant Germplasm System is unavailable for distribution (Figure 2-5). Seedbanks respond to user requests. Hence, limited resources may be diverted to maintenance of collections that are more widely used by researchers. Greater demand for collections of modern cultivars (or specific crops) may limit the resources available for collections of

Cucurbita CWR, although these species likely harbor some of the most useful traits for crop enhancement (Robinson 1995). Only a few studies have attempted a thorough investigation of potentially valuable agronomic traits of Cucurbita CWR (e.g. Provvidenti et al. 1978; Scheerens et al. 1991). The xerophytic perennial species are especially understudied even though they are known to be resistant to drought and to many of the viral diseases that pose a threat to cultivated pumpkin and squash yields. Increased research interest in Cucurbita CWR is needed to promote the conservation of

54

genetically diverse CWR in genebanks and to create drought-tolerant, disease-resistant crops that can meet the present and future challenges to food security.

55

Table 2-1. Cultivated North American pumpkins and squashes Subspecies Cultivar groups Origin Current Most common cultivation uses

C. pepo ssp. Pumpkin, Mexico ~10,000 Worldwide Fruit (immature, pepo (pepo Vegetable years B.P.2 mature, canned) pumpkin and Marrow, squash) Cocozelle, Zucchini, round ornamental gourds1 C. pepo ssp. Scallop, Acorn, Eastern North Worldwide Fruit (immature, ovifera (ovifera Crookneck, America ~5,000 mature, canned) pumpkin and Straightneck, years B.P.2 squash) oviform ornamental gourds1 C. maxima Banana South America Worldwide Fruit (immature, ssp. maxima squash; ~4,000 years esp. Africa mature, canned, (buttercup Delicious B.P.2 and Asia decorative) squash) squash; Buttercup squash; Hubbard squash; Show pumpkins; Turban squash; Kabocha4 C. moschata Cheese; Unknown Worldwide Fruit (immature, (butternut Crookneck; (Mexico, Central esp. Africa mature, canned) squash) Bell5 America, or and Asia South America) >5,000 years B.P.6 C. Silver-seed Southern Limited. Seeds (snack argyrosperma gourd; green- Mexico ~7,000 Mexico, food; oil; meal); ssp. stripe cushaw; years B.P.2 U.S.A., Fruit (usually argyrosperma Calabaza Central mature) (cushaw) pipiana3 America

56

Table 2-1. Continued Subspecies Cultivar groups Origin Current Most common cultivation uses

C. ficifolia None Unknown Limited. Fruit (immature, (figleaf gourd) commercially (Mexico, Central Mexico, mature); as recognized. America, or Central rootstock Other names South America) America, include >3,000 years South Malabar melon B.P.7 America, and sharkfin China. High gourd (>1000m) altitudes. 1. (Paris et al. 2012) 2. (Smith 2006) 3. (Lira Saade and Montes Hernandez 1994) 4. (Decker-Walters and Walters 2000) 5. (Robinson and Decker-Walters 1997) 6. (Cohen 1978) 7. (Towle 1961)

57

Table 2-2. Cucurbita CWR CWR Genepool (A; B; Native range Potential agronomic traits C; D; E)

C. argyrosperma 2; 1; 2; 3; 3 Pacific coast from resistant to BYMV and ssp. sororia Sonora in Mexico TmRSV south to Nicaragua C. cordata 3; 3; 3; 3; 3 Baja California drought-tolerance; (Mexico) resistant CMV, TRSV, BYMV C. digitata 3; 3; 3; 3; 3 southwestern drought-tolerance; United States and resistant to CMV, TmRSV northwestern Mexico C. foetidissima 3; 3; 3; 3; 3 southwestern drought-tolerance; United States and resistant to CMV, TRSV, northern Mexico BYMB, WMV, and squash vine borer C. okeechobeensis 2; 3; 2; 3; 3 southern Mexico resistant to CMV, BYMV, ssp. martinezii (gulf coast) TRSV, bacterial leaf spot,

powdery mildew

C. okeechobeensis 2; 3; 2; 3; 3 Florida (United resistant to CMV, BYMV, ssp. States) TRSV, bacterial leaf spot, okeechobeensis powdery mildew

C. palmata 3; 3; 3; 3; 3 southwestern drought-tolerance;

United States and resistant to CMV, TRSV,

Baja California BYMV, TmRSV

C. lundelliana 2; 3; 2; 2; 2 southern Mexico resistant to SqLCV, CMV, (Tobasco to powdery mildew; used as Yucatan) (and a genetic bridge for northern Central breeding non-interfertile America) species

58

Table 2-2. Continued CWR Genepool (A; B; Native range Potential agronomic traits C; D; E)

C. pedatifolia 3; 3; 3; 3; 2 north-central to drought-tolerance; southern Mexico disease resistance unstudied C. pepo ssp. pepo 1; 2; 2; 3; 3 Texas (United undiscovered var. texana States)

C. pepo ssp. pepo 1; 2; 2; 3; 3 central United undiscovered var. ozarkana States

C. pepo ssp. 1; 2; 2; 3; 3 northern gulf undiscovered fraterna coast of Mexico

C. radicans 3; unknown; 2; 3; north-central to drought-tolerance; unknown southern Mexico resistant to CMV, TmRSV; BYMV; production of potato-sized tubers Genepool relative to A: pepo and ovifera pumpkins and squashes; B: cushaw; C: butternut squash; D: buttercup squash; E: figleaf gourd. Diseases: Cucumber Mosaic Virus (CMV), Watermelon Mosaic Virus (WMV); Tomato Ringspot Virus (TmRSV); Bean Yellow Mosaic Virus (BYMV); Tobacco Ringspot Virus (TRSV); Squash Leaf Curl Virus (SqLCV);

59

Figure 2-1. Species richness map of modeled potential distributions of North American Cucurbita taxa. Modeling based on climatic and edaphic similarities with herbarium and genebank reference localities. Warmer colors indicate areas where greater numbers of taxa potentially occur in the same geographic localities.

60

Figure 2-2. Modeled potential distribution maps of Cucurbita primary CWR species (Cucurbita pepo ssp. fraterna, C. pepo ssp. ovifera var. ozarkana, C. pepo ssp. ovifera var. texana, C. argyrosperma ssp. Sororia), based on climatic and edaphic similarities with herbarium and genebank reference localities. Full methods for generation of maps in Figure 2-2, Figure 2-3, and Figure 2-4 is given in Greene (in press).

61

Figure 2-3. Modeled potential distribution maps of Cucurbita secondary CWR species (Cucurbita lundelliana, C. okeechobeensis ssp. okeechobeensis, C. okeechobeensis ssp. martinezii, C. radicans, C. pedatifolia), based on climatic and edaphic similarities with herbarium and genebank reference localities.

62

Figure 2-4. Modeled potential distribution maps of Cucurbita tertiary CWR species (Cucurbita foetidissima, C. palmata, C. digitata, C. cordata) based on climatic and edaphic similarities with herbarium and genebank reference localities.

63

Figure 2-5. Number of Cucurbita CWR accessions in the NPGS. Dark gray bars indicate accessions available for distribution; light gray bars indicate accessions currently unavailable.

64

CHAPTER 3 EVOLUTIONARY AND DOMESTICATION HISTORY OF Cucurbita (PUMPKIN AND SQUASH) SPECIES INFERRED FROM 44 NUCLEAR LOCI1

Introduction

The domestication of Cucurbita (squashes and pumpkins) is among the earliest on record, with archaeological evidence indicating that Cucurbita species were used by humans as early as ca. 10,000 B.P. (Smith, 2006). With an incredibly rich and long history of domestication and cultivation, Cucurbita crops are unrivalled in fruit morphological diversity and their wide range of adaptation to cultivation. Among all plants cultivated for edible fruit, squashes and pumpkins have the most dramatic increase in fruit size compared with their wild relatives (Savage et al., 2015).

Each of the six cultivated species of squashes and pumpkins exhibits the domestication traits noted above; different genetic changes could underlie their convergent morphologies in that each crop appears to have originated from a different wild species. Furthermore, the cultivated taxa exhibit different adaptations to cultivation

(Nee, 1990). For example, C. ficifolia (figleaf gourd) is adapted to cool temperatures and short days whereas C. moschata (butternut squash, Seminole pumpkin, canned pumpkin, and others) grows best in the humid tropics. Cucurbita pepo ssp. pepo (jack- o-lantern pumpkin, zucchini, summer squash, , and others) is adapted to the widest range of habitats and is the most widely cultivated taxon (Nee, 1990). The multiple domestication events from diverse wild ancestors in Cucurbita serve as

Reprinted with permission from Kates, H.R., P.S. Soltis, and D.E. Soltis. 2017. Evolutionary and domestication history of Cucurbita (pumpkin and squash) species inferred from 44 nuclear loci. Molecular and evolution 111: 98–109.

65

replicates of domestication and therefore make Cucurbita an especially valuable system in which to study crop evolution.

Cucurbita (2n = 40) comprises 14 species, with six subspecies and two wild varieties (Nesom, 2011) (Table 3-1). The wild taxa occupy diverse elevational and latitudinal habitats ranging from the Midwestern United States to southern Argentina, with a center of diversity in Mexico (Figure 3-1). Five of the wild Cucurbita species are xerophytic perennials that occur in Mexico and the southwestern United States; seven wild species are mesophytic annuals that occur in Mexico, the southeastern and southcentral United States, and Central and South America (Nee, 1990). The mesophytic annuals are apparently the result of a very recent radiation (~7 mya) from a common ancestor in Central or South America (Schaefer et al., 2009).

The six Cucurbita crop lineages are hypothesized to be of independent origin from different wild ancestors in different geographic regions (Nee, 1990) (Figure 3-1).

Whereas the origins of many similarly valuable crops (e.g. common bean: Gepts, 1996; kiwi: Atkinson et al., 1997; cassava: Olsen et al., 1999; pea: Jing et al., 2011; eggplant:

Meyer et al., 2012) have been identified, the ancestors of four of the six Cucurbita crop lineages, including the two most economically important, C. pepo ssp. pepo and

C. moschata, are still unknown, despite repeated efforts to identify them (e.g. Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015).

Phylogenetic studies of crops and their wild relatives are a necessary crucial first step in investigating the genetic basis of domestication, as interpretation of any phenotype in an evolutionary context requires a phylogeny (Kellogg & Birchler, 1993;

Olsen & Purugganan, 2002; Caicedo & Schaal, 2004; Collins et al., 2005). Furthermore,

66

resolving sister relationships between crops and their wild relatives (e.g. Hoffman et al.,

1986; Mayer & Soltis, 1994) is key to identifying the ancestors of cultivated plants.

Moreover, studies that include sufficiently broad geographic sampling can point to geographic regions where close relationships between wild populations and the domesticate suggest a center of domestication (e.g. Smykal et al., 2010; Huang et al.,

2012; Meyer et al., 2012).

Studies of evolutionary relationships within domesticated groups have contributed greatly to our understanding of the history and origin of cultivated plants. For example, researchers argued for decades about whether the distinctive architecture of could be the product of a single vs. multiple origins (e.g. Wilkes,

1977; Doebley & Iltis, 1980; Doebley, 1990). However, a genetic study of relationships within Zea (Matsuoka, 2002) settled this debate by showing that cultivated maize is most closely related to teosinte from , Mexico, with all other wild Zea species outside of this clade.

Much of what is known about the origins of domestication in Cucurbita comes from a combination of archaeological findings (e.g. Piperno & Pearsall, 1998; Smith,

2006), studies of the genetic diversity within individual lineages in the genus (e.g.

Decker-Walters et al., 2002; Ferriol et al., 2003; Paris et al., 2003; Andres, 2006; Heikal et al., 2008, Gong et al., 2012), and extensive observations of wild and cultivated

Cucurbita taxa in their native ranges throughout North, Central, and South America (e.g.

Nee, 1990). Some longstanding ideas about squash and pumpkin domestication therefore represent hypotheses that have not been supported with genetic data. Sanjur et al., (2002) hypothesized that the ancestor of C. pepo ssp. pepo is a close relative of

67

the wild C. pepo ssp. fraterna, but no phylogeny has resolved the C. pepo clade to support this suggestion (e.g. Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015).

Nee (1990) proposed northern Colombia as the likely site of origin of domestication for

C. moschata, but no published phylogeny has included sufficient samples of this taxon to find a geographic center of phylogenetic diversity that allows for further investigation of this hypothesis with additional phylogeographic analyses.

Molecular phylogenetic studies of Cucurbita support the proposed ancestry of two of the six crops: C. maxima ssp. maxima (buttercup squash, giant pumpkin,

Hubbard squash, kabocha squash) and C. argyrosperma ssp. argyrosperma (cushaw pumpkin, Japanese pie pumpkin) (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al.,

2015). However, the crop and breeding communities continue to rely on putative relationships such as those described above, and phylogenetic studies have thus far been unable to offer alternative hypotheses for the wild ancestors of four of the crop lineages (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015). Although it is possible that the wild ancestors of crops with unclear ancestry are extinct, previous studies have also been limited by sparse taxonomic, geographic, and genetic sampling.

Furthermore, the very recent and rapid diversification of the wild species and potential hybridization among Cucurbita species (Bemis & Nelson, 1963) may impede phylogenetic inference using slow-evolving and maternally inherited chloroplast DNA sequence data.

Here we improve upon previous phylogenetic knowledge of Cucurbita through increased taxonomic, geographic, and genetic sampling. We present a phylogenetic reconstruction of all cultivated squashes, pumpkins, and gourds and 13 of the 14 crop

68

wild relatives based on multiple, unlinked nuclear loci that provides the first support for close relationships between wild species and two of the six crop taxa. Through our study, we hope to enhance Cucurbita as a study system for investigating adaptive response to selection under domestication.

Materials and Methods

Plant Material and DNA Extraction

When species divergences are recent (resulting in incomplete lineage sorting), sampling more individuals per species leads to the reconstruction of a more accurate species tree (Maddison & Knowles, 2006). Our analysis is the first reconstruction of

Cucurbita phylogeny to include multiple individuals of 19 taxa, with 92 individuals from

19 of the 20 species, subspecies, and wild varieties of Cucurbita (all taxon names following Nesom, 2011). However, we were unable obtain material from C. radicans, a xerophytic species narrowly-distributed in western Mexico. Two outgroup species,

Citrullus lanatus and Cucumis sativus, were selected based on phylogenetic results for

Cucurbitaceae (Schaefer et al., 2011) (Table A-1). Leaf samples for DNA extraction were obtained from seedlings grown at the University of Florida, Gainesville, FL, from seeds obtained from the USDA or from herbarium specimens (Table A-1). DNA was extracted from fresh or dried leaf tissue using a modified 2X CTAB method (Doyle &

Doyle, 1987) yielding ~50-120 ng/µl of DNA per sample. Modifications to the Doyle &

Doyle (1987) protocol include a second addition of chloroform-isoamyl alcohol to the aqueous phase immediately following the first addition, and precipitation of DNA at -

20ºC instead of room temperature. We used a precipitation time of thirty-minutes for fresh tissue and overnight for dried tissue.

69

Selection of Nuclear Loci

To improve the resolution of relationships within and among closely related taxa

(following e.g. Parks et al., 2009; Steane et al., 2011), we sampled loci that are unlinked and more variable than those used in previous analyses of Cucurbita phylogeny (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015). We identified 85 putatively single- copy, highly variable nuclear loci from a set of four reference genomes and transcriptomes (watermelon (Citrullus lanatus), cucumber (Cucumis sativus), melon

(Cucumis melon), and pumpkin (Cucurbita pepo ssp. pepo; icugi.org), hereafter

“cucurbit references”) using a custom pipeline (Figure C-1). Our pipeline selected intron sequences (400-700 bp) flanked by conserved exons with no highly similar sequence(s) likely to correspond to a paralog present in the same genome or transcriptome.

Sequence similarity among putative genes in the four references was determined by

BLAST with an e-value cutoff of 1.0e-5.

Microfluidic PCR Amplification and Illumina Sequencing

To amplify the 85 potentially phylogenetically informative loci, we used primer3

(Untergasser et al., 2012) in Geneious ver. 7.1.9 (http://www.geneious.com, Kearse et al., 2012) to design primers with uniform 60°C annealing temperatures and no homopolymers of length greater than three, as recommended for the Fluidigm Access

Array™ System (Fluidigm, San Francisco, CA, USA). We used alignments of the four cucurbit references to identify exon regions conserved across all cucurbit references and designed primers from C. pepo ssp. pepo exon sequences in these regions to amplify intron sequences (Table B-1).

In addition to the target-specific primer sequences identified, two conserved sequences (CS1 and CS2, Access Array™ System for Illumina Sequencing Systems

70

User Guide) were added to the 5’ end of both the forward and reverse primers to provide an annealing site for a second pair of custom-designed primers composed of the complementary CS sequence, Illumina adapters, and a sample-specific forward and reverse barcode to allow for all samples to be multiplexed in a fraction of a sequencing run.

Each primer set was validated prior to microfluidic PCR amplification to ensure success under the specific conditions on the Fluidigm Access Array™ (Uribe-Convers et al., 2016). We performed primer validation reactions on one of the species used for primer design, C. pepo, including both fresh and herbarium-derived samples, and on C. foetidissima, which is thought to be among the most phylogenetically distant species from C. pepo, based on previous phylogenetic studies (Zheng et al., 2013). We performed primer validation for each primer set in 10-µL reactions following the Access

Array™ System for Illumina Sequencing Systems User Guide. Reactions were as follows: 1 µL of 10X FastStart High Fidelity Reaction Buffer without MgCl2 (Roche

Diagnostic Corp., Indianapolis, , USA), 1.8 µL of 25 mM MgCl2 (Roche), 0.5 µL

DMSO (Roche), 0.2 µL 10mM PCR Grade Nucleotide Mix (Roche), 0.1 µL of 5 U/µL

FastStart High Fidelity Enzyme Blend (Roche), 0.5 µL of 20X Access Array™ Loading

Reagent (Fluidigm), 2 µL of 2 µM barcoded primers, 2 µL of 50 nM target-specific primers, 0.5 µL of 25-50 ng/µL genomic DNA, and 1.4 µL of PCR Certified Water.

Resulting amplicons from successful primer validation reactions were analyzed using a

Bioanalyzer High-Sensitivity Chip (Agilent Technologies, Santa Clara, California, USA), and the 48 primer pairs that produced a single amplicon of the expected size and that had no primer dimers were selected (Table B-2) for microfluidic PCR.

71

Access Array™ System (Fluidigm) Microfluidic PCR, Illumina MiSeq sequencing, and quality-control steps were performed at the University of Idaho Institute for

Bioinformatics and Evolutionary Studies Genomics Resources Core Facility. Microfluidic

PCR was performed in an Access Array™ System (Fluidigm) using two 48.48 Access

Array™ Integrated Fluidic Circuits (Fluidigm) following the manufacturer’s protocols. On this array, 48 samples were simultaneously amplified using the 48 unique primer pairs described above. The resulting ~2,304 unique PCR amplicons were harvested and pooled by individual in equal volumes and sequenced on 1/10th of a run of an Illumina

MiSeq using 300-bp paired-end reads.

Assembly of Sequencing Reads into Targeted Nuclear Loci

Demultiplexing and assembly of Illumina sequencing reads followed Uribe-

Convers et al. (2016). Briefly, Illumina sequencing reads were demultiplexed with the python preprocessing application dbcAmplicons

(https://github.com/msettles/dbcAmplicons) by individual using the individual-specific dual barcode combinations and by locus using the target-specific primers. To address heterozygosity of nuclear loci, we performed two assemblies of Illumina reads into representative sequences for each of the 48 amplicons using the reduce_amplicons R script (https://github.com/msettles/dbcAmplicons/blob/master/scripts/R/ reduce_amplicons.R) described in Uribe-Convers et al. (2016). In cases where the target locus was longer than the paired-end read length, the two reads were concatenated into a single sequence prior to assembly with reduce_amplicons. One assembly used the “-p ambiguity” option to generate consensus sequences with IUPAC ambiguity codes, and the second assembly used the “-p occurrence” option to generate allelic occurrences for every individual at every locus. Briefly, “-p ambiguity” calls an

72

ambiguity code for any site represented by more than one base when each variant is present in at least 5 reads and 5% of the total of reads. The “-p occurrence” option assembles reads into two alleles at heterozygous loci by reducing reads to counts of identical reads if the count represents at least 5% of the total of reads and contains at least 5 reads. This method of assembly recovers true alleles if the following assumptions are met: sequencing errors are mostly random, and reads containing errors are present at much lower frequencies than real biological variation (Uribe-

Convers et al., 2016). These two methods of assembly resulted in two data matrices for downstream analysis: 1) consensus sequences with IUPAC ambiguity codes

(“ambiguity data set”), and 2) consensus sequences with two alleles per individual for any individual heterozygous at a locus (“allelic data set”) such that each heterozygous individual was represented by two tips in gene trees.

Sequence Alignment and Phylogenetic Analyses

Each locus was aligned with MUSCLE v 3.8.31 (Edgar, 2004) using default settings; alignments were visually inspected and edited in Geneious ver. 7.1.9

(http://www.geneious.com, Kearse et al., 2012). Alignments with data for fewer than 20 individuals were not included in the analyses, resulting in 42 alignments for both the ambiguity and allelic data sets. Two alignments were inferred to include paralogous loci based on the occurrence of more than two alleles per individual and preliminary maximum likelihood (ML) phylogeny reconstruction of each locus using RAxML v 8.2.3

(Stamatakis, 2014) and 1,000 bootstrap replicates under the GTRCAT model. Paralogy was inferred using gene phylogenies reconstructed as described above -- if two major clades were resolved that each contained a set of sequences that represented all or most sampled taxa then this was considered to represent paralogy. The paralogous loci

73

were separated into four putative loci based on the major clades described above, resulting in data sets of 44 loci with a total aligned length of 20.3 kb. Observed heterozygosity was calculated as the proportion of individuals with two alleles at a locus, averaged across 44 loci.

To compare alternative strategies for phylogenetic analyses of a multi-locus nuclear data set, we performed a concatenated analysis using ML (“CA-ML”) of the ambiguity data set and species tree estimation using the gene trees inferred from the ambiguity and allelic data sets. Prior to concatenation and species tree estimation, we conducted ML analysis of each locus using RAxML as described above for both the ambiguity and allelic alignments. We assessed taxon-level congruence of the resulting gene trees (with nodes with less than 50% bootstrap support (“BS”) collapsed) by visual inspection of each topology, rather than standard congruence tests, to avoid erroneous detection of incongruence due to missing data (Leigh et al., 2011).

Two likely sources of observed gene tree incongruence in our dataset are error in gene trees reconstructed from short loci and incomplete lineage sorting. We proceeded with CA-ML analysis despite gene tree incongruence because concatenation methods are robust to short loci but cannot account for incomplete lineage sorting. Thus comparison of CA-ML and species tree estimation results may aid in distinguishing incongruence due to gene tree error from that due to incomplete lineage sorting.

Because there is no straightforward way to concatenate multiple alleles across alignments in the absence of a chromosomal linkage map (e.g. Edwards et al., 2008), we did not analyze the allelic data set using the concatenated approach. ML analysis of the concatenated ambiguity data set was conducted using the MPI/Pthreads

74

version of RAxML v 8.2.3 (Stamatakis, 2014) with 1,000 bootstrap replicates under the

GTRCAT model. The GTRCAT model was selected using JModelTest2 (Darriba et al.,

2012; Guindon and Gascuel 2003) and the CAT approximation of rate heterogeneity was used to accelerate computation (Stamatakis 2006).

We used a coalescent-based species tree approach as implemented in ASTRAL v 4.7.4 (Mirarab & Warnow, 2015) to explore the effects of incongruence among gene trees on the species topology. ASTRAL is statistically consistent under the multispecies coalescent model and estimates a species tree given a set of unrooted gene trees

(Mirarab & Warnow, 2015). We present two results of ASTRAL species tree estimation: one from the allelic data set ("Sp-TR-alleles") and one from the ambiguity data set ("Sp-

TR-amb"). Each species tree was estimated from a set of 44 ML Majority-Rule-

Consensus gene trees to reduce the effect of erroneous signals that may be present in poorly supported branches of best ML trees. For the allelic data set, we used a multi- individual mapping file (ASTRAL -a option) to map alleles to individuals in the case of heterozygotes as there is currently no feasible way to estimate a species tree from gene trees with multiple alleles per individual without mapping alleles to individuals. We performed multi-locus bootstrapping with 1,000 replicates for both trees.

Ancestral State Reconstruction

We examined the evolution of mesophytic and xerophytic habits using a topology we manually constructed to represent species-level relationships all of which were resolved congruently in both the CA-ML and species trees. We used two outgroup taxa,

Peponopsis adhaerens and Polyclathra cucumerina, placed based on the phylogeny of the entire Cucurbitaceae (Schaefer et al. 2009) to polarize the reconstruction. To create a bifurcating tree for ancestral character state reconstruction, we collapsed

75

xerophytic species into two terminals ( 1 and Xerophytes 2) that represent the two xerophytic clades of non-monophyletic species resolved in the species trees and CA-ML tree. Branch lengths for this tree were optimized in RAxML v 8.2.3

(Stamatakis, 2014) using an alignment of GenBank sequence data of four chloroplast loci with one representative for each terminal.

Habit (xerophytic or mesophytic) was scored from the literature (Schaefer &

Renner, 2011). We conducted ML reconstructions using the ACE (ancestral character estimations) function of the ape R package (Paradis et al.,, 2004; R Development Core

Team, 2010). We tested two alternative models, ER (equal rates; assuming equal transition rates between the character states) and ARD (all-rates-different; transitions between the character states allowed to vary freely). The ARD model had a higher log likelihood value of -4.54 compared with -4.61. However, a chi square test of the difference in log likelihood between the two models yielded a p-value of 0.7. We therefore did not accept the more heavily parameterized ARD model, and instead chose the single-parameter ER model.

Results

Sequence Data

Success of microfluidic PCR amplification was correlated with phylogenetic distance of a taxon to Cucurbita pepo ssp. pepo, from which the PCR primers were designed (Figure C-2). We did not observe a correlation between primer length and amplification success (Figure C-2), and although we did observe a lower proportion of amplification success for samples derived from herbarium specimens (Figure C-2), our sample set included four herbarium samples and ninety-four fresh leaf samples, so this comparison may be biased. The variable amplification success across taxa resulted in

76

an initial data matrix with 42% missing data and 81% taxon coverage density (Figure 3-

2). However, substantially more missing data have been shown to be non-problematic in simulation and empirical studies (e.g. Driskell et al., 2004; Philippe et al., 2004; Wiens

& Moen, 2008; Wiens et al., 2010). In fact, including taxa with highly incomplete, non- random character sampling in phylogenetic analyses has been shown to dramatically improve results in some cases (Wiens 2006). Furthermore, some species trees methods, including ASTRAL, are robust to issues that arise because of non-random missing data when concatenation is used (Zhengxiang Xi et al. 2015). Nonetheless, to be cautious, we removed samples that occupied less than four gene alignments yielding a final data matrix of 94 samples with only 24% missing data. The numbers of aligned bases, including gaps, for 44 loci totaled 20.3 kb; 19.9 kb nucleotide positions were variable. Heterozygosity varied among species between ~7% in C. lundelliana and C. palmata to ~43% in C. foetidissima, and observed heterozygosity was 22% overall.

Phylogeny of Cucurbita Based on CA-ML and Species Tree Methods

The phylogeny of Cucurbita species provides strong support (BS 100%) for the monophyly of the mesophytic species, including the domesticated figleaf gourd, C. ficifolia. The xerophytic species form a grade with C. digitata + C. cordata sister to the rest of Cucurbita, and a clade formed by accessions of C. foetidissima and C. pedatifolia (neither of which is monophyletic) sister to all mesophytic species (Figure 3-

3). An accession of the xerophytic C. palmata is a member of each of these xerophytic clades.

The clade of mesophytic species includes the six Cucurbita crop lineages and their close relatives (Figure 3-3). We recovered C. pepo (domesticated pumpkin, domesticated ornamental gourd, and their close wild relatives) sister to C.

77

okeechobeensis + C. lundelliana. Together these three species are sister to the C. moschata (butternut squash) + C. argyrosperma (domesticated cushaw pumpkin and its close wild relative) clade. We resolved the species pair of C. maxima (buttercup squash and its close wild relative) + C. ecuadorensis as sister to the rest of the mesophytic species.

For considerations of within-species resolution we will refer to the CA-ML tree, although we note that the concatenation method used can increase phylogenetic resolution despite incongruent within-species relationships among gene trees (cite). We define percent resolution within a clade as the number of nodes with > 50% BS divided by the number of possible bifurcations (two less than the number of tips). Cucurbita maxima is 42% resolved in the CA-ML tree (Figure 3-3) and is the only species in which the cultivated and wild accessions form separate clades, with 100% and 92% BS, respectively. We did not observe a geographic association within the clade of cultivated accessions of C. maxima ssp. maxima.

Cucurbita argyrosperma is 31% resolved in the CA-ML tree (Figure 3-3). In contrast to the pattern in C. maxima, the wild and cultivated accessions of C. argyrosperma do not form separate clades. Three of the five subclades recovered (BS

> 50%) include both wild and cultivated accessions. We included wild and cultivated accessions of C. argyrosperma from throughout both the native range and the range of cultivation in northwestern Mexico, eastern Mexico, southwestern Mexico, western

Mexico, Central America, and the United States and found no geographic pattern associated with phylogenetic resolution within C. argyrosperma.

78

The C. moschata clade is the best-resolved species clade in our phylogeny (71% resolved in the CA-ML tree) (Figure 3-3). We included accessions of C. moschata from throughout its range of cultivation in Mexico, Central America, South America, and the

United States and found no clear geographic pattern to the relationships.

Cucurbita pepo is 48% resolved in the CA-ML tree (Figure 3-3). We did not find internal support (i.e., BS > 50%) for the monophyly of either of the two wild varieties of

C. pepo ssp. ovifera, and the resolution among accessions of C. pepo ssp. ovifera was not correlated with geographic origin or taxonomy. The CA-ML tree resolves accessions of cultivated C. pepo ssp. ovifera in a clade with the three wild C. pepo taxa. The most widely cultivated pumpkin and squash, C. pepo ssp. pepo, forms a clade (BS = 91%) sister to the rest of wild and cultivated C. pepo and thus is not sister to any wild accessions. Cucurbita pepo ssp. fraterna (wild) forms a clade (BS = 100%) within the C. pepo minus C. pepo ssp. pepo (crop) clade.

Our phylogeny does not resolve accessions of C. okeechobeensis as a clade, and accessions of C. lundelliana form a clade in the Sp-TR-amb tree (BS = 89%), but not in the CA-ML tree. There is lower support for relationships within the C. okeechobeensis + C. lundelliana clade in the CA-ML tree (Figure 3-3) than in the Sp-

TR-amb (Figure D-1).

Ancestral Character State Reconstruction

The maximum likelihood reconstruction under the ER model showed that the xerophytic habit is the more likely character state for the ancestor of Cucurbita

(ancestral state xerophytic: likelihood value = 59%, ancestral state mesophytic = 41%).

79

Differences in Topology and Support among the Gene Trees, CA-ML Tree, and Species Trees

The branching order of species we recovered here is congruent between both species trees and the CA-ML tree (Figure 3-5). Bootstrap support for between and among species relationships is similar between the CA-ML tree and Sp-TR-amb tree

(Figure 3-5) and lower for some nodes in the Sp-TR-alleles tree, with two exceptions: 1)

C. pepo sister to C. lundelliana + C. okeechobeensis has 100% BS in the CA-ML tree compared with 72% in the Sp-TR-amb tree and 82% in the Sp-TR-alleles tree, and 2) the placement of the second-branching clade of xerophytic species sister to all mesophytic species has 75% BS in the CA-ML tree compared with 100% in the Sp-TR- amb tree and 97% in the Sp-TR-alleles tree. The CA-ML tree has greatest within- species resolution and the Sp-TR-alleles tree has the least within-species resolution.

For example, C. pepo is 48% resolved in the CA-ML tree (Table 3-1), 36% resolved in the Sp-TR-amb tree, and 16% resolved in the Sp-TR-alleles tree. All species exhibit within-species incongruence to varying degrees between the CA-ML tree and the Sp-

TR-amb tree (Figure 3-5), and the low within-species resolution in the species trees is likely due in part to the use of majority-rule consensus gene trees to estimate these trees.

All gene trees (reconstructed from both ambiguity and allele sequences) had lower resolution than the CA-ML or species trees. The 44 ambiguity gene trees had an average of 80 tips and 29 bifurcations when branch lengths < 1e-5 were collapsed to reduce forced bifurcations. Out of the 44 ambiguity gene trees, six gene trees do not resolve most species as clades (with BS > 50%) and thus do not support any species- level topology, 15 gene trees resolve a species-level topology congruent with the CA-

80

ML tree and the species trees, and the remaining 23 gene trees support (BS > 50%) alternative topologies (Supplementary Figs. 5-48). Gene tree incongruence with the CA-

ML and species tree topologies is due to the variable position of C. ficifolia in ten of the gene trees and C. pepo in eight gene trees. The cause of incongruence in the remaining five gene trees varied. The 44 allele gene trees (Supplementary Figs. 49-92) were generally less resolved than the ambiguity gene trees, but showed a similar pattern of incongruence.

Discussion

New Cucurbita Species Topology

The species-level topology presented here (Figure 3-3) includes two major clades of mesophytic species: five-species clade from North and Central America and a two-species clade from South America. The branching order within the five-species clade differs from that in previous studies (Figure 3-4b,c) (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015) and Zheng et al. (2013) did not recover these two clades

(Figure 3-4d). Only two relationships among the seven species in the mesophytic clades are consistent across all molecular phylogenetic studies conducted to date: C. maxima as sister to C. ecuadorensis, and C. lundelliana/C. okeechobeensis, The lack of reciprocal monophyly of C. lundelliana and C. okeechobeensis in our trees may be due to hybridization between the cross-compatible Mexican taxa, C. okeechobeensis ssp. martinezii and C. lundelliana (Whitaker & Bemis, 1964). We also note however that C. okeechobeensis had the highest within-species proportion of missing data (65%), and

C. lundelliana had among the lowest proportion of within-species variable sites (22%) and the characteristics of our data for these taxa could affect this result. The position of

C. pepo (including the most widely cultivated Cucurbita crops) differs in each study

81

conducted to date (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015) and varies among our gene trees (Supplementary Figs. 5-92). Other studies resolved C. ficifolia, a mesophytic crop species that may live longer than a year without enlarged storage roots (Nee, 1990), as part of the grade of xerophytic species (Sanjur et al.,

2002; Zheng et al., 2013; Kistler et al., 2015), in contrast to our study's placement of C. ficifolia as sister to the rest of the mesophytic species (Figure 3-3, Figure D-1, Figure D-

2).

The xerophytic species have not been evenly sampled in previous phylogenetic studies of Cucurbita, so we cannot directly compare the relationships among these species found here with previous analyses. In our trees, three morphologically and geographically distinct xerophytic species (Bemis & Whitaker, 1969) are each reconstructed as non-monophyletic (C. foetidissima: paraphyletic, C. pedatifolia and C. palmata: polyphyletic). Although our sampling of these species is limited (Table 3-1), our results suggest the possibility of hybridization among xerophytic species. The three non-monophyletic xerophytic species are interfertile (C. pedatifolia and C. foetidissima,

C. foetidissima and C. palmata), and where these species are sympatric they are known to hybridize (Bemis & Whitaker 1969). The samples included here were not from the same geographic region (Table A-1), but past hybridization is a possibility as Cucurbita fruits can disperse long distances (Schaefer et al., 2009) and historical distributions of wild Cucurbita species were likely broader than they are currently (Kistler et al., 2015).

Non-monophyly of xerophytic species could also be due to lower variation in phylogenetic loci for xerophytic species compared with mesophytic annual Cucurbita species. Xerophytic Cucurbita are perennial species (Bemis & Whitaker, 1969) and thus

82

may have a slower rate of molecular evolution than the annual species included in this study (Andreasen & Baldwin, 2001).

Phylogenetic studies of Cucurbita, including our results, have been based on

DNA sequence data from one of three plant genomes (Sanjur et al., 2002: mitochondria;

Zheng et al., 2013 and Kistler et al., 2015: chloroplast; our study: nuclear). A lack of congruence among phylogenetic trees across these three genomes may be due to reticulate evolution and/or a lack of phylogenetically informative characters in one or more of the trees (Knowles et al., 2011). Our phylogeny and that using whole plastomes

(Kistler et al., 2015) were reconstructed from many variable DNA characters, but despite its large number of characters, the plastome is inherited as a linked unit and therefore must be regarded as a single locus (Doyle et al., 1990; Sun et al., 2016).

Incongruence between our nuclear and the plastome phylogeny and among our gene trees is unsurprising as incongruence among gene trees is expected in closely related species (e.g. Maddison, 1997; Degnan & Rosenberg, 2009; Knowles, 2009), particularly groups suspected of hybridization and/or incomplete lineage sorting

(Maureira-Butler et al., 2008; Gerard et al., 2011; Blanco-Pastor et al., 2012). Some gene trees reconstructed in this study (Supplementary Figs. 5-92) resolve topologies congruent with Kistler et al. (2015) (four gene trees), Zheng et al. (2013) (four gene trees), and Sanjur et al. (2002) (four gene trees) in part due to the variable placement of

C. pepo and of C. ficifolia. The variable position of these species among gene trees suggests possible ancient hybridization or incomplete lineage sorting. Although we do not see clear evidence of widespread hybridization in the incongruence among alternative topologies or gene trees, the possibility of hybridization and hybrid origins in

83

Cucurbita should be further studied with methods that model reticulate evolution

(Shizhong Xu, 2000; Linder and Rieseberg 2004).

Species tree estimation is expected to be consistent under incomplete lineage sorting (Mirarab & Warnow, 2015), therefore the higher BS value for the stem node of the mesophytic clade in the species trees (100%) than in the CA-ML tree (75%) (Figure

3-5) supports incomplete lineage sorting as a probable cause of the incongruent position of the first-branching mesophytic species C. ficifolia among gene trees. In contrast, the lower BS value for the stem node of C. pepo in the Sp-TR-amb (73%) than in the CA-ML tree (100%) (Figure 3-5) suggests the variable position of C. pepo may be due to weak phylogenetic signal associated with the low number of variable sites per gene (Patel et al., 2013; Smith et al., 2015) resulting in incongruence among gene trees due to error.

Whereas coalescent methods for species tree estimation assume that all gene trees used for the species tree estimation are correct and that any incongruence among the gene trees is due to incomplete lineage sorting, concatenation of loci overcomes the problem of short genes that may yield high-error gene trees by combining them into one longer locus with many variable sites (e.g. Gadagkar et al., 2005). This increase in variable sites is often needed to resolve relationships among closely related species

(e.g. Edwards et al., 2008; Parks et al., 2009; Crowl et al., 2015; Uribe-Convers et al.,

2016), and the Cucurbita CA-ML tree (Figure 3-3) has high tree-wide support compared with the species trees (Figure D-1, Figure D-2).

Evolution of Mesophytic Habit in Cucurbita Species

Ancestral character-state reconstruction using the species-level topology (Figure

3-3, Figure D-1, Figure D-2) support a xerophytic ancestor of Cucurbita and a single

84

loss of the xerophytic habit during the evolution of Cucurbita species (Figure 3-6). This finding differs from the hypothesis of Zheng et al., (2013) of multiple derivations of the mesophytic habit within Cucurbita and/or a mesophytic ancestor in part due to the position of C. ficifolia in our phylogeny. Cucurbita ficifolia is a crop that lacks drought- resistance and enlarged storage roots but has been resolved within the xerophytic grade in previous phylogenetic trees (Sanjur et al., 2002; Zheng et al., 2013; Kistler et al., 2015). Although our results point to a more parsimonious trait evolution than do previous studies, a set of the 44 gene trees resolved C. ficifolia as sister to xerophytic species as described above.

The placement of the South American C. maxima + C. ecuadorensis clade as sister to the rest of the North American mesophytic species suggests a period of diversification within the ancestral range of Cucurbita, as well as by dispersal into new areas. This diversification and dispersal appears tofollow the evolution of the mesophytic habit perhaps as mesophytic forms occupied new ecological niches in wetter climates south of the center of diversity of Cucurbita. The poor resolution of the backbone of the Cucurbita tree in individual gene trees points to rapid diversification following the evolution of the mesophytic habit. This hypothesis of recent and rapid radiation could explain in part why different Cucurbita phylogenies do not converge on a single topology of the mesophytic species. Biogeographic and diversification analyses of a Cucurbita phylogeny reconstructed with broader geographic sampling than we have included in this study are needed to address the above hypotheses.

Phylogenetic Resolution within Clades of Crops and their Wild Relatives

We found variable patterns of resolution within each of five clades that include domesticates (domesticated C. ficifolia is excluded here due to insufficient sampling) to

85

inform hypotheses for the number and magnitude of initial domestications for the

Cucurbita crops. Accessions of C. pepo ssp. ovifera var. ovifera (ornamental gourd) and

C. argyrosperma ssp. argyrosperma (cushaw pumpkin) included here both form multiple clusters that include both wild and domesticated accessions (three wild C. pepo taxa and C. argyrosperma ssp. sororia, respectively) (Figure 3-3). This pattern is similar to that observed in crops found to have multiple centers of diversification, broad domestication bottlenecks, and/or introgression between wild and domesticated species in other crops (e.g. African rice: Li et al., 2011; olive: Diez et al., 2015; kabuli chickpea:

Penmetsa et al., 2016).

These two crop-wild clades are each sister to a second crop clade: C. pepo ssp. ovifera/C. pepo ssp. fraterna (crop-wild) is sister to C. pepo ssp. pepo (crop), and C. argyrosperma (crop-wild) is sister to C. moschata (crop) (Figure 3-3). The possibility of a related domestication history for each of these two crop/crop-wild pairs may need to be considered. A shared domestication history between two Cucurbita crops has only been considered (though not supported by molecular studies: Decker, 1987; Decker,

1988) in C. pepo because C. argyrosperma and C. moschata, though interfertile, did not appear to hybridize when grown together (Merrick, 1990). In addition, the wide diversity of fruit types in C. moschata cultivated in South America led to hypotheses that South

America is the more likely range of its wild ancestor (Nee, 1990; Wessel-Beaver, 2000).

However, our phylogeny resolves separate South American and North

American/Central American Cucurbita clades, and the membership of C. moschata in the North America/Central America clade (Figure 3-3) makes it unlikely that the closest wild ancestor of C. moschata, if extinct or uncollected, would be found in South

86

America. Furthermore, at least one study has now confirmed natural hybridization between C. argyrosperma and C. moschata (Wessel-Beaver, 2000).

In contrast to the similar pattern of relatedness within and between the four

Cucurbita crops included in C. pepo and C. argyrosperma + C. moschata, the crop and wild accessions of C. maxima (buttercup squash) form two sister clades and this crop- wild clade is not sister to another crop clade (Figure 3-3). The resolution of C. maxima into separate wild and crop clades is similar to patterns seen in crops for which single domestications and low introgression between crop and wild taxa have been hypothesized (e.g. Einkorn : Heun et al., 1997; maize: Matsuoka et al., 2002; soybean: Guo et al., 2010). The phylogenetic resolution and observed heterozygosity among accessions of cultivated C. maxima ssp. maxima included in our study are the lowest of all the Cucurbita crops, further supporting the hypothesis of a narrow domestication bottleneck and a single domestication and limited post-domestication diversification for C. maxima. The within-species resolution in our Cucurbita phylogeny provides an initial framework for domestication-genetics studies to test these and other hypotheses of Cucurbita domestication.

Conclusions

Advances in methods for phylogeny reconstruction and the expansion of expert- curated seed banks of crops and crop-wild-relatives have enabled rigorous investigations into the genetic origins of domesticated species. Ours is the first phylogenetic study to include multiple accessions of all known wild Cucurbita species and to use many, unlinked genetic loci to reconstruct relationships between these species and the six Cucurbita crop lineages. The novel relationships recovered provide new information about the evolution of Cucurbita and the patterns of diversity within

87

Cucurbita crops and their wild relatives. We also demonstrate the need to study the evolution of closely related species using multiple, unlinked nuclear genes, as the relationships recovered in Cucurbita using these loci differ from those recovered using organellar loci.

The lack of close wild relatives identified in this study for C. moschata, C. ficifolia, and C. pepo ssp. pepo may be due in part to widespread extinction of wild Cucurbita, as proposed by Kistler et al. (2015) and to inadequate sampling of wild Cucurbita from regions of possible domestication. Future work to uncover the origins of these three

Cucurbita crops must begin with an extensive effort to survey and collect wild Cucurbita from these geographic regions. The three Cucurbita crops for which we have recovered well-supported relationships to wild species, C. maxima ssp. maxima, C. argyrosperma ssp. sororia, and C. pepo ssp. ovifera var. ovifera, present ideal models in which to study the genetic and demographic processes of plant domestication.

88

Table 3-1. Cucurbita species and their within-species resolution in the CA-ML tree Species No. Bifurcations Percent Example Individuals Resolution crop types sampled/tips Cucurbita pepo ssp. 27 12 48% jack-o- pepo c, C. pepo ssp. lantern ovifera var. ovifera c, pumpkin, C. pepo ssp. ovifera zucchini, var. texana w, C. pepo summer ssp. ovifera var. squash, ozarkana w, C. pepo spaghetti ssp. fraterna w squash; acorn squash, ornamental gourd Cucurbita maxima 14 5 42% Buttercup ssp. maxima c, C. squash, maxima ssp. giant andreana w pumpkin, Hubbard squash, kabocha squash Cucurbita 15 4 31% cushaw argyrosperma ssp. pumpkin, argyrosperma c, C. Japanese argyrosperma ssp. pie sororia w pumpkin Cucurbita moschata c 9 5 71% butternut squash, Seminole pumpkin, canned pumpkin, Cucurbita ficifolia c 2 1 - figleaf gourd, Malabar melon, Thai melon Cucurbita cordata w † 2 - - - Cucurbita digitata w † 2 - - - Cucurbita 2 - - - ecuadorensis w

89

Table 3-1. Continued Species No. Bifurcations Percent Example Individuals Resolution crop types sampled/tips Cucurbita foetidissima 6 - - - w†1 Cucurbita lundelliana 4 - - - w,1 Cucurbita 3 - - - okeechobeensis ssp. okeechobeensis w, C. okeechobeensis ssp. martinezii w,1 Cucurbita palmata w†1 2 - - Cucurbita pedatifolia 3 - - w†1 Cucurbita radicans w† 0 - - C crop taxon w wild taxon † Xerophytic perennial. Percent resolution calculated as nodes with > 50 % BS / possible nodes (two less than the number of tips) in the CA-ML tree. Species with – bifurcations are 1non- monophyletic or have only two accessions.

90

Figure 3-1. Distribution map of wild species of Cucurbita with occurrences based on ~7000 records from GBIF (www.gbif.org). Hypothesized origins of domestication for the six Cucurbita crop lineages are indicated by open circles. Colors correspond to tip colors in Figure 3: Cucurbita pepo ssp. ovifera var. ovifera: dark green; C. pepo ssp. pepo: light green; C. argyrosperma ssp. argyrosperma: blue; C. moschata: magenta; C. ficifolia: light blue; C. maxima ssp. andreana: orange.

91

Figure 3-2. Success of microfluidic PCR amplification of loci across taxa. Highlighted taxon C. pepo ssp. pepo was used for locus-selection and primer design. Black indicates that a locus was amplified for all individuals of the corresponding taxon, white indicates the locus was not successfully amplified for any individuals of the corresponding taxon. Tree illustrates relationships among these taxa as reconstructed in this study. Not all samples and loci presented in this heatmap of amplification success were ultimately included in the analyses presented in this paper.

92

Figure 3-3. Best maximum-likelihood tree reconstructed with RAxML using a matrix of concatenated ambiguity-coded gene sequences under the GTRCAT model. Support values are from 1,000 bootstrap replicates. BS values < 50% are not shown. Outgroup branch lengths are scaled to 1/10 of actual length to accommodate viewing of ingroup. Grey circles designate cultivated samples.

93

Figure 3-4. Comparison of alternative branching patterns among seven mesophytic Cucurbita species resolved by four phylogenetic studies of Cucurbita. a. The topology found in this study; b. Sanjur et al. (2002); c. Kistler et al. (2015); d. Zheng et al. (2013). Mesophytic species Cucurbita ficifolia is excluded due its position outside of this clade in some previous studies.

94

Figure 3-5. Comparison of the ML phylogeny inferred from the concatenated matrix and ASTRAL species tree estimated from ambiguity-coded gene trees, with branches not found in both trees highlighted in red. Nodes with <50% BS are collapsed in both trees. Support values are shown in black for incongruent nodes and in red where BS values disagree between the two trees by more than ten points.

95

Figure 3-6. Ancestral character-state reconstruction of habit using maximum likelihood. Possible states are mesophytic habit (black) and xerophytic habit (gray). Relative likelihood probabilities for each character state are represented with a pie chart at nodes.

96

CHAPTER 4 ALLELE PHASING HAS MINIMAL IMPACT ON PHYLOGENETIC RECONSTRUCTION FROM TARGETED NUCLEAR GENE SEQUENCES IN A CASE STUDY OF Artocarpus (Moraceae)

Introduction

Phylogenetic studies to reconstruct relationships among plant species at all taxonomic scales increasingly rely on hundreds of low-copy nuclear genes. Such biparentally inherited and highly variable nuclear DNA is crucial to reconstruct relationships among closely related species, but these phylogenomic datasets introduce challenges to phylogenetic inference. For example, multiple methods for reconstructing species trees that consider the overall distribution of discordant gene trees have been developed (e.g. (Ané et al., 2007; Liu et al., 2008; Kubatko et al., 2009; Heled and

Drummond, 2010); however, a major but mostly overlooked challenge to phylogenetic analysis of phylogenomic datasets is the treatment of heterozygosity within and across intraindividual polymorphic loci. Although intraindividual polymorphisms are rich sources of phylogenetic information, most algorithms for phylogenetic inference treat intraindividual polymorphic sites as ambiguous or missing characters (Potts et al., 2014) that negatively impact phylogenetic resolution.

Standard practice in assembling sequencing reads from heterozygous individuals for phylogenetic studies is to create a single consensus sequence per individual per locus (for exceptions see: Edwards et al., 2008; Weisrock et al., 2012; Kates et al.,

2017). In consensus sequences, heterozygous positions are either IUPAC ambiguity- coded or assigned a single nucleotide based on read-frequency. In the latter case, the resulting assembled locus is treated as homozygous and is most likely a mixed assembly of two alleles.

97

A major problem with ambiguity-coded consensus sequences is the handling of ambiguity codes in phylogenetic software. Ambiguity codes are interpreted as missing data by many programs (e.g. BEAST: (Drummond and Rambaut, 2007); Mesquite

(Maddison and Maddison, 2017); MrBayes (Ronquist and Huelsenbeck, 2003); PAUP*:

(Swofford, 2002)). If ambiguity-coded positions are interpreted as missing data and phylogenetic information in a heterozygous position of an alignment is lost, phylogenetic resolution may be lower among highly heterozygous sequences than among sequences with less genetic variation. Exceptions are RAxML (Stamatakis, 2006), APE{R} (Paradis et al., 2004), and SVDquartets (Chifman and Kubatko, 2014), all of which treat ambiguity codes as polymorphisms. When ambiguity codes are treated as polymorphisms in likelihood models, the probability of substituting an "A" by "Y" equals that of "A" by "C" and/or "T"; this increases computation time and topological uncertainty

(Felsenstein, 2003). The issues associated with ambiguity-coded bases are particularly problematic for datasets where intraindividual variability exceeds intraspecific variability.

The most biologically accurate way to include heterozygous genes in phylogenetic analysis is to reconstruct phylogenetic trees from allele sequences rather than from ambiguity-coded consensus sequences or chimeric consensus sequences.

This process, known as “phasing,” joins variants across sites: for example the “G” in a

G/T SNP at one site may be associated with the C in a C/A SNP at another site. Two main methods of generating phased allele sequences for a single locus are available: amplicon sequencing and statistical phase inference. In amplicon sequencing (e.g.,

(Uribe-Convers et al., 2016; Kates et al., 2017), targeted genes are amplified in separate PCR reactions and pooled for sequencing. If sequencing read length is longer

98

than the PCR product, sequences may be separated into two or more alleles. However, amplicon sequencing requires a large amount of wet-lab preparation, relies on relatively short loci, and custom read assembly is required.

There are two main methods for inferring phase statistically: population-based phasing and read-backed phasing. The former requires a reference population of known variants, where the phase between alleles is already known and is used almost exclusively for well-characterized model systems (e.g., (Browning and Browning, 2007).

In read-backed phasing, information from each of the reads is used to generate short- range haplotypes of variant sites connected by reads aligned to a reference sequence

(Figure 4-1). This method will only produce phased alleles within a locus; full long-range haplotypes cannot be obtained using read-backed phasing alone.

Even if a dataset of allele sequences is assembled, phylogeny reconstruction from two sequences per individual is also not straightforward. Common methods for phylogeny reconstruction from large numbers of nuclear genes include concatenation

(i.e. supermatrix) and species tree estimation (e.g. ASTRAL (Mirarab and Warnow,

2015), *BEAST (Heled and Drummond, 2010), etc). There is no clear way to concatenate allele sequences across loci without long-range haplotype phase information, and multiple studies have demonstrated that when a single allele for each heterozygous locus is selected and included in the concatenated data matrix, the selection of alleles can influence the results (see Edwards et al., 2008; Weisrock et al.,

2012). This is, in part, because when incomplete lineage sorting occurs, heterozygous alleles from one species may coalesce deeper in a particular gene phylogeny than that species’ divergence with its sister species (Weisrock et al., 2012).

99

Species tree methods may be used to address issues with discordant gene histories due to deep coalescence (Knowles, 2009; Liu, Yu, Kubatko, et al., 2009; Liu et al., 2015; Mirarab and Warnow, 2015). Because these methods allow for mapping of multiple "individuals" in gene trees or a sequence data matrix to a single "species" in the species tree, they include a practical way to use allelic information: multiple alleles in gene trees are mapped to a single individual in the species tree.

Standard methods to assemble and analyze sequence data do not enable researchers to easily reconstruct the evolutionary history of alleles, so we do not know the extent to which phylogeny reconstruction from allele sequences could more effectively elucidate phylogenetic relationships that remain unknown. Here, we present a method for assembling an allelic dataset from Illumina sequencing reads and describe how to integrate this method into a recently developed tool for locus assembly from target enrichment data, HybPiper (Johnson et al., 2016). We demonstrate how these data can be used for phylogeny reconstruction and compare phylogenetic hypotheses of Artocarpus J. R. Forst. & G. Forst (Moraceae) (breadfruit, jackfruit and relatives) across allelic and non-allelic datasets using multiple methods of phylogenetic inference to determine whether the use of allelic data influences phylogenetic results.

Methods

Datasets

The Artocarpus dataset includes target-enriched sequences of 23 ingroup taxa and one outgroup taxon from the sister Moreae (Steblus glaber Corner) (Table 4-

1), originally described in detail in Johnson et al. (2016). The target enrichment probes

(MycroArray, Ann Arbor, MI, USA) were originally designed for 333 single-copy loci; however, many of these genes have one or more paralogs resulting from the whole

100

genome duplication in Artocarpus (Gardner et al., 2016). To avoid potential problems with assembling paralogs as alleles, we built new assemblies for the present analyses as follows. Target-enriched reads from A. camansi Blanco (the same individual used for whole-genome sequencing in the original marker development (Gardner et al., 2016)) were assembled de novo using SPAdes (Bankevich et al., 2012), and genes were predicted using Augustus (Keller et al., 2011), with Arabidopsis Hehyn. as the reference. Those genes were annotated using a BLASTn search seeded with the

HybPiper target file of 458 genes from Johnson et al. (2016). Paralogs were annotated accordingly, and a new target file of 726 genes was used to build new HybPiper assembles for all 24 taxa. There were 151 genes that never triggered a HybPiper paralog warning for any sample. From among these genes, we selected 111 genes with the outgroup present for the analyses presented here.

We assembled four datasets from the sequencing reads: 1) Consensus sequences with heterozygous bases called as the nucleotide with highest read- frequency ("consensus"); 2) Consensus sequences with heterozygous bases ambiguity- coded ("ambiguity"); 3) Allele sequences with read-backed phasing ("alleles"); 4)

Unphased allele sequences ("unphased alleles") used in SVDQuartets (Chifman and

Kubatko, 2014) only; all other phylogenetic methods described below were used only for the consensus, ambiguity, and alleles datasets.

We used HybPiper to assemble the consensus dataset, which is the default method of sequencing read assembly implemented in HybPiper and is described in detail in Johnson et al. (2016). Briefly, HybPiper sorts reads by target gene and then performs de novo assembly of contigs for each gene using SPAdes (Bankevich et al.,

101

2012). If more than one long contig is assembled by SPAdes, HybPiper chooses among multiple full-length contigs by using a sequencing depth cutoff to choose the best full- length contig, or based on the percent identity with the reference sequence if the sequencing depth is similar among all full-length contigs. This method results in a single contig, even if multiple long sequences represent alleles (Johnson et al., 2016). We extracted exon sequence and flanking regions for each gene, which were concatenated into a “supercontig” using the “intronerate.py” script in HybPiper.

We assembled the ambiguity dataset using the following steps for each sample

(the total 726 sequenced genes were used for assembly purposes): (1) We used BWA-

MEM (Li, 2013) and Picard (https://broadinstitute.github.io/picard/) to generate a duplicate-free BAM alignment for each sample separately, using the “supercontig” sequences generated by HybPiper as a reference. (2) We used GATK (McKenna et al.,

2010) to identify and call variants, retain SNPs, and generate sequences with IUPAC ambiguity codes, and (3) for each gene, we generated separate exon and intron files using annotations created by HybPiper (Johnson et al., 2016). (4) We used Macse

(Ranwez et al., 2011) to generate in-frame alignments for exons and used MAFFT

(Katoh and Standley, 2013) to align intron sequences separately. (5) We trimmed exon and intron alignments with TrimAl (Capella-Gutiérrez et al., 2009) to remove sites that did not appear in at least 15 samples and combined the alignments retaining positional information for gene partitions (Figure E-1, Figure 4).

To assemble the alleles dataset, we used WhatsHap (Patterson et al., 2015), a

Python-based program for read-backed phasing designed for long-reads but appropriate for 300-bp paired-end Illumina data. For each sample, we used the BAM

102

alignments generated for the ambiguity dataset to generate a phased VCF file and a

GTF file containing the locations of phase blocks within each gene using default

WhatsHap settings. We generated two phased sequences for every gene using

“bcftools consensus,” but we only retained phased sequence in the longest phase block for each individual at each gene. The remaining variant sites outside the longest phase block were replaced with ambiguity characters (Figure 4-1), and only one sequence was retained if it was completely homozygous for that individual. This method allowed us to retain phased heterozygous sites only where backed by read data (i.e. not spanning long introns), while retaining the full-length sequence that may contain informative sites across species. Our script for processing phased alleles, “haplonerate.py” is freely available at www.github.com/mossmatters/phyloscripts

The final product from the alleles assembly method is one or two sequences per individual per gene. If an individual is heterozygous at a given locus, assembly of allele sequences results in intraindividual alleles: two alleles (for a gene) from the same individual. If intraindividual alleles are not sister in a gene tree, we refer to this as “deep coalescence of alleles”. We assembled a second version of the alleles dataset, which always contains two sequences per individual with no ambiguity codes, to analyze with

SVDQuartets (Chifman and Kubatko, 2014; described in Phylogenetic analyses).

Phylogenetic Analyses

For the ambiguity, consensus, and alleles datasets, we estimated gene trees with

RAxML v 8.2.3 (Stamatakis, 2014) using the GTR+Gamma model of evolution and

1,000 replicates of multi-locus bootstrapping. We specified separate partitions for 1) first and second codon position, 2) third codon position, and 3) intron for each gene.

103

For the ambiguity and consensus datasets, we performed concatenation + maximum likelihood analysis (CAML) on concatenated alignments of 111 genes using

RAxML with the GTR+Gamma model of evolution and 1,000 replicates of multi-locus bootstrapping. We partitioned the concatenated alignments by gene, codon position, and intron (330 partitions per alignment). There is no known method for treating the alleles dataset in a CAML analysis, as this would require associating alleles across loci into long-range haplotypes.

We estimated species trees using four popular summary methods that are consistent under the multispecies coalescent: ASTRAL II v 4.7.4 (Mirarab and Warnow,

2015), MP-EST v 1.5 (Liu et al., 2010), STAR (Liu, Yu, Pearl, et al., 2009), and

SVDquartets (Chifman and Kubatko, 2014). We used ML gene trees estimated by

RAxML (described above) as input for ASTRAL-II, MP-EST, and STAR. For each dataset, we ran ASTRAL-II with 1,000 bootstrap replicates (option -r) using gene and site resampling (option -g). For the alleles dataset, we used the mapping option (-a) to map alleles in the gene trees to individuals. Because branches with very low support in gene trees may cause erroneous topologies in the species tree, we also estimated the

ASTRAL species trees using ML gene trees with branches with less than 33% BS support collapsed using TreeCollapserCL4

(http://emmahodcroft.com/TreeCollapseCL.html). We could not collapse branches with low-support for the other species tree methods as they require bifurcating trees.

We estimated species trees and performed bootstrap validation with MP-EST v

1.5 and STAR on STRAW

(http://bioinformatics.publichealth.uga.edu/SpeciesTreeAnalysis/index.php). Prior to MP-

104

EST and STAR analysis, we rooted ML gene trees and bootstrap trees by outgroup using the program pxrr in phyx (Brown et al., 2017) and removed branch lengths and support values from the trees. We added BS values from the bootstrap consensus trees to the species trees manually. For the alleles dataset, we used a species-allele table to map alleles in the gene trees to individuals.

We estimated SVDquartets trees from concatenated gene sequence alignments using SVDquartets analysis in PAUP4 (Swofford, 2002) with exhaustive quartet sampling, 1,000 bootstrap replicates, and the multispecies coalescent tree model. For the ambiguity dataset, we ran one analysis with ambiguity codes interpreted as distributed and one analysis with ambiguity codes considered as missing data. For the alleles datasets, species-membership partitioning was used to assign allele sequences to individuals. We also performed SVDquartets analysis on unphased allele sequences to determine whether this affects the SVDquartets tree from allelic data. SVDquartets treats each alignment position as an unlinked locus, so phasing of SNPs across alleles may not be necessary for this type of species tree analysis. We manually added bootstrap support values from the bootstrap consensus trees generated by SVDquartets to the SVDquartets species trees.

To focus on issues that may arise in phylogeny reconstruction in datasets with a high level of deep coalescence of alleles, we performed additional species tree estimation in ASTRAL and CAML analysis on a subset of seven genes that had the highest proportion of non-sister intraindividual alleles (20% or higher) in the ML gene trees. The identification of these genes is described below, and we performed the

ASTRAL and CAML analyses exactly as described above for the full datasets (ASTRAL

105

using ML gene trees with branches with less than 33% BS support collapsed). We estimated “percent resolution” of the resulting trees for each dataset by collapsing branches in the trees that had <50% BS support using TreeCollapser4

((http://emmahodcroft.com/TreeCollapseCL.html) and counting the number of bipartitions in the resulting trees using the function bitsplits() in the R package ape{}

(Paradis et al., 2004). We then calculated percent resolution as the number of bipartitions in these trees divided by the number of possible bipartitions (two less than the number of tips). For the alleles dataset we removed all nodes where the descendants were two alleles from the same individual, to avoid overestimating support on these gene trees.

Assessing Gene Tree Resolution, Topological Incongruence, and Allele Coalescence

We used Phyparts (Smith et al., 2015; https://bitbucket.org/blackrim/phyparts) to assess gene tree discordance and gene tree-species tree discordance. Phyparts conducts bipartition analysis across a set of trees while allowing for missing data. For each of the three data assemblies, we ran Phyparts using the ASTRAL-II species tree

(estimated from that dataset) as the mapping tree to assess discordance among the gene trees. Although we include a Phyparts analysis of the gene trees estimated from the alleles dataset, the results of this analysis are intended only to show general patterns and have different interpretations for two reasons: 1) To compare gene trees in which all allele sequences are tips to a species tree, the species tree must also have one tip per allele sequence. Because intraindividual alleles are arbitrarily named across gene trees, the only appropriate way to estimate a species tree for multiple-allele gene trees is to map alleles to individuals. The alleles ASTRAL-II species tree used for the

106

mapping tree is therefore not a real estimation of relationships; 2) Any non-concordance between a gene tree and the ASTRAL-II species tree that involves the position of an allele that is not sister to its second intraindividual may not be reflective of arbitrary intraindividual allele naming. Although any interpretation of Phyparts results for the alleles dataset is severely limited, we include the general patterns observed because the incidences of non-sisterhood of intraindividual alleles was rare in most gene trees after we collapsed branches with low support (described in Results).

Prior to running phyparts, we rooted gene trees and the mapping tree by outgroup using the pxrr program in phyx (Brown et al., 2017) and removed branch lengths from the mapping trees. We ran Phyparts with the -b option set to 33 so that branches with less than 33% BS support in the gene trees would not be considered.

Results from phyparts were visualized using phypartspiecharts.py to summarize gene tree conflict on our phylogenies using pie charts. We used a second script

(minorityreport.py) to generate a "minority bipartition" report from the Phyparts output that shows the number of gene trees supporting each alternative bipartition. Both scripts are freely available under an MIT license at www.github.com/mossmatters/phyloscripts

To summarize the support for hypotheses of Artocarpus relationships across datasets and analyses, we compared the topologies of the seventeen species trees and two CAML trees by eye after rooting by outgroup using the pxrr program in phyx (Brown et al., 2017). We identified across-analysis incongruence and manually identified all various arrangements in areas that exhibited incongruence.

To identify deep coalescence of intraindividual alleles within gene trees, we used the Python package ETE3 (Huerta-Cepas et al., 2016); etetoolkit.org) to calculate how

107

often intraindividual alleles were resolved as sister lineages in ML gene trees estimated from the alleles dataset. When the two intraindividual alleles were monophyletic on a gene tree, we recorded the gene tree bootstrap support. When the alleles were not monophyletic, we recorded the support from the most highly supported node that prevented the monophyly of alleles. We summarized the support for monophyletic intraindividual alleles values across samples and loci in a heatmap using the Python package seaborn (Waskom et al., 2014). A python notebook describing our procedure is available at www.github.com/mossmatters/phyloscripts

To characterize the level of sequence heterozygosity for each individual, we calculated the square root pairwise distance (sqrt PWD) between intraindividual allele sequences in each of 111 gene alignments from the alleles dataset using the function dist.alignment() in the R package seqinR{} (Charif and Lobry, 2007). For homozygotes, we replicated sequence names and sequences in the alignments to generate sqrt

PWDs of zero. To see whether there was an association between the prevalence of deep coalescence of intraindividual alleles for a gene or highly heterozygous genes and gene tree resolution, we plotted the number of non-sister intraindividual alleles in a gene tree and the average sqrt PWD for a gene against gene-tree percent resolution. We estimated “percent resolution” of gene trees as described above in Phylogenetic analyses.

Results

Assemblies and Alignments

We reduced our gene sampling to maximize taxon occupancy; most species had all 111 genes present, and the fewest genes any taxon had was 109 (Table 4-1). The gene alignments ranged from 1067 bp to 5350 bp with a median length of 2303 bp. The

108

final concatenated assembly contained 361,738 bp and had only 71 missing gene/individual combinations.

Allele Coalescence, Sequence Heterozygosity, and Gene Tree Resolution

Among the 24 individuals, the number of gene trees in which intraindividual alleles were not sister (i.e. deep coalescence of alleles) ranged from zero to 23 of 111 gene trees and the average was 4.75 of 111 gene trees (Figure 4-2). Across 111 gene trees, the percentage of non-sister intraindividual alleles (heterozygous allele pairs not sister in the tree/total heterozygous allele pairs in the tree) ranged from 0% to 33% and the average was 6.7%. We did not observe an association between the proportion of monophyletic intraindividual alleles (in the alleles dataset) in a gene tree and the gene tree’s percent resolution (in any dataset, Figure 4-3).

Among the 24 individuals, the sqrt PWD ranged from 0.055 to 0.132 and the average was 0.080 (Figure 4-S1A). Across 111 genes, the average (for all individuals) sqrt PWD ranged from 0.037 to 0.144 and the average was 0.070. We did not observe an association between individuals that had high heterozygosity and membership in high-conflict/poorly resolved areas of the Artocarpus phylogeny (Figure 4-S1B), nor did we observe an association between a high number of heterozygous sequences in a gene tree and the gene tree’s percent resolution (Figure 4-S1C).

The mean percent resolution across gene trees was 77% for the ambiguity dataset, 78% for the consensus dataset, and 86% for the alleles dataset (Figure 4-3).

Gene-tree percent resolution for the alleles dataset is likely inflated because of the contribution of strong support for intraindividual allele sisterhood (as many as 12 bipartitions per gene tree) in the alleles gene trees.

109

Levels of Gene Tree/Species Tree Discordance across Methods of Data Assembly

The number of gene trees concordant with the species tree was similar for the consensus and ambiguity datasets for most clades (Figure 4-4A and B). Of the 11 nodes that had a high level of gene tree discordance with the species tree (>50% of gene trees discordant with species tree), the number of discordant gene trees differed between the two datasets by more than five for only two nodes (1 and 7). For these two nodes and six of the eight remaining “high-conflict” nodes, the ambiguity dataset had a higher number of gene trees concordant with the species tree than did the consensus dataset.

Comparison of gene tree discordance in the alleles dataset to the ambiguity and consensus datasets is limited as described in Methods. We did not make comparisons of alleles gene tree concordance to ambiguity and consensus species trees at high conflict nodes 1-7 because these subtend clades that include individuals with intraindividual alleles that were non-sister in 10 or more gene trees: NZ402 and NZ929.

(The other two individuals with non-sister alleles in 10 or more gene trees occur in parts of the tree without high gene tree/species tree discordance.) At high-conflict nodes 16-

21 (subtending clades that include no individuals with non-sister alleles in more than four gene trees) we observed no association between the number of gene trees concordant with the species tree and the use of allele sequences for gene tree reconstruction (Figure 4-4A and B). Numbers of gene trees concordant with the species tree at these nodes are either lower, in the middle, or higher compared with the consensus and ambiguity datasets. The total number of alternative bipartitions found in gene trees was similar for each high-conflict node across all three datasets; only one

110

node (node 6) had a difference in the number of alternative bipartitions greater than three between the ambiguity and consensus datasets.

Topological Incongruence among Data Sets and Phylogenetic Inference

We assessed incongruence among 19 topologies resolved by all combinations of datasets and phylogenetic analyses: six species trees for the alleles assembly dataset

(Figures 4-S2-4-S7), five species trees and one CAML tree for the consensus dataset

(Figures 4-S8-4-S13), and six species trees and one CAML tree for the ambiguity dataset (Figures 4-S14-4-S20). We identified four major areas in the Artocarpus phylogeny where species’ placement varied among the 19 trees (the two most salient detailed in Figure 4-5, and all four detailed in Figure 4-S26). Hypotheses for Artocarpus evolution are discussed extensively elsewhere (see Williams et al., 2017). Here, we simplify our descriptions of alternative hypotheses to focus on comparing results across datasets.

Briefly, the four areas of incongruence are: A) The position of subgenus Prainea

(King) Renner, here represented by A. limpato Miq. (NZ609) (Figure 4-5). Prainea occurs successively or as a two-member clade with A. sepicanus Diels as sister taxa to the rest of the genus or to a major clade, or as sister species in different major clades;

B) Series Rugosi Jarrett of subgenus Artocarpus sect. Artocarpus (Figure 4-5) (also including A. teijsmannii Miq.), involving comparatively shallow nodes. Three species (A. tamaran Becc. (EG92), A. sericarpus F.M. Jarret (NZ711), and A. elasticus Reinw. ex

Blume (EG87)) occur in all possible positions within the clade; C) Resolution within subgenus Pseudojaca Tréc. (Figure 4-S26). Four of the six subgenus Pseudojaca species (A. dadah Miq. (NZ694), A. thailandicus C.C. Berg (NZ402), A. nitidus ssp. lingnanensis (Merr.) F.M. Jarrett (NZ911), A. lacucha Buch-Ham. (NZ420)) occur in

111

every possible position within the subgenus; and, D) The positions of the allied species

A. excelsus F.M. Jarrett (NZ780) and A. lowii King (MWL2), which either form a clade sister to or a grade before Area B (Figure 4-S26).

Branches involved in incongruent relationships in Area A are the deepest in the tree, and branches affecting the resolution of Area D are the next deepest in the tree.

Branches in Areas B and C are at comparably shallow depths in the tree and subtend terminal clades and/or one node deeper depending on the specific topology resolved.

The crown age of the ingroup has been estimated at ca. 40 mya (Williams et al., 2017).

We found slightly more variation in topologies resolved between datasets (within each program) than between programs (within datasets) (Tables 4-2 and 4-3). In Area

A, six arrangements occurred in our trees (Figure 4-5). Area A was resolved more congruently between the alleles and consensus datasets than between either of these and the ambiguity dataset, and these two datasets resolved trees with generally higher

BS support in this area of the tree. In 16/19 trees, Prainea was not sister to all other

Artocarpus species but instead was instead was nested inside the genus, in agreement with Zerega et al. (2010) who treated Prainea as a subgenus of Artocarpus.

In Area B, four arrangements of series Rugosi occurred in our species/CAML trees but only one arrangement was resolved with high BS support (>70%) in more than one tree (Figure 4-5). The arrangement resolved in the majority of trees was resolved with similar frequency by all of the datasets. The consensus dataset had the most between-analyses congruence in area C, and area C was resolved more congruently between the alleles and ambiguity datasets than between either of these and the

112

consensus dataset. By contrast with area B, this clade was either mostly (Zerega et al.,

2010) or fully (Berg et al., 2006) sampled, depending on the taxonomy used.

In Area C, eight arrangements for subg. Pseudojaca occurred in our trees, but only one arrangement was resolved with high BS support (>70%) in more than one tree

(Figure 4-S26). This most common arrangement was resolved consistently in species trees for both the ambiguity and consensus datasets, but there was complete incongruence among analyses for the alleles datasets; each analysis resolved a different topology. Indeed, four of the eight arrangements appeared only in the alleles analyses. Area B was resolved more congruently between the consensus and ambiguity datasets than between either of these and the alleles dataset. This clade was sparsely sampled in our dataset, with only 6/24 species represented.

In Area D, three arrangements for A. excelsus and A. lowii occurred, and two arrangements had high (>70%) BS support in six of 19 trees (the third arrangement only occurred in one tree and with low support) (Figure 4-S26). The consensus dataset had the highest among-analyses congruence in this area. Area D was resolved more congruently between the ambiguity and allele datasets than between either of these and the consensus dataset because of the congruence between the SVDQUARTETS trees for these datasets. This was the only high-conflict area in the tree for which the CAML analyses resolved congruent topologies for the consensus and ambiguity datasets.

Comparison of Support for Relationships Reconstructed Using Different Datasets

For the four areas of incongruence we compared BS support across trees for the first branch if the arrangement varied by basal split or for the base of the clade if the arrangement varied by clade membership (Figures 4-5 and 4-S26). (We did not

113

compare BS support for non-high-conflict areas of the tree because these areas had high support in all trees.)

For the consensus dataset, the CAML tree had high BS support for the arrangements it resolved in all four high-conflict areas. Each ASTRAL tree had high support for the arrangements it resolved in three of four areas (A, C, and D). The MP-

EST tree and STAR tree each had high support for the topology it resolved for two areas (A and D), and the SVDQuartets tree did not have high support for the topology it resolved in any high-conflict area.

For the ambiguity dataset, the STAR tree had high support for the arrangements it resolved in all four areas. The MP-EST tree and all ASTRAL trees had high support for the topology for two areas (A and C), and each SVDQuartets tree had high support for the topology it resolved in one area (D).

For the alleles dataset, the STAR tree had high support for the arrangements it resolved in all four areas. The MP-EST tree, each ASTRAL tree, and the SVDQuartets tree estimated from phased allele sequences each had high support for the topology it resolved for two areas (MP-EST: A and B; ASTRAL: A and C; SVDQ: C and D), and the

SVDQuartets tree estimated from unphased allele sequences had high support for one area (D).

Phylogenetic Analysis of Genes with High Levels of Deep Coalescence

To assess whether using allele sequences rather than ambiguity or consensus sequences improves resolution of species trees estimated from gene trees in which deep coalescence of intraindividual alleles is not rare, we performed species tree estimation using ASTRAL on a set of seven gene trees in which more than one in five heterozygous individuals had alleles that were not sister in the gene tree. We did not

114

observe better resolution of relationships in the ASTRAL species tree estimated from the alleles dataset gene trees than in the ASTRAL trees estimated from ambiguity and consensus dataset gene trees (Figures 4-S21-4-S23). The total percent resolution of

Artocarpus in the ASTRAL species trees estimated from consensus, ambiguity, and alleles assembly datasets was 68% in each tree. Percent resolution in the CAML trees reconstructed from the consensus and ambiguity assembly datasets was 77% for both trees (Figures 4-S24-4-S25; see Methods for explanation of “percent resolution”).

Discussion

Our understanding of how deep coalescence of alleles and/or high levels of heterozygosity may affect phylogeny reconstruction is very limited. Phylogeneticists do not always assess heterozygosity in datasets and are usually not aware of deep coalescence of alleles because allele sequences are not assembled or analyzed. In this study, we quantified the amount of deep coalescence of alleles and highly heterozygous sequences and asked how these characteristics of a dataset affect our ability to resolve a phylogeny. To answer this question, we considered: 1) Do gene trees estimated from genes with high levels of deep coalescence of alleles or heterozygosity have lower resolution than others?; 2) Do highly heterozygous individuals and/or those with non- coalescing alleles in gene trees appear to limit resolution in species trees/CAML trees?;

3) Does using allele sequences instead of ambiguity or consensus sequences resolve any problems that may be caused by deep coalescence or high heterozygosity?; and 4)

Do phylogenies estimated from allele sequences differ from those estimated from

“super-contigs” (consensus or ambiguity-coded contigs)? In our reconstruction of gene trees, CAML trees, and species trees for Artocarpus using sequences assembled with

115

three different methods of data assembly, we did not find evidence that the answer to any of these questions is yes.

Highly Heterozygous Genes and Individuals Do Not Appear Disproportionately Responsible for Low Resolution or Topological Uncertainty in Gene Trees or Species/CAML Trees

Our finding that highly heterozygous individuals did not influence gene tree resolution or disproportionately occur in high-conflict areas of species or CAML trees was somewhat surprising. This was particularly true for the ambiguity dataset, as ambiguity-coded heterozygous positions have previously been shown to decrease phylogenetic resolution (Potts et al., 2014). However, many factors influence phylogenetic resolution, and we cannot isolate the effect of ambiguity codes or deep coalescence of alleles using our empirical data. In particular, more variable genes are expected to be more phylogenetically informative (Small et al., 2004; Duarte et al.,

2010), and high variability among sequences from separate individuals increases phylogenetic resolution (Parks et al., 2009). Both of these desirable characteristics are inextricably linked to the presence of ambiguity codes (in ambiguity-coded data) and the increased possibility of non-sisterhood of intra-individual alleles. Our finding that high heterozygosity of individuals or gene trees was also not associated with improved phylogenetic resolution is weakly suggestive of these negative effects confound the positive ones.

Using Allele Sequences does not Improve Phylogenetic Resolution or Produce Meaningfully Different Topologies in Enigmatic Areas of the Artocarpus Phylogeny

Answering questions related to “improving” a phylogeny or to comparisons of topologies broadly is limited because we do not know the true phylogeny of real (not simulated) data. We also looked at differences in topologies that resulted from three

116

methods of assembling heterozygous sequences and used three measures to assess whether any one dataset was “better” at resolving the Artocarpus phylogeny: 1)

Bootstrap support in poorly resolved areas of Artocarpus evolution; 2) Topological consistency across methods of phylogeny reconstruction; and 3) Amount of gene tree discordance and gene tree/species tree concordance (i.e. topological consistency across gene trees and between gene trees and species trees). Although we did find that phylogenies reconstructed from the alleles dataset differed from those reconstructed with the consensus or ambiguity sequences, topological incongruence among methods used for phylogenetic analysis and between the other two assembly methods was so common that we can only conclude that different methods of data assembly and different programs used for analysis yield different evolutionary hypotheses in poorly supported areas of a phylogeny. This study highlights the uncertainty inherent in estimating the phylogeny of closely related species. We found that high statistical support at several nodes obscured high gene tree incongruence and disagreement among dataset assembly and phylogeny reconstruction methods. In particular, CAML analysis of the consensus method of data assembly, likely the most commonly used methods of assembly and phylogenetic analysis, consistently resolved topologies with high bootstrap support in high-conflict areas of the Artocarpus phylogeny. The use of other data assemblies and other methods of phylogenetic analysis revealed topological uncertainty in these areas, and in most cases the topology resolved by the consensus-

CAML analysis was not the topology resolved by a majority of other analyses. For example, the alleles analyses recovered six topologies for the sparsely sampled Area C, none of which were recovered by either CAML analysis, even though the two CAML

117

topologies both had high support (Figure 4-S26). This supports previous findings based on simulated and biological data that concatenation analyses can result in the wrong tree with high support under some conditions (Edwards et al., 2007; Kubatko and

Degnan, 2007).

In our comparisons of the topologies of, and support for, poorly resolved relationships in Artocarpus, no clear patterns emerged to suggest that the use of allele sequences for phylogeny reconstruction improves phylogenetic resolution relative to the use of one sequence per allele. Bootstrap support for high-conflict branches in the tree was not higher in the topologies reconstructed using the alleles dataset than in the other topologies. Gene trees estimated from allele sequences were not more concordant than those estimated from the other methods, nor was there more gene tree-species tree concordance for the alleles dataset.

Deep coalescence of alleles is sometimes cited as a potential reason for poor resolution toward the tips of a phylogeny (Maddison and Knowles, 2006). Although there is no clear association between the depth of high-conflict branches in the tree and which dataset best resolves these areas of the phylogeny, the only high-conflict area for which the topologies resolved by the alleles dataset were among the more consistent and better-supported topologies was the area that involved branches deeper in the tree than the other three high-conflict areas (“Area A”).

Limitations of Our Results

Our finding that the use of allele sequences does not improve phylogenetic resolution is novel, but its implications are somewhat limited. The extent to which heterozygosity and deep coalescence of alleles affects phylogeny reconstruction will differ for every phylogenetic dataset. Our Artocarpus dataset only included one

118

individual per species; deep coalescence of alleles may be more problematic when resolving relationships among conspecific individuals across multiple species. However, we have almost certainly under-sampled the allele pool for each species; deep coalescence in Artocarpus may be more pervasive than we observe with one individual per species. As described above, we also did not find clear evidence that poor resolution in the Artocarpus phylogeny was associated with sequence heterozygosity. If heterozygous sequences are not associated with poor resolution, it is not clear that a more biologically accurate assembly of these genes would improve phylogenetic hypotheses. However, heterozygosity is expected to limit phylogenetic resolution when ambiguity codes are used, and this has been shown elsewhere (Kates et al., 2017).

We used biological data rather than simulated sequences for this study. As such, we do not know the true phylogeny of the study group and cannot determine whether a particular method of sequence assembly yields a more “accurate” phylogeny. Instead, we compared statistical support for relationships resolved and topological consistency across phylogenetic methods for our various datasets. Because the usefulness of bootstrapping may be limited by program-specific assumptions and characteristics of the dataset, bootstrap support only has relative meaning within a particular dataset and analysis (Soltis and Soltis, 2003); it is therefore not clear whether it is appropriate or meaningful to compare bootstrap support for a topology across datasets or methods.

Conclusions

We present here a framework for reconstructing alleles from target enrichment data and for systematically evaluating the influence of locus-assembly methods in species tree resolution. Although we did not find evidence that the use of allele sequences to reconstruct phylogenies offers a clear improvement over other methods of

119

assembling heterozygous sequences, these results are specific to our dataset and other researchers should assess heterozygosity in their own datasets and determine whether it is problematic. The common across-analysis incongruence presented here highlights problems with treating phylogenies as true rather than as evolutionary hypotheses, especially in studies where a single method of sequence assembly and phylogenetic analysis is used. Additional studies using simulated data for which the true phylogeny is known are needed to fully address the questions posed here.

120

Table 4-1. Sample information, locus recovery, and statistics for allelic variation for 23 species of Artocarpus and one outgroup. The percentage of longest phase block refers to genes that have more than one phase block for that individual. Voucher information for all individuals except MV2 (vouchered in Gardner et al., 2016) can be found in Johnson et al., 2016. Note: Based on phylogenetic evidence, Artocarpus sepcianus likely does not belong in Artocarpus subg. Artocarpus (Johnson et al., 2016, this paper) Genes Percent. with Deep Mean Sqrt Loci Loci with Longest Coales- Pairwise Sample Num. with two >1 phase Phase cence of Allelic Species Subgenus ID Loci alleles block Block Alleles Distance Artocarpus anisophyllus Artocarpus NZ606 111 90 22 75.2% 10 0.0676

Artocarpus brevipedunculatus Artocarpus NZ814 111 89 22 76.0% 4 0.0681 Artocarpus camansi Artocarpus MV2 111 28 1 91.2% 1 0.0548 Artocarpus elasticus Artocarpus EG87 111 77 14 83.0% 4 0.0625 Artocarpus excelsus Artocarpus NZ780 111 81 22 79.2% 4 0.0600 Artocarpus kemando Artocarpus NZ612 111 85 18 78.6% 3 0.0625 Artocarpus lanceifolius Artocarpus NZ739 111 85 15 74.2% 10 0.0711

Artocarpus lowii Artocarpus MWL2 111 62 17 76.2% 2 0.0617 Artocarpus odoratissimus Artocarpus NZ866 111 89 21 72.6% 3 0.0744

Artocarpus rigidus Artocarpus NZ728 111 79 25 77.9% 3 0.0721 Artocarpus sericicarpus Artocarpus NZ771 111 72 20 78.0% 2 0.0644 Artocarpus tamaran Artocarpus EG92 111 91 23 75.9% 4 0.0690 Artocarpus teysmannii Artocarpus NZ946 111 37 6 74.3% 1 0.0644 Artocarpus sepicanus [Artocarpus] GW1701 109 77 27 77.7% 2 0.0712 Artocarpus heterophyllus Cauliflori EG98 111 48 17 71.9% 3 0.0817

Artocarpus integer Cauliflori NZ918 111 91 29 74.1% 1 0.0737

121

Table 4-1. Continued Genes Percent. with Deep Mean Sqrt Loci Loci with Longest Coales- Pairwise Sample Num. with two >1 phase Phase cence of Allelic Species Subgenus ID Loci alleles block Block Alleles Distance Artocarpus fretessii Pseudojaca NZ929 110 94 22 72.5% 12 0.0774 Artocarpus lacucha Pseudojaca NZ420 111 77 24 78.1% 4 0.0706

Artocarpus nitidus ssp. lingnanensis Pseudojaca NZ911 111 29 5 83.5% 5 0.0805 Artocarpus peltatus Pseudojaca NZ694 111 90 24 76.3% 7 0.0815 Artocarpus primackiana Pseudojaca NZ687 110 93 27 74.5% 6 0.0734 Artocarpus thailandicus Pseudojaca NZ402 110 69 16 71.7% 23 0.1320

Streblus glaber NA EG78 111 71 26 78.6% 0 0.0623

122

Table 4-2. Topological variation among three datasets (within analyses). Percentages are number of pairwise differences across trees/possible number of pairwise differences across trees; higher numbers indicate greater topological variation.

Area ASTRAL MPEST STAR SVDQ CAML

A 50% 50% 50% 50% 100%

B 50% 50% 0% 50% 100%

C 50% 50% 0% 100% 100%

D 0% 50% 0% 50% 0%

123

Table 4-3. Topological variation within each dataset (among analyses). Percentages are number of pairwise differences across trees/possible number of pairwise differences across trees; higher numbers indicate greater topological variation. Area consensus ambiguity alleles

A 60% 33% 20%

B 40% 33% 20%

C 20% 33% 40%

D 20% 17% 40%

124

Figure 4-1. Read-backed phasing of targeted loci from high-throughput sequences. Top: Alignment of reads to the HybPiper supercontig for one locus (gene250.p1), for one individual (Artocarpus lowii MWL2). Variant (heterozygous) sites are shown in the top row. The histogram shows sequencing depth (varies from 1x to 45x). Read-backed phasing using WhatsHap separated reads into two phase blocks, separated by a gap of low depth. Variant sites in the left block (blue) can be phased with other sites within the block, but not with sites in the right block (green). Bottom: Partial DNA sequence generated by haplonerate.py for the three datasets for the same locus and individual. Phased alleles (red text) were retained only in the largest phase block (left)

125

Figure 4-2. Frequency of deep coalescence of intraindividual alleles. Each gene is a column, each row is an individual. The color ranges from light blue (100% support for monophyly of intraindividual alleles) to white (0% support) to red (100% support for non-monophyly of intraindividual alleles). Homozygous loci are shown in dark blue

126

Figure 4-3. Relationship between deep coalescence and gene tree resolution. Resolution is measured by the percentage of nodes with > 50% support. Diagonal: distributions of gene tree support in the three datasets for genes with any deep coalescence (green) and without deep coalescence (blue) in the alleles dataset. Below diagonal: relationship between gene tree resolution in a given dataset and deep coalescence in the alleles dataset. Genes with deep coalescence in the alleles dataset are in green. The kernal densities of these relationships is shown above the diagonal.

127

Figure 4-4. ASTRAL-II trees for Artocarpus. (A) Consensus dataset; B) Ambiguity dataset; C) Alleles dataset) with summary of conflicting and concordant gene trees produced with Phyparts. For each branch, the top number indicates the number of gene trees concordant with the species tree at that node, and the bottom number indicates the number of gene trees in conflict with that clade in the species tree. The pie charts at each node present the proportion of gene trees that support that clade (blue), the proportion that support the main alternative for that clade (yellow), the proportion that support the remaining alternatives (pink), and the proportion that inform (conflict or support) this clade that have less than 50% bootstrap support (black). Alternative topologies and the number of gene trees that support them can be found in the data repository.

128

Figure 4-5. Incongruence between analyses for all three datasets for Areas A and B. Left: table showing incongruence between analyses for all three datasets for Areas A and B. Rows represent topologies, drawn as cartoon trees (or subtrees) at right. Columns represent analyses (labeled at bottom) within datasets (labeled at top). Strong support (black) denotes percent support > 70%. Weak support denotes percent support < 70%. No support indicates that the topology in question was not recovered. Right: Cartoon trees representing alternative topologies: (A) position of A. limpato (subg. Prainea), with other subgenera collapsed; (B) Subg. Artocarpus ser. Rugosi subgree. Rearrangements are shown by colored terminal branches within each area

129

CHAPTER 5 COMPARATIVE CHARACTERIZATION OF THE DOMESTICATION PROCESS IN TWO INDEPENDENTLY DOMESTICATED PUMPKIN SPECIES

Introduction

The process of plant domestication has produced hundreds of domesticated species that differ dramatically from their wild ancestors, both genetically and phenotypically (Meyer and Purugganan 2012). The differences between wild and domesticated plants are a result of broad evolutionary changes that include selection associated with crops' coevolution with human domesticators and the demographic processes that accompany domestication, including population bottlenecks, genetic drift, and introgression with wild relatives. Because of the recent and severe genetic bottleneck that often accompanies domestication, domesticated plants typically possess only a subset of the genetic diversity present in their wild ancestors (Meyer and

Purugganan 2012). Genetic diversity in the initial domesticate is often further reduced by modern breeding (Meyer and Purugganan 2012). The loss of a crop's genetic diversity is increasingly alarming as breeders cannot access genetic diversity underlying traits such as disease resistance and drought tolerance that must be modified to respond to pressures of climate change and human population growth (Esquinas-

Alcázar 2005).

Despite the importance of genetic diversity for crop improvement, much of what we know about how the domestication process shapes the genetic diversity of crops comes from studies of plants from just one branch of the plant tree of life: cereals (e.g. rice: Zhu et al. 2007; corn: Hufford et al. 2012; wheat: Haudry et al. 2007) representing the grass family (Poaceae). Cereals include the most economically important

130

domesticated species but the life-history and domestication traits that these annual, non-outcrossing species share may bias our understanding of how domestication and breeding shape crop diversity in general. Recent work to reconstruct the domestication processes in a greater diversity of crops (e.g. apple (Malus, Rosaceae): Cornille et al.

2012; olive (Olea, Oleaceae): Diez et al. 2015; carrot (Daucus, Apiaceae): Iorizzo et al.

2013; peach (Prunus, Rosaceae): Cao et al. 2014; soybean (, Fabaceae): Guo et al. 2010) has led to broader characterizations of domestication as an evolutionary process. For example, population genomic studies in apple (Cornille et al. 2012) and carrot (Iorizzo et al. 2013) revealed that these crops did not experience the severe domestication bottlenecks that accompanied the domestications of rice (Oryza,

Poaceae), corn (Zea, Poaceae), and wheat (Triticum, Poaceae) and highlight the need to characterize the domestication process in crops that more broadly represent the diversity of wild plant species.

Establishing a single demographic model of crop domestication is not possible because of the diversity of crop wild ancestors and domestication and diversification processes. Comparisons of the domestications of diverse crop species are therefore vital to an accurate understanding of how domestication and breeding affect genetic diversity, but such comparisons are limited by the differences in traits that diverse domesticated species possess. Here we characterize and compare the domestication processes in two domesticated pumpkin species (Cucurbita, Cucurbitaceae) that were independently domesticated from closely related wild species.

Of the six independently domesticated pumpkin (2n = 40) species, only C. argyrosperma ssp. argyrosperma ("cushaw") and C. maxima ssp. maxima ("buttercup

131

squash") have a clearly supported sister relationship to an extant wild species that is likely the crop wild ancestor (Kates et al. 2017). Giant pumpkin and cushaw share a domestication syndrome based on similar initial domestication traits and modern breeding objectives. Both species were domesticated from monoecious, outcrossing, bee-pollinated, herbaceous annual vine species that bear round fruits, 3.5-8.0 cm in diameter, with a green exocarp that may be striped or unstriped and may be yellow or green at maturity (Nee 1990). The rinds of the wild relatives are hard and lignified

(Robinson and Decker-Walters 1997), and their flesh contains compounds called cucurbitacins that render the flesh inedible unless repeatedly boiled (Nabhan 1985).

Wild plants were likely initially selected by semi-nomadic humans for their edible, nutritious seeds and use of durable rinds as containers (Small 2014). Discovery of rare, non-bitter or less-bitter Cucurbita fruits led to the eventual non-bitter Cucurbita crops we know today.

The traits that define the domestication syndrome of C. argyrosperma and C. maxima include more uniform germination, bush-habit, a reduction in size and abundance of trichomes that interfere with harvesting, an increase in the size of fruits and seeds, and a reduction in the bitter taste of the flesh (Lira Saade and Montes

Hernandez 1994). There are also striking differences in the fruit of C. maxima ssp. maxima and C. argyrosperma ssp. argyrosperma and in the economic importance, geographic extent, and ecological diversity of their cultivated forms (Table

5-5-1).

Domesticated C. maxima ssp. maxima is among the most economically important and widely cultivated Cucurbita species and is also the most morphologically

132

diverse Cucurbita crop species (Chigmura Ngewerume and Grubben 2004). Cucurbita maxima ssp. maxima was domesticated in South America ~4,000 years ago from C. maxima ssp. andreana, a species that occurs in warm, temperate areas of Argentina and Uruguay (Decker-Walters and Walters 2000) and has been recorded as far north as

Bolivia (Figure 5-1). Cucurbita maxima ssp. maxima was brought to the Old World during the Columbian exchange (Decker-Walters and Walters 2000) and is now cultivated all over the world, including in a secondary center of diversity in Asia (Zhou et al. 2016) where extensive breeding and improvement of new varieties have occurred. In contrast, C. argyrosperma ssp. argyrosperma is the only domesticated Cucurbita species that is still primarily cultivated within the area where it was domesticated

(Robinson and Decker-Walters 1997). Cucurbita argyrosperma ssp. argyrosperma was likely domesticated ~7,000 years ago (Smith 2006) from C. argyrosperma ssp. sororia, a wild taxon that today occurs mostly along the Pacific coast of Mexico in tropical regions with distinct wet and dry seasons and more rarely in semi-arid areas of northwestern Mexico (Lira Saade and Montes Hernandez 1994) (Figure 5-1).

The incredible diversity of fruit morphology in C. maxima ssp. maxima, which includes a variety that produces the largest fruit on Earth (over one metric ton and over five meters in circumference; Decker-Walters and Decker 2000), is not seen in C. argyrosperma ssp. argyrosperma. Along with perhaps C. ficifolia (figleaf gourd), C. argyrosperma ssp. argyrosperma is the least diverse domesticated Cucurbita species in terms of developed varieties and fruit morphology (Lira Saade and Montes Hernandez

1994). The fruits of C. argyrosperma ssp. argyrosperma resemble larger versions of the wild type with or without a crookneck (Lira Saade and Montes Hernandez 1994), and C.

133

argyrosperma ssp. argyrosperma is mostly cultivated for seed rather than fruit (Lira

Saade and Montes Hernandez 1994).

In this study, we performed population genomic analyses using a set of diverse wild, landrace, and improved accessions for each species and over 15,000 SNP loci evenly distributed across the Cucurbita genome to investigate the following questions:

1) Is there evidence for population subdivision within each species from which we can infer more specific geographic origins of domestication?; 2) How large is the contribution of wild species to the genome of the domesticated taxa?; and 3) What consequences have domestication and subsequent crop improvement had for the genetic variation in each domesticated pumpkin species and how does this relate to the variable morphological and ecological diversity and economic-importance of the modern crops?

Materials and Methods

Plant Material and DNA Extraction

Two sample sets were used: 1) 48 unrelated individuals of C. argyrosperma (17 wild, 15 landrace, 16 improved), and 2) 48 unrelated individuals of C. maxima (16 wild,

16 landrace, 16 improved) (Table 5-S1). Identification of cultivated material as landrace or improved can be inexact; We designated germplasm of cultivated material collected from the origin of domestication as "landrace" and material from outside of this region as "improved". We assume that forms cultivated in the origin of domestication are more representative of the initial domesticates and those cultivated outside of this area are products of modern breeding programs. We also used varietal and cultivar information for the accessions to inform these designations when it was available. Limitations of this approach are addressed in Sect. 4.3. Germplasm for all samples was obtained from the

134

collections of various institutes (United States Department of Agricultural Research

Service (USDA-ARS), Universidad Nacional de Rosario (UNR), Instituto Nacional de

Tecnología Agropecuaria (INTA)) and from private collections at Universidad Nacional

Autónoma de México (UNAM) for C. argyrosperma and UNR for C. maxima. For viable germplasm, leaf samples for DNA extraction were obtained from seedlings grown at the

University of Florida, Gainesville, FL, USA. DNA was extracted from fresh leaf tissue using a modified 2X CTAB method (Doyle & Doyle, 1987) yielding ~50-120 ng/µl of

DNA per sample. Modifications to the Doyle & Doyle (1987) protocol include a second addition of chloroform-isoamyl alcohol to the aqueous phase immediately following the first addition, and precipitation of DNA at -20ºC instead of room temperature. For 42 accessions of C. maxima, DNA was extracted directly from non-viable germplasm using a 2X CTAB protocol with the following modifications: The seed was dissected and the endosperm and embryonic tissue removed from the seed coat using a sterile razor blade and metal spatula. DNA was extracted from the endosperm and embryo following the same method described above but with an additional phenol-phenol purification following the first additional of chloroform-isoamyl alcohol to allow for stronger dissolution and separation from aqueous DNA.

Targeted Enrichment and DNA Sequencing

Genomic library building, probe design, and targeted enrichment were performed by Rapid Genomics LLC (Gainesville, FL, USA). Between 250 ng and 1 μg of genomic

DNA of each sample was fragmented to an average size of 400 bp. DNA libraries were constructed by end-repairing the sheared DNA, A-tailing and adapter ligation, bar- coding, and PCR amplification. Targeted enrichment of Illumina libraries using biotinylated RNA baits (Gnirke et al. 2009) was used to reduce genomic complexity prior

135

to sequencing to increase the number of samples that could be multiplexed on a single sequencing lane. A custom RNA probe kit was developed and synthesized that included ten thousand 150-mer probes targeting 9,175 genomic sequences including 7,922 previously identified single nucleotide polymorphisms (SNPs) and 1,253 putatively single-copy genes.

Single nucleotide polymorphism loci targets were based on SNPs in C. maxima ssp. maxima identified by Zhang et al. (2015) with 500 bp flanking sequences mined from the C. maxima ssp. maxima genome sequenced and assembled by The

International C. maxima Genome Initiative (http://www.icugi.org/cgi- bin/ICuGI/genome/maxima.cgi, unpublished). Single-copy genes were identified using a custom all-by-all BLAST plus single-linkage-clustering pipeline described in Kates et al.

(2017) where the BLAST database included four C. argyrosperma transcriptomes and three C. maxima transcriptomes sequenced and assembled by the COMAV Cucurbits

Breeding Group and Bioinformatics at Universidad Politécnica de Valencia

(unpublished).

Putatively single-copy genes ranged in size from 350-900 bp. Whole plastome sequences of C. maxima and C. argyrosperma obtained from GenBank were used to identify and remove potential probes that would hybridize with high-copy plastid genes.

We designed 7,922 probes to target SNPs, and probes were centered on the SNP region. A total of 2,081 probes was designed to capture the putatively single copy genes. A single probe was used to target exons that were less than 350 bp, two probes for those that were more than 350-500 bp, and three probes for exons >500. The probes were placed to capture both intron and exon sequence. The probes were

136

hybridized to the libraries and enriched for the targets specified. Samples were then pooled equimolar, and 61,778,473 reads were generated on an Illumina HiSeq 3000

PE100 (Illumina, San Diego, CA, USA).

Read Filtering, Mapping, and SNP Calling

Sequencing reads were split by barcode, filtered, and trimmed by quality using the fastx toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). Filtered reads were aligned to the C. maxima ssp. maxima reference genome (icugi.org, unpublished) using MOSAIK

2.3.2 (Lee et al. 2014) with a mismatch threshold (option -mmp) of 0.05. SNPs were identified in the nuclear genome using FREEBAYES 0.9.15 (Garrison and Marth 2012).

Sites with less than 8x coverage (--min-coverage 8) and alleles with a base quality of less than 20 (--min-base-quality 20) were excluded from the analysis, and indels and multi-nucleotide polymorphisms were ignored (--no-indels, --no-mnps). SNPs were quality-filtered using vcftools (Daneck et al. 2011) to include only biallelic sites with quality values greater than 10 and fewer than 50 missing genotypes, mean depth value between 3 and 750, a minor allele frequency greater than or equal to 0.01, and minor allele count greater than or equal to 1. All analyses described below were performed for

C. maxima and C. argyrosperma independently using sets of within-species SNPs parsed from the full dataset using vcftools (Daneck et al. 2011).

Population Structure and Genetic Diversity

Population structure within each species was estimated using STRUCTURE

2.3.4 (Pritchard et al. 2000). We also separately estimated population structure within domesticated C. maxima. For all STRUCTURE analyses, 10 independent runs with a burn-in length of 50,000 and a run length of 100,000 were performed for each K value from 1 to 10. A priori population information (i.e. wild, landrace, improved) was not used.

137

The most likely K value was determined based on the Evanno method (Evanno et al.

2005). STRUCTURE results were visualized using the R package Pophelper 2.1.0

(Francis 2016). Principal component analysis (PCA) and FST analysis were performed using the R package SNPRelate (Zheng et al. 2012). To assess variation among different sample sets, FST was calculated using two population definitions: 1) two structure-defined clusters (K=2 differentiated wild from domesticated (landrace and improved) samples using a proportion of inferred ancestry (q) cutoff of 0.7 (70%) for C. maxima and C. argyrosperma except for one wild C. maxima sample that did not belong to the wild cluster based on inferred ancestry coefficient and was excluded from subsequent analyses), and 2) three a priori population designations: wild, landrace, and improved. For all PCA and FST analyses, linkage disequilibrium (LD) based pruning was performed with an LD threshold of 0.20 that resulted in a set of 1,926 SNPs for C. maxima and 2,024 SNPs for C. argyrosperma.

Phylogenetic trees were built using SNPhylo (Lee et al. 2014) with 100 bootstrap replicates and rooted with four outgroup accessions (i.e. four C. argyrosperma accessions were included in the phylogenetic analysis of C. maxima accessions and four C. maxima accessions were included in the phylogenetic analysis of C. argyrosperma accessions). Although population genetic datasets comprising SNP data for closely related individuals violate assumptions of the evolutionary models underlying phylogenetic analysis (e.g. that implemented in DNAML (Felsenstein 1981) used in

SNPhylo), phylogenetic trees can provide additional information about sample groupings along with population genetic analyses when some characteristics of the data are accounted for. SNPhylo extracts representative SNPs from the original dataset to

138

reduce SNP bias due to high levels of LD (Lee et al. 2014); this results in a set of aligned sites (SNPs) less in violation of the model's assumption that each site evolved independently (Felsenstein 1981). For our phylogeny reconstructions, we used an LD threshold of 0.40 for C. maxima and for C. argyrosperma, resulting in a phylogenetic dataset of 4,285 SNPs for C. maxima and 2,988 SNPs for C. argyrosperma.

Expected heterozygosity (HE) and observed heterozygosity (HO) were calculated with the 4P software (Benazzo et al. 2015) as locus-by-locus and population mean estimates. Watterson's estimator of nucleotide diversity θW (Watterson 1975) and nucleotide diversity π (Nei and Li 1979) was calculated by gene across each using the Popgenome package for R (Pfeifer et al. 2014). Chromosome positions correspond to the C. maxima ssp. maxima loci identified by Zhang et al.

(2015) used to target the SNP loci (described previously). A GFF file created using the chromosome position information in the BED file that accompanied the SNP data was used to map chromosome wide positions for each SNP in Popgenome. For C. maxima, populations were defined as the two STRUCTURE-defined clusters that corresponded to improved and wild accessions. Landrace samples were excluded from analyses rather than included in the improved population to avoid potential issues with uneven sampling across populations. For C. argyrosperma, populations were defined as the three a priori designations wild, landrace, or improved as these were supported by

STRUCTURE analysis and PCA.

Results

SNP Data

An initial set of 67,362 inter-species SNPs was identified from a total of 61.8 million merged paired-end reads for 96 samples mapped to the C. maxima ssp. maxima

139

genome. After filtering SNPs and separating by species, a set of 15,236 SNPs for C. argyrosperma and 17,235 SNPs for C. maxima were used for the analyses.

Population Structure and Pairwise Population Differentiation

The Evanno method (Evanno et al. 2015) suggested two clusters (K=2) as optimal for both C. argyrosperma and C. maxima. For K=2 in C. argyrosperma, the wild and improved samples are differentiated as distinct clusters (ancestry coefficient > 0.7), and the landrace samples are a mixture of individuals from both ancestral populations

(Figure 5-2a). For K=2 in C. maxima, the wild and improved samples are clearly differentiated, but in contrast to the structuring of C. argyrosperma, the landrace samples are not admixed and are instead all from the same ancestral population as the improved samples (Figure 5-2b). We observed less admixture in landrace and in improved samples of C. maxima than for landrace and improved samples of C. argyrosperma. Because there are limitations to considering clustering results for only one value of K (Meirmans 2015), we present the clustering results for K=3 and K=4 as well as K=2 as all three sets of clustering results warrant a biological explanation. As values of K change progressively from 2 to 4, subpopulation structure appeared in the domesticated C. maxima accessions; appearance of subpopulation structure as K increased from K=2 was less pronounced for wild accessions and domesticated C. argyrosperma.

Mapping of admixed wild samples onto their sampling locations can provide information about likely origins of domestication by highlighting geographic regions where admixed wild samples that contain a relatively high proportion of alleles common in the domesticated samples occur. In C. argyrosperma, samples with a relatively high proportion of alleles common in the domesticated samples occur mostly in western and

140

central Mexico and not in the easternmost or southernmost part of the distribution of the wild taxon (Figure 5-2c). In C. maxima, samples with a relatively high proportion of alleles common in the domesticated samples occur mostly in the northernmost part of the distribution of the wild taxon (Figure 5-2d).

For both C. maxima and C. argyrosperma, the results of PCA were congruent with the clusters differentiated by STRUCTURE (Figure 5-3a and 3b). For both species, pairwise FST was higher in the comparison of wild samples and domesticated samples

(both landrace and improved) than in analyses of the two domesticated subclasses

(landrace and improved) compared to each other or to the wild samples. Pairwise FST between wild and domesticated samples was nearly equal in C. maxima and C. argyrosperma. In C. argyrosperma, pairwise FST between the two STRUCTURE defined populations was 0.30. Pairwise FST was close to equal for wild v. landrace and landrace v. improved (0.23 and 0.22, respectively) and was 0.32 for wild v. domesticated

(landrace and improved). In C. maxima, pairwise FST between the two STRUCTURE- defined populations was 0.33. FST for wild vs. domesticated was equal to wild vs. landrace (0.30), and FST between landrace and improved was very low (0.07).

Phylogenetic Analysis

Cucurbita argyrosperma- The wild and domesticated samples form separate clades (BS 67% and 88%, respectively) (Figure 5-4a). Domesticated samples occur in two subclades: clade I (BS 70%) is composed entirely of landrace samples, and clade II

(BS 70%) includes landrace and improved samples. There is not a clear geographic pattern to the resolution of landrace samples in two separate clades, but generally clade

I includes landrace samples from the southeastern part of the sampling range and clade

II landrace samples are more broadly distributed (Figure 5-4b). One sample designated

141

as "wild" (WC09) was resolved as sister to the rest of the domesticated samples.

However, this sample was originally submitted to the USDA Plant Genetic Resources

Conservation Unit as "Cucurbita cf. palmeri" and was later re-named as C. argyrosperma ssp. sororia. Cucurbita cf. palmeri is thought to be a feral escape from cultivation and is very difficult to distinguish from the wild subspecies and was thus excluded from this study's sampling. Based on our phylogenetic results, this sample is likely a feral escape from cultivation rather than a true wild sample.

Cucurbita maxima- The wild and domesticated samples each form well-defined clades (BS 100%) (Figure 5-4c). The domesticated clade includes the wild sample

(WF04) that was also assigned to the domesticated genetic cluster by STRUCTURE based on its ancestry coefficient and was subsequently removed from genetic diversity analyses. One improved sample (WF06) "Nan kwa" is sister to the rest of the domesticated samples and is the only cultivar included in this study that is likely of

Chinese origin.

Genetic Diversity

The mean values of expected heterozygosity (HE) and observed heterozygosity

(HO) in both species are summarized in Table 5-2. Observed heterozygosity was more than three times higher in wild C. argyrosperma than in wild C. maxima. Both wild and improved C. maxima exhibited much higher expected heterozygosity than observed heterozygosity. In C. argyrosperma mean HE and HO were 0.201 and 0.143 in the wild samples, 0.148 and 0.104 in the landrace samples, and 0.084 and 0.060 in the improved samples. In C. maxima, HE and HO were 0.212 and 0.041 in the wild samples and 0.270 and 0.084 in the improved samples, respectively.

142

The by-locus values of within-population nucleotide diversity (θw and π) across chromosomes are illustrated in Figures 5-5 and 5-6, and the mean θw and π for each population is summarized in Table 5-2. Mean θw was similar for wild C. argyrosperma and C. maxima, but highest in improved C. maxima. A similar pattern was observed for

-3 -4 mean π. In C. argyrosperma, mean θw was 1.10x10 in the wild samples, 7.77x10 in the landrace samples, and 4.65x10-4 in the improved samples. Mean π was 1.52x10-3 in the wild samples, 1.24x10-3 in the landrace samples, and 7.41x10-4 in the improved

-3 -3 samples. In C. maxima, mean θw was 1.30x10 in the wild samples and 1.65x10 in the improved samples, and mean π was 1.45x10-3 in the wild samples and 1.66x10-3 in the improved samples. Although the overall trend across loci is lower genetic diversity in improved C. argyrosperma (Figures 5-5a and 5-6a) and wild C. maxima (Figures 5-5b and 5-6b), there are multiple loci for each species that exhibit the reverse (i.e., a locus with higher genetic diversity for improved C. argyrosperma than for wild C. argyrosperma) (Figures 5-5 and 5-6).

Discussion

Origins of Domesticated Buttercup Squash and Cushaw

Our study provides the first molecular investigation of the geographic location and number of domestication origins of buttercup squash and of cushaw.

Cucurbita maxima- Both STRUCTURE and phylogenetic results strongly support a single origin of domesticated buttercup squash and reveal geographic structuring of the wild ancestor C. maxima ssp. andreana. Until recently, hypotheses on the geographic origin of buttercup squash were limited to southern South America (Nee

1990) because C. maxima ssp. andreana was only known from this region (Lema

2009). However, collections of C. maxima ssp. andreana in eastern Peru and Bolivia

143

(Andres and Nee 2005; Valega Rosas et al. 2004) have expanded the described range of C. maxima ssp. andreana. Although we could not obtain accessions of wild C. maxima ssp. andreana from Peru or Bolivia, our results do support a geographic origin of domesticated buttercup squash that is more northern than previously thought.

Populations from the northern part of the wild ancestor's range in Argentina have a higher proportion of alleles assigned to the domesticated subspecies by STRUCTURE than the southern populations. Further study that includes accessions from north of

Argentina may find northern populations are more representative of the ancestral population(s) of domesticated buttercup squash.

Cucurbita argyrosperma- Both STRUCTURE and phylogenetic results strongly support a single origin of domesticated cushaw, and STRUCTURE results reveal geographic structuring of wild cushaw, C. argyrosperma ssp. sororia. Although archaeological remains suggest that southern Mexico is the origin of initial domestication for cushaw, STRUCTURE results suggest that populations of C. argyrosperma ssp. sororia from the southern part of its range share fewer alleles with domesticated cushaw than populations from the Pacific coast and central Mexico. The cluster of populations of wild C. argyrosperma ssp. sororia on the Pacific coast that have the highest proportion of the "domesticated genotype" as defined by STRUCTURE may represent a more specific geographic origin for the initial domestication of cushaw in Mexico.

Origin of Gigantism in Buttercup Squash

Most varieties of C. maxima ssp. maxima do not bear "giant" fruit (Goldman

2004). Researchers are beginning to understand the mechanisms that enable the growth of buttercup squash (see Savage et al. 2015). The relative roles that genetics

144

and environment play in this trait are not fully understood. Selective breeding for larger fruit in certain C. maxima varieties has led to a suite of changes in the carbon transport system from source to sink (Savage et al. 2015). The relationships among varieties of

C. maxima ssp. maxima may help determine why this phenotype has been developed only in certain varieties of one species of pumpkin. Although our study included only two accessions of cultivars that bear very large fruit (over 1 m circumference at the widest point), our phylogenetic results resolve these two accessions (WF07: big moon, segregating fruit shape ; WF10: Denver white, Hubbard cultivar group) in separate subclades of the domesticated clade. This suggests that the trait has arisen independently in genetically divergent varieties during the development of new cultivars.

Domestication Bottlenecks in Buttercup Squash and Cushaw: Considering Breeding System and Gene Flow

Our results reveal two contrasts in the domestication footprints of the independently domesticated C. maxima and C. argyrosperma: 1) We found a clear differentiation between landrace and improved types of cushaw, but no substructure of domesticated buttercup squash; and 2) We recovered a reduction of genetic diversity in domesticated cushaw compared with its wild ancestor, consistent with a domestication bottleneck, but did not find a reduction of genetic diversity in domesticated buttercup squash. Below, we present possible explanations for and implications of these findings independently for each species. We then consider why the footprints of domestication may differ between the two species.

Cucurbita maxima- Our phylogenetic and STRUCTURE results do not show differentiation between landrace and improved accessions of domesticated C. maxima ssp. maxima. Our designations of accessions as landrace or improved may be inexact.

145

However, we expect differentiation between the two groups as defined because

"landrace" accessions are from the origin of domestication in Argentina and Uruguay and the "improved" accessions are from secondary centers of diversity in the United

States and China. FST between these two groups was also very low. Landraces and improved cultivars contained a similar number of SNPs and similar genotypes, indicating that much of the diversity that survived through the initial stages of domestication was retained in current-day cultivars and breeding lines. This result is surprising because although C. maxima ssp. maxima and other Cucurbita crops are naturally outcrossing, breeding F1 hybrid squash through self-pollination has been the industry standard for developing new varieties of C. maxima ssp. maxima since the

1960s (Whitaker and Robinson 1986; Della Vecchia et. al. 1993; Robinson and Decker-

Walters 1997) due to the unusual ability of Cucurbita to withstand inbreeding (Allard

1971; Whitaker and Robinson 1986; Robinson 1999). This breeding strategy was a distinct change from the outcrossing previously common in Cucurbita breeding.

Prior to 1960, Cucurbita cultivars were characterized by high genetic variability in part due to their tendency to outcross (Bisognin 2002). During the past 60 years, inbred lines have been used to develop more uniform hybrids than previous open-pollinated cultivars (Paris 1989). Although some breeders have made specific efforts to create new varieties of C. maxima ssp. maxima that maintain maximum heterozygosity (see:

Dallman and Dallman 2009), this practice is not common enough to greatly affect our results. A specific shift to modern breeding techniques often yields modern lines that can be differentiated from ancient landraces (e.g. carrot: Iorizzo et al. 2013), the latter of

146

which are expected to be more representative of the initial domesticate and older breeding practices (Zeven 1998).

More unexpected than the lack of differentiation between landrace and modern

C. maxima ssp. maxima is the lack of reduced genetic diversity in domesticated C. maxima ssp. maxima as a whole. Although our STRUCTURE and phylogenetic results clearly separate domesticated and wild C. maxima, we found that the domesticated accessions had a comparable genome-wide level of genetic diversity to the wild group.

Possible explanations for this include: 1) The initial domestication of C. maxima involved a broad genetic base of wild plants and ongoing gene flow between wild and domesticated populations; 2) Breeding of C. maxima ssp. maxima seeks to maintain high genetic diversity in cultivars and has resulted in domesticated C. maxima with high genetic diversity, including novel genetic variation; and 3) The wild populations included in this study do not capture the whole genetic diversity of the ancestral population(s), either because they include feral escapes from cultivation rather than truly wild populations or represent a subset of the genetic diversity of the wild ancestor of C. maxima ssp. maxima.

Multiple lines of evidence support a broad initial domestication of C. maxima and ongoing gene flow between wild and cultivated populations. Wild, intermediate, and domesticated C. maxima morphotypes were identified among archaeological remains of

C. maxima from the Pampa Grande archaeological site (1720 ± 50 bp) (Lema 2011).

The domesticated types were morphologically diverse and included both thin-rind types adapted for use as food and others with thick, lignified rinds suited for use as containers

(Lema 2011). "Intermediate" remains likely resulted from hybridization between wild C.

147

maxima ssp. andreana and domesticated C. maxima ssp. maxima (Lema 2011). Early

C. maxima cultivation practices probably allowed crosses between sympatric wild and domesticated populations (Lema 2011) to introduce novel traits, and this practice is still common in modern rural (Altieri and Merrick 1987).

A domesticated species may retain or even increase morphological diversity relative to its wild ancestor (e.g. common bean (Phaseolus): Singh et al. 1991) due to ongoing gene flow between wild and domesticated plants. This contrasts the domestication process as commonly described based on maize, wheat, or rice, because the domestications of these crops involved a radical shift in morphology (e.g. loss of shattering) and/or polyploidization that resulted in a loss of sexual compatibility with wild progenitors (Haudry et al. 2007; Hufford et al. 2012; Li et al. 2006). The most radical phenotypic shift in the domestication of Cucurbita was the loss of bitter cucurbitacins

(described above), but unlike the loss of hard kernel covering in maize, cucurbitacins in intermediate types of C. maxima do not preclude human-use. Cucurbita fruits were likely first selected for the consumption of seeds that do not contain cucurbitacins present in the fruit flesh (Small 2014). Repeated boiling can remove cucurbitacins from the fruit's flesh, and cucurbitacins have important uses (e.g. medicinal, use as soap)

(Nabhan 1985).

Cucurbita maxima ssp. maxima is one of the most phenotypically diverse domesticated Cucurbita species (Lira Saade and Montes Hernandez 1994). There are over 52 C. maxima cultivars with wide variation in morphological traits (e.g. flesh and fruit color, shape, rind texture, fruit size and lobing) and agronomic traits (e.g. annual cycle, yield, and adaptive plasticity) (Lema 2009). Although breeding strategies rarely

148

seek to maintain high heterozygosity in the development of new cultivars (Dallman and

Dallman 2009), we observed high diversity among cultivars that likely underlies the high phenotypic diversity of C. maxima ssp. maxima. A previous characterization of the genetic diversity of a set of C. maxima ssp. maxima cultivars by Ferriol et. al. (2003) grouped different C. maxima ssp. maxima accessions according to type of use, suggesting high genetic diversity among cultivars. We found that HO was much lower than HE in domesticated C. maxima. Observed heterozygosity that is lower than HE means that there were fewer heterozygotes in our domesticated C. maxima samples than expected based on the allelic diversity present in domesticated C. maxima. This is evidence for inbreeding in domesticated C. maxima and helps to explain how we can observe high genetic diversity in domesticated C. maxima despite low heterozygosity that is expected due to breeding practices that favor homozygosity. The diversity in morphological and agronomic traits among different C. maxima cultivars suggests the cultivars are distinct inbred lines and is consistent with this result.

Prior to 2004, the distribution of wild C. maxima ssp. andreana was thought to occur only in Argentina (Lema 2009). Because C. maxima ssp. andreana is the hypothesized progenitor of C. maxima ssp. maxima, Argentina is often cited as the site of domestication of this species, and the wild populations that occur there are expected to include the ancestral genotype. However, recent collections of C. maxima ssp. andreana populations in eastern Peru and Bolivia (Andres and Nee 2005; Valega Rosas et al. 2004) reveal that C. maxima ssp. andreana has a broader distribution. We were not able to obtain accessions of wild C. maxima ssp. andreana from Peru or Bolivia. If the center of diversity of wild C. maxima ssp. andreana is the northern part of its

149

currently described range and C. maxima ssp. maxima was initially domesticated in this region, the wild populations included in our study would not represent the ancestral diversity of domesticated C. maxima ssp. maxima. Instead, our wild populations could represent a subset of the genetic base of C. maxima ssp. andreana at the time of domestication. Domesticated accessions derived from more genetically diverse populations than those included in our study would not necessarily exhibit reduced genetic diversity relative to the wild accessions in our study.

A second reason why the wild accessions included in this study would not represent the genetic diversity present in the ancestor of C. maxima ssp. maxima is if C. maxima ssp. andreana is a feral escape from cultivation rather than a true wild species.

Although literature prior to the 1980s refers to C. maxima ssp. andreana as the “feral” or

“weedy” relative of buttercup squash because of its occurrence in human-disturbed habitats (e.g. Whitaker and Bemis 1964; Whitaker 1951), the range of wild C. maxima ssp. andreana is now regarded as too broad to be achieved by a feral escape (Brücher

2012). Cucurbita maxima ssp. andreana is also not truly weedy; it does not occur in agricultural fields, and its seeds are poorly dispersed. Seedlings only emerge on the site where fruits from the previous season naturally dry and break apart (Lopez Anido, pers. comm.). It is also unlikely that a feral escape of domesticated C. maxima ssp. maxima would have reverted to the ancestral trait of high content, a trait that is absent in all domesticated Cucurbita (Nee 1990) but present in wild species including C. maxima ssp. andreana.

Cucurbita argyrosperma- Our STRUCTURE results show that landrace accessions are mostly admixed between wild and domesticated genotypes at K=2 and

150

that they form a separate population at K=4. The phylogenetic analysis resolved domesticated accessions into two clades, one of which includes only landrace accessions (although some landrace accessions are resolved in the second clade). We also observed lower genetic diversity in the landrace samples than in the wild samples, consistent with a loss of genetic diversity following initial domestication and a subsequent loss of genetic diversity or secondary bottleneck following modern improvement. This pattern has been observed in many other crops (e.g. maize: Hufford et al. 2012; soybean: Wen et al. 2015; cotton: Fan et al. 2017) and occurs due to a founder effect when a small subset of the initial domesticate is introduced to new areas and/or modern breeding practices more severely reduce heterozygosity (Zeven 1998).

A strong secondary bottleneck in C. argyrosperma is surprising, because C. argyrosperma ssp. argyropserma is still mostly cultivated within its site of domestication

(Lira Saade and Hernandez 1995), and there has not been economic incentive to breed modern uniform inbred C. argyrosperma ssp. argyrosperma cultivars.

We also observed clear differentiation between wild and domesticated C. argyrosperma and an overall reduction in genome-wide diversity consistent with a domestication bottleneck. A previous study by Montes-Hernandez and Eguiarte (2002) of wild and domesticated C. argyrosperma based on a small set of allozyme loci found low differentiation between wild and domesticated populations, leading the authors to conclude that there were high levels of gene flow between wild and domesticated C. argyrosperma. The difference in our results is likely due to broader geographic and genetic sampling. Selecting accessions to represent modern lines of C. argyrosperma may be imprecise because it is not clear that any truly "modern" cultivars have been

151

developed for this species. To include samples most representative of a modern type, our improved sample set was biased toward C. argyrosperma ssp. argyrosperma var. callicarpa. Of the three cultivated varieties of C. argyrosperma, var. callicarpa is considered the most recently developed and most specialized (Lira Saade and Montes-

Hernandez 1995) and is the variety most often grown outside of the geographic origin of domestication. It is therefore unsurprising that a set of C. argyrosperma ssp. argyrosperma var. callicarpa accessions would have lower genetic diversity and greater differentiation from wild C. argyrosperma than cultivated samples included in other studies.

Contrasting the Genome-Wide Signatures of Selection and Domestication Histories of Buttercup Squash and Cushaw

Each of the contrasting domestication histories proposed above based on patterns observed in our data has been described in other crops. Some domesticated plants exhibit decreased genetic diversity consistent with a domestication bottleneck

(e.g. maize: Hufford et al. 2012; African rice: Li et al. 2011; wheat: Haudry et al. 2007), and others do not (e.g. apple: Cornille et al. 2012; carrot: Iorizzo et al. 2013). Likewise, modern-improved cultivars may represent a subset of the genetic diversity present in landraces (e.g. peach: Cao et al. 2014), but this is not always the case (e.g. pigeonpea:

Kassa et al. 2012). The levels of genetic diversity and population differentiation we observed may be affected by insufficient or biased genetic or accession sampling.

Biological reasons for the differences in signatures of selection among crops include: 1)

Outcrossing vs. inbred breeding strategies; 2) annual vs. perennial growth; 3) time since domestication; and 4) domestication traits. An outcrossing, perennial crop domesticated thousands of years ago without a radical divergence from the wild phenotype is

152

expected to bear a less pronounced signature of selection than a more recently domesticated inbred, annual crop with uniform phenotype, especially a phenotype that reproductively isolates it from its wild progenitor. This comparison explains why domesticated carrot does not have lower genetic diversity than its wild progenitor and maize does; less clear is why two independently domesticated pumpkins with similar breeding systems, domestication syndromes, and a relatively similar time of domestication exhibit very different signatures of selection.

Most of the explanations for the lack of reduced genetic diversity in landrace and improved C. maxima discussed above in "Domestication bottlenecks in buttercup squash and cushaw: considering breeding system and gene flow" also apply to C. argyrosperma. The differences between the two domesticates are their geographic origins of domestication and wild ancestors, the greater morphological diversity of C. maxima cultivars compared with C. argyrosperma cultivars, the greater geographic extent to which C. maxima ssp. maxima is cultivated, and the higher demand for morphologically-uniform fruit production in C. maxima. Wild C. argyrosperma and wild

C. maxima included in this study had very similar levels of mean genetic diversity as measured by nucleotide diversity (θW and π) and expected heterozygosity (HE), but observed heterozygosity (HO) was much lower in wild C. maxima ssp. andreana than in wild C. argyrosperma ssp. sororia. Low heterozygosity in a wild progenitor may have limited the reduction in genetic diversity that accompanied domestication if crosses between divergent wild individuals that might not have occurred without human intervention produced novel allelic diversity in the domesticate.

153

A second and more straight forward explanation for the different signatures of domestication in the two crops is that the development of broad diversity for morphology and agronomic traits in C. maxima has maintained high allelic diversity in domesticated

C. maxima. There is relatively little diversity in fruit morphology of domesticated C. argyrosperma, and it also does not tolerate a wide variety of growth conditions (Lira

Saade and Montes Hernandez 1995). Cucurbita argyrosperma shares its origin of domestication with at least one other Cucurbita species (C. pepo), so early domesticators may not have selected many different types of this species. For example, there is no C. argyrosperma cultivar specific for immature fruit consumption. In contrast,

C. maxima is the only Cucurbita species domesticated in southern South America and different types may have been selected for different human uses and growth conditions.

A third explanation for the different patterns of genetic diversity observed in C. argyrosperma and C. maxima is not related to a difference in domestication history: If C. maxima ssp. andreana from Argentina represents a subset of the genetic diversity present in the ancestral populations of C. maxima ssp. maxima as discussed above in

"Domestication bottlenecks in buttercup squash and cushaw: considering breeding system and gene flow", the relative level of genetic diversity in domesticated C. maxima compared with C. maxima ssp. andreana cannot truly be viewed as a product of the genetic diversity of C. maxima ssp. andreana.

Conclusions

This study is the first genomic investigation into the domestication of squashes and pumpkins. It also offers a rare comparison of the genetic consequences of domestication in two crops that were independently domesticated from closely related species. As such, our results provide novel insights into how multiple factors influence

154

the effect of domestication on the genetic diversity of crops. We found that two species that share many characteristics bear contrasting signatures of selection, and we identify broad morphological diversity of cultivars as a possible indicator of high genetic diversity in a crop. This is also evidence that modern breeding practices do not always reduce crop genetic diversity, and breeders may discover untapped genetic diversity underlying traits of interest in improved germplasm.

155

Table 5-1. Summary information about each domesticated subspecies Subspecies Cultivar groups Origin Current Elevation Optimum Most common (common cultivation growth uses name used in conditions this paper) C. maxima Banana squash; Silver-seed gourd; Worldwide esp. 500m to 18-27ºC; Fruit ssp. maxima Delicious squash; green-stripe cushaw; Africa and Asia 2,000m1 tolerant of low (immature, (buttercup Buttercup Calabaza pipiana (Lira temp.; mature, squash) squash; Hubbard Saade and Montes photoperiod canned, squash; Show Hernandez 1994) insensitive; decorative) pumpkins; sensitive to Turban squash; frost and Kabocha waterlogging (Decker-Walters and Walters 2000) C. South America Southern Mexico Limited. 0 to 1,800m Not tolerant of Seeds (snack argyrosperma ~4,000 years >7,000 years B.P. Mexico, low temp; likely food; oil; meal); ssp. B.P. (Smith 2006) (Smith 2006) U.S.A., Central photoperiod Fruit (usually argyrosperma America sensitive mature) (cushaw) (unconfirmed); sensitive to frost and waterlogging

156

Table 5-2. Summary of genetic diversity statistics Statistic C. argyrosperma C. maxima FST wild v. landrace 0.23 0.30 FST landrace v. improved 0.22 0.07 FST wild v. domesticated 0.32 0.30

FST STRUCTURE (K=2) pop 1 v. pop 0.30 0.33 2 HE wild; HE landrace; HE improved 0.201; 0.148; 0.212; NA; 0.270 0.084 HO wild; HO landrace; HO improved 0.143; 0.104; 0.041; NA; 0.084 0.060 -3 -3 θw wild; θw landrace; 1.10x10 ; 1.30x10 ; NA; -4 -3 θw improved 7.77x10 ; 1.65x10 4.65x10-4 π wild; π landrace; 1.52x10-3; 1.45x10-3; NA; π improved 1.24x10-3; 1.66x10-3 7.41x10-4

157

Figure 5-1. Map of putative range of wild C. maxima and C. argyrosperma with sampling localities (if known) for landrace and wild samples included in this study. For Cucurbita argyrosperma ssp. sororia (blue), distribution is based on 620 occurrence points from GBIF (gbif.org). For C. maxima ssp. andreana (orange), distribution is based on 58 occurrence points from GBIF (gbif.org), specieslink (splink.cria.org.br/), discover life (discoverlife.org), and conabio (conabio.gob.mx). Landrace samples are indicated by triangles and wild samples are indicated by solid circles.

158

Figure 5-2. Bayesian clustering (STRUCTURE) results for C. maxima and C. argyrosperma and ancestry coefficients mapped onto sampling localities. (a- b) Bayesian clustering (STRUCTURE, K = 2 to 4) of (a) C. argyrosperma accessions and (b) C. maxima accessions. (c-d) Ancestry coefficients mapped onto sampling localities for wild accessions of (c) C. argyrosperma and (d) C. maxima. Colors correspond to ancestry-coefficient colors in Figure 2 a-b

159

Figure 5-3. Principal component analysis of improved (blue), landrace (red), and wild (orange) (a) C. argyrosperma accessions and (b) C. maxima accessions.

160

Figure 5-4. Results of phylogenetic analysis of (a) C. argyrosperma accessions and (b) C. maxima accessions. Branches are colored by improvement status. Two clades of domesticated C. argyrosperma are indicated by Dom. Clade I and Dom. Clade II (c) Map of sampling localities of C. argyrosperma landraces with samples' membership in Dom. Clade I or Dom. Clade II indicated by * or O.

161

Figure 5-5. By-locus values of within-population nucleotide diversity (θw) across chromosomes 1-20 in (a) C. argyrosperma and (b) C. maxima for improved (blue) and wild (orange) "populations".

162

Figure 5-6. By-locus values of within-population nucleotide diversity (π) across chromosomes 1-20 in (a) C. argyrosperma and (b) C. maxima for improved (blue) and wild (orange) "populations".

163

CHAPTER 6 GENERAL CONCLUSIONS

Long standing questions in evolutionary biology can be addressed with studies of domestication, and the processes of domestication and origins of domesticated plants are best understood in an evolutionary context. Nevertheless, the disciplines of evolutionary biology and crop science rarely overlap. Studies of crops that do include crops' wild relatives often focus on only the most closely related wild ancestor of a crop, although we have much to learn from more distantly related wild relatives. The use of these more distant relatives for crop improvement is also increasingly plausible due to innovations in genetic engineering. Because of its multiple independent domestications,

Cucurbita presents unique opportunities for better understanding domestication as an evolutionary process. There are also many unanswered questions about the domestication histories and evolution of agronomic traits of the domesticated Cucurbita species.

Throughout my dissertation work I sought to reconstruct the evolutionary and domestication histories in Cucurbita using multiple disciplines and novel methods.

Chapter 2 provided the context for studying the domestication of pumpkins and squashes by presenting the current understanding of the origins and uses of the domesticated pumpkin and squash, and the natural history of Cucurbita CWR. Chapter

3 used novel phylogenetic methods to reconstruct the evolutionary history of all wild and domesticated Cucurbita taxa and provided the evolutionary context necessary for studying domestication processes in pumpkins and squashes. Chapter 4 investigated an important pattern observed in the phylogenetic results in Chapter 3 by performing phylogenetic analyses of Artocarpus sequence datasets assembled in three different

164

ways. Chapter 5 used a population genomics approach to understand the domestication histories of two crop-wild species pairs identified in Chapter 2. Below I summarize the major findings from my dissertation and conclude with my ongoing work.

Through my review of breeding and cultivation of pumpkins and squashes along with the natural history of wild Cucurbita species in Chapter 2, I conclude that utilizing

Cucurbita CWR genetic resources for crop improvement could resolve major challenges to cultivation. Tertiary CWR exhibit the greatest drought tolerance and disease resistance of all wild Cucurbita species, and further study of the biology and genetics of these wild species could uncover the mechanisms underlying these vital traits. Despite the value of Cucurbita CWR, all wild species are understudied; tertiary CWR must be studied to fully evaluate the disease resistance of these species. To enable broader studies of Cucurbita CWR, improved ex situ and in situ conservation efforts are needed.

Currently there is no in situ conservation of Cucurbita CWR even though one secondary

CWR is federally endangered in the United States. More extensive ex situ conservation of Cucurbita CWR in seed banks is needed to ensure that the genetic resources present in these wild species remain available even if the geographic distributions of these species continue to shrink due to human population growth and climate change. The evolutionary relationships among Cucurbita CWR and crop taxa resolved in Chapter 3 provide additional information about which wild species are of greatest interest for

Cucurbita crop improvement.

The phylogeny presented in Chapter 3 includes novel relationships among

Cucurbita species and well supported within-species resolution. The phylogeny does not resolve the taxon previously proposed as the wild ancestor of pepo squash (e.g.

165

jack-o-lantern) as sister to domesticated pepo squash, and I conclude that the ancestor of pepo squash is either extinct or undiscovered in the wild. Based on the phylogeny, a second Cucurbita crop taxon with no known ancestor, butternut squash, was likely domesticated in North America. Based on the novel relationship between wild and domesticated C. pepo (which includes the most familiar Cucurbita crop taxa, pepo and ovifera pumpkins) and the secondary CWR C. lundelliana and C. okeechobeensis was resolved, I conclude that the genetic basis of disease resistance in these species should be studied further. Well supported resolution within species in the phylogeny enabled the development of hypotheses on the domestication history of crop taxa. For example, because wild and domesticated accessions of C. maxima formed separate clades and wild and domesticated accessions of C. argyrosperma did not, I hypothesize that C. maxima experienced a narrower domestication bottleneck; in Chapter 5 I tested this hypothesis. Ancestral character state reconstruction performed using the phylogeny suggests that drought tolerance, present in tertiary CWR as discussed in Chapter 2, is an ancestral trait. Studies of the genetic differences between drought tolerant and non- drought tolerant Cucurbita could identify the mechanisms that confer drought tolerance in Cucurbita CWR.

An unexpected result in Chapter 3 was that higher sequence heterozygosity was associated with poorer phylogenetic resolution. I hypothesized that ambiguity codes used for heterozygous positions in the genes were not handled well by phylogenetic analyses, and posited that using phased allele sequences instead of a single consensus sequence for a heterozygote would produce more resolved and accurate phylogenetic trees. I tested this hypothesis in Chapter 4.

166

In Chapter 4, I tested how the method used to assembly heterozygous sequences affected phylogenetic resolution and accuracy and compared the effects among three different methods of data assembly: encoding heterozygous positions as ambiguous, resolving heterozygous sequences as a consensus sequence, and assembling two phased allele sequences. I found that highly heterozygous sequences did not contribute disproportionately to poor gene tree resolution for any of the methods of assembly used. I also found that using allele sequences did not improve gene tree congruence or gene tree-species tree congruence. No assembled dataset clearly outperformed the others in topological consistency across multiple phylogenetic analyses or in statistical support for the topologies it resolved. I concluded that the method used to assemble heterozygous sequences did not clearly affect phylogenetic results, but I cautioned that characteristics of different datasets may limit the application of this finding to other phylogenetic studies.

In Chapter 5, I present a study of the genetic structure and domestication bottlenecks in two pumpkin species, buttercup squash (Cucurbita maxima) and cushaw

(C. argyrosperma). These are the only two Cucurbita crop taxa with wild ancestors clearly supported by the phylogeny resolved in Chapter 3. Based on population genetic analysis of allelic diversity of SNP data, I hypothesized a single domestication event in each species and suggest that the initial domestication of buttercup squash may be from further north than previously hypothesized. In Chapter 3, I hypothesized a broader domestication bottleneck in cushaw than in buttercup squash based on the within- species phylogenetic resolution of these two species. The results of Chapter 5 did not support this hypothesis; surprisingly, buttercup squash does not bear the genome-wide

167

signature of a domestication bottleneck based on our data and analyses. I hypothesize that high levels of nucleotide diversity in domesticated buttercup squash can be explained by a combination of characteristics of its domestication and breeding history, including the development of many divergent inbred lines needed to meet the demand for phenotypically diverse squash in South America. This result suggests that genetic diversity of modern crops may be conserved by maintaining multiple varieties, and that crops domesticated in areas that are not centers of domestication for other similar crops may maintain higher levels of genetic diversity throughout the domestication and improvement processes.

Taken together, the results of this dissertation work develop Cucurbita as a system in which to study domestication processes. These results also provide necessary information to improve the use of wild Cucurbita taxa for pumpkin and squash crop improvement. Comparisons of SNP data for wild and domesticated samples used in Chapter 5 will be used to identify genes under selection that may underlie agronomic traits. For C. argyrosperma, comparisons between wild and landrace diversity will also be made to identify genomic regions that may have been involved in the initial domestication of pumpkins and squashes. These "selection scans" will be based on neutrality and diversity statistics. For example, I will identify genes as outliers with respect to values of Tajima's D by ranking genes by Tajima's D value in the domesticated samples. I will choose the top 1% with lowest Tajima's D in the domesticated samples and Tajima's D close to zero in the wild samples as selection candidates. In addition to identifying domestication genes in either C. argyrosperma or

C. maxima, these analyses will allow me to compare whether the same genes were

168

involved in the parallel domestications of these species. This comparison will help to address the evolutionary question: Do parallel genetic changes underlie parallel phenotypic changes in different species?

169

APPENDIX A SAMPLE INFORMATION FOR SAMPLES USED IN CHAPTERS 3 AND 5

Table A-1. Sample information for samples used in the phylogenetic study for Chapter 3. Label Accession Taxon Country Locality Inst Material Cucurbita argyrosperma Sonora: 9 miles north of S9-16 PI 318832 subps. sororia Mexico Movas USDA Fresh leaves Veracruz: Tamarindo Cucurbita argyrosperma between Tamarindo and S9-19 PI 438834 subps. sororia Mexico Ovejas USDA Fresh leaves Oaxaca: Near river at Km 40 Route 190 between Cucurbita argyrosperma Cazadero and Santo Do S9-20 PI 438835 subps. sororia Mexico mingo USDA Fresh leaves Chiapas: Canton El Cucurbita argyrosperma Aguacate Hunicipio de S9-23 PI 442358 subps. sororia Mexico Hazatan USDA Fresh leaves Cucurbita argyrosperma S9-26 PI 512209 subps. sororia Mexico Jalisco: Autlan USDA Fresh leaves Cucurbita argyrosperma Veracruz: 15km SE of S9-31 PI 512221 subps. sororia Mexico Cuitlahuac along hwy 150 USDA Fresh leaves Cucurbita argyrosperma S9-34 PI 512224 subps. sororia Mexico Tepic Nayarit USDA Fresh leaves Cucurbita argyrosperma S9-10 PI 512114 subsp. argyrosperma Nicaragua Playitas Matagalpa USDA Fresh leaves Cucurbita argyrosperma Mazatenango S9-11 PI 512115 subsp. argyrosperma Guatemala Suchitepeguez USDA Fresh leaves Cucurbita argyrosperma S9-5 PI 438547 subsp. argyrosperma Belize Corozal: Concepcion USDA Fresh leaves Chihuauhua: Vicinity of Arroyo Bakosiachi West of La Bufa on South side Cucurbita aryrosperma of Barranca de Batopilas S9-40 Grif 9456 subsp. argyrosperma Mexico municip. Batopilas USDA Fresh leaves Cucurbita aryrosperma S9-6 PI 451712 subsp. argyrosperma U.S. Arizona: Little Tucson USDA Fresh leaves

170

Table A-1. Continued Label Accession Taxon Country Locality Inst Material Cucurbita aryrosperma subsp. argyrosperma S9-38 PI 196923 var. argyrosperma Mexico Isthmus or Tehuantepec USDA Fresh leaves Cucurbita aryrosperma subsp. argyrosperma S9-49 PI 512211 var. palmeri Mexico Sinaloa: Mazatlan USDA Fresh leaves Cucurbita aryrosperma subsp. argyrosperma S9-53 Grif 9454 var. stenosperma Mexico Sonora: Mercado USDA Fresh leaves University of Arizona Herbarium herbarium specimen dried ARIZ 360821 ARIZ 360821 Cucurbita cordata Mexico Baja California (ARIZ) leaves (2002) S9-55 PI 653839 Cucurbita cordata Mexico not listed USDA Fresh leaves University of Arizona Herbarium herbarium specimen dried ARIZ 396001 ARIZ 396001 Cucurbita digitata U.S. Arizona (ARIZ) leaves (2008) S9-56 PI 240879 Cucurbita digitata Mexico not listed USDA Fresh leaves S9-58 PI 432441 Cucurbita ecuadorensis Portoveijo USDA Fresh leaves Km 36 west on road from S9-59 PI 432442 Cucurbita ecuadorensis Ecuador Guayaquil to Santa Elena USDA Fresh leaves S9-120 PI 532349 Cucurbita ficifolia Mexico Oaxaca USDA Fresh leaves Tradewin ds rare and heirloom TW-2456 TW-2456 Cucurbita ficifolia U.S. not listed seeds Fresh leaves PARL-13 PI 512096 Cucurbita foetidissima Mexico Guanajuato USDA Fresh leaves

San Luis Potosi: Latitude: 21 deg 50 min N (21.83333333) Longitude: 100 deg 53 min W (- PARL-19 PI 532353 Cucurbita foetidissima Mexico 100.88333333) USDA Fresh leaves

171

Table A-1. Continued Label Accession Taxon Country Locality Inst Material PARL-20 PI 540902 Cucurbita foetidissima U.S. Texas USDA Fresh leaves PARL-4 PI 435094 Cucurbita foetidissima U.S. New Mexico: white sands USDA Fresh leaves near Soledad Diez PARL-7 PI 442193 Cucurbita foetidissima Mexico Gutierrez USDA Fresh leaves PARL-9 PI 442199 Cucurbita foetidissima Mexico Zacatecas USDA Fresh leaves S9-64 PI 438543 Cucurbita lundelliana Belize not listed USDA Fresh leaves S9-66 PI 489690 Cucurbita lundelliana Guatemala not listed USDA Fresh leaves S9-68 PI 532357 Cucurbita lundelliana Mexico not listed USDA Fresh leaves S9-71 PI 540898 Cucurbita lundelliana Honduras not listed USDA Fresh leaves Cucurbita maxima ne9-1 PI 162669 subsp. andreana Argentina not listed USDA Fresh leaves Cucurbita maxima ne9-32 G 29253 subsp. andreana Argentina not listed USDA Fresh leaves Cucurbita maxima ne9-33 G 29254 subsp. andreana Argentina not listed USDA Fresh leaves Tradewin ds rare and Cucurbita maxima heirloom TW-2248 TW-2248 subsp. andreana Argentina not listed seeds Fresh leaves Corrientes Province: Cucurbita maxima Market Corrientes ne9-10 PI 458660 subsp. maxima Argentina Department Capital USDA Fresh leaves Cucurbita maxima Mendoza province: ne9-12 PI 458653 subsp. maxima Argentina Chacras de Coria USDA Fresh leaves Cucurbita maxima ne9-13 PI 458674 subsp. maxima Argentina La Plata Buenos Aires. USDA Fresh leaves Cucurbita maxima ne9-2 PI 244702 subsp. maxima Brazil Sao Paulo USDA Fresh leaves Cucurbita maxima Pacanguilla Chepen La ne9-23 PI 470933 subsp. maxima Peru Libertad USDA Fresh leaves Guanajuato: Central market in Dolores Cucurbita maxima Hidalgo. grown nearby in ne9-25 PI 512110 subsp. maxima Mexico San Luis de la Paz USDA Fresh leaves

172

Table A-1. Continued Label Accession Taxon Country Locality Inst Material

Cucurbita maxima ne9-29 PI 543227 subsp. maxima Bolivia Rio Beni USDA Fresh leaves Rurrenabaque: Latitude: 14 deg 28 min 0 sec S (- 14.46666667) Longitude: Cucurbita maxima 67 deg 34 min 0 sec W (- ne9-31 PI 560950 subsp. maxima Bolivia 67.56666667) USDA Fresh leaves Cucurbita maxima ne9-5 PI 318430 subsp. maxima U.S. Minnesota USDA Fresh leaves Cucurbita maxima Caranavi La Paz market ne9-7 PI 349347 subsp. maxima Bolivia place USDA Fresh leaves S9-102 PI 512151 Cucurbita moschata Mexico Yucatan USDA Fresh leaves Beni: urrenabaque. Latitude: 14 deg 28 min 0 sec S (-14.46666667), Longitude: 67 deg 34 min S9-104 PI 560946 Cucurbita moschata Bolivia 0 sec W (-67.56666667) USDA Fresh leaves S9-76 PI 201772 Cucurbita moschata U.S. Iowa USDA Fresh leaves Puerto S9-78 PI 209117 Cucurbita moschata Rico Market Carolina USDA Fresh leaves S9-79 PI 244707 Cucurbita moschata Brazil Sao Paulo USDA Fresh leaves Puente de las Palmillas Km 116 Guatemala City to Puertos Barrios. Latitude: 15 deg 2 min N (15.03333333) Longitude: 89 deg 34 min 0 sec W (- S9-86 PI 438576 Cucurbita moschata Guatemala 89.56666667) USDA Fresh leaves S9-88 PI 441726 Cucurbita moschata Brazil Garden Altamira Para USDA Fresh leaves S9-90 PI 449348 Cucurbita moschata Guatemala Aguacatan USDA Fresh leaves

sustainabl eseedco. S9-97 PI 510631 Cucurbita moschata Mexico Sonora: Novojoa mkt com Fresh leaves

173

Table A-1. Continued Label Accession Taxon Country Locality Inst Material Cucurbita okeechobeensis subsp. S9-113 PI 540900 martinezii Mexico not listed USDA Fresh leaves University Cucurbita of Florida Herbarium okeechobeensis subsp. herbarium specimen dried FLAS 207128 FLAS 207128 okeechobeensis U.S. Florida (FLAS) leaves (2000) University of Arizona Herbarium herbarium specimen dried ARIZ 402445 ARIZ 402445 Cucurbita palmata U.S. CA (ARIZ) leaves (2006) S9-118 PI 620760 Cucurbita palmata U.S. CA USDA Fresh leaves S9-115 PI 442290 Cucurbita pedatifolia Mexico not listed USDA Fresh leaves S9-116 PI 442341 Cucurbita pedatifolia Mexico not listed USDA Fresh leaves S9-117 PI 540737 Cucurbita pedatifolia Mexico not listed USDA Fresh leaves Tamaulipas: Along roadside at Valdo el Moro ~ 60 km east of Cuidad Cucurbita pepo subsp. Victoria on Highway 70 nc7-49 PI 614683 fraterna Mexico Victoria County USDA Fresh leaves Tamaulipas: Latitude: 23 deg 22 min 0 sec N (23.36666667), Cucurbita pepo subsp. Longitude: 99 deg 0 min nc7-50 PI 532354 fraterna Mexico 0 sec W (-99) USDA Fresh leaves Tamaulipas: Latitude: 23 deg 34 min 0 sec N (23.56666667) Longitude: Cucurbita pepo subsp. 98 deg 32 min W (- nc7-51 PI 532355 fraterna Mexico 98.53333333) USDA Fresh leaves Tamaulipas: Latitude: 23 deg 34 min 0 sec N (23.56666667) Longitude: Cucurbita pepo subsp. 98 deg 32 min W (- nc7-52 PI 532356 fraterna Mexico 98.53333333) USDA Fresh leaves

174

Table A-1. Continued

Label Accession Taxon Country Locality Inst Material sustainabl Cucurbita pepo subsp. eseedco. SKU1654 SKU1654 ovifera U.S. not listed com Fresh leaves sustainabl Cucurbita pepo subsp. eseedco. SKU1672 SKU1672 ovifera U.S. not listed com Fresh leaves Cucurbita pepo subsp. : Buffalo River nc7-53 Ames 26607 ovifera var. ozarkana U.S. Searcy County USDA Fresh leaves Cucurbita pepo subsp. : River nc7-56 Ames 26610 ovifera var. ozarkana U.S. Cherokee County USDA Fresh leaves Cucurbita pepo subsp. : Wilson Creek nc7-58 Ames 26612 ovifera var. ozarkana U.S. Christian County USDA Fresh leaves Cucurbita pepo subsp. Kentucky: Red River nc7-60 Ames 26617 ovifera var. ozarkana U.S. Drainage Powell County USDA Fresh leaves Cucurbita pepo subsp. Missouri: Gasconade nc7-62 Ames 26620 ovifera var. ozarkana U.S. River Wright County USDA Fresh leaves Cucurbita pepo subsp. Missouri: McDonald nc7-68 Ames 26872 ovifera var. ozarkana U.S. County USDA Fresh leaves Cucurbita pepo subsp. : Tensas nc7-75 Ames 26884 ovifera var. ozarkana U.S. County USDA Fresh leaves Cucurbita pepo subsp. nc7-76 Ames 26885 ovifera var. ozarkana U.S. Kentucky: Union County USDA Fresh leaves

Texas: Along west bank of the Aransas River bottom 0.8 miles north of feeding posts ~6 miles northeast of St. Paul Big Cucurbita pepo subsp. LI Ranch San Patricio nc7-81 PI 614684 ovifera var. texana U.S. County USDA Fresh leaves

175

Table A-1. Continued

Label Label Label Label Label Label Label Texas: San Antonio River bottom west bank at River Ranch 2 miles north of the main ranch buildings 7 miles southeast of Junction 239 Cucurbita pepo subsp. and 183 (77a) Goliad nc7-83 PI 614686 ovifera var. texana U.S. County. USDA Fresh leaves Texas: High bank of Guadalupe River near Cucurbita pepo subsp. bridge at Independence nc7-84 PI 614687 ovifera var. texana U.S. Park Gonzales County. USDA Fresh leaves Cucurbita pepo subsp. Texas: Nail's Creek State nc7-88 PI 614691 ovifera var. texana U.S. Park Lee County USDA Fresh leaves Mississippi: 2.2 miles northeast of Stoneville around 10 miles east of Cucurbita pepo subsp. Greenville Washington nc7-90 PI 614693 ovifera var. texana U.S. County USDA Fresh leaves Mississippi: Southeast of Fitler around 5 miles west of U.S. Highway 61 and Cucurbita pepo subsp. south of State Highway 1 nc7-91 PI 614694 ovifera var. texana U.S. Issaquena County USDA Fresh leaves Mississippi: Less than 1 mile north of Port Gibson 1 mile west of U.S. Cucurbita pepo subsp. Highway 61 near Bayou nc7-92 PI 614695 ovifera var. texana U.S. Pierre Claiborne County USDA Fresh leaves Oaxaca: Colonia de Cucurbita pepo subsp. Cumbre between Oaxaca nc7-18 PI 438716 pepo Mexico and Guelatao de Juarez USDA Fresh leaves Cucurbita pepo subsp. nc7-2 Ames 21649 pepo Brazil Sao Paulo market USDA Fresh leaves

176

Table A-1. Continued

Label Label Label Label Label Label Label Guanajuato: Santiago Cucurbita pepo subsp. Maravatio Municipio nc7-22 PI 442299 pepo Mexico Santiago Maravatio USDA Fresh leaves Cucurbita pepo subsp. nc7-27 PI 451849 pepo Guatemala Solola Solola USDA Fresh leaves Cucurbita pepo subsp. Corona. Patzacia nc7-28 PI 451853 pepo Guatemala Chimaltenango USDA Fresh leaves Cucurbita pepo subsp. nc7-29 PI 458730 pepo Argentina Mendonza USDA Fresh leaves Cucurbita pepo subsp. Fresh leaves nc7-30 PI 458731 pepo Argentina Buenos Aires USDA

177

Table A-2. Sample information for samples used in the domestication genetics study for Chapter 5. Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code C. WA01 argyrosperma Improved Argentina Mendoza unknown Taboy unknown USDA PI458722 Glenn Drowns of Seed C. United Savers' White cushaw WA02 argyrosperma Improved States Exchange NA (crookneck) unknown USDA PI512129 Glenn Drowns of Seed C. United Savers' WA03 argyrosperma Improved States Exchange NA Gold striped cushaw unknown USDA PI512127 C. United WA04 argyrosperma Improved States NA NA White Hopi callicarpa USDA PI 678413

C. United Cushaw Green WA05 argyrosperma Improved States NA NA striped callicarpa USDA PI 678414

C. United Green striped WA06 argyrosperma Improved States NA NA cushaw unknown USDA PI512128

C. United Corneli Seed Cushaw Green WA07 argyrosperma Improved States Co. NA striped (crookneck) callicarpa USDA PI 678410 C. United WA08 argyrosperma Improved States NA NA Hopi callicarpa USDA PI 678416 C. WA09 argyrosperma Improved Argentina NA NA unknown unknown USDA PI458724

C. United WA10 argyrosperma Improved States NA NA wild white cushaw callicarpa USDA PI 678415

C. United Cushaw Green WA11 argyrosperma Improved States NA NA striped callicarpa USDA PI 678408 C. United WA12 argyrosperma Improved States NA NA Cushaw, white callicarpa USDA PI 678412

C. United WB01 argyrosperma Improved States NA NA crookneck, white callicarpa USDA PI 678411 C. WB02 argyrosperma Improved Argentina NA NA unknown unknown USDA PI458725 C. United Corneli Seed WB03 argyrosperma Improved States Co. NA Japanse Pie callicarpa USDA PI 678409 C. United WB04 argyrosperma Improved States NA NA unknown unknown USDA PI512130 C. WB05 argyrosperma Wild Mexico Jalisco El Chante NA unknown UNAM

178

Table A-2. Continued Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code C. Isthmus of WB06 argyrosperma Landrace Mexico Oaxaca Tehuantepec unknown argyrosperma USDA PI196923 C. WB07 argyrosperma Landrace Mexico Sonora Mercado unknown stenosperma USDA Vicinity of Arroyo Bakosiachi, West of La Bufa, on South side of Barranca de Batopilas, C. municip. WB08 argyrosperma Landrace Mexico Chihuauhua Batopilas callicarpa USDA Grif 9456 C. United WB09 argyrosperma Landrace States Arizona Little Tucson unknown unknown USDA PI451712 C. WB10 argyrosperma Landrace Mexico Sonora Muierta unknown callicarpa USDA PI196923 C. WB11 argyrosperma Landrace Belize Corozal Yo chen Pepita gruesa unknown USDA PI438545 C. La Mesa san WB12 argyrosperma Landrace Mexico Jalisco Nicolas unknown unknown UNAM C. WC01 argyrosperma Landrace Mexico Nuevo Leon Los Pasajitos unknown unknown UNAM C. WC02 argyrosperma Landrace Mexico Guanajuato San Miguel unknown unknown UNAM C. WC03 argyrosperma Landrace Mexico Puebla Tecomatlan unknown unknown UNAM C. WC04 argyrosperma Landrace Mexico Veracruz Actopan unknown stenosperma USDA C. WC05 argyrosperma Landrace Mexico Jalisco El Chante unknown unknown UNAM C. WC06 argyrosperma Landrace Mexico Veracruz San Lorenzo unknown unknown UNAM C. WC07 argyrosperma Landrace Mexico Mexico City Anoacapan unknown unknown UNAM

C. stenosperma. WC08 argyrosperma Landrace Mexico Morelos Tepoztlan unknown Market USDA 14.4km S of Hisamaloya Beach bridge C. towards El WC09 argyrosperma Wild Mexico Jalisco Tuito NA NA USDA PI494128

179

Table A-2. Continued Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code Near river at Km 40, Route 190 between Cazadero and C. Santo Do WC10 argyrosperma Wild Mexico Oaxaca mingo NA NA USDA PI438835 Suya Tenco, between Vicente C. Guerrero and WC11 argyrosperma Wild Mexico Chiapas Jiquipilas NA NA USDA PI428836 C. WC12 argyrosperma Wild Mexico Mexico City Anoacapan NA NA UNAM C. Banos de WD01 argyrosperma Wild Mexico Veracruz Carrizal NA NA USDA PI512220 C. WD02 argyrosperma Wild Mexico Jalisco Autlan NA NA USDA PI512209 C. 9 miles north WD03 argyrosperma Wild Mexico Sonora of Movas NA NA USDA PI318832 Km 91 on C. Huatusco- WD04 argyrosperma Wild Mexico Veracruz Conejos Road NA NA USDA PI438833 Tamarindo, between C. Tamarindo WD05 argyrosperma Wild Mexico Veracruz and Ovejas NA NA USDA PI438834 C. WD06 argyrosperma Wild Mexico Guanajuato San Miguel NA NA UNAM C. WD07 argyrosperma Wild Mexico Nuevo Leon Los Pasajitos NA NA UNAM C. WD08 argyrosperma Wild Mexico Oaxaca El Puente NA NA UNAM C. La Mesa san WD09 argyrosperma Wild Mexico Jalisco Nicolas NA NA UNAM Canton El Aguacate, C. Hunicipio de WD10 argyrosperma Wild Mexico Chiapas Hazatan NA NA USDA PI442358

C. San Juan WD11 argyrosperma Wild Mexico Hidalgo Amola NA NA UNAM

180

Table A-2. Continued Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code C. Los WD12 argyrosperma Wild Mexico Puebla Tecomatlan NA NA UNAM WE01 C. maxima Wild Argentina San Luis unknown NA NA IPK MAX125 Sierras Las WE02 C. maxima Wild Argentina San Luis Quijadas NA NA UNR MAX201 WE03 C. maxima Wild Argentina Buenos Aires Laguna Alsina NA NA GRIN MAX43 WE04 C. maxima Wild Argentina San Luis Concaran NA NA UNR MAX137 Santiago del WE05 C. maxima Wild Argentina Estero La Huerta NA NA UNR MAX136 Las Varillas, C WE06 C. maxima Wild Argentina Cordoba El Peaje NA NA UNR MAX174 Coronel WE07 C. maxima Wild Argentina Buenos Aires Suarez NA NA UNR MAX139 WE08 C. maxima Wild Argentina Cordoba Despenaderos NA NA UNR MAX199 Santiago del Linares cerca WE09 C. maxima Wild Argentina Estero de Anatuya NA NA UNR MAX175 WE10 C. maxima Wild Argentina Santa Fe Chapuy NA NA UNR MAX135 Toledo - Rio WE11 C. maxima Wild Argentina Cordoba Segundo NA NA GRIN MAX160 Rincon del WE12 C. maxima Wild Argentina Entre Rios Doll NA NA UNR MAX202 WF01 C. maxima Wild Argentina Buenos Aires Chascomus NA NA GRIN MAX218 WF02 C. maxima Wild Argentina Entre Rios unknown NA NA IPK MAX130 Santiago del WF03 C. maxima Wild Argentina Estero Bandera NA NA UNR MAX140 Coronel WF04 C. maxima Wild Argentina Buenos Aires Pringles NA NA UNR MAX223 WF05 C. maxima Improved USA NA NA Valenciano NA USDA PI 162669 WF06 C. maxima Improved USA NA NA Nan Kwa NA USDA PI 92835 WF07 C. maxima Improved USA NA NA Big Moon NA USDA PI 599986 WF08 C. maxima Improved USA NA NA Pride NA USDA PI 600834 WF09 C. maxima Improved USA NA NA NA USDA PI 667027 WF10 C. maxima Improved USA NA NA Denver White NA USDA PI 667032 WF11 C. maxima Improved USA NA NA Ames 721 NA USDA PI318430 WF12 C. maxima Improved USA NA NA Quality NA USDA PI 656589 WG01 C. maxima Improved USA NA NA Candy Roaster NA USDA PI267750 WG02 C. maxima Improved USA NA NA Criollo NA USDA PI 162670

181

Table A-2. Continued

Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code

WG03 C. maxima Improved USA NA NA Warren Turban NA USDA PI 667026 WG04 C. maxima Improved USA NA NA unknown NA USDA PI349354 WG05 C. maxima Improved USA NA NA Criollo NA USDA PI 162670 WG06 C. maxima Improved USA NA NA Gold Nugget NA USDA PI325133

WG07 C. maxima Improved USA NA NA Banana Squash NA USDA PI518678 WG08 C. maxima Improved USA NA NA Turban Squash NA USDA PI518578

WG09 C. maxima Landrace Argentina unknown unknown Turbante Corcho NA UNR MAX108 WG10 C. maxima Landrace Argentina unknown unknown Zapallo Plomo NA unknown PI 458661 WG11 C. maxima Landrace Paraguay unknown unknown Zapallo NA unknown PI 458742 WG12 C. maxima Landrace Argentina unknown unknown unknown NA unknown PI 458664

WH01 C. maxima Landrace Argentina Catamarca Los Alamitos "primitive pumpkin" NA unknown PI 265571

Ferry WH02 C. maxima Landrace Argentina unknown unknown Zapallito Squash NA Morse MAX155 Chacras de WH03 C. maxima Landrace Argentina Mendoza Coria Chinque NA unknown PI 458686 WH04 C. maxima Landrace Argentina Buenos Aires unknown unknown NA unknown PI 458709

Chacras de WH05 C. maxima Landrace Argentina Mendoza Coria Chacoriano/Zapallito NA unknown PI 458690 Corrientes (prob. Imported from WH06 C. maxima Landrace Argentina Corrientes Misiones) Zapallo Plomo NA unknown PI 458663

Zap Tronco WH07 C. maxima Landrace Argentina unknown unknown Claríssimo/Zapallito NA CAPS MAX156

182

Table A-2. Continued

Improvement Country of ID Species Status origin State/Province Locality Local name/Cultivar Variety Source Source code

Chacras de WH08 C. maxima Landrace Argentina Mendoza Coria Maguilta/Zapallito NA unknown PI 458688 Chacras de WH09 C. maxima Landrace Argentina Mendoza Coria Carla NA unknown PI 458696

WH10 C. maxima Landrace Argentina unknown unknown Zapuco/ Banana NA INTA MAX146 Corrientes (prob. Imported from WH11 C. maxima Landrace Argentina Corrientes Misiones) Zapallo/ Exposicao NA unknown PI 458672

Plomo Ruso WH12 C. maxima Landrace Argentina Mendoza La Consulta (Plaunorskja) NA unknown PI 458702

183

APPENDIX B PRIMERS USED FOR MICROFLUIDIC PCR IN CHAPTER 3

Table B-1. Primer information for primers used in the phylogenetic study for Chapter 3 cucurbigene.n et Cucurbita pepo GenBank transcriptome Locus accession v3 unigene length numbers target (bp) Forward primer sequence Reverse primer sequence (ambiguity; allelic) KY756625 - ACACTGACGACATGGTTCTACATCGAACCATCTGGGA TACGGTAGCAGAGACTTGGTCTACCTCCTTCCAGATCCAT KY756715;KY7587 CUTC000988 477 GAATGC CCC 32 - KY758840 KY754952 - ACACTGACGACATGGTTCTACATCCACCCACACCCTT TACGGTAGCAGAGACTTGGTCTCCATACAAGTCCCACAGG KY755039;KY7600 CUTC002236 487 AATCG TCC 05 - KY760104 KY757050 - ACACTGACGACATGGTTCTACATGGATGAAGGAATCG TACGGTAGCAGAGACTTGGTCTCATGATGCGATTGCCTCA KY757138;KY7582 CUTC003346 507 GTGCC GC 13 - KY758351 KY757229 - ACACTGACGACATGGTTCTACAGCGAACCCAGAATCC TACGGTAGCAGAGACTTGGTCTAATGTGAACTAGTCTCCA KY757316;KY7598 CUTC003400 412 ACCC CCACC 12 - KY759920 KY755951 - ACACTGACGACATGGTTCTACAACCTCTCGCTCTTGTT TACGGTAGCAGAGACTTGGTCTGAACCCTTCTCAGCCACA KY756038;KY7595 CUTC0034082 475 TCCG GG 97 - KY759700 KY755308 - ACACTGACGACATGGTTCTACAGCAATTGAGCAGCTC TACGGTAGCAGAGACTTGGTCTTTCCGAGACGAGCAAATC KY755396;KY7614 CUTC004036 411 TTAGAGG CG 78 - KY761583 KY755216 - ACACTGACGACATGGTTCTACATGGGAGAATGGTTGG TACGGTAGCAGAGACTTGGTCTCCTCTTCTCTCAATAGCC KY755307;KY7616 CUTC005742 441 AGTTGG GTGC 90 - KY761789 KY754681 - ACACTGACGACATGGTTCTACATCGCTCGTTCTCTTG TACGGTAGCAGAGACTTGGTCTCTCTTCTCATGGACAGAC KY754769;KY7589 CUTC005867 620 CTAGG GGG 31 - KY759032 KY757750 - ACACTGACGACATGGTTCTACAGTTGGAGGCTTTGTT TACGGTAGCAGAGACTTGGTCTTCGTCTTGAGATACTGGC KY757835;KY7607 CUTC006961 430 GGTGC TTTGG 28 - KY760829 KY755759 - CUTC007154_ ACACTGACGACATGGTTCTACAAGGGAGCTTCAAGTA TACGGTAGCAGAGACTTGGTCTCACTCGTCCAAGTACAGC KY755849;KY7584 A 462 TGGCG CC 14 - KY758517 ACACTGACGACATGGTTCTACATTTCATTTGATGCGGCGACC TACGGTAGCAGAGACTTGGTCTTCCCTCTTTGTGCTCTGTGG KY757496 - CUTC007154_ KY757578;KY7588 B 455 41 - KY758930

184

Table B-1. Continued cucurbigene.n et Cucurbita pepo GenBank transcriptome Locus accession v3 unigene length numbers target (bp) Forward primer sequence Reverse primer sequence (ambiguity; allelic) KY754862 - ACACTGACGACATGGTTCTACATGCTTGTCGACAGTA TACGGTAGCAGAGACTTGGTCTATTTGAACTCACCAAGAA KY754951;KY7597 CUTC008905 451 CTTGGC CCACG 01 - KY759811 KY756962 - ACACTGACGACATGGTTCTACACCCACTTGTTCTGTTC TACGGTAGCAGAGACTTGGTCTCCTGCAGCATCGTTGAAG KY757049;KY7585 CUTC009821 475 TTGAGG C 18 - KY758619 KY757407 - ACACTGACGACATGGTTCTACAGATGATCATGGATGC TACGGTAGCAGAGACTTGGTCTTTCTTCAATCAAATGCCG KY757495;KY7617 CUTC010843 424 TTGCTACC GTCG 90 - KY761845 KY755489 - ACACTGACGACATGGTTCTACATGCAACAGGAGATGG TACGGTAGCAGAGACTTGGTCTAAGTCGGTCAGCATGTCT KY755577;KY7612 CUTC011783 535 CAAGG CG 02 - KY761308 KY757669 - ACACTGACGACATGGTTCTACAAATGGTTTGGAGCTT TACGGTAGCAGAGACTTGGTCTCTTCCAAGGTTGAGGCTC KY757749;KY7595 CUTC012412 558 GGTGC CA 49 - KY759596 KY755132 - ACACTGACGACATGGTTCTACATTGTTCTCTCAATTGC TACGGTAGCAGAGACTTGGTCTTGATGGAAGGCATAATCT KY755215;KY7605 CUTC013285 526 CATGGC GAGGG 63 - KY760651 KY757926 - ACACTGACGACATGGTTCTACAGTTCGTTTCGAGCAT TACGGTAGCAGAGACTTGGTCTCGAAGCCAATTCATCGGA KY758010;KY7591 CUTC013530 427 GAAGGC CG 67 - KY759265 KY756504 - KY756533, KY756798 - KY756886;KY7608 30 - KY760862, ACACTGACGACATGGTTCTACATGCAACCAACGCTCT TACGGTAGCAGAGACTTGGTCTCGTGTTGCCTGCTGATAT KY761098 - CUTC0135781 321 GTCC TCG KY761201 KY756337 - ACACTGACGACATGGTTCTACATCTTCTTCACTGGAGA TACGGTAGCAGAGACTTGGTCTTCGGATCTCGAATCAAGT KY756412;KY7592 CUTC014907 497 ACACACG CTGG 66 - KY759359 KY756887 - ACACTGACGACATGGTTCTACAATGATCTCACTCAAAC TACGGTAGCAGAGACTTGGTCTCAGAAGAGAGGGTCGGA KY756961;KY7603 CUTC017739 514 CCTCCC ATCG 56 - KY760442 KY756413 - ACACTGACGACATGGTTCTACATTCAACGTCGAGCTT TACGGTAGCAGAGACTTGGTCTGACAAGGAGGTCGGAGG KY756503;KY7601 CUTC019250 309 GCCC C 73 - KY760279

185

Table B-1. Continued cucurbigene.n et Cucurbita pepo GenBank transcriptome Locus accession v3 unigene length numbers target (bp) Forward primer sequence Reverse primer sequence (ambiguity; allelic) KY757317 - ACACTGACGACATGGTTCTACAGTAACATCGATCCAC TACGGTAGCAGAGACTTGGTCTGAAGGGAGTGATGGAGC KY757406;KY7609 CUTC020170 415 ACTTGTCG GC 78 - KY761097 KY758011 - ACACTGACGACATGGTTCTACAAATCTACCTCAGCCG TACGGTAGCAGAGACTTGGTCTTCGGTGGGAGGTAGATGT KY758090;KY7583 CUTC0208902 594 AATCCC CG 52 - KY758413 KY755040 - ACACTGACGACATGGTTCTACATTGTCGATGAGAAGT TACGGTAGCAGAGACTTGGTCTCCAATGGACGACTGCAAG KY755131;KY7586 CUTC023034 374 GTGAAAGC C 20 - KY758731 KY755930 - ACACTGACGACATGGTTCTACAGCTAACACGGACATA TACGGTAGCAGAGACTTGGTCTGCACCATCGTGTAGTTTC KY755950;KY7581 CUTC023408 733 CTTGTTGC ATGG 93 - KY758212 KY757836 - ACACTGACGACATGGTTCTACAGGGAGGAAGAAGAG TACGGTAGCAGAGACTTGGTCTCAGAATCGACAACAACGG KY757925;KY7580 CUTC025497 514 GAACGC GC 91 - KY758192 KY756534 - ACACTGACGACATGGTTCTACAGTAATGAAGGCTGCA TACGGTAGCAGAGACTTGGTCTGCTGGTGTGCAGAACTTC KY756624;KY7604 CUTC028067 362 CTGTTCG G 43 - KY760549 KY754826 - ACACTGACGACATGGTTCTACATCGTCACCACTTCCC TACGGTAGCAGAGACTTGGTCTAATCCACCGATAGTCCAA KY754861;KY7590 CUTC028370 288 AAACG ACCC 33 - KY759073 KY755647 - ACACTGACGACATGGTTCTACACGACTGTGCTTCTCTT TACGGTAGCAGAGACTTGGTCTAATCATGTGACAAGATGC KY755736;KY7608 CUTC028777 414 CGGG TGGC 63 - KY760977 KY755850 - ACACTGACGACATGGTTCTACATAGTCCAATGCGCAT TACGGTAGCAGAGACTTGGTCTTTGGTTCCAGCAGCAGAA KY755929;KY7590 CUTC029194 434 GTCCG GG 74 - KY759166 KY754770 - ACACTGACGACATGGTTCTACACCTACGGGCTAAGTA TACGGTAGCAGAGACTTGGTCTATTGTTTGAGTTGTTGGC KY754825;KY7605 CUTC0305632 499 CAATTTGC GGG 50 - KY760562 KY756240 - ACACTGACGACATGGTTCTACATGAAGATGGGTTGGA TACGGTAGCAGAGACTTGGTCTTTCAACTGCAACAATGGC KY756245;KY7606 CUTC0313502 256 CTCTGC TCC 52 - KY760727 KY755397 - ACACTGACGACATGGTTCTACATGGGAATTCTGCTCT TACGGTAGCAGAGACTTGGTCTGGGCGTTCTTGTGATAAC KY755488;KY7615 CUTC033182 444 CATCACG AATGG 84 - KY761689

186

Table B-1. Continued cucurbigene.n et Cucurbita pepo GenBank transcriptome Locus accession v3 unigene length numbers target (bp) Forward primer sequence Reverse primer sequence (ambiguity; allelic) KY755737 - KY755758, KY755578 - KY755646;KY7593 60 - ACACTGACGACATGGTTCTACACGAACGTCGTAATTG TACGGTAGCAGAGACTTGGTCTGGACGGGTGAACTTCCTC KY759441,KY7601 CUTC0347281 565 GCATCC G 05 - KY760172 ACACTGACGACATGGTTCTACAGTCGACCCTCATACACACGG TACGGTAGCAGAGACTTGGTCTGAAGAGGATCTGGACAACTGGG KY757579 - KY757668;KY7618 CUTC034809 361 46 - KY761947 KY756716 - ACACTGACGACATGGTTCTACATGTTGCCGCTGGAGT TACGGTAGCAGAGACTTGGTCTGAAACAACGCAACAGGAG KY756797;KY7595 CUTC036469 653 AATAAGG GG 27 - KY759548 KY756097 - ACACTGACGACATGGTTCTACAAGCTTCGAAACCGTA TACGGTAGCAGAGACTTGGTCTTTTCGTCTGAATCTCCGC KY756158;KY7602 CUTC036640 546 GAGTCG GG 80 - KY760355 KY756159 - ACACTGACGACATGGTTCTACATGAACGACAAATCAAT TACGGTAGCAGAGACTTGGTCTCCAAGCTGACTGAGATCA KY756239;KY7599 CUTC044924 552 GCACACC AGGG 21 - KY760004 KY756246 - ACACTGACGACATGGTTCTACAGGAATGGCTGGGTGT TACGGTAGCAGAGACTTGGTCTAGTTTCTGTGAGACTCGT KY756336;KY7594 CUTC045173 158 TAATGC AGCC 42 - KY759526 KY757139 - ACACTGACGACATGGTTCTACACTGCACTCGGTCCTA TACGGTAGCAGAGACTTGGTCTCTGCTGCATAGTTAATTC KY757228;KY7613 CUTC045659 485 GCG GCCC 09 - KY761409 KY756039 - ACACTGACGACATGGTTCTACAAGCAGCCTCAGTTTG TACGGTAGCAGAGACTTGGTCTTGCGACTGAGAAGGAGG KY756096;KY7614 CUTC047634 521 TGCAT AAC 10 - KY761477

187

APPENDIX C METHODS USED IN CHAPTER 3

Figure C-1. Illustration of bioinformatics pipeline for locus selection in Chapter 3.

188

Figure C-2. Comparison of success of microfluidic PCR amplification of loci between herbarium-sheet and fresh leaf tissue samples and across primer lengths (forward and reverse primer length combined). Black indicates that a locus was amplified for all individuals of the corresponding primer-length, white indicates the locus was not successfully amplified for any individuals of the corresponding primer-length.

189

APPENDIX D ADDITIONAL PHYLOGENETIC TREES FROM CHAPTER 3

Figure D-1. ASTRAL greedy consensus tree estimated from 44 majority-rule consensus gene trees (RAxML) that were reconstructed from ambiguity sequences. Support values are from 1,000 bootstrap replicates.

190

Figure D-2. ASTRAL greedy consensus tree estimated from 44 majority-rule consensus gene trees (RAxML) that were reconstructed from allele sequences with alleles mapped to individuals. Support values are from 1,000 bootstrap replicates.

191

APPENDIX E ALLELE DEEP COALESCENCE AND PHYLOGENETIC RESOLUTION IN CHAPTER 4

Figure E-1. Square-root pairwise distances (sqrt PWD) in the data and its relationship to phylogenetic resolution. A) Heatmap of sqrt PWD for each individual across 111 genes. Individual averages across genes are listed to the left of the heatmap. Asterisks indicate individuals that are members of high-conflict topologies. B) Relationship between the average sqrt PWD for each of 111 genes and that gene tree's percent resolution for each of three datasets. C) Relationship between the individual's average sqrt PWD and that individual's memebership in a high-conflict area of the Artocarpus topology.

192

APPENDIX F PHYLOGENETIC TREES FROM CHAPTER 4

Figure F-1. Phylogenetic trees reconstructed using three datasets and multiple methods of phylogenetic analyses.

193

Figure F-1. Continued Figure F-1. Continued F

194

Figure F-1. Continued

195

Figure F-1. Continued

196

Figure F-1. Continued

197

Figure F-1. Continued

198

Figure F-1. Continued

199

Figure F-1. Continued

200

Figure F-1. Continued

201

Figure F-1. Continued

202

Figure F-1. Continued

203

Figure F-1. Continued

204

Figure F-1. Continued

205

Figure F-1. Continued

206

Figure F-1. Continued

207

Figure F-1. Continued

208

Figure F-1. Continued

209

Figure F-1. Continued

210

Figure F-1. Continued

211

Figure F-1. Continued

212

APPENDIX G SUPPORT FOR ALL HIGH CONFLICT AREAS DISCUSSED IN CHAPTER 4 ACROSS DATASETS AND PHYLOGENETIC ANALYSES

Figure G-1. Incongruence between analyses for all three datasets for Areas A-D. Left: Table showing incongruence between analyses for all three datasets for areas A-B. Rows represent topologies, drawn as cartoon trees (or subtrees) at right. Columns represent analyses (labeled at bottom) within datasets (labeled at top). Strong support (black) denotes percent support >70%. Weak support denotes percent support <70%. No support indicates that the topology in question was not recovered. Right: Cartoon trees representing alternative topologies for areas A-D. Rearrangements are shown by colored terminal branches within each area.

213

LIST OF REFERENCES

Adler, L.S., Hazzard, R.V. 2009. Comparison of perimeter trap crop varieties: effects on herbivory, pollination, and yield in butternut squash. Environmental Entomology. 38 (1) ,207-15. doi:10.1603/022.038.0126.

Allard, R.W. 1960. Principles of plant breeding. New York: Wiley & Sons, Inc.

Altieri, M.A., Merrick, L. 1987. In situ conservation of crop genetic resources through maintenance of traditional farming systems. Economic Botany 41, 86-96.

Andreasen, K., Baldwin, B.G. 2001. Unequal Evolutionary Rates Between Annual and Perennial Lineages of Checker Mallows (Sidalcea, Malvaceae): Evidence from 18S–26S rDNA Internal and External Transcribed Spacers. Mol. Biol. Evol. 18 (6), 936-944.

Andres, T.C. 1987. Cucurbita fraterna, the closest wild relative and progenitor of C. pepo. Cucurbit Genetics Cooperative Report. (10), 69-71.

Andres, T.C., 2006. Origin, morphological variation, and uses of Cucurbita ficifolia, the mountain squash. Cucurbitaceae. Asheville, North Carolina, USA, 326-332.

Ané, C., Larget, B., Baum, D.A., Smith, S.D., and Rokas, A. 2007. Bayesian estimation of concordance among gene trees. Molecular Biology and Evolution 24, 412– 426.

Arriaga, L., Huerta, E., Lira-Saade, R., Moreno, E., Alarcon, J. 2006. Assessing the risk of releasing transgenic Cucurbita spp. in Mexico. Agriculture Ecosystems & Environment. 112(4), 291-9. doi:10.1016/j.agee.2005.07.007.

Atkinson, R.G., Cipriani, G., Whittaker, D.J., Gardner, R.C., 1997. The allopolyploid origin of kiwifruit, Actinidia deliciosa (Actinidiaceae). Plant Systematics and Evolution. 205(1-2), 111-124.

Bailey, L. 1943. Species of Cucurbita. Gentes Herbarum. 6, 266–322

Bankevich, A., Nurk, S., Antipov, D., Gurevich, A.A., Dvorkin, M., Kulikov, A.S., Lesin, V.M., et al. 2012. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of computational biology. 19, 455– 477.

Bardaa, S., Ben Halima, N., Aloui, F., Ben Mansour, R., Jabeur, H., Bouaziz, M. et al. 2016. Oil from pumpkin (Cucurbita pepo L.) seeds: evaluation of its functional properties on wound healing in rats. Lipids in Health and Disease. 15. doi:10.1186/s12944-016-0237-0.

214

Bayzid, M.S., Warnow, T. 2013. Naive binning improves phylogenomic analyses. Bioinformatics 29(18), 2277-2284.

Bemis, W.P., Curtis, L.D., Weber, C.W., , J. 1978. Feral buffalo gourd, Cucurbita foetidissima. Economic Botany. 32(1), 87-95. doi:10.1007/bf02906733.

Bemis, W.P., Nelson, J.M. 1963. Interspecific Hybridization within the Genus Cucurbita I, Fruit Set, Seed and Embryo Development. Journal of Ariz. Acad. of Sci. 2(3), 104-107.

Bemis, W.P., Whitaker, T.W., 1969. The Xerophytic Cucurbita of Northwestern Mexico and Southwestern United States. Madro 20(2), 33-41.

Benazzo, A., Panziera, A., Bertorelle, G. 2015. 4P: fast computing of population genetics statistics from large DNA polymorphism panels. Ecology and Evolution 5:172-175.

Berg, C.C., Corner, E.J.H., Jarrett, F.M., et. al. 2006. Flora Malesiana. Series I, Seed plants. Volume 17, Part 1: Moraceae-genera other than Ficus. Nationaal Herbarium Nederland.

Binod Prasad, L., Sang Gyu, K., Jung, S.S., On-Sook, H., Mun-Sup, Y., Ju-Hee, R. et. al. 2016. Screening of Pumpkin (Cucurbita spp.) germplasm for resistance to powdery mildew at various stages of seedlings growth. Res Plant Dis. 22(3), 133- 44.

Bisognin, D.A. 2002. Origin and evolution of cultivated cucurbits. Ciência Rural 32, 715- 723.

Bolley, D.S., McCormack, R.H., Curtis, L.C. 1950. The utilization of the seeds of the wild perennial gourds. Journal of the American Oil Chemists Society. 27(12), 571-4. doi:10.1007/bf02634988.

Brown, J.W., Walker, J.F., and Smith, S.A. 2017. Phyx: phylogenetic tools for unix. Bioinformatics 33, 1886-1888.

Browning, S.R., and Browning, B.L. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. American journal of human genetics 81, 1084–1097.

Brücher, H. 2012. Cucurbita ssp. In. Useful Plants of Neotropical Origin: and Their Wild Relatives: Springer Berlin Heidelberg.

CBS News. 2014. Drought means smaller pumpkins for California this Halloween.

215

Caicedo, A.L., Schaal, B.A. 2004. Population structure and phylogeography of Solanum pimpinellifolium inferred from a nuclear gene. Molecular Ecology 13(7), 1871- 1882.

Cao, K., Zheng, Z.J., Wang, L.R., Liu, X., Zhu, G.R., Fang, W.C., Cheng, S.F., Zeng, P., Chen, C.W., Wang, X.W., et. al. 2014. Comparative population genomics reveals the domestication history of the peach, Prunus persica, and human influences on perennial fruit crops. Genome Biology 15(7), 415.

Capella-Gutiérrez, S., Silla-Martínez, J.M., and Gabaldón, T. 2009. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973.

Charif, D., and Lobry, J.R. 2007. SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis U. Bastolla, M. Porto, H. E. Roman, and M. Vendruscolo [eds.], Structural approaches to sequence evolution: Molecules, networks, populations, 207–232.

Chavez, D.J., Kabelka, E.A., Chaparro, J.X. 2011. Screening of Cucurbita moschata Duchesne germplasm for crown rot resistance to Floridian isolates of Phytophthora capsici Leonian. Hortscience 46(4), 536-40.

Chifman, J., and Kubatko, L. 2014. Quartet inference from SNP data under the coalescent model. Bioinformatics 30, 3317–3324.

Chigmura Ngwerume F., Grubben G.J.H. 2004. Cucurbita. In: Grubben G.J.H, editor. Vegetables: PROTA.

Clark, R.L., Widrlechner, M.P., Reitsma, K.R., Block, C.C. 1991. Cucurbit germplasm at the north central regional plant introduction station, Ames, Iowa. Hortscience 26(4), 326-328.

Cohen, M.N. 1978. Population pressure and the origins of agriculture. In: Browman D.L., editor. Advances in Andean archaeology. The Hauge, Paris: Mouton Publishers.

Collins, A.G., Cartwright, P., McFadden, C.S., Schierwater, B. 2005. Phylogenetic context and basal metazoan model systems. Integrative and Comparative Biology 45(4), 585-594.

Cornille, A., Gladieux, P., Smulders, M.J.M., Roldan-Ruiz, I., Laurens, F., Le Cam B., Nersesyan, A., Clavel, J., Olonova, M., Feugey, L., et. al. 2012. New Insight into the History of Domesticated Apple: Secondary Contribution of the European Wild Apple to the Genome of Cultivated Varieties. Plos Genetics 8(5), e1002703. doi: 10.1371/journal.pgen.1002703..

216

Crowl, A.A., Visger, C.J., Mansion, G., Hand, R., Wu, H-H., Kamari, G., Phitos, D., Cellinese, N. 2015. Evolution and biogeography of the endemic Roucela complex (Campanulaceae: Campanula) in the Eastern Mediterranean. Ecology and Evolution 5(22), 5329-5343.

Cruz-Reyes, R., Avila-Sakar, G., Sanchez-Montoya, G., Quesada, M. 2015. Experimental assessment of gene flow between transgenic squash and a wild relative in the center of origin of cucurbits. Ecosphere 6(12), doi:10.1890/es15- 00304.1.

Dallman, N., Dallman, M. 2009. Breeding classic Cucurbita maxima Buttercup Squash for increased genetic diversity. In. Cucurbit Genetics Cooperative Report, 17-18.

Daniello, F. 2012. Estimated water requirements of vegetable crops. In: Horticultural Crop Guides Department of Horticultural Sciences, Texas A&M University. http://extension.missouri.edu/sare/documents/estimatedwaterrequirementsveget able2012.pdf.

Darriba, D., Taboada, G.L., Doallo, R., Posada, D. 2012. jModelTest 2: more models, new heuristics and parallel computing. Nature Methods. 9,772.

Decker, D.S., Wilson, H.D. 1987. Allozyme variation in the Cucurbita pepo complex - Cucurbita-pepo var. ovifera vs Cucurbita texana. Systematic Botany 12(2), 263- 273.

Decker, D.S. 1988. Origin(s), evolution, and systematics of Cucurbita (Cucurbitaceae). Economic Botany 42(1), 4-15.

Decker-Walters, D., Staub, J., Chung, S., Nakata, E., Quemada H. 2002. Diversity in free-living Populations of Cucurbita pepo (Cucurbitaceae) as assessed by random amplified polymorphic DNA. Systematic Botany 27(1), 19-28.

Decker-Walters, D. 1990. Evidence for multiple domestications of Cucurbita pepo. In: Bates D, Robinson R, Jeffrey C, editors. Biology and utilization of the Cucurbitaceae. Ithaca and London: Cornell University, 96-101.

Decker-Walters, D.S., Walters, T.W., Cowan, C.W., Smith, B.D. 1993. Isozymic characterization of wild populations of Cucurbita pepo. Journal of Ethnobiology 13, 55-72.

Decker-Walters, D.S., Walters, T.W. 2000. Squash. In: Kiple K.F., Kriemhild C.O., editors. The Cambridge World History of Food: Cambridge University Press, 335- 350.

217

Degnan, J.H., Rosenberg, N.A. 2009. Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends Ecol Evol 24(6), 332-340.

Della Vecchia, P.T., Terenciano Sobrinho, P., Terenciano, A. 1993. Breeding bush types of C. moschata with field resistance to PRSV-w. In: Cucurbit Genetics Cooperative, 70-71.

Deveaux, J.S., Shultz, E.B.1985. Development of buffalo gourd (Cucurbita foetidissima) as a semiarid land starch and oil crop. Economic Botany 9(4), 454-72.

Diez, C.M., Trujillo, I., Martinez-Urdiroz, N., Barranco, D., Rallo, L., Marfil, P., Gaut, B.S. 2015. Olive domestication and diversification in the Mediterranean Basin. New Phytologist 206, 436- 447.

Doebley, J.F., Gaut, B.S., Smith, B.D. 2006. The molecular genetics of crop domestication. Cell 127,1309-1321.

Doebley, J.F., Iltis, H.H. 1980. Taxonomy of Zea (gramineae) A subgeneric classification with key to taxa. American Journal of Botany 67(6), 982-993.

Doyle, J.J., Doyle, J.L. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue: Bulletin, 11-15.

Doyle, J.J., Doyle, J.L., Brown, A.H.D. 1990. A chloroplast-DNA phylogeny of the wild perennial relatives of soybean (Glycine subgenus Glycine) - congruence with morphological and crossing groups. Evolution 44(2), 371-389.

Driskell, A.C., Ane, C., Burleigh, J.G., McMahon, M.M., O'Meara, B.C., Sanderson, M.J. 2004. Prospects for building the tree of life from large sequence databases. Science 306(5699), 1172-1174.

Drummond, A.J., and Rambaut, A. 2007. BEAST: Bayesian evolutionary analysis by sampling trees. BMC evolutionary biology 7, 214.

Duarte, J.M., Wall, P.K., Edger, P.P., Landherr, L.L., Ma, H., Pires, J.C., Leebens-Mack, J., and dePamphilis, C.W. 2010. Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels. BMC evolutionary biology 10, 61.

Duchesne, A. 1786. Essai sur l'histoire naturelle des courges. Paris: Panckoucke

Díez, M., van Dooijeweert, W., Maggioni, L., Lipman, E. Report of a Working Group Economic Research Service, USDA.

218

Edgar, R.C. 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32(5), 1792-1797.

Edwards, C.E., Lefkowitz, D., Soltis, D.E., Soltis, P.S. 2008. Phylogeny of Conradina and related southeastern scrub mints (Lamiaceae) based on GapC gene sequences. International Journal of Plant Sciences 169(4), 579-594.

Edwards, S.V., Liu, L., and Pearl, D.K. 2007. High-resolution species trees without concatenation. Proceedings of the National Academy of Sciences of the United States of America 104, 5936–5941.

Erwin, A. 1931. Nativity of the Cucurbits. Botanical Gazette 91(1).

Erwin, A. 1938. An interesting Texas cucurbit. Iowa State Coll J Sci. (12):253-5.

Esquinas-Alcazar, J. 2005. Protecting crop genetic diversity for food security: political, ethical and technical challenges. Nature Reviews Genetics 6, 946-953.

Evanno, G., Regnaut, S., Goudet, J. 2005. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology 14, 2611-2620.

FAOSTAT. Production quantities of Pumpkins, squash and gourds by country.

Fang, L., Wang, Q., Hu, Y., Jia, Y.H., Chen, J.D., Liu, B.L., Zhang, Z.Y., Guan, X.Y., Chen, S.Q., Zhou, B.L., et. al. 2017. Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits. Nature Genetics 49, 1089

Felsenstein, J. 1973. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology 22(3), 240-249.

Felsenstein, J. 1981. Evolutionary trees from dna-sequences - a maximum- likelihood approach. Journal of Molecular Evolution 17, 368-376.

Felsenstein, J. 2003. Inferring Phylogenies. Sinauer.

Ferriol, M., Pico, B. 2008. Pumpkin and winter squash. In: Prohens J, Nuez F, editors. HDB Plant Breeding. Springer, Heidelberg, 317-49.

Ferriol, M., Pico., B., de Cordova, P.F., Nuez, F. 2004. Molecular diversity of a germplasm collection of squash (Cucurbita moschata) determined by SRAP and AFLP markers. Crop Science 44(2), 653-664.

219

Ferriol, M., Picó, B., Nuez, F. 2003. Genetic diversity of some accessions of Cucurbita maxima from Spain using RAPD and SBAP markers. Genetic Resources and Crop Evolution 50, 227-238.

Figueiredo, L.F.D., Calatayud, C., Dupuits, C., Billot, C., Rami, J.F., Brunel, D., Perrier, X., Courtois, B., Deu, M., Glaszmann, J.C. 2008. Phylogeographic evidence of crop neodiversity in sorghum. Genetics 179, 997-1008.

Food and Agriculture Organization of the (FAO). http://www.fao.org/faostat/en/#data/QC/visualize Item=Pumpkins, squash and gourds. 2017.

Formisano, G., Paris, H.S., Frusciante, L., Ercolano, M.R. 2010. Commercial Cucurbita pepo squash hybrids carrying disease resistance introgressed from Cucurbita moschata have high genetic similarity. Plant Genetic Resources-Characterization and Utilization 8(3), 198-203.

Francis, R.M. 2017. POPHELPER: an R package and web app to analyse and visualize population structure. Molecular Ecology Resources 17, 27-32.

Fritz, G.J. 1994. Pre-columbian Cucurbita argyrosperma ssp. argyrosperma (Cucurbitaceae) in the eastern woodlands of North America. Economic Botany. 48(3), 280-92.

Fu, Y.B. 2015. Understanding crop genetic diversity under modern plant breeding. Theoretical and Applied Genetics 128, 2131-2142.

Gaba, V., Zelcer, A., Gal-On, A. 2004. Cucurbit biotechnology - The importance of virus resistance. In Vitro Cellular & Developmental Biology-Plant 40(4), 346-58.

Gadagkar, S.R., Rosenberg, M.S., Kumar, S. 2005. Inferring species phylogenies from multiple genes: Concatenated sequence tree versus consensus gene tree. Journal of Experimental Zoology Part B-Molecular and Developmental Evolution 304B(1), 64-74.

Gardner, E.M., Johnson, M.G., Ragone, D., Wickett, N.J., and Zerega, N.J.C. 2016. Low-coverage, whole-genome sequencing of Artocarpus camansi (Moraceae) for phylogenetic marker development and gene discovery. Applications in plant sciences 4(7), 1600017.

Gathman, A., Bemis, W. 1990. Domestication of buffalo gourd, Cucurbita foetidissima. Biology of Cucurbits, 335-48.

Geisler, M. 2014. Squash. Ag Marketing Resource Center, Iowa State University. http://www.agmrc.org/commodities-products/vegetables/squash/.

220

Gepts, P. 1996. Origin and evolution of cultivated Phaseolus species. In: Pickersgill, B. & Lock, J.M., eds. Advances in Legume Systematics, Part 8; Legumes of economic importance. London, UK: Kew publishing, 65-74.

Gerard, D., Gibbs, H.L., Kubatko, L. 2011. Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling. Bmc Evolutionary Biology 11, 291

Giannini, T.C., Lira-Saade, R., Ayala, R., Saraiva, A.M., Alves-dos-Santos, I. 2011. Ecological niche similarities of Peponapis bees and non-domesticated Cucurbita species. Ecological Modelling 222(12), 8.

Gnirke, A, Melnikov, A, Maguire, J, Rogov, P, LeProust, E.M., Brockman, W., Fennell, T., Giannoukos, G., Fisher, S., Russ, C., et. al. 2009. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotechnology 2 7:182-189.

Goldman, A. 2004. The Compleat Squash: A Passionate Grower's Guide to Pumpkins, Squash, Gourds. New York, USA: Artisan.

Gong, L., Paris, H.S., Nee, M.H., Stift, G., Pachner, M., Vollmann, J., Lelley, T. 2012. Genetic relationships and evolution in Cucurbita pepo (pumpkin, squash, gourd) as revealed by simple sequence repeat polymorphisms. Theoretical and Applied Genetics 124(5), 875-891.

Gonzalez, L.C., Solano, J.P.L., Verduzco, C.V., Castellanos, J.S. 2010. Genetic diversity in four species of mexican squash (Cucurbita spp.). Revista Fitotecnia Mexicana. 33(3), 189-96.

Greene, S. In press. Chapter 1. In: Greene, S., Williams, K., Marek, L., Kantar, M. and Khoury, C. (Eds.), Crop Wild Relatives of North America. New York: Springer

Grubben, G., Chigumira Ngwerume, F. 2004. Cucurbita. In: Grubben, G, Denton, O, editors. Vegetables. Plant Resources of Tropical Africa, vol 2. Wageningen, Netherlands: PROTA.

Guindon, S., Gascuel, O. 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Systematic Biology 52(5), 696-704.

Guo, J. Wang, Y., Song, C., Zhou, J., Qiu, L., Huang, H., Wang, Y. 2010. A single origin and moderate bottleneck during domestication of soybean (Glycine max): implications from microsatellites and nucleotide sequences. Annals of Botany 106, 505-514.

Harlan, J.R. and de Wet, J.M.J 1971. Toward a rational classification of cultivated plants. Taxon 20(4), 509-517

221

Haudry, A., Cenci, A., Ravel, C., Bataillon, T., Brunel, D., Poncet, C., Hochu, I., Poirier,S., Santoni, S., Glemin, S., et. al. 2007. Grinding up wheat: A massive loss of nucleotide diversity since domestication. Molecular Biology and Evolution, 1506-1517.

Heikal, A.H., Abdel-Razzak, H.S., Hafez, E.E. 2008. Assessment of genetic relationships among and within Cucurbita species using RAPD and ISSR markers. Journal of Applied Sciences Research (5), 515-525.

Heled, J., Drummond, A.J. 2010. Bayesian Inference of Species Trees from Multilocus Data. Molecular Biology and Evolution 27(3), 570-580.

Heun, M., SchaferPregl, R., Klawan, D., Castagna, R., Accerbi, M., Borghi, B., Salamini, F. 1997. Site of einkorn wheat domestication identified by DNA fingerprinting. Science 278, 1312- 1314.

Hoffman, D.L., Soltis, D.E., Muehlbauer, F.J., Ladizinsky, G. 1986. Isozyme polymorphism in Lens (Leguminosae). Systematic Botany 11(3), 392-402.

Huang, X., Han, B. 2012. A crop of maize variants. Nature Genetics 44(7), 734-735.

Huerta-Cepas, J., Serra, F., and Bork, P. 2016. ETE 3: Reconstruction, Analysis, and Visualization of Phylogenomic Data. Molecular biology and evolution 33, 1635– 1638.

Hufford, M.B, Xu, X., van Heerwaarden, J., Pyhajarvi, T., Chia, J.M., Cartwright, R.A., Elshire, R.J., Glaubitz, J.C., Guill, K.E., Kaeppler, S.M., et. al. 2012. Comparative population genomics of maize domestication and improvement. Nature Genetics 44, 808-U118.

Igea, J., Juste, J., Castresana, J. 2010. Novel intron markers to study the phylogeny of closely related mammalian species. Bmc Evolutionary Biology 10, 369

Illanes, A., Schaffeld, G., Schiappacasse, C., Zuniga, M., Gonzalez, G., Curotto, E. et. al. 1985. some studies on the protease from a novel source the plant Cucurbita ficifolia. Biotechnology Letters 7(9), 669-72.

Iorizzo, M., Senalik, D.A., Ellison, S.L., Grzebelus, D., Cavagnaro, P.F., Allender, C., Brunet J., Spooner, D.M., Van Deynze, A., Simon, P.W. 2013. Genetic structure and domestication of carrot (Daucus carota subsp sativus) (Apiaceae). American Journal of Botany 100,930-938.

222

Jing, R., Vershinin, A., Grzebyta, J., Shaw, P., Smykal, P., Marshall, D., Ambrose, M.J., Ellis, T.H.N., Flavell, A.J. 2010. The genetic diversity and evolution of field pea (Pisum) studied by high throughput retrotransposon based insertion polymorphism (RBIP) analysis. Bmc Evolutionary Biology 10, 44

Johnson, M.G., Gardner, E.M., Liu, Y., Medina, R., Goffinet, B., Shaw, A.J., Zerega, N.J.C, and Wickett, N.J. 2016. HybPiper: Extracting coding sequence and introns for phylogenetics from high-throughput sequencing reads using target enrichment. Applications in plant sciences 4(7), 1600016.

Jones, C.S. 1992. Comparative ontogeny of a wild cucurbit and its derived cultivar. Evolution 46(6), 1827-47.

Junguo, Z., Huiling, H., Xinzheng, L., Ruijin, Z., Huirong, Z., editors. 2010. Identification of a resource of powdery mildew resistance in Cucurbita moschata. Acta Horticulturae. International Society for Horticultural Science (ISHS), Leuven, Belgium.

Kassa, M.T., Penmetsa, R.V., Carrasquilla-Garcia, N., Sarma, B.K., Datta, S., Upadhyaya, H.D., Varshney, R.K., von Wettberg, E.J.B., Cook D.R. 2012. Genetic Patterns of Domestication in Pigeonpea (Cajanus cajan (L.) Millsp.) and Wild Cajanus Relatives. Plos One 7.

Kates, H.R., Soltis, P.S., and Soltis, D.E. 2017. Evolutionary and domestication history of Cucurbita (pumpkin and squash) species inferred from 44 nuclear loci. Molecular phylogenetics and evolution 111, 98–109.

Katoh, K., and Standley, D.M. 2013. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Molecular biology and evolution 30, 772–780.

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., et. al., 2012. Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28(12), 1647-1649.

Keinath, A.P. 2014. Differential susceptibility of nine cucurbit species to the foliar blight and crown canker phases of gummy stem blight. Plant Disease 98(2), 247-54.

Keller, O., Kollmar, M., Stanke, M., and Waack, S. 2011. A novel hybrid gene prediction method employing protein multiple sequence alignments. Bioinformatics 27, 757– 763.

Kellogg, E.A., Birchler, J.A. 1993. Linking phylogeny and genetics - Zea-mays as a tool for phylogenetic studies. Systematic Biology 42(4), 415-439.

223

Kennedy, C. 2015. Climate & Pumpkins. Climate Watch Magazine.

Khoury, C.K., Greene, S., Wiersema, J., Maxted, N., Jarvis, A., Struik, P.C. 2013. An Inventory of crop wild relatives of the United States. Crop Science 53(4), 1496- 508. Doi:10.2135/cropsci2012.10.0585.

Kistler, L., Newsom, L.A., Ryan, T.M., Clarke, A.C., Smith, B.D., Perry, G.H. 2015. Gourds and squashes (Cucurbita spp.) adapted to megafaunal extinction and ecological anachronism through domestication. Proceedings of the National Academy of Sciences of the United States of America 112(49), 15107-15112.

Knowles, L.L. 2009. Estimating species trees: methods of phylogenetic analysis when there is incongruence across genes. Systematic biology 58, 463–467.

Knowles, LL, Klimov, P.B. 2011. Estimating phylogenetic relationships despite discordant gene trees across loci: the species tree of a diverse species group of feather mites (Acari: Proctophyllodidae). Parasitology 138(13), 1750-1759.

Kubatko, L.S., Carstens, B.C., and Knowles, L.L. 2009. STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25, 971–973.

Kubatko, L.S., and Degnan, J.H. 2007. Inconsistency of phylogenetic estimates from concatenated data under coalescence. Systematic biology 56, 17–24.

Lebeda, A., Křístková, E., Roháčková, J., Sedláková, B., Widrlechner, M.P., Paris, H.S. 2016. Race- specific response of Cucurbita germplasm to Pseudoperonospora cubensis. Euphytica, 1-12.

Lee, T.H., Guo, H., Wang, X.Y., Kim, C., Paterson, A.H. 2014. SNPhylo: a pipeline to construct a phylogenetic tree from huge SNP data. Bmc Genomics 15.

Leigh, J.W., Lapointe, F-J, Lopez, P., Bapteste, E. 2011. Evaluating Phylogenetic Congruence in the Post-Genomic Era. Genome Biology and Evolution 3, 571- 587.

Lema, V.S. 2011. The possible influence of post-harvest objectives on Cucurbita maxima subspecies maxima and subspecies andreana evolution under cultivation at the Argentinean Northwest: an archaeological example. Archaeological and Anthropological Sciences 3, 113-139.

Li, C.B., Zhou, A.L., Sang, T. 2006. Rice domestication by reducing shattering. Science 311:1936- 1939.

224

Li, Z.M., Zheng, X.M., Ge, S. 2011. Genetic diversity and domestication history of African rice (Oryza glaberrima) as inferred from multiple gene sequences. Theoretical and Applied Genetics 123:21-31.

Li, H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN]. Available at: http://arxiv.org/abs/1303.3997.

Linder, C.R., Rieseberg, L.H. 2004. Reconstructing patterns of reticulate evolution in plants. Amer. Journal of Botany 91(10): 1700-8.

Lira, R., Tellez, O., Davila, P. 2009. The effects of climate change on the geographic distribution of Mexican wild relatives of domesticated Cucurbitaceae. Genetic Resources and Crop Evolution: 56(5):691-703. doi:10.1007/s10722-008-9394-y.

Lira Saade, R., Montes Hernandez, M. 1994. Cucurbits ( Cucurbita spp. ). In: Hernando Bermejo J, Leon J, editors. Neglected Crops : 1492 from a Different Perspective. Rome, Italy: FAO. p. 63-77.

Liu, L., Pearl, D.K.. 2007. Species Trees from Gene Trees: Reconstructing Bayesian Posterior Distributions of a Species Phylogeny Using Estimated Gene Tree Distributions. Systematic Biology 56(3), 504-514.

Liu, L., Pearl, D.K., Brumfield, R.T., and Edwards, S.V. 2008. Estimating species trees using multiple-allele DNA sequence data. Evolution; international journal of organic evolution 62: 2080–2091.

Liu, L., Wu, S., and Yu, L. 2015. Coalescent methods for estimating species trees from phylogenomic data. Journal of Systematics and Evolution 53: 380–390.

Liu, L., Yu, L., Kubatko, L., Pearl, D.K., Edwards, S.V. 2009. Coalescent methods for estimating phylogenetic trees. Molecular Phylogenetics and Evolution 53(1), 320- 328.

Liu, L., Yu, L., Pearl, D.K., and Edwards, S.V. 2009. Estimating species phylogenies using coalescence times among sequences. Systematic biology 58: 468–477.

Liu, L., Yu, L., and Edwards, S.V. 2010. A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC evolutionary biology 10: 302.

Liu, X-J, Li, Y-J, Liang, G-Y, Fang, C., Liu, D-C, Yang, H., Zhao, Y. 2010. Genetic diversity of some accessions in Cucurbita by SRAP markers. Sichuan Daxue Xuebao (Ziran Kexueban) 47(5), 1195-1200.

225

Luis Blanco-Pastor, J., Vargas, P., Pfeil, B.E. 2012. Coalescent Simulations Reveal Hybridization and Incomplete Lineage Sorting in Mediterranean Linaria. Plos One 7(6).

Maddison, W.P., Knowles, L.L. 2006. Inferring phylogeny despite incomplete lineage sorting. Systematic Biology 55(1), 21-30.

Maddison, W.P. 1997. Gene trees in species trees. Systematic Biology 46(3), 523-536.

Maddison, W.P., and Maddison, D.R. 2017. Mesquite: a modular system for evolutionary analysis. Available at: http://mesquiteproject.org.

Matsuoka, Y., Vigouroux, Y., Goodman, M.M., Sanchez, G.J., Buckler, E., Doebley, J. 2002. A single domestication for maize shown by multilocus microsatellite genotyping. Proceedings of the National Academy of Sciences of the United States of America 99(9), 6080-6084.

Maureira-Butler, I.J., Pfeil, B.E., Muangprom, A., Osborn, T.C., Doyle, J.J. 2008. The reticulate history of Medicago (Fabaceae). Systematic Biology 57(3), 466-482.

Mayer, M.S., Soltis, P.S. 1994. Chloroplast DNA phylogeny of Lens (Leguminosae) - origin and diversity of the cultivated lentil. Theoretical and Applied Genetics 87(7), 773-781.

Maxted, N., Ford-Lloyd, B.V., Jury, S. et al. 2006. Towards a definition of a Biodivers. Conserv. 15: 2673.

McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K. et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome research 20: 1297–1303.

Meirmans, P.G. 2015. Seven common mistakes in population genetics and how to avoid them. Molecular Ecology 24:3223-3231.

Merrick, L. C. 1990. Systematics and evolution of a domesticated squash, Cucurbita argyrosperma, and its wild and weedy relatives. In: D. M. Bates, R. W. Robinson, and C. Jeffrey, eds. Biology and utilization of the Cucurbitaceae. New York, USA: Cornell University Press, 77–95.

Metcalf, R.L., Rhodes, A.M., Ferguson, J.E., E.R., M. 1979. Bitter Cucurbita spp. as for diabroticite beetles. Contract No.: 23.

226

Meyer, R.S., Karol, K.G., Little, D.P., Nee, M.H., Litt, A. 2012. Phylogeographic relationships among Asian eggplants and new perspectives on eggplant domestication. Molecular Phylogenetics and Evolution 63(3), 685-701.

Meyer, R.S., Purugganan, M.D. 2013. Evolution of crop species: genetics of domestication and diversification. Nature Reviews Genetics 14:840-852.

Minor, R. and Bond, J.K. Vegetables and Pulses Yearbook Data/#89011/ April 06, 2017

Mirarab, S., Warnow, T. 2015. ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), 44- 52.

Molinar, R., Aguiar, J., Gaskell, M., Mayberry, K. Summer squash production in California. UC Small Farm Program 2012.

Montes-Hernandez, S., Eguiarte, L.E. 2002. Genetic structure and indirect estimates of gene flow in three taxa of Cucurbita (Cucurbitaceae) in western Mexico. American Journal of Botany 89:1156-1163.

Montes-Hernandez, S., Merrick, L.C., Eguiarte, L.E. 2005. Maintenance of squash (Cucurbita spp.) landrace diversity by farmers' activities in Mexico. Genetic Resources and Crop Evolution: 52(6):697-707. doi:10.1007/s10722-003-6018-4.

Nabhan, G.P. 1985. Gathering the Desert. Tucson, AZ: University of Arizona Press.

Naudin, C. 1856. Nouvelles recherches sur les caractères spécifiques et les variétés des plantes du genre Cucurbita. Annales des Sciences Naturelles. Botanique, vol IV, 5–73.

Nee, M. 1990. The domestication of Cucurbita (Cucurbitaceae). Economic Botany 44(3), 56-68.

Nei, M., Li, W-H. 1979. Mathematical model for studying genetic variation in terms of restriction endonucleases. Proc. Natl. Acad. Sci. 76(10):5269-5273.

Nesom, G.L. 2011. New state records for Citrullus, Cucumis, and Cucurbita (Cucurbitaceae) outside of cultivation in the USA. Phytoneuron. 1-7.

Neves, L.G., Davis, J.M., Barbazuk, W.B., Kirst, M. 2013. Whole-exome targeted sequencing of the uncharacterized pine genome. Plant Journal 75:146-156.

227

Nuez, F., Fernandez de Cordova, P., Ferriol, M., Valcarcel, J., Pico, B., Diez, M. 2000. Cucurbita ssp. and siceraria collection of the genebank of the center for conservation and breeding of the agricultural biodiversity (COMAV) of the Polytechnical University of Valencia. Cucurbit Genetics Cooperative Report. 2000(23):60-1.

Olsen, K.M., Purugganan, M.D. 2002. Molecular evidence on the origin and evolution of glutinous rice. Genetics 162(2), 941-950.

Olsen, K.M., Schaal, B.A. 1999. Evidence on the origin of cassava: Phylogeography of Manihot esculenta. Proceedings of the National Academy of Sciences of the United States of America 96(10), 5586-5591.

Paradis, E., Claude, J., Strimmer, K. 2004. APE: Analyses of Phylogenetics and Evolution in R language. Bioinformatics 20(2), 289-290.

Paris, H.S. 1989. Historical records, origins, and development of the edible cultivar groups of Cucurbita pepo (Cucurbitaceae). Economic Botany: 43(4):423-43. doi:10.1007/bf02935916.

Paris, H.S. 2016. Overview of the origins and history of the five major cucurbit crops: issues for ancient DNA analysis of archaeological specimens. Vegetation History and Archaeobotany 25(4), 405-14. doi:10.1007/s00334-016-0555-1.

Paris, H.S, Brown, R.N. 2005. The genes of pumpkin and squash. Hortscience 40, 1620-1630.

Paris, H.S., Lebeda, A., Kristkova, E., Andres, T.C., Nee, M.H. 2012. Parallel Evolution Under Domestication and phenotypic differentiation of the cultivated subspecies of Cucurbita pepo (Cucurbitaceae). Economic Botany 66(1), 71-90. doi:10.1007/s12231-012-9186-3.

Paris, H.S., Yonash, N., Portnoy, V., Mozees-Daube, N., Tzuri, G., Katzir, N. 2003. Assessment of genetic relationships in Cucurbita pepo (Cucurbitaceae) using DNA markers. Theoretical and Applied Genetics 106(6), 971-978.

Parks, M., Cronn, R., Liston, A. 2009. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes. Bmc Biology 7, 84.

Patel, S.K., Braun, R.T., E.L. 2013. Error in phylogenetic estimation for bushes in the tree of life: Phylogenet. Evol. Biol. 1:110.

228

Patterson, M., Marschall, T. Pisanti, N. Van Iersel, L., Stougie, L., Klau, G.W., and Schönhuth, A. 2015. WhatsHap: Weighted Haplotype Assembly for Future- Generation Sequencing Reads. Journal of computational biology: a journal of computational molecular cell biology 22: 498–509.

Paur, S. 1952. Four native New Mexico plants of promise as oilseed crops. Las Cruces, NM.

Pautasso, M., Doring, T.F., Garbelotto, M., Pellis, L., Jeger, M.J. 2012. Impacts of climate change on plant diseases-opinions and trends. European Journal of Plant Pathology. 133(1), 295-313. doi:10.1007/s10658-012-9936-1.

Penmetsa, V.R., Carrasquilla-Garcia, N., Bergmann, E. M., Vance, L., Castro, B., Kassa, M. T., Sarma, B. K., Datta, S., Farmer, A. D., Baek, J.-M., Coyne, C. J., Varshney, R. K., von Wettberg, E. J. B.,Cook, D. R. 2016. Multiple post- domestication origins of kabuli chickpea through allelic variation in a diversification-associated transcription factor. New Phytol 211, 1440–1451.

Pfeifer, B., Wittelsburger, U., Ramos-Onsins, S.E., Lercher, M.J. 2014. PopGenome: An Efficient Swiss Army Knife for Population Genomic Analyses in R. Molecular Biology and Evolution 31, 1929-1936.

Philippe, H., Snell, E.A., Bapteste, E., Lopez, P., Holland, P.W.H., Casane, D. 2004. Phylogenomics of eukaryotes: Impact of missing data on large alignments. Molecular Biology and Evolution 21(9), 1740-1752.

Piperno, D.R., Pearsall, D.M. 1998. The Origins of Agriculture in the Lowland Neotropics. San Diego: USA: Academic Press, 109-166.

Potts, A.J., Hedderson, T.A., and Grimm, G.W. 2014. Constructing phylogenies in the presence of intra-individual site polymorphisms (2ISPs) with a focus on the nuclear ribosomal cistron. Systematic biology 63: 1–16.

Pritchard, J.K., Stephens, M., Donnelly, P. 2000. Inference of population structure using multilocus genotype data. Genetics 155:945-959.

Provvidenti, R. 1990. Viral diseases and genetic sources of resistance in Cucurbita species. Biology and Utilization of the Cucurbitaceae (DM Bates, RW Robinson and Ch Jeffrey, eds) Comstock Publ Assoc, Cornell University Press, Ithaca and London. 427-35.

Provvidenti, R., Robinson, R.W., Munger, H.M. 1978. Resistance in feral species to 6 viruses infecting Cucurbita. Plant Disease Reporter. 62(4):326-9.

229

R Core Team 2014. R: A language and environment for statistical computing. R Foundation

Rader, R., Reilly, J., Bartomeus, I., Winfree, R. 2013. Native bees buffer the negative impact of climate warming on pollination of watermelon crops. Global Change Biology. 19(10):3103-10. doi:10.1111/gcb.12264.

Ranwez, V., Harispe, S., Delsuc. F., and Douzery, E.J.P. 2011. MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons. PloS one 6: e22594.

Robinson, R.W. 1999. Rationale and methods for producing hybrid cucurbit seed. Journal of New Seeds 1:1-47.

Robinson, R.W. 2002. Squash and Pumpkin. Geneva, New York: Horticultural Sciences

Robinson, R.W., Decker-Walters, D. 1997. Cucurbits: Cab international.

Rodriguez-Arevalo, I., Mattana, E., Garcia, L., Liu, U., Lira, R., Davila, P., Hudson, A., Pritchard, H.W., Ulian, T. 2017. Conserving seeds of useful wild plants in Mexico: main issues and recommendations. Genetic Resources and Crop Evolution 64 (6), 1141-1190. doi:10.1007/s10722-016-0427-7

Ronquist, F., and Huelsenbeck, J.P. 2003. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

Rosemeyer, M., Wells, B., Zaid, A., editors. 1982. Diseases of the buffalo gourd, Cucurbita foetidissima, in Arizona. Phytopathology; Amer Phytopathological Soc 3340 pilot knob road, st paul, mn 55121.

Sánchez, K.S.V, González Santos, R., Aragón-Cuevas, F. 2015. Community Seed Banks in Mexico. In: Vernooy R, Shrestha P, Sthapit B (eds) Community Seed Banks: Origins, Evolution and Prospects. Routledge, London and New York,

Sanjur, O.I., Piperno, D.R., Andres, T.C., Wessel-Beaver, L. 2002. Phylogenetic relationships among domesticated and wild species of Cucurbita (Cucurbitaceae) inferred from a mitochondrial gene: Implications for crop plant evolution and areas of origin. Proceedings of the National Academy of Sciences of the United States of America 99(1), 535-540.

Sasu, M.A., Ferrari, M.J., Du, D., Winsor, J.A., Stephenson, A.G. 2009. Indirect costs of a nontarget pathogen mitigate the direct benefits of a virus-resistant transgene in wild Cucurbita. Proceedings of the National Academy of Sciences. 106(45), 19067-71.

230

Savage, J.A., Haines, D.F., Holbrook, N.M. 2015. The making of buttercup squash: how selective breeding changed the phloem of Cucurbita maxima from source to sink. Plant Cell and Environment 38, 1543-1554.

Schaefer, H, Heibl, C., Renner, S.S. 2009. Gourds afloat: a dated phylogeny reveals an Asian origin of the gourd family (Cucurbitaceae) and numerous oversea dispersal events. Proceedings of the Royal Society B-Biological Sciences 276(1658), 843- 851.

Schaefer, H., Renner, S.S. 2011. Cucurbitaceae. In: Kubitzki K ed. Flowering Plants. Sapindales, , Myrtaceae. New York, USA: Springer, 112- 174.

Scheerens, J.C., Ralowicz, A.E., McGriff, T.L., Bee, K.A., Nelson, J.M., Gathman, A.C. 1991. Phenotypic variation of agronomic traits among coyote gourd accessions and their progeny. Economic Botany. 45(3), 365-78. doi:10.1007/bf02887078.

Shahani, H., Dollear, F., Markley, K., Quinby, J. 1951. The buffalo gourd, a potential oilseed crop of the southwestern drylands. Journal of the American Oil Chemists’ Society. 28(3), 90-5.

Simmons, M.P., Gatesy, J. 2015. Coalescence vs. concatenation: Sophisticated analyses vs. first principles applied to rooting the angiosperms. Molecular Phylogenetics and Evolution 91, 98-122.

Singh, S.P., Gepts, P., Debouck, D.G. 1991. Races of common bean (, Fabaceae). Economic Botany 45, 379-396.

Small, R.L., R.C. Cronn, and J.F. Wendel. 2004. Use of nuclear genes for phylogeny reconstruction in plants. Australian systematic botany 17, 145–170.

Smith, B.D. 1997. The Initial Domestication of Cucurbita pepo in the Americas 10,000 Years Ago. Science 276(5314), 932-934.

Smith BD. 2001. Documenting plant domestication: the consilience of biological and archaeological approaches. Proceedings of the National Academy of Sciences. 98(4), 1324-6.

Smith, B.D. 2006. Eastern North America as an independent center of plant domestication. Proceedings of the National Academy of Sciences of the United States of America. 103(33), 12223-8. doi:10.1073/pnas.0604335103.

Smith, S.A., Moore, M.J., Brown, J.W., Yang, Y. 2015. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. Bmc Evolutionary Biology 15, 150.

231

Smýkal, P., Kenicer, G., Flavell, A.J., Corander, J., Kosterin, O., Redden, R.J., Ford, R., Coyne, C.J., Maxted, N., Ambrose, M.J., et. al., 2011. Phylogeny, phylogeography and genetic diversity of the Pisum genus. Plant Genetic Resources 9(1), 4-18.

Soltis, P.S., and Soltis, D.E. 2003. Applying the Bootstrap in Phylogeny Reconstruction. Statistical science: a review journal of the Institute of Mathematical Statistics 18, 256–267.

Stamatakis, A. 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22(21), 2688-2690.

Stamatakis, A. 2014. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics 30(9), 1312-1313.

Steane, D.A, Nicolle, D., Sansaloni, C.P., Petroli, C.D., Carling, J., Kilian, A., Myburg, A.A., Grattapaglia D, Vaillancourt RE. 2011. Population genetic analysis and phylogeny reconstruction in Eucalyptus (Myrtaceae) using high-throughput, genome-wide genotyping. Molecular Phylogenetics and Evolution 59(1), 206- 224.

Sun, Y., Moore, M.J., Zhang, S., Soltis, P.S., Soltis, D.E., Zhao, T., Meng, A., Li, X., Li, J., Wang, H. 2016. Phylogenomic and structural analyses of 18 complete plastomes across nearly all families of early-diverging eudicots, including an angiosperm-wide analysis of IR gene content evolution. Molecular Phylogenetics and Evolution 96: 93-101.

Swofford, D.L. 2002. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4.0b10.

Towle, M.A. 1961. The ethnobotany of pre-Columbian Peru. New York, NY: Viking Fund Pub. Anthropol.

Tricoli, D., Carney, K., Russel, P.M. Jr, Groff, D.W., Hadden, K.C., Himmel, P.T., Habbard, J.P., Boeshore, M.l., Reynolds, J.F., Quemada, H.D. 1995. Field evaluation of transgenic squash containing single or multiple virus coat protein gene constructs for resistance to cucumber mosaic virus, watermelon mosaic virus. Nature Biotechnology.13:1458-65.

Untergasser, A., Cutcutache, I., Koressaar, T., Ye, J., Faircloth, B.C., Remm, M., Rozen, S.G. 2012. Primer3--new capabilities and interfaces. Nucleic Acid Research 40(15), 115

232

Uribe-Convers, S., Settles, M.L., and Tank, D.C. 2016. A Phylogenomic Approach Based on PCR Target Enrichment and High Throughput Sequencing: Resolving the Diversity within the South American Species of Bartsia L. (Orobanchaceae). PloS one 11: e0148203.

Walters, T.W., Decker-Walters, D.S. 1993. Systematics of the endangered okeechobee gourd (Cucurbita okeechobeensis Cucurbitaceae). Systematic Botany 18(2), 175-187.

Wang, Y-H. 2012. Mapping and molecular breeding of monogenic traits. In: Wang, Y-H., Behera, T., Kole, C., editors. Genetics, Genomics, and Breeding of Cucurbits. Genetics, Genomics, and Breeding of Crop Plants. Boca Raton, FL: CRC Press.

Ward, D.B., Minno, M.C. 2002. Rediscovery of the Endangered Okeechobee Gourd (Cucurbita okeechobeensis) along the St. Johns River, Florida, Where Last Reported by William Bartram in 1774. Castanea 67(2), 201-206.

Warnow, T. Concatenation Analyses in the Presence of Incomplete Lineage Sorting. 2015. PLOS Currents Tree of Life. Ed. 1 doi: 10.1371/currents.tol.8d41ac0f13d1abedf4c4a59f5d17b1f7.

Waskom, M., Botvinnik, O., Hobson, P., Warmenhoven J., Cole, J.B., Halchenko, Y., Vanderplas, J. et al. 2014. Seaborn: statistical data visualization.

Watanabe, M.E. 2013. Pollinators at Risk: Human activities threaten key species. BioScience.

Watterson, G.A. 1975. Number of segregating sites in genetic models without recombination. Theoretical Population Biology 7:256-276.

Weisrock, D.W., Smith, S.D., Chan, L.M., Biebouw, K., Kappeler, P.M., and Yoder, A.D. 2012. Concatenation and concordance in the reconstruction of mouse lemur phylogeny: an empirical demonstration of the effect of allele sampling in phylogenetics. Molecular biology and evolution 29: 1615–1630.

Wen, Z.X., Boyse, J.F., Song, Q.J., Cregan, P.B., Wang, D.C. 2015. Genomic consequences of selection and genome-wide association mapping in soybean. BMC Genomics 16.

Wessel-Beaver, L. 1998. Sources of whitefly-induced silvering resistance in Cucurbita. In: McCreight J, editor. Cucurbitaceae 98: Evaluation and enhancement of cucurbit germplasm. Alexandria, VA: ASHS Press.

Wessel-Beaver, L. 2000. Cucurbita argyrosperma sets fruit in fields where C. moschata is only source. Cucurbit Genetics Cooperative Report 23: 62-63.

233

Where in the world are GM crops and foods. 2015. GMO Inquiry 2015. Canadian Biotechnology Action Network (CBAN),

Whitaker, T.W., Bemis, W.P. 1964. Evolution in the genus Cucurbita. Evolution 18, 553- 559.

Whitaker, T.W., Davis, G.N. 1962. Cucurbits. Botany, cultivation, and utilization.

Whitaker, T.W., Robinson, R.W. 1986. Squash breeding. In: Bassett MJ, editor. Breeding vegetable crops. Westport,CT.: AVI Publishing Company.

Wiens, J.J. 2006. Missing data and the design of phylogenetic analyses. Journal of Biomedical Informatics 39(1), 34-42.

Wiens, J.J., Kuczynski, C.A., Townsend, T., Reeder, T.W., Mulcahy, D.G., Sites, J.W. 2010. Combining Phylogenomics and Fossils in Higher-Level Squamate Reptile Phylogeny: Molecular Data Change the Placement of Fossil Taxa. Systematic Biology 59(6), 674-688.

Wiens, J.J., Moen, D.S. 2008. Missing data and the accuracy of Bayesian phylogenetics. Journal of Systematics and Evolution 46(3), 307-314.

Wiens, J.J., Morrill, M.C. 2011. Missing Data in Phylogenetic Analysis: Reconciling Results from Simulations and Empirical Data. Systematic Biology 60(5), 719-731.

Wilkes, H.G. 1977. Hybridization of maize and teosinte in Mexico and Guatemala and improvement of maize. Economic Botany 31(3), 254-293.

Williams, E.W., Gardner, E.M., Harris 3rd, R., Chaveerach, A., Pereira, J.T., and Zerega, N.J.C. 2017. Out of Borneo: biogeography, phylogeny and divergence date estimates of Artocarpus (Moraceae). Annals of botany 119, 611–627.

Wilson, H.D., Lira, R., Rodríguez, I. 1994. Crop/Weed gene flow: Cucurbita argyrosperma Huber and C. fraterna LH Bailey (Cucurbitaceae). Economic Botany: 48(3), 293-300.

Xu, S.Z. 2000. Phylogenetic analysis under reticulate evolution. Mol Biol Evol 17(6), 897-907.

Yang, S-L., Walters, T.W. 1992. Ethnobotany and the economic role of the Cucurbitaceae of China. Economic Botany: 46(4), 349-67.

Zerega, N.J.C., Supardi, M.N.N., and Motley, T.J. 2010. Phylogeny and Recircumscription of Artocarpeae (Moraceae) with a Focus on Artocarpus. Systematic botany 35, 766–782.

234

Zeven, A.C. 1998. Landraces: A review of definitions and classifications. Euphytica 104, 127-139.

Zhao, D., Wen, L., Bi, H.W., Zhu, Z.C., Liu, J.H., Zhang, J.M., Shi, Q.X., You, H.B., Dong, D.J., Liu, Q. 2017. Genetic diversity of Cucurbita maxima assessed using morphological characteristics and random-amplified polymorphic DNA markers in China. Acta Agriculturae Scandinavica Section B-Soil and Plant Science 67, 155- 163.

Zheng, X.W., Levine, D., Shen, J., Gogarten, S.M., Laurie, C., Weir, B.S. 2012. A high- performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326-3328.

Zheng, Y-H., Alverson, A.J., Wang, Q-F., Palmer, J.D. 2013. Chloroplast phylogeny of Cucurbita: Evolution of the domesticated and wild species. Journal of Systematics and Evolution 51(3), 326-334.

Zhenxiang, X., Liang, L., Charles, C.D. 2015. The Impact of Missing Data on Species Tree Estimation. Mol. Biol. Evol. 33 (3), 838-860.

Zhu, Q.H., Zheng, X.M., Luo, J.C., Gaut, B.S., Ge, S. 2007. Multilocus analysis of nucleotide variation of Oryza sativa and its wild relatives: Severe bottleneck during domestication of rice. Molecular Biology and Evolution 24, 875-888.

235

BIOGRAPHICAL SKETCH

Heather Rose Kates earned her B.A. in biology from Oberlin College in Oberlin,

Ohio in 2011. Under the mentorship of Oberlin associate professor Dr. Michael J.

Moore, Heather began her academic career in plant phylogenetics, studying the evolution of edaphic endemic mustards. During her time as an undergraduate, Dr. Kates also worked as a researcher in the labs of Dr. Krissa Skogen of the Chicago Botanic

Garden and Dr. Peter Fritsch at the California Academy of Sciences. Dr. Kates began her doctoral work at the University of Florida in the fall of 2011 as a rotation student in the Genetics and Genomics graduate program and undertook research projects under the mentorship of Drs. Kevin Folta, Matthew Gitzendanner, Douglas E. Soltis, and

Pamela S. Soltis. In the fall of 2012, Dr. Kates began her dissertation work under the direction of Drs. Douglas E. Soltis and Pamela S. Soltis, and graduated in the fall of

2017.

236