FACULTY OF SCIENCE UNIVERSITY OF COPENHAGEN

Master’s thesis

Genomic Evolution of Vitis vinifera

Insights into the history of viniculture from ancient DNA

Master thesis by Anne Kathrine Wiborg Runge


Academic advisors: Post Doc. Nathan Wales. Professor Tom Gilbert

Submitted 31.08.2015


Institute name: Natural History Museum of Denmark

Name of department: Centre for Geogenetics

Author: Anne Kathrine Wiborg Runge

Title: Genomic Evolution of Vitis vinifera – a palaeogenomic approach

Subject description: This MSc thesis presents the results of the first in-solution target capture on ancient and historic Vitis vinifera subsp. vinifera seeds and explores their potential in ancient DNA studies. An enrichment of up to 3.4-fold was observed in bioinformatically useful reads in post-capture libraries. Berry colour was inferred for 18 samples from the Roman and Medieval periods. Furthermore, a tentative assignment of sex genotype was made for seven berries. Evidence of a genetic displacement in cultivars through time was found at one site on the coast of the Mediterranean Basin, matching the changes in human habitation in the area. Finally, a provisional identification of two Pinot blanc and two Chardonnay cultivars was made. In all, this study proves the effectiveness of target enrichment of ancient plant DNA, and demonstrates the advantages of using ancient and historic seeds to study domestication in plants.

Academic advisors: Post Doc. Nathan Wales. Centre for GeoGenetics Natural History Museum of Denmark University of Copenhagen

Professor Tom Gilbert

Centre for GeoGenetics Natural History Museum of Denmark University of Copenhagen

Submitted: 31st August 2015

Cover illustration: photo by Anne Kathrine Wiborg Runge, Montpellier (2015)

Table of contents

PREFACE AND ACKNOWLEDGEMENTS
SUMMARY
INTRODUCTION
  A REVIEW OF THE HISTORY OF ANCIENT DNA RESEARCH
    Introducing aDNA
    The methodological progress in aDNA research
    Ancient plants
  GRAPES
    Domestication history
    Domestication traits
METHODS
  SAMPLES
  EXTRACTION AND LIBRARY BUILD
  CAPTURE
  DATA ANALYSIS
RESULTS
  TARGET ENRICHMENT
  SEX DETERMINATION
  BERRY COLOUR
  POPULATION STRUCTURE
DISCUSSION
  TARGET ENRICHMENT
  GRAPE SEX
  BERRY COLOUR
  POPULATION STRUCTURE
CONCLUDING REMARKS
REFERENCES
SUPPLEMENTARY MATERIAL
  SUPPLEMENTARY FIGURES

Preface and acknowledgements

This M.Sc. thesis was conducted at the Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, and was supervised by Nathan Wales and Tom Gilbert.

I would like to thank Laurent Bouby for providing the grape seeds used in the experiments described below, and Roberto Bacilieri and the GrapeReSeq project for providing the SNPs used in the capture experiment, access to their database on modern cultivars, and valuable discussion on every aspect of grapes. In addition, their hospitality during my visit to Montpellier was greatly appreciated. I would like to thank my supervisors for being dedicated and inspiring, for their help and guidance, and for constructive discussions during the process.

I would like to express my gratitude to Jazmín Ramos Madrigal for the help she gave me with the bioinformatics, her patience in teaching me how to work in the terminal, and the tremendous amount of work she put into the analytical phase. I would also like to thank Jose Alfredo Samaniego for designing the RNA baits used in the capture experiment, and everybody at the Centre for GeoGenetics and the Danish National High-throughput DNA Sequencing Centre for their help and guidance.

Summary

This MSc thesis presents the results of the first in-solution target capture on ancient and historic Vitis vinifera subsp. vinifera seeds, and explores their potential in ancient DNA studies. An enrichment of up to 3.4-fold was observed in bioinformatically useful reads in post-capture libraries. Berry colour was inferred for 18 samples from the Roman and Medieval periods. Furthermore, a tentative assignment of sex genotype was made for seven berries. Evidence of a genetic displacement in cultivars over a 1,000-year period was found, which matches the changes in human habitation in the area. Finally, a provisional identification of two Pinot blanc and two Chardonnay cultivars was made. In all, this study proves the effectiveness of target capture in enriching for ancient plant DNA, and demonstrates the advantages of using ancient and historic grape seeds to study domestication in plants.

Introduction

This study has four main goals: 1) to determine the benefits of using an array-based in-solution capture protocol on waterlogged ancient and historic Vitis seeds; 2) to investigate the possibility of determining sex genotypes in ancient and historic grape pips, and thereby infer the presence of hermaphroditic cultivated grapes and admixture with the wild ancestor; 3) to determine the berry colours present in Roman and Medieval France, and thereby when the white phenotype occurred; and 4) to investigate the population structure of the ancient samples and, through a comparison with modern cultivars, attempt to determine which varieties might have been grown in France during the Roman period and the Middle Ages.

A review of the history of ancient DNA research

Introducing aDNA

The field of ancient DNA (aDNA) concerns itself with the molecular study of archaeological and paleontological specimens, where DNA can be extracted from a wide range of well-preserved tissues. To date, studies have been carried out on sample types ranging from ancient bone (Prüfer et al., 2014), hair (Rasmussen et al., 2010), and dental calculus (Warinner et al., 2014), to coprolites (Wood et al., 2012), seeds (Manen et al., 2003, Cappellini et al., 2010), cobs (da Fonseca et al., 2015), sediments (Willerslev et al., 2003, Willerslev et al., 2007, Anderson-Carpenter et al., 2011), and many more in between. Ancient specimens generally contain low amounts of endogenous DNA, which consists of short, highly fragmented molecules that exhibit characteristic DNA damage patterns. Many aDNA samples are hindered by inhibition during PCR. In addition, exogenous contamination from microbial sources, like bacteria and fungi that have colonised the samples since their deposition, usually far exceeds the endogenous content of the target organism (Pääbo, 1989, Hofreiter et al., 2001, Willerslev and Cooper, 2005, Orlando et al., 2015).

DNA is an unstable molecule, but in living cells repair pathways are in place to quickly mend any damage to the DNA strands (Lindahl, 1993). When the cell dies these mechanisms are no longer active, and under normal circumstances the genetic material is rapidly degraded by endonucleases. Under certain conditions, however, the endonuclease activity can be slowed down or halted completely (Hofreiter et al., 2001). Constant freezing temperatures, as found in permafrost and ice cores, are ideal for long-term conservation of genetic material, and it has been estimated that DNA molecules can survive up to 1 million years in these environments (Willerslev and Cooper, 2005, Orlando et al., 2015). Rapid and continued desiccation, waterlogging and high salinity have also proven well suited for long-term DNA storage (Pääbo, 1989, Lindahl, 1993, Orlando et al., 2002, Willerslev and Cooper, 2005, Anderson-Carpenter et al., 2011), and DNA survival times have been estimated at around 500,000 years in arid, temperate caves (Orlando et al., 2015). Even under these preservation conditions, spontaneous processes like hydrolysis and oxidation limit the longevity of genetic material (Lindahl, 1993, Hofreiter et al., 2001). The most characteristic form of damage observed in aDNA sequences is C→T transitions, corresponding to cytosine-to-uracil deamination (Orlando et al., 2015). Single-stranded DNA is far more susceptible to decay than double-stranded DNA (Lindahl, 1993), which is why mutations accumulate at the termini, where fragmentation of the DNA molecule has produced single-stranded overhangs (Meyer et al., 2012). DNA fragmentation is another characteristic trait of ancient specimens. A purine base is often found before single-strand breaks or abasic sites, and depurination therefore appears to be the most common driver of post-mortem DNA fragmentation (Orlando et al., 2015).
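The accumulation of C→T substitutions towards the read termini can be illustrated with a short calculation. The following is only a minimal sketch, not the procedure used in this thesis: it assumes gap-free read-to-reference alignments supplied as plain strings, and the function name `c_to_t_frequency` and the toy sequences are hypothetical, invented for illustration.

```python
from collections import Counter


def c_to_t_frequency(pairs, max_pos=5):
    """Fraction of reference C sites read as T, per distance from the 5' end.

    `pairs` is an iterable of (reference, read) strings of equal length,
    i.e. gap-free alignments (a simplifying assumption of this sketch).
    """
    c_sites = Counter()   # reference C sites seen at each position
    c_to_t = Counter()    # of those, how many were read as T
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= max_pos:
                break
            if r == "C":
                c_sites[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    # Positions with no reference C are reported as 0.0.
    return [c_to_t[p] / c_sites[p] if c_sites[p] else 0.0
            for p in range(max_pos)]


# Toy alignments, invented for illustration only.
toy_pairs = [
    ("CACGTACGTA", "TACGTACGTA"),  # C->T at position 0
    ("CCAGTTCGAA", "TCAGTTCGAA"),  # C->T at position 0
    ("CGCATTAGCA", "CGCATTAGCA"),  # no mismatch
    ("ACCGTAGCTA", "ACCGTAGCTA"),  # no mismatch
]
profile = c_to_t_frequency(toy_pairs, max_pos=3)
```

In authentic aDNA the resulting profile would be expected to be highest at the first position and to decline towards the interior of the read, whereas modern contaminant DNA would show a flat profile.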

The methodological progress in aDNA research

Since the first study on aDNA was published in 1984 (Higuchi et al., 1984) the field of ancient DNA has evolved with the invention of new and more sophisticated methodologies and technology. The advent of high-throughput sequencing (HTS) and the ensuing ability to sequence whole ancient genomes (Poinar et al., 2006, Rasmussen et al., 2010, Meyer et al., 2012) has especially driven the field forward (Orlando et al., 2015).

aDNA before high-throughput sequencing

The earliest studies to explore aDNA used bacterial cloning to obtain short mitochondrial DNA sequences from the dried muscle of an extinct equid, the quagga, (Higuchi et al., 1984) and the skin of desiccated Egyptian mummies (Pääbo, 1985). These studies found that the majority of DNA in ancient samples came from microbial sources while the endogenous DNA was only present in low concentrations and consisted of short, damaged fragments (Higuchi et al., 1984, Pääbo, 1985, Pääbo, 1989, Willerslev and Cooper, 2005).

With the invention of the polymerase chain reaction (PCR) it became possible to obtain millions of copies from as little as one surviving DNA molecule (Pääbo, 1989, Willerslev and Cooper, 2005). Traditional PCR targets a single, specific locus by the annealing of primers designed to fit the region of interest. However, while the sensitivity of PCR was a vast advantage to the field of ancient DNA, it also proved how vulnerable the material was to contamination during handling and in the laboratory from modern sources of DNA, contaminated reagents, and previously amplified PCR products (Gilbert et al., 2005, Willerslev and Cooper, 2005, Orlando et al., 2015). Furthermore, PCR primers may preferentially bind to modern DNA rather than aDNA due to damage and/or mutations that weaken the bond between the ancient specimen and the primer (Pääbo and Wilson, 1988, Pääbo et al., 1989). As a consequence, a number of false-positive results were published and severely compromised the credibility of the discipline (Hofreiter et al., 2001, Willerslev and Cooper, 2005).

To minimise contamination, dedicated aDNA facilities were implemented that were isolated from laboratories where PCR took place, with one-way movement of personnel, samples and reagents up the concentration gradient, along with measures like UV irradiation systems and thorough cleaning policies. In addition, with a focus on authentication of the results through steps like cloning and independent replication in a separate laboratory, and with attention to the susceptibility of a sample to produce false positives (Kwok and Higuchi, 1989, Cooper and Poinar, 2000, Gilbert et al., 2005), the field of ancient DNA finally emerged as a viable scientific discipline in the mid-2000s (Willerslev and Cooper, 2005, Orlando et al., 2015).

Most PCR-based studies focused on mitochondrial and chloroplast aDNA (Willerslev and Cooper, 2005) because these organelles occur in multiple copies per cell and therefore had a higher probability of containing surviving genetic material. In contrast, the nuclear genome is generally present only once per cell, which made recovery of nuclear loci more challenging (Pääbo, 1989, Hofreiter et al., 2001), but not impossible (Poinar et al., 2003). The ability to generate data from ancient specimens created opportunities to answer new research questions and provided fresh perspectives to complement established disciplines like archaeology, paleoecology and palaeontology. For instance, new insights could be offered into the study of phylogenetics, as in Thomas et al. (1989), who resolved the debate over the phylogeny of an extinct marsupial wolf by using mitochondrial DNA to assign it as a close relative of Australian rather than South American carnivorous marsupials, while Poinar et al. (2003) used single-copy nuclear DNA obtained from coprolites of the extinct Shasta sloth to infer phylogenetic history. It became possible to study the population genetics and demographic histories of extinct species (Orlando et al., 2002, Shapiro et al., 2004), as well as extant species, which was particularly relevant for taxa that had gone through recent bottlenecks (Weber et al., 2000). This also created the opportunity to investigate the influence of climate change on fluctuations in population sizes and genetic diversity through time (Orlando et al., 2002, Shapiro et al., 2004). New understanding of megafauna diets was obtained from coprolites from a range of species like extinct ground sloths (Poinar et al., 1998, Hofreiter et al., 2000), ancient humans (Hofreiter et al., 2001), and moa (Wood et al., 2012). These studies, along with analyses of sediments (Willerslev et al., 2003) and basal sections of deep ice cores (Willerslev et al., 2007), also yielded information on ecosystem composition and climatic effects through time.

In 2001 the field of ancient DNA entered the genomic age with the publication of the first complete ancient mitochondrial genomes of two extinct moa species (Cooper et al., 2001). A few years later the reconstructed sequence of the 1918 Spanish Influenza virus (Tumpey et al., 2005), and the mammoth (Rogaev et al., 2006) and cave bear (Bon et al., 2008) mitochondrial genomes followed. Ancient genomics, however, was constrained by the low DNA content of even the best-preserved ancient specimens. Given the limited availability of well-preserved ancient samples, the amount of material required for whole-genome sequencing with the techniques available at the time made it unrealistic to obtain any but the smallest genomes. In addition, the cost as well as the amount of time required for laboratory processing and sequencing further limited the field (Der Sarkissian et al., 2015).

aDNA in the high-throughput sequencing era

The field of ancient DNA was revolutionised by the invention of HTS (also known as next-generation or second-generation sequencing) platforms, and it quickly became the main driving force behind the study of ancient genomics (Knapp and Hofreiter, 2010). The term ‘HTS’ was coined in 1998 when the first 96-capillary sequencers, capable of Sanger sequencing 96 samples in parallel, were launched. However, as these systems relied on first-generation sequencing technology, they could not compete with the next generation of DNA sequencers that began to emerge from the mid-2000s and completely redefined the concept of high throughput (Kircher and Kelso, 2010).

Following the invention of emulsion PCR, where each DNA library molecule is separately amplified in a water-oil emulsion droplet (Dressman et al., 2003), the 454 platform, performing sequencing by synthesis based on a pyrosequencing protocol, was developed. At a fraction of the cost and time, this platform achieved a 100-fold higher throughput than the Sanger-based capillary sequencing, which was the standard of DNA sequencing at the time (Margulies et al., 2005). Furthermore, it allowed single research groups to generate large amounts of data where previously genome projects were only done in collaboration between several groups or at large sequencing centres (Kircher and Kelso, 2010, Der Sarkissian et al., 2015). For the field of ancient DNA this marked the beginning of a new era where characterisation of paleogenomes became attainable. The first study to explore the new technology was published only a few months later, and used the 454 platform to generate 13 million base pairs (bp) of mammoth DNA from a 28,000-year-old bone. In a comparison between these data and the sequences for the African elephant, the phylogeny of the two species was investigated, and the time of divergence confirmed fossil-based estimates (Poinar et al., 2006). With the subsequent discovery of hair as a source of well-preserved aDNA that, unlike many other tissues, could be subjected to efficient decontamination (Der Sarkissian et al., 2015), Miller et al. (2008) published the first draft of the woolly mammoth genome, recovering 70 % of the genomic sequence.

The continuous endeavour to reduce cost and time per run while increasing throughput led to the invention of the Illumina platform, which performs sequencing by synthesis using reversible terminator technology and thereby achieves even higher throughput than all earlier platforms (Bentley et al., 2008). The Illumina Genome Analyser II platform was capable of generating 180 million reads per run (Der Sarkissian et al., 2015), and was used to sequence the first ancient human genome. With a coverage of 20x over 78% of the genome, the study of a 4,000-year-old, extinct palaeo-Eskimo was ground-breaking in several ways, including providing information on several phenotypic traits like blood type and hair colour, offering insight into the colonization of Greenland, and demonstrating the ability to study ancient human genomes despite potential contamination from archaeologists and laboratory personnel (Rasmussen et al., 2010).

Current HTS platforms require modifications of the DNA template by the ligation of adapters during the construction of sequencing libraries (Der Sarkissian et al., 2015, Orlando et al., 2015). Originally, double-stranded DNA libraries were end-repaired and then either fitted with blunt-end adapters (Briggs et al., 2007) or A-tailed by addition of an adenine base at the 3’ end of the DNA strands, followed by ligation of T-tailed adapters complementary to the A-tails. The A-tailing approach, however, has been shown to be suboptimal for aDNA (Orlando et al., 2015). In both cases, single-strand breaks are lost in the subsequent fill-in reaction. To counter this, single-stranded DNA library preparation was developed specifically for aDNA. By separating the DNA double-strands prior to adaptor ligation, the complementary strand no longer holds single-strand breaks together, and each 3’ end can then be used for priming an extension (Meyer et al., 2012, Gansauge and Meyer, 2013). Single-stranded DNA libraries were used to obtain the 30x high-quality genome of a Denisovan individual, which revealed low levels of heterozygosity consistent with a small population size, and offered new insights into human evolution (Meyer et al., 2012). The same method was used to produce a high-quality Neanderthal genome, which revealed gene flow between Neanderthals and modern humans, and offered insights into the population structure of the Neanderthals inhabiting the Altai Mountains (Prüfer et al., 2014).

DNA enrichment strategies

The metagenomic nature of DNA recovered from most ancient specimens usually means that the majority of the reads generated by shotgun sequencing are unrelated to the organism of interest. As a result, the sequencing of ancient genomes is often inefficient and expensive. Several DNA enrichment methods have been invented to circumvent this problem and present relatively inexpensive alternatives for increasing the fraction of endogenous DNA while generating high-quality data for specific targets (Ávila-Arcos et al., 2011). However, current enrichment protocols are all limited by their inability to recover all targets, which in turn reduces the complexity of the library (Der Sarkissian et al., 2015, Orlando et al., 2015).
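The cost argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes that mean depth equals endogenous reads times read length divided by genome size, and it ignores duplicates, unmappable reads and uneven coverage. The function names are hypothetical, and the 1% endogenous fraction, 50 bp read length and 3.4-fold enrichment are illustrative inputs rather than measured values.

```python
import math


def reads_required(genome_size_bp, target_depth, read_length_bp,
                   endogenous_fraction):
    """Approximate number of sequenced reads needed for a given mean depth.

    Assumes mean depth = endogenous reads * read length / genome size,
    with no duplicates or mapping losses (a deliberate simplification).
    """
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    endogenous_reads = target_depth * genome_size_bp / read_length_bp
    return endogenous_reads / endogenous_fraction


def fold_enrichment(fraction_before, fraction_after):
    """Ratio of the endogenous (or on-target) fraction after and before capture."""
    return fraction_after / fraction_before


# Illustrative inputs only: a 500 Mb genome, 1x depth, 50 bp reads.
shotgun = reads_required(500e6, 1, 50, endogenous_fraction=0.01)
captured = reads_required(500e6, 1, 50, endogenous_fraction=0.034)
saving = shotgun / captured  # equals the fold enrichment under these assumptions
```

Under these assumptions the sequencing effort scales inversely with the endogenous fraction, so any fold enrichment translates directly into the same fold reduction in reads required, as long as library complexity is not exhausted.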

The first enrichment method that became available was primer extension capture, which used biotinylated PCR primers to capture targeted DNA fragments (Briggs et al., 2009). Complete mitochondrial genomes from five Neanderthals were recovered this way. Primer extension capture has since been replaced by more sophisticated methods and is no longer used (Der Sarkissian et al., 2015, Orlando et al., 2015).

Currently the most widely used methods for aDNA enrichment are based on in-solution target enrichment where biotinylated baits are prepared from modern DNA extracts (Maricic et al., 2010) or commercially manufactured based on known sequences (Ávila-Arcos et al., 2011). By targeting complementary library molecules, these approaches are generally well suited for enriching smaller regions like whole mitochondrial genomes or shorter nuclear loci (Orlando et al., 2015). If a microarray-based hybridisation protocol (Burbano et al., 2010) is adopted, targets may be much larger (Orlando et al., 2015). In a study of a 40,000-year-old ancient human, Fu et al. (2013) captured the mitochondrial genome and the entire chromosome 21 in order to determine the ancestry of the individual. Although the baits are designed based on modern DNA, they may deviate from the target sequence by 10-13%, which is a great benefit when working on ancient DNA and extinct species where a closely related genome is not obtainable from modern sources (Orlando et al., 2015).
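The 10-13% tolerance can be read as a simple mismatch percentage between a bait and its target. The sketch below is a rough illustration only: it compares equal-length, ungapped sequences, whereas real hybridisation depends on temperature, salt and sequence composition rather than a fixed cut-off. The function names, the 13% threshold used as a default and the toy sequences are all assumptions made for illustration.

```python
def percent_divergence(bait, target):
    """Percentage of mismatching positions between two equal-length sequences."""
    if len(bait) != len(target):
        raise ValueError("sequences must have equal length (ungapped comparison)")
    if not bait:
        raise ValueError("sequences must not be empty")
    mismatches = sum(b != t for b, t in zip(bait.upper(), target.upper()))
    return 100.0 * mismatches / len(bait)


def within_tolerance(bait, target, max_divergence_pct=13.0):
    """True if the bait deviates from the target by at most the given percentage."""
    return percent_divergence(bait, target) <= max_divergence_pct


# Toy 20 bp sequences, invented for illustration only.
bait = "ACGTACGTACGTACGTACGT"
target_close = "ACGTACGAACGTACGTACCT"  # 2 mismatches
target_far = "TCGTACGAACGTTCGTACCT"    # 4 mismatches

close_pct = percent_divergence(bait, target_close)
far_pct = percent_divergence(bait, target_far)
```

With these toy inputs the first target falls inside the assumed tolerance and the second outside it, which mirrors why baits designed from a related modern genome can still recover moderately diverged ancient sequences.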

A different strategy of enriching for aDNA is through the selective targeting of the deoxyuracils produced by post-mortem cytosine→uracil deamination. This method was developed to be incorporated into single-stranded DNA library preparation (Gansauge and Meyer, 2013). By treatment with uracil DNA glycosylase and endonuclease VIII, and subsequent primer extension with a strand-displacing Bst polymerase, the fraction of the library that originally contained DNA damage is released into the supernatant by heat denaturation, while the undamaged fraction remains bound to streptavidin-coated paramagnetic beads and can be removed. The method has been demonstrated on Neanderthal extracts contaminated by modern human DNA and showed a post-treatment reduction in contamination from both modern human and microbial sources (Gansauge and Meyer, 2014).

With the growing interest in studying ancient population genomics, whole-genome in-solution capture currently offers the most cost-effective way of obtaining large datasets (Orlando et al., 2015). In this method a probe DNA library is prepared from modern DNA of a closely related species (Carpenter et al., 2013) or commercially manufactured (Ávila-Arcos et al., 2015). By in vitro transcription the library is turned into biotinylated RNA probes. At the same time, an aDNA library is produced. By heat denaturing the aDNA library into single strands, the RNA probes can be annealed to the targets and immobilized on streptavidin-coated paramagnetic beads. The non-target fraction can be washed away, while the targets are retained and can subsequently be eluted from the beads (Carpenter et al., 2013).

A brief review of recent research in aDNA

By far the largest part of ancient genomic studies has been focussed on vertebrates (Gugerli et al., 2005). Ancient anatomically modern humans have received particular attention, and investigations of their genomes have yielded information on human migration and dispersal across the continents (Rasmussen et al., 2011, Raghavan et al., 2014, Rasmussen et al., 2014). Phenotypic traits, like eye and skin colour, have been gleaned, along with new insight into genotypic traits like those for lactose intolerance, pathogen resistance and genetic predispositions to certain diseases (Keller et al., 2012, Olalde et al., 2014). Another focus has been ancient hominins where the Neanderthal genome, first as a draft (Green et al., 2010) and later a high-quality complete genome (Prüfer et al., 2014), in combination with the Denisovan genome (Reich et al., 2010), have provided important clues to human evolution and the admixture between human lineages.

Other vertebrates have also been afforded a lot of attention, and chief among them are large mammals. A study of extinct and extant horse species completely changed the established understanding of equid evolutionary history and divergence (Orlando et al., 2013), while several studies on marine mammals, like whales (Foote et al., 2012, Foote et al., 2013), have vastly improved our knowledge of species that are difficult to study in their natural habitat. The ancestors of domesticated livestock like pigs (Ottoni et al., 2012), and particularly the changes brought about by human selection, have also been of interest.

Several studies have investigated ancient pathogens and particularly those related to humans. Starting with the sequencing of the 1918 Spanish influenza virus (Tumpey et al., 2005), the field has since produced the complete genomes of a wide range of pathogens including bacteria like Yersinia pestis (Bos et al., 2011) and historic Mycobacterium tuberculosis (Bouwman et al., 2012). These have provided novel insights into large-scale epidemics, and the factors that result in high virulence or enhanced susceptibility of the host population to infection.

In contrast, ancient plants have received little attention (Gugerli et al., 2005).

Ancient plants

Plants generally exhibit large, highly heterozygous genomes with large, repetitive regions, and high levels of paralogy, which creates major challenges in genome assembly even for modern samples. In addition, variation in ploidy and whole or partial genome duplications further complicate matters (Hamilton and Robin Buell, 2012). In combination with the inherent challenges associated with aDNA in general, this makes studies of ancient plant genomics particularly challenging (Der Sarkissian et al., 2015). Nevertheless, ancient plants offer intriguing opportunities for historical and population genomic investigations through their abundance in herbaria where they are often supplemented by detailed paleontological records (Wales et al., 2013).

The vast majority of ancient plant studies have been relatively small-scale, with a clear preference for domesticated crop species like maize (Jaenicke-Despres et al., 2003), barley (Palmer et al., 2009) and wheat (Li et al., 2011). The history of domesticated species, their evolution and routes of dispersal is a growing field of interest (Zeder, 2015), and studies on ancient plants are expected to increase in the near future (Brown et al., 2015, Der Sarkissian et al., 2015).

Desiccated plant remains are especially suitable for preservation of aDNA (O'Donoghue et al., 1994), yet few desiccated samples are preserved in archaeological deposits (Brown et al., 2015) and studies on this material have therefore mainly been limited to maize. A recent, large-scale study investigated 348 nuclear genes in maize landraces spanning 6,000 years of evolution (da Fonseca et al., 2015). This study revealed that the diffusion of maize from Mexico most likely occurred by a highland route into the South-western US, and further characterised the genes that were most likely to have influenced the adaptation to different climatic conditions along the way.

Waterlogged remains have received some attention as they too preserve DNA (Brown et al., 2015). A wealth of plant species can be found in wells (Figueiral et al., 2010), and might provide excellent opportunities for studying ancient and historic grape pips. The first study to investigate ancient grapes used microsatellite markers to assign a tentative geographic origin (Manen et al., 2003). Later, aDNA microsatellite loci, and amino acid and protein methodologies were combined to study pips from Italy and England (Cappellini et al., 2010). The results indicated that the English grape remains did not originate from the local area, while the Italian samples had a higher similarity with eastern Mediterranean markers. A year later, a study demonstrated the opportunities of using herbarium specimens to study the history of modern cultivars (Malenica et al., 2011). In combination with detailed information about the sample, microsatellites were used to redefine the history of a modern cultivar.

Grapes

The Eurasian grapevine, Vitis vinifera L., is well suited as a model system for molecular investigation of complex eukaryotic organisms. This is due in particular to its diploid genome, which has a haploid number of 19 chromosomes and a genome size of 475-500 Mb (This et al., 2006) that is relatively small among plant species (in comparison, the maize genome is 2,300 Mb (Schnable et al., 2009)). Furthermore, V. vinifera has shown itself attractive for aDNA studies. Grape seeds, especially from waterlogged conditions, have been shown to preserve genetic material for hundreds and even thousands of years (Manen et al., 2003, Cappellini et al., 2010). Ancient and historic grape specimens thereby provide the opportunity to gain new insights into crop domestication and the selective processes associated with agriculture, as well as offering a new perspective on human culture through time.

As a crop species, the history of grape cultivation and the changes induced by selective breeding are interesting in themselves. However, the most fascinating aspect of the domestication history of Vitis vinifera subsp. vinifera remains the influence it has had on the development of human cultures through millennia. Wine in particular has figured as an essential part of many civilisations. Its disinfectant and analgesic properties made it the most widespread medicinal substance in antiquity (Cavalieri et al., 2003, McGovern, 2003, Legras et al., 2007). Wine also served as a prominent feature in many mythologies; several societies considered the fermented grape juice a drink for the gods, and the Greek god Dionysus is only one among many deities dedicated to or associated with wine. Other cultures regarded wine as the blood of people with special abilities and thought it imbued with mystical properties (McGovern, 2003). The appreciation for wine has lasted through the ages, and cultivated grapes remain the most important horticultural plant species in the world (Mullins et al., 1992, Pretorius, 2000).

The importance of grapes has led to extensive breeding programs for desired traits, like the muscat flavour (Emanuelli et al., 2010) and seedlessness (Perl et al., 1998). Mutations are frequent in grapevine (This et al., 2006), and in combination with local crossings with the wild subspecies, Vitis vinifera subsp. silvestris, and little introgression between pedigrees due to the prevalence of vegetative propagation (Myles et al., 2011), high levels of genetic variation are present in grapes (Lijavetzky et al., 2007). It is likely that the genetic diversity was higher in the past, but European grapes in particular went through a bottleneck in the late 19th century, when phylloxera (Daktulosphaira vitifoliae) was introduced from America with a devastating effect on European vineyards (Mullins et al., 1992). Another, more recent bottleneck has been brought about by the globalisation of markets, and old landraces have been abandoned in favour of commercialised cultivars such as Chardonnay (This et al., 2006).

Grapes are currently experiencing severe pathogen pressure, in particular from downy and powdery mildews, and receive extensive chemical treatment to combat diseases. However, with the growing concerns for the environment and implications for human health, new, sustainable methods are required to achieve resistance (Marguerit et al., 2009, Myles et al., 2011). The grafting with resistant rootstock from Vitis species native to America once saved the European vineyards from phylloxera (Mullins et al., 1992, This et al., 2006). Likewise, the exploration of the high genetic diversity contained within the thousands of known cultivars may lead to new approaches to fighting diseases in grapes (Marguerit et al., 2009, Myles et al., 2011).

Domestication history

Vitis vinifera is the only species of the Vitaceae family that is extensively used for wine making (McGovern, 2003). It comprises two separate subspecies, the wild silvestris and the domesticated vinifera. While silvestris is rare and can be found in South-western Eurasia and Northern Africa (This et al., 2006), humans have introduced cultivation of vinifera to almost every continent (Pretorius, 2000, McGovern, 2003). A marked morphological difference exists between the seeds of the domesticated grapevine and its wild progenitor. Vinifera seeds are generally long and narrow, while silvestris seeds are short and squat. The reason behind this differentiation has so far not been identified, but it is known to have happened early in the domestication process. Indeed, the morphological characteristics of the seeds have been a principal tool in tracking the presence of the domesticated grapevine in many archaeological settings, along with tools used in wine production, storage units, texts and other remains from the past (McGovern, 2003). The Eurasian grapevine is among the earliest domesticated fruit crops in the world (Zohary et al., 2012), and the domestication began approximately 6,000-10,000 years ago (McGovern et al., 1996, McGovern, 2003, This et al., 2006, Bacilieri et al., 2013). Primo-domestication most likely occurred in the Near East, in the area spanning from the Caucasus to the Fertile Crescent (McGovern, 2003, This et al., 2006, Bacilieri et al., 2013). This is supported by the excavation of domesticated grape seeds from Georgian and Turkish sites dating back to approximately 8,000 BP (McGovern, 2003). In addition, the earliest known evidence of wine making, dated to 7400-7000 BP, has been excavated in Iran (McGovern et al., 1996).
Molecular studies have also provided support for an Eastern origin (Aradhya et al., 2003, Myles et al., 2011, Bacilieri et al., 2013).

There has been some dispute regarding the occurrence of one or more secondary domestication events in Western Mediterranean Europe (Grassi et al., 2003, Arroyo‐García et al., 2006, Ucchesu et al., 2015). The hypothesis of secondary domestication is supported by the appearance of primitive cultivars reported from European Bronze Age sites (This et al., 2006, Ucchesu et al., 2015), and by remains of cultivated grape seeds found in excavations of Neolithic sites (This et al., 2006). Furthermore, genetic studies of SSR markers (Grassi et al., 2003) and chlorotypes (Arroyo‐García et al., 2006) in modern populations identified patterns that appear independent from Eastern grapes. However, a recent study of a large, geographically diverse dataset proposed that these patterns are more likely due to large-scale intermixing and exchange of grape varieties through time (Bacilieri et al., 2013).

From the primo-domestication area, grape cultivation spread in three main directions. An eastern route brought the domesticated grapevine into Central, and eventually Eastern, Asia. The dispersal of viticulture around the Mediterranean Basin followed the expansion of the major civilisations (This et al., 2006). A southern route of diffusion brought grape cultivation into Lower Mesopotamia and Egypt, where archaeological artefacts dating as far back as 7,000 BP have been associated with wine storage (McGovern et al., 1996, McGovern, 2003). A northern route brought the domesticated grapevine into Mediterranean Europe, and seeds that are indisputably from cultivated varieties appear in archaeological deposits from the Iron Age (McGovern, 2003). The Romans industriously expanded the range of viticulture, first along major trade routes, and later throughout most of the Roman Empire. By the end of the Roman period, grape cultivation was established as far north as Germany. During the Medieval period, grapes were mainly distributed by the Catholic Church, with the diffusion of the religion into northern Europe and the Crusades to the Holy Land. In the Middle East and North Africa, the distribution of mainly table grapes followed the expansion of Islam (This et al., 2006).

In the 16th century, Old World explorers and colonists began the first expansion of the Eurasian grapevine into territories where it was not indigenous. Grape seeds were brought first to the Americas, later followed by cuttings of favoured European varieties, and vineyards were established in Mexico, Argentina and Peru. A second wave followed with the European colonial expansions, and grape cultivation was introduced to South Africa, India, California, Australia and New Zealand (Pretorius, 2000, This et al., 2006). Viticulture continues to expand today as new areas become available, and while widely marketed varieties are still being planted and maintained, recent years have seen a return of some of the old landraces.

Domestication traits

The thousands of grape cultivars that are recognised today are all products of human-mediated selection processes that have been going on for thousands of years. This is reflected in the assortment of phenotypes that are available today. An adaptation that is presumed to have been selected for early in the history of domestication is the change from dioecy to hermaphroditism, as the ability to self-fertilise greatly enhances the production of berries (McGovern, 2003, This et al., 2006). Another important trait that shows strong signs of selection in molecular studies is the wide variety of berry colours (Myles et al., 2011). Black berries are believed to be the ancestral state, while white berries are derived. The distinction between white and black berries is thought to have been in effect during the Roman period (This et al., 2006). Both of these traits are discussed in more detail below.

Other domestication traits relate to the distinction between wine and table grapes. Wine grapes are generally smaller and more densely clustered on the vine, while table grapes are large. The division between wine and table grapes is believed to be old, and thought to have existed at least during the Roman Era (This et al., 2006). Other traits that appear to have been heavily influenced by domestication include sugar content, bunch structure, acidity, traits relating to fertility and more uniform ripening of the berries (Myles et al., 2011, Bacilieri et al., 2013), along with other traits that are beyond the scope of this thesis.

Simultaneous with the breeding for desired traits in wine grapes, selection also occurred in the yeasts that colonise the grape berry skin. Saccharomyces cerevisiae is the primary driver of the fermentation process, in which sugar is converted to CO2 and ethanol, and is known also from beer production and as a fundamental ingredient in baking. In winemaking, the principal desired traits are a rapid and complete conversion of grape sugars to alcohol, without the production of off-flavours (Pretorius, 2000, Cavalieri et al., 2003, Legras et al., 2007). The natural occurrence of S. cerevisiae on berry skin indicates that the relationship between yeast and winemaking was established early. In a study of residues from Egyptian wine jars dating back to 5200 BP, ribosomal aDNA from S. cerevisiae was successfully sequenced. Although it could not be confirmed that the yeast was responsible for the fermentation of the wine that had once been contained within the jars, its presence implied an early relationship (Cavalieri et al., 2003). The discovery of yeast in the mid-19th century, followed by subsequent innovations, eventually led to the introduction of pure yeast cultures to control the fermentation. This revolutionised wine production by vastly increasing the quantity and quality of the final product (Pretorius, 2000).

Hermaphroditism

Perhaps the most extraordinary domestication trait in grapes is the change from dioecy to hermaphroditism. A majority of flowering plants have hermaphroditic flowers, while monoecious or dioecious flowers are rare (Charlesworth, 2015, Vyskot and Hobza, 2015).

The progenitor of Vitis vinifera subsp silvestris is thought to have been hermaphroditic, while silvestris is dioecious (Oberle, 1938, McGovern, 2003). This in itself is interesting, as only 6 % of flowering plants have male and female flowers on separate individuals (Vyskot and Hobza, 2015). A mutation in male silvestris suppresses the expression of the pistil and induces female sterility by programmed external nucellus and integumentary cell death. In female flowers, a different mutation renders the male organ non-functional through the production of sterile microspores (Caporali et al., 2003). Both male and female flowers retain a hermaphroditic morphology (Oberle, 1938).

Male silvestris have been observed to revert to functional hermaphrodites owing to both genetic and environmental conditions (Negi and Olmo, 1971), while female flowers on occasion produce viable microspores (Oberle, 1938). This may have provided the basis for the human mediated conversion to hermaphroditism in the cultivated form of V. vinifera. Plants that are capable of self-pollination produce more fruit (Bacilieri et al., 2013), which made hermaphroditism an attractive trait in a crop species. In addition, male flowers rarely produce fruit, while the berries of female flowers exhibit variation in taste (McGovern, 2003), both undesired in crop species. Hermaphroditism, therefore, has been strongly selected for and is almost completely fixed within the population of cultivated grapes (Picq et al., 2014, Bacilieri et al., 2013).

Separate sex chromosomes are rare in plants, but several species, like papaya (Liu et al., 2004) and wild strawberry (Spigler et al., 2008), have been found to have a sex locus, where sex-determining genes are situated in close proximity on a chromosome. The presence of a grapevine sex locus with male > hermaphrodite > female dominance was recognised early on (Oberle, 1938), and later located on chromosome 2 (Marguerit et al., 2009, Battilana et al., 2011, Fechter et al., 2012). A recent study found that the grapevine sex locus exhibits the characteristics of an XY inheritance system where females are always homozygotes for the most frequent allele and males always heterozygotes. The authors further narrowed the location of the sex-determining loci to the amplicons VSVV006, VSVV007, VSVV009 and VSVV010, and within these identified a small subset of SNPs that could accurately predict sex genotype in 97 % of the individuals included in the study (Picq et al., 2014).

Figure 1: Four VvmybA1 allelic variants. The black line indicates the genomic sequence. The pink boxes indicate the VvmybA1 gene. Genes that are filled express anthocyanin, while genes that are non-functional are white. The null allele, VvmybA1d, is shown with a line through to indicate the missing gene.

A number of genes have been associated with the grapevine sex locus. Fechter et al. (2012) identified candidate genes, which included several flavin monooxygenases that are hypothesised to be involved in pathogen defence, as well as a putative gene encoding an adenine phosphoribosyltransferase. Other genes include Trehalose-6-phosphate phosphatase, WRKY transcription factor 21, and a gene from the exostosin family, which appear to be associated with the male haplotype (Picq et al., 2014).
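The male > hermaphrodite > female dominance at the sex locus can be written as a small lookup. The following is a minimal Python sketch, assuming that each vine carries two sex-locus haplotypes; the one-letter labels M, H and F and the function name are illustrative only and are not taken from Oberle (1938) or Picq et al. (2014).

```python
# Dominance order at the grapevine sex locus as described in the text:
# male (M) > hermaphrodite (H) > female (F).
# The one-letter haplotype labels are hypothetical, chosen for illustration.
DOMINANCE = ["M", "H", "F"]
NAMES = {"M": "male", "H": "hermaphrodite", "F": "female"}


def sex_phenotype(haplotype_1: str, haplotype_2: str) -> str:
    """Return the expected flower sex for a pair of sex-locus haplotypes."""
    pair = {haplotype_1, haplotype_2}
    if not pair <= set(DOMINANCE):
        raise ValueError(f"unknown haplotype in {sorted(pair)}")
    # The most dominant haplotype present decides the phenotype.
    for haplotype in DOMINANCE:
        if haplotype in pair:
            return NAMES[haplotype]
    raise AssertionError("unreachable")
```

Under this simplification, a female is always homozygous (F/F), while a male carries one M haplotype, which mirrors the XY-like pattern described above.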

Berry Colour

Through millennia of plant breeding, humans have selected for many phenotypic traits, including variations in fruit colour. The biochemical basis for fruit colour is primarily determined by anthocyanin (Allan et al., 2008), a compound in the flavonoid family. This group of ubiquitous secondary metabolites in plants has been attributed a variety of functions, like protection against light stress and attraction during plant reproduction. Anthocyanins are produced through the anthocyanin metabolic pathway (Boss et al., 1996) and accumulate within the tissue or skin of fruits and vegetables, where they play a central role in determining the stage of ripeness and quality of the produce (Boss et al., 1996, Allan et al., 2008).

In the domesticated grapevine, berry colour is among the most important qualities for selective breeding (Azuma et al., 2008), and colour variation has greatly increased through domestication (This et al., 2007). Wild grapevines exclusively yield black berries, while domesticated varieties are phenotypically diverse including the ancestral black fruit, but also derived red, white, and intermediate colours (Walker et al., 2007, Azuma et al., 2008). Black berries accumulate anthocyanins in their skin, while white berries only accumulate proanthocyanidins. Red berries also accumulate anthocyanins, although in lesser quantities than black berries. This correlates with the expression of UDP glucose-flavonoid 3-o-glucosyl transferase (UFGT), the last gene in the anthocyanin pathway, in the skin of black and red berries. White berry skin does not express the gene (Boss et al., 1996), indicating that white berries are unable to produce anthocyanins.

In many plants anthocyanin biosynthesis is controlled by regulatory genes belonging to either the bHLH or the MYB transcription factor families (Azuma et al., 2008). The MYB superfamily comprises a diverse group of transcription factors found throughout eukaryotes and has been attributed numerous functions specific to plants (Ambawat et al., 2013). This family of genes has been implicated in anthocyanin synthesis in maize, snapdragon, petunia, and soybean (Quattrocchio et al., 1993, Du et al., 2012), as well as Vitis species (Kobayashi et al., 2002, Kobayashi et al., 2005). In vinifera, several studies have found that the majority of berry colour variation can be explained by polymorphisms in the MybA genes (Zohary and Spiegel-Roy, 1975, This et al., 2006, Azuma et al., 2008, This et al., 2007, Lijavetzky et al., 2006). These four genes are clustered close together on a single locus on chromosome 2 (Zohary and Spiegel-Roy, 1975, Allan et al., 2008, Shimazaki et al., 2011, Fournier-Level et al., 2010). VvmybA1, VvmybA2 and VvmybA3 have been shown to be more similar to each other at the nucleotide level than the more distantly related VvmybA4 gene (Walker et al., 2007). Some of the genetic variation in VvmybA1 alleles is shown in Figure 1.

The VvmybA1 gene has been the single most studied gene responsible for berry colour variation. The black allele of VvmybA1 (VvmybA1c, Fig. 1) is found in wild grapes and is likely the ancestral form (Yakushiji et al., 2006). Although it was not the first version of the gene to be characterised by researchers, hence the “c” allele, it is associated with a fully functional anthocyanin metabolic pathway, leading to anthocyanin accumulation in the berry skins.

The white berry phenotype is recessive (This et al., 2007) and is caused by mutations in VvmybA1 and VvmybA2 that disrupt the anthocyanin metabolic pathway (Walker et al., 2006, This et al., 2006, Kobayashi et al., 2004, Allan et al., 2008). A cultivar that produces white berries therefore has to be homozygous for the non-functional allele in both genes (Vezzulli et al., 2012). The white allele of VvmybA1 is not transcribed due either to the deletion of the whole gene (VvmybA1d in Fig. 1; Yakushiji et al., 2006) or to the insertion of a transposable element, Gret1, in the promoter region of the gene (VvmybA1a in Fig. 1; Kobayashi et al., 2004, This et al., 2007). This et al. (2007) found a strong association between the VvmybA1a genotype and the white phenotype in over 200 cultivated grape varieties, where 81 out of 84 of the white-skinned grapes contained the Gret1 insert. They also observed that the Gret1 insert was present at the same location in all 81 accessions. Coupled with a low level of sequence diversity across all tested VvmybA1a sequences, this led them to hypothesise that the allele arose only once or a limited number of times (This et al., 2007).

Figure 2: Geographic placement of the archaeological dig sites from where the samples of this study were collected. Map modified from Google Maps (https://www.google.dk/maps/place/Frankrig).

There is currently no explanation as to why cultivars that have functional mybA genes might bear white berries. However, an untested hypothesis is that in these varieties, bHLH, which also controls anthocyanin biosynthesis (Azuma et al., 2008), or UFGT, the enzyme that converts proanthocyanidins into anthocyanins (Boss et al., 1996), has been rendered non-functional by mutations.

Red-skinned grapes have arisen multiple times over the course of cultivation, and have varying genetic bases. Some varieties of red-skinned grapes, like Ruby Okuyama and Flame Muscat, are descended from white-skinned cultivars containing the Gret1 retrotransposon. Cellular mechanisms excised a majority of the transposable element, but left a portion of the 3’ Long Terminal Repeat (LTR) and the duplicated target site, giving rise to an 800 bp insertion in the promoter region of the allele (VvmybA1b, Fig. 1; Kobayashi et al., 2004). Shimazaki et al. (2011) identified a 33 bp insert in the second intron of VvmybA1 that was characteristic of all Asian pink-skinned cultivars. A different 33 bp insertion in the second intron of European grapes was described by Lijavetzky et al. (2006), along with several smaller inserts in the promoter, including a 111 bp and a 44 bp indel (Shimazaki et al., 2011, Lijavetzky et al., 2006), and two point mutations in the first exon (This et al., 2007). An additional deletion of the last part of the VvmybA1 gene has also been observed (Lijavetzky et al., 2006). All of the aforementioned mutations alter gene expression, thereby reducing the final anthocyanin production and causing red berry skins (This et al., 2007, Kobayashi et al., 2004).

Grapes with black berries can be either homozygous for the black allele or heterozygous for the black and white alleles. The same pattern is observed in varieties with red berries, while white berries are always homozygous for the white allele (Azuma et al., 2008). This indicates a relationship of black > red/pink > white dominance in grapes.
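The black > red/pink > white relationship can be illustrated with a short Python sketch. It is a deliberate simplification that considers only the four VvmybA1 alleles named in Figure 1, treats VvmybA1b as the single representative of the red alleles, and ignores VvmybA2 and the other mutations described above; the dictionary and function names are illustrative assumptions.

```python
# VvmybA1 alleles named as in Figure 1; colour classes follow the text.
# Simplification: VvmybA2 and the other red-associated mutations are ignored.
ALLELE_COLOUR = {
    "a": "white",  # Gret1 insertion in the promoter, allele not transcribed
    "b": "red",    # partial Gret1 excision, reduced expression
    "c": "black",  # functional allele found in wild grapes
    "d": "white",  # whole-gene deletion (null allele)
}
RANK = {"black": 2, "red": 1, "white": 0}  # black > red/pink > white


def berry_colour(allele_1: str, allele_2: str) -> str:
    """Return the expected berry colour for a pair of VvmybA1 alleles."""
    colours = [ALLELE_COLOUR[allele_1], ALLELE_COLOUR[allele_2]]
    # The colour of the more dominant allele is expressed.
    return max(colours, key=RANK.__getitem__)
```

For example, `berry_colour("c", "a")` describes a black-berried vine that is heterozygous for the black and white alleles, whereas only pairs drawn from "a" and "d" give white berries.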

In contrast to the wealth of research on the VvmybA1 gene, much less is known about the other three genes in the MYB family in Vitis. Studies of VvmybA2 have found that this gene also plays a role in berry colour determination. The research effort has mainly been focussed on the white allele, which may be caused either by a complete deletion of the gene (Walker et al., 2007) or by two mutations in the coding region (Allan et al., 2008).

Most studies that mention VvmybA3 only note the presence of the gene. One study hypothesised that colour restoration in the Asian Benikata grape was caused by homologous recombination between VvmybA1 and VvmybA3 (Azuma et al., 2009); however, Walker et al. (2007) concluded that it plays a very minor role, if any at all. VvmybA4 is less similar to the other three myb genes. It is of a similar length to VvmybA1, but because of a frame shift mutation, it is thought to only encode 20 bp (Walker et al., 2007). To my current knowledge it has not been characterised further.


Methods

Samples

Ancient and historic (henceforth referred to as ancient) Vitis seeds were collected from 10 archaeological sites in France: three sites in the temperate part of France (Horbourg-Wihr, Madeleine, and Colletière), and seven sites in the French Mediterranean area, primarily in the Languedoc region (Fig. 2). The samples were found in waterlogged conditions from excavations of wells, pits and ditches. The majority of samples were dated based on the presence of archaeological artefacts, and ranged in age from Iron Age to late Medieval/ early Modern. A more detailed overview of the archaeological context of the different sites can be found in Table 1.

In the field, sediments were systematically collected by archaeologists or archaeobotanists from Inrap (Institut National pour la Recherche en Archéologie Préventive) in Montpellier, France. Layers were sampled based on visible grape pips or knowledge pertaining to the presence of grape pips obtained from previous sievings. In most cases, the sediment was immediately isolated to prevent contamination, and stored in cool conditions. Later, the sediments were processed in a clean room that was not connected to the archaeobotanical laboratory to prevent contamination from the modern samples that are handled there. All surfaces were cleaned with bleach prior to handling. In two cases, Colletière and Horbourg-Wihr, the pips were directly isolated at the site before being sent to the laboratory in Montpellier. All pips included in this study were photographed for morphological analyses before being shipped from France. Upon arrival in Copenhagen, they were all stored at -20 °C.

A total of 49 pips were randomly selected for extraction. Of those, 34 were built into libraries and shotgun sequenced, and 32 of the libraries were chosen for target enrichment capture.

Extraction and library build

Extraction

All pre-amplification steps were carried out in a dedicated aDNA laboratory at the Centre for GeoGenetics, University of Copenhagen, where stringent measures are in place to prevent contamination, including positive air pressure, nightly UV radiation, and frequent decontamination of surfaces with bleach.

The extraction was performed following a phenol and chloroform extraction protocol optimised for ancient plant DNA (protocol version 1.16, 03.09.2014; Wales et al., 2014). For each round of extractions, 7 pips and 1 blank were processed. The pips were inspected for holes and cracks, and if any were found, another pip was chosen. Individual pips were rinsed first in a 10 % dilution of commercial bleach to neutralise contaminants stuck to the outside shell, and then in ddH2O to remove the bleach. Seeds were dried off using paper towels, wrapped in aluminium foil and crushed with a hammer. The resulting powder was transferred to an irradiated LoBind Eppendorf tube. The weight and colour of the pip were recorded. The hammer and all surfaces were wiped with 10 % bleach and 70 % ethanol between each crushing. The samples were incubated on a rotator at 55 °C overnight in 1,000 µL of digestion buffer (10 µL Tris-HCl (10 mM), 2 µL NaCl (10 mM), 200 µL SDS (2 % w/v), 5 µL CaCl2 (5 mM), 5 µL EDTA (pH 8.0, 2.5 mM), ≈6.25 mg DTT (40 mM), 100 µL Proteinase K solution (10 %), and 678 µL H2O).
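The liquid components of the digestion buffer add up to the stated 1,000 µL, which can be checked and scaled to a full extraction round with a short Python sketch. The 10 % pipetting excess and all names in the code are assumptions for illustration and are not part of the protocol.

```python
# Volumes (µL) per 1,000 µL of digestion buffer, as listed in the text.
# DTT is added as a solid (about 6.25 mg) and is tracked separately.
BUFFER_UL = {
    "Tris-HCl": 10.0,
    "NaCl": 2.0,
    "SDS": 200.0,
    "CaCl2": 5.0,
    "EDTA": 5.0,
    "Proteinase K solution": 100.0,
    "H2O": 678.0,
}
DTT_MG = 6.25


def scale_buffer(n_samples: int, excess: float = 0.1) -> dict:
    """Scale the recipe to n_samples, with a fractional pipetting excess.

    The default excess of 10 % is an assumption, not a protocol value.
    """
    factor = n_samples * (1.0 + excess)
    scaled = {name: round(vol * factor, 1) for name, vol in BUFFER_UL.items()}
    scaled["DTT (mg)"] = round(DTT_MG * factor, 2)
    return scaled


total = sum(BUFFER_UL.values())  # liquid components only
batch = scale_buffer(8)          # 7 pips + 1 blank per extraction round
```

Here `total` confirms that the listed volumes sum to the stated reaction size, and `batch` gives the amounts for one round of eight tubes.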

On the following day, samples were equilibrated to room temperature before further processing. They were spun in a bench-top centrifuge at maximum speed for 5 minutes to pellet the debris. The upper aqueous layer was collected, added to 1 volume of phenol, and then gently mixed on a rotator at room temperature for 5 minutes. After spinning in a bench-top centrifuge for 5 minutes at maximum speed, the upper aqueous layer was carefully collected to avoid the protein-containing lower layer, and the phenol extraction step was repeated, for a total of two phenol extractions. The upper aqueous phase was collected and added to 1 volume of chloroform, then rotated and spun at maximum speed in a bench-top centrifuge, both steps for 5 minutes. Samples were purified using the MinElute PCR Purification Kit with modifications based on Dabney et al. (2013) to prevent washing away short DNA fragments. The upper aqueous layer was collected and added to 13 volumes of PB Buffer. After mixing thoroughly, the entire volume was added to a Qiagen MinElute column attached to a bleached and irradiated Zymo-Spin V extension reservoir placed in a 50 ml tube. Samples were centrifuged for 4 minutes at 1,500 x g in a bench-top centrifuge until the entire volume had passed. The spin column was transferred back to the collection tube and dry spun for 1 minute at 3,300 x g. Samples were washed twice with 700 μl PE Buffer and spun at 3,300 x g for 1 minute. Samples were dry spun at 16,100 x g for 1 minute, the spin columns were moved to labelled 1.5 ml LoBind Eppendorf tubes, and 30 μl EB Buffer was added to the filter. After incubating for 15 minutes at 37 °C, samples were spun at 16,100 x g to elute. DNA extracts were stored at -20 °C.

All extracts were tested for DNA content using the Qubit dsDNA HS (High Sensitivity) Assay Kit (Life Technologies) following the manufacturer’s protocol. A master mix was prepared using 199 μl Qubit dsDNA HS Buffer and 1 μl Qubit dsDNA HS Reagent per sample. To prepare the standards, 190 μl of the master mix was added to a clear tube and mixed with 10 μl of Qubit dsDNA HS Standard #1 or Qubit dsDNA HS Standard #2. To prepare the samples 199 μl of the master mix and 1 μl of extract was added to a clear tube. Samples were vortexed and spun before being measured on the Qubit 2.0 Fluorometer.

All extracts were tested for inhibition and plant DNA content using universal primers for the rbcL (ribulose-bisphosphate carboxylase) gene. The sequences of the forward and reverse primers were 5'-GGCAGCATTCCGAGTAACTCCTC-3' and 5'-CGTCCTTTGTAACGATCAAG-3', respectively. Quantitative PCR (qPCR) set-up was done in the aDNA laboratory, using 20 μl reactions containing 0.16 μl dNTPs (25 mM), 0.8 μl rbcL primer F/R mixture (10 μM), 0.2 μl AmpliTaq Gold DNA Polymerase (5 U/μl), 2 μl 10X AmpliTaq Gold Buffer, 2 μl MgCl2 (25 mM), 0.8 μl bovine serum albumin (BSA) (10 mg/ml), 0.8 μl SYBR Green, and 12.24 μl ddH2O. Extracts were tested undiluted (100 %) and as 10 % dilutions. Dilutions were made by adding 9 μl ddH2O to 1 μl of the extract. 1 μl of template was added to 19 μl of the master mix, for a total reaction volume of 20 μl, on a white qPCR plate, which was covered with clear film. All wells containing samples were thoroughly sealed prior to leaving the aDNA laboratory. An aliquot of the master mix was brought to the modern laboratory, where 19 μl master mix was added to 1 μl of an rbcL standard for a total of 7 calibration standards and one blank. All samples were run on the Roche LightCycler 480 with the following program: initial denaturing for 10 minutes at 95 °C, denaturing for 30 seconds at 95 °C, annealing for 60 seconds at 54 °C, extension for 60 seconds at 72 °C. Denaturing, annealing and extension steps were repeated for 40 cycles and a melt curve was determined.

Table 1: Archaeological context
* Dating: archaeological artefacts, ** Dating: dendrochronology, *** Dating: C14, **** Dating: unclear

| Region | Site | Stratigraphic unit | Structure | Age | Period | Sample name |
|---|---|---|---|---|---|---|
| Alsace | Horbourg-Wihr | | Pit ST7054 | 2nd c AD**** | Roman | HBG7054 |
| Alsace | Horbourg-Wihr | | Pit ST7172 | 2nd c AD**** | Roman | HBG7172 |
| Gard | Mas de Vignoles XIV, Nîmes | US 12111 | Well PT 12024 | 1220 ± 30 BP*** (731-851 AD) | Early Medieval | MDV14_US12111 |
| Gard | Mas de Vignoles XIV, Nîmes | US 13525 | Well PT 13319 | 1605 ± 35 BP*** (417-515 AD) | Late Roman/Medieval | MDV14_US13525 |
| Gard | Mas de Vignoles XIV, Nîmes | US14152 | Rural ditch FO 14194 | 2nd-1st c BC**** | Early Roman | MDV14_US14152 |
| Hérault | La Cougourlude, Lattes | US 31084 | Ditch FO 30277 | 510-475 BC*; 2480 ± 30 BP (769-417 cal BC)*** | Iron Age | Cougourlude_237 |
| Hérault | Magalas (Terrasses de Montfau) | US4008 | Well PT 4000 | 1st c to the 4th c AD* | Roman | MAG2013_US4008 |
| Hérault | Magalas (Terrasses de Montfau) | US4015 | Well PT 4000 | 1st c to the 4th c AD* | Roman | MAG2013_US4015 |
| Hérault | Mont Ferrier, Tourbes | US 2076 | Well PT 2052 | 1st c AD**** | Roman | Montfer |
| Hérault | Roumèges, Poussan | US 5007, US 5012 and US 5013 | Well PT 5001 | 1st-3rd c AD* | Roman | Roumeg |
| Hérault | La Lesse-Espagnac, Sauvian | US 3019 | Well PT 3005 | 175-225 AD**** | Roman | SAU3019 |
| Isère | Colletière, Charavines | | Cultural layer, rubbish deposits | 1006-1040 AD** | Medieval | Collet08 |
| Loiret | Rue du Faubourg Madeleine / Prieuré de la Madeleine, Orléans | US 15126 | Cesspit F 1517 | 1050-1200 AD* | Late medieval/Early Modern | Madel_08 |
| Var | Avenue Pierre-et-Marie-Curie, Cavalaire sur Mer | | Well PT 1107 | Late 1st c AD to the early 3rd c AD* | Roman | CAV1107 |

Library construction

DNA extracts were converted to high-throughput sequencing libraries using the NEBNext DNA Library Prep Master Mix Set 2 (E6070L, New England BioLabs) with modifications developed by Tom Gilbert and Andaine Seguin-Orlando. In the first module of the kit, 10 μl End Repair Reaction Buffer (10x), 5 μl End Repair Enzyme Mix and 65 μl ddH2O were added to 20 μl of extract for a total reaction volume of 100 μl, and incubated for 20 min at 12 °C and 15 min at 37 °C in a thermocycler. Purification was done with the Qiagen MinElute PCR Purification Kit, using 1,300 μl PB Buffer to bind (13X volume following Dabney et al., 2013), 700 μl PE Buffer to wash, centrifugation at 3,300 x g to prevent loss of short DNA, and 30 μl EB Buffer incubated for 15 min at 37 °C to elute. For adapter ligation, 10 μl Quick Ligation Reaction Buffer (5x) and 5 μl P5/P7 Blunt End Adaptor (20 µM) were added to 30 μl of the end-repaired DNA. Then 5 μl Quick T4 DNA Ligase (5 U/μl) was added for a total reaction volume of 50 μl, and the reaction was incubated for 20 min at 20 °C. Purification was done with the Qiagen QIAquick PCR Purification Kit, using 250 μl PB Buffer to bind and 700 μl PE Buffer to wash, with spins at 9,000 x g to retain DNA but remove adapter dimers. To elute, 30 μl EB Buffer was added directly to the filter and incubated for 15 min at 37 °C before collection. The adapter fill-in reaction was performed with 5 μl Adapter Fill-in Reaction Buffer, 3 μl Bst DNA polymerase, and 12 μl H2O added to 30 μl adapter-ligated DNA. Samples were incubated for 20 min at 65 °C and 20 min at 80 °C in a thermocycler. Unamplified libraries were stored at -20 °C.

To determine how many PCR cycles each library required, 2 μl of a 1:100 dilution of the unamplified libraries was added to a master mix containing 2 μl Taq Gold Buffer (10X), 2 μl MgCl2 (25 mM), 0.16 μl dNTPs (25 mM), 0.4 μl inPE 1.0 F primer (10 μM), 0.8 μl SYBR Green, 0.8 μl BSA (10 mg/ml), 0.4 μl R primer with index (10 μM), 11.28 μl H2O and 0.16 μl AmpliTaq Gold DNA Polymerase, for a 20 μl reaction. Samples were run on a Roche LightCycler 480 Instrument with the following program: initial denaturing for 10 minutes at 95 °C, denaturing for 30 seconds at 95 °C, annealing for 60 seconds at 60 °C, extension for 60 seconds at 72 °C. Denaturing, annealing and extension steps were repeated for 40 cycles, and a melt curve was determined. The amplification requirements for each library were investigated by observing the cycle number where the amplification curve reached the plateau phase. The required number of indexing PCR cycles was estimated by correcting for the 1:100 dilution and the larger volume of library template in the indexing PCR.
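The correction for dilution and template volume can be illustrated with a worked calculation. The Python sketch below is one possible way to do the arithmetic and not necessarily the procedure used in this study: it assumes perfect doubling in every PCR cycle and that the plateau is reached at the same product concentration in the qPCR and the indexing PCR. The function and parameter names are illustrative; the default volumes are those given in the text.

```python
import math


def indexing_cycles(qpcr_plateau_cycle: float,
                    dilution: float = 100.0,
                    qpcr_template_ul: float = 2.0,
                    qpcr_reaction_ul: float = 20.0,
                    pcr_template_ul: float = 20.0,
                    pcr_reaction_ul: float = 100.0) -> int:
    """Estimate indexing PCR cycles from the qPCR plateau cycle.

    Assumes 100 % amplification efficiency (doubling per cycle).
    """
    # Undiluted-library equivalents per µL of reaction in each set-up.
    qpcr_conc = (qpcr_template_ul / dilution) / qpcr_reaction_ul
    pcr_conc = pcr_template_ul / pcr_reaction_ul
    # With perfect doubling, a k-fold higher starting concentration
    # reaches the same level log2(k) cycles earlier.
    saved = math.log2(pcr_conc / qpcr_conc)
    return max(1, round(qpcr_plateau_cycle - saved))
```

With the volumes above, the indexing PCR starts at a 200-fold higher template concentration than the qPCR, which corresponds to roughly 7.6 fewer cycles under the doubling assumption.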

Libraries were amplified in 100 μl reactions using double the amount of AmpliTaq Gold DNA Polymerase and BSA. A master mix containing 10 μl Taq Gold Buffer (10X), 10 μl MgCl2 (25 mM), 0.8 μl dNTPs (25 mM), 2 μl inPE 1.0 F primer (10 μM), 4 μl BSA (10 mg/ml), 49.2 μl H2O and 2 μl AmpliTaq Gold DNA Polymerase was added to 20 μl library and 2 μl R primer with indices individual to each sample. Each library was amplified in 2 separate reactions to maximise library complexity. They were run with the following settings: initial denaturing for 10 minutes at 95 °C, denaturing for 30 seconds at 95 °C, annealing for 60 seconds at 60 °C, extension for 60 seconds at 72 °C. Denaturing, annealing and extension were repeated for the number of cycles determined by qPCR. PCR products for the same library were combined for purification, which was done with the Qiagen QIAquick PCR Purification Kit, using 1,000 μl PB Buffer to bind, 700 μl PE Buffer to wash, spins at 9,000 x g, and 30 μl EB Buffer incubated for 15 min at 37 °C to elute.

The DNA concentration of the amplified libraries was measured using the Qubit dsDNA HS Assay Kit (Life Technologies) following the manufacturer’s protocol, as previously described.

To determine the average insert size, concentration and molarity in the 100-1000 bp region, amplified libraries were run on a High Sensitivity DNA chip on the Agilent 2100 Bioanalyzer following the manufacturer’s protocol. Samples were then pooled based on index compatibility and sample molarity, and sequenced as 100 bp single-read shotgun libraries on the Illumina HiSeq 2500 platform.
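The relationship between mass concentration, fragment length and molarity that underlies pooling by molarity can be sketched as follows. This is a generic conversion, assuming an average mass of 660 g/mol per base pair of double-stranded DNA; it is not the Bioanalyzer's internal calculation, and the function names are illustrative.

```python
def library_molarity_nM(conc_ng_per_ul: float, mean_length_bp: float,
                        g_per_mol_per_bp: float = 660.0) -> float:
    """Convert a dsDNA mass concentration (ng/µL) to nanomolar."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)


def equimolar_volumes_ul(molarities_nM: dict, fmol_each: float) -> dict:
    """Volume of each library that contributes the same number of fmol."""
    # 1 nM equals 1 fmol per µL.
    return {name: fmol_each / nM for name, nM in molarities_nM.items()}
```

In this sketch, a library at twice the molarity of another contributes half the volume to an equimolar pool.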


Capture

A target enrichment method using MYbaits, manufactured by MYcroarray, was chosen for the capture experiment in this study. A total of 16,281 custom RNA baits, targeting 83 genes and 13,000 single nucleotide polymorphisms (SNPs), were designed by Jose Alfredo Samaniego and Nathan Wales. The targeted genes included several related to important phenotypic traits, such as berry colour and bunch architecture, as well as neutral genes that are not expected to be under selection. The SNPs were selected from earlier SNP arrays developed for genotyping Vitis species and commercial grapevines (GrapeReSeq: https://urgi.versailles.inra.fr/Projects/Achieved-projects/GrapeReSeq).

Rounds 1 and 2

The first two rounds of capture followed the MYbaits User Manual version 1.3.8 (14/06/2013). 8 samples were processed each round (Table 2).

Table 2: Library information

Capture round  Sample              DNA into library (ng)  PCR cycles  DNA into capture (ng)
1              MAG2013_US4015_P6   30.2                   22          62.4
1              MAG2013_US4015_P10  20.4                   22          169.4
1              MDV14_US12111_P2    38.2                   19          48.3
1              MDV14_US12111_P4    46.4                   14          151.2
1              MDV14_US12111_P7    31.2                   19          198.5
1              MDV14_US14152_P7    204.0                  14          160.4
1              Montfer_P23         22.0                   14          146.9
1              SAU3019_P2          90.4                   17          149.2
2              HBG7172_P17         113.2                  14          131.8
2              MAG2013_US4015_P8   27.0                   22          23.2
2              MDV14_US12111_P5    18.28                  22          32.5
2              MDV14_US12111_P9    19.8                   22          8.3
2              Montfer_P25         30.0                   14          176.4
2              Roumeg_P9           29.2                   14          167.8
2              SAU3019_P8          33.2                   14          124.4
2              SAU3019_P13         42.4                   14          171.7
3              Collet08_P27        27.8                   12          132.5
3              Cougourlude_237     6.2                    17          40.1
3              MAG2013_US4008_P1   75.6                   14          360.6
3              MDV14_US14152_P4    46.4                   14          475.5
3              Montfer_P21         38.2                   12          180.8
3              Roumeg_P14          10.1                   15          304.3
3              SAU3019_P9          79.6                   15          495.8
3              SAU3019_P14         11.1                   14          196.0
4              CAV1107_P25         14.2                   25          56.5
4              HBG7054_P18         165.6                  18          62.4
4              HBG7172_P3          170.8                  18          723.3
4              Madel_08_P22        84.4                   18          157.5
4              MAG2013_US4015_P5   38.2                   16          581.5
4              MDV14_US13525_P5    77.2                   14          124.0
4              MDV14_US13525_P7    32.0                   16          763.9
4              MDV14_US14152_P9    39.0                   16          794.6
Not captured   MDV14_US12111_P1    36.8                   14          NA
Not captured   SAU3019_P4          24.2                   18          NA

Capture notes:
Round 1: Strictly followed protocol version 1.3.8.
Round 2: Strictly followed protocol version 1.3.8.
Round 3: Strictly followed protocol version 2.3.1, except for when, by mistake, double the amount of Hybridisation Master Mix was added to the Capture Beads and Library Master Mix prior to incubation.
Round 4: The bait hybridisation was done according to protocol version 2.3.1 to optimise the bait hybridisation. Recovery was done following protocol version 1.3.8. The first part was chosen for optimised bait hybridisation, while the latter part was chosen to optimise the workflow.

All libraries were concentrated using the SpeedVac (Thermo Scientific) until they contained 100-500 ng DNA in 3.4 μl. The concentrated libraries were added to a Library Master Mix (LMM) containing 2.5 μl Block #1, 2.5 μl Block #2 and 0.6 μl Block #3 per sample. A Hybridization Master Mix (HMM) was prepared containing 20 μl HYB #1, 0.8 μl HYB #2, 8 μl HYB #3, and 8 μl HYB #4 per sample. Finally, 1 μl RNase Block and 5 μl Capture Probe were added directly to a separate PCR strip tube, comprising the Capture Baits Master Mix (CBMM). The LMM was denatured at 95 °C for 5 minutes before the temperature was lowered to 65 °C and the HMM was added to the thermocycler to preheat. After 3 minutes, the CBMM was also added to the thermocycler to incubate alongside the other two Master Mixes for an additional 2 minutes. While keeping the tubes at 65 °C, the entire volume of the LMM was transferred to the CBMM using a multichannel pipette. Then 13 μl of the HMM was added to the CBMM and mixed by pipetting. The hybridisation solution was incubated for approximately 21 hours at 65 °C, during which time the targeted DNA sequences hybridised to the biotinylated RNA baits.

The following day, 50 μl of Dynabeads MyOne Streptavidin C1 magnetic beads were added to a 2 ml tube and pelleted using a magnetic stand. The supernatant was discarded and the beads were washed using 200 μl Binding Buffer. The beads were again pelleted, the supernatant discarded, and the wash repeated for a total of three washes. The beads were re-suspended in 200 μl Binding Buffer, the entire volume of the hybridisation solution was transferred, and the samples incubated at room temperature for 30 minutes to bind the baits to the magnetic beads. The beads were then pelleted, the supernatant removed, and washed with 500 μl of Wash Buffer 1. After incubating for 15 minutes, the beads were pelleted, the supernatant removed, and the beads washed thrice with 500 μl of Wash Buffer 2 preheated to 65 °C. Between washes, the beads were incubated for 10 minutes at 65 °C, pelleted, and the supernatant removed. After the third wash, samples were spun down and the beads pelleted so that any leftover buffer could be removed. The beads were then re-suspended in 30 μl Molecular Biology Grade Water and stored at -20 °C.

Round 3

The third round of capture followed the MYbaits User Manual 2.3.1 (5/22/2014). 8 samples were processed. See Table 2 for specifications.

All libraries were concentrated until they contained 100-500 ng DNA in 5.9 μl using the SpeedVac (Thermo Scientific) and added to a 0.2 ml tube containing 11.5 μl of the LMM (2.5 μl Block #1, 2.5 μl Block #2 and 0.6 μl Block #3 per sample). The HMM (20 μl HYB #1, 0.8 μl HYB #2, 8 μl HYB #3, and 0.8 μl HYB #4) was prepared and 29.6 μl was added to a 0.2 ml tube. Finally, 1 μl RNase Block and 5 μl Capture Probe were added directly to a separate 0.2 ml tube, comprising the CBMM. The LMM was denatured at 95 °C for 5 minutes before the temperature was lowered to 65 °C and the HMM was added to the thermocycler to preheat. After 3 minutes, the CBMM was also added to the thermocycler to incubate alongside the other two Master Mixes for an additional 2 minutes. While keeping the tubes at 65 °C, the entire volume of the LMM was transferred to the Capture Baits Master Mix using a multichannel pipette. Then 21 μl of the HMM was also added to the CBMM. By mistake the transferred volume of the HMM was twice what the protocol specified. The hybridisation solution was mixed by pipetting and incubated at 65 °C for approximately 21 hours, during which the targeted DNA sequences hybridised to the biotinylated RNA baits.
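The master mix volumes quoted for this round can be checked with simple arithmetic. The sketch below sums the per-sample component volumes given in the text; the function name and the optional pipetting excess are hypothetical additions for illustration.

```python
def master_mix_volume(components_ul, n_samples=1, excess=0.0):
    """Total master mix volume (ul) for n_samples.

    components_ul maps component name to per-sample volume in ul.
    excess is an optional fractional surplus for pipetting loss (assumed).
    """
    per_sample = sum(components_ul.values())
    return per_sample * n_samples * (1.0 + excess)


# Per-sample volumes as listed for round 3.
blocks = {"Block #1": 2.5, "Block #2": 2.5, "Block #3": 0.6}
hmm = {"HYB #1": 20.0, "HYB #2": 0.8, "HYB #3": 8.0, "HYB #4": 0.8}

library_ul = 5.9
lmm_with_library = master_mix_volume(blocks) + library_ul  # blocks plus library
hmm_total = master_mix_volume(hmm)
```

Under these inputs the blocking agents plus the concentrated library add up to the 11.5 μl stated for the LMM tube, and the HMM components add up to the stated 29.6 μl.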

The following day, 50 μl of Dynabeads MyOne Streptavidin C1 magnetic beads were added to a 2 ml tube and pelleted using a magnetic stand. The supernatant was discarded and the beads were washed using 200 μl Binding Buffer. The beads were again pelleted, the supernatant discarded, and the wash repeated for a total of three washes. The beads were re-suspended in 20 μl Binding Buffer and transferred to a 0.2 ml tube. The beads were incubated in a thermocycler for 2 minutes at 65 °C, and then transferred to the hybridisation solution. The solution was incubated for 45 minutes at 65 °C and mixed by pipetting up and down every 10 minutes. The solution was then transferred to an Eppendorf tube, the beads pelleted and the supernatant removed. The beads were washed thrice with 500 μl of Wash Buffer 2 preheated to 65 °C. After each wash, the beads were incubated for 5 minutes at 65 °C, pelleted, and the supernatant removed. After the third wash, samples were spun down and the beads pelleted so that any leftover buffer could be removed. The beads were then re-suspended in 30 μl Molecular Biology Grade Water and stored at -20 °C.

Round 4

The fourth round of capture followed a modified version of the MYbaits User Manuals, combining the best elements of versions 1.3.8 (14/06/2013) and 2.3.1 (5/22/2014). A total of 8 samples were processed (Table 2).

All libraries were concentrated until they contained 100-500 ng DNA in 5.9 μl using the SpeedVac (Thermo Scientific) and added to a 0.2 ml tube containing 11.5 μl of the LMM (2.5 μl Block #1, 2.5 μl Block #2 and 0.6 μl Block #3 per sample). The HMM (20 μl HYB #1, 0.8 μl HYB #2, 8 μl HYB #3, and 0.8 μl HYB #4) was prepared and 29.6 μl was added to a 0.2 ml tube. Finally, 1 μl RNase Block and 5 μl Capture Probe were added directly to a separate 0.2 ml tube, comprising the CBMM. The LMM was denatured at 95 °C for 5 minutes before the temperature was lowered to 65 °C and the HMM was added to the thermocycler to preheat. After 3 minutes, the CBMM was also added to the thermocycler to incubate alongside the other two Master Mixes for an additional 2 minutes. While keeping the tubes at 65 °C, the entire volume of the LMM was transferred to the Capture Baits Master Mix using a multichannel pipette. Then 21 μl of the HMM was also added to the CBMM. By mistake the transferred volume of the HMM was twice what the protocol specified. The hybridisation solution was mixed by pipetting and incubated at 65 °C for approximately 21 hours, during which the targeted DNA sequences hybridised to the biotinylated RNA baits.

The following day, 50 μl of Dynabeads MyOne Streptavidin C1 magnetic beads were added to a 2 ml tube and pelleted using a magnetic stand. The supernatant was discarded and the beads were washed using 200 μl Binding Buffer. The beads were again pelleted, the supernatant discarded, and the wash repeated for a total of three washes. The beads were re-suspended in 200 μl Binding Buffer, the entire volume of the hybridisation solution was transferred, and the samples incubated on a rotator at room temperature for 30 minutes to bind the baits to the magnetic beads. The beads were then pelleted, the supernatant removed, and washed with 500 μl of Wash Buffer 2 preheated to 65 °C. After each wash, the beads were incubated for 10 minutes at 65 °C, pelleted, and the supernatant removed. After the third wash, samples were spun down and the beads pelleted so that any leftover buffer could be removed. The beads were then re-suspended in 30 μl Molecular Biology Grade Water and stored at -20 °C.

Table 3: Sequencing information. Endogenous reads, useful reads and useful reads on target are all calculated based on the total number of reads before trimming. Average read length is calculated from the useful reads. † indicates the samples that were paired-end sequenced on the Illumina MiSeq platform instead of the Illumina 2500 HiSeq platform. * indicates the samples that had a higher percentage of useful reads after shotgun sequencing compared to capture.

Sample              Reads before trimming   Endogenous content (%)  Useful reads (%)        Useful on target (%)    DoC on target           Average read length (bp)
                    Shotgun     Capture     Shotgun     Capture     Shotgun     Capture     Shotgun     Capture     Shotgun     Capture     Shotgun     Capture
CAV1107_P25         446,016     14,751,120  0.00        0.01        0.00        <0.01       No reads    <0.01       0.000       0.014       NA          81.5
Collet08_P27        12,342,006  13,212,213  38.58       61.30       20.25       23.91       <0.01       6.60        1.405       40.046      76.0        83.0
Cougourlude_237     40,748,677  9,563,728   4.00        4.84        0.59        1.18        <0.01       0.41        0.052       1.435       55.9        61.2
HBG7054_P18         1,587,638   4,616,944   8.25        25.16       4.30        12.76       <0.01       4.49        0.015       8.961       66.9        76.0
HBG7172_P3†         249,166     28,379,895  0.40        1.37        0.17        0.45        <0.01       1.56        <0.01       1.435       NA          63.8
HBG7172_P17         10,631,281  5,986,991   3.83        51.06       2.09        10.04       <0.01       0.64        1.046       18.508      67.9        76.7
Madel_08_P22        375,017     14,036,347  5.93        20.51       3.37        8.26        <0.01       1.75        0.003       10.340      71.0        78.1
MAG2013_US4008_P1   500,501     13,205,143  <0.01       0.01        0.00        <0.01       No reads    <0.01       0.000       0.004       62.3        68.7
MAG2013_US4015_P5   2,193,570   7,115,207   0.033       0.07        0.01        0.031       <0.01       <0.01       <0.01       0.028       55.1        61.4
MAG2013_US4015_P6   6,693,926   60,572,928  11.22       25.81       5.92        1.44*       <0.01       0.59        1.073       1.468       74.8        76.3
MAG2013_US4015_P8   10,684,507  6,277,264   9.51        66.63       5.32        2.65*       <0.01       0.93        1.092       2.378       76.1        79.8
MAG2013_US4015_P10  10,942,338  9,591,764   43.40       59.43       22.78       17.03*      <0.01       2.11        1.372       6.286       76.2        81.0
MDV14_US12111_P1    482,278     NA          6.07        NA          2.96        NA          <0.01       NA          <0.01       NA          69.7        NA
MDV14_US12111_P2    5,873,619   59,025,095  44.74       69.79       28.28       10.68*      <0.01       0.90        1.281       22.626      88.2        90.3
MDV14_US12111_P4    27,898,979  8,118,691   60.13       74.07       33.78       36.44       <0.01       8.98        2.752       33.256      81.5        86.7
MDV14_US12111_P5    5,037,181   53,942,809  46.78       81.84       28.56       4.86*       <0.01       0.40        1.249       8.984       90.1        90.1
MDV14_US12111_P7    12,582,798  5,227,058   15.00       45.45       9.35        16.41       <0.01       4.74        1.207       11.084      89.1        93.7
MDV14_US12111_P9    6,976,777   7,440,477   20.44       77.33       12.27       4.72*       <0.01       1.85        1.146       5.957       81.3        90.1
MDV14_US13525_P5    642,342     13,737,742  0.094       0.19        0.043       0.09        <0.01       0.03        <0.01       0.139       58.3        63.8
MDV14_US13525_P7    942,747     13,185,747  4.16        10.63       2.33        4.94        <0.01       1.12        <0.01       6.896       75.4        82.2
MDV14_US14152_P4    628,774     14,022,440  4.53        13.49       2.25        5.47        <0.01       5.24        <0.01       9.002       65.4        73.3
MDV14_US14152_P7    23,070,948  4,101,987   1.72        5.05        0.80        2.05        <0.01       0.95        1.043       1.569       61.2        69.2
MDV14_US14152_P9†   303,704     5,625,870   4.73        14.21       2.45        6.58        <0.01       1.95        <0.01       4.465       NA          68.7
Montfer_P21         6,191,254   11,321,244  22.94       55.35       13.15       21.95       <0.01       7.71        1.127       39.833      78.0        86.2
Montfer_P23         9,837,811   6,138,969   23.79       47.43       13.34       21.90       <0.01       7.14        1.192       19.748      75.0        82.1
Montfer_P25         7,327,730   6,046,803   17.66       66.81       10.14       18.62       <0.01       10.37       1.108       27.821      75.1        84.6
Roumeg_P9           8,493,432   7,515,386   2.59        17.86       1.03        2.47        <0.01       1.28        1.043       3.753       57.4        63.2
Roumeg_P14          413,625     12,444,807  6.04        13.72       2.73        3.83        <0.01       0.51        <0.01       2.320       58.9        63.6
SAU3019_P2          278,572     5,715,490   36.25       36.32       19.66       11.39*      <0.01       3.40        1.108       7.835       66.1        71.5
SAU3019_P4          9,033,450   NA          14.54       NA          7.44        NA          <0.01       NA          0.013       NA          72.0        NA
SAU3019_P8          17,809,286  6,541,549   9.16        68.25       5.34        18.03       <0.01       11.87       1.145       35.164      75.4        86.4
SAU3019_P9          274,688     14,579,544  7.13        15.81       3.41        5.53        <0.01       1.29        <0.01       7.285       60.1        65.1
SAU3019_P13         7,771,131   7,981,680   20.77       66.59       11.74       17.16       <0.01       7.01        1.146       35.274      73.1        82.3
SAU3019_P14         6,423,772   11,321,944  28.88       51.61       14.04       17.89       <0.01       8.86        1.150       21.134      85.6        71.4

Post-capture amplification. All captured libraries were amplified by adding 15 μl of the captured library to a Master Mix containing 8.5 μl ddH2O, 25 μl 2X KAPA HiFi HotStart ReadyMix, 0.75 μl PCR Primer 5, and 0.75 μl PCR Primer 6 R reamplification. The reactions were run with the following settings: initial denaturing for 30 seconds at 98 °C, denaturing for 20 seconds at 98 °C, annealing for 30 seconds at 60 °C, extension for 30 seconds at 72 °C. Denaturing, annealing and extension were repeated for 14 cycles, before a final extension for 5 minutes at 72 °C. PCR products were purified using the Qiagen QIAquick PCR Purification Kit with 250 μl PB buffer to bind, 700 μl to wash, spins at 9,000 x g, and 30 μl EB buffer incubated for 15 min at 37 °C to elute.

Captured libraries were tested for DNA concentration using the Qubit dsDNA HS Assay Kit (Life Technologies) following the same procedure as previously described. Based on Qubit measurements, 1 μl of each amplified library was diluted to 1-2 ng/μl with ddH2O, and the average insert size, concentration, and molarity in the 100-1000 bp region were determined using a High Sensitivity DNA Assay Kit for the Agilent 2100 Bioanalyzer as described above. Samples were pooled based on index compatibility and sample molarity, and were 100 bp Single-Read sequenced on the Illumina 2500 HiSeq platform.

Data analysis

The experiments performed in this study had four aims: 1) to test the effect of capture on our dataset in comparison to shotgun sequencing; 2) to investigate the possibility of determining the sex of the ancient samples using modern reference data, since characterising the females in the population makes it possible to infer gene flow into domesticated grapes from wild silvestris populations; 3) to explore the possibility of determining the berry colour of the ancient samples; and 4) to investigate the population structure of the ancient samples and, through a comparison with an extensive collection of SNPs from modern cultivars, attempt to infer the most likely varieties among the ancient samples.

Sequencing. Raw Illumina reads were cleaned with AdapterRemoval (Lindgreen, 2012) to remove adapter sequences. Stretches of Ns and low quality bases at the ends of the reads were removed, and reads were discarded if they had an overall mapping quality <25 and a read length <30 basepairs (bp). Trimmed reads were mapped to the grape reference genome sequence (Genoscope 12x.0) using bwa aln (version 0.7.5a-r405, Li & Durbin 2009) and filtered with SAMtools (Li et al., 2009) for mapping quality >30. PCR duplicates, and reads mapping in more than one position in the genome were filtered out using Picard (http://picard.sourceforge.net), and the reads were realigned using Genome Analysis Toolkit (GATK, McKenna et al., 2010).

mapDamage (Jónsson et al., 2013) was used to display nucleotide misincorporation patterns and to calculate mean read lengths. The high rate of transition substitutions due to post-mortem deamination, particularly near read termini, was also used to confirm the reads as being of ancient origin.

Endogenous content was calculated for each sample as: !"#$%!!"#$%&!!"!!"##$%!!"#$% !×!100!% !"#$%!!"#$%&!!"!!!"#!$%!&!!"#$%

The percentage of useful reads (reads that had a quality >30, mapped only to one position, and had PCR duplicates collapsed to one read) for each sample was calculated as:

\[ \frac{\text{Total number of mapped, filtered and realigned reads}}{\text{Total number of sequenced reads}} \times 100\,\% \]

The rate of enrichment in endogenous content after capture was calculated as:

\[ \frac{\text{Endogenous content}_{\text{capture}} - \text{Endogenous content}_{\text{shotgun}}}{\text{Endogenous content}_{\text{shotgun}}} \]

The rate of enrichment in useful reads after capture was calculated as:

\[ \frac{\text{Useful reads}_{\text{capture}} - \text{Useful reads}_{\text{shotgun}}}{\text{Useful reads}_{\text{shotgun}}} \]
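The percentage and enrichment calculations can be sketched in a few lines. The example values are the shotgun and capture percentages listed for HBG7172_P17 in Table 3; the function names are hypothetical, and returning None for a shotgun value of zero is an assumption about how undefined ratios might be handled.

```python
def percent_of_sequenced(n_reads, n_sequenced):
    """Percentage of reads in a category relative to all sequenced reads."""
    return 100.0 * n_reads / n_sequenced


def enrichment(shotgun_pct, capture_pct):
    """Relative change after capture: (capture - shotgun) / shotgun.

    Returns None when the shotgun value is zero, because the ratio is
    undefined in that case (assumed handling).
    """
    if shotgun_pct == 0:
        return None
    return (capture_pct - shotgun_pct) / shotgun_pct


# HBG7172_P17, values from Table 3 (shotgun, capture).
endogenous_enrichment = enrichment(3.83, 51.06)
useful_enrichment = enrichment(2.09, 10.04)
```

With these inputs the endogenous content enrichment is about 12.3 and the useful read enrichment about 3.8, while a sample whose useful reads fall after capture yields a negative value.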

To assess the efficiency of the capture, reads were screened for overlap with the target RNA baits using BEDTools Intersect (version 2.24.0, Quinlan and Hall, 2010) and the number of reads on target was calculated using SAMtools (Li et al., 2009).
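The idea behind the on-target screen can be illustrated with a small pure-Python sketch. It is not the BEDTools implementation; it only shows the interval logic, assuming 0-based half-open coordinates (as in BED files) and counting a read as on target if it overlaps any bait by at least one base. All names and coordinates are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals share at least one position."""
    return a_start < b_end and b_start < a_end


def count_on_target(reads, baits):
    """Count reads overlapping any bait.

    reads and baits are lists of (chrom, start, end) tuples using
    0-based, half-open coordinates.
    """
    baits_by_chrom = {}
    for chrom, start, end in baits:
        baits_by_chrom.setdefault(chrom, []).append((start, end))

    n_on_target = 0
    for chrom, start, end in reads:
        candidates = baits_by_chrom.get(chrom, [])
        if any(overlaps(start, end, b_start, b_end) for b_start, b_end in candidates):
            n_on_target += 1
    return n_on_target
```

A linear scan per read is fine for a sketch; a real dataset with millions of reads would call for sorted intervals or an interval tree.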

Grape sex. To determine the sex of the ancient grapes, genotypes were called on SNPs in sex-determining loci identified by Picq et al. (2014, Supplementary table 3). Five bases on each end of the reads were trimmed to avoid including misincorporations caused by DNA damage. The genotypes were called using GATK HaplotypeCaller (version 3.4.0, McKenna et al., 2010) in bases with quality higher than 20. Only SNPs covered by at least 10 reads were kept.

For each sample, the probability of the obtained genotypes (G) in the context of being female or male was calculated as follows:

\[ p(G \mid \text{female}) = \sum_i \log(g_{f,i}) \]

\[ p(G \mid \text{male}) = \sum_i \log(g_{m,i}) \]

where G is the genotype of that sample, $g_{f,i}$ is the genotype frequency of that specific genotype, i, in the females, and $g_{m,i}$ is that frequency in males. Because the terms are summed logarithms, both quantities are on a log scale. Genotype frequencies for the reference dataset were obtained from Picq et al. (2014, supplementary table 3).

A heatmap showing the likelihood of being either male or female was made with R using the plotrix package (Lemon 2006), where each cell represents the difference between male and female likelihoods:

\[ p(G \mid \text{female}) - p(G \mid \text{male}) \]
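The likelihood comparison can be sketched as follows. The genotype frequencies in the usage example are invented for illustration and are not the values from Picq et al. (2014); the function names are hypothetical, and the small floor applied to zero frequencies is an assumption made so that the logarithm stays defined.

```python
import math


def log_likelihood(observed, genotype_freqs, floor=1e-6):
    """Sum of log genotype frequencies over the SNPs observed in one sample.

    observed: {snp_id: genotype} for the sample.
    genotype_freqs: {snp_id: {genotype: frequency}} in one reference sex.
    floor: assumed minimum frequency, avoids log(0) for unseen genotypes.
    """
    total = 0.0
    for snp_id, genotype in observed.items():
        freq = genotype_freqs[snp_id].get(genotype, 0.0)
        total += math.log(max(freq, floor))
    return total


def per_snp_difference(observed, female_freqs, male_freqs, floor=1e-6):
    """Per-SNP female minus male log frequency, one value per heatmap cell."""
    diffs = {}
    for snp_id, genotype in observed.items():
        f = max(female_freqs[snp_id].get(genotype, 0.0), floor)
        m = max(male_freqs[snp_id].get(genotype, 0.0), floor)
        diffs[snp_id] = math.log(f) - math.log(m)
    return diffs
```

Positive differences indicate a genotype that is more frequent in reference females, negative differences one that is more frequent in males, and values near zero are uninformative.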

Berry colour. The grape berry colour was inferred by comparing the depth of coverage (DoC) over the genomic region on chromosome 2 containing the VvmybA1 gene and its upstream promoter. The insertion of a retrotransposon, Gret1, in the promoter region is responsible for the vast majority of white berry phenotypes (This et al., 2007). Two reference sequences were used: one with the Gret1 retrotransposon inserted in the promoter region, corresponding to the white berry colour allele, and another without the insertion, representing the black, wildtype allele. At the 5’ and 3’ ends of the Gret1 insertion, samples that were homozygous or heterozygous for the white allele were expected to show reads spanning these positions. If the sample was homozygous for the black allele, however, no reads were expected to span the positions. For comparison, DoC at the position of the insertion on the black reference was also investigated. Here, samples that were homozygous or heterozygous for the black allele were expected to span the position, while samples that were homozygous for the white allele would not be covered at the insertion point. By comparing the DoC at the insertion points of the two reference sequences, it should theoretically be possible to determine berry colour.
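The decision logic described above can be written down as a small sketch. It assumes that the number of reads spanning each junction has already been counted, that evidence at either Gret1 junction is enough to call the white allele, and that a single spanning read suffices; these thresholds and all names are assumptions for illustration rather than the criteria applied in the study.

```python
def infer_gret1_genotype(white_5p_reads, white_3p_reads, black_reads, min_reads=1):
    """Classify the Gret1 genotype from reads spanning the insertion junctions.

    white_5p_reads, white_3p_reads: reads spanning the 5' and 3' Gret1
        junctions on the reference carrying the insertion (white allele).
    black_reads: reads spanning the empty insertion site on the reference
        without Gret1 (black allele).
    min_reads: assumed minimum number of spanning reads to call an allele.
    """
    has_white = max(white_5p_reads, white_3p_reads) >= min_reads
    has_black = black_reads >= min_reads
    if has_white and has_black:
        return "heterozygous"
    if has_white:
        return "homozygous white"
    if has_black:
        return "homozygous black"
    return "undetermined"


def berry_colour(genotype):
    """Expected berry colour: only white homozygotes give white berries."""
    if genotype == "homozygous white":
        return "white"
    if genotype in ("heterozygous", "homozygous black"):
        return "black"
    return "unknown"
```

At low coverage the absence of spanning reads is weak evidence, so a call of homozygosity from such a rule should be treated as tentative.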

The Genoscope 12x.0 reference genome displays the white allele, so to obtain the black allele, the Gret1 insertion was excised from the reference sequence. Samples were then re-mapped to the two reference sequences using bwa (version 0.7.5a-r405, Li & Durbin 2009) and visualised using SAMtools (Li et al., 2009). Plots were made with the R plotrix package (Lemon, 2006).
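Excising an insertion from a reference sequence amounts to joining the two flanks. The sketch below uses a toy sequence and hypothetical coordinates (0-based, half-open); it also returns the position of the new junction, which is where spanning reads would be inspected on the black-allele reference.

```python
def excise_insertion(reference, insertion_start, insertion_end):
    """Remove reference[insertion_start:insertion_end] from a sequence.

    Coordinates are 0-based and half-open. Returns the shortened sequence
    and the position of the junction created by joining the flanks.
    """
    if not 0 <= insertion_start <= insertion_end <= len(reference):
        raise ValueError("insertion coordinates fall outside the reference")
    shortened = reference[:insertion_start] + reference[insertion_end:]
    return shortened, insertion_start
```

For instance, removing positions 3 to 6 from "AAACCCGGG" gives "AAAGGG" with the junction at position 3.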

Population structure. NGSadmix version 29 (Skotte et al., 2013) was used to infer population structure on the ancient samples. This method is based on inferred genotype likelihoods, which makes it particularly well suited for data with low DoC. Genotype likelihoods were calculated for the ancient samples in the positions covered in the panel (8618 sites in total) using ANGSD (Korneliussen et al., 2014) on bases with quality >20, and on polymorphic sites (p-value <1e-6) covered by at least 50% of the individuals. NGSadmix was run assuming 2 to 5 populations (K=2-5). For each K, 100 replicates were obtained and the one with the best likelihood was kept.
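Keeping the best replicate for each K is a simple selection step. The sketch below assumes the log-likelihood of every run has already been collected into tuples; how those values are read from the program output is not shown, and the names are hypothetical.

```python
def best_replicate_per_k(runs):
    """Select the replicate with the highest log-likelihood for each K.

    runs: iterable of (k, replicate_id, log_likelihood) tuples.
    Returns {k: (replicate_id, log_likelihood)}.
    """
    best = {}
    for k, replicate_id, log_likelihood in runs:
        if k not in best or log_likelihood > best[k][1]:
            best[k] = (replicate_id, log_likelihood)
    return best
```

Running many replicates and keeping the best one guards against runs that converge to a local optimum.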

Without assuming a set number of populations, the population structure of the ancient grape samples was also investigated in the context of modern cultivars. A subset consisting of 11,519 of the captured SNPs was used on a dataset comprising 1,344 individuals provided by Roberto Bacilieri at INRA, France. This dataset consisted of 981 Vitis vinifera and 363 Vitis silvestris accessions. The MDS analysis was made using the bammds tool (Malaspinas et al., 2014), and visualised using SPSS Statistics (version 23.0, IBM Corp. 2015).

Results

Target enrichment

Endogenous content. The basic sequencing information is listed in Table 3. Before capture, the ancient samples had an endogenous content of 0.00-60.13% (avg. 15.4%). After capture, all samples displayed an increase in endogenous content, which ranged from 0.01-81.84% (avg. 35.9%, Table 3). The rate of enrichment differed considerably between samples (Fig. 3), but up to a 12.34-fold increase in endogenous content was observed in captured libraries. In particular, the samples with low concentrations of grape DNA before capture showed a remarkable increase in endogenous content after being subjected to target enrichment. The samples with higher initial percentages of grape DNA showed more modest improvements.

Useful reads. When looking only at the bioinformatically useful reads, the percentages were more similar before and after capture, ranging from 0.00 to 33.78% (avg. 8.5%) and from 1.9×10^-3 to 36.44% (avg. 9.7%) for shotgun and captured libraries respectively (Table 3). An increase in useful reads reaching 3.8-fold was observed in captured libraries (Fig. 4), while seven samples showed a post-capture decrease in useful reads (Table 3). Overall, however, the same pattern as for the endogenous content was observed: libraries with fewer useful reads before capture showed a greater increase post-capture, while libraries that already had a high fraction of useful reads did not have a high rate of enrichment.

On-target reads. Another aspect of target enrichment was how many reads were actually on target. The shotgun libraries all had less than 1% of reads on target when considering both mapped reads and useful reads. After capture these numbers were markedly different (Table 3): a 21,000-fold enrichment in mapped reads was observed in the sample that improved the most, and the lowest observed increase in mapped reads was 19.81-fold. The corresponding values for useful on-target reads after capture were 11,000-fold and 0.72-fold enrichments, respectively (Fig. 5).

Library composition. Overall, the composition of the libraries varied greatly between samples in response to capture (Fig. 6, Fig. S.1). Shotgun libraries generally had a higher degree of useful reads with few reads on target, while captured libraries showed higher proportions of on-target reads in combination with elevated levels of clonality. Some shotgun libraries had very low endogenous content and were only slightly improved after capture, while others showed a marked difference in the percentage of grape DNA between pre- and post-capture libraries. Finally, shotgun libraries with high initial grape DNA concentrations generally showed an improvement in the number of on-target reads, but also had higher levels of clonality and fewer useful reads than were observed prior to target enrichment.

Figure 3: Enrichment in endogenous content in captured libraries. The rate of endogenous content enrichment increases in inverse proportion to the endogenous content of the shotgun library. [Plot: x-axis, shotgun endogenous content (0-70%); y-axis, enrichment in endogenous content.]

Figure 4: Enrichment in useful reads in captured libraries. The rate of useful reads enrichment increases in inverse proportion to the endogenous content of the shotgun library. [Plot: x-axis, shotgun endogenous content (0-70%); y-axis, enrichment in useful reads.]

Figure 5: Enrichment in on-target reads in captured libraries. Both mapped and useful reads are shown. [Plot: x-axis, shotgun endogenous content (0-70%); y-axis, on-target enrichment on a logarithmic scale.]

Depth of coverage. The on-target DoC increased radically after capture (Table 3). Most of the shotgun libraries had a DoC on target of approximately 1 or less. The highest on-target DoC observed before capture was 2.75. After capture, DoC increased to as much as 40.05, and only 4 samples had an on-target DoC of less than one. This coincided with the massive post-capture increase observed in on-target reads.

Capture. No discernible pattern was observed for the different capture protocols applied to the samples (Table 2). Interestingly, all the samples that exhibited a negative effect in useful reads were captured in round 1 or 2. So were the two samples with the highest rate of enrichment after capture. The remaining samples from the first two rounds showed intermediate values of increase in on-target reads. If anything, the negative rates of post-capture enrichment obtained for useful reads appear to be correlated with the number of pre-capture PCR cycles, as every sample with fewer useful reads had received the highest number of cycles (Table 2). At the opposite end of the spectrum, the libraries that produced high numbers of useful reads were those with the fewest cycles in PCR.

Figure 6: Stacked plots illustrating the composition of sequenced libraries before and after target enrichment. The on-target fraction (yellow) represents the fraction of reads that corresponds to the regions targeted by the capture baits, with unique reads shown in bright yellow and PCR duplicates (clonal reads) in dark yellow. The useful grape fraction (blue) represents the off-target reads that map to the grape reference genome before (clonal) and after (unique) filtering. The repetitive grape fraction (purple) represents endogenous reads that originate from genomic loci that are represented in multiple copies and are therefore not commonly considered bioinformatically useful. The ‘not mapped’ fraction (black) illustrates the reads that did not map to the grape reference genome, and likely originate from microorganisms.

Figure 7: Heatmap of the sex genotype likelihoods in ancient grapes. For each SNP, red sites denote the genotype that is more frequent in females. Green indicates the genotype most frequent in males, while orange indicates genotypes with similar frequencies in males and females. White indicates missing data. The SNPs circled in blue are those identified by Picq et al. (2014) to be highly correlated with sex in grapes.

Read length. An interesting pattern emerged when looking at the average read lengths of the samples. Older samples appeared to have shorter reads (e.g. Cougourlude_237, 55.9 bp in the shotgun library), while younger samples had longer reads (e.g. MDV14_US12111_P2, 88.2 bp in the shotgun library). This is consistent with the findings of previous studies of ancient DNA (Allentoft et al., 2012). We also observed that the average read length increased in captured libraries (Table 3), with a mean increase of 5.7 bp across samples. A paired t-test showed that these differences were significant (P=2.36E-12).
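A paired t-test on shotgun and capture read lengths can be sketched with the Python standard library. The example uses only five sample pairs from Table 3, so it illustrates the calculation and does not reproduce the reported P value; converting the t statistic to a P value would additionally require the t distribution, which is left out here. The function name is hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Paired t-test statistic for matched measurements.

    Returns (mean difference, t statistic, degrees of freedom).
    Requires at least two pairs and non-identical differences.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long lists with at least two pairs")
    diffs = [a - b for b, a in zip(before, after)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t, len(diffs) - 1


# Average read lengths (bp) for five samples from Table 3:
# Collet08_P27, Cougourlude_237, HBG7054_P18, HBG7172_P17, Madel_08_P22.
shotgun = [76.0, 55.9, 66.9, 67.9, 71.0]
capture = [83.0, 61.2, 76.0, 76.7, 78.1]
mean_diff, t_stat, dof = paired_t_statistic(shotgun, capture)
```

The test is paired because each library is measured twice, once before and once after capture, so only the within-sample differences enter the statistic.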

Sex determination

It is important to remember the caveat that a seed may contain DNA from the mother in the seed coat as well as a mixture of maternal and paternal DNA in the embryo and endosperm. This complicates interpretations of ancient sex expression, as three alleles may be present in a given pip. However, given that many archaeological seeds are reportedly devoid of an embryo and that the paternal DNA may be a minor contribution to the overall weight of the seed (Ebadi et al., 1996, Cadot et al., 2006), we have here inferred the sex genotype of the plant that bore the seed, assuming the presence of only two maternal alleles.

To avoid underestimating heterozygotes and thereby biasing our results towards homozygous females, we applied a threshold of DoC >10 to each SNP position before it was considered informative. This caused many SNPs to be excluded from the analysis, as the ancient samples had very low coverage in the targets covering the sex-determining loci. The entire set of targets for the VSVV010 amplicon used by Picq et al. (2014) was consistently absent in all of the ancient samples, even when the threshold was lowered to DoC >4. All the SNPs from VSVV008 were present at DoC >4, but none reached a DoC above 10. Furthermore, VSVV006 and VSVV007 were poorly represented at DoC >10, while SNPs from VSVV009 that had previously been shown to be highly correlated with sex determination (Picq et al., 2014) were also excluded from the analysis due to low coverage. Taken together, this implies a problem somewhere in the experimental or analytical process, pertaining perhaps to the effectiveness of the capture baits, the quality of the reference genome, or mapping to approximate positions.

Nevertheless, we recovered information from 85 SNPs for 23 ancient samples. Most of these SNPs showed intermediate probabilities of being male or female (Fig. 7). The results of Picq et al. (2014) indicated that the VSVV001-VSVV005 and VSVV011 positions are not significantly correlated to sex determination. It was therefore not surprising that most of the ancient samples were undifferentiated in these SNPs. Some sites, however, showed strong signals of being either male or female, but whether this is an artefact of low coverage or another type of bias has yet to be investigated. In the study by Picq et al. (2014), a number of SNPs were found to be highly correlated with sex in grapes. Of these, 20 SNPs reached the minimum DoC of 10 and were included in the analysis (Fig. 8). 15 samples had at least one of the highly correlated SNPs that met the required threshold, while 19 samples did not. All of the covered, significant positions in VSVV006 showed similar probabilities of being male or female (Fig. 7), which left 13 informative SNPs in VSVV007 and VSVV009.

From these SNPs it was possible to identify one sample, MDV14_US12111_P7, which showed clear patterns of being female in 8 of the 13 covered SNPs, while no male probabilities were observed (Table 4). Four additional samples were tentatively assigned as females based on a higher number of SNPs displaying female genotype likelihoods, although these samples also exhibited male genotype likelihoods in a limited number of SNPs. One sample, Montfer_P23, had 10 of 13 SNPs that showed high probabilities of being either male or hermaphrodite, while no female probabilities in the highly correlated SNPs were observed. Another sample, Montfer_P21, showed a high probability of being male in 8 SNPs, while 2 sites exhibited female genotype likelihoods. This sample was also tentatively assigned as a male or hermaphrodite.

Berry colour

The interpretation of ancient berry colour may also be confounded by the presence of maternal and paternal DNA in the embryo and endosperm. However, as in the analysis of sex genotypes, we inferred the colour of the berry to which the seed once belonged assuming only two maternal alleles are present.

Figure 8: Depth of coverage (DoC) across the insertion points of the Gret1 retroelement. a) The DoC across the length of the VvmybA1 gene and its upstream promoter with the insertion of the Gret1 retrotransposon. The thin orange line represents the genomic sequence, the thick orange line denotes the Gret1 retroelement, and the thick blue line denotes the VvmybA1 gene. b) The observed pattern of a grape that fits the expectation of a homozygote for the black allele when zooming in on the insertion points of the Gret1 retrotransposon. To the left and in the centre the white allele insertion points are shown for the 5’ and 3’ respectively. To the right DoC across the black allele is shown. c) The observed pattern of a sample that fits the expectation of a heterozygote with one black and one white allele. d) The observed pattern of this sample fits the expected pattern of a grape that is homozygous for the white allele. e) Tannat is a modern reference with black berries and therefore either a heterozygote or a homozygote for the black allele. The observed pattern does not fit the expectations of a black grape; however, in combination with the low coverage, it is not a suitable reference. The pink line represents the insertion point, while the blue line denotes a DoC of 1. Note the differences in the scaling of the y-axis.

By mapping sequencing reads from ancient grape pips to the genomic sequence of the VvmybA1 gene and its upstream promoter in the presence of the Gret1 retroelement, we observed large variation in the DoC across the length of the sequence (Fig. 8.a). With Gret1 being a retrotransposon, this is not unexpected. Nor is it a surprise that we observed variation in coverage across the VvmybA1 gene sequence, since there are four paralogous myb genes in the V. vinifera genome. To infer the berry colour, we looked at the DoC as well as the pattern of mapped reads across the insertion points of Gret1, the retrotransposon responsible for the majority of white phenotypes. We expected to see homozygotes for the white allele, homozygotes for the black allele, and heterozygotes containing one white and one black allele. From our data we did indeed observe this pattern.

Fig. 8.b depicts the one sample that appears to be homozygous for the black allele. All reads were seen to stop exactly at the insertion points when mapped to the white allele. In contrast, the sample showed a DoC of 15 across the insertion point in the black allele. This was quite characteristic when compared to the other DoC patterns observed in the data (Fig. 8.c-d). The berry of this sample would therefore appear to have been black (Table 4).

Figure 9: Population structure based on different numbers of assumed populations (k). The colours indicate different populations.

Four samples fit the expected pattern of a heterozygote containing one black and one white allele (Fig. 8.c). Samples were only assigned to this genotype if they had a DoC of at least 10 across the insertion point. In addition, the DoC in the black allele had to be of a similar magnitude to the DoC of at least one of the positions in the white allele. In contrast to the pattern observed in the white allele in Fig. 8.b, the reads in Fig. 8.c clearly spanned the 5’ and 3’ ends of the Gret1 retroelement. The DoC pattern indicated that these four samples would have had black berries (Table 4).

Samples were only assigned to the white genotype if they had more than 10 reads spanning the insertion points in the white allele (Fig. 8.d). In addition, the DoC across the two positions in the white allele needed to be at least twice that of the black allele to rule out errors. Using these parameters, 13 samples were assigned as homozygous for the white allele, indicating a white berry colour (Table 4). However, many of these samples still had a fairly high coverage across the insertion point in the black allele.

One sample exhibited a DoC pattern that, under the criteria described here, did not allow it to be assigned to one of the genotypes. Despite having a DoC of at least 10 across the insertion points in all alleles, the differences in coverage between the 5’ and 3’ positions in the white allele were large (16 and 45 respectively). So while the coverage in the black allele (10) was close to that observed in the 5’ of the white allele, and a tentative assignment as a black heterozygote would be fitting, the overall pattern of coverage seems more similar to what has been observed for homozygotes for the white allele (Table 4, Fig. S2).

The remaining samples did not meet the criterion of having at least 10 reads spanning the insertion point and were therefore not assigned to a genotype (Table 4, Fig. S2). A comparison to a modern cultivar, Tannat, known to have black berries (Da Silva et al., 2013), was attempted (Fig. 8.e), but the coverage in the regions investigated here was too low to meet the criteria used for the ancient samples, and did not give a clear picture of the DoC in the two alleles.
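The assignment criteria described in this section can be summarised as a small decision function. This is a sketch under stated assumptions rather than the procedure actually applied: the function name and labels are hypothetical, the tolerance used for "similar magnitude" (a factor of 1.5) is an arbitrary placeholder because the text does not quantify it, and the rule for black homozygotes (no reads spanning the white-allele junctions) is an interpretation of the pattern described for the single black homozygote.

```python
def call_berry_genotype(white_5p, white_3p, black, min_doc=10, similar_ratio=1.5):
    """Classify a sample from DoC at the Gret1 insertion points.

    white_5p, white_3p: DoC at the 5' and 3' junctions of the white allele.
    black: DoC across the insertion point in the black allele.
    similar_ratio is an assumed tolerance; the text only says 'similar magnitude'.
    """
    if max(white_5p, white_3p, black) < min_doc:
        return "unassigned"  # too little coverage to apply any criterion
    # White homozygote: >10 reads at both white junctions, each at least 2x black.
    if min(white_5p, white_3p) > min_doc and min(white_5p, white_3p) >= 2 * black:
        return "white/white"
    # Black homozygote (assumed rule): reads stop at the white-allele junctions.
    if black >= min_doc and max(white_5p, white_3p) == 0:
        return "black/black"
    # Heterozygote: black DoC similar to at least one white junction.
    if black >= min_doc:
        for w in (white_5p, white_3p):
            if w > 0 and max(w, black) / min(w, black) <= similar_ratio:
                return "black/white"
    return "unclear"


print(call_berry_genotype(0, 0, 15))
print(call_berry_genotype(16, 45, 10))
```

With the placeholder tolerance, the sample with junction depths of 16 and 45 and a black-allele depth of 10 is returned as "unclear"; a looser tolerance would call it a heterozygote, which illustrates how sensitive such borderline cases are to the chosen cut-off.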

Population Structure

Based on genotype likelihoods and by assuming the presence of a different number of populations (k), we performed an NGSadmix analysis on our data in order to investigate the genetic structure within the population of ancient samples (Fig. 9). At k = 2, the samples that were most genetically divergent clustered at opposite ends of the spectrum. With an increasing number of assumed populations, different structures began to appear. For example, an interesting relationship was formed at k = 3 between the HBG samples and SAU3019_P8, suggesting they may originate from a distinct subpopulation. The Horbourg-Wihr (HBG) site was located far inland (Fig. 2), while the La Lesse-Espagnac (SAU3019) site sits on the coast of the Mediterranean Basin; the shared clustering therefore implies that these locations exchanged and/or maintained similar grape varieties in the 2nd century AD.

At the k = 3 level the La Lesse-Espagnac site appeared to have a number of genetically different grape varieties (Fig. 9). The SAU3019_P13 and SAU3019_P14 samples belonged to a group that was divergent from the SAU3019_P8 sample, while the SAU3019_P2, SAU3019_P4, and SAU3019_P9 samples showed varying levels of admixture. Indeed, at all k-values the three latter samples were highly admixed. The complex structure of the Sauvian population might be related to its geographic location at the intersection of major trade routes from Mediterranean, Iberian and Atlantic cultures (Figueiral et al., 2010), which makes the exchange and intermixing of different varieties more probable. An alternative hypothesis is that a higher value of k might be better suited to describe the population structure.
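What "highly admixed" means in practice can be sketched as follows. The snippet assumes that each sample has a vector of k ancestry proportions summing to roughly one, as produced by admixture-type analyses; the function name, the 0.8 cut-off and the example proportions are hypothetical and are not taken from the results of this study.

```python
def summarise_admixture(proportions, admixed_below=0.8):
    """Map each sample to (index of its major cluster, admixed flag).

    proportions: dict of sample name -> list of k ancestry proportions.
    A sample is flagged as admixed when its largest proportion is below
    admixed_below (an assumed, arbitrary cut-off).
    """
    summary = {}
    for sample, q in proportions.items():
        top = max(range(len(q)), key=lambda i: q[i])
        summary[sample] = (top, q[top] < admixed_below)
    return summary


# Hypothetical proportions at k = 3.
example = {
    "sample_A": [0.95, 0.03, 0.02],
    "sample_B": [0.40, 0.35, 0.25],
    "sample_C": [0.10, 0.85, 0.05],
}
result = summarise_admixture(example)
for name, (cluster, admixed) in result.items():
    print(name, cluster, admixed)
```

Repeating such a summary for each value of k makes it easy to see which samples remain admixed regardless of the number of assumed populations.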

Another interesting result of the admixture analysis was that the genetic structure of the Mas de Vignoles XIV (MDV14) site in Nîmes (Fig. 2) appeared to change through time. This site was the only location where we had samples from different time points, and it offered the opportunity to study the changes in viticulture from the Early Roman Era to the start of the Middle Ages. At k = 4 (Fig. 9), the samples from the 2nd-1st century BC deposit (MDV14_US14152) exhibited genotype likelihoods that set them apart from all the other samples in this study. The population from the 5th-early 6th century AD showed a radical change in genetic composition when compared to the MDV14_US14152 population. One sample, MDV14_US13525_P5, belonged to a cluster with a different genetic composition, while the other sample, MDV14_US13525_P7, showed high levels of admixture. Three hundred years later, the population structure of the samples present at the site had changed again. Overall, these early medieval samples (MDV14_US12111) appeared to be more similar both to the MDV14_US14152 and the MDV14_US13525 populations than the latter two were to each other.

Although a replacement in grape varieties is not surprising when it takes place over the course of 1000 years, these results were compelling when compared to the archaeological context of the Mas de Vignoles XIV site, which was provided by Laurent Bouby from INRAP, Montpellier. The MDV14_US14152 samples were dated to the 2nd-1st century BC, at which time a farm occupied the site. The farm was replaced at the end of the 1st century BC and during the 1st century AD, when two new, small farms were built. These coexisted until the end of the Roman period, when they were abandoned and a larger farm was erected in their stead. The MDV14_US13525 samples originated from this time point, and the genetic difference between these and the MDV14_US14152 samples appears to reflect the changing history of human occupation in the area. The same held true for the MDV14_US12111 samples, which were deposited during the 8th-9th century AD. During this period, the site, although not very well preserved, no longer showed signs of being used for human habitation. Instead it appeared to have been used for agriculture and artisanal purposes.

The initial observation from the MDS analysis identified two clusters that both comprised vinifera and silvestris accessions (Fig. 10). A previous study on the same dataset (Bacilieri et al., 2013) observed patterns that indicated gene flow into the domesticated population from local populations of silvestris, which might explain why wild grapes are scattered throughout the clusters. All of the ancient samples showed a higher affinity towards the cluster in the lower-left portion of Figure 10 (extending approximately from (-0.3, 0.15) to (0.2, -0.4)). In general, they appeared more closely related to vinifera cultivars, which confirmed the initial morphological assessment made by Laurent Bouby and colleagues (Terral et al., 2010). We explored the reference data from the GrapeReSeq project to identify the difference between the two main clusters, but found that no additional information was attached to the cluster in the upper-right portion of Figure 10 (extending approximately from (0.2, 0.2) to (0.6, 0)). The basis for the segregation of the two clusters therefore remained uncertain. As the second cluster could not provide us with useful information, it was excluded from downstream analysis.

We compared the clustering of wine and table grapes to the ancient samples (Fig. 11) and found an overall stronger relationship with wine grapes, which is consistent with the history of viniculture in France (McGovern, 2003, Figueiral et al., 2010). The group comprising MDV14_US12111_P1, P2 and P4, was more closely related to the main cluster of table grapes than any of the other samples. However, this area is also where the cluster of wine grapes is most dense, and the exact relationship is therefore unresolved. We also investigated several economically important cultivars in the MDS plot and compared them to ancient samples to infer genetic similarities. Two samples, MDV14_US12111_P7 and SAU3019_P13 clustered with Pinot noir, while two other samples, MAG2013_US4015_P5 and MAG2013_US4015_P8, appeared to be closely related to Chardonnay. It is important to note that we cannot currently determine if the ancient samples are genetic clones of these cultivars, and that other modern cultivars may be slightly more closely related to the ancient samples. However, given the historical traditions of Pinot and Chardonnay (Bowers et al., 1999), these findings merit further discussion.

Pinot noir is considered an ancient cultivar and is believed to have existed at least since the Roman period (This et al., 2006). Furthermore, it has a strong association with Northern France, where it has been grown for centuries (Bowers et al., 1999). This, in combination with its long history of predominantly vegetative propagation, makes Pinot one of the more likely cultivars to show up in the MDS analysis. Chardonnay is known to be a cross between Pinot noir and Gouais that is believed to have originated in the Middle Ages. It is striking then that both of the samples that cluster with Chardonnay are from the Roman period.

Since Chardonnay is the progeny of Pinot noir, we would expect to see the two proposed Chardonnay samples, MAG2013_US4015_P5 and MAG2013_US4015_P8, fall in the same subpopulation as the two reputed Pinot noir samples, MDV14_US12111_P7 and SAU3019_P13. At k = 5, the clustering indicates that the Chardonnay samples and SAU3019_P13 are similar to each other, which lends some support to the results. MDV14_US12111_P7, however, is markedly different at every k, which subtracts credibility.

Discussion

Target enrichment

We compared the effectiveness of in-solution target enrichment to traditional shotgun sequencing using ancient and historic grape seeds as a model system. We found up to 12.34-fold enrichment in endogenous content (Fig. 3), and up to 3.80-fold increase in useful reads (Fig. 4) in captured libraries. The rate of enrichment in on-target reads was massive for both endogenous (up to 21,000-fold) and useful reads (up to 11,000-fold; Fig. 5). Following capture, we found a change in library composition pertaining in particular to increased levels of clonality (Fig. 6), which appeared to be correlated to pre-capture PCR amplification. A significant increase in mean read length of 5.7 bp was also observed, indicating that the baits have a higher affinity towards longer reads. These results are all consistent with previous studies that have investigated the performance of target enrichment (Ávila-Arcos et al., 2011, Ávila-Arcos et al., 2015).
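For readers unfamiliar with the metric, fold enrichment is commonly computed as the ratio between the fraction of reads of interest in the captured library and the same fraction in the shotgun library. The sketch below assumes that definition; the function name and the read counts are hypothetical and do not correspond to any library in this study.

```python
def fold_enrichment(shotgun_hits, shotgun_total, capture_hits, capture_total):
    """Ratio of the hit fraction after capture to the hit fraction before it."""
    shotgun_fraction = shotgun_hits / shotgun_total
    capture_fraction = capture_hits / capture_total
    if shotgun_fraction == 0:
        raise ValueError("no hits in the shotgun library; enrichment is undefined")
    return capture_fraction / shotgun_fraction


# Hypothetical counts: 0.2 % useful reads before capture, 2.4 % after.
enrichment = fold_enrichment(2_000, 1_000_000, 24_000, 1_000_000)
print(round(enrichment, 2))
```

Because the metric is a ratio of fractions, libraries with almost no target reads before capture can yield very large fold values even when the absolute number of reads gained is modest.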

Some ancient grape libraries performed well in capture while others were less successful (Fig. 6; Fig. S1). The reason behind this might be difficult to determine, as many factors influence the success of target enrichment. However, an obvious first point to consider is the preservation of the samples, and, by extension, the endogenous DNA concentration in the extracts, as well as the quality of this DNA. It is difficult, after all, to enrich for something that is not present to begin with. There does indeed seem to be a relationship between the amount of DNA in the sequenced libraries and the samples that were flagged as potentially problematic by the archaeologists who handled the excavations. For example, the CAV1107_P25 and the Cougourlude_237 samples showed consistently low DNA concentrations throughout the experimental phase (Table 2), which was consistent with the low amount of bioinformatically useful reads in both the shotgun and captured libraries (Table 3).

Despite the limitations, however, the results of the target capture showed that not only were we able to sequence grape DNA from the ancient samples, we were also able to retrieve sufficient enrichment in bioinformatically useful reads to conduct further analysis on our dataset.

Grape sex

The reversal to functional hermaphroditism in domesticated grapes allows for self-fertilisation and results in higher productivity. This makes it a highly desirable trait to grape growers and has therefore been selected for almost to the point of fixation (Bacilieri et al., 2013). The wild ancestor of the domesticated grape retains a dioecious nature, so with the recent characterisation of a sex locus in grapes (Picq et al., 2014), one of the main objectives of this study was to investigate the presence of females within a dataset composed of grape seeds. We are currently unable to differentiate males from hermaphrodites in our dataset, but by identifying samples with high probabilities of being female, we were able to infer whether Roman and medieval winegrowers in France supplemented the gene pool of cultivated varieties through crosses with wild silvestris.

Figure 10: MDS plot showing the ancient grapes plotted against modern vinifera and silvestris. (Axes: mds_x, mds_y. Legend, Subspecies: Vinifera, Silvestris, Ancient.)

A tentative assignment of five females, three of which are from the Roman period while two are Early Medieval in origin, led us to the conclusion that gene flow did occur from the wild into domesticated individuals in the Horbourg-Wihr and La Lesse-Espagnac sites during the Roman Era and the Mas de Vignoles XIV site during the Middle Ages. These results are supported by recent, large-scale studies of the molecular structure in modern grape, which indicate that admixture between local populations of silvestris and vinifera has occurred frequently in the past (Myles et al., 2011, Bacilieri et al., 2013).

Figure 11: MDS plot showing the ancient grapes plotted against modern table and wine grapes. Pinot noir and Chardonnay are highlighted and the closest cultivars are indicated. (Axes: mds_x, mds_y. Legend, Use: wine grape, table grape, Chardonnay, Pinot noir, Ancient. Labelled samples: MDV14_US12111_P1, MDV14_US12111_P2, MDV14_US12111_P4, MDV14_US12111_P7, MAG2013_US4015_P5, MAG2013_US4015_P8, SAU3019_P13.)

However, an unfortunate observation in our dataset is the apparent failure of the capture baits that were designed for the sex locus. In the berry colour analysis some samples reached a local DoC >100, yet in the sex locus several targets, including all those for VSVV010, did not even reach a DoC >4, much less the required threshold of DoC >10. A number of factors may have influenced the absence of data, chief among which are the efficiency of the capture baits, the quality of the reference genome, and the accuracy of the mapping. A previous study found that the optimal GC content for bait hybridisation is 45 %, while lower and higher values significantly decrease the affinity of the baits (Tewhey et al., 2009). The GC content has not yet been explored in our targets, and at present it is therefore not possible to conclude whether this has biased our results. In relation to the quality of the reference genome, Picq et al. (2014) note that the positions of their VSVV008 and VSVV010 amplicons in the 12x.0 version are approximate. This featured in our considerations during the analysis, but it seems likely that we did not succeed in accounting for it, and further analysis is required to resolve this. Finally, another problematic point of the analysis is that the female allele is located on chromosome 2 in the 12x.0 version, while the hermaphrodite allele is located on the unassigned scaffold_233 (chromosome UnRandom). Some sequence overlap is expected between these two alleles, and it is therefore possible that we are filtering out many of the informative reads. A potential solution to this problem would be to separately map against the female and hermaphrodite alleles, but this approach has yet to be explored.
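A first check of bait GC content, which the text notes has not yet been carried out, could look like the sketch below. The 45 % optimum is the value attributed to Tewhey et al. (2009) above; the function names, the tolerance of 15 percentage points and the bait sequences are hypothetical placeholders rather than properties of the baits used here.

```python
def gc_percent(sequence):
    """GC content in percent, ignoring anything that is not A, C, G or T."""
    called = [base for base in sequence.upper() if base in "ACGT"]
    if not called:
        raise ValueError("sequence contains no called bases")
    return 100.0 * sum(base in "GC" for base in called) / len(called)


def flag_baits(baits, optimum=45.0, tolerance=15.0):
    """Return baits whose GC content lies more than tolerance from optimum."""
    flagged = {}
    for name, seq in baits.items():
        gc = gc_percent(seq)
        if abs(gc - optimum) > tolerance:
            flagged[name] = round(gc, 1)
    return flagged


# Hypothetical, very short bait sequences for illustration only.
baits = {
    "bait_a": "ATGCATGCAT",  # 40 % GC
    "bait_b": "ATATATATTA",  # 0 % GC
    "bait_c": "GCGCGCGCGA",  # 90 % GC
}
flagged = flag_baits(baits)
print(flagged)
```

Comparing the flagged baits against the targets that failed to reach DoC >4 would indicate whether GC content is a plausible explanation for the missing sex-locus data.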

Berry colour

From the method we used to determine berry colour, we identified all three of the expected patterns of coverage in the upstream promoter of the VvmybA1 regulatory gene. The presence of all three patterns lends support to our model. In summary, we identified one homozygote for the black allele, while four samples appeared to be heterozygous for the white and black alleles. A total of 13 ancient grape pips yielded white genotypes. The differentiation between white and black berries is believed to have happened at least by the Roman Era (This et al., 2006), which is consistent with the observations in our dataset (Table 4, Fig. S2).

An interesting observation is that 5 of the 10 white samples from the Roman period were collected at the same site, La Lesse-Espagnac (SAU3019). A possible hypothesis, which might explain the exclusive presence of white grapes at this site, is that the establishment specialised in production. La Lesse-Espagnac is located near modern day Béziers, where major trade routes converged. Archaeological evidence has previously associated the site with wine production, which is particularly interesting as the wine from the Béziers area was renowned for high quality during the Roman Era. It therefore seems possible that the establishment was a supplier of wine for the local town or, perhaps, a greater market (Figueiral et al., 2010).

It seems somewhat excessive that out of 18 samples, 13 (72.2%) appear to have been varieties that bore white berries. Several factors might have influenced the distribution, but one important aspect of berry colour determination in grapes is that they have at least two genes, VvmybA1 and VvmybA2, which appear to be equally important in the creation of the white berry phenotype. Indeed, the individual needs to be homozygous for the non-functional allele in both genes before a white phenotype is observed (Vezzulli et al., 2012). This is a limitation to the current analysis, as we have only investigated the VvmybA1 gene. However, the collection of capture baits that were designed for this study also targeted the VvmybA2 gene, so with further analysis our dataset holds the potential of resolving the issue.
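The two-gene requirement summarised above (after Vezzulli et al., 2012) can be written as a simple rule, which makes the limitation of scoring VvmybA1 alone explicit. The function name and the encoding of genotypes as counts of functional alleles are assumptions made for illustration only.

```python
def predicted_colour(myb_a1_functional, myb_a2_functional):
    """Predict berry colour from the number of functional alleles (0, 1 or 2)
    at VvmybA1 and VvmybA2. White requires no functional allele at either gene."""
    for count in (myb_a1_functional, myb_a2_functional):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
    if myb_a1_functional == 0 and myb_a2_functional == 0:
        return "white"
    return "coloured"


print(predicted_colour(0, 0))
print(predicted_colour(0, 1))
```

Under this rule a sample that appears homozygous white at VvmybA1 would still be predicted to bear coloured berries if it carries a functional VvmybA2 allele, so some of the 13 white calls could change once the second gene is analysed.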

In addition, it is important to note that the method used to determine the berry colour of the ancient specimens has not yet been sufficiently verified. Since the phenotypes of all the ancient specimens are unknown, we tested our method on the sequence of a known black berry variety, Tannat (Fig. 8). However, as this modern specimen had a low coverage in the region, it did not give a clear image of the practicability of the method. Therefore a reference dataset representing several samples of known genotype and with a high coverage over the insertion point is needed before we can be completely confident in our methodology.

Another thing to note is that the method described here for determining berry colour does not differentiate between black, red and pink genotypes. Several polymorphisms that are responsible for berry colour variation in grapes have been identified (Fig. 1; Kobayashi et al., 2004; Lijavetzky et al., 2006; Yakushiji et al., 2006; Shimazaki et al., 2011), and so far the possibility of distinguishing these has not been explored for the ancient dataset. However, the fact that we remove all sequences that map to more than one position in the genome might complicate this. As there are four characterised paralogous mybA genes in grapes, these are bound to have some degree of sequence similarity. This means that we might be removing the informative regions from our dataset because we cannot tell where they belong.

Population structure

The NGSadmix analysis indicated that genetically similar varieties were grown at the Horbourg-Wihr (HBG) and La Lesse-Espagnac (SAU3019) sites at similar time points, which could indicate that varieties were exchanged between, or maintained in parallel at, these locations. More work is needed in order to determine what those varieties might have been.

In addition, we found support for a change in genetic composition of the grapes that were grown at the Mas de Vignoles XIV site during the 1000 years our samples spanned. This was found to be consistent with the archaeological analysis of a changing environment of human habitation.

Finally, our analysis indicated the presence of Pinot noir in the Roman Era (SAU3019_P13) and the Middle Ages (MDV14_US12111_P7), and that it was grown in the Languedoc region no later than 225 AD. This appears to be consistent with the history of the cultivar as an archaic founder variety (This et al., 2006). Pinot noir is the ancestor of many progeny, like Pinot gris and Pinot blanc, that are derived through vegetative propagation and the accumulation of spontaneous somatic mutations. The results of the berry colour analysis indicated that both of these samples bore white berries, and it is therefore more likely that the two Pinot varieties are Pinot blanc.

The Chardonnay variety on the other hand is believed to have arisen during the Medieval period (Bowers et al., 1999), which is not consistent with our results. Both of the samples that cluster with Chardonnay in the MDS analysis are from the Roman period, and indicate that the cultivar was grown in the Languedoc region no later than the 4th century AD. If this is true, it has important implications for the history of the variety and the growing in Southern France. It was not possible to determine berry colour for these two varieties, although the NGSadmix analysis indicated that they might be related.

Both the NGSadmix and the MDS analysis come with their own potential problems. The NGSadmix analysis infers relatedness based on genotype likelihoods, which makes it better suited for aDNA, where large amounts of missing data are a common occurrence. However, if many samples lack coverage in the positions used for the analysis, this will create a stronger relationship between samples that have few positions covered. This appears to be less of a problem in this case, as we observe that samples with low percentages of useful reads, like CAV1107_P25 and Cougourlude_237, do not exhibit similar patterns. Another potential bias is that the samples with low coverage often appear to have a higher degree of admixture, which might be the case for the Cougourlude_237 sample. The MDS analysis is based on genotypes rather than genotype likelihoods, which presents a number of issues when trying to fit ancient DNA into the mix. It is currently unknown how uncovered positions affect the clustering of the ancient samples, and further analysis is needed to account for the potential bias in directly comparing ancient and modern samples.

Concluding remarks

The first target capture experiment performed on ancient and historic Vitis vinifera subsp. vinifera seeds was a success. We demonstrated the power of enrichment strategies in enhancing the amount of data that can be attained from archaeobotanical remains, and used this information to infer the presence of white and black berries in Roman and Medieval France. We further identified genetic variation between pips that were deposited at three different time points spanning 1000 years. Finally, we tentatively identified two samples that might have been Pinot blanc, indicating that the founder variety was grown in the Languedoc Region by no later than 225 AD. We additionally characterised two samples that appeared to be Chardonnay, which, although more work is needed to confirm this, indicates that the variety may have originated several hundred years prior to what has previously been assumed.

Table 4: Summary of results for each sample. Sex is only tentatively assigned. The berry colour genotype was only assigned for samples with a DoC of at least 10 across two of the insertion points. * Indicates the one sample that appears to be homozygous for the black allele. The sample with an ‘Unclear’ berry colour exhibits a DoC pattern which is more similar to a black berry, but because of the large difference in coverage between the 5’ and 3’ insertion points in the white allele, a colour genotype has not been assigned. The last column identifies close genetic relationships to economically important cultivars according to MDS analysis. NC: no coverage. ND: not determined.

Sample              Age                         Sex  Berry colour  Closest cultivar
CAV1107_P25         Roman                       NC   Unknown       ND
Collet08_P27        Medieval                    ND   Black         ND
Cougourlude_237     Iron Age                    NC   Unknown       ND
HBG7054_P18         Roman                       ND   White         ND
HBG7172_P3          Roman                       NC   Unknown       ND
HBG7172_P17         Roman                       F    White         ND
Madel_08_P22        Late medieval/Early Modern  ND   White         ND
MAG2013_US4008_P1   Roman                       NC   Unknown       ND
MAG2013_US4015_P5   Roman                       NC   Unknown       ND
MAG2013_US4015_P6   Roman                       NC   Unknown       Chardonnay
MAG2013_US4015_P8   Roman                       NC   Unknown       Chardonnay
MAG2013_US4015_P10  Roman                       NC   White         ND
MDV14_US12111_P1    Early Medieval              NC   Unknown       ND
MDV14_US12111_P2    Early Medieval              ND   White         ND
MDV14_US12111_P4    Early Medieval              F    Black         ND
MDV14_US12111_P5    Early Medieval              NC   Black         ND
MDV14_US12111_P7    Early Medieval              ND   White         Pinot blanc
MDV14_US12111_P9    Early Medieval              NC   Unknown       ND
MDV14_US13525_P5    Late Roman/Medieval         NC   Unknown       ND
MDV14_US13525_P7    Late Roman/Medieval         NC   *Black        ND
MDV14_US14152_P4    Early Roman                 ND   Unclear       ND
MDV14_US14152_P7    Early Roman                 NC   Unknown       ND
MDV14_US14152_P9    Early Roman                 NC   Unknown       ND
Montfer_P21         Roman                       ND   White         ND
Montfer_P23         Roman                       M/H  Black         ND
Montfer_P25         Roman                       ND   White         ND
Roumeg_P9           Roman                       NC   Unknown       ND
Roumeg_P14          Roman                       NC   Unknown       ND
SAU3019_P2          Roman                       ND   White         ND
SAU3019_P4          Roman                       NC   Unknown       ND
SAU3019_P8          Roman                       F    White         ND
SAU3019_P9          Roman                       NC   White         ND
SAU3019_P13         Roman                       F    White         Pinot blanc
SAU3019_P14         Roman                       ND   White         ND

References

ALLAN, A. C., HELLENS, R. P. & LAING, W. A. 2008. MYB transcription factors that colour our fruit. Trends in Plant Science, 13, 99-102.
ALLENTOFT, M. E., COLLINS, M., HARKER, D., HAILE, J., OSKAM, C. L., HALE, M. L., CAMPOS, P. F., SAMANIEGO, J. A., GILBERT, M. T. P. & WILLERSLEV, E. 2012. The half-life of DNA in bone: measuring decay kinetics in 158 dated fossils. Proceedings of the Royal Society of London B: Biological Sciences, 279, 4724-4733.
AMBAWAT, S., SHARMA, P., YADAV, N. R. & YADAV, R. C. 2013. MYB transcription factor genes as regulators for plant responses: an overview. Physiology and Molecular Biology of Plants, 19, 307-321.
ANDERSON-CARPENTER, L. L., MCLACHLAN, J. S., JACKSON, S. T., KUCH, M., LUMIBAO, C. Y. & POINAR, H. N. 2011. Ancient DNA from lake sediments: bridging the gap between paleoecology and genetics. BMC Evolutionary Biology, 11, 30.
ARADHYA, M. K., DANGL, G. S., PRINS, B. H., BOURSIQUOT, J.-M., WALKER, M. A., MEREDITH, C. P. & SIMON, C. J. 2003. Genetic structure and differentiation in cultivated grape, Vitis vinifera L. Genetical Research, 81, 179-192.
ARROYO-GARCÍA, R., RUIZ-GARCÍA, L., BOLLING, L., OCETE, R., LOPEZ, M., ARNOLD, C., ERGUL, A., UZUN, H., CABELLO, F. & IBÁÑEZ, J. 2006. Multiple origins of cultivated grapevine (Vitis vinifera L. ssp. sativa) based on chloroplast DNA polymorphisms. Molecular Ecology, 15, 3707-3714.
ÁVILA-ARCOS, M. C., CAPPELLINI, E., ROMERO-NAVARRO, J. A., WALES, N., MORENO-MAYAR, J. V., RASMUSSEN, M., FORDYCE, S. L., MONTIEL, R., VIELLE-CALZADA, J.-P. & WILLERSLEV, E. 2011. Application and comparison of large-scale solution-based DNA capture-enrichment methods on ancient DNA. Scientific Reports, 1.
ÁVILA-ARCOS, M. C., SANDOVAL-VELASCO, M., SCHROEDER, H., CARPENTER, M. L., MALASPINAS, A. S., WALES, N., PEÑALOZA, F., BUSTAMANTE, C. D. & GILBERT, M. T. P. 2015. Comparative performance of two whole-genome capture methodologies on ancient DNA Illumina libraries. Methods in Ecology and Evolution.
AZUMA, A., KOBAYASHI, S., GOTO-YAMAMOTO, N., SHIRAISHI, M., MITANI, N., YAKUSHIJI, H. & KOSHITA, Y. 2009. Color recovery in berries of grape (Vitis vinifera L.) ‘Benitaka’, a bud sport of ‘Italia’, is caused by a novel allele at the VvmybA1 locus. Plant Science, 176, 470-478.
AZUMA, A., KOBAYASHI, S., MITANI, N., SHIRAISHI, M., YAMADA, M., UENO, T., KONO, A., YAKUSHIJI, H. & KOSHITA, Y. 2008. Genomic and genetic analysis of Myb-related genes that regulate anthocyanin biosynthesis in grape berry skin. Theoretical and Applied Genetics, 117, 1009-1019.
BACILIERI, R., LACOMBE, T., LE CUNFF, L., DI VECCHI-STARAZ, M., LAUCOU, V., GENNA, B., PÉROS, J.-P., THIS, P. & BOURSIQUOT, J.-M. 2013. Genetic structure in cultivated grapevines is linked to geography and human selection. BMC Plant Biology, 13, 25.
BATTILANA, J., EMANUELLI, F., GAMBINO, G., GRIBAUDO, I., GASPERI, F., BOSS, P. K. & GRANDO, M. S. 2011. Functional effect of grapevine 1-deoxy-D-xylulose 5-phosphate synthase substitution K284N on Muscat flavour formation. Journal of Experimental Botany, err231.
BENTLEY, D. R., BALASUBRAMANIAN, S., SWERDLOW, H. P., SMITH, G. P., MILTON, J., BROWN, C. G., HALL, K. P., EVERS, D. J., BARNES, C. L. & BIGNELL, H. R. 2008. Accurate whole human genome sequencing using reversible terminator chemistry. Nature, 456, 53-59.
BON, C., CAUDY, N., DE DIEULEVEULT, M., FOSSE, P., PHILIPPE, M., MAKSUD, F., BERAUD-COLOMB, É., BOUZAID, E., KEFI, R. & LAUGIER, C. 2008. Deciphering the complete mitochondrial genome and phylogeny of the extinct cave bear in the Paleolithic painted cave of Chauvet. Proceedings of the National Academy of Sciences, 105, 17447-17452.
BOS, K. I., SCHUENEMANN, V. J., GOLDING, G. B., BURBANO, H. A., WAGLECHNER, N., COOMBES, B. K., MCPHEE, J. B., DEWITTE, S. N., MEYER, M., SCHMEDES, S., WOOD, J., EARN, D. J. D., HERRING, D. A., BAUER, P., POINAR, H. N. & KRAUSE, J. 2011. A draft genome of Yersinia pestis from victims of the Black Death. Nature, 478, 506-510.
BOSS, P. K., DAVIES, C. & ROBINSON, S. P. 1996. Expression of anthocyanin biosynthesis pathway genes in red and white grapes. Plant Molecular Biology, 32, 565-569.
BOUWMAN, A. S., KENNEDY, S. L., MÜLLER, R., STEPHENS, R. H., HOLST, M., CAFFELL, A. C., ROBERTS, C. A. & BROWN, T. A. 2012. Genotype of a historic strain of Mycobacterium tuberculosis. Proceedings of the National Academy of Sciences, 109, 18511-18516.
BOWERS, J., BOURSIQUOT, J.-M., THIS, P., CHU, K., JOHANSSON, H. & MEREDITH, C. 1999. Historical genetics: the parentage of Chardonnay, , and other wine grapes of northeastern France. Science, 285, 1562-1565.
BRIGGS, A. W., GOOD, J. M., GREEN, R. E., KRAUSE, J., MARICIC, T., STENZEL, U., LALUEZA-FOX, C., RUDAN, P., BRAJKOVIĆ, D. & KUĆAN, Ž. 2009. Targeted retrieval and analysis of five Neandertal mtDNA genomes. Science, 325, 318-321.
BRIGGS, A. W., STENZEL, U., JOHNSON, P. L., GREEN, R. E., KELSO, J., PRÜFER, K., MEYER, M., KRAUSE, J., RONAN, M. T. & LACHMANN, M. 2007. Patterns of damage in genomic DNA sequences from a Neandertal. Proceedings of the National Academy of Sciences, 104, 14616-14621.
BROWN, T. A., CAPPELLINI, E., KISTLER, L., LISTER, D. L., OLIVEIRA, H. R., WALES, N. & SCHLUMBAUM, A. 2015. Recent advances in ancient DNA research and their implications for archaeobotany. Vegetation History and Archaeobotany, 24, 207-214.
BURBANO, H. A., HODGES, E., GREEN, R. E., BRIGGS, A. W., KRAUSE, J., MEYER, M., GOOD, J. M., MARICIC, T., JOHNSON, P. L. & XUAN, Z. 2010. Targeted investigation of the Neandertal genome by array-based sequence capture. Science, 328, 723-725.
CADOT, Y., MIÑANA-CASTELLÓ, M. T. & CHEVALIER, M. 2006. Anatomical, histological, and histochemical changes in grape seeds from Vitis vinifera L. cv during fruit development. Journal of Agricultural and Food Chemistry, 54, 9206-9215.
CAPORALI, E., SPADA, A., MARZIANI, G., FAILLA, O. & SCIENZA, A. 2003. The arrest of development of abortive reproductive organs in the unisexual flower of Vitis vinifera ssp. silvestris. Sexual Plant Reproduction, 15, 291-300.
CAPPELLINI, E., GILBERT, M. T. P., GEUNA, F., FIORENTINO, G., HALL, A., THOMAS-OATES, J., ASHTON, P. D., ASHFORD, D. A., ARTHUR, P. & CAMPOS, P. F. 2010. A multidisciplinary study of archaeological grape seeds. Naturwissenschaften, 97, 205-217.
CARPENTER, M. L., BUENROSTRO, J. D., VALDIOSERA, C., SCHROEDER, H., ALLENTOFT, M. E., SIKORA, M., RASMUSSEN, M., GRAVEL, S., GUILLÉN, S. & NEKHRIZOV, G. 2013. Pulling out the 1%: whole-genome capture for the targeted enrichment of ancient DNA sequencing libraries. The American Journal of Human Genetics, 93, 852-864.
CAVALIERI, D., MCGOVERN, P. E., HARTL, D. L., MORTIMER, R. & POLSINELLI, M. 2003. Evidence for S. cerevisiae fermentation in ancient wine. Journal of Molecular Evolution, 57, S226-S232.
CHARLESWORTH, D. 2015. Plant contributions to our understanding of sex chromosome evolution. New Phytologist.
COOPER, A., LALUEZA-FOX, C., ANDERSON, S., RAMBAUT, A., AUSTIN, J. & WARD, R. 2001. Complete mitochondrial genome sequences of two extinct moas clarify ratite evolution. Nature, 409, 704-707.
COOPER, A. & POINAR, H. N. 2000. Ancient DNA: do it right or not at all. Science, 289, 1139-1139.
DA FONSECA, R. R., SMITH, B. 
D., WALES, N., CAPPELLINI, E., SKOGLUND, P., FUMAGALLI, M., SAMANIEGO, J. A., CARØE, C., ÁVILA-ARCOS, M. C. & HUFNAGEL, D. E. 2015. The origin and evolution of maize in the Southwestern United States. Nature Plants, 1. DA SILVA, C., ZAMPERIN, G., FERRARINI, A., MINIO, A., DAL MOLIN, A., VENTURINI, L., BUSON, G., TONONI, P., AVANZATO, C. & ZAGO, E. 2013. The high polyphenol content of grapevine cultivar tannat berries is conferred primarily by genes that are not shared with the reference genome. The Plant Cell, 25, 4777-4788. DER SARKISSIAN, C., ALLENTOFT, M. E., ÁVILA-ARCOS, M. C., BARNETT, R., CAMPOS, P. F., CAPPELLINI, E., ERMINI, L., FERNÁNDEZ, R., DA FONSECA, R. & GINOLHAC, A. 2015. Ancient genomics. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 370, 20130387.

! ! 46! DRESSMAN, D., YAN, H., TRAVERSO, G., KINZLER, K. W. & VOGELSTEIN, B. 2003. Transforming single DNA molecules into fluorescent magnetic particles for detection and enumeration of genetic variations. Proceedings of the National Academy of Sciences, 100, 8817-8822. DU, H., YANG, S.-S., LIANG, Z., FENG, B.-R., LIU, L., HUANG, Y.-B. & TANG, Y.-X. 2012. Genome-wide analysis of the MYB transcription factor superfamily in soybean. BMC plant biology, 12, 106. EBADI, A., SEDGLEY, M., MAY, P. & COOMBE, B. 1996. Seed development and abortion in Vitis vinifera L., cv. Chardonnay. International Journal of Plant Sciences, 703-712. EMANUELLI, F., BATTILANA, J., COSTANTINI, L., LE CUNFF, L., BOURSIQUOT, J.-M., THIS, P. & GRANDO, M. S. 2010. A candidate gene association study on muscat flavor in grapevine (Vitis vinifera L.). BMC Plant Biology, 10, 241. FECHTER, I., HAUSMANN, L., DAUM, M., SÖRENSEN, T. R., VIEHÖVER, P., WEISSHAAR, B. & TÖPFER, R. 2012. Candidate genes within a 143 kb region of the flower sex locus in Vitis. Molecular Genetics and Genomics, 287, 247-259. FIGUEIRAL, I., FABRE, L. & BEL, V. 2010. Considerations on the nature and origin of wood-fuel from gallo-roman cremations in the Languedoc region (Southern France). Quaternaire. Revue de l'Association française pour l'étude du Quaternaire, 21, 325-331. FOOTE, A. D., HOFREITER, M. & MORIN, P. A. 2012. Ancient DNA from marine mammals: studying long-lived species over ecological and evolutionary timescales. Annals of Anatomy-Anatomischer Anzeiger, 194, 112- 120. FOOTE, A. D., KASCHNER, K., SCHULTZE, S. E., GARILAO, C., HO, S. Y., POST, K., HIGHAM, T. F., STOKOWSKA, C., VAN DER ES, H. & EMBLING, C. B. 2013. Ancient DNA reveals that bowhead whale lineages survived Late Pleistocene climate change and habitat shifts. Nature communications, 4, 1677. FOURNIER-LEVEL, A., LACOMBE, T., LE CUNFF, L., BOURSIQUOT, J. & THIS, P. 2010. 
Evolution of the VvMybA gene family, the major determinant of berry colour in cultivated grapevine (Vitis vinifera L.). Heredity, 104, 351-362. FU, Q., MEYER, M., GAO, X., STENZEL, U., BURBANO, H. A., KELSO, J. & PÄÄBO, S. 2013. DNA analysis of an early modern human from Tianyuan Cave, China. Proceedings of the National Academy of Sciences, 110, 2223-2227. GANSAUGE, M.-T. & MEYER, M. 2013. Single-stranded DNA library preparation for the sequencing of ancient or damaged DNA. Nature protocols, 8, 737-748. GANSAUGE, M.-T. & MEYER, M. 2014. Selective enrichment of damaged DNA molecules for ancient genome sequencing. Genome research, 24, 1543-1549. GILBERT, M. T. P., BANDELT, H.-J., HOFREITER, M. & BARNES, I. 2005. Assessing ancient DNA studies. Trends in Ecology & Evolution, 20, 541-544. GRASSI, F., LABRA, M., IMAZIO, S., SPADA, A., SGORBATI, S., SCIENZA, A. & SALA, F. 2003. Evidence of a secondary grapevine domestication centre detected by SSR analysis. Theoretical and Applied Genetics, 107, 1315-1320. GREEN, R. E., KRAUSE, J., BRIGGS, A. W., MARICIC, T., STENZEL, U., KIRCHER, M., PATTERSON, N., LI, H., ZHAI, W. & FRITZ, M. H.-Y. 2010. A draft sequence of the Neandertal genome. science, 328, 710-722. GUGERLI, F., PARDUCCI, L. & PETIT, R. J. 2005. Ancient plant DNA: review and prospects. New Phytologist, 166, 409-418. HAMILTON, J. P. & ROBIN BUELL, C. 2012. Advances in plant genome sequencing. The Plant Journal, 70, 177- 190. HIGUCHI, R., BOWMAN, B., FREIBERGER, M., RYDER, O. A. & WILSON, A. C. 1984. DNA sequences from the quagga, an extinct member of the horse family. HOFREITER, M., POINAR, H. N., SPAULDING, W. G., BAUER, K., MARTIN, P. S., POSSNERT, G. & PÄÄBO, S. 2000. A molecular analysis of ground sloth diet through the last glaciation. Molecular Ecology, 9, 1975- 1984. HOFREITER, M., SERRE, D., POINAR, H. N., KUCH, M. & PÄÄBO, S. 2001. Ancient DNA. Nature Reviews Genetics, 2, 353-359. JAENICKE-DESPRES, V., BUCKLER, E. S., SMITH, B. D., GILBERT, M. T. 
P., COOPER, A., DOEBLEY, J. & PÄÄBO, S. 2003. Early allelic selection in maize as revealed by ancient DNA. Science, 302, 1206-1208. JÓNSSON, H., GINOLHAC, A., SCHUBERT, M., JOHNSON, P. L. & ORLANDO, L. 2013. mapDamage2. 0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics, btt193. KELLER, A., GRAEFEN, A., BALL, M., MATZAS, M., BOISGUERIN, V., MAIXNER, F., LEIDINGER, P., BACKES, C., KHAIRAT, R. & FORSTER, M. 2012. New insights into the Tyrolean Iceman's origin and phenotype as inferred by whole-genome sequencing. Nature communications, 3, 698.

! ! 47! KIRCHER, M. & KELSO, J. 2010. High‐throughput DNA sequencing–concepts and limitations. Bioessays, 32, 524- 536. KNAPP, M. & HOFREITER, M. 2010. Next generation sequencing of ancient DNA: requirements, strategies and perspectives. Genes, 1, 227-243. KOBAYASHI, S., GOTO YAMAMOTO, N. & HIROCHIKA, H. 2005. Association of VvmybA1 gene expression with anthocyanin production in grape (Vitis vinifera) skin-color mutants. Journal of the Japanese Society for Horticultural Science (Japan). KOBAYASHI, S., GOTO-YAMAMOTO, N. & HIROCHIKA, H. 2004. Retrotransposon-induced mutations in grape skin color. Science, 304, 982-982. KOBAYASHI, S., ISHIMARU, M., HIRAOKA, K. & HONDA, C. 2002. Myb-related genes of the Kyoho grape (Vitis labruscana) regulate anthocyanin biosynthesis. Planta, 215, 924-933. KORNELIUSSEN, T. S., ALBRECHTSEN, A. & NIELSEN, R. 2014. ANGSD: analysis of next generation sequencing data. BMC bioinformatics, 15, 356. KWOK, S. A. & HIGUCHI, R. 1989. Avoiding false positives with PCR. Nature, 339, 237-238. LEGRAS, J.-L., MERDINOGLU, D., CORNUET, J. & KARST, F. 2007. Bread, beer and wine: Saccharomyces cerevisiae diversity reflects human history. Molecular ecology, 16, 2091-2102. LEMON, J. 2006. Plotrix: a package in the red light district of R. R-news, 6, 8-12. LI, C., LISTER, D. L., LI, H., XU, Y., CUI, Y., BOWER, M. A., JONES, M. K. & ZHOU, H. 2011. Ancient DNA analysis of desiccated wheat grains excavated from a Bronze Age cemetery in Xinjiang. Journal of Archaeological Science, 38, 115-119. LI, H., HANDSAKER, B., WYSOKER, A., FENNELL, T., RUAN, J., HOMER, N., MARTH, G., ABECASIS, G. & DURBIN, R. 2009. The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078-2079. LIJAVETZKY, D., CABEZAS, J. A., IBÁÑEZ, A., RODRÍGUEZ, V. & MARTÍNEZ-ZAPATER, J. M. 2007. High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology. BMC genomics, 8, 424. 
LIJAVETZKY, D., RUIZ-GARCÍA, L., CABEZAS, J. A., DE ANDRÉS, M. T., BRAVO, G., IBÁÑEZ, A., CARREÑO, J., CABELLO, F., IBÁÑEZ, J. & MARTÍNEZ-ZAPATER, J. M. 2006. Molecular genetics of berry colour variation in table grape. Molecular Genetics and Genomics, 276, 427-435. LINDAHL, T. 1993. Instability and decay of the primary structure of DNA. nature, 362, 709-715. LINDGREEN, S. 2012. AdapterRemoval: easy cleaning of next-generation sequencing reads. BMC research notes, 5, 337. LIU, Z., MOORE, P. H., MA, H., ACKERMAN, C. M., RAGIBA, M., YU, Q., PEARL, H. M., KIM, M. S., CHARLTON, J. W. & STILES, J. I. 2004. A primitive Y chromosome in papaya marks incipient sex chromosome evolution. Nature, 427, 348-352. MALASPINAS, A.-S., TANGE, O., MORENO-MAYAR, J. V., RASMUSSEN, M., DEGIORGIO, M., WANG, Y., VALDIOSERA, C. E., POLITIS, G., WILLERSLEV, E. & NIELSEN, R. 2014. bammds: a tool for assessing the ancestry of low-depth whole-genome data using multidimensional scaling (MDS). Bioinformatics, 30, 2962-2964. MALENICA, N., ŠIMON, S., BESENDORFER, V., MALETIĆ, E., KONTIĆ, J. K. & PEJIĆ, I. 2011. Whole genome amplification and microsatellite genotyping of herbarium DNA revealed the identity of an ancient grapevine cultivar. Naturwissenschaften, 98, 763-772. MANEN, J.-F., BOUBY, L., DALNOKI, O., MARINVAL, P., TURGAY, M. & SCHLUMBAUM, A. 2003. Microsatellites from archaeological Vitis vinifera seeds allow a tentative assignment of the geographical origin of ancient cultivars. Journal of Archaeological Science, 30, 721-729. MARGUERIT, E., BOURY, C., MANICKI, A., DONNART, M., BUTTERLIN, G., NÉMORIN, A., WIEDEMANN- MERDINOGLU, S., MERDINOGLU, D., OLLAT, N. & DECROOCQ, S. 2009. Genetic dissection of sex determinism, inflorescence morphology and downy mildew resistance in grapevine. Theoretical and Applied Genetics, 118, 1261-1278. MARGULIES, M., EGHOLM, M., ALTMAN, W. E., ATTIYA, S., BADER, J. S., BEMBEN, L. A., BERKA, J., BRAVERMAN, M. S., CHEN, Y.-J. & CHEN, Z. 2005. 
Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376-380. MARICIC, T., WHITTEN, M. & PÄÄBO, S. 2010. Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PloS one, 5, e14004-e14004. MCGOVERN, P. E. 2003. Ancient wine: the search for the origins of viniculture, Princeton University Press. MCGOVERN, P. E., GLUSKER, D. L., EXNER, L. J. & VOIGT, M. M. 1996. Neolithic resinated wine. MEYER, M., KIRCHER, M., GANSAUGE, M.-T., LI, H., RACIMO, F., MALLICK, S., SCHRAIBER, J. G., JAY, F., PRÜFER, K. & DE FILIPPO, C. 2012. A high-coverage genome sequence from an archaic Denisovan individual. Science, 338, 222-226.

! ! 48! MILLER, W., DRAUTZ, D. I., RATAN, A., PUSEY, B., QI, J., LESK, A. M., TOMSHO, L. P., PACKARD, M. D., ZHAO, F. & SHER, A. 2008. Sequencing the nuclear genome of the extinct woolly mammoth. Nature, 456, 387-390. MULLINS, M. G., BOUQUET, A. & WILLIAMS, L. E. 1992. Biology of the grapevine, Cambridge University Press. MYLES, S., BOYKO, A. R., OWENS, C. L., BROWN, P. J., GRASSI, F., ARADHYA, M. K., PRINS, B., REYNOLDS, A., CHIA, J.-M. & WARE, D. 2011. Genetic structure and domestication history of the grape. Proceedings of the National Academy of Sciences, 108, 3530-3535. NEGI, S. & OLMO, H. 1971. Conversion and determination of sex in Vitis vinifera L.(sylvestris). Vitis, 9, 265-279. O'DONOGHUE, K., BROWN, T. A., CARTER, J. F. & EVERSHED, R. P. 1994. Detection of nucleotide bases in ancient seeds using gas chromatography/mass spectrometry and gas chromatography/mass spectrometry/mass spectrometry. Rapid Communications in Mass Spectrometry, 8, 503-508. OBERLE, G. 1938. A genetic study of variations in floral morphology and function in cultivated forms of Vitis. NY State Agric. Exp. Sta., Geneva, Tech. Bull, 250. OLALDE, I., ALLENTOFT, M. E., SÁNCHEZ-QUINTO, F., SANTPERE, G., CHIANG, C. W., DEGIORGIO, M., PRADO-MARTINEZ, J., RODRÍGUEZ, J. A., RASMUSSEN, S. & QUILEZ, J. 2014. Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European. Nature, 507, 225-228. ORLANDO, L., BONJEAN, D., BOCHERENS, H., THENOT, A., ARGANT, A., OTTE, M. & HÄNNI, C. 2002. Ancient DNA and the population genetics of cave bears (Ursus spelaeus) through space and time. Molecular Biology and Evolution, 19, 1920-1933. ORLANDO, L., GILBERT, M. T. P. & WILLERSLEV, E. 2015. Reconstructing ancient genomes and epigenomes. Nature Reviews Genetics. ORLANDO, L., GINOLHAC, A., ZHANG, G., FROESE, D., ALBRECHTSEN, A., STILLER, M., SCHUBERT, M., CAPPELLINI, E., PETERSEN, B. & MOLTKE, I. 2013. 
Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature, 499, 74-78. OTTONI, C., FLINK, L. G., EVIN, A., GEÖRG, C., DE CUPERE, B., VAN NEER, W., BARTOSIEWICZ, L., LINDERHOLM, A., BARNETT, R. & PETERS, J. 2012. Pig domestication and human-mediated dispersal in western Eurasia revealed through ancient DNA and geometric morphometrics. Molecular biology and evolution, mss261. PALMER, S. A., MOORE, J. D., CLAPHAM, A. J., ROSE, P. & ALLABY, R. G. 2009. Archaeogenetic evidence of ancient Nubian barley evolution from six to two-row indicates local adaptation. PLoS One, 4, e6301. PERL, A., SAHAR, N., SPIEGEL-ROY, P., GAVISH, S., ELYASI, R., ORR, E. & BAZAK, H. Conventional and biotechnological approaches in breeding seedless table grapes. VII International Symposium on Grapevine Genetics and Breeding 528, 1998. 613-618. PICQ, S., SANTONI, S., LACOMBE, T., LATREILLE, M., WEBER, A., ARDISSON, M., IVORRA, S., MAGHRADZE, D., ARROYO-GARCIA, R. & CHATELET, P. 2014. A small XY chromosomal region explains sex determination in wild dioecious V. vinifera and the reversal to hermaphroditism in domesticated grapevines. BMC plant biology, 14, 229. POINAR, H., KUCH, M., MCDONALD, G., MARTIN, P. & PÄÄBO, S. 2003. Nuclear gene sequences from a late Pleistocene sloth coprolite. Current Biology, 13, 1150-1152. POINAR, H. N., HOFREITER, M., SPAULDING, W. G., MARTIN, P. S., STANKIEWICZ, B. A., BLAND, H., EVERSHED, R. P., POSSNERT, G. & PÄÄBO, S. 1998. Molecular coproscopy: dung and diet of the extinct ground sloth Nothrotheriops shastensis. Science, 281, 402-406. POINAR, H. N., SCHWARZ, C., QI, J., SHAPIRO, B., MACPHEE, R. D., BUIGUES, B., TIKHONOV, A., HUSON, D. H., TOMSHO, L. P. & AUCH, A. 2006. Metagenomics to paleogenomics: large-scale sequencing of mammoth DNA. science, 311, 392-394. PRETORIUS, I. S. 2000. Tailoring wine yeast for the new millennium: novel approaches to the ancient art of winemaking. Yeast, 16, 675-729. 
PRÜFER, K., RACIMO, F., PATTERSON, N., JAY, F., SANKARARAMAN, S., SAWYER, S., HEINZE, A., RENAUD, G., SUDMANT, P. H. & DE FILIPPO, C. 2014. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature, 505, 43-49. PÄÄBO, S. 1985. Molecular cloning of ancient Egyptian mummy DNA. PÄÄBO, S. 1989. Ancient DNA: extraction, characterization, molecular cloning, and enzymatic amplification. Proceedings of the National Academy of Sciences, 86, 1939-1943. PÄÄBO, S. & WILSON, A. C. 1988. Polymerase chain reaction reveals cloning artefacts. Nature, 334, 387. QUATTROCCHIO, F., WING, J. F., LEPPEN, H. T., MOL, J. N. & KOES, R. E. 1993. Regulatory genes controlling anthocyanin pigmentation are functionally conserved among plant species and have distinct sets of target genes. The Plant Cell, 5, 1497-1512.

! ! 49! RAGHAVAN, M., SKOGLUND, P., GRAF, K. E., METSPALU, M., ALBRECHTSEN, A., MOLTKE, I., RASMUSSEN, S., STAFFORD JR, T. W., ORLANDO, L. & METSPALU, E. 2014. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature, 505, 87-91. RASMUSSEN, M., ANZICK, S. L., WATERS, M. R., SKOGLUND, P., DEGIORGIO, M., STAFFORD JR, T. W., RASMUSSEN, S., MOLTKE, I., ALBRECHTSEN, A. & DOYLE, S. M. 2014. The genome of a Late Pleistocene human from a Clovis burial site in western Montana. Nature, 506, 225-229. RASMUSSEN, M., GUO, X., WANG, Y., LOHMUELLER, K. E., RASMUSSEN, S., ALBRECHTSEN, A., SKOTTE, L., LINDGREEN, S., METSPALU, M. & JOMBART, T. 2011. An Aboriginal Australian genome reveals separate human dispersals into Asia. Science, 334, 94-98. RASMUSSEN, M., LI, Y., LINDGREEN, S., PEDERSEN, J. S., ALBRECHTSEN, A., MOLTKE, I., METSPALU, M., METSPALU, E., KIVISILD, T. & GUPTA, R. 2010. Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature, 463, 757-762. REICH, D., GREEN, R. E., KIRCHER, M., KRAUSE, J., PATTERSON, N., DURAND, E. Y., VIOLA, B., BRIGGS, A. W., STENZEL, U. & JOHNSON, P. L. 2010. Genetic history of an archaic hominin group from Denisova Cave in Siberia. Nature, 468, 1053-1060. ROGAEV, E. I., MOLIAKA, Y. K., MALYARCHUK, B. A., KONDRASHOV, F. A., DERENKO, M. V., CHUMAKOV, H. & GRIGORENKO, A. P. 2006. Complete mitochondrial genome and phylogeny of Pleistocene mammoth Mammuthus primigenius. PLoS biology, 4, 403. SCHNABLE, P. S., WARE, D., FULTON, R. S., STEIN, J. C., WEI, F., PASTERNAK, S., LIANG, C., ZHANG, J., FULTON, L. & GRAVES, T. A. 2009. The B73 maize genome: complexity, diversity, and dynamics. science, 326, 1112-1115. SHAPIRO, B., DRUMMOND, A. J., RAMBAUT, A., WILSON, M. C., MATHEUS, P. E., SHER, A. V., PYBUS, O. G., GILBERT, M. T. P., BARNES, I. & BINLADEN, J. 2004. Rise and fall of the Beringian steppe bison. Science, 306, 1561-1565. SHIMAZAKI, M., FUJITA, K., KOBAYASHI, H. & SUZUKI, S. 2011. 
Pink-colored grape berry is the result of short insertion in intron of color regulatory gene. PloS one, 6, e21308-e21308. SKOTTE, L., KORNELIUSSEN, T. S. & ALBRECHTSEN, A. 2013. Estimating individual admixture proportions from next generation sequencing data. Genetics, 195, 693-702. SPIGLER, R., LEWERS, K., MAIN, D. & ASHMAN, T. 2008. Genetic mapping of sex determination in a wild strawberry, Fragaria virginiana, reveals earliest form of sex chromosome. Heredity, 101, 507-517. TEWHEY, R., NAKANO, M., WANG, X., PABÓN-PEÑA, C., NOVAK, B., GIUFFRE, A., LIN, E., HAPPE, S., ROBERTS, D. N. & LEPROUST, E. M. 2009. Enrichment of sequencing targets from the human genome by solution hybridization. Genome Biol, 10, R116. THIS, P., LACOMBE, T., CADLE-DAVIDSON, M. & OWENS, C. L. 2007. Wine grape (Vitis vinifera L.) color associates with allelic variation in the domestication gene VvmybA1. Theoretical and Applied Genetics, 114, 723-730. THIS, P., LACOMBE, T. & THOMAS, M. R. 2006. Historical origins and genetic diversity of wine grapes. TRENDS in Genetics, 22, 511-519. THOMAS, R. H., SCHAFFNER, W., WILSON, A. C. & PÄÄBO, S. 1989. DNA phylogeny of the extinct marsupial wolf. Nature, 340, 465-467. TUMPEY, T. M., BASLER, C. F., AGUILAR, P. V., ZENG, H., SOLÓRZANO, A., SWAYNE, D. E., COX, N. J., KATZ, J. M., TAUBENBERGER, J. K. & PALESE, P. 2005. Characterization of the reconstructed 1918 Spanish influenza pandemic virus. science, 310, 77-80. UCCHESU, M., ORRÙ, M., GRILLO, O., VENORA, G., USAI, A., SERRELI, P. F. & BACCHETTA, G. 2015. Earliest evidence of a primitive cultivar of Vitis vinifera L. during the Bronze Age in Sardinia (Italy). Vegetation History and Archaeobotany, 1-14. VEZZULLI, S., LEONARDELLI, L., MALOSSINI, U., STEFANINI, M., VELASCO, R. & MOSER, C. 2012. Pinot blanc and Pinot gris arose as independent somatic mutations of Pinot noir. Journal of experimental botany, 63, 6359-6369. VYSKOT, B. & HOBZA, R. 2015. The genomics of plant sex chromosomes. 
Plant Science, 236, 126-135. WALES, N., ALLABY, R., WILLERSLEV, E. & GILBERT, M. 2013. Ancient plant DNA. Encyclopedia of quaternary science, 2, 705-715. WALES, N., ANDERSEN, K., CAPPELLINI, E., ÁVILA-ARCOS, M. C. & GILBERT, M. T. P. 2014. Optimization of DNA recovery and amplification from non-carbonized archaeobotanical remains. PloS one, 9. WALKER, A. R., LEE, E., BOGS, J., MCDAVID, D. A., THOMAS, M. R. & ROBINSON, S. P. 2007. White grapes arose through the mutation of two similar and adjacent regulatory genes. The Plant Journal, 49, 772-785.

! ! 50! WALKER, A. R., LEE, E. & ROBINSON, S. P. 2006. Two new grape cultivars, bud sports of Cabernet Sauvignon bearing pale-coloured berries, are the result of deletion of two regulatory genes of the berry colour locus. Plant molecular biology, 62, 623-635. WARINNER, C., RODRIGUES, J. F. M., VYAS, R., TRACHSEL, C., SHVED, N., GROSSMANN, J., RADINI, A., HANCOCK, Y., TITO, R. Y. & FIDDYMENT, S. 2014. Pathogens and host immunity in the ancient human oral cavity. Nature genetics, 46, 336-344. WEBER, D., STEWART, B. S., GARZA, J. C. & LEHMAN, N. 2000. An empirical genetic assessment of the severity of the northern elephant seal population bottleneck. Current Biology, 10, 1287-1290. WILLERSLEV, E., CAPPELLINI, E., BOOMSMA, W., NIELSEN, R., HEBSGAARD, M. B., BRAND, T. B., HOFREITER, M., BUNCE, M., POINAR, H. N. & DAHL-JENSEN, D. 2007. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science, 317, 111-114. WILLERSLEV, E. & COOPER, A. 2005. Review paper. ancient dna. Proceedings of the Royal Society of London B: Biological Sciences, 272, 3-16. WILLERSLEV, E., HANSEN, A. J., BINLADEN, J., BRAND, T. B., GILBERT, M. T. P., SHAPIRO, B., BUNCE, M., WIUF, C., GILICHINSKY, D. A. & COOPER, A. 2003. Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science, 300, 791-795. WOOD, J. R., WILMSHURST, J. M., WAGSTAFF, S. J., WORTHY, T. H., RAWLENCE, N. J. & COOPER, A. 2012. High-resolution coproecology: using coprolites to reconstruct the habits and habitats of New Zealand’s extinct upland moa (Megalapteryx didinus). PloS one, 7, e40025. YAKUSHIJI, H., KOBAYASHI, S., GOTO-YAMAMOTO, N., TAE JEONG, S., SUETA, T., MITANI, N. & AZUMA, A. 2006. A skin color mutation of grapevine, from black-skinned Pinot Noir to white-skinned Pinot Blanc, is caused by deletion of the functional VvmybA1 allele. Bioscience, biotechnology, and biochemistry, 70, 1506-1508. ZEDER, M. A. 2015. Core questions in domestication research. 
Proceedings of the National Academy of Sciences, 112, 3191-3198. ZOHARY, D., HOPF, M. & WEISS, E. 2012. Domestication of Plants in the Old World: The origin and spread of domesticated plants in Southwest Asia, Europe, and the Mediterranean Basin, Oxford University Press on Demand. ZOHARY, D. & SPIEGEL-ROY, P. 1975. Beginnings of fruit growing in the Old World. Science, 187, 319-327.

Supplementary material

Supplementary figures

Figure S1: Library composition for all the samples.

Figure S2: Berry colour depth of coverage in all the samples.

[Figure S1 image not reproduced. Stacked bar charts, one panel per library. x-axis: Experiment (Shotgun, Capture). y-axis: Percent of library (0% to 100%). Legend (Reads): Not mapped; Repetitive grape; Useful grape (clonal); Useful grape (unique); On-target (clonal); On-target (unique). Panels: CAV1107_P25, Collet08_P27, Cougourlude 237, Extraction blank 2, Extraction blank 4, Extraction blank 7, Extraction blank 8, HBG7054_P18, HBG7172_P13, HBG7172_P17, HBG7172_P3, Madel_08_P22, MAG2013_US4008_P1, MAG2013_US4015_P10, MAG2013_US4015_P5, MAG2013_US4015_P6, MAG2013_US4015_P8, MDV14_US12111_P1, MDV14_US12111_P2, MDV14_US12111_P4, MDV14_US12111_P5, MDV14_US12111_P7, MDV14_US12111_P9, MDV14_US13525_P5, MDV14_US13525_P7, MDV14_US14152_9, MDV14_US14152_P4, MDV14_US14152_P7, Montfer_P21, Montfer_P23, Montfer_P25, Roumeg_P14, Roumeg_P9, SAU3019_P13, SAU3019_P14, SAU3019_P2, SAU3019_P4, SAU3019_P8, SAU3019_P9.]

[Figure S2 image not reproduced. For each sample, three line plots of depth of coverage (y-axis: DoC) titled 5'Gret1, 3'Gret1 and Insertion site. x-axis tick labels run over positions 14251300 to 14251500 and 14240900 to 14241050; the x-axis labels are region[15515:15715], region[5094:5294] and region2[5094:5294]. Samples shown on these pages: SAU3019_P8, SAU3019_P9, Tannat_DoC_vvmybA1_white, CAV1107_P25, Collet08_P27, Collet08_P2, Cougourlude_237, HBG7054_P18, HBG7172_P17, HBG7172_P3, MAG2013_US4015_P10, MAG2013_US4015_P5, MAG2013_US4015_P6, MAG2013_US4015_P8, MDV14_US12111_P2, MDV14_US12111_P4, MDV14_US12111_P5, MDV14_US12111_P7, MDV14_US12111_P9, MDV14_US13525_P5, MDV14_US13525_P7, MDV14_US14152_P4, MDV14_US14152_P7, MDV14_US14152_P9, Madel_08_P22, Montfer_P21.]

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 Montfer_P23 5'Gret1 Montfer_P23 3'Gret1 Montfer_P23 Insertion site region[15515:15715] region[5094:5294] region2[5094:5294] 80 50 60 40 60 50 30 DoC DoC DoC 40 40 20 10 30 20 0

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 Montfer_P25 5'Gret1 Montfer_P25 3'Gret1 Montfer_P25 Insertion site region[15515:15715] region[5094:5294] region2[5094:5294] 80 80 100 60 80 60 40 60 DoC DoC DoC 40 40 20 20 20 0

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 Roumeg_P14 5'Gret1 Roumeg_P14 3'Gret1 Roumeg_P14 Insertion site 5 8 8 4 6 6 3 4 DoC DoC DoC 4 2 2 2 1 0 0 0

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 Roumeg_P9 5'Gret1 Roumeg_P9 3'Gret1 Roumeg_P9 Insertion site region[15515:15715] region[5094:5294] region2[5094:5294] 25 20 20 20 15 15 15 DoC DoC DoC 10 10 10 5 5 5 0 0 0

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 SAU3019_P13 5'Gret1 SAU3019_P13 3'Gret1 SAU3019_P13 Insertion site region[15515:15715] region[5094:5294] region2[5094:5294] 100 80 80 80 60 60 60 DoC DoC DoC 40 40 40 20 20 20 0

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 SAU3019_P14 5'Gret1 SAU3019_P14 3'Gret1 SAU3019_P14 Insertion site region[15515:15715] region[5094:5294] region2[5094:5294] 70 70 80 60 50 50 60 40 DoC DoC DoC 30 40 30 20 20 10 0 10

14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 SAU3019_P2 5'Gret1 SAU3019_P2 3'Gret1 SAU3019_P2 Insertion site 30 30 region[15515:15715] region[5094:5294] region2[5094:5294] 40 25 25 30 20 20 15 20 DoC DoC DoC 15 10 10 10 5 5 0 0 14251500 14251450 14251400 14251350 14251300 14241050 14241000 14240950 14240900 14241050 14241000 14240950 14240900 !

Figure S.2: Coverage across the Gret1 insertion point
