THE DISTRIBUTION OF GENOMIC DIVERSITY BETWEEN AND WITHIN CRYPTIC SPECIES OF FIELD AND SURFCLAMS

A Dissertation Presented to the Faculty of the Graduate School of Cornell University

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Nicholas Keech Fletcher

August, 2020

© 2020 Nicholas Keech Fletcher

PATTERNS OF GENOMIC DIVERSITY WITHIN AND BETWEEN CRYPTIC SPECIES OF FIELD VOLES AND SURFCLAMS

Nicholas Keech Fletcher, Ph. D. Cornell University 2020

Understanding how genetic diversity is distributed within and between species is a fundamental goal of evolutionary biology and conservation. Cryptic species are an especially interesting challenge to study in this context as they lack the obvious morphological differentiation that provide clues to how genetic diversity may be partitioned. In this dissertation, I analyze genetic diversity in two unique cryptic species groups, voles in the genus and surfclams in the genus Spisula, at multiple levels of sampling both genomically and geographically.

First, I analyze genome-wide single nucleotide polymorphism (SNP) data paired with environmental nice modeling (ENM) to understand how isolation glacial cycling has driven divergence in three cryptic European species formerly classified as Microtus agrestis. I find high levels of divergence distributed across the whole genome, with little evidence of ongoing gene flow well as divergence in environmental niches between species. There is a high level of predicted overlap between the niches of M. rozianus and M. lavernedii that has not resulted in any evidence in gene flow between these groups. These cryptic vole species have evolved high divergence across the genome in isolation during glacial cycling. Secondly, I use a dataset including microsatellite markers, mtDNA, and nuclear intron loci to examine patterns of genetic diversity in cryptic subspecies of the

Atlantic surfclam, Spisula solidissima. Recently, isolated populations of the lesser-

known southern Atlantic surfclam, S. s. similis, were found far north in the range of S. s. solidissima, a commercially important northern subspecies. To test a hypothesis of recency and isolation in the north, this newly found population of S. s. similis was sampled to provide the first population genetic comparison to a population in Georgia, as well as contrasts with S. s. solidissima. These recently discovered northern populations of S. s. similis showed a slight reduction in genetic diversity but high levels of connectivity with southern populations. Genetic differentiation was weak and only significant for microsatellite data. Based on coalescent modeling, I conclude that the northern S. s. similis populations most likely represent a post-glacial expansion, and it is hypothesized that there are unidentified stepping-stone populations connecting populations throughout its range.

Finally, I explore a characteristic pattern of genomic variation within many species in Britain, dubbed the Celtic Fringe, which is generated by the two waves of colonization after the Last Glacial Maximum. Here I analyze low-coverage whole genome sequencing and full mitochondrial genomes from 11 populations and 124 individuals of the short-tailed field vole (Microtus agrestis), to explore patterns of genome-wide introgression and divergence across the Celtic Fringe. I find high genome-wide divergence between our northern and southern populations, with a strong signature of isolation by distance north to south. There is also a signature of mito-nuclear divergence in my genetic data, and the southern mitochondrial haplotype has not introgressed as far north as the southern nuclear DNA. I find no signature of selection in the data, and my results point to genetic drift the major evolutionary process driving divergence genome-wide between our populations. This study offers insights into the evolutionary and demographic processes that drive Celtic Fringe patterns in Britain and have led to contemporary patterns of genetic diversity after climatic cycling.

BIOGRAPHICAL SKETCH

Nicholas (Nick) Keech Halvard Fletcher grew up in southern , where he went to Corona del Mar High School. His love for the natural world was sparked young ad he e hi eal da cachig ble bellie i Califia ad kama i Sede dig he mme. Nick gadaed fm Uiei f

California Berkeley with a B.A. in Integrative Biology in 2009. While at Berkeley, he worked with Dr. Jim McGuire, studying the evolutionary genetics of Indonesian gliding lizards in the genus Draco. Nick spent a summer as an undergraduate field tech in western Sumatra and the Mentawai Islands collecting herpetofauna for the Museum of Vertebrate Zoology. He also studied coral rubble-inhabiting crustacean cmmiie i Mea, Fech Pleia dig a emester-long field course in

2009. After Berkeley, Nick worked as a laboratory technician in Dr. Alicia McDgh lab a he Keck Schl f Medicine at the University of Southern California in . In the McDonough lab he helped investigate salt and potassium homeostasis, kidney disease and hypertension in rat models. Nick joined the Searle lab in the Department of Ecology and Evolutionary Biology at Cornell in 2012 to study evolutionary genomics in small and other organisms. In the last year of his PhD, Nick finished remotely while working as a part-time instructor of

Introductory Biology at Montgomery College, Takoma Park/ Silver Spring. He will begin a position as a professional-track instructor in the Department of Biology at the

University of Maryland shortly after his PhD.

v

This dissertation is dedicated to my morfar Dr. Bo Liander (1932-2020), whose love of the natural world and life-long learning inspired me to pursue this PhD.

vi

ACKNOWLEDGMENTS Firstly, I would like to thank Jeremy Searle for all of his guidance and support as my advisor during my PhD. Without his unwavering generosity and kindness, this dissertation would not be possible. I would also like to thank the rest of my committee: Matt Hare, Jerry Herman and Nina Therkildsen. They provided me with all kinds of support and were just all-around great people to interact with. Special thanks to Rick Harrison, he was a huge influence on my scientific thinking before he passed away and I am still inspired by him. I would also like to acknowledge the other professors in the department who helped me with other aspects of teaching or research: Irby Lovette, Betty McGuire, Michelle Smith, Kelly Zamudio, Harry Greene, Willy Bemis, Monica Geber, Greg Graffin, Jed Sparks, and Brian Powell. I would also like to thank my coauthors on my chapters, who were instrumental in helping my research ideas come together: Pelayo Acevedo, Joana Castro, Paolo C. Alves, and Nicolas Lou. Thank you to the revolving cast of Searle Lab members/visitors past and present who made it a great and supportive place to do research: Ben Johnson, Jacob Tyrell, Henry Kunerth, Jon Hughes, Trevor Sless, Frída Jóhannesdóttir, Soraia Barbosa, Rodrigo Vega, Tom White, Bret Pasch, Alex Gomez- Lepiz, Svetlana Pavlova, Mumtaz Baig, Joanna Stojak, Margherita Collini, Arden

Hulme-Beaman, Marja Heikkinen, and Alexandra Trinks. The Hare lab members were m ecd famil hak Ham Bchad-Weir, Cat Sun, Alyssa Wetterau,

Ellen George, Laura Eierman, and Mallory Choudoir. I have been lucky to work with a great group of undergraduate research and teaching assistants so thank you to Nikrumah Frazer, Tamar Law, April Townson, Matthew Sison, Beatriz Olivia, Brianna Mims, Samuel Coleman, Dylan Tarbox, Cassandra Ramirez, Allyson Evans, Bia Magie, Fak DeAi, Camille Magiaai, Bia OTle, ad Ma

Margaret Ferraro. EEB has fantastic staff that allows the department to function and

vii

are great people, thank you Carol Damm, Jennifer Robinson, Patty Jordan, Luanne

Kenjerska, John Howell, DeeDee Albertsman, Brian Mlodzinski, Alita Howard, Christine Bogdanowicz, Steve Bogdanowicz, Janeen Orr and Ashley Pendell. The EEB community (and beyond!) of grad students and post docs at Cornell not only provided me with invaluable intellectual and professional feedback, they became some of my best friends. The community is the greatest strength of the department, and I am truly indebted to everyone, but special thanks to (in no order): Nick Mason, Scott Taylor, Laura Porturas, Ann Bybee-Finley, Sahas Barve, Dave

Toews, Cait McDonald, Leo Campagna, Roxanne Razavi, Mariah Meek, Cinnamon Mittan, Petra Deane, Michelle Wong, Stepfanie Aguillon, Maria Akopyan, Kara Andres, Amelia Demery, Emily Funk, Olivia Graham, Natalie Hofmeister, Lizzie

Lombardi, Anyi Mazo-Vargas, Tram Nguyen, Jasmin Peters, Sue Pierre, Bhavya Sridhar, Young Ha Suh, Katie Sirrianni, Karin Van der Burg, Rachel Wilkins, MG Weber, Gideon Bradburd, Jennie Miller, Arne Jacobs, Lina Arcila Hernandez, Nati Garcia, James Lewis, Will Wetzel, Geoff Giller, Daniel Hooper, Liz Clayton-Fuller,

Collin Edwards, Suzi Claflin, Yasir Ahmed-Braimah, and Keeley MacNeill. I a hak m e famil: Ti, Blai, ad Picilla Tac. The hae been so welcoming and supportive of me in the last few years of my PhD.

I al a hak m ld famil. Aa ad Kal, ae he be iblig I could ever ask for and have tolerated and even encouraged all my shenanigans our whole lives. Love you so much. Mom and Dad, you are the most supportive, caring and generous people I know, I hit the jackpot with you two and love you very much. Finally, Allison, you were the best thing to happen to me during my PhD. I ld hae ee made i ih , ad I ca ai cie he adee ih you (and Dana)! Love you always and forever.

viii

TABLE OF CONTENTS

BIOGRAPHICAL SKETCH...... v

DEDICATION...... vi

ACKNOWLEDGMENTS...... vii-viii

LIST OF FIGURES...... x

LIST OF TABLES...... xi

CHAPTER 1: Glacial cycles drive rapid divergence of cryptic field vole species .....1

CHAPTER 2: Population history and genetic structure in the western Atlantic surfclam (Spisula solidissima sp.)...... 44

CHAPTER 3: Genome-wide analysis of the Celtic Fringe pattern of genetic diversity in British field voles...... 79

APPENDICES...... 123

ix

LIST OF FIGURES CHAPTER 1...... 1 FIGURE 1.1: Map of sampling localities of field voles and Procrustes transformed genetic and geographic coordinates...... 8 FIGURE 1.2: STRUCTURE plot for three cryptic field vole species...... 17 FIGURE 1.3: PCA summarizing 35,434 SNP loci for three cryptic species....18 FIGURE 1.4: Individuals used for parameterizing climatic niche models and distribution of cryptic species of the field vole...... 19 FIGURE 1.5: Climatic niche modeling of field voles during the LGM...... 20 FIGURE 1.6: Predicted overlap of climatic niches between species pairs...... 22 CHAPTER 2...... 44 FIGURE 2.1: Map of sampling localities of Spisula solidissima sp...... 50 FIGURE 2.2: STRUCTURE analysis of microsatellites for 5 populations of Spisula solidissima sp...... 59 FIGURE 2.3: STRUCTURE analysis of microsatellites for 3 populations of Spisula solidissima similis...... 59 CHAPTER 3 ...... 79 FIGURE 3.1: The Celtic Fringe and sampling of voles in Britain...... 84 FIGURE 3.2: PCA summarizing 5,890,000 SNPs across 124 individuals...... 94 FIGURE 3.3: PCAngsd admixture analysis for K= 2 7...... 95 FIGURE 3.4: NGSadmix analysis for K= 2...... 95 FIGURE 3.5: PCAngsd selection scan...... 95 FIGURE 3.6: Pairwise FST for highest FST populations...... 97 FIGURE 3.7: Mantel test for isolation by distance using SNP data...... 98 FIGURE 3.8: PCoA plot for 5,890,000 SNPs across 11 populations...... 98 FIGURE 3.9: EEMS plot of migration across 11 populations...... 99 FIGURE 3.10: STRUCTURE plot for K=2 for mtDNA...... 101 FIGURE 3.11: Haplotype network for mtDNA data...... 101 FIGURE 3.12: Mantel test for isolation by distance using mtDNA data...... 103 APPENDIX: SUPPLEMENTARY MATERIAL...... 123 FIGURE S1.1: Distribution of FST values for 35,434 SNPs...... 123 FIGURE S1.2: Calibration plots for climatic niche models...... 123 FIGURE S1.3: Comparison of 4 climatic niche models for present-day field voles...... 124 FIGURE S1.4: Calibration plots between predicted and observed occurrence of field vole cryptic species...... 124 FIGURE S1.5: Current climatic niche models for three field vole species....125 FIGURE S1.6: Climatic niche models for cryptic species in 2080...... 125 FIGURE S1.7: PCA summarizing SNPs within the short tailed field vole....127 FIGURE S2.1: Microsatellite allele frequency distributions applied to both S. s. solidissima and S. s. similis...... 135-137

x

LIST OF TABLES CHAPTER 1...... 1 TABLE 1.1: Average individual inbreeding coefficient and weighted FST...... 16 TABLE 1.2: Statistical validation of climatic niche models...... 21 CHAPTER 2...... 44 TABLE 2.1: Summary of Spisula solidissima sp. population sampling ...... 51 TABLE 2.2: Summary statistics for genetic diversity...... 57-58 TABLE 2.3: Pairwise FST estimates for mtDNA and nuclear intron loci...... 60 TABLE 2.4: Pairwise FST estimates for microsatellite loci...... 60 CHAPTER 3 ...... 79 TABLE 3.1: Pairwise FST values for all population pairs...... 97 TABLE 3.2: Number of base substitutions across mtGenome averaging over all sequence pairs between field vole populations...... 102 APPENDIX: SUPPLEMENTARY MATERIAL...... 123 TABLE S1.1: Field vole individuals and sampling localities...... 127-129 TABLE S1.2: Statistical comparison of climatic niche models...... 129 TABLE S1.3: Statistical discrimination of climatic niche models between cryptic species pairs...... 129 TABLE S2.1: Microsatellite amplification success across subspecies...... 134 TABLE S3.1: Field vole individuals and sampling localities...... 138-141

xi CHAPTER 1

GLACIAL CYCLES DRIVE RAPID DIVERGENCE OF CRYPTIC FIELD VOLE SPECIES Abstract

Understanding the factors that contribute to the generation of reproductively isolated forms is a fundamental goal of evolutionary biology. Cryptic species are an especially interesting challenge to study in this context since they lack obvious morphological differentiation that provides clues to adaptive divergence that may drive reproductive isolation. Geographical isolation in refugial areas during glacial cycling is known to be important for generating genetically divergent populations, but its role in the origination of new species is still not fully understood and likely to be situation dependent. We combine analysis of 35,434 single nucleotide polymorphisms (SNPs) with environmental niche modelling (ENM) to investigate genomic and ecological divergence in three cryptic species formerly classified as the field vole (Microtus agrestis). The SNPs demonstrate high genomic divergence (pairwise FST values of

0.45 - 0.72) and little evidence of gene flow among the three field vole cryptic species, pointing to genetic drift as an important mechanism for divergence in the group. The

ENM reveals three areas as potential glacial refugia for the cryptic species and differing climatic niches, although with spatial overlap between species pairs. This evidence underscores the role that glacial cycling has in promoting genetic differentiation and reproductive isolation by subdivision into disjunct distributions at glacial maxima in areas relatively close to ice sheets. Future investigation of the intrinsic barriers to gene flow between the field vole cryptic species is required to fully

1 assess the mechanisms that contribute to reproductive isolation. In addition, the

Portuguese field vole (M. rozianus) shows a high inbreeding coefficient and a restricted climatic niche, and warrants investigation into its conservation status.

Introduction

Genetic variation in contemporary taxa is often heterogeneous across space, with a patchwork of genetically distinct lineages that have been shaped by various historical and ecological processes (Moritz 1996, Avise 2000). These processes not only determine the geographic distribution of the entities that make up a species but also mould their adaptive potential and evolutionary trajectories (Waples 1991, Moritz

1996, Funk et al 2012). Genetically divergent lineages are the precursors of new reproductively isolated forms and can be considered the true origin of species

(Broughton and Harrison 2003, Feder et al 2012).

In the case of cryptic species, i.e. two or more similar species that were previously conspecific and essentially indistinguishable in gross morphology, the assessment of genetic differentiation is the foundation of their discovery (Sáez and

Lozano 2005, Bickford et al 2007). Cryptic species complexes form a critical, yet understudied, component of biodiversity and while (by definition) they are morphologically similar, life history, ecological or behavioural differences may impact their role in the ecosystem and their susceptibility to extirpation or need for conservation (Geller 1999, Ravaoarimanana et al 2004, Bickford et al 2007).

Behavioural differences may also allow members of cryptic species to distinguish their own kind through non-visual communication such as electrophysiological, auditory or

2 chemical cues (e.g. Feulner et al 2006, Narins 1983, Carde et al 1978). The genetic distinctiveness that allows cryptic species to be identified may be substantial, driven by the same processes of isolation and accmlation of genetic differences as in non crptic species complees (Sanders et al 2004, Paupério et al 2012), with implications for their ecological and evolutionary potential (Geller 1999, Ravaoarimanana et al

2004).

Genetic divergence between contemporary cryptic species may have been driven by geographic isolation during Pleistocene glacial maxima (e.g. Rand 1948,

Haffer 1969, Lovette 2005, Stewart et al 2010). Europe offers a particular opportunity to study cryptic speciation due to close proximity of ice sheets at glacial maxima, and geographic complexity of the region that resulted in multiple independent glacial refugia and subsequent northward expansions during interglacials (Hewitt 2000,

Stewart et al 2010). Populations in different refugia would have experienced different selective pressures during isolation that, along with genetic drift, would have promoted genetic divergence (Stewart et al 2010). The amount of genomic differentiation between separated refugial populations may reflect a variety of evolutionary processes, including but not limited to: local adaptation and divergent selection during isolation, the amount of gene flow between the refugia, and the extent of historic reductions in population size (Funk et al 2012, Kohn et al 2006, Thomas and Klaper 2004). Genetic drift is especially likely to have been important in refugia where population sizes were small enough to overwhelm the forces of natural selection (Stewart et al 2010). These multiple microevolutionary processes created the

3 opportunity for accelerated speciation during the glacial cycles (Weir and Schluter

2004).

The integration of genomic analyses and environmental niche modelling

(ENM) may allow a powerful investigation of cryptic speciation during the

Pleistocene. In the absence of direct observations, genomic analyses can serve as essential proxies for assessing both historic differentiation and current reproductive isolation between forms (Thomas and Klaper 2004, Kohn et al 2006, Bickford et al

2007). Additionally, ENM provides a valuable assessment of the ecogeographical gradients that limit a species distribtion, proiding insight into the aggregate environmental and geographical spaces of the species that may be important in distinguishing characteristics between cryptic species. Moreover, ENM can provide the information needed to generate and test hypotheses concerning the structure of genetic diversity that can be relevant to evaluate hypotheses of glacial refuges (e.g.

Richards et al 2007).

The system of cryptic speciation that we analyse here is within the vole genus

Microtus. This genus includes more than 65 species that originated roughly 2 million years ago (based on a robust fossil record), indicating that Microtus has an exceptionally high speciation rate, ~60-100 times higher than typical vertebrates

(Chaline et al 1999, Triant and DeWoody 2006). This speciation rate is matched in only a few other vertebrate groups, perhaps most notably the African cichlids

(Brawand et al 2014). Unlike cichlids, whose rapid radiation has been attributed to selection at multiple regions of the genome (Brawand et al 2014), the mechanisms underlying the high speciation rate in Microtus are poorly known. The genus has a

4 Holarctic distribution and many species occur in areas abutting the maximum glacial extent (Jaarola et al 2004). Therefore, the Microtus radiation offers an intriguing system to study complementary and potentially different mechanisms for rapid speciation compared to other classic vertebrate models of speciation.

In particular, the traditional Erasian field vole Microtus agrestis (Wilson and Reeder 2005) includes three major evolutionary lineages that split very recently, are morphologically and karyotypically similar, yet show up to 6% divergence at cytochrome b and reciprocal monophyly at multiple nuclear loci (Jaarola and Searle

2004, Giménez et al 2012, Paupério et al 2012). Following Paupério et al (2012) and

Krtfek (2017) these three divergent lineages will be considered cryptic species: the newly named Portuguese field vole (M. rozianus) and Mediterranean field vole (M. lavernedii), with the most widespread form, the short-tailed field vole, retaining M. agrestis as a Linnean name. According to Paupério et al (2012), the Portuguese lineage became separated from the combined Mediterranean and short-tailed field vole lineages around ~70,000 years ago, at the maximum glacial extent in northern Iberia, while the latter two lineages subsequently originated during the Last Glacial

Maximum (LGM) in Europe ~25,000 years ago. The Portuguese and Mediterranean cryptic species have relatively restricted distributions within the Iberian Peninsula and

Southern Europe, while the short-tailed field vole is present in much of Eurasia, from

Britain and France in the west to central Siberia in the east. Within the short-tailed cryptic species, six genetic lineages split during the Younger Dryas ~12,000 years ago and subsequently expanded to fill this range (Herman et al 2014). The dates for these splits are based on robust mtDNA molecular clock data, using calibrations based on

5 precise dating of ice sheet retreat and sea level rise that limited the colonization of

British Isles (Herman and Searle 2011). Therefore, the field vole cryptic species complex includes a hierarchy of differentiation caused by lineage splits associated with different climatic events and locations during the Late Pleistocene, making it an excellent system to understand the basis of rapid speciation and the role of recent glacial cycles in population divergence and speciation.

In this study we investigated the rapid divergence of the field vole cryptic species by using genotyping-by-sequencing (GBS; Elshire et al 2011) to sequence anonymous single nucleotide polymorphisms (SNPs) across the genome of 83 individuals. Our sampling includes individuals of all three field vole cryptic species from western Europe (Fig 1.1, Supplementary Table 1.1). We aim to characterise the level of genomic divergence that has led to the formation of cryptic species within the last glacial cycle, presumably driven by geographic isolation in glacial refugia. In order to assess the putative climatic basis of differentiation, we performed ENM to estimate the climate favourability for the field vole at the Last Glacial Maximum

(LGM) and, finally, we studied the biogeographic relationships among its cryptic species at present. Using these analyses, we characterised broad-scale genomic and ecological differentiation among the three field vole cryptic species and confirm the high levels of both genomic divergence and differences in their climatic envelope.

These analyses contribute to our understanding of how isolation during glacial cycling can cause rapid, yet high genome-wide divergence between populations that result in new (cryptic) species.

6 Methods

DNA extraction and genotyping-by-sequencing (GBS)

Genomic DNA was extracted from ethanol-preserved vole tissues (ears or digits) from

83 specimens, collected between 1995 and 2009 in western Europe (Fig 1.1, Table

S1.1), using the DNeasy Blood and Tissue kit (Qiagen, Valencia, CA) following standard protocols. GBS on extracted DNA was conducted by the Cornell Institute for

Genomic Diversity (Elshire et al 2011). The enzyme PstI (CTGCAG) was used for digestion, and the fragmented DNA was then ligated to a barcoded adaptor and a common adaptor with the appropriate sticky ends. Each individual was given a unique barcode combination, and after ligation all individuals were pooled into a single

Eppendorf tube. PCR was then run on the libraries, using primers that matched the barcoded and common adaptors, to amplify appropriately sized sequence fragments and to add sequencing primers to the libraries. Libraries were cleaned using a

Qiaquick PCR Purification Kit (Qiagen, Valencia, CA). Libraries were sequenced using single-end 100 bp reads on two separate lanes of an Illumina HiSeq 2000 at the

Cornell University Life Sciences Core Laboratories Center.

7

Figure 1.1: Map of western European sampling localities for the three species included in SNP analyses on the left. Portuguese (yellow), Mediterranean (red) and short-tailed (blue) field voles. Procrustes transformed genetic and geographic coordinates on the right.

SNP genotyping and filtering

Raw FASTQ files from the Illumina run were converted to individual SNP genotypes using the TASSEL GBS pipeline, as part of the TASSEL 5.0 software (Bradbury et al

2007). We implemented the standard TASSEL pipeline with the following parameters.

First we found all reads with barcodes that match the index file and trimmed them to

64 bp to create tags. Only tags with a minimum number of 5 reads were retained and subsequently merged into a master tag list for each individual. Only tags with one SNP per fragment were included in this analysis. All reads were aligned to the Microtus ochrogaster reference genome (McGraw et al 2011) using BWA (Li and Durbin

2009). The error rate for SNP calling was set to 0.03, with a genotypic mismatch rate set to 0.1. To minimise sequencing error, triallelic minor alleles were excluded, as

8 well as SNPs with a minor allele frequency < 0.05. Only SNPs that were present in >

60 of 83 individuals (~72%), and therefore shared across all three cryptic species, were included. It should be noted that changing this missing data threshold to 80%,

90%, and 100% did not change the pattern of our results (data not shown). Individuals with a proportion of missing SNP data > 0.25% were excluded and the average proportion of missing SNP data per individual was 0.066% across all individuals after filtering. Data were converted to .vcf files for subsequent analyses. To filter out SNPs from paralogous loci, SNPs were excluded if they showed significant (P < 0.05) excess heterozygosity and deviation from Hardy-Weinberg equilibrium as calculated using vcftools (Danecek et al 2011).

Genetic diversity patterns

Weir and Cockerhams FST and the inbreeding coefficient F were calculated using vcftools (Danecek et al 2011). To visualise genetic clusters, principal component analysis (PCA) was performed with the TASSEL 5.0 software using standard parameters (Elshire et al 2011). File conversion for subsequent analyses was done using the software PGDspider (Lischer and Excoffier 2012). To examine patterns of genetic differentiation and admixture, we used STRUCTURE version 2.3 with a

50,000 cycle run burn-in period followed by 50,000 cycles using an admixture model and correlated allele frequencies among groups (Pritchard et al 2000). This was repeated for K = 1 5. STRUCTURE HARVESTER (Earl et al 2012) was used to determine the K with highest log-likelihood. Data outputs were visualised using custom R scripts included on our Data Dryad submission. (R Development Core Team

9 2013). To investigate if genome-wide differentiation patterns are driven by geographic sampling of the individuals, we compared genomic PCA distributions to geographic distributions using a Procrustes transformation technique (Wang et al 2010). The first two PCA eigenvectors and the latitude and longitude of the individual sample locations were superimposed, with the PC axes rotated to maximise similarity to the geographic distribution of sampled locations. The significance of the association was evaluated based on 10,000 permutations. Procrustes transformations and maps were implemented in R using the MCMCpack and vegan packages (R Development

Core Team 2013).

Environmental niche modelling

Environmental niche modelling can provide information needed to understand the past structure of genetic diversity (e.g. Richards et al 2007), supplying complementary information to that obtained from phylogeography (e.g. Diniz-Filho et al 2013). In this context, we used ENM to: i) parameterise a model to be hindcasted for the LGM to make inferences about glacial refugia for the species, and ii) parameterise models for each cryptic species of field vole to determine their biogeographical relationships. i) Glacial refugia

The starting point for these analyses were distribution data from the Atlas of European

Mammals (Mitchell-Jones et al 1999) where the field vole is reported in 1439 UTM 50 km x 50 km (out of the 2553 squares in western Europe). Analyses were restricted to western Europe in order to exclude the most incomplete sampling areas identified by the authors of the Atlas (A.J. Mitchell-Jones, pers. comm.). Flojgaard et al (2009)

10 studied the main environmental gradients constraining the species distribution range and showed that a model including water balance and growing degree-days provided the best predictive ability for the occurrence of the field vole (AUC = 0.802), that was closely followed by a model based on annual mean temperature and annual precipitation (AUC = 0.766). However, in revisiting their data for western Europe, the two models are virtually equivalent (only when considering Siberia did the models differ). Therefore, given the interpretability of the predictor variables and the restriction to western Europe of our approach, we feel justified in the present study to parameterise a model including annual mean temperature (BIO1) and annual precipitation (BIO12) as climatic predictors, obtained from the Worldclim project

(http://www.worldclim.org/bioclim; Hijmans et al 2005). Accordingly, four models using the selected predictors were compared (quadratic terms were estimated by centring the variables on the mean, and they were included in order to account for non-linear responses of the species to these climatic gradients): model.1: ~ BIO1+BIO12 model.2: ~ BIO1+BIO12+BIO12 model.3: ~ BIO1+BIO12+BIO122 model.4: ~ BIO1+ BIO12+BIO12+BIO122

For ENM, the geographic area of analysis was first delimited using a Trend

Surface Analysis (TSA; see details in Acevedo et al 2012). Then, the dataset with the localities selected after TSA was split and each model was parameterised on the training dataset (80%) and its predictive performance was evaluated on the validation dataset (20%). For each model, logistic regression was used to relate the predictors

11 with species distribution range since it has proved to produce consistent results when models are hindcasted to past climatic scenarios (Acevedo et al 2015). Their performance was assessed on the validation dataset both in terms of discrimination

(AUC) and reliability (Hosmer and Lemeshow test, and calibration plots), the latter being more informative of their predictive capacity (see Jiménez-Valverde et al 2013).

Models were compared in terms of predictive performance and the best model was selected. The selected model was then transferred by projecting the model on bioclimatic variables for the LGM (Braconnot et al 2007). For this purpose, we considered the CNRM-CM5 Global Circulation Model for models transference.

Previously, as the models are not able to accurately predict beyond the range of values of the predictors used for training (Werkowska et al 2017), a multivariate environmental similarity surface (MESS) analysis was developed (Elith et al 2010).

Therefore, the model was not projected to the past in those squares outside the climatic range represented in the training dataset. ii) Biogeographical relationships

To determine the biogeographical relationships between cryptic species, we constructed a dataset including the distributions of the cryptic species based on 377 voles previously assessed using diagnostic mtDNA and nuclear markers sampled throughout the range [Portuguese (P) = 67 localities, Mediterranean (M) = 84 and short-tailed (St) = 226, data from: Jaarola and Searle 2002, Jaarola and Searle 2004,

Hellborg et al 2005, Herman and Searle 2011, Paupério et al 2012, Herman et al 2014,

J. Paupério unpublished data]. The first stage was to model, at locality level, pairs of adjacent cryptic species (i.e. P vs M and M vs St), using bioclimatic variables as

12 predictors, in order to obtain models able to discriminate between each pair of species.

The probabilities of species presence were then projected to the entire species distribution range and used to assign a particular cryptic species to each 50 km x 50 km UTM square with presence of the field vole (Mitchell-Jones et al 1999); for this purpose, the threshold that minimises the difference between sensitivity and specificity was used (Liu et al 2013).

In the next stage, the distribution of each cryptic species in western Europe at

50 x 50 km UTM square resolution was modelled. This procedure allows us to include in the ENMs a more complete range of species distribution avoiding the sampling effort bias that would be included when directly modelling raw data at sampling locality scale. For this purpose, we used the presence/absence data predicted in the previews stage. Modelling each cryptic species independently can result in an incomplete environmental response of the species to climatic conditions (e.g. Sanchez-

Fernandez et al 2011). Thus, a new model for each cryptic species was parameterised including just the predictors selected in the previous model including the overall species distribution plus the spatial factor (longitude and latitude) to account for the current spatial range in the predictions. Models for each cryptic species were developed using logistic regressions. Finally, to establish direct comparisons between ranges of cryptic species distribution, the logistic probabilities were included in the favourability function described by Real et al (2006). The favourability function is a valuable tool to study biogeographical relationships between models whatever the sample prevalence in the calibration datasets (e.g. see Acevedo and Real 2012). The biogeographical relationships between two cryptic species were assessed from the

13 favourabilities and using the fuzzy overlap index (FOvI; see details Acevedo et al

2010), i.e. the ratio between the degree to which the study area is favourable to the two studied cryptic species simultaneously and the degree to which it is favourable for either species (Dubois and Prade 1980; Kunchenva 2001). This index varies from 0

(no overlap in favourability) to 1 (complete overlap in favourability). The FOvI can be decomposed into absolute local overlap values (FOvI-L) that represent the contribution of each locality (UTM square) to the FOvI. Thus, the FOvI-L shows the spatial location of the areas where spatial overlap between species is expected to occur

(Acevedo et al 2010). According to these authors, three thresholds in FOvI-L can be defined: FOvI-L < 0.2 autoecological segregation, 0.2 > FOvI-L > 0.8 competitive exclusion, and FOvI-L > 0.8 coexistence. Absolute local overlap values indicate the minimum environmental favourability that a given locality achieves for the two species. Low values represent localities in which favourability is low at least for one of the species and therefore ecological separation is expected. Medium values are obtained for those localities in which a moderate favourability is obtained for both species; if one of the species achieves higher favourability values than the other in this overlap range competitive exclusion is expected. Finally, high overlap values represent localities highly favourable for both species and therefore coexistence is predicted. These analyses of biogeographical relationship between interacting species were carried out with the favourability values derived for ENM reported in Figure

S1.5.

14 Results

Sequence data quality control and filtering

We sequenced a total of 198,371,000 reads across 83 individuals. Of these,

166,336,045 reads contained unique barcodes and cut sites, and contained no Ns.

These reads corresponded to 692,765 tags across all individuals that met the minimum

5x threshold coverage. Of these tags, a total of 338,732 could be aligned to unique positions in the M. ochrogaster genome. After filtering for missing data, minor allele frequency and excess heterozygosity, there were 35,434 loci, with an average of

6.55% missing data across all sites and individuals.

Genetic diversity patterns

Global FST among the three cryptic field vole species, using the 35,434 loci, was 0.453 and the frequency distribution of FST across sites is shown in Fig S1.1. Weir and

Cockerhams eighted pairise FST estimates between cryptic species pairs and the average inbreeding coefficient (F) are summarised in Table 1.1. Pairwise FST was highest between the Portuguese and other cryptic species of field vole (vs

Mediterranean = 0.72; vs short-tailed = 0.69). There was a somewhat lower pairwise

FST = 0.45 between the Mediterranean and short-tailed cryptic species. The average inbreeding coefficient was distinctly higher for the Portuguese cryptic species

(F=0.75) than the others (Table 1.1).

15

Table 1.1: Average individual inbreeding coefficient (F) and Weir and Cockerhams eighted pairwise FST for the Portuguese, Mediterranean, and short-tailed cryptic species of the field vole

Portuguese Mediterranean Short-tailed F StDev FST FST FST

Portuguese 0.75 0.033 0.00

Mediterranean 0.63 0.068 0.72 0.00

Short-tailed 0.60 0.090 0.69 0.45 0.00

Our STRUCTURE analysis of 35,434 loci supported K = 3, as Structure

Harvester showed highest K log likelihood change from K = 2 to K = 3 as per the

Evanno et al (2005) method. The most probable K reflected the distribution of our three predicted cryptic species. The STRUCTURE clusters are composed of populations with extremely low admixture between the short-tailed and Mediterranean cryptic species of field vole, and no admixed individuals with both Mediterranean and

Portuguese cryptic species ancestry (Fig 1.2). These results are supported by PCA analysis; clear separation of all three cryptic species, with tight clustering of individuals within each species (Fig 1.3). PC 1 explained 39.5% of the variation among individuals and the PC1 axis clearly separated the Portuguese cryptic species from the Mediterranean and short-tailed. PC2 explained 15.7% of the variation among individuals and the Mediterranean cryptic species was distinctive along this axis. The rotated genetic coordinates shown graphically in the Procrustes analyses revealed that

16 the individual cryptic species cluster more tightly by refugial origin than would be predicted by geography alone (Fig 1.1B).

Figure 1.2: STRUCTURE plot showing K = 3 for short-tailed (blue), Mediterranean (red) and Portuguese (yellow) field voles.

17

0

4

)

%

7

0

.

2

5

1

(

2

C

P

0

0

2 -

-20 0 20 40 60 80

PC1 (39.5%)

Figure 1.3: PCA summarising 35,434 SNP loci for Portuguese (yellow), Mediterranean (red), and short-tailed (blue), field vole individuals.

Environmental niche modelling i) Glacial refugia

The bioclimatic predictors showed significant effects in all niche models (Table S1.2).

AICs for the models, their predictive performances and a cartographic representation of the predicted probabilities are shown in Figure S1.3. Model.4 was selected based on explanatory capacity and predictive performance. After MESS, model.4 was projected onto the LGM only in the areas within its environmental domain (Fig 1.4). Predictions for the LGM can be used to delimit potential refugia for the species (predicted P > 0.8; see Fig 1.5).

18

Figure 1.4: A) Individuals used for parameterising climatic niche models (N = 377) and the distribution of cryptic species of field vole (P: Portuguese (grey squares), M: Mediterranean (white circles) and St: short-tailed (black diamonds)) in western Europe (our area for modelling; see text for details). Area in grey shows the Microtus agrestis distribution in western Europe (Mitchell-Jones et al 1999). B) Predicted cryptic species distribution in Europe according to environmental niche modelling (see text for details) and the species distribution recorded by Mitchell-Jones et al (1999). Hatching = short-tailed, solid black = Mediterranean, solid grey = Portuguese. Grid cells represent the UTM 50 x 50 km squares

19

Figure 1.5: Predicted probability of occurrence of the field vole during the Last Glacial Maximum (LGM) in western Europe according to climatic niche model 4 (individual cryptic species not identified). Hatched areas were excluded for model transferability according to a MESS analysis. Grid cells represent the UTM 50 x 50 km squares.

ii) Biogeographical relationships

The models at locality level (see Table S1.3) discriminated well between the

Portuguese and Mediterranean cryptic species of field vole (AUC: 0.999 and H-L:

5.78, P > 0.05) and between Mediterranean and short-tailed cryptic species (AUC:

0.999 and H-L: 10.94, P > 0.05). These models were used to assign a cryptic species to each UTM square (Fig 1.6). Results of the subsequent independent models for each species distribution at UTM square level are shown in Table 1.2 (see also Fig 1.6).

20 Models showed that the climatic factors modulating the species distribution at a global scale are affecting each cryptic species differently (Table 1.2).

Table 1.2: Statistical parameters for validation (discrimination, Area Under the Curve (AUC); reliability, Hosmer and Lemeshow test (H-L)) of the climatic niche models and individual predictors (AUC/H-L) for the Portuguese, Mediterranean and short-tailed cryptic species of the field vole. (n.s., not significant; *. P < 0.05; **, P < 0.01; ***, P < 0.001)

Parameter Portuguese Mediterranean Short-tailed Model ~BIO1-BIO12+BIO12- ~-BIO1-BIO12+BIO12- ~-BIO1-BIO12-BIO12- BIO122 BIO122 BIO122 -X-Y -X-Y -X+Y AUC 0.995 0.888 0.882 H-L 4.21 n.s 12.46 n.s. 16.14 * Predictor BIO1 1.2e-01 / 2.04 * -3.9e-03 / -0.61 n.s. -3.6e-02 / -9.32 *** BIO12 -2.2e-03 / -3.55 *** -5.3e-04 / -6.02 *** -5.9e-04 / -15.56 *** BIO12 5.2e-03 / 3.07 ** 3.8e-03 / 7.54 *** -2.7e-03 / -7.58 *** BIO122 -1.1e-05 / -2.94 ** -16e-06 / -5.12 *** -7.8e-07 / -1.76 n.s. Longitude -1.0 / -4.62 *** -7.2e-02 / -7.70 *** -6.4e-02 / -8.38 *** Latitude -6.5e-01 / -5.22 *** -3.2e-01 / -12.06 *** 1.2e-01 / 9.20 *** (Intercept) 1.7e+01 / 1.90 * 1.2e+01 / 7.90 *** 9.3e-01 / 0.82 n.s.

The biogeographical relationships between cryptic species are shown in Fig

1.6. The results suggest the existence of zones of potential co-occurrence in most of the core area of the distribution of the Portuguese cryptic species. Areas where the

Portuguese cryptic species could be excluded by the Mediterranean are limited and currently distributed close to the edge of the Portuguese cryptic species distribution.

By contrast, coexistence areas for the Mediterranean and short-tailed cryptic species are not predicted by the models. Predicted coexistence areas are those in which the environmental conditions are highly favourable for both species. The scarcity of the predicted areas of coexistence for the short-tailed and Mediterranean cryptic species is explained by the small area in which the two species overlap and interact being in localities that are far from each of their optimal niche and at the edge of their ranges, and therefore will limit their overlap. In addition, even when models showed a wide

21 territory of potential competition between cryptic species, no clear superiority is shown by either lineage.

Figure 1.6: Biogeographical relationships between the cryptic species of the field vole. (P: Portuguese, M: Mediterranean and St: short-tailed). Mean favourability scores (defined for each species) are plotted against gradients defined by the absolute local overlap values. The gradients were divided in natural intervals, and mean favourability values (± SD) are shown. The number of sampling sites at each interval is also shown in columns. Areas with higher local overlap of favourability scores between species increase on the x-axis. Fixed intervals are defined (dotted vertical lines) in the charts and mapped, with light green being the lowest, orange and maroon being median, and dark green being the highest local overlap. Grid cells represent the UTM 50 x 50 km squares.

22 Discussion

We showed substantial genome-wide differentiation between the Mediterranean, short-tailed and Portuguese cryptic species of the field vole, which were until recently considered a single species (Mathias et al 2017). This high genetic differentiation is reflected in their high levels of FST across the genome as well as by the clear separation of clusters in the PCA. The Procrustes analysis comparing separation in

PCA space to the geographic separation of individuals sampled confirms that current geographic sampling location alone cannot explain these patterns, and that they cluster instead according to different refugial origin of each cryptic species. Analyses of short-tailed field vole populations alone revealed high intraspecific genetic structure, suggesting that even more recent climatic oscillations may play a role in driving divergence within species (Fig S1.7, Herman and Searle 2011). The divergence is genome-wide across most loci, as shown in Fig S1.1 and is unlikely that it is driven by selection alone. The levels of pairwise FST between the cryptic species of the field vole are extremely high, especially when considering that these lineages have only been separated for 30 - 70,000 years (Paupério et al 2012). High levels of genetic drift along with low levels of admixture between the cryptic species is most likely to have driven these divergence patterns. A short generation time (less than 1 year; Mathias et al

2017) is likely important for the rapid evolution and, importantly, this species is also known for its multi-year population cycling, which if occurring in isolated populations during refugial isolation, may increase fixation of alternative alleles and drive divergence between lineages (Ehrich et al 2009).

23 Our data also indicate a divergence in the environmental niche among the three cryptic species, although at a lower level than that observed from the genomic data.

Interestingly, this is particularly true when comparing the environmental niche between the Portuguese and Mediterranean cryptic species, which shows low divergence despite the genomic differentiation being the highest observed. The short- tailed cryptic species occupies a relatively cooler and more humid niche than the others, and the Portuguese cryptic species a relatively hotter and more humid niche than the Mediterranean. Although there are differences between the climatic niches of each cryptic species, our models predict a wide geographic area of potential overlap between the Portuguese and Mediterranean cryptic species, but this is not reflected in the contemporary distribution of the genetic lineages. This suggests some pre or post zygotic barrier to reproduction between these groups. These intrinsic barriers to reproduction would promote species boundaries, despite similar external morphologies and climatic niches (Rowe et al 2011). A recent fine scale study on the contact zone between the Portuguese and Mediterranean cryptic species also supports our conclusions, where no hybrid individuals were found and the contact zone is estimated to be less than 15 km wide, despite no obvious geographic or ecological barriers (J.

Paupério, unpublished data). Both of these lines of evidence suggest that the

Mediterranean and Portuguese cryptic species are essentially reproductively isolated despite less than ~70,000 years of divergence. This is contrasted with the pattern of differentiation we find between the more recently diverged Mediterranean and short- tailed field voles, which have lower genetic differentiation, but also much less overlap in our models (Fig 1.6). Although the results suggest that reproductive isolation is

24 acting as a barrier to gene flow and limiting the distribution of each of the species, they may also differ in other aspects of their ecology that could have led to other axes of niche specialization that are not measured in our environmental niche models.

It has previously been shown that allopatry during glacial refugial periods allows for the accumulation of genetic differences between lineages, including differences that impact reproductive isolation, determining the amount of subsequent gene flow between forms (Barton and Hewitt 1985, Weir and Schluter 2004).

Reproductive isolation is even more likely to occur when small population sizes in isolated refugia allow for the fixation of genetic variants (including chromosomal rearrangements, but also genic incompatibilities) that affect the fitness of hybrids upon secondary contact (Searle 1993, Hewitt 1996, Rowe et al 2011, Dion-Côté and

Barbash 2017). Although isolation during glacial cycling and subsequent hybridisation has been suspected as a scenario that promotes rapid speciation (Hewitt 1996), the particular geographic context matters to determine whether speciation will occur

(Klicka and Zink 1997).

This study elucidates one potential factor that has contributed to speciation in microtine . Voles in the genus Microtus represent one of the most speciose genera of rodents in the world, with over 65 species that arose in the last ~2 million years (Musser and Carleton 2005, Chaline et al 1999). This suggests a speciation rate that is matched by few other vertebrate groups that are typically attributed to adaptive radiations resulting in rapid divergence in morphology and ecology (Brawand et al

2014). Many species groups within Microtus have distributions relatively close to where ice sheets occur during glacial maxima and isolation during cyclical glacial

25 maxima may have caused divergence, reproduction isolation and ultimately speciation, between these closely related lineages (Chaline et al 1999, Weir et al 2016).

Environmental niche modelling hindcasted to the LGM shows multiple potential refugial areas for the field vole, each population evolving in allopatry: one in the northeastern Iberia (delimited by Ebro river) (the Portuguese cryptic species), another in France or northern Italy (the Mediterranean cryptic species) and finally another in the Balkans (short-tailed cryptic species) (Fig 1.5). These distributions of refugia are in line with previous hypotheses of refugial locations for the field vole (Paupério et al

2012). The distribution of these cryptic species of the field vole reflect common phylogeographic breaks in Europe, which underscores the role that glacial cycles have played in structuring genetic diversity in the region (Taberlet et al 1998). In particular, the highly divergent Portuguese field vole species represents another endemic to the Iberian Peninsula, a region known to harbour the highest level of endemicity in Europe, and already home to another endemic vole in the genus

(Microtus cabrerae) (Bilton et al 1998, Baquero and Tellería 2001). The large amount of divergence across the genome between cryptic species of the field vole points to substantial genetic drift and fixation of alternative alleles during isolation in refugial areas.

If the divergence between forms results in changes that promote reproductive isolation, such as fixation of alternative chromosomal arrangements as is seen in many voles in the genus Microtus (Mazurak et al 2001), lineages will continue to remain evolutionarily distinct even after secondary contact. The three cryptic vole species analysed in this study have the same number of chromosomes, and similar giant sex

26 chromosomes (Giménez et al 2012), but it is possible that there are undetected chromosomal rearrangements that could have contributed to reproductive isolation.

These cryptic vole species may also have fixed genetic differences that result in genetic incompatibilities in hybrid individuals. For example, heterochromatin differences are thought to contribute to reproductive isolation between diverging lineages by triggering genome instability in hybrids, and the field vole is known for its large blocks of heterochromatin repeats, especially on their giant sex chromosomes

(Marchal et al 2004, Dion-Côté and Barbash 2017). There could also be other isolating mechanisms in this group, such as pheromones or urinary proteins that are known to be important for species recognition in rodents, or in genes involved with seminal proteins that may result in incompatibilities expressed in hybrids (Payseur and

Nachman 2005, Logan et al 2008).

Subdivision into isolated glacial refugia may be particularly important in the generation of cryptic species complexes (Hope et al 2012, Weir et al 2016). There may be a lag between genetic and morphological differentiation in systems where there is no strong selection for gross phenotypic divergence, although divergence may occur in other traits important for maintaining reproductive isolation between these cryptic species (Cardé et al 1978, Fier et al 2018). This contributes to the growing number of eamples of seemingl nonadaptie radiations of crptic species on continents, in which speciation lacks the accompanying morphological differentiation typically epitomised by adaptive radiations (Zelditch et al 2015, Mathias et al 2017). However, environmental niche modelling has shown that these voles have indeed diverged in some aspects of their ecolog and therefore labelling this radiation as nonadaptie

27 does not capture differences in their ecology and physiology that underlie their differences in climatic niche. Other groups, such as the speciose genus Rattus, have likewise shown high rates of speciation without obvious morphological differentiation, and this may represent a common trend in rodent groups with rapid speciation (Rowe et al 2011, Maestri et al 2017). This trend is already becoming apparent within Microtus. A recent study has revealed two highly genetically distinct populations of M. californicus, that also are currently considered one population and lack obvious morphological differentiation (Lin et al 2018). These populations diverged less than 50,000 years ago, have no obvious ecological or geographic barriers separating them, and yet show no evidence of nuclear gene flow (Lin et al 2018).

Cryptic species are often overlooked, yet they are an important component of biodiversity, and the identification of cryptic species can reveal their unique ecological roles and evolutionary potential (Bickford et al 2007, Geller 1999). Genomic tools have much potential in the discovery of cryptic species and genomic differences reveal the underlying processes that restrict gene flow and cause differentiation between forms (Funk et al 2012). The three cryptic species of field vole complex not only differ greatly at a genome-wide level, but they have diverged in ecological traits that underpin differences in individual climatic niches. Clearly it would be interesting in the future to link genotype and phenotype and establish the genomic basis of climatic adaptation in these cryptic species.

The field vole is a key primary herbivore and component of the food web in the here the occr, and has been described as the most important dietary part of more species of predators than perhaps any other mammal due mainly

28 to its nmerical abndance (Mathias et al 2017). Our study provides further evidence in support of three recently recognised, taxonomically distinct, cryptic species in the field ole (Paprio et al 2012, Krtfek 2017). On these grounds it is important to understand the basis of ecological and physiological divergence in these cryptic species and the specific role that they have in their respective environments.

Environmental modelling show that the niche of the Portuguese cryptic species is shrinking under current climate change (Figure S1.6), and this lineage already shows the highest level of inbreeding of any of the cryptic taxa. As recommended in Paupério et al (2012), further evaluation of the Portuguese cryptic species in particular is needed to examine the future viability of populations in its range, and the putative need to implement conservation measures to preserve this species. This species may be locally adapted to a habitat that is disappearing given current climate change scenarios.

29 REFERENCES

Acevedo, P., Ward, A.I., Real, R. & Smith, G.C. (2010). Assessing biogeographical relationships of ecologically related species using favourability functions: a case study on British deer. Diversity and Distributions, 16, 515528.

Acevedo, P., Jiménez-Valverde, A., Lobo, J.M. & Real, R. (2012). Delimiting the geographical background in species distribution modelling. Journal of Biogeography,

39, 13831390.

Acevedo, P. & Real, R. (2012). Favourability: concept, distinctive characteristics and potential usefulness. Naturwissenschaften, 99, 515522.

Acevedo, P., Melo-Ferreira, J., Farelo, L., Beltran-Beck, B., Real, R., Campos, R., &

Alves, P. C. (2015). Range dynamics driven by Quaternary climate oscillations explain the distribution of introgressed mt DNA of Lepus timidus origin in hares from the Iberian Peninsula. Journal of Biogeography, 42, 1727-1735.

Avise, J. C. (2000). Phylogeography: the History and Formation of Species. Harvard

University Press, Cambridge MA.

30 Baquero, R. A., & Tellería, J. L. (2001). Species richness, rarity and endemicity of

European mammals: a biogeographical approach. Biodiversity & Conservation, 10,

29-44.

Barton, N. H. & Hewitt, G. M. (1985). Analysis of hybrid zones. Annual Review of

Ecology and Systematics. 16, 113-148.

Bickford, D., Lohman, D. J., Sodhi, N. S., Ng, P. K., Meier, R., Winker, K., Ingram,

K.K., & Das, I. (2007). Cryptic species as a window on diversity and conservation.

Trends in Ecology & Evolution, 22, 148-155.

Bilton, D. T., Mirol, P. M., Mascheretti, S., Fredga, K., Zima, J., & Searle, J. B.

(1998). Mediterranean Europe as an area of endemism for small mammals rather than a source for northwards postglacial colonization. Proceedings of the Royal Society of

London B, 265, 1219-1226.

Braconnot, P., Otto-Bliesner, B., Harrison, S., Joussaume, S., Peterchmitt, J.-Y., Abe-

Ouchi, A., Crucifix, M., Driesschaert, E., Fichefet, T., Hewitt, C. D., Kageyama, M.,

Kitoh, A., Laîné, A., Loutre, M.-F., Marti, O., Merkel, U., Ramstein, G., Valdes, P.,

Weber, S.L., Yu, Y. & Zhao, Y. (2007). Results of PMIP2 coupled simulations of the

Mid-Holocene and Last Glacial Maximum Part 1: experiments and large-scale features. Climate of the Past, 3, 261277.

31 Brawand, D., Wagner, C. E., Li, Y. I., Malinsky, M., Keller, I., Fan, S., Simakov, O.,

Ng, A.Y., Lim, Z.W., Bezault, E. & Turner-Maier, J. (2014). The genomic substrate for adaptive radiation in African cichlid fish. Nature, 513, 375-381.

Broughton, R. E. & Harrison, R. G. (2003). Nuclear gene genealogies reveal historical, demographic and selective factors associated with speciation in field crickets. Genetics, 163, 1389-1401.

Cardé, R. T., Roelofs, W. L., Harrison, R. G., Vawter, A. T., Brussard, P. F., Mutuura,

A., & Munroe, E. (1978). European corn borer: pheromone polymorphism or sibling species? Science, 199, 555-556.

Chaline, J., Brunet-Lecomte, P., Montuire, S., Viriot, L., & Courant, F. (1999).

Anatomy of the arvicoline radiation (Rodentia): palaeogeographical, palaeoecological history and evolutionary data. Annales Zoologici Fennici, 36, 239-267.

Danecek, P., Auton, A., Abecasis, G., Albers, C.A., Banks, E., DePristo, M.A.,

Handsaker, R.E., Lunter, G., Marth, G.T., Sherry, S.T. & McVean, G. (2011). The variant call format and VCFtools. Bioinformatics, 27, 2156-2158.

Diniz-Filho, J. A. F., Gouveia, S. F., & Lima-Ribeiro, M. S. (2013). Evolutionary macroecology. Frontiers of Biogeography, 5, 195-203.

32 Dion-Côté, A. M., & Barbash, D. A. (2017). Beyond speciation genes: an overview of genome stability in evolution and speciation. Current Opinion in Genetics &

Development, 47, 17-23.

Ehrich, D., Yoccoz, N. G., & Ims, R. A. (2009). Multi-annual density fluctuations and habitat size enhance genetic variability in two northern voles. Oikos, 118, 1441-1452.

Elith, J., Kearney, M. & Phillips, S. (2010). The art of modelling range-shifting species. Methods in Ecology and Evolution, 1, 330342.

Elshire, R. J., Glaubitz, J. C., Sun, Q., Poland, J. A., Kawamoto, K., Buckler, E. S., &

Mitchell, S. E. (2011). A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PloS One, 6, e19379.

Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Molecular Ecology,

14, 2611-2620.

Feder, J. L., Egan, S. P., & Nosil, P. (2012). The genomics of speciation-with-gene- flow. Trends in Genetics, 28, 342-350.

Feulner, P. G. D., Kirschbaum, F., Schugardt, C., Ketmaier, V., & Tiedemann, R.

(2006). Electrophysiological and molecular genetic evidence for sympatrically

33 occurring cryptic species in African weakly electric fishes (Teleostei: Mormyridae:

Campylomormyrus). Molecular Phylogenetics and Evolution, 39, 198-208.

Fier, C., Robinson, C. T., & Malard, F. (2018). Cryptic species as a window into the paradigm shift of the species concept. Molecular Ecology, 27, 613-635.

Fløjgaard, C., Normand, S., Skov, F., & Svenning, J.-C. (2009). Ice age distributions of European small mammals: insights from species distribution modelling. Journal of

Biogeography, 36, 1152-1163.

Funk, W. C., McKay, J. K., Hohenlohe, P. A., & Allendorf, F. W. (2012). Harnessing genomics for delineating conservation units. Trends in Ecology & Evolution, 27, 489-

496.

Geller, J. B. (1999). Decline of a native mussel masked by sibling species invasion.

Conservation Biology, 13, 661-664.

Giménez, M. D., Paupério, J., Alves, P. C., & Searle, J. B. (2012). Giant sex chromosomes retained within the Portuguese lineage of the field vole (Microtus agrestis). Acta Theriologica, 57, 377-382.

Haffer, J. (1969). Speciation in Amazonian forest birds. Science, 165, 131-137.

34 Hellborg, L., Gündüz, ., & Jaarola, M. (2005) Analysis of sex-linked sequences supports a new mammal species in Europe. Molecular Ecology, 14, 20252031.

Herman, J. S., McDeitt, A. D., Kaako, A., Jaarola, M., Wjcik, J. M., & Searle, J.

B. (2014). Land-bridge calibration of molecular clocks and the post-glacial colonization of Scandinavia by the Eurasian field vole Microtus agrestis. PLoS One, 9, e103949.

Herman, J. S., & Searle, J. B. (2011). Post-glacial partitioning of mitochondrial genetic variation in the field vole. Proceedings of the Royal Society B, 278, 3601-

3607.

Hewitt, G. M. (1996). Some genetic consequences of ice ages, and their role in divergence and speciation. Biological Journal of the Linnean Society, 58, 247-276.

Hewitt, G. (2000). The genetic legacy of the Quaternary ice ages. Nature, 405, 907-

913.

Hijmans, R. J., Cameron, S. E., Parra, J. L., Jones, P. G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of

Climatology, 25, 1965-1978.

35 Hope, A. G., Speer, K. A., Demboski, J. R., Talbot, S. L., & Cook, J. A. (2012). A climate for speciation: rapid spatial diversification within the Sorex cinereus complex of shrews. Molecular Phylogenetics and Evolution, 64, 671-684.

Jaarola, M., Martnko, N., Gnd, ., Brnhoff, C., Zima, J., Nadachowski, A.,

Amori, G., Bulatova, N.S., Chondropoulos, B., Fraguedakis-Tsolis, S., González-

Esteban, J., López-Fuster, M. J., Kandaurov, A. S., Kefelioglu, H., da Luz Mathias,

M., Villate, I., & Searle J. B. (2004). Molecular phylogeny of the speciose vole genus

Microtus (, Rodentia) inferred from mitochondrial DNA sequences.

Molecular Phylogenetics and Evolution, 33, 647-663.

Jaarola, M., & Searle, J. B. (2002). Phylogeography of field voles (Microtus agrestis) in Eurasia inferred from mitochondrial DNA sequences. Molecular Ecology, 11, 2613-

2621.

Jaarola, M., & Searle, J. B. (2004) A highly divergent mitochondrial DNA lineage of

Microtus agrestis in southern Europe. Heredity, 92, 228234.

Jiménez-Valverde, A., Acevedo, P., Barbosa, A. M., Lobo, J. M., & Real, R. (2013).

Discrimination capacity in species distribution models depends on the representativeness of the environmental domain. Global Ecology and Biogeography,

22, 508-516.

36 Klicka, J., & Zink, R. M. (1997) The importance of recent ice ages in speciation: a failed paradigm. Science 277, 16661669.

Kohn, M. H., Murphy, W. J., Ostrander, E. A., & Wayne, R. K. (2006). Genomics and conservation genetics. Trends in Ecology & Evolution, 21, 629-637.

Krtfek, B. (2017). Handbook of the Mammals of the World. Vol. 7. Rodents II, D.

E. Wilson, T. E. Lacher, Jr. & R. A. Mittermeier (Chief Eds.). Lynx Edicions,

Barcelona.

Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows

Wheeler transform. Bioinformatics, 25, 1754-1760.

Lin, D., Bi, K., Conroy, C. J., Lacey, E. A., Schraiber, J. G., & Bowie, R. C. (2018).

Mitonclear discordance across a recent contact zone for California voles. Ecology and Evolution, 8, 6226-6241.

Lischer, H. E. L. & Excoffier, L. (2012). PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28,

298-299.

Liu, C., White. M., & Newell, G. (2013). Selecting thresholds for the prediction of species occurrence with presence-only data. Journal of Biogeography, 40, 778789.

37

Logan, D. W., Marton, T. F., & Stowers, L. (2008). Species specificity in major urinary proteins by parallel evolution. PLoS One, 3, e3280.

Lovette, I. J. (2005). Glacial cycles and the tempo of avian speciation. Trends in

Ecology & Evolution, 20, 57-59.

Maestri, R., Monteiro, L. R., Fornel, R., Upham, N. S., Patterson, B. D., & Freitas, T.

R. O. (2017). The ecology of a continental evolutionary radiation: is the radiation of sigmodontine rodents adaptive? Evolution, 71, 610-632.

Marchal, J. A., Acosta, M. J., Nietzel, H., Sperling, K., Bullejos, M., de la Guardia, R.

D., & Sánchez, A. (2004). X chromosome painting in Microtus: origin and evolution of the giant sex chromosomes. Chromosome Research, 12, 767-776.

Mathias, M. L., Hart, E. B., Ramalhinho, M. G., & Jaarola, M. (2017). Microtus agrestis (Rodentia: ). Mammalian Species, 49, 23-39.

Mazurok, N. A., Rubtsova, N. V., Isaenko, A. A., Pavlova, M. E., Slobodyanyuk, S.

Y., Nesterova, T. B., & Zakian, S. M. (2001). Comparative chromosome and mitochondrial DNA analyses and phylogenetic relationships within common voles

(Microtus, Arvicolidae). Chromosome Research, 9, 107-120.

38 McGraw, L. A., Davis, J. K., Young, L. J., & Thomas, J. W. (2011). A genetic linkage map and comparative mapping of the (Microtus ochrogaster) genome.

BMC Genetics, 12, 60.

Mitchell-Jones, A. J., Giovanni, A., Bogdanowicz, W., Krystufek, B., Reijnders, P. J.

H., Spitzenberger, F., Stubbe, M., Thissen, J. B. M., Vohralik, V., & Zima, J. (1999).

The Atlas of European Mammals. London: Academic Press.

Morit, C. (1994). Defining eoltionaril significant nits for conseration. Trends in Ecology & Evolution, 9, 373-375.

Musser G.M & Carleton M.D. (2005) Superfamily Muroidea. In: Wilson, D. E., &

Reeder, D. M. (Eds.). Mammal Species of the World: a Taxonomic and Geographic

Reference. (pp. 989-1020) Baltimore, MD: Johns Hopkins University Press.

Narins, P. M. (1983). Divergence of acoustic communication systems of two sibling species of eleutherodactylid frogs. Copeia, 4, 1089-1090.

Paupério, J., Herman, J. S., Melo-Ferreira, J., Jaarola, M., Alves, P. C., & Searle, J. B.

(2012). Cryptic speciation in the field vole: a multilocus approach confirms three highly divergent lineages in Eurasia. Molecular Ecology, 21, 6015-6032.

39 Payseur, B. A. & Nachman, M. W. (2005). The genomics of speciation: investigating the molecular correlates of X chromosome introgression across the hybrid zone between Mus domesticus and Mus musculus. Biological Journal of the Linnean

Society, 84, 523-534.

R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-

07-0, URL http://www.R-project.org.

Rand, A. L. (1948). Glaciation, an isolating factor in speciation. Evolution, 2, 314-321.

Ravaoarimanana, I. B., Tiedemann, R., Montagnon, D., & Rumpler, Y. (2004).

Molecular and cytogenetic evidence for cryptic speciation within a rare endemic

Malagasy lemur, the northern sportive lemur (Lepilemur septentrionalis). Molecular

Phylogenetics and Evolution, 31, 440-448.

Real, R., Barbosa, A.M., & Vargas, J.M. (2006). Obtaining environmental favourability functions from logistic regression. Environmental and Ecological

Statistics, 13, 237245.

Richards, C. L., Carstens, B. C., & Knowles, L. L. (2007) Distribution modelling and statistical phylogeography: an integrative framework for generating and testing alternative biogeographical hypotheses. Journal of Biogeography, 34, 1833-1845.

40

Rowe, K. C., Aplin, K. P., Baverstock, P. R., & Moritz, C. (2011). Recent and rapid speciation with limited morphological disparity in the genus Rattus. Systematic

Biology, 60, 188-203.

Sáez, A. G., & Lozano, E. (2005). Body doubles. Nature, 433, 111.

Sánchez-Fernández, D., Lobo, J. M., & Hernández-Manrique, O. L. (2011). Species distribution models that do not incorporate global data misrepresent potential distributions: a case study using Iberian diving beetles. Diversity and Distributions,

17, 163-171.

Sanders, K. L., Malhotra, A., & Thorpe, R. S. (2004). Ecological diversification in a group of Indomalayan pitvipers (Trimeresurus): convergence in taxonomically important traits has implications for species identification. Journal of Evolutionary

Biology, 17, 721-731.

Searle, J. B. (1993). Chromosomal hybrid zones in eutherian mammals. In: Harrison

R. G. (Ed.). Hybrid Zones and the Evolutionary Process. (pp. 309353) New York:

Oxford University Press.

41 Stewart, J. R., Lister, A. M., Barnes, I., & Dalén, L. (2010). Refugia revisited: individualistic responses of species in space and time. Proceedings of the Royal

Society B, 277, 661-671.

Taberlet, P., Fumagalli, L., Wust-Saucy, A. G., & Cosson, J. F. (1998). Comparative phylogeography and postglacial colonization routes in Europe. Molecular Ecology, 7,

453-464.

Thomas, M. A. & Klaper, R. (2004). Genomics for the ecological toolbox. Trends in

Ecology & Evolution, 19, 439-445.

Triant, D. A. & DeWoody, J. A. (2006). Accelerated molecular evolution in Microtus

(Rodentia) as assessed via complete mitochondrial genome sequences. Genetica, 128,

95-108.

Wang, C., Szpiech, Z. A., Degnan, J. H., Jakobsson, M., Pemberton, T. J., Hardy, J.

A., Singleton, A.B., & Rosenberg, N. A. (2010). Comparing spatial maps of human population-genetic variation using Procrustes analysis. Statistical Applications in

Genetics and Molecular Biology, 9, 1.

Waples, R. S. (1991). Pacific salmon, Oncorhynchus spp., and the definition of

"species" under the Endangered Species Act. Marine Fisheries Review, 53, 11-22.

42 Weir, J. T., & Schluter, D. (2004). Ice sheets promote speciation in boreal birds.

Proceedings of the Royal Society of London B, 271, 1881-1887.

Weir, J. T., Haddrath, O., Robertson, H. A., Colbourne, R. M., & Baker, A. J. (2016).

Explosive ice age diversification of kiwi. Proceedings of the National Academy of

Sciences USA, 113, E5580-E5587.

Werkowska, W., Márquez, A. L., Real, R., & Acevedo, P. (2016). A practical overview of transferability in species distribution modeling. Environmental Reviews,

25, 127-133.

Wilson, D. E., & Reeder, D. M. (Eds.). (2005). Mammal Species of the World: a

Taxonomic and Geographic Reference. Vol. 2. John Hopkins University Press,

Baltimore MD.

Zelditch, M. L., Li, J., Tran, L. A., & Swiderski, D. L. (2015). Relationships of diversity, disparity, and their evolutionary rates in squirrels (Sciuridae). Evolution, 69, 1284-1300.

43 CHAPTER 2

POPULATION HISTORY AND GENETIC STRUCTURE IN THE WESTERN ATLANTIC SURFCLAM (SPISULA SOLIDISSIMA SP.) Abstract

Understanding population history and genetic connectivity is important for assessing the long-term viability of populations. The Atlantic surfclam (Spisula solidissima) has historically been subdivided into two subspecies: lesser known S. s. similis in the South Atlantic Bight and Gulf of Mexico, and commercially important S. s. solidissima mostly north of Cape Hatteras. Recently, novel populations of the

sothern S. s. similis subspecies were identified far north of their historical range limit. To test a hypothesis of recency and isolation in the North the population sampling of S. s. similis was extended to provide the first population genetic comparison to a population sample from Georgia, USA, as well as contrasts with S. s. solidissima. Population structure and demographic history were inferred using genetic variation at a combination of microsatellite, mtDNA and nuclear intron loci. These recently discovered northern populations of S. s. similis showed a slight reduction in genetic diversity but high levels of connectivity with southern populations. Genetic differentiation was weak and only significant for microsatellite data. All S. s. similis populations had extremely low mtDNA diversity, with only 2 mtCOI haplotypes found across the range in contrast to 15 haplotypes in S. s. solidissima. Based on coalescent modeling we conclude that the northern S. s. similis populations most likely represent a post-glacial expansion rather than a recent introduction, and it is

44 hypothesized that there are unidentified stepping-stone populations connecting S. s. similis populations throughout its range.

Introduction

It is valuable to assess the population history and genetic connectivity of populations to assess their long-term viability and threat of extinction (Allendorf et al

2009, Hellberg 2009, Nance et al 2011). Estimating population genetic parameters, such as effective population size and gene flow, is especially important for the management of populations that are commercially harvested so that management decisions ensure the sustainable commercial success of the industry even while environments shift ever more rapidly from climate change (Hare et al 2011, Hellberg

2009). Marine species with high fecundity, dispersal by planktonic larval stages, and large ranges have historically been assumed to have large effective population sizes and high gene flow (Kuparinen et al 2016), but studies have increasingly shown that there can be population structure at a local scale and small effective population sizes that leave these populations vulnerable to local extirpation and extinction (e.g. Hauser et al 2002, Hoarau et al 2005; Olafsdottir et al 2014; Sinclair-Waters 2017). The use of genetic markers as tools for estimating historical demography is especially valuable for understanding organisms with complex life history where traditional methods such as mark-and-recapture techniques are difficult or nearly impossible to implement

(Hellberg 2009, Nance et al 2011), and in systems where there are cryptic species or populations that are challenging to diagnose using morphology alone (Chen and Hare

2008, Ovenden et al 2010, Thomas et al 2014).

45 The Atlantic surfclam, Spisula solidissima is a common mollusk that inhabits sandy sediments in the western Atlantic from the Gulf of Mexico to the Gulf of St.

Lawrence (Abbott 1974). Current subdivides the species into two subspecies that differ in distribution, life history traits, morphology, and DNA sequence variation (Hare and Weinberg 2005). The commercially important Atlantic surfclam, Spisula solidissima solidissima (Dillwyn, 1817), is distributed latitudinally from the Gulf of St. Lawrence in eastern Canada to Cape Hatteras in North Carolina

(Abbott 1974) and ranges from the shallow, coastal subtidal zone to 50 m depth

(Weinberg 1993). This subspecies supports one of the largest bivalve fisheries in the

U.S., with landings of 18.4 metric tons of clam meat valued at $30.5 million in 2015

(National Marine Fisheries Service 2016). The southern Atlantic surfclam, Spisula solidissima similis (Say 1822; syn. Spisula raveneli (Jacobson and Old 1966)) has historically been reported to have a distribution from Cape Hatteras to the Gulf of

Mexico in shallow coastal habitats (Andrews 1971, Abbott 1974). A recent study however, documented a previously-unidentified population of S. s. similis living in

Long Island Sound (Hare et al 2010), substantially expanding the known latitudinal range of this subspecies. Long Island Sound supported a commercial fishery on these

S. s. similis clams valued at $1 million annually as recently as 2003 (NYSDEC unpubl. data). The two subspecies differ in maximum reported shell length for the habitats where they are abundant, up to 200 mm for offshore S. s. solidissima and up to 122 mm for Gulf of Mexico S. s. similis (Weinberg 1999, Walker and Heffernan

1994). However, where they co-occur in nearshore habitats the two subspecies have similar maximum size which makes them difficult to distinguish morphologically

46 (Hare et al 2010). Fortunately, they can be distinguished unequivocally with genetic markers; the two subspecies are reciprocally monophyletic based on mitochondrial and nuclear markers, and separated by net sequence divergence of 13.9% at mitochondrial cytochrome oxidase I (Hare and Weinberg 2005, Hare et al 2010).

These genetic patterns are consistent with recognition as distinct biological species by some definitions, but here we continue to use the existing subspecies taxonomy

(Kartavtsev 2011).

The recent discovery of S. s. similis in Long Island Sound (LIS), (Hare et al

2010) represents a ca. 800 km shift in the recognized northern edge of its range. In this study we characterize an additional southern New England population at Washburn

Island, Massachusetts (MA) to address the question: what is the origin of these northern populations of S. s. similis that are seemingly isolated from the typical range of S. s. similis? Two competing hypotheses stand out to explain this pattern: 1) the LIS and MA populations of S. s. similis are relictual populations that represent natural colonization after the Pleistocene (Maggs et al 2008) and have escaped identification because of their morphological similarity to nearshore S. s. solidissima or 2) these

sothern surfclams are a recent introduction by humans or northern range extension due to climate change and other anthropogenic effects. Temperatures in the Atlantic hae been rising since the 1960s (Leits et al 2000, Friedland and Hare 2007) and coastal Atlantic species are expected to contract their shallow water range margin and shift poleward in response (Scavia et al 2002). In fact, these range shifts have been detected for S. s. solidissima (Weinberg 2005; Powell et al 2017) and a warming trend is one hypothesis proposed to explain diminishing adult surfclam abundance in

47 nearshore waters of New Jersey and New York (0-3 miles from shore, but with more dramatic trends closer to shore; NFSC 2017). Genetic markers can be used to shed light on the history of these populations and test for a recent introduction or colonization (Grosholz 2002, Purcell and Stockwell 2015, Luo et al 2018).

To address these hypotheses the population history of the two S. solidissima subspecies were reconstructed, as well as estimates of both historical and contemporary measures of effective population size and levels of gene flow between populations. This study expands genetic sampling of Spisula solidissima populations in both the northern and southern part of its currently known range. The use of multiple classes of genetic markers, microsatellites, mtDNA and nuclear introns, allows us to draw inferences about demographic and historical processes at multiple time scales (Hellberg 2009). A strong population bottleneck caused by recent anthropogenic introduction would be expected to cause reductions in genetic diversity in northern S. s. similis populations as well as differentiation by genetic drift between the northern and southern populations. In contrast, post-glacial expansion from a southern refugia is likely to be associated with weaker north-south differentiation and no bottleneck signal for northern S. s. similis, especially if stepping stone populations

(not currently known) help maintain gene flow connectivity between southern and northern populations.

Materials and Methods

Sampling and DNA extraction

48 A total of 275 individual S. solidissima sp. were collected from 4 southern New

England and New York localities (Fig. 2.1). Three populations of S. s. similis were sampled: 25 from Georgia (Sim-GA), 67 from Washburn Island, Massachusetts (Sim-

MA), and 86 from three different sites in the Long Island Sound (29 from Sim-LIS1,

30 from Sim-LIS2, and 27 from Sim-LIS3; Hare et al 2010). Two populations of S. s. solidissima were sampled: 78 from the nearshore shelf along the South shore of Long

Island (Sol-NY) and 20 from Provincetown, Massachusetts (Sol-PT) (Fig 2.1). All samples were collected for this study except Sim-LIS, for which previous genetic analyses only included a species diagnostic RFLP marker (Hare et al 2010).

Specimens were collected by hand (MA) or dredge, and adductor muscle was dissected from each individual and preserved in 90% ethanol for subsequent molecular analyses.

49 45

● 4 ● ● 2 3 ● 40 5 e d u t i t a

L 35

● 1

30

25

85 80 75 70 Longitude

Figure 2.1: Map of sampling localities of Spisula solidissima sp. along western Atlantic coast: 1= Georgia S. s. similis (Sim-GA); 2= Long Island Sound S. s. similis, (Sim-LIS); 3= Washburn Island, Massachusetts S. s. similis, (Sim-MA); 4= Provincetown, Massachusetts S. s. solidissima, (Sol-PT) and 5= Long Island south shore S. s. solidissima, (Sol-NY).

All samples were used for microsatellite analyses whereas mtDNA and nucDNA variation was assayed only in a subset of 85 samples from Georgia, South shore of Long Island, New York, and Washburn Island, Massachusetts (Table 2.1).

50 Genomic DNA was extracted using DNeasy Tissue Kits following the manufacturers protocol (Qiagen, Venlo, Limburg, Netherlands).

Table 2.1: Summary of individuals of Spisula solidissima solidissima and S. s. similis populations used for microsatellite, mitochondrial COI (mtCOI), and nuclear DNA intron analyses.

N N mtCOI and Subspecies Population Location Lat. Long. microsat nDNA Introns S. s. solidissima Long Island Shelf, Sol-NY 40.61° N -73.2° W 78 30 NY Sol-PT Provincetown, MA 42.04° N -70.15° W 20 -- S. s. similis Sim-GA GA 31.70° N -81.143° W 24 25 Long Island Sound, Sim-LIS 40.98° N -72.99° W 86 -- NY Washburn Island, Sim-MA 41.5461° N -70.5298° W 67 30 MA

Microsatellite loci and scoring

Nine microsatellite loci amplified and scored in all 276 individuals were developed for use in both S. solidissima subspecies (Hare and Borchardt-Wier 2014).

These microsatellites included dinucleotide (Ssim23440, Ssim3823, Ssim5132), trinucleotide (Ssim1303, Ssim9017, Ssim10007, Ssim948, Ssim9323), and tetranucleotide (Ssim1436) motifs. The amplicon size range of the microsatellite loci was 98- 387 bp (Hare and Borchardt-Wier 2014). Polymerase chain reaction (PCR) amplification conditions are given in the Appendix.

Mitochondrial amplification and sequencing

Mitochondrial cytochrome oxidase I (mtCOI) was amplified from each sample by PCR using Spis-L (5-GGKACKGGKTTAGKATNAT-3) and Spis-R (5-

51 AAGTATTAAAATGNCGATCAGT-3) degenerate primers (Hare and Weinberg

2005). The PCR reagents and conditions are given in the Appendix. Purification for sequencing used ExoSAP-IT (Affymetrix, Santa Clara, CA) following the manfactrers protocol. Both forard and reerse DNA strands ere Sanger sequenced at the Cornell University Biotechnology Resource Center using the PCR primers.

Nuclear DNA amplification and sequencing

For the nuclear loci, custom primers were designed for use in a two-step PCR protocol with the Nextera Index Kit (Illumina, San Diego, CA) to allow for pooling of PCR products by individual for subsequent sequencing. This protocol allows for multiplexing loci for each indexed individual to be run on a single Illumina sequencing lane with subsequent bioinformatic parsing of the data by barcode and locus. Primers for a total of 6 loci were developed but here we focus on three that passed quality control after sequencing (see below).

The following primers were designed from existing primers by adding sequences recognizable by the Nextera indexing kit: NEXCAL-2 (5-

TCGTCGGCAGCGTCGTGTCCTTCATTTTNCKTGCCATCAT-3) and NEXCAL-

1 (5-GTCTCGTGGGCTCGGGCCGAGCTGCARGAYATGATCAA-3) derived from the nuclear intron calmodulin (Cal-A) primers Cal-2 and Cal-1 (Duda and

Palumbi 2009), SsimNEX3823-F (5-

TCGTCGGCAGCGTCAACTATGCAGCATGACATTTAGC-3) and

SsimNEX3823-R (5-

52 GTCTCGTGGGCTCGGGAGAGGTGGGATGCATTGTATTG-3) deried from the microsatellite primers Ssim3823-F and Ssim 3823-R (Hare and Borchardt-Wier 2014), and SsimNEX9017-F (5- TCGTCGGCAGCGTCTTGCCTTACCCCATGTCTGG-

3) and SsimNEX9017-R (5-

GTCTCGTGGGCTCGGGCTCTTTATGTACAATTGACTGCTG-3) deried from the microsatellite primers Ssim9017 F and Ssim 9017 R (Hare and Borchardt-Wier

2014). The PCR reagents and conditions are given in the Appendix for the primary

PCR as well as a second reaction to anneal unique combinatorial Nextera i7 and i5 indexing primers to the primary PCR products. PCR amplicon purification was performed using AMPure XP beads by following the PCR Clean-Up protocol in the

Nextera DNA Sample Preparation Guide pp. 31-33 (Illumina Inc. San Diego, CA). To better retain amplicons of smaller sie, 85 l of Ampre beads ere added, 3.4x the recommended amount per the manufacturers instructions. The multiplex, indexed reaction products for each individual were pooled and sequenced on the Illumina

MiSeq platform at the Biotechnology Resource Center at Cornell University in a single lane using 2x250 bp paired-end reads.

Sequence filtering and alignment

Sanger mtCO1 sequences from both DNA strands were trimmed manually, assembled for each individual, and aligned across individuals using Sequencher 5.4 (GeneCodes) default parameters.

Nuclear loci were filtered for sequencing error using custom perl scripts. See

Appendix for detailed information on the filtering steps. After filtering, all nuclear

53 sequences were assembled and aligned by locus using GENEIOUS PRO. 5.3.4

(Drummond et al 2011). Microsatellite repeat regions for nuSsim3823 and nuSsim9017 were visually identified and trimmed from the alignment, concatenating the flanking sequences.

Genetic diversity

Within population genetic diversity for microsatellites was estimated in terms of observed and expected heterozygosity as well as allelic richness (average number of alleles per locus after controlling for sample size variation), all calculated with Fstat ver. 2.9.32 (Goudet 2005). Differences in diversity between geographic regions was tested with 10,000 permutations in Fstat. Variation at DNA sequence markers also was quantified in terms of observed and expected heterozygosity (gene diversity) in addition to nucleotide diversity (), haplotype diversity (h), and theta () in terms of segregating sites (S) and homozygosity (Hom), calculated using ARLEQUIN ver. 3.5

(Excoffier and Lischer 2010).

Population structure

To test for genetic structure using the mtDNA and nuDNA sequences a hierarchical

Analysis of Molecular Variance (AMOVA) in ARLEQUIN 3.5 was used, with significance testing over 1000 permutations. Differentiation was estimated using microsatellite data among population samples using FST as calculated in Genepop, with significance testing based on an exact test. Assignment tests implemented in

STRUCTURE v. 2.3.3 also were used to infer the number of differentiated populations

54 and test for admixture between populations based on microsatellite data. Assignment tests use individual specimens as the unit of analysis and can estimate the proportional contributions of 2+ source populations within admixed individuals. To test for admixture between subspecies, analyses of the 6 microsatellites that had the lowest missing data (< 5%) for both subspecies together were performed. All 5 populations were analyzed with K = 1-7 in an admixture model, with 100,000 steps and a burn-in of 20,000 steps. To infer the number of differentiated populations among the 3 S. s. similis population samples (Sim-GA, Sim-MA, Sim-LIS), all 9 microsatellites were analyzed with K = 1-5 in an admixture model, with 100,000 steps and a burn-in of

20,000 steps.

Demographic history

Contemporary effective population sizes were estimated based on linkage disequilibrium occurring in the microsatellite data using NeEstimator ver. 2, ignoring alleles with frequency < 0.02 to avoid known biases (Do et al 2014). Signals of demographic epansion sing Tajimas D, and Fs FS statistics were tested against a neutral model using 1000 simulated samples in ARLEQUIN 3.5. Although originally deeloped as tests for selection, Tajimas D and Fs F statistic can test for signals of recent demographic expansion when selection is not present (Tajima 1989, Fu 1997).

IMa2 was used (Hey 2010) to estimate demographic parameters of contemporary and ancestral relative poplation sies (), migration rates (m), and divergence times (t) for

Sim-GA, Sim-MA and Sol-NY populations, using nuclear intron sequences from

NuSsim3823, NuSsim9017, and CAL-A loci. The mitochondrial locus mtCOI was not

55 included as it was invariable for Sim-MA and we found only 2 haplotypes for Sim-

GA, and thus not informative for these analyses. Attempts to include microsatellite variation in coalescent modeling also proved ineffective, as models including the microsatellite variation would not converge. IMa2 was run on the Center for

Biotechnology computing clusters at Cornell University. After exploring parameter space with a series of pilot IMa2 runs, we used priors on the model parameters in M- mode with maximum migration (m)= 2, population size maximum value (q) =40 and maximum splitting time (t)= 20. A divergence with gene flow model was implemented such that historical migration was tested between Sim-GA and Sim-MA sister populations, as well as between Sol-NY and the lineage that includes Sim-GA+Sim-

MA. In addition, population sizes, migration rates, and divergence times were all estimated including only the Sim-GA and Sim-MA populations. Analyses were run with 107 MCMC steps following a burn-in of 10,000 steps, using an HKY mutation model.

Results

Genetic diversity and demographic analyses

Microsatellite allele frequency distributions were similar in the two subspecies, supporting the assumption that homologous loci are being amplified (Appendix).

Microsatellite heterozygosity and allelic richness were significantly higher in S. s. similis than in S. s. solidissima when tested with all loci except Ssim1436, which had

68% missing data in the latter subspecies. Another locus, Ssim10007, had 35% missing data and was nearly monomorphic in S. s. solidissima. After removing both

56 these loci the seven remaining microsatellites had no significant difference in genetic diversity between subspecies. The three nDNA loci also had 22% more observed heterozygosity in Sim-GA S. s. similis than in Sol-NY S. s. solidissima on average, but other nDNA metrics did not show a consistent trend. In contrast, nucleotide diversity at mtCOI was 4 times higher in Sol-NY (=0.012) than in Sim-GA and Sim-MA poplations ith =0.0032 and =0, respectiel.

Within S. s. similis, comparisons using 9 microsatellites between the pooled northern (Sim-LIS and Sim-MA) and southern (Sim-GA) regional populations showed a trend for slightly greater diversity in Georgia, but no significant difference in allelic richness (southern (S)= 6.330 and northern (N)= 5.648), expected heterozygosity (HS)

(S=0.704 and N= 0.666), and inbreeding coefficient (FIS) (S=0.173 and N=0.181).

This trend for greater southern diversity was also found with mtCOI and 2 out of 3 of the nDNA loci (Table 2.2).

Table 2.2: Summary statistics for genetic diversity and parameters of demographic expansion for mtCOI and nuclear introns nuSsim3823, nuSsim9017, and CAL-A for three populations: Long Island (Sol-NY) S. s. solidissima, Massachusetts (Sim-MA) and Georgia (Sim-GA) S. s. similis.

Sample Locality (total n individuals)

Sim-MA Sim-GA Marker Summary statistic Sol-NY (30) (30) (25) mtCO1 467bp (n gene copies) 25 30 25 Expected heterozygosity (S.D) 0.38 (0.16) 0.00 (0.00) 0.08 (0.0) Ncleotide diersit 0.0122 0.000173 (S.D.) (0.0068) 0.00 (0.00) (0.00038) Number of haplotypes 15 1 2 Hom (S.D.) NA NA NA 3.972 0.264 S (S.D.) (1.594) 0.00 (0.00) (0.264)

57 Tajima's D 1.54 0.00 -1.157 Fu's Fs -24.888 NA NA

nuSsim3823 288bp (n diploid gene copies) 60 58 42 Expected heterozygosity 0.914 0.892 0.933 Observed heterozygosity 0.767 0.690 0.9524 Ncleotide diersit 0.0372 0.0399 (S.D.) 0.0278 (0.011) (0.02)

9.195 7.039 12.348 Hom (S.D.) (2.713) (3.067) (7.02) 17.282 15.106 S (S.D.) 9.65 (2.90) (5.70) (4.65) Tajima's D -0.760 -2.354 -1.684 Fu's Fs -2.10 -8.332 -8.204

nuSsim9017 257bp (n diploid gene copies) 56 56 42 Expected heterozygosity 0.607 0.717 0.796 Observed heterozygosity 0.666 0.536 0.762 Ncleotide diersit 0.016 0.0077 0.0106 (S.D.) (0.009) (0.0049) (0.009)

1.511 1.945 3.078 Hom (S.D.) (0.496) (0.630) (1.169) 6.095 4.136 4.648 S (S.D.) (1.973) (1.435) (1.653) Tajima's D -1.021 -1.644 -1.384 Fu's Fs -2.84 -5.62 -5.626 CAL-A 450bp (n diploid gene copies) 28 26 12 Expected heterozygosity 0.901 0.951 0.985 Observed heterozygosity 0.643 0.6923 0.8333 Ncleotide diersit 0.084 0.195 0.0876 (S.D.) (0.042) (0.096) (0.0462)

8.180 17.689 63.120 Hom (S.D.) (2.857) (10.599) (175.125) 45.74 78.093 45.366 S (S.D.) (14.54) (24.982) (17.696) Tajima's D -0.662 0.488 -0.613 Fu's Fs 12.966 9.735 0.315

Genetic structure

58 In the analysis of all 7 populations with STRUCTURE, K = 2 had the highest probability, with populations clustering by subspecies (Figure 2.2). The STRUCTURE plot showed evidence for subspecies admixture within a few Sim-MA individuals, possibly representing hybridization events. In the STRUCTURE analysis restricted to

S. s. similis populations K = 3 had the highest probability, but an admixture pattern

(presumed to be an artifact) was present in all individuals (Figure 2.3).

Figure 2.2: STRUCTURE analysis of 6 microsatellites for 5 populations of Spisula spp. At K=2. Location and taxon abbreviations as in Fig. 2.1 legend.

Figure 2.3: STRUCTURE analysis of 9 microsatellites for 3 populations of S. s. similis at K=3. Location and taxon abbreviations as in Fig. 2.1 legend. Results from FST analyses for mtCOI, NuSsim3823, NuSsim9017, and CAL-A are summarized in Table 2.3. For all 4 loci, the S. s. similis populations analyzed (Sim-

GA and Sim-MA) showed significant FST when compared to Sol-NY. Pairwise FST values between Sim-GA and Sim-MA populations of S. s. similis ranged from -0.015

59 to 0.075 and were nonsignificant in all comparisons. With microsatellites, on the other hand, contrasts between the southern New England Sim-LIS and Sim-MA S. s. similis versus Sim-GA resulted in low but significant FST, but no differentiation was detected between Sim-LIS and Sim-MA (Table 2.4).

Table 2.3: Pairwise estimates of Fst based on mtCOI and nuclear intron loci between Long Island S. s. solidissima (Sol-NY), Massachusetts (Sim-MA) and Georgia (Sim-GA) S. s. similis populations. Significant p-values (p<0.05) indicated in bold.

mtCOI Sol-NY Sim-MA Sim-GA Sol-NY * Sim-MA 0.946 * Sim-GA 0.952 0.074 *

nuSsim9017 Sol-NY Sim-MA Sim-GA Sol-NY * Sim-MA 0.540 * Sim-GA 0.488 -0.015 *

nuSsim3823 Sol-NY Sim-MA Sim-GA Sol-NY * Sim-MA 0.176 * Sim-GA 0.130 -0.006 *

CAL-A Sol-NY Sim-MA Sim-GA Sol-NY * Sim-MA 0.181 * Sim-GA 0.260 0.033 *

Table 2.4: GenePop FST matrix based on 9 microsatellite loci for 3 populations of S. s. similis: Georgia (Sim-GA), Washburn Island, Massachusetts (Sim-MA) and Long Island Sound (Sim- LIS). Significant p-values after 60 permutations (p<0.05) indicated in bold.

Sim-GA Sim-LIS Sim-MA Sim-GA * Sim-LIS 0.0137 * Sim-MA 0.0077 0.0052 *

60 Historical demography

Reslts for Tajimas D tests and Fs FS statistic for mtCOI, NuSsim3823,

NuSsim9017, and CAL-A loci are summarized in Table 2.2 (note: only a single mtCOI haplotype was found at Sim-MA). Tajimas D as significantl negatie for the

Sim-MA S. s. similis population at the nuSsim3823 and nuSsim9017 loci, as well as for Sim-GA S. s. similis poplation at nSsim3823. Fs FS was significantly negative for the Sol-NY population at mtCOI, and at nuSsim3823 for both Sim-MA and Sim-

GA.

The IMa2 analysis including all three populations (Sim-MA, Sim-GA and Sol-

NY) resulted in migration estimates =1.03 (.058 StDev) from Sim-GA -> Sim-MA and

= 1.01 (0.57) from Sim-MA -> Sim-GA. Migration rate estimates from Sol-NY->

Sim-GA+Sim-MA lineage = 0.11 (0.11 StDev) and for Sim-GA+Sim-MA -> Sol-NY

= 0.30 (0.15 StDev). Population split estimates were t=0.10 for the split between Sim-

GA and Sim-MA populations and t=16.5 for the split between S. s. similis and S. s. solidissima. Populations size () estimates for each population were Sim-GA= 24.74

(9.93 StDev), Sim-MA= 17.46 (11.12), and Sol-NY= 8.51 (1.94). Ancestral population sizes for Sim-GA + Sim-MA = 25.72 (4.24) and the ancestral population size for all three populations = 22.04 (10.31).

IMA2 analysis including only S. s. similis populations (Sim-MA + Sim-GA) resulted in migration estimates = 1.76 (0.21 StDev) Sim-GA-> Sim-MA and m= 1.45

(0.40) Sim-MA-> Sim-GA. Splitting time (t) was estimated to be t=2.21 between the two populations. Populations size () estimates for each population were Sim-

61 GA=32.67 (5.51) and Sim-MA=10.00 (2.57). The ancestral population size = 26.98

(9.56).

Discussion

Results from genetic analyses of mitochondrial, nuclear intron, and microsatellite data in Atlantic surfclams add to the previous body of evidence that there is strong genetic differentiation between the two S. solidissima subspecies. These results also suggest that there is weak genetic structure in the S. s. similis populations along the western Atlantic coast. Mitochondrial (mt)COI and nuclear intron sequences were analyzed for northern and southern populations of S. s. similis (Sim-MA and

Sim-GA respectively), but no significant differences were found among locales in pairwise estimates of FST across these loci (Table 2.3). Coalescent modeling of nuclear introns with IMa2 resulted in high estimates for both contemporary and historical population sizes of S. s. similis and high migration rates between northern and southern populations of this subspecies, consistent with low differentiation. Finally, S. s. similis population structure also was tested using 9 microsatellite loci and Sim-GA populations were found to be weakly but significantly differentiated from both northern S. s. similis populations (Sim-LIS and Sim-MA) based on FST (Table 2.4).

The northern S. s. similis populations were not significantly differentiated from one another in the microsatellite FST analysis (Table 2.4). By using multiple classes of markers, each of which are affected by different evolutionary processes and biases that can complicate the phylogeographic patterns, the analyses performed here provide a temporal context for phylogeographic interpretations (Zink and Barrowclough 2008,

62 Edwards and Bensch 2009). The fact that significant geographic structure within S. s. similis was only found using rapidly evolving microsatellite markers suggests that this genetic differentiation results from some combination of a recent separation of S. s. similis populations, slow genetic drift, and/or moderate levels of ongoing gene flow

(Hellberg 2009, Grosholz 2002).

Results from comparing the two Spisula solidissima subspecies in this study reinforce previous inferences of substantial genetic divergence (Hare and Weinberg

2005). Here, although the analysis does not have the ability to estimate absolute divergence times, the split between these cryptic subspecies is more than two orders of magnitude earlier (t=16.5) than the population split between North and South S. s. similis populations (t=0.1). Surprisingly, the microsatellite STRUCTURE analysis of the Washburn Island population of S. s. similis (Sim-MA) shows evidence for hybrid or backcross individuals. The other S. s. similis populations (Sim-GA and Sim-LIS) did not show any evidence of hybridization (Figure 2.2). Notably, these two subspecies have been successfully crossed in a hatchery with grow-out to late larval stages (Emma Green-Beach, unpublished data). It is unclear what drives reproductive isolation between these cryptic taxa in the wild given this absence of an absolute barrier, but isolation by habitat and or other prezygotic barriers may be present in the system.

Within S. s. similis, there was only evidence of population structure between northern (Sim-LIS) and southern (Sim-GA) individuals in microsatellite data, indicating that there may be connectivity between these populations. Recent modeling of S. solidissima larval dispersal has predicted that larvae may have a dispersal range

63 of up to 100 km, and that larval-mediated dispersal is predicted to generate high connectivity within the mid-Atlantic bight, but these models do not extend south of

Cape Hatteras (Zhang et al 2015). Perhaps, due to the large potential dispersal distance of larvae, there is enough gene flow between S. s. similis populations to retard their differentiation despite a wide geographic separation (Waples 1998). Very low amounts of gene flow would be sufficient to maintain genetic homogeneity if either (1) large effective population sizes minimized genetic drift or (2) intermediate populations provided multi-generational stepping stones.

The weak genetic structure within S. s. similis may also reflect a recent founding or invasion by southern populations to the north. The disjunct distribution of

S. s. similis along the western Atlantic coast has been speculated to be the result of a recent dispersal event due to climate change or other recent anthropogenic effects

(Hare et al 2010). The IMA2 estimates of population split times (t) were treated as relative and not absolute because the mutation rates used to scale the split time parameter are unknown for the loci used in the analyses. The analyses estimated a relatively recent split (t=0.1) between Sim-GA and Sim-MA, when compared to the split time estimated between Sol-NY and the two S. s. similis populations (t=16.5).

These split time estimates are comparable if we assume that mutation rates are similar in these sister taxa. A recent invasion by S. s. similis to the north could result in similar allele frequencies in the northern and southern populations of S. s. similis

(Grosholtz 2002, Darling et al 2008) if no bottleneck or founder effect was involved.

Founder events (colonization bottleneck) would be expected to reduce heterozygosity and accelerate genetic drift in the resulting northern populations (Grosholtz 2002,

64 Darling et al 2008). Based on nDNA sequence variation, Sim-MA has 3.45% - 9.81% reduction in observed heterozygosity compared to GA across our three nuclear loci.

Two out of our three nuclear loci (nuSsim3823, nuSsim 9017) have significantly negatie Tajimas D in the Sim-MA population which is a signal for demographic expansion in the absence of selection (Tajima 1989, Chevolot et al 2007). Negative

Tajimas D ales are epected hen sampling a population recently after it has invaded new territory (Taylor and Keller 2007, Azzuro et al 2006) due to the expanding population. Coalescent analyses with nDNA intron sequences also resulted in an effective population size estimate for Sim-MA that was one-third the size of the southern population in Georgia. Thus, the nDNA intron sequences provide a compelling case for an historical S. s. similis population expansion in southern New

England, with ambiguous timing, but no strong and recent bottleneck.

While the nuclear intron loci show divergence and diversity patterns consistent with a model of invasion of the northern population, there are no significant indications of a reduction of genetic diversity in northern S. s. similis from the microsatellite data - the markers most likely to show a very recent bottleneck

(Hellberg 2009). Comparisons between northern and southern populations of S. s. similis show no significant difference in any diversity measures including allelic richness, expected heterozygosity, and inbreeding coefficient. The trends lean toward what would be expected after a northern colonization with reduced allelic diversity and HS as well as increase in FIS in northern populations compared to southern

(Allendorf 2009), but these trends are not significant. Because low frequency microsatellite alleles are expected to be sensitive to demographic bottlenecks (Selkoe

65 and Toonen 2006), microsatellite data often provide a sensitive test of bottleneck and colonization hypotheses over recent time scales. The discrepancy between intron and microsatellite patterns in the nuclear genome of S. s. similis is most parsimoniously explained by a northern colonization founder event that is moderately old, providing sufficient time for recovery of most microsatellite diversity while nDNA introns retain signatures of the post-founding expansion. Exact timing is difficult to estimate without knowing microsatellite mutation rates, but a post-Pleistocene time scale, after the last glacial maximum, is more likely than recent colonization within the Anthropocene

(thousands rather than hundreds of years ago).

While mtDNA often is informative about shallow as well as deeper population structure (Zink and Barrowclough 2008, Waters et al 2005), in this case there was an unexplained and severe reduction in mtCOI variation but not nDNA variation across all S. s. similis populations, limiting mtCOI utility for measuring population structure.

Unusually low mitochondrial cytochrome oxidase I (mtCOI) diversity was observed in

Sim-MA and Sim-GA both within and between populations. Even when pooled together as a single population, the two S. s. similis populations had lower mtCOI diversity than the Sol-NY (Long Island S. s. solidissima) population, despite ~1600km separation between the Sim-GA and Sim-MA populations. One possible explanation for a reduction in genetic diversity at mtCOI is a bottleneck in the ancestral S. s. similis population (Allendorf et al 2009). However, the genetic diversity at nuclear loci does not show a parallel reduction compared with Sol-NY (Bazin et al 2006).

Another explanation for reduced mtCOI diversity is a selective sweep at a mitochondrial locus or background selection that would remove variation throughout

66 the mitochondrial genome (Meiklejohn et al 2007). Mitochondrial function has been associated with a variety of physiological traits in bivalves, including critical temperature maximum and minimum (Pörtner et al 1999) and cold adaptation

(Tschischka et al 2000). Weinberg (2005) demonstrated that warming ocean temperate has led to mortalities due to thermal stress and caused a significant shift to deeper waters in S. s. solidissima along its distribution. Although S. s. similis is known to tolerate higher temperatures than S. s. solidissima in aquaculture settings (Walker et al

1997) and can tolerate water temperatures exceeding 30°C in Georgia (Goldberg and

Walker 1990), recent warming may have resulted in even stronger selection for temperature tolerance in the shallow, nearshore zone that S. s. similis exclusively inhabits (Walker and Heffernan 1994, Andrews 1971). Thus, the reduced diversity at mtCOI may reflect a recent selective sweep for a functionally important mtDNA allele that increases temperature tolerance in S. s. similis. Further work will need to be done to fully understand the functional significance of mtDNA diversity and its relationship to thermal tolerance in S. s. similis, perhaps by comparison with Gulf of Mexico populations of S. s. similis.

The balance of evidence across markers in this study suggests that northern S. s. similis populations in MA and LIS may not be the result of recent colonization due to human introduction, despite their discovery only recently. There is not strong genetic evidence for a very recent bottleneck with few founding individuals in the

Southern New England populations, the genetic structure between the northern and southern S. s. similis populations is weak, and migration estimates between northern and southern populations are high. Biogeographic breaks known to be important in

67 other marine species along the western Atlantic, such as Cape Hatteras (McCartney et al 2013), may not be structuring contemporary S. s. similis populations. Many Western

Atlantic species show a latitudinal gradient in genetic diversity, with relatively lower genetic diversity in the north (e.g., Maggs et al 2008). This is hypothesized to be primarily the result of post-glacial expansion from southern refugia, where long-term populations maintain higher genetic diversity (Luo et al 2018, Maggs et al 2008).

Alternatively, for S. s. similis, Southern New England represents the (newly recognized) northern range limit and as such, may sustain smaller effective population sizes and therefore lower genetic diversity than in the southern core range. In either case, there are likely to be unidentified, stepping-stone populations of inshore S. s. similis populations that have facilitated gene flow between southern and northern populations. The weak genetic differentiation detected with microsatellite data here may be part of a more gradual isolation by distance pattern (Hellberg et al 2002), but further surveys are needed to assess whether geographically intermediate populations of S. s. similis exist. In addition to weak genetic differentiation, the lack of evidence for a strong bottleneck in northern S. s. similis contradicts the hypothesis of recent anthropogenic origins. Instead, observed patterns are most consistent with post-

Pleistocene colonization of Southern New England by S. s. similis, not as a disjunct northern population that has remained isolated, but rather as a well-connected part of a metapopulation. This study also highlights the utility of using multiple classes of genetic markers, in order to both inform species comparisons and to assess within

(sub)species patterns of genetic diversity.

68 REFERENCES

Abbott, R. T. 1974. American Seashells; The Marine Molluska of the Atlantic and

Pacific Coasts of North America (2nd Edition). New York, NY. Van Nostrand

Reinhold.

Allendorf, F. W. & Luikart, G. 2009. Conservation and the genetics of populations.

New York, NY. Wiley.

Andrews, J. 1971. Shells and shores of Texas. Austin, TX. University of Texas Press.

Azzurro, E., Golani, D., Bucciarelli, G., & Bernardi, G. 2006. Genetics of the early stages of invasion of the Lessepsian rabbitfish Siganus luridus. Journal of

Experimental Marine Biology and Ecology. 333: 190-201.

Bazin, E., Glémin, S., & Galtier, N. 2006. Population size does not influence mitochondrial genetic diversity in . Science. 312: 570-572.

Chen, G., & Hare, M. P. (2008). Cryptic ecological diversification of a planktonic estuarine copepod, Acartia tonsa. Molecular Ecology. 17: 1451-1468.

Chevolot, M., Wolfs, P. H., Pálsson, J., Rijnsdorp, A. D., Stam, W. T., & Olsen, J. L.

2007. Population structure and historical demography of the thorny skate (Amblyraja

69 radiata, Rajidae) in the North Atlantic. Marine Biology. 151: 1275-1286.

Darling, J. A., Bagley, M. J., Roman, J. O. E., Tepolt, C. K., & Geller, J. B. 2008.

Genetic patterns across multiple introductions of the globally invasive crab genus

Carcinus. Molecular Ecology. 17: 4992-5007.

Do, C., Waples, R. S., Peel, D., Macbeth, G. M., Tillett, B. J., & Ovenden, J. R. 2014.

NeEstimator 2: reimplementation of softare for the estimation of contemporar effective population size (Ne) from genetic data. Molecular Ecology Resources. 14:

209-214.

Drummond AJ, Ashton B, Buxton S et al. 2011 Geneious v5.4, Available from http://www.geneious.com.

Edwards, S., & Bensch, S. 2009. Looking forwards or looking backwards in avian phylogeography? A comment on Zink and Barrowclough 2008. Molecular Ecology,

18: 2930-2933.

Excoffier, L., & Lischer, H. E. 2010. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Molecular

Ecology Resources. 10: 564-567.

70 Friedland, K. D., & Hare, J. A. 2007. Long-term trends and regime shifts in sea surface temperature on the continental shelf of the northeast United States. Continental

Shelf Research. 27: 2313-2328.

Fu, Y. X. 1997. Statistical tests of neutrality of mutations against population growth, hitchhiking and background selection. Genetics. 147: 915-925.

Gene Codes. Sequencher version 5.4 DNA sequence analysis software, Gene Codes

Corporation. http://www.genecodes.com.

Goldberg, R. & Walker, R. L. 1990. Cage culture of yearling surfclams, Spisula solidissima (Dillwyn, 1817), in coastal Georgia. Journal of Shellfish Research. 9: 187-

193.

Goudet, J. 2005. FSTAT (version 2.9.3): a program to estimate and test population genetic parameters. http://www2unil.ch/popgen/softwares/fstat.htm

Grosholz, E. 2002. Ecological and evolutionary consequences of coastal invasions. Trends in Ecology & Evolution. 17: 22-27.

Hare, M. P., & Borchardt-Wier, H. 2014. Microsatellite primers designed from pyrosequences to compare two subspecies of surfclam, Spisula solidissima.

Conservation Genetics Resources. 6: 1039-1041.

71 Hare, M. P., & Weinberg, J. R. 2005. Phylogeography of surfclams, Spisula solidissima, in the western North Atlantic based on mitochondrial and nuclear DNA sequences. Marine Biology, 146: 707-716.

Hare, M. P., Weinberg, J., Peterfal, O., & Daidson, M. 2010. The sothern surfclam (Spisula solidissima similis) found north of its reported range: a commercially harvested population in Long Island Sound, New York. Journal of

Shellfish Research. 29: 799-807.

Hauser, L., Adcock, G. J., Smith, P. J., Ramírez, J. H. B., & Carvalho, G. R. 2002.

Loss of microsatellite diversity and low effective population size in an overexploited population of New Zealand snapper (Pagrus auratus). Proceedings of the National

Academy of Sciences USA. 99: 11742-11747.

Hellberg, M. E., Burton, R. S., Neigel, J. E., & Palumbi, S. R. 2002. Genetic assessment of connectivity among marine populations. Bulletin of Marine Science. 70:

273-290.

Hellberg, M. E. 2009. Gene flow and isolation among populations of marine animals. Annual Review of Ecology, Evolution, and Systematics. 40: 291-310.

Hey, J. (2010). Isolation with migration models for more than two populations.

Molecular Biology and Evolution. 27: 905-920.

72 Hoarau, G., Boon, E., Jongma, D. N., Ferber, S., Palsson, J., Van der Veer, H. W., &

Olsen, J. L. 2005. Low effective population size and evidence for inbreeding in an overexploited flatfish, plaice (Pleuronectes platessa L.). Proceedings of the Royal

Society of London B: Biological Sciences. 272: 497-503.

Jacobson, M. K., & Old Jr, W. E. 1966. On the identity of Spisula similis (Say).

American Malacological Union Report. 1966: 30-31.

Kartavtsev, Y. P. 2011. Sequence divergence at mitochondrial genes in animals: applicability of DNA data in genetics of speciation and molecular phylogenetics.

Marine Genomics. 4: 71-81.

Kuparinen, A., Hutchings, J.A. and Waples, R.S. 2016. Harvest-induced evolution and effective population size. Evolutionary Applications. 9: 658-672.

Levitus, S., Antonov, J. I., Boyer, T. P., & Stephens, C. (2000). Warming of the world ocean. Science. 287: 2225-2229.

Lou, R. N., Fletcher, N. K., Wilder, A. P., Conover, D. O., Therkildsen, N. O., &

Searle, J. B. 2018. Full mitochondrial genome sequences reveal new insights about post-glacial expansion and regional phylogeographic structure in the Atlantic silverside (Menidia menidia). Marine Biology. 165: 124.

73 Maggs, C. A., Castilho, R., Foltz, D., Henzler, C., Jolly, M. T., Kelly, J., & Wares, J.

2008. Evaluating signatures of glacial refugia for North Atlantic benthic marine taxa. Ecology. 89: S108-S122.

McCartney, M. A., M. L. Burton, and T. G. Lima. 2013. Mitochondrial DNA differentiation between populations of black sea bass (Centropristis striata) across

Cape Hatteras, North Carolina (USA). Journal of Biogeography. 40:13861398.

Meiklejohn, C. D., Montooth, K. L., & Rand, D. M. 2007. Positive and negative selection on the mitochondrial genome. Trends in Genetics 23: 259-263.

Nance, H. A., Klimley, P., Galván-Magaña, F., Martínez-Ortíz, J., & Marko, P. B.

2011. Demographic processes underlying subtle patterns of population structure in the scalloped hammerhead shark, Sphyrna lewini. PLoS One. 6: e21459.

National Marine Fisheries Service. 2016. Fisheries of the United States, 2015. U.S.

Department of Commerce, NOAA Current Fishery Statistics No. 2015. Available at: https://www.st.nmfs.noaa.gov/commercial-fisheries/ fus/fus15/index

Northeast Fisheries Science Center. 2017. 61st Northeast Regional Stock Assessment

Workshop (61st SAW) Assessment Report. US Dept. Commerce, Northeast Fish Sci.

Cent. Ref. Doc. 17-05; 466 p. Available at: http://www.nefsc.noaa.gov/publications/

74

Ólafsdóttir, G. Á., Westfall, K. M., Edvardsson, R., & Pálsson, S. 2014. Historical

DNA reveals the demographic history of Atlantic cod (Gadus morhua) in medieval and early modern Iceland. Proceedings of the Royal Society B: Biological Sciences.

281: 1777.

Ovenden, J. R., Morgan, J. A. T., Kashiwagi, T., Broderick, D., & Salini, J. 2010.

Toards better management of Astralias shark fisher: genetic analses reeal unexpected ratios of cryptic blacktip species Carcharhinus tilstoni and C. limbatus.

Marine and Freshwater Research. 61: 253-262.

Pereyra, S., García, G., Miller, P., Oviedo, S., & Domingo, A. 2010. Low genetic diversity and population structure of the narrownose shark (Mustelus schmitti). Fisheries Research. 106: 468-473.

Powell, E. N., Kuykendall, K. M., & Moreno, P. 2017. The death assemblage as a marker for habitat and an indicator of climate change: Georges Bank, surfclams and ocean quahogs. Continental Shelf Research, 142: 14-31.

Purcell, K. M. & Stockwell, C. A. 2015. An evaluation of the genetic structure and post-introduction dispersal of a non-native invasive fish to the North Island of New

Zealand. Biological Invasions. 17: 625-636.

75 Pörtner, H. O., Hardewig, I., & Peck, L. S. 1999. Mitochondrial function and critical temperature in the Antarctic bivalve, Laternula elliptica. Comparative Biochemistry and Physiology Part A: Molecular & Integrative Physiology. 124: 179-189.

Scavia, D., Field, J. C., Boesch, D. F., Buddemeier, R. W., Burkett, V., Cayan, D. R.,

& Titus, J. G. 2002. Climate change impacts on US coastal and marine ecosystems.

Estuaries. 25: 149-164.

Selkoe, K. A., & Toonen, R. J. 2006. Microsatellites for ecologists: a practical guide to using and evaluating microsatellite markers. Ecology Letters. 9: 615-629.

Sinclair-Waters, M., 2017. Genomic Perspectives for Conservation and Management of Atlantic Cod in Coastal Labrador (Doctoral dissertation, Dalhousie University).

Tajima, F. 1989. The effect of change in population size on DNA polymorphism.

Genetics. 123: 597-601.

Taylor, D. R., & Keller, S. R. 2007. Historical range expansion determines the phylogenetic diversity introduced during contemporary species invasion. Evolution.

61: 334-345.

76 Thomas Jr, R. C., Willette, D. A., Carpenter, K. E., & Santos, M. D. 2014. Hidden diversity in sardines: genetic and morphological evidence for cryptic species in the goldstripe sardinella, Sardinella gibbosa (Bleeker, 1849). PloS One. 9: e84719.

Tschischka, K., Abele, D., & Portner, H. O. 2000. Mitochondrial oxyconformity and cold adaptation in the polychaete Nereis pelagica and the bivalve Arctica islandica from the Baltic and White Seas. Journal of Experimental Biology. 203: 3355-3368.

Walker, R. L., & Heffernan, P. B. (1994). Age, growth rate, and size of the southern surfclam, Spisula solidissima similis (Say, 1822). Journal of Shellfish Research. 13:

433-441.

Walker, R. L., Hurley, D. H., & Moroney, D. A. 1997. Culture of juvenile Atlantic surfclams, Spisula solidissima solidissima and Spisula solidissima similis, in forced flow upwellers in a bivalve hatchery in coastal Georgia. Journal of the World

Aquaculture Society. 28: 27-33.

Zink, R. M., & Barrowclough, G. F. 2008. Mitochondrial DNA under siege in avian phylogeography. Molecular Ecology. 17: 2107-2121.

Waples, R. S. 1998. Separating the wheat from the chaff: patterns of genetic differentiation in high gene flow species. Journal of Heredity. 89: 438-450.

77 Waters, J. M., King, T. M., O'Loughlin, P. M., & Spencer, H. G. 2005.

Phylogeographical disjunction in abundant high-dispersal littoral gastropods.

Molecular Ecology. 14: 2789-2802.

Weinberg, J. R. 1999. Age-structure, recruitment, and adult mortality in populations of the Atlantic surfclam, Spisula solidissima, from 1978 to 1997. Marine Biology. 134:

113-125.

Weinberg, J. R. 2005. Bathymetric shift in the distribution of Atlantic surfclams: response to warmer ocean temperature. ICES Journal of Marine Science. 62: 1444-

1453.

Zhang, X., Haidvogel, D., Munroe, D., Powell, E. N., Klinck, J., Mann, R., &

Castruccio, F. S. 2015. Modeling larval connectivity of the Atlantic surfclams within the Middle Atlantic Bight: Model development, larval dispersal and metapopulation connectivity. Estuarine, Coastal and Shelf Science. 153: 38-53.

78 CHAPTER 3

GENOME-WIDE ANALYSIS OF THE CELTIC FRINGE PATTERN OF GENETIC DIVERSITY IN BRITISH FIELD VOLES Abstract

Genetic structure in high latitude species shows signatures of colonization of refugial populations after isolation during glacial cycling, and potentially admixture from secondary contact of such populations. Many species recolonized northern latitudes as habitats became suitable shortly after the Last Glacial Maximum (LGM, ca ~22,500 years before present), but even more recent climatic oscillations, such as a cold period known as the Younger Dryas (ca ~12,800-11,500 years before present), have played a role in structuring populations. Species of small mammals and other taxa have been shown to colonize the British Isles in two separate waves, one shortly after the LGM and another shortly after the Younger Dryas. This has led to a characteristic pattern of genomic variation within many species, dubbed the Celtic Fringe, with an older, northern genomic group with partially replacement and admixture by the younger, southern genomic group. Although this is a general phenomenon, the evolutionary and demographic processes that drive this pattern genome-wide are still not fully understood. Here we analyze low-coverage whole genome sequencing and mitochondrial genomes from 11 populations and 124 individuals of the short-tailed field vole (Microtus agrestis), to explore patterns of genome-wide introgression and divergence across the Celtic Fringe. We find high genome-wide divergence between our northern and southern populations, with a strong signature of isolation by distance north to south across Britain. There is also mito-nuclear divergence between our

79

mitochondrial and nuclear data, and the southern mitochondrial haplotype has not introgressed as far north as the southern nuclear DNA. We find no signature of selection in our data, and our results point to genetic drift the major evolutionary process driving divergence genome-wide between our populations. This signature of drift may be especially strong because the field voles have multi-year demographic cycling, which can drive divergence through cyclical population bottlenecking. The mito-nuclear discordance found in our data may be explained by male-biased dispersal of our southern population after secondary contact post-colonization, and limited introgression of maternally inherited southern mitochondrial genomes. This study offers insights into the evolutionary and demographic processes that drive Celtic

Fringe patterns in Britain, and have led to contemporary patterns of genetic diversity after climatic cycling.

Introduction

Temperate species show genetic signatures of colonization from refugial populations that existed at the time of the greatest extent of ice cover during the most recent glaciation, the Last Glacial Maximum (LGM, ca 22,500 years before present,

BP) (Hewitt 2000). Refugial populations colonized newly habitable areas after the retreat of glacial ice-sheets, resulting in a distinctive geographic distribution of genetic diversity that is evident in many current populations (Taberlet et al 1998, Hewitt

2004). Populations from different refugia came into secondary contact resulting in admixture, replacement, or even reproductive isolation (Hewitt 1999, Barnes et al

2002, Lovette 2005). Although colonization occurred post LGM (occurring shortly

80

after glacial retreat ~15-16,000 BP), even later climatic oscillations around the

Pleistocene/Holocene transition have been shown to play an important role in structuring genetic diversity within high latitude species, including glacial re-advance during the Younger Dryas (12,800-11,500 BP, Herman and Searle 2011,Walker et al

2012, de Bruyn et al 2012, Brace et al 2016).

Britain was recolonized by temperate species from continental Europe after ice sheets retreated and climatic conditions were again favorable (Montgomery et al 2014,

Jacobi and Higham 2009). For a number of animal species, this recolonization has been shown to be the result of two independent phases, one shortly after the LGM, and another following the last cold period of the last glaciation, the Younger Dryas. These were both periods when the climate was suitable and there was a land bridge connection between Britain and continental Europe (Jacobi and Higham 2009, Brace et al 2016). Where this two-phased colonization of Britain has occurred it has resulted in a geographic pattern of genetic ariation knon as a Celtic Fringe, in hich the contemporary genotypes of northern and peripheral populations are descended from the initial colonizers after the LGM and central and southern populations are the descendants of the subsequent colonizing populations after the Younger Dryas (Figure

1A, Searle et al 2009). Phylogeographic studies on neutral genetic variation of dormice, voles, shrews and snails all show that the two-phase colonization has resulted in a similar geographically defined genetic structure (Searle and Wilkinson 1987,

Davidson 2000, Searle et al 2009, Filipi et al 2015, Brace et al 2016, Combe et al

2016).

81

It is not only neutral genetic variation involved in the Celtic Fringe pattern in animals that have colonized the British Isles. Kotlík et al (2014) investigated the dual colonization and formation of a Celtic Fringe in bank voles (Clethrionomys glareolus) in Britain, and uncovered putative adaptive differences between hemoglobin alleles that apparently evolved while the two colonizing populations were isolated in independent glacial refugia. There is a signature of a hemoglobin allele sweeping from one population into the other, underscoring the role that these end-glacial dynamics can play in the recent adaptive evolution of species (Kotlík et al 2014). Other studies have found discrete patterns of chromosomal variation or clinal patterns of morphological variation within species in Britain, again creating a Celtic Fringe

(Arthur and Kettle 2001, Searle and Wilkinson 1987). Overall the various studies on the Celtic Fringe highlight the role that post-LGM glacial readvance has played in the generation and structuring of contemporary genetic diversity, as well as both the early stages of speciation and the dynamics of adaptive introgression between recently diverged populations (Herman and Searle 2011, Kotlík et al 2014, Brace et al 2016,

Herman et al 2019).

The short-tailed field vole (Microtus agrestis) is a particularly interesting species to study in the context of the Celtic Fringe because selection, drift and introgression may be involved in the patterns of genetic variation on Britain. M. agrestis is part of a rapidly speciating cryptic species complex and diverged from its sister species while isolated during the LGM (Paupério et al 2012). Although the exact mechanism driving reproductive isolation between the cryptic field vole species is not currently known, highly divergent loci are distributed across the entire genome,

82

pointing to genetic drift as an important process, although they also differ in their environmental niches (Fletcher et al 2019). Genetically distinct populations of M. agrestis colonized Britain after the LGM in two phases, resulting in the classic Celtic fringe pattern of genetic variation (Herman and Searle 2011). Analysis of mitochondrial, Y-chromosome and microsatellite loci have revealed discordance between the patterns of introgression at these loci, but it is unclear whether demographic processes drive these patterns (Herman et al 2019). These analyses also identify a possible selective sweep of the mitochondrial genome during postglacial climatic oscillations, perhaps due to selection for adaptive metabolic genes under conditions of rapid environmental change, although there was no evidence of selection in cytochrome b gene (Herman et al 2019). Despite these initial clues there is still little understanding of the relative roles of selection, drift and introgression in structuring genetic diversity in the short tailed field vole in Britain.

In this study we use low-coverage whole genome sequencing data on 124 field vole individuals across 11 populations in Britain in a north to south transect, perpendicular to the contact zone across northern England (Figure 3.1B) to investigate how the formation of the Celtic Fringe contributed to differential patterns of divergence and introgression across the genome. We tested whether regions of the genome show evidence for either adaptive or restricted introgression. Adaptive introgression would be consistent with positive selection on genes after admixture between colonizing lineages, while restricted introgression would be expected for genomic regions involved in the early stages of reproductive isolation between two diverging populations (Harrison and Larson 2016). We sequenced each individual to

83

an average 1x coverage, and after filtering used the resulting ~5.89 million SNPs in population genetic analyses in order to investigate the extent of genome-wide divergence between populations from the two colonizing lineages of Britain. In addition to genome-wide SNP data, our sequencing led to high coverage (average 70x) across the entire mitochondrial genome. We used this whole mitogenome dataset to investigate whether these populations show more specific patterns of mito-nuclear discordance and selection on mitochondrial genes, known to be a common phenomenon in high latitude species (Toews and Brelsford 2012).

Figure 3.1: A) A nmber of species sho a Celtic Fringe distribtion of genetic ariation, ith one geneticall distinct poplation (the Celtic Fringe) fond in the peripher (hich maximally can incorporate northern and western Britain and even parts of the extreme south), and the other genetically distinct population found in more central and eastern areas. The exact geographical region occupied by the Celtic Fringe varies between species, e.g. for some species, the Celtic Fringe only incorporates northern Britain. B) Sampling localities of field voles used in this study. Blue dashed line is approximate location of Celtic Fringe between mtDNA lineages, based on Herman and Searle (2011). Further details of populations and localities provided in Table S3.1.

84

Materials and Methods

Sampling, library preparation and sequencing:

Genomic DNA was extracted from ethanol-preserved vole tissues (digits) from 124 museum specimens of field voles distributed across 11 distinct geographically designated populations in Britain. Each population had between 7 - 19 individuals, collected between 1991 and 2005 (Fig 3.1B, Table S3.1). DNA was extracted using the DNeasy Blood and Tissue kit (Qiagen) following standard protocols. Low- coverage whole-genome sequencing based on the protocol described in Therkildsen and Palumbi (2017) was conducted at the Genomic Diversity Facility at Cornell

Uniersits Institte for Biotechnolog. Briefl, a dilted Netera librar preparation was used to ligate sequencing adaptors and dual index each individual. After individual indexing, all individual samples were pooled and a Pippen size selection on pooled samples targeted loci 450 - 800 bp in length. Pooled samples were sequenced on four paired-end 2 x 150 NextSeq 500 lanes. After the first lane was sequenced, libraries were re-pooled based on read counts to minimize uneven sequencing across individuals, with a target of 1x coverage per individual.

Sequence filtering, mapping, and SNP calling:

We used FastQC v0.11.5 (Andrews 2010) to check sequence quality and adaptors were clipped using Trimmomatic 0.39 (Bolger et al 2014) using the ILLUMINACLIP mode allowing 2 mismatches, a simple clip threshold of 10, minimum adapter length

85

of 4, a palindrome clip threshold of 30 and keeping both reads after clipping. PolyG trimming was performed by fastp (Chen et al 2018). Reads were mapped in end-to-end mode to the Microtus ochrogaster MicOch1.0 reference genome (GenBank assembly accession GCA_000317375.1) and converted to bam files using Bowtie 2 (Langmead and Salberg 2012) ith the folloing settings: er-sensitie pre-set option, minimum fragment length of 0, and a max fragment length of 1500. Reads below the q-score of 20 and those that were non-uniquely mapped were filtered out using samtools v1.8 (http://www.htslib.org/). Duplicate reads were removed using Picard 2.9

MarkDuplicates tool (http://broadinstitute.github.io/picard/). We then merged bam files from different sequencing lanes using samtools and overlaps from paired end reads were clipped using bamUtil v1.0.14 clipOverlap tool (Jun et al 2015). Reads were then realigned around indels using GATK ver. 3.7 IndelRealigner tool (McKenna et al 2010) using the default settings for realignment. Coverage statistics were calculated using the samtools depth tool.

ANGSD v0.912 (Korneliussen et al 2014) was used to call SNPs and estimate genotype likelihoods using the samtools model (-GL 1) for all 125 individuals. The minimum number of individuals threshold (-minInd) was set to 63 and excluded bases with a quality score (-minQ) < 20. We also set the minimum and maximum depth (- setMaxDepth, -setMinDepth) to 63 and 248, respectively, to eliminate low coverage sites and repetitive or paralogous regions. These thresholds were set because our target sequencing depth was 1x per individual. We used a p-value cut-off of 10-6 for calling polymorphic loci and retained only SNPs with a minor global allele frequency of > 1%

86

(-minMAF 0.01). Major and minor alleles were inferred from genotype likelihoods across individuals (-doMajorMinor 1).

Additionally, we used ANGSD v0.912 (Korneliussen et al 2014) to generate allele frequency likelihoods for each population (-doSaf 1), by calculating a site frequency spectrum following the method described in Nielson et al (2012). Bam files were mapped to the reference genome, with the maximum depth set to 248 (~2x coverage per individual, -setMaxDepth 248), minimum sequence phred score at 20 (- minQ 20), and the minimum number of individuals per population set to 2 (-minInd

2). Population minor allele frequencies and associated likelihoods were output as .saf files for subsequent analyses.

Population genomics of NGS data:

To visualize genetic variation and admixture across our samples and populations, we used PCAngsd (Meisner and Albrechtsen 2018) to calculate a covariance matrix among individuals and then performed individual-level principal component analysis (PCA) with the eigen function with default settings in R (R Core

D Team 2014). PCAngsd was run with the admixture setting (-admix) to estimate individual admixture proportions with eigenvalues set from 1 to 11 (-e) and minimum minor allele frequency set to 0.01 (-minMaf 0.01) (Meisner and Albrechtsen 2018).

Additionally, we used the program NGSadmix (Skotte et al 2013) to estimate individual admixture proportions and population structure. NGSadmix uses a maximum-likelihood framework to estimate individual admixture levels directly from genotype likelihoods, taking into account uncertainty and information from

87

unobserved genotypes. We set minor allele frequency to 0.05 (-minMAF 0.05) and set the number of populations to infer from 1 to 11 (k 1-11).

Population allele frequencies likelihoods were used to estimate weighted pairwise FST values between all populations using ANGSD. The 2D site frequency spectrum for each pair of populations were estimated (-realSFS), and used to calculated the pairwise weighted FST (-realSFS fst) across all population pairs. To visualize the patterns of genome-wide divergence between populations, we generated

Manhattan plots of pairwise FST first for all SNPs, and then in non-overlapping 50 kb windows, using ANGSD (realSFS fst stats2). When generating figures we excluded

SNPs on scaffolds < 10 Mb in length, as the number of smaller scaffolds made it impossible to visualize. We used custom R scripts to create a pairwise FST matrix using all pairwise average FST values between all 11 populations.

We then constructed a symmetrical pairwise geographic distance matrix to our

FST matrix, with distances between all populations using the distm function in the geosphere R package (R core team 2014). Using the symmetrical distance and FST matrices, we tested for signatures of isolation by distance between our populations, using a Mantel test in the R package ade4 using the function mantel.Rtest with 9999 repetitions (Bougeard and Dray 2018). We tested for a significant relationship between geographic distance and genetic distance using a linear model in R (lm, R core team 2014). We then applied classical multidimensional scaling to the pairwise

FST matrix using the cmdscale function in R (R core team 2014). Using the scaled matrix, we generated a population-level principal coordinate analysis (PCoA) to visualize the distribution of genomic diversity between populations. To assess whether

88

there is a geographic pattern of restricted gene flow between populations, we estimated effective migration surfaces between populations using the program EEMS

(Petkova et al 2016). EEMS analyses the pairwise FST matrix, using a population genetic model to relate effective migration rates to expected genetic similarities. This results in a geographic visualization of effective migration rates between populations, which can highlight spatial features of population structure. We ran EEMS on the diploid setting with 200,000 MCMC iterations (-numMCMCIter = 200000), after a burnin of 100,000 iterations (-numbBurnIter = 100000) and sampled every 999th iteration (-numThinIter = 999). We drew a polygon around Britain as our geographic area of analysis, with 1000 geographic demes (-nDemes = 1000).

We performed a scan for selection across genome in all our individuals using the selection fnction in PCAngsd (Meisner and Albrechtsen 2018). This fnction uses an extended model of FastPCA (Galinsky et al 2016), which identifies principal component variants that are significantly divergent compared to the null distribution of genetic drift. Minimum minor allele frequency was set to 0.01 (-minMaf 0.01) and for all significant PCs, as well as specified for PCs 1 and 2 alone as they explained most of the variation in the data (-e 1 and 2). P-values were calculated for all positions from chi-squared values using R, as well as a Bonferroni-corrected p-value significance threshold of p < 0.01.

Reconstructing mitochondrial genome:

In addition to the targeted 1x coverage across the nuclear genome, the low-coverage whole genome sequencing method results in high coverage of mitochondrial

89

sequences, as there are many more mitochondrial genomes per cell. These sequences can be used to reconstruct whole mitogenomes for all individuals (Lou et al 2018). To reconstruct individual mitogenomes, after adaptor clipping, sequence quality filtering and polyG trimming steps described above, reads were mapped to the published M. agrestis mitogenome (Jiang et al 2018; GenBank MH152570.1) using Bowtie 2

(Langmead and Salberg 2012) ith the folloing settings: er-sensitie pre-set option, minimum fragment length of 0, and a max fragment length of 1500. Reads below the q-score of 20 and those that were non-uniquely mapped were filtered out using samtools v1.8 (http://www.htslib.org/). Duplicate reads were removed using

Picard 2.9 MarkDuplicates tool (http://broadinstitute.github.io/picard/). We then merged bam files from different sequencing lanes using samtools and overlaps from paired end reads were clipped using bamUtil v1.0.14 clipOverlap tool (Jun et al 2015).

We used ANGSD v0.912 (Korneliussen et al 2014) to call haplotypes using the

DoHaploCall function with the minimum minor allele across individuals set to 3 (- minMinor 03), sites with a minimum phred score > 20 (-minQ 20) and the maximum missing data across individuals set to 60 (-maxMis 60), while also generating allele count data for each individual (dumpCount 4). Allele count data was used to generate consensus sequences and converted to FASTA files using custom R-scripts that addressed sequencing errors by removing sites that had a minimum major allele frequencies of less than 0.75 for each individual, and only including sites with a minimum coverage of 5x. Five individual samples (1016M, 1018M, 1019M, 1020M, and 431M) did not have enough sites with minimum 5x coverage and were removed from subsequent analyses. We also used FreeBayes to call haplotypes from raw bam

90

files and to convert them to vcf files, using the haploid calling function (- p 1) with minimum coverage set to 20x (-! 20) (Garrison and Marth 2012). Files were then converted to STRUCTURE format using PGDspider (Lischer and Excoffier 2012).

Analysis of mtDNA data:

We used structure_threader (Pina-Martins et al 2017) to iteratively run STRUCTURE version 2.34 (Pritchard et al 2000) for K = 1 - 11 with three repetitions each K, using the no admixture model and set to haploid mode with a 50,000 burn-in period followed by 50,000 cycles. We then used STRUCTURE HARVESTER to implement the Evanno method for testing the optimal K value from our outputs (Earl and vonHoldt 2012). Nexus files were used to visualize mtDNA sequences as a haplotype network using the program PopArt (Leigh and Bryant 2015), using the median joining network function.

We calculated the number of segregating sites (S), proportion of segregating sites (ps = S/n, n = total number of sites) nucleotide diversity () and Tajimas D sing

MEGA7 (Kumar et al 2016, Tajima 1989). We included all codon positions and all positions containing gaps or missing data were eliminated. We first analysed all 119 individuals that passed all coverage filters, with a total of 15,454 bp. We then separatel analsed the indiidals assigned to the northern mitochondrial lineage through our STRUCTURE analysis and haplotype network (96 individuals, 15,661 bp after filters), folloed b the sothern indiidals (23 indiidals, 15,609 bp). We then estimated the number of substitutions per site, averaging over all sequence pairs between populations using MEGA7 (Kumar et al 2016). Analyses were conducted

91

using the Maximum Composite Likelihood model (Tamura et al 2004). We included all codon positions and all positions containing gaps and missing data were eliminated.

In total, 15,454 positions were used in the pairwise base substitution analysis. Using the pairwise substitution matrix and the symmetrical pairwise geographic distance matrix, we tested for signatures of isolation by distance in our mitochondrial data between our populations, using a Mantel test in the R package ade4 by implementing the function mantel.Rtest with 9999 repetitions (Bougeard and Dray 2018). We tested for a significant relationship between geographic distance and genetic distance using a linear model in R (lm, R core team 2014).

Results

Sequence filtering, mapping, and SNP calling:

Each of the 124 individuals received on average ~5.6 million reads (median = 6.8 million, range 1.9 - 18.1 million) in our first lane of sequencing. After re-pooling to even out sequencing effort across individuals, each individual averaged ~ 6.4 million

(median = ~6.7 million, range ~3.4 - 8.5 million), ~6.6 million (median = ~6.7 million, range ~4.3 - 8.6 million), and ~6.6 million (median = ~6.7 million, range = ~4.3 - 8.6 million) reads in our three subsequent lanes of sequencing. After adaptor clipping, ~

94% of raw reads were retained across all lanes. Between ~72% - 78% on average of adaptor-clipped bases across lanes mapped uniquely to the MicOch1.0 reference genome, and ~61% of those mapped bases were retained after quality filtering. After merging bam files for individuals across lanes, deduplicating, and clipping overlaps, each individual averaged ~111.3 million bases (median = ~117.4 million, range =

92

~49.2 - 150.0 million) with an average fragment length of ~335.6 bp (median = ~346.6 bp, range = ~158.7 - 380.5 bp) across all individuals. After SNP calling and associated filtering steps, we obtained a dataset of 5,890,848 SNP genotype likelihoods across all individuals. This resulted in an average of ~0.77x coverage of the reference genome across all individuals.

Population genomics of NGS data:

Individual-level PCA analysis revealed geographic variation in the clustering of individuals (Figure 3.2). PC1 explained 10.39% of our SNP variation, and showed a clear separation of our southern and northern populations. PC2 explained 5.54% of the variation and highlighted our admixed populations in the middle of Britain from the

parental poplations in the soth and north. Or PCAngsd Admitre analsis for K

= 2 - 7 are summarized in Figure 3.3. When assigned to two clusters (K = 2), our analyses reveal distinct northern and southern populations with admixed populations and individuals in the middle of Britain. When assigned to other numbers of clusters

(K = 3 - 6), a more complex pattern of population structure and admixture emerged from these analyses. NGSadmix analysis for K = 2 show two distinct clusters corresponding to the northern and sothern parental poplations representing the two populations forming the Celtic Fringe patterns. Again, populations in the center of

Britain show varying levels of admixture, with a mosaic of admixed individuals and populations (Figure 3.4). Our selection scan using PCAngsd did not find any regions of the genomes that crossed our Bonferroni corrected significance threshold (Figure

3.5).

93

Figure 3.2: Principal components analysis summarizing differentiation based on ~5,890,000 SNPs across 124 individual field voles. Individuals are color coded based on populations, as in Figure 3.1. Graph shows principal component 1 plotted against principal component 2, with percentage variation represented by each component.

94

Figure 3.3: PCAngsd admixture analysis of genome-wide SNP data for all 124 field vole individuals, for K = 2 - 7. Individuals are arranged in populations corresponding to those on

the map in Figure 3.1.

0 0 . 1

5 7 . 0

e l b a i r a v

1 k 0 5 . 0

al ue

v

2 k

5 2 . 0

0 0 . 0

M 4 2 5 M 7 1 5 M 6 1 5 M 5 1 5 M 4 1 5 M 3 1 5 M 7 8 M 6 8 M 2 9 M 0 9 M 4 1 2 1 M 1 3 3 M 7 2 3 M 6 2 3 M 8 0 3 M 6 0 2 M 1 0 2 M 5 9 1 M 4 9 1 M 3 9 1 M 1 7 1 M 5 0 2 M 5 3 3 M 4 3 3 M 9 2 3 M 2 2 3 M 1 2 3 M 8 1 3 M 7 1 3 M 4 5 2 M 3 5 2 M 2 5 2 M 3 4 2 M 4 0 2 M 9 6 1 M 2 0 2 M 8 9 1 M 6 6 1 M 7 4 6 M 0 4 6 M 9 3 6 M 8 3 6 M 2 3 6 M 1 3 6 M 9 8 9 1 M 8 8 9 1 M 7 0 1 M 2 0 1 M 1 0 1 M 2 2 4 M 1 3 4 3 K 2 K 1 K M 1 9 4 M 0 9 4 M 9 8 4 M 8 8 4 M 2 9 6 M 1 9 6 M 0 9 6 M 9 8 6 M 8 8 6 M 7 5 6 M 6 5 6 M 5 5 6 M 8 6 0 1 M 7 6 0 1 M 6 6 0 1 M 5 6 0 1 M 4 6 0 1 M 3 6 0 1 M 2 6 0 1 M 1 6 0 1 M 0 6 0 1 M 9 5 0 1 M 8 5 0 1 M 7 5 0 1 M 5 5 0 1 M 4 5 0 1 M 3 5 0 1 M 2 5 0 1 M 1 5 0 1 M 0 5 0 1 M 9 4 0 1 M 8 4 0 1 M 7 4 0 1 M 6 4 0 1 M 5 4 0 1 M 1 4 0 1 M 0 4 0 1 M 9 3 0 1 M 8 3 0 1 M 7 3 0 1 M 6 3 0 1 M 5 3 0 1 M 4 3 0 1 M 8 5 9 M 8 8 7 M 2 2 6 M 7 9 5 M 6 9 5 M 5 9 5 M 4 9 5 M 3 0 3 1 M 6 6 2 1 M 5 6 2 1 M 2 4 1 1 M 1 4 1 1 M 2 1 1 1 M 1 1 1 1 M 1 2 0 1 M 0 2 0 1 M 9 1 0 1 M 8 1 0 1 M 7 1 0 1 M 6 1 0 1 M 2 7 4 M 1 7 4 M 0 7 4 M 9 6 4 M 8 6 4 M 7 6 4 M 6 6 4

e l p m a

One Two Three Four Five Six s Seven Eight Nine Ten Eleven

Figure 3.4: K = 2 output from NGSadmix analysis of genome-wide SNP data for all 124 field vole individuals. Individuals are arranged by latitude and by population, with population names labeled underneath. The extent to which the genomes of individuals are attributed to the two clusters is indicated.

Figure 3.5: PCAngsd selection scan using 5,890,000 SNPs across the genome for scaffolds >

95

10Mb for all 124 field vole individuals. Y axis is -log(p) significance for each site, and X-axis is position across the genome. Dashed line is Bonferroni-corrected significance threshold for p<0.01.

Pairwise FST estimates ranged from 0.026 to 0.208 between our population pairs, and are summarized in Table 3.1. Our highest measured pairwise FST (0.208) was between population Two and population Eleven, representing the extremes in geographic distance and perhaps representing different colonizing populations. The distribution of pairwise FST across the genome seems to be relatively uniform, with no obvious peaks in any individual region of the genome (Figure 3.6). The Mantel test showed a significant signature of isolation by distance and relationship between pairwise geographic distance and FST (p = 0.0082, Figure 3.7). The PCoA analysis showed differentiation between populations, with PCo1 explaining 76.65% of variation and roughly separating northern and southern populations. PCo2 explained

13.75% of the variation, but there was no obvious geographic pattern on this axis

(Figure 3.8). Our EEMS analysis showed significant restricted effective migration

(P{log(m) < 0}, posterior probabilities > 0.9 or 0.95) in three distinct geographic regions: one between Population Two and all other populations, one between

Populations Six and Seven, and one surrounding Populations 10 (Figure 3.9).

96

Table 3.1: Pairwise FST values between all population pairs in the study using allele frequency data from ~5,890,000 SNPs.

One Two Three Four Five Six Seven Eight Nine Ten Eleven

One 0.000 0.092 0.066 0.083 0.067 0.095 0.058 0.074 0.079 0.141 0.139

Two 0.092 0.000 0.108 0.132 0.135 0.197 0.037 0.142 0.129 0.208 0.208

Three 0.066 0.108 0.000 0.032 0.057 0.109 0.026 0.076 0.052 0.117 0.124

Four 0.083 0.132 0.032 0.000 0.062 0.096 0.037 0.068 0.059 0.115 0.113

Five 0.067 0.135 0.057 0.062 0.000 0.066 0.071 0.046 0.048 0.100 0.090

Six 0.095 0.197 0.109 0.096 0.066 0.000 0.122 0.042 0.060 0.097 0.078

Seven 0.058 0.037 0.026 0.037 0.071 0.122 0.000 0.082 0.068 0.134 0.135

Eight 0.074 0.142 0.076 0.068 0.046 0.042 0.082 0.000 0.033 0.073 0.056

Nine 0.079 0.129 0.052 0.059 0.048 0.060 0.068 0.033 0.000 0.074 0.059

Ten 0.141 0.208 0.117 0.115 0.100 0.097 0.134 0.073 0.074 0.000 0.092

Eleven 0.139 0.208 0.124 0.113 0.090 0.078 0.135 0.056 0.059 0.092 0.000

Figure 3.6: Pairwise FST between our highest pairwise FST populations Two and Eleven for 5,890,000 SNPs across the genome for scaffolds > 10Mb. Y axis is FST value, and X-axis is position across the genome. Blue line is genome-wide average Fst (0.208).

97

0.20 p=0.0082 0.15 t s F 0.10

0.05

250000 500000 750000 Geographic distance (m)

Figure 3.7: Results from a Mantel test for isolation by distance. Pairwise geographic distance is plotted against pairwise average FST for all 11 populations of field voles, with a line of best fit in blue.

Figure 3.8: Principal coordinates analysis summarizing allele frequency variation for ~5,890,000 SNPs among the 11 studied populations of field vole.

98

Posterior probabilities P{log(m) <> 0 | diffs}P{log(m) > 0}

● 0.95 0.9

● ●

● ●

● 0.9 ● 0.95

Figure 3.9: EEMS plot of significant deviation of effective migration rates (posterior probabilitiesP{log(m) < 0} P{log(m) < or > 0). Plotted are posterior probabilities for m < 0 (indicating restricted effective migration). Light red = 0.9-0.95, dark red > 0.95.

Reconstructing the mitochondrial genome:

After mapping to the M. agrestis mitochondrial genome and sequence quality filtering, each of the 124 individuals received on average ~438 thousand bases (median = ~503

99

thousand, range 16 thousand 1.5 million) in our first lane of sequencing. After re- pooling to even out sequencing effort across individuals, each individual averaged ~

507 thousand (median= ~531 thousand, range ~31 thousand - 1.2 million), ~509 thousand (median = ~549 thousand, range ~34 thousand 1.2 million), and ~514 thousand (median = ~543 thousand, range = ~34 thousand - 1.3 million) bases in our three subsequent lanes of sequencing. This resulted in an average of ~70x coverage across the ~16,000 bp of the mitochondrial genome.

Analysis of mtDNA data:

Our STRUCTURE analyses resulted in an optimal K-value (highest K) of K = 2 using the Evanno method. Plotting K = 2 shows that the two clusters are distributed geographically, one that primarily includes individuals from the southern populations, and the other with primarily northern individuals (Figure 3.10). The southern genetic cluster included individuals from Populations Eleven, Ten, Nine and Eight, and one individual from Population Six. The haplotype network confirmed this geographic pattern of variation, with a cluster of haplotypes that primarily included individuals from the southern populations (Eleven, Ten, Nine, Eight), and another larger cluster that included individuals from the more northern populations (Figure 3.11).

100

0 0 . 1

5 7 . 0

e l b a i r a v

1 k 0 5 . 0

al ue

v

2 k

5 2 . 0

0 0 . 0

M 4 2 5 M 7 1 5 M 6 1 5 M 5 1 5 M 4 1 5 M 3 1 5 M 7 8 M 6 8 M 2 9 M 0 9 M 4 1 2 1 M 1 3 3 M 7 2 3 M 6 2 3 M 8 0 3 M 6 0 2 M 1 0 2 M 5 9 1 M 4 9 1 M 3 9 1 M 1 7 1 M 5 0 2 M 5 3 3 M 4 3 3 M 9 2 3 M 2 2 3 M 1 2 3 M 8 1 3 M 7 1 3 M 4 5 2 M 3 5 2 M 2 5 2 M 3 4 2 M 4 0 2 M 9 6 1 M 2 0 2 M 8 9 1 M 6 6 1 M 7 4 6 M 0 4 6 M 9 3 6 M 8 3 6 M 2 3 6 M 1 3 6 M 9 8 9 1 M 8 8 9 1 M 2 0 1 M 1 0 1 M 7 0 1 M 2 2 4 M 1 3 4 3 K 2 K 1 K M 9 8 4 M 8 8 4 M 1 9 4 M 0 9 4 M 2 9 6 M 1 9 6 M 8 8 6 M 0 9 6 M 9 8 6 M 6 5 6 M 5 5 6 M 7 5 6 M 8 6 0 1 M 7 6 0 1 M 6 6 0 1 M 5 6 0 1 M 4 6 0 1 M 3 6 0 1 M 2 6 0 1 M 1 6 0 1 M 0 6 0 1 M 8 5 0 1 M 7 5 0 1 M 9 5 0 1 M 5 5 0 1 M 4 5 0 1 M 3 5 0 1 M 2 5 0 1 M 9 4 0 1 M 5 4 0 1 M 1 5 0 1 M 7 4 0 1 M 6 4 0 1 M 0 5 0 1 M 8 4 0 1 M 1 4 0 1 M 9 3 0 1 M 5 3 0 1 M 4 3 0 1 M 0 4 0 1 M 7 3 0 1 M 6 3 0 1 M 8 3 0 1 M 3 0 3 1 M 6 6 2 1 M 5 6 2 1 M 2 4 1 1 M 1 4 1 1 M 8 5 9 M 8 8 7 M 2 2 6 M 7 9 5 M 6 9 5 M 5 9 5 M 4 9 5 M 2 1 1 1 M 1 1 1 1 M 9 1 0 1 M 6 1 0 1 M 1 2 0 1 M 0 2 0 1 M 8 1 0 1 M 7 1 0 1 M 7 6 4 M 2 7 4 M 1 7 4 M 0 7 4 M 9 6 4 M 8 6 4 M 6 6 4

e l p m a

One Two Three Four Five Six s Seven Eight Nine Ten Eleven

Figure 3.10: K = 2 output from STRUCTURE analysis of mtDNA data for all 124 field vole individuals. Individuals are arranged by latitude and by population, with population names labeled underneath. The extent to which the genomes of individuals are attributed to the two clusters is indicated.

Figure 3.11: Median joining haplotype network for mtDNA data for 119 individual field voles. Branch lengths are roughly scaled by number of mutational differences. Mutations are noted by hash marks on branches. Populations and colors match those from map in Figure 3.1.

101

Our analysis of the mitochondrial haplotypes of all 119 individuals passing filtering, resulted in 764 segregating sites (S), the proportion of segregating sites (ps) =

0.0494, nucleotide diversity () = 0.00398, and Tajimas D = -1.906. The results for or 96 northern indiidals ere: S = 603, ps = 0.0385, = 0.00369 and Tajimas D

= -1.731; and or 23 sothern indiidals: S = 203, ps = 0.0130, = 0.00167, and

Tajimas D = -2.134. The pairwise average of per base substitutions between populations are summarized in Table 3.2, and ranged from 0.00119 to 0.00629 between populations. A Mantel test showed a significant signature of isolation by distance with a relationship between pairwise geographic distance and pairwise substitutions in our mitochondrial dataset (p = 0.0067, Figure 3.12).

Table 3.2: The number of base substitutions per site across the mitogenome, averaging over all sequence pairs between field vole populations. Standard error estimates are shown above the diagonal in grey italicized text. Analyses were conducted using the Maximum Composite Likelihood model (Tamura et al 2014), with 119 individuals that passed the quality filters. All codon positions were included and all positions containing gaps and missing data were eliminated. There were a total of 15,454 positions in the final dataset. Analyses were conducted in MEGA7 (Kumar et al 2016).

One Two Three Four Five Six Seven Eight Nine Ten Eleven

One 0.00000 0.00032 0.00024 0.00026 0.00020 0.00026 0.00027 0.00027 0.00036 0.00036 0.00052

Two 0.00306 0.00000 0.00024 0.00029 0.00026 0.00035 0.00015 0.00034 0.00032 0.00033 0.00051

Three 0.00368 0.00218 0.00000 0.00020 0.00019 0.00023 0.00017 0.00025 0.00032 0.00029 0.00050

Four 0.00384 0.00214 0.00300 0.00000 0.00024 0.00028 0.00022 0.00029 0.00034 0.00031 0.00052

Five 0.00382 0.00267 0.00360 0.00344 0.00000 0.00021 0.00022 0.00025 0.00032 0.00031 0.00048

Six 0.00439 0.00358 0.00429 0.00420 0.00397 0.00000 0.00029 0.00024 0.00029 0.00029 0.00041

Seven 0.00354 0.00119 0.00262 0.00249 0.00320 0.00396 0.00000 0.00028 0.00031 0.00029 0.00050

Eight 0.00441 0.00373 0.00447 0.00443 0.00437 0.00422 0.00417 0.00000 0.00030 0.00032 0.00039

Nine 0.00513 0.00395 0.00485 0.00494 0.00504 0.00505 0.00452 0.00463 0.00000 0.00029 0.00022

Ten 0.00483 0.00338 0.00422 0.00432 0.00465 0.00484 0.00389 0.00458 0.00386 0.00000 0.00033

Eleven 0.00629 0.00543 0.00628 0.00629 0.00615 0.00576 0.00593 0.00507 0.00324 0.00397 0.00000

102

s 0.008 p=0.0067 e c n e

r 0.006 e f f i d 0.004 e s i w r i 0.002 a P 250000 500000 750000 Geographic distance (m)

Figure 3.12: Results from a Mantel test for isolation by distance for mitogenome data of field voles. Pairwise geographic distance is plotted against average pairwise substitutions for all 11 populations, with a line of best fit in blue.

Discussion

Both mitochondrial and genomic data show a strong Celtic Fringe pattern of variation in field voles of Britain, although there was discordance in the geographic distribution of variation between nuclear and mtDNA, with the transition further north for the nuclear DNA. The two genetically distinct populations, hereafter referred to as our northern and southern genomic groups, represent the original colonizing population after the LGM and second colonizing population after the Younger Dryas, respectively. Close to 10,000 years of isolation between colonizing lineages has resulted in high levels of divergence, evenly distributed across the entire genome, with average pairwise FST values up to 0.208 between populations (Table 3.1, Figure 3.6).

We did not find any evidence for peaks of divergence or selection at specific sites in the genome (Figures 3.5, 3.6), pointing to genetic drift as a major mechanism driving

103

differentiation in these populations. In addition, we found strong signatures of isolation by distance in our nuclear data, but with a few geographic areas of restricted effective migration between populations (Figures 3.7, 3.9). In general, there was a pattern of genome-wide introgression distributed south to north between populations, although the pattern of transition between the northern and southern genomic group is

patch (Figre 3.3, 3.4). There were two highly distinct mitochondrial groups in

Britain, with a contact zone between populations Eight and Nine near Derbyshire

(Figures 3.1, 3.10, 3.11). The northern mitochondrial group was more diverse, with higher proportion of both mitochondrial segregating sites (ps = 0.0385 vs 0.0130) and mitochondrial nucleotide diversity ( = 0.00369 vs 0.00167).

These data are consistent with the dual colonization model of the formation of the Celtic Fringe in Britain (Searle et al 2009). The northern lineage most likely represents the original colonizing lineage post Last Glacial Maximum, it has higher mitochondrial haplotype diversity, and the pattern of introgression in our genome- wide SNP data fit with partial replacement by the subsequent colonizing lineage in the south of Britain (Figure 3.4). We found high levels of pairwise FST between the northern and southern populations, distributed relatively evenly across the entire genome (Table 3.1, Figure 3.6). This pattern of high genome-wide pairwise FST is similar to what was found in the bank vole (Clethrionomys glareolus), another small mammal with a Celtic Fringe pattern of genetic variation (Kotlík et al 2014). This pattern is also similar to what we find between cryptic species of the field vole in

Europe, although those cryptic species have been isolated ~30 - 70,000 years, and

104

therefore these two populations in Britain (isolated during the Younger Dryas) may reflect very early stages of divergence between species (Fletcher et al 2019). The fact that the high FST values are distributed genome-wide point to genetic drift as the major mechanism driving divergence between populations.

Both field voles and bank voles are known to have multi-year population cycles, characterized by fluctuations in population size that can vary by orders of magnitude in the local density of voles between population peaks and troughs (Krebs and Myers 1974, Wiger 1979, Bierman et al 2006). Population fluctuations are known to increase FST between populations, especially if there is spatial variation in demographic properties (Whitlock 1992). Field vole population densities in the

Kielder Forest in Britain have been monitored since the 1980s, and they show 3 - 4 year population cycling with up to four times higher population densities during peaks versus troughs (Bierman et al 2006). These types of demographic dynamics will lead to repeated alternative fixation of alleles between independently cycling populations and can explain the general patterns of high, genome-wide FST we see between our populations (Motro and Thomson 1982, Ehrich et al 2009). This population cycling presumably also occurred while our populations were isolated during the Younger

Dryas, and after colonization they came into secondary contact with extensive admixture. There is a significant signature of isolation by distance between our populations (Figures 3.7, 3.12), and there may be insufficient migration to counteract the signatures of drift between our populations (Harrison and Hastings 1996, Norén and Angerbjörn 2014, García-Navas et al 2015, Row et al 2016). These dynamics in

105

concert drove and will maintain high genome-wide differentiation between isolated populations.

Intriguingly, the geographic area of contact between mitochondrial groups in our study roughly coincides with the mitochondrial contact zone found in the bank vole, and in both systems introgression of nuclear loci occurs much farther north than the mitochondrial contact zone (Kotlík et al 2014). This pattern of geographic discordance between mitochondrial and nuclear markers, hereafter mito-nuclear discordance, can be driven by a variety of demographic and evolutionary factors

(Toews and Brelsford 2012). In general, the pattern in British field voles fits one of secondary contact and not incomplete lineage sorting; there is a limited geographic distribution of the southern mitochondrial haplotype throughout Britain, with a narrow contact zone (Figures 3.10, 3.11, Toews and Brelsford 2012). Most studies of mito- nuclear discordance following secondary contact find that clines in mitochondrial

DNA are wider than those in nuclear DNA (Toews and Brelsford 2012), including in recent studies of voles in the genus Microtus. A contact zone between lineages of the field vole in the Swiss Jura mountains found a higher extent of introgression in mtDNA compared to nuclear markers (Beysard et al 2012). In addition, Lin et al

(2018) found nearly no nuclear introgression in a contact zone between highly divergent lineages of the California vole (M. californicus), with modest, but increased mtDNA introgression. In both of these cases, the lineages diverged ~ 30 - 50,000 years BP, there is very little gene flow of nuclear DNA and the lineages are proposed to represent separate, cryptic species (Lin et al 2018, Fletcher et al 2019). Our two colonizing populations are more recently diverged (< ~10,000 years), with little

106

evidence for restriction of gene flow between populations (Figure 3.9).

The patterns found in our study are more similar to those found in the contact zone between lineages of the common vole (M. arvalis) in the Alps, where nuclear introgression was more extensive than mtDNA (Braaker and Heckel 2009). In both systems, the patterns can be explained by sex-biased dispersal; female Microtus have very limited dispersal, less than 100 meters for female M. arvalis (Boyce and Boyce

1988), and perhaps even less for female M. agrestis (Sandell et al 1990). Male-biased dispersal can generate more structuring in mtDNA and is a likely mechanism driving asymmetries in introgression between mitochondrial and nuclear DNA in our system

(Toews and Brelsford 2012). Female field voles are philopatric and territorial, and may be resistant to colonization by other female voles (Pasenhu et al 1998). In systems where females are philopatric, after colonization by a novel lineage, there is predicted to be higher penetrance of colonizer nuclear and Y-linked genes than mtDNA (Jones et al 1995, Herman et al 2019). Thus, the mito-nuclear discordance in our system can be explained by increased introgression of the southern nuclear genes after the second wave of colonization post Younger Dryas and secondary contact with the original northern population already on Britain.

We did not find any peaks of high FST across the genome in our pairwise analyses, as would be expected if there were genomic regions with restricted introgression, genomic regions of low recombination, or directional adaptive selection

(Figure 3.6, Clucas et al 2019, Wolf and Ellegren 2017). In addition, our genome-wide selection scan also did not reveal any loci that crossed the significance threshold

107

(Figure 3.5). Any signatures of selection may have been masked by drift caused by cyclical bottlenecks, driving genome-wide divergence (Jiménez-Mena et al 2016).

Analsis of the mitochondrial data reslted in negatie Tajimas D ales for all analses: the hole dataset, the northern haplotpes, and the sothern (Tajimas D = -

1.906, -1.731, and -2.134). Althogh a negatie Tajimas D can indicate that sequences have undergone purifying selection, such a result can also suggest a recent population bottleneck with subsequent expansion (Tajima 1989), but this was not detected in analyses of microsatellite data (Herman et al 2019). It is likely that our results with field voles were generated by population bottlenecks, consistent with a strong role of genetic drift in generating the genetic pattern that we found in field voles in Britain.

In conclusion, therefore, we found that field voles in Britain have a Celtic

Fringe pattern of nuclear and mitochondrial genomic data consistent with a history of dual colonization from Europe post-Pleistocene. Such a dual colonization scenario has been suggested for multiple small mammals entering Britain at the end of the last glaciation (Searle et al 2009). There are distinct northern and southern populations, with high genome-wide divergence and geographic structure with a strong signature of isolation by distance. Field voles undergo multi-year population cycling and we hypothesize that these demographic dynamics while our populations were isolated, lead to strong genetic drift and alternative fixation of alleles across the genome that we see in our data (Norén and Angerbjörn 2014). After secondary contact and subsequent admixture for thousands of years, there is still a strong signal of isolation by distance between the populations. There is also a signature of mito-nuclear discordance in our

108

data, with mitochondrial DNA showing a narrower contact zone, farther south than the genome-wide SNP data. This discordance can be explained by male-biased dispersal after the second population colonized Britain after the Younger Dryas. This study has added new insights into the role of post-Pleistocene climatic oscillations and colonization in structuring contemporary patterns of genomic diversity.

109 REFERENCES

Arthur, W. & Kettle, C. (2001). Geographic patterning of variation in segment number in geophilomorph centipedes: clines and speciation. Evolution and Development, 3,

3440. https://doi.org/10.1046/j.1525-142x.2001.00083.x

Barnes, I., Matheus, P., Shapiro, B., Jensen, D., & Cooper, A. (2002). Dynamics of

Pleistocene population extinctions in Beringian brown bears. Science, 295, 2267

2270. https://doi.org/10.1126/science.1067814

Beysard, M., Perrin, N., Jaarola, M., Heckel, G., & Vogel, P. (2011). Asymmetric and differential gene introgression at a contact zone between two highly divergent lineages of field voles (Microtus agrestis). Journal of Evolutionary Biology, 25, 400408. https://doi.org/10.1111/j.1420-9101.2011.02432.x

Bierman, S. M., Fairbairn, J. P., Petty, S. J., Elston, D. A., Tidhar, D., & Lambin, X.

(2006). Changes over time in the spatiotemporal dynamics of cyclic populations of field voles (Microtus agrestis L.). The American Naturalist, 167, 583590. https://doi.org/10.1086/501076

Bougeard, S. & Dray, S. (2018). Supervised multiblock analysis in R with the ade4

Package. Journal of Statistical Software, 86. https://doi.org/10.18637/jss.v086.i01

110

Boyce, C. C. K. & Boyce, J. L. (1988). Population biology of Microtus arvalis. II.

Natal and breeding dispersal of females. Journal of Animal Ecology, 57, 723-736.

Braaker, S. & Heckel, G. (2009). Transalpine colonisation and partial phylogeographic erosion by dispersal in the common vole (Microtus arvalis). Molecular Ecology, 18,

25182531. https://doi.org/10.1111/j.1365-294x.2009.04189.x

Brace, S., Ruddy, M., Miller, R., Schreve, D. C., Stewart, J. R., & Barnes, I. (2016).

The colonization history of British water vole ( amphibius (Linnaeus, 1758)): origins and development of the Celtic fringe. Proceedings of the Royal Society B:

Biological Sciences, 283, 20160130. https://doi.org/10.1098/rspb.2016.0130

Buggs, R. J. A. (2007). Empirical study of hybrid zone movement. Heredity, 99, 301

312. https://doi.org/10.1038/sj.hdy.6800997

Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34, i884i890. https://doi.org/10.1093/bioinformatics/bty560

Clucas, G. V., Lou, R. N., Therkildsen, N. O., & Kovach, A. I. (2019). Novel signals of adaptive genetic variation in northwestern Atlantic cod revealed by whole‐genome sequencing. Evolutionary Applications, 12, 19711987. https://doi.org/10.1111/eva.12861

111

Combe, F. J., Ellis, J. S., Lloyd, K. L., Cain, B., Wheater, C. P., & Harris, W. E.

(2016). After the Ice Age: the impact of post-glacial dispersal on the phylogeography of a small mammal, Muscardinus avellanarius. Frontiers in Ecology and Evolution, 4,

72. https://doi.org/10.3389/fevo.2016.00072

Conneller, C. (2007). Inhabiting new landscapes: settlement and mobility in Britain after the Last Glacial Maximum. Oxford Journal of Archaeology, 26, 215237. https://doi.org/10.1111/j.1468-0092.2007.00282.x

Davison, A. (2000). An east-west distribution of divergent mitochondrial haplotypes in British populations of the land snail, Cepaea nemoralis (Pulmonata). Biological

Journal of the Linnean Society, 70, 697706. https://doi.org/10.1111/j.1095-

8312.2000.tb00224.x

de Bruyn, M., Hoelzel, A. R., Carvalho, G. R., & Hofreiter, M. (2011). Faunal histories from Holocene ancient DNA. Trends in Ecology & Evolution, 26, 405413. https://doi.org/10.1016/j.tree.2011.03.021

Earl, D. A. & vonHoldt, B. M. (2011). STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method.

Conservation Genetics Resources, 4, 359361. https://doi.org/10.1007/s12686-011-

9548-7

112

Ehrich, D., Yoccoz, N. G., & Ims, R. A. (2009). Multi-annual density fluctuations and habitat size enhance genetic variability in two northern voles. Oikos, 118, 14411452. https://doi.org/10.1111/j.1600-0706.2009.17532.x

Fairweather, R., Bradbury, I. R., Helyar, S. J., et al (2018). Range-wide genomic data synthesis reveals transatlantic vicariance and secondary contact in Atlantic cod.

Ecology and Evolution, 8, 1214012152. https://doi.org/10.1002/ece3.4672

Filipi, K., Marková, S., Searle, J. B., & Kotlík, P. (2015). Mitogenomic phylogenetics of the bank vole Clethrionomys glareolus, a model system for studying end-glacial colonization of Europe. Molecular Phylogenetics and Evolution, 82, 245257. https://doi.org/10.1016/j.ympev.2014.10.016

Fletcher, N. K., Acevedo, P., Herman, J. S., Paupério, J., Alves, P. C., & Searle, J. B.

(2019). Glacial cycles drive rapid divergence of cryptic field vole species. Ecology and Evolution, 9, 14101-14113. https://doi.org/10.1002/ece3.5846

Galinsky, K. J., Bhatia, G., Loh, P.-R., Georgiev, S., Mukherjee, S., Patterson, N. J., &

Price, A. L. (2016). Fast principal-component analysis reveals convergent evolution of

ADH1B in Europe and East Asia. American Journal of Human Genetics, 98, 456472. https://doi.org/10.1016/j.ajhg.2015.12.022

113

García-Navas, V., Bonnet, T., Waldvogel, D., Wandeler, P., Camenisch, G., &

Postma, E. (2015). Gene flow counteracts the effect of drift in a Swiss population of snow voles fluctuating in size. Biological Conservation, 191, 168177. https://doi.org/10.1016/j.biocon.2015.06.021

Garrison, E. & Marth, G. (2012) Haplotype-based variant detection from short-read sequencing. arXiv preprint arXiv:1207.3907 [q-bio.GN]

Harrison, S. & Hastings, A. (1996). Genetic and evolutionary consequences of metapopulation structure. Trends in Ecology & Evolution, 11, 180183. https://doi.org/10.1016/0169-5347(96)20008-4

Harrison, R. G. & Larson, E. L. (2016). Heterogeneous genome divergence, differential introgression, and the origin and structure of hybrid zones. Molecular

Ecology, 25, 24542466. https://doi.org/10.1111/mec.13582

Herman, J. S. & Searle, J. B. (2011). Post-glacial partitioning of mitochondrial genetic variation in the field vole. Proceedings of the Royal Society B: Biological Sciences,

278, 36013607. https://doi.org/10.1098/rspb.2011.0321

Herman, J. S., Stojak, J., Paupério, J., Jaarola, M., Wójcik, J. M., & Searle, J. B.

(2019). Genetic variation in field voles (Microtus agrestis) from the British Isles:

114

selective sweeps or population bottlenecks? Biological Journal of the Linnean Society,

126, 852865. https://doi.org/10.1093/biolinnean/bly213

Hewitt, G. M. (2004). Genetic consequences of climatic oscillations in the Quaternary.

Philosophical Transactions of the Royal Society of London. Series B: Biological

Sciences, 359, 183195. https://doi.org/10.1098/rstb.2003.1388

Jacobi, R. M. & Higham, T. F. G. (2009). The early Lateglacial re-colonization of

Britain: ne radiocarbon eidence from Goghs Cae, sothest England.

Quaternary Science Reviews, 28, 18951913. https://doi.org/10.1016/j.quascirev.2009.03.006

Jiang, J.-Q., Wu, S.-X., Chen, J.-J., & Liu, C.-Z. (2018). Characterization of the complete mitochondrial genome of short-tailed field vole, Microtus agrestis.

Mitochondrial DNA Part B, 3, 845846. https://doi.org/10.1080/23802359.2018.1467240

Jiménez-Mena, B., Tataru, P., Brøndum, R. F., Sahana, G., Guldbrandtsen, B., &

Bataillon, T. (2016). One size fits all? Direct evidence for the heterogeneity of genetic drift throughout the genome. Biology Letters, 12, 20160426. https://doi.org/10.1098/rsbl.2016.0426

115

Jones, C. S., Noble, L. R., Jones, J. S., Tegelström, H., Triggs, G. S., & Berry, R. J.

(1995). Differential male genetic success determines gene flow in an experimentally manipulated mouse population. Proceedings of the Royal Society of London. Series B:

Biological Sciences, 260, 251-256.

Kotlík, P., Marko, S., Vojtek, L., Stratil, A., lechta, V., Hrl, P., & Searle, J. B.

(2014). Adaptive phylogeography: functional divergence between haemoglobins derived from different glacial refugia in the bank vole. Proceedings of the Royal

Society B: Biological Sciences, 281, 20140021. https://doi.org/10.1098/rspb.2014.0021

Krebs, C. J. & Myers, J. H. (1974). Population cycles in small mammals. Advances in

Ecological Research, 8, 267399. https://doi.org/10.1016/s0065-2504(08)60280-9

Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular Evolutionary

Genetics Analysis Version 7.0 for bigger datasets. Molecular Biology and Evolution,

33, 18701874. https://doi.org/10.1093/molbev/msw054

Leigh, J. W. & Bryant, D. (2015). popart: full-feature software for haplotype network construction. Methods in Ecology and Evolution, 6, 11101116. https://doi.org/10.1111/2041-210x.12410

116

Lin, D., Bi, K., Conroy, C. J., Lacey, E. A., Schraiber, J. G., & Bowie, R. C. K.

(2018). Mito-nuclear discordance across a recent contact zone for California voles.

Ecology and Evolution, 8, 62266241. https://doi.org/10.1002/ece3.4129

Lischer, H. E. L. & Excoffier, L. (2011). PGDSpider: an automated data conversion tool for connecting population genetics and genomics programs. Bioinformatics, 28,

298299. https://doi.org/10.1093/bioinformatics/btr642

Lou, R. N., Fletcher, N. K., Wilder, A. P., Conover, D. O., Therkildsen, N. O., &

Searle, J. B. (2018). Full mitochondrial genome sequences reveal new insights about post-glacial expansion and regional phylogeographic structure in the Atlantic silverside (Menidia menidia). Marine Biology, 165, 124. https://doi.org/10.1007/s00227-018-3380-5

Lovette, I. (2005). Glacial cycles and the tempo of avian speciation. Trends in Ecology

& Evolution, 20, 5759. https://doi.org/10.1016/j.tree.2004.11.011

Meisner, J. & Albrechtsen, A. (2018). Inferring population structure and admixture proportions in low-depth NGS data. Genetics, 210, 719731. https://doi.org/10.1534/genetics.118.301336

Montgomery, W. I., Provan, J., McCabe, A. M., & Yalden, D. W. (2014). Origin of

British and Irish mammals: disparate post-glacial colonisation and species

117

introductions. Quaternary Science Reviews, 98, 144165. https://doi.org/10.1016/j.quascirev.2014.05.026

Motro, U. & Thomson, G. (1982). On heterozygosity and the effective size of populations subject to size changes. Evolution, 36, 10591066. https://doi.org/10.1111/j.1558-5646.1982.tb05474.x

Nielsen, R., Korneliussen, T., Albrechtsen, A., Li, Y., & Wang, J. (2012). SNP calling, genotype calling, and sample allele frequency estimation from New-

Generation Sequencing data. PLoS ONE, 7, e37558. https://doi.org/10.1371/journal.pone.0037558

Norén, K. & Angerbjörn, A. (2013). Genetic perspectives on northern population cycles: bridging the gap between theory and empirical studies. Biological Reviews, 89,

493510. https://doi.org/10.1111/brv.12070

Pasenhu, J., Viitala, J., Marienberg, T., & Ritvanen, S. (1998). Matrilineal kin clusters and their effect on reproductive success in the field vole Microtus agrestis. Behavioral

Ecology, 9, 8592. https://doi.org/10.1093/beheco/9.1.85

Paupério, J., Herman, J. S., Melo-Ferreira, J., Jaarola, M., Alves, P. C., & Searle, J. B.

(2012). Cryptic speciation in the field vole: a multilocus approach confirms three

118

highly divergent lineages in Eurasia. Molecular Ecology, 21, 60156032. https://doi.org/10.1111/mec.12024

Petkova, D., Novembre, J., & Stephens, M. (2015). Visualizing spatial population structure with estimated effective migration surfaces. Nature Genetics, 48, 94100. https://doi.org/10.1038/ng.3464

Piertney, S. B., Stewart, W. A., Lambin, X., Telfer, S., Aars, J., & Dallas, J. F. (2005).

Phylogeographic structure and postglacial evolutionary history of water voles

(Arvicola terrestris) in the United Kingdom. Molecular Ecology, 14, 14351444. https://doi.org/10.1111/j.1365-294x.2005.02496.x

Pina-Martins, F., Silva, D. N., Fino, J., & Paulo, O. S. (2017). Structure_threader : An improved method for automation and parallelization of programs structure, fastStructure and MavericK on multicore CPU systems. Molecular Ecology

Resources, 17, e268e274. https://doi.org/10.1111/1755-0998.12702

Pritchard, J.K. & Donnelly, S.M. (2000) Inference of population structure using multilocus genotype data. Genetics, 155, 945959.

R Core Team. (2014). R: A language and environment for statistical computing. R

Foundation for Statistical Computing, Vienna, Austria. 2013. ISBN 3-900051-07-0. http://www.R-project.org.

119

Row, J. R., Wilson, P. J., & Murray, D. L. (2016). The genetic underpinnings of population cyclicity: establishing expectations for the genetic anatomy of cycling populations. Oikos, 125, 16171626. https://doi.org/10.1111/oik.02736

Sandell, M., Agrell, J., Erlinge, S., & Nelson, J. (1990). Natal dispersal in relation to population density and sex ratio in the field vole, Microtus agrestis. Oecologia, 83,

145149. https://doi.org/10.1007/bf00317745

Searle, J. B., Kotlík, P., Rambau, R. V., Marková, S., Herman, J. S., & McDevitt, A.

D. (2009). The Celtic fringe of Britain: insights from small mammal phylogeography.

Proceedings of the Royal Society B: Biological Sciences, 276, 42874294. https://doi.org/10.1098/rspb.2009.1422

Searle, J. B. & Wilkinson, P. J. (1987). Karyotypic variation in the common shrew

(Sorex araneus) in Britain a Celtic fringe. Heredity, 59, 345351. https://doi.org/10.1038/hdy.1987.141

Skotte, L., Korneliussen, T. S., & Albrechtsen, A. (2013). Estimating individual admixture proportions from next generation sequencing data. Genetics, 195, 693702. https://doi.org/10.1534/genetics.113.154138

Steffensen, J.P., Andersen, K.K., Bigler, M., Clausen, H.B., Dahl-Jensen, D., Fischer,

H., Goto-Azuma, K., Hansson, M., Johnsen, S.J., Jouzel, J. & Masson-Delmotte, V.

120

(2008). High-resolution Greenland ice core data show abrupt climate change happens in few years. Science, 321, 680684. https://doi.org/10.1126/science.1157707

Taberlet, P., Fumagalli, L., Wust‐Saucy, A., & Cosson, J. (1998). Comparative phylogeography and postglacial colonization routes in Europe. Molecular Ecology, 7,

453464. https://doi.org/10.1046/j.1365-294x.1998.00289.x

Tajima, F. (1989). Statistical method for testing the neutral mutation hypothesis by

DNA polymorphism. Genetics, 123, 585-595.

Tamura, K., Nei, M., & Kumar, S. (2004). Prospects for inferring very large phylogenies by using the neighbor-joining method. Proceedings of the National

Academy of Sciences USA, 101, 1103011035. https://doi.org/10.1073/pnas.0404206101

Therkildsen, N. O. & Palumbi, S. R. (2016). Practical low-coverage genomewide sequencing of hundreds of individually barcoded samples for population and evolutionary genomics in nonmodel species. Molecular Ecology Resources, 17, 194

208. https://doi.org/10.1111/1755-0998.12593

Toews, D. P. L. & Brelsford, A. (2012). The biogeography of mitochondrial and nuclear discordance in animals. Molecular Ecology, 21, 39073930. https://doi.org/10.1111/j.1365-294x.2012.05664.x

121

Yalden, D. W. (1982). When did the mammal fauna of the British Isles arrive?

Mammal Review, 12, 157. https://doi.org/10.1111/j.1365-2907.1982.tb00007.x

Walker, M.J., Berkelhammer, M., Björck, S., Cwynar, L.C., Fisher, D.A., Long, A.J.,

Lowe, J.J., Newnham, R.M., Rasmussen, S.O. & Weiss, H. (2012). Formal subdivision of the Holocene Series/Epoch: a Discussion Paper by a Working Group of

INTIMATE (Integration of Ice-core, Marine and Terrestrial Records) and the

Subcommission on Quaternary Stratigraphy (International Commission on

Stratigraphy). Journal of Quaternary Science, 27, 649659. https://doi.org/10.1002/jqs.2565

Whitlock, M. C. (1992). Temporal fluctuations in demographic parameters and the genetic variance among populations. Evolution, 46, 608615. https://doi.org/10.1111/j.1558-5646.1992.tb02069.x

Wiger, R. (1979). Demography of a cyclic population of the bank vole Clethrionomys glareolus. Oikos, 33, 373-385. https://doi.org/10.2307/3544325

Wolf, J. B. W. & Ellegren, H. (2016). Making sense of genomic islands of differentiation in light of speciation. Nature Reviews Genetics, 18, 87100. https://doi.org/10.1038/nrg.2016.133

122

APPENDICES

CHAPTER 1 APPENDIX

Figure S1.1: Distribution of FST values for all 35,434 SNPs genotyped in the field voles (all cryptic species) studied here.

Figure S1.2: Calibration plots including information on the goodness-of-fit of the climatic niche models and their predictive performance on the validation dataset. Model.4 is the most explanatory and also has the best calibration (Hosmer and Lemeshow test (H-L), P > 0.05), lowest AIC scores, and with the highest AUC values, all indicative of a good predictive performance.

123

Figure S1.3: Predicted probability for occurrence of the field vole at the present time (not distinguishing between cryptic species) according to the four assessed climatic niche models.

Figure S1.4: Calibration plots showing the relationship between the predicted probability of occurrence for the climatic niche models and the observed proportion of a given cryptic species of the field vole (Portuguese, Mediterranean, short-tailed): A) Portuguese vs Mediterranean model and B) Mediterranean vs short-tailed model. Most localities are located in the extremes of each species (P = 0 and 1).

124

Figure S1.5: Current climatic favourability for the three cryptic species of field vole in western Europe (P: Portuguese, M: Mediterranean and St: short-tailed).

Figure S1.6: Predicted climatic distribution for Portuguese (P), Mediterranean (M) and short- tailed (St) field vole cryptic species in 2080.

Methods for Fig. S1.6: Model predictions suggest that ongoing climate change will negatively affect the climatic potential for the three cryptic species in western Europe. The fuzzy increment indices for the three species according to new models (Figs 2 and 4) were:

125

Portuguese: I = -0.653; Mediterranean: I = -0.566; and short-tailed: I = -0.386. That is, the climatic potential in 2080 for the Portuguese cryptic species is expected to decrease 65% of its current distribution, Mediterranean 56.6% and short-tailed 39%. When the predicted favourabilities for 2080 are used to explore the biogeographical relationships between cryptic species, no areas for coexistence are predicted, only a few squares for the Portuguese and

Mediterranean. The expected relationships between these species for the future differ of that obtained for present. Here, the Portuguese shows potential to displace the Mediterranean in areas of potential competition, as these are the core areas of the Portuguese cryptic species distribution. The situation for the relationship between the Mediterranean and short-tailed is quite similar to that found for present. However, for future a potential competitive exclusion is possible in some localities of eastern Alps, where the short-tailed cryptic species could be excluded by the Mediterranean.

126

Figure S1.7: Intraspecific genetic structure within the short-tailed field vole using 35,434 SNP loci. Individuals are labelled by their country of origin (see legend). PCA analysis was restricted to only short-tailed field vole individuals from mainland Europe and UK. Individuals from small offshore islands were removed to show general geographic patterns within the species.

Table S1.1: List of field vole individuals and sampling localities included in this study, ordered according to cryptic species (Portuguese, Mediterranean, short-tailed), country of origin and location. Voucher specimens in collections of National Museums Scotland (prefix NMS.Z), J.B. Searle tissue collection (no prefix) or CIBIO, University of Porto (prefix SM).

Sample ID Museum ID LOCATION CRYPTIC COUNTRY Latitude Longitude Tissue Type SEX SPECIES

NMA01 DK02 Nørre Farup, Ribe Short-tailed DENMARK 55.3544 8.7328 foot NA NMA02 DK09 Nørre Farup, Ribe Short-tailed DENMARK 55.3544 8.7328 foot NA NMA03 DK04 Mandø, Ribe Short-tailed DENMARK 55.3072 8.6597 foot NA NMA04 NMS.Z.2009.101.652 Sheep Island, Cumbria Short-tailed UK ENGLAND foot M 54.0641 -3.2024 NMA05 NMS.Z.2009.101.658 Sheep Island, Cumbria Short-tailed UK ENGLAND foot M 54.0641 -3.2024 NMA06 NMS.Z.2009.101.659 Sheep Island, Cumbria Short-tailed UK ENGLAND foot M 54.0641 -3.2024 NMA07 NMS.Z.2009.101.1041 Thorley, Isle of Wight Short-tailed UK ENGLAND foot F 50.6896 -1.4733 NMA08 NMS.Z.2009.101.1042 Thorley, Isle of Wight Short-tailed UK ENGLAND foot M 50.6896 -1.4733 NMA09 NMS.Z.2009.101.1043 Thorley, Isle of Wight Short-tailed UK ENGLAND foot M 50.6896 -1.4733 NMA10 NMS.Z.2009.101.466M Broadmayne, Dorset Short-tailed UK ENGLAND foot F 50.6730 -2.3976

127

NMA11 NMS.Z.2009.101.467M Dorchester, Dorset Short-tailed UK ENGLAND foot M 50.7088 -2.4404 NMA12 NMS.Z.2009.101.468M Dorchester, Dorset Short-tailed UK ENGLAND foot M 50.7088 -2.4404 NMA13 NMS.Z.2009.101.490M Hexham, Northumberland Short-tailed UK ENGLAND foot F 54.9885 -2.1109 NMA14 NMS.Z.2009.101.494M Newcastle, Northumberland Short-tailed UK ENGLAND foot M 54.9065 -1.7442 NMA15 NMS.Z.2009.101.1079M Mildenhall, Suffolk Short-tailed UK ENGLAND foot M 52.3362 0.5246 NMA16 K2 Kielder, Northumberland Short-tailed UK ENGLAND 55.2187 -2.4448 foot NA NMA17 F131C3 Abbeville, Picardie Short-tailed FRANCE 50.0833 1.5667 foot NA NMA18 F100C3 Abbeville, Picardie Short-tailed FRANCE 50.0833 1.5667 foot NA NMA19 F182C7 Abbeville, Picardie Short-tailed FRANCE 50.1097 1.8278 foot NA NMA21 F264E1 St Malo, Bretagne Short-tailed FRANCE 48.6482 -2.0261 foot NA NMA22 F308E1 St Malo, Bretagne Short-tailed FRANCE 48.6482 -2.0261 foot NA NMA23 F309E1 St Malo, Bretagne Short-tailed FRANCE 48.6482 -2.0261 foot NA NMA24 MA706 Trebeurden, Bretagne Short-tailed FRANCE 48.7500 -3.5667 foot NA NMA25 D29 Katinger Watt, Schleswig-Holstein Short-tailed GERMANY 54.2831 8.8403 foot NA NMA26 D30 Katinger Watt, Schleswig-Holstein Short-tailed GERMANY 54.2831 8.8403 foot NA NMA27 D31 Katinger Watt, Schleswig-Holstein Short-tailed GERMANY 54.2831 8.8403 foot NA NMA28 NMS.Z.2009.101.10 Stilligarry, South Uist Short-tailed UK SCOTLAND foot M 57.3235 -7.3708 NMA29 NMS.Z.2009.101.14 Stilligarry, South Uist Short-tailed UK SCOTLAND foot M 57.3235 -7.3708 NMA30 NMS.Z.2009.101.180 Carinish, North Uist Short-tailed UK SCOTLAND foot M 57.5176 -7.3233 NMA31 NMS.Z.2009.101.187 Carinish, North Uist Short-tailed UK SCOTLAND foot M 57.5176 -7.3233 NMA32 NMS.Z.2009.101.350 Uig, Skye Short-tailed UK SCOTLAND foot F 57.6027 -6.3505 NMA33 NMS.Z.2009.101.360 Loch Leathan, Skye Short-tailed UK SCOTLAND foot M 57.4798 -6.1822 NMA34 NMS.Z.2009.101.364 Storr, Skye Short-tailed UK SCOTLAND foot M 57.5113 -6.1531 NMA35 NMS.Z.2009.101.383 Loch nam Brae Ruaidh, South Uist Short-tailed UK SCOTLAND foot M 57.3000 -7.3200 NMA36 NMS.Z.2009.101.385 Lochmaddy, North Uist Short-tailed UK SCOTLAND foot F 57.6055 -7.1585 NMA37 NMS.Z.2009.101.87 M Loch Assynt, Sutherland Short-tailed UK SCOTLAND foot M 58.1785 -5.0406 NMA38 NMS.Z.2009.101.92 M Achmelvich, Sutherland Short-tailed UK SCOTLAND foot M 58.1673 -5.3067 NMA39 NMS.Z.2009.101.514M Sangomore, Durness Short-tailed UK SCOTLAND foot M 58.5689 -4.7407 NMA40 NMS.Z.2009.101.14S Tornbotten, Oland Short-tailed SWEDEN 56.6667 16.5333 foot M NMA41 NMS.Z.2009.101.15S Lindsdal, Smaland Short-tailed SWEDEN 56.6815 16.2921 foot M NMA42 NMS.Z.2009.101.16S Eso, Oland Short-tailed SWEDEN 56.6278 16.5126 foot M NMA44 NMS.Z.2009.101.1131M Llanddona, Anglesey Short-tailed UK WALES foot M 53.2879 -4.1391 NMA45 NMS.Z.2009.101.1298M Capel Mawr, Anglesey Short-tailed UK WALES foot M 53.2143 -4.3661 NMA46 NMS.Z.2009.101.1299M Capel Mawr, Anglesey Short-tailed UK WALES foot M 53.2143 -4.3661 NMA47 NMS.Z.2009.101.1916 M Aberystwyth, Dyfed Short-tailed UK WALES foot M 52.3769 -3.9964 NMA48 NMS.Z.2009.101.1917 M Aberystwyth, Dyfed Short-tailed UK WALES foot M 52.3769 -3.9964 NMA50 SM.867 Malcata Portuguese PORTUGAL 40.3000 -6.9833 ear F NMA51 SM.211 Quinta da Taberna, Serra da Estrela Portuguese PORTUGAL 40.4833 -7.4167 ear M NMA52 SM.204 Quinta da Taberna, Serra da Estrela Portuguese PORTUGAL 40.4833 -7.4167 ear F NMA53 SM.207 Quinta da Taberna, Serra da Estrela Portuguese PORTUGAL 40.4833 -7.4167 ear M NMA54 SM.93 Rego, Celorico de Bastos Portuguese PORTUGAL 41.4167 -8.0833 foot F NMA55 SM.95 Rego, Celorico de Bastos Portuguese PORTUGAL 41.4167 -8.0833 foot M NMA56 SM.117 Lamas de Olo, Serra do Alvão Portuguese PORTUGAL 41.3667 -7.7833 ear M NMA57 SM.118 Lamas de Olo, Serra do Alvão Portuguese PORTUGAL 41.3667 -7.7833 ear F NMA58 SM.110 Lagoa, Vila Pouca de Aguiar Portuguese PORTUGAL 41.5333 -7.5167 ear M NMA59 SM.79 Lagoas de Bertiandos Portuguese PORTUGAL 41.7667 -8.6167 ear F NMA60 SM.81 Lagoas de Bertiandos Portuguese PORTUGAL 41.7667 -8.6167 ear F NMA61 SM.82 Lagoas de Bertiandos Portuguese PORTUGAL 41.7667 -8.6167 ear M NMA62 SM.310 Vilar das Almas, Ponte de Lima Portuguese PORTUGAL 41.6333 -8.5500 ear M NMA63 SM.311 Vilar das Almas, Ponte de Lima Portuguese PORTUGAL 41.6333 -8.5500 ear F NMA64 SM.162 Ala 1, Macedo de Cavaleiros Portuguese PORTUGAL 41.6000 -7.0000 ear M NMA65 SM.166 Ala 1, Macedo de Cavaleiros Portuguese PORTUGAL 41.6000 -7.0000 foot F NMA66 SM.169 Ala 2, Macedo de Cavaleiros Portuguese PORTUGAL 41.6000 -6.9833 ear F NMA67 SM.171 Ala 2, Macedo de Cavaleiros Portuguese PORTUGAL 41.6000 -6.9833 ear M NMA68 SM.152 Ala 1, Macedo de Cavaleiros Portuguese PORTUGAL 41.6000 -7.0000 foot M NMA69 SM.322 Picos da Europa Mediterranean SPAIN 43.1000 -5.0000 foot F NMA70 SM.327 Picos da Europa Mediterranean SPAIN 43.1000 -5.0000 foot M NMA71 SM.329 Picos da Europa Mediterranean SPAIN 43.1000 -5.0000 foot M

128

NMA72 SM.333 Picos da Europa Mediterranean SPAIN 43.1000 -5.0000 foot F NMA73 SM.334 Picos da Europa Mediterranean SPAIN 43.1000 -5.0000 foot F NMA74 SM.348 Burgos, Sierra de la Demanda Mediterranean SPAIN 42.3000 -3.2667 foot F NMA75 SM.349 Burgos, Sierra de la Demanda Mediterranean SPAIN 42.3000 -3.2667 foot F NMA76 SM.356 Burgos, Sierra de la Demanda Mediterranean SPAIN 42.3000 -3.2667 foot F NMA77 SM.357 Burgos, Sierra de la Demanda Mediterranean SPAIN 42.3000 -3.2667 foot M NMA78 SM.364 Burgos, Sierra de la Demanda Mediterranean SPAIN 42.3000 -3.2667 foot M NMA79 SM.343 Biskaia Mediterranean SPAIN 43.0333 -2.6667 foot M NMA80 SM.378 Eugui, Navarra Mediterranean SPAIN 42.9667 -1.5167 tail F NMA81 SM.380 Eugui, Navarra Mediterranean SPAIN 42.9667 -1.5167 tail M NMA82 SM.385 Eugui, Navarra Mediterranean SPAIN 42.9667 -1.5167 tail F NMA85 SM.410 Granollers Mediterranean SPAIN 41.8000 2.2833 foot F NMA86 SM.411 Granollers Mediterranean SPAIN 41.8000 2.2833 foot F NMA87 SM.412 Andorra Mediterranean ANDORRA 45.1333 17.6000 tail F NMA88 SM.413 Andorra Mediterranean ANDORRA 45.1333 17.6000 ear F

Table S1.2: Results of the climatic niche models explaining the current distribution range of the field vole (without distinguishing the cryptic species; coefficient / Wald test statistics; *, P < 0.05; **, P < 0.01; ***, P < 0.001). Predictor Model.1 Model.2 Model.3 Model.4 BIO1 -1.5e-02 / -10.77*** -1.9e-02 / -12.95*** -1.7e-02 / -11.97*** -2.1e-02 / -13.62*** BIO12 -2.7e-04 / -10.31*** -2.4e-04 / -9.25*** BIO12 -5.0e-04 / -2.79** -8.5e-04 / -4.66*** -1.4e-03 / 4.90*** 8.5e-04 / 2.86** BIO122 -3.7e-06 / -7.66*** -3.2e-06 / -6.50*** (Intercept) 2.3 / 11.31*** 3.3 / 14.63*** 1.3 / 5.80*** 2.39 / 9.29*** BIO1: annual mean temperature; BIO12: annual precipitation as climatic predictors (Hijmans et al 2005). Squared predictors were considered in order to account for non-linear relationships.

Table S1.3: Results of the climatic niche models discriminating between cryptic species of the field vole: Portuguese vs Mediterranean and Mediterranean vs short-tailed (coefficient / Wald test statistics; n.s., P > 0.05; *, P < 0.05; **, P < 0.01; ***, P < 0.001). Predictor Portuguese Mediterranean vs vs Mediterranean Short-tailed BIO15 0.8 / 2.25* BIO3 3.7 / 1.67* Latitude 2.0 / 2.72** BIO9 -5.2e-03 / -2.22* (Intercept) 263.2 / 1.16 n.s. -96.6 / -2.72** BIO3: isothermality; BIO9: mean temperature of driest quarter; BIO15: precipitation seasonality (coefficient of variation) (Hijmans et al 2005)

129

Methods for Table S1.3: Using individuals genotyped with diagnostic mtDNA and nuclear markers (N = 377) a model was parameterised in order to find ecogeographical gradients able to discriminate between the different cryptic species of field vole. For this purpose, two generalised models (binomial and logic link function) were performed in which the dependent variables were Portuguese vs Mediterranean and Mediterranean vs short-tailed. Bioclimatic variables (19 predictors; Hijmans et al 2005) and geographical variables (latitude and longitude) were used as predictors. A forward-backward stepwise procedure based on AIC was used to select the most parsimonious models. Table S1.2 shows the final models for each pair of neighbour lineages.

The predictive performance of the models was high: Portuguese vs Mediterranean

(AUC: 0.999 and H-L: 5.78, P > 0.05) and Mediterranean vs short-tailed (AUC: 0.999 and H-

L: 10.94, P > 0.05)(see Figure S1.2). These models were used to assign a given species to each 50 x 50 km UTM squares in which the field vole is present in western Europe.

130

CHAPTER 2 APPENDIX

Supplementary Methods:

PCR reagents and conditions

In all cases except microsatellite PCR for fragment analysis, the PCR reactions were carried ot in 25L total olme and sing rapid-ramping thermocyclers.

Mitochondrial CO1 locus

PCR reagents: 2.5 L 10 Initrogen bffer, 1.25L MgCl2 (50 g/L), 5L dNTPs

(1.25 g/L), 2.5 L BSA (2g/L), 0.5 L of each primer (10 M), and 0.1 L Taq polmerase (5 U/L) (Initrogen, Carlsbad, CA, USA), and 0.5 L template DNA (20 ng/L). PCR conditions: 94C for 2 min for initial denatration, folloed b 35 ccles of: 94°C for 30 sec, 47°C for 30 sec, 72°C for 1 min, and ending with a 10 min extension at 72°C.

Nuclear loci initial PCRs

NEXCAL-1 and -2

PCR reagents: 3.75 L 10 Initrogen bffer, 1.5 L MgCl2 (50 g/L), 5L dNTPs

(1.25 g/L), 2.5 L BSA (2g/L), 0.5 L of each primer (10 M), and 0.1 L Taq polmerase (5 U/L) (Initrogen), and 0.5 L template DNA (20 ng/L). PCR conditions: 94°C for 2 min for initial denaturation, followed by 35 cycles of: 94°C for

30 sec, 57°C for 30 sec, 72°C for 20 sec, and ending with a 5 min extension at 72°C.

SsimNEX3823-F and R

131

PCR reagents: 3.75 L 10 Initrogen bffer, 0.75 L MgCl2 (50 g/L), 4L dNTPs

(1.25 g/L), 2.5 L BSA (2g/L), 0.4 L of each primer (10 M), and 0.25 L Taq polmerase (5 U/L) (Initrogen), and 0.5 L template DNA (20 ng/L). PCR conditions: at 94°C for 2 min for initial denaturation, followed by 30 cycles of: 94°C for 30 sec, 58°C for 45 sec, 72°C for 45 sec, and ending with a 30 min extension at

72°C.

SsimNEX9017-F and -R

PCR reagents: 3.75 L 10 Initrogen bffer, 0.75 L MgCl2 (50 g/L), 4L dNTPs

(1.25 g/L), 2.5 L BSA (2g/L), 0.4 L of each primer (10 M), and 0.25 L Taq polmerase (5 U/L) (Initrogen), and 0.5 L template DNA (20 ng/L). PCR conditions: 94°C for 2 min for initial denaturation, followed by 30 cycles of: 94°C for

30 sec, 56°C for 45 sec, 72°C for 45 sec, and ending with a 30 min extension at 72°C.

Nuclear loci second PCR to anneal Nextera indexing primers

PCR master mies contained 3.75 L 10 Initrogen bffer, 0.75 L MgCl2 (50 mM),

4L dNTPs (1.25 g/L), 2.5 L BSA (2g/L), and 0.25 L Taq polymerase (5

U/L). For each indiidal, a niqe combinatorial of 5L Netera i5 and i7 primers and 2L of each PCR prodct generated aboe ere added to 15L of the PCR master mix. PCR conditions: 72°C for 3 min and 98°C for 30 sec followed by 5 cycles of:

98°C for 10 sec, 63°C for 30 sec, and 72°C for 3 min.

Microsatellite fragment analysis

132

Nine microsatellite loci were amplified in three multiplex groups as described in Hare and Borchardt-Wier (2014). Briefly, the reactions contained 10 ng DNA template,

0.16 M each primer, 0.5U Invitrogen Taq polymerase, 1.5x Invitrogen buffer, 0.2 mg/ml BSA, 1.5 mM MgCl2 , 0.2 mM each dNTP, in 10 L total reaction volume.

PCR conditions were 94°C for 2 min; 30 cycles of 94°C for 30 s, 58°C for 45 s, 72°C for 45 s; with 72°C for 30 min final extension.

Custom filtering for nuclear sequences

For the nuclear sequences, individuals were first separated by Nextera index and then loci were separated and sequences quality filtered using custom perl scripts.

The calmodulin (Cal-A) nuclear intron was filtered separately from the other loci because the Cal-A amplicon is longer than the 500 bp output from the 2x250 MiSeq and the paired ends could not be assembled into an overlapping contig. Briefly, for the

(Cal-A) locus: the script trimmed 25 bp from each end of the read to remove sequencing errors, assembled each pair of reads into a concatenated sequence and assigned each concatenated sequence to a locus based on a blast to a reference Cal-A sequence. The Cal locus has two paralagous loci, but no Cal-B sequences were detected in our samples based on blasting to a reference Cal-B sequence. After being assigned to the reference sequence, Cal-A sequences were pooled from all individual samples to identify all possible alleles in the population. The two most frequent alleles observed in each individual (or one for homozygous individuals) were accepted as the genotype. Individuals were called homozygous if the second most frequent allele was

< 0.1x as frequent as the most frequent alleles, all others were discarded as sequencing

133

error. All retained Cal-A sequences (two identical haplotypes for homozygotes, total =

2n) were aligned using GENEIOUS PRO. 5.3.4 (Drummond et al 2011).

For the other nuclear loci (NuSsim3823 and NuSsim9017) the script trimmed

25 bp from the 3 end of each read to remoe seqencing error, assembled each overlapping pair of reads into a contig and assigned each contig a locus based on blast results to NuSsim3823 and NuSsim9017 reference sequence. The NuSsim3823 or

NuSsim9017 sequences were pooled from all individual samples to identify all possible alleles. The top two alleles for each individual (or one for homozygous individuals) were assigned based on sequence counts. Individuals were called homozygous if the second most frequent allele was < 0.1x as frequent as the most frequent alleles, all others were discarded as sequencing error. After assignment, sequences were assembled and aligned using GENEIOUS PRO. 5.3.4 (Drummond et al 2011) and microsatellite repeat regions were visually identified and trimmed from the aligned sequences. The remaining flanking regions were concatenated and saved in the FASTA format for subsequent analyses.

Table S2.1: Microsatellite amplification success and population variation.

% missing FIS Ar(16) sim sol Avg. sim Avg. sol Avg. sim Avg. sol Ssim23 0 0 0.229 0.051 7.6142 10.016 Ssim13 0.6 0 0.103 0.155 4.7454 4.63 Ssim38 17.4 1 0.395 0.233 7.0742 4.5005 Ssim90 0 0 0.031 -0.007 8.7682 4.712 Ssim1436 4.5 67.7 0.262 na na na Ssim10 0.6 35.3 0.309 1.000 8.7928 1.2885 Ssim94 0 0 0.074 0.024 5.0148 4.2625 Ssim93 0.6 0 0.052 0.047 4.606 7.474 Ssim51 5.1 0 -0.060 0.082 3.386 2.742 average 3.2 11.6 0.155 0.198 6.250 4.953

134

160 Ssim23440 140

120

100

80 Sim Ssim23440 N=344 gene copies 60 Sol Ssim23440 N=205 gene copies 40

20

0 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 3 8 5 5 6 6 7 7 8 8 9 9 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3

200 Ssim3823 180 160 140 120 100 Sim Ssim3823 N=284 gene copies 80 Sol Ssim3823 N=198 gene copies 60 40 20 0 9 2 5 8 1 4 7 0 3 6 9 2 5 8 1 4 7 0 3 6 9 2 5 2 3 3 3 4 4 4 5 5 5 5 6 6 6 7 7 7 8 8 8 8 9 9 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3

250 Ssim1303

200

150 Sim Ssim1303 N=346 gene copies 100 Sol Ssim1303 N=204 gene copies

50

0 5 7 9 1 3 5 7 9 1 3 5 7 9 1 3 5 7 9 1 3 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

135

200 Ssim1436 180

160 140

120 100 Sim Ssim1436 N=340 gene copies 80 Sol Ssim1436 N=66 gene copies 60 40 20

0 3 6 9 2 5 8 1 4 7 0 3 6 9 2 5 8 1 4 7 0 3 6 9 2 5 8 1 4 7 0 3 0 0 0 1 1 1 2 2 2 3 3 3 3 4 4 4 5 5 5 6 6 6 6 7 7 7 8 8 8 9 9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

140 Ssim10007 120

100

80 Sim Ssim10007 N=346 gene copies 60 Sol Ssim10007 N=132 gene copies

40

20

0 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2

160 Ssim948 140

120

100

80 Sim Ssim948 N=355 gene copies 60 Sol Ssim948 N=206 gene copies

40

20

0 9 1 3 5 7 9 1 3 5 7 9 1 3 5 7 9 1 3 8 9 9 9 9 9 0 0 0 0 0 1 1 1 1 1 2 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2

136

140 Ssim9323 120

100

80 Sim Ssim9323 N=351 gene copies 60 Sol Ssim9323 N=206 gene copies 40

20

0 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 3 4 4 4 4 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

250 Ssim5132

200

150 Sim Ssim5132 N=339 gene copies Sol Ssim5132 N=206 gene copies 100

50

0 5 7 9 1 3 5 7 9 1 3 5 7 9 1 3 5 7 9 1 3 4 4 4 5 5 5 5 5 6 6 6 6 6 7 7 7 7 7 8 8 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

140 Ssim9017 120

100

80 Sim Ssim9017 N=352 gene copies Sol Ssim9017 N=206 gene copies 60

40

20

0 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 2 4 6 8 0 0 2 4 6 8 9 9 9 9 9 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Figure S2.1: Microsatellite allele frequency distributions based on the common allelic binset applied to both S. s. solidissima and S. s. similis. Note that unequal sample sizes were used for the two subspecies (N and locus name listed on each graph).

137

CHAPTER 3 APPENDIX

Table S3.1: List of individual field vole specimens used in the study, including sampling county (within Britain), more specific location, latitude, longitude, and assignment to our 11 populations. Sample IDs are specimen ID numbers from the National Museums Scotland.

Sample ID County Specific Location Latitude Longitude Population 524M Sutherland Durness, Balnakeil 58.5845 -4.7677 One Durness, Sutherland 513M Sangomore 58.5689 -4.7407 One Durness, Sutherland 514M Sangomore 58.5689 -4.7407 One Durness, Sutherland 515M Sangomore 58.5689 -4.7407 One Durness, Sutherland 516M Sangomore 58.5689 -4.7407 One Durness, Sutherland 517M Sangomore 58.5689 -4.7407 One 87M Sutherland Loch Assynt 58.1785 -5.0406 One 86M Sutherland Achmelvich 58.1685 -5.2983 One 90M Sutherland Achmelvich 58.1673 -5.3067 One 92M Sutherland Achmelvich 58.1673 -5.3067 One Lochinver, Sutherland 1214M Badnaban 58.1304 -5.2599 One 171M Easter Ross Balintore 57.7529 -3.9189 Two 193M Easter Ross Balintore 57.7529 -3.9189 Two 194M Easter Ross Balintore 57.7529 -3.9189 Two 195M Easter Ross Balintore 57.7529 -3.9189 Two 201M Easter Ross Balintore 57.7529 -3.9189 Two 206M Easter Ross Balintore 57.7529 -3.9189 Two 308M Easter Ross Balintore 57.7529 -3.9189 Two 326M Easter Ross Balintore 57.7529 -3.9189 Two 327M Easter Ross Balintore 57.7529 -3.9189 Two 331M Easter Ross Balintore 57.7529 -3.9189 Two 205M Inverness Kilvean 57.4618 -4.2679 Three 243M Inverness Kirkhill 57.4541 -4.4183 Three 252M Inverness Kirkhill 57.4541 -4.4183 Three 253M Inverness Kirkhill 57.4541 -4.4183 Three 254M Inverness Kirkhill 57.4541 -4.4183 Three 317M Inverness Kirkhill 57.4541 -4.4183 Three 318M Inverness Kirkhill 57.4541 -4.4183 Three

138

321M Inverness Kirkhill 57.4541 -4.4183 Three 322M Inverness Kirkhill 57.4541 -4.4183 Three 329M Inverness Kirkhill 57.4541 -4.4183 Three 334M Inverness Tomatin 57.4541 -4.4183 Three 335M Inverness Kirkhill 57.4541 -4.4183 Three 169M Inverness Tomatin 57.3357 -3.9951 Three 204M Inverness Tomatin 57.3357 -3.9951 Three 166M Inverness Kyllachy 57.3103 -4.0087 Three 198M Inverness Kyllachy 57.3103 -4.0087 Three 202M Inverness Kyllachy 57.3103 -4.0087 Three 101M Perthshire Bridge of Earn 56.3418 -3.4090 Four 102M Perthshire Bridge of Earn 56.3418 -3.4090 Four 107M Perthshire Bridge of Earn 56.3418 -3.4090 Four 1988M Perthshire Bridge of Earn 56.3418 -3.4090 Four 1989M Perthshire Bridge of Earn 56.3418 -3.4090 Four 631M Perthshire Bridge of Earn 56.3418 -3.4090 Four 632M Perthshire Bridge of Earn 56.3418 -3.4090 Four 638M Perthshire Bridge of Earn 56.3418 -3.4090 Four 639M Perthshire Bridge of Earn 56.3418 -3.4090 Four 640M Perthshire Bridge of Earn 56.3418 -3.4090 Four 647M Perthshire Bridge of Earn 56.3418 -3.4090 Four 422M Midlothian Kirkhill 55.8946 -3.3096 Four 431M Midlothian Currie Kirk 55.8813 -3.1077 Four K1 Northumberland Kielder 55.2187 -2.4448 Five K2 Northumberland Kielder 55.2187 -2.4448 Five K3 Northumberland Kielder 55.2187 -2.4448 Five Hexham, Acomb Northumberland 488M Mill 54.9885 -2.1109 Five Hexham, Acomb Northumberland 489M Mill 54.9885 -2.1109 Five Hexham, Acomb Northumberland 490M Mill 54.9885 -2.1109 Five Hexham, Acomb Northumberland 491M Mill 54.9885 -2.1109 Five 688M Cumbria Kendal 54.3303 -2.7396 Six 689M Cumbria Kendal 54.3303 -2.7396 Six 690M Cumbria Kendal 54.3303 -2.7396 Six 691M Cumbria Kendal 54.3303 -2.7396 Six 692M Cumbria Kendal 54.3303 -2.7396 Six 655M Cumbria Pennington 54.1995 -3.1496 Six 656M Cumbria Pennington 54.1995 -3.1496 Six 657M Cumbria Pennington 54.1995 -3.1496 Six

139

Billington large Lancashire 1057M site 53.8123 -2.4343 Seven Billington large Lancashire 1058M site 53.8123 -2.4343 Seven Billington large Lancashire 1059M site 53.8123 -2.4343 Seven Billington large Lancashire 1060M site 53.8123 -2.4343 Seven Billington large Lancashire 1061M site 53.8123 -2.4343 Seven Billington large Lancashire 1062M site 53.8123 -2.4343 Seven Billington large Lancashire 1063M site 53.8123 -2.4343 Seven Billington large Lancashire 1064M site 53.8123 -2.4343 Seven Billington large Lancashire 1065M site 53.8123 -2.4343 Seven Billington large Lancashire 1066M site 53.8123 -2.4343 Seven Billington large Lancashire 1067M site 53.8123 -2.4343 Seven Billington large Lancashire 1068M site 53.8123 -2.4343 Seven 1045M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1046M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1047M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1048M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1049M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1050M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1051M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1052M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1053M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1054M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1055M Derbyshire Holme Moss Road 53.5326 -1.8808 Eight 1034M Derbyshire Crowden 53.4877 -1.8960 Eight 1035M Derbyshire Crowden 53.4877 -1.8960 Eight 1036M Derbyshire Crowden 53.4877 -1.8960 Eight 1037M Derbyshire Crowden 53.4877 -1.8960 Eight 1038M Derbyshire Crowden 53.4877 -1.8960 Eight 1039M Derbyshire Crowden 53.4877 -1.8960 Eight 1040M Derbyshire Crowden 53.4877 -1.8960 Eight 1041M Derbyshire Crowden 53.4877 -1.8960 Eight 1141M Worcestershire Hopwood 52.3748 -1.9588 Nine

140

1142M Worcestershire Hopwood 52.3748 -1.9588 Nine 1265M Worcestershire Hopwood 52.3748 -1.9588 Nine 1266M Worcestershire Hopwood 52.3748 -1.9588 Nine 1303M Worcestershire Hopwood 52.3748 -1.9588 Nine 594M Worcestershire Hopwood 52.3748 -1.9588 Nine 595M Worcestershire Hopwood 52.3748 -1.9588 Nine 596M Worcestershire Hopwood 52.3748 -1.9588 Nine 597M Worcestershire Hopwood 52.3748 -1.9588 Nine 622M Worcestershire Hopwood 52.3748 -1.9588 Nine 788M Worcestershire Hopwood 52.3748 -1.9588 Nine 958M Worcestershire Hopwood 52.3748 -1.9588 Nine 1016M Wiltshire Corsham 51.4378 -2.1884 Ten 1017M Wiltshire Corsham 51.4378 -2.1884 Ten 1018M Wiltshire Corsham 51.4378 -2.1884 Ten 1019M Wiltshire Corsham 51.4378 -2.1884 Ten 1020M Wiltshire Corsham 51.4378 -2.1884 Ten 1021M Wiltshire Corsham 51.4378 -2.1884 Ten 1111M Wiltshire Corsham 51.4378 -2.1884 Ten 1112M Wiltshire Corsham 51.4378 -2.1884 Ten 467M Dorset Broadmayne 50.7088 -2.4404 Eleven 468M Dorset Dorchester 50.7088 -2.4404 Eleven 469M Dorset Dorchester 50.7088 -2.4404 Eleven 470M Dorset Dorchester 50.7088 -2.4404 Eleven 471M Dorset Dorchester 50.7088 -2.4404 Eleven 472M Dorset Dorchester 50.7088 -2.4404 Eleven 466M Dorset Broadmayne 50.6730 -2.3976 Eleven

141