Received: 12 January 2020 | Revised: 8 July 2020 | Accepted: 16 July 2020 DOI: 10.1111/mec.15604

ORIGINAL RESEARCH

Gene flow and climate-associated genetic variation in a vagile habitat specialist

Zachary G. MacDonald1 | Julian R. Dupuis2 | Corey S. Davis3 | John H. Acorn1 | Scott E. Nielsen1 | Felix A. H. Sperling3

1Department of Renewable Resources, University of Alberta, Edmonton, Alberta, Abstract Canada Previous work in landscape genetics suggests that geographic isolation is of greater 2 Department of Entomology, University of importance to genetic divergence than variation in environmental conditions. This is Kentucky, Lexington, Kentucky, USA 3Department of Biological Sciences, intuitive when configurations of suitable habitat are a dominant factor limiting dis- University of Alberta, Edmonton, Alberta, persal and gene flow, but has not been thoroughly examined for habitat specialists Canada with strong dispersal capability. Here, we evaluate the effects of geographic and en- Correspondence vironmental isolation on genetic divergence for a vagile invertebrate with high habitat Zachary G. MacDonald, Department of Renewable Resources, University of Alberta, specificity and a discrete dispersal life stage: Dod's Old World , 751 General Services Building, Edmonton, machaon dodi. In Canada, P. m. dodi are generally restricted to eroding habitat Alberta, Canada T6G 2H1. Email: [email protected] along major river valleys where their larval host plant occurs. A series of causal and linear mixed effects models indicate that divergence of genome-wide single nucleo- Funding information Natural Sciences and Engineering tide polymorphisms is best explained by a combination of environmental isolation Research Council of Canada, Grant/ (variation in summer temperatures) and geographic isolation (Euclidean distance). Award Number: RGPIN-2014-04842 and RGPIN-2018-04920; Alberta Conservation Interestingly, least-cost path and circuit distances through a resistance surface pa- Association, Grant/Award Number: rameterized as the inverse of habitat suitability were not supported. This suggests ACAGiB-848 that, although habitat associations of many butterflies are specific due to reproduc- tive requirements, habitat suitability and landscape permeability are not equivalent concepts due to considerable adult vagility. We infer that divergent selection related to variation in summer temperatures has produced two genetic clusters within P. m. dodi, differing in voltinism and propensity. Within the next century, tem- peratures are predicted to rise by amounts greater than the present-day difference between regions of the genetic clusters, potentially affecting the persistence of the northern cluster under continued climate change.

KEYWORDS climate change, isolation by environment, isolation by resistance, landscape genetics, Lepidoptera, local adaptation

1 | INTRODUCTION at both the individual and population level (Coyne & Orr, 2004; Mayr, 1963; Nosil, 2012). When genetic divergence has a strong A principal aim in ecology and evolutionary biology is to resolve spatial component, causes are generally attributed to spatial vari- factors and understand processes that influence genetic divergence ation in evolutionary processes, such as gene flow, genetic drift,

Molecular Ecology. 2020;00:1–18. wileyonlinelibrary.com/journal/mec © 2020 John Wiley & Sons Ltd | 1 2 | MACDONALD et al. and selection (Bohonak, 1999; Rousset, 1997; Schwartz McKelvey, via the combination of (a) reduced fitness and negative selection on Cushman, & Luikart, 2010; Slatkin, 1987). However, inferring the individuals that have dispersed across environmental gradients; (b) relative contributions of these processes is challenging. Landscape reduced fitness and negative selection on dispersers’ offspring in genetics addresses this by quantitatively relating patterns of ge- non-natal habitats (outbreeding depression); or (c) reduced tendency netic divergence to geographic and environmental landscape factors of individuals to disperse due to local adaptation to environmental (Cushman, McKelvey, Hayden, & Schwartz, 2006; Manel, Schwartz, conditions (Crispo, Bentzen, Reznick, Kinnison, & Hendry, 2006; Luikart, & Taberlet, 2003; Richardson, Brady, Wang, & Spear, 2016; Dobzhansky, 1937; Lee & Mitchell-Olds, 2011; Nosil, 2004, 2012; Shirk, Wallin, Cushman, Rice, & Warheit, 2010; Sork, Nason, Nosil, Vines, & Funk, 2005; Wang & Bradburd, 2014). After con- Campbell, & Fernandez, 1999). trolling for geographic distance, positive correlations between Multiple heuristics have been invoked to conceptualize rela- genetic distance and differences in environmental/ecological condi- tionships between genetic divergence and landscape factors, with tions confer support for IBE. each implicating specific evolutionary processes. The first, iso- Although often framed as competing hypotheses, comple- lation-by-distance (IBD; Wright, 1943), predicts that geographic mentarity of IBD, IBR, and IBE has been documented in multiple distance or physical barriers to dispersal reduce gene flow and per- studies (Coyne & Orr, 2004; Crispo et al., 2006; Sánchez-Ramírez mit drift between spatially separated individuals or populations. et al., 2018; Thorpe, Surget-Groba, & Johansson, 2008; Van Buskirk Because dispersal is limited in most species (Greenwood, 1980), & van Rensburg, 2020; Wang, Glor, et al., 2012). Within such investi- Euclidean and genetic distances are often positively correlated, gations, it can be instructive to invoke Euclidean, least-cost path, and supporting IBD (Rousset, 1997; Vekemans & Hardy, 2004; but see circuit distances as measures of geographic isolation (IBD + IBR) for Meirmans, 2012). A second heuristic, isolation-by-resistance (IBR; contrast with measures of environmental/ecological isolation (IBE) McRae, 2006), may be seen as a modification of IBD and predicts estimated as differences in biotic or abiotic conditions. For example, that patterns of genetic divergence will be best explained by geo- Wang, Glor, et al. (2012) compared the effects of geographic isola- graphic distances accounting for variation in the relative resistance tion (estimated as least-cost path and circuit distances) and ecologi- organisms experience when dispersing through heterogeneous land- cal isolation (estimated as differences in values of 24 environmental scapes. To assess IBR, resistance surfaces are often parameterized variables) on genetic divergence for 17 Anolis species. Genetic di- as the inverse of habitat suitability, generally modelled using occur- vergence was significantly related to geographic and ecological iso- rences of focal taxa and geographic and environmental/ecological lation for 15 and 13 species, respectively, with inferred effects of predictor variables (McRae & Beier, 2007; Wang, Yang, Bridgman, geographic isolation being, on average, more than twice as strong & Lin, 2008; Storfer, Murphy, Spear, Holderegger, & Waits, 2010; as those of ecological isolation. Similar results have been reported Wang, Glor, & Losos, 2012, but see Peterman, Connette, Semlitsch, for a variety of other vertebrate taxa, including the Trinidad Guppy & Eggert, 2014). Optimal pathways (e.g., least-cost path distances) (Poecilia reticulata; Crispo et al., 2006), Agassiz's Desert Tortoise or a multitude of possible pathways of varying probability derived (Gopherus agassizii; Sánchez-Ramírez et al., 2018), and the Common from circuit theory (circuit distances) may then be estimated across Frog (Rana temporaria; Van Buskirk & van Rensburg, 2020), suggest- resistance surfaces and related to patterns of genetic divergence ing that geographic distance, spatial features, and arrangements of (McRae & Beier, 2007). When resistance surfaces are parametrized suitable habitat are of greater importance to genetic divergence than in this way, positive correlations between genetic distance and re- variation in environmental/ecological conditions (but see support for sistance-based distances suggest that organisms are more likely to IBE in epigenetic data, Wogan, Yuan, Mahler, & Wang, 2020). disperse within suitable habitat and experience high movement re- Across these studies, greater support for IBD + IBR than IBE sistance/cost within unsuitable habitat. IBR thereby equates the con- contrasts with the hypothesis that factors contributing to genetic cept of habitat suitability to that of landscape permeability (the ease divergence are predominantly environmental/ecological (Foll & with which organisms move through a landscape), or in circuit the- Gaggiotti, 2006; Nosil, 2012; Thorpe et al., 2008). However, past ory terms, conductance. A third heuristic, isolation-by-environment comparisons of geographic and environmental isolation have (IBE; Wang & Summers, 2010), predicts that spatial variation in en- tended to address vertebrate taxa with relatively specific habi- vironmental/ecological conditions contributes to genetic divergence tat associations that are maintained through their life cycles. It is

FIGURE 1 Visual representation of the methods used to generate geographic and environmental distances between the 161 sequenced dodi included in this study. Various spatial data layers (a–g) and 375 P. m. dodi records (161 sequenced individuals and 214 georeferenced P. m. dodi Global Biodiversity Information Facility (GBIF) records) were used to build a Maxent habitat suitability model, which was then used to predict habitat suitability across the study area, with higher values indicating greater suitability. High habitat suitability generally followed the eroding banks of major rivers in southern Alberta and Saskatchewan, Canada (Red Deer, Old Man, South Saskatchewan, and Milk Rivers). Euclidean distance (i) represents the minimum distance between sequenced individuals. A resistance surface was parameterized as the inverse of habitat suitability and used to estimate least-cost path and circuit distances (j and k). The background in inset j is a projected cost surface, representing the cumulative costs incurred by individuals moving across the landscape from each occurrence point. Environmental distances (e–g) were estimated by taking the absolute difference between values of environmental variables extracted from the occurrences of sequenced individuals. Inset pictures are the adult and larval stage of P. m dodi MACDONALD et al. | 3

therefore intuitive that patterns of genetic divergence are gener- Wehausen, Bleich, Torres, & Brashares, 2007; McRae, 2006; McRae ally best explained by resistance-based geographic distances, as & Beier, 2007; Sánchez-Ramírez et al., 2018; Vignieri, 2005; Wang, configurations of suitable habitat typically moderate spatial vari- Savage, & Bradley Shaffer, 2009; Wang et al., 2008). But other ation in movement and dispersal in such taxa (Broquet, Ray, Petit, taxa, including many terrestrial invertebrates, have discrete disper- Fryxell, & Burel, 2006; Coulon et al., 2004; Crispo et al., 2006; Epps, sal life stages (generally the adult) with broader habitat tolerances 4 | MACDONALD et al. than larval stages, which may have important consequences for important considerations for predicting gene flow on heterogene- processes affecting genetic divergence (Phillipsen et al., 2015). For ous landscapes; and (b) habitat suitability may be used as a proxy example, Keller and Holderegger (2013) found that short-distance of landscape permeability. However, although habitat associations movements of the southern damselfly (Coenagrion mercuriale) gener- of P. m. dodi appear to be specific and geographically restricted, the ally followed corridors of reproductive and larval habitat (streams), dispersal ability of P. machaon is estimated to be among the great- while long-distance dispersal, inferred from patterns of genetic di- est of all Canadian butterflies (Burke, Fitzsimmons, & Kerr, 2011). vergence, was best explained by Euclidean distance across unsuit- We therefore hypothesize that, considering only measures of geo- able habitat (agricultural land). Although IBE was not evaluated for graphic isolation, relationships between genetic divergence and C. mercuriale, genetic divergence within this taxon and similar taxa Euclidean distance (IBD) will be stronger than those between ge- with discrete dispersal life stages may be expected to show stronger netic divergence and least-cost path or circuit distances (IBR), de- relationships to environmental isolation than geographic isolation, as spite a resistance surface predicting greatest landscape permeability evolutionary processes predicted by IBD and IBR become subsidiary (lowest resistance) along the dendritic ecological network of suitable to those predicted by IBE. habitat. The aim of our study was to evaluate IBD, IBR, and IBE for a Environmental isolation may also play a significant role in struc- taxon with high habitat specificity, a discrete dispersal stage, and turing genetic divergence within P. m. dodi. Sperling (1987) noted distribution across a variable environment. The Old World swallow- distinct differences in the butterfly's phenology, diapause propen- tail butterfly (Papilio machaon L.) species group has been the subject sity, and voltinism across its Canadian range, possibly implicating of considerable study in North America (e.g., Dupuis, Cullingham, divergent selection related to variation in summer temperatures as a Nielsen, & Sperling, 2019; Dupuis, Mori, & Sperling, 2016; Dupuis & driver of spatial genetic divergence. If IBE is detected, mechanisms Sperling, 2015, 2016; Sperling, 1987, 1990). One subspecies in par- by which environmental isolation structures genetic divergence ticular, P. m. dodi McDunnough, 1939, is well suited for this investiga- may be inferred from patterns of population clustering, and we can tion. Adult P. m. dodi search for mates by hilltopping along prominent make several subsequent predictions. First, if IBE is primarily driven edges of river valleys, leading to clustering of occurrence records by reduced fitness and negative selection on individuals that have along the Red Deer, South Saskatchewan, Old Man, and Milk Rivers dispersed across environmental gradients, genetic clustering should in southern Alberta and Saskatchewan, Canada (Sperling, 1987; indicate some spatial discordance of individual cluster assignments Bird, Hilchie, Kondla, Pike, & Sperling, 1995; Dupuis et al., 2019; oc- (i.e., migrants found in non-natal populations) without admixture be- currence records are visualized in Figure 1, inset h). After mating, tween clusters indicative of successful hybridization. Second, if IBE females travel downslope from hilltops to oviposit on their larval operates via reduced fitness and negative selection on genetically host plant, Artemisia dracunculus L., which is generally restricted to intermediate individuals, some admixture indicative of F1 hybrid- south-facing eroding slopes of river valleys. We therefore hypoth- ization between migrant and natal individuals may be evident, but esize that these river valleys constitute suitable habitat under the substantial admixture among clusters should be absent due to selec- functional resource-based concept (sensu Dennis, Shreeve, & Van tion against these admixed genotypes. Third, if individuals exhibit a Dyck, 2003), providing resources sufficient for mate location, repro- reduced tendency to disperse across environmental gradients due duction, resting, roosting, and feeding. This unique habitat configu- to local adaptation, there should be little or no spatial discordance of ration may be described as a dendritic ecological network of suitable individual cluster assignments or admixture among clusters. Finally, habitat corridors situated in a matrix of unsuitable agricultural and we also identified loci under putative divergent selection between prairie habitat. Such configurations have proven practical for de- genetic clusters (FST outliers) and assessed environmental associ- coupling Euclidean and resistance-based distances (e.g., Keller & ations of allele frequencies for individual loci. If population struc- Holderegger, 2013). Additionally, occurrences of P. m. dodi in Canada ture is driven by local adaptation to environmental conditions, we 2 span an area of approximately 53,000 km that is sufficiently vari- hypothesize that FST outlier and environmental association analyses able in environmental conditions to assess IBE (environmental gradi- will identify similar sets of candidate loci under putative selection. ents are visualized in Figure 1, insets e–g).

2.1 | Sample collection 2 | MATERIALS AND METHODS We collected adult and larva P. m. dodi with aerial net surveys and We used a series of causal models to assess and contrast the ef- host plant searches, respectively, between 15 May and 31 August fects of geographic isolation (IBD + IBR) and environmental isola- 2017. We visited most known P. m. dodi occurrence locations in tion (IBE) on genetic divergence within P. m. dodi. Resistance-based Canada and aimed to collect 5–10 individuals every 25–50 km distances were estimated using a resistance surface parameterized along the Red Deer, South Saskatchewan, Old Man, and Milk Rivers, as the inverse of predicted habitat suitability to assess whether adult where suitable habitat is present (collection locations are visual- butterflies are more likely to disperse within suitable habitat. Such ized in Figure 1, inset h). As with previous studies (Dupuis et al., a result would suggest that (a) configurations of suitable habitat are 2016, 2019; Dupuis & Sperling, 2015, 2016; Sperling, 1987, 1990), MACDONALD et al. | 5 we attempted to locate and collect P. m. dodi between major river Durbin, 2009). We then converted files from SAM to BAM format valleys (e.g., smaller eroding slopes with A. dracunculus and hilltop- using SAMtools 1.9 (Li et al., 2009) and used “gstacks” and “popu- ping features). However, consistent with our past work and historical lations” within Stacks 2.0 to call SNPs and generate output files, records (e.g., Bird et al., 1995), we did not observe any individuals stipulating a single population containing all individuals. Genotype outside of known suitable habitat along major river valleys. Adults calls were exported in variant call format (VCF), and individuals with were generally collected on prominences along major river valleys more than 50% missing data were removed from the data set (three used as hilltopping features. Adult females descend from hilltops im- adults and three larvae). Finally, we used VCFtools 0.1.14 (Danecek mediately after mating while males remain in search of additional et al., 2011) to filter genotypes with read depths less than five and mates, which means that males are more frequently encountered filter loci with minor allele frequencies less than 0.05, percentages of during sampling (Dupuis et al., 2019). Larvae were collected from missing data greater than 5%, and those within <10 kb of each other patches of A. dracunculus on eroding slopes, typically 100–500 m2 in to reduce the probability of retaining loci that are in physical link- area, below hilltopping features. Individuals collected within 500 m age. This thinning interval was based on linkage decay documented of each other were given the same collection location, recorded as in other butterfly species; e.g., linkage decays to baseline within the centroid of the hilltopping feature or A. dracunculus patch. After 1 kb–10 kb in Heliconius spp. (Martin et al., 2013) and within 100 bp collection, adult individuals were frozen live and stored at –20°C. in Danaus plexippus (Zhan et al., 2014). Larvae were raised to 4th or 5th instar on clippings of A. dracunculus, Multiple larval samples were often collected from single or ad- preserved in 95% ethanol, and stored at –20°C. jacent A. dracunculus plants. Full-sibling relationships between indi- viduals are therefore probable due to females ovipositing multiple eggs on single plants, which may bias inferences of genetic diver- 2.2 | DNA extraction and library preparation gence and population structure (O'Connell, Mulder, Maldonado, Currie, & Ferraro, 2019). To identify full sibs, we used the package We extracted genomic DNA from thoracic tissue of adults (n = 148) “SNPRelate” (Zheng et al., 2012) implemented in the R environment and larvae (n = 32) using DNeasy Kits (Qiagen, Hilden, Germany). (v3.5.1; R Core Team) to estimate kinship coefficients for all pairs Extractions followed the manufacturer's protocol, with the addi- of sequenced individuals. For diploid organisms, the expected kin- tion of bovine pancreatic ribonuclease A treatment (RNaseA, 4 μl ship coefficient between full sibs is 0.25. Only pairs of larvae col- at 100 mg/ml; Sigma-Aldrich Canada Co., Canada). Genomic DNA lected from single locations had kinship coefficients greater than was then ethanol precipitated and stored in 50 μl Millipore water 0.22, where a natural break in values occurred. For each of these at –20°C. Double digest restriction-site associated DNA sequenc- larval pairs, we removed the individual with the greater percentage ing (ddRADseq) libraries were prepared from 200 ng input DNA and of missing data (13 individuals total). We then reverted to the original MspI and PstI. We followed a modified version of Poland, Brown, BAM files, recalled SNPs with Stacks, and filtered VCF files as above Sorrells, and Jannink (2012) for wet laboratory procedures and used using the reduced data set of 161 individuals comprised of 136 adult a standard dual index Illumina adapter system following Peterson, males, eight adult females, and 17 unsexed larvae. Weber, Kay, Fisher, and Hoekstra (2012). Details of our library prep- aration protocol and adapters are provided in Data S1 and S2, re- spectively. A final, pooled library of 180 individuals was sequenced 2.4 | Population structure with single-end, 75 bp sequencing on a single high output flowcell of an Illumina NextSeq 500. Two independent methods were used to quantify population struc- ture. We first used the model-based clustering program Structure 2.3.4 (Pritchard, Stephens, & Donnelly, 2000) in a hierarchical fash- 2.3 | Bioinformatic processing ion (Vähä & Primmer, 2006) to assess K-values ranging from 1 to 10. For the first set of runs in the hierarchical analysis, 10 independent Following Illumina sequencing, we used “process_radtags” in the runs were completed for each value of K using the admixture model program Stacks 2.0 (Rochette, Rivera-Colón, & Catchen, 2019) to and correlated allele frequencies. The burnin period and number of demultiplex FASTQ reads and filter those with quality scores below Markov chain Monte Carlo (MCMC) repetitions were set to 100,000 20 within a sliding window 15% of the read length. All reads were and 1,000,000, respectively. Location priors (n = 27 collection sites) truncated to 67 bp after removing the 8 bp Illumina index sequences, were used to inform the MCMC algorithm. The alpha prior (relative identified with one mismatch permitted. We then searched for and admixture levels between populations) was set to 0.5, based on the removed remnant Illumina adaptor sequences and removed the first inverse of the expected value of K = 2 informed by overt cluster- 5 bp from the 5’ end of each read (corresponding to the PstI restric- ing in preliminary principal component analysis (PCA; see Figure S1) tion site) using the program Cutadapt 1.9.1 (Martin, 2011). Filtered (Wang, 2017). Two approaches were then used to determine the op- and trimmed reads were aligned to a P. machaon reference genome timal K from Structure outputs; the ΔK method (Evanno, Regnaut, comprised of 63,187 scaffolds (NCBI Accession GCA_001298355.1) & Goudet, 2005), implemented in the program Structure Harvester using Burrows-Wheeler Aligner 0.7.17 (BWA-MEM) (Li, 2013; Li & (Earl & vonHoldt, 2012), and the rate of change in the likelihood of 6 | MACDONALD et al.

K across K = 1:10 (Pritchard et al., 2000). For the second set of runs To evaluate the predicative power of the habitat suitability in the hierarchical analysis, we completed independent Structure model, 20% of occurrence localities were withheld for cross-vali- analyses on each cluster identified in the first analysis using settings dation and receiver operating characteristic (ROC) analysis (Phillips identical to the first set of runs. Q-values > 0.8 from the first set of et al., 2006). Following evaluation, we used the model to predict runs denoted assignment of individuals to specific clusters, effec- habitat suitability across the study landscape using the “predict” tively excluding individuals with substantial admixture. function (raster package), with each grid cell receiving habitat suit- In addition to Structure analyses, we also assessed popu- ability values ranging from 0–1, where higher values indicate greater lation structure using discriminant analysis of principal com- habitat suitability. Information on the validity of the Maxent process ponents (DAPC), which conducts discriminant analysis (DA) on and its application to P. m. dodi is available in Data S1. principal components (PCs) generated in PCA (Jombart, Devillard, & Balloux, 2010). Assignments of individuals to a priori population clusters for DAPC were inferred using the “find.clusters” function 2.6 | Geographic distance (adegenet package) with default parameters, retaining all PCs, to find the optimal K value based on Bayesian Information Criterion (BIC) We estimated geographic isolation between sequenced individuals scores in successive K-means clustering analysis across K = 1:20. using three different pairwise distance metrics; Euclidean distance, DAPC was then completed using the R package “adegenet” v2.1.1 least-cost path distance, and circuit distance. Euclidean distances (Jombart, 2008). We used the “xvalDapc” function (adegenet pack- represent minimal distances required to travel between locations age), stipulating 100 replicates, to determine the optimal number and do not account for landscape characteristics. In contrast, least- of PCs to retain in DAPC using stratified cross-validation of DAPC cost path distances are estimated by searching for single, optimal across increasing numbers of principal components (PCs). Missing routes that minimize cumulative costs associated with travelling genotypes were imputed as the mean of the available data per locus through heterogeneous landscapes (Wang et al., 2009). Least-cost for this cross-validation. path analysis thereby assumes that organisms have complete knowl- edge of such landscapes and are able to consistently navigate optimal routes. Circuit-based analyses relax this assumption, with distances 2.5 | Habitat suitability estimated by summarizing costs associated with all possible paths through heterogeneous landscapes (McRae & Beier, 2007). To map habitat suitability for P. m. dodi within our study landscape, we Pairwise Euclidean distances between the collection locations of created a habitat suitability model using Maxent software (Phillips, the 161 sequenced individuals were estimated using the “spDists” Anderson, & Schapire, 2006), implemented through the R package function in the R package “sp” (Pebesma & Bivand, 2005). To esti- “dismo” (Hijmans, Phillips, Leathwick, & Elith, 2011). Briefly, Maxent mate least-cost path and circuit distances, we first parameterized a uses machine-learning maximum entropy modelling to predict habi- resistance surface as the inverse of habitat suitability scores pre- tat suitability across a landscape using georeferenced occurrence dicted by the habitat suitability model (McRae & Beier, 2007; Storfer localities and a set of geographic information systems (GIS) predic- et al., 2010; Wang et al., 2008; Wang, Glor, et al., 2012). We then es- tor variables (spatial data layers). For georeferenced occurrence lo- timated pairwise least-cost path distances using the “costDistance” calities, we used both the collection locations of the 161 sequenced function in the R package “gdistance” (van Etten, 2018) and pairwise individuals from this study and georeferenced P. m. dodi occurrences circuit distances using the program Circuitscape 5.0 (McRae, 2006), downloaded from the Global Biodiversity Information Facility (GBIF; both with an eight-neighbour connection scheme. To increase com- accessed from https://doi.org/10.15468/​dl.axez0s, 5 December putational efficiency, we aggregated the resolution of the resistance 2018). Of the 259 occurrences available from GBIF, 214 were within surface to 300 m, as connectivity inferences are shown to be gener- the study landscape. Geographic and environmental GIS data lay- ally robust to such aggregations (McRae & Beier, 2007). Collectively, ers included elevation, a terrain ruggedness index, a heat load index these analyses produced three pairwise matrices of geographic (based on terrain slope and aspect), land cover (12 categories), and distances. three Worldclim 2 (Fick & Hijmans, 2017) bioclimatic variables: mean temperature of warmest quarter (temp.warm), mean temperature of coldest quarter (temp.cool), and mean annual precipitation (precip.). 2.7 | Environmental distance These environmental variables were selected based both on biologi- cal relevance and to minimize collinearity (see Data S1 for further Environmental isolation between sequenced individuals were es- details). Each GIS data layer was reprojected to an equal-area pro- timated using the same Worldclim 2 bioclimatic variables included jection (NAD83[CSRS98]/UTM zone 12N) at 30-m resolution using in the habitat suitability model (temp.warm, temp.cool, and precip.). the R package “raster” (Hijmans & van Etten, 2012). Further details We extracted values for each of the three bioclimatic variables for and sources for GIS data layers may be found in Data S1. Correlation collection locations of sequenced individuals using the “extract” coefficients were less than 0.7 for all pairs of GIS data layers, and so function in the R package “raster”. Following Wang, Glor, et al. all seven were included in the habitat suitability model. (2012), environmental distances were estimated by taking absolute MACDONALD et al. | 7 differences of these values. Resulting differences were organized network. SEM analyses have proven particularly useful for distinguish- into three pairwise matrices of environmental distances. ing effects of multiple collinear variables (e.g., geographic and environ- mental distances; Grace, 2006; Wang, Glor, et al., 2012). Our causal path network included two regression pathways; one from geographic 2.8 | Determinants of genetic divergence distance to genetic distance and one from environmental distance to genetic distance. Geographic and environmental distance were linked We used two types of causal models and linear mixed effects mod- by a covariance pathway. Results of the RCM analysis were used to infer els, each implementing an individual-based (c.f. population-based) which single measures of geographic and environmental distance were approach, to evaluate how variation in geographic and environmen- most strongly related to genetic distance. To test whether geographic tal isolation relates to genetic divergence within P. m. dodi. To quan- and environmental distance contributed meaningfully to observed var- tify genetic divergence, we first used the “dist” function within the iation in genetic distance, we compared Akaike's information criterion R package “adegenet” to estimate pairwise genetic distance (sum of (AIC) scores for the full model, a model excluding geographic distance, squared Euclidean distances between ith and the jth genotype) be- and a model excluding environmental distance (sensu Wang, Glor, et al., tween sequenced individuals, commensurate with the geographic 2012). Lower AIC scores indicated superior model fit. Models with AIC and environmental distance matrices generated above. This simple, scores exceeding the best supported model by 10 or more points were individual-based measure of genetic distance has been shown to unsupported (Burnham & Anderson, 1998). SEM analyses were com- effectively quantify genetic divergence in a variety of simulations pleted using maximum-likelihood estimation in the R package “lavaan” (e.g., Shirk, Landguth, & Cushman, 2017) and in field studies (e.g., (Rosseel, 2012). To account for nonindependence among pairwise Sánchez-Ramírez et al., 2018). data, we randomly permuted rows and columns of distance matrices to generate null distributions for path coefficients assuming no relation- ships between variables exist (Fourtune et al., 2018). Unbiased stand- 2.8.1 | Reciprocal causal modelling (RCM) ard errors and p-values for path coefficients were then calculated by comparing observed coefficients to their null distributions. This was Our first set of causal models addressed relationships between ge- completed using a modification of the “permutation.based.pathanaly- netic distance and geographic and environmental distances using re- sis” R code provided by Fourtune et al. (2018). ciprocal causal modelling (RCM) with partial Mantel tests (Cushman et al., 2006; Cushman, Wasserman, Landguth, & Shirk, 2013). We used the “mantel.partial” function within the R package “vegan” (Oksanen 2.8.3 | Validation of the causal model framework et al., 2007) to perform partial Mantel tests (999 permutations) for genetic distance and each combination of the six geographic and envi- To investigate whether inferences from casual models were contingent ronmental distances, totalling to 30 tests organized into 15 reciprocal on method of analysis, we also employed linear mixed effects models causal models. For each comparison of relationships between genetic with maximum likelihood population effects (MLPE; Clarke, Rothery, & distance and two geographic/environmental distances (one recipro- Raybould, 2002). MLPE linear mixed effects models have been shown cal model), we first estimated the partial Mantel's R coefficient (RPM) to be one of the highest performing methods for quantifying relation- between genetic distance and one geographic/environmental distance ships between distance matrices while controlling for nonindepend- (focal variable) conditioned on the other geographic/environmental ence among pairwise data (Shirk et al., 2017). Relationships between distance (alternative variable), comprising partial Mantel test A. We genetic distance and each of the six measures of geographic and envi- then estimated the reciprocal RPM, comprising partial Mantel test B. ronmental isolation were quantified in independent models. The iden-

If RPM-A > RPM-B, the focal variable from partial Mantel test A is better tities of sequenced individuals involved in pairwise distance values supported, and vice versa. Results of RPM-A - RPM-B were summarized in were included in models as mixed effects to control for nonindepend- a heatmap similar to Ruiz-Gonzalez Cushman, Madeira, Randim, and ence within distance matrices (Clarke et al., 2002). MLPE linear mixed Gómez-Moliner (2015). Notwithstanding this RCM framework, if both effects models were fit using the “MLPE.lmm” function within the R

RPM-A and RPM-B were significant, we inferred partial support for rela- package “ResistanceGA” (Peterman, 2018). We set REML = FALSE to tionships between genetic distance and each of the two geographic or allow for the estimation of valid AIC scores that were used to evaluate environmental distances used in the reciprocal model. relative model support (Peterman, 2018; Row, Knick, Oyler-McCance, Lougheed, & Fedy, 2017; Shirk et al., 2017).

2.8.2 | Structural equation modelling (SEM) 2.9 | Population divergence of candidate loci Our second set of causal models employed structural equation model- ling (SEM), a method originally developed by Wright (1921), to quantify To identify candidate loci under putative divergent selection, we the relative strength of effects of geographic distance and environ- used Bayescan 2.1 (Foll & Gaggiotti, 2008) to estimate allele fre- mental distance on genetic distance according to an a priori causal path quencies and FST values for 1,382 loci. When population assignment 8 | MACDONALD et al. of individuals is sensible, Bayescan is generally recognized as the histograms for each environmental variable, estimated using the genomic most effective method for identification of FST outlier loci (De Mita inflation factor (λ) procedure (Devlin & Roeder, 1999). Distributions that et al., 2013; Lotterhos & Whitlock, 2014; Narum & Hess, 2011). Q- are relatively flat with a peak near zero indicate the selected number of values >0.8 from the Structure analysis (K = 2) denoted assignment latent factors adequately controlled for potentially confounding effects of individuals to either the northern or southern cluster, effectively of spatial genetic structure (Frichot & François, 2015). Finally, to control excluding individuals with substantial admixture from this analysis. for false positives associated with multiple tests, we again used the FDR We used default parameters to run Bayescan (prior odds set to 10, criterion (Benjamini & Hochberg, 1995), producing q-values for each as- thinning interval to 10, number of pilot runs to 20, length of pilot sociation. Loci with q-values <0.05 were inferred as having significant runs to 5,000, and burnin length to 50,000), except for the num- environmental associations. ber of outputted iterations, set to 10,000. To reduce the likelihood of false positives associated with multiple tests, we assessed the significance of FST outliers using q-values generated by Bayescan 2.9.2 | Genomic contexts of candidate loci according to the False Discover Rate (FDR) criterion (Benjamini &

Hochberg, 1995). FST outlier loci were identified using a q-value- To map the location of candidate loci within the P. m. dodi genome, threshold of 0.05. Fifteen Bayescan runs were completed using this we used BEDTools v2.27.1 (Quinlan & Hall, 2010) to extract 5 kb of protocol and a union of the resulting lists of FST outlier loci was taken flanking sequence on the 5’ and 3’ ends of each candidate locus iden- to generate a final list of loci under putative divergent selection. tified by either Bayescan or LFMM analyses. This length of flank- We also used the “snpzip” function (adegenet package) with de- ing sequence was selected in reference to previous thinning of loci fault settings to identify which loci contributed most significantly to within 10 kb of each other. We then used the BLAST function within between-population structure in DAPC, with population assignment Lepbase (Challis, Kumar, Dasmahapatra, Jiggins, & Blaxter, 2016) based on K-means clustering analysis. This analysis uses the relative to match resulting sequences to annotated genes within Lepbase's contribution of each SNP to DAPC to perform hierarchical clustering butterfly and moth CDS databases. As this search queried multi- and classify loci as either “structural” or “nonstructural”. ple species’ genomes, we evaluated possible interspecific matches based on percent match of the query, phylogenetic distance to the matched species, and the number of distinct genomes (multiple spe- 2.9.1 | Environmental associations of individual loci cies) in which each gene was reported. Putative gene functions were compiled from the UniProt Consortium (2018) using gene accession While Bayescan is effective for identifying loci under putative codes included within the Lepbase output. divergent selection among discrete populations, an individual- based approach may be more effective for identifying candidate loci potentially under selection across environmental gradients 3 | RESULTS (Frichot, Schoville, Bouchard, & François, 2013). To accomplish this, we used latent factor mixed modelling (LFMM), implemented ddRAD sequencing resulted in a total of 293,036,249 single-end, in LFMM 1.3 (Frichot et al., 2013) via the R package “LEA” (Frichot 75-bp reads across the original set of 180 sequenced individuals. & François, 2015). LFMM assesses correlations between allele fre- After running “process_radtags” and associated filters, 273,664,014 quencies of individual loci and environmental variables (each in- reads remained, of which, 192,902,810 were aligned to the P. cluded in an independent model as a fixed effect) while controlling machaon reference genome. After removing individuals with > 0% for background population structure using latent factors equal in missing data and putative full-sibs, 108,049,831 reads were used number to the optimal value of K. This reduces the likelihood of false to call 104,038 SNPs for the final set of 161 sequenced individuals. positives arising from spurious relationships between allele frequen- Filtering of loci resulted in a total of 1,382 SNPs with a mean read cies and environmental variables due to autocorrelation of space, depth of 71.99 (min = 11.76, max = 1,950.7), comprising the data set demography, and the environment (Frichot et al., 2013; Lotterhos & used in all subsequent analyses. Whitlock, 2014), which can be problematic for other analysis meth- ods, such as those employed in BayEnv2 (Günther & Coop, 2013; Lotterhos & Whitlock, 2014). 3.1 | Population structure Environmental variables included in LFMM analyses were temp. warm, temp.cool, and precip.. We completed five independent LFMM Our first set of Structure runs predicted an optimal K-value of K = 2 runs with 10,000 iterations and a burnin of 5,000, stipulating two latent using both the ΔK method (Evanno et al., 2005) and the rate of change factors (K = 2 inferred from both Structure and DAPC/K-means cluster- in the likelihood of K across K = 1:10 (Pritchard et al., 2000; see Data ing analyses). Results were then combined by calculating the median |z|- S1, Figure S2a). Individuals collected near and north of Dorothy, scores across the five LFMM runs, which represent the strength of the Alberta, were generally assigned to a northern cluster, while individu- genetic-environment association for each locus. To validate the number als collected within and south of Dinosaur Provincial Park, Alberta, of latent factors used in LFMM, we visually inspected adjusted p-values were assigned to a southern cluster (Figure 2). Spatial discordance of MACDONALD et al. | 9 two individuals’ cluster assignments (i.e., migrants found in non-natal AUC score of 0.948. As hypothesized, high suitability generally fol- populations) suggests that dispersal between the regions occurs. lowed the eroding banks of major rivers, taking on the form of a den- Nine individuals had an approximate 50/50 split of Q-values, suggest- dritic ecological network (Figure 1, inset h). Contributions of each ing that some hybridization between migrant and natal individuals oc- variable to the habitat suitability model were estimated by measur- curs. However, little admixture was observed beyond these putative ing the drop in AUC after values of each variable were randomly per- F1 hybrids. In the second set of Structure runs, no subclustering was muted (permutational importance): terrain ruggedness = 47.7, temp. evident in the northern cluster, while the existence of two subclus- cool = 34.2, elevation = 4.8, precip. = 4.4, temp.warm = 0.2, and ters was best supported within the southern cluster (Data S1, Figure heat load = 0.1. A resistance surface parameterized as the inverse of S2b and c). Q-values for individuals of these two subclusters (K = 2 habitat suitability scores permitted estimation of least-cost path and in the southern cluster only analysis) were very similar to those for circuit distances (Figure 1, insets j and k). K = 3 in the first set of runs (including all individuals). For simplicity, we therefore present admixture plots from the first set of Structure runs (all individuals) for both K = 2 and K = 3 (Figure 2). 3.3 | Determinants of genetic divergence Similar to the first set of Structure runs, K-means clustering anal- ysis suggested K = 2 was best supported. An optimal number of 20 3.3.1 | Reciprocal causal modelling (RCM) PCs was retained for DAPC. Assignments of individuals to northern and southern clusters were nearly identical between the first set of Results of RCM are summarized in a heatmap (Figure 3), with red

Structure runs and DAPC/K-means clustering analysis, save four ad- and blue colours indicating positive and negative values for RPM-A mixed individuals with Q-values around 0.5 that were assigned to - RPM-B, respectively (sensu Ruiz-Gonzalez et al., 2015). Focal and the southern cluster by Structure (based on Q-values > 0.5) and the alternative variables used in partial Mantel test A for each recipro- northern cluster by DAPC/K-means clustering analysis. cal model are on the y- and x-axes, respectively. For ease of inter- pretation, variables on the y-axis with more positive (red) values in their corresponding rows are better supported. Overall, the strong- 3.2 | Habitat suitability est correlates of genetic distance after partialling out relationships with alternative variables were Euclidean distance and temp.warm The Maxent habitat suitability model adequately predicted habitat distance. Euclidean distance was significantly correlated with ge- suitability across our study landscape, indicated by an out-of-sample netic distance after partialling out temp.warm distance (RPM =0.23;

0.00.5 1.0 0.00.5 1.0 Sample size >10 3-10

5750000 1-3 FIGURE 2 Population genetic structure within Papilio machaon dodi in Alberta and Saskatchewan, Canada, 5700000 inferred using the model-based clustering program Structure. An optimal K value of Dorothy, AB 2 was best supported by the ΔK method and rate of change in the likelihood of 5650000 K across K = 1:10 for the first set of Structure runs addressing all individuals.

Hierarchical runs addressing the northern Dinosaur Provincial and southern clusters independently 600000 Park, AB suggested no overt subclustering within the northern cluster and the existence of two subclusters within the southern

cluster. We present admixture plots 555000 05 derived from the first set of Structure runs for both K = 2 (exhibiting the two primary clusters) and K = 3 (including the two southern subclusters) for simplicity. 500000 For K = 2, individuals collected near and north of Dorothy, Alberta, were generally assigned to a northern cluster, while

individuals collected within and south of 545000 05 N 0 25 50 km Dinosaur Provincial Park, Alberta, were K = 3 K = 2 assigned to a southern cluster 350000 400000 450000 500000 550000 600000 10 | MACDONALD et al. p = .001) and the reciprocal partial Mantel test was also significant genetic distance was temp.warm distance followed by Euclidean

(RPM =0.19; p = .001). All other partial Mantel tests using either the distance (Table 2). This result supported the use of Euclidean and Euclidean or temp.warm distances as alternative variables were not temp.warm distances as single distance measures for geographic significant, indicating other measures of geographic and environ- and environmental distances, respectively, in SEMs. AIC scores for mental distances were unsupported. MLPE linear mixed effects models with Euclidean and temp.warm distances were >2 points lower than all other models, suggesting that alternative measures of geographic and environmental isolation 3.3.2 | Structural equation modelling (SEM) were unsupported (Burnham & Anderson, 1998).

Based on results of RCM analysis, Euclidean distance and temp.warm distance were used as single measures of geographic and environmen- 3.4 | Population divergence of candidate loci tal distance, respectively, in our causal path network. The full model, including direct paths from both geographic distance (Euclidean dis- All 15 independent Bayescan runs identified the same list of 33 out- tance) and environmental distance (temp.warm distance) to genetic lier loci based on elevated FST values (2.39% of 1,382 SNPs). Positive distance, was better supported than alternative models excluding ei- alpha values for each of these 33 loci indicated they were under pu- ther geographic or environmental distance (Table 1). Path coefficients tative divergent selection, rather than purifying or balancing selec- for Euclidean distance and temp.warm distance were 0.120 (p < .001) tion. To visualize the data, −log10 q-values were plotted against FST and 0.331 (p < .001), respectively, suggesting each is positively re- values estimated in the first Bayescan run for all 1,382 SNPs (see lated to genetic distance. Covariance of the two predictor variables in Data S1, Figure S3). The mean of FST values for putative outlier loci the model was 0.592; thus, relative effect sizes should be interpreted was 0.169 (SD = 0.092), ranging from 0.030 to 0.272. The “snpzip” with caution. However, model selection based on AIC indicated that function (adegenet package) identified 17 structural loci that signifi- variation in Euclidean distance and temp.warm distance each have cantly contributed to between-population structure in DAPC. Each important relationships to variation in genetic distance beyond that of these 17 structural loci were contained within the list of 33 candi- associated with their covariance structure. date loci identified by Bayescan.

3.3.3 | Validation of the causal model framework 3.5 | Environmental associations of individual loci

Results of MLPE linear mixed effects models aligned with those of Histograms of adjusted p-values were uniformly distributed with our causal models. Overall, the best supported variable affecting a peak near zero for all three environmental variables, suggesting

0.6 Euclidean

0.4 Least−cost

0.2

Circuit riable

va 0.0 cal Temp.warm Pairwise heatmap

Fo FIGURE 3 −0.2 visualizing reciprocal causal modelling (RCM) results. Values in each cell

Temp.cool represent results of RPM-A–RPM-B, with red and blue colours indicating positive and −0.4 negative values, respectively. Rows and columns contain the focal and alternative Precip. variables, respectively, for partial −0.6 Mantel test A within each reciprocal

m model. Therefore, the figure should be

Precip. Circuit interpreted by rows and not columns; emp.cool T Euclidean Temp.war Least−cost variables on the y-axis with more positive (red) values in their corresponding rows Alternative variable are better supported MACDONALD et al. | 11

TABLE 1 Relative support for structural equation models (SEMs) 3.6 | Genomic contexts of candidate loci inferred using Akaike's information criterion (AIC)

Using both Bayescan and LFMM, a total of 86 candidate loci were SEM structure AIC ΔAIC identified as being under putative divergent selection or having sig- Full model 34,147.52 0 nificant environmental associations. Results of our Lepbase BLAST Environment only 34,288.77 141.25 search are summarized in Data S2, including putative genes matched Geography only 35,203.77 1,056.25 to each of the 86 sequences and their functional annotation. Note: The causal path network in the full model included two regression pathways; one from tify specific narratives of local adaptation. geographic distance to genetic distance and one from environmental distance to genetic distance. Akaike's information criterion (AIC) scores are reported for the full model, a model excluding environmental 4 | DISCUSSION distance (geography only), and a model excluding geographic distance (environment only). Reciprocal causal models (RCM) were used to infer which single measures of geographic and environmental distance 4.1 | Geographic isolation and dispersal machines were most strongly related to genetic distance and used in place of geographic and environmental distance (Euclidean distance and temp. Overt clustering of P. m. dodi occurrences along major river val- warm distance, respectively). leys in southern Canada is generally understood to be a function of both the ecologically restricted occurrences of its larval host plant, TABLE 2 Relative support for effects of geographic and A. dracunculus, and the presence of hilltopping features along river environmental distances on genetic distance inferred using linear valley edges (Bird et al., 1995; Dupuis et al., 2019; Sperling, 1987). mixed effects models with maximum likelihood population effects Accordingly, results of our habitat suitability model support the in- (MLPE) ference that suitable habitat takes on the form of a dendritic eco- Distance measure AIC ΔAIC logical network, with the intervening landscape primarily comprised Temp.warm 32,955.55 0 of unsuitable habitat. Examination of a corresponding cost surface Euclidean 33,147.24 191.69 (Figure 1, inset j) shows that individuals dispersing between river valleys are predicted to experience high resistance/costs. IBR based Circuit 33,152.41 196.86 on habitat suitability therefore predicts that dispersal and resultant Least-cost 34,623.89 1,668.34 gene flow should follow the dendritic ecological network (Figure 1, Precip. 35,055.58 2,100.03 insets j and k). However, both RCM and MLPE linear mixed effects Temp.cool 35,590.9 2,635.35 models suggested that variation in genetic distance was better ex- Note: Akaike's information criterion (AIC) scores are reported for each plained by Euclidean distance than by resistance-based distances model. accounting for arrangements of suitable habitat. It is possible that other landscape features, beyond configurations of suitable habitat, K = 2 adequately controlled for confounding effects of spatial ge- influence dispersal of P. m. dodi and thereby support IBR. For ex- netic structure (see Data S1, Figure S4). LFMM identified a total of ample, Peterman et al. (2014) outlines a methodological approach 78 loci with significant environmental associations (q-values <0.05; for optimizing resistance surfaces using a nonlinear optimization al- Figure 4). Single loci often had multiple significant environmental gorithm and any combination of spatial variables. However, we did associations, probably due to spatial correlation of environmental not have a priori mechanistic hypotheses predicting what variables variables. We therefore only considered the strongest association for might influence P. m. dodi dispersal beyond those associated with each locus based on median |z|-scores (De Kort, Vandepitte, Mergeay, habitat suitability. Lack of support for IBR based on habitat suit- Mijnsbrugge, & Honnay, 2015; Martins et al., 2018). In total, 52 of 57 ability in this system was of principal interest and demonstrates a loci significantly associated with temp.warm were more strongly asso- clear decoupling of landscape permeability from habitat suitability ciated with temp.warm than any other environmental variable, 12 of (c.f. Coyne & Orr, 2004; Crispo et al., 2006; McRae, 2006; McRae 27 for temp.cool, and 14 of 43 for precip. A total of 25 loci were iden- & Beier, 2007; Thorpe et al., 2008; Wang et al., 2009; Sánchez- tified by both LFMM and Bayescan analyses as being under putative Ramírez et al., 2018). selection; 23 of these loci were most strongly associated with temp. In addition to mating and reproduction associated with adult life warm, 0 with temp.cool, and 2 with precip.. LFMM identified 56 loci stages of invertebrates, it may be instructive to think of adult P. m. with significant environmental associations that were not identified as dodi as dispersal specialists within the taxon's life cycle, with greater candidate loci by Bayescan, while Bayescan identified 8 candidate loci vagility and more general habitat tolerances than larval stages. Such that were not identified by LFMM as having significant environmental traits enable long-distance dispersal of adults across considerable associations. stretches of unsuitable habitat, implicated in the spatial discordance 12 | MACDONALD et al.

15 (a) mean temperature of warmest quarter 10 05

15 (b) mean temperature of coolest quarter 10 alue q−v 10 −lo g 05

15 (c) mean annual precipitation 10 05

0 200 400 600 800100012001400

SNP (ordered by contig arbitrarily)

FIGURE 4 Associations between allele frequencies of loci (n = 1,382) and (a) mean temperature of the warmest quarter of the year (temp.warm); (b) mean temperature of the coolest quarter (temp.cool); and (c) mean annual precipitation (precip.) in latent factor mixed models (LFMM) with K = 2. Black dots are loci with significant associations to the relevant environmental variable based on a q-value threshold of 0.05 (−log10 q-value ~1.3). Single loci often had multiple significant environmental associations due to spatial correlation of environmental variables. Open circles represent the strongest association (based on median |z|-scores) for each locus with a significant environmental association. Loci are arranged on the x-axis in order of position within scaffolds, which are in turn arranged by increasing size

of the cluster assignments of three adult individuals (i.e. migrants landscapes. As a consequence of the dispersal machine concept, found in non-natal populations; see Figure 2 and Data S1, Figure considerable support for IBR in past research cannot necessarily be S1). Indeed, for many vagile butterfly species occupying fragmented extrapolated to organisms with disparate life histories; particularly, landscapes, instances of long-distance dispersal between patches if the life cycles of focal taxa include a discrete dispersal life stage or corridors of suitable habitat are rare, but are known to have im- with substantially different habitat constraints than other life stages. portant consequences for gene flow, metapopulation dynamics, and emergent diversity patterns (Hanski, 1998; MacDonald, Anderson, Acorn, & Nielsen, 2018; Nowicki et al., 2014; Wiens 2001). We 4.2 | Environmental isolation therefore propose that habitat suitability and landscape permeabil- ity should be evaluated as distinct concepts for taxa with discrete Beyond the effects of geographic isolation on genetic divergence, dispersal life stages, even if they are habitat specialists. As a men- this study provides multiple lines of evidence for strong associations tal shortcut, we suggest a “dispersal machine” concept, similar to between genetic variation and environmental conditions. First, anal- Dawkins's (1976) selfish gene perspective on evolution by natural yses of population structure within P. m. dodi indicated the presence selection, in which genes metaphorically build “survival machines” of two prominent genetic clusters, with their spatial separation cor- (i.e., bodies of organisms) to facilitate their own stability and repli- responding to a cline in mean temperature of the warmest quarter cation. In landscape genetics, it may be instructive to understand (temp.warm). Although any correlative relationship between spatial adult life stages of many terrestrial invertebrates as not only the life genetic structure and an environmental variable may be a product stage in which mating and reproduction occur, but also as “dispersal of spatial autocorrelation with a third unconsidered but causative machines,” exhibiting greater vagility and broader habitat tolerances variable, our results accord our a priori hypotheses relating sum- than larval life stages. Such characteristics are likely to facilitate mer temperatures to variation in phenology, diapause propensity, long-distance dispersal resulting in gene flow across heterogeneous and voltinism within P. m. dodi, suggesting this relationship is both MACDONALD et al. | 13 meaningful and worthy of further consideration. Second, RCM, genome assembly (>60 thousand scaffolds) precludes more in-depth SEM, and MLPE linear mixed effects models indicated that genetic comparative genomic analyses of this region. distance among individuals was strongly associate with environmen- Local adaptation to environmental conditions has been docu- tal distance (specifically, temp.warm distance) beyond its covariance mented in a number of butterfly species, with variation in voltinism structure shared with geographic distance (specifically, Euclidean and diapause propensity across environmental gradients being a distance). Finally, Bayescan identified 33 FST outlier loci inferred focal point of past work (e.g., Aalberg Haugen & Gotthard, 2015; to be under putative spatially divergent selection between the two Friberg, Bergman, Kullberg, Wahlberg, & Wiklund, 2008; Pruisscher, primary genetic clusters. Of these 33 FST outlier loci, 23 were also Nylin, Gotthard, & Wheat, 2018; Ryan, Valella, Thivierge, Aardema, identified by LFMM analysis as having allele frequencies that were & Scriber, 2018). Papilio machaon dodi is known to exhibit variation significantly associated with temp.warm values, meaning most loci in voltinism and diapause propensity across its Canadian range, with identified as being under putative spatially divergent selection were northern and southern populations exhibiting one and two gen- also significantly associated with variation in summer temperatures. erations per year, respectively (Bird et al., 1995; Sperling, 1987). Referring back to our original hypotheses, mechanisms by which Additionally, we have noted that a proportion (~25%) of pupae environmental isolation structures genetic divergence may be in- reared from northern populations (collected near Drumheller, AB) ferred from patterns of population structure. Spatial discordance of require two distinct cooling cycles before emergence occurs, while a few individuals’ cluster assignments suggests that some dispersal pupae reared from southern populations (collected near Taber, AB) of adult individuals between natal and non-natal regions of the study consistently emerge after a single cooling cycle (unpublished data, area occurs (and see Dupuis & Sperling, 2016). Additionally, some and see Dupuis et al., 2016; Sperling, 1987). We hypothesize that admixture indicative of F1 hybridization between migrant and natal this facultative second diapause in northern populations represents individuals is evident (individuals with ~ 50/50 split of Q-values in an ecological “hedging of bets” (sensu Seger & Brockmann, 1987), Structure plots). However, further admixture among the two pri- distributing risks of high mortality and low fecundity due to poor mary genetic clusters was not prevalent, possibly due to reduced environmental conditions across multiple years (Dupuis et al., 2016; fitness and negative selection on admixed genotypes. We therefore Hanski, 1988; Tuljapurkar, 1990). Specific mechanisms by which infer that mechanisms by which environmental isolation contributes variation in voltinism and diapause propensity might contribute to to genetic divergence in P. m. dodi may be a combination of reduced divergent selection between northern and southern P. m. dodi pop- fitness and negative selection on both individuals that have dis- ulations are not entirely clear, but there are several possibilities. persed across environmental gradients and genetically intermediate Following hybridization between natal and migrant individuals in the individuals resulting from hybridization. northern extent of the study area, genetically intermediate offspring may experience high mortality if over-winter diapause is not induced in the first generation, as individuals of a second generation will 4.3 | Adaptation to local environmental conditions lack sufficient day-degree accumulation and resources to complete their life cycle and enter over-winter diapause before temperatures Our combination of analyses suggests that variation in environ- drop to lethal levels. Conversely, genetically intermediate individuals ment conditions has significant effects on genetic structure within emerging in the southern extent of the study area may experience P. m. dodi, possibly attributed to divergent selection across environ- reduced fecundity and fitness relative to natal individuals if only one mental gradients resulting in local adaptation. BLAST searches of generation of genetically intermediate offspring emerge annually candidate loci (both FST outliers and loci with significant associa- and/or a proportion of genetically intermediate individuals undergo tions) and their flanking sequences did not resolve annotated genes a second diapause, resulting in a partially semivoltine lifecycle. By with biological functions sufficient to justify specific narratives of these mechanisms, divergent selection on voltinism and diapause local adaptation. However, a total of 52 loci were significantly and propensity may maintain the integrity of the two genetic clusters. most strongly associated with mean temperature of the warmest However, further work is required to evaluate the validity of these quarter (temp.warm)—the period in which development, reproduc- hypotheses. tion, and diapause initiation occur. Nineteen of these 52 loci were evenly distributed along a single scaffold (NW_014538813.1, length 6.9 Mb) within the P. machaon reference genome, corresponding to 4.4 | Environmental determinants of genetic the spike in − log10 q-values observed in LFMM analysis (Figure 1, diversity versus species occurrence inset a). Seventeen of these 19 loci were identified by Bayescan as being under putative divergent selection based on elevated FST val- The environmental variables that were most strongly associated ues. Considered together, these results suggest the existence of an with genetic variation in P. m. dodi (temp.warm, followed by precip., island of genomic differentiation (sensu Harr, 2006; Turner, Hahn, & followed by temp.cool) differed from those that best predicted P. Nuzhdin, 2005) between northern and southern P. m. dodi popula- m. dodi occurrences in our habitat suitability model (temp.cool, fol- tions, possibly corresponding to local adaptation to environmental lowed by precip., followed by temp.warm). While summer tempera- conditions. However, the relatively low quality of the P. machaon tures may influence population structure in P. m. dodi via spatially 14 | MACDONALD et al. divergent selection related to phenology, diapause propensity, and not, these range expansions are not sufficient to offset contractions voltinism, winter temperatures may influence habitat suitability and toward the equator (Lewthwaite et al., 2018). Indeed, there exist limit the range of P. m. dodi due to limited cold tolerance of pupae. few opportunities for P. m. dodi to track its climatic niche northward Indeed, winter temperatures have been inferred to limit ranges of as temperatures rise. Steep south-facing riverbanks that might pro- other congeneric species (e.g., Kukal, Ayres, & Scriber, 1991; Scriber, vide adequate A. dracunculus habitat are sparse north of the current Maher, & Aardema, 2012; Yoshio & Ishii, 2001). Considered together, range of P. m. dodi and successional changes to the composition of our inferences suggest that the environmental/ecological conditions riverbank vegetation that could provide suitable habitat are unlikely that influence divergent selection and possibly facilitate ecological to match the pace of the southern genetic cluster's northward ex- speciation (Foll & Gaggiotti, 2006; Nosil, 2012; Thorpe et al., 2008) pansion. We therefore hypothesize that genotypes unique to the may differ from those that limit species’ ranges and structure emer- northern genetic cluster may be displaced by the end of the century. gent patterns of species diversity (e.g., due to environmental filter- ing; sensu Kraft et al., 2015). Landscape genetic analyses, comparing ACKNOWLEDGEMENTS the environmental/ecological conditions that influence divergent We thank Jan Scott, Dave Laurie, and Kelley Mulligan for assistance selection to those that limit species’ ranges, are required for multiple with field work, Erin Campbell, Janet Sperling, and Tyler Nelson species to assess the generality of this finding. for assistance with lab procedures and bioinformatic processing, Clayton Lamb, Diana Stralberg, and Heather Proctor for assistance with analyses and related insights, and Sophie Dang and staff of the 4.5 | Anticipated changes to genetic structure in Molecular Biology Facility (MBSU) at the University of Alberta for Papilio machaon dodi Illumina sequencing. We also thank five anonymous reviewers for suggestions on an earlier draft of this manuscript. This work was Currently, no members of the P. machaon species group are listed as supported by an Alberta Conservation Association (ACA) Grant in being of conservation concern in Canada. However, recognition of Biodiversity to Z.G.M., Natural Sciences and Engineering Research cryptic evolutionary significant units (sensu Ryder; 1986), such as the Council (NSERC) Discovery Grants to F.A.H.S. (RGPIN-2018-04920) northern and southern genetic clusters identified in this study, may and S.E.N. (RGPIN-2014-04842), and an NSERC Alexander Graham affect future conservation directives in light of continued climate Bell Canada Graduate Scholarship – Doctoral (CGS – D) to Z.G.M. change. Our study resolved that a northern genetic cluster, which includes the type specimen for P. m. dodi (Kondla, 1981), is geograph- AUTHOR CONTRIBUTIONS ically restricted and occupies a climatic niche that is distinct from Z.G.M., J.R.D., and F.A.H.S. developed the study design; Z.G.M. more southerly populations in Alberta and Saskatchewan. If genetic completed specimen collection and DNA extraction and purification; divergence between the northern and southern genetic clusters is C.S.D. designed and completed library preparation and sequenc- indeed driven by local adaptation to environmental conditions, con- ing; Z.G.M. completed the analyses with assistance from J.R.D. and tinued climate change and rising summer temperatures may lead to F.A.H.S. and advice from J.H.A., S.E.N.; Z.G.M. led the writing of this the displacement of the northern genetic cluster as genotypes and manuscript with assistance of all authors. associated phenological traits of southern populations become more advantageous across the northern extent of the range of P. m. dodi. DATA AVAILABILITY STATEMENT Quantitative data support these predictions. Mean tempera- DNA sequences in fastq format: Genbank accessions: ture of the warmest quarter differed by an average of 1.60°C be- SAMN13811887 - SAMN13812047; NCBI SRA: PRJNA600290. tween collection locations of individuals belonging to the northern Geographic data, environmental data, and Maxent input files: Dryad and southern genetic clusters (excluding migrant individuals). https://doi.org/10.5061/dryad.w0vt4​b8ms Based on the ClimateWNA model (Wang, Hamann, Spittlehouse, & Sampling locations of sequenced individuals: Dryad https://doi. Murdock, 2012), which provides climate data from 24 general cir- org/10.5061/dryad.w0vt4​b8ms culation models, mean annual temperature for Alberta is predicted to rise by 2.8°C– 4.2°C by the end of the century, contingent on ORCID emission scenarios (Schneider, 2013). Accordingly, growing de- Zachary G. MacDonald https://orcid.org/0000-0002-7966-5712 gree-days, based on a break point of 5°C, are estimated to increase Julian R. Dupuis https://orcid.org/0000-0002-6989-9179 33%–56% (Schneider, 2013). Within Alberta and Saskatchewan, Corey S. Davis https://orcid.org/0000-0003-4362-0659 changes in mean annual temperature, growing degree-days, and John H. Acorn https://orcid.org/0000-0001-6570-1949 vegetation composition are expected to be most pronounced in Scott E. Nielsen https://orcid.org/0000-0002-9754-0630 central and southern regions, including the present-day range of P. Felix A. H. Sperling https://orcid.org/0000-0001-5148-4226 m. dodi (Barber, Nielsen, & Hamann, 2016; Schneider, 2013; Zhang, Nielsen, Stolar, Chen, & Thuiller, 2015). There is some evidence that REFERENCES vagile North American butterfly species may track their climatic Aalberg Haugen, I. M., & Gotthard, K. (2015). Diapause induction and niches poleward as temperatures warm; however, more often than relaxed selection on alternative developmental pathways in a MACDONALD et al. | 15

butterfly. Journal of Ecology, 84(2), 464–472. https://doi. outbreeding and selfing populations. Molecular Ecology, 22(5), 1383– org/10.1111/1365-2656.12291 1399. https://doi.org/10.1111/mec.12182 Barber, Q. E., Nielsen, S. E., & Hamann, A. (2016). Assessing the vulner- Dennis, R. L., Shreeve, T. G., & Van Dyck, H. (2003). Towards a functional ability of rare plants using climate change velocity, habitat connec- resource-based concept for habitat: A butterfly biology viewpoint. tivity, and dispersal ability: A case study in Alberta. Canada. Regional Oikos, 417–426. Environmental Change, 16(5), 1433–1441. https://doi.org/10.1007/ Devlin, B., & Roeder, K. (1999). Genomic control for associ- s10113-015-0870-6 ation studies. Biometrics, 55(4), 997–1004. https://doi. Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery org/10.1111/j.0006-341X.1999.00997.x rate: A practical and powerful approach to multiple testing. Journal Dobzhansky, T. G. (1937). Genetics and the origin of species. New York: of the Royal Statistical Society: Series B (Methodological), 57(1), Columbia University Press. 289–300. Dupuis, J. R., Cullingham, C. I., Nielsen, S. E., & Sperling, F. A. H. (2019). Bird, C. D., Hilchie, G. J., Kondla, N. G., Pike, E. M., & Sperling, F. A. Environmental effects on gene flow in a species complex of vagile, H. (1995). Alberta butterflies. Edmonton, Alberta: The Provincial hilltopping butterflies. Biological Journal of the Linnean Society, 127(2), Museum of Alberta. 417–428. https://doi.org/10.1093/bioli​nnean/​blz043 Bohonak, A. J. (1999). Dispersal, gene flow, and population struc- Dupuis, J. R., Mori, B. A., & Sperling, F. A. H. (2016). Trogus parasitoids ture. The Quarterly Review of Biology, 74(1), 21–45. https://doi. of Papilio butterflies undergo extended diapause in western Canada org/10.1086/392950 (, ). Journal of Hymenoptera Research, Broquet, T., Ray, N., Petit, E., Fryxell, J. M., & Burel, F. (2006). Genetic iso- 50, 179. https://doi.org/10.3897/JHR.50.9158 lation by distance and landscape connectivity in the American mar- Dupuis, J. R., & Sperling, F. A. H. (2015). Repeated reticulate evolu- ten (Martes americana). Landscape Ecology, 21(6), 877–889. https:// tion in North American Papilio machaon group swallowtail but- doi.org/10.1007/s10980-005-5956-y terflies. PLoS One, 10, e0141882. https://doi.org/10.1371/journ​ Burke, R. J., Fitzsimmons, J. M., & Kerr, J. T. (2011). A mobility index al.pone.0141882 for Canadian butterfly species based on naturalists’ knowledge. Dupuis, J. R., & Sperling, F. A. H. (2016). Hybrid dynamics in a species Biodiversity and Conservation, 20(10), 2273–2295. https://doi. group of swallowtail butterflies. Journal of Evolutionary Biology, 29, org/10.1007/s10531-011-0088-y 1932–1951. https://doi.org/10.1111/jeb.12931 Burnham, K. P., & Anderson, D. R. (1998). Model selection and inference: A Earl, D. A., & vonHoldt, B. M. (2012). STRUCTURE HARVESTER: A practical information-theoretic approach. New York: Springer-Verlag. website and program for visualizing STRUCTURE output and imple- Challis, R. J., Kumar, S., Dasmahapatra, K. K. K., Jiggins, C. D., & Blaxter, menting the Evanno method. Conservation Genetics Resources, 4(2), M. (2016). Lepbase: The Lepidopteran genome database. BioRxiv, 359–361. https://doi.org/10.1007/s12686-011-9548-7 56994. Epps, C. W., Wehausen, J. D., Bleich, V. C., Torres, S. G., & Brashares, J. Clarke, R. T., Rothery, P., & Raybould, A. F. (2002). Confidence limits S. (2007). Optimizing dispersal and corridor models using landscape for regression relationships between distance matrices: Estimating genetics. Journal of Applied Ecology, 44(4), 714–724. https://doi. gene flow with distance. Journal of Agricultural, Biological, and org/10.1111/j.1365-2664.2007.01325.x Environmental Statistics, 7(3), 361. https://doi.org/10.1198/10857​ Evanno, G., Regnaut, S., & Goudet, J. (2005). Detecting the number 1102320 of clusters of individuals using the software STRUCTURE: A sim- Coulon, A., Cosson, J. F., Angibault, J. M., Cargnelutti, B., Galan, M., ulation study. Molecular Ecology, 14(8), 2611–2620. https://doi. Morellet, N., … Hewison, A. J. M. (2004). Landscape connectivity in- org/10.1111/j.1365-294X.2005.02553.x fluences gene flow in a roe deer population inhabiting a fragmented Fick, S. E., & Hijmans, R. J. (2017). WorldClim 2: New 1-km spatial res- landscape: An individual–based approach. Molecular Ecology, 13(9), olution climate surfaces for global land areas. International Journal 2841–2850. https://doi.org/10.1111/j.1365-294X.2004.02253.x of Climatology, 37(12), 4302–4315. https://doi.org/10.1002/ Coyne, J. A., & Orr, H. A. (2004). Speciation. Sunderland, MA: Sinauer joc.5086 Associates. Foll, M., & Gaggiotti, O. (2006). Identifying the environmental factors Crispo, E., Bentzen, P., Reznick, D. N., Kinnison, M. T., & Hendry, A. P. that determine the genetic structure of populations. Genetics, 174, (2006). The relative influence of natural selection and geography 875–891. https://doi.org/10.1534/genetics.106.059451​ on gene flow in guppies. Molecular Ecology, 15, 49–62. https://doi. Foll, M., & Gaggiotti, O. (2008). A genome-scan method to identify se- org/10.1111/j.1365-294X.2005.02764.x lected loci appropriate for both dominant and codominant mark- Cushman, S. A., McKelvey, K. S., Hayden, J., & Schwartz, M. K. (2006). ers: A Bayesian perspective. Genetics, 180, 977–993. https://doi. Gene flow in complex landscapes: Testing multiple hypotheses with org/10.1534/genetics.108.092221​ causal modeling. The American Naturalist, 168(4), 486–499. https:// Fourtune, L., Prunier, J. G., Paz-Vinas, I., Loot, G., Veyssière, C., & Blanchet, doi.org/10.1086/506976 S. (2018). Inferring causalities in landscape genetics: An extension Cushman, S., Wasserman, T., Landguth, E., & Shirk, A. (2013). Re- of Wright’s causal modeling to distance matrices. The American evaluating causal modeling with Mantel tests in landscape genetics. Naturalist, 191(4), 491–508. https://doi.org/10.1086/696233 Diversity, 5(1), 51–72. https://doi.org/10.3390/d5010051 Friberg, M., Bergman, M., Kullberg, J., Wahlberg, N., & Wiklund, C. Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, (2008). Niche separation in space and time between two sympatric M. A., … Durbin, R. (2011). The variant call format and VCFtools. sister species – A case of ecological pleiotropy. Evolutionary Ecology, Bioinformatics, 27(15), 2156–2158. https://doi.org/10.1093/bioin​ 22, 1–18. https://doi.org/10.1007/s10682-007-9155-y forma​tics/btr330 Frichot, E., & François, O. (2015). LEA: An R package for landscape and Dawkins, R. (1976). The selfish gene. New York: Oxford University Press. ecological association studies. Methods in Ecology and Evolution, 6(8), De Kort, H., Vandepitte, K., Mergeay, J., Mijnsbrugge, K. V., & Honnay, O. 925–929. https://doi.org/10.1111/2041-210X.12382 (2015). The population genomic signature of environmental selection Frichot, E., Schoville, S. D., Bouchard, G., & François, O. (2013). Testing in the widespread -pollinated tree species Frangula alnus at dif- for associations between loci and environmental gradients using la- ferent geographical scales. Heredity, 115(5), 414–425. tent factor mixed models. Molecular Biology & Evolution, 30(7), 1687– De Mita, S., Thuillet, A. C., Gay, L., Ahmadi, N., Manel, S., Ronfort, J., 1699. https://doi.org/10.1093/molbev/mst063​ & Vigouroux, Y. (2013). Detecting selection along environmental Grace, J. B. (2006). Structural equation modeling and natural systems. gradients: Analysis of eight methods and their effectiveness for Cambridge, UK: Cambridge University Press. 16 | MACDONALD et al.

Greenwood, P. J. (1980). Mating systems, philopatry and dispersal in Manel, S., Schwartz, M. K., Luikart, G., & Taberlet, P. (2003). Landscape birds and mammals. Animal Behaviour, 28, 1140–1162. https://doi. genetics: Combining landscape ecology and population genetics. org/10.1016/S0003-3472(80)80103-5 Trends in Ecology & Evolution, 18(4), 189–197. https://doi.org/10.1016/ Günther, T., & Coop, G. (2013). Robust identification of local adapta- S0169-5347(03)00008-9 tion from allele frequencies. Genetics, 195(1), 205–220. https://doi. Martin, M. (2011). Cutadapt removes adapter sequences from org/10.1534/genetics.113.152462​ high-throughput sequencing reads. Embnet Journal, 17(1), 10–12. Hanski, I. (1988). Four kinds of extra long diapause in : A review of https://doi.org/10.14806/ej.17.1.200​ theory and observations. Annales Zoologici Fennici, 25, 37–53. Martin, S. H., Dasmahapatra, K. K., Nadeau, N. J., Salazar, C., Walters, J. Hanski, I. (1998). Metapopulation dynamics. Nature, 396, 41–49. https:// R., Simpson, F., … Jiggins, C. D. (2013). Genome-wide evidence for doi.org/10.1038/23876 speciation with gene flow in Heliconius butterflies. Genome Research, Harr, B. (2006). Genomic islands of differentiation between house mouse 23(11), 1817–1828. subspecies. Genome Research, 16, 730–737. https://doi.org/10.1101/ Martins, K., Gugger, P. F., Llanderal-Mendoza, J., González-Rodríguez, gr.5045006 A., Fitz-Gibbon, S. T., Zhao, J. L., … Sork, V. L. (2018). Landscape ge- Hijmans, R. J., Phillips, S., Leathwick, J., & Elith, J. (2011). Package ‘dismo’. nomics provides evidence of climate-associated genetic variation in Available online at: http://cran.r-proje​ct.org/web/packa​ges/dismo/​ Mexican populations of Quercus rugosa. Evolutionary Applications, index.html 11(10), 1842–1858. Hijmans, R. J., & van Etten, J. (2012). raster: Geographic analysis and mod- Mayr, E. (1963). Animal species and evolution. Cambridge, MA: Harvard eling with raster data. R package version 2.0-12. http://CRAN.R-proje​ University Press. ct.org/packa​ge=raster McRae, B. H. (2006). Isolation by resistance. Evolution, 60, 1551–1561. Jombart, T. (2008). adegenet: A R package for the multivariate analy- https://doi.org/10.1111/j.0014-3820.2006.tb00500.x​ sis of genetic markers. Bioinformatics, 24, 1403–1405. https://doi. McRae, B. H., & Beier, P. (2007). Circuit theory predicts gene flow in org/10.1093/bioin​forma​tics/btn129 plant and animal populations. Proceedings of the National Academy Jombart, T., Devillard, S., & Balloux, F. (2010). Discriminate analysis of of Sciences USA, 104, 19885–19890. https://doi.org/10.1073/ principal components: A new method for the analysis of genetically pnas.07065​68104 structured populations. BMC Genetics, 11, 94. pmid:20950446 Meirmans, P. G. (2012). The trouble with isolation by dis- Keller, D., & Holderegger, R. (2013). Damselflies use different move- tance. Molecular Ecology, 21(12), 2839–2846. https://doi. ment strategies for short-and long-distance dispersal. Insect org/10.1111/j.1365-294X.2012.05578.x Conservation and Diversity, 6(5), 590–597. https://doi.org/10.1111/ Narum, S. R., & Hess, J. E. (2011). Comparison of FST outlier tests for icad.12016 SNP loci under selection. Molecular Ecology Resources, 11, 184–194. Kondla, N. G. (1981). Type localities of the Badlands Old World Nosil, P. (2004). Reproductive isolation caused by visual predation on Swallowtail in Alberta. Blue Jay, 39, 144. migrants between divergent environments. Proceedings of the Royal Kraft, N. J. B., Adler, P. B., Godoy, O., James, E. C., Fuller, S., & Levine, J. Society of London. Series B: Biological Sciences, 271(1547), 1521–1528. M. (2015). Community assembly, coexistence and the environmen- https://doi.org/10.1098/rspb.2004.2751 tal filtering metaphor. Functional Ecology, 29, 592–599. https://doi. Nosil, P. (2012). Ecological speciation. Oxford: Oxford University Press. org/10.1111/1365-2435.12345 Nosil, P., Vines, T. H., & Funk, D. J. (2005). Reproductive isolation caused Kukal, O., Ayres, M. P., & Scriber, J. M. (1991). Cold tolerance of the pupae by natural selection against immigrants from divergent habitats. in relation to the distribution of swallowtail butterflies. Canadian Evolution, 59, 705–719. https://doi.org/10.1111/j.0014-3820.2005. Journal of Zoology, 69(12), 3028–3037. https://doi.org/10.1139/ tb017​47.x z91-427 Nowicki, P., Vrabec, V., Binzenhöfer, B., Feil, J., Zakšek, B., Hovestadt, T., Lee, C. R., & Mitchell-Olds, T. (2011). Quantifying effects of envi- & Settele, J. (2014). Butterfly dispersal in inhospitable matrix: Rare, ronmental and geographical factors on patterns of genetic dif- risky, but long-distance. Landscape Ecology, 29(3), 401–412. https:// ferentiation. Molecular Ecology, 20, 4631–5464. https://doi. doi.org/10.1007/s10980-013-9971-0 org/10.1111/j.1365-294X.2011.05310.x O'Connell, K. A., Mulder, K. P., Maldonado, J., Currie, K. L., & Ferraro, Lewthwaite, J. M. M., Angert, A. L., Kembel, S. W., Goring, S. J., Davies, D. M. (2019). Sampling related individuals within ponds biases T. J., Mooers, A. Ø., … Kerr, J. T. (2018). Canadian butterfly climate estimates of population structure in a pond-breeding amphibian. debt is significant and correlated with range size. Ecography, 41(12), Ecology and Evolution, 9(6), 3620–3636. https://doi.org/10.1002/ 2005–2015. https://doi.org/10.1111/ecog.03534 ece3.4994 Li, H. (2013). Aligning sequence reads, clone sequences and assembly contigs Oksanen, J., Kindt, R., Legendre, P., O’Hara, B., Stevens, M. H. H., with BWA-MEM. arXiv preprint arXiv:1303.3997. Oksanen, M. J., & Suggests, M. A. S. S. (2007). The vegan package. Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Community Ecology Package, 10, 631–637. Burrows-Wheeler transform. Bioinformatics, 25(14), 1754–1760. Pebesma, E. J., & Bivand, R. S. (2005). Classes and methods for spatial https://doi.org/10.1093/bioin​forma​tics/btp324 data in R. R News, 5(2), 9–13. https://cran.r-proje​ct.org/doc/Rnews/. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … Durbin, Peterman, W. E. (2018). ResistanceGA: An R package for the op- R. (2009). The sequence alignment/map format and SAMtools. timization of resistance surfaces using genetic algorithms. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioin​ Methods in Ecology and Evolution, 9(6), 1638–1647. https://doi. formatics/btp352​ org/10.1111/2041-210X.12984 Lotterhos, K. E., & Whitlock, M. C. (2014). Evaluation of demographic his- Peterman, W. E., Connette, G. M., Semlitsch, R. D., & Eggert, L. S. (2014). tory and neutral parameterization on the performance of FST outlier Ecological resistance surfaces predict fine-scale genetic differentia- tests. Molecular Ecology, 23(9), 2178–2192. tion in a terrestrial woodland salamander. Molecular Ecology, 23(10), MacDonald, Z. G., Anderson, I. D., Acorn, J. H., & Nielsen, S. E. (2018). 2402–2413. https://doi.org/10.1111/mec.12747 Decoupling habitat fragmentation from habitat loss: Butterfly spe- Peterson, B. K., Weber, J. N., Kay, E. H., Fisher, H. S., & Hoekstra, H. E. cies mobility obscures fragmentation effects in a naturally frag- (2012). Double digest RADseq: An inexpensive method for de novo mented landscape of lake islands. Oecologia, 186(1), 11–27. https:// SNP discovery and genotyping in model and non-model species. PLoS doi.org/10.1007/s00442-017-4005-2 One, 7(5), e37135. https://doi.org/10.1371/journ​al.pone.0037135 MACDONALD et al. | 17

Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy Scriber, J. M., Maher, E., & Aardema, M. L. (2012). Differential effects modeling of species geographic distributions. Ecological Modelling, of short term winter thermal stress on diapausing tiger swallowtail 190, 231–259. https://doi.org/10.1016/j.ecolm​odel.2005.03.026 butterflies (Papilio spp.). Insect Science, 19(3), 277–285. Phillipsen, I. C., Kirk, E. H., Bogan, M. T., Mims, M. C., Olden, J. D., Seger, J., & Brockmann, H. J. (1987). What is bet-hedging? In P. H. Harvey, & Lytle, D. A. (2015). Dispersal ability and habitat requirements & L. Partridge (Eds.), Oxford surveys in evolutionary biology (pp. 182– determine landscape-level genetic patterns in desert aquatic in- 211). Oxford: Oxford University Press. sects. Molecular Ecology, 24(1), 54–69. https://doi.org/10.1111/ Shirk, A. J., Landguth, E. L., & Cushman, S. A. (2017). A comparison of mec.13003 individual-based genetic distance metrics for landscape genet- Poland, J. A., Brown, P. J., Sorrells, M. E., & Jannink, J. L. (2012). ics. Molecular Ecology Resources, 17(6), 1308–1317. https://doi. Development of high-density genetic maps for barley and org/10.1111/1755-0998.12684 wheat using a novel two-enzyme genotyping-by-sequencing ap- Shirk, A. J., Wallin, D. O., Cushman, S. A., Rice, C. G., & Warheit, K. I. proach. PLoS One, 7(2), e32253. https://doi.org/10.1371/journ​ (2010). Inferring landscape effects on gene flow: A new model se- al.pone.0032253 lection framework. Molecular Ecology, 19, 3603–3619. https://doi. Pritchard, J. K., Stephens, M., & Donnelly, P. (2000). Inference of pop- org/10.1111/j.1365-294X.2010.04745.x ulation structure using multilocus genotype data. Genetics, 155(2), Slatkin, M. (1987). Gene flow and the geographic structure of natural 945–959. populations. Science, 236, 787–792. https://doi.org/10.1126/scien​ Pruisscher, P., Nylin, S., Gotthard, K., & Wheat, C. W. (2018). Genetic ce.3576198 variation underlying local adaptation of diapause induction along a Sork, V. L., Nason, J., Campbell, D. R., & Fernandez, J. F. (1999). cline in a butterfly. Molecular Ecology, 27(18), 3613–3626. https://doi. Landscape approaches to historical and contemporary gene flow org/10.1111/mec.14829 in plants. Trends in Ecology & Evolution, 14(6), 219–224. https://doi. Quinlan, A. R., & Hall, I. M. (2010). BEDTools: A flexible suite of utili- org/10.1016/S0169-5347(98)01585-7 ties for comparing genomic features. Bioinformatics, 26(6), 841–842. Sperling, F. A. H. (1987). Evolution of the Papilio machaon species https://doi.org/10.1093/bioin​forma​tics/btq033 group in western Canada (Lepidoptera: Papilionidae). Quaestiones Richardson, J. L., Brady, S. P., Wang, I. J., & Spear, S. F. (2016). Navigating Entomologica, 23, 198–315. the pitfalls and promise of landscape genetics. Molecular Ecology, Sperling, F. A. (1990). Natural hybrids of Papilio (Insecta: Lepidoptera): 25(4), 849–863. https://doi.org/10.1111/mec.13527 Poor or interesting evolutionary problem? Canadian Rochette, N. C., Rivera-Colón, A. G., & Catchen, J. M. (2019). Stacks 2: Journal of Zoology, 68(8), 1790–1799. Analytical methods for paired-end sequencing improve RADseq- Storfer, A., Murphy, M. A., Spear, S. F., Holderegger, R., & Waits, L. P. (2010). based population genomics. Molecular Ecology, 28(21), 4737–4754. Landscape genetics: Where are we now? Molecular Ecology, 19(17), https://doi.org/10.1111/mec.15253 3496–3514. https://doi.org/10.1111/j.1365-294X.2010.04691.x Rosseel, Y. (2012). Lavaan: An R package for structural equation model- Thorpe, R. S., Surget-Groba, Y., & Johansson, H. (2008). The relative ing and more. Version 0.5–12 (BETA). Journal of Statistical Software, importance of ecology and geographic isolation for speciation in 48(2), 1–36. anoles. Philosophical Transactions of the Royal Society B: Biological Rousset, F. (1997). Genetic differentiation and estimation of gene Sciences, 363, 3071–3081. https://doi.org/10.1098/rstb.2008.0077 flow from F-statistics under isolation by distance. Genetics, 145, Tuljapurkar, S. (1990). Delayed reproduction and fitness in variable en- 1219–1228. vironments. Proceedings of the National Academy of Sciences, 87(3), Row, J. R., Knick, S. T., Oyler-McCance, S. J., Lougheed, S. C., & Fedy, B. 1139–1143. https://doi.org/10.1073/pnas.87.3.1139 C. (2017). Developing approaches for linear mixed modeling in land- Turner, T. L., Hahn, M. W., & Nuzhdin, S. V. (2005). Genomic islands of scape genetics through landscape-directed dispersal simulations. speciation in Anopheles gambiae. PLOS Biology, 3, e285. https://doi. Ecology and Evolution, 7(11), 3751–3761. https://doi.org/10.1002/ org/10.1371/journ​al.pbio.0030285 ece3.2825 UniProt Consortium. (2018). UniProt: A worldwide hub of protein knowl- Ruiz-Gonzalez, A., Cushman, S. A., Madeira, M. J., Randi, E., & Gómez- edge. Nucleic Acids Research, 47(D1), D506–D515. Moliner, B. J. (2015). Isolation by distance, resistance and/or clus- Vähä, J. P., & Primmer, C. R. (2006). Efficiency of model-based Bayesian ters? Lessons learned from a forest-dwelling carnivore inhabiting a methods for detecting hybrid individuals under different hybridiza- heterogeneous landscape. Molecular Ecology, 24, 5110–5129. https:// tion scenarios and with different numbers of loci. Molecular Ecology, doi.org/10.1111/mec.13392 15(1), 63–72. https://doi.org/10.1111/j.1365-294X.2005.02773.x Ryan, S. F., Valella, P., Thivierge, G., Aardema, M. L., & Scriber, J. M. (2018). Van Buskirk, J., & van Rensburg, A. J. (2020). Relative importance of iso- The role of latitudinal, genetic and temperature variation in the in- lation-by-environment and other determinants of gene flow in an duction of diapause of Papilio glaucus (Lepidoptera: Papilionidae). alpine amphibian. Evolution, 74, 962–978. https://doi.org/10.1111/ Insect Science, 25(2), 328–336. evo.13955 Ryder, O. A. (1986). Species conservation and systematics: The dilemma van Etten, J. (2018). gdistance: Distances and routes on geographical grids. of subspecies. Trends in Ecology and Evolution, 1, 9–10. https://doi. R package version (p. 1.2-2.). https://CRAN.R-proje​ct.org/packa​ org/10.1016/0169-5347(86)90059-5 ge=gdist​ance Sánchez-Ramírez, S., Rico, Y., Berry, K. H., Edwards, T., Karl, A. E., Vekemans, X., & Hardy, O. J. (2004). New insights from fine-scale spatial Henen, B. T., & Murphy, R. W. (2018). Landscape limits gene flow and genetic structure analyses in plant populations. Molecular Ecology, 13, drives population structure in Agassiz’s desert tortoise (Gopherus 921–935. https://doi.org/10.1046/j.1365-294X.2004.02076.x agassizii). Scientific Reports, 8(1), 11231. https://doi.org/10.1038/ Vignieri, S. N. (2005). Streams over mountains: Influence of riparian s41598-018-29395-6 connectivity on gene flow in the Pacific jumping mouse (Zapus Schneider, R. R. (2013). Alberta's natural subregions under a changing cli- trinotatus). Molecular Ecology, 14(7), 1925–1937. https://doi. mate: Past, present, and future. Biodiversity Management and Climate org/10.1111/j.1365-294X.2005.02568.x Change Adaptation Project. Wang, I. J., & Bradburd, G. S. (2014). Isolation by environment. Molecular Schwartz, M. K., McKelvey, K. S., Cushman, S. A., & Luikart, G. (2010). Ecology, 23(23), 5649–5662. https://doi.org/10.1111/mec.12938 Landscape genomics: A brief perspective. In S. A. Cushman, & F. Wang, I. J., Glor, R. E., & Losos, J. B. (2012). Quantifying the roles of Huettmann (Eds.), Spatial complexity, informatics, and wildlife conser- ecology and geography in spatial genetic divergence. Ecology Letters, vation (pp. 165–174). New York: Springer Publishing. 16(2), 175–182. https://doi.org/10.1111/ele.12025 18 | MACDONALD et al.

Wang, I. J., Savage, W. K., & Bradley Shaffer, H. (2009). Landscape genet- Yoshio, M., & Ishii, M. (2001). Relationship between cold hardiness and ics and least-cost path analysis reveal unexpected dispersal routes in northward invasion in the great mormon butterfly, Papilio memnon L. the California tiger salamander (Ambystoma californiense). Molecular (Lepidoptera: Papilionidae) in Japan. Applied Entomology and Zoology, Ecology, 18(7), 1365–1374. 36(3), 329–335. Wang, I. J., & Summers, K. (2010). Genetic structure is correlated with Zhan, S., Zhang, W., Niitepold, K., Hsu, J., Haeger, J. F., Zalucki, M. P., … phenotypic divergence rather than geographic isolation in the highly Kronforst, M. R. (2014). The genetics of monarch butterfly migration polymorphic strawberry poison-dart frog. Molecular Ecology, 19, and warning colouration. Nature, 514(7522), 317–321. 447–458. https://doi.org/10.1111/j.1365-294X.2009.04465.x Zhang, J., Nielsen, S. E., Stolar, J., Chen, Y., & Thuiller, W. (2015). Gains Wang, J. (2017). The computer program STRUCTURE for assigning in- and losses of plant species and phylogenetic diversity for a northern dividuals to populations: Easy to use but easier to misuse. Molecular high-latitude region. Diversity and Distributions, 21(12), 1441–1454. Ecology Resources, 17(5), 981–990. https://doi.org/10.1111/ddi.12365 Wang, T., Hamann, A., Spittlehouse, D. L., & Murdock, T. Q. (2012). Zheng, X., Levine, D., Shen, J., Gogarten, S. M., Laurie, C., & Weir, B. S. ClimateWNA—high-resolution spatial climate data for western North (2012). A high-performance computing toolset for relatedness and America. Journal of Applied Meteorology and Climatology, 51(1), 16–29. principal component analysis of SNP data. Bioinformatics, 28(24), https://doi.org/10.1175/JAMC-D-11-043.1 3326–3328. https://doi.org/10.1093/bioin​forma​tics/bts606 Wang, Y. H., Yang, K. C., Bridgman, C. L., & Lin, L. K. (2008). Habitat suitability modelling to correlate gene flow with landscape connec- tivity. Landscape Ecology, 23(8), 989–1000. https://doi.org/10.1007/ SUPPORTING INFORMATION s10980-008-9262-3 Wiens, J. A. (2001). The landscape context of dispersal. In J. Clobert, E. Additional supporting information may be found online in the Danchin, A. A. Dhondt, & J. D. Nichols (Eds.), Dispersal: Individual, Supporting Information section. population, and community (pp. 96–109). Oxford, UK: Oxford University Press. Wogan, G. O., Yuan, M. L., Mahler, D. L., & Wang, I. J. (2020). Genome- How to cite this article: MacDonald ZG, Dupuis JR, Davis CS, wide epigenetic isolation by environment in a widespread Anolis liz- Acorn JH, Nielsen SE, Sperling FAH. Gene flow and climate- ard. Molecular Ecology, 29(1), 40–55. Wright, S. (1921). Correlation and causation. Journal of Agricultural associated genetic variation in a vagile habitat specialist. Mol Research, 20, 557–585. Ecol. 2020;00:1–18. https://doi.org/10.1111/mec.15604 Wright, S. (1943). Isolation by distance. Genetics, 28(2), 114.