Full paper

Effects of landscapes and range expansion on population structure and local adaptation

Wei Zhao1,2, Yan-Qiang Sun1, Jin Pan2, Alexis R. Sullivan2, Michael L. Arnold3, Jian-Feng Mao1 and Xiao-Ru Wang1,2

1Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, College of Biological Sciences and Technology, Beijing Forestry University, 100083 Beijing, China; 2Department of Ecology and Environmental Science, UPSC, Umeå University, SE-901 87, Umeå, Sweden; 3Department of Genetics, University of Georgia, Athens, GA 30602-7223, USA

Summary

• Understanding the origin and distribution of genetic diversity across landscapes is critical for predicting the future of organisms in changing climates. This study investigated how adaptive and demographic forces have shaped diversity and population structure in Pinus densata, a keystone species on the Qinghai-Tibetan Plateau (QTP).
• We examined the distribution of genomic diversity across the range of P. densata using exome capture sequencing. We applied spatially explicit tests to dissect the impacts of allele surfing, geographic isolation and environmental gradients on population differentiation, and forecasted how this genetic legacy may limit the persistence of P. densata in future climates.
• We found that allele surfing from range expansion could explain the distribution of 39% of the c. 48 000 genotyped single nucleotide polymorphisms (SNPs). Uncorrected, these allele frequency clines severely confounded inferences of selection. After controlling for demographic processes, isolation-by-environment explained 9.2–19.5% of the genetic structure, with c. 4.0% of loci being affected by selection. Allele surfing and genotype–environment associations resulted in genomic mismatch under projected climate scenarios.
• We illustrate that significant local adaptation, when coupled with reduced diversity as a result of demographic history, constrains the potential evolutionary response to climate change. The strong signal of genomic vulnerability in P. densata may be representative of other QTP endemics.

Authors for correspondence:
Jian-Feng Mao
Tel: +86 13366181735
Email: [email protected]

Xiao-Ru Wang
Tel: +46 907869955
Email: [email protected]

Received: 13 January 2020
Accepted: 15 April 2020

New Phytologist (2020) 228: 330–343
doi: 10.1111/nph.16619

Key words: allele frequency cline, exome sequences, genomic mismatch, local adaptation, nucleotide diversity, Pinus densata, Qinghai-Tibetan Plateau.

Introduction

The ability of a species to sustain environmental change is primarily determined by its genetic reservoir, which is shaped over the course of history through demography and selection. Dissecting the effects of demography, geography and selection on population diversity helps us to understand how genetic variation is distributed across a landscape, as well as the evolutionary potential of species under climate change (Sork et al., 1999; Lee & Mitchell-Olds, 2011; Manel & Holderegger, 2013; Orsini et al., 2013).

Sources of genetic differentiation can be broadly classified into adaptive and dispersal-demographic factors. Among the latter, isolation by distance (IBD; Wright, 1943), a neutral process in which gene flow is increasingly limited between more distant or isolated populations, is a well-studied source of clinal variation. More recently, microbial experiments and simulation studies have illustrated the importance of density-dependent effects during range expansion in producing strong clines or even discrete genetic sectors (Excoffier & Ray, 2008; Excoffier et al., 2009; Waters et al., 2013; Peischl et al., 2016). Repeated founder effects combined with density-dependent competitive exclusion can allow alleles to 'surf' to very high or low frequencies, which may leave a molecular signature similar to that of selection (Edmonds et al., 2004; Klopfstein et al., 2006; Excoffier & Ray, 2008). Clines produced by allele surfing can overlap with those produced by IBD, but surfing can also result in strong differentiation between geographically proximate populations. Until now, evidence for the impact of range expansion on allele frequency clines (AFCs) has come mostly from theoretical simulations (Klopfstein et al., 2006; Lotterhos & Whitlock, 2015; Hoban et al., 2016), with few empirical studies in natural populations, especially in plants (but see González-Martínez et al., 2017; Ruiz Daniels et al., 2018). Detecting adaptive sources of genetic differentiation is often confounded by dispersal-demographic factors including IBD and allele surfing, but natural populations are widely expected to experience isolation by environment (IBE; Orsini et al., 2013; Wang & Bradburd, 2014), in which gene flow among populations inhabiting different ecological habitats is limited by selection (Nosil et al., 2009; Feder et al., 2012a). Although methods exist to control for the ubiquitous autocorrelation of IBE and IBD, distinguishing selection from dispersal-demographic effects during range expansion remains a challenge.
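To make the serial-founder mechanism behind allele surfing concrete, the short Python sketch below simulates a one-dimensional stepping-stone expansion in which each new deme is founded by a few colonists sampled from its neighbour. It is a toy illustration under stated assumptions (a single biallelic locus, binomial founder sampling, instantaneous growth after founding, and no migration, mutation, selection or density-dependent exclusion); the function name and parameters are hypothetical and are not part of the analyses in this study. Averaged over replicates, expected heterozygosity should fall by a factor of about 1 − 1/(2N) per founding event, where N is the number of diploid founders, while individual replicates drift toward loss or fixation.

```python
import numpy as np


def serial_founder_heterozygosity(n_demes=20, n_founders=5, p0=0.5,
                                  n_reps=5000, seed=1):
    """Toy 1-D range expansion driven by serial founder events.

    Hypothetical helper: each new deme is founded by `n_founders` diploid
    colonists drawn from the previous deme (binomial sampling of
    2 * n_founders gene copies) and is assumed to grow immediately to a
    large size, so no further drift occurs inside it. Returns the expected
    heterozygosity 2p(1 - p) in each deme, averaged over `n_reps`
    independent replicate expansions.
    """
    rng = np.random.default_rng(seed)
    copies = 2 * n_founders
    p = np.full(n_reps, p0)  # allele frequency in the source (first) deme
    mean_het = [np.mean(2 * p * (1 - p))]
    for _ in range(1, n_demes):
        # Founder sampling: the step at which an allele can 'surf'
        p = rng.binomial(copies, p) / copies
        mean_het.append(np.mean(2 * p * (1 - p)))
    return np.array(mean_het)


if __name__ == "__main__":
    het = serial_founder_heterozygosity()
    expected = 0.5 * (1 - 1 / 10) ** np.arange(20)  # analytic expectation
    for deme in (0, 5, 10, 19):
        print(deme, round(het[deme], 3), round(expected[deme], 3))
```

With the defaults (five founders, i.e. 10 gene copies per founding event), the simulated mean heterozygosity should fall from 0.5 to roughly 0.5 × 0.9^19 ≈ 0.07 by the last deme.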

New Phytologist (2020) 228: 330–343 © 2020 The Authors. New Phytologist © 2020 New Phytologist Trust. www.newphytologist.com. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.

Patterns of IBE can be used to identify relationships between genomic and environmental variation, which can be projected onto future climate models to estimate the vulnerability of extant populations to extinction (Manel & Holderegger, 2013; Fitzpatrick & Keller, 2015; Bay et al., 2018). Understanding the genomic mismatch between modern and future environments is necessary for assessing the ability of populations to persist. Populations with high genomic mismatch are likely to suffer population decline if de novo mutations and migration cannot compensate for the required diversity (Bay et al., 2018; Ruegg et al., 2018). Moreover, the potential for populations to adapt interacts with demographic history, and range expansions may constrain viability by reducing both the pool of genetic diversity and the effectiveness of selection to purge deleterious alleles.

Pinus densata forms extensive forests on the southeastern Qinghai-Tibetan Plateau (QTP) at elevations ranging from 2700 to 4200 m above sea level (Mao & Wang, 2011). Previous genetic analyses suggest that P. densata originated from hybridization between Pinus tabuliformis and Pinus yunnanensis in the late Miocene (Wang & Szmidt, 1994; Wang et al., 2011; Gao et al., 2012). An ancient hybrid zone has been identified at the northeastern edge of the current distribution of P. densata, from where the hybrid lineage successfully colonized new habitats by stepwise, westward migration (Wang et al., 2011; Gao et al., 2012). This demographic history could produce significant AFCs along the expansion route as a result of genetic surfing. Reciprocal transplant experiments revealed pronounced differences in survival among P. densata populations, suggesting extensive local adaptation (Zhao et al., 2014). These lines of evidence suggest that demographic events and local adaptation have played important roles in the evolution of P. densata. However, until now it has been difficult to fully address the relative contribution of these forces because of the lack of genomic resources and of methods that adequately distinguish between AFCs generated by dispersal-demographic effects and IBE.

In this study, we used spatially explicit tests to identify the signature of allele surfing, IBD and IBE, and forecasted how this genetic legacy might constrain the persistence and evolution of P. densata in future climates. We hypothesize that: the stepwise expansion history of P. densata generated substantial AFCs along the east–west colonization axis; adaptation to heterogeneous plateau habitats marked the genome with the signature of IBE; and strong AFCs coupled with decreased diversity result in high genomic mismatch to future climates. To test these hypotheses, we sampled 23 populations across the range of P. densata and examined the allele frequency distributions over 40 000 exome regions. We estimated the magnitude of AFCs resulting from range expansion, investigated gene–environment associations and partitioned the contribution of environment and geography to among-population differentiation, with and without clinal loci. Using current genotype–environment associations as a baseline, we projected the risk of population decline under future climate change scenarios. Our study provides a concrete dissection of the complex evolutionary forces that shape population structure in a keystone alpine species and highlights the vulnerability of endemics to climate change.

Materials and Methods

Sampling and exome capture sequencing

We sampled 23 populations across the distribution of P. densata (Fig. 1a). The name, location and sample size of each population are listed in Table 1. Because of its hybrid history, we included two and four representative populations of P. tabuliformis and P. yunnanensis, respectively, to better polarize the genetic components in P. densata. Populations from the eastern margin of P. densata (group E populations in this study) have a mix of mitochondrial DNA (mtDNA) haplotypes found in the two parental species, reflecting heterogeneous maternal lineages, but predominantly P. tabuliformis chloroplast DNA (cpDNA) haplotypes as a result of pollen-mediated introgression (cpDNA is paternally inherited in pines; Wang et al., 2011). The frequency of haplotypes unique to P. densata for both organelles increases westward, which supports the eastern margin as the ancient hybrid zone where P. densata originated (Wang et al., 2011; Gao et al., 2012). Despite their organelle haplotype overlap, P. densata populations in the east are distinct from the two parental species in cone and seed morphometric traits (Mao et al., 2008).

Needles or cones were collected from four to 12 randomly selected trees in each stand. Genomic DNA was extracted from needles or seedlings using a Plant Genomic DNA kit (Tiangen, Beijing, China). Forty thousand exome probes (each 120 nt) were designed from Pinus taeda UniGenes (Neves et al., 2013). The majority were aligned to c. 29 000 genes, while 9800 probes were aligned to intergenic regions. Library preparation, probe hybridization and sequencing were conducted by RAPiD Genomics (Gainesville, FL, USA; Neves et al., 2013). In total, 208 trees were genotyped.

Reduced reference genome preparation, mapping and variant calling

Genomes in Pinus are > 20 Gbp and computationally challenging for population genomic analyses. We thus prepared a reduced reference genome from the P. taeda v.1.01 assembly (Neale et al., 2014) following the approach of Yeaman et al. (2016). Briefly, any scaffolds to which the capture probes aligned were retained as the reduced reference genome. Exome sequence reads for 58 individuals (two per population) were first aligned to the whole genome using the Burrows–Wheeler Aligner (BWA-MEM) algorithm with default parameters (Li, 2013). Variants were called with the SAMTOOLS and BCFTOOLS pipeline using default parameters (Li, 2011). Scaffolds that had at least one single nucleotide polymorphism (SNP) in ≥ 50% of the individuals were also included in the reduced reference. In total, 55 300 scaffolds were included in the reduced reference genome.

Sequence read quality was assessed with FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapter sequences and low-quality bases (Phred quality < 20) were removed using TRIMMOMATIC (Bolger et al., 2014). Reads shorter than 36 bases after trimming were discarded. Clean reads were mapped to the reduced reference genome using the BWA-MEM algorithm with


[Figure 1 (image not reproduced). (a) Map of genomic clusters, longitude 90–104°E, latitude 24–34°N; legend: Pd E, Pd C, Pd SW, Pd W, Pt, Py. (b) Ancestry coefficients per individual, populations ordered 28–29 (Pt), 1–4 (Pd east, E), 5–14 (Pd central, C), 15–19 (Pd southwest, SW), 20–23 (Pd west, W) and 24–27 (Py). (c) PCA scatter plots with axes PC1 (19.9%), PC2 (5.2%) and PC3 (3.5%). (d) Pairwise FST between P. densata groups, reproduced below.]

| | Group E | Group C | Group SW | Group W |
|---|---|---|---|---|
| Group E | – | 0.184 (0.270) | 0.219 (0.350) | 0.234 (0.424) |
| Group C | 0.405 (0.150) | – | 0.070 (0.063) | 0.093 (0.120) |
| Group SW | 0.386 (0.231) | 0.109 (0.158) | – | 0.081 (0.063) |
| Group W | 0.386 (0.180) | 0.150 (0.132) | 0.270 (0.210) | – |

Fig. 1 Population genomic structure in Pinus densata. (a) Spatial interpolation of five genomic clusters inferred by FASTSTRUCTURE. Black dots are population locations. Colors represent different ancestry groups. (b) Ancestry assignment for 29 populations of Pinus tabuliformis (Pt), Pinus yunnanensis (Py) and P. densata (Pd) at K = 5. Each bar represents an individual, with different colors reflecting varying ancestry. (c) Principal component analysis (PCA) of the three species Pt, Py and Pd, with different colors reflecting different species/groups. (d) Genetic differentiation (FST) between groups of P. densata on all 47 612 single nucleotide polymorphisms (SNPs) and on 18 539 allele frequency cline (AFC) SNPs (in parentheses) above the diagonal, and on outliers detected by PCADAPT (1790 SNPs) and BAYENV (2025 SNPs, in parentheses) below the diagonal.

default parameters (Li, 2013). PCR duplicates were removed using PICARD MARKDUPLICATES (http://broadinstitute.github.io/picard/). Reads around putative insertions and deletions were locally realigned using REALIGNERTARGETCREATOR and INDELREALIGNER in the Genome Analysis Toolkit (GATK v.3.5-0; Van der Auwera et al., 2013). Variant calling was performed for each individual using HaplotypeCaller. GENOTYPEGVCFS was then used to perform multisample joint aggregation and genotype likelihood correction, with the parameter '-includeNonVariantSites'.

Several filtering steps were performed to minimize SNP calling errors: SNPs within 5 bp of any indel were removed; GATK hard filters were set to QD < 2.0, FS > 60.0, SOR > 4.0, MQ < 40.0, MQRankSum < −12.5 and ReadPosRankSum < −8.0; genotypes with genotype quality (GQ) < 20 or read depth (DP) < 5 were masked; and SNPs that met any of the following criteria were removed: missing rate > 20%, minor allele frequency (MAF) < 5%, heterozygosity > 70% or allele number > 2. The remaining SNPs were used for population genomic analyses, except for site frequency spectrum (SFS) and nucleotide diversity-based analyses, for which no MAF filtering was applied.

Population structure and diversity

Genomic structure was assessed using FASTSTRUCTURE (Raj et al., 2014), and model complexity (i.e. the number of K genetic groups required to explain structure in the dataset) was examined


Table 1 Geographic locations, sample size (N), average heterozygosity per locus (Het), mean nucleotide diversity across all loci (π), at 0-fold degenerate sites (π0) and at four-fold degenerate sites (π4) of the 29 populations of Pinus densata, Pinus tabuliformis and Pinus yunnanensis included in this study.

| Species | No. | Population | Latitude (°N) | Longitude (°E) | Altitude (m) | N | Het | π | π0 | π4 | π0/π4 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pinus densata | | | | | | | | | | | |
| | 1 | Maerkang | 31.91 | 102.20 | 2712 | 8 | 0.266 | 0.0036 | 0.0027 | 0.0062 | 0.4312 |
| | 2 | Lixian 1 | 31.67 | 102.80 | 2856 | 8 | 0.277 | 0.0034 | 0.0026 | 0.0059 | 0.4312 |
| | 3 | Lixian 2 | 31.40 | 102.96 | 2382 | 8 | 0.266 | 0.0034 | 0.0026 | 0.0059 | 0.4317 |
| | 4 | Baoxing | 30.78 | 102.73 | 2347 | 8 | 0.297 | 0.0034 | 0.0025 | 0.0059 | 0.4353 |
| | | Total group E | | | | 32 | 0.277 | 0.0036 | 0.0026 | 0.0061 | 0.4262 |
| | 5 | Kangding | 30.19 | 101.91 | 2951 | 8 | 0.273 | 0.0028 | 0.0021 | 0.0049 | 0.4339 |
| | 6 | Jiulong | 29.01 | 101.50 | 3025 | 4 | 0.252 | 0.0024 | 0.0019 | 0.0042 | 0.4619 |
| | 7 | Zhongdian 1 | 28.04 | 99.52 | 3180 | 4 | 0.265 | 0.0026 | 0.0021 | 0.0044 | 0.4700 |
| | 8 | Zhongdian 2 | 28.18 | 99.74 | 3396 | 12 | 0.262 | 0.0025 | 0.0020 | 0.0043 | 0.4628 |
| | 9 | Xiangcheng 1 | 28.81 | 99.86 | 3804 | 8 | 0.258 | 0.0024 | 0.0019 | 0.0042 | 0.4646 |
| | 10 | Xiangcheng 2 | 28.92 | 99.79 | 3234 | 8 | 0.260 | 0.0025 | 0.0020 | 0.0043 | 0.4674 |
| | 11 | Litang | 30.52 | 100.37 | 2951 | 8 | 0.252 | 0.0025 | 0.0019 | 0.0042 | 0.4594 |
| | 12 | Deqin | 28.46 | 98.86 | 3242 | 8 | 0.266 | 0.0025 | 0.0019 | 0.0042 | 0.4620 |
| | 13 | Mangkang 1 | 29.20 | 98.64 | 3448 | 8 | 0.259 | 0.0025 | 0.0019 | 0.0042 | 0.4563 |
| | 14 | Mangkang 2 | 29.56 | 98.31 | 3798 | 8 | 0.253 | 0.0024 | 0.0019 | 0.0041 | 0.4633 |
| | | Total group C | | | | 76 | 0.260 | 0.0025 | 0.0020 | 0.0044 | 0.4545 |
| | 15 | Zayu 1 | 29.14 | 97.23 | 3306 | 8 | 0.246 | 0.0022 | 0.0018 | 0.0038 | 0.4682 |
| | 16 | Zayu 2 | 29.01 | 97.39 | 2883 | 8 | 0.245 | 0.0022 | 0.0017 | 0.0036 | 0.4733 |
| | 17 | Zayu 3 | 28.65 | 97.44 | 2294 | 8 | 0.228 | 0.0021 | 0.0018 | 0.0037 | 0.4788 |
| | 18 | Zayu 4 | 28.48 | 97.03 | 1481 | 8 | 0.216 | 0.0021 | 0.0017 | 0.0036 | 0.4764 |
| | 19 | Zayu 5 | 28.71 | 96.79 | 1991 | 8 | 0.247 | 0.0022 | 0.0018 | 0.0037 | 0.4828 |
| | | Total group SW | | | | 40 | 0.236 | 0.0022 | 0.0017 | 0.0037 | 0.4595 |
| | 20 | Parlung Zangbo 1 | 29.90 | 95.59 | 2717 | 8 | 0.194 | 0.0018 | 0.0015 | 0.0030 | 0.5108 |
| | 21 | Parlung Zangbo 2 | 30.10 | 95.10 | 2152 | 8 | 0.193 | 0.0018 | 0.0015 | 0.0029 | 0.5081 |
| | 22 | Niyang valley | 29.65 | 94.38 | 3220 | 8 | 0.221 | 0.0020 | 0.0017 | 0.0035 | 0.4792 |
| | 23 | Yarlung Zangbo | 29.25 | 94.28 | 2876 | 8 | 0.206 | 0.0019 | 0.0016 | 0.0033 | 0.4891 |
| | | Total group W | | | | 32 | 0.203 | 0.0019 | 0.0016 | 0.0033 | 0.4848 |
| | | Total P. densata | | | | 180 | 0.248 | 0.0028 | 0.0022 | 0.0049 | 0.4490 |
| P. yunnanensis | | | | | | | | | | | |
| | 24 | Gongshan | 28.02 | 98.63 | 1658 | 8 | 0.235 | 0.0024 | 0.0019 | 0.0039 | 0.4865 |
| | 25 | Weixi | 27.20 | 99.29 | 2262 | 4 | 0.230 | 0.0023 | 0.0019 | 0.0039 | 0.4794 |
| | 26 | Tengchong | 24.92 | 98.58 | 1824 | 4 | 0.228 | 0.0025 | 0.0021 | 0.0042 | 0.4986 |
| | 27 | Wenshan | 23.43 | 104.24 | 1452 | 4 | 0.213 | 0.0023 | 0.0019 | 0.0039 | 0.4880 |
| | | Total P. yunnanensis | | | | 20 | 0.227 | 0.0024 | 0.0019 | 0.0040 | 0.4750 |
| P. tabuliformis | | | | | | | | | | | |
| | 28 | Tumote | 40.79 | 111.21 | 1223 | 4 | 0.247 | 0.0034 | 0.0026 | 0.0059 | 0.4367 |
| | 29 | Guangyuan | 32.62 | 106.10 | 1340 | 4 | 0.266 | 0.0036 | 0.0027 | 0.0062 | 0.4299 |
| | | Total P. tabuliformis | | | | 8 | 0.257 | 0.0036 | 0.0027 | 0.0062 | 0.4355 |

E, east; C, central; SW, southwest; W, west.

for K = 1–16. The most likely K was selected using the 'chooseK.py' script. Geographic maps of ancestry coefficients were drawn in R following Caye et al. (2016). Population structure was also evaluated using principal component analysis (PCA) implemented in EIGENSOFT v.6.1.4 (Price et al., 2006). Genetic diversity in each population was estimated as average heterozygosity per locus (Het) and pairwise nucleotide diversity at all sites (π) and at 0-fold (π0) and four-fold (π4) degenerate coding sites using VCFTOOLS (https://vcftools.github.io/index.html; Danecek et al., 2011). Het was calculated using MAF-filtered informative SNPs, while π was calculated using both informative and invariant sites. Zygotic linkage disequilibrium (LD; squared correlation coefficient, r²) values for all SNP pairs within each scaffold were calculated using VCFTOOLS (https://vcftools.github.io/index.html) and plotted against the physical distances between SNP pairs in each scaffold. Population differentiation (FST) was calculated using VCFTOOLS.

Estimating the fitness effects of amino acid-changing mutations

To understand the strength of purifying selection operating in P. densata populations, we estimated the distribution of fitness effects (DFE) of nonsynonymous mutations using the maximum-likelihood procedure implemented in DFE-α (Keightley & Eyre-Walker, 2007; Eyre-Walker & Keightley, 2009). This method assumes that new mutations have zero fitness effect at neutral sites and are deleterious at selected sites. We generated the folded SFS in each population for a class of putatively neutral reference sites (four-fold degenerate sites) and a class of selected


sites (0-fold degenerate sites). We modeled the effects of recent demographic change on the neutral SFS by assuming a one-step population size change, and inferred the fitness effects of new deleterious mutations at the selected sites from a gamma distribution while simultaneously fitting the parameters of the demographic model. The strength of purifying selection is defined as the product of the effective population size Ne and the selection coefficient s (−Nes). We performed 999 bootstrap resamplings of SNPs in each site class to generate the 95% confidence intervals of −Nes.

Detection of AFCs

To detect allele frequency changes along an east-to-west axis across the P. densata range, we used a sliding-window approach following Currat & Excoffier (2005) and Pereira et al. (2018). Briefly, the distribution range was divided longitudinally into 1.5° (c. 150 km) nonoverlapping windows. The mean frequency of each SNP in every window was calculated. We then used linear regression to test the strength of the relationship between allele frequency changes from east to west and geographic distance between windows. To evaluate whether the observed clines were a result of chance, we used a permutation test to derive the null distribution of cline slopes by performing linear regression of each allele frequency on 10 000 randomized sets of population coordinates. The observed slope of each allele was then compared with the null distribution to test whether the observed cline was significant at the α = 0.05 level. This procedure was performed in R for all SNPs. Alleles were considered AFC alleles if the regression was significant and the slope was significantly different from the null distribution. Allele surfing, IBD and IBE can all result in AFCs, and we further dissect these forces in the outlier detection section.

Environmental data

Fourteen environmental variables previously identified as most relevant to the niche divergence between P. densata and its parental species (Mao & Wang, 2011) and six global UV radiation values were extracted for each of the sampling locations (Supporting Information Table S1). After evaluating the Spearman pairwise correlations among these variables, 11 variables with correlation coefficients |ρ| ≤ 0.75 across the range of P. densata were retained: annual mean air temperature (bio1), isothermality (bio3), air temperature seasonality (bio4), annual precipitation (bio12), precipitation of the driest month (bio14), precipitation seasonality (bio15), ground-frost frequency (FRS), growing degree days (GDD), soil organic carbon (SC), wet-day frequency (WET, i.e. days having ≥ 0.1 mm of precipitation) and annual mean UV-B (UVB1). Additionally, we included altitude, as we expected it to be important for an alpine species. To evaluate the divergence in environmental conditions across the species' range, we extracted 115 occurrence sites and their corresponding environmental variables from Mao & Wang (2011). All environmental layers were converted to the same resolution at a grid cell size of 30 arc-seconds (c. 1 km²).

Detection of outlier loci

We employed two conceptually different methods, PCADAPT (Luu et al., 2017) and BAYENV 2.0 (Günther & Coop, 2013), to detect genome-wide signatures of local adaptation, because they show promise for accurately identifying outliers under a variety of complex demographic scenarios (Lotterhos & Whitlock, 2014, 2015; Luu et al., 2017). PCADAPT identifies population structure using PCA and calculates the correlation between genetic variants and significant PCs. The Mahalanobis distance is computed for each SNP, and SNPs whose scores do not follow the distribution of the bulk of distance points are considered outliers (Luu et al., 2017). Cattell's graphical rule was used to choose the number of principal components, and outliers were selected at a false discovery rate (FDR) of 0.05 using the R package QVALUE (Storey et al., 2015).

BAYENV 2.0 uses a set of neutral loci to estimate the empirical pattern of covariance in allele frequencies between populations. This estimate is then employed as a null model to test the correlation between individual SNPs and environmental variables (Günther & Coop, 2013). We generated a set of putatively neutral and independent SNPs by performing LD pruning on the four-fold degenerate sites using --indep-pairwise 50 5 0.05 in PLINK 1.9 (https://www.cog-genomics.org/plink/1.9/). This procedure calculates pairwise LD (r²) within overlapping windows of 50 SNPs with a step size of 5 and removes one SNP of a pair if r² > 0.05. The remaining four-fold SNPs (1906) were then used to estimate the covariance matrix with 100 000 iterations. Covariance matrices were compared after three independent runs with different seed numbers to ensure convergence. Correlations with the 11 environmental variables described earlier and altitude were assessed by averaging five independent runs, each with 100 000 Markov chain Monte Carlo iterations. SNPs were considered outliers if they were in the top 1% of the Bayes factor (BF) values, with BF > 10, and in the top 10% of the absolute values of Spearman's ρ.

Inferring IBD and IBE

We conducted redundancy analyses (RDAs) to estimate the relative importance of environmental and geographic distance to population genomic differentiation. RDA combines multiple linear regression and PCA to assess the relative effect of matrices of independent variables on a matrix of dependent variables. We used the Hellinger-transformed MAF for each population, created in the R package VEGAN (Oksanen et al., 2016), as the dependent matrix, and two independent matrices: the scaled environmental variables (11 environmental variables and altitude, representing IBE) and the distance-based Moran's eigenvector maps (6 dbMEMs, representing IBD). Using all SNPs, we performed forward selection with an α value of 0.05 on the geographic and environmental variables separately to avoid overfitting, following the recommendation of Borcard et al. (2011). This resulted in the retention of four dbMEMs (dbMEM1, dbMEM2, dbMEM3 and dbMEM4) and four environmental variables (UVB1, WET, bio1 and bio14) for the following analyses.
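The sliding-window cline test described under Detection of AFCs can be sketched for a single SNP as follows. This is a minimal Python illustration, not the R code used in the study, and it rests on assumptions: windows are counted westward from the easternmost population, distance is expressed in degrees of longitude rather than kilometres, and the null distribution is built by shuffling longitudes among populations, which is only one possible way of randomizing population coordinates. The study additionally required the regression itself to be significant; that check is omitted here. All function names are hypothetical.

```python
import numpy as np

WINDOW_DEG = 1.5  # window width in degrees of longitude (c. 150 km)


def window_means(lon, freq, width=WINDOW_DEG):
    """Bin populations into nonoverlapping longitudinal windows, counted
    westward from the easternmost population, and average the allele
    frequency within each occupied window.

    Returns (distance, mean_freq); distance is each window's offset from
    the eastern edge in degrees of longitude (an assumed distance unit).
    """
    idx = np.floor((lon.max() - lon) / width).astype(int)
    windows = np.unique(idx)
    means = np.array([freq[idx == w].mean() for w in windows])
    return windows * width, means


def cline_slope(lon, freq):
    """Least-squares slope of window-mean frequency vs distance from east."""
    dist, means = window_means(lon, freq)
    if dist.size < 2:
        return 0.0  # a single window carries no cline information
    return float(np.polyfit(dist, means, 1)[0])


def afc_permutation_test(lon, freq, n_perm=10_000, seed=0):
    """Two-sided permutation test of the observed cline slope.

    The null distribution is built by shuffling longitudes among
    populations and recomputing the slope each time.
    Returns (observed_slope, p_value).
    """
    rng = np.random.default_rng(seed)
    observed = cline_slope(lon, freq)
    null = np.array([cline_slope(rng.permutation(lon), freq)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
    return observed, float(p_value)


if __name__ == "__main__":
    # Hypothetical SNP: frequency declines from 0.8 in the east to 0.2 in
    # the west across 23 populations spread over 94-103 deg E.
    lon = np.linspace(94.0, 103.0, 23)
    freq = 0.2 + 0.6 * (lon - 94.0) / 9.0
    slope, p = afc_permutation_test(lon, freq, n_perm=2000)
    print(f"slope = {slope:.4f} per degree, permutation P = {p:.4f}")
    print("AFC at alpha = 0.05:", p < 0.05)
```

Shuffling coordinates keeps the set of occupied windows and their sizes fixed, so the null distribution reflects only the loss of the spatial ordering of frequencies.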

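The pure and confounded fractions reported in Table 2 follow from three adjusted R² values: F ~ geog., F ~ env., and the model containing both matrices (the total explained). The Python sketch below is an illustration of that arithmetic, not the VEGAN pipeline used in the study; it assumes centred response and explanatory matrices and the Ezekiel adjustment commonly applied to RDA (e.g. by vegan's RsquareAdj), and the function names are hypothetical. Plugging in the all-SNP values from Table 2 (0.532, 0.466 and 0.597) recovers the reported fractions to within rounding of the three-decimal inputs.

```python
import numpy as np


def rda_adj_r2(Y, X):
    """Adjusted R^2 of a redundancy analysis of response matrix Y
    (sites x responses) on explanatory matrix X (sites x predictors).

    The unadjusted R^2 is the share of total variance in the centred Y
    reproduced by its least-squares fit on the centred X; the Ezekiel
    formula then corrects it for the number of predictors.
    """
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    n, m = Xc.shape
    coef, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    r2 = np.sum((Xc @ coef) ** 2) / np.sum(Yc ** 2)
    return float(1 - (1 - r2) * (n - 1) / (n - m - 1))


def partition(r2_geog, r2_env, r2_both):
    """Split explained variance into pure and confounded fractions."""
    pure_geog = r2_both - r2_env         # F ~ geog. | env.
    pure_env = r2_both - r2_geog         # F ~ env. | geog.
    confounded = r2_geog + r2_env - r2_both
    unexplained = 1.0 - r2_both
    return pure_geog, pure_env, confounded, unexplained


if __name__ == "__main__":
    # All-SNP adjusted R^2 values taken from Table 2
    fractions = partition(0.532, 0.466, 0.597)
    print([round(f, 3) for f in fractions])  # compare with Table 2

    # With real data, the three inputs would come from rda_adj_r2 applied
    # to hypothetical geographic, environmental and combined matrices,
    # e.g. r2_both = rda_adj_r2(F, np.hstack([geog, env])).
```

Because the confounded fraction is obtained by subtraction rather than fitted directly, it cannot be tested by permutation, which matches the note under Table 2 that its significance was not tested.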

We performed a series of full and partial RDA model tests to differentiate the independent effects of environment and geography by reciprocally constraining one of the two factors. The significance of these models was determined by the 'anova.cca' function in VEGAN, based on a permutation test of 9999 iterations. RDA was performed for different SNP sets (Table 2).

Prediction of genomic mismatch

To assess genotype–environment correlations and predict the genomic mismatch to future conditions, we performed gradient forest (GF) analysis using the R package GRADIENTFOREST (Ellis et al., 2012). GF models apply a machine-learning algorithm to identify genotype–environment relationships at sampled locations and project the correlations onto unsampled geographic regions or times (Fitzpatrick & Keller, 2015). SNPs polymorphic in fewer than five populations were removed from this analysis to ensure robust regression. The 11 environmental variables characterizing each sample location were included in the GF models to predict the genomic composition of each grid point across the range of P. densata. Each GF model was built with 500 regression trees per SNP, keeping all other parameters at their default values. The resulting multidimensional genomic patterns were summarized using PCA, with the first three PCs assigned to red, green and blue, respectively (Fitzpatrick & Keller, 2015). Similar colors in space represent similar genetic composition. This allowed us to visualize the differences in allele frequencies (referred to as 'genomic turnover') along environmental gradients across space. To validate that our model explained more variation than expected by chance, we fitted 10 GF models with randomized environmental variables following Ruegg et al. (2018) and compared, between models, the number of SNPs with positive r² and the mean r² across these SNPs.

To identify the spatial regions where genotype–environment relationships are most likely to be disrupted by climate change, we evaluated the mismatch between current and predicted genomic compositions under future environmental projections, using contemporary associations as a baseline (Fitzpatrick & Keller, 2015). We evaluated the predicted genomic composition by 2070 under two representative concentration pathway (RCP) scenarios, RCP 2.6 and RCP 8.5, representing low and high greenhouse gas emission trajectories, respectively. Five of the 11 environmental layers (WET, UVB1, FRS, GDD and SC) were not available for future projection. We left them unchanged and used the projections for the other six variables (bio1, bio3, bio4, bio12, bio14 and bio15) to predict future genomic compositions, assuming a generation time of 50 yr for pine. We calculated the Euclidean distances between the current and future genomic compositions to represent the scale of genetic change needed to match environmental change, with higher values indicating greater vulnerability of the population (Fitzpatrick & Keller, 2015). We visualized the genomic mismatch in geographic space to illustrate the distribution of vulnerable populations.

Data availability

All sequencing data are archived in the NCBI SRA database under BioProject accession number PRJNA492187.

Results

Genomic diversity and population structure

Exome capture sequencing on the 208 trees generated 1.25 billion paired-end reads, with an average of 6.00 million reads per sample (Table S2). Alignment of the sequence reads to the reduced reference genome (55 300 scaffolds) yielded an average of 15.10 Mbp of genomic sequence covered by at least five reads per individual (Table S2). Exome probe capture rates ranged from 77.11% to 89.24% (Table S2). We retained 48 443 high-quality SNPs after stringent quality control, of which 53.49%

Table 2 Redundancy analyses (RDAs) that partition sources of genetic differentiation among populations in Pinus densata into geography, environment and their combined effects.

| | All SNPs | AFC SNPs | PCADAPT outliers | BAYENV outliers |
|---|---|---|---|---|
| No. of SNPs | 47 612 | 18 539 | 3653ᵃ (1620ᵇ, 1790ᶜ) | 2704ᵃ (1855ᵇ, 2025ᶜ) |
| Combined fractions | | | | |
| F ~ geog. | 0.532*** | 0.606*** | 0.617*** (0.624***, 0.633***) | 0.513*** (0.414***, 0.465***) |
| F ~ env. | 0.466*** | 0.557*** | 0.547*** (0.533***, 0.550***) | 0.571*** (0.483***, 0.522***) |
| Individual fractions | | | | |
| F ~ geog. \| env. | 0.132*** | 0.098*** | 0.150*** (0.186***, 0.175***) | 0.105*** (0.126***, 0.121***) |
| F ~ env. \| geog. | 0.066*** | 0.048*** | 0.080*** (0.095***, 0.092***) | 0.162*** (0.195***, 0.177***) |
| Total explained | 0.597*** | 0.655*** | 0.697*** (0.719***, 0.725***) | 0.675*** (0.609***, 0.642***) |
| Total confounded | 0.400 | 0.509 | 0.467 (0.438, 0.458) | 0.408 (0.288, 0.344) |
| Total unexplained | 0.403 | 0.345 | 0.303 (0.281, 0.275) | 0.325 (0.391, 0.358) |
| Total | 1.000 | 1.000 | 1.000 (1.000, 1.000) | 1.000 (1.000, 1.000) |

Three sets of outliers were generated under PCADAPT and BAYENV: a, the full set of detected outliers; b, set a with allele frequency cline (AFC) single nucleotide polymorphisms (SNPs) removed; and c, set b with the outliers common to PCADAPT and BAYENV that show AFCs added back. F, dependent matrix of Hellinger-transformed minor allele frequencies; redundancy analysis (RDA) tests are of the form F ~ independent matrices | covariate matrices. env., four retained environmental variables; geog., four retained Moran's eigenvector map variables; total explained, total adjusted R² of all fractions; total confounded, total of fractions confounded between various combinations of climate and geography. ***, P ≤ 0.001; significance of confounded fractions between climate and geography was not tested.


were annotated to 2621 ‘high-confidence’ and 3755 ‘low-confidence’ genes. LD among SNPs decayed by half, from r² = 0.46 to 0.23, within 2 kb (Fig. S1), a pattern similar to those reported for other pine species (Brown et al., 2004; Neale & Savolainen, 2004; Eckert et al., 2009).
Using all 48 443 SNPs, FASTSTRUCTURE determined K = 5 to be the optimal number of clusters to explain the genetic structure of the 208 individuals. Of the five clusters, one was unique to P. yunnanensis, and the other four split P. densata into distinct geographic zones: east (E), central (C), southwest (SW) and west (W) (Fig. 1a, b; Table 1). The eastern range of P. densata was dominated by P. tabuliformis ancestry, although these populations have a mtDNA composition and morphometric traits that distinguish them from P. tabuliformis. The nuclear DNA pattern was concordant with the chloroplast DNA (paternally inherited in pines) results reported earlier (Wang et al., 2011), reflecting pollen-mediated introgression in this region. Increasing K to higher values revealed no further population substructure (data not shown). PCA yielded a grouping similar to that of FASTSTRUCTURE; the first three PCs were significant (Tracy–Widom test, P < 0.001) and together explained 28.6% of the total genetic variance (19.9%, 5.2% and 3.5% for PC1, PC2 and PC3, respectively; Fig. 1c).
Excluding the reference populations of P. tabuliformis and P. yunnanensis yielded 47 612 SNPs in the 23 populations of P. densata (180 trees). Genome-wide heterozygosity per population ranged from 0.193 to 0.297 (Table 1) and was strongly negatively correlated with distance to the easternmost population (r = −0.85, P << 0.001; Fig. 2a), supporting this region as the ancient hybrid zone. Nucleotide diversity at all sites (π) and at 0-fold (π0) and four-fold (π4) degenerate sites showed a similar pattern along this east–west axis (Fig. 2b; Table 1). π0 and π4 followed the same trend, but π4 decreased much faster than π0 (Fig. 2b), resulting in an increasing π0/π4 ratio from east to west (r = 0.90, P << 0.001; Table 1; Fig. 2c).

Distribution of fitness effects of nonsynonymous mutations
We quantified the fitness effects of nonsynonymous mutations in P. densata using whole-exome sequences. The inferred DFE for all groups indicated that 29.7–41.0% of new mutations would be strongly deleterious (−Nes > 100; Fig. 2d), 34.1–40.1% weakly deleterious (−Nes < 1), and 18.9–36.5% moderately deleterious (1 < −Nes < 100). The DFE differed significantly among groups: the SW and W groups had a larger proportion of strongly deleterious mutations, whereas the E and C groups had more moderately deleterious sites (Fig. 2d).

[Figure 2: (a) average heterozygosity per locus (all SNPs and AFC SNPs), (b) nucleotide diversity and (c) π0/π4, each plotted against geographical distance to group E (km); (d) proportion of mutations in the classes 0 < −Nes < 1, 1 < −Nes < 10, 10 < −Nes < 100 and −Nes > 100 for the Pd east, central, southwest and west groups.]
Fig. 2 Nucleotide diversity vs geographic distance from the suggested ancient hybrid zone: average heterozygosity per locus (a); mean pairwise distance across all sites (π), 0-fold degenerate sites (π0) and four-fold degenerate sites (π4) (b); and the ratio of mean π0 to mean π4 (π0/π4) (c). Pearson correlation coefficients and the corresponding significance levels are shown. (d) Estimates of purifying selection at 0-fold degenerate sites in four Pinus densata (Pd) groups. Error bars represent 95% bootstrap confidence intervals. Different letters (a, b, c and d) above the bars indicate significant differences (α = 0.05) among groups based on Kruskal–Wallis multiple-range tests.

Detecting AFCs and putative adaptive loci
Allele surfing during range expansion can generate clinal variation that resembles IBE. To separate clines caused by dispersal-demographic effects from those caused by local adaptation, we performed a spatial sliding-window analysis along the east–west expansion axis to identify alleles showing significant clinal variation. This analysis identified 38.9% (18 539) of the 47 612 SNPs as showing significant longitudinal clines that cannot be explained by chance. Heterozygosity at these clinal SNPs declined sharply from east to west (r = −0.89, P << 0.001; Fig. 2a). Mean population differentiation (FST) at the clinal SNPs was 0.288, much higher than the FST of 0.191 over all 47 612 SNPs (Fig. 1d).
Using all 47 612 input SNPs in P. densata, PCADAPT identified 3653 outliers at an FDR of 0.05, and BAYENV identified 2704 SNPs with significant correlations with one or more environmental variables. Of these outliers, 55.7% (PCADAPT) and 31.4% (BAYENV) showed significant AFCs. To minimize false positives caused by allele surfing, we first removed all AFC SNPs from the outliers, leaving 1620 PCADAPT and 1855 BAYENV outliers. This stringent filtering, however, raises the risk of false negatives. To help mitigate this effect, we considered the 170 outliers that were common to PCADAPT and BAYENV but showed AFCs to be more likely to reflect IBE than dispersal-demographic effects, and added them back to both the PCADAPT and BAYENV outlier sets (Fig. S2). This resulted in sets of 1790 and 2025 SNPs for PCADAPT and BAYENV, respectively (Table 2).
The average genetic differentiation (FST) among the P. densata groups was 0.411 at the 1790 PCADAPT outliers and 0.215 at the 2025 BAYENV outliers. Of the 2025 BAYENV outliers, 1154 were significantly associated with temperature, 778 with water availability, 132 with soil organic carbon (SC), 317 with UVB and 294 with altitude (Table S3). Annotation of the 2025 outliers identified 703 candidate genes, of which 372 could be functionally classified into gene ontology (GO) terms. Macromolecule transport and metabolic processes were significantly enriched among these terms (FDR < 0.05; Table S4).

Quantifying IBD and IBE
We evaluated the effects of IBD and IBE in shaping genomic variation among populations using redundancy analysis (RDA). All RDA models fitted with the environmental (four) and geographic (four) variables retained after forward selection were highly significant (P ≤ 0.001). The exclusive contribution of IBD while controlling for environment (partial model F ~ geog. | env. in Table 2) explained 9.8–18.6% of the variation in allele frequencies among populations across the SNP sets. When controlling for IBD, the exclusive contribution of IBE (partial model F ~ env. | geog.; Table 2) explained 6.6% of the variation in the full set of SNPs and 4.8% in the AFC SNPs. Removing AFC SNPs from the outliers increased the exclusive contribution of IBE from 8.0% to 9.5% for the PCADAPT outliers and from 16.2% to 19.5% for the BAYENV outliers. Adding back the 170 AFC SNPs identified as outliers by both PCADAPT and BAYENV barely altered the contribution of IBE in the PCADAPT outliers (1790 SNPs; 9.5% to 9.2%) but slightly decreased it in the BAYENV outliers (2025 SNPs) from 19.5% to 17.7% (partial model F ~ env. | geog.; Table 2). A total of 59.7–72.5% of the variation could be explained by the two components in different

SNP sets (‘total explained’ in Table 2), a large proportion of which was attributable to their joint effect (‘total confounded’ in Table 2). This effect was most pronounced at the AFC SNPs.

Spatial distribution of gene–environment associations and future genomic mismatch
Spatial mapping of genotype–environment relationships using GF models revealed genomic turnover between the east, central-southwest and west regions (Fig. 3a, b). The western range of the P. densata distribution showed a distinct class of genotype–environment associations, consistent with the more extreme conditions of the region (Fig. 3c). Wet-day frequency (WET), annual mean UV-B (UVB1), annual mean air temperature (bio1) and air temperature seasonality (bio4) were most strongly correlated with the observed genomic variation (Fig. S3). PCA on these variables showed clear niche divergence across the distribution range of P. densata (Fig. 3c). Permutation tests showed that the GF models explained


significantly more genomic variation than randomized models (Fig. S4), lending confidence to our GF model selection.
We simulated the genomic change needed to track predicted climate change by the year 2070 under RCP 2.6 and RCP 8.5. To compare the two scenarios, we calculated the proportion of the distribution range with a genomic mismatch > 50% of the maximum detected value (0.048 in this study; Fig. 4). Based on all SNPs, 16.5% and 17.2% of the distribution space exceeded this vulnerability threshold under RCP 2.6 and RCP 8.5, respectively. The corresponding values for the 2025 BAYENV outliers were 8.2% and 38.5%, indicating that adaptive variation is more susceptible to strong climate change. GF modelling predicted that the western portion of the distribution is the most vulnerable under both climate scenarios, indicating where climate-induced selective pressure will be highest (Fig. 4). Note that we kept five of the 11 environmental variables static because future projections were not available for them, which may have led to an underestimation of genomic mismatch in P. densata.

[Figure 3: (a, b) maps of GF-predicted genomic composition, each with a biplot of the top environmental variables; (c) PCA biplot of the top four environmental variables.]
Fig. 3 Gradient Forest-mapped genotype–environment relationships across the Pinus densata distribution range based on all single nucleotide polymorphisms (SNPs) (a) and BAYENV outliers (b). Locations with similar colors are expected to harbor populations with similar genomic composition. The biplot in each panel indicates the contribution of the environmental variables to the predicted patterns of genetic composition, with labeled vectors showing the loadings of the four top variables. Black dots represent sampling sites in geographic (map) and genetic (biplot) space, with crosses, squares, circles and triangles representing the east (E), central (C), southwest (SW) and west (W) groups, respectively. (c) Principal component analysis (PCA) biplot for the top four environmental variables across 115 occurrence sites of P. densata. Grouping of the sites into E, C, SW and W follows the map in Fig. 1(a).

Discussion
Pinus densata is the dominant forest-forming species in the southeastern QTP, so its resilience underpins regional structure and function. While the colonization of large and variable areas suggests successful adaptation to high-plateau environments (Mao & Wang, 2011; Zhao et al., 2014), the spatial autocorrelation of environmental gradients with expansion routes makes identifying genotype–environment relationships challenging. Here, we explicitly disentangled genetic differentiation generated by founder effects, geographic isolation and local adaptation to predict population viability under ongoing climate change.

Range expansion produces AFCs and structures genetic diversity
Previous studies established that P. densata colonized the plateau by stepwise westward migration from the ancient hybrid zone located at the eastern margin of its current distribution (Wang et al., 2011; Gao et al., 2012). In this study, we detected a continuous loss of heterozygosity and nucleotide diversity in exome sequences from east to west across the distribution, reinforcing previous inferences of serial bottlenecks along the expansion axis. Wind-pollinated conifers are generally characterized by low population differentiation at nuclear markers, even across large geographical distances (Alberto et al., 2013). The more widely distributed, montane parental species P. tabuliformis and P. yunnanensis accord well with this expectation, with FST estimates ranging from 0.023 to 0.086 (Ma et al., 2006; Wang et al., 2011; Gao et al., 2012; Xia et al., 2018). By contrast, P. densata possesses substantial spatial genetic structure, similar to other plants inhabiting the QTP (Geng et al., 2009; Qiu et al., 2011; Wen et al., 2014). The four genetic groups identified within P. densata correspond to four distinct geographic regions, and the genome-wide FST among them was 0.191, presumably as a result of dispersal limitations across the dramatic and complex topography of the QTP.
Range expansion can lead to a loss of heterozygosity, allele clines and genetic differentiation along the axis of expansion (Currat & Excoffier, 2005; Klopfstein et al., 2006; Excoffier & Ray, 2008; Excoffier et al., 2009). However, empirical support remains limited owing to the difficulty of detecting range expansion and AFCs in natural systems. We applied a spatially explicit method (Currat & Excoffier, 2005; Pereira et al., 2018) to


identify AFCs. We found that as many as 38.9% of the 47 612 SNPs showed clinal variation that could be the result of allele surfing and/or clinal adaptation. Consistent with theoretical expectations, heterozygosity at AFC SNPs declined sharply from east to west, and population differentiation at these SNPs was 51.0% higher than the genome average. This suggests that during range expansion in P. densata, serial founder events drove a considerable number of alleles to very high or low frequencies and progressively reduced heterozygosity, with the western populations harboring the lowest diversity.

[Figure 4: four maps (a–d) spanning c. 92–104°E and 28–34°N, colored by genomic mismatch on a scale from 0.000 to 0.048 in 0.004 intervals.]
Fig. 4 Predicted genomic mismatch under future climate change for all single nucleotide polymorphisms (SNPs) (a, b) and BAYENV outlier SNPs (c, d). (a) and (c), representative concentration pathway (RCP) 2.6 in 2070; (b) and (d), RCP 8.5 in 2070. Red and blue indicate high and low genomic mismatch, respectively.

Redundancy analyses showed that IBD can explain 13.2% of the variation at all SNPs and 9.8% at the AFC SNPs (Table 2). Because the range expansion in P. densata proceeded from east to west, we expect the AFC SNPs to overlap with the signal of IBD, although under some circumstances allele surfing can also produce clines that show no relationship to geographic distance. Our analyses illustrate that while P. densata was historically able to exploit an unoccupied niche on the high plateau, colonization has left a strong genetic legacy of reduced diversity and population connectivity that may leave it vulnerable to future change.

Local adaptation or allele surfing?
Simulations show that dispersal-demographic effects during range expansion produce strong AFCs and even discrete sectors of genetic homogeneity. These processes can promote genetic differentiation along the axis of a range expansion, which can mimic the signature of local adaptation (Currat & Excoffier, 2005; Klopfstein et al., 2006; Excoffier & Ray, 2008; Excoffier et al., 2009). Even though the methods we used for outlier detection (PCADAPT and BAYENV) correct for neutral population structure (Günther & Coop, 2013; Luu et al., 2017), detecting selection in structured populations can be difficult (Wright & Gaut, 2005; Lotterhos & Whitlock, 2014, 2015). We found that as many as 55.7% of outlier loci may be the result of allele surfing, although the two detection methods differed substantially in their overlap with AFCs (Fig. S2). Notably, outliers identified with respect to population structure using PCADAPT were more enriched for AFCs than those identified from allele–environment correlations in BAYENV (55.7% vs 31.4%), which supports the ability of our approach to remove outliers generated by drift.
Simultaneously minimizing false positives and false negatives is virtually impossible in population genomic studies of natural systems. Conservatively, we can regard outliers overlapping with AFCs as false positives, and thus attribute their association with environments to dispersal-demographic effects rather than to selective pressures. However, among the 11 studied environmental variables and altitude, five were significantly correlated with longitude (Table S3). Additionally, simulations indicate that beneficial alleles at the expansion front have a higher probability of establishing and spreading (Hallatschek & Nelson, 2010; Peischl et al., 2015, 2016). Discarding all outliers that show AFCs would therefore remove true adaptive alleles associated with any of these variables and underestimate local adaptation. To achieve a pragmatic trade-off between power and error rates, we created an additional


outlier set that included loci identified as outliers by both PCADAPT and BAYENV that also showed AFCs. Common outliers detected by different methods yield conservative error rates (de Villemereuil et al., 2014), and their clinal behavior is more likely to result from clinal selection. Overall, 3.8–4.3% of the sampled loci can be inferred to be affected by selection, because they were either non-AFC outlier loci identified by one of the methods or AFC loci detected as outliers by both methods. Although false positives and negatives almost certainly exist in these outlier designations, our analyses provide a rare empirical evaluation of the magnitude of allele surfing during range expansion and of its bearing on inferences of adaptive evolution.
Further dissection using an RDA strategy suggested that a significant portion (9.2–19.5%) of the variation in the outliers can be explained exclusively by IBE. Reduced gene flow across the genome as a result of environmental conditions provides a convincing line of evidence for local adaptation (Nosil et al., 2009; Orsini et al., 2013). Although outlier SNPs comprise a small proportion of the genome, selection affecting multiple loci can cause genome-wide reductions in gene flow, even for loci unlinked to those directly under selection, resulting in a signal of IBE across the whole genome (Table 2). Furthermore, spatially divergent selection against nonadapted migrants can allow mutations to establish even if their effect size is relatively small (Nosil et al., 2005; Feder et al., 2012a,b; Wang & Bradburd, 2014). The combined evidence of strong selection against immigrants in reciprocal transplant trials (Zhao et al., 2014), the detection of genome-wide differentiation attributable exclusively to environmental variables, and the identification of robust outliers associated with water availability, temperature and UVB radiation (Table S3) suggests that the strong signature of IBE in P. densata is the result of local adaptation to distinct habitats.

Efficacy of purifying selection and genomic vulnerability to changing climate
Organisms cope with changing environments through migration to track ecological niches spatially and/or through adaptation to new conditions in situ (Aitken et al., 2008). High amounts of standing genetic variation facilitate faster adaptive evolution than waiting for appropriate mutations to arise, especially because most de novo mutations with fitness consequences are expected to be mildly deleterious. In P. densata, genetic diversity clearly declined along the east-to-west expansion axis, and this decline was coupled with an increasing π0/π4 ratio. The π0/π4 ratio is generally interpreted as a measure of the efficacy of purifying selection, with higher values indicating lower selection efficacy (Chen et al., 2017). Because selection operates more effectively in populations with large effective sizes (Ne), the observed increase in π0/π4 from east to west was probably a consequence of decreasing Ne caused by consecutive bottlenecks (Gao et al., 2012).
We examined the DFE of nonsynonymous mutations, which is a more direct reflection of the efficacy of selection because the method corrects for demography and population structure. We found purifying selection acting less effectively in the western than in the eastern and central populations, with the western margin showing a higher proportion of weakly and strongly deleterious mutations. Accumulation of highly deleterious mutations increases genetic load, which in turn raises the risk of population decline via mutational meltdown (Lynch et al., 1995). An alternative view is that populations occupying different environments may experience diverse selective constraints, resulting in differently shaped DFEs (Tellier et al., 2011). The difference in the strength of purifying selection found in this study could thus be partially explained by the distinct environments occupied by the P. densata groups (Fig. 3c).
Using GF analyses, we found a clear signal of genomic mismatch, especially for adaptive variation, across the distribution range of P. densata. This mismatch is particularly high in the western populations, where differentiation is the highest and genetic variation the lowest. This mismatch estimate is probably conservative, because we assumed that five environmental layers will remain the same in the future as they are now, and because some true outliers under selection that were confounded by AFCs may have been removed. A strong genomic mismatch indicates that populations may be unable to persist in situ because contemporary genotypes are not sufficiently associated with projected climatic variables. This problem is exacerbated for species with long generation times, such as conifers. Responses to change in P. densata are expected to be further hindered by the strong geographical isolation among regions, which limits the ability of populations to track spatially changing niches through migration. Given the reduced genetic diversity in the western populations of P. densata, the low mutation rate in conifers (De La Torre et al., 2017) and the low probability of de novo mutations proving beneficial, persistence in situ under future climates seems unlikely.

Conclusions
Our study shows how range expansion across complex landscapes promotes not only allele surfing but also strong spatial structure through repeated founder effects, geographical isolation and local adaptation. While the deep valleys and high mountain ridges of the QTP have helped to create a global biodiversity hotspot (Zheng, 1996; Qiu et al., 2011; Wen et al., 2014; Xing & Ree, 2017), these same features can constrain adaptive responses to climate change. This should be especially pronounced in organisms with limited dispersal, whether restricted by selection or by physical barriers to migration. Other species on the QTP show strong genetic structure consistent with limited effective migration (Qiu et al., 2011; Wen et al., 2014), which suggests that the strong signal of genomic mismatch in P. densata may be representative of other plateau endemics. As further examples accumulate, it will become possible to gain a more general understanding of how demography and landscape factors constrain or promote adaptation to novel and changing environments.

Acknowledgements
Genomic data processing and analyses were performed using resources provided by the Swedish National Infrastructure for Computing (SNIC), through the High Performance Computing


Centre North (HPC2N) and the Uppsala Multidisciplinary Centre for Advanced Computational Science (UPPMAX). This study was supported by grants from the National Natural Science Foundation of China (NSFC 31800550, 31670664) and the Swedish Research Council (VR). The authors declare no conflicts of interest.

Author contributions
X-RW, J-FM and WZ planned and designed the research. J-FM, WZ and Y-QS conducted the fieldwork. WZ and JP prepared DNA samples. WZ analyzed the data. WZ and X-RW wrote the manuscript draft. WZ, X-RW, ARS and MLA revised the manuscript.

ORCID
Michael L. Arnold https://orcid.org/0000-0002-5920-3051
Jian-Feng Mao https://orcid.org/0000-0001-9735-8516
Alexis R. Sullivan https://orcid.org/0000-0003-2182-911X
Xiao-Ru Wang https://orcid.org/0000-0002-6150-7046
Wei Zhao https://orcid.org/0000-0001-9437-3198

References
Aitken SN, Yeaman S, Holliday JA, Wang T, Curtis-McLane S. 2008. Adaptation, migration or extirpation: climate change outcomes for tree populations. Evolutionary Applications 1: 95–111.
Alberto FJ, Aitken SN, Alía R, González-Martínez SC, Hänninen H, Kremer A, Lefèvre F, Lenormand T, Yeaman S, Whetten R et al. 2013. Potential for evolutionary responses to climate change – evidence from tree populations. Global Change Biology 19: 1645–1661.
Bay RA, Harrigan RJ, Underwood VL, Gibbs HL, Smith TB, Ruegg K. 2018. Genomic signals of selection predict climate-driven population declines in a migratory bird. Science 359: 83–86.
Bolger AM, Lohse M, Usadel B. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120.
Borcard D, Gillet F, Legendre P. 2011. Numerical ecology with R. New York, NY, USA: Springer.
Brown GR, Gill GP, Kuntz RJ, Langley CH, Neale DB. 2004. Nucleotide diversity and linkage disequilibrium in loblolly pine. Proceedings of the National Academy of Sciences, USA 101: 15255–15260.
Caye K, Deist TM, Martins H, Michel O, François O. 2016. TESS3: fast inference of spatial population structure and genome scans for selection. Molecular Ecology Resources 16: 540–548.
Chen J, Glémin S, Lascoux M. 2017. Genetic diversity and the efficacy of purifying selection across plant and animal species. Molecular Biology and Evolution 34: 1417–1428.
Currat M, Excoffier L. 2005. The effect of the Neolithic expansion on European molecular diversity. Proceedings of the Royal Society B: Biological Sciences 272: 679–688.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST et al. 2011. The variant call format and VCFtools. Bioinformatics 27: 2156–2158.
De La Torre AR, Li Z, Van de Peer Y, Ingvarsson PK. 2017. Contrasting rates of molecular evolution and patterns of selection among gymnosperms and flowering plants. Molecular Biology and Evolution 34: 1363–1377.
de Villemereuil P, Frichot E, Bazin E, François O, Gaggiotti OE. 2014. Genome scan methods against more complex models: when and how much should we trust them? Molecular Ecology 23: 2006–2019.
Eckert AJ, Wegrzyn JL, Pande B, Jermstad KD, Lee JM, Liechty JD, Tearse BR, Krutovsky KV, Neale DB. 2009. Multilocus patterns of nucleotide diversity and divergence reveal positive selection at candidate genes related to cold hardiness in coastal Douglas Fir (Pseudotsuga menziesii var. menziesii). Genetics 183: 289–298.
Edmonds CA, Lillie AS, Cavalli-Sforza LL. 2004. Mutations arising in the wave front of an expanding population. Proceedings of the National Academy of Sciences, USA 101: 975–979.
Ellis N, Smith SJ, Pitcher CR. 2012. Gradient forests: calculating importance gradients on physical predictors. Ecology 93: 156–168.
Excoffier L, Foll M, Petit RJ. 2009. Genetic consequences of range expansions. Annual Review of Ecology, Evolution, and Systematics 40: 481–501.
Excoffier L, Ray N. 2008. Surfing during population expansions promotes genetic revolutions and structuration. Trends in Ecology & Evolution 23: 347–351.
Eyre-Walker A, Keightley PD. 2009. Estimating the rate of adaptive molecular evolution in the presence of slightly deleterious mutations and population size change. Molecular Biology and Evolution 26: 2097–2108.
Feder JL, Egan SP, Nosil P. 2012a. The genomics of speciation-with-gene-flow. Trends in Genetics 28: 342–350.
Feder JL, Gejji R, Yeaman S, Nosil P. 2012b. Establishment of new mutations under divergence and genome hitchhiking. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences 367: 461–474.
Fitzpatrick MC, Keller SR. 2015. Ecological genomics meets community-level modelling of biodiversity: mapping the genomic landscape of current and future environmental adaptation. Ecology Letters 18: 1–16.
Gao J, Wang B, Mao JF, Ingvarsson P, Zeng QY, Wang XR. 2012. Demography and speciation history of the homoploid hybrid pine Pinus densata on the Tibetan Plateau. Molecular Ecology 21: 4811–4827.
Geng Y, Cram J, Zhong Y. 2009. Genetic diversity and population structure of alpine plants endemic to Qinghai-Tibetan Plateau, with implications for conservation under global warming. In: Mahoney CL, Springer DA, eds. Genetic diversity. New York, NY, USA: Nova Science, 213–228.
González-Martínez SC, Ridout K, Pannell JR. 2017. Range expansion compromises adaptive evolution in an outcrossing plant. Current Biology 27: 2544–2551.
Günther T, Coop G. 2013. Robust identification of local adaptation from allele frequencies. Genetics 195: 205–220.
Hallatschek O, Nelson DR. 2010. Life at the front of an expanding population. Evolution 64: 193–206.
Hoban S, Kelley JL, Lotterhos KE, Antolin MF, Bradburd G, Lowry DB, Poss ML, Reed LK, Storfer A, Whitlock MC. 2016. Finding the genomic basis of local adaptation: pitfalls, practical solutions, and future directions. The American Naturalist 188: 379–397.
Keightley PD, Eyre-Walker A. 2007. Joint inference of the distribution of fitness effects of deleterious mutations and population demography based on nucleotide polymorphism frequencies. Genetics 177: 2251–2261.
Klopfstein S, Currat M, Excoffier L. 2006. The fate of mutations surfing on the wave of a range expansion. Molecular Biology and Evolution 23: 482–490.
Lee CR, Mitchell-Olds T. 2011. Quantifying effects of environmental and geographical factors on patterns of genetic differentiation. Molecular Ecology 20: 4631–4642.
Li H. 2011. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27: 2987–2993.
Li H. 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. [WWW document] URL http://arxiv.org/abs/1303.3997v1302 [q-bio.GN] [accessed 21 June 2018].
Lotterhos KE, Whitlock MC. 2014. Evaluation of demographic history and neutral parameterization on the performance of FST outlier tests. Molecular Ecology 23: 2178–2192.
Lotterhos KE, Whitlock MC. 2015. The relative power of genome scans to detect local adaptation depends on sampling design and statistical method. Molecular Ecology 24: 1031–1046.
Luu K, Bazin E, Blum MGB. 2017. pcadapt: an R package to perform genome scans for selection based on principal component analysis. Molecular Ecology Resources 17: 67–77.


Supporting Information

Additional Supporting Information may be found online in the Supporting Information section at the end of the article.

Fig. S1 Plot of zygotic LD (r²) against the physical distance in base pairs between all pairs of SNPs.

Fig. S2 Outlier SNPs identified by BAYENV2 and PCADAPT, and their overlap with loci showing allele frequency clines (AFCs).

Fig. S3 Ranked importance of environmental variables based on gradient forest analysis for all variation (a) and adaptive variation (b).

Fig. S4 Comparison of the number of SNPs with positive R² with environments, and the mean R² across these SNPs between 10 randomized GF models and our final selected GF models.

Table S1 Environmental parameters used in this study.

Table S2 Probe capture efficiency, number of sequence reads, mean sequencing depth and genome coverage for each sample.

Table S3 Correlation of climate variables and genomic variation components with longitude.

Table S4 Functional enrichment of the outlier genes.

Please note: Wiley Blackwell are not responsible for the content or functionality of any Supporting Information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
