Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri

Guo-Dong Wanga,b,1, Bao-Lin Zhanga,c,1, Wei-Wei Zhoua,d, Yong-Xin Lia,c, Jie-Qiong Jina, Yong Shaoa, He-chuan Yange, Yan-Hu Liuf, Fang Yana, Hong-Man Chena, Li Jing, Feng Gaoh, Yaoguang Zhangg, Haipeng Lib,h, Bingyu Maoa,b, Robert W. Murphya,i, David B. Wakej,2, Ya-Ping Zhanga,b,2, and Jing Chea,b,d,2 aState Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, ; bCenter for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; cKunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, Yunnan, China; dSoutheast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Yezin, 05282 Nay Pyi Taw, Myanmar; eHuman Genetics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore; fLaboratory for Conservation and Utilization of Bio-Resources & Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China; gKey Laboratory of Freshwater Fish Reproduction and Development of the Ministry of Education and Key Laboratory of Aquatic Science of Chongqing, Southwest University School of Life Sciences, Chongqing 400715, China; hCAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China; iCentre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, ON, Canada M5S 2C6; and jDepartment of Integrative Biology and Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720-3160

Contributed by David B. Wake, April 13, 2018 (sent for review September 17, 2017; reviewed by Scott V. Edwards and Craig Moritz) Tibetan frogs, Nanorana parkeri, are differentiated genetically but speciation (16). However, gene flow can also facilitate evolution not morphologically along geographical and elevational gradients and speciation by transferring adaptive genes or generating novel in a challenging environment, presenting a unique opportunity to genes (17). A genomic approach has the potential to identify investigate processes leading to speciation. Analyses of whole ge- genes that are important for adaptation and speciation, which nomes of 63 frogs reveal population structuring and historical de- often evolve more rapidly than other genes (18). Growing numbers mography, characterized by highly restricted gene flow in a of studies have used genome-scale comparisons in phylogeography narrow geographic zone lying between matrilines West (W) and – and speciation [e.g., birds (19 22), fishes (23, 24), mammals (25, 26), EVOLUTION East (E). A population found only along a single tributary of the Yalu Zangbu River has the mitogenome only of E, whereas nuclear reptiles (27, 28)]. genes of W comprise 89–95% of the nuclear genome. Selection accounts for 579 broadly scattered, highly divergent regions Significance (HDRs) of the genome, which involve 365 genes. These genes fall into 51 gene ontology (GO) functional classes, 14 of which are Central topics in evolutionary biology include uncovering the likely to be important in driving reproductive isolation. GO enrich- processes and genetic bases of speciation and documenting ment analyses of E reveal many overrepresented functional cate- environmental adaptations and processes responsible for gories associated with adaptation to high elevations, including them. The challenging environment of the Qinghai-Tibetan blood circulation, response to hypoxia, and UV radiation. Four Plateau (QTP) facilitates such investigations, and the Ti- genes, including DNAJC8 in the brain, TNNC1 and ADORA1 in the betan frog, Nanorana parkeri, offers a unique opportunity to heart, and LAMB3 in the lung, differ in levels of expression be- investigate these processes. A cohort of whole-genome se- tween low- and high-elevation populations. High-altitude adapta- quences of 63 individuals from across its entire range opens tion plays an important role in maintaining and driving continuing avenues for incorporating population genomics into studies divergence and reproductive isolation. Use of total genomes en- of speciation. Natural selection plays an important role in abled recognition of selection and adaptation in and between maintaining and driving the continuing divergence and re- populations, as well as documentation of evolution along a step- productive isolation of populations of the species. The QTP is ped cline toward speciation. a natural laboratory for studying how selection drives ad- aptation, how environments influence evolutionary history, gene flow | hybridization | natural selection | population genomics | and how these factors can interact to provide insight into speciation speciation.

peciation, the fundamental phenomenon underlying bio- Author contributions: W.-W.Z., D.B.W., Y.-P.Z., and J.C. designed research; G.-D.W., Y.-P.Z., and J.C. managed this project; B.-L.Z., J.-Q.J., Y.S., and J.C. collected the samples and per- Sdiversity, continues to be a central focus of research in evo- formed physiological measurement; B.-L.Z., J.-Q.J., F.Y., and H.-M.C. performed DNA exper- lutionary biology. Disentangling pattern and process and gaining iments; Y.-X.L., Y.S., and B.M. performed the gene expression experiment; B.-L.Z., H.-c.Y., an understanding of the underlying genetic mechanisms as spe- and Y.-H.L. performed the G-PhoCS analysis and coalescent simulations; B.-L.Z., L.J., and Y.Z. ciation proceeds are central issues in species biology (1, 2). Most performed the histological experiment; F.G. and H.L. calculated the population-scaled recombination rates; G.-D.W. and B.-L.Z. analyzed the data; and G.-D.W., B.-L.Z., W.-W.Z., vertebrate species arise by vicariant isolation. This form of spe- R.W.M., D.B.W., Y.-P.Z., and J.C. wrote the manuscript. ciation takes time and the right combination of stochastic genetic Reviewers: S.V.E., Harvard University; and C.M., Australian National University. change and natural selection (3, 4). It can occur swiftly under The authors declare no conflict of interest. certain circumstances, for example, when populations experience This open access article is distributed under Creative Commons Attribution-NonCommercial- rapid demographic changes (e.g., bottlenecks, expansions) (5, 6) NoDerivatives License 4.0 (CC BY-NC-ND). or when they are exposed to ecological shifts (7–9). Speciation The sequence data reported in this paper have been deposited in the genome sequence becomes difficult to understand when it involves many non- archive of Beijing Institute of Genomics, Chinese Academy of Sciences, gsa.big.ac.cn (ac- exclusive mechanisms (10). Fortunately, approaches using whole cession nos. CRA000919 for N. parkeri, and CRA000918 for N. pleskei). genomes or transcriptomes are opening new research pathways 1G.-D.W. and B.-L.Z. contributed equally to this work. – (11 13). 2To whom correspondence may be addressed. Email: [email protected], zhangyp@mail. Speciation accompanied by gene flow (i.e., without complete kiz.ac.cn, or [email protected]. geographical isolation) is thought to be common in animal This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. evolution (14, 15). The role of gene flow in speciation remains 1073/pnas.1716257115/-/DCSupplemental. controversial because it is often assumed to be an impediment to www.pnas.org/cgi/doi/10.1073/pnas.1716257115 PNAS Latest Articles | 1of10 How organisms adapt and diversify in high-altitude environ- admixture is known. No morphological differences between W ments has attracted much attention in recent years (29–31). and E have been noted (44). Clines in elevation, genetic- Because of its heterogeneous topography, inhospitable environ- morphological discordance, and mtDNA introgression in the ment, and complex paleoclimate history, the Qinghai-Tibetan zone of admixture offer an unprecedented opportunity to in- Plateau (QTP) is a natural laboratory for studying adaptation vestigate the role selection plays in driving rapid genetic and speciation (32–35). Differences in elevation between valleys change, environmental adaptation, and the evolution of species and mountaintops often exceed 2,000 m, presenting strong eco- differences. logical gradients in abiotic variables, such as oxygen partial The annotated genome of N. parkeri (45) facilitates investi- pressure (36), UV radiation (37), precipitation (38), and ambient gations into the evolutionary drivers of population divergence, temperatures (39). The periodicity of climatic oscillations affects and possibly speciation, by enabling the identification of se- diverse phenomena, such as dispersal, fluctuations in population lected genes and their functions. Herein, we report on whole- size, and speciation (32, 40). genome sequences of 63 N. parkeri individuals from across the An endemic Tibetan frog, Nanorana parkeri, occurs in lentic range of the species and decipher the genetic mechanisms that environments of the southeastern QTP along the Yalu Zangbu may be responsible for driving population divergence. We re- River (YZR) drainage. It faces challenges that few other construct the evolutionary history of N. parkeri from a genomic amphibians experience. Not unusual for frogs, it occurs at an perspective, investigate the impacts of isolation and gene flow elevation of 2,800 m, yet it is the only amphibian to also exist on the process of speciation, and explore genomic conse- at 5,000 m (41), where oxygen is scarce and UV radiation is quences. We also evaluate ecological factors leading to speci- dangerously high. Western (W) and eastern (E) mitochondrial ation and identify candidate genes underlying the observed DNA matrilines of N. parkeri diverged in the middle Pleisto- differentiation. cene and formed distinct entities with overlapping elevational ranges (42). Three nuclear DNA loci correspond to this pat- Results tern with limited admixture of E and W alleles in two localities Sampling and Sequencing. Collection localities ranged from near their geographic boundaries (43). However, neither the 2,900 to 4,900 m in elevation and covered the entire documented extent of gene flow nor the degree of isolation in the zone of distribution of N. parkeri (Fig. 1A and SI Appendix, Table S1).

Fig. 1. Sampling sites and population structure of the Tibetan frog (N. parkeri). (A) Sampling locations (ArcGIS 10.2; esri). Site numbers refer to SI Appendix, Table S1. Colors denote the five main groups recovered from population structure and phylogenetic analyses. Gray indicates hybrid populations (25–27). (B) PCA of all high-coverage samples. (C) Principle component plot of E samples only. (D) ML tree based on concatenated sequences. W, green; E1, purple; E2, yellow; E3, light blue; E4, red.

2of10 | www.pnas.org/cgi/doi/10.1073/pnas.1716257115 Wang et al. The 45 E and W individuals were sequenced for 33.77 Gb (16.89- Table 1. Admixture signatures from D-statistic tests fold coverage) on average, and 18 individuals with mixed E and Test D statistic Z score W heritage were sequenced for 11.92 Gb (5.96-fold coverage). Most reads were aligned (90.88% average mappable rate) to the W, E1; E3,E4 −0.247 −50.868 reference genome of N. parkeri belonging to E (45). After gen- W, E1; E3,E2 −0.264 −61.376 otyping and stringent quality-filtering, we retained 8.59 million W, E1; E4, E2 −0.009 −2.354 single-nucleotide polymorphisms (SNPs) of E and W individuals W, E2;E3,E4 0.117 21.667 and 6.44 million SNPs of mixed heritage. E1, E2; E3, E4 0.260 78.181 Populations with gene flow are denoted in boldface. Population Structure, Phylogeny, and Introgression. Principal com- ponent analysis (PCA) and population structure analyses unambiguously identify five genetic clusters. The PCA plot 36.9–55.6 Ka) and 39.3 Ka (95% CI: 32.1–46.6 Ka), respectively. separates populations W and E along the first eigenvector, which We may underestimate the divergence time because the muta- explains 57.76% of total genetic variance (Fig. 1B). The second – tion rate [0.776e-09 per site per year (45)] is a conservative value. eigenvector identifies subpopulations E1 E4 and explains 8.80% Demographic analysis shows evidence of genetic exchanges of the variance. E1 and E3 comprise samples from low elevations among all subpopulations of E as well as directional migration (∼2,900–3,300 m; localities 1–5; Fig. 1 A–C) near the Yalu from E4 to W (Fig. 3A). Gene flow from the hypothesized an- Zangbu Grand Canyon, and no obvious geographic barrier sep- cestral population of E2–E4 into W occurred between 45.9 and arates them. Subpopulation E2 includes samples from high ele- 181.7 K (G-PhoCS analyses). Coalescent simulations of gene flow vations (∼3,900–4,900 m; localities 6–8; Fig. 1 A–C). The a between W and E under different migration scenarios suggest that remaining localities, at elevations from ∼3,700–4,500 m, form the inferred demographic parameters were most concordant with subpopulation E4. Phylogenetic analyses based on 1,000 neutral B loci (Fig. 1D) and gene tree-based coalescent (SI Appendix, Figs. the observed patterns of differentiation (Fig. 3 ). S1 and S2) methods display nearly identical topologies. Pop- Genomic Isolation Between West and East Matrilines. We used sev- ulations W and E split into distinct branches, with subpopulation eral statistical approaches to assess the extent of genomic differ- E1 in the most basal position, subpopulation E2 following entiation between W and E. High levels of differentiation between subpopulation E1, and then allopatric units E3 and E4. Pop- EVOLUTION ulation structure analysis corroborates the phylogenetic analy- W and E and within E are found using pairwise mean relative di- vergence (FST) and absolute sequence divergence (dxy). The mean ses and PCA, and identifies signs of nuclear gene flow from E = ± = ± to W at their boundary (Fig. 2). Two topology-based methods FST 0.4787 0.0097 and dxy 0.0030 0.0002 values are more verify nuclear admixture among some of the matrilines. The than threefold higher than those among subpopulations of E (FST: ± ± SI Appendix TreeMix analysis (SI Appendix,Fig.S3) and the D-statistic test 0.1405 0.0322, dxy: 0.0009 0.0001; Fig. 4 and , (Table 1) show gene flow within sympatric E1 and E3. A sig- Tables S3 and S4). Levels of divergence using the density of df nificant signature of admixture is found between E2 and E4 (jZj > fixed differences per site between populations ( )areabout SI 3; Table 1). 100-fold higher than those within subpopulations of E ( Appendix,TableS5). The genomic landscape of divergence using Demography History. We used the generalized phylogenetic co- the df statistic finds large numbers of loci across the genome fixed alescent sampler (G-PhoCS) (46) to infer ancestral population between W and subpopulations of E (Fig. 4A and SI Appendix, sizes, divergence times, and migration rates. We estimate the Fig. S5). divergence time of W and E from ∼688.3–846.0 kya (Ka) (Fig. We used a modified population branch statistic (PBS) ap- 3A). The ancestral effective population size of E at this period proach to measure the extent of genomic differentiation between was reduced by ∼56% (SI Appendix, Table S2). Pairwise se- W and all four subpopulations of E (Materials and Methods). The quentially Markovian coalescence (PSMC) results suggest that mean PBS value of W (PBSw) is 0.5583. Neutral coalescent W was reduced by ∼50% (SI Appendix, Fig. S4). Population simulations based on the inferred demographic model find the expansion occurred two- to threefold until the penultimate gla- top 2.5% of the observed PBSW values (ca. PBSW ≥ 0.8906; Fig. ∼ – K ciation [Marine Isotope Stage 6, 100 200 a (47)]. E1 diverged 5A) to be significantly higher than simulated neutral values (P < – ∼ K = – from E2 E4 at 181.7 a [95% confidence interval (CI) 158.0 2.2e-16, two-tailed Mann–Whitney test; Fig. 6). Because all five K 205.4 a]. The time of divergence for E2 relative to E3 and E4, populations of the species expanded recently (based on PSMC K and between the latter two, is estimated at 45.9 a (95% CI: analysis), we performed the simulation taking this expansion into account; results were similar to those under the G-PhoCS model (SI Appendix, section 1.1). Accordingly, we can identify highly divergent regions (HDRs) of the genome with the observed PBSw ≥ 0.8906; there are 579 regions with a total length of 39.60 Mb. The longest fragment is 350 kb, but most (76.34%) are only a one-window-length (50 kb) fragment (SI Appendix, Fig. S6). FST and dxy values are significantly higher in HDRs than those in the background genome (P < 2.2e-16, two-tailed Mann–Whitney test; Fig. 5B and SI Appendix, Table S6). The correlation be- tween FST and dxy is significantly positive (Pearson’s R > 0.3, P < 2.2e-16; SI Appendix, Fig. S7), reflecting reduced gene flow in these regions. A contrasting pattern between W and E occurs with respect to nucleotide diversity (π)(SI Appendix, Table S6). The value in HDRs of W is no less than for the rest of the ge- nome (0.00078 vs. 0.00067) but is reduced significantly (P < 2.2e-16) in the corresponding regions of subpopulations of E (Fig. 5B and Fig. 2. Population structure plots with the number of ancestral clusters SI Appendix, Table S6). The reduced level of intrapopulation (K) = 2–5. diversity in HDRs suggests that selection has affected E. Relative

Wang et al. PNAS Latest Articles | 3of10 genome (SI Appendix, Figs. S8 and S9). However, all mixed in- dividuals have the mitogenome of E (SI Appendix, Fig. S10). We used PCAdmix to identify genomic regions from E in the ge- nomes of the mixed individuals. The low level of E ancestry detected (∼5.9–11.2%; SI Appendix, Fig. S11) is not caused by systematic errors (<2%; SI Appendix, Table S8). Regions of E ancestry occur in no less than 67% (24 of 36 haploids) of mixed individuals. Such regions, about 1.6 Mb in size, are dispersed randomly across the genome, with a mean length of 62 Kb. These regions contain 31 protein-coding genes, one relevant to mito- chondria [VDAC3 (48)], but none exhibit reduced levels of polymorphism (0.00059 > 0.00046) or significant variation in in- tergroup and intragroup nonsynonymous/synonymous ratios [P > 0.05, McDonald–Kreitman test (49)]. We detect no signal of se- lection on the mitogenome in the zone of admixture (Tajima D = −1.49, P > 0.10).

Incomplete Isolation and Gene Flow Within the Geographic Range of E Matrilines. Levels of whole-genomic differentiation among E1– E4 are comparatively small. Interpopulation FST values range from 0.0974 to 0.1825, and dxy values range from 0.0007 to 0.0010 (SI Appendix, Tables S3 and S4). Mean df values within − − the subpopulations range from 1.27 × 10 07 to 6.02 × 10 07 (SI Appendix, Figs. S12 and S13 and Table S5), which corresponds to ∼1.27–6.02 fixed differences for every 10 Mb of sequence data in compared populations. We applied the PBS statistic to identify HDRs and then examined the features of these regions. Similar patterns occur for each subpopulation as follows: (i) most outlier windows (>72%) are discontinuous and one window in length (50 kb) (SI Appendix,Fig.S6); (ii)FST and dxy values are sig- − nificantly higher than the background genome (P < 2.2 × 10 16, two-tailed Mann–Whitney test; Fig. 5D and SI Appendix,Fig. S14 and Table S6); (iii) π is significantly lower in HDRs rela- tive to the rest of the genome, although the level varies among the subpopulations (Fig. 5D and SI Appendix,Fig.S14and Table S6); and (iv)E1–E3 are skewed toward high frequencies of derived variants (Fig. 5D and SI Appendix,Fig.S14and Table S6). Fig. 3. Demographic inference. (A) Demographic history inferred by G- PhoCS. Widths of branches are proportional to Ne. Horizontal dashed lines Environmental, Histological, and Physiological Divergence Affecting E denote posterior estimates for divergence times, associated mean values are Matrilines. In a PCA of all bioclimatic variables associated with shown in bold, and 95% credible intervals are shown in parentheses. Arrows frog sampling sites (SI Appendix, Fig. S15), PC1 explains 49.13% indicate the direction of gene flow, and associated figures indicate the es- of the variation. This value differs significantly between low- timates of total migration rates. (B) Distribution of F values in the ST(W, E1) elevation clades E1 and E3 and high-elevation clades E2 and observed data and in the simulated data under different migration scenarios P = between W and E. The full model shows simulation with the full set of de- E4 ( 3.3e-4). Histological sections of middorsal skin show mographic parameters inferred from G-PhoCS. The no_E234 refers to the simu- that the frogs from high elevations have a significantly greater lation without the postdivergence migration from E234 to W. The no_E refers to number of granular glands than frogs from low elevations (P < the simulation without current and postdivergence migration from E to W. 0.05, two-tailed t test; Fig. 7A); such glands may function in re- sponse to environmental stimuli (50). Hemoglobin (Hb) levels in peripheral blood are correlated with elevation (P < 0.01; Fig. to the genomic background, these regions also contain a signif- 7B). In contrast to Hb levels, muscular oxygen content in rela- icantly high derived allele frequency (DAF) (P < 2.2e-16). tively low-elevation populations is significantly higher than muscular oxygen content in populations from higher elevations Divergence of W and E Matrilines in Relation to Reproductive Isolation. (P < 0.01; Fig. 7C); this corresponds to the decreased oxygen We used the annotated genome of N. parkeri to infer the func- content in air. tions of candidate target genes within the HDRs that might re- late to speciation. We annotated 365 protein-coding genes in the Genes, Ecological Factors, and the Evolution of E Matrilines. Genes HDRs of PBSW. Gene ontology (GO) evaluations identified from the HDRs of each subpopulation were evaluated for 51 functional classes that were significantly overrepresented (P < functional categories that related to environmental factors. GO 0.05; SI Appendix,TableS7). In total, 21 genes distributed among enrichment analyses revealed many overrepresented functional 14 GO terms have functions related to reproduction, including, categories that appear to associate with adaptation to the envi- as examples, reproductive developmental process (GO:0003006), ronment of the QTP (SI Appendix, Tables S9–S12). For instance, sexual reproduction (GO:0019953), and spermatogenesis (GO:0007283) Kyoto Encyclopedia of Genes and Genomes pathways and GO (Table 2 and SI Appendix,TableS7). categories related to metabolism (fatty acid metabolism, hsa00071), blood circulation (circulatory system process, GO:0003013), motility Little Representation of E Matrilines in the Nuclear Genome Within (muscle cell differentiation, GO:0042692), and immune response the Zone of Admixture. Individuals in the zone of admixture are (inflammatory response, GO:0006954) characterize low-elevation more closely related to W than E with respect to the nuclear E1 (SI Appendix,TableS9). GO categories associated with apoptosis

4of10 | www.pnas.org/cgi/doi/10.1073/pnas.1716257115 Wang et al. Fig. 4. Genomic divergence associated with species formation. (A)FST distribution between W and E1 and their landscape of genomic divergence measured by the df. Both statistics are measured from 50-kb nonoverlapping windows. Scaffolds are concatenated to reveal the whole-genomic divergence pattern.

(B)FST distribution between E1 and E2 and their landscape of genomic divergence. (C)FST distribution between E3 and E4 and their landscape of genomic divergence.

(induction of apoptosis, GO:0006917) were significantly enriched in of male reproductive genes, for instance, in primates (55), birds high-elevation E2 (SI Appendix, Table S10), and response to (19), fruit flies (56), and amphibians (57). Taken together with

radiation (GO:0009314) and light stimulus (GO:0009416) were the fact that W, E2, and E4 occupy similar climatic environments EVOLUTION significantly enriched in high-elevation E4 (SI Appendix,TableS11). (SI Appendix, Fig. S15) and display no differences in morphol- GO categories associated with blood circulation (GO:0008015) and ogy, we suggest that endogenous selection is a dominant factor metabolism process (response to lipid, GO:0033993) are present building reproductive isolation, resulting in speciation between in low-elevation E3 (SI Appendix, Table S12). W and E matrilines. To verify the function of the genes in HDRs, we performed Populations from the geographic region of admixture between real-time quantitative PCR on seven candidate genes from three E and W matrilines occur along a narrow valley containing a different tissues (heart, lung, and brain). These seven candidate tributary of the YZR. They have the mitogenome of E, whereas genes were identified based upon what they potentially contrib- nuclear genes of W strongly dominate. Such mitonuclear dis- SI Appendix ute to high-altitude adaptation ( , Table S13). Four of cordance is common in vertebrates (58); however, here, a cohort these differ in their levels of expression between low-elevation of N. parkeri genomes offers an unusual opportunity to decipher D and high-elevation lineages (Fig. 7 ), including two related to the genomic pattern and the genetic mechanism of the phe- TNNC1 cardiac function in the heart [Troponin C1 Slow ( ) (51) nomenon. Population genomic analyses reveal the E intro- and Adenosine A1 Receptor (ADORA1) (52)] and two hypoxia- DNAJC8 gressed regions in hybrids are small and randomly dispersed in related genes [DnaJ Homolog Subfamily C Member 8 ( ) the genome. Those short introgressed segments suggest to us in the brain (53) and Laminin Subunit Beta 3 (LAMB3) in the that they formed by hybridization, and were subsequently broken lung (54)]. into even smaller sizes by recombination operating over many Discussion generations (59, 60). We detect no signal of selection in either Genomic Isolation and Speciation Between E and W Matrilines. the regions of introgression or the mitochondrial genome, and Population genomic analyses find substantial genomic isolation no significant enrichment of the pathways relevant to mito- between W and E, with limited directional gene flow. Gene flow chondria or of the GOs of genes in introgressed regions. Thus, has been found only in one geographically restricted area, where the mitonuclear discordance is not caused by natural selection no obvious geographic barrier (e.g., mountains) exists that could driven by environmental factors (29, 61, 62). A likely scenario restrict dispersal. Further, the genetic patterns indicate that frogs is establishment of reproductive isolating barriers (RIBs) from both E and W cross the YZR within their ranges and reject resulting from secondary contact with historical and limited the null hypothesis of panmixia within a single species. Neutral introgression after mid-Pleistocene divergence (63, 64). We hy- processes, including divergence, changes of current, and ances- pothesize that hybrids formed historically only between females tral effective population size (Ne), and levels of gene flow have of E and males of W upon secondary contact, with hybrid females the potential to explain the broad pattern of differentiation be- continuing to backcross to the males of W, leading to postzygotic, tween W and E. However, the HDRs between W and E exhibit prezygotic, or both categories of RIBs. significantly more genetic differentiation than our simulation using the inferred history. HDRs could result from positive se- Ecological Adaptation Within E Matrilines. E Tibetan frogs primarily lection and not demographic history (23). Furthermore, the segregate into high-elevation (>3,700 m, E2 and E4) and low- HDRs in subpopulations of E exhibit significantly lower π and elevation (<3,700 m, E1 and E3) populations (Fig. 1A). This pattern higher DAF compared with the rest of the genome (Fig. 5B), likely reflects an ecological stratification in the southeastern suggesting the action of positive selection on E. Because the QTP, including oxygen partial pressure (36), UV radiation (37), HDRs of W and E harbor many genes that relate to GO cate- precipitation (38), and ambient temperature (39). If ecological selection gories involved in fertilization, particularly to spermatogenesis, is one of the forces that drives differentiation of the populations of they likely lead to reproductive isolation by suppressing gene E, one expects to find an imprint on genomic regions involved in flow. Studies of other species have detected the rapid evolution ecological adaptation (27, 65).

Wang et al. PNAS Latest Articles | 5of10 Fig. 5. Identification of HDRs. (A) Distribution of FST-based statistic of PBSW. HDRs in top 2.5% PBS distribution, light blue; outside HDRs, gray. (B) Com- parisons of HDRs of PBSW in terms of FST, π, dxy, and DAF versus the genomic background. (C) Distribution of PBSE1.(D) Comparisons of HDRs of PBSE1 with the genomic background. Asterisks designate levels of significance between HDRs and outside HDRs by a two-tailed Mann–Whitney test (*P < 0.01; **P < 1e-8; ***P < 2.2e-16).

The HDRs of each subpopulation of E exhibit signals of se- Materials and Methods lection compared with the background, including low π, in- Sample Collection and Sequencing. Tibetan frogs (45 in total) were selected for creased dxy between populations, and DAF. GO enrichment our genomic study based on matrilineal (mtDNA) genealogies (42). An ad- analyses point to environmental drivers of divergence. No clear ditional 18 individuals come from the area of mixed clades in a tributary of geographic barriers separate the low-elevation sympatric pair the YZR (localities 25, 26, and 27; Fig. 1). One individual of Nanorana pleskei E1 and E3. Although they were formed at different times (Fig. was used as the outgroup taxon (66). All collections were made according to 3A), they share several GO functional categories, such as blood circulation and metabolic processes. Given the phylogeny and population history, these functions likely evolved independently in each lineage. The histological, physiological, and expressional evidence (Fig. 7) corresponds to the scenario that ecological stratification in the southeastern QTP has greatly promoted adaptive population diversification in situ. In summary, N. parkeri is a useful model to investigate the processes and genetic bases of speciation along geographic and environmental gradients. We argue that natural selection plays important roles in driving continuing divergence within the species, and even in maintaining it. The extreme environments of the Tibetan Plateau can drive the rapid evolution of species [e.g., yak (33)]. Given the rapidity of changes and the challenging environ- ment, the area is a natural laboratory for studying how selection drives adaptation; how environments influence evolutionary history; Fig. 6. Distribution of observed genome-wide top 2.5% PBSW values and, in some cases, how speciation can occur. compared with the simulated PBSW values under the full model.

6of10 | www.pnas.org/cgi/doi/10.1073/pnas.1716257115 Wang et al. Table 2. GO analysis of genes located in regions that strongly differentiated W from all four subpopulations of E Category Term No. of genes P value

Cluster 1 GO:0048538∼thymus development 5 0.00 GO:0048534∼hemopoietic or lymphoid organ development 12 0.01 GO:0002520∼immune system development 12 0.01 Cluster 2 GO:0003006∼reproductive developmental process 15 0.00 GO:0048610∼reproductive cellular process 11 0.00 GO:0048609∼reproductive process in a multicellular organism 18 0.01 GO:0032504∼multicellular organism reproduction 18 0.01 GO:0019953∼sexual reproduction 17 0.01 GO:0007281∼germ cell development 7 0.01 GO:0007276∼gamete generation 15 0.01 GO:0048232∼male gamete generation 12 0.02 GO:0007283∼spermatogenesis 12 0.02 Cluster 3 GO:0035270∼endocrine system development 6 0.01 GO:0030325∼adrenal gland development 3 0.01

Annotation clusters with an enrichment score of ≥2 are shown. animal use protocols approved by the Kunming Institute of Zoology Animal To infer admixture events, we applied the D-statistic using the qpDstat Care and Ethics Committee. module in the ADMIXtools package (78). We also adopted a tree-based Total genomic DNA was extracted from liver, muscle, toe clips, or tadpole approach, which was implemented in TreeMix (79), to verify the existence samples using the phenol/chloroform method (67). For each individual, 1–3 μg of gene flow through modeling the migration values set from 0 to 4 with a of DNA was sheared into fragments of 300–800 bp using the Covaris system. block of 5,000 SNPs. DNA fragments were processed and sequenced using Illumina paired-end se- quencing technology (Illumina, Inc.). The 45 nonhybrid individuals and the Demographic Analysis Using G-PhoCS and PSMC. G-PhoCS (46) was employed EVOLUTION outgroup taxa were sequenced to a target depth of 15×,andthe18hybrid to infer the complete demographic history for N. parkeri, including population individuals were sequenced to target depth of 5×. The raw sequence data from divergence times, ancestral population size, and migration rates based on this study have been submitted to the Genome Sequence Archive (gsa.big.ac.cn/) 1,000 neutral loci (80). The parameters were inferred in a Bayesian manner using under accession nos. CRA000919 (N. parkeri) and CRA000918 (N. pleskei). Markov Chain Monte Carlo to jointly sample model parameters and genealogies of the input loci (46). Migration scenarios were added by combining results of D- SNP Calling and Filtering. Raw sequence reads of each individual were statistic tests, Frappe, and TreeMix because G-PhoCS often have limited ability to mapped to the Tibetan frog reference genome (45) using BWA-ALN (v.0.7.4) (68) characterize complex migration scenarios (81). Additionally, two postdivergence with default parameters. SAMtools (v.0.1.18) was used for sorting and removing migration bands were added to test if gene flow occurred between W and E since PCR duplicates (69). To minimize false-positive SNP calls around indels, local they separated. Each Markov chain was run for 2,000,000 generations while realignment around indels was performed using the Genome Analysis Tool Kit sampling parameter values every 20th iteration. Burn-in and convergence of each (v.2.6-5) (70). Raw SNPs were extracted using SAMtools on the locally realigned run were determined with TRACER 1.5 (82). More information about the control BAM files with the command “samtools mpileup -q 20 -Q 20 -C 50 -uDEf.” file of G-PhoCS is provided in SI Appendix, section 1.2. Divergence times in units of To obtain high-quality genotype calls for downstream analyses, we kept years, effective population sizes, and migration rates were calibrated by the es- SNPs that met the following criteria: (i) sites were at least 5 bp away from a timates of generation time and neutral mutation rate from previous studies (45, predicted insertion/deletion, (ii) the consensus quality was ≥40, (iii) sites did 83). We repeated the G-PhoCS analysis with four separate runs to obtain reliable not have triallelic alleles and indels, (iv) the depth ranged from 2.5 to 97.5% and stable estimates for the demographic parameters. To validate the G-PhoCS in depth quartile, (v) SNPs had minor allele frequencies ≥ 0.01, and (vi) SNPs inferences and test if the differential gene flow to W was correctly identified, we occurred in more than 95% of high-coverage individuals (nonhybrids) and used ms (84) to perform simulations under different migration scenarios between 75% of medium-coverage individuals (hybrids). The filtered data were WandE(SI Appendix, section 1.3). We used the inferred demographic parameters phased using Beagle v.3.3.2 (71). to produce 50 Mbp of sequences and compared the observed patterns of dif- ferentiation for W–E1 with those under different migration scenarios. Population Structure, Phylogenetic Inference, and Admixture Analyses. We The trajectory of demographic histories for the five populations of Tibetan used PCA and population structure analysis to evaluate the genetic struc- frogs was inferred by the PSMC model (85). Because PSMC has high false-negative turing of frogs. SNPs in scaffolds longer than 500 kb were extracted; they rates at low sequence coverage, we restricted this analysis to the individual in occupied about 76% of the entire genome. PCA was performed using the each group with highest coverage (≥15×). In addition, a correction factor (-N) package GCTA (v.1.24.2) (72). The genomic ancestry of each individual was was invoked to correct for the false-negative rate caused by the shallow se- inferred using Frappe (v.1.1) (73). To avoid the effect of linkage disequilib- quence depth (80). The PSMC analysis was set as the following parameters: -N25 rium, we selected one SNP for each interval of 50 kb. The postulated number -t15 -r5 -b -p “4+25 * 2+4+6”. A bootstrapping approach with 100 replicates was of ancestral clusters (K) was set from two to five, and the maximum number performed to assess the variation in the inferred Ne trajectories. A generation of expectation-maximization iterations was set to 10,000. time of 5 y and a neutral mutation rate of 0.776e-09 per site per year were used Phylogenetic relationships were inferred via both coalescent and con- to convert the population sizes and scaled time into real sizes and time (45, 83). catenation methods. To minimize the effects of potential alignment errors The program fastsimcoal2 (86) was used to estimate the extent of pop- and regions with strong natural selection, we performed these analyses on ulation expansion within 30,000 y based on the model of G-PhoCS. Fifty time putatively neutral genomic regions by filtering out the positions with re- simulations were performed, and the result with the highest likelihood was peat sequences, exons, and the 10 kb flanking them on each side. We kept. For each run, demographic estimates were obtained from 100,000 randomly selected 1,000 neutral loci with a window size of 100 kb from the simulations (-n 100,000) and 40 expectation/conditional maximization cycles intergenic region. First, we reconstructed individual gene trees for each (-L 40) per parameter file. window based on the maximum likelihood (ML) approach using RAxML (v.8.1.15) (74). Support values for each node were inferred using 100 rapid Inference of Mitochondrial Phylogeny and Hybrid Ancestry Assignment. We used bootstrap replicates based on the GTRGAMMA model. Second, for gene BWA-ALN (v.0.7.4) to map all raw sequence reads from each individual to the tree-based coalescent analysis, species trees were generated using MP-EST previously assembled complete mitochondrial genomes of N. pleskei (87), the (75) and STAR (76, 77) and using population as the units of the tree tips. Third, closest relative of our Tibetan frog. SNPs were identified and filtered based on for the concatenation analysis, ML trees were constructed based on the the same nuclear genome. The matrilineal (mitochondrial) genealogy was GTRGAMMA model using the concatenated sequences from the same set. inferred using the ML method based on the GTRGAMMA model implemented in

Wang et al. PNAS Latest Articles | 7of10 Fig. 7. Morphological and physiological changes associated with elevation. (A) Bar plots of the differences in numbers of granular glands in the middorsal skin between low-elevation (2,968 m, E1) and high-elevation (4,859 m, E4) populations (two-tailed test: P < 0.05). (B) Hb levels (grams per deciliter) at different el- evations. Red lines show the best-fit regression line based on a third-order polynomial equation. The 95% confidence interval is shown in gray. (C)TOC(μmol/L) in low (E1) and high (E4) elevations. (D) Expression level of DNAJC8 in the brain, TNNC1 and ADORA1 in the heart, and LAMB3 in the lung from low-elevation (E1) and high-elevation (E4) populations of E, respectively. Eight replicates were performed for each group. Statistically significant differences in differential expression are indicated by asterisk(s) (two-tailed t test: *P < 0.05; **P < 0.01).

RAxML (v.8.1.15) (74). PCAdmix (88) was used to identify the ancestries in each of The length of the branch leading to E1 since the divergence from the the 18 hybrid individuals. We used W and E4 as the ancestral populations be- remaining subpopulations of E clades was then estimated as follows: cause they were the geographically closest populations and showed the + + − − highest signal of admixture in the f3 test. To prevent high-linkage blocks 2TðE1, E2Þ TðE1, E3Þ TðE1, E4Þ TðE2, E3Þ TðE2, E4Þ PBSE1 = . from having excessive influence on the inferred ancestry of a region, SNPs 4 were thinned with r2 > 0.80 in all ancestral and admixed groups. PCA was Calculations for branch lengths leading to subpopulations E2, E3, and performed using a window of 40 SNPs. A default calling threshold of 0.9 was E4 were similar to the equation for E1. HDRs were defined as the upper used as the criterion to assign ancestry. However, systematic errors, such as 2.5% of each PBS distribution. To avoid the potential assembly or map- incomplete lineage sorting and estimation errors, may have contributed to ping errors at the ends of scaffolds, we further removed high divergence similar introgression signals in the PCAdmix analysis. To estimate these, we reanalyzed the PCAdmix analysis using non-E4 individuals as hybrids. peaks less than two consecutive windows in both ends of scaffolds in both sides. Inference HDRs. Genomic differentiation was calculated based on the PBS df π under the topological transformation described previously (89). The PBS Characterization of HDRs in Terms of dxy, , , and DAF. In addition to FST and value estimates the amount of sequence change along a population branch PBS, we applied several population genomic parameters to quantify and since its divergence from other branches of a population tree. First, pairwise compare the outlier windows with the background genome, including the following: dxy between populations (93), df between populations (19), FST statistics were calculated using Weir and Cockerham’s method (90), implemented in VCFtools v0.1.11 (91), with nonoverlapping 50-kb genomic within-population π level (19, 23, 91), DAF, and population-scaled re- ρ windows. Negative values of FST were treated as 0. Windows less 20 kb in combination rates ( ). We treated N. pleskei as the outgroup when comparing length were excluded for further analysis. A log-transformation FST was used differences in DAF between W and E, and treated three nonadmixture W in- for the PBS calculation (92). The length of the branch leading to W since the dividuals (Fig. 2) as the outgroup when comparing DAF differences among divergence from all subpopulations of E was estimated as follows: subpopulations of E.

3Tð Þ + Tð Þ + Tð Þ + Tð Þ − Tð Þ − Tð Þ − Tð Þ PBS = West, E1 West, E2 West, E3 West, E4 E1, E2 E1, E3 E1, E4 . West 6

8of10 | www.pnas.org/cgi/doi/10.1073/pnas.1716257115 Wang et al. GO Enrichment Analysis. GO enrichment analysis was performed using RNA Isolation and Reverse Transcriptase PCR Assay. Total RNAs from pop- DAVID (Database for Annotation Visualization and Integrated Discovery) ulations at low (, 2,968 m) and high (Milashankou, 4,859 m) elevations (94). GO terms with less than two genes were excluded from further of E were extracted using the TRIzol total RNA extract kit (Tiangen). Reverse < analysis. GO terms with a P value 0.05 were considered to be significantly transcription was carried out using the Fermentas RevertAid First-Strand cDNA enriched. synthesis kit (Fermantas) to prepare templates for real-time quantitative PCR. The primers of candidate genes used for real-time PCR are displayed in SI Phenotype Data from Physiological and Morphological Detection. Hb levels and Appendix,TableS14. Actin, Beta (ACTB) was used as a loading control. tissue oxygen content (TOC) were collected in the field during June and July 2015. Measurements were taken in five communities in E along an ele- ACKNOWLEDGMENTS. We thank Michael W. Nachman (University of vation gradient: Nyingchi (29.61°N, 94.36°E, 2,968 m above sea level), California, Berkeley), Weiwei Zhai (Genome Institute of Singapore), and (29.67°N, 90.88°E, 3,671 m above sea level), Zhaxigang (29.75°N, two reviewers for helpful comments and suggestions. We also thank Na Lin 91.95°E, 4,011 m above sea level), Riduo (29.70°N, 92.23°E, 4,368 m above of Anhui University for assistance in file preparation. This work was sup- sea level), and Milashankou (29.80°N, 92.34°E, 4,859 m above sea level). At ported by the Strategic Priority Research Program (B) (Grant XDB13020200 least 10 individuals were measured for each community. Hb level in peripheral to J.C.) of the Chinese Academy of Sciences (CAS), National Natural Science blood was measured from 53 adult frogs, and muscular oxygen content was Foundation of China (Grants 91431105 and 31622052 to J.C.), and the Animal measured from 69 adult frogs. Hb was determined using Mission Plus Hb, Branch of the Germplasm Bank of Wild Species of CAS (Large Research In- immediately after drawing blood from the heart ventricle. TOC was de- frastructure Funding) (J.C.). W.-W.Z. was supported by Strategic Priority Research termined via a fiber optic cable (PreSens Precision Sensing GmbH). We also Program (B) (Grant XDB03030107) of CAS and the National Key Research made histological sections of the middorsal skin of seven frogs from the and Development Program of China (Grant 2017YFC0505202). G.-D.W. was communities of Nyingchi and Milashankou, which represented populations supported by the 13th Five-year Informatization Plan of CAS (Grant No. from E at low and high elevations, respectively. The tissues were fixed in XXH13503-05). J.C. and G.-D.W. are supported by the Youth Innovation Pro- Heidenhain’s Susa fixative to make paraffin sections (5 μm thick). motion Association, CAS.

1. Seehausen O, et al. (2014) Genomics and the origin of species. Nat Rev Genet 15: 31. Storz JF, Scott GR, Cheviron ZA (2010) Phenotypic plasticity and genetic adaptation to 176–192. high-altitude hypoxia in vertebrates. J Exp Biol 213:4125–4136. 2. Coyne J, Orr H (2004) Speciation (Sinauer, Sunderland, MA). 32. Favre A, et al. (2015) The role of the uplift of the Qinghai-Tibetan Plateau for the 3. Mayr E (1999) Systematics and the Origin of Species from the Viewpoint of a Zoologist evolution of Tibetan biotas. Biol Rev Camb Philos Soc 90:236–253. (Harvard Univ Press, Cambridge, MA). 33. Qiu Q, et al. (2012) The yak genome and adaptation to life at high altitude. Nat Genet 4. Dobzhansky T (1937) Genetics and the Origin of Species (Columbia Univ Press, New York). 44:946–949. 5. Hewitt GM (2004) Genetic consequences of climatic oscillations in the Quaternary. 34. Qu Y, et al. (2013) Ground tit genome reveals avian adaptation to living at high al- EVOLUTION Philos Trans R Soc Lond B Biol Sci 359:183–195, discussion 195. titudes in the Tibetan plateau. Nat Commun 4:2071. 6. Avise JC (2001) Phylogeography: The History and Formation of Species (Harvard Univ 35. Simonson TS, et al. (2010) Genetic evidence for high-altitude adaptation in . Press, Cambridge, MA). Science 329:72–75. 7. Funk DJ (1998) Isolating a role for natural selection in speciation: Host adaptation and 36. Wang D (2003) The geography of aquatic vascular plants of Qinghai Xizang (Tibet) sexual isolation in Neochlamisus bebbianae leaf beetles. Evolution 52:1744–1759. Plateau. PhD dissertation (Wuhan University, Wuhan, China). 8. Schluter D (2000) The Ecology of Adaptive Radiation (Oxford Univ Press, Oxford). 37. Norsang G, Kocbach L, Stamnes J, Tsoja W, Pincuo N (2011) Spatial distribution and 9. Nosil P (2012) Ecological Speciation (Oxford Univ Press, Oxford). temporal variation of solar UV radiation over the Tibetan Plateau. Appl Phys Rev 3:37. 10. Turelli M, Barton NH, Coyne JA (2001) Theory and speciation. Trends Ecol Evol 16:330–343. 38. Li SC, et al. (2007) Change of annual precipitation over Qinghai-Xizang Plateau and 11. Nosil P, Feder JL (2012) Genomic divergence during speciation: Causes and conse- sub-regions in recent 34 Years. J Desert Res 27:307–314. quences. Philos Trans R Soc Lond B Biol Sci 367:332–342. 39. Ma X, Lu X (2010) Annual cycle of reproductive organs in a Tibetan frog, Nanorana 12. Feder JL, Egan SP, Nosil P (2012) The genomics of speciation-with-gene-flow. Trends parkeri. Anim Biol Leiden Neth 60:259–271. Genet 28:342–350. 40. Zhou W, et al. (2013) River islands, refugia and genetic structuring in the endemic 13. Feder JL, Flaxman SM, Egan SP, Comeault AA, Nosil P (2013) Geographic mode of brown frog Rana kukunoris (Anura, Ranidae) of the Qinghai-Tibetan Plateau. Mol speciation and genomic divergence. Annu Rev Ecol Evol Syst 44:73–97. Ecol 22:130–142. 14. Nosil P (2008) Speciation with gene flow could be common. Mol Ecol 17:2103–2106. 41. Kunming Institute of Zoology (CAS) (2018) AmphibiaChina. The database of Chinese 15. Pinho C, Hey J (2010) Divergence with gene flow: Models and data. Annu Rev Ecol amphibians. Available at www.amphibiachina.org/. Accessed May 5, 2018. Evol Syst 41:215–230. 42. Zhou WW, et al. (2014) DNA barcodes and species distribution models evaluate 16. Mayr E (1966) Animal Species and Evolution (Belknap Press of Harvard Univ Press, threats of global climate changes to genetic diversity: A case study from Nanorana Cambridge, MA). parkeri (Anura: Dicroglossidae). PLoS One 9:e103899. 17. Nolte AW, Tautz D (2010) Understanding the onset of hybrid speciation. Trends Genet 43. Liu J, et al. (2015) Phylogeography of Nanorana parkeri (Anura: Ranidae) and mul- 26:54–58. tiple refugia on the Tibetan Plateau revealed by mitochondrial and nuclear DNA. Sci 18. Nosil P, Funk DJ, Ortiz-Barrientos D (2009) Divergent selection and heterogeneous Rep 5:9857. genomic divergence. Mol Ecol 18:375–402. 44. Fei L, Ye CY (2017) Amphibians of China: I (Science Press, Beijing). 19. Ellegren H, et al. (2012) The genomic landscape of species divergence in Ficedula 45. Sun YB, et al. (2015) Whole-genome sequence of the Tibetan frog Nanorana parkeri flycatchers. Nature 491:756–760. and the comparative evolution of tetrapod genomes. Proc Natl Acad Sci USA 112: 20. Poelstra JW, et al. (2014) The genomic landscape underlying phenotypic integrity in E1257–E1262. the face of gene flow in crows. Science 344:1410–1414. 46. Gronau I, Hubisz MJ, Gulko B, Danko CG, Siepel A (2011) Bayesian inference of an- 21. Burri R, et al. (2015) Linked selection and recombination rate variation drive the cient human demography from individual genome sequences. Nat Genet 43: evolution of the genomic landscape of differentiation across the speciation contin- 1031–1034. uum of Ficedula flycatchers. Genome Res 25:1656–1665. 47. Zheng B, Xu Q, Shen Y (2002) The relationship between climate change and Qua- 22. Lamichhaney S, et al. (2015) Evolution of Darwin’s finches and their beaks revealed by ternary glacial cycles on the Qinghai-Tibetan Plateau: Review and speculation. Quat genome sequencing. Nature 518:371–375. Int 35643:93–101. 23. Malinsky M, et al. (2015) Genomic islands of speciation separate cichlid ecomorphs in 48. Rahmani Z, Maunoury C, Siddiqui A (1998) Isolation of a novel human voltage- an East African crater lake. Science 350:1493–1498. dependent anion channel gene. Eur J Hum Genet 6:337–340. 24. Jones FC, et al.; Broad Institute Genome Sequencing Platform & Whole Genome As- 49. McDonald JH, Kreitman M (1991) Adaptive protein evolution at the Adh locus in sembly Team (2012) The genomic basis of adaptive evolution in threespine stickle- Drosophila. Nature 351:652–654. backs. Nature 484:55–61. 50. Xu X, Lai R (2015) The chemistry and biological activities of peptides from amphibian 25. Phifer-Rixey M, Bomhoff M, Nachman MW (2014) Genome-wide patterns of differ- skin secretions. Chem Rev 115:1760–1846. entiation among house mouse subspecies. Genetics 198:283–297. 51. Hershberger RE, et al. (2010) Coding sequence rare variants identified in MYBPC3, 26. Carneiro M, et al. (2014) The genomic architecture of population divergence between MYH6, TPM1, TNNC1, and TNNI3 from 312 patients with familial or idiopathic dilated subspecies of the European rabbit. PLoS Genet 10:e1003519. cardiomyopathy. Circ Cardiovasc Genet 3:155–161. 27. Campbell-Staton SC, et al. (2017) Winter storms drive rapid phenotypic, regulatory, 52. Jenner TL, Rose’Meyer RB (2006) Loss of vascular adenosine A1 receptors with age in and genomic shifts in the green anole lizard. Science 357:495–498. the rat heart. Vascul Pharmacol 45:341–349. 28. Bragg JG, et al. (2018) Phylogenomics of a rapid radiation: The Australian rainbow 53. Kaindl AM, et al. (2008) Erythropoietin protects the developing brain from hyperoxia- skinks. BMC Evol Biol 18:15. induced cell death and proteome changes. Ann Neurol 64:523–534. 29. Cheviron ZA, Brumfield RT (2009) Migration-selection balance and local adaptation of 54. Kokkonen N, et al. (2007) Hypoxia upregulates carcinoembryonic antigen expression mitochondrial haplotypes in rufous-collared sparrows (Zonotrichia capensis) along an in cancer cells. Int J Cancer 121:2443–2450. elevational gradient. Evolution 63:1593–1605. 55. Wyckoff GJ, Wang W, Wu CI (2000) Rapid evolution of male reproductive genes in the 30. Cheviron ZA, Connaty AD, McClelland GB, Storz JF (2014) Functional genomics of descent of man. Nature 403:304–309. adaptation to hypoxic cold-stress in high-altitude deer mice: Transcriptomic plasticity 56. Ting CT, Tsaur SC, Wu ML, Wu CI (1998) A rapidly evolving homeobox at the site of a and thermogenic performance. Evolution 68:48–62. hybrid sterility gene. Science 282:1501–1504.

Wang et al. PNAS Latest Articles | 9of10 57. Malone JH, Michalak P (2008) Physiological sex predicts hybrid sterility regardless of 76. Liu L, Yu L, Pearl DK, Edwards SV (2009) Estimating species phylogenies using co- genotype. Science 319:59. alescence times among sequences. Syst Biol 58:468–477. 58. Toews DP, Brelsford A (2012) The biogeography of mitochondrial and nuclear dis- 77. Shaw TI, Ruan Z, Glenn TC, Liu L (2013) STRAW: Species TRee analysis web server. cordance in animals. Mol Ecol 21:3907–3930. Nucleic Acids Res 41:W238–W241. 59. Pool JE, Nielsen R (2009) Inference of historical changes in migration rate from the 78. Patterson N, et al. (2012) Ancient admixture in human history. Genetics 192: lengths of migrant tracts. Genetics 181:711–719. 1065–1093. 60. Harris K, Nielsen R (2013) Inferring demographic history from a spectrum of shared 79. Pickrell JK, Pritchard JK (2012) Inference of population splits and mixtures from haplotype lengths. PLoS Genet 9:e1003521. genome-wide allele frequency data. PLoS Genet 8:e1002967. 61. Irwin DE (2012) Local adaptation along smooth ecological gradients causes phylo- 80. Wang GD, et al. (2016) Out of southern East Asia: The natural history of domestic – geographic breaks and phenotypic clustering. Am Nat 180:35–49. dogs across the world. Cell Res 26:21 33. 62. Currat M, Ruedi M, Petit RJ, Excoffier L (2008) The hidden side of invasions: Massive 81. Freedman AH, et al. (2014) Genome sequencing highlights the dynamic early history introgression by local genes. Evolution 62:1908–1920. of dogs. PLoS Genet 10:e1004016. 63. Chan KM, Levin SA (2005) Leaky prezygotic isolation and porous genomes: Rapid 82. Rambaut A, Drummond AJ (2007) Tracer 1.5. Available at tree.bio.ed.ac.uk/software/ introgression of maternally inherited DNA. Evolution 59:720–729. tracer/. Accessed July 3, 2011. 64. Colliard C, et al. (2010) Strong reproductive barriers in a narrow hybrid zone of West- 83. Ma X, Lu X (2009) Sexual size dimorphism in relation to age and growth based on skeletochronological analysis in a Tibetan frog. Amphibia-Reptilia 30:351–359. Mediterranean green toads (Bufo viridis subgroup) with Plio-Pleistocene divergence. 84. Hudson RR (2002) Generating samples under a Wright-Fisher neutral model of ge- BMC Evol Biol 10:232. netic variation. Bioinformatics 18:337–338. 65. Storz JF (2005) Using genome scans of DNA polymorphism to infer adaptive pop- 85. Li H, Durbin R (2011) Inference of human population history from individual whole- ulation divergence. Mol Ecol 14:671–688. genome sequences. Nature 475:493–496. 66. Che J, et al. (2010) Spiny frogs (Paini) illuminate the history of the Himalayan region 86. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust de- and Southeast Asia. Proc Natl Acad Sci USA 107:13765–13770. mographic inference from genomic and SNP data. PLoS Genet 9:e1003905. 67. Sambrock J, Russel DW (2012) Molecular Cloning: A Laboratory Manual (Cold Spring 87. Chen G, Wang B, Liu J, Xie F, Jiang J (2011) Complete mitochondrial genome of Harbor Laboratory Press, Plainview, NY). Nanorana pleskei (Amphibia: Anura: Dicroglossidae) and evolutionary characteristics. 68. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler Curr Zool 57:785–805. transform. Bioinformatics 25:1754–1760. 88. Brisbin A, et al. (2012) PCAdmix: Principal components-based assignment of ancestry 69. Li H, et al.; 1000 Genome Project Data Processing Subgroup (2009) The sequence along each chromosome in individuals with admixed ancestry from two or more – alignment/map format and SAMtools. Bioinformatics 25:2078 2079. populations. Hum Biol 84:343–364. 70. DePristo MA, et al. (2011) A framework for variation discovery and genotyping using 89. Lachance J, Tishkoff SA (2014) Biased gene conversion skews allele frequencies in – next-generation DNA sequencing data. Nat Genet 43:491 498. human populations, increasing the disease burden of recessive alleles. Am J Hum 71. Browning SR, Browning BL (2007) Rapid and accurate haplotype phasing and missing- Genet 95:408–420. data inference for whole-genome association studies by use of localized haplotype 90. Weir BS, Cockerham CC (1984) Estimating F‐statistics for the analysis of population clustering. Am J Hum Genet 81:1084–1097. structure. Evolution 38:1358–1370. 72. Yang J, Lee SH, Goddard ME, Visscher PM (2011) GCTA: A tool for genome-wide 91. Danecek P, et al.; 1000 Genomes Project Analysis Group (2011) The variant call format complex trait analysis. Am J Hum Genet 88:76–82. and VCFtools. Bioinformatics 27:2156–2158. 73. Tang H, Peng J, Wang P, Risch NJ (2005) Estimation of individual admixture: Analytical 92. Yi X, et al. (2010) Sequencing of 50 human exomes reveals adaptation to high alti- and study design considerations. Genet Epidemiol 28:289–301. tude. Science 329:75–78. 74. Stamatakis A (2014) RAxML version 8: A tool for phylogenetic analysis and post- 93. Ai H, et al. (2015) Adaptation and possible ancient interspecies introgression in pigs analysis of large phylogenies. Bioinformatics 30:1312–1313. identified by whole-genome sequencing. Nat Genet 47:217–225. 75. Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating 94. Huang W, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of species trees under the coalescent model. BMC Evol Biol 10:302. large gene lists using DAVID bioinformatics resources. Nat Protoc 4:44–57.

10 of 10 | www.pnas.org/cgi/doi/10.1073/pnas.1716257115 Wang et al.

Supplementary Information for

Selection and environmental adaptation along a path to speciation in the Tibetan frog Nanorana parkeri

Guo-Dong Wanga,b,1, Bao-Lin Zhanga,c,1, Wei-Wei Zhoua,d, Yong-Xin Lia,c, Jie-Qiong Jina, Yong Shaoa, He-chuan Yange, Yan-Hu Liuf, Fang Yana, Hong-Man Chena, Li Jing, Feng Gaoh, Yaoguang Zhangg, Haipeng Lih,b, Bingyu Maoa,b, Robert W. Murphya,i, David B. Wakej,2, Ya-Ping Zhanga,b,2, and Jing Chea,b,d,2

aState Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China; bCenter for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming 650223, China; cKunming College of Life Science, University of Chinese Academy of Sciences, Kunming 650204, Yunnan; China; dSoutheast Asia Biodiversity Research Institute, Chinese Academy of Sciences, Yezin, 05282 Nay Pyi Taw, Myanmar; eHuman Genetics, Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore 138672, Singapore; fLaboratory for Conservation and Utilization of Bio-Resources & Key Laboratory for Microbial Resources of the Ministry of Education, Yunnan University, Kunming 650091, China; gKey Laboratory of Freshwater Fish Reproduction and Development of the Ministry of Education and Key Laboratory of Aquatic Science of Chongqing, Southwest University School of Life Sciences, Chongqing 400715, China; hCAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China;

www.pnas.org/cgi/doi/10.1073/pnas.1716257115 iCentre for Biodiversity and Conservation Biology, Royal Ontario Museum, Toronto, ON, Canada M5S 2C6; jDepartment of Integrative Biology and Museum of Vertebrate Zoology, University of California, Berkeley, CA 94720-3160, USA.

1G.-D.W. and B.-L.Z. contributed equally to this work. 2To whom correspondence may be addressed. Email: [email protected], [email protected], or [email protected].

This PDF file includes:

Supplementary text Figs. S1 to S18 Tables S1 to S15 References for SI reference citations

Supplementary Information Text

1.1 Estimation of population size change after Last Glacial Maximum (LGM,

~20,000 years ago) and coalescent simulations

The PSMC approach indicated that all five populations of Tibet frog expanded recently (See SI Appendix, fig. 4). Because this method lacks power more recently than ~20,000 years (1), and the G-PhoCS approach cannot simulate change in Ne for a single population (2), we used fastsimcoal2 (3) to estimate Ne expansion within the last 30,000 years. The divergence time, migration rate, and ancestral Ne in fastsimcoal2 30,000 years ago were set by demographic parameters inferred from

G-PhoCS. The beginning time of recent population expansion was set to 8,000 years

~ 30,000 years (1,600 ~ 6,000 generations) based on the result of PSMC. The current

Ne and 95% credibility intervals were inferred by G-PhoCS, with Ne before expansion. A relatively large expansion range (100 times of beginning Ne) was allowed for each population. Simulations were performed 50 times and the best-fit of results was chosen based on the maximum likelihood value. For each run, demographic estimates were obtained from 100,000 simulations (-n 100,000) and 40

Expectation/Conditional Maximization cycles (-L40) per parameter file.

In the best result of fastsimcoal2, five populations had undergone various degrees of expansion from 14,000 years to 29,000 years. The Ne of E1, E2 and E3 expanded 1.67 to 3.00 fold, and those of E4 expanded 15.19 times and W expanded 67.32 fold, respectively (See SI Appendix, table S15 and Fig. S16). Based on the expansion model, we performed a neutral coalescent simulation and again calculated the PBSW, finding the distribution of the PBS value similar to that obtained with the G-PhoCS (See SI Appendix, fig. Fig. S17), the values for average and median PBSw under the expansion model were 0.4017 and 0.3879, respectively, slightly smaller than those under the G-PhoCS model (mean: 0.4202, median: 0.4046). The value of the Fst tends to be slightly smaller when the population has undergone expansion, because of the effect of genetic drift (4). Using the new model assuming recent population expansion, we re-calculated all population genomic parameters as in the text. Further statistical analysis produced results identical to those obtained earlier (See SI Appendix, fig.

S18). These results indicate that our inference from HDRs based on cut-off values of the upper 2.5% PBS is reliable and robust to recent population expansion.

1.2 Control file of G-PhoCS.

GENERAL-INFO-START

seq-file nutral_1k_1000.txt trace-file nutral_1k_1000.log locus-mut-rate VAR 10.0

burn-in 0 mcmc-iterations 2000000 mcmc-sample-skip 19 start-mig 0 iterations-per-log 20 logs-per-line 20

find-finetunes TRUE find-finetunes-num-steps 2500 # finetune-coal-time 0.01 # finetune-mig-time 0.3 # finetune-theta 0.04 # finetune-mig-rate 0.02 # finetune-tau 0.0000008 # finetune-mixing 0.003 # finetune-locus-rate 0.3

tau-theta-print 1000.0 # tau-theta-alpha 1.0 # for STD/mean ratio of 100% # tau-theta-beta 10000.0 # for mean of 1e-4

mig-rate-print 1.0 mig-rate-alpha 4.0 mig-rate-beta 0.002

GENERAL-INFO-END

CURRENT-POPS-START

POP-START name GRP1 samples KIZ05992 d KIZ08194 d KIZ08133 d theta-alpha 2.0 theta-beta 2000 POP-END

POP-START name GRP2 samples KIZYPX6143 d KIZ05613 d KIZ010938 d theta-alpha 2.0 theta-beta 2000 POP-END

POP-START name GRP3 samples KIZ08359 d KIZ08398 d KIZ08351 d theta-alpha 2.0 theta-beta 2000 POP-END

POP-START name GRP4 samples KIZ012631 d KIZ012640 d KIZ02247 d theta-alpha 2.0 theta-beta 2000 POP-END

POP-START name GRP5 samples KIZ08106 d KIZ08116 d KIZYPX31094 d theta-alpha 2.0 theta-beta 2000 POP-END

CURRENT-POPS-END

ANCESTRAL-POPS-START

POP-START name GRP45 children GRP4 GRP5 theta-alpha 2.0 theta-beta 2000 tau-alpha 0.1 tau-beta 100000 POP-END

POP-START name GRP345 children GRP3 GRP45 theta-alpha 2.0 theta-beta 2000 tau-alpha 0.1 tau-beta 1000 POP-END

POP-START name GRP2345 children GRP2 GRP345 theta-alpha 2.0 theta-beta 2000 tau-alpha 0.3 tau-beta 1000 POP-END

POP-START name GRP12345 children GRP1 GRP2345 theta-alpha 2.0 theta-beta 2000 tau-alpha 1.0 tau-beta 1000 POP-END

ANCESTRAL-POPS-END

MIG-BANDS-START

BAND-START source GRP4 target GRP5 BAND-END

BAND-START source GRP2 target GRP4 BAND-END

BAND-START source GRP3 target GRP5 BAND-END

BAND-START source GRP3 target GRP4 BAND-END

BAND-START source GRP5 target GRP4 BAND-END

BAND-START source GRP4 target GRP2 BAND-END

BAND-START source GRP5 target GRP3 BAND-END

BAND-START source GRP5 target GRP1 BAND-END

BAND-START source GRP4 target GRP3 BAND-END

BAND-START source GRP345 target GRP1 BAND-END

BAND-START source GRP2345 target GRP1 BAND-END

BAND-START source GRP1 target GRP5 BAND-END

BAND-START source GRP1 target GRP345 BAND-END

BAND-START source GRP1 target GRP2345 BAND-END

MIG-BANDS-END

1.3 Simulation command lines under a range of possible demographic models. 1.3.1 The command of coalescent simulation under the full set of demographic parameters derived from G-PhoCS analyses. ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \ -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \ -m 2 1 1.2480958236 -m 1 2 1.048192832 \ -m 4 2 9.07224709448 -m 2 4 1.85090949118 \ -m 3 1 1.3407885516 -m 1 3 0.8220241976 \ -m 5 1 1.05602089372 -m 1 5 0 \ -m 3 2 0.91914267222 -m 2 3 0.88110412036 \ -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \ -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \ -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0.47844768034 \ -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \ -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \ -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

1.3.2 Simulation without the post-divergence migration E234 to W. ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \ -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \ -m 2 1 1.2480958236 -m 1 2 1.048192832 \ -m 4 2 9.07224709448 -m 2 4 1.85090949118 \ -m 3 1 1.3407885516 -m 1 3 0.8220241976 \ -m 5 1 1.05602089372 -m 1 5 0 \ -m 3 2 0.91914267222 -m 2 3 0.88110412036 \ -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \ -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \ -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0 \ -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \ -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \ -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

1.3.3 Simulation without current and post-divergence migration from E to W. ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \ -n 1 0.236312518496597 -n 2 0.365788694880142 -n 3 0.514205386208938 -n 4 0.346552234388872 -n 5 1 \ -m 2 1 1.2480958236 -m 1 2 1.048192832 \ -m 4 2 9.07224709448 -m 2 4 1.85090949118 \ -m 3 1 1.3407885516 -m 1 3 0.8220241976 \ -m 5 1 0 -m 1 5 0 \ -m 3 2 0.91914267222 -m 2 3 0.88110412036 \ -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \ -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \ -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0 \ -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \ -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \ -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

1.3.4 Simulation with recent population expansion after Last Glacial Maximum (LGM,

~20,000 years ago)(Figure S16). ms 60 50000000 -t 0.0006758 -I 5 12 12 12 12 12 \ -n 1 3.544687777 -n 2 0.73157739 -n 3 1.028410772 -n 4 1.039656703 -n 5 67 \ -en 0.022965374 1 0.236312518496597 -en 0.022965374 2 0.365788694880142 -en 0.022965374 3 0.514205386208938 -en 0.022965374 4 0.346552234388872 -en 0.022965374 5 1 \ -m 2 1 1.2480958236 -m 1 2 1.048192832 \ -m 4 2 9.07224709448 -m 2 4 1.85090949118 \ -m 3 1 1.3407885516 -m 1 3 0.8220241976 \ -m 5 1 1.05602089372 -m 1 5 0 \ -m 3 2 0.91914267222 -m 2 3 0.88110412036 \ -ej 0.0451775673276117 2 1 -en 0.0451775673276117 1 0.657295057709381 \ -ej 0.052681266646937 3 1 -en 0.052681266646937 1 0.971589227582125 \ -em 0.052681266646937 1 5 0 -em 0.052681266646937 5 1 0.47844768034 \ -ej 2.0864160994377 4 1 -en 2.0864160994377 1 0.315329979283812 \ -em 2.0864160994377 1 5 0 -em 2.0864160994377 5 1 0 \ -ej 8.81473808819177 5 1 -en 8.81473808819177 1 0.714116602545132

Fig. S1. Species-tree inferred using the maximum pseudo-likelihood coalescent method (MP-EST). Numbers beside nodes are bootstrap support values.

Fig. S2. Species-tree inferred using average ranks of coalescence (STAR). Numbers beside nodes are bootstrap support values.

Fig. S3. Tree topology inferred from TreeMix when 1 migratory tract is allowed.

Fig. S4. Trajectories of demographic history inferred by pairwise sequential Markovian coalescent (PSMC). Because PSMC yields high rates of false-negatives at low sequence coverage, only individuals with high coverage (≥15X) genome sequences are included. Samples of E4 from Lhasa River valley (LRV) and outside of it that have the highest sequence coverage are indicated. The early stage of Last Glacial Maximum (MIS 4, 30~70 Ka), the Penultimate glaciation (MIS6, 100~200 Ka) and the Naynayxungla glaciation (MIS 14~18, 500 ~720 Ka) are shaded in light blue.

Fig. S5. Genomic divergence between W and groups of E of Nanorana parkeri. (A). Venn diagram shows overlapping regions of high differentiation (top 2.5% FST distribution) between W and E1–4, which are highly correlated (P<10-5, Monte Carlo simulation). (B). Distribution of divergence between W and E1–4 measured by the density of fixed differences per bp for 50kb windows across the genome.

Fig. S6. Distribution of the sizees of highly divergence regions in Nanorana parkeri.

Fig. S7. Relationship between FST (W–E1) and dxy (W–E1) for Nanorana parkeri (Pearson's correlation R = 0.4225, p < 2.2e-16).

Fig. S8. Principle component analysis of all individuals of Nanorana parkeri, including samples from the hybrid regions.

Fig. S9. Population structure plots for Nanorana parkeri with K=2–6 for all individuals.

Fig. S10. Matrilineal genealogy (mitochondrial haplotypes) as depicted by a maximum likelihood analysis based on the SNPs. ML bootstrap <75 are not listed.

Fig. S11. Local ancestry assignment for individual genomes of Nanorana parkeri. (A) Estimated proportions of ancestry inferred from PCAdmix for 18 hybrid genomes. Green, W; red, E4; black, undetermined ancestry. Analyses used a calling-threshold of 0.9. (B) Assigned ancestry along the genome. All scaffolds were concatenated into a super-scaffold. Only one individual from each hybrid population presented. BC (KIZYPX30750), GC (KIZYPX39973), KC (KIZYPX30668).

Fig. S12. Venn diagram depicting genomic divergence between groups E1 and E2–4 of Nanorana parkeri. (A). Overlapping regions of high differentiation (top 2.5% FST distribution) are highly correlated (P<10-5, Monte Carlo simulation). (B). Distribution of divergence between E1 and E2–4 measured by the density of fixed differences per bp for 50kb windows across the genome.

Fig. S13. Distribution of divergence for the pair-wise comparisons of E2, E3, and E4 of Nanorana parkeri measured by the density of fixed differences per bp for 50kb windows across the genome.

Fig. S14. Comparisons of high-divergence regions (HDRs) versus the genomic background for Nanorana parkeri. (A) Characterization of population branch statistics (PBS) PBSE2, (B) Characterization of PBSE3, (C) Characterization of PBSE4. Light blue, HDRs; gray, outside of HDRs. Asterisks denote levels of significance by two-tailed Mann–Whitney test (*p <0.01; **p <1e-8; ***p <2.2e-16).

Fig. S15. Environmental principal component analysis for Nanorana parkeri. Colors match the five main groups as indicated in Figure 1. 31,095

767.7ka 13,731

181.7ka

42,307 43,544

45.9ka 15,090 28,621 39.3ka

10,290 15,928 22,390 End of LGM,~20ka

156,321 31,635 37,444 45,268 2,931,475

E4 E3 E2 E1 W

Fig. S16. The models of recent population expansion used for coalescent simulations.

Fig. S17. Comparisons of PBSW density between full set of demographic parameters inferred from G-PhoCS and demographic model with recent population expansion after LGM. W***-E1 W***-E2 W***-E3 W***-E4 W *** E1 ***E2 *** E3 ***E4

W***-E1 W***-E2 W***-E3 W***-E4 ***W *** E1 ***E2 ***E3 ***E4

Fig. S18. Comparisons of HDRs of PBSW versus the genomic background. Here HDRs were identified using the top 2.5% of simulated PBS value under the demographic model with recent population expansion. Light blue, HDRs; gray, outside HDRs. Asterisks denote levels of significance by two-tailed Mann–Whitney test (*p <0.01; **p<1e-8; ***p <2.2e-16).

Table S1. Sample information. Raw reads Locality Sample ID Population name Species Mapable (G) Mapped ratio Depth Locality name Elev. Longitude Latitude (G)

1 KIZYPX6147 E1 Nanorana parkeri 42.49 39.12 0.92 16.26 China: , Bayi Town, Cuomujiri Lake 2992m 94.013°E 30.052°N

1 KIZYPX6128 E1 Nanorana parkeri 37.28 34.26 0.92 17.63 China: Tibet Autonomous Region, Bayi Town, Cuomujiri Lake 2992m 94.013°E 30.052°N

1 KIZYPX6143 E1 Nanorana parkeri 34.55 25.95 0.90 13.91 China: Tibet Autonomous Region, Bayi Town, Cuomujiri Lake 2992m 94.013°E 30.052°N

1 KIZYPX6144 E1 Nanorana parkeri 37.49 26.57 0.89 14.40 China: Tibet Autonomous Region, Bayi Town, Cuomujiri Lake 2992m 94.013°E 30.052°N

1 KIZYPX6146 E1 Nanorana parkeri 31.21 25.17 0.90 13.43 China: Tibet Autonomous Region, Bayi Town, Cuomujiri Lake 2992m 94.013°E 30.052°N

2 KIZ010938 E1 Nanorana parkeri 33.37 29.82 0.89 13.51 China: Tibet Autonomous Region, Nyingchi County, Lulang 2900m 94.788°E 29.957°N

2 KIZ05615 E1 Nanorana parkeri 42.96 37.28 0.87 20.05 China: Tibet Autonomous Region, Nyingchi County, Lulang 2900m 94.788°E 29.957°N

2 KIZ05613 E1 Nanorana parkeri 30.92 23.13 0.90 12.47 China: Tibet Autonomous Region, Nyingchi County, Lulang 2900m 94.788°E 29.957°N

2 KIZ05616 E1 Nanorana parkeri 24.26 18.14 0.90 9.70 China: Tibet Autonomous Region, Nyingchi County, Lulang 2900m 94.788°E 29.957°N

2 KIZ05617 E1 Nanorana parkeri 33.33 25.27 0.91 13.48 China: Tibet Autonomous Region, Nyingchi County, Lulang 2900m 94.788°E 29.957°N

3 KIZ02247 E3 Nanorana parkeri 29.83 28.19 0.94 13.27 China: Tibet Autonomous Region, Gongbo’gvamda County 3296m 93.693°E 29.848°N

3 KIZYPX14924 E3 Nanorana parkeri 30.25 27.82 0.92 13.36 China: Tibet Autonomous Region, Gongbo’gvamda County 3296m 93.693°E 29.848°N

4 KIZYPX15419 E3 Nanorana parkeri 32.80 30.15 0.92 15.48 China: Tibet Autonomous Region, Gongbo’gvamda County 3204m 93.472°E 29.895°N

4 KIZ01919 E3 Nanorana parkeri 29.83 21.85 0.89 11.91 China: Tibet Autonomous Region, Gongbo’gvamda County 3204m 93.472°E 29.895°N

5 KIZ012630 E3 Nanorana parkeri 30.67 27.85 0.91 11.93 China: Tibet Autonomous Region, , Pai Town 2941m 94.934°E 29.621°N

5 KIZ012638 E3 Nanorana parkeri 35.03 31.17 0.91 16.54 China: Tibet Autonomous Region, Mainling County, Pai Town 2941m 94.934°E 29.621°N

5 KIZ012631 E3 Nanorana parkeri 43.57 27.07 0.88 14.87 China: Tibet Autonomous Region, Mainling County, Pai Town 2941m 94.934°E 29.621°N

5 KIZ012639 E3 Nanorana parkeri 30.85 22.58 0.85 12.17 China: Tibet Autonomous Region, Mainling County, Pai Town 2941m 94.934°E 29.621°N

5 KIZ012640 E3 Nanorana parkeri 39.05 29.78 0.89 16.11 China: Tibet Autonomous Region, Mainling County, Pai Town 2941m 94.934°E 29.621°N

6 KIZ08359 E2 Nanorana parkeri 33.95 30.78 0.91 13.74 China: Tibet Autonomous Region, Nedong County, Yadui 4903m 92.012°E 28.840°N

6 KIZ08349 E2 Nanorana parkeri 29.03 23.73 0.90 12.78 China: Tibet Autonomous Region, Nedong County, Yadui 4903m 92.012°E 28.840°N

6 KIZ08351 E2 Nanorana parkeri 26.43 17.52 0.88 9.65 China: Tibet Autonomous Region, Nedong County, Yadui 4903m 92.012°E 28.840°N

6 KIZ08352 E2 Nanorana parkeri 29.57 21.90 0.89 11.85 China: Tibet Autonomous Region, Nedong County, Yadui 4903m 92.012°E 28.840°N

7 KIZ08381 E2 Nanorana parkeri 33.07 29.79 0.90 13.26 China: Tibet Autonomous Region, Cona County 4372m 91.964°E 28.004°N

7 KIZ08376 E2 Nanorana parkeri 31.78 23.02 0.89 12.48 China: Tibet Autonomous Region, Cona County 4372m 91.964°E 28.004°N

7 KIZ08377 E2 Nanorana parkeri 28.95 22.11 0.90 11.91 China: Tibet Autonomous Region, Cona County 4372m 91.964°E 28.004°N

8 KIZ08399 E2 Nanorana parkeri 31.10 29.32 0.94 13.55 China: Tibet Autonomous Region, Lhunze County 3908m 92.411°E 28.420°N

8 KIZ08395 E2 Nanorana parkeri 33.63 27.26 0.90 14.60 China: Tibet Autonomous Region, Lhunze County 3908m 92.411°E 28.420°N

8 KIZ08398 E2 Nanorana parkeri 31.48 26.06 0.90 13.92 China: Tibet Autonomous Region, Lhunze County 3908m 92.411°E 28.420°N

9 KIZYPX15399 E4 Nanorana parkeri 29.10 26.72 0.92 12.05 China: Tibet Autonomous Region, Maizhokunggar County, Milashanxiang, Riduo 4529m 92.302°E 29.718°N 10 KIZ06709 E4 Nanorana parkeri 34.73 31.66 0.91 16.27 China: Tibet Autonomous Region, Maizhokunggar County 4386m 92.244°E 29.693°N

11 KIZ08106 E4 Nanorana parkeri 41.09 37.82 0.92 19.38 China: Tibet Autonomous Region, Dagze County, Bakexuega 3711m 91.428°E 29.709°N

12 KIZ08116 E4 Nanorana parkeri 34.63 31.82 0.92 16.29 China: Tibet Autonomous Region, Dagze County,Sangzhulin 3675m 91.280°E 29.657°N

13 KIZ08271 E4 Nanorana parkeri 38.50 36.39 0.95 16.12 China: Tibet Autonomous Region, Damxung County, Shazigang 3692m 90.852°E 30.331°N

14 KIZ08319 E4 Nanorana parkeri 32.71 30.11 0.92 14.45 China: Tibet Autonomous Region, Damxung County, Longren 4309m 91.229°E 30.516°N

15 KIZ08275 E4 Nanorana parkeri 58.55 53.48 0.91 26.88 China: Tibet Autonomous Region, County, Kongma 4536m 92.288°E 31.614°N

16 KIZYPX31094 E4 Nanorana parkeri 30.99 29.38 0.95 13.37 China: Tibet Autonomous Region, Baqen County 4449m 94.044°E 31.914°N

17 KIZ014954 E4 Nanorana parkeri 37.25 33.85 0.91 16.06 China: Tibet Autonomous Region, Baxoi County, Bangda 4448m 97.187°E 30.675°N

18 KIZ08235 E4 Nanorana parkeri 36.34 33.21 0.91 17.11 China: Tibet Autonomous Region, Nagarze County, Yangzhongyongcuo 4208m 90.422°E 28.977°N

19 KIZ08133 W Nanorana parkeri 40.08 34.95 0.87 15.50 China: Tibet Autonomous Region, City, Nierixiong 3843m 88.842°E 29.321°N

20 KIZYPX20531 W Nanorana parkeri 28.42 25.36 0.89 11.85 China: Tibet Autonomous Region, Lhaze County 4003m 87.741°E 29.064°N

21 KIZ08181 W Nanorana parkeri 28.22 24.95 0.88 11.54 China: Tibet Autonomous Region, Ngamring County,Qieduo 4773m 86.277°E 29.503°N

22 KIZYPX15060 W Nanorana parkeri 30.36 27.51 0.91 12.87 China: Tibet Autonomous Region, 4580m 85.245°E 29.373°N

23 KIZ08194 W Nanorana parkeri 24.26 21.79 0.90 10.15 China: Tibet Autonomous Region, Nielamu County, Jiangdong 4136m 86.034°E 28.303°N

24 KIZ05992 W Nanorana parkeri 35.79 32.19 0.90 14.71 China: Tibet Autonomous Region, , Pali 4294m 89.150°E 27.710°N

Mean 33.77 28.75 0.90 14.29

25 KIZ016304 Hybrid population (KC) Nanorana parkeri 12.80 11.64 0.93 6.07 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

25 KIZ016305 Hybrid population (KC) Nanorana parkeri 12.91 11.70 0.92 6.12 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

25 KIZYPX30667 Hybrid population (KC) Nanorana parkeri 12.88 11.80 0.93 6.13 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

25 KIZYPX30668 Hybrid population (KC) Nanorana parkeri 12.41 11.23 0.92 5.88 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

25 KIZYPX30669 Hybrid population (KC) Nanorana parkeri 12.97 11.71 0.92 6.14 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

25 KIZYPX30671 Hybrid population (KC) Nanorana parkeri 12.19 11.15 0.93 5.80 China: Tibet Autonomous Region, Kangmar County 4295m 89.675°E 28.559°N

26 KIZYPX39973 Hybrid population (GC) Nanorana parkeri 12.33 11.21 0.92 5.85 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

26 KIZYPX39978 Hybrid population (GC) Nanorana parkeri 12.74 11.64 0.93 6.05 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

26 KIZYPX39979 Hybrid population (GC) Nanorana parkeri 11.28 10.34 0.93 5.37 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

26 KIZYPX39980 Hybrid population (GC) Nanorana parkeri 11.89 10.85 0.93 5.65 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

26 KIZYPX39984 Hybrid population (GC) Nanorana parkeri 12.08 10.77 0.90 5.75 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

26 KIZYPX39986 Hybrid population (GC) Nanorana parkeri 10.89 9.93 0.93 5.18 China: Tibet Autonomous Region, Gyangze County 4041m 89.591°E 28.900°N

27 KIZYPX30750 Hybrid population (BC) Nanorana parkeri 11.68 10.67 0.93 5.55 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N

27 KIZYPX30751 Hybrid population (BC) Nanorana parkeri 10.98 10.06 0.93 5.23 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N

27 KIZYPX30754 Hybrid population (BC) Nanorana parkeri 10.62 9.72 0.93 5.06 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N

27 KIZYPX30755 Hybrid population (BC) Nanorana parkeri 11.66 10.64 0.93 5.54 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N

27 KIZYPX30756 Hybrid population (BC) Nanorana parkeri 11.40 10.41 0.93 5.42 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N 27 KIZYPX30758 Hybrid population (BC) Nanorana parkeri 10.77 9.87 0.93 5.13 China: Tibet Autonomous Region, Bainang County 3905m 89.246°E 29.140°N

Mean 11.92 10.85 0.93 5.66

Outgroup KIZ020307 Outgroup Nanorana pleskei 43.34 36.93 0.85 16.96 China: Qinghai Province, Dari County, Manzhang 4238m 100.195°E 33.390°N

Table S2. Demographic parameters inferred by Generalized Phylogenetic Coalescent Sampler (G-PhoCS). Parameter Point estimation Lower 95%CI Upper 95%CI

θW 43,543.8144 40,277.0619 46,810.5670

θE1 15,090.2062 12,274.4845 18,028.3505

θE2 22,390.4639 17,519.3299 27,435.5670

θE3 15,927.8351 12,403.3505 19,735.8247

θE4 10,289.9485 8,150.7732 12,506.4433

θE34 28,621.1340 298.9691 83,769.3299

θE234 42,306.7010 33,634.0206 51,346.6495

θE1234 13,730.6701 11,237.1134 16,243.5567

θW-E1234 31,095.3608 24,729.3814 37,577.3196

ΤE34 39,344.0722 32,074.7423 46,649.4845

ΤE234 45,878.8660 36,868.5567 55,567.0103

ΤE1234 181,701.0309 157,989.6907 205,412.3711

ΤW-E1234 767,654.6392 688,273.1959 846,005.1546

ME1->E3 0.0836 0.0341 0.1508

ME3->E1 0.4099 0.2551 0.6057

ME2->E4 0.0371 0.0056 0.0854

ME4->E2 0.0606 0.0112 0.1332

ME2->E3 0.0398 0.0070 0.0907

ME3->E2 0.0415 0.0066 0.0956

ME3->E4 0.0474 0.0077 0.1079

ME4->E3 0.0564 0.0096 0.1262

ME4->W 0.0477 0.0207 0.0827

ME234->W 0.0746 0.0309 0.1285

ME1234->W 0.0901 0.0213 0.1830 Table S3. FST statistics calculated between W and E1–4. E1 E3 E2 E4 W

E1

E3 0.1449

E2 0.1825 0.1367

E4 0.1683 0.0974 0.1132

W 0.4931 0.4730 0.4767 0.4720

Table S4. Summary of dxy statistics calculated between W and E1–4. E1 E3 E2 E4 W

E1

E3 0.0009

E2 0.0010 0.0009

E4 0.0009 0.0007 0.0008

W 0.0026 0.0031 0.0031 0.0031

Table S5. Summary of density of fixed differences (df) statistics calculated between W and E1–4. E1 E3 E2 E4 W

E1

E3 3.27E-07

E2 4.69E-07 3.98E-07

E4 6.02E-07 2.00E-07 1.27E-07

W 5.17E-05 5.10E-05 5.03E-05 5.28E-05

Table S6. Summary statistics for comparisons of regions of extreme genetic differentiation with remaining genomic regions (mean ± standard deviation). PBS W Parameter HDRs Outside HDRs p value (Wilcoxon test) PBS 0.9662 (± 0.0739) 0.5478 (±0.1423) < 2.2e-16

FST (W : E1) 0.6533 (±0.0350) 0.4680 (±0.0856) < 2.2e-16

FST (W: E2) 0.6347 (±0.0436) 0.4596 (±0.0832) < 2.2e-16

FST (W: E3) 0.6480 (±0.0402) 0.4672 (±0.0866) < 2.2e-16

FST (W: E4) 0.6313 (±0.0421) 0.4594 (±0.0887) < 2.2e-16 dxy (W:E1) 0.0040 (±0.0017) 0.0026 (±0.0011) < 2.2e-16 dxy (W: E2) 0.0048 (±0.0020) 0.0031 (±0.0013) < 2.2e-16 dxy (W: E3) 0.0048 (±0.0020) 0.0031 (±0.0013) < 2.2e-16 dxy (W: E4) 0.0043 (±0.0020) 0.0030 (±0.0012) < 2.2e-16 π (10-4, W) 0.0008 (±0.0004) 0.0007 (±0.0003) 0.0000 π (10-4, E1) 0.0002 (±0.0001) 0.0004 (±0.0002) < 2.2e-16 π (10-4, E2) 0.0002 (±0.0001) 0.0004 (±0.0002) < 2.2e-16 π (10-4, E3) 0.0002 (±0.0001) 0.0003 (±0.0002) < 2.2e-16 pi (10-4, E4) 0.0002 (±0.0001) 0.0003 (±0.0002) < 2.2e-16 DAF (W) 0.3562 (±0.0554) 0.2574 (±0.0582) < 2.2e-16 DAF (E1) 0.2237 (±0.0605) 0.1822 (±0.0499) < 2.2e-16 DAF (E2) 0.2250 (±0.0596) 0.1750 (±0.0498) < 2.2e-16 DAF (E3) 0.2220 (±0.0596) 0.1744 (±0.0498) < 2.2e-16 DAF (E4) 0.2160 (±0.0592) 0.1617 (±0.0513) < 2.2e-16 rho 2.3229 (±2.3742) 3.0978 (±3.8721) < 2.2e-16 PBS E1 PBS 0.3561 (±0.0772) 0.1160 (±0.0583) < 2.2e-16

FST (E1 : E2) 0.3448 (±0.0618) 0.1783 (±0.0601) < 2.2e-16

FST (E1 : E3) 0.3232 (±0.0831) 0.1404 (±0.0680) < 2.2e-16

FST (E1 : E4) 0.3385 (±0.0719) 0.1640 (±0.0630) < 2.2e-16 dxy (E1 : E2) 0.0013 (±0.0006) 0.0010 (±0.0005) < 2.2e-16 dxy (E1 : E3) 0.0012 (±0.0006) 0.0009 (±0.0005) < 2.2e-16 dxy (E1 : E4) 0.0013 (±0.0006) 0.0009 (±0.0005) < 2.2e-16 π (10-4, E1) 0.0003 (±0.0002) 0.0004 (±0.0002) < 2.2e-16 π (10-4, E2) 0.0003 (±0.0002) 0.0004 (±0.0002) < 2.2e-16 π (10-4, E3) 0.0003 (±0.0002) 0.0003 (±0.0002) 0.0001 π (10-4, E4) 0.0002 (±0.0001) 0.0003 (±0.0002) < 2.2e-16 DAF (E1) 0.2125 (±0.0620) 0.1837 (±0.0717) < 2.2e-16 DAF (E2) 0.1847 (±0.0663) 0.1765 (±0.0709) 0.0000 DAF (E3) 0.1886 (±0.0660) 0.1752 (±0.0716 ) < 2.2e-16 DAF (E4) 0.1735 (±0.0688) 0.1624 (±0.0712) <1e-8 rho 1.3779 (±0.8322) 1.8301 (±1.4415) < 2.2e-16 PBS E2 PBS 0.2497 (±0.0514) 0.0790 (±0.0427) < 2.2e-16

FST (E2 : E1) 0.3005 (±0.0620) 0.1795 (±0.0627) < 2.2e-16 FST (E2 : E3) 0.2656 (±0.0588) 0.1334 (±0.0539) < 2.2e-16

FST (E2 : E4) 0.2332 (±0.0729) 0.1101 (±0.0498) < 2.2e-16 dxy (E2 : E1) 0.0011 (±0.0005) 0.0010 (±0.0005) < 2.2e-16 dxy (E2 : E3) 0.0011 (±0.0005) 0.0009 (±0.0004) < 2.2e-16 dxy (E2 : E4) 0.0010 (±0.0005) 0.0008 (±0.0004) < 2.2e-16 π (10-4, E2) 0.0004 (±0.0002) 0.0004 (±0.0002) 0.0034 π (10-4, E1) 0.0003 (±0.0002) 0.0004 (±0.0002) < 2.2e-16 π (10-4, E3) 0.0003 (±0.0002) 0.0003 (±0.0002) < 2.2e-16 π (10-4, E4) 0.0002 (±0.0001) 0.0003 (±0.0002) < 2.2e-16 DAF (E2) 0.1977 (±0.0501) 0.1757 (±0.0505) < 2.2e-16 DAF (E1) 0.1831 (±0.0492) 0.1832 (±0.0506) 0.7870 DAF (E3) 0.1721 (±0.0504) 0.1756 (±0.0506) 0.0065 DAF (E4) 0.1640 (±0.0511) 0.1630 (±0.0521) 0.6873 rho 1.2758 (±0.7631) 1.8323 (±1.4415) < 2.2e-16 PBS E3 PBS 0.1969 (±0.0464) 0.0485 (±0.0352) < 2.2e-16

FST (E3 : E1) 0.2477 (±0.0950) 0.1423 (±0.0716) < 2.2e-16

FST (E3 : E2) 0.2455 (±0.0620) 0.1339 (±0.0549) < 2.2e-16

FST (E3 : E4) 0.2288 (±0.0581) 0.0940 (±0.0489) < 2.2e-16 dxy (E3 : E1) 0.0011 (±0.0005) 0.0009 (±0.0005) < 2.2e-16 dxy (E3 : E2) 0.0011 (±0.0005) 0.0009 (±0.0004) < 2.2e-16 dxy (E3 : E4) 0.0010 (±0.00045) 0.0007 (±0.0004) < 2.2e-16 π (10-4, E3) 0.0003 (±0.0002) 0.0004 (±0.0002) 0.0000 π (10-4, E1) 0.0004 (±0.0002) 0.0004 (±0.0002) 0.6105 π (10-4, E2) 0.0004 (±0.0002) 0.0004 (±0.0002) 0.0052 π (10-4, E4) 0.0003 (±0.0001) 0.0003 (±0.0002) 0.6710 DAF (E3) 0.1945 (±0.0486) 0.1750 (±0.0506) < 2.2e-16 DAF (E1) 0.1912 (±0.0494) 0.1830 (±0.0506) 0.0000 DAF (E2) 0.1745 (±0.0493) 0.1763 (±0.0506) 0.0920 DAF (E4) 0.1646 (±0.0505) 0.1630 (±0.0522) 0.3934 rho 1.5769 (±1.1812) 1.8251 (±1.4366) 0.0000 PBS E4 PBS 0.1824 (±0.0401) 0.0489 (±0.0378) < 2.2e-16

FST (E4 : E1) 0.2734 (±0.0773) 0.1656 (±0.0664) < 2.2e-16

FST(E4 : E2) 0.1777 (±0.0695) 0.1116 (±0.0526) < 2.2e-16

FST (E4 : E3) 0.2218 (±0.0581) 0.0942 (±0.0494) < 2.2e-16 dxy (E4 : E1) 0.0011 (±0.0006) 0.0009 (±0.0005) < 2.2e-16 dxy (E4 : E2) 0.0009 (±0.0004) 0.0008 (±0.0004) < 2.2e-16 dxy (E4 : E3) 0.0010 (±0.0005) 0.0007 (±0.0004) < 2.2e-16 π (10-4, E4) 0.0002 (±0.0002) 0.0003 (±0.0002) < 2.2e-16 π (10-4, E1) 0.0003 (±0.0002) 0.0004 (±0.0002) 0.0047 π (10-4, E2) 0.0004 (±0.0002) 0.0004 (±0.0002) 0.0000 π (10-4, E3) 0.0004 (±0.0002) 0.0003 (±0.0002) 0.0000 DAF (E4) 0.1640 (±0.0524) 0.1630 (±0.0521) 0.7815 DAF (E1) 0.1874 (±0.0493) 0.1831 (±0.0506) 0.0166 DAF (E2) 0.1845 (±0.0506) 0.1760 (±0.0506) 0.0000 DAF (E3) 0.1839 (±0.0493) 0.1753 (±0.0506) 0.0000 rho 1.4253 (±1.0044) 1.8288 (±1.4391) < 2.2e-16

Table S7. Gene Ontology analysis of genes located in regions that strongly differentiated W from all four subpopulations of E. Category Term No. of genes p value Cluster 1 GO:0048538~thymus development 5 0.00 GO:0048534~hemopoietic or lymphoid organ development 12 0.01

GO:0002520~immune system development 12 0.01

Cluster 2 GO:0003006~reproductive developmental process 15 0.00 GO:0048610~reproductive cellular process 11 0.00

GO:0048609~reproductive process in a multicellular organism 18 0.01

GO:0032504~multicellular organism reproduction 18 0.01

GO:0019953~sexual reproduction 17 0.01

GO:0007281~germ cell development 7 0.01

GO:0007276~gamete generation 15 0.01

GO:0048232~male gamete generation 12 0.02

GO:0007283~spermatogenesis 12 0.02

Cluster 3 GO:0035270~endocrine system development 6 0.01 GO:0030325~adrenal gland development 3 0.01

Cluster 4 GO:0003006~reproductive developmental process 15 0.00 GO:0007548~sex differentiation 8 0.02

GO:0048608~reproductive structure development 7 0.02

GO:0045137~development of primary sexual characteristics 7 0.03

GO:0008406~gonad development 6 0.05

Cluster 5 GO:0006357~regulation of transcription from RNA polymerase II promoter 26 0.00 GO:0045893~positive regulation of transcription, DNA-dependent 18 0.00

GO:0051254~positive regulation of RNA metabolic process 18 0.01

GO:0045944~positive regulation of transcription from RNA polymerase II promoter 15 0.01

GO:0045941~positive regulation of transcription 19 0.01

GO:0010628~positive regulation of gene expression 19 0.01

GO:0010604~positive regulation of macromolecule metabolic process 25 0.02

GO:0031328~positive regulation of cellular biosynthetic process 21 0.02

GO:0009891~positive regulation of biosynthetic process 21 0.02

GO:0010557~positive regulation of macromolecule biosynthetic process 20 0.02

GO:0045935~positive regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolic process 19 0.03

GO:0051173~positive regulation of nitrogen compound metabolic process 19 0.04

Cluster 6 GO:0010553~negative regulation of specific transcription from RNA polymerase II promoter 5 0.01 GO:0010551~regulation of specific transcription from RNA polymerase II promoter 7 0.01

GO:0000122~negative regulation of transcription from RNA polymerase II promoter 12 0.01

GO:0032582~negative regulation of gene-specific transcription 5 0.01

GO:0051253~negative regulation of RNA metabolic process 14 0.01

GO:0045892~negative regulation of transcription, DNA-dependent 13 0.02

GO:0032583~regulation of gene-specific transcription 7 0.03

Cluster 7 GO:0012501~programmed cell death 20 0.01 GO:0006915~apoptosis 19 0.02

Cluster 8 GO:0030856~regulation of epithelial cell differentiation 4 0.01 GO:0045616~regulation of keratinocyte differentiation 3 0.02

GO:0045604~regulation of epidermal cell differentiation 3 0.02

GO:0033554~cellular response to stress 18 0.02

GO:0006974~response to DNA damage stimulus 13 0.03

Cluster 10 GO:0040008~regulation of growth 13 0.02 GO:0008361~regulation of cell size 9 0.03

GO:0045926~negative regulation of growth 6 0.05

Cluster 11 GO:0051101~regulation of DNA binding 7 0.02 GO:0051090~regulation of transcription factor activity 6 0.04

GO:0043433~negative regulation of transcription factor activity 4 0.04

Table S8. Estimations of false signals of gene flow. Gene flow does not appear to occur based on the proportions of shared ancestral genes; the false signal is <2%. This test uses W and nearby E4 as potential source populations for E1-E3. Sample ID W ancestry E4 ancestry Undefined KIZ05613 0.0152 0.9699 0.0149 KIZ05615 0.0190 0.9653 0.0157 KIZ05617 0.0138 0.9716 0.0146 KIZ010938 0.0153 0.9710 0.0137 KIZYPX6128 0.0171 0.9686 0.0144 KIZYPX6143 0.0125 0.9738 0.0136 KIZYPX6144 0.0134 0.9745 0.0121 KIZYPX6146 0.0128 0.9748 0.0124 KIZYPX6147 0.0166 0.9689 0.0145 KIZ08399 0.0128 0.9769 0.0103 KIZ08349 0.0096 0.9813 0.0090 KIZ08351 0.0090 0.9828 0.0082 KIZ08352 0.0097 0.9810 0.0092 KIZ08359 0.0100 0.9808 0.0092 KIZ08376 0.0103 0.9796 0.0101 KIZ08377 0.0110 0.9788 0.0102 KIZ08381 0.0096 0.9807 0.0097 KIZ08395 0.0110 0.9794 0.0096 KIZ08398 0.0106 0.9788 0.0106 KIZ01919 0.0025 0.9949 0.0026 KIZYPX14924 0.0024 0.9958 0.0018 KIZ02247 0.0022 0.9950 0.0028 KIZYPX15419 0.0020 0.9964 0.0016 KIZ012630 0.0055 0.9900 0.0046 KIZ012631 0.0052 0.9899 0.0049 KIZ012638 0.0071 0.9874 0.0055 KIZ012639 0.0053 0.9899 0.0049 KIZ012640 0.0054 0.9898 0.0048

Table S9. Gene Ontology analysis of genes located in regions that strongly differentiated E1 from E2–4. Category Term No. of genes P-value Annotation Cluster 1 GO:0022610~biological adhesion 33 0.00 GO:0007155~cell adhesion 33 0.00

Annotation Cluster 2 GO:0006793~phosphorus metabolic process 42 0.00 GO:0006796~phosphate metabolic process 42 0.00

GO:0016310~phosphorylation 36 0.00

GO:0006468~protein amino acid phosphorylation 31 0.01

Annotation Cluster 3 GO:0031589~cell-substrate adhesion 9 0.01 GO:0007160~cell-matrix adhesion 8 0.01

Annotation Cluster 4 GO:0016044~membrane organization 21 0.00 GO:0006897~endocytosis 14 0.01

GO:0010324~membrane invagination 14 0.01

GO:0016192~vesicle-mediated transport 25 0.03

Annotation Cluster 5 GO:0043085~positive regulation of catalytic activity 26 0.01 GO:0044093~positive regulation of molecular function 27 0.01

GO:0051345~positive regulation of hydrolase activity 11 0.03

Annotation Cluster 6 GO:0006067~ethanol metabolic process 3 0.01 GO:0006069~ethanol oxidation 3 0.01

GO:0034308~monohydric alcohol metabolic process 3 0.01

hsa00071: Fatty acid metabolism 5 0.03

hsa00010: Glycolysis / Gluconeogenesis 6 0.03

Annotation Cluster 7 GO:0003013~circulatory system process 12 0.01 GO:0008015~blood circulation 12 0.01

Annotation Cluster 8 GO:0050953~sensory perception of light stimulus 13 0.02 GO:0007601~visual perception 13 0.02

Annotation Cluster 9 hsa04080: Neuroactive ligand-receptor interaction 17 0.00 GO:0019932~second-messenger-mediated signaling 15 0.01

GO:0007187~G-protein signaling, coupled to cyclic nucleotide second messenger 9 0.01

GO:0019935~cyclic-nucleotide-mediated signaling 9 0.03

GO:0007188~G-protein signaling, coupled to cAMP nucleotide second messenger 7 0.03

GO:0019933~cAMP-mediated signaling 7 0.05

GO:0045761~regulation of adenylate cyclase activity 7 0.05

Annotation Cluster 10 GO:0007628~adult walking behavior 5 0.00 GO:0008344~adult locomotory behavior 6 0.02

GO:0030534~adult behavior 7 0.03

GO:0007610~behavior 21 0.04

GO:0007626~locomotory behavior 14 0.04

Annotation Cluster 11 GO:0042692~muscle cell differentiation 10 0.01 GO:0048747~muscle fiber development 5 0.02

GO:0007528~neuromuscular junction development 4 0.02

GO:0051146~striated muscle cell differentiation 7 0.03

GO:0048741~skeletal muscle fiber development 4 0.04

Annotation Cluster 12 hsa04610: Complement and coagulation cascades 7 0.02 GO:0006954~inflammatory response 16 0.04

Annotation Cluster 13 GO:0009611~response to wounding 24 0.02 GO:0042060~wound healing 11 0.04

Annotation Cluster 14 GO:0015807~L-amino acid transport 4 0.02 GO:0006865~amino acid transport 7 0.04

Annotation Cluster 15 GO:0010959~regulation of metal ion transport 7 0.02 GO:0051924~regulation of calcium ion transport 6 0.04

Annotation Cluster 16 GO:0006936~muscle contraction 10 0.03 GO:0003012~muscle system process 10 0.04

Annotation Cluster 17 GO:0002714~positive regulation of B cell mediated immunity 3 0.04 GO:0002891~positive regulation of immunoglobulin mediated immune response 3 0.04

Table S10. Gene Ontology analysis of genes located in regions that strongly differentiated E2 from all other populations of E. Category Term No. of genes P-value Annotation Cluster 1 GO:0007167~enzyme linked receptor protein signaling pathway 19 0.01 GO:0007169~transmembrane receptor protein tyrosine kinase signaling pathway 14 0.01

Annotation Cluster 2 GO:0019932~second-messenger-mediated signaling 16 0.00 GO:0007193~inhibition of adenylate cyclase activity by G-protein signaling 6 0.01

GO:0007187~G-protein signaling, coupled to cyclic nucleotide second messenger 10 0.01

GO:0019935~cyclic-nucleotide-mediated signaling 10 0.01

GO:0051350~negative regulation of lyase activity 6 0.02

GO:0031280~negative regulation of cyclase activity 6 0.02

GO:0007194~negative regulation of adenylate cyclase activity 6 0.02

GO:0009190~cyclic nucleotide biosynthetic process 4 0.03

GO:0007188~G-protein signaling, coupled to cAMP nucleotide second messenger 7 0.03

Annotation Cluster 3 GO:0007156~homophilic cell adhesion 11 0.00 GO:0016337~cell-cell adhesion 15 0.03

GO:0022610~biological adhesion 30 0.03

GO:0007155~cell adhesion 30 0.03

Annotation Cluster 4 GO:0008624~induction of apoptosis by extracellular signals 9 0.02 GO:0006917~induction of apoptosis 16 0.04

GO:0012502~induction of programmed cell death 16 0.04

Annotation Cluster 5 GO:0043062~extracellular structure organization 11 0.02 GO:0030199~collagen fibril organization 4 0.05

Annotation Cluster 6 GO:0001701~in utero embryonic development 11 0.03 GO:0001824~blastocyst development 5 0.03

Annotation Cluster 7 GO:0007409~axonogenesis 12 0.02 GO:0048667~cell morphogenesis involved in neuron differentiation 12 0.04

GO:0048812~neuron projection morphogenesis 12 0.04

GO:0000904~cell morphogenesis involved in differentiation 13 0.05

Annotation Cluster 8 GO:0009725~response to hormone stimulus 18 0.04 GO:0032870~cellular response to hormone stimulus 9 0.04

GO:0009719~response to endogenous stimulus 19 0.04

Table S11. Gene Ontology analysis of genes located in regions that strongly differentiated E4 from all other populations of E. Category Term No. of genes P-value Annotation Cluster 1 GO:0006793~phosphorus metabolic process 54 0.00 GO:0006796~phosphate metabolic process 54 0.00

GO:0006468~protein amino acid phosphorylation 38 0.00

GO:0016310~phosphorylation 43 0.00

Annotation Cluster 2 GO:0009628~response to abiotic stimulus 27 0.00 GO:0009314~response to radiation 16 0.00

GO:0009416~response to light stimulus 11 0.02

Annotation Cluster 3 GO:0046839~phospholipid dephosphorylation 4 0.00 GO:0030258~lipid modification 8 0.01

Annotation Cluster 4 GO:0007155~cell adhesion 40 0.00 GO:0022610~biological adhesion 40 0.00

GO:0016337~cell-cell adhesion 16 0.04

Annotation Cluster 5 GO:0051938~L-glutamate import 4 0.00 GO:0043090~amino acid import 4 0.00

GO:0043092~L-amino acid import 4 0.00

GO:0070779~D-aspartate import 3 0.00

GO:0070777~D-aspartate transport 3 0.00

GO:0015813~L-glutamate transport 4 0.01

GO:0042940~D-amino acid transport 3 0.01

GO:0006835~dicarboxylic acid transport 4 0.01

GO:0015800~acidic amino acid transport 4 0.01

GO:0015810~aspartate transport 3 0.02

GO:0015807~L-amino acid transport 4 0.04

Annotation Cluster 6 GO:0051056~regulation of small GTPase mediated signal transduction 18 0.00 GO:0046578~regulation of Ras protein signal transduction 15 0.01

GO:0035023~regulation of Rho protein signal transduction 9 0.02

Annotation Cluster 7 GO:0044093~positive regulation of molecular function 36 0.00 GO:0043085~positive regulation of catalytic activity 32 0.00

GO:0051345~positive regulation of hydrolase activity 15 0.00

GO:0032147~activation of protein kinase activity 11 0.00

GO:0051336~regulation of hydrolase activity 21 0.01

GO:0045860~positive regulation of protein kinase activity 15 0.02

GO:0000165~MAPKKK cascade 13 0.02

GO:0033674~positive regulation of kinase activity 15 0.02

GO:0007243~protein kinase cascade 21 0.02

GO:0051347~positive regulation of transferase activity 15 0.03

GO:0042325~regulation of phosphorylation 24 0.04

GO:0045859~regulation of protein kinase activity 19 0.04

Annotation Cluster 8 GO:0051345~positive regulation of hydrolase activity 15 0.00 GO:0006919~activation of caspase activity 7 0.01

GO:0010952~positive regulation of peptidase activity 7 0.01

GO:0043280~positive regulation of caspase activity 7 0.01

GO:0043281~regulation of caspase activity 7 0.05

Annotation Cluster 9 GO:0006302~double-strand break repair 7 0.02 GO:0010212~response to ionizing radiation 6 0.05

Annotation Cluster 10 GO:0065003~macromolecular complex assembly 33 0.02 GO:0006461~protein complex assembly 26 0.03

GO:0070271~protein complex biogenesis 26 0.03

GO:0051259~protein oligomerization 12 0.03

GO:0043933~macromolecular complex subunit organization 33 0.05

Annotation Cluster 11 GO:0048232~male gamete generation 18 0.03 GO:0007283~spermatogenesis 18 0.03

GO:0032504~multicellular organism reproduction 25 0.03

GO:0048609~reproductive process in a multicellular organism 25 0.03

Annotation Cluster 12 GO:0051092~positive regulation of NF-kappaB transcription factor activity 5 0.05 GO:0051101~regulation of DNA binding 9 0.05

GO:0051091~positive regulation of transcription factor activity 6 0.05

Table S12. Gene Ontology analysis of genes located in regions that strongly differentiated E3 from all other E populations. Category Term No. of genes P-value Annotation Cluster 1 GO:0010033~response to organic substance 43 0.00 GO:0009725~response to hormone stimulus 24 0.00

GO:0009719~response to endogenous stimulus 25 0.00

GO:0048545~response to steroid hormone stimulus 12 0.05

Annotation Cluster 2 GO:0007266~Rho protein signal transduction 7 0.00 GO:0007265~Ras protein signal transduction 9 0.02

GO:0007264~small GTPase mediated signal transduction 18 0.02

Annotation Cluster 3 GO:0007160~cell-matrix adhesion 9 0.01 GO:0031589~cell-substrate adhesion 9 0.01

Annotation Cluster 4 GO:0070542~response to fatty acid 3 0.01 GO:0033993~response to lipid 4 0.02

Annotation Cluster 5 GO:0030334~regulation of cell migration 13 0.01 GO:0040012~regulation of locomotion 14 0.01

GO:0051270~regulation of cell motion 13 0.02

Annotation Cluster 6 GO:0050806~positive regulation of synaptic transmission 6 0.00 GO:0051971~positive regulation of transmission of nerve impulse 6 0.01

GO:0031646~positive regulation of neurological system process 6 0.01

GO:0051968~positive regulation of synaptic transmission, glutamatergic 3 0.02

GO:0051966~regulation of synaptic transmission, glutamatergic 4 0.02

GO:0007613~memory 5 0.04

GO:0042755~eating behavior 4 0.04

Annotation Cluster 7 GO:0002724~regulation of T cell cytokine production 3 0.01 GO:0032663~regulation of interleukin-2 production 5 0.02

GO:0002709~regulation of T cell mediated immunity 4 0.03

Annotation Cluster 8 GO:0006873~cellular ion homeostasis 22 0.01 GO:0055082~cellular chemical homeostasis 22 0.01

GO:0050801~ion homeostasis 23 0.01

GO:0006874~cellular calcium ion homeostasis 13 0.02

GO:0055074~calcium ion homeostasis 13 0.02

GO:0006875~cellular metal ion homeostasis 13 0.03

GO:0048878~chemical homeostasis 26 0.03

GO:0019725~cellular homeostasis 24 0.03

GO:0042391~regulation of membrane potential 10 0.03

GO:0030005~cellular di-, tri-valent inorganic cation homeostasis 14 0.03

GO:0055065~metal ion homeostasis 13 0.03

GO:0051480~cytosolic calcium ion homeostasis 9 0.04

GO:0055066~di-, tri-valent inorganic cation homeostasis 14 0.05

Annotation Cluster 9 GO:0008015~blood circulation 13 0.02 GO:0003013~circulatory system process 13 0.02

GO:0008217~regulation of blood pressure 8 0.04

Annotation Cluster 10 GO:0001508~regulation of action potential 7 0.02 GO:0043523~regulation of neuron apoptosis 8 0.03

GO:0042391~regulation of membrane potential 10 0.03

Annotation Cluster 11 GO:0022604~regulation of cell morphogenesis 10 0.03 GO:0008360~regulation of cell shape 6 0.03

Annotation Cluster 12 GO:0050851~antigen receptor-mediated signaling pathway 5 0.02 GO:0002757~immune response-activating signal transduction 6 0.03

GO:0002764~immune response-regulating signal transduction 6 0.03

GO:0002429~immune response-activating cell surface receptor signaling pathway 5 0.04

GO:0050852~T cell receptor signaling pathway 4 0.04

GO:0002768~immune response-regulating cell surface receptor signaling pathway 5 0.05

Annotation Cluster 13 GO:0031175~neuron projection development 16 0.02 GO:0030182~neuron differentiation 23 0.03

GO:0048812~neuron projection morphogenesis 13 0.04

GO:0048666~neuron development 18 0.05

Annotation Cluster 14 GO:0007242~intracellular signaling cascade 54 0.02 GO:0007243~protein kinase cascade 20 0.03

GO:0033674~positive regulation of kinase activity 14 0.04

GO:0051347~positive regulation of transferase activity 14 0.05

Annotation Cluster 15 GO:0022610~biological adhesion 32 0.05 GO:0007155~cell adhesion 32 0.05

Table S13. Functional references for 7 candidate genes that potentially contribute to high altitude adaptation. Population Gene Symbol Biological category E1 ADORA1 Blood circulation(5, 6) E3 DNAJC8 Hypoxia (7) E3 TNNC1 Blood circulation(8) E3 HBB Blood circulation (9) E4 LAMB3 Hypoxia (10) E4 CAT1 Antioxidant protection(11) E4 CAT2 Antioxidant protection(11)

Table S14. The primers of candidate genes used for real-time PCR. Gene Forward primer Reverse primer DNAJC8 5’-AGTGGTTGATGAAGATGA-3’ 5’-ATGTCCTTTGTTTCTCTCT-3’ TNNC1 5’-AAGAATGCGGATGGATAC-3’ 5’-CCTCAATGTCATCTTCAGT-3’ ADORA1 5’-AGTAGGAGCATTGGTCAT-3’ 5’-GCGTCAGAATCAGCATAA-3’ LAMB3 5’-TGGTGTATGTGAGGATTG-3’ 5’-TTGGCTGATGTCTTGATT-3’ CAT1 5’-TATCCGAACAGCTTTACAT-3’ 5’-GACATCTCCAGAAGATAAGA-3’ CAT2 5’-TCCGCCTCCTCCTGAATA-3’ 5’-CTGGAACACAATCAATTAGGTCTT-3’ HBB 5’-TGCTGATTATTACCTTGG-3’ 5’-ATAGAATGAAATGCTGACT-3’ ACTB 5’-AACTACCTACAACTCCATAA-3’ 5’-GCATCCTATCAGCAATTC-3’

Table S15. Estimation of the beginning expansion time, the Ne of beginning, the Ne of end, and their corresponding 95% credible intervals within the last 30,000 years based on the model of G-PhoCS. Population Time of expansion (95% CI) Ne of beginning (95% CI) Ne of end (95% CI) fold

E1 28,615 (24,565-29,590) 17,772 (17,115-17,800) 53,313 (45,483-103,244) 3.00 (2.56-5.81) E2 26,660 (8,835-27,415) 25,758 (23,807-26,642) 43,076 (24,543-1,138,960) 1.67 (0.92-42.95) E3 15,595 (8,615-26,235) 18,475 (14,175-19,622) 36,694 (16,361-921,135) 1.99 (0.88-64.98) E4 29,015 (27,050-29,625) 11,916 (10,880-12,352) 181,024 (73,157-880,545) 15.19 (6.38-76.07) W 14,035 (9,465-22,120) 41,412 (40,474-44,732) 2,787,923 (652,857-3,968,337) 67.32 (15.54-95.81)

References 1. Li H & Durbin R (2011) Inference of human population history from individual whole-genome sequences. Nature 475:493-496. 2. Gronau I, Hubisz MJ, Gulko B, Danko CG, & Siepel A (2011) Bayesian inference of ancient human demography from individual genome sequences. Nat Genet 43:1031-1034. 3. Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, & Foll M (2013) Robust demographic inference from genomic and SNP data. PLoS Genet 9:e1003905. 4. Weir BS & Cockerham CC (1984) Estimating F-statistics for the analysis of population structure. Evolution 38:1358-1370. 5. Tang Z, et al. (2007) Polymorphisms in adenosine receptor genes are associated with infarct size in patients with ischemic cardiomyopathy. Clinical Pharmacology & Therapeutics 82: 435-440. 6. Jenner TL & Rose'Meyer RB (2006) Loss of vascular adenosine A 1 receptors with age in the rat heart. Vascular pharmacology 45:341-349. 7. Kaindl AM, et al. (2008) Erythropoietin protects the developing brain from hyperoxia-induced cell death and proteome changes. Ann Neurol 64:523-534. 8. Hershberger RE, et al. (2010) Coding sequence rare variants identified in MYBPC3, MYH6, TPM1, TNNC1, and TNNI3 from 312 patients with familial or idiopathic dilated cardiomyopathy. Circ Cardiovasc Genet 3(2):155-161. 9. Storz JF, et al. (2009) Evolutionary and functional insights into the mechanism underlying high-altitude adaptation of deer mouse hemoglobin. Proc Natl Acad Sci U S A 106(34):14450-14455. 10. Kokkonen N, et al. (2007) Hypoxia upregulates carcinoembryonic antigen expression in cancer cells. Int J Cancer 121(11):2443-2450. 11. Öncel I, Yurdakulol E, Keleş Y, Kurt L, & Yıldız A (2004) Role of antioxidant defense system and biochemical adaptation on stress tolerance of high mountain and steppe plants. Acta Oecologica 26(3):211-218.