<<

Phylogeography of Armeniaca L. By Chloroplast DNA and Nuclear Ribosomal Sequences

Wen-Wen Li Xinjiang Agricultural University Li-Qiang Liu Xinjiang Agricultural University Qiu-Ping Zhang Liaoning Institute of Pomology Wei-Quan Zhou Xinjiang Agricultural University Guo-Quan Fan Xinjiang Academy of Agricultural Sciences Kang Liao (  [email protected] ) Xinjiang Agricultural University

Research Article

Keywords: evolutionary history of organisms, P. armeniaca, AMOVA, phytogeography

Posted Date: February 22nd, 2021

DOI: https://doi.org/10.21203/rs.3.rs-215210/v1

License:   This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Phylogeography of L. by Chloroplast DNA and Nuclear Ribosomal Sequences

1 Wen-Wen Li1, Li-Qiang Liu1, Qiu-Ping Zhang2, Wei-Quan Zhou1, Guo-Quan 2 Fan3, Kang Liao1*

3 1College of Horticulture and Forestry, Xinjiang Agricultural University, Urumqi, Xinjiang, 4 . 2Xiongyue National Germplasm Resources Garden of the Liaoning Institute of 5 Pomology, Xiongyue, Shenyang, China. 3Luntai National Germplasm Resources 6 Garden of Xinjiang Academy of Agricultural Sciences, Luntai, Xinjiang, China. email: 7 [email protected]

8 Abstract

9 To clarify the phytogeography of Prunus armeniaca L., two chloroplast DNA 10 fragments (trnL-trnF and ycf1) and the nuclear ribosomal DNA internal transcribed 11 spacer (ITS) were employed to assess the genetic variation across 12 P. armeniaca 12 populations. The results of cpDNA and ITS sequence data analysis showed that the 13 level of genetic diversity in P. armeniaca was high (cpDNA: HT=0.499; ITS: 14 HT=0.876), and the level of genetic differentiation was low (cpDNA: FST=0.1628; ITS: 15 FST=0.0297). An analysis of molecular variance (AMOVA) revealed that most of the 16 genetic variation in P. armeniaca occurred among individuals within populations. The 17 value of interpopulation differentiation (NST) was significantly higher than the number 18 of substitution types (GST), indicating a genealogical structure in P. armeniaca. P. 19 armeniaca shared the same genotypes with related species and may be associated with 20 them through continuous and extensive gene flow. The haplotypes/genotypes of 21 cultivated populations in Xinjiang, North China, and foreign apricot 22 populations were mixed with large numbers of haplotypes/genotypes of wild apricot 23 populations from the Ili River Valley. The wild apricot populations in the Ili River 24 Valley contained the ancestral haplotypes/genotypes with the highest genetic diversity 25 and were located in an area considered a potential glacial refugiume for P. armeniaca. 26 Since population expansion occurred 16.53 kyr ago, the area has provided a suitable 27 climate for the population and protected the genetic diversity of P. armeniaca.

28 Introduction

29 The evolutionary history of organisms, including their genetic diversity, population 30 structure and historical dynamics, is critical to species conservation 1. Understanding 31 the effects of climate change on the spatial genetic patterns of species, particularly 32 endangered species, can help reveal not only the evolutionary history of species but 33 also conservation strategies 2-3. The origins of mountain biodiversity are complex and

1

34 may include the immigration of preadapted lineages 4-5, in situ diversification 6, or the 35 continuation of ancestral lineages 7. Compared with those of other mountains, such as 36 the Hengduan Mountains, the evolution and diversity of the Tianshan Mountains are 37 still poorly understood. The Ili River Valley is located in the western part of the 38 Tianshan Mountains in China and is surrounded by mountains on three sides. This 39 valley was the main northern crossing of the ancient Silk Road. In the late Tertiary 40 period, a large number of species, including wild apricot, wild apple and wild 41 hawthorn, remained in the Ili River Valley and were important components of the 42 deciduous broad-leaved forest at elevations below the coniferous forest and above the 43 mountain grassland belt in the Xinjiang Uygur Autonomous Region, China 8. 44 However, there are few studies on the phylogeography of species in the arid 45 region of Northwest China 9-10. 46 In recent decades, the method of combining molecular data with paleoclimatic and 47 geographical evidence has been effective in the study of system geography 11-12. Many 48 systematic geographic studies have shown that the Last Glacial Maximum (LGM) of 49 the Pleistocene strongly influenced the genetic variation and biodiversity of 50 throughout the Northern Hemisphere 13-14. Especially during the Quaternary glacial 51 period, species in the ice-free zone were mainly affected by the cold and dry climate 52 15. Climatic fluctuations cause the distribution of species to shrink and expand 16, and 53 cold and dry climates prompted species to retreat to refugia, providing shelter for 54 plants and animals 17. As temperatures rose after the glacial period, species underwent 55 population expansion, changes that left a genetic signature in their current populations 56 18. Although no unified glacial sheet had occurred in Asia, paleodata suggested that 57 the distribution ranges of woody species in this region were similarly changed by the 58 Quaternary climatic oscillations 19. Based on the geographical distribution of genetic 59 variation, the glacial refuges and postglacial revegetation routes of most woody plants 60 are roughly consistent with fossil evidence 20. However, due to the lack of fossil 61 evidence, genetic evidence has become an important means of providing information 62 on the distribution range of some woody species and the history of glacial refugia 21-22. 63 Volkova 23 interpret the observed genetic structure [nuclear (ITS) and plastid DNA] of 64 the Eurasian populations of Prunus padus as plausibly resulted from at least two 65 cycles of glacial survivals in refugia followed by post-glacial colonization events. 66 17The complex geographic history may have provided a refuge for species from the 67 last glacial period 17. Xu et al. 24 proposed that there were two possible independent 68 glacial refugia in Northwest China: the Ili River Valley and the Northern Junggar 69 Basin. Su et al. 9 speculate that aggravation of the dry and cold climate during the 70 early Quaternary, combined with plant physiological features, were determining 71 factors contributing to the lineage split, and climate oscillations most likely led to the 72 Yili range expansion. Therefore, it is of great interest to study the systematic 73 geography of endemic plants in Northwest China against the background of 74 Quaternary climatic fluctuations. 75 Apricot is a fruit of temperate and subtropical regions. Turkey, Uzbekistan, , 76 Algeria, Italy, Spain, China, Pakistan, France and are the main producers of

2

77 (http://www.fao.org/home/zh). Apricot belongs to the section Armeniaca 78 (Lam.) Koch, subgenus Prunophora Focke and genus Prunus () 25. Almost 79 all cultivated apricots originated from P. armeniaca 26. In China, wild apricot is 80 distributed in the Ili River Valley (Tianshan Mountain area). The species is also 81 distributed along the Tianshan mountains and westward to , 82 and Uzbekistan. It is a relic of a broad-leaved forest from the late Tertiary period, 83 which played a decisive role in the of cultivated apricots worldwide 27. 84 As the world experienced extreme weather events, especially glacial periods, and 85 most species went extinct, some of the more complex valleys may have become 86 refugia for surviving forests and locally present species. As a result, traces of the 87 species may be gradually shrinking in such areas. So, is Ili River Valley a glacial 88 refugia for wild apricot? 89 Generally, dispersal distance is much less than that of pollen, and population 90 divergence due to genetic drift will be more marked for chloroplast DNA (cpDNA) 91 than for nuclear DNA. Indeed, cpDNA is considered to evolve very slowly, with low 92 recombination and mutation rates 28. Organelle markers could provide powerful tools 93 for studying the phylogeography and migratory footprints of species 29. Parental 94 genetic markers are often combined with single-parent organelle markers for 95 population genetics studies 30. cpDNA and ITS sequence variations have been very 96 effective in revealing the glacial refuges of plants 29. cpDNA lineages usually show 97 the unique geographical distribution and evolutionary history of natural populations 98 and are therefore widely used in studies of systematic geography 31. Li et al. 32 99 successfully used cpDNA and ITS markers to evaluate the diversity and phylogenetic 100 relationships among populations of Saxifraga sinomontana, indicating that the species 101 had microrefugia during the Quaternary glacial period. In this study, we employed 102 cpDNA (trnL-trnF and ycf1) and nuclear ribosomal DNA (nrDNA) sequences to (1) 103 reveal the haplotype/genotypic diversity and population genetic structure of the 104 species; (2) test for potential glacial refugia for the species in the Ili River Valley; and 105 (3) assess whether the refugia show signs of recent expansion.

106 Materials and methods

107 Sample collection

108 The samples used for cpDNA analysis included 123 individuals from 20 populations. 109 A total of 171 individuals from 19 populations was used for the ITS analysis, of 110 which 38 samples were obtained from the NCBI database (Table S1). The samples 111 studied were from P. armeniaca and related species (P. sibirica, NAG; P. 112 mandshurica, LX; P. dasycarpa, ZX; P. mume, ECG; Prunus zhengheensis, ZHX 33; 113 Prunus limeixing, LMX 33 and , FGX). Prunus davidiana (T) was 114 regarded as the outgroup. 115 Our collection of wild apricot (P. armeniaca) populations covered most of the 116 natural distribution in China, including Huocheng County (DZGhcmd, DZGhcy and 117 DZGhcm populations), Yining County (DZGyn population), Gongliu County 3

118 (DZGglb and DZGgld populations) and Xinyuan County (DZGxyt, DZGxya and 119 DZGxyz populations). The distance between individuals sampled in each population 120 was at least 100 m. Young were collected and dried immediately with silica 121 gel. 122 The cultivated populations of P. armeniaca included the Xinjiang apricot group 123 (CAG, cultivated apricots in Xinjiang), the North China apricot group (NCG, 124 cultivated apricots in Shandong, Shaanxi, Gansu, Liaoning and Ningxia) and the 125 foreign apricot group (EG, cultivated apricots in the USA, France, Italy and Australia). 126 Detailed sample information was provided in Table 1 and Table S2. The main 127 characteristics of different populations can be found in Zhang et al. 8.

128 DNA sequencing

129 The total genomic DNA was extracted from the silica gel-dried materials using a 130 Plant Genomic DNA Kit (Tiangen Biotech, Beijing, China) 34. The quality and 131 concentration of the extracted DNA were determined by 1% agarose gel 132 electrophoresis and ultraviolet spectrophotometry, respectively. 133 cpDNA and nrDNA sequences were initially screened from 15 samples using 134 universal primers. The sequencing results showed that the sequences of cpDNA 135 (trnL-trnF and ycf1) and nrDNA (ITS) were polymorphic. cpDNA and ITS fragments 136 were amplified by polymerase chain reaction (PCR), and the details of their primers 137 are provided in Table S3 35-37. PCR was performed in a total volume of 25 µL that 138 contained 1 µL DNA, 5.5 µL PCR mix, 16.5 µL double-distilled water and 1 µL each 139 forward or reverse primer. PCR amplifications were performed under the following 140 conditions: 5 min of initial denaturation at 94℃; 35 cycles of 0.5 min at 94℃, 0.5 141 min of annealing at 58°, and 0.5 min of extension at 72℃, with 10 min of final 142 extension at 72℃. A CASpure PCR Purification Kit (CASarray, Shanghai, China) 32 143 was used for purification. The purified PCR products were sequenced on an ABI 144 PRISM 3730XL DNA Analyzer (Applied Biosystems, Foster City, CA, USA) 38.

145 Data analysis

146 We used BioEdit 39 to view and manually correct the sequencing results. We first used 147 CLUSTAL W 40 to align the sequences and coded indels following the method of 148 Simmons and Ochoterena 41. Then manual adjustments were made in MEGA ver. 149 7.0.26 42 to remove the overhanging tails to ensure a uniform sequence length. We 150 concatenated two chloroplast fragments (trnL-trnF and ycf1) to a separate matrix for 151 subsequent analysis. We used DnaSP ver. 5.10 to identify different haplotypes 152 (cpDNA sequences) or genotypes (ITS sequences). A haplotype network was 153 constructed using TCS ver. 1.2.1 43. ArcGIS ver. 10.2 (http://desktop.arcgis.com) 154 software was used to draw a haplotype geographical distribution map. 155 The haplotype/genotype diversity (Hd) and nucleotide diversity (π) were 156 calculated using DnaSP ver. 5.10 44 software. The within- population gene diversity 157 (HS), gene diversity in all populations (HT), interpopulation differentiation (GST) and

4

45 158 number of substitution types (NST) were calculated using PERMUT ver. 2.0 . The 159 last two indexes (GST and NST) were analyzed via permutation tests including 1,000 160 permutations. When NST is greater than GST, it indicates the existence of genealogical 161 geographic structure 45. Analysis of molecular variance (AMOVA) was performed 162 using Arlequin ver. 3.5.2.2 46 to partition the genetic variation at different levels, with 163 statistical significance determined by 1,000 permutations. 164 To investigate the historical dynamics of P. armeniaca, mismatch distribution 165 analysis was conducted using DnaSP ver. 5.10. The sum of squared deviations (SSDs), 166 Harpending's raggedness index (HRI) 45 and corresponding P values were calculated 46 167 using Arlequin ver. 3.5.2.2 . Neutrality tests based on Tajima’s D and Fu’s FS were 168 conducted to detect departures from the population equilibrium by Arlequin ver. 169 3.5.2.2 46. According to the formula of Rogers and Harpending 47, T=τ/2u, where “τ” 170 is the parameter value from the mismatch distribution model. In the formula u=μkg, 171 “μ” is the base substitution rate (chloroplast angiosperms 48: 1.1×10-9), “k” is the 172 fragment length (cpDNA length after combination: 2,062 bp), and “g” is the 173 generation time (20 years) 49. 174 The outgroup was P. davidiana, and the time of -apricot differentiation was 175 used as the calibration point 50. The BEAUti interface was used to create an input file 176 for BEAST 51, to which the GTR+I+G nucleotide substitution model. The data was 177 analyzed using a relaxed log-normal clock model and a Yule Process speciation 178 model for the tree priors. Prior settings for calibrating node were: offset of 55.1 Ma, a 179 log mean of 1.0 (log stdev of 0.5). The Bayesian Markov chain Monte Carlo 180 simulation was run for 100 million generations with a sample frequency of 1,000, and 181 the first 20% of generations were discarded as burn-in. Three independent analyses 182 were conducted and combined by LogCombiner ver. 1.8.4. Finally, annotation and 183 visualization of the maximum clade credibility tree were performed in TreeAnnotator 184 1.8.4 and FigTree ver. 1.4.3, respectively.

185 Results

186 Haplotype/genotype phylogenetics and distribution

187 Based on the concatenated cpDNA sequences (trnL-trnF and ycf1), 33 haplotypes 188 (H1-H33) were identified in 123 individuals from 20 populations of P. armeniaca and 189 related species (Figure 1, Table 2). The alignment lengths of the two chloroplast 190 fragments were 746 bp and 1,316 bp, respectively, and the combined length was 191 2,026 bp. Variable sites among the 33 haplotypes are shown in Table S4. Species P. 192 armeniaca harboured haplotype H1-H17 and H19-H20; species P. sibirica harboured 193 haplotype H28-H29; species P. mandshurica harboured haplotype H27; species P. 194 dasycarpa harboured haplotype H33; species P. mume harboured haplotype H20; 195 species P. zhengheensis harboured haplotype H32; species P. limeixing harboured 196 haplotype H23-H26; species P. brigantina harboured haplotype H21-H22; and species 197 P. davidiana harboured haplotype H30-H31. The results showed that populations of 198 different species did not share haplotypes. The results showed that, except for the EG 5

199 accessions, the diversity of wild P. armeniaca (DZG accessions) (nucleotide diversity 200 and haplotype diversity: 0.0013, 0.444, respectively) was higher than that of the CAG 201 (0.0006, 0.239) and NCG (0.0002, 0.400) accessions. 202 In the ITS dataset, a total of 57 ITS genotypes were discovered in 171 individuals 203 from 19 populations of P. armeniaca and its related species (Table 2, Figure S1), and 204 the alignment length was 545 bp. The variable sites among the 57 genotypes are 205 shown in Table S5. Species P. armeniaca harboured haplotype T1-T30, T42 and 206 T48-T50; species P. sibirica harboured haplotype T2-T3, T8-T9, T12, T25 and 207 T44-T47; species P. mandshurica harboured haplotype T3 and T9; species P. 208 dasycarpa harboured haplotype T2 and T55-T57; species P. mume harboured 209 haplotype T11 and T31-T41; species P. zhengheensis harboured haplotype T52-T54; 210 species P. limeixing harboured haplotype T3 and T43; and species P. davidiana 211 harboured haplotype T51. The results showed that, except for the EG accessions, the 212 diversity of wild P. armeniaca (DZG accessions) (nucleotide diversity and haplotype 213 diversity: 0.0055, 0.878, respectively) was higher than that of the CAG (0.0053, 0.832) 214 and NCG (0.0047, 0.818) accessions. 215 Based on the concatenated cpDNA sequences (trnL-trnF and ycf1), 19 haplotypes 216 (G1-G19) were identified in 107 individuals from 12 populations of P. armeniaca 217 (Figure 2). Variable sites among the 19 haplotypes are shown in Table S6. The Hd 218 and π values detected at the cpDNA sequence level in P. armeniaca were 0.548 and 219 1.9×10-3, respectively. The geographic distribution of the 19 haplotypes is shown in 220 Figure 2A, showing that G1 haplotypes were distributed in all populations. The 221 cpDNA haplotype network (Figure 2B) showed that 7 haplotypes were differentiated 222 from haplotype G1 by a one-step mutation with G1 at the center. Three haplotypes 223 were differentiated from G11 by a one-step mutation. The overall network map 224 showed a "star-like" distribution pattern. Based on the ITS sequences, 34 haplotypes 225 (P1-P34) were identified in 124 individuals from 12 populations of P. armeniaca 226 (Figure S2). The Hd and π values detected at the ITS sequence level in P. armeniaca 227 were 0.865 and 5.4×10-3, respectively.

228 Population structure and genealogical geography

229 In P. armeniaca, the gene diversity in all populations (cpDNA: HT=0.499; ITS: 230 HT=0.876) was higher than that within populations (cpDNA: HS=0.490; ITS: 231 HS=0.0.794) (Table 3). A permutation test showed that NST was significantly higher 232 than GST (cpDNA: NST=0.227>GST=0.020; ITS: NST=0.126>GST=0.094; P<0.05), 233 indicating that P. armeniaca has a significant geographical structure (Table 3). 234 AMOVA revealed significant genetic differentiation among all populations of P. 235 armeniaca (cpDNA: FST=0.1628, P<0.001; ITS: FST=0.0297, P<0.001), with most of 236 the genetic diversity occurring within the populations and relatively little occurring 237 among them (Table 4).

238 Demographic history and estimation of divergence times

6

239 The mismatch distribution based on cpDNA and ITS dataset analysis, in which 240 multimodal data were drawn from the cultivated populations or all populations, 241 indicated a demographic equilibrium (Figure 3, Figure S3). Both the neutrality tests 242 based on Tajima’s D (cpDNA: -2.272, P<0.05; ITS: -0.966, P<0.05) and Fu’s FS 243 (cpDNA: -5.8253, P<0.05; ITS: -12.223, P<0.05) and the mismatch distribution 244 analysis (Figure 3) based on the cpDNA and ITS dataset suggested recent range or 245 demographic expansion in wild populations of P. armeniaca. In addition, neither the 246 SSDs (cpDNA: 0.037, P>0.05; ITS: 0.002, P>0.05) nor the HRI (cpDNA: 0.179, 247 P>0.05; ITS: 0.017, P>0.05) showed significant positive values (Table 5), indicating 248 no deviation of the observed mismatch distribution from that obtained via model 249 simulation under sudden demographic expansion. Thus, we concluded that the 250 demographic expansion time of the wild populations of P. armeniaca was 16.53 kyr 251 ago. 252 The cpDNA dataset was employed to estimate the time of the onset of divergence 253 between P. armeniaca and its related species (Figure 4, Table S7). The results showed 254 that 33 haplotypes were divided into two groups: P. armeniaca (blue) and related 255 species (green) (Figure 4). The divergence time revealed that the differentiation of P. 256 armeniaca from its related species occurred during the middle Eocene, at 257 approximately 45.68 Ma (95% HPD = 28.47-61.87). The onset of intraspecific 258 divergence in P. armeniaca was estimated to be 25.55 (95% HPD = 12.93-39.63) Ma.

259 Discussion

260 Based on the cpDNA sequences, the total genetic diversity in P. armeniaca was 261 relatively low but sufficient to reflect the long evolutionary history and wide 262 distribution of the species 49. Many scholars 52-55 believed that the diversity of wild 263 apricot was richest in the Ili River Valley, with low levels of genetic differentiation 264 and genetic variation mainly occurring within populations. Hu et al. 54 used simple 265 sequence repeat markers to analyze the diversity of 212 apricot germplasm from 14 266 populations in Ili River Valley. Among them, the Tuergen township in Xinyuan 267 County had the highest genetic diversity, and the genetic distance between 268 populations was significantly correlated with the geographical distance. The 269 self-incompatible pattern, wide distribution, long-distance transmission of pollen 270 through insects and strong winds are the main factors affecting the genetic variation 271 structure 54. The haplotype/genotypic diversity of the wild populations of P. 272 armeniaca in the Ili River Valley obtained from cpDNA and ITS data was relatively 273 high (Table 2). The results of AMOVA (Table 4) showed that the genetic diversity in 274 P. armeniaca mainly occurs within populations (cpDNA: 83.72%; ITS: 97.03%), but 275 there were also significant differences among populations (cpDNA: 16.28%; ITS: 276 2.97%), which was consistent with previous results based on simple repeat markers 54. 277 The relatively high genetic diversity also confirmed the status of Tianshan wild 278 apricot as the origin center of cultivated apricot 54. The limited informative mutation 279 sites among ITS genotypes led to an almost complete lack of resolution for the 280 construction of genotype relationships (Figure S2), suggesting rapid intraspecific 7

281 differentiation in the most recent species of P. armeniaca, and similar results were 282 found in Saxifraga sinomontana 32. 283 The genetic backgrounds of the NAG, LX, and ZX populations had the same 284 genotypes (T2, T7-8 and T12) as P. armeniaca, indicating that they are associated 285 with P. armeniaca through continuous and extensive gene flows 56. Based on 286 microsatellite markers, Liu et al. 56 believed that P. sibirica was divided into two 287 groups, one of which may have under gene exchange with P. armeniaca, further 288 verifying our results. In addition, the authors found an extensively mixed genetic 289 background in the germplasm of cultivated apricots in China. This study indicated 290 that the cultivated and wild populations of P. armeniaca had the same ancestral 291 haplotype, which was G1. The haplotypes of the CAG, EG and NCG populations 292 were mixed with the haplotypes/genotypes of the large wild populations (P. 293 armeniaca). According to coalescent theory 57, chloroplast haplotype G1, which was 294 widely distributed and located in the center of the chloroplast network (Figure 2), 295 should be considered the oldest haplotype. The Kashgar, Hotan and Aksu oasis areas 296 around the Tarim Basin in the southern part of the Xinjiang Uygur Autonomous 297 Region of China are the main apricot-producing areas and contain the greatest 298 abundance of apricot . There is only one mountain between southern Xinjiang 299 and the Ili Valley, and there are several corridors between the northern and southern 300 Tianshan Mountains. Therefore, the apricots cultivated in Xinjiang, southern Tianshan 301 Mountains (CAG), most likely evolved from the spread of wild apricots in the Ili 302 River Valley. 303 Liu et al. 56 believed that apricots have experienced at least three domestication 304 events, which formed apricots in (the United States and continental Europe), 305 southern (Turkmenistan, Afghanistan, India) and China, respectively, in 306 which both ancient gene flows and recent gene mixing phenomena exist. Central Asia 307 was the most diverse center of wild apricots, with genetically differentiated 308 populations that may have resulted from population isolation in glacial shelters 56. In 309 this study, the nucleotide diversity and haplotype/genotype diversity of wild apricots 310 (DZG accessions) were higher than that of CAG and NCG accessions. These findings 311 are reasonable from a historical perspective, as there was extensive cultural contact 312 along the Silk Road from 207 BCE to 220 CE 58. Therefore, historical and commercial 313 influences may have contributed to the development of this unique species of 314 cultivated apricot. 315 The theory and method of phylogeography can reveal the historical dynamics of 316 species or populations, such as expansion, differentiation, isolation, migration and 317 extinction 29. Phylogeographic studies have shown that ancient haplotypes and high 318 genetic diversity can be used to identify refuges 1, 59. Many scholars 17, 19, 24 have 319 suggested that the complex geographic history of Northwest China may have 320 provided refuges for species during glacial periods. By combining two markers, we 321 showed that the wild populations distributed in the Ili River Valley all contained 322 ancestral haplotypes/genotypes and had high genetic diversity (Table 2). These 323 populations were located in areas considered glacial refugia for P. armeniaca, which

8

324 appears to be a relic of Quaternary glaciation. The region provides a suitable climate 325 for the biological community and protects the genetic diversity of P. armeniaca. Both 326 neutrality tests and mismatch distribution analysis based on the cpDNA and ITS 327 dataset suggested recent range or demographic expansion of wild populations of P. 328 armeniaca. We estimated that the recent demographic expansion time of the wild 329 populations of P. armeniaca was 16.53 kyr ago, that is, at the end of the LGM 49. 330 The taxa-sampling and the fossil calibration strategy will influence the age 331 estimation 60-61. Due to the lack of fossil evidence of P. armeniaca, we used the 332 peach-apricot divergence time as the calibration point. In this study, we tried to use 333 cpDNA data set to construct the divergence time, and the effect was relatively ideal. 334 However, compared with the median ages estimated by Chin et al. 50 (mean age of 335 31.1 Ma), the divergence time estimates in this study should be treated with caution, 336 because the limited coverage and a low number of calibration points may skew 337 divergence time estimation of P. sibirica (mean age of 33.43 Ma) too high.

338 Conclusion

339 Based on the cpDNA and ITS datasets, the genetic diversity of P. armeniaca remains 340 high. Haplotypes/genotypes shared with close relatives tend to occur in 341 geographically similar populations, and P. armeniaca exhibits a genealogical structure. 342 The haplotypes/genotypes of cultivated apricot populations in Xinjiang, North China, 343 and foreign apricot populations were mixed with large numbers of 344 haplotypes/genotypes of wild apricot populations from the Ili River Valley. We argue 345 that P. armeniaca originated in Northwest China (Ili Valley). Affected by the 346 Quaternary glaciation of the Pleistocene, the Ili River Valley in Northwest China 347 served as a glacial refugium for P. armeniaca, providing the species with a suitable 348 climate and preserving its genetic diversity. During the interglacial period, the species 349 underwent a recent expansion in the face of favorable climatic and environmental 350 conditions. Apricots originated during the middle Eocene, and the cultivated apricot 351 in Xinjiang originated from the Ili River Valley in Northwest China.

352 References

353 1. Meng, H. H. & Zhang, M. L. Diversification of plant species in arid Northwest 354 China: species-level phylogeographical history of Lagochilus Bunge ex Bentham 355 (Lamiaceae). Mol. Phylogenet. Evol. 68, 398-409. 356 https://doi.org/10.1111/jse.12088 (2015). 357 2. Pennington, R. T. et al. Contrasting plant diversification histories within the 358 Andean biodiversity hotspot. Proc. N. Acad. of Sci. USA 107, 13783-13787. 359 https://doi.org/10.1073/pnas.1001317107 (2010). 360 3. Hughes, C. & Eastwood, R. Island radiation on a continental scale: Exceptional 361 rates of plant diversification after uplift of the Andes. Proc. N. Acad. of Sci. USA 362 103, 10334-10339. https://doi.org/10.1073/pnas.0601928103 (2006). 363 4. Johansson, U. S. et al. Build-up of the Himalayan avifauna through immigration:

9

364 a biogeographical analysis of the Phylloscopus and Seicercus warblers. Evolution 365 61, 324-333. https://doi.org/10.1111/j.1558-5646.2007.00024.x (2007). 366 5. Hughes, C. E. & Atchison, G. W. The ubiquity of alpine plant radiations: from 367 the Andes to the Hengduan Mountains. New Phytol. 207, 275-282. 368 https://doi.org/10.1111/nph.13230 (2015). 369 6. Lagomarsino, L. P., Condamine, F. L., Antonelli, A., Mulch, A. & Davis, C. C. 370 The abiotic and biotic drivers of rapid diversification in Andean bellflowers 371 (Campanulaceae). New Phytol. 210, 1430–1442. 372 https://doi.org/10.1111/nph.13920 (2016). 373 7. Ebersbach, J. et al. In and out of the Qinghai-Tibet Plateau: divergence time 374 estimation and historical biogeography of the large arctic-alpine genus Saxifraga 375 L. J. Biogeogr. 44, 900-910. https://doi.org/10.1111/jbi.12899 (2017). 376 8. Zhang, J. Y. & Zhang, Z. Flora of Chinese Fruit Trees 61-62 (China forestry 377 press, 2003) 378 9. Su, Z., Zhang, M. & Sanderson, S. C. Chloroplast phylogeography of 379 Helianthemum songaricum (Cistaceae) from northwestern China: implications 380 for preservation of genetic diversity. Conserv. Genet. 12, 1525-1537. 381 https://doi.org/10.1007/s10592-011-0250-9 (2011). 382 10. Xie, K. Q. & Zhang, M. L. The effect of Quaternary climatic oscillations on 383 Ribes meyeri (Saxifragaceae) in northwestern China. Biochem. Syst. Ecol. 50, 384 39-47. https://doi.org/10.1016/j.bse.2013.03.031 (2013). 385 11. Salvi, D., Schembri, P., Sciberras, A. & Harris, D. Evolutionary history of the 386 Maltese wall lizard Podarcis filfolensis: insights on the ‘Expansion–Contraction’ 387 model of Pleistocene biogeography. Mol. Ecol. 23, 1167–1187. 388 https://doi.org/10.1111/mec.12668 (2014). 389 12. Liu, J. Q., Sun, Y. S., Ge, X. J., Gao, L. M. & Qiu, Y. X. Phylogeographic studies 390 of plants in China: Advances in the past and directions in the future. J. Syst. Evol. 391 50, 267-275. https://doi.org/10.1111/j.1759-6831.2012.00214.x (2012). 392 13. Hewitt, G. The genetic legacy of the Quaternary ice ages. Nature 405, 907–913. 393 https://doi.org/10.1038/35016000 (2000). 394 14. Hewitt, G. M. The structure of biodiversity-insights from molecular 395 phylogeography. Front. Zool. 1, 1-16. https://doi.org/10.1186/1742-9994-1-4 396 (2004). 397 15. Willis, K. J. & Niklas, K. J. The role of Quaternary environmental change in 398 plant macroevolution: the exception or the rule? Philos. Trans. R. Soc. Lond. B 399 359, 159-172. https://doi.org/10.1098/rstb.2003.1387 (2004). 400 16. Schmitt, T. Molecular biogeography of Europe: Pleistocene cycles and 401 postglacial trends. Front. Zool. 4, 11. https://doi.org/10.1186/1742-9994-4-11 402 (2007). 403 17. Shen, L., Chen, X. Y. & Li, Y. Y. Glacial refugia and postglacial recolonization 404 patterns of organisms. Acta Ecol. Sin. 22, 1983-1990. 405 https://doi.org/10.1088/1009-1963/11/5/313 (2002). 406 18. Schonswetter, P., Popp, M. & Brochmann, C. Rare arctic-alpine plants of the

10

407 European Alps have different immigration histories: the snow bed species 408 Minuartia biflora and Ranunculus pygmaeus. Mol. Ecol. 15, 709-720. 409 https://doi.org/10.1111/j.1365-294X.2006.02821.x (2006). 410 19. Guo, Y. P., Zhang, R., Chen, C. Y., Zhou, D. W. & Liu, J. Q. Allopatric 411 divergence and regional range expansion of Juniperus sabina in China. J. Syst. 412 Evol. 48, 153-160. https://doi.org/10.1111/j.1759-6831.2010.00073.x (2010). 413 20. Jaramillo-Correa, J. P., Beaulieu, J. & Bousquet, J. Variation in mitochondrial 414 DNA reveals multiple distant glacial refugia in black spruce (Picea mariana), a 415 transcontinental North American conifer. Mol. Ecol. 13, 2735-2747. 416 https://doi.org/10.1111/j.1365-294X.2004.02258.x (2004). 417 21. Afzal-Rafii, Z. & Dodd, R. S. Chloroplast DNA supports a hypothesis of glacial 418 refugia over postglacial recolonization in disjunct populations of black pine 419 (Pinus nigra) in western Europe. Mol. Ecol. 16, 723-736. 420 https://doi.org/10.1111/j.1365-294X.2006.03183.x (2007). 421 22. Anderson, L., Hu, F., Nelson, D., Petit, R. & Paige, K. Ice-age endurance: DNA 422 evidence of a white spruce refugium in Alaska. Proc. N. Acad. of Sci. USA 103, 423 12447–12450. https://doi.org/10.1073/pnas.0605310103 (2006). 424 23. Volkova, P. A., Burlakov, Y. A. & Schanzer, I. A. Genetic variability of Prunus 425 padus (Rosaceae) elaborates “a new Eurasian phylogeographical paradigm”. 426 Plant Syst. Evol. 306, 1-9. https://doi.org/10.1007/s00606-020-01644-0 (2020). 427 24. Xu, Z. & Zhang, M. L. Phylogeography of the arid shrub Atraphaxis frutescens 428 (Polygonaceae) in northwestern China: evidence from cpDNA sequences. J. 429 Hered. 106, 184-195. https://doi.org/10.1093/jhered/esu078 (2015). 430 25. Rehder, A. Manual of cultivated trees and shrubs hardy in North America, 431 exclusive of the subtropical and warmer temperate regions 345-346 (Macmillan, 432 1927) 433 26. Zhebentyayeva, T. N., Ledbetter, C., Burgos, L., & Llácer, G. Fruit breeding 434 415-458 (Springer, 2012) 435 27. Zhebentyayeva, T. N., Reighard, G. L., Gorina, V. M. & Abbott, A. G. Simple 436 sequence repeat (SSR) analysis for assessment of genetic variability in apricot 437 germplasm. Theor. Appl. Genet. 106, 435-444. 438 https://doi.org/10.1007/s00122-002-1069-z (2003). 439 28. Schaal, B. A., Hayworth, D. A., Olsen, K. M., Rauscher, J. T. & Smith, W. A. 440 Phylogeographic studies in plants: problems and prospects. Mol. Ecol. 7, 441 465-474. https://doi.org/10.1046/j.1365-294x.1998.00318.x (1998). 442 29. Avise, J. C. Phylogeography: retrospect and prospect. J. Biogeogr. 36, 3-15. 443 https://doi.org/10.1111/j.1365-2699.2008.02032.x (2009). 444 30. Poudel, R. C., Möller, M., Li, D. Z., Shah, A. & Gao, L. M. Genetic diversity, 445 demographical history and conservation aspects of the endangered yew tree 446 Taxus contorta (syn. Taxus fuana) in Pakistan. Tree Genet. Genom. 10, 653-665. 447 https://doi.org/10.1007/s11295-014-0711-7 (2014). 448 31. Dutech, C., Maggia, L. & Joly, H. Chloroplast diversity in Vouacapoua 449 americana (Caesalpiniaceae), a neotropical forest tree. Mol. Ecol. 9, 1427-1432.

11

450 https://doi.org/10.1046/j.1365-294x.2000.01027.x (2000). 451 32. Li, Y. et al. Rapid intraspecific diversification of the Alpine species Saxifraga 452 sinomontana (Saxifragaceae) in the Qinghai-Tibetan Plateau and Himalayas. 453 Front. Genet. 9, 381. https://doi.org/10.3389/fgene.2018.00381 (2018). 454 33. Zhang, Q. P. & Liu, W. S. Advances of the apricot resources collection, 455 evaluation and germplasm enhancement. Acta Hortic. Sin. 45, 1642-1660. 456 https://doi.org/10.16420/j.issn.0513-353x.2017-0654 (2018). 457 34. Hu, Z. B. et al. Population genomics of pearl millet (Pennisetum glaucum (L.) R. 458 Br.): Comparative analysis of global accessions and Senegalese landraces. BMC 459 Genomics 16, 1048. https://doi.org/10.1186/s12864-015-2255-0 (2015). 460 35. White, T. J., Bruns, T., Lee, S. & Taylor, J. Amplification and direct sequencing 461 of fungal ribosomal RNA genes for phylogenetics. PCR Protoc.: Guide Methods 462 Appl. 18, 315-322. (1990). 463 36. Dong, W. et al. ycf1, the most promising plastid DNA barcode of land plants. Sci. 464 Rep. 5, 8348. https://doi.org/10.1038/srep08348 (2015). 465 37. Bortiri, E. et al. Phylogeny and Systematics of Prunus (Rosaceae) as Determined 466 by Sequence Analysis of ITS and the Chloroplast trnL-trnF Spacer DNA. Syst. 467 Bot. 26, 797-807. https://doi.org/full/10.1043/0363-6445-26.4.797 (2001). 468 38. Zhang, Q. Y. et al. Latitudinal Adaptation and Genetic Insights Into the Origins 469 of Cannabis sativa L. Front Plant Sci. 9, 1876. 470 https://doi.org/10.3389/fpls.2018.01876 (2018). 471 39. Hall, T. A. BioEdit: a user-friendly biological sequence alignment editor and 472 analysis program for Windows 95/98/NT. Nucleic Acids Sumo. Ser. 41, 95-98. 473 https://doi.org/10.1021/bk-1999-0734.ch008 (1999). 474 40. Thompson, J. D., Higgins, D. G. & Gibson, T. J. CLUSTAL W: improving the 475 sensitivity of progressive multiple sequence alignment through sequence 476 weighting, position-specific gap penalties and weight matrix choice. Nucleic 477 Acids Res. 22, 4673-4680. https://doi.org/10.1093/nar/22.22.4673 (1994). 478 41. Simmons, M. P. & Ochoterena, H. Gaps as characters in sequence-based 479 phylogenetic analyses. Syst. Biol. 49, 369-381. 480 https://doi.org/10.1080/10635159950173889 (2000). 481 42. Kumar, S., Stecher, G. & Tamura, K. MEGA7: molecular evolutionary genetics 482 analysis version 7.0 for bigger datasets. Mol. Biol. Evol. 33, 1870-1874. 483 https://doi.org/10.1093/molbev/msw054 (2016). 484 43. Clement, M., Posada, D. & Crandall, K. A. TCS: a computer program to estimate 485 gene genealogies. Mol. Ecol. 9, 1657-1659. 486 https://doi.org/10.1046/j.1365-294x.2000.01020.x (2000). 487 44. Librado, P. & Rozas, J. DnaSP v5: a software for comprehensive analysis of 488 DNA polymorphism data. Bioinformatics 25, 1451-1452. 489 https://doi.org/10.1093/bioinformatics/btp187 (2009). 490 45. Pons, O. & Petit, R. J. Measwring and testing genetic differentiation with 491 ordered versus unordered alleles. Genetics 144, 1237-1245. 492 https://doi.org/doi:10.1016/S1050-3862(96)00162-3 (1996).

12

493 46. Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: a new series of programs to 494 perform population genetics analyses under Linux and Windows. Ecol. Resour. 495 10, 564-567. https://doi.org/10.1111/j.1755-0998.2010.02847.x (2010). 496 47. Rogers, A. R. & Harpending, H. Population growth makes waves in the 497 distribution of pairwise genetic differences. Mol. Biol. Evol. 9, 552-569. (1992). 498 48. Wolfe, K. H., Li, W. H. & Sharp, P. M. Rates of nucleotide substitution vary 499 greatly among plant mitochondrial, chloroplast, and nuclear DNAs. Proc. N. 500 Acad. of Sci. USA 84, 9054-9058. https://doi.org/10.1073/pnas.84.24.9054 501 (1987). 502 49. Wang, Z. et al. Phylogeography study of the Siberian apricot ( L.) 503 in Northern China assessed by chloroplast microsatellite and DNA makers. Front. 504 Plant Sci. 8, 1989. https://doi.org/10.3389/fpls.2017.01989 (2017). 505 50. Chin, S. W., Shaw, J., Haberle, R., Wen, J. & Potter, D. Diversification of 506 , , plums and cherries-Molecular systematics and biogeographic 507 history of Prunus (Rosaceae). Mol. Phylogenet. Evol. 76, 34-48. 508 https://doi.org/10.1016/j.ympev.2014.02.024 (2014). 509 51. Drummond, A. J., Suchard, M. A., Xie, D. & Rambaut, A. Bayesian 510 phylogenetics with BEAUti and the BEAST 1.7. Mol. Biol. Evol. 29, 1969-1973. 511 https://doi.org/10.1093/molbev/mss075 (2012). 512 52. Li, M., Zhao, Z. & Miao, X. J. Genetic variability of wild apricot (Prunus 513 armeniaca L.) populations in the Ili Valley as revealed by ISSR markers. Genet. 514 Resour. Crop Evol. 60, 2293-2302. https://doi.org/10.1007/s10722-013-9996-x 515 (2013). 516 53. Li, M., Hu, X., Miao, X. J., Xu, Z. & Zhao, Z. Genetic diversity analysis of wild 517 apricot (Prunus armeniaca) populations in the lli Valley as revealed by SRAP 518 markers. Acta Hortic. Sin. 43, 1980-1988. 519 https://doi.org/10.16420/j.issn.0513-353x.2016-0156 (2016). 520 54. Hu, X., Zheng, P., Ni, B., Miao, X. & Li, M. Population genetic diversity and 521 structure analysis of wild apricot (Prunus armeniaca L.) revealed by SSR 522 markers in the Tien-Shan mountains of China. Pak. J. Bot. 50, 609-615. (2018). 523 55. Decroocq, S. et al. New insights into the history of domesticated and wild 524 apricots and its contribution to Plum pox virus resistance. Mol. Ecol. 25, 525 4712-4729. https://doi.org/10.1111/mec.13772 (2016). 526 56. Liu, S. et al. The complex evolutionary history of apricots: species divergence, 527 gene flow and multiple domestication events. Mol. Ecol. Notes 28, 5299-5314. 528 https://doi.org/doi.org/10.1111/mec.15296 (2019). 529 57. Posada, D. & Crandall, K. A. Intraspecific gene genealogies: trees grafting into 530 networks. Trends Ecol. Evol. 16, 37-45. 531 https://doi.org/10.1016/S0169-5347(00)02026-7 (2001). 532 58. Boulnois, L. Silk road: monks, warriors & merchants on the Silk Road 115-165 533 (WW Norton & Co Inc, 2004) 534 59. Zhao, C., Wang, C. B., Ma, X. G., Liang, Q. L. & He, X. J. Phylogeographic 535 analysis of a temperate-deciduous forest restricted plant (Bupleurum

13

536 longiradiatum Turcz.) reveals two refuge areas in China with subsequent refugial 537 isolation promoting speciation. Mol. Phylogen. Evol. 68, 628-643. 538 https://doi.org/10.1016/j.ympev.2013.04.007 (2013). 539 60. Ebersbach, J., Schnitzler, J., Favre, A. & Muellner-Riehl, A. N. Evolutionary 540 radiations in the species-rich mountain genus Saxifraga L. BMC Evol. Biol. 17. 541 https://doi.org/10.1186/s12862-017-0967-2 (2017). 542 61. Favre, A. et al. The role of the uplift of the Qinghai-Tibetan Plateau for the 543 evolution of Tibetan biotas. Biol. Rev. 90, 236-253. (2015). 544 https://doi.org/10.1111/brv.12107 545

546 Acknowledgments

547 We are grateful to K.L. for the useful discussions and insightful comments, L.Q.L., 548 Q.P.Z., W.Q.Z. and G.Q.F. for the assistance in collecting samples.

549 Author contribution

550 W.W.L., and K.L. conceived the study. L.Q.L., Q.P.Z., W.Q.Z. and G.Q.F. 551 contributed to the sampling. W-WL and Adam collected and analyzed the data. 552 W.W.L. and K.L. wrote the manuscript. All authors read and approved the final 553 manuscript.

554 Funding

555 This work was supported by the National Key Research and Development Program 556 (grant number 2016YFC0501504), the Xinjiang Uygur Autonomous Region 557 Horticulture Key Discipline Fund (grant number 2016-10758-3), the Natural Science 558 Foundation of Xinjiang Uygur Autonomous Region and the Xinjiang Agricultural 559 University Crop science postdoctoral research station. The authors would like to thank 560 the different research institutions, scientists, and breeders involved in this work and 561 the company American Journal Experts (AJE) for providing English editorial 562 assistance.

563 Competing interests

564 The authors declare that the research was conducted in the absence of any commercial 565 or financial relationships that could be construed as a potential conflict of interest.

14

566 Figures

567 568 Figure 1. The haplotype network generated from the haplotypes of P. armeniaca and 569 related species based on cpDNA dataset. The small black circles shown an 570 intermediate haplotype not detected in this study. A, P. armeniaca; B, P. zhengheensis; 571 C, P. davidiana; D, P. sibirica; E, P. dasycarpa; F, P. limeixing; G, P. mandshurica; 572 H, P. mume; I, P. brigantina.

15

573 Figure 2. Geographic location and haplogroup distribution patterns of the 12 574 populations of P. armeniaca based on cpDNA dataset. (A) Geographic distribution of 575 haplotyoes of P. armeniaca; Pie chart size corresponds to the sample size of each 576 population. (B) The haplotype network generated from the haplotyoes of P. armeniaca. 577 The haplotype network generated from the 19 haplotypes of P. armeniaca. The small 578 black circles shown an intermediate haplotype not detected in this study. 579

16

580 581 582 583 584 585 586 587 588 589 Figure 3. Mismatch distribution analysis of P. armeniaca based on overall gene pool 590 of cpDNA dataset. 591

17

592 Figure 4. Bayesian phylogenetic tree based on cpDNA dataset. Gray bars represent 593 95% highest posterior density intervals. Plio, Pliocene; Pl, Pleistocene; numbers 594 above the branches indicate Bayesian posterior probabilities.

18

595 Tables

Latitude Longitude Altitude Population Origin/location Type (N°) (E°) (m) P. armeniaca DZGhcmd Huocheng,Xinjiang W 44.43 80.79 1187.6 DZGhcy Huocheng,Xinjiang W 44.44 80.79 1245.7 DZGhcm Huocheng,Xinjiang W 44.40 80.71 1244.1 DZGyn Yining,Xinjiang W 44.12 81.62 1983.6 DZGglb Gongliu,Xinjiang W 43.25 82.86 1371.5 DZGgld Gongliu,Xinjiang W 43.23 82.75 1269.4 DZGxyt Xinyuan,Xinjiang W 43.54 83.44 1138.8 DZGxya Xinyuan,Xinjiang W 43.50 83.70 1275.3 DZGxyz Xinyuan,Xinjiang W 43.38 83.61 1374.1 CAG Luntai,Xinjiang C 41.78 84.23 972.0 NCG Luntai,Xinjiang C 41.78 84.23 972.0 EG Xiongyue,Liaoning C 40.17 122.16 20.4 P. sibirica NAG Xiongyue,Liaoning W 40.17 122.16 20.4 P. mandshurica LX Xiongyue,Liaoning C 40.17 122.16 20.4 P. dasycarpa ZX Luntai,Xinjiang C 41.78 84.23 972.0 P. mume ECG Xiongyue,Liaoning C 40.17 122.16 20.4 P. zhengheensis ZHX Xiongyue,Liaoning C 40.17 122.16 20.4 P. limeixing LMX Xiongyue,Liaoning C 40.17 122.16 20.4 P. brigantina FGX Xiongyue,Liaoning C 40.17 122.16 20.4 P. davidiana T Luntai, Xinjiang O 41.78 84.23 972.0 596 Table 1. Population code, sampling location, coordinates, altitude and types of P. 597 armeniaca and relative species. W, wild; C, cultivars; O, out group.

19

598 cpDNA ITS Population Haplotype composition N Hd π Genotype composition N Hd π DZGhcmd H1(6), H10(1), H11(1), H12(1) 9 0.583 0.0016 T2(2), T9(1), T14(1), T17(2), T19(1), T20(1), T21(1), T22(1) 10 0.956 0.0082 DZGhcy H1(9) 9 0.000 0.0000 T3(2), T8(1), T9(3), T23(1), 7 0.810 0.0067 DZGhcm H1(7), H8(1), H9(1) 9 0.417 0.0002 T2(1), T3(4), T8(1), T9(1), T18(1) 8 0.786 0.0000 DZGyn H1(6), H7(1), H15(1), H16(1), H17(1) 10 0.667 0.0011 T1(1), T3(2), T8(2), T9(1), T14(2), T30(1) 9 0.917 0.0087 DZGglb H1(6), H5(1), H6(1), H7(2) 10 0.644 0.0012 T2(1), T3(3), T8(2), T13(1), T14(1), T15(1) 9 0.889 0.0043 DZGgld H1(6), H3(1) 7 0.286 0.0010 T3(4), T8(1), T16(1), T17(1) 7 0.714 0.0027 DZGxyt H1(8), H2(1) 9 0.222 0.0001 T3(4), T8(2), T9(2), T12(1), T25(1) 10 0.822 0.0032 DZGxya H1(1), H4(1), H13(1) 3 1.000 0.0039 T2(2), T24(1) 3 0.667 0.0988 DZGxyz H1(7), H3(1), H14(1) 9 0.417 0.0031 T2(2), T3(2), T26(1), T27(1),T28(1), T29(2) 9 0.917 0.0057 Within population 75 0.444 0.0013 72 0.878 0.0055 T1(5), T2(11), T3(8), T4(1), T5(1), T6(1), T7(1), T8(1), T9(1), CAG H1(21), H2(1), H3(1), H4(1) 24 0.239 0.0006 34 0.832 0.0053 T10(1), T11(2), T12(1) NCG H1(4), H2(1) 5 0.400 0.0002 T1(1), T3(5), T9(2), T48(1), T49(1), T50(2) 12 0.818 0.0047 EG H1(1), H18(1), H19(1) 3 1.000 0.0045 T2(1), T3(4), T42(1) 6 0.600 0.0053 Within population 32 0.343 0.0012 52 0.833 0.0052 Among population 107 0.548 0.0019 12 0.865 0.0054 T2(1), T3(6), T8(1), T9(1), T12(1), T25(1), T44(1), T45(1), NAG H28(1), H29(2) 3 0.667 0.0023 15 0.857 0.0104 T46(1), T47(1) LX H27(1) 1 0.000 0.0000 T3(4), T9(1) 5 0.400 0.0014 ZX H33(1) 1 0.000 0.0000 T2(2), T55(1), T56(2), T57(2) 7 0.857 0.0229 T11(1), T31(1), T32(1), T33(1), T34(1), T35(1), T36(1), ECG H20(1) 1 0.000 0.0000 12 1.000 0.017 T37(1), T38(1), T39(1), T40(1), T41(1) ZHX H32(1) 1 0.000 0.0000 T52(1), T53(1), T54(1) 3 1.000 0.0047 LMX H23(1), H24(1), H25(1), H26(1) 4 1.000 0.0051 T3(3), T43(1) 4 0.500 0.0170 FGX H21(2), H22(1) 3 0.667 0.0003

20

T H30(1), H31(1) 2 1.000 0.0005 T51(1) 1 0.000 0.0000 599 Table 2. Sample information and summary of haplotype/genotype distribution, genetic diversity for each population. N, sample size; Hd, 600 haplotype/genotype diversity; π, nucleotide diversity. 601

21

cpDNA ITS Populations HS HT GST NST HS HT GST NST P. armeniaca (Cultivated & Wild) 0.490 0.499 0.020 0.227* 0.794 0.876 0.094 0.126* Table 3. Estimates of average gene diversity within populations (HS), total gene diversity (HT), interpopulation differentiation (GST) and number of substitution types (NST) for cpDNA haplotypes and ITS genotypes of P. armeniaca. Ns, not significant; *, P<0.05.

cpDNA ITS Source of variation df SS VC PV (%) FST df SS VC PV (%) FST Among populations 11 31.699 0.209 16.28 11 23.777 0.068 4.38 Within populations 95 102.086 1.075 83.72 112 166.844 1.490 95.62 Total 106 133.785 1.284 0.1628* 123 190.621 1.558 0.0297* Table 4. Analyses of molecular variance (AMOVA) of cpDNA haplotypes and ITS genotypes for populations of P. armeniaca. df, degrees of freedom; SS, sum of squares; VC, variance components; PV, percentage of variation; FST, fixation index; *P<0.001.

Tajima’s D-test Fu’s FS-test Mismatch distribution Populations D P FS P SSD P HRI P Overall -2.417 0.000* -6.350 0.025* 0.232 0.150 0.218 0.600 cpDNA P. arminiaca (cultivated) -1.933 0.003* 1.289 0.772 0.038 0.250 0.356 0.500 P. arminiaca (wild) -2.272 0.001* -5.825 0.021* 0.037 0.070 0.179 0.880 Overall -1.193 0.106 -8.587 0.010* 0.002 0.660 0.023 0.700 ITS P. arminiaca (cultivated) -1.420 0.155 -4.951 0.024* 0.015 0.020 0.063 0.310 P. arminiaca (wild) -0.966 0.041* -12.223 0.000* 0.002 0.810 0.017 0.880 Table 5. Neutrality tests (Tajima’s D and Fu’s FS) and mismatch distribution analysis for the P. armeniaca based on the cpDNA and ITS dataset. * P<0.05.

22

Figures

Figure 1

The haplotype network generated from the haploty p es of P. armeniaca and related species based on cp D NA dataset dataset. The small black circles shown an intermediate haplotype not detected in this study. A, P. armeniaca ; B, P. zhengheensis; C, P. davidiana; D, P. sibirica; E, P. dasycarpa; F, P. limeixing G, P. mandshurica H, P. mume; I, P. brigantina. Figure 2

Geographic location and haplogroup distribution patterns of the 12 populations of P. armeniaca based on cpDNA dataset . (A) Geographic distribution of h aplotyoes of P. armeniaca ; Pie chart size correspond s to the sample size of each population. (B) Th e haplotype network generated from the haplotyoes of P. armeniaca. The haplotype network generated from the 19 haplotypes of P. armeniaca The small black circles shown an intermediate haplotype not detected in this study. Note: The designations employed and the presentation of the material on this map do not imply the expression of any opinion whatsoever on the part of Research Square concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. This map has been provided by the authors. Figure 3

Mismatch di stribut ion analysis of P armeniaca based on overall gene pool of cpDNA dataset. Figure 4

Bayesian phylogenetic tree based on cpDNA dataset. Gray bars re present 95% highest posterior density intervals. Pli o , Pliocene; Pl, Pleistocene; numbers above the branches indicate Bayesian posterior probabilities.

Supplementary Files

This is a list of supplementary les associated with this preprint. Click to download.

SupplementaryFigures.docx TablesS1.docx TablesS2.docx TablesS3.docx TableS4.xlsx TableS5.xlsx TableS6.xlsx TablesS7.docx