bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 Genetic Characterization and Curation of Diploid A-Genome Wheat 3 4 One-sentence summary: Genotyping of gene bank collections of diploid A-genome relatives of wheat 5 uncovered relatively higher genetic diversity and unique evolutionary relationships which gives insight 6 to the effective use of these germplasm for wheat improvement. 7 8 Authors: 9 Laxman Adhikari1, John Raupp1, Shuangye Wu1, Duane Wilson1, Byron Evers1, Dal-Hoe Koo1, 10 Narinder Singh1,‡, Bernd Friebe1 and Jesse Poland1,2,* 11 12 Affiliations: 13 Department of Plant Pathology, Kansas State University, Manhattan, KS; Wheat Genetic Resource 14 Center (WGRC), Kansas State University, Manhattan, KS 66502 15 Center for Desert Agriculture, King Abdullah University of Science and Technology, Thuwal, Saudi 16 Arabia 17 ‡ current address: Bayer Crop Science 18 * Corresponding author. Email: [email protected] 19 20 Short title: Characterization of Diploid Wheat 21 22

1 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

23 Abstract 24 The A-genome diploid wheats represent the earliest domesticated and cultivated wheat species in the 25 Fertile Crescent and the donor of the wheat A sub-genome. The A-genome species encompass the 26 cultivated einkorn (Triticum. monococcum L. subsp. monococcum), wild einkorn (T. monococcum L. 27 subsp. aegilopoides (Link) Thell.) and T. urartu. We evaluated the collection of 930 accessions in the 28 Wheat Genetics Resource Center (WGRC), using genotyping-by-sequencing (GBS) and identified 29 13,089 curated SNPs. Genomic analysis detected misclassified and duplicated accessions with most 30 duplicates originated from the same or a nearby locations. About 56% (n = 520) of the WGRC A- 31 genome species collections were duplicates supporting the need for genomic characterization for 32 effective curation and maintenance of these collections. Population structure analysis confirmed the 33 morphology-based classifications of the accessions and reflected the species geographic distributions. 34 We also showed that the T. urartu as the closest A-genome diploid to wheat through phylogenetic 35 analysis. Population analysis within the wild einkorn group showed three genetically distinct clusters, 36 which corresponded with wild einkorn races α, β, and γ described previously. The T. monococcum

37 genome-wide FST scan identified candidate genomic regions harboring domestication selection signature 38 (Btr1) on the short arm of chromosome 3Am at ~ 70 Mb. We established A-genome core set (79 39 accessions) based on allelic diversity, geographical distribution, and available phenotypic data. The 40 individual species core set maintained at least 80% of allelic variants in the A-genome collection and 41 constitute a valuable genetic resource to improve wheat and domesticated einkorn in breeding programs. 42 43 44 45 Key Words: Einkorn, A-genome wheat species, T. urartu, T. monococcum, aegilopoides, duplicates,

46 population structure, Nei’s index, FST scan, Btr1, selection signal, GBS, misclassified, core collection 47

2 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

48 Introduction 49 50 Wheat wild relatives are an important reservoir of genetic diversity that can be utilized for wheat 51 improvement, particularly for diseases, pests, and abiotic stress tolerance (Wulff & Moscou, 52 2014). Cultivated tetraploid (pasta wheat, Triticum turgidum) and hexaploid (bread wheat, Triticum 53 aestivum) wheat arose through successive whole-genome hybridization between related species in the 54 Triticeae. Although polyploidization in wheat enabled broad adaptation and genome plasticity found in 55 polyploids (Comai, 2005), it also created severe genetic bottlenecks within each subgenome (Feldman & 56 Levy, 2012). Likewise, of the three natural races within wild einkorn, only one natural race (β) has been 57 domesticated, thus, genetic diversity in the wild einkorn is expected to be greater than in domesticated 58 einkorn (Pourkheirandish et al., 2018). Some recent findings, however, reported no or low reduction in 59 nucleotide diversity through einkorn domestication, most likely indicating a minimal bottleneck during 60 domestication of cultivated einkorn (Kilian et al., 2007). This was true when diversity comparisons were 61 performed between wild einkorn specific races (α and β) vs. domesticated einkorn. However, when the 62 comparison was made between the domesticated einkorn vs. all groups of wild einkorn, the wild einkorn 63 diversity was much higher than found in the cultivated accessions. The value of A-genome species 64 diversity for alleviating the wheat diversity bottleneck have been described (Brunazzi et al., 2018; 65 Mondal et al., 2016). Thus, diversity assessment in germplasm collections of diploid A-genome species 66 is crucial for conservation planning and efficient utilization of germplasm in breeding. 67 68 A-genome wheat species (2n = 2x = 14, AA) are diploid grasses including the wild einkorn (T. 69 monococcum L. subsp. aegilopoides (Link) Thell.), domesticated einkorn (T. monococcum L. subsp. 70 monococcum), and T. urartu (van Slageren, 1994). Molecular and cytological studies have confirmed 71 that T. urartu, a related species sharing the same genome as domesticated einkorn, is the A-genome 72 ancestor to cultivated wheat (T. aestivum) (Dong et al., 2012). In the first polyploidization event that 73 occurred ~500,000-150,000 million years ago (MYA) (Charmet, 2011), T. urartu naturally hybridized 74 with a B-genome donor grass, an extant species but close relative of Aegilops speltoides Tausch, giving 75 rise to the wild tetraploid wheat T. turgidumL. subsp. dicoccoides (Körn. Ex Asch. & Graebn.) Thell. 76 (AABB, 2n = 4x = 28) (Nair, 2019). In the next event, the cultivated tetraploid emmer wheat (T. 77 turgidum subsp. durum (Desf.) Husn.) naturally hybridized with the D-genome donor species (Ae. 78 Tauschii Coss) forming hexaploid bread wheat (AABBDD, 2n = 6x = 42). The A-genome species

3 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

79 morphologically resemble cultivated tetraploid and hexaploid wheat more than any other surviving 80 diploid genome donors and are predominant in the Fertile Crescent (Heun et al., 1997). Domestication of 81 einkorn wheat, together with emmer wheat and barley around 12,000 years ago, transformed human 82 culture from hunting-gathering to agriculture, popularly known as the ‘Neolithic Revolution’ (Kilian et 83 al., 2010). The Karacadağ mountain in the southeast Turkey has been considered the geographical point 84 for einkorn domestication (Brandolini et al., 2016). 85 86 The donor of the A genome of the bread wheat, T. urartu, is estimated to have diverged nearly 0.57 – 87 0.76 MYA from another widespread A-genome diploid species, T. monococcum. Interspecific crosses 88 between T. urartu and T. monococcum are infertile, confirming the large phylogenetic distance and 89 genetic differentiation of the species (Middleton et al., 2014). Like hexaploid wheat, A-genome species 90 have a large genome size with a mean nuclear DNA content of 5.784 pg/1C in T. urartu to 6.247 pg/1C 91 in T. monococcum subps. aegilopoides (Özkan et al., 2010). Morphologically, T. urartu possesses 92 smooth leaves, a brittle rachis, and smaller anthers (< 0.3 mm). The wild einkorn (T. monococcum 93 subsp. aegilopoides) are characterized with a brittle rachis, hairy leaves, and larger (≥0.5 mm) anthers. 94 Domesticated einkorn has a nonbrittle (semitough) rachis with smooth leaves (Brandolini & Heun, 95 2019). 96 97 Being homologous to the wheat A-genome, these species provide useful sources for wheat improvement 98 using wide crosses and cytogenetics approaches. The A-genome species are important genetic resources 99 for pest resistance and stress tolerance. For example, T. urartu was identified as a source of resistance to 100 the root lesion nematode Pratylenchus thornei (Sheedy et al., 2012) and stem rust (Rouse & Jin, 2011).

101 Novel stem rust resistance genes SrTm5 and Sr60 were mapped in an F2 population derived from crosses 102 between wild and the cultivated einkorn (Chen et al., 2018). Sr35, the first gene cloned against the 103 devastating stem rust race UG99, also originates from T. monococcum (Saintenac et al., 2013). A leaf 104 rust gene, Lr63, in wheat chromosome 3AS was introgressed from T. monococcum (Kolmer et al., 105 2010). Surveying the genetic variation in A-genome species that can be utilized in wheat improvement 106 has lagged, considering the potential value of more effectively utilizing these species for wheat 107 improvement. 108

4 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

109 Einkorn has multiple botanical names in the literature as proposed by the various taxonomists, and 110 confusion related to the einkorn nomenclature is widespread. In 1948, Schiemann classified einkorn as 111 wild einkorn (T. boeoticum subsp. thaoudar), the feral einkorn (T. boeoticum subsp. aegilopoides), and 112 the domesticated einkorn (T. monococcum subsp. monococcum) (Brandolini et al., 2016; Schiemann, 113 1948). MacKey published einkorn classification in 1954 (Key, 1954) and updated the nomenclature 114 several times through 2005 (Mac Key, 2005a). van Slageren also published the einkorn nomenclature, 115 where the wild and domesticated einkorn were simply named as T. monococcum L. subsp. aegilopoides 116 (hereafter subsp. aegilopoides) and T. monococcum L. subsp. monococcum (hereafter subsp. 117 monococcum), respectively (van Slageren, 1994). In this study, we follow van Slageren’s (1994) einkorn 118 , because the A-genome species collection in the Wheat Genetics Resource Center (WGRC) at 119 Kansas State University (KSU) were initially classified using this nomenclature (van Slageren, 1994). 120 121 A well-characterized population structure of A-genome species is critical to formulating effective 122 conservation strategy, selecting diverse germplasm, and enhancing the accuracy of the genomic analysis 123 with structure information (Singh et al., 2019). Population structure and diversity assessment have 124 become easier with next-generation sequencing, which makes discovery of thousands of genotyping 125 markers possible. Here, we used genotyping by sequencing (GBS) for single nucleotide polymorphism 126 (SNP) discovery. GBS is straightforward, high-throughput, and with multiple downstream pipelines for 127 data processing (Poland et al., 2012a). However, population structure of A-genome species has not been 128 evaluated in detail with the resource of whole-genome profiling. Therefore, our objectives are to : i) 129 curate A-genome wheat accessions in the gene bank by identifying duplicates and misclassified 130 accessions, ii) assess the population structure and genetic diversity of the A-genome wheat species, and 131 iii) establish genetically, geographically, and phenotypically representative core collections for A- 132 genome species within the WGRC gene bank. 133 134 135 Results 136 137 A-Genome Species Distribution 138 Most of the wild einkorn (subsp. aegilopoides) in our collection, were collected across Turkey, northern 139 Iraq, west Iran, and Transcaucasia, whereas the majority of domesticated einkorn (subsp. monococcum)

5 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

140 were from west Turkey and the Balkans (Figure 1, Supplementary Table S1). About half of the T. urartu 141 accessions were from eastern Lebanon, around the Beqaa Valley, and a major part were from southeast 142 Turkey (Figure 1). The A-genome species are known to span from Transcaucasia through Anatolia to 143 the Caspian Sea. The WGRC collection covers the geographic range of this species. After genomic 144 characterization including misclassified accessions adjustment, we retained 196 T. urartu, 145 of 145 domesticated einkorn, and 584 wild einkorn (Supplementary Table S1). There were also 5 tetraploids 146 identified in the population which were curated to correct species. 147 148 Markers and Genotyping 149 For all A-genome accessions, we identified 44,215 biallelic SNPs after a filter for passing Fisher exact 150 test of disassociated alleles. Separating this by subspecies, we had 24,314 biallelic SNPs for subsp. 151 aegilopoides, 19,940 biallelic SNPs for T. urartu, and 13,957 biallelic SNPs for subsp. monococcum. 152 Upon filtration (MAF > 0.01, 30% < missing, 10% < heterozygosity), we retained 7432 SNPs for T. 153 urartu and 6734 SNPs for T. monococcum, 6343 SNPs for subsp. aegilopoides, and 3980 for subsp. 154 monococcum. For wheat and A-genome diploids together we found 15,300 filtered SNPs. For A-genome 155 species diversity assessment, thousands of segregating loci were available for the groups defined by 156 population analysis and core set selections (Table 1). We filtered the loci for MAF (MAF > 0.01) before 157 splitting the VCF file to the species and sub-species and observed loci that were fixed or otherwise one 158 heterozygous genotype call within the individual species and subspecies. To compute total segregating 159 loci per group and minimize the effect of potential sequencing error, we did further filtration and 160 removed any loci that were segregating only due to a single heterozygous genotype and otherwise the 161 major allele is fixed in remaining population (Table 1). 162 163 Gene Bank Curation 164 We identified and corrected a total of 22 misclassified accessions using fastStructure analysis, 165 phylogenetic and PCA clustering (Supplementary Figure S1) including nine T. urartu, two subsp. 166 monococcum, six subsp. aegilopoides, and five tetraploid accessions (Supplementary Table S3). As 167 large number of accessions in both T. urartu and subsp. aegilopoides collection were from southeast 168 Turkey; we observed most of the misclassified accessions also were from the same site. 169

6 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

170 While evaluating the collection for duplicate accessions, we compared various number of loci for allele 171 matching per A-genome species (Table 3) as the SNPs were filtered to keep only the sites with > 0.05 172 MAF, < 50% missing and < 10% heterozygous. We identified and used a threshold of ≥ 99% identity by 173 state (IBS) to declare the individuals as identical accessions to warrant the inclusion of identical 174 accessions in the duplicate set (Supplementary Figure S2) with tolerance for sequencing and genotyping 175 error. With these criteria we identified a total of 520 (56%) duplicated accessions which were mostly 176 observed within T. urartu and within α race subsp. aegilopoides (Supplementary Table S1). To confirm 177 this analysis, we checked the collection sites of the groups of duplicates identified and all of the 178 respective sets of duplicates were collected from the same or nearby sites. We further observed the 179 duplicates had same phenotypes as the glume color scores were the same for sets of duplicates 180 (Supplementary Table S1), confirming the accuracy of using the GBS data for identification of 181 duplicated accessions. For instance, TA471 and its 11 duplicates had glume color score of 7 while on a 182 scale of 1 (white) to 9 (black) (Supplementary Table S1). 183 184 Relationship Between A-genome Diploid and Wheat 185 The genetic grouping of A-genome diploids and CIMMYT wheat lines together showed that wheat is 186 closer to T. urartu than to T. monococcum (Supplementary Figure S3), a finding in agreement with the 187 known relationship between the species. The unrooted NJ tree constructed for wheat and A-genome 188 diploid wheat showed five accessions (TA282, TA10915, TA1325, TA1369, and TA10881) clustering 189 far from the T. urartu major clade (Supplementary Figure S3). Cytological analysis identified them as 190 tetraploid (2n=28) (Supplementary Figure S4). Therefore, we excluded these five accessions from 191 population analysis. This observation confirms that GBS also enables identifying cryptic accessions with 192 different ploidy levels in the population. 193 194 A-genome Population Structure and Wild Einkorn Genetic Races 195 Population grouping in the fastStructure analysis at K=2 to K=7 showed the A-genome genetic structure 196 was split with the known biological and geographical characterization (Figure 2). This analysis revealed 197 a number of misclassified accessions that were individually curated and checked, including 198 morphological confirmation, and were reclassified to the appropriate group (Supplementary Table S3). 199

7 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

200 At K=2, the population differentiation occurred only at the level of species, the accessions split into T. 201 monococcum and T. urartu, confirming known species differences (Figure 2). At K=3, the two 202 subspecies of T. monococcum differentiated with the accessions in the α wild einkorn race were clearly 203 differentiated from domesticated einkorn. However, the other races of wild einkorn (β and γ) appeared 204 to be an admixture, supporting that there is not complete differentiation between the wild and 205 domesticated einkorn, a classification that is simply based on the few morphological characteristics of 206 the domestication syndrome. 207 208 We observed differentiation of wild einkorn into genetically distinct groups at K=7. Comparing these 209 three wild einkorn subgroups with the α, β, and γ wild einkorn races described by (Kilian et al., 2007), 210 we report the three genetic subgroups as representing the races α, β, and γ by identifying common 211 USDA Plant Introduction (PI) numbers for accessions in both studies. The genetic clustering pattern and 212 geographical distribution then confirmed that the subgroups within subsp. aegilopoides represents α, β, 213 and γ races described and we hereby name these genetic groups accordingly (Supplementary Table S4) 214 (Kilian et al., 2007). In (Kilian et al., 2007), the α race accessions were primarily from southeast Turkey, 215 northern Iraq, and Iran; the γ race involves accessions from Transcaucasia to western Anatolia; and the β 216 race comprises a few accessions collected around Karacadag Turkey (Figure 1, Supplementary Table 217 S1). Based on population differentiation, α race exhibited the strongest differentiation with domesticated 218 einkorn and should represent the base population of subsp. aegilopoides, whereas the β race of wild 219 einkorn exhibited the least differentiation with subsp. monococcum. Interestingly, the β race did not 220 fully differentiate from subsp. monococcum at any value of K (Figure 2), supporting that domesticated 221 einkorn originated out of this subpopulation, which already largely differentiated from the other wild 222 einkorn, or (2) that the β race represents ‘feral’ subsp. monococcum accessions that were, at one point, 223 fully domesticated but reverted to wild plant types through introgression and admixture. 224 225 At K=5, the population subgrouping according to the accession origin was observed in α race accessions 226 within the wild einkorn. Accessions from Erbil (ancient name ‘Arbil’) differentiated as a subpopulation, 227 and the accessions from Sulaymaniyah (Iraq) split as the admixture of the Erbil subgroup and the 228 remaining accessions at K=5 (Figure 2). We could not observe any new differentiation within the wild 229 einkorn group at K=6. However, at K=7, we observed three distinct subgroups and a higher level of 230 admixture within the α race of subsp. aegilopoides (Figure 2). Also, there were two main sets of

8 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

231 admixture types; the first set mainly consists of accessions from Iran that shared ancestry from the 232 Duhok (red) and Turkey (purple) subgroups, and the second corresponds with accessions from 233 Sulaymaniyah (Iraq) and has ancestry from all three subgroups. Hence, within the population of α race 234 einkorn accessions, three subgroups exist; Erbil, Duhok, and Turkey, and two groups of genetic 235 admixtures (Iran and Sulaymaniyah), named from their origin. 236 237 We did not observe any subgrouping within the accessions from the southeast Turkey, yet the accessions 238 were primarily from two sites (Sanliurfa and Mardin). The grouping pattern of three subgroups within 239 the α race accessions provided a new insight into the wild einkorn subgrouping and their genetic 240 relationships. We did not observe within population differentiation in domesticated einkorn group. 241 242 In T. urartu, the subgrouping occurred at K=6, and was unchanged at K=7 (Figure 2). Two major T. 243 urartu subgroups represented accessions from Turkey (#T) and another from Lebanon (#L). Few T. 244 urartu accessions were from Syria (#S); some showed admixture, and some had a clean ancestry that 245 resembled accessions from Turkey (Figure 2). The few remaining accessions primarily were from 246 Transcaucasia (#M) and exhibited an ancestry similar to accessions from Turkey (Figure 2). 247 248 Phylogenetic Clustering and PCA 249 The phylogenetic clustering split the A-genome accessions into separate clades for T. urartu, T. 250 monococcum subsp. monococcum, and all races within the subsp. aegilopoides (Figure 3). Only 12 251 accessions were retained within race β, and the accessions were clustered with some other domesticated 252 einkorn accessions (Figure 3). The T. urartu clade distantly clustered in both PCA and phylogenetic 253 analysis from either of the einkorn clade indicating the obvious genetic differences between species. 254 The misclassified accessions (Supplementary Figure S1) observed in the phylogenetic clustering were 255 re-classified into proper genotype-based classes. 256 257 A PCA plot of A-genome species also showed accessions clustering as in fastStructure and phylogenetic 258 analysis (Supplementary Figure S5). The first principal component (PC1), which grouped the 259 accessions of T. monococcum and T. urartu in two primary clusters, explained 58% of the variation. 260 The PC2, which divided the einkorn accessions, explained 8% of the variation and separated

9 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

261 domesticated and different races within the wild einkorn. Misclassified accessions previously observed 262 also were revealed in the PCA analysis and their taxonomy classification adjusted. 263

264 Genetic Diversity and FST 265 A considerably high Nei’s diversity index (0.25) was observed for the complete set of A-genome 266 accessions. The Nei’s diversity indices for individual A-genome species ranged from 0.058 for 267 domesticated einkorn to 0.106 for the entire einkorn group. Among the three races of wild einkorn, the 268 Nei’s diversity indices of β race (0.058) was the lowest and γ was the highest (0.093; Table 1). As 269 expected for diverse accessions, we found a high density of alleles with low minor allele frequency 270 (MAF) (Supplementary Figure S6). 271 272 Population differentiation within the A-genome species were further verified by pairwise fixation index

273 (FST) values (Nei’s, 1987) computed between the groups. Pairwise FST between T. urartu and entire 274 einkorn were greater than 0.80, supporting that the two species are strongly differentiated (Table 2).

275 The pairwise FST (0.56) between the α race and domesticated einkorn indicated the strongest

276 differentiation between any two groups within the einkorn, whereas the weakest differentiation (FST = 277 0.31) was between the β race and domesticated einkorn, supporting the model that this wild race was the 278 most likely forerunner of domesticated einkorn as previously hypothesized (Kilian et al., 2007). The 279 concept also was endorsed by the origin of β race einkorn in the WGRC collection, mostly from 280 Diyarbakir and Sanliurfa, which are near Karacadag and Kartal-Karacadag mountains (points of 281 domestication). Nonetheless, the genetic grouping of β also occurred with subsp. monococcum in the

282 unrooted NJ tree (Figure 3). Pairwise FST (~ 0.40) between pairs: ‘γ race - subsp. monococcum’ and ‘γ 283 race - α race’ implicit the differentiation of γ race as a genetically intermediate type from truly wild α

284 race and domesticated einkorn (Table 2). The pairwise FST computed between two subpopulations 285 (Turkey and Lebanon) of T. urartu was 0.52, which also agrees with the population structure analysis. 286

287 Pairwise FST computed between the subpopulations within α race of subsp. aegilopoides signaled out the 288 geographical differentiation and the potential gene flow within this wild einkorn race. Consistent with

289 the fastStructure output, the Erbil subgroup showed the stronger differentiation (higher FST) with other 290 wild einkorn subgroups (Supplementary Table S5). The subgroup Duhok and southeast Turkey and

291 their admixture group (Iran) had the minimum pairwise FST (~ 0.12). The accessions within the

10 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

292 admixture group of Sulaymaniyah displayed almost similar differentiation (~FST = 0.16) with three 293 subgroups, which agrees with population structure as the admixture group has ancestry from all three. 294

295 FST Scan and Einkorn Selection Signature 296 After filtration and imputation, we had 6,622 SNPs segregating in T. monococcum on which we

297 calculated per site FST values for each of the seven chromosomes that ranged from near 0 to 1. Both

298 methods, Porto-Neto et al. (2013) and VCFtools, produced similar results for raw and smoothed FST

299 values. We used a genome-wide threshold of 3σ (0.24) over the mean FST, from which we observed only 300 a single-selection signature on short arm of chromosome 3A (Supplementary Figure S7) after smoothing 301 using Lowess method (f = 0.1) (Pintus et al., 2014). This selection signature corresponded to the locus 302 that harbors the brittle rachis 1 (Btr1) (Pourkheirandish et al., 2018) and was supported by the BLAST 303 hit of a coding sequence (Supplementary text S1) of Btr1 on the reference genome used to genotype our 304 population (T. urartu pseudomolecule), which was occurred at 62 Mb on chromosome 3A. We also

305 observed that the raw FST values for three consecutive sites of the region (62 Mb) had the highest (FST 306 =1) values. Thus, this selection scan identified the impact of selection for Btr1 in the domesticated 307 einkorn. 308 309 A-genome Core Collection 310 To maximize the utility of the WGRC collection we identified a core set that captured the majority of 311 allelic diversity within 19 T. urartu accessions, and 60 accessions of T. monococcum (einkorn wheat) 312 (Supplementary Table S2). In core sets of the entire A-genome collection, we captured ~98 % of the 313 identified alleles, whereas each separate sub-core also captured at least 80% of the segregating alleles of 314 the respective species-specific collections (Table 1). Richness in allelic diversity within the core 315 collections was confirmed by the higher Nei’s diversity index (0.27) of the selected cores relative to the 316 entire collection (0.25) (Table 1). Distribution of the core set accessions in the phylogenetic cluster, 317 PCA clusters, and in the geographic map showed that the selected accessions also represented all 318 subgroups within the population and covered the geographic range (Supplementary Figure S8 – S10). 319 Ranges of glume color scores (Supplementary Table S2) in the core indicated that the core collections 320 are also an excellent representative of phenotypic variations within the whole collection. 321 322 Discussion

11 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

323 324 A-Genome Species Distribution, Einkorn Nomenclatures and Morphology 325 Our results confirm that the WGRC A-genome collection includes arrays of naturally selected 326 germplasm around the center of origin (Figure 1). While verifying the morphologically based grouping 327 of A-genome species through population analysis, we identified three genetically different wild einkorn 328 races (Figure 2 and 3). This information is very crucial in handling a large group of wild einkorn so that 329 accessions with desired genetic background and morphology of interest can be selected for utilization in 330 breeding and further investigation. The wild einkorn genetic races described herein, matched with the 331 races described Kilian et al. (2007), add information to establish the evolutionary and genetic 332 relationships between wild and domesticated einkorn wheats. 333 334 However, various nomenclature of the einkorn (Supplementary Figure S11) creates a conundrum in 335 interpreting the different races within the wild einkorn. Some einkorn nomenclature is written in 336 multiple languages; Schiemann (1948) published his nomenclature in German and Dorofeev et al. 337 (1979) in Russian , which could have reduced the acceptance of the nomenclatures by the wider research 338 communities (Dorofeev et al., 1979). In a revised form of MacKey’s classification (Mac Key, 2005b), 339 the T. monococcum subsp. boeoticum was changed to T. monococcum subsp. aegilopoides (Goncharov, 340 2011). Therefore, no single einkorn classification is deemed to be the most widely accepted and 341 uniformly used. The van Slageren (1994) nomenclature that we follow also is mostly in agreement with 342 the MacKey classification, because both systems use T. monococcum subsp. aegilopoides as the wild 343 einkorn. With all these issues, an updated and widely accepted monograph of einkorn may help 344 maintaining uniformity in taxonomy of these natural accessions. 345 346 Species and subspecies classification is first based on morphology. Multiple studies also have discussed 347 different ecogeographical wild einkorn races that have intermediate morphology. Van Zeist (1992) 348 described two groups of wild einkorn: the first group (T. boeoticum subsp. thaoudar) predominately 349 exists in the southeast Turkey, northern Iraq, and west Iran, and the second group (T. boeoticum subsp. 350 aegilopoides) primarily occurs in the west Anatolian center (VAN ZEIST, 1992). The first group of 351 accessions had a double-grained spikelet, and the second group was single-grained, suggesting that the 352 second group is more similar to domesticated einkorn. Brandolini and Heun (2019) explained the T. 353 boeoticum subsp. aegilopoides as an intermediate type feral (semi-wild) einkorn with a semi-brittle

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

354 rachis and T. boeoticum subsp. thaoudar as the truly wild einkorn with an extremely brittle rachis and 355 argued on the quantitative nature of brittleness in einkorn wheat (Brandolini & Heun, 2019). They 356 hypothesized that the feral einkorn had evolved when agriculture moved from the southeast to west 357 Turkey and Balkans. The semi-brittle rachis breaks into two parts only after being bent and the naturally 358 emerged semi-brittle rachis einkorn mutant still exists in the vicinity of the Karacadag, however, the 359 area is predominant for the truly wild double-grained einkorn (Brandolini & Heun, 2019). Some 360 einkorn accessions with intermediate leaf hairiness, a trait used to classify accessions that is common 361 only in wild einkorn, was also observed (Empilli et al., 2000), indicating that einkorn with intermediate 362 or intergraded morphological characteristics are common (Harlan & Zohary, 1966). The three genetic 363 races of wild einkorn observed in this study also possess unique genetic relationships with cultivated

364 einkorn, as shown by phylogenetic grouping and pairwise FST values, showing the varying levels of 365 relatedness within and between wild einkorn accession. 366 367 Gene Bank Curation 368 Globally, plant gene banks often suffer from identified and unidentified duplicates that unnecessarily 369 increase maintenance costs (Díez et al., 2018). Here, we curated 930 A-genome species accessions in 370 WGRC gene bank, identifying duplicates and misclassified accessions, and recognizing valuable unique 371 accessions using genotyping. The existence of misclassified accessions in the gene bank may be due to 372 human error on class assignment and/or data recording; rarely, some accessions might also have 373 controversial morphology. As an example of severe misclassification, consider the wild einkorn 374 accession PI 427328 discussed earlier. Except for the WGRC and Leibniz gene banks, other three gene 375 collection agencies have listed this accession (PI 427328) as T. urartu (https://www.genesys- 376 pgr.org/a/v2JRrMq2g22), illustrating the importance of genetic scrutiny of the misclassified accessions 377 within the A-genome accessions in different repositories. This genotype-based curation reduces the 378 gene bank operation costs and makes germplasms preservation and utilization easier. 379 380 T. urartu: the Closest A-genome Diploid Relative of Wheat 381 With GBS information here we showed that T. urartu is the closest diploid A-genome relative of wheat 382 and thereby most likely donor of A-genome to the hexaploid wheat (Supplementary Figure S2). This 383 study endorses the known relationships between wheat and A-genome diploids, which was based on 384 thousands of molecular markers and samples (~1,000 diploids and > 200 wheat). Most previous studies

13 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

385 describing the relationship between T. urartu and wheat (Dvořák et al., 1993) however relied on 386 cytogenetic analysis. 387 388 Wild Einkorn Races 389 The wild einkorn groups were previously divided into α, β, and γ races (Kilian et al., 2007; Zaharieva & 390 Monneveux, 2014)which was consistent with the phylogeny observed in our study. Furthermore, we 391 validated these race groups to match accessions with common USDA PI numbers in both studies 392 overlapped their point of collection and almost all fall under the same races in both studies 393 (Supplementary Table S4). Comparing between studies, there were a few discrepancies in race 394 assignment of accessions (Kilian et al., 2007) that needed correction. For example, Killian et al. (2007) 395 grouped PI 427328 in T. urartu, but our genetic analysis grouped it into α race within subsp. 396 aegilopoides which is also in harmony with WGRC database (accession no. TA879). According to the 397 Genesys database (https://www.genesys-pgr.org/10.25642/IPK/GBIS/98704), another gene bank 398 (Leibniz Institute of Plant Genetics and Crop Plant Research) also classified this PI 427328 within wild 399 einkorn but under the name T. baeoticum Boiss. subsp. boeoticum exemplifying multiple wild einkorn 400 nomenclatures use and creating confusion when describing wild einkorn. Interestingly, Kilian et al. 401 (2007) reported a few feral types of einkorn accessions, indicating they are T. monococcum subsp. 402 aegilopoides according to the nomenclature used, which we did not observe in the WGRC collection. 403 We show that the wild and domesticated einkorn can clearly be differentiated based on genomic data 404 into α, β, and γ races and the domesticated accessions. Given the difficulty and ambiguity of 405 morphological classification, the genetic classification from genomic data can be a preferred approach to 406 cleanly classify any given accession. 407 408 Population Analysis and Different Groups Under A-genome Species

409 The population structure and FST analysis on the A-genome species endorsed the established 410 relationships between the species and subspecies. For instance, hybrids between T. monococcum and T. 411 urartu are largely sterile and, hence, the genetic differentiation between these species is apparent 412 (Fricano et al., 2014). Also, the intraspecific population differentiation between groups under einkorn at 413 relatively higher K values supported the known genetic relationship between these crossable subspecies 414 that produce mostly fertile hybrids (Harlan & Zohary, 1966). 415

14 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

416 Our analysis shows that the α race einkorn accessions most likely represent the truly wild einkorn with 417 an extremely brittle rachis, most likely the group of accessions that were traditionally classified as T. 418 boeoticum subsp. thaoudar (Brandolini & Heun, 2019). Differentiation of subpopulations within the α 419 race wild einkorn corresponding to geographic distribution implies migration and genetic drift among 420 truly wild einkorn in the Near East. The T. urartu subgrouping of accessions from Lebanon and Turkey 421 agrees with Wang et al. (2017) , where two subgroups, Mediterranean coastal and Mesopotamia- 422 Transcaucasia, within T. urartu were reported (Wang et al., 2017). 423 424 Diversity Analysis 425 Cultivated einkorn had a lower Nei’s diversity index (0.058) than the wild sister group and wild T. 426 urartu (Table 1), which was expected. As a domesticated species, subsp. monococcum experienced a 427 strong population bottleneck and artificial selection might have triggered genetic erosion. On the other 428 hand, the population structure of cultivated einkorn did not show substantial admixture, with the 429 exception of a few accessions, all individuals were true to the ancestry (Figure 2), suggesting a low post 430 domestication admixture contributing elevated diversity. The involvement of a single race (β) in 431 domestication would have further reduced allelic diversity in the cultivated einkorn; there was no 432 difference between the Nei’s diversity of β race (0.058) and the domesticated einkorn (0.058). Kilian et 433 al. (2007) illustrated no nucleotide diversity was reduced during einnkorn domestication; instead, they 434 observed increased diversity in domesticated compared to wild einkorn (Kilian et al., 2007). However, 435 the diversity assessment in (Kilian et al., 2007) could be influenced by the limited number of loci and 436 smaller sample size; especially, diversity estimates are sensitive to sample size when there are only a 437 handful of markers (Bashalkhanov et al., 2009; Li et al., 2009). In this experiment, we used thousands of 438 SNP markers and have larger sample size, which minimized the effect of sample size and the number of 439 loci. The highest Nei’s diversity index (0.25) for all A-genome combinedly, and the considerably higher 440 Nei’s diversity index for each species and core collections indicated that these accessions are very 441 important assets with novel and useful genetic variations. 442 443 Btr1: Einkorn Domestication Signal

444 Through FST computation, we showed that in einkorn wheat there is a single strong selection signal 445 observed on chromosome 3A corresponding to the Btr1 locus (Supplementary Figure S7). Previous 446 study also described Btr1 as one of the most important features of einkorn domestication

15 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

447 (Pourkheirandish et al., 2018). The non-brittleness in domesticated einkorn is controlled by a single 448 nucleotide change in Btr1 of wild einkorn that results in an amino acid substitution (alanine to threonine) 449 (Pourkheirandish et al., 2018). With ~ 1,000 filtered loci per chromosome, we located the candidate 450 selection region. The availability of a T. monococcum reference genome to call the genotype would be 451 ideal for obtaining dense markers and better locating the selection signature on einkorn wheat. 452 453 Core Collections 454 Establishing core collections of A-genome species enabled harnessing useful genetic variation to 455 improve wheat and cultivated einkorn. To the best of our knowledge, this is the first genetic core of A- 456 genome species, which included only 79 accessions and yet contains ~ 98% of the identified alleles 457 while achieving a more than 10-fold (79/930) reduction in the number accessions (Table 1, 458 Supplementary Table S2). The Nei’s diversity index computed for these core collections supported that 459 they have considerably higher relative diversity and can be leveraged for targeted germplasm 460 improvement. 461 462 Conclusions 463 This study reports the important aspects of the A-genome wheat species for genetic diversity, gene bank 464 curation, and core set selection. Following an assessment of nearly 1,000 accessions, we report that the 465 A-genome species possess a considerable amount of genetic diversity, which can be utilized in breeding 466 wheat and domesticated einkorn. This vast diversity is most effectively managed in pre-breeding with 467 well-defined core collections. Identifying and in-depth characterizing of such core collections adds 468 significant value and accessibility to the germplasm. Having a well curated and accurately described 469 gene bank collection, as done here, is a critical foundation to effectively using this rich diversity for crop 470 improvement and enhancing the value of gene bank resources. 471 472

16 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

473 474 Author Contributions: 475 JP and JR designed the experiment and conceptualized the study. JR, SW, DW, BE and NS carried out 476 experiments and data collection, growing of plant materials and germplasm. LA conducted data 477 analysis. LA and JP wrote the manuscript. All the authors have read and approved the manuscript for 478 publication. 479 480 Funding information 481 This material is based upon work supported by the National Science Foundation and Industry Partners 482 under Award No. (1822162) “Phase II IUCRC at Kansas State University Center for Wheat Genetic 483 Resources” and the National Science Foundation under Grant No. (1339389) “GPF-PG: Genome 484 Structure and Diversity of Wheat and Its Wild Relatives”. Any opinions, findings, and conclusions or 485 recommendations expressed in this material are those of the author(s) and do not necessarily reflect the 486 views of the National Science Foundation or industry partners. 487

17 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

488 Materials and Methods 489 490 Plant Resources 491 This study included 930 accessions of the A-genome diploid wheat species maintained in the WGRC 492 gene bank (Supplemental Table S1), which were primarily acquired from the Near East, Transcaucasia, 493 and the Balkans (Figure 1). Most of the A-genome accessions (~ 85%) tested include those initially 494 collected by B. Lennert Johnson, University of California–Riverside in the 1960s and 1970s. The 495 remaining accessions were obtained from gene banks in Japan (22), Germany (24), and ICARDA (61). 496 Several accessions were donated by Robert Metzger, USDA, Oregon State University, Corvallis (26), 497 seven were collected by the WGRC, and the remainder (10) from other sources. We also tested 225 498 CIMMYT wheat lines (Supplementary Table S1) genotyped earlier with GBS SNPs (Gao et al., 2021) 499 thereby inferring the genetic relationships between A-genome diploids and the wheat. 500 501 Genotypic Characterization 502 The tested accessions were grown as single plants in the greenhouse and tissue collected in 96-well 503 plates. The tissues were lyophilized for ~3 days and ground to a fine powder using Retsch mixer mill 504 MM 400. Genomic DNA extraction and GBS library preparation steps were according to Singh et al. 505 (2019). We had total four multiplexed GBS libraries including one for the pilot study. The pilot study 506 GBS library was 384-plex, whereas the other GBS libraries were 288-plex. We sequenced on the 507 Illumina platform with 150 bp pair-end reads (PE150). We had total The information about GBS of 226 508 CIMMYT lines can be obtained (Gao et al., 2021). 509 510 The TASSEL5 GBSv2 pipeline was used for sequence data processing and genotype calling (Glaubitz et 511 al., 2014). Reads were aligned to a T. urartu pseudomolecule reference (Ling et al., 2018) using 512 bowtie2 alignment (Ling et al., 2018) and exported to variant call format (VCF). Filtering of the VCF 513 was done for bi-allelic SNPs using the Fisher exact test with a threshold p-value <0.001 as described 514 previously (Poland et al., 2012b) considering that the genotypes should represent biallelic variants in 515 inbred accessions. Genotypes for accessions across all A-genome species were called together, followed 516 by extracting variants segregating within each species using VCFtools (Danecek et al., 2011). The SNPs 517 were filtered for minor allele frequency (MAF) > 0.01, missing percentage < 30%, and heterozygous 518 genotypes < 10% at the population level using TASSEL and R (R Core Team 2019).

18 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

519 520 The A-genome diploids and wheat lines were genotyped together calling SNP on A-genome of wheat 521 reference genome of cultivar Chinese Spring (iwgsc_refseqv1.0) (Appels et al., 2018). We also filtered 522 these SNPs using aforementioned criteria. The unrooted neighbor-joining (NJ) phylogenetic tree of A- 523 genome diploid and wheat lines were generated for investigating the genetic relationship. We followed 524 approach of Singh et al. (2019) to generate NJ tree from GBS sampled population, where clustering was 525 conducted with default parameters of R packages ‘dist’, ‘ape’, and ‘phyclust’. 526 527 Gene Bank Curation 528 A-genome species in the WGRC gene bank were curated to identify misclassified and duplicate 529 accessions. The misclassified accessions identified based on the genetic properties were compared with 530 accessions in the adjusted class morphologically to assure if they were previously assigned or 531 documented to the wrong class. Furthermore, to confirm the ploidy of the misclassified accessions that 532 were grouped far from the major T. urartu clade and did not exhibit a closer relationship with any 533 diploid A-genome in genetic tree, chromosome counts were made by staining with 4',6-diamidino-2- 534 phenylindole (DAPI). The detail method for chromosome count was obtained (Koo et al., 2017). 535 536 The genetically identical accessions were identified using pairwise allele matching across homozygous 537 and non-missing sites. We first analyzed the loci identity proportions distribution at genome-wide scale 538 including every possible pair-wise comparison among accessions within a single species. A threshold for 539 allele matching percentage given discrepancies for sequencing errors was then detected by finding a 540 point that separates the local maxima existing around the prefect identity (100%). The identity matrix 541 and percentage allele matching were computed in R using a custom script as described by (Singh et al., 542 2019). The morphological similarity and the geographical relations of the identified duplicate accessions 543 were checked for confirmation. Glume color (level of darkness) was used as a morphological marker for 544 cross-validation to affirm the accessions in a duplicate set have the same or similar phenotypes. The 545 variation in glume color was rated from completely white (0) to dark black (9). 546 547 Population Structure 548 Population structure of A-genome wheat species was analyzed using fastStructure (Raj et al., 2014). 549 The fastStructure was initially run at K=2 to K=12 with three replications using ‘simple’ prior where K

19 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

550 refers to number of population or model complexity. For the optimum value of K, the program was run 551 using ‘logistic’ prior at K=2 to K=7 with three replications (Singh et al., 2019). An appropriate number 552 of K was also obtained using the fastStructure provided utility tool, chooseK.py. The fastStructure 553 output was graphically visualized using an R package POPHELPER (Francis, 2017). Passport 554 information including the classification based on morphology, and the accessions geographical sites 555 were used to group and reorder the samples in population analysis. Accessions that were identified as 556 misclassified were confirmed through morphological evaluation and reordered to subspecies based on 557 the genotype-based grouping and the final result was plotted. 558 559 Phylogenetic clustering was carried out in R using ‘dist’ function and ‘ape’ and ‘phyclust’ packages 560 (Singh et al., 2019). The branches of an unrooted neighbor-joining (NJ) tree were first colored using the 561 morphology-based classification, and then according to genotype analysis. The morphology-based 562 coloring was particularly focused in identifying misclassified accessions. A-genome species population 563 genetic structure was also dissected using principal component analysis (PCA) of genomic data. For 564 PCA, we estimated the eigenvalues and eigenvectors on R using the ‘e’ function in ‘A’ matrix obtained 565 from the rrBLUP (Endelman, 2011; Singh et al., 2019). 566 567 Analysis of Genetic Diversity 568 A-genome species genetic diversity was assessed by computing the Nei’s diversity index (Nei 1973) 569 using filtered genotyping markers (Nei, 1973). We computed the Nei’s indices of (1) all A-genome 570 accessions together, (2) each species and subspecies independently, (3) the races within the subspecies, 571 and (4) and the core collections. The minor allele frequency (MAF) for each species was also plotted to 572 discern the excess of rare variants in respective population. Number of segregating loci per group were

573 determined (Table 1). A pairwise fixation index (FST) (Nei, 1987) also was computed between the 574 species and subgroups separated by the population analysis (Singh et al., 2019). 575

576 FST Within Einkorn and Selection signature

577 We computed a genome-wide FST statistic for variants within the einkorn group using R (R Core Team

578 2019) as described (Porto-Neto et al., 2013). This method compute FST statistic based on pure drift 579 model (Nicholson et al., 2002). We also compared the output by computing the Cockerham and Weir

580 FST statistic (Weir & Cockerham, 1984) using VCFtools (Danecek et al., 2011). The T. monococcum

20 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

581 VCF file with biallelic variant was further filtered keeping SNPs with MAF > 0.01, missing < 30% and 582 heterozygous < 10% followed by imputation using Beagle 5.1 (Browning et al., 2018). The filtered and

583 imputed genotyping information was used to derive the FST values. To balance the population sizes of 584 domesticated and wild einkorn, we randomly chose 145 wild einkorn accessions to match the number of

585 145 domesticated accessions. The FST were plotted using ggplot2 in R (R Core Team 2019) and the raw

586 FST plots were smoothed using Lowess method (Pintus et al., 2014) to find the genomic regions with

587 extreme FST. To define the selection signal peak, we considered outlier FST values that were more than 588 three standard deviation (3σ) over genome wide average as the threshold. 589 590 Core Collections 591 Core collections of T. urartu, and T. monococcum (wild and domesticated einkorn) were selected taking 592 allelic diversity, genotype coverage, geographical representation, and phenotypic variation (glume color) 593 into consideration. From the filtered genotyping file, heterozygous genotypes were masked before 594 running the core accessions selection software GenoCore (Jeong et al., 2017). We ran GenoCore with 595 the default parameters: -d 0.01% and -cv 99%. The positions of the selected samples within the 596 phylogenetic tree and PCA clusters were observed through coloring the selected core accessions versus 597 all other samples. Also, the geographical representations were evaluated marking the selected vs. 598 remaining accessions in the google map using GPS Visualizer (https://www.gpsvisualizer.com). To 599 ensure phenotypic variations in the selected core sets, we considered the glume color score 600 (Supplementary Table S2) as a reference variation. The Nei’s diversity index (1987) of core sets were 601 also computed (Danecek et al., 2011). 602

21 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

603 604 Accession Numbers 605 606 Raw sequence data obtained from GBS, the fastq files, has been deposited at the National Center for 607 Biotechnology Information (NCBI) SRA database with the BioProject accession PRJNA744683 608 (https://www.ncbi.nlm.nih.gov/sra/PRJNA744683). The GBS key file with required information for 609 demultiplexing and further detail about the SRA deposited fastq files can be obtained at Dryad digital 610 repository (doi:10.5061/dryad.9zw3r22f6). 611 612 613 Supplemental Data 614 615 Supplementary Table S1. List of A-genome accessions, their origin and duplicated accessions. 616 617 Supplementary Table S2. Core collections of A-genome species. 618 619 Supplementary Table S3. The misclassified A-genome species accessions, their previous class based 620 on morphology and the updated class/group based on the genotyping. 621 622 Supplementary Table S4. Number of accessions with common PI numbers that clustered in 623 corresponding groups in this experiment and a past experiment. Both studies tested only a portion of 624 germplasms from USDA. The α, β, and γ races indicate the three genetic clusters within the wild einkorn 625 as designated in the past study. The * indicates the accessions that we detected as misclassified and need 626 adjustment of class. Past study also grouped these accessions in the same group that we observed, 627 however, they did not discuss on misclassification issue and just listed the accessions based on 628 morphological classification. 629 630 Supplementary Table S5. Pairwise FST coefficients among the subgroups within α race of subsp. 631 aegilopoides (wild einkorn) and the admixture groups. There were three subgroups (Turkey, Duhok, 632 Erbil) and two admixture groups (Iran and Sulaymaniyah (Iraq)). 633 634 Supplementary Figure S1. The T. urartu clade and subsp. aegilopoides α race clade in the unrooted NJ 635 tree highlighting the misclassified accessions between the two groups. The red branches within the gold- 636 colored clade and the gold branches within the red clade reflect the misclassified accessions. 637

22 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

638 Supplementary Figure S2. Threshold determination for declaring duplicate accessions identification. 639 (A) Percentage identity versus the number of comparisons among 204 accessions in T. urartu (D) 640 Percentage identity versus the number of pairwise comparisons for the accession pairs in T. urartu that 641 had near perfect (≥ 99%) identity. 642 643 Supplementary Figure S3. An unrooted Neighbor-Joining (NJ) tree of wheat and A-genome species: T. 644 urartu, subsp. aegilopoides, and subsp. monococcum. The tree branches are colored based on the genetic 645 grouping of the accessions after correcting misclassified accessions. T. urartu (yellow), domesticated 646 einkorn (red), wild einkorn race α (blue), and wild einkorn race γ (green), and misclassified tetraploids 647 (brown) are shown. 648 649 Supplementary Figure S4. The mitotic metaphase cell of a misclassified wild wheat accession in 650 WGRC collection, TA10881 confirming the accession as tetraploid (2n=4x=28) and thus verified the 651 GBS based genetic grouping. Chromosomes were stained with 4',6-diamidino-2-phenylindole (DAPI). 652 Before genotyping and cytological confirmation, the accession was falsely grouped under T. urartu. 653 654 Supplementary Figure S5. Principle component analysis (PCA) plot for A-genome wheat species with 655 two major PCs. There were three races α, γ and β within the wild einkorn group which clustered 656 separately in population analysis. 657 658 Supplementary Figure S6. Minor allele frequency plots of A-genome diploid species: (a) T. 659 monococcum subsp. aegilopoides, (b) T. monococcum subsp. monococcum, and (c) T. urartu 660

661 Supplementary Figure S7. Smoothed FST curve showing selection signal for einkorn wheat on 662 chromosome 3A. The strongest signal was located at 60-90 Mb. The horizontal green line indicates the 663 selection signature determination genome-wide threshold (0.24), which is 3σ above the mean. The red 664 vertical line at 62 Mb on chromosome 3A indicates the location of candidate selection signature Btr1. 665 666 Supplementary Figure S8. An unrooted NJ phylogenetic tree of A-genome wheat species showing the 667 accessions in the core collections and all other accessions in respective clades. Black branch reflects the 668 accessions in the core collection, and the golden branch indicates all other accessions that are not in the 669 core collections.

23 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

670 Supplementary Figure S9. Principle component analysis (PCA) plot of A-genome wheat accessions 671 showing partitioning of different groups within the species and the accessions selected in the genetic 672 cores (black triangles). 673 674 Supplementary Figure S10. Geographic map of A-genome wheat accessions, where the core 675 accessions were indicated by larger google marks and the rest of the accessions were shown by smaller 676 marks of the respective groups. 677 678 Supplementary Figure S11. Diagram showing three different taxonomic classification systems of 679 einkorn wheat. In WGRC, we follow the taxonomic classification system of Van Slageren (1994). 680 681 Supplementary text S1. Coding sequence of gene for non-brittle rachis 1 (Btr1) in T. monococcum 682 subsp. monococcum 683 684 Acknowledgements 685 We would like to thank the wheat genetic resource center (WGRC) gene bank for collecting and 686 maintaining these A-genome species. 687 688 Authors have no competing interests 689 690 All data are available in the manuscript or the supplementary resources 691 692 693 694 695 696 697 698 699 700 701

24 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

702 Table 1. A-genome species and sub-species groups with number of samples, the Nei’s diversity indices, 703 and number of segregating loci. The percentage of segregating SNPs for core set groups were estimated 704 relative to the segregating loci within the respective groups. 705 Group Number of Diversity Segregating Samples Index SNPs A-genome species (T. monococcum + T. urartu) 925 0.25 13089 T. monococcum (einkorn) 729 0.106 6587 (50.3%) Domesticated einkorn (subsp. monococcum) 145 0.058 3637 (27.8%) Wild einkorn (subsp. aegilopoides) 584 0.086 6213 (47.4%) α race einkorn 524 0.073 5119 (39.1%) γ race einkorn 48 0.093 4440 (33.9%) β race einkorn 12 0.058 2622 (20.1%) T. urartu 196 0.069 4072 (31.11%) A-genome species core set 79 0.271 12907 (98.6%) T. monococcum core 60 0.116 5926 (89.9%) Wild einkorn core 41 0.098 5324 (85.6%) Domesticated einkorn core 19 0.065 3039 (83.6%) T. urartu core 19 0.072 3324 (81.6%) 706 707

25 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

708 Table 2. Pairwise FST coefficients among the A-genome wheat species. Higher FST reflects a stronger 709 population differentiation. The α, β and γ genetic races comprise the wild einkorn (T. monococcum 710 subsp. aegilopoides L.). 711 α race γ race β race T. urartu subsp. monococcum 0.56 0.41 0.31 0.87 α race - 0.40 0.50 0.86 γ race - - 0.37 0.83 β race - - - 0.86 712 713

26 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

714 Table 3. Number (#) of unique accessions, number of accessions in a duplicate set consisting maximum 715 identical accessions, and total accessions of A-genome species: T. urartu, domesticated einkorn (subsp. 716 monococcum), and wild einkorn (subsp. aegilopoides) three genetic races: α, γ, and β. The identical 717 accessions were detected using pairwise allele matching. 718 α race γ race β race subsp. monococcum T. urartu Total accessions 524 48 12 145 196 # Loci compared 4112 4112 4112 3337 6356 Max duplicates set 28 3 0 5 39 Unique accessions 198 (37.8%) 37 (77%) 12 (100%) 97 (66.8%) 61 (31.2%) 719 720 721 722 723

27 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

724 Figure Legends 725 726 Figure 1. Geographic distribution of A-genome wheat species in the WGRC gene bank. Collection sites 727 of accessions in this study are designated for domesticated einkorn (Triticum. monococcum subsp. 728 monococcum) (red); α race within wild einkorn (T. monococcum. subsp. aegilopoides) (blue); γ race 729 wild einkorn (orange); β race wild einkorn (magenta); and T. urartu (yellow). 730 731 Figure 2. Population structure of A-genome wheat species: Triticum monococcum L. and T. urartu. 732 Subpopulations were determined using fastStructure at K=2 to K=7. Each color represents a population, 733 and each bar indicates the admixture proportion of an individual accession from K populations. The 734 subgroup within α, which is exemplified by yellow color sole includes the accessions from Erbil (also 735 spelled Arbil), Iraq, whereas the subgroup embodied by red color only comprises the accessions from 736 Duhok (ancient name ‘Dahuk’, Iraq). The bars with purple color only represent the accessions from 737 southeast Turkey (ST). Other admixture types within α included accessions were from Iran, SU 738 (Sulaymaniyah (Iraq)), random different sites (D) and unknown sites (U) as indicated. Within T. urartu, 739 the LE group represents accessions from Lebanon, the TU includes accession from Turkey, S indicates 740 accessions from Syria and M shows accessions from mixed sites. 741 742 Figure 3. An unrooted Neighbor-Joining (NJ) tree of A-genome species: T. urartu, subsp. aegilopoides, 743 and subsp. monococcum. The tree branches are colored based on the genetic grouping of the accessions 744 after correcting misclassified accessions. T. urartu (yellow), domesticated einkorn (red), wild einkorn 745 race α (blue), and wild einkorn race γ (green) are shown. 746 747 Figure 4. Relationship between the allele coverage as estimated using GenoCore and the number of 748 samples selected in the core for einkorn group (T. monococcum). The threshold for 60 accessions at 749 approximately 90% genotype coverage is shown with vertical red line. 750 751 752

28 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

753 References 754 755 Appels, R., Eversole, K., Feuillet, C., Keller, B., Rogers, J., Stein, N., Pozniak, C. J., Stein, N., Choulet, F., 756 Distelfeld, A., Eversole, K., Poland, J., Rogers, J., Ronen, G., Sharpe, A. G., Pozniak, C., Ronen, G., 757 Stein, N., Barad, O., Baruch, K., Choulet, F., Keeble-Gagnère, G., Mascher, M., Sharpe, A. G., 758 Ben-Zvi, G., Josselin, A. A., Stein, N., Mascher, M., Himmelbach, A., Choulet, F., Keeble-Gagnère, 759 G., Mascher, M., Rogers, J., Balfourier, F., Gutierrez-Gonzalez, J., Hayden, M., Josselin, A. A., 760 Koh, C., Muehlbauer, G., Pasam, R. K., Paux, E., Pozniak, C. J., Rigault, P., Sharpe, A. G., Tibbits, 761 J., Tiwari, V., Choulet, F., Keeble-Gagnère, G., Mascher, M., Josselin, A. A., Rogers, J., Spannagl, 762 M., Choulet, F., Lang, D., Gundlach, H., Haberer, G., Keeble-Gagnère, G., Mayer, K. F. X., 763 Ormanbekova, D., Paux, E., Prade, V., Šimková, H., Wicker, T., Choulet, F., Spannagl, M., 764 Swarbreck, D., Rimbert, H., Felder, M., Guilhot, N., Gundlach, H., Haberer, G., Kaithakottil, G., 765 Keilwagen, J., Lang, D., Leroy, P., Lux, T., Mayer, K. F. X., Twardziok, S., Venturini, L., Appels, R., 766 Rimbert, H., Choulet, F., Juhász, A., Keeble-Gagnère, G., Choulet, F., Spannagl, M., Lang, D., 767 Abrouk, M., Haberer, G., Keeble-Gagnère, G., Mayer, K. F. X., Wicker, T., Choulet, F., Wicker, T., 768 Gundlach, H., Lang, D., Spannagl, M., Lang, D., Spannagl, M., Appels, R., Fischer, I., Uauy, C., 769 Borrill, P., Ramirez-Gonzalez, R. H., Appels, R., Arnaud, D., Chalabi, S., Chalhoub, B., Choulet, F., 770 Cory, A., Datla, R., Davey, M. W., Hayden, M., Jacobs, J., Lang, D., Robinson, S. J., Spannagl, M., 771 Steuernagel, B., Tibbits, J., Tiwari, V., van Ex, F., Wulff, B. B. H., Pozniak, C. J., Robinson, S. J., 772 Sharpe, A. G., Cory, A., Benhamed, M., Paux, E., Bendahmane, A., Concia, L., Latrasse, D., 773 Rogers, J., Jacobs, J., Alaux, M., Appels, R., Bartoš, J., Bellec, A., Berges, H., Doležel, J., Feuillet, 774 C., Frenkel, Z., Gill, B., Korol, A., Letellier, T., Olsen, O. A., Šimková, H., Singh, K., Valárik, M., van 775 der Vossen, E., Vautrin, S., Weining, S., Korol, A., Frenkel, Z., Fahima, T., Glikson, V., Raats, D., 776 Rogers, J., Tiwari, V., Gill, B., Paux, E., Poland, J., Doležel, J., Číhalíková, J., Šimková, H., 777 Toegelová, H., Vrána, J., Sourdille, P., Darrier, B., Appels, R., Spannagl, M., Lang, D., Fischer, I., 778 Ormanbekova, D., Prade, V., Barabaschi, D., Cattivelli, L., Hernandez, P., Galvez, S., Budak, H., 779 Steuernagel, B., Jones, J. D. G., Witek, K., Wulff, B. B. H., Yu, G., Small, I., Melonek, J., Zhou, R., 780 Juhász, A., Belova, T., Appels, R., Olsen, O. A., Kanyuka, K., King, R., Nilsen, K., Walkowiak, S., 781 Pozniak, C. J., Cuthbert, R., Datla, R., Knox, R., Wiebe, K., Xiang, D., Rohde, A., Golds, T., Doležel, 782 J., Čížková, J., Tibbits, J., Budak, H., Akpinar, B. A., Biyiklioglu, S., Muehlbauer, G., Poland, J., 783 Gao, L., Gutierrez-Gonzalez, J., N'Daiye, A., Doležel, J., Šimková, H., Číhalíková, J., Kubaláková, 784 M., Šafář, J., Vrána, J., Berges, H., Bellec, A., Vautrin, S., Alaux, M., Alfama, F., Adam-Blondon, A. 785 F., Flores, R., Guerche, C., Letellier, T., Loaec, M., Quesneville, H., Pozniak, C. J., Sharpe, A. G., 786 Walkowiak, S., Budak, H., Condie, J., Ens, J., Koh, C., Maclachlan, R., Tan, Y., Wicker, T., Choulet, 787 F., Paux, E., Alberti, A., Aury, J. M., Balfourier, F., Barbe, V., Couloux, A., Cruaud, C., Labadie, K., 788 Mangenot, S., Wincker, P., Gill, B., Kaur, G., Luo, M., Sehgal, S., Singh, K., Chhuneja, P., Gupta, O. 789 P., Jindal, S., Kaur, P., Malik, P., Sharma, P., Yadav, B., Singh, N. K., Khurana, J., Chaudhary, C., 790 Khurana, P., Kumar, V., Mahato, A., Mathur, S., Sevanthi, A., Sharma, N., Tomar, R. S., Rogers, J., 791 Jacobs, J., Alaux, M., Bellec, A., Berges, H., Doležel, J., Feuillet, C., Frenkel, Z., Gill, B., Korol, A., 792 van der Vossen, E., Vautrin, S., Gill, B., Kaur, G., Luo, M., Sehgal, S., Bartoš, J., Holušová, K., 793 Plíhal, O., Clark, M. D., Heavens, D., Kettleborough, G., Wright, J., Valárik, M., Abrouk, M., 794 Balcárková, B., Holušová, K., Hu, Y., Luo, M., Salina, E., Ravin, N., Skryabin, K., Beletsky, A., 795 Kadnikov, V., Mardanov, A., Nesterov, M., Rakitin, A., Sergeeva, E., Handa, H., Kanamori, H.,

29 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

796 Katagiri, S., Kobayashi, F., Nasuda, S., Tanaka, T., Wu, J., Appels, R., Hayden, M., Keeble- 797 Gagnère, G., Rigault, P., Tibbits, J., Olsen, O. A., Belova, T., Cattonaro, F., Jiumeng, M., Kugler, K., 798 Mayer, K. F. X., Pfeifer, M., Sandve, S., Xun, X., Zhan, B., Šimková, H., Abrouk, M., Batley, J., 799 Bayer, P. E., Edwards, D., Hayashi, S., Toegelová, H., Tulpová, Z., Visendi, P., Weining, S., Cui, L., 800 Du, X., Feng, K., Nie, X., Tong, W., Wang, L., Borrill, P., Gundlach, H., Galvez, S., Kaithakottil, G., 801 Lang, D., Lux, T., Mascher, M., Ormanbekova, D., Prade, V., Ramirez-Gonzalez, R. H., Spannagl, 802 M., Stein, N., Uauy, C., Venturini, L., Stein, N., Appels, R., Eversole, K., Rogers, J., Borrill, P., 803 Cattivelli, L., Choulet, F., Hernandez, P., Kanyuka, K., Lang, D., Mascher, M., Nilsen, K., Paux, E., 804 Pozniak, C. J., Ramirez-Gonzalez, R. H., Šimková, H., Small, I., Spannagl, M., Swarbreck, D., & 805 Uauy, C. (2018). Shifting the limits in wheat research and breeding using a fully annotated 806 reference genome. Science, 361(6403). https://doi.org/10.1126/science.aar7191 807 Bashalkhanov, S., Pandey, M., & Rajora, O. P. (2009). A simple method for estimating genetic diversity 808 in large populations from finite sample sizes. BMC Genetics, 10(1), 84. 809 https://doi.org/10.1186/1471-2156-10-84 810 Brandolini, A., & Heun, M. (2019). Genetics of brittleness in wild, domesticated and feral einkorn wheat 811 (Triticum monococcum L.) and the place of origin of feral einkorn. Genetic Resources and Crop 812 Evolution, 66(2), 429-439. https://doi.org/10.1007/s10722-018-0721-7 813 Brandolini, A., Volante, A., & Heun, M. (2016). Geographic differentiation of domesticated einkorn 814 wheat and possible Neolithic migration routes. Heredity, 117(3), 135-141. 815 https://doi.org/10.1038/hdy.2016.32 816 Browning, B. L., Zhou, Y., & Browning, S. R. (2018). A One-Penny Imputed Genome from Next- 817 Generation Reference Panels. The American Journal of Human Genetics, 103(3), 338-348. 818 https://doi.org/https://doi.org/10.1016/j.ajhg.2018.07.015 819 Brunazzi, A., Scaglione, D., Talini, R. F., Miculan, M., Magni, F., Poland, J., Enrico Pè, M., Brandolini, A., 820 & Dell'Acqua, M. (2018). Molecular diversity and landscape genomics of the crop wild relative 821 Triticum urartu across the Fertile Crescent. The Plant Journal, 94(4), 670-684. 822 https://doi.org/10.1111/tpj.13888 823 Charmet, G. (2011). Wheat domestication: Lessons for the future. Comptes Rendus Biologies, 334(3), 824 212-220. https://doi.org/https://doi.org/10.1016/j.crvi.2010.12.013 825 Chen, S., Guo, Y., Briggs, J., Dubach, F., Chao, S., Zhang, W., Rouse, M. N., & Dubcovsky, J. (2018). 826 Mapping and characterization of wheat stem rust resistance genes SrTm5 and Sr60 from 827 Triticum monococcum. Theoretical and Applied Genetics, 131(3), 625-635. 828 https://doi.org/10.1007/s00122-017-3024-z 829 Comai, L. (2005). The advantages and disadvantages of being polyploid. Nature Reviews Genetics, 830 6(11), 836-846. https://doi.org/10.1038/nrg1711 831 Danecek, P., Auton, A., Abecasis, G., Albers, C. A., Banks, E., DePristo, M. A., Handsaker, R. E., Lunter, 832 G., Marth, G. T., Sherry, S. T., McVean, G., Durbin, R., & Group, G. P. A. (2011). The variant call 833 format and VCFtools. Bioinformatics, 27(15), 2156-2158. 834 https://doi.org/10.1093/bioinformatics/btr330 835 Díez, M. J., De la Rosa, L., Martín, I., Guasch, L., Cartea, M. E., Mallor, C., Casals, J., Simó, J., Rivera, A., 836 Anastasio, G., Prohens, J., Soler, S., Blanca, J., Valcárcel, J. V., & Casañas, F. (2018). Plant 837 Genebanks: Present Situation and Proposals for Their Improvement. the Case of the Spanish 838 Network [Policy and Practice Reviews]. Frontiers in Plant Science, 9(1794). 839 https://doi.org/10.3389/fpls.2018.01794

30 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

840 Dong, L., Huo, N., Wang, Y., Deal, K., Luo, M.-C., Wang, D., Anderson, O. D., & Gu, Y. Q. (2012). 841 Exploring the diploid wheat ancestral A genome through sequence comparison at the high- 842 molecular-weight glutenin locus region. Molecular Genetics and Genomics, 287(11), 855-866. 843 https://doi.org/10.1007/s00438-012-0721-9 844 Dorofeev, V., Filatenko, A., Migušova, E., Udačin, R., & Jakubciner, M. (1979). Pšenica (Wheat). Kolos”, 845 Leningrad. 846 Dvořák, J., Terlizzi, P. d., Zhang, H.-B., & Resta, P. (1993). The evolution of polyploid wheats: 847 identification of the A genome donor species. Genome, 36(1), 21-31. 848 https://doi.org/10.1139/g93-004 %M 18469969 849 Empilli, S., Castagna, R., & Brandolini, A. (2000). Morpho-agronomic variability of the diploid wheat 850 Triticum monococcum L. Plant Genetic Resources Newsletter(No. 124), 36-40. 851 Endelman, J. B. (2011). Ridge Regression and Other Kernels for Genomic Selection with R Package 852 rrBLUP. The Plant Genome, 4(3), 250-255. https://doi.org/10.3835/plantgenome2011.08.0024 853 Feldman, M., & Levy, A. A. (2012). Genome evolution due to allopolyploidization in wheat. Genetics, 854 192(3), 763-774. https://doi.org/10.1534/genetics.112.146316 855 Francis, R. M. (2017). pophelper: an R package and web app to analyse and visualize population 856 structure. Molecular Ecology Resources, 17(1), 27-32. https://doi.org/10.1111/1755- 857 0998.12509 858 Fricano, A., Brandolini, A., Rossini, L., Sourdille, P., Wunder, J., Effgen, S., Hidalgo, A., Erba, D., Piffanelli, 859 P., & Salamini, F. (2014). Crossability of Triticum urartu and Triticum monococcum wheats, 860 homoeologous recombination, and description of a panel of interspecific introgression lines. G3 861 (Bethesda, Md.), 4(10), 1931-1941. https://doi.org/10.1534/g3.114.013623 862 Gao, L., Koo, D.-H., Juliana, P., Rife, T., Singh, D., Lemes da Silva, C., Lux, T., Dorn, K. M., Clinesmith, M., 863 Silva, P., Wang, X., Spannagl, M., Monat, C., Friebe, B., Steuernagel, B., Muehlbauer, G. J., 864 Walkowiak, S., Pozniak, C., Singh, R., Stein, N., Mascher, M., Fritz, A., & Poland, J. (2021). The 865 Aegilops ventricosa 2NvS segment in bread wheat: cytology, genomics and breeding. 866 Theoretical and Applied Genetics, 134(2), 529-542. https://doi.org/10.1007/s00122-020-03712- 867 y 868 Glaubitz, J. C., Casstevens, T. M., Lu, F., Harriman, J., Elshire, R. J., Sun, Q., & Buckler, E. S. (2014). 869 TASSEL-GBS: A High Capacity Genotyping by Sequencing Analysis Pipeline. PloS one, 9(2), 870 e90346. https://doi.org/10.1371/journal.pone.0090346 871 Goncharov, N. P. (2011). Triticum L. taxonomy: the present and the future. Plant Systematics 872 and Evolution, 295(1), 1-11. https://doi.org/10.1007/s00606-011-0480-9 873 Harlan, J. R., & Zohary, D. (1966). Distribution of Wild Wheats and Barley. Science, 153(3740), 1074- 874 1080. https://doi.org/10.1126/science.153.3740.1074 875 Heun, M., Schäfer-Pregl, R., Klawan, D., Castagna, R., Accerbi, M., Borghi, B., & Salamini, F. (1997). Site 876 of Einkorn Wheat Domestication Identified by DNA Fingerprinting. Science, 278(5341), 1312- 877 1314. https://doi.org/10.1126/science.278.5341.1312 878 Jeong, S., Kim, J.-Y., Jeong, S.-C., Kang, S.-T., Moon, J.-K., & Kim, N. (2017). GenoCore: A simple and fast 879 algorithm for core subset selection from large genotype datasets. PloS one, 12(7), e0181420. 880 https://doi.org/10.1371/journal.pone.0181420 881 Key, M. J. (1954). The taxonomy of hexaploid wheat. Svensk Bot Tidskr, 48, 579–590. 882 Kilian, B., Martin, W., & Salamini, F. (2010). Genetic Diversity, Evolution and Domestication of Wheat 883 and Barley in the Fertile Crescent. In M. Glaubrecht (Ed.), Evolution in Action: Case studies in

31 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

884 Adaptive Radiation, Speciation and the Origin of Biodiversity (pp. 137-166). Springer Berlin 885 Heidelberg. https://doi.org/10.1007/978-3-642-12425-9_8 886 Kilian, B., Özkan, H., Walther, A., Kohl, J., Dagan, T., Salamini, F., & Martin, W. (2007). Molecular 887 Diversity at 18 Loci in 321 Wild and 92 Domesticate Lines Reveal No Reduction of Nucleotide 888 Diversity during Triticum monococcum (Einkorn) Domestication: Implications for the Origin of 889 Agriculture. Molecular Biology and Evolution, 24(12), 2657-2668. 890 https://doi.org/10.1093/molbev/msm192 891 Kolmer, J. A., Anderson, J. A., & Flor, J. M. (2010). Chromosome Location, Linkage with Simple 892 Sequence Repeat Markers, and Leaf Rust Resistance Conditioned by Gene Lr63 in Wheat. Crop 893 Science, 50(6), 2392-2395. https://doi.org/10.2135/cropsci2010.01.0005 894 Koo, D.-H., Liu, W., Friebe, B., & Gill, B. S. (2017). Homoeologous recombination in the presence of Ph1 895 gene in wheat. Chromosoma, 126(4), 531-540. 896 Li, O., Zhao, Y.-y., Guo, N., Lu, C.-y., & Sun, X.-w. (2009). Effects of sample size and loci number on 897 genetic diversity in wild population of grass carp revealed by SSR. 898 Ling, H.-Q., Ma, B., Shi, X., Liu, H., Dong, L., Sun, H., Cao, Y., Gao, Q., Zheng, S., Li, Y., Yu, Y., Du, H., Qi, 899 M., Li, Y., Lu, H., Yu, H., Cui, Y., Wang, N., Chen, C., Wu, H., Zhao, Y., Zhang, J., Li, Y., Zhou, W., 900 Zhang, B., Hu, W., van Eijk, M. J. T., Tang, J., Witsenboer, H. M. A., Zhao, S., Li, Z., Zhang, A., 901 Wang, D., & Liang, C. (2018). Genome sequence of the progenitor of wheat A subgenome 902 Triticum urartu. Nature, 557(7705), 424-428. https://doi.org/10.1038/s41586-018-0108-0 903 Mac Key, J. (2005a). Wheat: its concept, evolution and taxonomy. Durum wheat breeding. Current 904 approaches and future strategies, 1(part 1), 3-61. 905 Mac Key, J. (2005b). Wheat: its concept, evolution, and taxonomy. In Durum Wheat Breeding (pp. 35- 906 94). CRC Press. 907 Middleton, C. P., Senerchia, N., Stein, N., Akhunov, E. D., Keller, B., Wicker, T., & Kilian, B. (2014). 908 Sequencing of chloroplast genomes from wheat, barley, rye and their relatives provides a 909 detailed insight into the evolution of the Triticeae tribe. PloS one, 9(3), e85761-e85761. 910 https://doi.org/10.1371/journal.pone.0085761 911 Mondal, S., Rutkoski, J. E., Velu, G., Singh, P. K., Crespo-Herrera, L. A., Guzmán, C., Bhavani, S., Lan, C., 912 He, X., & Singh, R. P. (2016). Harnessing Diversity in Wheat to Enhance Grain Yield, Climate 913 Resilience, Disease and Insect Pest Resistance and Nutrition Through Conventional and Modern 914 Breeding Approaches [Review]. Frontiers in Plant Science, 7(991). 915 https://doi.org/10.3389/fpls.2016.00991 916 Nair, K. P. (2019). Utilizing crop wild relatives to combat global warming. In Advances in Agronomy (Vol. 917 153, pp. 175-258). Elsevier. 918 Nei, M. (1973). Analysis of Gene Diversity in Subdivided Populations. Proceedings of the National 919 Academy of Sciences, 70(12), 3321-3323. https://doi.org/10.1073/pnas.70.12.3321 920 Nei, M. (1987). Molecular Evolutionary Genetics Columbia University Press New York 512. 921 Nicholson, G., Smith, A. V., Jónsson, F., Gústafsson, Ó., Stefánsson, K., & Donnelly, P. (2002). Assessing 922 population differentiation and isolation from single-nucleotide polymorphism data. Journal of 923 the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 695-715. 924 Özkan, H., Tuna, M., Kilian, B., Mori, N., & Ohta, S. (2010). Genome size variation in diploid and 925 tetraploid wild wheats. AoB PLANTS, 2010. https://doi.org/10.1093/aobpla/plq015 926 Pintus, E., Sorbolini, S., Albera, A., Gaspa, G., Dimauro, C., Steri, R., Marras, G., & Macciotta, N. P. 927 (2014). Use of locally weighted scatterplot smoothing (LOWESS) regression to study selection

32 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

928 signatures in Piedmontese and Italian Brown cattle breeds. Anim Genet, 45(1), 1-11. 929 https://doi.org/10.1111/age.12076 930 Poland, J., Endelman, J., Dawson, J., Rutkoski, J., Wu, S., Manes, Y., Dreisigacker, S., Crossa, J., Sánchez- 931 Villeda, H., & Sorrells, M. (2012b). Genomic selection in wheat breeding using genotyping-by- 932 sequencing. The Plant Genome, 5(3). 933 Poland, J. A., Brown, P. J., Sorrells, M. E., & Jannink, J.-L. (2012a). Development of high-density genetic 934 maps for barley and wheat using a novel two-enzyme genotyping-by-sequencing approach. PloS 935 one, 7(2), e32253-e32253. https://doi.org/10.1371/journal.pone.0032253 936 Porto-Neto, L. R., Lee, S. H., Lee, H. K., & Gondro, C. (2013). Detection of Signatures of Selection Using 937 FST. In C. Gondro, J. van der Werf, & B. Hayes (Eds.), Genome-Wide Association Studies and 938 Genomic Prediction (pp. 423-436). Humana Press. https://doi.org/10.1007/978-1-62703-447- 939 0_19 940 Pourkheirandish, M., Dai, F., Sakuma, S., Kanamori, H., Distelfeld, A., Willcox, G., Kawahara, T., 941 Matsumoto, T., Kilian, B., & Komatsuda, T. (2018). On the Origin of the Non-brittle Rachis Trait 942 of Domesticated Einkorn Wheat [Original Research]. Frontiers in Plant Science, 8(2031). 943 https://doi.org/10.3389/fpls.2017.02031 944 Raj, A., Stephens, M., & Pritchard, J. K. (2014). fastSTRUCTURE: Variational Inference of Population 945 Structure in Large SNP Data Sets. Genetics, 197(2), 573-589. 946 https://doi.org/10.1534/genetics.114.164350 947 Rouse, M. N., & Jin, Y. (2011). Stem Rust Resistance in A-Genome Diploid Relatives of Wheat. Plant 948 Disease, 95(8), 941-944. https://doi.org/10.1094/pdis-04-10-0260 949 Saintenac, C., Zhang, W., Salcedo, A., Rouse, M. N., Trick, H. N., Akhunov, E., & Dubcovsky, J. (2013). 950 Identification of Wheat Gene Sr35 That Confers Resistance to Ug99 Stem Rust Race 951 Group. Science, 341(6147), 783-786. https://doi.org/10.1126/science.1239022 952 Schiemann, E. (1948). Weizen. Roggen, Gerste: Systematik Geschichte und Verwendung, Verlag Gustav 953 Fischer, Jena, Germany. 954 Sheedy, J. G., Thompson, J. P., & Kelly, A. (2012). Diploid and tetraploid progenitors of wheat are 955 valuable sources of resistance to the root lesion nematode Pratylenchusthornei. Euphytica, 956 186(2), 377-391. https://doi.org/10.1007/s10681-011-0617-5 957 Singh, N., Wu, S., Tiwari, V., Sehgal, S., Raupp, J., Wilson, D., Abbasov, M., Gill, B., & Poland, J. (2019). 958 Genomic Analysis Confirms Population Structure and Identifies Inter-Lineage Hybrids in 959 Aegilops tauschii [Original Research]. Frontiers in Plant Science, 10(9). 960 https://doi.org/10.3389/fpls.2019.00009 961 van Slageren, M. W. (1994). Wild wheats: a monograph of Aegilops L. and Amblyopyrum (Jaub. & 962 Spach) Eig (Poaceae). Wageningen Agricultural University Papers(94-7). 963 VAN ZEIST, W. (1992). The origin and development of plant cultivation in the Near East. Nichibunken 964 Japan Review, 149-165. 965 Wang, X., Luo, G., Yang, W., Li, Y., Sun, J., Zhan, K., Liu, D., & Zhang, A. (2017). Genetic diversity, 966 population structure and marker-trait associations for agronomic and grain traits in wild diploid 967 wheat Triticum urartu. BMC Plant Biology, 17(1), 112. https://doi.org/10.1186/s12870-017- 968 1058-7 969 Weir, B. S., & Cockerham, C. C. (1984). Estimating F-statistics for the analysis of population structure. 970 evolution, 1358-1370.

33 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

971 Wulff, B. B. H., & Moscou, M. J. (2014). Strategies for transferring resistance into wheat: from wide 972 crosses to GM cassettes [Review]. Frontiers in Plant Science, 5(692). 973 https://doi.org/10.3389/fpls.2014.00692 974 Zaharieva, M., & Monneveux, P. (2014). Cultivated einkorn wheat (Triticum monococcum L. subsp. 975 monococcum): the long life of a founder crop of agriculture. Genetic Resources and Crop 976 Evolution, 61(3), 677-706. https://doi.org/10.1007/s10722-014-0084-7 977 978

34 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

979

Wild einkorn race α Wild einkorn race γ

Wild einkorn race β Domesticated einkorn

T. urartu 980 981 982 983 Figure 1. Geographic distribution of A-genome wheat species in the WGRC gene bank. Collection sites 984 of accessions in this study are designated for domesticated einkorn (Triticum. monococcum subsp. 985 monococcum) (red); α race within wild einkorn (T. monococcum. subsp. aegilopoides) (blue); γ race 986 wild einkorn (orange); β race wild einkorn (magenta); and T. urartu (yellow). 987

35 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

T. monococcum T. urartu

α γ subsp. monococcum β

K =2

K =3

K =4

K =5

K =6

K =7

ST 988 Iran Erbil SU Duhok D U M LE S TU 989 990 Figure 2. Population structure of A-genome wheat species: Triticum monococcum L. and T. urartu. 991 Subpopulations were determined using fastStructure at K=2 to K=7. Each color represents a population, 992 and each bar indicates the admixture proportion of an individual accession from K populations. The 993 subgroup within α, which is exemplified by yellow color sole includes the accessions from Erbil (also 994 spelled Arbil), Iraq, whereas the subgroup embodied by red color only comprises the accessions from 995 Duhok (ancient name ‘Dahuk’, Iraq). The bars with purple color only represent the accessions from 996 southeast Turkey (ST). Other admixture types within α included accessions were from Iran, SU 997 (Sulaymaniyah (Iraq)), random different sites (D) and unknown sites (U) as indicated. Within T. urartu, 998 the LE group represents accessions from Lebanon, the TU includes accession from Turkey, S indicates 999 accessions from Syria and M shows accessions from mixed sites. 1000

36 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1001 subsp. mono

TA10639TA10640 TA10645 TA10580 TA10562TA10643TA10642TA10868TA10609 TA10582TA10583TA10584TA10581TA10641 subsp. monococcum TA10646TA10626TA10560 TA10637 subsp. mono TA10604TA10630 TA10654TA10653TA10611TA10598 TA10649 TA2712 subsp. aegilpoides−alpha TA2726 TA10650TA10652TA10651 TA10575TA10576TA10568 TA2717TA2706 TA10634TA10897TA10896TA10621 TA2720 subsp. mono TA11107 TA10893TA10660TA10567 TA10574 TA10580 TA10645 TA10643TA10642TA10868TA10609TA10639TA10640 TA10656 TA2039TA2035 TA139 TA10583TA10584TA10581TA10641TA10562 TA2721 TA10626TA10560TA10582 ⍺ TA10659TA10607TA4420 TA10630TA10637TA10646 TA10580 TA10868TA10609TA10639TA10640TA10645 TA10418 race wild einkorn TA10642 TA10641TA10562TA10643 TA10657 TA10604 TA10584TA10581 TA10598 TA10560TA10582TA10583 TA10591 TA10646TA10626 TA10590 TA10653TA10611 TA10630TA10637 TA10654 TA10649 TA2712 subsp. aegilpoides−alpha TA10598TA10604 TA2726 TA10895TA2027 TA10651 TA10654TA10653TA10611 TA10613 TA10650TA10652 TA10649 TA2712 subsp. aegilpoides−alpha TA2726 TA10650TA10652TA10651 TA10658 TA2706 T. urartu TA2706 TA2717 TA2717 TA2029 TA138 TA10621TA10575TA10576TA10568 TA10615 TA10575TA10576TA10568 TA10634TA10897TA10896 TA2720 TA10589 TA10634TA10897TA10896TA10621 TA2720 TA11107 TA10894TA10627TA10631TA2026TA141 TA10574TA10893TA10660TA10567 TA10588TA2025 TA11107 TA10607TA10656TA2721TA2039TA2035 TA139 TA10606TA2036 TA10590TA10591TA10657TA10418TA10659TA4420 TA10574TA10893TA10660TA10567 T. urartu TA10658TA10613TA10895TA2027 TA10656 TA2039TA2035 TA139 TA10631TA10589TA10615TA2029 TA138 TA10607TA2721 TA10588TA10894TA10627TA2025TA2026TA141 TA10548TA2032TA10657TA10418TA10659TA4420 " race wild einkorn TA10606TA2036 TA10895TA10590TA10591TA2027 T. urartu TA10548TA2032 TA137TA10658TA10613 TA2029 TA138 TA10566 TA10623 TA10615 TA142 TA10587 TA10612 TA10624 TA2714TA2037 TA10563TA10559TA2707TA10608TA10555 TA2703 TA10625TA2713 subsp. aegilo−gamma TA137 TA10564TA2711TA10622TA10565 TA1988 TA10631TA10589 TA10557TA10578TA2038

TA10566 TA10623 TA142 TA10587 TA10627 TA2033 TA10612 TA10624 TA2026 TA2034 TA2714 TA2037 TA10563TA10559 TA2724 TA2707 TA141 TA2703 TA10608TA10555 TA10625TA2713 subsp. mono TA10564 subsp. aegilo−gamma TA2711TA10622TA10565 TA10557 TA10894 TA10578 TA2710 TA1988 TA2038TA2033TA2034 TA10588 TA2705TA2725TA11039 TA2724TA2710 TA2025 TA2716TA11016TA2709 TA2705TA2725TA11039TA2716TA11016TA2709 TA10606 TA10558TA2708TA2031 TA580 TA10558TA2708TA2031TA10632TA136 TA2036 TA10632TA136 TA10617TA10610TA10636TA10635TA10620TA10605TA10618TA10619 TA580 TA10617TA10610 TA2718TA2701TA2722TA2702TA2704 TA10636TA10635TA10620TA10605TA10618TA10619 TA10571 TA2024TA2028TA2030 TA2032 TA2718TA2701TA2722TA2702TA2704 TA10909 TA10579TA10569TA2715TA10561 TA10548 TA2024TA2028TA2030 subsp. aegilo.beta TA10644TA10556TA10628TA10629 TA2012TA569TA179 TA10577TA2719 TA10571 TA137 TA10573 TA10640 TA10645 TA10580 TA210 TA10639 TA10916 TA2723TA10655 TA10909 TA10641TA10562TA10643TA10642TA10868TA10609 TA10581 TA10566 TA10584 TA10623 TA142 TA10587 TA10585TA1324TA204 TA10582TA10583 TA10612 TA10624 TA2714TA2037 TA10579 TA10563TA10559 TA10569 TA2707 TA2715 TA10546 TA10626TA10560 TA10608TA10555 TA10561 TA10570 TA2703 TA10625TA2713 subsp. aegilo−gamma TA10547 TA10646 TA10564TA2711TA10622TA10565 β race wild einkorn TA10600 TA286 TA1988 TA10630TA10637 TA10557TA10578TA2038 TA10644TA10556 TA571 TA1232 subsp. aegilo.beta TA10599 TA1236TA1235TA661 TA10598TA10604 TA2033TA2034TA2724TA2710 TA10628TA10629 TA10601TA10602TA582 TA694TA1242TA1244 TA179 TA10654TA10653TA10611 TA2705TA2725TA11039TA2716TA11016TA2709 TA10603 TA177 TA10649 TA2712 TA10558TA2708 TA10577 subsp. aegilpoides−alpha TA2726 TA2031 TA10910 TA2012TA210TA569 TA10650TA10652TA10651 TA10632TA136 TA2719TA10573 TA580 TA2717TA2706 TA2007TA2008 TA10585TA10916TA1324TA204 TA10621TA10575TA10576TA10568 TA10617TA10610TA10636TA10635TA10620TA10605TA10618TA10619 TA2723TA10655 TA11013TA2005 TA10547TA10546 TA10634TA10897TA10896 TA2720 TA2718TA2701TA2722TA2702TA2704 TA10570 TA11012TA11014TA2006TA518 TA10599TA10600TA571 TA11107 TA2024TA2028TA2030 TA286TA1232 TA11015TA528 TA10602TA582TA10571 TA10660TA10567 TA1236TA1235TA661TA694 TA10603TA10601TA10909 TA10574TA10893TA2035 TA1242TA1244TA177 TA10595TA10594TA10596TA10597TA199TA578 TA10607TA10656TA2721TA2039 TA139 TA10579TA10569TA2715TA10561 TA10910 subsp. aegilo.beta TA570TA574 TA10591TA10657TA10418TA10659TA4420 TA10644TA10556TA10628TA10629 TA10908TA10907TA10586 TA2008TA2012TA210TA569TA179 TA10658TA10613TA10895TA10590TA2027 TA10577TA2719TA10573 T.T. urartu urartu TA1264TA264TA584 TA2007TA10585TA10916TA1324TA204 TA10615TA2029 TA138 TA2723TA10655 TA1267TA2009 TA11014TA11013TA2005TA10547TA10546 TA10627TA10631TA10589TA2026TA141 TA10570 TA2010TA2011TA572 TA11012TA2006TA518TA10599TA10600TA571 TA10606TA10588TA10894TA2036TA2025 TA286TA1232TA1236TA1235 TA11015TA10603TA528TA10601TA10602TA582 TA661TA694TA1242TA1244TA177 TA10548TA2032 TA10910 TA10595TA10594TA10596TA10597TA199TA578TA2007TA2008 TA137

TA10566 TA10623 TA574 TA142 TA10587 TA10612 TA10624 TA2714TA2037 TA10563TA10559TA2707TA10608TA10555 TA2703 TA10625TA2713 TA2005 TA10564 TA570 TA2711 subsp. aegilo−gamma TA10622TA10565 TA10557TA10578 TA11014TA11013 TA1988 TA2038TA2033TA2034TA2724 TA10586TA11012TA2006 TA2710TA2705TA2725TA11039 TA518 TA2716TA11016TA2709TA10558 TA10908TA10907 TA528 TA580 TA2708TA2031TA10632TA136 TA11015 TA10617TA10610TA10636TA10635TA10620TA10605TA10618TA10619 TA1264TA264TA584 TA2718TA2701TA2722TA2702TA2704 TA1267TA2009TA10595TA10594TA10596TA10597TA199TA578 TA10571 TA2024TA2028TA2030 TA2010TA2011TA572TA574 TA10909 TA10579TA10569TA2715TA10561 subsp. aegilo.beta TA570 TA10644TA10556TA10628TA10629 TA827 TA10586 TA2012TA569TA179 TA10577TA2719 TA1313TA1292TA1318TA1310 TA10908TA10907TA1264 TA10585TA10916TA1324TA204TA210 TA10573TA2723TA10655 TA1297TA1287 TA264TA584 TA10600TA10547TA10546 TA10570 TA1278TA1299TA1285TA1294TA1312TA1284TA1304TA1298TA767 TA2009 TA10602TA10599TA582TA571 TA286TA1232TA1236TA1235TA661 TA1281TA1305TA1288TA1282TA1291TA1307TA1295TA1321TA774TA796TA797TA777 TA2010TA2011TA1267 TA10603TA10601 TA694TA1242TA1244TA177 TA1290TA1283TA1316TA1323TA1280TA769 TA10906TA10867TA198TA556TA547TA554TA549 TA572 TA10910 TA1319TA1309TA1303TA1311TA779TA765TA775 TA10914TA10913TA2022TA404 TA766TA783 TA10911TA10888TA519TA10912 TA2007TA2008 TA520TA583TA529TA11038 TA11013TA2005 TA544TA658TA657TA1227 TA11012TA11014TA2006TA518 TA659TA660TA1228TA550TA651 TA11015TA528 TA804TA803 TA644TA646TA641TA696TA626 TA1289TA1286TA1308TA1275 TA826TA643TA662TA635TA630 TA10596TA10597 TA1302TA1293TA1317TA1279TA763TA773TA778TA799TA776 TA663TA345TA2023TA1212 TA10595TA10594TA199TA578 TA1315TA1322TA802TA801TA782TA780 TA2004TA752TA640TA10415 TA570TA574 TA768 TA593TA637TA632TA1230 TA10586 TA10877 TA634TA639TA633TA667 TA10907 TA1272TA789TA791TA794 TA654TA1225TA699 TA10908TA1264 TA1269TA1270TA1274TA792TA790 TA647TA645TA655TA642TA649TA698TA664TA648TA653 TA264TA584 TA1271TA1273TA793TA786TA785TA784 TA650 TA1267TA2009 TA1306TA2015TA1314TA771 TA827 TA748TA530TA749TA1260 TA2010TA2011TA572 TA10874TA10875TA1300TA788TA787 TA1313TA1292TA1318TA1310 TA603TA612TA757TA627TA717 TA1296TA1301TA1276TA800TA781 TA611TA607TA579TA533 TA1320TA795TA798 TA720TA722TA1254TA726TA600TA678TA721 TA1284TA1304TA1298TA1297TA1287 TA602TA599TA591TA589TA585 TA10876TA10873TA10891 TA1321TA1278TA1299TA1285TA1294TA1312TA797TA777TA767TA827 TA594TA606TA608TA609TA598 TA10886TA10885 TA1281TA1305TA1288TA1282TA1291TA1307TA1295TA774TA796TA1313TA1292TA1318TA1310 TA531TA727TA532TA592TA590 TA1290TA1283TA1316TA1323TA1280TA769 TA686TA685TA588TA1246TA1245TA683 TA10906TA10867TA198TA556TA547TA554TA549 TA1319TA1309TA1303TA1311TA779TA765TA775TA1297TA1287 TA269TA614TA596 TA10914TA10913TA2022 TA766TA783TA1312TA1284TA1304TA1298 TA746TA747TA754TA618 TA404TA10911TA10888TA519 TA1295TA1321TA1278TA1299TA1285TA1294TA797TA777TA767 TA534TA615TA756TA755TA753 TA10912TA520TA583TA529 TA1281TA1305TA1288TA1282TA1291TA1307TA774TA796 TA625TA536TA621TA623TA624 TA11038 TA2013TA665TA842 TA1290TA1283TA1316TA1323TA1280TA769 TA613TA620TA619TA616TA617 TA10906TA10867TA198TA556TA547TA554TA549 TA544TA658TA657 TA2014TA841 TA1319TA1309TA1303TA1311TA779TA765TA775 TA10914TA10913TA2022TA1227TA659TA660 TA725TA811TA834TA745 TA766TA783 TA404TA10911TA10888TA519TA1228TA550TA651TA644 TA622TA670TA823TA750 TA563TA195 TA10912TA520TA583TA529TA646TA641TA696TA626 TA604TA815TA810TA809 TA1308TA1275TA804TA803 TA561TA192TA181TA581TA10572 TA11038TA826TA643TA662 TA840TA848TA824TA813 TA1289TA1286TA776 TA827 TA560TA559TA190TA191TA189 TA544TA658TA657TA635TA630TA663 TA821TA713TA709TA806TA751TA819 TA1322TA1302TA1293TA1317TA1279TA780TA763TA773TA778TA799TA1313TA1292TA1318TA1310 TA10901TA197TA566TA562TA565 TA1227TA659TA660TA1228TA345TA2023TA1212TA2004 TA807 TA1315TA801TA782 TA567TA1349 TA550TA651TA644TA752TA640TA10415 TA715TA728 TA768TA802 TA186TA185TA558TA187TA188TA1348 TA646TA641TA696TA593TA637TA632 TA213TA808TA820TA711TA758TA742TA718TA710TA805TA705TA822TA704 TA1275TA804TA803TA1304TA1298TA1297TA1287 TA183TA182TA436TA441TA557TA495 TA626TA826TA643TA662TA1230TA634TA639 TA849TA724 TA10877TA1289TA1286TA1308TA1278TA1299TA1285TA1294TA1312TA1284TA777TA767 TA196TA176TA184 TA635TA630TA633TA667 TA847TA812TA814TA735 TA1293TA1317TA1279TA794TA763TA773TA778TA799TA776TA1282TA1291TA1307TA1295TA1321TA796TA797 TA10903TA10902TA564 TA663TA345TA2023TA1212 TA714TA723TA740TA730TA732 TA1274TA1272TA1315TA1322TA1302TA790TA789TA791TA782TA780TA1323TA1280TA1281TA1305TA1288TA769TA774 TA1262TA762 TA10906TA10867 TA2004TA752TA640TA654TA1225TA699TA647TA645TA655TA642TA649TA698 TA737TA836TA837 TA1269TA1270TA792TA768TA802TA801TA1303TA1311TA1290TA1283TA1316TA775 TA10917TA759TA676TA673TA540 TA198TA556TA547TA554TA549TA10914 TA10415TA593TA664TA648TA653 TA712TA738TA736TA760TA816 TA786TA785TA784TA1319TA1309TA783TA779TA765 TA668TA521TA539 TA10913TA2022TA404 TA637TA632TA1230TA634TA650 TA706TA832TA731TA733TA729 TA10877TA1271TA1273TA793TA766 TA692TA680TA679TA542TA677 TA10911TA10888TA519TA10912 TA639TA633TA667 TA2015TA1314TA794 TA10905TA10904TA2001TA2002TA2003 TA520TA583TA529TA11038 TA748TA530TA749 TA1300TA1306TA1274TA1272TA771TA790TA789TA791 TA357TA480 TA654TA1225TA699TA647TA645TA655TA642TA649TA698TA1260TA603TA612 TA825 TA10875TA1269TA1270TA788TA787TA792 TA352TA351TA481TA365TA358 TA544TA658TA657TA1227 TA664TA648TA653TA757TA627TA717 TA10889TA10892TA10890TA10882 TA10874TA786TA785TA784 TA1366TA478TA353TA354 TA659TA660TA1228TA550 TA650TA611TA607TA579 TA10883TA10887TA10879TA850TA828TA853TA830TA829 TA1296TA1301TA1276TA1271TA1273TA800TA781TA793 TA356TA328TA479TA359TA355TA361 TA651TA644TA646 TA533 TA568TA684TA839 TA1320TA2015TA1314TA795TA798TA804TA803 TA362TA367TA10899TA10900 TA641TA696TA626TA826 TA748TA530TA749TA720TA722TA1254TA726TA600TA678 TA10592TA741TA857TA858TA739 TA1306TA771TA1308TA1275 TA484TA363TA482TA364 TA643TA662TA635 TA1260TA603TA612TA721 TA10884TA1268TA831TA856 TA1300TA788TA787TA1289TA1286TA776 TA477TA483TA485 TA630TA663 TA757TA627TA602TA599 TA10878 TA10874TA10875TA1322TA1302TA1293TA1317TA1279TA780TA763TA773TA778TA799 TA345TA2023TA1212TA2004TA717TA611TA607TA591TA589TA585 TA10880 TA10891TA1276TA781TA1315TA801TA782 TA323TA515TA306TA309TA307 TA752TA640TA10415TA579TA533TA594TA606TA608 TA10876TA10873TA1320TA1296TA1301TA795TA798TA800TA768TA802 TA463TA462TA341TA348TA350 TA593TA637TA632 TA720TA722TA1254TA726TA609TA598 TA10886TA10885 TA322TA1343TA1220TA476TA332TA338 TA1230TA634 TA600TA678TA531TA727 TA10877 TA336TA344TA1340 TA639TA633TA667TA721TA602TA532TA592TA590 TA794 TA475TA343TA331TA346TA340TA1221 TA599TA591TA589 TA1272TA790TA789TA791 TA330TA342TA347TA329 TA654TA1225TA699TA647TA645TA655TA585TA594TA686TA685TA588TA1246TA1245TA683 TA10873TA10891TA1269TA1270TA1274TA792 TA1339TA474TA333 TA642TA649TA698TA664TA648TA653TA606TA608TA609 TA10886TA10876TA10885TA786TA785TA784 TA466TA464TA467TA465TA317TA469TA315TA468 TA650TA598TA531TA727TA269TA614TA596TA746 TA1271TA1273TA793 TA412TA308TA303TA1334TA449TA310TA460 TA532TA592TA590TA747TA754 TA2015TA1314 TA453TA452TA451TA438 TA748TA530TA618 TA334TA349TA311TA313 TA749TA1260TA686TA685TA534TA615 TA1300TA1306TA771 TA1214TA304TA494 TA603TA612TA588TA1246TA1245TA683TA756TA755 TA788TA787 TA456TA457TA312TA454 TA757TA627TA753TA625 TA10874TA10875 TA10593TA1338TA1219TA325TA473TA324 TA1213TA316TA321TA318TA470TA320 TA717TA611TA269TA614TA536TA621 TA326TA471TA327 TA434TA458TA314 TA607TA579TA533TA596TA746TA623TA624 TA1296TA1301TA1276TA665TA800TA781 TA1335TA1216TA1363TA1362TA1217TA1361 TA448TA302TA433TA442 TA747TA754TA613TA620TA619 TA2013TA1320TA842TA795TA798 TA261TA228 TA1211TA291TA300TA1210TA1208TA301TA447TA290TA1360TA299TA446 TA720TA722TA1254TA726TA600TA618TA534TA616TA617 TA2021TA239 TA294TA437 TA678TA615 TA1215TA297TA435TA440TA298TA293TA1209TA292TA444 TA721TA756TA755 TA2014 TA384 TA445TA439TA443 TA602TA599TA753TA625 TA841 TA1266TA194TA373TA374 TA591TA589TA536TA621 TA725TA811TA834 TA240TA695TA238 TA585TA594TA623TA624 TA745 TA387 TA522TA523 TA613 TA10891TA622 TA1326TA217 TA526TA524 TA606 TA10873TA665TA670 TA212TA237TA496TA370TA236TA369 TA527TA703TA525 TA608TA620TA619 TA10886TA10876TA2013TA842TA823TA750 TA376TA381TA372 TA609TA598TA531TA727TA616TA617TA563TA195 TA10885 TA224TA219 TA250TA206 TA487TA220TA218 TA1329TA205TA414 TA561 TA232 TA511 TA532TA192 TA604 TA229TA233 TA245TA397 TA396TA1327TA879TA401TA246TA400TA295 TA592 TA1354 TA590 TA398 TA181 TA2014 TA223TA383TA386TA382 TA251 TA581 TA841TA815TA810 TA234TA502TA221 TA249TA248TA395TA1328TA247 TA10572 TA809 TA215TA222 TA399TA366TA175TA486TA517 TA840 TA385 TA201TA415TA207 TA848 TA377 TA403 TA725 TA378 TA407TA254TA252TA498TA268TA497 TA241TA489 TA686 TA811 TA390TA490TA208 TA685TA559 TA834 TA427TA283TA1203TA425TA421TA504 TA824 TA281TA274TA1332TA432TA507TA426TA275TA416TA505TA1331TA420TA506TA1206TA510TA1202 TA588TA1246TA1245TA683TA560TA190TA191 TA745TA622TA670TA813TA821 TA193TA337 TA189TA10901 TA713 TA266TA259TA405TA499TA492TA413TA258TA406TA260TA410 TA197 TA709 TA200TA262 TA823 TA256 TA203 TA269 TA750TA806TA751TA819TA807 TA265TA408TA263TA257TA411TA267TA409 TA614TA596TA746TA563TA195TA561TA192TA566TA562TA565TA567 TA604 TA417 TA285TA287 TA747 TA418TA503TA271 TA754TA181TA581 TA815 TA279TA305TA230TA227 TA1330 TA226TA280 TA618 TA810 TA278TA235 TA10572 TA809 TA289TA214TA375 TA840 TA1201 TA380TA211 TA534 TA848TA715 TA388TA288 TA1349 TA371 TA1348 TA216TA379 TA188 TA512 TA615TA187 TA277TA273TA270TA272TA430TA429TA424 TA559TA185 TA423TA508 TA209 TA560 TA488TA243TA242 TA428TA491TA244 TA756 TA393 TA202 TA191 TA394 TA392 TA276 TA431 TA419 TA824TA813TA728 TA391 TA284 TA422 TA755TA753TA190TA189TA558 TA1207 TA625TA186 TA821TA713TA709TA806TA751TA819TA213TA808TA820TA711TA758TA742TA718TA710TA805TA705TA822TA704 TA1353 TA1357 TA536TA621TA623TA624TA183TA182TA436TA441TA557TA495TA10901TA197TA566TA562TA565 TA2013TA665TA842TA807TA849TA724 TA613TA620TA619TA616TA617TA196TA176TA184TA567TA1349 1002 TA2014TA715TA728TA847TA812TA814TA735 TA10903TA10902TA564TA186TA185TA558TA187TA188TA1348 TA841TA725TA811TA834TA213TA808TA820TA711TA758TA742TA718TA710TA805TA705TA822TA714TA723TA740TA730TA732 TA1262TA762TA183TA182TA436TA441TA557TA495 TA745TA622TA670TA823TA750TA704TA849TA724TA737TA836TA837 TA759TA676TA673TA540TA563TA176TA184 TA604TA847TA812TA814TA712TA738TA736TA760TA816 TA668TA521TA539TA10917TA195TA561TA192TA181TA581TA10902TA564TA196 1003 TA815TA810TA809TA840TA848TA735TA714TA723TA740TA730TA706TA832TA731TA733TA729 TA692TA680TA679TA542TA677TA1262TA762TA10903TA10572TA560TA559TA190TA191 TA824TA813TA821TA713TA709TA732TA737TA836TA837 TA10904TA2001TA2002TA2003TA759TA676TA673TA540TA189TA10901TA197 TA806TA751TA819TA807TA712TA738TA736TA760TA816 TA10905TA668TA521TA539TA10917TA566TA562TA565TA567 TA715TA728TA706TA832TA731TA733TA729TA825 TA481TA365TA358TA357TA480TA692TA680TA679TA542TA677TA185TA187TA188TA1348TA1349 TA213TA808TA820TA711TA758TA10889TA10892 TA478TA353TA354TA352TA351TA2001TA2002TA2003TA495TA186TA558 1004 Figure 3. An unrooted TA742TA718TA710TA805TA705TA822TA704TA849TA724NeighborTA10890TA10882TA10883TA10887TA10879TA850TA828-Joining (NJ) tree of A-genome species: T. urartu, subsp. aegilopoides, TA359TA355TA361TA1366TA10905TA10904TA184TA183TA182TA436TA441TA557 TA847TA825TA853TA830TA829TA568 TA10900TA356TA328TA479TA481TA365TA358TA357TA480TA564TA196TA176 TA812TA814TA735TA714TA10889TA684TA839TA741 TA362TA367TA10899TA353TA354TA352TA351TA10903TA10902 TA723TA740TA730TA732TA737TA10892TA10890TA10882TA10883TA10887TA10592TA850TA857TA858TA739 TA485TA484TA363TA482TA364TA355TA361TA1366TA478TA540TA1262TA762 TA836TA837TA712TA10879TA10884TA828TA853TA830TA829TA568TA1268TA831TA856 TA477TA483TA356TA328TA479TA359TA10917TA759TA676TA673 1005 and subsp. monococcum. TA738TA736TA760TA816TheTA706TA10878 treeTA684TA839 branches are colored based on the genetic grouping of the accessions TA307TA362TA367TA10899TA10900TA677TA668TA521TA539 TA832TA731TA733TA729TA10592TA10880TA741TA857TA858TA739 TA348TA350TA323TA515TA306TA309TA485TA484TA363TA482TA364TA692TA680TA679TA542 TA10884TA1268TA831TA856 TA463TA462TA341TA477TA483TA10905TA10904TA2001TA2002TA2003 TA10878 TA1340TA322TA1343TA1220TA476TA332TA338TA480 TA825TA10880 TA1221TA336TA344TA348TA350TA323TA515TA306TA309TA307TA481TA365TA358TA357 1006 after correcting misclassifiedTA10889TA10892 accessions. T. urartu (yellow), domesticated einkorn (red), wild einkorn TA329TA475TA343TA331TA346TA340TA463TA462TA341TA478TA353TA354TA352TA351 TA10890TA10882TA10883TA10887TA850 TA330TA342TA347TA322TA1343TA1220TA476TA332TA338TA355TA361TA1366 TA10879TA828TA853TA830TA829TA568 TA1339TA474TA333TA336TA344TA1340TA356TA328TA479TA359 TA684TA839 TA466TA464TA467TA465TA317TA469TA315TA468TA475TA343TA331TA346TA340TA1221TA362TA367TA10899TA10900 TA10592TA741TA857TA858TA739 TA310TA460TA342TA347TA329TA363TA482TA364 TA10884TA1268TA831 TA438TA412TA308TA303TA1334TA449TA1339TA474TA333TA330TA477TA483TA485TA484 1007 race α (blue), and wild einkorn raceTA856 γ (green) are shown. TA313TA453TA452TA451TA468 TA10878TA10880 TA494TA334TA349TA311TA466TA464TA467TA465TA317TA469TA315TA306TA309TA307 TA312TA454TA1214TA304TA308TA303TA1334TA449TA310TA460 TA348TA350TA323TA515 TA10593TA1338TA1219 TA321TA318TA470TA320TA456TA457TA438TA412 TA463TA462TA341 TA325TA473TA324TA326TA471TA327 TA458TA314TA1213TA316TA311TA313TA453TA452TA451 TA1343TA1220TA476TA332TA338 TA1335TA1216TA1363TA1362TA1217 TA448TA302TA433TA442TA434TA494TA334TA349 TA344TA1340TA322 1008 TA1361 TA291TA300TA1210TA1208TA301TA447TA290TA1360TA299TA446 TA312TA454TA1214TA304 TA1221TA336 TA10593TA1338TA1219 TA261TA228TA239 TA1211TA321TA318TA470TA320TA456TA457 TA475TA343TA331TA346TA340 TA325TA473TA324TA2021 TA435TA440TA298TA293TA1209TA292TA444TA294TA437 TA314TA1213TA316 TA342TA347TA329 TA326TA471TA327 TA443TA1215TA297 TA458 TA333TA330 TA1335TA1216TA1363TA1362TA1217TA1361 TA384 TA445TA439 TA448TA302TA433TA442TA434 TA1339TA474 TA1266TA194TA373TA374 TA1211TA291TA300TA1210TA1208TA301TA447TA290TA1360TA299TA446 TA464TA467TA465TA317TA469TA315TA468 TA2021TA261TA228TA239 TA240TA695TA238 TA466 TA387TA217 TA524TA522TA523 TA297TA435TA440TA298TA293TA1209TA292TA444TA294TA437 TA449TA310TA460 TA1326TA212TA237TA496TA370TA236TA369 TA527TA703TA525TA526 TA443TA1215 TA412TA308TA303TA1334 TA384TA376TA381TA372 TA445TA439 TA453TA452TA451TA438 TA1266TA194TA373TA374TA224TA219TA487 TA250TA206 TA349TA311TA313 TA240TA220TA218TA232TA229TA233 TA511TA1329TA205TA414 TA494TA334 TA695TA238 TA396TA1327TA879TA401TA246TA400TA295TA245TA397 TA1214TA304 TA387TA217TA223TA383TA386 TA251TA398TA1354 TA524TA522TA523 TA312TA454 TA1326TA212TA237TA382TA234TA502TA221 TA249TA248TA395TA1328TA247 TA703TA525TA526 TA456TA457 TA1338TA1219 TA496TA370TA236TA369 TA215TA222 TA399TA366TA175TA486TA517 TA527 TA320 TA10593 TA385TA377 TA403TA201TA415TA207 TA321TA318TA470 TA325TA473 TA376 TA378 TA407TA254TA252TA498TA268TA497 TA1213TA316 TA324 TA381 TA241TA489 TA372 TA390TA490TA208 TA458TA314 TA326 TA427TA283TA1203TA425 TA471TA327 TA224TA219TA487 TA421TA504TA281TA274TA1332TA432TA507TA426TA275TA416TA505TA1331TA420TA506TA1206TA510TA1202 TA250TA206 TA1335TA1216TA1363 TA220TA218TA232TA229TA233 TA193TA337 TA511TA1329TA205TA414 TA448TA302TA433TA442TA434 TA1362TA1217TA1361 TA396TA1327TA879TA401TA246TA400TA295TA245TA397 TA223TA383TA386 TA266TA259TA405TA499TA492TA413 TA251TA398TA1354 TA1208TA301TA447TA290TA1360TA299TA446 TA261 TA382TA234TA502TA221 TA258TA406TA260TA410TA200TA262TA256TA203TA265TA408TA263 TA249TA248TA395TA1328TA247 TA1211TA291TA300TA1210 TA228 TA215TA222 TA257TA411TA267TA409 TA399TA366TA175TA486TA517 TA2021TA239 TA385TA377 TA403TA201TA415TA207 TA378 TA407TA254TA252TA498TA268TA497 TA1209TA292TA444TA294TA437 TA390TA490TA208TA241TA489 TA297TA435TA440TA298TA293 TA427TA283TA1203TA425TA421TA504TA281TA274TA1332TA432TA507TA426TA275TA416TA505TA1331TA420TA506TA1206TA510TA1202 TA1215 TA193TA337 TA417TA285TA287TA418TA503TA271 TA445TA439TA443 TA384 TA1330 TA279TA305TA230TA227TA226TA280TA278TA235 TA194 TA266TA259 TA1201 TA289TA214TA375 TA1266TA373TA374 TA405TA499TA492TA413TA258TA406TA260 TA380TA211TA388TA288 TA410TA200TA262TA256TA203TA265TA408TA263 TA371TA216TA379 TA240 TA257TA411TA267TA409 TA512TA277TA273TA270TA272TA430TA429TA424 TA423TA508 TA695 TA209TA488TA243TA242TA428TA491TA244 TA393 TA238 TA394TA392TA391 TA422 TA276 TA431 TA419 TA202 TA387 TA284 TA1326TA217 TA525TA526TA524TA522TA523 TA212TA237TA496TA370TA236 TA417TA285TA287TA418TA503TA271 TA1207 TA527TA703 TA369 TA1353 TA1357 TA376 TA1330 TA279TA305TA230TA227TA226TA280TA278TA235 TA381TA372 TA1201 TA289TA214TA375 TA224TA219 TA380TA211TA388TA288 TA206 TA487TA220TA218 TA371TA216TA379TA512 TA414TA250 TA232 TA277TA273TA270TA272TA430TA429TA424 TA511TA1329TA205 TA229 TA423 TA233 TA508 TA209TA488TA243TA242 TA428TA491TA244 TA393 TA394TA392 TA276 TA431 TA419 TA202 TA391 TA284 TA422 TA295TA245TA397 TA223 TA398TA1354TA396TA1327TA879TA401TA246TA400 TA383TA386TA382TA234TA502TA221 TA249TA248TA395TA1328TA247TA251 TA215TA222 TA1207 TA366TA175TA486TA517 TA385 TA1353 TA1357 TA201TA415TA207TA399 TA377TA378 TA407TA254TA252TA498TA268TA497TA403 TA390TA490TA208TA241TA489 TA193TA337 TA427TA283TA1203TA425TA421TA504TA281TA274TA1332TA432TA507TA426TA275TA416TA505TA1331TA420TA506TA1206TA510TA1202 TA266TA259TA405TA499TA492TA413TA258TA406 TA260TA410TA200TA262TA256TA203TA265TA408TA263TA257TA411TA267TA409

TA417TA285TA287TA418TA503TA271 TA1330 TA279TA305TA230TA227TA226TA280TA278TA235 TA289TA214 TA1201 TA375TA380TA211TA388 TA288TA371TA216TA379 TA512TA277TA273TA270TA272TA430TA429TA424 TA423TA508 TA209TA488TA243TA242 TA428TA491TA244 TA393 TA394TA392TA391 TA284 TA422 TA276 TA431 TA419 TA202

TA1207

TA1353 TA1357

37 bioRxiv preprint doi: https://doi.org/10.1101/2021.08.20.457122; this version posted August 20, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1009 (T.monococcum sp.) core collection 100 90 80 (%) 70 60 Coverage 50 Genotype coverage (%) Genotype coverage 40

0 50 100 150 200

Number of samples Number of samples 1010 1011 Figure 4. Relationship between the allele coverage as estimated using GenoCore and the number of 1012 samples selected in the core for einkorn group (T. monococcum). The threshold for 60 accessions at 1013 approximately 90% genotype coverage is shown with vertical red line. 1014 1015

38