<<

Mesoamerican origin of the common ( vulgaris L.) is revealed by sequence data

Elena Bitocchia, Laura Nannia, Elisa Belluccia, Monica Rossia, Alessandro Giardinia, Pierluigi Spagnoletti Zeulib, Giuseppina Logozzob, Jens Stougaardc, Phillip McCleand, Giovanna Attenee, and Roberto Papaa,f,1

aDipartimento di Scienze Agrarie, Alimentari ed Ambientali, Università Politecnica delle Marche, 60131 Ancona, ; bDipartimento di Biologia Difesa e Biotecnologie Agro-Forestali, Università degli Studi della Basilicata, 85100 Potenza, Italy; cCentre for Recognition and Signalling, Department of Molecular Biology, Aarhus University, DK-8000 Aarhus C, Denmark; dDepartment of Sciences, North Dakota State University, Fargo, ND 58105; eDipartimento di Scienze Agronomiche e Genetica Vegetale Agraria, Università degli Studi di Sassari, 07100 Sassari, Italy; and fCereal Research Centre, Agricultural Research Council (CRA-CER), S.S. 16, Km 675, 71122 Foggia, Italy

Edited* by Jeffrey L. Bennetzen, University of Georgia, Athens, GA, and approved December 16, 2011 (received for review June 3, 2011) Knowledge about the origins and evolution of rep- scenario of P. vulgaris (20, 21), although a recent study has sug- resents an important prerequisite for efficient conservation and use gested a single event at the origin of the indica and of existing plant materials. This study was designed to solve the japonica subgroups (22). While only these two major gene pools ongoing debate on the origins of the common bean by investigating are recognized in the domesticated population, the geographical the nucleotide diversity at five gene loci of a large sample that structure of the wild form of the common bean is more complex, represents the entire geographical distribution of the wild forms of with an additional third gene pool that is localized between this species. Our data clearly indicate a Mesoamerican origin of the and Ecuador (23) and characterized by a specific storage common bean. They also strongly support the occurrence of a , phaseolin type I (24). Moreover, wild populations from bottleneck during the formation of the Andean gene pool that are often seen as intermediates, and a marked geo- predates the domestication, which was suggested by recent studies graphical structure is observed in wild from based on multilocus molecular markers. Furthermore, a remarkable (12). The population from northern Peru and Ecuador is usually result was the genetic structure that was seen for the Mesoamerican considered the ancestral population from which P. vulgaris origi- accessions, with the identification of four different genetic groups nated (the northern Peru–Ecuador hypothesis) (11, 24, 25). In- that have different relationships with the sets of wild accessions deed, the work by Kami et al. (24) analyzed a portion of the gene from the and northern Peru–Ecuador. This finding implies that codes for the storage seed protein phaseolin, including pha- that both of the gene pools from originated through seolin type (type I) from northern Peru–Ecuador accessions that different migration events from the Mesoamerican populations that was not present in the other gene pools, which indicates that type I were characteristic of central . phaseolin is ancestral to the other phaseolin sequences of P. vul- garis (24). Thus, the work by Kami et al. (24) suggested that, crop evolution | | population structure | mutation rate starting from the core area of the western slopes of the Andes in northern Peru and Ecuador, the wild bean was dispersed north he common bean ( L.) is the main (Colombia, Central America, and Mexico) and south (southern Tlegume for direct human consumption, and it represents Peru, , and Argentina), which resulted in the Mesoamer- a rich source of protein, vitamins, minerals, and fiber, especially ican and Andean gene pools, respectively. for the poorer populations of Africa and (1). The Mesoamerican origin of the common bean is an alterna- Investigations into the origins and evolution of this species would tive and older hypothesis. This hypothesis is supported by the be expected to highlight the structure and organization of its observations that the closest relatives of wild P. vulgaris in the genetic diversity and the role of the evolutionary forces that have Phaseolus are distributed throughout Mesoamerica (26– been shaping this diversity. Such knowledge is a crucial pre- 29). Additionally, the higher diversity found in the Mesoamer- requisite for efficient conservation and use of the existing ican compared with the Andean gene pool (phaseolin types, germplasm for the development of new improved plant varieties. allozyme alleles, and molecular markers) (8–10, 30, 31) supports The current distribution of the wild common bean encompasses a Mesoamerican origin. a large geographical area: from northern Mexico to northwestern Recently, the work by Rossi et al. (13) compared amplified Argentina (2). In general, two major ecogeographical gene pools fragment length polymorphism (AFLP) data based on an anal- are recognized: Mesoamerica and the Andes. These two gene ysis of wild and domesticated P. vulgaris accessions with the data pools are characterized by partial reproductive isolation (3, 4), from the work by Kwak and Gepts (14) based on simple se- and they are seen in both wild and domesticated materials. They quence repeats (SSRs) obtained on a similar sample. This have been recognized in several studies based on morphology (5– analysis suggested that, before domestication, there was a severe 7), agronomic traits (7), seed (8), allozymes (9), and bottleneck in the Andean populations, and they reproposed the different types of molecular markers (10–15), which have given Mesoamerican origin of the common bean. the overall indication of the occurrence of at least two in- dependent domestication events in the two different hemispheres. The existence of these two geographically distinct and isolated Author contributions: E. Bitocchi and R.P. designed research; E. Bitocchi, L.N., E. Bellucci, evolutionary lineages that predate the domestication of the M.R., A.G., P.S.Z., G.L., and J.S., P.M., G.A., and R.P. performed research; E. Bitocchi, L.N., common bean represents a unique scenario among . This E. Bellucci, G.A., and R.P. analyzed data; and E. Bitocchi and R.P. wrote the paper. scenario differs from the occurrence over a smaller geographical The authors declare no conflict of interest. range of multiple domestications that have been proposed for *This Direct Submission article had a prearranged editor. other species, such as barley (16) and wheat (17), and indeed, Data deposition: The sequences reported in this paper have been deposited in the Gen- within the Mesoamerican gene pool of the common bean itself Bank database (accession nos. JN796475-JN796922). (18, but see 13, 19). In these cases, the lack of isolation between 1To whom correspondence should be addressed. E-mail: [email protected]. the populations prevented independent evolution of different See Author Summary on page 5148 (volume 109, number 14). lineages in both the wild and domesticated forms. Rice is the only This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. crop that may show a scenario that is to some extent similar to the 1073/pnas.1108973109/-/DCSupplemental.

E788–E796 | PNAS | Published online March 5, 2012 www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Downloaded by guest on September 28, 2021 The main aim of the present study was to investigate the evolu- 34-aa sequence has 85% identity with histidinol dehydrogenase of PNAS PLUS tionary history of P. vulgaris to solve the issue regarding the origin Arabidopsis thaliana. This catalyses the last two steps in of this species. We have investigated the nucleotide diversity for the L-histidine biosynthesis pathway, which is conserved in bac- five different genes from a wide sample of wild P. vulgaris that teria, archaea, fungi, and . For Leg100, a coding region of is representative of its geographical distribution. Our findings 45 bp was identified at the 3′ extremity of the sequence, and its identify the origin of P. vulgaris as Mesoamerican, and they also deduced 15-aa sequence showed 100% identity with biotin syn- reveal the level and structure of the genetic diversity that char- thase of A. thaliana, an enzyme that is involved in the biotin acterizes the wild accessions of this species. Our data are relevant biosynthetic process. Three exons were found in Leg133 of 75, 72, for crop improvement and in particular, maximization of the and 90 bp. The corresponding 79-aa sequence showed 84% efficient use of wild genetic diversity for breeding programs. identity with dolichyl-diphosphooligosaccharide-protein glyco- transferase of A. thaliana, an enzyme that is involved in aspara- Results gine-linked protein glycosylation. Two exons (88 and 56 bp) were Nucleotide Variation in the Wild Common Bean. We sequenced five identified in Leg223, with an identity of 71% of the translated loci of a large collection that included wild common bean acces- protein fragment with the eukaryotic translation initiation factor sions from Mesoamerican (n = 49) and Andean (n = 47) gene SUI1 protein of A. thaliana. This protein seems to have an pools and genotypes from northern Peru–Ecuador (n = 6) that are important role in accurate initiator codon recognition during characterized by the ancestral type I phaseolin (Table S1). These translation initiation. The structure of PvSHP1 was characterized gene pools represent a cross-section of the entire geographical in the work by Nanni et al. (19) (Fig. S1 and Table S2), and the distribution of the wild form of P. vulgaris. The sequenced region fragment sequenced in this study included three coding regions of for each gene is located within the transcriptional unit and 12, 42, and 42 bp. This sequence was identified as homologous to encompasses between 500 and 900 bp, which includes both introns Arabidopsis SHP1 (SHATTERPROOF 1), a gene that is involved and exons. Altogether, we sequenced ∼3.4 kb per accession. in the control of shattering. We investigated the structure of the sequenced gene fragments We considered the variation in the coding regions of the loci in P. vulgaris, starting with BLAST analysis against the reference studied in all of the P. vulgaris accessions. We found that the sequences for the identification of anchor (Leg) markers coding regions of three loci (Leg044, Leg100, and Leg223) did (Fig. S1 and Table S2) (32). For the Leg marker Leg044, we not show nucleotide substitutions. For Leg133, there was one identified an exon of 102 bp within the fragment, and the deduced synonymous substitution in three Mexican accessions and all of

Table 1. Population genetics statistics for the five gene fragments and the concatenate sequences in the different gene pools of P. vulgaris −3 −3 Population N VPiSHHdπ × 10 θW × 10 D

Concatenate — All 84 137 123 14 56 0.96 9.9 8.3 SCIENCES

MW 37 119 98 21 34* 0.99* 10.6* 8.7* — AGRICULTURAL AW 43 32 22 10 18 0.86 1.0 2.3 — PhI 4 18 0 18 4 1.00 2.7 3.0 — Leg044 All 96 23 23 0 13 0.83 6.1 5.4 0.35 ns MW 44 20 10 10 9* 0.82* 4.3* 5.6* −0.73 ns AW 46 9 3 6 5 0.47 0.9 2.5 −1.77 ns PhI 6 14 0 14 2 0.33 5.7 7.4 −1.47 ns Leg100 All 99 45 43 2 13 0.70 22.6 15.4 1.47 ns MW 47 40 39 1 10* 0.83* 22.9* 16.0* 1.46 ns AW 46 1 0 1 2 0.04 0.1 0.4 −1.11 ns PhI 6 2 2 0 2 0.53 1.9 1.6 1.03 ns Leg133 All 101 15 12 3 9 0.63 5.1 5.0 0.06 ns MW 48 13 12 1 6* 0.75* 6.2* 5.1* 0.70 ns AW 47 3 1 2 4 0.20 0.4 1.2 −1.34 ns PhI 6 0 0 0 1 0.00 0.0 0.0 — Leg223 All 94 7 7 0 8 0.78 3.2 2.9 0.23 ns MW 43 6 6 0 7* 0.79* 2.9* 3.0* −0.11 ns AW 47 4 3 1 4 0.45 1.5 1.9 −0.58 ns PhI 4 0 0 0 1 0.00 0.0 0.0 — PvSHP1 All 98 49 44 5 26 0.84 14.0 11.2 0.81 ns MW 47 42 36 6 20* 0.92* 16.0* 11.2* 1.47 ns AW 45 15 15 0 4 0.32 1.7 4.0 −1.79 ns PhI 6 2 0 2 3 0.60 0.8 1.0 −1.13 ns

All, all genotypes of P. vulgaris; AW, Andean wild; D, the D parameter by Tajima (66) for testing neutrality; H, number of haplotypes; Hd, haplotype diversity; MW, Mesoamerican wild; N, sample size; ns, not significant; − PhI, phaseolin I type; Pi, parsimony informative sites; S, singleton variable sites; V, variable sites; π × 10 3 and −3 θW × 10 , two measures of nucleotide diversity from Tajima (64) and Watterson (65) (θ estimator), respectively. *The MW diversity parameters that are higher than those parameters of the AW.

Bitocchi et al. PNAS | Published online March 5, 2012 | E789 Downloaded by guest on September 28, 2021 Table 2. Loss of nucleotide diversity in the Andean wild (AW) ulations (from P ≤ 0.009 for haplotype, haplotype diversity, and π population vs. Mesoamerican wild (MW) population calculated to P ≤ 0.016 for θW). This finding was also shown by the loss of π − π π π π as L =1 ( AW/ MW), where MW and AW are the nucleotide diversity (Lπ) estimates between the Mesoamerican and Andean diversities in MW and AW populations, respectively (39) populations, with a reduction in the diversity for the latter Locus Lπ compared with the former that ranged from 0.49 (Leg223) to 1.00 (Leg100) and was 0.90 for the concatenate sequence (Table Leg044 0.78 2). Finally, even if Tajima’s D was never significant, in the Leg100 1.00 Andean population, it was negative and always much lower than Leg133 0.93 Tajima’s D from Mesoamerica (Table 1). Leg223 0.49 PvSHP1 0.89 Population Structure. The population structure analysis de- Concatenate 0.90 termined six subpopulations that best define the population (Fig. 1, B1 to B6). All of the Andean accessions were clearly assigned to cluster B6, with high percentages of membership (qB6 ≥ 0.76). – the accessions characterized by type I phaseolin. There was also The same was seen for the type I phaseolin of the northern Peru ≥ one nonsynonymous substitution that involved only one Andean Ecuador accessions, which were assigned to cluster B5 (qB5 accession from Argentina [a 1-aa replacement: Asp (D) for Ala 0.91). The Mesoamerican accessions showed a different sce- (A)]. Finally, for PvSHP1, a nonsynonymous substitution was nario; indeed, they were subdivided into four different clusters (B1–B4), and they showed higher levels of admixture. We con- found in two Mesoamerican accessions from Mexico [a 1-aa ≥ replacement: Gln (Q) for His (H)]. sidered a threshold of q 0.70 to assign the individuals to these The nucleotide diversity for each of these five loci and the four clusters. Cluster B1 included 17 Mesoamerican accessions q ≥ concatenate sequence was analyzed across all of the P. vulgaris ( B1 0.85), 6 of which were from Mexico, 6 from Guatemala, 4 from Colombia, and 1 from El Salvador. The other three clusters accessions, taking into account the major population subdivisions were composed of only Mexican accessions: cluster B2 included that corresponded to the three wild gene pools: the Meso- seven accessions (q ≥ 0.93), cluster B3 included eight accessions american, Andean, and phaseolin type I (northern Peru–Ecua- B2 (q ≥ 0.76), and cluster B4 included four accessions (q ≥ 0.99). dor) groups (Table 1). For the concatenate sequence, which B3 B4 fi One Mexican accession (PI325677) was not assigned to any included 84 accessions of P. vulgaris for which all of the ve loci specific Mesoamerican cluster because of its high level of ad- sequences were obtained, a total of 137 variable sites (V) were mixture (qB2 = 0.28, qB3 = 0.45, qB6 = 0.22, and qB4 = 0.05). observed. The number of haplotypes was 56, none of which was To better understand the geographical distributions of the shared among the three gene pools. The highest number of Mesoamerican accessions that belong to genetic groups B1–B4, haplotypes was 34 for the Mesoamerican gene pool, with 18 we carried out spatial interpolation of the membership coef- haplotypes in the Andean gene pool and 4 haplotypes in the ficients (Fig. 2). The accessions of cluster B1 were distributed all – northern Peru Ecuador accessions. Among all of the groups, the along the Mesoamerican gene pool from the north of Mexico highest diversity was seen in the Mesoamerican wild population down to Colombia (Fig. 2A), with a major presence toward the −3 (π = 10.6 × 10 ), which was 10-fold higher than the diversity of Pacific Ocean. The accessions of cluster B2 were spread from −3 the Andean accessions (π = 1.0 × 10 ). Considering each gene central (particularly on the Caribbean side) down to southern fragment separately, the diversity was always higher in Meso- Mexico (Fig. 2B). However, there was a lack of representation of america than in the Andes (for all of the statistics considered: these two clusters in a wide area of central Mexico. Clusters B3 π θ number of haplotypes, haplotype diversity, , and W) (Table 1). and B4 were represented essentially in Mexico and in particular, According to the binomial distribution, this pattern has a prob- above the Transverse Volcanic Axis, with cluster B3 widely ability of occurring by chance of P = 0.03. Moreover, considering spread from northern to central Mexico (Fig. 2C) and cluster B4 the estimates from the five loci and using the Wilcoxon–Kruskal– more restricted to a small area in central Mexico (Fig. 2D). Wallis nonparametric test, the Mesoamerican wild bean showed The relationships among the clusters identified were revealed a significant higher diversity compared with the Andean pop- by a neighbor-joining (NJ) tree (Fig. 3). There was no clear

Fig. 1. Percentages of membership (q) for each of the clusters identified (B1–B6; color-coded as indicated). Each accession is represented by a vertical line divided into colored segments, the lengths of which indicate the proportions of the genome that are attributed to the specific clusters. The accessions are ordered according to latitude from northern Mexico to northern Argentina. The country of origin is indicated by the horizontal line. AW, Andean wild; ar, Argentina; bl, Bolivia; C_mx, central Mexico; col, Colombia; ec, Ecuador; es, El Salvador; gt, Guatemala; MW, Mesoamerican wild; N_mx, north Mexico;N_pr, northern Peru; PhI, type I phaseolin (northern Peru–Ecuador); S_mx, south Mexico; S_pr, southern Peru.

E790 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al. Downloaded by guest on September 28, 2021 PNAS PLUS

Fig. 2. Spatial interpolation of membership coefficients (q) for the four Mesoamerican clusters identified by the Bayesian clustering analysis. (A) B1_blue. (B) B2_light-blue. (C) B3_green. (D) B4_orange. Latitude and longitude are expressed in the Universal Transverse Mercator system. SCIENCES

distinction between the three gene pools (Mesoamerican, haplotypes, where the former seemed to be distributed all along AGRICULTURAL Andean, and northern Peru–Ecuador). Interestingly, the Andean the tree and the latter showed haplotypes shared always with the cluster (B6) was more closely related to the Mesoamerican group Mesoamerican accessions; the only exception here was PvSHP1, – B3, and the northern Peru Ecuador cluster (B5) was more where there were no common haplotypes between these gene closely related to the Mesoamerican group B4; however, the pools. However, a clear relationship was evident between a group Mesoamerican groups B1 and B2 were in an external position. of Mesoamerican haplotypes (from central Mexico) and the three A similar population structure was revealed by the NJ tree obtained considering the single accessions (Fig. 4) according to the Bayesian model-based approach [Bayesian Analysis of Pop- ulation Structure (BAPS)]. Also in this case, there was no clear distinction between the three gene pools. Indeed, the Meso- american accessions were found to be distributed in all of the tree branches: the Andean accessions (Fig. 4, AW) clustered with the Mesoamerican group B3 (bootstrap value = 97%) and the northern Peru–Ecuador accessions (Fig. 4, PhI) were more related to the other Mesoamerican groups (bootstrap value = 98%) and particularly, the B4 accessions. The Mesoamerican groups B1 and B2 were included in a clade that was statistically well-supported (bootstrap value = 79%).

Haplotype Networks. The haplotype networks for each of the five loci are shown in Fig. 5. The number of haplotypes found for each gene ranged from 8 (Leg223) to 26 (PvSHP1). The Leg044, Leg133, and Leg223 haplotypes were interrelated through a few mutational steps, whereas a higher number of evolutionary steps was seen for the Leg100 and PvSHP1 haplotypes. However, a consistent observation arose from this analysis: the Meso- american gene pool had the greatest number of haplotypes for all of the loci, from 6 (Leg133) to 20 (PvSHP1), and the Andean gene pool showed a star-like structure with 1 major haplotype with a high frequency (higher than 0.72) and a few (from 1 to 3) additional minor haplotypes. Furthermore, there was no clear Fig. 3. Unrooted NJ tree showing the phylogenetic relationships of genetic distinction between the Mesoamerican and Andean group of clusters identified by cluster analysis.

Bitocchi et al. PNAS | Published online March 5, 2012 | E791 Downloaded by guest on September 28, 2021 Fig. 4. Unrooted NJ bootstrap tree inferred from the concatenate sequence data. Each set of accessions (as indicated) is represented by a colored circle, and each color indicates the membership to the BAPS groups. Small gray and violet circles represent the nodes for which bootstrap values are higher that 50%and 80%, respectively (the 80% threshold highlights the relationships with very strong support). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).

major Andean haplotypes, which were separated by six muta- Predomestication Bottleneck in the Andes. Recently, on the basis of tional steps (Fig. 5, PvSHP1). A confirmation of the clear re- a comparison of the levels observed for AFLP (13) and SSR (14) lationship between the Andean and Mesoamerican accessions of diversity in the wild populations of P. vulgaris from the Andes group B3 that was highlighted by the NJ trees (Figs. 3 and 4) was and Mesoamerica, the work by Rossi et al. (13) suggested that provided by these haplotype networks, even if, at this level, a bottleneck had occurred in the Andes before domestication. a similar situation can be seen for B1. Finally, for all of the genes, Indeed, for molecular markers characterized by a higher muta- the type I phaseolin accessions showed haplotypes that were tion rate (SSRs compared with AFLPs), there were no (or only closer to the Mesoamerican accessions and often separated from very small) differences in the diversities observed in the two the majority of the Andean accessions. main geographical areas, whereas there was much lower di- versity in the Andes for AFLPs compared with the Mesoamer- Discussion ican wild P. vulgaris. Indirect estimates of the AFLP mutation − − fi rate have shown values that vary from 10 6 to 10 5 (34–36), In this study, we analyzed the nucleotide diversity for ve gene − whereas the SSR mutation rate is higher; it ranges from 10 3 to fragments in a large sample of wild P. vulgaris to address the −4 origins of this species. Indeed, the main aim was to throw light 10 using both indirect (34, 37, 38) and direct (39, 40) esti- onto the unique scenario among crop plants that characterizes mates. Thus, following the model proposed in the work by Nei the common bean: the existence of two major geographically et al. (41) that described the effects of a bottleneck on the ge- distinct evolutionary lineages that predate domestication (Meso- netic diversity of a population at a neutral locus, the work by Rossi et al. (13) suggested the occurrence of a bottleneck in the american and Andean). Our results indicate a clear pattern as- Andes before domestication. This bottleneck was then re- sociated with a Mesoamerican origin of this species from which covered by markers with a high mutation rate compared with different migration events extended the distribution of P. vulgaris markers showing lower rates of mutation. According to this into South America. hypothesis for nucleotide diversity, an even lower diversity To date, the most credited hypothesis relating to the origins of should be expected in the Andes compared with Mesoamerica. the common bean has indicated that, from a core area on the Indeed, as indicated in the work by Lynch and Conery (42) for − western slopes of the Andes in northern Peru and Ecuador, the , the nucleotide mutation rate is ∼6.1 × 10 9, much wild beans were dispersed north (to Colombia, Central America, lower than both the AFLP and SSR rates. Our sequence data and Mexico) and south (to southern Peru, Bolivia, and Argentina), show a very strong difference in the genetic diversity between which resulted in the Mesoamerican and Andean gene pools, re- the wild Mesoamerican and Andean accessions (Lπ = 90%). fi spectively. This hypothesis has relied on the identi cation of a These reductions are about 2- and 13-fold higher than the phaseolin (the major seed storage protein) as the ancestral pha- reductions computed in a comparable sample of P. vulgaris seolin (type I), and it was based on the assumption that the species genotypes using AFLP (45%) (13) and SSR data (7%) (14), phylogeny is identical to the phylogeny of the gene. However, for respectively. Thus, our data strongly support the bottleneck several reasons (33), this strict relationship has not always held, hypothesis of Rossi et al. (13), and they are based on the clear and as our results suggest, the current distribution of phaseolin relationship between the mutation rate and the time of diversity might not reflect its ancient distribution. Alternatively, type I recovery from the occurrence of a bottleneck: the higher the phaseolin might be extinct in Mesoamerica, or it might still be mutation rate, the faster the recovery of diversity. To the best of present but just not included in the samples analyzed. our knowledge, this example is one of the few (40) of the critical

E792 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al. Downloaded by guest on September 28, 2021 PNAS PLUS

Fig. 5. Haplotype networks of the five nuclear loci. Each circle represents a single different haplotype, and the circle sizes are proportional to the number of individuals that carry the same haplotype. Black circles indicate missing intermediate haplotypes. The lengths of the lines of the haplotype networks are proportional to the number of mutational steps, which are indicated when more than one. Colors show the genotypes belonging to the different BAPS clusters (as indicated). AW, Andean wild; MW, Mesoamerican wild; PhI, type I phaseolin (northern Peru–Ecuador).

role of marker mutations in describing the diversity of plant clusters, which were more related to the South American pop- fi populations, thus underlining the need for the careful consid- ulations B5 and B6. This nding is clearly not compatible with the SCIENCES

eration of mutation rates in diversity studies. hypothesis of a South American origin, where the phaseolin I AGRICULTURAL genotypes would be expected to be intermediate between the Population Structure in the Mesoamerican Gene Pool. The occur- Mesoamerican and Andean clusters. rence of three gene pools of the wild common bean (Meso- The Mesoamerican origin of the wild common bean is sup- america, Andes, and northern Peru–Ecuador) has been shown ported by the large diversity observed, and it is also confirmed by extensively (9, 12–14, 23, 24, 31, 43). In particular, the subdivisions the single locus haplotype networks, where the northern Peru– of the two major ecogeographical gene pools (Mesoamerica and Ecuador haplotypes were closer to the Mesoamerican groups and Andes) have been shown by several studies that have used dif- often separated from the majority of the Andean accessions that ferent types of markers. Even if high population structure has were usually represented as a single, highly frequent haplotype. been seen in the Mesoamerican wild gene pool (12), it has usually In summary, this analysis of the population structure supports been considered as a single gene pool. However, in the present the Mesoamerican origin hypothesis. At the same time, it reveals study, while the northern Peru–Ecuador and Andean gene pools the very complex geographical structure of the genetic diversity are, indeed, characterized by homogeneous assignments into in Mesoamerica, with central Mexico and the Transverse Vol- specific genetic groups (B5 and B6, respectively), the Meso- canic Axis, which originated ∼5 Mya in the Late Miocene, as the american accessions are clearly split into four distinct genetic cradle of diversity of P. vulgaris. As the magnitude of this clusters, B1–B4, that are clearly separated also with the NJ tree structure has not been clearly identified using multilocus mark- based on the single individual genotypes (Fig. 4). Moreover, a very ers, on the basis of its old origins, this finding would suggest that important result is seen in the lack of a clear distinction between its signature has been partially hidden because of the combined the Mesoamerican and Andean wild gene pools (Figs. 3–5), effects of different mutation rates and recombination. whereas the Mesoamerican clusters show different degrees of relatedness with the other gene pools. In particular, as showed by Origin of the Common Bean. Our study presents clear evidence of a the NJ trees (Figs. 3 and 4), the Andean wild accessions (B6) were Mesoamerican origin of P. vulgaris, which was most likely lo- more related to the Mesoamerican B3 as were the northern Peru– cated in Mexico, both from the analysis of the population Ecuador accessions (B5) to the Mesoamerican B4; indeed, the structure and phylogeny and the confirmation of the occurrence major Mesoamerican clusters (B1 and B2) were less related to the of a bottleneck before domestication in the Andes, which was Andean and northern Peru–Ecuador clusters. It is, thus, important proposed in the work by Rossi et al. (13). The Mesoamerican to consider the geographical distributions of these groups in origin is consistent with the known distribution of most of the Mesoamerica. The B1 group was present essentially across all of close relatives of P. vulgaris, the much higher diversity of Meso- the geographical area from the north of Mexico down to Colom- american wild compared with the diversity from South America, bia, whereas the B2 group was spread from central to southern the occurrence of a severe bottleneck in the Andes before do- Mexico. According to our data, the B1 and B2 clusters were almost mestication, and finally, the occurrence in Mesoamerica of wild absent in a wide area of central Mexico, where they were beans that are closely related to those beans found in South substituted by the two Mesoamerican groups, as the B3 and B4 America, both from the Andean and northern Peru–Ecuador

Bitocchi et al. PNAS | Published online March 5, 2012 | E793 Downloaded by guest on September 28, 2021 gene pool. Thus, we suggest that P. vulgaris from northern Peru– DNA fragments were amplified using 25 ng DNA and the following Ecuador is a relict population that only represents a fraction of reagent concentrations: 0.25 μM each forward and reverse primer, 200 μM each dNTP, 2.5 mM MgCl2,1× Taq polymerase buffer, 1 unit AmpliTaq DNA the genetic diversity of the ancestral population and that this fi μ population migrated from Central Mexico in ancient times. The polymerase, and sterile double-distilled H2Otoa nal volume of 100 L. Amplifications were conducted with a 9700 Thermal Cycler (Perkin-Elmer results that suggest type I phaseolin as ancestral seem quite Applied Biosystems) with an initial denaturation of 1 min at 95 °C that was robust; thus, the absence of this type of seed protein in the followed by 30 cycles of 1 min at 95 °C, 1 min at X °C, and 2 min at 72 °C Mesoamerican gene pool could be explained by two alternative plus 10 min of final extension at 72 °C. The X °C refers to the annealing hypotheses: the type I phaseolin became extinct in Mesoamerica, temperature, which is specified for each primer pair in Table S2.ThePCR or it might still be present but just not included in the samples products were purifiedusingGFXPCRDNAandGelBandPurification Kits analyzed in the literature. (GE Healthcare) according to the manufacturer’s instructions. For the Our data also present a scenario for the evolutionary history of Leg100, Leg133, and Leg223 loci, the samples were sequenced on both the wild common bean, with the magnitude of the population strands using forward and reverse primers on the cycle sequencing re- action with BigDye Terminator Cycle Sequencing Ready Reaction Kits subdivisions in Mesoamerica not having been clearly recognized (Applied Biosystems). The products were resolved on an ABI Prism 3100- before this study. Avant Automated Sequencer (Applied Biosystems). The sequence data were analyzed using Pregap4 and Gap4 of the Staden Software Package Conclusions (http://staden.sourceforge.net/). The Pregap4 modules were used to pre- Evolutionary studies of crop species are crucial for several pare the sequence data for assembly (quality analysis). Gap4 was used for applications; indeed, the knowledge relating to the level and the final sequence assembly of the Pregap4 output files (normal shotgun structure of genetic diversity of crop plants and their wild rela- assembly). The single-strand sequencing reaction for Leg044 (reverse tives is the starting point of any breeding program (44–52). Our primer) and PvSHP1 (forward primer) was performed by Macrogen. The fragments were resequenced if there was any ambiguity as to which allele study indicates that, to explore new genetic diversity that is not was present. The sequences are accessible at GenBank (GeneBank acces- incorporated into the current domesticated germplasm and sion nos. JN796475–JN796922). consider this high genetic diversity in the Mesoamerican acces- sions that is not present in the Andean gene pool, it is the wild Diversity Analyses. For the Leg markers, the identification of exons and Mesoamerican germplasm that should be used in breeding pro- introns was carried out by BLAST analysis (60) against the reference sequence grams, because it has potential for the release of new . of A. thaliana (32) (Table S2). For PvSHP1, its structure was previously iden- Moreover, it is crucial to consider the high population structure tified in the work by Nanni et al. (19). Sequence alignment and editing were that characterizes the Mesoamerican wild germplasm to sample carried out using MUSCLE v3.7 (61) and BioEdit v.7.0.9.0 (62). Insertion/ the largest amount of diversity for introgression into commercial deletions (indels) were not included in the analyses. Molecular population fi genetic analyses were conducted using DnaSP 5.10.01 (63). As given in Table varieties. This nding is very important if we consider that the 1, estimations for the five gene fragments and the concatenate sequences in majority of the improved varieties of the common bean are of the different gene pools of P. vulgaris were made for the number of variable Andean origin at present. Furthermore, exploration of new ge- sites, singleton variable sites, parsimony informative sites, haplotypes,

netic diversity is also fundamental for the meeting of future haplotype diversity, nucleotide diversity (π in ref. 64 and θW in ref. 65), and D demands for cultivars that can adapt to climate change, while by Tajima (66). To measure the loss of nucleotide diversity in the Andean also maintaining, or improving their yields. wild vs. Mesoamerican wild populations as proposed in the work by Vigo- roux et al. (39), we used the statistic Lπ =1− (πAW/πMW), where πMW and πAW Materials and Methods are the nucleotide diversities in Mesoamerican and Andean wild pop- ulations, respectively; the Lπ parameterrangesfromzerotoone,whichin- Plant Materials. A panel of 102 wild accessions of the common bean was se- dicate a total loss of diversity and no loss of diversity, respectively. Haplotype lected to represent the geographical distribution of wild P. vulgaris from trees for each gene fragment were constructed using the median-joining northern Mexico to northwestern Argentina. The accessions are representa- network algorithm implemented in the program NETWORK 4.5.1.6 (67). tive of the different gene pools of the species: 49 Mesoamerican accessions Considering the concatenate sequence, an unrooted phylogenetic tree was (Mexico, Central America, and Colombia), 47 Andean accessions (South constructed based on Kimura two-parameter distances, and the relative America), and 6 wild accessions from northern Peru and Ecuador that are support for each node was tested by the bootstrap method using 1,000 characterized by the ancestral type I phaseolin (23, 24). The accessions char- replicates in MEGA 4 (68). The sites with gaps were also excluded from acterized by the type I phaseolin are from few small populations found in these analyses. restricted geographic areas on the western slope of the Andes. We used six of these accessions to represent the diversity of these populations, including Population Structure Analyses. A Bayesian model-based approach was used only well-described and characterized accessions. A complete list of the to infer the hidden genetic population structure of our sample and thus, accessions studied is available in Table S1. The were provided by the assign the genotypes into genetically structured groups/populations. This Department of Agriculture Western Regional Plant Introduction approach was implemented in the software BAPS 5.3 (69–72). This version Station and the International Centre of in Colombia. of the software incorporates the possibility to account for the dependence caused by linkage between the sites within aligned sequences, which is PCR and Sequencing. Genomic DNA was extracted from each accession from different from the most widely used STRUCTURE software (73) that uses young of a single, greenhouse-grown plant using the miniprep ex- a model that is not designed to deal with background linkage disequilib- fi ∼ traction method (53). A total of ve 500- to 900-bp gene regions across the rium between very tightly linked markers (74). A total of 84 accessions, common bean genome were sequenced (Table S2). Four of these fragment those accessions showing high-quality sequences for all of the five loci, was genes (Leg044, Leg100, Leg133, and Leg223) were chosen from a set of these used in this analysis. We carried out a genetic mixture analysis to de- Leg markers developed in the work by Hougaard et al. (32). Single-copy termine the most probable number of populations (K) given the data (71, fi orthologous genes between legume species were identi ed, and primers 72). Under its default settings, BAPS includes K as a parameter to be es- fi were designed in conserved exon regions for the ampli cation of exon and timated, and the best partition of the data into K clusters is identified as – fi intron sequences (32, 54 56). The fth gene fragment, PvSHP1, is homolo- the one with the highest marginal log likelihood. The clustering with gous to the SHATTERPROOF (SHP1) gene involved in the control of fruit linked loci analysis was chosen to account for the linkage present between shattering in A. thaliana. This finding was developed in the work by Nanni sites within aligned sequences; at the same time, the five loci were as- et al. (19) that analyzed PvSHP1 nucleotide variation in a limited sample of sumed as independent. Ten iterations of K (from 1 to 20) were conducted wild P. vulgaris (29 and 16 Mesoamerican and Andean wild genotypes, re- to determine the optimal number of genetically homogeneous groups. spectively). The available PvSHP1 sequence data of 40 wild genotypes from The admixture analysis was then applied to estimate individual admixture the study by Nanni et al. (19) were included in the present analyses (Table S1). proportions with regards to the most likely number of K clusters identified It is important to note that mapping experiments show that gene du- (70, 72). Admixture inference was based on 100 realizations from the plication and highly repeated gene sequences in P. vulgaris are generally low, posterior of the allele frequencies. We repeated the admixture five times with most loci occurring as single copies (57–59). to confirm the consistency of the results. Spatial interpolation of mem-

E794 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al. Downloaded by guest on September 28, 2021 bership coefficients was performed according to the kriging method, ACKNOWLEDGMENTS. We thank G. Bertorelle for critical reading of this PNAS PLUS which was implemented in the R packages spatial (http://www.r-project. manuscript and his valuable advice. This study was supported by Italian org/). To explore the relationships among the identified clusters, an Government (MIUR) Grant n. 20083PFSXA_001, Project Progetti di Ricerca di unrooted phylogenetic tree was constructed using the NJ algorithm in Interesse Nazionale (PRIN) 2008, and the Università Politecnica delle MEGA 4 (68). Marche (2006–2010).

1. Brougthon WJ, et al. (2003) Beans (Phaseolus spp.): model . Plant Soil 29. Delgado-Salinas A, Bibler R, Lavin M (2006) Phylogeny of the genus Phaseolus 252:55–128. (Leguminosae): A recent diversification in an ancient landscape. Syst Bot 31:779–791. 2. Toro O, Tohme J, Debouck DG (1990) Wild Bean (Phaseolus vulgaris L): Description 30. Koenig R, Singh SP, Gepts P (1990) Novel phaseolin types in wild and cultivated and Distribution (Centro Internacional de Agricultura Tropical, Cali, Colombia). common bean (Phaseolus vulgaris, Fabaceae). Econ Bot 44:50–60. 3. Gepts P, Bliss FA (1985) F1 hybrid weakness in the common bean: Differential 31. Chacón MI, Pickersgill B, Debouck DG, Salvador Arias J (2007) Phylogeographic geographic origin suggests two gene pools in cultivated bean germplasm. J Hered 76: analysis of the chloroplast DNA variation in wild common bean (Phaseolus vulgaris L.) 447–450. in the . Plant Syst Evol 266:175–195. 4. Koinange EMK, Gepts P (1992) Hybrid weakness in wild Phaseolus vulgaris L. J Hered 32. Hougaard BK, et al. (2008) Legume anchor markers link syntenic regions between 83:135–139. Phaseolus vulgaris, Lotus japonicus, Medicago truncatula and Arachis. Genetics 179: 5. Delgado-Salinas A, Bonet A, Gepts P (1988) The wild relative of Phaseolus vulgaris in 2299–2312. Middle America. Genetic Resources of Phaseolus Beans, ed Gepts P (Kluwer, Boston), 33. Doyle JJ (1992) Gene trees and species trees: Molecular systematics as one character pp 163–184. . Syst Bot 17:144–163. 6. Gepts P, Debouck DG (1991) Origin, domestication, and evolution of the common 34. Mariette S, et al. (2001) Genetic diversity within and among Pinus pinaster bean, Phaseolus vulgaris. Common Beans: Research for Crop Improvement, eds populations: Comparison between AFLP and microsatellite markers. Heredity (Edinb) Voysest O, Van Schoonhoven A (CAB International, Wallingford, Oxon, United 86:469–479. Kingdom), pp 7–53. 35. Gaudeul M, Till-Bottraud I, Barjon F, Manel S (2004) Genetic diversity and 7. Singh SP, Gutiérrez JA, Molina A, Urrea C, Gepts P (1991) Genetic diversity in differentiation in Eryngium alpinum L. (Apiaceae): Comparison of AFLP and – cultivated common bean: II. Marker-based analysis of morphological and agronomic microsatellite markers. Heredity (Edinb) 92:508 518. traits. Crop Sci 31:23–29. 36. Kropf M, Comes HP, Kadereit JW (2009) An AFLP clock for the absolute dating of fi 8. Gepts P, Osborn TC, Rashka K, Bliss FA (1986) Phaseolin-protein variability in wild shallow-time evolutionary history based on the intraspeci c divergence of – forms and landraces of the common bean (Phaseolus vulgaris): Evidence for multiple southwestern European alpine plant species. Mol Ecol 18:697 708. centers of domestication. Econ Bot 40:451–468. 37. Estoup A, Angers B (1998) Microsatellites and minisatellites for molecular ecology: 9. Koenig R, Gepts P (1989) Allozyme diversity in wild Phaseolus vulgaris: Further Theoretical and experimental considerations. Advances in Molecular Ecology,ed – evidence for two major centers of genetic diversity. Theor Appl Genet 78:809–817. Carvalho G (IOS Press, Amsterdam), pp 55 86. 10. Becerra-Velásquez VL, Gepts P (1994) RFLP diversity in common bean (Phaseolus 38. Garoia F, Guarniero I, Grifoni D, Marzola S, Tinti F (2007) Comparative analysis of fi vulgaris L.). Genome 37:256–263. AFLPs and SSRs ef ciency in resolving population genetic structure of Mediterranean – 11. Freyre R, Ríos R, Guzmán L, Debouck D, Gepts P (1996) Ecogeographic distribution of Solea vulgaris. Mol Ecol 16:1377 1387. Phaseolus spp. (Fabaceae) in Bolivia. Econ Bot 50:195–215. 39. Vigouroux Y, et al. (2002) Identifying genes of agronomic importance in by 12. Papa R, Gepts P (2003) Asymmetry of gene flow and differential geographical screening microsatellites for evidence of selection during domestication. Proc Natl – structure of molecular diversity in wild and domesticated common bean (Phaseolus Acad Sci USA 99:9650 9655. 40. Thuillet AC, Bataillon T, Poirier S, Santoni S, David JL (2005) Estimation of long-term vulgaris L.) from Mesoamerica. Theor Appl Genet 106:239–250. effective population sizes through the history of durum wheat using microsatellite 13. Rossi M, et al. (2009) Linkage disequilibrium and population structure in wild and data. Genetics 169:1589–1599. domesticated populations of Phaseolus vulgaris L. Evol Appl 2:504–522. 41. Nei M, Maruyama T, Chakraborty R (1975) The bottleneck effect and genetic 14. Kwak M, Gepts P (2009) Structure of genetic diversity in the two major gene pools of variability in populations. Evolution 29:1–10. common bean (Phaseolus vulgaris L., Fabaceae). Theor Appl Genet 118:979–992. 42. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate SCIENCES 15. Angioi SA, et al. (2009) Development and use of chloroplast microsatellites in genes. Science 290:1151–1155. AGRICULTURAL Phaseolus spp. and other legumes. Plant Biol (Stuttg) 11:598–612. 43. Tohme J, Gonzales DO, Beebe S, Duque MC (1996) AFLP analysis of gene pools of 16. Morrell PL, Clegg MT (2007) Genetic evidence for a second domestication of barley a wild bean core collection. Crop Sci 36:1375–1384. (Hordeum vulgare) east of the Fertile Crescent. Proc Natl Acad Sci USA 104: 44. Hammer K (1984) The (Das Domestikationssyndrom). 3289–3294. Kulturpflanze 32:11–34. 17. Özkan H, Willcox G, Graner A, Salamini F, Kilian B (2010) Geographic distribution and 45. Doebley J, Stec A, Wendel J, Edwards M (1990) Genetic and morphological analysis of domestication of wild emmer wheat (Triticum dicoccoides). Genet Resour Crop Evol a maize-teosinte F population: Implications for the origin of maize. Proc Natl Acad 58:11–53. 2 Sci USA 87:9888–9892. 18. Chacón S MI, Pickersgill B, Debouck DG (2005) Domestication patterns in common 46. Harlan JR (1992) Crops and Man (ASA, Madison, WI). bean (Phaseolus vulgaris L.) and the origin of the Mesoamerican and Andean 47. Grandillo S, Tanksley SD (1996) QTL analysis of horticultural traits differentiating the cultivated races. Theor Appl Genet 110:432–444. cultivated from the closely related species Lycopersicon pimpinellifolium. 19. Nanni L, et al. (2011) Nucleotide diversity of a genomic sequence similar to Theor Appl Genet 92:935–951. SHATTERPROOF (PvSHP1) in domesticated and wild common bean (Phaseolus vulgaris 48. Koinange EMK, Singh SP, Gepts P (1996) Genetic control of the domestication L.). Theor Appl Genet 123:1341–1357. syndrome in common bean. Crop Sci 36:1037–1045. 20. Vitte C, Ishii T, Lamy F, Brar D, Panaud O (2004) Genomic paleontology provides 49. Xiong LZ, Liu KD, Dai XK, Xu CG, Zhang Q (1999) Identification of genetic factors evidence for two distinct origins of Asian rice (Oryza sativa L.). Mol Genet Genomics controlling domestication-related traits of rice using an F population of a cross – 2 272:504 511. between Oryza sativa and O. rufipogon. Theor Appl Genet 98:243–251. 21. Londo JP, Chiang YC, Hung KH, Chiang TY, Schaal BA (2006) Phylogeography of Asian 50. Poncet V, et al. (1998) Genetic analysis of the domestication syndrome in pearl millet fi wild rice, Oryza ru pogon, reveals multiple independent domestications of cultivated (Pennisetum glaucum L, Poaceae): Inheritance of the major characters. Heredity 81: – rice, Oryza sativa. Proc Natl Acad Sci USA 103:9578 9583. 648–658. 22. Molina J, et al. (2011) Molecular evidence for a single evolutionary origin of 51. Poncet V, et al. (2000) Genetic control of domestication traits in pearl millet – domesticated rice. Proc Natl Acad Sci USA 108:8351 8356. (Pennisetum glaucum L, Poaceae). Theor Appl Genet 100:147–159. 23. Debouck DG, Toro O, Paredes OM, Johnson WC, Gepts P (1993) Genetic diversity and 52. Singh SP (2001) Broadening the genetic base of common bean cultivars: A review. ecological distribution of Phaseolus vulgaris in northwestern South America. Econ Bot Crop Sci 41:1659–1675. – 47:408 423. 53. Doyle JJ, Doyle JL (1987) A rapid DNA isolation procedure for small quantities of fresh fi 24. Kami J, Velásquez VB, Debouck DG, Gepts P (1995) Identi cation of presumed tissue. Phytochem Bull 19:11–15. ancestral DNA sequences of phaseolin in Phaseolus vulgaris. Proc Natl Acad Sci USA 54. Fredslund J, Schauser L, Madsen LH, Sandal N, Stougaard J (2005) PriFi: Using 92:1101–1104. a multiple alignment of related sequences to find primers for amplification of 25. Gepts P, Papa R, Coulibaly S, González Mejía A, Pasquet R (1999) Wild legume homologs. Nucleic Acids Res 33:W516–W520. diversity and domestication—insights from molecular methods. Wild Legumes: 55. Fredslund J, et al. (2006) A general pipeline for the development of anchor markers Proceedings of the Seventh MAFF International Workshop on Genetic Resources,ed for comparative genomics in plants. BMC Genomics 7:207. Vaughan D (National Institute of Agrobiological Resources, Tsukuba, Japan), pp 56. Fredslund J, et al. (2006) GeMprospector—online design of cross-species genetic 19–31. marker candidates in legumes and grasses. Nucleic Acids Res 34:W670–W675. 26. Schmit V, Jardin P, Baudoin JP, Debouck DG (1993) Use of chloroplast DNA 57. Vallejos CE, Sakiyama NS, Chase CD (1992) A molecular marker-based linkage map of polymorphisms for the phylogenetic study of seven Phaseolus taxa including P. Phaseolus vulgaris L. Genetics 131:733–740. vulgaris and P. coccineus. Theor Appl Genet 87:506–516. 58. Freyre R, et al. (1998) Towards an integrated linkage map of common bean. 4. 27. Freytag GF, Debouck DG (2002) Taxonomy, Distribution, and Ecology of the Genus Development of a core map and alignment of RFLP maps. Theor Appl Genet 97: Phaseolus (Leguminosae-Papilionoideae) in North America, Mexico and Central 847–856. America (Botanical Research Institute Of Texas, Ft. Worth, TX). 59. McClean PE, Lee RK, Otto C, Gepts P, Bassett MJ (2002) Molecular and phenotypic 28. Delgado-Salinas A, Turley T, Richman A, Lavin M (1999) Phylogenetic analysis of the mapping of genes controlling seed coat pattern and color in common bean cultivated and wild species of Phaseolus (Fabaceae). Syst Bot 24:438–460. (Phaseolus vulgaris L.). J Hered 93:148–152.

Bitocchi et al. PNAS | Published online March 5, 2012 | E795 Downloaded by guest on September 28, 2021 60. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein 68. Tamura K, Dudley J, Nei M, Kumar S (2007) MEGA4: Molecular Evolutionary Genetics database search programs. Nucleic Acids Res 25:3389–3402. Analysis (MEGA) software version 4.0. Mol Biol Evol 24:1596–1599. 61. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high 69. Corander J, Waldmann P, Sillanpää MJ (2003) Bayesian analysis of genetic – throughput. Nucleic Acids Res 32:1792 1797. differentiation between populations. Genetics 163:367–374. 62. Hall TA (1999) BioEdit: A user-friendly biological sequence alignment editor and 70. Corander J, Marttinen P (2006) Bayesian identification of admixture events using analysis program for Windows 95/98/NT. Nucleic Acids Symp Ser 41:95–98. multilocus molecular markers. Mol Ecol 15:2833–2843. 63. Librado P, Rozas J (2009) DnaSP v5: A software for comprehensive analysis of DNA 71. Corander J, Tang J (2007) Bayesian analysis of population structure based on linked polymorphism data. Bioinformatics 25:1451–1452. molecular information. Math Biosci 205:19–31. 64. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. 72. Corander J, Marttinen P, Sirén J, Tang J (2008) Enhanced Bayesian modelling in BAPS Genetics 105:437–460. software for learning genetic structures of populations. BMC Bioinformatics 9:539. 65. Watterson GA (1975) On the number of segregating sites in genetical models without 73. Pritchard JK, Stephens M, Donnelly P (2000) Inference of population structure using recombination. Theor Popul Biol 7:256–276. – 66. Tajima F (1989) Statistical method for testing the neutral mutation hypothesis by DNA multilocus genotype data. Genetics 155:945 959. polymorphism. Genetics 123:585–595. 74. Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using 67. Bandelt HJ, Forster P, Röhl A (1999) Median-joining networks for inferring multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164: intraspecific phylogenies. Mol Biol Evol 16:37–48. 1567–1587.

E796 | www.pnas.org/cgi/doi/10.1073/pnas.1108973109 Bitocchi et al. Downloaded by guest on September 28, 2021