doi: 10.1111/j.1469-1809.2007.00421.x

A Perspective on the History of the Iberian Gypsies Provided by Phylogeographic Analysis of Y-Chromosome Lineages

A. Gusmão1, L. Gusmão1,∗, V. Gomes1, C. Alves1, F. Calafell2, A. Amorim1,3 and M. J. Prata1,3

1Ipatimup, Instituto de Patologia e Imunologia da Universidade do Porto. R. Dr. Roberto Frias, s/n. 4200-465 Porto. Portugal
2Unitat de Biologia Evolutiva, Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra. 08003 Barcelona, Catalonia,
3Faculty of Sciences, University of Porto, Pr. Gomes Teixeira, 4099-002 Porto. Portugal

Summary The European Gypsies, commonly referred to as Roma, are represented by a vast number of groups spread across many countries. Although sharing a common origin, the Gypsy groups are highly heterogeneous as a consequence of genetic drift and different levels of admixture with surrounding populations. With this study we aimed at contributing to the knowledge of the Roma history by studying 17 Y-STR and 34 Y-SNP loci in a sample of 126 Portuguese Gypsies. Distinct genetic hallmarks of their past and migration route were detected, namely: an ancestral component, shared by all Roma groups, that reflects their origin in India (H1a-M82; ∼17%); an influence from their long permanence in the Balkans/Middle-East region (J2a1b-M67, J2a1b1-M92, I-M170, Q-M242; ∼31%); traces of contacts with European populations preceding the entrance in the Iberian Peninsula (R1b1c-M269, J2b1a-M241; ∼10%); and a high proportion of admixture with the non-Gypsy population from Iberia (R1b1c-M269, R1-M173/del.M269, J2a-M410, I1b1b-M26, E3b1b-M81; ∼37%). Among the Portuguese Gypsies the proportion of introgression from host populations is higher than observed in other groups, a fact which is somewhat unexpected since the arrival of the Roma to Portugal is documented to be more recent than in Central or East .

Keywords: Portuguese Gypsies, Gypsy diaspora, Roma, Y-chromosome lineages, Y-SNP, Y-STR haplotypes

∗ Corresponding Author: Leonor Gusmão, Ipatimup, Rua Dr. Roberto Frias, s/n, 4200-465 Porto, Portugal. Tel: +351 22 5570700; Fax: +351 22 5570799; E-mail address: [email protected]

Introduction

Portugal is the westernmost region reached by the Gypsy diaspora after the Roma people arrived in Europe 600–700 years ago (Liégeois, 1989; Fraser, 1998; Kendrick & Puxon, 1998). The establishment of Gypsy groups in Portugal is recorded since the second half of the 15th century and at present they are estimated to amount to 30–50 thousand individuals. The Portuguese Gypsies are likely to be a branch of the group that crossed the Pyrenees as early as the first quarter of the 15th century and rapidly spread throughout Spain and Portugal. Contrary to many other Roma, the Iberian Gypsies, known as Gitanos in Spain and Ciganos in Portugal, are non-Romani-speakers. After entering Iberia they progressively lost the original language, and while nowadays many Gitanos still speak Caló, which is basically Spanish with a large amount of Romani loan words, the Ciganos from Portugal speak Portuguese, with Caló being just a reminiscent reference language (Fraser, 1998).

The Roma in Portugal, as indeed the Roma elsewhere, are a transnational genetic isolate which fulfils the properties that make genetic isolates an interesting resource in genetic epidemiology; namely, they have reduced genetic diversity and increased linkage disequilibrium (González-Neira et al. personal communication). The Roma therefore present a particular genetic disease spectrum, with some prevalent diseases almost absent, others specific to Gypsies, and others with private Roma mutations (for a review, see Kalaydjieva et al. 2001b). Presently in Portugal, Gypsy communities are spread all over the country and represent indeed a conspicuous component of the Portuguese social

© 2008 The Authors. Journal compilation © 2008 University College London. Annals of Human Genetics (2008) 72, 215–227

and demographic landscape. Despite that, little is known about the Portuguese Gypsies. How are they related to other Gypsy groups? To what extent did interaction with other Portuguese contribute to changing the founder gene pool? What signs do they still retain from the complex history that pre-dated their entrance into Iberia?

Such questions cannot be satisfactorily answered by applying conventional historical approaches, because the proper sources are almost non-existent. As happens with other Roma groups, the Ciganos from Portugal lack a written history of their own and the documentary sources from non-Gypsies (Coelho, 1996) are limited to episodic references.

The absence of records constitutes a major complication in Gypsy studies and explains why the most instructive evidence about the original homeland of the proto-Gypsies has resulted from linguistics and genetics. Linguistics has provided compelling evidence of an ancestral origin in the Indian subcontinent (Liégeois, 1989; Fraser 1998; Kendrick & Puxon, 1998). Evidence for this hypothesis has been consistently found in genetic analyses (e.g. Gresham et al. 2001; Morar et al. 2004; Malyarchuk et al. 2006), which, although essentially centred on non-Iberian Gypsies, have further revealed a particularly high level of sub-structuring among Gypsy groups, in sharp contrast to the relatively uniform surrounding European populations. Small population size, strong drift effects, limited inter-group gene flow and differential admixture with the host populations seem to have acted together, leading to strong genetic differentiation between Gypsy groups. This means that no group of Gypsies can be representative of other groups, which consequently makes the group the basic unit of genetic research, as it is of social organisation (Fraser, 1998).

In view of this and in order to address the questions raised above about the Portuguese Ciganos, in this study we firstly examined a sample of 126 unrelated Gypsy males with a high-resolution Y-chromosome STR and SNP typing strategy; secondly we exploited the results in the context of previously published data on other Gypsy groups and respective host populations; and finally we tried to unfold layers of genetic male lineages in the Portuguese Gypsies to retrieve information about major events in their demographic history.

Material and Methods

Sample Collection

Blood samples were taken under informed consent from 126 unrelated Gypsy males, from 18 different communities in 11 different Portuguese districts. Personal inquiries were made to each individual in order to avoid close kinship. Genomic DNA was extracted according to the standard phenol-chloroform method.

Y-STR Typing

Seventeen Y-STR loci (DYS19, DYS385, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635 and GATA H4.1) were amplified using the AmpFlSTR Yfiler PCR Amplification Kit (AB Applied Biosystems), following the manufacturer's instructions. Genotypes were produced with a 310 Genetic Analyser (AB Applied Biosystems), and by comparison to the reference sequenced ladders provided with the kit. The nomenclature was according to the ISFG recommendations (Gusmão et al. 2006) and alleles for the GATA H4.1 locus were named by adding nine repeats to the allele numbering of the typing kit (Mulero et al. 2006).

Y-SNP Typing

Thirty-four Y-SNP markers were hierarchically typed, in order to define the major male lineages (Figure 1). All samples were typed for the biallelic markers SRY1532, M213, M9, M70, M22, Tat, 92R7, M173 and P25, as previously described by Brión et al. (2005). For the remaining SNPs, the following typing scheme was used:
- M201, M170, 12f2, M26, M62 and M172 were typed in samples carrying the M213 mutation and lacking M9, as in Brión et al. (2005);
- M96, M35, M78, M81, M123 and M34 were typed in samples carrying only the SRY1532 mutation, as previously described by Brión et al. (2005);
- M242 was tested in one individual that fell within P∗(xR1). This mutation was typed by SNaPshot according to Blanco-Verea et al. (2006);
- M269, M18 and M73 were screened in all individuals within the R1∗ haplogroup. M18 and M73 were typed by SNaPshot using the primers described in Brión et al. (2005). M269 was typed by RFLP (Supplementary Table S1);
- F∗(xGIJK) individuals were screened for the M69, M52, M82 and Apt mutations in a newly implemented multiplex SNaPshot reaction (Supplementary Table S1);
- Samples carrying M172 were tested for M410, M67, M92, M12 and M241, also using SNaPshot methods (Supplementary Table S1).

The YCC (2002) and Jobling & Tyler-Smith (2003) nomenclature was updated according to Cruciani et al. (2004); Sengupta et al. (2006); Semino et al. (2004) and Cinnioğlu et al. (2004).

Data Analysis

Haplotype diversity, mean number of pairwise differences and pairwise genetic distances (Rst for STR haplotypes and Fst for SNP haplogroups) were calculated using the Arlequin software ver. 3.01 (Excoffier & Schneider, 2005). In genetic distance analyses, DYS385 was not considered, and the number of repeats in DYS389I was subtracted from DYS389II. Haplogroup frequencies were estimated by direct counting.
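The two data-preparation steps described above (subtracting DYS389I from DYS389II, and estimating haplotype diversity) can be illustrated with a minimal Python sketch. It assumes the standard Nei estimator, H = n/(n − 1) · (1 − Σ p_i²); the function names and the toy haplotypes are hypothetical and are not taken from the study or from Arlequin.

```python
from collections import Counter


def collapse_dys389(dys389i: int, dys389ii: int) -> int:
    """Return the DYS389II repeat count with the nested DYS389I block removed."""
    return dys389ii - dys389i


def haplotype_diversity(haplotypes) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are needed")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy sample (hypothetical values): tuples of (DYS19, DYS389I, DYS389II).
sample = [(15, 14, 30), (15, 14, 30), (14, 13, 29), (16, 13, 30)]

# Replace DYS389II by DYS389II minus DYS389I before any distance or diversity analysis.
prepared = [(d19, d389i, collapse_dys389(d389i, d389ii))
            for d19, d389i, d389ii in sample]

diversity = haplotype_diversity(prepared)
```

The subtraction matters because the DYS389II amplicon contains the DYS389I repeat block, so leaving it uncorrected would count the same variation twice.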


Figure 1 Phylogenetic network of Y-SNP haplogroups, indicating the biallelic markers typed in the present work and their absolute frequencies in a sample of 126 Portuguese Gypsies. See Adams et al. (2006) for details on P25 recurrence.
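The hierarchical typing scheme summarised in Figure 1 can be read as a set of rules that decide which markers to type next, given the results obtained so far. The sketch below is one possible encoding in Python, not the authors' laboratory protocol: the rule table, the `next_markers` helper and the mapping of haplogroup labels to defining markers (G-M201, I-M170, J-12f2, K-M9, P-92R7, R1-M173) are illustrative assumptions.

```python
# Markers typed in every sample (first tier).
FIRST_TIER = ["SRY1532", "M213", "M9", "M70", "M22", "Tat", "92R7", "M173", "P25"]

# Each rule: (label, markers that must be derived, markers that must be
# ancestral, follow-up panel). Labels are for readability only.
RULES = [
    ("M213 without M9", {"M213"}, {"M9"},
     ["M201", "M170", "12f2", "M26", "M62", "M172"]),
    ("SRY1532 only", {"SRY1532"}, set(FIRST_TIER) - {"SRY1532"},
     ["M96", "M35", "M78", "M81", "M123", "M34"]),
    ("P(xR1)", {"92R7"}, {"M173"}, ["M242"]),
    ("R1", {"M173"}, set(), ["M269", "M18", "M73"]),
    ("F*(xGIJK)", {"M213"}, {"M201", "M170", "12f2", "M9"},
     ["M69", "M52", "M82", "Apt"]),
    ("M172 carriers", {"M172"}, set(), ["M410", "M67", "M92", "M12", "M241"]),
]


def next_markers(results: dict) -> list:
    """Return the untyped markers whose rule is satisfied by the current results.

    `results` maps marker name -> True (derived) or False (ancestral);
    markers not yet typed are simply absent from the dictionary.
    """
    pending = []
    for _label, derived, ancestral, panel in RULES:
        derived_ok = all(results.get(m) is True for m in derived)
        ancestral_ok = all(results.get(m) is False for m in ancestral)
        if derived_ok and ancestral_ok:
            pending.extend(m for m in panel
                           if m not in results and m not in pending)
    return pending


# Hypothetical sample: derived at SRY1532 and M213, ancestral at the rest of tier 1.
results = {m: False for m in FIRST_TIER}
results.update({"SRY1532": True, "M213": True})
first_pass = next_markers(results)

# Suppose the second tier shows the 12f2 and M172 derived states.
results.update({"M201": False, "M170": False, "12f2": True,
                "M26": False, "M62": False, "M172": True})
second_pass = next_markers(results)
```

Encoding the scheme as data rather than nested conditionals keeps each typing rule visible on one line, which makes it easier to compare against the list of rules in the Y-SNP Typing section.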

The YHRD - Y Chromosome Haplotype Reference Database (Willuweit & Roewer, 2007), URL: www.ystr.org, release 22 from August 10, 2007, was used for haplotype match analysis.

Phylogenetic median-joining networks of Y-STR haplotypes inside haplogroups were constructed using Network 4.2.0.0 (www.fluxus-technology.com).

Results and Discussion

Y-STR Haplotype Diversity

The analysis of 17 Y-STR loci in the Portuguese Gypsies revealed 52 different haplotypes yielding a proportion of 42% of distinct lineages (Supplementary Table S2). Out of these, 38 were found only once (73.1% of unique haplotypes) while the remaining 14 were shared by two or more males. Three of the shared haplotypes were very well represented (H18-7.9%; H21-15.9% and H24-13.5%), reaching unusually high values in comparison to those found in most populations for lineages defined by such a high number of Y-STRs (e.g. Alves et al. 2007; Turrina et al. 2006; Berger et al. 2005; Soltyszewski et al. 2007). In most European populations haplotype diversity is higher than 99.9%, in contrast with the low value now detected in the Portuguese Gypsies (0.9437 ± 0.0107).

Previous studies (Gresham et al. 2001; Zaharova et al. 2001; Ploski et al. 2002; Nagy et al. 2007; Füredi et al.


1999; Peričić et al. 2005a; Kalaydjieva et al. 2001a) had already assessed Y-STR diversity in other Roma groups (see Table 1) using, however, very distinct sets of loci. Only seven Y-STRs were common to all studies, namely DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393. In view of that, comparisons with Roma and their host populations (which were available for all except Slovakia) were based exclusively upon those seven loci. Haplotype diversity in the Portuguese Gypsies, using these markers, was 0.9097 ± 0.0121, fitting well within the range of values observed in other Roma groups, which varied between a minimum of 0.8430 ± 0.0218 in Bulgarian Gypsies and a maximum of 0.9200 ± 0.0373 in Spanish Gypsies, both studied by Gresham et al. (2001). Y-STR diversity in each Gypsy group was remarkably lower than in any non-Gypsy host population, where diversities were always higher than 0.98.

The reduced diversities in all Gypsy groups are signs of drift effects acting on their pool of Y-STR lineages, and such effects can be explained by the small population size of the Roma groups, together with the deeply rooted endogamy that characterises the Gypsy people (Kalaydjieva et al. 2001b; Gresham et al. 2001).

Pairwise Genetic Distance Analysis

Pairwise genetic affinities between Gypsy and non-Gypsy host populations were evaluated through Rst distances. On average, the highest genetic distances were observed in comparisons involving a Gypsy and a non-Gypsy population (Table 1), with all values reaching statistical significance. Among the non-Gypsy pairs, absence of differentiation was only detected between Spain/Portugal, Bulgaria/Macedonia and Bulgaria/ , which denotes a Y-STR sub-structuring pattern visibly influenced by geographical distance. Geography does not seem to play such a linear role in the inter-group differentiation between Gypsies. In fact the Spanish Gypsies differed neither from their Iberian neighbours, the Portuguese Gypsies, nor from the distant Lithuanian or Slovakian groups. Accordingly, while genetic distances in the European host populations were found to be correlated with geographic distance (Figure 2), in agreement with the general picture widely reported for Europe (Rosser, 2000), the same was not observed in the Gypsy groups (r2 = 0.0499; P = 0.2529; Figure 2), as previously reported by Gresham et al. (2001) in a study restricted to Gypsies from , Spain and .

Figure 2 Correlation between genetic and geographic distances in Roma and non-Roma populations. Two panels (pairs of Roma populations; pairs of non-Roma populations), each plotting geographic distance (km) against genetic distance (Rst). Roma pairs: R2 = 0.0499, P = 0.2529; non-Roma pairs: R2 = 0.6493, P = 0.0002.

The overall structure of the distance matrix can be visualised by means of multidimensional scaling (MDS, Fig. 3), in which Roma and non-Roma groups are clearly separated. Such separation is statistically significant: the difference between groups accounted for 14.8% of the total genetic variance (as computed with AMOVA; p = 0.001), and was larger than that within groups (8.0%, p < 0.001). Differentiation among Roma (Fst = 8.71%, p < 0.001) was larger than among non-Roma (Fst = 7.70%, p < 0.001). The high genetic distances observed between Roma groups pinpoint the fast differentiation that Gypsy populations experienced during their dispersion in Europe.

Y-SNP Haplogroup Variation

In the Portuguese Gypsies, 17 different Y-SNP haplogroups were detected (Figure 1), of which five accounted for 88% of their chromosomes, namely R1b1c-M269, J2a1b-M67, J2a-M410, H1a and I-M170.

Comparing the Y-SNP profiles of the Gypsy and non-Gypsy Portuguese populations (Beleza et al. 2003), highly significant differences were observed (Fst = 0.152; P < 0.001). What mainly differentiates them is the low prevalence of R1b1c in Gypsies (∼27% against ∼60% in non-Gypsies), the much higher frequency of J lineages in Gypsies (∼40% versus ∼10%), as well as the absence of haplogroup H in non-Gypsies, whereas in the Gypsy sample it reached a proportion nearing 17%. Furthermore, although the number of different haplogroups in both samples is similar, the frequency


distribution in Gypsies was much less monotonous than in non-Gypsies, as indicated by haplogroup diversities that are 0.8367 ± 0.0138 and 0.6520 ± 0.0206, respectively. The high haplogroup diversity in Gypsies can be explained by admixture during the history of the group, which can be simplistically outlined as encompassing three main phases: the period beyond the early establishment and fragmentation of founder groups in Europe, possibly in the Balkan region (since it represented the fons et origo of Roma in Europe; Fraser 1998), the fast journey towards Western Europe, and the arrival in Iberia and subsequent settlement in Portugal.

Figure 3 Multidimensional scaling plot (Dimension 1 versus Dimension 2) of Rst pairwise values between Roma (grey circles) and non-Roma (black circles) populations. Codes for populations as in Table 1.

However, haplogroup diversity does not reflect other relevant demographic processes, such as founder effects, although they are quite evident when comparing levels of diversity within haplogroups assessed from STR variability. Dramatically low diversities characterise most of the haplogroups present more than once in the sample (Supplementary Table S2), being particularly clear within (i) J2a∗(xJ2a1b)-M410, in which 20 out of 21 chromosomes share exactly the same 16-STR haplotype; (ii) the 25 J2a1b∗(xJ2a1b1)-M67 lineages, where 17 individuals belong to the same STR background and 5 belong to a different one; or (iii) I∗(xI1b1b)-M170, with 10 out of 12 individuals sharing a given haplotype.

Phylogeographical Analyses

Haplogroup H

Haplogroup H, defined by the M69 mutation (Underhill et al. 2001), possibly arose in the boundaries of present-day India (Sengupta et al. 2006) where it is widespread throughout the entire subcontinent, reaching the highest frequencies in Central and Southern India. Being a major haplogroup in all European Roma but absent in their host populations, it provides clear genetic evidence of the ancestral geographical origin of the proto-Gypsies (Gresham et al. 2001; Kalaydjieva et al. 2001a).

Haplogroup H was one of the most represented in our sample (16.7%). All individuals inside this haplogroup were H1a, carrying both the M52 and M82 mutations. Figure 4A depicts the network constructed after recovering the STR information currently available on H1 chromosomes of different Gypsy groups (Gresham et al. 2001; Kalaydjieva et al. 2001a). It shows a star-shaped network where an outstandingly high number of chromosomes lie in a modal haplotype, shared by all Gypsy groups, and the remaining differing from the modal by no more than two molecular steps. Taking together all Gypsy H1 chromosomes, the internal STR diversity was remarkably reduced with a haplotype diversity of 0.398 ± 0.049 and a mean variance across the seven STR loci of 0.033. This pattern of STR diversity within H1 is a clear indication that a profound bottleneck happened in the most distant past of the Gypsy people, suggesting that, in full accordance with Gresham et al. (2001), a remarkably small number of founders must have been present in the ancestral group from which new Roma populations emerged.

When searching for matches with the Gypsy H1 modal haplotype (DYS19∗15, DYS389I∗14, DYS389II∗30, DYS390∗22, DYS391∗10, DYS392∗11 and DYS393∗12) it was possible to observe that it was shared by two tribal groups from Southern India, being rather common among


one of them (Kivisild et al. 2003). Additionally, four exact matches for the Gypsy H1 haplotype extended to three more markers (DYS437∗14, DYS438∗9 and DYS439∗11) were also found among individuals belonging to castes of Indian warriors, two of them from the North (Zerjal et al. 2007). These findings illustrate how difficult it is to address the question: from which specific Indian region and social stratum did the first Gypsies arise? The widespread, although uneven, distribution of the Gypsy H1 modal haplotype in different Indian populations (either castes or tribes from North or South) means that it is not possible to exclude any scenario for the Indian source population of the first wave of migration.

Figure 4 Median-joining networks of Y-STR haplotypes within haplogroups H1, J2a1b∗(xJ2a1b1), I and R1∗ of Gypsy populations. For each haplogroup the area of the circles is proportional to the number of individuals; the smallest area is equivalent to one individual. Within the pie-charts, orange refers to Portuguese, blue to Bulgarian (Gresham et al. 2001 and Kalaydjieva et al. 2001a), yellow to Spanish, and red to Lithuanians (Gresham et al. 2001). Arrows assign lineages possessing the Dinaric modal haplotype (DMH) or the Nordic modal haplotype (NMH). ∗ identifies the Portuguese individual belonging to I1b1b.

Portuguese Gypsies preserved a low proportion of H lineages in comparison to other Roma from Europe (17% against ∼80% in the Bulgarians, for instance).

Haplogroup J

Over one third of the Portuguese Gypsy males belong to haplogroup J2, being distributed in five different


subgroups. Of the two subgroups defined by mutation M67, J2a1b∗(xJ2a1b1) was very well represented, accounting for ∼20% of the Gypsy males. J2a1b∗ was also commonly observed in the Roma from Bulgaria, Lithuania and Spain (Gresham et al. 2001).

The network produced with the STR haplotypes within J2a1b∗ lineages (Figure 4B) was very similar to the one obtained for haplogroup H1. However, it is worth noticing the almost opposite gradient of distribution of H1 and J2a1b∗ between the Bulgarian Gypsies, on one side, and the Portuguese, Spanish and Lithuanian, on the other. The frequency of J2a1b∗ in the latter three groups varies between approximately 21% and 33%, whereas in the Bulgarian Roma it just attains 9%. Adding to these findings the non-detection of J2a1b∗(xJ2a1b1) among the Macedonian Roma (Peričić et al. 2005b), it seems likely that the haplogroup was not integrated in the ancestral gene pool of the first proto-Roma migrating from India, and that later on, during the fragmentation of the Roma settled in the Balkan region (which gave rise to the groups that continued to move west and northward in Europe), strong drift effects operated successively, enhancing the frequency of J2a1b∗(xJ2a1b1) in a group that would turn out to be a common founder of both Iberian and North European Roma.

The spatial distribution pattern of the haplogroup in Eurasia makes it likely that indeed the proto-Roma already carried J2a1b∗(xJ2a1b1) when they first arrived in Europe. Like other subgroups of the J2a-M410 cluster, its presence in Eurasia is widely recognised to be associated with the demographic spread of the farmers. In India, M67 lineages were found to be virtually absent (Zerjal et al. 2007; Sengupta et al. 2006) although two isolated occurrences have been reported (Kivisild et al. 2003). The region where their frequency peaks is South , but they have been observed from the Middle East through and also diffused throughout Europe. Moreover, since higher frequencies and variances have been registered in Europe and Turkey than in the Middle East, it has been suggested that M67 chromosomes arrived in Europe from Turkey (Semino et al. 2004), probably following an Aegean route (Di Giacomo et al. 2004).

Having this in mind and knowing from historical sources that Turkey and Greece represented open doors for the influx of Roma into the Balkan region, it seems quite admissible that the introduction of J2a1b∗(xJ2a1b1) in the Gypsy gene pool occurred prior to their arrival in the Balkans, maybe in the Turkish region where the Gypsies settled for a long time before entering Europe, either crossing the Bosphorus or following an Aegean route.

Besides M67, one isolated chromosome further harboured M92, thus falling into J2a1b1. This haplogroup was also observed at low frequencies in the Bulgarian and Spanish Gypsies (Gresham et al. 2001), with the STR background of two of the Bulgarian males differing by just one molecular step from the Portuguese J2a1b1 lineage. J2a1b1 has been predominantly found in the Northern Mediterranean from westward (Di Giacomo et al. 2004), having an overall geographical distribution and history of dispersion quite similar to J2a1b∗(xJ2a1b1). Thus, an identical explanation can be argued for the introduction of both lineages in the Gypsies, though not followed in the case of J2a1b1 by the strong drift effects that drove the distribution of J2a1b∗(xJ2a1b1).

Another very common J lineage in the Portuguese Gypsies was J2a∗(xJ2a1b), reaching 16.7%. It is characterised by the absence of M67 and the presence of the very recently discovered mutation M410 (Sengupta et al. 2006). It seems extremely unlikely that such lineages were present among the Romani groups studied by Gresham et al. (2001), since with their typing strategy only five out of 252 samples were not assigned to a specific haplogroup and these five samples had microsatellite backgrounds very distinct from the J2a∗(xJ2a1b) chromosomes now detected. As the pattern of geographical distribution of J2a∗(xJ2a1b) is still incipiently known, in order to address the question of its origin among the Portuguese Gypsies we recruited the STR information anchored in those lineages and conducted a haplotype match analysis using the YHRD. Based on the 'minimal haplotype+SWGDAM core set' of the database (11 STR loci) we only observed a full match in a worldwide population sample of 23,979 haplotypes, in a set of 221 populations, and the match was with a Portuguese sample. This led us to presume that admixture with Portuguese non-Gypsies was responsible for the introduction of the haplogroup in the Gypsy gene pool and again a strong founder effect extraordinarily raised its frequency. Interestingly, 20 of the 21 J2a∗(xJ2a1b) chromosomes shared the same 16-STR haplotype which was only one molecular step distant from the first. Given that the founder haplotype appeared in Gypsy communities spread throughout Portugal, the introduction event must have occurred soon after the arrival of the first Gypsies in the country.

Each of the remaining two J lineages detected, J2b-M12 and J2b1a-M241, was observed only once in the sample. Consulting the YHRD, we found eleven full matches with the STR haplotype harboured by the J2b chromosome in individuals from Romania, Macedonia, Bosnia, Albania and . However, the same haplotype did not match any of the Romani samples contained in the database, namely from Hungary (N = 224), Bulgaria (N = 81), Turkey (N = 111), and Slovakia (N = 63). We can admit that some J2b-M12 lineages were assimilated in the Balkan region.


Concerning J2b1a-M241, several haplotypes molecularly related to the Portuguese Gypsy J2b1a were observed disseminated in Western European populations and a Pyrenean sample showed a full match to it. Therefore, this might have been incorporated during their passage through Western Europe.

Haplogroup I

The M170 mutation defining haplogroup I was carried by 10.3% of Portuguese Gypsy males. The clade contains a variety of lineages restricted to Europe and almost absent elsewhere, including in the Near-East and Middle-East. In Europe, where the haplogroup arose in middle Upper Palaeolithic times (Semino et al. 2000; Rootsi et al. 2004), two distinct areas of very high incidence are observed, one located around Scandinavia and the other around the Dinaric Alps in the Balkan region. These two foci of frequency are due to different I sub-clades, namely I1a and I1b∗, whose defining mutations were not assessed in this study. Besides M170, only M26 was additionally tested and the sub-clade it defines, I1b1b, was found in one out of the 13 I chromosomes. The Portuguese Gypsy lineage classified as I1b1b had an STR background quite distinct from the others but differing by just two steps from a chromosome of a Spanish Gypsy, which may also be I1b1b. Western Europe is the region where I1b1b peaks and where the defining mutation, M26, most possibly arose in a population from Iberia/South France (Rootsi et al. 2004). A match analysis with the extended haplotype revealed that the Portuguese I1b1b chromosome fully matched six records in the YHRD, all from Portugal or Spain. We can safely conclude that the introgression of I1b1b in the Iberian Gypsies occurred after the Pyrenees crossing.

Gresham et al. (2001) showed that M170 lineages were present in Roma from Bulgaria (∼25%), Spain (∼15%) and Lithuania (∼5%); Peričić et al. (2005b) also found them among the Macedonian Romani (∼5%).

In the YHRD, for the most frequent haplotype inside I∗(xI1b1b), we found 20 identical haplotypes. Fifteen were in Eastern Europe, eight of them in Gypsy populations from Slovakia and Hungary, supporting their possible incorporation in the Balkan region.

The phylogenetic network reconstructed with the STR data within I chromosomes from Gypsy individuals is presented in Figure 4C and shows that the most common of the two Portuguese I∗(xI1b1b) haplotypes is also the most frequent among the Bulgarian Roma. Lineages molecularly close to this modal haplotype are found in Gypsies from Bulgaria, Portugal and Spain. A substantial proportion of Bulgarian haplotypes fall in a second molecular cluster to which the single Lithuanian I chromosome also belongs. Interestingly, while the haplotypes within this last cluster fit in or are molecularly related to the known Dinaric modal haplotype (16-24-11-11-13 at DYS19, 390, 391, 392, 393), which is an STR motif strongly associated with the sub-cluster I1b (Barać et al. 2003; Rootsi et al. 2004), the haplotypes within the most represented cluster belong to, or are molecularly related to, the Nordic modal haplotype (14-23-10-11-13 at the same STR loci), which in turn has been connected to the sub-cluster I1a (this is only moderately observed in the Balkans). Admitting that both clusters were incorporated into the Roma gene pool by admixture events occurring in the Balkan region, it is noteworthy that in the two clusters occur at inverted proportions compared to their host populations, a finding that can easily be explained by drift effects, but which also testifies how these events can lead to misinterpretation when addressing the origin of specific lineages.

Haplogroup R1

The second most common Y clade in the Portuguese Gypsies was R1, defined by M173, and detected in 36 individuals (∼29%), 35 of them holding the derivative mutation M269 that characterises the sub-clade R1b1c. In one sample it was not possible to type M269, because of a large deletion of at least 3.5 Mb affecting this chromosome (see haplotype H33, classified as R1∗-M173/del.M9/del.M269; Supplementary Table S2). Nevertheless, the haplotype profile of this sample is identical to the most frequent haplotype inside the R1b1c haplogroup.

R1 was found in all Roma groups with variable frequency: 3.5% in Macedonians (Peričić et al. 2005b), 4.4% in Bulgarians, 10% in Lithuanians and 22.2% in Spanish (Gresham et al. 2001). The gradient of distribution within Gypsy groups overlaps quite well with the global clinal pattern of R1 across Europe where the Palaeolithic R1b subgroup of this occurs significantly more frequently in Western populations, reaching 60% frequency in Portugal (Beleza et al. 2006). The Iberian Peninsula is thought to be the region of origin of the major derivative R1b1c as well as of its expansion after the Last Glacial Maximum (Semino et al. 2001; Cinnioğlu et al. 2004). In agreement with the R1 distribution pattern, the Gypsies from Spain and Portugal present the highest frequencies of these chromosomes, and most likely a large proportion of them were essentially acquired by admixture with the surrounding Iberian populations. Two lines of evidence lead to this interpretation. The first is the absence of founder lineages transversal to all Roma groups, as shown in Figure 4D, where the network of the STR haplotypes anchored in Gypsy R1 is depicted. We can observe a cluster of molecularly related haplotypes that contains chromosomes from Portuguese, Spanish and Lithuanian Roma. The remaining haplotypes are very divergent and occur

C 2008 The Authors Annals of Human Genetics (2008) 72,215–227 223 Journal compilation C 2008 University College London A. Gusmao˜ et al.

erratically in distinct Roma populations. This suggests that R1 chromosomes were incorporated independently in different Roma groups and that diversification in loco within each group was of minor relevance in the context of the Gypsy R1 diversity.

The other evidence comes from the haplotype match analysis. Out of 35 STR haplotypes within R1b1c chromosomes, 28 showed complete coincidence within European populations in the YHRD, in a total of 183 matches. Of those, only 14 matches were with individuals from Eastern European populations and none was with any of the 479 haplotypes from Gypsy populations contained in the database. The other 169 matches were spread across Western Europe, nearly 38% of them with Iberian haplotypes, 57% of which were from Portugal.

These results led us to conclude that the majority of R1b1c lineages were assimilated after the arrival in Iberia.

Other Lineages

Three chromosomes were positive for M35, a mutation that defines haplogroup E3b. This haplogroup was not detected among the Lithuanian or Spanish Gypsies, but was present in 4.4% of Bulgarian Roma and reached 29.8% in the Macedonian Romani. At least the two M35 chromosomes from the Portuguese Gypsies that additionally had M81 (E3b1b) do not seem to belong to the array of E3b lineages of the Roma from the Balkans. M81 was absent in the Macedonian Romani and, although it was not typed in the Bulgarian Gypsies, their E3b chromosomes had STR haplotypes quite different from the Portuguese E3b1b. This haplogroup was likely introduced in Iberia during the Islamic occupation (Bosch et al. 2001; Cruciani et al. 2004), being present in ∼6% of the Portuguese non-Gypsy population (Beleza et al. 2006). Searching in YHRD with the extended haplotype, one of the chromosomes fully matched 12 records, of which eight were from Portugal and one from Spain, seeming to imply that the lineages were introduced into the Gypsies already in Iberia.

Concerning the third E3b chromosome, it belonged to E3b1a-M78, with a distribution that encompasses North and Eastern Africa, all of Europe, and the Middle East. In YHRD only two full matches were observed, both with chromosomes from non-Gypsies, one from Poland and another from Germany.

A single male was found carrying the M242 mutation, belonging to haplogroup Q. This mutation originated in , although it has also been detected at low frequencies in India, Pakistan (Sengupta et al. 2006), Iran (Regueiro et al. 2006) and Turkey (Cinnioğlu et al. 2004). Apart from its presence in Turkey, this mutation is virtually absent in Europe, which supports its acquisition by the Gypsies before entrance in Europe. This mutation was not detected in a sample of 57 Macedonian Romani, and the lack of resolution in other studies on Gypsy groups does not allow confirmation of its presence among them.

E∗(xE3b1)-M96, G-M201, and K2-M70 were observed once each in the Portuguese Gypsy sample. They were not found in other Roma and all are relatively uncommon in Europe. No matches were observed for these lineages. As in other European populations, they are present in Portugal at low frequencies (Beleza et al. 2006).

Conclusions

In this work we studied the Y-chromosome diversity of Portuguese Gypsies, aiming at deepening previous knowledge about the population structure of the Roma in Europe.

Combining a large amount of both SNP and STR information, it was possible to unfold layers of information kept in the Portuguese Gypsy male gene pool, enabling an extensive reconstruction of the major episodes in their demographic history.

The Ancestral Component Shared by the European Roma

In the Portuguese Ciganos four lineages were detected, H1a, I∗(xI1b1b), J2a1b∗(xJ2a1b1), and J2a1b1, which are widely distributed in Roma groups. The sharing of this set of lineages, usually uncommon in European non-Gypsy populations, is a genetic link among Gypsies that clearly bears witness to the common origin of all Roma in Europe.

Two other lineages, J2b∗(xJ2b1a) and Q, could have also been incorporated into the Gypsies prior to their entrance into Europe. On the whole, the proportion retained from an ancient gene pool already present in the Balkan Roma (before the fragmentation event and westward dispersion that occurred during the 15th century) amounted to about 48.4% of the Portuguese Gypsy chromosomes.

The Passage from the Balkans to Iberia

During the short period spanning the exodus from the Balkans to the entrance into Iberia, the frequency of lineage R1b1c might have been considerably enriched. R1 was present at 28.6% frequency in the Portuguese Gypsies, but only at 2%, 4% and 10% in Macedonian, Bulgarian and Lithuanian Roma, respectively, which implies that, taking the Lithuanian Roma as reference, about 18.6% of R1b1c lineages can be attributed to Iberian influx. Actually, out of the 35 STR haplotypes within the R1b1c Portuguese Gypsy chromosomes, 23 had previously been detected in non-Gypsy Iberians. Assuming that these 23 lineages were integrated into the Gypsy gene pool already in Iberia, we can estimate that ∼18.3% (23 in 126) of chromosomes in Gypsies resulted from R1b1c admixture in Iberia, in full conformity with the value obtained when the Lithuanian Gypsies were used as reference. Admitting this, the remaining proportion of the stock of Portuguese Gypsies’ R1b1c, together with J2b1a, roughly 10.3%, may have been acquired in Europe, mainly during the journey towards Iberia. Despite the supposedly short-lasting passage from the Balkans to Iberia, interaction with surrounding populations was nonetheless enough to leave visible imprints in the genetic composition of the Portuguese Gypsies.

Acquisitions after Arrival in Iberia

R1b1c and J2a∗(xJ2a1b) are the two lineages among the Portuguese Ciganos that provide the best clues to the extent of admixture that occurred in Iberia. Within Iberia it is impossible to discriminate between the Portuguese and Spanish contributions since the Iberian Y-chromosome landscape is quite homogeneous.

We have estimated the proportion of R1b chromosomes of Iberian descent to be ∼18.3%.

We also hypothesized an Iberian origin for J2a∗(xJ2a1b), encompassing 16.7% of the Portuguese Ciganos, and for I1b1b and E3b1b, observed at the residual frequency of 2.4%.

Summing up all contributions of probable Iberian origin, we suggest 37.4% as the minimum estimate of the admixture proportion between Gypsies and non-Gypsy individuals in Iberia.

A number of other lineages detected in the Portuguese Gypsies, namely E∗(xE3b1), E3b1a, G, J2b1a and K2, totaling ∼4.0%, probably introgressed during their passage from the Balkans to Iberia or after arrival in Iberia. Whilst fully absent in the Balkan Roma, all are found dispersed at low or moderate frequencies throughout Europe; it is therefore difficult to infer where they were absorbed into the Gypsy gene pool.

We have previously estimated that 23 R1b1c chromosomes were of Iberian origin, and that they are represented by ten different 17-STR haplotypes. Pairwise comparisons between these ten haplotypes revealed that (with a single exception) out of 45 possible comparisons the observed differences were always higher than four (between four and 13 repeats). Therefore, we can say that at least nine R1b1c Iberian chromosomes have been incorporated in Gypsies. Because this haplogroup exists in 60% of the non-Gypsy Iberian males (Beleza et al. 2006), we can extrapolate that, from the remaining 40% of the Iberian non-Gypsy gene pool, influxes of roughly six additional non-R1b1c lineages might also have occurred. From these, four were already considered as Iberian influx (J2a, I1b1b and E3b1b). The other expected lineages can belong to any of the tails of haplogroups, E∗(xE3b1), G, J2b1a or K2, referred to above, since they are also present in Iberia.

Summarizing, the emergence of the Portuguese Gypsies was the result of a complex process stretching over a vaguely known history of migration, multiple fragmentation and interaction with other populations. In the current study some progress was made towards disentangling a little more of their past. To emphasise a few points, it now seems possible to refute the hypothetical relevance of a second route of entrance of Gypsies into Europe through North Africa, at least from the male point of view, as we did not find genetic evidence supporting any considerable North African influence.

We can also contest the common idea that the Portuguese Gypsies (and Gypsies in general) are an extremely closed group. The minimum estimate of around 37% as the proportion of male admixture that had already occurred in Iberia is in clear contradiction with the stereotype. This high rate of lineage assimilation means that through adoption of a socio-cultural style of life, many people became and are becoming Gypsy (and vice-versa). More than reflecting a biological/linguistic entity, the term “Gypsy” must be perceived as a social construct in a continually changing environment.

Despite the high rate of gene pool renewal, the process did not erase signatures of the strong endogamous nature of the Portuguese Gypsy society. All Gypsy groups share this structural tradition and all of them experienced identical and peculiar population histories. Over time, both demography and culture acted together, providing the conditions for profound and continuous reshuffling of previous genetic compositions. Eventually, new founder effects inside old founder effects under a continuous corridor of gene influx will progressively attenuate the existing genetic signs of their origin. As this and previous works demonstrate, at the moment, genetic studies can still contribute to partially recover that past.

Acknowledgements

We thank all DNA donors and those who made possible the contacts with Gypsy communities in Portugal, including ACIME and Cruz Vermelha Internacional. This work was partially supported by Fundação para a Ciência e a Tecnologia, through POCI (Programa Operacional Ciência e Inovação 2010) and project PTDC/ANT/70413/2006. The primers used to type the M242 mutation were kindly provided by Maria Brión (Institute of Legal Medicine, Santiago de Compostela, Spain).
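The percentages quoted in the Conclusions can be retraced with a few lines of arithmetic. The Python sketch below is only a reader's cross-check, not the authors' procedure: the function and variable names are illustrative, and the inputs are the counts and percentages stated in the text (126 chromosomes, 36 R1 carriers, 23 R1b1c chromosomes with haplotypes seen in non-Gypsy Iberians, 16.7% and 2.4% for the other lineages of probable Iberian origin, ten distinct haplotypes, and the 60% R1b1c frequency in Iberian males).

```python
import math

SAMPLE_SIZE = 126  # Portuguese Gypsy Y chromosomes in the study


def percent(count, total=SAMPLE_SIZE):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# R1 carriers: 36 of 126 individuals.
r1_frequency = percent(36)

# R1b1c chromosomes whose haplotypes were already known in non-Gypsy Iberians.
r1b1c_iberian = percent(23)

# Cross-check with Lithuanian Roma as reference: 28.6% minus 10%.
r1_excess_over_lithuanian = round(28.6 - 10.0, 1)

# Minimum Iberian admixture: the text sums the rounded percentages
# for R1b1c, J2a*(xJ2a1b), and I1b1b plus E3b1b.
iberian_minimum = round(18.3 + 16.7 + 2.4, 1)

# Number of pairwise comparisons among ten distinct 17-STR haplotypes.
pairwise_comparisons = math.comb(10, 2)

# Extrapolation: nine R1b1c introgressions drawn from the 60% of Iberian
# males carrying R1b1c suggest 9 * 40/60 events from the remaining 40%.
expected_non_r1b1c = 9 * 40 / 60

print(r1_frequency, r1b1c_iberian, r1_excess_over_lithuanian)
print(iberian_minimum, pairwise_comparisons, expected_non_r1b1c)
```

The two independent routes to the R1b1c estimate (23 in 126, and the excess over the Lithuanian frequency) differ by only 0.3 percentage points, which is the agreement the text refers to.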


References

Adams, S. M., King, T. E., Bosch, E. & Jobling, M. A. (2006) The case of the unreliable SNP: recurrent back-mutation of Y-chromosomal marker P25 through gene conversion. Forensic Sci Int 159, 14–20.
Alves, C., Gomes, V., Prata, M. J., Amorim, A. & Gusmão, L. (2007) Population data for Y-chromosome haplotypes defined by 17 STRs (AmpFlSTR YFiler) in Portugal. Forensic Sci Int 171, 250–255.
Barać, L., Peričić, M., Klarić, I. M., Rootsi, S., Janićijević, B., Kivisild, T., Parik, J., Rudan, I., Villems, R. & Rudan, P. (2003) Y chromosomal heritage of Croatian population and its island isolates. Eur J Hum Genet 11, 535–542.
Beleza, S., Alves, C., Gonzalez-Neira, A., Lareu, M., Amorim, A., Carracedo, A. & Gusmão, L. (2003) Extending STR markers in Y chromosome haplotypes. Int J Legal Med 117, 27–33.
Beleza, S., Gusmão, L., Lopes, A., Alves, C., Gomes, I., Giouzeli, M., Calafell, F., Carracedo, A. & Amorim, A. (2006) Micro-Phylogeographic and Demographic History of Portuguese Male Lineages. Ann Hum Genet 70, 181–194.
Berger, B., Lindinger, A., Niederstätter, H., Grubwieser, P. & Parson, W. (2005) YSTR typing of an Austrian population sample using a 17-loci multiplex PCR assay. Int J Legal Med 119, 241–246.
Blanco-Verea, A., Brión, M., Sánchez-Diz, P., Jaime, J. C., Lareu, M. V. & Carracedo, A. (2006) Analysis of Y chromosome lineages in native South American population. Progress in Forensic Genetics 11. International Congress Series 1288, 222–224.
Bosch, E., Calafell, F., Comas, D., Oefner, P. J., Underhill, P. A. & Bertranpetit, J. (2001) High-resolution analysis of human Y-chromosome variation shows a sharp discontinuity and limited gene flow between northwestern Africa and the Iberian Peninsula. Am J Hum Genet 68, 1019–1029.
Brión, M., Sobrino, B., Blanco-Verea, A., Lareu, M. V. & Carracedo, A. (2005) Hierarchical analysis of 30 Y-chromosome SNPs in European populations. Int J Legal Med 119, 10–15.
Cinnioğlu, C., King, R., Kivisild, T., Kalfoğlu, E., Atasoy, S., Cavalleri, G. L., Lillie, A. S., Roseman, C. C., Lin, A. A., Prince, K., Oefner, P. J., Shen, P., Semino, O., Cavalli-Sforza, L. L. & Underhill, P. A. (2004) Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114, 127–148.
Coelho, F. A. (1996) Os ciganos de Portugal, com um estudo do calão. Dom Quixote. Lisboa.
Cruciani, F., La Fratta, R., Santolamazza, P., Sellitto, D., Pascone, R., Moral, P., Watson, E., Guida, V., Colomb, E. B., Zaharova, B., Lavinha, J., Vona, G., Aman, R., Cali, F., Akar, N., Richards, M., Torroni, A., Novelletto, A. & Scozzari, R. (2004) Phylogeographic analysis of haplogroup E3b (E-M215) Y chromosomes reveals multiple migratory events within and out of Africa. Am J Hum Genet 74, 1014–1022.
Di Giacomo, F., Luca, F., Popa, L. O., Akar, N., Anagnou, N., Banyko, J., Brdicka, R., Barbujani, G., Papola, F., Ciavarella, G., Cucci, F., Di Stasi, L., Gavrila, L., Kerimova, M. G., Kovatchev, D., Kozlov, A. I., Loutradis, A., Mandarino, V., Mammi, C., Michalodimitrakis, E. N., Paoli, G., Pappa, K. I., Pedicini, G., Terrenato, L., Tofanelli, S., Malaspina, P. & Novelletto, A. (2004) Y chromosomal haplogroup J as a signature of the post-neolithic colonization of Europe. Hum Genet 115, 357–371.
Excoffier, L. G. & Schneider, S. (2005) Arlequin ver.3.0: An integrated software package for population genetics data analysis. Evolutionary Bioinformatics Online 1, 47–50.
Fraser, A. (1998) História do Povo Cigano. Tradução de Costa T. Editorial Teorema, LDA.
Füredi, S., Woller, J., Padar, Z. & Angyal, M. (1999) Y-STR haplotyping in two Hungarian populations. Int J Legal Med 113, 38–42.
Gresham, D., Morar, B., Underhill, P. A., Passarino, G., Lin, A. A., Wise, C., Angelicheva, D., Calafell, F., Oefner, P. J., Shen, P., Tournev, I., de Pablo, R., Kucinskas, V., Perez-Lezaun, A., Marushiakova, E., Popov, V. & Kalaydjieva, L. (2001) Origins and divergence of the Roma (gypsies). Am J Hum Genet 69, 1314–1331.
Gusmão, L., Butler, J. M., Carracedo, A., Gill, P., Kayser, M., Mayr, W. R., Morling, N., Prinz, M., Roewer, L., Tyler-Smith, C. & Schneider, P. M. (2006) DNA Commission of the International Society of Forensic Genetics (ISFG): An update of the recommendations on the use of Y-STRs in forensic analysis. Forensic Sci Int 157, 187–197.
Jobling, M. A. & Tyler-Smith, C. (2003) The Human Y Chromosome: An evolutionary Marker Comes of Age. Nature Reviews Genetics 4, 598–612.
Kalaydjieva, L., Calafell, F., Jobling, M. A., Angelicheva, D., de Knijff, P., Rosser, Z. H., Hurles, M. E., Underhill, P., Tournev, I., Marushiakova, E. & Popov, V. (2001a) Patterns of inter- and intra-group genetic diversity in the Vlax Roma as revealed by Y chromosome and mitochondrial DNA lineages. Eur J Hum Genet 9, 97–104.
Kalaydjieva, L., Gresham, D. & Calafell, F. (2001b) Genetic studies of the Roma (Gypsies): a review. BMC Medical Genetics 2, 5.
Kendrick, D. & Puxon, G. (1998) Ciganos: Da Índia ao Mediterrâneo. Colecção Interface 3.
Kivisild, T., Rootsi, S., Metspalum, M., Mastanam, S., Kaldmam, K., Parik, J., Metspalu, E., Adojaan, M., Tolk, H. V., Stepanov, V., Golge, M., Usanga, E., Papiha, S. S., Cinnioğlu, C., King, R., Cavalli-Sforza, L., Underhill, P. A. & Villems, R. (2003) The genetic heritage of the earliest settlers persists both in Indian tribal and caste populations. Am J Hum Genet 72, 313–332.
Liégeois, J. P. (1989) Ciganos e Itinerantes: Dados socioculturais. Dados sociopolíticos. Santa Casa da Misericórdia de Lisboa. 1 Edição.
Malayarchuk, B. A., Grzybowski, T., Deremko, M. V., Czarny, J. & Miscicka-Sliwka, D. (2006) Mitochondrial DNA diversity in the Polish Roma. Ann Hum Genet 70, 195–206.
Martín, P., Garcia-Hirchfeld, J., Garcia, O., Gusmão, L., Albarran, C., Sancho, M. & Alonso, A. (2004) A Spanish population study of 17 Y chromosome STR loci. Forensic Sci Int 139, 231–235.
Morar, B., Gresham, D., Angelicheva, D., Tournev, I., Gooding, R., Guergueltcheva, V., Schmidt, C., Abicht, A., Lochmüller, H., Tordai, A., Kalmár, L., Nagy, M., Karcagi, V., Jeanpierre, M., Herczegfalvi, A., Beeson, D., Venkataraman, V., Carter, K. W., Reeve, J., de Pablo, R., Kučinskas, V. & Kalaydjieva, L. (2004) Mutation history of the Roma/Gypsies. Am J Hum Genet 75, 596–609.
Mulero, J. J., Budowle, B., Butler, J. M. & Gusmão, L. (2006) Letter to the editor – nomenclature and allele repeat structure update for the Y-STR locus GATA H4. J Forensic Sci 51, 694.
Nagy, M., Henke, L., Henke, J., Chatthopadhyay, P. K., Volgyi, A., Zalan, A., Peterman, O., Bernasovska, J. & Pamjav, H. (2007) Searching for the origin of Romanies: Slovakian Romani, Jats of Haryana and Jat Sikhs Y-STR data in comparison with different Romani populations. Forensic Sci Int 69, 19–26.
Peričić, M., Klarić, I. M., Lauc, L. B., Janićijević, B., Dordević, D., Efremovska, L. & Rudan, P. (2005a) Population genetics of 8 Y chromosome STR loci in Macedonians and Macedonian Romani (Gypsy). Forensic Sci Int 154, 257–261.
Peričić, M., Lauc, L. B., Klarić, I. M., Rootsi, S., Janićijević, B., Rudan, I., Terzić, R., Čolak, I., Kvesić, A., Popović, D., Šijački, A., Behluli, I., Dordević, D., Efremovska, L., Bajec, D. D., Stefanović, B. D., Villems, R. & Rudan, P. (2005b) High-resolution phylogenetic analysis of southeastern Europe traces major episodes of paternal gene flow among Slavic populations. Mol Biol Evol 22, 1964–1975.
Ploski, R., Wozniak, M., Pawlowski, R., Monies, D. M., Branicki, W., Kupiec, T., Kloosterman, A., Dobosz, T., Bosch, E., Nowak, M., Lessig, R., Jobling, M. A., Roewer, L. & Kayser, M. (2002) Homogeneity and distinctiveness of Polish paternal lineages revealed by Y chromosome microsatellite haplotype analysis. Hum Genet 110, 592–600.
Regueiro, M., Cadenas, A. M., Gayden, T., Underhill, P. A. & Herrera, R. J. (2006) Iran: Tricontinental nexus for Y-chromosome driven migration. Hum Hered 61, 132–143.
Rosser, Z. H., Zerjal, T., Hurles, M. E., Adojaan, M., Alavantic, D., Amorim, A., Amos, W., Armenteros, M., Arroyo, E., Barbujani, G., Beckman, G., Beckman, L., Bertranpetit, J., Bosch, E., Bradley, D. G., Brede, G., Cooper, G., Côrte-Real, H. B., de Knijff, P., Decorte, R., Dubrova, Y. E., Evgrafov, O., Gilissen, A., Glisic, S., Gölge, M., Hill, E. W., Jeziorowska, A., Kalaydjieva, L., Kayser, M., Kivisild, T., Kravchenko, S. A., Krumina, A., Kucinskas, V., Lavinha, J., Livshits, L. A., Malaspina, P., Maria, S., McElreavey, K., Meitinger, T. A., Mikelsaar, A. V., Mitchell, R. J., Nafa, K., Nicholson, J., Nørby, S., Pandya, A., Parik, J., Patsalis, P. C., Pereira, L., Peterlin, B., Pielberg, G., Prata, M. J., Previderé, C., Roewer, L., Rootsi, S., Rubinsztein, D. C., Saillard, J., Santos, F. R., Stefanescu, G., Sykes, B. C., Tolun, A., Villems, R., Tyler-Smith, C. & Jobling, M. A. (2000) Y-chromosomal diversity in Europe is clinal and influenced primarily by geography, rather than by language. Am J Hum Genet 67, 1526–1543.
Rootsi, S., Magri, C., Kivisild, T., Benuzzi, G., Help, H., Bermisheva, M., Kutuev, I., Barac, L., Peričić, M., Balanovsky, O., Pshenichnov, A., Dion, D., Grobei, M., Zhivotovsky, L. A., Battaglia, V., Achilli, A., Al-Zahery, N., Parik, J., King, R., Cinnioğlu, C., Khusnutdinova, E., Rudan, P., Balanovska, E., Scheffrahn, W., Simonescu, M., Brehm, A., Goncalves, R., Rosa, A., Moisan, J. P., Chaventre, A., Ferak, V., Füredi, S., Oefner, P. J., Shen, P., Beckman, L., Mikerezi, I., Terzic, R., Primorac, D., Cambon-Thomsen, A., Krumina, A., Torroni, A., Underhill, P. A., Santachiara-Benerecetti, A. S., Villems, R. & Semino, O. (2004) Phylogeography of Y-chromosome haplogroup I reveals distinct domains of prehistoric gene flow in Europe. Am J Hum Genet 75, 128–137.
Semino, O., Passarino, G., Oefner, P. J., Lin, A. A., Arbuzova, S., Beckman, L. E., De Benedictis, G., Francalacci, P., Kouvatsi, A., Limborska, S., Marcikiae, M., Mika, A., Mika, B., Primorac, D., Santachiara-Benerecetti, A. S., Cavalli-Sforza, L. L. & Underhill, P. A. (2000) The genetic legacy of paleolithic Homo sapiens in extant Europeans: A Y-chromosome perspective. Science 290, 1155–1159.
Semino, O., Magri, C., Benuzzi, G., Lin, A. A., Al-Zahery, N., Battaglia, V., Maccioni, L., Triantaphyllidis, C., Shen, P., Oefner, P. J., Zhivotovsky, L. A., King, R., Torroni, A., Cavalli-Sforza, L. L., Underhill, P. A. & Santachiara-Benerecetti, A. S. (2004) Origin, diffusion, and differentiation of Y-chromosome haplogroups E and J: inferences on the neolithization of Europe and later migratory events in the Mediterranean area. Am J Hum Genet 74, 1023–1034.
Sengupta, S., Zhivotovsky, L. A., King, R., Mehdi, S. Q., Edmonds, C. A., Chow, C. T., Lin, A. A., Mitra, M., Sil, S. K., Ramesh, A., Rani, M. V. U., Thakur, C. M., Cavalli-Sforza, L. L., Majumder, P. P. & Underhill, P. A. (2006) Polarity and temporality of high-resolution Y-chromosome distribution in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian Pastoralists. Am J Hum Genet 78, 202–221.
Soltyszewski, I., Pepinski, W., Spolnicka, M., Kartasinska, E., Konarzewska, M. & Janica, J. (2007) Y-chromosomal haplotypes for the AmpFlSTR Yfiler PCR Amplification Kit in a population sample from Central Poland. Forensic Sci Int 168, 61–67.
Turrina, S., Atzei, R. & De Leo, D. (2006) Y-chromosomal STR haplotypes in a Northeast Italian population sample using 17plex loci PCR assay. Int J Legal Med 120, 56–59.
Underhill, P. A., Passarino, G., Lin, A. A., Shen, P., Mirazon-Lahar, M., Foley, R. A., Oefner, P. J. & Cavalli-Sforza, L. L. (2001) The phylogeography of Y-chromosome binary haplotypes and the origins of modern human populations. Ann Hum Genet 65, 43–62.
Willuweit, S., Roewer, L., on behalf of the International Forensic Y Chromosome User Group (2007) Y chromosome haplotype reference database (YHRD): update. Forensic Sci Int: Genetics 1, 83–87.
Y Chromosome Consortium (2002) A nomenclature system for the tree of human Y-chromosomal binary haplogroups. Genome Res 12, 339–348.
Zaharova, B., Andonova, S., Gilissen, A., Cassiman, J. J., Decorte, R. & Kremensky, I. (2001) Y-chromosomal STR haplotypes in three major population groups in Bulgaria. Forensic Sci Int 124, 182–186.
Zerjal, T., Pandya, A., Thangaraj, K., Ling, E. Y., Kearley, J., Bertoneri, S., Paracchini, S., Singh, L. & Tyler-Smith, C. (2007) Y-chromosomal insights into the genetic impact of the caste system in India. Hum Genet 121, 137–144.

Supplementary Material

The following material is available for this article online:

Table S1. Multiplex H, J2 and M269. Details on sequences of PCR and SBE primers used, and typing conditions

Table S2. Y-chromosome haplotypes in 126 Gypsies from Portugal. District codes are: Aveiro (Av); Lisboa (Li); Viana do Castelo (VC); Braga (B); Porto (P); Bragança (Bn); Vila Real (VR); Viseu (Vi); Coimbra (Co); Guarda (Gu); Castelo Branco (CB); Leiria (Le); Santarém (Sa); Setúbal (Se); Portalegre (Po); Évora (Ev); Beja (Be); Faro (Fa); Funchal (Fu).

This material is available as part of the online article from: http://www.blackwell-synergy.com/doi/abs/10.1111/j.1469-1809.2007.00421.x (This link will take you to the article abstract).

Please note: Blackwell Publishing are not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Received: 15 October 2007
Accepted: 20 October 2007
