ANTHROPOLOGICAL REVIEW • VOL. 81(3), 252–268 (2018)
Available online at: https://content.sciendo.com/anre
Journal homepage: www.ptantropologiczne.pl

A glance of genetic relations in the Balkan populations utilizing network analysis based on in silico assigned Y-DNA haplogroups

Emir Šehović1*, Martin Zieger3, Lemana Spahić1, Damir Marjanović1,2, Serkan Dogan1

1International Burch University, Department of Genetics and Bioengineering, Sarajevo, Bosnia and Herzegovina
2Institute for Anthropological Research, Zagreb, Croatia
3Institute of Forensic Medicine, Forensic Molecular Biology Dpt., University of Bern, Sulgenauweg 40, 3007 Bern, Switzerland

ABSTRACT: The aim of this study is to provide an insight into Balkan populations’ genetic relations utilizing in silico analysis of Y-STR haplotypes and performing haplogroup predictions together with network analysis of the same haplotypes for visualization of the relations between chosen haplotypes and Balkan populations in general. The population dataset used in this study was obtained using 23, 17, 12, 9 and 7 Y-STR loci for 13 populations. The 13 populations include: Bosnia and Herzegovina (B&H), Croatia, Macedonia, Slovenia, Greece, Romany (Hungary), Hungary, Serbia, Montenegro, Albania, Kosovo, Romania and Bulgaria. The overall dataset contains a total of 2179 samples with 1878 different haplotypes. I2a was detected as the major haplogroup in four out of thirteen analysed Balkan populations. The four populations (B&H, Croatia, Montenegro and Serbia) which had I2a as the most prevalent haplogroup were all from the former Yugoslavian republic. The remaining two major populations from former Yugoslavia, Macedonia and Slovenia, had E1b1b and R1a haplogroups as the most prevalent, respectively. The populations with E1b1b haplogroup as the most prevalent one are Macedonian, Romanian, as well as Albanian populations from Kosovo and Albania.
The I2a haplogroup cluster is more compact when compared to E1b1b and R1b haplogroup clusters, indicating a larger degree of homogeneity within the haplotypes that belong to the I2a haplogroup. Our study demonstrates that a combination of haplogroup prediction and network analysis represents an effective approach to utilize publicly available Y-STR datasets for population genetics.

KEY WORDS: Balkan populations, Y-STR haplotype analysis, Haplogroup prediction, Median-joining tree, Y chromosomal haplogroups.

Original Research Article
Received April 10, 2018; Revised August 16, 2018; Accepted August 17, 2018
DOI: 10.2478/anre-2018-0021
© 2018 Polish Anthropological Society

Introduction

The paternally inherited Y chromosome is used in forensics for identification, and in anthropology and population genetics for understanding the origin and migrations of humans (Kayser et al. 2005 and Shi et al. 2005). The Y chromosome, despite being the smallest chromosome in the organism, still possesses two highly useful types of genetic markers, single nucleotide polymorphisms (SNPs) and short tandem repeats (STRs) (Gusmao et al. 2005; Ballantyne et al. 2010; Wang et al. 2014). World population linkages and phylogenetic trees are created based upon SNP markers, mainly because of their low mutation rate (Wang et al. 2010). The lineages determined by SNP patterns are referred to as haplogroups. Haplogroups can also be inferred from readily available Y-STR genotyping data (Athey 2006). In the forensic context there is plenty of Y-STR data available (Willuweit and Roewer 2015) that can also be explored for population genetics.

Haplogroup I, referred to as the “Palaeolithic” European-specific haplogroup, is biological evidence that the Balkan population underwent an additional expansion after the last Ice Age (Marjanović et al. 2005). Anatolian agriculture began to spread 8,000 to 9,500 years ago and went across the Balkan regions (Lemmen et al. 2011). Roostalu et al. (2006) and Pala et al. (2012) discussed that ancient DNA samples, especially those of mtDNA, link the European population to their neighbours from the Near East. Genetic studies confirmed the hypothesis that there were several waves in which farmers from the Near East came to Europe due to climate changes and new farming inventions. Therefore, the overall gene pool of Europe was turbulently mixed many times (Özdoğan, 2011; Davidović et al. 2015; Šarac et al. 2016; Veldhuis and Underdown 2017).

The Balto-Slavic speakers make up approximately one third of all Europeans, and the analysis of their mitochondrial DNA and the non-recombining region of the Y chromosome suggests that their genetic structure does not differ significantly from the neighbouring populations. The Balto-Slavic population can be divided into East Slavs, West Slavs, and South Slavs, the latter including most of the Balkan populations (Kushniarevich et al. 2015).

Earlier research done by Šarac et al. (2018) shows that the Balkans have been a turbulent region for hundreds of years, and this region consequently contains several major haplogroups. The four major Y chromosomal haplogroups in the Balkans are I2a, E1b1b, R1a and R1b. The I2a and E1b1b haplogroups are the most abundant but the R1a and R1b still have a considerable percentage (Semino et al. 2000). Haplogroup I is thought to have appeared on the Balkan Peninsula about 45,000 years ago. However, this haplogroup most probably originated from the Middle East, coming to Europe through Anatolia (Battaglia et al. 2009; Primorac et al. 2011). Haplogroup E1b1b is believed to have originated from the African continent and provides evidence of the last direct migration from Africa to Europe (Battaglia et al. 2009). Furthermore, the inclusion of a significant percentage of R1a and R1b haplogroups within the Balkan populations confirms their historical intertwining with the other populations of Europe (Semino et al. 2000).

The aim of this study is to provide an insight into Balkan populations’ genetic relations utilizing in silico analysis of Y-STR haplotypes by performing haplogroup predictions and network analysis of the same haplotypes for visualizing the relations between the chosen haplotypes and Balkan populations in general. Hence, the in silico analysis, while also being affordable, is very useful in analysing the haplogroup distributions within the Balkan countries. In addition, the analysed haplotypes were also studied using median-joining trees according to various sets of loci.

Material and methods

The population dataset used in this study was obtained using 23, 17, 12, 9 and 7 Y-STR loci for 13 populations. A 23 Y-STR set was analysed in seven populations including Bosnia and Herzegovina (Purps et al. 2014), Croatia (Purps et al. 2014), Macedonia (Purps et al. 2014), Slovenia (Purps et al. 2014), Greece (Purps et al. 2014), Romany (Hungary) (Purps et al. 2014) and Hungary (Purps et al. 2014). A set of 17 Y-STRs was analysed in two populations, namely Serbia and Montenegro (Mirabal et al. 2010). The 12 Y-STR analysis includes only the Albanian population (Ferri et al. 2010). Kosovo (Peričić et al. 2004) and Romania (Barbarii et al. 2003) population data were obtained using 9 Y-STR loci analysis. Finally, the 7 Y-STR analysis includes only the Bulgarian population (Zaharova et al. 2001). Some population datasets with fewer Y-STR loci were used because no publicly available data with 23 Y-STR loci exist for those populations.

This dataset contains a total of 2179 samples with 1878 different haplotypes. The numbers of samples per population are not consistent, varying from 53 samples from Romany (Hungary) to 404 Montenegrin samples. The list of all analysed populations, as well as the number of samples and different haplotypes, is shown in Table 1. Full data used in this article are available from the corresponding author on request.

Table 1. Population datasets used in the study.

Population                  Number of samples   Number of different haplotypes
Albania                     339                 233
Bosnia and Herzegovinian    100                 100
Bulgarian                   126                 88
Croatian                    239                 239
Greek                       214                 214
Hungarian                   100                 100
Albanian (Kosovo)           117                 60
Macedonian                  101                 101
Montenegrin                 404                 318
Romanian                    104                 97
Romany (Hungary)            53                  53
Serbian                     179                 171
Slovenian                   104                 104
Total                       2179                1878

Y chromosomal haplogroups can be assigned from Y-STR haplotypes using haplogroup predictors, which are very useful for analysing previously published Y-STR datasets (Dogan et al. 2016; Dogan et al. 2017; Heraclides et al. 2017; Gurkan et al. 2017) as well as for validation purposes of haplogroup assignments based on Y SNP data (Petrejčíková et al. 2014 and Emmerova et al. 2017). The four haplogroup predictors that are in focus in this study are: Whit Athey haplogroup predictor (Athey 2006), Nevgen haplogroup predictor (Ćetković - Gentula and Nevski 2015), Vadim Urasin’s haplogroup predictor (Urasin 2013) and Jim Cullen haplogroup predictor (Cullen 2008).

Overall concordance between the four haplogroup predictors as well as the concordance between each of the haplogroup predictors was analysed in order to obtain the most reliable results and to understand how the haplogroup predictors compare to each other. Relative concordance between each of the haplogroup predictors was calculated by giving a value of either 1 or 0 depending on whether the output of the predicted haplogroup is the same. Finally, dividing the obtained sum of the values by the number of hap-

of genetic data. It is based upon creating minimum spanning trees which are further combined into one reticulate network. In order to achieve parsimony, consensus sequences in the form of median vectors, or Steiner points, are added (Bandelt et al. 1999). Posada and Crandall (2001) introduced network analysis and haplogroup predictors that together make a great tool for genetic analysis of populations. Furthermore, as they complement each other’s weaknesses, the results obtained from their combined overall picture can be considered more accurate than when using each method individually (Posada and Crandall 2001).