<<

www.nature.com/scientificreports

OPEN Positive selection and climatic efects on MHC class II diversity in hares (Lepus capensis) Received: 21 December 2017 Accepted: 16 July 2018 from a steep ecological gradient Published: xx xx xxxx Asma Awadi1, Hichem Ben Slimen1,2, Steve Smith 3, Felix Knauer3, Mohamed Makni1 & Franz Suchentrunk3

In natural populations, allelic diversity of the major histocompatibility complex (MHC) is commonly interpreted as resulting from positive selection in varying spatiotemporal pathogenic landscapes. Composite pathogenic landscape data are, however, rarely available. We studied the spatial distribution of allelic diversity at two MHC class II loci (DQA, DQB) in hares, Lepus capensis, along a steep ecological gradient in North Africa and tested the role of climatic parameters for the spatial distribution of DQA and DQB . Climatic parameters were considered to refect to some extent pathogenic landscape variation. We investigated historical and contemporary forces that have shaped the variability at both , and tested for diferential selective pressure across the ecological gradient by comparing allelic variation at MHC and neutral loci. We found positive selection on both MHC loci and signifcantly decreasing diversity from North to South Tunisia. Our multinomial linear models revealed signifcant efects of geographical positions that were correlated with mean annual temperature and precipitation on the occurrence of variants, but no efects of co-occurring DQA or DQB proteins, respectively. Diversifying selection, recombination, adaptation to local pathogenic landscapes (supposedly refected by climate parameters) and neutral demographic processes have shaped the observed MHC diversity and diferentiation patterns.

Genetic studies of natural populations based on presumably neutral markers such as mitochondrial d-loop DNA (mtDNA), microsatellites or SNPs are important to infer phylogeny, population history and gene fow1,2. However, in some cases the time span between the separation of populations might be too short or gene fow might be too strong to leave a signal of diferentiation at neutral loci3. In such cases, diferences between populations can only be detectable at functional genes under selection3. Moreover, studying molecular polymorphism at loci under will help to understand the genetics of adaptive processes and to increase our knowledge on the importance of adaptive genetic variability in free ranging animal populations1. Local adaptation at functional markers has mainly been addressed in populations structured at neutral markers and rarely in the case of seem- ingly panmictic populations4. A study of Tunisian hare populations indicated a strong population structure at mitochondrial OXPHOS genes, which are under positive selection at several codons, despite a high gene fow at neutral markers5. Tese fndings are consistent with local adaptation according to climate variation5. Adaptation to local or regional pathogenic landscapes is ofen addressed by analyses of major histocompatibil- ity genes (MHC) which code for glycoproteins that recognize and bind antigens in order to present them to CD4+ and CD8+T-lymphocytes to initiate the adaptive immune response against pathogens. MHC class I and class II loci, the two main types of MHC loci involved in the adaptive immune system, have been shown to be highly pol- ymorphic in a wide range of species6–9 and their adaptive signifcance in vertebrates has long been recognized10,11. Allelic diversity at these loci can be associated with variation across environmental and geographical scales as demonstrated from several studies12,13. Te role of environmental characteristics in shaping selective pressures

1Unité de Recherche Génomique des Insectes Ravageurs des Cultures d’Intérêt Agronomique, Faculty of Sciences of Tunis, University of Tunis El Manar, 2092, Tunis, Tunisia. 2Institut Supérieur de Biotechnologie de Béja, University of Jendouba, Avenue Habib Bourguiba Béja 9000, BP. 382, Béja, Tunisia. 3Research Institute of Wildlife Ecology, University of Veterinary Medicine Vienna, Savoyenstrasse 1, 1160, Vienna, Austria. Correspondence and requests for materials should be addressed to A.A. (email: [email protected])

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 1 www.nature.com/scientificreports/

on MHC genes is suggested to be mediated by combined efects including recent and ancient demography as well as behavioural and ecological diferences in pathogens exposure14–16. Moreover, host-pathogen interactions have been suggested to be afected by temperature, nutrient availability and environmental stress17. Whereas increases of ambient temperature can reinforce selection on immune genes, thereby accelerating host adaptation18,19, vari- ation in nutrient levels can force physiological modifcation that could either impair or promote local adaptation of the host17. Among all MHC molecules, class II genes have received the most attention, due to their role in studying mech- anisms and signifcance of molecular adaptation in vertebrates20 and host parasite co-evolution2. MHCII mole- cules are heterodimers formed by an α and a β chain encoded by A and B genes, respectively21,22. Tese chains are characterized by several typical structures, such as N-linked glycosylation sites, connecting peptides, transmem- brane regions, and cytoplasmic domains, as well as the α1/α2 domain and the β1/β2 domain, respectively23. Te α1 and β1 domains constitute the peptide-binding region (PBR), involved with the recognition and binding to the antigens14,24,25. Tese domains are highly polymorphic and are encoded by the second exon of A and B MHCII genes. Given their role in presenting antigenic determinant peptides to the cell surface, these loci are typically expected to be subject to strong positive selection, since their greater allelic diversity should be associated with a response to a wider range of pathogens16. Genetic diversity in the MHC is also generated through several mecha- nisms, including point mutation, recombination, , sexual selection and maternal–foetal interac- tions25–28. Te variation generated by these processes is thought to be maintained in individuals and populations by a combination of balancing and diversifying selection, driven primarily by variation in selection pressure from pathogens and parasites across space and time3,28. encompasses a number of diferent evolutionary mechanisms that result in a large number of alleles being maintained in populations for longer than would be expected relative to neutral genetic variation24. Such balancing selection is generally explained by three mechanisms a) over-dominant selection or heterozygote advantage where heterozygous individuals have higher ftness than both corresponding homozygotes; b) frequency dependent selection or rare allele advantage where individuals with rare genotypes have higher ftness29,30; c) fuctuation in selection pressure where MHC polymor- phism is maintained by changing pathogen composition through time and space. Conversely, under diversifying selection at MHC loci, the spread of an advantageous allele (positive selection) would be expected to lead to a loss of genetic variation. Similarly, selection against disadvantageous alleles (negative or purifying selection) would also be expected to reduce diversity31. Among the diferent suggested mechanisms of balancing selection, fuctua- tion in selection pressure have been reported for class II MHC genes among populations of the Asian cygomolgus macaque (Macaca fascicularis)12, and the Atlantic salmon32, among other studies. Patterns of MHC selection can be inferred by comparing MHC genetic diferentiation to diferentiation at presumably neutral markers28. Neutral loci refect demographic factors such as migration and drif, whereas loci under selection usually show higher diferentiation under diferent local selection regimes or lower diferentiation under balancing selection28,33 or under negative/purifying selection33. Higher diferentiation at MHC genes could therefore refect local adaptation to diferent parasite faunas prevailing in the respective populations. In Tunisia, hares (Lepus capensis) are distributed throughout the whole country in diverse environments and over short geographic distances, reaching from the Mediterranean climate in the north to the pronounced arid desert climate (Sahara) in the south. Tis situation provides a good precondition to study natural selection and adaptation on functional genes5. Te likely diferences in diferential parasite pressure in these habitats could lead to diferent selection pressure at MHC loci among the local populations. Such selective pressure would shape diversity and diferentiation profles in interaction with neutral demographic/stochastic factors. In this study, we examined the MHC class II polymorphism for DQA and DQB genes in hares from Tunisia along a steep ecological gradient. We frst investigated the variation of genetic diversity in the diferent studied populations distributed in heterogeneous ecoregions from North to South. If diversifying selection is the main evolutionary force acting on MHC genes, we expected signifcant genetic diferentiation despite observed high gene fow in neutral markers. Second, we used diferent approaches to evaluate the efect of selective pressure in each codon of the obtained sequences. Moreover, we compared the levels of allelic diversity of the considered hare popula- tions with those obtained for a set of neutral loci. We expect that MHC diversity varies signifcantly between the regional populations under study given that spatial changes in parasite communities and dynamics with envi- ronmental conditions might alter the selection pressures they exert on their host populations17. However, neutral demographic processes (i.e., gene fow, genetic drif…) can also have signifcant efects on the genetic diversity of natural populations4,34. Given the steep ecological gradient along the relatively short geographical distance from where our samples were drawn and the absence of any obvious migration barrier across our study area, signifcant variation of MHC allele frequencies across the study area should refect adaptation to local environments. Tus, we hypothesize that climatic factors associated with the geographical origin of the samples will lead to spatial variation in the fre- quencies of the most common alleles. Moreover, MHC alleles at diferent loci are expected to co-evolve. Tus, we further expect a signifcant level of linkage between the respective co-occurring allele or genotype at the second MHC locus studied on the occurrence of an MHC allele under consideration. Such allele efects of co-occurring MHC loci might have led to an “optimal” combination of alleles at the two loci studied independently of spatial or climatic efects. Results Allelic diversity and genetic differentiation. We identified 17 DQA alleles (accession numbers MH346126-MH346142) and 26 DQB alleles (accession numbers MH346143- MH346168) in 142 hares (L. cap- ensis) from Tunisia with a total length of 228 and 222 bp, respectively. Among them, only three DQA alleles (Lecp-DQA*04, Lecp-DQA*05, Lecp-DQA*06 corresponding to Leeu-DQA*02, Leeu-DQA*03, Leeu-DQA*09 in L. europaeus, respectively) were previously described in L. europaeus. No insertions, deletions, or stop codons

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 2 www.nature.com/scientificreports/

Figure 1. Amino acid alignment of DQA exon 2. Antigen-binding sites (ABS) are indicated above with the sign “+”. Positively selected codons identifed by the diferent methods are indicated with asterisks. Numbering of amino acid residues is according to Bondinas et al.89.

Figure 2. Amino acid alignment of DQB exon 2. Antigen-binding sites (ABS) are indicated above with the sign “+”. Positively selected codons identifed by the diferent methods are indicated with asterisks. Numbering of amino acid residues is according to Bondinas et al.35.

were detected in either DQA or DQB sequences. Te nucleotide sequence of each allele in both loci translated into a diferent amino-acid sequence (Figs 1 and 2). Allele frequencies and basic diversity parameters for the two MHC class II exon 2 loci in the diferent populations as well as the overall values are displayed in Table 1. Hares were grouped into three populations according to climatic, geographic and phenotypic data (see methods section). Tese populations are northern Tunisia (NT), central Tunisia (CT) and southern Tunisia (ST). Te numbers of alleles per regional population were 14, 12, and 9 in NT, CT, and ST, respectively, for the DQA locus. For DQB, allele numbers were 18, 15, and 11 in NT, CT and ST, respectively (Table 1). In general, in all three populations, few alleles were relatively abundant, e.g. Lecp-DQA*07, Lecp-DQA*08, Lecp-DQB*01 and Lecp-DQB*09 were highly frequent, whereas most others were detected at very low frequencies. A comparison of the obtained genotypes revealed alleles specifc to each regional population. For DQA, seven alleles were shared between the three populations, whereas three alleles (Lecp-DQA*18, Lecp-DQA*19,

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 3 www.nature.com/scientificreports/

DQA Allele NT CT ST DQB Allele NT CT ST Lecp-DQA*04 0.0075 Lecp-DQB*01 0.0500 0.3219 0.0286 Lecp-DQA*05 0.0484 0.0149 Lecp-DQB*02 0.0500 0.1027 0.0143 Lecp-DQA*06 0.0161 0.0075 Lecp-DQB*03 0.0411 0.5000 Lecp-DQA*07 0.3548 0.1493 0.0690 Lecp-DQB*04 0.0167 0.0479 Lecp-DQA*08 0.2258 0.5746 0.1034 Lecp-DQB*05 0.0068 0.2143 Lecp-DQA*09 0.0323 0.0597 0.0345 Lecp-DQB*06 0.0333 0.1233 Lecp-DQA*10 0.0161 0.0224 0.1379 Lecp-DQB*07 0.0333 0.1507 Lecp-DQA*11 0.0806 0.0970 0.3793 Lecp-DQB*08 0.0411 0.0714 Lecp-DQA*12 0.0161 0.0149 0.0345 Lecp-DQB*09 0.4667 0.0411 0.0143 Lecp-DQA*13 0.0690 Lecp-DQB*10 0.0068 0.0286 Lecp-DQA*14 0.0161 0.0075 Lecp-DQB*11 0.0137 0.0429 Lecp-DQA*15 0.0161 0.0299 Lecp-DQB*12 0.0167 0.0685 0.0286 Lecp-DQA*16 0.0323 0.0149 0.1034 Lecp-DQB*13 0.0137 Lecp-DQA*17 0.0690 Lecp-DQB*14 0.0500 Lecp-DQA*18 0.0161 Lecp-DQB*15 0.0167 Lecp-DQA*19 0.0968 Lecp-DQB*16 0.0500 0.0068 Lecp-DQA*20 0.0323 Lecp-DQB*17 0.0143 Lecp-DQB*18 0.0333 0.0429 Lecp-DQB*19 0.0137 Lecp-DQB*20 0.0667 Lecp-DQB*21 0.0167 Lecp-DQB*22 0.0167 Lecp-DQB*23 0.0167 Lecp-DQB*24 0.0167 Lecp-DQB*25 0.0333 Lecp-DQB*26 0.0167 N 31 67 29 N 30 73 35 Na 14 12 9 Na 18 15 11 H exp. 0.8002 0.6323 0.7990 H exp. 0.7611 0.8351 0.6922 H obs. 0.4839 0.4776 0.5517 H obs. 0.5000 0.3425 0.4571 Rs 13.603 9.050 9.000 Rs 17.731 11.941 10.396

Table 1. Allele frequencies of two MHC class II loci for the three Tunisian hare populations. N Sample size, Na number of alleles, Hobs. observed number of heterozygotes, Hexp. expected number of heterozygotes, Rs allelic richness, NT: northern Tunisia, CT: central Tunisia, ST: southern Tunisia.

Lecp-DQA*20) were specifc to NT, one allele (Lecp-DQA*04) to CT and two (Lecp-DQA*13, Lecp-DQA*17) to ST. For DQB, only four alleles (Lecp-DQB*01, Lecp-DQB*02, Lecp-DQB*09, Lecp-DQB*12) were shared between the three regional populations. Nine alleles were specifc to NT, two to CT, and two to ST. Frequencies of these specifc alleles ranged between 0.0161 and 0.0968 for DQA and between 0.0137 and 0.0500 for DQB (Table 1). Finally, allelic richness (Rs), which measures the number of alleles independently of sample size, was signif- icantly (p < 0.0001) higher for the two MHC loci than for the microsatellite loci in NT but not in CT and ST. Moreover, allelic richness varies signifcantly (p = 0.027) between NT and the two other populations (CT and ST). Signifcant (FIS for NT = 0.384, p < 0.001; FIS for CT = 0.447, p < 0.001; FIS for ST = 0.338, p < 0.001) devia- tion from Hardy-Weinberg Equilibrium in all three regional populations indicated that the MHC loci contained signifcantly fewer heterozygotes than expected. Finally, there was no signifcant linkage disequilibrium between genotypes at the DQA and DQB locus as indicated by the Linkage disequilibrium test (R = 0.16; p = 0.22). Genetic diferentiation between regional populations was revealed by pairwise comparisons of Dest and FST values (Table 2). Population diferentiation as determined by mean Dest values across the two loci was high and ranged between 0.46 (NT vs CT) and 0.78 (NT vs ST). For the individual loci, Dest values for the DQA locus ranged between 0.29 (NT vs CT) and 0.59 (CT vs ST), whereas those for the DQB locus ranged between 0.73 (NT vs CT) and 0.96 (NT vs ST). Similarly, pairwise FST (Table 2) values revealed signifcant genetic divergence among regional populations ranging between 0.12 (NT vs CT) and 0.19 (NT vs ST). In contrast, pairwise values for the microsatellite loci were low ranging between 0.013 to 0.030 for FST values and between 0.043 and 0.112 for pairwise Dest values (Table 2). Finally, a hierarchical population structure (AMOVA) based on analyses of variance of gene frequencies was evident in the genetic diferentiation across both MHC loci, with 17.13% of the genetic variance attributable to between regional population diferences, 4.79% between sampling localities within regional populations, and 78.09% to within sampling localities (P < 0.001, P < 0.05, and P < 0.001, respec- tively). Tese estimations for the microsatellite loci were 0.60 (P > 0.05), 4.58 (P < 0.001) and 94.82 (P < 0.001) for the same parameters, respectively.

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 4 www.nature.com/scientificreports/

NT CT ST

a. FST for MHC (upper values) and microsatellite loci (lower values) NT 0.123* 0.180* CT 0.013 0.187* ST 0.030 0.028 b. Dest for DQA (upper values) and DQB (lower values) separately NT 0.29* 0.54* CT 0.73* 0.59* ST 0.96* 0.82* c. Dest for MHC (upper values) and microsatellite (lower values) loci NT 0.46* 0.78* CT 0.043* 0.69* ST 0.112* 0.101*

Table 2. Pairwise FST and Dest values among Tunisian hare populations for MHC and microsatellite loci. NT: northern Tunisia, CT: central Tunisia, ST: southern Tunisia. *p < 0.001.

Linkage Heterozygote Molecular disequilibrium Excess Coancestry Ne 396.8 Infnite 364.0 NT 95% CI 123.4-Infnite — 0.4–1827.5 Ne 477.2 Infnite Infnite CT 95% CI 274.3–1588.2 — — Ne 317.9 Infnite Infnite ST 95% CI 109.2-Infnite — —

Table 3. Estimates of efective population size (Ne) according to three diferent methods implemented in Ne estimator. NT: North Tunisia, CT: Central Tunisia, ST: South Tunisia. CI: confdence interval.

Figure 3. Principal component analysis (PCA) of the allelic diversity in the diferent sampling regions of hares. Each dot represents one MHC allele; the DQA and DQB alleles are indicated as A-allele number and B-allele number, respectively.

Te frst two axes of our PCA (Fig. 3) based on the DQA and DQB allele matrix explained 31.51% of the vari- ance in our MHC genotypic data (18.19% F1, 13.32% F2). Both axes (F1 and F2) were structured by DQA alleles DQA*7, DQA*8, DQA*11, DQA*13 and DQB alleles DQA*1, DQA*3, DQA*9 (Fig. 3). Among these alleles, only DQA*13 was a rare allele. Finally, bottleneck signatures were detected in CT (p = 0.0176) and ST (p = 0.0017) populations when using the Two Phase Mutation model (TPM). No tests detected a bottleneck in the northern Tunisian (NT) popula- tion. Diferent Ne estimates were obtained for each population depending on the used method (Table 3). Te Heterozygote Excess based method gave an infnitely large estimate of Ne for all populations. Te smallest esti- mates were observed in the ST population (317.9) by the Linkage disequilibrium method.

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 5 www.nature.com/scientificreports/

r2 |D’| G4 ρ θ ρ/θ DQA 0.432 (−0.0071) 0.798 (0.0388) 0.712 (0.0232) 0.126 12.719 0.010 DQB 0.001 (−0.1639) 0.194 (−0.0370) 0.191 (−0.0407) 16.771 11.53 1.455

Table 4. Intralocus recombination in the Tunisian hare sequences measured as r2, D’ and G4, and LDhat population mutation (Wattersons θ = 4 Nμ) and population recombination (ρ = 4 Nr).

Selection and recombination analysis. Using the three PERMUTE statistics, no recombination was obtained for the DQA locus, whereas recombination evidence was obtained by one test (r2 = −0.163; p = 0.001) for the DQB locus (Table 4). One recombination event was detected for each locus when using GARD and one to three events were detected for DQB and DQA, respectively, using the RDP statistics (Tables S1 and S2). Te estimates of population mutation (θ) and recombination (ρ) obtained with the LDhat approach showed that each of these processes has played a signifcant role in generating the diversity seen among the currently studied MHC beta genes (Table 4). To fnd out the relative contribution of recombination and point mutations in the of these two currently studied genes in hares, we examined the ratio of average mutation (θ) and recombination rate (ρ). By this measure, mutation was of approximately hundred times more important in the evolution of DQA than recombination. In contrast, the recombination rate was about one and a half times greater than mutation in the evolution of the DQB gene (Table 4). Te codons found to be positively selected by all methods are shown in Figs 1 and 2. Five codons (8, 10, 16, 55 and 69) were considered to be under episodic diversifying selection for DQA as evidenced by more than one of the applied tests in DATAMONKEY. Six (8, 10, 50, 53, 55 and 76) and three (47, 55 and 69) codons were reported under positive selection for the same locus by PAML (see Table S3) and OMEGAMAP, respectively (Figs 1 and 2). Only site 55 was confrmed by all methods. Twelve codons were suggested to be under negative purifying selec- tion in DQA as indicated by the diferent DATAMONKEY tests. For DQB, fve positions (amino acid sites 8, 26, 48, 60, 67) were under episodic diversifying selection in DATAMONKEY. Ten and eleven codons were detected with PAML (codons 13, 14, 26, 32, 37, 48, 57, 59, 60 and 67) and OMEGAMAP (13, 14, 26, 30, 37, 48, 57, 59, 60, 66 and 67), respectively. Four codons (26, 48, 60 and 67) were confrmed by all three methods to be under positive selection. Negative selection in the DQB locus was detected in fve codons (15, 16, 45, 62 and 72) as suggested by DATAMONKEY tests. Finally, the outlier test of Beaumont and Nichols35 indicated departure from expectation under a neutral evo- lution scenario for both the DQA and the DQB locus (Fig. 3), with higher values than the microsatellite loci, in accordance with the codon-based signals of positive selection.

Statistical Models of DQA and DQB protein occurrence. In three cases we found models (Table 5) with excellent support from the data (i.e. delta AICc > 10, Null-model insubstantial36), in two other cases models with strong support (delta AICc > 8) and in two cases models with some support (delta AICc > 4). Two models had a deviance explained of more than 40%, which is notable, because in multinomial models nominal data are compared with quantitative predictions. In all models except for the “dqb1gt model” geographic variables (latitude or longitude) had a high RVI-value (> = 0.89) indicating a high probability for these variables being in the best model. Furthermore, the delta devi- ance value of one of the geographic variables was more than ten times higher than the respective co-occurring DQA or DQB protein variants combinations in four models and eight times higher in the “dqa11gt”-model (Table 5; see also Supplementary Table S4 for estimated coefficients, standard errors, and upper and lower bounds of the 95% confdence interval for all models). Tis indicates strong support for an efect of the geo- graphic variables on the dependent variables in comparison to the respective co-occurring DQA or DQB protein variants combinations. In the “dqb1gt-model”, again the performance was very low. Overall, these results sug- gest a strong efect of geographic variables on the co-occurring DQA or DQB protein variants. In particular, the proteins of Lecp-DQA*07 and Lecp-DQA*08 did occur more ofen in homozygous state in Northern Tunisia, with lower mean annual temperatures and higher levels of precipitation, whereas proteins of Lecp DQA*11 were more common in South Tunisia, with higher mean annual temperatures and lower precipitation. For the DQB locus, proteins of Lecp-DQB*01 were more frequent, in homozygous state, in eastern parts of Tunisia, proteins of Lecp-DQB*01 in heterozygous state as well as proteins of Lecp-DQB*09 in homozygous and heterozygous state were more common in North Tunisia, whereas proteins of Lecp-DQB*03 in homozygous and heterozygous state were more frequent in South Tunisia. Moreover, proteins were generally more common, in heterozygous state, in North Tunisia. Discussion We assessed the MHC diversity and its distribution at two MHC class II genes, DQA and DQB, in hare popula- tions (L. capensis) along a steep ecological gradient in North Africa. We found 17 DQA and 26 DQB alleles among 142 L. capensis individuals. We have analysed the molecular mechanisms of sequence evolution in both loci, i.e., substitution, recombination, and positive selection. We have also described the genetic diferentiation and population structure and compared this across both MHC loci and populations. Apart from signifcant positive selection and recombination events, we found a signifcant decrease of allelic richness at the two MHC loci from the northern population under Mediterranean humid climate to the southern population in the Sahara region. In addition, we have found a high population substructuring at the two MHC loci compared to supposedly neutral microsatellite markers.

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 6 www.nature.com/scientificreports/

dqb1gt dqb3gt dqb9gt dqa7gt dqa8gt dqa11gt hezyg variables RVI delta dev RVI delta dev RVI delta dev RVI delta dev RVI delta dev RVI delta dev RVI delta dev lat 0.68 5.7 1 49.7 1 59.7 0.99 12.2 0.53 5.3 1 17.8 0.89 7.8 long 0.76 5.7 0.35 2.2 0.63 2.1 0.27 1.5 0.98 12.2 0.14 1.1 0.11 1.1 alt 0.13 0.5 0.17 0.6 0.24 0.0 0.15 0.2 0.19 2.1 0.14 1.0 0.15 1.7 dqa_fxedfac 0.6 4.8 0.18 1.5 0.4 3.4 dqb_fxedfac 0.11 0.1 0.12 0.5 0.25 2.1 delta AICc 5.98 49.58 59.44 8.41 8.99 16.29 4.4 dev explained 11.52% 41.96% 40.47% 8.46% 7.71% 15.84% 4.21%

Table 5. Results of multinomial models for all seven dependent variables. Except for dqb1gt, always one of the geographic variables had a high RVI and this was multiple times stronger than dqa/dqb (not valid for the heterozygosity models).

Our results show a high level of polymorphism for the DQA and DQB loci. Such wide allelic diversity in both allele numbers and pairwise nucleotide distances is characteristic for MHC genes37. Moreover, this is in accord- ance with the results of our previous analyses of the same individuals using mtDNA control region sequences38, partial transferrin sequences39, allozyme40, and microsatellites38, which confrms the generally high genetic vari- ability of Tunisian hare populations. Notably, the detected levels of polymorphism of DQA and DQB were higher than those reported for brown hares (L. europaeus) from Austria and Belgium8,41–43. Klein44 and Takahata et al.45 suggested that such diferences in polymorphism could be due to either a diference in the speed of allelic diver- sifcation or in time of their persistence in the gene pool. Both loci in our study have exhibited very similar substitution rates, but a greater recombination rate was recorded for DQB. Tis efect may be due to a longer persistence of DQA diversity in the populations, whereas DQB allele diversifcation may be more likely due to higher intra-locus recombination. Indeed, we found evidence for multiple (intra-allelic) recombination events creating novel DQA and DQB haplotypes. According to the diferent methods used (that vary in their conserva- tive nature), breakpoints at position 120 and 221 were detected by only one test (Tables S1 and S2). For DQA, mul- tiple breakpoint positions were obtained and confrmed by more than one of the methods. In contrast, the overall recombination rate (ρ = 16.771) of DQB was one and a half times more than the mean mutation rate (θ = 11.53) indicating that the generation of DQB diversity is driven by an accumulation of new recombinants relative to new mutations (ρ/θ = 1.455), which is similar to results obtained in other Lepus species43. For DQA, recombination seems to have a minor efect on its evolution compared to mutations (the ratio ρ/θ = 0.010). In contrast, Wegner46 showed that recombination rate and intensity of positive selection were positively correlated in MHC class I and class II sequences of nine diferent species of fsh, suggesting that recombination and gene conversion played a signifcant role in the evolution of the studied MHC genes. Genetic diferentiation between the studied Tunisian hare populations was high in both MHC class II genes as suggested by high and signifcant FST and Dest values. But the same populations showed little genetic structure in the microsatellite loci, which is in accordance with mtDNA and nuclear sequences39,47, as well as microsatellite and allozyme genotypes38,40 studied earlier in these populations. Terefore, population genetic diferentiation at MHC loci could result from either a divergence in qualitative and quantitative parasite pressure that would select for diferent types of alleles in the diferent ecoregions but may also be due to diferences in modes of mutation and mutation rates between the two marker types. However, high gene fow was concordantly indicated by several markers38–40 (microsatellites, allozymes, mtDNA d-loop and transferrin sequences) characterized by diferent modes of mutation and mutation rates. Terefore, the observed pattern of diferentiation only at MHC loci is likely the result of diferential selection pressures across the studied populations which is consistent with theoretical expectations28. Indeed, the few frequent alleles (those with >10%; Lecp-DQA*07 and Lecp-DQB*09 for NT, Lecp-DQA*08 & Lecp-DQB*01 for CT, Lecp-DQA*11 and Lecp-DQB*03 for ST) vary greatly in each population for each locus. Tose predominant alleles were suggested by the PCA to be responsible for the genetic difer- entiation observed at MHC loci between the three ecoregions. Such diferences in allele frequencies between populations have been demonstrated to be mediated by pathogens in afected and unafected chamois popula- tions48. In contrast, cohorts of brown hares characterized by similar quantitative and qualitative pathogens have exhibited the same frequent allele as a sign of stabilizing selection41. Moreover, estimations of efective population size and bottleneck signature indicate that the reduced allelic richness of the ST population may be the result of a smaller and/or drifing population. Terefore, the lower MHC polymorphism, in terms of number of alleles, in the ST population may be due to demographic processes, independently from specifc parasite pressure. Te results indicate that the population structuring at MHC loci is linked to qualitative MHC polymorphism (allelic diversity: diferent alleles are frequent in diferent populations) likely shaped by variation in parasite pressure, but the quantitative MHC polymorphism (allelic richness: fewer alleles from North to South) may still be the results of demographic processes. Both loci showed excess of non-synonymous to synonymous substitutions, a signal of positive selection49 that was stronger at the DQB locus than for DQA as indicated by a signifcantly higher number of positively selected sites. In contrast, negative purifying selection was stronger at the DQA than in DQB locus. In comparison with other studies of lagomorph species, Goüy de Bellocq et al.42 identifed only one codon under positive selec- tion (amino acid 56 in the current study) in thirteen DQA alleles of L. europaeus populations from Austria and

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 7 www.nature.com/scientificreports/

Figure 4. Estimated FST values from the 16 loci studied (14 microsatellites, DQA and DQB) plotted as a function of heterozygosity (He) using Lositan sofware. Gray shading indicates the area on the graph within the confdence limits. DQA and DQB were found to be under positive selection.

Belgium and four codons (three were identifed presently as well: 10, 56, and 70) among 19 Oryctolagus cuniculus alleles. For DQB, Smith et al.43 identifed two codons (codons 26 and 60) under positive selection among eleven DQB alleles in L. europaeus, which are also under positive selection in the current study. Te observed heterozygosity values of MHC loci were rather low in all three regional populations showing a signifcant defcit. Te low heterozygosity might ft to a pattern of recent expansion, as suggested by mitochon- drial and nuclear sequences38,39 or be due to allelic dropout as diferent alleles might have similar conformation when using CE-SSCP techniques1,50. However, the distribution of MHC heterozygosity suggests that heterozy- gote advantage is unlikely given the observed positive selection in the studied populations. Indeed, the MHC FST values were clearly higher than those for microsatellite loci, suggesting an apparent efect of diversifying positive selection. Tis conclusion was also supported by the outliers test, where the DQA and DQB loci showed signif- cantly higher mean FST than predicted from its mean heterozygosity (Fig. 4). Further support for our interpreta- tion comes from our observation of regionally diferentiated populations that also suggest diversifying selection maintained by changes of pathogen composition across geographic regions. However, other mechanisms, such as sexual selection and mating system, might drive the evolution of MHC genes28. Nevertheless, it is probable that only the efects of pathogens vary both over time and among populations8. Host-parasite interactions were suggested to be shaped by environmental conditions (i.e., temperature, nutrient availability; see Brunner and Eizaguirre17 for an overview) which can infuence parasite species richness, as well as intensity of infestation of hosts. According to Klein and Horejsi51 an indirect assessment of parasite infestation can be seen via analysis of MHC class II molecules diversity in the host species, as these molecules primarily correspond to extracellular pathogens, such as bacteria, nematodes, cestodes. Indeed, associations between MHC variants and parasite load have been demonstrated in numerous cases28. Here we found signifcant efects of mean annual temperature and precipitation in the distribution of some MHC proteins/alleles which might indicate regional and climate efects. Concordantly, genetic diferentiation and estimation of gene fow indicated regional distribution patterns for MHC loci compared to high genetic admixture in the neutral markers (i.e. microsatellites). Both results suggest that the MHC polymorphism across sampling populations might refect an adaptation to variation in parasite pressure across the ecological gradient. We expect quantitative and qualitative changes of pathogens from the Mediterranean humid region in the north to the arid Sahara in the south. Indeed, Zvinorova et al.52 showed that the highest helminths and Eimeria infections in goats were observed in the wet season, whereas Pandey et al.53 suggested direct relationship between rainfall and intensity of infection with gastrointestinal nematodes in goats. Notably, diversity of the mammalian fauna that potentially constitutes a complex pathogen reservoir shows a reduction along the currently studied ecological gradient in Tunisia54, probably due to habitat changes and nutrient availability. Moreover, population density of hares from Tunisia seems to decrease from North to South Tunisia (personal observation during hunting trips). Tis may point towards a combined efect of various aspects, mainly driven by environmental conditions, resulting in the reduction of pathogens in the Sahara region and hence resulting in lower MHC diversity. Estimation of efective population size showed that Ne estimates were not consistent across methods. Tis is expected as the accuracy and precision of the diferent methods to estimate Ne are still uncertain55. We fol- lowed Wang56 who showed that linkage disequilibrium (LD) methods perform dramatically better than the other methods. Ne estimates using this method indicated a similar efective population sizes in all three populations. Moreover, our current Ne estimate for the ST population was less than 400 which was suggested to be the min- imum efective population size necessary to maintain genetic variation in the long term in a population57. Tis later population was suggested also by the bottleneck test on neutral microsatellite loci to be under genetic drif. Furthermore, Olivier et Piertney58 have demonstrated that selection counters drif to maintain polymorphism at a major histocompatibility complex (MHC) locus in water vole bottlenecked population from Coiresa Island.

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 8 www.nature.com/scientificreports/

Figure 5. Sampling regions of hares from North, Central and South Tunisia [Te original map was created by using Google Maps, https://www.google.tn/maps/@34.6113892,8.7590835,6z?hl=fr and the map of Tunisia was drawn by using POWERPOINT 2007 part of the Microsof Ofce Package (https://www.microsof.com/fr-fr/ sofware-download/ofce)]. Sample sizes appear in parentheses. Hares were grouped into three populations according to climatic, geographic and phenotypic data.

Terefore, we consider that demographic factors (e.g. bottlenecks or drif) were strong enough to shape the allelic richness in ST population and/or selective pressures were weak to counter the efect of genetic drif. In conclusion, our present fndings from hares along a steep ecological gradient in North Africa indicated that MHC polymorphism and diferentiation were shaped by a combination of neutral demographic processes and climate-driven positive selection of MHC genes across relatively short geographical distances. How changes in pathogenic landscape diversity on the one hand, and how immunologic resistance or tolerance of infected hosts on the other hand may interact with local or regional climatic parameters in the evolution of MHC diversity and in the course of adaptation to environmental characteristics needs, however, further studies with particular inclu- sion of infection patterns of the hosts. Methods Samples. A total of 142 samples of cape hares (Lepus capensis, e.g.59–61) were collected during regulars hunts from various biogeographic zones within the major ecoregions in Tunisia. We never manipulated live animals; we got only hare samples during regular hunting season from hunters. All samples were collected with the agreement of the general forest department of Tunisia (authorization no. 2629DGF/DCF/CPN; Ministry of Agriculture, Water Resources and Fishing, Tunisia). Hares were grouped into three populations corresponding to the three main climatic zones in Tunisia (e.g. You et al.62). Acronyms of sampling localities and positions within northern (NT), central (CT), and southern (ST) Tunisia, as well as sample sizes are given in Fig. 5. Te whole sampling area expanded from the Mediterranean humid and sub-humid bioclimatic zones in the north (NT), characterized by cork oak (Quercus suber) Mediterranean forest and scrubland (maquis: rosemary, carob tree, mastic tree, thyme, lavender), across the semi-arid central part of Tunisia (CT) to the arid and hyper-arid Sahara in the south (ST), with typical plants that have adapted to dry conditions such as Acacia tortilis subsp. raddiana, Vachellia spp., Stipagrostis pungens vegetation62,63. Our study area represents a remarkably steep climatic and ecological gradi- ent across a straight distance of less than 500 km, with means of annual precipitation and temperature ranging between 916 mm and 16.3 °C in the north and 77 mm and 21.26 °C in the south.

PCR amplifcation, CE-SSCP and Sequencing. Total genomic DNA was extracted using the GenElute Mammalian Genomic DNA Miniprep kit (Sigma-Aldrich) from diverse tissues (liver, skeletal muscles, ear carti- lage, tongue) of each specimen5,39,47,60. Amplifcation of exon 2 fragments for two MHC class II genes (DQA and DQB) were performed according to Goüy de Bellocq et al.42 and Smith et al.43. Te amplifed 258 bp and 249 bp (including primers) correspond to the

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 9 www.nature.com/scientificreports/

total exon 2 of DQA and DQB genes, respectively. We used capillary electrophoresis single-strand conformation polymorphism (CE-SSCP) to screen allelic diversity in the above mentioned genes. Amplifcation was carried out using the Multiplex PCR kit (Qiagen) following the manufacturer’s instructions in a fnal reaction volume of 10 μl. Forward primers for both gene segments were 6’FAM-labelled and reverse primers were NED-labelled. We selected individuals with diverse SSCP patterns, representing all identifed alleles, to investigate sequence variation at both DQA and DQB loci, respectively. Te genes were amplifed using the same procedure as in Goüy de Bellocq et al.42 and Smith et al.43, but using non-labelled primers. As homozygous states were detected by SSCP analysis for all alleles in both gene segments, we directly sequenced the PCR product of the corresponding sam- ples. Among all currently analysed samples, only eight were not screened with SCCP. Tese eight samples were directly sequenced using the same primers and the program phase64 was used to separate alleles in heterozygous samples. For comparison between MHC and neutral genetic variation, genotype data of 14 microsatellite loci of 134 individuals out of the specimens analysed in the current study were used from a previous study38.

Data analysis. Population genetic parameters. Sequences were edited and aligned by eye using the BioEdit v.7.2.5 program©, 1997-201365. Allele frequencies, mean number of alleles (A), observed (Ho) and expected (He) heterozygosity were calculated for each locus and for each regional population (NT, CT, ST) with GENETIX66. Tests of deviation from Hardy–Weinberg equilibrium and linkage disequilibrium were calculated using GENEPOP 4.067 and GENETIX66, respectively. Te allelic numbers for the cape hare sequences were assigned according to the guidelines of Klein et al.68 and begin at Lecp-DQA*04 and Lecp-DQB*01 for DQA and DQB, respectively. Te FSTAT vers. 2.9.3.2. program69 was used to calculate locus-specifc values of allelic richness (Rs) based on a rarefaction approach to account for diferent sample sizes. Allelic richness calculation was based on minimum sample size of 29 and 18 individuals for MHC and microsatellite loci, respectively. Diferentiation between regional populations was determined by calculation of standardized pairwise FST (10 66 70 000 permutations) in GENETIX and also Jost’s D (Dest) as a more robust measure than FST using GenAlEx 6.571. In addition, an analysis of molecular variance (AMOVA) from the MHC data was calculated for the three eco-regions (NT, CT, ST) using the Arlequin 3.5 program72. Te same genetic parameters were also estimated for genotypes at 14 microsatellite loci obtained earlier for the same individuals38 for NT, CT, ST. Furthermore, in order to identify the contribution of the diferent MHC alleles to the population structure, a principal components analysis (PCA) was performed in ADE-473, using the genotype data at both MHCII loci. Coordinates obtained for each allele were used for scatter plots using Past sofware74. Finally, as demographic events might have afected the observed pattern of diversity and diferentiation in MHC loci, we frst use the “Wilcoxon signed-ranks test”75 implemented in the bottleneck sofware to test for bottleneck in the microsatellite data. As recommended by Piryet al.75, we used the Two Phase Mutation model (TPM) and the Stepwise Mutation Model (SMM).We then calculated efective population size using the neutral microsatellite data using Ne estimator76. Tree diferent methods based on the information of linkage disequilib- rium (LD), heterozygote excess (HE) and molecular coancestry (MC) were used.

Recombination analysis. Te assumptions of several methods of sequence analysis are violated if recombina- tion is present77. To check for the presence of recombination in the alignment of our sequences, we frst used PERMUTE (included in the sofware package OMEGAMAP78), which computes three diferent statistics (r2, D, and G4) based on the correlation between physical distance of sites and their linkage disequilibrium. Te RDP, GENECONV, MAXCHI, CHIMAERA and 3SEQ implemented in the RDP3 package79 were all employed to detect recombination breakpoint locations. In the latter analyses we used a signifcance level of 0.05. We also cal- culated the population scaled recombination rate ρ and mutation rate θ by using LDhat recombination rate scan implemented in theRDP3 package79. Finally, the GARD method80 was employed to search for possible recombi- nation partitions.

Detecting positive selection. In order to infer evidence of positive selection, three diferent methods that compare the rates of synonymous and non-synonymous substitutions separately for every single codon were used. We chose to use HYPHY81 as implemented in the DATAMONKEY webserver (http://www.datamonkey.org/83; last accessed 30th January 2017), OMEGAMAP78 and CODEML (PAML 4 package82). Te frst two sofware packages have the important advantage of taking recombination into account whereas CODEML does not account for that. Te OmegaMap program is based on a Bayesian approximation to the coalescent the- ory and generates means and credible intervals for the selection parameter (dN/dS = ω) and recombination rate (ρ = 4Nr) for each codon (N and r represent the efective population size and the per codon rate of recombina- tion, respectively). We used the same parameters as in Smith et al.43. Two Markov chain Monte Carlo runs of 250 000 iterations (25 000 iteration burn-in) for DQA and 400 000 iterations (40 000 iteration burn-in) for DQB on population allele frequencies at each locus, were compared for convergence. Codons are considered as positively selected with posterior probabilities greater than 95%. In DATAMONKEY, we frst identifed recombination break points with genetic algorithm recombination detection (GARD80) that infers phylogenies for each putative non recombinant fragment. Te output was used to run four diferent maximum likelihood methods for detection of selection: SLAC (single likelihood ancestral counting), FEL (fxed efects likelihood), REL (random efects likelihood), and Mixed Efects Model of Evolution (MEME)83. Signifcance levels of P < 0.25 in SLAC and FEL and P < 0.05 in MEME and Bayes factors >50 in REL were considered as indicating positively selected sites. We considered a codon to be positively selected only if it was identifed by at least two of the methods implemented77,84. Several studies42,85,86 that used CODEML to test for site specifc positive selection in MHC data have shown its robustness although it does not account for recombination events. Two pairs of models were compared using this

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 10 www.nature.com/scientificreports/

program: M1a (neutral model) was compared to M2a (adaptive model) and M7 (beta) was compared to M8 (beta plus omega). Pairwise comparisons on nested models were performed using the likelihood ratio test (LRT): twice the log-likelihood diference was compared with a χydistribution with degrees of freedom equal to the diference in the number of parameters between both models. When the LRT was signifcant, a Bayes Empirical Bayes (BEB) method was used to calculate the posterior probabilities of codon classes in models M2a and M8. Posterior prob- abilities of >0.95 were considered as supportive under the BEB method87. We considered a codon to be positively selected if it was identifed by M2a or M8 or both. In addition to the applied tests to infer molecular signature of positive selection, we used the model-based approach of Beaumont and Nichols35, based on MHC genotypes, implemented in Lositan88 to compare the observed FST values estimated at each locus to a null distribution of FST conditional on heterozygosity. DQA, DQB 38 and fourteen microsatellite loci were tested for neutrality under 50000 simulations, estimated neutral mean FST, infnite alleles mutation model, 99% confdence interval and false discovery rate of 0.1%.

Testing for variation of genetic diversity between regions and for efects of climate variables on MHC alleles and heterozygosity. First, we used locus specifc allelic richness of MHC and microsatellite loci in the three regional populations to run a linear mixed efects model (lmer) to check for signifcant variation of allelic richness from North to South Tunisia. Under a positive selection scenario for MHC loci we expected that allelic richness at both MHC loci varies between sampling regions in the case of diferent pathogen pressure. However, genetic diversity (allelic richness) for selectively neutral markers like microsatellites was not expected to vary between populations, as population genetic results indicated high gene fow across the whole study region38–40. Tus, with a hypothesized higher pathogenic diversity in the more climatically diverse NT population, MHC diversity at the two studied loci was expected to exceed the neutral microsatellite diversity particularly in NT. In the model we used locus specifc allelic richness as obtained by FSTAT as dependent variable, regional population (NT, CT, ST) and locus type (MHC or microsatellite) as fxed factors and locus as random variable to account for potential locus-specifc efects (particularly among the microsatellites) and specifcally tested for a signifcant locus type by population interaction efect. Te model syntax was as follows: lmer (random efect: ~ 1I locus; locus specifc allelic richness ~ population * locus type). We also tested if allelic richness varies across the ecological gradient independently from microsatellite diversity using a general linear model of the syntax: MHC allelic richness ~ region; where MHC allelic richness means locus-specifc allelic richness (for DQA and DQB) and region means NT or (CT & ST). CT and ST populations were combined because allelic richness values (see Table 1) were higher in NT but similarly lower in CT and ST. Second, we used the statistical sofware package R 2.15.0 (R Development Core Team, 2011) to run multi- nomial log-linear models separately for the most frequent protein variants at the DQA (07, 08, 11) and the DQB (01, 03, 09) loci as response variables and geographical latitude, longitude, altitude as well as the protein combi- nation at the respective co-occurring MHC locus as independent variables. However, mean annual temperature (r = −0.796) and annual precipitation (r = 0.831) correlated too closely with latitude to add them in the model, and latitude has the clearest efect of these three variables on the response variable (see Supplementary Table S5). Tus, hereafer latitude will be interpreted as surrogate for climate, acknowledging that other abiotic factors like soil or wetness also might be covered by this variable. Mean annual temperature, annual precipitation and altitude were obtained from WORLDCLIM data set for 2.5 min intervals (Version 1.4, http://www.worldclim.org/bioclim.htm) and were automatically extracted using DIVA-GIS ver. 7.5. We also run a multinomial log-linear model with individual heterozygosity in DQA and DQB loci as response variable and the same geographic variables as inde- pendent variables, expecting a higher level in NT (in parallel to allelic richness). Te model syntaxes were as follows (using the package nnet): 1) Multinom (DQA protein class ~ latitude + longitude + altitude + as.factor (co-occurring DQB protein class)), where the two co-occurring protein classes represented either any DQB protein combination of the most preva- lent proteins 01, 03, 09, on the one hand and any other protein combination on the other,

2) Multinom (DQB protein class ~ latitude * longitude + altitude + as.factor (co-occurring DQA protein class)),

where the two co-occurring protein classes represented either any DQA protein combination of the most preva- lent proteins 07, 08, 11 on the one hand and any other protein combination on the other,

3) Multinom (MHC heterozygosity ~ latitude * longitude + altitude).

We used an information-theoretic approach and techniques of model selection and multi-model inference36 (i.e., model ranking and model averaging) to obtain the best model explaining the data. Based on the global mod- els (see syntaxes above), we run all possible models including the respective null-models and calculated the prob- ability of each possible model being the best model using the Akaike weight based on AICc values (i.e., corrected for small sample sizes). Adding up the weight of all models in which a given variable is present gives the “Relative Variable Importance”, RVI, Table 5). RVI is an estimate of the absolute probability for a variable of being in the best model explaining this particular dataset. For each variable we also calculated the so-called delta deviance, i.e. the diference in deviance between the full model and the model afer dropping this variable. In the cases of completely uncorrelated variables (which is almost the case here), delta deviance gives a relative estimate on the efect strength of one variable to the other, e.g. the efect of latitude on dqb3gt is 49.7/1.5 ≈ 32 times stronger than

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 11 www.nature.com/scientificreports/

the efect of the respective co-occuring dqa genotype. Additionally we report the estimates, standard errors and the upper and lower 95% confdence interval (see Supp2), which indicates the precision of the estimate. We do not use these outputs to interpret the importance of the variables, because this would almost like using p-values in a Null-hypothesis testing framework, which should be avoided in an information-theoretic approach36. For each model we report delta AICc (diference in AICc between the best model and the intercept-only model) and the explained deviance as measure of goodness of ft (Table 5).

Accession codes. Sequence data from this article can be found in the GenBank under the accession num- bers: MH346126-MH346168. References 1. Sommer, S., Courtiol, A. & Mazzoni, C. J. MHC genotyping of non-model organisms using next‐generation sequencing: a new methodology to deal with artefacts and allelic dropout. BMC Genomics 14, 542–559 (2013). 2. Zhang, M. & He, H. Parasite-mediated selection of major histocompatibility complex variability in wild brandt’s voles (Lasiopodomys brandtii) from Inner Mongolia, China. BMC Evol. Biol. 13, 149 (2013). 3. Cohen, S. Strong positive selection and habitat-specifc amino acid substitution patterns in MHC from an estuary fsh under intense pollution stress. Mol. Biol. Evol. 19, 1870–1880 (2002). 4. Gillingham, M. A. F. et al. Very high MHC Class IIB diversity without spatial diferentiation in the mediterranean population of greater Flamingos. BMC Evol. Biol.17, 56 (2017). 5. Ben Slimen, H., Schaschl, H., Knauer, F. & Suchentrunk, F. Selection on the mitochondrial ATP synthase 6 and the NADH dehydrogenase 2 genes in hares (Lepus capensis L., 1758) from a steep ecological gradient in North Africa. BMC Evolutionary Biology 17, 46 (2017). 6. Klein, J. Natural History of the Major Histocompatability Complex. John Wiley and Sons: New York (1986). 7. Weber, D. S., Stewart, B. S., Schienman, J. & Lehman, N. Major histocompatibility complex variation at three class II loci in the northern elephant seal. Mol. Ecol. 13, 711–8 (2004). 8. Koutsogiannouli, E. A. et al. Major histocompatibility complex variation at class II DQA locus in the brown hare (Lepus europaeus). Mol. Ecol. 18, 4631–4649 (2009). 9. Balasubramaniam, S. et al. New data from basal Australian songbird lineages show that complex structure of MHC class II β genes has early evolutionary origins within passerines. BMC Evol. Biol. 16, 1–11 (2016). 10. Kohn, M. H., Murphy, W. J., Ostrander, E. A. & Wayne, R. K. Genomics and conservation genetics. Trends Ecol. Evol. 21, 629–637 (2006). 11. Bonin, A., Nicole, F., Pompanon, F., Miaud, C. & Taberlet, P. Population adaptive index: A new method to help measure intraspecifc genetic diversity and prioritize populations for conservation. Conserv. Biol. 21, 697–708, https://doi.org/10.1111/j.1523- 1739.2007.00685.x (2007). 12. Bonhomme, M., Blancher, A., Jalil, M. F. & Crouau-Roy, B. Factors shaping genetic variation in the MHC of natural non primate populations. Tissue Antigens 70, 398–411 (2007). 13. Vassilakos, D., Natoli, A., Dahlheim, M. & Hoelzel, A. R. Balancing and directional selection at exon-2 of the MHC DQB1 locus among populations of odontocete cetaceans. Mol. Biol. Evol. 26, 681–689 (2009). 14. Baker, C. S. et al. Diversity and duplication of DQB and DRB-like genes of the MHC in baleen whales (suborder: Mysticeti). Immunogenetics 58, 283–296 (2006). 15. Xu, S. X. et al. Sequence polymorphism and evolution of three cetacean MHC genes. J. Mol. Evol. 69, 260–275 (2009). 16. Moreno-Santillán, D. D., Lacey, E. A., Gendron, D. & Ortega, J. Genetic variation and at exon 2 of the MHC class II DQB locus in Blue Whale (Balaenoptera musculus) from the Gulf of California. PLoS ONE 11, e0141296 (2016). 17. Brunner, F. & Eizaguirre, C. Can environmental change afect host/parasite-mediated speciation? Zoology 119, 384–394 (2016). 18. Wegner, K. M., Kalbe, M., Milinski, M. & Reusch, T. B. H. Mortality selection during the 2003 European heat wave in three-spined sticklebacks: efect of parasites and MHC genotype. BMC. Evol. Biol. 8, 1–12 (2008). 19. Björklund, M., Aho, T. & Behrmann-Godel, J. Isolation over 35 years in a heated biotest basin causes selection on MHC class IIß genes in the European perch (Perca fuviatilis L.). Ecol. Evol. 5, 1440–55 (2015). 20. Klein, J. & Figueroa, F. Te evolution of class I MHC genes. Immunol.Today 7, 41–44 (1986). 21. Hughes, A. L. & Nei, M. Evolutionary relationships of class II major-histocompatibility-complex genes in mammals. Mol. Biol. Evol. 7, 491–514 (1990). 22. Hughes, A. L. & Yeager, M. Natural selection at major histocompatibility complex loci of vertebrates. Annu. Rev. Genet. 32, 415–435 (1998). 23. Ottaviani, D. et al. Reconfguration of genomic anchors upon transcriptional activation of the human major histocompatibility complex. Genome Res. 18, 1778–1786 (2008). 24. Sommer, S. Te importance of immune gene variability (MHC) in evolutionary ecology and conservation. Front Zool. 2, 16 (2005). 25. Yeager, M. & Hughes, A. L. Evolution of the mammalian MHC: natural selection, recombination, and . Immunol. Rev. 167, 45–58 (1999). 26. Ohta, T. Gene conversion vs point mutation in generating variability at the antigen recognition site of major histocompatibility complex loci. J. Mol. Evol. 41, 115–119 (1995). 27. Miller, H. C. & Lambert, D. M. Gene duplication and gene conversion in class II MHC genes of New Zealand robins (Petroicidae). Immunogenetics 56, 178–191, https://doi.org/10.1007/s00251-004-0666-1 (2004). 28. Spurgin, L. & Richardson, D. How pathogens drive genetic diversity: MHC, mechanisms and misunderstandings. Proc R Soc B 277, 979–988 (2010). 29. Woolhouse, M. E. J., Webster, J. P., Domingo, E., Charlesworth, B. & Levin, B. R. Biological and biomedical implications of the co- evolution of pathogens and their hosts. Nat. Genet. 32, 569–577 (2002). 30. Gandon, S., Capowiez, Y., Dubois, Y., Michalakis, Y. & Olivieri, I. Local adaptation and gene-for-gene coevolution in a metapopulation model. Proc. R. Soc. B, Biol. Sci. 263, 1003–1009 (1996). 31. Teacher, A. G. F., Garner, T. W. J. & Nichols, R. A. Evidence for directional selection at a novel major histocompatibility class I marker in wild common frogs (Ranatemporaria) exposed to a viral pathogen (Ranavirus). PLoS ONE 4, e4616 (2009). 32. Landry, C. & Bernatcez, L. Comparative analysis of population structure across environments and geographical scales at major histocompatibility complex and microsatellite loci in Atlantic (salmo salar). Mol. Ecol. 10, 2525–2539 (2001). 33. Schierup, M. H. Te number of self-incompatibility alleles in a fnite, subdivided population. Genetics 149, 1153–1162 (1998). 34. Knafer, G. J., Grueber, C. E., Sutton, J. T. & Jamieson, I. G. Diferential patterns of diversity at microsatellite, MHC, and TLR loci in bottlenecked South Island saddleback populations. New Zealand Journal of Ecology 41, 98–106 (2017). 35. Beaumont, M. A. & Nichols, R. A. Evaluating loci for the use in the genetic analysis of population structure. Proc. R. Soc. B. 263, 1619–1636 (1996). 36. Burnham, K. P. & Anderson, D. R. Model Selection and Multimodel Inference: A Practical Information-Teoretic Approach 2nded. Springer. NewYork (2002).

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 12 www.nature.com/scientificreports/

37. Klein, J., Satta, Y., O’hUigin, C. & Takahata, N. Te molecular descent of the major histocompatibility complex. Annu. Rev. Immunol. 11, 269–295 (1993). 38. Ben Slimen, H. Phylogénie morphologique et moléculaire des lièvres d’Afrique du Nord du genre Lepus. PhD thesis, Faculty of Sciences of Tunis (2008). 39. Awadi, A., Suchentrunk, F., Makni, M. & Ben Slimen, H. Phylogenetic relationships and genetic diversity of Tunisian hares (Lepus sp. or spp., Lagomorpha) based on partial nuclear gene transferrin sequences. Genetica 144, 497–512 (2016). 40. Ben Slimen, H., Suchentrunk, F., Memmi, A. & Ben AmmarElgaaied, A. Biochemical genetic relationships among Tunisian hares (Lepus sp.), South African cape hares (L. capensis), and European brown hares (L. europaeus). Biochem.Genet. 43, 577–96 (2005). 41. Campos, J. L., Bellocq, J. Gd., Schaschl, H. & Suchentrunk, F. MHC class II DQA gene variation across cohorts of brown hares (Lepus europaeus) from eastern Austria: testing for diferent selection hypotheses. Mamm.Biol. 76, 251–257 (2011). 42. Goüy de Bellocq, J., Suchentrunk, F., Baird, S. & Schaschl, H. Evolutionary history of an MHC gene in two leporid species: characterisation of Mhc-DQA in the European brown hare and comparison with the European rabbit. Immunogenetics 61, 131–144 (2009). 43. Smith, S., Goüy de Bellocq, J., Suchentrunk, F. & Schaschl, H. Evolutionary genetics of MHC class II beta genes in the brown hare, Lepus europaeus. Immunogenetics 63, 743–751 (2011). 44. Klein, J. Origin of major histocompatibility complex polymorphism—Te trans-species hypothesis. Hum. Immunol. 19, 155–162 (1987). 45. Takahata, N., Satta, Y. & Klein, J. Polymorphism and balancing selection at major histocompatibility complex loci. Genetics 130, 925–938 (1992). 46. Wegner, K. M. Historical and contemporary selection of teleost MHC genes: did we leave the past behind? Journal of Fish Biology 73, 2110–2132 (2008). 47. Ben Slimen, H., Suchentrunk, F., Shahin, A. B. & Ben Ammar Elgaaied, A. Phylogenetic analysis of mtCR-1 sequences of Tunisian and Egyptian hares (Lepus sp. or spp., Lagomorpha) with diferent coat colours. Mamm. Biol. 72, 224–239 (2007). 48. Schaschl, H. et al. Sex-specifc selection for MHC variability in Alpine chamois. BMC Evolutionary Biology 12, 20 (2012). 49. Hill, R. E. & Hastie, N. D. Accelerated evolution in the reactive center regions of serine protease inhibitors. Nature 326, 96–99 (1987). 50. Promerová, M. et al. Evaluation of two approaches to genotyping major histocompatibility complex class I in a passerine-CE-SSCP and 454 pyrosequencing. Mol. Ecol. Res. 12(2), 285–92 (2012). 51. Klein, J. & Horejsi, V. Immunology. Blackwell Science Ltd. Oxford. UK (1997). 52. Zvinorova, P. I. et al. Prevalence and risk factors of gastrointestinal parasitic infections in goats in low-input low-output farming systems in Zimbabwe. Small Rumin.Res. 143, 75–83 (2016). 53. Pandey, V., Ndao, M. & Kumar, V. Seasonal prevalence of gastrointestinal nematodes in communal land goats from the Highveld of Zimbabwe. Vet. Parasitol. 51, 241–248 (1994). 54. Soberon, J. & Ceballos, G. Species Richness and Range Size of the Terrestrial Mammals of the World: Biological Signal within Mathematical Constraints. PLoS ONE 6, e19359 (2011). 55. Tallmon, D. A., Luikart, G. & Beaumont, M. A. Comparative evaluation of a new efective population size estimator based on approximate Bayesian computation. Genetics 167, 977–988 (2004). 56. Wang, J. A comparison of single-sample estimators of efective population sizes from genetic marker data. Molecular Ecology 25(19), 4692–4711 (2016). 57. Frankham, R., Ballou, J. D. & Briscoe, D. A. Introduction to Conservation Genetics. Cambridge University Press, Cambridge (2002). 58. Oliver, M. K. & Piertney, S. B. Selection maintains mhc diversity through a natural population bottleneck. Mol. Biol. Evol. 29, 1713–1720 (2012). 59. Flux, J. E. C. & Angermann, R. Hares and Jackrabbits inRabbits, Hares and Pikas. Status Survey and Conservation Action Plan (eds Chapman, J. A. & Flux, J. E. C.) 61–94. Information Press. Oxford. UK (1990). 60. Ben Slimen, H. et al. Population genetics of cape and brown hares (Lepus capensis and L. europaeus): a test of Petter´s hypothesis of conspecifcity. Biochem.Syst. Ecol. 36, 22–39 (2008). 61. Hofmann, R. S. & Smith, A. T. Order Lagomorpha in Mammal Species of the World 3rd ed. (eds Wilson, D. E. & Reeder, D. M.) 185–211. Te Johns Hopkins University Press. Baltimore (2005). 62. You, H. et al. Plant diversity in diferent bioclimatic zones in Tunisia. Journal of Asia-Pacifc Biodiversity 9, 56–62 (2016). 63. Zahran, M. A. Climate - Vegetation: Afro-Asian Mediterranean and Red Sea Coastal Land. Springer Verlag. Dordrecht (2010). 64. Stephans, M. & Donelly, P. A. comparison of bayesian methods for haplotype reconstruction from population genotype data. Am. J. Hum. Genet. 73, 1162–1169 (2003). 65. Hall, T. A.BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl. Acids. Symp. Ser. 41, 95–98 (1999). 66. Belkhir, K., Borsa, P., Chikhi, L., Raufaste, N. & Bonhomme, F. GENETIX 4.05, logiciel sous Windows TM pour la génétique des populations. LaboratoireGénome, Populations, Interactions, CNRS UMR 5171.Université Montpellier 2. Monpellier. France (1996–2004). 67. Rousset, F. GENEPOP'007: A complete re-implementation of the GENEPOP sofware for Windows and Linux. Mol. Ecol. Resour. 8, 103–106 (2008). 68. Klein, J. et al. Nomenclature for the major histocompatibility complexes of diferent species: a proposal. Immunogenetics 31, 217–9 (1990). 69. Goudet, J. FSTAT, version 2.9. 3, A program to estimate and test gene diversities and fxation indices. Lausanne University, Lausanne, Switzerland (2001). 70. Jost, L. GST and its relatives do not measure diferentiation. Mol. Ecol. 17, 4015–4026 (2008). 71. Peakall, R. & Smouse, P. E. GenAlEx 6.5: genetic analysis in Excel. Population genetic sofware for teaching and research-an update. Bioinformatics 28, 2537–2539 (2012). 72. Excofer, L. & Lischer, H. E. L. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol. Ecol. Resour. 10, 564–567 (2010). 73. Tioulouse, J., Chessel, D., Dolédec, S. & Olivier, J. Ade-4: a multivariate analysis and graphical display sofware. Statistics and Computing 7, 75–83 (1997). 74. Hammer, Ø., Harper, D. A. T. & Ryan, P. D. PAST: Paleontological statistics sofware package for education and data analysis. Palaeontologia Electronica 4(1), 9 (2001). 75. Piry, S., Luikart, G. & Cornuet, J. M. BOTTLENECK: a computer program for detecting recent reductions in the efective population size using allele frequency data. Journal of Heredity 90, 502–503 (1999). 76. Do, C. et al. Ne Estimator V2: reimplementation of sofware for the estimation of contemporary efective population size (Ne) from genetic data. Molecular Ecology Resources 14, 209–214 (2014). 77. Garrigan, D. & Hedrick, P. W. Perspective: detecting adaptive molecular polymorphism: lessons from the MHC. Evolution 57, 1707–1722 (2003). 78. Wilson, D. J. & McVean, G. Estimating diversifying selection and functional constraint in the presence of recombination. Genetics 172, 1411–1425 (2006).

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 13 www.nature.com/scientificreports/

79. Martin, D. P. et al. RDP3: A fexible and fast computer program for analyzing recombination. Bioinformatics 26, 2462–2463, https:// doi.org/10.1093/bioinformatics/btq467 (2010). 80. Pond, S. L. K., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. Mol. Biol. Evol. 23, 1891–1901 (2006). 81. Kosakovsky, P. S. L., Frost, S. D. W. & Muse, S. V. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–9 (2005). 82. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007). 83. Murrell, B. et al. Detecting individual sites subject to episodic diversifying selection. PLoS Genet. 8, e1002764 (2012). 84. Kosakovsky, P. S. L. & Frost, S. D. W. Datamonkey: rapid detection of selective pressure on individual sites of codon alignments. Bioinformatics 21, 2531–3 (2005). 85. Consuegra, S. et al. Rapid Evolution of the MH Class I Locus Results in Diferent Allelic Compositions in Recently Diverged Populations of Atlantic Salmon. Mol. Biol.Evol. 22, 1095–106 (2005). 86. Kuduk, K. et al. Evolution of major histocompatibility complex class I and class II genes in the brown bear. BMC Evol. Biol. 12, 197–10; 1186/1471-2148-12-197 (2012). 87. Yang, Z., Nielsen, R., Goldman, N. & Pedersen, A. M. Codon-substitution models for heterogeneous selection pressure at amino acid sites. Genetics 155, 431–449 (2000). 88. Antao, T., Lopes, A., Lopes, R. J., Beja-Pereira, A. & Luikart, G. LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method. BMC Bioinformatics 9, 323, https://doi.org/10.1186/1471-2105-9-323 (2008). 89. Bondinas, G., Moustakas, A. & Papadopoulos, G. Te spectrum of HLA-DQ and HLA-DR alleles, 2006: a listing correlating sequence and structure with function. Immunogenetics 59, 539–553 (2007). Acknowledgements We thank A. Haiden for supporting laboratory work. We thank Dr. Mark Gillingham (University of Ulm) and an anonymous reviewer for their comments and suggestions, which helped us to improve the manuscript. Author Contributions F.S. and A.A. conceived the experiments. A.A. and S.S. conducted the experiments. F.S., A.A., S.S. and F.K. analyzed the results. A.A. and H.B.S. wrote the paper. F.S. and M.M. reviewed the manuscript. All authors approved the fnal manuscript. Additional Information Supplementary information accompanies this paper at https://doi.org/10.1038/s41598-018-29657-3. Competing Interests: Te authors declare no competing interests. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional afliations. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Cre- ative Commons license, and indicate if changes were made. Te images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not per- mitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© Te Author(s) 2018

SCIEntIFIC REPOrtS | (2018)8:11514 | DOI:10.1038/s41598-018-29657-3 14