Genome-wide scan for runs of homozygosity identifies potential candidate associated with local adaptation in Valle del Belice sheep Salvatore Mastrangelo, Marco Tolone, Maria T. Sardina, Gianluca Sottile, Anna M. Sutera, Rosalia Di Gerlando, Baldassare Portolano

To cite this version:

Salvatore Mastrangelo, Marco Tolone, Maria T. Sardina, Gianluca Sottile, Anna M. Sutera, et al.. Genome-wide scan for runs of homozygosity identifies potential candidate genes associated with local adaptation in Valle del Belice sheep. Genetics Selection Evolution, BioMed Central, 2017, 49 (1), pp.84. ￿10.1186/s12711-017-0360-z￿. ￿hal-01635193￿

HAL Id: hal-01635193 https://hal.archives-ouvertes.fr/hal-01635193 Submitted on 14 Nov 2017

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Mastrangelo et al. Genet Sel Evol (2017) 49:84 DOI 10.1186/s12711-017-0360-z Genetics Selection Evolution

RESEARCH ARTICLE Open Access Genome‑wide scan for runs of homozygosity identifes potential candidate genes associated with local adaptation in Valle del Belice sheep Salvatore Mastrangelo1* , Marco Tolone1, Maria T. Sardina1, Gianluca Sottile2, Anna M. Sutera1, Rosalia Di Gerlando1 and Baldassare Portolano1

Abstract Background: Because very large numbers of single nucleotide polymorphisms (SNPs) are now available throughout the genome, they are particularly suitable for the detection of genomic regions where a reduction in heterozygosity has occurred and they ofer new opportunities to improve the accuracy of inbreeding (F) estimates. Runs of homozy- gosity (ROH) are contiguous lengths of homozygous segments of the genome where the two haplotypes inherited from the parents are identical. Here, we investigated the occurrence and distribution of ROH using a medium-dense SNP panel to characterize autozygosity in 516 Valle del Belice sheep and to identify the genomic regions with high ROH frequencies. Results: We identifed 11,629 ROH and all individuals displayed at least one ROH longer than 1 Mb. The mean value of F estimated from ROH longer than1 Mb was 0.084 0.061. ROH that were shorter than 10 Mb predominated. The highest and lowest coverages of Ovis aries ± (OAR) by ROH were on OAR24 and OAR1, respectively. The number of ROH per length displayed a specifc pattern, with higher values for the frst three chromo- somes. Both number of ROH and length of the genome covered by ROH varied considerably between animals. Two hundred and thirty-nine SNPs were considered as candidate markers that may be under directional selection and we identifed 107 potential candidate genes. Six genomic regions located on six chromosomes, corresponding to ROH islands, are presented as hotspots of autozygosity, which frequently coincided with regions of medium recombination rate. According to the KEGG database, most of these genes were involved in multiple signaling and signal transduc- tion pathways in a wide variety of cellular and biochemical processes. A genome scan revealed the presence of ROH islands in genomic regions that harbor candidate genes for selection in response to environmental stress and which underlie local adaptation. Conclusions: These results suggest that natural selection has, at least partially, a role in shaping the genome of Valle del Belice sheep and that ROH in the ovine genome may help to detect genomic regions involved in the determinism of traits under selection.

Background phenomena such as genetic drift, population bottlenecks, Autozygosity is the homozygous state of identical-by- mating of close relatives, natural and artifcial selection descent (IBD) alleles, which can result from several [1, 2]. Te increase in inbreeding (F) leads to diferent negative efects such as reduction in genetic variance, *Correspondence: [email protected] higher frequency of homozygous genotypes for del- 1 Dipartimento Scienze Agrarie, Alimentari e Forestali, Università degli eterious alleles with reduction in individual performance Studi di Palermo, 90128 Palermo, Italy (inbreeding depression) and lower population viability Full list of author information is available at the end of the article

© The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 2 of 10

[3]. Terefore, since F has been incriminated in reduced USA), which includes 54,241 SNPs. Raw signal intensi- ftness, there is a growing interest in characterizing and ties were converted into genotype calls using the Illumina monitoring autozygosity for an accurate estimation of GenomeStudio Genotyping Module v1.0 software by F . Traditionally, F is estimated based on pedigree infor- applying a no-call threshold of 0.15. mation. Te current availability of very large numbers Genotyping data were initially tested for quality using of single nucleotide polymorphisms (SNPs) throughout the above-mentioned software. Chromosomal coordi- the genome makes these markers particularly suitable nates for each SNP were obtained from the latest release for the detection of genomic regions where a reduction of the ovine genome sequence assembly Oar_v4.0. SNPs in heterozygosity has occurred and ofers the opportu- were fltered to exclude loci assigned to unmapped con- nity to estimate F more precisely at the genome level. In tigs and to sex chromosomes. Moreover, quality controls fact, an alternative approach for quantifying individual included the following criteria, i.e. a call frequency higher homozygosity that better refects IBD is based on runs than 0.95, a minor allele frequency (MAF) higher than of homozygosity (ROH). ROH are contiguous lengths 0.05, and a P value higher than 0.001 for Hardy–Wein- of homozygous segments of the genome where the two berg equilibrium (HWE). SNPs that did not satisfy these haplotypes inherited from the parents are identical [4]. criteria were excluded. Animals with more than 5% of Tese haplotypes are most likely identical because the missing SNPs were also removed from further analyses. parents inherited them from a common ancestor. Nowa- days, among several alternative methods to estimate Measure of runs of homozygosity inbreeding, F estimated from ROH (FROH) is considered Runs of homozygosity (ROH) were estimated for each as the most powerful and makes it possible to distinguish individual using PLINK [13]. No pruning was performed between recent and ancient inbreeding [2]. ROH may be based on linkage disequilibrium (LD), but the minimum used to identify regions that have an unfavorable efect on length of a ROH was set to 1 Mb to exclude short ROH a phenotype when they are in the homozygous state [5], that derived from LD. Te following criteria to defne but also to detect associations between traits of economic ROH were used: (1) one missing SNP was allowed in a interest and genes present in these regions [6]. Indeed, ROH and up to one possible heterozygous genotype; (2) given the stochastic nature of recombination, the occur- the minimum number of SNPs that constituted the ROH rence of ROH is highly heterogeneous across the genome, (l) was calculated with the method proposed by Lencz and hotspots of ROH across a large number of samples et al. [14], to minimize the number of false positive ROH: may be indicative of selective pressure [7], which leads to logeα/ns × ni the fxation of favorable alleles in the population. Identi- l = , loge(1 − het) fcation of genomic regions that display a reduced level of polymorphism or no polymorphism (selective sweeps) where α is the percentage of false positive ROH (set to may indicate occurrence of recent selection and may 0.05 in this study), ns is the number of SNPs per individ- help to detect QTL and candidate genes. ROH have been ual, ni is the number of individuals, het is the heterozygo- studied in humans [4], cattle [8–10], pigs [11, 12], but sity across all SNPs; (3) a minimum density of one SNP less commonly in other livestock species, such as sheep. over 100 kb; and (4) a maximum gap between consecu- Here, we investigated the occurrence and the distribution tive SNPs of 1 Mb. of ROH in Valle del Belice sheep using a medium-density SNP genotyping array, in order to characterize autozygo- Genetic diversity and genomic inbreeding coefcients sity and identify the genomic regions with high ROH fre- PLINK [13] was also used to estimate basic genetic diver- quencies, namely ROH islands or ROH hotspots. sity indices including observed and expected heterozygo- sity (Ho and He, respectively), the inbreeding coefcient Methods (F) based on the diference between the observed and Ethics statement expected numbers of homozygous genotypes, and minor In this study, the procedures for which animal samples allele frequency (MAF ≥ 0.05). Te molecular coancestry were collected followed the recommendation of directive coefcient (also called kinship) ( fij) between individuals 2010/63/EU. i and j was also estimated [15]. Moreover, the contem- porary efective population size (Ne) was estimated using Samples, genotyping and data editing NEESTIMATOR v.2 [16] according to the random mat- Blood samples were collected from 516 individu- ing model of the LD method. Inbreeding coefcient (F) als (502 females and 14 males) of the Valle del Belice based on ROH (FROH) for each animal was calculated as: breed. Genomic DNA was hybridized with the Illumina LROH FROH = , OvineSNP50 K BeadChip (Illumina Inc., San Diego, CA, Laut Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 3 of 10

where LROH is the total length of all ROH in the genome provided by NCBI. Te function of these genes and path- of an individual while Laut refers to the length of the auto- ways in which they are involved were assessed using Pan- somal genome covered by SNPs included in the array ther software [20] and the Kyoto Encyclopedia of Genes (2644.30 Mb). Pearson’s correlation between the two and Genomes (KEGG, http://www.genome.jp/kegg/) measures of inbreeding (F and FROH) was calculated. database. Finally, to investigate the biological function of each annotated within ROH islands, we conducted Distribution of runs of homozygosity an extensive accurate literature search. Te mean number of ROH per individual (MNROH), the average length of ROH (ALROH) and the total number of Results and discussion ROH per animal were estimated. Te percentage of chro- Analysis of ROH based on genomic data can help to mosomes covered by ROH was also calculated. First, the describe the history of the population to which an indi- mean ROH length was calculated by summing all ROH vidual belongs and can also reveal the level of inbreed- (Mb) on a chromosome (OAR for Ovies aries chromo- ing within populations, recent population bottlenecks some) and dividing by the number of individuals that or signatures of directional selection. However, to date, had ROH on that OAR; the mean ROH length was then literature on ovine ROH is scarce, although sheep repre- divided by the length of the chromosome in Mb. In addi- sent excellent genetic resources that contribute to local tion, chromosomal FROH (FROHOAR) values were also esti- economy. To the best of our knowledge, this study is the = mated, as FROHOAR LROHOAR/LOAR, where LROHOAR frst efort to describe the occurrence and distribution is the total length of an individual’s ROH for each OAR of ROH in a large number of individuals using medium- and LOAR is the length of each OAR covered by the SNPs density SNP arrays in dairy sheep. involved. General statistics Detection of common runs of homozygosity After fltering, the fnal number of samples and SNPs To identify the genomic regions that were most com- retained for analyses were 497 and 38,815, respectively. monly associated with ROH, the percentage of the occur- Te average observed (Ho) and expected (He) heterozygo- rences of a SNP in ROH was calculated by counting the sities were 0.373 ± 0.118 and 0.377 ± 0.117, respectively, number of times the SNP was detected in those ROH and the average MAF was 0.288 ± 0.128. Tese values across individuals, and this was plotted against the posi- were consistent with the range reported by other authors tion of the SNP along the chromosome. Tis percentage for southern European sheep [21] and in previous studies F had to be higher than 20% to be an indication of a pos- on this breed [22]. A low and positive (0.011 ± 0.073) sible hotspot of ROH in the genome. A series of adjacent was observed, which suggests that the sampled animals SNPs with a proportion of ROH occurrences higher than are not highly related [6]. Tis result was also confrmed the 20% threshold formed long genomic regions, called by the kinship coefcients, which were positive between ROH islands. all pairs of animals (0.498 ± 0.060). Te contemporary To verify if recombination rate afected ROH length, efective population size (Ne) was about 45, which indi- ROH were mapped using the genetic SNP coordinates cates a high risk of inbreeding and reduced genetic diver- (position in the linkage map) reported by Johnston et al. sity. Recently, a study on Australian sheep breeds based [17]. Te average recombination rate (cM/Mb) was esti- on SNP data and using the same method [23] reported mated in 500-kb intervals for the chromosomes with the large Ne for all investigated breeds (ranging from 140 to Ne highest FROHOAR values and also within each ROH hot- 348), whereas another study reported a small [25] for spot, and the percentage of occurrences of a SNP in ROH the Sicilian Barbaresca breed [24]. was plotted against recombination rate for the aforemen- tioned chromosomes. In addition, genetic mapping of Distribution of runs of homozygosity ROH lengths was used to infer demographic events by Because strong LD, typically extending up to about applying the method proposed by Tompson et al. [18] 100 kb, is common throughout the ovine genome [22], and recently reported by Purfeld et al. [19] in a study on short tracts of homozygosity are very prevalent. To meat sheep breeds, following the same four ROH length exclude these short and very common ROH, the mini- categories. mum length for ROH was set at 1 Mb, with a minimum Genomic coordinates for all identifed selected regions number of 40 SNPs. Moreover, we estimated ROH longer were used to annotate genes that were either entirely or than 1 Mb with one heterozygous SNP in order to avoid partially included within each selected region using the underestimation of long ROH. In fact, for livestock popu- Genome Data Viewer (https://www.ncbi.nlm.nih.gov/ lations, not allowing for heterozygous SNP genotypes in genome/gdv/browser/?context=gene&acc=101104604) a ROH, as was advocated for human populations, is not Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 4 of 10

adequate because they have much higher levels of autozy- is frequently unaccounted for in the recorded pedigree of gosity and therefore longer ROH [25]. Moreover, because an individual. genotyping errors in SNP chip data can occur, it is more Following a recent study on the distribution of ROH reasonable to allow one heterozygous call per ROH. in sheep [19], we mapped ROH using their genetic posi- In total, 11,629 ROH were identifed with a MN­ ROH tions and used the abundance of ROH in diferent length of 24.20 ranging from 1 to 66 ROH per animal. Similar classes to qualitatively evaluate the historical demogra- results were reported by Al-Mamun et al. [23]. All Valle phy of the breed. Te time to the most recent common del Belice individuals displayed at least one ROH longer ancestor (TMRCA) was estimated for four diferent cat- than 1 Mb. Te mean value of FROH for ROH longer than egories. Te results show a substantial increase in the 1 Mb was equal to 0.084 ± 0.061 and ranged from 0.002 abundance of ROH from 10 to 20 generations ago to less to 0.339. Te coefcient of variation (72.6%) indicated than 5 generations ago (see Additional fle 1: Figure. S1) that autozygosity levels in this breed varied largely and in the Valle del Belice breed, which suggests a recent the correlation between F and FROH was high (0.97, p decrease in the efective population size, and that the value < 0.001). Tese results corroborate previous stud- individuals involved in our study experienced both recent ies in cattle [10] and sheep [19]. For the three animals and historical autozygosity events. Similar results were that had the highest level of homozygosity, respectively reported for the Belclare breed [19]. Terefore, it is likely 807.85, 827.49, and 895.88 Mb of their genome were clas- that the long ROH detected in the Valle del Belice breed sifed as ROH, which is close to 30% of the genome. Te are signatures of the extended use of a few rams within least inbred animal presented only one ROH of 4.95 Mb. herds and extensive mating between relatives. In fact, in An average ROH length of 9.51 Mb was estimated across the Sicilian farming system, natural mating is the com- all the autosomes but the lengths of ROH varied consid- mon practice and the exchange of rams among focks is erably ranging from 2.15 to 127.72 Mb. It is likely that quite unusual, which results in an increase in inbreeding the values for total ROH length and number reported within a herd and a consequent decrease in variability in our work are underestimated because many ROH [29]. Indeed, the accumulation of long ROH in individu- remain undetected when using a medium-density SNP als could have consequences on the biological ftness. On panel [26]. Te genomic distribution, length and abun- the one hand, long ROH are enriched in genomic regions dance of ROH constitute a valuable source of information that carry deleterious mutations and there is a strong lin- about the demographic history of livestock species [27]. ear relationship between the genomic fraction of ROH Te distribution of ROH according to size is in Fig. 1, in and the number of individuals that carry deleterious which the length in Mb was log-transformed. Our results homozygous mutations [30]. For example, a large number show that ROH shorter than 10 Mb predominated. of genomic regions that contain long ROH were shown to Because recombination events interrupt long chromo- have unfavorable associations with milk yield in Holstein some segments, long ROH (~ 10 Mb) arise as a result of cattle, probably because of inbreeding depression [5]. On recent inbreeding (up to fve generations ago). In con- the other hand, short ROH are subject to selection for a trast, short ROH (~ 1 Mb) are produced by IBD genomic longer period of time and recombination has had more regions from old ancestors and are indicative of more time to trim down ROH that are the target of selection ancient relatedness (up to 50 generations ago) [28], which sweeps [30]. In addition, it should be underlined that not all short ROH are due to IBD and it is possible that some short ROH originated from identity-by-status (IBS) due to localized low recombination rates, genetic drift and high LD in unrelated ancestors [19]. Te relationship between the number of ROH and the length of the genome covered by ROH per individual var- ies considerably among animals (see Fig. 2). For example, animals with the same cumulative length of ROH can have a diferent number of ROH with dif- ferent lengths, because of their diferent distances from the common ancestor [31]. As mentioned above, Valle del Belice sheep have a large number of short ROH with some extreme animals that have ROH that cover 600 Mb Fig. 1 Distribution of the number of runs of homozygosity (ROH) of or more of the genome (Fig. 2). Similar distributions were diferent lengths (Mb). The values of length in Mb were transformed also observed in other livestock species, such as cattle in ­log10 Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 5 of 10

Fig. 2 Total number of runs of homozygosity (ROH) longer than 1 Mb and total length of genome (Mb) covered by ROH segments per individual

[10, 30] and pigs [32]. To identify outlying individuals more precisely, the observed total length of ROH was compared with a randomly generated distribution of ROH lengths among individuals. We carried out a simu- lation by sampling with replacement n values from the distribution of ROH (Fig. 1). We compared the observed total length of ROH to the simulated lengths and an empirical p-value was computed. Te results are sum- marized in Figure S2 (see Additional fle 2: Figure S2), in which some individuals are indicated in red because their p-values rejected the null hypothesis and thus, this highlights that some outlying individuals, with very long ROH, can be identifed.

Runs of homozygosity per chromosome Figure 3 shows the percentage of OAR chromosomes cov- ered by ROH and the number of ROH per chromosome. Fig. 3 Number of runs of homozygosity (ROH) longer than 1 Mb per Te highest coverage by ROH was observed on OAR24 chromosome (bars) and average percentage of each chromosome (25.54%), whereas the lowest was on OAR1 (10.68%). Te covered by ROH (line) number of ROH per chromosome displayed a specifc pattern with the larger numbers found for the frst three chromosomes, a number that tended to decrease with chromosome length, and the smallest number on OAR the mean FROHOAR values followed the same pattern 24 with 172 segments. Regarding the number of ROH per as those computed for the whole genome and difered OAR, our results confrm those reported in other sheep between chromosomes. In particular, FROHOAR values breeds [23], whereas for the percentage of coverage per were highest for OAR2, 4, 11 and 23. chromosome, they are quite diferent which indicates Genomic regions that are in close proximity to quanti- that it may be breed-specifc. tative trait loci (QTL) subjected to selection are expected One of the main advantages of genomic coefcients is to show diferences, such as reduced nucleotide diversity, the availability of chromosomal inbreeding coefcients and tend to generate ROH islands, with a high level of [10]. FROHOAR estimates are reported in Fig. 4. In general, homozygosity around these genomic regions compared Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 6 of 10

Fig. 4 Distribution of inbreeding coefcients (FROHOAR) based on runs of homozygosity (ROH) for each chromosome (OAR)

to the rest of the genome. In fact, signifcant QTL for the occurrence of SNPs in ROH was plotted against the milk production traits have been reported on these chro- recombination rate for the aforementioned chromo- mosomes in sheep i.e.: on OAR2, García-Gámez et al. somes with the highest FROHOAR value (see Additional [33] identifed signifcant QTL for milk yield (ID = 57718 fle 4: Figure S7). Figure S7 shows that the highest levels and ID = 57,737), milk (ID = 57719) and fat of autozygosity were often found within regions with a (ID = 57720 and ID = 57736) yields; on OAR4, QTL for low recombination rate, as reported in a previous study milk protein yield [34] (ID = 57687) and milk fat per- [19]. However, genomic regions that harbor loci involved centage [33] (ID = 57721) were localized; on OAR11, in general disease resistance, such as the major histo- Moioli et al. [35] reported QTL for milk fat (ID = 66000 compatibility complex (MHC) for which a high level of and ID = 66001) and protein (ID = 66002) percentages, genetic diversity ensures that the population can deal Jonas et al. [36] detected highly signifcant QTL for milk with potential new disease challenges, may show low yield (ID = 16017), milk yield persistency (ID = 16018) levels of inbreeding [39]. In fact, the lowest values of and milk protein yield, whereas García-Fèrnandez et al. FROHOAR were reported for OAR14, on which signifcant [37] reported QTL for fatty acid content (ID = 13896, genomic regions associated with resistance to nema- ID = 13903, ID = 13904); and on OAR23, Gutiérrez- tode infection have been detected [40], and for OAR20, Gil et al. [38] identifed signifcant QTL for milk yield where the MHC is localized. Tese results indicate that (ID = 13906) and milk fat yield (ID = 13907) on OAR23. genome-based measures of inbreeding through ROH are In Figures S3, S4, S5 and S6 (see Additional fle 3: Figures able to detect diferences between chromosomal regions, S3, S4, S5 and S6), the occurrence of SNPs in ROH was providing a more detailed picture of the genetic diversity. plotted against the genomic regions of the above-men- Terefore, it might be possible to focus on the specifc tioned QTL. Tese fgures show that, except for OAR4, control of inbreeding on certain genomic regions because high levels of autozygosity on these chromosomes occur the level of diversity in those regions is already low (due in the genomic region to which the QTL were mapped. to previous selection processes) or because they harbor Terefore, the diferences in FROHOAR patterns detected genes for which populations with higher levels of diver- in our study may also highlight a specifc efect of selec- sity exhibit higher ftness (e.g. MHC genes) [41]. tion on these chromosomes for some milk production traits. However, it is also important to mention that the Genomic regions within runs of homozygosity ROH identifed in these genomic regions may be partly To identify the genomic regions that were most com- explained by a reduced recombination rate. Indeed, monly associated with ROH in the Valle del Belice breed, although the ROH are more or less distributed along the percentage of SNPs in ROH was assessed by analyzing the chromosome, ROH hotspots were mostly found in the frequency of a SNP occurring in those ROH across regions with a low recombination rate [19, 27]. In order diferent individuals (%), and this was plotted against the to verify this distribution in the Valle del Belice breed, position of the SNP along the chromosome (Fig. 5). Te Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 7 of 10

Fig. 5 Manhattan plot of occurrences (%) of a SNP in ROH across individuals

results show that ROH frequencies vary largely with the region with the strongest signal was found on OAR3 and position on the genome. On OAR25, we found 37 non- starts at position 99,745,362 bp (rs399921230) and ends consecutive SNPs that were not included in a ROH. at position 105,480,746 bp (rs399868290). Generally, as Terefore, in this population, these SNPs are in the het- reported above, the existence of ROH hotspots can be erozygous state and indicate a region with a high level of partly explained by the reduced variation in recombina- heterozygosity. We set a threshold of 20% for considering tion rate. Indeed, Purfeld et al. [19] reported ROH hot- a possible ROH hotspot in the genome. Two hundred and spots within regions with very low recombination rates thirty-nine SNPs, i.e. less than 1% of all SNPs, were con- (from 0.00 to 0.82). Te ROH hotspots reported in our sidered as candidate SNPs that may be under directional study frequently coincided with regions with a higher selection (see Additional fle 5: Table S1). Tese adjacent recombination rate (from 0.47 to 1.64) (Table 1). Tere- SNPs were merged into genomic regions and considered fore, these results support the hypothesis that ROH pat- as indicators of potential autozygosity islands, defned as terns are not solely the result of demography and instead genomic regions with extreme ROH frequency based on harbor targets of selection [19, 30]. the Manhattan plot (Fig. 5). SNP rs411425463 on OAR3 We also checked if these regions of high homozygo- was the most frequent SNP detected in ROH (152 occur- sity overlapped with putative selection signatures in rences). Six genomic regions located on six chromo- sheep. We found that only the ROH on OAR2 partially somes (OAR2, 3, 4, 10, 11 and 23), presented hotspots overlapped with a region under climate-associated selec- of autozygosity (Table 1). Te length of these regions tion, which spanned two genes (MELK and GNE) [42] ranged from 0.10 Mb on OAR4 to 6.60 Mb on OAR3. Te and with a selection signature detected in the Comisana

Table 1 List of genomic regions of extended homozygosity detected in Valle del Belice sheep and average recombination rate (cM/Mb) within each hotspot OAR Start (bp) End (bp) Length (bp) SNPs Genes cM/Mb

2 50,616,834 52,106,037 1,489,203 16 15 0.98 3 99,745,362 105,807,623 6,062,261 98 63 0.47 4 89,052,028 89,155,631 103,603 4 1 1.28 10 4,311,072 9,712,929 5,401,857 73 1 0.51 11 22,251,409 22,847,898 596,489 12 16 0.76 23 46,846,216 49,422,183 2,575,967 36 11 1.64

OAR ovine chromosome, SNPs number of SNPs in each genomic region, Genes number of genes in each genomic region = = = Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 8 of 10

breed [43]. Within all the genomic regions with a high the activity of its promoter was recently associated with level of autozygosity, we identifed 107 genes (Table 1). heat stress [51], LYG1 and LYG2, two lysozyme g genes Table S2 (see Additional fle 6: Table S2) provides the (that encode an antibacterial enzyme) with an important chromosome position, start and end, full names and role in innate immunity in vertebrate and non-vertebrate functions for all annotated genes. We found that sev- species [52], CNGA3 which is associated with achro- eral of the SNPs in ROH occurred in regions with few matopsia in sheep [53], ANKRD23 and ACOXL which are genes. In fact, some of the identifed regions, such as that involved in fatty acid and energy metabolism [54] and fat on OAR10, contain only one annotated gene, although metabolism in pigs [55], respectively. On OAR11, we also it is longer than 5 Mb, either because annotation of the identifed SERPINF1 and SERPINF2, which are involved ovine genome is still incomplete or the genomic region is in several biological and metabolic processes such as positioned outside a coding region. Tis result may also regulation of infammatory response [56]. On OAR23, refect selection acting on uncharacterized regulatory the ROH hotspot included the endothelial lipase (LIPG) regions or simply fxation of non-coding DNA by genetic gene, which plays a role in the reverse cholesterol trans- drift due to the absence of any selection [44]. Accord- port (RCT) pathway that is a major component of lipid ing to Panther analysis and the KEGG database, most of homeostasis afecting lipid phenotypes, such as difer- the genes were involved in multiple signaling and signal ent fat depots, fatty acid compositions and overall body transduction pathways of a wide variety of cellular and growth [57]. SMAD2 and SMAD7 were also detected on biochemical processes. OAR23, which code for a group of molecules that func- tion as intracellular signal transducers downstream of the Candidate genes within runs of homozygosity receptors of the TGF-β superfamily [58]. None of these In this paper, we do not discuss in detail all the genomic candidate genes overlapped with previously published regions associated with ROH, but focus on some selected ovine QTL. Based on the literature search conducted for regions that show associations with several specifc traits this study, only a few of the genomic regions that har- related to livestock breeding. Two genes were identifed bor candidate genes known to afect specifc production within the ROH on OAR2, i.e. CLTA associated with traits were reported in previous studies in sheep. Moreo- prion protein deposition in sheep [45] and GNE, which ver, in this study on the Valle del Belice breed, we did not is important for the metabolism of sialylated oligosac- identify important candidate genes for milk production charides in bovine milk [46]. On OAR3, we identifed traits in sheep in the detected ROH islands, such as the an interesting gene, NPAS2, which encodes a protein casein cluster, and the DGAT1 or ACACA genes, which that is part of the helix-loop-helix family of transcrip- may be due to statistical or biological factors. tion factors, i.e. an essential component of the circadian Generally, ROH patterns tend to difer between breeds clock [47]. Te circadian clock, an internal timing sys- [59]. Analysis of the distribution of ROH within a breed tem, regulates various physiological processes through can provide insight into the efect of selection on the the generation of about 24-h circadian rhythms of gene genome over varying periods of time and determine the expression, which translate into metabolic and behavioral direction of selection [30]. Te Valle del Belice breed is circadian rhythms. It acts as an important regulator of a subjected to limited breeding selection programs for wide range of physiological functions, including metabo- milk production traits, but shows excellent adaptability lism, body temperature, blood pressure, endocrine and to local environments, sometimes with harsh conditions. immune functions [47]. Environmental variables such Tus, it is not surprising that the reported ROH islands as photoperiod, heat, stress, nutrition and other external spanned several candidate genes, which infuence traits factors have profound efects on the quality and quantity that are associated with adaptability in these environ- of milk. How environment interacts with genotype to ments and with the regulation of immune responses. In impact milk production is not known, but some evidence fact, our fndings indicate that the genomic regions that suggests that circadian clocks play an important role display autozygosity in the Valle del Belice breed are [47, 48]. Approximately 7% of the genes expressed dur- mostly linked to selection in response to environmental ing lactation have circadian patterns including core clock stress as a result of local adaptation, and less to selection and metabolic genes [48]. Indeed, NPAS2 was recently for milk production. reported as a candidate gene for milk traits and SNPs in this gene are located within a reported QTL region Conclusions for milk fat yield on chromosome 11 in cow [49]. Other In this study, we investigated the occurrence and the dis- putative candidate genes identifed on OAR3 included: tribution of ROH in the genome of Valle del Belice sheep. ADRA2B which plays an important role in vasoconstric- Autozygosity levels varied largely in this breed, which has tion and blood pressure regulation [50], PDCL3 for which experienced both recent and historical inbreeding events. Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 9 of 10

Several genes within ROH islands are associated with Received: 3 April 2017 Accepted: 7 November 2017 milk production and immune responses. Tese results suggest that natural selection has, at least partially, a role in shaping the genome of the Valle del Belice breed and that, in sheep, the analysis of ROH may contribute to References detect genomic regions that are involved in the determin- 1. Curik I, Ferenčaković M, Sölkner J. Inbreeding and runs of homozygosity: a possible solution to an old problem. Livest Sci. 2014;166:26–34. ism of traits of economic importance. 2. Keller MC, Visscher PM, Goddard ME. Quantifcation of inbreeding due to distant ancestors and its detection using dense single nucleotide Additional fles polymorphism data. Genetics. 2011;189:237–49. 3. Ouborg NJ, Pertoldi C, Loeschcke V, Bijlsma R, Hedrick PW. Conserva- Additional fle 1: Figure S1. Mean sum of runs of homozygosity (ROH) tion genetics in transition to conservation genomics. Trends Genet. per animal estimated within four diferent generation categories. ROH 2010;26:177–87. were mapped according to their genetic positions (i.e. linkage map 4. Gibson J, Morton NE, Collins A. Extended tracts of homozygosity in positions). ROH length (l cM) within each category was determined using outbred human populations. Hum Mol Genet. 2006;15:789–95. 100/2 g, replacing g with the number of generations of interest. 5. Pryce JE, Haile-Mariam M, Goddard ME, Hayes BJ. Identifcation of genomic regions associated with inbreeding depression in Holstein and Additional fle 2: Figure S2. Total number of runs of homozygosity Jersey dairy cattle. Genet Sel Evol. 2014;46:71. (ROH) longer than 1 Mb and total length of genome (Mb) covered by ROH 6. Szmatola T, Gurgul A, Ropka-Molik K, Jasielczuck I, Zabek T, Bugno- segments per individual. Observed (black) vs simulated (red) data. Poniewierska M. Charateristics of runs of homozygosity in selected cattle Additional fle 3: Figures S3, S4, S5, S6. Plot of SNP occurrences (%) in breeds maintained in Poland. Livest Sci. 2016;188:72–80. ROH against the genomic regions of QTL for OAR chromosomes with the 7. Zavarez LB, Utsunomiya YT, Carmo AS, Neves HH, Carvalheiro R, highest inbreeding coefcient (OAR 2, 4, 11, 23). Ferencaković M, et al. Assessment of autozygosity in Nellore cows (Bos indicus) through high-density SNP genotypes. Front Genet. 2015;6:5. Additional fle 4: Figure S7. Plot of SNP occurrences in ROH against 8. Marras G, Gaspa G, Sorbolini S, Dimauro C, Ajmone-Marsam P, Valen- recombination rate. Recombination rate is the solid red line and the tini A, et al. Analysis of runs of homozygosity and their relationship occurrence of a SNP in a ROH is represented by blue dots. (A) OAR2, with inbreeding in fve cattle breeds farmed in Italy. Anim Genet. (B) OAR4, (C) OAR11, and (D) OAR23. Recombination rate (cM/Mb) was 2014;46:110–21. estimated every 500 kb. 9. Ferenčaković M, Hamzić E, Gredler B, Solberg TR, Klemetsdal G, Curik Additional fle 5: Table S1. List of 239 SNPs considered as candidate I, et al. Estimates of autozygosity derived from runs of homozygosity: markers under directional selection in the Valle del Belice sheep breed. empirical evidence from selected cattle populations. J Anim Breed Genet. 2013;130:286–93. Additional fle 6: Table S2. List of 107 potential candidate genes under 10. Mastrangelo S, Tolone M, Di Gerlando R, Fontanesi L, Sardina MT, Porto- directional selection in the Valle del Belice sheep breed. lano B. Genomic inbreeding estimation in small populations: evalua- tion of runs of homozygosity in three local dairy cattle breeds. Animal. 2016;10:746–54. 11. Silió L, Rodríguez MC, Fernández A, Barragán C, Benítez R, Óvilo C, et al. Authors’ contributions Measuring inbreeding and inbreeding depression on pig growth from SM conceived, planned and coordinated the study and drafted the manu- pedigree or SNP-derived metrics. J Anim Breed Genet. 2013;130:349–60. script; RDG and AMS contributed with generation of data; SM, MTS, GS, MT 12. Ai H, Huang L, Ren J. Genetic diversity, linkage disequilibrium and selec- and BP analyzed the data and performed the statistical analysis. All authors tion signatures in Chinese and Western pigs revealed by genome-wide discussed the results, made suggestions and corrections. All authors read and SNP markers. PLoS One. 2013;8:e56001. approved the fnal manuscript. 13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a tool set for whole genome association and population-based Author details linkage analyses. Am J Hum Genet. 2007;81:559–75. 1 Dipartimento Scienze Agrarie, Alimentari e Forestali, Università degli Studi di 14. Lencz T, Lambert C, DeRosse P, Burdick KE, Morgan TV, Kane JM, et al. Runs 2 Palermo, 90128 Palermo, Italy. Dipartimento di Scienze Economiche, Azien- of homozygosity reveal highly penetrant recessive loci in schizophrenia. dali e Statistiche, Università degli Studi di Palermo, 90128 Palermo, Italy. Proc Natl Acad Sci USA. 2007;104:19942–7. 15. Astle W, Balding DJ. Population structure and cryptic relatedness in Acknowledgements genetic association studies. Stat Sci. 2009;24:451–71. The authors would like to thank two anonymous referees for valuable com- 16. Do C, Waples RS, Peel D, Macbeth GM, Tillett BJ, Ovenden JR. NeEstima- ments, which helped to improve the manuscript. This research was fnanced tor V2: re-implementation of software for the estimation of contempo- by PON02_00451_3133441, CUP: B61C1200076005 funded by MIUR. rary efective population size Ne from genetic data. Mol Ecol Resour. 2014;14:209–14. Competing interests 17. Johnston SE, Bérénos C, Slate J, Pemberton JM. Conserved genetic The authors declare that they have no competing interests. architecture underlying individual recombination rate variation in a wild population of Soay sheep (Ovis aries). Genetics. 2016;203:583–98. Ethics approval and consent to participate 18. Thompson EA. Identity by descent: variation in meiosis, across genomes, Not applicable. and in populations. Genetics. 2013;194:301–26. 19. Purfeld DC, McParland S, Wall E, Berry DP. The distribution of runs of Availability of data and materials homozygosity and selection signatures in six commercial meat sheep The datasets used and analysed during this study are available on request breeds. PLoS One. 2017;12:e0176780. from the corresponding author. 20. Mi H, Muruganujan A, Thomas PD. PANTHER in 2013: modeling the evolution of gene function, and other gene attributes, in the context of Publisher’s Note phylogenetic trees. Nucleic Acids Res. 2013;41:D377–86. Springer Nature remains neutral with regard to jurisdictional claims in pub- 21. Kijas JW, Lenstra JA, Hayes B, Boitard S, Porto-Neto LR, Cristobal MS, et al. lished maps and institutional afliations. Genome-wide analysis of the world’s sheep breeds reveals high levels of historic mixture and strong recent selection. PLoS Biol. 2012;10:e1001258. Mastrangelo et al. Genet Sel Evol (2017) 49:84 Page 10 of 10

22. Mastrangelo S, Di Gerlando R, Tolone M, Tortorici L, Sardina MT, Portolano 43. Fariello MI, Servin B, Tosser-Klopp G, Rupp R, Moreno C, San Cristobal M, B. Genome wide linkage disequilibrium and genetic structure in Sicilian et al. Selection signatures in worldwide sheep populations. PLoS ONE. dairy sheep breeds. BMC Genet. 2014;15:108. 2014;9:e103813. 23. Al-Mamun HA, Clark SA, Kwan P, Gondro C. Genome-wide linkage 44. Qanbari S, Gianola D, Hayes B, Schenkel F, Miller S, Moore S, et al. Applica- disequilibrium and genetic diversity in fve populations of Australian tion of site and haplotype-frequency based approaches for detecting domestic sheep. Genet Sel Evol. 2015;47:90. selection signatures in cattle. BMC Genomics. 2011;12:318. 24. Mastrangelo S, Portolano B, Di Gerlando R, Ciampolini R, Tolone M, Sar- 45. Filali H, Martín-Burriel I, Harders F, Varona L, Hedman C, Mediano DR, et al. dina MT. Genome-wide analysis in endangered populations: a case study Gene expression profling of mesenteric lymph nodes from sheep with in Barbaresca sheep. Animal. 2017;11:1107–16. natural scrapie. BMC Genomics. 2014;15:59. 25. Ferenčaković M, Solkner J, Curik I. Estimating autozygosity from high- 46. Wickramasinghe S, Hua S, Rincon G, Islas-Trejo A, German JB, Lebrilla CB, throughput information: efects of SNP density and genotyping errors. et al. Transcriptome profling of bovine milk oligosaccharide metabolism Genet Sel Evol. 2013;45:42. genes using RNA-sequencing. PLoS One. 2011;6:e18895. 26. Manunza A, Noce A, Serradilla JM, Goyache F, Martínez A, Capote J, et al. 47. Casey TM, Plaut K. Lactation Biology Symposium: circadian clocks A genome-wide perspective about the diversity and demographic his- as mediators of the homeorhetic response to lactation. J Anim Sci. tory of seven Spanish goat breeds. Genet Sel Evol. 2016;48:52. 2012;90:744–54. 27. Bosse M, Megens HJ, Madsen O, Paudel Y, Frantz LA, Schook LB, et al. 48. Plaut K, Casey T. Does the circadian system regulate lactation? Animal. Regions of homozygosity in the porcine genome: consequence 2012;6:394–402. of demography and the recombination landscape. PLoS Genet. 49. Ibeagha-Awemu EM, Peters SO, Akwanji KA, Imumorin IG, Zhao X. High 2012;8:e1003100. density genome wide genotyping-by-sequencing and association 28. Howrigan DP, Simonson MA, Keller MC. Detecting autozygosity through identifes common and low frequency SNPs, and novel candidate genes runs of homozygosity: a comparison of three autozygosity detection infuencing cow milk traits. Sci Rep. 2016;6:31109. algorithms. BMC Genomics. 2011;12:460. 50. Muszkat M, Kurnik D, Solus J, Sofowora GG, Xie HG, Jiang L, et al. Variation 29. Mastrangelo S, Sardina MT, Riggio V, Portolano B. Study of polymorphisms in the α2B-adrenergic receptor gene (ADRA2B) and its relationship to in the promoter region of ovine β-lactoglobulin gene and phylogenetic vascular response in vivo. Pharmacogenet Genomics. 2005;15:407–14. analysis among the Valle del Belice breed and other sheep breeds consid- 51. Krzemień-Ojak Góral A, Joachimiak E, Filipek A, Fabczak H. Interaction ered as ancestors. Mol Biol Rep. 2012;39:745–51. of a novel chaperone PhLP2A with the heat shock protein Hsp90. J Cell 30. Kim ES, Cole JB, Huson H, Wiggans GR, Van Tassell CP, Crooker BA, et al. Biochem. 2016;118:420–9. Efect of artifcial selection on runs of homozygosity in US Holstein cattle. 52. Irwin DM, Biegel JM, Stewart CB. Evolution of the mammalian lysozyme PLoS One. 2013;8:e80813. gene family. BMC Evol Biol. 2011;11:166. 31. Mészáros G, Boison SA, Pérez O’Brien AM, Ferenčaković M, Curik I, Da Silva 53. Reicher S, Seroussi E, Gootwine E. A mutation in gene CNGA3 is associ- MV, et al. Genomic analysis for managing small and endangered popula- ated with day blindness in sheep. Genomics. 2010;95:101–4. tions: a case study in Tyrol Grey cattle. Front Genet. 2015;6:173. 54. Dong XJ, Guan HP, Zhang QD, Yerle M, Liu B. Mapping of porcine ANKRD1, 32. Traspov A, Deng W, Kostyunina O, Ji J, Shatokhin K, Lugovoy S, et al. ANKRD2, ANKRD23, VGLL2 and VGLL4 using somatic cell and radiation Population structure and genome characterization of local pig breeds in hybrid panels. Anim Genet. 2007;38:424–5. Russia, Belorussia, Kazakhstan and Ukraine. Genet Sel Evol. 2016;48:16. 55. Onteru SK, Gorbach DM, Young JM, Garrick DJ, Dekkers JC, Rothschild MF. 33. Garcia-Gámez E, Gutiérrez-Gil B, Suarez-Vega A, de la Fuente LF, Arranz Whole genome association studies of residual feed intake and related JJ. Identifcation of quantitative trait loci underlying milk traits in Spanish traits in the pig. PLoS One. 2013;8:e61756. dairy sheep using linkage plus combined linkage disequilibrium and link- 56. Caode J, Yonggang L. Isolation, sequence identifcation and tissue age analysis approaches. J Dairy Sci. 2013;96:6059–69. expression profle of a novel sheep gene SERPINF1. J Anim Vet Adv. 34. García-Gámez E, Gutiérrez-Gil B, Sahana G, Sánchez JP, Bayón Y, Arranz JJ. 2011;10:2593–8. GWA analysis for milk production traits in dairy sheep and genetic sup- 57. Daniels TF, Wu XL, Pan Z, Michal JJ, Wright RW Jr, Killinger KM, et al. port for a QTN infuencing milk protein percentage in the LALBA gene. The reverse cholesterol transport pathway improves understanding of PLoS One. 2012;7:e47782. genetic networks for fat deposition and muscle growth in beef cattle. 35. Moioli B, Scatà MC, De Matteis G, Annicchiarico G, Catillo G, Napolitano PLoS One. 2010;5:e15203. F. The ACACA gene is a potential candidate gene for fat content in sheep 58. Zhu X, Topouzis S, Liang LF, Stotish RL. Myostatin signaling through milk. Anim Genet. 2013;44:601–3. Smad2, Smad3 and Smad4 is regulated by the inhibitory Smad7 by a 36. Jonas E, Thomson PC, Hall EJS, McGill D, Lam MK, Raadsma HW. Mapping negative feedback mechanism. Cytokine. 2004;26:262–72. quantitative trait loci (QTL) in sheep. IV. Analysis of lactation persistency 59. Zhang Q, Guldbrandtsen B, Bosse M, Lund MS, Sahana G. Runs of and extended lactation traits in sheep. Genet Sel Evol. 2011;43:22. homozygosity and distribution of functional variants in the cattle 37. García-Fernández M, Gutiérrez-Gil B, García-Gámez E, Sánchez JP, Arranz genome. BMC Genomics. 2015;16:542. JJ. The identifcation of QTL that afect the fatty acid composition of milk on sheep chromosome 11. Anim Genet. 2010;41:324–8. 38. Gutiérrez-Gil B, El-Zarei MF, Alvarez L, Bayón Y, De La Fuente LF, San Primitivo F, et al. Quantitative trait loci underlying milk production traits in sheep. Anim Genet. 2009;40:423–34. 39. Birch J, Murphy L, MacHugh ND, Ellis SA. Generation and mainte- Submit your next manuscript to BioMed Central nance of diversity in the cattle MHC class I region. Immunogenetics. 2006;58:670–9. and we will help you at every step: 40. Riggio V, Matika O, Pong-Wong R, Stear MJ, Bishop SC. Genome-wide • We accept pre-submission inquiries association and regional heritability mapping to identify loci underlying variation in nematode resistance and body weight in Scottish Blackface • Our selector tool helps you to find the most relevant journal lambs. Heredity (Edinb). 2013;110:420–9. • We provide round the clock customer support 41. Kleinman-Ruiz D, Villanueva B, Fernández J, Toro MA, García-Cortés • Convenient online submission LA, Rodríguez-Ramilo ST. Intra-chromosomal estimates of inbreeding and coancestry in the Spanish Holstein cattle population. Livest Sci. • Thorough peer review 2016;185:34–42. • Inclusion in PubMed and all major indexing services 42. Lv FH, Agha S, Kantanen J, Colli L, Stucki S, Kijas JW, et al. Adapta- • Maximum visibility for your research tions to climate-mediated selective pressures in sheep. Mol Biol Evol. 2014;31:3324–43. Submit your manuscript at www.biomedcentral.com/submit