Analysis of Copy Loss and Gain Variations in Holstein Cattle
Total Page:16
File Type:pdf, Size:1020Kb
Seroussi et al. BMC Genomics 2010, 11:673 http://www.biomedcentral.com/1471-2164/11/673 RESEARCH ARTICLE Open Access Analysis of copy loss and gain variations in Holstein cattle autosomes using BeadChip SNPs Eyal Seroussi1*, Giora Glick1, Andrey Shirak1, Emanuel Yakobson1, Joel I Weller1, Ephraim Ezra2, Yoel Zeron3 Abstract Background: Copy number variation (CNV) has been recently identified in human and other mammalian genomes, and there is a growing awareness of CNV’s potential as a major source for heritable variation in complex traits. Genomic selection is a newly developed tool based on the estimation of breeding values for quantitative traits through the use of genome-wide genotyping of SNPs. Over 30,000 Holstein bulls have been genotyped with the Illumina BovineSNP50 BeadChip, which includes 54,001 SNPs (~SNP/50,000 bp), some of which fall within CNV regions. Results: We used the BeadChip data obtained for 912 Israeli bulls to investigate the effects of CNV on SNP calls. For each of the SNPs, we estimated the frequencies of occurrence of loss of heterozygosity (LOH) and of gain, based either on deviation from the expected Hardy-Weinberg equilibrium (HWE) or on signal intensity (SI) using the PennCNV “detect” option. Correlations between LOH/CNV frequencies predicted by the two methods were low (up to r = 0.08). Nevertheless, 418 locations displayed significantly high frequencies by both methods. Efficiency of designating large genomic clusters of olfactory receptors as CNVs was 29%. Frequency values for copy loss were distinguishable in non-autosomal regions, indicating misplacement of a region in the current BTA7 map. Analysis of BTA18 placed major quantitative trait loci affecting net merit in the US Holstein population in regions rich in segmental duplications and CNVs. Enrichment of transporters in CNV loci suggested their potential effect on milk- production traits. Conclusions: Expansion of HWE and PennCNV analyses allowed estimating LOH/CNV frequencies, and combining the two methods yielded more sensitive detection of inherited CNVs and better estimation of their possible effects on cattle genetics. Although this approach was more effective than methodologies previously applied in cattle, it has severe limitations. Thus the number of CNVs reported here for the Holstein breed may represent as little as one-tenth of inherited common structural variation. Background (~SNP/50,000 bp). This chip may capture any genetic The Holstein-Friesian breed is the world’shighest- variance that is genetically linked to these markers, as producing dairy cattle; much of its outstanding milk well as copy number variations (CNVs) [2,3]. A CNV is production was gained by selection of elite artificial a structural variation, including deletion, duplication, insemination (AI) bulls based on breeding values that translocation or inversion. CNV has been recently iden- were estimated by progeny testing. Genomic selection is tified in human and other mammalian genomes, and it a newly developed tool for the estimation of breeding is now recognized that CNV might be a major source of values through the use of genome-wide genotyping of heritable variation in complex traits [4]. In humans, single nucleotide polymorphisms (SNPs). Over 30,000 over 14,478 CNV loci have been recorded based on Holstein bulls have been genotyped with the Illumina 89,427 different entries that cover about one-third of BovineSNP50 BeadChip [1], which includes 54,001 SNPs the genome. Of these entries, 65% include CNVs that range mostly between 1 and 10 kb and 34% are indels * Correspondence: [email protected] in the range of 100 bp to 1 kb http://projects.tcag.ca/ 1Institute of Animal Sciences, ARO, The Volcani Center, Bet Dagan 50250, variation/. CNV regions (CNVRs) encompassing adja- Israel cent or overlapping losses or gains cover 12% of the Full list of author information is available at the end of the article © 2010 Seroussi et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Seroussi et al. BMC Genomics 2010, 11:673 Page 2 of 10 http://www.biomedcentral.com/1471-2164/11/673 human genome. Hence, this source of variation has divided by n, the solution for this expansion in the case 0.5 more nucleotide content per genome than SNPs [4]. of a null allele r is: rl = [0.25 - 0.25pqo +poqo/pqo] - However, assuming an average spontaneous CNV muta- 0.5; in the case of gain of an allele which consists of 0.5 0.5 0.5 tion rate of 1/10,000 per locus [5], it is expected that a both types, r is: rg =[po +qo +pqo] -po -qo (see considerable portion of the reported entries arise from Additional file 1 for a detailed mathematical solution de novo CNVs of a sporadic nature. and Additional file 2 for allele distribution, c2 test, and Several algorithms for CNV identification from SNP rl and rg values for all 54,001 SNPs of the Illumina Bovi- arrays are available [6]. Following reports that PennCNV neSNP50 BeadChip). Average values of rl and rg for was the most reliable algorithm in the detection of autosomal markers were -0.3% ± 2.7 and 0.5% ± 2.1, CNVs from Illumina BeadChip data [7,8], we chose this respectively. For non-pseudoautosomal markers on the software to analyze signal intensity (SI) data. PennCNV X chromosome (positions 28,044-86,115,497), where the is a CNV detection tool that incorporates multiple model of inheritance does not fit the model under sources of information, including the ratio of total SI to which the formulas were developed, average values of rl allelic intensity at each SNP marker. This software was and rg were 361% ± 258 and -25% ± 16, respectively. originally developed for Illumina whole-genome Bead- Thus, encountering extreme values (above 100%) for rl, Chip arrays [9]. when analyzing autosomal markers, may indicate an The introduction of AI to modern dairy herd manage- error in the mapping of markers that are actually ment has resulted in a loss of genetic diversity in Hol- located on sex chromosomes. Although the HWE devia- steins and the effective size of the Holstein population is tion is an important factor in CNV occurrence, other low (e.g. 39 in the USA [10]). Since the Israeli Holstein reasons than erroneous positioning of markers of sex population has been under intensive selection for chromosomes may also exist; for example, systematic 50 years, its genetic pool is expected to have similar problems in distinguishing the alleles, due to technical characteristics. Although it is now accepted that gen- failures. However these are unlikely, because of the high omes vary more at the structural level than at the quality of the Beadchip technology [11]. nucleotide-sequence level, little has been published on CNVs in Holsteins. SI-based detection of CNV In a study that validated the quality of BovineSNP50 Using the PennCNV detection module, we analyzed the BeadChip performance [11], population-wide genotyping autosomes of each of the 912 bulls for CNVs. From the of Israeli Holstein bulls was initiated in order to intro- output of this analysis, which contained the chromoso- duce genomic selection into the Israeli breeding program. mal positions and copy numbers of the detected CNVs, Our study makes use of these data to describe the the frequency of loss or gain for each SNP marker was frequent CNV in Holsteins and investigate its effect on calculated (Additional file 2). Average loss and gain BeadChip calls. We propose to combine Hardy-Weinberg values for autosomal markers were lsi = 0.02% ± 0.2 and equilibrium (HWE)-based and SI-based methods to reli- gsi = 0.12% ± 0.15, respectively. ably detect CNVs of the deletion and duplication types that are not de novo or sporadic CNVs, which are less Comparing the HWE-based and SI-based methods for likely to be of any economic value. CNV detection Seeking confirmation of the CNV detection, we examined Results and Discussion the correlation between the HWE-based and SI-based HWE-based detection of CNV detection methods (Figure 1). When sorted according to We used the data obtained for 912 Holstein bulls to frequency of deletion as detected by the PennCNV analy- investigate the effects of CNVs on BovineSNP50 Bead- sis (SI loss), the markers that exhibited frequent LOH Chip calls. For each of the SNPs, we estimated the fre- using the HWE-based formulas (high rl, and negative rg) quency of occurrence of deletions and insertions using a clustered together towards the right end of panel A generalization of the Hardy-Weinberg principle for (Figure 1). The distribution of frequencies of the SI-based more than two allele frequencies (p, q) by assuming pre- method was of limited range (from 0 to 14%) compared sence of a third allele (r). Under the assumption of to the HWE-based method (Figure 1A). Six autosomal three-alleles, expected HWE frequencies are obtained by markers displayed rl values higher than 100%, five of the trinomial expansion of (p + q+ r)2 = 1. Defining ‘n’ them that were closely mapped on BTA7 (see BTA7 sec- as the number of individuals sampled, ‘pqo’ as the tion). These markers were regarded as non-autosomal detected number of individuals with phenotype similar and removed from further analyses. to the pq heterozygote phenotype divided by n, ‘po’ as As the Holstein population has a very low effective thenumberofallelep-likehomozygotesdividedbyn, population size, it was expected that some of the CNV and ‘qo’ as the number of allele q-like homozygotes alleles within our sample would be common.