Seroussi et al. BMC Genetics (2019) 20:71 https://doi.org/10.1186/s12863-019-0773-5

RESEARCHARTICLE Open Access Bos taurus–indicus hybridization correlates with intralocus sexual-conflict effects of PRDM9 on male and female fertility in Holstein cattle Eyal Seroussi1* , Andrey Shirak1, Moran Gershoni1, Ephraim Ezra2, Daniel Jordan de Abreu Santos3,LiMa3 and George E. Liu4

Abstract Background: Crossover localization during meiotic recombination is mediated by the fast-evolving zinc-finger (ZnF) domain of PRDM9. To study its impact on dairy cattle performance, we compared its genetic variation between the relatively small Israeli (IL) Holsteins and the North American (US) Holsteins that count millions. Results: Initially, we analyzed the major BTA1 haplotypes present in IL Holsteins based on the 10 most telomeric SNPs of the BovineSNP50 BeadChip. Sequencing of representative haplotype carriers indicated that for all frequent haplotypes (> 6%), the variable PRDM9 ZnF array consisted of seven tandem ZnF repeats. Two rare haplotypes (frequency < 4%) carried an indicine PRDM9, whereas all others were variants of the taurine type. These two haplotypes included the minor SNP allele, which was perfectly linked with a previously described PRDM9 allele known to induce unique localization of recombination hotspots. One of them had a significant (p = 0.03) negative effect on IL sire fertility. This haplotype combined the rare minor alleles of the only SNPs with significant (p < 0.05) negative substitution effects on US sire fertility (SCR). Analysis of telomeric SNPs indicated general agreement of allele frequencies (R = 0.95) and of the substitution effects on sire fertility (SCR, R = 0.6) between the US and IL samples. Surprisingly, the alleles that had a negative impact on male fertility had the most positive substitution effects on female fertility traits (DPR, CCR and HCR). Conclusions: A negative genetic correlation between male and female fertility is encoded within the BTA1 telomere. Cloning the taurine PRDM9 gene, which is the common form carried by Holsteins, we encountered the infiltration of an indicine PRDM9 variant into this population. During , in heterozygous males, the indicine PRDM9 variant may induce incompatibility of recombination hotspots and male infertility. However, this variant is associated with favorable female fertility, which would explain its survival and the general negative correlation (R = − 0.3) observed between male and female fertility in US Holsteins. Further research is needed to explain the mechanism underlying this positive effect and to devise a methodology to unlink it from the negative effect on male fertility during breeding. Keywords: Genomic conflict, Recombination, Dairy, Beef, Fertility, Holstein

* Correspondence: [email protected] 1Agricultural Research Organization (ARO), Volcani Center, Institute of Animal Science, HaMaccabim Road, P.O.B 15159, 7528809 Rishon LeTsiyon, Israel Full list of author information is available at the end of the article

© The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Seroussi et al. BMC Genetics (2019) 20:71 Page 2 of 11

Background prevalent in Holstein cattle and analyze the effects of During meiosis, genetic recombination reshuffles homolo- the different forms on male and female fertility. gous to produce offspring with combina- tions of traits that differ from those of their parents. Thus, Results increased recombination rate is considered to be essential Computerized cloning of PRDM9 of an influential Israeli for effective selection during domestication [1, 2], and this (IL) Holstein sire trait has recently drawn much attention from cattle re- To obtain the PRDM9 sequence of a representative Hol- searchers and breeders [3–8]. stein sire (JJ, HOLISRM000000007424), we applied deep Among others, the REC8 Meiotic Recombin- sequencing to the of this leading Israeli service ation (REC8), Ring Finger Protein 212 (RNF212) sire. At the end of 2018, this sire was recorded in the and Cyclin B1 Interacting Protein 1 (CCNB1IP1) have top 20 sires for total net merit, with more than 10,000 been implicated in driving variation in meiotic recom- daughters. Being a descendant of the popular US bull O- bination rate, with PR/SET Domain 9 (PRDM9) control- Bee Manfred Justice (HOUSA000122358313), this sire ling the positioning of recombination hotspots in represents an influential blood line of Holstein cattle. Di- ruminants, as in other mammals [3, 5, 6, 9]. PRDM9 is rected assembly resulted in a 13,568-bp gene (starting annotated at the telomeric end of Bos taurus autosome 1 the count in the 5′ untranslated region, Table 1) covered (BTA1) (GenBank: NP_001306826) including four major by 2147 reads of 100 bp each (~ 16-fold coverage). As functional domains, two of which, Krüppel Associated the assembly algorithm setup required a minimal match Box (KRAB) and SSX Repression Domain (SSXRD) nu- of 98 bp, all reads were of high quality, with no variation, clear localization signal, are associated with transcription mismatches or gaps (see BAM-format file [ENA: repression. This transcription-repression-like module is ERR3237582]). This assembled sequence had 99% nu- followed by a SET domain that provides methyltransfer- cleotide sequence identity with the reference mRNA se- ase activity and a C2H2 (ZnF) array that quence of PRDM9 (GenBank: NP_001306826.2) and binds to DNA. During meiosis, the ZnF array directs the similarly, consisted of 10 exons all bordered by canonical specific binding of PRDM9 to sites across the chromo- splice sites (Table 1). The first nine exons were capable somes, and the SET domain produces H3K4me3 and of encoding 383 amino acids that were all identical to H3K36me3 trimethylations to nearby [10]. those of the reference gene and comprised the transcrip- These modifications serve to recruit the SPO11 initiator tion-repression-like module, followed by the SET do- of meiotic double stranded breaks topoisomerase main. The last exon was capable of encoding 344 amino (SPO11) to initiate double-strand breaks by a mechan- acids, which showed only 93% identity to their counter- ism that involves protein–protein interactions with parts in the reference protein (Fig. 1). Hence, the fast- PRDM9’s transcription-repression-like module and that evolving ZnF array encoded by this exon was the source eventually promotes crossing over [10]. of all variation between the dairy and beef forms of TheC2H2ZnFarrayofPRDM9 is the fastest evolv- PRDM9, resulting in a longer 727-amino-acid dairy vari- ing ZnF in humans and other mammals [11]; this is ant compared to the reference protein of 725 amino compatible with the evident selection at the DNA- acids from beef cattle (Fig. 1). binding sites of PRDM9 [12]. This variation may promote subfertility and male sterility in hybrids, in which PRDM9 plays a complex role (reviewed by PRDM9 expression [10]). In the dairy sector, subfertility accounts for Length differences between the 727-amino-acid Hol- major economic losses, and dairy cattle breeding, stein PRDM9 variant and the 725-amino-acid reference which focuses primarily on selection for production sequence were also introduced by a different splicing traits, has resulted in a decline in the reproductive scheme for the last exon. As indicated in Table 1,in performance of Holstein cows [13]. Fertility issues are our build, the splice donor is the first CAG motif 5′ of also predominant in male crossbreds of Bos taurus × this exon, which is followed by another identical motif B. indicus cattle. Compared to purebreds, crossbred used by the reference. To verify which is the actual spli- progeny of Holstein–Friesian and indicine cattle show cing donor, we explored RNA-Seq data deposited in the poorer seminal parameters, subfertility and male ster- Sequence Read Archive (SRA) of NCBI. Expression was ility [14]. The current bovine PRDM9 reference se- detected only in the testis and using a stringent SRA quence originates from beef cattle (US Hereford BLAST search, we located 414 reads from three RNA- cattle) and despite much interest in this gene’sfunc- Seq submissions of the Hereford SuperBull 99,375 testis tion in dairy cattle, there is no Holstein PRDM9 ref- (Domino). Of these reads, 265 were assembled into a erence sequence deposited in GenBank. In this study, 2586-bp complete cDNA (Fig. 1; [ENA: ERR3237910] we describe a longer form of PRDM9 protein that is for BAM-format and [ENA: LR536714] for annotated Seroussi et al. BMC Genetics (2019) 20:71 Page 3 of 11

Table 1 Genomic organization of the Bos taurus PRDM9 gene (using the representative Holstein sire) Introna Exon Intron no. size size TGAGCACTCCAATGGCC 1 75 CCCACGgtgagaggca 406 attttccttagGCCAAA 2 124 CTATAGgtaacaggaa 307 cttctttccagGTTTCA 3 108 AGCAAGgtgaggggcc 4190 tcattttttagGTAAAC 4 50 CAGAATgtgagtattt 1784 tcatgtggaagACAATG 5 157 AAGTCGgtaagagaaa 842 ctccatcttagAACTCA 6 102 ACCTCTgtgagtgccc 216 ctcacctccagATTGTG 7 272 TGGCTGgtgagaaaca 576 ctgacactcagATCACC 8 68 GATGAGgtgagtgcag 1145 ctcgaccccagGTATGT 9 194 CCAGAGgtgagcgcca 1677 tcctctttcagCAGAAT 10 1294 GAATCATAGCCAACAAACTACATTTTAGTCACAGGAGAATTACTGCAGCCACCCCATGCCTCAGCTCTAAGGGGGCCTCAGAGGAGGTCTG TGACCTGTTACAGTCACCAAGAGTGTGAGGAGAGACTTCCCAGGTGGTCCAGGGGCCAAGACTCCAATTCAGGGACCCATGTTCAATACCTG ATTGGGGAACTAGCTCCCACATGCTGCAACTAAGACACAGTGCAGCTGAATAAATAAATACGTAAATAAATATTTTAAA...+poly-A+tail? a Exon and intron sizes are given in base pairs. Intron and exon sequences are written in lowercase and uppercase letters, respectively. The first and last two bases of the introns (gt and ag for donor and acceptor splice sites, respectively) are in bold type. The initiation and stop codons, and the putative polyadenylation signal (ATG, TAG, AATAA) are in bold and underlined type. Starting from the initiation codon, the genomic and transcript sizes of the PRDM9 gene as presented here were 13,557 and 2444 bp, respectively

Fig. 1 The putative amino-acid variation encoded in exon 10 of PRDM9. Genomic DNA of sires preferentially homozygous for haplotype alleles were used as templates for amplification and for Sanger sequencing using nucleotide primers 1 and 2 in Additional file 1: Table S1. The resulting traces were compared to reference sequences (Bos taurus [GenBank: NP_001306826.2] and two variants of B. indicus based on sequence phasing [GenBank: XP_019820291.1 and ANN45578]) or assembled sequences from NGS data (Holstein, JJ [ENA: LR536713]). The amino-acid alignment was colored using Boxshade. Dashes indicate gaps introduced by the alignment program or stop codons. Identical and similar amino acid residues in at least two of four sequences are indicated by a black and gray background, respectively. White boxes indicate non-conservative amino acid changes between the . Above the alignment, tandem ZnF repeats are labeled following Zhou et al., 2018 [15]. Each repeat consists of 28 residues. Nomenclature of variants follows that of their associated BTA1 telomeric SNP haplotypes (Table 3) and theirs DNA encoding sequences (Additional file 1) Seroussi et al. BMC Genetics (2019) 20:71 Page 4 of 11

transcript sequence), which validated the first CAG calculated using a linear sire model that included the in- motif as the donor (also exemplified by [SRA: semination technician as random effect and were based SRR5363137.1086298]). on determination of pregnancy by veterinary examination for all cows that did not display estrus within 60 d of in- Paralogous genes semination [17]. Input data included genotypes of 1750 To further analyze variation in the Holstein PRDM9 sires for 10 polymorphic SNPs on BTA1 that fit the gene, it was essential to investigate and map close par- Hardy–Weinberg distribution (p < 0.001). The PLINK per- alogous sequences that may interfere with this gene’s mutation option was employed to verify probability of as- characterization. Using as template query the 13,598-bp sociation of the haplotype alleles with male fertility. The sequence of the dairy form of PRDM9, we BLAST identified haplotype consisted of the most telomeric SNPs searched the current genome build (ARS-UCD1.2). on the BovineSNP50K BeadChip at positions 157,229, This indicated the existence of five close paralogs (- 645–157,542,408 (build ARS-UCD1.2), nearest to the imal score > 2500): PRDM9 on BTA1 (identity 99%, PRDM9 gene (157,545,780–157,559,387). For this window, coverage 100%), LOC100851938 on BTAX (identity 16 common haplotypes explained > 92% of the observed 89%, coverage 99%), LOC100139638 on BTA8 (identity sequence variation (Table 3). The likelihood of association 92%, coverage 99%), LOC789895 on BTA21 (identity with male fertility was significant only for haplotype #9, 82%, coverage 95%), an unannotated PRDM9-like which associated with negative male fertility (β value − pseudogene on BTA22 (identity 92%, coverage 40%) 0.58, Table 3). This haplotype combined the rare minor al- (Fig. 2). It should be noted that the latter is annotated leles of the two SNPs that displayed the most negative ef- as LOC113880961 in the hybrid cattle genome but not fects on this trait (allele frequencies 3.7 and 9.1% with β in the B. taurus genome build. values of − 0.5 and − 0.3, respectively, Table 2). However, Diversity of the Holstein PRDM9’s ZnF array was fur- since this simplified analysis may be confounded by popu- ther characterized by de-novo assembly of all Holstein lation stratification, we applied bootstrapping with 100, reads in the SRA that have been deposited by the 000 permutations, which corroborated the significance of USDA (12 bulls, [NCBI BioProject: PRJNA277147]) and this association (Table 3). that were proved similar to the reference sequence of We further analyzed the selected haplotype using a large- PRDM9 exon 10 by an SRA BLAST search. This assem- scale pedigree haplotyper [18]; we examined the statistically bly resulted in five major contigs with varying lengths phased haplotypes and adjusted their reconstruction based of ZnF arrays, ranging from 4 ZnF repeats on BTA22 to on Mendelian inheritance and the complex kinship - over 20 ZnF repeats on BTX, and corresponding to the tions within the sample. The final sample for which associ- five aforedescribed paralogs (Fig. 2). All reads that were ation of the PRDM9 with the male fertility trait was assembled into the PRDM9 contig matched its dairy estimated included 1414 sires with fully reconstructed and form of seven repeats. We used this information to de- confirmed haplotypes. This analysis indicated that haplo- sign PCR primers (Additional file 1: Table S1) that type #9 is associated with sires with a negative score for would enable specific amplification, and to apply male fertility (chi-squared test, p < 0.05, Table 3). Sanger sequencing of the major variation in the ZnF array of Holstein PRDM9. Confirming PRDM9 association with fertility in US Holsteins Haplotype analysis While the association analysis is somewhat limited when Using PLINK software [16] sliding-window analysis over using the data for the IL Holstein herd, the US population BTA1, we identified informative haplotypes of 10 single- offers almost unlimited statistical power as it includes mil- nucleotide polymorphisms (SNPs) spanning the PRDM9 lions of individuals with Illumina BeadChip data. We used locus (Table 2). Scores for the male fertility trait were this dataset to test the association between fertility traits

Fig. 2 PRDM9 ZnF array paralogs. Genomic reads of US Holsteins with sequence similarity to exon 10 of PRDM9 were downloaded from the SRA database and assembled using GAP5 software. Each red dot represents an 8-bp repeat that is similar to the PRDM9 exon 10 sequence. The domain of the tandem repeats forms a dotted rectangle, which reflects the number of tandem repeats Seroussi et al. BMC Genetics (2019) 20:71 Page 5 of 11

Table 2 BTA1 telomeric SNPs SNP markera BTA1 position A1 (minor) A2 MAFb βc ARS-BFGL-NGS-73542 157,229,645 A G 0.4020 −0.01 ARS-BFGL-NGS-19721 157,253,652 G A 0.4090 0.04 ARS-BFGL-NGS-101788 157,307,208 A G 0.3740 −0.02 BTA-105868-no-rs 157,328,448 A G 0.2400 0.10 BTB-01585499 157,367,221 G A 0.2910 −0.08 ARS-BFGL-NGS-113905 157,405,441 A G 0.0373 −0.50 ARS-BFGL-NGS-90894 157,431,081 A G 0.1570 0.04 ARS-BFGL-NGS-83544 157,458,860 A G 0.0913 −0.30 Hapmap26498-BTA-33060 157,503,718 A G 0.3490 −0.05 ARS-BFGL-NGS-76717 157,542,408 G A 0.2350 −0.16 a Boldfaced names indicate SNP alleles with exceptionally low frequency and β values b Frequency of the minor allele was calculated based on 1750 BeadChips c Marker effects on male fertility were estimated using PLINK Fisher’s exact test and nine BTA1 telomeric SNPs that were genotyped in correlation indicates that the trends measured for the both the US and IL datasets (Table 4). For these SNPs, much smaller (2576-fold) IL substitution effects (Table 2) allelic composition was very similar (R = 0.95) to that were also real. Indeed, for the US population as well, only observed in the IL population (Table 2). Table 4 shows that the two SNPs with the lowest minor allele frequency all effects were significant, most of them reaching the (MAF < 10%, Table 4), which are carried by the aforede- lowest number possible by the computing software for oc- scribed B. indicus haplotype, had negative effects on sire currence by chance, and thus their p values were indistin- fertility (Table 4). Surprisingly, these two SNPs were the guishable from zero. We observed a significant correlation only ones with positive substitution effects on female fertil- (0.6) between the substitution effects on the male fertility ity represented by heifer conception rate (β HCR values, trait estimated by the sire conception rate (β SCR values, Table 4). As other traits of female fertility, including rates Table 4) and the effects of these SNPs on male fertility in for daughter pregnancy (DPR) and cow conception (CCR), the IL Holstein herd (β values, Table 2). This significant were positively correlated with HCR (Table 5), similar

Table 3 Association analysis of BTA1 telomeric SNP haplotypes with male fertility Haplotypea # Freqb βc STAT p EMP1d GAGGAGGGGA 1 0.2760 0.0749 0.517 0.472 0.4718 AGAGGGGGAG 2 0.1190 −0.1120 0.670 0.413 0.4133 AGAAAGGGGA 3 0.0981 0.1570 0.969 0.325 0.3266 AAGGGGAGGA 4 0.0802 0.2560 2.100 0.148 0.1493 GGAAAGAGGA 5 0.0533 −0.0120 0.004 0.952 0.9518 GAGGAGGGAG 6 0.0530 −0.2140 1.050 0.306 0.3071 GGAAAGGGAA 7 0.0504 0.3230 2.070 0.151 0.1498 GAGGAGGGAA 8 0.0458 0.0161 0.005 0.945 0.9438 GAGGAAGAGA 9 0.0370 −0.5800 4.560 0.033 0.0325 GAGGGGGAGA 10 0.0257 0.1530 0.263 0.608 0.6043 GGGGAGGGAG 11 0.0184 0.5370 2.590 0.108 0.1073 AAGGAGGGAG 12 0.0154 −0.0296 0.006 0.937 0.9361 AAGGAGGGGA 13 0.0152 0.0878 0.037 0.847 0.8467 AGGGAGGGGA 14 0.0148 −0.2660 0.481 0.488 0.4859 AGGAAGGGAG 15 0.0124 −0.1210 0.090 0.764 0.7601 AGAAGGGAGA 16 0.0103 0.1830 0.131 0.717 0.7139 a Boldfaced haplotype has a significantly low β value b Haplotype frequency was used to sort this table c Haplotype effects on male fertility were estimated using PLINK linear regression test d Empirical p value was the number of times the permuted haplotype-statistic exceeded p in 100,000 permutations Seroussi et al. BMC Genetics (2019) 20:71 Page 6 of 11

Table 4 Substitution effects on fertility traits in US Holstein cattle of BTA1 telomeric SNPs a b c c SNP marker MAF β HCR PHCR β SCR PSCR ARS-BFGL-NGS-73542 0.4158 −0.226 0 0.015 0 ARS-BFGL-NGS-19721 0.3315 −0.280 0 0.025 0 ARS-BFGL-NGS-101788 0.3295 −0.250 0 0.023 0 BTA-105868-no-rs 0.2140 −0.126 1.4E-205 0.004 5.39E-21 BTB-01585499 0.2289 −0.315 0 0.044 0 ARS-BFGL-NGS-113905 0.0833 0.064 4.93E-26 −0.011 4.35E-56 ARS-BFGL-NGS-90894 0.1095 −0.208 0 0.034 0 ARS-BFGL-NGS-83544 0.0953 0.046 2.63E-15 −0.001 0.05 Hapmap26498-BTA-33060 0.2591 −0.259 0 0.042 0 Correlation with IL valuesd 0.95 −0.78 0.60 a Boldfaced names indicate SNP alleles with exceptional frequency and β values b Minor alleles were the same as in Table 2 and their frequencies were calculated based on 4,508,642 BeadChips c Marker effects on heifer and sire conception rates (HCR and SCR, with numbers of observations of 922,893 and 903,690, respectively) were estimated using PLINK Fisher’s exact test d Correlations (R) are with Table 2 β values. R was − 0.89 within the β values of Table 4 effect values were also observed for these other traits (data (Additional file 1: Table S1) or by subcloning into a se- not shown). This suggests that near the BTA1 telomere, quencing vector. Such plasmid sequencing also enabled there is a linkage between a beneficial allele that affects fe- the identification of a PRDM9 variant with eight ZnF re- male fertility and an allele that reduces male fertility. These peats, which was carried by the relatively rare haplotypes observations were supported by the moderate negative #5 and #7. The nucleotide sequence of this variant was genetic correlations (R ≈−0.3, on average) that were gener- virtually identical to that of the most common allele ally noticed between male (SCR) and female (DPR, HCR (haplotype #1) except for an insertion in an additional se- and CCR) fertility traits in the US sample (Table 5). Conse- quence motif of the ZnF repeat (Fig. 1, [ENA: LR536717]). quently, strong negative correlations were observed be- Analysis of the variation of the PRDM9 ZnF-array alleles tween the effects of the BTA1 telomeric SNPs on US HCR indicated their division into two phylogenetic groups (Fig. 3). with either IL male fertility or US SCR (R = − 0.78 and − Most forms belonged to the longer 727-amino-acid dairy 0.89, respectively, Table 4). variant (Fig. 1), which we refer to as taurus-liketype(Fig.3). Individuals heterozygous for the rare haplotypes #9 and #10, Sequence analysis of exon 10 of PRDM9 and its encoded which were the only haplotypes that carried the minor allele ZnF array ‘A’ in the SNP marker ARS-BFGL-NGS-83544 (~ 9% of the The rapidly evolving ZnF array encoded by exon 10 is population, Table 2), were characterized by ambiguous trace thought to confer sequence specificity to the binding of chromatograms when sequenced in the reverse orientation PRDM9 to DNA sites in which recombination hotspots (Fig. 4). Such forms are compatible with the presence of the are induced. Thus, variation of this domain in heterozy- 725-amino-acid PRDM9 variant, which we refer to as the gotes may drive incompatibility that affects male fertility. indicus-like branch (Fig. 3). This shorter form was also To analyze such variation, we Sanger sequenced this ZnF present in the B. taurus reference sequence and in the se- array in a sample of individuals that were preferentially quence of Dominette as assembled from trace files (data not homozygous for the common haplotype alleles of the shown), both derived from the Hereford beef breed; and in BTA1 telomeric end (haplotypes 1–10; Table 3,Fig.1,and the reference sequences for B. indicus PRDM9. Hence, Additional file 1). Haplotypes #9 and #10 were sequenced haplotype #9, which was associated for sires with a negative from heterozygotes using allele-specific PCR primers score for male fertility, was also associated with the indicus-

Table 5 Pearson correlations between EBVs of rates of daughter pregnancy and of sire, heifer and cow conception in the US Holstein populationa DPR HCR CCR SCR −0.280 −0.247 − 0.368 DPR 0.452 0.880 HCR 0.614 a DPR, daughter pregnancy rate; SCR, HCR, CCR, sire, heifer and cow conception rates, respectively Seroussi et al. BMC Genetics (2019) 20:71 Page 7 of 11

Fig. 3 Phylogenetic tree of PRDM9 ZnF array alleles. The evolutionary history of the polypeptides presented in Fig. 1 was inferred using the Neighbor-Joining method. The different alleles are identified by their carrying haplotype numbers. The optimal tree with the sum of branch length = 0.099 is shown. Next to the branches, the percentages of replicate trees in which the associated polypeptides clustered together in the bootstrap test are shown. The tree is drawn to the scale shown in units of number of amino acid substitutions per site like PRDM9, suggesting that it drives male infertility as in combinations of genetic variants [15]. A recent study of Bos taurus–indicus hybrids. US cattle showed that a specific PRDM9 allele denoted ‘allele 5’ has a dramatic influence on the localization of Discussion recombination hotspots and unique recombination hot- The purpose of this study was to discover PRDM9 al- spot regions that are distinguishable from hotspot re- leles that affect dairy production traits or could lead to gions modulated by all other alleles [15]. However, it more rapid genomic selection in Holstein cattle breed- was unclear whether this pattern would also be ob- ing by controlling the rates of meiotic recombination. served in IL Holstein cattle, which have a different On the one hand, meiotic recombination driven by demographic history, although AI using semen of elite PRDM9 may induce deleterious chromosomal instabil- US sires is often practiced to enhance the local variety. ityandmeioticdrive[11, 19, 20]; on the other, it re- We applied genomic sequencing as the putative method shuffles paternal and maternal genetic alleles in the of choice to investigate which PRDM9 alleles are preva- next generation, potentially providing better novel lent in the IL Holsteins. Computerized cloning of the

Fig. 4 Ambiguous trace chromatograms associated with heterozygosity for the dairy and beef forms of PRDM9. On top, the ambiguous trace chromatogram was obtained by sequencing of the PCR product amplified from sire #5228 that carries haplotypes #9 and #15 (Table 3). Sequencing was performed using the reverse primer (primer 1, Additional file 1: Table S1). Phased nucleotides and their corresponding encoded protein translations are presented below this chromatogram. Further below, chromatograms were obtained from bacterial-cloned fragments amplified from sire #5611 that carries haplotypes #8 and #10 using the SP6 primer Seroussi et al. BMC Genetics (2019) 20:71 Page 8 of 11

PRDM9 gene of an influential IL Holstein sire indicated (DPR, CCR and HCR, exemplified for the latter in that it encodes a 727-amino-acid PRDM9 variant, Table 4). This made us double-check our methodology, which we refer to as the dairy form. This form was lon- but realizing that “nothing in genetics makes sense ex- ger than the beef form of the GenBank reference se- cept in light of genomic conflict” [22], we concluded that quence, which was derived from the Hereford beef our results may point to a fundamental intralocus sexual breed, as this breed’s genome was the first to be se- conflict that arises for either the PRDM9 gene or closely quenced, assembled, and annotated [21]. Moreover, this linked genes at the BTA1 telomere. Such situations in reference suggests an alternative splicing which does which a genetic locus combines beneficial alleles for fe- not correspond to the common transcript, as evidenced males with a selective disadvantage for males have been from our assembly of RNA-Seq data obtained from frequently observed (recently reviewed, [23]). This may Hereford SuperBull 99,375 testis; as such, it should be stabilize the survival of alleles with a negative impact on considered a computational artifact. Based on this fertility despite the obvious importance of this trait to RNA-Seq read assembly, we give the correct transcript genetic fitness. Indeed, in the US Holstein population, sequence of the beef form. we observed a moderate negative genetic correlation be- As expected [19], most of the nucleotide variation was tween male and female fertility traits (R ~ − 0.3, Table 5); observed within the repetitive ZnF array. However, in this can now be explained by the intralocus sexual con- both forms, we mostly observed seven tandem ZnF re- flict on the BTA1 telomere, where we recorded a much peats, while other repeat numbers have also been previ- higher negative genetic correlation (R ~ 0.9) between the ously suggested [15]. Taking into account paralogous substitution effects on female and male fertilities sequences, we carefully assembled NGS data of 12 US (Table 4). Such a moderate negative correlation between bulls, and concluded that all observed PRDM9 alleles male and female fertility has been observed in Danish have at least seven tandem ZnF repeats, while smaller cattle, leading to the suggestion that in breeding repeat numbers belong to paralogous loci. To strengthen schemes for fertility, attention should be focused on the this conclusion, we analyzed the major BTA1 haplotypes female side [24]. As in IL, but unlike in the US [25], present in IL Holsteins based on the 10 most telomeric male semen is not titrated according to male fertility SNPs available on the Illumina BovineSNP50 BeadChip. score; it may be that the IL breeding scheme led to a Sanger sequencing of representative haplotype carriers much lower frequency (< 4%, 2.2-fold less than in the indicated that for all frequent haplotypes (frequency > US, Tables 2 and 4) of the minor SNP allele of ARS- 6%), the sequences of the PRDM9 ZnF array consisted of BFGL-NGS-113905. This allele has the highest negative seven tandem ZnF repeats. Nevertheless, two rare haplo- impact on male fertility and thus the selection in IL types (frequency < 4%, #9 and #10, Table 3) carried the against this allele decreased the negative correlation be- beef form of PRDM9, while all others were variants of tween male and female fertility traits to a non-substan- the dairy type. These two haplotypes included the minor tial number (data not shown). It should be also noted SNP allele ‘A’ of rs110661033 or ARS-BFGL-NGS-83544 that SCR is service sire contribution to pregnancy while which was perfectly linked with allele 5 of PRDM9 [15]. HCR is the female contribution to pregnancy. Therefore, Hence, these are likely to induce different localization of SCR is not a direct trait for male fertility, but an indirect the recombination hotspots compared to all other haplo- male contribution through genetics and potentially epi- type alleles, as has been previously reported [15]. More- genetics in sperms [26]. over, haplotype #9 had a significant (p = 0.03) negative Our phylogenetic analysis indicated that the beef effect on IL sire fertility. This haplotype combined the form of PRDM9 is virtually identical to B. indicus rare minor alleles of the only SNPs tending to negative PRDM9.Boththetaurus and indicus species descend substitution effects on IL sire fertility (Table 2). To en- from the extinct wild aurochs (Bos primigenius). How- sure this observation’s significance, we analyzed nine of ever, separate ancient domestication events led to spe- the most telomeric BTA1 SNPs using data from the US ciation [27] and although these species readily national dairy cattle database that includes records for hybridize, male infertility is often observed in the cross- millions of individuals (Table 4). This analysis indicated breds [14]. Low levels of haplotype sharing between B. general agreement between the allele frequency (R = taurus and indicus breeds have been frequently ob- 0.95) and the substitution effects on sire fertility (SCR, served for every analyzed gene because of the recent R = 0.6) between the US and IL samples, confirming sig- formation of B. taurus × B. indicus hybrids in North nificant (p < 0.05) negative substitution effects on male America [28]. This suggests infiltration into the Hol- fertility for both minor SNP alleles that associate with stein herd of the indicine PRDM9, which induces the IL #9 haplotype that carries the beef form of unique recombination hotspot regions. These are not PRDM9. Surprisingly, the very same alleles had the most compatible with the recombination hotspots mediated positive substitution effects on female fertility traits by the taurine PRDM9 andthusdrivemeiosisin Seroussi et al. BMC Genetics (2019) 20:71 Page 9 of 11

heterozygous individuals toward chromosomal instabil- amplified using PCR primers (Additional file 1: Table S1) ity and male infertility. and the Bio-X-ACT™ Long Kit (Bioline Ltd., London, UK) according to the manufacturer’s instructions under the Conclusions following conditions: 30 cycles for 40 s at 92 °C, 60 s at In Holstein cattle, the breeding scheme for female fertil- 63 °C and 60 s at 68 °C. The PCR products were separated ity has been complicated by a negative correlation be- on agarose gels, excised, and purified with AccuPrep® Gel tween this trait and milk production [29]. We show that Purification Kit (BioNeer Corp., Seoul, Korea). Chromato- this scheme is further complicated by the negative gen- grams were obtained by ABI3730 sequencing using a Big- etic correlation between male and female fertility that is Dye® Terminator v1.1 Cycle Sequencing Kit (Applied encoded in the BTA1 telomere. Cloning the taurine Biosystems, Foster City, CA, USA). Detection and PRDM9 gene, which is the common form carried by characterization of indels were performed using ShiftDe- Holstein haplotypes of this region, we demonstrated the tector and the ABI tracefiles [32]. infiltration of a rare indicine PRDM9 variant into the Holstein population. We suggest that during meiosis, in Cloning of PRDM9 exon10 sequence heterozygous males, this may induce incompatibility in PRDM9 DNA fragments were amplified with subcloning the localization of recombination hotspots, destabilize primers (Additional file 1: Table S1) using Hy-Fy High Fi- genome integrity, and cause male infertility due to de- delity Mix (Hy Laboratories Ltd., Rehovot, Israel). The fects in spermiogenesis. However, the indicine PRDM9 amplified products were digested by restriction enzymes, variant was associated with a favorable effect on female purified from a 1% agarose gel by Gel/PCR DNA Frag- fertility, which would explain the survival of this variant ments Kit (Geneaid Biotech Ltd., Taipei, Taiwan) and li- and the general negative correlation of R = − 0.3 ob- gated into the pGEM®-T Easy Vector (Promega, Madison, served between male and female fertility traits in US WI, USA) using EcoRI and NcoI sites and T4 DNA ligase Holsteins. Further research is needed to explain the (Promega). These cloned DNA fragments were subjected to mechanism underlying this positive effect on female fer- Sanger dideoxy sequencing using primers for SP6 and T7 tility, and to devise a methodology that will unlink it promoters in pGEM-T Easy and an additional primer from the observed negative effect on male fertility. within the insert (Additional file 1:TableS1).

Methods The dataset, haplotype phasing and trait-association Deep sequencing and analysis of bovine analysis The current reference genome is based on the Hereford Using Illumina (San Diego, CA, USA) BovineSNP50 Bead- beef breed. To find variations between the dairy and beef Chip genotypes, four traits were analyzed: cow, heifer and species that may underlie the differences in PRDM9, daughter fertilities (CCR, HCR and DPR, respectively), DNA was extracted from thawed frozen semen of a sin- and sire conception rate (IL-SCR) as previously described gle Holstein sire (JJ, HOLISRM000000007424) and was [33, 34]. Briefly, IL-SCR was calculated based on a linear deep-sequenced using the Illumina HiSeq2000 platform model and 5,658,632 insemination records of 1597 sires according to the manufacturer’s paired-end protocol. with a minimum 250 inseminations per sire delivered by a Average fragment length was 580 bp, and 100-bp se- qualified inseminator with a minimum 250 inseminations quence reads were obtained from both ends. DNA sam- per year. Fixed effects were insemination number, AI in- ple was applied to two lanes; yielding ~ 30-fold (906,996, stitute, geographical region, and calendar month. Analysis 192 reads) coverage for this sample. The reference gene of cows also included the fixed effects of parity, calving sequence was then used as a template for mapping these status, and day in milk at insemination. Random effects DNA-Seq reads using GAP5 software [30]. BWA options included in the model were herd-year season, insemin- for this mapping were set to bam bwasw -t 8 -T 60 [31]. ation technician, sire of cow, and service sire. The stand- The assembled sequence of this sire gene was submitted ard deviation for IL-SCR evaluations was 0 ± 0.024 and under ENA accession nos. ERS3326200 (BAM format) mean reliability was 78.2%. DNA was extracted from the and LR536713 (annotated gene sequence). semen of 1750 Holstein bulls used for AI in Israel. The Additional genomic sequences of the PRDM9 locus bulls’ identity, relationship and genetic breeding values are were reconstructed using DNA-Seq reads located in available at http://www.icba-israel.com/cgi-bin/bulls/en/ NCBI’s SRA and the Nucleotide BLAST tool (GenBank bl_main.htm. The dataset of IL sires, including genotyping accession No. PRJNA277147). The reference gene se- data and SCR values is available in Excel format quence was then used as a template for mapping these (Additional file 2). DNA-Seq reads following the above mention procedures Association for BTA1 SNPs was determined using for assembling our own data. Further analysis of variation PLINK [16], activating the haplotype sliding-window and was performed with Sanger sequencing: DNA was bootstrapping options (−-hap-window 10 --hap-linear Seroussi et al. BMC Genetics (2019) 20:71 Page 10 of 11

--mperm 100,000). Haplotype spanning of the PRDM9 the JTT matrix-based method in units of number of gene, consisting of 10 SNPs within positions 157,229, amino acid substitutions per site. The rate variation 645–157,542,408 (build ARS-UCD1.2), was chosen for among sites was modeled with a gamma distribution further analyses. For this haplotype, phasing was corrob- (shape parameter = 2.53). All positions containing gaps orated using the rule-based Large-Scale Pedigree and missing data were eliminated. There were a total of Haplotyper (LSPH) software [18]. The genetic correla- 342 positions in the final dataset. tions between traits or between markers’ substitution ’ effects were estimated as Pearson s correlation coeffi- Additional files cients. These coefficients of correlation were calculated using R package [35] or CORREL function in Excel Additional file 1: Table S1. Primer pairs used for PCR amplification and spreadsheet (Microsoft Corporation, Santa Rosa, CA, sequencing of the PRDM9 gene; and nucleotide sequences of PRDM9 USA), respectively. exon10, partial coding sequences associated with the haplotype and individuals studied. (PDF 362 kb) Additional file 2: The dataset of IL sires, including genotyping data and US Holstein samples and analysis SCR values. (XLSX 218 kb) The data used were part of the 2018 US genomic evalua- tions from the Council on Dairy Cattle Breeding (CDCB), Abbreviations consisting of 1,953,934 Holstein cattle from the national AI: Artificial insemination; BIC: Bayesian information criterion; CCR: Cow dairy cattle database. Estimated breeding values (EBVs) of conception rate; CDS: Coding sequence; DPR: Daughter pregnancy rate; four fertility traits were analyzed: SCR, DPR, HCR and HCR: Heifer conception rate; NGS: Next generation sequencing; SCR: Sire CCR. We only included those animals with both available conception rate genotype and trait reliability larger than the parent average. Acknowledgements A detailed description of the data is provided in Table 6. We thank Dr. Joel I. Weller for useful discussion. Contribution from the The genotype data from different SNP arrays were Agricultural Research Organization, Institute of Animal Science, Bet Dagan, imputed to a common dataset of 4340 SNPs on BTA1 Israel, is acknowledged. using FindHap version 3 [36]. Then, nine telomeric SNPs were analyzed: ARS-BFGL-NGS-73542, ARS-BFGL-NGS- Authors’ contributions LM, GEL (US side) and ES (Israeli side) conceived and designed the experiments. 19721, ARS-BFGL-NGS-101788, BTA-105868-no-rs, EE collected samples and generated data. AS carried out the Sanger dideoxy BTB-01585499, ARS-BFGL-NGS-113905, ARS-BFGL- sequencing and haplotype analyses. ES performed the bioinformatics analyses NGS-90894, ARS-BFGL-NGS-83544, and Hapmap26498- of the DNA- and RNA-Seq data. DJDAS and MG performed computational and statistical analyses on the US and IL sides, respectively. ES drafted the paper. All BTA-33060. The association studies were performed using authors read and approved the final manuscript. PLINK v 1.07 software [16]. Following Garrick et al., 2009 [37], association analysis was also performed using dereg- Funding ressed EBVs (dEBVs) and removing the parent effect from This work was supported by BARD grant number US-4997-17 from the US– ’ Israel Binational Agricultural Research and Development (BARD) Fund. The the individual s EBV. The substitution effects estimated funders played no role in the study design, collection, analysis or interpret- based on dEBVs were highly correlated with those ob- ation of the data, writing of the manuscript, or in the decision to submit the tained using EBVs (R = 0.956, data not shown). manuscript for publication.

Analysis of evolutionary relationships Availability of data and materials Sequence data have been submitted to ENA under accession no. The evolutionary history of the PRDM9 ZnF-array alleles PRJEB31626. The dataset of IL sires, including genotyping data and SCR was inferred using the Neighbor-Joining method. Evolu- values, are presented in Additional file 2. tionary analyses were conducted in MEGA6 [38]. Briefly, the best model was selected according to the lowest Ethics approval and consent to participate Bayesian Information Criterion (BIC) scores. The opti- Not applicable. mal tree was identified by the bootstrap test (1000 repli- Consent for publication cates). The evolutionary distances were computed using Not applicable.

Table 6 Description of number of animals, estimated breeding Competing interests value summary statistics and average of their reliability The authors declare that they have no competing interests. Trait N Mean ± SD Min Max Reliability Author details SCR 903,690 2.85 ± 0.25 1.79 5.01 0.38 1Agricultural Research Organization (ARO), Volcani Center, Institute of Animal DPR 836,623 −14.41 ± 2.73 −30.39 7.24 0.33 Science, HaMaccabim Road, P.O.B 15159, 7528809 Rishon LeTsiyon, Israel. 2Israel Cattle Breeders Association, Caesarea, Israel. 3Department of Animal HCR 922,913 −0.62 ± 2.30 −16.2 11.2 0.28 and Avian Sciences, University of Maryland, College Park, MD, USA. 4Animal CCR 794,362 −12.22 ± 3.42 −32.6 7.34 0.32 Genomics and Improvement Laboratory, BARC, Agricultural Research Service, USDA, Beltsville, MD 20705, USA. Seroussi et al. BMC Genetics (2019) 20:71 Page 11 of 11

Received: 12 April 2019 Accepted: 21 August 2019 26. Fang L, Jiang J, Li B, Zhou Y, Freebern E, Vanraden PM, et al. Genetic and epigenetic architecture of paternal origin contribute to gestation length in cattle. Commun Biol. 2019;2:100. 27. Pitt D, Sevane N, Nicolazzi EL, MacHugh DE, Park SDE, Colli L, et al. References Domestication of cattle: two or three events? Evol Appl. 2019;12(1):123–36. 1. Ross-Ibarra J. The evolution of recombination under domestication: a test of 28. Seabury CM, Seabury PM, Decker JE, Schnabel RD, Taylor JF, Womack JE. – two hypotheses. Am Nat. 2004;163(1):105 12. Diversity and evolution of 11 innate immune genes in Bos taurus taurus 2. Ritz KR, Noor MAF, Singh ND. Variation in recombination rate: adaptive or and Bos taurus indicus cattle. Proc Natl Acad Sci U S A. 2009;107(1):151–6. – not? Trends Genet. 2017;33(5):364 74. 29. Wathes DC, Fenwick M, Cheng Z, Bourne N, Llewellyn S, Morris DG, et al. 3. Sandor C, Li W, Coppieters W, Druet T, Charlier C, Georges M. Genetic Influence of negative energy balance on cyclicity and fertility in the high variants in REC8, RNF212, and PRDM9 influence male recombination in producing dairy cow. Theriogenology. 2007;68 Suppl 1:S232–41. cattle. PLoS Genet. 2012;8(7):e1002854. 30. Bonfield JK, Whitwham A. Gap5--editing the billion fragment sequence 4. Weng ZQ, Saatchi M, Schnabel RD, Taylor JF, Garrick DJ. Recombination assembly. Bioinformatics. 2010;26(14):1699–703. locations and rates in beef cattle assessed from parent-offspring pairs. 31. Li H, Durbin R. Fast and accurate long-read alignment with burrows- Genet Sel Evol. 2014;46:34. wheeler transform. Bioinformatics. 2010;26(5):589–95. 5. Ma L, O'Connell JR, VanRaden PM, Shen B, Padhi A, Sun C, et al. Cattle sex- 32. Seroussi E, Ron M, Kedra D. ShiftDetector: detection of shift mutations. specific recombination and genetic control from a large pedigree analysis. Bioinformatics. 2002;18(8):1137–8. PLoS Genet. 2015;11(11):e1005387. 33. Weller JI, Ezra E. Genetic analysis of the Israeli Holstein dairy cattle 6. Kadri NK, Harland C, Faux P, Cambisano N, Karim L, Coppieters W, et al. Coding population for production and nonproduction traits with a multitrait animal and noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B model. J Dairy Sci. 2004;87(5):1519–27. – affect recombination rate in cattle. Genome Res. 2016;26(10):1323 32. 34. Seroussi E, Klompus S, Silanikove M, Krifucks O, Shapiro F, Gertler A, et al. 7. Ahlawat S, De S, Sharma P, Sharma R, Arora R, Kataria RS, et al. Evolutionary Nonbactericidal secreted phospholipase A2s are potential anti-inflammatory dynamics of meiotic recombination hotspots regulator PRDM9 in bovids. factors in the mammary gland. Immunogenetics. 2013;65(12):861–71. – Mol Gen Genomics. 2016;292(1):117 31. 35. R Core Team. R: a language and environment for statistical computing. 8. Shen B, Jiang J, Seroussi E, Liu GE, Ma L. Characterization of recombination R Foundation for statistical computing. Vienna; 2013. http://www.R- features and the genetic basis in multiple cattle breeds. BMC Genomics. project.org/. 2018;19(1):304. 36. VanRaden PM, O'Connell JR, Wiggans GR, Weigel KA. Genomic evaluations 9. Petit M, Astruc JM, Sarry J, Drouilhet L, Fabre S, Moreno CR, et al. Variation with many more genotypes. Genet Sel Evol. 2011;43:10. in recombination rate and its genetic determinism in sheep populations. 37. Garrick DJ, Taylor JF, Fernando RL. Deregressing estimated breeding values – Genetics. 2017;207(2):767 84. and weighting information for genomic regression analyses. Genet Sel Evol. 10. Paigen K, Petkov PM. PRDM9 and its role in genetic recombination. Trends 2009;41:55. – Genet. 2018;34(4):291 300. 38. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: molecular 11. Myers S, Bowden R, Tumian A, Bontrop RE, Freeman C, MacFie TS, et al. evolutionary genetics analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9. Drive against hotspot motifs in primates implicates the PRDM9 gene in meiotic recombination. Science. 2010;327(5967):876–9. 12. Baker Z, Schumer M, Haba Y, Bashkirova L, Holland C, Rosenthal GG, et al. Publisher’sNote Repeated losses of PRDM9-directed recombination despite the conservation Springer Nature remains neutral with regard to jurisdictional claims in of PRDM9 across vertebrates. eLife. 2017;6:e24133. published maps and institutional affiliations. 13. Ma L, Cole JB, Da Y, VanRaden PM. Symposium review: genetics, genome- wide association study, and genetic improvement of dairy fertility traits. J Dairy Sci. 2018;102(4):3735–43. 14. Mukhopadhyay CS, Gupta AK, Yadav BR, Khate K, Raina VS, Mohanty TK, et al. Subfertility in males: an important cause of bull disposal in bovines. Asian-Aust J Anim Sci. 2010;23(4):450–5. 15. Zhou Y, Shen B, Jiang J, Padhi A, Park KE, Oswalt A, et al. Construction of PRDM9 allele-specific recombination maps in cattle using large-scale pedigree analysis and genome-wide single sperm genomics. DNA Res. 2018;25(2):183–94. 16. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81(3):559–75. 17. Weller JI, Ron M. Genetic analysis of fertility traits in Israeli Holsteins by linear and threshold models. J Dairy Sci. 1992;75(9):2541–8. 18. Baruch E, Weller JI, Cohen-Zinder M, Ron M, Seroussi E. Efficient inference of haplotypes from genotypes on a large animal pedigree. Genetics. 2006; 172(3):1757–65. 19. Berg IL, Neumann R, Lam KW, Sarbajna S, Odenthal-Hesse L, May CA, et al. PRDM9 variation strongly influences recombination hot-spot activity and meiotic instability in humans. Nat Genet. 2010;42(10):859–63. 20. Grey C, Baudat F, de Massy B. PRDM9, a driver of the genetic map. PLoS Genet. 2018;14(8):e1007479. 21. Elsik CG, Tellam RL, Worley KC, Gibbs RA, Abatepaulo ARR, Abbey CA, et al. The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009;324(5926):522–8. 22. Rice WR. Nothing in genetics makes sense except in light of genomic conflict. Annu Rev Ecol Evol Syst. 2013;44:217–37. 23. Schenkel MA, Pen I, Beukeboom LW, Billeter JC. Making sense of intralocus and interlocus sexual conflict. Ecol Evol. 2018;8(24):13035–50. 24. Hansen M. Genetic investigations on male and female fertility in cattle. Livest Prod Sci. 1979;6(4):325–34. 25. Taylor JF, Schnabel RD, Sutovsky P. Review: genomics of bull fertility. Animal. 2018;12(s1):s172–83.