fgene-12-640039 March 2, 2021 Time: 17:46 # 1

ORIGINAL RESEARCH published: 08 March 2021 doi: 10.3389/fgene.2021.640039

Genomic Loci Affecting Milk Production in German Black Pied Cattle (DSN)

Paula Korkuć1, Danny Arends1, Katharina May2, Sven König2 and Gudrun A. Brockmann1*

1 Albrecht Daniel Thaer-Institute for Agricultural and Horticultural Sciences, Animal Breeding Biology and Molecular Genetics, Humboldt University Berlin, Berlin, Germany, 2 Institute of Animal Breeding and Genetics, Justus-Liebig-University of Giessen, Giessen, Germany

German Black Pied cattle (DSN) is an endangered population of about 2,550 dual-purpose cattle in Germany. Having a milk yield of about 2,500 kg less than the predominant dairy breed Holstein, the preservation of DSN is supported by the German government and the EU. The identification of the genomic loci affecting milk production in DSN can provide a basis for selection decisions for genetic improvement of DSN in order to increase market chances through the improvement of milk yield. A genome-wide association analysis of 30 milk traits was conducted in different lactation periods and numbers. Association using multiple linear regression models in R was performed on 1,490 DSN cattle genotyped with BovineSNP50 SNP-chip. 41 significant and 20 suggestive SNPs affecting milk production traits in DSN were identified, as well as 15 additional SNPs for protein content which are less reliable due to high inflation. The most significant effects on milk yield in DSN were detected on chromosomes 1, 6, and 20. The region on chromosome 6 was located nearby the casein gene cluster and the corresponding haplotype overlapped the CSN3 gene (casein kappa). Associations for fat and protein yield and content were also detected. High correlation between traits of the same lactation period or number led to some SNPs being significant for multiple investigated traits. Half of all identified SNPs have been reported in other studies, previously. 15 SNPs were associated with the same traits in other breeds. The other associated SNPs have been reported previously for traits such as exterior, health, meat and carcass, production, and reproduction traits. No association could be detected between DGAT1 and other known milk genes with milk production traits despite the close relationship between DSN and Holstein. The results of this study confirmed that many SNPs identified in other breeds as associated with milk traits also affect milk traits in dual-purpose DSN cattle and can be used for further genetic analysis to identify genes and causal variants that affect milk production in DSN cattle.

Keywords: genome-wide association study, cattle, SNP chip, Holstein cattle, CSN3 gene, DGAT1 gene, casein gene

Edited by: Ino Curik, University of Zagreb, Croatia
Reviewed by: Lingyang Xu, Institute of Animal Sciences, Chinese Academy of Agricultural Sciences, China; Doreen Becker, Leibniz Institute for Farm Animal Biology (FBN), Germany
*Correspondence: Gudrun A. Brockmann [email protected]
Specialty section: This article was submitted to Livestock Genomics, a section of the journal Frontiers in Genetics
Received: 10 December 2020 Accepted: 11 February 2021 Published: 08 March 2021
Citation: Korkuć P, Arends D, May K, König S and Brockmann GA (2021) Genomic Loci Affecting Milk Production in German Black Pied Cattle (DSN). Front. Genet. 12:640039. doi: 10.3389/fgene.2021.640039

Frontiers in Genetics| www.frontiersin.org 1 March 2021| Volume 12| Article 640039 fgene-12-640039 March 2, 2021 Time: 17:46 # 2

Korkuć et al. GWAS for Milk Traits in DSN

INTRODUCTION

German Black Pied cattle (DSN, “Deutsches Schwarzbuntes Niederungsrind”) is an endangered breed of about 2,550 dual-purpose cattle in Germany. The ancestor population of this breed is considered as one of the founder populations of the nowadays dominantly used high yielding German Holstein breed (Köppe-Forsthoff, 1967; Grothe, 1993), which is one reason why the German government and the European Union support its preservation. This support is necessary because milk yield of DSN cows is about 2,500 kg less compared to German Holstein, which has led to the replacement of DSN by Holstein cattle. Although DSN is a dual-purpose breed, milk yield is the main contributor to the economic merit and meat yield and carcass quality do not compensate the lower milk yield.

The interest in preservation of DSN does not only stem from its relationship to Holstein cattle and retaining genetic diversity as a gene reserve for the future. The interest in DSN cattle is also due to their advantageous traits. For example, the milk fat and protein content with 4.3 % and 3.7 %, respectively, is higher compared to German Holstein (RBB Rinderproduktion Berlin-Brandenburg GmbH., 2020). Moreover, DSN cattle are considered to be more robust for grazing and more fertile.

One of the long-term goals for maintaining DSN is to reduce governmental financial support by increasing economic value of the breed through the improvement of milk yield. Simultaneously, the advantageous traits and the typical body composition of this dual-purpose breed should be maintained. So far, little is known about the genes affecting milk traits in DSN. Recently, genome-wide association studies (GWAS) in DSN for health traits identified three significant and two suggestive SNPs for clinical mastitis (Meier et al., 2020) and 44 significant SNPs for endoparasite resistance (May et al., 2019). Another study, investigating genomic variation in the casein gene cluster, found three protein variants of CSN2 and CSN3 and fixed protein variants for CSN1S1 and CSN1S2 (Meier et al., 2019). In contrast, 63,404 associations with milk traits in general were available from Cattle QTLdb Release 42 (accessed 09/21/2020) (Hu et al., 2019), whereof around 79 % were found in studies with Holstein cattle. Furthermore, 10 % of the reported associations in those studies with Holstein cattle are reported within the first 10 Mb on chromosome 14, where the DGAT1 gene is located which is known to influence milk yield and composition (Grisart et al., 2002).

In this study, we investigated the genetic basis contributing to the variation in milk performance of the current DSN population by GWAS. Since the DSN population is small, power to find significant genomic loci is limited. Nevertheless, there is an urgent need to provide genetic association information for small endangered populations to support their preservation and further development. Even if not all genomic loci are significantly detectable, the obtained results in DSN and the comparison to related breeds provide a basis for selection decisions to genetically improve DSN.

MATERIALS AND METHODS

Populations
Data from 1,816 DSN cows were available for this study. These cows represent about two thirds of the DSN population registered in Germany (The Society for the Conservation of Old and Endangered Livestock Breeds (GEH), 2018). The cows were born between 2005 and 2016, and descended from 76 sires. Cows were raised on six farms. In order to reduce environmental influences, we filtered to have at least 20 DSN cows per farm, per sire, and per birth year. This reduced the data set to 1,490 DSN cows from five farms, born between 2007 and 2016, and descending from 28 sires.

In order to compare the results that we obtained in DSN with German Holstein, we used GWAS results previously obtained in our lab on a population of 2,400 German Holstein bulls. This population has been used and described in detail repeatedly (Zielke et al., 2011, 2013; Abdel-Shafy et al., 2018).

Phenotypes
Milk traits with corresponding pedigree data were obtained from the cattle breeding association “RBB Rinderproduktion Berlin-Brandenburg GmbH” in April 2020. Traits included milk, fat, and protein yield in kilogram (milk kg, fat kg, and protein kg) for three lactation periods: 100-days (100d), 200-days (200d), and 305-days (305d). 305 days data was available for the first three lactations (LA1, LA2, and LA3), whereas 100 and 200 days data was available only for LA1. 305 days data of cows with < 270 days in milk was not considered. Fat and protein content (fat %, protein %) were calculated by dividing fat or protein kg by milk kg of the respective lactation. The lactation mean (LAm) was calculated for cows with full data in the first three lactations. This leads to a total of 30 investigated milk traits in this study. For each trait, outliers were defined as values outside the 1.5 times interquartile range within each farm and removed from the data set. This leads to data being available for 1,478, 1,476, 1,372, 1,160, 862, and 685 DSN cows in LA1 (100d), LA1 (200d), LA1, LA2, LA3, and LAm, respectively.

Genotypes
DSN cows were genotyped using the Illumina® BovineSNP50 v3 BeadChip (Illumina, Inc., 5200 Illumina Way, San Diego, CA, United States). SNP chip probe sequences were remapped against the Bos taurus genome version ARS-UCD1.2 (Rosen et al., 2018) using NCBI Nucleotide-Nucleotide BLAST version 2.2.31+ (Altschul et al., 1990) in order to obtain genome positions for SNPs on the ARS-UCD1.2 genome build. SNP probes that mapped to multiple genomic locations were removed. Genotype quality control was performed for animals and SNPs. SNP calls with a GC-score < 0.7 were set to missing. Animals with a call rate < 90% were discarded. SNPs with a call rate < 95% and a minor allele frequency (MAF) < 5% were removed. Lastly, genotype groups with less than 30 observations were set to missing to prevent spurious association. After quality control, 36,929 high-confidence SNPs were available for further analysis.
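The per-SNP part of the genotype quality control can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their analysis was done in R); the function names and the 0/1/2 genotype coding with `None` for missing calls are assumptions made for illustration, and the sketch assumes that a SNP is removed when it fails either the call-rate or the MAF criterion.

```python
from collections import Counter

# Hypothetical helpers; genotypes are coded 0/1/2 (copies of one allele), None = missing call.

def snp_call_rate(genotypes):
    """Fraction of animals with a non-missing call for this SNP."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    allele_freq = sum(called) / (2 * len(called))
    return min(allele_freq, 1 - allele_freq)

def passes_snp_qc(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Assumption: a SNP is kept only if it meets both thresholds."""
    return (snp_call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)

def mask_small_groups(genotypes, min_count=30):
    """Set genotype groups with fewer than min_count observations to missing."""
    counts = Counter(g for g in genotypes if g is not None)
    return [g if g is not None and counts[g] >= min_count else None
            for g in genotypes]

# Toy SNP: 60 homozygous, 35 heterozygous, 5 homozygous for the other allele.
example = [0] * 60 + [1] * 35 + [2] * 5
keep = passes_snp_qc(example)
masked = mask_small_groups(example)
```

In this toy example the SNP passes both thresholds, and the five animals in the rare homozygous group are set to missing, mirroring the last filtering step described above.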


Genome-Wide Association Study
Genome-wide association analysis was performed with multiple linear regression models implemented in the R language for statistical computing (version 4.0.3) (R Core Team, 2018). Various models were tested and compared by calculating the inflation factor λ of each model to judge the extent of the excess of type-I errors (Devlin and Roeder, 1999). Finally, the model with the lowest λ was selected (Supplementary Table 1). On average 27 % of top 100 SNPs of the selected model are shared with each of the other models tested (Supplementary Figure 1). The resulting model for testing the additive effect of each SNP was:

$$Y_{ijklmnopq} = ps_i + f_j^{*} + s_k^{*} + by_l^{*} + bs_m^{*} + cy_n^{*} + cs_o^{*} + ac_p^{*} + g_q + e_{ijklmnopq} \tag{1}$$

where $Y_{ijklmnopq}$ is the trait, $ps_i$ represents the covariate for population stratification followed by the covariates for farm $f_j$, sire $s_k$, birth year $by_l$, birth season $bs_m$, calving year $cy_n$, calving season $cs_o$, age at first calving in days $ac_p$, the SNP genotype $g_q$, and $e_{ijklmnopq}$ is the residual error. Covariates marked with an asterisk “∗” were only included into the model when the difference in the Akaike information criterion (ΔAIC) was ≤ −10 between the null model ($Y_i = ps_i$) and the null model extended with one of the covariates ($Y_{ix} = ps_i + covariate_x$) (Supplementary Table 2). All covariates were included as fixed effects as this resulted in the lowest inflation factor λ. Population stratification $ps_i$ among the 1,490 DSN cows was examined using pairwise population concordance tests on an identity-by-state matrix implemented in PLINK (version 1.90) with a p-value cut-off of 0.0001 (Purcell et al., 2007; Chang et al., 2015). The resulting 33 clusters of relatedness were included as a covariate in the GWAS model.

Interactions between lactation stages and SNP genotypes were investigated for 100, 200, and 305 days performance data in LA1 and between lactation number and SNP genotypes for 305 days performance data in lactations 1 to 3. For testing the genetic interaction effects between milk, fat, and protein yield data and specific lactation periods, the differences in performance data between birth and 100 days, between 100 and 200 days, and between 200 and 305 days were used as input data. A linear mixed-effects model as described by Lu and Bovenhuis (2019) was fitted using the R package lmerTest (version 3.1-3) (Kuznetsova et al., 2017):

$$Y_{ijklmnopqrs} = ps_i + f_j^{*} + s_k^{*} + by_l^{*} + bs_m^{*} + cy_n^{*} + cs_o^{*} + ac_p^{*} + g_q + L_r + g_q \times L_r + (1 \mid animals) + e_{ijklmnopqrs} \tag{2}$$

where the same covariates were used as in Eq. 1 but extended by the lactation stage or lactation number $L_r$, the interaction term between lactation stage or lactation number and SNP genotype $g_q \times L_r$, and a random intercept included for each individual animal $(1 \mid animals)$ to compensate for repeated measurements on the animals. Covariates marked with an asterisk “∗” were only included into the model when the difference in the Akaike information criterion (ΔAIC) was ≤ −10 between the null model [$Y_s = (1 \mid animals)$] and the null model extended with one of the covariates [$Y_{xs} = covariate_x + (1 \mid animals)$] (Supplementary Table 3). Separate analyses were performed for the traits milk, fat and protein yield, and fat and protein content.

The significance threshold was adjusted for multiple testing using Bonferroni (BF) correction. The number of independent tests (Meff) was estimated by the simpleM method, to account for linkage between neighboring SNPs (Gao et al., 2008). Window sizes from 100 to 630 (630 corresponds to the minimum number of SNPs in one out of all chromosomes) in this study were tested. The lowest estimated Meff value was 18,278 at a window size of 630 which was then used for Bonferroni correction. After Bonferroni correction, SNPs were considered highly significant when PBF < 0.01, significant when PBF < 0.05, or suggestive when PBF < 0.1. In the case of interaction terms, Meff was multiplied by the number of lactation stages or lactation numbers (n = 18,278 × 3) and p-values were noted as PBI. Figures were produced using the R package ggplot2 (version 3.3.2) (Wickham et al., 2016) unless otherwise stated. For SNP effect plots, p-values between genotype groups were estimated using pairwise t-tests and displayed using R package ggpubr (version 0.4.0) (Kassambara, 2020).

Haplotypes and Gene Annotation
Haplotype blocks in which SNPs are located were computed using Haploview version 4.2 (Barrett et al., 2005). In order to define blocks in Haploview, the customized block definition was used with the option “solid LD spine” set to D′ > 0.6. Genes within each haplotype were annotated using R package biomaRt (version 2.48.0) (Durinck et al., 2009) using the Ensembl Bos taurus database based on the ARS-UCD1.2 assembly (Yates et al., 2020). If no haplotype block could be estimated for a SNP, genes within ± 70 kb up- and down-stream (corresponding to the median haplotype block length of 140 kb) of the respective SNP position were considered as positional candidate genes. Additionally, the 1 Mb region centered around the investigated SNP was inspected for candidate genes. If consecutive 1 Mb regions overlapped, they were merged by taking the start of the first and the end of the second region.

Overlap With Cattle QTLdb
Associated SNPs were compared to GWAS results and known QTLs from Cattle QTLdb Release 42 (Hu et al., 2019) by using their SNP-IDs (rs). Further, also the 1 Mb region centered around the associated SNP was used to find overlapping loci with other publications. The entries from Cattle QTLdb were categorized into the trait categories “exterior,” “health,” “meat and carcass,” “milk,” “production,” and “reproduction.”

RESULTS

Associations for Almost All Traits
Genome-wide association analyses using 1,490 DSN cows and 36,929 high confident SNPs identified associations for all traits except for 100-days fat yield in lactation 1 (fat kg 100 days LA1) and the lactation mean of the 305-days fat yield (fat kg LAm). The


inflation factor λ ranged from 1.27 for fat yield (LA2 305 days) to 2.19 for protein content (LA2 305 days) although extensive efforts were taken to select a GWAS model that showed the lowest overall inflation (Supplementary Table 1). Nonetheless, the inflation factor λ and the Q-Q-plots for all traits (Supplementary Figure 2) showed that the level of statistical significance was overestimated in general. We assume that this especially holds for protein content in all lactation periods and numbers and for fat content in LA1 (100 and 305 days). The number of false positives was increased as λ-values higher or equal to 1.5 were observed for these traits. Even if we reduce the genome-wide significance threshold to PBF < 0.0005, 16 SNPs remain associated. The results of those traits are listed in the Supplementary Table 4. Results of associations for 20 traits where λ was below 1.5 are presented in Table 1. For those traits, 14 highly significant, 27 significant, and 20 suggestive SNPs were found. Some SNPs were associated with multiple traits because of high correlation between those traits (Supplementary Tables 5,6). High correlation was found between performance data within the first lactation (100, 200, and 305 days of LA1) (r > 0.76), and between traits of the first three lactations (LA1, LA2, and LA3) and the lactation mean (LAm) (r > 0.77). Fat and protein yield within a lactation were also highly correlated (r > 0.75).

Effects on Milk Yield
Since milk yield is the economically most important production trait, there is major interest in identifying genetic loci contributing to its variance in DSN cows (Table 1 and Figure 1). Using the mean milk yield across the first three lactations (LAm), only two loci were found to be associated, one on chromosome 8 at 53.7 Mb (8:53,663,120, rs41793393, PBF = 0.03554) and another one on chromosome 9 at 10.6 Mb (9:10,638,013, rs133869947, PBF = 0.04397) (Supplementary Figures 3A,B). These two loci were also associated with milk yield in lactation 3 (LA3). Examining different lactation periods and lactations separately, we found additional loci on chromosomes 1, 4, 5, 6, 20, 24, 25, and X. Since the significance varied largely between lactation periods and lactation numbers, the interaction between the SNPs and lactation stage or lactation number was also investigated (Supplementary Table 7).

In the first lactation (LA1), chromosome 6 was most significantly associated (Table 1 and Figure 1). In the region between 60.2 and 87.2 Mb, 9 SNPs were associated with milk yield in all lactation periods of LA1. The top marker rs109592101 (6:86,112,142, PBF = 0.00009, Supplementary Figure 3C) showed the highest association with the 100 days performance. The minor allele A with a frequency of 0.42 accounted for a decrease in milk yield of 78 kg, 119 kg and 193 kg after 100, 200, and 305 days in lactation, respectively. The same SNP also suggestively affected (PBF = 0.08431) protein yield with a minor allele effect of −1.8 kg after 100 days in lactation. In the haplotype block around the top SNP (6:85,633,295-87,011,619, Figure 2 and Supplementary Table 8) 16 genes are located, among them CSN3 (kappa casein) is known as main milk protein gene and as a gene affecting milk yield and composition in Holstein cattle (Mckenzie et al., 1984; Ng-Kwai-Hang et al., 1984). Another SNP in the same region on chromosome 6 (rs110291935, 6:80,530,130, PBF < 0.02, Supplementary Figure 3D) decreased milk yield until 100, 200, and 305 days in LA1 by 71, 132, and 209 kg, respectively. The corresponding haplotype block (6:80,530,130-80,626,467) did not contain any gene (Supplementary Table 8). However, the surrounding 1 Mb region harbors three genes ENSBTAG00000050977, EPHA5 (EPH receptor A5), and ENSBTAG00000053125. Additional SNPs for milk yield in LA1 were found on chromosome 1 (rs42347234, 1:115,539,332, PBF = 0.03846; rs41255272, 1:143,812,919, PBF = 0.01478), chromosome 4 (rs42753220, 4:91,071,208, PBF = 0.01390), chromosome 5 (rs41652414, 5:114,935,739, PBF = 0.02814), and chromosome 25 (rs109027867, 25:11,019,450, PBF = 0.03396) (Supplementary Figures 3E–I). The minor allele of these SNPs showed an increase in milk yield by at least 64 kg, 141 kg, and 204 kg for 100, 200, and 305 days performance data of LA1, respectively, except for the SNP rs109027867 on chromosome 25 which showed a decrease of the minor allele by 128 kg for 200 days data.

In lactation 2 (LA2), the only association with milk yield was found on chromosome 20 (Table 1 and Figure 1) with the SNP rs110353352 (20:71,448,297, PBF = 0.00040, Supplementary Figure 3J). The minor allele C, which segregated at a frequency of 0.13, was advantageous for milk production increasing milk yield by 484 kg. This same SNP allele had also positive effects of 17.3 kg and 15.0 kg on fat and protein yield in LA2, respectively. The haplotype block of this SNP (20:71,378,297-71,518,297) contains five genes TRIP13 (thyroid hormone receptor interactor 13), BRD9 (bromodomain containing 9), ENSBTAG00000054687, TPPP (tubulin polymerization promoting protein), and CEP72 (centrosomal protein 72) (Supplementary Table 8).

In lactation 3 (LA3), the most significant association for milk yield was found on chromosome 1 (Table 1 and Figure 1). The top SNP rs43246393 (1:79,757,250, PBF = 0.00020, Supplementary Figure 3K) was also associated with fat (PBF = 0.01868) and protein yield (PBF = 0.00007) in LA3 as well as with protein yield of the average across all three lactations (PBF = 0.02641). The minor allele T, segregating at a frequency of 0.34, accounted for 390 kg, 13.4 kg and 13.2 kg less milk, fat, and protein in LA3, respectively, and for 10.3 kg less protein in the lactation mean (LAm). The test for interaction between genotypes at this top SNP and lactation stage or lactation number showed significant effects on milk, fat, and protein yields (PBI < 0.03, Supplementary Table 7). The corresponding haplotype block around this SNP (1:79,606,618-79,757,250) contains three genes BCL6 (BCL6 transcription repressor), RTP2 (receptor transporter protein 2), and SST (somatostatin). In addition to the locus on chromosome 1, two SNPs on chromosome 8 were found to be associated (rs41793393, 8:53,663,120, PBF = 0.00508, Supplementary Figure 3L; rs109542652, 8:53,867,972, PBF = 0.02538). As mentioned above, this region was also identified for the average milk yield over lactations 1–3 (LAm). The minor allele T of the lead SNP rs41793393 (frequency 0.19) is decreasing milk yield in LA3 by 356 kg and in LAm by 316 kg. The corresponding haplotype block (8:53,467,317-54,449,954) contains the genes GNA14 (G protein subunit alpha 14), GNAQ (G protein subunit alpha q), CEP78 (centrosomal protein 78), and PSAT1 (phosphoserine aminotransferase 1). Additional


TABLE 1 | GWAS for milk production traits in DSN.

| Chr | Position (bp) | SNP-ID | Ref | Alt | MA | MAF | Trait | N | β | SE(β) | PBF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 77,049,090 | rs110516247 | A | C | C | 0.11 | Milk kg (LA3) | 846 | −515 | 100 | 0.00679 |
| | | | | | | | Protein kg (LA3) | 844 | −16.3 | 3.3 | 0.02351 |
| 1 | 79,757,250 | rs43246393 | C | T | T | 0.34 | Milk kg (LA3) | 856 | −390 | 76 | 0.00020 |
| | | | | | | | Protein kg (LA3) | 854 | −13.2 | 2.5 | 0.00007 |
| | | | | | | | Protein kg (LAm) | 679 | −10.3 | 2.2 | 0.02641 |
| | | | | | | | Fat kg (LA3) | 857 | −13.4 | 3.1 | 0.01868 |
| 1 | 115,539,332 | rs42347234 | C | T | T | 0.11 | Milk kg (LA1) | 1355 | 336 | 70 | 0.03846 |
| | | | | | | | Protein kg (LA1) | 1357 | 11.9 | 2.4 | 0.01140 |
| 1 | 128,503,005 | rs109686415 | C | T | C | 0.49 | Fat % (LA2) | 1115 | −0.082 | 0.016 | 0.00357 |
| | | | | | | | Fat % (LAm) | 660 | −0.082 | 0.016 | 0.01599 |
| 1 | 143,812,919 | rs41255272 | A | G | A | 0.50 | Milk kg (LA1) | 1367 | 204 | 41 | 0.01478 |
| | | | | | | | Protein kg (LA1) | 1369 | 6.6 | 1.4 | 0.04613 |
| 2 | 123,100,345 | rs110278850 | A | C | C | 0.45 | Protein kg (200 days) | 1436 | 3.5 | 0.8 | 0.04941 |
| 3 | 15,947,663 | rs110565504 | T | C | T | 0.24 | Fat % (200 days) | 1458 | 0.056 | 0.019 | 0.03398 |
| 4 | 47,674,986 | rs110143001 | G | A | G | 0.42 | Fat % (LA3) | 844 | 0.086 | 0.019 | 0.04275 |
| 4 | 91,071,208 | rs42753220 | C | T | C | 0.40 | Milk kg (LA1) | 1370 | 213 | 44 | 0.01390 |
| 5 | 62,094,191 | rs29009717 | A | G | A | 0.18 | Fat % (LA3) | 853 | −0.125 | 0.032 | 0.09655 |
| 5 | 83,959,138 | rs41660560 | G | T | G | 0.35 | Protein kg (LA2) | 1160 | 9.4 | 2.1 | 0.01767 |
| 5 | 93,953,629 | rs109945272 | T | C | T | 0.24 | Fat % (LA2) | 1136 | −0.077 | 0.024 | 0.08780 |
| 5 | 114,935,739 | rs41652414 | C | T | T | 0.19 | Milk kg (LA1) | 1357 | 220 | 71 | 0.02814 |
| | | | | | | | Milk kg (100 days) | 1457 | 64 | 24 | 0.08596 |
| | | | | | | | Milk kg (200 days) | 1452 | 141 | 46 | 0.00923 |
| | | | | | | | Protein kg (LA1) | 1359 | 7.2 | 2.4 | 0.04450 |
| | | | | | | | Protein kg (100 days) | 1459 | 2 | 0.7 | 0.08697 |
| | | | | | | | Protein kg (200 days) | 1457 | 4.1 | 1.4 | 0.02288 |
| | | | | | | | Fat kg (200 days) | 1447 | 5.1 | 1.7 | 0.07535 |
| 6 | 60,162,206 | rs41605188 | G | A | A | 0.34 | Milk kg (200 days) | 1469 | 119 | 29 | 0.09377 |
| 6 | 61,733,082 | rs135571989 | C | A | A | 0.21 | Milk kg (LA1) | 1320 | −195 | 71 | 0.07717 |
| | | | | | | | Milk kg (200 days) | 1416 | −110 | 45 | 0.08954 |
| 6 | 62,988,117 | rs42436495 | T | G | G | 0.30 | Milk kg (LA1) | 1370 | 225 | 49 | 0.01789 |
| | | | | | | | Milk kg (200 days) | 1469 | 136 | 32 | 0.03921 |
| 6 | 63,010,380 | rs42436482 | A | G | G | 0.29 | Milk kg (LA1) | 1368 | 220 | 52 | 0.04884 |
| | | | | | | | Milk kg (200 days) | 1467 | 132 | 34 | 0.09298 |
| 6 | 64,928,624 | rs42482917 | T | C | C | 0.32 | Milk kg (200 days) | 1463 | 111 | 30 | 0.09593 |
| 6 | 77,688,509 | rs41652041 | A | G | G | 0.35 | Milk kg (LA1) | 1372 | −212 | 49 | 0.06866 |
| | | | | | | | Milk kg (200 days) | 1470 | −138 | 31 | 0.03904 |
| 6 | 80,530,130 | rs110291935 | T | C | T | 0.41 | Milk kg (LA1) | 1370 | −209 | 42 | 0.01362 |
| | | | | | | | Milk kg (100 days) | 1474 | −71 | 14 | 0.00519 |
| | | | | | | | Milk kg (200 days) | 1469 | −132 | 27 | 0.00655 |
| 6 | 80,626,467 | rs109872424 | T | C | T | 0.33 | Milk kg (100 days) | 1473 | −64 | 16 | 0.06727 |
| | | | | | | | Milk kg (200 days) | 1468 | −124 | 30 | 0.02416 |
| 6 | 86,112,142 | rs109592101 | A | G | A | 0.42 | Milk kg (LA1) | 1372 | −193 | 41 | 0.01990 |
| | | | | | | | Milk kg (100 days) | 1476 | −78 | 14 | 0.00009 |
| | | | | | | | Milk kg (200 days) | 1471 | −119 | 26 | 0.04162 |
| | | | | | | | Protein kg (100 days) | 1478 | −1.8 | 0.4 | 0.08431 |
| 6 | 87,266,808 | rs41591365 | C | T | C | 0.46 | Protein kg (100 days) | 1473 | 2.1 | 0.4 | 0.03608 |
| 6 | 88,164,411 | rs41622837 | A | G | A | 0.14 | Protein kg (200 days) | 1444 | −6 | 1.3 | 0.07033 |
| 8 | 53,663,120 | rs41793393 | C | T | T | 0.19 | Milk kg (LA3) | 847 | −356 | 115 | 0.00508 |
| | | | | | | | Milk kg (LAm) | 673 | −316 | 97 | 0.03554 |
| 8 | 53,867,972 | rs109542652 | G | A | A | 0.14 | Milk kg (LA3) | 846 | −489 | 101 | 0.02538 |
| 8 | 59,101,606 | rs43550935 | A | G | G | 0.19 | Milk kg (LAm) | 685 | −271 | 95 | 0.07475 |
| | | | | | | | Protein kg (LAm) | 685 | −9.1 | 3.2 | 0.02661 |
| 8 | 100,876,785 | rs108983661 | G | A | A | 0.25 | Fat kg (LAm) | 683 | 12.5 | 3 | 0.05697 |
| 9 | 10,638,013 | rs133869947 | A | C | A | 0.38 | Milk kg (LA3) | 818 | −297 | 70 | 0.06280 |
| | | | | | | | Milk kg (LAm) | 645 | −270 | 60 | 0.04397 |
| 9 | 26,353,699 | rs110314239 | G | T | T | 0.48 | Fat kg (LA1) | 1366 | −6.8 | 1.5 | 0.05814 |
| 10 | 3,611,602 | rs42697353 | T | C | C | 0.49 | Fat % (200 days) | 1449 | −0.061 | 0.013 | 0.03506 |
| 11 | 69,443,503 | rs110564084 | T | C | T | 0.30 | Protein kg (LA2) | 1157 | 11.9 | 2.5 | 0.05681 |
| 11 | 92,712,210 | rs110540697 | T | C | C | 0.41 | Fat % (200 days) | 1454 | 0.057 | 0.013 | 0.09715 |
| 12 | 66,340,756 | rs41629344 | T | G | G | 0.39 | Fat % (200 days) | 1452 | 0.06 | 0.013 | 0.00380 |
| 14 | 26,340,400 | rs41727315 | G | A | A | 0.06 | Protein kg (100 days) | 1470 | 4 | 0.9 | 0.06447 |
| 16 | 11,516,923 | rs41623175 | A | G | A | 0.21 | Fat % (200 days) | 1459 | −0.082 | 0.02 | 0.05676 |
| 16 | 30,668,830 | rs43041491 | G | T | G | 0.11 | Fat % (LA3) | 835 | 0.153 | 0.031 | 0.02338 |
| 16 | 32,105,683 | rs798259422 | A | G | G | 0.29 | Fat % (200 days) | 1458 | −0.055 | 0.015 | 0.05996 |
| 16 | 33,789,714 | rs41796289 | C | T | C | 0.39 | Fat % (LA2) | 1134 | −0.061 | 0.015 | 0.02776 |
| | | | | | | | Fat % (200 days) | 1450 | −0.049 | 0.013 | 0.09433 |
| 16 | 40,391,486 | rs41804404 | A | G | G | 0.46 | Fat % (200 days) | 1446 | 0.058 | 0.012 | 0.04642 |
| 16 | 46,683,276 | rs110777881 | A | G | G | 0.36 | Protein kg (100 days) | 1444 | −1.8 | 0.5 | 0.05105 |
| 16 | 46,773,225 | rs43719805 | T | C | C | 0.47 | Protein kg (100 days) | 1472 | 1.9 | 0.4 | 0.04084 |
| 18 | 53,596,284 | rs109907036 | C | T | T | 0.37 | Fat % (LA2) | 1136 | 0.086 | 0.017 | 0.00477 |
| 20 | 50,879,180 | rs41948928 | T | C | T | 0.10 | Fat % (200 days) | 1445 | −0.118 | 0.023 | 0.00777 |
| 20 | 71,448,297 | rs110353352 | C | T | C | 0.13 | Milk kg (LA2) | 1137 | 484 | 86 | 0.00040 |
| | | | | | | | Protein kg (LA2) | 1143 | 15 | 2.9 | 0.00503 |
| | | | | | | | Fat kg (LA2) | 1137 | 17.3 | 3.5 | 0.01434 |
| 21 | 42,828,439 | rs41978846 | T | G | T | 0.33 | Protein kg (200 days) | 1450 | −4.2 | 0.9 | 0.09136 |
| 24 | 16,789,760 | rs110860585 | C | T | T | 0.41 | Milk kg (LA3) | 862 | −348 | 72 | 0.01192 |
| 24 | 30,250,034 | rs110476141 | G | A | G | 0.36 | Protein kg (LA1) | 1355 | −7 | 1.5 | 0.01207 |
| | | | | | | | Protein kg (200 days) | 1453 | −4.1 | 0.9 | 0.03475 |
| | | | | | | | Fat kg (LA1) | 1348 | −8.5 | 1.7 | 0.02349 |
| 25 | 7,944,597 | rs109583598 | T | C | C | 0.29 | Fat % (200 days) | 1456 | 0.065 | 0.016 | 0.01082 |
| 25 | 9,711,895 | rs110469759 | A | G | A | 0.17 | Fat kg (LA1) | 1368 | 10.8 | 3.5 | 0.00392 |
| | | | | | | | Fat kg (200 days) | 1467 | 5.7 | 2 | 0.02959 |
| 25 | 11,019,450 | rs109027867 | G | T | G | 0.48 | Milk kg (200 days) | 1445 | −128 | 26 | 0.03396 |
| 27 | 13,864,569 | rs110009442 | C | G | C | 0.32 | Fat % (LA2) | 1125 | 0.061 | 0.018 | 0.08067 |
| | | | | | | | Fat % (200 days) | 1442 | 0.06 | 0.015 | 0.09215 |
| 27 | 18,527,112 | rs42957103 | G | A | A | 0.41 | Fat kg (LA1) | 1367 | 7.5 | 1.7 | 0.09536 |
| 28 | 25,196,334 | rs41587054 | A | G | G | 0.10 | Protein kg (100 days) | 1419 | 3.8 | 0.8 | 0.06929 |
| 29 | 50,217,955 | rs109241029 | G | A | A | 0.44 | Fat % (LA2) | 1137 | 0.076 | 0.016 | 0.04229 |
| 29 | 50,260,533 | rs109840529 | A | G | G | 0.42 | Fat % (LA2) | 1118 | −0.082 | 0.016 | 0.00338 |
| 29 | 50,326,170 | rs110740589 | A | G | A | 0.41 | Fat % (LA2) | 1113 | −0.09 | 0.016 | 0.00022 |
| X | 12,896,716 | rs109188619 | T | A | T | 0.18 | Milk kg (LA3) | 860 | 546 | 122 | 0.02488 |
| X | 116,886,837 | rs41626783 | T | C | C | 0.38 | Protein kg (200 days) | 1476 | −4.6 | 1.1 | 0.08409 |
| X | 117,691,901 | rs110011913 | A | T | T | 0.31 | Protein kg (LA3) | 851 | −13.5 | 2.7 | 0.01321 |
| | | | | | | | Protein kg (LAm) | 676 | −10.8 | 2.4 | 0.07065 |
| X | 133,244,405 | rs29018822 | G | T | T | 0.35 | Fat kg (200 days) | 1409 | −6.8 | 1.4 | 0.07242 |

For each trait, significantly associated SNPs are listed with chromosome number (Chr), chromosomal position in base pairs (bp), the SNP reference ID (SNP-ID), the given allele on the forward strand in the reference genome ARS-UCD1.2 (Ref), the alternative allele (Alt), the minor allele in the examined DSN population (MA), and the minor allele frequency (MAF). Furthermore, the associated trait, the number of cows in the specific analysis of this trait (n), the allele substitution effect of the minor allele (β), the standard error [SE(β)], and the p-value after Bonferroni-correction (PBF ) are given. P-values of PBF < 0.1, PBF < 0.05, and PBF < 0.01 were considered as suggestive (highlighted in gray), significant, or highly significant (highlighted in bold), respectively.
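The relation between raw p-values, the Bonferroni-corrected values (PBF), and the three significance classes used in Table 1 can be sketched as follows. This is an illustrative Python sketch rather than the authors' R code; the function names are hypothetical, and only the Meff value of 18,278 and the thresholds 0.01, 0.05, and 0.1 are taken from the text.

```python
# Illustrative sketch; function names are hypothetical, thresholds come from the paper.
M_EFF = 18278  # number of independent tests estimated with simpleM

def bonferroni_adjust(p_raw, n_tests=M_EFF):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_raw * n_tests)

def classify(p_bf):
    """Map a corrected p-value to the significance classes used in Table 1."""
    if p_bf < 0.01:
        return "highly significant"
    if p_bf < 0.05:
        return "significant"
    if p_bf < 0.1:
        return "suggestive"
    return "not significant"

# Corrected p-values reported for rs109592101 on chromosome 6 (Table 1).
labels = {
    "Milk kg (100 days)": classify(0.00009),
    "Milk kg (LA1)": classify(0.01990),
    "Protein kg (100 days)": classify(0.08431),
}
```

For interaction tests, the same adjustment would be called with `n_tests=M_EFF * 3`, following the multiplication by the number of lactation stages or lactation numbers described in the methods.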

associations with milk yield in LA3 were found for SNP rs110860585 (24:16,789,760, PBF = 0.01192, Supplementary Figure 3M) and rs109188619 (X:12,896,716, PBF = 0.02488, Supplementary Figure 3N).

Effects on Milk Fat Yield and Content
Regions associated with milk fat yield were identified on chromosomes 1, 20, 24, and 25 (Table 1 and Supplementary Figure 4). The most significant effect on fat yield was found on chromosome 25. The SNP rs110469759 (25:9,711,895, PBF = 0.00392, Supplementary Figure 3O) with an allele frequency of 0.17 showed an allele substitution effect of the A allele leading to an increase of 10.8 kg fat in LA1 and of 5.7 kg for 200 days performance in LA1. The association for fat yield on the other chromosomes (LA3: rs43246393, 1:79,757,250, PBF = 0.01868; LA2: rs110353352, 20:71,448,297, PBF = 0.01434;


FIGURE 1 | Manhattan plots for milk yield in kg. Plots are shown for the lactation mean of the 305 days performance (LAm), the 100 and 200 days performance in LA1, and the 305-days performance in the first three lactations (LA1-LA3). Markers above the significance or suggestive thresholds are highlighted in red (solid line, α < 0.05) or blue (dashed line, α < 0.1), respectively.

LA1: rs110476141, 24:30,250,034, PBF = 0.02349, Supplementary Figures 3P–R) coincided with effects on milk and protein yield. The minor alleles on chromosomes 1 and 24 accounted for lower, the minor allele on chromosome 20 and 25 for higher milk, protein and fat yields (Table 1).

Highly significant SNPs associated with fat content were identified on chromosomes 1, 12, 18, 20, and 29 (Table 1 and Supplementary Figure 5). The most significant association was found on chromosome 29 with the top SNP rs110740589 (29:50,326,170, PBF = 0.00022, Supplementary


FIGURE 2 | Haplotype block including SNP rs109592101 (6:86,112,142) estimated with Haploview. The investigated SNP is located in a haplotype block (6:85,633,295-87,011,619) that overlaps with the position of the CSN3 gene (6:85,645,854-85,658,926), which is known to be associated with milk production traits in several cattle breeds.

Figure 3S). This SNP was associated with fat content in LA2. The minor allele A of this SNP (MAF = 0.41) showed a decrease in fat content in LA2 by 0.09% points. The neighboring SNP rs109840529 (29:50,260,533) showed interaction effects with the 100–305 days lactation stages of LA1 on fat content (PBI = 1.4E-07, Supplementary Table 7). On chromosome 1, the SNP rs109686415 (1:128,503,005) was associated with fat content in LA2 (PBF = 0.00357, Supplementary Figure 3T) and with the lactation mean of LA1-3 (PBF = 0.01599). This SNP was also significant for the interaction effect between SNP and lactation stages for the trait fat content (PBI = 0.00016, Supplementary Table 7). The SNP rs109907036 on chromosome 18 (18:53,596,284, PBF = 0.00477, Supplementary Figure 3W) was also identified for fat content in LA2. The associations on chromosomes 12 (rs41629344, 12:66,340,756, PBF = 0.00380, Supplementary Figure 3U) and 20 (rs41948928, 20:50,879,180, PBF = 0.00777, Supplementary Figure 3V) were found for fat content in the 200 days performance data in LA1. The minor allele was the non-beneficial allele on chromosomes 1 and 20, and the beneficial allele on chromosomes 12 and 18.

Additional loci affecting the changing fat contents during lactation 1 were found on chromosomes 5 and 6 (p < 0.0006, Supplementary Table 7). On chromosome 5 these were the SNPs rs41660560 (5:83,959,138, PBI = 0.00003) and rs109945272 (5:93,953,629, PBI = 1.9E-06) and on chromosome 6 the four SNPs rs41605188 (6:60,162,206, PBI = 1.1E-07), rs42436495 (6:62,988,117, PBI = 0.00005), rs42436482 (6:63,010,380, PBI = 1.7E-07), and rs42482917 (6:64,928,624, PBI = 2.3E-12). The changing effects of these loci on fat content were significant only when examining the interaction between SNP and lactation stage in LA1.

For the most significant region on chromosome 29, the haplotype block (29:50,229,562-50,326,170) harboring the two SNPs rs109840529 and rs110740589 contained one gene of unknown function (ENSBTAG00000050398) (Supplementary Table 8). The SNP on chromosome 1 was located in a haplotype block (1:128,226,756-128,622,561) harboring the genes TRIM42 (tripartite motif containing 42) and CLSTN2 (calsyntenin 2). The haplotype block around the SNP on chromosome 18 (18:53,340,459-53,596,284) included 14 genes, among them FOXA3 (forkhead box A3) and IGFL1 (IGF like family member 1).
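The allele substitution effects reported in this section (for example, kg of fat or % points of fat content per copy of an allele) can be read as the slope of a regression of the phenotype on the allele count. The following is a minimal sketch of that idea only: it uses a simple least-squares slope without any covariates, the function name `allele_substitution_effect` is hypothetical, and the toy numbers are invented for illustration. It is not the association model used in this study.

```python
from statistics import mean


def allele_substitution_effect(allele_counts, phenotypes):
    """Least-squares slope of phenotype on allele count (0, 1, or 2 copies).

    Simplified illustration: no fixed effects, covariates, or pedigree terms.
    """
    if len(allele_counts) != len(phenotypes):
        raise ValueError("inputs must have the same length")
    x_bar = mean(allele_counts)
    y_bar = mean(phenotypes)
    sxx = sum((x - x_bar) ** 2 for x in allele_counts)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(allele_counts, phenotypes))
    return sxy / sxx


# Hypothetical toy data (not from the study): copies of the tested allele
# and fat yield in kg for six animals.
counts = [0, 0, 1, 1, 2, 2]
fat_kg = [300.0, 302.0, 311.0, 313.0, 322.0, 324.0]

effect = allele_substitution_effect(counts, fat_kg)
print(f"Estimated effect per allele copy: {effect:.1f} kg")
```

In this toy example each additional copy of the allele raises the phenotype by a constant amount, which is what an additive allele substitution effect describes. A real analysis would add the relevant fixed effects and a correction for relatedness.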


Effects on Milk Protein Yield and Content

Highly significant effects (PBF < 0.01) on milk protein yield were identified on chromosomes 1 and 20 with data from LA3 and LA2, respectively (Table 1 and Supplementary Figure 6). Besides their effects on milk protein yield, the SNPs rs43246393 (1:79,757,250, PBF = 0.00007, Supplementary Figure 3X) and rs110353352 (20:71,448,297, PBF = 0.00503, Supplementary Figure 3Y) on these two chromosomes also showed an effect on milk and fat yield in the same lactations. While the minor allele of the chromosome 1 SNP was disadvantageous for all yield traits, the minor allele of the chromosome 20 SNP was favorable.

All association tests with milk protein content resulted in highly inflated p-values with λ ≥ 1.5 (Supplementary Figure 2). Therefore, we decided to focus on the most significant results by lowering the genome-wide significance threshold 100-fold from PBF < 0.05 to PBF < 0.0005. This resulted in 16 associated SNPs for protein content in different lactation periods and numbers (Supplementary Table 4 and Supplementary Figure 7). These associations were found on chromosomes 2, 3, 5, 6, 10, 18, 20, and 28. The most significant association was found on chromosome 3 (rs110474631, 3:21,764,453, PBF = 4.5E-06, Supplementary Figure 3Z). This SNP was found for protein content in LA1 (200, 305 days), LA2, and LAm. The minor allele G of this SNP had a frequency of 0.20 and decreased protein content by on average 0.05% points in the aforementioned lactation periods and numbers. The corresponding haplotype block (3:21,764,453-21,819,709) does not contain any gene, but the 1 Mb region harbors 23 different genes (Supplementary Table 9). The SNP rs110291935 on chromosome 6 showed the second highest association, which was found in all periods of LA1 (6:80,530,130, PBF < 0.0003, Supplementary Figure 3A2). The minor allele T of this SNP increased protein content by 0.045–0.047% points. The same SNP was already found to be associated with milk yield in all periods of LA1 (Table 1), where the minor allele decreased milk yield by 71, 132, and 209 kg in the 100, 200, and 305 days performance data, respectively. The corresponding haplotype and candidate genes were described in detail in the section on associations with milk yield. The third highest association was found on chromosome 10 at 46.5 Mb with the top SNP rs109605174 (10:46,450,562, PBF = 6.5E-06, Supplementary Figure 3B2) in LA2. The minor allele of this SNP increased protein content by 0.055% points. The neighboring SNPs rs109277788 (10:44,773,979, PBF = 0.00002) and rs43625129 (10:47,670,717, PBF = 9.2E-06) also increased protein content in LA2 and in the first 100 days in LA1. The corresponding haplotype block (10:46,330,098-46,672,954) comprises seven genes: ENSBTAG00000054388, ENSBTAG00000050908, ENSBTAG00000019474, FBXL22 (F-box and leucine rich repeat protein 22), USP3 (ubiquitin specific peptidase 3), CA12 (carbonic anhydrase 12), and APH1B (aph-1 homolog B, gamma-secretase subunit).

Missing Associations to Regions With Known Effects on Milk Production

Since some genes are well known for their effects on milk yield and composition in Holstein cattle, we separately tested the association of SNPs in close proximity (<500 kb) to such candidate genes (Ogorevc et al., 2009). We found 19 SNPs on the SNP chip that were in close proximity to 10 candidate genes for milk production (ABCG2, CSN1S1, CSN1S2, CSN2, DGAT1, GHR, GPAT4, PAEP, PRLR, and SPP1). None of those 19 SNPs were significantly or suggestively associated with any of the investigated traits in DSN (Supplementary Table 10 and Supplementary Figure 8).

Overlap With Other Publications

Overall, 31 out of 76 identified SNPs (including 16 highly significant SNPs associated with protein content) that were found to be associated with milk traits in DSN were reported in other studies (Supplementary Table 11) whose results were available in the Cattle QTLdb. Seven SNPs were found to be associated with the same milk traits in Holstein cattle as in DSN. These SNPs were located on chromosome 3 at 15 Mb (rs110073735, 3:15,470,670) and 22 Mb (rs41587408, 3:21,692,628), chromosome 6 between 86–87 Mb (rs109592101, 6:86,112,142; rs41591365, 6:87,266,808), chromosome 10 at 46 Mb (rs109605174, 10:46,450,562), chromosome 20 at 51 Mb (rs41948928, 20:50,879,180), and chromosome X at 13 Mb (rs109188619, X:12,896,716) (Kolbehdari et al., 2009; Cole et al., 2011; Meredith et al., 2012; Nayeri et al., 2016; Wang and Chatterjee, 2017; Jiang et al., 2019). Eight additional SNPs that were significant in DSN were found to be associated with other milk traits in Holstein and Brown Swiss cattle. Those were located on chromosome 5 at 75 Mb (rs110803736, 5:74,853,402), chromosome 6 between 65–81 Mb (rs42482917, 6:64,928,624; rs42224984, 6:65,962,733; rs41652041, 6:77,688,509; rs110291935, 6:80,530,130; rs109872424, 6:80,626,467), chromosome 8 at 54 Mb (rs41793393, 8:53,663,120), and chromosome 16 at 40 Mb (rs41804404, 16:40,391,486) (Poulsen et al., 2015; Buitenhuis et al., 2016; Dadousis et al., 2017; Jiang et al., 2019). Further, 16 associations with exterior, health, meat and carcass, production, and reproduction traits in diverse breeds were reported (Snelling et al., 2010; Bolormaa et al., 2011; Cole et al., 2011; Hawken et al., 2012; McClure et al., 2012; Doran et al., 2014; Aliloo et al., 2015; Mészáros et al., 2015; Mapholi et al., 2016; Parker Gaddis et al., 2016; Li et al., 2017; Mateescu et al., 2017; Nayeri et al., 2019). When expanding the region of interest from the single SNPs reported in this study to 1 Mb regions centered at the respective SNPs, all of the regions identified in DSN overlap with associations uploaded to the Cattle QTLdb.

In particular, for the region on chromosome 6 between 80.5–87.2 Mb (flanking SNPs: rs110291935 and rs41591365), which was significantly associated with milk and protein yield in all investigated periods of LA1 in our study, many different associations could be found in the Cattle QTLdb. These associations included milk and protein yield as well as milk kappa-casein content in Holstein cattle (Meredith et al., 2012; Buitenhuis et al., 2016; Jiang et al., 2019), curd firmness and cheese fat recovery in Brown Swiss cattle (Dadousis et al., 2017), health traits such as somatic cell score in Holstein cattle (Jiang et al., 2019), and exterior traits such as facial pigmentation in Fleckvieh cattle (Mészáros et al., 2015).
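The positional criteria used above (SNPs within <500 kb of a candidate gene, and 1 Mb regions centered at a SNP) reduce to simple interval arithmetic. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the example uses the coordinates of rs109592101 and of CSN3 as given in the Figure 2 caption purely to demonstrate the distance calculation. It does not reproduce the actual workflow used to query the candidate gene list or the Cattle QTLdb.

```python
def distance_to_interval(pos, start, end):
    """Distance in bp from a position to an interval (0 if inside it)."""
    if start > end:
        raise ValueError("start must not exceed end")
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def is_in_close_proximity(snp, gene, max_distance=500_000):
    """True if SNP and gene share a chromosome and lie < max_distance bp apart."""
    snp_chrom, snp_pos = snp
    gene_chrom, gene_start, gene_end = gene
    if snp_chrom != gene_chrom:
        return False
    return distance_to_interval(snp_pos, gene_start, gene_end) < max_distance


def region_around(snp, width=1_000_000):
    """Region of the given width centered at the SNP, clipped at position 1."""
    chrom, pos = snp
    half = width // 2
    return chrom, max(1, pos - half), pos + half


# Coordinates taken from the Figure 2 caption (chromosome, position in bp).
rs109592101 = ("6", 86_112_142)
csn3 = ("6", 85_645_854, 85_658_926)

gap = distance_to_interval(rs109592101[1], csn3[1], csn3[2])
near = is_in_close_proximity(rs109592101, csn3)
window = region_around(rs109592101)
print(gap, near, window)
```

With these coordinates the SNP lies 453,216 bp downstream of the end of CSN3, so it falls inside the 500 kb criterion, and the 1 Mb region around the SNP spans 6:85,612,142-86,612,142.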


DISCUSSION

In this study, we investigated genetic factors underlying milk production traits in DSN cattle. We detected 41 significant and 20 suggestive SNPs, mostly found in regions previously associated with milk production traits or other trait categories. The high overlap of associated SNPs in this study with other studies shows that the regions that influence milk traits in DSN may have similar or even pleiotropic functions in other breeds.

We observed a high overlap of identified genomic regions in DSN and Holstein, even though we did not find significant effects of major candidate genes that affect milk production in Holstein cattle. While the key genes driving, e.g., milk yield or fat and protein content might be different between DSN and Holstein, it is surprising that especially the well described association in the region of and around DGAT1 was not detected in DSN although DSN and Holstein cattle are closely related. Although the MAFs of the three SNPs closest to the DGAT1 gene are high (MAF = 0.44–0.47) in DSN (Supplementary Table 10), we observed that the actual variant causing the A232K substitution is very rare, having a MAF of 0.02 (preliminary results of sequencing data from 57 DSN cattle). It is possible that not only the causal variant of DGAT1, but also those of other known milk genes are rare in DSN and thus do not have much effect in the DSN population.

Several interesting candidate genes were found. The CSN3 gene (kappa casein) belongs to the casein cluster on chromosome 6 at 86 Mb, whose genes are known to influence milk fat and protein content as well as milk properties (Caroli et al., 2009), and it also plays an important role in DSN for milk protein yield. The genetic architecture of the casein gene cluster of DSN in comparison to other breeds was recently investigated in detail (Meier et al., 2019). Among 14 investigated breeds in that study, the casein gene cluster of DSN was most similar to Danish Red. One of the most significant associations on chromosome 1 at 79 Mb was close to SST (somatostatin) and BCL6 (BCL6 transcription repressor). Since SST regulates the secretion of pituitary hormones including prolactin (Eigler and Ben-Shlomo, 2014), it may contribute to the regulation of milk production (Dybus, 2002; Alipanah et al., 2007). BCL6 was found to be expressed in mammary epithelium and also in breast cancer (Logarajah et al., 2003). Also interesting is the SLC6A3 gene (solute carrier family 6 member 3) on chromosome 20 at 71 Mb, to which the Gene Ontology term “lactation” (GO:0007595) was assigned. This assignment was inferred from electronic annotation by the Gene Ontology Consortium (Ashburner et al., 2000) and is possibly based on the homology of this gene in mice, where it was shown that mutations in this gene cause lactation failure (Bossé et al., 1997). Besides SLC6A3, other solute carrier family genes were found in the same region: SLC6A18 (family 6 member 18), SLC6A19 (family 6 member 19), SLC9A3 (family 9 member 3), and SLC12A7 (family 12 member 7). Also, genes coding for enzymes or receptors targeting milk ingredients such as prostaglandin and riboflavin were found, e.g., PTGR1 (prostaglandin reductase 1) on chromosome 8 at 101 Mb, PTGIR (prostaglandin I2 receptor) on chromosome 18 at 54 Mb, and FLAD1 (flavin adenine dinucleotide synthetase 1) on chromosome 3 at 15 Mb.

Since population stratification has a significant effect on the number of false positive QTLs detected during association analysis, we tried multiple approaches to reduce the inflation factor λ for different traits of interest. The higher the inflation factor λ, the higher the number of false-positively reported significant associations in GWAS analyses. Although we tried to compensate for population stratification by using a pairwise population concordance test and paternal pedigree information, we were unable to capture the population structure entirely. That is most likely due to the fact that the population size of DSN is small, and the number of breeding bulls used each generation is limited. As such, DSN animals are not unrelated and complex family relationships exist between animals, which is a common issue in small populations. Thus, we assume that the residual inflation is genetically caused by high linkage disequilibrium between SNPs. Moreover, using both population stratification and pedigree information led to an overfitting of the model that took away genetic variance. Further genetic variance was lost as we required that each sire was represented by at least 20 offspring in the phenotype data, and thus the number of sires contributing to the analyzed DSN population was reduced by two-thirds. This also reduced the number of animals available for GWAS and, as a result, the statistical power to detect significant associations. Nonetheless, we believe that lowering the genetic variance and statistical power in favor of lower inflation of p-values is the proper way to prevent false positive associations.

The results of this study could be used as prior information to detect SNPs associated with milk production traits in other (related) breeds. For that, cattle from other breeds could be genotyped for single SNPs (e.g., top SNPs) instead of being genotyped using SNP chips or even being whole genome sequenced, which would reduce costs. Furthermore, since genotypic data for a sufficient number of DSN cattle were not yet available, genomic breeding has not yet been established. The identified SNPs and their associations could be directly used by DSN breeders to make breeding decisions. Even predictions about the performance of young DSN cows that do not have any phenotypic data yet could be made if genotypic data were available. Thus, a first step toward genomic breeding for milk traits in DSN cattle was made. Further analyses are needed to verify that these associated SNPs do not show any negative effect on other trait categories such as fertility.

CONCLUSION

The genome-wide association analysis identified several significant regions associated with milk traits in diverse lactation periods and numbers in dual-purpose DSN cattle. In view of the fact that the biggest possible sample size of the investigated DSN population was still relatively small, the most promising SNPs for improving milk production in DSN are those that were also significantly associated with milk traits in other studies. These SNPs were located on chromosomes 3, 5, 6, 8, 10, 20, and X. In contrast, no association with the well-known DGAT1 region could be detected in DSN. Further analysis is needed to make sure that these SNPs do not negatively affect other important


production traits or DSN characteristic traits such as carcass and meat, conformation, fertility, and health before using them as breeding markers. Nevertheless, the results of this study are a basis for further genetic analysis to identify genes and causal variants that affect milk traits in DSN cattle as well as in other related breeds.

DATA AVAILABILITY STATEMENT

The datasets presented in this study can be found in online repositories. The names of the repositories and accession numbers can be found below: The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) European Nucleotide Archive (ENA) and European Variation Archive (EVA), https://www.ebi.ac.uk/ena/browser/home and https://www.ebi.ac.uk/eva/, PRJEB42513 (project) and ERZ1701738 (analyses).

ETHICS STATEMENT

Ethical review and approval was not required for the animal study because samples were collected based on routine procedures on these farm animals. Ear tags were taken as part of the required registration procedure, blood samples were taken by a trained veterinarian to perform standard health recording. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS

GB, PK, and DA designed the study. PK performed all computational and statistical analysis, interpreted the data, and drafted the manuscript. DA helped with the statistical analysis. SK and KM provided genotypes of 61 cows. DA and GB helped draft the manuscript. All authors read and approved the final manuscript.

FUNDING

The project was supported by funds of the Federal Ministry of Food and Agriculture (BMEL) based on a decision of the parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE) under the Federal Program for Ecological Farming and Other Forms of Sustainable Agriculture (Funding number: 2815NA010).

ACKNOWLEDGMENTS

We acknowledge support by the German Research Foundation (DFG) and the Open Access Publication Fund of Humboldt-Universität zu Berlin. The RBB Rinderproduktion Berlin-Brandenburg GmbH and DSN farms supported this project with their expertise in animal selection and collecting ear tags. We would like to thank Monika Reißmann for sample management and the isolation of DNA and Siham Rahmatalla for her expertise on known milk genes.

SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.640039/full#supplementary-material

REFERENCES

Abdel-Shafy, H., Bortfeldt, R. H., Reissmann, M., and Brockmann, G. A. (2018). Validating genome-wide associated signals for clinical mastitis in German Holstein cattle. Anim. Genet. 49, 82–85. doi: 10.1111/age.12624

Aliloo, H., Pryce, J. E., González-Recio, O., Cocks, B. G., and Hayes, B. J. (2015). Validation of markers with non-additive effects on milk yield and fertility in Holstein and Jersey cows. BMC Genet. 16:89. doi: 10.1186/s12863-015-0241-9

Alipanah, M., Kalashnikova, L., and Rodionov, G. (2007). Association of genetic variants of the prolactin gene with milk production traits in Russian Red Pied cattle. Proc. Br. Soc. Anim. Sci. 2007, 156–156. doi: 10.1017/s1752756200020597

Altschul, S. F., Gish, W., Miller, W., Myers, E. W., and Lipman, D. J. (1990). Basic local alignment search tool. J. Mol. Biol. 215, 403–410. doi: 10.1016/S0022-2836(05)80360-2

Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., Cherry, J. M., et al. (2000). Gene Ontology: tool for the unification of biology. Nat. Genet. 25, 25–29. doi: 10.1038/75556

Barrett, J. C., Fry, B., Maller, J., and Daly, M. J. (2005). Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265. doi: 10.1093/bioinformatics/bth457

Bolormaa, S., Porto Neto, L. R., Zhang, Y. D., Bunch, R. J., Harrison, B. E., Goddard, M. E., et al. (2011). A genome-wide association study of meat and carcass traits in Australian cattle. J. Anim. Sci. 89, 2297–2309. doi: 10.2527/jas.2010-3138

Bossé, R., Fumagalli, F., Jaber, M., Giros, B., Gainetdinov, R. R., Wetsel, W. C., et al. (1997). Anterior pituitary hypoplasia and dwarfism in mice lacking the dopamine transporter. Neuron 19, 127–138. doi: 10.1016/S0896-6273(00)80353-0

Buitenhuis, B., Poulsen, N. A., Gebreyesus, G., and Larsen, L. B. (2016). Estimation of genetic parameters and detection of chromosomal regions affecting the major milk proteins and their post translational modifications in Danish Holstein and Danish Jersey cattle. BMC Genet. 17:114. doi: 10.1186/s12863-016-0421-2

Caroli, A. M., Chessa, S., and Erhardt, G. J. (2009). Invited review: milk protein polymorphisms in cattle: effect on animal breeding and human nutrition. J. Dairy Sci. 92, 5335–5352. doi: 10.3168/jds.2009-2461

Chang, C. C., Chow, C. C., Tellier, L. C. A. M., Vattikuti, S., Purcell, S. M., and Lee, J. J. (2015). Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4:7. doi: 10.1186/s13742-015-0047-8

Cole, J. B., Wiggans, G. R., Ma, L., Sonstegard, T. S., Lawlor, T. J., Crooker, B. A., et al. (2011). Genome-wide association analysis of thirty one production, health, reproduction and body conformation traits in contemporary U.S. Holstein cows. BMC Genomics 12:408. doi: 10.1186/1471-2164-12-408

Dadousis, C., Biffani, S., Cipolat-Gotet, C., Nicolazzi, E. L., Rosa, G. J. M., Gianola, D., et al. (2017). Genome-wide association study for cheese yield and curd nutrient recovery in dairy cows. J. Dairy Sci. 100, 1259–1271. doi: 10.3168/jds.2016-11586


Devlin, B., and Roeder, K. (1999). Genomic control for association studies. Biometrics 55, 997–1004. doi: 10.1111/j.0006-341X.1999.00997.x

Doran, A. G., Berry, D. P., and Creevey, C. J. (2014). Whole genome association study identifies regions of the bovine genome and biological pathways involved in carcass trait performance in Holstein-Friesian cattle. BMC Genomics 15:837. doi: 10.1186/1471-2164-15-837

Durinck, S., Spellman, P. T., Birney, E., and Huber, W. (2009). Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat. Protoc. 4, 1184–1191. doi: 10.1038/nprot.2009.97

Dybus, A. (2002). Associations of growth hormone [GH] and prolactin [PRL] genes polymorphisms with milk production traits in Polish Black-and-White cattle. Anim. Sci. Pap. Rep. 20, 203–212.

Eigler, T., and Ben-Shlomo, A. (2014). Somatostatin system: molecular mechanisms regulating anterior pituitary hormones. J. Mol. Endocrinol. 53, R1–R19. doi: 10.1530/JME-14-0034

Gao, X., Starmer, J., and Martin, E. R. (2008). A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet. Epidemiol. 32, 361–369. doi: 10.1002/gepi.20310

Grisart, B., Coppieters, W., Farnir, F., Karim, L., Ford, C., Berzi, P., et al. (2002). Positional candidate cloning of a QTL in dairy cattle: identification of a missense mutation in the bovine DGAT1 gene with major effect on milk yield and composition. Genome Res. 12, 222–231. doi: 10.1101/gr.224202

Grothe, P. O. (1993). Holstein-Friesian: Eine Rasse geht um die Welt. Münster: Landwirtschaftsverlag.

Hawken, R. J., Zhang, Y. D., Fortes, M. R. S., Collis, E., Barris, W. C., Corbet, N. J., et al. (2012). Genome-wide association studies of female reproduction in tropically adapted beef cattle. J. Anim. Sci. 90, 1398–1410. doi: 10.2527/jas.2011-4410

Hu, Z. L., Park, C. A., and Reecy, J. M. (2019). Building a livestock genetic and genomic information knowledgebase through integrative developments of animal QTLdb and CorrDB. Nucleic Acids Res. 47, D701–D710. doi: 10.1093/nar/gky1084

Jiang, J., Ma, L., Prakapenka, D., VanRaden, P. M., Cole, J. B., and Da, Y. (2019). A large-scale genome-wide association study in U.S. Holstein Cattle. Front. Genet. 10:412. doi: 10.3389/fgene.2019.00412

Kassambara, A. (2020). ggpubr: “ggplot2” Based Publication Ready Plots. Available online at: https://cran.r-project.org/package=ggpubr (accessed September 1, 2020).

Kolbehdari, D., Wang, Z., Grant, J. R., Murdoch, B., Prasad, A., Xiu, Z., et al. (2009). A whole genome scan to map QTL for milk production traits and somatic cell score in Canadian Holstein bulls. J. Anim. Breed. Genet. 126, 216–227. doi: 10.1111/j.1439-0388.2008.00793.x

Köppe-Forsthoff, J. (1967). 100 Jahre Deutsche Schwarzbuntzucht. Hiltrup: Verband deutscher Schwarzbuntzüchter e.V.

Kuznetsova, A., Brockhoff, P. B., and Christensen, R. H. B. (2017). lmerTest package: tests in linear mixed effects models. J. Stat. Softw. 82:26. doi: 10.18637/jss.v082.i13

Li, Y., Gao, Y., Kim, Y.-S., Iqbal, A., and Kim, J.-J. (2017). A whole genome association study to detect additive and dominant single nucleotide polymorphisms for growth and carcass traits in Korean native cattle, Hanwoo. Asian-Australas. J. Anim. Sci. 30, 8–19. doi: 10.5713/ajas.16.0170

Logarajah, S., Hunter, P., Kraman, M., Steele, D., Lakhani, S., Bobrow, L., et al. (2003). BCL-6 is expressed in breast cancer and prevents mammary epithelial differentiation. Oncogene 22, 5572–5578. doi: 10.1038/sj.onc.1206689

Lu, H., and Bovenhuis, H. (2019). Genome-wide association studies for genetic effects that change during lactation in dairy cattle. J. Dairy Sci. 102, 7263–7276. doi: 10.3168/jds.2018-15994

Mapholi, N. O., Maiwashe, A., Matika, O., Riggio, V., Bishop, S. C., MacNeil, M. D., et al. (2016). Genome-wide association study of tick resistance in South African Nguni cattle. Ticks Tick Borne Dis. 7, 487–497. doi: 10.1016/j.ttbdis.2016.02.005

Mateescu, R. G., Garrick, D. J., and Reecy, J. M. (2017). Network analysis reveals putative genes affecting meat quality in Angus cattle. Front. Genet. 8:171. doi: 10.3389/fgene.2017.00171

May, K., Scheper, C., Brügemann, K., Yin, T., Strube, C., Korkuć, P., et al. (2019). Genome-wide associations and functional gene analyses for endoparasite resistance in an endangered population of native German Black Pied cattle. BMC Genomics 20:277. doi: 10.1186/s12864-019-5659-4

McClure, M. C., Ramey, H. R., Rolf, M. M., McKay, S. D., Decker, J. E., Chapple, R. H., et al. (2012). Genome-wide association analysis for quantitative trait loci influencing Warner-Bratzler shear force in five taurine cattle breeds. Anim. Genet. 43, 662–673. doi: 10.1111/j.1365-2052.2012.02323.x

Mckenzie, H. A., Brucec Graham, E. R., and Ponzoni, R. W. (1984). Effects of milk protein genetic variants on milk yield and composition. J. Dairy Res. 51, 531–546. doi: 10.1017/S0022029900032854

Meier, S., Arends, D., Korkuć, P., Neumann, G. B., and Brockmann, G. A. (2020). A genome-wide association study for clinical mastitis in the dual-purpose German Black Pied cattle breed. J. Dairy Sci. 103, 10289–10298. doi: 10.3168/jds.2020-18209

Meier, S., Korkuć, P., Arends, D., and Brockmann, G. A. (2019). DNA sequence variants and protein haplotypes of casein genes in German Black Pied Cattle (DSN). Front. Genet. 10:1129. doi: 10.3389/fgene.2019.01129

Meredith, B. K., Kearney, F. J., Finlay, E. K., Bradley, D. G., Fahey, A. G., Berry, D. P., et al. (2012). Genome-wide associations for milk production and somatic cell score in Holstein-Friesian cattle in Ireland. BMC Genet. 13:21. doi: 10.1186/1471-2156-13-21

Mészáros, G., Petautschnig, E., Schwarzenbacher, H., and Sölkner, J. (2015). Genomic regions influencing coat color saturation and facial markings in Fleckvieh cattle. Anim. Genet. 46, 65–68. doi: 10.1111/age.12249

Nayeri, S., Sargolzaei, M., Abo-Ismail, M. K., May, N., Miller, S. P., Schenkel, F., et al. (2016). Genome-wide association for milk production and female fertility traits in Canadian dairy Holstein cattle. BMC Genet. 17:75. doi: 10.1186/s12863-016-0386-1

Nayeri, S., Schenkel, F., Fleming, A., Kroezen, V., Sargolzaei, M., Baes, C., et al. (2019). Genome-wide association analysis for β-hydroxybutyrate concentration in Milk in Holstein dairy cattle. BMC Genet. 20:58. doi: 10.1186/s12863-019-0761-9

Ng-Kwai-Hang, K. F. F., Hayes, J. F. F., Moxley, J. E. E., and Monardes, H. G. G. (1984). Association of genetic variants of casein and milk serum proteins with milk, fat, and protein production by dairy cattle. J. Dairy Sci. 67, 835–840. doi: 10.3168/jds.S0022-0302(84)81374-0

Ogorevc, J., Kunej, T., Razpet, A., and Dovc, P. (2009). (milk production and mastitis traits) Database of cattle candidate genes and genetic markers for milk production and mastitis. Anim. Genet. 40, 832–851. doi: 10.1111/j.1365-2052.2009.01921.x

Parker Gaddis, K. L., Null, D. J., and Cole, J. B. (2016). Explorations in genome-wide association studies and network analyses with dairy cattle fertility traits. J. Dairy Sci. 99, 6420–6435. doi: 10.3168/jds.2015-10444

Poulsen, N. A., Rybicka, I., Larsen, L. B., Buitenhuis, A. J., and Larsen, M. K. (2015). Short communication: genetic variation of riboflavin content in bovine milk. J. Dairy Sci. 98, 3496–3501. doi: 10.3168/jds.2014-8829

Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575. doi: 10.1086/519795

R Core Team (2018). A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.

RBB Rinderproduktion Berlin-Brandenburg GmbH. (2020). Deutsches Schwarzbuntes Niederungsrind – lebendes Kulturerbe. Available online at: https://www.rinderzucht-bb.de/zucht/dsn-genreserve/ (accessed April 1, 2020).

Rosen, B. D., Bickhart, D. M., Schnabel, R. D., Koren, S., Elsik, C. G., Zimin, A., et al. (2018). Modernizing the bovine reference genome assembly. Proc. World Congr. Genet. Appl. Livest. Prod. 3:802.

Snelling, W. M., Allan, M. F., Keele, J. W., Kuehn, L. A., McDaneld, T., Smith, T. P. L., et al. (2010). Genome-wide association study of growth in crossbred beef cattle. J. Anim. Sci. 88, 837–848. doi: 10.2527/jas.2009-2257

The Society for the Conservation of Old and Endangered Livestock Breeds (GEH). (2018). Rassenbeschreibungen Rinder: Deutsches Schwarzbuntes Niederungsrind. Available online at: http://www.g-e-h.de/index.php/rassebeschreibungen/34-rassekurzbeschreibungen-rinder/70-deutsches-schwarbuntes-niederungsrind (accessed April 1, 2020).

Wang, Z., and Chatterjee, N. (2017). Increasing mapping precision of genomewide association studies: to genotype and impute, sequence, or both? Genome Biol. 18, 17–19. doi: 10.1186/s13059-017-1255-6


Wickham, H., Chang, W., Henry, L., Pedersen, T. L., Takahashi, K., Wilke, C., et al. (2016). Create Elegant Data Visualisations Using the Grammar of Graphics. R package ggplot2 version 3.3.2.

Yates, A. D., Achuthan, P., Akanni, W., Allen, J., Allen, J., Alvarez-Jarreta, J., et al. (2020). Ensembl 2020. Nucleic Acids Res. 48, D682–D688. doi: 10.1093/nar/gkz966

Zielke, L. G., Bortfeldt, R. H., Reissmann, M., Tetens, J., Thaller, G., and Brockmann, G. A. (2013). Impact of variation at the FTO locus on milk fat yield in holstein dairy cattle. PLoS One 8:e63406. doi: 10.1371/journal.pone.0063406

Zielke, L. G., Bortfeldt, R. H., Tetens, J., and Brockmann, G. A. (2011). BDNF contributes to the genetic variance of milk fat yield in German Holstein cattle. Front. Genet. 2:16. doi: 10.3389/fgene.2011.00016

Conflict of Interest: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2021 Korkuć, Arends, May, König and Brockmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
