Genetics: Early Online, published on May 30, 2018 as 10.1534/genetics.118.301143

1 Human-mediated introgression of haplotypes in a modern

2 dairy breed

3

4 Qianqian Zhang1,2,#,*, Mario Calus2, Mirte Bosse2, Goutam Sahana1, Mogens Sandø Lund1, and 5 Bernt Guldbrandtsen1

6

7 1 Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, 8 Aarhus University, Denmark

9 2 Animal Breeding and Genomics, Wageningen University & Research, the Netherlands

10 # Present address: Department of Veterinary and Animal Sciences, Faculty of Health and Medical 11 Sciences, University of Copenhagen, Denmark

12 * Corresponding author

13

14 Email addresses:

15 QZ: [email protected]

16 MC: [email protected]

17 MB: [email protected]

18 GS: [email protected]

19 MSL: [email protected]

20 BG: [email protected]

Copyright 2018. 21 Abstract

22 23 Domestic animals can serve as model systems of adaptive introgression and their genomic 24 signatures. In part their usefulness as model systems is due to their well-known histories. Different 25 breeding strategies such as introgression and artificial selection have generated numerous desirable 26 phenotypes and superior performance in domestic animals. The Modern Danish Red is 27 studied as an example of an introgressed population. It originates from crossing the traditional 28 Danish Red Dairy Cattle with the Holstein and Brown Swiss breeds, both known for high 29 production. This crossing happened among other things due to changes in production system, to 30 raise milk production and overall performance. The genomes of Modern Danish Red Dairy Cattle 31 are heavily influenced by regions introgressed from the Holstein and Brown Swiss breeds and under 32 subsequent selection in the admixed population. The introgressed proportion of the genome was 33 found to be highly variable across the genome. Haplotypes introgressed from Holstein and Brown 34 Swiss contained or overlapped known genes affecting milk production, as well as protein and fat 35 content (CD14, ZNF215, BCL2L12 and THRSP for Holstein origin; ITPR2, BCAT1, LAP3 and 36 MED28 for Brown Swiss origin). Genomic regions with high introgression signals also contained 37 genes and enriched QTLs associated with calving traits, body confirmation, feed efficiency, carcass, 38 and fertility traits. These introgressed signals with relative identity by descent scores larger than the 39 median showing Holstein or Brown Swiss introgression are mostly significantly correlated with the 40 corresponding test statistics from signatures of selection analyses in Modern Danish Red Dairy 41 Cattle. Meanwhile, the putative significant introgressed signals have a significant dependency with 42 the putative significant signals from signatures of selection analyses. Artificial Selection has played 43 an important role in the genomic footprints of introgression in the genome of Modern Danish Red 44 Dairy Cattle. Our study on a modern cattle breed contributes to an understanding of genomic 45 consequences of selective introgression by demonstrating the extent to which adaptive effects 46 contribute to shaping the specific genomic consequences of introgression. 47 48 Keywords: selective introgression, signature of selection, high yielding cattle breeds, modern dairy 49 cattle breed 50 51 52 53 Introduction

54 55 Processes of adaptive introgression are complex and their genomic signatures in human and other 56 species have been studied extensively (Hasenkamp, Solomon et al. 2015, Deschamps, Laval et al. 57 2016, Figueiro, Li et al. 2017, Jagoda, Lawson et al. 2017). Genome analysis has enabled an in- 58 depth assessment of genomic consequences of different demographic processes including 59 introgression, selection, and their interplay in the modern species (Deschamps, Laval et al. 2016, 60 Figueiro, Li et al. 2017). Domestic animals can serve as model organisms for these processes. They 61 have several advantages in understanding the impact of introgression and selection on genomes: 62 first, introgression and selection is known to occur between breeds, and the processes are often well 63 documented by breeders; second, massive data are routine collected before and after introgression 64 and selection such as parentage, genotypes and phenotypes; third, under controlled appropriate 65 environmental conditions a large part of genomic consequence is caused by human-mediated 66 directional introgression and selection. 67 68 Artificial selection and different breeding strategies have enabled generating numerous desirable 69 phenotypes in domestic animals such as cattle (Hartwig, Wellmann et al. 2015, Buzanskas, Ventura 70 et al. 2017, Davis, Spelman et al. 2017), pigs (Bosse, Megens et al. 2014, Bosse, Megens et al. 2014, 71 Ai, Fang et al. 2015) and dogs (Galov, Fabbri et al. 2015, vonHoldt, Kays et al. 2016). Strategies 72 including crossbreeding and introgression have been very successful in improving productivity and 73 performance in domestic animals. For example, Chinese pig breeds have been imported to Europe 74 to improve the productivity of European pigs in the late eighteenth and early nineteenth centuries 75 (White 2011). The fertility related traits have been largely improved by the crossbreeding and 76 introgression from Asian pigs (Merks, Mathur et al. 2012, Bosse, Megens et al. 2014, Bosse, Lopes 77 et al. 2015). Similarly, in dairy cattle, crossbreeding and following introgression between local 78 breeds with other breeds has been applied in order to achieve better productivity and performance 79 (Davis, Spelman et al. 2017). The genetic architecture of modern domestic animals including dairy 80 cattle is shaped by the interplay between different forces including the intentional introduction of 81 favorable alleles from other breeds, subsequent selection for favorable introgressed alleles and 82 demographic processes. 83 84 Since the introduction of scientific breeding, the main breeding goal in cattle has been to improve 85 milk yield, fat, and protein content, even for dual-purpose breeds in intensive farming systems 86 especially in Europe (Hartwig, Wellmann et al. 2015). By the crossing of high yielding breeds with 87 local breeds, the productivity of local breeds could rapidly be increased at the expense of the 88 genetic distinctiveness of the local breeds. The high productivity of these admixed breeds was 89 further improved by intense selection, resulting in increased frequencies or even fixation of 90 favorable alleles. Many of the variants thus spreading in the population will have been of 91 introgressed origin. With the great success of cattle breeding including crossbreeding, and because 92 of the availability of large scale genomic data sets, analysis of admixed local cattle breeds 93 represents an appealing model to identify the genomic consequences of admixture with the intention 94 of improving traits of interest in local breeds. We expect that the introgressed genomic regions play 95 an important role in improving productivity and performance in the . We hypothesize 96 that: 1) the genome-wide introgressed regions are non-randomly distributed across the genome with 97 respect to their genomic locations, and that: 2) the majority of highly introgressed regions affects 98 production and performance traits, and that: 3) these introduced haplotypes are or have been under 99 selection in the admixed breed. 100 101 To test these hypotheses and identify specific genomic regions and genes of interest involved in the 102 important traits from high-yields breeds such as Holstein (HOL) and Brown Swiss (BSW), we use 103 the hybrid Modern Danish Red Dairy Cattle (mRDC) breed as an example. Our analyses illustrate 104 the patterns of introgressed and selected haplotypes in an admixed local breed. The hybrid mRDC 105 originates from traditional Danish Red Dairy Cattle (tRDC). In recent decades Holstein and Brown 106 Swiss have been used extensively to improve the milk yield of mRDC (Kantanen, Olsaker et al. 107 2000). Years of crossbreeding and selection have led to the differentiation between mRDC and 108 tRDC. The introgression of and selection for haplotypes from HOL and BSW has probably made a 109 significant contribution to the increased milk production level of mRDC. The objective of this study 110 was to examine genomic patterns of introgression from two high-yielding breeds (HOL and BSW) 111 in a modern dairy breed (i.e. the hybrid mRDC), and unravel the consequences of introgression and 112 selection at the genome level using the whole genome sequence data. 113

114 Methods

115 116 1. SNP genotyping, sequencing, variant calling, and quality control 117 118 Whole genome sequence data were available for 97 animals from 4 breeds (32 Holstein: HOL; 20 119 Brown Swiss: BSW; 15 traditional Red Danish Dairy Cattle: tRDC; 30 modern Danish Red Dairy 120 cattle: mRDC). All individuals’ genomes were sequenced to ~10× of depth or deeper using Illumina 121 paired-end sequencing. Reads were aligned to the cattle genome assembly UMD3.1 (Zimin, 122 Delcher et al. 2009) using bwa (Li and Durbin 2009). Aligned sequences were converted to raw 123 BAM files using samtools (Li, Handsaker et al. 2009). Duplicate reads were marked using the 124 samtools rmdup option (Li, Handsaker et al. 2009). The Genome Analysis Toolkit (McKenna, Hanna 125 et al. 2010) was used for local realignment around insertion/deletion (indels) regions, and 126 recalibration following the 1000 Bull Genome Project guidelines (Daetwyler, Capitan et al. 2014) 127 incorporating information from dbSNP (Sherry, Ward et al. 2001). Finally, variants were called 128 using the Genome Analysis Toolkit’s UnifiedGenotyper (McKenna, Hanna et al. 2010), which 129 simultaneously calls short indels and SNPs. Indels and variants on sex chromosomes were excluded 130 from further analyses. 131 132 2. Population structure and admixture 133 134 Using PLINK (Chang, Chow et al. 2015), the sequence variants were pruned to remove markers 135 with pairwise linkage disequilibrum (LD) greater than 0.1 with any other SNP within a 50 SNP 136 sliding window (advancing by 10 SNPs at a time). The SNPs on all the autosomes were used to 137 study the population structure. To get an overview of population structure of the genotyped animals 138 from different breeds, the whole genome sequence data was used for principal component 139 estimation using GCTA (Yang, Lee et al. 2011). Admixture analyses were done using the program 140 Admixture (Alexander, Novembre et al. 2009) with values of K between 2 and 10. The K value with 141 a low cross validation error was chosen. 142 143 In order to investigate the statistical significance of admixture among these cattle populations.

144 TreeMix software was used to perform the three population (f3) test (Pickrell and Pritchard 2012).

145 In the f3 test with the form of f3 (A; B, C), an extreme negative f3 statistic indicates that there has 146 been significant gene flow to A from populations B and C. The combination of two of HOL, BSW, 147 tRDC (A and B) were used as source populations and mRDC were used as admixed population (C)

148 in the f3 test, which results in 3 different combinations. 149 150 We also calculated the breed proportions for the sequenced mRDC individuals using pedigree. The 151 full pedigree of the sequenced mRDC animals were extracted and used to infer the breed 152 proportions by coding the breed where the ancestors enter as the known parents of the traced animal. 153 154 3. Introgression mapping 155 156 3.1 Calculation of ancestry dosages in mRDC 157 158 A novel two-layer hidden Markov model was implemented in the method developed in Guan (2014) 159 (Guan 2014) to infer the structure of local haplotypes introgressed from HOL, BSW and tRDC in 160 mRDC. The software package ELAI developed by Guan (2014) (Guan 2014) was used to infer the 161 ancestry dosages of the haplotypes from three source populations in mRDC. The SNPs from 162 sequence variants with minor allele frequency (MAF) lower than 0.01 or missing proportion of 163 higher than 0.05 were removed from subsequent analyses. The option of three-way admixture with 164 admixture generations of 10 was chosen, which approximates the history of admixture in the mRDC 165 population. 30 steps of Expectation–maximization (EM) algorithm were run to infer the ancestry 166 dosages of HOL, tRDC and BSW in mRDC. 167 168 3.2 Relative Identity by Descent Scores (rIBD) 169 170 HOL, BSW and tRDC have made large genetic contributions to the mRDC. Therefore, the 171 sequenced mRDC, tRDC, BSW and HOL were selected for introgression mapping analysis. The 172 identity-by-descent (IBD) regions comparing mRDC and tRDC were used as a reference to map the 173 introgression regions from HOL and BSW using a pairwise comparison between these breeds. 174 Following Bosse et al. (Bosse, Megens et al. 2014), sequences for 29 autosomes were first phased 175 using Beagle fastIBD (V. 3.3.2) (Browning and Browning 2007). Pairwise comparisons for 176 detecting IBD were performed between mRDC and tRDC; mRDC and HOL; mRDC and BSW. As 177 recommended in the Beagle documentation (Browning and Browning 2007), 10 independent runs 178 for phasing and pairwise IBD detection were performed. The identified IBD segments were 179 combined from 10 runs and the threshold parameter compromising between power and false- 180 discovery rate was 10-10 for identifying the true shared IBD as suggested by Browning and 181 Browning (Browning and Browning 2007). We defined the IBD score as the proportion of the 182 number of recorded true IBD haplotype segments over the total number of pairwise comparisons 183 using a window of 10 kb. The IBD score was calculated for each pairwise comparison using a 184 custom perl script. To quantify the relative proportion of introgressed genome from HOL or BSW, 185 we calculated the relative IBD score (rIBD) as follows: IBD score (mRDC & tRDC) - IBD score 186 (mRDC & HOL) or IBD score (mRDC & BSW). Thus, the rIBD has values in the range of -1 to 1. 187 rIBD=1 signifies that all haplotypes in the target breed originate from the first source breed, while 188 rIBD=-1 signifies 100% from the second source breed. The variance of rIBD score was calculated 189 using a robust method (Williams 2000). The p values of rIBD scores were derived from the neutral 190 hypothesis that assuming rIBD are normally distributed with a standard deviation of the squared 191 root of robust estimate of rIBD variance (Bosse, Megens et al. 2014). The significant introgressed 192 haplotypes from HOL and BSW in mRDC were defined as the haplotypes with a corrected p value 193 using Benjamini-Hochberg Procedure (Benjamini and Yekutieli 2001) lower than 0.02 and 0.04 194 respectively. The Peterson’s correlations between rIBD scores from HOL or BSW introgression 195 larger than the median of rIBD scores and the corresponding test statistics from signatures of 196 selection analyses were calculated. Moreover, the putative significant introgressed signals from 197 HOL and BSW were extracted and compared with the putative significant signals from signatures 198 of selection analyses using a chi-squared test with Monte Carlo simulation (Hope 1968) 199 implemented in calculating p values. 200 201 4. Detection of signature of selection 202

203 4.1 Fst analysis

204 The genetic differentiation between individuals from tRDC and mRDC was measured by pairwise

205 Fst analysis following Weir and Cockerham (Weir and Cockerham 1984). Pairwise Fst was 206 computed with Genepop 4.2 in bins of 10 kb over the full length of the genome (Weir and

207 Cockerham 1984). The correlations between the Fst and rIBD scores for HOL and BSW 208 introgression for the same bins of 10 kb were calculated. The R package FlexMix (Grun and Leisch 209 2008) was used to fit a series of finite normal mixture models to explore the underlying structure of

210 the distribution of Fst presumably caused by different evolutionary or demographic processes 211 undergoing in the populations such as balancing, directional selection and neutrality. This mixture ̅̅̅̅̅ 푘 ̅̅̅̅̅̅ 2 212 model postulated that 퐹푠푡: ∑푘=1 휋푘푁(퐹푠푡푙| 휇푘, 휎푘 ), where 푘 is the number of components of the ̅̅̅̅̅ 2 213 mixture, 푙 is the 푙 th locus, 휋푘 is the probability that 퐹푠푡 belongs to cluster k , 휇푘 and 휎푘 are 214 expectation and variance of normal distribution, respectively. The number of components of the 215 mixture (푘) was determined by the smallest Akaike’s information criterion (AIC) among different 216 models with different numbers of components. Model parameters were estimated by maximum

217 likelihood via EM algorithm in FlexMix. At last, the p value of the Fst combine the k number of 푘 ̅̅̅̅̅̅ 2 218 components was calculated as: 푝 = 1 − ∑푘=1 휋푘푁(퐹푠푡푙| 휇푘, 휎푘 ). The regions with the cutoff of 219 corrected p values using Benjamini-Hochberg Procedure (Benjamini and Yekutieli 2001) of less 220 than 0.05 were used for testing dependency of putative significant signals between different test 221 statistics.

222 223 4.2 Extended haplotype homozygosity tests 224

225 The extended haplotype homozygosity tests were applied between the breeds for the sequenced 226 individuals as a second evidence for signatures of selection. The genome-wide scan for integrated 227 haplotype score (iHS) for mRDC was performed using the R package rehh (Sabeti, Reich et al. 228 2002, Gautier and Vitalis 2012). The threshold of |iHS| of 2.5 for SNPs was used as the cutoff to 229 define as signals of selection signatures following (Voight, Kudaravalli et al. 2006, Gautier and 230 Vitalis 2012). To compare iHS with other test statistics such rIBD scores in the same scale, we 231 calculated the proportion of SNPs with |iHS|>2 in a window size of 10 kb and identified the 232 windows in the highest 1% of the empirical distribution for proportion of SNPs with |iHS|>2 233 following (Voight, Kudaravalli et al. 2006). The putative significant windows of proportion of 234 SNPs with |iHS| used for testing dependency of putative significant signals between different test 235 statistics.

236

237 4.3 Sharing of runs of homozygosity (ROH)

238 239 Runs of homozygosity (ROH) were computed for the sequenced animals to detect shared ROH 240 regions (minimum 10 kb considered) among individuals. For a description of procedures for 241 calculation of the nucleotide diversity and for detection of ROH, see (Zhang, Calus et al. 2015, 242 Zhang, Guldbrandtsen et al. 2015). The sharing of ROH regions was calculated as the number of 243 individuals sharing the same ROH region on a particular segment using a window of 10 kb bin 244 across the whole genome in mRDC. Regions of enrichment of shared ROH regions were defined as 245 regions exceeding 95th percentile of the empirical distribution of the number of individuals sharing 246 the same ROH regions in any particular segment. 247 248 5. Gene annotations and enrichment of QTLs

249

250 Genes in genomic regions showing significant introgression of haplotypes from HOL or BSW in 251 mRDC were annotated. The cattle QTLs were extracted from Animal QTL Database (Hu, Park et al. 252 2016). QTLs on the X chromosome or without locations and references in the list from Animal 253 QTL Database (Hu, Park et al. 2016) were excluded. The remaining QTLs were classified into 6 254 groups according to the associated traits: milk, reproduction, production, health, meat and carcass, 255 exterior. The QTLs overlapped with genomic regions with high rIBD scores were identified. When 256 two QTLs had the same exact genomic interval and same associated traits group, they were counted 257 as one QTL. Gene annotations in these regions were retrieved from the Ensembl Genes 89 Database 258 using BioMart (Kinsella, Kahari et al. 2011).

259

260 To test for enrichment of QTLs in the candidate introgressed regions from HOL or BSW, we 261 applied a permutation test. The candidate introgressed regions were randomly distributed across the 262 whole genome. This permutation did not change the relative proportion and length of introgressed 263 regions to preserve their correlation structure. We then computed the number of QTLs and the 264 number of QTLs in the 6 groups associated traits, which overlapped with the permuted, introgressed 265 regions. In total, 10,000 permutations were performed. The distribution of numbers of QTLs 266 observed in the permutated regions were treated as the null distribution from which we computed 267 the significance levels of the number of QTLs observed in the real data. 268

269 Results and discussion

270 271 1. Population structure and evidence of introgression 272 273 Population structures of the sampled cattle breeds in this study was analyzed using Admixture and 274 PCA are shown in Figure 1. mRDC had contribution from tRDC, HOL, BSW as observed in the 275 Admixture analysis (Figure 1). Figure 1 clearly demonstrates the hybrid nature of mRDC cattle, 276 which is consistent with recorded pedigree information of introgression history of mRDC 277 (Andersen, Jensen et al. 2003). A large contribution in mRDC was observed from tRDC, the 278 recipient population. It is notable that BSW and HOL are two mainstream breeds, each contributed 279 heavily to the genomes of extant mRDC individuals. The proportion of introgression differs quite 280 extensively among individuals. In the PCA analysis (Figure 1), Principal Component 1 (PC1, 6.23 % 281 of variance) separated HOL, BSW and tRDC. mRDC, however, was dispersed between the other 282 breeds demonstrating admixture of HOL, BSW and tRDC which have contribution to mRDC. 283 Similarly, PC2 separated HOL, BSW and tRDC. mRDC had a wide range of PC2 values among the 284 individuals from the other breeds. 285 286 Moreover, the breed proportions of mRDC individuals were derived using the recoded full pedigree 287 and there is an average of 0.27 HOL ancestor, 0.17 BSW ancestor and 0.29 tRDC ancestor in 288 mRDC. The statistical significance of admixture in mRDC was measured using the combination of

289 two populations chosen from HOL, BSW and tRDC by f3 test using the program threepop from

290 Treemix (Pickrell and Pritchard 2012). We observed extreme Z scores from f3 test for mRDC using 291 any combination of HOL, BSW and tRDC, i.e. -14.97, -11,72 and -19.24. These results are 292 consistent with what we have observed from admixture and PCA analysis and provides statistical 293 significance for the admixture of mRDC from HOL, BSW and tRDC. These results support that 294 mRDC indeed is a composite breed and can be studied further for introgression from HOL and 295 BSW. 296 297 2. Introgression mapping 298 299 Introgression mapping was performed to identify regions in mRDC that contained an excess of 300 introgressed haplotypes. We first examined the local structure of haplotypes in mRDC introgressed 301 from HOL, BSW and tRDC using a three-way admixture approach (Figure S1). We observed 302 average ancestry dosages of 0.78, 0.63 and 0.59 with a standard deviations of 0.24, 0.22 and 0.18 303 from HOL, tRDC and BSW to mRDC average across the whole genome and all mRDC individuals. 304 Three genomic regions with the ancestry dosage of 2 were observed from HOL to mRDC (Figure 305 S1). This indicates complete replacement of tRDC genomic material by HOL genomic material. 306 However, there is only one annotated gene (SNORD116) in these regions. Interestingly, the 307 genomic region with the full HOL ancestry dosage on chromosome 21 (1,348,427-2,107,179 bp) is 308 associated with gestation length and calving ease (Frischknecht, Bapst et al. 2017). Other QTL 309 overlapping regions with full HOL ancestry in mRDC were associated with bovine respiratory 310 disease susceptibility, body weight and udder swelling score (Saatchi, Schnabel et al. 2014, Keele, 311 Kuehn et al. 2015, Michenet, Saintilan et al. 2016). 312 313 To address which admixed haplotypes have most influenced the existing mRDC breed, we inferred 314 whether a genomic region was introgressed from high-yield breeds in multiple individuals by 315 examining the contributions proportional to the admixture fraction. The frequencies of all mRDC 316 haplotypes that were of HOL, BSW or tRDC origin were calculated across the whole genome. The 317 relative fractions of HOL or BSW haplotypes versus tRDC haplotypes in the mRDC group were 318 calculated as ‘relative IBD scores’ (rIBD). Shared haplotypes (i.e. haplotypes with shared ancestry) 319 were observed between mRDC on one hand, and HOL, BSW and tRDC on the other hand. These 320 findings are in agreement with the results observed from population structure and admixture 321 analysis that showed contributions from HOL, BSW and tRDC to mRDC. In contrast to tRDC 322 frequency, HOL haplotype and BSW haplotype frequency in mRDC population, for a given locus, 323 (i.e. rIBD score) ranged from 0.73 to -0.74 and from 0.81 to -0.92, where 1 indicates that all 324 haplotypes were of HOL or BSW origin, while none were of tRDC origin. A value of -1 indicates 325 that all haplotypes were tRDC-like (Figures 2 and 3). The rIBD scores averaged across the genome 326 were negative (-0.06 for HOL introgression and -0.09 for BSW introgression) (Figures 2a and 3a), 327 showing that the majority of the genome displayed more similarity with the tRDC than with either 328 HOL or BSW. However, every chromosome contained genomic regions where the signal for HOL 329 or BSW haplotype was stronger than tRDC. 330 331 The distribution of rIBD from the comparison between mRDC and HOL or BSW for IBD 332 haplotypes resembled a normal distribution (Figures 2b and 3b). By taking a cut-off of rIBD values, 333 we were able to identify the regions which were likely to be of HOL origin or BSW origin. Across 334 the whole genome, many known genes and QTLs were located within regions with high rIBD for 335 HOL origin (Table S1). We observed that the QTLs are significantly enriched in the HOL-like 336 haplotypes in mRDC (p=0.025). The genes and QTLs were associated with economic traits 337 including milk-related traits such as milk yield, protein, fat yield and percentage (CD14, ZNF215, 338 BCL2L12 and THRSP) (Ashwell, Heyen et al. 2004, Beecher, Daly et al. 2010, Magee, Sikora et al. 339 2010, Cole, Wiggans et al. 2011, Cochran, Cole et al. 2013, Fontanesi, Calo et al. 2014, 340 Capomaccio, Milanesi et al. 2015), calving traits (MYH14, KCNC3, SYT3 and CTU1) (Kolbehdari, 341 Wang et al. 2008, Cole, Wiggans et al. 2011, Gaddis, Null et al. 2016, Mao, Kadri et al. 2016, Abo- 342 Ismail, Brito et al. 2017), feed efficiency-related traits (LRRIQ3, ATP6V1B2 and CCKBR) (Abo- 343 Ismail, Kelly et al. 2013, Abo-Ismail, Vander Voort et al. 2014), carcass traits (ZNF215, INTS4 and 344 RPTOR) (Magee, Sikora et al. 2010, Sasago, Abe et al. 2017). The QTLs associated production and 345 reproduction traits were significantly enriched in introgressed regions from HOL showing high 346 rIBD scores (p=0.0004 for production and p=0.034 for reproduction traits). The longest continuous 347 introgressed region (defined as the region with rIBD>0) from HOL to mRDC was on chromosome 348 18 (56,320,000-61,350,000bp) (Figure 2a). This region was previously found to be associated with 349 calving traits and young stock survival in Nordic Holstein cattle (Cole, Wiggans et al. 2011, Mao, 350 Kadri et al. 2016, Wu, Guldbrandtsen et al. 2017). The recombination rate of this region of 351 chromosome 18 is low (Weng, Saatchi et al. 2014). The long genomic regions showing signal of 352 introgression probably tend to occur in regions with low recombination rate. Moreover, introgressed 353 haplotypes included numerous annotated genes due to genetic hitchhiking and short time since 354 introgression. The region with highest rIBD score from HOL was located on chromosome 4 355 (120,540,000-120,810,000 bp, average rIBD=0.449), which is downstream of the gene VIPR2. The 356 VIPR2 gene has been proposed as a candidate gene affecting fat percentage and playing an 357 important role in milk synthesis (Capomaccio, Milanesi et al. 2015). 358 359 Similarly, many known genes and QTLs were occurred in regions where mRDC shared haplotypes 360 with BSW (Table S2). We observed an enrichment of QTLs in the BSW-like haplotypes in mRDC 361 with close to significance (p=0.056). These genes and QTLs mainly affected milk composition 362 including fat and protein percentage and yield (ITPR2, BCAT1, LAP3 and MED28 ) (Cohen-Zinder, 363 Seroussi et al. 2005, Pimentel, Bauersachs et al. 2011, Zheng, Ju et al. 2011, Fang, Fu et al. 2014), 364 growth and body conformation traits such as stature (NCAPG, LCORL, PPP2R1A, IGFBP6 and 365 CREBBP) (Kolbehdari, Wang et al. 2008, Baeza, Corva et al. 2011, Lindholm-Perry, Sexten et al. 366 2011, Cole, Waurich et al. 2014, Sahana, Hoglund et al. 2015), calving and fertility traits (EIF4G3, 367 TGFA and LAP3) (Bongiorni, Mancini et al. 2012, Hering, Olenski et al. 2014, Hoglund, Buitenhuis 368 et al. 2015), and feed efficiency related traits (CLMP and PPP2R1A) (Prakash, Bhattacharya et al. 369 2011, Cole, Waurich et al. 2014). We observed a significant enrichment of QTLs associated with 370 milk, production, meat and carcass traits in the introgressed regions from BSW in mRDC (p=0.045 371 for milk, p=0.019 for production, and p=0.017 for meat and carcass traits). While the longest region 372 introgressed from HOL in mRDC was ~4 Mb, there was no introgressed region longer than 1.5 Mb 373 from BSW in mRDC. The highest value of rIBD signal was observed on chromosome 17 374 (35,630,000-35,780,000 bp, average rIBD=0.563). This region is located downstream of the IL2 375 gene. Also there are one unannotated gene in this region. It has previously been shown that IL2 376 gene is associated with mastitis, milk yield and lactation persistency (Alluwaimi, Leutenegger et al. 377 2003, Prakash, Bhattacharya et al. 2011). 378 379 3. Regions of introgression and evidence for selection 380 381 We have shown that HOL or BSW haplotypes that were introgressed in mRDC often originated 382 from genomic regions harboring genes associated with milk production, calving traits, feed 383 efficiency, fertility or body conformation and carcass traits. An introgression introduced haplotype 384 from HOL or BSW with a continued survival in mRDC is not removed by negative selection or 385 genetic drift might be a result of positive or balancing selection. The structure of introgressed 386 haplotype will change in such as linkage-disequilibrium (LD), and distribution of allele frequencies 387 due to selection. The length of introgressed haplotype was affected by a combination of the local 388 recombination rate and strength of selection. To further infer the introgressied regions which are

389 under selection, we used three independent methods: iHS, Fst, and sharing of ROH among 390 individuals to identify the regions which were under selection in the mRDC population. iHS

391 identified regions with extended homozygosity in mRDC due to selection. Local high Fst statistics 392 reflected the genomic regions which showed strong differentiation between mRDC and tRDC. The 393 sharing of ROH among individuals could differentiate the genomic regions which have been fixed 394 or close to fixation in mRDC. We observed a significant dependency (p<0.01) between the putative

395 significant rIBD signals and significant signals from iHS, Fst and sharing of ROH among 396 individuals, except between the significant rIBD signals from HOL introgression and significant 397 iHS signals. It supports the hypothesis that many introgressed regions from HOL or BSW are 398 probably a result of selection.

399 400 We firstly observed significant positive correlations between Fst and rIBD scores larger than the 401 median from both HOL and BSW (p<0.001), supportings that at least some of the regions in mRDC 402 showing differentiation from tRDC have been introgressed from HOL or BSW (Figures S2 and S3). 403 Genes associated with milk yield, protein and fat yield and percentage such as CD14 and ZNF215 404 (Magee, Sikora et al. 2010, Cochran, Cole et al. 2013) overlapped with haplotypes introgressed

405 from HOL showed a high Fst (Figure 4e). The longest region introgressed from HOL in mRDC

406 (chromosome 18: 56,320,000-61,350,000 bp) also showed high Fst values (Figure 4d). There was a

407 number of genes included in this longest region with introgression from HOL and high Fst regions, 408 such as MYH14 and ZNF613 associated with calving ease and young stock survival (Abo-Ismail, 409 Brito et al. 2017, Wu, Guldbrandtsen et al. 2017) (Figure 4d). The region on chromosome 19

410 (52,370,000- 52,380,000 bp) with a very high Fst of 0.748 showed strong differentiation between 411 tRDC and mRDC which overlapped with a haplotype introgressed from HOL into mRDC. The 412 RPTOR gene associated with carcass traits in cattle (Sasago, Abe et al. 2017) is located here (Figure 413 4a). At the same time, RPTOR gene was found to play an important role regulates cell growth, 414 energy homeostasis, apoptosis, and immune response during adaptions (Sun, Southard et al. 2010). 415 Similarly, we found that two BSW introgressed haplotypes overlapped with genes (ITPR2 and 416 BCAT1) associated with milk yield, fat and protein yield and percentage (Pimentel, Bauersachs et al.

417 2011, Fang, Fu et al. 2014) showed a high Fst (Figure 4c). Moreover, the region on chromosome 6

418 (38,730,000- 38,780,000 bp) with an average Fst value of 0.251 overlapped with a BSW haplotype 419 introgressed into mRDC. The NCAPG and LCORL gene associated with body confirmation traits 420 such as stature, and feed efficiency (Eberlein, Takasuga et al. 2009, Lindholm-Perry, Sexten et al. 421 2011, Setoguchi, Watanabe et al. 2011, Xia, Fan et al. 2017) is located in this region.

422

423 Regions introgressed from HOL or BSW into mRDC and the significant regions from iHS test were 424 compared in Figures S4 and S5. There was a significant positive correlation between the rIBD score 425 larger than the median for introgressed region from BSW in mRDC and iHS test (p<0.001). 426 However, there was no significant correlation between rIBD score for introgressed regions from 427 HOL in mRDC and iHS test. For example, the genomic region putatively under selection from iHS 428 test (38,510,000- 38,540,000, 104 out of 108 SNPs with |iHS|>2, highest |iHS|=3.8) on chromosome 429 6 overlapped with an introgression signal from BSW into mRDC. This region located on the 430 upstream of LAP3 gene associated with milk composition including fat and protein percentage, 431 calving trait (Cohen-Zinder, Seroussi et al. 2005, Zheng, Ju et al. 2011). We observed the 432 significant SNPs with highest |iHS| of 5.3 from iHS test on chromosome 5 overlapped with a high 433 introgression signal from BSW into mRDC. These significant SNPs overlaps with the BCAT1 gene 434 associated with milk yield, protein and fat percentage in milk (Pimentel, Bauersachs et al. 2011) 435 (Figure 4c). There were significant SNPs with highest |iHS| of 4.23 on chromosome 15, which 436 overlapped with an introgressed region from BSW into mRDC. This region lies within the ZNF215 437 gene affecting body confirmation and milk composition (Magee, Sikora et al. 2010) (Figure 4e). 438 Genomic regions showing signals both from iHS test and introgression mapping, e.g. gene BCAT1, 439 were introgressed from BSW, and probably under selection, but not yet fixed in the population due 440 to low pression of selection or recent introduction.

441

442 There was a significant positive correlation (p<0.001) across genomic regions between the number 443 of individuals containing overlapped ROH regions and the rIBD scores larger than the median from 444 both HOL and BSW (Figures S6 and S7). . Sharing of short ROHs shared between individuals 445 indicated selection events that have reached or are close to fixation. Interestingly, many small 446 regions highly enriched for ROH hotspots, overlapped with the longest region introgressed from 447 HOL into mRDC such as the region where ZNF613 gene associated with young stock survival (Wu, 448 Guldbrandtsen et al. 2017) is located (Figure 4d). Similarly, we also observed that the region 449 introgressed from HOL in mRDC, where the THRSP and INTS4 genes are located, shows high 450 enrichment of ROH region among individuals (Figure 4b). Studies showed that gene THRSP is 451 associated with milk composition and involved in the regulation of mammary synthesis of milk fat 452 (Fontanesi, Calo et al. 2014) and gene INTS4 is associated with myristic acid content in carcass trait 453 (Sasago, Abe et al. 2017). One region introgressed from BSW into mRDC, contained the TGFA 454 gene associated with sperm motility (Hering, Olenski et al. 2014) also showed high levels of ROH 455 sharing between individuals.

456

457 Conclusion

458 459 Together, the observed results demonstrated how crossbreeding followed by selection shapes the 460 genomes of a modern breed on a genome-wide scale using dairy cattle as an example. The well- 461 documented breeding practice provide a robust model system to studying the consequences of 462 adaptive introgression. Key observations were a highly uneven distribution across the genome of 463 the proportions of genomic regions introgressed from the donor breeds. Highly introgressed regions 464 contained genes and QTLs known to affect traits of interest and subject to active selection by 465 breeders. These traits include milk production, feed efficiency, calving traits, body confirmation, 466 feed efficiency, carcass, and fertility traits. Artificial selection plays an important role on the 467 genomic footprints from introgression on the genome of a modern dairy cattle breed. These findings 468 contribute to the understanding of genomic consequences of selective introgression in the genomes 469 of modern species. 470

471 Data availability

472 The whole genome sequence data used in this study originated from the 1000 Bull Genome Project. 473 Part of these Whole-genome sequence data of individual bulls of the 1000 Bull Genomes Project 474 (Bouwman et al. 2018 Nature Genetics and Daetwyler et al. 2014 Nature Genetics) are available at 475 https://doi. org/10.1038/s41588-018-0056-5 and NCBI using SRA no. SRP039339, SRR1188706, 476 SRR1205973, SRR1205973, SRR1205992, SRR1262533, SRR1262536, SRR1262538, 477 SRR1262539, SRR1262614, SRR1262659, SRR1262660, SRR1262788, SRR1262789, 478 SRR1262846, SRR1293227. The test statistics and script in this study are available on http://XXX.

479 Competing interests

480 The authors declare that they have no competing interests.

481 Authors’ contributions

482 QZ developed and planned the design of the study, coordinated the study, performed data analyses 483 and drafted the manuscript. BG participated in design of the study, performed data analyses and 484 drafting of the manuscript. MB, GS and MC participated in design of the study, and drafting of the 485 manuscript. MSL participated in drafting of the manuscript. All authors read and approved the final 486 manuscript. 487 Acknowledgements

488 We are grateful to Viking Genetics (Randers, Denmark) for providing samples for sequencing. 489 Qianqian Zhang benefited from a joint grant from the European Commission within the framework 490 of the Erasmus-Mundus joint doctorate "EGS-ABG". This research was supported by the Center for 491 Genomic Selection in Animals and Plants (GenSAP) funded by Innovation Fund Denmark (grant 492 0603-00519B). Mario Calus acknowledges financial support from the Dutch Ministry of Economic 493 Affairs, Agriculture, and Innovation (Public-private partnership “Breed4Food” code BO-22.04-011- 494 001-ASG-LR).

495

496 Figures

497 Figure 1. Population structure for four catte breeds 498 a. Admixture analysis of different cattle breeds with k=3. BSW – Brown Swiss, HOL – Holstein, 499 tRDC – traditional Red Dairy Cattle, mRDC – modern Red Dairy cattle. b. Principal component 500 analysis (PCA) plots among different cattle breeds (Principal component 2 vs. principal component 501 3). BSW – Brown Swiss, HOL – Holstein, tRDC – traditional Red Dairy Cattle, mRDC – modern 502 Red Dairy cattle. 503 504 Figure 2. Genome-wide pattern of relative identity-by-descent (rIBD) score showing introgressed 505 haplotypes from Holstein (HOL) in modern Danish Red dairy cattle (mRDC). a. the rIBD score for 506 all 29 autosomes: the positive scores show the signals where it is more HOL-like whereas the 507 negative scores show the signals where it is more traditional Danish red cattle (tRDC)-like. b. the 508 distribution of rIBD scores: the positive scores show the signals where it is more HOL-like whereas 509 the negative scores show the signals where it is more tRDC-like. The chromosomes 1-29 are 510 colored in red and green in order. 511 512 Figure 3. Genome-wide pattern of relative identity-by-descent (rIBD) score showing introgressed 513 haplotypes from Brown Swiss (BSW) in modern Danish Red dairy cattle. a. the rIBD score for all 514 29 autosomes: the positive scores show the signals where it is more BSW-like whereas the negative 515 scores show the signals where it is more traditional Danish red cattle (tRDC)-like. b. the distribution 516 of rIBD score: the positive scores show the signals where it is more HOL-like whereas the negative 517 scores show the signals where it is more tRDC-like. The chromosomes 1-29 are colored in red and 518 green in order. 519

520 Figure 4. Examples of genomic regions with relative identity-by-descent (rIBD) scores, Fst,

521 integrated haplotype score (iHS) and sharing of ROH. a. Fst and rIBD for genomic region 522 containing gene RPTOR; b. rIBD and sharing of ROH for genomic region containing gene THRSP;

523 c. rIBD and iHS for genomic region containing gene BCAT1; d. rIBD, Fst and sharing of ROH for

524 genomic region containing genes MYH14 and ZNF613; e. rIBD, Fst and iHS for genomic region 525 containing gene ZNF215. 526

527 Supplementary materials

528

529 530 Table S1. Introgressed haplotypes in modern Red Dairy cattle (mRDC) from Holstein (HOL) 531 532 Table S2. Introgressed haplotypes in modern Red Dairy cattle (mRDC) from Brown Swiss (BSW) 533 534 Figure S1. The local ancestry dosage in modern Red Dairy cattle (mRDC) from Holstein (HOL), 535 Brown Swiss (BSW) and traditional Red Dairy cattle (tRDC) for a) chromosome 14, b) 536 chromosome 21 and c) chromosome 29. 537

538 Figure S2. Comparison and correlation between Fst between modern and traditional Red Dairy cattle 539 (mRDC and tRDC) and relative identity-by-descend (rIBD) score introgressed from Holstein (HOL)

540 cattle in mRDC (correlation between rIBD scores larger than median and the corresponding Fst was

541 0.08, p<0.001). a. rIBD score showing introgression from HOL cattle in mRDC. b. Fst between 542 mRDC and tRDC 543

544 Figure S3. Comparison and correlation between Fst between modern and traditional Red Dairy cattle 545 (mRDC and tRDC) and relative identity-by-descend (rIBD) score introgressed from Brown Swiss 546 (BSW) cattle in mRDC (correlation between rIBD scores larger than median and the corresponding

547 Fst was 0.09, p<0.001). a. rIBD score showing introgression from BSW cattle in mRDC. b. Fst 548 between mRDC and tRDC 549 550 Figure S4. Comparison between iHS score detected within modern Danish Red dairy cattle (mRDC) 551 and relative identity-by-descend (rIBD) score showing introgression from Holstein (HOL) cattle in 552 mRDC (correlation between rIBD scores larger than median and the corresponding proportion of 553 SNPs with |iHS|>2 was 0.02, p<0.05). a. genome-wide pattern of rIBD scores showing introgression 554 from HOL in mRDC. b. iHS sore detected within mRDC. 555 556 Figure S5. Comparison between iHS score detected within modern Danish Red dairy cattle (mRDC) 557 and relative identity-by-descend (rIBD) score showing introgression from 558 (BSW) in mRDC (correlation between rIBD scores larger than median and the corresponding 559 proportion of SNPs with |iHS|>2 in a 10 kb window was 0.17, p<0.001). a. genome-wide pattern of 560 rIBS scores showing introgression from BSW in mRDC. b. iHS sore detected within mRDC. 561 562 Figure S6. Comparison between overlapped runs-of-homozygosity (ROH) among modern Red 563 dairy cattle (mRDC) and relative identity-by-descend (rIBD) score showing introgression from 564 Holstein cattle (HOL) in mRDC (correlation between rIBD scores larger than median and the 565 corresponding number of individuals sharing an ROH was 0.05, p<0.001). a. genome-wide pattern 566 of rIBS scores showing introgression from HOL in mRDC. b. sharing of ROH among mRDC. 567 568 Figure S7. Comparison between overlapped runs-of-homozygosity (ROH) among modern Red 569 dairy cattle (mRDC) and relative identity-by-descend (rIBD) score showing introgression from 570 Brown Swiss cattle (BSW) in mRDC (correlation between rIBD scores larger than median and the 571 corresponding number of individuals sharing an ROH was 0.01, p<0.001). a. genome-wide pattern 572 of rIBS scores showing introgression from BSW in mRDC. b. sharing of ROH among mRDC. 573

574 References

575 Abo-Ismail, M. K., L. F. Brito, S. P. Miller, M. Sargolzaei, D. A. Grossi, S. S. Moore, G. Plastow, P. Stothard, S. 576 Nayeri and F. S. Schenkel, 2017 Genome-wide association studies and genomic prediction of breeding 577 values for calving performance and body conformation traits in Holstein cattle. Genetics Selection Evolution 578 49(1):82. 579 Abo-Ismail, M. K., M. J. Kelly, E. J. Squires, K. C. Swanson, S. Bauck and S. P. Miller, 2013 Identification of single 580 nucleotide polymorphisms in genes involved in digestive and metabolic processes associated with feed 581 efficiency and performance traits in beef cattle. Journal of Animal Science 91(6): 2512-2529. 582 Abo-Ismail, M. K., G. Vander Voort, J. J. Squires, K. C. Swanson, I. B. Mandell, X. P. Liao, P. Stothard, S. Moore, G. 583 Plastow and S. P. Miller, 2014 Single nucleotide polymorphisms for feed efficiency and performance in 584 crossbred beef cattle. BMC Genetics 15(1):14. 585 Ai, H. S., X. D. Fang, B. Yang, Z. Y. Huang, H. Chen, L. K. Mao, F. Zhang, L. Zhang, L. L. Cui, W. M. He, J. Yang, X. M. 586 Yao, L. S. Zhou, L. J. Han, J. Li, S. L. Sun, X. H. Xie, B. X. Lai, Y. Su, Y. Lu, H. Yang, T. Huang, W. J. Deng, R. 587 Nielsen, J. Ren and L. S. Huang, 2015. Adaptation and possible ancient interspecies introgression in pigs 588 identified by whole-genome sequencing. Nature Genetics 47(3): 217. 589 Alexander, D. H., J. Novembre and K. Lange, 2009 Fast model-based estimation of ancestry in unrelated 590 individuals. Genome Research 19(9): 1655-1664. 591 Alluwaimi, A. M., C. M. Leutenegger, T. B. Farver, P. V. Rossitto, W. L. Smith and J. S. Cullor, 2003 The cytokine 592 markers in Staphylococcus aureus mastitis of bovine mammary gland. Journal of Veterinary Medicine Series 593 B-Infectious Diseases and Veterinary Public Health 50(3): 105-111. 594 Andersen, B., B. Jensen, A. Nielsen, L. G. Christensen and T. Liboriussen, 2003 Rød Dansk Malkerace- 595 avlsmæssigt of kulturhistorisk belyst. Danmarks HordbrugsForskning. 596 Ashwell, M. S., D. W. Heyen, T. S. Sonstegard, C. P. Van Tassell, Y. Da, P. M. VanRaden, M. Ron, J. I. Weller and H. 597 A. Lewin, 2004. Detection of quantitative trait loci affecting milk production, health, and reproductive traits 598 in Holstein cattle. Journal of Dairy Science 87(2): 468-475. 599 Baeza, M. C., P. M. Corva, L. A. Soria, G. Rincon, J. F. Medrano, E. Pavan, E. L. Villarreal, A. Schor, L. Melucci, C. 600 Mezzadra and M. C. Miquel, 2011. Genetic markers of body composition and carcass quality in grazing 601 steers. Genetics and Molecular Research 10(4): 3146-3156. 602 Beecher, C., M. Daly, S. Childs, D. P. Berry, D. A. Magee, T. V. McCarthy and L. Giblin, 2010. Polymorphisms in 603 bovine immune genes and their associations with somatic cell count and milk production in dairy cattle. 604 BMC Genetics 11(1):99. 605 Benjamini, Y. and D. Yekutieli, 2001. The control of the false discovery rate in multiple testing under 606 dependency. Annals of Statistics 29(4): 1165-1188. 607 Bongiorni, S., G. Mancini, G. Chillemi, L. Pariset and A. Valentini, 2012. Identification of a Short Region on 608 Chromosome 6 Affecting Direct Calving Ease in Piedmontese Cattle Breed. PloS One 7(12): e50137. 609 Bosse, M., M. S. Lopes, O. Madsen, H. J. Megens, R. P. M. A. Crooijmans, L. A. F. Frantz, B. Harlizius, J. W. M. 610 Bastiaansen and M. A. M. Groenen, 2015. Artificial selection on introduced Asian haplotypes shaped the 611 genetic architecture in European commercial pigs. Proceedings of the Royal Society B-Biological Sciences 612 282(1821):20152019. 613 Bosse, M., H. J. Megens, L. A. F. Frantz, O. Madsen, G. Larson, Y. Paudel, N. Duijvesteijn, B. Harlizius, Y. 614 Hagemeijer, R. P. M. A. Crooijmans and M. A. M. Groenen, 2014. Genomic analysis reveals selection for 615 Asian genes in European pigs following human-mediated introgression. Nature Communications 5:4392. 616 Bosse, M., H. J. Megens, O. Madsen, L. A. F. Frantz, Y. Paudel, R. P. M. A. Crooijmans and M. A. M. Groenen, 617 2014. "Untangling the hybrid nature of modern pig genomes: a mosaic derived from biogeographically 618 distinct and highly divergent Sus scrofa populations. Molecular Ecology 23(16): 4089-4102. 619 Browning, S. R. and B. L. Browning, 2007. Rapid and accurate haplotype phasing and missing-data inference for 620 whole-genome association studies by use of localized haplotype clustering. American Journal of Human 621 Genetics 81(5): 1084-1097. 622 Buzanskas, M. E., R. V. Ventura, T. C. S. Chud, P. A. Bernardes, D. J. D. Santos, L. C. D. Regitano, M. M. de 623 Alencar, M. D. Mudadu, R. Zanella, M. V. G. B. da Silva, C. X. Li, F. S. Schenkel and D. P. Munari, 2017. Study 624 on the introgression of beef breeds in Canchim cattle using single nucleotide polymorphism markers. PloS 625 One 12(2): e0171660. 626 Capomaccio, S., M. Milanesi, L. Bomba, K. Cappelli, E. L. Nicolazzi, J. L. Williams, P. Ajmone-Marsan and B. 627 Stefanon, 2015. Searching new signals for production traits through gene-based association analysis in 628 three Italian cattle breeds. Animal Genetics 46(4): 361-370. 629 Chang, C. C., C. C. Chow, L. C. A. M. Tellier, S. Vattikuti, S. M. Purcell and J. J. Lee, 2015. Second-generation 630 PLINK: rising to the challenge of larger and richer datasets. Gigascience 4(1):7. 631 Cochran, S. D., J. B. Cole, D. J. Null and P. J. Hansen, 2013. Discovery of single nucleotide polymorphisms in 632 candidate genes associated with fertility and production traits in Holstein cattle. BMC Genetics 14(1):49. 633 Cohen-Zinder, M., E. Seroussi, D. M. Larkin, J. J. Loor, A. Everts-van der Wind, J. H. Lee, J. K. Drackley, M. R. 634 Band, A. G. Hernandez, M. Shani, H. A. Lewin, J. I. Weller and M. Ron, 2005. Identification of a missense 635 mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6 affecting milk yield 636 and composition in Holstein cattle. Genome Research 15(7): 936-944. 637 Cole, J. B., B. Waurich, M. Wensch-Dorendorf, D. M. Bickhart and H. H. Swalve, 2014. A genome-wide 638 association study of calf birth weight in Holstein cattle using single nucleotide polymorphisms and 639 phenotypes predicted from auxiliary traits. Journal of Dairy Science 97(5): 3156-3172. 640 Cole, J. B., G. R. Wiggans, L. Ma, T. S. Sonstegard, T. J. Lawlor, Jr., B. A. Crooker, C. P. Van Tassell, J. Yang, S. 641 Wang, L. K. Matukumalli and Y. Da, 2011. Genome-wide association analysis of thirty one production, 642 health, reproduction and body conformation traits in contemporary U.S. Holstein cows. BMC Genomics 643 12(1): 408. 644 Daetwyler, H. D., A. Capitan, H. Pausch, P. Stothard, R. Van Binsbergen, R. F. Brondum, X. P. Liao, A. Djari, S. C. 645 Rodriguez, C. Grohs, D. Esquerre, O. Bouchez, M. N. Rossignol, C. Klopp, D. Rocha, S. Fritz, A. Eggen, P. J. 646 Bowman, D. Coote, A. J. Chamberlain, C. Anderson, C. P. VanTassell, I. Hulsegge, M. E. Goddard, B. 647 Guldbrandtsen, M. S. Lund, R. F. Veerkamp, D. A. Boichard, R. Fries and B. J. Hayes, 2014. Whole-genome 648 sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genetics 649 46(8): 858-865. 650 Davis, S. R., R. J. Spelman and M. D. Littlejohn, 2017. BREEDING AND GENETICS SYMPOSIUM:Breeding heat 651 tolerant dairy cattle: the case for introgression of the "slick" prolactin receptor variant into dairy breeds. 652 Journal of Animal Science 95(4): 1788-1800. 653 Deschamps, M., G. Laval, M. Fagny, Y. Itan, L. Abel, J. L. Casanova, E. Patin and L. Quintana-Murci, 2016. 654 Genomic Signatures of Selective Pressures and Introgression from Archaic Hominins at Human Innate 655 Immunity Genes. American Journal of Human Genetics 98(1): 5-21. 656 Eberlein, A., A. Takasuga, K. Setoguchi, R. Pfuhl, K. Flisikowski, R. Fries, N. Klopp, R. Furbass, R. Weikard and C. 657 Kuhn, 2009. Dissection of Genetic Factors Modulating Fetal Growth in Cattle Indicates a Substantial Role of 658 the Non-SMC Condensin I Complex, Subunit G (NCAPG) Gene. Genetics 183(3): 951-964. 659 Fang, M., W. X. Fu, D. Jiang, Q. Zhang, D. X. Sun, X. D. Ding and J. F. Liu, 2014. A Multiple-SNP Approach for 660 Genome-Wide Association Study of Milk Production Traits in Chinese Holstein Cattle. PloS One 9(8): e99544. 661 Figueiro, H. V., G. Li, F. J. Trindade, J. Assis, F. Pais, G. Fernandes, S. H. D. Santos, G. M. Hughes, A. Komissarov, 662 A. Antunes, C. S. Trinca, M. R. Rodrigues, T. Linderoth, K. Bi, L. Silveira, F. C. C. Azevedo, D. Kantek, E. 663 Ramalho, R. A. Brassaloti, P. M. S. Villela, A. L. V. Nunes, R. H. F. Teixeira, R. G. Morato, D. Loska, P. 664 Saragueta, T. Gabaldon, E. C. Teeling, S. J. O'Brien, R. Nielsen, L. L. Coutinho, G. Oliveira, W. J. Murphy and E. 665 Eizirik, 2017. Genome-wide signatures of complex introgression and adaptive evolution in the big cats. 666 Science Advances 3(7): e1700299. 667 Fontanesi, L., D. G. Calo, G. Galimberti, R. Negrini, R. Marino, A. Nardone, P. Ajmone-Marsan and V. Russo, 668 2014. A candidate gene association study for nine economically important traits in Italian Holstein cattle. 669 Animal Genetics 45(4): 576-580. 670 Frischknecht, M., B. Bapst, F. R. Seefried, H. Signer-Hasler, D. Garrick, C. Stricker, R. Fries, I. Russ, J. Solkner, A. 671 Bieber, M. G. Strillacci, B. Gredler-Grandl, C. Flury and I. Consortium, 2017. Genome-wide association 672 studies of fertility and calving traits in Brown Swiss cattle using imputed whole-genome sequences. BMC 673 Genomics 18(1):910. 674 Gaddis, K. L. P., D. J. Null and J. B. Cole, 2016. Explorations in genome-wide association studies and network 675 analyses with dairy cattle fertility traits. Journal of Dairy Science 99(8): 6420-6435. 676 Galov, A., E. Fabbri, R. Caniglia, H. Arbanasic, S. Lapalombella, T. Florijancic, I. Boskovic, M. Galaverni and E. 677 Randi, 2015. First evidence of hybridization between golden jackal (Canis aureus) and domestic dog (Canis 678 familiaris) as revealed by genetic markers. Royal Society Open Science 2(12): 150450. 679 Gautier, M. and R. Vitalis, 2012. rehh: an R package to detect footprints of selection in genome-wide SNP data 680 from haplotype structure. Bioinformatics 28(8): 1176-1177. 681 Grun, B. and F. Leisch, 2008. FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and 682 Constant Parameters. Journal of Statistical Software 28(4): 1-35. 683 Guan, Y. T., 2014. Detecting Structure of Haplotypes and Local Ancestry. Genetics 196(3): 625-642. 684 Hartwig, S., R. Wellmann, R. Emmerling, H. Hamann and J. Bennewitz, 2015. Short communication: Importance 685 of introgression for milk traits in the German Vorderwald and Hinterwald cattle. Journal of Dairy Science 686 98(3): 2033-2038. 687 Hasenkamp, N., T. Solomon and D. Tautz, 2015. Selective sweeps versus introgression - population genetic 688 dynamics of the murine leukemia virus receptor Xpr1 in wild populations of the house mouse (Mus 689 musculus). BMC Evolutionary Biology 15(1):248. 690 Hering, D. M., K. Olenski and S. Kaminski, 2014. Genome-wide association study for poor sperm motility in 691 Holstein-Friesian bulls. Animal Reproduction Science 146(3-4): 89-97. 692 Hoglund, J. K., B. Buitenhuis, B. Guldbrandtsen, M. S. Lund and G. Sahana, 2015. Genome-wide association 693 study for female fertility in Nordic Red cattle. BMC Genetics 16(1):110. 694 Hope, A. C. A., 1968. A Simplified Monte Carlo Significance Test Procedure. Journal of the Royal Statistical 695 Society Series B-Statistical Methodology 30(3): 582-598. 696 Hu, Z. L., C. A. Park and J. M. Reecy, 2016. Developmental progress and current status of the Animal QTLdb. 697 Nucleic Acids Research 44(D1): D827-D833. 698 Jagoda, E., D. J. Lawson, J. D. Wall, D. Lambert, C. Muller, M. Westaway, M. Leavesley, T. D. Capellini, M. 699 Mirazon Lahr, P. Gerbault, M. G. Thomas, A. B. Migliano, E. Willerslev, M. Metspalu and L. Pagani, 2017. 700 Disentangling Immediate Adaptive Introgression from Selection on Standing Introgressed Variation in 701 Humans. Molecular Biology Evolution 35(3):623-630. 702 Kantanen, J., I. Olsaker, L. E. Holm, S. Lien, J. Vilkki, K. Brusgaard, E. Eythorsdottir, B. Danell and S. Adalsteinsson, 703 2000. Genetic diversity and population structure of 20 North European cattle breeds. Journal of Heredity 704 91(6): 446-457. 705 Keele, J. W., L. A. Kuehn, T. G. McDaneld, R. G. Tait, S. A. Jones, T. P. L. Smith, S. D. Shackelford, D. A. King, T. L. 706 Wheeler, A. K. Lindholm-Perry and A. K. McNeel, 2015. Genomewide association study of lung lesions in 707 cattle using sample pooling. Journal of Animal Science 93(3): 956-964. 708 Kinsella, R. J., A. Kahari, S. Haider, J. Zamora, G. Proctor, G. Spudich, J. Almeida-King, D. Staines, P. Derwent, A. 709 Kerhornou, P. Kersey and P. Flicek, 2011. Ensembl BioMarts: a hub for data retrieval across taxonomic 710 space. Database-the Journal of Biological Databases and Curation. 711 Kolbehdari, D., Z. Wang, J. R. Grant, B. Murdoch, A. Prasad, Z. Xiu, E. Marques, P. Stothard and S. S. Moore, 712 2008. A whole-genome scan to map quantitative trait loci for conformation and functional traits in 713 Canadian Holstein Bulls. Journal of Dairy Science 91(7): 2844-2856. 714 Li, H. and R. Durbin, 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. 715 Bioinformatics 25(14): 1754-1760. 716 Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin and G. P. D. 717 Proc, 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16): 2078-2079. 718 Lindholm-Perry, A. K., A. K. Sexten, L. A. Kuehn, T. P. L. Smith, D. A. King, S. D. Shackelford, T. L. Wheeler, C. L. 719 Ferrell, T. G. Jenkins, W. M. Snelling and H. C. Freetly, 2011. Association, effects and validation of 720 polymorphisms within the NCAPG-LCORL locus located on BTA6 with feed intake, gain, meat and carcass 721 traits in beef cattle. BMC Genetics 12(1):103. 722 Magee, D. A., K. M. Sikora, E. W. Berkowicz, D. P. Berry, D. J. Howard, M. P. Mullen, R. D. Evans, C. Spillane and 723 D. E. MacHugh, 2010. DNA sequence polymorphisms in a panel of eight candidate bovine imprinted genes 724 and their association with performance traits in Irish Holstein-Friesian cattle. BMC Genetics 11(1):93. 725 Mao, X., N. K. Kadri, J. R. Thomasen, D. J. De Koning, G. Sahana and B. Guldbrandtsen, 2016. Fine mapping of a 726 calving QTL on Bos taurus autosome 18 in Holstein cattle. Journal of Animal Breeding and Genetics 133(3): 727 207-218. 728 McKenna, A., M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, 729 M. Daly and M. A. DePristo, 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing 730 next-generation DNA sequencing data. Genome Research 20: 1297–1303. 731 Merks, J. W. M., P. K. Mathur and E. F. Knol, 2012. New phenotypes for new breeding goals in pigs. Animal 6(4): 732 535-543. 733 Michenet, A., R. Saintilan, E. Venot and F. Phocas, 2016. Insights into the genetic variation of maternal behavior 734 and suckling performance of continental beef cows. Genetics Selection Evolution 48(1):45. 735 Pickrell, J. K. and J. K. Pritchard, 2012. Inference of Population Splits and Mixtures from Genome-Wide Allele 736 Frequency Data. PloS Genetics 8(11): e1002967. 737 Pimentel, E. C. G., S. Bauersachs, M. Tietze, H. Simianer, J. Tetens, G. Thaller, F. Reinhardt, E. Wolf and S. Konig, 738 2011. Exploration of relationships between production and fertility traits in dairy cattle via association 739 studies of SNPs within candidate genes derived by expression profiling. Animal Genetics 42(3): 251-262. 740 Prakash, V., T. K. Bhattacharya, B. Jyotsana and O. P. Pandey, 2011. Molecular Cloning, Characterization, 741 Polymorphism, and Association Study of the Interleukin-2 Gene in Indian Crossbred Cattle. Biochemical 742 Genetics 49(9-10): 638-644. 743 Sabeti, P. C., D. E. Reich, J. M. Higgins, H. Z. P. Levine, D. J. Richter, S. F. Schaffner, S. B. Gabriel, J. V. Platko, N. J. 744 Patterson, G. J. McDonald, H. C. Ackerman, S. J. Campbell, D. Altshuler, R. Cooper, D. Kwiatkowski, R. Ward 745 and E. S. Lander, 2002. Detecting recent positive selection in the human genome from haplotype structure. 746 Nature 419(6909): 832-837. 747 Sahana, G., J. K. Hoglund, B. Guldbrandtsen and M. S. Lund, 2015. Loci associated with adult stature also affect 748 calf birth survival in cattle. BMC Genetics 16(1):47. 749 Sasago, N., T. Abe, H. Sakuma, T. Kojima and Y. Uemoto, 2017. Genome-wide association study for carcass 750 traits, fatty acid composition, chemical composition, sugar, and the effects of related candidate genes in 751 Japanese Black cattle. Animal Science Journal 88(1): 33-44. 752 Setoguchi, K., T. Watanabe, R. Weikard, E. Albrecht, C. Kuhn, A. Kinoshita, Y. Sugimoto and A. Takasuga, 2011. 753 The SNP c.1326T > G in the non-SMC condensin I complex, subunit G (NCAPG) gene encoding a p.Ile442Met 754 variant is associated with an increase in body frame size at puberty in cattle. Animal Genetics 42(6): 650- 755 655. 756 Sherry, S. T., M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski and K. Sirotkin, 2001. dbSNP: the NCBI 757 database of genetic variation. Nucleic Acids Research 29(1): 308-311. 758 Sun, C., C. Southard, D. B. Witonsky, R. Kittler and A. Di Rienzo, 2010. Allele-Specific Down-Regulation of RPTOR 759 Expression Induced by Retinoids Contributes to Climate Adaptations. PloS Genetics 6(10):e1001178. 760 Saatchi, M., R. D. Schnabel, J. F. Taylor and D. J. Garrick, 2014. Large-effect pleiotropic or closely linked QTL 761 segregate within and across ten US cattle breeds. BMC Genomics 15(1):442. 762 Voight, B. F., S. Kudaravalli, X. Q. Wen and J. K. Pritchard, 2006. A map of recent positive selection in the human 763 genome. PloS Biology 4(3): 446-458. 764 vonHoldt, B. M., R. Kays, J. P. Pollinger and R. K. Wayne, 2016. Admixture mapping identifies introgressed 765 genomic regions in North American canids. Molecular Ecology 25(11): 2443-2453. 766 Weir, B. S. and C. C. Cockerham, 1984. Estimating F-Statistics for the Analysis of Population-Structure. Evolution 767 38(6): 1358-1370. 768 Weng, Z. Q., M. Saatchi, R. D. Schnabel, J. F. Taylor and D. J. Garrick, 2014. Recombination locations and rates in 769 beef cattle assessed from parent-offspring pairs. Genetics Selection Evolution 46(1):34. 770 White, S., 2011. from globalized PIG BREEDS TO CAPITALIST PIGS: A STUDY IN ANIMAL CULTURES AND 771 EVOLUTIONARY HISTORY. Environmental History 16(1): 94-120. 772 Williams, R. L., 2000. A note on robust variance estimation for cluster-correlated data. Biometrics 56(2): 645- 773 646. 774 Wu, X. P., B. Guldbrandtsen, U. S. Nielsen, M. S. Lund and G. Sahana, 2017. Association analysis for young stock 775 survival index with imputed whole-genome sequence variants in Nordic Holstein cattle. Journal of Dairy 776 Science 100(8): 6356-6370. 777 Xia, J. W., H. Z. Fan, T. P. Chang, L. Y. Xu, W. G. Zhang, Y. X. Song, B. Zhu, L. P. Zhang, X. Gao, Y. Chen, J. Y. Li and 778 H. J. Gao, 2017. Searching for new loci and candidate genes for economically important traits through 779 genebased association analysis of Simmental cattle. Scientific Reports 7:42048. 780 Yang, J. A., S. H. Lee, M. E. Goddard and P. M. Visscher, 2011. GCTA: A Tool for Genome-wide Complex Trait 781 Analysis. American Journal of Human Genetics 88(1): 76-82. 782 Zhang, Q. Q., M. P. L. Calus, B. Guldbrandtsen, M. S. Lund and G. Sahana, 2015. Estimation of inbreeding using 783 pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genetics 16(1):88. 784 Zhang, Q. Q., B. Guldbrandtsen, M. Bosse, M. S. Lund and G. Sahana, 2015. Runs of homozygosity and 785 distribution of functional variants in the cattle genome. BMC Genomics 16(1):542. 786 Zheng, X., Z. H. Ju, J. Wang, Q. L. Li, J. M. Huang, A. W. Zhang, J. F. Zhong and C. F. Wang, 2011. Single 787 nucleotide polymorphisms, haplotypes and combined genotypes of LAP3 gene in bovine and their 788 association with milk production traits. Molecular Biology Reports 38(6): 4053-4061. 789 Zimin, A. V., A. L. Delcher, L. Florea, D. R. Kelley, M. C. Schatz, D. Puiu, F. Hanrahan, G. Pertea, C. P. Van Tassell, 790 T. S. Sonstegard, G. Marcais, M. Roberts, P. Subramanian, J. A. Yorke and S. L. Salzberg, 2009. A whole- 791 genome assembly of the domestic cow, Bos taurus. Genome Biology 10(4):R42.

792

793