<<

bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Genetics of recombination rate variation in the pig 2 3 Martin Johnsson1,2*, Andrew Whalen1*, Roger Ros-Freixedes1,3, Gregor Gorjanc1, Ching-Yi 4 Chen4, William O. Herring4, Dirk-Jan de Koning2, John M. Hickey1† 5 6 1 The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of 7 Edinburgh, Midlothian, EH25 9RG, Scotland, United Kingdom. 8 2 Department of Animal Breeding and Genetics, Swedish University of Agricultural 9 Sciences, Box 7023, 750 07 Uppsala, Sweden. 10 3 Departament de Ciència Animal, Universitat de Lleida-Agrotecnio Center, Lleida,Spain. 11 4 Genus plc, 100 Bluegrass Commons Blvd., Ste122200, Hendersonville, TN 37075, USA. 12 13 * Equal contributions 14 † Corresponding author 15 [email protected] 16

17 Abstract 18 Background 19 In this paper, we estimated recombination rate variation within the genome and between 20 individuals in the pig using multiocus iterative peeling for 150,000 pigs across nine 21 genotyped pedigrees. We used this to estimate the heritability of recombination and perform 22 a genome-wide association study of recombination in the pig. 23 24 Results 25 Our results confirmed known features of the pig recombination landscape, including 26 differences in length, and marked sex differences. The recombination landscape 27 was repeatable between lines, but at the same time, the lines also showed differences in 28 average genome-wide recombination rate. The heritability of genome-wide recombination 29 was low but non-zero (on average 0.07 for females and 0.05 for males). We found three 30 genomic regions associated with recombination rate, one of them harbouring the RNF212 31 , previously associated with recombination rate in several other species. 32 33 Conclusion bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

34 Our results from the pig agree with the picture of recombination rate variation in vertebrates, 35 with low but nonzero heritability, and a major locus that is homologous to one detected in 36 several other species. This work also highlights the utility of using large-scale livestock data 37 to understand biological processes. 38

39 Background 40 41 This paper shows that recombination rate in the pig (Sus scrofa) is genetically variable and 42 associated with alleles at the RNF212 gene. 43 44 Recombination causes exchange of genetic material between homologous . At 45 meiosis, after chromosomes have been paired up and duplicated, they break and exchange 46 pieces of chromosome arms. These recombinations are not evenly distributed along 47 chromosomes. This gives rise to a variable recombination rate landscape with peaks and 48 troughs. 49 50 The recombination rate landscape of the pig has been estimated previously [1]. It shows 51 broadly the same features as in other mammals: low recombination rate in the centre of 52 chromosomes, local hotspots of high recombination rate, a correlation between 53 recombination rate and the fraction of guanine and cytosine bases (GC content), and sex 54 difference in recombination rate [2–4]. In this paper, we investigated how recombination rate 55 varied between individuals and populations in the pig. 56 57 Recombination rate is genetically variable in several other species. Studies in humans [5, 6], 58 cattle [7–10], deer [11], sheep [12, 13] and chickens [14] have observed genetic influence on 59 recombination rate, and genetic associations with alleles at a handful of involved in 60 meiosis, including RNF212, REC8 and PRDM9 (reviewed by [15, 16]). 61 62 To be able to analyse the genetic basis of recombination, we need recombination estimates 63 from a large number related pigs. Recombination rate can be estimated by phasing genotypes 64 in pedigrees [17–20], by direct counting in gametes [21, 22], or by measuring linkage 65 disequilibrium in population samples [2]. Counting methods require specific experiments to bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

66 gather data. Linkage disequilibrium methods only provide averages for a population. In this 67 paper, we used a new pedigree method based on multilocus iterative peeling [23, 24] to 68 estimate recombination simultaneously with genotype imputation. This allowed us to use data 69 from a pig breeding programme, where variable density genotype data has been gathered for 70 genomic selection. 71 72 Our results confirmed known features of the pig recombination landscape, including 73 differences in chromosome length, and marked sex difference. The recombination landscape 74 was repeatable between lines, but at the same time, the lines showed differences in average 75 genome-wide recombination rate. The heritability of genome-wide recombination was low 76 but non-zero. We found three genomic regions associated with recombination rate, one of 77 them harbouring the RNF212 gene, previously associated with recombination rate in several 78 other species.

79 Methods 80 81 We estimated the recombination rate landscape in nine lines of pigs from a commercial 82 breeding programme. We performed six analyses: 83 (1) We estimated the average number of recombinations on each chromosome (the 84 genetic length of chromosomes), and analysed between-sex and between-line 85 differences in genetic length. We compared these estimates to previously published 86 estimates. 87 (2) We estimated the distribution of recombinations along chromosomes (recombination 88 rate landscapes), and analysed between-line and between-sex differences. 89 (3) We estimated the correlation between recombination rate and DNA sequence features 90 previously known to correlate with recombination rate. 91 (4) We estimated pedigree heritability and genomic heritability of recombination rate. 92 (5) We ran a genome-wide association study to detect markers associated with 93 recombination rate. 94 (6) We ran a simulation to test the performance of the method. 95 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

96 Data 97 98 We used SNP chip genotype data from nine lines of pigs from the Pig Improvement 99 Company (PIC) breeding programme. This programme contains a diverse collection of 100 genetics, which represent broadly used populations, including animals of Large White, 101 Landrace, Duroc, Hampshire and Pietrain heritage. The pigs were genotyped at a mix of 102 densities; either at low density (15K markers) using GGP-Porcine LD BeadChips (GeneSeek, 103 Lincoln, NE) or at high density (60K or 75K markers) using GGP-Porcine HD BeadChips 104 (GeneSeek, Lincoln, NE). In total, genotype data was available on 390,758 pigs. 105 106 Recombination rate estimation using multilocus iterative peeling 107 108 We used multilocus iterative peeling to estimate the number and location of the 109 recombination events in each individual [23, 24]. Multilocus iterative peeling uses pedigree 110 data to calculate the phased genotype of each individual as a combination of information 111 from the individual’s own genetic data, and that of their parents (anterior probabilities) and 112 offspring (posterior probabilities) [25]. Multilocus iterative peeling builds on previous 113 peeling algorithms by tracking which parental haplotype an individual inherits at each locus 114 (segregation probabilities). This information can be used to determine which allele an 115 individual inherits, particularly from parents who are heterozygous for that allele. 116 117 The segregation probabilities can be used to determine the number and location of likely 118 recombination events. When a recombination happens, the offspring will inherit from a 119 different parental haplotype. This will cause one, or both of the segregation probabilities to 120 change, i.e. the segregation probability will change from a value close to 0 (likely to inherit 121 the maternal haplotype) to 1 (likely to inherit the paternal haplotype). By analysing the joint 122 distribution of neighbouring segregation probabilities, we are able to calculate the expected 123 number of recombinations between two loci, and the expected number of recombinations 124 across an entire chromosome. 125 126 To aid recombination rate estimation, we introduced two simplifications to the multilocus 127 peeling method: bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

128 1. The segregation probabilities and the anterior probabilities were calculated separately 129 for each parent in lieu of modelling their full joint distribution. 130 2. The segregation and genotype probabilities of the offspring were called when 131 estimating the posterior term for each parent. 132 133 These simplifications were introduced to reduce runtime and memory requirements. In 134 particular, by calling the segregation and genotype values, we are able to store many of the 135 calculations in lookup tables instead of re-computing them for each locus, and each 136 individual. In addition, the calling of segregation values reduced the chance that feedback 137 loops occurred between offspring with fractional segregation values at multiple nearby loci. 138 139 A calling threshold of 0.99 was used for the segregation probabilities, and a calling threshold 140 of 0.9 was used for the genotype probabilities. Segregation probabilities that did not reach the 141 threshold were set to missing (equally likely to inherit either parental haplotype). Genotype 142 probabilities that did not meet the threshold were also set to missing (all genotype states 143 equally likely). 144 145 The joint distribution of segregation values depends on the chromosome length (in cM). To 146 estimate chromosome length, we initialized the length to 100cM (on average 1 recombination 147 per chromosome), and then refined this estimate in a series of steps. At each step we 148 calculated the expected number of recombination for each individual at each locus, and set 149 the chromosome length based on the average population recombination rate. This step was 150 repeated four times. Preliminary simulations found that chromosome length estimates 151 converged after four iterations, and that the recombination estimates for target individuals 152 were insensitive to the assumed chromosome length. 153 154 Filtering of individuals 155 156 After recombination estimation, we filtered the data by removing individuals without 157 genotyped parents and grandparents in order to focus on those with high-quality 158 recombination estimates. Filtering reduced the number of pigs to 145,763. Table 1 shows the 159 resulting number of individuals per line post-filtering, and the total number of dams and sires 160 for those individuals. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

161 162 Table 1. Number of individuals that passed filtering in each line, and the unique number of 163 their dams and sires. By necessity, we inferred recombination rates from an equal number of 164 maternal and paternal chromosomes, but they derive from a much larger number of dams 165 than sires. Line Individuals kept Dams Sires 1 23273 2651 437 2 16661 2255 368 3 14278 2169 215 4 7153 1239 163 5 33566 4349 293 6 11666 1971 162 7 263 76 20 8 4177 727 78 9 34726 5171 492 166 167 168 Comparison between lines and to published maps 169 170 To compare the recombination landscapes of the nine lines we calculated pairwise 171 correlations between lines of the estimated recombination rates at each marker interval, 172 within each sex. To compare the recombination landscapes of the sexes, we calculated the 173 correlation between sexes within each line. 174 175 We compared map length between lines using a linear model, fitting the number of 176 recombinations observed on a chromosome as response, and fixed effects for each line and 177 chromosome. 178 179 To compare the estimated landscapes to published landscapes, we also compared our results 180 to the results of [1] by plotting our map length of each chromosome against published map 181 lengths. 182 183 Correlation with genome features 184 185 To investigate the relationship between local recombination rate and genomic features, we 186 divided the autosomal part of the Sscrofa11.1 genome [26] into 2272 windows of 1 Mbp. We bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

187 used Biostrings version 2.52.0 in the R statistical environment to estimate three features of 188 sequence composition: 189 • fraction of guanine and cytosine bases (GC content); 190 • the PDRM9 consensus motif CCNCCNTNNCCNC [27]; 191 • the CCCCACCCC motif, which was the most strongly associated with recombination 192 in the pig in [1]. 193 194 We used repeat data from RepeatMasker (http://www.repeatmasker.org) [28] from the pig 195 genome to estimate the density of repeats in the same windows. We subdivided the total 196 content of repeats into three broad categories: 197 • Fraction of LTR elements 198 • Fraction of DNA repeats elements 199 • Fraction of low complexity repeats 200 201 We calculated the correlation between the recombination rate and the sequence features 202 within each window. 203 204 To find putative pericentromeric regions, we used the inferred centromere positions from 205 [26]. On chromosomes 8, 11 and 15, where there were more than one inferred location far 206 apart, we picked the most likely location based on karyotypes from [29]. 207 208 Heritability of genome-wide recombination rate 209 210 We estimated the narrow-sense heritability of genome-wide recombination rate using animal 211 models in MCMCglmm [30] version 2.29. We estimated the heritability of recombination 212 using genome-wide recombination rates per megabasepair. We fitted a pedigree animal 213 model with an additive genetic effect and a permanent environmental effect for each parent 214 as random effects. Because we measured recombination rate in parents of genotyped 215 offspring, who have varying numbers of offspring (see Table 1), we used a model with 216 repeated records and a permanent environmental effect for each parent. We analysed sexes 217 and lines separately. We used parameter expanded priors [31] for the individual variance

218 component and for the additive genetic variance component, using V = 1, ν = 1, αµ = 0, αV = 219 1000, which corresponds to a half-Cauchy prior with scale 100, and an inverse-Wishart prior bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

220 (V = 1, ν = 1) for the residual variance. Because of the low number of dams and sires, we 221 excluded the smallest line (line 7) from the quantitative genetic analysis. We also excluded 222 parents with an extremely high average recombination rate (> 5 cM/Mbp). 223 224 Genome-wide association 225 226 We performed genome-wide association studies of genome-wide recombination rates using 227 hierarchical linear mixed models in RepeatABEL [32] version 1.1. The linear mixed model 228 uses a genomic relationship matrix to account for relatedness while including a random 229 permanent environmental effect for each parent. We analysed sexes and lines separately. We 230 used imputed best-guess genotypes from the same run of AlphaPeel. Because of the low 231 number of dams and sires, we again excluded line 7 from the analysis, and parents with 232 average recombination rate > 5 cM/Mbp. We report significant markers below a conventional 233 threshold of p < 5 · 10-8. We used the most significant marker in each region to report 234 variance explained and the frequency of the allele associated with higher recombination. 235 When there were more than one marker with the same p-value, we selected the marker 236 closest to the middle of the interval. 237 238 Simulations 239 240 To demonstrate that the method works, we tested it on a synthetic dataset with features 241 similar to real data. We simulated genotype data with AlphaSimR 0.10.0. We simulated one 242 chromosome, using the same pedigree and same number of genotyped markers as the largest 243 line. The simulated recombination landscape had a constant recombination rate in the middle 244 of the chromosome, and two regions of high recombination rate at the ends, described by 245 second degree polynomials (the figure shows the resulting true recombination rate). We 246 assessed accuracy of the inferred recombination landscape by calculating the correlation 247 between the estimated number of recombination at each marker interval and the true number 248 of recombination. We also calculated the correlation between the estimated number of 249 recombinations and a smoothed recombination landscape, using a window of 50 markers. 250 251 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

252 Results 253 254 Our results showed that: 255 (1) There was variation in the genetic length of chromosomes between sexes and lines. 256 (2) The recombination rate landscape was similar between lines but different between 257 sexes. 258 (3) We confirmed previous findings that local recombination rate is correlated with GC 259 content, repeat content, the CCCCACCCC sequence motif, but not the previously 260 described correlation with the PRDM9 consensus motif. 261 (4) The heritability of recombination rate was on average 0.07 for females and 0.05 for 262 males. 263 (5) Three regions of the genome were associated with recombination rate, one of them 264 containing the candidate gene RNF212. 265 (6) In simulation, we found that multilocus iterative peeling could estimate the number of 266 recombinations per individual with an accuracy of 0.7 for dams and 0.5 for sires, and 267 the average recombination landscape along a chromosome, but with a tendency to 268 overestimate the genetic length. 269 270 Variation in genetic map length between lines and sexes 271 272 The genetic length of chromosomes was different between lines and sexes. Figure 1 shows 273 the estimated map length of each chromosome, along with previously published estimates 274 [1]. Table 2 gives the estimated of total map length in each sex and line, with confidence 275 intervals derived from a linear model. On average, we estimated a sex-averaged map of 21.5 276 Morgan (0.95 cM/Mbp), a female map of 23.6 Morgan (1.04 cM/Mbp), and a male map of 277 19.5 Morgan (0.86 cM/Mbp). Supplementary tables 1-3 contain male, female, and sex- 278 averaged consensus maps of the pig recombination landscape. 279 280 Our estimated genetic lengths of chromosomes were comparable to previous estimates, but 281 tended to be higher. We found that females have higher recombination rate, except on 282 chromosome 1, where male recombination rate was higher, and chromosome 13, where the 283 recombination rate is similar in both sexes. This confirms previous results [1]. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

284 Table 2. Estimates from linear model of total map length. Intervals are 95% confidence 285 intervals. Map length Rate Line Sex (Morgan) Lower Upper (cM/Mbp) 1 female 23.6 23.5 23.6 1.04 1 male 19.4 19.4 19.5 0.86 2 female 24.1 24.1 24.2 1.06 2 male 20.0 20.0 20.0 0.88 3 female 22.3 22.2 22.3 0.98 3 male 18.2 18.1 18.2 0.80 4 female 23.5 23.4 23.5 1.04 4 male 19.3 19.3 19.4 0.85 5 female 22.8 22.7 22.8 1.01 5 male 18.7 18.6 18.7 0.82 6 female 23.7 23.6 23.7 1.04 6 male 19.5 19.5 19.6 0.86 7 female 25.9 25.5 26.2 1.14 7 male 21.7 21.4 22.1 0.96 8 female 24.1 24.0 24.2 1.06 8 male 20.0 19.9 20.1 0.88 9 female 22.6 22.6 22.6 1.00 9 male 18.5 18.4 18.5 0.82 Average female 23.6 1.04 male 19.5 0.86 sex-average 21.5 0.95 286 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

287 A Estimated genetic length

2.0 ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1.5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 1.0 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● Genetic length (Morgan) ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.5 ● 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

B 2.0

1.5

1.0 Genetic length (Morgan)

0.5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

Sex male female 288 289 Figure 1. Genetic length of each pig autosome, as estimated by multilocus iterative peeling. 290 The horizontal axis corresponds to chromosomes 1-18. Red dots and lines show female 291 estimates, while blue dots and lines show male estimates. Panel A compares estimates from 292 multilocus iterative peeling (filled dots) to estimates from [1] (open circles). Panel B shows 293 estimates from the same line of pigs connected by lines. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

294 Difference in recombination landscape between sexes 295 296 The shape of the recombination landscape was similar between lines but different between 297 sexes. Figure 2 presents the recombination rate landscape for each chromosome, and Figure 3 298 shows the correlation between the per-marker interval recombination rate estimates, between 299 lines and between sexes. Both sexes had higher recombination rate near chromosome ends 300 and lower recombination rate in the middle of the chromosomes. However, there were several 301 broad regions of elevated female recombination rate which was not present in the males. 302 These regions were repeatable between lines. The mean between-line correlation was 0.83 in 303 females and 0.70 in males, whereas the mean correlation between sexes was 0.40 across 304 lines. 305 306 Correlation between genomic features and recombination rate 307 308 Local recombination rate had moderate to low correlation (absolute correlation coefficients 309 less than 0.33) with GC content, repeats and particular sequence motifs. Figure 4 shows the 310 correlations between recombination rate and genomic features in 1 Mbp windows, separated 311 by sex. There were positive correlations with GC content, and negative correlation with 312 sequence repeats when all repeat classes were combined. The correlation between 313 recombination rate and different types of repeats was variable. Recombination rate was only 314 weakly correlated with counts of the PRDM9 consensus motif CCNCCNTNNCCNC, but 315 moderately correlated with counts of the CCCCACCCC motif, previously found to be 316 enriched in high recombination regions in the pig genome [1]. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Recombination landscape

1 2 3 0.050

0.025

0.000 0 100 200 0 50 100 150 0 50 100

4 5 6 0.050

0.025

0.000 0 50 100 0 25 50 75 100 0 50 100 150

7 8 9 0.050

0.025

0.000 0 25 50 75 100 125 0 50 100 0 50 100

10 11 12 0.050

0.025

Recombination rate (per Mbp) Recombination rate 0.000 0 20 40 60 0 20 40 60 80 0 20 40 60

13 14 15 0.050

0.025

0.000 0 50 100 150 200 0 50 100 0 50 100

16 17 18 0.050

0.025

0.000 0 20 40 60 80 0 20 40 60 0 20 40 Position (Mbp)

Sex Female Male 317 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

318 Figure 2. Recombination landscape in the pig. The lines show recombination rate in windows 319 of 1 Mbp along the pig genome (Sscrofa11.1). Red lines show female estimates and blue lines 320 show male estimates. Each line shows one of the nine breeding lines. The black vertical lines 321 are predicted centromere locations in the reference genome, for chromosomes where they are 322 available. 323 324 325 Between lines, female Between lines, male Between sexes

9 0.93 0.91 0.91 0.93 0.96 0.9 0.67 0.89 1 9 0.84 0.77 0.8 0.85 0.95 0.85 0.51 0.79 1 9 0.45

8 0.87 0.85 0.83 0.85 0.86 0.85 0.67 1 0.89 8 0.72 0.68 0.6 0.68 0.76 0.73 0.51 1 0.79 8 0.42

7 0.64 0.63 0.62 0.65 0.65 0.64 1 0.67 0.67 7 0.47 0.44 0.4 0.45 0.5 0.49 1 0.51 0.51 7 0.19 cor 1.00 6 0.86 0.85 0.87 0.88 0.87 1 0.64 0.85 0.9 6 0.76 0.7 0.75 0.79 0.84 1 0.49 0.73 0.85 6 0.39

0.75 5 0.9 0.89 0.89 0.91 1 0.87 0.65 0.86 0.96 5 0.82 0.81 0.8 0.84 1 0.84 0.5 0.76 0.95 5 0.47

Line 0.50

4 0.88 0.87 0.9 1 0.91 0.88 0.65 0.85 0.93 4 0.76 0.69 0.78 1 0.84 0.79 0.45 0.68 0.85 4 0.41 0.25

0.00 3 0.87 0.84 1 0.9 0.89 0.87 0.62 0.83 0.91 3 0.74 0.63 1 0.78 0.8 0.75 0.4 0.6 0.8 3 0.4

2 0.89 1 0.84 0.87 0.89 0.85 0.63 0.85 0.91 2 0.71 1 0.63 0.69 0.81 0.7 0.44 0.68 0.77 2 0.47

1 1 0.89 0.87 0.88 0.9 0.86 0.64 0.87 0.93 1 1 0.71 0.74 0.76 0.82 0.76 0.47 0.72 0.84 1 0.42

1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 326 Line Line 327 Figure 3. Correlation heatmap of recombination landscapes between lines and sexes. 328 Heatmaps show pairwise correlations between lines of the estimated recombination rates at 329 each marker interval, within each sex, and the correlation between sexes within each line. 330 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Correlation with genomic features

GC content 0.33 0.21

PRDM9 consensus motif 0.024 0.019

CCCCACCCC motif 0.28 0.16

All repeats −0.16 −0.17 cor 1.0

0.5 SINE repeats 0.23 0.09 0.0

−0.5

LINE repeats −0.33 −0.25 −1.0

LTR repeats −0.069 −0.0028

DNA repeats −0.021 0.053

Simple repeats 0.16 0.072

Female Male Sex 331 332 Figure 4. Heatmap of correlation between genome features and recombination rate in 333 windows of 1 Mbp. The heatmap shows correlation between recombination rate sequence 334 features within 2272 windows of the autosomal part of the pig genome (Sscrofa11.1). bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

335 Heritability of recombination rate 336 337 Genome-wide recombination rate had low but nonzero heritability (h2 on average 0.07 for 338 females and 0.05 for males). Figure 5 shows the heritability and ratio of permanent 339 environmental variance, broken down by sex and line. There was little evidence of 340 differences in heritability between lines. The open circles in Figure 5 show genomic 341 heritability estimates from genome-wide association analyses. The genomic heritabilities 342 suggest that the SNP chip captured most (on average 83%) of the additive genetic variance in 343 recombination. 344 345 Genome-wide association of recombination rate 346 347 Genome-wide association revealed three regions of the genome containing markers 348 associated with genome-wide recombination rate. Figure 6 shows the results of genome-wide 349 association scans within each line, broken down by sex. Table 3 shows the location of the 350 most significant marker for each region with variance explained and allele frequency. There 351 was a region associated with female recombination rate at the start of chromosome 8 in six of 352 the lines, as well as a region on in line 1, and one on chromosome 1 in line 6. 353 The chromosome 8 region was also associated with male recombination rate in two lines. 354 Figure 7 shows a zoomed-in view of each of these regions, with the location of known 355 candidate genes involved in recombination. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Narrow−sense heritability 0.15

● 0.10

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Heritability 0.05 ● ● ● ● ● ● ● ● ● ● ● ● ●

0.00 1 2 3 4 5 6 8 9 Line Permanent environment 0.15

0.10

0.05 Variance ratio Variance ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.00 ● ● 1 2 3 4 5 6 8 9 Line

Sex ● female ● male 356 357 Figure 5. Heritability of average recombination. The dots show estimates of narrow-sense 358 heritability and the permanent environmental effect for average genome-wide recombination 359 estimated with an animal model. The lines show 95% credible intervals. Red dots and lines 360 show female estimates, while blue dots and lines show male estimates. Open circles show the 361 genomic heritability estimated from genome-wide association. Because of the low number of 362 dams and sires, we excluded the smallest line (line 7) from the analysis. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

363 364 Figure 6. Genome-wide association of average recombination. The subplots are Manhattan 365 plots of the negative logarithm of the p-value of association against genomic position, broken 366 down by line and sex. Alternating colours correspond to chromosomes 1-18. Because of the 367 low number of dams and sires, we excluded the smallest line (line 7) from the analysis. The 368 dashed red line shows a conventional genome-wide significance threshold of 5 · 10-8. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Chromosome 8 locus

Line 1, female Line 2, female ●●●● ●●●●●●● ● ●

● 15 ● ● ● ● ● ● ● ● ● ● 10 ● ●● ●● ●● ● ● ●●● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ●● ● ●● ● ●● ●● ● ● ● ● ● ●● ● ● ●●● ●● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ●●● ● ● ●● ● ●● ● ●● ● ● ● ● ● ● ● ●● ● 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ●● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ●● ●● ● ● ● ● ●● ●● ● ●● ● ● ● ● ● ●●● ● ●● ●● ● ● ● ●●●●● ●● ●●●● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●● ● ● ●● ● ● ●● ●●●●●● ● ● ●● ● ● ● ● ● ● ● ●●● ● ● ● ●● ●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●● ●●● ● ●●● ●● ● ●●●● ● ● ●●● ● ● ● ●●● ● ●●●●●● ●●●●●●●●●● ● ●● ●● ●●●●●●● ●●●●●●●●●●●●● ●● ●●●● ● ● ● ●●● ●● ● ●●● ●●● ●●●● ● ● ●●● ● ●●●● ●●●●● ●●●●●● ● ●●●●●●●●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●● ●●●● ●● ●● 0 ●●●●●● ●● ●●● ●●●●● ●● ● ●●●● ● ●●●●●●●●●●●●●● ● ●●●●● ● ●●● ●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●● ● ● ● ●● ●● ● ●●● ● ●● ●●●●●● ●●● ●● ●●●●● ● ●●●●● ● ●● ●●● ●●●●●●● ●●●●●● ●●●●●●●●●●● ●●●●● ●●●●●●●●●●●●●●●●●●●●●● ●●● ●●●●●●●●● ●●●●●●●●●●●

Line 3, female Line 3, male ●● ●● ● ●●

● ●● ● ● 15 ● ● ● ● ● ● ● ● ●● ●● ●●●● ● ● ● ● 10 ● ● ● ● ●● ● ● ●●● ● ● ● ●●●● ● ● ● ●● ● ●●● ● ● ● ● ●●● ● ● ● ● ● ● ●●● ● ● ●● ●●● ● ● ● ● ● ●● ● ● ●●●●●● ● ● ● ● 5 ● ● ● ●● ● ●● ● ● ● ●●●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●●●● ● ●● ● ● ● ● ●● ● ● ●●● ● ●● ● ● ● ● ● ● ●● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ●●●● ● ● ●●● ● ●● ● ● ● ●● ● ●● ●● ●● ● ●● ● ● ● ● ● ●●● ● ●●●●●● ●● ● ● ●● ●● ● ● ●● ● ●● ● ●●●●●●●●● ●● ● ●●● ●● ● ● ● ●● ● ●● ● ●●●●●●●●●● ●● ● ● ●●●● ● ● ● ● ●●●● ●●● ●● ● ●●● ● ●●●● ● ●● ● ●●●●● ● ●●●●●● ●●● ●●●●●●●●●●●●●● ● ●● ●●●● ●●●● ●●● ● ● ● ●●●●●●●● ● ●● ●● ●● ●●● ● ●●● ●●●●●● ● ● ●● ●● ●●●●●●●● ●●●● ● ●●●●●●●●●● ●● ● ● ●●●● ● ●●● ●●● 0 ●●● ● ●● ● ● ● ●●● ● ●●● ●●● ●●●●●●●●●●●●●● ●● ●●● ●●● ●● ● ●●● ●●●●● ● ● ● ●●● ●●●● ●●● ●●● ● ● ● ●●● ● ● ●●●●●●●●● ●●● ●● ● ●●●● ● ● ● ●●●●●●●●●● ●● ●●●●●●●●●● ●●●●● ●●●●●●●● ●●●●●● ●●●●●●●●●●●●●● ●● ●●●●● ●●●●●●● ● ●

Line 5, female Line 6, female

15 ●

● 10 ● ● ● ●●● ● ●● ● ●●●●● ● ● ● ●● ● ● ● ●● ● ● ● ● ● ●● ● ●●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●●●● ● ●●● ● ● ●● ● ● ● ●●●● ●●● ● ● 5 ● ●● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ●● ● ● ● ● ●●● ● ● ● ● ● ●● ● ●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●● ● ●●● ● ●●●● ● ●●● ● ● ● ● ●●● ● ●● ● ● ● ● ●● ● ●● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ●● ● ● ● ● ●● ●● ● ● ● ●●●●●● ● ● ●● ● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●●●● ● ●●● ● ● ● ● ● ●● ●●● ●●●●●● ● ● ● ● ●●●● ● ●● ● ● ●●●● ● ●● ●● ● ● ● ●●● ● ● ● ● ● ●● ● ●●● ● ●● ●● ● ●● ●● ●● ●●● ●●● ●●●● ●● ● ● ●●● ●●●● ● ● ● ●●● ● ●●●●●●●● ●● ●● ● ●● ●● ●●●●●●● ● ● ●●● ●●●●●●● ● ●●● ● ● ●● ●●●●●●● ●●●●●● ●● ●●●●● ●● ●●●●●● ● ●●●● ● ●●●● ●●●● ●●●●●●●●●●● ● ●●●●●● 0 ● ● ●● ● ●● ●● ●●● ●●●● ●● ●●●●●●●●●● ● ●●● ● ●●●●●●●● ●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●● ●● ● ● ●●●●●●●●●● ●● ● ●● ●● ●●● ●●●●●●●●● ●●● ●●●●●●● ●●●●●●● ●●●●●●● ●●● ●●●●●●●●●● ●●● ●●●●●●●●●●●●●●● ● ●●●●●●●●● ●●●●●●●●●●●●●●

Line 9, female Line 9, male ●●●● ● ●●●● ● ●●●●●● ●●● ● ● ● ●

● ● ●● ● ● ● ● ●● ● ● ● 15 ●● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ●● ● ● ● ● ● 10 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ● ●●●● ● ● ● ● ● ● ● ●●● ● ● ● ● ● ●● ● ● ● ●●● ● ● ● ● ● ● ●● ● ● 5 ● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ●● ● ● ● ●●● ● ●● ● ● ● ●● ● ● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ● ● ● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ●● ● ● ● ● ● ●● ●● ●● ● ●● ● ● ● ● ●● ● ●●●● ● ● ● ●●●● ● ● ● ●● ● ● ●●●● ●● ● ●● ●●● ● ● ● ● ● ●● ● ● ●●● ●●●●● ●● ●● ● ● ● ● ● ● ●● ● ●●● ●●● ●●●●●●●●● ●● ● ●● ●● ●● ●● ● ●●● ●●● ●● ● ● ● ● ●● ● ●● ● ●● ● ●●●●● ● ●●●●● ●● ● ● ●● ●● ● ● ●● ●● ● ● ● ● ● ● ●●●●● ●●●●●●●● ●● ●● ● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●● ●● ● ●● ●●●● ● ●●● ● ● ●● ● ● ●●● ● ● ●●● ●● ●● ● ●● ●●● ●●● ● ●●● ● ● ● ●●● ● ●●●●●●●● ● ● ● ● ● ●●● ●●●● ● ● 0 ●●●●● ●●● ●●● ●● ●● ●●●●●● ●●●●● ● ●●● ● ●●● ● ●● ●●●●●● ●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●● ●●●● ●●●● ●● ●●●●● ●●● ● ● ●● ●●●● ● ●●●●● ● ●●●●●●● ● ● ●●● ● ●●●●● ● ●●●●●●●●● ●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●● Negative logarithm of p − value Negative 0.0 2.5 5.0 7.5 10.0 0.0 2.5 5.0 7.5 10.0

Chromosome 1 locus Chromosome 17 locus

Line 6, female Line 1, female

15 ● ● ● ●● ● ● ●● ● ●● ●●●●● ● ● ● 10 ●● ● ● ● ● ● ● ●● ● ● ● ● ● ●●●●● ● ●●● ●● ● ● ● ● ● ● ● ● ● ● ●●● ● ● ●● ● ● ● ● ● ● ●● ●● ●● ● ● ● ● ● ● ●●● ● ● ● 5 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●● ●● ● ● ● ● ● ●● ● ●●●●● ● ● ● ● ● ● ●● ● ● ● ● ● ● ●●●● ● ●● ● ● ● ●● ●●● ● ● ● ● ● ● ● ●● ● ● ● ● ●●● ●● ●● ●●● ●● ● ●● ● ● ● ● ● ●●●●●●● ● ●● ● ● ● ●● ● ●● ●● ●● ●● ● ● ●● ●●●● ●●● ●●● ●●●●● ● ● ●●● ●●●●●● ● ●● ●●●●● ●●● ●●● ● ●●● ●● ● ●● ●● ● ●●●●●●● ●●● ● ● ● ● ●●● ● ●●● ● ● ● ●●●●● ● ● ●●●●●● ●●●● ●●●● ●●●● ● ● ● ●● ● ●●●●●●● ● ●● ●●●●●● ●●● ●●●●●●● ●●●● ● ● ● ● ●●●●● ● ● ●●●● ●● ● ● ●●●●●●●●●●●● ●● ● ● ● ● ● ●●●●●●●● ●●●●●● ●●● ● ● ●●●● ●● ●●●● ● ●● ●●● ●●●●●●● ●●●●●●●●● ● ● ●●●●●● ● ●●●●●●● ● ●●●●●●●● ●●● ●●● ● ●● ●●● ● ●●●●●●● ●● ●●●●● ●●●●●●● ●● ●●●● ●● ●●● ● ●●●●●● ●● ●● ● ●● ● 0 ●●●●●●●●●●●●●● ●●●●●●●●●●●●●● ●● ●● ●● ● ●●●●● ●●● ● ●●●●●●●● ●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●● ●●●●●●●● ●● ● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●● ● ●●●●●●●●●●●●●●●●●● ●●●●●● ●●● ●● ●●●●●●●● ●● ● ●● ●● ●● ●● ●●● ●● ●●● ● ● ● ●●● 240 245 250 255 260 56 58 60 62

369 Position (Mbp) 370 Figure 7. Regions associated with recombination rate and location of recombination- 371 associated candidate genes. The subplots are Manhattan plots of the negative logarithm of 372 the p-value of association against genomic position, zoomed in to show the region around the 373 significant markers. The red triangles show location of RNF212 on chromosome 8, SLOC1 374 on chromosome 1, and SPO11 on chromosome 17. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

375 Table 3. Genome-wide association study hits for average recombination, with position of the 376 lead SNP, additive genetic variance explained by the locus, and allele frequency of the allele 377 associated with higher recombination rate. 378 Genetic variance Allele Chromosome Sex Line Lead SNP position explained frequency 1 female 6 252,547,401 0.10 0.57 8 female 1 2,253,270 0.08 0.90 8 female 2 75,256 0.60 0.53 8 female 3 226,298 0.41 0.70 8 male 3 226,298 0.44 0.74 8 female 5 259,617 0.07 0.27 8 female 6 259,617 0.12 0.74 8 female 9 75,256 0.14 0.12 8 male 9 1,283,621 0.22 0.41 17 female 1 59,968,884 0.16 0.78 379

380 Algorithm performance on synthetic data 381 382 We tested the accuracy of the estimated recombination by analysing a synthetic dataset. 383 Figure 8 shows the simulated and estimated map length, recombination landscape, and a 384 scatterplot of simulated and estimated numbers of recombinations per individual. Our method 385 slightly overestimated recombination rate when there was variable recombination along the 386 chromosome. Because of uncertainty in the location of recombinations, the estimated 387 recombination landscape did not track per-marker recombination rate variation very well (r = 388 0.59), but better captured the smoothed recombination landscape using a window of 50 389 markers (r = 0.86). The accuracy of individual-level estimates of recombination was higher 390 for dams (r = 0.72) than for sires (r = 0.55). 391 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

392 393 Figure 8. Recombination rate estimation on simulated data. Cumulative number of 394 recombinations, recombination landscape along the simulated chromosome and the 395 correlation between true and estimated numbers of recombination in sires and dams. The 396 smoothed values are rolling averages of 50 markers. The red dashed line is the regression 397 line between true and estimated values. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

398 Discussion 399 400 In this paper, we estimated recombination rate variation within the genome and between 401 individuals in the pig using multiocus iterative peeling in nine genotyped pedigrees. 402 403 In this section, we discuss three main results: 404 (1) We confirm the known features of the pig recombination landscape, but not the 405 previously described correlation with the PRDM9 consensus motif. 406 (2) We show that recombination rate in the pig is genetically variable and associated with 407 alleles at the RNF212 gene. 408 (3) Multilocus iterative peeling is a compelling method for estimating recombination 409 landscapes from large genotyped pedigrees, but tends to overestimate genetic map 410 length. 411 412 Features of the pig recombination landscape 413 414 Our results recover known features of recombination in the pig, including the relative 415 chromosome lengths, and the marked sexual dimorphism. There are two notable exceptions, 416 where our estimates disagreed with previous results: we estimate overall longer genetic 417 lengths of chromosomes, and the correlations between recombination rate, density of the 418 PRDM9 consensus binding motif, and the density of some repeat classes are different than 419 estimated previously. 420 421 We estimated longer genetic maps than previous estimates for the pig. The total genetic map 422 lengths ranged from 18.5 to 21.7 Morgan for males and 22.3 to 25.9 Morgan for females. In 423 comparison, [1] found sex-specific map lengths of 17.8 and 17.5 Morgan for males, and 22.4 424 and 25.5 Morgan for females. This may be due to overestimation (see below), but also a 425 higher marker density and more complete use of the pedigree allowing us to detect more 426 recombinations. 427 428 The correlation between recombination and density of the PRDM9 consensus binding motif, 429 was lower than previous estimates. Because the PRDM9 protein determines the locations of a 430 subset of recombination hotspots, a positive correlation was expected. We detected only a bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

431 weak positive correlation with PRDM9 consensus motif density and recombination, which 432 suggests that we lack the genomic resolution to detect variation at this scale. The 433 recombination rate landscape is the outcome of processes operating at a much smaller scale, 434 with hotspots of a few kilobasepairs (as estimated by population sequencing [3] or by high- 435 density gamete genotyping [33]). There is more subtle local variation in recombination rate 436 that we cannot detect. 437 438 The associations between recombination and transposable element density were mixed, and 439 different for different types of transposable elements. The overall correlation between 440 recombination rate and repeats was negative, in line with estimates from other species [34]. 441 The negative correlation with LINEs was stronger than previously reported and the positive 442 correlation with simple repeats was weaker. One reason for these differences might be that 443 we used the more complete Sscrofa11.1 reference genome [26], which likely better resolves 444 the repeat landscape of the pig genome than the previous version. 445 446 447 Genetic variation in genome-wide recombination rate 448 449 Our results from the pig agree with the general picture of recombination rate variation in 450 vertebrates. The chromosome 8 locus is homologous to regions identified in humans [35–37], 451 cattle [7, 8, 10] , sheep [12, 13], and chickens [14]. It contains the RNF212 gene, a paralog of 452 which is also associated with recombination in deer [11]. The RNF212 protein binds to 453 recombination complexes, and is essential for crossover formation [38]. 454 455 While RNF212 is an obvious candidate gene, it is harder to find candidates for the other two 456 regions. We searched for the locations of candidate regions from other vertebrates, and 457 rapidly evolving recombination genes in mammals [39]. The chromosome 1 locus overlaps 458 SHOC1, one of the rapidly evolving recombination genes in mammals [39]. The closest 459 candidate recombination gene from the chromosome 17 locus is SPO11, associated with 460 recombination in chickens [14]. However, it is about two megabasepairs away from the most 461 significant marker. 462 bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

463 There are differences in recombination rate between lines, which may be due to fixed genetic 464 differences. Given that livestock populations have relatively small effective population sizes, 465 and assuming that recombination rate variation has a rather simple genetic architecture, line 466 differences in recombination rate might very well be due to genetic differences that have 467 fixed by chance. At the same times, all the lines showed evidence of comparable genetic 468 variation in recombination rate, and there was evidence that the major locus on chromosome 469 8 segregates in most lines. 470 471 A higher recombination rate could be beneficial for breeding, because it would reduce 472 linkage disequilibrium between causative variants and release genetic variance. Simulations 473 suggest that substantial increases in genome-wide recombination rate could increase genetic 474 gain [40]. We can approximate how much breeding could increase recombination rate based 475 on our results. 476 477 First, we can use the Breeder’s equation to predict the response to selection, treating genome- 478 wide recombination as a quantitative trait. The response is the heritability multiplied by the 479 selection differential S, which is the difference between population mean � and mean of the

480 selected individuals � . 481 482 � = ℎ � = ℎ (� − �) 483 484 Using distribution of genome-wide recombination rates from the males of the largest line, the 485 mean were 0.904 cM/Mbp. If we were to select the 10%, 20% or 30% highest recombination 486 individuals, the mean of the selected individuals would be 1.22 cM/Mbp, 1.15 cM/Mbp, and 487 1.11 cM/Mbp respectively. Assuming a heritability of 0.05, comparable to our estimated 488 genomic heritability, this would result in responses of: 489

490 �% = 0.05 ∙ (1.22 − 0.904) = 0.016 cM/Mbp

491 �% = 0.012 cM/Mbp

492 �% = 0.010 cM/Mbp 493 494 Relative to the average recombination rate, that would mean increases of 1.7%, 1.3% and 495 1.1%, respectively. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

496 497 Second, we concentrate on the major locus on chromosome 8 that we detected in most of the 498 lines, and approximate the increase in recombination rate that could be achieved if this locus 499 was fixed for the high recombination allele. Again, using estimates from the largest line, the 500 additive effect a of the chromosome 8 locus was estimated to be 0.0271 cM/Mbp (averaging 501 the male and female estimates), and the frequency f of the high recombination allele was 502 0.332 (weighted average of males and females). The increase in the mean of the population 503 by fixing the chromosome 8 locus would be: 504 505 � = � (1 − �) = 0.0271 ∙ (1 − 0.332) = 0.018 cM/Mbp 506 507 That is, it would increase genome-wide recombination rate by about 2%. 508 509 Compared to the simulation results of [40], which suggest that a doubling or more of 510 genome-wide recombination rate would lead to substantial genetic gains, these results 511 suggest that breeding for higher genome-wide recombination rate is not a practical alternative 512 for improving genetic gain. There may be other potential avenues, such as introducing 513 targeted recombinations in favourable locations [41] by biotechnological means. 514 515 516 Recombination rate inference by multilocus peeling 517 518 Throughout this paper we have used multilocus iterative peeling to estimate recombination 519 rate. In our simulation study, we found that multilocus iterative peeling could estimate the 520 number of recombinations per individual with an accuracy of 0.7 for dams and 0.5 for sires, 521 and the average recombination landscape along a chromosome. This is consistent with our 522 analysis of the pig genome, where we confirm previously known features of the pig 523 recombination landscape. However, the simulation results also show that we overestimated 524 the total genetic map length, consistent with our comparisons between the estimate 525 recombination rate and previously published estimates [1]. 526 527 Multilocus iterative peeling presents a compelling technique for estimating recombination 528 rate in large pedigree populations: it scales well to massive livestock pedigrees (more than bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

529 150,000 individuals), does not require pre-phasing of the data, and handles individuals 530 genotyped on range of platforms without requiring non-overlapping variants to be imputed 531 beforehand. 532 533 The primary downside is that multilocus iterative peeling requires multiple generations of 534 genotyped individuals to be available to accurately phase, impute, and estimate the 535 recombination rate. Although this information may be available in pig or chicken breeding 536 programmes [23, 42], and some wild populations [12] it may not be available in all 537 populations. In addition to this the overestimation of genetic map length suggests that the 538 exact genetic map lengths and counts of recombination for a specific individual may not be 539 accurate, but it is able to recover broad patterns in recombination between chromosomes and 540 between individuals. 541

542 Conclusion 543 544 In this paper we analyse 150,000 individuals from nine pig pedigrees. We find that we are 545 able to recover broad-scale patterns in the total genetic map length, recombination landscape, 546 and sex differences in recombination rates. In addition to this, we found that recombination 547 rate had low, but non-zero heritability, and a genome-wide association study detected three 548 regions associated with recombination rate. This paper highlights the ability to use large scale 549 pedigree and genomic data, as is routinely collected in many closely managed populations to 550 infer and understand recombination and recombination rate variation. 551

552 Declarations 553 554 Ethics approval and consent to participate 555 The samples used in this study were derived from the routine breeding activities of PIC. 556 Consent for publication 557 Not applicable. 558 Availability of data and materials 559 The datasets generated and analysed in this study are derived from the PIC breeding 560 programme and not publicly available. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

561 Competing interests 562 The authors declare that they have no competing interests. 563 Funding 564 The authors acknowledge the financial support from the BBSRC ISPG to The Roslin Institute 565 BBS/E/D/30002275, from Grant Nos. BB/N015339/1, BB/L020467/1, BB/M009254/1, from 566 Genus PLC, Innovate UK, and from the Swedish Research Council Formas Dnr 2016-01386. 567 Author’s contributions 568 JMH, MJ, AW and GG conceived the study. MJ, AW, RRF and CC analysed data. WH and 569 DdK helped interpret the results. MJ, AW and JMH wrote the paper. All authors read and 570 approved the final manuscript. 571 Acknowledgements 572 This work has made use of the resources provided by the Edinburgh Compute and Data 573 Facility (ECDF) (http://www.ecdf.ed.ac.uk). 574

575 References 576 577 1. Tortereau F, Servin B, Frantz L, Megens H-J, Milan D, Rohrer G, et al. A high density 578 recombination map of the pig reveals a correlation between sex-specific recombination and 579 GC content. BMC genomics. 2012;13:586.

580 2. McVean GA, Myers SR, Hunt S, Deloukas P, Bentley DR, Donnelly P. The fine-scale 581 structure of recombination rate variation in the . Science. 2004;304:581–4.

582 3. Myers S, Bottolo L, Freeman C, McVean G, Donnelly P. A fine-scale map of 583 recombination rates and hotspots across the human genome. Science. 2005;310:321–4.

584 4. Kong A, Gudbjartsson DF, Sainz J, Jonsdottir GM, Gudjonsson SA, Richardsson B, et al. 585 A high-resolution recombination map of the human genome. Nature genetics. 2002;31:241.

586 5. Fledel-Alon A, Leffler EM, Guan Y, Stephens M, Coop G, Przeworski M. Variation in 587 human recombination rates and its genetic determinants. PloS one. 2011;6:e20321.

588 6. Kong A, Barnard J, Gudbjartsson DF, Thorleifsson G, Jonsdottir G, Sigurdardottir S, et al. 589 Recombination rate and reproductive success in humans. Nature genetics. 2004;36:1203. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

590 7. Kadri NK, Harland C, Faux P, Cambisano N, Karim L, Coppieters W, et al. Coding and 591 noncoding variants in HFM1, MLH3, MSH4, MSH5, RNF212, and RNF212B affect 592 recombination rate in cattle. Genome research. 2016;26:1323–1332.

593 8. Sandor C, Li W, Coppieters W, Druet T, Charlier C, Georges M. Genetic variants in 594 REC8, RNF212, and PRDM9 influence male recombination in cattle. PLoS genetics. 595 2012;8:e1002854.

596 9. Simianer H, Szyda J, Ramon G, Lien S. Evidence for individual and between-family 597 variability of the recombination rate in cattle. Mammalian genome. 1997;8:830–835.

598 10. Ma L, O’Connell JR, VanRaden PM, Shen B, Padhi A, Sun C, et al. Cattle sex-specific 599 recombination and genetic control from a large pedigree analysis. PLoS genetics. 600 2015;11:e1005387.

601 11. Johnston SE, Huisman J, Pemberton JM. A genomic region containing REC8 and 602 RNF212B is associated with individual recombination rate variation in a wild population of 603 red deer (Cervus elaphus). G3: Genes, Genomes, Genetics. 2018;8:2265–76.

604 12. Johnston SE, Bérénos C, Slate J, Pemberton JM. Conserved genetic architecture 605 underlying individual recombination rate variation in a wild population of Soay sheep (Ovis 606 aries). Genetics. 2016;203:583–98.

607 13. Petit M, Astruc J-M, Sarry J, Drouilhet L, Fabre S, Moreno CR, et al. Variation in 608 recombination rate and its genetic determinism in sheep populations. Genetics. 609 2017;207:767–84.

610 14. Weng Z, Wolc A, Su H, Fernando RL, Dekkers JC, Arango J, et al. Identification of 611 recombination hotspots and quantitative trait loci for recombination rate in layer chickens. 612 Journal of animal science and biotechnology. 2019;10:20.

613 15. Dapper AL, Payseur BA. Connecting theory and data to understand recombination rate 614 evolution. Philosophical Transactions of the Royal Society B: Biological Sciences. 615 2017;372:20160469. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

616 16. Stapley J, Feulner PG, Johnston SE, Santure AW, Smadja CM. Variation in 617 recombination frequency and distribution across eukaryotes: patterns and processes. 618 Philosophical Transactions of the Royal Society B: Biological Sciences. 2017;372:20160455.

619 17. Broman KW, Murray JC, Sheffield VC, White RL, Weber JL. Comprehensive human 620 genetic maps: individual and sex-specific variation in recombination. The American Journal 621 of Human Genetics. 1998;63:861–9.

622 18. Sturtevant AH. The linear arrangement of six sex-linked factors in Drosophila, as shown 623 by their mode of association. Journal of experimental zoology. 1913;14:43–59.

624 19. Coop G, Wen X, Ober C, Pritchard JK, Przeworski M. High-resolution mapping of 625 crossovers reveals extensive variation in fine-scale recombination patterns among humans. 626 science. 2008;319:1395–8.

627 20. Weng Z-Q, Saatchi M, Schnabel RD, Taylor JF, Garrick DJ. Recombination locations 628 and rates in beef cattle assessed from parent-offspring pairs. Genetics Selection Evolution. 629 2014;46:34.

630 21. Segura J, Ferretti L, Ramos-Onsins S, Capilla L, Farré M, Reis F, et al. Evolution of 631 recombination in eutherian mammals: insights into mechanisms that affect recombination 632 rates and crossover interference. Proceedings of the Royal Society B: Biological Sciences. 633 2013;280:20131945.

634 22. Mary N, Barasc H, Ferchaud S, Billon Y, Meslier F, Robelin D, et al. Meiotic 635 recombination analyses of individual chromosomes in male domestic pigs (Sus scrofa 636 domestica). PLoS One. 2014;9:e99123.

637 23. Whalen A, Ros-Freixedes R, Wilson DL, Gorjanc G, Hickey JM. Hybrid peeling for fast 638 and accurate calling, phasing, and imputation with sequence data of any coverage in 639 pedigrees. Genetics Selection Evolution. 2018;50:67.

640 24. Meuwissen T, Goddard M. The use of family relationships and linkage disequilibrium to 641 impute phase and missing genotypes in up to whole-genome sequence density genotypic data. 642 Genetics. 2010;185:1441–9. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

643 25. Elston RC, Stewart J. A general model for the genetic analysis of pedigree data. Human 644 heredity. 1971;21:523–42.

645 26. Warr A, Affara N, Aken B, Beiki H, Bickhart DM, Billis K, et al. An improved pig 646 reference genome sequence to enable pig genetics and genomics research. bioRxiv. 647 2019;:668921.

648 27. Myers S, Freeman C, Auton A, Donnelly P, McVean G. A common sequence motif 649 associated with recombination hot spots and genome instability in humans. Nature genetics. 650 2008;40:1124.

651 28. Bao W, Kojima KK, Kohany O. Repbase Update, a database of repetitive elements in 652 eukaryotic genomes. Mobile DNA. 2015;6:11.

653 29. Hansen-Melander E, Melander Y. The karyotype of the pig. Hereditas. 1974;77:149–158.

654 30. Hadfield JD. MCMC methods for multi-response generalized linear mixed models: the 655 MCMCglmm R package. Journal of Statistical Software. 2010;33:1–22.

656 31. Gelman A. Prior distributions for variance parameters in hierarchical models (comment 657 on article by Browne and Draper). Bayesian analysis. 2006;1:515–34.

658 32. Rönnegård L, McFarlane SE, Husby A, Kawakami T, Ellegren H, Qvarnström A. 659 Increasing the power of genome wide association studies in natural populations using 660 repeated measures–evaluation and implementation. Methods in ecology and evolution. 661 2016;7:792–9.

662 33. Jeffreys AJ, Holloway JK, Kauppi L, May CA, Neumann R, Slingsby MT, et al. Meiotic 663 recombination hot spots and human DNA diversity. Philosophical Transactions of the Royal 664 Society of London Series B: Biological Sciences. 2004;359:141–52.

665 34. Kent TV, Uzunović J, Wright SI. Coevolution between transposable elements and 666 recombination. Philosophical Transactions of the Royal Society B: Biological Sciences. 667 2017;372:20160458.

668 35. Chowdhury R, Bois PR, Feingold E, Sherman SL, Cheung VG. Genetic analysis of 669 variation in human meiotic recombination. PLoS genetics. 2009;5:e1000648. bioRxiv preprint doi: https://doi.org/10.1101/2020.03.17.995969; this version posted March 20, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

670 36. Kong A, Thorleifsson G, Stefansson H, Masson G, Helgason A, Gudbjartsson DF, et al. 671 Sequence variants in the RNF212 gene associate with genome-wide recombination rate. 672 Science. 2008;319:1398–401.

673 37. Kong A, Thorleifsson G, Frigge ML, Masson G, Gudbjartsson DF, Villemoes R, et al. 674 Common and low-frequency variants associated with genome-wide recombination rate. 675 Nature genetics. 2014;46:11.

676 38. Reynolds A, Qiao H, Yang Y, Chen JK, Jackson N, Biswas K, et al. RNF212 is a dosage- 677 sensitive regulator of crossing-over during mammalian meiosis. Nature genetics. 678 2013;45:269.

679 39. Dapper AL, Payseur BA. Molecular evolution of the meiotic recombination pathway in 680 mammals. Evolution. 2019.

681 40. Battagin M, Gorjanc G, Faux A-M, Johnston SE, Hickey JM. Effect of manipulating 682 recombination rates on response to selection in livestock breeding programs. Genetics 683 Selection Evolution. 2016;48:44.

684 41. Bernardo R. Prospective targeted recombination and genetic gains for quantitative traits 685 in maize. The plant genome. 2017;10.

686 42. Hickey JM, Kranis A. Extending long-range phasing and haplotype library imputation 687 methods to impute genotypes on sex chromosomes. Genetics Selection Evolution. 688 2013;45:10.

689