bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

1 Article/Investigation

2 Instability of the pseudoautosomal boundary in house mice

∗ ∗ ∗, †, ‡ ∗ 3 Andrew P Morgan , Timothy A Bell , James J Crowley , and Fernando Pardo-Manuel de Villena

∗ 4 Department of Genetics and Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC

† 5 Department of Psychiatry, University of North Carolina, Chapel Hill, NC

‡ 6 Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden

7 April 20, 2019

8 Corresponding authors:

9 Andrew P Morgan ([email protected])

10 Fernando Pardo-Manuel de Villena ([email protected])

11 5049C Genetic Medicine Building

12 Department of Genetics

13 University of North Carolina

14 Chapel Hill NC 27599-7264

15 Keywords: , recombination, pseudoautosomal region, sex evolution

16

1 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

17 Abstract

18 Faithful segregation of homologous at meiosis requires pairing and recombination. In taxa

19 with dimorphic sex chromosomes, pairing between them in the heterogametic sex is limited to a narrow inter-

20 val of residual sequence homology known as the pseudoautosomal region (PAR). Failure to form the obligate

21 crossover in the PAR is associated with male infertility in house mice (Mus musculus) and humans. Yet despite

22 this apparent functional constraint, the boundary and organization of the PAR is highly variable in mammals,

23 and even between subspecies of mice. Here we estimate the genetic map in a previously-documented ex-

24 pansion of the PAR in the Mus musculus castaneus subspecies and show that the local recombination rate is

25 100-fold higher than the autosomal background. We identify an independent shift in the PAR boundary in the

26 Mus musculus musculus subspecies and show that it involves a complex rearrangement but still recombines in

27 heterozygous males. Finally, we demonstrate pervasive copy-number variation at the PAR boundary in wild

28 populations of M. m. domesticus, M. m. musculus and M. m. castaneus. Our results suggest that the intensity of

29 recombination activity in the PAR, coupled with relatively weak constraints on its sequence, permit the genera-

30 tion and maintenance in the population of unusual levels of polymorphism of unknown functional significance.

31

32

33 Introduction

34 Pairing, and crossing-over between homologous chromosomes is a requirement for orderly progres-

35 sion through meiosis that is broadly conserved in eukaryotes (Mather 1938; Mirzaghaderi and Hörandl 2016).

36 In most organisms, pairing is facilitated by sequence homology. The sex chromosomes of mammals – which

37 are divergent in sequence and structure over most of their length (Graves 2006) – thus pose a unique challenge

38 for males, the heterogametic sex. Homology between the X and Y is limited to remnant “pseudoautosomal

39 regions” (PARs) that are a vestige of the sex chromosomes’ origin as an ancestral pair (Ohno 1966).

40 Synapsis and crossing-over in this relatively small territory appears to be necessary for faithful segregation of

41 the sex chromosomes in both humans and mice (Rouyer et al. 1986; Matsuda et al. 1991; Hassold et al. 1991;

42 Burgoyne et al. 1992) – although not in some other mammals (Ashley and Moses 1980; Wolf et al. 1988; Carnero

43 et al. 1991). We limit our discussion here to the placental mammals; the sex chromosomes of marsupials and

44 monotremes are distinct and have more complex meiotic behavior (Sharman et al. 1950; Sharp 1982; Graves and

45 Watson 1991; Page et al. 2005).

46 Despite the putative functional importance of the PAR, its boundary and structure appear to evolve rapidly

47 in eutherian mammals (Raudsepp and Chowdhary 2015). PARs have been defined at the sequence level for

48 only a handful of organisms, due in large part to the challenges of assembling repeat- and GC-rich sequences.

2 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

49 The human has two PARs, PAR1 on Xp and PAR2 on Xq (Figure 1A). Only PAR1 represents the

50 remnant of the ancestral autosome, while PAR2 arose via several rearrangements within the primate lineage

51 (Charchar et al. 2003); PAR1 recombines avidly in male meiosis (Hinch et al. 2014) while PAR2 recombines in-

52 frequently and has sequence properties similar to the rest of the X chromosome (Cotter et al. 2016). At least one

53 well-described structural polymorphism of PAR1 segregates in human populations (Mensah et al. 2014). The

54 allele is a translocation of X-linked sequence onto the and arose via non-homologous recombi-

55 nation between repetitive elements present on the X and the male-specific portion of the Y chromosome. The

56 duplicated segment recombines in male meiosis at rates similar to single-copy PAR1 (Poriswanish et al. 2018).

57 The PARs of simian primates such as the rhesus macaque have sequence homology and conserved synteny with

58 human PAR1 on Xp (Ellis et al. 1990) and are on the order of 3 Mb in physical size, while those of ruminants

59 extend further into the ancestral X chromosome and are closer to 10 Mb in size (Das et al. 2009).

60 The mouse PAR, however, is dissimilar to those of humans or other mammals. Its basic organization is

61 shown in Figure 1B. Its boundary falls in the third intron of the mouse ortholog of the human Opitz syndrome

62 (OMIM #300000) MID1 (Palmer et al. 1997). Restriction mapping has shown that the mouse PAR is ap-

63 proximately 750 kb in size in the C57BL/6J inbred strain, most of which consists of arrays of GC-rich satellite

64 and other repetitive sequences (Perry et al. 2001), and is exquisitely prone to internal rearrangement in male

65 meiosis (Kipling et al. 1996a,b). Two , Sts (Kipling et al. 1996a) and Asmt (Kasahara et al. 2010), have

66 been genetically mapped distal to Mid1. Two further genes (Ngln4x and Sfrs17a) have been localized to the

67 PAR in comparative studies (Bellott et al. 2014) but lie in an assembly gap whose sequence is awaiting finish-

68 ing (https://www.ncbi.nlm.nih.gov/grc/mouse/issues/MG-3246). The representation of the PAR

69 in the current version of the reference genome (mm10/GRCm38) comprises 961, 540 bases (of which 800, 000

70 lie in two gaps whose size represents a rough estimate by the Genome Reference Consortium) and includes the

71 sequence of Mid1 and Asmt.

72 The PAR boundary is variable even within the Mus genus. It falls between the eighth and ninth exons of

73 Mid1 in the Mus spretus-derived strain SPRET/Ei (Perry et al. 2001). White et al. (2012a) showed that in the

74 Mus musculus castaneus-derived strain CAST/EiJ, the PAR boundary is shifted ~ 400 kb proximally so that

75 the entire sequence of Mid1 is present on both sex chromosomes. This implies translocation of previously

76 X-unique sequence into the Y chromosome at the PAR boundary. Recombinants in this extended PAR were

77 recovered from F2 crosses with the M. m. domesticus-derived WSB/EiJ strain, and heterozygosity in the PAR was

78 associated with spermatogenic failure (White et al. 2012b). Hereafter we refer to the PAR defined in C57BL/6J

79 as the “canonical PAR” and the CAST/EiJ allele as an “extended PAR” (Figure 1B). Unlike the canonical PAR,

80 the extended portion of the PAR in CAST/EiJ consists mostly of unique sequence with GC content and repeat

81 density similar to the genomic background.

3 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

PAR2 A WASH6P B IL9R VAMP7 SPRY3 cen Mid1 Sts Asmt X

FRMPD4 Frmpd4 Y MSL3 Msl3 canonical ARHGAP6 Arhgap6 repetitive PAR AMELX Amelx sequences HCCS Hccs

MID1 Mid1 PAR ~ 400 kb ~ 750 kb CLCN4 Sts STS Asmt YCAST/EiJ XG PAR1 DHRSX White et al. (2012) extended ASMT PAR SHOX human mouse

Figure 1: Schematic organization of the pseudoautosomal region in human and the house mouse. (A) Synteny map of the human and mouse X chromosomes. Orthologous genes shown in black, non-orthologous genes shown in grey, and the gene marking the end of the human-mouse synteny block (MID1) shown in red. PARs shown as grey boxes. Not all human genes are shown. (B) The mouse PAR. The “canonical PAR” defined in the C57BL/6J reference strain (top) encompasses at least two genetically-mapped -coding genes (Sts, Asmt) and the 3’ fragment of Mid1. Its physical size is approximately 750 kb, most of which consists of unassembled repetitive sequence. The “extended PAR” of CAST/EiJ (bottom) involves the duplication of approximately 400 kb of ancestrally X-unique sequence onto the Y, including the 5’ fragment of Mid1 (White et al. 2012a).

82 Gain or loss of pseudoautosomal identity is predicted to have several important consequences for molec-

83 ular evolution. First, since pseudoautosomal genes are diploid in both sexes, they are subject to a different

84 dosage-compensation regime than the rest of the X chromosome and are more likely to escape X-inactivation

85 (Graves et al. 1998). Second, to the extent that recombination is mutagenic (Hellmann et al. 2003), the intensity

86 of pseudoautosomal recombination in male meiosis means pseudoautosomal sequences are expected to di-

87 verge more quickly across species than strictly X- or Y-linked sequences. Diversity within populations should

88 also be higher than at sex-linked sites, due to both higher mutational input and mitigation of hitchhiking and

89 background-selection effects by the high local recombination rate in males. Finally, the spectrum of

90 in pseudoautosomal sequence should be biased by the locally increased effects of GC-biased

91 (Lamb 1984; Eyre-Walker Adam 1993). These expectations have generally been borne out in empirical studies

92 of the human (Schiebel et al. 2000; Filatov and Gerrard 2003; Cotter et al. 2016) and mouse (Perry and Ashworth

93 1999; Huang et al. 2005; White et al. 2012a) pseudoautosomal regions.

94 In this work we use crosses between inbred strains from the three major subspecies of house mice to create

95 a genetic map for the both the extended PAR of CAST/EiJ and the proximal portion of the canonical PAR. We

96 use whole-genome sequencing to confirm the presence of a 14 kb extension of the PAR in the M. m. muscu-

97 lus-derived PWK/PhJ strain and show that, although it involves a complex structural rearrangement, it too

98 recombines in heterozygous males. Finally, we uncover a remarkable degree of polymorphism at and likely

4 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

99 across the PAR boundary in natural populations of all three subspecies. These findings add to a growing body

100 of evidence that the degree of homology required for male fertility is relatively weak (Dumont 2017; Acquaviva

101 et al. 2019). Relaxed constraint on PAR structure, combined with the intensity of double-strand break activity

102 in male meiosis, permit the generation and maintenance of unusual levels of diversity in this peculiar region of

103 the sex chromosomes.

104 Results

105 Recombination in the extended pseudoautosomal region of CAST/EiJ

106 To estimate a recombination map in the CAST/EiJ extended pseudoautosomal region, we used genotype data

107 from two crosses segregating for the CAST/EiJ Y chromosome: 219 N2 progeny of (PWK/PhJ×CAST/EiJ)F1

108 males and 48 N2 progeny of (WSB/EiJ×CAST/EiJ)F1 males. The experimental design is shown in Figure 2. For

109 notational convenience we will refer to these crosses by the sire genotype throughout. Progeny were genotyped

110 on one of two SNP array platforms (one with approximately 77, 000 markers, hereafter “77K”, and one with

111 approximately 11, 000 markers, hereafter “11K”) whose content is optimized for populations derived from 8

112 inbred strains including those used here (Morgan et al. 2016). The nominal position of informative markers in

113 the CAST/EiJ extended PAR in the mm10 reference genome in is shown in Figure 2A. To capture recombination

114 in the canonical PAR, we identified 2 additional informative markers beyond the PAR boundary (see Materials

115 and Methods).

116 Recombinants in the PAR were then identified in curated genotypes using the rules illustrated in Figure

117 2B. Sex (ie. presence or absence the male-specific region of the Y chromosome) anchors the proximal end

118 of the genetic map of the PAR by definition. Aggregating across both array platforms, 26 of 219 progeny of

119 (PWK/PhJ×CAST/EiJ)F1 males and 8 of 48 of progeny of (WSB/EiJ×CAST/EiJ)F1 males have evidence of

120 recombination. Of these, at least 4 (all from the PWK/PhJ×CAST/EiJ cross) are double recombinants (Table

121 1). Example recombinant chromosomes from the PWK/PhJ×CAST/EiJ cross are shown in Figure 3A, and the

122 estimated genetic maps for both crosses in Figure 3B (and in tabular form in File S1). The genetic length of the

123 portion of the PAR spanned by our array markers is 13.7±2.3 cM (point estimate ± standard error) and 16.7±5.4

124 cM in the PWK/PhJ×CAST/EiJ and WSB/EiJ×CAST/EiJ crosses, respectively. (Most of this map, 10.0±2.0 cM

125 and 8.3 ± 4.0 cM, respectively, lies in the “extended” portion of the PAR due to limited marker coverage in the

126 canonical PAR.) Given the region’s physical size of 466 kb (spanning chrX:169,542,082–169,969,759 in the mm10

127 assembly), this translates to recombination rates of 29.4 ± 5.0 cM/Mb and 35.8 ± 11.5 cM/Mb, respectively,

128 assuming negligible crossover interference (Soriano et al. 1987). By contrast, male-specific autosomal rates in

129 these crosses have been estimated at approximately 0.4 cM/Mb (Dumont and Payseur 2011a,b). This 100-fold

5 V/Jall n aenlCS/i lee te obntosrqiercmiainbtenteXadY and X maternal the a between inherit allele recombination require progeny FVB/NJ combinations male maternal Other non-recombinant an allele. markers; inherit CAST/EiJ chromosomes. chromosome paternal progeny lines. a X female dashed and at recombination, red allele allele FVB/NJ of by PWK/PhJ arrays absence indicated with paternal 11K genotyped the are a and progeny In boundaries N2 and 77K PAR and the females, canonical arrays. from FVB/NJ and SNP with markers bred commercial CAST/EiJ SNP were of males panel. (PWK/PhJ ×CAST/EiJ)F1 assembly) the Multiple bottom in mm10 (B) shown the the is in and (in genes, protein-coding (green) shown positions and PAR physical are grey) extended (light Nominal map gaps CAST/EiJ Marker assembly panel. the (A) internal top of its CAST/EiJ. representation with of grey) PAR schematic (dark extended PAR A canonical the boundary. in pseudoautosomal recombination detecting the for near design Experimental 2: Figure certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint A 169

UNC31590957 UNC31591176 UNC31591264

SXX206763835 doi: UNC31591516 UNC31591733 UNC31591894

UNC31592078 https://doi.org/10.1101/561951 UNC31592360 UNC31592588 UNC31592717 169.5 UNC31592836 gUNC200051678 UNC200051678 UNC31593277 UNC31593544 UNC31593763 UNC31594098 UNC31594161 UNC31594461 UNC31594704 position (Mb) position SXX206791242 UNC31594892 UNC31595072 Mid1

170 UNC31595302 UNC31595411 UNC31595576 PAR1 SBB206798913

PAR3 (assembly gaps) JAX00723372 PAR4 UNC31596388 PAR6

UNC31596457 Sts? PAR7 PAR8 ; this versionpostedApril20,2019. a 170.5 PAR9

PAR10 CC-BY 4.0Internationallicense PAR11 PAR12 PAR13

S3R206833141 Asmt PAR14 PAR15 PAR16 PAR17 PAR18 PAR19 171

6

11K array 11K array 77K B N2 . The copyrightholderforthispreprint(whichwasnot -/- F0 FVB/NJ X X -/+ recombinants (PWK/PhJ F0 -/- X Y CAST/EiJ) F1 -/+ informative site CAST/EiJ F0 genotype progeny sex F1 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Recombinants Cross Platform Sex n NR single double PWK/PhJ×CAST/EiJ 77K XX 90 78 8 4 XY 90 80 10 0 11K XX 18 17 1 0 XY 20 17 3 0 XO 1 1 0 0 WSB/EiJ×CAST/EiJ 77K XX 2 2 0 0 XY 4 3 1 0 11K XX 23 20 3 0 XY 19 15 4 0

Table 1: Summary of crossovers detected in the PAR in N2 progeny.

130 elevation of the recombination rate in the CAST/EiJ extended PAR over the autosomal background is consistent

131 with obligatory crossing-over in the PAR at every meiosis (Burgoyne 1982) despite its relatively small physical

132 size (Perry et al. 2001). We find no evidence for transmission bias against recombinants (pooled OR from both

133 crosses = 0.89 by logistic regression, 95% CI 0.44 − 1.82, p = 0.75 by likelihood-ratio test), and a rate of sex-

134 chromosome nondisjunction (XOs: 1/267, 0.37%) similar to prior estimates for these F1s (Dumont 2017).

135 Two biochemical surrogates of meiotic recombination – histone 3 lysine 4 trimethylation (H3K4me3), which

136 marks sites for DSB formation, and DMC1 protein, which binds free ends of DSBs – both accumulate on the

137 CAST/EiJ Y haplotype in spermatocytes. We re-analyzed ChIPseq performed against H3K4me3 and DMC1 in

138 testes of reciprocal F1 hybrids between C57BL/6J and CAST/EiJ (Baker et al. 2015; Smagulova et al. 2016).

139 Reads were assigned to parental haplotypes using known SNVs informative between the parental strains.

140 In the extended PAR, H3K4me3 and DMC1 reads are attributed almost exclusively to the X chromosome in

141 CAST/EiJ×C57BL/6J males, but to both haplotypes in C57BL/6J×CAST/EiJ males (Figure S1). This suggests

142 that recombination is initiated from both haplotypes in the extended PAR when the CAST/EiJ Y chromosome

143 is present, and proceeds through the normal homologous recombination pathway.

144 Extension of the pseudoautosomal region in PWK/PhJ

145 We next attempted to estimate the rate of recombination in the canonical PAR. For this we turned to low-

146 coverage whole-genome sequence data from 86 progeny of (WSB/EiJ×PWK/PhJ)F1 sires and FVB/NJ dams

147 (hereafter referred to as the N2 generation). It was immediately apparent that the complex repetitive sequence

148 in the assembled portion of the PAR – to say nothing of the unassembled portion – would preclude accurate

149 genotyping of SNVs from short reads. However, we noticed a copy-number variant (CNV) at the PAR bound-

150 ary segregating among the N2s that is not present in published whole-genome sequence data from the founders

7 oharypafrs e imn niae onayo aoia PAR. canonical of Physical boundary across indicates positions pooling (right) diamond crosses, lines. genetic Red WSB/EiJ×CAST/EiJ and dashed and platforms. (left) nudged PWK/PhJ×CAST/EiJ red Physical array (B) the both points by in panel. markers top (with indicated the informative are markers in for shown shown boundaries flanking are the are array PAR between and 77K canonical the PAR grey interval from and markers in extended the of CAST/EiJ shown positions the show is bars in overplotting). sex Blue chromosomes avoid for Recombinant to haplotype black. (A) non-recombinant in PAR. expected haplotype extended The recombinant CAST/EiJ cross. the PWK/PhJ×CAST/EiJ of the map in Genetic 3: Figure certified bypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailableunder bioRxiv preprint P P P P P P P P P P P P P P P AE_025_GF_E_753 AE_025_GF_E_818 AE_025_GF_E_822 AE_026_GF_E_838 AE_027_GF_E_878 AE_027_GF_E_881 AE_033_GF_E_735 AE_025_GF_E_814 P P P P AE_025_GF_Y_k11 AE_025_GF_Y_k12 AE_025_GF_Y_k10 P P P AE_026_GF_Y_i10 AE_025_GF_Y_N3 AE_034_GF_Y_K9 AE_034_GF_Y_K1 AE_027_GF_Y_L6 AE_027_GF_Y_L2 AE_027_GF_Y_h3 AE_027_GF_Y_h5 AE_025_GF_Y_k3 AE_025_GF_Y_k6 AE_033_GF_Y_f2 A doi: https://doi.org/10.1101/561951 UNC31592360 UNC31592588 UNC31592717 UNC31592836 UNC200051678

169.5 169.5 UNC31593277 UNC31593544 UNC31593763 UNC31594098 UNC31594161 ● ● ● ● ● position (Mb) position (Mb) UNC31594461 UNC31594704 UNC31594892 ● ● ● ● ●

● ● UNC31595072 169.75 169.75

● ● UNC31595302 UNC31595411 UNC31595576 PAR1 PAR3 ; this versionpostedApril20,2019. ● ● ● ● ● ● ● ● ● ● ● ● JAX00723372 a CC-BY 4.0Internationallicense

● ● ● ● ● ● ● ● ● ● PAR4

● ● ● ● ● ● ● UNC31596388 PAR6 170 170

● ● ● ● ● ● ● ● ● UNC31596457 8 B 170008243 169972826 169921993 169884432 169781058 169747553 169700826 169593019 169542082 bp ● ● ● ● ● ● ● . The copyrightholderforthispreprint(whichwasnot (PWKxCAST)F1 cM (SE) ● ● ● ● ● ● ● 13.7 (2.3) 10.0 (2.0) 5.0 (1.5) 4.6 (1.4) 3.2 (1.2) 0.5 (0.5) 0.0 (0.0) bp ● ● ● ● ● (WSBxCAST)F1 cM (SE) ● ● ● ● ● 16.7 (5.4) 15.6 (5.2) 8.3 (4.0) 2.1 (2.1) 0.0 (0.0) bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

151 of the cross (FVB/NJ, male; WSB/EiJ, female; PWK/PhJ, female) (Keane et al. 2011) but is present in a Collab-

152 orative Cross strain carrying a PWK/PhJ Y chromosome (CC044/Unc, (Srivastava et al. 2017)) (Figure 4A). The

153 copy gain segregates almost perfectly with male sex: 39 of 40 males (97.5%) have the expected copy number

154 1 over the adjacent segment of chrX but 2 copies of the CNV segment, while 46 of 46 females (100%) have 2

155 copies across the entire region (Figure 4B). This pattern is consistent with duplication of X-linked sequence onto

156 the Y chromosome in PWK/PhJ. The segment is thus a candidate for an extension of the PAR. This agrees with

157 the observation of (White et al. 2012a) that primers in the second intron of Mid1 amplify from both X and Y

158 chromosomes in PWD/PhJ, a strain closely-related to PWK/PhJ (Yang et al. 2011).

159 Closer examination of the duplication shows that it lies near but not immediately adjacent to the PAR

160 boundary in the mm10 reference assembly, and that it is not contiguous (Figure S2). The boundaries of the

161 duplication correspond to clusters of read alignments with aberrant mate-pair orientation. We manually in-

162 spected read alignments near the putative duplication boundaries in IGV and obtained base-pair resolution of

163 the boundaries by identifying multiple of reads with partial alignments that terminated at the same position.

164 The coordinates of the duplicated segments in mm10 are chrX:169,953,707–169,966,684 and chrX:169,967,172–

165 169,968,235; the nominal size of the duplicated region is 14,040 bp. Reads from fragments overlapping the

166 duplication boundary were pooled and de novo assembled with SPAdes. This yielded 3 contigs (File S3) whose

167 alignments back to the reference (with blat), shown in Figure S2, define the breakpoints of the duplication.

168 The distal boundary coincides with a cluster of ancient LINE L1 elements, but other boundaries do not appear

169 to correspond to either specific repetitive elements or exons of the Mid1 gene.

170 We used the copy number of sub-segments of the CNV and the constraints imposed by the breakpoint

171 alignments to reconstruct the organization of the PAR boundary on the PWK/PhJ Y chromosome (Figure 5).

172 In this proposed configuration, the previously X-linked sequence lies internal to the PWK/PhJ Y PAR; this is

173 supported by direct evidence from single reads whose alignments cross the breakpoint. The duplication also

174 contains 460 bp of novel sequence (in breakpoint contig region_3) that is absent from the mm10 assembly –

175 including unplaced contigs – but is found in Mid1 introns in de novo assemblies for Mus spretus, Mus caroli

176 and Mus pahari. Recovery of this sequence from multiple outgroup species suggests that it was present in the

177 common ancestor of the genus Mus (5 Mya, (Thybert et al. 2018)) and has since been lost in some lineages of M.

178 musculus. Note that the duplication may be much larger than its nominal size in the mm10 reference, as it may

179 contain other novel sequence that lies too far from the breakpoints to be captured by our assembly strategy.

180 Having established that the duplication is genetically and physically linked to the PAR on the PWK/PhJ

181 Y chromosome, we next tested whether it is truly pseudoautosomal in the sense that it recombines in male

182 meiosis. Biallelic SNVs in the duplicated region were ascertained in the N2 progeny, the parental strains of the

183 cross, and a collection of wild mice with publicly-available whole-genome sequence data. Only sites segregat-

9 _E_518 XX XY x

● ● XY se 170 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● loss of the duplication. ● ● ● ● ● ● ● 169.9 distal ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● E_518 169.8 A) Copy number in the distal X chro- de novo position (Mb) ● ● ● ● ● ● ● ● ● ● ● ● 169.7 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● The copyright holder for this preprint (which was not was (which preprint this for holder copyright The . ximal ● ● ● ● ● pro ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 169.6 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ×CAST/EiJ cross and a representative N2 male

8.0 4.0 2.0 1.0 0.5

3 2 1 16.0 ed read depth read ed z mali r no depth read ed z mali r no B C 10 PWK/PhJ CC044/Unc WSB/EiJ FVB/NJ Y_B2 XX XY XX XY XY 170 CC-BY 4.0 International license International 4.0 CC-BY a this version posted April 20, 2019. 2019. 20, April posted version this ; 169.9 169.8 position (Mb) 169.7 169.6

8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5

https://doi.org/10.1101/561951 16.0 16.0 16.0 16.0 16.0 ed read depth read ed z mali r no A doi: doi: bioRxiv preprint preprint bioRxiv certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under under available is made It in perpetuity. the preprint to display a license bioRxiv granted has who author/funder, is the review) by peer certified Figure 4: A translocation of X-linked sequence onto the Y in PWK/PhJ. ( mosome, estimated from whole-genome(with sex-chromosome sequencing karyotypes read as depth(PAE_041_HG_Y_B2). Duplicated shown) in segment of indicated 1 by the kb redindicated WSB/EiJ arrow. by bins, CAST/EiJ red and dashed for lines. canonical PAR Canonical the boundaries(“proximal”) PAR shaded are and parental in chrX:169.96 grey. inbred (B) Mb Copy (“distal”, strains number(C) at over Copy the red number proximal arrow X profile in chromosome of panel N2 A) male among PAE_033_HG_E_518, demonstrating N2 males (blue) and females (pink). bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Ydup PAR X C57BL/6J span (kb) 13.9 1.1 5.6 Y Mid1 exon 3 Lx5b

53708 66685 75331 67172 68235 PAR boundary mm10 position (+ 169.9 Mb) 69759

R1 R3 X PWK/PhJ R2

R1 R3 >= 460 bp Y R2 novel sequence homologous to M. spretus, M. caroli and M. pahari

Figure 5: Proposed organization of the PWK/PhJ Y-linked duplication. X-linked segments involved in the du- plication are shown in their position in C57BL/6J (mm10 assembly) in the top panel. Breakpoint coordinates and selected annotations are shown in the middle panel. The bottom panel shows alignments of breakpoint- spanning contigs to the PWK/PhJ X chromosome (which is near-identical to C57BL/6J), along with one possible reconstruction of the Y chromosome consistent with copy-number and alignment constraints. dXY , absolute sequence divergence between X- and Y-linked copies of PAR.

11 HOM_A1 HET HOM_A2 NA genotype

controls N2 XY high-confidence 84 (2.4%) in 28 kb, for an x 2/85 SNV inde loss, is The copyright holder for this preprint (which was not was (which preprint this for holder copyright The . de novo e XY WSB/EiJ XY PWJ/PhJ XX PWK_PhJ−lik AE_033_HG_Y_q9 AE_033_HG_Y_q8 AE_033_HG_Y_q7 AE_033_HG_Y_q6 AE_033_HG_Y_q5 AE_033_HG_Y_q3 AE_033_HG_Y_p9 AE_033_HG_Y_p8 AE_033_HG_Y_p5 AE_033_HG_Y_p1 AE_041_HG_Y_X4 AE_041_HG_Y_X2 AE_041_HG_Y_B8 AE_041_HG_Y_B6 AE_041_HG_Y_B5 AE_041_HG_Y_B3 AE_041_HG_Y_B2 P P P P P P P P P P AE_041_HG_E_597 AE_041_HG_E_595 AE_041_HG_E_594 AE_041_HG_E_591 AE_041_HG_E_589 AE_041_HG_E_467 AE_041_HG_E_466 AE_041_HG_E_460 AE_033_HG_E_680 AE_033_HG_E_678 AE_033_HG_E_674 AE_033_HG_E_672 AE_033_HG_E_525 AE_033_HG_E_524 AE_033_HG_E_523 AE_033_HG_E_522 AE_033_HG_E_521 AE_033_HG_E_520 AE_033_HG_E_518 AE_033_HG_E_517 AE_033_HG_E_515 P P P P P P P AE_041_HG_Y_X12 AE_041_HG_Y_X10 P P P P P P P P P P P P P P P P P P P P P P P * * 12 10) in the N2s and heterozygous in a CC male carrying the HOM_A1 HET HOM_A2 NA > genotype

CC-BY 4.0 International license International 4.0 CC-BY controls N2 XX a this version posted April 20, 2019. 2019. 20, April posted version this ; cM/Mb. The duplication is thus a bona fide extension of the PAR. 84 loss of the PWK/PhJ Y-linked copy of the duplicated sequence. Further sup- x de novo SNV inde loss is provided by the absence of aberrantly-oriented read alignments at the boundaries of https://doi.org/10.1101/561951 de novo doi: doi: e XY WSB/EiJ XY PWJ/PhJ XX PWK_PhJ−lik AE_033_HG_Y_q4 AE_033_HG_Y_q2 AE_033_HG_Y_q1 AE_033_HG_Y_p7 AE_033_HG_Y_p6 AE_033_HG_Y_p4 AE_033_HG_Y_p3 AE_033_HG_Y_p2 AE_041_HG_Y_X9 AE_041_HG_Y_X8 AE_041_HG_Y_X7 AE_041_HG_Y_X6 AE_041_HG_Y_X5 AE_041_HG_Y_X3 AE_041_HG_Y_X1 AE_041_HG_Y_B7 AE_041_HG_Y_B4 AE_041_HG_Y_B1 P P P P P P P P AE_041_HG_E_600 AE_041_HG_E_599 AE_041_HG_E_598 AE_041_HG_E_596 AE_041_HG_E_593 AE_041_HG_E_590 AE_041_HG_E_468 AE_041_HG_E_465 AE_041_HG_E_464 AE_041_HG_E_462 AE_041_HG_E_461 AE_041_HG_E_459 AE_041_HG_E_458 AE_033_HG_Y_q11 AE_033_HG_Y_q10 AE_033_HG_Y_p11 AE_033_HG_Y_p10 AE_033_HG_E_682 AE_033_HG_E_681 AE_033_HG_E_679 AE_033_HG_E_677 AE_033_HG_E_676 AE_033_HG_E_675 AE_033_HG_E_673 AE_033_HG_E_519 AE_033_HG_E_516 AE_033_HG_E_514 P P P P P P P P P P AE_041_HG_Y_X13 P P P P P P P P P P P P P P P P P P P P P P P P P P P P * estimated recombination rate of individual represents a duplicated region in this cross, excluding the individual with the number 1 (PAE_033_HG_E_518) has reverted to homozygosity across the entire region. We conclude that this port for the the allele in PAE_033_HG_E_518 but not the male recombinant (Figure S3). The recombination fraction in the gous genotypes, and 1 female (PAE_033_HG_Y_p2) with a run of heterozygous genotypes. The male with copy calls in the N2s revealed 2 likely recombinants (Figure 6): 1 male (PAE_041_HG_Y_B3) with a run of homozy- chromosomes carry different alleles at all 84 sites; we revisit this finding below. Manual review of genotype PWK/PhJ Y chromosome were retained. This conservativesites ascertainment informative strategy for yielded recombination in (WSB/EiJ×PWK/PhJ)F1 males. To our surprise, the PWK/PhJ X and Y ing at moderate frequency (minor allele count bioRxiv preprint preprint bioRxiv certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under under available is made It in perpetuity. the preprint to display a license bioRxiv granted has who author/funder, is the review) by peer certified Figure 6: Genotypes of N2some females and (left) the and PWK/PhJ males X (right)parental at or inbred informative WSB/EiJ strain sites X controls between haplotypes is thethe under PWK/PhJ shown the mm10 Y in PWK/PhJ reference chromo- both Y-linked allele duplication. topMissing (HOM_A1), panels. The calls heterozygous same are Genotype (HET) shown set calls in or of are grey. homozygous Recombinant color-coded for individuals as shown the homozygous with alternate for red allele arrowheads. (HOM_A2). 194 195 190 193 189 191 192 188 187 186 184 185 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A ●● B WSB/EiJ● 200 ●● ● ● ●● ● ● ● ● ● ●● ●● ● ● ●●● ● ●● ● 2.5 ●●● PWK/PhJ● ● 100 ●●● ●● ● ● ● ● ● ● ● ● ● ● ● ● ● ● subspecies 0.0 ● ● ● ●●●●● ● ● ●● ●●●●● ●● ●●● ● ● 0 ●●●●●●● ● ●● ● sex ●●●●●●●●●●●● dom ● ●● ●●● ●●● ● ●● ● ● ● ● ● ● mus ●●●● XX ● ● ●● ● ●● ● ●● ● ● WSB/EiJ ● ●● ●● ● cas ●● ● XY Y_B3● ● ● −2.5 ● ● ● PC2 (6.7%) ● PC2 (10.0%) spretus PWK/PhJ Y ● Y_p2 −100 ● ● ● ● ● ● ● ● CC044/Unc E_518 ●●●●●● ●● −5.0 −200 ● PWK/PhJ X

● ●CAST/EiJ ● −7.5 −200 0 200 −10 0 10 PC1 (39.6%) PC1 (74.0%)

Figure 7: Determining the ancestry of the PWK/PhJ Y-linked duplication. (A) PCA on genotype matrix at 65, 977 autosomal SNPs from the 77K array platform, in a collection of 346 wild mice of M. m. domesticus, M. m. musculus, M. m. castaneus and M. spretus ancestry, plus the three wild-derived inbred strains used in this study (open points with labels). Data from Didion et al. (2016). (B) PCA on 84 sites informative between the PWK/PhJ X and Y chromosomes in wild mice (colored by subspecies) and (WSB/EiJ×PWK/PhJ)N2 progeny.

196 Origin of the extended pseudoautosomal region in PWK/PhJ

197 Free recombination is difficult to reconcile with the divergence between the X- and Y-linked copies in PWK/PhJ.

198 The two haplotypes differ at 84 sites in 14 kb (0.60%), within the range expected for autosomal sequences from

199 different mouse subspecies (Geraldes et al. 2008). PWK/PhJ is a wild-derived strain whose founders were

200 trapped in the Czech Republic (Gregorová and Forejt 2000), has > 90% M. m. musculus ancestry (Yang et al.

201 (2011) and Figure 7A) and a M. m. musculus haplotype in the male-specific portion of the Y chromosome

202 (Morgan and Pardo-Manuel de Villena 2017). However, both PWK/PhJ and the related strain PWD/PhJ are

203 known to carry a M. m. domesticus haplotype in the distal portion of the X chromosome (Yang et al. 2011).

204 PCA on SNV genotypes both at autosomal sites and within the Y-linked duplication clearly separates wild and

205 wild-derived M. m. domesticus, M. m. castaneus and M. m. musculus individuals by subspecies (Figure 7A,B).

206 A pseudo-diploid PWK/PhJ Y PAR haplotype, constructed from the consensus PWK/PhJ paternal allele in

207 N2 progeny, clusters with wild M. m. musculus, while the PWK/PhJ X chromosome clusters with wild M. m.

208 domesticus (Figure 7B). N2 female progeny, expected to be homozygous for two domesticus haplotypes (WSB/EiJ

209 and FVB/NJ), cluster with M. m. domesticus. N2 male progeny and the CC044/Unc inbred male, expected to be

210 heterozygous between a musculus- and a domesticus-like haplotype, fall in between the two subspecies-specific

211 clusters. Recombinant N2 progeny lie between the heterozygote and homozygote clusters. The sequence in the

212 PWK/PhJ extended PAR appears to be of M. m. musculus origin.

213 When did the duplication occur? If it occurred during the establishment of the PWK/PhJ line, it must have

13 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

214 been early in the inbreeding process, before the distal X chromosome was fixed for a domesticus haplotype. We

215 examined read alignments from wild-caught M. m. musculus and identified a male from the Czech Republic

216 (CR16) with clusters of aberrantly-oriented read pairs coinciding with the boundaries of the PWK/PhJ dupli-

217 cation (Figure S4). Reads with partial alignments from this male supported precisely the same breakpoints as

218 in PWK/PhJ. Although we cannot exclude the possibility of recurrent mutation, the most parsimonious expla-

219 nation is that the duplication in PWK/PhJ is segregating at moderate frequencies in wild populations of M. m.

220 musculus.

221 Gene expression across the pseudoautosomal boundary

222 The gene that spans the pseudoautosomal boundary, Mid1, is broadly expressed in cell lineages of all three

223 germ layers during embryonic development (Dal Zotto et al. 1998) and knockouts have abnormal migration

224 and branching of axons (Lu et al. 2013). We therefore investigated the effects of shifts in the PAR boundary

225 on transcriptional regulation of Mid1. Our group has previously assayed gene expression by RNAseq in adult

226 brain in male and female progeny of a full diallel cross between CAST/EiJ, PWK/PhJ and WSB/EiJ (Crowley

227 et al. 2015). We re-quantified expression in a total of 90 individuals from nine genotypes with Kallisto (Bray

228 et al. 2016), using the Ensembl version 96 annotation after removal of redundant gene models present on both

229 the X- and Y-encoded copies of the PAR. To characterize patterns of differential splicing, we also aligned reads

230 to the mm10 reference genome after masking the Y-encoded copy of the PAR to avoid mapping artifacts (since

231 the PAR is the only locus that deviates from the conventional haploid representation in the reference genome).

232 Expression of Mid1 is 11.2-fold higher when at least one CAST/EiJ haplotype is present (95% CI 7.5 − 16.7,

−20 233 adjusted p = 9 × 10 ), and there is no difference between males and females (1.2-fold higher expression in

234 females, 95% CI 0.8 − 1.6, adjusted p = 0.94) (Figure 8A). In reciprocal F1 hybrids between CAST/EiJ and either

235 PWK/PhJ or WSB/EiJ, there is no apparent effect of maternal versus paternal inheritance of the CAST/EiJ

236 haplotype on Mid1 expression (1.3-fold higher when maternally-inherited, 95% CI 0.8 − 2.1, adjusted p = 0.49).

237 Next we used known sequence variants to assign reads to parental haplotypes and examined coverage on

238 the maternal and paternal alleles in males and females from reciprocal F1 hybrids with a CAST/EiJ parent

239 (Figure 8B). Reads from biological replicates were pooled for improved visualization. In females, Mid1 is tran-

240 scribed from both X chromosomes. In males, transcription of the exons in the canonical PAR (exons 4-10, grey

241 boxes) occurs on both X and Y. This 3’ fragment of the gene does not encode an open reading frame, so the role

242 of the Y-linked transcripts is unclear. Transcription of the full-length transcript, including the 5’ fragment prox-

243 imal to the PAR boundary (exons 1-3, black boxes), occurs only from the X – unless a CAST/EiJ Y chromosome

244 is present.

245 Finally we learned transcript models from read alignments in the parental strains and compared them to

14 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

246 existing annotations in Ensembl (Figure 8C-D). Transcripts containing the full-length coding sequence of the

247 MID1 protein were recovered from PWK/PhJ and WSB/EiJ but not from CAST/EiJ, despite having nearly

248 100, 000 reads aligned in the Mid1 locus in that strain. Although we observed hundreds of reads originating

249 from exons 6-10 in CAST/EiJ (Figure 8B), none had spliced alignments across the junction between exons 5

250 and 6, nor between 7 and 8. Several CAST/EiJ isoforms have a 3’ extension of exon 4 and a 5’ extension of

251 exon 5 (grey highlighted box in Figure 8D); the coding potential of these isoforms is uncertain. The difficulty

252 of assembling full-length transcripts in CAST/EiJ may be due to mapping artifacts if the organization of this

253 portion of the PAR is divergent from the reference genome. Moreover, it is difficult to assess the accuracy of our

254 transcript assembly in the presence of abundant copy-number variation of segments that include exons 4-10

255 (next section).

256 Extreme structural diversity at the pseudoautosomal boundary

257 Inspection of sequencing data from 67 wild mice revealed abundant copy-number variation in the 1 Mb ad-

258 jacent to the PAR boundary as well as within the canonical PAR itself (Figure 9A, Figure S5). Nominal copy

259 number varies from zero to more than 20 for some segments, and individual mice may have 4 or more distinct

260 copy-number states across the region. As for the PWK/PhJ allele, we sought to identify breakpoints of copy

261 gains and losses by manual review of read alignments in IGV for 38 mice with evidence of CNVs in the distal

262 X chromosome (excluding the canonical PAR). Clusters of 3 or more reads with partial alignments terminating

263 at the same in the mm10 reference were deemed sufficient evidence for a breakpoint. This analysis

264 yielded 32 distinct breakpoints within 200 kb of the PAR boundary, none of which are shared across subspecies.

265 Their spatial distribution appears non-random: their density seems to decrease moving outward from the PAR

266 boundary (Figure 9B, File S9). However, tests for enrichment of breakpoints in proximity to exons and repeti-

267 tive elements in the reference genome, and known recombination hotspots, all returned null results (cf. Figure

268 9C-E). The density of repetitive elements in the 500 kb of sequence proximal to the pseudoautosomal boundary

269 (42.8%) is not different than the subtelomeric regions of the (mean 42.0%, standard deviation 10.2%).

270 The previously-described ~ 466 kb expansion of the PAR in the CAST/EiJ strain (White et al. 2012a), of predom-

271 inantly M. m. castaneus ancestry and derived from founders trapped in Thailand, is not found in wild M. m.

272 castaneus from northern India. The allele may be confined to southeast Asia. Several large deletions adjacent to

273 and possibly involving the canonical PAR boundary that were also identified in CAST/EiJ appear fixed in the

274 Indian population.

275 To gain some insight into the evolutionary trajectory of the PAR in the genus Mus, we estimated copy

276 number in the distal X chromosome and the canonical PAR in 8 M. spretus, 1 M. spicilegus and 1 M. caroli.

277 Both males and females have the expected copy number for the X chromosome in the vicinity of the PAR

15 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A parental F1 B CASTxPWK PWKxCAST CASTxWSB WSBxCAST

●● 200 300 w)

●● XX ● 0

●● ● ● 100 ● nal bel o ●● ●● sex ● ● 200 ● ● ● ● female ● ● ● ● ● male

● e / pate r xpression (TPM) v 30 ● ● ● ●● 200 ● ● total e ● ● ● ● ●● XY ● nal ab o ●●● ● ● ● ● ●● ● ● ● allele−specific read depth 0 10 (mate r 200

PWKxPWKWSBxWSB PWKxWSBWSBxPWK CASTxCAST CASTxPWKPWKxCASTCASTxWSBWSBxCAST Mid1 exons 9x10-18 1x10-10 0.49

C D CAST/EiJ FF.34.9 FF.34.9 CAST/EiJ FF.34.6 FF.34.6 FF.34.3 FF.34.3 FF.34.2 FF.34.2 FF.34.1 FF.34.1

GG.28.1 GG.28.1 GG.27.9 GG.27.9 GG.27.8 GG.27.8 GG.27.7 GG.27.7 PWK/PhJ GG.27.6 GG.27.6 PWK/PhJ GG.27.5 GG.27.5 GG.27.4 GG.27.4 GG.27.3 GG.27.3 GG.27.2 GG.27.2 GG.27.14 GG.27.14 GG.27.12 GG.27.12 GG.27.10 GG.27.10 GG.27.1 GG.27.1 HH.24.9 HH.24.9 HH.24.8 WSB/EiJ HH.24.8

HH.24.7 WSB/EiJ HH.24.7 HH.24.6 HH.24.6 HH.24.5 HH.24.5 HH.24.4 HH.24.3 HH.24.4 HH.24.1 HH.24.3 HH.24.1 ENSMUST00000171433 ENSMUST00000163810 ENSMUST00000171433 Ensem b ENSMUST00000112107 ENSMUST00000163810 Ensem b ENSMUST00000112105 ENSMUST00000112107 ENSMUST00000112104 ENSMUST00000112105 ENSMUST00000079443 l ENSMUST00000112104 ENSMUST00000078947 ENSMUST00000079443 l ENSMUST00000036753 ENSMUST00000078947 ENSMUST00000036753 169.6 169.7 169.8 169.9 170 position (Mb) assembled Mid1 isoforms

Figure 8: Expression of psueodoautosomal gene Mid1 in F1 hybrids. (A) Total expression in adult whole brain, quantified as transcripts per million (TPM), in all 9 possible reciprocal crosses between CAST/EiJ, PWK/PhJ and WSB/EiJ. Group assignments for several tests of differential expression are shown as color-coded dots beneath x-axis, with corresponding adjusted p-values at right. (B) Aggregate allele-specific read depth over Ensembl- annotated exons of Mid1 (transcript ENSMUST00000112107) in F1 females (top row) and males (bottom row) with a CAST/EiJ X or Y chromosome. Maternal reads are shown above the horizontal axis and paternal reads below, shaded according to strain origin (green, CAST/EiJ; red, PWK/PhJ; purple, WSB/EiJ). (C) Transcript iso- forms assembled from RNAseq read alignments in the parental strains compared to Mid1 transcript models from Ensembl, plotted in genomic coordinates. Grey dashed line shows canonical PAR boundary. (D) Same transcripts as panel C, but drawn with introns compressed, exons to scale and vertically aligned to facilitate comparison of splice sites. Coding region of Ensembl transcripts shown as tall boxes. CAST/EiJ splicing differences in exons corresponding to canonical exons 4 and 5 emphasized by light grey shaded box.

16 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A dom mus cas

16.0 HG08 [XY] AL16 [XY] H30 [XX] 8.0 4.0 2.0 1.0 0.5

16.0 TP51D [XY] CR12 [XX] H12 [XY] 8.0 4.0 2.0 1.0 0.5 copy number copy

16.0 JR11 [XY] AFG2 [XY] H14 [XX] 8.0 4.0 2.0 1.0 0.5

169.5 169.6 169.7 169.8 169.9 170 169.5 169.6 169.7 169.8 169.9 170 169.5 169.6 169.7 169.8 169.9 170 position (Mb) B duplication alleles individuals

I II III IV

169.6 169.7 169.8 169.9 170 position (Mb)

C ENSMUST00000112104

ENSMUST00000112107 Mid1 ENSMUST00000036753

ENSMUST00000112105 transcripts

ENSMUST00000078947

ENSMUST00000171433

ENSMUST00000163810

ENSMUST00000079443

169.6 169.7 169.8 169.9 170 position (Mb)

D LINE mm10 repeats SINE LTR Satellite family Simple_repeat Low_complexity

169.6 169.7 169.8 169.9 170 position (Mb)

E CASTxCAST H3K4me3 CASTxB6 B6xCAST cross B6xB6 169.6 169.7 169.8 169.9 170 position (Mb)

Figure 9: Abundant copy-number variation at the pseudoautosomal boundary in wild house mice. (A) Copy number in the distal X chromosome, estimated from whole-genome sequencing read depth in 1 kb bins, for representative individuals of M. m. domesticus (blue), M. m. musculus (red) and M. m. castaneus (green). Inferred sex-chromosome karyotype is shown for each individual. Dashed lines show the boundaries of the CAST/EiJ extended and canonical PARs, respectively. Canonical PAR is shaded grey. (B) Distinct copy-number alleles ascertained from breakpoint-spanning reads in 67 wild mice. Each allele is represented as a rectangle extending from the proximal breakpoint up to the pseudoautosomal boundary, and is shown in proportion to the number of times it is observed in our sample. (C) Transcripts of the Mid1 gene from Ensembl release 96. (D) Repeat elements annotated in the mm10 reference by the UCSC Genome Browser. (E) Recombination hotspots ascertained in C57BL/6J (B6), and CAST/EiJ (CAST), and F1 males (Baker et al. 2015).

17 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

278 boundary (Figure S6). Within the most proximal region of the canonical PAR, from the PAR boundary to just

279 past the eight exon of Mid1 (chrX: 169986134), outgroup samples of both sexes have the copy number expected

280 for the X chromosome – suggesting that the region is not pseudoautosomal in these species. (Similarly, no

281 evidence of heterozygosity in the proximal exons of Mid1 was found in whole-exome data from wild-caught

282 male representatives of more distant outgroups Mus cookii and Nannomys minutoides (Sarver et al. 2017); not

283 shown.) This agrees with previous findings that the PCR marker DXYCbl1, whose sequence aligns at chrX:

284 169974684-169974964, is X-unique in offspring of (C57BL/6JxSPRET/Ei)F1 males (Breen et al. 1994; Perry et al.

285 2001). Given the conservation of copy number on either side of the PAR boundary in three outgroup species, we

286 conclude that the duplications observed in each M. musculus subspecies are derived alleles. Without pedigree

287 data we cannot be certain of X-, Y- or pseudoautosomal linkage for these duplications, but similarity to the

288 pattern we observe for the PWK/PhJ allele – which itself segregates in the wild – strongly suggests that at least

289 some involve the pseudoautosomal boundary.

290 Discussion

291 Here we combine crosses involving inter-subspecific hybrid males with whole-genome sequencing data from

292 wild mice to clarify and extend previous observations on the lability of the mouse pseudoautosomal region.

293 We show that the previously-described extension of the PAR in M. m. castaneus-derived CAST/EiJ – that is,

294 translocation of X-unique sequence onto the Y chromosome – undergoes crossing-over in heterozygous males

295 at nearly 100-fold higher rate than autosomal sequence. Our estimate of the recombination rate in the extended

296 PAR (18 − 23 cM/Mb) agrees with a recent estimate from whole-genome sequencing of single recombinant

297 sperm (24 cM/Mb) from a C57BL/6J×CAST/EiJ cross (Hinch et al. 2019) male. The nominal marker order in the

298 reference genome is supported by haplotypes reconstructed for recombinant progeny. Nonetheless, we observe

299 several double-recombinants within < 500 kb, consistent with prior observations that double-recombinants are

300 common in the canonical PAR (Soriano et al. 1987; Pardo-Manuel de Villena and Sapienza 1996).

301 We further confirm that a shorter extension of the PAR has occurred in PWK/PhJ (White et al. 2012a). Unlike

302 the CAST/EiJ allele, the X- and Y-linked copies in PWK/PhJ are divergent in sequence and structure. This

303 suggests that they do not recombine freely in the inbred strain, although we find that they do recombine in inter-

304 subspecific hybrid males. The X-linked copy is of M. m. domesticus ancestry – evidence of either introgression

305 in the wild or contamination in a laboratory colony – while the Y-linked copy comes from M. m. musculus, is

306 segregating in the wild in the Czech Republic.

307 In fact the PWK/PhJ Y-linked allele is just one of numerous structural variants at the PAR boundary that

308 segregate in natural populations of all three subspecies. All of these are derived alleles: both the PAR boundary

309 and a proximal segment of the canonical PAR are single-copy in M. spretus, M. spicilegus and M. caroli and there-

18 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

310 fore most likely were single-copy in the common Mus ancestor. Using breakpoint-spanning reads to uniquely

311 identify structural alleles, we find at least 32 alleles in a sample of 67 mice. None of them are shared across sub-

312 species. The unusual level of standing variation suggests that the mutation rate near the PAR boundary is quite

313 high. Indeed, we recover a de novo deletion on the PWK/PhJ Y haplotype in a sample of only 86 mice, for an

314 estimated mutation rate of 1.2% per generation (95% Poisson CI 0.066% − 5.1%). To our surprise, none of these

315 rearrangements seem to prevent robust expression of an essential gene (Mid1) that flanks the pseudoautosomal

316 boundary (Dal Zotto et al. (1998) and Figure 8).

317 Taken together, our results indicate that the peculiar properties of the PAR in male meiosis – intense recombi-

318 nation activity, apparent absence of crossover interference, markedly elevated rates of structural rearrangement

319 (Kipling et al. 1996a,b) – can “leak” across the PAR boundary. This is in spite of an abrupt and complete loss

320 of sequence homology between the X and Y, at least in the reference assembly. Extension of the PAR requires

321 transposition of sequence from the X onto the Y chromosome. Once this occurs, the region seems immediately

322 subject to PAR-like levels of recombination despite not sharing the sequence features of the canonical PAR.

323 It follows that these sequence features (such as elevated GC content) are indeed a consequence rather than

324 a cause of the PAR’s high recombination rate in males. A summary of the major evolutionary events at the

325 pseudoautosomal boundary in Mus is provided in Figure 10.

326 What could account for de novo X-Y exchanges in the absence of homology to promote pairing? One possi-

327 bility is that the duplications happen through a mechanism such as non-homologous end joining. Another is

328 ectopic recombination mediated by repetitive sequences such as transposon fragments (see Figure 5) or satel-

329 lites. This is the mechanism for the human PAR1 extension present in some populations of European ancestry

330 (Mensah et al. 2014). It has recently been shown that DSBs in the canonical PAR are formed preferentially at

331 minisatellite arrays, and that it is the chromatin configuration of the repeat-rich sequence – rather than the se-

332 quence per se – that is essential for X-Y recombination (Acquaviva et al. 2019). It seems plausible that any nearby

333 sequence drawn into the protein-DNA complex at the chromosomal axis could be susceptible to unequal ex-

334 change. Once present on both sex chromosomes, it would be prone to copy-number changes by further unequal

335 exchange events.

336 Our findings support the notion that while the requirement for X-Y pairing in the first meiotic prophase is

337 strictly enforced, the degree of structural homology in the PAR necessary to achieve pairing is relatively weak in

338 mice. It has been known for several decades that failure to pair the sex chromosomes is associated with meiotic

339 arrest in inter-specific (Matsuda et al. 1992; Hale et al. 1993) and inter-subspecific mouse hybrids (Forejt 1996).

340 The unpaired sex chromosomes do not undergo meiotic sex-chromosome inactivation, and misexpression of

341 X- and Y-linked genes is apparently toxic to germ cells (Good et al. 2010; Vernet et al. 2011; Campbell et al.

342 2013). However, absence of X-Y pairing in sterile hybrids occurs in the context of widespread asynapsis of

19 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

343 both autosomes and sex chromosomes (Bhattacharyya et al. 2013). More direct evidence of the requirement

344 for recombination in the PAR comes from comparison of genetically-engineered mice deficient for isoforms of

345 Spo11, which encodes the enzyme that executes programmed meiotic DSBs (Keeney et al. 1997). The SPO11-β

346 isoform is required for DSB formation and synapsis on autosomes and (in females) the X-chromosome pair;

347 SPO11-α is required for DSB formation in the PAR (Bellani et al. 2010). Male mice deficient in SPO11-α alone

348 have high rates of X-Y asynapsis and are sterile or subfertile, depending on genetic background, despite having

349 normal crossover frequency and synapsis on autosomes (Kauppi et al. 2011; Faisal and Kauppi 2016). Faisal and

350 Kauppi (2016) speculate that partial rescue of the SPO11-α knockout may be due to a PAR variant in segregating

351 in their colony. The best evidence for a specific role of PAR divergence in infertility comes from an experiment

352 by White et al. (2012b): F2 males heterozygous for a WSB/EiJ and a CAST/EiJ haplotype in the PAR have a

353 variety of spermatogenic defects, independent of genotypes at other loci.

354 However, Dumont (2017) has shown that although the number of X-Y crossovers (assayed by immunostain-

355 ing against the MLH1 protein) in inter-subspecific F1 hybrid males is reduced relative to the parental strains,

356 absence of stable X-Y pairing is pervasive in both inbreds and F1s. Failure to pair the sex chromosomes is as-

357 sociated with meiotic arrest and subsequent apoptosis of up to 80% of spermatocytes in some inter-subspecific

358 hybrids (Matsuda et al. 1992) – but also up to 25% of spermatocytes in some inbred strains (Imai et al. 1981;

359 Krzanowska 1989). Even this level of X-Y dissociation is compatible with fertility, at least under laboratory

360 conditions. Two Y-chromosome variants with large extensions of the PAR – the Yaa allele, a 4 Mb proximal

361 extension of the PAR on the SB/Le background (Hudgins et al. 1985; Subramanian et al. 2006); and the Y* allele,

362 a PAR-to-PAR fusion of a Y and an X chromosome (Eicher et al. 1991) – are readily propagated in laboratory

363 colonies. Given these observations and the number of segregating PAR-boundary alleles we observe in natu-

364 ral populations, it is reasonable to conclude that the organization of the region is not under strong purifying

365 selection in house mice. However, we stress that our study provides limited direct information on the fitness

366 consequences of X-Y exchanges in the PAR in inter-subspecific hybrids. These events are compatible with vi-

367 ability through at least the neonatal period (when progeny were sacrificed for genotyping) but details of their

368 effects on male and female fertility are unknown.

369 Our study has several further limitations. First, all of our analyses – whether of array genotypes or whole-

370 genome sequence data – are predicated on the representation of the PAR in the mm10 reference genome. The

371 assembly of the PAR has proven difficult because of its repetitive content, and is thus fragmented and incom-

372 plete. The PAR of C57BL/6J is approximately 750 kb long by restriction mapping (Perry et al. 2001) but has only

373 161 kb of non-gap bases in mm10. We are blind to any sequence that does not have at least partial representation

374 in the reference assembly. It is extremely difficult, and in many cases impossible, to accurately reconstruct com-

375 plex structural variants from short-read data alone. The noisy alignment patterns in the canonical PAR strongly

20 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Mid1

M. m. musculus X Y

M. m. castaneus X Y

M. m. domesticus X Y

M. spicilegus Mid1 M. spretus

X

M. caroli Y

3.0 1.5 0.5 0.0 Mya

Figure 10: Changes in the pseudoautosomal boundary mapped onto the Mus phylogeny. Ancestral PAR, includ- ing unassembled repetitive regions, represented in grey. X-unique sequence translocated to the Y, expanding the PAR, shown as colored blocks. There are now at least three independent alterations of the boundary and/or organization of the PAR (inverted triangles): one in the ancestor of M. musculus (after the divergence from the M. spretus–M. spicilegus clade), expanding the PAR to include exons 4–8 of Mid1; the previously-documented expan- sion in CAST/EiJ; and the rearrangement we describe in PWK/PhJ. Approximate divergence times are shown below the tree, and schematics of the organization of each allele at right.

21 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

376 suggest that its organization is much more complex than the reference assembly would suggest, but we can say

377 little more than that. Detailed characterization of repetitive loci in mammalian genomes (eg. Soh et al. (2014);

378 Cantsilieris et al. (2018); Lilue et al. (2018)) with orthogonal technologies including long-read single-molecule

379 sequencing and high-throughput physical mapping has shown that the view from short reads may be a faint

380 shadow of the truth. Finally, despite these limitations, our study provides a useful foundation for future efforts

381 to characterize the sequence, structure and function of this enigmatic region of the genome.

382 Materials and Methods

383 Mouse breeding

384 The crosses described herein were bred as part of ongoing studies of aging, recombination and infertility in

385 hybrid males. Standard mouse strains (CAST/EiJ, PWK/PhJ, WSB/EiJ, FVB/NJ) were purchased from the

386 Jackson Laboratory (Bar Harbor, ME) between 2008 and 2011. Reciprocal F1 hybrid males were generated in all

387 pairwise combinations between CAST/EiJ (abbreviated with strain code F) PWK/PhJ (G), and WSB/EiJ (H) in

388 2008-2009. F1 males were bred with multiple FVB/NJ females to generate N2 progeny which were sacrificed

389 at birth by cervical dislocation. N2 progeny were named as follows: sire identifier (PAE_xxx), 2-letter encoding

390 of sire genotype (eg. HF for WSB/EiJ×CAST/EiJ), 1-letter encoding of approximate sire age at conception

391 (E=elderly, Y=young), and 2- or 3-letter pup identifier. All breeding was completed by 2011. The study was

392 approved by the Institutional Animal Care and Use Committee of the University of North Carolina at Chapel

393 Hill and all animal husbandry was conducted in the university’s AAALAC-accredited facility (Animal Welfare

394 Assurance #A3410-01) accordance with institutional and federal regulations.

395 DNA extraction, genotyping and quality control

396 Genomic DNA was extracted from tail clips of adult representatives of parental inbred strains or whole tails or

397 heads of N2 progeny using either a standard phenol-chloroform method (Green and Sambrook 2012) or Qiagen

398 DNeasy Blood & Tissue Kits (catalog no. 69506; Qiagen, Hilden, Germany). Approximately 1.5 micrograms of

399 DNA per sample was shipped to Neogen Inc (Lincoln, NE) for hybridization to one of two Mouse Universal

400 Genotyping Arrays, MegaMUGA (77K probes) or MiniMUGA (11K probes). Both arrays are custom-designed

401 on the Illumina Infinium HD platform (Steemers et al. 2006), with content optimized for information content

402 in classical inbred strains and the 8 founder strains of the Collaborative Cross (A/J, C57BL/6J, 129S1/SvImJ,

403 NOD/ShiLtJ, NZO/HlLtJ, CAST/EiJ, PWK/PhJ, WSB/EiJ), as described in (Morgan et al. 2016). Genotypes

404 were called by the vendor from hybridization intensity signals using the semi-supervised clustering algorithm

405 implemented in the Illumina BeadStudio software (Illumina Inc, San Diego, CA).

22 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

406 Genotype data was processed in R v3.3.2 (R Foundation for Statistical Computing, http://www.r-project.

407 org/) using the argyle package v0.2.2 (Morgan 2016). Samples with < 10% missing genotype calls were

408 retained for analysis; there were no genotyping failures. Samples were genetically sexed by comparing av-

409 erage hybridization intensities for X- and Y-linked markers. We identified a single XO individual among the

410 PWK/PhJ×CAST/EiJ N2s and none in the WSB/EiJ×CAST/EiJ or WSB/EiJ×PWK/PhJ N2s. To confirm cor-

411 rect assignment of N2s to F1 sires, we examined N2 genotypes at sites on the X chromosome informative

412 between the parent strains. We identified 24 progeny representing 2 consecutive litters from the same sire

413 (PAE_018) initially labelled as WSB/EiJ×CAST/EiJ. Males were homozygous for WSB/EiJ genotypes on the X

414 and females heterozygous at sites informative between CAST/EiJ and FVB/NJ. These pattern was consistent

415 with a (CAST/EiJxWSB/EiJ)F1 sire, and we updated our pedigrees accordingly.

416 A complete list of N2 progeny, their parents, genetically-confirmed sex, and platform on which they were

417 genotyped is provided in File S4.

418 Refinement of genotypes on the X chromosome

419 Because the aim of this study is to identify crossovers in the PAR, we focused particular attention on the quality

420 of genotype calls at markers in this region. The extended PAR of CAST/EiJ begins (in the mm10 assembly)

421 at chrX:169542082, and the canonical PAR at chrX:169969759. We augmented MegaMUGA genotypes and hy-

422 bridization intensitnes (n = 186 N2s +4 parental strains) with 111 representatives of the 8 Collaborative Cross

423 founder strains and the 26 viable reciprocal F1 hybrids between them. (These control genotypes are available

424 for public download from http://csbio.unc.edu/CCstatus/index.py.) Of 19 markers whose probe

425 sequence aligns to the PAR in mm10, only one (JAX00723372) yields automated calls in the heterozygous and

426 both homozygous states among the 111 control samples. We manually inspected hybridization intensities at

427 each of the remaining 18 markers to find any with cluster patterns from which reliable genotypes might be re-

428 covered. Three clusters – putatively corresponding to the two homozygous and one heterozygous states – were

429 observed for at a single marker, UNC31596457. No heterozygotes are called at this marker by the automated

430 algorithm in BeadStudio. However, it is clear that CAST/EiJ has a private allele at this marker, and that one of

431 the clusters corresponds to CAST/(non-CAST) heterozygotes in the control samples (Figure S7). We therefore

432 re-called genotypes at this marker by fitting a 3-component bivariate Gaussian mixture model to hybridiza-

433 tion intensities of the control samples and N2s using the R package mclust v5.2.1 (Fraley and Raftery 2002).

434 Samples with cluster assignment probability < 0.9 were marked as missing. The result is shown in Figure S8.

435 Following this genotype refinement for MegaMUGA, informative markers in or beyond the extended PAR

436 of CAST/EiJ were identified separately on the two array platforms by comparing genotypes between single rep-

437 resentatives of the relevant inbred strains. This left 7 and 6 markers on MegaMUGA for the PWK/PhJ×CAST/EiJ

23 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

438 and WSB/EiJ×CAST/EiJ crosses respectively; and 1 and 2 markers on MiniMUGA for the same crosses. A list

439 of informative markers for each array-cross combination is provided in File S2. Refined genotype calls for all

440 N2 progeny and controls in the study are provided in File S5.

441 Crossovers in the PAR were identified as illustrated in Figure 2B. Genetic lengths of inter-marker intervals

442 were calculated assuming an identity map function (ie. no crossover interference) and standard errors obtained

443 from formula 2.12 in (Xu 2013).

444 Whole-genome sequencing

445 New whole-genome sequence data was generated for 87 N2 progeny of the WSB/EiJ×PWK/PhJ cross at the

446 University of North Carolina High-Throughput Sequencing Facility. Genomic DNA was fragmented by ultra-

447 sonication on a Covaris instrument (Covaris Inc, Woburn, MA) and enriched for fragments approximately 300

448 bp in size on a PippinPrep system (Sage Science, Beverly, MA). Paired-end libraries were prepared using the Il-

449 lumina TruSeq PE v1 chemistry. Samples were multiplexed in two pools (with 45 and 42 samples, respectively)

450 and sequenced 2x100bp on an Illumina HiSeq 2500 instrument to a depth of approximately 5×. Demultiplexing

451 and postprocessing were performed with Illumina CASAVA v1.8.2. Data has been submitted to the European

452 Nucleotide Archive (accession PRJEB32247).

453 Reads for inbred strains PWK/PhJ, FVB/NJ and WSB/EiJ was obtained from the public FTP site of the

454 Sanger Mouse Genomes Project, February 2015 release (ftp://ftp-mouse.sanger.ac.uk/REL-1502-BAM)

455 (Keane et al. 2011). Reads for Collaborative Cross strains CC001/Unc, CC022/GeniUnc, CC032/GeniUnc,

456 CC044/GeniUNC and CC045/GeniUnc were generated by our research group and have been previously pub-

457 lished (European Nucleotide Archive accession PRJEB14673) (Srivastava et al. 2017). Reads for wild M. mus-

458 culus and M. spretus have been previously published (Harr et al. 2016) and were obtained from the European

459 Nucleotide Archive (accessions PRJEB9450, PRJEB11742, PRJEB14167, PRJEB2176). Reads for M. spicilegus have

460 been previously published (Neme and Tautz 2016) and were obtained from the European Nucleotide Archive

461 (accession PRJEB11513). Reads for M. caroli have been previously published (Thybert et al. 2018) and were

462 obtained from the European Nucleotide Archive (accession PRJEB14895). A complete list of samples used in

463 whole-genome sequencing analysis is provided in File S6.

464 The standard mouse reference assembly is a haploid representation of the genome – except in the PAR,

465 whose sequence appears on both sex chromosomes. This induces ambiguous sequence alignments and ef-

466 fectively halves the observed read depth over each copy of the PAR. To avoid these artifacts, we masked the

467 sequence of the Y PAR (chrY:90745845-91644698) in the mm10/GRCm38 assembly with Ns. Reads were aligned

468 to this masked reference with bwa mem 0.7.15-r1140 (Li 2013) with flags -t 16 -YK100000000 and default

469 settings otherwise. Optical duplicates were marked with samblaster v0.1.24 (Faust and Hall 2014).

24 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

470 Ascertainment of CNVs and their boundaries

471 Copy-number variants in the distal X chromosome were identified by examining read depth in windows of

472 varying sizes, as described in the main text. Paired-end reads are expected to align with a characteristic orien-

473 tation: one read on the forward strand and its mate on the reverse (configuration “FR”). Clusters of pairs with

474 other orientations (RR, FF or RF) may be a signal of underlying structural variation. Summaries of total and

475 FR read depth were calculated with pysamstats v1.1.0 (https://github.com/alimanfoo/pysamstats)

476 and normalized against read depth in single-copy autosomal sequence. CNV boundaries were refined by man-

477 ual inspection of read alignments in IGV v2.4.16 (Robinson et al. 2011). Clusters of 3 or more reads with

478 soft-clipped alignments (that is, alignments of less than the full read length, indicated by the presence of S

479 in the CIGAR string) all terminating at exactly the same base pair, and coinciding with an abrupt change in

480 both total read depth and the proportion of FR reads, were deemed candidate CNV breakpoints. Such reads,

481 as well as their mates (whether the mate was mapped or not) were extracted and converted to fastq format

482 with samtools v1.9 (Li et al. 2009). For the breakpoints of the PWK/PhJ Y CNV, reads crossing each of 3

483 candidate breakpoints were assembled de novo with SPAdes v3.10.1 (Bankevich et al. 2012) with flags --12

484 (for paired-end reads) and --only-assembler (skipping error correction, which requires higher coverage).

485 The longest contig assembled for each breakpoint was aligned back to the mm10 reference with blat (Kent

486 2002) at the UCSC Genome Browser (https://genome.ucsc.edu/cgi-bin/hgTracks?db=mm10). Con-

487 tigs were screened for possible open reading frames with the ExPASy Translate tool (https://web.expasy.

488 org/translate/). Their sequences are provided in File S3.

489 To identify the source of breakpoint sequence in region_3 (Figure 5), we performed blat searches against

490 recent de novo assemblies of M. spretus (GenBank accession GCA_001624865.1; (Lilue et al. 2018)), M. caroli

491 (GCA_900094665.2; (Thybert et al. 2018)) and M. pahari (GCA_900095145.2; (Thybert et al. 2018)). This contig

492 has gapped alignments to these outgroups genomes that resemble alignments of a spliced transcript (Figure

493 S9, File S7), but the aligned segments do not correspond to annotated exons. The longest ORF it encodes is

494 66 aa long, and BLAST searches of 6-frame translations (blastx) against the NCBI non-redundant protein

495 database yield no hits with e-value < 1 (search performed 31 January 2019). We conclude that despite having

496 the appearance of a retrotransposition event, the sequence is unlikely to represent an ancestral coding sequence.

497 SNV calling and filtering

498 For analyses of recombination within the PWK/PhJ extended PAR, we used bcftools v1.9 to ascertain biallelic

499 SNVs in the distal X chromosome (chrX:169400000-169969759) with call rate > 90% in 169 samples including

500 N2 progeny, inbred parental strains, wild mice, and representative CC strains with a PWK/PhJ Y chromosome.

501 We retained 84 sites homozygous in the PWK/PhJ female and heterozygous in the CC044/Unc strain (carrying

25 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

502 a PWK/PhJ Y chromosome) and segregating at absolute allele count > 10 in the N2s. A single N2 individual

503 (PAE_033_HG_E_463) had an outlying number of heterozygous calls and appeared to be cross-contaminated

504 with another individual; it was excluded from this and all other analyses. Genotype calls at these sites are

505 provided in VCF format in File S8.

506 Principal components analysis (PCA) on genotypes in the distal X chromosome was performed with akt

507 v3beb346 (Arthur et al. 2017), treating all samples as nominally diploid (ie. substituting homozygous for hem-

508 izygous genotypes.) Because we did not have access to whole-genome sequence data for an inbred PWK/PhJ

509 male, we created a pseudo-diploid individual homozygous for PWK/PhJ Y-linked alleles in order to perform

510 ancestry assignment for the PWK/PhJ Y PAR.

511 Re-analysis of ChIPseq data

512 Raw reads from published ChIPseq against H3K4me3 in 12 days post-partum testes from reciprocal F1 hybrid

513 males between C57BL/6J and CAST/EiJ (Baker et al. 2015) were downloaded from the NCBI Short Read Archive

514 (accession PRJNA259788) and aligned to the mm10 reference with bwa mem as described above. Post-processed

515 read alignments from ChIPseq against DMC1 from the same genotypes (but different laboratories), representing

516 single-stranded DNA fragments at DSB free ends (Smagulova et al. 2016), were downloaded from NCBI GEO

517 (accession GSE75419).

518 Tag reads were assigned to the CAST/EiJ or C57BL/6J haplotype according to the base called at known in-

519 formative sites between these strains (using biallelic SNVs from the Mouse Genomes Project VCF, May 2015 re-

520 lease; ftp://ftp-mouse.sanger.ac.uk/REL-1505-SNPs_Indels/). Given n informative sites spanned

521 by a read, the log-likelihood of the read originating from each parental haplotype was calculated as the bino-

522 mial probability of observing k bases consistent with that haplotype and n − k bases inconsistent. A read was

523 deemed assignable if log-likelihood of maternal versus paternal assignments differed by > 2 units. Tag density

524 was calculated with the bamCoverage tool from the bamTools suite (Barnett et al. 2011) using 100 bp bins and

525 a 500 bp smoothing window. No ChIP background subtraction or peak-calling was performed.

526 We note that, since reads were aligned to the standard mm10 reference, reads arising from the CAST/EiJ

527 haplotype are somewhat less likely to be aligned, and so the count of CAST/EiJ reads will be an underestimate.

528 This “reference bias” causes our analyses to be conservative.

529 Analysis of gene expression

530 Assay of gene expression by paired-end, unstranded RNAseq in adult brain from 90 progeny of a complete

531 diallel cross (eg. all possible reciprocal crosses) between CAST/EiJ, PWK/PhJ and WSB/EiJ has been described

532 previously (Crowley et al. 2015). We obtained the raw reads and quantified expression using kallisto v0.44.0

26 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

533 (Bray et al. 2016) with default options. Similar to the approach described for whole-genome sequencing, we

534 removed 8 redundant transcripts annotated in the Y-encoded copy of the PAR (retaining X-encoded copies)

535 and performed quantification against the remaining 141, 854 transcripts in the Ensembl version 96 annotation.

536 Transcript-level estimates were aggregated to gene level with the R package tximport (Soneson et al. 2015)

537 and analyses of differential expression were performed with limma-voom (Ritchie et al. 2015).

538 To investigate allele-specific expression and splicing patterns we also aligned reads to our masked version

539 of the reference genome and masked transcript annotation with STAR v2.7.0e (Dobin et al. 2013). Read pairs

540 were aligned to parental haplotypes as described above for the ChIPseq data. Coverage along annotated exons

541 of Mid1 was calculated with bamCoverage using 5 bp bins and a 25 bp smoothing window. Reads were pooled

542 across biological replicates of the same genotype and sex.

543 Mid1 transcript isoforms were assembled from read alignments in the parental strains (CAST/EiJ, PWK/PhJ

544 and WSB/EiJ), pooling reads from both male and female samples, using StringTie v1.3.5 (Pertea et al. 2015)

545 with options -f 0.1 -a 4 -c 0.1 -g 500 and the Ensembl version 96 annotation as a guide.

546 Data availability

547 Whole-genome sequence data from N2 progeny has been submitted to the European Nucleotide Archive (ac-

548 cession PRJEB32247). Supplementary files have been submitted to FigShare. Inferred genetic map for the

549 extended PAR is in File S1; informative markers near PAR boundary in File S2; assembled breakpoint se-

550 quences of PWK/PhJ PAR variant in File S3; list of N2 progeny used for recombination analysis in File S4;

551 array genotype and intensity data, with marker locations, in File S5; and genotype calls in PWK/PhJ PAR

552 variant from whole-genome sequence in File S8. Supporting code is available at https://github.com/

553 andrewparkermorgan/mouse_pseudoautosomal_region.

554 Acknowledgments

555 The authors thank Grace Clark, Justin Gooch, the late Mark Calaway, and Darla Miller for assistance with

556 DNA preparation and submission of samples for genotyping; Randy Nonneman for assistance with mouse

557 husbandry; and Chris Baker, Kevin Brick and Fatima Smagulova for their generous technical assistance in re-

558 analyzing their published ChIPseq data. The authors benefited from fruitful correspondence with Beth Dumont

559 while preparing the manuscript. This work was supported in part by National Institutes of Health grants

560 F30MH103925 (APM), R01HD065024 (FPMdV), R21MH096261 (FPMdV), K01MH094406 (JJC), P50MH090338

561 (FPMdV), P50HG006582 (FPMdV) and U42OD010924 (Terry Magnuson).

27 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

562 References

563 Acquaviva, L., M. Boekhout, M. E. Karasu, K. Brick, F. Pratto, M. van Overbeek, L. Kauppi, R. Daniel Camerini-

564 Otero, M. Jasin, and S. Keeney, 2019 Ensuring meiotic DNA break formation in the mouse pseudoautosomal

565 region.

566 Arthur, R., O. Schulz-Trieglaff, A. J. Cox, and J. O’Connell, 2017 AKT: ancestry and kinship toolkit. Bioinfor-

567 matics 33: 142–144.

568 Ashley, T. and M. J. Moses, 1980 End association and segregation of the achiasmatic X and Y chromosomes of

569 the sand rat, psammomys obesus. Chromosoma 78: 203–210.

570 Baker, C. L., S. Kajita, M. Walker, R. L. Saxl, N. Raghupathy, K. Choi, P. M. Petkov, and K. Paigen, 2015 PRDM9

571 drives evolutionary erosion of hotspots in mus musculus through Haplotype-Specific initiation of meiotic

572 recombination. PLoS Genet. 11: e1004916.

573 Bankevich, A., S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. Lesin, S. I. Nikolenko,

574 S. Pham, A. D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. Vyahhi, G. Tesler, M. A. Alekseyev, and P. A.

575 Pevzner, 2012 SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J.

576 Comput. Biol. 19: 455–477.

577 Barnett, D. W., E. K. Garrison, A. R. Quinlan, M. P. Strömberg, and G. T. Marth, 2011 BamTools: a c++ API and

578 toolkit for analyzing and managing BAM files. Bioinformatics 27: 1691–1692.

579 Bellani, M. A., K. A. Boateng, D. McLeod, and R. D. Camerini-Otero, 2010 The expression profile of the ma-

580 jor mouse SPO11 isoforms indicates that SPO11beta introduces double strand breaks and suggests that

581 SPO11alpha has an additional role in prophase in both spermatocytes and oocytes. Mol. Cell. Biol. 30: 4391–

582 4403.

583 Bellott, D. W., J. F. Hughes, H. Skaletsky, L. G. Brown, T. Pyntikova, T.-J. Cho, N. Koutseva, S. Zaghlul, T. Graves,

584 S. Rock, C. Kremitzki, R. S. Fulton, S. Dugan, Y. Ding, D. Morton, Z. Khan, L. Lewis, C. Buhay, Q. Wang,

585 J. Watt, M. Holder, S. Lee, L. Nazareth, J. Alföldi, S. Rozen, D. M. Muzny, W. C. Warren, R. A. Gibbs, R. K.

586 Wilson, and D. C. Page, 2014 Mammalian Y chromosomes retain widely expressed dosage-sensitive regula-

587 tors. Nature 508: 494–499.

588 Bhattacharyya, T., S. Gregorova, O. Mihola, M. Anger, J. Sebestova, P. Denny, P. Simecek, and J. Forejt, 2013

589 Mechanistic basis of infertility of mouse intersubspecific hybrids. Proc. Natl. Acad. Sci. U. S. A. 110: E468–

590 E477.

28 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

591 Bray, N. L., H. Pimentel, P. Melsted, and L. Pachter, 2016 Near-optimal probabilistic RNA-seq quantification.

592 Nat. Biotechnol. 34: 525–527.

593 Breen, M., L. Deakln, B. Macdonald, S. Miller, R. Sibson, E. Tarttelln, P. Avner, F. Bourgade, J.-L. Guenet, X. Mon-

594 tagutelli, C. Polrier, D. Simon, D. Tailor, M. Bishop, M. Kelly, F. Rysavy, S. Rastan, D. Norris, D. Shepherd,

595 C. Abbott, A. Pllz, S. Hodge, I. Jackson, Y. Boyd, H. Blair, G. Maslen, J. A. Todd, P. W. Reed, J. Stoye, A. Ash-

596 worth, L. McCarthy, R. Cox, L. Schalkwyk, H. Lehrach, J. Klose, U. Gangadharan, and S. Brown, 1994 To-

597 wards high resolution maps of the mouse and human genomes—a facility for ordering markers to 0.1 cm

598 resolution. Hum. Mol. Genet. 3: 621–627.

599 Burgoyne, P. S., 1982 Genetic homology and crossing over in the X and Y chromosomes of mammals. Hum.

600 Genet. 61: 85–90.

601 Burgoyne, P. S., S. K. Mahadevaiah, M. J. Sutcliffe, and S. J. Palmer, 1992 Fertility in mice requires X-Y pairing

602 and a y-chromosomal “spermiogenesis” gene mapping to the long arm. Cell 71: 391–398.

603 Campbell, P., J. M. Good, and M. W. Nachman, 2013 Meiotic inactivation is disrupted in sterile

604 hybrid male house mice. Genetics 193: 819–828.

605 Cantsilieris, S., B. J. Nelson, J. Huddleston, C. Baker, L. Harshman, K. Penewit, K. M. Munson, M. Sorensen,

606 A. E. Welch, V. Dang, F. Grassmann, A. J. Richardson, R. H. Guymer, T. A. Graves-Lindsay, R. K. Wilson,

607 B. H. F. Weber, P. N. Baird, R. Allikmets, and E. E. Eichler, 2018 Recurrent structural variation, clustered sites

608 of selection, and disease risk for the complement factor H (CFH) gene family. Proc. Natl. Acad. Sci. U. S. A.

609 115: E4433–E4442.

610 Carnero, A., R. Jiménez, M. Burgos, A. Sánchez, and R. Díaz de la Guardia, 1991 Achiasmatic sex chromosomes

611 in Pitymys duodecimcostatus: mechanisms of association and segregation. Cytogenet. Cell Genet. 56: 78–81.

612 Charchar, F. J., M. Svartman, N. El-Mogharbel, M. Ventura, P. Kirby, M. R. Matarazzo, A. Ciccodicola, M. Rocchi,

613 M. D’Esposito, and J. A. M. Graves, 2003 Complex events in the evolution of the human pseudoautosomal

614 region 2 (PAR2). Genome Res. 13: 281–286.

615 Cotter, D. J., S. M. Brotman, and M. A. Wilson Sayres, 2016 Genetic diversity on the human X chromosome does

616 not support a strict pseudoautosomal boundary. Genetics 203: 485–492.

617 Crowley, J. J., V. Zhabotynsky, W. Sun, S. Huang, I. K. Pakatci, Y. Kim, J. R. Wang, A. P. Morgan, J. D. Calaway,

618 D. L. Aylor, Z. Yun, T. A. Bell, R. J. Buus, M. E. Calaway, J. P. Didion, T. J. Gooch, S. D. Hansen, N. N. Robin-

619 son, G. D. Shaw, J. S. Spence, C. R. Quackenbush, C. J. Barrick, R. J. Nonneman, K. Kim, J. Xenakis, Y. Xie,

620 W. Valdar, A. B. Lenarcic, W. Wang, C. E. Welsh, C.-P. Fu, Z. Zhang, J. Holt, Z. Guo, D. W. Threadgill, L. M.

29 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

621 Tarantino, D. R. Miller, F. Zou, L. McMillan, P. F. Sullivan, and F. Pardo-Manuel de Villena, 2015 Analyses of

622 allele-specific gene expression in highly divergent mouse crosses identifies pervasive allelic imbalance. Nat.

623 Genet. 47: 353–360.

624 Dal Zotto, L., N. A. Quaderi, R. Elliott, P. A. Lingerfelter, L. Carrel, V. Valsecchi, E. Montini, C. H. Yen, V. Chap-

625 man, I. Kalcheva, G. Arrigo, O. Zuffardi, S. Thomas, H. F. Willard, A. Ballabio, C. M. Disteche, and E. I.

626 Rugarli, 1998 The mouse mid1 gene: implications for the pathogenesis of opitz syndrome and the evolution

627 of the mammalian pseudoautosomal region. Hum. Mol. Genet. 7: 489–499.

628 Das, P. J., B. P. Chowdhary, and T. Raudsepp, 2009 Characterization of the bovine pseudoautosomal region and

629 comparison with sheep, goat, and other mammalian pseudoautosomal regions. Cytogenet. Genome Res. 126:

630 139–147.

631 Didion, J. P., A. P. Morgan, L. Yadgary, T. A. Bell, R. C. McMullan, L. O. d. Solorzano, J. Britton-Davidian, C. J.

632 Bult, K. J. Campbell, R. Castiglia, Y.-H. Ching, A. J. Chunco, J. J. Crowley, E. J. Chesler, D. W. Förster, J. E.

633 French, S. I. Gabriel, D. M. Gatti, T. Garland, E. B. Giagia-Athanasopoulou, M. D. Giménez, S. A. Grize,

634 I.˙ Gündüz, A. Holmes, H. C. Hauffe, J. S. Herman, J. M. Holt, K. Hua, W. J. Jolley, A. K. Lindholm, M. J.

635 López-Fuster, G. Mitsainas, M. d. L. Mathias, L. McMillan, M. d. G. M. Ramalhinho, B. Rehermann, S. P.

636 Rosshart, J. B. Searle, M.-S. Shiao, E. Solano, K. L. Svenson, P. Thomas-Laemont, D. W. Threadgill, J. Ventura,

637 G. M. Weinstock, D. Pomp, G. A. Churchill, and F. P.-M. d. Villena, 2016 R2d2 drives selfish sweeps in the

638 house mouse. Mol. Biol. Evol. 33: 1381–1395.

639 Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski, S. Jha, P. Batut, M. Chaisson, and T. R. Gingeras,

640 2013 STAR ultrafast universal RNA-seq aligner. Bioinformatics 29: 15–21.

641 Dumont, B. L., 2017 Meiotic consequences of genetic divergence across the murine pseudoautosomal region.

642 Genetics 205: 1089–1100.

643 Dumont, B. L. and B. A. Payseur, 2011a Evolution of the genomic recombination rate in murid rodents. Genetics

644 187: 643–657.

645 Dumont, B. L. and B. A. Payseur, 2011b Genetic analysis of Genome-Scale recombination rate evolution in

646 house mice. PLoS Genet. 7: e1002116.

647 Eicher, E. M., D. W. Hale, P. A. Hunt, B. K. Lee, P. K. Tucker, T. R. King, J. T. Eppig, and L. L. Washburn,

648 1991 The mouse y* chromosome involves a complex rearrangement, including interstitial positioning of the

649 pseudoautosomal region. Cytogenet. Cell Genet. 57: 221–230.

30 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

650 Ellis, N., P. Yen, K. Neiswanger, L. J. Shapiro, and P. N. Goodfellow, 1990 Evolution of the pseudoautosomal

651 boundary in old world monkeys and great apes. Cell 63: 977–986.

652 Eyre-Walker Adam, 1993 Recombination and mammalian genome evolution. Proceedings of the Royal Society

653 of London. Series B: Biological Sciences 252: 237–243.

654 Faisal, I. and L. Kauppi, 2016 Sex chromosome recombination failure, apoptosis, and fertility in male mice.

655 Chromosoma 125: 227–235.

656 Faust, G. G. and I. M. Hall, 2014 SAMBLASTER fast duplicate marking and structural variant read extraction.

657 Bioinformatics 30: 2503–2505.

658 Filatov, D. A. and D. T. Gerrard, 2003 High mutation rates in human and ape pseudoautosomal genes. Gene

659 317: 67–77.

660 Forejt, J., 1996 Hybrid sterility in the mouse. Trends Genet. 12: 412–417.

661 Fraley, C. and A. E. Raftery, 2002 Model-Based clustering, discriminant analysis, and density estimation. J. Am.

662 Stat. Assoc. 97: 611–631.

663 Geraldes, A., P. Basset, B. Gibson, K. L. Smith, B. Harr, H.-T. Yu, N. Bulatova, Y. Ziv, and M. W. Nachman,

664 2008 Inferring the history of speciation in house mice from autosomal, x-linked, y-linked and mitochondrial

665 genes. Mol. Ecol. 17: 5349–5363.

666 Good, J. M., T. Giger, M. D. Dean, and M. W. Nachman, 2010 Widespread Over-Expression of the X chromosome

667 in sterile F1 hybrid mice. PLoS Genet. 6: e1001148.

668 Graves, J. A., C. M. Disteche, and R. Toder, 1998 Gene dosage in the evolution and function of mammalian sex

669 chromosomes. Cytogenet. Cell Genet. 80: 94–103.

670 Graves, J. A. and J. M. Watson, 1991 Mammalian sex chromosomes: evolution of organization and function.

671 Chromosoma 101: 63–68.

672 Graves, J. A. M., 2006 Sex chromosome specialization and degeneration in mammals. Cell 124: 901–914.

673 Green, M. R. and J. Sambrook, 2012 Molecular Cloning: A Laboratory Manual (Fourth Edition): Three-volume set.

674 Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, fourth edition.

675 Gregorová, S. and J. Forejt, 2000 PWD/Ph and PWK/Ph inbred mouse strains of mus m. musculus subspecies–a

676 valuable resource of phenotypic variations and genomic polymorphisms. Folia Biol. 46: 31–41.

31 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

677 Hale, D. W., L. L. Washburn, and E. M. Eicher, 1993 Meiotic abnormalities in hybrid mice of the C57BL/6J ×

678 mus spretus cross suggest a cytogenetic basis for haldane’s rule of hybrid sterility. Cytogenet. Genome Res.

679 63: 221–234.

680 Harr, B., E. Karakoc, R. Neme, M. Teschke, C. Pfeifle, Ž. Pezer, H. Babiker, M. Linnenbrink, I. Montero, R. Scav-

681 etta, M. R. Abai, M. P. Molins, M. Schlegel, R. G. Ulrich, J. Altmüller, M. Franitza, A. Büntge, S. Künzel,

682 and D. Tautz, 2016 Genomic resources for wild populations of the house mouse, mus musculus and its close

683 relative mus spretus. Scientific Data 3: 160075.

684 Hassold, T. J., S. L. Sherman, D. Pettay, D. C. Page, and P. A. Jacobs, 1991 XY chromosome nondisjunction in

685 man is associated with diminished recombination in the pseudoautosomal region. Am. J. Hum. Genet. 49:

686 253–260.

687 Hellmann, I., I. Ebersberger, S. E. Ptak, S. Pääbo, and M. Przeworski, 2003 A neutral explanation for the corre-

688 lation of diversity with recombination rates in humans. Am. J. Hum. Genet. 72: 1527–1535.

689 Hinch, A. G., N. Altemose, N. Noor, P. Donnelly, and S. R. Myers, 2014 Recombination in the human pseudoau-

690 tosomal region PAR1. PLoS Genet. 10: e1004503.

691 Hinch, A. G., G. Zhang, P. W. Becker, D. Moralli, R. Hinch, B. Davies, R. Bowden, and P. Donnelly, 2019 Factors

692 influencing meiotic recombination revealed by whole-genome sequencing of single sperm. Science 363.

693 Huang, S.-W., R. Friedman, N. Yu, A. Yu, and W.-H. Li, 2005 How strong is the mutagenicity of recombination

694 in mammals? Mol. Biol. Evol. 22: 426–431.

695 Hudgins, C. C., R. T. Steinberg, D. M. Klinman, M. J. Reeves, and A. D. Steinberg, 1985 Studies of consomic

696 mice bearing the Y chromosome of the BXSB mouse. J. Immunol. 134: 3849–3854.

697 Imai, H. T., Y. Matsuda, T. Shiroishi, and K. Moriwaki, 1981 High frequency fo X-Y chromosome dissociation

698 in primary spermatocytes of F1 hybrids between japanese wild mice (mus musculus molossinus) and inbred

699 laboratory mice. Cytogenet. Cell Genet. 29: 166–175.

700 Kasahara, T., K. Abe, K. Mekada, A. Yoshiki, and T. Kato, 2010 Genetic variation of melatonin productivity in

701 laboratory mice under domestication. Proc. Natl. Acad. Sci. U. S. A. 107: 6412–6417.

702 Kauppi, L., M. Barchi, F. Baudat, P. J. Romanienko, S. Keeney, and M. Jasin, 2011 Distinct properties of the XY

703 pseudoautosomal region crucial for male meiosis. Science 331: 916–920.

704 Keane, T. M., L. Goodstadt, P. Danecek, M. A. White, K. Wong, B. Yalcin, A. Heger, A. Agam, G. Slater, M. Good-

705 son, N. A. Furlotte, E. Eskin, C. Nellåker, H. Whitley, J. Cleak, D. Janowitz, P. Hernandez-Pliego, A. Edwards,

32 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

706 T. G. Belgard, P. L. Oliver, R. E. McIntyre, A. Bhomra, J. Nicod, X. Gan, W. Yuan, L. van der Weyden, C. A.

707 Steward, S. Bala, J. Stalker, R. Mott, R. Durbin, I. J. Jackson, A. Czechanski, J. A. Guerra-Assunção, L. R. Don-

708 ahue, L. G. Reinholdt, B. A. Payseur, C. P. Ponting, E. Birney, J. Flint, and D. J. Adams, 2011 Mouse genomic

709 variation and its effect on phenotypes and gene regulation. Nature 477: 289–294.

710 Keeney, S., C. N. Giroux, and N. Kleckner, 1997 Meiosis-specific DNA double-strand breaks are catalyzed by

711 spo11, a member of a widely conserved protein family. Cell 88: 375–384.

712 Kent, W. J., 2002 BLAT–the BLAST-like alignment tool. Genome Res. 12: 656–664.

713 Kipling, D., E. C. Salido, L. J. Shapiro, and H. J. Cooke, 1996a High frequency de novo alterations in the long–

714 range genomic structure of the mouse pseudoautosomal region. Nat. Genet. 13: 78–82.

715 Kipling, D., H. E. Wilson, E. J. Thomson, M. Lee, J. Perry, S. Palmer, A. Ashworth, and H. J. Cooke, 1996b

716 Structural variation of the pseudoautosomal region between and within inbred mouse strains. Proc. Natl.

717 Acad. Sci. U. S. A. 93: 171–175.

718 Krzanowska, H., 1989 X-Y chromosome dissociation in mouse strains differing in efficiency of spermatogenesis:

719 elevated frequency of univalents in pubertal males. Gamete Res. 23: 357–365.

720 Lamb, B. C., 1984 The properties of meiotic gene conversion important in its effects on evolution. Heredity 53 (

721 Pt 1): 113–138.

722 Li, H., 2013 Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv:1303.3997

723 [q-bio] .

724 Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and . G. P.

725 D. P. Subgroup, 2009 The sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.

726 Lilue, J., A. G. Doran, I. T. Fiddes, M. Abrudan, J. Armstrong, R. Bennett, W. Chow, J. Collins, S. Collins,

727 A. Czechanski, P. Danecek, M. Diekhans, D.-D. Dolle, M. Dunn, R. Durbin, D. Earl, A. Ferguson-Smith,

728 P. Flicek, J. Flint, A. Frankish, B. Fu, M. Gerstein, J. Gilbert, L. Goodstadt, J. Harrow, K. Howe, X. Ibarra-Soria,

729 M. Kolmogorov, C. J. Lelliott, D. W. Logan, J. Loveland, C. E. Mathews, R. Mott, P.Muir, S. Nachtweide, F. C. P.

730 Navarro, D. T. Odom, N. Park, S. Pelan, S. K. Pham, M. Quail, L. Reinholdt, L. Romoth, L. Shirley, C. Sisu,

731 M. Sjoberg-Herrera, M. Stanke, C. Steward, M. Thomas, G. Threadgold, D. Thybert, J. Torrance, K. Wong,

732 J. Wood, B. Yalcin, F. Yang, D. J. Adams, B. Paten, and T. M. Keane, 2018 Sixteen diverse laboratory mouse

733 reference genomes define strain-specific haplotypes and novel functional loci. Nat. Genet. 50: 1574–1583.

33 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

734 Lu, T., R. Chen, T. C. Cox, R. X. Moldrich, N. Kurniawan, G. Tan, J. K. Perry, A. Ashworth, P. F. Bartlett, L. Xu,

735 J. Zhang, B. Lu, M. Wu, Q. Shen, Y. Liu, L. J. Richards, and Z. Xiong, 2013 X-linked microtubule-associated

736 protein, mid1, regulates axon development. Proc. Natl. Acad. Sci. U. S. A. 110: 19131–19136.

737 Mather, K., 1938 CROSSING-OVER. Biol. Rev. Camb. Philos. Soc. 13: 252–292.

738 Matsuda, Y., T. Hirobe, and V. M. Chapman, 1991 Genetic basis of X-Y chromosome dissociation and male

739 sterility in interspecific hybrids. Proc. Natl. Acad. Sci. U. S. A. 88: 4850–4854.

740 Matsuda, Y., P. B. Moens, and V. M. Chapman, 1992 Deficiency of X and Y chromosomal pairing at meiotic

741 prophase in spermatocytes of sterile interspecific hybrids between laboratory mice (mus domesticus) and

742 mus spretus. Chromosoma 101: 483–492.

743 Mensah, M. A., M. S. Hestand, M. H. D. Larmuseau, M. Isrie, N. Vanderheyden, M. Declercq, E. L. Souche,

744 J. Van Houdt, R. Stoeva, H. Van Esch, K. Devriendt, T. Voet, R. Decorte, P. N. Robinson, and J. R. Vermeesch,

745 2014 Pseudoautosomal region 1 length polymorphism in the human population. PLoS Genet. 10: e1004578.

746 Mirzaghaderi, G. and E. Hörandl, 2016 The evolution of meiotic sex and its alternatives. Proceedings of the

747 Royal Society B: Biological Sciences 283: 20161221.

748 Morgan, A. P., 2016 argyle : An R package for analysis of illumina genotyping arrays. G3 6: 281–286.

749 Morgan, A. P., C.-P. Fu, C.-Y. Kao, C. E. Welsh, J. P. Didion, L. Yadgary, L. Hyacinth, M. T. Ferris, T. A. Bell, D. R.

750 Miller, P. Giusti-Rodriguez, R. J. Nonneman, K. D. Cook, J. K. Whitmire, L. E. Gralinski, M. Keller, A. D. Attie,

751 G. A. Churchill, P. Petkov, P. F. Sullivan, J. R. Brennan, L. McMillan, and F. P.-M. d. Villena, 2016 The mouse

752 universal genotyping array: From substrains to subspecies. G3 6: 263–279.

753 Morgan, A. P. and F. Pardo-Manuel de Villena, 2017 Sequence and structural diversity of mouse Y chromo-

754 somes. Mol. Biol. Evol. 34: 3186–3204.

755 Neme, R. and D. Tautz, 2016 Fast turnover of genome transcription across evolutionary time exposes entire

756 non-coding DNA to de novo gene emergence. Elife 5: e09977.

757 Ohno, S., 1966 Sex Chromosomes and Sex-Linked Genes. Springer-Verlag, Berlin.

758 Page, J., S. Berríos, M. T. Parra, A. Viera, J. A. Suja, I. Prieto, J. L. Barbero, J. S. Rufas, and R. Fernández-Donoso,

759 2005 The program of sex chromosome pairing in meiosis is highly conserved across marsupial species: im-

760 plications for sex chromosome evolution. Genetics 170: 793–799.

761 Palmer, S., J. Perry, D. Kipling, and A. Ashworth, 1997 A gene spans the pseudoautosomal boundary in mice.

762 Proc. Natl. Acad. Sci. U. S. A. 94: 12030–12035.

34 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

763 Pardo-Manuel de Villena, F. and C. Sapienza, 1996 Genetic mapping of DXYMov15-associated sequences in the

764 pseudoautosomal region of the C57BL/6J strain. Mamm. Genome 7: 237–239.

765 Perry, J. and A. Ashworth, 1999 Evolutionary rate of a gene affected by chromosomal position. Curr. Biol. 9:

766 987–989.

767 Perry, J., S. Palmer, A. Gabriel, and A. Ashworth, 2001 A short pseudoautosomal region in laboratory mice.

768 Genome Res. 11: 1826–1832.

769 Pertea, M., G. M. Pertea, C. M. Antonescu, T.-C. Chang, J. T. Mendell, and S. L. Salzberg, 2015 StringTie enables

770 improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33: 290–295.

771 Poriswanish, N., R. Neumann, J. H. Wetton, J. Wagstaff, M. H. D. Larmuseau, M. A. Jobling, and C. A. May, 2018

772 Recombination hotspots in an extended human pseudoautosomal domain predicted from double-strand

773 break maps and characterized by sperm-based crossover analysis. PLoS Genet. 14: e1007680.

774 Raudsepp, T. and B. P. Chowdhary, 2015 The eutherian pseudoautosomal region. CGR 147: 81–94.

775 Ritchie, M. E., B. Phipson, D. Wu, Y. Hu, C. W. Law, W. Shi, and G. K. Smyth, 2015 limma powers differential

776 expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43: e47.

777 Robinson, J. T., H. Thorvaldsdóttir, W. Winckler, M. Guttman, E. S. Lander, G. Getz, and J. P. Mesirov, 2011

778 Integrative genomics viewer. Nat. Biotechnol. 29: 24–26.

779 Rouyer, F., M. C. Simmler, C. Johnsson, G. Vergnaud, H. J. Cooke, and J. Weissenbach, 1986 A gradient of sex

780 linkage in the pseudoautosomal region of the human sex chromosomes. Nature 319: 291–295.

781 Sarver, B. A. J., S. Keeble, T. Cosart, P. K. Tucker, M. D. Dean, and J. M. Good, 2017 Phylogenomic insights into

782 mouse evolution using a pseudoreference approach. Genome Biol. Evol. 9: 726–739.

783 Schiebel, K., J. Meder, A. Rump, A. Rosenthal, M. Winkelmann, C. Fischer, T. Bonk, A. Humeny, and G. Rappold,

784 2000 Elevated DNA sequence diversity in the genomic region of the phosphatase PPP2R3L gene in the human

785 pseudoautosomal region. Cytogenet. Cell Genet. 91: 224–230.

786 Sharman, G. B., A. J. McINTOSH, and H. N. Barber, 1950 Multiple sex chromosomes in the marsupials. Nature

787 166: 996.

788 Sharp, P., 1982 Sex chromosome pairing during male meiosis in marsupials. Chromosoma 86: 27–47.

789 Smagulova, F., K. Brick, Y. Pu, R. D. Camerini-Otero, and G. V. Petukhova, 2016 The evolutionary turnover of

790 recombination hot spots contributes to speciation in mice. Genes Dev. 30: 266–280.

35 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

791 Soh, Y. Q. S., J. Alföldi, T. Pyntikova, L. G. Brown, T. Graves, P. J. Minx, R. S. Fulton, C. Kremitzki, N. Koutseva,

792 J. L. Mueller, S. Rozen, J. F. Hughes, E. Owens, J. E. Womack, W. J. Murphy, Q. Cao, P. de Jong, W. C. Warren,

793 R. K. Wilson, H. Skaletsky, and D. C. Page, 2014 Sequencing the mouse Y chromosome reveals convergent

794 gene acquisition and amplification on both sex chromosomes. Cell 159: 800–813.

795 Soneson, C., M. I. Love, and M. D. Robinson, 2015 Differential analyses for RNA-seq: transcript-level estimates

796 improve gene-level inferences. F1000Res. 4: 1521.

797 Soriano, P., E. A. Keitges, D. F. Schorderet, K. Harbers, S. M. Gartler, and R. Jaenisch, 1987 High rate of recombi-

798 nation and double crossovers in the mouse pseudoautosomal region during male meiosis. Proc. Natl. Acad.

799 Sci. U. S. A. 84: 7218–7220.

800 Srivastava, A., A. P. Morgan, M. L. Najarian, V. K. Sarsani, J. S. Sigmon, J. R. Shorter, A. Kashfeen, R. C. McMul-

801 lan, L. H. Williams, P. Giusti-Rodríguez, M. T. Ferris, P. Sullivan, P. Hock, D. R. Miller, T. A. Bell, L. McMillan,

802 G. A. Churchill, and F. P.-M. d. Villena, 2017 Genomes of the mouse collaborative cross. Genetics 206: 537–556.

803 Steemers, F. J., W. Chang, G. Lee, D. L. Barker, R. Shen, and K. L. Gunderson, 2006 Whole-genome genotyping

804 with the single-base extension assay. Nat. Methods 3: 31–33.

805 Subramanian, S., K. Tus, Q.-Z. Li, A. Wang, X.-H. Tian, J. Zhou, C. Liang, G. Bartov, L. D. McDaniel, X. J. Zhou,

806 R. A. Schultz, and E. K. Wakeland, 2006 A tlr7 translocation accelerates systemic autoimmunity in murine

807 lupus. Proc. Natl. Acad. Sci. U. S. A. 103: 9970–9975.

808 Thybert, D., M. Roller, F. C. P. Navarro, I. Fiddes, I. Streeter, C. Feig, D. Martin-Galvez, M. Kolmogorov,

809 V. Janoušek, W. Akanni, B. Aken, S. Aldridge, V. Chakrapani, W. Chow, L. Clarke, C. Cummins, A. Doran,

810 M. Dunn, L. Goodstadt, K. Howe, M. Howell, A.-A. Josselin, R. C. Karn, C. M. Laukaitis, L. Jingtao, F. Mar-

811 tin, M. Muffato, S. Nachtweide, M. A. Quail, C. Sisu, M. Stanke, K. Stefflova, C. Van Oosterhout, F. Veyrunes,

812 B. Ward, F. Yang, G. Yazdanifar, A. Zadissa, D. J. Adams, A. Brazma, M. Gerstein, B. Paten, S. Pham, T. M.

813 Keane, D. T. Odom, and P. Flicek, 2018 Repeat associated mechanisms of genome evolution and function

814 revealed by the mus caroli and mus pahari genomes. Genome Res. 28: 448–459.

815 Vernet, N., S. K. Mahadevaiah, O. A. Ojarikre, G. Longepied, H. M. Prosser, A. Bradley, M. J. Mitchell, and P. S.

816 Burgoyne, 2011 The y-encoded gene zfy2 acts to remove cells with unpaired chromosomes at the first meiotic

817 metaphase in male mice. Curr. Biol. 21: 787–793.

818 White, M. A., A. Ikeda, and B. A. Payseur, 2012a A pronounced evolutionary shift of the pseudoautosomal

819 region boundary in house mice. Mamm. Genome 23: 454–466.

36 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

820 White, M. A., M. Stubbings, B. L. Dumont, and B. A. Payseur, 2012b Genetics and evolution of hybrid male

821 sterility in house mice. Genetics 191: 917–934.

822 Wolf, K. W., K. Baumgart, and H. Winking, 1988 Meiotic association and segregation of the achiasmatic giant

823 sex chromosomes in the male field vole (microtus agrestis). Chromosoma 97: 124–133.

824 Xu, S., 2013 Principles of Statistical Genomics. Springer, New York.

825 Yang, H., J. R. Wang, J. P. Didion, R. J. Buus, T. A. Bell, C. E. Welsh, F. Bonhomme, A. H.-T. Yu, M. W. Nachman,

826 J. Pialek, P. Tucker, P. Boursot, L. McMillan, G. A. Churchill, and F. P.-M. de Villena, 2011 Subspecific origin

827 and haplotype diversity in the laboratory mouse. Nat. Genet. 43: 648–655.

37 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

828 Supplementary material

829 Tables and data files

830 File S1. Genetic map in the extended pseudoautosomal region.

831 File S2. Informative markers near the pseudoautosomal boundary on the 77K and 11K platforms.

832 File S3. Sequences of de novo assembled contigs spanning breakpoints of the PWK/PhJ Y-linked duplication

833 (FASTA format).

834 File S4. List of N2 progeny used in this study.

835 File S5. Array genotypes and hybridization intensities from 11K and 77K platforms for N2 progeny and control

836 inbred strains (ZIP archive).

837 File S6. List of samples analyzed by whole-genome sequencing (including previously published) in this study.

838

839 File S7. Alignments of a breakpoint-spanning contig (“region_3”) to M. spretus, M. caroli and M. pahari assem-

840 blies (UCSC PSL format).

841 File S8. Genotype calls from whole-genome sequencing in the region of the distal X chromosome correspond-

842 ing to the PWK/PhJ Y-linked duplication (VCF format).

843 File S9. Breakpoints of copy-number alleles near the PAR boundary in wild mice.

38 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

844 Figures

B6 x CAST CAST x B6 20

10 H3K4me3

0

−10

−20 parent B 20 F tag density

10 SSDS 0

−10

−20 168 168.5 169 169.5 170 168 168.5 169 169.5 170

position (Mbp)

Figure S1: Accumulation of meiotic recombination precursors on the extended PAR of CAST/EiJ. Raw density of tags recovered from immunoprecipitation of H3K4me3 (top panels) or DMC1-bound single-stranded DNA fragments (bottom panels) in reciprocal hybrid males between C57BL/6J (B6, black) and CAST/EiJ (CAST, green). Reads were assigned to a parental haplotype on the basis of base calls at informative SNVs; signal on the paternal or maternal haplotype is plotted above or below the x-axis, respectively. Dashed lines indicate boundaries of the CAST/EiJ and canonical PARs.

39 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

A 4.00 normalized read depth normalized 2.00

1.00

0.50 Mid1 exons

0.25 169.95 169.96 169.97 position (Mb)

B 1.0 0.9 FR pairs 0.8 0.7 0.6 0.5 169.95 169.96 169.97 position (Mb) C region_3 breakpoints

region_2

region_1

169.95 169.96 169.97 position (Mb)

D LINE repeats SINE

LTR

Simple_repeat

169.95 169.96 169.97 position (Mb)

Figure S2: Detailed view of the Y-linked duplication identified in PWK/PhJ. (A) Copy number in the distal X chromosome, estimated from read depth in whole-genome sequencing in 200 bp bins and averaged over 4 individuals carrying a PWK/PhJ Y chromosome. Duplicated segments identified by red bars. Red dashed line indicates canonical PAR boundary. Copy number in the canonical PAR is not shown because of noisy read alignment to repetitive sequence. (B) Proportion of read pairs aligning in the expected orientation (FR), binned and average as above. (C) Alignments of de novo assembled contigs spanning duplication breakpoints. Segments of gapped alignments are connected with red bars. (D) Selected classes of repetitive elements annotated in the mm10 reference sequence.

40 bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

PAE_041_HG_Y_B2 PAE_041_HG_Y_B3 PAE_033_HG_E_518 8

4

2 depth

1

0.5

normalized read depth 1 FR

0.5 169.9 169.92 169.94 169.96 169.98169.9 169.92 169.94 169.96 169.98169.9 169.92 169.94 169.96 169.98 position (Mb)

Figure S3: Discordant mate-pair orientation of reads at the boundary of the PWK/PhJ Y-linked duplication. Copy number (black) and proportion of read pairs aligning in the expected orientation (blue) are shown for 3 cases among N2 males: a non-recombinant PAR (PAE_041_HG_Y_B2); a recombinant PAR (PAE_041_HG_Y_B3); and the individual with de novo copy-number loss (PAE_041_HG_E_518). The de novo deletion is accompanied by loss of the discordant read pairs.

41 CC001/Unc CC022/GeniUnc CC032/GeniUnc CC044/Unc CC045/GeniUnc AL16 AL38 AL41 CR16 CR46 The copyright holder for this preprint (which was not was (which preprint this for holder copyright The . 169.96 169.94 42 position (Mb) 169.92 CC-BY 4.0 International license International 4.0 CC-BY a 169.9 this version posted April 20, 2019. 2019. 20, April posted version this 1 8 4 2 1 8 4 2 1 8 4 2 1 8 4 2 1 8 4 2 1 8 4 2 1 8 4 2 8 4 2 1 8 4 2 1 8 4 2 1

;

16 16 16 16 16 16 16 16 16 16 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.5 normalized read depth read normalized https://doi.org/10.1101/561951 doi: doi: bioRxiv preprint preprint bioRxiv certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under under available is made It in perpetuity. the preprint to display a license bioRxiv granted has who author/funder, is the review) by peer certified Figure S4: Identifying wild micemosome, sharing estimated the from read PWK/PhJ depth Y-linked duplication. inCross whole-genome strains Copy sequencing with number in a in 200 PWK/PhJ the bpfrom Y distal bins, chromosome the for X (CCxxx), males chro- Czech 2 from Republic wild 5blocks Collaborative males (CRxx). identify from Kazakhstan the Red (ALxx) segment and dashedexpected duplicated 2 orientation line wild in on shows males an PWK/PhJ. arbitrary position Blue scale. of lines the show canonical proportion PAR of boundary. read pairs Dark aligning red in the bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

[provided as separate file]

Figure S5: Normalized read depth on distal X chromosome in mice analyzed by whole-genome sequencing in this study. One individual per page, two panels per individual (one at relative broad scale, chrX:168-170 Mb; the other zoomed-in on PAR boundary.) Page title indicates sample type (wild-caught or wild-derived inbred strain), nominal subspecies of origin, country of origin (as ISO 2-letter code), sample ID and chromosomal sex.

43 M.

spretus spretus spicilegus caroli M., spretus XX XY XY XX 170 10 10 10 10 169.99 9 9 9 9 ancestor. (A) Copy number, 8 8 8 8 7 7 7 7 6 6 6 6 5 5 5 5 Mus 169.98 4 4 4 4 position (Mb) The copyright holder for this preprint (which was not was (which preprint this for holder copyright The 169.97 . 3 3 3 3 169.96

2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 1.0 0.5 8.0 4.0 copy number copy B 44 spretus spretus spicilegus caroli Mid1 XX XY XY XX DXYCbl1. (B) Magnified view of the region just distal to the 170 170 DXYCbl1 CC-BY 4.0 International license International 4.0 CC-BY a this version posted April 20, 2019. 2019. 20, April posted version this ; 169.9 169.9 2 2 2 2 position (Mb) position (Mb) 169.8 169.8 https://doi.org/10.1101/561951 169.7 169.7 doi: doi:

M.. caroli Red dashed line shows nominal PAR boundary in C57BL/6J with relation to exons of 0.5 1.0 0.5 8.0 4.0 2.0 1.0 8.0 4.0 2.0 1.0 0.5 8.0 4.0 2.0 8.0 4.0 2.0 1.0 0.5

16.0 16.0 16.0 16.0 copy number copy A and are numbered in dark blue. bioRxiv preprint preprint bioRxiv certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under under available is made It in perpetuity. the preprint to display a license bioRxiv granted has who author/funder, is the review) by peer certified estimated from whole-genome sequencingspicilegus read depth in 1Mid1. kb Red bins, triangle in showsnominal males position PAR of and boundary. STS females Read marker from Mid1 depth calculated in 100 bp windows. Exons of the longest coding transcript of Figure S6: Defining the ancestral pseudoautosomal boundary in the common bioRxiv preprint doi: https://doi.org/10.1101/561951; this version posted April 20, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

JAX00723372 UNC31596457 2.5 ● ● cross

● FF 2.0 ● ● ● ● ● ● F_ ● ● 1.5 ● _F

y ● __ ●● ● ●● ●● ●●● ● 1.0 ●● ●●● ● ●●●● ●●● ● ●● ● ●● ●● ● sex 0.5 ● ● XX

●●●●●● ●● ● ●● ●●●●●●●●●●●● ●● ●●● ● ●● ● 0.0 ●● ● ● XY 0.0 0.5 1.0 1.5 2.0 2.50.0 0.5 1.0 1.5 2.0 2.5 x

Figure S7: Confirming consistency of genotype calls at two markers in the canonical PAR. Hybridization inten- sities in Cartesian coordinates for 111 inbred and F1 controls are shown, colored by genotype: FF = CASTEi- JxCAST/EiJ; F_ = CAST/EiJ x other; _F = other x CAST/EiJ; __ = other x other. Shapes distinguish males and females.

Illumina auto−call after re−clustering

UNC31596457 UNC31596457

● ● ● ● ● ● ● ● ●●● ● ●●● ● ● ● ● ● ● ●●●● ● ●●●● ●● ●● ●● ●● ●●● ● ●● ●● ● ● ● ● ●●● ● ●● ●●● ● ●● ● ●●● ● ● ● ●● ● ● ● ●●●●●●●● ●● ●●●●●●●● ● ● ●●●●●●●●●●● ● ● ●●●●●●●●●●● ● ● 1.0 ●●●●●● ● 1.0 ●●●●●● ● ●●●●●●●●●●● ● ●●●●●●●●●●●●● ● ●●●●●●●● ● ● ●●●●●● ●● ● ● ●●●●●●●●●●●● ● ● ●●●●●●●●●●●●● ● ● ●●●●●●●●●●● ● ●●●●●●●●●●●● ● ●●●●●●●●●● ● ●●●●●●●●●●●● ● ●●●●●●●●●●● ● ● ●●●●●●●●●●●●● ● ● ●●●●●●●● ● ● ● ●●●●●●●● ● ● ● ●●●●●●●● ●● ● ● ● ●●●●●●●●●● ●● ● ● ● ●●●●●●●●●●● ● ●● ●● ● call ●●●●●●●●●●● ● ●● ●● ● cluster ●●●●●●● ● ● ● ● ●●●●●●●●●● ● ● ● ● ●●●●●●● ● ● ●● ● ●● ●●●●●●● ●● ● ●● ● ● ● ●●●● ●●●●●● ● ● ● ● ●● ●●●●●● ● ● ●●●●● ●●●● ●● ● ●●●●● ●●●● ●● ● ●●●● ●● ● ●●●● ●● ● ●●●● ● ● ● ● ●●●● ●● ● ● ●● ●● ● ● HOM_A1 ● ●● ●● ● ● 1 ●●●●● ● ● ●●●●● ● ● ●● ● ● ● ● ● ●● ● ● ●● ● ● y ● ●● ● y ● ●● ● ●●● ● ● ● ●●● ● ● ● ● ● ● ● ● ● ●●●●● ● HOM_A2 ● ●●●●● ● 2 ● ● ● ●● ● ● ● ● ● ● ● ●● ● ●● ● ● ● ● 0.5 ● ● ● NA 0.5 ● ● ● 3 ● ● ●● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0 0.0 0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8 x x

Figure S8: Revision of genotype calls at array marker UNC31596457. Hybridization intensities in Cartesian coordinates for 111 inbred and F1 controls and 186 N2 progeny are shown, colored by genotype call from Illumina BeadStudio (left) or our Gaussian mixture model (right). Clusters 1 and 3 correspond to non-CAST/EiJ and CAST/EiJ homozygotes, respectively, with heterozygotes in cluster 2.

45 caroli_v1.1 pahari_v1.1 spretus_v1 40 assemblies of out- 30 de novo 20 10 The copyright holder for this preprint (which was not was (which preprint this for holder copyright The relative position in assembly (kb) position in assembly relative . 0

6 5 4 3 2 1 9 8 7 alignment # alignment B 46 caroli_v1.1 pahari_v1.1 spretus_v1 CC-BY 4.0 International license International 4.0 CC-BY a this version posted April 20, 2019. 2019. 20, April posted version this ; 750 500 250 https://doi.org/10.1101/561951 coordinate in contig 'region_3' doi: doi: 0

1 5 4 3 2 7 6 9 8 alignment # alignment A bioRxiv preprint preprint bioRxiv certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under under available is made It in perpetuity. the preprint to display a license bioRxiv granted has who author/funder, is the review) by peer certified group genomes. (A)segments Aligned within segments each relative alignment to arenates region_3. color-coded. are specified (B) Each relative Aligned to gapped segments anA. alignment arbitrary relative offset appears to to outgroup on facilitate genomes. its visual comparison. Coordi- own Segments line; are colored as in panel Figure S9: Alignment of breakpoint contig containing novel sequence (“region_3”) to