G3: Genes|Genomes|Genetics Early Online, published on August 4, 2020 as doi:10.1534/g3.120.401573

Final version, 30 July 2020 1

1 Locating the sex determining region of linkage group 12 of guppy (Poecilia reticulata)

2 Deborah Charlesworth (ORCID iD 0000-0002-3939-9122), Roberta Bergero, Chay Graham 1, Jim 3 Gardner, Lengxob Yong 2

4 Institute of Evolutionary Biology, School of Biological Sciences, 5 , West Mains Road, EH9 3LF, U.K.

6 1 , Department of Biochemistry, Sanger Building, 80 Tennis Ct Rd, 7 Cambridge CB2 1GA, UK

8 2 Centre for Ecology and Conservation, University of Exeter, Penryn, Falmouth, Cornwall, 9 TR10 9FE, UK. 10 11 Keywords 12 Duplication, linkage disequilibrium, genome assembly, partial sex linkage 13 14 Abstract 15 Despite over 100 years of study, the location of the fully sex-linked region of the guppy 16 (Poecilia reticulata) carrying the male-determining locus, and the regions where the XY pair 17 recombine, remain unclear. Previous population genomics studies to determine these 18 regions used small samples from recently bottlenecked captive populations, which increase 19 the false positive rate of associations between individuals’ sexes and SNPs. Using new data 20 from multiple natural populations, we show that a recently proposed candidate for this 21 species’ male-determining gene is probably not completely sex-linked, leaving the maleness 22 factor still unidentified. Variants in the chromosome 12 region carrying the candidate gene 23 sometimes show linkage disequilibrium with the sex-determining factor, but no consistently 24 male-specific variant has yet been found. Our genetic mapping with molecular markers 25 spread across chromosome 12 confirms that this is the guppy XY pair. We describe two 26 families with recombinants between the X and Y chromosomes, which confirm that the 27 male-determining locus is in the region identified by all previous studies, near the terminal 28 pseudo-autosomal region (PAR), which crosses over at a very high rate in males. We correct 29 the PAR marker order, and assign two unplaced scaffolds to the PAR. We also detect a

© The Author(s) 2020. Published by of America. Final version, 30 July 2020 2

30 duplication, with one copy in the male-determining region, explaining signals of sex linkage 31 in a more proximal region.

32 33 Introduction

34 The sex chromosome pair of the guppy have been studied for many decades with the aim of 35 understanding the of recombination suppression between Y and X chromosomes. 36 The earliest genetic studies in the guppy discovered sex linkage of male coloration traits that 37 are polymorphic in guppy natural populations (Winge 1922b; Winge 1922a), and numerous 38 such traits in natural and captive populations have been shown to be transmitted from 39 fathers to sons in this fish, with only a few major coloration factors being autosomal 40 (Haskins and Haskins 1951; Haskins et al. 1961; Lindholm and Breden 2002). Coloration 41 factors show complete or (more rarely) partial sex-linkage disproportionately often, given 42 that this fish has 23 chromosome pairs (Nanda et al. 2014), and the sex chromosome pair 43 represents about 3.8% of the genome (Künstner et al. 2017). This observation strongly 44 suggests that these factors are sexually antagonistic (SA), based on the following reasoning 45 (Bull 1983). SA mutations can occur in autosomal and sex-linked genes, creating conflicts 46 between the sexes. If the conflict is resolved by evolving sex-specific expression, any 47 selection favoring sex linkage is abolished. The observation that guppy male-coloration 48 genes are frequently sex-linked therefore suggests that selection formerly either favored 49 the evolution of sex linkage, or, more likely, that sex linkage favors establishment of 50 polymorphisms for SA mutations. SA effects of male coloration factors are supported by 51 evidence from studies in natural populations of Trinidadian guppies (including Haskins et al. 52 1961; Gordon et al. 2012). It has therefore been thought likely that SA polymorphisms may 53 have been involved in selecting for suppressed recombination between the two members of 54 this chromosome pair (reviewed in Wright et al. 2017; Bergero et al. 2019).

55 Cytogenetic and genetic studies have consistently demonstrated strong genetic sex- 56 determination in all Trinidadian guppy (P. reticulata) samples studied, from various source 57 rivers and domesticated strains (Tripathi et al. 2009; Nanda et al. 2014; Künstner et al. 2017; 58 Wright et al. 2017; Bergero et al. 2019; Dor et al. 2019; Winge 1922a; Winge 1922b) 59 including natural population males (Haskins et al. 1961). Chromosome 12 was later 60 identified as the XY pair (Tripathi et al. 2009; Nanda et al. 2014; Künstner et al. 2017; Wright Final version, 30 July 2020 3

61 et al. 2017; Bergero et al. 2019; Dor et al. 2019), and the same sex chromosome is found in 62 the close relative, the Venezuelan guppy, P. wingei (Nanda et al. 2014; Darolti et al. 2020). 63 XX male or XY female individuals are very rare in guppies (Winge and Ditlevsen 1947). The 64 situation therefore differs from that in the zebrafish, where a genetic sex determinant is 65 detected in wild samples, but not domesticated strains (Howe et al. 2013; Bellott et al. 66 2014).

67 Figure 1 shows a schematic diagram indicating, in general terms, the regions where 68 crossovers are observed on the guppy XY pair. This is an acrocentric chromosome 69 pair, and shows little heteromorphism (Winge 1922b), though differences between the Y 70 and X have been detected between P. wingei strains, and the one captive Trinidadian guppy 71 sample studied differed from P. wingei (Nanda et al. 2014). A sex-linked region occupying 72 considerably less than half of the sex chromosome pair has been identified, based on male- 73 specific heterochromatin and staining by male-specific probes in FISH experiments, and the 74 Y sometimes appears longer than the X (Nanda et al. 1990; Traut and Winking 2001; 75 Lisachov et al. 2015). Using MLH1 focus detection on chromosomes in sperm cells Lisachov 76 et al. (2015) found localization of crossovers at the termini most distant from the 77 centromeres of the guppy XY pair and the autosomes. The chromosomes are acrocentric, 78 and the mean number of MLH1 foci in a testis cell was 23.2 ± 0.5, representing a single 79 crossover on each chromosome on average (Lisachov et al. 2015). Genetic mapping with 80 molecular markers also suggested that crossovers on the sex chromosome pair in male 81 meiosis occur mainly in the terminal region, and rarely nearer the centromere (Tripathi et 82 al. 2009); the crossover events were not localized physically, but 12 out of 13 events in 83 males occurred in the terminal one-sixth of the chromosome, whereas all 15 events in 84 meiosis of the females were in the 4 centromere-proximal sixths (a significant difference by 85 Fisher’s Exact test, P = 0.0001). Lisachov et al. (2015) also observed crossover events in male 86 meiosis in a terminal part of the XY pair just centromere-proximal to the male-specific LG12 87 region mentioned above, which probably includes the sex-determining locus (Figure 1; the 88 size of this region is not yet certain, as explained below). It is also not known whether the 89 changes between regions with different recombination rates are gradual or represent sharp 90 transitions, like those at some mammalian PAR boundaries (Marais and Galtier 2003; Van 91 Laere et al. 2008). Final version, 30 July 2020 4

92 93 Figure 1 about here

94 However, although population genomic studies confirm that chromosome 12 is the XY pair, 95 they do not identify an extensive fully sex-linked region (Bergero et al. 2019; Darolti et al. 96 2020). Instead, variants across much of the chromosome show weak associations with the 97 sex-determining region (Wright et al. 2017; Bergero et al. 2019). There is evidence for 98 migration of sequence variants between the X and Y chromosomes in the large centromere- 99 proximal region, strongly suggesting that rare recombination occurs in this region, as well as 100 in the terminal pseudo-autosomal region, labelled PAR in Figure 1, where most crossovers 101 occur (Bergero et al. 2019; Darolti et al. 2020). If recombination occurs in male meiosis in 102 addition to the high rate of crossovers in the terminal PAR, this will prevent fixation of male- 103 specific variants in the recombining regions, and can account for the low differentiation and 104 low divergence between the sexes in the guppy, across most of chromosome 12 observed in 105 both those studies, and also by (Darolti et al. 2020) and in the present study (see below). 106 Figure 1 therefore indicates the possibility of rare crossover events in the region proximal to 107 the sex-determining region.

108 Specifically, the figure indicates a region where (Nanda et al. 1990; Lisachov et al. 2015) 109 detected such events cytologically, although their analysis could not precisely determine its 110 size or location. Clearly, the occurrence of even rare crossovers, in even a limited region 111 proximal to the sex-determining region, allow sequence variants to move between the X 112 and Y chromosomes; differentiation and divergence will thus establish an equilibrium low 113 level, rather than increasing to ever higher levels over evolutionary time. The guppy XY pair 114 therefore differs from XY pairs of many well-studied organisms. In humans, birds and some 115 plants, divergence at synonymous sites (reflecting the number of generations of isolation of 116 the Y from the X) is much higher than the value of about 1-2% estimated in the guppy 117 (Darolti et al. 2020), and clearly distinct divergence levels , called “evolutionary strata”, are 118 detectable (Lahn and Page 1999; Skaletsky et al. 2003; Zhou et al. 2014).). The oldest 119 stratum must carry the sex-determining locus, and, as expected, is found in different 120 mammals (Cortez et al. 2014), birds (Wang et al. 2014) and snakes (Schield et al. 2019; 121 Schield et al. 2020) that share the same sex-determining locus, while the younger strata are 122 inferred to have evolved later. The guppy may have a “stratum” that includes its sex- Final version, 30 July 2020 5

123 determining locus, and this probably corresponds to the male-specific region described 124 above. However, the evidence just described that its XY pair occasionally recombine makes 125 it unlikely that any younger strata have evolved. Even rare X-Y recombination will prevent 126 the establishment of diverged Y-linked strata carrying Y-specific variants; even if crossovers 127 occur only in a restricted region, any diverged Y sequences will recombine in female meiosis 128 after they have crossed over into a female.

129 Further family studies are, however, needed to refine our understanding of the guppy sex- 130 determining region, and to determine whether this is an extensive, multi-gene fully sex- 131 linked region or stratum, or a small region, potentially a single gene like the insertion of the 132 maleness factor in the medaka, Oryzias latipes (Kondo et al. 2009), or even a single SNP, as 133 has been inferred in another fish, the tiger pufferfish or Fugu, Takifugu rubripes (Kamiya et 134 al. 2012). Our study advances understanding of the guppy sex-determining region’s location, 135 and of the regions that recombine between the XY pair.

136 It was recently reported that a candidate for the sex-determining region has been mapped 137 on linkage group 12 of the guppy (Dor et al. 2019). Genotypes of a microsatellite marker 138 named gu1066 within the Stomatin-like 2 (STOML2) gene, were found to be associated with 139 the sexes of fish from two domesticated guppy strains, Red Blonde and Flame. This marker 140 is located at 25,311,467 bp in the published assembly of the guppy LG12 (Künstner et al. 141 2017), in the region where the cytogenetic results just outlined suggest the presence of the 142 sex-determining locus. These authors also identified a candidate for the guppy sex- 143 determining gene within a 0.9 Mbp region near this marker. Of 27 annotated genes in this 144 region, the GADD45G-like gene (at 24,968,682 bp in the LG12 assembly) was identified as a 145 good candidate for the guppy sex-determining gene, or at least for having with a role in 146 male fertility, although the sequence did not include any male-specific alleles and its 147 expression was not sex-biased (Dor et al. 2019).

148 Here, we show that the gu1066 microsatellite does not have male-specific alleles in natural 149 populations of Trinidadian guppies, and that a polymorphic SNP (single nucleotide 150 polymorphism) in the candidate gene has no male-specific allele. We also describe new 151 mapping results using multiple microsatellite and SNP genetic markers that that revealed Y- 152 X recombinants that provide the first direct genetic evidence that the location of the sex- Final version, 30 July 2020 6

153 determining locus is within the region suggested by the previous studies cited above, 154 including Dor et al. (2019). 155 156 Methods 157 Fish samples, DNA extraction

158 Supplementary Table S1 describes the 14 natural populations samples analysed here, and 159 the sources of the fish used for genetic mapping. These were sampled from natural 160 populations in Trinidad in February 2017, and most individuals were photographed as live 161 specimens in the field before being killed and preserved. Live fish from the same natural 162 populations were also transported to the , and maintained at the University 163 of Exeter, Falmouth, for genetic studies (see below).

164 Genomic DNA for microsatellite genotyping, genotyping of intron length variants, and for 165 high-throughput genotyping (see below), was extracted using the Echolution Tissue DNA Kit 166 (BioEcho, Germany). Microsatellite markers ascertained from the guppy complete genome 167 sequence assembly were genotyped as described previously (Bergero et al. 2019). The 168 primers are listed in Supplementary Table S2, along with the primers for the gu1066 169 microsatellite.

170 Genetic mapping and high-throughput genotyping

171 Because guppy females breed best when kept together with other individuals, some of the 172 “families” whose genotypes are reported here were generated from multiple parental 173 individuals, and the parents of five of the six full sibships described below were 174 distinguished using microsatellite markers. The QHPG5 family was generated from two 175 females housed with one male, and the PMLPB2 family from three females and two males; 176 only the LAH family involved a single male and single female.

177 Using sequences from the guppy female whole genome sequence assembly (Künstner et al. 178 2017), we identified genic sequences found in all 16 Trinidadian guppy individuals sampled 179 from a natural population (10 males and 6 females), and ascertained SNPs across LG12 from 180 our own resequencing of these individuals (Bergero et al. 2019). These SNPs were targetted 181 for high-throughput genotyping (SeqSNP) experiments with genomic DNA extracted as 182 described above. The experiments were carried out by LGC Genomics (LGC Genomics Final version, 30 July 2020 7

183 GmbH, Ostendstraße 25, 12459 Berlin, Germany, www.lgcgroup.com/genomics). The 184 targetted SNPs were selected from within coding sequences, with the criterion that about 185 50 bp of sequence flanking each such SNP should also be coding sequence, in order to 186 maximise the chance that the sequence would amplify in diverse populations, and to 187 minimise the representation of SNPs in repetitive sequences. To further avoid repetitive 188 sequences, the SNPs were chosen to avoid ones whose frequencies in the ascertainment 189 sample were 0.5 in both sexes. The SNPs and their locations in the guppy genome assembly 190 are listed in a Supplementary Table S10, together with some features of the mapping 191 results. As expected, the primers worked well for most targeted sequences.

192 Genetic mapping was done as described previously, using microsatellites ascertained from 193 the guppy complete genome sequence assembly, as well as these SNPs. The set of SNPs 194 included one within the candidate gene, GADD45G-like, identified by (Dor et al. 2019); the 195 gene is also named LOC103474023, and starts at position 24,968,682 in the guppy LG12 196 assembly (Künstner et al. 2017).

197

198 Estimation of FST values between the sexes

199 FST values of individual SNPs between males and females were estimated separately for 200 each population. The sites with variants differed between the samples from the different 201 natural population samples listed in Supplementary Table S1. For each site that varied in a

202 population, the FST value was computed following (Hudson et al. 1992), to quantify the 203 proportion of the total diversity in a sample of sequences that is found between the

204 samples of males and females from each of the natural populations. The FST value expected 205 for a fully sex-linked SNP (with all males heterozygous, and all females homozygous for one 206 of these variants) depends on the number of individuals of each sex that were sampled 207 (Bergero et al. 2019; Darolti et al. 2020); with the samples listed in Supplementary Table S1, 208 this value is about 1/3rd. 209 210 Analyses to search for potentially sex-linked genes in guppy unplaced scaffolds

211 In order to try and include the whole of the guppy sex chromosome, we also identified two 212 unplaced scaffolds, NW_007615023.1 and NW_007615031.1, that are located near one end Final version, 30 July 2020 8

213 of the assembly of the Xiphophorus maculatus (platyfish) chromosome, Xm8, which is 214 homologous to the guppy LG12 (Amores et al. 2014). The platyfish is a close relative of the 215 guppy (Pollux et al. 2014), with a mean synonymous site divergence of around 10%; for 216 comparison, this is similar to the value for Drosophila melanogaster and D. simulans 217 (Tamura et al. 2004).

218 To search for such orthologues, we did reciprocal-best hit BLAT searches using as query the 219 coding sequences (cds) of genes assembled on chromosome 8 of the platyfish (the 220 homologue of the guppy sex chromosome pair) as an initial query against all cds found in 221 the guppy. This identified two unplaced scaffolds that include multiple genes (the complete 222 results are shown in the file “Copy 223 of_Unlocalised_Contigs_UNLOCALISED_Scaffolds_LG12_P8_enriched” deposited in Dryad). 224 NW_007615031.1 has a total length of ~325 kb, and is ~ 5-8 kb from the start of the Xm8 225 assembly (which corresponds with the non-centromere end of the guppy LG12 assembly). 226 This is annotated with 32 genes. NW_007615023.1 is a ~490 kb scaffold that is assembled 227 1.4 -1.6 Mb from the Xm8 start, and is annotated with 22 genes and a pseudogene (23 228 genes in Ensembl). We identified and mapped microsatellites within these scaffolds.

229

230 Reagent, software, and data availability

231 Supplementary files are available in figshare, and further files with more detail will be 232 deposited in Dryad. 233 234 Results 235 No complete association with maleness for markers in the GADD45G-like gene region

236 We genotyped the gu1066 microsatellite in three natural high predation populations of 237 Trinidadian guppies. 16 individuals (32 alleles) of each sex were sampled from each 238 population, and a total of 29 different alleles were observed, of which 11 were shared 239 between two or more population samples (Supplementary Tables S3 and S4). None of the 240 alleles showed any association with the sexes of the individuals, and no alleles, other than 241 occasional rare alleles (mostly singletons), were male-specific in any of the populations 242 sampled. No results consistent with complete Y-linkage were seen. For example, allele 264 Final version, 30 July 2020 9

243 was seen only in 4 males from the Guanapo population, and one male from the Yarra 244 population, but it is homozygous in one of the Guanapo males, showing that it can be 245 carried on the X as well as the Y chromosome. Our results from these natural populations 246 show that this region is not strongly associated with the sex-determining locus of the guppy, 247 and suggest that the observed association is most likely due to use of a captive sample.

248 A polymorphic SNP in the candidate gene identified by Dor et al. (2019) is, however, 249 associated with maleness, although no allele is fully male-specific. We genotyped this SNP in 250 14 natural populations, and it was polymorphic in six of them. Table 1 shows the results 251 from these six populations pooled (the full details, including the genotypes, are in 252 Supplementary Table S5). A Fisher's exact test shows that, overall, males are heterozygous 253 significantly more often than females (P = 0.0004). However, only 17% of the males 254 genotyped are heterozygotes (Table 1). Possible interpretations of these findings are in the 255 Discussion section below. 256 257 Table 1 about here 258

259 Analysis of FST values between the sexes in the same natural population samples. 260 14 of the natural populations listed in Supplementary Table S1 were genotyped for LG12 261 SNPs to test for fully sex-linked genotype configurations. Figure 2 shows that, although 262 associations are detected by our analysis, they are not complete. In these natural 263 populations, no region consistently has numerous SNPs showing differentiation between

264 the sexes near the FST value of around 0.3 expected for fully sex-linked variants; 265 furthermore, both heterozygotes and homozygotes are found in both sexes. This includes 266 the apparently fully sex-linked SNPs previously described in a captive population (Bergero et

267 al. 2019). Similarly to our previous results, FST values between the sexes are often highest in 268 the region proximal to 5 Mb in the published female assembly of the chromosome, and, to a 269 lesser extent, across the region distal to 20 Mb, where the sex-determining locus is thought 270 to be located (see Figure 1). Similar associations with the sexes scattered across LG12 were 271 found in genome sequences of samples of 10 fish of each sex from natural populations from 272 the Yarra, Quare and Aripo rivers (Almeida et al. 2020). However, as discussed below, 273 genetic mapping and other results show that the signals of associations with the sex- Final version, 30 July 2020 10

274 determining locus seen in the centromere-proximal region are not evidence that the sex- 275 determining locus is located in a proximal part of LG12. 276 277 Figure 2 about here 278 279 Genetic mapping in families to detect the PAR boundary and X-Y recombination events

280 Both the male and female genetic maps are similar in the different populations, which 281 include all three drainages in Trinidad. Supplementary Figure S1 shows genetic mapping 282 results for the families reported previously (Bergero et al. 2019), some with more markers, 283 plus some other families newly mapped here, whose source populations are listed in 284 Supplementary Table S1. As for the families studied previously, almost all crossovers in male 285 meiosis were within the physically small terminal LG12 region pseudo-autosomal region.

286 Out of 721 individuals of known sex that we have genotyped for genetic mapping in eleven 287 P. reticulata families (Supplementary Table S1), only two recombinants were detected in 288 any LG12 region other than the PAR. One male in family QHPG5 was a recombinant, and 289 one female in family PMLPB2. The dams and sires of both families were sampled alive from 290 natural populations in Trinidad and the progeny were generated in the United Kingdom (see 291 Methods).

292 The PMLPB2 family (from the Petit Marianne river, in the Northern drainage of the Trinidad 293 Northern mountain range, see Willing et al. 2010) varied at seven LG12 microsatellite 294 markers (Supplementary Table S6); out of 69 female and 68 male progeny genotyped, the 295 recombinant female (PMLPB2_f23r) carries the sire’s Y-linked alleles at the 4 more proximal 296 LG12 markers, located from 1.2 to 11.7 Mb in the published assembly, but his X-linked 297 alleles at the 3 distal markers, including a marker at 21.3 Mb. The sex-determining locus 298 cannot therefore be proximal to 11.7 Mb Mb. Unfortunately, no variable marker was found 299 in the sire of this family between 11.7 and 21.3 Mb.

300 The QHPG5 family, however, establishes that the male-determining factor must be distal to 301 the 21.3 Mb marker. This family’s dam and sire were sampled from a natural population in 302 the Quare river, in the Atlantic drainage (Willing et al. 2010), and the genetic mapping 303 results are shown in Supplementary Table S7. The QHPG5 parents and the small number of 304 progeny (in total, from two full sibships with different dams; sibship 1 included 13 progeny, Final version, 30 July 2020 11

305 and sibship 2 had 10 progeny). The genotypes of LG12 microsatellites yielded one 306 recombinant male (male 7, in sibship 2, labelled QHPG5m07r in Supplementary Table S7). 307 This male inherited his father’s X-linked markers for markers in the centromere-proximal 308 part of the chromosome, but Y-linked alleles for markers from about 21 Mb (Figure 3). To 309 define the crossover breakpoint in male 7, the fish in this sibship (and in the QHPG5 sibship 310 1) were also genotyped for SNPs throughout LG12, using high-throughput genotyping (see 311 Methods and Supplementary Table S7, which shows results for all markers in both sibships).

312 Figure 3 about here

313 Before describing these results in more detail, we first discuss the PAR boundary, which 314 must be distal to the sex-determining locus. This boundary was defined in the QHPG5 family 315 based on the most distal gene showing complete co-segregation with sex among all the 316 progeny other than male 7 in both QHPG5 family sibships; this was an SNP at position 317 25,998,942 bp in the LG12 assembly, whose total size is estimated to be 26.44 Mb (Künstner 318 et al. 2017). In sibship 1, 137 LG12 markers informative in male meiosis show complete sex- 319 linkage, while 10 markers from the terminal part of the assembly of this chromosome are 320 partially sex-linked and can definitively be assigned to the small PAR, consistent with 321 previously published results for this family and for seven other guppy families studied 322 (Bergero et al. 2019). A slightly more centromere-proximal boundary, 25,194,513 bp, is 323 found in two other, larger families, the LAH family previously described, from a high- 324 predation site in the Aripo river, with 42 progeny (see Supplementary Table S1 and (Bergero 325 et al. 2019), and family ALP2B2, from an Aripo low-predation site, with 136 progeny, see 326 Supplementary Tables S8 and S9).

327 We next defined the proximal boundary of the crossover event in the QHPG5 family sibship 328 2, which includes the recombinant male 7. This sibship has the same male parent as sibship 329 1, but a different dam. Figure 3 shows results for both types of markers in sibship 2, which 330 yielded evidence confirming the crossover event suggested by the microsatellite marker 331 results described above. Despite sibship 2’s small size, the crossover is confirmed by twenty 332 markers (16 SNPs and 4 microsatellites), although the data also reveal likely assembly 333 errors, which we discuss in the next section.

334 In sibship 2, 141 markers appear to be sex-linked, with all female progeny inheriting the 335 same paternal allele, which is therefore the sire’s X-linked allele, and three males all Final version, 30 July 2020 12

336 inheriting the other, Y-linked, paternal allele. However male number 7 is recombinant. He 337 inherited the paternal X-linked allele for the first 111 markers (with 12 exceptions, discussed 338 below), but the paternal Y-linked allele for 21 markers informative in male meiosis; 14 of 339 these markers are in positions from 21,049,596 to 23,293,233, and seven from 24,829,827 340 to 25,998,942 bp. The 10 terminal markers appear to be partially sex-linked, as in the other 341 families. The male-determining region must therefore be distal to position 21,049,596 in the 342 assembly, and centromere-proximal to the position of the first PAR marker.

343 The two recombinant individuals suggest that 2/721 = 0.28% of meiotic products in males 344 had crossovers outside the pseudo-autosomal, recombining region. This rate is similar, 345 within the limitations of our modest sibship sizes, to the values obtained previously for 346 recombination between the male-determining factor and the Sb coloration factor in a 347 sample of males from the Aripo river; males from a high-predation site yielded a rate of 348 0.09%, versus 1.3% for low-predation site males (Haskins et al. 1961). 349 350 Detection and correction of assembly errors in LG12: the PAR

351 We next discuss information from which we can infer the correct order of some of the 352 markers, and determine which PAR marker is the most proximal of those so far mapped. 353 Crossing-over occurs at a high rate in the physically small guppy PAR (Bergero et al. 2019), 354 allowing us to order the markers by genetic mapping of the region. We also mapped 355 microsatellites within two unplaced scaffolds that are located near one end of the assembly 356 of the homologous chromosome in the closely related fish, X. maculatus (see Methods), 357 revealing that they are part of the guppy PAR. The most proximal PAR marker so far 358 mapped is in the unplaced scaffold, NW_007615023.1, while the NW_007615031.1 scaffold 359 maps terminal to the microsatellite markers mapped previously.

360 We were able to order the microsatellite PAR markers because some of them were mapped 361 in several different families, and showed consistent ordering that differed from the guppy 362 assembly, suggesting that this region includes several assembly errors (Figure 4). These 363 errors make it impossible to relate the genetic to the physical map, and therefore the figure 364 simply orders the markers according to their genetic map positions. Based on the total size 365 in the published assembly, plus the two unplaced scaffolds, totalling about 815 kb, that we 366 inferred from the X. maculatus assembly and mapped to the guppy PAR (see above), our Final version, 30 July 2020 13

367 results do not change the conclusion that the guppy PAR cannot be larger than a couple of 368 megabases (Bergero et al. 2019). Assuming that consistent high intronic high GC content of 369 a region reflects a high recombination rate that causes GC-biased gene conversion (see 370 Charlesworth et al. 2020), the uniformly high GC content (Supplementary Figure S2) of the 371 scaffold that our map reveals to be terminal (Figure 4) suggests that recombination rates 372 are uniformly high throughout the guppy PAR, rather than being clustered into parts of the 373 PAR with very high crossover rates. 374 375 Figure 4 about here 376 377 Correction of assembly errors in the non-PAR part of LG12

378 Genetic mapping cannot order markers in genome regions where crossovers do not occur, 379 or occur very rarely. In guppy male meiosis, markers in the LG12 centromere-proximal 380 region of at least 24 Mb generally co-segregate perfectly with the sex-determining locus 381 (Supplementary Figure S1), making it difficult to map variants within this region. However, if 382 crossovers are infrequent, then errors in the ordering of markers can be detected in rare 383 recombinant individuals, because co-segregation of such markers with others can reveal 384 their true physical locations. Our sibship that includes the recombinant male suggests 385 several discrepancies in the order of sequences within the proximal region of at least 24 Mb 386 of the guppy LG12 that largely co-segregates with the sex-determining locus (see 387 Supplementary Table S7).

388 The region just proximal to the PAR includes two sets of markers in the recombinant male 389 suggesting assembly errors (Figure 3, Table 2 and Supplementary Table S7). We first discuss 390 nine SNP markers informative in the sire of this sibship labelled Discrepancy 1 in Table 2. 391 These are in LG12 assembly positions between 24,166,053 and 24,508,654 bp (a span of 392 342,601 bp, including SNPs in four separate genes mapped in this sibship). However, they 393 clearly appear to belong more proximally than the surrounding markers, as their 394 segregation patterns in the recombinant male are identical to those for 90 of the 93 more 395 proximal informative markers. The only exceptions to this pattern are three SNPs at 396 positions between 2,952,176 and 2,953,254 (Discrepancy 2 in Table 2). These three were 397 included in our genotyping experiment because their genotypes in a set of 10 males and 6 Final version, 30 July 2020 14

398 females whose complete genomes we sequenced (see Methods) suggested complete sex- 399 linkage (Bergero et al. 2019). Their genotypes in the recombinant QHPG5 family male clearly 400 indicate that they are located terminal to the recombination breakpoint that is supported by 401 the 20 markers (16 SNPs and 4 microsatellites mentioned above), where the recombinant 402 male inherited X-linked alleles. They cannot be located more precisely, because markers in 403 this region co-segregate in this small sibship.

404

405 Table 2 about here

406 407 To further understand the gene order differences between the high-throughput genetic map 408 results for the QHPG5 family and the guppy LG12 assembly, we searched the assembly of the 409 homologous platyfish chromosome (chromosome 8) to find the positions of the four genes with the 410 9 SNPs that constitute Discrepancy 1 in Table 2 and Figure 3. This reveals that two of these are 411 within a region that is assigned a slightly more proximal location in the platyfish assembly (near 15 412 Mb). This is near the breakpoint (at 15,847,876 bp) of a small region whose order in the genetic 413 maps of all guppy families with informative markers is inverted relative to the published assembly 414 (see Supplementary Table S10). We conclude that the genetic map arrangement based on the 415 guppy QHPG5 family (with these SNPs more proximal than in the guppy assembly) is probably 416 correct, and that these genes should be re-assigned to a slightly more proximal location. The 417 comparison with the platyfish chromosome 8 assembly also indicates that a set of guppy genes that 418 is assembled in megabase 10-11 correspond to ones that are at the terminus of the platyfish 419 chromosome. The AC162 microsatellite marker was based on sequence in this region, but its 420 genetic map location is around 30 cM in female meiosis in the ALP2B2 family, consistent with a 421 location more distal than 11 Mb; this marker has so far been mapped only in this family.

422 Our families also support the suggestion in Table 2 that the markers in the LG12 region near 423 3 Mb labelled Discrepancy 2 are located more distally. The evidence for this conclusion is 424 that the segregation patterns of SNPs that are heterozygous in female meiosis in the dams 425 of three sibships match those of distal markers in the families: SNPs with similar segregation 426 patterns to those of these three SNPs are all distal to position 20,516,823 bp in the LAH 427 family, to 22,002,472 bp in two informative ALP2-B2 family sibships, and to 21,344,716 bp in 428 the QHPG5 family sibship 1. The segregation pattern observed is illustrated in Table 3, Final version, 30 July 2020 15

429 together with our interpretation that there is a duplication, which can explain why all males 430 appear to be heterozygous. Furthermore, we examined the segregation patterns of these 431 SNPs in female meiosis in three sibships whose female parents are also heterozygous. The 432 results show that the variants genotyped are located among markers that are close to the 433 PAR boundary, very distant from their assembly positions (Supplementary Tables S8 and S9 434 show the LAH family and the two informative ALP2B2 sibships, respectively; the same can 435 be seen in the smaller QHPG5 family sibship1 in Supplementary Table S7).

436

437 Table 3 about here

438 439 Discussion and Conclusions

440 Dor et al. (2019) suggested the “growth arrest and DNA damage inducible gamma-like” (GADD45G- 441 like) as a plausible “candidate gene for its role in male fertility”, despite finding “no sex difference 442 in either the genomic sequence or gene expression”. However, several of their observations are 443 difficult to reconcile with this conclusion. First, they state that “the male-specific allele was 444 identified in only 85% of males, but not in females”, and that their family D suggested a possible 445 environmental effect of elevated temperature leading to sex reversal, or recombination that 446 produces inviable recombinant females. Furthermore, the gu1066 marker identified 97 females as 447 genetically XY, suggesting that sex reversals occur; when mated with normal XY males, these should 448 yield 25% females, but only 19% of the progeny had ovaries, and only four live fingerlings were 449 produced (by one sex reversed female), three XY and one XX. We detected sex-linkage of other 450 SNPs in this gene in the QHPG5 and ALP2B2 families, but this shows only that this gene is on LG12 451 and not physically distant enough from the sex-determining locus to recombine frequently.

452 Our analysis of genotypes in natural guppy populations (where recombination in past 453 generations can distinguish between complete and partial sex-linkage) leaves it uncertain 454 whether or not the gene Dor et al. (2019) propose as the guppy sex-determining candidate 455 gene is fully sex-linked. A variant in this gene is the first that has been found to be 456 associated with the male phenotype in natural population samples large enough to exclude 457 chance associations (our samples mostly included at least 24 fish, or 48 alleles, per 458 population, see Table 1), and the first association detected in multiple guppy populations, as Final version, 30 July 2020 16

459 well as in the strains studied by Dor et al. (2019). We detected the same variant in males 460 from six natural populations, derived from both the Caroni and Atlantic drainages (though 461 the variant was not detected in the Northern drainage, Supplementary Table S5). In all six 462 populations where the variant was present, it was found only in male individuals, and was 463 absent in all females (Table 1). All males with the variant were heterozygotes; this is 464 consistent with sex-linkage, but does not provide strong support, because the variant was 465 rare among males in most populations (Table 1), and its overall mean frequency was only 466 8%. The association is therefore incomplete. We next discuss possible reasons for an 467 incomplete association.

468 A first possibility is that the variant could have arisen in a fully sex-linked gene (including in 469 the sex-determining gene itself) so recently that it is still segregating in some populations, 470 and has not yet spread to the Northern drainage populations. However, a very recent 471 mutation seems unlikely, given that it is found in two of the three river drainages, which are 472 quite isolated, based on evidence of strong differentiation at microsatellite and other 473 putatively neutral markers (e.g. Suk and Neff 2009; Willing et al. 2010; Fraser et al. 2015). 474 This variant has therefore probably been maintained for a considerable evolutionary time.

475 A second possibility is that the variant might be fully sex-linked, but is absent from some 476 males because Y-linked region is organized into several haplotypes, and that the variant is 477 present on one haplotype, but not others. This could also account for the repeated failure to 478 discover fully Y-linked variants in the guppy (see Introduction). None of the variants will 479 then exhibit the genotype configuration expected under sex linkage, but different variable 480 sites will be heterozygous in males with different haplotypes; moreover, each variable site 481 could share the same variant as the X in certain haplotypes, producing male homozygotes, 482 as observed for the variant we genotyped. At the sex-determining gene itself, only Y-and X 483 haplotypes are expected, but multiple haplotypes could be found across a more extensive 484 region, given that Y-linked male coloration polymorphisms are maintained in guppy 485 populations (see Introduction). However, our current data do not support this scenario, as, 486 of 17 other markers in the region from 24.16Mb to 25.32M, including the gu1066 487 microsatellite, none have variants consistently associated with maleness.

488 We also cannot currently exclude a third alternative, that the association reflects a variant 489 in a partially sex-linked gene that recombines rarely with the sex-determining gene. Given Final version, 30 July 2020 17

490 the evidence for incomplete associations of variants with the sexes of individuals recently 491 reported for many parts of the guppy sex chromosome pair (Wright et al. 2017; Bergero et 492 al. 2019; Almeida et al. 2020), this alternative remains plausible, especially as the variant 493 shown in Table 1 is assembled close to the genetically determined PAR boundary. Currently, 494 analyses of the guppy sex-determining gene are hampered by the absence of a genome 495 sequence assembled from a male. If the Y-linked region include sequences that are missing 496 from the female genome, these will be mapped to homologous sequences elsewhere in the 497 genome, incorrectly suggesting sex-linkage in other genome regions. Our genetic analysis 498 indeed detected such a situation, which we named Discrepancy 2. However, the guppy Y- 499 linked region need not carry genes that are missing from the X chromosome, as other 500 explanations are possible for the discrepancy we detected between the locations (near 3 501 Mb) of three sites whose variants had sex-linked genotype configurations in one sample of 502 the two sexes, and the location where they map genetically (near 20 Mb).

503 Dor et al. (2019) concluded that the guppy sex-determining region is a region of 1.26 Mb 504 that is duplicated between LG 9 and 12 and includes 59 LG12 genes, 17 of which had 505 multiple copies; 8 of these had copies on LG 9 as well as 12, while 9 had multiple copies on 506 LG12. The duplicated region occupies 0.43 Mb at about 25 Mb on LG12, and 6 of the 11 507 genes in the region have multiple copies. We too found a marker that had been assigned to 508 LG9 (the microsatellite GT443) but behaves as fully sex-linked in the male parent of one of 509 our mapping families, LAH (Bergero et al. 2019). Its LG9 location is 8.0 Mb, not close to the 510 LG9 region at 17 Mb identified by Dor et al. (2019). Thus the LG9 copy could represent mis- 511 annotation rather than the duplication Dor et al. (2019) identified. None of our LG9 markers 512 that are informative in male meiosis show sex-linkage in our families (data not shown).

513 Genes that have moved onto the guppy LG12 since the divergence of the guppy from the platyfish 514 should be found on platyfish chromosomes other than Xm8, the homologue of the guppy LG12. We 515 therefore also examined genes assembled on the platyfish chromosome homologous to the guppy 516 LG9 (see Methods). These genes are mostly carried on Xm12, with a few on chromosome 11 517 (Supplementary Figure S3A). We inspected the plots for these chromosomes, to determine which 518 guppy chromosomes carry them, and whether some of them are LG12 genes in the guppy. Some 519 platyfish chromosome 11 genes are detected on guppy LG9, but most of them are on LG14 Final version, 30 July 2020 18

520 (Supplementary Figure S3C). Platyfish chromosome 12 genes are therefore most relevant for asking 521 whether they form a duplicated region on guppy LG12.

522 Xm12 genes are (as expected) mostly found on guppy LG9, with a few on unplaced scaffolds, but 523 some are indeed detected on LG12 (Supplementary Figure S2). Six genes, at 6.7, 8.7, 17.0, 18.0, 524 18.1, and 19.2 Mb of LG12 in the female guppy assembly, are found within a ~500kb region 525 between 1 to 1.5 Mb of Xm12, intermingled with twenty genes that map to a P. reticulata unplaced 526 scaffold, NW_007615028.1 (Supplementary Figure S5). Of the 10 other genes in 527 this unplaced scaffold, 8 have no reciprocal best hits, but two are on Xm8 (as expected for LG12 528 genes; these are at 6.8 Mb and 19.3 Mb of Xm8). This unplaced scaffold may therefore be part of 529 the guppy chromosome 12. If so, the 26 Xm12 genes may have moved to LG12 from the 530 ancestral LG9/Xm12 chromosome in a single event, followed by rearrangements that led the six 531 genes in the current LG12 assembly to be in two distinct locations (Supplementary Figure S5). There 532 is no compelling correspondence with the region identified by Dor et al. (2019). The gene order in 533 the centromere-proximal 20 Mb of the 26.4 Mb guppy LG12 is very similar to that in the 534 homologous platyfish chromosome (Xm8, see Supplementary Figure S4). The large inverted region 535 is an assembly error, as the guppy genetic map supports a gene order in agreement with the 536 platyfish assembly (Supplementary Figure S2, unpublished results of B. Fraser et al. 2020, and 537 Supplementary Fig 1 of Darolti et al. (2020). Therefore, the location for this set of genes must be 538 distal to the repeat region just after 20 Mb that is prominently visible in Supplementary Figure S4. 539 The figure also shows that the LG12 assembly terminal to this region is either rearranged, relative 540 to the platyfish homologue, or, in parts, remains uncertain.

541 Overall, however, the results described here suggest that, even if linkage is partial, the region of the 542 region identified by Dor et al. (2019) may be very closely linked to the sex-determining locus. Our 543 new results support all previous results, including those from the captive material studied by Dor et 544 al. (2019), suggesting that the guppy male-determining factor is located within a region between 545 about 21 and 25 Mb (in the current assembly). We describe the first genetic data based on 546 localising X-Y crossovers within families. The recombinant male individual indicates that the male- 547 determining factor is indeed located distal to 21 Mb. The recombinant female indicates a location 548 distal to 11.7 Mb, but we lack markers to define the location precisely. Moreover, the 11.7 Mb 549 region in the female assembly is near a breakpoint of the inverted region, whose other breakpoint 550 is near 20 Mb (see above). Testing whether this marker’s actual location is more distal will require Final version, 30 July 2020 19

551 an improved assembly. Our results also define the PAR boundary at about 25 Mb (though assembly 552 errors make it difficult to define its boundary or size precisely). Given these problems, and the 553 rarity of recombinant progeny, it remains difficult to define the male-determining locus more 554 accurately. If a more reliable assembly can be generated by long-read sequencing, the rarity of 555 recombination events may still make it impossible to define the locus genetically, though this 556 approach should narrow the region down enough to suggest candidate genes for further testing. 557 558 Acknowledgements: This project was supported by ERC grant number 695225 (GUPPYSEX). 559 We thank Bonnie Fraser (University of Exeter) for discussions about the large inversion on 560 the guppy LG12.

561

562 References 563 Almeida, P., B. A. Sandkam, J. Morris, I. Darolti, F. Breden et al., 2020 Divergence and remarkable 564 diversity of the y chromosome in guppies. BioRxiv 10.1101/2020.07.13.200196. 565 Amores, A., J. Catchen, I. Nanda, W. Warren, R. Walter et al., 2014 A RAD-Tag genetic map 566 for the Platyfish (Xiphophorus maculatus) reveals mechanisms of karyotype 567 evolution among Teleost fish. Genetics 197: 625-641. 568 Bellott, D., J. F. Hughes, H. Skaletsky, L. G. Brown, T. Pyntikova et al., 2014 Mammalian Y 569 chromosomes retain widely expressed dosage-sensitive regulators Nature 508: 494– 570 499. 571 Bergero, R., J. Gardner, B. Bader, L. Yong and D. Charlesworth, 2019 Exaggerated 572 heterochiasmy in a fish with sex-linked male coloration polymorphisms. Proceedings 573 of the National Academy of Sciences of the United States of America 116: 6924- 574 6931. 575 Bull, J. J., 1983 Evolution of Sex Determining Mechanisms. Benjamin/Cummings, Menlo Park, 576 CA. 577 Charlesworth, D., Y. Zhang, R. Bergero, C. Graham, J. Gardner et al., 2020. Using GC content to 578 compare recombination patterns on the sex chromosomes and autosomes of the guppy, 579 Poecilia reticulata, and its close outgroup species. Molecular Biology and Evolution. doi: 580 10.1093/molbev/msaa187/5875062. 581 Cortez, D., R. Marin, D. Toledo-Flores, L. Froidevaux, A. Liechti et al., 2014 Origins and 582 functional evolution of Y chromosomes across mammals. Nature 508: 488–493. 583 Darolti, I., A. Wright and J. Mank, 2020 Guppy Y chromosome integrity maintained by 584 incomplete recombination suppression. Genome Biology and Evolution 12: 965–977. 585 Dor, L., A. Shirak, Y. Kohn, T. Gur, J. Welle et al., 2019 Mapping of the sex determining 586 region on linkage group 12 of guppy (Poecilia reticulata). G3 (Bethesda) 9: 3867- 587 3875. 588 Fraser, B., A. Künstner, D. Reznick, C. Dreyer and D. Weigel, 2015 Population genomics of 589 natural and experimental populations of guppies (Poecilia reticulata). Molecular 590 Ecology 24: 389–408. Final version, 30 July 2020 20

591 Gordon, S. P., A. López-Sepulcre and D. N. Reznick, 2012 Predation-associated differences in 592 sex-linkage of wild guppy coloration. Evolution 66: 912-918. 593 Haskins, C., and E. F. Haskins, 1951 The inheritance of certain color patterns in wild 594 populations of Lebistes reticulatus inTrinidad. Evolution 5: 216-225. 595 Haskins, C., E. F. Haskins, J. McLaughlin and R. E. Hewitt, 1961 Polymorphisms and 596 population structure in Lebistes reticulatus, an ecological study, pp. 320-395 in 597 Vertebrate Speciation, edited by W. F. Blair. University of Texas Press, Austin, TX. 598 Howe, K., M. D. Clark, C. F. Torroja, J. Torrance, C. Berthelot et al., 2013 The zebrafish 599 reference genome sequence and its relationship to the human genome. Nature 496: 600 498–503. 601 Hudson, R. R., D. D. Boos and N. L. Kaplan, 1992 A statistical test for detecting geographic 602 subdivision. Molecular Biology and Evolution 9: 138-151. 603 Kamiya, T., W. Kai, S. Tasumi, A. Oka, T. Matsunaga et al., 2012 A trans-species missense 604 SNP in amhr2 is associated with sex determination in the tiger pufferfish, Takifugu 605 rubripes (Fugu). PLoS Genetics 8: e1002798. 606 Kondo, M., I. Nanda, M. Schmid and M. Schartl, 2009 Sex determination and sex 607 chromosome evolution: insights from Medaka. Sexual Development 3: 88-98. 608 Künstner, A., M. Hoffmann, B. A. Fraser, V. A. Kottler, E. Sharma et al., 2017 The genome of 609 the Trinidadian guppy, Poecilia reticulata, and variation in the Guanapo population. 610 PLOS ONE 11: e0169087. 611 Lahn, B. T., and D. C. Page, 1999 Four evolutionary strata on the human X chromosome. 612 Science 286: 964-967. 613 Lindholm, A., and F. Breden, 2002 Sex chromosomes and sexual selection in Poeciliid fishes. 614 American Naturalist 160: S214-S224. 615 Lisachov, A., K. Zadesenets, N. Rubtsov and P. Borodin, 2015 Sex chromosome synapsis and 616 recombination in male guppies. Zebrafish 12: 174-180. 617 Marais, G., and N. Galtier, 2003 Sex chromosomes: how X-Y recombination stops. Curr. Biol. 618 13: R641-643. 619 Nanda, I., W. Feichtinger, M. Schmid, J. Schröder, H. Zischler et al., 1990 Simple repetitive 620 sequences are associated with differentiation of the sex-chromosomes in the guppy 621 fish. Journal of Molecular Evolution 30: 456-462. 622 Nanda, I., S. Schories, N. Tripathi, C. Dreyer, T. Haaf et al., 2014 Sex chromosome 623 polymorphism in guppies. Chromosoma 123: 373-383. 624 Pollux, B. J. A., R. W. Meredith, M. S. Springer, T. Garland and D. N. Reznick, 2014 The 625 evolution of the placenta drives a shift in sexual selection in livebearing fish. Nature 626 513: 233–236. 627 Schield, D. R., D. C. Card, N. R. Hales, B. W. Perry, G. M. Pasquesi et al., 2019 The origins and 628 evolution of chromosomes, dosage compensation, and mechanisms underlying 629 venom regulation in snakes. Genome Research 29: 590-601. 630 Schield, D. R., G. M. Pasquesi, B. Perry, R. H. Adams, Z. L. Nikolakis et al., 2020 Snake 631 recombination landscapes are concentrated in functional regions despite PRDM9. 632 Molecular Biology and Evolution 37: 1272–1294. 633 Skaletsky, H., T. Kuroda-Kawaguchi, P. J. Minx, H. S. Cordum, L. Hillier et al., 2003 The male- 634 specific region of the human Y chromosome is a mosaic of discrete sequence classes. 635 Nature 423: 825 - 837. 636 Suk, H., and B. D. Neff, 2009 Microsatellite genetic differentiation among populations of the 637 Trinidadian guppy. Heredity 102: 425–434. Final version, 30 July 2020 21

638 Tamura, M., S. Subramanian and S. Kumar, 2004 Temporal patterns of fruit fly (Drosophila) 639 evolution revealed by mutation clocks. Molecular Biology and Evolution 21: 36-44. 640 Traut, W., and H. Winking, 2001 Meiotic chromosomes and stages of sex chromosome 641 evolution in fish: zebrafish, platyfish and guppy. Chromosome Research 9: 659-672. 642 Tripathi, N., M. Hoffmann, D. Weigel and C. Dreyer, 2009 Linkage analysis reveals the 643 independent origin of Poeciliid sex chromosomes and a case of atypical sex 644 inheritance in the guppy (Poecilia reticulata). Genetics 182: 365–374. 645 Van Laere, A.-S., W. Coppieters and M. Georges, 2008 Characterization of the bovine 646 pseudoautosomal boundary: Documenting the evolutionary history of mammalian 647 sex chromosomes. Genome Research 18: 1884-1895. 648 Wang, Z., J. Zhang, W. Yang, N. An, P. Zhang et al., 2014 Temporal genomic evolution of bird 649 sex chromosomes. BMC Evolutionary Biology 14: 250. 650 Willing, E.-M., P. Bentzen, C. v. Oosterhout, M. Hoffmann, J. Cable et al., 2010 Genome-wide 651 single nucleotide polymorphisms reveal population history and adaptive divergence 652 in wild guppies. Molecular Ecology 19: 968–984. 653 Winge, O., 1922a One-sided masculine and sex-linked inheritance in Lebistes reticulatus. 654 Journal of Genetics 12: 145-162. 655 Winge, O., 1922b A peculiar mode of inheritance and its cytological explanation. Journal of 656 Genetics 12: 137-144. 657 Winge, O., and E. Ditlevsen, 1947 Colour inheritance and sex determination in Lebistes. 658 Heredity 1: 65-83. 659 Wright, A., I. Darolti, N. Bloch, V. Oostra, B. Sandkam et al., 2017 Convergent recombination 660 suppression suggests a role of sexual conflict in guppy sex chromosome formation 661 Nature Communications 8: 14251. 662 Zhou, Q., J. Zhang, D. Bachtrog, N. An, Q. Huang et al., 2014 Complex evolutionary 663 trajectories of sex chromosomes across bird taxa. Science 346: 1332. 664 665 Final version, 30 July 2020 22

666 Table 1. Genotypes at an SNP at position 24,969,110 bp in the female assembly, within the 667 candidate sex-determining gene, GADD45G-like (or LOC103474023) in two high-throughput 668 genotyping experiments (the experiment number is indicated in column 1). The river from 669 which each population sample was derived is given, and HP and LP denote high- and low- 670 predation sites, respectively. Part A shows the results for each of the 14 populations 671 sampled (see Supplementary Table S5 for more details), and part B summarizes the overall 672 association of the variant with the male phenotype. 673

A: Genotypes observed in the samples

Numbers of individuals

Experiment Males Population Homozygous number heterozygous Males Females females (variant for the not present, %) variant 1 Guanapo LP (GLZ) 14 10 3 100 1 Guanapo HP 14 10 No variation 1 Marianne HP 17 7 No variation 1 Marianne LP 14 10 No variation 1 Yarra HP 14 10 No variation 1 Yarra LP 14 10 No variation 2 Aripo HP, AH 14 10 1 100 2 Aripo LP, AL 14 10 3 100 2 Quare LP 16 10 No variation 2 Quare HP 16 10 1 100 2 Turure LP 16 10 6 10 2 Turure HP 16 10 1 100 2 Paria LP 8 16 No variation 2 Guanapo LP (GLT) 5 5 No variation

B: Totals across the six populations with variants present Heterozygous Homozygous Total in males 15 75 Total in females 0 60 Final version, 30 July 2020 23

674 Final version, 30 July 2020 24

675 Table 2. Summary of genotypes in the QHPG5 sibship 2, which includes a recombinant male, 676 for markers that are informative in male meiosis. The two discrepancies are discussed in the 677 text, and the complete genotype information for these markers in both QHPG5 sibships is in 678 Supplementary Table S7. The table shows only markers that co-segregated with the 679 phenotypic sex in all progeny other than male 07, and in all other families where the 680 genotypes were informative in male meiosis. When the recombinant male inherited his 681 sire’s Y-linked allele, his genotype is listed as male in the second column, and it is listed as 682 female when he inherited his sire’s X-linked allele. 683 Genotype Number Marker positions in Description of marker results of male of guppy assembly (bp) 07 markers Centromere-proximal markers Female 5 458,883 – 2,756,939 SNPs that map more distally (true Male 3 2,952,176 – 2,953,254 location more terminal, Discrepancy 2) All markers until XY markers except in male 07 Female 75 20,599,360 21,049,596 – XY markers except in male 07 Male 10 23,293,233 Markers suggesting mis-assembly (true 24,166,053 – Female 9 location more proximal, Discrepancy 1) 24,508,654 XY markers in all progeny, including 24,829,827 to PAR Male 6 male 07 boundary 684 Final version, 30 July 2020 25

685 Table 3. Example from the LAH family of segregation results found for all three SNPs named 686 “Discrepancy 2” (see text and Table 2). The LAH family sire is inferred to be heterozygous for 687 all three markers. The example shown is for the LG12 site at 2,953,218 bp. The same pattern 688 was observed whenever both the sire and dam had heterozygous genotypes 689 Dam Sire Female progeny Male progeny genotype genotype A/G A/A A/G G/G A/G A/A A/G G/G Observed 6 11 0 0 25 0 numbers Interpretation Not XA/YGYA Not XA/XG XA/XA XA/XG XG/XG XA/YGYA expected XG/YGYA expected 690 Final version, 30 July 2020 26

691 Figure legends 692 693 Figure 1. Schematic diagram of the P. reticulata XY sex chromosome pair (with the X in pink) to 694 show regions in which crossovers in males probably occur at different rates (symbolized by the bar 695 charts), based on cytogenetic data and population genomic results (see Introduction). Much or all 696 of the chromosome is present on both the X and Y, and coverage of sequences is very similar in the 697 two sexes (the Y is not degenerated and appears not to have lost genes carried by the X (see 698 (Bergero et al. 2019). The diagram shows the centromere at the left-hand end (as in the genome 699 assembly), and the PAR, where most crossover events occur, at the right-hand end. The sizes of the 700 regions indicated are not yet accurately known, and the diagram shows the current information. 701 The region at the centromeric end that has extremely low crossing over in male meiosis may extend 702 across much of the XY pair, and the PAR appears to be at most 2 Mb in size (Bergero et al. 2019). 703 The Results section below describes evidence that the male-determining locus (symbolized by a 704 blue vertical line) is just proximal to the PAR boundary at around 25.7 Mb. The region carrying the 705 male-determining locus may lie within a completely non-recombining male-specific Y region (or 706 MSY), as the presence of male-specific heterochromatin suggests (see Introduction); if so, the 707 region could also include other genes, so the diagram shows a physically extended MSY (in black; 708 this region may be missing from the female assembly, and therefore remain undetected, since such 709 male-specific sequences would not have mapped to the female reference genome). It remains 710 possible that there is just a very small maleness factor, for example a single gene or SNP, as 711 discussed in the Introduction section. 712 713 Figure 2. Differentiation between males and females sampled from natural populations from rivers 714 in the Northern range of Trinidad (Supplementary Table S1), based on high-throughput genotyping

715 results for LG12 SNPs. Differentiation is estimated as the FST value within each population, which is 716 expected to be about 0.3 for fully sex-linked sites, and the y axes of the plot for each population

717 extend up to this value; no SNPs with higher FST values were observed in our samples. Sites with 718 high and low predation rates were sampled from each river, except for the Paria river, where there 719 are no high predation sites.

720 721 Final version, 30 July 2020 27

722 Figure 3. Summary of genotypes at non-PAR markers informative in male meiosis in sibship 2 from 723 the Quare QHPG5 family, in diagrammatic form, showing the two discrepancies between these 724 male genetic map results and the female genome assembly (see also Table 1 and full details, 725 including the marker names and locations, are in Supplementary Table S7). Each column shows the 726 genotypes of a single progeny individual in the sibship, with males at the left and females at the 727 right, and the colours indicate whether, for a given marker, an individual received the sire’s X- or Y- 728 linked allele marker (genotypes with paternal X- linked alleles are coloured in pink, and genotypes 729 with paternal Y-linked alleles are in blue). The left-hand part shows the 58 more centromere- 730 proximal markers, and the right-hand part shows the 58 more terminal markers, up to the inferred 731 PAR boundary. 108 markers that co-segregate perfectly with the individuals’ sexes in meiosis of the 732 male parent of the 13 sibship 1 progeny, with the same sire as the sibship diagrammed 733 (Supplementary Table S7), have clearly identifiable paternal X- and Y-linked alleles. These markers 734 show the same segregation in 9 progeny in sibship 2, but one recombinant male (male 07) was 735 detected (indicated by red arrows at the top of both parts of the diagram). 736 737 Figure 4. Genetic map of the guppy PAR, based on male meiosis with multiple families with sires 738 from natural populations, in Trinidad (seven from high-predation sites and four from low-predation 739 sites, see Supplementary Table S1). The different colour dots indicate different families. Each 740 family’s dam is from the same population as the sire. Only PAR markers are shown, and all more 741 proximal markers co-segregate with the progeny individuals’ sexes (see Supplementary Figure S1), 742 apart from the single recombinant male in the QHPG5 family (see Figure 3 and the text), and a 743 single female in family PMLPB2 (see text). Note that the marker map order in male meiosis is 744 shown, not the positions of markers on the chromosome in megabases (because the errors in the 745 assembly of this region mean that the true positions are not known). 746 Figure 1 Male-determining factor

PAR

~ 24 Mb Region with very rare Physical sizes ~ 2 Mb crossing over unknown

CENTROMERE 50 cM Rare Rare crossovers

PAR Figure 2

Guanapo river Yarra river Marianne river High predation sites between males males between and females ST F

Low predation sites Position in the LG12 assembly (Mb)

Figure 2, continued

High Aripo river Turure river Quare river predation sites between males and females males between ST

Low F predation sites

Paria river low predation site between males males between and females ST F Position in the LG12 assembly ♂ ♀ ♂ ♀ Figure 3 Number of Male 7 ‘s CEN Male 7 Number of Male 7 ‘s markers genotype end Male 7 markers genotype 2 5 Female 3 Male

Discrepancy Female

75 Female 10 (discrepancy Male 1)

9 Female

6 Male Figure 4 ~ 1 megabase

Inverted

Moved Order of markers in the male genetic map Final version, 30 July 2020 28

747 Supplementary files 748 749 Supplementary figures 750 Figure S1. Genetic maps of the guppy LG12 in female (left) and male (right) meiosis, in the 751 same families as shown in Figure 4 of the main text.

752 Figure S2. GC content of introns in two scaffolds that were unplaced in the published guppy 753 genome assembly, but which map to the PAR, as shown in Figure 4 of the main text.

754 Figure S3. Comparison of the gene contents of guppy and platyfish chromosomes.

755 Figure S4. Comparison of the gene arrangements of the guppy LG12 and the homologous 756 chromosome of the platyfish, Xm8.

757 Figure S5. Evidence for a possible movement of genes onto the guppy LG12 from a 758 chromosome carrying homologs assembled on the platyfish Xm8. The Xm8 centromere is 759 probably at the top end, and the region indicated is probably colinear with the guppy LG9. 760 Final version, 30 July 2020 29

761 Supplementary tables

762 Table S1. Trinidadian guppy natural population samples studied, including live fish 763 transported to the United Kingdom, and maintained at the University of Exeter, Falmouth, 764 for genetic studies, together with the names of families used for genetic maps. Hgh- 765 throughput experiment 1 included the LAH family. Where possible, both high and low 766 predation populations were sampled from each river; the Paria river has no high predation 767 sites.

768 Table S2. Primers for the guppy LG12 markers genotyped (mostly microsatellites), including 769 the gu1066 microsatellite and those identified from Xm8, NW_007615031.1 and 770 NW_007615023.1 (see main text).

771 Table S3. Summary of gu1066 genotypes in three natural populations. Blue indicates males 772 and red females.

773 Table S4. Table S4. Genotypes found in 32 alleles sampled from each of three high predation 774 populations (16 from each sex). Grey shading indicates alleles with > 3 copies in a 775 population sample, and bold font indicates alleles that were found in more than a single 776 population.

777 Table S5. Results of high-throughput genotyping for the SNP at site 24,969,110 in the 778 LOC103474023 gene identified by Dor et al. (2019) as a candidate for the guppy sex- 779 determining gene. HP indicates locations with high predation rates and LP indicates low 780 predation rate.

781 Table S6. Genotypes of seven LG12 microsatellite markers proximal to the pseudo- 782 autosomal region (PAR) in the PMLPB2 family, showing an X-Y crossover event in female 783 progeny PMLPB2_f23. For clarity, PAR markers are not shown. 137 progeny were 784 genotyped, excluding two contaminants whose genotypes do not correspond to any othe 785 sires or dams. The dams and sires of the progeny were inferred from the genotypes, and are 786 shown in columns C and D. The positions in the guppy genome assembly are shown below 787 each marker name. The 68 male progeny (blue font in column A) all carry the variant that 788 can be inferred to be the sire's Y-linked allele, and 68 of the female progeny inherited their 789 sire's X-linked allele; two males and one female disagreed with heir assigned sex for all 790 seven markers, and were probably labelled erroneously). Female f23, however, carries her Final version, 30 July 2020 30

791 sires Y-linked alleles at the 4 markers from 1.2 to11.7 Mb , and his X-linked alleles at the 3 792 distal markers, including the marker at 21.3 Mb.

793 Table S7. Genotypes of 264 LG12 markers in the QGPG5 family, which includes an X-Y 794 crossover event in male progeny QHPG5m07 (in sibship 2). Most of the markers are SNPs, 795 which were genotyped by high-throughput experiments (see main text), but 17 796 microsatellites were also genotyped. The two sibships had the same sire, but different 797 dams, and the progeny were assigned to their respective dams using the microsatellites. 798 Column B shows the positions in the published guppy genome assembly (with the 799 microsatellite names and positions in bold font). Column A notes several "landmarks", 800 including locations and sets of markers that are described in the main text. All 13 progeny in 801 sibship 1 show complete co-segregation in male meiosis of all informative markers in the 802 region centromere-proximal to the PAR boundary; marker genotypes are coloured in blue or 803 pink, to indicate whether they inherited the sires's Y- or X-linked alleles, respectively. Cells 804 coloured in grey indicate SNPs that are not informative in a sibship (markers that were 805 uninformative in both male and female meiosis in both sibships are not shown), and yellow 806 indicates discrepancies from Mendelian expectations, probably due to genotyping errors; 807 these are very infrequent. In sibship 2, one individual (indicated in yellow) carries the sire's 808 X-linked markers, and is therefore inferred to be a female, although it was recorded as a 809 male, but three males and five other females show complete co-segregation of all non-PAR 810 markers with sex. Male m07, however, is clearly recombinant: this male carries his sire's X- 811 linked alleles at most centromere-proximal markers, but Y-linked alleles at markers distal to 812 21.2 Mb, with few exceptions which are discussed in the main text (see Figure 3 for a 813 summary of this sibship's results for markers informative in male meiosis only).

814 Table S8. LAH family results from high-throughput genotyping . Colouring is the same as 815 described for Table S7. As there are only a single sire and single dam for this family, all SNPs 816 that are not informative have been omitted. The three SNPs that are boxed (at positions 817 2,952,176, 2,953,218, and 2,953,254) are discussed in the text under the term "Discrepancy 818 2", , and the most proximal position that is consistent with their segregation patterns in 819 female meiosis is distal. to 21 Mb in the assembly. For these three discrepant SNPs, the 820 segregation in male meiosis is not informative about their location, because there is a Final version, 30 July 2020 31

821 duplication such that all male progeny of female heterozygotes appear to be heterozgous, 822 as explained in the main text.

823 Table S9. Segregation patterns in two ALP2B2 sibships in which the three boxed SNPs (at 824 positions 2,952,176, 2,953,218, and 2,953,254) named "Discrepancy 2" are heterozygous in 825 the dams. These are discussed in the text under the term "Discrepancy 2", and again the 826 most proximal position that is consistent with their segregation patterns in female meiosis is 827 distal. The colouring is the same as described for Table S7. The table shows that, while the 828 segregation patterns for most SNPs in female meiosis agree with those assembled in 829 physically nearby positions (for sibship 1, agreement is seen for all 30 SNPs proximal to the 3 830 discrepant ones, and for 7 more distal SNPs before the first detectable crossover), this is not 831 the case for these three SNPs. Instead, in both these sibships (totalling 33 female progeny 832 genotyped), their segregation patterns agree closely with those of SNPs at positions near 23 833 Mb. For these three discrepant SNPs, the segregation in male meiosis is not informative 834 about their location, because there is a duplication such that all male progeny of female 835 heterozygotes appear to be heterozygous, as explained in the main text.

836 Table S10. Genes on the guppy LG12 (the Poecilia reticulata sex chromosome, and their 837 locations in the published guppy genome assembly and in the assembly of the homologous 838 chromosome (Xm8) of the platyfish, Xiphophorus maculatus, together with some results 839 from genetic mapping using SNPs selected from across the guppy assembly. The 840 homologous genes in the two species were determined by NUCmer analysis (Kurtz et al., 841 2004, http://mummer.sourceforge.net) using the platyfish chromosome 8 as the reference 842 and the guppy LG12 sequence as the query. The genes are shown in the order in the 843 published assembly for the guppy, with the guppy centromere end (determined from 844 genetic mapping in male meiosis) at the end with the lowest positions; note that the 845 homologous chromosome if the platyfish is assembled in the opposite order, and therefore 846 distances from the centromere end are shown (in column C), in addition to the assembly 847 positions. Breaks in the platyfish ordering are indicated by grey shading of the regions. 608 848 sequences with % identity < 85% were excluded. A few SNPs that were mapped in the 849 guppy, but where the locus was not identified in the platyfish, are indicated by turquoise 850 blue text in columns E and F. The large region in which the guppy order is inverted is boxed 851 (as explained in the text, genetic mapping in several families indicates that this is an error in Final version, 30 July 2020 32

852 the guppy assembly, rather than a difference between the two species). The PAR is also 853 boxed (based on results in all families with map information), as is the region named 854 "Discrepancy 1" in the main text, and the corresponding positions in platyfish assembly are 855 shown in red, as are the informative guppy SNPs. The region in which the recombinant male 856 in the QHPG5 sibship carries paternal Y-linked variants is highlighted in blue in columns D 857 and E (in all other regions, including the "Discrepancy 1" region, this male carries maternal 858 LG12 alleles, as described in the text).