bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1 Title

2 A level genome of Astyanax mexicanus surface for comparing population-

3 specific genetic differences contributing to trait evolution.

4

5 Authors

6 Wesley C. Warren1, Tyler E. Boggs2, Richard Borowsky3, Brian M. Carlson4, Estephany

7 Ferrufino5, Joshua B. Gross2, LaDeana Hillier6, Zhilian Hu7, Alex C. Keene8, Alexander Kenzior9,

8 Johanna E. Kowalko5, Chad Tomlinson10, Milinn Kremitzki10, Madeleine E. Lemieux11, Tina

9 Graves-Lindsay10, Suzanne E. McGaugh12, Jeff T. Miller12, Mathilda Mommersteeg7, Rachel L.

10 Moran12, Robert Peuß9, Edward Rice1, Misty R. Riddle13, Itzel Sifuentes-Romero5, Bethany A.

11 Stanhope5,8, Clifford J. Tabin13, Sunishka Thakur5, Yamamoto Yoshiyuki14, Nicolas Rohner9,15

12

13 Authors for correspondence: Wesley C. Warren ([email protected]), Nicolas Rohner

14 ([email protected])

15

16 Affiliation

17 1Department of Animal Sciences, Department of Surgery, Institute for Data Science and

18 Informatics, University of Missouri, Bond Life Sciences Center, Columbia, MO

19 2 Department of Biological Sciences, University of Cincinnati, Cincinnati, OH

20 3 Department of Biology, New York University, New York, NY

21 4 Department of Biology, The College of Wooster, Wooster, OH

22 5 Harriet L. Wilkes Honors College, Florida Atlantic University, Jupiter FL

23 6 Department of Genome Sciences, University of Washington, Seattle, WA

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

24

25 7 Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United

26 Kingdom

27 8 Department of Biological Sciences, Florida Atlantic University, Jupiter FL

28 9 Stowers Institute for Medical Research, Kansas City, MO

29 10 McDonnell Genome Institute, Washington University, St Louis, MO

30 11 Bioinfo, Plantagenet, ON K0B 1L0, Canada

31 12 Department of Ecology, Evolution, and Behavior, University of Minnesota, Saint Paul, MN

32 13 Genetics Department, Blavatnik Institute, Harvard Medical School, Boston, MA

33 14 Department of Cell and Developmental Biology, University College London, London, United

34 Kingdom

35 15 Department of Molecular & Integrative Physiology, KU Medical Center, Kansas City, KS

36

37 Abstract

38 Identifying the genetic factors that underlie complex traits is central to understanding the

39 mechanistic underpinnings of evolution. In nature, adaptation to severe environmental change,

40 such as encountered following colonization of caves, has dramatically altered genomes of species

41 over varied time spans. Genomic sequencing approaches have identified associated with

42 troglomorphic trait evolution, but the functional impacts of these mutations remain poorly

43 understood. The Mexican Tetra, Astyanax mexicanus, is abundant in the surface waters of

44 northeastern Mexico, and also inhabits at least 30 different caves in the region. Cave-dwelling A.

45 mexicanus morphs are well adapted to subterranean life and many populations appear to have

46 evolved troglomorphic traits independently, while the surface-dwelling populations can be used as

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

47 a proxy for the ancestral form. Here we present a high-resolution, chromosome-level surface fish

48 genome, enabling the first genome-wide comparison between surface fish and cavefish

49 populations. Using this resource, we performed quantitative trait (QTL) mapping analyses

50 for pigmentation and eye size and found new candidate for eye loss such as dusp26. We

51 used CRISPR editing in A. mexicanus to confirm the essential role of a gene within an eye

52 size QTL, rx3, in eye formation. We also generated the first genome-wide evaluation of deletion

53 variability that includes an analysis of the impact on -coding genes across cavefish

54 populations to gain insight into this potential source of cave adaptation. The new surface fish

55 genome reference now provides a more complete resource for comparative, functional,

56 developmental and genetic studies of drastic trait differences within a species.

57

58 Main Text

59 Establishing the relationship between natural environmental factors and the genetic basis of trait

60 evolution has been challenging. The ecological shift from surface to cave environments provides

61 a tractable system to address this, as in this instance the polarity of evolutionary change is known.

62 Across the globe, subterranean animals, including fish, salamanders, , and myriads of

63 invertebrate species have converged on reductions in metabolic rate, eye size, and pigmentation1-

64 3. The robust phenotypic differences from surface relatives provide the opportunity to investigate

65 the mechanistic underpinnings of evolution and to determine whether genotype-phenotype

66 interactions are deeply conserved.

67 The Mexican cavefish, Astyanax mexicanus, has emerged as a powerful model to

68 investigate complex trait evolution4. A. mexicanus comprises at least 30 cave-dwelling populations

69 in the Sierra de El Abra, Sierra de Colmena and Sierra de Guatemala regions of northeastern

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

70 Mexico, and surface-dwelling fish of the same species inhabit rivers and lakes throughout Mexico

71 and southern Texas5. The ecology of surface and cave environments differ dramatically, allowing

72 for functional and genomic comparisons between populations that have evolved in distinct

73 environments. Dozens of evolved trait differences have been identified in cavefish including

74 changes in morphology, physiology, and behavior4,6 . It is also worth noting that comparing

75 cavefish to surface fish reveals substantial differences in many traits of possible relevance to

76 disease, including sleep duration, circadian rhythmicity, anxiety, aggression, heart

77 regeneration, eye and retina development, craniofacial structure, insulin resistance, appetite, and

78 obesity7-17. Further, generation of fertile surface-cave hybrids in a laboratory setting has allowed

79 for genetic mapping in these fish10,18-25. Clear phenotypic differences, combined with availability

80 of genetic tools, positions A. mexicanus as a natural model system for identifying the genetic basis

81 of ecologically and evolutionary relevant phenotypes17,26,27.

82 Recently, the genome of an individual from the Pachón cave population was sequenced28.

83 While this work uncovered genomic intervals and candidate genes linked to cave traits, an

84 important computational limitation was sequence fragmentation due to the use of short-read

85 sequencing. In addition, lack of a surface fish genome prevents direct comparisons between surface

86 and cavefish populations at the genomic level. Here, we address these two obstacles by presenting

87 the first de novo genome assembly of the surface fish morph using long-read sequencing

88 technology. This approach yielded a much more comprehensive genome that allows for direct

89 genome-wide comparisons between surface fish and Pachón cavefish. As a proof-of-principle, we

90 first confirmed known genetic mutations associated with pigmentation and eye loss, then

91 discovered novel quantitative trait loci (QTL), and identified coding and deletion mutations that

92 highlight putative contributions to cave trait biology.

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

93

94 Results

95 De novo assembly and curation of the surface fish genome. We set out to generate a robust

96 reference genome for the surface morph of A. mexicanus from a single lab-reared female,

97 descended from wild caught individuals from known Mexico localities (Fig. 1a, Supplementary

98 Figure 1). We sequenced and assembled the genome using Pacific Biosciences single molecule

99 real time (SMRT) sequencing (~73x genome coverage) and the wtdbg2 assembler29 to an

100 ungapped size of 1.29 Gb. Initial scaffolding of assembled contigs was accomplished with the aid

101 of an A. mexicanus surface fish physical map (BioNano) followed by manual assignment of 70%

102 of the assembly scaffolds to 25 total using the existing A. mexicanus genetic linkage

103 map markers30. The final genome assembly, Astyanax mexicanus 2.0, comprises a total of 2,415

104 scaffolds (including single contig scaffolds) with N50 contig and scaffold lengths of 1.7 Mb and

105 35 Mb, respectively, which is comparable to other similarly sequenced and assembled teleost

106 (Supplementary Table 1). The assembled regions (394 Mb) that we were unable to assign

107 to chromosomes were mostly due to 3.36% (2,235 markers total) of the genetic linkage markers

108 not aligning to the surface fish genome, single markers per contig where the orientation could not

109 be properly assigned, or markers that mapped to multiple places in the genome and thus could not

110 be uniquely mapped. The uniquely mapped markers exhibited few ordering discrepancies and

111 significant synteny between the linkage map30 and the assembled scaffolds of the Astyanax

112 mexicanus 2.0 genome, validating the order and orientation of a majority of the assembly (Fig. 1b;

113 Supplementary Fig. 2). In Astyanax mexicanus 2.0, we assemble and identify 11% more total

114 masked repeats that in the Astyanax mexicanus 1.0.2 assembly (Supplementary Table 1). Amongst

115 assembled teleost genomes, A. mexicanus seems to be an intermediate with 41% masked

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

116 interspersed repeats estimated using WindowMasker31, compared to Xiphophorus maculatus

117 (27%) and Danio rerio (50%). Possible mapping bias across cave population sequences to the cave

118 versus surface fish genome references was also investigated by mapping population level

119 resequencing reads to both genomes32. We found the number of unmapped reads is greater for all

120 populations aligned to the Astyanax mexicanus 1.0.2 reference compared to Astyanax mexicanus

121 2.0 (Fig. 1c). Also, the percentage of properly paired reads, that is, pairs where both ends align to

122 the same scaffold, is greater for all resequenced cave populations aligned to the Astyanax

123 mexicanus 2.0 reference (Supplementary Fig. 3) and significantly more non-primary alignments

124 with greater variation were observed (Supplementary Fig. 4). Both metrics indicate that the

125 Astyanax mexicanus 2.0 reference has more resolved sequence regions than the Astyanax

126 mexicanus 1.0.2 reference. Future application of phased assembly approaches will likely resolve

127 a significantly higher proportion of chromosomal sequences in the Astyanax mexicanus genome33.

128

129 130

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

131 Figure 1. Assembly metrics for the surface fish genome Astyanax mexicanus 2.0. a) Adult Astyanax 132 mexicanus surface fish. b) Substantial synteny is evident between a recombination-based linkage map30 and 133 the draft surface fish genome. By mapping the relative positions of genotyping-by-sequencing (GBS) 134 markers from a dense map constructed from a Pachón x surface fish F2 pedigree, we observed significant 135 synteny based on 96.6% of our genotype markers. c) Proportion of unmapped reads for the same samples 136 aligned to the cave (square points) and surface (triangle points) Astyanax reference genomes. Colors unite 137 samples by population identity. Best fit lines are for each alignment (cave: blue, surface: yellow) have 138 similar slope, indicating that population identity has a similar effect on mapping rate to either genome. 139 140 Gene annotation. Two independent sets of protein-coding genes were generated using the NCBI34

141 and Ensembl35 automated pipelines with similar numbers of genes found by each: 25,293 and

142 26,698, respectively (Supplementary Table 2). Gene annotation was aided by the diversity of

143 transcript data derived from whole adult fish, embryos, and 12 different tissues available from the

144 NCBI short read archive. As a result, the total predicted protein-coding genes and transcripts

145 (mRNA) were consistent with other annotated teleost species (Supplementary Table 2) and 1,665

146 new protein-coding genes were added compared to Astyanax mexicanus 1.0.2. Additionally, long

147 non-coding gene (e.g. lncRNA) representation is significantly improved in the Astyanax

148 mexicanus 2.0 reference compared to the Astyanax mexicanus 1.0.2 reference (5,314 vs 1,062),

149 although targeted non-coding RNA sequencing will be required to achieve annotation comparable

150 to (Supplementary Table 2). We assessed completeness of gene annotation by applying

151 BUSCO (benchmarking universal single-copy ortholog) scores, which measure gene completeness

152 among vertebrate genes. We find 94.6% of genes are complete, 4% are missing and 4.2% are

153 duplicated (Supplementary Table 3). In addition, using NCBIs transcript aligner Splign

154 (C++toolkit) gene annotation metrics, same-species RefSeq or GeneBank transcripts show 98.4%

155 coverage when aligned to Astyanax mexicanus 2.0. In total, our measures of gene representation

156 in the Astyanax mexicanus 2.0 reference show a high-quality resource for the study of A.

157 mexicanus gene function.

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

158 Having a high-quality reference genome provides many benefits in exploiting Astyanax mexicanus

159 as a model species. Among the more important uses, from the standpoint of utilizing the system to

160 understand evolutionary mechanisms, are in mapping and identifying genes responsible for

161 phenotypic change, and in unveiling genomic structural variation that provides a substrate for

162 adaptive selection. Thus, to demonstrate the advantages Astyanax mexicanus 2.0 brings to the

163 field, we have explored its use in each of these settings; in particular examining the genetic

164 underpinnings of albinism and reduced eye size, and then looking at the contribution of genomic

165 deletion to variation within Astyanax mexicanus populations.

166 First with respect to identification of evolutionary important genes, we took advantage of a critical

167 attribute of A. mexicanus, nearly unique among cave animals: the ability to interbreed cavefish and

168 surface fish populations to generate fertile hybrids that can be intercrossed to perform QTL studies

169 (for review see36,37). The improved contig length of the surface fish genome and our syntenic

170 analysis that unites the physical genome to a previously published linkage map based on

171 genotyping-by-sequencing (GBS) markers30, should greatly aid the correct identification of

172 genetic changes linked to cave traits. To this end, we demonstrate the power of the Astyanax

173 mexicanus 2.0 reference in gaining deeper insight into an already mapped trait (albinism) and in

174 carrying out new QTL analysis of eye reduction.

175 QTL analysis of albinism. Albinism has previously been mapped in QTL studies of both the

176 Pachòn and Molino populations20. This work was carried out before the existence of any reference

177 genome for the species. However, these mapping studies showed that a single major QTL in each

178 population colocalized with a candidate gene oca2. Oca2 loss of function mutations are known to

179 cause albinism in other organisms, including , mice and zebrafish, and deletions causing

180 functional inactivation of Oca2 were identified in the albino fish from both the Molino and Pachón

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

181 caves20. These compelling data indicating that oca2 mutations are causal of albinism in cavefish

182 were subsequently confirmed by CRISPR-mediated mutagenesis in surface fish27. To build on

183 these results, we first performed two separate de novo QTL analyses of surface/Pachón F2 hybrids

184 (group A and B) in the context of the new reference genome (Fig. 2a). Both studies identified a

185 single QTL for albinism, with a LOD score of 47.31 at the peak marker on linkage group 3 (69%

186 variance explained) in group A, and with LOD score of 22.56 at the peak marker on linkage group

187 21 (37.8% variance explained) in group B (Fig. 2c, d; 2g, h). At the peak QTL positions, F2 hybrids

188 homozygous for the cave allele are albino (Fig. 2e, i). Mapping the markers associated with each

189 of these linkage groups to the surface fish genome revealed that they are both located on surface

190 fish chromosome 13 (Table 1, Fig. 2f, j). This demonstrates a significant improvement over

191 mapping to the original cavefish genome: Linkage group 3 from group A almost completely

192 corresponds to surface fish chromosome 13 in Astyanax mexicanus 2.0, whereas it is split up into

193 many contigs in Astyanax mexicanus 1.0.2 (Fig. 2f, j, Supplementary Fig. 5a). The QTL identified

194 on linkage group 21 in group B corresponds to a 2.8 Mb region on surface fish chromosome 13 as

195 determined using the sequences flanking the 1.5-LOD support interval as input for Ensembl basic

196 local alignment search tool (Table 1, Fig. 2J). In line with previous mapping studies, the gene oca2

197 is found within this region on surface fish chromosome 13. We further functionally verified that a

198 change in the oca2 locus is responsible for albinism in the F2 hybrids, by crossing an albino

199 surface/Pachón F2 hybrid with a genetically-engineered surface fish heterozygous for a deletion in

200 oca2 exon 2127. We found that 46.5% of the offspring completely lacked pigment (n=40/86, Fig.

201 2b). This confirms that a change in the oca2 locus is responsible for loss of pigment in the mapping

202 population.

203 204

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

205 206 Figure 2. Utilizing the surface fish genome for QTL mapping of albinism. a) Representative image of 207 F1 hybrid of surface/Pachón cross and albino F2 hybrid b) Pigmented (top) and albino (bottom) progeny 208 resulting from breeding a surface/Pachón albino F2 hybrid with a surface fish heterozygous for an 209 engineered 4 deletion in oca2 exon 21. c-j) QTL mapping of albinism in surface/Pachón mapping 210 group A (c-f) and B (g-j). c, g) Results of genome wide LOD calculation for albinism using Haley-Knott 211 regression and binary model (0 = pigmented, 1 = albino). Significance threshold of 5% (black dotted line) 212 determined by calculating the 95th percentile of genome-wide maximum penalized LOD score using 1000 213 random permutations. d, h) LOD score for each marker on the linkage group with the peak marker. e, i) 214 Plot highlighting the effect of the indicated genotype at the peak marker (S = surface allele, C = cave allele). 215 f, j) Relative position of the markers that define the 1.5 LOD support interval on chromosome 13 of the 216 surface fish genome. 217 218 219 220 221 222 223 224

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

225 Table 1. QTL mapping of albinism and eye size in two groups of surface/Pachón F2 hybrids. Table 226 comparing QTL marker positions on linkage maps, Pachón genome scaffolds, and surface fish genome 227 chromosomes (1 Astyanax mexicanus 1.0.2; 2 Astyanax mexicanus 2.0).

F2 Surface Surface/ Linkage group Pachón chromosome Pachón Phenotype Marker LOD position (cM) scaffold position1 position2 group A albinism m6036 45.03 3:22.6 KB882185: 1429320 13: 35611239 group A albinism m17241 47.31 3:22.6 KB882122: 1818424 13: 38374549 group A albinism m36842 45.16 3:22.8 KB882122: 2246330 13: 37945372 group B albinism r219177 20.67 22:22.5 KB882122: 1091706 13: 39098911 group B albinism r219695 22.56 22:24.6 KB882122: 2349092 13:37844883 group B albinism r220017 20.98 22:25.0 KB882122: 3270816 13: 41898580 group A eye size m23384 3.39 1:62.6 KB882121: 3061587 3: 23014157 group A eye size m15225 5.46 1:66.8 KB882287: 356685 3: 9301997 group A eye size m30438 3.89 1:67.9 KB872232: 7153 3: 8742127 group B eye size r230947 8.45 15:31.0 KB882132: 1682036 20: 1802669 group B eye size r231141 11.98 15:35.9 KB882132: 2321700 20: 1169105 228 229 230 New insights on albinism locus. While up until this point our reanalysis of albinism was largely

231 confirmatory, an advantage of using the surface fish genome as a reference, compared to the

232 previously available cavefish genome, is the ability to make comparisons between regions that are

233 deleted in cavefish and as such are not available for sequence alignment. We utilized the A.

234 mexicanus 2.0 reference genome to align sequencing data obtained from wild-caught fish from

235 different surface and cave localities32 and analyzed the oca2 locus (Supplementary Fig. 6,7). We

236 found that 9 out of 10 Pachón cavefish carry the deletion in exon 24 that was previously reported

237 in laboratory-raised fish (Supplementary Fig. 6;20). None of the individuals from other populations

238 for which sequence information was available carried the same deletion. Consistent with the

239 laboratory strains, exon 21 of oca2 is absent in all Molino cavefish samples (n=9, Supplementary

240 Fig. 7). None of the other sequenced population samples harbor the same deletion of exon 21,

241 however we found smaller heterozygous or homozygous deletions in exon 21 in some wild

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

242 samples of Pachón (n=5/9), Río Choy (n=1/9), and Tinaja (5/9) fish (Supplementary Fig. 7). In

243 summary, we were able to detect deletions in oca2 that would have not been discovered with

244 alignments to Astyanax mexicanus 1.0.2 since exon 24 is missing in Pachón and thus no reference-

245 based alignments were produced in that region.

246

247 Surface fish genome reveals new candidate genes from prior QTL studies. A prominent feature of

248 cavefish is the absence of external eyes. While a number of eye size QTL have been identified18,19,

249 there is an incomplete understanding of the genetic basis of eye regression. We uncovered

250 additional information by remapping previously published QTL18-24 using the Astyanax mexicanus

251 2.0 reference. A total of 1,060 out of 1,124 markers (94.3%) mapped successfully with BLAST

252 and were included in our surface fish QTL database. There were 77 markers that did not map to

253 the cavefish genome28 but did map to the surface fish genome, 52 markers that mapped to the

254 cavefish but not the Astyanax mexicanus 2.0 reference, and 12 markers that did not map to either

255 reference. The improved contiguity of the chromosome-level surface fish assembly allowed us to

256 identify several additional candidate genes associated with QTL markers that were not previously

257 identified in the more fragmented cavefish genome (Supplementary Data 1). For example, the

258 markers Am205D and Am208E mapped to an approximately 1 Mb region of chromosome 6 of the

259 surface fish genome (46,516,926 - 47,425,713 bp) but did not map to the Astyanax mexicanus

260 1.0.2 reference. This region is associated with feeding angle21, eye size, vibration attraction

261 behavior (VAB), suborbital neuromasts23, and maxillary tooth number19. Multiple strong candidate

262 genes involved in eye development or morphology are contained in this region including

263 rhodopsin, mib2, and ubiad1, as well as GABA A delta which is associated with a variety

264 of behaviors and could conceivably be involved in VAB. Notably, the scaffold containing these

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

265 four genes in Astyanax mexicanus 1.0.2 (KB871939.1) was not linked to this QTL, demonstrating

266 the utility of increased contiguity of Astyanax mexicanus 2.0.

267 Previous studies suggested that the gene retinal gene 3 (rx3) lies within the QTL for

268 outer plexiform layer of the eye24. Another QTL for eye size, size of the third suborbital bone, and

269 body condition18,19, may also contain rx3 (low marker density and low power of older studies,

270 result in a broad QTL critical region in this area). The increased contiguity of Astyanax mexicanus

271 2.0 revealed that rx3 is within the region encompassed by this QTL, whereas in Astyanax

272 mexicanus 1.0.2 the marker for this QTL (Am55A) and rx3 were located on separate scaffolds;

273 thus, we could not appreciate that this QTL and key gene for eye development were in relatively

274 close genomic proximity. While no coding changes are apparent between cavefish and

275 surface fish, expression of rx3 is reduced in Pachón cavefish relative to surface fish28. In zebrafish,

276 rx3 is expressed in the eye field of the anterior neural plate during gastrulation and has an essential

277 role for the fate specification between eye and telencephalon38,39. We have compared rx3

278 expression at the end of gastrulation and confirmed Pachón embryos have reduced expression

279 domain size (Fig. 3a, b). The expression area is significantly smaller in Pachón embryos compared

280 to stage-matched surface fish embryos. The expression of rx3 is restored in F1 hybrids between

281 cavefish and surface fish, indicating a recessive inheritance in cavefish (Fig. 3a, b). To test for a

282 putative role of rx3 in eye development in Astyanax mexicanus we used CRISPR/Cas9 to mutate

283 this gene in surface fish (Fig. 3c), and assessed injected, CRISPant fish for eye phenotypes. Wild-

284 type surface fish have large eyes (Fig. 3d). In contrast, externally visible eyes are completely absent

285 in adult CRISPant surface fish (n=5, Fig.3d). This is consistent with work from other species, in

286 which mutations in rx3 (fish) or Rx (mice) result in a complete lack of eyes40-42. Together these

287 data suggest that the role of rx3 in eye development is conserved in A. mexicanus. Further, they

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

288 support the hypothesis that regulatory changes in this gene may contribute to eye loss in cavefish

289 through specification of a smaller eye field, and subsequently, production of a smaller eye.

290

291 292 Figure 3. Rx3 analysis. a) In situ hybridizations for rx3 (arrows) and mRNA at tailbud stage on 293 surface fish, Pachón cavefish, and F1 hybrid between surface fish male and Pachón cavefish female. Scale 294 bar: 200 µm. b) Comparison of expression area of rx3 at tailbud stage on surface fish (n=17), Pachón 295 (n=14), and F1 hybrid (n=11). Significance shown for ANOVA. p < 0.005 is denoted by * and ns means 296 not significant. c) Diagram of rx3 gene. Boxes indicate exon and lines indicate introns. The empty boxes 297 are UTR and the filled boxes are coding sequence (gene build from NCBI). A gRNA was designed targeting 298 exon 2. The gRNA target site is in blue and the PAM site is in red. The arrow indicates the predicted Cas9 299 cut site. d) left: Adult WT control (uninjected sibling) surface fish. Right: Adult rx3 CRISPR/Cas9-injected 300 surface fish lacking eyes (CRISPant). 301 302 In addition to the compiled database of older QTL studies, a number of genomic intervals

303 associated with previously described locomotor activity difference between cavefish and surface

304 fish25 were also re-screened using the surface-anchored locations of markers from the high-density

305 linkage map30. Within this Pachón/surface QTL map we confirmed the presence of 20 previously

306 reported candidate genes, and identified 96 additional genes with relevant GO terms, including

307 rx3, further demonstrating the power and utility of this new genomic resource (Supplementary

308 Table 4). The new candidates include additional opsins (opn7a, opn8a, opn8b, tmtopsa, and two

309 putative green-sensitive opsins), as well as several genes contributing to circadian rhythmicity

310 (id2b, -5, cipcb, clocka, and ). While analyses of expression data and sequence variation

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

311 are necessary to determine which of these candidates exhibit meaningful differences between

312 morphs, the presence of clocka and npas2 in these intervals is of particular note, as the original

313 analysis conducted using Astyanax mexicanus 1.0.2 did not provide any evidence of a potential

314 role for members of the core circadian clockwork in mediating observed differences in locomotor

315 activity patterns between Pachón and surface fish15,25.

316

317 Genetic mapping with surface fish genome reveals new candidate genes for eye regression.

318 Finally, to gain new insight into the eye reduction phenotype, we used the Astyanax mexicanus

319 2.0 reference to de novo genetically map eye size in the two surface/Pachón F2 groups that we used

320 to map albinism. In surface/Pachón F2 mapping population A (n=188) we identified multiple QTL

321 for normalized eye perimeter that were spread across four linkage groups (Fig. 4a). The QTL on

322 linkage group 1 is significant above a threshold of p < 0.01 (Fig. 4b). The surface allele at the peak

323 marker appears to be dominant since eye size in the heterozygous state is similar to the

324 homozygous state (Fig. 4c). Mapping linkage group 1 to the Astyanax mexicanus 1.0.2 genome

325 assembly results in markers under the QTL peak spread across different contigs, whereas these

326 map to in the new surface fish assembly, emphasizing how the improved quality

327 of the surface fish genome allows the identification of candidate genes throughout the QTL region

328 (Fig. 4d, Supplementary Fig. 5b, Table 1). This region has been identified previously18,28 and

329 contains genes such as shisa2a and shisa2b, as well as . Analysis of the region between the

330 markers with the highest LOD scores (3:9301997-9505868), however, revealed one gene that has

331 not been linked to eye loss in A. mexicanus before, ENSAMXG00000005961 (Fig. 4d). Alignment

332 of this novel gene sequence showed homology to orofacial cleft 1 (ofcc1), also called ojoplano

333 (opo). In medaka, opo has been shown to be involved in eye development. When knocked out, opo

334 affects the morphogenesis of several epithelial tissues, including impairment of optic cup folding

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

335 which resulted in abnormal morphology of both the lens and neural retina in the embryos43.

336 Sequence comparisons of this gene in Pachón cavefish and surface fish revealed several coding

337 changes, however, none affect evolutionarily conserved residues. Future studies are needed to

338 investigate a putative role of opo in the eye loss of Pachón cavefish.

339

340 341 Figure 4. Utilizing the surface fish genome for QTL mapping of eye size in surface/Pachón hybrids. 342 Results from group A (a-d), results from group B (e-h). a, e) Genome wide LOD calculation for eye size 343 using Haley-Knott regression and normal model. Significance threshold of 5% (black dotted line) 344 determined by calculating the 95th percentile of genome-wide maximum penalized LOD score using 345 1000 random permutations. b, f) LOD score for each marker on the linkage group with the peak marker. 346 c, g) Mean phenotype of the indicated genotype at the peak marker (S = surface allele, C = cave allele). d) 347 Relative position of the markers that define the 1.5 LOD support interval for group A on the surface fish 348 chromosome 3. h) Relative position of the markers that define the 1.5 LOD support interval on chromosome 349 20 of the surface fish genome. 350

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

351 In the surface/Pachón F2 mapping population B (n=219) we identified a single QTL for normalized

352 left eye diameter on linkage group 13 with a LOD score of 11.98 at the peak marker that explains

353 24.3% of the variance in this trait (Fig. 4e, f). F2 hybrids homozygous for the cave allele at this

354 position have the smallest eye size, and the heterozygous state is intermediate (Fig. 4g). The values

355 for left and right eye diameter mapped to the same region and, notably, we obtain the same peak

356 when including eyeless fish in the map and coding eye phenotype as a binary trait (i.e. eyed, eye-

357 less). The QTL on linkage group 13 corresponds to a 633 KB region on surface fish chromosome

358 13 as determined using the sequences flanking the 1.5-LOD support interval as input for Ensembl

359 basic local alignment search tool (Fig. 4h, Table 1). There are 22 genes in this region

360 (Supplementary Table 5). Of note, none of the previously mapped eye size QTL in Pachón cavefish

361 map to the same region44. A promising candidate gene in this interval is dusp26. Morpholino

362 knock-down of dusp26 in zebrafish results in small eyes with defective retina development and a

363 less developed lens during embryogenesis45. We used whole genome sequencing data32 to compare

364 the dusp26 coding region between surface, Tinaja, Pachón and Molino and found no coding

365 changes. However, previously published embryonic transcriptome data indicates that expression

366 of dusp26 is reduced in Pachón cavefish at 36 hpf (p<0.05) and 72 hpf (p<0.01) (Supplementary

367 Fig. 8)46. These data suggest a potential role for dusp26 in eye degeneration in cavefish, however,

368 we cannot exclude critical contributions of other genes or genomic regions in the identified

369 interval.

370

371 Structural variation in cave populations. Another important advantage of having a robust reference

372 genome is that it allows one to interrogate the genomic structural variation at a population level.

373 Knowledge of population-specific A. mexicanus structural sequence variation is lacking.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

374 Therefore, we aligned the population samples from Herman et al.32 against Astyanax mexicanus

375 2.0 to ascertain the comparative state of deletions. We used the SV callers Manta47 and LUMPY48

376 to count the numbers of deletions present in each sample compared to Astyanax mexicanus 2.0

377 (Supplementary Fig. 9). While LUMPY tended to call a larger number of short deletions and Manta

378 a smaller number of long deletions, there was high correlation (R2=0.78) between the number of

379 calls each made per sample (Supplementary Fig. 10), so we used the intersection of deletions called

380 by both callers for further analysis. We then classified these deletions based on their effect (i.e.

381 deletions of coding, intronic, regulatory, or intergenic sequence) (Supplementary Fig. 11).

382 Among the cavefish populations measured for deletion events, 412 genes contained deletions with

383 an allele frequency >5%. We found that the Molino population has the fewest heterozygous

384 deletions, while Río Choy surface fish have the least homozygous deletions, mirroring the

385 heterozygosity of single nucleotide polymorphisms32 (Supplementary Fig. 11). In addition, the

386 Tinaja population showed the most individual variability of either allelic state (standard deviation

387 of 427). Pachón and Tinaja contained the highest number of protein-coding genes altered by a

388 deletion in at least one haplotype, while Río Choy had the least (Fig. 5a). In two examples,

389 and ephx2, we find the deletions that presumably altered protein-coding gene function varied in

390 population representation, number of bases affected, and haplotype state for each (Fig. 5b, c). Of

391 the 412 genes that contained deletions, 109 have assigned in cavefish

392 (Supplementary Table 6). We tested these 109 genes for canonical pathway enrichment using

393 WebGestalt49 and found genes significantly enriched (p<0.05) for AMPK and MAPK signaling,

394 as well as metabolic and circadian function (Table 2). In addition, some genes were linked

395 to diseases consistent with cavefish phenotypes including: ephx2, which is linked to familial

396 hypercholesterolemia, or hnf4a, which is linked with non-insulin dependent diabetes mellitus,

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

397 based on the gene ontology of OMIM and DisGeNET50. Notably, these enrichment analyses

398 recovered the deletion known to be the main cause of pigment degeneration in cavefish, oca227.

399 Further validation of these disease phenotype or pathway inferences is warranted.

400 A. Coding sequence deletions Intron sequence deletions Regulatory sequence deletions 210 7000 1000

200 6750 950

t 190 6500 n u o

c 900 6250 180 n o

eti 6000 850 170 Del 5750 160 800 5500

150 750 5250

Pachon Molino Rascon Tinaja Choy Pachon Molino Rascon Tinaja Choy Pachon Molino Rascon Tinaja Choy

Cave Population Cave Population s

Cave Population u o

B. g y n

C. z exon # 2 1 4 5 o eti mo 3 o el R H d PE

PER3 EPHX2 s u

Pachon_11 o g y n z

Molino_7a o o eti X2

Rascon_8 eter el H d PH Tinaja_2 E n o Choy_11 eti el d

chr22:4,964,704-4,966,581 Pachon Molino Rascon Tinaja Choy 401 chr9:24,897,588-24,899,021 No 402 Figure 5. Sequence deletion events characterized by population origin. a) Total deletion count per 403 population for protein-coding exons and intron sequence and regulatory sequence defined as 1000bp up or 404 downstream of annotated protein-coding genes. b) Gene sequence deletion coordinates for the circadian 405 rhythm gene per3 and cholesterol regulatory gene ephx2 by population. c) The allelic state of each deletion 406 distributed by individual sample per population. 407 408 409 410 Table 2. Protein-coding genes altered by deletion events and their enrichment among canonical pathways 411 and disease. Red text denotes significant tests for disease enrichment (see methods). 412 Gene Pathway or Disease Association Genes Database P value AMPK signaling hnf4a;lipe;ppp2r5a KEGG 0.01

Wnt signaling csnk2b;hltf;map3k7cl;pcdh9;ppp2r5a Panther 0.01

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

MAPK signaling gadd45a;mapkapk3 Panther 0.01

Metabolism cmpk1;cmpk2;ephx2;;ndufa4l2;pig KEGG 0.02 p;prps1;sgms1;sms;uxs1

Circadian clock per3 Panther 0.05 Hypercholesterolemia ephx2 OMIM 0.005 Diabetes mellitus nnf4a OMIM 0.02 Albinism pp3b1;oca2 DisGeNET 0.002 Sialorrhea ;prps1;vps13a DisGeNET 0.001 413 414 415 Coding mutations affecting hypocretin signaling. We next used the surface fish genome to

416 compare amino acid composition of key genes hypothesized to be involved in cave-specific

417 adaptations. The wake-promoting hypothalamic neuropeptide Hypocretin is a critical regulator of

418 sleep in animals ranging from zebrafish to mammals51-53. We previously found that expression of

419 hypocretin was elevated in Pachón cavefish compared to surface fish and pharmacological

420 inhibitors of Hypocretin signaling restore sleep to cavefish, suggesting enhanced Hypocretin

421 signaling underlies the evolution of sleep loss in cavefish9. In teleost fish, Hypocretin signals

422 through a single receptor, the Hypocretin Receptor 2 (Hcrtr2). We compared the sequence of hcrtr2

423 in surface fish and Pachón cavefish and identified two missense mutations that result in protein

424 coding changes, S293V and E295K (Fig. 6a). To examine whether these were specific to Pachón

425 cavefish or shared in other cavefish populations we examined the hcrtr2 coding sequence in Tinaja

426 and Molino cavefish. The affecting amino acid 295 is shared between Pachón and Tinaja

427 populations (Fig. 6a). Further, we identified a six base pair deletion that results in the loss of two

428 amino acids (amino acid 140 and 141; SV) in Molino cavefish. The presence of these variants was

429 validated by PCR and subsequent Sanger sequencing on DNA from individuals from laboratory

430 populations of these fish. The E295K variant in Tinaja and Pachón as well as the two amino acid

431 deletions in Molino affect evolutionarily conserved amino acids suggesting a potential impact on

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

432 protein function (Fig. 6b). We performed an in-silico analysis to test whether the identified

433 cavefish variants in hcrtr2 are predicted to affect the protein structure and stability of Hcrtr2. We

434 used iStable54 to test for potential destabilization effects of the substitution mutations found in

435 Tinaja (E295K) and Pachón (S293V and E295K). iStable predicted a destabilization effect of

436 E295K on the protein structure with a confidence score of 0.842. However, we found a stabilization

437 effect of the S293V mutation in Pachón using iStable with a confidence score of 0.772. We

438 repeated this analysis using MUpro55 and obtained similar results. These findings raise the

439 possibility that the evolved changes differentially affect the function of the same receptor.

440 Structural changes in upon amino acid deletion are difficult to predict with common tools

441 such as iStable and MUpro. To analyze whether the deletion of S140 and V141 in the Molino

442 hcrtr2 could potentially influence the structural integrity of Hcrtr2, we performed a different

443 analysis. We used the SWISS-MODEL protein structure prediction tool56 to identify potential

444 differences in the protein structure between surface and Molino Hcrtr2. We modeled the surface

445 and Molino Hcrtr2 protein using crystal data from the human HCRTR2 (5wqc;57). We then used

446 the VMD visualization software to overlay the surface fish and Molino predicted structure. This

447 analysis indicates that the deletion of S140 and V141 disrupts the structural integrity of an alpha

448 helical structure in the transmembrane region of Hcrtr2 that could potentially affect the stability of

449 this receptor (Fig. 6c). We also tested the Tinaja and Pachón hcrtr2 sequences using a similar

450 approach and found only minor differences between the respective cavefish Hcrtr2 structure when

451 overlaid with the surface fish structure (Fig. 6c). To confirm these potential structural changes in

452 cavefish Hcrtr2 advanced in-situ and in-vivo protein analysis need to be performed in future

453 studies, however, the identification of coding mutations in three different populations of cavefish

454 supports the notion that hypocretin signaling is under selection at the level of receptor. This is in

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

455 line with population sequencing data32, which identified hcrtr2 in the top 5% of FST outliers

456 between surface fish (Rascon) and Pachón cavefish.

457

458 459 Figure 6. Hcrtr2 mutations in cavefish. (a) position of mutations in hcrtr2 in the different cavefish 460 populations (b) Sequence alignment showing that the mutations in Molino and Tinaja are affecting 461 evolutionary conserved amino acids. (c) 3D model based on structure of human HCRTR2. Structure is 462 displayed as comic structure displaying alpha-helices and beta sheets using VMD 1.9.3. A model of Hcrtr2 463 from surface and cavefish fish was rendered using swiss model (https://swissmodel.expasy.org/) and X-ray 464 crystal data from RCSB : 5wqc. Purple arrows indicate site of mutation and yellow 465 arrows indicate sites where differences in the secondary structure between surface and respective cavefish. 466 In the Molino projection the deletion (S140 and V141) in the Molino sequence causes a break in one of the 467 structures of the transmembrane domain (highlighted in purple box). These images were made 468 with VMD/NAMD/BioCoRE/JMV/other software support. VMD/NAMD/BioCoRE/JMV/ is developed 469 with NIH support by the Theoretical and Computational Biophysics group at the Beckman Institute, 470 University of Illinois at Urbana-Champaign. 471

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

472 Discussion

473 The genomic and phenotypic surface-to-cave transitions in A. mexicanus serve as an indispensable

474 model for the study of natural polygenic trait adaptation. Here, we present a high-quality

475 “chromonome” of the surface form of A. mexicanus. Our surface fish genome far surpasses the

476 contiguity and completeness compared to the assembled cavefish version of A. mexicanus28 owing

477 to the use of updated long-read and mapping technology. This allowed for substantial increases in

478 the detected sequence variation associated with cave phenotypes. A substantial level of

479 heterozygosity within surface fish populations, coupled with our use of two surface populations

480 for de novo assembly, however, prevented more complete sequence connectivity as compared to

481 similarly assembled fish genomes such as Xiphophorus maculatus58, a laboratory lineage of reared

482 fish (Supplementary Table 1). Nevertheless, a 40-fold reduction in assembled contigs (an indirect

483 measure of gaps), a 71-fold improvement in N50 contig length, high-level synteny with prior

484 linkage map measures of A. mexicanus chromosome order, and the addition of 1,665 protein-

485 coding genes all offer researchers a resource for more complete genetic experimentation. In this

486 study, we explored cavefish genomes of diverse origins relative to this new genomic proxy for the

487 ancestral (surface) state. This yielded numerous discoveries when determining evolutionary-

488 derived sequence features that may be driving cave phenotypes.

489 We have now refined our knowledge of the genetic basis of troglomorphic traits by identifying

490 candidate genes within new and previously identified QTL. Trait-associated sequence markers,

491 when placed on more contiguous chromosomes of the surface fish genome, delineate sequence

492 structure, especially gene regulatory elements. As proof of this added discovery potential, we

493 identified new candidate genes associated with eye loss relying on a single-QTL model. Previous

494 studies using multi-QTL models have identified eight QTL associated with eye size in Pachón

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

495 showing that eye degeneration likely involves a number of genes19. Part of the single eye size QTL

496 we mapped in one of our Pachón/surface populations had been previously identified, but our new

497 analysis shifted the LOD peak to just outside the previously identified region, revealing a new

498 candidate gene for eye loss, opo. Genetic mapping in a second surface/Pachón hybrid population

499 revealed a new QTL for eye size. A gene within this region, dusp26 has lower expression in Pachón

500 cavefish during critical times of eye development. New studies aimed at identifying regulatory

501 changes linked to these candidates will further clarify the role of these genes in the evolution of

502 eye size reduction.

503 Structural variation such as genomic deletions represent a significant source of standing variation

504 for trait adaptation59 which we were unable to accurately measure previously due to numerous

505 assembly gaps in the Astyanax mexicanus 1.0.2 genome assembly. A species’ use of structural

506 variation, including deletions, insertions, inversions and other more complex events can enable

507 trait evolution, but their occurrence and importance in phenotypic adaptation among wild

508 populations is understudied. A small number of studies in teleost fish show the standing pool of

509 structural variations (SVs) outnumbers single nucleotide variations (SNVs) and are likely in use

510 to alter the genotype to phenotype continuum of natural selection60,61. For example, rampant copy

511 number variation occurs during population differentiation in stickleback59. We conducted the first

512 study of cavefish SVs, albeit only identifying deletions. We identified SVs possibly contributing

513 to observed troglomorphic phenotypes, discovered previously unidentified population-unique

514 deletions, and broadly classified their putative impact through gene function inference to zebrafish.

515 Of the deletion altered protein-coding genes where biological inference is possible, we find low

516 and high frequency deletions in per3 and ephx2, respectively (Fig. 5b, c). Some of these gene

517 disruptions fit expected cave phenotypes while others are of unknown significance (e.g. sms a gene

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

518 involved in beta-alanine metabolism). per3 plays a role in vertebrate circadian regulation and

519 corticogenesis62, and sleep and circadian patterns are significantly disrupted in cavefish9. In all

520 sampled Molino fish, per3 harbors a heterozygous deletion (Fig. 5b, c).

521 In addition, we identify multiple evolved coding changes within hcrtr2, a receptor that

522 regulates sleep across vertebrate species63. This study uncovered unique changes in cave

523 populations at the hcrtr2 locus. Future genome-wide analysis of G protein-coupled receptor

524 (GPCR) differences between surface fish and cavefish populations, combined with functional

525 studies using allele swapping between cavefish and surface fish, will further elucidate the

526 relationship between GPCR function and trait evolution.

527 Our discovery of novel cave specific features of several genes and many others not

528 explored here highlight novel insights that were obscured when using Astyanax mexicanus 1.0.2

529 genome assembly as a reference. The first cave population-wide catalog of thousands of deletions

530 that could potentially impact the cave phenotypes collectively awaits investigation and validation.

531 Future assembly improvements to both the cave and surface forms of the A. mexicanus genome

532 are expected and will be key in understanding population-genetic processes that enabled cave-

533 colonization. The shared gene synteny of A. mexicanus with zebrafish, gene editing success

534 demonstrated in this study, and their interbreeding capabilities promise to reveal not only genic

535 contributions, but likely regulatory regions associated with the molecular origins of troglomorphic

536 trait differences.

537 538 539 Methods section

540 Genome sequencing and assembly curation

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

541 DNA sequencing. High molecular weight DNA was isolated from a single female surface fish

542 using the MagAttract kit (Qiagen) according to the manufacturer’s protocol. The sample used for

543 the assembly was a female fish (Asty152) from a lab reared cross of wild-caught Astyanax

544 mexicanus surface populations from the Río Sabinas and the Río Valles surface localities

545 (Supplementary Fig. 1a). Single molecule real-time (SMRT) sequencing was completed on a

546 PacBio RSII instrument, yielding an average read length of ~12 kb. SMRT sequence coverage of

547 >50-fold was generated based on an estimated genome size of 1.3 Gb. All SMRT sequences are

548 available under NCBI BioProject number PRJNA533584.

549

550 Assembly and error correction. For de novo assembly of all SMRT sequences, we used a fuzzy de

551 Bruijn graph algorithm, wtdbg29 for contig graph construction, followed by collective raw read

552 alignment for assembly base error correction. All error-prone reads were first used to generate an

553 assembly graph with genomic k-mers unique to each read that results in primary contigs.

554 Assembled primary contigs were corrected for random base error, predominately indels, using all

555 raw reads mapped with minimap264. To further reduce consensus assembly base error, we

556 corrected homozygous insertion, deletion and single base differences using default parameter

557 settings in Pilon65 with ~60x coverage of an Illumina PCR-free library (150bp read length) derived

558 from Asty152 DNA.

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

559 Assembly scaffolding. To scaffold de novo assembled contigs, we generated a BioNano Irys

560 restriction map of another surface fish (Asty168) that allowed sequence contigs to be ordered and

561 oriented, and potential mis-assemblies to be identified. Asty168 is the offspring of two surface

562 fish, one from the Río Valles (Asty02) and the other from the Río Sabinas locality (Asty04), both

563 Asty02 and Asty04 were the offspring of wild-caught fish (Supplementary Fig. 1b). We prepared

564 HMW-DNA in agar plugs using a previously established protocol for soft tissues66. Briefly, we

565 followed a series of enzymatic reactions that (1) lysed cells, (2) degraded protein and RNA, and

566 (3) added fluorescent labels to nicked sites using the IrysPrep Reagent Kit. The nicked DNA

567 fragments were labeled with Alexa Fluor 546 dye, and the DNA molecules were counter-stained

568 with YOYO-1 dye. The labeled DNA fragments were electrophoretically elongated and sized on

569 a single IrysChip, and subsequent imaging and data processing determined the size of each DNA

570 fragment. Finally, a BioNano proprietary algorithm performed a de novo assembly of all labeled

571 fragments >150 kbp into a whole-genome optical map with defined overlap patterns. The

572 individual map was clustered and scored for pairwise similarity, and Euclidian distance matrices

573 were built. Manual refinements were then performed as previously described66.

574

575 Chromosome builds. Upon chimeric contig correction and completion of scaffold assembly steps

576 using the BioNano map, we used Chromonomer67 to align all possible scaffolds to the A.

577 mexicanus high-density linkage map30, then assigned chromosome coordinates. Using default

578 parameter settings, Chromonomer attempts to find the best set of non-conflicting markers that

579 maximizes the number of scaffolds in the map, while minimizing ordering discrepancies. The

580 output is a FASTA file format describing the location of scaffolds by chromosome: a

581 “chromonome”.

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

582

583 Defining syntenic regions between cave and surface genomes. A total of 2,235 genotyping-by-

584 sequencing (GBS) markers30 were mapped to both the cave (Astyanax mexicanus 1.0.2) and

585 surface fish (Astyanax mexicanus 2.0) assemblies (Supplementary Fig. 2). These GBS markers

586 were mapped to Astyanax mexicanus 1.0.2 using the Ensembl ‘BLAST/BLAT search’ web tool

587 and resulting information from each individual query was transcribed into an Excel worksheet. To

588 map to the surface fish genome, the NCBI ‘Magic-BLAST’ (version1.3.0) command line mapping

589 tool68 was used. The resulting output of the mapping was a single, tabular formatted spreadsheet,

590 which was used to visualize syntenic regions from the constructed linkage map30. We used Circos

591 software to visualize the positions of all markers that mapped to the Astyanax mexicanus 2.0

592 genome69. Any chromonome errors detected through these synteny alignments were investigated

593 and manually corrected if orthologous data were available, such as Astyanax mexicanus 1.0.2

594 scaffold alignment.

595

596 Gene Annotation. The Astyanax mexicanus 2.0 assembly was annotated using the previously

597 described NCBI70 and Ensembl35 pipelines, including masking of repeats prior to ab initio gene

598 predictions, and RNAseq evidence-supported gene model building. NCBI and Ensembl gene

599 annotation relied on an extensive variety of publicly RNA-seq data from both cave and surface

600 fish tissues to improve gene model accuracy. The Astyanax mexicanus 2.0 RefSeq or Ensembl

601 release 98 gene annotation reports each provide a full accounting of all methodology deployed and

602 their output metrics within each respective browser.

603

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

604 Assaying genome quality using population genomic samples. To understand the impact reference

605 sequence bias and quality may have on downstream population genomic analyses for the cavefish

606 and surface fish genomes, we utilized the population genomic re-sequenced individuals processed

607 in Herman et al.32. In brief, 100bp sequences were aligned to the reference genomes Astyanax

608 mexicanus 1.0.2 and Astyanax mexicanus 2.0. The NCBI annotation pipeline of both assemblies

609 included WindowMasker and RepeatMasker steps to delineate and exclude repetitive regions from

610 gene model annotation. The positional coordinates for repeats identified by RepeatMasker are

611 provided in the BED format at NCBI for each genome. WindowMasker’s ‘nmer’ files (counts)

612 were used to re-generate repetitive region BED coordinates31. The BED coordinates for both

613 maskers were intersected with BEDTools v2.27.171. We used SAMTools 1.9 with these

614 coordinates and alignment quality scores to filter alignments and generate summary statistics for

615 each sample aligned to the cave and surface reference genomes 72. Summary plots were generated

616 in R 3.6.

617

618 Genetic mapping in surface x Pachón crosses. To map albinism and eye size to the new surface

619 genome and test robustness in this methodology across laboratories, two independent F2 mapping

620 populations consisting of surface/Pachón hybrids were analyzed. First, we scored 188

621 surface/Pachón F2 hybrids for albinism and normalized eye perimeter that were used in a

622 previously published genetic mapping study10. Phenotypes were assessed using macroscopic

623 images of entire fish and measurements were obtained using ImageJ73. Normalized eye perimeter

624 was determined by dividing eye perimeter by body length. Albinism was scored as absence of

625 body and eye pigment.

626

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

627 The 25 linkage groups (LGs) constructed de novo in earlier studies of this population10 were

628 scanned using the R (v.3.5.3) package R/qtl (v.1.44-9)74 using the scanone function for markers

629 linked with albinism (binary model) or left eye perimeter relative to body length (normal model).

630 The genome-wide LOD significance threshold was set at the 95th percentile of 1,000 permutations.

631 All marker sequences were aligned to both the Astyanax mexicanus 1.0.2 and Astyanax mexicanus

632 2.0 references using Bowtie (v.2.2.6 in sensitive mode)75. Circos plots were generated using

633 v.0.69-669.

634 The second F2 mapping population (n=219) consisted of three clutches produced from breeding

76 635 paired F1 surface/Pachón hybrid siblings . Albinism and eye size were assessed using

636 macroscopic images of entire fish. Eye diameter and fish length were measured in ImageJ73

637 according to Hinaux et al.77. Fish lacking an eye (21/195 on the left side, 22/172 on the right side)

638 were not included in the analysis of eye size in order to analyze eye size using a normal distribution

639 model. We found that fish length was positively correlated with eye diameter. To eliminate the

640 effect of fish length on potential QTL, we analyzed eye diameter normalized to standard length.

641 We used R/qtl74 to scan the linkage groups (scanone function) for markers linked with albinism

642 (binary model) or eye diameter relative to body length (normal model) and assessed statistical

643 significance of the LOD scores by calculating the 95th percentile of genome-wide maximum

644 penalized LOD score using 1000 random permutations. We estimated confidence intervals for the

645 QTL using 1.5-LOD support interval (Iodint function).

646

647 Complementation analysis in albino surface-Pachón F2 fish. An albino surface-Pachón F2 fish was

648 crossed to oca2D4bp/+ surface fish27. Five day-post-fertilization larval offspring from this cross were

649 scored for pigmentation (presence or absence of melanin pigmentation) by routine observation.

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

650 Larvae were imaged under a dissecting microscope and the number of pigmented and albino

651 progeny were assessed. Following this, DNA was extracted from 8 pigmented and 8 albino

652 progeny, and these fish were then genotyped for the engineered 4 bp deletion by PCR followed by

653 gel electrophoresis, using locus specific primers (forward primer: 5’-

654 CCCAAAGCAGAGTGTTTGGTA-3’, reverse primer: 5’-

655 TTTCCAAAGATCACATATCTTGACA-3’) and methods79,80. Briefly, larval fish were

656 euthanized in MS-222, fish were imaged, then DNA was extracted from whole fish, PCR was

657 performed followed by gel electrophoresis. All genotyped albino embryos had two bands

658 (indicating the presence of the engineered deletion), whereas all genotyped pigmented embryos

659 had a single band, indicating inheritance of the wild-type allele from the surface fish parent.

660

661 Genetic mapping of previously mapped QTL studies to the surface fish genome. To identify any

662 candidate genes potentially associated with cave phenotypes that were not evident in the more

663 fragmented Astyanax mexicanus 1.0.2 genome, we generated a QTL database for the Astyanax

664 mexicanus 2.0 genome to identify genomic regions containing groups of markers associated with

665 cave-derived phenotypes. We followed the methods Herman et al.32 used to create a similar

666 database for the cavefish genome. BLAST was used to identify locations in the surface fish genome

667 for 1,156 markers from several previous QTL studies18,19,21-23,81. The top BLAST hit for each

668 marker was identified by ranking e-value, followed by bitscore, and then alignment length.

669 Previously, 687 of these markers were mapped to the cavefish genome. BEDTools intersect and

670 the surface fish genome annotation were used to identify all genes within the regions of interest.

671 Genomic intervals associated with activity QTL previously identified using the existing high-

672 density linkage map were similarly re-examined using the established locations of the 2,235 GBS

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

673 markers within the Astyanax mexicanus 2.0 genome25. We then investigated whether genes

674 identified near QTL had ontologies associated with cave phenotypes using the NCBI

675 (https://www.ncbi.nlm.nih.gov/) or Ensembl genome browser (https://www.ensembl.org).

676

677 Generation of rx3 CRISPant fish. CRISPant surface fish were generated using CRISPR/Cas9. A

678 gRNA targeting exon 2 was designed and generated as described previously82. Briefly, oligo A

679 containing the gRNA sequence (5’-GTGTAGCTGAAACGTGGTGA-3’) between the sequence

680 for the T7 promoter and a sequence overlapping with the second oligo, oligo B, was synthesized

681 (IDT) (Supplementary Table 7). Following annealing with Oligo B and amplification, the T7

682 Megascript Kit (Ambion) was used to transcribe the gRNA with several modifications, as in27,80.

683 The gRNA was cleaned up using the miRNeasy mini kit (Qiagen) following manufacturer’s

684 instructions and eluted into RNase-free water. Nis-Cas9-nis83 mRNA was transcribed using the

685 mMessage mMACHINE T3 kit (Life Technologies) following manufacturers directions. Single

686 cell embryos were injected with 25 pg of gRNA and 150 pg Cas9 mRNA (as in79,80). Injected fish

687 were screened to assess for mutagenesis by PCR using primers surrounding the gRNA target site

688 (F primer: 5’-AGCCCGGACCGTAAGAAG-3’, R primer: 5’-GCTGTAAACGTCGGGGTAGT-

689 3’). Genotyping was performed with DNA extracted from whole larval fish from the clutch, as

690 described previously84. Gel electrophoresis was used to discriminate between alleles with indels

691 (more than one band results in a smeary band) and wild-type alleles (single band: as in79). Injected

692 fish (CRISPants) along with uninjected wild-type siblings were raised to adulthood. Eye size was

693 assessed in both rx3 gRNA/Cas9 mRNA injected fish (CRISPants) and wild-type sibling adult fish

694 (n=5 each). Fish were anesthetized in MS-222 and imaged under a dissecting microscope.

695

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

696 Structural variation detection. We first aligned Illumina reads from A. mexicanus population

697 resequencing data described by Herman et. al.32 to the Astyanax mexicanus 2.0 reference using

698 bwa mem v0.7.1785 with default options. We then converted the output to bam format, fixed mate

699 pair information, sorted and removed duplicates using the SAMTools v1.9 view -bh, fixmate -m,

700 sort, and markdup -r modules, respectively72. We ran two software packages for calling structural

701 variants (SV) using these alignments: manta47 and lumpy48. We ran manta v1.6.0 with default

702 options as suggested in the package’s manual, and lumpy using the Smoove v0.2.3 pipeline

703 (https://github.com/brentp/smoove) using the commands given in the “Population calling” section.

704 Nextflow86 workflows for our running of both SV callers can be found at

705 https://github.com/esrice/workflows. Due to the low sequence coverage (average ~9x) available

706 per sample, we confined analysis to deletions called by both SV callers and within the size range

707 500bp to 100kb. We consider a deletion to be present in both sets of calls if there was a reciprocal

708 overlap of 50% of the length of the deletion. We used the scripts merge_deletions.py to find the

709 intersection of the two sets of deletions, annotate_vcf.py to group the deletions by their effect (i.e.

710 deletion of coding, intronic, regulatory, or intergenic sequence), and

711 count_variants_per_sample.py to count the numbers of variants called in each sample; all scripts

712 can be found at https://github.com/esrice/misc-tools. All protein-coding genes with detected

713 deletions among cavefish populations as defined in this study were used as input to test for

714 significant enrichment among specific databases within WebGestalt 49. Gene IDs were input

715 as gene symbols, with organism of interest set to zebrafish using protein-coding genes as the

716 reference set. Pathway common databases of KEGG and Panther canonical gene signaling

717 pathways as well as the various genes associated with diseases curated in OMIM and Digenet87

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

718 were reported using a hypergeometric test, and the significance level was set at 0.05. We

719 implemented the Benjamini and Hochberg multiple test adjustment88 to control for false discovery.

720 721 722 References: 723 1 Culver, D. & Pipan, T. The Biology of Caves and Other Subterranean Habitats. Second 724 edn, (Oxford University Press, 2019). 725 2 Protas, M. & Jeffery, W. R. Evolution and development in cave animals: from fish to 726 crustaceans. Wiley Interdiscip Rev Dev Biol 1, 823-845, doi:10.1002/wdev.61 (2012). 727 3 Emerling, C. A. & Springer, M. S. Eyes underground: regression of visual protein networks 728 in subterranean . Mol Phylogenet Evol 78, 260-270, 729 doi:10.1016/j.ympev.2014.05.016 (2014). 730 4 Alex Keene, M. Y., Suzanne McGaugh. Biology and Evolution of the Mexican Cavefish. 731 (Academic Press, 2015). 732 5 Gross, J. B. The complex origin of Astyanax cavefish. BMC Evol Biol 12, 105, 733 doi:10.1186/1471-2148-12-105 (2012). 734 6 Maldonado, E., Rangel-Huerta, E., Rodriguez-Salazar, E., Pereida-Jaramillo, E. & 735 Martinez-Torres, A. Subterranean life: Behavior, metabolic, and some other adaptations of 736 Astyanax cavefish. J Exp Zool B Mol Dev Evol, doi:10.1002/jez.b.22948 (2020). 737 7 Krishnan, J. & Rohner, N. Cavefish and the basis for eye loss. Philos Trans R Soc Lond B 738 Biol Sci 372, doi:10.1098/rstb.2015.0487 (2017). 739 8 Aspiras, A. C., Rohner, N., Martineau, B., Borowsky, R. L. & Tabin, C. J. Melanocortin 4 740 receptor mutations contribute to the adaptation of cavefish to nutrient-poor conditions. 741 Proc Natl Acad Sci U S A 112, 9668-9673, doi:10.1073/pnas.1510802112 (2015). 742 9 Jaggard, J. B. et al. Hypocretin underlies the evolution of sleep loss in the Mexican 743 cavefish. Elife 7, doi:10.7554/eLife.32637 (2018). 744 10 Stockdale, W. T. et al. Heart Regeneration in the Mexican Cavefish. Cell Rep 25, 1997- 745 2007 e1997, doi:10.1016/j.celrep.2018.10.072 (2018). 746 11 Riddle, M. R. et al. Insulin resistance in cavefish as an adaptation to a nutrient-limited 747 environment. Nature 555, 647-651, doi:10.1038/nature26136 (2018). 748 12 Yoshizawa, M. et al. The evolution of a series of behavioral traits is associated with autism- 749 risk genes in cavefish. BMC Evol Biol 18, 89, doi:10.1186/s12862-018-1199-9 (2018). 750 13 Elipot, Y., Hinaux, H., Callebert, J. & Retaux, S. Evolutionary shift from fighting to 751 foraging in blind cavefish through changes in the serotonin network. Curr Biol 23, 1-10, 752 doi:10.1016/j.cub.2012.10.044 (2013). 753 14 Gross, J. B. & Powers, A. K. A Natural Animal Model System of Craniofacial Anomalies: 754 The Blind Mexican Cavefish. Anat Rec (Hoboken) 303, 24-29, doi:10.1002/ar.23998 755 (2020). 756 15 Carlson, B. M. & Gross, J. B. Characterization and comparison of activity profiles 757 exhibited by the cave and surface morphotypes of the blind Mexican tetra, Astyanax 758 mexicanus. Comp Biochem Physiol C Toxicol Pharmacol 208, 114-129, 759 doi:10.1016/j.cbpc.2017.08.002 (2018).

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

760 16 Xiong, S., Krishnan, J., Peuss, R. & Rohner, N. Early adipogenesis contributes to excess 761 fat accumulation in cave populations of Astyanax mexicanus. Dev Biol 441, 297-304, 762 doi:10.1016/j.ydbio.2018.06.003 (2018). 763 17 Rohner, N. Cavefish as an evolutionary mutant model system for human disease. Dev Biol 764 441, 355-357, doi:10.1016/j.ydbio.2018.04.013 (2018). 765 18 Protas, M. et al. Multi-trait evolution in a cave fish, Astyanax mexicanus. Evol Dev 10, 766 196-209, doi:10.1111/j.1525-142X.2008.00227.x (2008). 767 19 Protas, M., Conrad, M., Gross, J. B., Tabin, C. & Borowsky, R. Regressive evolution in 768 the Mexican cave tetra, Astyanax mexicanus. Curr Biol 17, 452-454, 769 doi:10.1016/j.cub.2007.01.051 (2007). 770 20 Protas, M. E. et al. Genetic analysis of cavefish reveals molecular convergence in the 771 evolution of albinism. Nat Genet 38, 107-111, doi:10.1038/ng1700 (2006). 772 21 Kowalko, J. E. et al. Convergence in feeding posture occurs through different genetic loci 773 in independently evolved cave populations of Astyanax mexicanus. Proc Natl Acad Sci U 774 S A 110, 16933-16938, doi:10.1073/pnas.1317192110 (2013). 775 22 Kowalko, J. E. et al. Loss of schooling behavior in cavefish through sight-dependent and 776 sight-independent mechanisms. Curr Biol 23, 1874-1883, doi:10.1016/j.cub.2013.07.056 777 (2013). 778 23 Yoshizawa, M., Yamamoto, Y., O'Quin, K. E. & Jeffery, W. R. Evolution of an adaptive 779 behavior and its sensory receptors promotes eye regression in blind cavefish. BMC Biol 10, 780 108, doi:10.1186/1741-7007-10-108 (2012). 781 24 O'Quin, K. E., Yoshizawa, M., Doshi, P. & Jeffery, W. R. Quantitative genetic analysis of 782 retinal degeneration in the blind cavefish Astyanax mexicanus. PLoS One 8, e57281, 783 doi:10.1371/journal.pone.0057281 (2013). 784 25 Carlson, B. M., Klingler, I. B., Meyer, B. J. & Gross, J. B. Genetic analysis reveals 785 candidate genes for activity QTL in the blind Mexican tetra, Astyanax mexicanus. PeerJ 786 6, e5189, doi:10.7717/peerj.5189 (2018). 787 26 Stahl, B. A. et al. Stable transgenesis in Astyanax mexicanus using the Tol2 transposase 788 system. Dev Dyn, doi:10.1002/dvdy.32 (2019). 789 27 Klaassen, H., Wang, Y., Adamski, K., Rohner, N. & Kowalko, J. E. CRISPR mutagenesis 790 confirms the role of oca2 in melanin pigmentation in Astyanax mexicanus. Dev Biol 441, 791 313-318, doi:10.1016/j.ydbio.2018.03.014 (2018). 792 28 McGaugh, S. E. et al. The cavefish genome reveals candidate genes for eye loss. Nat 793 Commun 5, 5307, doi:10.1038/ncomms6307 (2014). 794 29 Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat Methods 17, 155- 795 158, doi:10.1038/s41592-019-0669-3 (2020). 796 30 Carlson, B. M., Onusko, S. W. & Gross, J. B. A high-density linkage map for Astyanax 797 mexicanus using genotyping-by-sequencing technology. G3 (Bethesda) 5, 241-251, 798 doi:10.1534/g3.114.015438 (2014). 799 31 Morgulis, A., Gertz, E. M., Schaffer, A. A. & Agarwala, R. WindowMasker: window- 800 based masker for sequenced genomes. Bioinformatics 22, 134-141, 801 doi:10.1093/bioinformatics/bti774 (2006). 802 32 Herman, A. et al. The role of gene flow in rapid and repeated evolution of cave-related 803 traits in Mexican tetra, Astyanax mexicanus. Mol Ecol 27, 4397-4416, 804 doi:10.1111/mec.14877 (2018).

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

805 33 Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat 806 Biotechnol, doi:10.1038/nbt.4277 (2018). 807 34 Sayers, E. W. et al. Database resources of the National Center for Biotechnology 808 Information. Nucleic Acids Res, doi:10.1093/nar/gkz899 (2019). 809 35 Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res, doi:10.1093/nar/gkz966 (2019). 810 36 O’Quin K, M. S. The genetic bases of troglomorphy in Astyanax: How far we have come 811 and where do we go from here? In: A. Keene, M. Yoshizawa, S.E. McGaugh, ed. Biology 812 and Evolution of the Mexican Cavefish: Elsevier (2015). 813 37 Casane, D. & Retaux, S. Evolutionary Genetics of the Cavefish Astyanax mexicanus. Adv 814 Genet 95, 117-159, doi:10.1016/bs.adgen.2016.03.001 (2016). 815 38 Cavodeassi, F., Ivanovitch, K. & Wilson, S. W. Eph/Ephrin signalling maintains eye field 816 segregation from adjacent neural plate territories during forebrain morphogenesis. 817 Development 140, 4193-4202, doi:10.1242/dev.097048 (2013). 818 39 Stigloher, C. et al. Segregation of telencephalic and eye-field identities inside the zebrafish 819 forebrain territory is controlled by Rx3. Development 133, 2925-2935, 820 doi:10.1242/dev.02450 (2006). 821 40 Mathers, P. H., Grinberg, A., Mahon, K. A. & Jamrich, M. The Rx homeobox gene is 822 essential for vertebrate eye development. Nature 387, 603-607, doi:10.1038/42475 (1997). 823 41 Loosli, F. et al. Medaka eyeless is the key factor linking retinal determination and eye 824 growth. Development 128, 4035-4044 (2001). 825 42 Loosli, F. et al. Loss of eyes in zebrafish caused by mutation of chokh/rx3. EMBO Rep 4, 826 894-899, doi:10.1038/sj.embor.embor919 (2003). 827 43 Martinez-Morales, J. R. et al. ojoplano-mediated basal constriction is essential for optic 828 cup morphogenesis. Development 136, 2165-2175, doi:10.1242/dev.033563 (2009). 829 44 Borowsky, R. Restoring sight in blind cavefish. Curr Biol 18, R23-24, 830 doi:10.1016/j.cub.2007.11.023 (2008). 831 45 Yang, C. H. et al. NEAP/DUSP26 suppresses receptor tyrosine kinases and regulates 832 neuronal development in zebrafish. Sci Rep 7, 5241, doi:10.1038/s41598-017-05584-7 833 (2017). 834 46 Stahl, B. A. & Gross, J. B. A Comparative Transcriptomic Analysis of Development in 835 Two Astyanax Cavefish Populations. J Exp Zool B Mol Dev Evol 328, 515-532, 836 doi:10.1002/jez.b.22749 (2017). 837 47 Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and 838 cancer sequencing applications. Bioinformatics 32, 1220-1222, 839 doi:10.1093/bioinformatics/btv710 (2016). 840 48 Layer, R. M., Chiang, C., Quinlan, A. R. & Hall, I. M. LUMPY: a probabilistic framework 841 for structural variant discovery. Genome Biol 15, R84, doi:10.1186/gb-2014-15-6-r84 842 (2014). 843 49 Wang, J., Vasaikar, S., Shi, Z., Greer, M. & Zhang, B. WebGestalt 2017: a more 844 comprehensive, powerful, flexible and interactive gene set enrichment analysis toolkit. 845 Nucleic Acids Res 45, W130-W137, doi:10.1093/nar/gkx356 (2017). 846 50 Pinero, J. et al. The DisGeNET knowledge platform for disease genomics: 2019 update. 847 Nucleic Acids Res, doi:10.1093/nar/gkz1021 (2019). 848 51 Prober, D. A., Rihel, J., Onah, A. A., Sung, R. J. & Schier, A. F. Hypocretin/orexin 849 overexpression induces an insomnia-like phenotype in zebrafish. J Neurosci 26, 13400- 850 13410, doi:10.1523/JNEUROSCI.4332-06.2006 (2006).

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

851 52 Yokogawa, T. et al. Characterization of sleep in zebrafish and insomnia in hypocretin 852 receptor mutants. PLoS Biol 5, e277, doi:10.1371/journal.pbio.0050277 (2007). 853 53 Lin, L. et al. The sleep disorder canine narcolepsy is caused by a mutation in the hypocretin 854 (orexin) receptor 2 gene. Cell 98, 365-376, doi:10.1016/s0092-8674(00)81965-0 (1999). 855 54 Chen, C. W., Lin, J. & Chu, Y. W. iStable: off-the-shelf predictor integration for predicting 856 protein stability changes. BMC Bioinformatics 14 Suppl 2, S5, doi:10.1186/1471-2105- 857 14-S2-S5 (2013). 858 55 Cheng, J., Randall, A. & Baldi, P. Prediction of protein stability changes for single-site 859 mutations using support vector machines. Proteins 62, 1125-1132, doi:10.1002/prot.20810 860 (2006). 861 56 Waterhouse, A. et al. SWISS-MODEL: homology modelling of protein structures and 862 complexes. Nucleic Acids Res 46, W296-W303, doi:10.1093/nar/gky427 (2018). 863 57 Suno, R. et al. Crystal Structures of Human Orexin 2 Receptor Bound to the Subtype- 864 Selective Antagonist EMPA. Structure 26, 7-19 e15, doi:10.1016/j.str.2017.11.005 (2018). 865 58 Schartl, M. et al. The genome of the platyfish, Xiphophorus maculatus, provides insights 866 into evolutionary adaptation and several complex traits. Nat Genet 45, 567-572, 867 doi:10.1038/ng.2604 (2013). 868 59 Chain, F. J. et al. Extensive copy-number variation of young genes across stickleback 869 populations. PLoS Genet 10, e1004830, doi:10.1371/journal.pgen.1004830 (2014). 870 60 Catanach, A. et al. The genomic pool of standing structural variation outnumbers single 871 nucleotide polymorphism by threefold in the marine teleost Chrysophrys auratus. Mol Ecol 872 28, 1210-1223, doi:10.1111/mec.15051 (2019). 873 61 Flagel, L. E., Willis, J. H. & Vision, T. J. The standing pool of genomic structural variation 874 in a natural population of Mimulus guttatus. Genome Biol Evol 6, 53-64, 875 doi:10.1093/gbe/evt199 (2014). 876 62 Noda, M. et al. Role of Per3, a circadian clock gene, in embryonic development of mouse 877 . Sci Rep 9, 5874, doi:10.1038/s41598-019-42390-9 (2019). 878 63 Li, S. B. & de Lecea, L. The hypocretin (orexin) system: from a neural circuitry 879 perspective. Neuropharmacology 167, 107993, doi:10.1016/j.neuropharm.2020.107993 880 (2020). 881 64 Li, H. Minimap and miniasm: fast mapping and de novo assembly for noisy long 882 sequences. Bioinformatics 32, 2103-2110, doi:10.1093/bioinformatics/btw152 (2016). 883 65 Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection 884 and genome assembly improvement. PLoS One 9, e112963, 885 doi:10.1371/journal.pone.0112963 (2014). 886 66 Lam, E. T. et al. Genome mapping on nanochannel arrays for structural variation analysis 887 and sequence assembly. Nat Biotechnol 30, 771-776, doi:10.1038/nbt.2303 (2012). 888 67 Catchen, J., Amores, A. & Bassham, S. Chromonomer: a tool set for repairing and 889 enhancing assembled genomes through integration of genetic maps and conserved synteny. 890 bioRxiv 2020.02.04.934711 (2020). 891 68 Boratyn, G. M., Thierry-Mieg, J., Thierry-Mieg, D., Busby, B. & Madden, T. L. Magic- 892 BLAST, an accurate RNA-seq aligner for long and short reads. BMC Bioinformatics 20, 893 405, doi:10.1186/s12859-019-2996-x (2019). 894 69 Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome 895 Res 19, 1639-1645, doi:10.1101/gr.092759.109 (2009).

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

896 70 Pruitt, K. D. et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids 897 Res 42, D756-763, doi:10.1093/nar/gkt1114 (2014). 898 71 Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic 899 features. Bioinformatics 26, 841-842, doi:10.1093/bioinformatics/btq033 (2010). 900 72 Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078- 901 2079, doi:10.1093/bioinformatics/btp352 (2009). 902 73 Schindelin, J. et al. Fiji: an open-source platform for biological-image analysis. Nat 903 Methods 9, 676-682, doi:10.1038/nmeth.2019 (2012). 904 74 Broman, K. W., Wu, H., Sen, S. & Churchill, G. A. R/qtl: QTL mapping in experimental 905 crosses. Bioinformatics 19, 889-890, doi:10.1093/bioinformatics/btg112 (2003). 906 75 Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat Methods 907 9, 357-359, doi:10.1038/nmeth.1923 (2012). 908 76 Riddle, M. R. et al. Genetic architecture underlying changes in carotenoid accumulation 909 during the evolution of the blind Mexican cavefish, Astyanax mexicanus. J Exp Zool B 910 Mol Dev Evol, doi:10.1002/jez.b.22954 (2020). 911 77 Hinaux, H. et al. A developmental staging table for Astyanax mexicanus surface fish and 912 Pachon cavefish. Zebrafish 8, 155-165, doi:10.1089/zeb.2011.0713 (2011). 913 78 Kavalco, K. F. & De Almeida-Toledo, L. F. Molecular cytogenetics of blind mexican tetra 914 and comments on the karyotypic characteristics of genus Astyanax (Teleostei, Characidae). 915 Zebrafish 4, 103-111, doi:10.1089/zeb.2007.0504 (2007). 916 79 Kowalko, J. E., Ma, L. & Jeffery, W. R. Genome Editing in Astyanax mexicanus Using 917 Transcription Activator-like Effector Nucleases (TALENs). J Vis Exp, doi:10.3791/54113 918 (2016). 919 80 Stahl, B. A. et al. Manipulation of Gene Function in Mexican Cavefish. J Vis Exp, 920 doi:10.3791/59093 (2019). 921 81 Yoshizawa, M. et al. Distinct genetic architecture underlies the emergence of sleep loss 922 and prey-seeking behavior in the Mexican cavefish. BMC Biol 13, 15, doi:10.1186/s12915- 923 015-0119-3 (2015). 924 82 Varshney, G. K. et al. High-throughput gene targeting and phenotyping in zebrafish using 925 CRISPR/Cas9. Genome Res 25, 1030-1042, doi:10.1101/gr.186379.114 (2015). 926 83 Jao, L. E., Wente, S. R. & Chen, W. Efficient multiplex biallelic zebrafish genome editing 927 using a CRISPR nuclease system. Proc Natl Acad Sci U S A 110, 13904-13909, 928 doi:10.1073/pnas.1308335110 (2013). 929 84 Ma, L., Jeffery, W. R., Essner, J. J. & Kowalko, J. E. Genome editing using TALENs in 930 blind Mexican Cavefish, Astyanax mexicanus. PLoS One 10, e0119370, 931 doi:10.1371/journal.pone.0119370 (2015). 932 85 Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. 933 arXiv 1303.3997 (2013). 934 86 Di Tommaso, P. et al. Nextflow enables reproducible computational workflows. Nat 935 Biotechnol 35, 316-319, doi:10.1038/nbt.3820 (2017). 936 87 Pinero, J. et al. DisGeNET: a comprehensive platform integrating information on human 937 disease-associated genes and variants. Nucleic Acids Res 45, D833-D839, 938 doi:10.1093/nar/gkw943 (2017). 939 88 Benjamini, Y. Controlling the false discovery rate: A practical and powerful approach to 940 multipe testing. J R Stat Soc Series B Stat Methodol 57, 289-300 (1995). 941

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

942 943 Acknowledgments: We thank Karin Zueckert-Gaudenz and Mihaela Sardiu for technical 944 assistance and the Stowers aquatics group for fish care and help with shipments of fish between 945 NYU and Stowers. AK, SEM, NR are supported by NIH 1R01GM127872-01. This work was also 946 supported by NIH R24OD011198 to WCW, LH, ESR, TL, and MK. NSF EDGE Award 1923372 947 to NR, JEK, SEM, NSF DEB-1754231 to JEK and AK, and NSF IOS-1933428 to JEK, SEM, NR. 948 NR is further supported by institutional funding, funding from the Edward Mallinckrodt 949 Foundation and the JDRF. JBG is supported by NIDCR R01-DE025033 and NSF DEB-1457630. 950 Some computation for this work was performed on the high-performance computing infrastructure 951 provided by Research Computing Support Services and in part by the National Science Foundation 952 under grant number CNS-1429294 at the University of Missouri, Columbia MO. The Minnesota 953 Supercomputing Institute (MSI) at the University of Minnesota provided resources that contributed 954 to the research results reported within this paper. 955 956 Author contributions 957 WCW, NR conceived of the study. AA, TB, RB, BMC, FD, EF, JG, LH, ZH, ACK, AK, JK, CT, 958 MK, MEL, TL, SEM, JTM, MM, RLM, RP, ER, MR, IS, BAS, CT, ST, YY performed and 959 analyzed the experiments. WCW, NR, RB, BCM, JG, ACK, JK, SEM, MM, MR, CT, YY wrote 960 the paper. 961 962 Competing interest statement 963 The authors declare no competing interests 964 965 Data availability statement 966 Original data underlying this manuscript can be accessed from the Stowers Original Data 967 Repository at http://www.stowers.org/research/publications/libpb-1528 or made available on 968 request 969 970 971 972

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

973 974 975 976 977 978 979 980 981 982 983 Supplementary Figures 984

985 986 Figure S1. Breeding scheme to cross two surface fish populations from different geographic locations 987 in Mexico. (a) a female Río Sabinas (Asty04) was crossed to a male Río Valles (Asty02) and the resulting 988 female (Asty 137) was crossed to the male Río Valles (Asty02). The resulting female fish (Asty152) was 989 sequenced and assembled. (b) A female Río Valles (Asty02) was crossed to a male Río Sabinas (Asty04)

40 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

990 and the resulting female was used for the BioNano Irys “restriction” map. c) locations of the rivers (blue) 991 and caves (orange) discussed in the study. Maps from Google Maps. 992 993

994 995 Figure S2. Synteny between a recombination-based linkage map and the Astyanax mexicanus 2.0 996 reference. By mapping the relative positions of GBS markers from a dense map constructed from a Pachón 997 x surface fish F2 pedigree, we observed significant synteny based on 96.6% of our markers (a). Although a 998 substantial portion of the genome remains unplaced (grey), we noted sparse distribution of GBS markers 999 beyond the first 6 Mb of the unplaced scaffold (represented in grey, (b). The majority of these alignments 1000 were present in the first 1 Mb of the unplaced scaffold (grey). This syntenic representation indicates 1001 substantial structural similarity between the cavefish and surface genomes and demonstrates the utility of 1002 the draft surface fish genome for identifying QTL intervals and nominating candidate genes associated 1003 therein. 1004 1005

41 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1006 1007 Figure S3. Percentage of properly paired reads for each sample aligned to the surface (y-axis) and 1008 cave (x-axis) reference genomes. Samples are colored by population identity. Line is best fit of all 1009 samples. Properly paired reads include reads that align in the correct orientation to the same 1010 scaffold/chromosome in the assembly. The greater proportion of properly pairing reads in the alignment to 1011 the surface genome likely reflects the greater level of contiguity of the surface assembly, e.g, N50 contig 1012 length. 1013

42 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1014 1015 Figure S4. Non-primary read alignment counts for samples aligned to the surface (y-axis) and cave 1016 (x-axis) reference genomes. The reference alignment for each sample is indicated by shape (cave 1017 reference: square, surface reference: triangle). Colored lines are fit to samples aligned to each respective 1018 genome (cave: blue, surface: yellow). Each sample was aligned to both reference genomes and are united 1019 by color indicating population membership. Inset boxplots are quartiles, median, minimum/maximum 1020 whiskers and results of a paired t-test of the number of non-primary alignments for each sample aligned to 1021 each genome. Non-primary alignments include reads that have multiple mapping positions and reads with 1022 chimeric alignments. Alignment to the cave results in significantly more non-primary alignments with 1023 greater variation among the aligned samples. 1024

43 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1025 1026 1027 Figure S5. Surface fish genome allows for long and continuous intervals in the QTL analysis. Circos plot for 1028 albino QTL (a) and eye size QTL (b) showing the markers with significant LOD scores on both the cavefish (blue 1029 contigs) and surface fish genome (red/black contigs). 1030

44 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1031 1032 1033 Figure S6. Alignment of oca2 exon 24 sequencing data from wild samples utilizing the Astyanax 1034 mexicanus 2.0 as a reference. Portion of oca2 locus on chromosome 13 of the annotated surface fish 1035 genome (top) aligned to sequencing data from wild-caugh surface fish (Río Choy, Rascón) and cavefish 1036 (Molino, Pachón, Tinaja). Exon 24 is highlighted and gaps in horizontal lines indicated sequence deletion. 1037 Alignments made using CLC sequence viewer. 1038 1039

45 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1040 1041 1042 Figure S7. Alignment of oca2 exon 21 sequencing data from wild samples utilizing the 1043 Astyanax mexicanus 2.0 as a reference. Portion of oca2 locus on chromosome 13 of the 1044 annotated surface fish genome (top) aligned to sequencing data from wild-caugh surface fish (Río 1045 Choy, Rascón) and cavefish (Molino, Pachón, Tinaja). Exon 24 is highlighted and gaps in 1046 horizontal lines indicated sequence deletion. Alignments made using CLC sequence viewer. 1047

46 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1048 1049 1050 Figure S8. Pachón cavefish have reduced expression of dusp26. Normalized expression count of 1051 dusp26 at the indicated stage of development. Significance codes; ns p>0.05, *p<0.05, **p<0.005. 1052 Transcriptomics data from 46

47 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1053

1054 Figure S9. Comparative counts of deletions called per sample and grouped by the samples’ 1055 populations according to the SV caller. These deletions are parsed by homozygous or heterozygous 1056 state, or total number of haplotypes affected (rows), and by deletions detected made by lumpy, manta, or 1057 the intersection of the two sets (columns).

48 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1058

1059 1060 Figure S10. Comparison of number of deletions called per sample for lumpy to manta SV detection 1061 algorithms. A high correlation between the results of the two callers is observed. 1062

49 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

a Coding Intron Regulatory

b

c

1063

1064 Figure S11. Counts of deletions called per sample, grouped by the samples’ population source. (a) 1065 Homozygous alternate, (b) heterozygous, and (c) total alternate deletions compared to the Astyanax 1066 mexicanus 2.0 reference. Values on the y axis represent total deletion counts. Across all allele types

50 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1067 deletions affecting coding, intronic, or regulatory sequence are labelled accordingly. Putative regulatory 1068 sequence was arbitrarily defined as 1 kb upstream of the start and stop codon. 1069

1070 Table S1. Representative assembly metrics for sequenced teleost genomes1. N50 N50 Total Total % contig scaffold assembly contigs Repeats2 Species Assembled version (Mb) (Mb) size (Gb)

Astyanax Astyanax 1.76 35 1.29 3,030 41 mexicanus mexicanus 2.0 (surface) Astyanax Astyanax 0.014 1.77 1.19 121,345 30 mexicanus mexicanus 1.0.2 (cave) Xiphophorus X maculatus 5.0 9.1 31 0.7 258 27 maculatus male Oryzias latipes ASM223467v1 2.5 31 .73 516 34 Danio rerio GRCz11 1.4 7.3 1.36 19,725 48

1071 1All species-specific assembly metrics derived from the NCBI assembly archive. 1072 2Total repeats estimated by WindowMasker

51 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1073 Table S2. Representative gene annotation measures for assembled teleost genomes1. 1074 Protein Total mRNAs coding ncRNA Species name Assembled version genes

Astyanax Astyanax mexicanus 2.0 25,293; 5,314 42,649 mexicanus 26,6982 Astyanax Astyanax mexicanus 1.0.2 23,628 1,062 33,353 mexicanus Xiphophorus Xiphosphorus 5.0 male 23,238 3,620 43,551 maculatus

Oryzias latipes ASM223467v1 22,071 4,481 44,753 Danio rerio GRCz11 26,522 13,137 52,818

1075 1All species-specific gene annotation metrics derived from the NCBI RefSeq comparisons. 1076 2Gene annotation metrics from Ensembl 98. 1077 1078 1079 Table S3. A summary of gene representation using highly conserved gene orthologs from multiple 1080 species. 1081 Gene metric % Complete 94.6 Complete and single copy 90.4 Complete and duplicated 4.2 Fragmented 1.4 Missing 4.0 1082 BUSCO was run in genome mode. A total of 4,584 conserved mammalian orthologs were used. 1083

1084 Table S4. Candidate genes in surface fish genome intervals underlying previously reported 1085 activity QTL in Astyanax mexicanus1,2. Candidate genes previously identified based on the 1086 Pachón cave genome (Astyanax mexicanus 1.0.2) are indicated with asterisks. 1087 Ensembl Gene ID Gene Name Chromosomal Location ENSAMXG00000021023 smarca4a 2:11818816-11837849:-1

ENSAMXG00000008438 coro1a 2:14117859-14135882:-1 ENSAMXG00000034332 arl6ip1 2:14385087-14398765:-1 ENSAMXG00000009999 unc119.2 2:14893606-14905509:-1 ENSAMXG00000011025 kcnc3a 2:15763673-15882326:-1 ENSAMXG00000002938 alcamb 2:19282937-19312685:-1

52 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ENSAMXG00000003038 lim2.3 2:19536533-19540803:-1 ENSAMXG00000009792 cryaba 2:20946260-20947554:1 ENSAMXG00000032727 tbx2b 2:24487301-24496709:1 ENSAMXG00000010190 mfrp 2:25661299-25673051:1 ENSAMXG00000013648 flr 2:27598963-27629303:-1 ENSAMXG00000004873 rpgrip1 3:8845958-8869048:1 ENSAMXG00000029409 * 3:9109394-9133706:1 ENSAMXG00000026077 novel gene 3:12526447-12529953:-1 ENSAMXG00000029859 novel gene 3:12998126-13002944:1 ENSAMXG00000001701 novel gene 3:13006893-13010808:1 ENSAMXG00000001577 3:13160896-13167893:-1 ENSAMXG00000006915 npas2 3:14232532-14258948:1 ENSAMXG00000038165 tbx5b 3:14702695-14713758:1 ENSAMXG00000016913 tmtopsb* 3:17563872-17631603:-1 ENSAMXG00000014033 chodl 3:21723943-21733151:-1 ENSAMXG00000014704 boc 3:22384147-22424523:1 ENSAMXG00000014873 klhl40b 3:23009101-23016128:-1 ENSAMXG00000024536 nek2 3:31499309-31505232:-1 ENSAMXG00000024534 ints7 3:31581594-31598913:-1 ENSAMXG00000024533 dtl 3:31598791-31609823:1 ENSAMXG00000034080 slc35b2 3:31767337-31782355:1 ENSAMXG00000018645 extl3 3:32853820-32889965:1 ENSAMXG00000037688 insm1a 3:33509031-33510524:1 ENSAMXG00000043510 zc4h2 3:35812278-35820217:-1 ENSAMXG00000041759 efnb1 3:37303370-37355591:-1 ENSAMXG00000036821 rhogc 3:37574414-37575393:-1 ENSAMXG00000034538 marcksa* 3:38448073-38453896:-1 ENSAMXG00000014261 lama4 3:38535994-38575406:1 ENSAMXG00000004758 id2b 3:41735650-41737485:-1 ENSAMXG00000001703 sox11b 3:42060037-42061293:-1 ENSAMXG00000002111 nhsl1b 3:42555719-42718544:1 ENSAMXG00000009797 mcm3* 3:44339476-44347266:1 ENSAMXG00000030048 opn8a 3:44673077-44694625:1 ENSAMXG00000032634 opn8b 3:44710459-44727248:1 ENSAMXG00000034361 napbb 3:45437838-45448049:-1 ENSAMXG00000037668 cipcb 3:46387024-46395340:-1 ENSAMXG00000017253 vps18 3:56760089-56765547:1 ENSAMXG00000042064 prph2lb 3:57795222-57801758:1 ENSAMXG00000006177 gmds 3:58321161-58384972:1 ENSAMXG00000033985 znf513a 3:59487869-59503723:1

53 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ENSAMXG00000019342 hif1ab 3:60162341-60177903:-1 ENSAMXG00000034914 six6b 3:60305080-60310494:-1 ENSAMXG00000020545 jag2b 3:61294730-61345304:-1 ENSAMXG00000040803 opn7a 3:62978088-63008081:-1 ENSAMXG00000005565 clocka 3:64430234-64454663:-1 ENSAMXG00000031528 nmu 3:64506864-64526902:-1 ENSAMXG00000005419 paics 3:64711940-64736902:1 ENSAMXG00000005200 ipo13 3:64958602-65019480:-1 ENSAMXG00000024961 htr1b 3:66681640-66682803:1 ENSAMXG00000010672 novel gene 3:66801298-66858990:1 ENSAMXG00000010179 opn5* 3:73215451-73247349:1 ENSAMXG00000021415 igf1rb* 5:27225260-27305403:1 ENSAMXG00000025528 nfil3-5 13:1256220-1257995:-1 ENSAMXG00000009984 ttc26 13:1431334-1441483:1 ENSAMXG00000038316 tmtops2a 13:7136778-7155237:1 ENSAMXG00000039940 vapb* 13:9377160-9388944:1 ENSAMXG00000012711 dmbx1b* 13:10553760-10563244:-1 ENSAMXG00000031725 rtn4a 13:12814377-12831725:-1 ENSAMXG00000016819 scinla* 13:13069655-13089641:-1 ENSAMXG00000025156 znf644b 13:13495670-13547264:1 ENSAMXG00000004665 sap130a 13:16514965-16534778:1 ENSAMXG00000004743 bcl6aa 13:16670930-16676300:-1 ENSAMXG00000026091 cdk5r2a 13:17958856-17959791:1 ENSAMXG00000002203 cryba2a 13:17990940-18002011:-1 ENSAMXG00000002143 desmb 13:18080034-18091983:-1 ENSAMXG00000032420 etv5b 13:18190749-18209077:-1 ENSAMXG00000033167 atf4a 13:19522642-19530777:1 ENSAMXG00000011446 pde6ha 13:19633358-19633738:-1

ENSAMXG00000041726 novel gene 13:25481695-25483109:-1 ENSAMXG00000041979 13:28652595-28682666:-1 ENSAMXG00000039549 tardbp 13:29280168-29284278:-1 ENSAMXG00000024820 kcna2a 13:31980151-31981632:1 ENSAMXG00000016290 setd5 13:32341007-32354272:1 ENSAMXG00000033707 mtmr14 13:32429747-32448915:1 ENSAMXG00000030172 emc3 13:34518688-34525279:1 ENSAMXG00000002173 ghrl 13:36056678-36058841:-1 ENSAMXG00000035630 cnga3a 13:37221041-37236704:1 ENSAMXG00000012578 rp2 13:38348210-38355069:1 ENSAMXG00000035622 rgs4* 13:40360036-40363859:-1

54 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

ENSAMXG00000038790 b9d2* 14:1495190-1503124:1 ENSAMXG00000038525 il7r 14:2287140-2299347:-1 ENSAMXG00000005363 rb1* 14:2978436-3034567:-1 ENSAMXG00000036655 dlat* 14:3037422-3053639:-1 ENSAMXG00000040141 rhogb 14:6287379-6292949:-1 ENSAMXG00000037597 nrxn2a 14:8212904-8413941:1 ENSAMXG00000035067 cryba1b 14:8711160-8716854:-1 ENSAMXG00000024390 unc119b 14:8721262-8757794:1 ENSAMXG00000005573 otpa 14:12375640-12383285:1 ENSAMXG00000030762 htr2cl2* 14:14798655-14808988:-1 ENSAMXG00000035790 lhx5 14:25157877-25177456:1 ENSAMXG00000014101 gnrh2 14:25303792-25305300:-1 ENSAMXG00000014133 14:25310484-25364297:-1 ENSAMXG00000010561 whrna 14:30208193-30463006:-1 ENSAMXG00000037380 rx3 14:33825623-33831203:-1 ENSAMXG00000006642 si:dkey-6b12.5 14:35755277-35788288:-1 ENSAMXG00000016359 sh3pxd2b 14:36399980-36458373:-1 ENSAMXG00000002528 kif3b 15:21886044-21894243:-1 ENSAMXG00000043242 dnmt3bb.2 15:22917681-22935298:-1 ENSAMXG00000012860 tnnc2 15:25799535-25801853:1 ENSAMXG00000031060 ca6 15:27697722-27715234:-1 ENSAMXG00000025967 atoh7* 25:14410275-14410697:1 ENSAMXG00000004732 bag3 25:14442551-14452067:1 ENSAMXG00000034070 fgf8* 25:16916654-16924329:1 ENSAMXG00000012130 chata* 25:17712720-17723271:-1 ENSAMXG00000042717 rgra* 25:17736278-17746726:-1 ENSAMXG00000037212 cxcl12a 25:18837257-18847700:1 ENSAMXG00000002308 zdhhc16a 25:19360022-19366120:-1 ENSAMXG00000035696 sfrp5* 25:19678068-19709202:-1 ENSAMXG00000029072 pitx3* 25:21715018-21725509:1 ENSAMXG00000001987 ninl 25:21769890-21801028:-1 1088 1See Carlson et al. (2018) for description of scored phenotypes and identified QTL intervals. 1089 2Initial candidacy decision based on available Ensembl GO annotations only. 1090 1091 Table S5. List of genes in the 1.5-LOD support interval for the eye size QTL in surface/Pachón 1092 F2 hybrid group B (chromosome 20:1168905-1802909). Gene name Description zswim6 zinc-finger SWIM-type containing kif2a kinesin family member 2a

55 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

kmt5aa methyltransferase 5aa snmp35 small nuclear ribonucleoprotein 35 (U11/U12) dusp26 dual specificity phosphatase 26 denr density-regulated protein hcar1 hydroxycarboxylic acid receptor 1 pde4d phosphodiesterase 4D depdc1b DEP domain containing 1B elovl7b ELOVL fatty acid elongase 7b pitpnm2 phosphatidylinositol transfer protein membrane associated 2 hip1r huntingtin interacting protein 1 related mphosph9 M-phase phosphoprotein 9 ccdc62 coiled-coil domain containing 62 cdk2ap1 cyclin-dependent kinase 2-associated protein 1-like sbno strawberry notch homolog 1 (Drosophila) zmiz2 , MIZ-type containing 2 ppiab peptidylprolyl isomerase Ab (cyclophilin A) rilpl1 Rab interacting lysosomal protein like 1 ENSAMXT00000043836 na ENSAMXG00000040376 na ENSAMXG00000038021 na 1093

56 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1094 Table S6. Astyanax mexicanus protein-coding genes with detected deletions across all cave 1095 populations (see Methods). Entrez gene symbols derived from NCBI gene annotation are 1096 provided. ADGRG2 HLTF LOC103025035 AP3B1 HNF4A LOC103025085 ARL13A HTT LOC103025117 ASB5 IKZF1 LOC103025288 ASTN1 IZUMO1 LOC103025292 BAG3 KCNK6 LOC103025301 CCDC25 KHDC4 LOC103025404 CCSER1 KIAA1107 LOC103025426 CDR2L KIAA1109 LOC103025756 CELF2 KIAA1522 LOC103025817 CEP120 KIAA1671 LOC103025898 CLEC16A KLHL18 LOC103025952 CLHC1 LIPE LOC103026087 CLUL1 LNPK LOC103026151 CMPK1 LOC103021348 LOC103026158 CMPK2 LOC103021388 LOC103026225 COL25A1 LOC103021542 LOC103026350 CSNK2B LOC103021598 LOC103026594 DDX11 LOC103021919 LOC103026618 DESI2 LOC103021937 LOC103026744 DOK6 LOC103022185 LOC103026850 DSTYK LOC103022226 LOC103026918 DUSP27 LOC103022502 LOC103026961 EAF2 LOC103022787 LOC103027025 EML1 LOC103022832 LOC103027070 EPHX2 LOC103022898 LOC103027080 ESCO2 LOC103023116 LOC103027108 ETV1 LOC103023166 LOC103027186 EYS LOC103023221 LOC103027443 FAM13A LOC103023504 LOC103027450 FAM241A LOC103023618 LOC103027475 FAM47C LOC103023655 LOC103027764 FST LOC103024156 LOC103027893 GADD45A LOC103024232 LOC103028100 GDPD1 LOC103024407 LOC103028137 GFRA4 LOC103024577 LOC103028327 GLRX LOC103024654 LOC103028388

57 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

LOC103028431 LOC103033816 LOC103041108 LOC103028477 LOC103033881 LOC103041268 LOC103028601 LOC103034116 LOC103041345 LOC103028648 LOC103034997 LOC103041628 LOC103029016 LOC103035038 LOC103041780 LOC103029343 LOC103035617 LOC103041826 LOC103029370 LOC103035809 LOC103041846 LOC103029526 LOC103035951 LOC103041934 LOC103029568 LOC103036004 LOC103042111 LOC103029852 LOC103036092 LOC103042143 LOC103029876 LOC103036502 LOC103042371 LOC103029921 LOC103037404 LOC103042420 LOC103029962 LOC103037517 LOC103042637 LOC103030246 LOC103037523 LOC103042652 LOC103030391 LOC103037579 LOC103043055 LOC103030841 LOC103037674 LOC103043079 LOC103030872 LOC103037998 LOC103043115 LOC103030885 LOC103038058 LOC103043156 LOC103030943 LOC103038316 LOC103043202 LOC103031014 LOC103038317 LOC103043269 LOC103031077 LOC103038408 LOC103043276 LOC103031216 LOC103038664 LOC103043301 LOC103031263 LOC103038914 LOC103043437 LOC103031336 LOC103039006 LOC103043467 LOC103031517 LOC103039085 LOC103043630 LOC103031755 LOC103039110 LOC103043783 LOC103032028 LOC103039298 LOC103043808 LOC103032103 LOC103039619 LOC103043895 LOC103032125 LOC103039690 LOC103044020 LOC103032234 LOC103039742 LOC103044023 LOC103032294 LOC103039911 LOC103044355 LOC103032413 LOC103039934 LOC103044375 LOC103032734 LOC103040030 LOC103044430 LOC103032969 LOC103040205 LOC103044527 LOC103032982 LOC103040475 LOC103044598 LOC103033078 LOC103040476 LOC103044780 LOC103033392 LOC103040561 LOC103044908 LOC103033459 LOC103040571 LOC103045023 LOC103033531 LOC103040666 LOC103045448 LOC103033672 LOC103040995 LOC103045627

58 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

LOC103045744 LOC111191359 LOC111194017 LOC103045760 LOC111191382 LOC111194034 LOC103046056 LOC111191388 LOC111194087 LOC103046257 LOC111191391 LOC111194215 LOC103046360 LOC111191409 LOC111194235 LOC103046641 LOC111191623 LOC111194306 LOC103046671 LOC111191628 LOC111194328 LOC103046842 LOC111191630 LOC111194329 LOC103046859 LOC111191665 LOC111194330 LOC103047026 LOC111191730 LOC111194614 LOC103047048 LOC111191763 LOC111194615 LOC103047362 LOC111191800 LOC111194726 LOC103047608 LOC111191862 LOC111194743 LOC103047809 LOC111191864 LOC111194888 LOC107197051 LOC111191888 LOC111194889 LOC107197092 LOC111191976 LOC111194890 LOC107197101 LOC111192051 LOC111194989 LOC107197227 LOC111192062 LOC111195020 LOC107197329 LOC111192378 LOC111195097 LOC107197471 LOC111192397 LOC111195104 LOC107197656 LOC111192512 LOC111195105 LOC111188314 LOC111192539 LOC111195171 LOC111188773 LOC111192579 LOC111195182 LOC111188904 LOC111192626 LOC111195396 LOC111189542 LOC111192630 LOC111195397 LOC111189593 LOC111192701 LOC111195398 LOC111189599 LOC111193054 LOC111195488 LOC111189672 LOC111193082 LOC111195507 LOC111189673 LOC111193087 LOC111195553 LOC111189674 LOC111193225 LOC111195738 LOC111189675 LOC111193234 LOC111195938 LOC111190101 LOC111193241 LOC111196042 LOC111190102 LOC111193463 LOC111196132 LOC111190104 LOC111193492 LOC111196234 LOC111190801 LOC111193541 LOC111196391 LOC111190802 LOC111193710 LOC111196682 LOC111191036 LOC111193719 LOC111196811 LOC111191144 LOC111193730 LOC111196872 LOC111191189 LOC111193837 LOC111196873 LOC111191268 LOC111193890 LOC111197088

59 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

LOC111197090 TLK2 LOC111197165 TMEM71 LOC111197441 TMEM8B LYST TRDN MALRD1 TRNAE-CUC MAP3K15 TRNAI-AAU MAP3K7CL TTC12 MAPKAPK3 TTC21B MFSD9 TTC6 MICALL1 URB2 MTMR6 UTP18 NAALADL1 UXS1 NDUFA4L2 VILL NDUFAF6 VPS13A OCA2 VSIG10L2 OIT3 XPA PAPPA XPO7 PCDH9 YPEL1 PER3 ZNF385D PGGHG ZNF536 PHTF2 ZNFX1 PIGP PLEKHA8 PPP2R5A PRPS1 RASA3 RPUSD2 RSRC2 SCEL SDHAF2 SEMA6D SFMBT2 SGCD SGF29 SGMS1 SLC2A9 SMS SPHKAP TBC1D9 TDRKH

60 bioRxiv preprint doi: https://doi.org/10.1101/2020.07.06.189654; this version posted July 6, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

1097 Table S7. CRISPR Oligo design. 1098 Oligo A: 5’-3’ TAATACGACTCACTATAGGTGTAGCTGAAACGTGGTGAGTTTTAGAGCTAGAAATAGC Oligo B: 5’-3’ GATCCGCACCGACTCGGTGCCACTTTTTCAAGTTGATAACGGACTAGCCTTATTTTAAC TTGCTATTTCTAGCTCTAAAAC 1099

1100 1101 1102

61