<<

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 DNA transposon expansion is

2 associated with genome size increase

3 in mudminnows

4 5 Robert Lehmann1, Aleš Kovařík2, Konrad Ocalewicz3, Lech Kirtiklis4, 6 Andrea Zuccolo5,6, Jesper N. Tegner1, Josef Wanzenböck7, Louis 7 Bernatchez8, Dunja K. Lamatsch7, and Radka Symonová9,10 8 9 1 Division of Biological and Environmental Sciences & Engineering, 10 Computer, Electrical and Mathematical Sciences and Engineering Division, 11 King Abdullah University of Science and Technology, Thuwal, 23955-6900, 12 Kingdom of Saudi Arabia 13 2 Laboratory of Molecular Epigenetics, Institute of Biophysics, Czech 14 Academy of Science, Královopolská 135, 61265, Brno, 15 3 Department of Marine Biology and Ecology, Institute of Oceanography, 16 Faculty of Oceanography and Geography, University of Gdansk, Gdansk, 17 Poland 18 4 Department of Zoology, Faculty of Biology and Biotechnology, University 19 of Warmia and Mazury, M. Oczapowskiego Str. 5, 10-718, Olsztyn, Poland 20 5 Center for Desert Agriculture, Biological and Environmental Sciences & 21 Engineering Division (BESE), King Abdullah University of Science and 22 Technology, Thuwal 23955-6900, Saudi Arabia 23 6 Institute of Life Sciences, Scuola Superiore Sant’Anna, Pisa 56127, Italy 24 7 Research Department for Limnology Mondsee, University of Innsbruck, A 25 – 5310 Mondsee, 26 8 IBIS (Institut de Biologie Intégrative et des Systèmes), Université Laval, 27 Québec, QC, Canada 28 9 Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, 29 Technische Universität München, Freising, Germany 30 10Faculty of Biology, University of Hradec Kralove, Czech Republic 31 Correspondence: 32 Radka Symonova ([email protected]) 33 Word Count: 6766

1

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

34 Abstract

35 Genome sizes of eukaryotic organisms vary substantially, with whole

36 genome duplications (WGD) and transposable element expansion acting as

37 main drivers for rapid genome size increase. The two North American

38 mudminnows, limi and U. pygmaea, feature genomes about twice the

39 size of their sister lineage (e.g., pikes and pickerels). However, it

40 is unknown whether all Umbra share this genome expansion and

41 which causal mechanisms drive this expansion. Using flow cytometry, we

42 find that the genome of the is expanded similarly to

43 both North American species, ranging between 4.5-5.4 pg per diploid

44 nucleus. Observed blocks of interstitially located telomeric repeats in Umbra

45 limi suggest frequent Robertsonian rearrangements in its history.

46 Comparative analyses of transcriptome and genome assemblies show that

47 the genome expansion in Umbra is driven by extensive DNA transposon

48 expansion without WGD. Furthermore, we find a substantial ongoing

49 expansion of repeat sequences in the blackfish pectoralis, the

50 closest relative to the family , which might mark the beginning of a

51 similar genome expansion. Our study suggests that the genome expansion

52 in mudminnows, driven mainly by transposon expansion, but not WGD,

53 occurred before the separation into the American and European lineage.

54 55 56 Key Words: Genome expansion, Umbra, Robertsonian fusion, centric 57 fission, repetitive sequences. 58

2

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

59 Significance Statement: 60 North American mudminnows feature genomes about twice the size of their

61 sister lineage Esocidae (e.g., pikes and pickerels). However, neither the

62 mechanism underlaying this genome expansion, nor whether this feature is

63 shared amongst all mudminnows is currently known. Using cytogenetic

64 analyses, we find that the genome of the European mudminnow also

65 expanded and that extensive chromosome fusion events have occurred in

66 some Umbra species. Furthermore, comparative genomics based on de-

67 novo assembled transcriptomes and genome assemblies, which have

68 recently become available, indicates that DNA transposon activity is

69 responsible for this expansion.

70 Introduction

71 The driving forces and effects of genome size variations across different taxa

72 are a recurring theme in the field of evolutionary biology (Lynch 2007). While

73 the number of chromosomes is typically highly conserved among

74 (Mank & Avise 2006), genome sizes vary substantially and include the

75 smallest and the largest vertebrate genome (Hardie & Hebert 2004).

76 Specifically, genome size ranges from C = 0.35 pg in bandtail pufferfish

77 (Sphoeroides spengleri) to C = 133 pg in marbled (Protopterus

78 aethiopicus) (Gregory 2020). The main candidate processes introducing

79 such drastic variation are gene-(Ohno 2013; Lu et al. 2012) or genome

80 duplication(Fontdevila 2011; Van de Peer et al. 2017), transposable element

81 (TE) proliferation (Pritham 2009; Tenaillon et al. 2010), and replication

82 slippage at tandem repeat loci(Ellegren 2004). Furthermore, it is unclear

3

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

83 whether this variation is shaped by adaptive processes (Gregory & Hebert

84 1999; Liedtke et al. 2018; Van de Peer et al. 2017) or stochastic sequence

85 gain and loss(Lynch 2007). The presence of TEs in virtually all eukaryotic

86 genomes suggests a role of stochastic sequence gain due to TE proliferation

87 (Elliott & Gregory 2015). There are several recent reports of significant

88 genomic expansion, such as in salamanders due to long terminal repeat

89 element activity (Sun et al. 2012) and in Hydra due to long interspersed

90 nuclear element activity (Wong et al. 2019). Estimates of the pervasiveness

91 and extent of expansion events caused by TE proliferation and other

92 processes will provide insight into the forces that shape genome size

93 evolution.

94 Mudminnows (Umbridae) are very resilient members of the

95 , capable of withstanding extreme cold and can utilize

96 atmospheric oxygen. Under adverse conditions, they are reputed to become

97 dormant in mud (Gilbert & Williams 2002). The order Esociformes includes

98 two families and four genera in at least 12 species with one fossil-only family

99 (Nelson et al. 2016). Furthermore, Esociformes (Haplomi, Esocae; pikes and

100 mudminnows) and its sister order Salmoniformes belong to the clade

101 , a lineage of basal (Nelson et al. 2016). Genome

102 sizes of both North American Umbra species were determined already in the

103 60s to be C = 2.52 - 2.70 pg for Umbra limi and C = 2.4 pg for U. pygmaea

104 (Hinegardner 1968). The genome size of the remaining representatives of

105 the order Esociformes, including , , and Dallia, ranges from

106 0.85 (in Esox) to 1.4 (also in Esox; summarized by (Gregory 2020); details in

4

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

107 Table 1). While two of the three extant Umbra species feature a genome size

108 twice the size of the remaining esociform species, it is currently not known

109 whether this is a universal feature of the Umbridae and can be expected in

110 U. krameri.

111 The diploid chromosome number in Esociformes ranges from 2n = 22 (U.

112 pygmaea and U. limi) to 2n = 71 - 79 (Dallia sp.). Almost all known pike (Esox)

113 species possess 50 (fundamental number FN = 50) usually acrocentric

114 chromosomes, including species from Canada, USA, Sweden, and Russia

115 (Arai 2011) and two European Esox species (Symonová et al. 2017). Such

116 karyotype is considered the pre-duplication ancestor of the order

117 Salmoniformes (, grayling, whitefish) (Rondeau et al. 2014; Ráb

118 2004). There are, however, reports of FN = 86 with 2n = 50 in E. lucius from

119 the south Caspian Sea basin and further reports of FN = 72 - 88 in other E.

120 lucius populations (Khoshkholgh et al. 2015). This morphological variability

121 of chromosomes in the Esox with its broad circumpolar distribution

122 (Nelson et al. 2016) has not yet been analyzed.

123 Pikes are highly interesting from the cytogenomic viewpoint because of their

124 extremely amplified, massively methylated, and potentially functional 5S

125 rDNA recorded on more than 30 centromeric sites of their 50 chromosomes,

126 whereas 45S rDNA is located in only two sites (Symonová et al. 2017). A

127 similar amplification and increased dynamics of the 45S rDNA fraction has

128 been repeatedly recorded in Salmoniformes 12,13, which furthermore

129 underwent an already well-characterized whole-genome duplication (WGD)

130 (Macqueen & Johnston 2014; Lien et al. 2016). Indeed, WGD events have

5

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

131 been described for multiple basal, particularly freshwater teleosts such as

132 (Li et al. 2015) and Siluriformes (Marburger et al. 2018). This

133 observation suggests two distinct explanations for the significant genome

134 size increase of Umbridae compared to its sister lineage Esocidae, an

135 Umbridae-specific WGD on the one hand, and an extreme amplification of

136 ribosomal DNA on the other.

137 Here, we present cytogenomic evidence that all extant members of the

138 Umbra lineage feature an expanded genome size compared to its closest

139 relative, the Esocidae family. Furthermore, we find that Umbridae feature a

140 standard locus and copy number of 5S rDNA typical for teleosts, excluding a

141 5S rDNA expansion as a reason for the Umbra-specific genome. While

142 phylogenetic profiles of orthologous genes across species do not show

143 patterns of WGD, comparative analyses of repetitive sequence content

144 across six Esocidae species show a clear signal of DNA transposon

145 expansion in Umbra pygmaea driving the genome size increase.

146 Results

147 Flow cytometry genome size determination in the European pike 148 and the European mudminnow 149 Flow cytometry measurements using DAPI staining of U. krameri with

150 chicken RBC as internal standard exhibited a genome size of 2C = 4.54 ±

151 0.05 pg (±STDEV) per nucleus (CV% 3.36 ± 0.55; range 4.43-4.60 pg per

152 nucleus, N=10). E. lucius gave genome values of 2C = 2.22 ± 0.03 pg per

153 nucleus (CV% 2.44 ± 1.37, range 2.18 - 2.27 pg per nucleus, N=2). Genome

154 size and chromosome traits (2n and FN) for the order Esociformes are

6

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

155 summarized in Fig. 1 and show the elevated genome size in all three Umbra

156 species. The higher interquartile range in Esox lucius reflects merely the

157 higher sampling effort and multiple values available. A similar situation exists

158 in Umbra limi. We provide an overview of cytogenomic and cytogenetic traits

159 of the order Esociformes (Table 1) based on the checklist of fish karyotypes

160 (Arai 2011).

161

162

163 164 Fig.1. Genome size (C-value) (left) and chromosome (2n) and chromosome 165 arms (FN) counts (right) in esociform fish. 166

167 Table 1. Cytogenomic data for extant species of the order Esociformes from 168 www.genomesize.com, www.animalrdnadatabase.com, literature, and results 169 obtained in this study. C- val 45S 5S Family Species (pg) 2n/FN Method rDNA rDNA Reference E. americanus (Beamish et al. 1971; Rab & Esocidae 1.18 50/72 FD

americanus Crossman 1994) E. americanus 1.06/ (Beamish et al. 1971; Hardie & Esocidae 50 FIA, FD vermiculatus 1.12 Hebert 2004) (Vendrely & Vendrely 1953, Esocidae E. lucius 0.97 BCA 1952a)

Esocidae E. lucius 0.98 BCA (Vendrely & Vendrely 1952b)

7

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Esocidae E. lucius 1.13 FIA (Hardie & Hebert 2003, 2004) 50/ (Vinogradov 1998;

Esocidae E. lucius 1.15 FCM 50-58 Khoshkholgh et al. 2015)

Esocidae E. lucius 1.36 FD (Beamish et al. 1971) present study;

Esocidae E. lucius 1.11 50/50 FCM 2 30-38 (Symonová et al. 2017)

Esocidae E. cisalpinus 50/50 N/A 2 30-34 (Symonová et al. 2017) E. 50/ (Beamish et al. 1971; Rab & Esocidae 1.28 FD 2

masquinongy 76-78 Crossman 1994) (Hardie & Hebert 2003, 2004); Esocidae E. niger 0.92 FIA 4 present study (Beamish et al. 1971) Esocidae E. niger 1.18 50/74 FD 2 (Beamish et al. 1971) Esocidae E. reichertii 1.28 50 FD

Dallia (Beamish et al. 1971; Rab & Esocidae 1.26 74-78/ FD 6 pectoralis Crossman 1994) (Beamish et al. 1971; Crossman Novumbra Esocidae 1.04 48/62 FD 12 hubbsi & Ráb 2001) (Beamish et al. 1971); present

Umbridae U. limi 2.52 FD 3 study

Umbridae U. limi 2.56 FIA (Hardie & Hebert 2004) (Hinegardner 1968; Hinegardner Umbridae U. limi 2.70 22/44 BFA 2 & Rosen 1972) present study; reviewed in (Arai Umbridae U. krameri 2.27 44/44 FCM 2011) Umbridae U. pygmaea 2.41 22/44 FD 2-4 (Beamish et al. 1971) 170 Abbreviations used: Method: BCA – biochemical analysis, BFA – Bulk fluorometric assay, FD 171 – Feulgen densitometry, FIA - Feulgen image analysis densitometry, FCM – Flow cytometry, 172 N/A – not available. 173

174 Comparative analysis of the U. pygmaea transcriptome does not 175 indicate whole genome duplication 176 To test whether the Umbridae family underwent a whole-genome duplication

177 event (WGD), we assessed gene duplication levels in a set of 24 de novo

178 transcriptome assemblies, including the U. pygmaea

179 (Pasquier et al. 2016). We focused on a set of 3640 benchmark universal

180 single-copy orthologous (BUSCO) genes, which occur only once in the

8

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

181 majority (>90%) of a set of 26 selected species (Simão et al.

182 2015). Since only a single salmonid species, Salmo salar, was used to

183 construct the BUSCO set, genes that were duplicated in the salmonid-

184 specific WGD(Macqueen & Johnston 2014) were not excluded. Similarly, as

185 this gene set does not consider Umbridae family species, any potential

186 ohnologs originating from an Umbridae-specific WGD are not excluded.

187 Accordingly, five of the six salmonid de novo transcriptome assemblies show

188 the highest duplication level amongst the 24 considered species (Fig. S2,

189 Table S1). In contrast, U. pygmaea exhibits the second-lowest duplication

190 level, second only to Pangasius hypophthalmus. The elevated duplication

191 level in European eel Anguilla anguilla and arowana Osteoglossum

192 bichirrosum were noted recently, suggesting an additional WGD as a

193 possible cause(Rozenfeld et al. 2019).

194 Transcriptome de novo assembly procedures include steps that can

195 influence the observed number of duplicated genes, such as the clustering

196 of the assembled contigs(Pasquier et al. 2016; Grabherr et al. 2011).

197 Therefore, we collected all available reference genome assemblies matching

198 the species of any of the transcriptome assemblies. The direct comparison

199 of BUSCO estimates for de novo assemblies and reference genome-derived

200 transcriptomes across 10 species revealed a high correspondence of the

201 duplication level (R2 = 0.8) (Fig. S1, Table S2). In contrast, parameters

202 reflecting technical differences of transcriptome generation show

203 substantially lower correspondence, such as the number of fragmented (R2

204 = 0.02) or missing transcripts (R2 = 0.28). Clustering of BUSCO gene

9

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

205 phylogenetic profiles across all considered species provides a more detailed

206 picture (Fig. 2). While a small group of 240 transcripts fails to assemble

207 reliably (cluster I), likely reflecting specific sequence properties, the second

208 group of 819 genes is detected as single-copy in the majority of species

209 (cluster II). The largest cluster III, comprised of 2096 genes, exhibits species-

210 specific duplication patterns. Cluster IV contains 301 genes duplicated

211 mainly in salmonids and thus likely represent remnants of the salmonid-

212 specific WGD (cluster IV). The remaining 184 genes appear almost

213 consistently duplicated or triplicated across all considered species, again

214 likely reflecting sequence properties (cluster V). Both members of the

215 Esocidae, the eastern mudminnow U. pygmaea, and the Esox

216 lucius are placed next to each other in the hierarchical species clustering.

217 Separating cluster III into subclusters reveals that both the eastern

218 mudminnow and the Northern pike feature only small non-overlapping

219 groups of species-specific duplicated genes containing 47 and 76 genes,

220 respectively (Fig. S3, Table S3).

221

III III IV V American whitefish Amiiformes European whitefish Rainbow trout Anguilliformes Grayling Brook trout Beloniformes Brown trout Characiformes Arowana Butterfly fish Elephantnose fish European eel Cypriniformes Black ghost Panga Esociformes Mexican cave tetra Gadiformes Mexican surface tetra Medaka Gymnotiformes European perch Zebrafish Lepisosteiformes Allis shad Northern pike Eastern mudminnow Salmoniformes Ayu Osteoglossiformes Bowfin Atlantic Siluriformes Order Perciformes 222 single-copy duplicated triplicated fragmented missing

10

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

223 Fig.2. Phylogenetic profiling of universal single-copy orthologous genes in 224 24 de novo transcriptome assemblies. Genes are grouped into five clusters 225 (top, I-V) by profile similarity. The hierarchical clustering tree for all species 226 is shown on the left together with the corresponding color-coded order. 227 228

229 Comparative genome analyses of Esociformes species reveal 230 extensive transposable element expansions 231 We performed a direct comparative analysis of the genome composition of

232 U. pygmaea using recently published short-read-based genome assemblies

233 of five Esociformes species(Pan et al. 2021) and, in addition, the high-quality

234 chromosome scale genome assembly of E. lucius (Table S4). Due to the

235 employed short-read sequencing and relatively low coverages, the

236 assemblies for U. pygmaea, E. masquinongy, E. niger, N. hubbsi, and D.

237 pectoralis are fragmented and less complete (11 - 24% BUSCO genes

238 missing) compared to the chromosome-scale assembly of E. lucius (3%

239 missing). Nonetheless, the genome assembly sizes correspond well to direct

240 measurements of genome size (Fig. 3A). In agreement with the

241 transcriptome analysis, the assessment of BUSCO genes in all six genome

242 assemblies shows low levels of duplication. We then characterized the

243 repetitive sequence content of each genome aiming to test the hypothesis

244 that transposable element activity might have caused a genome expansion

245 exclusively in U. pygmaea. We, therefore, annotated repetitive sequences in

246 all six genomes using a combined de novo repeat library (see Methods). The

247 observed repeat content in each genome scales with the total genome size

248 (Fig. 3B), with the highest repeat content of 64% in the largest genome of U.

249 pygmaea and 37% of repeat content in the smallest genome of N. hubbsi

11

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

250 (Fig. 3C, stacked plot on the left). In general, DNA transposons make up the

251 largest part of classified repeat sequences in all analyzed species,

252 contributing about 19% genomic sequence in U. pygmaea, E. lucius, and E.

253 masquinongy (Tab. S4). In contrast, E. niger, N. hubbsi, and D. pectoralis

254 show lower DNA transposon abundances with 14.8%, 10.8%, and 8%,

255 respectively. A similar pattern is observed for LINE elements, while SINE

256 elements and LTRs are generally low in abundance. A principal component

257 analysis of repeat abundances across all six species reveals that the

258 repetitive sequence composition mirrors the phylogenetic relationships (Fig.

259 S4), with high similarity between E. lucius and E. masquinongy, as well as

260 between N. hubbsi and D. pectoralis. Both, U. pygmaea and E. niger,

261 constitute outliers in repeat composition. Importantly, we observe a

262 significant expansion of highly repetitive and species-specific sequences

263 awaiting further characterization in U. pygmaea with 30.8% genomic

264 abundance, compared to abundances in the range of 15.6-21.2% in all other

265 species.

266 Similar to the repeat composition, repeat expansion histories also show

267 significant differences between the considered species (Fig. 3C). Here, the

268 repeat expansion history is captured by the Kimura substitution level for each

269 genomic copy of a repeat as a proxy for its age, since each copy accumulates

270 mutations over time and diverges from its consensus sequence. The

271 resulting divergence landscape suggests that the genome size increase of

272 U. pygmaea results from a recent spike in DNA transposon activity combined

273 with the accumulation of unclassified repetitive sequence, with 45% of the

12

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

274 genomic repeat sequence showing a mean divergence below 8 (Fig. 3C, top

275 left). The most abundant repeat sequence in U. pygmaea is the Tc1/mariner

276 element IC-Mifl (upygm-1-8) which makes up 1% (20.5 Mb) of the genome

277 assembly with an average divergence of 8 (Table S5). The second most

278 abundant repeat Tc1-2_Xt (upygm-1-14), with 14 Mb and 0.7 % of genomic

279 sequence, is also member of the Tc1/mariner family and shows a very recent

280 expansion pattern with an average divergence of 1.34. E. masquinongy and

281 E. lucius also show a notable recent expansion peak driven by DNA

282 transposon activity. Similar to U. pygmaea, the expansion of E. lucius is also

283 driven in part by IC-Mifl which contributes 1.3 % / 11.7 Mb to the total genome

284 assembly, making it the third most abundant repeat only surpassed by

285 Mariner-9 originally identified in Salmo salar and an RTE-1 non-long-

286 terminal-repeat element identified in E. lucius (Table S5). The prominence of

287 IC-Mifl notwithstanding, the high total abundance of DNA transposon

288 sequence in the aforementioned species is caused by a multitude of active

289 families. While the total repeat content of D. pectoralis is low, the divergence

290 landscape reveals a currently ongoing repeat expansion with 27% of the total

291 repeat content at divergences below 4, which translates into 9.1 % of the

292 total genome sequence. In contrast to all other species, the expansion is

293 caused by two LINE/RTE-BovB elements with high abundance (0.5 and

294 0.2%, respectively, of total genome) and low mean divergence (1.16 and

295 0.87) in combination with rRNA, tRNA, and unclassified sequences. This

296 leaves N. hubbsi as the only species with a consistently low transposable

297 element activity.

13

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

298 299 A B 2.00 ● ● U.pygmaea 60 U.pygmaea 1.75 55 ● 1.50 50 E.masquinongy ● E.lucius 1.25 E.masquinongy 45 E.niger ● ● E.lucius D.pectoralis 1.00 N.hubbsi ● ● ● 40 D.pectoralis ● ● N.hubbsi E.niger ● 0.75 [%] Content Repeat Total Genome Assembly Size [GB] Size Assembly Genome 1.0 1.5 2.0 1.0 1.5 2.0 Haploid Genome Size [pg] (FD) Haploid Genome Size [pg] (FD) C U. pygmaea E. masquinongy 60 5 60 5 4 4 40 3 40 3 20 2 20 2 1 1 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50

E. lucius E. niger 60 5 60 5 4 4 40 3 40 3 20 2 20 2 1 1 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50

D. pectoralis N. hubbsi DNA RC 5 5 60 60 LTR 4 4 LINE 40 3 40 3 SINE 2 2 20 20 Other 1 1 Unknown 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50 300 Divergence Divergence 301 Fig.3. Genomic repeat sequence content and divergence landscapes in six 302 Esociformes species. A) Genome size measurements by Feulgen 303 densitometry compared to genome assembly size (Table S4). B) Genome 304 assembly size compared to repetitive sequence content. C) Total repetitive 305 sequence content (left) and repeat copy divergence landscapes (right) as 306 fraction of the total genome assembly for each of the six species, 307 distinguishing the main repetitive sequence classes. 308 309 310 To gain a more detailed overview of the Tc1/mariner transposon superfamily

311 activity, we constructed a phylogeny of all transposase sequences in our

14

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

312 repeat library together with transposase sequences of a previously described

313 Tc1/mariner reference set(Gao et al. 2017) in neoteleosts. This phylogeny

314 was then complemented with the detected genomic abundances in the

315 considered species (Figure 4). We find that most highly abundant elements

316 fall into the Minos-/Bari-like clade (Figure 4 clade A), such as the two most

317 abundant elements of U. pygmaea (upygm-1-8, upygm-1-14), E.

318 masquinongy (eluci-6-168), and N. hubbsi (eluci-4-461). Other more

319 abundant lineages are passport (Figure 4 clade B and D), with the most

320 abundant element eluci-1-0 in E. lucius, and frog prince (Figure 4 clade C,

321 upygm-1-69, eluci-6-2394).

322 Finally, we performed a manual curation of the 30 most abundant repeat

323 sequences found in U. pygmaea, contributing a combined 101.87 Mb to the

324 genome assembly to test the extent to which the unclassified repeat

325 sequence expansion can also be attributed to transposable element activity.

326 We find TE-related structural features in 13 repeats, accounting for 48 Mb,

327 in addition to a highly abundant tandem repeat and a satellite sequence

328 (Table S6). With 11 out of 13 TE-related sequences, the vast majority shows

329 DNA transposon features. In general, we observe a consistent pattern of

330 increased abundance across species in a small group of Tc1/mariner

331 elements mirroring the pattern of total genomic repeat abundance in the

332 respective species.

333 334

15

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

● upygm−6−2325 ● upygm−6−10034 ● eluci−6−6051 ● enig−6−8846 Detected in ● dpect−6−954 ● dpect−6−4153 ● upygm−6−1332 ● upygm−6−12034 ● ● upygm−6−5394 ● upygm−6−104 U.pygmaea ● nhubb−1−755 ● eluci−6−9212 ● emasq−1−282 ● emasq−1−894 ● ● upygm−5−211 D.pectoralis ● eluci−6−5116 ● dpect−6−10709.1 ● dpect−6−12441 ● dpect−6−10709 ● eluci−6−13133 ● ● nhubb−1−714 N.hubbsi ● upygm−5−2353.1 ● upygm−6−1653 ● eluci−6−6295 ● dpect−6−1458.1 ● ● eluci−1−357 ● emasq−6−3556.1 E.lucius ● upygm−6−6677 ● upygm−6−2744 ● emasq−6−646 ● ● dpect−6−3945 E.masquinongy ● dpect−6−7544 ● upygm−6−574.1 ● upygm−6−574 ● upygm−6−2790 ● ● dpect−5−4732 ● eluci−6−1668 E.niger ● emasq−6−4004 ● eluci−6−20304 ● upygm−6−6110 ● upygm−6−7304 ● ● nhubb−6−4879 ● upygm−6−3216 Reference ● dpect−6−9335 ● eluci−6−8244 ● dpect−6−820 ● dpect−6−2508 ● dpect−6−20619 ● enig−5−1580 ● nhubb−6−12303 ● nhubb−5−308 ● upygm−5−90.1 ● upygm−6−298 ● upygm−6−575 ● emasq−6−3556 ● emasq−5−132 ● dpect−6−19895 Genomic ● dpect−6−2508.1 ● upygm−5−4765 ● nhubb−1−771 ● upygm−6−531 ● enig−6−733 ● upygm−5−2353 ● emasq−1−881 ● emasq−1−405 Abundance ● dpect−6−3874 ● emasq−6−14646 ● eluci−4−1823 ● emasq−6−16645 ● dpect−6−1458 ● eluci−5−7810 ● eluci−6−3907 ● dpect−6−1495 [Mb] ● enig−6−5888 ● eluci−5−71 ● enig−6−12499 ● nhubb−5−1582 ● emasq−6−4259 ● emasq−6−5593 ● emasq−5−4684 ● enig−6−12499.1 ● nhubb−6−8570 ● nhubb−6−8092 ● nhubb−6−1591 20 ● nhubb−6−32458 ● enig−6−3291 ● emasq−6−1827 ●upygm−1−1819 ● emasq−6−5593.1 ● nhubb−5−308.1 ● enig−5−6716 ● emasq−6−8753 ● emasq−6−1827.1 ● emasq−6−6909 ● eluci−5−7810.1 ● nhubb−6−2117 ● upygm−4−1265 15 ● enig−6−1186 ● upygm−6−104.1 ● upygm−5−22 ● emasq−6−4004.3 ● upygm−6−3265 ● dpect−6−24044 ● enig−6−7512 ● eluci−6−3507 ● emasq−5−281 ● upygm−6−821 ● upygm−6−1747 ● eluci−6−3507.1 ● enig−5−537 10 ● emasq−6−5367 ● emasq−6−3556.2 ● eluci−6−5978 ● eluci−1−470 ● eluci−1−357.1 ● dpect−6−1458.2 ● upygm−5−900 ● upygm−6−542 ● upygm−6−6677.1 ● eluci−1−449 ● upygm−6−6693 ● emasq−1−666 ● enig−5−1926 5 ● upygm−6−104.2 ● nhubb−1−786 ● Tc1_5_Tr (pogo−like) ● Tc1_6_Ga (pogo−like) ● Tc1_7_On (pogo−like) ● Tc1_7_Ol (pogo−like) ● Tc1_3_Tn (pogo−like) ● Tc1_5_Gm (pogo−like) ● eluci−6−1668.1 ● upygm−6−2515.1 ● emasq−6−4004.2 ● upygm−5−597 ● upygm−6−823 ● emasq−6−4004.1 ● enig−6−735 ● upygm−5−1181 ● upygm−5−211.1 ● emasq−5−3909 ● eluci−6−12688.1 ● nhubb−5−199.1 ● dpect−6−108 ● nhubb−5−1216 ● emasq−6−12668 ● upygm−6−4004 ● dpect−6−5897 ● eluci−6−12688 ● nhubb−5−199 ● upygm−5−561 ● dpect−5−3101 ● enig−6−3315 ● upygm−6−2264 ● emasq−6−6050 ● enig−6−3315.1 ● enig−6−8700 ● enig−6−5490 ● eluci−6−1175 ● upygm−5−3194 ● eluci−6−672 ● upygm−5−2812 ● upygm−4−456 ● emasq−6−1208 ● emasq−6−2577 ● dpect−6−1833 ● eluci−6−7384.1 ● eluci−6−7384 ● dpect−6−31979 ● upygm−6−222 ● upygm−5−90 ● upygm−6−2515 ● eluci−5−5386 ● eluci−6−31659 ● upygm−6−163 ● dpect−5−5339 ● upygm−1−191 ● upygm−1−306 ● upygm−1−396 ● emasq−1−161 ● Tc1_6_Ol (minos−like) ● eluci−5−5330 ● eluci−4−461 ● upygm−1−87 ● enig−4−206 ● eluci−5−423 ● upygm−1−8 ● Tc1_2_Tn (bari−like) ● emasq−1−25 ● eluci−1−99 ● upygm−5−4933 A ● emasq−1−61 ● eluci−6−9355 ● eluci−6−168.1 ● Tc1_2_Gm (minos−like) ● Tc1_3_On (minos−like) ● upygm−1−14 ● eluci−6−168 ● eluci−5−12697 ● Tc1_4_On (minos−like) ● eluci−5−12636.1 ● enig−1−15 ● eluci−5−80 ● Tc1_4_Gm (passport−like) ● Tc1_3_Gm (passport−like) ● eluci−1−0 ● eluci−5−58 ● Tc1_4_Ol (passport−like) ● Tc1_5_Ol (passport−like) B ● Tc1_2_On (passport−like) ● Tc1_3_Tr (passport−like) ● Tc1_3_Ol (passport−like) ● Tc1_1_Tr (sb−like) ● Tc1_1_Tn (sb−like) ● Tc1_6_On (sb−like) ● Tc1_2_Ol (frog.prince−like) ● Tc1_1_Ol (frog.prince−like) ● Tc1_2_Ga (frog.prince−like) ● enig−1−82 ● Tc1_1_On (frog.prince−like) ● Tc1_4_Tr (frog.prince−like) ● upygm−1−69 ● eluci−6−2394 C ● Tc1_1_Gm (frog.prince−like) ● upygm−1−27 ● Tc1_3_Ga (passport−like) ● emasq−1−26 ● Tc1_2_Tr (passport−like) ● Tc1_5_On (passport−like) ● enig−1−12 ● Tc1_1_Ga (passport−like) ● Tc1_5_Ga (passport−like) ● Tc1_4_Ga (passport−like) D ● upygm−1−18 ● upygm−1−106 ● eluci−4−599 ● eluci−5−141 ● eluci−6−2604 ● eluci−5−12636 N.hubbsi D.pect. U.pygm. 335 E.lucius E.masq. E.niger 336 Fig.4. Phylogenetic relationship (left) and genomic abundance (right) of 337 Tc1/mariner superfamily DNA transposons across six Esociformes species. 338 The phylogenetic tree is based on transposase sequences from 206 repeats 339 combined with 33 previously described(Gao et al. 2017) reference elements 340 from six lineages (Pogo, Passport, SB, Frog Prince, Minos, and Bari). Clades 341 containing most abundant elements are marked A-D. Genomic abundance 342 values in Mb are based on full length repeat sequences, where elements not 343 detected in the respective species are shown in white. 344

16

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

345 Fluorescence in Situ Hybridization

346 Studied pike species mostly exhibit karyotypes composed of 50 mono-armed

347 chromosomes (FN = 50). In turn, Umbra limi possesses only 22 metacentric

348 and submetacentric chromosomes (FN = 44), gradually decreasing in size.

349 Fluorescence in Situ hybridization (FISH) with PNA telomere probe revealed

350 hybridization signals only at the very ends of all chromosomes in E. lucius

351 and E. cisalpinus. In Umbra limi, apart from the terminal location of the FISH

352 telomeric signals observed on all chromosomes, up to seven chromosomes

353 exhibited interstitial telomeric sites (ITSs) in the pericentromeric locations. In

354 four of these chromosomes, ITSs overlapped with DAPI positive regions.

355 DAPI positive signals occurred on the q-arms below the pericentromeric

356 regions and did not correspond with the ITSs for up to three other

357 chromosomes (Fig. 5A). The 5S rDNA probe hybridized to three sites in a

358 pericentromeric location on two metacentric chromosomes and centromeric

359 location on one submetacentric chromosome in U. limi (Fig. 5B). All 5S rDNA

360 sites were DAPI-positive.

361

17

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

362 Fig.5. FISH with telomeric probes and 5S rDNA. Chromosomes of Umbra 363 limi after FISH with telomeric probe (A) and 5S rDNA probe (B) and of Esox 364 niger after FISH with 5S rDNA probe (C). All chromosomes are 365 counterstained with DAPI. The FISH with a telomeric probe in U. limi yielded 366 stronger interstitial telomeric signals than the terminal ones. Bar = 10µm. 367

368 Molecular analysis of 5S rDNA genomic fraction

369 The slot blot quantification of 5S rRNA genes indicates approximately 2,000

370 5S rDNA copies per Umbra genome with a slightly higher copy number in U.

371 limi compared to U. krameri. Umbra thus has about ten times lower copy

372 number of 5S rRNA than both Esox species analyzed so far (Fig. 6 A, B).

373 These copies are arranged in tandems in both Umbra as well as in Esox

374 indicated by ladders of bands after the digestion with MspI restriction enzyme

375 (not sensitive to CG methylation) (Fig. 6 C). Increased ladder length for both

376 Umbra species reveals greater sequence divergence of MspI (CCGG) sites

377 compared to Esox. Digestion of DNAs with methylation-sensitive HpaII

378 isoschizomere resulted in hybridization signals in high-molecular-weight

379 regions with little to no ladders (Fig. 6 C), which is consistent with heavy

380 methylation of internal Cs within the CCGG motifs.

18

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

381 382 Fig.6. Genomic analysis of Umbra and Esox 5S rDNA. (A) Southern blot 383 hybridization. (B) Slot blot hybridization. Probe – 5S rDNA insert from the E. 384 cisalpinus clone a, K - U. krameri, L = U. limi, ESLU - E. lucius, (C) DNA 385 methylation analysis of 5SrDNA in Umbra and Esox. HpaII is a methylation- 386 sensitive isoschizomere of MspI. Probe – 5S rDNA insert from the E. 387 cisalpinus (Eci) clone a. ESLU – Esox lucius, K – Umbra krameri, L – Umbra 388 limi.

389

390 Discussion

391 The sizes of eukaryotic genomes vary substantially with teleosts being no

392 exception. The range of described cases of whole genome duplication

393 (WGD) events amongst teleosts make this mechanism the main suspect

394 when observing significant genome size increases. In contrast, we find that

395 the genome expansion in the genus Umbra is not caused by a WGD but

396 transposable element activity, and that extensive chromosome fusion

19

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

397 events have occurred in some Umbra species. In the following, we discuss

398 what sets these inconspicuous members of the order Esociformes apart

399 from their sister lineage from an ecological and evolutionary point of view.

400 Genome expansion and chromosome fusions in Umbra sp.

401 Our results demonstrate that the European mudminnow species U. krameri

402 underwent a genome expansion similar to the two North American Umbra

403 sister species. This genome expansion is thus a common trait of the entire

404 genus and can be localized in time prior to the split of the American and

405 European Umbra lineages, setting the Umbridae family apart from their sister

406 lineages Esox, Dallia, and Novumbra (Marić et al. 2017; Gregory 2020). The

407 European species U. krameri retained the Umbra ancestral chromosome

408 number (2n = FN = 44), which is even lower than the most common teleost

409 ancestral 2n ≈ 50 (Mank & Avise 2006; Sacerdot et al. 2018) chromosome

410 number and might indicate an overall tendency towards chromosome

411 number reduction in the genus Umbra. Paradoxically, the genome expansion

412 in U. limi and U. pygmaea is accompanied by a further reduction in

413 chromosome counts (2n = 22) while maintaining the same number of

414 chromosome arms (FN = 44). This reduction in the number of chromosomes

415 without changes to the number of chromosome arms, as confirmed for U.

416 limi, is consistent with Robertsonian fusion events. Accordingly, up to seven

417 out of 22 bi-armed chromosomes of U. limi exhibit interstitial telomeric sites

418 (ITSs), which presumably are remnants of these fusions. Similar occurrences

419 of telomeric DNA at non-telomeric sites as relics of chromosomal

420 rearrangements including centric fusions (Robertsonian translocations),

20

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

421 tandem fusions, and inversions (Ocalewicz 2013) have been observed at

422 pericentromeric locations of fused mammalian and non-mammalian

423 chromosomes (Ocalewicz et al. 2013; Meyne et al. 1990). Breaks in telomer-

424 adjacent chromatin regions before the fusion event may explain the absence

425 of telomeric repeat sequences at the remaining fusion sites (Nanda et al.

426 1995; Garagna et al. 1995). Furthermore, ITSs may undergo gradual loss

427 and degeneration, leading to progressive shortening. So while FISH is

428 currently the best approach to explore ITSs in situ, some interstitially located

429 telomeric DNA repeats might be too short for detection (Slijepcevic 1998).

430 The low chromosome count of the North American Umbra species (2n = 22)

431 makes them part of a small group of fishes. Among about 32,000 fish species

432 (Nelson et al. 2016), there are only thirteen fish species with 2n < 22

433 chromosomes described to date, made up mainly of Cyprinodontiformes (six

434 in Nothobranchiidae, one in Rivulidae) ranging between 2n = 16 - 20. The

435 lowest chromosome counts were described for the marine teleost species

436 spark anglemouth (Sigmops bathyphilus, Stomiiformes, Gonostomatidae)

437 (Post 1974) with 2n = 12 and the freshwater teleost species chocolate

438 gourami (Sphaerichthys osphromenoides, Osphronemidae,

439 Anabantiformes) (Calton & Denton 1974; Koref-Santibanez & Paepke 1994;

440 Arai 2011) with 2n = 16. Interestingly, the S. bathyphilus belongs to the order

441 Stomiiformes, a member of the superorder Osmeromorpha that split after the

442 earlier divergence of the superorder Protacanthopterygii (i.e. Esociformes

443 and Salmoniformes) (Nelson et al. 2016).

21

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

444 Interplay between cytogenomic and ecological traits 445 The chromosome number of a species can be linked to its ecological traits.

446 Specialized fishes were demonstrated to have smaller genomes than more

447 generalized species within and across lineages (Hinegardner & Rosen 1972;

448 Hardie & Hebert 2004). This constraint may be extended to the chromosome

449 number and genome size in specialized species (Hardie & Hebert 2004), or

450 constraints on genome size linked to the heightened developmental

451 complexity of specialized species (Gregory 2002). Similar principles have

452 been proposed for mammals (Qumsiyeh 1994) where species with higher 2n

453 or FN have a higher recombination rate because proper disjunction requires

454 at least one chiasma per arm (Jensen-Seaman et al. 2004). Increased

455 recombination not only leads to increased genetic variability, thereby

456 allowing the utilization of a wider niche (Qumsiyeh 1994), but also a genome

457 size reduction (Nam & Ellegren 2012). On the other hand, decreased

458 recombination favors the fixation of new mutations and thereby potential

459 speciation (Qumsiyeh 1994). Furthermore, decreased recombination can

460 cause the genome to expand due to the accumulation of transposon

461 insertions (Dolgin & Charlesworth 2008). The interaction between

462 transposon activity and recombination rate may create positive feedback,

463 where the accumulation of transposon insertion sites contributes to further

464 suppress recombination (Kent et al. 2017; Fedoroff 2012). The positive

465 correlation between recombination rate and effective population size (Mugal

466 et al. 2015) requires the consideration of the ecology of Umbra and Esox.

467 Both genera’s species belong to freshwater fishes at northern latitudes and

468 hence are affected by bottlenecks and reduced effective population sizes

22

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

469 within refugia during glaciation periods (Bernatchez & Wilson 1998). Low

470 levels of genetic diversity in pikes have been described extensively (Skog et

471 al. 2014) and may explain the extreme 5S rDNA amplification through genetic

472 drift (Symonová et al. 2017).

473 Genomic differences found among mudminnows can hardly be related to

474 variable ecological traits within this group, as the ecology of mudminnows

475 appears surprisingly similar. The American (U. limii) has

476 been termed a “habitat specialist and resource generalist” (Martin-Bergmann

477 & Gee 1985), referring to the densely vegetated and often overgrown and

478 deoxygenated swampy habitats in which this species is found. Accessory air-

479 breathing capabilities allow mudminnows to use such marginal habitats in

480 which no, or only a few other fish species can be found. Otherwise, the

481 central mudminnow uses a wide variety of small prey. This brief

482 ecological characterization can be extended to all other mudminnows (sensu

483 lato), including Novumbra and Dallia. Despite their evolutionary

484 differentiation, all mudminnows are found in such characteristic habitat types

485 (Kuehne & Olden 2014) which might be related to their low competitive

486 abilities (Wanzenböck & Spindler 1995; Sehr & Keckeis 2017). All

487 mudminnows possess air-breathing accessory capabilities and use a wide

488 variety of small invertebrate prey (Kuehne & Olden 2014). An additional

489 peculiarity of this group of fishes is the parental care behavior, which become

490 territorial during spawning time guarding the eggs. In summary, most

491 ecological traits of mudminnows are convergent or stable despite their

492 divergence in phylogeny and the differences in genomics found in our study.

23

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

493 In contrast, pikes are found in a wide variety of habitats, densely overgrown

494 lentic waters to open waters in large or even brackish waters and

495 riverine habitats. Furthermore, they are rather specialized piscivorous during

496 juvenile and adult life stages. Conversely, they might be called “habitat

497 generalists and resource specialists.” Whether such ecological differences

498 may be related to genomic differences between mudminnows and pikes

499 remains open.

500 Next to potentially ecological influences, there are also purely genomic

501 factors driving genome size evolution. A common cause of genome size

502 expansion is a WGD event (polyploidy) or repeat expansion due to

503 transposon or satellite DNA. In the Umbra genus, however, the genome

504 expansion was accompanied by a reduction in chromosome number, which

505 suggests two scenarios.

506 (i) The Umbra lineage has experienced a rapid chromosome loss (disploidy)

507 following an additional WGD. Evidence from plants suggests that repeated

508 WGD cycles with subsequent chromosome loss may generate karyotypes

509 with few chromosomes (Dodsworth et al. 2016). However, our comparative

510 genome and transcriptome analyses do not support the existence of

511 additional ohnologs in Umbra pygmaea. The observed number of duplicated

512 protein-coding genes in the genome assemblies of U. pygmaea and its sister

513 lineage Esox lucius was similar, excluding WGD as cause of genome

514 expansion in Umbra. While the completeness of the employed

515 transcriptomes and genomes is relatively low, it is not plausible that

516 specifically ohnologs are disproportionately missing in both cases.

24

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

517 (ii) Alternatively, the expansion of repeat sequences could explain the

518 increased genome size. Evidence of such a repeat expansion was found in

519 a sister lineage. The two species E. lucius and E. cisalpinus show an extreme

520 amplification of 5S rDNA across their chromosomes (Symonová et al. 2017)

521 while retaining their genome size and chromosome counts. This massive

522 expansion of tandem repeats by several orders of magnitude must have

523 happened over a short evolutionary time, as the related E. niger still features

524 a low number of 5S rDNA copies ancestral to the Esocidae lineage (Kovařík,

525 unpublished data). While the Umbra lineage shows teleost-

526 typical(Sochorová et al. 2018) 5S rDNA copy numbers, our comparative

527 genome analysis of the U. pygmaea genome confirms a significantly

528 elevated interspersed repeat content of 64%, which is the result of recent

529 transposon expansion peak driven by DNA transposon activity in

530 combination with unclassified repeat sequences. This peak is not only

531 consistent with the pervasive DNA transposon activity particularly in E.

532 lucius, E. masquinongy, and E. niger, but also with the generally high

533 abundance of DNA transposons in most fish genomes (Shao et al. 2019).

534 Manual curation of the top 30 highly abundant unclassified sequences

535 reveals DNA transposon features, further supporting the observed DNA

536 transposon expansion. The D. pectoralis constitutes an

537 exception as it appears in the process of repeat expansion driven by LINE

538 elements, rRNA, tRNA, and unclassified elements. Importantly, genome size

539 variation is the net result of sequence gain and loss (Petrov et al. 2000)

540 allowing for the possibility of sequence gain by TE expansion in U. pygmaea,

25

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

541 or alternatively sequence loss of various extent in the other species. As the

542 expansion peaks observed in all species except N. hubbsi appear to be

543 relatively recent at divergences below ten percent, a sequence gain scenario

544 is more plausible. As all but the E. lucius genome assembly are highly

545 fragmented and lack the completeness of reference genome assemblies, it

546 is however likely that the presented results underestimate the total repeat

547 content, particularly for long repeat sequences. This fragmentation is also a

548 likely explanation for the high abundance of unclassified repeat sequences

549 in U. pygmaea. Furthermore, it is not possible to locate chromosomal fusion

550 sites or potential duplications in the U. pygmaea assembly. Further analyses

551 based on a platinum standard long read based reference assemblies and an

552 extensively curated library of repeat sequences are required to address

553 these questions.

554 Here, we show that all members of the Umbra genus feature a significantly

555 expanded genome, placing the expansion event prior to the separation

556 between the European and North American species. However, only the North

557 American species experienced Robertsonian chromosome fusion events,

558 reducing their chromosome number significantly. Taken together, our

559 findings show that a recent activity peak mainly of Tc1/mariner DNA

560 transposons significantly increased the size of the U. pygmaea genome. We

561 furthermore find evidence of an ongoing repeat-driven genome expansion in

562 the Alaska blackfish Dallia pectoralis. This makes the family Umbridae and

563 the entire order Esociformes an attractive model system to study exclusively

564 repeat-driven genome expansion and its potential role in speciation.

26

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

565

566 Material and Methods

567 Fish sampling 568 Ten individuals of Umbra krameri were descendants from individuals

569 sampled between 1993 - 1997 from “Fadenbach” between Orth and

570 Eckartsau (Danube Wetlands, National Park in Lower Austria) within a nature

571 conservation project of the provincial government of Lower Austria

572 (Wanzenböck & Spindler 1995). Ten individuals of Umbra limi and ten

573 individuals of Esox niger were sampled in Quebec, Canada, in autumn 2017.

574 A local professional fisherman provided samples of two individuals of E.

575 lucius at Mondsee in 2016. Further specimens of E. lucius and E.

576 cisalpinus were analyzed in our earlier study (Symonová et al. 2017). This

577 included 28 eight-month-old specimens (12 males, nine females, and one

578 unsexed) from the local fish farm (Olsztyn, Poland) sampled for cytogenetic

579 analyses.

580 Genome size determination by flow cytometry 581 Ethanol fixed red blood cells (RBC) of Umbra krameri and Esox lucius,

582 respectively, were used for genome size determination following (Lamatsch

583 et al. 2000) with chicken RBC as internal standard (C-value 1.25 pg per

584 nucleus). We determined genome size by flow cytometry using the Attune™

585 NxT Acoustic Focusing Cytometer (Thermo Fisher Scientific, Vienna,

586 Austria). We applied instrument settings optimized using the AttuneR

587 Cytometric Software. We used forward scatter (FSC) and VL1 violet

588 fluorescence (405 nm excitation, bandpass filter 440/50 nm) as triggers for

27

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

589 all measurements. We analyzed 20.000 cells or sample volumes of 0.1 mL

590 at a flow rate of 0.025 mL/min. DAPI is an AT-specific stain, which can yield

591 inaccurate results when the AT/GC content of the sample and internal

592 standard differ significantly. Chicken and E. lucius RBC, however, show a

593 very similar AT-content of 57.73%, 20, and 56.8%, (Borisova et al. 1974),

594 respectively, suggesting that the underestimation of the total amount of DNA

595 is negligible. Records on genome size come from the www.genomesize.com

596 database (Gregory 2020) and the cited literature (Table 1).

597 Comparative transcriptome analysis 598 We analyzed 24 de novo transcriptome assemblies of the PhyloFish

599 database (Pasquier et al. 2016): bowfin (Amia calva), gar (Lepisosteus

600 oculatus), European eel (Anguilla anguilla), butterflyfish (Pantodon

601 buchholzi), arowana (Osteoglossum bicirrhosum), elephantnose fish

602 (Gnathonemus petersi), Allis shad (Alosa alosa), zebrafish (Danio rerio),

603 panga (Pangasianodon hypophthalmus), a black ghost (Apteronotus

604 albifrons), cave Mexican tetra (Astyanax mexicanus), surface Mexican tetra

605 (Astyanax mexicanus), Northern pike (Esox lucius), Eastern mudminnow

606 (Umbra pygmaea), grayling (Thymallus thymallus), European whitefish

607 (Coregonus lavaretus), American (Lake) whitefish (Coregonus

608 clupeaformis), brown trout (Salmo trutta), rainbow trout (Oncorhynchus

609 mykiss), brook trout (Salvelinus fontinalis), Ayu (Plecoglossus altivelis),

610 Atlantic cod (Gadus morhua), medaka (Oryzias latipes), and European perch

611 (Perca fluviatilis). The completeness and duplication were assessed with

612 BUSCO v4.1.4 (Simão et al. 2015) using the dataset actinopterygii_odb10

28

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

613 containing 3640 genes (Table S1). For a subset of 10 species, reference

614 genome assemblies and structural genome annotations are available at the

615 NCBI reference sequence database (Table S2). For this subset, de novo

616 transcriptome assembly completeness and duplication were compared to

617 reference-genome-derived transcriptomes using identical BUSCO settings.

618 Gene counts were separately compared for each of the five gene categories

619 (complete, complete-single-copy, complete-duplicated, fragmented,

620 missing) using the coefficient of determination (Fig. S1). A phylogenetic

621 profile matrix was assembled representing the number of orthologous copies

622 detected for each of the 3640 BUSCO genes across 24 transcriptome

623 assemblies. Hierarchical clustering of the phylogenetic profile matrix was

624 performed using Manhattan distance and the “ward.D2” method

625 implemented in R v3.6.1 to obtain five main gene clusters and 26 sub-

626 clusters of main cluster III.

627 Comparative genome analysis 628 The five published(Pan et al. 2021) short-read based scaffold-level genome

629 assemblies of the Esocidae species Umbra pygmaea, Dallia pectoralis,

630 Novumbra hubbsi, Esox niger, and Esox masquinongy, as well as a

631 chromosome-scale reference genome assembly of Esox lucius, were used

632 for a comparative genome analysis (see Table S4 for accession numbers).

633 For each of the six genomes, a de novo repeat library was generated using

634 RepeatModeler 2.0.1, which yielded between 2,547 and 4,983 sequences

635 per species. These libraries were then combined and clustered using

636 usearch v11.0.667 with 80% similarity cutoff to obtain a single non-redundant

29

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

637 repeat library for the order Esociformes, which yielded a total of 14,856

638 sequences. Censor 4.2.29 was used to identify de novo sequences in

639 addition to the RepeatModeler classification and annotation with

640 RepeatMasker and RepBase 26.01. This library was then used as a

641 reference to annotate repeat sequences with RepeatMasker 4.1.1. Repeats

642 containing known TE coding domains were identified by predicting open

643 reading frames using Emboss v6.6.0.0 and comparing the resulting

644 sequences against the transposase protein sequence collection provided

645 with LTR_retriever(Ou & Jiang 2018) via blast. Requiring hits longer than 150

646 amino acids and with e-value < 1e-10 yielded 116 hits in LINE elements and

647 206 hits in sequences not classified as LINE elements, the latter of which

648 were retained for further analysis. This procedure was also used to extract

649 33 transposase sequences of six previously described neoteleost

650 Tc1/mariner lineages(Gao et al. 2017). All 239 sequences were then aligned

651 with Clustal Omega v1.2.4, and a maximum likelihood phylogenetic tree was

652 calculated with RAxML 8.2.12 using a variable time amino acid substitution

653 model with gamma distributed rate heterogeneity. Analysis and visualization

654 of all data was performed using R 3.6.1. The thirty most abundant Umbra

655 pygmaea repeat sequences of the de novo library which did not show

656 significant similarity with known TEs were selected for further manual

657 curation. Firstly, we tested if the unclassified sequences include local context

658 sequence in addition to the core repetitive tract. To that end, the repetitive

659 sequence was used as a query in a Blastn (Altschul et al. 1997) search

660 against the Umbra pygmaea genome assembly. Two or more matching loci

30

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

661 located on different contigs/scaffolds were extracted including 10 Kbp up-

662 and downstream of the matching sequence. These hit sequences were then

663 compared to each other in a dot plot analysis using the dotter (Sonnhammer

664 & Durbin 1995), allowing the exact definition of the entire repetitive element.

665 Complete elements were then analyzed for similarity with known TEs as

666 defined in the latest version of RepBase (Bao et al. 2015) and/or for the

667 presence of structural features typical of known TE elements such as the

668 inverted repeats of DNA TEs. This allowed the association of 13 elements to

669 a TE class including 2 cases with only diagnostic structural features (Table

670 S6). In addition, one satellite and one tandem repeat sequence were

671 identified, leaving a total of 14 sequences unclassified. The curated

672 sequences are provided at

673 https://github.com/roblehmann/umbridae_genome_expansion.

674 Chromosome analyses and fluorescence in situ hybridization 675 Telomeric DNA repeats on chromosomes of Umbra and Esox species were

676 detected by FISH, using a telomere PNA (peptide nucleic acid) FISH Kit/FITC

677 (DAKO, Denmark) according to the manufacturer’s protocol. Slides with the

678 metaphase spreads were washed with buffer (Tris-buffered saline, pH 7.5)

679 for 2 min, immersed in 3.7 % formaldehyde in 1× TBS for 2 min, washed

680 twice in TBS for 5 min each, and treated with the Pre-Treatment solution

681 including Proteinase K (DAKO) for 10 min. Afterward, slides were washed

682 twice in TBS buffer for 5 min each, dehydrated through a cold (-20 °C)

683 ethanol series (70 %, 85 %, 96 %) for 1 min, and air-dried at room

684 temperature. Ten µl of FITC PNA telomere probe mix (DAKO) was dropped

31

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

685 on the prepared slides and covered with the coverslip. Chromosomal DNA

686 was denatured at 85o C for 5 minutes under the coverslip in the presence of

687 the PNA probe. The hybridization reaction took place in the darkness at room

688 temperature for 90 minutes. After hybridization, the coverslips were gently

689 removed by immersion of slides in the Rinse Solution (DAKO) for 1 min.

690 Slides were then washed in the Wash Solution (DAKO) for 5 min at 65°C and

691 dehydrated by immersion through a series of cold ethanol washes of 70%,

692 85%, 96% for 1 min each and air-dried at room temperature. For the

693 counterstaining, chromosomes were mounted in the VECTASHIELD

694 Antifade Mounting Medium containing DAPI (Vector Laboratories, USA).

695 Metaphase plates were analyzed under a Zeiss Axio Imager A1 microscope

696 equipped with a fluorescent lamp and a digital camera. The images were

697 electronically processed using the Band View/FISH View software (Applied

698 Spectral Imaging, Galiliee, Israel). FISH with 5S rDNA probe was performed

699 according to (Fujiwara et al. 1998) with slight modification (Kirtiklis et al.

700 2014). A 5S rDNA probe was obtained via PCR with forward primer 5S-1: 5’

701 - TACGCC CGATCT CGT CCG ATC - 3’ and reverse primer 5S-2: 5’- CAG

702 GCTGGT ATG GCC GTA AGC - 3’ (Pendas et al. 1994). The PCR reaction

703 was carried out in a 50 µl reaction volume containing 1.25 U GoTaq Flexi

704 DNA Polymerase (Promega, USA), 10 µl of 5X Flexi Reaction Buffer

705 (Promega, USA), 100 µM of each dNTP (Promega, USA), 3 mM of MgCl2

706 (Promega, USA), 10 pM of each primer, 2 µl of DNA template, and nuclease-

707 free water. PCR product, obtained after 30 cycles of amplification and

708 annealing at 55 ˚C, was purified using the GeneElute PCR Clean-Up Kit

32

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

709 (Sigma-Aldrich, USA), then labeled with biotin-16-dUTP (Roche, Germany)

710 by nick-translation method (Roche, Switzerland). In situ hybridization with

711 150 ng of rDNA probe per slide was performed with RNase-pre-treated and

712 formamide-denaturated chromosome slides. A post-hybridization wash was

713 performed at 37˚C for 20 min. Chromosome slides were subjected to the

714 detection with avidin-FITC (Roche, Basel, Switzerland) and then

715 counterstained with DAPI in VECTASHIELD Antifade Mounting Medium

716 (Vector Laboratories, USA). At least 15 metaphase chromosome spreads

717 from each specimen were analyzed using a Nikon Eclipse 90i (Nikon, )

718 microscope equipped with epi-fluorescence. Pictures were acquired using a

719 monochromatic ProgRes MFcool camera (Jenoptic, Germany) controlled by

720 a Lucia software ver. 2.0 (Laboratory Imaging, Czech Republic). Post-

721 processing elaboration of all the pictures was made based on CorelDRAW

722 Graphics Suite 11 (Corel Corporation, Canada).

723 Southern and slot blot hybridizations of 5S rDNA 724 The procedure followed the protocol described by (Koukalova et al. 2010).

725 The 5S rDNA probe was a 243 bp insert of the 5S_Eci_a clone (GenBank

726 KX965716) from E. cisalpinus. The plasmid insert was amplified and labeled

727 with the 32P-dCTP (DekaPrime kit, Fermentas, Lithuania). The probe was

728 hybridized at high stringency conditions (washing 2x SSC, 0.1% SDS

729 followed by 0.1xSSC, 0.1% SDS at 65 °C). The hybridization signals were

730 visualized by Phosphor imaging (Typhoon 9410, GE Healthcare, PA, USA),

731 and signals were quantified using ImageQuant software (GE Healthcare, PA,

732 USA). The copy number of 5S rDNA genes was estimated using slot blot

33

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

733 hybridization. Briefly, the DNA concentration was estimated

734 spectrophotometrically at OD260nm using a Nanodrop 3300

735 fluorospectrometer (Thermo Fisher Scientific, USA). Concentrations were

736 verified by the electrophoresis in agarose gels using dilutions of lambda DNA

737 as standards. The three dilutions of genomic DNA (100, 50, and 25 ng),

738 together with serial dilutions of unlabelled plasmid inserts corresponding to

739 the 5S monomers (GenBank KX965715-6). Each aliquot was denatured in

740 0.4 M NaOH and blotted onto a positively charged Nylon membrane (Hybond

741 XC, GE Healthcare, USA) using a vacuum slot blotter (Schleicher-Schuell,

742 Germany). The probe and the hybridization conditions and visualization of

743 signals were the same as described above.

744 DNA methylation analysis of 5S rDNA 745 Purified genomic DNA samples of U. krameri (K1 - K3), U. limi (L1 - L3), and

746 from E. lucius (2 individuals as a control) were digested with methylation-

747 sensitive HpaII (sensitive to CG methylation) and its methylation-insensitive

748 MspI isoschizomere (both enzymes are cutting at CCGG). The restriction

749 fragments were hybridized on blots with the alpha[P32]dCTP-labelled 5S

750 rDNA probe. Control of digestion efficiency was carried out by spiking the

751 Esox genomic DNA with a non-methylated plasmid DNA (pBluescript,

752 Stratagen) and subsequent hybridization with a plasmid probe. Both MspI

753 and HpaII enzymes yielded expected restriction fragments (not shown).

754

755 Author Contributions

34

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

756 RS conceptualized and supervised the study and provided the cytogenomic 757 context; RS, AK, RL, and DKL methodized the study and wrote the original 758 draft; RL analyzed fish transcriptomes and genomes; RL and AZ analyzed 759 transposable elements; AK performed southern blot and slot blot 760 hybridizations; DKL performed the flow cytometry genome size 761 determination; JW provided the ecological and phylogenetic context of the 762 study and specimens for analyses; LB organized sampling, provided 763 specimens for analyses and reviewed the manuscript; KO, LK performed the 764 cytogenetic analyses; RL, RS, DKL, KO, LK, JW, JT wrote, reviewed and 765 edited the study; RS acquired funding for the study.

766 Acknowledgments

767 This study was supported by the Tyrolean funds project with contract Nr. 768 UNI-0404/2015 to RS and co-funded by the Erasmus+ program of the 769 European Union with contract Nr. 2019-1-CZ01-KA203-061433 to RS and 770 DKL. The authors are also grateful to the `Excelence projekt PřF UHK 771 2209/2018` and the Czech Science Foundation (19-03442S) for financial 772 support. 773 We acknowledge Petr Ráb for discussion on genome size in the Umbra 774 genus, Guillaume Côté for Umbra and Esox in Québec, and we also 775 acknowledge Maria Pichler of UIBK for her technical support.

776 Conflict of Interest

777 The authors declare no conflict of interest.

778 Data Availability

779 The de novo repeat library and manually curated sequences are available via 780 https://github.com/roblehmann/umbridae_genome_expansion

781 References

782 Altschul SF et al. 1997. Gapped BLAST and PSI-BLAST: A new 783 generation of protein database search programs. Nucleic Acids Res. 784 25:3389–3402. doi: 10.1093/nar/25.17.3389. 785 Arai R. 2011. Fish Karyotypes: A Check List. Springer, Japan. 786 Bao W, Kojima KK, Kohany O. 2015. Repbase Update, a database of 787 repetitive elements in eukaryotic genomes. Mob. DNA. 6:11. doi: 788 10.1186/s13100-015-0041-9.

35

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

789 Beamish RJ, Merrilees MJ, Crossman EJ. 1971. Karyotypes and 790 DNA values for members of the suborder Esocoidei (: 791 Salmoniformes). Chromosoma. 34:436–447. doi: 792 10.1007/BF00326315. 793 Bernatchez L, Wilson CC. 1998. Comparative phylogeography of 794 Nearctic and Palearctic fishes. Mol. Ecol. 7:431–452. doi: 795 10.1046/j.1365-294x.1998.00319.x. 796 Borisova OF, Razjivin AP, Zaregorodzev VI. 1974. Evidence for the 797 quinacrine fluorescence on three at pairs of DNA. FEBS Lett. 798 46:239–242. doi: 10.1016/0014-5793(74)80377-7. 799 Calton MS, Denton TE. 1974. Chromosomes of the chocolate 800 gourami: A cytogenetic anomaly. Science (80-. ). 185:618–619. doi: 801 10.1126/science.185.4151.618. 802 Crossman EJ, Ráb P. 2001. Chromosomal NOR phenotype and C- 803 banded karyotype of , Novumbra hubbsi 804 (: Umbridae). Copeia. 2001:860–865. doi: 10.1643/0045- 805 8511(2001)001[0860:CNPACB]2.0.CO;2. 806 Dion-Côté A-M et al. 2017. Standing chromosomal variation in Lake 807 Whitefish species pairs: the role of historical contingency and 808 relevance for speciation. Mol. Ecol. 26:178–192. doi: 809 10.1111/mec.13816. 810 Dodsworth S, Chase MW, Leitch AR. 2016. Is post-polyploidization 811 diploidization the key to the evolutionary success of angiosperms? 812 Bot. J. Linn. Soc. 180. doi: 10.1111/boj.12357. 813 Dolgin ES, Charlesworth B. 2008. The effects of recombination rate 814 on the distribution and abundance of transposable elements. 815 Genetics. 178:2169–2177. doi: 10.1534/genetics.107.082743. 816 Ellegren H. 2004. Microsatellites: Simple sequences with complex 817 evolution. Nat. Rev. Genet. 5:435–445. doi: 10.1038/nrg1348. 818 Elliott TA, Gregory TR. 2015. Do larger genomes contain more 819 diverse transposable elements? BMC Evol. Biol. 15:1–10. doi: 820 10.1186/s12862-015-0339-8. 821 Fedoroff N V. 2012. Transposable Elements, Epigenetics, and 822 Genome Evolution. Science (80-. ). 338:758–767. doi: 823 10.1126/science.338.6108.758. 824 Fontdevila A. 2011. The Dynamic Genome: A Darwinian Approach. 825 OUP Oxford.

36

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

826 Fujiwara A, Abe S, Yamaha E, Yamazaki F, Yoshida MC. 1998. 827 Chromosomal localization and heterochromatin association of 828 ribosomal RNA gene loci and silver-stained nucleolar organizer 829 regions in salmonid fishes. Chromosom. Res. 6:463–471. doi: 830 10.1023/A:1009200428369. 831 Gao B et al. 2017. Characterization of autonomous families of 832 Tc1/mariner transposons in neoteleost genomes. Mar. Genomics. 833 34:67–77. doi: 10.1016/j.margen.2017.05.003. 834 Garagna S et al. 1995. Robertsonian metacentrics of the house 835 mouse lose telomeric sequences but retain some minor satellite DNA 836 in the pericentromeric area. Chromosoma. 103:685–692. doi: 837 10.1007/BF00344229. 838 Gilbert C, Williams J. 2002. Field guide to fishes. North America. 839 Alfred A. Knopf, New York. 840 Grabherr MG et al. 2011. Full-length transcriptome assembly from 841 RNA-Seq data without a reference genome. Nat. Biotechnol. 29:644– 842 52. doi: 10.1038/nbt.1883. 843 Gregory TR. 2020. Genome Size Database. 844 http://www.genomesize.com. 845 Gregory TR. 2002. Genome size and developmental complexity. 846 Genetica. 115:131–146. doi: 10.1023/A:1016032400147. 847 Gregory TR, Hebert PDN. 1999. The modulation of DNA content: 848 Proximate causes and ultimate consequences. Genome Res. 9:317– 849 324. doi: 10.1101/gr.9.4.317. 850 Hardie DC, Hebert PDN. 2004. Genome-size evolution in fishes. 851 Can. J. Fish. Aquat. Sci. 61:1636–1646. doi: 10.1139/F04-106. 852 Hardie DC, Hebert PDN. 2003. The nucleotypic effects of cellular 853 DNA content in cartilaginous and ray-finned fishes. Genome. 854 46:683–706. doi: 10.1139/g03-040. 855 Hinegardner R. 1968. Evolution of Cellular DNA Content in Teleost 856 Fishes. Am. Nat. 102:517–523. doi: 10.1086/282564. 857 Hinegardner R, Rosen DE. 1972. Cellular DNA Content and the 858 Evolution of Teleostean Fishes. Am. Nat. 106:621–644. doi: 859 10.1086/282801. 860 Jensen-Seaman MI et al. 2004. Comparative recombination rates in 861 the rat, mouse, and human genomes. Genome Res. 14:528–538. 862 doi: 10.1101/gr.1970304.

37

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

863 Kent T V., Uzunović J, Wright SI. 2017. Coevolution between 864 transposable elements and recombination. Philos. Trans. R. Soc. B 865 Biol. Sci. 372. doi: 10.1098/rstb.2016.0458. 866 Khoshkholgh M, Alireza A, Sajad N. 2015. Karyotypic 867 characterization of the pike, Esox lucius from the south Caspian Sea 868 basin. Iran. J. Anim. Biosyst. 11:43–49. 869 Kirtiklis L, Ocalewicz K, Wiechowska M, Boroń A, Hliwa P. 2014. 870 Molecular cytogenetic study of the 871 amarus (Teleostei: : Acheilognathinae). Genetica. 872 142:141–148. doi: 10.1007/s10709-014-9761-x. 873 Koref-Santibanez S, Paepke H. 1994. Karyotypes of the 874 Trichogasterinae Liem (Teleostei, Anabantoidei). VIII Congr. Soc. 875 Eur. Ichthyol. 55. 876 Koukalova B et al. 2010. Fall and rise of satellite repeats in 877 allopolyploids of Nicotiana over c. 5 million years. New Phytol. 878 186:148–160. doi: 10.1111/J.1469- 879 [email protected]/(ISSN)1469- 880 8137(CAT)SPECIALISSUES(VI)PLANTPOLYPLOIDY. 881 Kuehne LM, Olden JD. 2014. Ecology and Conservation of 882 Mudminnow Species Worldwide. . 39:341–351. doi: 883 10.1080/03632415.2014.933318. 884 Lamatsch DK, Steinlein C, Schmid M, Schartl M. 2000. Noninvasive 885 determination of genome size and ploidy level in fishes by flow 886 cytometry: detection of triploid Poecilia formosa. Cytometry. 39:91– 887 95. doi: 10.1002/(SICI)1097-0320(20000201)39:2<91::AID- 888 CYTO1>3.0.CO;2-4. 889 Li JT et al. 2015. The fate of recent duplicated genes following a 890 fourth-round whole genome duplication in a tetraploid fish, common 891 (Cyprinus carpio). Sci. Rep. 5:1–9. doi: 10.1038/srep08199. 892 Liedtke HC, Gower DJ, Wilkinson M, Gomez-Mestre I. 2018. 893 Macroevolutionary shift in the size of amphibian genomes and the 894 role of life history and climate. Nat. Ecol. Evol. 2:1792–1799. doi: 895 10.1038/s41559-018-0674-4. 896 Lien S et al. 2016. The Atlantic salmon genome provides insights into 897 rediploidization. Nature. 533:200–205. doi: 10.1038/nature17164. 898 Lu J, Peatman E, Tang H, Lewis J, Liu Z. 2012. Profiling of gene 899 duplication patterns of sequenced teleost genomes: Evidence for 900 rapid lineage-specific genome expansion mediated by recent tandem

38

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

901 duplications. BMC Genomics. 13:1–10. doi: 10.1186/1471-2164-13- 902 246. 903 Lynch M. 2007. The Origins of Genome Architecture. Oxford 904 University Press (OUP) doi: 10.1093/jhered/esm073. 905 Macqueen DJ, Johnston IA. 2014. A well-constrained estimate for the 906 timing of the salmonid whole genome duplication reveals major 907 decoupling from species diversification. Proc. R. Soc. B Biol. Sci. 908 281:20132881. doi: 10.1098/rspb.2013.2881. 909 Mank JE, Avise JC. 2006. Phylogenetic conservation of chromosome 910 numbers in Actinopterygiian fishes. Genetica. 127:321–327. doi: 911 10.1007/s10709-005-5248-0. 912 Marburger S et al. 2018. Whole genome duplication and 913 transposable element proliferation drive genome expansion in 914 Corydoradinae catfishes. Proc. R. Soc. B Biol. Sci. 285:20172732. 915 doi: 10.1098/rspb.2017.2732. 916 Marić S et al. 2017. Phylogeography and population genetics of the 917 European mudminnow (Umbra krameri) with a time-calibrated 918 phylogeny for the family Umbridae. Hydrobiologia. 792:151–168. doi: 919 10.1007/s10750-016-3051-9. 920 Martin-Bergmann KA, Gee JH. 1985. The central mudminnow, 921 Umbra limi (Kirtland), a habitat specialist and resource generalist. 922 Can. J. Zool. 63:1753–1764. doi: 10.1139/z85-264. 923 Meyne J et al. 1990. Distribution of non-telomeric sites of the 924 (TTAGGG)n telomeric sequence in vertebrate chromosomes. 925 Chromosoma. 99:3–10. doi: 10.1007/BF01737283. 926 Mugal CF, Weber CC, Ellegren H. 2015. GC-biased gene conversion 927 links the recombination landscape and demography to genomic base 928 composition. BioEssays. 37:1317–1326. doi: 929 10.1002/bies.201500058. 930 Nam K, Ellegren H. 2012. Recombination Drives Vertebrate Genome 931 Contraction Petrov, DA, editor. PLoS Genet. 8:e1002680. doi: 932 10.1371/journal.pgen.1002680. 933 Nanda I, Schneider-Rasp S, Winking H, Schmid M. 1995. Loss of 934 telomeric sites in the chromosomes of Mus musculus domesticus 935 (Rodentia: Muridae) during Robertsonian rearrangements. 936 Chromosom. Res. 3:399–409. doi: 10.1007/BF00713889. 937 Nelson JS, Grande TC, Wilson MVH. 2016. Fishes of the World, 5th 938 Edition | Wiley.

39

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

939 Ocalewicz K et al. 2013. Pericentromeric location of the telomeric 940 DNA sequences on the European grayling chromosomes. Genetica. 941 141:409–416. doi: 10.1007/s10709-013-9740-7. 942 Ocalewicz K. 2013. Telomeres in fishes. Cytogenet. Genome Res. 943 141:114–125. doi: 10.1159/000354278. 944 Ohno S. 2013. Evolution by Gene Duplication. Springer Berlin 945 Heidelberg. 946 Ou S, Jiang N. 2018. LTR_retriever: A highly accurate and sensitive 947 program for identification of long terminal repeat retrotransposons. 948 Plant Physiol. 176:1410–1422. doi: 10.1104/pp.17.01310. 949 Pan Q et al. 2021. The rise and fall of the ancient northern pike 950 master sex determining gene. Elife. 10:1–50. doi: 951 10.7554/eLife.62858. 952 Pasquier J et al. 2016. Gene evolution and gene expression after 953 whole genome duplication in fish: the PhyloFish database. BMC 954 Genomics. 17:368. doi: 10.1186/s12864-016-2709-z. 955 Van de Peer Y, Mizrachi E, Marchal K. 2017. The evolutionary 956 significance of polyploidy. Nat. Rev. Genet. 18:411–424. doi: 957 10.1038/nrg.2017.26. 958 Pendas AM, Moran P, Freije JP, Garcia-Vazquez E. 1994. 959 Chromosomal mapping and nucleotide sequence of two tandem 960 repeats of Atlantic salmon 5S rDNA. Cytogenet. Genome Res. 961 67:31–36. doi: 10.1159/000133792. 962 Petrov DA, Sangster TA, Johnston JS, Hartl DL, Shaw KL. 2000. 963 Evidence for DNA loss as a determinant of genome size. Science 964 (80-. ). 287:1060–1062. doi: 10.1126/science.287.5455.1060. 965 Post A. 1974. Ergebnisse der Forschungsreisen des FFS “Walther 966 Herwig” nach Südamerika. XXXIV. Die Chromosomen von drei Arten 967 aus der Familie Gonostomatidae (Osteichthyes, Stomiatoidei). Arch 968 Fischwiss. 25:51–55. 969 Pritham EJ. 2009. Transposable elements and factors influencing 970 their success in eukaryotes. J. Hered. 100:648–655. doi: 971 10.1093/jhered/esp065. 972 Qumsiyeh MB. 1994. Evolution of number and morphology of 973 mammalian chromosomes. J. Hered. 85:455–465. doi: 974 10.1093/oxfordjournals.jhered.a111501. 975 Ráb P. 2004. Karyotype evolution in fishes of the order Esociformes.

40

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

976 Rab P, Crossman EJ. 1994. Chromosomal NOR phenotypes in North 977 American pikes and pickerels, genus Esox, with notes on the 978 Umbridae (Euteleostei: Esocae). Can. J. Zool. 72:1951–1956. doi: 979 10.1139/z94-265. 980 Rondeau EB et al. 2014. The Genome and Linkage Map of the 981 Northern Pike (Esox lucius): Conserved synteny revealed between 982 the salmonid sister group and the Yin, T, editor. PLoS 983 One. 9:e102089. doi: 10.1371/journal.pone.0102089. 984 Rozenfeld C et al. 2019. De novo European eel transcriptome 985 provides insights into the evolutionary history of duplicated genes in 986 teleost lineages Vaudry, H, editor. PLoS One. 14:e0218085. doi: 987 10.1371/journal.pone.0218085. 988 Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. 2018. 989 Chromosome evolution at the origin of the ancestral vertebrate 990 genome. Genome Biol. 19:166. doi: 10.1186/s13059-018-1559-1. 991 Sehr M, Keckeis H. 2017. Habitat use of the European mudminnow 992 Umbra krameri and association with other fish species in a 993 disconnected Danube side arm. J. Fish Biol. 91:1072–1093. doi: 994 10.1111/jfb.13402. 995 Shao F, Han M, Peng Z. 2019. Evolution and diversity of 996 transposable elements in fish genomes. Sci. Rep. 9:1–8. doi: 997 10.1038/s41598-019-51888-1. 998 Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V, Zdobnov 999 EM. 2015. BUSCO: assessing genome assembly and annotation 1000 completeness with single-copy orthologs. Bioinformatics. 31:3210–2. 1001 doi: 10.1093/bioinformatics/btv351. 1002 Skog A, Vøllestad LA, Stenseth NC, Kasumyan A, Jakobsen KS. 1003 2014. Circumpolar phylogeography of the northern pike (Esox lucius) 1004 and its relationship to the (E. reichertii). Front. Zool. 11:67. 1005 doi: 10.1186/s12983-014-0067-8. 1006 Slijepcevic P. 1998. Telomeres and mechanisms of Robertsonian 1007 fusion. Chromosoma. 107:136–140. doi: 10.1007/s004120050289. 1008 Sochorová J, Garcia S, Gálvez F, Symonová R, Kovařík A. 2018. 1009 Evolutionary trends in animal ribosomal DNA loci: introduction to a 1010 new online database. Chromosoma. 127:141–150. doi: 1011 10.1007/s00412-017-0651-8. 1012 Sonnhammer ELL, Durbin R. 1995. A dot-matrix program with 1013 dynamic threshold control suited for genomic DNA and protein

41

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1014 sequence analysis. Gene. 167:GC1. doi: 10.1016/0378- 1015 1119(95)00714-8. 1016 Sun C et al. 2012. LTR retrotransposons contribute to genomic 1017 gigantism in plethodontid salamanders. Genome Biol. Evol. 4:168– 1018 183. doi: 10.1093/gbe/evr139. 1019 Symonová R et al. 2013. Genome differentiation in a species pair of 1020 coregonine fishes: An extremely rapid speciation driven by stress- 1021 activated retrotransposons mediating extensive ribosomal DNA 1022 multiplications. BMC Evol. Biol. 13:42. doi: 10.1186/1471-2148-13- 1023 42. 1024 Symonová R et al. 2017. Higher-order organisation of extremely 1025 amplified, potentially functional and massively methylated 5S rDNA in 1026 European pikes (Esox sp.). BMC Genomics. 18:391. doi: 1027 10.1186/s12864-017-3774-7. 1028 Tenaillon MI, Hollister JD, Gaut BS. 2010. A triptych of the evolution 1029 of plant transposable elements. Trends Plant Sci. 15:471–478. doi: 1030 10.1016/j.tplants.2010.05.003. 1031 Vendrely R, Vendrely C. 1953. Argenine an deoxyribonucleic acid 1032 content of erythrocyte nuclei and sperms of some species of fishes. 1033 Nature. 172:30–31. 1034 Vendrely R, Vendrely C. 1952a. Sur la teneur compare en arginine et 1035 en acide desoxyribonucleique des noyaux d’erythrocytes de 1036 quelques especes de poissons. Comptes Rendus l’Academie des 1037 Sci. 235:395–397. 1038 Vendrely R, Vendrely C. 1952b. Sur la teneur individuelle en arginine 1039 des spermatozoedes compare la teneur individuelle en arginine des 1040 noyaux d’erythrocytes chez quelques especes de poissons. Comptes 1041 Rendus l’Academie des Sci. 235:444–446. 1042 Vinogradov AE. 1998. Genome size and GC-percent in vertebrates 1043 as determined by flow cytometry: The triangular relationship. 1044 Cytometry. 31:100–109. doi: 10.1002/(SICI)1097- 1045 0320(19980201)31:2<100::AID-CYTO5>3.0.CO;2-Q. 1046 Wanzenböck J, Spindler T. 1995. Rediscovery of Umbra krameri 1047 (Walbaum, 1972) in Austria and subsequent investigations. Ann. des 1048 Naturhistorischen Museums Wien. Ser. B für Bot. und Zool. 97:450– 1049 457. 1050 Wong WY et al. 2019. Expansion of a single transposable element 1051 family is associated with genome-size increase and radiation in the

42

bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1052 genus Hydra. Proc. Natl. Acad. Sci. U. S. A. 116:22915–22917. doi: 1053 10.1073/pnas.1910106116. 1054

43