bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1 DNA transposon expansion is
2 associated with genome size increase
3 in mudminnows
4 5 Robert Lehmann1, Aleš Kovařík2, Konrad Ocalewicz3, Lech Kirtiklis4, 6 Andrea Zuccolo5,6, Jesper N. Tegner1, Josef Wanzenböck7, Louis 7 Bernatchez8, Dunja K. Lamatsch7, and Radka Symonová9,10 8 9 1 Division of Biological and Environmental Sciences & Engineering, 10 Computer, Electrical and Mathematical Sciences and Engineering Division, 11 King Abdullah University of Science and Technology, Thuwal, 23955-6900, 12 Kingdom of Saudi Arabia 13 2 Laboratory of Molecular Epigenetics, Institute of Biophysics, Czech 14 Academy of Science, Královopolská 135, 61265, Brno, Czech Republic 15 3 Department of Marine Biology and Ecology, Institute of Oceanography, 16 Faculty of Oceanography and Geography, University of Gdansk, Gdansk, 17 Poland 18 4 Department of Zoology, Faculty of Biology and Biotechnology, University 19 of Warmia and Mazury, M. Oczapowskiego Str. 5, 10-718, Olsztyn, Poland 20 5 Center for Desert Agriculture, Biological and Environmental Sciences & 21 Engineering Division (BESE), King Abdullah University of Science and 22 Technology, Thuwal 23955-6900, Saudi Arabia 23 6 Institute of Life Sciences, Scuola Superiore Sant’Anna, Pisa 56127, Italy 24 7 Research Department for Limnology Mondsee, University of Innsbruck, A 25 – 5310 Mondsee, Austria 26 8 IBIS (Institut de Biologie Intégrative et des Systèmes), Université Laval, 27 Québec, QC, Canada 28 9 Department of Bioinformatics, Wissenschaftzentrum Weihenstephan, 29 Technische Universität München, Freising, Germany 30 10Faculty of Biology, University of Hradec Kralove, Czech Republic 31 Correspondence: 32 Radka Symonova ([email protected]) 33 Word Count: 6766
1
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
34 Abstract
35 Genome sizes of eukaryotic organisms vary substantially, with whole
36 genome duplications (WGD) and transposable element expansion acting as
37 main drivers for rapid genome size increase. The two North American
38 mudminnows, Umbra limi and U. pygmaea, feature genomes about twice the
39 size of their sister lineage Esocidae (e.g., pikes and pickerels). However, it
40 is unknown whether all Umbra species share this genome expansion and
41 which causal mechanisms drive this expansion. Using flow cytometry, we
42 find that the genome of the European mudminnow is expanded similarly to
43 both North American species, ranging between 4.5-5.4 pg per diploid
44 nucleus. Observed blocks of interstitially located telomeric repeats in Umbra
45 limi suggest frequent Robertsonian rearrangements in its history.
46 Comparative analyses of transcriptome and genome assemblies show that
47 the genome expansion in Umbra is driven by extensive DNA transposon
48 expansion without WGD. Furthermore, we find a substantial ongoing
49 expansion of repeat sequences in the Alaska blackfish Dallia pectoralis, the
50 closest relative to the family Umbridae, which might mark the beginning of a
51 similar genome expansion. Our study suggests that the genome expansion
52 in mudminnows, driven mainly by transposon expansion, but not WGD,
53 occurred before the separation into the American and European lineage.
54 55 56 Key Words: Genome expansion, Umbra, Robertsonian fusion, centric 57 fission, repetitive sequences. 58
2
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
59 Significance Statement: 60 North American mudminnows feature genomes about twice the size of their
61 sister lineage Esocidae (e.g., pikes and pickerels). However, neither the
62 mechanism underlaying this genome expansion, nor whether this feature is
63 shared amongst all mudminnows is currently known. Using cytogenetic
64 analyses, we find that the genome of the European mudminnow also
65 expanded and that extensive chromosome fusion events have occurred in
66 some Umbra species. Furthermore, comparative genomics based on de-
67 novo assembled transcriptomes and genome assemblies, which have
68 recently become available, indicates that DNA transposon activity is
69 responsible for this expansion.
70 Introduction
71 The driving forces and effects of genome size variations across different taxa
72 are a recurring theme in the field of evolutionary biology (Lynch 2007). While
73 the number of chromosomes is typically highly conserved among teleost
74 fishes (Mank & Avise 2006), genome sizes vary substantially and include the
75 smallest and the largest vertebrate genome (Hardie & Hebert 2004).
76 Specifically, fish genome size ranges from C = 0.35 pg in bandtail pufferfish
77 (Sphoeroides spengleri) to C = 133 pg in marbled lungfish (Protopterus
78 aethiopicus) (Gregory 2020). The main candidate processes introducing
79 such drastic variation are gene-(Ohno 2013; Lu et al. 2012) or genome
80 duplication(Fontdevila 2011; Van de Peer et al. 2017), transposable element
81 (TE) proliferation (Pritham 2009; Tenaillon et al. 2010), and replication
82 slippage at tandem repeat loci(Ellegren 2004). Furthermore, it is unclear
3
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
83 whether this variation is shaped by adaptive processes (Gregory & Hebert
84 1999; Liedtke et al. 2018; Van de Peer et al. 2017) or stochastic sequence
85 gain and loss(Lynch 2007). The presence of TEs in virtually all eukaryotic
86 genomes suggests a role of stochastic sequence gain due to TE proliferation
87 (Elliott & Gregory 2015). There are several recent reports of significant
88 genomic expansion, such as in salamanders due to long terminal repeat
89 element activity (Sun et al. 2012) and in Hydra due to long interspersed
90 nuclear element activity (Wong et al. 2019). Estimates of the pervasiveness
91 and extent of expansion events caused by TE proliferation and other
92 processes will provide insight into the forces that shape genome size
93 evolution.
94 Mudminnows (Umbridae) are very resilient members of the order
95 Esociformes, capable of withstanding extreme cold and can utilize
96 atmospheric oxygen. Under adverse conditions, they are reputed to become
97 dormant in mud (Gilbert & Williams 2002). The order Esociformes includes
98 two families and four genera in at least 12 species with one fossil-only family
99 (Nelson et al. 2016). Furthermore, Esociformes (Haplomi, Esocae; pikes and
100 mudminnows) and its sister order Salmoniformes belong to the clade
101 Protacanthopterygii, a lineage of basal teleosts (Nelson et al. 2016). Genome
102 sizes of both North American Umbra species were determined already in the
103 60s to be C = 2.52 - 2.70 pg for Umbra limi and C = 2.4 pg for U. pygmaea
104 (Hinegardner 1968). The genome size of the remaining representatives of
105 the order Esociformes, including Esox, Novumbra, and Dallia, ranges from
106 0.85 (in Esox) to 1.4 (also in Esox; summarized by (Gregory 2020); details in
4
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
107 Table 1). While two of the three extant Umbra species feature a genome size
108 twice the size of the remaining esociform species, it is currently not known
109 whether this is a universal feature of the Umbridae and can be expected in
110 U. krameri.
111 The diploid chromosome number in Esociformes ranges from 2n = 22 (U.
112 pygmaea and U. limi) to 2n = 71 - 79 (Dallia sp.). Almost all known pike (Esox)
113 species possess 50 (fundamental number FN = 50) usually acrocentric
114 chromosomes, including species from Canada, USA, Sweden, and Russia
115 (Arai 2011) and two European Esox species (Symonová et al. 2017). Such
116 karyotype is considered the pre-duplication ancestor of the order
117 Salmoniformes (salmon, grayling, whitefish) (Rondeau et al. 2014; Ráb
118 2004). There are, however, reports of FN = 86 with 2n = 50 in E. lucius from
119 the south Caspian Sea basin and further reports of FN = 72 - 88 in other E.
120 lucius populations (Khoshkholgh et al. 2015). This morphological variability
121 of chromosomes in the Esox genus with its broad circumpolar distribution
122 (Nelson et al. 2016) has not yet been analyzed.
123 Pikes are highly interesting from the cytogenomic viewpoint because of their
124 extremely amplified, massively methylated, and potentially functional 5S
125 rDNA recorded on more than 30 centromeric sites of their 50 chromosomes,
126 whereas 45S rDNA is located in only two sites (Symonová et al. 2017). A
127 similar amplification and increased dynamics of the 45S rDNA fraction has
128 been repeatedly recorded in Salmoniformes 12,13, which furthermore
129 underwent an already well-characterized whole-genome duplication (WGD)
130 (Macqueen & Johnston 2014; Lien et al. 2016). Indeed, WGD events have
5
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
131 been described for multiple basal, particularly freshwater teleosts such as
132 Cypriniformes (Li et al. 2015) and Siluriformes (Marburger et al. 2018). This
133 observation suggests two distinct explanations for the significant genome
134 size increase of Umbridae compared to its sister lineage Esocidae, an
135 Umbridae-specific WGD on the one hand, and an extreme amplification of
136 ribosomal DNA on the other.
137 Here, we present cytogenomic evidence that all extant members of the
138 Umbra lineage feature an expanded genome size compared to its closest
139 relative, the Esocidae family. Furthermore, we find that Umbridae feature a
140 standard locus and copy number of 5S rDNA typical for teleosts, excluding a
141 5S rDNA expansion as a reason for the Umbra-specific genome. While
142 phylogenetic profiles of orthologous genes across species do not show
143 patterns of WGD, comparative analyses of repetitive sequence content
144 across six Esocidae species show a clear signal of DNA transposon
145 expansion in Umbra pygmaea driving the genome size increase.
146 Results
147 Flow cytometry genome size determination in the European pike 148 and the European mudminnow 149 Flow cytometry measurements using DAPI staining of U. krameri with
150 chicken RBC as internal standard exhibited a genome size of 2C = 4.54 ±
151 0.05 pg (±STDEV) per nucleus (CV% 3.36 ± 0.55; range 4.43-4.60 pg per
152 nucleus, N=10). E. lucius gave genome values of 2C = 2.22 ± 0.03 pg per
153 nucleus (CV% 2.44 ± 1.37, range 2.18 - 2.27 pg per nucleus, N=2). Genome
154 size and chromosome traits (2n and FN) for the order Esociformes are
6
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
155 summarized in Fig. 1 and show the elevated genome size in all three Umbra
156 species. The higher interquartile range in Esox lucius reflects merely the
157 higher sampling effort and multiple values available. A similar situation exists
158 in Umbra limi. We provide an overview of cytogenomic and cytogenetic traits
159 of the order Esociformes (Table 1) based on the checklist of fish karyotypes
160 (Arai 2011).
161
162
163 164 Fig.1. Genome size (C-value) (left) and chromosome (2n) and chromosome 165 arms (FN) counts (right) in esociform fish. 166
167 Table 1. Cytogenomic data for extant species of the order Esociformes from 168 www.genomesize.com, www.animalrdnadatabase.com, literature, and results 169 obtained in this study. C- val 45S 5S Family Species (pg) 2n/FN Method rDNA rDNA Reference E. americanus (Beamish et al. 1971; Rab & Esocidae 1.18 50/72 FD
americanus Crossman 1994) E. americanus 1.06/ (Beamish et al. 1971; Hardie & Esocidae 50 FIA, FD vermiculatus 1.12 Hebert 2004) (Vendrely & Vendrely 1953, Esocidae E. lucius 0.97 BCA 1952a)
Esocidae E. lucius 0.98 BCA (Vendrely & Vendrely 1952b)
7
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Esocidae E. lucius 1.13 FIA (Hardie & Hebert 2003, 2004) 50/ (Vinogradov 1998;
Esocidae E. lucius 1.15 FCM 50-58 Khoshkholgh et al. 2015)
Esocidae E. lucius 1.36 FD (Beamish et al. 1971) present study;
Esocidae E. lucius 1.11 50/50 FCM 2 30-38 (Symonová et al. 2017)
Esocidae E. cisalpinus 50/50 N/A 2 30-34 (Symonová et al. 2017) E. 50/ (Beamish et al. 1971; Rab & Esocidae 1.28 FD 2
masquinongy 76-78 Crossman 1994) (Hardie & Hebert 2003, 2004); Esocidae E. niger 0.92 FIA 4 present study (Beamish et al. 1971) Esocidae E. niger 1.18 50/74 FD 2 (Beamish et al. 1971) Esocidae E. reichertii 1.28 50 FD
Dallia (Beamish et al. 1971; Rab & Esocidae 1.26 74-78/ FD 6 pectoralis Crossman 1994) (Beamish et al. 1971; Crossman Novumbra Esocidae 1.04 48/62 FD 12 hubbsi & Ráb 2001) (Beamish et al. 1971); present
Umbridae U. limi 2.52 FD 3 study
Umbridae U. limi 2.56 FIA (Hardie & Hebert 2004) (Hinegardner 1968; Hinegardner Umbridae U. limi 2.70 22/44 BFA 2 & Rosen 1972) present study; reviewed in (Arai Umbridae U. krameri 2.27 44/44 FCM 2011) Umbridae U. pygmaea 2.41 22/44 FD 2-4 (Beamish et al. 1971) 170 Abbreviations used: Method: BCA – biochemical analysis, BFA – Bulk fluorometric assay, FD 171 – Feulgen densitometry, FIA - Feulgen image analysis densitometry, FCM – Flow cytometry, 172 N/A – not available. 173
174 Comparative analysis of the U. pygmaea transcriptome does not 175 indicate whole genome duplication 176 To test whether the Umbridae family underwent a whole-genome duplication
177 event (WGD), we assessed gene duplication levels in a set of 24 de novo
178 transcriptome assemblies, including the eastern mudminnow U. pygmaea
179 (Pasquier et al. 2016). We focused on a set of 3640 benchmark universal
180 single-copy orthologous (BUSCO) genes, which occur only once in the
8
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
181 majority (>90%) of a set of 26 selected Actinopterygii species (Simão et al.
182 2015). Since only a single salmonid species, Salmo salar, was used to
183 construct the BUSCO set, genes that were duplicated in the salmonid-
184 specific WGD(Macqueen & Johnston 2014) were not excluded. Similarly, as
185 this gene set does not consider Umbridae family species, any potential
186 ohnologs originating from an Umbridae-specific WGD are not excluded.
187 Accordingly, five of the six salmonid de novo transcriptome assemblies show
188 the highest duplication level amongst the 24 considered species (Fig. S2,
189 Table S1). In contrast, U. pygmaea exhibits the second-lowest duplication
190 level, second only to Pangasius hypophthalmus. The elevated duplication
191 level in European eel Anguilla anguilla and arowana Osteoglossum
192 bichirrosum were noted recently, suggesting an additional WGD as a
193 possible cause(Rozenfeld et al. 2019).
194 Transcriptome de novo assembly procedures include steps that can
195 influence the observed number of duplicated genes, such as the clustering
196 of the assembled contigs(Pasquier et al. 2016; Grabherr et al. 2011).
197 Therefore, we collected all available reference genome assemblies matching
198 the species of any of the transcriptome assemblies. The direct comparison
199 of BUSCO estimates for de novo assemblies and reference genome-derived
200 transcriptomes across 10 species revealed a high correspondence of the
201 duplication level (R2 = 0.8) (Fig. S1, Table S2). In contrast, parameters
202 reflecting technical differences of transcriptome generation show
203 substantially lower correspondence, such as the number of fragmented (R2
204 = 0.02) or missing transcripts (R2 = 0.28). Clustering of BUSCO gene
9
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
205 phylogenetic profiles across all considered species provides a more detailed
206 picture (Fig. 2). While a small group of 240 transcripts fails to assemble
207 reliably (cluster I), likely reflecting specific sequence properties, the second
208 group of 819 genes is detected as single-copy in the majority of species
209 (cluster II). The largest cluster III, comprised of 2096 genes, exhibits species-
210 specific duplication patterns. Cluster IV contains 301 genes duplicated
211 mainly in salmonids and thus likely represent remnants of the salmonid-
212 specific WGD (cluster IV). The remaining 184 genes appear almost
213 consistently duplicated or triplicated across all considered species, again
214 likely reflecting sequence properties (cluster V). Both members of the
215 Esocidae, the eastern mudminnow U. pygmaea, and the Northern pike Esox
216 lucius are placed next to each other in the hierarchical species clustering.
217 Separating cluster III into subclusters reveals that both the eastern
218 mudminnow and the Northern pike feature only small non-overlapping
219 groups of species-specific duplicated genes containing 47 and 76 genes,
220 respectively (Fig. S3, Table S3).
221
III III IV V American whitefish Amiiformes European whitefish Rainbow trout Anguilliformes Grayling Brook trout Beloniformes Brown trout Characiformes Arowana Butterfly fish Clupeiformes Elephantnose fish European eel Cypriniformes Black ghost Panga Esociformes Mexican cave tetra Gadiformes Mexican surface tetra Medaka Gymnotiformes European perch Zebrafish Lepisosteiformes Allis shad Northern pike Osmeriformes Eastern mudminnow Salmoniformes Ayu Gar Osteoglossiformes Bowfin Atlantic cod Siluriformes Order Perciformes 222 single-copy duplicated triplicated fragmented missing
10
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
223 Fig.2. Phylogenetic profiling of universal single-copy orthologous genes in 224 24 de novo transcriptome assemblies. Genes are grouped into five clusters 225 (top, I-V) by profile similarity. The hierarchical clustering tree for all species 226 is shown on the left together with the corresponding color-coded order. 227 228
229 Comparative genome analyses of Esociformes species reveal 230 extensive transposable element expansions 231 We performed a direct comparative analysis of the genome composition of
232 U. pygmaea using recently published short-read-based genome assemblies
233 of five Esociformes species(Pan et al. 2021) and, in addition, the high-quality
234 chromosome scale genome assembly of E. lucius (Table S4). Due to the
235 employed short-read sequencing and relatively low coverages, the
236 assemblies for U. pygmaea, E. masquinongy, E. niger, N. hubbsi, and D.
237 pectoralis are fragmented and less complete (11 - 24% BUSCO genes
238 missing) compared to the chromosome-scale assembly of E. lucius (3%
239 missing). Nonetheless, the genome assembly sizes correspond well to direct
240 measurements of genome size (Fig. 3A). In agreement with the
241 transcriptome analysis, the assessment of BUSCO genes in all six genome
242 assemblies shows low levels of duplication. We then characterized the
243 repetitive sequence content of each genome aiming to test the hypothesis
244 that transposable element activity might have caused a genome expansion
245 exclusively in U. pygmaea. We, therefore, annotated repetitive sequences in
246 all six genomes using a combined de novo repeat library (see Methods). The
247 observed repeat content in each genome scales with the total genome size
248 (Fig. 3B), with the highest repeat content of 64% in the largest genome of U.
249 pygmaea and 37% of repeat content in the smallest genome of N. hubbsi
11
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
250 (Fig. 3C, stacked plot on the left). In general, DNA transposons make up the
251 largest part of classified repeat sequences in all analyzed species,
252 contributing about 19% genomic sequence in U. pygmaea, E. lucius, and E.
253 masquinongy (Tab. S4). In contrast, E. niger, N. hubbsi, and D. pectoralis
254 show lower DNA transposon abundances with 14.8%, 10.8%, and 8%,
255 respectively. A similar pattern is observed for LINE elements, while SINE
256 elements and LTRs are generally low in abundance. A principal component
257 analysis of repeat abundances across all six species reveals that the
258 repetitive sequence composition mirrors the phylogenetic relationships (Fig.
259 S4), with high similarity between E. lucius and E. masquinongy, as well as
260 between N. hubbsi and D. pectoralis. Both, U. pygmaea and E. niger,
261 constitute outliers in repeat composition. Importantly, we observe a
262 significant expansion of highly repetitive and species-specific sequences
263 awaiting further characterization in U. pygmaea with 30.8% genomic
264 abundance, compared to abundances in the range of 15.6-21.2% in all other
265 species.
266 Similar to the repeat composition, repeat expansion histories also show
267 significant differences between the considered species (Fig. 3C). Here, the
268 repeat expansion history is captured by the Kimura substitution level for each
269 genomic copy of a repeat as a proxy for its age, since each copy accumulates
270 mutations over time and diverges from its consensus sequence. The
271 resulting divergence landscape suggests that the genome size increase of
272 U. pygmaea results from a recent spike in DNA transposon activity combined
273 with the accumulation of unclassified repetitive sequence, with 45% of the
12
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
274 genomic repeat sequence showing a mean divergence below 8 (Fig. 3C, top
275 left). The most abundant repeat sequence in U. pygmaea is the Tc1/mariner
276 element IC-Mifl (upygm-1-8) which makes up 1% (20.5 Mb) of the genome
277 assembly with an average divergence of 8 (Table S5). The second most
278 abundant repeat Tc1-2_Xt (upygm-1-14), with 14 Mb and 0.7 % of genomic
279 sequence, is also member of the Tc1/mariner family and shows a very recent
280 expansion pattern with an average divergence of 1.34. E. masquinongy and
281 E. lucius also show a notable recent expansion peak driven by DNA
282 transposon activity. Similar to U. pygmaea, the expansion of E. lucius is also
283 driven in part by IC-Mifl which contributes 1.3 % / 11.7 Mb to the total genome
284 assembly, making it the third most abundant repeat only surpassed by
285 Mariner-9 originally identified in Salmo salar and an RTE-1 non-long-
286 terminal-repeat element identified in E. lucius (Table S5). The prominence of
287 IC-Mifl notwithstanding, the high total abundance of DNA transposon
288 sequence in the aforementioned species is caused by a multitude of active
289 families. While the total repeat content of D. pectoralis is low, the divergence
290 landscape reveals a currently ongoing repeat expansion with 27% of the total
291 repeat content at divergences below 4, which translates into 9.1 % of the
292 total genome sequence. In contrast to all other species, the expansion is
293 caused by two LINE/RTE-BovB elements with high abundance (0.5 and
294 0.2%, respectively, of total genome) and low mean divergence (1.16 and
295 0.87) in combination with rRNA, tRNA, and unclassified sequences. This
296 leaves N. hubbsi as the only species with a consistently low transposable
297 element activity.
13
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
298 299 A B 2.00 ● ● U.pygmaea 60 U.pygmaea 1.75 55 ● 1.50 50 E.masquinongy ● E.lucius 1.25 E.masquinongy 45 E.niger ● ● E.lucius D.pectoralis 1.00 N.hubbsi ● ● ● 40 D.pectoralis ● ● N.hubbsi E.niger ● 0.75 [%] Content Repeat Total Genome Assembly Size [GB] Size Assembly Genome 1.0 1.5 2.0 1.0 1.5 2.0 Haploid Genome Size [pg] (FD) Haploid Genome Size [pg] (FD) C U. pygmaea E. masquinongy 60 5 60 5 4 4 40 3 40 3 20 2 20 2 1 1 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50
E. lucius E. niger 60 5 60 5 4 4 40 3 40 3 20 2 20 2 1 1 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50
D. pectoralis N. hubbsi DNA RC 5 5 60 60 LTR 4 4 LINE 40 3 40 3 SINE 2 2 20 20 Other 1 1 Unknown 0 0 0 0 Percent of genome of Percent 0 10 20 30 40 50 0 10 20 30 40 50 300 Divergence Divergence 301 Fig.3. Genomic repeat sequence content and divergence landscapes in six 302 Esociformes species. A) Genome size measurements by Feulgen 303 densitometry compared to genome assembly size (Table S4). B) Genome 304 assembly size compared to repetitive sequence content. C) Total repetitive 305 sequence content (left) and repeat copy divergence landscapes (right) as 306 fraction of the total genome assembly for each of the six species, 307 distinguishing the main repetitive sequence classes. 308 309 310 To gain a more detailed overview of the Tc1/mariner transposon superfamily
311 activity, we constructed a phylogeny of all transposase sequences in our
14
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
312 repeat library together with transposase sequences of a previously described
313 Tc1/mariner reference set(Gao et al. 2017) in neoteleosts. This phylogeny
314 was then complemented with the detected genomic abundances in the
315 considered species (Figure 4). We find that most highly abundant elements
316 fall into the Minos-/Bari-like clade (Figure 4 clade A), such as the two most
317 abundant elements of U. pygmaea (upygm-1-8, upygm-1-14), E.
318 masquinongy (eluci-6-168), and N. hubbsi (eluci-4-461). Other more
319 abundant lineages are passport (Figure 4 clade B and D), with the most
320 abundant element eluci-1-0 in E. lucius, and frog prince (Figure 4 clade C,
321 upygm-1-69, eluci-6-2394).
322 Finally, we performed a manual curation of the 30 most abundant repeat
323 sequences found in U. pygmaea, contributing a combined 101.87 Mb to the
324 genome assembly to test the extent to which the unclassified repeat
325 sequence expansion can also be attributed to transposable element activity.
326 We find TE-related structural features in 13 repeats, accounting for 48 Mb,
327 in addition to a highly abundant tandem repeat and a satellite sequence
328 (Table S6). With 11 out of 13 TE-related sequences, the vast majority shows
329 DNA transposon features. In general, we observe a consistent pattern of
330 increased abundance across species in a small group of Tc1/mariner
331 elements mirroring the pattern of total genomic repeat abundance in the
332 respective species.
333 334
15
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
● upygm−6−2325 ● upygm−6−10034 ● eluci−6−6051 ● enig−6−8846 Detected in ● dpect−6−954 ● dpect−6−4153 ● upygm−6−1332 ● upygm−6−12034 ● ● upygm−6−5394 ● upygm−6−104 U.pygmaea ● nhubb−1−755 ● eluci−6−9212 ● emasq−1−282 ● emasq−1−894 ● ● upygm−5−211 D.pectoralis ● eluci−6−5116 ● dpect−6−10709.1 ● dpect−6−12441 ● dpect−6−10709 ● eluci−6−13133 ● ● nhubb−1−714 N.hubbsi ● upygm−5−2353.1 ● upygm−6−1653 ● eluci−6−6295 ● dpect−6−1458.1 ● ● eluci−1−357 ● emasq−6−3556.1 E.lucius ● upygm−6−6677 ● upygm−6−2744 ● emasq−6−646 ● ● dpect−6−3945 E.masquinongy ● dpect−6−7544 ● upygm−6−574.1 ● upygm−6−574 ● upygm−6−2790 ● ● dpect−5−4732 ● eluci−6−1668 E.niger ● emasq−6−4004 ● eluci−6−20304 ● upygm−6−6110 ● upygm−6−7304 ● ● nhubb−6−4879 ● upygm−6−3216 Reference ● dpect−6−9335 ● eluci−6−8244 ● dpect−6−820 ● dpect−6−2508 ● dpect−6−20619 ● enig−5−1580 ● nhubb−6−12303 ● nhubb−5−308 ● upygm−5−90.1 ● upygm−6−298 ● upygm−6−575 ● emasq−6−3556 ● emasq−5−132 ● dpect−6−19895 Genomic ● dpect−6−2508.1 ● upygm−5−4765 ● nhubb−1−771 ● upygm−6−531 ● enig−6−733 ● upygm−5−2353 ● emasq−1−881 ● emasq−1−405 Abundance ● dpect−6−3874 ● emasq−6−14646 ● eluci−4−1823 ● emasq−6−16645 ● dpect−6−1458 ● eluci−5−7810 ● eluci−6−3907 ● dpect−6−1495 [Mb] ● enig−6−5888 ● eluci−5−71 ● enig−6−12499 ● nhubb−5−1582 ● emasq−6−4259 ● emasq−6−5593 ● emasq−5−4684 ● enig−6−12499.1 ● nhubb−6−8570 ● nhubb−6−8092 ● nhubb−6−1591 20 ● nhubb−6−32458 ● enig−6−3291 ● emasq−6−1827 ●upygm−1−1819 ● emasq−6−5593.1 ● nhubb−5−308.1 ● enig−5−6716 ● emasq−6−8753 ● emasq−6−1827.1 ● emasq−6−6909 ● eluci−5−7810.1 ● nhubb−6−2117 ● upygm−4−1265 15 ● enig−6−1186 ● upygm−6−104.1 ● upygm−5−22 ● emasq−6−4004.3 ● upygm−6−3265 ● dpect−6−24044 ● enig−6−7512 ● eluci−6−3507 ● emasq−5−281 ● upygm−6−821 ● upygm−6−1747 ● eluci−6−3507.1 ● enig−5−537 10 ● emasq−6−5367 ● emasq−6−3556.2 ● eluci−6−5978 ● eluci−1−470 ● eluci−1−357.1 ● dpect−6−1458.2 ● upygm−5−900 ● upygm−6−542 ● upygm−6−6677.1 ● eluci−1−449 ● upygm−6−6693 ● emasq−1−666 ● enig−5−1926 5 ● upygm−6−104.2 ● nhubb−1−786 ● Tc1_5_Tr (pogo−like) ● Tc1_6_Ga (pogo−like) ● Tc1_7_On (pogo−like) ● Tc1_7_Ol (pogo−like) ● Tc1_3_Tn (pogo−like) ● Tc1_5_Gm (pogo−like) ● eluci−6−1668.1 ● upygm−6−2515.1 ● emasq−6−4004.2 ● upygm−5−597 ● upygm−6−823 ● emasq−6−4004.1 ● enig−6−735 ● upygm−5−1181 ● upygm−5−211.1 ● emasq−5−3909 ● eluci−6−12688.1 ● nhubb−5−199.1 ● dpect−6−108 ● nhubb−5−1216 ● emasq−6−12668 ● upygm−6−4004 ● dpect−6−5897 ● eluci−6−12688 ● nhubb−5−199 ● upygm−5−561 ● dpect−5−3101 ● enig−6−3315 ● upygm−6−2264 ● emasq−6−6050 ● enig−6−3315.1 ● enig−6−8700 ● enig−6−5490 ● eluci−6−1175 ● upygm−5−3194 ● eluci−6−672 ● upygm−5−2812 ● upygm−4−456 ● emasq−6−1208 ● emasq−6−2577 ● dpect−6−1833 ● eluci−6−7384.1 ● eluci−6−7384 ● dpect−6−31979 ● upygm−6−222 ● upygm−5−90 ● upygm−6−2515 ● eluci−5−5386 ● eluci−6−31659 ● upygm−6−163 ● dpect−5−5339 ● upygm−1−191 ● upygm−1−306 ● upygm−1−396 ● emasq−1−161 ● Tc1_6_Ol (minos−like) ● eluci−5−5330 ● eluci−4−461 ● upygm−1−87 ● enig−4−206 ● eluci−5−423 ● upygm−1−8 ● Tc1_2_Tn (bari−like) ● emasq−1−25 ● eluci−1−99 ● upygm−5−4933 A ● emasq−1−61 ● eluci−6−9355 ● eluci−6−168.1 ● Tc1_2_Gm (minos−like) ● Tc1_3_On (minos−like) ● upygm−1−14 ● eluci−6−168 ● eluci−5−12697 ● Tc1_4_On (minos−like) ● eluci−5−12636.1 ● enig−1−15 ● eluci−5−80 ● Tc1_4_Gm (passport−like) ● Tc1_3_Gm (passport−like) ● eluci−1−0 ● eluci−5−58 ● Tc1_4_Ol (passport−like) ● Tc1_5_Ol (passport−like) B ● Tc1_2_On (passport−like) ● Tc1_3_Tr (passport−like) ● Tc1_3_Ol (passport−like) ● Tc1_1_Tr (sb−like) ● Tc1_1_Tn (sb−like) ● Tc1_6_On (sb−like) ● Tc1_2_Ol (frog.prince−like) ● Tc1_1_Ol (frog.prince−like) ● Tc1_2_Ga (frog.prince−like) ● enig−1−82 ● Tc1_1_On (frog.prince−like) ● Tc1_4_Tr (frog.prince−like) ● upygm−1−69 ● eluci−6−2394 C ● Tc1_1_Gm (frog.prince−like) ● upygm−1−27 ● Tc1_3_Ga (passport−like) ● emasq−1−26 ● Tc1_2_Tr (passport−like) ● Tc1_5_On (passport−like) ● enig−1−12 ● Tc1_1_Ga (passport−like) ● Tc1_5_Ga (passport−like) ● Tc1_4_Ga (passport−like) D ● upygm−1−18 ● upygm−1−106 ● eluci−4−599 ● eluci−5−141 ● eluci−6−2604 ● eluci−5−12636 N.hubbsi D.pect. U.pygm. 335 E.lucius E.masq. E.niger 336 Fig.4. Phylogenetic relationship (left) and genomic abundance (right) of 337 Tc1/mariner superfamily DNA transposons across six Esociformes species. 338 The phylogenetic tree is based on transposase sequences from 206 repeats 339 combined with 33 previously described(Gao et al. 2017) reference elements 340 from six lineages (Pogo, Passport, SB, Frog Prince, Minos, and Bari). Clades 341 containing most abundant elements are marked A-D. Genomic abundance 342 values in Mb are based on full length repeat sequences, where elements not 343 detected in the respective species are shown in white. 344
16
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
345 Fluorescence in Situ Hybridization
346 Studied pike species mostly exhibit karyotypes composed of 50 mono-armed
347 chromosomes (FN = 50). In turn, Umbra limi possesses only 22 metacentric
348 and submetacentric chromosomes (FN = 44), gradually decreasing in size.
349 Fluorescence in Situ hybridization (FISH) with PNA telomere probe revealed
350 hybridization signals only at the very ends of all chromosomes in E. lucius
351 and E. cisalpinus. In Umbra limi, apart from the terminal location of the FISH
352 telomeric signals observed on all chromosomes, up to seven chromosomes
353 exhibited interstitial telomeric sites (ITSs) in the pericentromeric locations. In
354 four of these chromosomes, ITSs overlapped with DAPI positive regions.
355 DAPI positive signals occurred on the q-arms below the pericentromeric
356 regions and did not correspond with the ITSs for up to three other
357 chromosomes (Fig. 5A). The 5S rDNA probe hybridized to three sites in a
358 pericentromeric location on two metacentric chromosomes and centromeric
359 location on one submetacentric chromosome in U. limi (Fig. 5B). All 5S rDNA
360 sites were DAPI-positive.
361
17
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
362 Fig.5. FISH with telomeric probes and 5S rDNA. Chromosomes of Umbra 363 limi after FISH with telomeric probe (A) and 5S rDNA probe (B) and of Esox 364 niger after FISH with 5S rDNA probe (C). All chromosomes are 365 counterstained with DAPI. The FISH with a telomeric probe in U. limi yielded 366 stronger interstitial telomeric signals than the terminal ones. Bar = 10µm. 367
368 Molecular analysis of 5S rDNA genomic fraction
369 The slot blot quantification of 5S rRNA genes indicates approximately 2,000
370 5S rDNA copies per Umbra genome with a slightly higher copy number in U.
371 limi compared to U. krameri. Umbra thus has about ten times lower copy
372 number of 5S rRNA than both Esox species analyzed so far (Fig. 6 A, B).
373 These copies are arranged in tandems in both Umbra as well as in Esox
374 indicated by ladders of bands after the digestion with MspI restriction enzyme
375 (not sensitive to CG methylation) (Fig. 6 C). Increased ladder length for both
376 Umbra species reveals greater sequence divergence of MspI (CCGG) sites
377 compared to Esox. Digestion of DNAs with methylation-sensitive HpaII
378 isoschizomere resulted in hybridization signals in high-molecular-weight
379 regions with little to no ladders (Fig. 6 C), which is consistent with heavy
380 methylation of internal Cs within the CCGG motifs.
18
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
381 382 Fig.6. Genomic analysis of Umbra and Esox 5S rDNA. (A) Southern blot 383 hybridization. (B) Slot blot hybridization. Probe – 5S rDNA insert from the E. 384 cisalpinus clone a, K - U. krameri, L = U. limi, ESLU - E. lucius, (C) DNA 385 methylation analysis of 5SrDNA in Umbra and Esox. HpaII is a methylation- 386 sensitive isoschizomere of MspI. Probe – 5S rDNA insert from the E. 387 cisalpinus (Eci) clone a. ESLU – Esox lucius, K – Umbra krameri, L – Umbra 388 limi.
389
390 Discussion
391 The sizes of eukaryotic genomes vary substantially with teleosts being no
392 exception. The range of described cases of whole genome duplication
393 (WGD) events amongst teleosts make this mechanism the main suspect
394 when observing significant genome size increases. In contrast, we find that
395 the genome expansion in the genus Umbra is not caused by a WGD but
396 transposable element activity, and that extensive chromosome fusion
19
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
397 events have occurred in some Umbra species. In the following, we discuss
398 what sets these inconspicuous members of the order Esociformes apart
399 from their sister lineage from an ecological and evolutionary point of view.
400 Genome expansion and chromosome fusions in Umbra sp.
401 Our results demonstrate that the European mudminnow species U. krameri
402 underwent a genome expansion similar to the two North American Umbra
403 sister species. This genome expansion is thus a common trait of the entire
404 genus and can be localized in time prior to the split of the American and
405 European Umbra lineages, setting the Umbridae family apart from their sister
406 lineages Esox, Dallia, and Novumbra (Marić et al. 2017; Gregory 2020). The
407 European species U. krameri retained the Umbra ancestral chromosome
408 number (2n = FN = 44), which is even lower than the most common teleost
409 ancestral 2n ≈ 50 (Mank & Avise 2006; Sacerdot et al. 2018) chromosome
410 number and might indicate an overall tendency towards chromosome
411 number reduction in the genus Umbra. Paradoxically, the genome expansion
412 in U. limi and U. pygmaea is accompanied by a further reduction in
413 chromosome counts (2n = 22) while maintaining the same number of
414 chromosome arms (FN = 44). This reduction in the number of chromosomes
415 without changes to the number of chromosome arms, as confirmed for U.
416 limi, is consistent with Robertsonian fusion events. Accordingly, up to seven
417 out of 22 bi-armed chromosomes of U. limi exhibit interstitial telomeric sites
418 (ITSs), which presumably are remnants of these fusions. Similar occurrences
419 of telomeric DNA at non-telomeric sites as relics of chromosomal
420 rearrangements including centric fusions (Robertsonian translocations),
20
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
421 tandem fusions, and inversions (Ocalewicz 2013) have been observed at
422 pericentromeric locations of fused mammalian and non-mammalian
423 chromosomes (Ocalewicz et al. 2013; Meyne et al. 1990). Breaks in telomer-
424 adjacent chromatin regions before the fusion event may explain the absence
425 of telomeric repeat sequences at the remaining fusion sites (Nanda et al.
426 1995; Garagna et al. 1995). Furthermore, ITSs may undergo gradual loss
427 and degeneration, leading to progressive shortening. So while FISH is
428 currently the best approach to explore ITSs in situ, some interstitially located
429 telomeric DNA repeats might be too short for detection (Slijepcevic 1998).
430 The low chromosome count of the North American Umbra species (2n = 22)
431 makes them part of a small group of fishes. Among about 32,000 fish species
432 (Nelson et al. 2016), there are only thirteen fish species with 2n < 22
433 chromosomes described to date, made up mainly of Cyprinodontiformes (six
434 in Nothobranchiidae, one in Rivulidae) ranging between 2n = 16 - 20. The
435 lowest chromosome counts were described for the marine teleost species
436 spark anglemouth (Sigmops bathyphilus, Stomiiformes, Gonostomatidae)
437 (Post 1974) with 2n = 12 and the freshwater teleost species chocolate
438 gourami (Sphaerichthys osphromenoides, Osphronemidae,
439 Anabantiformes) (Calton & Denton 1974; Koref-Santibanez & Paepke 1994;
440 Arai 2011) with 2n = 16. Interestingly, the S. bathyphilus belongs to the order
441 Stomiiformes, a member of the superorder Osmeromorpha that split after the
442 earlier divergence of the superorder Protacanthopterygii (i.e. Esociformes
443 and Salmoniformes) (Nelson et al. 2016).
21
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
444 Interplay between cytogenomic and ecological traits 445 The chromosome number of a species can be linked to its ecological traits.
446 Specialized fishes were demonstrated to have smaller genomes than more
447 generalized species within and across lineages (Hinegardner & Rosen 1972;
448 Hardie & Hebert 2004). This constraint may be extended to the chromosome
449 number and genome size in specialized species (Hardie & Hebert 2004), or
450 constraints on genome size linked to the heightened developmental
451 complexity of specialized species (Gregory 2002). Similar principles have
452 been proposed for mammals (Qumsiyeh 1994) where species with higher 2n
453 or FN have a higher recombination rate because proper disjunction requires
454 at least one chiasma per arm (Jensen-Seaman et al. 2004). Increased
455 recombination not only leads to increased genetic variability, thereby
456 allowing the utilization of a wider niche (Qumsiyeh 1994), but also a genome
457 size reduction (Nam & Ellegren 2012). On the other hand, decreased
458 recombination favors the fixation of new mutations and thereby potential
459 speciation (Qumsiyeh 1994). Furthermore, decreased recombination can
460 cause the genome to expand due to the accumulation of transposon
461 insertions (Dolgin & Charlesworth 2008). The interaction between
462 transposon activity and recombination rate may create positive feedback,
463 where the accumulation of transposon insertion sites contributes to further
464 suppress recombination (Kent et al. 2017; Fedoroff 2012). The positive
465 correlation between recombination rate and effective population size (Mugal
466 et al. 2015) requires the consideration of the ecology of Umbra and Esox.
467 Both genera’s species belong to freshwater fishes at northern latitudes and
468 hence are affected by bottlenecks and reduced effective population sizes
22
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
469 within refugia during glaciation periods (Bernatchez & Wilson 1998). Low
470 levels of genetic diversity in pikes have been described extensively (Skog et
471 al. 2014) and may explain the extreme 5S rDNA amplification through genetic
472 drift (Symonová et al. 2017).
473 Genomic differences found among mudminnows can hardly be related to
474 variable ecological traits within this group, as the ecology of mudminnows
475 appears surprisingly similar. The American central mudminnow (U. limii) has
476 been termed a “habitat specialist and resource generalist” (Martin-Bergmann
477 & Gee 1985), referring to the densely vegetated and often overgrown and
478 deoxygenated swampy habitats in which this species is found. Accessory air-
479 breathing capabilities allow mudminnows to use such marginal habitats in
480 which no, or only a few other fish species can be found. Otherwise, the
481 central mudminnow uses a wide variety of small invertebrate prey. This brief
482 ecological characterization can be extended to all other mudminnows (sensu
483 lato), including Novumbra and Dallia. Despite their evolutionary
484 differentiation, all mudminnows are found in such characteristic habitat types
485 (Kuehne & Olden 2014) which might be related to their low competitive
486 abilities (Wanzenböck & Spindler 1995; Sehr & Keckeis 2017). All
487 mudminnows possess air-breathing accessory capabilities and use a wide
488 variety of small invertebrate prey (Kuehne & Olden 2014). An additional
489 peculiarity of this group of fishes is the parental care behavior, which become
490 territorial during spawning time guarding the eggs. In summary, most
491 ecological traits of mudminnows are convergent or stable despite their
492 divergence in phylogeny and the differences in genomics found in our study.
23
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
493 In contrast, pikes are found in a wide variety of habitats, densely overgrown
494 lentic waters to open waters in large lakes or even brackish waters and
495 riverine habitats. Furthermore, they are rather specialized piscivorous during
496 juvenile and adult life stages. Conversely, they might be called “habitat
497 generalists and resource specialists.” Whether such ecological differences
498 may be related to genomic differences between mudminnows and pikes
499 remains open.
500 Next to potentially ecological influences, there are also purely genomic
501 factors driving genome size evolution. A common cause of genome size
502 expansion is a WGD event (polyploidy) or repeat expansion due to
503 transposon or satellite DNA. In the Umbra genus, however, the genome
504 expansion was accompanied by a reduction in chromosome number, which
505 suggests two scenarios.
506 (i) The Umbra lineage has experienced a rapid chromosome loss (disploidy)
507 following an additional WGD. Evidence from plants suggests that repeated
508 WGD cycles with subsequent chromosome loss may generate karyotypes
509 with few chromosomes (Dodsworth et al. 2016). However, our comparative
510 genome and transcriptome analyses do not support the existence of
511 additional ohnologs in Umbra pygmaea. The observed number of duplicated
512 protein-coding genes in the genome assemblies of U. pygmaea and its sister
513 lineage Esox lucius was similar, excluding WGD as cause of genome
514 expansion in Umbra. While the completeness of the employed
515 transcriptomes and genomes is relatively low, it is not plausible that
516 specifically ohnologs are disproportionately missing in both cases.
24
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
517 (ii) Alternatively, the expansion of repeat sequences could explain the
518 increased genome size. Evidence of such a repeat expansion was found in
519 a sister lineage. The two species E. lucius and E. cisalpinus show an extreme
520 amplification of 5S rDNA across their chromosomes (Symonová et al. 2017)
521 while retaining their genome size and chromosome counts. This massive
522 expansion of tandem repeats by several orders of magnitude must have
523 happened over a short evolutionary time, as the related E. niger still features
524 a low number of 5S rDNA copies ancestral to the Esocidae lineage (Kovařík,
525 unpublished data). While the Umbra lineage shows teleost-
526 typical(Sochorová et al. 2018) 5S rDNA copy numbers, our comparative
527 genome analysis of the U. pygmaea genome confirms a significantly
528 elevated interspersed repeat content of 64%, which is the result of recent
529 transposon expansion peak driven by DNA transposon activity in
530 combination with unclassified repeat sequences. This peak is not only
531 consistent with the pervasive DNA transposon activity particularly in E.
532 lucius, E. masquinongy, and E. niger, but also with the generally high
533 abundance of DNA transposons in most fish genomes (Shao et al. 2019).
534 Manual curation of the top 30 highly abundant unclassified sequences
535 reveals DNA transposon features, further supporting the observed DNA
536 transposon expansion. The Alaska blackfish D. pectoralis constitutes an
537 exception as it appears in the process of repeat expansion driven by LINE
538 elements, rRNA, tRNA, and unclassified elements. Importantly, genome size
539 variation is the net result of sequence gain and loss (Petrov et al. 2000)
540 allowing for the possibility of sequence gain by TE expansion in U. pygmaea,
25
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
541 or alternatively sequence loss of various extent in the other species. As the
542 expansion peaks observed in all species except N. hubbsi appear to be
543 relatively recent at divergences below ten percent, a sequence gain scenario
544 is more plausible. As all but the E. lucius genome assembly are highly
545 fragmented and lack the completeness of reference genome assemblies, it
546 is however likely that the presented results underestimate the total repeat
547 content, particularly for long repeat sequences. This fragmentation is also a
548 likely explanation for the high abundance of unclassified repeat sequences
549 in U. pygmaea. Furthermore, it is not possible to locate chromosomal fusion
550 sites or potential duplications in the U. pygmaea assembly. Further analyses
551 based on a platinum standard long read based reference assemblies and an
552 extensively curated library of repeat sequences are required to address
553 these questions.
554 Here, we show that all members of the Umbra genus feature a significantly
555 expanded genome, placing the expansion event prior to the separation
556 between the European and North American species. However, only the North
557 American species experienced Robertsonian chromosome fusion events,
558 reducing their chromosome number significantly. Taken together, our
559 findings show that a recent activity peak mainly of Tc1/mariner DNA
560 transposons significantly increased the size of the U. pygmaea genome. We
561 furthermore find evidence of an ongoing repeat-driven genome expansion in
562 the Alaska blackfish Dallia pectoralis. This makes the family Umbridae and
563 the entire order Esociformes an attractive model system to study exclusively
564 repeat-driven genome expansion and its potential role in speciation.
26
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
565
566 Material and Methods
567 Fish sampling 568 Ten individuals of Umbra krameri were descendants from individuals
569 sampled between 1993 - 1997 from “Fadenbach” between Orth and
570 Eckartsau (Danube Wetlands, National Park in Lower Austria) within a nature
571 conservation project of the provincial government of Lower Austria
572 (Wanzenböck & Spindler 1995). Ten individuals of Umbra limi and ten
573 individuals of Esox niger were sampled in Quebec, Canada, in autumn 2017.
574 A local professional fisherman provided samples of two individuals of E.
575 lucius at Lake Mondsee in 2016. Further specimens of E. lucius and E.
576 cisalpinus were analyzed in our earlier study (Symonová et al. 2017). This
577 included 28 eight-month-old specimens (12 males, nine females, and one
578 unsexed) from the local fish farm (Olsztyn, Poland) sampled for cytogenetic
579 analyses.
580 Genome size determination by flow cytometry 581 Ethanol fixed red blood cells (RBC) of Umbra krameri and Esox lucius,
582 respectively, were used for genome size determination following (Lamatsch
583 et al. 2000) with chicken RBC as internal standard (C-value 1.25 pg per
584 nucleus). We determined genome size by flow cytometry using the Attune™
585 NxT Acoustic Focusing Cytometer (Thermo Fisher Scientific, Vienna,
586 Austria). We applied instrument settings optimized using the AttuneR
587 Cytometric Software. We used forward scatter (FSC) and VL1 violet
588 fluorescence (405 nm excitation, bandpass filter 440/50 nm) as triggers for
27
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
589 all measurements. We analyzed 20.000 cells or sample volumes of 0.1 mL
590 at a flow rate of 0.025 mL/min. DAPI is an AT-specific stain, which can yield
591 inaccurate results when the AT/GC content of the sample and internal
592 standard differ significantly. Chicken and E. lucius RBC, however, show a
593 very similar AT-content of 57.73%, 20, and 56.8%, (Borisova et al. 1974),
594 respectively, suggesting that the underestimation of the total amount of DNA
595 is negligible. Records on genome size come from the www.genomesize.com
596 database (Gregory 2020) and the cited literature (Table 1).
597 Comparative transcriptome analysis 598 We analyzed 24 de novo transcriptome assemblies of the PhyloFish
599 database (Pasquier et al. 2016): bowfin (Amia calva), gar (Lepisosteus
600 oculatus), European eel (Anguilla anguilla), butterflyfish (Pantodon
601 buchholzi), arowana (Osteoglossum bicirrhosum), elephantnose fish
602 (Gnathonemus petersi), Allis shad (Alosa alosa), zebrafish (Danio rerio),
603 panga (Pangasianodon hypophthalmus), a black ghost (Apteronotus
604 albifrons), cave Mexican tetra (Astyanax mexicanus), surface Mexican tetra
605 (Astyanax mexicanus), Northern pike (Esox lucius), Eastern mudminnow
606 (Umbra pygmaea), grayling (Thymallus thymallus), European whitefish
607 (Coregonus lavaretus), American (Lake) whitefish (Coregonus
608 clupeaformis), brown trout (Salmo trutta), rainbow trout (Oncorhynchus
609 mykiss), brook trout (Salvelinus fontinalis), Ayu (Plecoglossus altivelis),
610 Atlantic cod (Gadus morhua), medaka (Oryzias latipes), and European perch
611 (Perca fluviatilis). The completeness and duplication were assessed with
612 BUSCO v4.1.4 (Simão et al. 2015) using the dataset actinopterygii_odb10
28
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
613 containing 3640 genes (Table S1). For a subset of 10 species, reference
614 genome assemblies and structural genome annotations are available at the
615 NCBI reference sequence database (Table S2). For this subset, de novo
616 transcriptome assembly completeness and duplication were compared to
617 reference-genome-derived transcriptomes using identical BUSCO settings.
618 Gene counts were separately compared for each of the five gene categories
619 (complete, complete-single-copy, complete-duplicated, fragmented,
620 missing) using the coefficient of determination (Fig. S1). A phylogenetic
621 profile matrix was assembled representing the number of orthologous copies
622 detected for each of the 3640 BUSCO genes across 24 transcriptome
623 assemblies. Hierarchical clustering of the phylogenetic profile matrix was
624 performed using Manhattan distance and the “ward.D2” method
625 implemented in R v3.6.1 to obtain five main gene clusters and 26 sub-
626 clusters of main cluster III.
627 Comparative genome analysis 628 The five published(Pan et al. 2021) short-read based scaffold-level genome
629 assemblies of the Esocidae species Umbra pygmaea, Dallia pectoralis,
630 Novumbra hubbsi, Esox niger, and Esox masquinongy, as well as a
631 chromosome-scale reference genome assembly of Esox lucius, were used
632 for a comparative genome analysis (see Table S4 for accession numbers).
633 For each of the six genomes, a de novo repeat library was generated using
634 RepeatModeler 2.0.1, which yielded between 2,547 and 4,983 sequences
635 per species. These libraries were then combined and clustered using
636 usearch v11.0.667 with 80% similarity cutoff to obtain a single non-redundant
29
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
637 repeat library for the order Esociformes, which yielded a total of 14,856
638 sequences. Censor 4.2.29 was used to identify de novo sequences in
639 addition to the RepeatModeler classification and annotation with
640 RepeatMasker and RepBase 26.01. This library was then used as a
641 reference to annotate repeat sequences with RepeatMasker 4.1.1. Repeats
642 containing known TE coding domains were identified by predicting open
643 reading frames using Emboss v6.6.0.0 and comparing the resulting
644 sequences against the transposase protein sequence collection provided
645 with LTR_retriever(Ou & Jiang 2018) via blast. Requiring hits longer than 150
646 amino acids and with e-value < 1e-10 yielded 116 hits in LINE elements and
647 206 hits in sequences not classified as LINE elements, the latter of which
648 were retained for further analysis. This procedure was also used to extract
649 33 transposase sequences of six previously described neoteleost
650 Tc1/mariner lineages(Gao et al. 2017). All 239 sequences were then aligned
651 with Clustal Omega v1.2.4, and a maximum likelihood phylogenetic tree was
652 calculated with RAxML 8.2.12 using a variable time amino acid substitution
653 model with gamma distributed rate heterogeneity. Analysis and visualization
654 of all data was performed using R 3.6.1. The thirty most abundant Umbra
655 pygmaea repeat sequences of the de novo library which did not show
656 significant similarity with known TEs were selected for further manual
657 curation. Firstly, we tested if the unclassified sequences include local context
658 sequence in addition to the core repetitive tract. To that end, the repetitive
659 sequence was used as a query in a Blastn (Altschul et al. 1997) search
660 against the Umbra pygmaea genome assembly. Two or more matching loci
30
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
661 located on different contigs/scaffolds were extracted including 10 Kbp up-
662 and downstream of the matching sequence. These hit sequences were then
663 compared to each other in a dot plot analysis using the dotter (Sonnhammer
664 & Durbin 1995), allowing the exact definition of the entire repetitive element.
665 Complete elements were then analyzed for similarity with known TEs as
666 defined in the latest version of RepBase (Bao et al. 2015) and/or for the
667 presence of structural features typical of known TE elements such as the
668 inverted repeats of DNA TEs. This allowed the association of 13 elements to
669 a TE class including 2 cases with only diagnostic structural features (Table
670 S6). In addition, one satellite and one tandem repeat sequence were
671 identified, leaving a total of 14 sequences unclassified. The curated
672 sequences are provided at
673 https://github.com/roblehmann/umbridae_genome_expansion.
674 Chromosome analyses and fluorescence in situ hybridization 675 Telomeric DNA repeats on chromosomes of Umbra and Esox species were
676 detected by FISH, using a telomere PNA (peptide nucleic acid) FISH Kit/FITC
677 (DAKO, Denmark) according to the manufacturer’s protocol. Slides with the
678 metaphase spreads were washed with buffer (Tris-buffered saline, pH 7.5)
679 for 2 min, immersed in 3.7 % formaldehyde in 1× TBS for 2 min, washed
680 twice in TBS for 5 min each, and treated with the Pre-Treatment solution
681 including Proteinase K (DAKO) for 10 min. Afterward, slides were washed
682 twice in TBS buffer for 5 min each, dehydrated through a cold (-20 °C)
683 ethanol series (70 %, 85 %, 96 %) for 1 min, and air-dried at room
684 temperature. Ten µl of FITC PNA telomere probe mix (DAKO) was dropped
31
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
685 on the prepared slides and covered with the coverslip. Chromosomal DNA
686 was denatured at 85o C for 5 minutes under the coverslip in the presence of
687 the PNA probe. The hybridization reaction took place in the darkness at room
688 temperature for 90 minutes. After hybridization, the coverslips were gently
689 removed by immersion of slides in the Rinse Solution (DAKO) for 1 min.
690 Slides were then washed in the Wash Solution (DAKO) for 5 min at 65°C and
691 dehydrated by immersion through a series of cold ethanol washes of 70%,
692 85%, 96% for 1 min each and air-dried at room temperature. For the
693 counterstaining, chromosomes were mounted in the VECTASHIELD
694 Antifade Mounting Medium containing DAPI (Vector Laboratories, USA).
695 Metaphase plates were analyzed under a Zeiss Axio Imager A1 microscope
696 equipped with a fluorescent lamp and a digital camera. The images were
697 electronically processed using the Band View/FISH View software (Applied
698 Spectral Imaging, Galiliee, Israel). FISH with 5S rDNA probe was performed
699 according to (Fujiwara et al. 1998) with slight modification (Kirtiklis et al.
700 2014). A 5S rDNA probe was obtained via PCR with forward primer 5S-1: 5’
701 - TACGCC CGATCT CGT CCG ATC - 3’ and reverse primer 5S-2: 5’- CAG
702 GCTGGT ATG GCC GTA AGC - 3’ (Pendas et al. 1994). The PCR reaction
703 was carried out in a 50 µl reaction volume containing 1.25 U GoTaq Flexi
704 DNA Polymerase (Promega, USA), 10 µl of 5X Flexi Reaction Buffer
705 (Promega, USA), 100 µM of each dNTP (Promega, USA), 3 mM of MgCl2
706 (Promega, USA), 10 pM of each primer, 2 µl of DNA template, and nuclease-
707 free water. PCR product, obtained after 30 cycles of amplification and
708 annealing at 55 ˚C, was purified using the GeneElute PCR Clean-Up Kit
32
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
709 (Sigma-Aldrich, USA), then labeled with biotin-16-dUTP (Roche, Germany)
710 by nick-translation method (Roche, Switzerland). In situ hybridization with
711 150 ng of rDNA probe per slide was performed with RNase-pre-treated and
712 formamide-denaturated chromosome slides. A post-hybridization wash was
713 performed at 37˚C for 20 min. Chromosome slides were subjected to the
714 detection with avidin-FITC (Roche, Basel, Switzerland) and then
715 counterstained with DAPI in VECTASHIELD Antifade Mounting Medium
716 (Vector Laboratories, USA). At least 15 metaphase chromosome spreads
717 from each specimen were analyzed using a Nikon Eclipse 90i (Nikon, Japan)
718 microscope equipped with epi-fluorescence. Pictures were acquired using a
719 monochromatic ProgRes MFcool camera (Jenoptic, Germany) controlled by
720 a Lucia software ver. 2.0 (Laboratory Imaging, Czech Republic). Post-
721 processing elaboration of all the pictures was made based on CorelDRAW
722 Graphics Suite 11 (Corel Corporation, Canada).
723 Southern and slot blot hybridizations of 5S rDNA 724 The procedure followed the protocol described by (Koukalova et al. 2010).
725 The 5S rDNA probe was a 243 bp insert of the 5S_Eci_a clone (GenBank
726 KX965716) from E. cisalpinus. The plasmid insert was amplified and labeled
727 with the 32P-dCTP (DekaPrime kit, Fermentas, Lithuania). The probe was
728 hybridized at high stringency conditions (washing 2x SSC, 0.1% SDS
729 followed by 0.1xSSC, 0.1% SDS at 65 °C). The hybridization signals were
730 visualized by Phosphor imaging (Typhoon 9410, GE Healthcare, PA, USA),
731 and signals were quantified using ImageQuant software (GE Healthcare, PA,
732 USA). The copy number of 5S rDNA genes was estimated using slot blot
33
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
733 hybridization. Briefly, the DNA concentration was estimated
734 spectrophotometrically at OD260nm using a Nanodrop 3300
735 fluorospectrometer (Thermo Fisher Scientific, USA). Concentrations were
736 verified by the electrophoresis in agarose gels using dilutions of lambda DNA
737 as standards. The three dilutions of genomic DNA (100, 50, and 25 ng),
738 together with serial dilutions of unlabelled plasmid inserts corresponding to
739 the 5S monomers (GenBank KX965715-6). Each aliquot was denatured in
740 0.4 M NaOH and blotted onto a positively charged Nylon membrane (Hybond
741 XC, GE Healthcare, USA) using a vacuum slot blotter (Schleicher-Schuell,
742 Germany). The probe and the hybridization conditions and visualization of
743 signals were the same as described above.
744 DNA methylation analysis of 5S rDNA 745 Purified genomic DNA samples of U. krameri (K1 - K3), U. limi (L1 - L3), and
746 from E. lucius (2 individuals as a control) were digested with methylation-
747 sensitive HpaII (sensitive to CG methylation) and its methylation-insensitive
748 MspI isoschizomere (both enzymes are cutting at CCGG). The restriction
749 fragments were hybridized on blots with the alpha[P32]dCTP-labelled 5S
750 rDNA probe. Control of digestion efficiency was carried out by spiking the
751 Esox genomic DNA with a non-methylated plasmid DNA (pBluescript,
752 Stratagen) and subsequent hybridization with a plasmid probe. Both MspI
753 and HpaII enzymes yielded expected restriction fragments (not shown).
754
755 Author Contributions
34
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
756 RS conceptualized and supervised the study and provided the cytogenomic 757 context; RS, AK, RL, and DKL methodized the study and wrote the original 758 draft; RL analyzed fish transcriptomes and genomes; RL and AZ analyzed 759 transposable elements; AK performed southern blot and slot blot 760 hybridizations; DKL performed the flow cytometry genome size 761 determination; JW provided the ecological and phylogenetic context of the 762 study and specimens for analyses; LB organized sampling, provided 763 specimens for analyses and reviewed the manuscript; KO, LK performed the 764 cytogenetic analyses; RL, RS, DKL, KO, LK, JW, JT wrote, reviewed and 765 edited the study; RS acquired funding for the study.
766 Acknowledgments
767 This study was supported by the Tyrolean funds project with contract Nr. 768 UNI-0404/2015 to RS and co-funded by the Erasmus+ program of the 769 European Union with contract Nr. 2019-1-CZ01-KA203-061433 to RS and 770 DKL. The authors are also grateful to the `Excelence projekt PřF UHK 771 2209/2018` and the Czech Science Foundation (19-03442S) for financial 772 support. 773 We acknowledge Petr Ráb for discussion on genome size in the Umbra 774 genus, Guillaume Côté for fishing Umbra and Esox in Québec, and we also 775 acknowledge Maria Pichler of UIBK for her technical support.
776 Conflict of Interest
777 The authors declare no conflict of interest.
778 Data Availability
779 The de novo repeat library and manually curated sequences are available via 780 https://github.com/roblehmann/umbridae_genome_expansion
781 References
782 Altschul SF et al. 1997. Gapped BLAST and PSI-BLAST: A new 783 generation of protein database search programs. Nucleic Acids Res. 784 25:3389–3402. doi: 10.1093/nar/25.17.3389. 785 Arai R. 2011. Fish Karyotypes: A Check List. Springer, Japan. 786 Bao W, Kojima KK, Kohany O. 2015. Repbase Update, a database of 787 repetitive elements in eukaryotic genomes. Mob. DNA. 6:11. doi: 788 10.1186/s13100-015-0041-9.
35
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
789 Beamish RJ, Merrilees MJ, Crossman EJ. 1971. Karyotypes and 790 DNA values for members of the suborder Esocoidei (Osteichthyes: 791 Salmoniformes). Chromosoma. 34:436–447. doi: 792 10.1007/BF00326315. 793 Bernatchez L, Wilson CC. 1998. Comparative phylogeography of 794 Nearctic and Palearctic fishes. Mol. Ecol. 7:431–452. doi: 795 10.1046/j.1365-294x.1998.00319.x. 796 Borisova OF, Razjivin AP, Zaregorodzev VI. 1974. Evidence for the 797 quinacrine fluorescence on three at pairs of DNA. FEBS Lett. 798 46:239–242. doi: 10.1016/0014-5793(74)80377-7. 799 Calton MS, Denton TE. 1974. Chromosomes of the chocolate 800 gourami: A cytogenetic anomaly. Science (80-. ). 185:618–619. doi: 801 10.1126/science.185.4151.618. 802 Crossman EJ, Ráb P. 2001. Chromosomal NOR phenotype and C- 803 banded karyotype of olympic mudminnow, Novumbra hubbsi 804 (Euteleostei: Umbridae). Copeia. 2001:860–865. doi: 10.1643/0045- 805 8511(2001)001[0860:CNPACB]2.0.CO;2. 806 Dion-Côté A-M et al. 2017. Standing chromosomal variation in Lake 807 Whitefish species pairs: the role of historical contingency and 808 relevance for speciation. Mol. Ecol. 26:178–192. doi: 809 10.1111/mec.13816. 810 Dodsworth S, Chase MW, Leitch AR. 2016. Is post-polyploidization 811 diploidization the key to the evolutionary success of angiosperms? 812 Bot. J. Linn. Soc. 180. doi: 10.1111/boj.12357. 813 Dolgin ES, Charlesworth B. 2008. The effects of recombination rate 814 on the distribution and abundance of transposable elements. 815 Genetics. 178:2169–2177. doi: 10.1534/genetics.107.082743. 816 Ellegren H. 2004. Microsatellites: Simple sequences with complex 817 evolution. Nat. Rev. Genet. 5:435–445. doi: 10.1038/nrg1348. 818 Elliott TA, Gregory TR. 2015. Do larger genomes contain more 819 diverse transposable elements? BMC Evol. Biol. 15:1–10. doi: 820 10.1186/s12862-015-0339-8. 821 Fedoroff N V. 2012. Transposable Elements, Epigenetics, and 822 Genome Evolution. Science (80-. ). 338:758–767. doi: 823 10.1126/science.338.6108.758. 824 Fontdevila A. 2011. The Dynamic Genome: A Darwinian Approach. 825 OUP Oxford.
36
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
826 Fujiwara A, Abe S, Yamaha E, Yamazaki F, Yoshida MC. 1998. 827 Chromosomal localization and heterochromatin association of 828 ribosomal RNA gene loci and silver-stained nucleolar organizer 829 regions in salmonid fishes. Chromosom. Res. 6:463–471. doi: 830 10.1023/A:1009200428369. 831 Gao B et al. 2017. Characterization of autonomous families of 832 Tc1/mariner transposons in neoteleost genomes. Mar. Genomics. 833 34:67–77. doi: 10.1016/j.margen.2017.05.003. 834 Garagna S et al. 1995. Robertsonian metacentrics of the house 835 mouse lose telomeric sequences but retain some minor satellite DNA 836 in the pericentromeric area. Chromosoma. 103:685–692. doi: 837 10.1007/BF00344229. 838 Gilbert C, Williams J. 2002. Field guide to fishes. North America. 839 Alfred A. Knopf, New York. 840 Grabherr MG et al. 2011. Full-length transcriptome assembly from 841 RNA-Seq data without a reference genome. Nat. Biotechnol. 29:644– 842 52. doi: 10.1038/nbt.1883. 843 Gregory TR. 2020. Animal Genome Size Database. 844 http://www.genomesize.com. 845 Gregory TR. 2002. Genome size and developmental complexity. 846 Genetica. 115:131–146. doi: 10.1023/A:1016032400147. 847 Gregory TR, Hebert PDN. 1999. The modulation of DNA content: 848 Proximate causes and ultimate consequences. Genome Res. 9:317– 849 324. doi: 10.1101/gr.9.4.317. 850 Hardie DC, Hebert PDN. 2004. Genome-size evolution in fishes. 851 Can. J. Fish. Aquat. Sci. 61:1636–1646. doi: 10.1139/F04-106. 852 Hardie DC, Hebert PDN. 2003. The nucleotypic effects of cellular 853 DNA content in cartilaginous and ray-finned fishes. Genome. 854 46:683–706. doi: 10.1139/g03-040. 855 Hinegardner R. 1968. Evolution of Cellular DNA Content in Teleost 856 Fishes. Am. Nat. 102:517–523. doi: 10.1086/282564. 857 Hinegardner R, Rosen DE. 1972. Cellular DNA Content and the 858 Evolution of Teleostean Fishes. Am. Nat. 106:621–644. doi: 859 10.1086/282801. 860 Jensen-Seaman MI et al. 2004. Comparative recombination rates in 861 the rat, mouse, and human genomes. Genome Res. 14:528–538. 862 doi: 10.1101/gr.1970304.
37
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
863 Kent T V., Uzunović J, Wright SI. 2017. Coevolution between 864 transposable elements and recombination. Philos. Trans. R. Soc. B 865 Biol. Sci. 372. doi: 10.1098/rstb.2016.0458. 866 Khoshkholgh M, Alireza A, Sajad N. 2015. Karyotypic 867 characterization of the pike, Esox lucius from the south Caspian Sea 868 basin. Iran. J. Anim. Biosyst. 11:43–49. 869 Kirtiklis L, Ocalewicz K, Wiechowska M, Boroń A, Hliwa P. 2014. 870 Molecular cytogenetic study of the European bitterling Rhodeus 871 amarus (Teleostei: Cyprinidae: Acheilognathinae). Genetica. 872 142:141–148. doi: 10.1007/s10709-014-9761-x. 873 Koref-Santibanez S, Paepke H. 1994. Karyotypes of the 874 Trichogasterinae Liem (Teleostei, Anabantoidei). VIII Congr. Soc. 875 Eur. Ichthyol. 55. 876 Koukalova B et al. 2010. Fall and rise of satellite repeats in 877 allopolyploids of Nicotiana over c. 5 million years. New Phytol. 878 186:148–160. doi: 10.1111/J.1469- 879 [email protected]/(ISSN)1469- 880 8137(CAT)SPECIALISSUES(VI)PLANTPOLYPLOIDY. 881 Kuehne LM, Olden JD. 2014. Ecology and Conservation of 882 Mudminnow Species Worldwide. Fisheries. 39:341–351. doi: 883 10.1080/03632415.2014.933318. 884 Lamatsch DK, Steinlein C, Schmid M, Schartl M. 2000. Noninvasive 885 determination of genome size and ploidy level in fishes by flow 886 cytometry: detection of triploid Poecilia formosa. Cytometry. 39:91– 887 95. doi: 10.1002/(SICI)1097-0320(20000201)39:2<91::AID- 888 CYTO1>3.0.CO;2-4. 889 Li JT et al. 2015. The fate of recent duplicated genes following a 890 fourth-round whole genome duplication in a tetraploid fish, common 891 carp (Cyprinus carpio). Sci. Rep. 5:1–9. doi: 10.1038/srep08199. 892 Liedtke HC, Gower DJ, Wilkinson M, Gomez-Mestre I. 2018. 893 Macroevolutionary shift in the size of amphibian genomes and the 894 role of life history and climate. Nat. Ecol. Evol. 2:1792–1799. doi: 895 10.1038/s41559-018-0674-4. 896 Lien S et al. 2016. The Atlantic salmon genome provides insights into 897 rediploidization. Nature. 533:200–205. doi: 10.1038/nature17164. 898 Lu J, Peatman E, Tang H, Lewis J, Liu Z. 2012. Profiling of gene 899 duplication patterns of sequenced teleost genomes: Evidence for 900 rapid lineage-specific genome expansion mediated by recent tandem
38
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
901 duplications. BMC Genomics. 13:1–10. doi: 10.1186/1471-2164-13- 902 246. 903 Lynch M. 2007. The Origins of Genome Architecture. Oxford 904 University Press (OUP) doi: 10.1093/jhered/esm073. 905 Macqueen DJ, Johnston IA. 2014. A well-constrained estimate for the 906 timing of the salmonid whole genome duplication reveals major 907 decoupling from species diversification. Proc. R. Soc. B Biol. Sci. 908 281:20132881. doi: 10.1098/rspb.2013.2881. 909 Mank JE, Avise JC. 2006. Phylogenetic conservation of chromosome 910 numbers in Actinopterygiian fishes. Genetica. 127:321–327. doi: 911 10.1007/s10709-005-5248-0. 912 Marburger S et al. 2018. Whole genome duplication and 913 transposable element proliferation drive genome expansion in 914 Corydoradinae catfishes. Proc. R. Soc. B Biol. Sci. 285:20172732. 915 doi: 10.1098/rspb.2017.2732. 916 Marić S et al. 2017. Phylogeography and population genetics of the 917 European mudminnow (Umbra krameri) with a time-calibrated 918 phylogeny for the family Umbridae. Hydrobiologia. 792:151–168. doi: 919 10.1007/s10750-016-3051-9. 920 Martin-Bergmann KA, Gee JH. 1985. The central mudminnow, 921 Umbra limi (Kirtland), a habitat specialist and resource generalist. 922 Can. J. Zool. 63:1753–1764. doi: 10.1139/z85-264. 923 Meyne J et al. 1990. Distribution of non-telomeric sites of the 924 (TTAGGG)n telomeric sequence in vertebrate chromosomes. 925 Chromosoma. 99:3–10. doi: 10.1007/BF01737283. 926 Mugal CF, Weber CC, Ellegren H. 2015. GC-biased gene conversion 927 links the recombination landscape and demography to genomic base 928 composition. BioEssays. 37:1317–1326. doi: 929 10.1002/bies.201500058. 930 Nam K, Ellegren H. 2012. Recombination Drives Vertebrate Genome 931 Contraction Petrov, DA, editor. PLoS Genet. 8:e1002680. doi: 932 10.1371/journal.pgen.1002680. 933 Nanda I, Schneider-Rasp S, Winking H, Schmid M. 1995. Loss of 934 telomeric sites in the chromosomes of Mus musculus domesticus 935 (Rodentia: Muridae) during Robertsonian rearrangements. 936 Chromosom. Res. 3:399–409. doi: 10.1007/BF00713889. 937 Nelson JS, Grande TC, Wilson MVH. 2016. Fishes of the World, 5th 938 Edition | Wiley.
39
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
939 Ocalewicz K et al. 2013. Pericentromeric location of the telomeric 940 DNA sequences on the European grayling chromosomes. Genetica. 941 141:409–416. doi: 10.1007/s10709-013-9740-7. 942 Ocalewicz K. 2013. Telomeres in fishes. Cytogenet. Genome Res. 943 141:114–125. doi: 10.1159/000354278. 944 Ohno S. 2013. Evolution by Gene Duplication. Springer Berlin 945 Heidelberg. 946 Ou S, Jiang N. 2018. LTR_retriever: A highly accurate and sensitive 947 program for identification of long terminal repeat retrotransposons. 948 Plant Physiol. 176:1410–1422. doi: 10.1104/pp.17.01310. 949 Pan Q et al. 2021. The rise and fall of the ancient northern pike 950 master sex determining gene. Elife. 10:1–50. doi: 951 10.7554/eLife.62858. 952 Pasquier J et al. 2016. Gene evolution and gene expression after 953 whole genome duplication in fish: the PhyloFish database. BMC 954 Genomics. 17:368. doi: 10.1186/s12864-016-2709-z. 955 Van de Peer Y, Mizrachi E, Marchal K. 2017. The evolutionary 956 significance of polyploidy. Nat. Rev. Genet. 18:411–424. doi: 957 10.1038/nrg.2017.26. 958 Pendas AM, Moran P, Freije JP, Garcia-Vazquez E. 1994. 959 Chromosomal mapping and nucleotide sequence of two tandem 960 repeats of Atlantic salmon 5S rDNA. Cytogenet. Genome Res. 961 67:31–36. doi: 10.1159/000133792. 962 Petrov DA, Sangster TA, Johnston JS, Hartl DL, Shaw KL. 2000. 963 Evidence for DNA loss as a determinant of genome size. Science 964 (80-. ). 287:1060–1062. doi: 10.1126/science.287.5455.1060. 965 Post A. 1974. Ergebnisse der Forschungsreisen des FFS “Walther 966 Herwig” nach Südamerika. XXXIV. Die Chromosomen von drei Arten 967 aus der Familie Gonostomatidae (Osteichthyes, Stomiatoidei). Arch 968 Fischwiss. 25:51–55. 969 Pritham EJ. 2009. Transposable elements and factors influencing 970 their success in eukaryotes. J. Hered. 100:648–655. doi: 971 10.1093/jhered/esp065. 972 Qumsiyeh MB. 1994. Evolution of number and morphology of 973 mammalian chromosomes. J. Hered. 85:455–465. doi: 974 10.1093/oxfordjournals.jhered.a111501. 975 Ráb P. 2004. Karyotype evolution in fishes of the order Esociformes.
40
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
976 Rab P, Crossman EJ. 1994. Chromosomal NOR phenotypes in North 977 American pikes and pickerels, genus Esox, with notes on the 978 Umbridae (Euteleostei: Esocae). Can. J. Zool. 72:1951–1956. doi: 979 10.1139/z94-265. 980 Rondeau EB et al. 2014. The Genome and Linkage Map of the 981 Northern Pike (Esox lucius): Conserved synteny revealed between 982 the salmonid sister group and the Neoteleostei Yin, T, editor. PLoS 983 One. 9:e102089. doi: 10.1371/journal.pone.0102089. 984 Rozenfeld C et al. 2019. De novo European eel transcriptome 985 provides insights into the evolutionary history of duplicated genes in 986 teleost lineages Vaudry, H, editor. PLoS One. 14:e0218085. doi: 987 10.1371/journal.pone.0218085. 988 Sacerdot C, Louis A, Bon C, Berthelot C, Roest Crollius H. 2018. 989 Chromosome evolution at the origin of the ancestral vertebrate 990 genome. Genome Biol. 19:166. doi: 10.1186/s13059-018-1559-1. 991 Sehr M, Keckeis H. 2017. Habitat use of the European mudminnow 992 Umbra krameri and association with other fish species in a 993 disconnected Danube side arm. J. Fish Biol. 91:1072–1093. doi: 994 10.1111/jfb.13402. 995 Shao F, Han M, Peng Z. 2019. Evolution and diversity of 996 transposable elements in fish genomes. Sci. Rep. 9:1–8. doi: 997 10.1038/s41598-019-51888-1. 998 Simão FA, Waterhouse RM, Ioannidis P, Kriventseva E V, Zdobnov 999 EM. 2015. BUSCO: assessing genome assembly and annotation 1000 completeness with single-copy orthologs. Bioinformatics. 31:3210–2. 1001 doi: 10.1093/bioinformatics/btv351. 1002 Skog A, Vøllestad LA, Stenseth NC, Kasumyan A, Jakobsen KS. 1003 2014. Circumpolar phylogeography of the northern pike (Esox lucius) 1004 and its relationship to the Amur pike (E. reichertii). Front. Zool. 11:67. 1005 doi: 10.1186/s12983-014-0067-8. 1006 Slijepcevic P. 1998. Telomeres and mechanisms of Robertsonian 1007 fusion. Chromosoma. 107:136–140. doi: 10.1007/s004120050289. 1008 Sochorová J, Garcia S, Gálvez F, Symonová R, Kovařík A. 2018. 1009 Evolutionary trends in animal ribosomal DNA loci: introduction to a 1010 new online database. Chromosoma. 127:141–150. doi: 1011 10.1007/s00412-017-0651-8. 1012 Sonnhammer ELL, Durbin R. 1995. A dot-matrix program with 1013 dynamic threshold control suited for genomic DNA and protein
41
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1014 sequence analysis. Gene. 167:GC1. doi: 10.1016/0378- 1015 1119(95)00714-8. 1016 Sun C et al. 2012. LTR retrotransposons contribute to genomic 1017 gigantism in plethodontid salamanders. Genome Biol. Evol. 4:168– 1018 183. doi: 10.1093/gbe/evr139. 1019 Symonová R et al. 2013. Genome differentiation in a species pair of 1020 coregonine fishes: An extremely rapid speciation driven by stress- 1021 activated retrotransposons mediating extensive ribosomal DNA 1022 multiplications. BMC Evol. Biol. 13:42. doi: 10.1186/1471-2148-13- 1023 42. 1024 Symonová R et al. 2017. Higher-order organisation of extremely 1025 amplified, potentially functional and massively methylated 5S rDNA in 1026 European pikes (Esox sp.). BMC Genomics. 18:391. doi: 1027 10.1186/s12864-017-3774-7. 1028 Tenaillon MI, Hollister JD, Gaut BS. 2010. A triptych of the evolution 1029 of plant transposable elements. Trends Plant Sci. 15:471–478. doi: 1030 10.1016/j.tplants.2010.05.003. 1031 Vendrely R, Vendrely C. 1953. Argenine an deoxyribonucleic acid 1032 content of erythrocyte nuclei and sperms of some species of fishes. 1033 Nature. 172:30–31. 1034 Vendrely R, Vendrely C. 1952a. Sur la teneur compare en arginine et 1035 en acide desoxyribonucleique des noyaux d’erythrocytes de 1036 quelques especes de poissons. Comptes Rendus l’Academie des 1037 Sci. 235:395–397. 1038 Vendrely R, Vendrely C. 1952b. Sur la teneur individuelle en arginine 1039 des spermatozoedes compare la teneur individuelle en arginine des 1040 noyaux d’erythrocytes chez quelques especes de poissons. Comptes 1041 Rendus l’Academie des Sci. 235:444–446. 1042 Vinogradov AE. 1998. Genome size and GC-percent in vertebrates 1043 as determined by flow cytometry: The triangular relationship. 1044 Cytometry. 31:100–109. doi: 10.1002/(SICI)1097- 1045 0320(19980201)31:2<100::AID-CYTO5>3.0.CO;2-Q. 1046 Wanzenböck J, Spindler T. 1995. Rediscovery of Umbra krameri 1047 (Walbaum, 1972) in Austria and subsequent investigations. Ann. des 1048 Naturhistorischen Museums Wien. Ser. B für Bot. und Zool. 97:450– 1049 457. 1050 Wong WY et al. 2019. Expansion of a single transposable element 1051 family is associated with genome-size increase and radiation in the
42
bioRxiv preprint doi: https://doi.org/10.1101/2021.06.07.447207; this version posted June 7, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1052 genus Hydra. Proc. Natl. Acad. Sci. U. S. A. 116:22915–22917. doi: 1053 10.1073/pnas.1910106116. 1054
43