
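Posted on Authorea 8 Oct 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160218269.98008776/v1 | This is a preprint and has not been peer reviewed. Data may be preliminary.

Chromosome-level genome assembly of burbot (Lota lota) provides insights into the evolutionary adaptations in freshwater

Zhiqiang Han1, Manhong Liu2, Qi Liu3, Hao Zhai2, Shijun Xiao3, and Tianxiang Gao4

1 Affiliation not available
2 Northeast Forestry University
3 Wuhan Gooalgene Technology Company
4 Zhejiang Ocean University

October 8, 2020

Abstract

The burbot (Lota lota) shows the widest longitudinal range of freshwater fish in the world. The burbot is the only member of the cod family that is adapted solely to freshwater. This species is a good model for studies on adaptive genome evolution from marine to freshwater environment. However, no high-quality reference genome has been released. Here, the first chromosome-level genome of burbot was constructed using PacBio long sequencing and Hi-C technology. A total of 95.24 Gb of polished PacBio sequences were generated, and the preliminary assembled genome was 575.83 Mb in size with a contig N50 size of 2.15 Mb. The assembled sequences were anchored to 22 pseudo-chromosomes by using Hi-C data. The final assembled genome after Hi-C correction was 575.92 Mb, with a contig N50 of 2.01 Mb and a scaffold N50 of 22.10 Mb. A total of 22,067 protein-coding genes were predicted, 94.82% of which were functionally annotated. Phylogenetic analyses indicated that burbot diverged from the Atlantic cod about 44.4 million years ago. In addition, 377 putative genes that appear to be under positive selection in burbot were identified. These positively selected genes might contribute to adaptation to the freshwater environment. These genome data provide an invaluable resource for the ecological and evolutionary study of the order Gadiformes.

Introduction

The order Gadiformes includes some of the most important commercial fish in the world (e.g., cod, hake, and haddock) and accounts for approximately 18% of the world's total marine fish catch (FAO, 2004). Gadiform fish inhabit cold waters in every ocean, from deep-sea benthic habitats to high-latitude coastal waters (Nelson, 2006). Only two species in this order are known in freshwater habitats. However, to date, a high-quality genome sequence is available for only one species of the order Gadiformes, i.e., the Atlantic cod (Gadus morhua) (Star et al., 2011). This limitation significantly hinders the taxonomical, evolutionary, and biological studies of Gadiformes fish (Schaefer et al., 2016).

The burbot (Lota lota) is the only member of the cod family (Gadidae) that is adapted solely to freshwater. L. lota has a wide holarctic distribution, showing the widest longitudinal range of freshwater fish in the world. The burbot is distributed in nearly all suitable freshwater basins of North America, Europe, and north Asia (Blabolil et al., 2018), and has retained many characteristics of its marine ancestors, such as spawning at low temperatures, high fecundity, and a pelagic larval stage (Lehtonen, 1998). Although this fish thrives in freshwater, it typically shows a preference for cold water. This species spawns during winter or early spring, when the water is still ice-covered and water temperatures are between 1 °C and 4 °C (Bergersen et al., 1993). Spawning occurs on fine to gravel substrate in shallow bays or groyne fields in water depths of 0.3 m to 3.0 m (Fredrich & Arzbach, 2002; Eick et al., 2013).

The burbot is apparently an excellent "indicator" species. This species is vulnerable to many environmental changes, in particular, warming water temperatures and pollution (Stapanian et al., 2010). The burbot populations in marginal habitats may serve as an early indicator of the impacts of climate change on cold-water fish species (Stapanian et al., 2010). However, stocks of the burbot have declined during the past century. Many burbot populations are threatened, have been extirpated, or are otherwise in need of conservation measures (Maitland & Lyle, 1990). For example, in Finland, burbot populations have severely declined in number or have been completely destroyed in 16% of their distribution (Tammi et al., 1999). A series of threats, including pollution, habitat fragmentation, exploitation, and invasive species, have caused the decline or extirpation of many burbot populations (Stapanian et al., 2010). Given the widest holarctic distribution of this species, the burbot may undergo some degree of genetic isolation and local adaptation, which can be resolved with high-quality genome-wide SNPs. Genomics resources will support the conservation studies of this species. However, the available genetic information for this fish, including microsatellite loci, remains scarce. At present, only limited genetic studies have been conducted on the population structure and evolutionary history of the burbot (Houdt et al., 2005; Sanetra, 2005). Thus, sequencing the genome of the burbot is essential. This process may help to reveal insights into the role of environmental changes in shaping the genome in the evolution from marine to freshwater.

The burbot, the only freshwater species in the cod family (Gadidae), represents a classical transition from marine to freshwater. Fossil evidence suggests that the Lota genus had already inhabited European rivers in the early Pliocene (Houdt et al., 2005). This phenomenon indicates that the burbot left the ocean and migrated to the freshwater. The transition from an oceanic to a freshwater habitat provides an opportunity for drastic environmental changes to select numerous functional genes, which should be reflected in changes in the ecology, morphology, and behavior of fish. Marine to freshwater transition events rarely occur, likely due to the physiological and ecological barriers associated with changing environmental conditions. These factors include lower salinity, dramatic fluctuations in temperature, relatively high levels of UV radiation, and competition from primary freshwater fish (Finnegan, 2017). Despite these challenges, several marine-derived fish lineages have successfully completed this transition from ocean to freshwater and invaded freshwater habitats. Concerted effort, such as changes to physiological and morphological characters, has been reported in freshwater fish (Jara, 1988). The convergence of their traits may commonly arise from convergent evolution of the underlying genetic changes. Physiological and phenotypic convergence is strong in freshwater lineages, and the genetic changes may be reproducible to some extent. The convergent genomic adaptations of marine-derived freshwater fish provide a practical way to identify freshwater adaptations.

In this study, a chromosome-level genome assembly of burbot was constructed by combining short reads, PacBio long reads, and Hi-C sequencing data. The assembly was used to further address the evolution of freshwater adaptation in burbot by comparative genomics of 13 species, including three distantly related freshwater species from Perciformes and Percomorpha, to identify the key genetic signatures of the evolutionary process of freshwater adaptation. This study will provide a genomic resource for the freshwater adaptation of marine-originated species.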
Materials and methods

Sample and DNA extraction

A single female fish (~800 g) was collected in November 2019 from Heilong River, the northeastern part of China. Muscle, eye, gonad, gill, liver, and spleen tissues were collected and stored in liquid nitrogen until DNA and RNA extraction. Muscle tissues were used for DNA sequencing for genome assembly, while all tissues were used for transcriptome sequencing. Genomic DNA from the white muscle tissue was extracted using the standard phenol/chloroform extraction method to construct the DNA sequencing library. The integrity and concentration of the genomic DNA molecules were checked using 1% agarose gel electrophoresis and Pultton DNA/Protein Analyzer (Plextech, USA).

Library construction and genome sequencing

The Illumina NovaSeq-6000 and PacBio Sequel II platforms were applied for genomic sequencing to generate short and long genomic reads, respectively. Illumina sequencing libraries were prepared to estimate the genome size, correct the genome assembly, and evaluate assemblies. A paired-end library was constructed with an insert size of 300 bp according to the Illumina standard protocol. After discarding reads with low-quality bases (reads with more than 10% N bases or low-quality bases ≤ 5), adapter sequences, and duplicated sequences, the clean reads were used for subsequent analysis. For long-read sequencing, we constructed a SMRTbell library with a fragment size of 20 Kb by using the SMRTBell template preparation kit 1.0 (PacBio, USA) following the manufacturer's protocol. The library was sequenced with the PacBio Sequel II system, and data from one SMRT cell were generated.

Genome size estimation and genome assembly

The Illumina short read data were used in a genome survey to estimate the genome size, heterozygosity, and repeat content of the burbot genome by using the k-mer-based method (Liu et al., 2013). The NextDenovo package (https://github.com/Nextomics/NextDenovo) was used to assemble the burbot genome from PacBio long reads with the following parameters: seed_cutoff, 15,000; parallel_jobs, 300; pa_correction, 320; and random_round, 100. To correct the random sequencing errors in the NextDenovo output, two polishing steps were applied. Arrow (Chin et al., 2013) was used to polish the genome assembly by using the long sequencing data, and two rounds of polishing using Illumina short reads were then applied with Pilon (Walker et al., 2014).

Hi-C analysis and chromosome assembly

To obtain a chromosome-scale genome assembly, the muscle tissue of the burbot was used to prepare the Hi-C library, which was constructed according to Rao et al. (2014). The Hi-C fragment libraries were sequenced on the Illumina NovaSeq-6000 platform. The sequencing reads were mapped to the polished burbot genome with Bowtie 1.2.22. The two read ends were independently aligned to the genome, and only the read pairs with both ends uniquely aligned to the genome were selected. Lachesis with default parameters (Burton et al., 2013) was then applied to the corrected contigs by using the high-quality valid Hi-C reads. The ggplot2 package in R was applied to generate a genome-wide Hi-C heatmap to evaluate the quality of the chromosomal-level genome assembly.

Two methods were used to assess the completeness and accuracy of the genome assembly. First, the Illumina and PacBio reads were aligned to the burbot genome assembly by using BWA-MEM (version 0.7.10-r789) (Li & Durbin, 2009) and BLASR (Mark et al., 2012), respectively. Second, the completeness of the genome assembly was evaluated by using BUSCO (version 2.0) (Simao et al., 2015) against a database that consisted of 4,584 single-copy orthologs.
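The Hi-C read-pair selection rule described in the methods (keep a pair only when both ends align uniquely) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `EndAlignment` record and its `n_hits` field are hypothetical stand-ins for whatever the aligner reports, and the two ends are assumed to have been aligned independently and keyed by read ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndAlignment:
    """One independently aligned read end (hypothetical record layout)."""
    read_id: str
    contig: str
    position: int
    n_hits: int  # assumed: number of alignments reported for this end


def select_valid_pairs(ends1, ends2):
    """Return (end1, end2) pairs in which both ends align exactly once."""
    second_by_id = {end.read_id: end for end in ends2}
    valid = []
    for end1 in ends1:
        end2 = second_by_id.get(end1.read_id)
        # Drop pairs with a missing mate or a multi-mapping end.
        if end2 is not None and end1.n_hits == 1 and end2.n_hits == 1:
            valid.append((end1, end2))
    return valid


ends1 = [
    EndAlignment("r1", "ctg1", 100, 1),
    EndAlignment("r2", "ctg1", 500, 3),  # multi-mapped, pair is dropped
    EndAlignment("r3", "ctg2", 40, 1),   # mate missing, pair is dropped
]
ends2 = [
    EndAlignment("r1", "ctg2", 900, 1),
    EndAlignment("r2", "ctg3", 10, 1),
]
valid_pairs = select_valid_pairs(ends1, ends2)
print([pair[0].read_id for pair in valid_pairs])
```

Only the surviving pairs would then be passed on to a scaffolder; contacts between different contigs (such as `r1` here) are the signal used to cluster contigs into pseudo-chromosomes.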
Repeat annotation, gene prediction, and functional annotation

Repeat elements were annotated in the burbot genome before gene prediction. Tandem Repeat Finder (Benson, 1999) and LTR_FINDER (Zhao et al., 2007) were executed for the ab initio prediction of repeat elements in this genome. RepeatMasker (http://www.repeatmasker.org) and RepeatProteinMask were applied to search the genome for the known repeat elements, with the genome sequences used as queries against the Repbase database (Jurka et al., 2005). Transfer RNA (tRNA) genes were identified using the tRNAscan-SE v. 1.3.1 software (Lowe & Eddy, 1997) with default settings. The ribosomal RNA (rRNA) and microRNA genes were predicted using the Infernal v.1.1 software based on the Rfam and miRBase databases, respectively (Griffiths-Jones et al., 2005).

Based on the repeat-masked genome, three strategies were applied to predict the protein-coding genes in the burbot genome assembly: ab initio gene prediction, homologs, and RNA-sequencing. Ab initio gene prediction was performed using Augustus (v2.7) (Stanke et al., 2006) and GenScan (Burge & Karlin, 1997). For homology-based prediction, protein sequences of Acanthochromis polyacanthus, Oryzias latipes, Amphiprion ocellaris, Anabas testudineus, Astatotilapia calliptera, Astyanax mexicanus, Austrofundulus limnaeus, Gadus morhua, Lepisosteus oculatus, and Notothenia coriiceps were downloaded from the NCBI database and aligned to the burbot genome by using tBLASTn (E-value ≤ 1e-5). The homologous genome sequences were then aligned against the matching proteins by using GeneWise (v2.4.0) (Doerks et al., 2002) for accurately spliced alignments. Transcriptomic data (2x150 bp, NovaSeq-6000 platform) generated from a mixture of six tissues, including the muscle, eye, gonad, gill, liver, and spleen, were aligned to the assembled genome sequences by using HISAT2 (v2.0.10) (Pertea et al., 2016), and gene structures were detected using the putative transcript structure, which was predicted by Cufflinks (Ghosh et al., 2016). The gene models were merged, and redundancy was removed by MAKER (Cantarel et al., 2008) and HiCESAP.

Functional annotation of all the protein-coding genes was performed by using BLASTX and BLASTN (Lobo, 2008) with an E-value threshold of 1e-5 against the NCBI nonredundant protein (NR), InterPro, SwissProt, TrEMBL (Boeckmann et al., 2003), eukaryotic orthologous groups of proteins (KOG) (Tatusov et al., 2003), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa & Goto, 2000) databases. Functional ontology and pathway information from the Gene Ontology (GO) database was assigned to the genes by using the Blast2GO utility (Conesa et al., 2005).

Comparative genomic analyses and testing for selection

The protein sequences of 12 species of teleost fish (Supporting Information Table S1) were downloaded from Ensembl (release version 100). Only the longest transcript was selected for each gene locus with alternative splicing variants. Orthologous groups were constructed by ORTHOMCL v2.0.9 using default settings based on the filtered BLASTP results. The single-copy orthologous genes shared by all 13 species were further aligned using MUSCLE (version 3.8.31) (Edgar, 2004) and concatenated to construct a phylogenetic tree by RaxML (Stamatakis, 2014). Divergence time among species was estimated with r8s (Sanderson, 2003). The divergence times of Larimichthys crocea-Notothenia coriiceps (88–114), L. crocea-Gambusia affinis (96.9–150.9), and Clupea harengus-G. affinis (149.85–165.2) from the TimeTree database (Kumar et al., 2017) were used as the calibration times.

Gene family expansion and contraction analyses were performed using CAFE 3.1 (De Bie et al., 2006) with the estimated phylogenetic tree information. A P value < 0.05 was used to indicate significantly changed gene families. The expanded and contracted gene families in burbot, and the shared expansion and contraction gene families of three freshwater species (burbot, Monopterus albus, and G. affinis), were tested for enrichment in GO terms and KEGG pathways, and the Benjamini and Hochberg FDR correction was applied. Significantly overrepresented GO terms and KEGG pathways were identified with corrected P values ≤ 0.05.

Multiple sequence alignments were constructed among the ortholog genes, and the CODEML program in PAML 4.5 was used to estimate the dN/dS ratio (ω) (Yang, 2007). Two different branch-site likelihood ratio tests were applied to find genes under positive selection. First, species-specific positively selected genes (PSGs) in burbot were identified with burbot as foreground species and other species, excluding zebrafish (Danio rerio), Clupea harengus, and Esox lucius, as background species. Second, three freshwater species (i.e., burbot, M. albus, and G. affinis) were selected as foreground species and other species, excluding zebrafish (Danio rerio), Clupea harengus, and Esox lucius, as background species. GO and KEGG categories were assigned to orthologous groups according to the zebrafish genome reference for the functional enrichment analyses.
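The enrichment analyses in the methods rely on the Benjamini and Hochberg FDR correction. A minimal sketch of that standard procedure is shown below; the function name and the example P values are illustrative only and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        candidate = p_values[index] * m / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw P values
corrected = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in corrected]
print(corrected, significant)
```

A term would be reported as significantly overrepresented when its corrected value is at or below the 0.05 threshold stated in the methods.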
enrichment 3.1Genome functional the discussion for and reference Results genome 3 zebrafish the to according groups orthologous zebrafish in excluding , (PSGs) species, genes other selected and albus, positively species foreground species-specific as the First, burbot , with selection. 4.5 identified PAML positive were in under burbot program genes ( CODEML find ratio the to dN/dS genes, applied the ortholog estimate the to among with used alignments was identified sequence were multiple pathways construct KEGG To and terms GO gene overrepresented contraction Significantly and corrected expanded applied. (burbot, information. The species was freshwater tree analyses correction three families. phylogenetic of contraction gene estimated families and changed gene the contraction expansion significantly and with shared indicate family 2006) the to Gene burbot, al., in used et families times. was Bie calibration 0.05 (De the < 3.1 value estimated as CAFE was used using were performed species were 2017) among of al., time times et divergence Divergence construct (Kumar to The 13 2003). concatenated all with 2014). and (Sanderson, affinis by crocea-Gambusia locus 2004) (Stamatakis, shared r8s (Edgar, gene genes RaxML the 3.8.31) orthologous each with by (version single-copy tree for MUSCLE The downloaded phylogenetic selected using results. were the a aligned was BLASTP using S1) further by transcript filtered Table were v2.0.9 the longest (14) species Information ORTHOMCL on the by based (Supporting constructed Only settings were fish default groups teleost Orthologous 100). variants. of splicing version alternative species (release 12 Ensembl of from sequences selection protein genomic The the for to testing assigned and was analyses 2005). database genomic al., 2008). 
Comparative (GO) the et (Lobo, Ontology for (Conesa utility Gene used Blast2GO BLASTN the were the using 1e–5 from and by of BLASTX information genes pathway using threshold by E-value and eu- and genes an ontology Genes 2003), protein-coding Functional with of the al., Encyclopedia databases of Kyoto 2000) et annotation and Goto, (Boeckmann functional 2003), & TrEMBL al., (Kanehisa All et HiCESAP. SwissProt, (Tatusov (KEGG) and (KOG) the Genomes 2016). proteins 2008) InterPro, al., of al., groups (NR), et et the orthologous (Cantarel protein (Ghosh karyotic to transcript MAKER nonredundant Cufflinks aligned by putative by removed NCBI were the was predicted spleen, The and redundancy was and 2016), and which liver, merged, al., structure, were gill, et gene models gonad, (Pertea gene using eye, (v2.0.10) detected muscle, HISAT2 were the using structures including by sequences tissues, genome six assembled of mixture a from lpaharengus Clupea lpaharengus Clupea .affinis G. and P .affinis G. aus[?]0.05. values nG em,adKG ahaswr nihd n h ejmn n ohegFDR Hochberg and Benjamini the and enriched, were pathways KEGG and terms, GO in ) and and eeslce sfrgon pce n te pce,ecuigzebrafish excluding species, other and species foreground as selected were ) sxlucius Esox sxlucius Esox (96.9–150.9), k sbcgon pce.Scn,trefehae pce ie,burbot, (i.e., species freshwater three Second, species. background as mr eka et f12(upeetr iueS n al S3). Table and S1 Figure (Supplementary 112 of depth a at peak -mers sbcgon pce.G n EGctgre eeasge to assigned were categories KEGG and GO species. background as , lpahrnu-.affinis harengus-G. Clupea ω Yn,20) w ieetbac-ielklho ai et were tests ratio likelihood branch-site different Two 2007). (Yang, ) aiihhscoe-oohnacoriiceps crocea-Notothenia Larimichthys 4 198–6.)fo h ieredatabase TimeTree the from (149.85–165.2) ootrsalbus, Monopterus (88–114), ai rerio Danio ai rerio Danio M. L. 
3.2 Genome assembly and completeness of the assembled genome

The PacBio Sequel II platform generated 95.24 Gb of high-quality data from the long-read library, covering the genome 173.16-fold (Table 1, Supplementary Table S4). These data were assembled using NextDenovo followed by racon and pilon polishing, which produced a 575.83 Mb genome assembly with a contig N50 of 2.15 Mb (Table 2). The length of this assembly was consistent with the genome size estimated by k-mer analysis.

The Illumina reads and PacBio long reads were aligned to the initial genome assembly to evaluate the quality of the assembled genome. The results showed that 99.23% of the Illumina reads and 97.55% of the PacBio long reads were successfully mapped to the burbot assembly (Supplementary Table S5 and S6). The BUSCO analysis showed that 94.67% (4344/4584) of the complete BUSCO genes were found in the assembled genome (Table 3), including 91.93% complete and single-copy genes and 2.84% duplicated genes (Supplementary Table S7).

The contigs in the draft assembly were then anchored and oriented into a chromosomal-scale assembly by using the Hi-C scaffolding approach. The Hi-C library generated 69.51 Gb (126.38×) of clean data (Table 1). With the use of LACHESIS, 88.66% of the assembled sequences were anchored to 22 pseudo-chromosomes, with chromosome lengths ranging from 15.18 Mb to 51.8 Mb (Table 4). Based on the heatmap, the 22 pseudo-chromosomes could be easily distinguished, and the interaction signal strength around the diagonal was considerably strong, which indicated a high quality of this genome assembly (Figure 2). The final assembled genome after Hi-C correction was 575.92 Mb, with a contig N50 of 2.01 Mb and a scaffold N50 of 22.10 Mb (Table 5).

3.3 Repeat annotation, gene prediction and gene annotation

A total of 384.29 Mb of repeat sequences were detected, accounting for 66.74% of the assembled genome (Table 6). This repeat content was obviously larger than the value (36.60%) obtained from the k-mer analysis. The repetitive sequences mainly consisted of long terminal repeats (289.32 Mb; 50.24% of the assembly), DNA transposable elements (66.95 Mb; 11.63%), and long interspersed elements (30.96 Mb; 5.38%) (Table 7).

A total of 21,664 protein-coding genes were predicted by the combination of three strategies based on ab initio, homologs, and RNAseq. The average values of the gene length, exon length, and average intron length were 14,606, 292.38, and 1,223 bp, respectively (Table 8). The gene models were compared to ten other teleost species, including Acanthochromis polyacanthus, Oryzias latipes, Amphiprion ocellaris, Anabas testudineus, Astatotilapia calliptera, Astyanax mexicanus, Austrofundulus limnaeus, Gadus morhua, Lepisosteus oculatus, and Notothenia coriiceps, showing similar distribution patterns in mRNA length, CDS length, exon length, intron length, and exon number (Supplementary Figure S2). The summary statistics of the characteristics of the predicted burbot genome are shown in Figure 3. A total of 20,658 predicted genes (95.36%) were successfully annotated by alignment to the nucleotide, protein, and annotation databases of InterPro, NR, Swissprot, TrEMBL, KOG, GO, and KEGG (Table 9). A total of 6390 tRNAs, 300 rRNAs, and 519 microRNAs were identified by noncoding RNA prediction (Supplementary Table S8).
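The contig N50 and scaffold N50 values reported for the assembly follow the usual definition: the length of the sequence at which the cumulative length of sequences sorted from longest to shortest first reaches half of the total assembly length. A minimal sketch under that definition is shown below; the lengths in the example are toy values, not the burbot contigs.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the sum to >= 50% of total.
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


toy_contigs = [2, 3, 4, 5, 6, 10]  # toy lengths, total = 30
toy_n50 = n50(toy_contigs)
print(toy_n50)
```

Applied to contig lengths this gives the contig N50; applied to scaffold (pseudo-chromosome plus unplaced) lengths it gives the scaffold N50, which is why Hi-C anchoring raises the latter so sharply while leaving the former nearly unchanged.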
3.4 Comparative genomics and the mechanism of adaption to freshwater

A total of 19,998 gene families and 2,650 single-copy orthologous genes were identified using the 13 selected teleosts. In addition, 21,664 genes of the burbot could be clustered into 14,504 gene families, including 132 unique gene families (Supplementary Table S9). Based on the single-copy orthologous genes, the ML phylogenetic tree was constructed and showed that burbot and Atlantic cod were clustered together, and the divergence time between the two cod species was ~44.4 Mya (Figure 4). The divergence time was consistent with the time estimated by Hughes et al. (2018).

The burbot genome displayed 639 expanded and 1564 contracted gene families compared with the common ancestor of burbot and Atlantic cod (Figure 4). The expanded gene families of burbot were significantly enriched in 73 GO terms and 34 KEGG pathways, mainly including the DNA metabolism process (GO:0006259, corrected P value=0.00E+00), DNA integration (GO:0015074, corrected P value=2.02E-96), transition metal ion binding (GO:0046914, corrected P value=5.22E-05), zinc ion binding (GO:0008270, corrected P value=2.05E-06), and apoptosis process (GO:0006915, corrected P value=1.18E-10) GO terms, and the amino sugar and nucleotide sugar metabolism (ko00520, corrected P value=1.19E-91), natural killer cell-mediated cytotoxicity (ko04650, corrected P value=4.200299E-20), and hematopoietic cell lineage (ko04640, corrected P value=3.04E-18) pathways that were associated with cell damage repair, ion binding, and immune system (Supplementary Tables S10 and S11). Conversely, the burbot clearly showed contracted gene families in homophilic cell adhesion via plasma membrane adhesion molecules (GO:0007156, corrected P value=2.69E-29), cell-cell adhesion via plasma-membrane adhesion molecules (GO:0098742, corrected P value=2.69E-29), membrane (GO:0016020, corrected P value=1.52E-04), and NOD-like receptor signaling (ko04621, corrected P value=2.41E-02) pathways (Supplementary Tables S12 and S13).

Notably, the three freshwater species shared no expanded gene families and two contracted gene families associated with cell adhesion (GO:0007155, corrected P value=0.00E+00) and membrane (GO:0016020, corrected P value=0.00E+00) (Supplementary Table S14). These functions are critical for adjusting the ion concentrations inside and outside the cell. Such gene families may reflect the reduced functional requirements of a stable ionic environment in freshwater for cell membrane permeability. However, no enriched KEGG pathway was found for the contracted gene families. These findings are consistent with the different components of fatty acids between marine and freshwater fish (Taşbozan & Gökçe, 2017). Marine fish have higher levels of omega-3 fatty acids than freshwater species. Compared with the omega-6 fatty acids, omega-3 fatty acids help improve cell membrane fluidity and provide osmoregulatory capabilities.

To identify the genes evolving under positive selection for freshwater adaptation, two different likelihood ratio tests (branch-site model) were performed. A total of 377 genes were identified as PSGs in the burbot genome (Supplementary Table S15). The burbot PSGs were functionally enriched in the organic cyclic compound metabolic process (GO:1901360, corrected P value=4.11E-03), RNA metabolic process (GO:0016070, corrected P value=1.83E-02), cellular nitrogen compound metabolic process (GO:0034641, corrected P value=4.13E-03), and the nucleic acid metabolic process (GO:0090304, corrected P value=6.16E-03) (Supplementary Table S16).

Additionally, 38 PSGs were detected with three freshwater lineages (burbot, M. albus, and G. affinis) as foreground branch (Supplementary Table S17). Four PSGs (stk33, ino80e, nabp1a, and znf385a) were related to DNA damage repair. Genes stk33 and nabp1 participate in the mitotic DNA damage checkpoint. znf385a interacts with p53/TP53 and promotes DNA damage-induced cell cycle arrest (Das et al., 2007). znf385a is located upstream in the p53 activating pathway. Protein ino80e is a component of the chromatin remodeling INO80 complex and contributes to the DNA double-strand break repair (Yao et al., 2008). The exposure of freshwater fish to UV radiation may cause DNA damage. The presence of genes involved in DNA repair under positive selection was consistent with the high levels of exposure to UV radiation in the freshwater environment compared with that in the ocean environment. This finding suggests that these genes are functionally convergent in the three freshwater lineages. The PSGs of the freshwater lineages were enriched in folic acid transport (GO: 0015884, corrected P value=8.10E-05) among a group of GO terms, and in amino acid metabolism, replication, and repair pathways (Supplementary Tables S18 and S19). slc19a1 has an important role in folate transmembrane transport. Low osmotic pressure has previously been shown to affect the efficiency of folic acid absorption in the intestine (Zhao et al., 2011). The positive selection on slc19a1 may improve folic acid absorption for freshwater species. These data will serve as valuable resources for future evolution studies of burbot.
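The branch-site likelihood ratio tests used to call PSGs compare a null model with an alternative model that allows positive selection on the foreground branch. A minimal sketch of the test itself is given below, assuming the statistic 2ΔlnL is compared against a chi-square distribution with one degree of freedom, which is a common convention for this test but is not stated in the text. The log-likelihood values are hypothetical, and no CODEML output parsing is attempted.

```python
import math


def lrt_p_value(lnl_null, lnl_alt):
    """P value of a likelihood ratio test against chi-square with 1 df."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        # The alternative model fits no better than the null model.
        return 1.0
    # Survival function of chi-square (1 df): erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for one gene (not from the study).
p_gene = lrt_p_value(lnl_null=-1000.0, lnl_alt=-998.0795)
print(f"LRT P value: {p_gene:.4f}")
```

In a genome-wide scan, the per-gene P values obtained this way would normally be corrected for multiple testing before genes are reported as positively selected.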
4. Conclusion

A chromosomal-scale genome assembly of the burbot was provided by integrating the high-quality PacBio long read and Hi-C sequencing data. The burbot is the only freshwater member of the cod family and represents the widest longitudinal range of freshwater fish in the world. The genome assembly and annotation of burbot supplied the second genomic data for the order Gadiformes to further investigate the evolution of Gadiformes. The results were beneficial to whole genome analysis with other cod species and important in elucidating the evolution process in species of the order Gadiformes under environment change. A series of candidate genes involved in freshwater adaptation were identified in these comparative genomics analyses. These data are also useful for diverse conservation applications, including identifying conservation units, assessing gene flow, detecting the local adaptation of populations, and elucidating the evolutionary history of burbot.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2017YFA0604904) and Natural Science Foundation of Heilongjiang Province of China (LH2019C070).

Author contributions

T.X. G. and Z.Q.H. conceived and managed the project. Q.L. and X.S.J. extracted the DNA/RNA and performed the genome sequencing. M.H.L. and H.Z. collected the sequencing samples. Z.Q.H. analyzed the data. Z.Q.H. and T.X. G. wrote the manuscript. All authors reviewed and approved the final manuscript.

Conflicts of interests

No.

Data Accessibility Statement

Raw sequencing data (PacBio, Illumina, Hi-C and RNA-seq data) for the burbot genome have been deposited at the Sequence Read Archive (SRA) (SRR12549297, SRR12549430, SRR12550374 and SRR12577979). The assembled genome and annotations have been deposited at the NCBI Assembly database with the GenBank accession PRJNA663985. In addition, the genome data are also openly available in figshare at https://doi.org/10.6084/m9.figshare.12927203 (Liu, 2020).

References

Benson G (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res, 27, 573.

Bergersen EP, Cook MF, Baldes RJ (1993) Winter movements of burbot (Lota lota) during an extreme drawdown in Bull Lake, Wyoming, USA. Ecol Freshw Fish, 2, 141-145.

Blabolil P, Duras J, Jůza T, Kočvara L, Matěna J, Muška M, Říha M, Vejřík L, Holubová M, Peterka J (2018) Assessment of burbot Lota lota (L. 1758) population sustainability in central European reservoirs. J Fish Biol, 92, 1545-1559.

Boeckmann B, Bairoch A, Apweiler R, Blatter M, Estreicher A, Gasteiger E, Martin M, Michoud K, O'Donovan C, Phan I, Pilbout S, Schneider M (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res, 31, 365-370.

Burge C, Karlin S (1997) Prediction of complete gene structures in human genomic DNA. J Mol Biol, 268, 78.

Burton JN, Adey A, Patwardhan RP, Qiu R, Kitzman JO, Shendure J (2013) Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat Biotechnol, 31, 1119-1125.

Cantarel BL, Korf I, Robb SMC, Parra G, Ross E, Moore B, Holt C, Alvarado AS, Yandell M (2008) MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes. Genome Res, 18, 188-196.

Chin CS, Alexander DH, Marks P, Klammer AA, Drake J, Heiner C, Clum A, Copeland A, Huddleston J, Eichler EE, Turner SW, Korlach J (2013) Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat Methods, 10, 563.

Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, Robles M (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21, 3674.

Das S, Raj L, Zhao B, Kimura Y, Bernstein A, Aaronson SA, Lee SW (2007) Hzf Determines cell survival upon genotoxic stress by modulating p53 transactivation. Cell, 130, 624-637.

De Bie T, Cristianini N, Demuth JP, Hahn MW (2006) CAFE: A computational tool for the study of gene family evolution. Bioinformatics, 22, 1269-1271.

Doerks T, Copley RR, Schultz J, Ponting CP, Bork P (2002) Systematic identification of novel protein domain families associated with nuclear functions. Genome Res, 12, 47-56.

Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res, 32, 1792-97.

Eick D (2013) Habitat preferences of the burbot (Lota lota) from the River Elbe: An experimental approach. J Appl Ichthyol, 29, 541-548.

Finnegan D (2017) Convergence in Diet and Morphology in Marine and Freshwater Cottoid Fishes.

Fredrich F, Arzbach HH (2002) Migration and bank structure use of burbot, Lota lota, in the River Elbe, Germany. Zeitschrift für Fischkunde, (Suppl 1.), 159-178.

Ghosh S, Chan CKK (2016) Analysis of RNA-Seq Data Using TopHat and Cufflinks. Methods Mol Biol, 1374, 339.

Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Bateman A (2005) Rfam: Annotating Non-Coding RNAs in Complete Genomes. Nucleic Acids Res, 33, D121-D124.

Houdt JKJV, Cleyn LD, Perretti A, Volckaert F (2005) A mitogenic view on the evolutionary history of the Holarctic freshwater gadoid, burbot (Lota lota). Mol Ecol.

Hughes LC, Ortí G, Huang Y, Sun Y, Baldwin CC, Thompson AW, Arcila D, Betancur-R R, Li CH, Becker L, Bellorae N, Zhao XM, Li XF, Wang M, Fang C, Xie B, Zhou ZC, Huang H, Chen SL, Venkatesh B, Shi Q (2018) Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data. P Natl Acad Sci, 115, 6249-6254.

Jara Z (1988) Some aspects of excretion and osmoregulation of fishes.

Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res.

Kanehisa M, Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res, 28, 27-30.

Kumar S, Stecher G, Suleski M, Hedges SB (2017) TimeTree: A Resource for Timelines, Timetrees, and Divergence Times. Mol Biol Evol, 34, 1812-1819.

Lehtonen H (1998) Winter biology of burbot (Lota lota L.).

Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 25, 1754-1760.

Liu BH, Shi YJ, Yuan JY, Hu XS, Zhang H, Li N, Chen YX, Mu DS, Fan W (2013) Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects.

Liu Q (2020) Chromosome-level genome assembly of burbot (Lota lota) provides insights into the evolutionary adaptations in freshwater. figshare. Dataset. https://doi.org/10.6084/m9.figshare.12927203.v1

Lobo I (2008) Basic Local Alignment Search Tool (BLAST).

Lowe TM, Eddy SR (1997) tRNAscan-SE: A program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res, 25, 955-964.

Maitland PS, Lyle AA (1990) Practical conservation of British fishes: current action on six declining species. J Fish Biol, 37(Suppl A), 319-334.

Mark Chaisson, Glenn Tesler (2012) Mapping single molecule sequencing reads using Basic Local Alignment with Successive Refinement (BLASR): Theory and Application.

Pertea M, Kim D, Pertea GM, Leek JT, Salzberg SL (2016) Transcript-level expression analysis of RNAseq experiments with HISAT, StringTie and Ballgown. Nat Protoc
experimental An Elbe: River the from ) , eoScFuaFoaFennica Flora Fauna Soc Memo 14 o Biol Mol J , 2445-2457. , M Bioinformatics BMC 11 , 1650. , 110 reldZoologiczny Przeglad , 462-467. , ri rpitarXiv preprint ArXiv 215 403-410. , uli cd Res Acids Nucleic ehd nMolecular in Methods , 13 , 238. , 32 , 74 . . Bioinfor- 45-52. , Nucleic , 28 J , Posted on Authorea 8 Oct 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160218269.98008776/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. hoR ipBv ,Vsni ,GlmnI 21)Mcaim fmmrn rnpr ffltsinto folates LTR of (2007) transport sons. W membrane Hao of X, the Mechanisms Zhao in (2011) enzyme ID Co- epithelia. Goldman deubiquitinating RC, across M, Uch37 Conaway and Visentin L, the cells N, Florens of Diop-Bove MP, regulation R, Washburn complex. Zhao of chromatin-remodeling SK, Ino80 Swanson modes the H, Distinct in Takahashi (2008) and Y, JW proteasome Cai Conaway JJ, RE, Jin hen J, L, likelihood. Wortman Song maximum Q, by TT, analysis Yao Zeng and phylogenetic CA, Detection 4: Variant PAML Cuomo Microbial (2007) S, Comprehensive Z Sakthikumar for Yang Tool A, Integrated Abouelliel An Improvement. M, Pilon: Assembly (2014) Priest Genome AM T, Earl DA R, Shea Natale SK, Mazumder T, Young JJ, DM, eukaryotes. Yin Abeel Krylov AI, includes BJ, Wolf version EV, Walker S, updated Vasudevan Koonin AV, An Sverdlov B, database: S, COG Kiryutin Smirnov The AR, BS, (2003) Rao Jacobs AN, and Nikolskaya JD, SL, Jackson fish Mekhedov ND, on Fedorova eutrophication RL, of Tatusov Effects sampling. (1999) random a J on reveals Vuorenmaa based G cod Ta¸sbozan survey M, O, Atlantic a Rask of Nilsen lakes: J, S, sequence Finnish Lien Mannio in genome S, K, fisheries A, The Searle Lappalainen Lagesen S, (2011) Johansen J, O, K L, Tammi Jakobsen Andersen Du Skage R, N, B, T, Reinhardt Stenseth Aken Moum K, system. 
S, B, Malde J, immune Karlsen J, Omholt unique Vogel C, Thorsen I, J, Previti H, Jonassen C, Knight Kuhl Nepal F, T, R, Neufeld M, Gjoen Solbakken Winer P, Espelund J, MJ, Berg K, Paulsen A, M, TB, Tina Evenson Lanzen Rounge R, TF, Edvardsen OF, Gregers J, A, Wetten M, Tooming-Klunderud Malmstrom A, Lappalainen U, Sharma Grimholt JR, S, MH, Jentoft A, Jackson measures. Nederbragt conservation B, CP, and Star burbot Madenjian of empirical An status VL, Worldwide Erie: (2010) Lake Paragamian MD in L.) MA, lota (Lota Stapanian burbot of Recruitment prediction (2010) initio A approach. ab Cook modelling AUGUSTUS: LD, (2006) Witzel B MA, Morgenstern Stapanian S, Waack A, Phy- Hayes transcripts. Large I, alternative of Gunduz of Post-Analysis O, and Keller Analysis M, Phylogenetic Stanke for tool A 8: genome Version logenies. assessing RAxML BUSCO: (2014) (2015) A orthologs. Stamatakis EM single-copy Zdobnov with matu- EV, completeness gonadal Kriventseva annotation P, of and Induction Ioannidis assembly (2016) RM, Waterhouse S lota. FA, Wuertz Lota J, Simao burbot Adriaen in W, temperatures Meeus different P, at (Teleostei). Husmann ration fish B, gadoid Hermelink freshwater FJ, a lota), Schaefer (Lota burbot the Resour from Ecol Microsatellites Mol (2005) absence A the Meyer in M, times divergence Sanetra and principles evolution reveals molecular clock. resolution of molecular kilobase rates a absolute at Inferring of genome r8s: human (2003) the MJ of Sanderson map Omer 3D I, Machol A AL, looping. (2014) Sanborn chromatin EL JT, of Robinson Aiden ID, ES, Bochkov EK, Lander Stamenova AD, NC, Durand MH, Huntley SSP, Rao uli cd Res Acids Nucleic Bioinformatics kc A(07 at cd nFish. in Acids Fatty (2017) ¨ ok¸ce MA , 5 Bioinformatics 390-392. , clFeh Fish Freshw Ecol Cell Nature , , nuRvNutr Rev Annu 30 35 uli cd Res Acids Nucleic , 159 1312-1313. , W265-W268. 
, IDR necetto o h rdcino ullnt T retrotranspo- LTR full-length of prediction the for tool efficient an FINDER: , 477 1665-1680. , lsOne Plos , 19 207-210. , , 19 301-302. , , 31 , 326-337. , 9 , ,177. 34 e112963. , W435-439. , 9 ihBiol Fish J o cell Mol ihre aa Ecol Manag Fisheries , 89 Bioinformatics ihFish Fish M Bioinformatics BMC , 2268-2281. , o ilEvol Biol Mol 31 909-917. , , 11 , 34-56. , 31 , , 24 6 3210-3212. , 173-186. , 1586-1591. , , 4 41. , Posted on Authorea 8 Oct 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160218269.98008776/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. iue4Pyoeei nlssaddvrec iete ftebro ihohrtlotspecies. teleost other with burbot (LTR). the repeats of tandem tree long content time and GC divergence TE, distribution, and DNA gene analysis depth, circle: Phylogenetic read inner 4 long to Figure depth, circle read outer short From genome, burbot. the of of characteristics Genome burbot. 3 the Figure of heatmap Hi-C Genome-wide 2 Figure ( burbot The 1 Figure legend Figure oalota Lota 1000 800 600 400 200 0 0 ) 200 400 10 600 800 1000 3 4 5 6 7 8 9 10 Posted on Authorea 8 Oct 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.160218269.98008776/v1 — This a preprint and has not been peer reviewed. Data may be preliminary. in-freshwater genome-assembly-of-burbot-lota-lota-provides-insights-into-the-evolutionary-adaptations- tables.pdf file Hosted vial at available https://authorea.com/users/365476/articles/485566-chromosome-level- 11 F E D C B A