Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. oetal epnil o hsaatto,tewoegnm euneo h pce nei otelake the to endemic the mutations of molecular sequence of presence genome the whole investigate To the lake. adaptation, sequenced. the this was in thrive, for with but alkaline, responsible survive, is only potentially , not North that Mongolia, species Inner fish in located China Nur, Dali Henan, Lake Xinxiang, Road, [email protected] Dong or She [email protected] Jian 46 E-mail: University, Normal Henan Fisheries, of College Culti- Aquatic China for Henan, * Province 453007, Henan Xinxiang University, of Normal Center Research Henan Technology vation, Engineering Fisheries, of Nie College Guoxing Wang, Xianfeng Dong, Jing Yan, environment alkaline Zhou Chuanjiang extremely in survival its into sights Liu Ruyao Zhou Chuanjiang in survival environment alkaline its extremely into insights provides Cobitidae) dalaica (: Triplophysa of genome chromosome-level The aac eoe eune nti td,ntol isi h nlsso laieaatto,btmyas siti revealing of in assist genome also chromosome-level may but The adaptation, alkaline T. of high-quality analyses The Triplophysa. the in ago. divergent population 898 years aids highly Dali million only alkaline of the the not 1 of with total study, that mysteries approximately associated a this the suggested populations, potentially in models, River analyses them sequenced Hai gene Demographic of genome, freshwater siluroides several dalaica endemic SLC4A4. T. with from cotransporter, selection, and diverged have positive bicarbonate tibetana, might to sodium T. could subjected as 98.62% dalaica, being such which, T. within likely comprised adaptation, predicted, of as Repeats were identified comparisons were genes recovered. were sequences protein-coding Through being genome 23,925 genes also whole of annotated. chromosomes total all A Nearly functionally most for genome. Mb. be whole 9.27 telomeres the of data, with of N50 Gb chromosomes, 35.01% contig this 106 25 approximately De a and for onto with Gb respectively. technology, responsible 126.5 oriented Mb, Hi-C of 607.91 potentially and and total totalled sequencing mutations anchored A long-read genome molecular that using sequenced. a generated of was species generated were lake assembly fish presence genome, the novo three estimated the to the the endemic investigate of of species To 200X the one nearly of covering dalaica lake. sequence Triplophysa the genome with whole in alkaline, the is thrive, adaptation, China, but North survive, Mongolia, only Inner not in located Nur, Dali Lake Abstract 2020 16, July 1 ea omlUiest olg fFisheries of College University Normal Henan orsodn authors: Corresponding 1 umn Yan Xuemeng , * oH,YntoTn,CagigYn,Wne a iWn,RyoLu Xuemeng Liu, Ruyao Wang, Xi Ma, Wenwen Yang, Changxing Tang, Yongtao Hu, Bo , 1 oHu Bo , 1 1 oga Tang YongTao , igDong Jing , rpohs dalaica Triplophysa * 1 inegWang Xianfeng , Abstract 1 hnxn Yang Changxing , 1 Cpiioms oiia)poie in- provides Cobitidae) (Cypriniformes: 1 n uxn Nie Guoxing and , rpohs dalaica Triplophysa 1 ewnMa Wenwen , 1 1 n ftethree the of one iWang Xi , 1 , Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. oioda(.Yn ta. 09.Hwvr h oeua ehnssudryn dpaint ihsalt high to adaptation underlying in mechanisms fishes molecular of the adaptation al., However, and of et diversification, populations 2019). Yang classification, in al., (X. of environments et analysis Plateau Yang comparative Tibetan (L. for the stage Cobitoidea a in set fishes and of biology, endemic assembly of genome local The in mechanisms investigating speciation 2019). for adaptation environments, of and environmental ideal sequence adaptation freshwater is the genome of inhabit whole which basis into the genetic altitudes, only techniques example, the high sequencing For elucidating not high-throughput at for fish. of fish evolution or opportunities the new lakes, These years, provided alkaline recent has In and 2014). speciation. saline ecological Pauly, in and adaptation & survive also (Froese can Wu, FishBase Yang, but (Wang, in technology recorded sequencing RNA as 2015). using He, resolved & be could Song, of the hypoxia response screening, high-altitude hypoxia and the to inheritance including in adaptation gene involved genes, alkali-resistant likely candidate for as Thirteen Therefore, reported shortened. been age. significantly of be years can two cycle fish, research at acclimatized generally high-altitude age, acid-base early or an homoeostasis, saline-tolerant ion other detected were in with excretion involved Compared urea genes 2017). and harboring al., elimination, sweeps et species oxygen selective Xu economic reactive (J. under their response, regions, to protein genomic Due unfolded regulation, of China. set in a significantly , intensified has development water investigatingvalue, salt-alkali for years, candidates optimal recent are In they as resistance, fish. stress such in and mechanisms species, tolerance long-term resistance basicity special alkaline the high of to in As number adapt resulted to small 2013). has ability a This al., only et with Mongolia, salinized. Xu Inner habitat, and of this concentrated weather Si, middle in gradually dry (Xiao, the auratus and fish been present) sandstorms of during to has perennial selection BP Lake dramatically the directional years Dali both fluctuated calibrated to the level (3,450 Due of Holocene water 2008). water late Lomtatidze, Mongolian Its & the Inner Itoh, during eastern Zhai, expanded Range. shrank have the constantly to artesian Khingan on believed and in Greater basin is Holocene It endorheic Mongolia, the 9.6). an Inner (pH from in mmol/L and meltwater 53.57 located to River, km is rise Yellow 1,600 lake can alkalinity the to This its where of Nur. China, tributaries North Dali Plateau, Lake and as trunk such the waters, in predominately China, genus divergent dalaica Triplophysa highly the of mysteries the revealing Introduction in assist also may but of adaptation, several Hai with freshwater high-quality endemic selection, ago. The from years positive diverged million have to 1 cotransporter, might subjected approximately population bicarbonate being Dali populations, sodium the River likely as that such as 98.62% suggested adaptation, which, identified analyses alkaline within were Demographic with predicted, genes associated were 898 potentially genes of of 25 them protein-coding onto comparisons total 23,925 oriented Through a and of models, anchored annotated. total were approximately functionally A sequences comprised be genome. Repeats genome could using whole whole recovered. generated being all the were also Nearly of genome, chromosomes 35.01% Mb. most estimated for 9.27 the telomeres of 607.91 of with totalled N50 200X chromosomes, genome contig a nearly generated a covering assembly novo with data, De Mb, Gb respectively. technology, 106 Hi-C and and sequencing Gb long-read 126.5 of total A .auratus C. Linnaeus, 2 uigteeryHlcn 1,0–,0 airtdyasB) u oams nu fglacial of influx mass a to due BP), years calibrated (11,500–7,600 Holocene early the during and ecsu waleckii .dalaica T. samme ftefml oiia:Cpiioms ti ieydsrbtdi northern in distributed widely is It Cypriniformes. Cobitidae: family the of member a is Triplophysa .dalaica T. .waleckii L. eoe eune nti td,ntol isi h nlsso alkaline of analyses the in aids only not study, this in sequenced genome, .dalaica T. .siluroides T. a hiei h ihyakln aeDl u,tu eosrtn their demonstrating thus Nur, Dali Lake alkaline highly the in thrive can s r togydvre ru ffihecmasn 3 ai species, valid 137 encompassing fish of group diverged strongly a are fish aeawy trce rae teto than attention greater attracted always have and , eanrltvl oryunderstood. poorly relatively remain .dalaica T. rvddgnmcrsucst etrudrtn iea loach Tibetan understand better to resources genomic provided .tibetana T. 2 bet dp otehgl aiie niomn (J. environment salinized highly the to adapt to able , .dalaica T. .dalaica T. adasldfudto o ute investigation further for foundation solid a laid ugsigta eei ehnssof mechanisms genetic that suggesting , , .tibetana T. .dalaica T. .dalaica T. ηιφ-1αΒ ec eulmtrt at maturity sexual reach and , Triplophysa .siluroides T. and In . ηιφ-2αΑ, Triplophysa .waleckii L. SLC4A4 Carassius . have gene . Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. o iClbaycntuto,wt h irr eune nteIlmn ie e ltom sn the using platform, Ten X HiSeq Illumina the aforementioned on the sequenced from genome, library tissue, entire the the Muscle with within 2009). construction, position al., library spatial et Hi-C in for (Lieberman-Aiden relationships, studied bioinfor- DNA with be chromosome combined to technology, whole capturing sequencing By allows high-throughput assemblies. analyses, chromatin, genome chromosome-level matic in construct patterns to interaction applied DNA widely all to been has 2009), technique technology Durbin, Hi-C all Hi-C & The 2014). Next, using Li al., 2012). genome et (H. (htt- (Walker Tesler, BWA-mem the iteratively & using of v0.3.0 Pilon PacBio contigs (Chaisson Scaffolding of the FALCON corrected pipeline 3-rounds of the for BLASR using results to genome the mapping the mapped reads following the polish were on when reads long based genome Illumina Arrow, assembled PacBio filtered using the corrected of was to assembly reads basis draft length The a the step. fol- with assembly performed pipeline, on initially FALCON were the performed pre-assembly lowing and was correction Error following assembly ps://github.com/PacificBiosciences/falcon). the novo using Genome estimated De the size of genome correction and and Assembly Kdepth, as 2017). selected al., randomly peak et were (Vurture main Kingsford, 2018), data & the (Marcais Gu, clean 2.2.10 of & version Gb formula Jellyfish depth Chen, 37.5 using estimate estimated with Zhou, totaling to was 17-mers 2011), (Chen, data genome, effective of fastp sequencing estimated distribution genome using depth the The whole cleaned of the were 50X from study About selected this settings. in default generated under reads paired-end size 2500 genome All HiSeq of Illumina the evaluation on and de sequenced analysis RNA- for and mode. one K-mer constructed used PE tissue, was bp each individual bp, For 150 350 aforementioned Bio). the (Takara of the using kit size platform, of purification insert ovary, RNA an total with and library, a including spleen, sequencing using tissues, eight gill, extracted from were reads. muscle, RNA using sequencing, total paired-end heart, novo sequencing prediction, bp gene and brain, 150 for construction liver, possible of as library intestine, millions evidence Illumina of expression many both generation as to the obtain To in subject resulting also platform; was 2500 1B), HiSeq com- Figure For the platform. of was sequencing 2 DNA II Extracted Biosciences, Sequel site PacBio (Pacific mode. sampling the protocols PE of manufacturing cell freshwater bp PacBio one another 150 using to CA, parison, sequenced the according Diego, then using libraries, were San kb system, Libraries (Illumina, USA). 20 2500 CA, procedures two HiSeq construct standard an a to Illumina with extracted on used to library, sequenced was also according paired-end was DNA constructed A library was genomic The method. bp, Total USA). extraction 400 MS-222. phenol/chloroform of standard tricaine size the anesthetic, insert using the tissue with muscle treatment from following dissected ly 116 female A sequencing and collection Sample discussed. also were METHODS with region AND compared freshwater MATERIALS was consisting the set set and gene gene final Nur final the Dali a adaptation, Lake and alkaline alkaline resul- Mb, of to the 9.27 sets related of of genes gene size available size possible The N50 publicly detect To contig technology. a Hi-C genes. with and 23,925 Mb, sequencing of 607.91 long-read approximately PacBio was both genome using of tant Nur, genome Dali chromosome-level Lake the of present waters we study, this In ° 92”,smln ie1o iue1) n ujce oDAsqecn.Tefihwsimmediate- was fish The sequencing. DNA to subjected and 1B), Figure of 1 site sampling 39’24”E, Genome .dalaica T. Size Fgr A a olce rmLk aiNr nInrMnoi (43 Mongolia Inner in Nur, Dali Lake from collected was 1A) (Figure = Knum/Kdept .dalaica, T. .siluroides T. rmteHiRvri ea rvne(35 province Henan in River Hai the from eoehtrzgst a siae sn eoecp 2.0 GenomeScope using estimated was heterozygosity Genome . and uo f2 badalength a and kb 28 of cutoff .tibetana T. 3 .dalaica, T. ffciepplto iednmc ewe the between dynamics size population Effective . .dalaica T. hc naisteeteeyalkaline extremely the inhabits which eoesz,uigkmranalysis. k-mer using size, genome cutoff ° ro 7k hsnfrthe for chosen kb 27 of pr 42.”,113 54’28.0”N, .dalaica, T. ° 51’26.0”E, ° a used was 22’43”N, Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. ult ee,fwrta 0ecddaioaisado abrn rmtr emnto rframeshifts, 2018 or Low- (release termination of premature strategies. to harboring annotations used transcriptome-based set. Functional and/or was gene and acids the 2008) homology-, amino from al., novo, removed encoded et also 50 de resul- (Haas were than with using v1.1.1 fewer regions; (EVM) models exon genes, EVidenceModeler gene quality possible set, predicted of gene sites all consensus fed acceptor integrate a then and (https://github.com/TransDecoder/TransDecoder/wiki). generate was donor TransDecoder Cufflinks-set to the using Finally, The identify predicted to regions 2011). 2003) Pachter, coding (Cufflinks- al., tant & models et gene Rinn, into (Haas Donaghey, assembled PASA Trapnell, subsequently into and (Roberts, 2012), mapped Cufflinks al., were using et reads using (Trapnell RNA-seq set) TopHat predicted prediction, using regions transcriptome-based genome coding For the possible 2000). to with Durbin, regions; & aligned (Birney best v2.2.0 the GeneWise against compared sequences protein the , .Gn oeso hs pce eedwlae rmteNB aaaeadte eeue oidentify to used 2003) were Roos, then & Stoeckert, and Li, Database (L. NCBI pipeline the v2.0.9 settings. from OrthoMCL default downloaded the under following were families species gene these orthologous of potential models ( Gene Cobitidae in ). two ( included These in Cypriniformes. four in fish relative other of six position (5.21-60.0), evolutionary database the ascertain InterPro estimation To the time on divergence hits. based and best genes analysis the predicted Phylogenetic to annotate according to InterProScan assigned Next, used terms sequence. was GO hit with 2005) best the al., from extracted et then were (Quevillon pathways KEGG and Descriptions 2009). proteome the (Majoros, prediction, GlimmerHMM homology-based For Steinkamp, 2007), (Stanke, used. including Guigo, was species, v3.3 & 2004), six Augustus the (Korf, Parra, of including SNAP each (Blanco, and software, of 2004), v1.4.4 of protein Salzberg, variety GeneID possible & a 2004), predict Pertea, prediction, to Morgenstern, novo used & were de Waack, strategies For applied. server transcriptome-based settings candidate v1.2 and genes. default RNAmmer with with homology- coding respectively, the with rRNAs, 2018), and and together 1997) tRNAs al., novo, Eddy, predict De to & et used (Lowe (Kalvari also server were search 2007) 13.0) (v1.3.1) al., (release et tRNAscan-SE (Lagesen The miRNA, database 2009). rRNA, Rfam al., of the et models the in covariance in the residing deposited regions scanning genes by snRNA performed (Tarailo-Graovacand was RepeatMasker annotation using identified ncRNA were Homology-based TEs (Tarailo-Graovac the applied all LTR settings Next, and 2009). default 2007). v.1.0.11 Wang, with Chen, & novo, RepeatModeler & Xu de 2015). Z. Kohany, elements, 2009; & transposable Chen, possible & Kojima, Chen, the identify (Bao, in & to query (Tarailo-Graovac present used the (TEs) also RepeatMasker as elements 1999). 22.11) transposable known (v. (Benson, identify base was se- 4.07b to 2003) used repetitive Finder Graner, was & moderately Repeats 2009) Varshney, the Tandem (SSRs), Michalek, in (Thiel, using repeats SSRs tool identified sequence for MISA search simple The to of sequences. used repetitive comprised highly genome and the quences, in Ns. sequences 100 Repeat using in filled genome were implemented ori- contigs the method and adjacent of clustering anchor between Annotations hierarchical Gaps to agglomerative polished 2013). used the an al., and to et using constructed mapped (Burton was chromosomes, alignments then Lachesis map low-quality the and removing contact to 2018), After chromosomal contigs 2012). al., inter/intra entate Salzberg, et an & (Chen reads, (Langmead fastp duplicate 2.2.5) using and (version cleaned 2 were Bowtie reads using Raw genome mode. PE bp 150 rpohs siluroides Triplophysa 0,adKG rlae8.)dtbss sn ls iha -au f1- Cmcoe al., et (Camacho 1e-5 of E-value an with Blast using databases, 84.0) (release KEGG and 10), nbrlu grahami Anabarilius .dalaica T. and , .dalaica T. ihpou maculatus, Xiphophorus .dalaica T. eoe rlmnr eetduigBAT Evle[]1-)(Camacho 1e-5) [?] (E-value BLASTN using detected preliminary genome, synxmexicanus Astyanax rdce ee eepromdb erhn h r O,Uniprot KOG, nr, the searching by performed were genes predicted , .dalaica T. .rerio D. eoe ihdfutprmtr ple.Tne eet were repeats Tandem applied. parameters default with genome, , ainlatranslucida Danionella eemdl fisgnm eecmae ihta of that with compared were genome its of models gene , 4 a apdot the onto mapped was , ai rerio Danio , cauu punctatus Ictalurus .siluroides T. and , .dalaica T. .dalaica T. eaorm amblycephala Megalobrama and eoe n ahof each and genome, eoe ihRep- with genome, , .tibetana T. aiuurubripes Takifugu ne were finder ,and ), Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. aaue o h env sebyo the of assembly novo be 0.375%. de approximately to was the estimated rate was for heterozygosity size species used 17-mer genome estimated other Data to the the of Moreover, S2), subject sizes (Table respectively. then genome 53 2019), estimated were reported al., of the previously et reads depth of the a These to 50-fold at similar approximately located siluroides data. was covering was which genome peak reads, genome Mb, NGS main sample 631 short generated the estimate Gb To the As 37.5 S1. from analysis. Table of selected in total were summarized a are genome, heterozygosity, report this and in size generated data next-generation All the of of He, (EPS) assembly size & novo population Long, effective De Zou, the Chen, infer (Fu, to -r5 DISCUSSION BCFtools 4.13e-9 used -t15 AND of by -N25 were RESULTS rate time, follows: followed generation as mutation 2009), set 2-year cyprinid parameters This a al., reported with and genome. et PSMC, previously 2009). heterozygous 2010), using A the Durbin, Li history of & demographic (H. “4+25*2+4+6”. sequence the Li consensus -p SAMtools were deduce a (H. to assembly, reads generate applied used short to the then settings used all was and were default 2018), 2017), results with McCarthy, al., & BWA-mem, mapping et (Danecek using (Chen both assembly fastp on novo using de Based filtering the and to cleaning mapped after Briefly, demographic technology, compared. next-generation quently using sequencing, two genome for whole for histories generated reads short on Based group, history ortholog Demographic each for ‘fix ‘fix twice parameters models. hypothesis, performed hypothesis, two were alternative null the analyses the the convergence, between for For check To while used. final respectively. 1, were 1.5, with to and 2) set 0 = to each NSsite set were and ‘omega’ estimate 2 to and seqCutoff = used codon, omega’ was (model 2007), = (Yang, models seqType v4.9e Ashkenazy, branch-site PAML follows: in (Sela, this, as included GUIDANCE2 set ( codeml, Next, using parameters ratios model with muscle. performed dN/dS gene = groups, were msaProgram on ortholog and alignments all based 0.3, sequence for strategy, = 2015) hits Multiple Pupko, acclimation, best & alkaline used. reciprocal Katoh, to was A related 1e-05, identified. be were [?] to groups public likely gene three those between orthologous comparison specifically of set genes, new selected a positively detect precisely Demuth, de- To Cristianini, were Bie, genomes (De these (CAFE) in v4.0.1 2006). residing Evolution, Family Hahn, families gene & gene of of Analysis contraction Computational and using tected expansion possible for the 110, genes OrthoMCL, estimated = candidate using times BDparas dalaica divergence 7, T. on = selected Based model positively 1.73, and [?] dynamics RootAge family 2, Gene points. = calibration clock as follows: set as set (http://www.timetree.org/) kappa with assess were 2007) Timetree (Yang, To parameters from v4.9e MCMCTree package downloaded The PAML model. in times for included GTRGAMMA MCMCTree, estimations, divergence RAxML using the time using known implemented into Divergence using were filtered fed performed. fishes, analysis were regions and relative replicates and phylogenetic concatenated variable bootstrap then perform 100 highly were robustness, to with Alignments topological 2014), sequences; 2007). (Stamatakis, nucleotide subsequent Castresana, v8.2.11 and 2004), into & (Edgar, (Talavera analysis back MUSCLE v0.91b phylogenetic using transformed Gblocks molecular aligned were were both sequences they perform protein deduced where to Briefly, used estimations. were time divergence genes orthologous copy Single am 2 alpha 62, = gamma and p vle bandtruhcmaio fbt h h2dsrbto n wc h R values, LRT the twice and distribution chi2 the both of comparison through obtained -values .tibetana, T. ω ,t eueslcinpesrslaigt h urn vlto of evolution current the to leading pressures selection deduce to ), .dalaica T. hc eeetmtdt e680 b(.Yn ta. 09 n 5 b(.Yang (X. Mb 652 and 2019) al., et Yang (L. Mb 638.07 be to estimated were which am 1 rgene 11, = gamma eeddcdfloigtePM ieie(.L ubn 01,adsubse- and 2011), Durbin, & Li (H. pipeline PSMC the following deduced were .dalaica T. Triplophysa genome .dalaica T. pce eeae sn LS 223,wt nEvlecutoff E-value an with v2.2.30, BLAST using generated species .dalaica T. am 31,adsigma2 and 23.18, = gamma n eaieseis swl sgn aiisidentified families gene as well as species, relative and 5 eoewr eeae sn aBoSqe II. Sequel PacBio using generated were genome Triplophysa am 14.5. = gamma mg’ad‘mg’were ‘omega’ and omega’ .dalaica T. pce,sc as such species, .dalaica T. oachieve To . .dalaica T. . T. - Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. sdtbs umre,aesoni al 5 nttl 354 86%o h oa 395gns ol be could genes, 23,925 total the of 98.62% 23,594, total, In functions. S5. potential Table genome. having well in the as as shown assignments, in annotated term are residing GO summaries, 1,207snRNAs and of database pathways, and KEGG Distribution as rRNAs, descriptions, 684 function respectively. tRNAs, including annotation annotations, 11,504 9.9, ncRNA CDS Functional miRNAs, homology-based average and the 1,664 contrary, bp length, of the total On 1,715.77 gene a robustness. average bp, showed annotation with 12,128.58 and based, conservation genes, gene, gene homologue among both 23,925 initio, per similar ab was of exons, of parameters consisted of results these set the number integrating This and by length, obtained (8.9%) predictions. was LRT genes based by protein-coding followed RNA-seq of (16.01%), set proportion final greatest The the and up the homology made of (4.24%). Combined 35.01% transposons LINEs MISA. for DNA and using accounted which, obtained annota- sequences of were function repeat 2), SSRs that gene (Table showed 197,396 and results of model, based total gene novo a repeat, de annotations, include repeat genome For assembled newly tions. a of the annotations indicate Generally, all collinearity, perfect the nearly annotations numbers, present the Genome gap the minimal and of the chromosomes, quality rate, most of anchoring level for high high sequences between the collinearity together, telomeric gene taken of technology, Thus presence Hi-C 2). (Table on (Figure chromosomes based collinearity eight results, perfect for scaffolding end the single of sequence one accuracy telomeric in- dalaica at of the 10 and stretches assess than chromosomes, To long less 14 captured had 1). of assembly chromosome interaction ends our each The Moreover, Notably, all de- at Mb. 5. 1. (TAACCC/TTAGGG)n were 23.6 chromosome relationships Figure of length, except in interaction length genome gaps, shown N50 on whole troduced are scaffold 200X Based chromosome the a each nearly in of library. resulting along than covering 96.3% chromosomes, relationships Hi-C data, approximately chromosome, 25 the covering onto of the orientated contigs, for Gb and on 141 anchored generated 106 technology, locations were of Hi-C size, closer total using tected genome a between technology, estimated frequently sequencing the more next-generation of occur on Based interactions that farther. thought is and It robust was assembly the genome of the found that Scaffolding BUSCOs suggest complete results the these of together, Taken 93.7% complete. approximately nearly single- S4). 4,584 (Table with the assembly , of the coverage all in the among estimate (Simao, conserved to (BUSCO) used from genes Orthologs were ranged copy Single-Copy 2015) reads Universal Zdobnov, aligned Benchmarking & of Kriventseva, Moreover, percentage Ioannidis, the Waterhouse, The S3). (Table to evaluate applied. To 91.08% assess mapped parameters to contamination. default were to 84.78% with bacterial reads 2015), conducted no RNA-seq Salzberg, suggesting was detected, & assembly; all was Langmead, analysis the content assembly, GC for GC the of 38.77% of distribution of coverage unimodal content scrutinized. and a GC throughput thoroughly result, average a an thus, was As with available; genome sequencing. technology assembled before contamination II the potential sequel of platforms. newest quality previous the from The improved used of significantly we length of were N50 because lengths length contig N50 N50 be the Contig subread may Thus, Pilon. This 2019). and contig al., Arrow reported. a using et assembly Yang with rounds genome X. Mb polishing Preliminary 2019; several 607.91 respectively. after approximately reported kb, assembly, obtained formerly 32.4 Mb, genome the and with 9.27 ultimate kb of remained, with 19.7 size subreads v0.3, of N50 FALCON Gb, length in 126.5 N50 totaling performed and was 200X, length nearly read removal, average read an low-quality and adaptor Following n t w eaiespecies, relative two its and .dalaica T. .siluroides T. genome .dalaica T. .rerio D. and .dalaica T. .tibetana T. and eoeassembly. genome n h pce sdfrantto Fgr 1,suggesting S1), (Figure annotation for used species the and .tibetana T. ee28 bad31M,rsetvl L age al., et Yang (L. respectively Mb, 3.1 and Mb 2.87 were 6 .dalaica T. eecmae,wt eut niaignearly indicating results with compared, were , .dalaica T. a uhlne hnta formerly that than longer much was eoeuigHST (Kim, HISAT2 using genome .dalaica T. eoeassembly genome T. Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. yaiso eefmle eiigi h eoe h nlssdtce 3 xaddgn aiis swell as families, gene expanded 431 detect the detected to in analysis used families, The was gene CAFE genome. contracted clustering, the species. family 491 in the gene residing as of OrthoMCL families adaptation and gene environmental estimation of denote time dynamics may divergence families both gene on of Based contraction and expansion genome Significant the in dynamics family Gene of that showed dliaCr242 r napstv eeto lse TbeS) hc a lyapvtlrl nthe in role pivotal a play may which (Tdalaica.Chr18.955, S7), genes (Table ANP Two cluster in 2009). selection sodium al., regulate os- positive et to in (Cao a involved response confirmed eels in system been in has endocrine are hormone (ANP) key regulating Tdalaica.Chr02.422) peptide a sodium natriuretic is atrial primary while system the vertebrates, (NP) as in peptide homoeostasis (Tdalaica.Chr17.484: natriuretic ion and genes The ion moregulation CA and S7). two (Table gene that cluster CA found selection We the 1994). between Perry, Tdalaica.Chr24.506: relationship & , important Laurent, an Wood, (Goss, indicating regulation stress; saline-alkali under pre- up-regulated in enzyme investigation metal similar CO zinc-containing a as regulation, conducted a ion such is regulation, regulation, permeability CA including acid-base HCO processes, 2004). and osmotic export Boron, biological and cotransport various & in Fulton, environments involved bicarbonate alkali Romero, dominantly 2007; sodium high Schwartz, tolerate and & fish (Purkerson (CA) which anhydrase through mechanism carbonic portant both that (Na (KEGG: reported carriers adhesion have S8). focal studies (Table and enriched Previous per- significantly map04370), paracellular was (KEGG: to which (VEGF) ways of (KEGG: factor of each (CAMs) number map04510), growth molecules a endothelial adhesion in cell balance. vascular acid-base map04512), participate map04514), and (KEGG: genes homeostasis ECM-receptor selected ion including pathways, positively with both meability of that Nur, selected maintenance suggest the positively piezo-type Dali analyses to with Two of KEGG Lake key associated mechanism adaptation to another S8), conditions. the is endemic Table stress explain transportation map04020, fishes partially salinity (KEGG: the may in pathway and genes, players channels, of signaling ion pivotal calcium adaptation with enriched be associated the the Tdalaica.Chr08.21) to for Tdalaica.Chr01.107; were thought (FAM38: force selection, genes aquaporins mechanosensitive driving positive and to key channels subject a likely ion is genes, candidate change 898 Salinity of set a two result, S7). other environmental (Table a denote identified the As also selection identified. with positive were comparing to selection subject When genes dynamics, family adaptation. gene significant to addition genome In the in “Parathyroid genes and contracted selected synapse”, S6). Positively for “Dopaminergic (Table while pathway”, action” signaling secretion”, and “Estrogen “Bile secretion, were and synthesis, three pathway”, hormone top signaling the “Relaxin genes, junction”, family “Gap were genes family these of that tree showed phylogenetic It a 3). method, (Figure likelihood confidence maximum with high the with and reconstructed genes was that copy species showed single clustering these family using Gene constructed species. in related genes from 23,925 to genes of ortholog copy total single a on based deduced were of position phylogenetic estimation The time divergence and analysis Phylogenetic .dalaica T. Triplophysa .siluroides T. .dalaica T. + n ,2 eesnl oyotoosars l eae pce.Bsdo ohtesupermatrix the both on Based species. related all across orthologs copy single were 4,126 and HCO / s ieyt ea es 46Ma(iue4). (Figure Mya 14.6 least at be to likely fish dnie stebslseiswti the within species basal the as identified 3 - and CAXVI ornpre,Slt Carrier4A4, Solute cotransporter, .dalaica T. .tibetana T. .dalaica, T. .dalaica. T. ,adone and ), nohnhsmykiss, Oncorhynchus ol edvddit 921gn aiis fwih16wr unique were 116 which of families, gene 19,271 into divided be could 2 eyrto:CO rehydration: iegdapoiaey83 ilo er g Ma,wt h history the with (Mya), ago years million 8.31 approximately diverged .dalaica T. ihnteodrCpiioms n t siae ieo divergence of time estimated its and Cypriniformes, order the within SLC4A4 .dalaica T. eoe h o he nihdptwy o expanded for pathways enriched three top The genome. ee(Tdalaica.Chr04.806: gene Triplophysa 7 2 +H SLC4A4 nigta xrsinlvl fteC eewere gene CA the of levels expression that finding Triplophysa 2 oeteeakln niomns Paracellular environments. alkaline extreme to H = O pce,gnswt vdne fpositive of evidences with genes species, a eit HCO mediate can ) 2 CO pce.Dvrec ieestimation time Divergence species. 3 H = .dalaica T. + SLC4A4 +HCO scoe to closer is 3 3 – eei positive a in were ) - rvosstudy previous A . rnpr;a im- an transport; 3 - rmtebody the from .tibetana, T. CAV Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. hswr a nnilyspotdb h olwn udn:teNtoa aua cec Foundation Science Natural National the funding: following the by supported financially was BioProject: work under This Archive Read Sequence NCBI INFORMATION the FUNDING into deposited been have PRJNA624716. data sequencing DNA The authors. STATEMENT AVAILABILITY all DATA M.W.W. from and and input assembly with W.X. the manuscript interpreted H.B., the and D.J., ORCID wrote analyzed N.G.X. Y.X.M., W.X.F and L.R.Y., and H.B. Y.C.X. Z.C.J., research. T.Y.T., annotations. the Z.C.J., designed experiments. and the conceived performed N.G.X. with and population, Z.C.J. River Hai the factors. from of environmental related CONTRIBUTIONS Mya population different likely AUTHOR 1 from the genes approximately resulting that diverged pivotal likely have suggested of fluctuations might analyses number EPS Nur, Demographic different a Dali identified Lake species. analyses to the selection endemic short of Positive adaptation RNA-Seq alkaline Mya. the to 8 of than mapping more occurring including obtained assessments, were anchored of of dalaica telomeres contigs comparisons number Importantly, all model A with gene nearly level. collinearity Through with and chromosome chromosomes. Mb, appraisal, the 9 BUSCO the at reads, exceeded of was assembly majority assembly the the of for length thus, N50 chromosomes; contig to The sequence genome China. whole individual a northeast obtained an we technology, Hi-C of and approximately assembly sequencing proportionally. long-read concluded both increased of have EPS application and Through may shrink, expansion to lake begun Lake of the Conclusion process point which This at Mya, EPS. ago, 1 years decreased approximately 10,000 in developing begun resulting the enlargement, have According for lake possibly may water level. shallow with Nur low 400,000 of extremely Dali along plenty nearly an onset Lake providing However, reaching to (1988), the area Mya, declined whole Gao to gradually 1 the of corresponds has approximately with research EPS This since geological the expanding However, then, the time. slowly Since 2009). to short been Mya. al., a has 0.4 et in approximately population (Clark are constant expanded at the ago almost EPSs has years of is two freshwater, EPS 19,000-20,000 EPS the to approximately the the that endemic deglaciation, shows then, population, Hemisphere River It Since Hai Northern the 5. of ago. of Figure EPS years in The 20,000 shown Mya. until is 1 data than more genomic by the similar from deduced history of S7). involvement (Table Demographic the acclimation salinity of factors and evidence growth homeostasis further history the ion fibroblast providing Demographic IGFI of in selection, with one regulation involved positive found 2009), the to were also in We al., subject (IGFI) hormones genes, regulation. et these I FGFS permeability (Cao factor three paracellular of and acclimation growth modulators gene, salinity insulin-like important and and S8). as (Table behaving (GH) homeostasis hypoxia (Haase, (FGFS) hormone altitude ion leaven high growth of sugar to study, regulation response genes, and previous the selected transport in a positively role glucose in hypoxicIn ex- pivotal enriched as the the a significantly play controlling such was in may by map04066) factor processes, it oxygen (KEGG: activation suggesting physiological pathway low transcriptional signaling to to HIF-1 key tolerance related 2013). a body’s genes as the of enhances acts pression which (HIF-1a) mechanism, alpha regulation 1 response factor inducible Hypoxia a lsrto closer was hajagZo https://orcid.org/0000-0002-6433-737X Zhou Chuanjiang .tibetana T. .dalaica T. than .dalaica T. nei oteakln aeDl u,lctdi ne Mongolia, Inner in located Nur, Dali Lake alkaline the to endemic .siluroides T. .dalaica T. .rerio D. ihohrfihblnigt yrnfre,w on that found we Cypriniformes, to belonging fish other with uvvlaeswr rdal rwe yde waters, deep by drowned gradually were areas survival 8 ihdvrec between divergence with , and .tibetana, T. .dalaica T. hs t P rdal expanded. gradually EPS its thus, ; l ugs ihasml accuracy. assembly high suggest all .dalaica T. and .dalaica T. .tibetana T. T. , Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. a,W,Kjm,K . oay .(05.RpaeUdt,adtbs frpttv lmnsin elements repetitive of database a Update, Repbase (2015). O. Kohany, & K., K. genomes. eukaryotic Kojima, W., Bao, Reference data. genomic using species. freshwater), relative (Hai, River Hai S1 the Figure and water) alkaline (Dali, Nur Dali bootstrap 100 using 5 supported Figure fully were nodes all and method, likelihood replicates. maximum the using recovered was iue4 Figure tibetana 3 (low). white Figure to (high) red from density contact 2 individuals Figure two the of sites Sampling B the of Image A iue1 Figure LEGENDS FIGURE S8 Table S7 Table genome. S6 Table S5 Table S4 Table S3 Table S2 Table technology. Hiseq S1 Table 2 Table 1 Table Computing Performance kindly High who LEGENDS The Wuhan, TABLE by Ltd, provided University. Co, also Normal was Biotechnology support Henan Diggers Study of Zouming, Center Figures. Dr. drawing and with Xupan help from provided help the appreciate study.We this supported also in (2019GGJS063), in Tech- China teachers ACKNOWLEDGEMENTS project 2019, excellent and the young in Science for province of supported program Henan Department Team Training of Province, Innovation and Universities Henan Technology 182102110007), from 182102110237, and Projects (182102110046, Science nology (CXTD2016043). The China province, 31872199). Henan (31401964, China of . umr frpa noain ae ndffrn strategies. different on based annotations repeat of Summary the of Summary neato eainhp ewe hoooergosars h eoe h oo a illuminates bar color The genome. the across regions chromosome between relationships Interaction hlgntcrltosaddvrec ie among times divergence and relations Phylogenetic apeifrainof information Sample EGptwyercmn nlsso ee aiissbett xaso n otato nthe in contraction and expansion to subject families genes of analyses enrichment pathway KEGG EGptwyercmn nlsso ee ujc opstv eeto eiigi h genome. the in residing selection positive to subject genes of analyses enrichment pathway KEGG umr fgnssbett oiieslcinrsdn ntegenome. the in residing selection positive to subject genes of Summary genome. the for predicted models gene all for annotations functional assembly. of genome assembly.Summary novo genome de novo the de of the assessments to BUSCO rate mapping transcriptome the statistics. of 17-mer Summary on based size genome of Estimation SCpo eitn h eorpi itr fthe of history demographic the depicting plot PSMC eemdlcliert between collinearity model Gene umr fcenrasgnrtdfrtewoegnm n rncitm,bsdo Illumina on based transcriptome, and genome whole the for generated reads clean of Summary itiuino eelnt,CS n h ubr feosaditosbetween introns and exons of numbers the and CDS, length, gene of Distribution .dalaica T. o N,6 DNA, Mob .dalaica T. specimen .dalaica T. 1 doi:10.1186/s13100-015-0041-9 11. , eoeassembly. genome niiul sdi hsstudy. this in used individuals .dalaica T. 9 n h te w eaiefishes, relative two other the and .dalaica T. .dalaica T. n eaieseis h phylogeny The species. relative and niiul olce rmLake from collected individuals .rerio D. .dalaica T. and and T. Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. as .J,Slbr,S . h,W,Pre,M,Aln .E,Ovs . ota,J .(2008). R. J. Wortman, . factors. hypoxia-inducible . . by erythropoiesis . Assemble of . J., Regulation to Orvis, Program doi:10.1016/j.blre.2012.12.003 . (2013). E., H. the I., J. V. and L. Allen, Haase, EVidenceModeler M., Hannick, Pertea, using Jr., W., annotation K., Alignments. Zhu, structure Spliced R. L., gene as- Smith, S. eukaryotic alignment Salzberg, R., Automated transcript J., maximal J. B. using Wortman, annotation Haas, M., genome S. Arabidopsis Mount, the Improving L., semblies. A. (2003). O. Delcher, trout White, J., rainbow the B. of infusions. Haas, responses (HCl) Morphological acid (1994). and F. (NaHCO3) S. base Perry, hyperoxia, 12 & to P., Laurent, gill M., mykiss) expanding C. (Oncorhynchus Lake. genes Wood, Nur chimerical G., G. Dali of Goss, the generation of rapid Changes The (1988). (2010). throughput. Z. S. Gao, He, high & and M., zebrafish. accuracy Long, in the M., diversity high Zou, for protein with M., tool alignment Chen, computational B., sequence Fu, a multiple CAFE: MUSCLE: 32 (2006). Res, Acids W. (2004). Nucleic M. C. Hahn, R. & M. Edgar, P., A. evolution. J. McCabe, family Demuth, gene N., . of Cristianini, study consequences. . T., variant haplotype-aware Bie, . De B., BCFtools/csq: Wohlfarth, (2017). J., A. Clark, 33 S. E., matics, McCarthy, A. & Carlson, P., D., Danecek, J. Maximum. preprocessor. Shakun, Glacial FASTQ Last S., all-in-one The A. ultra-fast (2009). Dyke, an U., fastp: P. (2018). Clark, J. align- Gu, & local Y., basic 34 Chen, using formatics, Y., theory. reads Zhou, and S., sequencing Chen, molecule application Growth single (BLASR): (2009). Mapping Z. refinement J. (2012). doi:10.1186/1471-2105-13-238 successive Du, G. & with Tesler, P., & J. ment Chang, J., X., M. expression Y. Qinghai: Chaisson, Wang, Lake in C., przewalskii) environments. X. (Gymnocypris water carp Chen, (2009). different naked S., of L. in Wang, factor T. growth Q., insulin-like Madden, X. and & hormone Chen, K., B., Bealer, Y. J., Cao, Papadopoulos, N., applications. Ma, and V., architecture Avagyan, Chromosome- BLAST+: G., interactions. (2013). Coulouris, J. chromatin C., Shendure, on Camacho, & based O., assemblies J. genome Kitzman, doi:10.1038/nbt.2727 novo R., 1119-1125. de Qiu, genes. P., of identify R. scaffolding Patwardhan, to A., scale geneid Adey, Using N., J. Burton, (2007). R. Guigo, & 4 experiment. G., Chapter annotation Parra, Drosophila E., the Blanco, in GeneWise Using (2000). sequences. R. 10 DNA Durbin, analyze & to E., program Birney, a finder: doi:10.1093/nar/27.2.573 repeats 573-580. Tandem (2), (1999). G. Benson, 6,4547 doi:10.1007/BF00004449 465-477. (6), doi:10.1101/gr.10.4.547 547-548. (4), uli cd e,31 Res, Acids Nucleic nt43 doi:10.1002/0471250953.bi0403s18 3. 4 Unit , 1) 0723.doi:10.1093/bioinformatics/btx100 2037-2039. (13), 1) 84i9.doi:10.1093/bioinformatics/bty560 i884-i890. (17), eoeBo,9 Biol, Genome 5,19-77 doi:10.1093/nar/gkh340 1792-1797. (5), M eois 11 Genomics, BMC iifrais 22 Bioinformatics, 1) 6456.doi:10.1093/nar/gkg770 5654-5666. (19), e opEdcio,161 Endocrinol, Comp Gen cec,325 Science, 1,R.doi:10.1186/gb-2008-9-1-r7 R7. (1), M iifrais 10 Bioinformatics, BMC egahclRsac,7 Research, Geographical 54) 1-1.doi:10.1126/science.1172873 710-714. (5941), 5.doi:10.1186/1471-2164-11-657 657. , 1) 2917.doi:10.1093/bioinformatics/btl097 1269-1271. (10), 10 3,4046 doi:10.1016/j.ygcen.2009.02.005 400-406. (3), 2.doi:10.1186/1471-2105-10-421 421. , M iifrais 13 Bioinformatics, BMC (4). urPoo Bioinformatics, Protoc Curr lo e,27 Rev, Blood a itcnl 31 Biotechnol, Nat ihPyilBiochem, Physiol uli cd e,27 Res, Acids Nucleic eoeRes, Genome 1,41-53. (1), Bioinfor- Bioin- 238. , (12), Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. ea . sknz,H,Kth . uk,T 21) UDNE:acrt eeto funreliable of detection accurate parameters. GUIDANCE2: multiple of (2015). uncertainty T. the Pupko, transporters. doi:10.1093/nar/gkv318 for - & W7-14. 3 accounting K., HCO Katoh, regions of family H., alignment SLC4 Ashkenazy, The I., (2004). Sela, F. W. Boron, expression & RNA-Seq M., (2005). 447 Improving C. Arch, R. Fulton, (2011). F., L. Lopez, Pachter, M. & Romero, & bias. L., fragment J. R., for Rinn, correcting Apweiler, J., by Donaghey, estimates N., C., Trapnell, Mulder, A., Roberts, N., identifier. Harte, domains S., physiology. renal doi:10.1093/nar/gki442 protein Pillai, in anhydrases V., carbonic InterProScan: Silventoinen, of role E., The Quevillon, (2007). J. G. occurrences 71 Schwartz, of counting Int, & parallel M., ab efficient J. source for approach Purkerson, open lock-free two fast, GlimmerHMM: A and (2011). k-mers. TigrScan C. of Kingsford, (2004). & L. G., Marcais, S. Salzberg, & genes gene-finders. RNA M., eukaryotic transfer Pertea, initio of detection H., improved W. for program Majoros, a tRNAscan-SE: (1997). sequence. R. genomic S. Dekker, in Eddy, . & M., . genome. T. human . Lowe, the A., of Telling, principles T., folding Ragoczy, reveals interactions M., 326 long-range Imakaev, Science, of L., mapping Williams, Comprehensive eukaryotic L., for (2009). N. groups J. ortholog Berkum, of van identification E., OrthoMCL: Lieberman-Aiden, (2003). S. Process- D. Data Roos, Project & Jr., Genome genomes. J., . C. Stoeckert, . L., SAMtools. Li, . and N., format Homer, Alignment/Map J., Sequence Ruan, The doi:10.1093/bioinformatics/btp352 T., Fennell, (2009). A., S. Wysoker, ing, B., sequences. Handsaker, whole-genome transform. H., individual Burrows-Wheeler from Li, history with population human alignment of read Inference 475 short (2011). Nature, R. accurate Durbin, & and H., Fast Li, (2009). R. 2. 25 Durbin, Bowtie Bioinformatics, & with alignment H., gapped-read RNAm- Li, Fast (2007). (2012). W. L. doi:10.1038/nmeth.1923 S. D. 357-359. Ussery, Salzberg, genes. & & RNA B., T., ribosomal Langmead, Rognes, of H., annotation H. Staerfeldt, rapid A., doi:10.1093/nar/gkm160 and E. consistent Rodland, require- P., mer: memory Hallin, low genomes. K., with novel Lagesen, in aligner finding spliced Gene fast a (2004). HISAT: I. Korf, (2015). L. I. S. A. Salzberg, Petrov, & . ments. families. B., RNA . Langmead, non-coding D., . for Kim, R., resource S. genome-centric Eddy, a E., to Rivas, P., shifting 46 E. 13.0: Nawrocki, Rfam N., Quinones-Olvera, (2018). J., Argasinska, I., Kalvari, D) 35D4.doi:10.1093/nar/gkx1038 D335-D342. (D1), a ehd,12 Methods, Nat 2,1315 doi:10.1038/sj.ki.5002020 103-115. (2), iifrais 27 Bioinformatics, eoeRs 13 Res, Genome 5,4559 doi:10.1007/s00424-003-1180-2 495-509. (5), 75) 9-9.doi:10.1038/nature10231 493-496. (7357), 55) 8-9.doi:10.1126/science.1181369 289-293. (5950), 1) 7416.doi:10.1093/bioinformatics/btp324 1754-1760. (14), uli cd e,25 Res, Acids Nucleic 4,3730 doi:10.1038/nmeth.3317 357-360. (4), 9,27-19 doi:10.1101/gr.1224503 2178-2189. (9), iifrais 20 Bioinformatics, 6,7470 doi:10.1093/bioinformatics/btr011 764-770. (6), eoeBo,12 Biol, Genome 5,9594 doi:10.1093/nar/25.5.955 955-964. (5), uli cd e,33 Res, Acids Nucleic M iifrais 5 Bioinformatics, BMC 1) 8827.doi:10.1093/bioinformatics/bth315 2878-2879. (16), 11 3,R2 doi:10.1186/gb-2011-12-3-r22 R22. (3), uli cd e,35 Res, Acids Nucleic iifrais 25 Bioinformatics, 9 doi:10.1186/1471-2105-5-59 59. , WbSre su) W116-120. issue), Server (Web uli cd e,43 Res, Acids Nucleic a ehd,9 Methods, Nat uli cd Res, Acids Nucleic 1) 2078-2079. (16), 9,3100-3108. (9), Pflugers Kidney (W1), (4), Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. ag . i,H,M,Z,Zu . o,M,Mo . ag .(09.Crmsm-ee genome Chromosome-level Tibetan (2019). the R. Yang, of environment . Chromosome- high-altitude . A harsh . the Y., to (2019). Mao, adapted S. M., fish Zou, He, a siluroides. Y., & tibetana, Plateau. Zou, Triplophysa Triplophysa Y., Z., of Ma, Loach, Zhang, assembly H., Y., Liu, Tibetan X., Dong, Yang, a S., of Duan, T., Assembly doi:10.3389/fgene.2019.00991 Wang, Reference Y., Adaptive Scale of Wang, Basis L., Genomic Yang, (2017). P. Xu, LTR . (2007). posons. . H. Wang, Environment. & . Alkaline Z., Extremely B., Xu, an Chen, in Z., waleckii) Yao, (Leuciscus W., and Ide 34 Peng, sequencing Amur Evol, Transcriptome Y., of Survival Jiang, (2013). The X. T., Evolution: Sun, J. Li, into . insights J., reveals . Xu, lake alkaline-saline . central-eastern extreme Z., an in Zhao, inhabiting Lake J., waleckii) adaptation. Wang, (Leuciscus plateau Dali stress Ide L., of Amur the Zhao, wild Hydrology B., of of Wang, analysis P., (2008). analysis Ji, Z. J., Transcriptome Xu, Lomtatidze, variability. & monsoon S., (2015). Asian Itoh, East S. doi:10.1007/s10933-007-9179-x D., Holocene and fishes. Zhai, He, Mongolia B., in Inner & Si, hypoxia J., Z., to Xiao, (2014). adaptation Song, M. for A. B., Implications Earl, Wu, doi:10.1016/j.gene.2015.04.023 . dalaica): L., (Triplophysa . Yang, fish . improvement. S., Y., assembly Sakthikumar, genome Wang, A., and detection Abouelliel, variant M., microbial 9 comprehensive Priest, One, for PLoS T., tool Shea, integrated M. an T., Schatz, Pilon: Abeel, reads. & J., J., short Gurtowski, B. from Walker, H., profiling Fang, genome J., reference-free C. doi:10.1093/bioinformatics/btx153 fast Underwood, Differential M., 2202-2204. GenomeScope: Nattestad, (2012). L. J., (2017). Pachter, F. C. Sedlazeck, . Cufflinks. W., . and G. TopHat . Vurture, with R., experiments D. Kelley, RNA-seq doi:10.1038/nprot.2012.016 D., of 562-578. Kim, analysis (3), G., expression Pertea, transcript L., and Goff, develop- gene A., the Roberts, for C., databases Trapnell, L.). EST vulgare Exploiting (Hordeum (2003). barley A. in SSR-markers Graner, 106 genomic gene-derived & of in K., characterization elements R. and Varshney, repetitive ment W., identify Michalek, to T., RepeatMasker Thiel, Using (2009). N. sequences. Chen, & M., Tarailo-Graovac, ambiguously and divergent alignments. removing sequence after protein phylogenies from of Improvement blocks finding aligned (2007). gene J. for Castresana, server & web G., Talavera, a AUGUSTUS: (2004). phylo- B. large Morgenstern, of eukaryotes. & S., in post-analysis Waack, and R., Steinkamp, analysis M., phylogenetic Stanke, for tool BUSCO: a (2015). 8: version M. genies. RAxML E. (2014). Zdobnov, A. & Stamatakis, V., E. Kriventseva, orthologs. single-copy P., doi:10.1093/bioinformatics/btv351 with Ioannidis, completeness 3210-3212. annotation M., and assembly R. genome Waterhouse, assessing A., F. Simao, 3,4142 doi:10.1007/s00122-002-1031-0 411-422. (3), iifrais 30 Bioinformatics, o clRsu,19 Resour, Ecol Mol uli cd e,35 Res, Acids Nucleic 1,1519 doi:10.1093/molbev/msw230 145-159. (1), urPoo iifrais hpe 4 Chapter Bioinformatics, Protoc Curr 1) 126.doi:10.1371/journal.pone.0112963 e112963. (11), uli cd e,32 Res, Acids Nucleic LSOe 8 One, PLoS 9,11-33 doi:10.1093/bioinformatics/btu033 1312-1313. (9), 4,12-06 doi:10.1111/1755-0998.13021 1027-1036. (4), WbSre su) 2528 doi:10.1093/nar/gkm286 W265-268. issue), Server (Web 4,e90.doi:10.1371/journal.pone.0059703 e59703. (4), IDR necetto o h rdcino ullnt T retrotrans- LTR full-length of prediction the for tool efficient an FINDER: WbSre su) 3932 doi:10.1093/nar/gkh379 W309-312. issue), Server (Web ytBo,56 Biol, Syst nt41.doi:10.1002/0471250953.bi0410s25 10. 4 Unit , 12 4,5457 doi:10.1080/10635150701472164 564-577. (4), ora fPloinlg,40 Paleolimnology, of Journal rn ee,10 Genet, Front ee 565 Gene, iifrais 33 Bioinformatics, iifrais 31 Bioinformatics, ho plGenet, Appl Theor a rtc 7 Protoc, Nat 2,211-220. (2), 1,519-528. (1), o Biol Mol 991. , (14), (19), Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. ag .(07.PM :pyoeei nlssb aiu likelihood. maximum by analysis phylogenetic 4: doi:10.1093/molbev/msm088 PAML (2007). Z. Yang, 13 o ilEo,24 Evol, Biol Mol 8,1586-1591. (8), Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. 14 Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. 60 T. tibetana T. T. dalaica T. D. rerio 56.37(46.00-63.02)

1 2 3 4 3 7 50

50.59(41.93-59.33) 4

1 7

2 9

5 5 5 6 7 8 9 4 40

40.13(35.79-47.11) 13

6 21

3 10

9 25 15

30 14 22

10 11 12 13 14 15 16 17 8

13 24

16 18 20

15 11 18

20 15

17 17

14 1 12

14.61(6.05-30.94) 12 8

18 19 20 21 22 23 24 25 6 21

10 20 23 8.31(3.12-18.85) 7.89(2.37-25.40)

11 19 23

10 22

24 2 25

0 16 Megalobrama amblycephala Anabarilius grahami Danionella translucida Danio rerio Triplophysa siluroides Triplophysa tibetana Triplophysa dalaica 19 Posted on Authorea 16 Jul 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159493308.83574782 — This a preprint and has not been peer reviewed. Data may be preliminary. survival-in-extremely-alkaline-environment level-genome-of-triplophysa-dalaica-cypriniformes-cobitidae-provides-insights-into-its- 2.docx Table file Hosted survival-in-extremely-alkaline-environment level-genome-of-triplophysa-dalaica-cypriniformes-cobitidae-provides-insights-into-its- 1.docx Table file Hosted vial at available at available

Effective population size (x104) 30 10 15 20 25 0 5 10 4 https://authorea.com/users/343092/articles/469773-the-chromosome- https://authorea.com/users/343092/articles/469773-the-chromosome- 10 Years (g=2, Years 5 16 µ=4x10 -9 ) 10 6 Dali Hai