Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. i dong liu fishes of in expansion genes unexpected opsin into insight undulates humphead Cheilinus endangered wrasse the of assembly genome Chromosome-level ardepeet nqeopruiyt anisgtit dpain pce fLbia rgntdin originated Labridae of (Alfaro Species periods adaptation. Paleogene into early insight to family gain The Cretaceous to environments. ecological late opportunity marine the sustaining unique in a role important presents an Labridae plays fish coral of Adaptation Introduction 1 undulatus Cheilinus future for resources valuable provides Keywords for wrasse copies humphead fishes. opsin the of increased of morphology the uneven sequencing functional gene of the Genome and providing adaptations and evolution, conversion, alternative conservation, behavior. undulatus, gene of indicated sexual C. investigation to tissues and for five contributed in foraging to specific probably expression visual up opsin windows opsin the of expansion adjacent of abundance single-gene and Divergence showing windows for fluidity. opsin Rh2, responsible function in was and LWS1, conversion transposons SWS2, Gene of opsins distribution analyses arrays. of genomic tandem expansion Comparative in in contracted unexpected selection. and copies were revealed owing expanded positive genes were divergence, underwent fishes families low 22,218 genes other gene with of 46 560/1,848 with genome, Additionally, and total of ago, respectively. A 39.88% years phylogeny, million for express reconstructed chromosomes. 58.37 accounting to the approximately 24 abundant, showed species most to analyses close were Transcriptomic with anchored Transposons was evolve represented. and to genome gene. completely Mb, Gb predicted were 1.17 16.5 the genes The of of BUSCO 96.79% sequencing. of length values. Hi-C 96.36% ecological N50 and and and an Nanopore, annotated, economic with Illumina, high functionally contigs using with undulatus, 328 species C. endangered from of an generated assembly united is bones genome undulatus pharyngeal present Cheilinus paired we wrasses, of Here, feature Among specialized jawbone. a display single and a environments, into reef coral in worldwide distributed are Wrasses Abstract 2020 11, September 2 1 pce on ncrlresadisoehbtt n sdsrbtdi uho h rpclIndo-Pacific tropical the of much 2004). (Russell in fish distributed high-priced and is valued and most habitats the of inshore prey one of and is capture reefs it wrasse, efficient Moreover, coral humphead Ocean. for on the bones, force arch fishes, found bite gill labrid great species of Among a pair generate 1990). a to Sorbini fish from (Cowman derived labrid bones allowed pharyngeal is and separated (Wainwright bones which right shapes, pharyngeal and jawbone, body left united single display morphs, The D a fishes color, widespread into (Liu specific other environments united the intra- whereas reef are and various inter- bones to of pharyngeal variety adapt paired extensive to an behavior with feeding genera 71 in species hnhiOenUniversity Ocean Shanghai available not Affiliation 1 iyn Wang Xinyang , tal. et 02.Teeris ardefsisdmntaeti hrnelaprts(ankv& (Bannikov apparatus pharyngeal this demonstrate fossils Labridae earliest The 2012). rncitm,vsa ee eecneso,transposon conversion, gene gene, visual transcriptome, , 1 ogiguo hongyi , 2 uun Zhang Xuguang , 1 tal. et hiiu undulatus Cheilinus 08 n ucl iesfidit vr519 over into diversified quickly and 2018) tal. et 1 igZhang Ming , 09.I h edn paau,the apparatus, feeding the In 2019). R .undulatus C. pel sa endangered an is uppell, ¨ 1 n eqa tang wenqiao and , tal. et n fthe of one , 2009). 2 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. sebysrtg.I oprsnwt te nw s eoe,w on that found we genomes, wrasse humphead fish endangered known genome a for other and level technologies, with Hi-C chromosomal comparison platform, LWS1 the sequencing In DNA at strategy. long-read assembly Nanopore assembly genome reads, short first Illumina the using present we study, (Qi genome of mitochondrial resources the of knowledge, assembly H chromosomal our (Liu with To genome reported. a However, been bones. not pharyngeal united the and system undulatus C. (Phillips and surface, al. the to et closer living (Lin those environments from through living differ restored different fish under been fish have of may adaptation sensitivity Color function. of (Springer (Sharkey loss display duplications or gene divergent be could duplicates al. et (Lin and Cyprininae in (Cortesi events duplications duplication whole-genome or duplications al. et (Lagman wavelengths, ( vertebrates lamprey red early and green, in blue, subfamilies ultraviolet, vision, dim ( to rhodopsin LWS sensitive are one that including (Collin subfamilies family, five respectively of gene total monophyletic a a with of composed (Cortesi opsins , streams mountain five clear possess to Fish sea & deep (Cheng dark factor the developmental feeding from for important (Phillips ranging mechanisms For A an fish alternative provides Labrid genes. is answered. genes of which opsin opsin be ecology of zooplankton, of expression to diverse foraging expression Interestingly, remains for ( 2007). the Flamarique this juveniles 1 is sensitive however, in short-wavelength adaptation Therefore, functions gathering; the this but column. food trout, of water rainbow for example the in adaptations well-studied through example, visual and prey of potential widespread apparatus sensitivity detect set particularly visual jaw Meanwhile, to a visual, esophagus. pharyngeal fish co-evolved for the specialized for into have coding date, food genes useful the to transport with be evolution, unknown and associations could manipulate, is morphological be collect, genome may In to whole there chiefly parameters. the functions and because feeding provided, probably be and not diet, olfactory, could its architecture to genetic related (Sadovy the change reefs habitat (Graham coral weight the to in adjacent underlying kg waters shallower 190 in over found and length Body eyes. in undulatus the m C. through 2.3 running (Donaldson of lines List size distinctive stages. maximum Red of developmental a 2001 pair different with IUCN a and the at lips, in varies fleshy species large color threatened individuals, a adult and of trade forehead Book international Data 2001). coral Red Therefore, Sadovy in 1996 (Sadovy 1998). & animals IUCN species (Sadovy toxic the this such ecology in conserve of reef species to reproduction of limited excess stability been controls has starfish, the and maintaining boxfish, environments, hares, reef sea of predators few SWS2 and tal. et 07.Temjrt fryfindfihsdslysvrlcpe ihnec pi ufml u otandem to due subfamily opsin each within copies several display fishes ray-finned of majority The 2007). 02.Hwvr itei nw bu h oeua ehns fosntne ulcto.Opsin duplication. tandem opsin of mechanism molecular the about known is little However, 2012). ee,four genes, 02.Sc vnsdcaewehrfihaescesu ncthn ryo saigfo predators from escaping or prey catching in successful are fish whether dictate events Such 2012). tal. et n idewvlnt estv ( sensitive middle-wavelength one , tal. et eti australis Geotria SWS tal. et 07.Tne ulcto samjrcnrbtrto contributor major a is duplication Tandem 2017). .undulatus C. sa da addt o h netgto fosneouini oa effihbsdo h visual the on based fish reef coral in evolution opsin of investigation the for candidate ideal an is dlsihbtsepotrre lpsadbnhla -0m hra ueie r typically are juveniles whereas m, 2-60 at benthal and slopes reef outer steep inhabit adults 09 aebe eotdfrhmha rse rma vltoayprpcie genomic perspective, evolutionary an From wrasse. humphead for reported been have 2019) 2016). 06.Tedpiae fosngisadlse r eivdt orlt ihteevolutionary the with correlate to believed are losses and gains opsin of duplicates The 2016). usqety w onso hl-eoedpiainepne iulosnit five into opsin visual expanded duplication whole-genome of rounds two subsequently, ; .undulatus C. SWS2 tal. et ee,adfive and genes, rvd nih notemcaimo h iulsse o odfrgn.I this In foraging. food for system visual the of mechanism the into insight provide 03.Sneyaayi fosngnsidctdta oa ulcto produced duplication local a that indicated genes opsin of analysis Synteny 2003). ihu as ugsigta h pi eei h neta tt njw (Davies jaws in state ancestral the is gene opsin the that suggesting jaws, without ) tal. et scaatrzdb eea rmnn etrs nldn ag upo the on hump large a including features, prominent several by characterized is tal. et Rh2 07 riatvto foeosn eutn nrtnlmonochromacy retinal in resulting opsin, one of inactivation or 2017) ulcto nsloiswr eadda osqec ftetraploidy of consequence a as regarded were salmonids in duplication 06.Osn nfihaeky otescesu ooiaino habitats, of colonization successful the to keys are fish in Opsins 2016). Rh2 tal. et Rh2 ee,tems eotdnme fayfihyt h multiple The yet. fish any of number reported most the genes, .undulatus C. 04.Afiegn eetieo pi a efudi the in found be can opsin of repertoire five-gene A 2014). ,adoeln-aeeghsniie( sensitive long-wavelength one and ), tal. et LWS 2 03.Ti pce a enlse savulnerable a as listed been has species This 2003). ee r oti oede-ae pce (Rennison species deep-water some in lost are genes tal. et tal. et tal. et stelretmme ftefml Labridae, family the of member largest the is SWS1 03.Ltl skonaottemechanism the about known is Little 2003). 05 Rennison 2015; 07.Osngn eetie ndeep-water in repertoires gene Opsin 2017). LWS tal. et eemyb ofntoa nadults, in nonfunctional be may gene ) tal. et ufml mlfiain(Rennison amplification subfamily 2015). 03 n e transcriptomes few a and 2013) tal. et tal. et 2015). 2012). .undulatus C. LWS .undulatus C. .undulatus C. LWS Rh1 pi gene, opsin ) and ), a five has SWS2 SWS1 must has Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. h ro ftennpr la ed a rtcretduigNxDnv hts/gtu.o/Nex- The (https://github.com/ NextDenovo k. using 22 corrected at first set was cutoff reads seed clean the nanopore with the tomics/NextDenovo), of error The USA). CA, as Diego, assembly library San Genome Inc., RNA-seq 2.5 (Illumina Illumina mode spec- the PE150 Nanodrop construct in a to (Zhu 6000 using used study NovaSeq checked was previous was a RNA separately quality in extracted The RNA described was UK). spleen, RNA and Ringmer, liver, USA), individual. (Labtech, muscle, RNA- same Camarillo, brain, trophotometer models, the (Invitrogen, the gene from Reagent including predict collected Trizol tissues, tissues to multiple using gonad and for and regions data retina, gene transcript organ, over generate olfactory genome to assembled performed the was of seq rate coverage the estimate To sequencing Fluorometer and 3.0 extraction Qubit according RNA UK), a Oxford, 2.4 Nanopore, using sequenced (Oxford sequencer and quantified instructions. DNA prepared manufacturer’s were PromethION were the Nanopore prep) libraries the to 1D using (three DNA cell libraries flow Nanopore, kb genomic one fragments, 20 (SQK-LSK109, on kb three long Kit Meanwhile, the 20 Sequencing USA). of fragments Ligation Camarillo, the ends long (Invitrogen, 1D the the Finally, the reduced, to were UK). for ligated reads were instructions Oxford, short System adapters manufacturer’s the Pippin sequencing the After Blue The to kb. a sequencing. according 20 using of nanopore selected value for were peak used muscle a were with fresh USA), from MA, DNA Science, high-quality (Sage was extracted size of sequencing fragments Genome and Long construction step. library next DNA the Long in 2.3 analysis N = heterozygosity G perform amendment: not with did N formula we the Therefore, using calculated of S1). distribution k-mer for (Fig. the obtained from site for were different distribution significantly reads San not frequency clean was Inc., k-mer genome the (Illumina a reads, mode obtained We redundant PE and 150 analysis. low-quality in of platform removal 6000 constructed novo NovaSeq After The Illumina USA). PCR. the using CA, (Qiagen, amplified using Diego, and Kits sequenced end-repaired, was Tissue purified, library (Xiao 2000, & DNA Covaris study using Blood previous fragments DNeasy bp a of 300–500 using in size performed tissues genome as muscle technology, preparation. The fresh sample from before Germany). degC extracted Halden, -80 was at estimation DNA stored size were High-quality genome samples gonad, All and organ, h. extraction 24 olfactory DNA for spleen, 2.2 1 nitrogen liver, with interpretation. liquid muscle, rinsed otolith in quickly brain, annuli collected, frozen The on were then laboratory. fish based and the old, the of years to tissues 4 retina transported be and was to fish determined living was The fish The assembly. and sequencing female wild A a collection as Sample serve will 2.1 wrasse. assembly humphead Methods the genome of draft and conservation annotated Materials and the sexual 2 ecology ecology, that behavior of strategy, believe studies prey We in future species. role for important this in resource an of sites play evolution may selection copies and conversion, positive opsin change, in showed gene increase analyses by sudden PAML the expanded that conversion. subsequently gene duplication, opsin Rh2 to whole-genome contributed via transposons produced while initially were genes k-mer ee.RAsqecn RAsq nlsssoe aito nosnepeso.Orrslsindicate results Our expression. opsin in variation showed analyses (RNA-seq) sequencing RNA genes. sebyt siaetegnm ie l la ed eesbetdt 7mrfeunydistribution frequency 17-mer to subjected were reads clean All size. genome the estimate to assembly num stenme of number the is .undulatus C. Fg ) agti unzo,Gagogpoic,Cia a sdfrgenome for used was China, province, Guangdong Guangzhou, in caught 1), (Fig. k mr,adDi the is D and -mers, tal. et 04.Teetasrp irre eesqecduiga Illumina an using sequenced were libraries transcript These 2014). .undulatus C. k mrepce et,a ecie (Xiao described as depth, expected -mer tal. et 3 k a siae ae nIlmn N sequencing DNA Illumina on based estimated was -mer .undulatus C. 09.I re,DAwsrnol hae to sheared randomly was DNA brief, In 2019). num .undulatus C. × D / hsht uee aie(B)solution, (PBS) saline buffered phosphate k .undulatus C. mer Fg 1.Tehtrzgst fthe of heterozygosity The S1). (Fig. depth eoewsasmldusing assembled was genome hr stegnm size, genome the is G where , ttehl-xetddepth half-expected the at tal. et 2019). de Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. ubn21)adteIlmn N-e ed fmlil ise yhst (Kim hisat2 by tissues multiple of reads RNA-seq Illumina the and 2010) Durbin ht:/w.eetakr r) ial,RpaMse ht:/w.eetakrog a sdt screen to used was (http://www.repeatmasker.org) a RepeatMasker generate Finally, to org). annotated (http://www.repeatmasker. and a TE-lib into by masked merged were genome Wessler libraries the LTR & in LTR (Han soft-masked MITE-hunter using were using tified TREs identified a When were produce (MITE) to parameters. elements 2010) default transposable with miniature 1999) the assembly, (Benson Finder Repeat dem al. the of (Stanke models Augustus gene using validation consensus assembly build genome To and provide annotation With not functional DpnII. did Gene of that 2.7 site of pairs restriction sequencing read the nanopore (Burton at Clean Lachesis from sequences using contigs groups the mode. the to information, (Chen end-to-end alignment Fastp interaction in by valid using excluded 2012) filtered were Salzberg were information & reads interaction low-quality (Langmead Illumina the the Bowtie2 using After DNA using sequenced ligation, was mode. proximity library PE150 Hi-C al. marking, with (Xiao The biotin platform described amplification. 6000 (DpnII), previously DNA NovaSeq digestion and as shearing, chromatin process, physical pair lysis, the purification, a formaldehyde, in between with involved distance fixation steps one-dimensional The the on individual. dependent (Xiao strongly loci are of which FreeBayes contigs, using among calling of information assembly SNP chromosomal a by obtain library validated To Hi-C was using base assembly single Chromosome of 2.6 rate 2012). accuracy Marth The (Hu & NextPolish (Garrison genome. by assembled reads the short (Simao DNA BUSCO using Illumina predicted the were genome using assembled reads the brata in long genes error-corrected The for 2020). runs polished two was genome in assembled nanopore- The (https://github.com/ruanjue/smartdenovo). SMARTdenovo ipesqec eet SR)o the of (SSRs) repeats sequence Simple annotation (Trapnell element Cufflinks Repetitive using 2.8 genes expressed SMART, of Pfam, signatures PRINTS, number other ProDom, values the and the (Sarah checking searching domains, InterPro by functional by in motifs, evaluated genes performed 10 databases examine protein-coding we x PROSITE the to Second, 1 and of 2001) PANTHER, Blast2GO. of structure Rolf value using secondary e & information the an (Zdobnov in annotation with InterProScan NR BLAST using on using (Kanehisa annotation based genes databases genes (KEGG) protein-coding to Genomes the assigned and annotate Genes to of the used First, Encyclopedia were Kyoto methods. and combined two (GO) on Ontology based (Bairoch annotated database functionally were SWISS-PROT genes Transposon protein-coding using removed predicted were The genes sourceforge.net). transposon-including and psi. set, transposon- gene a (http:// into PSI merged were models gene All melops of sequences novo 08 ihdfutprmtr,tecenra ar eempe otepolished the to mapped were pairs read clean the parameters, default with 2018) 03.AtrSR eesf-akd admrpttv lmns(Rs eeanttduigTan- using annotated were (TREs) elements repetitive tandem soft-masked, were SSRs After 2013). oe,hmlg eune n rncitdt,rsetvl.Frhmlg-ae rdcin protein prediction, homology-based For respectively. data, transcript and sequence, homology model, d9dtbs.Teitgiyo h eoewsasse sn h luiasotrasb W L & (Li BWA by reads short Illumina the using assessed was genome the of integrity The database. odb9 > and , ,bsdo h luiasotrasfo isetranscripts. tissue from reads short Illumina the on based 0, aiuurubripes Takifugu ai rerio Danio tal. et idr(Flicek Finder 09.TeH- irr a osrce sn fmsl isefo h aeone same the from tissue muscle of g 1 using constructed was library Hi-C The 2019). enovo de tal. et , IElbay n h ogtria eetrtornpsn LR eeiden- were (LTR) retrotransposons repeat terminal long the and library, MITE atrsesaculeatus Gasterosteus tal. et eedwlae rmteNB aaae(tp:/w.cinmnh gov). (https://www.ncbi.nlm.nih. database NCBI the from downloaded were 06,GMM (Jens GeMoMa 2006), tal. et tal. et enovo de 03,wihwr ute ree n retdit chromosomes. into oriented and ordered further were which 2013), .undulatus C. .undulatus C. 00,teNB o-eudn rti N)dtbs,teGene the database, (NR) protein non-redundant NCBI the 2010), .undulatus C. 04 ooti a obtain to 2014) rnpsnlbay(Elb.Tegnm sebywshard- was assembly genome The (TE-lib). library transposon enovo de h iCtcnqewsapidt banteinteraction the obtain to applied was technique Hi-C the , , 4 arsbergylta Labrus eoewr rtietfiduigGAo(Wang GMATo using identified first were genome eoeasml,gn rdcin eeperformed were predictions gene assembly, genome tal. et eMdLbay(eMlb sn RepeatModeler using (RepM-lib) Library RepMod tal. et enovo de 06,adPS (Haas PASA and 2016), 09.Teqaiyo eeantto was annotation gene of quality The 2009). T irr.Teefe,MT and MITE Thereafter, library. LTR , .undulatus C. aelba maculatus Lateolabrax tal. et tal. et tal. et tal. et eecutrdit 24 into clustered were tal. et .undulatus C. 05 lge against aligned 2015) 09,icuetissue include 2019), 05 ihteverte- the with 2015) -5 02 ihFPKM with 2012) Otrswere terms GO . 08,with 2008), , tal. et Symphodus genome 2000) tal. et de et et Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. rhlgu ee a acltduigCdm ntePM akg Yn 97 ihtebranch-site significant genes were the there opsin with whether of determine 1997) Identification to (Yang pair 2.14 package model on each PAML constraints on the conducted genes. functional was selection in of determine test positive genes ratio Codeml to greater selection likelihood using changes A is positive calculated acid-level the model. (?) non- was amino identify To site of genes in each number 2003). common orthologous the for Wu is value when substitutions & (Fay selection (e synonymous occurs proteins Positive selection Blastp of Positive number using 1. the alignment results. than by paired aligned divided the all substitutions from to synonymous inferred subjected were were genes species thologous 14 model. from Proteins graphical probabilistic genes a selection da Positive from Huang 2.13 2004; derived Speed likelihoods & (Beissbarth conditional EnrichPipeline32 comparing with ( families by gene family expanded The Significantly analyzed gene were tree. phylogenetic each species the 14 based of of in constructed lineage contracted tree each or along phylogenetic Bie expanded variations the that (De and families CAFE clustering gene the family using genes, gene orthologous by single-copy obtained PAMLon orthologs in the MCMCTREE families to and gene According (Mya), of time. ago contraction divergence years and estimated million Expansion the 28.46 confirm 2.12 the and to With used 98.25, was 100. 239.84, 1997) of 471.34, (Yang the bootstrap packages sites, (Hedges were construct and four database to regions model Timetree at outgroup the CDS Gtrgamma an times in with the as deposited used 2006) times and was divergence (Stamatakis (CDS), calibration shark param- RaxML ghost sequence default using Thereafter, with coding tree results. 2013) 2000). phylogenetic into clustering Standley (Castresana translated & OrthoMCL Gblocks (Katoh were the using Mafft filtered from sequences using extracted Protein-aligned aligned were were clusters cluster eters. each gene from orthologous sequences single-copy Protein not 619 could of that total orthologs time A divergence genomes, compared and the analyses of Phylogenetic genes. repertoires 2.11 species-specific gene to predicted ascribed (Li the were OrthoMCL In found using be orthologs. family find gene to identify 1.5 to clustered then and milii pectinirostris mus ( lwfih( clownfish nldn alnwas ( wrasse the ballan in including proteomes predicted The identification family Gene 2.10 respectively. genome, (Karin the tools in RNAmmer rRNAs and 1997) and Eddy & (Lowe tRNAscan-SE translated (Griffiths-Jones The not database are ncRNA that Rfam transcripts the non-coding against as defined are (Willingham and proteins genome into the in anywhere appear can (Bao ncRNAs annotation databases (ncRNA) RepBase RNA and Non-coding RepM-lib, 2.9 TE-lib, the in annotation homology-based 2015). with repeats for yolsu semilaevis Cynoglossus ,wr lee ooti h ogs citprgn,sbetdt nalv-l lsp(-au [?]1e-5), (E-value Blastp all-vs-all an to subjected gene, per script longest the obtain to filtered were ), mhpinpercula Amphiprion ,sotdsaas( seabass spotted ), tal. et ,sikeak( stickleback ), .bergylta L. tal. et 06.Arno it n et oe a sdt rdc eefamily gene predict to used was model death and birth random A 2006). ,zbas ( zebrafish ), 05.TenRA of ncRNAs The 2005). .undulatus C. ,crwn rse( wrasse corkwing ), .maculates L. P .aculeatus G. values .rerio D. < eoeadtoefo te eoe f1 ees fishes, teleost 13 of genomes other from those and genome .5 eepromduigaG emercmn analysis enrichment term GO a using performed were 0.05) ,pffrs ( pufferfish ), tal. et ,mdk ( medaka ), ,cv s ( fish cave ), 5 .undulatus C. 05 yainn ihtekonrN n tRNA. and rRNA known the with aligning by 2005) .melops S. P tal. et vle eeue odtrietesignificance the determine to used were -values .rubripes T. rza latipes Oryzias tal. et ioylcelsanshuiensis Sinocyclocheilus 2009). ,nl iai ( tilapia nile ), tal. et 03,wt h nainidxstat set index inflation the with 2003), eoewr dnie sn BLAST using identified were genome .undulatus C. tal. et 07 eeue opeittRNAs predict to used were 2007) ,adgotsak( shark ghost and ), 06,w eetdcalibration selected we 2006), ,mdkpe ( mudskipper ), rohoi niloticus Oreochromis h vrg among ? average the , < e5,ador- and 1e-5), = Callorhinchus ,tnu sole tongue ), Boleophthal- tal. et ), Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. a ttsi PMvleo .-.5frlwrepeso,37-5frmdlepeso,admr than more and expression, model for 3.75-15 expression, lower were for (FPKM) 0.1-3.75 reads of mapped value million FPKM per (Trapnell statistic transcript method a the of inference liver, was kilobase against Bayesian muscle, per a mapped fragments using brain, were of calculated of estimation individual the reads same for in (Trapnell the the intervals above Cufflinks from with identified into collected genes together input opsin tissues tissue, of gonad retina sequences and from coding organs, transcriptomes olfactory of spleen, reads filtered for The analyzed were transcriptomes Retina-specific the in expression of differences gene distribution a significant Opsin and a validate 2.18 window, drew to flanking and used its window and was region. window hotspot per test opsin transposon Poisson numbers the sliding one-sample transposon of kb numbers A and 100 transposon a numbers script). window used gene per (R we number plotted conversion, transposon gene/transposon We and gene number to gene script). the contribute determine (Perl genes if statistically to opsin test chromosome the the to along flanking window used transposons gene was if opsin gene examine to local To 1 related the of transposon flanking size of step sequence Content a the duplication. with 2.17 of whole-genome 30 identity by of Sequence induced window was sliding genes. conversion a gene three using than genes more of sequences with (Rozas coding A 6 the least DnaSP 1989). along at in (Sawyer values included GENECONV ? using group 0.4, events checked obtained than each was we duplication less group and were the groups per genes, between conversion values with opsin Gene distance value these conversion, where 1989). one (Sawyer grouped gene into described we merged via as were trees, groups expansion These gene gene and sequences. three synteny opsin gene of genes by mechanism opsin obtained bootstrap by of the assessed sites investigate was selection To trees positive gene and the conversion in Gene clades the 2.16 were of trees replicates. reliability 1000 gene The coding using and the ratio. computed Muscle, extracted likelihood probabilities using we the upstream aligned for Second, were three chosen found, sequences (Kumar plus genes. was be v10 coding gene opsin could MEGAX single-exon opsin chromosome/contig the using and same the of constructed the genes, as gene on opsin downstream set genes that of was upstream/ adjacent genes sequences window adjacent the adjacent of sliding no check none to the reported If approach of we defined window genes. size were sliding downstream region The opsin a three syntenic of and used the regions. locations We in nearby the orientation annotation. in observed and genome first order appeared original we gene the The trees, on gene annotations. and based genome synteny studied gene the in both genes by duplications genes opsin sequences only opsin examine coding of and To the analyses S2), while phylogenetic (Table S1), and files (Table Synteny annotation the retained 2.15 and genome zebrafish were studies. sequences genes their the follow-up protein sensitive for from these against genes/light used retrieved of BLASTed were opsin sequences were were as coding genes annotated fishes the opsin negatives, genes with 13 of false aligned in reduce locations were To the motifs genomic 3) sequences genomes, conserved retained. protein fish with were protein fish 13 opsins opsin genes 13 other of (e-value opsin and motifs the sequences conserved The zebrafish of protein find 2) The each to For download.html) 1) (http://hmmer.org/ sequences. Hmmer follows: sequences. genes using opsin as reference database zebrafish identified as Pfam and were used RefSeq, NCBI genes and from opsin S1) downloaded (Table was (Zv10) subtracted file were annotation genome zebrafish The < .5idctdsaitclsgicne omauetedvrec ewe ee ihgn conversion, gene with genes between divergence the measure To significance. statistical indicated 0.05 tal. et 07.T vi istwr eecae,w acltd?vle o e subfamily per for values ? calculated we clades, gene toward bias avoid To 2017). < 1x10 tal. et -5 .Ol h rti eune ihtebs ist noae zebrafish annotated to hits best the with sequences protein the Only ). 02 o h siaino pi eeepeso.Teconfidence The expression. gene opsin of estimation the for 2012) tal. et 08 o aiu ieiodmtos n h oe GTR model the and methods, likelihood maximum for 2018) .undulatus C. .undulatus C. 6 tal. et 02.Tecieino xrse abundance expressed of criterion The 2012). odtrieteepeso fosngenes. opsin of expression the determine to sn ls,adtempigraswere reads mapping the and Blast, using P value < .1wsrgre sthe as regarded was 0.01 P Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. sacieinada D au ?005 eotie ieetal xrse ee rmmlil tissues multiple used from was genes muscle expressed in differentially genes obtained of we expression [?]0.005, the value When FDR values genes. FPKM an on and protein-coding based criterion predicted obtained a were total transcripts as the tissue of length in 96.79% abnormal expressed in genes no 3B). for annotated 21,572 genomes, (Fig. of be annotated observed total could available was A genes with introns 9,190 species and of seven exons, of total to genes, (13,654) a public of Compared 61.27% and in distribution 3A). least S8) annotation (Table at Functional (Fig. database that (Table databases 2). one indicated all in genes (Fig. GO, homologues predicted 1246 and displayed to SWISS-PROT, we total genes 460 NR, the the the and from KEGG, of of ranges KOG, genes data, 99.69% chromosome including 22,218 transcriptome in databases, annotation, for distributed functional with accounting were After functional, together and S7). were S8), (Table models, genes genes gene protein-coding protein-coding 22,286 predict predicted of to total to a of used Mb obtained size 27.2 were from genome methods wrasse. ranges final humphead Homology-based chromosomes a the 24 an for and the assembly with annotation chromosomes, genome of scaffolds Genome 24 chromosomal size 24 3.3 the the The into providing on clustered 2), into genome. anchored were draft (Fig. oriented reliably Mb Mb 99.98% and S3), 3.7 59.6 a ordered, (Table of representing Mb clustered, length Mb, 51.5 were N50 1173.2 an of contigs with length the contigs N50 information, 308 chromosome interaction Finally, for valid polished information interaction the chromosomes. the valid mapped provided With which uniquely the (64.4%), which at pairs assembly. pairs, read million sequencing read clean 380.2 Hi-C million clean and 320.6 total via 93.2%, obtained the of S3) Q30 of (Table with 76.4% genome pairs for undulatus read the accounting clean pairs, of total read coverage million clean 497.8 124-fold produced reads, which raw level, chromosomal Gb 145.8 obtained We assembly assembly genome thorough genome provided high Chromosome-level than have showed more 3.2 we was RNA-seq Therefore, genome Illumina the S6). from (Table of tissues 94.98% accuracy multiple genome to base for entire of 89.74% the the from transcriptome and Meanwhile, rates reads, the map-read short Furthermore, genome. Illumina assembled 1). of the (Table known 98% in 99.99% than the found more were than (Xiao by genes larger covered Mb BUSCO is was 900 complete species approximately of S5), to this (Table 96.36% 366 Mb of and from 16.5 genome ranging of The The usually length N50 fishes, 42%. al. contig analysis. marine was from a other 17-mer content Mb with of by GC continuity 1173.4 genomes average of was estimated assembly level whole-genome of size of genome high the assembly draft genome length a and final genome reached total the The which for a than number, runs. result, obtained contig lower two a were 328 reads in slightly As Mb polished clean was was S4). 16.4 Gb genome (Table genome of 86.4 nanopore-assembled assembly assembled length obtained genome the N50 following we of contig the from and size a for filtered libraries, kb were and 20-kb 31.69 sequences Mb three of adapter 1164.9 of length and sequencing N50 reads indicating an low-quality nanopore sequencing, with The from long-read data S3). nanopore (Table via genome data genome raw Gb the 90.7 of and coverage sequencing, 77-fold short-read Illumina via data Gb 10 x 4.2 approximately (Vurture was GenomeScope k-mers of number total the of size genome assessment The quality and assembly Genome 3.1 Discussion and Results 3 level. (Love (Pertea DESeq2 expression high for 15 .undulatus C. 08.W vlae h ult fteassembled the of quality the evaluated We 2018). eoe fe xlso ftecenra ar htcudntpoieitrcinifrain we information, interaction provide not could that pairs read clean the of exclusion After genome. tal. et . .undulatus C. 04,uigtefledsoeyrt FR ocluaethe calculate to (FDR) rate discovery false the using 2014), tal. et 07 ttekmrpa ihadpho 3.W eune prxmtl 49.9 approximately sequenced We 33x. of depth a with peak k-mer the at 2017) tal. et a siae ob .812 bbsdo 7mrfeunis(i.S) and S1), (Fig. frequencies 17-mer on based Gb 1.18-1.27 be to estimated was 05.Mawie h ieec ngn xrsinwsietfidby identified was expression gene in difference the Meanwhile, 2015). 7 10 .undulatus C. sn nGE(Sun findGSE using eoeaantteBSOdatabase, BUSCO the against genome tal. et 08 n . 10 x 3.9 and 2018) P au tasignificant a at value .undulatus C. tal. et > ,accounting 0, 09 Xu 2019; 10 The . using C. et Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. a-ne s eoe ihanttdtasoos uha erfih(Howe zebrafish (Brawand (Cowman tilapia as Mya such 50 transposons, around annotated super (Shao (Tine with Cheilines transposon Mya genomes with contrast, 27 230 fish In ancestor ray-finned displays about common genome. which originated a the Zebrafish zebrafish, in from the burst species. as a of such and and history 2,618 diversity, activity and genome, recent highest rRNAs, (Sotero-Caio the suggesting the 111 families 5A), of ncRNAs, display 46.07% 711 (Fig. fishes of for 30% ray-finned total than accounted A lower which the mostly 2). sequences, in (Table were repeat genome annotated the the were the of of tRNAs in 39.88% identified Mb for were accounted 540.85 (SSRs) shared transposons repeats found genes sequence We by simple 227 shared and smallest 4B). genome. types) genes the DNA Fig. 965 and and up/down, mostly 4B), (RNA expressed were Transposons genes Fig. there (1,632/1,172 up/down, and retina expressed tissues, and genes between muscle (2,509/2,232 size by spleen intersection and the muscle on the focused We 4A). (Fig. lseighmlgu eesqecsfrom of sequences history , gene evolutionary homologous the clustering understand better To content Transposon evolution 5B). Genome 3.4 (Fig. content of transposon genome richer the exhibiting in evolutions. genomes present larger highly with is size, genome to (Chen flatfish . xaso fosngnsi a-ne fishes ray-finned in genes opsin of Expansion future. 3.6 the in investigated of be loss (Waszak could development protein, the tumor impact and predisposes factor-like subunit which elongation, growth aggregation, largest al. translational and the hepatocyte et misfolding catalyzes is protein which and to ELP1 complex, leads Interestingly, variants (ELP1), elongator function S10). conserved 1 (Table evolutionarily annotation protein the function of complex ( for elongator SWISS-PROT selection of by positive synthase, site identified significantly branch acid to the fatty subjected to were in identified activity, According genes oxidoreductase were 46 S2). processes, that families reduction found (Fig. oxidation- gene activity with contracted associated catalytic 1,848 mainly and were and families families gene expanded gene expanded ( 560 families of total the A in of genes evolution families. selected adaptive positively gene and the families investigate gene To of contraction of and radiation adaptive Expansion accelerated 3.5 jaw pharyngeal (Alfaro diversity the subsequently, ecomorphological Labridae; Labrid of the occurred diversification which initial (Alfaro the diversification, be period included) with Cretaceous family ancestor Late common (Labrida the the in subclade Mya percomorph 85-100 major approximately with consistent with are ancestor analysis, common phylogenetic the the from to According with together times. divergence clustered determine and tree phylogenetic in , n .milii C. and .anshuiensis S. .undulatus C. 00.Pstvl eetdEP in ELP1 selected Positively 2020). tal. et .undulatus C. P < 2018), .5 eesbetdt Otr nihetaayi.W on httesignificantly the that found We analysis. enrichment term GO a to subjected were 0.05) tal. et sarsl,69snl-oygnsad2,8 ee rm1,1 aiiswr identified were families 15,410 from genes 22,286 and genes single-copy 619 result, a As . Fg A.Nx,w sdtecdn eune f69snl-oygnst osrc a construct to genes single-copy 619 of sequences coding the used we Next, 6A). (Fig. , tal. et aiuurubripes Takifugu .semilaevis C. 04,admdkpe (You mudskipper and 2014), eoe(i.6) oa f40gnsblnigt 9 infiatyepne gene expanded significantly 199 to belonging genes 430 of total A 6B). (Fig. genome tal et 04,teoag lwfih(Lehmann clownfish orange the 2014), .melops S. .bergylta L. 07.Tasoo ciiyaddvriyaeascae ihteevolutionary the with associated are diversity and activity Transposon 2017). . .undulatus C. .aculeatus G. , .aculeatus G. and .undulatus C. (Aparicio and .bergylta L. .undulatus C. .undulatus C. .melops S. .bergylta L. eoe(al 9.Tedvrec ae ftetransposons the of rates divergence The S9). (Table genome tal. et and tal. et , tal. et .latipes O. ugsigipratyrlso rnpsni genomic in transposon of roles importantly suggesting , eogn oteLbiafamily. Labrida the to belonging , .maculates L. .undulatus C. 2009). 02,crwn rse(Mattingsdal wrasse corkwing 2002), rud5.7Ma hsclbaintm sblee to believed is time calibration This Mya. 58.37 around 8 04,w on httasoo otn contributed content transposon that found we 2014), a lya motn oei eeomn,adits and development, in role important an play may , eetmtdteepninadcnrcinof contraction and expansion the estimated we , .melops S. , tal. et .pectinirostris B. tal. et rud9.8Ma(i.6) u results Our 6B). (Fig. Mya 92.28 around .undulatus C. tal. et eietfidsnl-oyotoosby orthologs single-copy identified we , 2018), tal. et , 04,whereas 2014), P .niloticus O. < 2018). .5,adteegnsfunction genes these and 0.05), = .anshuiensis S. 09.I oprsnwt ten with comparison In 2009). tal. et nteeouinr re we tree, evolutionary the in , .undulatus C. .maculates L. 03,sotdsabass sea spotted 2013), , .undulatus C. .percula A. .undulatus C. (Yang .undulatus C. tal. et iegdfrom diverged , .undulatus C. tal. et .rubripes T. 08,Nile 2018), , diverged diverged .rerio D. 2016), was Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. oe ucindffrn rmteosngn in gene opsin three the in from chromosome different another function to novel including translocated fishes, chromosome was Labridae copy another of One to composed translocated branches the copy in one for of except analyses Synteny number, copy in changes 7A). few (Fig. very with 3R (Cortesi after fishes percomorph most of duplication. background. whole-genome light than a al. rather against event, prey expansion detect opsin to (Sadovy (Lin juveniles zooplankton m capture of 30 to than areas less river depths outer as murky and reefs, shallow al. regions, inshore sandy inhabit to in important due notably sensitivity were longer-wavelength the restore to parum al. the of as events such duplicate, retrotransposed a plus in duplicate posed the Surprisingly, (Porath-Krause changes functional gene causing subsequently between located was that in gene the a However, and fragment, group. percomorph up produce the to in four occurred to events up duplication additional of environments two retention dim-light where kb), to 30 adapt three (˜ to to region duplications genomic gene a in by diversification species 2020). these Jia of & acuity (Yokoyama visual improve could which oisin copies curdi h omnacso fCpiioms(Lin Cypriniformes of (Lin ancestor (2R) events common the duplication in whole-genome of occurred of chromosomes different run in complexity second LWS1 SWS2- the of showed of component trees, result adjacent evolutionary a immediately with an together duplicated genes, and opsin of analyses variation. Synteny sensitivity visual species’ and in that lost four, to were in related subfamily The bp Rh1 subfamilies. 100 the five in than in into S11), Genes less sorted (Table and segments in genes fishes. one), found opsin DNA these in 122 in (missing bps identified lost genes 480 finally were of incomplete We (You fragments 5 reference. DNA predators genes, sequence inserting terrestrial a complete avoid as 117 used to including were order genome in zebrafish the vision aerial and for types, contrast, adapted visual several (Shao is number representing capture environments, copy fishes, investigated prey ray-finned we for functions, of system behavioral genomes for well-assembled useful 13 is milli in genes genes opsin opsin of diversity of the whether test To .melops S. .latipes O. 03,adthe and 2003), 09,sgetn htti pce newn h R s-pcfi eoedpiain iia othat to similar duplication, genome fish-specific a 3R, the underwent species this that suggesting 2009), 05 Lin 2015; Rh2 sa ugop(i.6) o example, For 6B). (Fig. outgroup an as ihcmon yspsesdfieunique five possessed eyes compound with , SWS2 .anshuiensis S. .pectinirostris B. .anshuiensis S. oisi admary,ecp zebrafish. except arrays, tandem in copies , tal. et LWS1 Fg A.Orrslspoieeiec that evidence provide results Our 7A). (Fig. .rubripes T. ee nsm ecmrhfihlnae (Cortesi lineages fish percomorph some in genes LWS1 LWS1 .undulatus C. 07 Phillips 2017; Rh2 eehv o enosre napoiaey10ryfindfihgnms(Cortesi genomes fish ray-finned 100 approximately in observed been not have gene SWS2 eehsdslydsnl-eedpiain opouefu ois n retrotrans- a and copies, four produce to duplications single-gene displayed has gene . eei eurdfrtepeaetwvlntsudrsalwwtrcniin,such conditions, water shallow under wavelengths prevalent the for required is gene sacvfih n t ysaecmltl ot(Yang lost completely are eyes its and cavefish, a is hwdta eeepnin aeocre opouetret v admcopies tandem five to three produce to occurred have expansions gene that showed Rh2-2 SWS2 , .undulatus C. , .maculates L. .semilaevis C. tal. et ee in genes tal. et hra te pce hwdete one either showed species other whereas , n he ee eefudol ntreseis The species. three in only found were genes three and sasnl-eedpiainta fe csa pigor o adaptive for springboard a as acts often that duplication single-gene a is tal. et 07.Teeoe ulcto fthe of duplication Therefore, 2017). tal. et .undulatus C. 2018). 06 Rennison 2016; .pectinirostris B. ihbhvosotngie yvsa cues. visual by guided often behaviors with , and , SWS2 05.W nerdthat inferred We 2015). and , .maculates L. .melops S. .pectinirostris B. .niloticus, O. SWS2 eein gene .pectinirostris B. LWS SWS ugsiga nxetdepnino the of expansion unexpected an suggesting , .undulatus C. .anshuiensis S. 9 .bergylta L. and .undulatus C. tal. et SWS2 LWS1 .maculates L. pi ee,adthese and genes, opsin ncnrs,ohrseisdslydn oethan more no displayed species other contrast, In . and tal. et eels (Sharkey loss gene LWS1 sal eyo y eeomn n h visual the and development eye on rely usually n pt orajcn oisin copies adjacent four to up and SWS2 .bergylta L. 02.Hwvr nisc species, insect an However, 2012). tal. et neouinr rgn Fg A,wihwas which 7A), (Fig. origins evolutionary in eedpiain niae nindependent an indicated duplications gene 07.The 2017). napiiu s dpe oterrestrial to adapted fish amphibious an , .undulatus C. tal. et LWS1 n n oywsdvrie oaqiea acquire to diversified was copy one and , a five has ol edvrie rmthe from diversified be could pce otosngnsta eenot were that genes opsin lost Species . a eivdt earsl fte4 that 4R the of result a be to believed was .aculeatus G. 2016). ol iesf fe eeduplication, gene after diversify could , hwddvrec u oa inserted an to due divergence showed .melops S. 05.Itrsigy eosre the observed we Interestingly, 2015). LWS1 xaso curdin occurred expansion Fg A.Taiinly expansion Traditionally, 7A). (Fig. Rh2 tal. et LWS1 SWS2 LWS1 tal. et eecudices h ability the increase could gene a 8crmsms(Huo chromosomes 48 has tal. et LWS ois h otrpre of reported most the copies, the ; and , 07.Adpiaeo the of duplicate A 2017). sasnl nin opsin, ancient single a as ee rone or gene, eehstoduplicated two has gene 2017). LWS2 ulctoswr used were duplications 06.Osngnsin genes Opsin 2016). .undulatus C. .bergylta L. Rh1-2 SWS1 LWS1 tal. et and .undulatus C. eewsonly was gene eewslost was gene .undulatus C. Rh2-1 LWS1 SWS2 SWS2 eo ves- Xenos expansions 04.In 2014). Fg 7B). (Fig. juveniles genes gene gene gene C. et et et , Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. eoesoe ihcneto rnpsn 3.8 fteetr eoe;teeoe edtrie the determined we therefore, genome); entire the of genomic (39.88% gene Razali transposons as opsin (Mat of such interpret content architecture environment, high genomic to genomic a shape difficult the showed can is that genome transposons it reported of results, been activity (Sandkam our has The (Lagman conversion on It 3R gene Based the opsin factor. after affect copies. single can that, architecture, gene a demonstrated on opsin results based to Our expansion rise 4). gave (Table also 48% duplicates should than duplicates more case, no this al. and In low, very duplication. in were genome occurred entire conversion than (Lagman gene rather sequence events flanking in expansion a copies share gene opsin in local increase of sudden rate the whether determined then We the in of 95% post-speciation than during greater the probability in a found with was sites codon selection much positive five was value (? ? selection The purifying (?). substitutions vs. synonymous b to for non-synonymous 1.0 of than rate less the an using plays windows and sliding DNA, of conversions segment P gene the two another in were onto conversion occurred there conversions copied Gene that found be We expansion. bp, to (79 1994). gene DNA Dykhuizen opsin of & the explain segment (Guttman of evolution to a in used causes role that be important could process which any conversion, is gene determined We expansion expansion. opsin as opsin of such of Mechanisms genes, mechanisms 3.7 opsin genetic multiple whether Interestingly, revealing known 1989). copies, (Sawyer not conversion opsin gene is unexpected and regarded it duplication, was gene knowledge, which duplication, our clades, genome To closed different the into duplication. separated In SWS1 gene were after S5). copies diversification two (Fig. and as events duplication duplication one showed showed gene species many and duplication, in found was duplicate anshuiensis the In in 7A). present were gene SWS1 SWS1 the of maculates duplications (Phillips Four sites the tuning genes. in sequence these events opsin known in duplication the color one detect as employ to To such which reduced sensitivity, was visual starfish, to gene and contribute Rh2 boxfish, to the copy hares, contrast, sea In percula Multiple predating environments. in A. 2020). reef Jia adaptation to & visual (Yokoyama similar environment disguises, good this imply in wavelengths genes discriminate Rh2 better (Sadovy slopes, m to reef (Davies 60 species outer to more copies vertebrate steep up along opsin possess of habitats depths by m to offshore organisms 50 reefs inhabiting of observed and commonly capabilities flats, more photoreceptive reef are the fish Labridae expanding in 2007). role important an (Phillips fish any .01,sgetn pi xaso ygn ovrin eas etdgnswt eecnesosby conversions gene with genes tested also We conversion. gene by expansion opsin suggesting 0.0001), = 03,gn ovrin aecnrbtdt h ubro pi ee.Bsds h retrotransposed the Besides, genes. opsin of number the to contributed have conversions gene 2013), SWS2 SWS2 , eedpiain lhuhsneyaaye eeldtoajcn ee noecrmsm (Fig. chromosome one in genes adjacent two revealed analyses synteny although duplication, gene LWS1 P .4) n eecneso of conversion gene one 0.044), = Fg 3.A iegneatrgn duplication, gene after divergence As S3). (Fig. Fg B,sgetn htteeseislkl dpe lentv ehnssisedo gene of instead mechanisms alternative adapted likely species these that suggesting 7B), (Fig. LWS1 eoeshowed genome Fg 6) and S6B), (Fig. c eein gene and , Rh2 LWS1 tal. et eete,gn ulctoswr bevdi w pce Fg 4.Det R the 4R, to Due S4). (Fig. species two in observed were duplications gene tree, gene Rh2 Rh2 vs. a .undulatus C. < .undulatus C. or 2016). ee hntoelvn bv 0m(Lin m 30 above living those than genes .) oucvrpstv eeto ie nteosngn,w sdPM ofind to PAML used we gene, opsin the in sites selection positive uncover To 1.0). .undulatus C. xaddtercp ubr noeseis pi ee a hnecpe by copies change can genes Opsin species. one in numbers copy their expanded , SWS2 Rh2 LWS1 Rh2 SWS2 Rh2 LWS1 ,and b, gene, ee Tbe3.Orrslssgetta pi eecnesosoccurred conversions gene opsin that suggest results Our 3). (Table genes eedpiain,weesfu eedpiain lsaretrotransposed a plus duplications gene four whereas duplications, gene ulctosmycicd iheeeouini hc hyhv played have they which in evolution eye with coincide may duplications , tal. et Fg A i.S) hlgntcaayi a ple oinfer to applied was analysis Phylogenetic S4). Fig. 7A, (Fig. , SWS2 SWS2 Rh2 LWS1 vs. a Rh2 tteeouinr level. evolutionary the at 03,w eemndtefln euneo h oa eewhere gene local the of sequence flank the determined we 2013), LWS1 vs. a , vs. d LWS1 vs. a and , LWS1 Rh2 gene, Rh2 SWS2 Fg 6) niaigta h pi eeunderwent gene opsin the that indicating S6C), (Fig. d Rh2 10 and , 9 bp, (96 b Fg 8A), (Fig. e LWS1 tal. et ee.Teietfiain ftefln sequences flank the of identifications The genes. 34bp, (314 d Rh2 tal. et .pectinirostris B. vs. a 2017). ee,w osrce hlgntctesof trees phylogenetic constructed we genes, P 03.I srpre htmrn s below fish marine that reported is It 2003). .06 and 0.0006) = .undulatus C. .undulatus C. tal. et SWS2 LWS1 P .1) and 0.016), = tal. et 07.GensniieR2helps Rh2 Green-sensitive 2017). vs. a 7 bp, (70 d Rh2 .undulatus C. and 09,adthe and 2019), SWS2 stersl fa increased an of result the is n n ulcto in duplication one and , ee(i.8) u osite no but 8A), (Fig. gene Rh2 .niloticus O. P vs. d SWS2 Fg S6A), (Fig. d .undulatus C. .4) w gene Two 0.047). = .semilaevis C. eoe the genome, Rh2 vs. b i o show not did .undulatus C. tal. et 17bp, (197 e Rh2 SWS2 showed 2016). SWS2 tal. et gene Rh2 and L. S. et c Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. civn 134M eoewt otgN0lnt f1. bbsdo a ed f9. b 77- a Gb, 124-fold 90.7 a of represented reads that raw reads on raw based sequencing, the Mb read technology, 16.5 sequencing long of Hi-C Nanopore first length Using we via N50 study, genome. contig generated reference this the a wrasse high-continuity In of with humphead and coverage genome limited. high-quality role the fold Mb remain important of of 1173.4 species lack an assembly a this plays the genome achieving of that to draft efforts species Due a conservation endangered present ecology. and an understanding reef is the of wrasse genomes, stability humphead the the maintaining and in fish, marine and are behavior, Wrasses development, plasticity, visual in Conclusions role 4 important an plays primary pattern a (Sandkam reproduction. expression is partners gene aggregations, mating opsin spawning between the during discriminating and behavior (Todd detecting mating (Sadovy change for age unique of sex a years of 8-9 as around trigger such male stimuli, to (Stieb environmental female feeders from and sex herbivorous the changing with hermaphrodite, in damsels protogynous into cues growing in visual juveniles strategy (Sadovy (Mackin for of environments feeding habitats hormone genes adaption photic opsin different thyroid alternative different of in to expression provide signal alternative adapt live to The endocrine to useful used behavior. the and be is environment, by may expression the triggered organisms, opsin of are of stages developmental which mechanism This of copies 2019). visual the even variable (Phillips of of genes, ecologies adaption expression in opsin behavioral genes the and opsin that diversity of phenotypic found expression diverse drive shown to have species sensitivities labrid that reported been pcrlyrsd naha-oti atr (Mackin of pattern sensitivity head-to-tail the a in reside spectrally ti neetn ont that note to (Porath-Krause interesting functions novel is of It acquisition and duplication, gene after changes 9). role (Tian retinol profound SWS1 rhodopsin (Fig. encoding a of expression gene plays level divergent the which the showed contrast, all-trans-retinal, at In to generation all-trans-retinol efficiency. chromophore converts high chromophorein 18.63) light-sensitive with (FPKM: a retinol 10 produce corresponding dehydrogenase to to which retinaldehyde retinal 15.34), 11-cis (FPKM: 9-cis of converting 5 biosynthesis dehydrogenase retinal the Werner encoding in (Skorczyk- genes step include final opsin retina the of the catalyzes analysis in expression expressed (FPKM highly performed expression Genes we Carleton higher genes, & of showed expanded (Hofmann tissue genes the evolution retina and the in of adaption changes RNA-seq ecological functional via fish determine facilitating To in role 2009). key a plays expression genes Opsin expansion. opsin opsin of to important (Liu expression play ascribed transposons Divergent evolution Transposons of be 3.8 in strand. may means positive average transposons behavior the The that adaptive in believe strand. 111 to to positive and plasticity reasonable the strand negative in genome the 116 in the in and roles 107 and strand was negative 8B), window the gene in (Fig. per 113 strand is ( negative window difference the per significant along a 130) with down 8B), and 131 ( them (up the between the windows different in occurring significantly adjacent transposons mainly was expansion of the number gene number in opsin the to that that owing found than 3, We chromosome along chromosome. window this kb in 100 a of content transposon Rh2 , SWS2 idw 8)wssgicnl oe hnta nteajcn idw u 0 n on11 (Fig. 111) down and 102 (up windows adjacent the in that than lower significantly was (82) windows c, SWS2 Rh1 tal. et ,and a, obu aeegh a etrdby restored was wavelengths blue to 05.Gnsecdn eia eyrgns FK:2.8 r aal of capable are 26.98) (FPKM: 9 dehydrogenase retinal encoding Genes 2015). Rh2 tal. et LWS1 eeepesdi te ra nta ftertn,sgetn functional suggesting retina, the of instead areas other in expressed were b LWS1/SWS2 P tal. et .undulatus C. 09.Osngn xrsindvrec sblee ob responsible be to believed is divergence expression gene Opsin 2019). .2fru n .0 o on.Teaeaemaso transposons of means average The down). for 0.002 and up for 0.02 = a o xrse naytsu.I ayfishes, many In tissue. any in expressed not was SWS2 03,adosnepeso aito sas ihycreae to correlated highly also is variation expression opsin and 2003), > 5,acutn o 73 ftettlrtn-xrse genes. retina-expressed total the of 17.3% for accounting 15), a, oa f4,2,5 la ed eeotie,ad3,849 and obtained, were reads clean 44,028,650 of total A . SWS2 eei tcatceet n xiisasic between switch a exhibits and event, stochastic a is gene P tal. et .4.Frthe For 0.04). = 11 ,and d, 09,wiei beetles, in while 2019), tal. et LWS1 tal. et Rh1 03.Tecutraayi fosngenes opsin of analysis cluster The 2013). tal. et xr ois(Sharkey copies extra Rh2 07.Furthermore, 2017). r xrse ntertn,whereas retina, the in expressed are b tal. et LWS1 ee h ubro rnpsn in transposons of number the gene, 00 Robert 2020; 07.Orrslssoe that showed results Our 2017). - SWS2 tal. et SWS2 .undulatus C. tal. et idw(1)wslower was (111) window 06.Arcn study recent A 2016). a otrcnl,and recently, lost was 2016). tal. et tal. et LWS1 .undulatus C. dls which adults, 07.I has It 2017). tal. et 08.I is It 2008). and 2003), SWS2 tal. et sa is Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. esbrhT pe 20)Gsa:fidsaitclyoerpeetdGn nooiswti ru of group a within Ontologies Gene overrepresented statistically find GOstat: (2004) genes. T Speed T, Beissbarth hnS hn ,Sa ,Q 21)Woegnm euneo afihpoie nihsit Wsex ZW into insights provides flatfish a of lifestyle. sequence benthic Whole-genome phylogenetic a (2014) in to H adaptation use QF and their C, evolution Shao for chromosome G, alignments Zhang multiple S, Chen from blocks conserved of Selection analysis. (2000) J Castresana interactions. chromatin on based RP Patwardhan blies A, Adey JN, Burton YI fish. Li sequences. CE, DNA Wagner analyze D, Brawand to program a finder: 573-580. repeats eukaryotic, Tandem in (1999) G elements Benson repetitive of database a Update, Repbase (2015) Labroidei) O (Perciformes, Kohany fish labrid K, Resource of genomes. Kojima species Protein and W, Universal genus Bao Bolca,Italy. new The Monte a (2010) of bloti, Eocene J Coris the (1990) Zhang from L V, Sorbini AF, Amendolia Bannikov S, Altairac L, 2009. (UniProt) Bougueleret A, Bairoch rubripes. Fugu E of Stupka J, Chapman S, Aparicio RC Harrington boundary. & BC, Cretaceous–Palaeogene Faircloth (No.2018YFD0900802 jaws fishes? ME, pharyngeal labrid Alfaro in China in innovation evolutionary diversification Does lineage of (2009) rapid PC Wainwright to Program BL, lead Banbury CD, R&D Brock ME, Key Alfaro Fisheries. National of Project the Disciplines References First-class by Universities Shanghai supported the and was 2018YFD0900905) work This gene behavior, understand in chromosome-level further The opsin Acknowledgements to resources of species. fishes. valuable expression in this provide adulthood, uneven evolution will into divergent of and wrasse transition the reproduction fluidity, The humphead foraging, by and of efficient contributed reversal, assembly in region. was sexual genome divergence genome function which behavior, for special mate-searching conversion, plasticity copies a specific gene visual opsin in to indicated increased retina owing elements The the wrasse, transposable humphead duplication. of have We the gene distribution clues. for after visual increased adaptation specific in opsins, alternative with were species of to highlighting this correlates ability number of possibly genome the in morphology increased genomics, larger increase functional which of the sudden a jawbone, investigate 22,180 united a 39.88% that to the with found for reveal us of annotated a accounted allows feature was fishes elements with specialized food, genome other Transposable A Mb manipulate The with elements. 1173.2 genes. Mb. transposable Comparisons of 51.5 BUSCO of of content size complete length genome. genome the N50 entire a of scaffold the 97% produced a and and and Mb genes chromosomes 3.7 functional 24 of into length anchored N50 contig genome the of coverage Nature Bioinformatics o ilEvol Biol Mol oieDna Mobile 513 375-381. , uli cd research acids Nucleic science 20 6 17 11. , 297 1464-1465. , 540-552. , 1301-1310. , tal. et , tal. et , a clEvol Ecol Nat tal. et , 21)Tegnmcsbtaefraatv aito nArcncichlid African in radiation adaptive for substrate genomic The (2014) aueBiotechnology Nature 37 td ieceSiGaiet ezaiD Bolca Di Terziari Giacimenti Sui Ricerche E Studi 20)Woegnm htu sebyadaayi ftegenome the of analysis and assembly shotgun Whole-genome (2002) D169-D174. , tal. et , 21)Crmsm-cl cffligo env eoeassem- genome novo de of scaffolding Chromosome-scale (2013) SWS2 2(4):688-696 , 21)Epoiedvricto fmrn se tthe at fishes marine of diversification Explosive (2018) 12 M vltoayBiology Evolutionary BMC , LWS ,and 1, aueGenetics Nature 31 1119-1125. , Rh2, ntne atr ycomparative by pattern tandem in 46 9 253-260. , uli cd research acids Nucleic 1,255. (1), 133-148. , 27 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. uJ a ,SnZ i 22)NxPls:afs n ffiin eoeplsigto o ogra assembly. long-read for tool polishing genome efficient and fast a NextPolish: (2020) S Liu Bioinformatics Z, Sun J, Fan J, Hu genome. human the CF role Torroja to important MD, an Clark fish. play K, in expression Howe pigments gene visual differential and of duplication diversification organ- Gene among the (2009) times in KL divergence Carleton of CM, knowledge-base Hofmann public a TimeTree: (2006) S Kumar isms. J, transposable Dudley inverted-repeat S, miniature Hedges discovering for using program sequences. annotation a genomic structure from MITE-Hunter: gene elements (2010) eukaryotic SR Automated Wessler Alignments. (2008) Y, Spliced M Han Assemble not to . recombination, . Program Pertea. of the W, result and Zhu EVidenceModeler a SL, as Salzberg coli BJ, Escherichia Haas in divergence Clonal (1994) mutation. DE Dykhuizen D, Guttman genomes. rse(hiiu nuau) p -2,Rpr oNtoa aieFseisSrie ffieo Protected M of Marshall Office S, Service, Moxon Fisheries humphead S, Marine report: Griffiths-Jones National review to Status Report (2015) 1-123, MS Trianni Resources. RE, pp. Schroeder undulatus), EE, (Cheilinus DeMartini wrasse CH, sequencing. Boggs short-read KS, from Graham detection variant 2014. Haplotype-based Ensembl (2012) (2014) G Biology K Billis Marth K, E, Beal Garrison D, Barrell evolution. M, protein Amode in P, selection Flicek and constraint, Genet functional Hum divergence, (Labri- Genomics Sequence 1835 Rev (2003) Ruppell, CI undulatus Wu Cheilinus gene JC, Fay world: of the study of the fishes for Threatened tool (2001) dae). computational Y a Sadovy CAFE: TJ, (2006) Donaldson MW Hahn JP, Demuth evolution. N, family Cristianini T, lamprey. Bie anadromous De an in expression gene LS pigment Carvalho visual JA, Cowing WL, Davies a)adters ftohcnvlyo oa reefs. (Labri- coral lineages wrasse on novelty of trophic origins evolutionary of the rise Dating the (2009) and LV Herwerden dae) DR, evolution Bellwood gene opsin PF, dynamic Cowman highly and duplications Ancestral fishes. (2015) N percomorph Hart in S, Stieb Z, Musilova F, Cortesi rainbow of natural retina during WL opsin vertebrates. the Davies sensitive MA, in light Knight (SWS2) SP, photoreceptors blue Collin cone to (SWS1) of UV organization from switch Chromatic development. irreversibly cones (2007) preprocessor. single FASTQ I all-in-one trout: Flamarique ultra-fast an C, fastp: Cheng (2018) G Jia Y, Chen 34 Y, Zhou S, Chen i884-i890. , niomna ilg fFishes of Biology Environmental Bioinformatics 1-9. , uli cd research acids Nucleic science urBiol Curr x Biol Exp J 36 Bioinformatics 2253-2255. , 266 22 PNAS Nature 1380-1383. , 13 2971-2972. , 210 R864-865. , 4 112 213-235. , 496 4123-4135. , tal. et , 22 33 1493-1498. , uli cd research acids Nucleic tal. et , 498-503. , 1269-1271. , 121-124. , tal. et , 62 21)Tezbas eeec eoesqec n t relationship its and sequence genome reference zebrafish The (2013) tal. et , 20)Acetclu iin utpeosngnsi h ancestral the in genes opsin multiple vision: colour Ancient (2003) 428. , 20)Fntoa hrceiain uig n euainof regulation and tuning, characterization, Functional (2007) nerCm Biol Comp Integr 20)Ra:anttn o-oigRA ncomplete in RNAs non-coding annotating Rfam: (2005) oeua hlgntc n evolution and phylogenetics Molecular 13 38 AE J FASEB e199. , 49 630-643. , eoeBiology Genome uli cd research acids Nucleic 21 2713-2724. , 9 7. , 52 Bioinformatics 42 621-631. , Quantitative 749-755. , Annu Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. ua ,SehrG iM na ,Tmr 21)MG :MlclrEouinr eeisAnalysis Genetics Evolutionary requirements. Molecular memory X: MEGA low (2018) K with Tamura C, aligner Platforms. Knyaz Computing spliced M, across Li fast G, Stecher a S, HISAT: Kumar (2015) SL in Salzberg improvements Methods B, Nature 7: Langmead version D, software alignment Kim sequence multiple MAFFT usability. and (2013) performance DM Standley K, and Katoh genes of encyclopaedia kyoto RE KEGG: genes. Andreas (2000) H, K. Peter L, S. Karin S,, A, Nakaya S, Kawashima genomes. S, Goto M, Kanehisa JL Erickson prediction. W, Michael com- K, the Jens toward GH paths Chen B, tools: Zhang enrichment R, lists. Bioinformatics Huo gene (2009) large RA of Lempicki analysis BT, functional prehensive Sherman W, da Huang oeT,Ed R(97 RAcnS:APormfrIpoe eeto fTase N ee in Genes RNA Transfer of Detection Improved for Program A data tRNAscan-SE: RNA-seq Sequence. (1997) for Genomic dispersion SR and Eddy change TM, fold Lowe of estimation Moderated (2014) Cheilinus S DESeq2. on Anders with Based W, Analysis Feature Huber Polymorphic MI, SNP Love and SSR (2019) Y Wang Y, Transcriptome. He M, undulatus Yang J, Liu H, Liu and genomes nasus. Labridae. fish Coilia the ray-finned of W of Tang 59 populations systematics J, in in Yang genes Advances D, opsin (2019) Liu W of Tang falls X, and Huang rises D, The Liu (2017) adaptation. genomes. T environmental eukaryotic Wang for W, for implications Li groups their F, ortholog Wang of J, identification Lin OrthoMCL: (2003) DS Roos research transform. Genome CJ, Burrows-Wheeler Stoeckert with L, alignment Li chromosome-scale long-read A accurate and Genes: Fast percula. (2010) Nemo’s matics Amphiprion R Finding clownfish Durbin (2018) orange H, the CT Li of Michell genome C, the of Schunter assembly 2. DJ, reference Bowtie Lightfoot with alignment R, gapped-read Lehmann Fast duplications. (2012) genome vertebrate SL early Salzberg of their B, of rounds Langmead visual two duplication of by the repertoire established ancestral in was vertebrate region receptors The genomic oxytocin/vasopressin (2014) shared and G Sundstro”m subunits X, alpha Abalo shared transducin J, opsins, Widmark their D, duplications. of Daza D, genome duplication Lagman vertebrate by trans- early opsins, of established visual rounds was of two repertoire 238. receptors the ancestral oxytocin/vasopressin in vertebrate The region and (2013) genomic X subunits Abalo alpha J, Widmark ducin D, Daza D, Lagman uli cd research acids Nucleic 26 uli cd research acids Nucleic 589-595. , uli cd research acids Nucleic eoeBiol Genome 12 13 357-360. , uli cd research acids Nucleic 2178-2189. , tal. et , eoisadApidBiology Applied and Genomics tal. et , o ilEvol Biol Mol 15 o DNA Mob 35 o ilEvol Biol Mol tal. et , 550. , 22)SN ertasoo aito rvsEoyi iprt nnatural in disparity Ecotypic drives variation Retrotransposon SINE (2020) 28 tal. et , 3100-3108. , 20)Tekroyeo hiiu undulates. Cheilinus of karyotype The (2009) e89. , 27-30. , 20)RAmr ossetadrpdantto frbsmlRNA ribosomal of annotation rapid and consistent RNAmmer: (2007) 11 21)Uigito oiincnevto o oooy ae gene based homology- for conservation position intron Using (2016) 30 25 4. , 35 772-780. , 955-964. , 1547-1549. , cetfi reports Scientific uli cd Res Acids Nucleic 14 3 1-12. , 7 37 15568. , 1-13. , aieFisheries Marine M vltoaybiology evolutionary BMC aueMethods Nature o clResour Ecol Mol aieSciences Marine M vlBiol Evol BMC 41 9 357-359. , 107-117. , 33 00 Bioinfor- 13 94-97. , 1-16. , 238. , 13 , Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. n eoto oe D5gn uainaetn h nain yoie(p.Tyr175Phe). tyrosine invariant the affecting mutation gene RDH5 317-327. novel , a of report genome and M assessing Michalczuk P, BUSCO: Pawlowski (2015) orthologs. A, Skorczyk-Werner single-copy EM with Zdobnov completeness EV, opsin Kriventseva annotation through I, and sensitivity Panagiotis assembly blue RM, of Waterhouse FA, loss beetles. Simao the group, Overcoming animal (2017) largest S the Shin in NP, P.Lord duplication MS, Fujimoto CR, Sharkey maculatus. N Wang conversion. C, Li gene C, detecting Shao for tests Statistical (1989) S Sawyer TK research Attwood acids Evolution A, Vision Rolf Color Selection. H, Impacts Sexual Sarah Environment Based Genomic Visually (2017) with F Family Breden a CT, in Watson JB, Joy fish. BA, reef Sandkam coral giant known poorly P and Labrosse threatened M, Kulbicki Y, trade. Sadovy food-fish live Kong Hong hits 4 Ciguatera (1998) Y Polymor- Sadovy Sequence undulatus. DNA Cheilinus 6: (2004) B DnaSP Russell (2017) Datasets. S Large Guirao-Rico of JC, Analysis Sanchez-DelBarrio phism A, double- Ferrer-Mata repair end-joining- J, germline. and Rozas elegans conversion Gene Caenorhabditis (2008) the fish. JL in ray-finned Bessereau breaks EM, in strand Jorgensen divergence MW, and Davis VJ, duplication Robert gene wrasse, Opsin humphead (2012) the of JS Evol Taylor sequence Phylogenet GL, genome Owens mitochondrial DJ, Complete Rennison (2013) R Huo Argopecten J, scallop, undulatus. Luo Cheilinus SW, the Yin in XZ, duplication Qi expression gene differential after and change differences Structural functional (Pectinidae). (2016) reveal irradians B opsins Birla rhabdomeric D, Fagglonato among A, Palrett A, Sensitivity Porath-Krause Visual to Contribute Mechanisms Labridae. Genetic the Multiple (2016) in NJ Variation Marshall KL, Carleton GA, Phillips reads. RNA-seq CM from Antonescu scriptome GM, Pertea M, Pertea Plasticity, Genome melops). OK in (Symphodus Torresen Role S, Adaptive Jentoft M, Elements Mattingsdal Transposable Phytopathogens. (2019) Fungal K in Nadarajah Evolution and B, Pathogenicity Cheah N, Razali Mat C 116 Gutierrez RA, Frey RD, Mackin 51-53. , 16882-16891. , GigaScience 37 62 211-215. , 986-1008. , eeis&MlclrRsac Gmr Research Molecular & Genetics Genomics tal. et , M vltoaybiology evolutionary BMC 7 . o ilEvol Biol Mol 21)Crmsm-ee eoeasml ftesotdsabs,Lateolabrax bass, sea spotted the of assembly genome Chromosome-level (2018) tal. et , a Biotechnol Nat 110 tal. et , tal. et , o.Bo.Evol. Biol. Mol. 399-403. , h UNRdLs fTraee pce 2004 Species Threatened of List Red IUCN The 20)Itrr:teitgaiepoensgauedatabase. signature protein integrative the InterPro: (2009) tal. et , tal. et , 21)Edcierglto fmlihoai oo vision. color multichromatic of regulation Endocrine (2019) 33 20)Tehmha rse hiiu nuau:Snpi fa of Synopsis undulatus: Cheilinus wrasse, humphead The (2003) 201-215. , 21)Acniuu eoeasml ftecrwn wrasse corkwing the of assembly genome continuous A (2018) 21)SrnTeealsipoe eosrcino tran- a of reconstruction improved enables StringTie (2015) 33 tal. et , cetfi reports Scientific 16 290-295. , eoeBo Evol Biol Genome eiw nFs ilg Fisheries & Biology Fish in Reviews 15 34 250. , Genetics 21)Fnu liucau:rve fteliterature the of review albipunctatus: Fundus (2015) 3299-3302. , n o Sci Mol J Int 12 1095-1105. , o ilEvol Biol Mol 180 P ieRe ihIfrainBulletin Information Fish Reef Live SPC 7 Bioinformatics 8. , 673-679. , 9 20 3100-3107. , 3597. , 6 526-538. , 31 3210-3212. , e.T4592A11023949. , 13 plGenet Appl J 327-364. , Nucleic PNAS Mol 56 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. ooaaS i 22)Oii n dpaino re-estv R2 imnsi vertebrates. in pigments (RH2) green-sensitive of adaptation and Origin Bio (2020) Open likelihood. H maximum Jia cave by S, into Yokoyama analysis insights phylogenetic provides for package genome program cavefish a Biosci Sinocyclocheilus PAML: The (1997) Z (2016) Yang D Fang J, fas- Bai Oplegnathus adaptation. X, knifejaw Chen barred J, Yang the Oplegnathi- fishes. family of Sillaginidae the for sequence S genome in Genome Zhu reference genome S, (2019) Xiao draft S, J chromosome-level Xu Li first J, the Liu 1844): D, Schlegel, dae. Ma & (Temminck Z, ciatus Xiao Y, Xiao NFAT. of S repressor Batalov a A, finds Orth A, in Willingham microsatellites of analysis BL and identification Gudenas medulloblastoma. the GW, for Robinson tool SM, novel Waszak A GMATo: (2013) Z genomes. and Luo large Phylogenetic P, A Lu X, Pharyngognathy: Wang of Beyond. Evolution and The Fishes Labroid (2012) in Innovation K Key 61 Jaw Tang Pharyngeal the S, of Price Appraisal Functional W, Smith P, Wainwright reads. M short Nattestad from FJ, Sedlazeck GW, Vurture Cufflinks. and TopHat with L experiments Goff A, Roberts C, Trapnell insights change. provide variation sex H controlled its Liu socially and orchestrate O, genome bass Ortega-Recalde sea EV, European Todd speciation. (2014) and B euryhalinity Louro to P, adaptation Gagnaire into H, Kuhl M, Tine maturation. M rhodopsin Sun human promote within T, variation Li size Y, genome Tian estimating findGSE: frequencies. (2018) k-mer K using Schneeberger Arabidopsis M, and Piednoe J, Ding genes. H, opsin Sun in variation expression L and Sueess structural generalized F, (Pomacentridae): a with sources. Cortesi eukaryotes external SM, in from Stieb prediction hints Gene uses (2006) that S Waack model Markov B, hidden Morgenstern O, Schoffmann M, of Stanke thousands with analyses models. phylogenetic mixed likelihood-based and maximum taxa RAxML-VI-HPC: in (2006) Elements A Genes Phototransduction Transposable Stamatakis Cone-Specific of of Inactivation Cetaceans. Diversity (2016) Monochromatic and R Rod Patel Evolution N, in Fugate (2017) CA, Emerling D MS, Ray Springer A, Suh RN, Genomes. II Vertebrate Platt C, Sotero-Caio 1001-1027. , GigaScience 13 10 555-556. , M Biology BMC 873-882. , Bioinformation 8 Bioinformatics Nature 1-8. , tal. et , tal. et , eoeBo.Evol Biol. Genome Bioinformatics 580 14 science 21)Adatgnm sebyo h hns ilg Slaosnc) h first the sinica), (Sillago sillago Chinese the of assembly genome draft A (2018) . 21)Nuei euae iulfnto i eitn eiodtasotto transport retinoid mediating via function visual regulates Neurexin (2013) tal. et , 396-401. , tal. et , 9 Neuron 33 541-544. , 309 rn.Eo.Evol Ecol. Front. tal. et , 2202-2204. , tal. et , 21)Dffrnilgn n rncitepeso nlsso RNA-seq of analysis expression transcript and gene Differential (2012) 21)WyU iinadrdvso r motn o damselfish for important are vision red and vision UV Why (2017) 1570-1573. , 22 GigaScience 77 a Protoc Nat 91 tal. et , tal. et , Bioinformatics c Adv Sci 20)Asrtg o rbn h ucino ocdn RNAs noncoding of function the probing for strategy A (2005) 2688-2690. , 311-322. , 161-177. , 21)Srs,nvlsxgns n pgntcreprogramming epigenetic and genes, sex novel Stress, (2019) a Commun Nat 21)GnmSoe atrfrnefe eoeprofiling genome reference-free fast GenomeScope: (2017) 22)Grln lnao uain nSncHedgehog Sonic in mutations Elongator Germline (2020) 5 16 4 7 7 eaaw7006. , 61. , 562-578. , . 34 M Bioinformatics BMC 550-557. , 5 5770. , o Ecol Mol 26 1323-1342. , 7 62. , ytmtcBiology Systematic optAppl Comput FEBS Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. oaino ahsneyi umrzdi al 1 h lso h eoeanttoscudb sdin used be could annotations genome the of files The S1. Table in summarized S2. is Table synteny single- each 619 of on location based tree picture. species as from same milii genomes the in is estimated orientation right the and orthologs synteny the to copy gene on correspond Opsin number node each circled 7. near split Figure The numbers the Numbers species. The since (red) these 100%. ancestor. contracted of of common and values time recent (green) support most divergence expanded had the been nodes have from All that species families of times. gene calibration showed of fossil species indicated each families under nodes internal gene the in and blot time tree size. phylogenetic divergence genome construct noted tree, chart pie Phylogenetic the (A) of area 6. The Figure genome. a in presented transposons of iseadmsl sda akrud u uce e eia i ie;o:ofcoyogn r ri;sp: brain; br: organ; olfactory or: liver; (A) li: retina; 5 re: Figure muscle; mu: genes. gonad. background. expressed gl: as tissue-specific spleen; used and muscle shared, and muscle tissue genes expressed the of 4 Figure species. six other for and distribution Mb. databases. length in public and the axis in annotation left Gene the on 3. showed Figure was of size density chromosomes gene and and windows, size kb Chromosome 2. Figure of picture A 1. edited, read, Figure authors All analysis. Legends and gene Figure sequencing, functional quality. and manuscript. Nanopore assembly annotation final the performed the genome assessed and approved the and and study analyses out this bioinformatic carried the assembled of Zhang and performed Xuguang otolith, projects Zhang on and the annuli Ming sequencing, determined sequencing; designed fish, Illunima transcript the Tang performed of Wenqiao picture samples, a genome; the took the Guo collected Hongyi Wang size; Xinyang genome the manuscript; estimated the wrote Liu Dong SRA7239481. The number accession genome the and PRJNA622923. under Contributions sequences Archive number Read raw Author Sequence The accession NCBI the SAMN14532944. NCBI in number deposited the accession were NCBI files assigned annotation the was assigned were sequencing data genomic BioSample from BioProject Reveal The Epithelium Olfactory of nasus). Transcriptomes Accessibility (Coilia Novo Data Anchovy De Grenadier (2014) Japanese in J Migration Yang e103832. Spawning D, , for Pathways Liu and W, methods Genes Tang signature-recognition the L, the Wang for G, platform Zhu integration an of – adaptation InterProScan terrestrial InterProScan. (2001) the in A into Rolf insights provide EM, genomes Zdobnov Mudskipper (2014) X Xu fishes. Q, amphibious Zan C, Bian X, You iegnedsrbto fkontasoosin transposons known of distribution Divergence eefml oprsnamong comparison family Gene sda outgroup. as used . . xrs fgnsbsdo h rncitm of transcriptome the on based genes of Express iegneadpretg ftasoosi h eoeof genome the in transposons of percentage and Divergence . Bioinformatics (A) aueCommunications Nature ytn fSS n W1pi ee n1 erettv pce and species reprsentative 13 in genes LWS1opsin and SWS2 of synteny (B) (B) hiiu underlatus Cheilinus .(B) oprsno ee xn n nrnlnt itiuin between distributions length intron and exon, gene, of Comparison ytn fR2osngn nseistesm sta ftelf.Tegenomic The left. the of that as same the species in gene opsin Rh2 of synteny 17 h hlgntcrltosi of relationship phylogenetic The 847-848. , .undulatus C. 5 5594. , sdfrtegnm sequencing genome the for used n te se,adsnl-oyotooswr sdto used were orthologs single-copy and fishes, other and .underlatus C. 17 hiiu underlatus Cheilinus hiiu undulatus Cheilinus .underlatus C. . (B) hiiu underlatus Cheilinus i hrscmaigtepercentage the comparing charts Pie (B) ieetepeso ee in genes of express Different ee eepotdi 100 in plotted were Genes . ihohrfihs h red The fishes. other with hiiu underlatus Cheilinus Cheilinus . (A) eeannotated Gene . Callorhinchus (A) underlatus .undulatus C. lSone PloS Numbers 9 . . Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. al 4 Table 3 Table 2 Table of 1 Table transcriptome 10. on by divided based been express X have Legends opsin axis Tables genes. of Y opsin analyses in of Cluster numbers locations positive transposon 9. significantly depicted and Figure depicted lines window, lines dotted kb copies dotted vertical 100 Rh2 vertical The a and between depicted 3. 1, P distribution axis substitutions Chromosome of with in genes size synonymous opsin strands step of and Rh2 positive/negative a of rate and transposons 30 sites Pairwise of opsin, selection window Rh2 a window. with of sliding calculated conversions by analyses Gene conversion 8. Figure dniyo euneflnigtelclgn ngn conversion gene in gene local the flanking sequence genes of opsin Identity of sites selection wrasse positive humphead of the Number of genome in annotations element Repetitive assessed genome assembled the of Summary < 0.05. (B) h ubr ntasoosadgnsdsrbto along distribution genes and transposons in numbers The 18 hiiu underlatus Cheilinus . (A) Gene . Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. 19 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. 20 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. 21 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. 22 Posted on Authorea 11 Sep 2020 — The copyright holder is the author/funder. All rights reserved. No reuse without permission. — https://doi.org/10.22541/au.159986378.81705478 — This a preprint and has not been peer reviewed. Data may be preliminary. al dniiaino ln sequence.docx cheilinus-undulates-insight-into-unexpected-expansion-of-opsin-genes-in-fishes flank of articles/479883-chromosome-level-genome-assembly-of-the-endangered-humphead-wrasse- identification 4 Table file Hosted sites.docx selection wrasse-cheilinus-undulates-insight-into-unexpected-expansion-of-opsin-genes-in-fishes positive of 357254/articles/479883-chromosome-level-genome-assembly-of-the-endangered-humphead- Number 3 Table file Hosted insight-into-unexpected-expansion-of-opsin-genes-in-fishes repeats.docx of chromosome-level-genome-assembly-of-the-endangered-humphead-wrasse-cheilinus-undulates- Type 2 Table file Hosted assessed.docx fishes genome assembled humphead-wrasse-cheilinus-undulates-insight-into-unexpected-expansion-of-opsin-genes-in- the of com/users/357254/articles/479883-chromosome-level-genome-assembly-of-the-endangered- Summary 1 Table file Hosted vial at available https://authorea.com/users/357254/articles/479883- 23 vial at available vial at available https://authorea.com/users/357254/ vial at available https://authorea.com/users/ https://authorea.