Austral de Austral D. Joseph Th 10.1111/1755-0998.12588doi: differences to lead between the of thisversion citethisarticle and asVersion Record.Please copyediting, paginationbeen throughthe andproofreadingtypesetting, process,may which This article acceptedundergonehas been fullpeer forpublicationreview buthasnot and adaptation, bony fish, Keywords: trade,aquarium E-mail: Phone: +61 478 514097 University, PO Box WA U1987,Perth, 6845,Australia * King AbdullahUniversity Science of and Technology,Thuwal Arabia 23955,Saudi WA6845,Perth, Arabia University, KingAbdullah ofScienceandTechnology,Thuwal23955,Saudi Engineering 1 Aranda Running title: austriacus blacktailof aniconicthe RedSeareef Draft genomefish, theCover' submission 'From Molecular Ecology Resources Article :From Cover the DateAccepted 19-Jul-2016 : Revised Date:05-Jul-2016 DateReceived 22-Nov-2015 : http://olabout.wiley.com/WileyCDA/Section/id-828039.html#terms article maybeusedfornon-commercialpurposesinaccordancewithWileyTermsandConditionsSelf-Archivingat its characteristics.MolecularEcologyResources,whichhasbeenpublishedinfinalformathttp://doi.org/10.1111/1755-0998.12588.This and Berumen,M.2016.DraftgenomeofaniconicRedSeareeffish,theblacktailbutterflyfish(Chaetodonaustriacus):Currentstatus This isthepeerreviewedversionoffollowingarticle:DiBattista,J.andWang,X.Saenz-Agudelo,P.Piatek,M.Aranda, Red Sea Correspondance: JosephCorrespondance: DiBattista, Department ofEnvironment Curtin andAgriculture,

is article isprotectedbyis article copyright. All rightsreserved. Accepted Article , 1 [email protected] 2 , Department Curtin of andAgriculture, Environment University, PO Box U1987, and Research Center, D

DiBattista ): c Chile, Valdivia 5090000,Chile

Michael L Michael Berumen urrent statusand urrent Draft genome of the blacktail genomeDraft ofthe blacktail Australia 1 ,2 * , WangXin ,

3

Instituto Ambientales yEvolutivas,Universidad deCiencias

ivision ivision Scienceand l andEnvironmental of Biologica

its characteristics its 1 1 , Pablo Saenz-Agudelo , 4 Computational Bioscience Research Center, endemism, endemism, 1,3

, Marek J. Piatek, J. Marek butterflyfish ( genomics, Indo-Pacific 4 , Manuel

This article isThis article by protected copyright. All rightsreserved. approaches sequencing particularly Latimeria radiations Howe obtained Osteichthyes), benefit Genetic Introduction Pacific congen related Red unique Sea environment potential to serve The quality and completenessofthe 28,926 shelter austriacus assembled generalquestto study B Abstract

Accepted Article areamongutterflyfish most the of iconic There are at least areatleast There

et . Using . high for

research

al. for ;

) (fami

and annotated the genome sequence ofand annotated genomesequence the multiple Amemiya

, a

(Chen (NGS) - r

2013 well quality eef n Arabian n Arabian region endemic species that is reliant oncoralreefs forand food

available bonyavailable fish er which

ly

ic fish

- ) is

as studied

,

C

2015;

species ofbutterflyfish

technologies aquaculture

rapidly fields ichlidae; protein es a resource for studies a resourceforstudies

5 et represent ions evolution of biogeography, , ,

000

are al.

Xiao

model of

2014 moving still - true reef in true fish species coding genes

research , Loh

as well as as as well et

more (common lacking

). ( systems al. f

et or (superclass Osteichthyes) Genome

from

2015)

al.

draft draft review . than

Genome

, 2008

despite

a comparison gene of sequences closely between simple related

c .

were predicted fromwere predicted genome 50% on

distributed - arp,

scal see )

, the

or coral

Cyprinus of

e sequences

Kosuri significant to to

co resources evolutionary

all the the

of increasingly

medical tropical many seaswith traits diverse and -

reef model fishesandrepresent a system evolution of

known C. austriacus , more broadly blacktail

and po &

carpio

Church from

advances for research

genome vertebrate pulation geneticspulation

novelties 13 non

b ; complex the

utterflyfish Xu reef fishreef ,

967 2014 -

suggests that it model b across tropical the Indo

s

(z et ony in

reference, a as a total of assembled scaffolds.

ebrafish,

) al. species, next

(

and

systems, fish c

adaptations adaptations to the 2014 oela organisms

-

( bioinf generation es Chaetodon canth, .

) ( were Danio We , superclass

adaptive

with ormatic has the

, genus first

rerio great

- ;

This article isThis article by protected copyright. All rightsreserved. also environmental minimal This one for genome of diversity specializ family analyses indicate thatthebutterflyfish from family diverged nearest theangelfish relative, its the next closest being 61% ofnutrition, source are 2008) 2002; Littlewood feeding behavio 2003; Lawton groupspeciose 130species) of(> reef fishbythe targeted ornamental often trade mechanismsthe adaptations the ofsome underlying coral thesefishto reefs. of al. et shark, (b ( rubripes far adaptations, waterbut thus a cold pufferfish northwest from the Pacific Ocean ( Tetraodon Tetraodon nigroviridis

Accepted Article luefin

From butterflyfish andThe conspicuous bannerfish inpa (family Chaetodontidae)

of

harbo s 2014 . Indeed,that of fishthe species corals feed directly onscleractinian primaryas their emi

Rhincodon typus the ( Pomaca

; t fre

una, a - van 2004 dePeer

rs enclosed most ). This doesno deficiency

- biogeographic shwater

based phylogeneticcomp based one Thunnus orientalis

et al. et variables

nthidae), nthidae), 33millionabout years ago r unique

, of

et al. et and e

the basin inflow 2013 ations in wrasses (family: Labridae)

highest ; Read cological nicheselectioncological and 2004; Pratchett 2005;GrahamCole 2007; ;

(Sofianos van de Peer 2004van dePeer )

at , display perspective, , ), a brackish), a pufferfishof rivers water fromthe to Sri Lanka China

geologically high the butterflyfi

the ,the et al. et

levels

north

; rates Nakamura

2003; a tremendous diversity of a tremendousdiversity colo

2015 t allow us to t usto allow identify

western arisons arisons among of of

shes butterflyfish

endemism

complex evaporation ) Ngugi ) are are the genomicclosest resources(

, ( a pelagic predator thea pelagic PacificOcean in northern Cole

et al. et boundary

et (Roberts & (Roberts

this family this ideal serve as an should model

(20%; (20%; et al. et

regions

al.

2013 for es

(Bellwood , the

2012; and are

marine

2008; Lewis Rotjan & 2008 of ) Bellwood specializ ,

genes under selection under orelucidate genes

prominent of and the latitudinal the

Raitsos Ormond Zekeria 1992; the

Indian organis

et al. et world ed reeffish r pattern world’s et al. et et al. et

et

in

gradients Ocean 2010) ms,

’ al.

s

fish 2008; Gregson

2010 oceans,

2013). , bodyshape, with largest fishlargest (whale

. D

assemblages is es ). Molecular ). Molecular also see

ue to the ue tothe

subject . in 12.9%

rticular, a

the The

( Wabnitz Takifugu )

et al. et , with the Red

Red

Spaink for to et al. et

of Sea.

Sea

This article isThis article by protected copyright. All rightsreserved. the congruent phylogeny markers would Nanninga differentiation and disjunctive et and upwelling DiBattista what uncertain Pleistocene and (DiBattista into (Roberts that scleractinian fi shes,

Acceptedal. Article

first The Our

a shallow cor 6 the

limits

2014 turbid

of

benefit 12.6% al

adjacent origin objective

draft

[RAD the

et

genera is

et , isolation ; off et

et distribution al. or dispersal Giles - glacial

how

( 12 al. water

al. corals

137

genome

al. for the

from whole - of 1992) between

Seq species 2014; 2016a

the (DiBattista Gulf

2016

high polychaetes, northeast et m) in

region

cycles ]

(DiBattista events, the

al.

this , separation for connection genome for

although

of b Giles of ) levels

. 2015 ) application recorded

of populations

, population

Some these study a

Aden

but

south

(Rohling some Red

or

African

et )

et

of

. is

sequencing)

to

The relatively 8.1% al.

possible was

or al. Sea some

et thought among endemism of

reef

test with from

as 2015 al.

2016a

19

importance

of to reef

genomics et

and of

for

far

fish finer 2016a ° of

the use NGS al.

the

; to coral

extant

barriers these to as e fish

southern S ;

sedentary

chinoderms,

species

1998; 20°

aenz - Indian Sheppard

NGS

Red scale to be the in

) approaches ,

.

reef

,

the

shed , Red

Butterflyfish

N species

species at -

Gulf

ultra Sea

of Agudelo

approaches Siddall

in

to hypotheses

least Ocean blacktail

organisms (Robert

Arabian these

Sea

light

the larval

organisms were -

of & conserved

are

is in

16.5% reef Oman Red Sheppard

et

barriers on (

that

maintained et

part, e.g. at

s dispersal

now

al. butterflyfish

coasts es

et the al. fish

Sea one

and related (

restricted

restriction Froukh al. are

(DiBattista 2003; for

the

at 2015

known exact

elements is

(Roberts time

de is

1992;

the particularly

1991 ascidians

(Smeed result still

supported

include

novo

)

to Bailey

. within larval

timing described

& This

) to

subject reef water DiBattista

,

(

Kochzius of site

as

Chaetodon

extend assembl [UCEs] et

et 1997;

,

area a

stage fish well

cold

2009) the

al.

al. and

and associated

narrow

flow by interesting

to

1992;

2016a Red as

connectivity. - of

as 5.8%

Kemp their water location the

( much y .

for

see

et endemic 2008 during

research Even genetic to

Sea,

(18 al.

Nanninga )

provide range

for .

2000) de ; DNA

2016a

more km)

given o

of bate r

)

This article isThis article by protected copyright. All rightsreserved. ofUniversity Scienceand sequencing, 45 μgwasfor provided mate CA using aliquotwasquantified extracted dsDNA aQubit (Invitrogen, HS Kit Assay Valencia,(Qiagen, CA) followingthe manufacturer's DNA protocol. Total from each tissue and wasnotavailable preferred to inbreduse an individual for genome sequencing to simplify assembly preserved andarchived genomic preparation library (KAUST ID: Tissue whole RS3552), the and specimen was (N Arabia A single DNA sequencing Methods and Material environment may (e.g. Sea, C. closely exclusively the austriacus

Accepted Articlelunulatus

). A Chaetodontidae Genomic for shotgun DNA libraries sequencingprepared were atthe King Abdullah

respectively provide DiBattista

related total of prepandtotal forlibrary 10μgofDNAwas provided downstream shotgun C. austriacus ). placed into aliquots fourseparate using aQiagen DNeasy BloodandTissue Kit 18.658617

in , Chaetodon

s

a and .

the

means sister

et

(Waldrop

C. al. northern

family

melapterus

taxa

°, E 2015 to

specimen was collected on March 5, 2013, near Ablo Island, Saudispecimen onMarch5,2013,nearAbloIsland, collected was

at California the Academyof Sciences (CAS236290).Althoughis it

austriacus identify .

40.826967 found Genomic DNAfromwas extracted 100mg approximately ofgill ). Technology (

found

and et

Future

al.

central

, together

genes

which 2016 in

Rüppell,

g the °). Portions filamentsof gill the removed were for enomic

)

Red

related , live Red

KAUST) Bioscience KAUST)Bioscience (BCL)Core Laboratory using with in - pair librarypair prep.

Sea the

in

1836 Sea. comparisons

rare

to the

Cora region

unique

This

is a Indian reas

one l lochaetodon

iconic (DiBattista

of

of adaptations

Ocean,

among co

twelve

- species occurrence

Pacific

these

subgenus, shallow et

to al. is

the distributed closely

2016a Ocean

and

water different

C.

hybridization )

,

, related

trifasci

and and

members , Carlsbad,

almost an inbred an inbred

Arabian has

fish atus

three

of

,

This article isThis article by protected copyright. All rightsreserved. Accepted weresequences identified the genome with by performing NCBIbacterial search BLAST a ( fish transcripts tissue extractions andnormaliz bestassemblyusingthe akm wasobtained mitogenomesfish to prior genome shortreadassembler the Abyss with assembly respectively. 11,207,168 read remaining kept default parameterswere as 2013 quality trimming,in resulting total of808,641,943paired which included Article The shotgun sequences De novogenome run PE Illumina KAUST, HiSeq2000at and all three sizes insert (3 FC (Cat# GelPlusprotocolPair outlined “Illumina in MatePairlibrarypreparation Nextera guide KAUST BCLfacility tostrandsubject d ofanrun PEonfive lanes Illumina HiSeq2000(v3 atKAUST. reagents) an Illumina

Paired on three ofPElanes Illumina )

for qualityfilteringfollowing withthe parameters: score >20). - 132 - end libraries were screenedmitochondrial for sequencesafter aligningavailable to TruSeq

- 9001DOC, Part#15035209 RevD.)”. The5

adapter trimming, B - i.e. pairs were retained 3 forthe assembly assembly isplacement of3

contaminants), such These asbacteria. putative bacterial contaminant DNA Sample Kit Preparation (Illumina, SanDiego,CA). Themate and Genotypic Te 777,393,511 paired (101 bp read length) -

end reads andsubjected weresequenced toq - pair readswereprocessedusing NextClip ing the was libraries the increased potential to sequence non

NextSeq500 at NextSeq500 at Genotypic - 5kb, 5 5kb, - trimming, low and chnology g - - . 8kb, and 8 reads containing 150.99 er size of63. size er After trimming, 33,333,612,19,362,215,and were quality filtered using SeqQCwere filtered using quality - 5 kb, 5 enomics following Nextera facility a Mate - - 8 kb,and 8 11kb - quality trimming end functions A further consequence ofanimalfurther A

- - t 0, 8kb was library lanerun onone of were prepared separately were prepared separately at the

Technology - y 32,17(softmatch), and the - 5kb, 5 - 11 kbinsert Gb

- of data (withdata of 8kb, and 8

. tool (

uality filteringuality and

Mate

Leggett The librarywas libraries, - pair libraries vers. - vers. 11kb) were

a 2.2 et al. Phred

1.5.2.; . , A

- This article isThis article by protected copyright. All rightsreserved. 2006 &2.1; Iwata Gotoh2012 modelsGene predicted were usinga combined approac Gene http://www.timetree.org/ Evolutionary fordifferent distances the species to Brawand comparative analysiswithselectedfish specieswasretrieved from Amemiya categories: DNA assemb conjunction with BLASTandCrossmatchidentified annotate the to repeatsinthegenome 1.4 (Wheeler annotat ( using models Repeatmodeler consensus vers. Recon We interspersed identified repeats and DNA lowcomplexity sequences in genomethe using Genome annotation is genome sequence usingfilled al. et dataset. sequences

Accepted Articlehttp://www.repeatmasker.org/RepeatModeler.html

;

1.0 (Edgar &Myers 2005) 2011

a GlimmerHMM

vers. nnotation nnotation ly. We used a hierarchical systemly. Weusedahierarchical toclassify ed based ona et et al. )

GapCloser with the followingwith the 1.0.8 (Bao & Eddy 2002), RepeatScout1.0.8 (Bao&Eddy2002), et al.et

(2014), (2014), Howe

- transposon, LTR,SI transposon,

function a function Assembled 2013). Repeatmasker - PRJNA292048

combination of RepBase repeat identification vers. vers. . )

and

3.0.3

1.12 nnotation et al. et c ab initio parameters: ontigs were .

,

The output wassubsequentlyThe output refineused to andclassify (Luo Kelley Kelley

(2013), Kasahara (2013), .

NEs, LINEs

et al. et

gene vers.

vers. et al. et

and phylogenetic comparison - joined

x 1,

2012) prediction (Augustus

1.0.8

4.0.5 (Chen2004) 2012 vers. - T 20, ,

C. austriacus . TheNCBI BioProjectaccession for the

using and simple repeats. and simple )

). . Identified repeat elements. Identified repeat were 20.02 (Bao vers. de novo

et al. For homology h ofhomology –

g 1 SSPACE

1.0.5 (Price

(2007), and Wang . T

repeat elements int he resulting scaffolds et al. et were retrieved fromwere retrieved

was also used in was also usedin - vers. BASIC - based prediction gene

2015) et al.et - based Data for the Data forthe

3.0.3

-

vers.

2005), and Dfam

, (SPALN et al. et et al.et Stanke o the different

2.0

and PILER (2015). (2015). (2013),

( were gap were gap Boetzer

et al. et vers. vers.

This article isThis article by protected copyright. All rightsreserved. Pfamthe annotations functional we used InterProScan chalumnae included the species fis against ofsever gene sets with

, database (Table S1 database rem and

- ( blasted (BlastP)a blasted 5 Amemiya ted against against ted the database,2) TreMBL a

These gene These gene models were was retained,t and3) oval ofgenemodels moreoval with thantwostopcodons and<20

2014 Danio rerio ( ], Haas - zebrafish E. off of

electricus - ) with anaverage of size 10,232 based and based We identified

al fishal species withfully sequenced genomes to order identify in et al. et -

searcher [ ucius Howe

2013

in Supporting in Supporting Information ) gainst Swissprotdatabase; the t

926 high quality gene926 highquality sets -

species 5 2008

was retained whereaswas retained

) [ ( , and et al. et Ro Gallant he remaining models gene annotation were without blasted ab initio with ) toform acomprehensi ndeau -

then then specific parameters for sub 270,127 exons with an coding270,127 exonswith average sequenc

D. 53.7% and39.81%forexon 2013

default default to parameters UniProt inorder align the

- et al.et rerio f

annotated using a three et al.et gene were prediction integr rag

]) to ourassembled genome; t ( ed publicin the databases Jones ments. Augustus andGlimmerHMM

2014 (

Howe 2014 l l gene no models with hits suitable against et al.et )

, gene models with > withnohitor hits ], E. ).

et al. et bp, and a gene density of density bp, andagene

Latimeria chalumnae Latimeria (Table S1andTable S2

A final validationA final of 2014 lucius ve ofconsensus set models gene he best hit forhe besthit gene each model

2013 ab initio )

to predict domainsto predict base ( Rondeau ) - Electrophorus electricus Electrophorus . Toprovide further st s

ep approach and intron

gene predicti ated EVM using that we

his provided et al. et

bp

the modelsthe was [

in length Amemiya s,

used; used; t

2014 : 1) in and best the on. The

were run were al ) e , hese hese l L. d on , we

et et

.

This article isThis article by protected copyright. All rightsreserved. genebetween pairs (TableS3 analysis tool SynChro (Drillon against the recently published NileTilapia genome (Brawand 15), 656.4Mb(kmer 17) Thiskmer size. approach generated N coverage, is the actual of coverage genome,the mean Listhe r we thecoverage, applied formula M=N*(L isthe value mean forvalue data.Basedand coverage onthis the the kmer actual genome with plotted different kmer sizes(15, 17 smallused the paired insert accuracy satisfying and continuance of ourasse The mapping thelibraries, rates S2total (Fig. ofthe rate short a qualityIn order toevaluatethe of the assembled genome Genome v median (~ coverage againsthits curated proteinsfromhighly SWISS the performed analyzing distributionthe by ofhitlengths20,794genemodelssignificant with Accepted Article ll libraries BWAusing To further assessTo further of the quality ou We estimated approximatekmer genome the a using size distribution ofdistribution mapped read alidation

a frequency a frequency histogram. distribution should The

in Supporting Informationin Supporting - insert insert

82 paired %; see

vers. vers. ,

and 65 - , had abroad

end library todetermine frequency distribution kmer the using and 19). and 19). kmersize For each -

end 0.7.5 (Li Durbin & 2010) Fig. S1 in Supporting in Supporting Information et al.et - 4.9 Mb(kmer 19)( pair sizes insert assembledon the genome a indicated

library, which from200to ranged comparable

2014

r assembly weperfo in Supporting Information). ). Owing to the different). Owingtothe qualities of mate the range, butrange, ), allowing for 5,10 –

mbly ( K + 1) / L, where M is L,where Mis K +1)/ the mean kmer genome estimatesof650.8Mb(kmer size were all Fig. - PROT andcalcula database Fig. S2 with default parameters. Themapping

). , S ,

we counted kmer the frequency and

we assessed the we assessed mapping of rates rmed 3 W

substantially higher than65%. in Supporting Information be unimo - e also assessed completeness also the e

based based approach. Briefly, , in Supporting Information et al. et

and 15 intervening genesand 15intervening microsynteny analyses ead length

2014) the using synteny 400 dal

, bp, was96.5 where the ,

and ting their K

is the peak - % in % in ). pair pair we

).

This article isThis article by protected copyright. All rightsreserved. amountslower of can oftenvaryby+/ estimates from other estimatedg based butterflyfish 2,000scaffolds than bp greater genomes. F on scaffolds 200bpin > length value 1)(Table Bioproject under the through CenterNational forBiotechnologyInformation (NCBI) ReadArchive Short fold coverage ofthegenome processing after (Table1).RawIllumina available reads are C. austriacus Quality Genome a R in Supporting Information). CEGMA were either genes.of core genesassessed eukaryotic by Theresults showedthat98%ofthecore of genome the CEGMA using

Accepted Article esults

After trimming and filtering, After trimming andfiltering, s

increased increased - filtered Illumina reads .

ssembly The N50 or example, or genomewas approximately0.71Gb

genome generated genome generated 8 PRJNA292048 enome of0.65Gb size identified - fold in and N90 -

Chaetodon (733 Mb completely or partiallyrepresentedinour genome S4 assembly (Table 10% sequencing fromread assemblies2015) (Guoetal. short an N50ofbp 136,950 the scaffoldthe

repeat elements

lengths ) (Hardie &Hebert ) (Hardie 2004 vers. vers. . (> 97% had ofbases (Nakamura

These are values . species such species such as 150.99 the the remaining

2.5 (Parra 2.5 (Parra

of genome the the

Gb

(Fig. S2 (Fig. contigs were

et al. et . of approximately whichrepresented data,

was achieved inbluefint et al.

(N50 =170,231 (N50 reads

Chaetodon aureofasciatus 2013) , whichwa in Supporting Information similar to

2007) Phred were assembled into ). .

Total ass 21,086 Kmer

in order quality scores > 20) other published bonyfish s slightly larger than s slightly kmerthe -

based genomebased estimates size

bp and embly ofthescaffolded size , to determine the presence N90 = 3,026 una , respectively 118,995 24,561

(685 Mb based on based ), but similar ), but to o f the diploid

bp

contigs due to ) and ) based based 212 ; t hese - -

This article isThis article by protected copyright. All rightsreserved. Elements DNA andan in (SINEs) increase transposons Short in comparison to and Oreochromis niloticus elementstransposable (TEs) ofrepeat elementscontent el most abundantelements were DNA transposon contained 17.86% (127 repeatelementsMb) anaverage with GC content of 42.48%.The orderPerciformesthe ( (~51.3%), of annotatedgenes(> the models gene. assembledper scaffold rangedfrom1 to126,with7%ofthescaffolds onlyone containing bacterialobservable contamination inthefinalgap assembled coveragenone and withhigh few in contigs meaning singleton our genome, thatallscaffolds were wepresent here aBLASTnot have match inanyofthe There databasesunannotated. weredesignated were annotations forprovided 24,433genes 28,926 hig approachUsing acombined ofhomology Annotation

Acceptedements inouranalysis (TableS4 Article Analysis oftherepetitiveelement content showed the that Pundamilia nyererei BLAST results sho ,

of 97.5% whichmorethan ,

as well as Long as as well Terminal (LTRs) Repeat elements,themarginally despite h confidence h confidence gene models. BLAST Stegaste

Oryzias latipes s

(~16.2%) , see Neolamprologus brichardiNeolamprologus

wed high confidence database matches database wed highconfidence (

(~114 Mya (~114 Mya divergence 70%) matched70%) from sequences , Fig. as well as as well relative the proportions of mostabundant the , were ,

and 1).

(Japanese ricefish)

similar tothe closer related species and TableS5 Oreochromis were

(Table S1 are are indicative ofcon known sequences fromknown sequences . The majority - based based and - based annotation againstmultiple annotation databasesbased s , which accountedfor>54% oftheclassified )

in Supporting Information). . differences observed However, clear were in Supporting Information (~5.1%) , - closed closed assembly. Astatotilapia and Long (LINEs) Interspersed and Long(LINEs)Interspersed Nuclear , whichshowedproportionsof reduced de novo the fishthe genera

,

which, like

tamination. gene prediction weidentified gene prediction Chaetodon austriacus burtoni

- The number ofgenes The 5 ) for Chaetodon,

, Larimichthys Indeed, Indeed, Metriaclima zebra Metriaclima ~82 ).

The overall The overall Contigs that didContigs that

% of the gene % ofthegene there was no belong to such as

genome deeper

,

This article isThis article by protected copyright. All rightsreserved. Accepted inhabit. that they for dispersal habitatcapacity, selection, and adaptation oceanicenvironmentsto the unique austriacus short comparison with (82 scaffold andarelatively(170,231 bp), high size propo arereleasedthey geneexpressed sets, longercoverage, reads to ‘modelstatus group’ resource ( genome Using an Discussion Article assembly approach. observedthe differences simplycould the dueto with inherent associated be biases each referred to here elementsrepetitive s among (~429 Mya divergence) 2; Table(Fig. S5 Mya divergence) differentTEs was substantially in t phylogenetic (~139 distance Myadivergence).The content overall and proportionsof relative % ofthe genes)willfacilitatepredicted the - term, this ; the first first ; the

NGS approach we were able to sequence, to NGS assemble wewereable approach http://caus.reefgenomics.org

in greater the Indo

resource were derived fromreadsequencing assemblies only,which short means that .

, closely related

In the In the long draft draft That said,t Ctenopharyngodon idellus

a linkage map genome foratropical fishreef available to more close ofthegaps

for future genomic studies will will pecies mustpecies viewed cautiongiven be other with thatthe genomes - term, genomemay help shapenewhypotheses this he high (~212 genome coverage - enable comparative withgenomics thecongenersof research Pacific (Waldrop butterflyfish species

he more distant to linearizethescaffolds ) has the

and S6

(~237 Myadivergence)

immed potential theChaetodontidae to elevate et al. et , in Supporting Information transcriptome toidentify sequencing

ly with increased short

related related species

(DiBattista 2016 iate iate rtion of rtion proteinannotated

use of these sequences for use ofthese ), which agenetic may basis reveal , andnewas , -

a fold . nd annotate

This et al.et ), large ), large N50 average such as , publically available

and submitted - read sequencing sembly L. ). a butterflyfish D.

chalumnae C

rerio omparisons of -

algorithms as ) related tothe coding . Inthe

(~232

family

genes genes

C. This article isThis article by protected copyright. All rightsreserved. identify theunderpinnings of genetic traits involved the in of evolutionary programs sequencingrefinementsfurther ornamental fish trade assemblyamong studies approaches S6 inSupplementary Material species contained substantially lesstransposable elements thant TEsabundant withincreasing phylogenetic distance highlighted substantial differences intheoverall content elements showed selected similarity across fish species high to but closer related species, als genome models andgene assembly gene models C. austriacus fishsequenced not genomesare lackofthe closely support forfurther most, all of not This but thegenemodels. explainedby might be inpart T. genomesequenced (Brawand 24,559genesthe identified in hybridization this within group(Hobbs (Bellwood evolution ofabroader set withinthespecialized butcomplex taxa butterflyfish family

Accepted Articlerubripes T We 28,926 protein coding predicted genesin his assembly , experimental experimental under conditions study controlled

et al. (Aparicio from

gene set usinggene set NCBI majority the nrdatabase, however,matched vast ofthe the curios

2010), as asreveal agenetic well forthe basis

P here here related genomes erciformes species

( et et al.

( Delbeek http://reefbuilders.com/tag/hybrid provides

2002). The BLAST et al. et ) O. niloticus , although may this in bedue, biasinsequencingpart, and to a

, 2013; 2013) Lawtonet al.

would then have then would yet one ofthefirst

. 2014), but 2014), but in public in public repositories,

available on NCBI.phylogenetic classification Theof the . Moreover, o Moreover,

, which lends further support to , whichlendsfurther the et al. et , next the relatedspecies fully closest with a

2013; DiBattis still still - based annotation of annotation theseprovidedbased genes draft draft ur comparison therepertoire ofrepetitive of C. austriacus broad applicationsbroad to lower thanlower 31,059 the predicted genes in .

Specifically wefind allPerciformes that “genomic for blueprints” breeders inthe and

-

. , since most ofthe butterflyfish/

ability ability to spawn andsurvivein This information and, insomecases, ta

relative proportionrelative et al. et , which is slightly , whichis higher than

relative he zebra

2015)

captive breeding ) high rates of high rates . fish genome (Table G . fidelity ofthefidelity

enome scans to enome to scans , recently recently

along withalong the

ofmost the production o This article isThis article by protected copyright. All rightsreserved. Accepted prep,library Illumina sequencing,genome andgenome assembly, annotation. Ltd. with assistancetheir with CoreKAUST Laboratory Bioscience with Sivakumar Neelamegam andHicham for Mansour andclassificationannotation species Californiathe Academy of Sciences; contributions from Catania for withspecimen LuizRochaandDavid assistance at archiving Resources Core Gusti andMarine Coastal Lab andAmr support logistic funds toM FundsResearch (OCRF) AwardNo.CRG under Acknowledgements. Article and natural can which representsfraction anannotated ofourgenome. draft allelochemical familymayin this role onthe of depend proteins in CYP3Athe of detoxification specializ environments with it (i.e. cyanide fishing) fish particularlyaquaria, fortheobligatecora

therefore (Lawton ed Dinesh Velayutham, .

L butterflyfish anthropogenic pressures. anthropogenic

. et al. et provide B that that result s . , wethankEric MasonDream at inSaudiArabia Divers

, as well 9024 National, aswell GeographicSocietyGrant as a f ound insome (i.e.soft corals; foodsources Maldonadoetal.2016), their 2013)

library prep

a firstcritical indeterminingstep these species whether can adapt to .

in

, Recent studies have studieshave Recent also shownthat This research wasby This research the supported ofCompetitiveKAUST Office

in addition to a .

More generally,More ongoing perturbationstoshallow l oss ofhabitat Vinay Pantulwar Vinay

and Illumina sequencing , pr as as well

Yi Jin Liew Yi Jin for providing essential scripts for

the nefariousthe methods fishing that sometimescome l livores, livores, may

and coralbleaching are ,

- and 1 ovisioning theovisioning genome browser - 2012

Subashini mitigate . - We acknowledgealso important BER

;

and GenotypicTechnologyPvt. This butterflyfish

the degree the of specialization diet -

002 andbas

the the for theirassistancewith ongoing a major threat to the - 11 toJ

and eline eline research harvest the KAUSTthe

. draft draft D - water coastal . D. ; the the

genome of these For

This article isThis article by protected copyright. All rightsreserved. Areviewof(2016a) contemporary ofendemism patterns waterfor shallow reef E,SinclairAgudelo P,Salas MR,JH, Gaither HobbsJP,Kahil M,Kochzius R,M, Myers G,Robitzch V,Saenz Paulay JD DiBattista Biogeography the at crossroads ofmarine provinces biogeographical inthe Sea.Arabian Berumen (2015) ML JD, HeS,Priest MA,Sinclair DiBattista Hobbs RochaLA, JP, Butterflyfishes, JC(2013) Delbeek fishes ontropical coral reefs. C Chen R (2015) ProtocolsCurrent inBioinformatics toidentifyrepetitiveChen NUsingRepeatMasker elements (2004) sequences. ingenomic forsubstrate adaptiveradiationinAfrican cichlid fish. Brawand D,Wagner CE,Li YI,Malinsky I,FanS, M,Keller usingcontigs SSPACE. M, Boetzer He fishes. of Evolutionary history butterflyfishesthe Chaetodontidae) the (f: feeding and rise ofcoral Bellwood DR, Klanten S, CowmanPF, Pratche in eukaryotic genomes. W,Bao RepbaseUpdate,ofrepetitive O(2015) Kojimaadatabase elementsKK,Kohany genomes.sequenced Bao Z,Eddy SRAutomated of denovoidentification (2002) sequence families repeat in Publishing. Springer R(2009)Bailey ofthe assemblyshotgun andanalysis genome of S, Aparicio Chapman E,PutnamJM,Dehal J,Stupka Chia N, genomecoelacanth insights intotetrapod provides evolution. Amemiya Alföldi J,LeeAP,FanS, CT, J W, S, W,Altschul Gish localalignment E,Lipman Miller D (1990)Basic Myers tool. search References ournal Accepted MS,Jones AJ,Pratchett GP(2008) ole Article Journal EvolutionaryBiology of

of

Mol , Roberts M, Bouwmeester M,Bouwmeester J, Bowen, Roberts BW, DF, Lozano Coker

, nkel CV,nkel HJ, D,Pirovano Jansen Butler W Scaffolding pre (2011) ecular 42, On Bioinformatic Resources. 292 Ecosystem Sites to Geography: FromEcoregions

1601 . Captive care and of breeding butterflyfishes.coral reef

Genome Research Genome

When Biol Mobile Mobile - Bioinformatics 1614 ogy

biogeographical provinces collide: hybridization fishes ofreef - Taylor Westneat RJ, M, WilliamsTH, Toonen S, Berumen ML Fish , .

215, DNA and

403 , , Chapter 4, 6, Fish , , ,

-

11. 27 410 12, 23 Philippe H,MacCallum (2013) al I,et Diversity andfunctional importance ofcoral , ,

eries

578 . 1269 335

Genomics Proteomics Bioinformatics tt MS,Konow L(2010) N, vanHerwerden - - , 579

Fugu rubripes Fugu 349 - 9 Unit 410. 1276. , 286 . .

Nature -

307

.

Nature ,

et al. et

P, etal.(2002)Whole - 513, . T Science aylor TH,aylor Bowen BW, . 2

(2014) Thegenomic 375 , nd 496

edition, New - , 381. , 297

Journal of Journal 311 Biology of of Biology - Cortés D Cortés

,

- 1301

316 The African fauna in the - assembled . .

-

1310. York, York, - F, Choat genome ‐ feeding

- This article isThis article by protected copyright. All rightsreserved. AcceptedButterflyfishes, CRC Press, Boca Raton,FL. pp.48 butterflyfishes. Hobbs J andFisheries Sciences Aquatic DC,HebertHardie Program Alignments. toAssemble Spliced Automated(2008) gene structureannotation using EVidenceModeler eukaryotic and the W, SL,Zhu Allen JE,Orvis J, Haas BJ,Salzberg M, White Pertea O,BuellCR,Wortman JR Frontiers in Physiology analysis estimates ofthegenome of sizes Guo LT,WangWu SL,W,QJ, ZhouXG,XieFlowcytometry YJ(2015) and Zhang K Coral Reefs butterflyfish (Chaetodontidae) feeding rates andcoral consumption onthe BarrierGreat Reef. Gregson MA, Pratchett MS, Berumen ML,Goodm climate driven coral mortality. GrahamNAJ (2007) Evol andgenetics inthe kinship sponge reef EC,SaenzGiles convergent evolution organs. ofelectric WellsAnand R, GB,Pinch M Article LL, Gallant JR, Traeger Biol damselfishes M:Froukh T,Kochzius boundaries Species and lineagesin evolutionary the green blue Bioinformatics Edgar RC,MyersEW PILER: andclassification identification of (2005) genomic repeats. blocksand visualize synteny eukaryoticchromosomes. along G, Drillon Molecular Ecology Resources Using abutterflyfish genome ageneral for as tool RAD JD,DiBattista Saenz RedSeain the M,Rocha LA,ToonenRJ,Westneat Berumen (2016 ML JD DiBattista Red Sea. ogy ution

- 2008,

PA, vanHerwerden MS,Allen L,Pratchett GR (2013)Hybridizationamong Journal ofBiogeographyJournal ,

Carbone A,Fischer G(2014).SynChro: afastand toolto easy reconstruct 5 , ,

27, 2487 , Choat JH, Gaither MR, Lozano , ChoatJH,Gaither Hobbs JP, Chromis viridis . 72, ,

Journal Journal of Biogeography 21, - 583 Agudelo P,

451 - 2502 PD (2004) Genome i152 - - Ecological versatility andthe feeding declineofcoral fishes following 591 Agudelo P,Piatek M,Wang X,Aranda M,BerumenML(submitted) - - 69 Pratchett MS,Berumen In: B Kapoor ML, 457 ,

. - Volkening JD, MoffettGN, Phillips H,ChenPH,NovinaCD, Jr.,

6 158. .

, .

144. Ravasi

and

.

et al.et

Mar ,

61 Chromis atripectoralis , ,

ine

43, (2014) Nonhumanforthe Genomic basis genetics. 1636 T, Hussey N,Berumen ML (2015)

Biol - Stylissa carteri Stylissa 423 size evolution infishes. evolution size Science , - Bemisia tabaci Bemisia 43, 1646. ogy Genome Biology Genome - 439.

13 ,

, 153, -

30. 344, an BA (2008) BA (2008) an

119

1522

in the Red Sea. - Seq studies Seq studies inspecialized reef fish.

-

- b B and Q(Hemiptera: Aleyrodidae).

(Pomacentridae). 127. Cortés DF,Cortés Myers R,Paulay G, ) - On theoriginof endemicspecies , 1525. 9, PloS one

R7. Canadian Journal ofCanadian Journal Relationships between

(eds.)

, Ecol

Exploring seascapeExploring 9,

e92621. Biology ofBiology J ogy and ournal ofournal

Fish -

mer

This article isThis article by protected copyright. All rightsreserved. Acceptedgeographical adaptations tonovel toxin exposures inbutterflyfish. RS,Hoh E,Rimoldi GK Gadepalli JM, Ostrander (2016)Biochemical mechanisms for Maldonado A, LavadoR, KnustonWatanabe S,SlatteryM,Ankisetty S,GoldstoneJV, K, improvedmemory Luo R, Liu cichlids. s analysis reveals Loh YHE, LS,Mims Katz TD, Kocher YiSV,Streelman MC, Zootaxa Chaetodon Littlewood, D.T.J transform. andaccurate Li H,DurbinR(2010)Fast long preparationand read tool for Nextera longmate libraries. pair RM,Leggett BJ, Clavijo BG) artisanal fisheries. In: Lawt applications. S,Church, GM(2014)Kosuri Article records new eight ofcoral reef fishesfrom Arabia. Kemp J(2000) Research metagenomicfor sequences augmented andclustering byclassification Glimmer DR,LiuAL,with SLKelley Geneprediction Pop M,Salzberg B,Delcher (2012) genome vertebratedraft genome andinsights into evolution. S,Nakatani M, Y,QuKasahara Naruse K,Sasaki W, AhsanB, classification. A, Mitchell Nuka G P,BinnsJones D,ChangHY,M,LiW,C, Fraser McWill McAnulla Acids Research of version Spalnthatincorporates extended additional species H, Iwata 498 genomezebrafish reference anditsrelationship to thehuman sequence genome. MD,TorrojaHowe CF,Torrance K,Clark J, - , 269 503 on RJ, Pratchett MS,on RJ,Pratchett JC Delbeek (2013) , .

Genome Biol Gotoh O(2012)Benchmarking spliced alignment programs Spaln2, an including - 779, , 291 Bioinformatics

40, and Chaetodontidae the Perciformes)(Teleostei : morphology. withreferenceto B, Xie Y,Li Z,HuangW, YuanJ(2012) Nat .

Bioinformatics 1

e9. , Zoogeography offishes coralofthe Zoogeography the reef northeastern Gulf ofAden,with - 20. 40, ure ignatures of differentiationamid genomic inLakeMalawi polymorphism .,

- efficient short McDonald SM McDonald

e161. et al. et Methods ogy Biology of Butterflyfishes Clissold L, MD,CaccamoClissold Clark (2013) M

, ,

(2014) (2014) 5:InterProScan genome

9 26, , , R113 ,

30, Large 11 589 , -

read 236 499 , - . 595.

Gil - scale scale - - 1240. l AC de novo 507

. de

,

Cribb TH Berthelot C,Muffato al M,et (2013)

- novo Harvesting ofHarvesting butterflyfishes for and aquarium read alignment Burrows with

assembler. (eds: Pratchett (eds:

Fauna of Arabia DNA synthesis: technologiesDNA and synthesis:

SOAPdenovo2: anempirically

(2004)

- scale proteinfunctionscale GigaScience Nature -

Bioinformatics

Molecular phylogenetics of

specific features. specific features. JT (2008) MS, Berumen ML, et al. et PloS one , , NextClip: ananalysis iam J, H, Maslen

447 18, (2007) Themedaka . , Nucleic Acids 1, ,

293

Comparative 714 18 - , Wheeler - 11, , btt702 321 . -

719.

Nucleic Nature e015 .

The Kapoor

.

4208.

,

496

, This article isThis article by protected copyright. All rightsreserved. ML ( AcceptedSaenz Prog Rotjan RD, mixedmodels. F,HuelsenbeckJP(2003).MrBayes 3: phylogenetic inferenceRonquist Bayesian under PLoS One lucius C, BirdThegenome NH,KoopBF (2014) andlinkagemap thenorthern pike( of Rondeau EB,MinkleyDR, Leong JS,Messmer KR, AM,Jantzen JR, vonSchalburg Lemon sea Rohling EJ,FJ, Jorissen Fenton M, B of structure andRed Seabutterflyfishes angelfishes. Rober incidence ofterritoriality CM,Roberts Ormond (1992) RFG largest fish,world’s the whale shark: Haase CP,WebbDove AD DH, (2015) IIIRA, R,AhmadRead TD,SJ,Alam Joseph Petit Weil R,Vuong MT,M,Bhimani phytoplankton of seasonal succession theRedSea. DE,PradhanY,Bre Raitsos Article genomes. AL,JonesPrice NC,PevznerPA(2005) Denovoidentification families ofrepeat inlarge Island,BarrierLizard northern Great Reef. MSPratchett (2005) in eukaryotic genomes. G,BradnamParra CEGMA: K, KorfI(2007) annotate toaccurately genes apipeline core 388 an across antagonistic temperature Ngugi DK,AntunesA,BruneStingl Ecol genetic the populationpredict structure ofacoral fish thereef RedSea. in Nanninga GB,Saenz Proc visual ofmultiple changes pigment in genes the complete tuna. genomebluefin ofPacific Nakamura K,SaitohOshim Y,Mori - - level lowstands 500,000 ofthepast years. 405 ogy eedings ress 2015 - conserved): synteny between revealed the salmonid groupand Neoteleostei. the sister ts CM, Shepherd ARD,Ormond RFG (1992) Agudelo P,DiBattista JD, Piatek M,Gaither M,HarrisonH, NanningaG,Berumen , .

23, Ser Bioinformatics , ) Seascapegenetics along environmental gradients Arabianin the Peninsula:

9, Lewis SM(2008)

ies of the of the 591

e102089. Bioinformatics , - 367 602 Nat

, - Dietary overlap among coral . Agudelo P,Manica A,Berumen ML(2014) 73

ional

Bioinformatics - , 91 – 21,

a review. a review. . win RJW, G,Hoteit I (2013) Stenchikov

Acad

i351 , Impact predators ofcoral on tropical reefs.

19 emy of , - –

1572 Butterflyfish social 358. salinity gradient in Redthe Sea. Environ ertrand P, JP( ertrand Gnassen G,Caulet Rhincodon typus ,

a K, Mekuchi M,Sugaya T,et (2013) a K,Mekuchi al

23, Draft sequencing andassembly ofthegenometh of U (2012) -

1574. Sci Mar 1061 ences mental Nature

ine -

1067. - Biogeography of p USA feeding butterflyfishes (Chaetodontidae)at

PloS one Biol Large

J Biol , ournal ournal of

- 394, , Smith 1828.

behavior, special tothe with reference ogy 110 ogy of

- scale variation inassemblage , 162 , , 148,

11061 8 ,

- Biogeogr Fishes e64909 165 373

Environmental gradients

- PeerJ PeerJ PrePrints . 11066 Mol

elagic bacterioplankton elagic

- Remote sensing the 382 , 1998) . 34

ecular aphy Mar . ,

Mol .

79

Magnitudes of - ine , ecular 93

19, Ecol

Evolutionary

. Esox Ecol

239 ogy , e1036

JS, JS, ogy -

250 , 21

. . , e

This article isThis article by protected copyright. All rightsreserved. Accepted BioinformaticsGenomics Proteomics WuYu J(2015) Xiao J,ZhangZ, J, Acids Research Dfam: ofrepetitive DNA onprofile adatabase based models. hidden Markov TJ,Wheeler Clements R,JonesTA,Jurka J,Smit J,EddySR,Hubley AF, FinnRD (2013) adaptation. ( carp Wang Y,ZhangNingZ,LiZhaoQ., Y,Lu Y, (Subgenusbutterflyfishes Bowen(2016 BW WaldropJE, E,HobbsJP,Randall Ecology boundaries and the in Lake relationships Victoria cichlidadaptive radiation. (2013) Wagner CE,Keller I,Wittwer S,Selz OM,MwaikoS,GreuterL,SivasundarA,SeehausenO Species Wabnitz C(2003) polyploids. Yvan dePeer (2004) prediction initio ofalternative transcripts. M,Keller O,GunduzI,HayesA,Stanke Waack ab Morgenstern B(2006)AUGUSTUS: S, Article gene findingineukaryotes. M,SteiStanke Funct HJ, Spaink HP,Jansen D Res Sea circulation: 2.Three Sof Oceanol Smeed DA (1997) DA (2003) M, Siddall Rohling EJ, Almogi Arabia CRC,Sheppard SheppardALS (1991) of frominsights ddRADsequencing anemonefishes. ianos SS(2003) ianos earch Ctenopharyngodon idellus ional

, Genome

(No. 17) 12, , ogica , 22, 108

Sea Ge Nature genetics Genome Biol 3

787 -

170 , nomics Acta nkampWaack R,S,Morgenstern B(2004) - , 3066 - level fluctuations during thelast cycle.glacial wide RAD sequence data provide resolutionofspecies data unprecedented wide RAD sequence 41, - Cambridge, UNEP/Earthprint 798. .

)

From Ocean Aquarium:The inMarine to GlobalTrade Ornamental Seasonal variation strait in the oftheflowof BabalMandab. ,

Phylogeography, population structure and evolution of coral

An OceanicGeneral Circulation (OGCM) Model of investigation Red the D70 20, .

, Tetraodon genomeTetraodon confirms

13 773 - - ogy irks RP(2014) irks , dimensional circulation RedSea.in the 82. Corallochaetodon 144 , Nucleic Acids Res - , 47

781

5 - , 156 , -

Labin A,Hemleben C,Meis 250 ) provides insights its evolution into and vegetarian 625 .

. DiBattista DiBattista JD

. - A brief review of software tools forpangenomics.A briefreviewof tools software

631. .

Corals and coral andCorals coral communities of Arabia.

Advances in genomicsAdvances in fish. ofbony

Nucleic AcidsResearch ). earch

J .

ournal of , RochaLA,K et al. et Takifugu , 32, Molecular Ecology

(2015) Thedraftgeno W309

Biogeogr

AUGUSTUS: awebserver for

chner D,Schmeltzer chner I, Smeed findings: most fish are most are findings: fish ancient Nature - osaki RK osaki W312 J , ournal ournal of aphy 34, , .

423 W435 , , , 43, 24, Berumen ML, ,

853

Geophys Molecular

1116 - Brief 6241 me thegrass of 439. Nucleic - -

Fauna of 858 feeding - ings -

1129. 6255. .

ical

in

This article isThis article by protected copyright. All rightsreserved. Accepted developed the manuscriptapprove paper. and ofthefinal analyzed the data J.D.D. ofand andM.L.B.conceived designed the genome J.D.D., study. Article contributions Author Genome URL: browser PRJNA292048 (ingenome FASTAf sequence setsThe data supporting results the of thisarticle, including the Accessibility Data 168 intheamongbutterflyfish species RedSea. four ZA Zekeria ofthe diversity commoncarp, Xu

P, Zhang X, Wang X, P, Zhang LiJ,Liu X, .

, Dawit Y .

. X.W. annotated and M.A.validated, ,

Ghebremedhin S http://caus.reefgenomics.org

Cyprinus carpio ormat), are available in areavailablein the BioProject: ormat), NCBI repository, G, Kuang (2014) Y,etal , Naser M Naser . Nat Mar , Videler Videler JJ ure

ine and ine

,

Genet and analyzed the genome.

Genome andgenetic sequence Freshwater Freshwater Res (2 ics 002) Chaetodon austriacus , 46

, Resource partitioning

1212

P.S. - 1219 earch - S. , .

and M.J.P. and M.J.P. , All authors All authors 53,

163 - respectively, are genome sequence a Table 1. The N50 or N90 N90 indexN90 indexN50 Sequences >= Mb 1 Sequences >= 10 Kb Sequences >= 1 Kb Sequences >= 500 bp Percentage Averagelength Maximum length Sequences generated Total assembly size Assemblystatistics Mate pair8 Mate pair5 Mate pair3 Shotgun sequences Library Sequencing This article isThis article by protected copyright. All rightsreserved. Accepted Article ) above) which 50 or 90% length (

Chaetodon austriacus a a

- - - of non 11 Kb 8 Kbinserts 5 Kbinserts statistics

assembled

contig and scaffold of the final index indicate the

- ATGC characters

inserts

.

of the genome shortest

genome

3.80 Gb 1.47 Gb 2.30 Gb 163 , Size andassembly statistics. sequencing

Gb 663,525,437 bp

282,798 bp

Raw Raw data 21,086 bp

3,026 bp 5,641 bp Contigs 118,995 16,071 66,176 84,217 0 % 0 0

Coverage

230.06

1.47 2.88 4.85

X X X

X

0.70 Gb 1.50 Gb 2.70 Gb 151 Size

Gb Processed data 712,378,618 bp

1,950,664 bp 170,231 bp

24,561 bp 51,004 bp

Scaffolds

6.85 % 12,937 13,441 13,967 8,012 18

Coverage 212.66

0.95 1.95 3.40

X X X

X

This article isThis article by protected copyright. All rightsreserved. genus.specific t The toallthegenes. respect different inth colors numbers inner number are the outer of andthe genes M) Dicentrarchus Larimichthys Perciformes. depict colors and associated Theletters followingthe genera match from ofclosely knownsequences fish to within related species genera order the gene models andknownproteins. The majority vast of andagainst by nr grouped depic genusto Fig. Accepted Articlehe Circos plot,which

Cynoglossus

1 Genomic of composition , C) , H)

, N)

Stegastes

Neolamprologus

Tetraodon provides information percentage ofgenes the regarding matching toa , D)

, O) Oreochromis Chaetodon austriacus

, I) Fundulus

Poecilia t phylogenomic relationships , E) , P) , J)

Notothe e ringcorrespond outer to same the colors Oryzias

Takifugu numbers the are .

C. Genes wereclassified bybest hits nia , Q)

austriacus , F) , K)

Danio

Haplochromis Maylandia , and R)

genes have theirbest ir

between predicted the

percentage with : A) , L)

Others.

, G) Chaetodon

Pundamilia

The , B)

, in

This article isThis article by protected copyright. All rightsreserved. Accepted Article phylogenetic from distance species ( ourfocal Perciformes thedashed line), aswell (below the in change proportions with increasin Notespecies. Fig. 2 Relative proportion ofthemosttransposable prevalent element

the generally higherthe of amount speciesfrom DNAtransposons in order the Chaetodon austriacus

). s inselectedfish

g