Using a genome as a general tool for RAD-Seq studies in specialized reef fish

Item Article

Authors DiBattista, Joseph; Saenz Agudelo, Pablo; Piatek, Marek J.; Wang, Xin; Aranda, Manuel; Berumen, Michael L.

Citation DiBattista JD, Saenz-Agudelo P, Piatek MJ, Wang X, Aranda M, et al. (2017) Using a butterflyfish genome as a general tool for RAD- Seq studies in specialized reef fish. Molecular Ecology Resources. Available: http://dx.doi.org/10.1111/1755-0998.12662.

Eprint version Post-print

DOI 10.1111/1755-0998.12662

Publisher Wiley

Journal Molecular Ecology Resources

Rights This is the peer reviewed version of the following article: Using a butterflyfish genome as a general tool for RAD-Seq studies in specialized reef fish, which has been published in final form at http://doi.org/10.1111/1755-0998.12662. This article may be used for non-commercial purposes in accordance With Wiley Terms and Conditions for self-archiving.

Download date 24/09/2021 03:51:01

Link to Item http://hdl.handle.net/10754/622970 Accepted Article E-mail: [email protected] [email protected] E-mail: PO Box U1987, Perth,6845, WA Australia of Department DiBattista, Joseph *Correspondence: Arabia Saudi 23955, Thuwal Arabia, Saudi 23955, Thuwal Technology, and Science of University Abdullah This by is protectedarticle reserved. Allrights copyright. doi: 10.1111/1755-0998.12662 lead to differences between this version and the throughbeen the copyediting, pagination typesetting, which process, may and proofreading hasThis article been accepted publicationfor andundergone peerfull but review not has 4 1 Michael L. Berumen JosephDiBattista D. title: Running fish reef specialized in studies RAD-Seq for tool general a as genome a butterflyfish Using Resources Genetic Permanent – Resources Ecology Molecular Article :Resource type Article Accepted :14-Feb-2017Date Revised: 11-Feb-2017 Date Received :25-May-2016 Date 0000-0002-5696-7574) : ID (Orcid DIBATTISTA D. JOSEPH DR.

de Ciencias AmbientalesEvolutivas, y Universidad Austral de Chile, Valdivia 5090000, Chile, Environment and Agriculture, Curtin University, Box PO U1987, Perth,Australia, WA6845, Computational Bioscience Research Center, King Abdullah University of Science and Technology, and Technology, Science of University Abdullah King Center, Research Bioscience Computational King Engineering, and Science Environmental and Biological of Division Center, SeaResearch Red RAD-Seq in in butterflyfish RAD-Seq 1,2* 1

, Pablo Saenz-Agudelo , Pablo

1,3 , Marek J. Piatek J. , Marek Version of Record. Please Version as this cite ofRecord. article Environment and Agriculture, Curtin University, University, Curtin Agriculture, Environment and 4 , Xin Wang , Xin 1 , Manuel Aranda , Manuel 2 Department of Department of 3 Instituto Instituto 1 , and

Accepted Article This by is protectedarticle reserved. Allrights copyright. resolve phylogenetic placement, identify genes that may be under selection, and even identify to demographic estimate (e.g.parameters geneeffective flow,size, population admixture), rapidly genotypeindividuals’ many atlarge of numbers geneticimproves markers ourability tothe Moreover, management.capacity effective ofgenetic variation distribution for to recent populationhistory bottlenecks),(i.e. connectivity amongsites, and thespatial Ecologists and evolutionary biologists often ma Introduction andGulf study. Arabian with further populations inhabiting challengingcoral reef envi highlights thepotential toapproach forthis into insights reveal adaptive mechanisms in butterflyfish taxa related to calcium transmembrane transport and binding. The latter result to putativeassign gene tofunction SNPs, and(3)of an enrichment genes among sister resources closelyfor related but notdistantly related butterflyfish species based on the ability annotatedversus annotated SNPs across allsp RAD-Seq Our markers. analyses indicate(1) a modest gap betweennumbernon- the of associated function with and without our reference genome to see if it improves the quality of DNA sequencing data tosingle identifynucleotide polymorphismand(SNP) markers their isolating nuclear loci. We hereprocessed double- published draft genome ( speciesin the Red Sea Arabian and provid Sea Data from a restrictionlarge-scale site associated (RAD-Seq)DNA studyof butterflyfish nine Abstract SNP sequencing, next-generation genomics, ecological ddRAD, Chaetodontidae, adaptation, Keywords: austriacus Chaetodon ecies, (2) an advantage of using genomic advantageof (2) an ecies, ed a means to test theutility ofarecently ) and assessapparent bias inthis method of ndate geneticdata to hypothesestest related digest restriction-site (ddRAD) restriction-site digest associated ronments such as the Red Sea, Arabian Sea,such asArabian Sea, theRed ronments Accepted Article in reef fish, including surgeonfish(Gaither inhabiting tropicalseas. Wenote thatRAD-Seq have methodsonly been applieda few times Reef fish,on the otherhand,have to map, align, andotherwise trace their SNPs backto annotatedgenomes. these taxa, however, are considered “model organisms”, and therefore unique in their ability iconicamong African cichlidspecies (Wagner in theherringSea Baltic ( Hohenlohefreshwater; (Hand species. Examples includestudiescharacterizin generatedhave informative single nucleotidepolymorphism (SNP)datasets in several fish This by is protectedarticle reserved. Allrights copyright. based on coldwater ( lack of publicly genomicavailable resources. reef fish relative to other aquatic organisms can therefore be, at least in part, toattributed a all rely on interest; exclusively toor confidently sites) gene sampling SNPs functionto assign offail minimal species, and ofhowever, arein these sc limited Most studies, (Bernal 2015), hamlets (Puebla target sequence data adjacent to restriction enzyme recognition sites (Davey site Restriction associatedDNA sequencing(RAD-Seq)methods, which useNGS to from acrossthe genome tomeet demand this (for review see Kosuri & 2014).Church decreasing of sequencinghigh-throughput datacost (NGS), next-generation can be sampled genes that facilitatetoour adaptation rapidlychanging Now, the environment. with et al. et al. 2015), thegeneticbasis differentforstickleback forms of (e.g. oceanic versus 2016), (Jackson Takifugu rubripesTakifugu et al. et et al. Clupea harengus 2010; Catchen 2014;Picq de novo received much less attention, despite 5000 species ; van de Peer 2004), brackish water ( et al. assembly of rawsequence reads.This deficiency in et al. ; Corander ; Corander et al. et et al. 2014), and parrotfish (Stockwell 2014),parrotfish and 2016), angelfish (Tariel 2016), angelfish All published genomes to dateinthis group are et al. g different trout species and hybridstheir ope (based on small sample sizes, few study sizes, samplesmall onope (based 2015), clownfish (Saenz-Agudelo 2013a), 2013a), identifying cryptic lineages of 2013;Henning et al. 2013), andresolving relationships etal. et al. 2014). Most of 2016), grunts Tetraodon et al. etal. 2011), 2016). et al.

Accepted Article This by is protectedarticle reserved. Allrights copyright. enabled sampled intheof butterflyfish sites Redat SeaArabian and Sea, including genomically the We collected fin orgill tissue between from 48 and108 individuals for each ofninespecies Sample collection methods and Material niche selection(Cole (Wabnitz 2003; Lawton DiBattista ( butterflyfish Sea Red the for genome published recently A distinctphylogenetically ( nigroviridis organisms. representative genome,been have conducted across sucha numberlarge ofrelated, aquatic species. This isfirst the time comparisons between RAD-Seqanalyses,with withoutand a capacity ofthis genomeserve to scaffold asabutterflyfishfor in RAD-Seq markers other Second,toincrease utility theof genome this for of study,broader topics we further test the species, which includes assessing the apparent bias in our method of isolating nuclear loci. C. austriacus therefore We havetwo methodological in aims this study. we First, test whether usingthe analysis in related species. debate (DiBattista large numbers of endemic species (DiBattista (Bailey 2009), conditionsenvironmental fluctuating (e.g. Raitsos Indeed,familyspecies). (>130 the Red Sea hasexperienced historicalintermittentisolation biogeographicimportantly patterns related to the specialized and speciose Chaetodontidae C. austriacus C. et al. ; vanPeerde( 2004), pelagic genome as referencea improves of the RAD-Seqquality markers for this 2016a), an often obligate targeted bythe ornamentalfish trade et al. (see Fig.1 and Table 1). We also collected samples from a surgeonfish surgeonfish a from samples collected also We 1). Table and Fig.1 (see et al. 2016c). The 2016c). et al. Rhincodon typus Rhincodon 2008), evolutionary distinctness (Bellwood 2013), allows us togenes explore underpinningecological C. austriacus Thunnus orientalis ; Read et al. etal. genome may therefore enhance RAD-seq 2016b)whose origins aresubject to still 2015)species. ; Nakamura Chaetodon austriacus et al. 2013), and harbors harbors 2013), and et al. et al. 2010), butmore 2013), and ; Accepted Article quality filter) populations (populations: -p)(for more detail see Saenz-Agudelo Saenz-Agudelo see detail more -p)(for (populations: populations filter) quality (relaxed one but all or filter) quality (strict all in present and -r) (populations: ofindividuals 80% least > frequency minor allele were: 1) set data final the settings for the other all species. Population filterin “ref_map.pl” pipeline inStacks below(see and Appe species ( SNPs focal our for locicreatingcatalog when a setto2. (n) parameterThis combination yielded closest the number of between mismatches of number maximum 2; to set (N) stacks primary to reads secondary aligning when mismatches of maximum number 4; to set (M) individuals for loci between mismatches to required reads of number minimum were: Stacks in used Theparameters pipeline. “denovo_map.pl” the using weregenotyped SNPs and analyzed 20 (in a 5bp slidingwindow) werediscarded. All remaining reads from each specieswere separately read quality was considerably from lower position this onwards;with reads average phredan score < since pipeline “process_radtags” the using format FASTQ in bp 81 of length common a endto the This by is protectedarticle reserved. Allrights copyright. (Catchen and prepared from 500ng per sample of DNA by:1)digesting at 37 (Invit Kit Assay HS dsDNA Qubit a using quantified was aliquot extracted each from DNA Total protocol. manufacturer's the following CA) Valencia, (Qiagen, Kit Tissue and Blood DNeasy Qiagen a using extracted was DNA genomic brief, In al. (Willette exist methods other several although step, shearing no but cutter method (ddRAD; Peterson tag RAD double-digest the we used case, our In sites. restriction to DNA adjacent of fragments sequencing and enzymes restriction with DNA genomic digesting involves method The RAD-Seq method sequencing DNA associated site Restriction ourfor analyses. All samples were preservedethanol in95% prior toprocessing. species ( Bioscience Laboratory Coreusing SEoption. the (KAUST) Technology and Science of University Abdullah King the at v3 reagents) bp; (101 HiSeq2000 whic libraries, genomic nine form to concentration multiplexed sequencing protocol, andcombining 6) pools, appropriate, as in equimolar Illumina standard the to correspond that primers indexing unique and DNA, library of 20µl reaction µl 50 in clonality reduce to cycles PCR 10 a time, 4)size-selecting 500 bp 300 to fragmentson agarose gels from eachpool, amplifying5) over RAD sequences were first analyzed analyzed first were sequences RAD RAD-Seq analysis 2016). 2016). MluCI Ctenochaetus striatus; N et al. (NEB), 2) ligating to unique combinations of custom adaptors, 3) pooling 16 individuals at at individuals 16 pooling 3) adaptors, custom of combinations unique to ligating 2) (NEB), 2011, 2013b). Raw reads for each individual in the RAD tag library were trimmed at at trimmed were library tag RAD the in individual each for reads Raw 2013b). 2011, et al. et al. C. austriacus C. 2012), which uses one common and one rare restriction enzyme enzyme restriction rare one common and which usesone 2012), de novo de =120 10from populations), which served anas outgroup ) compared to the number of SNPs recovered using the the SNPs recovered using of numberthe compared to ) (without a reference genome) using Stacks Stacks using genome) reference a (without s containing 25 µl Illumina True Seq Master Mix, Mix, Master Seq True Illumina µl 25 containing s rogen, Carlsbad, CA). Genomic libraries were were libraries Genomic CA). Carlsbad, rogen, h were then run on nine lanes of an Illumina Illumina an of lanes nine on run then were h 0.05 and 2) the locus had to be genotyped in at at in be genotyped to had locus the 2) and 0.05 g parameters used in order to include a locus in in alocus include to order in used g parameters form a stack (m) set to 3; maximum number of of maximum number 3; (m)setto stack a form ndix S1), and was therefore used as the optimal therefore usedas was the ndix S1), and ° C usingC the restriction enzymes et al. et al. 2014;Andrews 2015). 2015).

vers. 1.44 SphI et

Accepted Article This by is protectedarticle reserved. Allrights copyright. with loci, 289,504 found we anddenovo_map.pl, S1) Appendix (see combination parameter optimal Bioproject(NCBI PRJNA292048;http://caus.reefgenomics.org; Liew see to assemble and genotype SNPs denovo_map.pl used we First, ways. different two in SNPs genotype and assemble to Stacks used We Results RAxML in model approximation GTRCAT unpartitioned an using sequences RAD with built were trees phylogenetic (ML) Likelihood Maximum Complimentary homology levels for all studied species against the S1). the same(seeAppendix alsoparameters remained filtering population denovo_map.pl; for above outlined as stack a build to reads of number an then were reads Aligned length. seed default Bowtie toreport only (i.e.one the alignmentbest) per read and allowed mismatches up 3 to in the the to aligned al. RAD sequences were also analyzed using the implemented in Geneious Geneious in implemented map reads each from of these species to the and assemble to pipeline ref_map.pl the used also We genome. sequenced afully with butterflyfish calculated. werealso lengths scaffold and of SNPs number the for distributions Frequency length. fortheir SNPs of number higher statistically a host that scaffolds highlights analysis particular This size. scaffold of afunction as genome assembled the to mapped that sample), from SNPs of number the we estimated Second, set. gene that for annotations using (or under-represented) used were analyses These analysis. ref_map.pl the the weight01 model, acut-off performed using TopGo TopGo performed using dataset of genome reference the in genes different GO enrichment analyses. identified We 3775 SNPs from the associated genes werethen used lookto for potential enrichment of specific gene functions using the using where species two only the represent These models. gene containing regions mapped to which from SNPs identified we First, ways. several the to mapped that of SNPs function and form the explored next We visualized using FigTree based on 100,000 rapid-bootstrap replicates with was support Branch included. were per species individuals 3 least at and species 10 all in present loci bp representing 2846shared loci SNP (2170 fixed SNPs (denovo_map: -N2-n -m bycompiling3 -M 4 6) four individuals species,per resulting in total38,070 STACKS in generated alignment (.phylip) sequence full a of consisted analyses for these 2016a). For 2016a). this approach, trimmed from reads eachRAD species (and thus were first SNPs) C. austriacus

mapping to2011 different genes of referencethe genome.GO enrichment analyseswere C. austriacus, C. austriacus C. reference genome increased the number of SNPs identified (see below). The vers. vers. with corresponding tolerance intervals (95% confidence in 99% of the 99% the of in confidence intervals(95% tolerance corresponding with vers. vers. scaffoldsusing Bowtie 2.22.0 from the R Bioconducter package (Alexa & Rahnenfuhrer 2010), 2010), Rahnenfuhrer & (Alexa package Bioconducter R the from 2.22.0 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/). 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).

of of 10.0.2(Drummond de novo de P < 0.01, < and specificbackground coveredgene sets by stacks from for ten different species, including species, different ten for C. austriacus C. C. austriacus C. C. austriacus C. alyzed with ref_map.pl using the same minimum minimum same the using ref_map.pl with alyzed C. austriacus Ct. striatus Ct. striatus to find which GO terms were over-represented over-represented termswere which GO find to vers. C. austriacus et al. and variable 676 SNPs within species). Only This step additionally allowed us to estimate estimate to us allowed additionally step This 1.0.1(Langmead 2009)(also seeRee & Hipp input The 2015). , SNPs and from2341 the and itsclosest relative, scaffolds, which are now publicly available C. austriacus C. genome as a reference (DiBattista (DiBattista reference a as genome used as an outgroup. Trees were outgroup. were an Trees used as genome. genome. vers. 2.0(Stamatakis 2006) as et al. C. austriacus dataset mapping to 3056 3056 to mapping dataset et al. et C. austriacus C. 2016). Using an 2009). Weconfigured 2009). C. melapterus C. C. melapterus melapterus C. scaffolds in , the only , et Accepted Article to all proteins. Briefly, the the Briefly, proteins. all to comparison in loci our by covered models gene the analysed we enzyme, restriction chosen and thus method RAD-seq our of consequence a was function gene in bias observed this if determine To regulationGTPase of activity, axon guidance, andchloride transmembrane transport, among others. positive the in involved genes for enrichment strongest the with functions, gene particular towards gene function.Analysiscovered of the GO terms thespecies-specific by showed SNPs astrong bias analysed we first end, To this models. gene within lying onSNPs analyses GOenrichment performed we processes, biological specific with associated amino acid stretchesgene inall models for stretc acid amino (ML) Methionine-Leucine internal harboring or with starting proteins encoding regions genomic for enrich could enzyme restriction this using Consequently, (L). leucine a by followed (M) methionine a to translates sequences coding In order determinevariableto if SNPs in Gene function under putativeselection ref_map.pl. using when filtering after loci SNP useable filter) quality (relaxed or27 filter) outgroup, surgeonfish our Indeed, identified. SNPs of number Acanthuridae),with a52-foldquality (strict 56-fold or filter) a (relaxed quality decrease filter) in the (i.e. family Chaetodontidae the of outside fish considered we when pronounced (excluding ref_map.pl using species of innumber the qualityfilter) decrease (relaxed C. austriacus for denovo_map.pl versus ref_map.pl using identified a Wenote 38%(strict quality 56% or filter) (relaxed quality filter) increase inthe number SNPs 13.5X). = coverage of depth (average ref_map.pl This by is protectedarticle reserved. Allrights copyright. and GOspecific SNP analyses enrichment approaches. RAD-seq other for consideration be animportant may which bias, observed the for account to analyses enrichment subsequent our for backgrounds test: (Chi-square bias enrichment suchof within genes our loci, thus identifying this a as sourcepreviously of the observed our with associated models gene of set the within ofcoverage =12.3X), and from 12.1X( 10.2X ( passing our filters based on the relaxed criteria (T and 10,257 ( across butterflyfish all species (range: to189,554 363,221), and between ( 476 for individuals between or within site (relaxed(strict8,842 quality 10,711 qualit or filter) and between 1,271 ( between the species. These shared thespecies. termsbetween include These C. melapterusC. Chaetodon fasciatus Chaetodon Chaetodon paucifasciatus Chaetodon . In contrast, In . was there a2- to6-fold decrease (strict quality filter), to12-fold or 4- , respectively,were that significantly enriched ( Chaetodon trifascialis Chaetodon P < 0.0001). 0.0001). Based < on this wegeneratedresult species-specific GO term SphI ) to 16.2X( recognition site (GCATGC) contains an ATG followed by a C, which in in which C, a by followed ATG an contains (GCATGC) site recognition C. austriacus C. C. austriacus C. ) loci per species passing our filters based on the strict criteria, criteria, strict on the based filters our passing species per loci ) Chaetodon paucifasciatus Chaetodon Chaetodon fasciatus Chaetodon C. austriacus identified 29 and 37 biological processes in processes biological and37 29 identified ) and) 13,539 ( C. austriacus ; Table 1). ; Table This downward was trend more SNPs SNPs identified the for remaining butterflyfish d calcium ion transmembrane transport, axon axon transport, transmembrane ion d calcium able 1). Average depth of coverage ranged from from ranged coverage of depth Average 1). able if SNPs were randomly distributed with respect to respect to with distributed randomly SNPs were if loci. This analysis revealed ahighly significant (Table 1). average, On were 262,247 loci there y filter) variable loci containing at least one SNP SNP one least at containing loci variable y filter) hes. We therefore analyzed the frequency of ML of thefrequency analyzed hes.therefore We and its closest relative, C. melapterus Chaetodon paucifasciatus Chaetodon and compared them to the frequency frequency the to them compared and ) to16.1X( Ct. striatus ) for denovo_map.pl (average depth depth (average denovo_map.pl ) for P < 0.01), of which <0.01), of were sharedsix (Table (Table closest1), the relative to Chaetodon paucifasciatus Chaetodon , had only quality(strict 8 C. melapterus C. Chaetodon trifascialis Chaetodon Ct. striatus ) loci per species) loci per C. austriacus C. , were , family , family ) for

) Accepted Article expected) the SNPs number of mapping to refe the (as that found We genome. own its across density SNP of distribution the track to order in scaffolds This by is protectedarticle reserved. Allrights copyright. to SNPs, and(3) an ofenrichment genes among sistertaxa butterflyfish to related calcium not distantly related on based species butterflyfish ability to the putative assign gene function SNPs across all species, (2) an advantage of using genomic resources for closely related but analysesindicate modest (1) a gap between the numbernon-annotated of versus annotated analysisusing a reference genome for a seriesof Red Sea andArabian Seabutterflyfish. Our We compared analysisof RAD-Seq data using a Discussion SNPs. representative produced approaches both that indicating genome, reference SNPs (i.e. 4%) identified with the denovo_map.pl for 385 only that we found Moreover, 2b). (Figure behavior social and transmission, synaptic transport, in found exclusively SNPs whereas 2a), (Figure behavior locomotory and morphogenesis, concentrations, calcium austriacus prol cell muscle cardiac of regulation positive guidance, neuron cell-cell adhesion 4c). (Fig. loci SNP shared with built tree consensus Likelihood Maximum by a revealed This phylogenetic consistent decay is with the relationships inferred among butterflyfish species 4b). (Fig. (~2%) homology of level lowest the had surgeonfish outgroup the similarity; and28% 19% between mapping butterflyfish the of rest the with respectively, 87% and of 86% level a at mapped that shows RAD data all for genome reference assigned varied among species (Fig. 4a). An SNPs number of absolute the that and genome, the of regions annotated versus non-annotated the of regions non-annotated versus annotated to surgeonfish, outgroup and the species, each butterflyfish of thes generality phylogenetic the test to In order overview Phylogenetic S2). Appendix (see them between GO enrichment in sets represent of SNPs above belowand thetoleranc kb; twoThese Fig. 3b). categories (i.e. 10SN > scaffolds with 10SNPs that> mapped to (Fig. them back despite 3a) large a mean scaffold size 150 (~ 0.71961; Fig. 3c). That said, scaffolds3055 had <10SNPs that mapped back tothemversus 115 We next mapped SNP loci identified from were predominantly were associatedwith genes involved inresponse toregulationcytosolic of C. austriacus C. C. melapterus C. scaffolds. foundWe that approximately 16% more SNPs were assigned to were enriched for functions involved in ion ion involved in transmembrane functions for were enriched , positive regulation of excitatory postsynaptic potential, and and potential, postsynaptic excitatory of regulation , positive estimate of the percent homology to the homology the percent to estimateof the iferation. Variable SNPs exclusively found in in found exclusively SNPs Variable iferation. C. austriacus C. Ps per scaffold versus < 10 SNPs per scaffold) perscaffold) SNPs <10 versus perscaffold Ps C. austriacus C. e analyses, we assigned we e analyses, assigned rence genome increased with scaffold size( de novo e threshold, respectively, with a clear difference difference clear a with respectively, e threshold, C. austriacus C. with to ref_map.pl gap-closed the and its andclosest relative, its approach without (i.e. a genome)to approach tofailed map tothe SNPs identified within identifiedSNPs C. melapterus C. C. austriacus C. R 2 = ,

Accepted Article stochastic placement of restriction cut sites (Davey (Davey sites cut restriction of placement stochastic This by is protectedarticle reserved. Allrights copyright. Myers (e.g. SNP “hotspots” as to refer we what here for genomicfuture studies. further createspotential the to elevateChaetodontidae the family togroup’ a‘model for mostgenetic the comprehensive polymorphism and estimatereef of dateto for any fish, transmembrane The binding. transport RAD-Seand reference genome is available from a closely related species per genome (as from is available species related a reference closely lackinggenomes), reference but thatdetailed analysisRAD-Seqdata of is improved whena provide valuable insight to population genomic questions for non-model organisms (i.e. those when when biasapparentin method the ofSNP isolation becan identifiedand accounted for. butterflyfish species we considered (average numberof annotated SNPs =1199; Wedid,Fig. 4a). the identified could SNPs assignedbe toannotated of genomeregions most the the for of of portion a large uncertainty, this Despite other. each of bp 81 within sitting SNPs multiple detect th for application downstream intended our loci, filtering option inStacksorder in complyto chromosomes notyetknown. therefore is Moreov (DiBattista map assembly of target gene rich regions (e.g. Roda Roda (e.g. regions rich gene target restriction that enzymes Restriction enzyme. choice of our stems from which apparently stretches), however, a strong detect with respectbias co to on between-speciescomparisons focused for studies be ideal would it such as and fish, reef of taxa sister other to extend likely would related butterflyfish species interms theof ability toassign a SNPs putative function. This advantage distantly not but closely related for genomic resources using to advantage a clear observed We missed. often are biases methodological of types these genome, reference available an Without 2013). can occur, which hasshown been tofurther bias theinference of geneticvariability (Gautier dropout allelic individuals, all) not (but some of cut-site the in occurred has mutation a if Moreover, here. shown we have as analyses subsequent in considered be to need that biases own their estimating connectivity adaptability and of fishes. However, such approacheslikely are tointroduce projects to application adirect also but function, or location spacing, known with genotyping provides an advantageresearchers for whowish may to select asmallerof number sites SNP for (Campbell the dataGO using annotationswithin sister taxa, whichvariable revealed that SNPs were case, the availability of the ( tuna bluefin Pacific the for shown Most

RAD tags are thought to be randomly distributed across the genome based on the the on based genome the across distributed randomly be to thought are tags RAD et al. C. austriacus 2014), have previously been used when mo when used been previously have 2014), et al. 2016a); the exact location of each SN , however, consists of gap-closed scaffolds and does not include a linkage linkage a include not does and scaffolds gap-closed of consists however, ,

Overall, ourresultsOverall, suggest that RAD-Seq inherently approaches C. austriacus within et al. Thunnus orientalis; orientalis; Thunnus 2013), or 2013), specific even primers flankingvariable SNPs genome sequence allowed for functional interrogation of of interrogation functional for allowed sequence genome monophyletic lineages. Similar advantages were recently wererecently advantages Similar lineages. monophyletic with population genetic assumptions of independent of independent assumptions genetic population with vered gene functions (i.e. proteins containing ML containing proteins (i.e. functions gene vered ese data, which further reduced our ability to reduced further which ese data, er, we enabled the “select one SNP per read” “select per read” we oneSNP enabled the er, et al. Pecoraro Pecoraro qdataset wepresent hereprovides thus et al. 2013), and indeed we found no evidence evidence no found we and indeed 2013), P relative to autosomal (or sex-linked) re specificity is required. This refinement refinement This required. is specificity re 2005; 2005; also see 3).Fig. genome The et al. 2015). example,For 2015). our in

Pecoraro Pecoraro et al. 2015) or 2015) et al.

Accepted Article for each sample and library preparationprotocol. speciesdifferences to in sample size, inaddition tostochastic factors such asvariable DNAquality among loci post-filtered pre-and of range large the attribute additionally We coverage. per locus ref_ma with reads discarding of probably a lower for results mapping better marginally haveachieved therefore former increased from 12.2Xto 14.9Xwhen mappingto the reference genome (Table 1). We may criteria, respectively) compared to respectively) and fewer restriction sites recovered versus 8812(5913 variable loci passing strict similarity, phylogenetic This by is protectedarticle reserved. Allrights copyright. Waldrop (see species two these for behigh genome must al. likely hybridize inregions of overlapon based simi Waldrop (see haplotypes DNA mitochondrial share Kya), (~50 recently diverged only have 4c), (Fig. other each to related closely are species these unusual, may appear result C. melapterus that Wenote the percentage mapping of the reads reference to genomewasactually for higher region. the in endemism of rates high underlying mechanisms the to insight some provide could particularly and waters, regional in present challenges chemical unique the handle to species allow adaptations genetic that hypothesis the test could region Sea Arabian and Sea Red the from fish additional of study Further optimal larval development in viafish ossification and gill formation (e.g. Malvezzi skeleton in , protective shells many for of the marine invertebrates,well as as promotes structural of formation the for process important an is seawater from calcium of assimilation The conserved. are functions gene which dictate may conditions these to adaptation that suggesting lower, much is Aqaba of Gulf the although 1982), (Krumgalz Ocean Indian and adjacent If we assume that Sea Red the in similar remarkably are ratios calcium/chlorine selection, under are functions these significant enrichment of sequence variation and specific gene functions. the genome werethat sequenced,not resulting consistent in a association and statistically arenot th theseSNPs isplausible that alternative A respectively. environments, Sea Arabian and Sea Red prevailing the to adaptations convergent were in both significantly overrepresented importantlywe thatgenes found associated withcalcium transmembrane transport and binding recognize mates, anddefend territoriesconspecifics from Boyle (e.g. &Tricas More 2014). bonds, pair maintain to need and monogamous are fish ofthese most that given butterflyfish for important highly are cues olfactory and visual interpreting and detecting that hypothesis represent plausible candidate genes for functional future interrogation.consistent This is with the fish reef specialized perception these for indicate might which processes, with neurological associated genes included functions These 2). (Fig. functions gene certain for enriched significantly 2015). Recent 2015). and/or on-going genetic exchange suggests that introgression across thenuclear (87.5%) versus the genomically enabled C. melapterus C. C. austriacus C. had lower levels of polymorphism (0.78%versus 2.3%, within C. austriacus C. or or , and yet the average depth of coverage for the for coverage the averagedepthof the yet , and p.pl based on increased homogeneity and higher higher and homogeneity increased on based p.pl emselves otherregions adaptive to linked of but potential differences in behavior or sensory sensory or behavior in differences potential lar behavior observed in nearby seas (DiBattista (DiBattista seas nearby in observed behavior lar between C. austriacus and and et al. the different species, and therefore therefore and species, thedifferent C. melapterus C. melapterus C. 2016). Moreover, despite (86.4%)(Fig. 4b). Although this Althoughthis 4b). (86.4%)(Fig. , suggesting potential potential suggesting , simply because we had we because had simply et al. et al. 2016), 2016), and 2015). 2015). et Accepted Article This by is protectedarticle reserved. Allrights copyright. Acknowledgements. studies. genomic comparative full for required of resources investment substantial the without bepossible therefore may mechanisms evolutionary of understanding our in advances Substantial processes. biological conserved assess as well as approaches mean important an remains this that but genomes, increasing, our results suggestthat increasing steadily is genomes available publicly of number the While studies. evolutionary-oriented of number a for promise holds likewise populations within SNPs of number large a of identification The rapid Conclusion Masonat Dream Divers inSaudi Arabia;Red the SeaState Governmentand TheRed Sea TourSpecialist handlinggene for Fouad NaseebandThabet AbdullahKhamis, as well as AhmedIssa AliAffrar from Socotra attheEnvironment Protection Authority (EPA) Socotra, andespecially Salah Saeed Ahmed, Socotra, insupport wekindly the thank Water Ministry and of of Environment Yemen,staff tofunds M.L.B., as well as aNationalGeographic SocietyGrant 9024-11 toJ.D.D. For Research(OCRF)No. underAward Funds specimen archiving at theCaliforniaAcademy of Sciences,Tane Sinclair-Taylor for acknowledgeRocha Luizcontributions from important David and Catania forassistancewith KAUST. Forassistance benchwith work at KAUST we thank CraigMichell. We also Priest,Mark Rocha, Luiz of members TaneSinclair-Taylor, and theEcology Reef Labat Richard Coleman, Michelle Gaither,Hobbs, Jean-Paul Jennifer Nanninga, McIlwain, Gerrit collections, wethank Tilman Alpermann,BrianBernardi, Giacomo Bowen, Camrin Braun, Agriculture andMinistry of Fisheries in Oman includingAbdul Karim. Forspecimen Core Lab the theCoastal Resources in Djibouti; and AmrGusti; and Marine KAUST and Prévot Younis; atDolphin M. and the DiversandScarpellini, Nicolas ofDelicrewM/V the Universityin Sudan, as well asEquipe Cousteau N.Hussey,including S. Kessel, C.

This This research was supported by the KAUST Office of Competitive ral logistics.ral Forlogistic supportelsewhere,we thankEric phylogenetic distance decreases the utility of of utility the decreases distance phylogenetic CRG-1-2012-BER-002 and baseline research s to identifys to apparent bias inSNP isolation Accepted Article Bailey R(2009) Campbell NR, Harmon SA, Narum SR (2014) Genotyping (2014) SR Narum SA, Harmon NR, Campbell This by is protectedarticle reserved. Allrights copyright. genomics. evolutionary and ecological for AndrewsGood KR,JM, Miller Luikart MR, G, Hohenlohe PAHarnessing (2016) the power of RADseq 2.22.0. Enrichment topGO: (2010) J Rahnenfuhrer Alexa A, References Sivakumar Neelamegam and Hicham Mansour for their assistance with Illumina sequencing. atlaterformatting figures stages,the andCore Bioscience KAUST Laboratory with restriction Or of history colonization recent and structure Catchen J,Bassham S, Wilson Currey M,Yeates T, O'Brien C, Q, (2013a)Cresko WA populationThe sequences. short-read from novo de loci genotyping and building Stacks: (2011) JH Postlethwait W, Cresko P, Hohenlohe A, Amores JM, Catchen Resources method genotyping SNP cost effective reef monogamousbutterflyfish. territorial a for cues olfactory and visual intruders: and mates of Discrimination (2014) TC Tricas Boyle KS, Ecology evolutionary history of sympatric sister Bernal MA, Gaither SimisonMR, WB, RochaLA Introgression (2016) and selection shaped the Biology Evolutionary fishes. feeding coral of rise the and (f:Chaetodontidae) of the history Bellwood KlantenDR, Cowman S, PF, Pratchett MS Publishing. . DOI: 10.1111/mec.13937.. , ‐ 15, site associated DNA associated site 855-867. Ecosystem Geography: From Ecoregions toSites Ecoregions From Geography: Ecosystem , 23, 335-349. ‐ sequencing. sequencing. based on custom amplicon sequencing. sequencing. amplicon custom on based ‐ species ofcoral fishes reef (: Nature Reviews Genetics Reviews Nature Molecular Ecology egon threespine stickleback threespine egon determinedusing Behaviour Animal , Konow N, van Herwerden L (2010) Evolutionary Evolutionary L (2010) Herwerden van ,N, Konow analysis for Gene Ontology. R package version version package R Ontology. Gene for analysis G3: Genes, Genomes, Genetics Genomes, Genes, G3: ‐ in ‐ Thousands by sequencing (GT sequencing by Thousands , . 2 22, nd , 2864-2883. 2864-2883. , 92, edition, New York, Springer Springer New York, edition, 17, 33-43. 81-92. Haemulon Molecular Ecology , 1, Journal of ). 171-182. Molecular ‐ seq): A Accepted Article (2009) Geneious v4.8. Available at: http://www.geneious.com/ A T,Wilson Thierer S, Stones-Havas R, Moir M, Kearse J, Heled M, Cheung B, Ashton AJ, Drummond Sea. Arabian the in provinces biogeographical marine hybr collide: provinces biogeographical When (2015) DiBattista JD, Rocha LA, Hobbs JP, S, He Priest MA, Sinclair-Taylor TH, Bowen BW, Berumen ML Biogeography of Journal Toonen M, Westneat BerumenRJ, ML On (2016c) DiBattista JD, Choat JH, Gaither MR, HobbsJP, Lozano-Cortés DF, Myers PaulayR, G, RochaLA, 439. Davey JW, Cezard T,Fuentes 499-510. This by is protectedarticle reserved. Allrights copyright. Sea. Red the in reef fauna water shallow for endemism of patterns Taylor Toonen TH, RJ, Westneat M, Williams MLS, ABerumen (2016b) reviewcontemporary of E, Sinclair- Salas P, Saenz-Agudelo V, Robitzch G, Paulay R, Myers M, Kochzius M, Kahil JP, MR, Hobbs DiBattista JD, Roberts M,Bouwmeester J, BowenCokerDF,Lozano-Cortés BW, DF,JH, ChoatGaither characteristics. ( the blacktail butterflyfish reeffish, RedSea an iconic of genome Draft (2016a) ML Berumen Aranda M, M, Piatek P, Saenz-Agudelo X, Wang JD, DiBattista genotyping. for implications data: Sequencing sequencing. next-generation using genotyping and marker discovery Davey JW, Hohenlohe Etter PD, PA, Boone JQ, Catchen JM, Blaxter MLGenome-wide (2011) genetic genomics. population for set tool analysis an Stacks: (2013b) WA Cresko A, Amores S, Bassham PA, Hohenlohe J, Catchen in the Baltic Sea herring herring Sea Baltic in the differentiation population cryptic of degree High (2013) J Merilä L, Cheng KK, Majander J, Corander tropical coral reefs. Cole AJ,Pratchett MS, Jones GP (2008) Diversity and functional importance of coral Molecular Ecology Resources Fish and Fisheries Molecular Ecology Molecular , Clupea harengus Clupea 43, ‐ 13-30. Utrilla P, Eland Gharbi C, K,Blaxter ML Special(2013) features of RAD ,

9, , . 286-307. 22, Molecular Ecology . Online Early, doi:10.1111/1755-0998.12588. 3124-3140. Molecular Ecology Molecular the origin of endemic species in the Red species theRed Sea. the origin in endemic of idization of reef fishes at the crossroads of of crossroads the at fishes of reef idization Chaetodon austriacus Chaetodon Journal of Biogeography of Journal , 22, , 2931-2940. 22, Journal of Biogeography of Journal NatureGenetics Reviews 3151-3164. ): current status and its its and status current ): , 42, 1601-1614. ‐ feeding fishes on fishes feeding , 43, ,

12, 423-

Accepted Article This by is protectedarticle reserved. Allrights copyright. Molecular Ecology e The A(2013) Estoup JM, Cornuet P, Pudlo C, Kerdelhué J, Foucaud T, Cezard K, Gharbi M, Gautier 1543-1557. and isolation geographic of signatures Genomic (2015) LA Rocha WB, SA, Simison Jones BW, Bowen RR, Coleman MA, Bernal MR, Gaither thousands of species-diagnostic SNPs using RAD sequencing. RAD using SNPs species-diagnostic of thousands of mapping and Discovery introgression: and Genomics G(2015) Luikart PA, Hohenlohe WH, Hand BK, Hether Kovach TD, RP,Muhlfeld Amish CC, SJ, Boyer MC, O’RourkeMiller MR, SM, Lowe short DNA sequences to the human genome. genome. human the to sequences DNA short Langmead Trapnell Pop B, M, C, Ultrafast Salzberg SL (2009) and memory-efficient alignment of Methods Nature KosuriLarge-scaleGMChurch, (2014) S, Krumgalz Calcium BS(1982) distributionworld intheocean waters. 9, ( Nassau in phylogeography Claydon JAB, Calosso MC, KS,Sealey Schärer MT,BernardiG Population (2014) structure and Nemeth YS, BX,de Mitcheson Semmens AM, Jackson e1000862. tags. RAD sequenced using stickleback threespine in adaptation of parallel genomics Population (2010) WA Cresko EA, Johnson N, Stiffler PD, Etter S, Bassham PA, Hohenlohe Ecology mapping. linkage dense for markers RAD using of pitfalls and benefits fishes: cichlid Henning F,Lee HJ, Franchini P, GeneticMeyer A (2014) mappingof horizontal stripesLake in Victoria In: fisheries. Lawton RJ, Pratchett DelbeekMS, Harvesting JC (2013) of butterflyfishes for aquarium and artisanal ff e97508. ect of RAD allele dropout on the estimation of genetic variation within and between populations. populations. between and within variation genetic of estimation the on dropout allele RAD of ect , 23, 5224-5240. Biology of Butterflyfishes Butterflyfishes of Biology , 11, , 22, 499-507. 3165-3178. Epinephelus striatus Epinephelus (eds: PratchettMS, Berumen ML, Kapoor BG), 269-291. natural selection incoral fishes. reef de novo de Genome Biology Genome DNA synthesis: technologies and applications. applications. and technologies synthesis: DNA RS, Heppell SA, Bush PG, Aguilar-Perera A, ), amass-aggregating marine fish. Current Current , 10, R25. Oceanologica Acta Oceanologica PLoS Genetics PLoS Molecular Ecology Molecular , 61, 146-154. Molecular , 5, , 6, 121-128. PLoS ONE PLoS

, 24,

,

Accepted Article This by is protectedarticle reserved. Allrights copyright. coral fishes( reef Picq S,McMillan WO, PopulationPuebla O(2016) genomics localof adaptation speciation versus in e37135. for method inexpensive an RADseq: digest Double (2012) HE Hoekstra HS, Fisher EH, Kay JN, Weber BK, Peterson to ocean acidification. (2015) A quantitative genetic approach to assess the evolutionary potential coastal of a marine fish H Baumann DD, Chapman CJ, Gobler D, Garant JD, DiBattista KA, Feldheim CS, AJ, Murray Malvezzi Database Liew Aranda YJ, M, Voolstra CR (2016) Reefgenomi yellow in inferences structure population for technique genotyping 2b-RAD of assessment Methodological (2015) A Cariani L, Bargelloni F, Tinti N, Bodin E, Chassot I, Zudaire H, Murua F, Arcoha J, Rooker J, Muir S, Ortega-Garcia B, Leroy C, Papetti R, Franch A, Villamor M, Babbucci C, Pecoraro genome. human the across and hotspots rates Myers S, Bottolo L, Freeman G, McVeanC, Donnelly (2005)P Afine-scale recombination map of USA Sciences of Academy National tuna. bluefin Pacific of genome complete the in genes pigment visual multiple Nakamura Y,MoriSaitoh K, K,Oshima K, Mekuchi SugayaM, etal. Evolutionary T, (2013) changes of ( Puebla O, BerminghamMcMillan E, WO Genomic(2014) atollsof differentiation in coral fishes reef shark: whale DH,Dove (2015)Draft AD sequencing and assembly of the genome world’s of the largest fish, the Webb CP, Haase JS, Vuong R, Bhimani M, Ahmad R, Weil MT, Alam SJ, Joseph RA, III TD, Petit Read seasonal succession Sea. of Red the PloS ONE, e64909. Raitsos DE, Pradhan Y, Brewin RJW, StenchikovG, Systematics Plant in Sequencing Next-generation (RADseq). DNA associated site restriction from history phylogenetic Inferring (2015) AL RH, Hipp Ree Hypoplectrus fi n tuna ( n tuna , 1-4. de novo de Rhincodon typus spp., Serranidae). Serranidae). spp., Thunnus albacares Thunnus Hypoplectrus SNP discovery and genotyping in model and non-model species. species. non-model and model in genotyping and discovery SNP Evolutionary Applications Evolutionary Smith 1828. spp, Serranidae). Serranidae). spp, Molecular Ecology ). , Marine Genomics Marine 110, 11061-11066. PeerJ PrePrints PeerJ Science , Ecology and Evolution 8, , 181-204. Hoteit I(2013)Remote sensing thephytoplankton cs.Org – a repository for marine genomics data. genomics repository marine for a cs.Org – , 352-362. 23, , Online Early. , 5291-5303. 310, , e1036. 321-324. , 6, 2109-2124. Proceedings of the the of Proceedings PLoS ONE , 7,

Accepted Article Bulletin of Marine Science Marine of Bulletin Institute. Studies Advanced Pan-Pacific the from Insight systems? marine in sequencing generation Silva, I,Matz, MV, Meyer, Santos, E, Seeb MD, LW, Seeb JE (2014) So, you want touse next- WA, Fernandez- ED,Cresko, KE, Crandall Carpenter DJ, PH, Barshis Barber Willette DA, FW, Allendorf Corallochaetodon (Subgenus butterflyfishes coral-feeding of andevolution structure population Phylogeography, (2016) BW Bowen ML, Berumen RK, Kosaki LA, Rocha JD, DiBattista JE, Randall JP, Hobbs E, Waldrop relationships in the Lake Victoria cichlid adaptive radiation. radiation. adaptive cichlid Victoria Lake the in relationships Genome O(2013) Seehausen A, L, Sivasundar Greuter S, Mwaiko OM, Selz S, Wittwer I, CE, Keller Wagner UNEP/Earthprint. Cambridge, Wabnitz C(2003) Biology Genome van Tetraodon Peer Y(2004) de genome confirms markers. RADseq on based Tariel J, GC,Longo G Tempo Bernardi and(2016) of speciation mode in Philippines. genomicsconservation toinform of functionally a reeffishimportant ( Stockwell BL, Larson Waples WA, RK, Abesamis LW, RA,Seeb Carpenter KE(2016) applicationThe of models. mixed and taxa of thousands Stamatakis A RAxML-VI-HPC:(2006) maximum likelihood-based phylogenetic analyses with anemonefishes. of ddRAD sequencing grad environmental along genetics Seascape (2015) Saenz lautus This by is protectedarticle reserved. Allrights copyright. Ortiz LH, Rieseberg P, Prentis PB, Pelser A, Lowe A, Schaul HL, Liu GM, Walter L, Ambrose F, Roda ‐ Barrientos D (2013) Genomic evidence for the parallel evolution of coastal in evolution forms of the GenomicBarrientos theparallel for evidence D (2013) ‐ complex. complex. Agudelo P, DiBattista JD, Piatek MJ, Gaither MR, Harrison HB, Nanninga GB, Berumen ML ML Berumen GB, Nanninga HB, Harrison MR, Gaither MJ, Piatek JD, DiBattista P, Agudelo ‐ wide RAD sequence data provide unprecedented resolution of species boundaries and and boundaries of species resolution unprecedented provide data sequence RAD wide Conservation Genetics Conservation , Molecular Ecology Molecular 5, ). ). From Ocean to Aquarium: The Global Trade in Marine Ornamental Species Ornamental Marine in Trade Global The Aquarium: to Ocean From Journal of Biogeography of Journal 250. , 90, Molecular Phylogenetics and Evolution 79-122. , , 17, 22, Bioinformatics Molecular Ecology 239-249. 2941-2952. , 43, 1116-1129. Takifugu ients in the Arabian Peninsula: insights from from insights Peninsula: Arabian the in ients , 22, 2688-2690. , 24, findings: most fish are ancient polyploids. polyploids. areancient fish most findings: Molecular Ecology 6241-6255. , 98, 84-88. Scarus niger Scarus Holacanthus , 22, 787-798. ) in the ) in angelfishes angelfishes (No. 17) (No. Senecio Senecio Accepted Article biological process, andcellular component. protein andGOinformation available. Gene ontology categories include molecular function, (b) 2Fig. This by is protectedarticle reserved. Allrights copyright. coding; non-annotateda wasSNP anySNP didnotmeet that this criteria. SNP was considered annotated if located in any region of the scaffold identified as protein relationship among species. Mapping parameters butterflyfish (and an outgroup surgeonfish), and (c) Maximum Likelihood (ML) phylogenetic Chaetodon austriacus annotatedplus non-annotated, is in brackets), (b)percentage ofreadsmapping the to 4Fig. lines. reddashed = interval tolerance lines; dashed =green deviations standard 2 + line; dashed protocol ddRAD a using size scaffold of a function genome, (b) frequency distributionof scaffoldmillion size (in base pairs), andnumber (c) as of SNPs Fig. 3 1Fig. enrichment analyses.All authors developed the manuscript and ofapprove the final paper. and M.J.P. produced and analyzedthe RAD libraries.X.W. and M.A. performedGO P.S.-S., J.D.D., study. RAD-Seq the designed and of conceived M.L.B. and P.S.-S., J.D.D., contributions Author doi:10.5061/dryad.f09rh. Dryad: from areavailable format .vcf in RAD reads Filtered http://caus.reefgenomics.org URL: browser Genome PRJNA292048. Raw RAD reads (FASTQ format) are available in the NCBI repository, BioProject: Data Accessibility C. melapterus (a) Frequency of SNPs per scaffold for for scaffold per SNPs of Frequency (a) Top tenhighest enriched( (a) Numberof annotated versus number non-annotated(total SNPsof SNPs, i.e. sites. collection and species study including Sea, Arabian and Sea Red the of Map based on produced SNPsannotated using addRADprotocol thathave reference genome using Bowtie for Red Sea and Arabian Sea resident resident Sea Arabian and Sea Red for Bowtie using genome reference P < 0.01) gene ontologies for (a) Chaetodon austriacus Chaetodon and “ref_map.pl” option in Stacks. Mean = black black = Mean Stacks. in option “ref_map.pl” and used for Bowtie were -n = 3 and –k = 1. A =1. –k and 3 -n = were Bowtie for used mapping assembled tothe Chaetodon austriacus and

Accepted Article This by is protectedarticle reserved. Allrights copyright. 1 Fig.

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 2a

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 2b

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 3a

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 3b

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 3c

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 4a

Accepted Article This by is protectedarticle reserved. Allrights copyright. Fig. 4b

Accepted Article 4c Fig.

(Eritrean butterflyfish) paucifasciatus Chaetodon (white-face butterflyfish) mesoleucos Chaetodon butterflyfish) (hooded Chaetodon larvatus (Red Sea butterflyfish)racoon Chaetodon fasciatus butterflyfish) (exquisite austriacus Chaetodon Species denovo_map.pl (Arabian butterflyfish) melapterus Chaetodon

AcceptedThis by is protectedreserved.article Allrights copyright. Article loci passing filter”. one) populations, which are represented bynumbers outside andthe insideparentheses, respectively, for of“# variable >0.05,thelocus 2) had to bein genotypedleast at 80%of individuals, and thewas3) locus present in all (or all but cases,12 individuals were sampled perPopulation population. included: filtering parameters 1) minor allele frequency outgroupbefore surgeonfish) qualityafter and filtering usin Table 1. Stacks results of ddRAD datanine for speciesof Red Sea andArabian residentSea (and butterflyfish an

Sample Sample size size 72 48 60 96 84 72 (Gulf of(Gulf Aqaba to Farasan South (Gulf of(Gulf Aqaba to Farasan South (geographic range of sampling) of(geographic range sampling) (Gulf of Aqaba to Djibouti) Djibouti) to Aqaba of (Gulf Number ofpopulationsNumber (Thuwal to Djibouti) Djibouti) to (Thuwal Djibouti) to (Thuwal (Djibouti to Muscat) Banks Banks 6 4 5 8 7 6 ) ) g denovo_map.plg ref_map.pl in Stacks.and all options In 100,273,743 289,504 13.7 106,207,151 54,334,800 85,686,503 97,774,030 93,749,728 # of reads # of reads used used

coverage coverage depth of of depth 16.2 11.9 12.2 12.6 10.2 av.

# of loci 363,221 218,077 273,706 298,347 262,590

(11,151) (11,151) (13,539) (13,539) (12,393) (10,711) (10,711) variable variable p (4,384) (4,384) (2,650) 10,257 10,257 10,028 7,608 7,608 2,275 1,343 8,842 8,842 assin # of # of loci g

14,493,353 13.3 50,966 50,966 13.3 14,493,353 184,187 14.9 85,591,171 42,029 12.1 20,137,784 18,726,021 16.1 44,073 44,073 16.1 18,726,021 64,641 13.1 23,585,269 91,785,299 13.4 194,235 194,235 13.4 91,785,299

# of reads used used coverage coverage depth of av. ref_map.pl ref_map.pl # of loci # of loci (10,780) (10,780) variable p (1,806) (1,806) (7,761) (1,145) (1,145) (2,179) 1,487 1,487 5,913 1,871 1,871 8,812 8,812 (697) (697) assin # of 568 955 loci g

Outgroup (chevron butterflyfish) (chevron trifascialisChaetodon (bluecheek butterflyfish) semilarvatus Chaetodon (horseshoe butterflyfish) (horseshoe Chaetodon pictus (striated surgeonfish) Ctenochaetus striatus

AcceptedThis by is protectedreserved.article Allrights copyright. Article

108 108 120 120 96 60 Abbreviation: av = average; SNP, single nucleotide polymorphism. (Gulf of Aqaba Masirah to Island) (Gulf of(Gulf to Al Aqaba Hallaniyats) (Djibouti to Masirah Island) Masirah Island) to(Djibouti (Jazirat Burqan to Djibouti) Burqan Djibouti) (Jazirat to 10 9 8 5 110,578,849 126,536,786 89,373,543 46,939,818

10.7 12.3 10.9 9.8

516,163 239,244 189,554 225,976

(1,508) (1,508) (1,271) (2,053) (2,053) (4,131) 1,270 1,270 2,073 415 415 476

25348 24 8,516 12.4 2,573,428 45,670 12.2 25,527,940 19,177,775 13.9 37,766 37,766 13.9 19,177,775 40,368 12.8 9,907,802

(331) (331) (338) (338) (759) (27) 239 552 74 8