Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. atrso o eeoyoiyadhg nreigdpeso,dsiecnieal mut fdetected of amounts considerable despite ( populations depression, dog endangered inbreeding domesticated and high di- from threatened and genome-wide introgression of heterozygosity of historical history low incorporation demographic of the the patterns enabled of has context biology the (Leroy conservation within into metrics genomics versity of integration The efforts. Introduction and management term and future long isolated to the into integral their insights are by provide that assembly, constrained genome populations sizes new pupfish to population our Sands compared effective from White tularosa generated tracts long-term contemporary inferences, C. smaller putative population in of revealed These lower because alignments markedly habitat. likely was limited Nuclear heterozygosity species, areas Genome-wide related (mitochondrial). genic from investigation. 0.022 near estimates further and variegatus, heterozygosity merit C. (nuclear) species increased that invasive 0.027 C. widespread introgression annotated of the were described of pattern 25,260 and MYA, previously tularosa a and C. ~1.6-4.7 the between exhibited 1.4Mb distances diverged of and of Genetic Many which genes N50 regions. of intergenic orthologs. near contig in assembly conserved those or a novo to Actinopterigii within with compared de expected fell genome The the markers draft of microsatellite species. 1.08Gb 95% tularosa related a of including among in possibility genes, diversity resulted the coding genomic Specifically, reads evaluated protein compared error-corrected tularosa. Units, and C. II Significant of genome, Evolutionary Sequel conservation quality tularosa PacBio demarcate and high C. evolution to a the the used developed into into we conservation previously insights introgression Herein, of markers biological withdrawal. provide is microsatellite groundwater to America, localized it and North we use pollution, Southwestern and chemical in genome species, Mexico reference invasive New draft to to part endemic in tularosa), due (Cyprinodon concern pupfish Sands White The Abstract 2020 24, November 4 3 2 1 DeWoody Black Andrew of patterns heterogenous differentiation and diversity tularosa) low reveals (Cyprinodon genome pupfish Sands White Endangered The ouainadgnm-iepten fvraiiycneeg hnwoegnm aaaeue compared used are (Fischer data loci genome microsatellite whole few when a emerge genetic from a can quantify variability genotypes from to of to data microsatellites) patterns genome-wide provide (e.g. and that markers long-term population tools genetic of neutral predictor on for putatively (Wan key relied diversity anonymous, critical a historically of are is have number diversity and geneticists limited genetic dynamics conservation overall population persistence, While historical population actions. into management insights contemporary key guiding provide can assessments genome-wide ea n nvriyCleeStation College University M and A Texas Copenhagen of University University Auburn University Purdue tal. et 1 08.Freape hl-eoesqec nlsso rywl ( wolf Grey of analysis sequence whole-genome example, For 2018). tal et 1 an Willoughby Janna , 04 n dniycnevto nt Mrt,19) oee,ipratdffrne in differences important However, 1994). (Moritz, units conservation identify and 2004) . 2 naBr¨uniche-Olsen Anna , .lpsfamiliaris lupus C. tal. et 1 07.Seicly eoisoesamr reliable more a offers genomics Specifically, 2017). 3 G´omez-Sanchez; ra Pierce Brian , ai lupus Canis 4 n Andrew and , tal. et 08.Such 2018). revealed ) Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. xrs epaePe i . a sdo eoi N hae oama rgetsz f3K and 30Kb of size University the fragment at mean II at a Sequel center to PacBio a sheared genomics of DNA core (SMRTcells) genomic cells the real-time on in single-molecule SMRTBell used 8M library The was two chemistry. Illumina 2.0 using 2x151bp sequenced the with kit NovaSeq prepare Prep Illumina Template an to using long Express used IN) PacBio Lafayette, was (West for University requirements protocol Purdue integrity from PCR-free the extracted until TruSeq extract was meet freezer to The DNA to -80@C used weight NY) a was molecular (Ithaca, in high kit Genomics deposited preparations, Tissue sequencing. Polar and library read and by ethanol PacBio Blood tissue the 95% DNeasy muscle For in skeletal Qiagen stored tissue. a fin W), preparations, from 54” library DNA 6’ Illumina 106deg ( For N, male processing. 2.88” heterogametic 54’ deceased (32deg a River 2018, September In Sequencing and Construction Library Collection, Sample Methods conservation species. and the imperiled of Methods this assessments of high-resolution potential future evolutionary enable and to infrastructure status genomic and order groundwork the the genome- evaluate among we the represented Finally, evaluate genomes sequences. we (mtDNA) (Heterozygosity, mitochondrial Next, diversity and of (nuDNA) wide classification. genomes nuclear ESU from to at assembly in collected distances this markers genetic pupfish compare “neutral” Sands and be new White ESU) tularosa to the Creek a leverage assumed Salt for then the previously assembly We of microsatellites genome part fishes. (a draft related population new other River population a Lost unforeseen evaluate Upper and and the future present any we against safeguard Herein, of to management established and and ESU classification (Stockwell Creek the extirpations Salt in the aid by to habitats. founded two developed these (Iyengar been between ESUs have native these differences markers two these salinity microsatellite between by differences additional morphologic driven nine and putatively genetic the ESUs, of Creek recognition Since Salt and Collyer and Spring Furthermore, mitochondrial, Malpais (ESUs). Units allozyme, Significant and Using Evolutionary al. two Fish et demarcated Creek. U.S. and Salt populations the native and by two Mexico the Spring species New Malpais endangered Stockwell the markers, possible by locations: microsatellite a threatened distinct as as of ecologically IUCN, review two populations by under native at Endangered is Ancestral as and Service. listed Fish, Wildlife is and species Game The of against ( Department Mexico. pupfish hedge Sands New a White artificial Southern as the of or on or focuses listed facilities populations, paper breeding present wild species (Koike 2020). The captive declining populations pupfish (IUCN augmenting in natural vulnerable 12 or of maintained or currently extinction reestablishing commonly threatened imminent for are are the as stock there species identified provide complex, pupfish others to many pupfish refugia 14 (e.g, this, and fishers the of recreational endangered Among by Because highly critically bait them or 1997). as makes endangered Echelle, used which as those and 1981), Nearc- as Echelle (Miller, Many (Black such springs ; loss congeners 1995). desert habitat introduced Wildekamp, 1979), isolated with (Deacon, 1986; small bridization withdrawal ( Miller, to groundwater fishes and restricted to salinity-tolerant (Smith are susceptible and distribution species temperature Neotropical of pupfish genera and tic differentiation 10 Nearctic population of broad and group a diversity a are of pupfishes patterns The (Lehmann detailing species image endangered resolution among high and within a achieve to approach 20)dmntae usata dpiebd hp aito ewe niiul olce rmthe from collected individuals between variation shape body adaptive substantial demonstrated (2005) eoefreiec fitorsinvahsoia neseichbiiainb siaigpairwise estimating by hybridization interspecific historical via introgression of evidence for genome tal. et tal. et 04.Adtoal,torfg ouain MudSrn n otRvr were River) Lost and Spring (Mound populations refuge two Additionally, 2004). 1998). H tal. et for ) .tularosa C. 19)fudeiec o infiatgntcdffrnito between differentiation genetic significant for evidence found (1998) Cyprinodontiformes .tularosa C. tal. et .tularosa C. n opr hsdvriyetmt osvrlohrfish other several to estimate diversity this compare and 2 wsp-4 .tularosa C. tal. et 09 Shingate 2019; sebyt eemn h eoi oiinof position genomic the determine to assembly h vrrhn olo hswr st lay to is work this of goal overarching The . a olce rmteuprraho Lost of reach upper the from collected was ) 08 Martin 2008; eefis ouetddrn h al 1900s early the during documented first were ,wihi nei oteTlrs Basin Tularosa the to endemic is which ), tal. et tal. et tal. et 2020). 06 n neseichy- interspecific and 2016) 06 Black 2016; .tularosa C. Cyprinodontidae tal. et .variegatus C populations, 2017). with ) C. Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. ompigteefaue oteasml ihbat(..0 Gertz (v.2.10; blast with assembly the from to obtained features were these mapping sequences to cDNA related 54,466 and from 2019) used Consortium, were sequences transcriptome transcriptome and maculatus protein and ( assembly, Minnow protein nuDNA Sheepshead translated the species: of annotate alignments help 2) To local and 2008). assembly 1) the performing: to by sequences accomplished then removed were were tation proteins known matching regions ( tularosa and 2008) library Hubley, the the and in Smit from identified (v.1.09; were families repeatmodeler element with transposable bly curation, genome draft After Annotation Genome Mitochondrial and Hahn Nuclear (v.1.8; mitobim with the assembly alongside genome used mtDNA and 2010) Hall, and ( Quinlan pupfish Hole mtDNA Devils al. the the Patil into et to (NUMTs) aligned (v.0.1; Translocations that coalqc Nuclear reads of Illumina used incorporation first inadvertent we the assembly, prevent help of To sequence reads. mtDNA complete The by genes. evaluated core was 3,640 the Completeness and of assembly). assembly set the the a Seppey in between alignments genes v.4.0.6; target local (busco; genome of performing curated Orthologs (proportion The comprehensiveness Single-Copy and removed. Universal genes) were Benchmarking species for non-target from scanned al. derived then been was have Challis (v.2.1; to assembly identified separate blobtoolkit2 using were as Roach contaminants that assembled (v.1.0; for those were evaluated purge_haplotigs then that program itera- were the error-correction haplotypes sequences using total heterozygous haplotype removed three coverage and so High identified polishing, were S1). short-read contigs (Figure of conducted iterations several were Zimin after minimap2 tions polca plateau the (v.3.41; with using to MaSuRCA correction assembly began assembler error quality optimal Chin additional the for (v.2.3; with the used arrow compiled were to (v.3.2; short-reads with script Illumina back calling quast controlled quality mapped consensus iteration, the by then polishing, followed depth to were 2018), each SMRTcells lead Li, After can both (v.2.11; reads from excessive errors. subreads as base-call PacBio assembly, of for accumulation depth the optimal contigs the to consensus Gurevich determine create due to to assemblies 90x) used subpar was and wtdbg2 (50x program levels the coverage genomes, fuzzy mtDNA a and to from Shen nuDNA over (v.0.12; both converted seqkit of were program assembly SMRTcells the two using the command removed v8.0 were from SMRTLink were files identifiable the that mapping using clipping reads format alignment processed by binary any 2015) subread discarding Following Krueger, PacBio and <20), adapters. length. (v.0.6.5; in (Phred sequencing galore 30-nucleotides bases of trim than quality absence less low with / removing filtered presence adapters, were the Illumina (v.0.11.7; reads and fastqc of assessment, distributions examination score quality visual by quality conducted 2017) was sequences Andrews, short-read Illumina of assessment Quality Assemblies Genome and Control Center). Quality Biotechnology Carver J. (Roy Illinois of 09 oqatttvl sessgnm opeeesb vlaigsrcua nert eg,full:partial (e.g., integrity structural evaluating by completeness genome assesses quantitatively to 2019) 06.Teresulting The 2016). rnpsbeeeetlbay ln ihteZbas ( Zebrafish the with along library, element transposable tal. et .Aogaltreseis 056poensqecswr onoddfrom downloaded were sequences protein 70,576 species, three all Among ). eBruijn de 03 a sdt eeaesmaysaitc eg,N0 oietf h pia depth. optimal the identify to N50) (e.g., statistics summary generate to used was 2013) ± 0ncetds ihPoEcue v12 Campbell (v.1.2; ProtExcluder with nucleotides) 50 rp v22 unadL,22) aBoraswr u-ape ttodifferent two at sub-sampled were reads PacBio 2020). Li, and Ruan (v.2.2; graph bam .variegatus C. binitio ab l a hncnetdbc to back converted then was file .tularosa C. eepeitoswt h ae ieie(..1 Cantarel (v.2.31; pipeline maker the with predictions gene ,Zbas ( Zebrafish ), bam2fastq tal. et .diabolis C. a eosrce sn h lae luiapaired-end Illumina cleaned the using reconstructed was 00 n atos(..7 Li (v.1.17; samtools and 2020) 3 tal. et tal. et .diabolis C. n octntd ed 1bo 5K nlength in >50Kb or <1Kb Reads concatenated. and .rerio D. ulmtgnm sabcbn o the for backbone a as mitogenome full Actinopterigii_obd10 06 ro oasml.T eeaea generate To assembly. to prior 2016) 2013). tal. et ai rerio Danio n h otenPays ( Platyfish Southern the and ) tN seby(C004.;Lema (NC_030345.1; assembly mtDNA ) tal. et fastq Ensembl 07.Cagst osnu assembly consensus to Changes 2017). 06 n cenn lgmnswith alignments screening and 2006) l omtwt etos(v.2.29.0; bedtools with format file tal et tal. et (Cunningham eBs irr,gn anno- gene library, RepBase ) 03.Floiglong-read Following 2013). . 04.Uigtecustom the Using 2014). .tularosa C. aaae hc contained which database, tal. et tal. et swiss-prot tal. et 09 oextract to 2009) tal. et 08.Primary 2018). uN assem- nuDNA Xiphophorus 09 prior 2019) .tularosa C. 00 and 2020) (UniProt fastq enovo de tal. et file C. et Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. eeec eoe eeaalbea h ieo u analyses, our new of our time compare the to ( at sought pupfish available Amargosa then were we genomes framework, reference comparative a provide sa to and perspective For metrics Assembly the assemble to used S2. methods the of representation RefSeq anno- the functional to to prior compared removed, redundancy Tillich and merged (v.1.84; then were the models with gene tation the The 2004). with Korf, 2019) (v.0.15; Stanke, 2005). and Birney, Hoff and Slater (v.2.2.0; exonerate ( ( Guppy HE swl spplto ieetain eal aaeeswr sdfrteHEts Goand (Guo test HWE the for (Goudet Expectations Hardy-Weinberg used G-test from were exact departure parameters 1995). log-likelihood for Raymond, Default the (v.4.6; tested differentiation. and were population 1992) loci as two Thompson, well Stockwell these from as at results obtained the genotyped (HWE) plotting were ESU) and genotypes / times WSP-11), (N=10 1,000 & model the (WSP-2 associated permuting demarcation data by found evaluated our microsatellites and was with of associated significance microsatellite all with data cases, given using only one both a using regressions, In between another two gene. distances and conducted loci of we microsatellite distribution Results), the non-normal (see with (H gene the model annotated of linear nearest a Because the using R. fit the base each then and were in homozygosity at gene observed function diversity their predicted (N=11), nearest genetic loci the microsatellite of to these distance estimates between microsatellite relationship prior the a ESU, and if each For in) ( (or alleles found gene of was coding (H number observed it protein the of predicted that as estimates nearest feature includes microsatellite This the the each marker. to For gene, microsatellite regions. Kb) a genic (in within neighboring distance Integra- to occurred the in proximity reviewed report the manually determine we then to marker, were (IGV) lengths Viewer amplimer Genome and tive sequences microsatellite mapped All primers. sequences microsatellite al. full Rice each the (v.6.31; from primersearch end, 2009). distances this To Durbin, the Iyengar gene(s). determine from functional gene to obtained functional suspected assembly a nearest were genome on the our selection to leveraged with we locus microsatellite associated 1974), hitchhiking Haigh, genetic and to Smith (Maynard subject be might markers (Stockwell regions microsatellite control D-loop mtDNA and zyme define to work Earlier using the NCBI using Identification obtained from were Microsatellite downloaded files were (SRA) Archive species 2020. Read Illumina all August Sequence (i.e., and for 15 site data on assembly for FTP sequence GenBank) methods each (or shotgun same for RefSeq the raw the using used and galore, assemblies minimum data trim a genome with sequence using Draft cleaned raw quast and and The obtained busco were 5Kb. by Biosample) assessed of were length completeness annotation sequence and statistics assembly rence), utoudlslimnaeus Austrofundulus 19)pie e nthe in set primer (1998) eoeasml oohrpbihdreference-enabled published other to assembly genome glt ( ggplot2 oclareticulata Poecilia ..2 ika,21) oeaut etaiya h w irstliemresue nESU in used markers microsatellite two the at neutrality evaluate To 2016). Wickham, v.3.32; UniProtKB tal. et .nevadensis; C. 07,vsaie ihoda (Lohse ogdraw with visualized 2017), .tularosa C. A .tularosa C. C_0267.) o l i eoe icuigtenew the (including genomes six all For GCF_001266775.1). ; .tularosa C. tec ou (Stockwell locus each at ) aaaeadbat.TeasmldmDAsqec a noae ihgeseq with annotated was sequence mtDNA assembled The blastp. and database GCF_000633615.1), ; tal. et Sswsbsdo uaieynurlmcoaelt aa ln ihallo- with along data, microsatellite neutral putatively on based was ESUs .rerio D. 20)adaindt the to aligned and (2004) GCA_000776015.1) seby(C089.)wt lt v3;Kn,20) o visual a For 2002). Kent, (v.35; blatn with (NC_028292.1) assembly seby eurn eomsac aefrbt owr n reverse and forward both for rate mismatch zero a requiring assembly, tal. et and binitio Ab 00 a sdt eemn h oaino ahStockwell each of location the determine to used was 2000) .maculatus X. .rerio D. tal. et 4 .tularosa C. tal. et eepeitoswr aeuigAgsu (v.3.3.2; Augustus using made were predictions gene . he other Three 98.T epguewehrayo hs ‘neutral’ these of any whether gauge help To 1998). tal. et Cyprinodontiformes 98 Iyengar 1998; obs GF027251 n h nulKillifish Annual the and (GCF_002775205.1) .tularosa C. riigst swl storud fsnap of rounds two as well as sets training n xetdhtrzgst (H heterozygosity expected and ) .variegatus C. tal et 03 n rti eunesmlrt was similarity sequence protein and 2013) uN n tN eoe,seFigure see genomes, mtDNA and nuDNA Cyprinodontiformes 96,a mlmne ngenepop in implemented as 1996), . eoewt w v077 iand Li (v.0.717; bwa with genome obs tal. et ~ tal. et GF003551 n the and (GCF_000732505.1) Distance+ESU w cffl-ee pupfish scaffold-level Two . 2004). 19) h 0samples 20 The (1998). .tularosa C. eeas included, also were fastq .tularosa C. < sn the using ) lsfreach for files 0bfo a from 50Kb seabove). (see exp .tularo- C. sra-toolkit swell as ) refe- lm et Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 9K) eutn n209cnis lbol2wste sdt dniynntre N eune nthe in sequenced DNA non-target identify length to (total used artifactual then as was 12 two and blobtools2 the 3.5Mb) contigs. purge_haplotigs from length 2,009 subreads program (total PacBio The haplotigs in using as polishing. resulting assembly polished short-read 596Kb), the then of N50 from was iterations a contigs assembly with three 332 The contigs, discarded using 39%. 2,606 corrected among of error content 1.09Gb were and yielded GC SMRTcells there wtdbg2 a ~40x. filtering, assembler of and coverage the quality 1,357Mb genome level, After of coverage estimated 90x an reads. with optimal paired-end a total) the 151-bp of Using yielded with (99.6% platform million reads remaining sequencing 286 reads million llumina controlled among among The quality 16 remained coverage). data million 100Gb among (~100x 284.9 sequence 5-50Kb, assembly data raw between genome sequence of reads with extracting use 43Gb of After for worth 22Kb. reads 165Gb of million length 14.3 yielded read cells polymerase SMRT N50 mean II Sequel PacBio Two Assemblies Genome and Control Quality was intention Results Our megax. of with review comparisons. estimated pairwise systematic was our a sequences perform mtDNA the to species not between using (K2P) node (Robinson distances available phlo.io genetic each using rendered at and constraints format time within as al. replicates used et bootstrap were 100 rates using model fixed substitution Jukes-Cantor a (Kumar using megax generated (v.2.1;Thompson was for clustalw matrix two using stance analysis; conducted phylogenomic then for was used sequences other were from sequences five mitogenome , full eight Therefore, group. The (NC_028292.1). included (C. also pupfish was River species: Red following tularosa the the for and NCBI from KM985373.1) downloaded were diabolis 2020 August C. all 15 species, of pupfish as among ble relationship matrilineal Kimura, the (K2P; compare were differentiation. To parameter genomic distances Kimura-2 pairwise genetic the of alignments, patterns using package evaluate length) pairwise R to the in local with (>2Kb Following generated sequence 1). 1980) -max_target_seqs aligned -culling_limit 3, using each 1, -qcov_hsp_perc genome between masked 3, -max_hsp estimated (-perc_identity repeat 50, congeneric parameters -evalue each non-default 1, against following performed the then with were blastn alignments local the Pairwise using sections 2000). 30Kb into ( genome each genome breaking reference available an with sis), species with pupfish visualized related For were results divergence The genomic >20. and mitochondrial quality and removed base Nuclear were and 30 length of in quality mapping >100Kb minimum Scaffolds Korneliussen (v.093; S1). angsd (Table using analyses downstream from tec iei ahrfrnegnm,gna v130 Pockrandt (v.1.3.0; Smit (v.4.07; genmap the repeatmasker genome, mismatches. two reference to up each allowing in site each at ( heterozygosity Genome-wide diversity Genomic .rerio D. n o context for and 08.EtmtdmDAdvrec ie eete noae oteN re xotdi Nexus in exported tree, NJ the to annotated then were times divergence mtDNA Estimated 2018). tN sebyrpre een h rvosyarchived previously the herein, reported assembly mtDNA (KX061747.1), eBs irr sarfrne ie ihmpaiiy< n eetdrgoswr excluded were regions repeated and <1 mappability with Sites reference. a as library RepBase tal. et Cyprinodontidae .maculatus X. 08.Mlclrtm siae eeotie rmTmte (Kumar Timetree from obtained were estimates time Molecular 2018). .nvdni amargosa nevadensis C. H a siae o six for estimated was ) tal. et n notru.Mlil euneaineto ullnt mitogenome length full of alignment sequence Multiple outgroup. an and GF027251,hmlgu uN ein eeietfidb first by identified were regions nuDNA homologous (GCF_002775205.1), ape Cyprinodontidae 04 sn h noddst rqec pcrm(F) eurn a requiring (SFS), spectrum frequency site unfolded the using 2014) .maculatus X. v541 Paradis (v.5.4.1; tal. et rubrofluviatilis splitter 06.I diint raigatmte,ma pairwise mean timetree, a creating to addition In 2016). 5 tN seby(C017.)wsue sa out- an as used was (NC_011379.1) assembly mtDNA K833.) h eetpps ( pupfish Desert the (KU883631.1), Cyprinodontiformes hlgn,btt rvd vltoaycnetfor context evolutionary provide to but phylogeny, ucinipeetdi mos(..0 Rice (v.6.60; emboss in implemented function tal. et tal. et tal. et C092) nadto otenew the to addition In NC_009125). ; 05 a hnue oietf eet using repeats identify to used then was 2015) .tlrs,C variegatus C. tularosa, C. 04.Rslswr iulzdwith visualized were Results 2004). Cyprinodontidae tal. et 94 n egbrJiig(J di- (NJ) Joining Neighbor a and 1994) .tularosa C. 00 a u using run was 2020) is,t eemn mappability determine to First, . reltime-ML .variegatus C. eSqmDAassembly mtDNA RefSeq tN eoe availa- genomes mtDNA ggplot2. H ucin(Tamura function and .macularius C. tal. et a calculated was (KT288182.1), 100-mers .nevaden- C. .tularosa C. 07 and 2017) ggplot2 tal. et and C. ; Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 1.Dvriyvle mn l i pce lutae that illustrated ( species 0.00003049 six to all ) among values Diversity S1). ( heterozygosity Genome-wide diversity Genomic differentia- of levels (p-values=0) tularosa significant C. illustrated F loci (WSP-2, these at tion G-test exact An (p-values=0.195-1.00). (Stockwell demarcation H R to p-value=0.953; related SE=0.103, negatively was p-value distance genic SE=0.092, that SE=0.082, found Estimate=0.265, we Intercept: heterozy- 50Kb, 2A; reported (Figure previously ESU R microsatellite had or value=0.850; mapped between which gene were relationship loci nearest three significant 11/16 the and p no to Using genes distance suggested 2). more and (Table regression or heterozygosity one comprehensive gene of any our 10-100Kb from estimates, between gosity >100Kb located location were five a gene(s), to a of characterize 10Kb to within used the previously in microsatellites cated 16 the of Each Identification Microsatellite to the than similar over lower 1%), improved moderately F was values, 95%; these (C upon the based in completeness, 95%) Genome the (Table (C 1%). the 1.36Mb genes (F Using single-copy of fragmentation complete inherent 1). N50 3,452 Table identified a N50=31.5Mb; busco and size=35.3Mb; orthologs, 8.12Mb contig chromosome-level of max the 5Kb size (contigs=97; than than contiguous contig less greater max but (those reticulata N50=0.0843Mb) a contigs size=4.51Mb; 1.086Mb, contig 1,995 max of on (contigs=4,093; new length statistics the total assembly Thus, a genome, 1). showed draft length) pupfish similarityin Sands sequence White A the region. For control one and genes, metrics rRNA Assembly 2 of genes, two release tRNA the genome 22 between current published search polypeptides, other the 13 with in of consistent annotated also ordering were genes annotations coding these that 23,019 curated the the in with genes variegatus congruent coding generally protein 25,260 is of which coordinates genomic the identified maker Annotation Genome subsequently annotation. was Mitochondrial to and and assembly prior Nuclear primary file the sequence within nuDNA contig1125 with the match from identity removed 100% a had published assembly other NA to length the in of similar assembly The short-reads. of lumina sequence mtDNA complete The as assigned not were fasta (68Kb) contigs five ‘no-hits’, Overall, libraries. pupfish < .maculatus X. 0.001; l otiig204cniswste sdfrdwsra analyses. downstream for used then was contigs 2,004 containing file Arthropoda Distance seby(otg=4;mxcni ie4.M;N03.M)o the or N50=31.4Mb) size=46.3Mb; contig max (contigs=843; assembly GF003551.Fntoa noaino h new the of annotation Functional (GCF_000732505.1). eoe(al 2). (Table genome .tularosa C. 2 ST 005 =.5;D=,9.Uo iiigorrgeso omcoaelt oita were that loci microsatellite to regression our limiting Upon DF=2,19). F=0.050; =0.005; .maculatus X. seby( 66;F03) qiaett h chromosome-level the to equivalent 0.3%), F 96.6%; (C assembly < .nevadansis C. 030 S-1 F WSP-11, =0.350; .tularosa C. siae000 E000 p-value=0.780; SE=0.000, Estimate=0.000, : or , 0.001; .tularosa C. tal. et Chytribiomycota eoeasml.Trewr on ihna nrno ee v eelocated were five gene, a of intron an within found were Three assembly. genome Distance .variegatus C. 2 032 =.29 F21) ohmcoaelt akr sdt nomESU inform to used markers microsatellite Both DF=2,15). F=3.3249; =0.302; 98 ofre oHryWibr xettoswti hi epcieESU respective their within Expectations Hardy-Weinberg to conformed 1998) H sebyi oecniuu hntescaffold-level the than contiguous more is assembly ,wihw oewsdrvdfo ihyibe ie Estimated line. inbred highly a from derived was note we which ), a hnetmtdfralsxrfrneenabled reference six all for estimated then was ) seby( 0;F1% iue1). Figure 13%; F 70%; (C assembly ioeoe revealed mitogenomes Cyprinodon .tularosa C. siae-.0,S=.0,p-value=0.022; SE=0.003, Estimate=-0.008, : ST .tularosa C. n eesbeunl lee u fteasml.Tecurated The assembly. the of out filtered subsequently were and ) 064 n eelctdnab Actinopterygii .tularosa C. .Our ). and ) C. ) Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. eune n D) ncnrs,nDAgnmsaesilicmlt vnwe osdrn high-quality considering when even incomplete tRNA still (e.g., biological are regions the genomes constrained represents nuDNA to likely contrast, compared and variance In well-known D-loop) K2P is the ND4). the assembly (e.g., and and mtDNA regions sequences high entire hypervariabile relatively the to mappability; is inherent variation sequence mappability of sequence consequence so a annotated likely is in variance (Willoughby reported alignments in higher (K2P=0.006-0.137) those mtDNA 4-5x to to relative equivalent were (K2P=0.01-0.05) roughly herein distances reported pupfish genetic values in vs differentiation with range nuDNA) part, a demonstrated to most with Willoughby (relative species 5) the divergence (Figure For related divergence mtDNA MYA. between taxonomic in 0-10 with comparisons of increased pairwise times the 4B) divergence sequences, (Figure within distance mtDNA waterways genomic two and that these nuDNA between both differentiation Using genomic of population population. levels refuge moderate evaluate River demonstrated (F to Lost (Salt (2017) sub-population required River Stockwell populations Lost is and two Lower research Heilveil and these However, Upper between the divergence difference of River). between little structure translocation genomic Lost was As there little Upper and apart. and expect 2010), years (Carman, Creek we 50 2008 genomes, since than basis less mitochondrial semi-annual sampled a between and on occurred ESU have two same individuals these the 40 from between with collected estimate distance individuals distance genetic between genetic quantify The to assembly larosa MSB:Fish:96664). mtDNA (http://fishnet2.net/; our in 1983 used largely also ber was assembled we and tree, previously 5) phylogenomic (Figure a the node to of each addition history at In evolutionary support the bootstrap with 100% provided agreement phylogeny tests based neutrality mtDNA resolution twoOur High the divergence with genomic ESUs. associated considerable mitochondrial two adaptation add and the local herein Nuclear into define provided insights helped data genomic genomic the that of future assessments but studies provide our neutrality, may microsatellite and microsatellite earlier selection of natural to tests of context formal test weak not we a are Indeed, only provenance is fashion. This neutral Stockwell WSP-11; strictly Expectations. & Hardy-Weinburg the processes a (WSP-02 available selective within in publicly that were regions evolved data suggesting intergenic have proximity, genotype and genic not genic and may between heterozygosity they between differ suggesting correlation negative (3/16), a gene found a of intron an S nteWieSnspps.W ucsflyietfidtelcto fal1 irstliemreswithin markers microsatellite 16 all of monitor location and the define identified successfully to We used the previously pupfish. microsatellites Sands of White the coordinates in genomic ESU the identify to was intention whole Our future in metrics diversity Identification population-level Microsatellite to and need individual- the of studies. losses. outweigh re-sequencing examination would future gene-pools, genome and 2016), for differentiated separate planning (Arnold, / coverage two ESUs careful but adapted insufficient between the the certainty, locally with inoculations protect for homogenizing reciprocal policy and of required small preserve insurance cost are of potential an act populations the the to help Creek if through population similar to Salt unclear source is approach be and it the River may one Furthermore, effectively, Lost be population so (Rails the may refuge do population of rescue) this recipient to surveys genetic the However, genomic than (i.e., variation population flow genetic 2009). gene more Excoffier, considerably other local and needs the facilitated (Petit If level, heterozygosity genome diversity. increase whole genetic populations the of source their levels at replicate original to strive maintain should populations and refuge perspective, management genetic a From .nevadensis C. .tularosa C. ioeoe a setal eo(2=.09 iue4) hc swa n ol xett observe to expect would one what is which 4B), Figure (K2P=0.0009; zero essentially was mitogenomes tal. et rf eoe ayo hs oiwr ihrwti 0bo rdce ee 51)o inside or (5/16) genes predicted of 10Kb within either were loci these of Many genome. draft 00)rgos ent htKPvraiiywssbtnilylwri uN alignments nuDNA in lower substantially was variability K2P that note We regions. =0.06) 22)a ohnDA( nuDNA both at (2020) .tularosa C. ioeoeta a ore rmteSl re S nDecem- on ESU Creek Salt the from sourced was that mitogenome .tularosa C. .vreau sC nevadensis C. vs variegatus C. Cyprinodontidae Ss hs r osrainise htwl require will that issues conservation are These ESUs. 8 .tularosa C. eg,Echelle (e.g., .tularosa C. eoe oee,frtetolc where loci two the for However, genome. tal. et ST .2)wihsget htfuture that suggests which 0.024) = 00)admDA( mtDNA and =0.04) 98,w on odvainfrom deviation no found we 1998), tal. et ouain r shomogenous as are populations tal. et tal. et 05 Sa˘glam2005; .tularosa C. 00.Mr extensive More 2020). 00.Ti disparity This 2020). .variegatus C. tal. et ESUs. 2016). .tu- C. Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. nrw,S 21) atC ult oto olfrhg hogptsqec aa vial online Press Available University Oxford data. Exchange. sequence Genetic With throughput Divergence high (2016). for M.L. tool Arnold, control for quality group a lab FastQC: http://www.bioinformatics.babraham.ac.uk/projects/fastqc at: DeWoody (2017). the S. of Andrews, members thank We part in Agriculture. supported References was and project. JAD Food this (W9126G-12-2-0019). throughout for Engineers feedback of Institute constructive Corps National U.S. the the by by funded was research This think we Overall, conservation. the pupfish could regimes). Acknowledgements of to studies that parasite contribute evolutionary or regions and they ecological salinity divergent hope future ultimately, different benefit or will and to concern) analyses adaptation and represent to conservation resources could (e.g., assembly new major that these our selection contigs (a used of outlying introgression also targets identify We to represent help due genes. to but randomly, functional homogenized species distributed annotated related regions not heterozygosity from from is microsatellite divergence of markers distance of genome-wide runs genomic “neutral” evaluate of analyses large to suite relative encompass retrospective this partitioned and/or determine Our at and diversity is populations to genetic diversity inbreeding). other warranted population from recent the are that individuals to surveys indicates to to genomic due related extends population (e.g., heterozygosity analyses that homozygosity of and exploratory lack pupfish, potential the of its other if number illustrate including broadly a more species, to performed related and assembly we our of resource, to divergence context community add To a fishes. as of for group genome this annotated an for available of draft broader first the the present to We resource significant pupfishes. a the provide with herein working Conclusions extreme presented those fish an Echelle analyses especially compromised’ (e.g., example, and biologists, ‘genetically For population data conservation any original the of fish. cull the community think to into desert we back is endangered Thus, samples populations this ‘representative’ 1997). fish manage translocate desert we or managed reintroduce how of in and alter evolution approach influenced may utilized this it often how but because and introgressions, the important of within is introgression patterns , of genomic tracts the potential of identification these less, reject) (or confirm to suited variegatus between alignments tus sequence pairwise all of variegatus of possibility C. distribution the the evaluating raises Furthermore, fully this have gence. Alternatively, not 4A). may the (Figure estimates homogenized divergence nuDNA sequence our from that of introgression suggesting contemporary patterns sequences, true 0.0269) the = Brown captured distance by the (K2P nuDNA between reported higher and distances first well-recognized genetic the rates with of mutation pairwise agreement estimates in in mitochondrial increase divergence, 1.9-2.2x of nuDNA a to levels demonstrated (R compared results divergence correlation comparisons, sequence positive pairwise data mtDNA a 5/6 nuDNA For showed 4C). for rates (Figure provisional divergence comparisons are nuDNA divergence and genomic mtDNA of Both estimates K2P distances sequences. genetic absolute mtDNA underestimated this Thus, to necessarily but relative data. and matches, nuDNA regions spurious mappable the reduce homogenized in to certainly designed almost was we strategy means our Furthermore, assemblies. chromosome-level KPrne=.04024.Hr,w ugs htcniswt erzr 2 ausmyrepresent may values K2P zero near with contigs that suggest we Here, =0.0004-0.264). range (K2P nrgesdrgos(iue4) oee,etniesmln n esqec aawl ebetter be will data re-sequence and sampling extensive However, 4A). (Figure regions introgressed lutae ihyvratmsi fhmlgu lcsbetween blocks homologous of mosaic variant highly a illustrated .tularosa C. .tularosa C. efudta eeoyoiyi u sebyi tiigylwrltv to relative low strikingly is assembly our in heterozygosity that found We . eepo n e oti euto nnDA(n ehp tN)diver- mtDNA) perhaps (and nuDNA in reduction this to led and pool gene .variegatus C. .tularosa C. noa into .tularosa C. 9 .tularosa C. and .variegatus C. n n hti saogtebs assemblies best the among is it that find and tal. et ouains,wihcudhv partially have could which population(s), 17) oee,w on similar found we However, (1979). tN KPdsac 0.022) = distance (K2P mtDNA 2 .tularosa C. 09)aogalsxpairwise six all among =0.99) .tularosa C. eoe Neverthe- genome. Cyprinodontidae and .tularosa C. .variega- C. .tularosa C. tal. et and C. Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. odt . amn,M,dMes . ose,F 19) etn ieetaini ili populations. diploid in differentiation Testing (1996). F. Rousset, T., deMeeus, Genetics M., Raymond, population. J., wolf grey Goudet, declining a in T., admixture Marques-Bonet, and R., inbreeding :3599-3612 Carrasco, extinction: C., to statistics Ensenat, path Composition-based the N., BLAST. (2006). Sastre, of F. I., module S. Olalde, TBLASTN Altschul, G´omez-Sanchez, D., the A., Sch¨affer, improving A. R., searches: Agarwala, nucleotide K., translated Y. and Yu, M., of in E. variation effects Gertz, SNP genetic and microsatellite pupfish: of Owens comparison Genomics M. empirical the an Leuzinger, of – C., Conservation differentiation Rellstab, (2013). M.C., P. Fischer, B. (2005). May, extirpations. A. and S., Meyer, translocations Parmenter, multiple T.E., J., Dowling, A. R.A., Finger, Cyprinodontidae). Bussche, (Teleostei: Den Cyprinodon Van genus pupfish A.F., new-world :320-339 the Echelle, Study of Case E.W., biogeography A Historical Carson, Non-natives: by A.A., Taxa Minnow. Endemic Echelle, Sheepshead of Introgression and Genetic Pupfish Springs (1997). West. Leon F. the with A. Echelle, of A., fishes A. Echelle, threatened and Endangered (1979). :41-64 E. J. Deacon, W. Akanni, estab- P., :D745–51 recently Achuthan, and F., native ( of Cunningham, pupfish divergence Sands Morphological White (2005). of C., C.A. populations data. Heiner, Stockwell, lished sequencing J.M., J., SMRT Novak, Drake, M.L., long-read A., Collyer, from A. assemblies Klammer, genome P., microbial Marks, methods finished H., quality Nonhybrid, BlobToolKit–Interactive D. Alexander, (2020). (2013). M. S., NM. Blaxter, C. Fe, & Santa Chin, G., Fish, Cochrane, and assemblies. Game J., genome Mexico Rajan, of New assessment E., 2009. Richards, report, R., status Challis, pupfish Sands White (2010) genomes. B., S. organism Moore, Carman, E., model Ross, emerging G., for Parra, designed M., :188-196. pipeline management, S. annotation creation, Robb, rapid easy-to-use I., the an for Korf, kit L., tool B. a annotations. Cantarel, MAKER-P: genome (2014). plant of al. control et quality C., and Holt, M., Law, M.S., DNA. mitochondrial Campbell, animal of B evolution Society Rapid Royal (1979). the A.C., review of Wilson, A M., (2016). George, M. W.M., Itzkowitz, Brown, K., Little, A., Bloch, ( morphologic T., pupfish Paciorek, restoration. and springs L., genetic Leon Al-Shaer, the Rapid L., of J. Snekser, (2017). N., B. A. P. Black, Samollow, pupfish, Springs M., Leon endangered C. the Hollenbeck, of . populations A., wild and H. captive between Seears, divergence N., A. Black, oeua Ecology Molecular 144 10 18: qai osrain aieadFehae Ecosystems Freshwater and Marine Conservation: Aquatic :563-569 :1933-1940 9https://doi.org/10.1186/s12864-016-3459-7 69 26 76 :2237-2256. 967–1971 : yrndnbovinus Cyprinodon 3 ee,Gnms Genetics Genomes, Genes, G3: rnatoso h mrcnFseisSociety Fisheries American the of Transactions yrndntularosa Cyprinodon tal. et tal. et ogtr osrainsrtg n epnet habitat to response and strategy conservation long-term ) ln Physiology Plant 10 21) siaiggnmcdvriyadpopulation and diversity genomic Estimating (2017). osrainBiology Conservation 21) neb 2019. Ensembl (2019). ). 164 10 Copeia tal., et :1361-1374 :513-524 ra ai auaitMemoirs Naturalist Basin Great 26 2005 :410-416 11 adl,M 20) MAKER: (2008). M. Yandell, uli cd Research Acids Nucleic :153-161 rbdpi halleri Arabidopsis :1-11 M Biology BMC eoeresearch Genome tal et oeua Ecology Molecular yrndnbovinus Cyprinodon 142 tal. et ,Tre,S W. S. Turner, ., :1430-1443 4 Proceedings 21) On (2018). :1-14 Copeia Nature . BMC , 18 27 47 3 2 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. i .(08.Mnmp:piws lgmn o uloiesequences. nucleotide Next- for alignment (2018). pairwise J. Minimap2: L.,Wang, Waits, (2018). concern. H. A., conservation Li, Strand, of A., populations J. within DeWoody, erosion W., genetic Applications M. monitoring for Bruford, metrics L., generation E. Carroll, reference G., clownfish chromosome-scale Leroy, Berumen, orange A S., Genes: the Foret, Nemo’s of K., Finding Mineta, genome (2019). :570-585 H., the T. Ohyanagi, Gojobori, of C.T., M., assembly Michell, Aranda, C., D.J., Schunter, Miller, M.L., D.J., Lightfoot, the R., of pupfish characterization Lehmann, and Hole Sequencing Devils (2016). endangered Simons Cyprinodontidae) the H. tiformes: Lee of Timetrees, L., genome Timelines, Senger, mitochondrial B., for complete Wilson, Resource P., A Kevin, TimeTree: S.C., Lema, (2017). S.B. Hedges, Genetics Times. M., Evolutionary Divergence Suleski, Molecular and G., X: Stecher, MEGA S., (2018). Kumar, K. Tamura, C., platforms. Knyaz, computing M., across quality Li, Analysis apply G., consistently Stecher, to S., FastQC 516-517 and Kumar, files Cutadapt FastQ around to tool trimming wrapper Sequencing adapter A galore. Generation and Trim Next (2015). of F. Analysis Krueger, ANGSD: (2014). R. Nielsen, Data. A., Albrechtsen, genomes. success T.S., novel complex. of Korneliussen, in pupfish finding analysis desert Gene DNA the (2004). for Microsatellite I. management (2008) Korf, refuge P.A. of Bussche, years den 33 Van after com- Conservation diversity D., through genetic Loftis substitutions conserving base A.A., in of Echelle rate H., evolutionary Koike estimating for sequences nucleotide method of simple studies A parative (1980). M. tool. Kimura, alignment BLAST-like in BLAT–the markers (2002). microsatellite ( W.J. of pupfish Kent, Characterization Sands (2004). White P.A. the Morin, species, D., threatened Layfield, a A., C. Down- Stockwell, https://www.iucnredlist.org A., Iyengar, 2020-2. augustus. Version 2020 with Species. July Threatened genomes 09 of on List single loaded Red in IUCN The genes 2020. Predicting IUCN (2019). M. Stanke, Bioinformatics J., K. Hoff, species. two fish for fragmentation protected habitat a and https://doi.org/10.1007/s10641-017-0591-4 translocations of of units signatures significant Genetic evolutionarily (2017). C.A. Stockwell, J.S., Heilveil, approach. mapping genome iterative genomic for from and directly tool genomes baiting e129-e129 assessment mitochondrial reads-a Reconstructing quality sequencing (2013). QUAST: B. next-generation Chevreux, L., (2013). Bachmann, C., G. Hahn, Tesler, N., Vyahhi, multiple for V., assemblies. proportion Saveliev, Hardy-Weinberg of A., test Gurevich, exact the Performing (1992). A. alleles. E. W.,Thompson, S. Guo, M Bioinformatics BMC Biometrics Bioinformatics 11 11: 65 :1066-1083 361-372 321-329 :e57 oeua ilg n Evolution and Biology Molecular iohnra N atB Part DNA Mitochondrial 29 15 :1072-1075 :356 oeua ilg n Evolution and Biology Molecular ora fMlclrEvolution Molecular of Journal . yrndntularosa Cyprinodon M Bioinformatics BMC mhpinpercula Amphiprion 11 1 doi:10.1093/molbev/msx116 :705-707 niomna ilg fFishes of Biology Environmental eoeRes Genome ). 5: 35 oeua clg Notes Ecology Molecular Bioinformatics . yrndndiabolis Cyprinodon earch 16 59 :1547-1549 oeua clg Resources Ecology Molecular :111-120 uli cd research acids Nucleic 12 :656-664 urn rtcl in Protocols Current 34 :3094-3100 100: (Cyprinodon- Evolutionary 4 :191-193 631-638 Animal 41 19 : Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. ..(06.Pyoeeisspota nin omnoii ftosinicios eisHl n Devils and Hole Devils icons: scientific two of origin common ancient pupfish. an Hole support Phylogenetics (2016). M.R. wtdbg2. with Large assembly of Sa˘glam, long-read Comparison accurate and and Fast Viewing Interactive (2020). https://doi.org/10.1038/s41592-019-0669-3 H. : Li, Phylo.io J., for Ruan, reassignment (2016). contig Web C. the allelic Dessimoz, on Trees D., Haplotigs: Phylogenetic Purge Dylus, O., (2018). Robinson, A.R. Borneman, assemblies. genome & diploid S.A. third-gen Schmidt, M.J., Roach, ie . oge,I,Besy .(00.EBS:TeErpa oeua ilg pnSoftware Open Biology Molecular European The EMBOSS: (2000). A. Bleasby, ecumeni- and I., tests Suite Longden, exact for P., features. software Rice, genomic genetics comparing population for 1.2): utilities (version GENEPOP of cism. suite (1995). flexible M. Raymond, a BEDTools: (2010). I.M. Bioinformatics Hall, genome of A.R., computation Quinlan, ultra-fast GenMap: (2020). K. Reinert, delimitation. C.S., tree mappability. Iliopoulos, species M., forest and Alzamel, C., flow to Pockrandt, Gene Application CoalQC (2009). data: L. (2020). Excoffier, genomic :386-393 N. J., Vijay, from R. C.G., histories language. Petit, Kushalappa, demographic R B.N., inferring in Satish, evolution while S., genomes. and control Raghavendra, phylogenetics Quality S.S., of - Shinde, analyses A.B., APE: Patil, (2004) K Strimmer J, Bioinformatics Claude New Sons, E, and Paradis Wiley (Eds) conservation. for J. units’ D. significant ‘evolutionarily Soltz 9 Defining Southwest R.J., (1994). American Naiman C. the deserts, Moritz, in American Cyprinodon) Wiley North (genus O. in pupfishes fauna, E. Fishes and fish and York In deserts its Hocutt of from 94-101). H. Coevolution inferred C. (pp. as (1981). Fishes. Basin R. Freshwater Grande R. York American Miller, Rio New North Sons, the of and of Zoogeography Wiley evolution The John The (eds.). In: (1986). earth. Valley: R. 457–485. on Death R. range in p. Miller, species survival L., smallest Diabolical M. Sciences the (2016). Biological Miller, in H. B: assimilation L. Society genetic Royal Simons, and & the flow J., of ceedings B. gene Turner, colonization, E., pupfish J. sets. recent Crawford, data H., expression C. visualizing Martin, generat- and for genomes tools mitochondrial of suite and OrganellarGenomeDRAW–a plastid Research (2013). of R. maps Bock, physical S., R., and ing Kahlau, Durbin, format O., ,G., (SAM) Drechsel, Abecasis alignment/map M., Lohse, Sequence G,. The Marth, (2009). N,. Homer, Subgroup J., Processing SAMtools, Ruan, Data T., Project Fennell, Genome A., Wysoker, 1000 B., Transform. Handsaker, Burrows-Wheeler H., with alignment Li, read short accurate and Fast (2009). formatics R. Durbin, H., Li, :373-375 rnsi Genetics in Trends ora fHeredity of Journal .. amtie,J,Sih .. iae-aeae . ihl,AL,ORuk,SM,Miller, S.M., O’Rourke, A.L., Nichols, J., Linares-Casenave, M.J., Smith, J., Baumsteiger, I.K., ˙ 41 bioRxiv 25 Bioinformatics o:10.1093/nar/gkt289 doi: Bioinformatics :1754-60 oeua Ecology Molecular 26 20 :841-2 :289–290 16 86 25 36 :276-277 :248-249 oeua ilg n Evolution and Biology Molecular :2078-9 :3687-3692 25 :3962-3973 M Bioinformatics BMC 12 8 doi:https://doi.org/10.1098/rspb.2015.2334 : 283 19: 460 33 :2163-2166 rnsi clg Evolution & Ecology in Trends rnsi clg Evolution & Ecology in Trends auemethods Nature uli Acids Nucleic 17 :155-158. Bioin- Pro- 24 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. cit sdt efr sebyadcmaaiegnmcanaly- genomic comparative and assembly perform to Contributions Author used ses: Scripts https://www.ncbi.nlm.nih.gov/sra/?term=PRJNA670474 assemblies: genome and sequences DNA interests. competing Accessibility no Data have they that with declare wheat, authors bread All Hybrid of progenitor S.(2017). a Salzberg, tauschii, J., Interests Aegilops Dvorak, Competing of J.A., genome Yorke, repetitive S., algorithm. highly Koren, mega-reads and T., the large Zhu, the M.C., of Luo, Predictors Submitted assembly D., (2020). vertebrates. Puiu, J.A. congeneric A.V., DeWoody in world. Zimin, R., divergence the Bylsma, genome of S., mitochondrial fishes Mathur, and Cyprinodontiform M., nuclear oviparous Sundaram, of the A.M., Harder, of J.R., Atlas Indiana Willoughby, Mishawaka, killies: Association, of Killifish world American A II. Vol (1995). H. R. Wildekamp, genetics conservation which for (2016). marker genetic H. Which Wickham, (2004). G. S. Fang, T., Fujihara, issue? H., Wu, H., knowledge. Q. protein Wan, (2017). of S. hub worldwide Greiner, a and UniProt: R. :D506-D515 Bock, (2019). genomes. A., Consortium. organelle Fischer, UniProt of E.S., annotation Ulbricht-Jones, accurate T., and -versatile Pellizzer, GeSeq P., Lehwark, matrix weight M., and Tillich, progressive penalties estimating of gap sensitivity for position-specific the weighting, improving method sequence W: choice RelTime CLUSTAL through alignment the (1994). T.J. sequence of Gibson, multiple foundation D.G., Higgins, Theoretical J.D., of Thompson, units rates. (2018). evolutionary significant S. variable evolutionarily from Kumar, two times Q., divergence for Tao, evidence K., Genetic Tamura, (1998). gene. A. favourable pupfish. Jones, a Sands of M., White effect Mulvey, hitch-hiking C., The Stockwell, (1974). compari- J. www.repeatmasker.org sequence Haigh, at Available biological M., J. for Open-1.0. Smith, RepeatModeler heuristics (2008). of R. generation Hubley, Automated A., Smit, (2005). E. Birney, C., of son. assembly S. genome G. Chromosome-level Slater, (2020). B. Venkatesh, ( B.H., crab Tay, horseshoe A., coastal Prasad, the V., Ravi, P., Shingate, Annotation and manipulation. Assembly Genome Assessing Shen BUSCO: Prediction. Gene NY (eds) (2019). York, M. E.M. New Kollmar Zdobnov, In: M., Completeness. Manni, M., Seppey, https://doi.org/10.5061/dryad.p8cz8w9nt M Bioinformatics BMC , . e . i . u .(06.Sqi:acospafr n lrfs oli o AT/ file FASTA/Q for toolkit ultrafast and cross-platform a SeqKit: (2016). F. Hu, Y., Li, S., Le, W., Electrophoresis uli cd Research Acids Nucleic LSONE PLoS 25 glt:EeatGahc o aaAnalysis Data for Graphics Elegant ggplot2: nmlConservation Animal :2165-2176 6 doi:10.1371/journal.pone.0163962 ahpesgigas Tachypleus eoeResearch Genome :31 11 47-0https://doi.org/10.1093/nar/22.22.4673 :4673-80 1 ). 1 :213-225 oeua clg Resources Ecology Molecular :066100 oeua ilg n Evolution and Biology Molecular 13 ehd nMlclrBiology Molecular in Methods uli cd Research Acids Nucleic pigrVra e York New Springer-Verlag . O:10.1111/1755-0998.13233 DOI: . eeisResearch Genetics uli cd Research Acids Nucleic 35 o 92 Humana, 1962. vol , :1770-1782 45 :W6-W11 23 :23-35 47 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. S-5------PP 5--14BACE2 FLRT2 124 59 - GRID1 - - 355 49 - LECTIN C-TYPE 35 YPEL1 UKP 303 - FAM177A1 56 PCP4 1 RIMS1 - 2 - - 39 - - 167 - TRIM59 - ATOH1 ------2 - - - 3 - 4 0.00 - 0.49 - 0.49 - - 0.00 - - 0.53 0.53 - - - - 2 - WSP-27 6 WSP-21 - 3 WSP-35 WSP-25 WSP-29 WSP-26 0.00 WSP-23 - 0.81 WSP-32 - 0.12 WSP-24 WSP-22 WSP-31 WSP-30 0.00 WSP-33 0.70 WSP-34 0.13 WSP-20 + WSP-11 + WSP-02 Microsatellite (i.e., feature associated the “UKP”. to Markers as according labeled gene. + closest classified are the are function to unknown area upstream of genic or genes downstream a (Kb) within distance fell genic that the and population (N=10) Creek (H Heterozygosity tularosa C. 2 Table (%) *GC Size Total size contig Max L50 Contig N50 Contig contigs # Level Accession The NCBI. length. in and authors. nucleotides A.N.B all study. from 1. input the Table with of paper design the and wrote collection J.A.D. sample Tables: and A.N.B the to data. contributed the analyzed JAD A.B.O and B.L.P. J.R.W, A.N.B, ++ al. 1998 irstliemresue in used markers Microsatellite H obs H , ++ ++ ++ ++ ++ ++ ++ ++ . exp us sebysaitc o h six the for statistics assembly Quast eoe ahrwcrepnst n irstliemre,teosre (H observed the marker, microsatellite one to corresponds row Each genome. ei rxmt opeiul ulse irstliemreswti h el annotated newly the within markers microsatellite published previously to proximity Genic and .001 .93----3 MGA UKP 36 ENDOU PSD 505 - PARVA PRP2 1 18 - - - RERE 75 - 100 - - - - - 143 - - - 757 - - - - - 2 CBX2 9 NEUROD4 - ELOVL6 24 - 42 3 631 2 TEAD1 CDK16 4 TENM2 2 9 0.19 0.51 0 ENO1 3 2 2 0 0 0.50 0.51 0 5 0.51 0 0 0.40 0.39 0.60 3 4 0 2 0.50 2 3 Gene 0.19 2 0 0.53 2 Kb 0.19 5 0.39 Gene 0.20 Feature 0.40 0 0.34 0 0.20 Kb 0.66 0.50 Gene 0.20 Creek Salt 0 Creek Salt 0.50 Creek Salt Spring Malpais Spring Malpais H Spring Malpais A obs exp ol vial o ohppltos bandfo Iyengar from obtained populations) both for available (only .tlrs .vreau .nvdni .lmeu .rtclt .maculatus X. reticulata P. limneaus A. nevadensis C. variegatus C. tularosa C. 93 83 939 11 39 97 704Mb 35.3Mb 11 39 31.5Mb 728Mb 853 46.3Mb Chromosome 175 31.4Mb Chromosome 38 848Mb 11.19Mb 3,687 3,062 1.13Mb Scaffold 0.957Mb 0.750Mb 38 19,582 0.0087Mb Scaffold 1,026Mb 360 4.51Mb 39 1,086Mb 4,093 8.12Mb 0.843Mb 236 Scaffold 1.363Mb 1,995 Contig Pending ,adnme fAlls( Alleles of number and ), .tularosa C. .tularosa C. C_0720. C_0761. C_0267. C_0631. GCF_002775205.1 GCF_000633615.1 GCF_001266775.1 GCA_000776015.1 GCF_000732505.1 H exp sebyi e,teohrfieasmle eedwlae from downloaded were assemblies five other the new, is assembly S lsicto acltdfo eoye nStockwell in genotypes from calculated classification ESU A Cyprinodontidae tta ou ihnteMlasSrn N1)o Salt or (N=10) Spring Malpais the within locus that at ) 14 H A assemblies , eotduigcnis[?]5,000 contigs using reported obs tal. et intron 2004 .Poencoding Protein ). H obs exp ,expected ), et A a ptemUsra ei ei ontemDownstream Downstream Genic Genic Upstream Upstream nrnLRFNII Intron Intron Intron C1A70FANCL IFT80 700 BCL11A 36 SMC4 - - 3GRIN2C 43 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. new the 1 Figure Legends: Figures n ei itne )Uigol akr oae ?5K rmnihoiggnsrvae significant a revealed genes neighboring from [?]50Kb located Iyengar R from markers ( obtained only correlation Using negative B) (p-value=0.022) (R distance. relationship (p-value=0.780) genic H non-significant and a between H showed relationship (lm= analysis The model gression linear a Creek). using Salt tested or Spring pais predicted closest 2 Figure n iegnetms(Y)aelbldudrtete.Etmtddvrec ie eegnrtdusing generated circles were black times by divergence signified Estimated are tree. (Kumar support the TimeTree bootstrap under 100% labeled of are the grey (MYA) Significance solid times new divergence the sequences. the and and mtDNA designate divergence, to full mtDNA used of is to alignment star nuDNA pairwise A of 5. six taxa. ratio the Figure among 1:1 for correlation a divergence K2P depicts mtDNA the scat- line and is c) red nuDNA line distances; dashed K2P mtDNA mean The and dis- nuDNA between between comparisons. both distances signify correlation had mtDNA genetic the red which aligned mean between illustrating comparisons colored distances matrix, pairwise terplot Points genetic signify triangle K2P boxes left mean Black comparison. matrix, Upper triangle each signifies sequences. b) right Lower line for percentiles; red sequences. <0.01 windows nuDNA Dashed or >30kb aligned >0.99 across sequences. were value reference that nuDNA K2P tances aligned nuDNA between mean (K2P) the distances 2 Kimura Pairwise 4 Figure (Korneliussen angsd in implemented as inbred spectrum, frequency site al. unfolded the et using species each for estimated 3 Figure Actinopterigii reltime-ML .tularosa C. 04.Ardsa sue odsgaetenew the designate to used is star red A 2014). .maculatus X. eei iest (H diversity Genetic . eoi siae fhtrzgst ( heterozygosity of estimates Genomic . ula n iohnra enpiws eei itne among distances genetic pairwise mean mitochondrial and Nuclear . . egbrjiigte osrce sn h ue-atrmdl olwn utpesequence multiple following model, Jukes-Cantor the using constructed tree joining Neighbor UC eut o nlsso eoecmltns mn six among completeness genome of analysis for results BUSCO ucin(Tamura function assembly. .tularosa C. aaaecnann ,4 oeotoos tradbl otaeue odsgaethe designate to used are font bold and star A orthologs. core 3,640 containing database tal. et ie eeoyoiyin heterozygosity line, tal. et 07.Ardsa sue odsgaetenew the designate to used is star red A 2017). 20)adStockwell and (2004) ee h hp fec irstlielcsdpcstesuc S (Mal- ESU source the depicts locus microsatellite each of shape The gene. OBS tal. et OBS tec irstliemre N1)adtedsac K)t the to (Kb) distance the and (N=11) marker microsatellite each at ) Dsac+S) )Fralaalbemcoaelt akr,tere- the markers, microsatellite available all For A) ~Distance+ESU). 08 rmcntandtm siae mnmx bandfrom obtained (min/max) estimates time constrained from 2018) 2 03)btenH between =0.30) .tularosa C. tal. et H 15 .tularosa C. o h six the for ) sgetyrdcdcmae orltdspecies. related to compared reduced greatly is (1998). OBS OBS S n h itnet lss eewere gene closest to distance the and ESU , seby ihteecpino h highly the of exception the With assembly. n ei itne H distance. genic and Cyprinodontiformes .tularosa C. 2 005 ewe heterozygosity between =0.005) .tularosa C. Cyprinodontiformes Cyprinodontiformes assembly. assembly. obs examined. siae were estimates using , H a) : was Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 16 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 17 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 18 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 19 Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. 20