<<

Posted on Authorea 24 Nov 2020 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.160621932.29583857/v1 | This a preprint and has not been peer reviewed. Data may be preliminary.

The Endangered White Sands pupfish genome (Cyprinodon tularosa) reveals low diversity and heterogenous patterns of differentiation

Andrew Black 1, Janna Willoughby 2, Anna Brüniche-Olsen 3, Brian Pierce 4, and Andrew DeWoody 1

1 Purdue University
2 Auburn University
3 University of Copenhagen
4 Texas A and M University College Station

November 24, 2020

Abstract

The White Sands pupfish (Cyprinodon tularosa), endemic to New Mexico in Southwestern North America, is of conservation concern due in part to invasive species, chemical pollution, and groundwater withdrawal. Herein, we developed a high quality draft reference genome and use it to provide biological insights into the evolution and conservation of C. tularosa. Specifically, we localized microsatellite markers previously used to demarcate Evolutionary Significant Units, evaluated the possibility of introgression into the C. tularosa genome, and compared genomic diversity among related species. The de novo assembly of PacBio Sequel II error-corrected reads resulted in a 1.08Gb draft genome with a contig N50 of 1.4Mb and 25,260 annotated protein coding genes, including 95% of the expected Actinopterigii conserved orthologs. Many of the previously described C. tularosa microsatellite markers fell within or near genes and exhibited a pattern of increased heterozygosity near genic areas compared to those in intergenic regions. Genetic distances between C. tularosa and the widespread invasive species C. variegatus, which diverged ~1.6-4.7 MYA, were 0.027 (nuclear) and 0.022 (mitochondrial). Nuclear alignments revealed putative tracts of introgression that merit further investigation. Genome-wide heterozygosity was markedly lower in C. tularosa compared to estimates from related species, likely because of smaller long-term effective population sizes constrained by their isolated and limited habitat. These population inferences, generated from our new genome assembly, provide insights into the long term and contemporary White Sands pupfish populations that are integral to future management efforts.

Introduction

The integration of genomics into conservation biology has enabled the incorporation of genome-wide metrics of diversity within the context of the demographic history of threatened and endangered populations (Leroy et al. 2018). For example, whole-genome sequence analysis of Grey wolf (Canis lupus) revealed patterns of low heterozygosity and high inbreeding depression, despite considerable amounts of detected historical introgression from domesticated dog populations (C. lupus familiaris; Gómez-Sanchez et al. 2018). Such genome-wide assessments can provide key insights into historical population dynamics and are critical for guiding contemporary management actions. While overall genetic diversity is a key predictor of long-term population persistence, conservation geneticists have historically relied on a limited number of anonymous, putatively neutral genetic markers (e.g. microsatellites) to quantify genetic diversity (Wan et al. 2004) and identify conservation units (Moritz, 1994). However, important differences in population and genome-wide patterns of variability can emerge when whole genome data are used compared to genotypes from a few microsatellite loci (Fischer et al. 2017). Specifically, genomics offers a more reliable approach to achieve a high resolution image detailing patterns of diversity and population differentiation within and among endangered species (Lehmann et al. 2019; Shingate et al. 2020).
The pupfishes (Cyprinodontidae) are a broad group of 10 genera of temperature and salinity-tolerant fishes with a Nearctic and Neotropical distribution (Smith and Miller, 1986; Wildekamp, 1995). Many Nearctic pupfish species are restricted to small isolated desert springs (Miller, 1981), which makes them highly susceptible to groundwater withdrawal (Deacon, 1979), habitat loss, and interspecific hybridization with introduced congeners such as those used as bait by recreational fishers (e.g., C. variegatus; Echelle and Echelle, 1997; Black et al. 2016; Martin et al. 2016). Because of this, there are currently 14 pupfish species identified as endangered or critically endangered and many others listed as threatened or vulnerable (IUCN 2020). Among the pupfish complex, 12 species are commonly maintained in captive breeding facilities or artificial refugia to provide stock for reestablishing or augmenting declining natural populations, or as a hedge against the imminent loss of wild populations (Koike et al. 2008).

The present paper focuses on the White Sands pupfish (C. tularosa), which is endemic to the Tularosa Basin of Southern New Mexico. The species is listed as Endangered by IUCN, as threatened by the New Mexico Department of Game and Fish, and is under review as a possible endangered species by the U.S. Fish and Wildlife Service. Ancestral native populations of C. tularosa were first documented during the early 1900s at two ecologically distinct locations: Malpais Spring and Salt Creek. Using allozyme, mitochondrial, and microsatellite markers, Stockwell et al. (1998) found evidence for significant genetic differentiation between the two native populations and demarcated two Evolutionary Significant Units (ESUs). Furthermore, Collyer et al. (2005) demonstrated substantial adaptive body shape variation between individuals collected from the Malpais Spring and Salt Creek ESUs, putatively driven by differences between these two habitats. Since recognition of the genetic and morphologic differences between these two native ESUs, nine additional microsatellite markers have been developed (Iyengar et al. 2004) to aid in the classification and management of C. tularosa populations. Additionally, two refuge populations (Mound Spring and Lost River) were founded by the Salt Creek ESU and established to safeguard against any future and unforeseen population extirpations (Stockwell et al. 1998).

Herein, we present and evaluate a new draft genome assembly for a White Sands pupfish collected from the Upper Lost River population (a part of the Salt Creek ESU) and compare this assembly to other related fishes. We then leverage the new C. tularosa assembly to determine the genomic position of microsatellites previously assumed to be "neutral" markers in ESU classification. Next, we evaluate the C. tularosa genome for evidence of introgression via historical interspecific hybridization by estimating pairwise genetic distances with C. variegatus from nuclear (nuDNA) and mitochondrial (mtDNA) sequences. Finally, we evaluate the genome-wide diversity (Heterozygosity, H) for C. tularosa and compare this diversity estimate to several other fish genomes represented among the order Cyprinodontiformes. The overarching goal of this work is to lay the genomic groundwork and infrastructure to enable future high-resolution assessments of the conservation status and evolutionary potential of this imperiled species.

Methods

Sample Collection, Library Construction and Sequencing

In September 2018, a deceased heterogametic male C. tularosa (wsp-4) was collected from the upper reach of Lost River (32° 54' 2.88" N, 106° 6' 54" W), stored in 95% ethanol and deposited in a -80°C freezer until processing. For Illumina library preparations, a Qiagen DNeasy Blood and Tissue kit was used to extract DNA from fin tissue. For the PacBio library preparations, high molecular weight DNA was extracted from skeletal muscle tissue by Polar Genomics (Ithaca, NY) to meet the integrity requirements for PacBio long read sequencing. The TruSeq PCR-free protocol was used to prepare the Illumina library, which was sequenced with 2x151bp chemistry using an Illumina NovaSeq in the genomics core center at Purdue University (West Lafayette, IN). The SMRTBell Express Template Prep kit 2.0 was used on genomic DNA sheared to a mean fragment size of 30Kb and sequenced using two 8M single-molecule real-time cells (SMRTcells) on a PacBio Sequel II at the University of Illinois (Roy J. Carver Biotechnology Center).
Quality Control and Genome Assemblies

Quality assessment of Illumina short-read sequences was conducted by visual examination of quality score distributions and the presence / absence of sequencing adapters with fastqc (v.0.11.7; Andrews, 2017). Following quality assessment, reads were filtered with trim galore (v.0.6.5; Krueger, 2015) by clipping Illumina adapters, removing low quality bases (Phred <20), and discarding any reads that were less than 30-nucleotides in length. PacBio subread binary alignment mapping (bam) files from the two SMRTcells were processed using the SMRTLink v8.0 command bam2fastq, converted to fastq format and concatenated. Reads <1Kb or >50Kb in length were removed using the program seqkit (v.0.12; Shen et al. 2016) prior to assembly.

To generate a de novo assembly of both nuDNA and mtDNA genomes, the program wtdbg2 (v.2.2; Ruan and Li, 2020) was used to create consensus contigs from a fuzzy de Bruijn graph. PacBio reads were sub-sampled at two different coverage levels (50x and 90x) to determine the optimal depth for assembly, as excessive reads can lead to subpar assemblies due to the accumulation of base-call errors. quast (v.3.2; Gurevich et al. 2013) was used to generate summary statistics (e.g., N50) to identify the optimal depth. Following long-read consensus calling, PacBio subreads from both SMRTcells were then mapped back to the optimal assembly with minimap2 (v.2.11; Li, 2018), followed by polishing with arrow (v.2.3; Chin et al. 2013). The quality controlled Illumina short-reads were used for additional error correction with the polca script compiled with the assembler MaSuRCA (v.3.41; Zimin et al. 2013). Changes to consensus assembly quality after each polishing iteration began to plateau after several iterations of short-read polishing, so three total error-correction iterations were conducted (Figure S1).

High coverage heterozygous haplotype sequences that were assembled as separate contigs were identified and removed using the program purge_haplotigs (v.1.0; Roach et al. 2018). Primary contigs were then evaluated for contaminants using blobtoolkit2 (v.2.1; Challis et al. 2020) and those that were identified to have been derived from non-target species were removed. The curated genome assembly was then scanned for Benchmarking Universal Single-Copy Orthologs (busco; v.4.0.6; Seppey et al. 2019) to quantitatively assess genome completeness by evaluating structural integrity (e.g., full:partial genes) and comprehensiveness (proportion of target genes in the assembly). Completeness was evaluated by performing local alignments between the assembly and the Actinopterigii_obd10 database, which contained a set of 3,640 core genes.

The complete mtDNA sequence of C. tularosa was reconstructed using the cleaned Illumina paired-end reads. To help prevent the inadvertent incorporation of Nuclear Translocations (NUMTs) into the mtDNA assembly, we first used coalqc (v.0.1; Patil et al. 2020) and samtools (v.1.17; Li et al. 2009) to extract Illumina reads that aligned to the Devils Hole pupfish (C. diabolis) mtDNA assembly (NC_030345.1; Lema et al. 2016). The resulting bam file was then converted back to fastq file format with bedtools (v.2.29.0; Quinlan and Hall, 2010) and used alongside the C. diabolis full mitogenome as a backbone for the mtDNA genome assembly with mitobim (v.1.8; Hahn et al. 2013).
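The N50 summary statistic used above to compare the sub-sampled assemblies can be illustrated with a minimal sketch. This is not the quast implementation; the function name `n50` and the example contig lengths are hypothetical and serve only to show the definition (the contig length at which the running sum of contigs, sorted from longest to shortest, first reaches half of the total assembly length).

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches at least half of
    the summed length of all contigs.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    # Not reached for non-empty input; kept so the function always returns an int.
    return lengths[-1]


# Hypothetical contig lengths (arbitrary units), not values from the assembly.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
example_n50 = n50(example_lengths)
```

A higher N50 at similar total length indicates a more contiguous assembly, which is why the statistic is a convenient first check when comparing assemblies built at different read depths.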
Nuclear and Mitochondrial Genome Annotation

After draft genome curation, transposable element families were identified in the C. tularosa nuDNA assembly with repeatmodeler (v.1.09; Smit and Hubley, 2008). Regions matching known proteins in the swiss-prot database (UniProt Consortium, 2019) were then removed from the library by screening alignments (± 50 nucleotides) with ProtExcluder (v.1.2; Campbell et al. 2014). Using the custom C. tularosa transposable element library, along with the Zebrafish (Danio rerio) RepBase library, gene annotation was then accomplished by performing: 1) local alignments of protein and transcriptome sequences to the assembly and 2) ab initio gene predictions with the maker pipeline (v.2.31; Cantarel et al. 2008). To help annotate the nuDNA assembly, translated protein and transcriptome sequences were used from related species: Sheepshead Minnow (C. variegatus), Zebrafish (D. rerio) and the Southern Platyfish (X. maculatus). Among all three species, 70,576 protein sequences were downloaded from Ensembl (Cunningham et al. 2019) and 54,466 cDNA sequences were obtained from the C. variegatus transcriptome, prior to mapping these features to the assembly with blast (v.2.10; Gertz et al. 2006) and exonerate (v.2.2.0; Slater and Birney, 2005). Ab initio gene predictions were made using Augustus (v.3.3.2; Hoff and Stanke, 2019) with the D. rerio training sets as well as two rounds of snap (v.0.15; Korf, 2004). The gene models were then merged and redundancy removed, prior to functional annotation with the UniProtKB database and blastp.

The assembled mtDNA sequence was annotated with geseq (v.1.84; Tillich et al. 2017), visualized with ogdraw (Lohse et al. 2013) and protein sequence similarity was compared to the RefSeq C. tularosa assembly (NC_028292.1) with blatn (v.35; Kent, 2002). For a visual representation of the methods used to assemble the C. tularosa nuDNA and mtDNA genomes, see Figure S2.
Assembly metrics

For perspective and to provide a comparative framework, we then sought to compare our new C. tularosa genome assembly to other published reference-enabled Cyprinodontiformes. Two scaffold-level pupfish reference genomes were available at the time of our analyses, Amargosa pupfish (C. nevadensis; GCA_000776015.1) and C. variegatus (GCF_000732505.1). Three other Cyprinodontiformes were also included: the Guppy (Poecilia reticulata; GCF_000633615.1), X. maculatus (GCF_002775205.1) and the Annual Killifish (Austrofundulus limnaeus; GCF_001266775.1). For all six genomes (including the new C. tularosa reference), assembly statistics and annotation completeness were assessed by quast and busco using a minimum sequence length of 5Kb. Draft genome assemblies were obtained using the RefSeq (or GenBank) FTP site for each assembly on 15 August 2020. The raw sequence data (i.e., Illumina shotgun sequence data) used for each assembly (Biosample) were downloaded from NCBI: Sequence Read Archive (SRA) files were obtained using the sra-toolkit, and the fastq files for each species were cleaned with trim galore using the same methods (see above).

Microsatellite Identification

Earlier work to define C. tularosa ESUs was based on putatively neutral microsatellite data, along with allozyme and mtDNA D-loop control regions (Stockwell et al. 1998). To help gauge whether any of these 'neutral' microsatellite markers might be subject to genetic hitchhiking associated with selection on a functional gene (Maynard Smith and Haigh, 1974), we leveraged our genome assembly to determine the distances from each microsatellite locus to the nearest suspected functional gene(s). To this end, primersearch (v.6.31; Rice et al. 2000) was used to determine the location of each Stockwell et al. (1998) primer set in the C. tularosa assembly, requiring zero mismatch rate for both forward and reverse primers. The full microsatellite sequences were obtained from Iyengar et al. (2004) and aligned to the C. tularosa genome with bwa (v.0.7.17; Li and Durbin, 2009). All mapped microsatellite sequences and amplimer lengths were then manually reviewed in Integrative Genome Viewer (IGV) to determine the proximity to neighboring genic regions. For each microsatellite marker, we report the distance (in Kb) to the nearest predicted protein coding gene, or note if it was found within a gene.

For each ESU, prior estimates of genetic diversity at each microsatellite locus were compiled. This includes estimates of the number of alleles (A) as well as observed (Hobs) and expected heterozygosity (Hexp) at each locus (Stockwell et al. 1998; Iyengar et al. 2004). The relationship between these microsatellite loci (N=11), their observed homozygosity and the distance to the nearest predicted gene were then fit using a linear model (Hobs ~ Distance+ESU) using the lm function in base R. Because of the non-normal distribution of distances between a given microsatellite and the nearest annotated gene (see Results), we conducted two regressions, one using all of the microsatellites and another using only loci found <50Kb from a gene. In both cases, significance was evaluated by permuting the data associated with our model 1,000 times and plotting the results (ggplot2 v.3.3.2; Wickham, 2016).

To evaluate neutrality at the two microsatellite markers used in ESU demarcation (WSP-2 & WSP-11), genotypes were obtained from Stockwell et al. (1998). The 20 samples (N=10 / ESU) genotyped at these two loci were tested for departure from Hardy-Weinberg Expectations (HWE) as well as population differentiation. Default parameters were used for the HWE test (Guo and Thompson, 1992) and the log-likelihood exact G-test (Goudet et al. 1996), as implemented in genepop (v.4.6; Raymond, 1995).
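The regression and permutation step described above can be sketched as follows. This is an illustrative Python sketch, not the authors' R code: the text states only that a linear model (Hobs ~ Distance+ESU) was fit with lm and that significance was evaluated with 1,000 permutations, so the permutation scheme used here (shuffling the response and comparing the absolute Distance coefficient) is an assumption, as are the function names and the toy data.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no zero pivot is checked for).
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(h_obs, distance, esu):
    """Least-squares coefficients [intercept, distance, esu] for h_obs ~ distance + esu."""
    rows = [[1.0, d, e] for d, e in zip(distance, esu)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, h_obs)) for i in range(3)]
    return solve(xtx, xty)


def permutation_p_value(h_obs, distance, esu, n_perm=1000, seed=1):
    """Two-sided permutation p-value for the Distance coefficient.

    Assumed scheme: shuffle the response n_perm times and count how often the
    absolute Distance coefficient is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fit(h_obs, distance, esu)[1])
    shuffled = list(h_obs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(fit(shuffled, distance, esu)[1]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: distance to nearest gene (Kb), ESU coded 0/1,
# and heterozygosity generated from an exact linear relationship.
toy_distance = [0.0, 5.0, 10.0, 20.0, 30.0, 40.0]
toy_esu = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
toy_h_obs = [0.5 - 0.01 * d + 0.1 * e for d, e in zip(toy_distance, toy_esu)]
toy_coefficients = fit(toy_h_obs, toy_distance, toy_esu)
toy_p_value = permutation_p_value(toy_h_obs, toy_distance, toy_esu)
```

Coding ESU as a 0/1 indicator mirrors how a two-level factor enters a linear model; with real data the indicator would distinguish the Malpais Spring and Salt Creek ESUs.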
Genomic diversity

Genome-wide heterozygosity (H) was estimated for six Cyprinodontiformes samples (Table S1). First, to determine mappability at each site in each reference genome, genmap (v.1.3.0; Pockrandt et al. 2020) was run using 100-mers allowing up to two mismatches. repeatmasker (v.4.07; Smit et al. 2015) was then used to identify repeats using the D. rerio RepBase library as a reference. Sites with mappability <1 and repeated regions were excluded from downstream analyses. H was calculated for scaffolds >100Kb in length using angsd (v.093; Korneliussen et al. 2014) using the unfolded site frequency spectrum (SFS), requiring a minimum mapping quality of 30 and base quality >20. The results were visualized with ggplot2.

Nuclear and mitochondrial genomic divergence

For related pupfish species with an available reference genome (C. tularosa, C. variegatus and C. nevadensis), and for context X. maculatus (GCF_002775205.1), homologous nuDNA regions were identified by first breaking each genome into 30Kb sections using the splitter function implemented in emboss (v.6.60; Rice et al. 2000). Pairwise local alignments were then performed against each congeneric repeat masked genome using blastn with non-default values for the following parameters: -perc_identity, -evalue, -max_hsp, -qcov_hsp_perc, -culling_limit and -max_target_seqs. Following local alignments, pairwise genetic distances were estimated between each aligned sequence (>2Kb in length) using the Kimura-2 parameter (K2P; Kimura, 1980) generated with the R package ape (v.5.4.1; Paradis et al. 2004) to evaluate patterns of genomic differentiation. Results were visualized with ggplot2.

To compare the matrilineal relationship among pupfish species, all Cyprinodontidae mtDNA genomes available as of 15 August 2020 were downloaded from NCBI for the following species: C. diabolis (KX061747.1), C. variegatus (KT288182.1), the Red River pupfish (C. rubrofluviatilis; KM985373.1), C. nevadensis amargosa (KU883631.1) and the Desert pupfish (C. macularius; NC_009125). In addition to the new C. tularosa mtDNA assembly reported herein, the previously archived C. tularosa RefSeq mtDNA assembly was also included (NC_028292.1). The X. maculatus mtDNA assembly (NC_011379.1) was used as an outgroup. Therefore, eight full mitogenome sequences were used for phylogenomic analysis: five other Cyprinodontidae species, two C. tularosa, and an outgroup. Multiple sequence alignment of full length mitogenome sequences was then conducted using clustalw (v.2.1; Thompson et al. 1994) and a Neighbor Joining (NJ) distance matrix was generated using a Jukes-Cantor substitution model with 100 bootstrap replicates within megax (Kumar et al. 2018). Molecular time estimates were obtained from Timetree (Kumar et al. 2017) and were used as fixed time constraints at each available node using the reltime-ML function (Tamura et al. 2018). Estimated mtDNA divergence times were then annotated to the NJ tree, exported in Nexus format and rendered using phlo.io (Robinson et al. 2016). In addition to creating a timetree, mean pairwise genetic distances (K2P) between the mtDNA sequences were estimated with megax. Our intention was not to perform a systematic review of Cyprinodontidae phylogeny, but to provide evolutionary context for our pairwise species comparisons.
Results

Quality Control and Genome Assemblies

Two PacBio Sequel II SMRT cells yielded 14.3 million reads worth 165Gb of raw sequence data, with a mean polymerase read N50 length of 22Kb. After extracting reads between 5-50Kb, 16 million reads with 100Gb of sequence data remained for use among genome assembly (~100x coverage). The Illumina sequencing platform yielded 286 million 151-bp paired-end reads with 43Gb of raw sequence data. After quality filtering, there were 284.9 million quality controlled reads remaining (99.6% of total) with an estimated genome coverage of ~40x. Using the optimal 90x coverage level, the assembler wtdbg2 yielded 1.09Gb among 2,606 contigs, with a N50 of 1.357Mb and a GC content of 39%. The assembly was then polished using PacBio subreads from the two SMRTcells and error corrected using three iterations of short-read polishing. The program purge_haplotigs discarded 332 contigs from the assembly as haplotigs (total length 3.5Mb) and 12 as artifactual (total length 596Kb), resulting in 2,009 contigs. blobtools2 was then used to identify non-target DNA sequenced in the pupfish libraries. Overall, five contigs (68Kb) were assigned to non-target taxa (Arthropoda or Chytribiomycota) rather than to 'no-hits' or Cyprinodon, and were subsequently filtered out of the assembly. The curated fasta file containing 2,004 contigs was then used for downstream analyses.

The complete mtDNA sequence of C. tularosa was assembled from Illumina short-reads. The assembly is similar in length to other published Cyprinodon mitogenomes. contig1125 within the primary assembly had a 100% identity match with the mtDNA assembly and was subsequently removed from the nuDNA sequence file prior to annotation.

Nuclear and Mitochondrial Genome Annotation

maker identified the genomic coordinates of 25,260 protein coding genes in the curated C. tularosa assembly, which is generally congruent with the 23,019 coding genes annotated in the current genome release of C. variegatus (GCF_000732505.1). Functional annotation of the new C. tularosa mitogenome revealed 13 polypeptides, 22 tRNA genes, 2 rRNA genes, and one control region. The ordering of these annotations was also consistent with other published Cyprinodon mitogenomes. A search between the two C. tularosa mitogenomes showed [...] sequence similarity.

Assembly metrics

For the White Sands pupfish draft genome, assembly statistics on 1,995 contigs (those greater than 5Kb in length) showed a total length of 1.086Gb, a max contig size of 8.12Mb and a N50 of 1.36Mb (Table 1). Thus, the new C. tularosa assembly is more contiguous than the scaffold-level [...] assembly (contigs=4,093; max contig size=4.51Mb; N50=0.0843Mb) but less contiguous than the chromosome-level P. reticulata assembly (contigs=843; max contig size=46.3Mb; N50=31.4Mb) or the X. maculatus assembly (contigs=97; max contig size=35.3Mb; N50=31.5Mb; Table 1). Using the Actinopterigii orthologs, busco identified 3,452 complete single-copy genes (C 95%) in the C. tularosa assembly, with little inherent fragmentation (F 1%). Genome completeness, based upon these values, was moderately lower than the C. variegatus assembly (C 96.6%; F 0.3%), equivalent to the chromosome-level X. maculatus assembly (C 95%; F 1%), and improved over the C. nevadensis assembly (C 70%; F 13%; Figure 1).

Microsatellite Identification

Each of the 16 microsatellites previously used to characterize C. tularosa populations was located in the C. tularosa genome assembly. Three were found within an intron of a gene, five were located within 10Kb of a gene(s), five were located between 10-100Kb from any gene, and three were >100Kb from one or more genes (Table 2). Using 11/16 loci which had previously reported heterozygosity estimates, our comprehensive regression suggested no significant relationship between microsatellite heterozygosity and distance to the nearest gene or ESU (Figure 2A; Intercept: Estimate=0.265, SE=0.082, p-value < 0.001; Distance: Estimate=0.000, SE=0.000, p-value=0.780; R2=0.005; F=0.050; DF=2,19). Upon limiting our regression to microsatellite loci that were mapped <50Kb from a gene, we found that genic distance was negatively related to Hobs (Distance: Estimate=-0.008, SE=0.003, p-value=0.022; R2=0.302; F=3.3249; DF=2,15). Both microsatellite markers used to inform ESU demarcation (Stockwell et al. 1998) conformed to Hardy-Weinberg Expectations within their respective ESU (p-values=0.195-1.00). An exact G-test illustrated significant (p-values=0) levels of differentiation at these loci (WSP-2, FST=0.350; WSP-11, FST=[...]) and were located nearby [...]

Genomic diversity

Genome-wide heterozygosity (H) was then estimated for all six reference enabled Cyprinodontiformes (Table S1). Diversity values among all six species illustrated that H ranged from [...] to 0.00003049 (X. maculatus), which we note was derived from a highly inbred line.