Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary.

A Novel Metagenomic Workflow for Biomonitoring across the Tree of Life using PCR-free Ultra-deep Sequencing of Extracellular eDNA

Shivakumara Manu 1 and Govindhaswamy Umapathy 1

1 Laboratory for the Conservation of Endangered Species, CSIR-Centre for Cellular and Molecular Biology, Hyderabad, India

February 22, 2021

RESOURCE ARTICLE

Running title: Biomonitoring across the Tree of Life

Correspondence: [email protected] (G.U.), [email protected] (S.M.)

ORCiD: S.M.: 0000-0002-9114-8793; G.U.: 0000-0003-4086-7445

Abstract

Biodiversity is declining on a planetary scale at an alarming rate due to anthropogenic factors. Classical biodiversity monitoring approaches are time-consuming, resource-intensive, and not scalable to address the current biodiversity crisis. The environmental DNA-based next-generation biomonitoring framework provides an efficient, scalable, and holistic solution for evaluating changes in various ecological entities. However, its scope is currently limited to monitoring targeted groups of organisms using metabarcoding, which suffers from various PCR-induced biases. To utilise the full potential of next-generation biomonitoring, we intended to develop PCR-free genomic technologies that can deliver unbiased biodiversity data across the tree of life in a single assay. Here, we describe a novel metagenomic workflow comprising a customised extracellular DNA enrichment protocol from large-volume filtered water samples, a completely PCR-free library preparation step, ultra-deep next-generation sequencing, and a pseudo-taxonomic assignment strategy using the dual lowest common ancestor algorithm. We demonstrate the utility of our approach in a pilot-scale spatially-replicated experimental setup in Chilika, a large hyper-diverse brackish lagoon ecosystem in India. Using incidence-based statistics, we show that biodiversity across the tree of life, from microorganisms to the relatively low-abundant macroorganisms such as Arthropods and Fishes, can be effectively detected with about one billion paired-end reads using our reproducible workflow. With decreasing costs of sequencing and the increasing availability of genomic resources from the earth biogenome project, our approach can be tested in different ecosystems and adapted for large-scale rapid assessment of biodiversity across the tree of life.

Keywords: Environmental DNA, Extracellular DNA, Next-Generation Biomonitoring, Shotgun Sequencing, Metagenomics, Tree of Life

INTRODUCTION

The vast biodiversity on earth is the result of billions of years of evolution. Organisms across the tree of life have evolved and adapted to inhabit various environments on earth. All the evolutionary lineages that make up the tree of life belong to three domains: Archaea, Bacteria, Eukaryota, and a fourth contested category of Viruses. Widely accepted studies estimate that about 8.7 million (± 1.3 million) species of Eukaryotes (Mora et al., 2011) and up to a trillion species of microbes (Locey & Lennon, 2016) exist on earth. Given these estimates and the completeness of the encyclopedia of life database (Parr et al., 2014), the majority of eukaryotic diversity and most of the microbial diversity remain unknown to science despite 250 years of scientific exploration. At the current rate of novel species discovery, it would take hundreds of years to catalogue all the eukaryotic species alone (Costello et al., 2013). However, there is currently an impending threat of sixth mass extinction due to various anthropogenic factors such as pollution, land-use change, habitat loss, poaching, and climate change (Ceballos et al., 2020). The population sizes of many species have dropped significantly, and species extinction rates have increased hundreds to thousands of times compared to the background rate (Ceballos et al., 2015, 2017). Extinction of species is irreversible and may have long-lasting effects on ecosystem functioning and services. A global assessment report by the U.N. estimated that up to a million eukaryotic species may be threatened with extinction and may go extinct in the next few decades (Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services, 2019). Under the current scenario, many species might go extinct even before being catalogued in the encyclopedia of life. Therefore, it has become imperative to assess and monitor biodiversity at a larger scale than ever before, to chart conservation policies and guide management strategies.

Classical biomonitoring techniques are time-consuming, resource-intensive, require manual identification of specimens, and are not easily scalable to deploy on a large scale. Environmental DNA (eDNA)-based molecular methods offer several advantages over classical biomonitoring methods (Thomsen & Willerslev, 2015). eDNA-based biomonitoring techniques detect the presence of various taxa in the ecosystem utilising the DNA extracted directly from whole environmental samples (e.g., water, soil, air) (Taberlet et al., 2012a). The last decade witnessed tremendous strides in the methodological development of eDNA-based biomonitoring techniques (Seymour, 2019). Along with the technical advances, there has also been considerable effort to understand the ecology of eDNA (Barnes & Turner, 2016; Stewart, 2019), and to clearly define the term eDNA (Pawlowski et al., 2020; Rodriguez-ezpeleta et al., 2020). By exploiting various sources of eDNA in an environmental sample (Figure 1), eDNA-based biomonitoring has emerged as a powerful new technique that has revolutionised the way we survey ecological communities (Deiner et al., 2017).

Numerous [...] First, eDNA sample collection requires significantly less time and resources than most of the classical biomonitoring techniques. For example, filtering water samples to detect fish communities takes far less effort than using gill nets or electrofishing. Such physical scalability in sampling enables surveying a large number of samples covering entire ecosystems with minimal effort (e.g., West et al., 2021). Second, the high-throughput nature of the molecular methods used in eDNA-based biomonitoring, such as quantitative PCR and next-generation sequencing, reduces the economic cost incurred per sample as the scale of the project increases. Third, eDNA samples typically contain multiple sources of DNA from organisms across the tree of life (Figure 1) (Barnes & Turner, 2016; Torti et al., 2015). For instance, a bulk sediment sample can contain extracellular DNA and biological particles of various sizes from microorganisms, invertebrates, and other organisms. Thus, environmental samples are biologically scalable, in the sense that a single sample can be used to target a wide range of taxa (Bohmann et al., 2014), and the same eDNA sample can be repurposed to detect a different set of diverse organisms, eliminating the need for repeated sampling (Dysthe et al., 2018). Fourth, the different types of DNA present in a sample offer ecological scalability for various applications (Figure 1) by providing a wide range of spatio-temporal resolutions of biodiversity. For example, organismal DNA is best suited for studying the functional ecology of microbial communities, while extra-organismal and extracellular DNA are considered as relic DNA that obscures the desired functional signal (Carini et al., 2016; Lennon et al., 2018). Whereas, extra-organismal DNA with a moderate temporal resolution of a few days to few weeks is suitable for detecting invasive species where direct sampling of organisms is difficult (Thomas et al., 2020). Further, extracellular DNA bound to suspended surface-reactive soil particles, with a high temporal resolution of a few months, might benefit a time-sensitive application such as long-term biomonitoring with seasonal or annual sampling strategies (Nagler et al., 2018). Finally, extracellular DNA preserved in deep sediments with an extremely high temporal resolution ranging from years to centuries can be used to reconstruct past paleo-communities (Bálint et al., 2018). Such flexibility and scalability make eDNA-based techniques suitable for the needs of very large-scale biomonitoring programs, required to keep a check on the fast-changing environments in the Anthropocene.

The methodologies employed in eDNA-based biomonitoring are currently limited to targeted approaches where a specific species or a group of related taxa are detected. Routinely, a quantitative PCR assay is employed to sensitively detect a single species, and PCR-based metabarcoding is used to detect a group of related taxa sharing a common universal barcoding marker. Recently, PCR-free approaches such as hybridisation capture sequencing were utilised for targeting a single species (Jensen et al., 2020) or a group of related taxa (Seeber et al., 2019). The scientific community readily adopted the targeted approaches of eDNA-based biomonitoring because of two main reasons. First, the idea of targeted monitoring of a single species or a group of related taxa was synonymous with the existing classical approaches of single-species monitoring of an invasive or keystone species and of monitoring multiple communities of fishes and macro-invertebrates (Seymour, 2019). Second, large reference databases of standardised DNA barcodes (e.g., COI, rbcL-matK, ITS) that are conserved at broad taxonomic ranks (e.g., Kingdom) and provide a high taxonomic resolution between closely related taxa were available from a wide range of multicellular organisms (Meiklejohn et al., 2019). The International Barcode of Life consortium is amassing millions of DNA barcodes with a planetary biodiversity mission and plans to complete the DNA barcode library of eukaryotic species in the coming decades. Nevertheless, several disadvantages and limitations are inherent to the targeted biomonitoring approaches that restrict their applicability to the next frontier of eDNA-based biomonitoring known as next-generation biomonitoring (Bohan et al., 2017). Next-generation biomonitoring aims to push the limits of the current classical biomonitoring framework by incorporating a broader spectrum of all the ecological organisational levels, from genes to meta-ecosystems (Makiola et al., 2020). Since the ecosystem is a large-scale multi-layered web of life, continually interacting, whose components influence each other (Bascompte, 2009), next-generation biomonitoring uses a holistic approach by inferring network properties from eDNA-based ecological networks derived using machine-learning approaches.

[...] ecological information that can be, in principle, obtained [...]. For instance, due to the dependency on standard DNA barcodes that differ in major groups of organisms such as prokaryotes, protists, fungi, plants, and animals, the targeted approach can never be used to determine the relative abundances of all the organisms across the tree of life from an environmental sample in a single assay. Such holistic data is necessary for building large ecological networks that incorporate trophic and other types of functional ecosystem-level interactions between species separated by large evolutionary distances. Further, the lack of data from non-barcoding genes limits its potential usage in monitoring and determining genomic diversity. Finally, there are various known technical biases such as DNA extraction bias, amplification bias, marker bias, and primer bias that obscure the original diversity of communities and their relative abundances when a wide range of taxa are targeted (van der Loos & Nijland, 2020). An untargeted and unbiased approach of detection, which preserves the underlying structure of biodiversity in terms of relative abundance of biomass across the tree of life, is fundamental to reap the full benefits of eDNA-based next-generation biomonitoring programs.

Taberlet et al. were the first to envision a metagenomic approach of performing shotgun sequencing on extremely high throughput sequencing platforms that could yield billions of DNA sequences per sample and overcome the biases and limitations of targeted approaches (Taberlet et al., 2012b). Due to the absence of targeted enrichment steps such as PCR and hybridisation capture, metagenomic sequences provide an unbiased representation of the input library of environmental DNA sequences that can be used to monitor biodiversity across the tree of life. The proportion of metagenomic sequences that can be taxonomically annotated will directly depend upon the completeness of reference databases containing whole genome sequences. Currently, the DNA barcode databases have relatively very high species coverage compared with the whole genome databases, due to which metabarcoding is presently preferable to metagenomics (Stat et al., 2017). Nevertheless, the scenario of incomplete genome databases is set to change by an international moonshot initiative in biology called the Earth BioGenome Project, generating genomic resources for all the known eukaryotic species (about 1.5 million) in a record time of over a decade (Lewin et al., 2018). Several large-scale genome sequencing initiatives targeting a wide variety of taxa have joined this effort (https://www.earthbiogenome.org/affiliated-project-networks). The earth biogenome project has adopted a phylogenomic approach where a representative reference genome for each of the families will be generated in the first phase, followed by representative genomes of the genera in the second phase, and finally sequencing every known species in the third phase. In the near future, with the massive progress of the genome sequencing initiatives and the completion of the first phase of the project, the availability of reference sequences from every eukaryotic lineage will not be a barrier for adopting untargeted metagenomic approaches for next-generation biomonitoring. Instead, the technical challenges in transitioning from targeted to untargeted biomonitoring, from the field, followed by the wet lab and dry lab analysis, could be the limiting factor for obtaining biodiversity data across the tree of life. Therefore, we aimed to develop a futuristic biomonitoring pipeline for the earth biogenome era that will benefit from the growing number of whole genome sequences.

Based on the literature survey, we identified five main challenges that are encountered while monitoring biodiversity across the tree of life in large ecosystems using a metagenomic approach, namely, collection of a large number of samples to obtain an unbiased representation of the biodiversity from a vast area (e.g., Tara oceans project (Sunagawa et al., 2020)), DNA extraction bias due to differences in lysis efficiencies between cell-types from a wide range of taxa (Djurhuus et al., 2017), a low relative proportion of macrobial reads due to the high abundance of microbes in environmental samples (e.g., Stat et al., 2017), a low proportion of overall taxonomic assignment due to incomplete reference databases (e.g., Singer et al., 2020), and an overwhelming number of uninformative duplicate reads and artefacts introduced by PCR amplification of sequencing libraries (Kebschull & Zador, 2015). We followed a top-down approach to alleviate these issues by adapting logically deduced solutions at every step of the biomonitoring pipeline such as sample collection, environmental DNA extraction, library preparation, sequencing, and bioinformatics. First, we targeted the extracellular DNA in the environmental samples using large-volume water filtration and a customised DNA extraction protocol. Second, we used a completely PCR-free method of library preparation to obliterate PCR-induced artefacts and duplicates in the reads. Third, we pushed the limits of sequencing by performing ultra-deep sequencing of the PCR-free libraries and obtained billions of paired-end reads per sample to detect the low-abundant macrobial taxa. Fourth, we adapted a pseudo-taxonomic read assignment approach to approximate the richness of taxa underrepresented in the reference database. Fifth, we used incidence-based community analysis rather than abundance-based statistics to minimise the overshadowing effect of microbial taxa on macrobial taxa when the target organisms are across the tree of life. In this paper, we demonstrate the feasibility and reproducibility of our novel metagenomic workflow (Figure 2) to effectively monitor biodiversity across the tree of life in large aquatic ecosystems in a pilot-scale experimental setup. Large-scale implementation of such an untargeted and unbiased approach provides a powerful tool for molecular ecologists to implement next-generation biomonitoring programs in the earth biogenome era.

MATERIALS AND METHODS

Pilot-scale study design

To test our methodology for effective monitoring of biodiversity across the tree of life in a large ecosystem, we selected Chilika, a highly biodiverse tropical brackish-water lagoon ecosystem spanning about 1100 sq. km. located on the east coast of India. Chilika is the second largest lagoon in the world, with a unique community assemblage consisting of marine organisms from the Bay of Bengal, the north-eastern part of the Indian ocean, and freshwater species from the tributaries of the Mahanadi river, a major river system in east-central India (Sarkar et al., 2012). We designed a pilot-scale spatially replicated sampling strategy to demonstrate the feasibility and reproducibility of our metagenomic approach. We selected three geolocated sampling stations (S27, S28, S29) equally spaced on a 10 km transect in the central sector of the lagoon (Figure 3). Extracellular DNA is an excellent choice for seasonal or annual biomonitoring programs in vast ecosystems where obtaining the maximum information at a very high spatio-temporal resolution with minimal sampling is critical. We targeted the extracellular environmental DNA (Figure 1) in the water at the sampling locations. Since Chilika is a shallow lagoon with an average depth of about 2 metres (Sarkar et al., 2012) and experiences strong coastal winds, the possibility of vertical stratification in the water column is minimal, and sampling the surface water for extracellular DNA will suffice for the objective of this study.

Large-volume sample collection

Since extracellular DNA makes up only a fraction of the total eDNA that can be extracted from the water sample (Torti et al., 2015), large volumes of water need to be filtered to obtain a sufficient yield of DNA for PCR-free approaches, which require relatively larger amounts of DNA owing to the absence of any amplification step. We selected a pore size of 0.45 µm instead of 0.22 µm to allow for more volume of water to be filtered before clogging due to the high turbidity observed in the sampling locations. The 0.45 µm pore size also retains suspended soil particles such as clay (<2 µm), silt (2-50 µm), and sand (50 µm-2 mm) that could be potentially bound by extracellular DNA (Nagler et al., 2018). Further, we selected a filter membrane made up of mixed cellulose ester (MCE), as it has been shown to bind DNA due to its chemical affinity (Liang & Keeley, 2013). We filtered about 10 litres of water sampled in triplicate per location from each geolocated sampling station on 28th December 2019 using the integrated eDNA sampler by Smithroot Inc. (Thomas et al., 2018). A 47 mm free-form sampling module was used for every sample to maximise the rate and volume of filtration. We set a maximum flow rate of less than one litre per minute and maintained a vacuum pressure of 10 psi to minimise any cell lysis during filtration. The sampling system also avoids the risk of sample contamination by utilising sterile single-use self-preserving filter holders, which are replaced for every sampling location. Thus, we did not use any negative water filtration controls during the sampling. The self-preserving filter holder is made up of desiccating plastic material which completely removes any traces of water and preserves the eDNA at room temperature for several weeks (Thomas et al., 2019). We transported the samples to the laboratory in the preserving filter holders at room temperature in the dark for DNA extraction.

Extracellular DNA enrichment

We adapted and modified a modular lysis-free phosphate buffer-based DNA extraction protocol from the literature (Lever et al., 2015; Liang & Keeley, 2013; Taberlet et al., 2012c) to enrich the extracellular DNA and minimise the proportion of organismal and extra-organismal DNA (Figure 1) from the filter membranes. The main principle of the extraction protocol is to desorb the extracellular DNA bound to the surface of the MCE filter membrane and the soil particles using a saturated phosphate buffer. The phosphate groups from the buffer compete with the phosphate groups of the DNA bound to the surface of the soil particles via cation bridging and desorb the extracellular DNA from the particles by displacement, without lysing the intact cellular particles (Taberlet et al., 2012c). The desorbed DNA is then isolated through a column-based DNA isolation protocol using the reagents and columns from the Nucleospin soil kit (Macherey-Nagel, Germany).

Within a week of sample collection, we performed extracellular DNA extraction from the filter membranes in a clean chemical lab. The bench surfaces were wiped with 50% diluted commercial bleach, followed by distilled water and 70% ethanol. We used filter-tips to pipette all the liquids during the extraction to avoid any aerosol contamination. The phosphate buffer was freshly prepared before extraction by mixing 0.197 g of NaH2PO4 and 1.47 g of Na2HPO4 in 100 ml of DNA-free water (0.12 M, pH 8). Filter membranes were carefully taken out of the filter holders using sterile forceps and rolled before placing them into 15 ml falcon tubes containing 5 ml phosphate buffer and large ceramic beads (0.6-0.8 mm) from the Nucleospin soil kit. The falcon tubes were shaken for 10-15 minutes by placing them on a vortex-mixer with the vertical falcon holder module. The large ceramic beads help homogenise soil clumps recalcitrant to the desorption process. This process is principally different from the bead-beating employed in microbiology to disrupt the intact cells, as it does not lyse the cells. We took care not to exceed the time of mixing beyond 15 minutes to avoid co-extracting a large proportion of humic acids. The homogenised mixture was immediately centrifuged at 11000 x g to precipitate particulate matter, and the supernatant was passed through the Nucleospin inhibitor removal column. The DNA-binding condition of the flow-through was then adjusted using the binding buffer, and then passed through the silica column. DNA was eluted using 150 µl of Tris EDTA buffer. We did not include any negative controls during extraction, since the high amount of input DNA required for the PCR-free library preparation protocols cannot be obtained from negative extraction controls.

PCR-free library preparation and ultra-deep sequencing

We quantified the extracellular DNA extract using a high-sensitivity double-stranded DNA assay in Qubit 4 (Thermo Fisher Scientific, USA). The DNA concentration was diluted to 20 ng/µl, and one microgram of DNA was taken as input for library preparation. We chose the Illumina Truseq DNA PCR-free library preparation method to avoid PCR-induced artefacts such as substitutions, indels, and chimaeras, and to drastically reduce the uninformative duplicate reads that arise from library amplification. We did not include technical replicates in library preparation since there is no stochasticity involved due to a completely PCR-free workflow. Moreover, the spatially-replicated sampling strategy allows us to test the reproducibility of our approach. The input DNA was first randomly sheared into 350 bp fragments using the Covaris ultra-sonicator. The ends of the fragmented DNA were repaired and dA-tailed prior to ligation with Illumina unique dual (UD) indexing adaptors (IDT, USA). The UD indices significantly reduce the tag jumps that cause cross-contamination across multiplexed samples. The adaptor-ligated library fragments were size-selected and purified using SPRI beads (Beckman Coulter, USA). The fragment sizes of the libraries were verified using the Agilent Bioanalyzer high-sensitivity DNA chip. The libraries were quantified using a qPCR library validation kit (Takara Bio, USA), and the concentrations were adjusted before pooling with other libraries. We adjusted the relative concentration of our libraries to yield billions of paired-end reads per sample, which would enable us to detect low-abundant macroorganisms. The pooled libraries were denatured into single-stranded DNA before loading onto a patterned flow cell and sequenced for 300 cycles on the Illumina Novaseq 6000 platform. Novaseq 6000 is the latest series of next-generation sequencer from Illumina that has drastically improved the quality of short reads and provides extremely high-throughput sequencing capabilities suitable for our deep sequencing requirements.

Sequence analysis strategy

The process of taxonomic annotation of metagenomic reads directly depends on the availability of reference genome sequences from the target organisms, the sensitivity of the algorithm used to detect homology, and the taxonomic resolution of the genomic loci. Currently, the proportion of taxonomically classifiable reads would be very low because only a small fraction of all the known species have their genomes assembled, annotated, and archived in public nucleotide sequence databases. Further, different regions of the genome provide variable taxonomic resolutions depending on the evolutionary history of the species, recombination, mutation rate, sequence complexity, and selection pressure (Coissac et al., 2016). Under these circumstances, the proportion of taxonomically classified metagenomic reads reduces as we move from broader taxonomic levels towards species-level resolution. In general, the Lowest Common Ancestor (LCA) algorithm, originally implemented in MEGAN (Huson et al., 2007), provides a robust taxonomic label to the query sequence. The LCA algorithm requires the output of sensitive search algorithms such as BLAST to assign a taxonomic label to the queried sequences. However, alignment-based homology detection algorithms such as BLAST are prohibitively slow to query billions of reads against large reference databases. Alternative alignment-free kmer-based algorithms such as KRAKEN (Wood et al., 2019) are thousands of times faster than BLAST but far less sensitive and cannot find homology between divergent species (Lindgreen et al., 2016). Therefore, we selected a hybrid search algorithm called MMseqs2 (Steinegger & Söding, 2017), which first prefilters hits for each query using the short kmers (k=15) shared between the query reads and the database, and then utilises a vectorised Smith-Waterman alignment to infer homology between the query and the database hits. MMseqs2 also provides the flexibility to fine-tune the sensitivity parameter based on the available computing resources, the size of the datasets, and the size of the reference database.

Pseudo-taxonomic assignment of reads

We adapted a pseudo-taxonomic assignment approach using the dual blast lowest common ancestor algorithm (2bLCA) (Hingamp et al., 2013) implemented in MMseqs2 (Steinegger & Söding, 2017). The algorithm searches for a homologous locus from the nearest sequenced species and finds the best hit when the database does not contain the sequence from the target species. Further, the taxonomic resolution of the reference locus is assessed by querying the aligned region of the best hit against the database. If the reference locus is not sequenced in any related species, the taxonomic label of the best hit will be assigned to the query; otherwise, the LCA of all the significant hits relative to the best hit (in terms of E-value) is calculated. Currently, due to incomplete reference genome databases, the taxonomic labels assigned to most of the metagenomic reads cannot be considered as a direct indicator of the real taxonomic identity of the original query sequence. Instead, they might only represent the taxonomic labels of the most related sequenced species in most cases. Therefore, the presence of pseudo-taxonomic labels only indicates the presence, in the sampled environment, of a related species that is not represented in the database.

We constructed a reference database using the non-redundant nucleotide sequences (nt) from the NCBI database, downloaded on 26th October 2020 from the NCBI FTP servers. The nt database also consists of annotated sequences from the International Nucleotide Sequence Database Collaboration (INSDC), Genbank, and RefSeq genome assemblies. We filtered out the sequences shorter than 100 bp or longer than 1 Mb, and the repetitive sequences based on the sequence annotation keywords (satellite, repeat, repetitive, transposon, intersperse, tandem). The remaining sequences were clustered at 99% identity using LINCLUST (Steinegger & Söding, 2018) to reduce redundancy, retaining the representative sequences of each cluster. We masked the low-complexity regions in the sequences using the DUSTMASKER algorithm implemented in BLAST+ (Camacho et al., 2009). The raw sequencing reads were also quality trimmed and low-complexity filtered using FASTP (Chen et al., 2018) prior to taxonomic assignment. We used the easy-taxonomy workflow implemented in MMseqs2 to taxonomically annotate the filtered paired-end reads independently with the 2bLCA algorithm (Hingamp et al., 2013). We used an e-value threshold of 1e-5, a minimum query coverage of 80%, and a maximum sequence divergence set at 60% identity for the assignments. We obtained the taxonomic assignments for each read of the pair and then calculated the LCA of the two taxonomic assignments for every read pair. To reduce the false-positive levels, we filtered out very low frequency taxa at the species, genus, and family levels which contained either less than 1% of reads relative to the parent taxon or less than ten reads, whichever was highest among the two thresholds.

Incidence-based community analysis

[...] samples. The LCA of the independently derived taxonomic assignments of each read allowed us to construct one pseudo-taxonomic assignment per read pair, giving a total of 353.07 million pair-merged reads with pseudo-taxonomic assignments across the three samples. The pseudo-taxonomic labels of reads were spread across the tree of life in different proportions, specifically, Viruses (51.41%), Archaea (0.32%), Bacteria (44.23%), and Eukaryota (4.02%). Overall, about 21.53% of the assignments had a resolution of at least up to the genus level, which provided an estimate of 1815 genera across the tree of life in our samples. The results indicated the presence of 47 genera of Viruses, 64 genera of Archaea, 675 genera of Bacteria, and 1029 genera of Eukaryota in the sequenced samples (Table 1A). Eukaryota had the highest proportion of reads classified up to the genus level (45.65%), and the Viruses had the lowest proportion with just 3.67%. Further, the eukaryotic genera observed were distributed across all the kingdoms among the Eukaryotes, namely, Metazoa (29.93%), Protists (32.36%), Fungi (18.85%), and Viridiplantae (18.85%) (Figure 4). Since the low abundant macro-organisms are difficult to detect, we inspected the most commonly targeted groups of Metazoa. Specifically, the 308 Metazoan genera observed in the sequenced samples comprised 60.39% of invertebrates and 39.61% of chordates. Phylum Arthropoda with 97 genera had the highest representation among the invertebrates. Among the chordates, class Actinopteri had the highest representation with 61 genera. The low represented groups were Mammalia with 30 genera, Aves with 14 genera, Amphibia with four genera, Chondrichthyes with three genera, and Reptiles with one observed
structure the obtained the the paired-end We against using of The pair. assignments reads algorithm. 0.54%) the taxonomic 2bLCA high-quality (SD discordant the sequences 8.71% the the nucleotide average using searching correct million taxonomy on By 60.9 NCBI annotated identity. of about we eachlevels clustering sequence MMseqs2, units and using 99% sampling obtained filtering database We at 14 by reference composition. and sequences INSDC taxonomic S29 reference the the sample determining million the from for 35.5 from analysed of separately reads average complementary set were million the the which a 100 of S27, method, sequencing of and reads preparation and units S28 duplicates of (S27) sampling library from optical 0.39%) 27 26 PCR-free from derived (SD Station arising completely We 98.19% and 0.31%) a strand. (SD (S28) of 6.65% used average 28 was we an Station rate retained Since duplication from we each assignment. filtering, reads taxonomic the low-complexity paired-end from for and billion reads quality 1.5 paired-end extracellular After billion and spatially-replicated samples. 2.6 (S29), three about the sample obtained from 29 we data Station demultiplexing, sequencing of Upon bases samples. Giga DNA 851.43 environmental of total a generated life We of tree the across samples. the Jaccard Biomonitoring the among estimates depth the estimate SpadeR sequencing 2015). measured to in shared Jost, differences we depth RESULTS & and the Finally, (Chao unique sequencing considering bootstraps taxa. of by of hundred number indices various with function the Sorenson package across R using and a diversity SpadeR samples as the spatially-replicated maximum extrapolation in completeness three detect taxa and (Hsieh the sample interpolation to bootstraps among the hundred The similarity required with community 2014). assessed package depth al., R then optimal et iNEXT We the (Chao the level 2016). 
using units species al., performed sampling the were (hill et the at curves sample of each accumulation higher in each genus prohibitively Genbank taxa of in and various the genera of 2019) richness in different al., genus level of asymptotic et of genus the (Leray genus 0 counts estimated the 1% the of then the We to number than at obtained 2020). analysis less sequences al., our we be et misidentified restricted unit, We (Locatelli to taxonomically level. sampling estimated genus of each is the proportion In to database up each. the least reads reads because at of million classified level billions abundance-based taxonomically 100 containing is are standard sample of that database metagenomic the units reads each reference of divided sampling taxa. We the instead macrobial multiple 2012). the framework when into al., from inevitably statistical et taxon signal environment (Colwell incidence-based taxonomic a the analysis the an community in of obscures adapted microorganisms and abundance we taxa of database. true Therefore, microbial diversity reference the the and the of in represent abundance overrepresentation available causes high not the sequence in the genome do taxon Moreover, the the counts of incomplete. of abundance read fraction the the on raw th re)uigsaitcletaoaino eu cuuaincre fteosre incidences observed the of curves accumulation genus of extrapolation statistical using order) 8 Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. ovr o aooi ak eo aiylvlb h C loih,wihde o ffc u analysis our merging and affect independently not reads does paired-end which the algorithm, analysing of LCA choice the our contrary, by classified the been level have On may family homologs level. 
sensitivity distant below less genus very to the ranks evolutionarily due at reads only taxonomic unclassified miss low the to of very algorithms likely computationally most sensitive are to Thus, is study less species. BLAST related this Nevertheless, as the in databases. study not used reference and our MMseqs2 large for as against necessary such Cowart reads was BLAST (e.g., of than rate proportions searches billions recall low BLAST search the laboratory using obtain to with the studies we infeasible compromise to from The algorithm, compared depth. 2018). search PCR effective rate) al., sensitive the eliminating (recall et less reducing sequences by by relatively annotated useless sequences and taxonomically data (Figure filtering the of the stringent ecosystem of in to part the rates considerable due in a However, duplication render macroorganisms low may abundant otherwise very limits low which the workflow, achieve push from a we also By even is date, ecosystems. We biodiversity to aquatic DNA DNA 4). detect environmental large environmental for to in datasets extracellular life biomonitoring sequencing shotgun of of of deepest tree sequencing the the of across shotgun one biomonitoring generating ultra-deep next-generation for PCR-free approach that promising show plausible is results life marine Our of other tree any the to across dataset Biomonitoring taxonomy available filtered is our sequence in genome data). hits whole Discussion (Supporting other the whose database no China, of reference were in the genome There river in of whole Yangtze mammals database. regions to the the reference annotated hits to Since the 12,399 whose endemic found study. species in We dolphin related case database. river to reference a Chinese hits the 1E). 
of as in examined (Table available we ecosystem are ( 0.91 assembled, Chilika genome taxon dolphin been of the Irrawaddy the abundant not index the low was has of Jaccard chose dolphin detecting similarity species we a Irrawaddy for community database, flagship and approach reference lowest endangered 1D) pseudo-taxonomic the (Table The an the in 0.96 7). underrepresented of of is effectiveness (Figure index using that the taxa measured Sorenson examine was various a genera further samples with across To of the Protists number indices among among similarity shared Jaccard community found and and The unique samples. the Sorenson the estimate not the the comparing is to among by species us taxa allowed target databases, approach various the strategy incomplete pseudo-taxonomic across when sampling of the species spatially-replicated related issue of Our a database. the reproducibility of reference alleviate presence the the to in indicate strategy represented only may assignment assignments pseudo-taxonomic taxonomic a the the adopted crossing we 7.57%), Since 2 SD approach of 95.62%, 6). 1C) pseudo-taxonomic (AVG (Table effort (Figure the 90% Mammalia consideration sequencing of of under for the Effectiveness threshold taxa 3.88% double the the by With than all the 13.7%). only greater for have SD coverage increases threshold have samples 91.74%, coverage 95% (AVG taxa reads, 14.13%). the the Mammalia SD million reads, all and for 60.42%, 100 billion (AVG reads, coverage 1.52%) genera Mammalia of billion lowest SD by 1 of depth the 87.75%, followed of (AVG lowest fraction with 15.03%) depth Viruses the SD the read by At 55.51%, is a followed (AVG depth. Aves At coverage 1.54%) for given SD Sample estimated coverage given a 93.59%, lowest 6). then (AVG the at the we (Figure Bacteria in read point, for depth detected 1 reference coverage sequencing be a least highest of as can asymptotic at function samples The that by respective 1B). 
a taxon detected (Table of as particular curves richness coverage a accumulation genus sample the of asymptotic asymptotic using diversity the the the samples maximum Using as estimated the saturation sample. the then show all We represents curves in 5). richness accumulation (Figure taxon genus the reads each All paired-end of depth. billion richness sequencing 1 genus of beyond function various increases across a depth curves as sequencing accumulation sample genus each extrapolated generated for we taxa effort, minimal biomonitoring with diversity for maximum depth sequencing Optimal genus. oetmt notmldpho eunigt detect to sequencing of depth optimal an estimate To 9 Lipotes osbyetntgenus extinct possibly a , ralabrevirostris Orcaella ), Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. ilo ardedrasmyb pia o imntrn costete flf,wihcuddtc about coverage the one detect increased could about only billion which that two per life, to indicate reads billion of one results tree billion from the depth Our 3 the across costs. of Doubling biomonitoring 6). sequencing depth (Figure for diversity current sequencing optimal the of under maximum be 90% may a limit by reads caused to practical paired-end issues up the billion technical concentrations. values is the library predict mitigate which in to differences and sample, us the depths allowed to the curves sequencing due but accumulation including unequal reads samples further genus paired-end taxa, across be billion of examined 5.6 data cannot extrapolation of as sequencing that the total Statistical depth level) of a all generated optimal genus depths we which study, the the unequal this at (at define obtained In threshold effort. 
we million) sequencing coverage widely, minimal 100 sample vary with maximum of increased life a multiples of cross tree (in sequencing ones, the depth low-abundant minimal across with sequencing also organisms biodiversity we discrete maximum of Hence, the a detect abundance sequencing. to with the ultra-deep required projects Since perform depth biomonitoring to effort. sequencing funding decade, optimal considerable last the require the may determine over samples reduced of drastically (Supporting number have large animals costs biomonitoring farmed sequencing for the domestically optimal Although and be ecosystem. Cat, may the rodents, into Dog, reads of runoffs assignments billion Human, sewage taxonomic One from untreated containing catchment significant the reads the Mammalia from a of inhabiting originating class Therefore, number organisms possibly the 2016). high data), terrestrial in al., represent a noticed monitor et may and distinctly to study (Deiner bats, be ability also our area can the in 2019). organisms, catchment which demonstrated taxa al., unintended the have area detected et of in ecosystems lowest the (Adams detection aquatic occurring of the eDNA the in species proportion had of studies is terrestrial rates Reptiles eDNA studies the by-catch. shedding 2018). eDNA-based even the low al., in of as their et inventories lining to to known silver (Suresh referred due the observed ecosystem likely with commonly taxa, sampled consistent A inspected the are the in that among occurring Actinopteri, diversity commonly species and of be diversity diverse can Arthropoda high macroorganisms, highly as a abundant reads eDNA detected low such We of including that framework. 
groups billions life, statistical concluded of monitored generate incidence-based tree an million) to the et using sequencer across (328.7 (Cowart monitored biodiversity next-generation viewpoint reliably depth that presence/absence high-throughput show better and a extremely sample a from an per fauna with utilise the perspective benthic We study determines marine abundance 2018). another detecting an al., that for contrast, from eukaryotes sequencing suitable In biomonitoring is concluded of 2017). million) metagenomics for (22.3 approach depth al., depth eDNA. suitable et the shallow extracellular a Metazoa with (Stat is not the study of previous is in macroorganisms groups a metagenomics macro-fauna various fact, detecting that from In from for taxa. DNA genera low-abundant viruses Additionally, factor of towards of RNA sensitivity amounts 2019). limiting number of al., detectable diversity main large et of missing a The (Zhang presence the of the methods to detection indicates metagenomic due successful which direct be rates demonstrate also DNA-based mutation results could with high al., detected the our detected et genera Lennon, to (Nooij be homology Viral due & cannot remote likely of (Locey detect that (3.67%), number to catalogued genus-level low comparisons Viruses Further, level to be The Viruses. protein up 2018). to requiring and classified yet lineages Bacteria reads between than is divergence of genera and diversity and eukaryotic proportion Viruses their of least of number of the abundance greater 99% had a high observe than the we despite more reads. Hence, Third, million environment, 2016). abundant low 6.49 studies. the detect Second, about targeted to in dataset. with power to excellent level Bacteria our comparable provides genus of resolution is the high-taxonomic depth that to with high up macro-organisms reads the of classified eukaryotic number to reads of large 4.02% of due a and First, Such assignments proportion reasons. 
Viruses highest taxonomic three than the to million Eukaryota had due 14.2 in Eukaryota possibly over higher of reads, for classified was 4.02% accounted the only detected of reads up genera most computational made of up such Eukaryotes made number of Although which the effect workflow. Bacteria, the assi 10 Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. ist h oretxnwt h viaiiyo elppltddtbssi h future. the in databases well-populated of availability the during the closer to redundancy evolutionarily with clustered the more taxon provide reducing been source will while other have approach the similarity pseudo-taxonomic might to were sequence The hits genomes construction. high there suspect database Delphinidae We to though of closer. due other process even phylogenetically sequences the the the more gene database, from from be dolphin the sequences may sequences river that of Chinese gene in available Delphinidae thousands annotated genome not family Interestingly, the the is dolphin samples. from that genome reference our river database whole in the the Chinese whose in present in the genomes Chilika, be available hit of to not dolphin species expected example is Irrawaddy an flagship but species provide database endangered 7) also target the an we (Figure the in illustrate, similarity dolphin, To community from Irrawaddy approach. high the pseudo-taxonomic locus very the of have the of samples available when reproducibility spatially-replicated nearest high hit the indicating the we from best approach, Hence, results the pseudo-taxonomic feasible. Our with the as not database. genome In chosen is databases. 
the approach is reference by taxonomy-free across species approaches on fraction taxonomy-free a loci dependent and a utilising taxonomy-based from partially have of scenario, advantages only being arise a with may approach sequences such sizes pseudo-taxonomic the In a genome adapted sampled. studies, large genome with (Callahan metagenomic organisms their samples of higher across of case Further, diversity resolutions. compare in taxonomic to of But, varying approach abundance 2017). relative taxonomy-free The al., a 2015). amplicon provides al., et or clustered ASVs et 2015) (Tikhonov be or Schloss, sequences can OTUs & denoised and (Westcott different the locus identity reads on sequence single based being on al., sequenced a (ASVs) (M¨achler variants based from et are of databases sequence (OTUs) arise approaches reference units number derived Taxonomy-free taxonomic the sequences large due originated. operational the on into studies, a Also, they dependency targeted environment, the which genetics. marker-based eliminate the from In population to sequence 2020). taxa and in studies reference source barcoding taxa metabarcoding the the in in novel employed match used assembled, uncatalogued directly routinely yet of not are not proportion may that high are loci a genomes few to a whose only organisms contain non-model databases most more for become Currently, might future. reproducible and shotgun the is 72 complications ultra-deep in approach fewer to higher-end, costs pseudo-taxonomic relatively the up sequencing The reads. 
with of on add million choice reduction case 720 can simpler further requiring extreme much the taxa) which an 6 a with is CytB) x attractive is markers example Protists (COI, eDNA 2 metabarcoding 23S), x Animals of life (16S, sets sequencing and Bacteria of primer 2 23S), tree rbcL), x (16S, the & (matK, replicates sets Archaea Although technical different Loos namely, Plants (3 two groups, der location ITS), pair, six van per primer (18S, libraries the 2017; per of Fungi of replicates al., tree each technical 28S), the et for three (18S, across (Stat marker least program at per metabarcoding biomonitoring need primers life unbiased may of of completely one markers a tree different metabarcoding, implement achieve with using to life libraries to example, and taxa For primer-bias, of 2020). reduce increases stochasticity,Nijland, groups to also PCR-induced different approaches taxon alleviate targeted a target to of targeting to required cost primers replicates the different However, technical with metabarcoding metagenomics. of libraries library, than number per choice the requirement achieve better low-depth considering to relatively a significantly library the In be of per to sequencing. to Gb reads Due of seem million 2019). 300 may Gb 10 al., requires only et 150 coverage require (Singer requires 100x may sensitivity typical biomonitoring at metabarcoding good now for as genome 100 such reads also sized approaches requires paired-end are mammalian targeted coverage 150bp contrast, depths 30x a billion sequencing at assembling one high novo genome Such whereas extremely human de sample. sequencing, at a and per re-sequencing length sequencing, incurred instance, in of For cost 150bp Gb biology. the to in reducing applications up various run reads Illumina in single paired-end the a deliver as such can in to sequencers small platform depths for next-generation Novaseq feasible utilise The be they advantage may 6000. 
them when sample take Novaseq differentiating programs per can for reads biomonitoring One samples paired-end sized low. re 11 Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. hf oad mlmniglresaenx-eeainbooioigporm costete flf nthe in life of ACKNOWLEDGEMENTS paradigm tree a the advances for across study necessary programs era. this is biomonitoring biogenome that that next-generation earth and metagenomics believe large-scale test eDNA-based implementing We of further towards conditions. capabilities shift to different the opportunity costs under regarding excellent sequencing understanding workflow an taxa, Plummeting our metagenomic provide spatio-temporal effort. specific novel projects sampling high our to sequencing the the optimise enables genome traits reducing 1) large-scale Third, significantly (Figure functional increasing DNA by services. of extracellular and ecosystems employing and large mapping through functioning approach whole of level, our the monitoring ecosystem by gene from provided to biodiversity data the of changes generated tracking resolution at by the community diversity kingdoms of linking across nature monitoring interactions multi-faceted and species for the inferring Second, allows enables biodiversity that abundances. detecting genome has assay relative effectively single approach and the a simultaneously our in in the of limitations, changes life in possibility of the programs the tree biomonitoring First, Despite the databases next-generation features. 2020). across key large-scale barcode into three al., resolving to curated incorporated et due as and using (Singer future adapted such approaches be metagenomics applications to targeted than For potential level, suitable the databases. 
pseudo-taxonomic species more sequence using the reference be ecosystem at may uncurated adopted the structure using in widely community level our changes be Finally, genus fine-scale broad-scale affordable. not the more monitoring may become at to approach biomonitoring costs labels limited metagenomic Next, large-scale sequencing deep currently very a 2019). the is for Hence, al., until approach prohibitive sequencing. managers et ecosystem economically of (Seeber and be costs researchers animals by may current by strategies sample the visited sampling at per routinely innovative programs reads need waterholes may billion from cases one samples sub- Such obtaining and water plants, animals. collecting microbes, terrestrial detect as large to pooled such useful not soil, be the but only from may organisms, isolated ecosystems, applicability obtaining DNA terranean terrestrial Further, extracellular where in ecosystem). the plots ecosystems Tundra since sampling different tested in multiple (e.g., rigorously for from samples difficult be necessary of should be is ecosystems number may testing terrestrial limited DNA to more a extracellular ecosystem, of with model warrant quantities objectives a next- microgram that our in for achieved limitations way study we unbiased some pilot-scale Although an has a in optimisation. also data and approach required in testing metagenomic the further advancements deliver our is can Nevertheless, Hence, bioinformatics that and biomonitoring. technologies 2017). sequencing, generation level futuristic preparation, every al., develop library at to extraction, adaptations et DNA attempt with collection, 2) an (Bohan sample (Figure workflow from life metagenomic pipeline a novel the of Our at of genome 2020). 
biomonitoring al., tree entire next-generation et deploying the (Makiola the for from scale requirement global data across indispensable requires an species which a are ecosystems technologies monitoring various entire metagenomic of from to approach genes inherent individuals holistic from the more to entities of due ecological a standardised life different of employs have of range that biomonitoring tree wide organisms entire next-generation of the Moreover, groups of not major loci. but limitations across interest barcoding may abundances the of relative approaches taxon revealed the targeted provide particular have Such to a its 2017). inability decade monitoring of al., for past et understanding suitable the (Elbrecht be thorough technique only in a metabarcoding studies requires PCR-based Numerous widely-used problems the weaknesses. practical and solve strengths to technical technologies of application Successful 1) (Figure conclusion DNA environmental and biomonitoring of Limitations large-scale type designing consideration. targeted ecosystems. before under the large ecosystem of studies spatially large resolutions the pilot-scale represent monitoring spatio-temporal in of for to the importance evaluate suitable required to the be samples programs may emphasises extra- of DNA also of number sampled extracellular distribution study the the that homogenous Our in reducing implies the km and effectively to column 10 the by due water than be ecosystems that the could higher indicates in biodiversity much of DNA is also resolut 12 Posted on Authorea 22 Feb 2021 | The copyright holder is the author/funder. All rights reserved. No reuse without permission. | https://doi.org/10.22541/au.161401815.51766652/v1 | This a preprint and has not been peer reviewed. Data may be preliminary. eals . hlc,P . ae,P .(00.Vrertso h rn sidctr fbiological of indicators as brink the on (2020). H. America extinction. P. of mass Raven, States sixth & the R., and P. 
annihilation Ehrlich, G., extinction Ceballos, mass America sixth declines. ongoing of the and States via United losses annihilation the Biological population (2017). R. Accele- by Dirzo, (2015). & signaled R., M. P. extinction. Ehrlich, T. G., Palmer, mass Ceballos, & sixth M., the R. Entering DNA Pringle, losses: Garc´ıa, Relic A., https://doi.org/10.1126/sciadv.1400253 species D., (2016). human-induced A. diversity. N. Barnosky, modern Fierer, microbial R., rated & P. soil S., Ehrlich, of G., M. Ceballos, estimates Strickland, E., obscures E. and Morgan, soil W., https://doi.org/10.1038/nmicrobiol.2016.242 J. in Leff, abundant (2009). J., L. is P. T. Marsden, replace Madden, P., & should Carini, K., variants Bealer, J., sequence Papadopoulos, Exact N., 2105-10-421 applications. Ma, (2017). and V., Architecture Avagyan, P. BLAST+: G., Coulouris, S. analysis. C., Holmes, Camacho, data & marker-gene J., in P. units ps://doi.org/10.1038/ismej.2017.119 McMurdie, taxonomic Bruyn, de J., operational & W., B. In D. Yu, monitoring. M., Callahan, biodiversity Knapp, and S., biology Woodward, Creer, wildlife R., G. & for Evolution Carvalho, DNA P., J., Environmental T. (2014). M. A. M. Gilbert, Ecologi- A., Evans, Dumbrell, of K., Bohmann, A., Reconstruction Automated Raybould, Large-scale, A., ps://doi.org/10.1016/j.tree.2017.03.001 In Biomonitoring: Tamaddoni-Nezhad, Networks. Global C., for cal Next-Generation Vacher, implications (2017). A., G. and D. DNA https://doi.org/10.1126/science.1170749 Bohan, environmental Science. of In of Advancement life. the of ecology for web Association the The Disentangling (2009). (2016). J. Bascompte, R. C. Turner, In https://doi.org/10.1007/s10592-015-0775-4 & genetics. A., conservation M. Barnes, https://doi.org/10.1016/j.tree.2018.09.003 In Ltd. Bowler, Ecology. Elsevier & in G., 945–957). Series Englund, pp. A., Time 12, M. DNA Leibold, Environmental M., Vellend, (2018). P., Taberlet, D. P., H. 
Grossart, M., B´alint, Pfenninger, M., niomna N eN) ihacs td fpitdtrl Crsmspca DAudrfield under eDNA reptile picta) non-avian of (Chrysemys review turtle brief painted A (2019). of J. study F. Janzen, case & a R., M. with conditions. Muell, (eDNA), A., L. DNA NGS Hoekstra, environmental CCMB M., the I. C. of Adams, V Purushotham sequencing. Mr. and and and libraries REFERENCES fieldwork Nagabandi of during Tulasi preparation Karne with Mrs. Divyasree help thank and grant for Ray vide we facility Manisha Society Finally, from Geographic sample extraction. received facilitating National PhD for DNA help the Authority DBT-BINC the Development from Chilika acknowledge Grant a and We Career Rastogi by collection. Gurdeep Early supported (DBT), Dr. an thank Biotechnology was We and of EC-53199T-18. S.M. India no. Department BT/PR29032/FCB/125/4/2018. of the no. Govt. from from grant study fellowship this vide for India, G.U. of by Govt. received funding the acknowledge We Vl 9 su ,p.3837.Esve t.https://doi.org/10.1016/j.tree.2014.04.003 Ltd. Elsevier 358–367). pp. 6, Issue 29, (Vol. Diversity rnsi clg n Evolution and Ecology in Trends , 117 , 11 2) 39–30.https://doi.org/10.1073/pnas.1922686117 13596–13602. (24), 4,5.https://doi.org/10.3390/d11040050 50. (4), osrainGenetics Conservation , 114 3) 68–69.https://doi.org/10.1073/pnas.1704949114 E6089–E6096. (30), M Bioinformatics BMC rceig fteNtoa cdm fSine fteUnited the of Sciences of Academy National the of Proceedings Vl 7 su ,p.11) pigrNetherlands. Springer 1–17). pp. 1, Issue 17, (Vol. 13 Vl 2 su ,p.4747.Esve t.htt- Ltd. Elsevier 477–487). pp. 7, Issue 32, (Vol. Science rceig fteNtoa cdm fSine of Sciences of Academy National the of Proceedings rnsi clg n Evolution and Ecology in Trends Vl 2,Ise53,p.4649.American 416–419). pp. 5939, Issue 325, (Vol. SEJournal ISME , 10 1,19 https://doi.org/10.1186/1471- 1–9. (1), aueMicrobiology Nature , 11 cec Advances Science rnsi clg and Ecology in Trends 1) 6924.htt- 2639–2643. 
DATA ACCESSIBILITY STATEMENT

All the raw sequencing data generated in this study has been submitted to NCBI SRA under the BioProject accession PRJNA691704. The submission may be verified using the following reviewer link: https://dataview.ncbi.nlm.nih.gov/object/PRJNA691704?reviewer=3ocgpcesbu8qu5gl19m8749stc. The data will be publicly released immediately upon acceptance. The final filtered taxonomic lineage count file (Supporting data), Snakemake pipeline, Python and R scripts used for the analysis are available for review: https://github.com/manu-script/exDNA-pilot-study. The GitHub repository will be archived, and a permanent DOI link will be provided after the peer-review.

AUTHOR CONTRIBUTIONS

S.M. and G.U. conceived the idea and designed the study. S.M. collected the samples, generated the data, performed the analysis, and drafted the manuscript. G.U. provided edits to the draft and approved the manuscript.

TABLES AND FIGURES

Figure 1: The complex ecology of environmental DNA (eDNA) in a typical aquatic ecosystem. DNA in environmental water samples can be attributed to three main types: high molecular weight organismal DNA from whole organisms, high to low molecular weight extra-organismal DNA from dissociated parts of the organisms, and highly degraded extracellular DNA outside cellular structures. The extracellular DNA exists in a free form until it is completely degraded into nucleotides or otherwise adsorbs onto surface-reactive suspended soil particles, such as humic substances, clay, silt, and sand, through cation bridging, which makes the DNA more resistant to degradation. The extracellular DNA adsorbed onto particles may eventually sediment down, and may be resuspended later or preserved for a long time in the sediments. The different types of DNA in an environmental sample provide varying spatio-temporal resolutions of biodiversity, from a few seconds to thousands of years of temporal resolution, and from a few metres to several kilometres of spatial resolution, depending on the origin, transport, and fate of eDNA. (Figure created with BioRender.com)

Figure 2: Summary of our metagenomic workflow from sampling to bioinformatics. (a) Filter large volumes of water through a 0.45 µm mixed cellulose ester (MCE) membrane. (b) Desorb and enrich the extracellular DNA using a saturated phosphate buffer-based lysis-free protocol. (c) Prepare DNA libraries through a completely PCR-free protocol. (d) Generate ultra-deep paired-end shotgun sequencing data on a high-throughput sequencer. (e) Assign a pseudo-taxonomic label to quality-filtered sequences by querying against a non-redundant nucleotide database. (f) Derive asymptotic genus richness and analyse community similarity by applying incidence-based statistics on the filtered count data. (Figure created with BioRender.com)

Figure 3: Geographic location of Chilika lagoon in India, and the pilot-scale sampling strategy. The three geolocated sampling stations (S27, S28, and S29) are located 5 km apart on a 10 km transect in the central sector of the lagoon.

Figure 4: Distribution of the detected genera (n=1815) across the tree of life in various domains, kingdoms, phyla and classes.

Figure 5: Genus accumulation curves as a function of sequencing depth across various taxa. The dotted lines represent statistically extrapolated values, and the ribbon around the lines represents the 95% confidence intervals.

Figure 6: Sample coverage at the genus level as a function of sequencing depth across various taxa. The lines represent the trend of coverage values among the samples derived from local polynomial regression, and the ribbon around the lines indicates the 95% confidence intervals.

Figure 7: Community similarity among the samples measured with the Sørensen index and the Jaccard index with 95% confidence intervals across various taxa.
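The community similarity comparison in Figure 7 rests on incidence (presence/absence) data. As a rough illustration only, the classic pairwise forms of the two indices can be sketched in Python. The genus lists and function names below are hypothetical, and the study's own estimates (reported with confidence intervals across three stations) may come from a different, multi-sample estimator.

```python
def sorensen(a, b):
    """Classic pairwise Sørensen index: 2 * shared / (size of a + size of b)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    # Two empty lists are treated as identical (assumption for this sketch).
    return 2 * len(a & b) / total if total else 1.0


def jaccard(a, b):
    """Classic pairwise Jaccard index: shared / (genera found in either list)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical genus detections at two stations (not data from the study).
s27 = {"Vibrio", "Synechococcus", "Mugil", "Penaeus"}
s28 = {"Vibrio", "Synechococcus", "Mugil", "Chanos"}

print(sorensen(s27, s28), jaccard(s27, s28))  # 0.75 0.6
```

For two samples the indices are linked by J = S / (2 - S), so the Jaccard value is never larger than the Sørensen value for the same pair.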
Table 1: Statistical summary across various taxa. (A) Total number of observed genera across samples. (B) Average estimated asymptotic genus richness per sample with standard deviation. (C) Average sample coverage at the genus level with standard deviation for a sequencing depth of one billion paired-end reads. (D) Community similarity among the samples measured with the Sørensen index. (E) Community similarity among the samples measured with the Jaccard index.

Taxon          (A) Observed genera   (B) Estimated asymptotic genus   (C) Sample coverage at   (D) Sørensen   (E) Jaccard
               across samples        richness per sample              one billion reads        index          index
Viruses        47                    45.97 (SD 1.03)                  98.49 (SD 1.91)          …              …
Archaea        64                    64.28 (SD 0.55)                  97.72 (SD 2.15)          …              …
Bacteria       675                   673.46 (SD 2.72)                 99.22 (SD 0.30)          …              …
Eukaryota      1029                  985.81 (SD 35.83)                95.31 (SD 3.34)          …              …
Protists       333                   313.19 (SD 26.66)                93.04 (SD 3.00)          …              …
Fungi          194                   189.73 (SD 5.49)                 97.77 (SD 1.90)          …              …
Viridiplantae  194                   191.37 (SD 2.09)                 95.36 (SD 5.87)          …              …
Metazoa        308                   292.34 (SD 5.32)                 95.87 (SD 3.52)          …              …
Arthropoda     97                    93.72 (SD 5.19)                  95.95 (SD 4.36)          …              …
Actinopteri    61                    61.21 (SD 0.63)                  94.75 (SD 6.50)          …              …
Mammalia       30                    30.6 (SD 0.59)                   95.16 (SD 7.23)          …              …
Aves           14                    13.30 (SD 1.97)                  91.75 (SD 13.71)         …              …