Article (refereed) - postprint

This is the peer reviewed version of the following article:

Creedy, Thomas J.; Norman, Hannah; Tang, Cuong Q.; Qing Chin, Kai; Andujar, Carmelo; Arribas, Paula; O'Connor, Rory S.; Carvell, Claire; Notton, David G.; Vogler, Alfried P. 2020. A validated workflow for rapid taxonomic assignment and monitoring of a national fauna of bees (Apiformes) using high throughput DNA barcoding. Molecular Ecology Resources, 20 (1). 40-53, which has been published in final form at https://doi.org/10.1111/1755-0998.13056

This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.

© 2019 John Wiley & Sons Ltd

This version available http://nora.nerc.ac.uk/id/eprint/525292/

Copyright and other rights for material on this site are retained by the rights owners. Users should read the terms and conditions of use of this material at http://nora.nerc.ac.uk/policies.html#access

This document is the authors’ final manuscript version of the journal article, incorporating any revisions agreed during the peer review process. There may be differences between this and the publisher’s version. You are advised to consult the publisher’s version if you wish to cite from this article.

The definitive version is available at https://onlinelibrary.wiley.com/

Contact UKCEH NORA team at [email protected]

The NERC and UKCEH trademarks and logos (‘the Trademarks’) are registered trademarks of NERC and UKCEH in the UK and other countries, and may not be used without the prior written consent of the Trademark owner.

This article has been accepted for publication and undergone full peer review but has not been through the copyediting, typesetting, pagination and proofreading process, which may lead to differences between this version and the Version of Record. Please cite this article as doi: 10.1111/1755-0998.13056

This article is protected by copyright. All rights reserved.

MS. HANNAH NORMAN (Orcid ID : 0000-0002-4800-5375)
DR. CARMELO ANDUJAR (Orcid ID : 0000-0001-9759-7402)
PROF. ALFRIED VOGLER (Orcid ID : 0000-0002-2462-3718)

Article type : Resource Article

A validated workflow for rapid taxonomic assignment and monitoring of a national fauna of bees (Apiformes) using high throughput DNA barcoding

Thomas J. Creedy 1,2 *, Hannah Norman 1,3 *, Cuong Q. Tang 1,4 *, Kai Qing Chin 1, Carmelo Andujar 1,2,5, Paula Arribas 1,2,5, Rory S. O’Connor 6,7, Claire Carvell 8, David G. Notton 1, Alfried P. Vogler 1,2

1 Department of Life Sciences, Natural History Museum, Cromwell Rd, London, SW7 5BD, UK
2 Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, SL5 7PY, UK
3 Science and Solutions for a Changing Planet DTP, Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, SL5 7PY, UK
4 Current address: NatureMetrics Ltd, CABI Site, Bakeham Lane, Egham, Surrey, TW20 9TY, UK
5 Current address: Island Ecology and Evolution Research Group (IPNA-CSIC), Astrofísico Fco. Sánchez 3, 38206 La Laguna, Tenerife, Spain
6 Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
7 Current address: School of Agriculture, Policy and Development, University of Reading, Whiteknights, PO Box 237, Reading, RG6 6AR, UK
8 NERC Centre for Ecology & Hydrology, Benson Lane, Crowmarsh Gifford, Wallingford OX10 8BB, UK

*These authors contributed equally.

Author for correspondence: Hannah Norman, Department of Life Sciences, Natural History Museum, Cromwell Road, London, SW7 5BD, UK
Email: [email protected]

ABSTRACT

Improved taxonomic methods are needed to quantify declining populations of insect pollinators. This study devises a high-throughput DNA barcoding protocol for a regional fauna (United Kingdom) of bees (Apiformes), consisting of reference library construction, a proof-of-concept monitoring scheme, and the deep barcoding of individuals to assess potential artefacts and organismal associations. A reference database of Cytochrome Oxidase subunit 1 (cox1) sequences including 92.4% of 278 bee species known from the UK showed high congruence with morphological taxon concepts, but molecular species delimitations resulted in numerous split and (fewer) lumped entities within the Linnaean species. Double tagging permitted deep Illumina sequencing of 762 separate individuals of bees from a UK-wide survey. Extracting the target barcode from the amplicon mix required a new protocol employing read abundance and phylogenetic position, which revealed 180 molecular entities of Apiformes identifiable to species. An additional 72 entities were ascribed to nuclear pseudogenes based on patterns of read abundance and phylogenetic relatedness to the reference set. Clustering of reads revealed a range of secondary Operational Taxonomic Units (OTUs) in almost all samples, resulting from traces of insect species caught in the same traps, organisms associated with them, including a known parasite of bees, and the common detection of human DNA, besides evidence for low-level cross-contamination in pan traps and laboratory procedures. Custom scripts were generated to conduct critical steps of the bioinformatics protocol. The resources built here will greatly aid DNA-based monitoring to inform management and conservation policies for the protection of pollinators.

Key words: Pollinators, community barcoding, contamination, Illumina sequencing, double dual tagging.

INTRODUCTION

Widespread declines in pollinator populations are causing concern about the future of global biodiversity and agricultural productivity (Garibaldi et al. 2013; Hallmann et al. 2017; Lever et al. 2014), driven by the combined effects of habitat loss, introduction of non-native and invasive species, pathogens and parasites, and various other factors contributing to environmental change (Vanbergen et al. 2013). Landscape effects on pollination of crops through agricultural intensification, particularly the use of monoculture crops, have led to significant changes in pollinator communities (Kennedy et al. 2013; Ricketts et al. 2008), with obvious economic implications for the agricultural sector and pollination services worth hundreds of millions of pounds in the United Kingdom alone (Potts et al. 2010). However, these trends in species distribution and abundance are difficult to quantify, unless solid methodologies for monitoring at regional levels can be implemented. Thus, there is an urgent need to develop strategies for large-scale and long-term systematic monitoring of pollinator populations, to better understand the impacts of declines on pollination services to crops and wild plants, and inform policy decisions and conservation efforts.

Current evidence of change in pollinator populations in the United Kingdom comes primarily from records of species occurrence submitted by volunteer recorders (e.g. Biesmeijer et al. 2006; Powney et al. 2019). While these allow for the analysis of large-scale changes in species distributions, they provide no information on abundance or local population size, and are known to be temporally and spatially biased (Isaac & Pocock 2015). Instead, pan traps have been proposed as the most effective method for systematic monitoring of bee diversity in European agricultural and grassland habitats (Westphal et al. 2008). Species identification is usually performed by expert taxonomists, but there is a growing need for alternative methods, in particular because the great species diversity and large quantity of specimens from mass trapping make them challenging and costly to identify (Lebuhn et al. 2013).

This impediment may be overcome through the use of high throughput sequencing (HTS) techniques to identify species by their DNA ‘barcode’ at the individual or community level. A suitable HTS-based approach for large-scale DNA barcoding could assay thousands of specimens potentially generated in the course of a pollinator monitoring scheme. The great sequencing depth of such high-throughput barcoding (HT barcoding) methodology may also reveal DNA from organisms internally or externally associated with a target specimen or as a carry-over from other specimens in the trap. Similarly, the more recent approach of ‘metabarcoding’, by which entire trap catches are subjected to high throughput amplicon sequencing in bulk, produces community-level species incidence data based on the mixed sequence read profile (Yoccoz et al. 2012). Current HTS protocols can maintain the individual information of thousands of samples using unique tags in the initial PCR prior to pooling for Illumina sequencing with secondary tags, which allows sequences to be traced back to the associated specimen (Arribas et al. 2016; Shokralla et al. 2015). However, crucial to these approaches is a comprehensive and validated reference set of DNA sequence data from target species to provide accurate and verifiable molecular identification.

This study lays the groundwork for the use of HTS techniques to assess diversity and abundance of mass-trapped samples of a regional-scale pollinator fauna, focused on the bees (Hymenoptera: Apiformes) of the United Kingdom. The first step in this process was the generation of a well curated reference database for the 278 species of bees known from the UK using the Cytochrome c Oxidase subunit I (cox1) barcode marker (Hebert et al. 2003), which provides good species discrimination (Smith et al. 2008). This reference set was then used to identify HTS-obtained short barcode reads from 762 bee specimens gathered as part of a pilot study for a national monitoring scheme. Morphological identifications performed in parallel provided comparisons of molecular identification against conventional methodology and allowed further refinement of the database. In addition, the deep-sequencing approach allowed the assessment of species associated with the target specimens, as well as cross-contamination from other organisms present in the traps or from specimen handling and laboratory procedures.

MATERIALS AND METHODS

Building a regional reference database

A cox1 reference database was generated from DNA barcoding of bee species known to occur in the UK according to the list of Falk and Lewington (2015) and notes from various sources maintained by co-author DGN. Most specimens were caught by hand netting and identified by DGN, using the latest keys available at the time (Amiet et al. 2001, Amiet 2004, 2010; Amiet et al. 2007; Amiet et al. 2014; Benton 2006; Bogusch & Straka 2012; Falk & Lewington 2015; Mueller 2016). Identifications had to draw on these various references because the comprehensive key of Falk & Lewington (2015) became available only part way through the study, while some identifications were also cross-checked between different publications. Additional specimens were obtained using pan traps from the survey described below. The reference set included all available unique UK species as determined by morphology, with multiple specimens per species where available. These within-species replicates allowed inclusion of specimens from across the geographical range of species, identified by different taxonomists, or belonging to species complexes. Specimen data for morphological vouchers are available at the Natural History Museum Data Portal (data.nhm.ac.uk; http://dx.doi.org/10.5519/0002965).

DNA was extracted from a hind leg using a Qiagen DNeasy Blood and Tissue Kit, after incubation at 56°C in extraction buffer (ATL and Proteinase K) overnight in a shaking incubator at 75 rpm. The complete cox1 ‘barcode region’ (658 bp) was amplified using primers (BEEf TWYTCWACWAAYCATAAAGATATTGG and BEEr TAWACTTCWGGRTGWCCAAAAAATCA) newly designed based on an alignment of 84 mitochondrial genomes from 22 genera of Apiformes. PCR and sequencing using ABI dye terminator technology followed standard procedures (Supplementary Material). Sequences were deposited in BOLD (Barcode of Life Datasystem) under the project BEEEE, along with Syrphidae barcodes that were sequenced at the same time.

Sequences were aligned using the MAFFT v1.3 (Katoh et al. 2009) plugin in Geneious. Alignments were used for distance-based and coalescence-based species delimitation: (1) BINs (Barcode Index Numbers) were automatically generated for sequences uploaded onto the BOLD database, employing a single linkage network method (Ratnasingham & Hebert 2013). (2) The GMYC (Generalized Mixed Yule Coalescent) method for separating independent coalescent groups (Fujisawa & Barraclough 2013) was performed on phylogenetic trees constructed separately for each genus using BEAST 1.8.1 (Drummond & Rambaut 2007). For genera with only a single British representative (Anthidium, Apis, Ceratina, Macropis and Rophites), GMYC was conducted after adding congeneric sequences from BOLD.

Generating a test dataset from field-caught samples using HTS

The reference database was used for identification of specimens obtained through the National Pollinator and Pollination Monitoring Framework (NPPMF; Carvell et al. 2016). Mixed samples were collected with pan traps consisting of sets of water-filled bowls (painted UV-yellow, white and blue; after Westphal et al. 2008) from 14 sites across the UK, and further specimens were collected by standardised netting along transects of 200 m running from each set of pan traps (Figure 1A; see Carvell et al. 2016 and supplementary materials for a full description of the sampling protocol). Bees (Apiformes) were separated from other taxa in the field, stored in 99% ethanol, and transferred to -20°C as soon as possible after collection. Specimens were identified morphologically by taxonomists offering commercial bee identification services. In total, 762 specimens were processed and individually sequenced. All specimens were stored in 99% ethanol and deposited as voucher specimens in the Molecular Collection Facility at the NHMUK.

DNA was extracted from individual specimens by piercing the abdomen and submerging the whole specimen in lysis solution consisting of 200 µl ATL/Proteinase K buffer for 12 hours on a 56°C shaking incubator. DNA extractions were performed using either the Qiagen BioSprint 96 DNA Blood Kit or DNeasy Blood and Tissue kits applied to the lysate. Each DNA extract was PCR amplified for a 418 bp portion of the cox1 barcode region (Andujar et al. 2018; supplementary materials). Amplicons for each individual were tagged using a ‘double dual’ PCR protocol (Shokralla et al. 2015) to generate unique tag combinations for each specimen, following the procedures of Arribas et al. (2016). Tags were added in the initial PCR amplification by using cox1 primers with different 6 bp tags with a Hamming distance of 3, with a total of 8 different tagged primer sets. In all reactions, forward and reverse primers used the same tag, so that the products of tag jumping could be detected (Schnell et al. 2015). Amplicons generated with different primer tags were merged into 96 pools of 8 and cleaned using Agencourt AMPure XP beads (Beckman Coulter, Wycombe, UK). Secondary amplification of each pool was performed with i5 and i7 Nextera XT indices using unique MID combinations (Illumina, CA, USA) for each of the 96 final libraries, which were then sequenced on about 50% of a flow cell of an Illumina MiSeq v.3 (2x300 bp paired-end).

Perl scripts of the custom NAPtime pipeline (www.github.com/tjcreedy/NAPtime) were used to wrap bioinformatics filtering of the raw data. The 96 libraries were demultiplexed based on XT indices using Illumina software and were further demultiplexed using NAPdemux based on the unique tags of the first-round PCR primers to separate reads into 762 paired read files. This script wraps cutadapt (Martin 2011) for large demultiplexing runs, and used the default 10% permitted mismatch to the adapter sequences (i.e. permitting no errors in the 6 bp tag used) before binning the reads according to their tags. Mate pairs with only one read matching the correct tag were discarded. Read quality was reviewed using FASTQC (Andrews 2010). Following demultiplexing, the NAPmerge script was used to generate a set of full-length reads for further analysis. The script invokes cutadapt (Martin 2011), PEAR (Zhang et al. 2014) and USEARCH -fastq_filter (Edgar 2010) to remove primer sequences, assemble read pairs, and perform quality filtering respectively. Any reads not containing a correct primer sequence, and their mates, were discarded, and any merged reads with 1 or more expected errors were removed with fastq_filter; otherwise, wrapped software used default parameters. This process generated a pool of complete cox1 amplicon sequences for each of the specimens.

Testing the utility of the reference dataset

From the set of reads obtained for each specimen a single “high-throughput barcode” (HT barcode) sequence was designated to represent the cox1 gene of that specimen. Three methods were used to identify this putative HT barcode from the read mixture.

Firstly, we generated OTU (Operational Taxonomic Units) clusters and centroid sequences using USEARCH cluster_otus. This was performed using a metabarcoding pipeline, implemented in the NAPcluster script, which includes standard functions from the USEARCH suite (Edgar 2010), starting with the output data from NAPmerge (merged and quality-filtered amplicons), and comprises the following steps: (i) filtering sequences by length; (ii) dereplication and filtering by number of reads per unique sequence, to retain only sequences represented by a set minimum of reads; (iii) denoising using the USEARCH UNOISE algorithm (Edgar et al. 2011); (iv) clustering of sequences according to the cluster radius and generation of an output set of OTU consensus sequences, and (v) mapping of reads to OTU clusters (using USEARCH usearch_global and a custom .uc parser) and generation of an output table of OTU read numbers by sample. All sequences differing from 418 bp and with only 1 copy were removed in steps (i) and (ii), and USEARCH cluster_otus was employed for clustering with a dissimilarity threshold of 3%. The centroid of the most abundant OTU, the ‘top OTU’, was used as the HT barcode.

The second method for selecting the HT barcode sequence simply chooses the most frequent read for each library (see supplementary materials for details), under the assumption that the most abundant template DNA represents the target specimen. This ‘top read’ method is based on simple read counts of all unique sequences after quality filtering, with no length filtering. As such, it does not distinguish among variants present in the mixture, including erroneous variants resulting from PCR or sequencing errors, or true variants resulting from co-amplification of nuclear mitochondrial DNA segments (Numts), gut contents, internalised parasites, and cross-contamination, possibly leading to frequent incorrect sequence selection.

The third method also finds the most highly represented sequence among reads in an amplicon mix, but limits the selection to reads of the expected amplicon length of 418 bp, and validates this selection statistically and taxonomically in order to avoid these variants. The process starts with the length filtering (in this case, rejecting any sequence not 418 bp), and groups sequences by identity (i.e. dereplication), recording the abundance of reads representing each unique sequence. Starting with the most abundant, each unique sequence is assessed for the significance of the difference in read abundance by bootstrapping. Given the total number of reads in the sample and the number of unique sequences, the probability of a sequence occurring as frequently as the most abundant sequence by chance alone is estimated using 10,000 bootstrap iterations (p-value). A p-value of 0 designates a sequence as significantly more frequent with high confidence, and less than 0.5 with low confidence, above which the entire sample is disregarded because a most abundant barcode sequence for the target specimen is not clearly defined. Finally the most abundant sequences revealed by this procedure are subjected to a BLASTn search against the NCBI nt database and the hits assessed for the focal taxon (in this case, Hymenoptera). If the most abundant read matches a different taxon, the sample was removed from further consideration. If a sequence fails the bootstrapping test, it is merged with the next most frequent sequence if their similarity is above a given threshold (99% was used here). If this merge occurs, the process restarts; if they are not sufficiently similar, the sequence is output with “low confidence” if it passes the BLAST test, or discarded and the process restarted if not. Sequences passing both tests are output as “high confidence”. The method was implemented in a purpose-built tool, NAPselect, and the process is visualised in Supplementary Figure S1.

The success of these three methods, and the accuracy of the sequences they output, was tested against the BEEEE reference collection by identifying the HT barcodes generated above using BLASTn with default parameters. Only matches with >95% identity and overlap with the reference sequences of >400 bp were retained, and the match with the highest similarity was selected, using bitscore to break ties. The taxon identity of this hit was compared against the morphological identifications supplied by taxonomists. For each HT barcode selection method and taxonomic level, the number of correct molecular identifications at the genus and species levels was tallied.

Exploration of concomitant DNA in the testing dataset

The OTUs generated with the NAPcluster script (see above) allowed the exploration of co-amplified DNA from each bee specimen other than the primary cox1 sequence, including contaminants. Specifically, the OTUs that did not match the NAPselect HT barcode sequence for the target specimen were designated as “secondary OTUs”. These OTUs were searched against the NCBI nt database using BLASTn, followed by taxonomic binning using MEGAN6 Community Edition with the weighted Lowest Common Ancestor algorithm (Huson et al. 2016). Any OTUs assigned to Apiformes were additionally identified using BLASTn (default parameters) against the BEEEE reference collection and the NAPselect HT barcodes (as above).

Numts may appear as separate OTUs in metabarcode data and add spurious OTUs to the clusters derived from the true mitochondrial copy. A tree-based filtering pipeline was used to identify Numt-derived OTUs based on the assumption that they are closely related to the corresponding mitochondrial copy, and coincide across sequenced samples, while their copy number is lower. Thus, OTUs were considered derived from Numts if their presence completely coincided across samples with another closely related OTU that matched a BEEEE reference, and the number of reads was significantly lower in comparison.

The resulting datasets were reconfigured for various statistics and to perform downstream calculations using R (R Core Team 2018). The OTU x sample dataset was rarefied to 400 reads per sample to facilitate valid comparison between samples using the R package vegan (Oksanen et al. 2018).

Cross-contamination among samples was tested by assessing the distribution of secondary OTUs in each sample obtained from pan trapping. Only secondary OTUs that matched a (NAPselect) HT barcode from another sample were used in this analysis. Three sources of cross-contamination
were considered: from other individuals in the same trap, between specimens with the same PCR tag on a single plate, and between specimens with the same Nextera XT index in a single well. For each source or combination of sources, we calculated the proportion of total possible HT barcodes that were present as secondary OTUs in a selected sample. For example, each well in library preparation contained 13 specimens with different XT indices: if, in a set of these, each is a different HT barcode, there are 12 possible well contaminants for any one of these samples; thus a sample containing 3 other HT barcodes as secondary OTUs would have a contamination rate of 3/12 = 0.25 from well-level contamination. As a control, the rate of contamination from all possible sources together was also scored, i.e. the proportion of secondary OTUs in a sample that matched any HT barcodes, out of the total number of unique HT barcodes. One-sample t tests were used to assess if the mean contamination rate for each source or source combination was significantly greater than zero. To compare between sources against the control, the effect of source on contamination rate was fitted in a quasi-binomial ANOVA, setting the control as the reference level.

RESULTS

A reference database of UK bees

A total of 355 bee specimens were newly sequenced for the COI barcode to generate the reference set, representing 165 Linnaean species. These new sequences were compared against 1754 full-length barcode sequences obtained from the Barcode of Life Database (BOLD) for species known from the UK (Fig. 1A). The BOLD data represented 245 of the 278 UK bee species, but comprised only 14 sequences (6 species) from specimens collected in the UK. The 355 new sequences add 10 UK species (15 sequences) not represented in the BOLD dataset, and novel haplotypes for a further 107 species (201 sequences). Together, the two sets include 255 bee species (92.4% of 278 species known from the UK). The missing species are either extinct (6 species), rarely introduced by accident (1 species, Heriades rubicola), only found in the Channel Islands (1 species, Andrena agilissima), listed as endangered (RDB3-RDB1) (8 species), or rare and localised (5 species), while 2 species were only recently added (Cross & Notton 2017; Notton et al. 2016). When considering each of the six families separately, the greatest number of species missing from the database was in […] (9 of 69 species), followed by Apidae (4 of 76) and […] (4 of 62).

Genetic variation within morphologically identified Linnaean species ranged from 0% to 5.9% (mean 0.31%, standard error ±0.04%), and interspecific variation ranged from 0% to 24.9% (mean 6.7% ±0.08%). We found that 242 (94.9%) of the cox1-based sequence clusters at 97% similarity mapped precisely on the Linnaean species identifications (Supplementary Figure S2). Inconsistencies with the morphological species definitions were limited to five genera, Andrena, Bombus, Colletes, Lasioglossum and Nomada.

De novo species delimitation from the DNA sequences using the GMYC method was based on phylogenetic trees generated for each genus (see Fig. 2 for Nomada). In most cases of incongruence, the GMYC either split (42 cases) or lumped (14 cases) an existing nominal species, but in rare cases the patterns of splitting and lumping were more complex (Fig. 3). The GMYC species largely agreed with the distance-based BIN network method in the extent to which nominal species were split and lumped (Fig. 2, 3). Inconsistencies of cox1-based entities and Linnaean species were mainly due to groups of close relatives with challenging morphological identifications. Subsets of species not monophyletic with respect to each other (a requirement of the GMYC method) included: Andrena bimaculata - A. tibialis; A. clarkella - A. lapponica - A. helvola - A. varians; the recently subdivided Colletes succinctus species group (C. halophilus - C. hederae - C. succinctus) (Kuhlmann et al. 2007); suspected geographically confined species among the Dasypoda hirtipes group (Schmidt et al. 2015); Lasioglossum rufitarse; and groupings within Nomada (Nomada flava - N. leucophthalma - N. panzeri, and N. goodeniana - N. succincta).

Testing HTS data against the reference library

Illumina sequencing of 762 specimens from the UK […] pairs per amplicon pool after demultiplexing (sd = 10,615; range = 18-88,241; 95% > 1,000). After read merging and stringent quality filtering, this was reduced to an average of 5,851 […] per specimen (sd = 6,921; range = 7-56,781; 87% > 1,000). Three methods were used to designate a HT barcode from these sequences for each specimen […] method, which validates barcode selection by statistical significance of read abundance and […], obtained a barcode for 749 individuals, failing to do so for 13 specimens (Table 1A). The latter mainly comprised libraries with very low read numbers, wh[…] taxonomic (no matches to Hymenoptera) or statistical (low discrimination among top abundant reads) filtering (see Supplementary Material). Given […] variation in read numbers to explore the effectivity […] Figure 4 shows that while performance is poor below 500 reads per sample, the percentage of libraries producing HT barcodes based on the ta[…] around 1000 reads.

Out of the barcodes chosen by NAPselect, 734 (99.7%) produced a match to sequences in the BEEEE reference set (Table 1A), while the OTU cl[…] fewer matches at 559 and 584, respectively. The nu[…] set for all methods was near 100%, but because the barcodes obtained by the top read and top OTU methods matched fewer reference sequences, th[…] identifications. Across all samples, 154 unique spec[…] were obtained for the survey samples.

ustering and top-read me top-read and ustering ies identifications against the BEEEE reference set reference identifications BEEEE ies against the xonomically validated top read reaches 90% at readreaches 90% top validated xonomically of NAPselect at different read valuessample. per en (see Materials and Methods). The NAPselect mber of species-levelmber reference hitsthe against ey resulted inapprox ey resulted this result, we were able to leverage the wide we result, leverage were to able this survey resulted in an averageof 9,025 read ich were removed based on the onthe removedwere based ich . 25% fewer specimen thods had substantially substantially had thods cox1 sequences sequences Accepted Article Linnaean taxon names (similar to Fig. 2), but a total of 136 NAPselect sequences were at least least at were sequences NAPselect 136 of a total but 2), Fig. to (similar names taxon Linnaean ofcl clusters in small sequences grouped generally analys to phylogenetic and subjected set reference identifications, NAPselect 731 combined the HT barcodesall sequences of were 335 ofwith the the the single HT barcode matching alumped matchessplit tospecies correct species the were at level(94.1% and the at genus 98.8% level), and barcodeswas similar to very overall the 76.5% matches rate: of lumped tospecies and 88.2% of HT of forsets matches these level andgenus species correct of The proportion and split. lumped to a species that was lumped, 178to a species that was split and 1to a species that was both were 17 sequence, reference BEEEE a to match BLAST a had that byNAPselect generated sequences HTbarcode 734 the Of analysis. GMYC the in split and/or were lumped that species against driverof molecular misidentification by examiningthe proportion of correctandincorrect matches someNAPselect specie (asexpected because produced markedly more successful identifications, whereas example, for genera; species-rich the among particular 5), in (Figure genera Thesucce level). 
genus at correct (707 sequenced correct species identifications using this methodwas highest, the at specimens611 outof the 762 numberof absolute the level, species to ofbarcodes proportion larger aconsiderably designated specimens identified as same the species with both data types (Table 1B). However,NAPselect as of 83-86% with 95-96%, with level genus at high was specimens source the of identifications This by is protectedarticle reserved. Allrights copyright. To account for other causes of the inconsistencies in morphological versus molecular molecular versus morphological in inconsistencies the of causes other for To account a was dataset reference the in observed splitting and lumping the whether investigated We Congruence of species-level molecularidentifications with the species-level morphological and s were inseparable by DNA; see above). above). DNA; bysee wereinseparable s split specieswascorrect at the species level well. as ss rate of molecular identification differed among ose relatives that mostly correspond well with the is (Supplementary (Supplementary tree S3). resulting Figure The is Colletes showed low success even using Andrena and Bombus

partially in conflict with the names assigned to sequences in each cluster.

We found evidence for problems with both the molecular and morphological identifications that may account for most of the observed discrepancies. Library contamination either at the PCR level (within a plate of sequences) or secondary tagging (within a well) may be recognisable from a) identical sequences or clusters grouping distantly related species, or b) from the recovery of secondary OTUs with the correct molecular identification. This former inconsistency is observed in around 30 individuals, including a notable cluster assigned to Andrena labialis based on the reference set that contained 15 different species with one representative of A. labialis (in addition to several closely related haplotypes from further species). All of these sequences were exact haplotype matches to the reference and from a single plate, suggesting contamination of the tagged primer with sample DNA. Examining secondary OTUs, 28 of the 136 mismatched samples contained secondary OTUs that matched the morphological ID for that specimen, but they represented less than 10% of the reads in 20 of these and between 10 and 40% of the reads in a further six samples.

Over 100 inconsistent records involved clusters of mixed close relatives. Notable are several species pairs of Lasioglossum (L. albipes - L. calceatum, L. fulvicorne - L. fratellum), Andrena (A. wilkella - A. similis) and Bombus (the B. terrestris/lucorum complex) whose morphological identification in the current study resulted in mixed species clusters, or whose identification differed from the name assigned to the cluster based on the reference sequence. A review of the morphological identifications of a small sample of 10 mismatched and 14 non-mismatched specimens was undertaken by DGN, and found that while the identifications for the non-mismatched specimens were 100% correct, 78% of the mismatched specimens were incorrect, and the correct ID precisely matched the molecular identification, either to species or species group. The majority of the 'mismatches' in the observed 83-86% identification rate are down to morphological identification errors, and the rate rises to over 97% correct molecular identifications after removal of obvious contaminants and accounting for the misidentification rate.

Exploration of concomitant DNA in the testing dataset

When OTU clustering was carried out on the entire set of reads from all 762 samples, USEARCH within NAPcluster generated 498 OTUs, of which 263 were identified as Apiformes using BLAST/MEGAN. Out of these, the tree-based assessment of potential Numts identified 72 OTUs as Numts. In addition, several OTUs were reclassified as Diptera in the phylogenetic tree used for Numt filtering. The final count of bona fide OTUs identified as Apiformes was 180, of which 170 had hits to the BEEEE reference library. Apiformes thus dominated the set of OTUs, but the dataset also included 235 OTUs from across the eukaryotes, including Diptera (48 OTUs), Coleoptera (6 OTUs), and various other insects (22 OTUs). The Diptera included several species of hoverflies (Syrphidae), which were present in the traps and were sequenced in the same run as the bees. Five of the six Coleoptera OTUs were identified as common flower visitors, including three species of Cantharis Soldier Beetles (Cantharidae), Malachite Beetles (Malachiidae) and a Pollen Beetle (Nitidulidae), in addition to Zophobas atratus, a non-native species of Darkling Beetle (Tenebrionidae). There were also OTUs from organisms that associate directly with bees such as Acari and Wolbachia (alphaproteobacteria), as well as several flowering plants and numerous fungi and oomycetes. The Acari comprised four OTUs, of which one was identified to species, Locustacarus buchneri, a known tracheal parasite of bumblebees, while the others were identified only as members of the Crotonioidea, Sarcoptiformes, and Parasitiformes. Finally, Homo sapiens DNA was detected in numerous samples. The supplementary materials and figure S4 report further details of secondary OTU community composition.

The high incidence of NAPselect barcode sequences (i.e. Apiformes) occurring as secondary OTUs raised the question about the origin of these non-target specimens in the barcoding mix. Potential sources of DNA may be carry-over from the traps, mixing of specimens during handling for taxonomic identification, errors in various DNA laboratory procedures, and errors in tag sequencing.
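
The question of where secondary Apiformes OTUs come from can be illustrated by tallying how often the HT barcode species of one sample turns up as a secondary OTU in another sample that shares its trap, plate or well. The sketch below is only an illustration of that bookkeeping under stated assumptions: the field names (`trap`, `plate`, `well`, `barcode`, `secondary`) and the function `contamination_rates` are hypothetical, and it does not reproduce the statistical tests or the control comparison used in the study.

```python
from collections import defaultdict
from itertools import permutations


def contamination_rates(samples, keys=("trap", "plate", "well")):
    """For each shared attribute, return the fraction of ordered
    (source, recipient) sample pairs of different species in which the
    source's HT barcode species is among the recipient's secondary OTUs.

    Each sample is a dict with hypothetical fields:
      trap, plate, well : identifiers of the shared context
      barcode           : species assigned to the sample's HT barcode
      secondary         : set of species found as secondary OTUs
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for source, recipient in permutations(samples, 2):
        if source["barcode"] == recipient["barcode"]:
            # Same species: a secondary read cannot be told apart
            # from the recipient's own barcode, so skip the pair.
            continue
        for key in keys:
            if source[key] == recipient[key]:
                totals[key] += 1
                if source["barcode"] in recipient["secondary"]:
                    hits[key] += 1
    # Report only attributes that were actually shared by some pair.
    return {key: hits[key] / totals[key] for key in keys if totals[key]}


if __name__ == "__main__":
    demo = [
        {"trap": "T1", "plate": "P1", "well": "W1",
         "barcode": "A", "secondary": {"B"}},
        {"trap": "T1", "plate": "P2", "well": "W2",
         "barcode": "B", "secondary": set()},
        {"trap": "T2", "plate": "P1", "well": "W3",
         "barcode": "C", "secondary": {"A"}},
    ]
    print(contamination_rates(demo))
```

In this toy input no two samples share a well, so no well-level rate is reported; a real analysis would also need a background rate from pairs that share nothing, which is omitted here.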

In general, the level of direct contamination with DNA sequences that were the HT barcode in another sample was low, but significantly greater than zero for most sources and source combinations (Supplementary Table S1). Altogether, 132 of the 180 Apiformes OTUs were recognised as secondary OTUs in at least one sample. Compared with the control, i.e. the background level of cross-contamination from any source, there was a significant increase in contamination rate for within-plate contamination and within-plate and trap contamination (Fig. 6, Supplementary Table S1), indicating that the greatest rate of contamination may have been during primary PCR. The level of cross-contamination was much lower from those samples in the same well, i.e. from secondary PCR at the library construction stage.

The low level of contamination was reflected in the pattern of cross-contamination of individual species. OTUs identified to 23 different species were each found as secondary OTUs in at least one other sample of a different species from the same trap. The most frequent of these was Lasioglossum malachurum, which was recorded in the study from 21 traps. We HT barcoded 63 specimens of other species from these 21 traps, and L. malachurum was found in 13 of these, a rate of 20%. At trap level, the average rate for the 23 species was 7.6% (SD = 4.5). The same analysis for plates and wells showed that L. calceatum was the most common cross-contamination here, being found in 7% of samples of other species sharing a plate (PCR tag) with specimens of L. calceatum, of which there were 37 specimens, and 5% of samples of other species sharing a well (MID) with L. calceatum. 45 species cross-contaminated within plates, with a mean rate of 2.2% (SD = 1.7), and 13 species cross-contaminated within wells (mean = 2.5%, SD = 1.1).

DISCUSSION

Sample collection methodology

Cost-effective species-level identifications of bees and other insect pollinators are required to provide robust evidence for population changes and to inform land use management and conservation (Gill et al. 2016). This study used specimens of bees obtained through mass-collection with pan traps, which was successful in providing a wide range of species for the generation of a reference database and for testing. It should be mentioned that in the wider context of pollinator declines (Powney et al. 2019), and invertebrate declines in general (Hallmann et al. 2017), careful consideration of the use of broad-target collection methods with high collateral catches should be made (Drinkwater, Robinson and Hart, 2019), although Gezon et al. (2015) show that in the case of pan traps in particular, reasonable sampling does not affect long term community structure. Our study protocol used a relatively short pan trap exposure period designed to sample sufficient individuals for long-term monitoring whilst minimising catch sizes (Carvell et al. 2016). In this study, we demonstrate that bulk-collection methods may generate unwanted levels of cross-contamination for downstream molecular analysis, although robust bioinformatic methods can minimise the impact. More broadly, the growing use of metabarcoding as a tool for community studies allows us to take fuller advantage of the depth of data produced by mass-collecting than ever before, including the 'bycatch' of numerous other insect pollinators, mostly in the Diptera, which are taxonomically difficult and thus have not been part of conventional monitoring.

The reference database

We conducted this analysis in two stages, by first building the reference database using Sanger technology, which was then trialled for species identification using high-throughput sequencing of samples from a proof-of-concept monitoring scheme. The combined effort of new sampling and sequencing, together with barcode data already in the BOLD database, resulted in a virtually complete set of the UK bees, with only a few rare or presumed extinct species missing. Furthermore, we expanded existing references by generating novel sequences from UK populations for an even greater proportion of widespread species. The cox1 barcode delimited 94.9% of species in the reference database as separate entities, showing that for almost all bee species in the UK this set is sufficiently discriminatory. In the remaining cases the molecular analysis lumped the Linnaean species, as evident in the de novo species delimitation using the GMYC method, while others were shown to be split into additional GMYC groups which, however, were not incongruent with the Linnaean species.

The overall reference database comprises a mixture of UK and non-UK sequences, as many species are more widely distributed in Europe and North America. We found generally high congruence of molecular groups with the Linnaean species, which shows that the mitochondrial 'gene trees' are a good reflection of the species-level entities, as both morphological diagnostics and mitochondrial markers corroborate the species hypotheses (DeSalle et al. 2005). Species discrimination may be even clearer if performed with UK samples only, as the species-level differences tend to be exacerbated in local subsets (Bergsten et al. 2012). The UK sample contributed many new haplotypes that may add to the power of species discrimination locally. The high congruence with the BOLD database also suggests that the identifications have been correct, in some cases after secondary inspection of specimens. However, some problems remained with the reference database, which was apparent from the 136 HTS sequences with inconsistent morphological and molecular identifications. We attribute most of this failure to either contamination or misidentification of the specimens by the NPPMF taxonomists, rather than an issue with the reference set or with the NAPselect pipeline. While low rates of cross-contamination are certainly observed across the HTS dataset, there is little evidence that this was substantially higher in the mismatched specimens, observing the correct sequences as secondary reads in only a small minority of cases and many of these at a read abundance no higher than the background rate. It appears most likely that most mismatches are due to incorrect or unclear morphological identifications, based on our small-scale re-identification. In part, these problems affected known cases of taxonomically problematic groups, in particular in Lasioglossum and Bombus, to which this study added a large number of sequences that may be useful for a refinement of the database. We note that in these cases none of the HT barcodes added new sequence clusters beyond those already represented in the reference set. It is clear that the 86% rate of correct molecular species identifications for NAPselect (Table 1B) is an underestimate, and that most of the 136 'mismatched' HT barcodes can in fact be correctly identified through comparison to the reference set, as shown by our reassessment of a subset of these specimens. After removal of contaminants and correction for taxonomic misidentifications, the true rate of correct identification against the reference set is closer to 97%.

The reference includes DNA clusters (established by the BIN or GMYC methods) that lumped or (mostly) split the Linnaean taxa. The molecular data failed to separate a small number of species in four of the 27 genera studied ("lumped" in Fig. 3). In some instances, such as the Colletes succinctus species group, which shared haplotypes with C. hederae, morphological identification of three named species is reliable, if challenging, now that there is a key covering all UK species (Falk & Lewington 2015), and there are biological and distributional differences while cox1 sequences are not sufficient to delineate these species (Kuhlmann et al., 2007). Similarly, the separation of the Nomada goodeniana-succincta group relies on subtle colour variants (Falk & Lewington 2015) and they were not separable in our analysis (Fig. 2). Additional genetic markers may be useful; e.g. the three recognised species lumped in Colletes exhibit fixed differences in EF-1a and ITS (Kuhlmann et al. 2007). Vice versa, divergent cox1 entities (splitting) may indicate the existence of hitherto unrecognised species. For example, a divergent haplotype in Dasypoda hirtipes has now been associated with a morphologically differentiated, eastern European species, although it is not part of the UK fauna (Schmidt et al. 2015). We have already curated the cox1 database extensively, in particular to remove morphological identification errors, and the remaining problems affect mostly a few species of Lasioglossum that also accounted for most of the inconsistencies of molecular and morphological identification (Supplementary Text and Supplementary Fig. S3). In addition, the newly detected clusters may lead to the discovery of separate entities within the Linnaean species and may provide fertile ground for future morphological work. Since DNA extraction destroyed only one leg, morphological vouchers can be re-examined to refine the reference database.

Generating high throughput barcodes

High-throughput barcoding ("HT barcoding") was then used to identify species from a survey of pan traps. The methodology has great potential for sequencing mixed samples (metabarcoding) but was here applied on individual specimens to test the efficacy of this approach and our ability to recover a sequence for the target specimen. We employed three methods for confidently designating this sequence from a pool of anonymous amplicons. The most intuitive approach was to undertake a standard metabarcoding analysis using the USEARCH pipeline to designate the centroid sequence of the most highly represented OTU in each sample as the HT barcode. However, the sequence obtained with this method did not produce a BLAST hit to the reference database in 27% of cases. An alternative method was to simply select the most frequent unique sequence in the amplicon pool, analogous to the sequence that would be generated by Sanger sequencing. However, while this method also designates a barcode for every sample, these sequences are only marginally more likely to find a match to the reference database (23% did not produce a BLAST hit).

The third method, implemented in the NAPselect script, also selects the top-abundant read, but requires that this read matches a specific taxonomic group (Hymenoptera), and that the read frequency is significantly greater than the frequencies of other reads, besides the requirement for exact base calls consistent with a single predominant sequence matching the predicted sequence length. If these conditions are not met, NAPselect discards the top read and checks other reads according to descending abundance. This pipeline did not output a sequence for 13 specimens due to low read numbers or low differentiation among other abundant reads, although NAPselect generally worked very well at reasonable read numbers. The great majority of NAPselect sequences matched the reference database, and only 3.7% of specimens did not produce a sequence with a BLAST hit, a substantial improvement over the other methods (Table 1). The key improvement introduced by this script probably was that NAPselect conducts BLAST searches against GenBank and assesses the taxonomy of hits. This method clearly is very effective, with error rates determined largely by sequencing depth issues rather than an inability to select the correct sequence.

Exploration of concomitant DNA in the testing dataset

Unlike standard metabarcoding conducted on mixed samples, the current analysis permits precise determination of amplicons derived from single specimens. A surprising finding was the high proportion of reads attributable to secondary OTUs, and their taxonomic diversity. Specimens from a monitoring program were not substantially different from those used in Sanger sequencing to build the reference database, which produced clean PCR product. However, the primers for Illumina sequencing were designed for broad amplification (Arribas et al. 2016) and probably have a wider taxonomic amplitude than the Hymenoptera-specific primers used to amplify the standard barcode region. Besides co-amplification of a broader range of associated species, this may also increase the potential for sequencing of nuclear mitochondrial DNA regions (Numts). Out of 509 OTUs recovered from all samples combined, 263 were identified as Apiformes initially, which greatly exceeds the number of species expected in this survey. Numts diverge without the constraints of coding regions and thus may deviate in length, but filtering for the expected 418 bp fragment length could not avoid these artefacts sufficiently. We therefore implemented a further filter based on the distribution of low-abundance OTUs that are co-distributed with the true mitochondrial copies. We only removed OTUs that form a clade with the presumed true copy (close matches to the reference database), under the assumption that Numts are of limited evolutionary persistence (Pons et al. 2005). Based on these criteria a total of 72 OTUs were identified as mitochondrial Numts. This method (and the removal of several other OTUs whose incorrect assignment was revealed with the phylogeny) reduced the total number of Apiformes OTUs to 180, which is closer to the 154 species identified morphologically, in particular if OTU splitting (Fig. 3) is taken into account. The procedure for identifying OTUs likely derived from Numts is a novel step in the metabarcoding filtering process; however, it is dependent on the availability of "true" cox1 reference haplotypes and high variation in read abundance between target cox1 OTUs and Numt(s), both situations that are common in HT barcoding studies but potentially less so in metabarcoding. Here, it proved to be a critical step preventing the overestimate of species richness frequently seen in metabarcoding studies.

Other secondary OTUs were assignable to a wide range of distantly related taxa, including highly plausible representatives of known coleopteran and dipteran pollinators attracted to flowers (and pan traps). None of the secondary OTUs belonged to species known to have been processed in the same laboratory as these samples. Instead, the detection of pollen beetles (Meligethes) was consistent with the presence of numerous specimens in the pan traps. Species of Diptera included the wheat stem borer Cephus pygmeus, a flower visitor whose larvae feed in the stems of cereal crops and wild grasses (Poaceae), and Sarcophaga sp. (flesh flies) that are carrion feeders or putative parasitoids of other invertebrates. The greatest proportion were hoverflies (Syrphidae); these were

widely present in the traps and were processed in a parallel study in the same sequencing run and thus additionally exposed to the risk of laboratory contamination as well as trap contamination. Other sequencing records were likely internal parasites, including a tracheal mite, Locustacarus buchneri, known to be associated with bumble bees (Bombus sp.), and numerous bacterial sequences. OTUs belonging to Angiospermae suggest the types of flowering plants pollinators visited, including Caryophyllales sp., Cichorieae sp., Geraniaceae sp. and Lamiids (a large clade of flowering plants that includes many species present in meadows). In addition, widely observed 'unknown' OTUs to which MEGAN could not confidently assign an identity may be members of taxa that were poorly represented in GenBank, or they may be chimeras or sequencing errors that escaped filtering. Yet, most secondary OTUs are plausible as true associates of the target specimens and the wider pollinator community. Thus, associated DNA can be used to detect local community composition and ecological associations, including parasites, symbionts and diet of the target (Lucas et al. 2018).

Cross contamination in the traps may also explain the large number of secondary OTUs assigned to Apiformes (beyond the Numts). The potential for DNA mixing was further increased as specimens from the same pan trap were stored together prior to morphological identification and DNA extraction. However, we find that the greatest rate of contamination may have been within a single plate, i.e. between samples with the same primer index but different library indices, which could either be due to physical mixing in the laboratory, tag-jumping (in the Nextera XT indices, not the PCR tags), or errors in index sequencing. Trap-level contamination may add to the problem, as the combined model (plate x trap) shows only marginally higher levels of contamination (Supplementary Table S1). Because the contamination within the wells was much lower, we conclude that the primary PCR using 13 different primer tags before being combined in a single Nextera XT library was not greatly affected by these problems, indicating that our approach of using the same unique primer tags on forward and reverse strands can largely eliminate the problem of misassignment of PCR fragments. In addition, some types of contamination were less likely to be introduced during molecular lab processing, given the precautions with specimen handling and the strict protocols of the sequencing facility, in particular regarding the widely found human DNA, present in virtually every one of the specimens. As scientists using morphological and molecular methods work together, greater awareness of these issues is needed and the steps to avoid DNA contamination should be understood and implemented, such as the use of clean bee nets, pans, and storage bottles, and use of latex gloves for specimen handling during morphological identification.

Conclusions

High-throughput sequencing can greatly change the approach to monitoring of pollinators, through mass identification of sequence reads against reference databases verified by taxonomic specialists (Tang et al. 2015; Ji et al. 2013). We first established the power of the cox1 marker for species discrimination, which only left about 5% of UK species without an unequivocal identification at species level. The subsequent utilization of the database for monitoring UK bees shows high consistency with morphological identifications conducted in parallel, and accounting for the observed morphological identification errors the correct molecular identification rate exceeds 97%. However, the deep sequencing of single specimens also revealed the various pitfalls of metabarcoding. We detected surprisingly high apparent levels of mixing with other specimens from the same and other traps. In addition, we found numerous OTUs apparently contributed by Numts, which greatly inflate estimates of the total species diversity; they can be filtered out efficiently as their distribution 'trails' the actual mitochondrial copies, which should be a routine part of the read filtering procedure. Lastly, the widely used OTU clustering may not produce the most accurate species detection, as shown by a comparison of OTU analyses against the most abundant read in each sample (after adequate taxonomic and numerical filtering), which revealed a full identification [...] OTUs. The bioinformatics methodology and comprehensive barcode database can now be rolled out for the study of the much larger number of specimens typically obtained by passive pan traps and can be extended to studies of pollinators in other parts of the world.

ACKNOWLEDGEMENTS

[...] (Defra) of the UK (contract PH0521), with in-kind contributions from the NHMUK, and a fellowship of the NERC Science and Solutions for a Changing Planet Doctoral Training Programme at Imperial College (to HN). The bioinformatics pipelines were de[...] European Commission. The NPPMF pilot pan trapping study was jointly funded by Defra and Scottish Government under project WC1101. We acknowle[...] Conservators of Wimbledon and Putney Commons, Bristol City Council (The Downs, Troopers Hill), Land Trust (Greenwich Peninsula Ecology Park), National Trust (Bookham Common, Leigh Woods), Natural England (Hartslock SSSI), and Royal Boroug[...] DGN to collect bees. Jackie Mackenzie-Dodds (NHM [...] NPPMF collection available. Martin Harvey, Stuart Roberts and Ivan Wright conducted the morphological identifications of bees sampled in the NPPMF pilot study. We thank three anonymous reviewers for their comments on the manuscript.
secondary the through cost, while also providingfurther information onspecies interactions and composition ecosystem reduced at monitoring, pollinator in identification taxonomic of speed and accuracy the increases greatly The thus method ofmagnitude. up by orders scaled but sequencing, from Sanger data as way same in the them use and thus species, Linnaean the within subgroups to particular assignment or for variation genotypic establish to i.e. level, read the at data HTS use to possible is it filtering, quality stringent under applied if Yet, samples. more 25% approximately in specimen target of the This work was funded by the UK Department Department UK bythe funded was work This

veloped under the iBioGen project funded by the the by funded project iBioGen the under veloped UK) and NPPMF staff are thanked for making the the making for thanked are staff NPPMF and UK) h of Greenwich (Blackheath) for permission for for permission for (Blackheath) of Greenwich h dge the Borough of Lewisham (Blackheath), (Blackheath), Lewisham of Borough the dge of Environment, Forestry and Rural Affairs Rural Affairs and Environment,of Forestry Accepted Article This by is protectedarticle reserved. Allrights copyright. Amiet F, Herrmann A,M, Müller Neumeyer R(2007) Apidae 5- Amiet Herrmann F, Müller M, A, R (2010) NeumeyerApidae 6 - Amiet F, Herrmann M, Müller A, Neumeyer R (2004) Apidae 4 - - Apidae 3 R(2001) Neumeyer A, Müller M, Amiet F, Herrmann REFERENCES biomonitoring. DNA-based in services commercial offering Competing interests draft. final KQC and APV wrote an initial draft of manuscript. the All authorscontributed to writing the of the advi provided and pipelines analytical designed CA documented, sampled them,preserved morphologicalvouchers, and verifiedidentification. PA and specimens and co-ordinatedmorphological identifications. DGN collected specimens,identified, collected andRO KQC CC, tools. bioinformatics TJCdeveloped analysis. data performed and HN contributions Author methods are detailed in the supplementary materials. sequence the for used scripts Perl label. BEEEE the BOLDunder at available data Sequence Additional https://github.com/tjcreedy/NAPtime. at available are selection andbarcode clustering Data accessibility CQT isCQT Senior Scientist andAPV on the is Science Board Advisory of NatureMetrics, company a CQT, KQC TJC, data; molecular generated HN and study; CQT the designed APV and HN CQT, 208. 208. Melecta, Melitta, Nomada, Pasites, Tet Macropis, Eucera, Epeolus, Epeoloides, Dasypoda, Ceratina, Biastes, Anthophora, Panurgus Stelis Osmia, Megachile, Lithurgus, Heriades, Dioxys,

.

Fauna Helvetica Fauna

26 , 1–317. . Fauna Helvetica Fauna ce on project design and analysis. CQT, HN, TJC, TJC, HN, CQT, analysis. and design project ce on Halictus, Lasioglossum Halictus, . Fauna Helvetica Fauna Anthidium, Chelostoma, Coelioxys, Coelioxys, Chelostoma, Anthidium, Andrena, Melitturga, Panurginus, Panurginus, Melitturga, Andrena,

20 , 1–356. Ammobates, Ammobatoides, Ammobatoides, Ammobates,

. 9 Fauna Helvetica , 1–273.

Amiet F, Müller A, Neumeyer R (2014) Apidae 2 - Colletes, Dufourea, Hylaeus, Nomia, Nomioides, Rhophitoides, Rophites, Sphecodes, Systropha. Fauna Helvetica 4, 1-239.
Andujar C, Arribas P, Gray C, et al. (2018) Metabarcoding of freshwater invertebrates to detect the effects of a pesticide spill. Molecular Ecology 27, 146-166.
Arribas P, Andujar C, Hopkins K, Shepherd M, Vogler AP (2016) Metabarcoding and mitochondrial metagenomics of endogean arthropods to unveil the mesofauna of the soil. Methods in Ecology and Evolution 7, 1071-1081.
Bergsten J, Bilton DT, Fujisawa T, et al. (2012) The effect of geographical scale of sampling on DNA barcoding. Systematic Biology 61, 851-869.
Biesmeijer JC, Roberts SPM, Reemer M, et al. (2006) Parallel declines in pollinators and insect-pollinated plants in Britain and the Netherlands. Science 313, 351-354.
Bogusch P, Straka J (2012) Review and identification of the cuckoo bees of central Europe (Hymenoptera: Halictidae: Sphecodes). Zootaxa, 1-41.
Carvell C, Isaac NJB, Jitlal M, et al. (2016) Design and Testing of a National Pollinator and Pollination Monitoring Framework. Final summary report to the Department for Environment, Food and Rural Affairs (Defra), Scottish Government and Welsh Government: Project WC1101.
Cross I, Notton DG (2017) Small-headed Resin Bee, Heriades rubicola, new to Britain (Hymenoptera: Megachilidae). British Journal of Entomology and Natural History 30, 1-6.
DeSalle R, Egan MG, Siddall M (2005) The unholy trinity: taxonomy, species delimitation and DNA barcoding. Philosophical Transactions of the Royal Society London Series B 360, 1905-1916.
Drinkwater E, Robinson EJ, Hart AG (2019) Keeping invertebrate research ethical in a landscape of shifting public opinion. Methods in Ecology and Evolution. Accepted Author Manuscript.
Drummond AJ, Rambaut A (2007) BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evolutionary Biology 7, Art. 214.
Edgar R (2010) USEARCH fastq_filter, available online at https://www.drive5.com/usearch/.
Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27, 2194-2200.
Falk SJ, Lewington R (2015) Field guide to the bees of Great Britain and Ireland. Bloomsbury Publishing PLC.
Fujisawa T, Barraclough TG (2013) Delimiting species using single-locus data and the Generalized Mixed Yule Coalescent approach: A revised method and evaluation on simulated data sets. Systematic Biology 62, 707-724.
Garibaldi LA, Steffan-Dewenter I, Winfree R, et al. (2013) Wild pollinators enhance fruit set of crops regardless of Honey Bee abundance. Science 339, 1608-1611.
Gezon ZJ, Wyman ES, Ascher JS, Inouye DW, Irwin RE (2015) The effect of repeated, lethal sampling on wild bee abundance and diversity. Methods in Ecology and Evolution 6, 1044-1054.
Gill RJ, Baldock KCR, Brown MJF, et al. (2016) Protecting an ecosystem service: Approaches to understanding and mitigating threats to wild insect pollinators. In: Ecosystem Services: From Biodiversity to Society, Pt 2 (eds. Woodward G, Bohan DA), pp. 135-206.
Hallmann CA, Sorg M, Jongejans E, et al. (2017) More than 75 percent decline over 27 years in total flying insect biomass in protected areas. Plos One 12.
Hebert PDN, Cywinska A, Ball SL, DeWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society B 270, 313-321.
Huson DH, Beier S, Flade I, et al. (2016) MEGAN Community Edition - Interactive exploration and analysis of large-scale microbiome sequencing data. Plos Computational Biology 12.
Isaac NJB, Pocock MJO (2015) Bias and information in biological records. Biological Journal of the Linnean Society 115.
Ji Y, Ashton L, Pedley SM, et al. (2013) Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. Ecology Letters 16, 1245-1257.
Katoh K, Asimenos G, Toh H (2009) Multiple alignment of DNA sequences with MAFFT. In: Methods in Molecular Biology (ed. Posada D), pp. 39-64. Humana Press.
Kennedy CM, Lonsdorf E, Neel MC, et al. (2013) A global quantitative synthesis of local and landscape effects on wild bee pollinators in agroecosystems. Ecology Letters 16, 584-599.
Kuhlmann M (2007) Revision of the bees of the Colletes fasciatus-group in southern Africa (Hymenoptera: Colletidae). African Invertebrates 48, 121-165.
Kuhlmann M, Else GR, Dawson A, Quicke DLJ (2007) Molecular, biogeographical and phenological evidence for the existence of three western European sibling species in the Colletes succinctus group. Organisms Diversity and Evolution 7(2), 155-165.
Lebuhn G, Droege S, Connor EF, et al. (2013) Detecting insect pollinator declines on regional and global scales. Conservation Biology 27, 113-120.
Lever JJ, van Nes EH, Scheffer M, Bascompte J (2014) The sudden collapse of pollinator communities. Ecology Letters 17, 350-359.
Lucas A, Bodger O, Brosi BJ, et al. (2018) Floral resource partitioning by individuals within generalised hoverfly pollination networks revealed by DNA metabarcoding. Scientific Reports 8, Art. 5133.
Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet 17, 10-12.
Meier R, Wong W, Srivathsan A, Foo M (2015) $1 DNA barcodes for reconstructing complex phenomes and finding rare species in specimen-rich samples. Cladistics, n/a-n/a.
Notton DG, Tang CQ, Day AR (2016) Viper's Bugloss Mason Bee, Hoplitis (Hoplitis) adunca, new to Britain (Hymenoptera, Megachilidae, Megachilinae, Osmiini). British Journal of Entomology and Natural History 29, 134-143.
Oksanen J, Blanchet G, Friendly M, et al. (2018) vegan: Community Ecology Package. R package version 2.5-1, available at http://cran.r-project.org/.
Pons J, Vogler AP (2005) Complex Pattern of Coalescence and Fast Evolution of a Mitochondrial rRNA Pseudogene in a Recent Radiation of Tiger Beetles. Molecular Biology and Evolution 22(4), 991-1000.
Potts SG, Roberts SPM, Dean R, et al. (2010) Declines of managed honeybees and beekeepers in Europe. Journal of Apicultural Research 49, 15-22.
Powney GD, Carvell C, Edwards M, et al. (2019) Widespread losses of pollinating insects in Britain. Nature Communications 10, Art. 1018.
R Core Team (2018) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/
Ratnasingham S, Hebert PDN (2013) A DNA-based registry for all species: The Barcode Index Number (BIN) system. Plos One 8.
Ricketts TH, Regetz J, Steffan-Dewenter I, et al. (2008) Landscape effects on crop pollination services: are there general patterns? Ecology Letters 11, 499-515.
Schmidt S, Schmid-Egger C, Moriniere J, Haszprunar G, Hebert PDN (2015) DNA barcoding largely supports 250 years of classical taxonomy: identifications for Central European bees (Hymenoptera, Apoidea partim). Molecular Ecology Resources 15, 985-1000.
Schnell IB, Bohmann K, Gilbert MTP (2015) Tag jumps illuminated - reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources 15, 1289-1303.
Shokralla S, Porter TM, Gibson JF, et al. (2015) Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Scientific Reports 5.
Smith MA, Rodriguez JJ, Whitfield JB, et al. (2008) Extreme diversity of tropical parasitoid wasps exposed by iterative integration of natural history, DNA barcoding, morphology, and collections. Proceedings of the National Academy of Sciences of the United States of America 105, 12359-12364.
Tang M, Hardman CJ, Ji Y, et al. (2015) High-throughput monitoring of wild bee diversity and abundance via mitogenomics. Methods in Ecology and Evolution 6(9), 1034-1043.

Vanbergen AJ, Baude M, Biesmeijer JC, et al. (2013) Threats to an ecosystem service: pressures on pollinators. Frontiers in Ecology and the Environment 11, 251-259.
Westphal C, Bommarco R, Carre G, et al. (2008) Measuring bee diversity in different European habitats and biogeographical regions. Ecological Monographs 78, 653-671.
Yoccoz NG, Brathen KA, Gielly L, et al. (2012) DNA from soil mirrors plant taxonomic and growth form diversity. Molecular Ecology 21, 3647-3655.
Zhang JJ, Kobert K, Flouri T, Stamatakis A (2014) PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614-620.

Table 1. The recovery success of different methods of barcode selection and the rate of accurate identification of barcodes against the BEEEE reference set. Table 1A. The number of sequences obtained (a metric of accuracy of the HT barcode selection method), the number of matches to a reference sequence in the collection, and the proportion of those that produce a species or genus level identification (a metric of sufficiency of the reference set), respectively. Table 1B. The accuracy of identification, relative to the morphological identification of the specimen, at different levels of morphological or molecular identification. Note that the NAPselect method returned the highest absolute number of correct identifications.

A. Of 762 total specimens:
                                                           Most frequent OTU   Most frequent read   NAPselect sequence
Specimens with sequences                                   762                 762                  749
Specimens with sequences matching reference dataset        559                 584                  734
Sequences with species-level molecular identification      556 (99.5%)         584 (100%)           732 (99.7%)
Sequences with only genus-level molecular identification   3 (0.5%)            0 (0%)               2 (0.3%)

B. Of 761 specimens with species-level morphological identifications and 1 with genus-level identification:
Morphological ID level   Molecular ID level                           Most frequent OTU   Most frequent read   NAPselect sequence
Species                  Species              Total comparisons       555                 583                  731
                                              Species-level correct   471 (84.9%)         506 (86.8%)          611 (83.6%)
                                              Genus-level correct     528 (95.1%)         565 (96.9%)          707 (96.7%)
Species                  Genus                Total comparisons       3                   0                    2
                                              Genus-level correct     3 (100%)            -                    2 (100%)
Genus                    Species              Total comparisons       1                   1                    1
                                              Genus-level correct     1 (100%)            1 (100%)             1 (100%)
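As a cross-check of the percentages in Table 1B, the short Python sketch below recomputes them from the counts given in the table for the comparisons of species-level morphological against species-level molecular identifications. The dictionary layout and the function names (percent, accuracy_table) are illustrative assumptions and are not part of the NAPtime scripts.

```python
# Counts taken from Table 1B (species-level morphological ID compared with
# species-level molecular ID). The data structure is an illustrative choice.
TABLE_1B = {
    "Most frequent OTU": {"total": 555, "species_correct": 471, "genus_correct": 528},
    "Most frequent read": {"total": 583, "species_correct": 506, "genus_correct": 565},
    "NAPselect sequence": {"total": 731, "species_correct": 611, "genus_correct": 707},
}


def percent(correct, total):
    """Return correct/total as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


def accuracy_table(counts):
    """Map each method to (species-level %, genus-level %) correct."""
    return {
        method: (
            percent(c["species_correct"], c["total"]),
            percent(c["genus_correct"], c["total"]),
        )
        for method, c in counts.items()
    }


for method, (species_pct, genus_pct) in accuracy_table(TABLE_1B).items():
    print(f"{method}: species-level {species_pct}%, genus-level {genus_pct}%")
```

The percentages are expressed relative to the number of comparisons available for each method, so a method can have the highest absolute number of correct identifications without having the highest percentage.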

Figure 1. Specimens and species used in this study. A. The number of bee species of each family, dataset and geographical source from which sequences were compiled to form the reference collection. Column colours denote whether species from each dataset comprised any UK specimens, and numbers above bars give totals. The BEEEE columns denote the species sequenced as part of this study (165), which were compiled with existing BOLD sequences (245 species) to form the total number of species represented per family. This dataset comprises 255 of the 278 bee species in the UK. B. Sampling localities of bee specimens collected by NHMUK and the NPPMF that formed the BEEEE reference set of specimens.

Figure 2. GMYC and BOLD analysis of a subset of the genus Nomada. The first column of boxes demonstrates the GMYC species, and the second column of boxes the BOLD bins. Boxes with no fill show species which are not split or lumped with other species in both the GMYC and BOLD analysis. Each different colour represents a species which is either split, lumped or both in either the GMYC or BOLD analysis, or in both. The species names are shown on the tree.

Figure 3. Congruence of species delimitation with assignment to Linnaean species, comparing the Generalised Mixed Yule Coalescent (GMYC) model (solid lines) and BOLD BIN assignments (stippled lines). Each genus is assessed separately. The number of incongruent clusters are shown, either splitting the morphospecies (orange), lump the morphospecies (yellow), or both split the morphospecies and lump those sequences with other morphospecies (blue). The total number of species in each genus is given above the genus name. Note that for many genera the morphospecies assignments were perfectly congruent with either DNA-based methods (no bars).

Figure 4: The percentage of amplicon pools of a given number of reads or fewer producing HT barcodes using NAPselect across the HTS dataset. Red line shows the value for any (high confidence or low confidence) HT barcode, while blue shows only high confidence HT barcodes.

Figure 5: The proportion of molecular identification failure for different morphological species across genera. For each morphological species, we calculated the proportion of specimens for which the designated barcode failed to be correctly identified using the reference database. These values are presented here, grouped by genus and the three different barcode designation methods.

Figure 6: Box and whisker plot showing the mean and 95% confidence range of recovery rates of possible cross-contamination OTUs from different sources of cross contamination. X axis shows the different sources of cross contamination, and y axis shows the proportion of possible cross-contamination OTUs recovered from that source. The rate of shared OTU recovery is significantly higher when considering samples from the same plate and same plate trap and compared with a background rate of cross contamination (all possible).
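A summary of the kind plotted in Figure 6 could be computed along the following lines. This is a minimal sketch, assuming that each source of cross contamination has a list of per-sample proportions of possible cross-contamination OTUs recovered, and that the 95% range is taken as the 2.5th to 97.5th percentiles. The caption does not specify how the range was calculated, and the source labels and values below are hypothetical placeholders, not data from the study.

```python
from statistics import mean, quantiles


def summarise(rates):
    """Return (mean, 2.5th percentile, 97.5th percentile) of a list of proportions.

    The 'inclusive' method interpolates between observed values only, so the
    range never extends beyond the smallest and largest observation.
    """
    cuts = quantiles(rates, n=40, method="inclusive")  # cut points every 2.5%
    return mean(rates), cuts[0], cuts[-1]


# Hypothetical per-sample recovery proportions for each source (placeholders).
recovery = {
    "same plate": [0.10, 0.20, 0.30, 0.40, 0.50],
    "same trap": [0.05, 0.15, 0.25, 0.35, 0.45],
    "all possible": [0.00, 0.02, 0.04, 0.06, 0.08],
}

for source, rates in recovery.items():
    avg, low, high = summarise(rates)
    print(f"{source}: mean {avg:.3f}, 95% range {low:.3f} to {high:.3f}")
```

A percentile range is only one possible reading of "95% confidence range"; a confidence interval of the mean would need a different calculation.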