Calibration of the multi-gene metabarcoding approach as an efficient and accurate biomonitoring tool

Guang Kun Zhang Department of Biology McGill University, Montréal

April 2017

A thesis submitted to McGill University in partial fulfillment of the requirements of the degree of Master of Science

© Guang Kun Zhang 2017

TABLE OF CONTENTS

Abstract

Résumé

Acknowledgements

Contributions of Authors

General Introduction
  References

Manuscript: Towards accurate detection: calibrating metabarcoding methods based on multiplexing multiple markers
  References
  Tables
  Figures

General Conclusions
  References

Appendix
  Manuscript supporting information
  Additional materials

ABSTRACT

Climate change can impact biodiversity across different ecosystems; hence, large-scale, time-sensitive biomonitoring tools are needed to survey global biodiversity. DNA barcoding is often used to identify single species based on single gene fragments, whereas DNA metabarcoding combines barcoding and high-throughput sequencing to survey multiple species in complex environmental samples. The metabarcoding approach has been adopted for biodiversity surveys and diet analyses, but is only now starting to be widely applied in fields such as early detection of invasive species and forensic science. The selection of genetic markers used for metabarcoding can greatly affect species detection rates and the taxonomic accuracy of the species detected. An ideal genetic marker would have both high amplification success, due to conserved priming sites, and high discrimination power, due to divergent sequences of amplified genetic fragments within the taxonomic groups of interest. For many taxa, a single marker that provides these characteristics has proven elusive; however, only a limited number of metabarcoding studies have used multiple genetic markers to circumvent this problem and/or have cross-validated the species detected from natural environmental samples. The use of evolutionarily independent genetic markers with different sequence characteristics, for example both mitochondrial and nuclear markers, is expected to improve species detection rates and the accuracy of species discrimination. Only a few studies to date have used cocktails of species-specific or group-specific primer pairs to increase species detection rates. To address these outstanding issues, we have improved species detection by calibrating a metabarcoding approach using multiple markers and multiple primer pairs on mock communities with species known a priori.
This approach can be applied for biomonitoring natural environmental samples containing similar species from the same major taxonomic groups as tested here.

RÉSUMÉ

Le changement climatique a des répercussions sur la diversité biologique des différents écosystèmes. Un outil de biosurveillance rapide à grande échelle est donc nécessaire pour étudier la biodiversité mondiale. Bien que le « barcoding moléculaire » soit souvent utilisé pour identifier des espèces uniques à partir de fragments d'un gène unique, l'approche du « métabarcoding » combine le barcoding d'ADN et le séquençage à haut débit pour l'étude de plusieurs espèces dans des échantillons environnementaux complexes. Cette approche de métabarcoding a été utilisée dans les enquêtes sur la biodiversité et les analyses alimentaires, et commence seulement à être appliquée dans divers domaines tels que la détection précoce des espèces envahissantes et la science médico-légale. Cependant, le choix des marqueurs génétiques peut grandement affecter les taux de détection des espèces et la précision taxonomique des espèces détectées. Un marqueur génétique idéal devrait avoir à la fois un succès d'amplification élevé, grâce à des sites d'amorçage conservés, et un pouvoir de discrimination élevé, grâce à des séquences divergentes de fragments génétiques amplifiés dans tous les groupes taxonomiques d'intérêt, mais un tel marqueur est souvent insaisissable. Seul un nombre limité d'études de métabarcoding ont utilisé des marqueurs génétiques multiples au lieu d'un marqueur unique pour cibler des groupes taxonomiques plus divers et/ou ont validé les espèces détectées dans les échantillons naturels de l'environnement. L'utilisation de marqueurs génétiques qui évoluent indépendamment et présentent des caractéristiques de séquences différentes, par exemple une combinaison de marqueurs mitochondriaux et nucléaires, améliorerait particulièrement les taux de détection des espèces et la précision de la détection des espèces.
De plus, seules quelques études ont utilisé un mélange de paires d'amorces spécifiques d'une espèce ou d'un groupe pour augmenter les taux de détection des espèces. Nous avons donc amélioré la détection des espèces avec l'approche de métabarcoding en utilisant un mélange de deux marqueurs et de multiples paires d'amorces pour caractériser des communautés simulées avec des espèces connues a priori. Cette approche peut ensuite être appliquée pour la biosurveillance des échantillons environnementaux naturels contenant des espèces similaires provenant de divers groupes taxonomiques.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my gratitude to my supervisor, Melania Cristescu. She not only guided and supported me academically, but also mentored me in becoming a better scientist and prepared me for my future career. I really appreciate that she chose me, among all the candidates, to work on such interesting projects, given that I had previously worked on species distribution modeling for my honours thesis. Because of my lack of experience and because English is my second language, writing has been my weakness in academic life. Melania lent me books on academic writing and drew on her own experience to guide me in improving my academic writing. In addition, she offered me opportunities to present my work at both local and international conferences, which improved my oral communication and built my confidence in presenting myself. Furthermore, as I was a student living far from family, the many social activities Melania organized for the whole lab made me feel the warmth of being home again. I am also very grateful to my co-supervisor, Cathryn Abbott, from Fisheries & Oceans Canada, Pacific Biological Station. She provided valuable insights into the manuscript and thesis writing, and mentored me on writing clearly, concisely, and effectively, based on her writing experience in a government setting. She not only supported me academically, but also gave me the opportunity to work in her highly quality-controlled government laboratory, which supports regulatory science. I was initially invited to her lab in June 2016 to meet her students and exchange ideas about my projects, and I was then hired as a part-time employee from January to April 2017 to apply my skills and gain government work experience. Both Melania and Cathryn brought me into the fantastic network called CAISN (Canadian Aquatic Invasive Species Network), which allowed me to meet, get assistance from, and build collaborations with excellent scientists from government and academic institutions across Canada. I would also like to express my special thanks to the post-doc Frédéric Chain, who served as a role model and mentor. He patiently taught me, starting from zero knowledge of bioinformatics, to script and perform analyses on my own. He not only helped me a great deal with the scripting and data analysis for my manuscript, but also provided very valuable comments on my thesis and manuscript writing. I would also like to thank my committee members, Rowan Barrett and Irene Gregory-Eaves, for useful discussions and valuable insights into this project. Additionally, I thank my fellow lab members, who provided support and company: Tiffany Chin, Katie Millette, Julien Flynn, Emily Brown, Sarah Finlayson, Genelle Harrison, Alessandra Loria, James Bull, Michaela Harris, and Joanne Littlefair. Finally, I would like to thank my friends and family for supporting me in my academic and personal life and for providing me with love and happiness throughout my degree.

CONTRIBUTIONS OF AUTHORS

GZ designed the project, performed the laboratory work, scripted the bioinformatics workflow, analyzed the dataset, and wrote the thesis. FC helped with the bioinformatics analysis, including script writing and troubleshooting, and provided valuable feedback on the manuscript and thesis writing. CA provided valuable insights on the writing of both the manuscript and the thesis. MC funded the project, helped design it, and guided the manuscript and thesis writing at various stages.

GENERAL INTRODUCTION

Biodiversity across all of Earth’s ecosystems is susceptible to negative impacts caused by factors such as climate change, aquatic and air pollution, habitat destruction, and land-use-related stressors (Bellard et al. 2012; Bellard et al. 2014). Species extinction rates are projected to increase with future global temperatures regardless of taxonomic group (Uber 2015). The changing climate is also causing the loss of ice sheets and opening up new channels for shipping, such that aquatic invasive species may be introduced into new areas via ballast water (Smith & Stephenson 2013; Chain et al. 2016). Because traditional field sampling methods are labor-intensive and time-consuming, a cost-effective, large-scale biomonitoring tool that delivers results quickly is needed (Ji et al. 2013; Cristescu 2014). DNA metabarcoding has been developed for large-scale taxonomic identification of complex environmental samples (Ji et al. 2013; Taberlet et al. 2012). While DNA barcoding is often used to identify a single organism pre-sorted from a mixed sample based on a single gene fragment, metabarcoding combines DNA barcoding and high-throughput sequencing (HTS) to survey a mixture of species from environmental samples (Hebert et al. 2003; Taberlet et al. 2012; Cristescu 2014). Metabarcoding as a biomonitoring tool is becoming widely used for biodiversity surveys (Drummond et al. 2015; Chain et al. 2016), aquatic invasive/nonindigenous species detection (Zaiko et al. 2015; Brown et al. 2016), and diet analysis (Leray et al. 2013; Pompanon et al. 2012) across different ecosystems. Many studies have shown that the metabarcoding approach has the potential to detect more species than traditional sampling based on morphology (Leray et al. 2013; Hope et al. 2014; Drummond et al. 2015; Cowart et al. 2015). For example, Zaiko et al.
(2015) compared the metabarcoding approach with routine marine coastal water monitoring surveys and morphological analyses, and reported that four out of five nonindigenous species were detected exclusively using the metabarcoding approach. In addition, a higher number of diatom taxa was detected using metabarcoding than using light microscopy for water quality assessment (Zimmermann et al. 2015). Despite the promise of metabarcoding as a cost-effective approach for environmental biomonitoring, several factors hinder its efficiency, including marker choice, the availability of reference sequences for species-level discrimination, and the bioinformatics analyses needed to accurately assign sequencing reads to the corresponding species (Bucklin et al. 2016; Taberlet et al. 2012; Cristescu 2014). Most metabarcoding studies stress the importance of marker choice because no ideal marker has been found. The ideal genetic marker (a standardized DNA fragment) used in metabarcoding must show high interspecific variation and low intraspecific variation, and it is often difficult to strike a balance between high amplification success and species discrimination power (Bohle and Gabaldón 2012; Cristescu 2014). Various genetic markers have been commonly used for targeting different taxonomic groups (Shaw et al. 2017), such as 16S rRNA for bacteria (Claesson et al. 2010), 18S rRNA for eukaryotes (Bik et al. 2012; Chain et al. 2016), 12S rRNA for vertebrates (Kelly et al. 2014), mitochondrial COI for vertebrates and invertebrates (Yu et al. 2012), the Internal Transcribed Spacer (ITS) for fungi and algae (Schoch et al. 2012), and chloroplast rbcL and trnL for plants (Hollingsworth et al. 2009). Novel genetic variants introduced by mutation, accompanied by genetic drift or natural selection, lead to changes in the sequence composition of DNA across generations (Li 2006), and evolutionary rates vary between different segments of the genome and across taxa (Hebert et al. 2003). The use of evolutionarily independent markers with very different characteristics would potentially provide both broad taxonomic coverage and high phylogenetic resolution. The nuclear 18S rRNA gene is more conserved than the mitochondrial cytochrome c oxidase subunit I (COI) gene, leading to greater amplification success when examining broad taxonomic groups but lower resolution for species identification compared to COI (Saccone et al. 1999; Hebert et al. 2003; Deagle et al. 2014). Taberlet et al.
(2012) suggested that using a single organelle marker can cause erroneous identifications in metabarcoding studies due to interspecific mitochondrial introgression. To avoid this problem, both uniparentally inherited organelle DNA and biparentally inherited DNA should be used and compared. Furthermore, Geller et al. (2013) reported major differences in species recovery/amplification levels among different primer pairs for barcoding major marine invertebrate groups, and suggested that the use of group-specific primer pairs can efficiently balance the inconsistent amplification success across broad taxonomic groups (Bucklin et al. 2016; Cristescu 2014). Several metabarcoding studies have used the multigene approach for surveying broader taxonomic groups (e.g. Drummond et al. 2015; Zaiko et al. 2015), cross-validating species detections across different markers (e.g. Drummond et al. 2015; Stoeck et al. 2010), and increasing amplification success across species by using multiple primer pairs of a single marker (e.g. Letendu et al. 2014; Clarke et al. 2014). Mock communities, in which the species composition is known a priori, can be used efficiently to cross-validate multi-marker studies (Cristescu 2014; Pompanon et al. 2012). To my knowledge, Clarke et al. (2014) conducted the only metabarcoding study that tested multiple genes (16S and COI) and multiple primer pairs per marker (4 primer pairs for 16S and 6 primer pairs for COI) in mock communities, which consisted of 13 insect species and 1 arachnid. There is a great need to test the multi-marker and multi-primer-pair metabarcoding approach in more complex communities, as PCR inhibitors (Demeke & Jenkins 2010), specimen biomass (Deagle et al. 2013), and primer-template mismatches (Piñol et al. 2015) can all affect species detection levels. Many studies have suggested the use of traditional sampling methods in parallel with metabarcoding methods (Cristescu 2014; Bucklin et al. 2016). However, the use of mock communities allows for direct comparisons between morphological identification and molecular identification based on various markers.

The metabarcoding method has the potential to revolutionize large-scale biodiversity surveying. However, the choice of marker(s) remains challenging. In this study, we improved and validated a current metabarcoding approach by using multiple markers and multiple primer pairs. Species detection rates and species detection accuracy were both tested and compared between the nuclear 18S and mitochondrial COI markers, and among 3 COI primer pairs, in mock communities composed of 78 zooplankton species. The result is a validated metabarcoding approach that can be applied to survey various aquatic habitats.
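To make the comparison concrete, the following minimal Python sketch shows one way detections could be scored against a mock community whose composition is known a priori. It is an illustration only: the species labels are placeholders, the function name is hypothetical, and the definition of accuracy used here (the fraction of detected species that truly belong to the mock community) is an assumption rather than the definition applied in the manuscript.

```python
def detection_summary(expected, detected):
    """Score a marker's detections against a mock community.

    expected: species known to be in the mock community.
    detected: species reported for one marker (or a union of markers).
    """
    expected, detected = set(expected), set(detected)
    if not expected:
        raise ValueError("the mock community must contain at least one species")
    true_hits = expected & detected
    return {
        # Share of mock-community species that were recovered.
        "detection_rate": len(true_hits) / len(expected),
        # Assumed definition: share of detections that are truly present.
        "accuracy": len(true_hits) / len(detected) if detected else 0.0,
        "missed": sorted(expected - detected),
        "unexpected": sorted(detected - expected),
    }


# Placeholder data, not results from this study.
mock = {"sp_A", "sp_B", "sp_C", "sp_D"}
coi = {"sp_A", "sp_B", "sp_X"}
s18 = {"sp_A", "sp_C"}

coi_summary = detection_summary(mock, coi)
combined_summary = detection_summary(mock, coi | s18)  # union of both markers
print(coi_summary["detection_rate"], combined_summary["detection_rate"])
```

Taking the union of the per-marker detections is what allows a second marker to recover species the first one missed, at the cost of also pooling any unexpected detections.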

REFERENCES

Bellard C, Bertelsmeier C, Leadley P, Thuiller W, Courchamp F (2012) Impacts of climate change on the future of biodiversity. Ecology Letters, 15, 365-377.

Bellard C, Leclerc C, Leroy B, Bakkenes M, Veloz S, Thuiller W, Courchamp F (2014) Vulnerability of biodiversity hotspots to global change. Global Ecology and Biogeography, 23, 1376-1386.

Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK (2012) Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology & Evolution, 27, 233-243.

Bohle HM, Gabaldón T (2012) Selection of marker genes using whole-genome DNA polymorphism analysis. Evolutionary Bioinformatics, 8, 161-169.

Brown EA, Chain FJJ, Crease TJ, MacIsaac HJ, Cristescu ME (2015) Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities? Ecology and Evolution, 5, 2234-2251.

Brown EA, Chain FJJ, Zhan A, MacIsaac HJ, Cristescu ME (2016) Early detection of aquatic invaders using metabarcoding reveals a high number of non-indigenous species in Canadian ports. Diversity and Distributions, 22, 1045-1059.

Bucklin A, Lindeque PK, Rodriguez-Ezpeleta N, Albaina A, Lehtiniemi M (2016) Metabarcoding of marine zooplankton: prospects, progress and pitfalls. Journal of Plankton Research, 38, 393-400.

Chain FJJ, Brown EA, MacIsaac HJ, Cristescu ME (2016) Metabarcoding reveals strong spatial structure and temporal turnover of zooplankton communities among marine and freshwater ports. Diversity and Distributions, 22, 493-504.

Claesson M, Wang Q, O’Sullivan O (2010) Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Research, 38, e200.

Clarke LJ, Soubrier J, Weyrich LS et al. (2014) Environmental metabarcodes for insects: in silico PCR reveals potential for taxonomic bias. Molecular Ecology Resources, 14, 1160-1170.

Cowart DA, Pinheiro M, Mouchel O, Maguer M, Grall J, Miné J, Arnaud-Haond S (2015) Metabarcoding is powerful yet still blind: a comparative analysis of morphological and molecular surveys of seagrass communities. PLoS One, 10, e0117562.

Cristescu ME (2014) From barcoding single individuals to metabarcoding biological communities: towards an integrative approach to the study of global biodiversity. Trends in Ecology & Evolution, 29, 566-571.

Deagle BE, Jarman SN, Coissac E, Pompanon F, Taberlet P (2014) DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match. Biology Letters, 10, 20140562.

Deagle BE, Thomas AC, Shaffer AK, Trites AW, Jarman SN (2013) Quantifying sequence proportions in a DNA-based diet study using Ion Torrent amplicon sequencing: which counts count? Molecular Ecology Resources, 13, 620-633.

Demeke T, Jenkins GR (2010) Influence of DNA extraction methods, PCR inhibitors and quantification methods on real-time PCR assay of biotechnology-derived traits. Analytical and Bioanalytical Chemistry, 396, 1977-1990.

Drummond AJ, Newcomb RD, Buckley TR, Xie D, Dopheide A, Potter BC et al. (2015) Evaluating a multigene environmental DNA approach for biodiversity assessment. Gigascience, 4, 46.

Geller J, Meyer C, Parker M et al. (2013) Redesign of PCR primers for mitochondrial cytochrome c oxidase subunit I for marine invertebrates and application in all-taxa biotic surveys. Molecular Ecology Resources, 13, 851-861.

Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society London B, 270, 313-321.

Hollingsworth PM, Forrest LL, Spouge JL et al. CBOL Plant Working Group (2009) A DNA barcode for land plants. PNAS, 106, 12794-12797.

Hope PR, Bohmann K, Gilbert MTP, Zepeda-Mendoza ML, Razgour O, Jones G (2014) Second generation sequencing and morphological faecal analysis reveal unexpected foraging behaviour by Myotis nattereri (Chiroptera, Vespertilionidae) in winter. Frontiers in Zoology, 11, 39.

Ji Y, Ashton L, Pedley SM et al. (2013) Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. Ecology Letters, 16, 1245-1257.

Kelly RP, Port JA, Yamahara KM, Crowder LB (2014) Using environmental DNA to census marine fishes in a large mesocosm. PLoS One, 9, e86175.

Leray M, Yang JY, Meyer CP, Mills SC, Agudelo N, Ranwez V, Boehm JT, Machida RJ (2013) A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Frontiers in Zoology, 10, 34.

Letendu G, Wubet T, Chatzinotas A, Welhelm C, Buscot F, Schlegel M (2014) Effects of long-term differential fertilization on eukaryotic microbial communities in an arable soil: a multiple barcoding approach. Molecular Ecology, 23, 3341-3355.

Li WH (2006) Molecular Evolution. Sinauer, ISBN 0-87893-480-4.

Piñol J, Mir G, Gomez-Polo P, Agustí N (2015) Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods. Molecular Ecology Resources, 15, 819-830.

Pompanon F, Deagle B, Symondson WOC, Brown DS, Jarman SN, Taberlet P (2012) Who is eating what: diet assessment using next generation sequencing. Molecular Ecology, 21, 1931-1950.

Saccone C, Giorgi C, Gissi C, Pesole G, Reyes A (1999) Evolutionary genomics in Metazoa: the mitochondrial DNA as a model system. Gene, 238, 195-209.

Schoch CL, Seifert KA, Huhndorf S et al. (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences of the United States of America, 109, 6241-6246.

Shaw JLA, Weyrich L, Cooper A (2017) Using environmental (e)DNA sequencing for aquatic biodiversity surveys: a beginner’s guide. Marine and Freshwater Research, 68, 20-33.

Smith LC, Stephenson SR (2013) New Trans-Arctic shipping routes navigable by mid-century. Proceedings of the National Academy of Sciences of the United States of America, 110, E1191-E1195.

Stoeck T, Bass D, Nebel M et al. (2010) Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Molecular Ecology, 19, 21-31.

Taberlet P, Coissac E, Hajibabaei M et al. (2012) Environmental DNA. Molecular Ecology, 21, 1789-1793.

Uber MC (2015) Climate change: Accelerating extinction risk from climate change. Science, 348, 571-573.

Yu DW, Ji Y, Emerson BC, Wang X, Ye C, Yang C, Ding Z (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613-623.

Zaiko A, Samuiloviene A, Ardura A, Garcia-Vazquez E (2015) Metabarcoding approach for nonindigenous species surveillance in marine coastal waters. Marine Pollution Bulletin, 100, 53-59.

Zimmermann J, Glockner G, Jahn R, Enke N, Gemeinholzer B (2015) Metabarcoding vs. morphological identification to assess diatom diversity in environmental studies. Molecular Ecology Resources, 15, 526-542.

Towards accurate species detection: Calibrating metabarcoding methods based on multiplexing multiple markers

G.K. Zhang,* F.J.J. Chain,* C. Abbott,¶ and M.E. Cristescu*

* Department of Biology, McGill University, 1205 Docteur-Penfield, Montreal, Quebec H3A 1B1, Canada
¶ Aquatic Health, Fisheries and Oceans Canada, 3190 Hammond Bay Road, Nanaimo, British Columbia V9T 6N7, Canada

Corresponding author: Guang K Zhang Department of Biology, McGill University, 1205 Docteur Penfield, Stewart Biology Building N6/9, Montreal, QC, Canada H3A 1B1; Phone: 514-398-1622; e-mail: [email protected]

Abstract

Metabarcoding is becoming a rapid and efficient method for biodiversity assessment. This approach combines DNA barcoding of complex samples with high-throughput sequencing, often using one genetic marker (barcode) and one, generally universal, primer pair. However, species-level identification depends heavily on the choice of marker and the performance of the chosen primer pair, with a trade-off between species amplification success and species-level identification resolution. A multi-barcode approach combined with the use of multiple primer pairs per marker may provide a suitable balance between amplification success and species-level resolution across divergent groups of species. We present a versatile metabarcoding protocol for biomonitoring that involves the use of multiple barcodes and multiple primer pairs per barcode in a single high-throughput run via sample multiplexing. Based on the amplification success and the amplified regions of the primer pairs, a combination of three COI primer pairs and one 18S primer pair was selected for metabarcoding mock communities of zooplankton. The regions amplified by these primer pairs are referred to as ‘fragments’ hereafter, giving three COI fragments and one 18S fragment. Among the species detected by COI in mock communities with multiple species, the majority (62.5-81.8%) were detected by all three COI fragments, 18.2-20.8% were detected by two of the three COI fragments, and 0-10% were detected exclusively by a single COI fragment. Using the COI marker, which included three fragments, we detected 42.9-78.6% of species, while the 18S marker detected 55.4-71.4% of species. The species detection level was significantly improved, to 71.4-88.5%, by combining all four fragments of the two markers. Furthermore, sequencing depth and species detection rates were not affected by sequencing multiple markers in one Illumina run (mixing amplicons from different markers/fragments prior to indexing) compared with a single marker (one marker per Illumina run), which makes the method versatile and cost-effective. Overall, our metabarcoding approach utilizing multiple barcodes and multiple primer pairs per barcode improved species detection rates by 22.5% to 34.9%, making it an attractive, cost-effective method for biomonitoring natural zooplankton communities and for reporting aquatic invasive or endangered species with more confidence.

Keywords: metabarcoding, multigene, multiple primer pairs, zooplankton, COI, 18S

Introduction

Metabarcoding makes it possible to survey entire ecosystems, overcoming important limitations of traditional DNA barcoding, which was developed to taxonomically identify single specimens (Hebert et al. 2003; Ratnasingham and Hebert 2007). Metabarcoding combines the barcoding approach with next-generation sequencing (NGS) technologies to assess biodiversity in ‘bulk’ (i.e. mixed) samples or from environmental DNA (reviewed in Taberlet et al. 2012; Cristescu 2014). Although metabarcoding is a very promising method for biodiversity assessment, its efficient application is still hindered by several limitations. The challenges of developing a reliable metabarcoding approach include finding a suitable DNA region to amplify across target taxa, achieving PCR amplification across divergent groups of organisms, PCR amplification errors, the need for high-quality reference sequence databases, and choosing appropriate bioinformatic steps with reasonable thresholds (Taberlet et al. 2012; Yoccoz 2012; Cristescu 2014). Therefore, choosing one or more appropriate genetic markers for metabarcoding is essential (Pompanon et al. 2012; Bucklin et al. 2016), as it affects PCR amplification success and whether species-level resolution for biodiversity estimation is achieved. To allow efficient species identification, the genetic marker used must show high interspecific variation and low intraspecific variation; however, it is often difficult to strike a balance between high amplification success across taxonomic groups and species-level resolution (Bohle and Gabaldón 2012). Most current metabarcoding projects use a single-locus approach, and the most common markers are cytochrome c oxidase subunit I (COI) for animals (Hebert et al. 2003; Leray et al. 2013), the Internal Transcribed Spacer (ITS) for fungi (Horton and Bruns 2001; Schmidt et al. 2013), and plastid DNA (matK and rbcL) for land plants (CBOL Plant Working Group 2009; Yoccoz et al. 2012).
Using a single organelle marker can occasionally cause erroneous species identification due to interspecific mitochondrial introgression; therefore, both uniparentally inherited organelle DNA and biparentally inherited DNA should be used (Taberlet et al. 2012). The mitochondrial COI gene has a high resolution for species identification and has extensive reference sequence libraries (Ratnasingham and Hebert 2007), but it is often difficult to amplify consistently across diverse taxa (Deagle et al. 2014). In contrast, the nuclear 18S gene is more conserved at the priming sites than the mitochondrial COI gene, leading to greater amplification success when examining broad taxonomic groups, but gives lower resolution for species identification (Saccone et al. 1999; Hebert et al. 2003; Bucklin et al. 2016). Another disadvantage of the 18S gene is that its V4 region varies in length across diverse species, causing major alignment issues across broad taxa and consequent difficulties in testing various divergence thresholds during the bioinformatics steps (Hebert et al. 2003; Flynn et al. 2015). Due to variability in the number of polymorphic sites and the availability of reliable reference sequences for different genetic markers, there is a need to use multiple markers in metabarcoding studies, especially for distinguishing between closely related species (Bucklin et al. 2016; Chase & Fay 2009; Cristescu 2014; Drummond et al. 2015). A single-marker approach is commonly used, whereas a multi-gene approach has, to our knowledge, been applied in only a limited number of metabarcoding studies. Moreover, comparisons of biodiversity estimates across markers are not reported in some multi-marker metabarcoding studies (e.g. COI for metazoans and RuBisCO for diatoms, Zaiko et al. 2015; species-specific primer pairs of COI and cytochrome b markers, Thomsen et al. 2012; chloroplast trnL and rbcL for surveying different terrestrial habitats, Yoccoz et al. 2012). In addition, some multi-marker metabarcoding studies only used a single primer pair per marker (e.g. Drummond et al. 2015; Kermarrec et al. 2013), which can lead to underestimation of species detection compared to using multiple primer pairs per marker. Furthermore, a multi-gene metabarcoding approach should be validated using realistic mock communities, where the expected number of species is known a priori (e.g. Clarke et al. 2014; Kermarrec et al. 2013; Elbrecht et al. 2016). Clarke et al. (2014) is the only metabarcoding study thus far that tests multiple primer pairs of both 16S and COI markers in mock communities; however, primer efficiency was only tested on 14 species (13 insects and 1 arachnid).
There is a need to test species detection rates and taxonomic identification accuracy for multiple markers and multiple primer pairs per marker in mock communities of diverse taxonomic groups. In this study, we test the efficiency of using a combination of mitochondrial and nuclear markers (COI and 18S) and multiple COI primer pairs in a single Illumina run for recovering species by metabarcoding mock communities of zooplankton. Species detection rates and detection accuracies are compared between the 18S marker sequenced alone in one Illumina run and the 18S marker multiplexed with other markers/fragments in one run. We validate the multiplexed multigene approach using a series of mock communities containing Single Individuals per Species (SIS), Multiple Individuals per Species (MIS), and Populations of Single Species (PSS), the last composed of single individuals, low-abundance populations, and high-abundance populations of single species. This validated workflow can be applied to assess zooplankton biodiversity in natural aquatic habitats.
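The overlap among COI fragments reported in the abstract (species detected by all three fragments, by two of three, or by one only) can be tabulated with a short routine such as the Python sketch below. This is a hedged illustration, not the bioinformatics workflow used in this study: the function name is hypothetical, the species labels are placeholders, and only the fragment names (FC, Leray, Folmer) are taken from the manuscript.

```python
from collections import Counter


def fragment_overlap(detections):
    """Return the share of detected species found by exactly n fragments.

    detections: mapping of fragment name -> iterable of detected species.
    """
    # Count, for each species, how many fragments detected it.
    hits = Counter(sp for species in detections.values() for sp in set(species))
    if not hits:
        return {}
    # Tally species by the number of fragments that detected them.
    tally = Counter(hits.values())
    return {n: tally[n] / len(hits) for n in range(1, len(detections) + 1)}


# Placeholder detections, not data from this study.
detections = {
    "FC": {"sp_A", "sp_B", "sp_C"},
    "Leray": {"sp_A", "sp_B"},
    "Folmer": {"sp_A", "sp_C", "sp_D"},
}
overlap = fragment_overlap(detections)
for n_fragments, share in overlap.items():
    print(f"detected by {n_fragments} fragment(s): {share:.1%}")
```

The denominator is the number of species detected by at least one fragment, which matches the phrasing "among the species detected by COI"; scoring against the full mock community instead would require passing the expected species list as well.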

Methods

Primer testing

Preliminary primer tests were conducted to examine amplification success of 14 primer pairs across a wide range of metazoan groups. A total of 13 COI primer pairs and one 18S primer pair were selected from the literature and tested to target the COI-5P and 18S-V4 regions (see Supporting Information Table S1 for the complete list of primers and their sources). Only one 18S primer pair was tested since this primer pair is well known for its successful amplification across a broad range of zooplankton groups (Zhan et al. 2013; Chain et al. 2016; Brown et al. 2016). Specimens used in this study were sampled from 16 major Canadian ports that cover four geographic regions (Atlantic coast, Pacific coast, Arctic, Great Lakes; Chain et al. 2016; Brown et al. 2016) and were identified morphologically by taxonomists. A total of 104 species belonging to the phyla Rotifera, Crustacea, and the Subphylum Tunicata were selected and tested (see Supporting Information Table S2 for details). A subset of those species was used to assemble mock communities for metabarcoding validation (see below). PCR amplification was performed in a total volume of 12.5µL: 0.2µM of each forward and reverse primer, 1.25U Taq DNA polymerase (GeneScript, VWR), 2mM Mg2+, 0.2µM dNTP, and 2µL of genomic DNA. The PCR conditions for each primer pair followed their sources in the literature (Supporting Information Table S1). The selection of primer pairs for this metabarcoding study was based not only on the amplification success across a wide range of taxa, but also on the discriminatory power of the amplified regions for species identification. After investigating the amplification success, we selected one primer pair targeting the nuclear 18S V4 region (Zhan et al. 2013) and 3 COI primer pairs producing 3 different fragments within the COI-5P region (Table 1 and Figure 1).
The three COI primer pairs selected for metabarcoding the mock communities amplify three distinct COI fragments: the FC fragment by LCO1490 x Ill_C_R (Folmer et al. 1994; Shokralla et al. 2015), the Leray fragment by mlCOIintF x HCO2198 (Leray et al. 2013; Folmer et al. 1994), and the Folmer fragment by LCO1490 x HCO2198 (Folmer et al. 1994). The 3 different fragments of COI-5P region have different levels of genetic variation and

amplification success rates across the species in the mock communities. These four primer pairs were subsequently used for metabarcoding several mock communities.

Assemblage of mock communities

Three types of mock communities were designed to incorporate intra-genomic, intra-specific and inter-specific variation within the COI-5P and 18S-V4 regions across various invertebrate taxonomic groups. Mock communities were constructed with the aim of representing natural zooplankton communities, including species from broad taxonomic groups: Mollusca, Rotifera, Tunicata, and Crustacea (Amphipoda, Anostraca, Cladocera, Cirripedia, Copepoda, Decapoda) (see Supporting Information Table S3 for detailed species list). These specimens had been identified morphologically by taxonomists to the species or genus level, and in a few exceptions to the family level. The three types of mock communities are hereafter referred to as Single Individuals per Species (SIS), consisting of single individuals from each of 76 species (Table S3: 1a, 1b, 1c, 1d, 1e, 1g), Multiple Individuals per Species (MIS), consisting of various numbers of individuals of 37 species (Table S3: 2a, 2b, 2c, 2d, 2e, 2g), and Populations of Single Species (PSS), consisting of single, low and high numbers of individuals of single species (Table S3: 3a1-3d3). The inclusion of single individuals in the SIS communities allowed examination of species detection with only inter-specific variation. On the other hand, the MIS communities, which most closely resembled natural communities, allowed the examination of species detection with both intra-specific and inter-specific variation. DNA from mock communities was extracted using Qiagen DNeasy Blood & Tissue kits and stored in ultra-pure water in the freezer at -20ºC, as described in Brown et al. (2015).

Library preparation and Next-Generation Sequencing (NGS)

DNA extractions were first quantified using PicoGreen (Quant-iT™ PicoGreen dsDNA Assay Kit, Thermo Fisher Scientific Inc.), then diluted to 5ng/µL when the concentration was higher than this. The protocol ‘16S Metagenomic Sequencing Library Preparation’ (Illumina Inc.) was used with small modifications to prepare the sequencing-ready libraries. Library preparation involved a first PCR, a first clean-up with Agencourt AMPure beads (Beckman Coulter Life Sciences Inc.), an indexing PCR with the Nextera Index kit, and a second clean-up prior to next-generation sequencing (NGS). The first PCR was performed on separate plates for all 4

primer pairs, with 2 replicates of each library for each fragment. The four primer pairs amplified four different genetic fragments of the 18S V4 and COI-5P markers: the 18S fragment by Uni18S x Uni18SR (Zhan et al. 2013), the FC fragment by LCO1490 x Ill_C_R (Folmer et al. 1994; Shokralla et al. 2015), the Leray fragment by mlCOIintF x HCO2198 (Leray et al. 2013; Folmer et al. 1994), and the Folmer fragment by LCO1490 x HCO2198 (Folmer et al. 1994). PCR amplification was performed in a total volume of 12.5µL: 0.2µM of each forward and reverse primer, 6µL of 2xKAPA HiFi HotStart ReadyMix (KAPA Biosystems Inc., US), and 1.5µL of diluted genomic DNA. Due to the incompatibility of the KAPA kit with primers containing inosine (“I”), as in the COI primer Ill_C_R (Shokralla et al. 2015), all the FC fragments were amplified using a standard PCR gradient as in the original paper (Shokralla et al. 2015). The PCR thermocycler regimes were the same as in the original papers: 18S V4 (Zhan et al. 2013), FC (Shokralla et al. 2015), Leray (Leray et al. 2013), and Folmer (Folmer et al. 1994) (see Figure 1 for details). The 2 replicates of each PCR reaction for each fragment were pooled together after validation on a 1% electrophoresis gel. Equal volumes of 8µL from each fragment were quantified and pooled to make 32µL for each library, containing all four fragments. After this step, there was a total of 24 libraries, each with 4 different PCR amplicons: 6 libraries of simple communities, 6 of complex communities, and 12 libraries of single species communities (see Supporting Information Table S3 for details). The 24 libraries obtained were cleaned using ultra-pure beads at a ratio of 0.875 (28µL beads in 32µL solution), indexed using the Nextera® XT Index Kit (24-index), and cleaned a final time using ultra-pure beads to become sequencing-ready.
All the sequencing-ready libraries were submitted to Genome Quebec for final quantification, normalization, pooling and sequencing using pair-end 300bp Illumina MiSeq sequencer in one run. Thus, the method for mixing amplicons of different markers/fragments prior to indexing and sequencing is not only versatile but also cost effective. A subset of 4 SIS and 4 MIS communities were also sequenced using only the 18S primer pair (a single marker per Illumina run) using the same Illumina MiSeq pair-end 300bp platform. This step was conducted to compare sequencing depth and species detection rates of a specific marker between single marker sequenced alone in one run and the same marker when sequenced and multiplexed with other markers/fragments (more than one marker/fragment per run for the same sample/library).

Building a local reference database

We created a local database composed of 156 total sequences (see Supporting Information Table S4 for details). Reference sequences were generated by Sanger sequencing in this study (27 sequences) or related projects conducted at the Biodiversity Institute of Ontario (BIO) on the same zooplankton populations (noted as ‘Pending-XXXX’), or in our laboratory (6 sequences; Brown et al. 2015), or were obtained from online databases (123 reference sequences; NCBI GenBank http://www.ncbi.nlm.nih.gov/nuccore, BOLD http://www.boldsystems.org/). We used congeneric or confamilial species as reference sequences when the focal species were identified to family level only, or when we lacked more specimens for Sanger sequencing and no online reference sequences were available (Supporting Information Table S4). All COI reference sequences were aligned and adjusted to have an equal length of 652bp, so the FC fragments matched the 5’ end of the reference sequences, the Leray fragments matched the 3’ end of the reference sequences, and the Folmer fragments matched the whole COI-5P gene region (see Figure 1 for the detailed fragment positions). The 18S reference sequences were either the 18S V4 region only or longer sequences containing the V4 region without trimming. The best BLAST hit against our local reference database was used to classify each sequence read, and a positive detection required at least 95% identity and an alignment length of at least 150bp in both forward and reverse reads. These relatively relaxed BLAST thresholds were used to accommodate the species with congeneric or confamilial reference sequences.

Bioinformatics analyses

The bioinformatic pipeline in this study consisted of demultiplexing, quality filtering and trimming of raw reads, and subsequently assigning taxonomy to reads via BLASTN (Altschul et al. 1990) against our local reference database. Each mock community was processed as a separate library with a unique combination of indices, thus each sample/mock community is referred to as a ‘library’ hereafter. Raw reads were first demultiplexed by assigning them to their corresponding libraries, generating paired forward R1 and reverse R2 files for each library (Raw Read Pairs in Table 2). The raw reads were then quality filtered and trimmed via ‘Quality Trimmer’ from the FASTX-Toolkit http://hannonlab.cshl.edu/fastx_toolkit/, with a minimum Phred quality score of 20 and a minimum length of 150bp after trimming (see Trimmed-R1 and Trimmed-R2 in Table 2). After quality trimming the R1 and R2 reads were separately used as

queries in BLAST against the local database. The BLAST results of R1 and R2 were then concatenated based on the unique sequence read ID assigned by the Illumina sequencer (see Paired Reads after Trimming in Table 2). From the BLAST results, only the sequences with both R1 and R2 matching a reference sequence at >95% identity and >150bp match length were kept for further analysis (see Filter-Blasting step in Table 2). The BLAST results were then further filtered based on whether both R1 and R2 reads hit the same species (see Filter-Blasting Same Species in Table 2); read pairs that hit different species were dropped from further analysis. The number of reads of each fragment was assigned and compared across the 24 libraries (Table 2). The above bioinformatics pipeline is referred to as the R1&R2 analysis. Species detection was confirmed when 1 or more read(s) matched a reference library sequence with >95% identity and >150bp. Since this study used mock communities with expected species known a priori, reads with BLAST matches to the expected species in the corresponding libraries were counted as reads with correct species assignments. The correct species assignment percentages were calculated by dividing the number of reads with ‘correct’ species assignments by the total number of reads with filtered BLAST hits, for the 18S and COI markers separately. Both read abundances and species information were obtained using Python and bash scripts (available by request from the corresponding author).
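The R1&R2 filtering logic described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the scripts used in the study: the row layout (read ID, species, percent identity, alignment length, bit score), the function names, and the example reads are all assumptions, and the thresholds are applied inclusively (at least 95% identity, at least 150bp), following the wording of the reference database section.

```python
MIN_IDENTITY = 95.0   # percent identity threshold taken from the text
MIN_ALIGN_LEN = 150   # alignment length threshold in bp, taken from the text


def best_hits(rows):
    """Return the highest-bit-score hit per read (hypothetical row layout).

    Each row is (read_id, species, pct_identity, align_len, bitscore).
    """
    best = {}
    for read_id, species, identity, align_len, bitscore in rows:
        if read_id not in best or bitscore > best[read_id][3]:
            best[read_id] = (species, identity, align_len, bitscore)
    return best


def passes_thresholds(hit):
    """Apply the identity and alignment-length thresholds to one best hit."""
    _, identity, align_len, _ = hit
    return identity >= MIN_IDENTITY and align_len >= MIN_ALIGN_LEN


def concordant_pairs(r1_rows, r2_rows):
    """Keep read pairs whose R1 and R2 best hits both pass and name the same species."""
    r1, r2 = best_hits(r1_rows), best_hits(r2_rows)
    kept = {}
    for read_id in r1.keys() & r2.keys():
        h1, h2 = r1[read_id], r2[read_id]
        if passes_thresholds(h1) and passes_thresholds(h2) and h1[0] == h2[0]:
            kept[read_id] = h1[0]
    return kept


def detected_species(assignments, min_reads=1):
    """A species counts as detected when at least min_reads pairs are assigned to it."""
    counts = {}
    for species in assignments.values():
        counts[species] = counts.get(species, 0) + 1
    return {species for species, n in counts.items() if n >= min_reads}


# Made-up example hits for three read pairs.
r1_rows = [
    ("read1", "Daphnia pulex", 99.0, 200, 350.0),
    ("read2", "Daphnia pulex", 96.0, 180, 300.0),
    ("read3", "Artemia sp.", 94.0, 220, 310.0),   # fails the identity threshold
]
r2_rows = [
    ("read1", "Daphnia pulex", 98.5, 210, 340.0),
    ("read2", "Daphnia magna", 97.0, 190, 320.0),  # R1 and R2 disagree
    ("read3", "Artemia sp.", 99.0, 220, 400.0),
]
pairs = concordant_pairs(r1_rows, r2_rows)
```

In this toy input only `read1` survives: `read2` is dropped because its two reads hit different species, and `read3` because its R1 hit falls below the identity threshold.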

Results

Primer testing

The amplification success of 104 species with 13 COI primer pairs and one 18S primer pair was highest for the 18S fragment (78/104 species tested, 75%), followed by the COI_Radulovici fragment (60/104 species tested, 57.7%), and the COI_FC fragment (51/104 species tested, 49%) (Supporting Information Table S2). The amplification success rate of the COI_FC fragment with reverse primer Ill_C_R was 30.8% higher than that of the COI_Folmer fragment, which has the same forward primer LCO1490 but a different reverse primer, HCO2198. While the COI_Leray fragment had the same reverse primer HCO2198 but a different forward primer (mlCOIintF) from the COI_Folmer fragment (LCO1490), the overall amplification success rate was very similar, with 38.5% for COI_Leray and 37.5% for COI_Folmer fragments. Although the 3 COI fragments (COI_FC, COI_Leray, COI_Folmer) were designed to target a wide range of phyla (Supporting Information Table S1), the

amplification success was species or taxonomic group dependent. For example, the COI_FC fragment had a 28.6% increase over the COI_Leray and a 24.1% increase over the COI_Folmer fragments in amplifying 73 species of Arthropoda, and both the COI_FC and COI_Folmer fragments had an 83.3% increase over the COI_Leray fragment in amplifying the infraorder Cladocera.
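The percentages in this subsection mix two kinds of quantity: success rates (species amplified out of 104 tested) and relative increases between primer pairs. The short Python sketch below reproduces the arithmetic. The counts for 18S, COI_Radulovici and COI_FC are those given in the text; the counts for COI_Leray (40) and COI_Folmer (39) are back-calculated here from the reported 38.5% and 37.5% and are therefore assumptions, as are the function names.

```python
SPECIES_TESTED = 104

# Species amplified per fragment. The last two counts are back-calculated
# from the reported percentages (assumption), not stated in the text.
amplified = {
    "18S": 78,
    "COI_Radulovici": 60,
    "COI_FC": 51,
    "COI_Leray": 40,
    "COI_Folmer": 39,
}


def success_rate(n_amplified, n_tested=SPECIES_TESTED):
    """Amplification success as a percentage of species tested."""
    return 100.0 * n_amplified / n_tested


def relative_increase(new, old):
    """Relative increase of `new` over `old`, in percent of `old`."""
    return 100.0 * (new - old) / old


rates = {name: round(success_rate(n), 1) for name, n in amplified.items()}
fc_vs_folmer = round(relative_increase(amplified["COI_FC"], amplified["COI_Folmer"]), 1)
```

Under these assumed counts, `fc_vs_folmer` evaluates to 30.8, which suggests the 30.8% figure is a relative increase (51 versus 39 species) rather than a difference in percentage points (49.0% versus 37.5%).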

Read abundance comparison

A total of 20.73 million raw read pairs were quality filtered to 16.72 million paired reads, then BLAST filtered to 12.04 million paired reads, and finally 9.85 million paired reads were assigned to the corresponding fragments (Figure 2). The four Single Individuals per Species (SIS) libraries (1a, 1b, 1c, 1d) and the four Multiple Individuals per Species (MIS) libraries (2a, 2b, 2c, 2d) were quantified and pooled in equimolar amounts for next-generation sequencing, but the number of raw read pairs varied, especially for the MIS library 2c containing Carcinus maenas, Corbicula fluminea, Daphnia pulex, and Nerita species (Figure 2a). A large proportion of MIS 2c raw reads (31.8%) were dropped due to poor quality at the quality trimming step. Thus, the raw read pairs did not correlate with the amount of amplicons prior to next-generation sequencing. Furthermore, the amounts of amplicons for low and high numbers of individuals in the Populations of Single Species (PSS) were in proportion to the number of individuals; for example, PSS 3a3 with 30 individuals had 3 times the amplicons of PSS 3a2 with 10 individuals. However, the number of raw read pairs did not correlate with the proportion of amplicons or number of individuals (Figure 2a). The BLAST results from the forward R1 and reverse R2 reads usually, but not always, matched the same species (Figure 2a). The relative read abundance of the four fragments differed across the 24 libraries, depending on the species compositions in the mock communities (Figure 2b-2e). In general, the most abundant fragment was the Leray fragment, followed by the 18S fragment, then the FC fragment, and the Folmer fragment with the lowest abundance (Figure 2b-2e). Note that the number of reads of the 18S fragment in libraries 3d1-3d3 (Figure 2b) was not zero but was very low (see number of reads of ‘18S fragment’ in Table 2).

The performance of 18S marker when sequenced alone (single marker) vs. sequenced together with other markers (multi-marker approach)

The method presented here is a multi-marker approach with more than one marker sequenced in one run, making it versatile and cost effective, but its impacts on sequencing depth and species detection rates need to be examined and compared to the single-marker approach. The same genomic DNA of the four SIS (libraries 1a, 1b, 1c, 1d) and the four MIS (libraries 2a, 2b, 2c, 2d) communities was used in the library preparation and sequenced in the same proportions of 5% on two separate NGS runs (18S alone and 18S with 3 other COI fragments) using the same Illumina platform. The sequencing depth (number of reads per individual/species) and species detection rates of the 18S V4 marker from our multi-marker metabarcoding approach (18S with 3 COI fragments) were compared to a separate single-marker metabarcoding approach (18S alone) on the SIS communities (Table 3) and MIS communities (Table 4). In both the SIS and MIS, the sequencing depth (number of reads) on average and per individual or per species was consistent between the single-marker and multi-marker datasets (Table 3 & Table 4). In the SIS communities of 56 species, a total of 6 species were detected exclusively in the single-marker or the multi-marker dataset, but their numbers of reads were close to zero (11 reads or fewer). In the MIS communities of 14 species, only 2 species were detected differently between single-marker and multi-marker datasets (less than 50 reads and marked in grey in Table 4). This demonstrates that the majority of species (89.3%; 50 out of 56 species) in SIS and 85.7% of species (12 out of 14 species) in MIS were consistently detected in both single-marker and multi-marker metabarcoding approaches, and the sequencing depth per individual or per species was also consistent regardless of whether the 18S marker was sequenced alone or sequenced and multiplexed with other markers/fragments sharing the same indices in the same library.
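The consistency percentages above follow from counting species whose detection status differs between the two runs. A minimal Python sketch of that comparison is given below, assuming that "consistently detected" means the species has the same status (detected or not detected) in both datasets; the function name and the placeholder species labels are hypothetical and are not taken from Tables 3 and 4.

```python
def detection_consistency(single_marker, multi_marker, expected):
    """Percentage of expected species with the same detection status in both runs.

    All three arguments are sets of species names. Species outside the
    expected community are ignored.
    """
    single = single_marker & expected
    multi = multi_marker & expected
    discordant = single ^ multi  # detected in exactly one of the two runs
    consistent = len(expected) - len(discordant)
    return 100.0 * consistent / len(expected)


# Placeholder community of 56 species with 6 discordant detections,
# mirroring the SIS counts reported in the text.
expected = {f"sp{i}" for i in range(56)}
single_run = {f"sp{i}" for i in range(0, 40)}   # sp0..sp39
multi_run = {f"sp{i}" for i in range(3, 43)}    # sp3..sp42
sis_consistency = round(detection_consistency(single_run, multi_run, expected), 1)
```

With 6 discordant species out of 56, `sis_consistency` is 89.3, matching the reported value; the same function with 2 discordant species out of 14 gives 85.7.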

Primer pair choice on species detection

Differences in the number of species detected among the 3 COI fragments were compared in both SIS communities (libraries 1e, 1g) and MIS communities (libraries 2e, 2g) (Figure 3). Among the species detected by any of the COI fragments, the majority of species (62.5%-81.8%) were detected using any of the 3 COI fragments, and only 1-2 species were detected by a single COI fragment but not the other COI fragments. The 3 COI fragments together detected 42.9%-78.6% of the species in the corresponding libraries, with the lowest proportion being in the SIS library 1e (24 out of 56 species), and the highest proportion being in the MIS library 2e (9 out of 14 species) (Figure 3). In addition, the species detection rates of the

3 COI fragments for the 49 species differed between the primer testing on the single species samples and the metabarcoding of mock communities (Supporting Information Figure S5). The FC fragment had higher species detection rates using PCR and gel electrophoresis in the primer testing, but the Leray and Folmer fragments had higher species detection rates in the NGS metabarcoding of mock communities. Compared with using a single COI primer pair alone, the combination of 3 COI primer pairs improved the species detection rates greatly in primer testing (10.2-32.7%) but only slightly in metabarcoding mock communities (2.1-6.2%) (Figure S5).
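Combining primer pairs amounts to taking the union of the species each fragment detects, while the species shared by all fragments form the intersection. The sketch below illustrates this with placeholder data; the species labels and per-fragment sets are invented so that the combined count equals the 24 of 56 species reported for SIS library 1e, and the function name is hypothetical.

```python
def combined_detection(per_fragment, expected):
    """Summarise detections across fragments of one marker.

    per_fragment maps a fragment name to the set of species it detected.
    Returns (combined rate in percent, species found by any fragment,
    species found by every fragment).
    """
    detected_sets = [species & expected for species in per_fragment.values()]
    any_fragment = set().union(*detected_sets)
    all_fragments = set.intersection(*detected_sets)
    rate = 100.0 * len(any_fragment) / len(expected)
    return rate, any_fragment, all_fragments


# Placeholder detections for a 56-species community (not real data).
expected = {f"sp{i}" for i in range(56)}
per_fragment = {
    "FC": {f"sp{i}" for i in range(0, 20)},
    "Leray": {f"sp{i}" for i in range(2, 23)},
    "Folmer": {f"sp{i}" for i in range(4, 24)},
}
rate, any_fragment, all_fragments = combined_detection(per_fragment, expected)
```

Here the three fragments together recover 24 species (a combined rate of about 42.9%), of which 16 are recovered by every fragment; the gain from adding primer pairs is the difference between the union and the best single fragment.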

Marker choice on species detection

Species detection levels were also compared between the 18S V4 marker and the COI-5P marker (considering all 3 COI fragments) in the SIS communities (libraries 1e, 1g) and MIS communities (libraries 2e, 2g) (Figure 4). The differences in species detection rates between the COI and 18S markers highly depended on the species composition of each library. In the four SIS and MIS libraries (1e, 1g, 2e, 2g), the COI marker detected 42.9%-78.6% of species while the 18S marker detected 55.4%-71.4% of species. Overall, the species detection rates combining both markers were often substantially higher than when using a single marker, ranging from 71.4%-88.5% (Figure 4). The two markers sometimes, but not always, recovered the same species (25%-57.1%). Species recovery was slightly improved by adding more primer pairs of the same marker (3 COI fragments), but significantly improved by adding a different marker (18S fragment) in both SIS and MIS communities (Figure 5). Furthermore, compared with using a single marker alone, the use of both the COI marker (3 fragments) and the 18S marker improved the species detection rates by 10.2-14.3% in the primer testing and by 16.3-18.3% in the metabarcoding of mock communities (Supporting Information Figure S5). Since this metabarcoding study used mock communities with species known a priori, the ‘correct’ taxonomy assignment rates of both 18S and COI markers were then assessed based on the number of reads with BLAST matches to the actual expected species in the corresponding libraries (Table 5). The number of reads in the species assignment referred to the reads with both R1 and R2 matching the same species at >95% identity and >150bp match length. The 18S marker had 98.2%-99.4% correct species assignments, much higher than the 72.5%-89.4% on average for the COI marker (Table 5). However, the average correct species assignment percentage of the 18S marker was 1.2% lower in the libraries of multiple species (SIS

24 and MIS) than in the PSS libraries 3a1-3d3, while the COI marker had a 16.9% increase on the average correct species assignment in the SIS and MIS than in the PSS (Table 5; Table S6).

Discussion

Most metabarcoding studies aim to taxonomically assign next-generation sequencing reads to species level, and accurately detect the species present in complex samples (Coissac et al. 2012; Cristescu 2014). The choice of marker has been discussed in most metabarcoding studies, and all markers are known to have some disadvantages (Deagle et al. 2014). The ideal marker should be a short species- or genus-specific genomic region flanked by relatively conserved priming sites, as a trade-off between amplification success and phylogenetic resolution across a broad range of taxonomic groups (Deagle et al. 2014; Shaw et al. 2017). The mitochondrial cytochrome c oxidase subunit I (COI) was used as the standard marker for animals because its fast evolutionary rate gives better species-level resolution (Hebert et al. 2003; Yu et al. 2012), but its highly variable priming sites led to difficulty in amplifying broad taxonomic groups (Deagle et al. 2014; Bucklin et al. 2016). The small subunit 18S rDNA was suggested and used as an alternative marker for obtaining high taxonomic coverage due to conserved priming sites (Bik et al. 2012; Deagle et al. 2014; Chain et al. 2016), but the 18S marker was reported as underestimating the true biodiversity because of conserved sequences among cryptic or closely related species (Tang et al. 2012; Bucklin et al. 2016) or overestimating diversity due to high intra-individual variation arising from the multicopy nature of the gene (McTaggart & Crease 2005; Bik et al. 2012; Flynn et al. 2015). Both the hypervariable 18S V4 region and the COI-5P region have previously been used for assessing aquatic biodiversity in single-marker metabarcoding studies, for example, the 18S marker (Zhan et al. 2013; Brown et al. 2016; Chain et al. 2016), and the COI-5P marker (Ayalagas et al. 2016; Leray et al. 2013). The use of multiple group-specific primer pairs was suggested for the COI marker for obtaining higher amplification success (Cristescu 2014; Bucklin et al.
2016), and both uniparentally inherited marker like COI and biparentally inherited marker like 18S were suggested for use and comparison (Taberlet et al. 2012). Through the use of mock communities with species known a priori (Brown et al. 2015), a multi-gene (COI and 18S) and multi-primer pair (3 COI primer pairs) metabarcoding approach was applied for every sample/library in one run as being cost effective and overcoming both amplification biases of COI gene and species level resolution of 18S gene.

Multiple primer pairs

The mitochondrial COI marker was reported as technically challenging for amplification of broad taxonomic groups due to a lack of conserved priming sites (Deagle et al. 2014; Bucklin et al. 2016); consequently, cocktails of both group-specific (Bucklin et al. 2010) and species-specific (Thomsen et al. 2012) primer pairs were used in barcoding or metabarcoding the COI marker. The 18S primer pair used in this study was designed for targeting the V4 region of zooplankton and had been successfully used in metabarcoding studies (Zhan et al. 2013; Brown et al. 2015; Chain et al. 2016; Brown et al. 2016). The 13 COI primer pairs from the literature were selected and tested on 103 zooplankton species for examining the amplification biases and obtaining higher taxonomic coverage. Based on our primer testing results, the 13 COI primer pairs had a much lower amplification success, ranging from 0-57.7%, compared to the high of 75.7% success for the 18S primer pair, which is consistent with the literature. The 13 COI primer pairs tested here indeed showed differences in overall amplification success; however, the amplification success varied depending on the species, regardless of taxonomic group and of whether the primer pairs were universal or group-specific. For example, the ‘COI_Prosser’ primer pair (Supporting Information Table S2) was designed specifically for amplifying zooplankton, but it had similar species amplification success to the universal primer pairs such as ‘COI_Folmer’ and ‘COI_Leray’. In other words, amplification successes of the 13 COI primer pairs were generally species-specific rather than group-specific in the majority of taxa tested here (Supporting Information Table S2). In addition to amplification success across taxa of interest, amplicon length is also an important consideration for studies using degraded environmental DNA, thus requiring short amplicons (Meusnier et al.
2008; Cristescu 2014), and is upwardly limited by the capacity of NGS technology to obtain accurate long reads (Shaw et al. 2017). For example, primer pairs used here that amplified more than 600bp (Supporting Information table S1 and S2) had sequence gaps between forward and reverse reads when sequenced on the Illumina MiSeq pair-end 300bp platform. Therefore, the combination of the full COI fragment of 658bp (‘COI_Folmer’) with overlapping two short COI fragments (‘COI_FC’ and ‘COI_Leray’) of 325bp and 313bp were chosen for metabarcoding mock communities with expectation of higher species amplification success coverage and suitability for studying natural community DNA or degraded eDNA. Most past metabarcoding studies used single primer pair per marker, but multiple primer pairs (species-specific or not) was suggested and shown to improve the amplification success

(Bucklin et al. 2010; Thomsen et al. 2012; Clarke et al. 2014; Bucklin et al. 2016). The species detection rates of the 3 COI fragments in the metabarcoding of mock communities were expected to be higher than species amplification success during primer testing, but varied among the 3 COI fragments (Figure S5). The Leray and Folmer fragments had higher species detection rates in metabarcoding mock communities than in the primer testing, which was potentially due to high-throughput sequencing technology offering higher sensitivity than gel electrophoresis. The combination of all 3 COI fragments and the FC fragment alone had higher species detection rates in primer testing than in metabarcoding mock communities, which was potentially due to PCR inhibitors from some species present in mock communities that did not affect the single species samples used in primer testing (Demeke & Jenkins 2010). Specimen biomass (Deagle et al. 2013), amount of genomic DNA extracted (Polz & Cavanaugh 1998), DNA extraction methods (Fliegerova et al. 2014), and primer-template mismatch (Piñol et al. 2015) are all known to affect PCR amplification and efficiency. The majority of species were detected by all 3 COI primer pairs, potentially due to non-group-specific amplification for the targeted taxa. Certain taxonomic groups were detected by one COI primer pair alone or two COI primer pairs, such as with Leray fragment alone, and Stolidobranchia with FC and Folmer fragments. The combination of 3 COI primer pairs did improve the overall species detection rates in both primer testing and metabarcoding mock communities, and the relative number of reads of the 3 COI fragments also varied among species. Furthermore, Leray et al. (2013) showed the entropy of genetic variation across the COI-5P marker, and multiple COI primer pairs, either species-specific or with broad taxonomic coverage, had been used to increase the number of species amplified and the taxonomic resolution (Letendu et al. 2014; Thomsen et al. 2012; Clarke et al.
2014). Thus, using multiple primer pairs is expected to improve species detection rates while assessing the natural communities, especially for using multiple non-universal primer pairs covering different regions of the same marker/barcode.

Marker choice

The choice of marker can greatly affect the species estimates (Bucklin et al. 2016; Tang et al. 2012; Cristescu 2014). Standard markers are often used for targeting various taxonomic groups (Shaw et al. 2017), such as 16S rRNA for bacteria (Claesson et al. 2010), 18S rRNA for eukaryotes (Bik et al. 2012; Chain et al. 2016), 12S rRNA for vertebrates (Kelly et al. 2014),

mitochondrial COI for vertebrates and invertebrates (Yu et al. 2012), Internal Transcribed Spacer for fungi and algae (Schoch et al. 2012), chloroplast rbcL and trnL for plants (Hollingsworth et al. 2009). Only a limited number of metabarcoding studies used the multi-gene approach, and the use of multiple evolutionarily independent markers was suggested but rarely used in a single NGS run. A few metabarcoding studies have used and compared 18S and COI markers for biodiversity surveys, with variable results across different taxonomic groups. Drummond et al. (2015) reported that both COI and 18S markers were good proxies for the traditional biodiversity survey dataset in soil eDNA. Tang et al. (2012) reported that COI marker eDNA surveys in meiofauna estimated more species than morphospecies (species identified by morphology), whereas 18S underestimated species richness. In the current study using zooplankton mock communities, different species or taxonomic groups were detected by using the 18S marker and the COI marker, such as order Cyclopoida, Cardiida and by 18S marker alone, Thecosomata by COI marker alone. We experienced difficulties with assigning reads of the multicopy 18S gene to the corresponding species in certain species/taxa, where the reverse reads had multiple hits to the reference sequences in the BLAST results, potentially due to highly conserved sequences among closely related species (Prokopowich et al. 2003; Tang et al. 2012). Brown et al. (2015) listed problematic species for taxonomic assignment, such as Corbicula fluminea and Palaemonetes species due to indels, Artemia species, Balanus species, Daphnia species due to high similarity in their sequences with the species of the same genus, Corbicula fluminea, Diaphanosoma brachyurum, Eurytemora affinis, Leptodora kindtii, Macrocyclops albidus, and Pseudocalanus mimus due to intraspecific variation at the 18S V4 region.
The bioinformatics approach in this study for BLAST forward and reverse reads separately without merging may resolve the issues of indels with gaps, but the difficulty of assigning 18S reads to the corresponding species led to the low species detection rates than the primer testing and often resulted in higher taxonomic level identification such as genera or family levels. Furthermore, the difficulty in amplification of COI marker continuously affected the species detection rates despite of using 3 universal COI primer pairs, such as crustacean groups Calanus, Oithona, Pseudocalanus, which were also reported as problematic for amplification in Young et al. (2016). However, no issue was found for assigning COI reads to the corresponding species, and we were able to distinguish species of the same genus such as Gammarus while using the COI marker, thus the species detection rates

of COI marker should be in proximity to the actual species amplified by COI gene. Many species were only detected by the 18S marker or the COI marker, due to low amplification success of COI primer pairs and the conserved sequences of the 18S marker among related species. Overall, the combination of the 18S marker and the COI marker improved the species detection rates by 22.5-34.9% compared with using a single marker/primer pair at species-level resolution. The multi-gene approach was only used in a limited number of metabarcoding studies, and cross-validation with multiple markers and mock communities with known species composition was suggested (Cristescu 2014) and, to my knowledge, only used in the Clarke et al. (2014) study. We found that correct taxonomic assignment to species was lower for COI than 18S, potentially because we were considering only the morphologically identified species in the mock communities and did not consider potential prey species in the gut or any contaminating species. For example, in the library 3d2 with 10 individuals of Leptodora kindtii the prey species Leptodiaptomus minutus was also detected but necessarily scored as an ‘incorrect’ species assignment. In addition, the contaminant species Daphnia magna was detected solely by the COI marker in the PSS library 3a1 with a single individual of Limnoperna fortunei, but not in the PSS 3a2 (10 individuals of L. fortunei) and 3a3 (30 individuals of L. fortunei), which suggested the usefulness of the multi-marker approach. Thus, despite the evidence that the 18S marker had a higher species assignment accuracy than the COI marker using the bioinformatics pipeline (Table 5), only the COI marker was able to detect the prey and contaminant species in this study. The sequencing depth was also a major concern for describing the community patterns, as the number of samples/libraries in one run affects the number of reads per individual or per species (Letendu et al. 2014; Shaw et al.
2017), and the universal eukaryotic primers often exerted little or no control over the sequencing depth and coverage (Letendu et al. 2014; Cristescu 2014). Since the 18S primer pair and the three COI primer pairs used in this study were all universal primer pairs, and the four fragments were mixed prior to indexing/labelling in this multi-marker approach, the sequencing depth and species detection rates for the 18S marker were compared between 18S marker when sequenced alone and the 18S marker when sequenced and multiplexed with 3 COI fragments. It was found that the number of reads per individual or species varied significantly upon different species in the mock communities. However, the number of 18S reads assigned to each species was highly similar between single marker and multi-marker approaches, and the overall species detection rates were the same in both SIS and

29 MIS communities. Therefore, the sequencing depth and species detection rates were not affected by using multi-marker approach with more than one marker per sample/library vs. single-marker approach with only one marker per sample/library.
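The way two markers complement and cross-validate each other can be sketched with simple set operations. The following is a minimal illustration only: the mock community, the per-marker detections and the helper name `detection_rate` are hypothetical and are not results or code from this study.

```python
def detection_rate(detected, expected):
    """Fraction of the expected mock-community species that were detected."""
    expected = set(expected)
    return len(set(detected) & expected) / len(expected)


# Hypothetical mock community (illustrative values, not thesis data).
expected = {
    "Calanus finmarchicus",
    "Oithona atlantica",
    "Gammarus lacustris",
    "Gammarus oceanicus",
    "Daphnia pulex",
}

# Hypothetical detections: 18S amplifies broadly but cannot separate the two
# Gammarus species; COI separates them but fails to amplify the copepods.
by_marker = {
    "18S": {"Calanus finmarchicus", "Oithona atlantica", "Daphnia pulex"},
    "COI": {"Gammarus lacustris", "Gammarus oceanicus", "Daphnia pulex",
            "Daphnia magna"},
}

rates = {marker: detection_rate(found, expected)
         for marker, found in by_marker.items()}

# Species detected by at least one marker (combined detection).
combined = set().union(*by_marker.values())
rates["18S+COI"] = detection_rate(combined, expected)

# Species confirmed by both markers (cross-validated detections).
cross_validated = by_marker["18S"] & by_marker["COI"]

# Detections outside the morphologically identified list, e.g. prey or
# contaminant species, which a strict scoring scheme counts as incorrect.
unexpected = combined - expected

print(rates, cross_validated, unexpected)
```

In this toy case each marker alone recovers three of the five expected species, while the union recovers all five, and the one unexpected detection is flagged separately instead of being silently discarded.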

Future Directions

Many genetic markers have been applied to date in metabarcoding studies for biodiversity assessment, such as: the hypervariable nuclear 18S V4 region for zooplankton (Zhan et al. 2013); nuclear 28S rRNA for copepods (Hirai et al. 2015); mitochondrial COI for metazoans (Leray et al. 2013); mitochondrial 16S (Clarke et al. 2014); mitochondrial cytochrome b for arthropods (Hope et al. 2014); the internal transcribed spacer (ITS) for fungi (Schmidt et al. 2013); and chloroplast trnL and rbcL for plants (Yoccoz et al. 2012). The metabarcoding approach with multiple independent markers has been shown to improve amplification efficiency and discrimination power across a broad range of taxa (Cristescu 2014; Drummond et al. 2015; Bucklin et al. 2016). Furthermore, the quantification of species abundance in metabarcoding studies is still challenging due to technical and biological biases, such as multicopy ribosomal and mitochondrial genes (Bucklin et al. 2016; Deagle et al. 2013; Prokopowich et al. 2003). Saitoh et al. (2016) reported that species detections with both the COI and 16S markers were consistent with the morphologically identified species, but that COI performed better than 16S for quantification in Collembola, which implies that particular markers may perform better for quantification in other taxa. With the indexing systems available in NGS technology, more genetic markers can be used to assess biodiversity cost-effectively across a broader range of taxa within the same run, and species detections can also be cross-validated among the markers. PCR-free methods, such as mito-metagenomics, have been developed to avoid PCR bias and to enable the use of more markers (Liu et al. 2014; Zhou et al. 2013). Therefore, multiple evolutionarily independent markers and multiple primer pairs should be applied in future metabarcoding studies to improve the reliability of biodiversity estimates (Bucklin et al. 2016; Cristescu 2014; Leray & Knowlton 2014).

Conclusions

The community (e)DNA-based metabarcoding approach has been used for biodiversity assessment of multiple taxa in various habitats. With recent improvements in reference sequence databases, sequencing platforms and bioinformatics tools, and with reductions in sequencing time and cost, it has the potential to be widely applied in many other fields, such as the assessment of microorganisms in health or forensics, the inspection of processed foods, and the surveillance of invasive or endangered species. Most metabarcoding studies have used only a single locus/marker, and the choice of marker is known to greatly affect species estimates and detection accuracy. Our results suggest that a metabarcoding approach with multiple markers and multiple primer pairs can overcome amplification biases, improve phylogenetic resolution across taxonomic groups of zooplankton and, ultimately, improve biodiversity estimates. The combination of two evolutionarily independent markers (18S and COI) improved overall species detection rates and allowed the markers to cross-validate each other, giving more accurate species detection. The 18S marker had very high amplification success but performed poorly when assigning reads to the corresponding species in complex communities with multiple species, potentially because of the conserved sequences of cryptic or closely related species. Although many species or taxonomic groups were not amplified or detected by the COI marker, the use of three universal COI primer pairs improved overall species detection rates relative to a single COI primer pair. Furthermore, the sequencing depth (number of reads per individual/species) and species detection rates of the 18S marker did not differ between the 18S marker sequenced alone and the 18S marker multiplexed with three COI fragments per sample.

Overall, our metabarcoding approach using multiple markers (mitochondrial COI and nuclear 18S) and multiple primer pairs for the COI marker improved species detection rates relative to a single primer pair and/or marker, and sequencing depth and species estimates were not affected by adding more markers/fragments in the multi-marker vs. single-marker datasets. Thus, the approach calibrated in this study should be useful for biomonitoring natural zooplankton communities and for detecting aquatic invasive species with greater sensitivity and accuracy in major aquatic habitats.

Acknowledgements

I thank R. Young and S. Adamowicz at the Biodiversity Institute of Ontario (BIO) for generating the reference sequences for many of the zooplankton species included in this study. I also thank E. Brown for help with preparing libraries, and T. Crease, E. Brown and J. Flynn for helpful advice. Many of the samples included in this study were collected by members of the Canadian Aquatic Invasive Species Network (CAISN) and Fisheries and Oceans Canada (DFO). This research was supported by the NSERC CAISN to MEC and CA and the NSERC Discovery Grant to MEC.

REFERENCES

Allan E, Weisser WW, Fischer M et al. (2012) A comparison of the strength of biodiversity effects across multiple functions. Oecologia, 173, 223-237.
Altschul S, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. Journal of Molecular Biology, 215, 403-410.
Arndt A, Marquez C, Lambert P et al. (1996) Molecular phylogeny of eastern Pacific sea cucumbers (Echinodermata: Holothuroidea) based on mitochondrial DNA sequence. Molecular Phylogenetics and Evolution, 6, 425-437.
Aylagas E, Borja A, Irigoien X, Rodriguez-Ezpeleta N (2016) Benchmarking DNA metabarcoding for biodiversity-based monitoring and assessment. Frontiers in Marine Science, 3, 96.
Aylagas E, Borja A, Rodriguez-Ezpeleta N (2013) Environmental status assessment using DNA metabarcoding: towards a genetics based marine biotic index (gAMBI). PLoS ONE, 9, e90529.
Bálint M, Schmidt PA, Sharma R et al. (2014) An Illumina metabarcoding pipeline for fungi. Ecology and Evolution, 4, 2642-2653.
Bik HM, Porazinska DL, Creer S, Caporaso JG, Knight R, Thomas WK (2012) Sequencing our way towards understanding global eukaryotic biodiversity. Trends in Ecology & Evolution, 27, 233-243.
Blaxter M, Mann J, Chapman T et al. (2005) Defining operational taxonomic units using DNA barcode data. Philosophical Transactions of the Royal Society, 360, 1935-1943.
Bohle HM, Gabaldón T (2012) Selection of marker genes using whole-genome DNA polymorphism analysis. Evolutionary Bioinformatics, 8, 161-169.

Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E (2014) OBITools: a Unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176-182.
Brown EA, Chain FJJ, Crease TJ, MacIsaac HJ, Cristescu ME (2015) Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities? Ecology and Evolution, 5, 2234-2251.
Brown EA, Chain FJJ, Zhan A, MacIsaac HJ, Cristescu ME (2016) Early detection of aquatic invaders using metabarcoding reveals a high number of non-indigenous species in Canadian ports. Diversity and Distributions, 22, 1045-1059.
Bucklin A, Lindeque PK, Rodriguez-Ezpeleta N, Albaina A, Lehtiniemi M (2016) Metabarcoding of marine zooplankton: prospects, progress and pitfalls. Journal of Plankton Research, 38, 393-400.
Bucklin A, Ortman BD, Jennings RM, Nigro LM, Sweetman CJ, Copley NJ et al. (2010) A “Rosetta Stone” for metazoan zooplankton: DNA barcode analysis of species diversity of the Sargasso Sea (Northwest Atlantic Ocean). Deep Sea Research Part II: Topical Studies in Oceanography, 57, 2234-2247.
Cambray JA (2003) Impact on indigenous species biodiversity caused by the globalisation of alien recreational freshwater fisheries. Aquatic Biodiversity, 500, 217-230.
Carew ME, Pettigrove VJ, Metzeling L et al. (2013) Environmental monitoring using next generation sequencing: rapid identification of macroinvertebrate bioindicator species. Frontiers in Zoology, 10, 45.
Chain FJJ, Brown EA, MacIsaac HJ, Cristescu ME (2016) Metabarcoding reveals strong spatial structure and temporal turnover of zooplankton communities among marine and freshwater ports. Diversity and Distributions, 22, 493-504.
Chase MW, Fay MF (2009) Barcoding of plants and fungi. Science, 325, 682-683.
Claesson M, Wang Q, O’Sullivan O (2010) Comparison of two next-generation sequencing technologies for resolving highly complex microbiota composition using tandem variable 16S rRNA gene regions. Nucleic Acids Research, 38, e200.
Clarke LJ, Soubrier J, Weyrich LS et al. (2014) Environmental metabarcodes for insects: in silico PCR reveals potential for taxonomic bias. Molecular Ecology Resources, 14, 1160-1170.
Coissac E, Riaz T, Puillandre N (2012) Bioinformatic challenges for DNA metabarcoding of plants and animals. Molecular Ecology, 21, 1834-1847.
Collins RA, Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources, 13, 969-975.

Costa FO, deWaard JR, Boutillier J et al. (2007) Biological identifications through DNA barcodes: the case of the Crustacea. Canadian Journal of Fisheries and Aquatic Sciences, 64, 272-295.
Cowart DA, Pinheiro M, Mouchel O, Maguer M, Grall J, Miné J, Arnaud-Haond S (2015) Metabarcoding is powerful yet still blind: a comparative analysis of morphological and molecular surveys of seagrass communities. PLoS ONE, 10, e0117562.
Creer S, Fonseca VG, Porazinska DL, Giblin-Davis RM, Sung W, Power DM et al. (2010) Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Molecular Ecology, 19, 4-20.
Cristescu ME (2014) From barcoding single individuals to metabarcoding biological communities: towards an integrative approach to the study of global biodiversity. Trends in Ecology & Evolution, 3, 613-623.
Darling JA, Blum MJ (2007) DNA-based methods for monitoring invasive species: a review and prospectus. Biological Invasions, 9, 751-765.
Deagle BE, Jarman SN, Coissac E, Pompanon F, Taberlet P (2014) DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match. Biology Letters, 10, 20140562.
Deagle BE, Thomas AC, Shaffer AK, Trites AW, Jarman SN (2013) Quantifying sequence proportions in a DNA-based diet study using Ion Torrent amplicon sequencing: which counts count? Molecular Ecology Resources, 13, 620-633.
Decelle J, Romac S, Sasaki E, Not F, Mahé F (2014) Intracellular Diversity of the V4 and V9 Regions of the 18S rRNA in Marine Protists (Radiolarians) Assessed by High-Throughput Sequencing. PLoS ONE, 9, e104297.
Demeke T, Jenkins GR (2010) Influence of DNA extraction methods, PCR inhibitors and quantification methods on real-time PCR assay of biotechnology-derived traits. Analytical and Bioanalytical Chemistry, 396, 1977-1990.
Drummond AJ, Newcomb RD, Buckley TR, Xie D, Dopheide A, Potter BC et al. (2015) Evaluating a multigene environmental DNA approach for biodiversity assessment. GigaScience, 4, 46.
Doyle JJ, Doyle JL (1990) Isolation of plant DNA from fresh tissue. Focus, 12, 13-15.
Edgar RC (2013) UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10, 996-998.
Edgar RC, Haas BJ, Clemente JC et al. (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics, 27, 2194-2200.
Egge E, Bittner L, Andersen T, Audic S, de Vargas C, Edvardsen B (2013) 454 Pyrosequencing to describe microbial eukaryotic community composition, diversity, and relative abundance: a test for marine haptophytes. PLoS ONE, 8, e74371.

Elbrecht V, Taberlet P, Dejean T, Valentini A, Usseglio-Polatera P, Beisel JN, Coissac E, Boyer F, Leese F (2016) Testing the potential of a ribosomal 16S marker for DNA metabarcoding of insects. PeerJ, 4, e1966, DOI 10.7717/peerj.1966.
Esling P, Lejzerowicz F, Pawlowski J (2015) Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Research, 43, 2513-2524.
Fliegerova K, Tapio I, Bonin A, Mrazek J, Callegari ML et al. (2014) Effect of DNA extraction and sample preservation method on rumen bacterial population. Anaerobe, 29, 80-84.
Flynn JM, Brown EA, Chain FJJ, MacIsaac HJ, Cristescu ME (2015) Towards accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods. Ecology and Evolution, 5, 2252-2266.
Francis RA (2012) A handbook of Global Freshwater Invasive Species, 1st edn. Earthscan Press, New York, NY.
Folmer O, Black M, Hoeh W et al. (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3, 294-299.
Geller J, Meyer C, Parker M et al. (2013) Redesign of PCR primers for mitochondrial cytochrome c oxidase subunit I for marine invertebrates and application in all-taxa biotic surveys. Molecular Ecology Resources, 13, 851-861.
Godfray HCJ (2002) Challenges for taxonomy. Nature, 417, 17-19.
Gómez GF, Cienfuegos AV, Gutiérrez LA et al. (2010) Morphological and molecular analyses demonstrate identification problems of Anopheles nuneztovari (Diptera: Culicidae) using dichotomous keys. Revista Colombiana de Entomologia, 36, 68-75.
Guryev V, Makarevitch I, Blinov A et al. (2000) Phylogeny of the genus Chironomus (Diptera) inferred from DNA sequences of mitochondrial Cytochrome b and Cytochrome oxidase I. Molecular Phylogenetics and Evolution, 19, 9-21.
He F, Hu XS (2005) Hubbell’s fundamental biodiversity parameter and the Simpson diversity index. Ecology Letters, 8, 386-390.
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society London B, 270, 313-321.
Hirai J, Kuriyama M, Ichikawa T, Hidaka K, Tsuda A (2015) A metagenetic approach for revealing community structure of marine planktonic copepods. Molecular Ecology Resources, 15, 68-80.
Hoareau TB, Boissin E (2010) Design of phylum-specific hybrid primers for DNA barcoding: addressing the need for efficient COI amplification in the Echinodermata. Molecular Ecology Resources, 10, 960-967.

Hollingsworth PM, Forrest LL, Spouge JL et al. CBOL Plant Working Group (2009) A DNA barcode for land plants. PNAS, 106, 12794-12797.
Hope PR, Bohmann K, Gilbert MTP, Zepeda-Mendoza ML, Razgour O, Jones G (2014) Second generation sequencing and morphological faecal analysis reveal unexpected foraging behaviour by Myotis nattereri (Chiroptera, Vespertilionidae) in winter. Frontiers in Zoology, 11, 39.
Horton T, Bruns TD (2001) The molecular revolution in ectomycorrhizal ecology: peeking into the black-box. Molecular Ecology, 10, 1855-1871.
Ivanova NV, Zemlak TS, Hanner RH, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Molecular Ecology Notes, 7, 544-548.
Ji Y, Ashton L, Pedley SM et al. (2013) Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. Ecology Letters, 16, 1245-1257.
Johnson WS, Allen DM (2005) Zooplankton of the Atlantic and Gulf coasts. The Johns Hopkins University Press, Maryland, US.
Kajtoch L (2014) A DNA metabarcoding study of a polyphagous beetle dietary diversity: the utility of barcodes and sequencing techniques. Folia Biologica (Krakow), 62, 223-234.
Kelly RP, Port JA, Yamahara KM, Crowder LB (2014) Using environmental DNA to census marine fishes in a large mesocosm. PLoS ONE, 9, e86175.
Kermarrec L, Franc A, Rimet F, Chaumeil P, Humbert JF, Bouchez A (2013) Next-generation sequencing to inventory taxonomic diversity in eukaryotic communities: a test for freshwater diatoms. Molecular Ecology Resources, 13, 607-619.
Leray M, Knowlton N (2014) DNA barcoding and metabarcoding of standardized samples reveal patterns of marine benthic diversity. PNAS, 112, 2076-2081.
Leray M, Yang JY, Meyer CP, Mills SC, Agudelo N, Ranwez V, Boehm JT, Machida RJ (2013) A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Frontiers in Zoology, 10, 34.
Letendu G, Wubet T, Chatzinotas A, Welhelm C, Buscot F, Schlegel M (2014) Effects of long-term differential fertilization on eukaryotic microbial communities in an arable soil: a multiple barcoding approach. Molecular Ecology, 23, 3341-3355.
Liu S, Lu J, Su X, Tang M, Zhang R, Zhou L, Zhou C, Yang Q, Ji Y, Yu DW, Zhou X (2013) SOAPBarcode: revealing biodiversity through assembly of Illumina shotgun sequences of PCR amplicons. Methods in Ecology and Evolution, 4, 1142-1150.
Loreau M, Naeem S, Inchausti P et al. (2001) Biodiversity and ecosystem functioning: current knowledge and future challenges. Science, 294, 804-808.

Lundin D, Severin I, Logue JB, Östman Ö, Andersson AF, Lindström E (2012) Which sequencing depth is sufficient to describe patterns in bacterial alpha- and β-diversity? Environmental Microbiology Reports, 4, 367-372.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376-380.
McTaggart SJ, Crease TJ (2005) Selection on the structural stability of a ribosomal RNA expansion segment in Daphnia obtusa. Molecular Biology and Evolution, 22, 1309-1319.
Meusnier I, Singer GAC, Landry JF et al. (2008) A universal DNA mini-barcode for biodiversity analysis. BMC Genomics, 9, 214.
Meyer CP (2003) Molecular systematics of cowries (Gastropoda: Cypraeidae) and diversification patterns in the tropics. Biological Journal of the Linnean Society, 79, 401-459.
Meyer CP, Paulay G (2005) DNA barcoding: error rates based on comprehensive sampling. PLoS Biology, 3, e422.
Palumbi SR (1996) Nucleic acids II: the polymerase chain reaction. In: Molecular Systematics (eds Hillis DM, Moritz C, Mable BK), pp. 205-247. Sinauer Associates, Sunderland, Massachusetts.
Park DS, Suh SJ, Oh HW, Hebert PDN (2010) Recovery of the mitochondrial COI barcode region in diverse Hexapoda through tRNA-based primers. BMC Genomics, 11, 423.
Piñol J, Mir G, Gomez-Polo P, Agusti N (2015) Universal and blocking primer mismatches limit the use of high-throughput DNA sequencing for the quantitative metabarcoding of arthropods. Molecular Ecology Resources, 15, 819-830.
Pochon X, Bott NJ, Smith KF et al. (2013) Evaluating detection limits of next-generation sequencing for the surveillance and monitoring of international marine pests. PLoS ONE, 8, e73935.
Polz MF, Cavanaugh CM (1998) Bias in template-to-product ratios in multitemplate PCR. Applied and Environmental Microbiology, 64, 3724-3730.
Pompanon F, Deagle B, Symondson WOC, Brown DS, Jarman SN, Taberlet P (2012) Who is eating what: diet assessment using next generation sequencing. Molecular Ecology, 21, 1931-1950.
Porazinska DL, Sung W, Giblin-Davis RM, Thomas WK (2010) Reproducibility of read numbers in high-throughput sequencing analysis of nematode community composition and structure. Molecular Ecology Resources, 10, 666-676.
Prokopowich CD, Gregory TR, Crease TJ (2003) The correlation between rDNA copy number and genome size in eukaryotes. Genome, 46, 48-50.
Radulovici AE, Bernard SM, Dufresne F (2009) DNA barcoding of marine crustaceans from the Estuary and Gulf of St Lawrence: a regional-scale approach. Molecular Ecology Resources, 9, 181-187.

Ratnasingham S, Hebert PDN (2007) BOLD: the Barcode of Life Data System (www.barcodinglife.org). Molecular Ecology Notes, 7, 355-364.
Ruiz GM, Fofonoff PW, Carlton JT et al. (2000) Invasion of coastal marine communities in North America: apparent patterns, processes, and biases. Annual Review of Ecology and Systematics, 31, 481-531.
Saccone C, Giorgi C, Gissi C, Pesole G, Reyes A (1999) Evolutionary genomics in Metazoa: the mitochondrial DNA as a model system. Gene, 238, 195-209.
Saitoh S, Aoyama H, Fujii S et al. (2016) A quantitative protocol for DNA metabarcoding of springtails (Collembola). Genome, 59, 705-723.
Schmidt PA, Bálint M, Greshake B, Bandow C, Römbke J, Schmitt I (2013) Illumina metabarcoding of a soil fungal community. Soil Biology & Biochemistry, 65, 128-132.
Schloss PD, Westcott SL, Ryabin T et al. (2009) Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Applied and Environmental Microbiology, 75, 7537-7541.
Schnell IB, Bohmann K, Gilbert MT (2015) Tag jumps illuminated--reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources, 15, 1289-1303.
Schoch CL, Seifert KA, Huhndorf S, Robert V, Spouge JL, Levesque CA, Chen W. Fungal Barcoding Consortium (2012) Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences of the United States of America, 109, 6241-6246.
Shannon CE (1948) A mathematical theory of communication. The Bell System Technical Journal, 27, 379-423, 623-656.
Shaw JLA, Weyrich L, Cooper A (2017) Using environmental (e)DNA sequencing for aquatic biodiversity surveys: a beginner’s guide. Marine and Freshwater Research, 68, 20-33.
Shin S, Lee TK, Han MJ et al. (2014) Regional effects on chimera formation in 454 pyrosequenced amplicons from a mock community. Journal of Microbiology, 52, 566-573.
Shokralla S, Spall JL, Gibson JF, Hajibabaei M (2012) Next-generation sequencing technologies for environmental DNA research. Molecular Ecology, 21, 1794-1805.
Shokralla S, Porter TM, Gibson JF, Dobosz R, Janzen DH, Hallwachs W, Golding B, Hajibabaei M (2015) Massively parallel multiplex DNA sequencing for specimen identification using an Illumina MiSeq platform. Scientific Reports, 5, 9687.
Smith DP, Peay KG (2014) Sequence depth, not PCR replication, improves ecological inference from next generation DNA sequencing. PLoS ONE, 9, e90234.

Smith LC, Stephenson (2013) New trans-Arctic shipping routes navigable by midcentury. PNAS, 110, 4871-4872.
Stoeck T, Bass D, Nebel M et al. (2010) Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Molecular Ecology, 19, 21-31.
Tang CQ, Leasi F, Obertegger U, Kieneke A, Barraclough TG, Fontaneto D (2012) The widely used small subunit 18S rDNA molecule greatly underestimates true diversity in biodiversity surveys of the meiofauna. PNAS, 109, 16208-16212.
Tang M, Tan M, Meng G, Yang S, Su X, Liu S, Song W, Li Y, Wu Q, Zhang A, Zhou X (2014) Multiplex sequencing of pooled mitochondrial genomes – a crucial step toward biodiversity analysis using mito-metagenomics. Nucleic Acids Research, 42, e166.
Taberlet P, Coissac E, Hajibabaei M et al. (2012) Environmental DNA. Molecular Ecology, 21, 1789-1793.
Tedersoo L, Anslan S, Bahram M, Põlme S, Riit T, Liiv I et al. (2015) Shotgun metagenomes and multiple primer pair-barcode combinations of amplicons reveal biases in metabarcoding analyses of fungi. MycoKeys, 10, 1-43.
Thomsen PF, Kielgast JK, Iversen LL et al. (2012) Monitoring endangered freshwater biodiversity using environmental DNA. Molecular Ecology, 21, 2565-2573.
Vásárhelyi C, Thomas V (2003) Analysis of Canadian and American legislation for controlling exotic species in the Great Lakes. Aquatic Conservation: Marine and Freshwater Ecosystems, 13(5), 417-427.
Willerslev E, Davison J, Moora M et al. (2014) Fifty thousand years of Arctic vegetation and megafaunal diet. Nature, 506, 47-51.
Wood SA, Smith KF, Banks JC et al. (2013) Molecular genetic tools for environmental monitoring of New Zealand’s aquatic habitats, past, present and the future. New Zealand Journal of Marine and Freshwater Research, 47, 90-119.
Xu S, Hebert PDN, Kotov AA et al. (2009) The noncosmopolitanism paradigm of freshwater zooplankton: insights from the global phylogeography of the predatory cladoceran Polyphemus pediculus (Linnaeus, 1761) (Crustacea, Onychopoda). Molecular Ecology, 18, 5161-5179.
Yoccoz NG (2012) The future of environmental DNA in ecology. Molecular Ecology, 21, 2031-2038.
Yoccoz NG, Brathen KA, Gielly L et al. (2012) DNA from soil mirrors plant taxonomic and growth form diversity. Molecular Ecology, 21, 3647-3655.
Young RG, Abbott C, Therriault T, Adamowicz SJ (2016) Barcode-based species delimitation in the marine realm: a test using Hexanauplia (Multicrustacea: Thecostraca and Copepoda). Genome, 10.1139/gen-2015-0209.

Yu DW, Ji Y, Emerson BC, Wang X, Ye C, Yang C, Ding Z (2012) Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3, 613-623.
Zaiko A, Martinez JL, Schmidt-Petersen J, Ribicic D, Samuiloviene A, Garcia-Vazquez E (2015) Metabarcoding approach for the ballast water surveillance – An advantageous solution or an awkward challenge? Marine Pollution Bulletin, 92, 25-34.
Zimmermann J, Glockner G, Jahn R, Enke N, Gemeinholzer B (2015) Metabarcoding vs. morphological identification to assess diatom diversity in environmental studies. Molecular Ecology Resources, 15, 526-542.
Zhan A, Hulák M, Sylvester F, Huang X, Adebayo AA, Abbott CL et al. (2013) High sensitivity of 454 pyrosequencing for detection of rare species in aquatic communities. Methods in Ecology and Evolution, 4, 558-565.
Zhou X, Li Y, Liu S et al. (2013) Ultra-deep sequencing enables high-fidelity recovery of biodiversity for bulk arthropod samples without PCR amplification. GigaScience, 2, 4.

TABLES

Table 1: The four primer pairs used in this metabarcoding study: one 18S primer pair amplifying the V4 region and three COI primer pairs amplifying different fragments of the COI-5P gene. The complete list of primers used for primer testing is provided in Supporting Information (Table S1).

Fragment | Primer Name | Sequence (5' - 3') | Direction | Target Taxa | Reference | Fragment Size (bp)
18S | Uni18S | AGGGCAAKYCTGGTGCCAGC | F | Metazoan | Zhan et al. 2013 | 310-620*
18S | Uni18SR | GRCGGTATCTRATCGYCTT | R | Metazoan | Zhan et al. 2013 | 310-620*
COI_FC | LCO1490 | GGTCAACAAATCATAAAGATATTGG | F | Various phyla | Folmer et al. 1994 | 325
COI_FC | Ill_C_R | GGIGGRTAIACIGTTCAICC | R | Arthropoda | Shokralla et al. 2015 | 325
COI_Leray | mlCOIintF | GGWACWGGWTGAACWGTWTAYCCYCC | F | Various phyla | Leray et al. 2013 | 313
COI_Leray | HCO2198 | TAAACTTCAGGGTGACCAAAAAATCA | R | Various phyla | Folmer et al. 1994 | 313
COI_Folmer | LCO1490 | GGTCAACAAATCATAAAGATATTGG | F | Various phyla | Folmer et al. 1994 | 658
COI_Folmer | HCO2198 | TAAACTTCAGGGTGACCAAAAAATCA | R | Various phyla | Folmer et al. 1994 | 658

*The 18S fragment varies in length for different species.

Table 2: Reads remaining after each bioinformatic filtering step, R1R2 analysis only. 1) Raw read pairs in each library; 2) R1 and R2 after trimming by quality score (20) and length (150 bp after trimming); 3) Combined paired R1 and R2 reads after trimming; 4) Paired reads where each read (R1 and R2) matches a reference sequence (> 95% identity and > 150 bp length); 5) Paired reads where both R1 and R2 match the same reference sequence; 6) Paired reads assigned to each fragment. Since the same forward primer was used for the FC and Folmer fragments, and the same reverse primer was used for the Leray and Folmer fragments, the assignment of reads to the corresponding COI fragments was based on the read position alignment on the reference sequence, as reported in the BLAST results (see Figure 2 for the fragment positions).

Library | Raw Read Pairs | Trimmed-R1 | Trimmed-R2 | Paired Reads after Trimming | Filter-Blasting | Filter-Blasting-SameSpecies | 18S | FC | Leray | Folmer
1a | 786961 | 732060 | 613404 | 635005 | 324738 | 207556 | 39923 | 4062 | 151626 | 11945
1b | 854975 | 815425 | 773465 | 743403 | 472101 | 407426 | 92011 | 55333 | 238036 | 22046
1c | 929743 | 902841 | 875974 | 854692 | 584322 | 503691 | 166998 | 73084 | 231414 | 32195
1d | 884801 | 853816 | 820011 | 779477 | 338938 | 215921 | 123502 | 15556 | 71483 | 5380
1e | 914893 | 887362 | 861284 | 849489 | 533725 | 443829 | 153957 | 61232 | 187120 | 41520
1g | 781706 | 751895 | 725703 | 624292 | 462968 | 380403 | 103268 | 49898 | 212793 | 14444
2a | 925818 | 896816 | 879586 | 849619 | 679961 | 637883 | 216345 | 81245 | 306006 | 34287
2b | 1036566 | 1009121 | 963326 | 957489 | 504939 | 454293 | 163410 | 91998 | 177804 | 21081
2c | 1619499 | 1529135 | 1444012 | 1103997 | 549010 | 321592 | 97184 | 123404 | 14002 | 87002
2d | 760043 | 738317 | 718755 | 725395 | 574756 | 383888 | 76266 | 55724 | 220464 | 31434
2e | 915977 | 890991 | 872002 | 852460 | 665676 | 550787 | 161688 | 67174 | 272534 | 49391
2g | 933645 | 891848 | 862069 | 734207 | 464301 | 405228 | 84980 | 58609 | 242543 | 19096
3a1 | 864551 | 808857 | 758339 | 579143 | 460992 | 451683 | 106024 | 16847 | 309430 | 19382
3a2 | 620639 | 599186 | 562566 | 415369 | 344398 | 343891 | 165357 | 14056 | 154023 | 10455
3a3 | 848115 | 807089 | 749840 | 597439 | 487477 | 486745 | 207189 | 23719 | 237827 | 18010
3b1 | 665830 | 646399 | 618519 | 467575 | 391294 | 370523 | 49843 | 87122 | 228910 | 4648
3b2 | 598325 | 582154 | 560215 | 447416 | 360447 | 344473 | 45137 | 80649 | 216754 | 1933
3b3 | 604169 | 551276 | 516445 | 315381 | 254613 | 245801 | 39495 | 7421 | 198880 | 5
3c1 | 878572 | 837934 | 796782 | 738671 | 639100 | 453527 | 71604 | 37155 | 325269 | 19499
3c2 | 984226 | 945575 | 896894 | 849501 | 734085 | 514066 | 87262 | 56390 | 336935 | 33479
3c3 | 947283 | 906549 | 856936 | 797477 | 708977 | 509943 | 75427 | 61220 | 348371 | 24925
3d1 | 845249 | 734376 | 670594 | 582939 | 494308 | 409633 | 52 | 104521 | 295096 | 9964
3d2 | 826818 | 773111 | 732549 | 698267 | 594312 | 486103 | 266 | 167564 | 301917 | 16356
3d3 | 702450 | 615365 | 563521 | 518738 | 418385 | 323118 | 33 | 139858 | 164883 | 18344
Average | 863786 | 821146 | 778866 | 696560 | 501826 | 410500 | 96968 | 63910 | 226838 | 22784
STD | 200992 | 194408 | 189389 | 185673 | 129699 | 104787 | 62575 | 41850 | 81269 | 18427
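Filtering steps 4 to 6 in the Table 2 caption can be sketched as follows for COI read pairs. This is an illustrative sketch only, not the pipeline used in the study: the `Hit` record, the function names, the `EDGE` tolerance and the assumption that each COI reference spans the full Folmer region (with position 1 at the LCO1490 priming site) are all hypothetical. Only the identity and length thresholds and the fragment size come from the caption and Table 1.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MIN_IDENTITY = 95.0  # percent identity cutoff stated in the Table 2 caption
MIN_LENGTH = 150     # alignment length cutoff (bp) stated in the Table 2 caption
FOLMER_LENGTH = 658  # full Folmer fragment size from Table 1
EDGE = 100           # hypothetical tolerance (bp) for "near an end" of the reference


@dataclass
class Hit:
    """Best BLAST hit of one read (hypothetical record layout)."""
    reference: str   # species of the matching reference sequence
    identity: float  # percent identity of the alignment
    length: int      # alignment length in bp
    ref_start: int   # first aligned position on the reference (1-based)
    ref_end: int     # last aligned position on the reference


def passes(hit: Hit) -> bool:
    """Step 4: the read matches a reference above both thresholds."""
    return hit.identity > MIN_IDENTITY and hit.length > MIN_LENGTH


def coi_fragment(r1: Hit, r2: Hit) -> Optional[str]:
    """Step 6: infer the COI fragment from where R1 and R2 align."""
    near_5prime = r1.ref_start <= EDGE                 # R1 starts at LCO1490
    near_3prime = r2.ref_end >= FOLMER_LENGTH - EDGE   # R2 starts at HCO2198
    if near_5prime and near_3prime:
        return "Folmer"   # shared forward and reverse ends of the full fragment
    if near_5prime:
        return "FC"       # LCO1490 forward, internal reverse primer
    if near_3prime:
        return "Leray"    # internal forward primer, HCO2198 reverse
    return None


def classify_pair(r1: Hit, r2: Hit) -> Optional[Tuple[str, str]]:
    """Return (species, fragment) for a read pair, or None if filtered out."""
    if not (passes(r1) and passes(r2)):
        return None                    # fails step 4
    if r1.reference != r2.reference:
        return None                    # fails step 5: different references
    fragment = coi_fragment(r1, r2)
    return (r1.reference, fragment) if fragment else None


folmer_pair = classify_pair(Hit("Daphnia pulex", 99.0, 200, 1, 200),
                            Hit("Daphnia pulex", 98.0, 180, 479, 658))
fc_pair = classify_pair(Hit("Daphnia pulex", 99.0, 200, 1, 200),
                        Hit("Daphnia pulex", 98.0, 180, 146, 325))
leray_pair = classify_pair(Hit("Daphnia pulex", 97.0, 200, 346, 545),
                           Hit("Daphnia pulex", 98.0, 180, 479, 658))
mixed_pair = classify_pair(Hit("Daphnia pulex", 99.0, 200, 1, 200),
                           Hit("Daphnia magna", 98.0, 180, 479, 658))
print(folmer_pair, fc_pair, leray_pair, mixed_pair)
```

Because the FC and Folmer fragments share a forward primer and the Leray and Folmer fragments share a reverse primer, neither read alone identifies the fragment; the sketch therefore uses the positions of both reads together, which is the idea described in the caption.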

Table 3: Comparison of species detection and sequencing depth for the 18S V4 marker sequenced alone vs. multiplexed with other markers/fragments, using the same gDNA of the Single Individuals per Species (SIS) communities (libraries 1a, 1b, 1c, 1d). ‘18S marker alone’ refers to the dataset generated by sequencing only the 18S V4 marker, while ‘18S marker multiplexed with other markers/fragments’ refers to the dataset generated by sequencing the 18S V4 marker together with the three COI fragments. Inconsistencies in species detection between the two datasets are marked in grey.

Phylum/Subphylum | Order | Genus/Family | Species | 18S marker alone | 18S marker multiplexed with other markers/fragments
Crustacea | Amphipoda | Crangonyx | Crangonyx | 3 | 46
Crustacea | Amphipoda | Gammarus | Gammarus lacustris | 19 | 5785
Crustacea | Amphipoda | Gammarus | Gammarus lawrencianus | 0 | 0
Crustacea | Amphipoda | Gammarus | Gammarus oceanicus | 1 | 0
Crustacea | Amphipoda | Hyalella | Hyalella azteca | 0 | 0
Crustacea | Amphipoda | Hyalella | Hyalella clade 1 | 4 | 0
Crustacea | Amphipoda | Hyalella | Hyalella clade 8 | 4710 | 8259
Crustacea | Amphipoda | Hyperia | Hyperia galba | 0 | 0
Crustacea | Amphipoda | Hyperoche | Hyperoche mediterranea | 0 | 0
Crustacea | Amphipoda | Themisto | Themisto libellula | 7257 | 9322
Crustacea | Anostraca | Artemia | Artemia spp | 44983 | 48709
Crustacea | Calanoida | Calanus | Calanus finmarchicus | 16943 | 115
Crustacea | Calanoida | Centropages | Centropages abdominalis | 46425 | 39936
Crustacea | Calanoida | Eurytemora | Eurytemora affinis | 10009 | 12309
Crustacea | Calanoida | Leptodiaptomus | Leptodiaptomus minutus | 4637 | 5535
Crustacea | Calanoida | Limnocalanus | Limnocalanus macrurus | 0 | 0
Crustacea | Calanoida | Pseudocalanus | Pseudocalanus mimus | 0 | 0
Crustacea | Cyclopoida | Acanthocyclops | Acanthocyclops vernalis | 0 | 0
Crustacea | Cyclopoida | Eucyclops | Eucyclops speratus | 5306 | 7905
Crustacea | Cyclopoida | Macrocyclops | Macrocyclops albidus | 6320 | 8719
Crustacea | Cyclopoida | Oithona | Oithona atlantica | 19 | 8
Crustacea | Decapoda | Carcinus | Carcinus maenas | 1257 | 7
Crustacea | Decapoda | Caridean | Caridean larvae | 7883 | 17169
Crustacea | Decapoda | Crangonidae | Crangonidae | 46012 | 51560
Crustacea | Decapoda | Grapsidae | Grapsidae | 0 | 0
Crustacea | Decapoda | Hippolytidae | Hippolytidae | 0 | 0
Crustacea | Decapoda | Majidae | Majidae | 0 | 0
Crustacea | Decapoda | Neotrypaea | Neotrypaea californiensis | 41745 | 50011
Crustacea | Decapoda | Xanthidae | Xanthidae | 0 | 0
Crustacea | Diplostraca | Bosmina | Bosmina longirostris | 1364 | 2383
Crustacea | Diplostraca | Bythotrephes | Bythotrephes longimanus | 0 | 0
Crustacea | Diplostraca | Ceriodaphnia | Ceriodaphnia lacustris | 2 | 8
Crustacea | Diplostraca | Daphnia | Daphnia parvula | 0 | 0
Crustacea | Diplostraca | Daphnia | Daphnia pulex | 19918 | 30247
Crustacea | Diplostraca | Daphnia | Daphnia pulicaria | 0 | 0
Crustacea | Diplostraca | Diaphanosoma | Diaphanosoma brachyurum | 48 | 26
Crustacea | Diplostraca | Holopedium | Holopedium gibberum | 0 | 1
Crustacea | Diplostraca | Leptodora | Leptodora kindtii | 48 | 1
Crustacea | Diplostraca | Polyphemus | Polyphemus pediculus | 1693 | 7113
Crustacea | Harpacticoida | Clytemnestra | Clytemnestra scutellata | 0 | 0
Crustacea | Harpacticoida | Tachidiidae | | 0 | 0
Crustacea | Harpacticoida | Tisbe | Tisbe furcata | 1265 | 1707
Crustacea | Harpacticoida | Zaus | Zaus abbreviatus | 2031 | 3402
Crustacea | Sessilia | Balanus | Balanus crenatus | 27 | 5
Crustacea | Sessilia | Balanus | Balanus glandula | 85431 | 70521
Crustacea | Sessilia | Chthamalus | Chthamalus dalli | 2998 | 3621
Mollusca | Anaspidea | Pteropoda | Pteropoda | 0 | 0
Mollusca | Cycloneritimorpha | Nerita | Nerita spp | 1258 | 4137
Mollusca | Myida | Dreissena | Dreissena polymorpha or rostriformis bugensis | 0 | 0
Mollusca | Mytilida | Limnoperna | Limnoperna fortunei | 1425 | 3051
Mollusca | Mytilida | Mytilus | Mytilus edulis | 2739 | 6523
Mollusca | Neogastropoda | Nassarius | Nassarius distortus | 594 | 3291
Mollusca | Thecosomata | Limacina | Limacina helicina | 0 | 0
Mollusca | Venerida | Corbicula | Corbicula fluminea | 0 | 2
Tunicata | Copelata | Oikopleura | Oikopleura labradonensis | 0 | 11
Tunicata | Phlebobranchia | Ciona | Ciona intestinalis | 8 | 0
Average number of reads per individual or species (n=56) | | | | 6507 | 7169

Table 4: Comparison of species detection and sequencing depth for the 18S V4 marker sequenced alone vs. multiplexed with other markers/fragments, using the same gDNA of the Multiple Individuals per Species (MIS) community (libraries 2a, 2b, 2c, 2d). ‘18S marker alone’ refers to the dataset generated by sequencing only the 18S V4 marker, while ‘18S marker multiplexed with other markers/fragments’ refers to the dataset generated by sequencing the 18S V4 marker together with three COI fragments. Inconsistencies in 18S species detection between the two datasets were marked in grey.

Phylum (Subphylum)  Order              Family        Genus/Family    Species                 n   18S marker alone  18S marker multiplexed with other markers/fragments
Crustacea           Amphipoda          Gammaridae    Gammarus        Gammarus lawrencianus   1   0                 0
Crustacea           Amphipoda          Hyalellidae   Hyalella        Hyalella clade 8        5   31239             58960
Crustacea           Anostraca          Artemiidae    Artemia         Artemia spp             2   74529             82148
Crustacea           Calanoida          Acartiidae    Acartia         Acartia longiremis      5   0                 0
Crustacea           Calanoida          -             Eurytemora      Eurytemora affinis      23  78877             75980
Crustacea           Calanoida          -             Leptodiaptomus  Leptodiaptomus minutus  1   9180              10281
Crustacea           Decapoda           Portunidae    Carcinus        Carcinus maenas         2   4538              168
Crustacea           Decapoda           Palaemonidae  Palaemonetes    Palaemonetes spp        5   138730            120748
Crustacea           Diplostraca        Daphniidae    Daphnia         Daphnia pulex           10  25065             65503
Crustacea           Diplostraca        Leptodoridae  Leptodora       Leptodora kindtii       5   47                0
Crustacea           Sessilia           Balanidae     Balanus         Balanus crenatus        10  150068            77705
Crustacea           Sessilia           Chthamalinae  Chthamalus      Chthamalus dalli        1   13758             22117
Mollusca            Cycloneritimorpha  Neritidae     Nerita          Nerita spp              1   7745              31272
Mollusca            Venerida           Cyrenidae     Corbicula       Corbicula fluminea      5   0                 9
Average number of reads per individual (n=76)                                                    7023              7170
Average number of reads per species (n=14)                                                       38127             38921
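
The two summary rows of Table 4, and the species whose detection differs between the two datasets, can be recomputed from the read counts. The Python sketch below is illustrative only: the counts are transcribed from the table, while the function names and the detection rule (a species counts as detected when it has at least `min_reads` reads, here 1) are assumptions and not necessarily the threshold applied in this study.

```python
# Rows transcribed from Table 4:
# (species, number of individuals, reads with 18S alone, reads with 18S multiplexed)
TABLE4 = [
    ("Gammarus lawrencianus", 1, 0, 0),
    ("Hyalella clade 8", 5, 31239, 58960),
    ("Artemia spp", 2, 74529, 82148),
    ("Acartia longiremis", 5, 0, 0),
    ("Eurytemora affinis", 23, 78877, 75980),
    ("Leptodiaptomus minutus", 1, 9180, 10281),
    ("Carcinus maenas", 2, 4538, 168),
    ("Palaemonetes spp", 5, 138730, 120748),
    ("Daphnia pulex", 10, 25065, 65503),
    ("Leptodora kindtii", 5, 47, 0),
    ("Balanus crenatus", 10, 150068, 77705),
    ("Chthamalus dalli", 1, 13758, 22117),
    ("Nerita spp", 1, 7745, 31272),
    ("Corbicula fluminea", 5, 0, 9),
]


def average_reads(rows, column):
    """Return (reads per individual, reads per species) for one read-count column."""
    total_reads = sum(row[column] for row in rows)
    individuals = sum(row[1] for row in rows)
    return total_reads / individuals, total_reads / len(rows)


def inconsistent_detections(rows, min_reads=1):
    """List species detected in exactly one of the two datasets.

    min_reads is a hypothetical detection threshold chosen for illustration.
    """
    return [
        name
        for name, _n, alone, multiplexed in rows
        if (alone >= min_reads) != (multiplexed >= min_reads)
    ]


if __name__ == "__main__":
    for label, column in (("18S alone", 2), ("18S multiplexed", 3)):
        per_individual, per_species = average_reads(TABLE4, column)
        print(f"{label}: {per_individual:.0f} reads/individual, "
              f"{per_species:.0f} reads/species")
    print("Detected in only one dataset:", inconsistent_detections(TABLE4))
```

With these counts the averages round to the values given in the table (7023 and 7170 reads per individual; 38127 and 38921 reads per species). Under the assumed one-read threshold, the species detected in only one dataset are Leptodora kindtii and Corbicula fluminea.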

Table 5: Comparison of correct species assignment percentages for 18S marker vs. COI marker, where both R1 and R2 reads hit a species in the corresponding mock community (see detailed species list in the Supporting Information table S3).

Library (species name/          18S: Number of    18S: Number of Reads with      COI: Number of    COI: Number of Reads with
number of species)              Reads with        Correct Species                Reads with        Correct Species
                                Blast Hits        Assignments                    Blast Hits        Assignments
1e (SIS; n=56)                  153973            148426                         289874            158667
1g (SIS; n=27)                  103270            103043                         277135            276055
2e (MIS; n=14)                  161752            158634                         389099            388475
2g (MIS; n=26)                  84980             84692                          320248            318311
Average                         125994            123699                         319089            285377
Correct Species Assignment %                      98.2%                                            89.4%
3a1 (Limnoperna fortunei)       106024            105955                         345659            106384
3a2 (Limnoperna fortunei)       165357            165347                         178534            106714
3a3 (Limnoperna fortunei)       207189            207171                         279556            155555
3b1 (Balanus crenatus)          49843             49784                          320680            281327
3b2 (Balanus crenatus)          45139             42001                          299336            264087
3b3 (Balanus crenatus)          39495             39405                          206306            176686
3c1 (Tortanus discaudatus)      71700             71308                          381923            317882
3c2 (Tortanus discaudatus)      87402             86973                          426804            206598
3c3 (Tortanus discaudatus)      75539             75067                          434517            353720
3d1 (Leptodora kindtii)         52                10                             409581            334364
3d2 (Leptodora kindtii)         266               8                              485837            403110
3d3 (Leptodora kindtii)         33                8                              323085            261494
Average                         70670             70253                          340985            247327
Correct Species Assignment %                      99.4%                                            72.5%
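
The ‘Correct Species Assignment %’ rows of Table 5 appear to be the ratio of the averaged reads with correct species assignments to the averaged reads with Blast hits. The short Python sketch below reproduces the four percentages from the ‘Average’ rows. The function name is hypothetical, and computing the ratio from the averages (rather than averaging per-library percentages) is an assumption that happens to match the printed values.

```python
def correct_assignment_pct(blast_hits, correct_assignments):
    """Percentage of reads with Blast hits that were assigned to the correct species."""
    return 100.0 * correct_assignments / blast_hits


# 'Average' rows transcribed from Table 5: (reads with Blast hits, reads correctly assigned)
AVERAGES = {
    "18S, mock communities (1e, 1g, 2e, 2g)": (125994, 123699),
    "COI, mock communities (1e, 1g, 2e, 2g)": (319089, 285377),
    "18S, single-species populations (3a1-3d3)": (70670, 70253),
    "COI, single-species populations (3a1-3d3)": (340985, 247327),
}

if __name__ == "__main__":
    for label, (hits, correct) in AVERAGES.items():
        print(f"{label}: {correct_assignment_pct(hits, correct):.1f}%")
```

The same function can be applied to a single library, for example `correct_assignment_pct(289874, 158667)` for the COI reads of library 1e.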

FIGURES

Figure 1: The 325 bp fragment at the 5’ end is the ‘FC fragment’, which matches the COI-5P gene before nucleotide position 400; the 313 bp fragment at the 3’ end is the ‘Leray fragment’, which matches the COI-5P gene after nucleotide position 300; and the whole 658 bp COI-5P gene is the ‘Folmer fragment’, with forward reads (R1) matching before nucleotide position 300 and reverse reads (R2) matching after nucleotide position 400. The gray lines represent the forward and reverse reads from paired-end 300 bp Illumina MiSeq next-generation sequencing. *Note that the 18S fragment size varies among species, so the forward and reverse reads do not always overlap.

[Figure 2 plot. Panel (a): Read Pairs/Reads Abundance (millions), with series RawPairs, ReadsWithBothBLASTHit and ReadsWithBothBLASTHitSameSpecies. Panels (b) 18S Fragment, (c) FC Fragment, (d) Leray Fragment, (e) Folmer Fragment: reads (millions). x-axis in all panels: libraries 1a, 1b, 1c, 1d, 1e, 1g, 2a, 2b, 2c, 2d, 2e, 2g, 3a1, 3a2, 3a3, 3b1, 3b2, 3b3, 3c1, 3c2, 3c3, 3d1, 3d2, 3d3.]

Figure 2: Read abundance of raw and filtered reads/read pairs (a), followed by the number of reads from each of the 4 fragments (b-e) across 24 libraries (1a-1g: Single Individuals per Species (SIS); 2a-2g: Multiple Individuals per Species (MIS); 3a1-3d3: Populations of Single Species (PSS)). Note the low abundance of 18S fragment reads in libraries 3d1-3d3.


Figure 3: Comparison of 3 COI fragments (‘FC’, ‘Leray’, ‘Folmer’) on species detection in both Single Individuals per Species (SIS) and Multiple Individuals per Species (MIS).


Figure 4: Comparison of COI vs. 18S markers on species detections in both mock communities: Single Individuals per Species (SIS) and Multiple Individuals per Species (MIS). Species were detected using the bioinformatics analysis for both the COI (including all 3 fragments) and 18S markers.


[Figure 5 plot. y-axis: Species Recovery Rates (0% to 100%); x-axis: FC, Leray+, Folmer+, 18S+; series: Single Individuals Species (n=83) and Multiple Individuals Species (n=40).]

Figure 5: Accumulation of species recovery percentages after including more fragments in both Single Individuals per Species (SIS) and Multiple Individuals per Species (MIS). The first data points show the percentage of species detected by the FC fragment alone, then species detected by the FC and Leray fragments, followed by adding the Folmer and 18S fragments. Note that the same species in different libraries was considered as separate ‘species’ (see Supporting Information table S3 for species list).
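
The accumulation described in the Figure 5 caption amounts to taking the running union of the species detected by each fragment, in the order FC, Leray, Folmer, 18S. The Python sketch below illustrates that calculation under stated assumptions: the function name and the five-species toy community are hypothetical and are not data from this study.

```python
def cumulative_recovery(detections, order, total_species):
    """Return [(fragment, cumulative % of species recovered)] as fragments are added.

    detections maps a fragment name to the set of species it detected;
    order is the sequence in which fragments are added.
    """
    recovered = set()
    rates = []
    for fragment in order:
        recovered |= detections[fragment]  # union with species seen so far
        rates.append((fragment, 100.0 * len(recovered) / total_species))
    return rates


# Hypothetical toy community of five species (sp1..sp5), for illustration only.
TOY_DETECTIONS = {
    "FC": {"sp1", "sp2"},
    "Leray": {"sp2", "sp3"},
    "Folmer": {"sp1"},
    "18S": {"sp4"},
}

if __name__ == "__main__":
    for fragment, pct in cumulative_recovery(
        TOY_DETECTIONS, ["FC", "Leray", "Folmer", "18S"], total_species=5
    ):
        print(f"{fragment}: {pct:.0f}%")
```

Because the union never shrinks, the curve is non-decreasing; a fragment that detects only species already recovered (Folmer in the toy data) leaves the percentage unchanged.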

GENERAL CONCLUSIONS

DNA-based species identification by metabarcoding has been successfully used to assess community structure and biodiversity in various aquatic ecosystems; however, the choice of marker(s) is debated in most metabarcoding studies, with the goal of obtaining both broad taxonomic coverage and the best phylogenetic resolution. This work presents a calibrated metabarcoding approach using two evolutionarily independent markers and multiple primer pairs that improves species recovery rates by 22.5-34.9% over using a single marker and one primer pair per marker. A limited number of metabarcoding studies have used multiple markers to target different taxonomic groups; however, species detections are rarely cross-validated using different markers. Moreover, such studies can suffer from amplification bias due to the use of a single primer pair per marker. This novel approach of using multiple markers is appealing in the broader context of metabarcoding for biomonitoring and conservation management because not only were species recovery rates significantly improved, but species detection accuracies were also 89.4-98.2% using the designated bioinformatics analysis. Here I argue that metabarcoding approaches should be used widely in environmental institutions for resource and conservation management.

Firstly, the use of both nuclear 18S and mitochondrial COI markers significantly improved overall species detection rates, with both markers having high species assignment accuracy in mock communities of multiple zooplankton species. The multicopy nuclear 18S marker had high species amplification success during primer testing, but conserved sequences within closely related species made 18S reads difficult to assign to the corresponding species. Although many species were not amplified or detected by the COI marker, all COI reads were assigned to the corresponding species, thus providing species-level resolution. Multiple markers can be used not only to cross-validate species detections, as was done in this study (e.g. Clarke et al. 2014), but also to target broader taxonomic groups (e.g. Drummond et al. 2015; Kermarrec et al. 2013), for example with chloroplast trnL for plants and the internal transcribed spacer (ITS) for fungi.

Secondly, in this study the use of multiple universal primer pairs for a single marker improved overall species detection rates compared to using a single COI primer pair. Multiple species-specific and group-specific primer pairs have also been shown to increase species amplification coverage (Clarke et al. 2014; Letendu et al. 2014; Bucklin et al. 2016). Thus, species detection rates are expected to improve when multiple group-specific primer pairs of the same marker are used for metabarcoding natural communities with more diverse species compositions.

Lastly, the use of multiple barcodes and multiple primer pairs per barcode in a single high-throughput run via sample multiplexing did not negatively influence sequencing depth or species detection rates. The same community DNA was sequenced using the 18S marker alone and using 18S multiplexed with 3 COI primer pairs in separate runs. The number of reads per individual/species and overall species detection rates were very similar for 18S when it was sequenced alone versus when it was multiplexed and sequenced with multiple fragments of another marker.

With recent improvements in reference sequence databases, sequencing platforms and bioinformatics tools, and reductions in sequencing time and costs, the community DNA or environmental DNA based metabarcoding approach has the potential to be widely applied in diverse fields (Shaw et al. 2017) such as assessment of microorganisms in health or forensics, inspection of processed foods, and surveillance of invasive or endangered species. Each of these applications could be improved by broadening the taxonomic coverage and providing more powerful phylogenetic resolution. Furthermore, the metabarcoding method should complement many other biotechnology techniques such as CRISPR/Cas9 or RADseq for advancing the field of biodiversity conservation with respect to both surveying and controlling endangered or invasive species (Corlett 2017). The multigene, multiple-primer-pair metabarcoding approach I developed and calibrated here using mock communities can be used for surveying complex natural zooplankton communities with broader taxonomic coverage and higher species detection accuracy than traditional methods.

References

Bucklin A, Lindeque PK, Rodriguez-Ezpeleta N, Albaina A, Lehtiniemi M (2016) Metabarcoding of marine zooplankton: prospects, progress and pitfalls. Journal of Plankton Research, 38, 393-400.

Clarke LJ, Soubrier J, Weyrich LS et al. (2014) Environmental metabarcodes for insects: in silico PCR reveals potential for taxonomic bias. Molecular Ecology Resources, 14, 1160-1170.

Corlett RT (2017) A bigger toolbox: biotechnology in biodiversity conservation. Trends in Biotechnology, 35, 55-65.

Deagle BE, Jarman SN, Coissac E, Pompanon F, Taberlet P (2014) DNA metabarcoding and the cytochrome c oxidase subunit I marker: not a perfect match. Biology Letters, 10, 20140562.

Drummond AJ, Newcomb RD, Buckley TR, Xie D, Dopheide A, Potter BC et al. (2015) Evaluating a multigene environmental DNA approach for biodiversity assessment. Gigascience, 4, 46.

Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society London B, 270, 313-321.

Kermarrec L, Franc A, Rimet F, Chaumeil P, Humbert JF, Bouchez A (2013) Next-generation sequencing to inventory taxonomic diversity in eukaryotic communities: a test for freshwater diatoms. Molecular Ecology Resources, 13, 607-619.

Letendu G, Wubet T, Chatzinotas A, Welhelm C, Buscot F, Schlegel M (2014) Effects of long-term differential fertilization on eukaryotic microbial communities in an arable soil: a multiple barcoding approach. Molecular Ecology, 23, 3341-3355.

Shaw JLA, Weyrich L, Cooper A (2017) Using environmental (e)DNA sequencing for aquatic biodiversity surveys: a beginner’s guide. Marine and Freshwater Research, 68, 20-33.

APPENDIX

Manuscript Supporting Information

Table S1: The complete list of primers used for primer testing; the 4 primer pairs used in the metabarcoding study are bolded. The fragment name refers to the 14 primer pairs tested in Supporting Information table S2. *Note that the 18S fragment varies in length among species.

Fragment        Primer Name    Sequence (5' - 3')               Direction  Target Taxa       Reference                   Fragment Size
18S             Uni18S         AGGGCAAKYCTGGTGCCAGC             F          Metazoan          Zhan et al. 2013            310-620*
18S             Uni18SR        GRCGGTATCTRATCGYCTT              R          Metazoan          Zhan et al. 2013            310-620*
COI_FC          LCO1490        GGTCAACAAATCATAAAGATATTGG        F          Various phyla     Folmer et al., 1994         325
COI_FC          Ill_C_R        GGIGGRTAIACIGTTCAICC             R          Arthropoda        Shokralla et al. 2015       325
COI_Leray       mlCOIintF      GGWACWGGWTGAACWGTWTAYCCYCC       F          Various phyla     Leray et al., 2013          313
COI_Leray       HCO2198        TAAACTTCAGGGTGACCAAAAAATCA       R          Various phyla     Folmer et al., 1994         313
COI_Folmer      LCO1490        GGTCAACAAATCATAAAGATATTGG        F          Various phyla     Folmer et al., 1994         658
COI_Folmer      HCO2198        TAAACTTCAGGGTGACCAAAAAATCA       R          Various phyla     Folmer et al., 1994         658
COI_Leray2      mlCOIintF      GGWACWGGWTGAACWGTWTAYCCYCC       F          Various phyla     Leray et al., 2013          313
COI_Leray2      jgHCO2198      TAIACYTCIGGRTGICCRAARAAYCA       R          Invertebrates     Geller et al., 2013         313
COI_Prosser     ZplankF1_t1    TCTASWAATCATAARGATATTGG          F          Zooplankton       Prosser et al. 2013         663
COI_Prosser     ZplankR1_t1    TTCAGGRTGRCCRAARAATCA            R          Zooplankton       Prosser et al. 2013         663
COI_Geller      jgLCO1490      TITCIACIAAYCAYAARGAYATTGG        F          Invertebrates     Geller et al., 2013         658
COI_Geller      jgHCO2198      TAIACYTCIGGRTGICCRAARAAYCA       R          Invertebrates     Geller et al., 2013         658
COI_Meyer       dgLCO1490      GGTCAACAAATCATAAAGAYATYGG        F          Mollusca          Meyer, 2003                 658
COI_Meyer       dgHCO2198      TAAACTTCAGGGTGACCAAARAAYCA       R          Mollusca          Meyer, 2003                 658
COI_Radulovici  CrustDF1       GGTCWACAAAYCATAAAGAYATTGG        F          Crustacea         Radulovici et al., 2009     658
COI_Radulovici  CrustDR1       TAAACYTCAGGRTGACCRAARAAYCA       R          Crustacea         Radulovici et al., 2009     658
COI_Meusnier    Uni-MinibarF1  CAAAATCATAATGAAGGCATGAGC         F          Various phyla     Meusnier et al., 2008       130
COI_Meusnier    Uni-MinibarR1  TCCACTAATCACAARGATATTGGTAC       R          Various phyla     Meusnier et al., 2008       130
COI_Niels       TS2AscF2       TCNACHAAYCATAARGATATT            F          Tunicates         Niels Van Steenkiste (DFO)  663
COI_Niels       TS2AscR2       ACYTCNGGRTGNCYAAAAAAYCA          R          Tunicates         Niels Van Steenkiste (DFO)  663
COI_Lobo        LoboF1         KBTCHACAAAYCAYAARGAYATHGG        F          Marine metazoans  Lobo et al. 2013            658
COI_Lobo        LoboR1         TGRTTYTTYGGWCAYCCWGARGTTTA       R          Marine metazoans  Lobo et al. 2013            658
COI_CostaF1     CrustF1        TTTTCTACAAATCATAAAGACATTGG       F          Crustacea         Costa et al., 2007          658
COI_CostaF1     HCO2198        TAAACTTCAGGGTGACCAAAAAATCA       R          Various phyla     Folmer et al., 1994         658
COI_CostaF2     CrustF2        GGTTCTTCTCCACCAACCACAARGAYATHGG  F          Crustacea         Costa et al., 2007          658
COI_CostaF2     HCO2198        TAAACTTCAGGGTGACCAAAAAATCA       R          Various phyla     Folmer et al., 1994         658
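
Many of the primers in Table S1 contain degenerate bases written with IUPAC ambiguity codes (for example W, Y, R, N) or inosine (I). As a rough illustration of what such a primer can anneal to, the Python sketch below expands a degenerate primer into a regular expression and counts how many distinct sequences it represents. The function names are hypothetical, treating inosine as matching any base is a simplifying assumption, and exact string matching ignores the mismatch tolerance of real PCR.

```python
import re
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
# "I" (inosine) is treated here as matching any base, which is an assumption.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT", "I": "ACGT",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex with one character class per base."""
    return re.compile("".join("[" + IUPAC[base] + "]" for base in primer.upper()))


def degeneracy(primer):
    """Number of distinct non-degenerate sequences the primer represents."""
    return prod(len(IUPAC[base]) for base in primer.upper())


if __name__ == "__main__":
    ml_coi_int_f = "GGWACWGGWTGAACWGTWTAYCCYCC"  # mlCOIintF from Table S1
    # Hypothetical template: one concrete variant of the primer site with flanking bases.
    site = ml_coi_int_f.replace("W", "A").replace("Y", "C")
    template = "TT" + site + "AA"
    match = primer_to_regex(ml_coi_int_f).search(template)
    print("match starts at index", match.start())
    print("degeneracy of mlCOIintF:", degeneracy(ml_coi_int_f))
```

A reverse primer would need to be matched against the reverse complement of the template; that step is omitted here to keep the sketch short.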

Table S2: A list of species (n=104) tested with one 18S primer pair and multiple COI primer pairs (“0” refers to not amplified; “1” refers to successfully amplified with PCR and showing bands on gel electrophoresis).

COI_ Fragment 18S COI_FC COI_Leray Folmer COI_Leray2 COI_Prosser COI_Geller COI_Meyer ForwardPrimer Uni18S LCO1490 mlCOIintF LCO1490 mlCOIintF ZplankF1_t1 jgLCO1490 dgLCO1490 jgHCO219 ReversePrimer Uni18SR Ill_C_R HCO2198 HCO2198 jgHCO2198 ZplankR1_t1 8 dgHCO2198 Fragment Length (in bp) 310-620* 325 313 658 313 663 658 658 Annelida (n=1) 1 0 0 0 0 1 0 0 Tubifex tubifex 1 0 0 0 0 1 0 0 Arthropoda (n=73) 53 36 28 29 22 26 31 33 Amphipoda 4 6 4 1 0 4 3 4 Caprella mutica 0 1 1 0 0 1 1 1 Crangonyx 1 1 0 0 0 0 0 0 Gammarus lawrencianus 0 1 1 0 0 1 0 0 Gammarus oceanicus 1 1 1 0 0 0 1 1 Gammarus spp. 1 1 1 1 0 1 1 1 Hyalella azteca 1 1 0 0 0 1 0 1 Anostraca 1 1 1 1 1 0 1 1 Artemia salina 1 1 1 1 1 0 1 1 Calanoida 10 5 5 3 2 2 6 6 Acartia hudsonica 1 0 0 0 0 0 0 0 Acartia longiremis 1 1 0 0 0 0 0 0 Centropages hamatus 0 1 1 1 1 1 1 1 Epischura lacustris 1 1 1 0 1 0 1 0 Eurytemora herdmani 1 0 0 0 0 0 0 0 Metridia pacifica 0 1 1 0 0 0 0 0 Microcalanus pusillus 0 0 0 0 0 0 0 0 Paracalanus spp. 1 0 0 0 0 0 0 0 Pseudocalanus newmani 1 0 1 0 0 1 1 1

57 Pseudodiaptomus 0 0 0 0 0 0 0 0 Senecella calanoides 1 0 0 0 0 0 1 1

Skistodiaptomus oregonensis 1 0 0 1 0 0 1 1 Temora longicornis 1 0 0 0 0 0 0 1 Tortanus discaudatus 1 1 1 1 0 0 1 1 Cirripedia 3 3 2 2 2 2 2 2 Balanus crenatus 1 1 0 0 0 0 0 0 Chthamalus dalli 1 1 1 1 1 1 1 1 Cirripedia cyprid 1 1 1 1 1 1 1 1 Cladocera 19 11 6 11 7 8 9 10 Alona affinis 1 1 0 0 0 0 0 0 Bosmina coregoni 1 0 0 0 0 0 0 0 Bythotrephes longimanus 1 1 1 1 1 1 1 1 Cercopagis pengoi 1 1 0 1 0 0 0 1 Ceriodaphnia lacustris 1 1 1 1 1 1 1 1 Chydorus globosus 1 0 0 0 0 0 0 0 Daphnia ambigua 1 0 0 1 0 1 1 1 Daphnia dentifera 0 0 0 0 0 0 0 0 Daphnia magna 1 1 1 1 0 1 1 1 Daphnia mendotae 0 0 0 1 0 1 1 1 Daphnia obtusa 1 0 0 1 1 0 1 1 Daphnia pulex 1 1 1 1 0 1 1 1 Daphnia pulicaria 1 0 0 1 1 1 1 1 Diaphanosoma brachyurum 1 0 0 0 0 0 0 0 Eurycercus lamellatus 0 1 0 0 0 0 0 0 Evadne spp. 1 0 0 0 0 0 0 0 Holopedium gibberum 1 1 1 0 1 1 1 1 Leptodora kindtii 1 1 0 0 0 0 0 0 Pleopsis polyphemoides 1 0 0 1 1 0 0 0

58 Pleuroxus procurvus 0 0 0 0 0 0 0 0 Podon spp. 1 0 0 0 0 0 0 0 Polyphemus pediculus 1 1 1 1 1 0 0 0 Sida crystallina 1 0 0 0 0 0 0 0 Simocephalus 0 1 0 0 0 0 0 0 Cyclopoida 4 1 1 1 1 1 1 1 Eucyclops speratus 1 0 0 0 0 0 0 0 Macrocyclops albidus 1 0 0 0 0 0 0 0 Mesocyclops edax 1 1 1 1 1 1 1 1 Oithona similis 1 0 0 0 0 0 0 0 Oithona spp. 0 0 0 0 0 0 0 0 Decapoda 8 8 7 8 8 7 8 8 Cancridae 1 0 0 0 0 0 0 0 Grapsidae 0 0 0 0 0 0 0 0 Hemigrapsus nudus 0 1 1 1 1 1 1 1 Hemigrapsus oregonensis 1 1 1 1 1 1 1 1 Lophopanopeus bellus 1 1 1 1 1 1 1 1 Majidae 0 0 0 0 0 0 0 0 Metacarcinus magister 1 0 1 1 1 0 1 1 Paguroidea; hermit crab 1 1 1 1 1 1 1 1 Phyllolithodes papillosus 1 1 0 1 1 1 1 1 Pugettia gracilis 1 1 1 1 1 1 1 1 Scyra acutifrons 0 1 0 0 0 0 0 0 Xanthidae 1 1 1 1 1 1 1 1 Diptera 0 0 0 0 0 0 0 0 Tachididae 0 0 0 0 0 0 0 0 Ephemeroptera 1 0 1 1 1 1 1 1 Hexagenia 1 0 1 1 1 1 1 1 Harpacticoida 1 1 0 0 0 0 0 0 Clytemnestra scutellata 0 0 0 0 0 0 0 0

59 Microsetella norvegica 0 1 0 0 0 0 0 0 Unknown Harpacticoid 1 0 0 0 0 0 0 0 Isopoda 1 0 1 1 0 1 0 0 Unknown Isopoda 1 0 1 1 0 1 0 0 Mysida 0 0 0 0 0 0 0 0 Hemimysis anomala 0 0 0 0 0 0 0 0 Poecilostomatoida 1 0 0 0 0 0 0 0 Corycaeus anglicus 1 0 0 0 0 0 0 0 Chordata (n=11) 10 6 4 5 5 4 6 7 Actinopterygii 2 2 2 1 2 2 2 2 Unknown Fish 1 1 1 1 1 1 1 1 Gasterosteus aculeatus 1 1 1 0 1 1 1 1 Tunicata 8 4 2 4 3 2 4 5 Botryllus schlosseri 1 1 0 0 0 0 0 1 cfr. Cnemidocarpa finmarkiensis 1 1 1 1 1 0 1 1 Ciona intestinalis 1 0 0 0 0 0 0 0 Eudistoma molle? 1 0 0 0 0 1 0 0 Halocynthia aurantium 1 1 0 1 1 0 1 1 Oikopleura 1 0 0 0 0 0 0 0 Pyura haustor 1 0 0 0 0 0 0 0 Styela clava 0 0 0 1 0 0 1 1 Styela gibbsii 1 1 1 1 1 1 1 1 Mollusca (n=12) 8 9 7 3 5 5 7 6 Bivalvia 6 7 6 0 4 4 4 4 Corbicula fluminea 1 1 1 0 1 1 1 1 Crassostrea gigas 1 1 1 0 1 1 1 1 Leukoma staminea 0 1 1 0 0 0 0 0 Limnoperna fortunei 1 1 1 0 1 1 1 1 Macoma secta 1 1 1 0 1 1 1 1 Mytilus edulis 2 2 1 0 0 0 0 0

60 Ruditapes philippinarum 0 0 0 0 0 0 0 0 Saxidomas gigantea 0 0 0 0 0 0 0 0 Gastropoda 1 2 1 2 1 1 2 1 Limacina helicina 1 1 1 1 1 1 1 1 Nassarius distortus 0 0 0 0 0 0 0 0 Nerita spp. 0 1 0 1 0 0 1 0 Pteropoda 1 0 0 1 0 0 1 1 Unknown Pteropoda 1 0 0 1 0 0 1 1 Platyhelminthes (n=1) 1 0 0 0 0 0 0 0 beige flatworm 1 0 0 0 0 0 0 0 Rotifera (n=5) 5 0 1 2 1 3 2 4 Brachionus calyciflorus 1 0 1 1 1 1 1 1 Brachionus rubens 1 0 0 1 0 1 1 1 Cephalodella acidophila 1 0 0 0 0 1 0 1 Keratella cochlearis 1 0 0 0 0 0 0 0 Keratella quadrata 1 0 0 0 0 0 0 1 Total (n=103) 78 51 40 39 33 39 46 50

Table S2 cont’d: A list of species (n=104) tested with one 18S primer pair and multiple COI primer pairs (“0” refers to not amplified; “1” refers to successfully amplified with PCR and showing bands on gel electrophoresis).

Fragment COI_Radulovici COI_Meusnier COI_Niels COI_Lobo COI_CostaF1 COI_CostaF2 ForwardPrimer CrustDF1 Uni-MinibarF1 TS2AscF2 LoboF1 CrustF1 CrustF2 ReversePrimer CrustDR1 Uni-MinibarR1 TS2AscR2 LoboR1 HCO2198 HCO2198 Fragment Length (in bp) 658 130 663 658 658 658 Annelida (n=1) 0 0 0 0 0 0 Tubifex tubifex 0 0 0 0 0 0 Arthropoda (n=73) 41 19 16 0 20 24 Amphipoda 5 2 2 0 1 3 Caprella mutica 1 0 1 0 0 1

61 Crangonyx 0 1 0 0 0 0 Gammarus lawrencianus 1 0 0 0 0 0 Gammarus oceanicus 1 1 1 0 1 1 Gammarus spp. 1 0 0 0 0 1 Hyalella azteca 1 0 0 0 0 0 Anostraca 1 1 0 0 1 0 Artemia salina 1 1 0 0 1 0 Calanoida 8 5 0 0 0 2 Acartia hudsonica 1 0 0 0 0 0 Acartia longiremis 1 0 0 0 0 0 Centropages hamatus 1 1 0 0 0 1 Epischura lacustris 0 1 0 0 0 1 Eurytemora herdmani 0 0 0 0 0 0 Metridia pacifica 0 0 0 0 0 0 Microcalanus pusillus 0 1 0 0 0 0 Paracalanus spp. 0 0 0 0 0 0 Pseudocalanus newmani 1 0 0 0 0 0 Pseudodiaptomus 0 1 0 0 0 0 Senecella calanoides 1 0 0 0 0 0

Skistodiaptomus oregonensis 1 0 0 0 0 0 Temora longicornis 1 1 0 0 0 0 Tortanus discaudatus 1 0 0 0 0 0 Cirripedia 3 2 2 0 2 2 Balanus crenatus 1 0 0 0 0 0 Chthamalus dalli 1 1 1 0 1 1 Cirripedia cyprid 1 1 1 0 1 1 Cladocera 11 3 7 0 6 9 Alona affinis 0 0 0 0 0 0 Bosmina coregoni 0 0 0 0 0 0 Bythotrephes longimanus 1 0 0 0 1 1

62 Cercopagis pengoi 1 0 0 0 0 0 Ceriodaphnia lacustris 1 0 1 0 0 1 Chydorus globosus 1 0 0 0 0 0 Daphnia ambigua 1 0 1 0 1 1 Daphnia dentifera 0 0 0 0 0 0 Daphnia magna 1 0 1 0 1 1 Daphnia mendotae 1 0 1 0 0 0 Daphnia obtusa 1 1 1 0 0 1 Daphnia pulex 0 0 1 0 1 1 Daphnia pulicaria 1 0 1 0 1 1 Diaphanosoma brachyurum 1 0 0 0 0 0 Eurycercus lamellatus 0 0 0 0 0 0 Evadne spp. 0 0 0 0 0 0 Holopedium gibberum 1 1 0 0 1 1 Leptodora kindtii 0 0 0 0 0 0 Pleopsis polyphemoides 0 0 0 0 0 0 Pleuroxus procurvus 0 0 0 0 0 0 Podon spp. 0 0 0 0 0 0 Polyphemus pediculus 0 1 0 0 0 0 Sida crystallina 0 0 0 0 0 0 Simocephalus 0 0 0 0 0 1 Cyclopoida 2 0 0 0 1 0 Eucyclops speratus 0 0 0 0 0 0 Macrocyclops albidus 1 0 0 0 0 0 Mesocyclops edax 1 0 0 0 1 0 Oithona similis 0 0 0 0 0 0 Oithona spp. 0 0 0 0 0 0 Decapoda 9 5 5 0 8 6 Cancridae 1 0 0 0 0 0 Grapsidae 0 0 0 0 0 0

63 Hemigrapsus nudus 1 0 1 0 1 1 Hemigrapsus oregonensis 1 1 0 0 1 1 Lophopanopeus bellus 1 1 1 0 1 0 Majidae 0 0 0 0 1 0 Metacarcinus magister 1 0 0 0 1 1 Paguroidea; hermit crab 1 1 1 0 0 1 Phyllolithodes papillosus 1 1 1 0 1 1 Pugettia gracilis 1 0 1 0 1 1 Scyra acutifrons 0 0 0 0 0 0 Xanthidae 1 1 0 0 1 0 Diptera 0 0 0 0 0 0 Tachididae 0 0 0 0 0 0 Ephemeroptera 0 0 0 0 1 1 Hexagenia 0 0 0 0 1 1 Harpacticoida 1 1 0 0 0 0 Clytemnestra scutellata 1 0 0 0 0 0 Microsetella norvegica 0 1 0 0 0 0 Unknown Harpacticoid 0 0 0 0 0 0 Isopoda 1 0 0 0 0 1 Unknown Isopoda 1 0 0 0 0 1 Mysida 0 0 0 0 0 0 Hemimysis anomala 0 0 0 0 0 0 Poecilostomatoida 0 0 0 0 0 0 Corycaeus anglicus 0 0 0 0 0 0 Chordata (n=11) 5 7 4 0 1 2 Actinopterygii 2 2 1 0 1 2 Unknown Fish 1 1 1 0 0 1 Gasterosteus aculeatus 1 1 0 0 1 1 Tunicata 3 5 3 0 0 0 Botryllus schlosseri 1 0 0 0 0 0

64 cfr. Cnemidocarpa finmarkiensis 1 1 1 0 0 0 Ciona intestinalis 0 0 0 0 0 0 Eudistoma molle? 0 0 0 0 0 0 Halocynthia aurantium 0 1 1 0 0 0 Oikopleura 0 1 0 0 0 0 Pyura haustor 0 0 0 0 0 0 Styela clava 0 1 0 0 0 0 Styela gibbsii 1 1 1 0 0 0 Mollusca (n=12) 8 2 3 0 4 1 Bivalvia 6 2 3 0 3 1 Corbicula fluminea 1 0 0 0 1 0 Crassostrea gigas 1 0 1 0 1 1 Leukoma staminea 0 0 0 0 0 0 Limnoperna fortunei 1 1 1 0 1 0 Macoma secta 1 0 1 0 0 0 Mytilus edulis 2 1 0 0 0 0 Ruditapes philippinarum 0 0 0 0 0 0 Saxidomas gigantea 0 0 0 0 0 0 Gastropoda 1 0 0 0 1 0 Limacina helicina 1 0 0 0 1 0 Nassarius distortus 0 0 0 0 0 0 Nerita spp. 0 0 0 0 0 0 Pteropoda 1 0 0 0 0 0 Unknown Pteropoda 1 0 0 0 0 0 Platyhelminthes (n=1) 1 0 0 0 0 0 beige flatworm 1 0 0 0 0 0 Rotifera (n=5) 5 3 0 0 0 1 Brachionus calyciflorus 1 1 0 0 0 0 Brachionus rubens 1 1 0 0 0 0 Cephalodella acidophila 1 1 0 0 0 0

65 Keratella cochlearis 1 0 0 0 0 0 Keratella quadrata 1 0 0 0 0 1 Total (n=103) 60 31 23 0 25 28
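
The ‘Total’ rows of Table S2 can be turned into amplification success rates to compare the 14 primer pairs. The Python sketch below does this using the totals transcribed from the two parts of the table. The function name is hypothetical, and the denominator of 103 is taken from the ‘Total (n=103)’ rows (the caption states n=104), so the percentages should be read as approximate.

```python
# Number of species successfully amplified per primer pair ('Total' rows of Table S2).
AMPLIFIED = {
    "18S": 78, "COI_FC": 51, "COI_Leray": 40, "COI_Folmer": 39,
    "COI_Leray2": 33, "COI_Prosser": 39, "COI_Geller": 46, "COI_Meyer": 50,
    "COI_Radulovici": 60, "COI_Meusnier": 31, "COI_Niels": 23, "COI_Lobo": 0,
    "COI_CostaF1": 25, "COI_CostaF2": 28,
}
TOTAL_SPECIES = 103  # from the 'Total (n=103)' rows; assumed denominator


def success_rates(amplified, total_species):
    """Return [(fragment, % of species amplified)] sorted from highest to lowest."""
    rates = [(name, 100.0 * count / total_species) for name, count in amplified.items()]
    return sorted(rates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, pct in success_rates(AMPLIFIED, TOTAL_SPECIES):
        print(f"{name:<15} {pct:5.1f}%")
```

These rates describe gel-based amplification success during primer testing only; they say nothing about how well the resulting reads can be assigned to species.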

Table S3: Species list for the mock communities; ‘n’ refers to the number of individuals (1a-1g: single individuals per species (SIS); 2a-2g: multiple individuals per species (MIS); 3a1-3d3: populations of single species (PSS)). Libraries 1e and 2e were gDNA pooled from the four separate DNA extractions 1a-1d and 2a-2d, respectively (see Brown et al. 2015 for details of DNA extraction and dilutions).

Library Order Family Species n Provider Single Individuals per Species (‘SIS’: 1a,1b,1c,1d,1e,1g) 1a Cyclopoida Cyclopidae Acanthocyclops vernalis 1 See Brown et al. 2015 1a Diplostraca Bosminidae Bosmina longirostris 1 See Brown et al. 2015 1a Diplostraca Daphniidae Ceriodaphnia lacustris 1 See Brown et al. 2015 1a Diplostraca Daphniidae Daphnia parvula 1 See Brown et al. 2015 1a Diplostraca Daphniidae Daphnia pulex 1 See Brown et al. 2015 1a Diplostraca Daphniidae Daphnia pulicaria 1 See Brown et al. 2015 1a Diplostraca Holopediidae Holopedium gibberum 1 See Brown et al. 2015 1a Diplostraca Leptodoridae Leptodora kindtii 1 See Brown et al. 2015 1a Diplostraca Polyphemidae Polyphemus pediculus 1 See Brown et al. 2015 1a Calanoida Clausocalanidae Pseudocalanus mimus 1 See Brown et al. 2015 1b Anostraca Artemiidae Artemia spp 1 See Brown et al. 2015 1b Phlebobranchia Cionidae Ciona intestinalis 1 See Brown et al. 2015 1b Venerida Cyrenidae Corbicula fluminea 1 See Brown et al. 2015 1b Amphipoda Crangonyctidae Crangonyx 1 See Brown et al. 2015 Dreissena polymorpha or 1b Myida Dreissenidae rostriformis bugensis 1 See Brown et al. 2015 1b Calanoida Temoridae Eurytemora affinis 1 See Brown et al. 2015 1b Amphipoda Gammaridae Gammarus lacustris 1 See Brown et al. 2015 1b Amphipoda Gammaridae Gammarus oceanicus 1 See Brown et al. 2015 1b Amphipoda Gammaridae Gammarus lawrencianus 1 See Brown et al. 2015 1b Amphipoda Hyalellidae Hyalella azteca 1 See Brown et al. 2015 1b Amphipoda Hyalellidae Hyalella clade 1 1 See Brown et al. 2015 1b Amphipoda Hyalellidae Hyalella clade 8 1 See Brown et al. 2015 1b Amphipoda Hyperiidae Hyperia galba 1 See Brown et al. 2015 1b Amphipoda Hyperiidae Hyperoche mediterranea 1 See Brown et al. 2015 1b Mytilida Mytilidae Limnoperna fortunei 1 See Brown et al. 2015 1b Amphipoda Hyperiidae Themisto libellula 1 See Brown et al. 2015 1c Sessilia Balanidae Balanus crenatus 1 See Brown et al. 
2015 1c Sessilia Balanidae Balanus glandula 1 See Brown et al. 2015 1c Decapoda Portunidae Carcinus maenas 1 See Brown et al. 2015 1c Decapoda Atyidae Caridean larvae 1 See Brown et al. 2015 1c Sessilia Chthamalinae Chthamalus dalli 1 See Brown et al. 2015 1c Decapoda Crangonidae Crangonidae 1 See Brown et al. 2015 1c Decapoda Xanthidae Xanthidae 1 See Brown et al. 2015 1c Decapoda Grapsidae Grapsidae 1 See Brown et al. 2015

67 1c Thecosomata Limacinidae Limacina helicina 1 See Brown et al. 2015 1c Decapoda Majidae Majidae 1 See Brown et al. 2015 1c Mytilida Mytilidae Mytilus edulis 1 See Brown et al. 2015 1c Neogastropoda Nassarius distortus 1 See Brown et al. 2015 Cycloneritimorph 1c a Neritidae Nerita spp 1 See Brown et al. 2015 1c Decapoda Hippolytidae Hippolytidae 1 See Brown et al. 2015 1d Diplostraca Cercopagididae Bythotrephes longimanus 1 See Brown et al. 2015 1d Calanoida Calanidae Calanus finmarchicus 1 See Brown et al. 2015 1d Calanoida Centropages abdominalis 1 See Brown et al. 2015 Diaphanosoma 1d Diplostraca Sididae brachyurum 1 See Brown et al. 2015 1d Cyclopoida Cyclopidae Eucyclops speratus 1 See Brown et al. 2015 1d Harpacticoida Tachidiidae Tachidiidae 1 See Brown et al. 2015 1d Calanoida Diaptomidae Leptodiaptomus minutus 1 See Brown et al. 2015 1d Calanoida Centropagidae Limnocalanus macrurus 1 See Brown et al. 2015 1d Cyclopoida Cyclopidae Macrocyclops albidus 1 See Brown et al. 2015 1d Decapoda Callianassidae Neotrypaea californiensis 1 See Brown et al. 2015 1d Harpacticoida Peltidiidae Clytemnestra scutellata 1 See Brown et al. 2015 1d Copelata Oikopleuridae Oikopleura labradonensis 1 See Brown et al. 2015 1d Cyclopoida Oithonidae Oithona atlantica 1 See Brown et al. 2015 1d Anaspidea Aplysiidae Pteropoda 1 See Brown et al. 2015 1d Harpacticoida Tisbidae Tisbe furcata 1 See Brown et al. 2015 1d Harpacticoida Harpacticidae Zaus abbreviatus 1 See Brown et al. 2015 1e pool gDNA from 1a-1d 56 See Brown et al. 2015 1g Calanoida Acartiidae Acartia hudsonica 1 See Brown et al. 2015 1g Trachymedusae Rhopalonematidae Aglantha digitale 1 See Brown et al. 2015 1g Anostraca Artemiidae Artemia spp 1 See Brown et al. 2015 1g Isopoda Asellidae Asellus 1 See Brown et al. 2015 1g Sessilia Balanidae Balanus crenatus 1 See Brown et al. 
2015 Botrylloides PBS-DFO-HG004 (DNA 1g Stolidobranchia Styelidae violaceus 1 sample number) Botryllus PBS-DFO-HG009 (DNA 1g Stolidobranchia Styelidae schlosseri 1 sample number) Brachionus 1g Ploima Brachionidae calyciflorus 1 See Brown et al. 2015 Bythotrephes 1g Diplostraca Cercopagididae longimanus 1 See Brown et al. 2015 1g Amphipoda Caprellidae Caprella mutica 1 PBS-DFO 1g Diplostraca Cercopagididae Cercopagis pengoi 1 See Brown et al. 2015 1g Venerida Cyrenidae Corbicula fluminea 1 See Brown et al. 2015

1g   Ostreida           Ostreidae        Crassostrea gigas        1   PBS-DFO (Identified by Rick Harbo, Magalie Castelin, George Holm, Niels Van Steenkiste)
1g   Diplostraca        Daphniidae       Daphnia magna            1   See Brown et al. 2015
1g   Decapoda           Varunidae        Hemigrapsus oregonensis  1   PBS-DFO (Identified by Marina Wright)
1g   Mysida             Mysidae          Hemimysis anomala        1   See Brown et al. 2015
1g   Amphipoda          Hyalellidae      Hyalella azteca          1   See Brown et al. 2015
1g   Ploima             Brachionidae     Keratella quadrata       1   See Brown et al. 2015
1g   Thecosomata        Limacinidae      Limacina helicina        1   See Brown et al. 2015
1g   Mytilida           Mytilidae        Limnoperna fortunei      1   See Brown et al. 2015
1g   Decapoda           Panopeidae       Lophopanopeus bellus     1   PBS-DFO (Identified by Magalie Castelin)
1g   Cardiida           Tellinidae       Macoma secta             1   PBS-DFO (Identified by Scott Gilmore)
1g   Cyclopoida         Cyclopidae       Mesocyclops edax         1   See Brown et al. 2015
1g   Decapoda           Cancridae        Metacarcinus magister    1   PBS-DFO (Identified by Marina Wright)
1g   Harpacticoida      Ectinosomatidae  Microsetella norvegica   1   See Brown et al. 2015
1g   Stolidobranchia    Styelidae        Styela clava             1   PBS-DFO
1g   Calanoida          Tortanidae       Tortanus discaudatus     1   See Brown et al. 2015

Multiple Individuals per Species (‘MIS’: 2a, 2b, 2c, 2d, 2e, 2g)
2a   Sessilia           Balanidae        Balanus crenatus         10  See Brown et al. 2015
2a   Amphipoda          Gammaridae       Gammarus lawrencianus    1   See Brown et al. 2015
2a   Calanoida          Diaptomidae      Leptodiaptomus minutus   1   See Brown et al. 2015
2a   Decapoda           Palaemonidae     Palaemonetes spp         5   See Brown et al. 2015
2b   Calanoida          Acartiidae       Acartia longiremis       5   See Brown et al. 2015
2b   Anostraca          Artemiidae       Artemia spp              2   See Brown et al. 2015
2b   Sessilia           Chthamalinae     Chthamalus dalli         1   See Brown et al. 2015
2b   Amphipoda          Hyalellidae      Hyalella clade 8         5   See Brown et al. 2015
2b   Diplostraca        Leptodoridae     Leptodora kindtii        5   See Brown et al. 2015
2c   Decapoda           Portunidae       Carcinus maenas          2   See Brown et al. 2015
2c   Venerida           Cyrenidae        Corbicula fluminea       5   See Brown et al. 2015
2c   Diplostraca        Daphniidae       Daphnia pulex            10  See Brown et al. 2015
2c   Cycloneritimorpha  Neritidae        Nerita spp               1   See Brown et al. 2015
2d   Calanoida          Temoridae        Eurytemora affinis       23  See Brown et al. 2015
2e   pool gDNA from 2a-2d                                         76  See Brown et al. 2015
2g   Calanoida          Acartiidae       Acartia hudsonica        4   See Brown et al. 2015
2g   Anostraca          Artemiidae       Artemia spp              2   See Brown et al. 2015
2g   Isopoda            Asellidae        Asellus                  1   See Brown et al. 2015
2g   Sessilia           Balanidae        Balanus crenatus         3   See Brown et al. 2015
2g   Stolidobranchia    Styelidae        Botrylloides violaceus   2   PBS-DFO-HG004 (DNA sample number)
2g   Stolidobranchia    Styelidae        Botryllus schlosseri     1   PBS-DFO-HG009 (DNA sample number)
2g   Ploima             Brachionidae     Brachionus calyciflorus  5   See Brown et al. 2015
2g   Diplostraca        Cercopagididae   Bythotrephes longimanus  2   See Brown et al. 2015
2g   Amphipoda          Caprellidae      Caprella mutica          1   PBS-DFO
2g   Diplostraca        Cercopagididae   Cercopagis pengoi        3   See Brown et al. 2015
2g   Venerida           Cyrenidae        Corbicula fluminea       2   See Brown et al. 2015
2g   Ostreida           Ostreidae        Crassostrea gigas        2   PBS-DFO (Identified by Rick Harbo, Magalie Castelin, George Holm, Niels Van Steenkiste)
2g   Diplostraca        Daphniidae       Daphnia magna            3   See Brown et al. 2015
2g   Decapoda           Varunidae        Hemigrapsus oregonensis  2   PBS-DFO (Identified by Marina Wright)
2g   Mysida             Mysidae          Hemimysis anomala        1   See Brown et al. 2015
2g   Amphipoda          Hyalellidae      Hyalella azteca          3   See Brown et al. 2015
2g   Ploima             Brachionidae     Keratella quadrata       5   See Brown et al. 2015
2g   Thecosomata        Limacinidae      Limacina helicina        1   See Brown et al. 2015
2g   Mytilida           Mytilidae        Limnoperna fortunei      3   See Brown et al. 2015
2g   Decapoda           Panopeidae       Lophopanopeus bellus     2   PBS-DFO (Identified by Magalie Castelin)
2g   Cardiida           Tellinidae       Macoma secta             1   PBS-DFO (Identified by Scott Gilmore)
2g   Cyclopoida         Cyclopidae       Mesocyclops edax         4   See Brown et al. 2015
2g   Decapoda           Cancridae        Metacarcinus magister    1   PBS-DFO (Identified by Marina Wright)
2g   Harpacticoida      Ectinosomatidae  Microsetella norvegica   3   See Brown et al. 2015
2g   Stolidobranchia    Styelidae        Styela clava             1   PBS-DFO
2g   Calanoida          Tortanidae       Tortanus discaudatus     4   See Brown et al. 2015

Populations of Single Species (‘PSS’: single, low, high number of individuals)
3a1  Mytilida           Mytilidae        Limnoperna fortunei      1   See Brown et al. 2015
3a2  Mytilida           Mytilidae        Limnoperna fortunei      10  See Brown et al. 2015
3a3  Mytilida           Mytilidae        Limnoperna fortunei      30  See Brown et al. 2015
3b1  Sessilia           Balanidae        Balanus crenatus         1   See Brown et al. 2015
3b2  Sessilia           Balanidae        Balanus crenatus         10  See Brown et al. 2015
3b3  Sessilia           Balanidae        Balanus crenatus         17  See Brown et al. 2015
3c1  Calanoida          Tortanidae       Tortanus discaudatus     1   See Brown et al. 2015
3c2  Calanoida          Tortanidae       Tortanus discaudatus     8   See Brown et al. 2015
3c3  Calanoida          Tortanidae       Tortanus discaudatus     15  See Brown et al. 2015
3d1  Diplostraca        Leptodoridae     Leptodora kindtii        1   See Brown et al. 2015
3d2  Diplostraca        Leptodoridae     Leptodora kindtii        10  See Brown et al. 2015
3d3  Diplostraca        Leptodoridae     Leptodora kindtii        28  See Brown et al. 2015

Table S4: Accession numbers for the reference sequences in the local database, which includes sequences generated in this study and sequences from the CAISN network, BOLD, and NCBI online databases. For specimens identified only to family level, or for species without available reference sequences, the reference sequences of closely related species were used (indicated in brackets).

Species                    18S-Database                    18S-Accession   COI-Database                     COI-Accession
Acanthocyclops vernalis    NCBI                            AY626999        BOLD                             EES04912
Acartia hudsonica          CAISN                           Pending-XXXX    BOLD-CAISN                       CAISN867-13
Acartia longiremis         NCBI                            GU969156        This study                       Pending-XXXX
Aglantha digitale          NCBI                            EU247821        BOLD-CAISN                       CAISN185-12
Artemia spp                NCBI (Artemia salina)           X01723          NCBI (Artemia franciscana)       EF615573.1
Asellus                    This study                      Pending-XXXX    BOLD (Asellus aquaticus)         GU130252
Balanus crenatus           This study                      Pending-XXXX    This study                       Pending-XXXX
Balanus glandula           NCBI                            AF201663.1      NCBI                             KM217564.1
Bosmina longirostris       BOLD-CAISN                      BIOUG01746F09   BOLD                             ZPLMX590-06
Botrylloides violaceus     NCBI                            AY903927.1      BOLD                             GBGC6336-09
Botryllus schlosseri       NCBI                            FM244858.1      This study                       Pending-XXXX
Brachionus calyciflorus    This study                      Pending-XXXX    This study                       Pending-XXXX
Bythotrephes longimanus    NCBI                            AF070094        BOLD                             GBCB0018-06
Calanus finmarchicus       NCBI                            AF367719        BOLD (Calanus glacialis)         GBCX1672-14
Caprella mutica            NCBI (Caprella equilibra)       AY743950.1      BOLD                             WWGSL281-08
Carcinus maenas            CAISN-BOLD                      BIOUG04840      BOLD                             BNSDE364-14
Caridean larvae            NCBI (Thoralus cranchii)        EU868758        BOLD (Caridina rubella)          GBCDA445-12
Centropages abdominalis    NCBI                            GU969163.1      BOLD                             GBA6913-10
Cercopagis pengoi          NCBI                            EF189620.1      BOLD                             RBGC142-03
Ceriodaphnia lacustris     NCBI (Ceriodaphnia dubia)       AF144208        BOLD-CAISN (Ceriodaphnia dubia)  CAISN048-12
Chthamalus dalli           NCBI                            KM974371        BOLD                             GBFCC0231-06
Ciona intestinalis         NCBI                            AB013017        BOLD                             GBGC0236-06
Clytemnestra scutellata    NCBI (Alteuthellopsis species)  EU380289.1      BOLD (Clytemnestrinae)           GBA14468-13
Corbicula fluminea         NCBI                            EU660782        BOLD                             GBMIN1846-12
Crangonidae                NCBI (Crangon crangon)          EU920938.1      BOLD (Crangon septemspinosa)     SDP258010-15
Crangonyx                  NCBI (Crangonyx forbesi)        AF202980.1      BOLD (Crangonyx gracilis)        DSMYS144-07
Crassostrea gigas          NCBI                            AB064942.1      BOLD                             BNAGB698-14
Daphnia magna              This study                      Pending-XXXX    This study                       Pending-XXXX
Daphnia parvula            Brown et al. 2015               NA              BOLD                             GBFCE0136-06
Daphnia pulex              NCBI                            AF014011.1      BOLD                             GBFCE0275-06
Daphnia pulicaria          Brown et al. 2015               NA              BOLD                             GBFCE335-1
Diaphanosoma brachyurum    CAISN (Diaphanosoma species)    Pending-XXXX    BOLD                             GBCB266-07
Dreissena polymorpha or    NCBI (Dreissena polymorpha)     AF120552.1      BOLD (Dreissena polymorpha)      GBMBV2960-14
  rostriformis bugensis
Eucyclops speratus         NCBI                            AJ746333.1      BOLD                             ACSD100-11
Eurytemora affinis         NCBI                            JX995300.1      BOLD-CAISN                       CAISN901-13
Gammarus lacustris         NCBI                            EF582915.1      BOLD-CAISN                       CRCN14909
Gammarus lawrencianus      Brown et al. 2015               NA              NCBI                             FJ581660.1
Gammarus oceanicus         NCBI (Gammarus setosus)         JF966164        BOLD                             ECCRU002-10
Grapsidae                  NCBI (Grapsus albolineatus)     FJ172755.1      BOLD (Grapsus adscensionis)      JSDAZ028-08
Hemigrapsus oregonensis    NCBI (Hemigrapsus sinensis)     EU284146.1      BOLD-CAISN                       CAISN465-13
Hemimysis anomala          This study                      Pending-XXXX    BOLD                             GBA2802-08
Hippolytidae               NCBI (Hippolyte obliquimanus)   EU868752.1      BOLD (Hippolyte obliquimanus)    GBCMD13610-13
Holopedium gibberum        NCBI                            AF070111.1      BOLD                             RBGC149-03
Hyalella clade 1           This study                      Pending-XXXX    NCBI                             DQ464605.1
Hyalella clade 8           This study                      Pending-XXXX    NCBI                             DQ464630.1
Hyalella azteca            This study                      Pending-XXXX    BOLD                             CDINV045-07
Hyperia galba              Brown et al. 2015               NA              NCBI                             KT209327.1
Hyperoche mediterranea     NCBI (Hyperoche medusarum)      KC428897.1      NCBI (Hyperoche medusarum)       EF989667.1
Keratella quadrata         NCBI                            DQ297697.1      NCBI                             DQ297774.1
Leptodiaptomus minutus     NCBI                            AY339153.1      BOLD                             GBA3065-08
Leptodora kindtii          This study                      Pending-XXXX    This study                       Pending-XXXX
Limacina helicina          Brown et al. 2015               NA              BOLD                             GBMLG12535-13
Limnocalanus macrurus      NCBI                            HQ407006.1      BOLD                             RBGC101-03
Limnoperna fortunei        This study                      Pending-XXXX    BOLD                             GBMBM348-13
Lophopanopeus bellus       This study                      Pending-XXXX    BOLD                             GBCDA1886-12
Macoma secta               NCBI                            AY553975.1      NCBI (Macoma balthica)           KP977733.1
Macrocyclops albidus       NCBI                            DQ538505        NCBI                             KC627343.1
Majidae                    NCBI (Maja squinado)            DQ079758.1      BOLD (Maja squinado)             GBCMA8843-14
Mesocyclops edax           Brown et al. 2015               NA              BOLD                             ZPII1002-11
Metacarcinus magister      NCBI                            AY527220.1      BOLD                             DSCRA162-07
Microsetella norvegica     BOLD-CAISN                      BIOUG01750-B03  BOLD                             CAISN1149-13
Mytilus edulis             NCBI                            KC429331.1      This study                       Pending-XXXX
Nassarius distortus        NCBI (Nassarius hepaticus)      HQ834030.1      BOLD (Nassarius hepaticus)       GBMLS5449-09
Neotrypaea californiensis  NCBI                            AF436003.1      BOLD (Callianassa subterranea)   BNSDE184-12
Nerita spp                 This study                      Pending-XXXX    This study                       Pending-XXXX
Oikopleura labradonensis   NCBI                            FM244869.1      BOLD (Oikopleura intermedia)     GBMIN43462-14
Oithona atlantica          CAISN                           Pending-XXXX    BOLD-CAISN                       CAISN735-13
Palaemonetes spp           NCBI (Palaemonetes vulgaris)    AY743941.1      BOLD (Palaemonetes vulgaris)     VATWO139-14
Polyphemus pediculus       NCBI                            EF189633.1      BOLD                             RBGC141-03
Pseudocalanus mimus        NCBI (Pseudocalanus elongatus)  JX995319.1      NCBI                             AF513651.1
Pteropoda                  This study                      Pending-XXXX    BOLD (Stylocheilus longicauda)   GBMLG0219-06
Styela clava               This study                      Pending-XXXX    NCBI                             FJ528635.1
Tachidiidae                NCBI (Tachidius triagularis)    JQ315760.1      BOLD (Tachidius discipes)        BNSCP121-14
Themisto libellula         NCBI                            JN039368.1      NCBI                             FJ602467.1
Tisbe furcata              CAISN                           Pending-XXXX    BOLD-CAISN                       CAISN1131-13
Tortanus discaudatus       This study                      Pending-XXXX    BOLD-CAISN                       CAISN1455-14
Xanthidae                  NCBI (Xantho poressa)           FM161989.1      BOLD (Xantho hydrophilus)        MLALN020-10
Zaus abbreviatus           NCBI (Zaus abbreviatus)         EU380284.1      BOLD-CAISN                       CAISN1170-13

Figure S5: Comparison of species detection rates between PCR with gel electrophoresis and NGS metabarcoding for the 49 species used in both the primer testing and the mock communities. ‘PCR only’ refers to species detected only by PCR and gel electrophoresis of the single-species samples. ‘NGS only’ refers to species detected only by next-generation sequencing of the mock communities in the metabarcoding approach.

Table S6: Number of reads for each species detected by the 18S marker and the 3 COI fragments (FC, Leray, Folmer) across 24 libraries. The column ‘n’ refers to the number of individuals.

Library  Order              Genus/Family    Species                    n   18S     FC      Leray   Folmer
1a       Cyclopoida         Acanthocyclops  Acanthocyclops vernalis    1   0       0       0       0
1a       Diplostraca        Bosmina         Bosmina longirostris       1   2383    0       0       0
1a       Diplostraca        Ceriodaphnia    Ceriodaphnia lacustris     1   8       12      12803   64
1a       Diplostraca        Daphnia         Daphnia parvula            1   0       0       0       0
1a       Diplostraca        Daphnia         Daphnia pulex              1   30247   10      163     5963
1a       Diplostraca        Daphnia         Daphnia pulicaria          1   0       0       0       0
1a       Diplostraca        Holopedium      Holopedium gibberum        1   1       555     92809   161
1a       Diplostraca        Leptodora       Leptodora kindtii          1   1       2899    30377   4814
1a       Diplostraca        Polyphemus      Polyphemus pediculus       1   7113    509     15253   924
1a       Calanoida          Pseudocalanus   Pseudocalanus mimus        1   0       0       0       0
1b       Anostraca          Artemia         Artemia spp                1   48709   341     97932   475
1b       Phlebobranchia     Ciona           Ciona intestinalis         1   0       0       0       0
1b       Venerida           Corbicula       Corbicula fluminea         1   2       9935    46      11205
1b       Amphipoda          Crangonyx       Crangonyx                  1   46      0       0       0
1b       Myida              Dreissena       Dreissena polymorpha or    1   0       0       0       0
                                              rostriformis bugensis
1b       Calanoida          Eurytemora      Eurytemora affinis         1   12309   4480    121711  1263
1b       Amphipoda          Gammarus        Gammarus lacustris         1   5785    0       0       0
1b       Amphipoda          Gammarus        Gammarus lawrencianus      1   0       0       0       0
1b       Amphipoda          Gammarus        Gammarus oceanicus         1   0       37145   9671    2617
1b       Amphipoda          Hyalella        Hyalella azteca            1   0       50      905     2336
1b       Amphipoda          Hyalella        Hyalella clade 1           1   0       0       0       0
1b       Amphipoda          Hyalella        Hyalella clade 8           1   8259    0       0       0
1b       Amphipoda          Hyperia         Hyperia galba              1   0       0       4       1
1b       Amphipoda          Hyperoche       Hyperoche mediterranea     1   0       0       0       0
1b       Mytilida           Limnoperna      Limnoperna fortunei        1   3051    3320    6966    4139
1b       Amphipoda          Themisto        Themisto libellula         1   9322    0       0       0
1c       Sessilia           Balanus         Balanus crenatus           1   5       3       419     3
1c       Sessilia           Balanus         Balanus glandula           1   70521   14483   11278   16675
1c       Decapoda           Carcinus        Carcinus maenas            1   7       22      1       18
1c       Decapoda           Caridean        Caridean larvae            1   17169   0       0       0
1c       Sessilia           Chthamalus      Chthamalus dalli           1   3621    32066   1713    9452
1c       Decapoda           Crangonidae     Crangonidae                1   51560   0       0       0
1c       Decapoda           Grapsidae       Grapsidae                  1   0       0       0       0
1c       Decapoda           Hippolytidae    Hippolytidae               1   0       0       0       0
1c       Thecosomata        Limacina        Limacina helicina          1   0       18      14      1223
1c       Decapoda           Majidae         Majidae                    1   0       0       0       0
1c       Mytilida           Mytilus         Mytilus edulis             1   6523    0       62      9
1c       Neogastropoda      Nassarius       Nassarius distortus        1   3291    0       0       0
1c       Cycloneritimorpha  Nerita          Nerita spp                 1   4137    7       31      1624
1c       Decapoda           Xanthidae       Xanthidae                  1   0       0       0       0
1d       Diplostraca        Bythotrephes    Bythotrephes longimanus    1   0       9       1966    4
1d       Calanoida          Calanus         Calanus finmarchicus       1   115     0       0       0
1d       Calanoida          Centropages     Centropages abdominalis    1   39936   367     1       5
1d       Harpacticoida      Clytemnestra    Clytemnestra scutellata    1   0       0       0       0
1d       Diplostraca        Diaphanosoma    Diaphanosoma brachyurum    1   26      0       0       0
1d       Cyclopoida         Eucyclops       Eucyclops speratus         1   7905    0       0       0
1d       Calanoida          Leptodiaptomus  Leptodiaptomus minutus     1   5535    12671   9232    195
1d       Calanoida          Limnocalanus    Limnocalanus macrurus      1   0       0       0       0
1d       Cyclopoida         Macrocyclops    Macrocyclops albidus       1   8719    0       0       0
1d       Decapoda           Neotrypaea      Neotrypaea californiensis  1   50011   0       0       0
1d       Copelata           Oikopleura      Oikopleura labradonensis   1   11      0       0       0
1d       Cyclopoida         Oithona         Oithona atlantica          1   8       0       0       0
1d       Anaspidea          Pteropoda       Pteropoda                  1   0       0       0       0
1d       Harpacticoida      Tachidiidae     Tachidiidae                1   0       0       0       0
1d       Harpacticoida      Tisbe           Tisbe furcata              1   1707    0       2358    1
1d       Harpacticoida      Zaus            Zaus abbreviatus           1   3402    0       0       0
1e       Cyclopoida         Acanthocyclops  Acanthocyclops vernalis    1   0       0       0       0
1e       Anostraca          Artemia         Artemia spp                1   17044   21      23988   151
1e       Sessilia           Balanus         Balanus crenatus           1   12      22      69      5
1e       Sessilia           Balanus         Balanus glandula           1   46040   11466   4662    18818
1e       Diplostraca        Bosmina         Bosmina longirostris       1   48      0       0       0
1e       Diplostraca        Bythotrephes    Bythotrephes longimanus    1   0       1       299     0
1e       Calanoida          Calanus         Calanus finmarchicus       1   9       0       0       0
1e       Decapoda           Carcinus        Carcinus maenas            1   10      21      0       13
1e       Decapoda           Caridean        Caridean larvae            1   10548   0       0       0
1e       Calanoida          Centropages     Centropages abdominalis    1   6796    5       0       0
1e       Diplostraca        Ceriodaphnia    Ceriodaphnia lacustris     1   0       0       17      0
1e       Sessilia           Chthamalus      Chthamalus dalli           1   2258    24727   700     9075
1e       Phlebobranchia     Ciona           Ciona intestinalis         1   0       0       0       0
1e       Harpacticoida      Clytemnestra    Clytemnestra scutellata    1   0       0       0       0
1e       Venerida           Corbicula       Corbicula fluminea         1   0       714     6       4670
1e       Decapoda           Crangonidae     Crangonidae                1   34549   0       0       0
1e       Amphipoda          Crangonyx       Crangonyx                  1   5       0       0       0
1e       Diplostraca        Daphnia         Daphnia parvula            1   0       0       0       0
1e       Diplostraca        Daphnia         Daphnia pulex              1   917     0       0       9
1e       Diplostraca        Daphnia         Daphnia pulicaria          1   0       0       0       0
1e       Diplostraca        Diaphanosoma    Diaphanosoma brachyurum    1   10      0       0       0
1e       Myida              Dreissena       Dreissena polymorpha or    1   0       0       0       0
                                              rostriformis bugensis
1e       Cyclopoida         Eucyclops       Eucyclops speratus         1   942     0       0       0
1e       Calanoida          Eurytemora      Eurytemora affinis         1   4107    333     41692   527
1e       Amphipoda          Gammarus        Gammarus lacustris         1   2350    0       0       0
1e       Amphipoda          Gammarus        Gammarus lawrencianus      1   0       0       0       0
1e       Amphipoda          Gammarus        Gammarus oceanicus         1   0       3513    1374    768
1e       Decapoda           Grapsidae       Grapsidae                  1   0       0       0       0
1e       Decapoda           Hippolytidae    Hippolytidae               1   0       0       0       0
1e       Diplostraca        Holopedium      Holopedium gibberum        1   0       7       1859    2
1e       Amphipoda          Hyalella        Hyalella azteca            1   0       5       186     888
1e       Amphipoda          Hyalella        Hyalella clade 1           1   0       0       0       0
1e       Amphipoda          Hyalella        Hyalella clade 8           1   3448    0       0       0
1e       Amphipoda          Hyperia         Hyperia galba              1   0       1       1       1
1e       Amphipoda          Hyperoche       Hyperoche mediterranea     1   0       0       0       0
1e       Calanoida          Leptodiaptomus  Leptodiaptomus minutus     1   951     406     1967    7
1e       Diplostraca        Leptodora       Leptodora kindtii          1   0       34      150     13
1e       Thecosomata        Limacina        Limacina helicina          1   0       23      7       1055
1e       Calanoida          Limnocalanus    Limnocalanus macrurus      1   0       0       0       0
1e       Mytilida           Limnoperna      Limnoperna fortunei        1   872     215     1048    1324
1e       Cyclopoida         Macrocyclops    Macrocyclops albidus       1   1131    0       0       0
1e       Decapoda           Majidae         Majidae                    1   0       0       0       0
1e       Mytilida           Mytilus         Mytilus edulis             1   3936    0       73      2
1e       Neogastropoda      Nassarius       Nassarius distortus        1   1533    0       0       0
1e       Decapoda           Neotrypaea      Neotrypaea californiensis  1   5688    0       0       0
1e       Cycloneritimorpha  Nerita          Nerita spp                 1   1673    10      26      1125
1e       Copelata           Oikopleura      Oikopleura labradonensis   1   3       0       0       0
1e       Cyclopoida         Oithona         Oithona atlantica          1   1       0       0       0
1e       Diplostraca        Polyphemus      Polyphemus pediculus       1   111     4       89      0
1e       Calanoida          Pseudocalanus   Pseudocalanus mimus        1   0       0       0       0
1e       Anaspidea          Pteropoda       Pteropoda                  1   4       0       0       0
1e       Harpacticoida      Tachidiidae     Tachidiidae                1   0       0       0       0
1e       Amphipoda          Themisto        Themisto libellula         1   2743    0       0       0
1e       Harpacticoida      Tisbe           Tisbe furcata              1   240     0       436     0
1e       Decapoda           Xanthidae       Xanthidae                  1   0       0       0       0
1e       Harpacticoida      Zaus            Zaus abbreviatus           1   447     0       0       0
1g       Calanoida          Acartia         Acartia hudsonica          1   2       0       145     0
1g       Trachymedusae      Aglantha        Aglantha digitale          1   2413    1       1       8
1g       Anostraca          Artemia         Artemia spp                1   3009    8       3457    6
1g       Isopoda            Asellus         Asellus                    1   44      0       0       0
1g       Sessilia           Balanus         Balanus crenatus           1   16      79      1149    14
1g       Stolidobranchia    Botrylloides    Botrylloides violaceus     1   0       33      0       31
1g       Stolidobranchia    Botryllus       Botryllus schlosseri       1   0       4272    2       0
1g       Ploima             Brachionus      Brachionus calyciflorus    1   0       8       0       1
1g       Diplostraca        Bythotrephes    Bythotrephes longimanus    1   0       0       4       0
1g       Amphipoda          Caprella        Caprella mutica            1   4       0       170     1
1g       Diplostraca        Cercopagis      Cercopagis pengoi          1   2740    19      163     2109
1g       Venerida           Corbicula       Corbicula fluminea         1   0       897     2       703
1g       Ostreida           Crassostrea     Crassostrea gigas          1   8083    29      1627    1338
1g       Diplostraca        Daphnia         Daphnia magna              1   1712    995     108848  1614
1g       Decapoda           Hemigrapsus     Hemigrapsus oregonensis    1   0       13172   22895   1782
1g       Mysida             Hemimysis       Hemimysis anomala          1   344     21      1       7
1g       Amphipoda          Hyalella        Hyalella azteca            1   0       0       0       0
1g       Ploima             Keratella       Keratella quadrata         1   8       0       0       0
1g       Thecosomata        Limacina        Limacina helicina          1   0       80      9       1314
1g       Mytilida           Limnoperna      Limnoperna fortunei        1   75881   771     869     924
1g       Decapoda           Lophopanopeus   Lophopanopeus bellus       1   4840    28915   72878   2654
1g       Cardiida           Macoma          Macoma secta               1   2797    0       0       0
1g       Cyclopoida         Mesocyclops     Mesocyclops edax           1   0       0       0       0
1g       Decapoda           Metacarcinus    Metacarcinus magister      1   0       500     351     943
1g       Harpacticoida      Microsetella    Microsetella norvegica     1   22      0       0       0
1g       Stolidobranchia    Styela          Styela clava               1   321     18      1       973
1g       Calanoida          Tortanus        Tortanus discaudatus       1   807     48      173     14
2a       Sessilia           Balanus         Balanus crenatus           10  77705   78621   304921  33971
2a       Amphipoda          Gammarus        Gammarus lawrencianus      1   0       251     223     111
2a       Calanoida          Leptodiaptomus  Leptodiaptomus minutus     1   10281   1066    713     21
2a       Decapoda           Palaemonetes    Palaemonetes spp           5   120748  0       0       0
2b       Calanoida          Acartia         Acartia longiremis         5   0       0       0       0
2b       Anostraca          Artemia         Artemia spp                2   82148   285     136249  723
2b       Sessilia           Chthamalus      Chthamalus dalli           1   22117   87360   9683    19204
2b       Amphipoda          Hyalella        Hyalella clade 8           5   58960   0       0       0
2b       Diplostraca        Leptodora       Leptodora kindtii          5   0       4270    31645   1130
2c       Decapoda           Carcinus        Carcinus maenas            2   168     2887    5       57
2c       Venerida           Corbicula       Corbicula fluminea         5   9       118847  855     65196
2c       Diplostraca        Daphnia         Daphnia pulex              10  65503   44      3617    1027
2c       Cycloneritimorpha  Nerita          Nerita spp                 1   31272   1370    9301    20703
2d       Calanoida          Eurytemora      Eurytemora affinis         23  75980   55610   220231  31413
2e       Calanoida          Acartia         Acartia longiremis         5   0       0       0       0
2e       Anostraca          Artemia         Artemia spp                2   14504   15      1169    124
2e       Sessilia           Balanus         Balanus crenatus           10  34321   46910   196638  30798
2e       Decapoda           Carcinus        Carcinus maenas            2   7       8       0       3
2e       Sessilia           Chthamalus      Chthamalus dalli           1   1754    10880   89      5254
2e       Venerida           Corbicula       Corbicula fluminea         5   0       561     9       3341
2e       Diplostraca        Daphnia         Daphnia pulex              10  3682    0       3       23
2e       Calanoida          Eurytemora      Eurytemora affinis         23  30051   7415    74084   9078
2e       Amphipoda          Gammarus        Gammarus lawrencianus      1   0       99      110     54
2e       Amphipoda          Hyalella        Hyalella clade 8           5   17518   0       0       0
2e       Calanoida          Leptodiaptomus  Leptodiaptomus minutus     1   1509    496     237     8
2e       Diplostraca        Leptodora       Leptodora kindtii          5   0       298     157     94
2e       Cycloneritimorpha  Nerita          Nerita spp                 1   1081    2       7       511
2e       Decapoda           Palaemonetes    Palaemonetes spp           5   54207   0       0       0
2g       Calanoida          Acartia         Acartia hudsonica          4   4       0       217     0
2g       Anostraca          Artemia         Artemia spp                2   3019    15      1510    19
2g       Isopoda            Asellus         Asellus                    1   1       0       0       0
2g       Sessilia           Balanus         Balanus crenatus           3   367     2491    21246   299
2g       Stolidobranchia    Botrylloides    Botrylloides violaceus     2   0       8       0       15
2g       Stolidobranchia    Botryllus       Botryllus schlosseri       1   0       17393   2       20
2g       Ploima             Brachionus      Brachionus calyciflorus    5   15      8       0       5
2g       Diplostraca        Bythotrephes    Bythotrephes longimanus    2   0       0       0       0
2g       Amphipoda          Caprella        Caprella mutica            1   3       1       568     3
2g       Diplostraca        Cercopagis      Cercopagis pengoi          3   3756    23      146     1566
2g       Venerida           Corbicula       Corbicula fluminea         2   1       3339    6       2738
2g       Ostreida           Crassostrea     Crassostrea gigas          2   13453   49      2693    1939
2g       Diplostraca        Daphnia         Daphnia magna              3   2853    1204    132843  2264
2g       Decapoda           Hemigrapsus     Hemigrapsus oregonensis    2   0       4715    8063    595
2g       Mysida             Hemimysis       Hemimysis anomala          1   135     4       0       1
2g       Amphipoda          Hyalella        Hyalella azteca            3   0       0       0       0
2g       Ploima             Keratella       Keratella quadrata         5   5       0       0       4
2g       Thecosomata        Limacina        Limacina helicina          1   0       116     23      2637
2g       Mytilida           Limnoperna      Limnoperna fortunei        3   53593   1451    1526    1272
2g       Decapoda           Lophopanopeus   Lophopanopeus bellus       2   3862    27036   72664   2533
2g       Cardiida           Macoma          Macoma secta               1   1867    0       0       0
2g       Cyclopoida         Mesocyclops     Mesocyclops edax           4   0       0       0       0
2g       Decapoda           Metacarcinus    Metacarcinus magister      1   0       598     466     1286
2g       Harpacticoida      Microsetella    Microsetella norvegica     3   11      0       0       0
2g       Stolidobranchia    Styela          Styela clava               1   284     18      0       1848
2g       Calanoida          Tortanus        Tortanus discaudatus       4   1463    121     527     43
3a1      Mytilida           Limnoperna      Limnoperna fortunei        1   105955  14326   149790  16161
3a2      Mytilida           Limnoperna      Limnoperna fortunei        10  165347  14031   153880  10453
3a3      Mytilida           Limnoperna      Limnoperna fortunei        30  207171  23678   237670  18006
3b1      Sessilia           Balanus         Balanus crenatus           1   49784   87106   228805  4644
3b2      Sessilia           Balanus         Balanus crenatus           10  42001   73379   215991  1923
3b3      Sessilia           Balanus         Balanus crenatus           17  39405   7412    198797  2
3c1      Calanoida          Tortanus        Tortanus discaudatus       1   71308   36404   318338  19419
3c2      Calanoida          Tortanus        Tortanus discaudatus       8   86973   35889   184408  30596
3c3      Calanoida          Tortanus        Tortanus discaudatus       15  75067   60966   345532  24873
3d1      Diplostraca        Leptodora       Leptodora kindtii          1   10      104481  294868  9954
3d2      Diplostraca        Leptodora       Leptodora kindtii          10  8       167544  301656  16348
3d3      Diplostraca        Leptodora       Leptodora kindtii          28  8       139830  164686  18336

ADDITIONAL MATERIALS

Comparison of bioinformatics analyses

The thesis presents only one bioinformatics pipeline, but the choice of bioinformatics analysis can greatly affect species detection. Several analyses involving different software were therefore compared (Appendix Table i). The “R1only” and “R2only” methods refer to BLAST searches of the trimmed R1 reads and the trimmed R2 reads, respectively, against the local database, followed by species assignment. The “R1/R2” method refers to species assigned by either the trimmed R1 reads, the trimmed R2 reads, or both, and the “R1R2” method refers to species assigned by both R1 and R2. The “Merge” method refers to merging the R1 and R2 reads with the SeqPrep program (https://github.com/jstjohn/SeqPrep) prior to BLAST and taxonomic assignment. The “Join” method refers to using OBITools (Boyer et al. 2014) to join the R1 and R2 reads left unmerged by SeqPrep, followed by BLAST searches of the joined reads against our local reference database (Appendix Table i).

The analyses were compared using the 18S V4 marker as an example (Table i), on the dataset generated by sequencing only the 18S V4 amplicons of the ‘single individuals species’ library 1e (see the detailed species list in Supporting Information Table S3). The number of species detected was the same for the “R1only” and “R2only” methods, but the species detected by R1 and R2 differed (e.g. the genera Bythotrephes and Limacina). Both the species detected and the detection rates differed among the “R1only”, “R2only”, “R1/R2”, “R1R2”, “Merge”, and “Join” methods, with detection rates ranging from 55.36% for the “R1R2” method to 85.71% for the “R1/R2” method (Appendix Table i). Factors affecting species estimates include sequence filtering, merging or joining forward and reverse reads, clustering reads into operational taxonomic units (OTUs), and setting intra- and inter-specific divergence thresholds (Coissac et al. 2012; Flynn et al. 2015; Brown et al. 2015). The difference in species detection between the forward R1 and reverse R2 reads of the multi-copy 18S marker was potentially due to intra-specific variation across the V4 region. For example, Leptodora kindtii had a good taxonomic match only for forward R1, not for reverse R2. Brown et al. (2015) also reported difficulty assigning reads to Leptodora kindtii due to intraspecific variation.

To incorporate information from both R1 and R2, many metabarcoding studies merge forward and reverse reads prior to BLAST searches (e.g. Hope et al. 2014; Leray et al. 2013). However, in our data (generated using MiSeq paired-end 300bp sequencing) there were sequencing gaps between forward R1 and reverse R2 in the Folmer fragments of 658bp and in some species where the 18S fragment was more than 600bp in length (e.g. Crangonyx forbesi, Daphnia magna, Gammarus lacustris). Previously, Liu et al. (2013) used both full-length libraries and shotgun libraries to deal with the 658bp amplicons when sequencing on the HiSeq 2x150bp platform, where the shotgun libraries were produced by fragmenting the 658bp amplicons to obtain short, overlapping reads. A drawback of this method is the potential for errors introduced during the assembly of the shotgun reads. Aylagas et al. (2016) processed the reads of the 658bp folCOI barcode (equivalent to the Folmer fragment in this study) by joining the forward reads and the reverse-complemented reverse reads to create a 409bp read lacking a 249bp internal fragment. However, the forward and reverse reads of the Folmer and 18S fragments were not joined using OBITools in the bioinformatics pipeline presented in this study. Instead, reads were assigned to their corresponding fragments based on the matched BLAST positions of both R1 and R2 on the reference sequences, an approach that can be applied to amplicons with or without sequencing gaps. Furthermore, the method of running BLAST on R1 and R2 separately but only assessing reads with both R1 and R2 hits can still be applied to natural communities, but it requires the selection and trimming of online reference sequences, as done by Aylagas et al. (2016) for the COI reference databases.

Additional bioinformatics steps can be taken, such as OTU clustering and singleton removal. OTU clustering was not used in this study because, depending on the divergence threshold, it can underestimate biodiversity by clustering reads of different species into a single OTU, and can also overestimate biodiversity by oversplitting reads from the same species into different OTUs for the 18S V4 marker (Brown et al. 2015; Bucklin et al. 2016; Collins & Cruickshank 2013; Young et al. 2016). In addition, species represented only by singletons (a single read with a BLAST hit to the species above the thresholds) were considered detected in this study because we used mock communities with species known a priori. In previous work using the same community DNA, the retention of 18S singletons was shown to have little impact on the species detection rates obtained through OTU clustering in the mock communities (Flynn et al. 2015), and singletons were found to be important for detecting low-abundance or low-biomass species, such as the rotifer Brachionus calyciflorus (Brown et al. 2015), the tunicate Ciona intestinalis and the barnacle Chthamalus dalli (Flynn et al. 2015). However, the interpretation of singleton data is complicated by ‘tag jumping’, which has been observed in the Illumina indexing system and causes reads to be assigned to the wrong library/sample (Schnell et al. 2015; Esling et al. 2015). Esling et al. (2015) found that up to 28.2% of unique sequences corresponded to undetected mistags in mock community libraries, and Schnell et al. (2015) found tag jumping rates of 2.6% in a leech study and 2.1% in a bat diet study. Thus, the inclusion of singletons can improve the detection of rare or low-abundance species, but can also overestimate biodiversity through erroneous results caused by tag jumping (Esling et al. 2015), sequencing errors (Tedersoo et al. 2010) or differences in bioinformatics workflow (Flynn et al. 2015).
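The four BLAST-based methods differ only in how the species lists obtained from the two read directions are combined, which can be sketched with set operations. This is a minimal illustration and not the pipeline used in the thesis: the function names and the toy genus lists are hypothetical, and it assumes that species assignment from the R1 and R2 BLAST results has already been done.

```python
def combine_assignments(r1_species, r2_species):
    """Combine species lists assigned from R1 and R2 BLAST hits."""
    r1, r2 = set(r1_species), set(r2_species)
    return {
        "R1only": r1,
        "R2only": r2,
        "R1/R2": r1 | r2,  # detected by either read direction
        "R1R2": r1 & r2,   # detected by both read directions
    }


def detection_rate(detected, expected):
    """Percentage of expected species that were detected."""
    expected = set(expected)
    return 100.0 * len(set(detected) & expected) / len(expected)


# Toy input (hypothetical lists; the pattern mirrors the genera named above).
r1_hits = ["Artemia", "Bythotrephes"]
r2_hits = ["Artemia", "Limacina"]
expected = ["Artemia", "Bythotrephes", "Limacina"]

methods = combine_assignments(r1_hits, r2_hits)
rates = {name: detection_rate(found, expected) for name, found in methods.items()}
```

With these toy lists the union (“R1/R2”) recovers all three genera while the intersection (“R1R2”) recovers only Artemia, which illustrates why the former is the most permissive method and the latter the most conservative.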

References

Aylagas E, Borja A, Irigoien X, Rodriguez-Ezpeleta N (2016) Benchmarking DNA metabarcoding for biodiversity-based monitoring and assessment. Frontiers in Marine Science, 3, 96.

Boyer F, Mercier C, Bonin A, Le Bras Y, Taberlet P, Coissac E (2014) OBITools: a Unix-inspired software package for DNA metabarcoding. Molecular Ecology Resources, 16, 176-182.

Brown EA, Chain FJJ, Crease TJ, MacIsaac HJ, Cristescu ME (2015) Divergence thresholds and divergent biodiversity estimates: can metabarcoding reliably describe zooplankton communities? Ecology and Evolution, 5(11), 2234-2251.

Coissac E, Riaz T, Puillandre N (2012) Bioinformatic challenges for DNA metabarcoding of plants and animals. Molecular Ecology, 21, 1834-1847.

Collins RA, Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources, 13, 969-975.

Esling P, Lejzerowicz F, Pawlowski J (2015) Accurate multiplexing and filtering for high-throughput amplicon-sequencing. Nucleic Acids Research, 43, 2513-2524.

Flynn JM, Brown EA, Chain FJJ, MacIsaac HJ, Cristescu ME (2015) Towards accurate molecular identification of species in complex environmental samples: testing the performance of sequence filtering and clustering methods. Ecology and Evolution, 5(11), 2252-2266.

Hope PR, Bohmann K, Gilbert MTP, Zepeda-Mendoza ML, Razgour O, Jones G (2014) Second generation sequencing and morphological faecal analysis reveal unexpected foraging behaviour by Myotis nattereri (Chiroptera, Vespertilionidae) in winter. Frontiers in Zoology, 11, 39.

Leray M, Yang JY, Meyer CP, Mills SC, Agudelo N, Ranwez V, Boehm JT, Machida RJ (2013) A new versatile primer set targeting a short fragment of the mitochondrial COI region for metabarcoding metazoan diversity: application for characterizing coral reef fish gut contents. Frontiers in Zoology, 10, 34.

Liu S, Lu J, Su X, Tang M, Zhang R, Zhou L, Zhou C, Yang Q, Ji Y, Yu DW, Zhou X (2013) SOAPBarcode: revealing arthropod biodiversity through assembly of Illumina shotgun sequences of PCR amplicons. Methods in Ecology and Evolution, 4, 1142-1150.

Schnell IB, Bohmann K, Gilbert MT (2015) Tag jumps illuminated – reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources, 15, 1289-1303.

Young RG, Abbott C, Therriault T, Adamowicz SJ (2016) Barcode-based species delimitation in the marine realm: a test using Hexanauplia (Multicrustacea: Thecostraca and Copepoda). Genome, 10.1139/gen-2015-0209.

Table i: Comparison of bioinformatics methods by the number of species detected in the ‘single individuals species’ library (library 1e, with 56 species; see Supporting Information Table S3), which was sequenced on the Illumina MiSeq paired-end 300bp platform with only the 18S V4 marker. The “R1only” and “R2only” methods refer to the species detected from the BLAST results of the trimmed R1 reads only and the trimmed R2 reads only, respectively. The “R1/R2” method refers to the species detected by R1 or R2, and the “R1R2” method refers to the species detected by both R1 and R2. The “Merge” method refers to the species detected by BLAST searches of the R1 and R2 reads merged with SeqPrep. The “Join” method refers to joining the R1 and R2 reads left unmerged by SeqPrep using OBITools, then running BLAST searches of the joined (non-overlapping) reads against the local database.

NGS Platform: Illumina MiSeq 2x300bp
Marker(s): 18S V4

                 Number of Species Detected, by Method             Expected #
Genus/Family     R1only  R2only  R1/R2   R1R2    Merge   Join      Species
Acanthocyclops   0       0       0       0       0       0         1
Artemia          1       1       1       1       1       1         1
Balanus          2       2       2       2       2       2         2
Bosmina          1       1       1       1       1       1         1
Bythotrephes     1       0       1       0       1       1         1
Calanus          1       1       1       1       1       1         1
Carcinus         1       1       1       1       1       1         1
Caridean         1       1       1       1       1       1         1
Centropages      1       1       1       1       1       1         1
Ceriodaphnia     1       1       1       1       1       1         1
Chthamalus       1       1       1       1       1       1         1
Ciona            1       1       1       1       0       0         1
Clytemnestra     0       1       1       0       0       0         1
Corbicula        1       1       1       0       1       1         1
Crangonidae      1       1       1       1       1       1         1
Crangonyx        1       1       1       0       1       1         1
Daphnia          1       2       2       1       2       2         3
Diaphanosoma     1       1       1       1       1       1         1
Dreissena        0       1       1       0       0       0         1
Eucyclops        1       1       1       1       1       1         1
Eurytemora       1       1       1       1       1       1         1
Gammarus         2       1       2       1       2       1         3
Grapsidae        1       0       1       0       1       1         1
Hippolytidae     0       0       0       0       0       0         1
Holopedium       1       0       1       0       1       1         1
Hyalella         3       2       3       1       3       3         3
Hyperia          1       0       1       0       1       1         1
Hyperoche        0       1       1       0       1       1         1
Leptodiaptomus   1       1       1       1       1       1         1
Leptodora        1       1       1       1       1       1         1
Limacina         0       1       1       0       1       0         1
Limnocalanus     0       0       0       0       0       0         1
Limnoperna       1       1       1       1       1       1         1
Macrocyclops     1       1       1       1       1       1         1
Majidae          0       1       1       0       1       1         1
Mytilus          1       1       1       1       1       0         1
Nassarius        1       1       1       1       0       0         1
Neotrypaea       1       1       1       1       1       1         1
Nerita           1       1       1       1       1       1         1
Oikopleura       0       1       1       0       0       0         1
Oithona          1       1       1       1       0       0         1
Polyphemus       1       1       1       1       1       1         1
Pseudocalanus    0       0       0       0       0       0         1
Pteropoda        1       0       1       0       1       1         1
Tachidiidae      0       0       0       0       0       0         1
Themisto         1       1       1       1       1       1         1
Tisbe            1       1       1       1       1       1         1
Xanthidae        0       0       0       0       0       0         1
Zaus             1       1       1       1       1       0         1
Total            41      41      48      31      42      38        56
Detection%       73.21%  73.21%  85.71%  55.36%  75.00%  67.86%
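The Detection% row follows directly from the Total row: each total is divided by the 56 expected species. The short check below uses only the totals reported in Table i; the variable and function names are illustrative.

```python
EXPECTED_SPECIES = 56  # number of species in library 1e

# Totals per method, copied from the Total row of Table i.
totals = {
    "R1only": 41,
    "R2only": 41,
    "R1/R2": 48,
    "R1R2": 31,
    "Merge": 42,
    "Join": 38,
}


def detection_percent(n_detected, n_expected=EXPECTED_SPECIES):
    """Species detection rate as a percentage, rounded to two decimals."""
    return round(100.0 * n_detected / n_expected, 2)


percentages = {method: detection_percent(n) for method, n in totals.items()}
best = max(percentages, key=percentages.get)    # most permissive method
worst = min(percentages, key=percentages.get)   # most conservative method
```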


Table ii: Comparison of the percentages of correct species assignment for the 18S marker using the “R1only”, “R2only” and “R1/R2” analyses, which refer to the species detected in the BLAST results by the forward R1 reads only, the reverse R2 reads only, and either the forward R1 or the reverse R2 reads, respectively.

Hits = number of reads with BLAST hits; Correct = number of reads with correct species assignments.

Library (species name/      18S - R1 only       18S - R2 only       18S - R1/R2
number of species)          Hits      Correct   Hits      Correct   Hits      Correct
1e (SIS; n=56)              281900    49892     187777    31626     469677    81518
1g (SIS; n=27)              192857    156743    141446    101177    334303    257920
2e (MIS; n=14)              303714    249635    213557    147154    517271    396789
2g (MIS; n=26)              186718    163207    112005    83574     298723    246781
Average                     241297    154869    163696    90883     404994    245752
Species Recovery %                    64.2%               55.5%               60.7%

3a1 (Limnoperna fortunei)   118041    108923    83554     78596     201595    187519
3a2 (Limnoperna fortunei)   170483    170385    109701    109397    280184    279782
3a3 (Limnoperna fortunei)   215690    215580    134553    134036    350243    349616
3b1 (Balanus crenatus)      79297     79080     34979     26351     114276    105431
3b2 (Balanus crenatus)      67521     67203     31215     25120     98736     92323
3b3 (Balanus crenatus)      54778     54561     22771     20056     77549     74617
3c1 (Tortanus discaudatus)  261993    117780    238800    234040    500793    351820
3c2 (Tortanus discaudatus)  313441    144806    284251    278970    597692    423776
3c3 (Tortanus discaudatus)  282129    129005    250487    244912    532616    373917
3d1 (Leptodora kindtii)     90839     89193     56047     21        146886    89214
3d2 (Leptodora kindtii)     115268    113210    71171     22        186439    113232
3d3 (Leptodora kindtii)     102773    101040    60598     26        163371    101066
Average                     156021    115897    114844    95962     270865    211859
Species Recovery %                    74.3%               83.6%               78.2%
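The Species Recovery % values in Table ii match the ratio of the average number of correctly assigned reads to the average number of reads with BLAST hits. The sketch below reproduces that arithmetic from the two "Average" rows, assuming this is how the percentages were derived; the function name `recovery_percent` and the group labels are hypothetical.

```python
# (reads with BLAST hits, reads with correct species assignments),
# copied from the two "Average" rows of Table ii.
AVERAGES = {
    "libraries 1e, 1g, 2e, 2g": {
        "R1only": (241297, 154869),
        "R2only": (163696, 90883),
        "R1/R2": (404994, 245752),
    },
    "libraries 3a1 to 3d3": {
        "R1only": (156021, 115897),
        "R2only": (114844, 95962),
        "R1/R2": (270865, 211859),
    },
}


def recovery_percent(reads_with_hits, reads_correct):
    """Hypothetical helper: share of BLAST-hit reads assigned to the correct species."""
    return 100.0 * reads_correct / reads_with_hits


for group, methods in AVERAGES.items():
    for method, (hits, correct) in methods.items():
        print(f"{group} | {method}: {recovery_percent(hits, correct):.1f}%")
```

Because each average is taken over the same set of libraries, this ratio of averages equals the ratio of the pooled read totals, so libraries with more reads weigh more heavily than they would in a mean of per-library percentages.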

Table iii: Number of reads for the species detected using the 18S marker and three COI fragments in the ‘Populations of Single Species’ (PSS) libraries. For each library, the expected species (shown with its number of individuals, n) is listed first, followed by the species detected but not included in the mock communities.

Library  Species (n=number of individuals)  18S      FC       Leray    Folmer
3a1      Limnoperna fortunei (n=1)          105955   14326    149790   16161
         Balanus crenatus                   13       37       166      1
         Daphnia magna                      0        2400     159360   632
         Limacina helicina                  0        60       90       2582
3a2      Limnoperna fortunei (n=10)         165347   14031    153880   10453
         Balanus crenatus                   7        21       120      0
3a3      Limnoperna fortunei (n=30)         207171   23678    237670   18006
         Balanus crenatus                   11       36       126      1
3b1      Balanus crenatus (n=1)             49784    87106    228805   4644
         Limnoperna fortunei                53       7        68       4
3b2      Balanus crenatus (n=10)            42001    73379    215991   1923
         Balanus glandula                   3053     7250     664      9
         Limnoperna fortunei                60       7        71       1
3b3      Balanus crenatus (n=17)            39405    7412     198797   2
         Limnoperna fortunei                84       6        62       2
3c1      Tortanus discaudatus (n=1)         71308    36404    318338   19419
         Acartia hudsonica                  323      0        4        0
         Daphnia magna                      0        633      6720     69
         Leptodora kindtii                  0        93       144      2
3c2      Tortanus discaudatus (n=8)         86973    35889    184408   30596
         Acartia hudsonica                  337      0        971      0
         Daphnia magna                      0        20414    151378   2878
         Leptodora kindtii                  0        82       128      5
3c3      Tortanus discaudatus (n=15)        75067    60966    345532   24873
         Acartia hudsonica                  332      0        373      0
         Daphnia magna                      0        117      2276     24
         Eurytemora affinis                 70       0        0        0
         Leptodora kindtii                  0        133      141      6
3d1      Leptodora kindtii (n=1)            10       104481   294868   9954
         Tortanus discaudatus               25       27       148      7
3d2      Leptodora kindtii (n=10)           8        167544   301656   16348
         Balanus crenatus                   4        5        51       1
         Leptodiaptomus minutus             227      0        0        0
         Tortanus discaudatus               16       14       154      6
3d3      Leptodora kindtii (n=28)           8        139830   164686   18336
         Tortanus discaudatus               14       23       121      6
