<<

Running Head: Aquatic assemblages from eDNA metabarcoding

Casting a broader net: Using microfluidic to capture aquatic data from diverse taxonomic targets

Laura L. Hauck1†, Kevin A. Weitemier2†, Brooke E. Penaluna1, Tiffany Garcia2, and Richard Cronn1*

1U.S. Department of Agriculture, Forest Service, Pacific Northwest Research Station, 3200 SW Jefferson Way, Corvallis, OR 97331, email: [email protected]

2Oregon State University, Department of Fisheries and Wildlife, 104 Nash Hall, Corvallis, OR 97331

Tel: 1 (541) 737-7291 Fax: 1 (541) 750-7329 *Author for correspondence

†L. Hauck and K. Weitemier should be considered joint first authors.

Abstract

Environmental DNA (eDNA) assays for single- and multi-species detection show promise for providing standardized assessment methods for diverse taxa, but techniques for evaluating multiple taxonomically-divergent assemblages are in their infancy. We evaluated whether microfluidic multiplex metabarcoding and high-throughput sequencing could identify diverse aquatic and riparian assemblages from 48 taxon-general and taxon-specific metabarcode primers. eDNA screening was paired with electrofishing along a stream continuum to evaluate congruence between methods. A fish hatchery located midway along the stream continuum provided a dispersal barrier, and a point source for non-native White Sturgeon (Acipenser transmontanus). Microfluidic metabarcoding detected all 13 species observed by electrofishing, with overall accuracy of 86%. Taxon-specific barcoding primers were more successful than taxon-general universal metabarcoding primers at classifying sequences to species. Both types of markers detected a transition from downstream sites dominated by multiple fish species, to upstream sites dominated by a single species; however, we failed to detect a transition in amphibian population structure. White Sturgeon was only detected at the hatchery outflow, indicating eDNA transport was not detectable at ~2.4 km. Overall, we identified 878 predicted taxa, with most sequences (49.8%) derived from fish (Actinopteri, Petromyzontidae), Oomycota (21.4%), Arthropoda (classes Insecta, Decapoda; 16.6%), and Apicomplexan parasites (3.83%). Taxa accounting for ~1% or less of sequences included freshwater red algae, diatoms, amphibians, and beaver. Our work shows that microfluidic metabarcoding can survey multiple phyla per assay, providing the fine discrimination required to resolve closely-related species, and enabling data-driven prioritization for multiple forest health objectives.

Introduction

Land management agencies survey and monitor many biota to meet diverse management objectives. These activities typically involve multiple agencies and objectives that address questions focusing on shared geography. Adjacent riparian areas can be migration corridors for amphibians or water-dependent mammals, while catchments may harbor pathogens that affect wildlife (e.g., white-nose syndrome [Flory et al., 2012]; amphibian chytridiomycosis [Olson et al., 2013]), forest health (sudden oak death from Phytophthora ramorum [Hansen et al., 2012]), or human community health (Tiedemann, 2000). These overlapping management concerns on a common catchment require multiple teams with technical expertise to address basic questions of species detection. Typically, biotic surveys are conducted with limited cross-taxon integration due to the difficulty of coordinating across disciplines, jurisdictions (state, federal), and agencies (US Fish and Wildlife Service, US Forest Service, Bureau of Land Management, US Geological Survey, and others).

Environmental DNA (eDNA) analysis has emerged as a powerful method for detecting aquatic and riparian species, one with the capacity to bridge diverse disciplines and inform multiple management objectives. Originally developed for characterizing microbial (Venter et al., 2004) and fungal (Anderson and Cairney, 2004) communities, eDNA analysis has expanded to include diverse eukaryotes, including plants (Willerslev et al., 2003), invertebrates (Hajibabaei et al., 2011; Thomsen, Kielgast, Iversen, Wiuf, et al., 2012), and vertebrates (Andersen et al., 2012; Thomsen, Kielgast, Iversen, Møller, et al., 2012). Methods for eDNA analysis have evolved from assays targeting one to a few well-characterized taxa (e.g., qPCR, digital PCR; Nathan et al., 2014), to “metabarcoding” assays that identify scores of taxonomic targets per sample (Deiner et al., 2016; Thomsen, Kielgast, Iversen, Møller, et al., 2012; Valentini et al., 2016; Wilcox et al., 2018). Metabarcoding approaches have been shown to provide detection accuracies equivalent to or better than traditional sampling methods (Deiner et al., 2016; Thomsen, Kielgast, Iversen, Møller, et al., 2012; Valentini et al., 2016), and a much larger taxonomic spectrum per assay. Metabarcoding represents a technological leap, but it is not without limitations, such as the challenge of independently validating unexpected (false) positives and negatives, the difficulty in detecting rare species, or the difficulty of identifying taxa to the species-level using universal DNA metabarcoding (Deiner et al., 2016). From the perspective of land management agencies, metabarcoding assays based on single markers share a limitation in that they address only a fraction of the broad spectrum of management questions asked by land managers.

Incorporating multiple barcoding and metabarcoding targets into a single assay offers independent observations for taxon presence/absence, and more accurate biodiversity estimates across highly-diverse taxa (Drummond et al., 2015; Elbrecht et al., 2017; Evans et al., 2017; Gibson et al., 2014; Stat et al., 2017). Here, we evaluate the performance of microfluidic PCR in combination with eDNA metabarcoding to simultaneously evaluate up to 48 loci in 2,304 nanoliter-scale PCR reactions per array. High-level multiplexing allows simultaneous screening with taxon-general and taxon-specific genes in one assay, providing multiple markers for taxonomic inference (e.g., phylum to species), and a mechanism to independently validate presence and absence observations (Brown et al., 2016).

We tested eDNA samples collected from a stream continuum ~11 km in length. This continuum includes an impassable barrier to hatchery fish, which creates a break in the distribution of hatchery and wild stocks for select species. Multiple primer sets targeting taxon-general and taxon-specific mitochondrial genes were designed for the detection of fish (salmonids, sculpins, lamprey, sturgeon), amphibians (frogs, salamanders), invertebrates (crayfish, mayflies, stoneflies), and oomycete (Phytophthora, Saprolegnia) and fungal (Batrachochytrium, Pseudogymnoascus) pathogens. PCR products ranging from ~150-406 bp in length were amplified using the microfluidic Fluidigm Access Array, and products were sequenced by Illumina massively parallel sequencing. Presence/absence detection by eDNA is directly compared to electrofishing to evaluate the accuracy and specificity of eDNA microfluidic metabarcoding as a qualitative and semi-quantitative proxy for species-level field identification.

Materials and Methods

Study Site

We surveyed five sites in Fall Creek, a tributary to the Alsea River in the Coast Range of Oregon, USA (Figure 1). Annual precipitation at Fall Creek averages 1856 mm/yr (2001-2010 estimate; Wang et al., 2016), with two-thirds of annual precipitation falling November–February. Stream temperatures range from 5-17°C year-round. The Oregon Hatchery Research Center (OHRC) is located on Fall Creek, and OHRC maintains a physical barrier to fish passage that completely blocks upstream migration of Chinook Salmon (Oncorhynchus tshawytscha Walbaum), partially blocks passage of hatchery Rainbow Trout (O. mykiss Walbaum, including the steelhead life history) and Coho Salmon (O. kisutch Walbaum), and allows migration of wild Coastal Cutthroat Trout (O. clarkii clarkii Richardson), Rainbow Trout, Coho Salmon, and smaller native fish. Two sampling sites on Fall Creek were located downstream of the OHRC (sites 1, 2), one site was at the OHRC effluent outflow (site 3), and two sites were located upstream of the hatchery (sites 4, 5).

The OHRC provides experimental opportunities to evaluate the impact of eDNA point sources on downstream detection. At the time of our sampling in 2017, the hatchery was rearing ~5,000 Rainbow Trout and ~15,000 Chinook Salmon (D.L.G. Noakes, personal communication). OHRC also maintains eight captive White Sturgeon (Acipenser transmontanus Richardson), creating a point source for a phylogenetically distinctive species not naturally found in this catchment.

Traditional Backpack Electrofishing Survey

Backpack electrofishing (Cossel et al., 2012) for fish, amphibians, and crayfish was conducted at sites below (sites 1, 2) and above (sites 4, 5) the OHRC within a 15 day period (20-July-2017 to 04-August-2017). Site 3 (hatchery outflow) was not evaluated by electrofishing. Some species show high morphological similarity and can be misidentified in the field. These include Riffle and Reticulate sculpin (Cottus gulosus Girard; C. perplexus Gilbert & Evermann), Longnose and Speckled dace (Rhinichthys cataractae Valenciennes; R. osculus Girard), and juvenile Rainbow and Coastal Cutthroat trout < 50 mm in length. The latter group was recorded as “Oncorhynchus young-of-year” (YOY). See supplemental text for additional details.

Water Sampling and DNA Extraction

Water samples were collected immediately prior to electrofishing surveys at the downstream blocknet used for the electrofishing survey. Preliminary trials (data not shown) indicated that 3 L water samples filtered through 0.45 μm cellulose nitrate filters (Sterlitech, Kent, WA, USA) gave significantly better results than smaller samples (e.g., 1 L with 0.45 μm filters), and comparable results to larger volumes through 1.0 μm high volume filtration capsules (e.g., 60 L through a Pall Envirochek® Sampling Capsule). We filtered 3 L of stream water (2 L thalweg + 1 L adjacent slack water) using a peristaltic pump (Proactive Pegasus Alexis, Fairborn, Ohio, USA). At site 3 all 3 L were collected in a pool at the outflow from OHRC. Each site was represented by eight independent filter replicates to capture taxa with low detection probabilities (Ficetola et al., 2015). To prevent cross-site contamination we always entered sites downstream from where samples were taken, and equipment (bottles, tweezers, waders) was decontaminated with a 50% bleach solution followed by a triple rinse of deionized water. Filters were stored in 5 ml vials on wet ice during collection and transport, frozen at –20 °C within 6 hours of collection, and stored at –20 °C until DNA extraction. DNA was extracted using the MoBio (Qiagen) Power Water DNA extraction kit per manufacturer’s instructions. This kit has a step specifically designed to remove inhibitors, such as humic acids commonly found in freshwater ecosystems (Matheson et al., 2010; Wetzel, 1993).

Primer Design for Microfluidic PCR Amplification

We used the Fluidigm 48.48 Access Array™ (Fluidigm, San Francisco, CA, USA) to amplify taxon-general and taxon-specific target genes. The Access Array uses integrated fluidic circuits that enable the simultaneous amplification of 48 targets from 48 samples, for 2,304 independently amplified and barcode-tagged reactions per assay, which are sequenced using massively parallel sequencing. We designed amplification primers with annealing temperatures of 58-60°C, and lengths ranging from ~230-500 bp (lower bound to meet post-amplification removal of primer-adapter dimers; upper bound to limit amplicon length). Forward and reverse amplification primers were modified by the addition of 5’ common sequence tags (CS1, CS2; www.fluidigm.com) that add the P5 and P7 Illumina sequences and dual-index multiplex barcodes in one amplification step.

We designed taxon-general universal metabarcoding primers that amplified diverse classes of organisms (e.g., ray-finned fishes [Teleostei] and amphibians [Batrachia], Chondrostei fish, mussels [Bivalvia], and insects), targeting the 12S rDNA region used by Valentini et al. (2016), along with the 16S, 18S, and internal transcribed spacer (ITS) rDNA. To design taxon-specific primers, we focused on taxonomically-informative genes typically used for species barcoding such as cytochrome c oxidase subunit 1 (COI), cytochrome B (CytB), NADH dehydrogenase 2 (ND2), D-loop, and beta-tubulin. Primers for these genes were designed to amplify multiple related species (e.g., salmonids) in gene regions that included diagnostic polymorphisms for species-specific identification (“barcode gaps”; Hebert et al., 2004).

All primers were computationally screened for primer compatibility, annealing temperature, and off-target activity. Those that could be validated with positive control DNA were screened following manufacturer’s instructions (Fluidigm Corporation, 2016) with minor modifications. See supplemental text for additional details on primer development and screening. Primer sequences are shown in Supplemental File 2.

Multiplex PCR Amplification and Massively-Parallel Sequencing

We analyzed eight replicate samples from each of five sites with 15 ng/μl input eDNA, plus two positive controls and one negative control. Positive controls consisted of a multi-target standard that included either 2×10^6 or 2×10^4 molecules of each amplicon per Fluidigm reaction. Positive and negative reactions were supplemented with “inert” (non-target) genomic DNA from two gymnosperms not native to Fall Creek (Pinus lambertiana Dougl. and Ginkgo biloba L.) to equalize the final DNA concentration among all samples (~15 ng/μl). See supplemental text for additional details.

DNA samples and primers were sent for amplification and sequencing to the Idaho Institute for Bioinformatics and Evolutionary Studies (iBEST) Genomics Resources Core Facility (Moscow, ID, USA). Amplification used the Fluidigm 48.48 Access Array (Fluidigm, San Francisco, CA, USA) and standard Fluidigm cycling parameters (Table 1). Bovine serum albumin (BSA) was added at 0.2 μg/μl to alleviate PCR inhibition from contaminants in environmental DNA (Romanowski et al., 1993; Widmer et al., 1996), and to mitigate inhibitor-driven bias (Valentini et al., 2016).

Following amplification, pooled products were cleaned using the NGS Library Cleanup Kit (MagBio Genomics, Inc., Gaithersburg, MD, USA), quantified using the KAPA qPCR kit (Kapa Biosystems, Woburn, MA, USA), and sequenced on ¼ of an Illumina MiSeq lane (Illumina, San Diego, California, USA) using version 3 chemistry and 300 bp paired-end reads.
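The sample and reaction counts described above can be tallied in a few lines. This is only an arithmetic sketch: the constant names are ours, and it assumes that every loaded sample is run against all 48 primer inlets, which the text implies but does not state outright.

```python
# Illustrative tally of one 48.48 Access Array run, using counts from the text.
# Assumption (ours): every loaded sample is amplified with all 48 primer inlets.
SITES = 5
REPLICATES_PER_SITE = 8
POSITIVE_CONTROLS = 2
NEGATIVE_CONTROLS = 1
ARRAY_SAMPLE_INLETS = 48
ARRAY_PRIMER_INLETS = 48

field_samples = SITES * REPLICATES_PER_SITE
samples_loaded = field_samples + POSITIVE_CONTROLS + NEGATIVE_CONTROLS
reactions_per_array = ARRAY_SAMPLE_INLETS * ARRAY_PRIMER_INLETS
reactions_used = samples_loaded * ARRAY_PRIMER_INLETS

print(f"field samples:       {field_samples}")
print(f"samples loaded:      {samples_loaded} of {ARRAY_SAMPLE_INLETS} inlets")
print(f"reactions per array: {reactions_per_array}")
print(f"reactions used:      {reactions_used}")
```

Under these assumptions the 40 field samples and three controls fit on a single array with sample inlets to spare.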

Bioinformatic Analysis

Sequence reads were demultiplexed by sample-specific dual barcode combinations and target-specific amplification primers using dbcAmplicons (version 0.8.6; Settles and Gerritsen, 2014). See supplemental text for additional details.

Overlapped sequences were taxonomically assigned using Centrifuge (version 1.0.4-beta, accessed 2018-10-07; Kim et al., 2016). We used a custom database developed from sequences downloaded from the National Center for Biotechnology Information (NCBI) nucleotide (nt) database in January 2018 using queries listed in Supplemental Table 1. The highest scoring match between query and target was reported; if multiple sequences scored equally, the lowest taxonomic rank containing all highest-scoring hits was reported.

Preliminary Centrifuge analyses classified some reads as belonging to species related to those from western Oregon, but known to not occur in the Pacific Northwestern U.S. From these analyses, a list of 337 species, genera, and hybrids was developed to exclude from future classification (Supplemental Table 4). Notable excluded taxa include Oncorhynchus × Salmo; marine Cottus species; genera closely related to Cottus; and species and subspecies of Oncorhynchus, Cottus, Rana, Acipenser, and Phytophthora found outside of the Pacific Northwestern U.S.

For each sample × primer combination, the number of reads classified to each taxon was counted. Reads appearing in the negative control were used to calculate a minimum read threshold, which was greater than 95% of the negative control primer × taxon counts. Counts below this threshold were then dropped from all other sample × primer combinations. Counts above the threshold were considered “positive hits” for that taxon in that sample × primer combination. To assess primer efficiency for each primer pair we compared the total number of reads generated in the 2×10^6 and 2×10^4 copy positive controls across all primer pairs and ranked them. We examined Pearson correlations between read counts as a function of the length of the amplicon and the GC% of the primers (Figure 3, Supplemental File 2).
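The negative-control filter described above can be sketched as follows. This is a minimal illustration and not the authors' script: the function names, the toy counts, and the choice to treat a count equal to the threshold as a positive hit are all assumptions made for the example.

```python
def min_read_threshold(negative_counts, percent=95):
    """Smallest read count that exceeds at least `percent` % of the
    negative-control primer x taxon counts (hypothetical helper)."""
    ordered = sorted(negative_counts)
    if not ordered:
        return 1  # no reads in the negative control: any read is a hit
    # Integer form of ceil(percent / 100 * n), avoiding float rounding.
    k = (percent * len(ordered) + 99) // 100
    return ordered[k - 1] + 1


def positive_hits(counts, threshold):
    """Keep sample x primer x taxon counts at or above the threshold
    (assumption: a count equal to the threshold is retained)."""
    return {key: n for key, n in counts.items() if n >= threshold}


# Invented negative-control counts: mostly zero, with two contaminated cells.
negative = [0] * 18 + [3, 40]
threshold = min_read_threshold(negative)

# Invented field counts keyed by (sample, primer, taxon).
field = {
    ("site1_rep1", "COI", "Oncorhynchus mykiss"): 250,
    ("site1_rep1", "COI", "Cottus perplexus"): 2,
    ("site5_rep3", "ND2", "Oncorhynchus clarkii"): 4,
}
hits = positive_hits(field, threshold)
print(threshold, sorted(hits))
```

Using a percentile of the negative-control counts, rather than their maximum, keeps a single heavily contaminated cell from setting the threshold for the whole run.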

Agreement in taxon presence/absence between electrofishing and eDNA

Agreement between detection by electrofishing and eDNA was assessed for all sites excluding the hatchery outflow. For this analysis, only taxa detected by electrofishing are considered, and detection of a taxon’s DNA in one replicate was considered a detection by eDNA for that taxon at that site. By treating the electrofishing detections as the “true” presence/absence, the sensitivity (or “true positive” rate), specificity (“true negative” rate), and accuracy ([true positive + true negative]/all observations) of eDNA detections were calculated in R (version 3.5.2; R Core Team, 2018) using the caret package (version 6.0-80; Kuhn, 2008). Additionally, for genera with multiple species detected by electrofishing (e.g., Oncorhynchus, Cottus, Rhinichthys), detections by electrofishing and eDNA were grouped and evaluated at the genus level, and agreement among methods was assessed similarly.

For Oncorhynchus and Cottus, detection accuracy could be determined using three loci, independently or combined, with each genus represented by 12S rDNA, COI, and either ND2 (Oncorhynchus) or CytB (Cottus) (Supplemental Table 3). For Oncorhynchus, multiple primer pairs were developed for ND2 (3 sets) and COI (2 sets) so that all species likely to occur in Fall Creek could be sampled. Agreement between electrofishing and eDNA was assessed separately for each of the taxon-specific targets; agreement was also assessed for “consensus detection”, requiring any two loci for a positive detection, or requiring all three loci for a positive detection.
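The agreement statistics and the consensus rule can be illustrated with a short sketch. The analysis itself was run in R with caret; the Python below is a stand-in written for this example, with invented taxa, sites, and detections, and with hypothetical function names.

```python
def consensus_detections(per_locus, min_loci):
    """Return the (taxon, site) pairs detected by at least `min_loci` loci."""
    tally = {}
    for detections in per_locus.values():
        for key in detections:
            tally[key] = tally.get(key, 0) + 1
    return {key for key, n in tally.items() if n >= min_loci}


def agreement(truth, detected):
    """Sensitivity, specificity and accuracy of eDNA detections, treating
    electrofishing presence/absence (`truth`) as the reference.
    Assumes `truth` holds at least one presence and one absence."""
    tp = sum(1 for k, present in truth.items() if present and k in detected)
    fn = sum(1 for k, present in truth.items() if present and k not in detected)
    tn = sum(1 for k, present in truth.items() if not present and k not in detected)
    fp = sum(1 for k, present in truth.items() if not present and k in detected)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(truth),
    }


# Invented reference data: (taxon, site) -> captured by electrofishing?
truth = {
    ("Taxon A", 1): True, ("Taxon A", 2): True, ("Taxon A", 4): False,
    ("Taxon B", 1): True, ("Taxon B", 2): False, ("Taxon B", 4): False,
}
# Invented eDNA detections for three loci.
per_locus = {
    "12S": {("Taxon A", 1), ("Taxon A", 2), ("Taxon B", 1), ("Taxon B", 2)},
    "COI": {("Taxon A", 1), ("Taxon A", 2), ("Taxon A", 4)},
    "ND2": {("Taxon A", 1), ("Taxon B", 1)},
}

any_locus = agreement(truth, consensus_detections(per_locus, 1))
two_loci = agreement(truth, consensus_detections(per_locus, 2))
all_loci = agreement(truth, consensus_detections(per_locus, 3))
print(any_locus, two_loci, all_loci, sep="\n")
```

In this toy case, requiring two loci removes both false positives, while requiring all three loci discards true detections, which is the trade-off the consensus comparison is meant to expose.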

Metagenomic count variation over a stream gradient

To examine variability of read abundance across the stream gradient, read counts were summarized for all replicates of CytB for Phytophthora (oomycetes that include plant pathogens of Pacific Northwest trees), and 12S rDNA and COI for the four salmonid species present in Fall Creek. We also summarized read count variability for White Sturgeon using replicates of 16S, CytB, and D-loop, since this taxon only exists in captivity at site 3. Our Centrifuge database classifies White Sturgeon sequences to several taxa (e.g., Acipenseridae, Acipenser, A. transmontanus, A. medirostris); for this reason, White Sturgeon counts were averaged across three gene targets for each taxon, then summed across taxa. A visualization of all taxa detected at individual sites and all sites was created with Krona (version 2.7; Ondov et al., 2011). For this presentation, taxon counts above the minimum read threshold were summed for all replicates within a site and for all primers within a replicate.
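The White Sturgeon aggregation (average across gene targets within each taxon, then sum across taxa) can be written as a short sketch. The function name and the read counts below are invented for illustration and do not come from the study.

```python
def sturgeon_signal(counts_by_taxon):
    """Average read counts across gene targets within each taxon label,
    then sum those averages across taxon labels."""
    total = 0.0
    for per_locus in counts_by_taxon.values():
        total += sum(per_locus.values()) / len(per_locus)
    return total


# Invented counts for one replicate: taxon label -> {gene target: reads}.
replicate = {
    "Acipenseridae": {"16S": 30, "CytB": 0, "D-loop": 0},
    "Acipenser": {"16S": 12, "CytB": 6, "D-loop": 0},
    "A. transmontanus": {"16S": 90, "CytB": 60, "D-loop": 30},
}
signal = sturgeon_signal(replicate)
print(signal)
```

Averaging within a taxon label first keeps a taxon that happens to amplify with all three targets from counting three times as much as one seen with a single target.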

Results

eDNA metabarcoding results

Sequencing samples on one quarter of a MiSeq flowcell produced 5,863,313 reads. For 80.64% of these reads the primer producing the amplicon could be identified, the read pairs overlapped, and the sequence was classified (Table 2). For individual sites, this translated to a mean of 650,570 ± 174,982 sequences/site (range: 437,037-913,090), and a mean of 81,321 ± 40,761 sequences/replicate/site (range: 1,300-143,133).

Our experiment included 48 primer pairs, and we directly tested the efficiency of 39 pairs using positive controls with 2×10^4 or 2×10^6 template DNA molecules per gene target per Access Array reaction (Figure 3, Supplemental File 2). Sequence read counts from these positive controls showed a 2.5-fold range in abundance, with the highest rank efficiency from primers targeting 16S rDNA from crayfish and insects, and the lowest efficiency by primers targeting 12S rDNA from Signal Crayfish (Pacifastacus leniusculus Dana; Figure 3A). The absence of large-fold differences in product yield indicates that there are no strong biases in amplification efficiency across primers, and that all should be capable of detecting eDNA from field samples.

Sequence read counts from the two positive control samples were highly correlated and nearly equal (r = 0.950; slope = 0.952; F(1,34) = 313.8, P < 0.001), despite the 100-fold concentration difference in template DNA molecules (Figure 3B). This indicates that PCR was saturated with respect to substrate, and that reactions reached the plateau phase of amplification. Product yield in the 2×10^6 molecule control is significantly associated with amplicon length (r = 0.370; F(1,34) = 5.407, P = 0.0262; Figure 3C), but not mean primer GC% (r = -0.248; F(1,34) = 2.229, P = 0.145; Figure 3D), indicating that optimizing for uniform amplicon lengths could yield better sequencing uniformity across targets.

Although screened for off-target activity during development, some primer pairs showed substantial off-target amplification (Supplemental File 2). For example, the taxon-general amphibian 12S rDNA primers amplified 7 genera, with most reads coming from teleost fish and 1.4% from amphibians. Similarly, a taxon-specific CytB primer pair designed for Coastal Tailed Frog (Ascaphus truei Stejneger) amplified 31 genera, including insects, with only 17.9% of reads on target. Many factors impact the fidelity of primers, including design, but the rarity of target DNA in the eDNA sample likely causes some primers to amplify off-target sequences.

Amplification efficiency of eight targets could not be evaluated in control reactions since we lacked source DNA. These include taxa from classes Amphibia (Rana pretiosa), Bivalvia (Anodonta sp., Margaritifera sp.), Conoidasida (Cryptosporidium hominis), Dothideomycetes (Pseudogymnoascus destructans), Mammalia (Castor canadensis), and Pucciniomycetes (Cronartium ribicola). An additional primer targeted the Ginkgo DNA added to positive and negative controls.
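For a simple linear regression with one predictor, the F statistic on (1, n - 2) degrees of freedom is tied to Pearson's r by F = r^2 (n - 2) / (1 - r^2), so the reported r and F values can be cross-checked against each other. The helper below is a sketch written for this check (the function name is ours), using the error degrees of freedom of 34 given with each statistic.

```python
def f_from_r(r, df_error):
    """F statistic implied by a Pearson correlation r in a one-predictor
    regression with `df_error` residual degrees of freedom."""
    r_squared = r * r
    return r_squared * df_error / (1.0 - r_squared)


# (label, reported r, reported F) as given in the text, all with df = (1, 34).
reported = [
    ("10^6 vs 10^4 control read counts", 0.950, 313.8),
    ("yield vs amplicon length", 0.370, 5.407),
    ("yield vs mean primer GC%", -0.248, 2.229),
]
for label, r, f_reported in reported:
    print(f"{label}: implied F = {f_from_r(r, 34):.2f}, reported F = {f_reported}")
```

The implied values (about 314.7, 5.39 and 2.23) are close to the reported ones; the small gaps are what one would expect from r being rounded to three decimals.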

Stream complexity and defining the “hidden world”

Based on NCBI taxonomy, 3.2 million DNA sequences from field samples were classified into 878 predicted taxa, including 20 phyla, 67 classes, 156 orders, 302 families, 449 genera, and 648 species (Figure 7; for an interactive chart see Supplemental File 4). Overall representation of taxonomic groups was roughly proportional to the number of primers used in targeted PCR amplification; for example:

Fish, including Actinopteri (ray-finned fish) and Petromyzontidae (lamprey), were amplified with the largest number of primer pairs (15 of 48), and they accounted for 1,592,684 sequences (49.8% of total) across five sites. The most abundant sequences derived from the families Salmonidae, Cottidae, Petromyzontidae and Cyprinidae, in decreasing order.

Oomycota, including families Saprolegniaceae, Peronosporaceae, and Pythiaceae, were amplified using 4 primer pairs, and they accounted for 684,245 sequences (21.4% of total) across sites. The most abundant oomycete sequences were derived from fish and amphibian pathogens in the genus Saprolegnia, followed by plant pathogens Phytophthora and Pythium, in decreasing order.

Arthropods were targeted using 5 primers, and accounted for 395,584 sequence reads (16.6% of total) across sites. Sequences from class Insecta were most abundant, with ~330 taxa detected from the orders Ephemeroptera, Plecoptera and Diptera (in decreasing order). Nineteen genera accounted for 1% or more of the total Insecta sequences, and 14 of these genera are known to be native to western Oregon streams (Table 4). Decapoda, represented exclusively by Signal Crayfish, was the second most abundant class of arthropod.

Parasitic alveolates from the phylum Apicomplexa were amplified using one primer pair targeting 18S rDNA from Cryptosporidium, and they accounted for 122,497 sequences (3.8% of total). The most abundant sequences derived from the orders Neogregarinorida, Eucoccidiorida, and Eugregarinorida.

Freshwater red algae (family Lemaneaceae; 40,128 reads, 1.26%) and diatoms (family Bacillariaceae; 39,849 reads, 1.25%) were the remaining groups that accounted for more than 1% of all classified sequences. These taxa were amplified by non-specific priming with Saprolegnia COI (Lemaneaceae) and Cryptosporidium 18S rDNA (Bacillariaceae) primers.

Amphibians stand out as a surprisingly underrepresented class. Amphibians accounted for 25,848 sequences (0.81% of total), and this low representation contrasts with the large number of primer pairs (12) used to screen for amphibian taxa, and with their abundance in the electrofishing survey (especially site 5).

A final observation is that the sole mammal surveyed in this analysis, North American Beaver (Castor canadensis Kuhl), a known resident of Fall Creek, accounted for 4,145 sequences (0.14%) and was detected at sites 1 and 2.

Electrofishing Results: Species presence and abundance

Thirteen species were identified by electrofishing in Fall Creek, including fishes (9), amphibians (3), and crayfish (1; Figure 2). Taxon distributions followed two trends, with most taxa either distributed across all four sites, or found in the three downstream sites (1, 2, 4). Taxa captured across all four sites include: (a) Coastal Cutthroat Trout, the third most-abundant salmonid in our study, but the most numerically abundant at site 5; (b) Coastal Giant Salamander (Dicamptodon tenebrosus Baird and Girard), the most abundant amphibian in our study and the second most numerically abundant species at site 5; and (c) Signal Crayfish.

Sea-run fishes, sculpins, minnows, Rough-skinned Newts (Taricha granulosa Skilton), and Coastal Tailed Frogs were captured at one or more of the downstream sites (Figure 2). Salmonids include: Rainbow Trout, the most numerically abundant salmonid in our study, found at sites 1, 2 and 4; juvenile Coho Salmon, which showed a similar distribution to Rainbow Trout; and juvenile Chinook Salmon, which was found almost exclusively at site 1, except for a single occurrence at site 2. Sculpins (Cottus spp.) were the most abundant fish taxa with regard to counts and biomass in our study, with Reticulate Sculpin as the dominant species. Sculpins, dace (Rhinichthys spp.), and Pacific Lamprey (Entosphenus tridentatus Richardson) ammocoetes were all found at the three downstream sites, with the highest abundances at sites 1 and 2. Two amphibians, Coastal Tailed Frog and Rough-skinned Newt, were each detected at a single location (sites 1 and 4, respectively), and they had the lowest overall biomass among detected species (Figure 2).

YOY accounted for 34% (n = 307) of combined counts for trout, but because they cannot be identified to species, counts and biomass for Coastal Cutthroat and Rainbow trout are not precisely known. Sculpin and dace show similar complications, with unknowns accounting for 2.7% of sculpin and 0.7% of dace counts.

Comparison between electrofishing and eDNA

All 13 species of fishes, amphibians, and crayfish that were identified with electrofishing were identified by eDNA (Figures 4B-4F). Salmonids, sculpins, Pacific Lamprey, and Signal Crayfish were detected by eDNA at sites where they were detected by electrofishing. Speckled and Longnose dace were detected at one and two sites, respectively, out of three sites where they were detected by electrofishing. eDNA identified an additional mitochondrial haplotype lineage of sculpins at sites 1, 2, and 4; it also detected Rainbow Trout at site 5, where it was not detected by electrofishing.
296 Coastal Tailed Frog and Coastal Giant Salamander were both detected by electrofishing and eDNA, with 297 eDNA detecting Coastal Tailed Frog at an additional site beyond electrofishing, but only detecting Coastal 298 Giant Salamander at three of four sites. Rough-skinned Newt was observed by eDNA at Site 3, where 299 electrofishing was not performed, but was not observed by eDNA at Site 4, where it was observed by 300 electrofishing. Dunn’s Salamander (Plethodon dunni Bishop) and Pacific Tree Frog (Pseudacris regilla Baird 301 and Girard), which are riparian amphibians, were only detected by eDNA. 302 Overall, eDNA had a detection accuracy of 0.87 across taxa for species captured by electrofishing, with nearly 303 identical sensitivity and specificity (0.86 and 0.87, respectively; Table 2, Figure 4B). Accuracy at the genus 304 level, for genera with multiple species, was 0.83, with slightly higher sensitivity (true positive rate) of 0.90, 305 and lower specificity (true negative rate) of 0.50. The taxon-specific primers performed better for salmonids 306 (0.94 accuracy) than for sculpins (0.50), but within each group the taxon-specific loci performed equally well. 307 Although individual loci targeting sculpins had only 0.50 accuracy, the use of multiple loci improved sculpin 308 detection accuracy to 0.75 (Table 3). The use of multiple loci for salmonid detection did not improve accuracy 309 because individual loci were highly accurate and performed identically across sites. 310 Of the three loci targeting salmonids (12S rDNA, COI, ND2), taxon-specific ND2 and COI were the most 311 accurate (0.94) (Table 3). All loci targeting sculpins (12S rDNA, COI, CytB) showed equal accuracy (0.50), 312 but lower than that observed for salmonids. Requiring the agreement of two loci reduced accuracy for sculpins 313 (0.25), but equalled the detection accuracy of ND2 and COI for salmonids. 
Requiring the agreement of all three loci for salmonids reduced accuracy to 0.67, while no sculpin species was supported by detection from all three loci.

Metagenomic counts along the stream

Sampling a stream gradient allowed us to examine how eDNA counts changed longitudinally for select taxa. Oomycetes like Phytophthora were nearly ubiquitous in our samples, with Phytophthora CytB sequences detected in 39 of 40 samples. Most sequences were identified as Phytophthora bilorbang Aghighi, Hardy, Scott & Burgess, but they could also represent the undescribed Phytophthora taxon “Oaksoil” (Brasier et al., 2003; Hansen and Delatour, 1999; Sims et al., 2015). Phytophthora sequence counts showed a strong linear relationship (r = 0.841; F(1,38) = 92.11, P < 0.001) with total sequences from sample libraries (Figure 5). Normalizing sequence counts by total library counts (e.g., Phytophthora reads per million total reads [RPM]; Pereira et al., 2018) mitigated library-specific differences and reduced the mean coefficient of variation across sites for Phytophthora (CV = 60.6% for untransformed counts; CV = 39.1% for transformed counts). Most importantly, RPM-transformed read counts were not significantly different across sites (one-way ANOVA, F(4,35) = 1.122, P = 0.362; Figure 6A).

The relationship between sample site 3 (the OHRC facility) and RPM-transformed eDNA counts was examined for three species reared at the hatchery. At the time of our study, OHRC was rearing ~15,000 Chinook Salmon and ~5,000 Rainbow Trout smolts, and this rearing was associated with a strong increase in eDNA signal for both species at site 3 (Figure 6B, 6D). For Chinook Salmon, samples from the proximal downstream site 2 also appeared to show evidence of elevated eDNA from downstream transport, with high read counts similar to those of site 3 (Figure 6D) and positive eDNA detection in 8/8 COI replicates (Figure 4D) and 7/8 ND2 replicates (Figure 4E). These high values contrast with the single Chinook Salmon individual captured at this site (Figure 2). Because Rainbow Trout eDNA was detected at all sites with all marker genes, it is unclear whether eDNA abundance of this species at site 2 was influenced by downstream transport from the OHRC. White Sturgeon, the only non-native fish targeted in this study, showed a significant increase in eDNA at site 3, where it was detected in 4/8 replicates from the OHRC outflow (Figure 6F). This species is only present in OHRC raceways and its eDNA was not detected at the downstream sites, indicating that transport of sturgeon eDNA is spatially limited, or that our assay has a high detection limit for this taxon.
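The reads-per-million normalization and coefficient of variation described above can be sketched as follows. This is a minimal illustration, not the scripts used in the study: the function names and the replicate counts are hypothetical, and the CV is computed for a single set of replicates rather than averaged across sites.

```python
from statistics import mean, stdev


def reads_per_million(taxon_reads, library_reads):
    """Scale a taxon's read count to reads per million total library reads."""
    if library_reads <= 0:
        raise ValueError("library_reads must be positive")
    return taxon_reads / library_reads * 1_000_000


def coefficient_of_variation(values):
    """Sample coefficient of variation, expressed as a percentage of the mean."""
    return stdev(values) / mean(values) * 100


# Hypothetical replicates: (taxon reads, total library reads). Not study data.
replicates = [(1200, 60_000), (2500, 120_000), (450, 25_000), (3100, 150_000)]

raw = [taxon for taxon, _ in replicates]
rpm = [reads_per_million(taxon, total) for taxon, total in replicates]

print(f"CV of raw counts: {coefficient_of_variation(raw):.1f}%")
print(f"CV of RPM counts: {coefficient_of_variation(rpm):.1f}%")
```

In this made-up example the taxon makes up a similar fraction of every library, so most of the variation in raw counts comes from library size and largely disappears after scaling.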

Discussion

Using eDNA metabarcoding via microfluidic multiplex PCR and high-throughput sequencing, we successfully detected all species that were identified by electrofishing along a stream continuum. Salmonid species had the highest detection accuracies because they were targeted with both taxon-specific and taxon-general primers. Other fishes (sculpin, dace, lamprey) showed equal detection accuracies at the level of genus or family, but were more challenging to identify to the species level. This may be due to the use of taxon-general primers, to limited genetic divergence among these closely related species, or to databases that lacked representatives for local populations and species (e.g., sculpin, discussed below). We also detected ecologically important vertebrates from the same samples, including aquatic and riparian amphibians and beaver, although detection of amphibians was less accurate than detection of fish. Finally, we detected a broad spectrum of oomycete and fungal pathogens from environmental samples, as well as diverse aquatic insects. The taxonomic breadth revealed from this assay (878 predicted taxa from one Access Array cell and ¼ MiSeq run) highlights the power of microfluidic multiplex PCR as a method for assessing a large number of diverse taxa relevant to stream and riparian communities. This highly multiplexed approach significantly extends the reach of traditional “metabarcoding” (e.g., Deiner et al., 2016; Thomsen, Kielgast, Iversen, Møller, et al., 2012; Valentini et al., 2016), and it offers a high degree of specificity that exceeds early results from hybridization-based enrichment (Wilcox et al., 2018).

Microfluidic metabarcoding: successes and challenges, by taxonomic group

While environmental DNA detected most targeted taxa, some detections occurred with high accuracy while others showed low correspondence with electrofishing results. Disagreement between electrofishing and eDNA observations may be due to several factors. Species detected by eDNA, but not electrofishing, may be present nearby but not captured, such as when animals are located upstream and their DNA is sampled from downstream transport. Electrofishing capture efficiencies estimated by mark–recapture are only 4–25% in streams (Bayley and Peterson, 2001; Rosenberger and Dunham, 2005), and the method is most efficient in shallower water with average stream habitat conditions and for larger fish (Price and Peterson, 2010). Electrofishing can also miss fishes with low capture probabilities, such as those possessing coarse scales (cyprinids) or lacking swim bladders (sculpins). Species observed by electrofishing, but not eDNA, may have been missed because of bias against PCR amplification by the primers used in the study, sampling heterogeneity, or bias against specific taxa in the methods used for water collection, filtering, or DNA isolation. Some primer sets also amplified off-target taxa, and a high ratio of non-targeted to targeted species may lead to under-detection of target species for some primers (Supplemental File 2). Finally, the detection limit of the Access Array with eDNA is not known, so under-detection may be due to insufficient target DNA.

Multiple taxon-specific and taxon-general markers improve detection accuracy and reduce detection uncertainty: Examples from salmonids

Salmonids showed near-perfect agreement between electrofishing and eDNA, except for a single discrepancy at site 5. We attribute this discrepancy to a false negative detection with electrofishing, based on the strength of eDNA evidence for the presence of Rainbow Trout (detection in 7 of 8 replicates; detection by three genes [12S rDNA, COI, ND2]). Uncertain morphological classification of juveniles probably contributes to this discrepancy, but only four individuals were classified as YOY at this site, so the impact of their DNA would be modest. The more likely explanation is that Rainbow Trout are present at site 5, a finding that extends the presence of this species further upstream than is generally recognized.

Salmonids highlight a trend that we observed study-wide: taxon-specific barcoding markers (COI, ND2, CytB) provide higher classification accuracy than taxon-general universal markers (12S, 16S and 18S rDNA). With salmonids, the difference in classification accuracy is due to the minimal sequence divergence between Coastal Cutthroat and Rainbow trout at 12S rDNA (e.g., one segregating polymorphism). Under these conditions, Centrifuge classifies many Coastal Cutthroat Trout sequences as either “Rainbow Trout” or “Oncorhynchus”, leading to an underestimate of Coastal Cutthroat Trout presence based on taxon-general markers relative to taxon-specific markers or electrofishing. A similar lack of precision was observed for sculpins (identified as “Unidentified Cottus”) and daces (“Unidentified Cyprinidae”). This finding mirrors those of studies that also show greater resolution with taxon-specific versus taxon-general barcode markers (Drummond et al., 2015; Evans et al., 2017).

The importance of basing eDNA presence/absence estimates on observations from multiple independent markers (e.g., Evans et al., 2017) can also be illustrated using salmonids. In microfluidic PCR, DNA is amplified in a separate chamber for each primer pair, so each reaction provides an independent estimate of taxon presence. These independent observations can be used to devise detection criteria that range from lenient (e.g., detected with genes A, B or C) to restrictive (e.g., detected with genes A, B, and C). Multiple loci increase the likelihood that low-abundance or difficult-to-detect taxa can be observed, and they can be used to independently corroborate unexpected observations, such as novel observations for occupancy or range (e.g., Rainbow Trout presence at site 5). Although the use of multiple loci did not improve the detection accuracy for salmonids, it has the potential to improve accuracy, as we show with sculpins (Table 3).
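
The lenient-to-restrictive detection criteria described above can be expressed as a threshold on the number of loci that detect a taxon. The sketch below is illustrative only: the function name, the `min_loci` parameter, and the per-locus calls are hypothetical and are not taken from the study's scripts.

```python
def detected(per_locus_hits, min_loci=1):
    """Return True if at least `min_loci` loci detected the taxon.

    per_locus_hits maps a locus name to True/False for one taxon at one site.
    """
    if not 1 <= min_loci <= len(per_locus_hits):
        raise ValueError("min_loci must be between 1 and the number of loci")
    return sum(bool(hit) for hit in per_locus_hits.values()) >= min_loci


# Hypothetical per-locus calls for one taxon at one site (not study data).
hits = {"12S rDNA": False, "COI": True, "CytB": True}

lenient = detected(hits, min_loci=1)              # gene A or B or C
intermediate = detected(hits, min_loci=2)         # any two of three loci
restrictive = detected(hits, min_loci=len(hits))  # gene A and B and C

print(lenient, intermediate, restrictive)
```

Raising `min_loci` trades sensitivity for specificity, which is the pattern summarized for one, two, and three loci in Table 3.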

Database representation and accuracy influence classification of novel diversity: An example from sculpins

Sculpins also showed near-perfect agreement between electrofishing and eDNA, with Riffle and Reticulate sculpin detected at sites 1, 2 and 4 by both methods. eDNA indicated the presence of sculpin CytB sequences at site 5, where sculpins were not detected by electrofishing. As observed with Rainbow Trout, this likely indicates that sculpins extend further upstream than previously documented. eDNA also indicated the presence of abundant COI and CytB haplotypes that classify to Prickly Sculpin (C. asper Richardson), a morphologically distinctive species that does not occur in Fall Creek. We classified these sequences as “Prickly Sculpin-like”, but they likely represent haplotype lineages that have yet to be attributed to Riffle or Reticulate Sculpin. Further investigation into the genetic and morphological variation in sculpins is warranted to clarify taxonomic boundaries for previously documented sculpin lineages, and to identify novel haplotypic lineages.

Sculpin results highlight the importance of the reference databases used to classify eDNA metabarcoding data, especially in taxon-rich, geographically widespread groups like sculpins. Compared to sculpin species native to Fall Creek, Prickly Sculpin is over-represented in NCBI at all loci (276 CytB sequences vs. 167 for Riffle and 5 for Reticulate; 29 COI sequences vs. 3 Riffle and 6 Reticulate; 8 12S rDNA sequences vs. 0 Riffle and 1 Reticulate; search conducted October 29, 2018). This over-representation increases the likelihood that uncharacterized sequences will be identified as Prickly Sculpin. Databases like NCBI are broadly comprehensive, and may contain haplotype-morphotype combinations (e.g., hybrids, introgressed haplotypes, or retained ancestral alleles; Collins and Cruickshank, 2013; Ermakov et al., 2015; Moritz and Cicero, 2004) that are relevant to specific geographic regions, but misleading outside of those regions. Similarly, taxonomic revisions and errors in classification and DNA sequencing can accumulate in NCBI, and these may affect classification accuracy. The use of a region-specific database based on comprehensive taxon representation and vouchered specimens would eliminate many classification errors; this type of database is under development in Oregon (https://www.obgp.org/).

Unrelated factors result in imprecise classifications: Examples from dace and Pacific Lamprey

Dace and Pacific Lamprey were detected by electrofishing and eDNA, but detections with eDNA were characterized by ambiguous or incorrect classifications. Longnose and Speckled dace were observed by electrofishing at three sites, but eDNA detected Speckled Dace at only two sites, and Longnose Dace at one site. 12S rDNA sequences classified as “Unidentified Cyprinidae” were observed at two of these sites, and they are likely attributable to one or both of these species. In contrast to other taxa, we surveyed dace using only one taxon-general metabarcoding marker (12S rDNA), and this locus may lack the precision to discriminate closely related Cyprinidae.

Pacific Lamprey was detected by electrofishing and by eDNA using taxon-general (16S rDNA) and taxon-specific (CytB) primers, but it was imprecisely classified by eDNA to family (Petromyzontidae), even though it is the sole lamprey species in Fall Creek. In this instance, recognition of the genus Entosphenus as distinct from Lampetra in NCBI has resulted in nearly identical sequences representing different genera; the most parsimonious resolution is to move all sequences up the taxonomic hierarchy to family Petromyzontidae. Two actions will improve the detection accuracy in future assays: the application of taxon-specific markers, and modification of the reference database to reflect finer taxonomic subdivisions (e.g., subfamily Lampetrinae) or different taxonomic concepts (e.g., a broader definition of Lampetra).
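
Moving a sequence "up the taxonomic hierarchy" when it matches references from different genera equally well amounts to reporting the lowest rank shared by all matching lineages. The sketch below illustrates that idea under stated assumptions: the lineages are simplified and hand-written rather than pulled from NCBI, and the function is a generic illustration, not a description of how Centrifuge resolves ambiguous hits.

```python
def lowest_common_ancestor(lineages):
    """Return the deepest name shared by all lineages.

    Each lineage is a list of names ordered from the highest rank to the lowest.
    Returns None if the lineages share no common name at the top rank.
    """
    if not lineages:
        raise ValueError("at least one lineage is required")
    shared = None
    for names in zip(*lineages):
        if all(name == names[0] for name in names):
            shared = names[0]
        else:
            break
    return shared


# Simplified, hand-written lineages (order, family, genus) for two equally
# good reference matches to a single lamprey read. Hypothetical example.
matches = [
    ["Petromyzontiformes", "Petromyzontidae", "Entosphenus"],
    ["Petromyzontiformes", "Petromyzontidae", "Lampetra"],
]

print(lowest_common_ancestor(matches))  # Petromyzontidae
```

If the reference database placed both sets of sequences under one genus concept, the same read would resolve to genus instead of family, which is the effect of the database modification proposed above.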

Detection of amphibians: importance of detection probability, primer specificity, and assay sensitivity

The accuracy of amphibian detection across sites was moderately lower than overall eDNA accuracy (75% vs 86%, respectively), but amphibians were also detected in fewer replicates per site, and at sequence counts far lower than fish, invertebrates, or pathogens. We observed discrepancies where amphibians were detected by eDNA but not electrofishing (Pacific Tree Frog, Dunn’s Salamander), which could represent false eDNA positives (possibly from contamination) or true eDNA positives for species intermittently inhabiting aquatic habitats. Both Pacific Tree Frog and Dunn’s Salamander are native to the Fall Creek watershed, so eDNA detection of either species is plausible. We observed other discrepancies (Coastal Giant Salamander, Rough-skinned Newt) where amphibians were detected by eDNA and electrofishing, but not at the same sites, likely indicating a low detection probability in electrofishing (e.g., low numbers) or eDNA sampling (e.g., poor assay specificity or sensitivity). Assaying a larger number of replicates could be one solution to improving accuracy when detection probabilities and false positive rates are low (Ficetola et al., 2015).

While the ability of amphibians to traverse the aquatic/terrestrial interface can make continuous detection in water samples difficult, poor assay performance for amphibians is also at least partly responsible for lowered detection. For example, only 1.4% of the reads from 12S rDNA “Salamander” primers derived from the order Caudata, with the remaining sequences dominated by fish. Our assay included other salamander-specific primers that ranked highly in amplification efficiency (Coastal Giant Salamander ND2, rank 3 of 40; Rough-skinned Newt ND2, rank 13 of 40; Supplemental File 2), suggesting that primer specificity and efficiency do not fully account for our unreliable detection of amphibians. The influence of a third factor, assay sensitivity, is not well understood. The Fluidigm Access Array is typically used for sequence characterization (e.g., Brown et al., 2016), not quantitative or semi-quantitative analysis. Future studies including quantitative methods like qPCR or ddPCR will help to define these limits.
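
The benefit of additional replicates noted above can be illustrated with a simple cumulative detection calculation: if each replicate detects a taxon that is present with probability p, the chance of at least one detection in n replicates is 1 - (1 - p)^n. The sketch below assumes independent replicates, a constant per-replicate probability, and no false positives, which is a simplification of the models in Ficetola et al. (2015); the function names and the value p = 0.25 are hypothetical.

```python
import math


def cumulative_detection(p, n):
    """Probability of at least one detection in n independent replicates."""
    if not 0 <= p <= 1 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return 1 - (1 - p) ** n


def replicates_needed(p, target=0.95):
    """Smallest number of replicates with cumulative detection >= target."""
    if not 0 < p < 1 or not 0 < target < 1:
        raise ValueError("p and target must be strictly between 0 and 1")
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# Hypothetical per-replicate detection probability for a low-abundance taxon.
p = 0.25
print(f"8 replicates: {cumulative_detection(p, 8):.3f}")
print(f"replicates for 95% detection: {replicates_needed(p)}")
```

Under these assumptions, a taxon detected in one replicate out of four on average would still be missed about one time in ten with 8 replicates per site.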

Microfluidic metabarcoding extends opportunities to examine diverse, unrelated communities: Examples from invertebrates and aquatic oomycetes

Over one-third of DNA sequences obtained in this analysis derived from phyla Arthropoda (aquatic insects and crayfish) and Oomycota (water molds; Figure 7, Supplemental File 4). The only taxon from these phyla evaluated by electrofishing, Signal Crayfish, was detected at 100% accuracy using eDNA. Our field identification efforts did not extend to aquatic insects, but two universal primer sets targeting 16S rDNA produced 395,584 sequences that were classified to Ephemeroptera (46.4%), Plecoptera (25.2%), and Diptera (19.6%), and 14 of 19 identified genera are confirmed residents of western Oregon (Table 4).

Microfluidic metabarcoding also revealed diverse oomycetes and forest pathogens. Phytophthora was our specific focus, as these waterborne oomycetes include significant pathogens responsible for root/crown rot and stem cankers on oaks and oak relatives (P. ramorum Werres), Port-Orford Cedar (P. lateralis (Mont.) de Bary), and other trees and shrubs (Hansen et al., 2012). These pathogens are the focus of pathogen surveys across the U.S., as management of Phytophthora-infected zones seasonally impacts timber harvesting, recreation, and restoration activities (U.S. Department of the Interior, 2004). Given their ubiquity, Phytophthora, Saprolegnia, or similarly cosmopolitan oomycetes may also provide “internal positive” controls that can be used in lieu of spike-in controls (Tourlousse et al., 2017) for sample validation, or as a check for errors in collection (e.g., filtration) or DNA extraction methods. Our assay included primers for additional animal pathogens, specifically amphibian chytrid fungus (Batrachochytrium dendrobatidis Longcore, Pessier & Nichols; Olson et al., 2013) and the fungus that causes White-nose syndrome in bats (Pseudogymnoascus destructans (Blehert & Gargas) Minnis & D.L. Lindner; Blehert et al., 2009). We did not detect these pathogens at Fall Creek, where they are not known to occur, but this assay allows primers for these and other pathogens to be included with routine screening of other eDNA targets.

Conclusion

We demonstrate that eDNA metabarcoding, using multiple primer pairs via microfluidic multiplex PCR and high-throughput sequencing, can successfully detect the presence of multiple species across Eukarya, validating the ability to define multiple aspects of aquatic biodiversity from a single sample. eDNA detected all taxa that were detected by electrofishing, with an overall accuracy of 86%, and additionally detected insects, a mammal, and oomycete pathogens not detected by electrofishing. The ability to target multiple genetic loci across multiple taxa allows for independent observations of taxa across loci, allows for fine taxonomic resolution as well as broad detection at higher taxonomic levels, and allows for detection of common taxa that can act as proxies for per-sample positive controls. This work broadens the scope of eDNA research by informing conservation decisions for a wide range of taxonomic groups, including common, endangered, rare, and cryptic species, enabling data-driven prioritization and evaluation of management actions across the aquatic community.

Acknowledgements

We thank electrofishing crews, colleagues who contributed tissue samples of taxa, and landowners for stream access. Fish and amphibian collections in 2017 were authorized by Oregon Department of Fish and Wildlife scientific take permit #21223 for fish and #090-17 for amphibians; National Marine Fisheries Permit #21050; and United States Forest Service Institutional Animal Care and Use Committee Permit #2017-11. Financial support was provided by National Council for Air and Stream Improvement, Inc. and the US Forest Service Pacific Northwest Research Station. Please see the supplemental text for a complete list of acknowledgements.

Data accessibility

Sequence data are archived under the NCBI Sequence Read Archive BioProject PRJNA517944. Classification output from Centrifuge, R scripts, and Krona tables are deposited in the Oregon State University institutional archive ScholarsArchive@OSU (dx.doi.org/10.7267/x920g302b). Sculpins, Longnose Dace, and steelhead from our sampling sites at Fall Creek are vouchered in the Oregon State Ichthyology Collection (OSIC).

References

Andersen K, Bird KL, Rasmussen M, et al. (2012) Meta-barcoding of ‘dirt’ DNA from soil reflects vertebrate biodiversity. Molecular Ecology 21(8): 1966–1979. DOI: 10.1111/j.1365-294X.2011.05261.x.
Anderson IC and Cairney JWG (2004) Diversity and ecology of soil fungal communities: increased understanding through the application of molecular techniques. Environmental Microbiology 6(8): 769–779. DOI: 10.1111/j.1462-2920.2004.00675.x.
Bayley PB and Peterson JT (2001) An approach to estimate probability of presence and richness of fish species. Transactions of the American Fisheries Society 130(4): 620–633. DOI: 10.1577/1548-8659(2001)130<0620:AATEPO>2.0.CO;2.
Blehert DS, Hicks AC, Behr M, et al. (2009) Bat white-nose syndrome: An emerging fungal pathogen? Science 323(5911): 227–227. DOI: 10.1126/science.1163874.
Brasier CM, Cooke DEL, Duncan JM, et al. (2003) Multiple new phenotypic taxa from trees and riparian ecosystems in Phytophthora gonapodyides–P. megasperma ITS Clade 6, which tend to be high-temperature tolerant and either inbreeding or sterile. Mycological Research 107(3): 277–290. DOI: 10.1017/S095375620300738X.
Brown SDJ, Collins RA, Boyer S, et al. (2012) SPIDER: An R package for the analysis of species identity and evolution, with particular reference to DNA barcoding. Molecular Ecology Resources 12(3): 562–565. DOI: 10.1111/j.1755-0998.2011.03108.x.
Brown SP, Ferrer A, Dalling JW, et al. (2016) Don’t put all your eggs in one basket: a cost-effective and powerful method to optimize primer choice for rRNA environmental community analyses using the Fluidigm Access Array. Molecular Ecology Resources 16(4): 946–956. DOI: 10.1111/1755-0998.12507.
Collins RA and Cruickshank RH (2013) The seven deadly sins of DNA barcoding. Molecular Ecology Resources 13(6): 969–975. DOI: 10.1111/1755-0998.12046.
Cossel JO, Gaige MG and Sauder JD (2012) Electroshocking as a survey technique for stream-dwelling amphibians. Wildlife Society Bulletin 36(2): 358–364. DOI: 10.1002/wsb.145.
Deiner K, Fronhofer EA, Mächler E, et al. (2016) Environmental DNA reveals that rivers are conveyer belts of biodiversity information. Nature Communications 7: 12544. DOI: 10.1038/ncomms12544.
Drummond AJ, Newcomb RD, Buckley TR, et al. (2015) Evaluating a multigene environmental DNA approach for biodiversity assessment. GigaScience 4(1): 46. DOI: 10.1186/s13742-015-0086-1.
Elbrecht V, Vamos EE, Meissner K, et al. (2017) Assessing strengths and weaknesses of DNA metabarcoding-based macroinvertebrate identification for routine stream monitoring. Methods in Ecology and Evolution 8(10): 1265–1275. DOI: 10.1111/2041-210X.12789.
Ermakov OA, Simonov E, Surin VL, et al. (2015) Implications of hybridization, NUMTs, and overlooked diversity for DNA barcoding of Eurasian ground squirrels. PLOS ONE 10(1): e0117201. DOI: 10.1371/journal.pone.0117201.
Evans NT, Li Y, Renshaw MA, et al. (2017) Fish community assessment with eDNA metabarcoding: effects of sampling design and bioinformatic filtering. Canadian Journal of Fisheries and Aquatic Sciences 74(9): 1362–1374. DOI: 10.1139/cjfas-2016-0306.
Ficetola GF, Pansu J, Bonin A, et al. (2015) Replication levels, false presences and the estimation of the presence/absence from eDNA metabarcoding data. Molecular Ecology Resources 15(3): 543–556. DOI: 10.1111/1755-0998.12338.
Flory AR, Kumar S, Stohlgren TJ, et al. (2012) Environmental conditions associated with bat white-nose syndrome mortality in the north-eastern United States. Journal of Applied Ecology 49(3): 680–689. DOI: 10.1111/j.1365-2664.2012.02129.x.
Fluidigm Corporation (2016) Access Array System for Illumina Sequencing Systems User Guide (PN 100-3770 J1). Fluidigm Corporation. Available at: www.fluidigm.com.
Gibson J, Shokralla S, Porter TM, et al. (2014) Simultaneous assessment of the macrobiome and microbiome in a bulk sample of tropical arthropods through DNA metasystematics. Proceedings of the National Academy of Sciences 111(22): 8007–8012. DOI: 10.1073/pnas.1406468111.
Hajibabaei M, Shokralla S, Zhou X, et al. (2011) Environmental barcoding: A next-generation sequencing approach for biomonitoring applications using river benthos. PLOS ONE 6(4): e17497. DOI: 10.1371/journal.pone.0017497.
Hansen E and Delatour C (1999) Phytophthora species in oak forests of north-east France. Annals of Forest Science 56(7): 539–547. DOI: 10.1051/forest:19990702.
Hansen EM, Reeser PW and Sutton W (2012) Phytophthora beyond agriculture. Annual Review of Phytopathology 50(1): 359–378. DOI: 10.1146/annurev-phyto-081211-172946.
Hebert PDN, Stoeckle MY, Zemlak TS, et al. (2004) Identification of birds through DNA barcodes. PLOS Biology 2(10): e312. DOI: 10.1371/journal.pbio.0020312.
Integrated DNA Technologies (2017) PrimerQuest® Program. Coralville, Iowa, USA: Integrated DNA Technologies. Available at: https://www.idtdna.com/SciTools.
Kim D, Song L, Breitwieser FP, et al. (2016) Centrifuge: Rapid and sensitive classification of metagenomic sequences. Genome Research. DOI: 10.1101/gr.210641.116.
Kuhn M (2008) Building predictive models in R using the caret package. Journal of Statistical Software; Vol 1, Issue 5 (2008). DOI: 10.18637/jss.v028.i05.
Matheson CD, Gurney C, Esau N, et al. (2010) Assessing PCR inhibition from humic substances. The Open Enzyme Inhibition Journal 3(1). Available at: https://benthamopen.com/ABSTRACT/TOEIJ-3-38 (accessed 31 January 2019).
Moritz C and Cicero C (2004) DNA barcoding: Promise and pitfalls. PLOS Biology 2(10): e354. DOI: 10.1371/journal.pbio.0020354.
Nathan LM, Simmons M, Wegleitner BJ, et al. (2014) Quantifying environmental DNA signals for aquatic invasive species across multiple detection platforms. DOI: 10.1021/es5034052.
Olson DH, Aanensen DM, Ronnenberg KL, et al. (2013) Mapping the global emergence of Batrachochytrium dendrobatidis, the amphibian chytrid fungus. PLOS ONE 8(2): e56802. DOI: 10.1371/journal.pone.0056802.
Ondov BD, Bergman NH and Phillippy AM (2011) Interactive metagenomic visualization in a web browser. BMC Bioinformatics 12(1): 385. DOI: 10.1186/1471-2105-12-385.
Pereira MB, Wallroth M, Jonsson V, et al. (2018) Comparison of normalization methods for the analysis of metagenomic gene abundance data. BMC Genomics 19(1): 274. DOI: 10.1186/s12864-018-4637-6.
Price AL and Peterson JT (2010) Estimation and modeling of electrofishing capture efficiency for fishes in wadeable warmwater streams. North American Journal of Fisheries Management 30(2): 481–498. DOI: 10.1577/M09-122.1.
R Core Team (2018) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available at: http://www.R-project.org/.
Romanowski G, Lorenz MG and Wackernagel W (1993) Use of polymerase chain reaction and electroporation of Escherichia coli to monitor the persistence of extracellular plasmid DNA introduced into natural soils. Applied and Environmental Microbiology 59(10): 3438–3446.
Rosenberger AE and Dunham JB (2005) Validation of abundance estimates from mark–recapture and removal techniques for Rainbow Trout captured by electrofishing in small streams. North American Journal of Fisheries Management 25(4): 1395–1410. DOI: 10.1577/M04-081.1.
Settles M and Gerritsen A (2014) DbcAmplicons. Available at: https://github.com/msettles/dbcAmplicons (accessed 31 January 2018).
Sims LL, Sutton W, Reeser P, et al. (2015) The Phytophthora species assemblage and diversity in riparian alder ecosystems of western Oregon, USA. Mycologia 107(5): 889–902. DOI: 10.3852/14-255.
Stat M, Huggett MJ, Bernasconi R, et al. (2017) Ecosystem biomonitoring with eDNA: Metabarcoding across the tree of life in a tropical marine environment. Scientific Reports 7(1): 12240. DOI: 10.1038/s41598-017-12501-5.
Thomsen PF, Kielgast J, Iversen LL, Møller PR, et al. (2012) Detection of a diverse marine fish fauna using environmental DNA from seawater samples. PLOS ONE 7(8): e41732. DOI: 10.1371/journal.pone.0041732.
Thomsen PF, Kielgast J, Iversen LL, Wiuf C, et al. (2012) Monitoring endangered freshwater biodiversity using environmental DNA. Molecular Ecology 21(11): 2565–2573. DOI: 10.1111/j.1365-294X.2011.05418.x.
Tiedemann AR (2000) Wildlife. Drinking Water from Forests and Grasslands: A Synthesis of the Scientific Literature SRS-39, General Technical Report. Asheville, North Carolina: United States Department of Agriculture, Forest Service, Southern Research Station.
Tourlousse DM, Yoshiike S, Ohashi A, et al. (2017) Synthetic spike-in standards for high-throughput 16S rRNA gene amplicon sequencing. Nucleic Acids Research 45(4): e23–e23. DOI: 10.1093/nar/gkw984.
U.S. Department of the Interior, Bureau of Land Management and U.S. Department of Agriculture, Forest Service (2004) Final supplemental environmental impact statement: Management of Port-Orford-cedar in southwest Oregon. BLM/OR/WA/PL-04/005-1792, January. Portland, OR. Available at: https://www.fs.usda.gov/detail/rogue-siskiyou/landmanagement/resourcemanagement/?cid=stelprdb5316256.
Valentini A, Taberlet P, Miaud C, et al. (2016) Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. Molecular Ecology 25(4): 929–942. DOI: 10.1111/mec.13428.
Venter JC, Remington K, Heidelberg JF, et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304(5667): 66–74. DOI: 10.1126/science.1093857.
Wang T, Hamann A, Spittlehouse D, et al. (2016) Locally downscaled and spatially customizable climate data for historical and future periods for North America. PLOS ONE 11(6): e0156720. DOI: 10.1371/journal.pone.0156720.
Wetzel RG (1993) Humic compounds from wetlands: Complexation, inactivation, and reactivation of surface-bound and extracellular enzymes. SIL Proceedings, 1922-2010 25(1): 122–128. DOI: 10.1080/03680770.1992.11900072.
Widmer F, Seidler RJ and Watrud LS (1996) Sensitive detection of transgenic plant marker gene persistence in soil microcosms. Molecular Ecology 5(5): 603–613. DOI: 10.1111/j.1365-294X.1996.tb00356.x.
Wilcox TM, Zarn KE, Piggott MP, et al. (2018) Capture enrichment of aquatic environmental DNA: A first proof of concept. Molecular Ecology Resources 18(6): 1392–1401. DOI: 10.1111/1755-0998.12928.
Willerslev E, Hansen AJ, Binladen J, et al. (2003) Diverse plant and animal genetic records from Holocene and Pleistocene sediments. Science 300(5620): 791–795. DOI: 10.1126/science.1084114.
Ye J, Coulouris G, Zaretskaya I, et al. (2012) Primer-BLAST: A tool to design target-specific primers for polymerase chain reaction. BMC Bioinformatics 13(1): 134. DOI: 10.1186/1471-2105-13-134.

Table 1: Fluidigm PCR and primer validation protocol. †Final extension step excluded from the primer validation protocol.

PCR Stages                                                                   Number of Cycles
50 °C, 2 minutes                                                             1
70 °C, 20 minutes                                                            1
95 °C, 10 minutes                                                            1
95 °C, 15 seconds; 58 °C, 30 seconds; 72 °C, 1 minute                        10
95 °C, 15 seconds; 80 °C, 30 seconds; 58 °C, 30 seconds; 72 °C, 1 minute     2
95 °C, 15 seconds; 58 °C, 30 seconds; 72 °C, 1 minute                        8
95 °C, 15 seconds; 80 °C, 30 seconds; 58 °C, 30 seconds; 72 °C, 1 minute     2
95 °C, 15 seconds; 58 °C, 30 seconds; 72 °C, 1 minute                        8
95 °C, 15 seconds; 80 °C, 30 seconds; 58 °C, 30 seconds; 72 °C, 1 minute     5
72 °C, 5 minutes†                                                            1
4 °C                                                                         hold
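
The thermal profile in Table 1 can be written as a small data structure, which makes it easy to check the total number of amplification cycles. This is only an illustrative encoding: the `Stage` class and the names `THREE_STEP` and `FOUR_STEP` are hypothetical, and "amplification cycles" is taken here to mean the multi-step stages that include denaturation, annealing, and extension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    steps: tuple  # each step is (temperature in °C, duration in seconds)
    cycles: int


THREE_STEP = ((95, 15), (58, 30), (72, 60))
FOUR_STEP = ((95, 15), (80, 30), (58, 30), (72, 60))

# Stages in the order listed in Table 1; the final 4 °C hold is omitted.
PROGRAM = [
    Stage(((50, 120),), 1),
    Stage(((70, 1200),), 1),
    Stage(((95, 600),), 1),
    Stage(THREE_STEP, 10),
    Stage(FOUR_STEP, 2),
    Stage(THREE_STEP, 8),
    Stage(FOUR_STEP, 2),
    Stage(THREE_STEP, 8),
    Stage(FOUR_STEP, 5),
    Stage(((72, 300),), 1),  # final extension; excluded in primer validation
]


def amplification_cycles(program):
    """Count cycles in multi-step (denature/anneal/extend) stages only."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


print(amplification_cycles(PROGRAM))  # 35
```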

Table 2: DNA sequence counts. A: Sequence counts for the entire run. Overlapped: Read pairs that were successfully overlapped and with the originating primer identified. Classified: Overlapped pairs successfully classified to a taxon. B: Total: Sequences classified at that site. Min/Max: Minimum and maximum sequences classified per replicate, respectively.

A    Total pairs    Overlapped    Classified    % Classified
     5,863,313      4,941,836     4,727,948     80.6

B        Site 1     Site 2     Site 3     Site 4     Site 5
Total    641,982    689,835    913,090    437,037    570,904
Min      6,003      5,223      20,211     1,300      3,957
Max      113,649    126,521    143,133    121,225    104,514

Table 3: Accuracy (AC), sensitivity (SN), and specificity (SP) for individual and combined eDNA markers. Marker performance was determined using electrofishing data as the reference for presence/absence. One locus, two loci, and three loci values count a taxon as detected at a site if at least one, two, or all three of the three loci detect that taxon, respectively. All taxa - species: eDNA detection of all taxa found by electrofishing, classified to the species level. All taxa - genus: eDNA detection of all genera found by electrofishing.

Taxonomic Group         Target gene    AC      SN      SP
Oncorhynchus species    12S rDNA       0.94    1.00    0.75
Oncorhynchus species    COI            0.94    1.00    0.75
Oncorhynchus species    ND2            0.94    1.00    0.75
Oncorhynchus species    1 locus        0.94    1.00    0.75
Oncorhynchus species    2 loci         0.94    1.00    0.75
Oncorhynchus species    3 loci         0.94    1.00    0.75
Cottus species          12S rDNA       0.50    0.00    1.00
Cottus species          COI            0.50    0.50    0.50
Cottus species          CytB           0.50    0.50    0.50
Cottus species          1 locus        0.75    1.00    0.50
Cottus species          2 loci         0.25    0.00    0.50
Cottus species          3 loci         0.50    0.00    1.00

All taxa - species      All loci       0.87    0.86    0.87
All taxa - genus        All loci       0.83    0.90    0.50
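
The accuracy, sensitivity, and specificity values in Table 3 follow the standard confusion-matrix definitions, with electrofishing as the reference. The sketch below shows those definitions on a small set of hypothetical taxon × site calls; the function name and the data are illustrative, are not taken from the study, and are not meant to reproduce any particular row of the table.

```python
def detection_metrics(reference, edna):
    """Compare eDNA calls to reference calls keyed by (taxon, site).

    Returns accuracy, sensitivity, and specificity as a dict of floats.
    """
    tp = fp = tn = fn = 0
    for key, present in reference.items():
        called = edna.get(key, False)
        if present and called:
            tp += 1
        elif present and not called:
            fn += 1
        elif not present and called:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Hypothetical presence/absence calls for two taxa at two sites.
reference = {("A", 1): True, ("A", 2): True, ("B", 1): False, ("B", 2): False}
edna = {("A", 1): True, ("A", 2): True, ("B", 1): True, ("B", 2): False}

metrics = detection_metrics(reference, edna)
print(metrics)
```

Because electrofishing is itself imperfect, an eDNA "false positive" under this scoring may be a true detection that the reference method missed.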

665 Table 4: Insect genera identified from eDNA sequences at Fall Creek, OR. Taxa accounting for greater than 666 1% of total insect sequences were identified from Centrifuge output. Taxa native to Oregon are indicated (†taxa 667 known from North America but not reported in Oregon).

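Order           Family            Genus          Native to OR?
Diptera         Chironomidae      Cricotopus     N†
Diptera         Chironomidae      Tanytarsus     N†
Diptera         Drosophilidae     Drosophila     Y
Diptera         Sciomyzidae       unknown
Diptera         Simuliidae        Simulium       N
Ephemeroptera   Ameletopsidae     Ameletopsis    N
Ephemeroptera   Baetidae          unknown
Ephemeroptera   Ephemerellidae    Caudatella     Y
Ephemeroptera   Ephemerellidae    Caurinella     N†
Ephemeroptera   Ephemerellidae    Drunella       Y
Ephemeroptera   Heptageniidae     Cinygmula      Y
Ephemeroptera   Heptageniidae     Epeorus        Y
Ephemeroptera   Heptageniidae     Rhithrogena    Y
Ephemeroptera   Leptophlebiidae   unknown
Plecoptera      Chloroperlidae    Sweltsa        Y
Plecoptera      Leuctridae        Despaxia       Y
Plecoptera      Nemouridae        Malenka        Y
Plecoptera      Nemouridae        Zapada         Y
Plecoptera      Perlidae          Calineuria     Y
Plecoptera      Perlidae          Doroneuria     Y
Plecoptera      Perlidae          Hesperoperla   Y
Plecoptera      Perlodidae        Megarcys       Y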


Figure 1: Map of five sampling sites at Fall Creek, OR. Sites are numbered moving upstream from the confluence with the Alsea River, with sites 1 and 2 downstream of the Oregon Hatchery Research Center (OHRC), site 3 at the outflow of the OHRC, and sites 4 and 5 upstream of the OHRC dam.

Figure 2: Electrofishing abundance of target taxa at eDNA sampling sites in Fall Creek, OR. A: Total counts. B: Biomass (g). *Biomass was not determined for three taxon × site combinations.

Figure 3: Primer efficiency across primer sets. A: Read counts from 39 primer sets derived from positive control reactions containing 2×10^4 or 2×10^6 template molecules per target. The vertical black line represents the median count (14,970 reads) for the 2×10^6 molecule control. “Salmonid Species ND2” and “Salmonid Species COI” are averages of three and two primer sets, respectively. B: Biplot of sequence yield from the two positive controls, with linear trendline, trendline equation, and correlation test statistics. C: Biplot of sequence yield (2×10^6 molecule control) as a function of amplicon length, in bases. D: Biplot of sequence yield as a function of forward + reverse primer %GC content.

Figure 4: Summary of presence/absence detection for samples and replicates at Fall Creek sites 1, 2, 4 and 5. Summaries for select genera are shown in the upper panel, and summaries for specific taxa are shown in the lower panel. A: Presence (black) and absence (white) for fish and amphibians based on electrofishing. “Unidentified Oncorhynchus” refers to young-of-the-year (YOY) Rainbow or Coastal Cutthroat trout, “Unidentified Cottus” refers to Riffle or Reticulate sculpin, and “Unidentified Rhinichthys” refers to Longnose or Speckled dace. B: Presence (black) and absence (white) for fish and amphibians based on all eDNA markers combined. Individual detections are shown for each of 8 replicates; grey indicates that the taxon was observed at the site in at least one replicate. “Unidentified Oncorhynchus”, “Unidentified Cottus”, “Unidentified Cyprinidae”, and “Unidentified Rhinichthys” may refer to any member of those taxa, respectively. C: Presence/absence for fish based on 12S rDNA metabarcoding primers targeting teleosts. D: Presence/absence for salmonids and cottids based on COI primers targeting those taxa. E: Presence/absence for salmonids based on taxon-specific ND2 primers. F: Presence/absence for cottids based on taxon-specific CytB primers.

Figure 5: Phytophthora abundance relative to total library yield. Summed counts of Phytophthora species are plotted against total library yield for each of 40 stream sample libraries, with linear trendline and correlation statistics.

Figure 6: Boxplots showing means, interquartile ranges, and individual counts for eight replicates at five sites from select taxa and genes. (A) Phytophthora amplified with CytB; (B) Rainbow Trout amplified with 12S rDNA and ND2; (C) Coastal Cutthroat Trout amplified with 12S rDNA and ND2; (D) Chinook Salmon amplified with 12S rDNA and ND2; (E) Coho Salmon amplified with 12S rDNA and ND2; and (F) White Sturgeon amplified with 16S rDNA, CytB, and D-Loop (averaged across loci, see text).

Figure 7: Summary of identified taxa in a multi-level Krona taxonomic chart. Taxa are displayed in bands proportional to their representation across all sequences from Fall Creek samples. Inner levels represent higher taxonomic ranks, with lower taxonomic ranks shown in outer levels. An interactive version of this chart for each site is available in Supplemental File 4.

Supplemental material
Supplemental File 1: Supplemental text including additional methodological details, acknowledgements, and tables.
Supplemental File 2: PCR primer sequences, targets, and properties.
Supplemental File 3: List of 337 taxa excluded from classification database. taxID: NCBI taxonomic identifier. taxname: Taxonomic name from NCBI.
Supplemental File 4: Interactive Krona taxonomic chart. Taxa are displayed in bands proportional to their representation across all sequences from Fall Creek samples. Inner levels represent higher taxonomic ranks, with lower taxonomic ranks shown in outer levels.