bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Marine water environmental DNA metabarcoding provides a comprehensive

2 diversity assessment and reveals spatial patterns in a large oceanic area

3 Natalia Fraija-Fernández1, Marie-Catherine Bouquieaux1, Anaïs Rey1, Iñaki Mendibil1, Unai

4 Cotano1, Xabier Irigoien1,2, María Santos1, Naiara Rodríguez-Ezpeleta1*

5 1AZTI, Marine Research Division, Sukarrieta, Bizkaia, .

6 2IKERBASQUE, Basque Foundation for Science, Bilbao, Spain.

7 *Corresponding author.

8

9

1 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

10 Abstract

11 Current methods for monitoring marine fish diversity mostly rely on surveys,

12 which are invasive, costly and time-consuming. Moreover, these methods are selective,

13 targeting a subset of at the time, and can be inaccessible to certain areas. Here, we

14 explore the potential of environmental DNA (eDNA), the DNA present in the

15 as part of shed cells, tissues or mucus, to provide comprehensive information about fish

16 diversity in a large marine area. Further, eDNA results were compared to the fish diversity

17 obtained in pelagic trawls. A total of 44 5L-water samples were collected onboard a wide-

18 scale oceanographic survey covering about 120,000 square kilometres in Northeast Atlantic

19 Ocean. A short region of the 12S rRNA gene was amplified and sequenced through

20 metabarcoding generating almost 3.5 million quality-filtered reads. Trawl and eDNA samples

21 resulted in the same most abundant species (European , , Atlantic

22 and blue whiting), but eDNA metabarcoding resulted in more detected fish and

23 elasmobranch species (116) than trawling (16). Although an overall correlation between fish

24 biomass and number of reads was observed, some species deviated from the common trend,

25 which could be explained by inherent biases of each of the methods. Species distribution

26 patterns inferred from eDNA metabarcoding data coincided with current ecological

27 knowledge of the species, suggesting that eDNA has the potential to draw sound ecological

28 conclusions that can contribute to fish surveillance programs. Our results support eDNA

29 metabarcoding for broad scale marine fish diversity monitoring in the context of Directives

30 such as the Common Policy or the Marine Strategy Framework Directive.

31 Keywords: environmental DNA, metabarcoding, marine fish surveys, ,

32 Elasmobranchii

2 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

33 Introduction

34 Monitoring of marine biodiversity provides a baseline for policy implementation

35 towards a sustainable use of the marine environment and its resources. Among the traditional

36 methods for surveying marine fauna, trawling has been widely used, as identification and

37 quantification of large volumes of organisms is considered a reliable method for monitoring

38 fish and other marine populations (ICES, 2015; Massé, Uriarte, Angélico, & Carrera,

39 2018). Fish surveys using trawls are conditioned by the gear’s own characteristics (e.g., mesh

40 size, area of opening) and deployment parameters (e.g., towing speed, depth and diel

41 variation) (Heino et al., 2011). Consequently, besides being invasive and time consuming,

42 fish trawling in pelagic environments can be largely selective affecting diversity estimates and

43 knowledge of species composition (Fraser, Greenstreet, & Piet, 2007; ICES, 2004). For

44 instance, due to their large body size, fast swimming speed, and in some cases, scarcity, many

45 elasmobranch species are not thoroughly surveyed (Rago, 2004). Therefore, alternative

46 methods are needed, and advances in DNA sequencing and bioinformatics have opened new

47 avenues to assess marine biodiversity in a non-invasive manner (Danovaro et al., 2016; Rees,

48 Maddison, Middleditch, Patmore, & Gough, 2014).

49 In particular, the analysis of environmental DNA (eDNA), that is, the genetic material

50 shed and excreted by organisms to the environment, to characterise the biological

51 communities present in an environment (P. Taberlet, Coissac, Hajibabaei, & Rieseberg, 2012)

52 is gaining increasing attention for monitoring aquatic environments (Thomsen & Willerslev,

53 2015). Community composition can be inferred from eDNA samples through metabarcoding,

54 whereby the eDNA is collected from the water column through filtering, selectively amplified

55 through PCR using primers targeting a given barcode from a particular taxonomic group and

56 sequenced (Pierre Taberlet, Coissac, Pompanon, Brochmann, & Willerslev, 2012). The

3 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

57 resulting sequences are then compared against a reference database to perform biodiversity

58 inventories (Deiner et al., 2017). Most studies using eDNA metabarcoding for monitoring fish

59 communities are based in freshwater environments and have shown that eDNA

60 metabarcoding provides overall estimates that are equivalent or superior to traditional

61 methods such as visual surveys, trawling or electrofishing (Hänfling et al., 2016; Minamoto,

62 Yamanaka, Takahara, Honjo, & Kawabata, 2012; Pont et al., 2018).

63 As opposed to freshwater systems, the marine environment has in general a larger

64 water-volume to fish biomass ratio and is influenced by currents, implying that the eDNA is

65 less concentrated and disperses quicker (Hansen, Bekkevold, Clausen, & Nielsen, 2018).

66 This, coupled with a higher sympatric marine fish diversity, suggests that monitoring fish

67 diversity through eDNA sampling could be particularly challenging in the marine

68 environment. Indeed, only a handful of studies have applied eDNA metabarcoding for

69 monitoring fish and elasmobranchs in natural marine environments (e.g., O’Donnell et al.,

70 2017; Stat et al., 2017). Among them, only a few have compared eDNA and other traditional

71 surveying methods and are based on a very small area of a few square kilometres either in

72 ports (Jeunen et al., 2019; Sigsgaard et al., 2017; Thomsen et al., 2012) or in coastal areas

73 (Andruszkiewicz et al., 2017; DiBattista et al., 2017; Yamamoto et al., 2017) or have

74 performed comparisons at family level taxonomic assignments (Thomsen et al., 2016). Thus,

75 although these studies envision eDNA metabarcoding as a promising method for non-

76 invasive, faster, more efficient and reliable marine surveys, this needs still to be tested in the

77 context of a survey covering a broad marine area.

78 The Bay of Biscay is a biogeographical area in the North Atlantic Region covering

79 more than 220,000 Km2, at which the main economic activities include commercial .

80 Large populations of species such as the encrasicolus, the

4 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

81 European pilchard Sardina pilchardus, the European hake , the

82 Atlantic Mackerel Scomber scombrus and the Atlantic horse mackerel Trachurus trachurus

83 are dominant in the area (ICES, 2018). Fish diversity in the Bay of Biscay has been accounted

84 using mainly observational methods, fish trawling and acoustic surveys; thus, there is room

85 for incorporating and assessing the performance of eDNA-based surveys. This paper aims to

86 test the potential of eDNA metabarcoding to assess the fish community composition in a large

87 marine area, such as the Bay of Biscay. For that aim, we have compared eDNA

88 metabarcoding based estimates with those derived from fishing trawls catches and have

89 related eDNA metabarcoding based estimates with the known spatial distribution and

90 ecological patterns of the species in the area.

91

92 Methods

93 Sample collection

94 Fish catches and water samples were collected during the BIOMAN 2017 survey (Santos,

95 Ibaibarriaga, Louzao, Korta, & Uriarte, 2018) between the 5th and the 29th of May 2017

96 covering the area of about 120,000 km2 between the French continental shelf and the Spanish

97 shelf (Figure 1) on board the Emma Bardán and Ramón Margalef research vessels. Fish

98 catches were obtained on board the R/V Emma Bardán pelagic trawler. The trawl had an 8

99 mm mesh size end, and towing time and speed were 40 min and 4 knots, respectively. A

100 total of 44 stations were used for trawling. Although station depths varied between 26 and

101 3000 m, the maximum fishing depth was 156 m. Onboard, fish were morphologically

102 identified to species level or, when doubt, to the smallest taxonomic rank (e.g., family or

103 ). Biomass estimates were standardised as Kg caught per taxa. Water samples were

5 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

104 collected on board the R/V Ramón Margalef research vessel using the continuous circuit

105 intake of the ship at 4.4 m depth in 44 additional stations (Figure 1). A total of 5 L sea water

106 per station was filtered through Sterivex 0.45 µm pore size enclosed filters (Millipore) with a

107 peristaltic pump and kept at -20 °C until further processing.

108

109 DNA extraction and amplicon library preparation

110 DNA extractions were performed using the DNeasy® blood and tissue kit (Qiagen)

111 following the modified protocol for DNA extraction from Sterivex filters without preservation

112 buffer by Spens et al. (2017). DNA concentration was measured with the Quant-iT dsDNA

113 HS assay kit using a Qubit® 2.0 Fluorometer (Life Technologies, California, USA). DNA

114 from all 44 samples was amplified with the teleo_F/telo_R primer pair (hereafter ‘teleo’),

115 targeting a short region (60 bp) of the mitochondrial 12S rRNA gene, combined with the

116 human blocking primer teleo_blk (Valentini et al., 2016). PCR amplifications were done in a

117 final volume of 20 µl including 10 µl of 2X Phusion Master Mix (ThermoScientific,

118 Massachusetss, USA), 0.4 µl of each amplification primer (final concentration of 0.2 µM), 4

119 µl of teleo_blk (final concentration of 2 µM), 3.2 µl of MilliQ water and 2 µl of 10 ng/µl

120 template DNA. Samples from 4 stations were also amplified i) using the same procedure but

121 without the blocking primer and ii) using the mlCOIintF/dgHCO2198 primer pair (hereafter

122 ‘mlCOI’), targeting a short region (310 bp) of the COI gene (Meyer, 2003). The

123 thermocycling profile for PCR amplification included 3 min at 98 °C; 40 or 35 cycles (for

124 ‘teleo’ and ‘mlCOI’, respectively) of 10 s at 98 °C, 30 s at 55 or 46 °C (for ‘teleo’ and

125 ‘mlCOI’, respectively) and 45 s at 72 °C, and finally, 10 min at 72 °C. PCR products were

126 purified using AMPure XP beads (Beckman Coulter, California, USA) following

6 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

127 manufacturer’s instructions and used as templates for the generation of 12 x 8 dual-indexed

128 amplicons in the second PCR reaction following the ‘16S Metagenomic Sequence Library

129 Preparation’ protocol (Illumina, California, USA) using the Nextera XT Index Kit (Illumina,

130 California, USA). Multiplexed PCR products were purified using the AMPure XP beads,

131 quantified using Quant-iT dsDNA HS assay kit using a Qubit® 2.0 Fluorometer (Life

132 Technologies, California, USA) and adjusted to 4 nM. 5 µl of each sample were pooled,

133 checked for size and concentration using the Agilent 2100 bioanalyzer (Agilent Technologies,

134 California, USA), sequenced using the 2 x 300 paired end protocol on the Illumina MiSeq

135 platform (Illumina, California, USA) and demultiplexed based on their barcode sequences.

136

137 Reference database

138 Two reference databases were created for the ‘teleo’ barcode. A first ‘global’ database

139 included all Chordata 12S rRNA and complete mitochondrial genome sequences available

140 from GenBank (accessed in February 2018). By performing an all-against-all BLAST

141 (Altschul, Gish, Miller, Myers, & Lipman, 1990), potential sources of contamination or

142 erroneous taxonomic assignments were removed such as human contaminations (e.g., non-

143 human labelled sequences that matched at 100% identity with the Homo sapiens 12S rRNA

144 sequence) or cross-contaminated sequences (e.g., sequences arising from the same study that,

145 even when belonging to different genus, were 100% identical). All sequences were trimmed

146 to the ‘teleo’ region. for the GenBank sequences was retrieved using E-utilties

147 (Sayers, 2008) and modified to match that of the World Register of Marine Species: WoRMS

148 (Horton et al., 2018), forcing for seven taxonomic levels, i.e., Phylum, Subphylum, Class,

149 Order, Family, Genus, Species. This ‘global’ reference database contains 10,284 ‘teleo’

7 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

150 region sequences. For the second database, only sequences from target species were retrieved

151 so that more exhaustive error checking was possible. The list of the 1,858 fish species

152 expected in the Northeast Atlantic and Mediterranean areas was compiled from FishBase

153 (http://www.fishbase.org) and their corresponding scientific names and sequences were

154 obtained from NCBI (https://www.ncbi.nlm.nih.gov). For the retrieved records, only those

155 covering the ‘teleo’ region were selected and aligned (Table S1). A phylogenetic tree was

156 built with RAxML (Stamatakis, 2014) using the GTR-CAT model and visualized with iTOl

157 (Letunic & Bork, 2016). The tree was visually inspected and the records corresponding to

158 misplaced species were removed from the database. This ‘local’ reference database contains

159 ‘teleo’ region sequences of 612 species. For the ‘mlCOI’ barcode, the reference database

160 consisted in the COI sequences and their corresponding taxonomy obtained from the BOLD

161 (Ratnasingham & Hebert, 2007) database.

162

163 Read pre-processing, clustering and taxonomic assignment

164 Quality of raw demultiplexed reads was verified with FASTQC (Andrews, 2010).

165 Forward and reverse primers, were removed with cutadapt (Martin, 2011) allowing a

166 maximum error rate of 20%, discarding read pairs that do not contain the two primer

167 sequences and retaining only those reads longer than 30 nucleotides. Paired reads were

168 merged using pear (Zhang, Kobert, Flouri, & Stamatakis, 2014) with a minimum overlap of

169 20 nucleotides. Pairs with average quality lower than 25 Phred score were removed using

170 Trimmomatic (Bolger, Lohse, & Usadel, 2014). mothur (Schloss et al., 2009) was used to

171 remove reads i) not covering the target region, ii) shorter than 40 or 313 nucleotides, for

172 ‘teleo’ and ‘mlCOI’, respectively, iii) containing ambiguous positions and iv) being potential

8 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

173 chimeras, which were detected based on the UCHIME algorithm (Edgar, Haas, Clemente,

174 Quince, & Knight, 2011). Reads were clustered into OTUs using vsearch (Rognes, Flouri,

175 Nichols, Quince, & Mahé, 2016) at 97% similarity threshold and into SWARMs using Swarm

176 (Mahé, Rognes, Quince, de Vargas, & Dunthorn, 2014) with a d value of 1. In both cases, the

177 LULU post-clustering algorithm (Frøslev et al., 2017) was applied with a minimum threshold

178 of sequence similarity for considering any OTU/SWARM as an error of 97%. Taxonomic

179 assignment of unique reads and of representative sequences for each OTU or SWARM was

180 performed using the naïve Bayesian classifier method (Wang, Garrity, Tiedje, & Cole, 2007)

181 implemented in mothur using the 12S rRNA and COI databases described above. Reads with

182 the same taxonomic assignment were clustered into phylotypes.

183

184 Biodiversity analyses

185 Analyses were performed with the R packages Phyloseq (McMurdie and Holmes

186 2013) and Vegan (Oksanen et al., 2019). Sampling stations were classified into three

187 categories considering their depth and distance from the coast (see Map in Figure 1): shallow

188 stations where maximum station depth was <90 m, medium stations, when depth ranged

189 between 90 and 127 m and, deep stations where depth was >127 m. To assess differences in

190 fish diversity across sampling zones (i.e., according to shallow, medium and deep stations) we

191 calculated the Bray-Curtis dissimilarity index for relative abundance of species with the

192 function ordinate using only phylotypes with more than 10 reads. These distances were then

193 ordinated using a non-metric Multidimensional Scaling (NMDS) as implemented in Phyloseq

194 and differences between stations were tested with PERMANOVA (1,000 permutations) using

195 the function adonis within the R package Vegan previous testing for homogeneity of variance

9 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

196 using the function betadisper. A linear model was used on species with more than 1,000

197 reads, to test for the effect of the abundance of reads (previously standardised according to the

198 overall number of reads and stations per zone), and the distance from the coast. An overall

199 correlation between the log-transformed values of the number of reads obtained and the

200 biomass caught per species was explored with the Pearson correlation coefficient as

201 implemented in R package Stats. For an even geographic distribution between water and fish

202 sampling sites, a total of nine water sampling sites north La Rochelle were removed for the

203 correlation analysis. In addition, in order to compare eDNA and trawling based estimates at a

204 smaller scale, we created groups of stations so that this comparison was possible. For that

205 aim, we combined the data from all eDNA and trawling stations within < 20nm of each

206 eDNA station in what we call mega-stations. A total of 30 mega-stations resulted. A mantel

207 test as implemented in the R package ade4 (Dray & Dufour, 2007) was used to explore

208 differences between geographic and Bray-Curtis distances of the mega-stations. The list of

209 species commonly reported from the Bay of Biscay was obtained mainly from i) Basterretxea,

210 Oyarzabal, and Artetxe (2012), ii) the AZTI’s database on fish discards in the

211 area gathered according to EU regulation 2017/1004 of 17 May 2017, iii) the data obtained

212 from fish pelagic trawling during BIOMAN surveys from 2003 until 2019, iv) the ICES

213 database for International Bottom Trawling Surveys available from www.ices.dk and v) the

214 2017 Pélagiques Gascogne (PELGAS) integrated survey (Mathieu et al., 2019).

215

216 Results

217 Data quality and overall taxonomic composition

10 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

218 We obtained a total of 4,640,913 raw ‘teleo’ reads from which 3,366,264 (72%) were

219 retained after quality check for downstream analyses. The average number of ‘teleo’ reads per

220 sample was 70,131 (Table S2). Using the ‘global’ database, 99.88% of the reads were

221 classified as Actinopterygii or Elasmobranchii. The remaining 0.12% (4,343 reads) were

222 classified as mammals (40.16%) and birds (9.60%), with half of the reads (50.24%) not

223 classified into Class level (Figure S1). Only 14 reads in eight samples were specifically

224 assigned to H. sapiens. From these, two samples did not include the specific blocking primer

225 used, suggesting that samples held very little contamination from external sources. Using the

226 ‘local’ database, 99.98% of the reads were classified either as Actinopterygii or

227 Elasmobranchii and, depending on the clustering method used, the number of taxa recovered

228 varied. SWARM clustering yielded 90 OTUs identified at the species level (including 95.5%

229 of the reads) and vsearch, 109 (including 95% of the reads), whereas not clustering reads into

230 OTUs, but using phylotypes, resulted in 116 Actinopterygii and Elasmobranchii species

231 (including 95% of the reads) identified. Further analyses were based on phylotypes assigned

232 to the species level (Table S3) as no additional information is provided by using OTU

233 clustered reads. From the 116 identified species, 50 included more than 10 reads.

234 More than half of the reads are assigned to European anchovy, E. encrasicolus

235 (51.67%), followed by European pilchard, S. pilchardus (27.67%), Atlantic mackerel,

236 Scomber scombrus (4.96%), blue whiting, poutassou (2.36%), white

237 seabream, Diplodus sargus (1.52%) and axillary seabream Pagellus acarne (1.20%), which

238 together represent 89.4% of the reads (Figure 2A). A small percentage of the reads (0.27%)

239 were classified as Elasmobranchii, including seven species such as the Greenland ,

240 Somniosus microcephalus, the blue shark, Prionacea glauca and the undulate ray, Raja

241 undulata (Figure 2B).

11 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

242 As for the four samples amplified with ‘mlCOI’ primers, we obtained 389,665 raw

243 reads from which, 324,731 (83%) were retained for downstream analyses. The average

244 number of ‘mlCOI’ reads per sample retained after quality filtering is 81,183 (Table S2).

245 Using the BOLD database, 89.86% of the reads were classified into Phylum, 80.87% of which

246 were metazoans, and among them 47.88% were classified as arthropods and 2.51% as

247 (Figure S2). Within chordates 74.56% of the reads were classified as Actinopterygii

248 (1.87% of the overall reads), resulting in only seven taxa classified into species (Figure S2).

249

250 Comparison with fish trawling

251 Trawling operations during the BIOMAN survey resulted in a total of 18 taxa caught,

252 from which lanternfishes (Fam. Myctophidae) and mullets (Mugil sp.) were the only ones not

253 classified into species level. Qualitatively, a total of 10 species were identified both from the

254 eDNA and trawling catches (Figure 3A). Six species were collected during catches and not

255 detected through eDNA, namely Sprattus sprattus, Trachurus mediterraneus, Boops boops,

256 Zeus faber, Trisopterus luscus, and Capros aper (Table S4); from these, there are no

257 sequences for T. mediterraneus and B. boops in the reference database and the fact that we

258 find T. minutus in eDNA suggest that this could be actually T. luscus. To assess the

259 relationship between the biomass of fish caught and the number of reads obtained through

260 eDNA, data from T. mediterraneus and T. trachurus were combined into Trachurus spp. and

261 that from T. luscus and T. minutus into Trisopterus spp. There was an overall correlation

262 between fish biomass and number of reads per species although not significant at p < 0.05

263 (Figure 3B). E. encrasicolus was the most abundant species for both methods, whilst the

264 relative abundance for some species like Dicentrarchus labrax, M. poutassou and S.

265 pilchardus was higher when using eDNA. In contrast, the relative abundance of M.

12 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

266 merluccius, S. scombrus and Trachurus spp. was higher in catches than when using eDNA

267 (Figure 3B; Table S4). At a local scale, no significant correlation between eDNA and trawling

268 based abundances was found (Mantel test, r = -0.04 p = 0.646). In fact, eDNA data showed a

269 more constant abundance of the three most abundant species (E. encrasicolus, S. pilchardus

270 and S. scombrus), compared to trawl data, which showed in general a higher number of

271 species per station, except for those eight stations were E. encrasicolus was dominant (> 94%

272 of the catch) (Figure S3).

273

274 Species distribution patterns

275 A significant correlation was detected between species abundance differences and

276 geographic distances between stations both for eDNA (R2 = 0.38 p < 0.01) and trawling

277 stations (R2 = 0.20 p < 0.01). Yet, although for eDNA, pairs of distant stations tend to be

278 more different in taxonomic composition, pairs of nearby stations cover the full range of

279 Bray-Curtis distances (Figure 4). Comparisons between samples within same or distinct depth

280 category (shallow, medium, deep) or within same or distinct sampling methods (eDNA,

281 trawling) had no effect over the observed patterns (Figure S4).

282 The overall compositional pattern of our data showed significant differences between

283 species occurrence and sampling sites according to their zone (e.g., shallow, medium and

284 deep stations) (PERMANOVA F2,43 = 2.24, p < 0.05) (Figure 5). Within the main species

285 contributing to the spatial ordination of our data, two main groups can be broadly observed.

286 On one side, species like E. encrasicolus, M. merluccius, Coris julis S. scombrus, M.

287 poutassou, Lophius piscatorius, S. microcephalus, Xenodermichthys copei, and P. glauca

288 tended to be more abundant in deeper stations and their relative abundances increased in sites

13 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

289 >127m-deep (Figure 6). In contrast, a second loop in the spatial ordination of the data include

290 other species such as Gobius niger, Ammodytes dubius, D. sargus, Argentina silus, D. labrax,

291 S. pilchardus, Mola mola and Scomber colias (Figure 5). This information correlates with a

292 pattern of higher abundance in <90m-deep sites for, e.g., S. pilchardus, D. sargus, M. mola, A.

293 dubius, D. labrax and S. colias (Figure 4B). Relatively to the abundance of reads and station

294 depth, four species, namely A. silus, Glaucostegus cemiculus, G. niger and Pagellus

295 bogaraveo, remain unchanged between shallow and deep stations. Specifically, for

296 elasmobranch species, a pattern nicely correlated with higher relative abundances of typical

297 demersal species like R. undulata in shallow sites and pelagic species like S. microcephalus

298 and P. glauca in medium and deep sites (Figure 6). Species like Labrus merula and Buenia

299 affinis were among the most abundant in number of reads (>1,000 per species) but have not

300 been previously reported for the Bay of Biscay.

301

302 Discussion

303 This study shows how eDNA metabarcoding provides a comprehensive overview of

304 the fish diversity in a large-scale marine area. Compared to fish trawling, eDNA

305 metabarcoding was able to “capture” a larger number of fish species. Both, eDNA and

306 trawling based estimates (in number of reads and biomass, respectively) indicate that E.

307 encrasicolus represents half of the abundance, which is consistent to the known large and

308 stable anchovy population in the Bay of Biscay (Erauskin-Extramiana et al., 2019; Santos,

309 Uriarte, Boyra, & Ibaibarriaga, 2018; Uriarte, Prouzet, & Villamor, 1996) and with the fact

310 that the BIOMAN survey took place during the anchovy spawning season. The seven most

311 abundant species in fish trawling representing >1% of the total biomass were T. trachurus, S.

14 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

312 scombrus, T. mediterraneus, M. merluccius, M. poutassou, S. pilchardus and B. boops, which

313 were all, except those not present in the reference database (B. boops and T. mediterraneus),

314 also found in the eDNA metabarcoding data, and four of them (E. encrasicolus, S. pilchardus,

315 S. scombrus and M. poutassou) were also among the most abundant species from eDNA data.

316 Thus, concerning the most abundant species in the Bay of Biscay, eDNA and trawling data

317 provided comparable conclusions.

318 The following three species were caught during fish trawling but were absent from

319 eDNA data despite being present in the reference database, Z. faber, S. sprattus and C. aper.

320 One possible explanation for this false negative detection could be the little abundance of this

321 species’ DNA in the water, as suggested by the small and reduced number of catches (2.07 Kg

322 in 3 sites, 1.07 Kg in 2 sites and 0.34 Kg in 2 sites, respectively). In fact, a small number of

323 reads, i.e., 591, was also detected for Solea solea, a species from which 0.05 Kg were caught

324 in a single station. If this is the case, filtering larger volumes of water and increasing

325 sequencing depth could improve detection. Alternatively, reference sequences for Z. faber S.

326 sprattus and C. aper could be undetected errors in the reference database (Li et al., 2018), or

327 correspond to alternative intraspecific variants. On the other hand, in accordance with

328 previous studies, eDNA data resulted in about 100 more species (35 with more than 10 reads)

329 than trawling data collected simultaneously (Thomsen et al., 2012; Thomsen et al., 2016;

330 Yamamoto et al., 2017). For example, species such as D. sargus, P. acarne, M. mola, D.

331 labrax, L. piscatorius, Chelon ramada, A. dubius, G. niger, Ctenolabrus ruperstris, A. silus,

332 S. microcephalus and B. affinis were not found in catches, but were more abundant in eDNA

333 reads than the 5th most abundant species (M. merluccius) in catches. The fact that eDNA

334 results in a higher number of species could be partially attributed to the efficiency of the

335 method to detect benthic or coastal species, difficult to catch by pelagic trawling nets, focused

15 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

336 on small and medium-size pelagic species. To check to what extent eDNA is able to detect in

337 surface waters (4 m) demersal species we compared the results with the ICES International

338 Bottom Trawling Surveys (IBTS surveys) data for the Bay of Biscay from 2003 to 2019

339 (ICES, 2013) and with the 2017 Pélagiques Gascogne (PELGAS) integrated survey in the

340 same area (Mathieu et al., 2019). eDNA metabarcoding data was able to detect at least 31 out

341 of 164 species reported for the Bay of Biscay by IBTS surveys and 13 out of 45 species by

342 PELGAS survey (Figure S5). Although not being a thorough comparison, as time periods and

343 sampling seasons at least from IBTS surveys are different, the comparison provides an overall

344 sense of eDNA as a potential method for surveying a large marine area in a relatively simple

345 way. Differences in eDNA and pelagic trawl catchability can also explain the differences in

346 relative abundances of the species found by the two kind of sampling methods, such as S.

347 pilchardus, M. poutassou and D. labrax, with higher number of eDNA reads relative to the

348 biomass caught, or T. trachurus, S. scombrus and M. merluccius, showing the opposite.

349 However, similarity between both eDNA and trawling stations suggests that stations further

350 apart tend to be more different. The amount, quality and stability of DNA molecules are

351 largely affected by the production rate from each organism, diffusion of the molecules in the

352 water and its inherent degradation (Collins et al., 2018; Murakami et al., 2019; Thomsen et

353 al., 2012). But also, PCR amplification stochasticity and sequencing depth are known to affect

354 the number of reads obtained from an eDNA sample (DiBattista et al., 2017; Zinger et al.,

355 2019).

356 T. minutus, a morphologically similar species to T. luscus was identified through

357 eDNA, which make us raise the hypothesis that specimens collected from catches were

358 misidentified as T. luscus, potentially being T. minutus as eDNA revealed. This would not be

359 an isolated case where morphological characteristics difficult to observe hamper taxonomic

16 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

360 identification, and other available data (e.g., DNA) is needed for species identification

361 (Dayrat, 2005). A remarkable case are lanternfishes of the Myctophidae, where species

362 identification is based on the morphology and the shape and size of , which are

363 extremely fragile and seldom recovered intact (Cabrera-Gil et al., 2018). In this case, eDNA

364 can play a major role for species identification as this study has shown, where at least five

365 myctophid species were identified through eDNA. On the other hand, erroneous database

366 records or missing sequences can bias eDNA based estimates. The quality and completeness

367 of the reference database is crucial for taxonomic classification of eDNA data (Callahan,

368 McMurdie, & Holmes, 2017). For example, two species were among the most abundant in our

369 dataset, but not reported previously in the Bay of Biscay, namely L. merula and B. affinis. A

370 careful examination suggests that, although L. merula could be misled by its close relative L.

371 bimaculatus, occurring in the Bay of Biscay, the sequences attributed to B. affinis seem to be

372 correctly assigned (Figure S6), suggesting that eDNA was able to detect species not

373 previously reported in the area despite in low abundance.

374 Besides species diversity, eDNA also provides information on species distribution,

375 which is comparable to that expected in the area. For instance, the number of reads assigned

376 to the pelagic species M. poutassou and S. scombrus increased in stations deeper than 90m,

377 where preferred habitats for these species occur (Ibaibarriaga et al., 2007). A contrasting

378 pattern was observed for the greater argentine A. silus, a species commonly found at depths

379 between 50 and 200m (Basterretxea et al., 2012), but found in our data at shallower stations.

380 This could also suggest an incongruence with species identification with a close relative, in

381 this case A. sphyraena commonly found over the continental slope (Basterretxea et al., 2012),

382 but with no 12Sr RNA sequence in our reference database, or DNA from A. silus (even in its

383 form of egg or larvae) dispersed to shallower stations. Similarly, species like S. pilchardus, D.

17 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

384 sargus, D. labrax, P. acarne and Alosa spp. showed a distribution for this dataset in stations

385 less than 90m depth, as our eDNA revealed. Available data on the diversity of elasmobranch

386 species in the Bay of Biscay is limited, as most of these species are discarded from

387 commercial fisheries and landing data is incomplete (ICES, 2017; Rodríguez-Cabello, Pérez,

388 & Sánchez, 2013; Rusyaev & Orlov, 2013). Hence, in agreement to previous studies, our data

389 supports eDNA as a potential mechanism for detecting and studying the distribution of

390 elusive and deep-water species, which normally go undetected in fish trawl surveys, e.g.,

391 elasmobranchs, (Thomsen et al., 2016). In any case, eDNA results also revealed an ecological

392 pattern for elasmobranchs. For instance R. undulata, which has a high-site fidelity occurred

393 only in shallow waters (ICES, 2014), whilst large as S. microcephalus, P. glauca and

394 Lamna nasus predominantly occurred in deeper sites.

395 Aside from biological factors (e.g., individual shedding rate, persistence of DNA in

396 the water) that can alter the quantity of eDNA released to the environment, technical

397 considerations can introduce biases on the quality and number of reads generated per species

398 and hence inferences driven from them (Dejean et al., 2011; Lamb et al., 2019; Thomsen et

399 al., 2016). Reference databases are crucial to secure taxonomic assignment for data derived

400 from eDNA samples (Zinger et al., 2019). While recent analyses on the taxonomic annotation

401 of metazoan GenBank sequences suggest their reliability for eDNA metabarcoding studies

402 (Leray, Knowlton, Ho, Nguyen, & Machida, 2019; Li et al., 2018), we encountered the need

403 of including a thorough curation step for our ‘global’ database giving several mislabelled

404 sequences. Species-level annotations were not considered in Leray et al. (2019), and we found

405 incorrectly annotated sequences at all taxonomic levels. As environmental samples contain

406 highly complex DNA signal from various organisms, primer choice is critical for species-

407 level identification (Collins et al., 2019). We found that for our samples, the eukaryote

18 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

408 universal COI primers result in a very small proportion of reads assigned to Actinopterygii.

409 This is due to the fact that the primers target a large number of taxonomic groups, so larger

410 coverage is needed for producing robust data (Alberdi, Aizpurua, Gilbert, Bohmann, &

411 Mahon, 2018; Corse et al., 2019; Gunther, Knebelsberger, Neumann, Laakmann, & Martinez

412 Arbizu, 2018; Stat et al., 2017). The use of more specific primers in our study allowed the

413 specific detection of both Actinopterygii and Elasmobranchii. (Kelly, Port, Yamahara, &

414 Crowder, 2014; Miya et al., 2015). Yet the amount of reads attributed to Elasmobranchii is

415 small as ‘teleo’ primers were not specifically designed for this taxa, e.g., Kelly et al. (2014),

416 and recent developments on elasmobranch-specific primers (Miya et al., 2015) could

417 potentially be a powerful tool to increase the elasmobranch diversity in future marine surveys.

418 In addition, for closely related species such as Alosa alosa and Alosa fallax, the target barcode

419 was exactly the same, so being cautious we consider them as Alosa spp. Another crucial

420 methodological step is the clustering method. We showed that using a clustering method (i.e.,

421 vsearch and swarm) decreased the number of identified species, probably because the

422 algorithm merged sequences from different species into singular OTUs. Recent studies have

423 suggested that clustering techniques and the use of percentages of similarities specially in

424 short (<100 bp) sequences might mislead diversity estimates (Calderón-Sanou, Münkemüller,

425 Boyer, Zinger, & Thuiller, 2019; Callahan et al., 2017; Xiong & Zhan, 2018). Thus, procuring

426 a taxonomically comprehensive database with good quality sequences, and accurate data

427 curation steps is crucial for producing robust and reproducible ecological conclusions from

428 eDNA metabarcoding methods (Collins et al., 2019; Weigand et al., 2019). Including a

429 human-specific blocking primer in our samples had little effect, as we indeed detect, although

430 a small percentage (< 0.01%), reads identified as H. sapiens. The use of blocking primers in

431 metabarcoding analysis has been previously used to block dominant taxa in a specific

19 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

432 samples, for instance host DNA from diet analysis (Jakubavičiūtė, Bergström, Eklöf, Haenel,

433 & Bourlat, 2017), or human DNA from ancient samples (Boessenkool et al., 2012). Our

434 results suggest that our samples held very little contamination from external sources such as

435 human manipulation, air or input from land.

436 Alternative ways to survey marine biodiversity and unbiased evaluations of the

437 ecosystem components are needed as these provide the baseline for policy implementation in

438 the context of global marine directives (e.g., Common Fisheries Policy or the Marine Strategy

439 Framework Directive). eDNA metabarcoding is becoming a more accessible method that

440 generates reliable information for ecosystem surveillance and invites its application on regular

441 marine monitoring programs (Bohmann et al., 2014; Lacoursière-Roussel, Rosabal, &

442 Bernatchez, 2016; Takahara, Minamoto, Yamanaka, Doi, & Kawabata, 2012). Results in this

443 study have shown that eDNA samples provide information on fish diversity in a broad-scale

444 marine area such as the Bay of Biscay. We detected almost ten times more fish and

445 elasmobranch species compared with pelagic trawling. Some of these species can be

446 considered elusive or difficult to capture with traditional fishing methods. In addition,

447 ecological data was driven from eDNA, thus opening new avenues using molecular methods

448 in broad-scale marine surveys.

449

450 Acknowledgements

451 Authors are grateful to the crew of R/V Ramon Margalef and R/V Emma Bardán for their

452 support during filtering and collection of samples, and specially to Luis Ferrer, Marina

453 Chifflet, Bea Beldarrain and Carlota Pérez for their support on filtering onboard. Thanks are

454 due to Iker Pereda for bioinformatic support and to Elisabete Bilbao for technical assistance.

20 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

455 This project has been supported by the Department of Economic Development and

456 Infrastructure of Basque Government (projects GENPES and ECOPES) and by the Spanish

457 Ministry of Science, Innovation and Universities (project CTM2017-89500-R). This is

458 contribution number X [to be included upon acceptance] from the Marine Research Division

459 (AZTI).

460

461

21 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

462 References

463 Alberdi, A., Aizpurua, O., Gilbert, M. T. P., Bohmann, K., & Mahon, A. (2018). Scrutinizing key 464 steps for reliable metabarcoding of environmental samples. Methods in Ecology and Evolution, 9(1), 465 134-147. doi:10.1111/2041-210x.12849

466 Altschul, S. F., Gish, W., Miller, W., Myers, E. W., & Lipman, D. J. (1990). Basic local alignment 467 search tool. Journal of Molecular Biology, 215(3), 403-410. doi:10.1016/S0022-2836(05)80360-2

468 Andrews, S. (2010). FastQC: a quality control tool for high throughput sequence data. Retrieved from 469 http://www.bioinformatics.babraham.ac.uk/projects/fastqc

470 Andruszkiewicz, E. A., Starks, H. A., Chavez, F. P., Sassoubre, L. M., Block, B. A., & Boehm, A. B. 471 (2017). Biomonitoring of marine vertebrates in Monterey Bay using eDNA metabarcoding. PLoS 472 ONE, 12(4), e0176343. doi:10.1371/journal.pone.0176343

473 Basterretxea, M., Oyarzabal, I., & Artetxe, I. (2012). Guía faunística de especies (comerciales y no 474 comerciales) capturadas en el Golfo de Bizkaia por la flota vasca (AZTI Tecnalia Ed.). Sukarrieta 475 (Bizkaia).

476 Boessenkool, S., Epp, L. S., Haile, J., Bellemain, E., Edwards, M., Coissac, E., . . . Brochmann, C. 477 (2012). Blocking human contaminant DNA during PCR allows amplification of rare mammal species 478 from sedimentary ancient DNA. Molecular Ecology, 21(8), 1806-1815. doi:10.1111/j.1365- 479 294X.2011.05306.x

480 Bohmann, K., Evans, A., Gilbert, M. T. P., Carvalho, G. R., Creer, S., Knapp, M., . . . de Bruyn, M. 481 (2014). Environmental DNA for wildlife biology and biodiversity monitoring. Trends in Ecology & 482 Evolution, 29(6), 358-367. doi:https://doi.org/10.1016/j.tree.2014.04.003

483 Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina 484 sequence data. Bioinformatics (Oxford, England), 30(15), 2114-2120. 485 doi:10.1093/bioinformatics/btu170

486 Cabrera-Gil, S., Deshmukh, A., Cervera-Estevan, C., Fraija-Fernández, N., Fernández, M., & Aznar, 487 F. J. (2018). infections in lantern fish (Myctophidae) from the Arabian Sea: A dual role for 488 lantern fish in the life cycle of Anisakis brevispiculata? Deep Sea Research Part I: Oceanographic 489 Research Papers, 141, 43-50. doi:https://doi.org/10.1016/j.dsr.2018.08.004

490 Calderón-Sanou, I., Münkemüller, T., Boyer, F., Zinger, L., & Thuiller, W. (2019). From 491 environmental DNA sequences to ecological conclusions: How strong is the influence of 492 methodological choices? Journal of Biogeography, 0(0). doi:10.1111/jbi.13681

493 Callahan, B. J., McMurdie, P. J., & Holmes, S. P. (2017). Exact sequence variants should replace 494 operational taxonomic units in marker-gene data analysis. The ISME Journal, 11(12), 2639-2643. 495 doi:10.1038/ismej.2017.119

496 Collins, R. A., Bakker, J., Wangensteen, O. S., Soto, A. Z., Corrigan, L., Sims, D. W., . . . Mariani, S. 497 (2019). Non-specific amplification compromises environmental DNA metabarcoding with COI. 498 Methods in Ecology and Evolution. doi:10.1111/2041-210X.13276

499 Collins, R. A., Wangensteen, O. S., O’Gorman, E. J., Mariani, S., Sims, D. W., & Genner, M. J. 500 (2018). Persistence of environmental DNA in marine systems. Communications Biology, 1(1), 185. 501 doi:10.1038/s42003-018-0192-6

22 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

502 Corse, E., Tougard, C., Archambaud-Suard, G., Agnèse, J.-F., Messu Mandeng, F. D., Bilong Bilong, 503 C. F., . . . Dubut, V. (2019). One-locus-several-primers: A strategy to improve the taxonomic and 504 haplotypic coverage in diet metabarcoding studies. Ecology and Evolution, 9(8), 4603-4620. 505 doi:10.1002/ece3.5063

506 Danovaro, R., Carugati, L., Berzano, M., Cahill, A. E., Carvalho, S., Chenuil, A., . . . Borja, A. (2016). 507 Implementing and Innovating Marine Monitoring Approaches for Assessing Marine Environmental 508 Status. Frontiers in Marine Science, 3(213). doi:10.3389/fmars.2016.00213

509 Dayrat, B. (2005). Towards integrative taxonomy. Biological Journal of the Linnean Society, 85(3), 510 407-417. doi:10.1111/j.1095-8312.2005.00503.x

511 Deiner, K., Bik, H. M., Mächler, E., Seymour, M., Lacoursière-Roussel, A., Altermatt, F., . . . 512 Bernatchez, L. (2017). Environmental DNA metabarcoding: Transforming how we survey animal and 513 plant communities. Molecular Ecology, 26(21), 5872-5895. doi:10.1111/mec.14350

514 Dejean, T., Valentini, A., Duparc, A., Pellier-Cuit, S., Pompanon, F., Taberlet, P., & Miaud, C. (2011). 515 Persistence of Environmental DNA in Freshwater Ecosystems. PLoS ONE, 6(8), e23398. 516 doi:10.1371/journal.pone.0023398

517 DiBattista, J. D., Coker, D. J., Sinclair-Taylor, T. H., Stat, M., Berumen, M. L., & Bunce, M. (2017). 518 Assessing the utility of eDNA as a tool to survey reef-fish communities in the Red Sea. Coral Reefs, 519 36(4), 1245-1252. doi:10.1007/s00338-017-1618-1

520 Dray, S., & Dufour, A. (2007). The ade4 Package: Implementing the Duality Diagram for Ecologists. 521 Journal of Statistical Software, 22(4), 1-20. doi:10.18637/jss.v022.i04

522 Edgar, R. C., Haas, B. J., Clemente, J. C., Quince, C., & Knight, R. (2011). UCHIME improves 523 sensitivity and speed of chimera detection. Bioinformatics (Oxford, England), 27(16), 2194-2200. 524 doi:10.1093/bioinformatics/btr381

525 Erauskin-Extramiana, M., Alvarez, P., Arrizabalaga, H., Ibaibarriaga, L., Uriarte, A., Cotano, U., . . . 526 Chust, G. (2019). Historical trends and future distribution of anchovy spawning in the Bay of Biscay. 527 Deep Sea Research Part II: Topical Studies in Oceanography, 159, 169-182. 528 doi:https://doi.org/10.1016/j.dsr2.2018.07.007

529 Fraser, H. M., Greenstreet, S. P. R., & Piet, G. J. (2007). Taking account of catchability in groundfish 530 survey trawls: implications for estimating biomass. ICES Journal of Marine Science, 531 64(9), 1800-1819. doi:10.1093/icesjms/fsm145

532 Frøslev, T. G., Kjøller, R., Bruun, H. H., Ejrnæs, R., Brunbjerg, A. K., Pietroni, C., & Hansen, A. J. 533 (2017). Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity 534 estimates. Nat Commun, 8(1), 1188. doi:10.1038/s41467-017-01312-x

535 Gunther, B., Knebelsberger, T., Neumann, H., Laakmann, S., & Martinez Arbizu, P. (2018). 536 Metabarcoding of marine environmental DNA based on mitochondrial and nuclear genes. Scientific 537 Reports, 8(1), 14822. doi:10.1038/s41598-018-32917-x

538 Hänfling, B., Lawson Handley, L., Read, D. S., Hahn, C., Li, J., Nichols, P., . . . Winfield, I. J. (2016). 539 Environmental DNA metabarcoding of fish communities reflects long-term data from established 540 survey methods. Molecular Ecology, 25(13), 3101-3119. doi:10.1111/mec.13660

23 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

541 Hansen, B. K., Bekkevold, D., Clausen, L. W., & Nielsen, E. E. (2018). The sceptical optimist: 542 challenges and perspectives for the application of environmental DNA in marine fisheries. Fish and 543 Fisheries, 19(5), 751-768. doi:10.1111/faf.12286

544 Heino, M., Porteiro, F. M., Sutton, T. T., Falkenhaug, T., Godoe, O. R., & Piatkowski, U. (2011). 545 Catchability of Pelagic Trawls for Sampling Deep-living Nekton in the Mid North Atlantic. Retrieved 546 from IIASA, Laxenburg, Austria: http://pure.iiasa.ac.at/id/eprint/9822/

547 Horton, T., Kroh, A., Ahyong, S., Bailly, N., Boyko, C. B., Brandão, S. N., . . . Zhao, Z. (2018). World 548 Register of Marine Species (WoRMS). Retrieved from http://www.marinespecies.org

549 Ibaibarriaga, L., Irigoien, X., Santos, M., Motos, L., Fives, J. M., Franco, C., . . . Reid, D. G. (2007). 550 Egg and larval distributions of seven fish species in north-east Atlantic waters. Fisheries 551 Oceanography, 16(3), 284-293. doi:10.1111/j.1365-2419.2006.00430.x

552 ICES. (2004). Report of the Working Group on Fish Ecology (WGFE) (ICES CM 2004/G:09). 553 Retrieved from http://ices.dk/sites/pub/

554 ICES. (2013). Report of the International Bottom Trawl Survey Working Group (IBTSWG) (ICES CM 555 2013/SSGESST:10). Retrieved from http://ices.dk/sites/pub/

556 ICES. (2014). Report of the Working Group on Elasmobranch (WGEF) (ICES CM 557 2014/ACOM:19). Retrieved from http://ices.dk/sites/pub/

558 ICES. (2015). Manual for International Pelagic Surveys (IPS). . Retrieved from 559 http://ices.dk/sites/pub/

560 ICES. (2017). Report of the Working Group on Elasmobranchs (ICES CM 2017/ACOM:16). 561 Retrieved from http://ices.dk/sites/pub/

562 ICES. (2018). Bay of Biscay and the Iberian coast ecoregion. Retrieved from http://ices.dk/sites/pub/

563 Jakubavičiūtė, E., Bergström, U., Eklöf, J. S., Haenel, Q., & Bourlat, S. J. (2017). DNA 564 metabarcoding reveals diverse diet of the three-spined stickleback in a coastal ecosystem. PLoS ONE, 565 12(10), e0186929-e0186929. doi:10.1371/journal.pone.0186929

566 Jeunen, G.-J., Knapp, M., Spencer, H. G., Lamare, M. D., Taylor, H. R., Stat, M., . . . Gemmell, N. J. 567 (2019). Environmental DNA (eDNA) metabarcoding reveals strong discrimination among diverse 568 connected by water movement. Molecular Ecology Resources, 19(2), 426-438. 569 doi:10.1111/1755-0998.12982

570 Kelly, R. P., Port, J. A., Yamahara, K. M., & Crowder, L. B. (2014). Using Environmental DNA to 571 Census Marine Fishes in a Large Mesocosm. PLoS ONE, 9(1), e86175. 572 doi:10.1371/journal.pone.0086175

573 Lacoursière-Roussel, A., Rosabal, M., & Bernatchez, L. (2016). Estimating fish abundance and 574 biomass from eDNA concentrations: variability among capture methods and environmental 575 conditions. Molecular Ecology Resources, 16(6), 1401-1414. doi:10.1111/1755-0998.12522

576 Lamb, P. D., Hunter, E., Pinnegar, J. K., Creer, S., Davies, R. G., & Taylor, M. I. (2019). How 577 quantitative is metabarcoding: A meta-analytical approach. Molecular Ecology, 28(2), 420-430. 578 doi:10.1111/mec.14920

24 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

579 Leray, M., Knowlton, N., Ho, S.-L., Nguyen, B. N., & Machida, R. J. (2019). GenBank is a reliable 580 resource for 21st century biodiversity research. Proceedings of the National Academy of Sciences, 581 201911714. doi:10.1073/pnas.1911714116

582 Letunic, I., & Bork, P. (2016). Interactive tree of life (iTOL) v3: an online tool for the display and 583 annotation of phylogenetic and other trees. Nucleic Acids Research, 44(W1), W242-W245. 584 doi:10.1093/nar/gkw290

585 Li, X., Shen, X., Chen, X., Xiang, D., Murphy, R. W., & Shen, Y. (2018). Detection of Potential 586 Problematic Cytb Gene Sequences of Fishes in GenBank. Frontiers in genetics, 9, 30-30. 587 doi:10.3389/fgene.2018.00030

588 Mahé, F., Rognes, T., Quince, C., de Vargas, C., & Dunthorn, M. (2014). Swarm: robust and fast 589 clustering method for amplicon-based studies. PeerJ, 2, e593. doi:10.7717/peerj.593

590 Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. 591 EMBnet.journal, 17(1). doi:10.14806/ej.17.1.200

592 Massé, J., Uriarte, A., Angélico, M. M., & Carrera, P. (2018). Pelagic survey series for and 593 anchovy in ICES subareas 8 and 9 – Towards an ecosystem approach. (Vol. 332): ICES Cooperative 594 Research Report

595 Mathieu, D., Laurence, P., Patrick, G., Pierre, R., Patrick, L., & Erwan, D. (2019). Biotic data 596 collected during identification trawls on the PELGAS integrated survey in the Bay of Biscay. 597 SEANOE. https://doi.org/10.17882/59809

598 Meyer, C. P. (2003). Molecular systematics of cowries (Gastropoda: Cypraeidae) and diversification 599 patterns in the tropics. Biological Journal of the Linnean Society, 79(3), 401-459. doi:10.1046/j.1095- 600 8312.2003.00197.x

601 Minamoto, T., Yamanaka, H., Takahara, T., Honjo, M. N., & Kawabata, Z. i. (2012). Surveillance of 602 fish species composition using environmental DNA. Limnology, 13(2), 193-197. doi:10.1007/s10201- 603 011-0362-4

604 Miya, M., Sato, Y., Fukunaga, T., Sado, T., Poulsen, J. Y., Sato, K., . . . Iwasaki, W. (2015). MiFish, a 605 set of universal PCR primers for metabarcoding environmental DNA from fishes: detection of more 606 than 230 subtropical marine species. R Soc Open Sci, 2(7), 150088. doi:10.1098/rsos.150088

607 Murakami, H., Yoon, S., Kasai, A., Minamoto, T., Yamamoto, S., Sakata, M. K., . . . Masuda, R. 608 (2019). Dispersion and degradation of environmental DNA from caged fish in a marine environment. 609 Fisheries Science, 85(2), 327-337. doi:10.1007/s12562-018-1282-6

610 O’Donnell, J. L., Kelly, R. P., Shelton, A. O., Samhouri, J. F., Lowell, N. C., & Williams, G. D. 611 (2017). Spatial distribution of environmental DNA in a nearshore marine habitat. PeerJ, 5, e3044. 612 doi:10.7717/peerj.3044

613 Oksanen, J., Blanchet F, G., Friendly, M., Kindt, R., Legendre, P., McGlinn, D., . . . Wagner, H. 614 (2019). vegan: Community Ecology Package R. package version 2.5-5. Retrieved from 615 https://CRAN.R-project.org/package=vegan

616 Pont, D., Rocle, M., Valentini, A., Civade, R., Jean, P., Maire, A., . . . Dejean, T. (2018). 617 Environmental DNA reveals quantitative patterns of fish biodiversity in large rivers despite its 618 downstream transportation. Scientific Reports, 8(1), 10361. doi:10.1038/s41598-018-28424-8

25 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

619 Rago, P. J. (2004). Fishery-independent sampling: survey techniques and data analyses.

620 Ratnasingham, S., & Hebert, P. D. N. (2007). BOLD: The Barcode of Life Data System 621 (http://www.barcodinglife.org). Molecular Ecology Notes, 7(3), 355-364. doi:10.1111/j.1471- 622 8286.2007.01678.x

623 Rees, H. C., Maddison, B. C., Middleditch, D. J., Patmore, J. R. M., & Gough, K. C. (2014). 624 REVIEW: The detection of aquatic animal species using environmental DNA – a review of eDNA as a 625 survey tool in ecology. Journal of Applied Ecology, 51(5), 1450-1459. doi:10.1111/1365-2664.12306

626 Rodríguez-Cabello, C., Pérez, M., & Sánchez, F. (2013). New records of chondrichthyans species 627 caught in the Cantabrian Sea (southern Bay of Biscay). Journal of the Marine Biological Association 628 of the , 93(7), 1929-1939. doi:10.1017/s0025315413000271

629 Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahé, F. (2016). VSEARCH: a versatile open 630 source tool for metagenomics. PeerJ, 4, e2584-e2584. doi:10.7717/peerj.2584

631 Rusyaev, S. M., & Orlov, A. M. (2013). Bycatches of the greenland shark Somniosus microcephalus 632 (Squaliformes, ) in the barents sea and the adjacent waters under bottom trawling data. 633 Journal of , 53(1), 111-115. doi:10.1134/s0032945213010128

634 Santos, M., Ibaibarriaga, L., Louzao, M., Korta, M., & Uriarte, A. (2018). Index of biomass of 635 anchovy (Engraulis encrasicolus, L.) and sardine (Sardina pilchardus) applying the DEPM and 636 sighting in the Bay of Biscay 2017. In ICES (Ed.), Working group on acoustic and egg surveys for 637 sardine and anchovy in ICES Areas 7, 8 and 9 (WGACEGG). (pp. 158-198). 3-17 November 2017: 638 ICES WGACEGG REPORT 2017.

639 Santos, M., Uriarte, A., Boyra, G., & Ibaibarriaga, L. (2018). Anchovy DEPM surveys 2003-2012 in 640 the Bay of Biscay (Subarea 8): BIOMAN survey series. In J. Massé, A. Uriarte, M. M. Angélico, & P. 641 Carrera (Eds.), Pelagic survey series for sardine and anchovy in ICES subareas 8 and 9 - Towards an 642 ecosystem approach (Vol. 332, pp. 85-102): ICES Cooperative Research Report.

643 Sayers, E. (2008, Updated 2018 Oct 24). E-utilities Quick Start. Entrez Programming Utilities Help. 644 Retrieved from www.ncbi.nlm.nih.gov/books/NBK25500/

645 Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., . . . Weber, C. F. 646 (2009). Introducing mothur: Open-Source, Platform-Independent, Community-Supported Software for 647 Describing and Comparing Microbial Communities. Applied and Environmental Microbiology, 648 75(23), 7537. doi:10.1128/aem.01541-09

649 Sigsgaard, E. E., Nielsen, I. B., Carl, H., Krag, M. A., Knudsen, S. W., Xing, Y., . . . Thomsen, P. F. 650 (2017). Seawater environmental DNA reflects seasonality of a community. Marine 651 Biology, 164(6). doi:10.1007/s00227-017-3147-4

652 Spens, J., Evans, A. R., Halfmaerten, D., Knudsen, S. W., Sengupta, M. E., Mak, S. S. T., . . . 653 Hellström, M. (2017). Comparison of capture and storage methods for aqueous macrobial eDNA using 654 an optimized extraction protocol: advantage of enclosed filter. Methods in Ecology and Evolution, 655 8(5), 635-645. doi:10.1111/2041-210x.12683

656 Stamatakis, A. (2014). RAxML version 8: a tool for phylogenetic analysis and post-analysis of large 657 phylogenies. Bioinformatics, 30(9), 1312-1313. doi:10.1093/bioinformatics/btu033

26 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

658 Stat, M., Huggett, M. J., Bernasconi, R., DiBattista, J. D., Berry, T. E., Newman, S. J., . . . Bunce, M. 659 (2017). Ecosystem biomonitoring with eDNA: metabarcoding across the tree of life in a tropical 660 marine environment. Scientific Reports, 7(1), 12240. doi:10.1038/s41598-017-12501-5

661 Taberlet, P., Coissac, E., Hajibabaei, M., & Rieseberg, L. H. (2012). Environmental DNA. Molecular 662 Ecology, 21(8), 1789-1793. doi:10.1111/j.1365-294X.2012.05542.x

663 Taberlet, P., Coissac, E., Pompanon, F., Brochmann, C., & Willerslev, E. (2012). Towards next- 664 generation biodiversity assessment using DNA metabarcoding. Molecular Ecology, 21(8), 2045-2050. 665 doi:10.1111/j.1365-294X.2012.05470.x

666 Takahara, T., Minamoto, T., Yamanaka, H., Doi, H., & Kawabata, Z. i. (2012). Estimation of Fish 667 Biomass Using Environmental DNA. PLoS ONE, 7(4), e35868. doi:10.1371/journal.pone.0035868

668 Thomsen, P. F., Kielgast, J., Iversen, L. L., Moller, P. R., Rasmussen, M., & Willerslev, E. (2012). 669 Detection of a diverse marine fish fauna using environmental DNA from seawater samples. PLoS 670 ONE, 7(8), e41732. doi:10.1371/journal.pone.0041732

671 Thomsen, P. F., Moller, P. R., Sigsgaard, E. E., Knudsen, S. W., Jorgensen, O. A., & Willerslev, E. 672 (2016). Environmental DNA from Seawater Samples Correlate with Trawl Catches of Subarctic, 673 Deepwater Fishes. PLoS ONE, 11(11), e0165252. doi:10.1371/journal.pone.0165252

674 Thomsen, P. F., & Willerslev, E. (2015). Environmental DNA – An emerging tool in conservation for 675 monitoring past and present biodiversity. Biological Conservation, 183, 4-18. 676 doi:10.1016/j.biocon.2014.11.019

677 Uriarte, A., Prouzet, P., & Villamor, B. (1996). Bay of Biscay and Ibero Atlantic anchovy populations 678 and their fisheries. Scientia Marina, 60, 237-255.

679 Valentini, A., Taberlet, P., Miaud, C., Civade, R., Herder, J., Thomsen, P. F., . . . Dejean, T. (2016). 680 Next-generation monitoring of aquatic biodiversity using environmental DNA metabarcoding. 681 Molecular Ecology, 25(4), 929-942. doi:10.1111/mec.13428

682 Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian classifier for rapid 683 assignment of rRNA sequences into the new bacterial taxonomy. Applied and Environmental 684 Microbiology, 73(16), 5261-5267. doi:10.1128/AEM.00062-07

685 Weigand, H., Beermann, A. J., Čiampor, F., Costa, F. O., Csabai, Z., Duarte, S., . . . Ekrem, T. (2019). 686 DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and 687 recommendations for future work. Science of the Total Environment, 678, 499-524. 688 doi:https://doi.org/10.1016/j.scitotenv.2019.04.247

689 Xiong, W., & Zhan, A. (2018). Testing clustering strategies for metabarcoding-based investigation of 690 community–environment interactions. Molecular Ecology Resources, 18(6), 1326-1338. 691 doi:10.1111/1755-0998.12922

692 Yamamoto, S., Masuda, R., Sato, Y., Sado, T., Araki, H., Kondoh, M., . . . Miya, M. (2017). 693 Environmental DNA metabarcoding reveals local fish communities in a species-rich coastal sea. Sci 694 Rep, 7, 40368. doi:10.1038/srep40368

695 Zhang, J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: a fast and accurate Illumina Paired- 696 End reAd mergeR. Bioinformatics (Oxford, England), 30(5), 614-620. 697 doi:10.1093/bioinformatics/btt593

27 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

698 Zinger, L., Bonin, A., Alsos, I. G., Bálint, M., Bik, H., Boyer, F., . . . Taberlet, P. (2019). DNA 699 metabarcoding—Need for robust experimental designs to draw sound ecological conclusions. 700 Molecular Ecology, 28(8), 1857-1862. doi:10.1111/mec.15060

701 702

28 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

703 Figures 704

705 706 707 Figure 1. Study area and sampling sites for the BIOMAN 2017 survey in the Bay of Biscay. 708 Triangles represent eDNA sampling sites where station depth was <90 m, squares, eDNA 709 sampling sites with depths between 90 m and 127 m, and circles, eDNA sampling sites with 710 >127 m depths. Crosses are located where pelagic fishing trawls were deployed. 100 m and 711 200 m isobaths are shown. 712 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

713 714 Figure 2. Relative number of reads (%) assigned to (A) Actinopterygii and (B) 715 Elasmobranchii species recovered from eDNA metabarcoding. Note that 4.96% 716 Actinopterygii were not classified into species level. 717 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

718 719 Figure 3. (A) Venn diagram showing fish species caught in trawls and detected through 720 eDNA metabarcoding organised in decreasing order according to biomass or number of reads. 721 (B) Relationship between the log10-transformed values for the number of reads and biomass 722 in Kg from all fish species simultaneously found through eDNA and caught during fish 723 trawling. Shaded area represents the 95% confidence interval of the linear regression. 724 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

725 726 Figure 4. Scatterplot showing the overall relationship between Bray-Curtis distance and 727 geographic distance between pairs of eDNA (grey) and trawling (blue) stations. Pearson 728 correlation is shown for each data group. Shaded area represents the 95% confidence interval 729 of the linear regression. 730 731 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

732 733 Figure 5. Non-metric multidimensional scaling (NMDS) plot, with a stress of 0.15, showing 734 the similarity of species from each sample based on their relative abundance. The ellipse 735 shows the 95% distance based on the centroid of the 3 sampling zones groups (coastal, neritic 736 and oceanic). Spatial patterns of the species with >1,000 reads are shown. 737 bioRxiv preprint doi: https://doi.org/10.1101/864710; this version posted December 5, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

738 739 Figure 6. Linear relationship between depth and the relative number of reads obtained for 740 each species with >1,000 reads. Scale of the right-hand y-axis has been amplified to ease 741 visualization, dashed lines correspond to the left-hand y-axis, whilst continuous lines to the 742 right-hand y-axis.