Molecular Ecology Resources

Starting a DNA barcode reference library for shallow water polychaetes from the southern European Atlantic coast

Journal: Molecular Ecology Resources

Manuscript ID: Draft

Manuscript Type: Resource Article

Date Submitted by the Author: n/a

Complete List of Authors:
Lobo, Jorge; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho; MARE – Marine and Environmental Sciences Centre, Departamento de Ciências e Engenharia do Ambiente, Universidade Nova de Lisboa
Teixeira, Marcos; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho
Borges, Luisa; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho; Helmholtz-Zentrum Geesthacht, Centre for Material and Coastal Research
Ferreira, Maria; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho
Hollatz, Claudia; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho
Gomes, Pedro; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho
Sousa, Ronaldo; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho; CIIMAR/CIMAR – Interdisciplinary Centre of Marine and Environmental Research, Universidade do Porto
Ravara, Ascensão; CESAM – Centre for Environmental and Marine Studies, Departamento de Biologia, Universidade de Aveiro
Costa, Maria; MARE – Marine and Environmental Sciences Centre, Departamento de Ciências e Engenharia do Ambiente, Universidade Nova de Lisboa
Costa, Filipe; CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho

Keywords: Annelida, Benthos, Cytochrome c oxidase subunit I (COI-5P), Estuaries

1 Starting a DNA barcode reference library for shallow water polychaetes from the southern European

2 Atlantic coast

3 Keywords: Annelida, benthos, cytochrome c oxidase subunit I (COI-5P), estuaries, taxonomy

4

5 JORGE LOBO 1,2,*, MARCOS A. L. TEIXEIRA 1, LUISA M. S. BORGES 1,3, MARIA S. G. FERREIRA 1,

6 CLAUDIA HOLLATZ 1, PEDRO A. GOMES 1, RONALDO SOUSA 1,4, ASCENSÃO RAVARA 5, MARIA H.

7 COSTA 2, FILIPE O. COSTA 1

8

9 1CBMA – Centre of Molecular and Environmental Biology, Departamento de Biologia, Universidade do Minho,

10 Campus de Gualtar, 4710-057 Braga, Portugal, 2MARE – Marine and Environmental Sciences Centre,

11 Departamento de Ciências e Engenharia do Ambiente, Faculdade de Ciências e Tecnologia da Universidade

12 Nova de Lisboa, 2829-516 Monte de Caparica, Portugal, 3Helmholtz-Zentrum Geesthacht, Centre for Material

13 and Coastal Research, Max-Planck-Straße 1, 21502, Germany, 4CIIMAR/CIMAR – Interdisciplinary Centre of

14 Marine and Environmental Research, Universidade do Porto, Rua dos Bragas, 123, 4050-123 Porto, Portugal,

15 5CESAM – Centre for Environmental and Marine Studies, Departamento de Biologia, Universidade de Aveiro,

16 Campus de Santiago, 3810-193 Aveiro, Portugal.

17

18 * Correspondence: [email protected]

19

20 Abstract

21

22 Polychaetes, as a group, have seldom been the focus of dedicated DNA barcoding studies, despite their

23 ecological relevance and frequent dominance, particularly in soft-bottom estuarine and coastal marine ecosystems.

24 Here we report the first assessment of the performance of DNA barcodes in the discrimination of shallow water

25 polychaetes from the southern European Atlantic coast, focusing on specimens collected in estuaries and

26 coastal ecosystems of Portugal. We analysed cytochrome oxidase I DNA barcodes (COI-5P) from 164

27 specimens, which were assigned to 51 morphospecies. To our dataset from Portugal, we added available

28 published sequences selected from the same species, genus or family, to inspect for taxonomic congruence

29 among studies and collection location. The final dataset comprised 79 morphospecies and 290 specimens, which

30 generated 99 Barcode Index Numbers (BINs) within the Barcode of Life Data Systems (BOLD). Among these, 22

31 BINs were singletons, 47 other BINs were concordant, confirming the initial identification based on

32 morphological characters, and 30 were discordant, most of which consisted of multiple BINs found for the same

33 morphospecies. Some of the most prominent cases include Hediste diversicolor (O.F. Müller, 1776) (7), Eulalia

34 viridis (Linnaeus, 1767) (2) and Owenia fusiformis (delle Chiaje, 1844) (5), all of them reported from Portugal

35 and frequently used in ecological studies as environmental quality indicators. However, our results showed

36 discordance between molecular lineages and morphospecies, or revealed additional relatively divergent lineages.

37 The potential inaccuracies in environmental assessments, where the underpinning polychaete species diversity is

38 poorly resolved or clarified, demand additional and extensive investigation of the DNA barcode diversity in this

39 group, in parallel with alpha taxonomy efforts.

40 Introduction

41

42 Estuarine and coastal intertidal areas host a large number of benthic invertebrates, among which polychaetes are

43 one of the most representative classes and, therefore, important indicators of environmental quality in these

44 ecosystems (Sousa et al. 2006, 2008). They are also a key trophic link in the food chain, being

45 important prey for many species of conservation importance (e.g. fish, birds and mammals); they are

46 responsible for ecosystem functions that include nutrient cycling, and they act as ecosystem

47 engineers through their bioturbation and bioirrigation activities (Kristensen et al. 1985; Volkenborn et al. 2007). Some

48 polychaete species are also used as bait and are, therefore, economically important for fisheries in several

49 European countries (Gambi et al. 1994; Olive 1994; Gillet & Torresani 2003). However, polychaetes as a

50 whole are poorly studied in comparison to other taxa of similar ecological importance (Quijón & Snelgrove

51 2005).

52 Concerns with the adequate management of marine ecosystems gained more momentum with the

53 implementation of Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000. This

54 directive establishes a framework for Community action in the field of water policy, which requires the

55 preparation of management plans for each river basin, including estuaries, in order to achieve

56 good ecological and chemical status, and to help mitigate the effects of floods. Studies focusing on

57 polychaetes could contribute to this goal. However, species identification of this functional group based on

58 morphological characters is very difficult (Coull 1999). These difficulties are evident in the polychaete

59 records available in the BOLD database, where among the 10430 published records only 6055 were identified to

60 the species level (accessed on 8 October 2014).

61 Accurate identification based on morphological approaches can be extremely difficult and sometimes

62 impossible (Rouse & Pleijel 2001). The obstacles to reliable species discrimination include a

63 shortage of taxonomists, the use of incomplete identification keys and the collection of specimens degraded or physically

64 damaged by sampling techniques (Knowlton 1993). In addition, taxonomic ambiguities and

65 uncertainties are frequently generated by the presence of complex life stages and cryptic or hidden species

66 (Knowlton 1993; Jarman & Elliott 2000; Bickford et al. 2006; Nygren 2014).

67 Ecological and biogeographic studies rely upon the ability to distinguish between morphologically similar

68 species, as well as knowledge of their evolutionary relationships. In order to respond to this limitation, several

69 studies have examined variation in the mitochondrial DNA sequences of the cytochrome oxidase I gene and

70 confirmed DNA barcodes (Hebert et al. 2003a) as a reliable approach to discriminate species of polychaetes (e.g.

71 Glover et al. 2005; Bleidorn et al. 2006; Rice et al. 2008; Olson et al. 2009; Pleijel et al. 2009; Barroso et al.

72 2010; Nygren & Pleijel 2011). Some studies revealed that polychaete species

73 presumed to be cosmopolitan were in fact species complexes comprising several cryptic species, as

74 for example Eurythoe complanata (Pallas, 1766) (Barroso et al. 2010) and the Eumida sanguinea (Örsted, 1843)

75 complex, which comprises up to ten additional putative species (Nygren & Pleijel 2011). DNA barcoding

76 differs from conventional taxonomic approaches and identification tools by allowing direct comparison of

77 specimens with a global reference library, enabling the detection of cryptic species as well as the identification of species from

78 fragments, at any stage of the life cycle, thus creating a universal master key in a format that reduces ambiguity

79 (Costa & Carvalho 2007). In some cases, with the help of this molecular tool, morphological, ecological and

80 behavioral differences that were once overlooked can be detected after further examination of divergent

81 taxa (Hebert et al. 2004; Smith et al. 2006). Nevertheless, most existing DNA barcode studies concerning

82 polychaetes focused only on a particular species or genus (e.g. Jolly et al. 2006; Virgilio et al. 2009; Berke et al.

83 2010; Sampertegui et al. 2013). Very few addressed a greater diversity of species for this class of (e.g.

84 Aguado et al. 2007; Ravara et al. 2010; Norlinder et al. 2012), and others focused mostly on specimens from the

85 American continent, such as Canada, Alaska and other areas of the Arctic (Zanol et al. 2010; Carr et al. 2011; Hardy

86 et al. 2011), and from China (Zhou et al. 2010). According to Rouse & Pleijel (2001), the global number of

87 accepted polychaete species reaches 9000, not counting a few thousand more that were named but

88 are currently considered invalid. Admittedly, there are still many more polychaetes to be described, making this

89 class an important component in the diversity of marine . According to BOLD (accessed on 17 October

90 2014), only 793 species are reported to have published DNA barcodes, and Norway and the “Atlantic

91 Ocean” are the best represented European zones in terms of published COI sequences of Polychaeta (621

92 and 209 sequences, respectively). These contrast with the much higher number of sequences from Canada (2234)

93 and USA (1123), although 3230 sequences were unspecified regarding their geographical location.

94 The aims of the present study were: i) to contribute to the enrichment of a global reference library of

95 DNA barcodes for polychaetes from the European Atlantic coasts, starting with the polychaetes from Portugal;

96 and ii) to promote taxonomic revisions, highlighting putative hidden species and detecting new occurrences. This

97 reference library of DNA barcodes for polychaetes can be of paramount importance for biomonitoring of coastal

98 benthic communities using new high-throughput sequencing technologies (Costa & Antunes 2012), diagnosis of

99 little-known species from deep sea communities (Knox et al. 2012) or to gather species level data for the

100 investigation of important ecological topics such as trophic interactions, namely by the identification of

101 polychaete prey species in the gut of fish and other predators (see Smith et al. 2005).

102

103

104 Material and Methods

105

106 Sample collection

107

108 Sediment samples were collected from several sites along the Portuguese coast, in coastal lagoons and estuaries,

109 namely the Minho estuary, Lima estuary, Ria de Aveiro lagoon and Sado estuary (Fig. 1), using a corer sampler

110 (110 mm diameter), and then sieved through a 0.5 mm screen. Samples were transported to the laboratory under

111 refrigerated conditions. In the laboratory each specimen was photographed, preserved in absolute ethanol and

112 morphologically identified to species level or to the lowest possible taxonomic rank, using a stereomicroscope

113 and with the aid of species descriptions and keys in specialized monographs and literature (e.g. Fauvel 1927;

114 Hayward & Ryland 1995).

115

116 Genetic analysis

117

118 DNA extraction from a small piece of muscle tissue from each specimen was carried out using the E.Z.N.A.

119 Mollusc DNA Kit (Omega Bio-tek), following the manufacturer’s instructions. The barcode region of the

120 mitochondrial COI gene (COI-5P) was amplified in an iCycler™ (Bio-Rad) thermal cycler using a premade PCR mix

121 from Invitrogen™ and five alternative primer pairs (Table 1), depending on amplification success. PCR thermal

122 cycling conditions for each primer pair are also presented in Table 1. Each reaction contained 2.5 µl 10× PCR

123 buffer, 2.5 µl of 1.5 mM MgCl2, 0.5 µl of 0.2 mM dNTP mixture, 0.2 µl of 5 U/µl of DNA Taq polymerase plus

124 10 µM of each primer (1.5 µl for LoboF1/LoboR1, polyLCO/polyHCO and jgLCO1490/jgHCO2198; 0.5 µl for

125 LCO1490/HCO2198 and C_VF1LFt1/C_VR1LRt1) and 4 µl of DNA template, completed with sterile

126 milli-Q grade water to make up a total volume of 25 µl.
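
The reaction recipe above fixes every component except the water, which is whatever remains of the 25 µl. A minimal sketch of that arithmetic follows. It assumes the stated primer volume (1.5 or 0.5 µl) applies to each of the two primers separately, which the text does not state explicitly; the function name water_volume_ul is hypothetical.

```python
def water_volume_ul(primer_ul_each, total_ul=25.0):
    """Volume of water (µl) needed to complete one PCR reaction.

    Component volumes are taken from the reaction described in the text.
    Assumption: `primer_ul_each` is added once per primer (forward and reverse).
    """
    fixed_components_ul = {
        "10x PCR buffer": 2.5,
        "MgCl2": 2.5,
        "dNTP mixture": 0.5,
        "Taq polymerase": 0.2,
        "DNA template": 4.0,
    }
    used = sum(fixed_components_ul.values()) + 2 * primer_ul_each
    if used > total_ul:
        raise ValueError("components exceed the total reaction volume")
    # Round to avoid floating-point noise in the reported volume.
    return round(total_ul - used, 2)


# Primer pairs used at 1.5 µl per primer (e.g. LoboF1/LoboR1)
print(water_volume_ul(1.5))
# Primer pairs used at 0.5 µl per primer (e.g. LCO1490/HCO2198)
print(water_volume_ul(0.5))
```

Under this assumption the two primer regimes would need 12.3 µl and 14.3 µl of water, respectively.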

127 Free nucleotides and primers were removed from the PCR products using shrimp alkaline phosphatase

128 (GE Healthcare) and exonuclease I (Thermo Scientific Fermentas) and then sequenced bidirectionally using the

129 BigDye Terminator 3 kit, and run on an ABI 3730XL DNA analyser (all from Applied Biosystems™) by STAB

130 Vida Lda (Portugal).

131

132 Data treatment and analyses

133

134 All sequence trace files were edited individually to remove primers and any low-quality ends, and inspected

135 for ambiguous base calls. The resultant sequences were aligned using Clustal W implemented in MEGA v. 6.0

136 (Tamura et al. 2013) and carefully inspected for possible indels, stop codons or unusual amino acid sequence

137 patterns (see Song et al. 2008). The Neighbor-Joining (NJ) method was applied to construct phenograms using

138 the Kimura 2-parameter (K2P) model for the nucleotide-based phenogram (Kimura 1980) and the Jones-Taylor-

139 Thornton (JTT) model for the amino acid substitution (Jones et al. 1992). Intra- and interspecific distances were

140 also calculated using the K2P model. COI-5P sequences belonging to close taxa were mined from the GenBank and

141 BOLD databases and a dedicated dataset was created in BOLD comprising all the sequences used in this study.
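
For readers unfamiliar with the K2P model, the pairwise distance it yields can be sketched as below. This is an illustrative stand-alone implementation of the standard Kimura (1980) formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences; it is not the MEGA code used in the study. The function name and the choice to skip sites with gaps or ambiguous bases (pairwise deletion) are assumptions made for the example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption of this sketch). Raises ValueError if the sequences are
    too divergent for the logarithms to be defined.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    d = -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
    return d + 0.0  # normalises a possible -0.0 to 0.0


# One transition in ten sites (hypothetical toy sequences)
print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))
```

Multiplying the result by 100 gives the percentage divergences in the form reported in this study.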

142 We used BINs (Ratnasingham & Hebert 2013) provided by BOLD as a model for MOTU clustering for

143 all sequences, and a BIN discordance report was generated in order to enable comparison between

144 morphospecies and MOTUs generated by COI-5P sequence data. The taxonomic reliability of the species

145 records generated in this study was ranked using the grades A to E proposed by Costa et al. (2012), here

146 re-adapted to the use of BINs as MOTU delimitation criteria:

147 Grade A: External concordance: unambiguous BIN match with all specimens of the same morphospecies

148 from other BOLD projects or published sequences.

149 Grade B: Internal concordance: species’ BIN congruent within our dataset, with at least 3 specimens of

150 the same species examined. No matching sequences found from other studies.

151 Grade C: Suboptimal concordance (possible within species genetic structure): at least 3 specimens of the

152 same morphospecies are available within the library but they are split among more than one nearest neighboring

153 BINs.

154 Grade D: Insufficient Data: low number of specimens analysed (1 or 2 individuals) and no matching

155 sequence available in BOLD.

156 Grade E: Discordant species assignments: sequences for a given species in our dataset did not match

157 the BIN or BINs for the same species in BOLD. The specimen may match a BIN of a different species or

158 fall in a separate non-neighboring BIN.
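
The five grades above can be read as a decision procedure. The sketch below is one possible interpretation, not the procedure actually applied in the study: the function name, its arguments and, in particular, the order in which the criteria are tested are assumptions, and the judgement of whether BINs are nearest neighbours is left to the caller.

```python
def assign_grade(n_specimens, own_bins, external_bins=frozenset(),
                 bins_are_neighbours=True, shares_bin_with_other_species=False):
    """Assign a reliability grade (A-E) to one morphospecies.

    n_specimens: number of specimens of the species in the library.
    own_bins: BINs obtained for the species in the library.
    external_bins: BINs of the same species name from other projects
        or published sequences (empty if none are available).
    bins_are_neighbours: whether multiple BINs are nearest neighbours.
    shares_bin_with_other_species: whether a specimen falls in a BIN
        of a different species.
    The order of the checks is an assumption of this sketch.
    """
    own_bins = set(own_bins)
    external_bins = set(external_bins)
    if not own_bins:
        raise ValueError("at least one BIN is required")

    if shares_bin_with_other_species:
        return "E"                      # matches a different species
    if external_bins:
        if own_bins.isdisjoint(external_bins):
            return "E"                  # no match with the same species elsewhere
        if len(own_bins | external_bins) == 1:
            return "A"                  # external concordance
        return "C" if bins_are_neighbours else "E"
    if n_specimens < 3:
        return "D"                      # insufficient data
    if len(own_bins) == 1:
        return "B"                      # internal concordance
    return "C" if bins_are_neighbours else "E"


# Hypothetical BIN identifiers, for illustration only
print(assign_grade(5, {"BIN-1"}, {"BIN-1"}))   # external concordance
print(assign_grade(2, {"BIN-2"}))              # too few specimens
```

Cases that the written criteria leave open, such as a species with fewer than three specimens split among several BINs, are resolved arbitrarily here and would need a rule of their own in practice.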

159

160 Hediste diversicolor complex

161

162 Preliminary analyses of Hediste diversicolor sequences, and the existence of published evidence for a species

163 complex (Audzijonyte et al. 2008; Virgilio et al. 2009), justified a dedicated independent analysis for this

164 species. We merged COI sequences from the studies of Audzijonyte et al. (2008) and Virgilio et al. (2009) with

165 our own COI-5P data for this species, and complemented this dataset with available COI data for a congeneric

166 species – Hediste atoka Sato & Nakashima, 2003 – obtained from Japan (Tosuji & Sato 2010).

167 Sequences were aligned using Clustal W implemented in MEGA v. 6.0, and pruned to 318 bp to

168 standardize to the fragment length used by Virgilio et al. (2009). A Maximum Likelihood (ML) phylogenetic

169 reconstruction was performed in MEGA v. 6.0 using the best-fit substitution model GTR+Γ+I as indicated by the

170 same software. Node support was assessed through 1000 bootstrap replicates. Average genetic distance among

171 the most evident sequence clusters, detected through inspection of the ML tree, was determined using the K2P

172 substitution model. K2P was chosen for this purpose to enable direct comparison of genetic distances with

173 related studies.

174

175

176 Results

177

178 A total of 164 original COI sequences from 51 polychaete morphospecies (9 specimens identified only to genus

179 level and 12 specimens to family level) belonging to 20 families were generated in this study together with data

180 from the study by Lobo et al. (2013) (PolyPT dataset in Table 2). Of the COI-5P sequences, 47% were produced

181 using primer set 1 (LoboF1/LoboR1), 27% with primer set 2 (polyLCO/polyHCO), 16% with primer set 3

182 (LCO1490/HCO2198), 8% with primer set 4 (C_VF1LFt1/C_VR1LRt1) and 1% with primer set 5

183 (jgLCO1490/jgHCO2198). An additional 126 COI-5P sequences mined from BOLD and GenBank were added to

184 the alignment (Table 2). Taxonomic classification, number of specimens and their geographic origin are shown

185 in Table 3. COI-5P sequences with 658 bp were obtained for 58% of specimens (95), while the remaining

186 individuals had sequences between 500 and 657 bp. Upon aligning and translating all sequences no stop codons

187 were found. However, we detected a 3-nucleotide deletion, which was exclusively present in all seven species of

188 the family Capitellidae, including those added from GenBank. The deletion starts at nucleotide position 103

189 of the barcode region and, upon translation, corresponds to the deletion of an amino acid residue in position 35,

190 which is typically aspartic acid (Asp) in the remaining sequences, except in the species Cirratulus spectabilis

191 (Kinberg, 1866), in which it is glutamic acid (Glu).

192

193 Intra- and interspecific divergences

194

195 Global intra- and interspecific distances for all the polychaete species and genera under analysis are

196 provided in Table 4. The mean intraspecific distance was 2.2% (range 0.0–33.3%), while the average

197 congeneric distance was 24.0% (range 0.3–35.4%) and the average within-family distance was slightly higher,

198 24.4% (range 14.2–35.5%). Using PolyPT data only, the intraspecific divergence was substantially lower (0.4%) and

199 the minimum congeneric distance (14.4%) exceeded the maximum intraspecific divergence (6.0%). Reanalysis

200 of the global dataset excluding most cases of flagrant species mismatches (9 ambiguous cases), which will be

201 discussed later, produced considerably different average divergences: 0.5% (range 0–3.3%) and 23.7% (range 20.0–

202 31.1%) for intra- and interspecific divergences, respectively.
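
The comparison between the maximum intraspecific and the minimum congeneric distance amounts to a simple barcode gap test, sketched below under stated assumptions. The function name is hypothetical, and only the extreme values passed in the usage lines (6.0% and 14.4% for PolyPT; 33.3% and 0.3% for the global dataset) come from the text.

```python
def barcode_gap(intraspecific, congeneric):
    """Test for a barcode gap in two collections of distances (in %).

    Returns (gap_exists, gap_width), where gap_width is the minimum
    congeneric distance minus the maximum intraspecific distance.
    A negative width means the two distributions overlap.
    """
    max_intra = max(intraspecific)
    min_inter = min(congeneric)
    return min_inter > max_intra, round(min_inter - max_intra, 2)


# PolyPT data only: extremes reported in the text
print(barcode_gap([6.0], [14.4]))
# Global dataset before removing mismatches: extremes reported in the text
print(barcode_gap([33.3], [0.3]))
```

With these extremes, the PolyPT data would show a gap of 8.4 percentage points, whereas the unfiltered global dataset would show a complete overlap.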

203

204 NJ phenogram

205

206 Figure 2 shows a NJ phenogram produced using 290 COI-5P sequences from 79 polychaete morphospecies.

207 Thirteen species are represented by only one sequence, and 56 of the remaining 66 species were grouped in

208 monophyletic clades with low divergence (BIN-concordant clades). With a few exceptions, families and orders

209 also grouped congruently with the current classification. Our sequences for Pista cristata (Müller, 1776) and

210 Glycera alba (O.F. Müller, 1776) did not group in the same clade with specimens of these species obtained from

211 other studies. Seven species displayed comparatively high intraspecific divergences (> 3%), splitting into more

212 than one lineage with high bootstrap support, (2), Hediste diversicolor (detailed results below),

213 Nereis pelagica Linnaeus, 1758 (3), Owenia fusiformis (5, of which 3 from Portugal) , Praxillella praetermissa

214 (Malmgren, 1865) (3) and Trypanosyllis zebra (Grube, 1860) (4).

215

216 BINs and ranking system

217

218 A BIN summary can be found in Table 5. Regarding the taxonomically concordant BINs, only 15% of the original

219 sequences generated in this study could be compared to external data. There were no available data for the

220 remaining 85% of PolyPT specimens. Discordant BINs were analysed and filtered in order to remove outlier

221 taxonomic identifications. For example, some polychaete COI-5P lineages appear in the BOLD Systems database as

222 different species (e.g. Nereis pelagica and Nereis pelagica CMC02), which would make the BIN discordant; such a BIN

223 was therefore considered concordant in this study. The BIN for one record (MPCPT03814 Eteone flava (Fabricius,

224 1780)) was not attributed because the quality standards were not met. In total, 99 BINs were attributed to the 79 analysed

225 species.

226 The ranking system was applied to thirty of the fifty-one species identified at the species taxonomic level.

227 The ranking grades A and/or B (high taxonomic reliability) were attributed to 15 species (50%): Diopatra

228 neapolitana (Delle Chiaje, 1841), Eteone flava, Euclymene robusta (Arwidsson, 1906), Euclymene

229 santandarensis (Rioja, 1917), Heteromastus filiformis (Claparède, 1864), Leiochone leiopygos (Grube, 1860),

230 Marphysa sanguinea (Montagu, 1815), Neanthes fucata (Savigny in Lamarck, 1818), Nephtys hombergii

231 (Savigny in Lamarck, 1818), Nereis falsa Quatrefages, 1866, Notomastus profondus (Eisig, 1887), Perinereis

232 cultrifera (Grube, 1840), Sabellaria alveolata (Linnaeus, 1767), Scolelepis foliosa (Audouin & Milne Edwards,

233 1833) and Trichobranchus glacialis Malmgren, 1866; 4 species (13%) showed a high intraspecific divergence

234 (grade C): Hediste diversicolor, Nereis pelagica, Owenia fusiformis and Platynereis dumerilii (Audouin & Milne

235 Edwards, 1834); 6 species (20%) were attributed a grade D (insufficient data): Axiothella constricta (Claparède,

236 1869), Cirriformia tentaculata (Montagu, 1808), Lysidice ninetta (Audouin & Milne Edwards, 1833),

237 Lumbrineris latreilli Audouin & Milne Edwards, 1834, Sabella pavonina (Savigny, 1822) and Sthenelais boa

238 (Johnston, 1833); and only 5 species (17%) were attributed a grade E (incongruent DNA barcodes): Eulalia

239 viridis, Glycera alba, Pista cristata, Praxillella praetermissa and Trypanosyllis zebra. Grades B and C were

240 the most represented.

241

242 Hediste diversicolor complex

243

244 From the alignment of compiled COI sequences (this study; Audzijonyte et al. 2008; Virgilio et al. 2009) of H.

245 diversicolor, a ML tree was produced (Fig. 3) that comprised 4 highly divergent clades (3 of them previously

246 recognized by Virgilio et al. 2009), namely: I) from northeast Atlantic coasts (from Germany to Morocco) to the

247 western Mediterranean Sea; II) Mediterranean Sea; III) Black and Caspian Seas; and IV) Adriatic Sea. Average

248 distance between these 4 clades was 8% (range 5.7% to 9.1%). All sequences generated from Portugal grouped

249 within clade I, which displayed an average distance of 3.8% (0.0% – 8.7%). Sequences of specimens from

250 Portugal had average genetic distances of 3.5% (0.0% – 5.6%), did not form a separate subclade, as previously

251 observed (Virgilio et al. 2009), and were assigned to 7 different BINs (only considering the PolyPT dataset). The

252 relatively high distances found among specimens from Portugal were detected both among and within sampling

253 locations (Minho and Lima estuaries and Aveiro lagoon). Some specimens of H. diversicolor collected at

254 exactly the same site and collection event displayed genetic distances as high as 5% (e.g. specimens collected in the Lima

255 estuary).

256

257 Discussion

258

259 Reference library of DNA barcodes

260

261 This study contributes DNA barcodes for 51 morphospecies of shallow water polychaetes from the Atlantic

262 European coast, of which 15 species are new additions to the global reference library of DNA barcodes.

263 The efficiency of DNA barcodes in species discrimination relies on the occurrence of a gap between ‘within species’

264 and ‘among genus’ COI distances (Bucklin et al. 2010; Costa & Carvalho 2010). We found such a distance

265 gap within the Portuguese dataset analysed here. Average within species (0.6%) and among genus (21.2%)

266 distances were comparable to those found in a major DNA barcoding study in this taxonomic group (Carr et al.

267 2011), and within the range that has been found in other marine invertebrate groups (Costa et al. 2009; Matzen

268 da Silva et al. 2011). The value for the average within species divergence increased to 2.2% when we added

269 publicly available sequences from other authors to our analyses. This sizeable increase appears to originate

270 mostly from several cases of discordance between morphology-based identifications and BINs (multiple

271 morphospecies within a BIN, or a single species split among two or more BINs). These particular cases are

272 discussed in detail further below.

273

274 Within our reference library, species records with reliability grades A or B amount to only 50.0%, which is

275 substantially lower than what has been determined for other reference libraries, for example for fish of the NE

276 Atlantic (84.9%, Knebelsberger et al. 2014). This does not imply that polychaete libraries are inherently less

277 reliable, or that DNA barcoding performs less effectively in this group. Indeed, as much as 20.0% of our species

278 records lacked matching sequences from other studies for comparison or did not have enough data (grade D) to

279 be attributed a higher grade. Species displaying comparatively high within-species divergences (grade C)

280 account for 13.0% of the species records, and require further inspection to confirm or refute their taxonomic

281 status, including checking the possibility of the occurrence of hidden species. Finally, only 17.0% of the species

282 displayed mismatches between morphology-based identifications and BINs (grade E), which may include cases

283 of misidentification, as discussed further below. Globally, these values reflect the still very incipient stage of

284 completion of the global reference library of DNA barcodes for this group.

285

286 Comparison between morphology-based identifications and BINs

287

288 Considering the global dataset, compared to the number of species identified based on morphology (79), the

289 corresponding number of BINs was much higher (99). Excluding the 22 singleton BINs, which constitute

290 additions of species records to the global reference library, among the remaining 77 BINs, only 47 were

291 concordant, displaying a one-to-one link with morphologically identified species. Some of them include

292 specimens displaying small genetic distances (<1%), although they originated from populations geographically

293 very distant from each other (e.g. Diopatra neapolitana and Sthenelais boa), including populations from

294 European and North American Atlantic coasts (Lysidice ninetta). In cases where BINs contained higher

295 numbers of members, it was possible to have increased confidence in the taxonomic identifications of those

296 specimens. COI-5P sequences belonging to the species Diopatra neapolitana and Owenia fusiformis, with 85 and

297 31 members, respectively, are two good examples of such a situation (but see discussion about O. fusiformis

298 further below).
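
The split of BINs into singleton, concordant and discordant categories can be illustrated with a short sketch. It is a simplified reading of the categories as used in the paragraph above, not BOLD's own discordance report: the function name, the BIN identifiers and the rule that a one-member BIN is always counted as a singleton are assumptions of the example.

```python
from collections import defaultdict


def classify_bins(records):
    """Classify BINs from (bin_id, species_name) specimen records.

    singleton:  the BIN has a single member.
    concordant: one species in the BIN, and that species occurs in no
                other BIN (a one-to-one link).
    discordant: several species share the BIN, or the species is split
                among two or more BINs.
    """
    species_by_bin = defaultdict(set)
    members_by_bin = defaultdict(int)
    bins_by_species = defaultdict(set)
    for bin_id, species in records:
        species_by_bin[bin_id].add(species)
        members_by_bin[bin_id] += 1
        bins_by_species[species].add(bin_id)

    categories = {}
    for bin_id, species_set in species_by_bin.items():
        if members_by_bin[bin_id] == 1:
            categories[bin_id] = "singleton"
        elif len(species_set) == 1 and len(bins_by_species[next(iter(species_set))]) == 1:
            categories[bin_id] = "concordant"
        else:
            categories[bin_id] = "discordant"
    return categories


# Hypothetical BIN identifiers attached to species named in the text
records = [
    ("BIN-1", "Diopatra neapolitana"), ("BIN-1", "Diopatra neapolitana"),
    ("BIN-2", "Hediste diversicolor"), ("BIN-2", "Hediste diversicolor"),
    ("BIN-3", "Hediste diversicolor"), ("BIN-3", "Hediste diversicolor"),
    ("BIN-4", "Eteone flava"),
]
print(classify_bins(records))
```

In this toy input the two Hediste diversicolor BINs come out as discordant because one morphospecies is split between them, which mirrors the most common source of discordance reported here.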

299

300 Discordant BINs may occur for several reasons, and may have a biological or operational origin (Ratnasingham

301 & Hebert 2013). The main biological reasons include the occurrence of several relatively distant lineages within

302 a species, or DNA barcode sequence sharing among species. Operational reasons include misidentifications,

303 sample contamination, sample mislabeling, species names syntax-derived faults or inaccuracies of the BIN

304 delineation algorithm (Hebert et al. 2003b; Ratnasingham & Hebert 2013). The 30 discordant BINs resulted

305 from several of the above reasons. Twenty-six BINs originated from multiple relatively distant lineages within a

306 morphospecies, namely Eulalia viridis (2 BINs), Hediste diversicolor (7 BINs), Owenia fusiformis (5 BINs),

307 Platynereis dumerilii (2 BINs), Praxillella praetermissa (3 BINs), Nereis pelagica (3 BINs) and Trypanosyllis

308 zebra (4 BINs). Most of these multiple within-species lineages were already documented in previous studies.

309 Hediste diversicolor is a particular case that contributes a significant proportion of the discordant BINs and

310 will be discussed separately below.

311

312 DNA barcode sequences of Eulalia viridis collected in Portugal differed from sequences of conspecifics

313 collected in Russia (Hardy et al. 2011) by as much as 22.0%. Because we did not find sequences from other studies

314 that would match the ones from Portugal, we could not conclude whether these results derived from inaccurate

315 taxonomic identification or from detection of undescribed species. However, there is already evidence from

316 previous studies using biochemical and surgical morphological analysis proposing that E. viridis is a complex of

317 two species in Northern Europe. Local populations along the coasts of Northern Europe differ slightly in

318 coloration and time of reproduction; on the western coast of Sweden the reproductive cycle starts 4 to 6 weeks

319 earlier than on the coasts of the United Kingdom and France (Olive 1975; Pleijel 1993). Bonse et al. (1996)

320 divided the complex into E. viridis and Eulalia clavigera (Audouin & Milne Edwards, 1833), the first one found in

321 Sweden, Denmark and Germany, and the second in France and England. It was reported that both species are

322 morphologically similar, and a possible explanation is that larvae of populations of the original common species

323 that separated after the Last Ice Age are no longer able to cross the North Sea barrier. Alternatively, it is also

324 conceivable that while the Ice Age was still in progress, populations became reproductively isolated, giving rise

325 to distinct species. Considering solely geographical proximity with the type locality, one would assume that the

326 sequences from Russia might belong to E. viridis and the ones from Portugal would correspond to E. clavigera,

327 although definitive conclusions require examination of additional specimens and populations.

328
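The percentage divergences quoted in this section can be illustrated with a short calculation. The sketch below assumes that distances are Kimura 2-parameter (K2P) distances (Kimura 1980), as is common in DNA barcoding; the function name `k2p_distance` is hypothetical, and the code is an illustration rather than the procedure used to produce the values reported here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    valid = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        valid += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if valid == 0:
        raise ValueError("no comparable sites")
    p = transitions / valid      # proportion of transitional differences
    q = transversions / valid    # proportion of transversional differences
    # d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

For very divergent (saturated) sequence pairs the logarithm is undefined and `math.log` raises a `ValueError`; the sketch leaves that case to the caller.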

The morphospecies Owenia fusiformis was second in the number of BINs (5), and displayed an equal number of distinct lineages in the NJ tree (L1–L5), diverging between 14.0 and 22.0%. Specimens from Canada (L1) and USA (L2) (Carr et al. 2011) are grouped in their own separate branches, diverging by about 14.0%, whereas the 3 lineages from the European Atlantic coast (L3–L5; Jolly et al. (2006) and our data) diverge on average by 18.0% and are more conspicuous, each comprising specimens from both Brittany and Portugal. The 3 sympatric lineages of O. fusiformis in the NE Atlantic were first reported by Jolly et al. (2006), who found evidence of the persistence of this species in small northern glacial refugia, and of northwards range expansion from regions situated closer to the Mediterranean. However, whether the expansion towards the northeast Atlantic actually reflects separate interglacial periods is unclear, owing to the lack of a molecular clock calibration for coastal polychaete species (Jolly et al. 2006). All O. fusiformis specimens examined in our study were collected in the Ria de Aveiro lagoon (NW Portugal) and grouped in a lineage (L5) separate from the specimens from the Sado estuary (SW Portugal) examined by Jolly et al. (2006). This indicates that as many cryptic O. fusiformis lineages are present in Portugal as were previously found in Brittany.

A possible species complex within Trypanosyllis zebra (Syllidae) was signaled after comparison of the COI-5P sequences generated in this study with three previously published sequences (Aguado et al. 2007, 2012), resulting in 4 BINs diverging on average by 27.0%. Since the morphology of T. zebra is very characteristic, with conspicuous ‘zebra-like’ dorsal stripes (Fauvel 1927; Hayward & Ryland 1995), and therefore hard to miss, an identification error is less probable in this case. Because of the low number of sequences examined, comparisons among populations are very limited. Notably, within the family Syllidae, at least 3 species complexes have already been reported, including Syllis alternata Moore, 1908 and Syllis elongata Day, 1949 (Carr et al. 2011) and Syllis gracilis Grube, 1840 (Maltagliati et al. 2000). The last examples of species distributed among multiple BINs emerging from the dataset comprise cases of moderate divergence among lineages and low numbers of sequences examined, which prevents further inferences. In the case of Platynereis dumerilii, specimens from Portugal and Italy diverge by around 2.9%, whereas in Marphysa sanguinea specimens from Portugal and France diverge by 2.5%. These differences were enough to split P. dumerilii into two BINs, whereas only one BIN was attributed to M. sanguinea. Our sequences of Nereis pelagica, together with the ones from Carr et al. (2011), were split into 3 distinct lineages according to their geographic location (Canada, Arctic Ocean; USA, Alaska; Europe, Portugal, Russia plus one specimen from the NW Atlantic). Sympatric lineages were also found in Canada and Russia for the specimens of Praxillella praetermissa (Hardy et al. 2011), and our data suggest an additional divergent lineage for this species.

Some discordant BINs signaled possible misidentifications. That is the case of Glycera alba, where specimens from India and Sweden (BIN 22) grouped in a different BIN from the Portuguese ones (BIN 20). However, BIN 22 also included two specimens (sequences not available in the public BOLD Systems database) identified as Glycera lapidum Quatrefages, 1866. The discordant BINs found for Pista cristata and Pista estevanica (Berkeley & Berkeley, 1942) may also result from misidentifications. Specimens of P. cristata and P. estevanica from USA grouped in the same BIN, with a distance of only 0.3%, contributing greatly to the abnormal minimum divergence values found within the genus. Our single sequence of P. cristata from Portugal is 27.0% distant from its USA conspecific. Some of the observed discrepancies confirm the great difficulty in the taxonomic identification of this class of annelids; most of these ambiguities were only revealed by comparison with molecular data and might otherwise have passed unnoticed.

Other cases that gave rise to discordant BINs may reflect the particularities of the BIN clustering algorithm. The BIN system benefits from some level of plasticity in MOTU delimitation. The initial threshold used to differentiate MOTUs, and therefore to split specimens into separate BINs, is 2.2% (Ratnasingham & Hebert 2013). However, BIN 66, corresponding to the species Nephtys hombergii, had 4.2% of intraspecific divergence, while BIN 3 (Hediste diversicolor) was only 2.0% distant from the nearest BIN. Such situations, although rare, may occur because of the algorithm used to assign DNA barcodes to BINs. The RESL (Refined Single Linkage) algorithm uses Markov clustering (MCL), a graph analytical approach, which allows members that show high sequence variability but no discontinuity to remain in a single MOTU (Ratnasingham & Hebert 2013). On the other hand, a group whose sequence variation presents clear internal partitions is split over two or more BINs, even if the difference is less than 2.2%. MCL can thus separate sets of sequences that would be overlooked by a fixed threshold, and BIN attribution is refined as additional data become available.
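The effect of linkage on within-cluster divergence can be illustrated with a toy example. The sketch below covers only a plain single-linkage step at the 2.2% threshold; it is not the RESL implementation, and the Markov clustering refinement is deliberately omitted. The function name `single_linkage_clusters` and the distance matrix are hypothetical.

```python
def single_linkage_clusters(dist, threshold=0.022):
    """Group items whose pairwise distance is <= threshold, transitively.

    dist is a symmetric matrix (list of lists) of pairwise distances
    expressed as proportions (0.022 = 2.2%). Returns a sorted list of
    clusters, each a list of item indices.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                parent[find(i)] = find(j)  # merge the two clusters

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical distances: items 0-1 and 1-2 are 2.0% apart, 0-2 are 4.0% apart,
# and item 3 is 10% from everything else.
toy = [
    [0.00, 0.02, 0.04, 0.10],
    [0.02, 0.00, 0.02, 0.10],
    [0.04, 0.02, 0.00, 0.10],
    [0.10, 0.10, 0.10, 0.00],
]
print(single_linkage_clusters(toy))
```

In this toy matrix items 0 and 2 end up in the same cluster although they are 4.0% apart, because item 1 links them without a discontinuity. This is one way a single MOTU can show a maximum internal divergence above the nominal threshold.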

Hediste diversicolor complex

The existence of cryptic diversity within H. diversicolor has been recognized in previous studies using COI sequences and allozymes (Audzijonyte et al. 2008) or COI and cyt b sequences (Virgilio et al. 2009). These studies identified between 3 and 4 highly divergent clades of H. diversicolor, with total or partial geographic overlap, whose divergence equals or exceeds common interspecific divergence levels among polychaete species (Aguado et al. 2007; Zanol et al. 2010; Zhou et al. 2010; Carr et al. 2011; Hardy et al. 2011; Norlinder et al. 2012). In this respect, H. diversicolor is not much different from various other cases of multiple highly divergent lineages within species that are reported in this and other studies (Jolly et al. 2006; Hardy et al. 2011). However, apart from the highly divergent clades, H. diversicolor also displays a complex pattern of within-clade variability, particularly in the case of clade I, to which all specimens from Portugal belong. Indeed, clade I comprises a fair number of specimens (114) and sampled populations, extending from Morocco to Norway, in which a clear genetic or geographic structure is hard to perceive. Although within-clade genetic distances are not as high as typical values found between congeneric polychaete species, they are much higher than what is usually observed within species, or even within species clades. Coincidentally, the congeneric species H. atoka presents the same pattern. These findings deviate somewhat from the typical pattern of low within-species / within-clade variation in DNA barcodes that has been recorded in numerous animal taxa (Hebert et al. 2003a). The fact that representative specimens from Portugal and Finland were assigned to as many as 7 and 20 different BINs, respectively, illustrates the uniqueness of this case.

Nuclear mitochondrial pseudogenes (NUMTs), coamplified with the target DNA barcode region during PCR, have been pointed out as potential sources of overestimation of within-species diversity (Song et al. 2008; Williams & Knowlton 2001). Because NUMTs are not selectively constrained, they are susceptible to mutations leading to gap insertion or deletion, frameshift mutations or unviable amino acid substitutions (Williams & Knowlton 2001). We did not find evidence of any of these in our H. diversicolor sequences. These sequences showed a significantly higher proportion of synonymous than nonsynonymous substitutions, as is typical of a functional gene, and, upon translation, no unusual pattern was found. The amino acid sequences displayed low variability and were identical among some specimens within clade I, including among those displaying high nucleotide distances, and some of them were even identical to Hediste atoka amino acid sequences.
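Two of the NUMT symptoms listed above, frameshifting gaps and premature stop codons, lend themselves to a simple automated screen. The sketch below is a minimal illustration and not the procedure followed in this study; it assumes the invertebrate mitochondrial genetic code (NCBI translation table 5), in which TAA and TAG are the stop codons and TGA codes for tryptophan. The function names `stop_free_frames` and `numt_warnings` are hypothetical.

```python
import re

# Stop codons under the invertebrate mitochondrial genetic code
# (NCBI translation table 5); TGA is not a stop codon in this code.
STOP_CODONS = {"TAA", "TAG"}


def stop_free_frames(seq):
    """Return the forward reading frames (0, 1, 2) without a stop codon."""
    seq = seq.upper().replace("-", "")
    frames = []
    for frame in range(3):
        # Only complete codons are considered.
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(frame)
    return frames


def numt_warnings(aligned_seq):
    """List simple warning signs of a pseudogene in one aligned sequence."""
    warnings = []
    # Ignore alignment padding at the ends; only internal gaps matter.
    core = aligned_seq.upper().strip("-")
    for gap in re.findall(r"-+", core):
        if len(gap) % 3 != 0:
            warnings.append("internal gap of %d bp (possible frameshift)" % len(gap))
    if not stop_free_frames(core):
        warnings.append("stop codon in every forward reading frame")
    return warnings
```

An empty list from `numt_warnings` does not prove that a sequence is a functional mitochondrial copy; it only means that neither of these two symptoms was detected.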

The H. diversicolor sequences from Portugal generated in this study appear to confirm a trend for high genetic diversity within species / clade, similar to the one indicated by Virgilio et al. (2009), who reported comparatively high levels of nucleotide (π) and haplotype (h) diversity based on COI and cyt b sequences, namely within clade I. Thus, considering the available data as a whole (206 sequences from multiple locations), it appears that H. diversicolor displays a higher level of variability in COI sequences than the other species analyzed here, and than the typical pattern of low within-species variation in DNA barcodes that has been recorded in numerous animal taxa (Hebert et al. 2003b; Costa et al. 2012). The reasons for this unusual pattern of variability are not yet known and deserve further examination. Because H. diversicolor is a dominant, widespread and ecologically relevant species (Quijón & Snelgrove 2005), acting both as predator and prey for numerous species, being an important ecosystem engineer owing to its bioturbation activities, and playing an important role in nutrient cycling, the implications of these findings for the ecology of soft-bottom coastal communities along its distribution range should be investigated and clarified.
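For readers less familiar with the two diversity indices mentioned above, the following sketch shows how they are commonly computed. It assumes the standard definitions, h = n/(n − 1) · (1 − Σ p_i²) for haplotype diversity and the mean proportion of pairwise nucleotide differences for π, and it assumes aligned sequences of equal length without gaps. The function names and toy sequences are hypothetical and do not come from the dataset.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """Haplotype diversity h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("at least two sequences are required")
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs):
    """Nucleotide diversity pi: mean proportion of differing sites over all pairs."""
    if len(seqs) < 2:
        raise ValueError("at least two sequences are required")
    pairs = list(combinations(seqs, 2))
    total = 0.0
    for a, b in pairs:
        differences = sum(x != y for x, y in zip(a, b))
        total += differences / len(a)
    return total / len(pairs)


# Toy alignment: three haplotypes among four sequences.
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
print(round(haplotype_diversity(toy), 4), round(nucleotide_diversity(toy), 4))
```

Both indices grow with the number and evenness of distinct haplotypes, but only π reflects how different those haplotypes are from one another, which is why the two are usually reported together.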

Final remarks

Previous studies demonstrated the ability of DNA barcodes to distinguish polychaete species (e.g. Aguado et al. 2007, 2012; Barroso et al. 2010; Carr et al. 2011), and our data globally confirmed it. Beyond that confirmation, two points were evident from our study: the incipient state of the global reference library of DNA barcodes for polychaetes, and a comparatively high prevalence of discordances between morphological identifications and MOTUs, which was largely due to the occurrence of multiple distant lineages within several morphospecies.

The apparent low effort invested in the assembly of reference libraries for polychaetes indicates that this may be a comparatively understudied marine invertebrate group, with a comparatively low number of dedicated taxonomic experts, and whose morphological features can be particularly challenging for taxonomic work (Nygren 2014). The high number of MOTUs compared to morphospecies found in this and other studies (Jolly et al. 2006; Carr et al. 2011; Hardy et al. 2011; Nygren & Pleijel 2011) suggests a considerable amount of hidden diversity in this group. Although some or many of the MOTUs described may not necessarily correspond to independent species, the observation of distant lineages within a morphospecies is an ecologically relevant finding in itself. Even if only a small part of those lineages is confirmed as separate species in future studies, the implications for monitoring programs are potentially important, considering the high relevance of this group in marine ecosystems and, in particular, its extensive use as an environmental indicator in the monitoring of estuarine and coastal communities (e.g. Giangrande et al. 2005). Reference libraries of DNA barcodes can therefore be a key tool for a more extensive and rigorous documentation of the diversity of polychaetes, in which numerous cryptic species have been found with the aid of molecular approaches (e.g. Nygren 2014).

Acknowledgments

This work was supported by FEDER through POFC-COMPETE and by national funds from "Fundação para a Ciência e a Tecnologia (FCT)" in the scope of the grants FCOMP-01-0124-FEDER-015429 and PEst-C/BIA/UI4050/2011. Jorge Lobo is supported by an FCT PhD grant (SFRH/BD/69750/2010). Claudia Hollatz is supported by a CAPES Postdoctoral grant under the Ministry of Education, Brazil. Ascensão Ravara is supported by a postdoctoral grant (BPD/UI88/2911/2013) within the project Sustainable Use of Marine Resources MARES (CENTRO-07-ST24-FEDER-002033), co-financed by QREN Mais Centro Programa Operacional do Centro e União Europeia / Fundo Europeu de Desenvolvimento Regional. The authors would also like to thank Monica Landi (CBMA) for support in the initial molecular experiments. We thank Mehrdad Hajibabaei for making lab facilities available to generate some of the sequences. We also thank the Biodiversity Institute of Ontario for providing 48% of the sequencing service.

References

Aguado MT, Nygren A, Siddall ME (2007) Phylogeny of Syllidae (Polychaeta) based on combined molecular analysis of nuclear and mitochondrial genes. Cladistics, 23, 552–564.
Aguado MT, San Martin G, Siddall ME (2012) Systematics and evolution of syllids (Annelida, Syllidae). Cladistics, 28, 234–250.
Audzijonyte A, Ovcarenko I, Bastrop R, Väinölä R (2008) Two cryptic species of the Hediste diversicolor group (Polychaeta, Nereididae) in the Baltic Sea, with mitochondrial signatures of different population histories. Marine Biology, 155, 599–612.
Barroso R, Klautau M, Sole-Cava AM, Paiva PC (2010) Eurythoe complanata (Polychaeta: Amphinomidae), the ‘cosmopolitan’ fireworm, consists of at least three cryptic species. Marine Biology, 157, 69–80.
Berke SK, Mahon AR, Lima FP, Halanych KM, Wethey DS, Woodin SA (2010) Range shifts and species diversity in marine ecosystem engineers: patterns and predictions for European sedimentary habitats. Global Ecology and Biogeography, 19, 223–232.
Bickford D, Lohman DJ, Sodhi NS et al. (2006) Cryptic species as a window on diversity and conservation. Trends in Ecology and Evolution, 22, 148–155.
Bleidorn C, Kruse I, Albrecht S, Bartolomaeus T (2006) Mitochondrial sequence data expose the putative cosmopolitan polychaete Scoloplos armiger (Annelida, Orbiniidae) as a species complex. BMC Evolutionary Biology, 6, 47.
Böggemann M (2009) Polychaetes (Annelida) of the abyssal SE Atlantic. Organisms Diversity & Evolution, 9, 251–428.

Bonse S, Schmidt H, Eibye-Jacobsen D, Westheide W (1996) Eulalia viridis (Polychaeta: Phyllodocidae) is a complex of two species in Northern Europe: Results from biochemical and morphological analyses. Cahiers de Biologie Marine, 37, 33–48.
Bucklin A, Steinke D, Blanco-Bercial L (2010) DNA Barcoding of Marine Metazoa. Annual Review of Marine Science, 3, 471–508.
Calosi P, Rastrick SPS, Lombardi C et al. (2013) Metabolic adaptation and acclimatisation to ocean acidification in marine ectotherms: An in situ transplant experiment at a shallow CO2 vent system. Philosophical Transactions of the Royal Society B: Biological Sciences, 368, 20120444.
Carr CM, Hardy SM, Brown TM, Macdonald TA, Hebert PDN (2011) A tri-oceanic perspective: DNA Barcoding reveals geographic structure and cryptic diversity in Canadian polychaetes. PLoS ONE, 6, e22232.
Costa FO, Carvalho GR (2007) The Barcode of Life Initiative: synopsis and prospective societal impacts of DNA barcoding of Fish. Genomics, Society and Policy, 3, 29–40.
Costa FO, Henzler CM, Lunt DH, Whiteley NM, Rock J (2009) Probing marine Gammarus (Amphipoda) taxonomy with DNA barcodes. Systematics and Biodiversity, 7, 365–379.
Costa FO, Carvalho GR (2010) New insights into molecular evolution: prospects from the Barcode of Life Initiative (BOLI). Theory in Biosciences, 129, 149–157.
Costa FO, Antunes PM (2012) The contribution of the Barcode of Life initiative to the discovery and monitoring of Biodiversity. In: Mendonça A, Chakrabarti R (eds) Natural Resources, Sustainability and Humanity – A comprehensive View. Springer Science+Business Media, Dordrecht, 37–68.
Costa FO, Landi M, Martins R, Costa MH, Costa ME, Carneiro M, Alves MJ, Steinke D, Carvalho GR (2012) A ranking system for reference libraries of DNA Barcodes: application to marine fish species from Portugal. PLoS ONE, 7, e35858.
Coull BC (1999) Role of meiofauna in estuarine soft-bottom habitats. Australian Journal of Ecology, 24, 327–343.
Fauvel P (1927) Polychètes sédentaires: addenda aux errantes, archiannélides, myzostomaires. Faune de France, 16, 1–494.
Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular Marine Biology and Biotechnology, 3, 294–299.
Gambi MC, Castelli A, Giangrande A et al. (1994) Polychaetes of commercial and applied interest in Italy: an overview. Mémoires du Muséum National d'Histoire Naturelle, 162, 593–603.
Geller J, Meyer C, Parker M, Hawk T (2013) Redesign of PCR primers for mitochondrial cytochrome c oxidase subunit I for marine invertebrates and application in all-taxa biotic surveys. Molecular Ecology Resources, 13, 851–861.
Giangrande A, Licciano M, Musco L (2005) Polychaetes as environmental indicators revisited. Marine Pollution Bulletin, 50, 1153–1162.
Gillet P, Torresani S (2003) Structure of the population and secondary production of Hediste diversicolor (O.F. Müller, 1776), (Polychaeta, Nereidae) in the Loire estuary, Atlantic Coast, France. Estuarine, Coastal and Shelf Science, 56, 621–628.
Glover AG, Goetze E, Dahlgren TG, Smith CR (2005) Morphology, reproductive biology and genetic structure of the whale-fall and hydrothermal vent specialist, Bathykurila guaymasensis Pettibone, 1989 (Annelida: Polynoidae). Marine Ecology, 26, 223–234.
Hardy SM, Carr CM, Hardman M, Steinke D, Corstorphine E, Mah C (2011) Biodiversity and phylogeography of Arctic marine fauna: insights from molecular tools. Marine Biodiversity, 41, 195–210.
Hayward PJ, Ryland JS (Eds.) (1995) Handbook of the Marine Fauna of North-West Europe. Great Britain. Oxford University Press Inc., New York.
Hebert PDN, Cywinska A, Ball SL, deWaard JR (2003a) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences, 270, 313–321.
Hebert PDN, Ratnasingham S, deWaard JR (2003b) Barcoding animal life: Cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings of the Royal Society of London B: Biological Sciences, 270, 596–599.
Hebert PDN, Penton EH, Burns JM, Janzen DH, Hallwachs W (2004) Ten species in one: DNA barcoding reveals cryptic species in the neotropical skipper butterfly Astraptes fulgerator. Proceedings of the National Academy of Sciences of the United States of America, 101, 14812–14817.
Ivanova NV, Zemlak TS, Hanner R, Hebert PDN (2007) Universal primer cocktails for fish DNA barcoding. Molecular Ecology Notes, 7, 544–548.

Jarman SN, Elliott NG (2000) DNA evidence for morphological and cryptic Cenozoic speciations in the Anaspididae, ‘living fossils’ from the Triassic. Journal of Evolutionary Biology, 13, 624–633.
Jolly MT, Viard F, Gentil F, Thiebaut E, Jollivet D (2006) Comparative phylogeography of two coastal polychaete tubeworms in the Northeast Atlantic supports shared history and vicariant events. Molecular Ecology, 15, 1841–1855.
Jones DT, Taylor WR, Thornton JM (1992) The rapid generation of mutation data matrices from protein sequences. Computer Applications in the Biosciences, 8, 275–282.
Kimura M (1980) A simple model for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution, 16, 111–120.
Knebelsberger T, Landi M, Neumann H, Kloppmann M, Sel AF, Campbell PD, Laakmann S, Raupach MJ, Carvalho GR, Costa FO (2014) A reliable DNA barcode reference library for the identification of the North European shelf fish fauna. Molecular Ecology Resources, 14, 1060–1071.
Knowlton N (1993) Sibling species in the sea. Annual Review of Ecology and Systematics, 24, 189–216.
Knox MA, Hogg ID, Pilditch CA, Lo AN, Hebert PDN, Steinke D (2012) Mitochondrial DNA (COI) analyses reveal that amphipod diversity is associated with environmental heterogeneity in deep-sea habitats. Molecular Ecology, 21, 4885–4897.
Kristensen E, Jensen MH, Andersen TK (1985) The impact of polychaete (Nereis virens Sars) burrows on nitrification and nitrate reduction in estuarine sediments. Journal of Experimental Marine Biology and Ecology, 85, 75–91.
Lobo J, Costa PM, Teixeira MAL, Ferreira MSG, Costa MH, Costa FO (2013) Enhanced primers for amplification of DNA barcodes from a broad range of marine metazoans. BMC Ecology, 13, 34.
Mahon AR, Mahon HK, Dauer DM, Halanych KM (2009) Discrete genetic boundaries of three Streblospio (Spionidae, Annelida) species and the status of S. shrubsolii. Marine Biology Research, 5, 172–178.
Maltagliati F, Peru AP, Casu M et al. (2000) Is Syllis gracilis (Polychaeta: Syllidae) a species complex? An allozyme perspective. Marine Biology, 136, 871–879.
Matzen da Silva J, Creer S, dos Santos A, Costa AC, Cunha MR, Costa FO, Carvalho GR (2011) Systematic and evolutionary insights derived from mtDNA COI Barcode diversity in the Decapoda (Crustacea: Malacostraca). PLoS ONE, 6, e19449.
Norlinder E, Nygren A, Wiklund H, Pleijel F (2012) Phylogeny of scale-worms (Aphroditiformia, Annelida), assessed from 18SrRNA, 28SrRNA, 16SrRNA, mitochondrial cytochrome c oxidase subunit I (COI), and morphology. Molecular Phylogenetics and Evolution, 65, 490–500.
Nygren A, Pleijel F (2011) From one to ten in a single stroke – resolving the European Eumida sanguinea (Phyllodocidae, Annelida) species complex. Molecular Phylogenetics and Evolution, 58, 132–141.
Nygren A (2014) Cryptic polychaete diversity: a review. Zoologica Scripta, 43, 172–183.
Olive PJW (1975) Reproductive biology of Eulalia viridis (Müller) of the north eastern U.K. Journal of the Marine Biological Association of the United Kingdom, 55, 313–326.
Olive PJW (1994) Polychaeta as a world resource: a review of patterns of exploration as sea angling baits and the potential for aquaculture based production. Mémoires du Muséum National d'Histoire Naturelle, 162, 603–610.
Olson MA, Zajac RM, Russello MA (2009) Estuarine-scale genetic variation in the polychaete Hobsonia florida (Ampharetidae; Annelida) in Long Island Sound and relationships to Pleistocene glaciations. Biological Bulletin, 217, 86–94.
Pleijel F (1993) Polychaeta Phyllodocidae. Marine Invertebrates of Scandinavia, 8. Scandinavian University Press, Oslo, Norway, 159 pp.
Pleijel F, Rouse G, Nygren A (2009) Five colour morphs and three new species of Gyptis (Hesionidae, Annelida) under a jetty in Edithburgh, South Australia. Zoologica Scripta, 38, 89–99.
Quijón PA, Snelgrove PVR (2005) Polychaete assemblages of a subarctic Newfoundland fjord: Habitat, distribution, and identification. Polar Biology, 28, 495–505.
Ratnasingham S, Hebert PDN (2013) DNA-based registry for all animal species: the Barcode Index Number (BIN) system. PLoS ONE, 8, e66213.
Ravara A, Wiklund H, Cunha MR, Pleijel F (2010) Phylogenetic relationships within Nephtyidae (Polychaeta, Annelida). Zoologica Scripta, 39, 394–405.
Rice SA, Stephen K, Rice KA (2008) The Polydora cornuta complex (Annelida: Polychaeta) contains populations that are reproductively isolated and genetically distinct. Invertebrate Biology, 127, 45–64.
Rouse GW, Pleijel F (2001) Polychaetes. Oxford University Press, Oxford, 354 pp.
Sampertegui S, Rozbaczylo N, Canales-Aguirre CB, Carrasco F, Hernandez CE, Rodriguez-Serrano E (2013) Morphological and molecular characterization of Perinereis gualpensis (Polychaeta: Nereididae) and its phylogenetic relationships with other species of the genus off the Chilean coast, Southeast Pacific. Cahiers de Biologie Marine, 54, 27–40.
Siddall ME, Apakupakul K, Burreson EM et al. (2001) Validating Livanow: molecular data agree that leeches, Branchiobdellidans, and Acanthobdella peledina form a monophyletic group of oligochaetes. Molecular Phylogenetics and Evolution, 21, 346–351.
Smith PJ, McVeagh SM, Allain V, Sanchez C (2005) DNA identification of gut contents of large pelagic fishes. Journal of Fish Biology, 67, 1178–1183.
Smith MA, Woodley NE, Janzen DH, Hallwachs W, Hebert PDN (2006) DNA barcodes reveal cryptic host specificity within the presumed polyphagous members of a genus of parasitoid flies (Diptera: Tachinidae). Proceedings of the National Academy of Sciences of the United States of America, 103, 3657–3662.
Song H, Buhay JE, Whiting MF, Crandall KA (2008) Many species in one: DNA barcoding overestimates the number of species when nuclear mitochondrial pseudogenes are coamplified. Proceedings of the National Academy of Sciences of the United States of America, 105, 13486–13491.
Sousa R, Dias S, Antunes C (2006) Spatial subtidal macrobenthic distribution in relation to abiotic conditions in the Lima estuary, NW of Portugal. Hydrobiologia, 559, 135–148.
Sousa R, Dias S, Freitas V, Antunes C (2008) Subtidal macrozoobenthic assemblages along the River Minho estuarine gradient (north-west Iberian Peninsula). Aquatic Conservation: Marine and Freshwater Ecosystems, 18, 1063–1077.
Sutherland W, Armstrong-Brown S, Armsworth P et al. (2006) The identification of 100 ecological questions of high policy relevance in the UK. Journal of Applied Ecology, 43, 617–627.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution, 30, 2725–2729.
Tosuji H, Sato M (2010) Genetic evidence for parapatric differentiation of two forms of the brackish-water nereidid polychaete Hediste atoka. Plankton Benthos Research, 5, 242–249.
Virgilio M, Fauvelot C, Constantin F, Abbiath M, Backeljau T (2009) Phylogeography of the common ragworm Hediste diversicolor (Polychaeta: Nereididae) reveals cryptic diversity and multiple colonization events across its distribution. Molecular Ecology, 18, 1980–1994.
Volkenborn N, Polerecky L, Hedtkamp SIC, van Beusekom JEE, de Beer D (2007) Bioturbation and bioirrigation extend the open exchange regions in permeable sediments. Limnology and Oceanography, 52, 1898–1909.
Williams ST, Knowlton N (2001) Mitochondrial pseudogenes are pervasive and often insidious in the snapping shrimp genus Alpheus. Molecular Biology and Evolution, 18, 1484–1493.
Zanol J, Kenneth M, Struck TH, Fauchald K (2010) Phylogeny of the bristle worm family Eunicidae (Eunicida, Annelida) and the phylogenetic utility of noncongruent 16S, COI and 18S in combined analyses. Molecular Phylogenetics and Evolution, 55, 660–676.
Zhong M, Struck TH, Halanych KM (2008) Phylogenetic information from three mitochondrial genomes of Terebelliformia (Annelida) worms and duplication of the methionine tRNA. Gene, 416, 11–21.
Zhou H, Zhang ZN, Chen HY et al. (2010) Integrating a DNA barcoding project with an ecological survey: a case study on temperate intertidal polychaete communities in Qingdao, China. Chinese Journal of Oceanology and Limnology, 28, 899–910.

Data accessibility

All information about specimens and molecular data generated in this study is compiled in the BOLD project titled “Polychaeta of Portugal [MPCPT]”. A total of 290 COI-5P sequences belonging to 79 species (5 orders, 20 families) were analysed and are available in the BOLD Dataset (Dataset ID DS-PTGB). All original sequences generated in this study were submitted to GenBank and correspond to accession numbers XXXXXX (accessions will be provided upon paper acceptance) (Table 3).

Author Contributions

J.L., M.A.L.T. and F.O.C. globally designed the study and wrote the manuscript. J.L., M.A.L.T., L.M.S.B., M.S.G.F., P.A.G., R.S., A.R. and M.H.C. participated in specimen processing, including collection, identification, image capture and database uploading. J.L. and M.A.L.T. carried out molecular analyses. J.L., M.A.L.T., C.H. and F.O.C. analysed data. All authors contributed to the discussion of the results and to manuscript revision and editing.

Table 1 Primer pairs used to amplify COI-5P from polychaete specimens in this study.

Set | Forward primer (F), 5’ – 3’ | Reverse primer (R), 5’ – 3’ | Reference (N) | PCR thermal cycling conditions
1 | LoboF1: KBTCHACAAAYCAYAARGAYATHGG | LoboR1: TAAACYTCWGGRTGWCCRAARAAYCA | Lobo et al. 2013 (80) | 1) 94°C (5 min); 2) 5 cycles: 94°C (30 s), 45°C (1 min 30 s), 72°C (1 min); 3) 45 cycles: 94°C (30 s), 54°C (1 min 30 s), 72°C (1 min); 4) 72°C (5 min).
2 | polyLCO: GAYTATWTTCAACAAATCATAAAGATATTGG | polyHCO: TAMACTTCWGGGTGACCAAARAATCA | Carr et al. 2011 (46) | 1) 94°C (1 min); 2) 5 cycles: 94°C (40 s), 45°C (40 s), 72°C (1 min); 3) 35 cycles: 94°C (40 s), 51°C (40 s), 72°C (1 min); 4) 72°C (5 min).
3 | LCO1490: GGTCAACAAATCATAAAGATATTGG | HCO2198: TAAACTTCAGGGTGACCAAAAAATCA | Folmer et al. 1994 (28) | 1) 94°C (1 min); 2) 5 cycles: 94°C (30 s), 45°C (1 min 30 s), 72°C (1 min); 3) 35 cycles: 94°C (30 s), 51°C (1 min 30 s), 72°C (1 min); 4) 72°C (5 min).
4 | C_VF1LFt1: VF1_t1:VF1d_t1:LepF1_t1:VF1i_t1 | C_VR1LRt1: VR1_t1:VR1d_t1:LepR1_t1:VR1i_t1 | Ivanova et al. 2007 (14) | 1) 94°C (1 min); 2) 5 cycles: 94°C (30 s), 50°C (40 s), 72°C (1 min); 3) 35 cycles: 94°C (1 min), 54°C (40 s), 72°C (1 min); 4) 72°C (10 min).
5 | jgLCO1490: GMATAGTAGGMACRGCYCTNA | jgHCO2198: YCCTGTGAATAGGGGGAATC | Geller et al. 2013 (2) | 1) 95°C (5 min); 2) 35 cycles: 95°C (30 s), 48°C (30 s), 72°C (45 s); 3) 72°C (5 min).

N represents the number of sequences obtained.

Table 2 Global number of species and sequences used in this study. A summary of the geographical locations and of the primer distribution among the sequences is also shown.

| Region | Site | Species (number) | Sequences (number) |
|---|---|---|---|
| Portugal | Aljezur | 1 | 1 |
| | Lima Estuary | 5 | 25 |
| | Minho Estuary | 1 | 5 |
| | Praia das Avencas | 2 | 3 |
| | Ria de Aveiro | 9 | 21 |
| | Sado Estuary | 25 | 64 |
| | Sines | 1 | 1 |
| | Viana do Castelo coast | 11 | 44 |
| | This study | 49 | 151 |
| | Lobo et al. (2013) | 11 | 13 |
| | PolyPT | 51 | 164 |
| Worldwide areas | Angola | 1 | 3 |
| | Australia | 1 | 2 |
| | Canada | 15 | 43 |
| | Chile | 1 | 2 |
| | China | 1 | 2 |
| | France | 3 | 3 |
| | India | 2 | 2 |
| | Italy | 1 | 1 |
| | Portugal | 51 | 171 |
| | Russia | 4 | 9 |
| | South Korea | 2 | 6 |
| | Spain | 1 | 1 |
| | Sweden | 2 | 2 |
| | USA | 14 | 30 |
| | Western Europe | 3 | 13 |
| All sequences | | 79 | 290 |

PolyPT: COI sequences generated in this study plus those of Lobo et al. 2013.


Table 3 Taxonomic classification of 79 species from the class Polychaeta, number of specimens and distribution.

| Family | Species | BIN (MOTU) | Distribution (N) | GenBank Accession | BoldSystems Process ID | Source |
|---|---|---|---|---|---|---|
| Cirratulidae | Unid. Cirratulidae | ACP4027 (M48) | Sado Estuary, Portugal (1) | | PCALE02914 | This study |
| | Cirratulus spectabilis (Kinberg, 1866) | AAW0098 (M47) | Alaska, USA (3) | | KBPOL10311; KBPOL10511; KBPOL26911 | KBPOL BOLD project (Christina Carr) |
| | Cirriformia tentaculata (Montagu, 1808) | ACI2312 (M44) | Viana do Castelo Coast, Portugal (2) | | SFPOM08911; SFPOM08811 | This study |
| | Cirriformia sp.1 | ACO5093 (M45) | Sado Estuary, Portugal (1) | | PCALN00310 | This study |
| | Cirriformia sp.2 | ACO5094 (M46) | Viana do Castelo Coast, Portugal (1) | | SFPOM09011 | This study |
| Capitellidae | Capitellidae sp.1 | ABX4984 (M49) | Canada (3) | GU672219; GU672221; GU672306 | | Carr et al. 2011 |
| | Capitellidae sp.2 | ACO4609 (M50) | Lima Estuary, Portugal (2) | | MPCPT04514; MPCPT04614 | This study |
| | Capitellidae sp.3 | AAH0966 (M51) | Canada (3) | GU672220; GU672207; HQ023476 | | Carr et al. 2011 |
| | Capitellidae sp.4 | AAY6125 (M52) | Alaska, USA (3) | | KBPOL56711; KBPOL56611; KBPOL64911 | KBPOL BOLD project (Christina Carr) |
| | Capitella capitata (Fabricius, 1780) | AAE7888 (M75) | Canada (3) | HQ023470; GU672406; GU672407 | | Carr et al. 2011 |
| | Heteromastus filiformis (Claparède, 1864) | ACO5092 (M73) | Ria de Aveiro, Portugal (5) | | SFPOM02311; MPCPT02814; MPCPT01214; MPCPT02714; SFPOM02111 | This study |
| | Notomastus profondus (Eisig, 1887) | ACP4224 (M74) | Sado Estuary, Portugal (3) | | PCALE06814; PCALE06914; PCALE07014 | This study |
| Eunicidae | Lysidice collaris (Grube, 1870) | AAM4750 (M42) | USA (1) | GQ497557 | | Zanol et al. 2010 |
| | Lysidice ninetta (Audouin & Milne Edwards, 1833) | AAX6206 (M43) | Viana do Castelo Coast, Portugal (1); USA (1) | GQ497564 | SFPOM07211 | This study; Zanol et al. 2010 |
| | Marphysa disjuncta (Hartman, 1961) | AAS3818 (M41) | USA (2) | GQ497549 | CMBIA22311 | Zanol et al. 2010; CMBIA BOLD project (Peter E. Miller) |
| | Marphysa sanguinea (Montagu, 1815) | AAF0769 (M39) | Sado Estuary, Portugal (3); France (1) | AY040708 | PCALN00610; PCALN01410; PCALE07414 | This study; Siddall et al. 2001 |
| | Marphysa sp.1 | AAY5846 (M40) | Sado Estuary, Portugal (2) | | PCALN00910; PCALN00410 | This study |
| Glyceridae | Glycera sp.1 | AAY5542 (M21) | Viana do Castelo Coast, Portugal (2) | | SFPOM01311; SFPOM01411 | This study |
| | Glycera sp.2 | AAY5541 (M20) | Viana do Castelo Coast, Portugal (1) | | SFPOM01211 | This study |
| | Glycera alba (O.F. Müller, 1776) | AAY5541 (M22); ACH5832 (M20) | Sado Estuary, Portugal (1); India (1); Sweden (1) | JN852946; KF815720 | LOBO04313 | Lobo et al. 2013; Norlinder et al. 2012; Singh et al. unpublished |
| | Glycera southeastatlantica (Böggemann, 2009) | AAM4674 (M23) | Angola Basin (3) | GQ426631; GQ426632; GQ426630 | | Boeggemann 2009 |
| Lumbrineridae | Lumbrineris sp. | ACO5551 (M26) | Aljezur, Portugal (1) | | MPCPT04714 | This study |
| | Lumbrineris latreilli Audouin & Milne Edwards, 1834 | ACP3939 (M25) | Sado Estuary, Portugal (1) | | PCALE06014 | This study |
| | Lumbrineris tetraura (Schmarda, 1861) | AAX5951 (M27) | China (2) | GU362689; EU352318 | | Zhou et al. 2010; Peng & Cheng unpublished |
| | Ninoe nigripes Verrill, 1873 | AAA9058 (M24) | Canada (3) | HQ024149; HQ024151; HQ024156 | | Carr et al. 2011 |
| Maldanidae | Axiothella constricta (Claparède, 1869) | ACG0776 (M93) | Sado Estuary, Portugal (1) | | LOBO03913 | Lobo et al. 2013 |
| | Axiothella rubrocincta (Johnson, 1901) | AAY5102 (M88) | Alaska, USA (2) | | KBPOL47211; KBPOL43511 | KBPOL BOLD project (Christina Carr) |
| | Euclymene sp.1 | ACG1020 (M96) | Sado Estuary, Portugal (2) | | LOBO04613; PCALE03814 | This study; Lobo et al. 2013 |
| | Euclymene sp.2 | ACG0681 (M97) | Sado Estuary, Portugal (6) | | MPCPT02214; MPCPT01714; PCALE03614; PCALE03914; PCALE03514; PCALE03414 | This study |
| | Euclymene robusta (Arwidsson, 1906) | ACG1021 (M95) | Sado Estuary, Portugal (3) | | LOBO04113; PCALE03314; PCALE03714 | This study; Lobo et al. 2013 |
| | Euclymene santandarensis (Rioja, 1917) | ACG0681 (M97) | Sado Estuary, Portugal (4) | | LOBO04513; LOBO03813; LOBO04013 | This study; Lobo et al. 2013 |
| | Leiochone leiopygos (Grube, 1860) | ACG0717 (M94) | Sado Estuary, Portugal (3) | | LOBO03713; PCALE04114; PCALE04014 | This study; Lobo et al. 2013 |
| | Maldanidae sp.1 | ACJ0424 (M90) | Sado Estuary, Portugal (2) | | PCALE05814 | This study |
| | Maldanidae sp.2 | ACP5251 (M92) | Sado Estuary, Portugal (1) | | PCALE06214; PCALE05914 | This study |
| | Maldanidae sp.3 | ACG0717 (M94) | Sado Estuary, Portugal (1) | | PCALE04214 | This study |
| | Maldanidae sp.4 | ACG1021 (M95) | Sado Estuary, Portugal (1) | | PCALE06414 | This study |
| | Maldanidae sp.5 | ACG0681 (M97) | Sado Estuary, Portugal (3) | | PCALE06614; PCALE06514; PCALE06314 | This study |
| | Praxillella affinis pacifica Berkeley, 1929 | AAD4508 (M90) | Canada (3) | HM473649; HM473650; HM473652 | | Carr et al. 2011 |
| | Praxillella praetermissa (Malmgren, 1865) | ACF9830 (M91); AAD5785 (M86); AAD1536 (M87) | Sado Estuary, Portugal (4); Russia (2); Canada (4) | GU672372; GU672350; GU672610; GU672356; GU670833; GU672345 | LOBO03613; PCALE03214; PCALE03014; PCALE03114 | This study; Lobo et al. 2013; Hardy et al. 2011 |
| Nephtyidae | Nephtyidae | AAH7451 (M64) | Sado Estuary, Portugal (1) | | PCALE07514 | This study |
| | Nephtys sp. | AAH7451 (M64) | Sado Estuary (1), Aveiro (1), Portugal | | SFPOM02911; MPCPT03214 | This study |
| | Nephtys caeca (Fabricius, 1780) | AAX6622 (M65) | Canada (1); Alaska, USA (2) | HM473493 | KBPOL04111; KBPOL04011 | Carr et al. 2011; KBPOL BOLD project (Christina Carr) |
| | Nephtys hombergii (Savigny in Lamarck, 1818) | AAH7451 (M64) | Sado Estuary (4), Aveiro (3), Portugal | GU179410 | PCALN00810; SFPOM06711; PCALN00110; PCALN01010; PCALN01110; SFPOM03011 | This study; Ravara et al. 2010 |
| Nereididae | Hediste diversicolor (O.F. Müller, 1776) | ACO5139 (M1); AAY5198 (M2); ACG1110 (M3); ABZ5903 (M4); ACO5140 (M5); ACG1109 (M6); ACP3870 (M7) | Aveiro (3), Lima Estuary (10), Minho Estuary (5), Portugal | | MPCPT00113; MPCPT03314; MPCPT03414; MPCPT03514; MPCPT04814; SFPOM01511; SFPOM01811; SFPOM02011; PCALE04514; PCALE04714; PCALE05014; PCALE05114; PCALE05214; PCALE05314; LOBO03113; LOBO03213; LOBO03313; LOBO03413 | This study; Lobo et al. 2013 |
| | Neanthes fucata (Savigny in Lamarck, 1818) | AAY5414 (M9) | Sines (1), Lima Estuary (6), Portugal | | SFPOM04011; PCALE00309; MPCPT01314; SFPOM03911; MPCPT01514; MPCPT01414; SFPOM03811 | This study |
| | Neanthes japonica (Izuka, 1908) | ACH5777 (M8) | South Korea (3) | JX503016; JX503018; JX503017 | | Kim et al. unpublished |
| | Nereis falsa Quatrefages, 1866 | ACH5486 (M12) | Sado Estuary, Portugal (4) | | MPCPT00213; MPCPT00313; PCALN00710; PCALN02110 | This study |
| | Nereis pelagica Linnaeus, 1758 | ABX6564 (M17); AAF4530 (M18); AAB1421 (M19) | Viana do Castelo Coast, Portugal (3); Russia (2); USA (3); NW Atlantic (1), Canada (3) | HQ024123; GU672452; GU672449; HQ023603; HQ023602; HQ023614 | MPCPT02114; SFPOM07411; SFPOM07311; KBPOL33011; KBPOL21811; KBPOL22011 | This study; Carr et al. 2011; Hardy et al. 2011; KBPOL BOLD project (Christina Carr) |
| | Nereis zonata Malmgren, 1867 | AAE2406 (M16) | Canada (3) | HQ024405; HQ024404; HQ024403 | | Carr et al. 2011 |
| | Perinereis sp. | AAY5413 (M11) | Viana do Castelo Coast, Portugal (3) | | SFPOM04311; SFPOM04111; SFPOM04411 | This study |
| | Perinereis cultrifera (Grube, 1840) | AAY5413 (M11) | Viana do Castelo Coast, Portugal (7) | | SFPOM07811; SFPOM07711; SFPOM07611; SFPOM04211; MPCPT03014; SFPOM03511; SFPOM03611 | This study |
| | Perinereis vallata (Grube, 1858) | ACA4723 (M10) | Chile (2); India (1) | HQ705193; JX676143; HQ705194 | | Sampertegui et al. 2013; Iyyapparajanarasimapallavan et al. unpublished |
| | Platynereis bicanaliculata (Baird, 1863) | AAC5672 (M15) | Canada (3) | HM473591; HM473589; HM473590 | | Carr et al. 2011 |
| | Platynereis dumerilii (Audouin & Milne Edwards, 1834) | ABY1368 (M13); ACP6515 (M14) | Viana do Castelo Coast, Portugal (3); Italy (1) | KC591838 | SFPOM05011; MPCPT00613; MPCPT00513 | This study; Calosi et al. 2013 |
| Onuphidae | Diopatra marocensis (Paxton, Fadlaoui & Lechapt, 1995) | AAY5981 (M98) | Óbidos, Portugal (2) | FJ428923; FJ428917 | | Berke et al. 2010 |
| | Diopatra neapolitana (Delle Chiaje, 1841) | AAX9469 (M99) | Aveiro, Portugal (1); Western Europe (3) | FJ428910; FJ428930; FJ428932 | SFPOM01611 | This study; Berke et al. 2010 |
| Orbiniidae | Orbiniidae sp.1 | ACG0076 (M36) | Sado Estuary, Portugal (1) | | LOBO04213 | Lobo et al. 2013 |
| | Leitoscoloplos pugettensis (Pettibone, 1957) | AAD8935 (M37) | Canada (2) | HM473438; HM473442 | | Carr et al. 2011 |
| | Scoloplos acutus (Verrill, 1873) | AAC3630 (M38) | Canada (3) | HQ024227; HQ024230; HQ024223 | | Carr et al. 2011 |
| Oweniidae | Owenia fusiformis (delle Chiaje, 1844) | AAZ1576 (M76); AAW0013 (M77); AAY9554 (M78); AAZ1577 (M79); AAA1512 (M80) | Canada (2); Alaska, USA (2); Sado Estuary (2), Aveiro, Portugal (3); Western Europe (9) | DQ319452; DQ319282; DQ319459; DQ319249; DQ319227; DQ319322; DQ319359; DQ319306; DQ319383; DQ319483; DQ319481; GU672307; GU672205 | KBPOL06511; KBPOL15311; SFPOM06511; SFPOM06611; SFPOM06411 | This study; Carr et al. 2011; KBPOL BOLD project (Christina Carr); Jolly et al. 2006 |
| Phyllodocidae | Eteone flava (Fabricius, 1780) | ACO4402 (M70) | Lima Estuary, Portugal (5) | | MPCPT03614; MPCPT04014 | This study |
| | Eulalia viridis (Linnaeus, 1767) | AAY5110 (M68); AAE3409 (M71) | Viana do Castelo Coast (3), Praia das Avencas (1), Portugal; Russia (3) | GU672477; GU672434; GU672436 | SFPOM07911; SFPOM08111; MPCPT02914; PCALE01111 | This study; Hardy et al. 2011 |
| | Phyllodocidae sp.1 | ABY0206 (M69) | Viana do Castelo Coast, Portugal (2) | | MPCPT02014; SFPOM03711 | This study |
| Polynoidae | Harmothoe imbricata (Linnaeus, 1767) | ACE8668 (M58) | Alaska, USA (3) | | KBPOL11111; KBPOL49311; KBPOL77311 | KBPOL BOLD project (Christina Carr) |
| | Harmothoe impar (Johnston, 1839) | AAY5847 (M62) | Sweden (1) | JN852930 | | Norlinder et al. 2012 |
| | Lepidonotus squamatus (Linnaeus, 1758) | ACI1085 (M60) | South Korea (3) | JX503013; JX503014; JX503015 | | Kim et al. unpublished |
| | Lepidonotus clava (Montagu, 1808) | AAY7885 (M61) | Western Europe (1) | JN852934 | | Norlinder et al. 2012 |
| | Polynoidae sp.1 | AAY7884 (M59) | Aveiro, Portugal (1) | | SFPOM01011 | This study |
| | Polynoidae sp.2 | AAY7885 (M61) | Aveiro, Portugal (1) | | SFPOM01111 | This study |
| | Polynoidae sp.3 | AAY5847 (M62) | Sado Estuary, Portugal (1) | | PCALN01610 | This study |
| Sabellidae | Sabella pavonina (Savigny, 1822) | ACA4120 (M72) | Sado Estuary, Portugal (2) | | MPCPT01614; PCALN00210 | This study; Lobo et al. 2013 |
| Sabellariidae | Sabellaria alveolata (Linnaeus, 1767) | AAY7553 (M55) | Avencas (2), Viana do Castelo Coast (5), Portugal | | PCALE01411; PCALE01511; MPCPT02414; MPCPT00713; MPCPT02314; MPCPT02614; MPCPT02514 | This study |
| Sigalionidae | Sthenelais boa (Johnston, 1833) | AAZ0896 (M63) | Aveiro, Portugal (1); France (1) | KJ183006 | SFPOM06811 | This study; Cowart et al. unpublished |
| Spionidae | Spionidae | ACO6063 (M53) | Lima Estuary, Portugal (1) | | MPCPT04414 | This study |
| | Scolelepis sp. | ACP4757 (M56) | Lima Estuary, Portugal (1) | | PCALE07714 | This study |
| | Scolelepis foliosa (Audouin & Milne Edwards, 1833) | ACG1122 (M57) | Lima Estuary (6), Portugal | | MPCPT04114; MPCPT04314; MPCPT01914; PCALE07614; LOBO03513 | This study; Lobo et al. 2013 |
| | Scolelepis squamata (O.F. Müller, 1806) | AAI0763 (M54) | Canada (1); Alaska, USA (2) | HM473679 | KBPOL77611; KBPOL77511 | Carr et al. 2011; KBPOL BOLD project (Christina Carr) |
| | Streblospio shrubsolii (Buchanan, 1890) | ACO6063 (M53) | Tagus Estuary, Portugal (2) | EU151719; EU151720 | | Mahon et al. 2009 |
| Syllidae | Syllis sp.1 | AAY5331 (M28) | Viana do Castelo Coast, Portugal (2) | | SFPOM06211; SFPOM05811 | This study |
| | Syllis sp.2 | AAY5332 (M29) | Viana do Castelo Coast, Portugal (2) | | SFPOM06011; SFPOM05911 | This study |
| | Syllis elongata Day, 1949 | AAF2550 (M30) | Canada (3) | HQ932627; HM473696; HM473697 | | Carr et al. 2011 |
| | Trypanosyllis coeliaca Claparède, 1868 | AAW8649 (M35) | Spain (1) | EF123785 | | Aguado et al. 2007 |
| | Trypanosyllis zebra (Grube, 1860) | ACH7277 (M31); ACB6891 (M32); AAW8651 (M33); ACB6890 (M34) | Sado Estuary, Portugal (1); France (1); Australia (2) | EF123786; JF903790; JF903793 | MPCPT00413 | This study; Aguado et al. 2007; Aguado et al. 2012 |
| Terebellidae | Terebellidae sp.1 | ACO5523 (M83) | Sado Estuary, Portugal (1) | | MPCPT01814 | This study |
| | Terebellidae sp.2 | ACP4603 (M83) | Sado Estuary, Portugal (1) | | PCALE07214 | This study |
| | Nicolea zostericola (Örsted, 1844) | AAC0291 (M85) | Alaska, USA (3) | | KBPOL52811; KBPOL52411; KBPOL53011 | Christina Carr BOLD project |
| | Pista cristata (Müller, 1776) | ACP4603 (M84); AAH9500 (M82) | Sado Estuary, Portugal (1); USA (2) | NC_011011; EU239688 | PCALE07114 | This study; Zhong et al. 2008 |
| | Pista estevanica (Berkeley & Berkeley, 1942) | AAH9500 (M82) | USA (1) | | CMBIA27911 | CMBIA BOLD project (Peter E. Miller) |
| | Pista flexuosa (Grube, 1860) | AAD0905 (M81) | Canada (2); Russia (2) | HQ024445; HQ024447; GU672606; GU672609 | | Hardy et al. 2011; Carr et al. 2011 |
| Trichobranchidae | Trichobranchidae | AAY8419 (M67) | Aveiro, Portugal (2) | | SFPOM05311; SFPOM00111 | This study |
| | Trichobranchus glacialis Malmgren, 1866 | ACO5729 (M66) | Sado Estuary, Portugal (5) | | MPCPT00914; MPCPT01014; MPCPT03114; MPCPT00814; MPCPT01114 | This study |
| Oligochaeta: Octochaetidae | Hoplochaetella stuarti (Bourne, 1887) | | | JN887890; JN887891; JN793519 | | Kumar et al. unpublished |

N, number of specimens in each species.


Table 4 Intra- and interspecific distances at the species, genus and family levels analysed in this study.

| Comparison | Dataset | Taxa | Minimum Distance (%) | Mean Distance (%) | Maximum Distance (%) |
|---|---|---|---|---|---|
| Within Species | PolyPT | 32 | 0 | 0.6 | 6.04 |
| | All sequences | 63 | 0 | 2.2 | 33.27 |
| Within Genus | PolyPT | 7 | 14.4 | 21.15 | 26.4 |
| | All sequences | 20 | 0.3 | 24.04 | 35.4 |
| Within Family | PolyPT | 5 | 5.69 | 24.57 | 33.8 |
| | All sequences | 11 | 14.18 | 24.44 | 35.49 |

PolyPT: COI sequences generated in this study plus those of Lobo et al. 2013.
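As an illustration of how pairwise distances like those in Table 4 can be obtained, the sketch below computes the Kimura 2-parameter (K2P) distance between two aligned sequences. It assumes that the tabulated distances follow the same K2P model named for the NJ tree; the function name `k2p_distance` and the pairwise skipping of gaps and ambiguous sites are choices made here for illustration, not details taken from the study. With P the proportion of transitions and Q the proportion of transversions, the distance is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)].

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))


if __name__ == "__main__":
    # Toy sequences, not data from the study: one transition in ten sites.
    d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
    print(f"K2P distance: {100 * d:.2f}%")
```

Multiplying the result by 100 gives a percentage comparable in units to the table columns.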


Table 5 Number of BINs and number of taxonomically concordant, discordant and singleton records, both for the PolyPT COI-5P sequences alone and for the global dataset, which also includes the GenBank (GB) and BOLD records used in the NJ tree.

| Number of BINs | PolyPT Dataset | Global Dataset |
|---|---|---|
| Concordance* | 28 | 47 |
| Discordance | 15 | 30 |
| Singletons | 15 | 22 |
| Total | 58 | 99 |

PolyPT: COI sequences generated in this study plus those of Lobo et al. 2013.
* Concordance means 1 BIN per 1 species.
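The three categories in Table 5 can be illustrated with a short Python sketch. Only the concordance rule (1 BIN per 1 species) comes from the table footnote; treating a BIN with a single record as a singleton and every other case as discordant is an assumption made here. The function name `classify_bins` and the toy records are hypothetical and do not come from the study.

```python
from collections import defaultdict


def classify_bins(records):
    """Classify each BIN as 'concordant', 'discordant' or 'singleton'.

    records: iterable of (species_name, bin_id) pairs, one per sequence.
    """
    species_by_bin = defaultdict(list)
    bins_by_species = defaultdict(set)
    for species, bin_id in records:
        species_by_bin[bin_id].append(species)
        bins_by_species[species].add(bin_id)

    result = {}
    for bin_id, members in species_by_bin.items():
        if len(members) == 1:
            # Assumption: a BIN represented by one record is a singleton.
            result[bin_id] = "singleton"
        elif len(set(members)) == 1 and len(bins_by_species[members[0]]) == 1:
            # One species in the BIN, and that species occurs in no other BIN.
            result[bin_id] = "concordant"
        else:
            result[bin_id] = "discordant"
    return result


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    toy_records = [
        ("Species A", "BIN1"), ("Species A", "BIN1"),
        ("Species B", "BIN2"), ("Species C", "BIN2"),
        ("Species D", "BIN3"),
    ]
    for bin_id, status in sorted(classify_bins(toy_records).items()):
        print(bin_id, status)
```

Counting the labels returned for each dataset would give totals of the same form as the rows of the table.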


Table 6 Data used in this study for the species Hediste diversicolor and Hediste atoka.

| Species | Geographic Location | N | GenBank Accession | Source |
|---|---|---|---|---|
| Hediste diversicolor (O.F. Müller, 1776) | Baltic Sea | 3 | EU300738–EU300740 | Virgilio et al. 2009 |
| | Kattegat Sea | 9 | EU300762–EU300770 | |
| | NE Europe | 66 | EU300733–EU300737; EU300703–EU300732; EU300675–EU300702; EU300672–EU300674 | |
| | Tyrrhenian Sea | 16 | EU300773–EU300783; EU300771–EU300772; EU300784–EU300786 | |
| | Adriatic Sea | 35 | EU300658–EU300671; EU300641–EU300650; EU300637–EU300640; EU300651–EU300657 | |
| | Black Sea | 13 | EU300741–EU300753 | |
| | Caspian Sea | 8 | EU300754–EU300761 | |
| Hediste diversicolor form A | Baltic Sea | 29 | FJ030956–FJ030984 | Audzijonyte et al. 2008 |
| Hediste diversicolor form B | Baltic Sea | 10 | FJ030985–FJ030994 | |
| Hediste atoka Sato & Nakashima, 2003 | Japan and Korean Coasts | 17 | AB603871–AB603887 | Tosuji & Sato 2010 |
| | Kyushu and the Ryukyu Islands (South of Japan) | 28 | AB603842–AB603864; AB603866–AB603870 | |

N, number of specimens in each species.


Fig. 1 Map of the study area showing the sampling sites (black stars).

Fig. 2 Neighbour-joining (NJ) phylogenetic reconstruction (K2P distances) of COI-5P nucleotide sequences from 79 polychaete species belonging to 20 families. The oligochaete Hoplochaetella stuarti was used as outgroup. Collapsed clades represent specimens with a genetic distance below 4%, except for the Hediste diversicolor complex. Support was assessed with 1 000 bootstrap iterations; bootstrap values above 90% are shown. Scale bar represents 5% difference in nucleotide sequences.

Fig. 3 Maximum likelihood (ML) phylogenetic reconstruction of COI-5P sequences from Hediste diversicolor and H. atoka. Collapsed clades represent specimens with a genetic distance below 4%. Support was assessed with 1 000 bootstrap iterations; bootstrap values above 60% are shown. Scale bar represents 2% difference in nucleotide sequences. Maps show the distribution of the specimens (in black) for each clade. * shows the clade I subtree (Portuguese specimens from this study are marked with a grey background).
