e-COST STSM Report

Action Number: CA15219 – Developing new genetic tools for bioassessment of aquatic ecosystems in Europe (DNAqua-Net)
STSM Title: Gap-analysis in DNA barcode reference libraries of macrobenthic fauna from transition and coastal waters along the western European Atlantic coast
Reference: ECOST-STSM-CA 15219-150217-082111
Applicant: Sofia Alexandra Ferreira Duarte
Home institution: University of Minho, Portugal
Host institution: AZTI Tecnalia, Spain
Period: 19th February 2017 to 2nd March 2017

Introduction

The main aim of this STSM was to contribute to the goals of working group (WG) 1 and to research challenge (RC) 3, namely conducting a species gap-analysis in reference libraries of European aquatic biota relevant for the Water Framework Directive (WFD, 2000/60/EC) and the Marine Strategy Framework Directive (2008/56/EC) (Leese et al. 2016). The specific tasks were: 1) create, by compiling data from existing databases (e.g. AMBI, Macroben), a checklist of species for the dominant groups of benthic macroinvertebrate taxa used in biomonitoring of transition and coastal waters – Annelida, Crustacea and Mollusca – along the western European (EU) Atlantic coast; 2) create a dedicated BOLD dataset project to initiate the compilation of a core COI-5P DNA barcode reference library for European benthic macroinvertebrate taxa occurring in transition and coastal waters; 3) audit and annotate the reference library records compiled in task 2; and 4) conduct a species gap-analysis by comparing the checklist generated in task 1 with the curated dataset generated in task 3.

Results

Task 1. Create, by compiling data from existing databases, a checklist of species for the dominant groups of benthic macroinvertebrates used in biomonitoring of transition and coastal waters – Annelida, Crustacea and Mollusca – along the western European (EU) Atlantic coast.

A checklist of 2,525 marine invertebrate species occurring in transition and coastal waters of the western EU Atlantic coast was compiled. It was based on the species list of the AZTI’s Marine Biotic Index (AMBI) (5,319 species), retrieved from the AMBI 5.0 software (http://ambi.azti.es) (Borja et al. 2000), and on a species occurrence list from the Basque Country monitoring network of coasts and estuaries (840 species), both provided by the host. The retrieved soft-bottom macroinvertebrate species were classified taxonomically using the World Register of Marine Species (WoRMS) database (www.marinespecies.org). All records with taxonomic ranks above species level were removed, and only species with accepted scientific names were kept. Sorting the list by accepted scientific name made it possible to remove duplicate records. Up to 88% of the species in the list fell into the three target groups – Annelida, Crustacea and Mollusca. For species in the initial list with no record in WoRMS, the taxonomic classification was taken from the Integrated Taxonomic Information System (ITIS) database (https://www.itis.gov/). Since our geographic target was the western EU Atlantic coast, the geographic distribution of each species in the list was checked in WoRMS and in the Ocean Biogeographic Information System (OBIS) (http://www.iobis.org/). The final list thus comprised 2,525 species occurring on the western EU Atlantic coast: 1,055 species of Annelida, 853 species of Crustacea and 617 species of Mollusca.
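The filtering and de-duplication steps of Task 1 can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used in the STSM: the `TaxonRecord` type, its field names and the `build_checklist` function are hypothetical, and the sketch assumes each source name has already been matched against a taxonomic register so that its rank and accepted name are known.

```python
from typing import Iterable, NamedTuple


class TaxonRecord(NamedTuple):
    """One source-list name after matching against a taxonomic register.

    Field names are hypothetical; they are not WoRMS or ITIS field names.
    """
    submitted_name: str  # name as it appeared in the source list
    rank: str            # e.g. "Species", "Genus", "Family"
    accepted_name: str   # currently accepted name ("" if none was found)
    group: str           # e.g. "Annelida", "Crustacea", "Mollusca"


def build_checklist(records: Iterable[TaxonRecord]) -> list[TaxonRecord]:
    """Keep species-rank records with an accepted name, one per accepted name."""
    by_accepted: dict[str, TaxonRecord] = {}
    for rec in records:
        if rec.rank.lower() != "species":
            continue  # drop genus, family and other higher ranks
        if not rec.accepted_name:
            continue  # drop names for which no accepted name was found
        # Names sharing one accepted name collapse into a single entry.
        by_accepted.setdefault(rec.accepted_name, rec)
    # Return the entries ordered by accepted scientific name.
    return [by_accepted[name] for name in sorted(by_accepted)]


# Illustrative input: a genus-level record, plus a former and a current name
# that share one accepted name.
example = [
    TaxonRecord("Nassarius", "Genus", "Nassarius", "Mollusca"),
    TaxonRecord("Nassarius reticulatus", "Species", "Tritia reticulata", "Mollusca"),
    TaxonRecord("Tritia reticulata", "Species", "Tritia reticulata", "Mollusca"),
]
checklist = build_checklist(example)
print([rec.accepted_name for rec in checklist])
```

Keying the dictionary on the accepted name is what removes replicated records here; the final sort only fixes the output order.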

Task 2. Create a dedicated BOLD dataset to initiate the compilation of a core COI-5P DNA barcode reference library, including all published records previously generated by both research groups, as well as records publicly available on BOLD or GenBank, for the taxonomic groups and geographic range detailed in task 1.

A dataset, “DS-EUMARINV, European Marine invertebrates with DNA barcodes”, was initiated on BOLD (http://v4.boldsystems.org/). It includes records generated by both research groups, as well as other published records from BOLD projects that generated barcodes for marine invertebrate species occurring on the western EU Atlantic coast (Table 1).

Table 1. Original projects and codes, and associated publications, from which DNA barcodes for the three target groups were compiled and added to the dataset DS-EUMARINV.

| Project code | Target groups | No. of COI-5P DNA barcodes | Publication |
|---|---|---|---|
| AMPPT | Crustacea | 126 | Lobo et al. 2017 |
| BCAS | Annelida, Crustacea, Mollusca | 109 | Aylagas et al. 2016 |
| BIPM | Mollusca | 4 | - |
| BIV | Mollusca | 13 | Vilela 2015 |
| BNAGB | Mollusca | 349 | Barco et al. 2016 |
| BNSA, BNSC, BNSCI, BNSCP, BNSDE, BNSIS | Crustacea | 612 | Raupach et al. 2015 |
| BOCI | Mollusca | 23 | Barco et al. 2013 |
| BVALN | Mollusca | 6 | - |
| DGAS | Mollusca | 71 | - |
| FCDOP, FCDPH, JSDAZ, JSDPX, JSDSC, JSDSV, JSDUK | Crustacea | 239 | Matzen da Silva et al. 2011 |
| FCGA | Crustacea | 16 | Costa et al. 2009 |
| GBAN | Annelida | 1 | Summers et al. 2015 |
| GBCM | Crustacea | 1 | Cowart et al. 2015 |
| GBMBV | Mollusca | 4 | Glockner et al. 2013, Cowart et al. 2015, Clewing et al. 2013 |
| GBMGA | Mollusca | 1 | Cowart et al. 2015 |
| GBMIN | Mollusca | 3 | Reid et al. 2012 |
| GBMLB | Mollusca | 3 | - |
| GBMLG | Mollusca | 2 | Carmona et al. 2013 |
| GBMLS | Mollusca | 1 | Bourlat et al. 2008 |
| LMBAG | Mollusca | 37 | Borges et al. 2016 |
| LOBO | Annelida, Crustacea, Mollusca | 62 | Lobo et al. 2013 |
| METP | Mollusca | 10 | - |
| MPCPT | Annelida | 89 | Lobo et al. 2016 |
| PIPM | Mollusca | 8 | - |
| Total | | 1,790 | |

The generated dataset included 1,790 records, with 1,056 COI-5P DNA barcodes belonging to Crustacea, 570 to Mollusca and 164 to Annelida, covering a total of 536 morphospecies. These were assigned to 566 Barcode Index Numbers (BINs): 78 Annelida morphospecies were assigned to 82 BINs, 292 Crustacea morphospecies were assigned to 312 BINs and 166 Mollusca morphospecies were assigned to 172 BINs. Overall, 368 BINs were concordant, 17 were discordant and 181 were singletons.
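The three BIN categories reported above can be illustrated with a short sketch. The definitions used here are an assumption made for illustration (singleton: a BIN with one record; concordant: several records sharing one species name; discordant: several records with conflicting names) and may not match BOLD's own discordance report in every detail. The function name and the BIN identifiers are hypothetical.

```python
from collections import defaultdict


def classify_bins(records):
    """Count concordant, discordant and singleton BINs.

    `records` is an iterable of (bin_id, species_name) pairs, one per barcode.
    """
    species_by_bin = defaultdict(list)
    for bin_id, species in records:
        species_by_bin[bin_id].append(species)

    counts = {"concordant": 0, "discordant": 0, "singleton": 0}
    for names in species_by_bin.values():
        if len(names) == 1:
            counts["singleton"] += 1   # BIN holding a single record
        elif len(set(names)) == 1:
            counts["concordant"] += 1  # several records, one species name
        else:
            counts["discordant"] += 1  # several records, conflicting names
    return counts


# Toy records with made-up BIN identifiers and placeholder species names.
toy = [
    ("BIN-1", "species a"), ("BIN-1", "species a"),
    ("BIN-2", "species b"), ("BIN-2", "species c"),
    ("BIN-3", "species d"),
]
print(classify_bins(toy))

# The reported categories add up to the reported number of BINs.
assert 368 + 17 + 181 == 566
```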

Task 3. Audit and annotate the reference library records compiled in task 2.

The records compiled in the dataset DS-EUMARINV have not yet been audited and annotated, since the best way to conduct this process and the criteria to be used are still under discussion between the proponent and the host institutions. However, the two institutions reached agreement on the following points: i) an automated workflow should be created, based on the one proposed by Oliveira et al. 2016, so that any change introduced in the auditing protocol can immediately be applied to a reference library under construction, which would not be possible when auditing and annotating manually; ii) only barcode-compliant sequences should be used (minimum length of 500 bp; <1% ambiguous bases; presence of 2 trace files; minimum of low trace quality status; and specification of the geographic location); and iii) datasets should preferably contain at least 4 barcodes per species, generated by different laboratories.
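As a rough illustration of how point ii) might be automated, the sketch below checks a record against the length, ambiguity, trace-file and location criteria. It is not the workflow of Oliveira et al. 2016. The `BarcodeRecord` type and its fields are hypothetical, alignment gap characters are assumed not to count as bases, and the trace quality criterion and the per-species replication criterion of point iii) are left out.

```python
from dataclasses import dataclass


@dataclass
class BarcodeRecord:
    """Hypothetical minimal record; not a BOLD data structure."""
    sequence: str     # COI-5P nucleotide sequence
    trace_files: int  # number of trace files attached to the record
    location: str     # collection locality ("" when missing)


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of bases that are not A, C, G or T (gap characters ignored)."""
    bases = [b for b in sequence.upper() if b != "-"]
    if not bases:
        return 1.0  # treat an empty sequence as fully ambiguous
    return sum(1 for b in bases if b not in "ACGT") / len(bases)


def is_barcode_compliant(rec: BarcodeRecord,
                         min_length: int = 500,
                         max_ambiguous: float = 0.01,
                         min_traces: int = 2) -> bool:
    """Apply the length, ambiguity, trace-file and location criteria."""
    length = len(rec.sequence.replace("-", ""))
    return (
        length >= min_length
        and ambiguous_fraction(rec.sequence) < max_ambiguous  # strictly below 1%
        and rec.trace_files >= min_traces
        and bool(rec.location.strip())
    )


good = BarcodeRecord("ACGT" * 125, trace_files=2, location="Portugal")
short = BarcodeRecord("ACGT" * 124 + "ACG", trace_files=2, location="Portugal")
print(is_barcode_compliant(good), is_barcode_compliant(short))
```

Keeping the thresholds as parameters reflects the point made in i): if the auditing protocol changes, the same check can be re-run over the whole library with new values.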

Task 4. Conduct a species gap-analysis by comparing the checklist generated in task 1 with the curated dataset generated in task 3.

In the initial proposal we aimed to perform the species gap-analysis by comparing the checklist generated in task 1 with the curated dataset generated in task 3. Instead, we took advantage of the tool “Species checklist” and its options “Progress Report” and “Hit list” in BOLD v4. Using this tool, we uploaded a species checklist entitled “CL-AMEU, AMBI European waters”. The percentage of species with DNA barcodes was below 50% for all the targeted groups (ca. 49% Crustacea, 44% Mollusca and 41% Annelida) (Table 2). These species were distributed among 83 families, 11 orders and 3 classes in Crustacea; 67 families, 8 orders and 4 classes in Mollusca; and 45 families, 7 orders and 2 classes in Annelida.

Table 2. Summary of the results of the progress report obtained after conducting the species gap-analysis in BOLD v4, from the class to the species taxonomic rank, for the three target groups. Values for taxon ranks are the % of taxa with DNA barcodes.

| Taxon rank | Annelida | Crustacea | Mollusca | Overall |
|---|---|---|---|---|
| Total no. of species | 1,055 | 853 | 617 | 2,525 |
| Class (%) | 100.0 | 100.0 | 83.3 | 90.9 |
| Order (%) | 91.7 | 93.3 | 71.4 | 80.7 |
| Family (%) | 84.3 | 79.9 | 79.7 | 80.6 |
| Genus (%) | 59.5 | 68.2 | 54.7 | 60.9 |
| Species (%) | 40.7 | 48.8 | 43.8 | 44.3 |
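The logic behind a species-level progress report and hit list can be sketched in a few lines. This is only an illustration of the comparison, not BOLD's implementation. The function name is hypothetical and the species names are placeholders.

```python
def gap_analysis(checklist, barcoded):
    """Return (% of checklist species with barcodes, sorted list of species without)."""
    checklist_set = set(checklist)
    covered = checklist_set & set(barcoded)
    missing = sorted(checklist_set - covered)  # the "hit list"
    percent = 100.0 * len(covered) / len(checklist_set) if checklist_set else 0.0
    return percent, missing


# Placeholder data: four checklist species, two of them present in the library.
checklist = ["species a", "species b", "species c", "species d"]
library = ["species b", "species d", "species x"]  # "species x" is not on the checklist

percent, hit_list = gap_analysis(checklist, library)
print(f"{percent:.1f}% barcoded; still needed: {hit_list}")
```

Library species that are not on the checklist do not affect the percentage, which is computed over checklist species only.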

Final Remarks

In the current study, a gap analysis was conducted for marine invertebrate species belonging to the three dominant groups occurring in transition and coastal waters of the western EU Atlantic coast – Annelida, Crustacea and Mollusca. We took advantage of the possibility of conducting gap-analyses directly in BOLD v4, where the only requirement is the upload of a species checklist. A progress report with the percentage of species barcoded and a hit list of the checklist species still needing barcodes are generated immediately.

The species checklist was based on the list of species used to calculate the AZTI’s Marine Biotic Index (AMBI). The AMBI is currently used as a component of the benthic invertebrates’ assessment by several Member States within the North East Atlantic and describes the sensitivity of macrobenthic species to both anthropogenic and natural pressures (Borja et al. 2000). Thus, the list of species used in AMBI is highly relevant for the WFD and the Marine Strategy Framework Directive. The percentage of barcoded species was similar for the three groups, ranging from 41 to 49%.

Although a strong effort was made to compile a list that is as accurate as possible, it will need constant updating, in particular regular verification of species nomenclature. One of the most pertinent problems faced during compilation of the list was how to deal with species synonyms. We tried to solve this issue by matching our original list against WoRMS and sorting the species by accepted scientific name. However, this will not entirely solve the problem, because species nomenclature on BOLD does not seem to be regularly updated either. For example, the names of the mollusc species Nassarius incrassatus and Nassarius reticulatus were recently changed to Tritia incrassata and Tritia reticulata; our list already uses the updated names, but the former names are still in use in BOLD. This needs to be carefully addressed in upcoming WG1 meetings.

We also initiated the compilation of a core COI-5P DNA barcode reference library in BOLD, comprising 1,790 barcodes generated by both research groups, as well as other published records from BOLD projects. However, the dataset has not yet been audited and annotated, since the best way to conduct this process and the criteria to be used are still under discussion between the proponent and the host institutions. Quality assurance of barcode reference libraries is also still under discussion in WG1, which at its last meeting, held on 9th March, proposed setting up an auditing group to develop a minimum set of requirements for an auditing workflow defining how reference library data should be curated.

The current STSM was highly productive in starting a checklist and a core COI-5P DNA barcode reference library comprising relevant invertebrate species occurring in transition and coastal waters of the western European Atlantic coast. It also initiated a collaboration between the two institutions, University of Minho and AZTI-Tecnalia, which will continue in the near future. For instance, we propose to start preparing a COST action manuscript on the reference library and gap-analysis for marine EU species, to be submitted for publication after additional contributions from COST participants have been collected and the library and its annotation are reasonably complete. In addition, an abstract with the major gap-analysis findings of this STSM will be submitted to the 7th International Barcode of Life (iBOL) conference (http://dnabarcodes2017.org/).
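The synonym issue raised in these remarks can be made concrete with a small sketch. It assumes a hand-made synonym table (the `SYNONYMS` mapping and the `reconcile` function are hypothetical) and uses the Nassarius reticulatus / Tritia reticulata example from the text. In practice such a mapping would have to come from a taxonomic register such as WoRMS.

```python
def reconcile(names, synonym_to_accepted):
    """Map each name to its accepted form; names absent from the map are kept."""
    return {synonym_to_accepted.get(name, name) for name in names}


# Hypothetical synonym table holding the single example given in the text.
SYNONYMS = {"Nassarius reticulatus": "Tritia reticulata"}

checklist_names = {"Tritia reticulata"}    # checklist uses the updated name
library_names = {"Nassarius reticulatus"}  # library still uses the former name

naive_gap = checklist_names - library_names
reconciled_gap = checklist_names - reconcile(library_names, SYNONYMS)

print("gap without reconciliation:", sorted(naive_gap))
print("gap after reconciliation:", sorted(reconciled_gap))
```

Without reconciliation the species appears to lack a barcode even though one exists under the former name, which is how outdated nomenclature can inflate the apparent gap.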

References

Aylagas E, et al. 2016. Benchmarking DNA metabarcoding for biodiversity-based monitoring and assessment. Frontiers in Marine Science 3: 96.
Barco A, et al. 2013. Molecular data reveal cryptic lineages within the northeastern Atlantic and Mediterranean small mussel drills of the Ocinebrina edwardsii complex (Mollusca: Gastropoda: Muricidae). Zoological Journal of the Linnean Society 169: 389–407.
Barco A, et al. 2016. Identification of North Sea molluscs with DNA barcoding. Molecular Ecology Resources 16: 288–297.
Borges LMS, et al. 2016. With a little help from DNA barcoding: investigating the diversity of Gastropoda from the Portuguese coast. Scientific Reports 6: 20226.
Borja A, et al. 2000. A marine biotic index to establish the ecological quality of soft-bottom benthos within European estuarine and coastal environments. Marine Pollution Bulletin 40: 1100–1114.
Bourlat SJ, et al. 2008. Feeding ecology of Xenoturbella bocki (phylum Xenoturbellida) revealed by genetic barcoding. Molecular Ecology Resources 8: 18–22.
Carmona L, et al. 2013. A tale that morphology fails to tell: a molecular phylogeny of Aeolidiidae (Aeolidida, Nudibranchia, Gastropoda). PLoS ONE 8: e63000.
Clewing C, et al. 2013. Molecular phylogeny and biogeography of a high mountain bivalve fauna: the Sphaeriidae of the Tibetan Plateau. Malacologia 56: 231–252.
Costa FO, et al. 2009. Probing marine Gammarus (Amphipoda) with DNA barcodes. Systematics and Biodiversity 7: 365–379.
Cowart DA, et al. 2015. Metabarcoding is powerful yet still blind: a comparative analysis of morphological and molecular surveys of seagrass communities. PLoS ONE 10: e0117562.
Glockner G, et al. 2013. The mitochondrial genome of Arctica islandica; Phylogeny and variation. PLoS ONE 8: e82857.
Leese F, et al. 2016. DNAqua-Net: Developing new genetic tools for bioassessment and monitoring of aquatic ecosystems in Europe. Research Ideas and Outcomes 2: e11321.
Lobo J, et al. 2013. Enhanced primers for amplification of DNA barcodes from a broad range of marine metazoans. BMC Ecology 13: 34.
Lobo J, et al. 2016. Starting a DNA barcode reference library for shallow water polychaetes from the southern European Atlantic coast. Molecular Ecology Resources 16: 298–313.
Lobo J, et al. 2017. Contrasting morphological and DNA barcode-suggested species boundaries among shallow water amphipod fauna from the southern European Atlantic coast. Genome 60: 147–157.
Matzen da Silva J, et al. 2011. Systematic and evolutionary insights derived from mtDNA COI barcode diversity in the Decapoda (Crustacea: Malacostraca). PLoS ONE 6: e19449.
Oliveira LM, et al. 2016. Assembling and auditing a comprehensive DNA barcode reference library for European marine fishes. Journal of Fish Biology, doi:10.1111/jfb.13169.
Raupach MJ, et al. 2015. The application of DNA barcodes for the identification of marine crustaceans from the North Sea and adjacent regions. PLoS ONE 10: e0139421.
Reid DG, et al. 2012. A global molecular phylogeny of 147 periwinkle species (Gastropoda, Littorininae). Zoological Scripta 41: 125–136.
Summers M, et al. 2015. Whale falls, multiple colonisations of the deep, and the phylogeny of Hesionidae (Annelida). Invertebrate Systematics 29: 105–123.
Vilela AP. 2015. Biblioteca de referência de DNA barcodes de bivalves (Mollusca: Bivalvia) e gastrópodes (Mollusca: Gastropoda) da costa Portuguesa. MSc thesis in Ecology, University of Minho, Braga, Portugal.
