bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 The mitochondrial genomes of 11 aquatic macroinvertebrate 2 species from Cyprus

3 Jan-Niklas Macher1*, Katerina Drakou2, Athina Papatheodoulou2, Berry van der 4 Hoorn1, Marlen Vasquez2

5 1) Naturalis Biodiversity Center, Postbus 9517, 2300 RA Leiden, Netherlands

6 2) Cyprus University of Technology, Department of Environmental Science and Technology,

7 30 Archibishop Kyprianos str., 3036 Limassol, Cyprus

8 * Corresponding author

9

10 Graphical Abstract

11

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

12 Abstract

13 Aquatic macroinvertebrates are often identified based on morphology, but molecular 14 approaches like DNA barcoding, metabarcoding and metagenomics are increasingly 15 used for species identification. These approaches require the availability of DNA 16 references deposited in public databases. Here we report the mitochondrial genomes 17 of 11 aquatic macroinvertebrates species from Cyprus, a European Union island 18 country in the Mediterranean. Only three of the molecularly identified species could be 19 assigned to a species name, highlighting the need for taxonomic work that leads to 20 the formal description and naming of species, and the need for further genetic work to 21 fill the current gaps in reference databases containing aquatic macroinvertebrates. 22 23 Keywords: Mitochondrial genome, Freshwater Biodiversity, Macroinvertebrates, 24 Mediterranean, Biodiversity Hotspot 25

26 Introduction

27 Aquatic macroinvertebrates are commonly used for the assessment of ecosystem 28 quality (1–3) and in ecological studies (4–6). Taxa are often identified based on 29 morphology, but the advent of DNA sequencing technologies has led to a paradigm 30 shift, as techniques like DNA barcoding (7), metabarcoding (8) and metagenomic 31 approaches (9–11) allow to identify and study large numbers of macroinvertebrate 32 specimens, species and communities in a short amount of time and at reasonable 33 costs. DNA based identification of macroinvertebrates relies on the comparison of 34 obtained DNA sequence data to reference sequences deposited in databases like 35 BOLD (12) or NCBI GenBank (13). Recent work has shown that some geographic 36 areas and taxonomic groups are well represented in reference databases, but 37 information on other areas and taxa is scarce, limiting the applicability of DNA based 38 methods for studying the respective taxonomic groups and ecosystems (14, 15). It is 39 therefore important to sequence aquatic macroinvertebrate species and deposit the 40 information in public databases. Full mitochondrial genomes have the advantage of 41 containing (usually) 13 protein coding genes and the large and small subunits of the 42 ribosomal RNA, which can potentially help improve species identification and makes

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

43 data available for phylogenetic studies.

44 Here we report the mitochondrial genomes of 11 aquatic macroinvertebrates species 45 from Cyprus, a European Union island country in the Mediterranean. The aquatic 46 macroinvertebrate fauna of Cyprus is expected to be diverse and harbour many 47 endemic species due to the country’s location in the Mediterranean Biodiversity 48 Hotspot (16). However, the fauna is vastly understudied with respect to aquatic 49 macroinvertebrates, with less than 50 molecular barcodes deposited in the Barcode 50 of Life database as of May 2020, and no country-specific identification keys available 51 to date. We sequenced and assembled the mitochondrial genomes of two Baetidae 52 (Ephemeroptera), one Caenidae (Ephemeroptera), one Chironomidae (Diptera), one 53 Dixidae (Diptera), one Simuliidae (Diptera), one Hydrachnidae (), one 54 Gomphidae (Odonata), one Euphaeidae (Odonata), one Gyrinidae (Coleoptera) and 55 one Erpobdellidae (Hirudinea) species. The sequencing of mitochondrial genomes of 56 selected taxa is part of a COST DNAaqua-NET(17) project on Cyprus freshwater 57 biodiversity, which tackles the challenges of describing aquatic diversity in 58 understudied regions and making data available for future studies and biomonitoring 59 (18, 19). The ongoing project will lead to an in-depth description of the aquatic 60 macroinvertebrate fauna of Cyprus.

61 Material & Methods

62 Specimens were collected from a perennial stream in Cyprus (coordinates: 34.769801 63 N, 32.911568 E, see Figure 1) on 14-08-2018 using a surber sampler (500um mesh 64 bag). Specimens were identified on family level based on morphology by local experts. 65 All specimens were stored in 96% ethanol and shipped to the Naturalis Biodiversity 66 Center.

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

67 68 Figure 1: A) Location of the sampling site in Cyprus B) Photo of the sampling site

69 In the laboratory, specimens were carefully rinsed with lab grade water to remove 70 external adhesions. DNA was extracted from the whole body as follows. The 71 specimens were cut with scissors and crushed with spatulas in 2ml Eppendorf tubes. 72 All equipment was sterile. DNA was extracted using the Macherey-Nagel (Düren, 73 Germany) NucleoSpin tissue kit on the KingFisher (Waltham, USA) robotic platform, 74 following the manufacturer’s protocol. A negative control containing lab grade water 75 was processed together with the tissue samples and was checked for contamination 76 during all following steps. Extracted DNA was stored at -20 ºC overnight until further 77 processing. Quantity and size of the extracted DNA was checked on the QIAxcel 78 platform. 15ng of DNA per sample was used for further processing. The New England 79 Biolabs (Ipswitch, USA) NEBNext kit and oligos were used for library preparation, 80 following the manufacturer’s protocol and selecting for an average DNA fragment 81 length of 300 bp. Fragment length and quantity of DNA in samples was checked on 82 the Agilent (Santa Clara, USA) Bioanalyzer. Samples were equimolar pooled before 83 sending for sequencing. The negative control, which did not show any signal of DNA 84 present, was added to the final library with 10% of the total library volume. The final 85 library was shipped to BGI (Shenzhen, China) for sequencing on the NovaSeq 6000 86 platform (2x 150bp).

87 Raw reads were quality checked using MultiQC (20). Fastp (21) was used for trimming 88 of Illumina adapters and removal of reads shorter than 100 bp. Reads were assembled 89 using megahit (22) with a maximum k-mer length of 69 and default parameters. 90 Spades (23) was used for a separate assembly to assess consistency of assembly 91 results (all commands: Supplementary material 1). Contigs were blast searched

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

92 against the NCBI Genbank nucleotide database (13). Identified putative mitochondrial 93 genomes were annotated using the MITOS (24) web server. Reading frames and 94 annotations were manually checked and corrected using Geneious Prime (v. 2020.1), 95 which was subsequently used to submit the annotated mitochondrial genomes to NCBI 96 Genbank. Gene order and direction was compared to the available mitochondrial 97 genomes of related species downloaded from NCBI Genbank. The obtained 98 cytochrome c oxidase I (COI) genes were compared with sequences deposited in the 99 NCBI Genbank and BOLD databases to check for availability of species- level 100 barcodes. A match of 97% identity or higher was used as threshold for a species-level 101 match (7). Occurrence data was compared to literature and to data available in the 102 Global Biodiversity Information Facility.

103 Results

104 Sequencing on the NovaSeq platform resulted in 43,369,218 reads (reads per sample: 105 Supplementary table 1). The negative control contained 952 reads (0.002% of all). 106 Illumina platforms are known to produce tag switching (25, 26), and a small proportion 107 of reads is commonly found in negative controls. As the number of reads observed in 108 our negative control was low, we did not suspect contamination. We obtained the full 109 mitochondrial genome for all 11 macroinvertebrate species. Coverage varied from 18 110 (Euphaeidae_Cy2020_sp. 1, Odonata) to 449 (Caenidae_Cy2020_sp. 1, 111 Ephemeroptera), with an average coverage of 135-fold (see supplementary table 2 for 112 coverage and assembly length). All mitochondrial genomes contained 13 protein 113 coding genes, 22 tRNAs and the large and small subunit of the ribosomal RNA (see 114 supplementary material 2 for mitochondrial genomes. Fully annotated genomes will 115 be available in NCBI GenBank upon publication). Megahit and Spades assemblies 116 resulted in identical mitochondrial contigs, with the expected exception being length 117 variations in the highly repetitive and AT rich control regions, which are difficult to 118 assemble with short reads. Gene order and direction of all mitochondrial genomes was 119 identical to that of related taxa available in Genbank.

120 Of the 11 species, six could be molecularly identified on species level using the 97% 121 identity threshold. However, only three of these six species could be assigned to a full 122 binomial name, while the other three were assigned to references with preliminary 123 names, pending formal description or taxonomic identification (see table 1). ‘Baetis sp.

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

124 2 ZY-2019’ was previously identified in Israel (27), which is close to Cyprus. ‘Caenis 125 (macrura) sp. MAA05’ was reported from streams in Northern Iraq (BOLD ID: BMIKU- 126 0029). The dipteran Simulium petricolum was previously reported from areas around 127 the Mediterranean and from Britain (28). The Dixidae species with BOLD ID FiDip134 128 was previously found in Norway, but only two reference sequences from the same 129 region are available and a lack of a formal species name does not allow assessing 130 whether this species has previously been found in the mediterranean area. The 131 dragonfly Onychogomphus forcipatus is known from a wide range across Europe, 132 including the mediterraenan areas (occurrence data from GBIF: (29)), and the water 133 Mideopsis roztoczensis is known from areas in central Europe as well as from the 134 areas around the Mediterranean (GBIF data: (30)). These preliminary results indicate 135 that the aquatic macroinvertebrate fauna of Cyprus consists of both European and 136 Asian species.

137 Our results highlight the need for taxonomic work that leads to the formal description 138 and naming of species, and the need for further genetic work to fill the current gaps in 139 reference databases containing aquatic macroinvertebrates (14, 15). This is especially 140 important for yet understudied regions like Cyprus, for which no specific identification 141 keys for macroinvertebrates exist. We show that shotgun sequencing can be used for 142 rapid recovery of mitochondrial genomes and can help with filling the gaps in reference 143 databases. The ongoing COST DNAaqua-NET (17) project on Cyprus freshwater 144 biodiversity will combine taxonomic and molecular approaches and will lead to an in- 145 depth description of the aquatic macroinvertebrate fauna of Cyprus.

146 Table 1: Blast results and NCBI accession numbers for the 11 assembled 147 mitochondrial genomes. Identities >97% were recorded as species level match 148 (highlighted in bold).

Reference database match NCBI (% identity, database, ID) accession number

Ephemeroptera

Baetidae_Cy2020_sp. 1 Baetis sp. 2 ZY-2019 (99.24%, BOLD, ID: Available upon MH827966, (27)) publication

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Baetidae_Cy2020_sp. 2 Baetis vardarensis (85.32%, BOLD) Available upon publication

Caenidae_Cy2020_sp. 1 Caenis (macrura) sp. MAA05 (98.31%, BOLD: ‘early Available upon release’ status) publication

Diptera

Chironomidae_Cy2020_sp. 1 Chironomidae sp. (95.37%, BOLD) Available upon publication

Simuliidae_Cy2020_sp. 1 Simulium petricolum (98.93%, BOLD, ID: Available upon GQ465950 (28)) publication

Dixidae_Cy2020_sp. 1 Dixidae sp. (99.23%, BOLD, ID: Available upon FiDip134) publication

Odonata

Euphaeidae_Cy2020_sp. 1 Epallage fatime (95.72%, BOLD) Available upon publication

Gomphidae_Cy2020_sp. 1 Onychogomphus forcipatus (99.85%, BOLD, ID: ZFMK- Available upon TIS-2534021) publication

Acari

Hydrachnidae_Cy2020_sp. 1 Mideopsis roztoczensis (99.1%, BOLD, ID: Available upon JN018102(31)) publication

Coleoptera

Gyrinidae_Cy2020_sp. 1 Gyrinus sp. (94.78%, BOLD) Available upon publication

Hirudinea

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Erpobdellidae_Cy2020_sp. 1 Dina sp. (93.94%, BOLD) Available upon publication

149

150 Acknowledgements

151 We thank Elza Duijm, Frank Stokvis, Marcel Eurlings and Roland Butôt for support in 152 the lab.

153 This article is based upon work from COST Action DNAqua-Net (CA15219), 154 supported by the COST (European Cooperation in Science and Technology) 155 program.

156 Data availability

157 All raw data has been deposited in the NCBI Short Read Archive (SRA). Accession 158 numbers: Available upon publication. All mitogenomes have been deposited in NCBI 159 GenBank. Accession numbers: Available upon publication. All DNA extracts are stored 160 in the Naturalis Biodiversity Center collection.

161 References

162

163 1. M. C. Jackson, O. L. F. Weyl, F. Altermatt, I. Durance, N. Friberg, A. J. Dumbrell, J. J. 164 Piggott, S. D. Tiegs, K. Tockner, C. B. Krug, P. W. Leadley, G. Woodward, 165 Recommendations for the Next Generation of Global Freshwater Biological Monitoring 166 Tools. Advances in Ecological Research (2016), pp. 615–636.

167 2. D. Hering, R. K. Johnson, S. Kramm, S. Schmutz, K. Szoszkiewicz, P. F. M. 168 Verdonschot, Assessment of European streams with diatoms, macrophytes, 169 macroinvertebrates and fish: a comparative metric-based analysis of organism response 170 to stress. Freshwater Biology. 51 (2006), pp. 1757–1785.

171 3. J. C. Morse, Y. J. Bae, G. Munkhjargal, N. Sangpradub, K. Tanida, T. S. Vshivkova, B. 172 Wang, L. Yang, C. M. Yule, Freshwater biomonitoring with macroinvertebrates in East 173 Asia. Frontiers in Ecology and the Environment. 5 (2007), pp. 33–42.

174 4. J. J. Piggott, C. R. Townsend, C. D. Matthaei, Climate warming and agricultural 175 stressors interact to determine stream macroinvertebrate community dynamics. Glob. 176 Chang. Biol. 21, 1887–1906 (2015).

177 5. A. J. Boulton, Parallels and contrasts in the effects of drought on stream 178 macroinvertebrate assemblages. Freshwater Biology. 48 (2003), pp. 1173–1185.

179 6. J. N. Macher, R. K. Salis, K. S. Blakemore, R. Tollrian, C. D. Matthaei, F. Leese,

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

180 Multiple-stressor effects on stream invertebrates: DNA barcoding reveals contrasting 181 responses of cryptic mayfly species. Ecological Indicators. 61 (2016), pp. 159–169.

182 7. P. D. N. Hebert, A. Cywinska, S. L. Ball, J. R. deWaard, Biological identifications 183 through DNA barcodes. Proceedings of the Royal Society of London. Series B: 184 Biological Sciences. 270 (2003), pp. 313–321.

185 8. P. Taberlet, E. Coissac, F. Pompanon, C. Brochmann, E. Willerslev, Towards next- 186 generation biodiversity assessment using DNA metabarcoding. Mol. Ecol. 21, 2045– 187 2050 (2012).

188 9. A. Crampton-Platt, D. W. Yu, X. Zhou, A. P. Vogler, Mitochondrial metagenomics: letting 189 the genes out of the bottle. GigaScience. 5 (2016), , doi:10.1186/s13742-016-0120-y.

190 10. Y. Ji, T. Huotari, T. Roslin, N. M. Schmidt, J. Wang, D. W. Yu, O. Ovaskainen, 191 SPIKEPIPE: A metagenomic pipeline for the accurate quantification of eukaryotic 192 species occurrences and intraspecific abundance change using DNA barcodes or 193 mitogenomes. Mol. Ecol. Resour. (2019), doi:10.1111/1755-0998.13057.

194 11. J. Macher, V. M. A. Zizka, A. M. Weigand, F. Leese, A simple centrifugation protocol for 195 metagenomic studies increases mitochondrial DNA yield by two orders of magnitude. 196 Methods in Ecology and Evolution. 9 (2018), pp. 1070–1074.

197 12. S. Ratnasingham, P. D. N. Hebert, BARCODING: bold: The Barcode of Life Data 198 System (http://www.barcodinglife.org). Molecular Ecology Notes. 7 (2007), pp. 355–364.

199 13. D. A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, E. W. 200 Sayers, GenBank. Nucleic Acids Res. 45, D37–D42 (2017).

201 14. H. Weigand, A. J. Beermann, F. Čiampor, F. O. Costa, Z. Csabai, S. Duarte, M. F. 202 Geiger, M. Grabowski, F. Rimet, B. Rulik, M. Strand, N. Szucsich, A. M. Weigand, E. 203 Willassen, S. A. Wyler, A. Bouchez, A. Borja, Z. Čiamporová-Zaťovičová, S. Ferreira, 204 K.-D. B. Dijkstra, U. Eisendle, J. Freyhof, P. Gadawski, W. Graf, A. Haegerbaeumer, B. 205 B. van der Hoorn, B. Japoshvili, L. Keresztes, E. Keskin, F. Leese, J. N. Macher, T. 206 Mamos, G. Paz, V. Pešić, D. M. Pfannkuchen, M. A. Pfannkuchen, B. W. Price, B. 207 Rinkevich, M. A. L. Teixeira, G. Várbíró, T. Ekrem, DNA barcode reference libraries for 208 the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future 209 work. Sci. Total Environ. 678, 499–524 (2019).

210 15. T. M. Porter, M. Hajibabaei, Over 2.5 million COI sequences in GenBank and growing. 211 PLoS One. 13, e0200177 (2018).

212 16. R. A. Mittermeier, N. Myers, C. G. Mittermeier & P. Robles Gil, Hotspots: Earth's 213 biologically richest and most endangered terrestrial ecoregions. CEMEX, SA, 214 Agrupación Sierra Madre, SC (1999).

215 17. F. Leese, F. Altermatt, A. Bouchez, T. Ekrem, D. Hering, K. Meissner, P. Mergen, J. 216 Pawlowski, J. Piggott, F. Rimet, D. Steinke, P. Taberlet, A. Weigand, K. Abarenkov, P. 217 Beja, L. Bervoets, S. Björnsdóttir, P. Boets, A. Boggero, A. Bones, Á. Borja, K. Bruce, V. 218 Bursić, J. Carlsson, F. Čiampor, Z. Čiamporová-Zatovičová, E. Coissac, F. Costa, M. 219 Costache, S. Creer, Z. Csabai, K. Deiner, Á. DelValls, S. Drakare, S. Duarte, T. Eleršek, 220 S. Fazi, C. Fišer, J.-F. Flot, V. Fonseca, D. Fontaneto, M. Grabowski, W. Graf, J. 221 Guðbrandsson, M. Hellström, Y. Hershkovitz, P. Hollingsworth, B. Japoshvili, J. Jones, 222 M. Kahlert, B. Kalamujic Stroil, P. Kasapidis, M. Kelly, M. Kelly-Quinn, E. Keskin, U. 223 Kõljalg, Z. Ljubešić, I. Maček, E. Mächler, A. Mahon, M. Marečková, M. Mejdandzic, G. 224 Mircheva, M. Montagna, C. Moritz, V. Mulk, A. Naumoski, I. Navodaru, J. Padisák, S.

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

225 Pálsson, K. Panksep, L. Penev, A. Petrusek, M. Pfannkuchen, C. Primmer, B. 226 Rinkevich, A. Rotter, A. Schmidt-Kloiber, P. Segurado, A. Speksnijder, P. Stoev, M. 227 Strand, S. Šulčius, P. Sundberg, M. Traugott, C. Tsigenopoulos, X. Turon, A. Valentini, 228 B. van der Hoorn, G. Várbíró, M. Vasquez Hadjilyra, J. Viguri, I. Vitonytė, A. Vogler, T. 229 Vrålstad, W. Wägele, R. Wenne, A. Winding, G. Woodward, B. Zegura, J. Zimmermann, 230 DNAqua-Net: Developing new genetic tools for bioassessment and monitoring of 231 aquatic ecosystems in Europe. Riogrande Odontol. 2, e11321 (2016).

232 18. F. Leese, A. Bouchez, K. Abarenkov, F. Altermatt, Á. Borja, K. Bruce, T. Ekrem, F. 233 Čiampor, Z. Čiamporová-Zaťovičová, F. O. Costa, S. Duarte, V. Elbrecht, D. Fontaneto, 234 A. Franc, M. F. Geiger, D. Hering, M. Kahlert, B. K. Stroil, M. Kelly, E. Keskin, I. Liska, 235 P. Mergen, K. Meissner, J. Pawlowski, L. Penev, Y. Reyjol, A. Rotter, D. Steinke, B. van 236 der Wal, S. Vitecek, J. Zimmermann, A. M. Weigand, Why We Need Sustainable 237 Networks Bridging Countries, Disciplines, Cultures and Generations for Aquatic 238 Biomonitoring 2.0: A Perspective Derived From the DNAqua-Net COST Action. Next 239 Generation Biomonitoring: Part 1 (2018), pp. 63–99.

240 19. A. Weigand, J. Zimmermann, A. Bouchez, F. Leese, DNAqua-Net: Advancing Methods, 241 Connecting Communities and Envisaging Standards. Proceedings of TDWG. 1 (2017), 242 p. e20310.

243 20. P. Ewels, M. Magnusson, S. Lundin, M. Käller, MultiQC: summarize analysis results for 244 multiple tools and samples in a single report. Bioinformatics. 32 (2016), pp. 3047–3048.

245 21. S. Chen, Y. Zhou, Y. Chen, J. Gu, fastp: an ultra-fast all-in-one FASTQ preprocessor. 246 Bioinformatics. 34, i884–i890 (2018).

247 22. D. Li, C.-M. Liu, R. Luo, K. Sadakane, T.-W. Lam, MEGAHIT: an ultra-fast single-node 248 solution for large and complex metagenomics assembly via succinct de Bruijn graph. 249 Bioinformatics. 31, 1674–1676 (2015).

250 23. A. Bankevich, S. Nurk, D. Antipov, A. A. Gurevich, M. Dvorkin, A. S. Kulikov, V. M. 251 Lesin, S. I. Nikolenko, S. Pham, A. D. Prjibelski, A. V. Pyshkin, A. V. Sirotkin, N. Vyahhi, 252 G. Tesler, M. A. Alekseyev, P. A. Pevzner, SPAdes: a new genome assembly algorithm 253 and its applications to single-cell sequencing. J. Comput. Biol. 19, 455–477 (2012).

254 24. M. Bernt, A. Donath, F. Jühling, F. Externbrink, C. Florentz, G. Fritzsch, J. Pütz, M. 255 Middendorf, P. F. Stadler, MITOS: improved de novo metazoan mitochondrial genome 256 annotation. Mol. Phylogenet. Evol. 69, 313–319 (2013).

257 25. G. L. Owens, M. Todesco, E. B. M. Drummond, S. Yeaman, L. H. Rieseberg, A novel 258 post hoc method for detecting index switching finds no evidence for increased switching 259 on the Illumina HiSeq X. Mol. Ecol. Resour. 18, 169–175 (2018).

260 26. I. B. Schnell, K. Bohmann, M. T. P. Gilbert, Tag jumps illuminated--reducing sequence- 261 to-sample misidentifications in metabarcoding studies. Mol. Ecol. Resour. 15, 1289– 262 1303 (2015).

263 27. Z. Yanai, J.-L. Gattolliat, N. Dorchin, of Baetis Leach in Israel 264 (Ephemeroptera, Baetidae). ZooKeys. 794 (2018), pp. 45–84.

265 28. J. C. Day, M. Mustapha, R. J. Post, The subgenus Eusimulium (Diptera: Simuliidae: 266 Simulium) in Britain. Aquatic Insects, 32(4), pp. 281-292 (2010).

267 29. Website, (available at Onychogomphus forcipatus Linnaeus, 1758 in GBIF Secretariat 268 (2019). GBIF Backbone Taxonomy. Checklist dataset https://doi.org/10.15468/39omei

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

269 accessed via GBIF.org on 2020-06-26.).

270 30. Website, (available at Mideopsis roztoczensis Biesiadka & Kowalik, 1979 in GBIF 271 Secretariat (2019). GBIF Backbone Taxonomy. Checklist dataset 272 https://doi.org/10.15468/39omei accessed via GBIF.org on 2020-06-26.).

273 31. J. Arabi, M. L. I. Judson, L. Deharveng, W. R. Lourenço, C. Cruaud, A. Hassanin, 274 Nucleotide composition of CO1 sequences in (Arthropoda): detecting new 275 mitogenomic rearrangements. J. Mol. Evol. 74, 81–95 (2012).

276 277

278

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.