The Mitochondrial Genomes of 11 Aquatic Macroinvertebrate Species
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 1 The mitochondrial genomes of 11 aquatic macroinvertebrate 2 species from Cyprus 3 Jan-Niklas Macher1*, Katerina Drakou2, Athina Papatheodoulou2, Berry van der 4 Hoorn1, Marlen Vasquez2 5 1) Naturalis Biodiversity Center, Postbus 9517, 2300 RA Leiden, Netherlands 6 2) Cyprus University of Technology, Department of Environmental Science and Technology, 7 30 Archibishop Kyprianos str., 3036 Limassol, Cyprus 8 * Corresponding author 9 10 Graphical Abstract 11 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 12 Abstract 13 Aquatic macroinvertebrates are often identified based on morphology, but molecular 14 approaches like DNA barcoding, metabarcoding and metagenomics are increasingly 15 used for species identification. These approaches require the availability of DNA 16 references deposited in public databases. Here we report the mitochondrial genomes 17 of 11 aquatic macroinvertebrates species from Cyprus, a European Union island 18 country in the Mediterranean. Only three of the molecularly identified species could be 19 assigned to a species name, highlighting the need for taxonomic work that leads to 20 the formal description and naming of species, and the need for further genetic work to 21 fill the current gaps in reference databases containing aquatic macroinvertebrates. 22 23 Keywords: Mitochondrial genome, Freshwater Biodiversity, Macroinvertebrates, 24 Mediterranean, Biodiversity Hotspot 25 26 Introduction 27 Aquatic macroinvertebrates are commonly used for the assessment of ecosystem 28 quality (1–3) and in ecological studies (4–6). Taxa are often identified based on 29 morphology, but the advent of DNA sequencing technologies has led to a paradigm 30 shift, as techniques like DNA barcoding (7), metabarcoding (8) and metagenomic 31 approaches (9–11) allow to identify and study large numbers of macroinvertebrate 32 specimens, species and communities in a short amount of time and at reasonable 33 costs. DNA based identification of macroinvertebrates relies on the comparison of 34 obtained DNA sequence data to reference sequences deposited in databases like 35 BOLD (12) or NCBI GenBank (13). Recent work has shown that some geographic 36 areas and taxonomic groups are well represented in reference databases, but 37 information on other areas and taxa is scarce, limiting the applicability of DNA based 38 methods for studying the respective taxonomic groups and ecosystems (14, 15). It is 39 therefore important to sequence aquatic macroinvertebrate species and deposit the 40 information in public databases. Full mitochondrial genomes have the advantage of 41 containing (usually) 13 protein coding genes and the large and small subunits of the 42 ribosomal RNA, which can potentially help improve species identification and makes 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 43 data available for phylogenetic studies. 44 Here we report the mitochondrial genomes of 11 aquatic macroinvertebrates species 45 from Cyprus, a European Union island country in the Mediterranean. The aquatic 46 macroinvertebrate fauna of Cyprus is expected to be diverse and harbour many 47 endemic species due to the country’s location in the Mediterranean Biodiversity 48 Hotspot (16). However, the fauna is vastly understudied with respect to aquatic 49 macroinvertebrates, with less than 50 molecular barcodes deposited in the Barcode 50 of Life database as of May 2020, and no country-specific identification keys available 51 to date. We sequenced and assembled the mitochondrial genomes of two Baetidae 52 (Ephemeroptera), one Caenidae (Ephemeroptera), one Chironomidae (Diptera), one 53 Dixidae (Diptera), one Simuliidae (Diptera), one Hydrachnidae (Acari), one 54 Gomphidae (Odonata), one Euphaeidae (Odonata), one Gyrinidae (Coleoptera) and 55 one Erpobdellidae (Hirudinea) species. The sequencing of mitochondrial genomes of 56 selected taxa is part of a COST DNAaqua-NET(17) project on Cyprus freshwater 57 biodiversity, which tackles the challenges of describing aquatic diversity in 58 understudied regions and making data available for future studies and biomonitoring 59 (18, 19). The ongoing project will lead to an in-depth description of the aquatic 60 macroinvertebrate fauna of Cyprus. 61 Material & Methods 62 Specimens were collected from a perennial stream in Cyprus (coordinates: 34.769801 63 N, 32.911568 E, see Figure 1) on 14-08-2018 using a surber sampler (500um mesh 64 bag). Specimens were identified on family level based on morphology by local experts. 65 All specimens were stored in 96% ethanol and shipped to the Naturalis Biodiversity 66 Center. 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 67 68 Figure 1: A) Location of the sampling site in Cyprus B) Photo of the sampling site 69 In the laboratory, specimens were carefully rinsed with lab grade water to remove 70 external adhesions. DNA was extracted from the whole body as follows. The 71 specimens were cut with scissors and crushed with spatulas in 2ml Eppendorf tubes. 72 All equipment was sterile. DNA was extracted using the Macherey-Nagel (Düren, 73 Germany) NucleoSpin tissue kit on the KingFisher (Waltham, USA) robotic platform, 74 following the manufacturer’s protocol. A negative control containing lab grade water 75 was processed together with the tissue samples and was checked for contamination 76 during all following steps. Extracted DNA was stored at -20 ºC overnight until further 77 processing. Quantity and size of the extracted DNA was checked on the QIAxcel 78 platform. 15ng of DNA per sample was used for further processing. The New England 79 Biolabs (Ipswitch, USA) NEBNext kit and oligos were used for library preparation, 80 following the manufacturer’s protocol and selecting for an average DNA fragment 81 length of 300 bp. Fragment length and quantity of DNA in samples was checked on 82 the Agilent (Santa Clara, USA) Bioanalyzer. Samples were equimolar pooled before 83 sending for sequencing. The negative control, which did not show any signal of DNA 84 present, was added to the final library with 10% of the total library volume. The final 85 library was shipped to BGI (Shenzhen, China) for sequencing on the NovaSeq 6000 86 platform (2x 150bp). 87 Raw reads were quality checked using MultiQC (20). Fastp (21) was used for trimming 88 of Illumina adapters and removal of reads shorter than 100 bp. Reads were assembled 89 using megahit (22) with a maximum k-mer length of 69 and default parameters. 90 Spades (23) was used for a separate assembly to assess consistency of assembly 91 results (all commands: Supplementary material 1). Contigs were blast searched 4 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.26.173625; this version posted June 28, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. 92 against the NCBI Genbank nucleotide database (13). Identified putative mitochondrial 93 genomes were annotated using the MITOS (24) web server. Reading frames and 94 annotations were manually checked and corrected using Geneious Prime (v. 2020.1), 95 which was subsequently used to submit the annotated mitochondrial genomes to NCBI 96 Genbank. Gene order and direction was compared to the available mitochondrial 97 genomes of related species downloaded from NCBI Genbank. The obtained 98 cytochrome c oxidase I (COI) genes were compared with sequences deposited in the 99 NCBI Genbank and BOLD databases to check for availability of species- level 100 barcodes. A match of 97% identity or higher was used as threshold for a species-level 101 match (7). Occurrence data was compared to literature and to data available in the 102 Global Biodiversity Information Facility. 103 Results 104 Sequencing on the NovaSeq platform resulted in 43,369,218 reads (reads per sample: 105 Supplementary table 1). The negative control contained 952 reads (0.002% of all). 106 Illumina platforms are known to produce tag switching (25, 26), and a small proportion 107 of reads is commonly found in negative controls. As the number of reads observed in 108 our negative control was low, we did not suspect contamination. We obtained the full 109 mitochondrial genome for all 11 macroinvertebrate species. Coverage varied from 18 110 (Euphaeidae_Cy2020_sp. 1, Odonata) to 449 (Caenidae_Cy2020_sp. 1, 111 Ephemeroptera), with an average coverage of 135-fold (see supplementary table 2 for 112 coverage and assembly length). All mitochondrial genomes contained 13 protein 113 coding genes, 22 tRNAs and the large and small subunit of the ribosomal RNA (see 114 supplementary material 2 for mitochondrial genomes. Fully annotated genomes will 115 be available in NCBI GenBank upon publication). Megahit and Spades assemblies 116 resulted in identical mitochondrial contigs, with the expected exception being length 117 variations in the highly repetitive and AT rich control regions, which are difficult to 118 assemble with short reads.