DNA Barcoding a Nightmare Taxon: Assessing Barcode Index Numbers and Barcode Gaps for Sweat Bees
Journal: Genome
Manuscript ID: gen-2017-0096.R2
Manuscript Type: Article
Date Submitted by the Author: 18-Aug-2017
Complete List of Authors: Gibbs, Jason; University of Manitoba, Entomology

Jason GIBBS
University of Manitoba, Department of Entomology, 12 Dafoe Rd., Winnipeg, Manitoba, R3T 2N2
Email: [email protected]

Abstract
There is an ongoing campaign to DNA barcode the world’s >20,000 bee species. Recent revisions of Lasioglossum (Dialictus) (Hymenoptera: Halictidae) for Canada and the eastern United States were completed using integrative taxonomy. DNA barcode data from 110 species of L. (Dialictus) are examined for their value in specimen identification and in discovering additional taxonomic diversity. Specimen identification success was estimated using the best close match method; the error rate was 20% relative to current taxonomic understanding. Barcode Index Numbers (BINs) assigned using Refined Single Linkage Analysis (RESL) and ‘barcode gaps’ inferred using the Automatic Barcode Gap Discovery (ABGD) method were also assessed. RESL is incongruent with current taxonomy for 44.5% of species, although some cryptic diversity may exist. Forty-three of the 110 species were part of merged BINs containing multiple species. There is no barcode gap for the data set as a whole, and ABGD showed levels of discordance similar to those of RESL.
The ‘viridatum species-group’ is particularly problematic: DNA barcodes alone would be misleading for both species delimitation and specimen identification in this group. Character-based methods using fixed nucleotide substitutions could improve specimen identification success in some cases. The use of DNA barcoding for species discovery in standard taxonomic practice, in the absence of a well-defined ‘barcode gap’, is discussed.

Keywords: Apoidea, Hymenoptera, identification, species delimitation, taxonomy

Introduction
DNA barcoding is a method for identifying biological samples to species using the sequence of a standardized gene fragment (Hebert et al. 2003a, Hebert and Gregory 2005). DNA barcoding standards for animals require sequencing a fragment of the mitochondrial cytochrome c oxidase subunit 1 (COI) gene at least 500 bp long (Hebert et al. 2003a, Hubert et al. 2008), although shorter fragments can also be used for identification (Hajibabaei et al. 2006b) and individual nucleotide substitutions can be diagnostic (Burns et al. 2007, Gibbs 2009b). DNA barcodes are also commonly used to aid in delimiting species boundaries (Gibbs 2009a, Rehan and Sheffield 2011, González-Vaquero et al. 2016b, Packer and Ruz 2017). I prefer to use the terms ‘specimen identification’ and ‘species discovery’ (Collins and Cruickshank 2012) to prevent confusion over the more ambiguous term ‘species identification’.

DNA barcoding for specimen identification works by comparing unknown samples to a reference database of sequences generated from identified material, available on GenBank and the Barcode of Life Data System (BOLD; Ratnasingham and Hebert 2007). BOLD is the product of collaboration among computer programmers, molecular biologists and taxonomists (Smith et al. 2005, Hajibabaei et al. 2005, Ratnasingham and Hebert 2007).
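Comparing an unknown query against a reference library can be sketched in a few lines of Python. The example below is a minimal, hypothetical illustration of a best-close-match style assignment (the criterion named in the abstract), not the procedure used by BOLD or the exact implementation used in this study. It assumes pre-aligned COI sequences of equal length, an uncorrected p-distance that skips gaps (‘-’) and ‘N’ sites, and a 3% threshold chosen only for illustration; the function names and toy sequences are invented.

```python
from typing import List, Tuple


# Hypothetical helper: uncorrected p-distance over comparable aligned sites.
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') or an 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # uninformative site, not counted
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def best_close_match(query: str,
                     library: List[Tuple[str, str]],
                     threshold: float = 0.03) -> str:
    """Name a query after its closest reference sequence(s).

    library holds (species_name, aligned_sequence) pairs. Returns the
    species name, 'ambiguous' if the closest references belong to more
    than one species, or 'no identification' if even the closest
    reference lies beyond the threshold.
    """
    distances = [(p_distance(query, seq), species) for species, seq in library]
    best = min(d for d, _ in distances)
    if best > threshold:
        return "no identification"
    closest_species = {species for d, species in distances if d == best}
    if len(closest_species) > 1:
        return "ambiguous"
    return closest_species.pop()


# Toy example: 100-bp sequences, so one difference equals 1%.
ref_a = "A" * 100
ref_b = "A" * 95 + "C" * 5   # 5% from ref_a
query = "A" * 99 + "G"       # 1% from ref_a, 5% from ref_b
print(best_close_match(query, [("sp_A", ref_a), ("sp_B", ref_b)]))  # sp_A
```

In a validation setting, each identified sequence would typically be queried against the library with itself removed, and the returned name scored against its current identification; with real data, ties at the best distance and the choice of threshold both influence the reported error rate.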
Taxonomists provide the scientific context for sequence data used for specimen identification (Goldstein and DeSalle 2011). From the taxonomist’s perspective, the advantage of this enterprise is a potential wealth of molecular data that can be used to test hypotheses of species limits (DeSalle et al. 2005). Combining traditional taxonomic approaches with molecular methods can create a ‘taxonomic feedback loop’, which can lead to species discovery and better-resolved taxonomies (Page et al. 2005). Inclusion of molecular data in taxonomic studies is one part of a broader approach sometimes referred to as ‘integrative taxonomy’ (Dayrat 2005).

DNA barcoding success is often related to the presence of a barcoding ‘gap’ (Meyer and Paulay 2005). If genetic divergence within species does not overlap with divergence between congeners, then DNA barcodes can identify specimens effectively; DNA barcoding loses efficacy when this gap becomes small or absent (Meyer and Paulay 2005). Sequence divergence thresholds such as 2% or 3% have been suggested for grouping specimens into provisional species (Hebert et al. 2004b, Smith et al. 2005). High success rates using similar thresholds have been reported for some taxa (Hebert et al. 2003b, 2004b, Barrett and Hebert 2005), but such thresholds would not be sufficient for some species complexes, including groups of butterflies (Hebert et al. 2004a), flies (Meier et al. 2006, Virgilio et al. 2012), and bees (Gibbs 2009a, 2009b, Almeida et al. 2009). DNA barcoding thresholds work best when matched to the specific taxon library (Meier et al. 2006), and a regression method for determining ad hoc thresholds for DNA barcoding libraries has been proposed (Virgilio et al. 2012).
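Whether a barcode gap exists can be checked species by species by comparing the largest distance within a species with the smallest distance from that species to any other. The sketch below is illustrative only; it assumes aligned, gap-free sequences of equal length and uncorrected p-distances, and the function name and toy labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gaps(samples: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float, bool]]:
    """Per species: (max intraspecific distance, nearest-neighbour distance, gap?).

    samples holds (species_name, aligned_sequence) pairs. A gap is
    recorded only when the closest sequence of any other species is
    strictly more divergent than the most divergent conspecific pair.
    """
    max_intra: Dict[str, float] = {}
    min_inter: Dict[str, float] = {}
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    result = {}
    for sp in {name for name, _ in samples}:
        intra = max_intra.get(sp, 0.0)           # singletons: no conspecific pairs
        inter = min_inter.get(sp, float("inf"))  # only one species: no neighbour
        result[sp] = (intra, inter, inter > intra)
    return result


toy = [("sp_A", "AAAAAAAAAA"), ("sp_A", "AAAAAAAAAC"), ("sp_B", "CCCCCAAAAA")]
for species, (intra, inter, gap) in sorted(barcode_gaps(toy).items()):
    print(species, intra, inter, gap)
# sp_A 0.1 0.5 True
# sp_B 0.0 0.5 True
```

Read this way, a fixed threshold t (for example 2% or 3%) separates a species cleanly only when its maximum intraspecific distance is at most t and its nearest-neighbour distance is greater than t; when the two values overlap, no single threshold can do so. Singleton species have no intraspecific pairs, so the sketch reports 0.0 for them, which makes their gaps look larger than the data can support.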
An alternative to sequence divergence thresholds is to use diagnostic nucleotide substitutions or unique patterns of nucleotide polymorphism to identify closely related species (DeSalle et al. 2005, Goldstein and DeSalle 2011, Gibbs et al. 2013).

Ratnasingham and Hebert (2013) recently developed the Barcode Index Number (BIN) system for sorting DNA barcodes into operational taxonomic units (OTUs) in the absence of taxonomic information. BINs are assigned using Refined Single Linkage Analysis (RESL), an algorithm that does not use prior taxonomic knowledge (Ratnasingham and Hebert 2013). RESL uses a 2.2% sequence divergence threshold to delimit preliminary OTUs and then refines them using a graph-based Markov clustering analysis (Ratnasingham and Hebert 2013). For any pair of species, BINs and species can stand in one of four relationships, which Ratnasingham and Hebert (2013) refer to as ‘match’, ‘split’, ‘merge’ and ‘mixture’ (their figure 1). When traditional taxonomy and BINs are concordant, they are said to ‘match’. Splits occur when a single species is assigned multiple BINs. Merging occurs when a single BIN is shared by two or more species, a situation long referred to informally as ‘lumping’. In the ‘mixture’ scenario, two BINs are assigned to two species, but sequences of at least one species fall into both BINs. Merging and mixture of BINs may result from introgression or incomplete lineage sorting, or from a single species having been inappropriately divided under multiple names (Rheindt et al. 2009). Other methods have been developed for assigning OTUs using sequence data (Pons et al. 2006), including the Automatic Barcode Gap Discovery (ABGD) method (Puillandre et al. 2012).
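The preliminary clustering step and the four BIN/species relationships can be illustrated with a short sketch. This is a deliberate simplification and not RESL: it performs only single-linkage clustering at a 2.2% p-distance cutoff (assuming that pairs at or below the cutoff are linked) and omits the Markov clustering refinement entirely. It then classifies relationships by grouping species and clusters that share specimens into connected components, an operationalization assumed here rather than taken from Ratnasingham and Hebert (2013). All function names and toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def single_linkage(seqs: List[str], threshold: float = 0.022) -> List[int]:
    """Single-linkage clustering: any pair at or below the threshold is linked.

    Returns one cluster label per sequence. This mimics only a preliminary
    OTU step; RESL's Markov clustering refinement is not implemented.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if p_distance(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)  # union the two clusters
    return [find(i) for i in range(len(seqs))]


def classify(species: List[str], clusters: List[int]) -> Dict[str, str]:
    """Label each species 'match', 'split', 'merge' or 'mixture'.

    Species and clusters that share specimens are grouped into connected
    components; the label depends on how many of each a component holds.
    """
    sp_to_cl, cl_to_sp = defaultdict(set), defaultdict(set)
    for sp, cl in zip(species, clusters):
        sp_to_cl[sp].add(cl)
        cl_to_sp[cl].add(sp)
    labels: Dict[str, str] = {}
    for start in sp_to_cl:
        if start in labels:
            continue
        comp_sp, comp_cl, stack = {start}, set(), [start]
        while stack:  # depth-first walk over the species/cluster graph
            for cl in sp_to_cl[stack.pop()]:
                if cl in comp_cl:
                    continue
                comp_cl.add(cl)
                for other in cl_to_sp[cl] - comp_sp:
                    comp_sp.add(other)
                    stack.append(other)
        if len(comp_sp) == 1:
            kind = "match" if len(comp_cl) == 1 else "split"
        else:
            kind = "merge" if len(comp_cl) == 1 else "mixture"
        for sp in comp_sp:
            labels[sp] = kind
    return labels


# Toy 100-bp sequences forming two clusters: {0, 1} and {2, 3}.
seqs = ["A" * 100, "A" * 99 + "C", "C" * 10 + "A" * 90, "C" * 10 + "A" * 89 + "G"]
clusters = single_linkage(seqs)
print(sorted(classify(["sp_A", "sp_A", "sp_B", "sp_B"], clusters).items()))
# [('sp_A', 'match'), ('sp_B', 'match')]
print(sorted(classify(["sp_A", "sp_B", "sp_B", "sp_B"], clusters).items()))
# [('sp_A', 'mixture'), ('sp_B', 'mixture')]
```

Because single linkage chains sequences together, intermediate haplotypes can join two otherwise distinct groups into one cluster, which is one way a ‘merge’ can arise in this simplified sketch.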
ABGD is similar to RESL in that it first defines preliminary OTUs, in this case by inferring a barcode gap, and then recursively refines the partitions, allowing for different barcode gaps across the data set. ABGD has been used in several barcoding studies, typically with default or slightly modified settings (Hendrixson et al. 2013, Kekkonen and Hebert 2014).

Bees (Hymenoptera: Apoidea: Anthophila) are the most important pollinators of flowering plants (Klein et al. 2007, Ollerton et al. 2011). Pollination services provided by bees are crucial to the functioning of most terrestrial ecosystems and to much of agricultural production (Klein et al. 2007). Bees are also potentially valuable as indicator taxa of ecosystem health (Zayed et al. 2004), a value that stems from their elevated extinction risk under haplodiploid sex determination (Zayed and Packer 2005, Zayed 2009). Haplodiploidy may also affect rates of mitochondrial introgression (Patten et al. 2015), which may reduce the utility of DNA barcodes in species that hybridize (Nicholls et al. 2012, Patten et al. 2015). Nevertheless, DNA barcodes have been used successfully in bees for faunal studies (Sheffield et al. 2009, Magnacca and Brown 2012, Schmidt et al. 2015, Packer and Ruz 2017), species and subspecies discovery and delimitation (Gibbs 2009a, Rehan and Sheffield 2011, Pauly et al. 2015, González-Vaquero et al. 2016a, Sheffield et al. 2016), and specimen identification (Sheffield et al. 2011).

Here I examine DNA barcode data generated for a taxonomically challenging group of North American bees, using BINs and barcode gaps.
Sweat bees (Hymenoptera: Halictidae) have been called ‘morphologically monotonous’ (Michener 1974) and ‘the despair of taxonomists’ (Wheeler 1928), and the large subgenus Lasioglossum (Dialictus) is notoriously the most difficult to identify to species. Lasioglossum (Dialictus) are extremely abundant in surveys of North American bees (MacKay and Knerer 1979, Grixti and Packer 2006, Campbell et al. 2007, Droege et al. 2010, Ngo et al. 2013), making identification tools crucial for studies of bee diversity. Thousands of L. (Dialictus) specimens are collected each year in North America, but taxonomic keys have only recently become available, and only for regional subsets of species (Gibbs 2009b, 2010a, 2011). Consequently, many studies have been published in which L. (Dialictus) specimens are unidentified or misidentified (Kalhorn et al.