<<

169 ARTICLE Barcode-based species delimitation in the marine realm: a test using (: and Copepoda) Robert G. Young, Cathryn L. Abbott, Thomas W. Therriault, and Sarah J. Adamowicz

Abstract: DNA barcoding has been used successfully for identifying specimens belonging to marine planktonic groups. However, the ability to delineate species within taxonomically diverse and widely distributed marine groups, such as the Copepoda and Thecostraca, remains largely untested. We investigate whether a cytochrome c oxidase subunit I (COI-5P) global pairwise sequence divergence threshold exists between intraspecific and inter- specific divergences in the plus the thecostracans ( and allies). Using publicly accessible sequence data, we applied a graphical method to determine an optimal threshold value. With these thresholds, and using a newly generated planktonic marine data set, we quantify the degree of concordance using a bidirec- tional analysis and discuss different analytical methods for sequence-based species delimitation (e.g., BIN, ABGD, jMOTU, UPARSE, Mothur, PTP, and GMYC). Our results support a COI-5P threshold between 2.1% and 2.6% p-distance across methods for these taxa, yielding molecular groupings largely concordant with traditional, morphologically defined species. The adoption of internal methods for clustering verification enables rapid biodiversity studies and the exploration of unknown faunas using DNA barcoding. The approaches taken here for concordance assessment also provide a more quantitative comparison of clustering results (as contrasted with “success/failure” of barcoding), and we recommend their further consideration for barcoding studies. Key words: , DNA barcoding, MOTU, species delimitation, coalescence. Résumé : Le codage a` barres de l’ADN a été employé avec succès pour identifier des spécimens de planctons marins. Cependant, la capacité de distinguer les espèces au sein de groupes marins présentant une grande diversité taxonomique et une large distribution, comme les Copepoda et les Thecostraca, n’a pas encore été testée. For personal use only. Les auteurs ont cherché a` déterminer s’il existait, au sein des données globales de divergence génétique pour la sous-unité I de la cytochrome c oxydase (COI-5P), une démarcation entre les divergences intra- et interspécifiques au sein des copépodes et des Thecostraca (balanes et autres). À l’aide des données de séquences publiques, les auteurs ont employé une méthode graphique pour déterminer un seuil optimal. Avec ces seuils, et a` l’aide de nouvelles données sur le plancton marin, les auteurs ont quantifié le degré de concordance au moyen d’une analyse bidirectionnelle et ils discutent des différentes méthodes analytiques pouvant servir a` délimiter les espèces (p. ex. BIN, ABGD, jMOTU, UPARSE, Mothur, PTP, GMYC). Les résultats supportent l’emploi d’un seuil pour COI-5P entre 2,1 et 2,6 % de distance-p pour l’ensemble de ces méthodes chez ces crustacés; ces seuils permettant d’obtenir des groupements moléculaires largement concordants avec les méthodes traditionnelles fondées sur la morphologie. L’adoption de méthodes internes pour la validation des groupements permet de réaliser rapidement des études de biodiversité et d’explorer des faunes inconnues au moyen du codage a` barres de l’ADN. Les approches

Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 employées dans ce travail pour la mesure de concordance fournissent également une comparaison plus quanti- tative des résultats de groupement (par rapport a` des « succès/échec » du codage a` barres), et les auteurs en recommandent la considération lors d’études de codage a` barres. [Traduit par la Rédaction] Mots-clés : Maxillopoda, codage a` barres de l’ADN, MOTU, délimitation des espèces, coalescence.

Received 4 December 2015. Accepted 17 July 2016. Corresponding Editor: Ian Hogg. R.G. Young and S.J. Adamowicz. Biodiversity Institute of Ontario and Department of Integrative Biology, University of Guelph, 50 Stone Road East, Guelph, ON N1G 2W1, Canada. C.L. Abbott and T.W. Therriault. Fisheries and Oceans Canada, Pacific Biological Station, 3190 Hammond Bay Road, Nanaimo, BC V9T 6N7, Canada. Corresponding author: Robert G. Young (email: [email protected]). Copyright remains with the author(s) or their institution(s). Permission for reuse (free in most cases) can be obtained from RightsLink.

Genome 60: 169–182 (2017) dx.doi.org/10.1139/gen-2015-0209 Published at www.nrcresearchpress.com/gen on 21 October 2016. 170 Genome Vol. 60, 2017

Introduction to specimen identification relied on the presence of low Quantifying biodiversity in marine ecosystems has be- intraspecific genetic variation and larger interspecific di- come increasingly important owing to a rise in anthro- vergence between species, i.e., the presence of a “barcoding pogenic disturbances and increases in local and global gap” (discussed in Meyer and Paulay 2005). Despite criti- extinction rates (Singh 2002; Radulovici et al. 2010). Tra- cisms of strictly similarity-based approaches (Will et al. ditional morphological approaches to specimen identifi- 2005; Hickerson et al. 2006; Collins and Cruickshank 2013), cation are not realistic for completing comprehensive such delimitation methods have been shown to be useful surveys of marine regions owing to the cost, expertise, among a wide range of taxa (Will et al. 2005; Ebach and and time required. Additionally, even when sufficient Holdrege 2005; Radulovici et al. 2010; Huemer et al. 2014). resources are available, morphological identifications Unfortunately, thresholds for species delimitation are are challenging for many specimens owing to variation often inappropriately chosen and simply based on past in morphological characters across life stages, poorly re- literature (Collins and Cruickshank 2013). Although solved in some groups, and cryptic or unde- there has been work developing methodologies to justify scribed species (Knowlton 1993; McManus and Katz 2009; the selection of a molecular threshold (Lefébure et al. Packer et al. 2009). These difficulties are further exacer- bated when conducting geographically broad biodiver- 2006; Blanco-Bercial et al. 2014), these methods rely sity surveys, as taxonomic expertise is often linked to solely on external comparisons, often only to taxonomic specific taxonomic groups and (or) geographic regions. species identifications, which can vary widely depending Therefore, DNA sequence-based specimen identification on the identifier (Shen et al. 2013). Nevertheless, once a systems, including DNA barcoding (Hebert et al. 2003a, well-defined and justified threshold is established, sim- 2003b), are indispensable for conducting large-scale bio- ple delimitation methods are advantageous for quickly diversity surveys. Once established, molecular method- and easily assessing potential biodiversity. ological pipelines, publicly accessible sequence databases, Here, our work focuses on the marine planktonic and tested analytical tools will not only facilitate biodi- Copepoda and Thecostraca. Past placement of these classes versity surveys but will also enable the rapid detection of was under the superclass Maxillopoda (Newman 1992). introduced species through environmental sampling and However, more recent work has indicated that Maxil- high-throughput sequencing techniques (Cristescu 2014). lopoda is not monophyletic (Regier et al. 2005), and the Identifying unknown specimens through DNA bar- taxonomic placement of Copepoda and Thecostraca within codes requires a reference library containing morpho- the has been redefined (Newman 1992; Regier logically identified, barcoded specimens against which et al. 2010; Oakley et al. 2013)(Fig. 1). Recent work using unknowns can be compared (Collins and Cruickshank For personal use only. multiple lines of evidence has placed the subclasses Co- 2014). Currently, it is challenging to use DNA barcode databases for identification of many marine species; pepoda and Thecostraca as sister lineages making up most species remain to be described, and even known Hexanauplia (Regier et al. 2010; Oakley et al. 2013). species often have little to no DNA barcode coverage We investigate the prospects for using rapid species (McManus and Katz 2009; Bucklin et al. 2010; Blanco-Bercial delimitation tools within a hyper-abundant and widely et al. 2014). Important first steps for enabling specimen distributed group of marine invertebrates, the subclasses identification using DNA barcodes are to investigate Copepoda and Thecostraca. Using publicly available patterns of interspecific and intraspecific variation cytochrome c oxidase subunit I (COI-5P) barcode data, within target taxa and to determine the degree to we first describe patterns of genetic divergence and explore which an integrative approach to species delimitation is the potential for a global pairwise sequence divergence Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 necessary—as contrasted with more straightforward ap- threshold to be used to delimit specimens into species- proaches using a single molecular marker (Collins and like units. Second, using a novel data set of morpho- Cruickshank 2013, 2014). In addition to building an iden- logically identified specimens, we also quantify the tification system for known species, it is important to be concordance among multiple sequence clustering meth- able to determine when a specimen is likely to represent ods as well as the concordance between molecular delim- a species that is novel to the database through some form itations and current taxonomy. Our use of bidirectional of species delimitation. concordances is a new approach for the barcoding liter- The success of DNA barcoding has been documented across a range of marine taxa (e.g., da Silva et al. 2011; ature. Our efforts provide insights into marine plank- Blanco-Bercial et al. 2014; and references therein). De- tonic crustacean genetic divergence patterns and species spite these successes, establishing a robust system is still boundaries under differing species definitions. In addi- challenging. With 31 phyla—and with an estimated mil- tion to contributing to the development of molecular lion species remaining to be discovered—the genetic identification systems for these taxa, the approaches em- variation within marine metazoans is not fully appreci- ployed here may be considered for other understudied ated (Bucklin et al. 2010). Earlier molecular approaches marine invertebrates with large geographic ranges.

Published by NRC Research Press Young et al. 171

Fig. 1. A phylogram showing the relationship of from the same site and were assigned the same morpho- Thecostraca and Copepoda with respect to other major logical identification as the sequenced specimen. The lineages. Topology is based on Regier et al. specimens will be archived at the Biodiversity Institute (2010) and Oakley et al. (2013). The section of the tree of Ontario, University of Guelph, and the digital speci- shaded in blue indicates the clade Hexanauplia (Oakley men information is available through the Barcode of Life et al. 2013) and represents the taxonomic focus of this work. [Colour online.] Data Systems (BOLD, http://www.boldsystems.org/, proj- ect code CAISN) (Ratnasingham and Hebert 2007). DNA extraction followed (Ivanova et al. 2006), with several variations (specific methodology can be located in the supplementary data, File S61). Multiple PCR primer sets were used to amplify the barcode region. These primers and further details of the molecular methodology can also be found in the supplementary data (File S61). CodonCode Aligner (CodonCode Corporation, Center- ville, Mass., USA) was used to display sequence quality, screen for short sequence amplifications, and assemble consensus sequences where available. Sequences were aligned using the default (FFT-NS-2) alignment strategy of the multiple alignment program for nucleotide se- quences (MAFFT Ver. 7) (Katoh and Standley 2013), and Methods the multiple sequence alignment (MSA) was trimmed to Specimen collection and molecular laboratory a final length of 588 base pairs (bp). The MSA was then methodology translated using the invertebrate mitochondrial code in Plankton samples were collected from May 2011 to MEGA6 (Tamura et al. 2013) to verify the alignment. Sin- August 2012 at one proposed port location and 11 current gle nucleotide insertions or deletions evident through port locations across all three of Canada’s ocean regions frame shifts were further investigated and edited if they (Arctic, Atlantic, and Pacific) (Fig. 2). Plankton samples revealed an error in the nucleotide base reading. In cases were collected from small vessels using plankton nets of of an unresolved frame shift or stop codon, the sequence both 250 ␮m and 80 ␮m mesh sizes at 0–15 m depth. At was removed, as this pattern suggests the presence of most locations, two seasonal periods were sampled: a nuclear pseudogene. All remaining sequences were July–September and November–December. Collections in screened for potential contamination and (or) misidenti- For personal use only. northern regions were sometimes limited to one season fications; specific methodology can be found in the sup- owing to logistical challenges. At each location, for each plementary data (File S61). season collected, six separate plankton tows were made to provide representation across the entire port. Data sets and molecular operational taxonomic units Samples were maintained in 95% ethanol and trans- Two data sets of COI-5P sequences were analyzed. The ferred to –20 °C cold storage within 6 months of collec- novel data set consisted of marine collections sequenced tion. All samples were split into three fractions, one of here that were morphologically identified to the species which was used in this study. To reduce the number of level. The second set, reference, contained all Copepoda samples for sorting and identification, we combined all and Thecostraca specimens collected here that could not samples from a single mesh size from a single port, and be identified to the species level together with all pub-

Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 for the remainder of this manuscript we refer to these as licly accessible Maxillopoda data on the BOLD system “samples”. Specimens within samples were sorted mor- (BOLD search for “Maxillopoda” in the public data portal, phologically and were taxonomically identified to the using the API search method, conducted on 1 December lowest possible level (identification references: Nouvel 2014; http://www.boldsystems.org/index.php/API_Public/ 1950; Gardner and Szabo 1982; Roff et al. 1984; Kathman specimen?taxon=Maxillopoda). Although the taxonomic 1986; Todd et al. 1996; Gerber 2000; Johnson 1996; designation Maxillopoda is no longer accepted, it was Johnson and Allen 2012). Four to six individuals of each necessary to use it when searching for public sequence morphologically identified maxillopod taxon per sample data, as many databases have not been updated to the were used for further molecular analysis. currently accepted taxonomy. The two data sets were DNA extraction consumed whole individuals, as all used in several different and several similar analyses, individuals were less than approximately 1 mm3. Batch which are described below and visually displayed in vouchers were designated that consisted of individuals Fig. 3. The reference data set was then reduced by excluding

1Supplementary data are available with the article through the journal Web site at http://nrcresearchpress.com/doi/suppl/10.1139/gen- 2015-0209.

Published by NRC Research Press 172 Genome Vol. 60, 2017

Fig. 2. Map of Canada with plankton sampling sites indicated by red circles. Sites include (A) Vancouver, (B) Victoria, (C) Roberts Bank, (D) Nanaimo, (E) Churchill, (F) Steensby Inlet, (G) Iqaluit, (H) Deception Bay, (I) Baie de Sept-Iles, (J) Port Hawkesbury, (K) Bedford Basin, and (L) Bayside. [Colour online.]

those sequences not assigned a Barcode Index Number the Bayesian information criterion implemented in (BIN), to remove sequence data not meeting the minimum the jModelTest2 program (Guindon and Gascuel 2003; quality standards for BIN compliance (Ratnasingham and Darriba et al. 2012). Talavera et al. (2013) recently investi- Hebert 2013). Genetic distances were calculated and sum- gated the results from GMYC using variously constructed marized using the “Distance Summary” and “Barcode input trees and largely found the same resulting clustering Gap Analysis” tools on BOLD (Ratnasingham and Hebert assignments. We also used three different tree-building

For personal use only. 2007). All sequences longer than 200 bp were analyzed methods (ultrametricized trees constructed using Bayes- for the reference and novel data sets. Analyses were con- ian, neighbour-joining, and maximum-likelihood meth- ducted using the BOLD sequence alignment, Kimura ods) for use by the coalescence-based MOTU delineation 2-parameter (K2P) (Kimura 1980) genetic distances, and programs. As the clusters from GMYC, as well as PTP, pairwise deletion of missing data. across all three input trees were similar, detailed meth- Five similarity-based and two coalescence-based anal- ods and results are presented for the Bayesian tree only yses for generating molecular clusters or molecular (see supplementary data, File S61 for Bayesian tree con- operational taxonomic units (MOTUs) (Blaxter 2004) struction details), consistent with recommendations by were compared. Similarity-based methods included BIN Tang et al. (2014). PTP and GMYC clustering results using (Ratnasingham and Hebert 2013), Automatic Barcode unique haplotypes were used to assign all sequences/ Gap Discovery (ABGD) (Puillandre et al. 2012), jMOTU specimens in the data set to a cluster. Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 (Jones et al. 2011), UPARSE (Edgar 2013), and Mothur The GMYC (Fujisawa and Barraclough 2013) analysis (Schloss et al. 2009). The two coalescence-based methods was conducted using python implementation of the were Poisson Tree Processes (PTP) (Zhang et al. 2013) and single-threshold model as downloaded from the Exelixis Generalized Mixed Yule Coalescent (GMYC) (Fujisawa Lab web server (http://species.h-its.org/gmyc/). The PTP and Barraclough 2013). Coalescence-based phylogenetic model was also used to infer putative species boundaries methods were included in the comparison for the novel (Zhang et al. 2013). The BOLD-implemented refined sin- data set only, as the reference data set was too large and gle linkage algorithm provided BIN assignments for each construction of an input tree too computationally expen- sequence (Ratnasingham and Hebert 2013). This method sive, while similarity-based methods were applied to uses a 2.2% p-distance seed threshold but then refines both data sets. Prior to model testing and tree construc- groupings for individual BINs and neighbouring clusters tion, all exact duplicate sequences were removed from based on the level of continuity in the distribution of the MSA using ElimDupes (https://hcv.lanl.gov/content/ genetic divergences among sequences. The jMOTU pro- sequence/ELIMDUPES/elimdupes.html). Phylogenetic re- gram uses a similarity-based approach; a BLAST identity construction for the novel data set used the best-fit filter of 99 was employed, and clusters were arranged model of nucleotide substitution as determined using using the number of variable nucleotides, equivalent

Published by NRC Research Press Young et al. 173

Fig. 3. Flow chart for the analysis of the two data sets used in this study. Part 1 shows the key steps used to analyze the reference data set. Clustering analyses included the use of four programs: ABGD, jMOTU, Mothur, and UPARSE. In addition to these four methods, BOLD BIN assignments were used to evaluate agreement with taxonomic identifications. Part 2 shows the steps in the analysis of the novel data set. The novel data set was clustered into molecular operational taxonomic units using the same four similarity-based analyses as for the reference data set (ABGD, jMOTU, Mothur, and UPARSE). In addition to these, BINs and two coalescent (GMYC and PTP) clustering methods were used, and agreement with taxonomic identifications was quantified. [Colour online.]

to p-distance (Jones et al. 2011). For the remaining range was chosen because it is expected to encompass similarity-based methods (ABGD, Mothur, and UPARSE), the general transition between intraspecific and inter- we set the programs to define molecular clusters using specific variation within our COI-5P data set (Blanco-Bercial p-distance so that results could be compared across et al. 2014). For each program for which the user can spec- methods (Puillandre et al. 2012; Schloss et al. 2009; Edgar ify the threshold (ABGD, jMOTU, Mothur, and UPARSE), 2013). the numbers of clusters generated at each p-distance For personal use only. The ABGD method was implemented through the were plotted, and the vertex point of the resulting curve ABGD C source available on the ABGD website (Puillandre was considered to represent the optimal threshold for et al. 2012). Analysis settings were as follows: Pmin value clustering these data for that method (Handl et al. 2005) of 0.001, Pmax equal to 0.15, a minimum gap width X (Fig. 4). For pairwise divergence thresholds below the equal to 0.001, and 1000 steps using p-distance. The gap proposed optimum, the sequences will be over-split into width used here was much smaller than the default too many MOTUs, having lower correspondence to evo- width of 1.5, and the number of steps was much larger at lutionary species units. Conversely, at pairwise diver- 1000 compared with 10. This was done to obtain cluster- gence thresholds above the optimum, sequences will be ing results across a range of divergences to conduct an over-lumped into too few MOTUs. To determine the op- elbow analysis (explained below). UPARSE implementa- timum, we used a graphical approach where the Euclid-

Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 tion with threshold values greater than 3% is not recom- ian distance between the origin of the graph (0,0) and mended, and a work-around for this problem was every point on the curve was obtained. The point on the implemented to explore a broad range of possible curve with the smallest Euclidean distance to the origin thresholds, which can be seen in the supplementary data was considered the hypothesized ideal global threshold (File S11). The Mothur program had three clustering op- value (Fig. 4). To determine this threshold using empiri- tions available for MOTU assignment (nearest neigh- cal data, thereby foregoing the need to approximate the bour, furthest neighbour, and average neighbour), and curve, we scaled the y-axis (number of MOTUs at a given the average neighbour method was used. Commands threshold) to be equal in length to the x-axis. Analysis and scripts for the generation of the results from Mothur conducted using divergence values between 0%–10% and can be found in the supplementary data (File S11). 0%–20% and scaling the y-axis yielded very similar results (not shown). Determining optimal global thresholds To calculate the optimal divergence threshold for each Concordance among MOTUs and between MOTUs and similarity-based clustering method, we generated clus- morphological species ters for the reference data set using a range between 0% Concordance among MOTUs generated by different and 15% p-distance pairwise divergence thresholds. This analytical methods, as well as concordance between

Published by NRC Research Press 174 Genome Vol. 60, 2017

Fig. 4. Conceptual diagram for the determination of ter agreement to all Linnaean species labels for both the the optimal molecular divergence threshold values. The novel and reference data sets. The comparison resulted vertical long-dashed black line indicates the point on the in four possible outcomes: complete “match”, where curve (elbow) representing a threshold value that does not both clusters match exactly; an exact “split”, where the over-split or over-lump the sequences into molecular reference cluster was split into multiple clusters, with no operational taxonomic units. This point represents the closest distance to the origin (0,0) (red circular-dashed members of the corresponding clusters being unac- arrow), as contrasted with larger vectors (blue small- counted; a complete “lump”, where the reference cluster dashed arrows). The corresponding point on the x-axis was combined with one or more additional reference indicates the value for the percent pairwise divergence clusters in their entirety, with no members unaccount- representing the proposed optimal threshold for the given ed; and a “mixed” result, where a reference cluster was data set using the graphed analysis method. [Colour online.] both split and lumped (Ratnasingham and Hebert 2013). Species represented by a single specimen were removed from this analysis, as these specimens are only able to represent matches in the analysis and would thus bias the results towards concluding concordance. An agree- ment matrix was constructed, and total match, split, lump, and mixed numbers were tabulated for each data set and clustering analysis.

Results Novel molecular data set of Canadian marine Hexanauplia A total of 404 new DNA barcode sequences were gen- erated. After applying a 500 bp length criterion and re- moving sequences containing more than 1% Ns, we obtained 366 (247 identified to species) novel barcode sequences for a newly sampled and morphologically identified planktonic crustacean collection (see taxo- nomic breakdown table in the supplementary data, File S11). Amplification of the COI-5P region was challenging, MOTUs and morphological species assignments, was quan- with some specimens receiving eight PCR attempts using For personal use only. tified using an adjusted Wallace coefficient (Wallace 1983). up to six different primer sets. Once a protocol was es- This coefficient was selected because it provides bidirec- tablished, successful amplifications across the Hexanau- tional results. Calculation of the coefficient required a plia data set increased, and final sequencing success was data matrix containing all specimens, with each row rep- 59% of individuals, representing 100% of the morpholog- resenting a unique specimen and each column a unique ical species. There were several groups that remained clustering analysis result (see supplemental data, File S11). more difficult to amplify, including , Microcalanus, Specimens were removed from the analysis if data were Metridia (specifically Metridia longa), , Paracalanus, missing in any one of the clustering columns being ana- and Pseudocalanus. lyzed. Once a matrix was constructed, adjusted Wallace Patterns of genetic divergence coefficients were computed through the website Com- There was an overall separation between intraspecific Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 paring Partitions (http://www.comparingpartitions.info/) and interspecific divergences for both the reference and (Severiano et al. 2011). Coefficients were determined for the novel data sets (Fig. 5). For the reference data set, the reference and novel data sets separately, and MOTUs were results were limited to sequences with associated Lin- generated for each data set using the analysis-specific naean species names, which resulted in a data set of thresholds obtained from the larger (reference) data set 2825 sequences comprising 262 species. The average only. mean intraspecific K2P divergence value for all species in Comparing MOTUs and morphological identifications the reference data set was 1.84%. The mean maximum Overall concordance between two clustering methods intraspecific pairwise distance was 3.03%, and the mean quantifies how well they agree; however, it does not pro- distance to the nearest neighbour divergence (the smallest vide information on the nature of the agreements/dis- pairwise distance to the closest individual of a different agreements nor highlight those possible problematic species) was 14.79%. The novel data set, with 247 sequences sequences or taxa that are yielding the conflicting clus- representing 27 species, had a mean maximum intraspe- tering results. To understand how clustering results ob- cific distance of 4.35% and a mean distance to the nearest tained from the various molecular methods agreed with neighbour of 19.82%. The average mean intraspecific diver- morphological groupings, we quantified molecular clus- gence value for all species in the novel data set was 1.81%.

Published by NRC Research Press Young et al. 175

Fig. 5. Histograms displaying the intraspecific and interspecific Kimura 2-parameter (K2P) pairwise sequence divergences for the (A) reference and (B) novel data sets (see Table 1 for information on the composition). [Colour online.] For personal use only.

Proposed threshold values through elbow analysis Table 1. Adjusted Wallace coefficient (Wallace 1983) concor- Global pairwise sequence divergence (GPSD) thresh- dance values for the clustering results obtained from four olds, proposed to represent an optimum generally sepa- programs for the reference data set. rating interspecific and intraspecific divergences, ranged ABGD jMOTU Mothur UPARSE from 2.1% to 2.6% across clustering analysis methods for 2.1% (759) 2.3% (862) 2.6% (856) 2.2% (878) the reference data set. Analysis with ABGD displayed the ABGD 2.1% — 0.73 0.72 0.65 lowest percent divergence threshold (2.1%). The jMOTU Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 jMOTU 2.3% 1 — 0.99 0.89 and UPARSE analyses had threshold values of 2.3% and Mothur 2.6% 1 1 — 0.89 2.2%, respectively, while the Mothur result had the high- UPARSE 2.2% 1 1 1 — est value at 2.6%. Note: Thresholds applied to determine clusters are indicated in the row and column labels and were obtained for each analysis Concordance among MOTU clustering methods method via elbow analysis (Fig. 4). Values in parentheses indicate Molecular clustering results with 2825 specimens, rep- the total number of molecular operational taxonomic units as de- resenting 262 uniquely identified species, from the ref- termined by the corresponding analysis and threshold. Each value in the table indicates how well the clusters generated by the erence data set were similar across the four similarity- method indicated by the row label correspond to the clusters based methods using GPSD threshold values (Table 1). yielded by the method indicated in the column label. Each pair of This concordance index can take values between 0 and 1, methods is represented by two values in the table. with higher values indicating a stronger ability of one clustering approach (row label) to explain the clusters The adjusted Wallace concordance values for the generated by another method (column label). Concor- novel data set showed varied concordance across the dance values between a pair of methods can differ in eight clustering methods (Table 2). Owing to the UPARSE accordance with the direction of the comparison. function of identifying suspected chimeric sequences,

Published by NRC Research Press 176 Genome Vol. 60, 2017

Table 2. Bidirectional concordance among clustering methods for the novel data set using adjusted Wallace coefficients (Wallace 1983). Morphology (29) BINs (40) PTP-ML (51) GMYC (37) ABGD 2.1 (30) jMOTU 2.3 (40) Mothur 2.6 (39) UPARSE 2.2 (39) Morphology (29) — 0.562 0.46 0.52 0.704 0.563 0.569 0.559 BINs (40) 0.925 — 0.782 0.881 1 1 1 0.994 PTP-ML (51) 0.923 0.953 — 0.964 0.999 0.954 0.967 0.954 GMYC (37) 0.929 0.956 0.859 — 0.994 0.957 0.969 0.951 ABGD 2.1 (30) 0.938 0.81 0.664 0.742 — 0.81 0.819 0.806 jMOTU 2.3 (40) 0.926 0.999 0.783 0.881 1 — 1 0.994 Mothur 2.6 (39) 0.926 0.988 0.785 0.882 1 0.989 — 0.983 UPARSE 2.2 (39) 0.925 0.999 0.787 0.88 1 1 1 — Note: Values in parentheses indicate the total number of clusters generated for each analysis. The global pairwise sequence divergence thresholds for ABGD, jMOTU, Mothur, and UPARSE are those obtained via elbow analysis using the reference data set. Each value in the table indicates how well the clusters generated by the method indicated by the row label correspond to the clusters yielded by the method indicated in the column label. Each pair of methods is represented by two values in the table.

three sequences were removed from all concordance cal- Results comparing molecular clusters with culations, leaving 244 specimens for analysis. There were morphological groupings noticeably directional results in discriminatory power The extent of agreement between molecular clusters between molecular clustering results and morphological and morphological species containing more than one Linnaean species labels. For example, the Wallace coeffi- identified specimen varied between the reference and cient value from BINs to Linnaean species labels was novel data sets. For the novel data set, the Linnaean 0.925, meaning that two specimens in the same BIN have names were assigned during this study and are based a 92.5% chance they would also have the same Linnaean upon morphology; it is presumed this is primarily the case for the reference data set as well, but varied meth- species label. In contrast, two specimens with the same ods (including integrative consideration of molecular Linnaean species label have only a 56.2% chance of fall- data) could have been used among the records in the ing in the same BIN. This example is typical of all com- public data set. The molecular-based clusters displayed parisons between morphological Linnaean species and 40%–80% direct matches with species labels in the novel molecular clusters, where the molecular clusters are data set and 55%–66% using the reference data set (Fig. 6). more discriminant than the species designations. The Of the 29 morphologically identified species in the novel PTP coalescence-based clustering of the specimens parti-

For personal use only. data set, 20 were represented by more than one speci- tioned the data far more than all other methods, with men and so were included in the agreement analysis. Ten 51 MOTUs compared with 37–40 MOTUs for most of morphologically identified species shared a high degree the molecular methods and 29 morphological species of agreement between barcode-based clusters and groups. Moreover, PTP had the lowest concordance in morphological identifications. Two genera, and comparison with all other clustering results, including Centropages, showed varying degrees of agreement among both molecular and morphological techniques. The mor- clustering methods including mixed, split, or matched phological species were best able to explain the ABGD specimen clustering assignment. The species results, with a concordance value of 0.704, but generally herdmani and Tortanus discaudatus were predominately exhibited more modest ability to explain the molecular split into two or more molecular clusters. The remaining results as compared with all other molecular clustering identified species (Paracalanus parvus, longicornis, Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 methods (0.664–0.819 for ABGD to explain other molec- Zaus abbreviatus) predominately matched molecular clus- ular clustering results, 0.994–1 for ABGD clusters to be ters. The reference data set exhibited 56.0% (153/273) explained by all other molecular clustering methods). exact matches across all clustering methods. Of the Sets of clusters generated by BIN, jMOTU, Mothur, and remaining Linnaean names, approximately 2.2% were UPARSE had the highest levels of concordance with one predominantly matched to most molecular clustering another among all comparisons (0.988–1.00). Overall, methods, 11% were split, 2.9% were lumps, and 25.6% molecular clustering methods, including both similarity- were mixes. based and coalescence-based methods, had a strong abil- ity to explain the morphological species; in contrast, the Discussion morphological species had a generally low ability to ex- Here, we empirically estimate a COI-5P threshold plain the molecular clustering results. This suggests the for Hexanauplia, which ranges between 2.1% and 2.6% molecular data are more discriminant, and generate p-distance among the analytical methods employed. We more clusters, than the morphological data, as examined then used these thresholds to quantify the degree to for assigning specimens to Linnaean species, for the which molecular clusters agree with species units ac- novel data set. cording to current taxonomy. We discovered that for

Published by NRC Research Press Young et al. 177

Fig. 6. Agreement between morphologically grouped (Lefébure et al. 2006; Ratnasingham and Hebert 2013; specimens based on Linnaean species labels and clusters Huemer et al. 2014). Unfortunately, research to test the generated using molecular methods for the reference and performance of COI barcode data in marine invertebrate novel data sets. The sample size of species included in taxa has been limited (but see Radulovici et al. 2010; each analysis is indicated in parentheses for each data set. Bucklin et al. 2010, 2011; Blanco-Bercial et al. 2014). Fur- The numbers of molecular operational taxonomic units generated are also indicated in parentheses after each thermore, studies investigating barcode performance of- analysis method. [Colour online.] ten do so by comparing the relative performance of barcode data against morphologically identified speci- mens. The primary use of morphological identification as the gold standard for concluding “successes” and “fail- ures” of barcode data, which is common but not univer- sal across the barcoding literature, presents problems with marine taxa such as Hexanauplia, as some repro- ductively, evolutionarily, and even ecologically distinct taxa lack discernible and diagnostic morphological dif- ferences (Carrillo et al. 1974; Knowlton 1993; McManus and Katz 2009; Bucklin et al. 2011). Few studies have tested the utility of barcode data for providing consistent clustering through an internal, sequence-based measure of cluster validation (Handl et al. 2005). Establishing a threshold for hexanauplian COI-5P bar- code clusters, done here through elbow analysis, can pro- vide an appropriate single threshold for the taxon as a whole. By basing this analysis upon similarity-based clus- tering methods, a threshold can be calculated much more efficiently than by applying coalescence-based methods, which are not presently feasible for extremely large data sets. This graphical method yields clusters similar to Linnaean taxonomy, but more finely subdi- vided. This is evident in the high unidirectional adjusted Wallace coefficient values, whereas Linnaean values are For personal use only. lower than molecular values. This increased resolution by molecular clustering still allows for MOTUs to be linked back to current taxonomy, which is especially important for conservation studies and when screening for species presence/absence as part of invasive species monitoring efforts. Our molecular clustering approaches are partitioning the data into clusters smaller than current morphological species descriptions; for some study aims, a less stringent GPSD threshold may be needed if delimita- approximately 60% of Linnaean species labels, molecular tion into current morphological species groups is desired as Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 species delineation methods assigned specimens to the opposed to expected diversity of evolutionarily distinct spe- same groupings as morphological identifications. We cies based on molecular data. suggest that concordance of independent data sets can The similarity-based molecular methods yielded simi- provide greater support for species boundaries, with lar results to GMYC, in which a threshold is sought that more confidence that the groupings arising from these explicitly divides nodes into those corresponding with different character sets reflect evolutionarily indepen- Yule (interspecies) vs. coalescent (intraspecies) evolution- dent species (see Mayden 1997 for treatment of species ary processes. Therefore, both of these categories of concepts). methods may yield groupings that are closer to evolu- Support for rapid species delimitation for Hexanauplia tionary species, a species concept increasingly favoured using global thresholds by several authors (e.g., Mayden 1997; de Queiroz 2007), Although the use of GPSD thresholds has been criti- than they are to morphological species. While barcode- cized (Will et al. 2005; Ebach and Holdrege 2005), taxon- based MOTUs may represent evolutionarily distinct but specific studies have established that thresholds may in some cases morphologically similar species, further successfully delimit specimens into clusters that are study based on additional molecular, morphological, largely concordant to established taxonomic groups and biological data is generally considered necessary

Published by NRC Research Press 178 Genome Vol. 60, 2017

before formal taxonomic revisions are supported (Collins morphological identifications would be beneficial. This and Cruickshank 2013). Researchers may also disagree could be accomplished by using the remaining batch about the biological meaning of divergent allopatric mi- vouchers as the basis for multiple sets of independent tochondrial lineages within single Linnaean species, de- identifications by different investigators, followed by pending upon the preferred species concept. While a DNA barcoding of these specimens. demonstration of reproductive isolation in sympatry Current suggestions in the literature indicate the use may be required to meet species status under the biolog- of a Maxillopoda threshold ranging from 2% to 3% pair- ical species concept, separately evolving, genetically wise divergence (Radulovici et al. 2010; Blanco-Bercial divergent populations that are currently on different et al. 2014). Here, the total range of determined thresh- evolutionary trajectories might be recognized as species olds for both data sets across similarity-based methods under an evolutionary species concept (Mayden 1997). was fairly small, between 2.1% and 2.6% GPSD (p-distance) Nevertheless, our results support the use of a single mo- depending upon the method, and consistent with previ- lecular marker for rapid species delineation and for in- ous reports. This range, which was obtained using the dicating likely cryptic species for further exploration. same data set, shows how the choice of clustering anal- We sampled up to 34 individuals for those Linnaean spe- ysis can impact the resulting clusters. Interestingly, the cies that were split into two or more MOTU here. Because BIN 2.2% seed threshold—which was calibrated against these exhibited up to 10% sequence divergence, we sug- morphological species using select groups of taxa: bees, gest that further study is warranted, as some of these butterflies and moths, fish, and birds (Ratnasingham and may meet the criterion for species status under biologi- Hebert 2013)—is within the range of global thresholds cal or evolutionary species concepts. calculated here. Results presented here show that GPSD We have presented results using limited specimen col- thresholds are relatively consistent across the four tested lections (see taxonomic breakdown table in the supple- similarity-based analyses and that the following GPSD 1 mentary data, File S1 ) of Hexanauplia, and there could thresholds should be used for copepods and thecostra- be differing levels of molecular variation for subgroups cans: ABGD = 2.05%, jMOTU = 2.35%, Mothur = 2.6%, and within this focal taxon. However, this was not tested UPARSE = 2.2%. here, as we were interested in a possible GPSD threshold that would partition the data into species-like groups Similarity-based vs. coalescent-based methods: across a large taxonomic data set. Although there may be performance and feasibility some taxonomic bias present in our data sets, there is a Similarity-based delimitation methods have certain large phylogenetic breadth in the reference data set, par- advantages over coalescence-based methods: speed, sim- ticularly for and . Given the diverse plicity to implement, and ability to accommodate large For personal use only. data set, we expect that the global threshold will also data sets. Among the taxa investigated here, the work well across other poorly sampled groups in our similarity-based delimitation methods (ABGD, jMOTU, data sets, such as and , Mothur, UPARSE, and BINs) displayed a higher concor- although this supposition would benefit from directed dance compared with morphology than did coalescence- testing in future studies. Having evidence to support a based methods (PTP and GMYC). Specifically, BINs single threshold for larger taxonomic breadth is impor- exhibited more direct matches to Linnaean names than tant, as this is desirable for analyses based upon high- PTP and GMYC results. This slightly better performance throughput sequencing of mixed-species samples. using the BIN method than the other clustering meth- It is also important to note that there could be noise in ods, when compared with current taxonomy, has also our results, potentially due to misidentified specimens. been reported in fish, birds, and two moth groups Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 It is likely that some misidentifications are present in (Ratnasingham and Hebert 2013). our data sets. This is especially true of the reference data Ratnasingham and Hebert (2013) reported an overall set obtained from the publicly accessible BOLD database. higher percentage of exact cluster matches between BINs It is unlikely that such misidentifications would greatly and Linnaean species than we report (Ratnasingham and influence the outcome of our elbow analysis, as this anal- Hebert: 83%–97%; here: 40%–80%). This difference most ysis relied only upon -level identifications. However, likely indicates a lower correspondence in Hexanauplia the presence of misidentifications can more significantly between named species and evolutionary species—those impact our concordance and agreement analyses. This currently on separate evolutionary trajectories as indi- issue may explain why we observed a higher proportion cated by genetic separation, whether allopatric or not. of “lumps” in the reference data set compared with our This could also be due to the presence of more readily novel data set. Misidentifications present in the novel discernible diagnostic morphological characters in those data set would again add error to our agreement analysis taxonomic groups or a higher proportion of species pairs comparing molecular clustering results with morphology- in Hexanauplia that are distinguishable only by chemical, based identifications. Although not completed here ow- ecological, and (or) behavioural traits (e.g., see Knowlton ing to limited resources, verifying the accuracy of our 1993).

Published by NRC Research Press Young et al. 179

There were two pairs of taxa exhibiting mixing in can differ according to the preferred primary species our novel data set ( with A. longiremis, concept and the selected character system (Mayden 1997; Centropages abdominalis with C. hamatus). Although DNA bar- De Queiroz 2007). By quantifying concordance, we can codes have not been previously reported for A. hudsonica examine and compare the signals for various delinea- and A. longiremis, prior research using morphological tions emerging from different character types and and molecular evidence has shown close relationships analysis methods. This bidirectional concordance assess- among other Acartia species and high barcode variability ment, not previously used in the barcoding literature, within single Acartia species (Blanco-Bercial et al. 2014; da provides more information than simply reporting fail- Costa et al. 2011). As with Acartia, the genus Centropages ures when molecular clusters do not agree with morpho- has also been noted as having discordant molecular clus- logical species. This extra information can support tering as compared with morphological identifications existing morphological species as evolutionary species, (Blanco-Bercial et al. 2014). In addition, the family Cen- through a new character system, or provide new biolog- tropagidae has been noted as having a plastic response to ical insights (e.g., into potential cryptic species preva- differing environments, thereby making morphological lence) in cases of discordance. identifications more difficult (Blanco-Bercial et al. 2011). Adjusted Wallace concordance values indicated that Although beyond the scope of this work, we suggest that the molecular clusterings for the novel data set were future research is warranted into the specific nature of more discriminant than current Linnaean species across these “mixing” results and the reasons for the discor- all clustering analyses. This is not surprising, as the low dance between molecular clusters and morphological overall knowledge of the total biodiversity in Copepoda species in these genera. has been noted in the literature (Bucklin et al. 2011; A lower proportion of direct matches between mor- Blanco-Bercial et al. 2014). Moreover, although morpho- phology and barcode-based clusters in copepods and the- logical identifications for the novel data set were con- costracans, compared with several better-studied taxa, ducted by trained experts, some of the observed cases of may also be exacerbated by evolutionary processes that discordance may be due to incorrect identification of are largely unique to the marine realm, such as the accu- specimens, owing to the difficult taxonomy and because mulation of high levels of intraspecific diversity in ex- specimens are often damaged during collection or pres- tremely large populations with large geographic ranges. ervation and juveniles have undeveloped diagnostic Because of this effect, combined with some cases of true morphological characters. With the novel data set, there recent speciation, Meyer and Paulay (2005) predicted was variation in clustering outcomes between coalescence- greater overlap between intraspecific and interspecific based and similarity-based analyses. Linnaean species divergences in marine environments as compared with better explain the similarity-based molecular clusters For personal use only. terrestrial environments. Further investigation into this and are less concordant with the coalescence-based clus- potential difference in patterns of sequence variation ters. These results reflect some variation in the MOTUs between marine and terrestrial systems is warranted and generated using coalescence- versus similarity-based should include a broad variety of marine taxa. The opti- analyses, with the latter generating groupings that can mal method for delimiting marine species may vary de- be somewhat better predicted by current species identi- pending on the scientific questions and the species fications. However, with the exception of PTP and ABGD, concept. While in some instances the method of delimi- the differences among all molecular methods in their tation may be important, in other cases there may be concordance to current taxonomy were modest. Thus, little variation in the species counts across methods. This the molecular data appear to be revealing biological vari- limited variation can be seen in our results, where ation that was previously unrecognized in the current

Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 GMYC, BIN, jMOTU, Mothur, and UPARSE had very close taxonomic hierarchy, which may be reflective of evolu- concordance values, varying between approximately tionary species diversity. 0.88 and 1.0. This finding suggests that more rapid Consistency among molecular delimitation methods similarity-based methods are expected to yield group- Concordance values provide an overall description of ings that largely correspond to those obtained with the the consistency of the clustering results among methods; GMYC method, which is often favoured for its explicit however, this metric tells us little about the specifics of evolutionary model. Consideration of the study question how individuals are partitioned into clusters. If we first is also important; for example, a more discriminant anal- consider the novel data set, when the clustering method ysis option or threshold may be preferable for detecting results in numerous MOTUs there are also fewer exact invasive species or endangered species in a given habitat. matches between molecular groupings and Linnaean Quantifying concordance as an alternative to concluding species. Although the variation in the number of clusters failure/success of barcoding did appear to influence the proportion of matches to Assessing “true” species boundaries—especially in a current species, the number of matches was relatively geographically widespread and taxonomically diverse similar for all analyses, with 60%–70% exact matches. group such as Hexanauplia—is difficult, and boundaries ABGD, with the fewest total clusters, had the highest

Published by NRC Research Press 180 Genome Vol. 60, 2017

number of exact matches among all methods for the This may include primer design and investigation of the novel data set, with no splits. It is important to note that importance of specimen fixation in cold conditions im- with a larger gap parameter setting than used here (such mediately following field collection (Prosser et al. 2013). as the default) and recursive partitioning, the ABGD pro- Finally, the adoption of internal methods for clustering gram performs an exploration of the variation in the verification, such as the analyses presented here, is en- data to create molecular groupings. Thus, we found de- couraged in DNA barcode studies to enable rapid biodi- fault parameters had a tendency to lump the data into versity study and exploration of unknown faunas. larger groups (results not shown). This may have been because our marine focal taxon exhibits more continuity Acknowledgements This research was funded by the Natural Sciences in the range of genetic divergences than many taxa pre- and Engineering Research Council of Canada (NSERC) viously tested. Instead, we used a small gap parameter to through the Second Canadian Aquatic Invasive Species force a threshold upon ABGD and obtain a specific global Network (CAISN II) grant to Hugh MacIsaac and col- threshold using the elbow analysis approach. Therefore, leagues and by an NSERC Discovery Grant to Sarah ABGD may behave differently from here in future appli- Adamowicz. The vouchered maxillopod barcode collection cations, depending upon the settings selected. was generated with partial support from Canada’s The remaining analyses (BIN, GMYC, jMOTU, Mothur, Genomics Research and Development Initiative (Quaran- and UPARSE) had very consistent agreement when con- tine and Invasive Species Project), including DNA se- sidering both total matches and splits together (Fig. 6). quencing by the Applied Genomics facility of the These consistent results indicate the clustering methods, National Research Council of Canada. We also thank the whether resulting in a match or a split, can accurately Ontario Ministry of Research and Innovation for funding place specimens into Linnaean species using these meth- the ongoing development of BOLD, used for data man- ods in approximately 70%–80% of cases. If we were to agement and analysis for this project. We are grateful to remove taxa showing highly unstable correspondence the CAISN collection teams for obtaining the specimens between molecular groupings and current species, used in this study and to Alex Chen, Tooba Khan, which are likely in need of taxonomic revision (Acartia Jessmyn Niergarth, Rebeca Orue Herrera, Michelle Pyle, hudsonica, A. longiremis, Centropages abdominalis, C. hamatus, and Jenna Snelgrove for assistance with specimen sort- Eurytemora herdmani, and ), then the per- ing and photography. We would like to also acknowl- centage of exact matches to current species for our meth- edge expert identifiers Carol Cooper, Mary Greenlaw, ods would increase to approximately 78% and 91%, Rebecca Milne, and Lou Van Guelpen. We would also like similar to barcode agreements reported in other taxo- to recognize the Advanced Analysis Centre at the Univer- nomic groups (Ratnasingham and Hebert 2013; Blanco-Bercial For personal use only. sity of Guelph and the Canadian Centre of DNA Barcod- et al. 2014). ing for sequencing support. Thank you to Niels Van Conclusion Steenkiste, Magalie Castelin, Geoff Lowe, Elizabeth We conclude that when applying GPSD thresholds, the Boulding, Fatima Mitterboeck, and Sujeevan Ratnasing- method selected for MOTU generation is important; our ham for helpful discussions and to two anonymous re- results indicate a need for careful selection of both the viewers and the Associate Editor Ian Hogg for thoughtful method of generating MOTU clusters and the threshold comments. applied. Our study has also found a larger number of References MOTUs generated as compared with morphological spe- Blanco-Bercial, L., Bradford-Grieve, J., and Bucklin, A. 2011. Mo- cies labels. These data indicate either poor taxonomic lecular phylogeny of the Calanoida (Crustacea: Copepoda). Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 identification in the databases, the presence of cryptic Mol. Phylogenet. Evol. 59(1): 103–113. doi:10.1016/j.ympev.2011. species, and (or) evidence of substantial intraspecific di- 01.008. PMID:21281724. Blanco-Bercial, L., Cornils, A., Copley, N., and Bucklin, A. 2014. versity at the COI-5P gene region. Continued research is DNA barcoding of marine copepods: assessment of analytical needed to quantify to what extent these MOTUs repre- approaches to species identification. PLoS Curr. 6: 1–21. doi: sent real biological entities under an explicit species 10.1371/currents.tol.cdf8b74881f87e3b01d56b43791626d2. concept of interest. Future work may also include ampli- Blaxter, M.L. 2004. The promise of a DNA taxonomy. Philos. fication of additional molecular markers (particularly Trans. R. Soc. B Biol. Sci. 359(1444): 669–679. doi:10.1098/rstb. 2003.1447. from the nuclear genome) to verify the taxonomic place- Bucklin, A., Hopcroft, R.R., Kosobokova, K.N., Nigro, L.M., ment of specimens and lend support to the identified Ortman, B.D., Jennings, R.M., and Sweetman, C.J. 2010. DNA GPSD thresholds for the COI-5P molecular region. This barcoding of Arctic Ocean holozooplankton for species iden- work may be especially useful for specimens identified tification and recognition. Deep Sea Res. Part II Top. Stud. here as problematic taxa. In addition to further research Oceanogr. 57(1–2): 40–48. doi:10.1016/j.dsr2.2009.08.005. Bucklin, A., Steinke, D., and Blanco-Bercial, L. 2011. DNA barcod- on species boundaries in taxonomically problematic ing of marine metazoa. Annu. Rev. Mar. Sci. 3(1): 471–508. groups, additional protocol development is needed for doi:10.1146/annurev-marine-120308-080950. groups with low amplification and sequencing success. Carrillo, B.G., Miller, C.B., and Wiebe, P.H. 1974. Failure of inter-

Published by NRC Research Press Young et al. 181

breeding between Atlantic and Pacific populations of the Hickerson, M.J., Meyer, C.P., and Moritz, C. 2006. DNA barcod- marine calanoid Giesbrecht. Limnol. ing will often fail to discover new animal species over broad Oceanogr. 19(3): 452–458. doi:10.4319/lo.1974.19.3.0452. parameter space. Syst. Biol. 55(5): 729–739. doi:10.1080/ Collins, R.A., and Cruickshank, R.H. 2013. The seven deadly sins 10635150600969898. PMID:17060195. of DNA barcoding. Mol. Ecol. Resour. 13(6): 969–975. doi:10. Huemer, P., Mutanen, M., Sefc, K.M., and Hebert, P.D.N. 2014. 1111/1755-0998.12046. PMID:23280099. Testing DNA barcode performance in 1000 species of Euro- Collins, R.A., and Cruickshank, R.H. 2014. Known knowns, pean Lepidoptera: large geographic distances have small ge- known unknowns, unknown unknowns and unknown knowns netic impacts. PloS ONE, 9(12): e115774. doi:10.1371/journal. in DNA barcoding: a comment on Dowton et al. Syst. Biol. 63(6): pone.0115774. PMID:25541991. 1005–1009. doi:10.1093/sysbio/syu060. PMID:25116917. Ivanova, N.V., deWaard, J., and Hebert, P.D.N. 2006. An inexpen- Cristescu, M.E. 2014. From barcoding single individuals to me- sive, automation—friendly protocol for recovering high- tabarcoding biological communities: towards an integrative quality DNA. Mol. Ecol. Notes, 6(4): 998–1002. doi:10.1111/j. approach to the study of global biodiversity. Trends Ecol. 1471-8286.2006.01428.x. Evol. 29(10): 566–571. doi:10.1016/j.tree.2014.08.001. PMID: Johnson, K.B. 1996. A guide to marine coastal plankton and 25175416. marine invertebrate larvae. Kendall Hunt. da Costa, K.G., Vallinoto, M., and da Costa, R.M. 2011. Molecular Johnson, W.S., and Allen, D.M. 2012. Zooplankton of the Atlan- identification of a new cryptic species of (Cope- tic and Gulf coasts: a guide to their identification and ecol- poda, ) from the Northern coast of Brazil, based on ogy. The Johns Hopkins University Press. mitochondrial COI gene sequences. J. Coast. Res. S1(64): 359– Jones, M., Ghoorah, A., and Blaxter, M. 2011. jMOTU and Taxon- 363. erator: turning DNA Barcode sequences into annotated oper- Darriba, D., Taboada, G.L., Doallo, R., and Posada, D. 2012. ational taxonomic units. PloS ONE, 6(4): e19259. doi:10.1371/ jModelTest 2: more models, new heuristics and parallel com- journal.pone.0019259. PMID:21541350. puting. Nat. Methods, 9(8): 772. doi:10.1038/nmeth.2109. Kathman, R.D. 1986. Identification manual to the Mysidacea PMID:22847109. and Euphausiacea of the northeast Pacific. Vol. 93. Fisheries da Silva, J.M., Creer, S., Dos Santos, A., Costa, A.C., Cunha, M.R., and Oceans, Information and Publications Branch. Costa, F.O., and Carvalho, G.R. 2011. Systematic and evolu- Katoh, K., and Standley, D.M. 2013. MAFFT multiple sequence tionary insights derived from mtDNA COI barcode diversity alignment software version 7: improvements in perfor- in the (Crustacea: ). PLoS ONE, 6(5): mance and usability. Mol. Biol. Evol. 30(4): 772–780. doi:10. e19449. doi:10.1371/journal.pone.0019449. PMID:21589909. 1093/molbev/mst010. PMID:23329690. De Queiroz, K. 2007. Species concepts and species delimitation. Kimura, M. 1980. A simple method for estimating evolutionary Syst. Biol. 56(6): 879–886. doi:10.1080/10635150701701083. rates of base substitutions through comparative studies of PMID:18027281. nucleotide sequences. J. Mol. Evol. 16(2): 111–120. doi:10.1007/ Ebach, M.C., and Holdrege, C. 2005. DNA barcoding is no sub- BF01731581. PMID:7463489. stitute for taxonomy. Nature, 434(7034): 697–697. doi:10.1038/ Knowlton, N. 1993. Sibling species in the sea. Annu. Rev. Ecol. 434697b. PMID:15815602. Syst. 24: 189–216. doi:10.1146/annurev.es.24.110193.001201. Edgar, R.C. 2013. UPARSE: highly accurate OTU sequences from Lefébure, T., Douady, C.J., Gouy, M., and Gibert, J. 2006. Rela- microbial amplicon reads. Nat. Methods, 10(10): 996–998. tionship between morphological taxonomy and molecular

For personal use only. doi:10.1038/nmeth.2604. PMID:23955772. divergence within Crustacea: proposal of a molecular thresh- Fujisawa, T., and Barraclough, T.G. 2013. Delimiting species us- old to help species delimitation. Mol. Phylogenet. Evol. 40(2): ing single-locus data and the Generalized Mixed Yule Coales- 435–447. doi:10.1016/j.ympev.2006.03.014. PMID:16647275. cent (GMYC) approach: a revised method and evaluation on Mayden, R.L. 1997. A hierarchy of species concepts: the denoue- simulated data sets. Syst. Biol. 62: 707–724. doi:10.1093/sysbio/ ment in the saga of the species problem. In Species: the units syt033. PMID:23681854. of biodiversity. Edited by M.F. Claridge, H.A. Dawah, and Gardner, G.A., and Szabo, I. 1982. British Columbia pelagic ma- M.R. Wilson. Chapman and Hall, London. pp. 381–324. rine Copepoda: an identification manual and annotated bib- McManus, G.B., and Katz, L.A. 2009. Molecular and morphological liography. Vol. 62. Department of Fisheries and Oceans. methods for identifying plankton: what makes a successful Canadian Special Publication of Fisheries and Aquatic Sci- marriage? J. Plankton Res. 31(10): 1119–1129. doi:10.1093/plankt/ ences 62. fbp061. Gerber, R.P. 2000. An identification manual to the coastal and Meyer, C.P., and Paulay, G. 2005. DNA barcoding: error rates estuarine zooplankton of the Gulf of Maine region from Pas- based on comprehensive sampling. PLoS Biol. 3(12): e422. Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17 samaquoddy Bay to Long Island Sound. Parts 1 and 2. Acadia doi:10.1371/journal.pbio.0030422. PMID:16336051. Productions, 80 and 98. Newman, W.A. 1992. Origin of Maxillopoda. Acta Zoologica, Guindon, S., and Gascuel, O. 2003. A simple, fast, and accurate 73(5): 319–322. doi:10.1111/j.1463-6395.1992.tb01100.x. algorithm to estimate large phylogenies by maximum likeli- Nouvel, H. 1950. Mysidacea. Zooplankton No. 18–27. Conseil hood. Syst. Biol. 52(5): 696–704. doi:10.1080/10635150390235520. International Pour L’exploration De la Mer. PMID:14530136. Oakley, T.H., Wolfe, J.M., Lindgren, A.R., and Zaharoff, A.K. Handl, J., Knowles, J., and Kell, D.B. 2005. Computational cluster 2013. Phylotranscriptomics to bring the understudied into validation in post-genomic data analysis. Bioinformatics, the fold: monophyletic ostracoda, fossil placement, and pan- 21(15): 3201–3212. doi:10.1093/bioinformatics/bti517. PMID: crustacean phylogeny. Mol. Biol. Evol. 30(1): 215–233. doi:10. 15914541. 1093/molbev/mss216. PMID:22977117. Hebert, P.D.N., Cywinska, A., Ball, S.L., and deWaard, J.R. 2003a. Packer, L., Gibbs, J., Sheffield, C., and Hanner, R. 2009. DNA Biological identifications through DNA barcodes. Proc. R. barcoding and the mediocrity of morphology. Mol. Ecol. Re- Soc. B Biol. Sci. 270(1512): 313–321. doi:10.1098/rspb.2002. sour. 9(s1): 42–50. doi:10.1111/j.1755-0998.2009.02631.x. PMID: 2218. 21564963. Hebert, P.D.N., Ratnasingham, S., and deWaard, J.R. 2003b. Bar- Prosser, S., Martínez-Arce, A., and Elías-Gutiérrez, M. 2013. A coding animal life: cytochrome c oxidase subunit 1 diver- new set of primers for COI amplification from freshwater gences among closely related species. Proc. R. Soc. B Biol. Sci. microcrustaceans. Mol. Ecol. Resour. 13(6): 1151–1155. doi:10. 270(S1): S96–S99. doi:10.1098/rsbl.2003.0025. 1111/1755-0998.12132. PMID:23795700.

Published by NRC Research Press 182 Genome Vol. 60, 2017

Puillandre, N., Lambert, A., Brouillet, S., and Achaz, G. 2012. tween typing methods. J. Clin. Microbiol. 49(11): 3997–4000. ABGD, Automatic Barcode Gap Discovery for primary species doi:10.1128/JCM.00624-11. PMID:21918028. delimitation. Mol. Ecol. 21(8): 1864–1877. doi:10.1111/j.1365- Shen, Y.Y., Chen, X., and Murphy, R.W. 2013. Assessing DNA 294X.2011.05239.x. PMID:21883587. barcoding as a tool for species identification and data quality Radulovici, A.E., Archambault, P., and Dufresne, F. 2010. DNA control. PLoS ONE, 8(2): e57125. doi:10.1371/journal.pone. barcodes for marine biodiversity: moving fast forward? 0057125. PMID:23431400. Diversity, 2(4): 450–472. doi:10.3390/d2040450. Singh, J.S. 2002. The biodiversity crisis: a multifaceted review. Ratnasingham, S., and Hebert, P.D.N. 2007. BOLD: The Barcode Curr. Sci. 82(6): 638–647. Available from http:// www.currentscience.ac.in/Downloads/article_id_082_ of Life Data System. Mol. Ecol. Notes, 7(3): 355–364. doi:10. 06_0638_0647_0.pdf. 1111/j.1471-8286.2007.01678.x. PMID:18784790. Talavera, G., Dinca˘, V., and Vila, R. 2013. Factors affecting spe- Ratnasingham, S., and Hebert, P.D.N. 2013. A DNA-based regis- cies delimitations with the GMYC model: insights from a try for all animal species: The Barcode Index Number (BIN) butterfly survey. Methods Ecol. Evol. 4(12): 1101–1110. doi:10. System. PloS ONE, 8(7): e66213. doi:10.1371/journal.pone. 1111/2041-210X.12107. 0066213. PMID:23861743. Tamura, K., Stecher, G., Peterson, D., Filipski, A., and Kumar, S. Regier, J.C., Shultz, J.W., and Kambic, R.E. 2005. Pancrustacean 2013. MEGA6: Molecular Evolutionary Genetics Analysis, phylogeny: hexapods are terrestrial and maxillopods version 6.0. Mol. Biol. Evol. 30(12): 2725–2729. doi:10.1093/ are not monophyletic. Proc. R. Soc. London, Ser. B, 272(1561): 395– molbev/mst197. PMID:24132122. 401. doi:10.1098/rspb.2004.2917. PMID:15734694. Tang, C.Q., Humphreys, A.M., Fontaneto, D., and Regier, J.C., Shultz, J.W., Zwick, A., Hussey, A., Ball, B., Barraclough, T.G. 2014. Effects of phylogenetic reconstruc- Wetzer, R., et al 2010. Arthropod relationships revealed by tion method on the robustness of species delimitation using phylogenomic analysis of nuclear protein-coding sequences. single-locus data. Methods Ecol. Evol. 5(10): 1086–1094. doi: Nature, 463(7284): 1079–1083. doi:10.1038/nature08742. PMID: 10.1111/2041-210X.12246. PMID:25821577. 20147900. Todd, C.D., Laverack, M.S., and Boxshall, G. 1996. Coastal ma- Roff, J.C., Davidson, K.G., Pohle, G., and Dadswell, M.J. 1984. A rine zooplankton: a practical manual for students. Cam- bridge University Press. guide to the marine flora and fauna of the Bay of Fundy and Wallace, D.L. 1983. A method for comparing two hierarchical Scotian Shelf: larval Decapoda: Brachyura. Canadian Technical clustering’s: comment. J. Am. Stat. Assoc. 78(383): 569–576. Report of Fisheries and Aquatic Sciences, 1322: iv + 57. doi:10.2307/2288118. Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Will, K.W., Mishler, B.D., and Wheeler, Q.D. 2005. The perils of Hollister, E.B., et al. 2009. Introducing mothur: open-source, DNA barcoding and the need for integrative taxonomy. Syst. platform-independent, community-supported software for Biol. 54(5): 844–851. doi:10.1080/10635150500354878. PMID: describing and comparing microbial communities. Appl. 16243769. Environ. Microbiol. 75(23): 7537–7541. doi:10.1128/AEM.01541- Zhang, J., Kapli, P., Pavlidis, P., and Stamatakis, A. 2013. A gen- 09. PMID:19801464. eral species delimitation method with applications to phylo- Severiano, A., Pinto, F.R., Ramirez, M., and Carriço, J.A. 2011. genetic placements. Bioinformatics, 29(22): 2869–2876. doi: Adjusted Wallace coefficient as a measure of congruence be- 10.1093/bioinformatics/btt499. PMID:23990417. For personal use only. Genome Downloaded from www.nrcresearchpress.com by 45.72.205.22 on 02/15/17

Published by NRC Research Press