<<

Received: 3 November 2017 | Revised: 15 February 2018 | Accepted: 24 February 2018 DOI: 10.1111/1755-0998.12781

RESOURCE ARTICLE

DINOREF: A curated () reference database for the 18S rRNA gene

Solenn Mordret1 | Roberta Piredda1 | Daniel Vaulot2 | Marina Montresor1 | Wiebe H. C. F. Kooistra1 | Diana Sarno1

1Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, Italy Abstract 2Sorbonne Universite, CNRS, UMR are a heterogeneous group of present in all aquatic ecosys- Adaptation et Diversite en Milieu Marin, tems where they occupy various ecological niches. They play a major role as primary Station Biologique, Roscoff, France producers, but many species are mixotrophic or heterotrophic. Environmental Correspondence metabarcoding based on high-throughput sequencing is increasingly applied to Diana Sarno, Integrative Marine Ecology, Stazione Zoologica Anton Dohrn, Naples, assess diversity and abundance of planktonic , and reference databases Italy. are definitely needed to taxonomically assign the huge number of sequences. We Email: [email protected] provide an updated 18S rRNA reference database of dinoflagellates: DINOREF. Funding information Sequences were downloaded from GENBANK and filtered based on stringent quality Italian Ministry of Education, University and Research criteria. All sequences were taxonomically curated, classified taking into account classical morphotaxonomic studies and molecular phylogenies, and linked to a series

of metadata. DINOREF includes 1,671 sequences representing 149 genera and 422 species. The taxonomic assignation of 468 sequences was revised. The largest num- ber of sequences belongs to and that include toxic and

symbiotic species. DINOREF provides an opportunity to test the level of taxonomic resolution of different 18S barcode markers based on a large number of sequences and species. As an example, when only the V4 region is considered, 374 of the 422

species included in DINOREF can still be unambiguously identified. Clustering the V4 sequences at 98% similarity, a threshold that is commonly applied in metabarcoding studies, resulted in a considerable underestimation of species diversity.

KEYWORDS 18S rRNA gene, dinoflagellates, diversity, phylogeny, sequence database, V4 region

1 | INTRODUCTION precise ways to enumerate them. DNA-based detection and enumera- tion methodologies, such as high-throughput sequencing (HTS) Assessing global biodiversity constitutes an important and urgent task metabarcoding of environmental samples, now offer opportunities for in the face of the currently unprecedented rate of climate change, but assessing protistan diversity rapidly and precisely (Amaral-Zettler, this task is fraught with major challenges. A large part of this biodiver- McCliment, Ducklow, & Huse, 2009; Massana et al., 2015; Piredda sity is composed of protists, and it is especially in these unicellular et al., 2017; Stoeck et al., 2009; de Vargas et al., 2015). Yet, to trans- that taxonomists are confronted by the fact that cell mor- late these HTS data into species occurrences requires a comprehen- phology does not always allow discrimination of species, especially in sive reference database. Curated databases of reference sequences small and relatively featureless taxa. When compared with morpho- linked to taxonomically identified specimens constitute important logical traits, sequence data usually provide more precise and appar- research infrastructures for the advancement of our knowledge of the ently more objective ways to delineate species and therefore more protistan diversity (Decelle et al., 2015; Morard et al., 2015).

| Mol Ecol Resour. 2018;1–14. wileyonlinelibrary.com/journal/men © 2018 John Wiley & Sons Ltd 1 2 | MORDRET ET AL.

Dinoflagellates form a large of protists distributed in vari- dinoflagellate classification at the suprageneric level, we also present ous aquatic ecosystems (Hackett, Anderson, Erdner, & Bhattacharya, an alternative organization of the sequence data, namely one based 2004; Not et al., 2012). The lineage includes autotrophs, hetero- on results of 18S phylogenetic inferences integrated with the latest trophs and mixotrophs; most species are free-living, but parasites literature sources. Finally, we illustrated the degree to which the 18S and symbionts are abundant as well, adding complexity to aquatic V4 region, which is broadly used in metabarcoding surveys and has ecological networks (Gomez, 2012b; Jephcott et al., 2016; Stoecker, been proposed as the universal eukaryotic prebarcode (Hu et al., 1999; Weisse et al., 2016). Dinoflagellates are of socio-economic rel- 2015; Pawlowski et al., 2012), is able to discriminate species. evance because several species produce secondary metabolites, some of which are highly toxic to humans and marine organisms 2 | MATERIALS AND METHODS (Anderson, Cembella, & Hallegraeff, 2012; Berdalet et al., 2016). Therefore, dinoflagellates are monitored on a regular basis in many 2.1 | Sequence retrieval coastal regions. The assessment of dinoflagellate diversity is still lar- gely based on light microscopy observations (LM). Yet, this approach Dinoflagellate 18S rRNA entries available on the 29th of August 2016 has its limitations: it requires a high level of taxonomic expertise and were downloaded from NCBI GENBANK (https://www.ncbi.nlm.nih.gov/) is time-consuming, and many species, including minute, naked and using the following text query: Dinophyceae[] AND (small parasitic dinoflagellates, are virtually impossible to identify. In some subunit ribosomal[titl] OR 18S[titl] OR SSU[titl]). The features associ- cases, toxic species are difficult to distinguish from nontoxic relatives ated with the GENBANK entries were extracted. Sequences of early in LM (de Salas, Laza-Martınez, & Hallegraeff, 2008; Montresor, branching dinoflagellates, not classified as Dinophyceae (sensu Orr John, Beran, & Medlin, 2004). et al. (2012)), were recovered genus by genus and were excluded from

Despite its rather low variability (Murray, Flø Jørgensen, Ho, Patter- DINOREF, but provided as supplementary material (see below). son, & Jermiin, 2005), the 18S ribosomal subunit (18S) is potentially a good DNA barcode region for dinoflagellates because of the large number of 2.2 | Sequence verification sequences deposited in public repositories as a result of its popular use in taxonomic and phylogenetic studies (Gomez, 2014; John et al., 2014), as Sequences labelled as “environmental samples” in GENBANK (i.e., from well as in single-cell PCR amplification studies (e.g., Gomez, Moreira, & metabarcoding, clone libraries, uncultured organisms—labelled as

Lopez-Garc ıa, 2010; Hoppenrath, Murray, Sparmann, & Leander, 2012; Ki, ENV in the GENBANK locus field) and those classified above the genus

Jang, & Han, 2005; Ruiz Sebastian & O’Ryan, 2001). The variable V4 or V9 level were removed. Retained sequences were aligned with MAFFT regions in the 18S rRNA-encoding region are the most commonly used version 7 (default parameters, Katoh & Standley, 2013) and manually nucleotide markers in metabarcoding studies on environmental samples checked. Sequences not meeting the following quality criteria were (e.g., Le Bescot et al., 2016; Onda et al., 2017). removed (Figure 1, step 1): (i) sequences deposited as 18S but not To improve taxonomic annotation of environmental sequences representing 18S at all or only the very 30-end of it; and (ii) 18S generated by HTS approaches, curated reference barcode databases sequences not including the full V4 region. Of the remaining 2 are needed. PR was the first database that became available for pro- sequences, the regions outside the 18S rRNA gene were removed. tists (Guillou et al., 2013), which included 136,866 nuclear-encoded After a second filtration step (Figure 1, step 2), only those sequences sequences that were taxonomically assigned to eight taxonomic that fulfilled all of the following criteria were retained: (i) sequence fields. Few other specialized databases have been created to provide length ≥450 bp; (ii) <50 ambiguous nucleotides; (iii) <20 consecutive taxonomically validated sequences with updated nomenclature and ambiguous bp; (iv) insertion <20 consecutive bp; and v. deletions contextual metadata for different groups of organisms (e.g., Decelle <100 consecutive bp. Sequences that aligned poorly or not at all et al., 2015 for 16S of photosynthetic eukaryotes; Morard et al., with any of the other sequences were blasted against GENBANK and

2015 for ). were removed if BLAST results suggested placement outside dinoflag-

The aim of this study was to provide a taxonomically curated ellates (i.e., BLAST assignation <90%). The retained set of sequences database, called DINOREF, composed of the “core dinoflagellates,” that constituted the “quality-checked database” (Figure 1). is, the species with a dinokaryon, a modified nucleus containing per- manently condensed fibrillar chromosomes, as defined by Orr, Mur- 2.3 | Taxonomic validation ray, Stuken,€ Rhodes, and Jakobsen (2012). To populate the database, we gathered all dinoflagellate 18S rRNA sequences available in The taxonomic validation was an iterative process including the

GENBANK, screened them against a set of quality criteria and verified following steps: their taxonomic assignations by means of phylogenetic positioning.

For each sequence, and nomenclature from GENBANK were 1. control on the validity of the nomenclature, based on Fensome updated, deploying a phylogenetic approach, and taking into account et al. (1993) and Gomez (2012a); ALGAEBASE (Guiry & Guiry, 2017; the most recent literature and taxonomy databases. We organized http://www.algaebase.org), CEDIT (Hoppenrath & Elbr€achter, 2015; the 18S sequences into the same suprageneric ranks as those used http://www.dinophyta.org) and information from the literature of 2 in PR (Guillou et al., 2013). Given the “dynamic status” of the specific groups (Figure 1, step 3); MORDRET ET AL. | 3

FIGURE 1 Workflow describing the steps applied to generate the curated and annotated dinoflagellate 18S rRNA Reference Database DINOREF from the sequences downloaded from NCBI

2. phylogenetic evidences based on the primary tree, the genus- A number of long sequences (≥1,700 bp), representing the diversity level trees and the taxonomic reference tree (Figure 1, step 4). within each genus, were selected to build a reduced taxonomic refer- ence tree (Figure 1, step 4). In case the only reference for a given genus Species names were validated following taxonomically accepted was represented by a shorter sequence, its phylogenetic placement was names in ALGAEBASE (Guiry & Guiry, 2017; names marked as “C”). verified building a “rough tree” based on a shorter alignment, but these Sequences included in the quality-checked database were aligned with shorter sequences were excluded from the alignment used to build the

MAFFT version 7, and a phylogenetic tree was built using FASTTREE (Price, reduced taxonomic reference tree. The data set was aligned using MAFFT

Dehal, & Arkin, 2010) as implemented in the GENEIOUS software (Kearse and the tree built with RAXML (Stamatakis, 2014). Branch support was et al., 2012). This primary tree provided information on the number, statisti- established using 100 bootstrap replicates. This reduced tree provided cal support and position of the different terminal (Figure 1, step 4). a clearer representation of dinoflagellate phylogeny and enabled checks In order to identify sequences attributed to the wrong genus in on the phylogenetic relationships for the terminal clades, allowing the

GENBANK, we aligned the sequences labelled with the same genus name removal of any remaining nondinoflagellate sequence from the quality- using MAFFT and visualized them in Seaview (Gouy et al., 2010). When checked database (Figure 1, step 5). This set of taxonomically curated possible (three or more sequences per genus), a maximum-likelihood sequences was called the “taxonomically checked database.” tree was inferred using PHYML version 3.0 (Guindon et al., 2010). Branch support was established using 100 bootstrap replicates. Gen- 2.4 | Database finalization era represented by less than three sequences were grouped with their closest phylogenetic groups after verifying their positioning in the pri- From the taxonomically checked database, a final nonredundant mary tree. Outlying sequences in the generic trees were sought in the sequence alignment (i.e., only containing unique sequences) was pro- primary tree, and if they were found inside the of another genus, duced using MAFFT and a majority-rule consensus tree (RAXML, 100 boot- they were renamed accordingly. straps) was built (Figure 1, step 6). Every sequence in the taxonomically 4 | MORDRET ET AL. checked database was labelled using a standardized eight term-ranked 3 | RESULTS taxonomy, that is, , Supergroup, Division, Class, Order, Family, 2 Genus and Species in the same way as in PR (Guillou et al., 2013). The 3.1 | The DINOREF database database includes also an additional classification of the sequences into different “Superclades” defined according to the results of phylogenetic 3.1.1 | Phase 1: Sequence verification inferences and specialized literature (Figure 1, step 7).

The standard metadata extracted from GENBANK (see Supplemen- A total of 6,175 GENBANK sequences were downloaded with the given tary Material 1 for a complete list) has been supplemented by infor- search criteria. Upon removal of sequences not assigned at a genus mation on: (i) species habitat (Guiry & Guiry, 2017; Hoppenrath, or species level, 4,199 were retained, and of these, 2,223 remained Murray, Chomerat, & Horiguchi, 2014), (ii) potential toxicity following removal of non-18S sequences as well as 18S sequences (Moestrup et al., 2009) and (iii) symbiotic or parasitic lifestyle derived in which the V4 region was incomplete or lacking. The verification from original papers (Figure 1, step 8). The taxonomically checked process resulted in a database of 1,684 aligned, good-quality database integrated with metadata constitutes “DINOREF (dinoflagel- sequences. late reference database)”. 3.1.2 | Phase 2: Taxonomic validation 2.5 | V4 analysis The primary tree inferred from the quality-checked database To compare the resolution power of the V4 barcode region with that revealed that genera constituted the best-supported taxonomic level; of the full-length 18S gene, the V4 fragments were extracted from the that is, they were usually monophyletic with high bootstrap support alignment using the V4 primers described in Piredda et al. (2017). (data not shown). Three genera were found to be represented by Sequences of the V4 region were then dereplicated and split by genus more than 150 sequences each: Alexandrium with 210 sequences, and Superclade using Mothur (Schloss et al., 2009). Each group of with 169 sequences and with 173 sequences was then re-aligned using MAFFT and checked manually. For sequences (around 36% of the sequences in the database; Table 2). each Superclade and each genus, distance matrices of pairwise differ- Some species within these three genera were represented by a large ences between sequences over the length of the V4 (p-distance) were number of different 18S sequences. For instance, 54 slightly differ- generated using the software MEGA version 7 (Kumar, Stecher, & ent sequences were attributed to Gambierdiscus scabrosus. The num- Tamura, 2016). Boxplots of distances were produced with R (R Devel- ber of species included in other genera was much lower: a first opment Core Team, 2016) using the “ggplot2” library (Wickham, group of 27 genera included each between 68 and 10 sequences 2009). V4 OTUs at 98% similarity were generated clustering the (about 40% of the total number of sequences), a second group of 55 sequences with VSEARCH algorithm with “distance-based greedy cluster- genera contained between nine and three sequences (18% of the ing” (DGC, Rognes, Flouri, Nichols, Quince, & Mahe, 2016) as imple- total), whereas the remaining 63 genera were represented by one or mented in Mothur. The V4 unique sequences and the V4 OTUs at two sequences only (6% of the total) (Figure 4). 98% similarity are provided as supplementary materials (see below). In The resulting taxonomically checked database contained 1,671 those files, we also listed the 18S sequences sharing the same V4 dinoflagellate sequences, corresponding to 1,540 unique sequences region and the V4 sequences clustering in the same OTU at 98%. and belonging to 149 genera (Table 1) and 422 species (Table 2). 2 DINOREF has been incorporated into PR version 4.9.0 which is Thirteen of the original 1,684 sequences had to be removed because available at https://doi.org/10.6084/m9.figshare.5913181. they did not belong to dinoflagellates. The assignation of 468

Supplementary data that include the DINOREF database as an Excel sequences (28% of the total database) had to be revised because the file (Supplementary Material 1), a full-length 18S sequences fasta names originally assigned to them were synonyms, invalid or the phy- text file (Supplementary Material 2) associated with three different logenetic analyses revealed that they were attributed to the wrong taxonomy files, (i) the original taxonomy of GENBANK entry (Supple- taxon. Sequence length ranged from 579 to 1,764 bp (Figure S1), mentary Material 3), (ii) the curated taxonomy (Supplementary Mate- with 1,200 of them being full or nearly full length (between 1,600 rial 4) and (iii) the curated taxonomy including the Superclade and 1,764 bp). The majority of sequences between 1,100 and classification used in this study (Supplementary Material 5), are avail- 1,400 bp originated from single-cell amplifications. able in flat file format on Figshare (https://doi.org/10.6084/m9. figshare.5568454). The format of the fasta and taxonomy text files 3.1.3 | Phase 3: Database finalization is compatible with Mothur (Schloss et al., 2009) and Qiime (Capo- raso et al., 2010) tools. Fasta and taxonomy files can be opened with The curated sequences were organized hierarchically, following the 2 a text editor. We also provide two Excel files with V4 sequences 8-level taxonomic framework used in the PR database (Columns E-L and V4 OTUs at 98% similarity (Supplementary Materials 6 and 7). in Supplementary Material 1). For the assignation of the “order” Supplementary Material 8 includes a fasta file containing all level, we followed a conservative approach accepting the following sequences of early branching dinoflagellates recovered from GENBANK six orders: Gonyaulacales, , Dinophysiales, , but not included in DINOREF. Suessiales and (“Order” in Table 1; column I in MORDRET ET AL. | 5

Supplementary Material 1). Some sequences could not be placed There was an incomplete and unbalanced 18S sequence represen- within any of these orders and were listed as Dinophyceae ordo tation of described taxa, and the number of available sequences dif- . Assignment at the family level was problematic due fered considerably among Superclades, clades and genera (Table 2). to the still unresolved phylogenetic relationship among genera (see For example, the order Suessiales included 14 genera represented by Section 4). We generally followed the classification provided by 18S sequences and 11 without reference 18S sequence, while

ALGAEBASE (Guiry & Guiry, 2017). Gonyaulacales were reasonably well covered (17 genera with sequences and three without). Overall, only 422 of the 2,342 (18%) taxonomically recognized dinoflagellate species were found to have a 3.2 | Superclades: an attempt to depict the current reference 18S sequence (Guiry & Guiry, 2017) (Table 2). When consid- organization of dinoflagellates ering genera, only 149 of the 232 genera had a reference sequence.

A majority-rule consensus tree built with the 1,540 unique The DINOREF database includes also a broad set of metadata sequences provided a phylogenetic representation of the 18S retrieved from GENBANK and/or from the taxonomic literature (Supple- sequences contained in the DINOREF database. In this tree, a large mentary Material 1). Most sequences (89%) originated from marine number of clades with ≥50% bootstrap support collapsed onto a ecosystems, 8% from freshwater and 2% from brackish, estuarine or polytomy (Figure 2). The well-supported clades of the majority-rule continental saline habitats. Specific lifestyle information (i.e., sym- consensus tree (Table 1; Figure 2) were grouped by us into higher biont or parasite and their host) is available for 302 sequences taxonomic ranks (“Superclade” in Table 1; Column D in Supplemen- (18%). In addition, 24% of the sequences were annotated as benthic, tary Material 1) reflecting the current taxonomic organization based and 34% as belonging to potentially toxic species. on previously published phylogenies and multigene phylogenies (see selected references in Table 1). 3.3 | The barcoding V4 region The largest Superclade included species of the order Gonyaula- cales (Superclade 1) and was recovered from the tree in five well- If only the V4 region is considered, the number of unique DINOREF supported clades. Within it, clade 1A (Table 1; Figure 2) encom- sequences shrunk from 1,540 to 946 (Supplementary Material 6). passed a large number of sequences of the genera Alexandrium and This decrease in sequence number was mostly due to slightly differ- Gambierdiscus, while the other contained either a single genus (clade ent intraspecific 18S sequences sharing identical V4 ribotypes, that 1D, ) or different but phylogenetically closely related gen- is, haplotypes, unique sequences of ribosomal genes. Nonetheless, a era (e.g., clade 1B, and Tripos). large proportion of the intraspecific diversity, observed when the Sequences of the order Dinophysiales (Superclade 2) grouped whole 18S was taken into account, was also detectable when only into three clades, while Prorocentrales (Superclade 11) were dis- the V4 marker was considered. For example, the single morphologi- tributed over five distinct clades. cally defined species Alexandrium fundyense and Gambierdiscus Sequences attributed to the order Peridiniales sensu lato were scabrosus were represented by 103 and 54 18S and 36 and 39 V4 considerably diversified in the 18S phylogeny, and seven Superclades sequences, respectively. However, cases occurred in which different were identified: Thoracosphaeraceae (7 clades), Peridiniales sensu species shared the same V4 ribotype; For example, V4 failed to dis- stricto (4 clades), Kryptoperidiniaceae, Heterocapsaceae and tinguish toxic spinosum from nontoxic A. trinitatum, Kare- Podolampadaceae, each with one clade; one Superclade only nia brevis from K. mikimotoi and seven species from one included a pair of genera (Ensiculifera and Pentapharsodinium), and another (Supplementary Material 6). Moreover, there are cases in only one included a single genus (Blastodinium). which sequences belonging to closely related genera shared identical All sequences of the order Suessiales (Superclade 3) grouped in a V4 regions; as is the case of the V4 #788 shared among single clade and sequences of species classified within Gymno- veneficum, Takayama pulchellum and Takayama acrotrocha, and the diniales sensu lato clustered into six distinct Superclades (Super- V4 #315 which is identical in several spp. and Ornithocer- clades 12–17), of which Superclade 13 included all Gymnodiniales cus spp. (Supplementary Material 6). Overall, 374 of the 422 species sensu stricto. Two Superclades of Gymnodiniales sensu lato corre- included in DINOREF could be identified with the V4 barcode marker. sponded to Torodiniales and Kareniaceae, respectively, and the other In general, Superclades with many sequences deriving from sev- three Superclades contained sequences attributed to a single genus, eral genera showed large pairwise p-distances for the V4 region (Fig- which are , Akashiwo and Gyrodinium. ure 3). Yet patterns differed profoundly among the Superclades. For Two Superclades, that is, Tovelliaceae and Ptychodiscales, were example, Superclade 1 (Gonyaulacales) that contained 543 sequences classified as “Dinophyceae ordo incertae sedis” in ALGAEBASE (Guiry & from 17 genera and 92 species showed similar p-distance to the Guiry, 2017). A series of dinoflagellate sequences, often including smaller Superclade 8 (Peridiniales sensu stricto) that had only 116 taxa of uncertain classification, were resolved in a whole series of sequences from 13 genera and 46 species (Table 2; Figure 3). Other small clades, and even single sequences, all emerging from the poly- Superclades, such as #16 (Amphidinium) or #18 (Tovelliaceae) and tomy. We distributed these sequences into two categories: “Uncer- #19 (Blastodinium), showed high levels of variation even though tain naked dinophyceae (UND)” and “Uncertain thecate dinophyceae there were only a small number of sequences (Table 2; Figure 3). (UTD)”. Superclade 2 (Dinophysiales) showed similar p-distance patterns as 6 | MORDRET ET AL.

TABLE 1 Organization of the dinoflagellate 18S sequences into genera, clades and Superclades

(Continued) MORDRET ET AL. | 7

TABLE 1 (Continued)

Clades represent the statistically supported (bootstrap values ≥50) larger clades of the majority-rule consensus tree. Clades are grouped into Superclades based on recent taxonomic literature; a selection of references supporting the identification of Superclades is reported. Dinoflagellate species included in DINOREF, but lacking morphological or phylogenetic evidence to be placed within any Superclade has been grouped in “Uncertain Thecate Dinophyceae” and “Uncertain Naked Dinophyceae.” Names in bold represent genera included in the multigene dinoflagellate phylogeny by Orr et al. (2012). Colours identifying clades, and Superclades correspond to those used in the tree (Figure 2).

#3 (Suessiales), but included less than half as many sequences and a the same genus and 12 OTUs included sequences from different far lower number of genera and species. Within genera, the largest genera. Remarkably, a single OTU (OTU #126 in Supplementary p-distance values were found for Amphidinium, Coolia, Gambierdiscus, material 7) contained sequences from 59 genera belonging to differ- Gonyaulax, Protoperidinium and Togula (Figure 4). ent Superclades (e.g., Karlodinium, Prorocentrum, Gyrodinium, When clustered into OTUs at 98% similarity, the 946 unique V4 Podolampas, Duboscquodinium). On the other hand, sequences sequences were reduced to 313 OTUs (Supplementary Material 6); belonging to the same species clustered in different OTUs (e.g., 33 of these OTUs included sequences from different species within Alexandrium fundyense Group_I and Gambierdiscus scabrosus). 8 | MORDRET ET AL.

TABLE 2 Number of unique and total dinoflagellate 18S rRNA gene sequences by Superclade included in the database

No. of sequences in No. of taxa in Total no. of DINOREF DINOREF described taxa

Superclades Unique Total Genera Species Genera Species # 1 Gonyaulacales 507 543 17 85 20 296 # 2 Dinophysiales 97 97 8 38 13 358 # 3 Suessiales 223 240 14 29 26 91 # 4 Thoracosphaeraceae 71 82 16 22 19 66 # 5 26 27 2 11 2 20 # 6 Kryptoperidiniaceae 21 23 5 10 6 16 #7 Pentapharsodinium-Ensiculifera 662 4 26 # 8 Peridiniales sensu stricto 106 116 13 46 25 475 # 9 Heterocapasaceae 18 20 1 7 1 16 # 10 Podolampadaceae 7 7 4 7 8 42 # 11 Prorocentrales 70 78 2 28 4 68 #12 Akashiwo 8131 1 1 1 # 13 Gymnodiniales sensu stricto 115 129 16 41 21 341 # 14 Kareniaceae 31 38 4 9 8 40 #15 Gyrodinium 15 15 1 7 3 112 #16 Amphidinium 36 40 1 10 3 101 # 17 Torodiniales 9 9 2 3 2 3 # 18 Tovelliaceae 6 10 4 4 4 19 #19 Blastodinium 29 32 1 8 1 13 # 20 Ptychodiscales 1 1 1 1 1 2 UTD Uncertain Thecate Dinophyceae 56 56 23 32 37 172 UND Uncertain Naked Dinophyceae 82 89 11 19 25 84 Total 1,540 1,671 149 422 232 2,342

Number of dinoflagellate genera and species represented in the database by at least one sequence. Sequences not assigned to the species level (anno- tated as “sp.”) were not considered. Total number of genera and species described (based on Gomez (2012a), ALGAEBASE (Guiry & Guiry, 2017), CEDIT (Hoppenrath & Elbr€achter, 2015).

4 | DISCUSSION defined taxa in multiple clades emerging directly from a polytomy neither supports nor falsifies the natural status of these taxa; mor- 2 DINOREF adds to the already available 18S protists databases PR (Guil- phological evidence is simply not backed up by molecular evidence. lou et al., 2013) and SILVA (Quast et al., 2013) a considerable number Whether this polytomy is hard, that is, real, due to a brief period of of new 18S sequences, provides an updated and validated identifica- rapid radiation, or soft, due to paucity of data, remains to be tion for more than 400 entries, and places each sequence within a resolved. One promising approach is multigene phylogenies that start curated 8-level taxonomic framework. This is a “conservative” hierar- to provide a clearer picture of dinoflagellate evolutionary patterns chical system, largely reflecting the one presented in ALGAEBASE (Guiry (Janouskovec et al., 2017; Orr et al., 2012). Some groups, such as & Guiry, 2017). In addition, we also present a tentative classification Gymnodiniales, appear to be polyphyletic and are in need of taxo- of dinoflagellates based on the 18S phylogeny and supported by nomic revision. As a matter of fact, various papers have been recent taxonomic literature (“Superclade” in Table 1; column D in recently published to clarify the taxonomic position of several groups

Supplementary Material 1). DINOREF also includes sequences of the (e.g., Boutrup, Moestrup, Tillmann, & Daugbjerg, 2016; Yuasa, Hori- V4 region with information on this marker’s capability to discriminate guchi, Mayama, & Takahashi, 2016). It is clear that the taxonomic among species. treatment of the genera needs revision. For example, genera such as The marked polytomy of the 18S tree built with the sequences Protoperidinium are paraphyletic with daughter genera inside them included in DINOREF leaves the phylogenetic status of most of the (Gu, Liu, & Mertens, 2015; Liu et al., 2015; Mertens et al., 2015), higher taxa unresolved. These results are in line with what was suggesting that these paraphyletic genera need to be split. known from previous studies carried out with this ribosomal marker The comparison between the number of dinoflagellate species

(e.g., Bachvaroff et al., 2014; Murray et al., 2005; Saldarriaga, Taylor, estimated by recent checklists (Gomez, 2012a) and ALGAEBASE (Guiry Keeling, & Cavalier-Smith, 2001). Recovery of morphologically & Guiry, 2017) and those for which sequences are available in MORDRET ET AL. | 9

FIGURE 2 Consensus phylogenetic tree (RAXML, GTR model) based on 1,540 unique 18S rRNA sequences in the DINOREF. Alignment of 2,153 bp with three sequences of (U97109; X56165 and X03772) and three sequences of (M97703; AF236097 and AF291427) used as outgroup. Clades are ordered according to their size and are supported by bootstrap values ≥50%; black dots are proportional to bootstrap values. The colours of the Superclades and clades correspond to those in Table 1. Clades within each Superclade have been marked (A, B, C, etc.), along the outer rim of the tree, corresponding to their assignment in this figure. The Superclades “Uncertain Naked Dinophyceae” and “Uncertain Thecate Dinophyceae” have not been marked and neither have the small clades on the upper left of the tree. The tree can be visualized on ITOL version 3—Interactive Tree of Life (Letunic and Bork, 2016, at https://itol.embl.de/tree/ 1932052318357911479398328) in which all clades are marked

FIGURE 3 (a) Barplot showing the number of genera with 18S rRNA information in 19 of the 20 Superclades depicted in Table 1 and Figure 2. Superclade 20 (Ptychodiscales) is not shown as it contains only one sequence. The number of sequences in each Superclade is reported on the top of each bar. (b) Boxplot showing the pairwise p-distances of the V4 region in the 19 dinoflagellate Superclades 10 | MORDRET ET AL.

FIGURE 4 Boxplots showing the range of the pairwise p-distances over the V4 region of the 18S rRNA sequences within the genera included in each Superclade (indicated by number and colour code as in Table 1 and Figure 2). Number of sequences that have been used to calculate pairwise p-distance and number of species represented by those sequences are specified for each genus. Sequences with no species name (annotated “sp.”) were not accounted for in the number of species, but still used for the calculation of pairwise p-distance MORDRET ET AL. | 11

DINOREF shows that all the taxonomic groups (defined here as Super- Phylogenetic studies and metabarcoding approaches will provide clades) have representatives in the 18S data set, but not all of them important information in this direction in the coming years. We have are equally well covered. The coverage of the DINOREF database is shown the good resolution capability of the V4 18S rDNA marker for strongly biased towards sequences of species in the focus of global dinoflagellates, but other barcode regions are worth exploring such as research such as toxic species (e.g., Alexandrium, Gambierdiscus) and conserved regions of the LSU (e.g., Grzebyk et al., 2017). We recom- endosymbionts associated with coral reefs (Symbiodinium). In general, mend testing different potential markers in parallel with the production it is also biased towards autotrophs because these organisms are of new 18S reference sequences, because a larger amount of data is more easily grown in culture and therefore have been characterized required to achieve the best possible solution for dinoflagellates and more easily than heterotrophs and mixotrophs, which are difficult to protists in general. isolate and may require alternative approaches such as single-cell sequencing. The DINOREF database is still far from covering dinoflagel- late diversity. Only 18% of the 2,342 described species according to NOTE ADDED IN PROOF ALGAEBASE (Guiry & Guiry, 2017) are represented by an 18S reference Proposed taxonomic revisions are being incorporated in NCBI GEN- sequence. The 422 dinoflagellate species with an 18S sequence pre- BANK. sent in DINOREF represent a marked increase from the around 150 species reported by Murray et al. (2005) and from the 345 by Gomez (2014) showing that sequence information is increasing ACKNOWLEDGEMENTS rapidly as a result of the description of new species from different SM has been supported by a PhD fellowship from Stazione Zoo- geographic areas. However, there are still 83 genera and 1,639 spe- logica Anton Dohrn (SZN). This study was supported by the pro- cies for which no 18S sequences are available yet. ject FIRB Biodiversitalia (RBAP10A2T4) funded by the Italian In some cases, however, the diversity of species defined based on Ministero dell'Istruzione, dell'Università e della Ricerca (MIUR). We morphological characters is probably overestimated. This may be the are grateful to Dr. Chris Bowkett for improving the use of English case of species attributed to the genus , where 38% of in the manuscript. the 270 described species have not been reported since their original description (Thessen, Patterson, & Murray, 2012). However, there are also genera such as Symbiodinium and Scrippsiella in which a marked AUTHOR CONTRIBUTION intrageneric diversity has been uncovered by molecular approaches S.M., D.S., R.P., W.H.C.F.K., D.V. and M.M. designed the research. S.M. (Gottschling et al., 2012; Pochon, Putnam, & Gates, 2014). and R.P. collected data and the designed framework for molecular and The usefulness of metabarcoding using the V4 region of the 18S computational analyses. S.M. assembled the database and provided a rRNA gene depends on how well the region is able to distinguish the first draft of the manuscript; all authors contributed to its final version. various taxa of interest (Bendif et al., 2014; Hu et al., 2015). For dinoflagellates, it is considered variable enough to discern most of the species (Ki, 2012; Smith, Kohli, Murray, & Rhodes, 2017) although DATA ACCESSIBILITY there are some exceptions. Our results show that the V4 region can DINOREF is available in flat file format on Figshare at https://doi.org/ unambiguously discriminate 374 of the 422 species included in DI- 10.6084/m9.figshare.5568454 and is incorporated in the NOREF. On the upside, this outcome illustrates the capability of the V4 2 barcode region to resolve protist diversity (Hu et al., 2015; Ki, 2012), Ribosomal Reference database (PR ) version 4.9.0 which is available but on the downside, the V4 is unable to discriminate between some at https://doi.org/10.6084/m9.figshare.5913181. toxic and nontoxic close relatives within Dinophysis, and Aza- dinium. In metabarcoding studies, sequences are clustered at a given ORCID similarity level (98% or 97%) to avoid inflating diversity estimates (Massana et al., 2015; Onda et al., 2017; Smith et al., 2017; de Vargas Diana Sarno http://orcid.org/0000-0001-9697-5301 et al., 2015). This may be appropriate for species exhibiting high intraspecific sequence variation (e.g., in Symbiodinium, Gambierdiscus and Alexandrium), yet clustering at the 98% level means that closely REFERENCES related species are lumped together in single OTU. We therefore rec- Adl, S. M., Simpson, A. G. B., Lane, C. E., Lukes, J., Bass, D., Bowser, S. ommend using ribotypes (unique sequences without any OTU cluster- S., ... Spiegel, F. W. (2012). The revised classification of eukaryotes. ing) rather than clustering for studies that focus on the species-level Journal of Eukaryotic Microbiology, 59(5), 429–493. https://doi.org/10. biodiversity of dinoflagellates. 1111/j.1550-7408.2012.00644.x Amaral-Zettler, L. A., McCliment, E. A., Ducklow, H. W., & Huse, S. M. The enormous diversity of unicellular organisms, the augmented (2009). A method for studying protistan diversity using massively par- knowledge we have about their morphology, ecology and life cycles allel sequencing of V9 hypervariable regions of small-subunit riboso- together with the “revolution” of molecular approaches call for estab- mal RNA genes. PLoS ONE, 4(7), e6372. https://doi.org/10.1371/ lishing a common taxonomic framework (e.g., Berney et al., 2017). journal.pone.0006372 12 | MORDRET ET AL.

Anderson, D. M., Cembella, A. D., & Hallegraeff, G. M. (2012). Progress (Dinophyceae). Protist, 161(1), 35–54. https://doi.org/10.1016/j.pro in understanding harmful algal blooms: Paradigm shifts and new tech- tis.2009.06.004 nologies for research, monitoring, and management. Annual Review of Gottschling, M., Calasan, A. Z., Kretschmann, J., & Gu, H. (2017). Two Marine Science, 4(1), 143–176. https://doi.org/10.1146/annurev-ma new generic names for dinophytes harbouring a as an rine-120308-081121 endosymbiont, Blixaea and Unruhdinium (Kryptoperidiniaceae, Peri- Bachvaroff, T. R., Gornik, S. G., Concepcion, G. T., Waller, R. F., Mendez, diniales). Phytotaxa, 306(4), 296–300. https://doi.org/10.11646/phyto G. S., Lippmeier, J. C., & Delwiche, C. F. (2014). Dinoflagellate phy- taxa.306.4.6 logeny revisited: Using ribosomal proteins to resolve deep branching Gottschling, M., Soehner, S., Zinssmeister, C., John, U., Plotner,€ J., Sch- dinoflagellate clades. Molecular Phylogenetics and Evolution, 70(1), weikert, M., ... Elbr€achter, M. (2012). Delimitation of the Thora- 314–322. https://doi.org/10.1016/j.ympev.2013.10.007 cosphaeraceae (Dinophyceae), including the calcareous Bendif, E. M., Probert, I., Carmichael, M., Romac, S., Hagino, K., & de Var- dinoflagellates, based on large amounts of ribosomal RNA sequence gas, C. (2014). Genetic delineation between and within the wide- data. Protist, 163(1), 15–24. https://doi.org/10.1016/j.protis.2011.06. spread coccolithophore morpho-species Emiliania huxleyi and 003 Gephyrocapsa oceanica (Haptophyta). Journal of Phycology, 50(1), 140– Gouy, M., Guindon, S., & Gascuel, O. (2010). SeaView version 4: A multi- 148. https://doi.org/10.1111/jpy.12147 platform graphical user interface for sequence alignment and phylo- Berdalet, E., Fleming, L. E., Gowen, R., Davidson, K., Hess, P., Backer, L. genetic tree building. Molecular Biology and Evolution, 27(2), 221–224. C., ... Enevoldsen, H. (2016). Marine harmful algal blooms, human https://doi.org/10.1093/molbev/msp259 health and wellbeing: Challenges and opportunities in the 21st cen- Grzebyk, D., Audic, S., Lasserre, B., Abadie, E., de Vargas, C., & Bec, B. tury. Journal of the Marine Biological Association of the United King- (2017). Insights into the harmful algal flora in northwestern Mediter- dom, 96(1), 61–91. https://doi.org/10.1017/S0025315415001733 ranean coastal lagoons revealed by pyrosequencing metabarcodes of Berney, C., Ciuprina, A., Bender, S., Brodie, J., Edgcomb, V., Kim, E., ... the 28S rRNA gene. Harmful , 68,1–16. https://doi.org/10. de Vargas, C. (2017). UniEuk: Time to speak a common language in 1016/j.hal.2017.06.003 protistology!. Journal of Eukaryotic Microbiology, 64(3), 407–411. Gu, H., Liu, T., & Mertens, K. N. (2015). Cyst–theca relationship and phy- https://doi.org/10.1111/jeu.12414 logenetic positions of Protoperidinium (Peridiniales, Dinophyceae) spe- Boutrup, P. V., Moestrup, Ø., Tillmann, U., & Daugbjerg, N. (2016). Kato- cies of the sections Conica and Tabulata, with description of dinium glaucum (Dinophyceae) revisited: Proposal of new genus, fam- Protoperidinium shanghaiense sp. nov. Phycologia, 54(1), 49–66. ily and order based on ultrastructure and phylogeny. Phycologia, 55 https://doi.org/10.2216/14-047.1 (2), 147–164. https://doi.org/10.2216/15-138.1 Guillou, L., Bachar, D., Audic, S., Bass, D., Berney, C., Bittner, L., ... Chris- Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. ten, R. (2013). The Protist Ribosomal Reference database (PR2): A D., Costello, E. K., ... Walters, W. A. (2010). QIIME allows analysis of catalog of unicellular Small Sub-Unit rRNA sequences with high-throughput community sequencing data. Nature Methods, 7(5), curated taxonomy. Nucleic Acids Research, 41(D1), D597–D604. 335–336. https://doi.org/10.1038/nmeth.f.303.QIIME https://doi.org/10.1093/nar/gks1160 de Salas, M. F., Laza-Martınez, A., & Hallegraeff, G. M. (2008). Novel Guindon, S., Dufayard, J. F., Lefort, V., Anisimova, M., Hordijk, W., & Gas- unarmored dinoflagellates from the toxigenic family Kareniaceae cuel, O. (2010). New algorithms and methods to estimate maximum- (Gymnodiniales): Five new species of Karlodinium and one new likelihood phylogenies: Assessing the performance of PhyML 3.0. Sys- Takayama from the Australian sector of the Southern Ocean. Journal tematic Biology, 59(3), 307–321. https://doi.org/10.1093/sysbio/ of Phycology, 44(1), 241–257. https://doi.org/10.1111/j.1529-8817. syq010 2007.00458.x Guiry, M. D., & Guiry, G. M. (2017). AlgaeBase. World-wide elec- de Vargas, C., Audic, S., Henry, N., Decelle, J., Mahe, F., Logares, R., ... tronic publication. http://www.algaebase.org; searched on 15 Octo- Velayoudon, D. (2015). Eukaryotic plankton diversity in the sunlit ber 2017 ocean. Science, 348(6237), 1261605. https://doi.org/10.1126/scie Hackett, J. D., Anderson, D. M., Erdner, D. L., & Bhattacharya, D. (2004). nce.1261605 Dinoflagellates: A remarkable evolutionary experiment. American Jour- Decelle, J., Romac, S., Stern, R. F., Bendif, E. M., Zingone, A., Audic, S., nal of Botany, 91, 1523–1534. https://doi.org/10.3732/ajb.91.10. ... Christen, R. (2015). PhytoREF: A reference database of the plas- 1523 tidial 16S rRNA gene of photosynthetic eukaryotes with curated tax- Hoppenrath, M., Chomerat, N., & Leander, B. S. (2013). Molecular phy- onomy. Molecular Ecology Resources, 15(6), 1435–1445. https://doi. logeny of Sinophysis: Evaluating the possible early evolutionary his- org/10.1111/1755-0998.12401 tory of dinophysoid dinoflagellates. In Biological and geological Fensome, R. A., Taylor, M. F. J. R., Norris, G., Sarjeant, W. A. S., Wharton, perspectives of dinoflagellates. The Micropalaeontological Society, Spe- D. I., & Wiliams, G. L. (1993). A classification of living and fossil cial Publications. Geological Society, London (pp. 207–214). https://d dinoflagellates. Hanover, Pennsylvania, USA: Sheridan Press. oi.org/10.1144/TMS5 Flø Jørgensen, M. F., Murray, S., & Daugbjerg, N. (2004). Amphidinium Hoppenrath, M., & Elbrachter,€ M. (2015). Center of Excellence for Dino- revisited. I. Redefinition of Amphidinium (Dinophyceae) based on phyte Taxonomy (CEDiT). Retrieved from http://www.dinophyta.org cladistic and molecular phylogenetic analyses. Journal of Phycology, Hoppenrath, M., & Leander, B. S. (2007). Morphology and phylogeny of 40(2), 351–365. https://doi.org/10.1111/j.1529-8817.2004.03131.x the pseudocolonial dinoflagellates Polykrikos lebourae and Polykrikos Gomez, F. (2012a). A quantitative review of the lifestyle, habitat and herdmanae n. sp. Protist, 158(2), 209–227. https://doi.org/10.1016/j. trophic diversity of dinoflagellates (Dinoflagellata, Alveolata). System- protis.2006.12.001 atics and Biodiversity, 10(3), 267–275. https://doi.org/10.1080/ Hoppenrath, M., Murray, S. A., Chomerat, N., & Horiguchi, T. (2014). In 14772000.2012.721021 M. Hoppenrath, S. A. Murray, N. Chomerat & T. Horiguchi (Eds.), Gomez, F. (2012b). A checklist and classification of living. CICIMAR Marine benthic dinoflagellates - unveiling their worldwide biodiversity Oceanides , 27(1), 65–140. (Kleine Sen). Stuttgart, Germany: Schweizerbart’sche Verlagsbuch- Gomez, F. (2014). Problematic biases in the availability of molecular markers handlung. in protists: The example of the Dinoflagellates. Acta Protozoologica, 53 Hoppenrath, M., Murray, S., Sparmann, S. F., & Leander, B. S. (2012). (1), 63. https://doi.org/10.4467/16890027AP.13.0021.1118 Morphology and molecular phylogeny of Ankistrodinium gen. Gomez, F., Moreira, D., & Lopez-Garc ıa, P. (2010). Neoceratium gen. nov., nov. (Dinophyceae), a new genus of marine sand-dwelling dinoflag- a new genus for all marine species currently assigned to Ceratium ellates formerly classified within Amphidinium. Journal of Phycology, MORDRET ET AL. | 13

48(5), 1143–1152. https://doi.org/10.1111/j.1529-8817.2012. Environmental Microbiology, 17(10), 4035–4049. https://doi.org/10. 01198.x 1111/1462-2920.12955 Hu, S. K., Liu, Z., Lie, A. A. Y., Countway, P. D., Kim, D. Y., Jones, A. C., Mertens, K. N., Takano, Y., Yamaguchi, A., Gu, H., Bogus, K., Kremp, A., ... Caron, D. A. (2015). Estimating protistan diversity using high- ... Matsuoka, K. (2015). The molecular characterization of the enig- throughput sequencing. Journal of Eukaryotic Microbiology, 62(5), 688– matic dinoflagellate Kolkwitziella acuta reveals an affinity to the 693. https://doi.org/10.1111/jeu.12217 Excentrica section of the genus Protoperidinium. Systematics and Biodi- Janouskovec, J., Gavelis, G. S., Burki, F., Dinh, D., Bachvaroff, T. R., Gor- versity, 13(6), 509–524. https://doi.org/10.1080/14772000.2015. nik, S. G., ... Saldarriaga, J. F. (2017). Major transitions in dinoflagel- 1078855 late evolution unveiled by phylotranscriptomics. Proceedings of the Moestrup, Ø., Akselmann, R., Fraga, S., Hansen, G., Hoppenrath, M., Iwa- National Academy of Sciences, 114(2), E171–E180. https://doi.org/10. taki, M., ... Zingone, A. (2009). IOC-UNESCO taxonomic reference 1073/pnas.1614842114 list of harmful micro algae. Accessed at http://www.marinespecies. Jephcott, T. G., Alves-de-Souza, C., Gleason, F. H., van Ogtrop, F. F., org/hab on 2017-03-13 Sime-Ngando, T., Karpov, S. A., & Guillou, L. (2016). Ecological Montresor, M., John, U., Beran, A., & Medlin, L. K. (2004). Alexandrium impacts of parasitic chytrids, and perkinsids on popula- tamutum sp. nov. (Dinophyceae): A new nontoxic species in the tions of marine photosynthetic dinoflagellates. Fungal Ecology, 19, genus Alexandrium. Journal of Phycology, 40(2), 398–411. https://doi. 47–58. https://doi.org/10.1016/j.funeco.2015.03.007 org/10.1111/j.1529-8817.2004.03060.x John, U., Litaker, R. W., Montresor, M., Murray, S., Brosnahan, M. L., & Morard, R., Darling, K. F., Mahe, F., Audic, S., Ujiie, Y., Weiner, A. K. M., Anderson, D. M. (2014). Formal revision of the Alexandrium tamar- ... de Vargas, C. (2015). PFR2: A curated database of planktonic for- ense species complex (Dinophyceae) taxonomy: The introduction of aminifera 18S ribosomal DNA as a resource for studies of plankton five species with emphasis on molecular-based (rDNA) classification. ecology, biogeography and evolution. Molecular Ecology Resources, 15 Protist, 165(6), 779–804. https://doi.org/10.1016/j.protis.2014.10. (6), 1472–1485. https://doi.org/10.1111/1755-0998.12410 001 Murray, S., Flø Jørgensen, M., Ho, S. Y. W., Patterson, D. J., & Jermiin, L. Katoh, K., & Standley, D. M. (2013). MAFFT multiple sequence alignment S. (2005). Improving the analysis of dinoflagellate phylogeny based software version 7: Improvements in performance and usability. on rDNA. Protist, 156(3), 269–286. https://doi.org/10.1016/j.protis. Molecular Biology and Evolution, 30(4), 772–780. https://doi.org/10. 2005.05.003 1093/molbev/mst010 Not, F., Siano, R., Kooistra, W. H. C. F., Simon, N., Vaulot, D., & Probert, Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, I. (2012). Diversity and ecology of eukaryotic marine phytoplankton. S., ... Drummond, A. (2012). Geneious Basic: An integrated and In G. Piganeau (Ed.), Genomics insights into the biology of algae (pp. 1– extendable desktop software platform for the organization and analy- 53). Amsterdam, The Netherlands: Elsevier Academic Press. https://d sis of sequence data. Bioinformatics, 28(12), 1647–1649. https://doi. oi.org/10.1016/b978-0-12-391499-6.00001-3 org/10.1093/bioinformatics/bts199 Onda, D. F. L., Medrinal, E., Comeau, A. M., Thaler, M., Babin, M., & Ki, J.-S. (2012). Hypervariable regions (V1-V9) of the dinoflagellate 18S Lovejoy, C. (2017). Seasonal and interannual changes in and rRNA using a large dataset for marker considerations. Journal of dinoflagellate species assemblages in the Arctic Ocean (Amundsen Applied Phycology, 24(5), 1035–1043. https://doi.org/10.1007/ Gulf, Beaufort Sea, Canada). Frontiers in Marine Science, 4, 16. s10811-011-9730-z https://doi.org/10.3389/fmars.2017.00016 Ki, J.-S., Jang, G. Y., & Han, M.-S. (2005). Integrated method for single- Orr, R. J. S., Murray, S. A., Stuken,€ A., Rhodes, L., & Jakobsen, K. S. cell DNA extraction, PCR amplification, and sequencing of ribosomal (2012). When naked became armored: An eight-gene phylogeny DNA from harmful dinoflagellates Cochlodinium polykrikoides and reveals monophyletic origin of theca in dinoflagellates. PLoS ONE, 7 Alexandrium catenella. Marine Biotechnology, 6(6), 587–593. https:// (11), e50004. https://doi.org/10.1371/journal.pone.0050004 doi.org/10.1007/s10126-004-1700-x Pawlowski, J., Audic, S., Adl, S., Bass, D., Belbahri, L., Berney, C., ... de Kumar, S., Stecher, G., & Tamura, K. (2016). MEGA7: Molecular evolu- Vargas, C. (2012). CBOL protist working group: Barcoding eukaryotic tionary genetics analysis version 7.0 for bigger datasets. Molecular richness beyond the , , and fungal kingdoms. PLoS Biology, Biology and Evolution, 33(7), 1870–1874. https://doi.org/10.1093/mol 10(11), e1001419. https://doi.org/10.1371/journal.pbio.1001419 bev/msw054 Piredda, R., Tomasino, M. P., D’Erchia, A. M., Manzari, C., Pesole, G., Le Bescot, N., Mahe, F., Audic, S., Dimier, C., Garet, M. J., Poulain, J., ... Montresor, M., ... Zingone, A. (2017). Diversity and temporal pat- Siano, R. (2016). Global patterns of pelagic dinoflagellate diversity terns of planktonic protist assemblages at a Mediterranean Long across protist size classes unveiled by metabarcoding. Environmental Term Ecological Research site. FEMS Microbiology Ecology, 93(1), Microbiology, 18(2), 609–626. https://doi.org/10.1111/1462-2920. https://doi.org/10.1093/femsec/fiw200 13039 Pochon, X., Putnam, H. M., & Gates, R. D. (2014). Multi-gene analysis of Lindberg, K., Moestrup, Ø., Daugbjerg, N., Woloszynskia, I., Woloszynska, Symbiodinium dinoflagellates: A perspective on rarity, symbiosis, and J., & Ehrenb, P. (2005). Studies on woloszynskioid dinoflagellates I: evolution. PeerJ, 2, e394. https://doi.org/10.7717/peerj.394 Woloszynskia coronata re-examined using light and electron micro- Price, M. N., Dehal, P. S., & Arkin, A. P. (2010). FastTree 2 - Approxi- scopy and partial LSU rDNA sequences, with description of Tovellia mately maximum-likelihood trees for large alignments. PLoS ONE, 5 gen. nov. and Jadwigia gen. nov. (Tovelliaceae fam. nov.). Phycologia, (3), e9490. https://doi.org/10.1371/journal.pone.0009490 44(4), 416–440. https://doi.org/10.2216/0031-8884(2005)44[416: Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., ... sowdiw]2.0.co;2 Glockner,€ F. O. (2013). The SILVA ribosomal RNA gene database Liu, T., Mertens, K. N., Ribeiro, S., Ellegaard, M., Matsuoka, K., & Gu, H. project: Improved data processing and web-based tools. Nucleic (2015). Cyst-theca relationships and phylogenetic positions of Peri- Acids Research, 41(D1), D590–D596. https://doi.org/10.1093/nar/ diniales (Dinophyceae) with two anterior intercalary plates, with gks1219 description of Archaeperidinium bailongense sp. nov. and Protoperi- R Development Core Team. (2016). R: A language and environment for dinium fuzhouense sp. nov. Phycological Research, 63(2), 134–151. statistical computing. Vienna, Austria: R Foundation for Statistical https://doi.org/10.1111/pre.12081 Computing. ISBN 3-900051-07-0. https://doi.org/10.1038/sj.hdy. Massana, R., Gobet, A., Audic, S., Bass, D., Bittner, L., Boutte, C., ... de 6800737 Vargas, C. (2015). Marine protist diversity in European coastal waters Ren~e, A., Camp, J., & Garces, E. (2015). Diversity and phylogeny of and sediments as revealed by high-throughput sequencing. Gymnodiniales (Dinophyceae) from the NW Mediterranean Sea 14 | MORDRET ET AL.

revealed by a morphological and molecular approach. Protist, 166(2), Takano, Y., Hansen, G., Fujita, D., & Horiguchi, T. (2008). Serial replace- 234–263. https://doi.org/10.1016/j.protis.2015.03.001 ment of diatom endosymbionts in two freshwater dinoflagellates, Rognes, T., Flouri, T., Nichols, B., Quince, C., & Mahe, F. (2016). Peridiniopsis spp. (Peridiniales, Dinophyceae). Phycologia, 47(1), 41–53. VSEARCH: A versatile open source tool for metagenomics. PeerJ, 4, https://doi.org/10.2216/07-36.1 e2584. https://doi.org/10.7717/peerj.2584 Thessen, A. E., Patterson, D. J., & Murray, S. A. (2012). The taxonomic Ruiz Sebastian, C., & O’Ryan, C. (2001). Single-cell sequencing of significance of species that have only been observed once: The genus dinoflagellate (Dinophyceae) nuclear ribosomal genes. Molecular Ecol- Gymnodinium (Dinoflagellata) as an example. PLoS ONE, 7(8), e44015. ogy Notes, 1(4), 329–331. https://doi.org/10.1046/j.1471-8278 https://doi.org/10.1371/journal.pone.0044015 Salas, R., Tillmann, U., & Kavanagh, S. (2014). Morphological and molecu- Tillmann, U., Gottschling, M., Nezan, E., Krock, B., & Bilien, G. (2014). lar characterization of the small armoured dinoflagellate Heterocapsa Morphological and molecular characterization of three new Azadinium minima (Peridiniales, Dinophyceae). European Journal of Phycology, 49 species (Amphidomataceae, Dinophyceae) from the Irminger Sea. Pro- (4), 413–428. https://doi.org/10.1080/09670262.2014.956800 tist, 165(4), 417–444. https://doi.org/10.1016/j.protis.2014.04.004 Saldarriaga, J. F., Taylor, F. J. R., Keeling, P. J., & Cavalier-Smith, T. Weisse, T., Anderson, R., Arndt, H., Calbet, A., Hansen, P. J., & Mon- (2001). Dinoflagellate nuclear SSU rRNA phylogeny suggests multiple tagnes, D. J. S. (2016). Functional ecology of aquatic phagotrophic plastid losses and replacements. Journal of Molecular Evolution, 53(3), protists – Concepts, limitations, and perspectives. European Journal 204–213. https://doi.org/10.1007/s002390010210 of Protistology, 55,50–74. https://doi.org/10.1016/j.ejop.2016.03. Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, 003 E. B., ... Weber, C. F. (2009). Introducing mothur: Open-source, plat- Wickham, H. (2009). ggplot2 Elegant graphics for data analysis. Media form-independent, community-supported software for describing and (Vol. 35). https://doi.org/10.1007/978-0-387-98141-3 comparing microbial communities. Applied and Environmental Microbiol- Yuasa, T., Horiguchi, T., Mayama, S., & Takahashi, O. (2016). Gymnoxan- ogy, 75(23), 7537–7541. https://doi.org/10.1128/AEM.01541-09 thella radiolariae gen. et sp. nov. (Dinophyceae), a dinoflagellate sym- Skovgaard, A., & Salomonsen, X. M. (2009). Blastodinium galatheanum sp. biont from solitary radiolarians. Journal of Phycology, 52 nov. (Dinophyceae) a parasite of the planktonic Acartia neg- (1), 89–104. https://doi.org/10.1111/jpy.12371 ligens (Crustacea, Calanoida) in the central Atlantic Ocean. European Journal of Phycology, 44(3), 425–438. https://doi.org/10.1080/ 09670260902878743 Smith, K. F., Kohli, G. S., Murray, S. A., & Rhodes, L. L. (2017). Assess- SUPPORTING INFORMATION ment of the metabarcoding approach for community analysis of ben- Additional Supporting Information may be found online in the thic-epiphytic dinoflagellates using mock communities. New Zealand Journal of Marine and Freshwater Research, 51(4), 555–576. https:// supporting information tab for this article. doi.org/10.1080/00288330.2017.1298632 Stamatakis, A. (2014). RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics, 30(9), 1312– 1313. https://doi.org/10.1093/bioinformatics/btu033 How to cite this article: Mordret S, Piredda R, Vaulot D, Stoeck, T., Behnke, A., Christen, R., Amaral-Zettler, L., Rodriguez-Mora, Montresor M, Kooistra WHCF, Sarno D. DINOREF: A curated M. J., Chistoserdov, A., ... Edgcomb, V. P. (2009). Massively parallel dinoflagellate (Dinophyceae) reference database for the 18S tag sequencing reveals the complexity of anaerobic marine protistan rRNA gene. Mol Ecol Resour. 2018;00:1–14. https://doi.org/ communities. BMC Biology, 7(1), 72. https://doi.org/10.1186/1741- 7007-7-72 10.1111/1755-0998.12781 Stoecker, D. K. (1999). Mixotrophy among Dinoflagellates. The Journal of Eukaryotic Microbiology, 46(4), 397–401. https://doi.org/10.1111/j. 1550-7408.1999.tb04619.x