<<

The ISME Journal (2021) 15:1931–1942 https://doi.org/10.1038/s41396-021-00895-0

ARTICLE

Resolving cryptic complexes in marine protists: phylogenetic haplotype networks meet global DNA metabarcoding datasets

1,3 1 2 1 Daniele De Luca ● Roberta Piredda ● Diana Sarno ● Wiebe H.C.F. Kooistra

Received: 26 June 2020 / Revised: 23 December 2020 / Accepted: 14 January 2021 / Published online: 15 February 2021 © The Author(s) 2021. This article is published with open access

Abstract Marine protists have traditionally been assumed to be lowly diverse and cosmopolitan. Yet, several recent studies have shown that many protist species actually consist of cryptic complexes of species whose members are often restricted to particular biogeographic regions. Nonetheless, detection of cryptic species is usually hampered by sampling coverage and application of methods (e.g. phylogenetic trees) that are not well suited to identify relatively recent divergence and ongoing flow. In this paper, we show how these issues can be overcome by inferring phylogenetic haplotype networks from global metabarcoding datasets. We use the Chaetoceros curvisetus (Bacillariophyta) species complex as study case. Using two complementary metabarcoding datasets (Ocean Sampling Day and Tara Oceans), we equally resolve the cryptic fl

1234567890();,: 1234567890();,: complex in terms of number of inferred species. We detect new hypothetical species in both datasets. Gene ow between most of species is absent, but no barcoding gap exists. Some species have restricted distribution patterns whereas others are widely distributed. Closely related taxa occupy contrasting biogeographic regions, suggesting that geographic and ecological differentiation drive speciation. In conclusion, we show the potential of the analysis of metabarcoding data with evolutionary approaches for systematic and phylogeographic studies of marine protists.

Introduction related but retained their ancestral morphology or con- verged morphologically [3, 4]. The term ‘cryptic species’ is used for morphologically Recent molecular taxonomic studies have uncovered indistinguishable taxa for which there is evidence (genetic, remarkably high cryptic species diversity in marine plank- ecological, behavioural, biological, etc.) that they belong to tonic protists [5–7]. Originally such taxa were believed to different evolutionary lineages [1, 2]. Groups of such taxa be lowly diverse because potentially high dispersal oppor- are commonly referred to as ‘cryptic species complexes.’ tunities in the open sea leave little opportunity for genetic Cryptic species may have diverged recently and not yet differentiation even at a global scale [8–12]. Unfortunately, have become morphologically distinct, or they are distantly exploring geographic patterning of cryptic species by tra- ditional means, i.e. of sampling and examining large num- bers of specimens from all over the oceans, is a daunting These authors contributed equally: Daniele De Luca, Roberta Piredda task. The few truly global biogeographic studies of such species complexes to date [13, 14] reveal that the individual Supplementary information The online version contains species can be cosmopolitans in their own right or be supplementary material available at https://doi.org/10.1038/s41396- fi 021-00895-0. geographically more con ned. Exploration of species distribution patterns in marine * Wiebe H.C.F. Kooistra planktonic protists requires accurate species delimitation [email protected] [15, 16]. Cryptic species complexes in protists are usually 1 Department of Integrative Marine Ecology, Stazione Zoologica explored by comparing nucleotide differences on selected Anton Dohrn, Naples, Italy DNA markers with differences in the ultrastructural, bio- 2 Department of Research Infrastructure for Marine Biological chemical, biological or ecological properties of selected Resources, Stazione Zoologica Anton Dohrn, Naples, Italy strains [13, 16–18]. Classically, nucleotide data are gathered – 3 Present address: Department of Biology, Botanical Garden of through Sanger sequencing [19 21]. If genetic distances or Naples, University of Naples Federico II, Naples, Italy bootstrap values justify independent evolutionary lineages, 1932 D. De Luca et al.

Fig. 1 Light microscopy photographs of Chaetoceros curvisetus and C. pseudocurvisetus.aC. curvisetus; b C. pseudocurvisetus. The size and shape of aperture between sibling cells (see arrows) are useful characters for distinguishing these taxa.

cryptic species are hypothesised. Yet, phylogenies depict phylogenetic relationships [47, 48] and to delimit species sharply bifurcating speciation events and vertical changes [49, 50]. within ancestor-descent lineages [22], but in case of recent In the present study, we aim at delimiting species in the speciation, attenuated gene flow may still persist between Chaetoceros curvisetus (C. curvisetus) species complex and the sister lineages, which is visualised better in phylogenetic at mapping their distribution patterns. The complex belongs networks [22–24]. to the diatoms, a diverse class of unicellular algae abundant In recent years, high throughput sequencing of tax- in marine and freshwater habitats. Chaetoceros is arguably onomically discriminative barcode regions (HTS meta- the most abundant and diverse genus in the marine plank- barcoding) has revolutionised our capacity to explore tonic diatoms. Its hallmark is constituted by setae—tubular protistan diversity in environmental DNA samples [25]. silica extensions extending from the frustule and linking However, finding a single, universal DNA barcode for a cells into chains. To date, only two species have been genetically heterogeneous assemblage like protists has described: C. curvisetus Cleve and C. pseudocurvisetus revealed to be virtually impossible because of their long, Mangin (Fig. 1)[51]. Recent taxonomic studies uncovered independent, and complex evolutionary histories [26]. that C. curvisetus consists of several genetically distinct Several protistan DNA barcodes have been proposed, such taxa [39, 52, 53] with C. pseudocurvisetus resolved as the D1–D2 or D2–D3 regions of the 28S rRNA gene amongst these taxa, rendering the morphospecies C. curvi- [27–29], the ribosomal internal transcribed spacers ITS1 setus sensu lato paraphyletic. and ITS2 [30–32], the mitochondrial gene COI [33]orthe Here we use the molecular information contained in the chloroplastic gene rbcL[34], but the ~500 bp variable V4 metabarcode data of OSD [41] and Tara Oceans, coupled region of the 18S rRNA gene has been preferred as the with a phylogenetic network approach to identify cryptic universal protistan barcode [26]. It is part of a multi-copy species and assess their phylogenetic relationships. Refer- region present in all eukaryotes and includes conserved ence sequences of the 18S rRNA gene of C. curvisetus and variable regions useful for recognition at various species and close taxa [39] are used to gather reads taxonomic levels. The 18S rRNA gene is used extensively putatively belonging to the species complex from these to infer phylogenies and, consequently, reference datasets. The extracted reads are sorted into haplotypes, sequences are available for a comprehensive set of species which are used to generate phylogenetic networks. From the from across the protistan diversity [26, 35, 36]. The latter we delineate the species within the complex, explore resolution of the 18S rRNA gene to distinguish species has the evolutionary relationships and possible gene flow among been positively tested in several protistan taxa as for- the species and assess their phylogeographic distribution and aminifera [37], dinoflagellates [38] and some diatoms abundance in Longhurst’s biogeographic provinces [54]. [35, 39], but also negative results have been reported [26, 40]. Global metabarcoding initiatives targeting mar- ine protistan diversity used variable regions in the 18S Materials and methods rRNA gene as metabarcode marker; Ocean Sampling Day 2014 (OSD) [41] selected the V4 variable region whereas Reference molecular dataset Tara Oceans [42] used the shorter V9variable region. The resulting datasets have been used to uncover protistan The reference dataset comprised ten 18S rRNA diversity and distribution [43–46], to analyse their obtained from cultured strains, and retrieved from NCBI Resolving cryptic species complexes in marine protists: phylogenetic haplotype networks meet global DNA. . . 1933

Table 1 List of 18S rRNA gene references for the V4 and V9 regions Downloading and processing of metabarcode data utilised for gathering taxa in the C. curvisetus species complex. Strain GenBank accession number The OSD dataset included 144 samples for the V4 region of the 18S rRNA gene (https://mb3is.megx.net/osd-files?pa C. curvisetus SKLMP YG033 MG821562, V9 not included th=/2014/datasets/workable) and 31 samples for the V9 C. curvisetus 1 Na10C1 MG972232 region of the 18S rRNA gene (https://mb3is.megx.net/osd- C. curvisetus 2 Na1C1 MG972235 files?path=%2F2014%2Fdatasets%2Fworkable%2Frdna). C. curvisetus 2c El6A2 LC466961 From the V4-OSD workable fasta files, we generated a total C. curvisetus 3 newBB2 MG972241 fasta file with the unique haplotypes and a table containing C. curvisetus 3e El4A2 LC466962 the abundance of their reads at each site (Total OSD C. pseudocurvisetus IRB MG385841, V9 not included abundance table) using mothur v1.41.1 [57]. In this manu- C. pseudocurvisetus Na13C4 MG972304 script, we indicate with the term ‘haplotype’ a group of C. cf. tortissimus Na18C4 MG972275 identical cleaned reads. The same procedure was done for C. tortissimus Na25A2 MG972325 OSD-V9 workable fasta files. The Tara Oceans V9 dataset was downloaded from https://doi.pangaea.de/10.1594/ PANGAEA.873277 and ENA (accession number: PRJEB9737; [58, 59]). We directly extracted the fasta file Table 2 List of species abbreviations utilised in the present study. of haplotypes (unique cleaned reads) and the Total Tara Species Abbreviation Corresponding species Reference for Oceans abundance table from the downloaded file con- (this study) (this study) the species taining reads of Tara Oceans’ 210 sites. We then generated a full V9 dataset including OSD-V9 and Tara Oceans data. C. sp. 1 sp. 1 C. curvisetus 1[39, 52] C. sp. 2 sp. 2 C. curvisetus 2[39, 52] Recovery of species from global metabarcoding C. sp. 3 sp. 3 C. curvisetus 3[39] datasets C. sp. 4 sp. 4 C. curvisetus 2c [53] C. sp. 5 sp. 5 C. curvisetus 3e [53] The V4 and V9 fragments of the ten references were used as C. sp. 6 sp. 6 C. pseudocurvisetus [39, 52] queries for a local blastn BLAST [60] against OSD (V4) C. sp. 7 sp. 7 C. curvisetus SKLMP [GenBank and OSD-V9 + Tara Oceans datasets, respectively, to YG033 direct ≥ submission] extract haplotypes at 95% similarity. The strategy of using C. sp. 8 sp. 8 Putative new species [This study] references of close outgroups and a relaxed similarity (OSD dataset) threshold (95%) ensured inclusion of all haplotypes C. sp. 9 sp. 9 Putative new species [This study] belonging to the C. curvisetus species complex, plus those (OSD dataset) of unknown species therein. The extracted metabarcode C. sp. 10 sp. 10 Putative new species [This study] haplotypes were aligned with the ten references using (Tara dataset) MAFFT online [61] and a was built in C. sp. 11 sp. 11 Putative new species [This study] FastTree v2.1.8 [62], using the GTR model. The resulting (Tara dataset) tree was visualised and modified in Archaeopteryx v0.9901 [63] to remove haplotypes resolving within outgroup (false-positives). This procedure was carried out separately (Table 1). All the ten 18S rRNA gene references inclu- for V4 and V9 fragments. The retained (validated) haplo- ded the V4 region as amplified by the primers used in types were considered to belong to taxa in the C. curvisetus OSD [36], modified from [55]), whereas two of them did species complex. The abundance and distribution of V4 and not cover the V9 region as amplified by the Tara Oceans V9 curvisetus haplotypes were extracted from the Total primers ([56]; see Table 1). To simplify the nomen- OSD and Tara Oceans abundance tables. At the end of the clature of the species belonging to this complex (some validation procedures, four files were generated, containing genetically defined previously [39, 53], others identified the validated OSD and OSD-V9 + Tara Oceans sequences in the present study) we indicated the taxa in the C. in fasta format and the respective abundance tables (avail- curvisetus complex as sp. 1–sp. 11 (Table 2). From the able on figshare, see ‘Data availability’ section). full-length 18S rRNA gene sequences we extracted the V4 and V9 regions, corresponding with the fragments Phylogenetic haplotype network inference amplified in OSD and Tara Oceans, respectively. None of the ‘curvisetus’ species shared identical V4 and V9 Phylogenetic haplotype networks were constructed using barcodes. the statistical parsimony algorithm by [64] implemented in 1934 D. De Luca et al.

TCS network [65]. Networks were visualised in PopART First, from the abundance tables previously generated v1.7 [66] including the information of read abundances for (see ‘Data availability’ section), we summed the abun- each haplotype. Each network was exported in nexus format dances of the haplotypes belonging to the same inferred and as table containing the list of sequence ID’s (both species. Then, data were normalised to the total number reference and metabarcode haplotypes) grouped in each of reads for each sample and reported as percentage. node in the network. Considering that the number of mis- Finally, we plotted the transformed abundance of each matches between nodes was normally distributed in both V4 inferred species in Longhurst’s provinces in the form of and V9 networks (mean (µ) ± standard deviation (σ), med- heatmaps. For heatmap generation, we used the R [72] ian, and mode for V4: 1.14 ± 0.60, 1, and 1, and for V9: working packages phyloseq [73]andggplot2 [74]. We 1.03 ± 0.49, 1, and 1, respectively), we considered 2 mis- also plotted the occurrence of each species over the matches (µ + 2σ), corresponding to the area describing the sample stations on a world map using the packages maps 95.46% of mismatch distribution, as threshold for errors and [75]andggplot2. intra-specific variation. Based on these assumptions, we inferred species using the following criteria: (1) nodes without the reference and exhibiting ≤2 mismatches with the Results node containing the reference were attributed to that taxon; (2) nodes without reference and with >2 mismatches with Validation of candidate haplotypes in the C. respect to the nodes with reference were considered as curvisetus species complex hypothetical new taxa if their read-abundance was ≥2. After species inference, we took the representative sequence (the BLAST analysis of the ten reference V4 rRNA gene most abundant) of each delimited species and inferred a sequences against the OSD-V4 data retrieved 4223 reads phylogenetic tree (for V4 and V9 regions) for a rapid and corresponding to 1428 unique (non-redundant) haplotypes. supported visualisation of phylogenetic relationships among The phylogenetic tree-approach resulted in the validation of taxa. Maximum Likelihood (ML) trees were inferred using 1232 of these haplotypes (3804 reads) as members of the IQ-TREE v1.6.8 [67] under the TN + F + G4 model for V4 C. curvisetus species complex (files V4_OSD_curvi_vali- and the K2P + G4 model for V9 (suggested by Mod- dated.fasta and OSD_plus_OSD-V9_Tara_abundance_tables. elFinder, [68]) and 1 000 bootstrap replicates for both xlsx available on figshare, see Data availability). BLAST datasets. Sequences of C. tortissimus and C. cf. tortissimus analysis of the eight reference V9 rRNA gene sequences were used as outgroup. against the OSD-V9 and Tara Oceans data returned 2247 haplotypes (856967 reads in Tara Oceans and 194 in OSD- Genetic divergence among species and variability V9). After validation, 772 haplotypes (68210 reads in Tara within species Oceans and 192 in OSD-V9) were found to belong to the complex (files V9_TARA_LW_curvi_validated.fasta and To quantify the relatedness of each species in terms of dis- OSD_plus_OSD-V9_Tara_abundance_tables.xlsx available tances, we calculated the net genetic distance between pairs of on figshare, see ‘Data availability’ section). Reads of vali- speciesasimplementedinMEGA6[69]. We used the Jukes dated haplotypes were found in 60 out of 144 OSD sampling and Cantor model of sequence evolution [70] to calculate the sites (41.7%) and 117 out of 210 Tara Oceans stations distances across all metabarcode haplotypes of each species, (55.7%) (Fig. 2, Supplementary Table 1). which best fitted our data. We also calculated, using the same model, the minimum, maximum and average evolutionary Phylogenetic haplotype networks divergence of sequences within nodes (the number of base substitutions per site from averaging over all sequence pairs The haplotype network based on the OSD dataset (V4 within each group) using MEGA6 [69]. The presence of region of the 18S rRNA gene) contained seven nodes barcoding gap in the inferred species was explored. The assigned to known species in the C. curvisetus species barcoding gap was considered to occur if the maximum dis- complex plus two without a reference (Fig. 3a). Most of the tance among sequences within species was lower than the metabarcodes were assigned to sp. 1, 2, 3, 6 and 7 (Fig. 3a). minimum distance among sequences between species [71]. Only one haplotype was recovered for the reference of sp. 5 and, likewise, only one haplotype for that of sp. 4. Many Global of taxa belonging to the haplotypes clustered into two closely related nodes (spp. 8 C. curvisetus species complex and 9) lacking references (Fig. 3a). Moreover, sp. 3 is more closely related to sp. 6 (C. pseudocurvisetus) than to the The distribution of the inferred species of the C. curvi- other ‘C. curvisetus’ species; sp. 5 (Red Sea) is closely setus complex was mapped over the world’s oceans. related to sp. 6 and distantly to sp. 1. This latter node is Resolving cryptic species complexes in marine protists: phylogenetic haplotype networks meet global DNA. . . 1935

Fig. 2 Occurrence of taxa a) belonging to the C. curvisetus species complex. a OSD data; presence in b Tara Oceans data. Light blue OSD 50 dots refer to occurrence in OSD absence in OSD data (V4 + V9 regions of the 18S rRNA gene), whilst orange dots, in Tara Oceans data. Grey 0

triangles indicate absence in the Latitude molecular data from the respective sampling site. −50

−100 0 100 200 Longitude b)

presence in Tara Oceans 50 absence in Tara Oceans

0 Latitude

−50

−100 0 100 200 Longitude separated by eight mismatches from the reference of sp. 7 haplotypes) for these four putative new species are available from Hong Kong. in GenBank (see ‘Data availability’ section). An alignment The haplotype network based on the Tara Oceans dataset of the representative V4 and V9 sequences of each species (V9 region of the 18S rRNA gene) contained six nodes with sequence signatures is provided in Supplementary assigned to a known species plus two without a reference Information (Supplementary Fig. 1). (Fig. 3b). Most of the haplotypes were assigned to sp. 2; and In the network inferred from the V4 region of the 18S many to spp. 3 and 5, whereas all the other species were less rRNA gene (Fig. 3a), the group encompassing spp. 3, 5 and abundant. The node of sp. 3 was close but clearly separated 6 was recovered midway between sp. 1 on one side and spp. from a peripheral node with a large number of reads. The 2, 8 and 9 on the other side. Likewise, sp. 4 was recovered same was observed for sp. 5. These peripheral nodes were in between sp. 2 and spp. 8 and 9. The main branches treated as distinct species (spp. 10 and 11, respectively). connecting the nodes showed little reticulation, suggesting Nodes containing the V9 reference sequences of two strains reduced gene flow or none at all. The link between sp. 8 and from the Red Sea (spp. 4 and 5) showed high abundance in sp. 9 was reticulated, suggestive for gene flow between the the Tara Oceans dataset. two. In the network inferred from the V9 region of the 18S The V4 regions of the 18S rRNA gene sequences of spp. rRNA gene (Fig. 3b), the node attributed to sp. 1 was like a 8 and 9, and the V9 regions of the 18S rRNA gene pivot with three branches: one branch with sp. 3, another sequences of spp. 10 and 11 were blasted against NCBI branch with spp. 2 and 4, and a third branch with spp. 5 and GenBank with the aim of retrieving complete 18S rRNA 6. These three branches were devoid of intricate reticula- gene sequences showing exact match with the V4 of sp. 8 or tions, suggesting paucity or absence of gene flow. 9 as well as the V9 of sp. 10 or 11. Such a finding would The ML tree inferred using the V4 representative demonstrate that nodes in the V4- and V9 networks repre- sequences (the most abundant) of each newly identified sent the same species. However, such a connection could putative species plus the references confirmed that the taxa not be made because none of these sequences obtained an without reference barcodes (spp. 8 and 9) are members of exact hit. The representative sequences (dominant the C. curvisetus species complex and likely constitute at 1936 D. De Luca et al.

Fig. 3 TCS haplotype networks for the C. curvisetus species complex. a OSD data; b OSD-V9 + Tara Oceans data. The size of the nodes refers to the abundance of the reads. Asterisk (*) indicates nodes without a reference barcode corresponding to putative new species. Numbers in bold indicate the number of mutations. Edges with the same number of mutations are marked with a straight line.

least one new species (Supplementary Fig. 2a). The two and between spp. 8 and 9 (0.008, Supplementary Table 2a), taxa are closely related and share a common ancestor with whilst the highest values were observed between spp. 1 and sp. 2 (Supplementary Fig. 2a). In this tree, the with 2 (0.107) and between spp. 2 and 7 (0.105) (Supplementary spp. 1 and 7 was the first to branch off. Instead, in the V9 Table 2a). For V9, inter-specific distances ranged from tree (Supplementary Fig. 2b) the clade with spp. 2 and 4 0.368 (between spp. 3 and 4) to 0.022 (between spp. 5 and was sister to a polytomy with the remainder of the ingroup 11) (Supplementary Table 2b). The highest intra-specific references plus spp. 10 and 11. The haplotype of sp. 10 was distance for the V4 and the V9 regions (0.105 and 0.049, recovered as close sister of sp. 3. respectively) was higher than their minimum inter-specific distance (0.007 and 0.022, respectively). Therefore, no Genetic differentiation and variability threshold value could be established to distinguish between inter- and intra-specific variability (barcoding gap). Within Pairwise genetic distances among the inferred species dif- each species, the mean evolutionary divergence over fered between the V4 and V9 regions of the rRNA gene, but sequence pairs ranged from 0.000 (sp. 4) to 0.055 (sp. 5) for the proportions were comparable. For V4, the lowest inter- V4 region and from 0.000 (sp. 3) to 0.017 (sp. 2) for V9 specific genetic distances were between spp. 5 and 6 (0.007) (Supplementary Table 3). Resolving cryptic species complexes in marine protists: phylogenetic haplotype networks meet global DNA. . . 1937

Fig. 4 Heatmaps showing the a) abundance of C. curvisetus spp. in each Longhurst’s province. a OSD data; b OSD- V9 + Tara Oceans data. Data were normalised to the total Black Sea Caribbean California current Kuroshio current Med. Sea Adriatic Med. Sea E Med. Sea W Monsoon Indian gyre NE Atl. subtrop. gyre NE Atl. shelves New Zealand coast NW Atl. shelves Azores Sunda-Arafura shelves number of reads for each sample C. sp. 1 and reported as percentage. Species on the left are ordered C. sp. 7 according to phylogenetic C. sp. 2 Abundance closeness in the respective (%) C. networks. For the meaning of sp. 4 64.00 16.00 the provinces, see ref. [54]. C. sp. 8 4.00 1.00 C. sp. 9 0.25

C. sp. 3

C. sp. 6

C. sp. 5

b) Arabian Sea NW Atlantic Arctic Benguela current coast Brazilian current coast Caribbean Coastal Californian curr. Chesapeake Bay Eastern India coast Falkland SW Atl. Gulf Stream Med. Sea E Med. Sea W Monsoon India gyre N Atlantic Drift NE Atl. subtrop. gyre NW Atl. subtrop. gyre NE Atlantic shelves N Pacific subtrop. gyre N Pacific tropical gyre NW Atlantic shelves California current Pacific equat. divergence Red Sea South subtrop. converg. Antarctic Austral polar Boreal polar Central American coast Chile Guianas coast Indian S subtrop. gyre Med. Sea A Pacific N equat. counter curr. S Atlantic gyral S Pacific gyre

C. sp. 1

C. sp. 3 Abundance (%) C. sp. 10 64.00

C. sp. 5 4.00 0.25 C. sp. 11 0.01

C. sp. 6

C. sp. 2

C. sp. 4

Global distribution of taxa belonging to the C. tropical (V9, observable only in Tara Oceans data). The curvisetus species complex remaining species were encountered less frequently, often at sample sites far apart, e.g. sp. 6 in southern Europe, the Red Plots of occurrences gathered from OSD and Tara Oceans Sea and scattered along the coasts of the Indian Ocean; sp. 7 metabarcoding data revealed that the species complex is in Florida and Singapore; sp. 8 only on the US-East Coast, cosmopolitan, occurring in samples in both coastal and Panama and the Mediterranean and sp. 9 at a few sites in the open ocean waters at all latitudes in the northern to southern tropical Indian Ocean and in Japan (the latter three species hemispheres (Fig. 2a, b). Yet, the inferred species showed observable only in V4, OSD data). slightly to markedly more restricted distribution patterns For those species potentially detectable in both datasets (Fig. 4a, b; Supplementary Fig. 3); sp. 1 was found, often (reference sequences including both the V4 and V9), spp. 1 abundantly, in the Atlantic, Arctic and temperate provinces, and 6 were well-represented at the OSD sample sites, at whilst sp. 2 was observed there as well but also all over the least in Europe, but infrequent at the Tara Oceans sites; sp. tropics and subtropics. Instead, spp. 3, 4 and 5 showed a 2 was observed in both OSD and Tara Oceans sites; and predominantly warm-temperate to tropical distribution, spp. 3, 4 and 5 occurred at many Tara Oceans sites but were though the latter two were encountered also near the Ant- rare in the OSD data. Several closely related pairs of species arctic peninsula. Notably, spp. 10 and 11 were more strictly in the V4-tree exhibited distinct distribution ranges (spp. 1 1938 D. De Luca et al. and 7; spp. 2 and 4; spp. 3 and 6; spp. 8 and 9; see Fig. 4a; specific taxon. This approach, called ‘reverse ’ Supplementary Fig. 2a), and the same was observed [76] was applied previously in other marine protists and between close relatives in the V9 tree (spp. 3 and 10; spp. 5 metazoans [14, 77]. In the case of metabarcoding data, the and 11; spp. 2 and 4; see Fig. 4b; Supplementary Fig. 2b). validation of anonymous sequences through isolation of the target organism is supported by abundance tables, which contain information of occurrence, abundance and date for Discussion each sampled locality.

Phylogenetic relationships among taxa belonging to Considerations on Sanger vs. metabarcoding the C. curvisetus species complex sequencing data

Haplotype networks enable delineation of taxa within In this work, we have used the accepted barcode for protists the C. curvisetus species complex and visualisation of the (the V4 region of the rRNA gene, [26]) and the V9 region of relationships among these taxa. Despite the fact that the the rRNA gene to study a cryptic species complex. Instead global metabarcoding datasets here analysed are different in of a classical, Sanger-based approach of a multitude of terms of gene region sequenced and sampling coverage, we geographic strains, we have used metabarcoding datasets retrieve the same number of taxa, including newly inferred (OSD and Tara Oceans), to take advantage of the data ones. This suggests that the molecular information con- available for many sampling localities across the globe. It tained in these datasets allows an exhaustive exploration of would have been logistically next to impossible to establish the complex. In terms of phylogenetic relationships, hap- monoclonal strains from all of these sites. As consequence lotype networks display relationships better than previously of this choice, we had to work with thousands of sequences. published phylogenetic trees. Indeed, in the trees inferred Indeed, differently from a Sanger sequencing approach that from the V4- and V9 regions of the 18S rRNA gene [39], as provides a single sequence as output (a consensus of all the well as in the multigene phylogeny by [53], relationships amplified products), HTS techniques sequence individual among the members of this complex are poorly resolved, molecules. Furthermore, since the 18S gene occurs in especially in the tree inferred from the V9 region. The fact hundreds to thousands of copies within the , and that adding more reference barcodes and DNA markers to sometime on multiple [78], the number of the phylogenies [53] did not result in any significant sequences to handle was even larger. Such rRNA gene improvements in resolution suggests that the approach used copies are expected to be homogenised by concerted evo- to resolve relationships is more important than the number lution over time, but empirical studies suggest that this and type of markers analysed. Phylogenies visualise spe- process is not perfect and multiple, polymorphic copies can ciation events as dichotomies, whereas haplotype networks persist within the genome [79]. When using environmental can model evolution in a reticulated manner, best fitting samples, 18S rRNA gene copies from different cistrons, cases of recent divergence as may occur in species chromosomes and individuals are mixed together, preclud- complexes. ing the distinction between intra- and inter-specific varia- Slight differences in relationships identified in the net- bility. Using the network approach and simple criteria to works of the V4 and V9 regions of the 18S rRNA gene are assign sequences to a species, we have demonstrated that likely due to different length of the regions (~384 and ~105 this is not an issue. All these sequences resulting from the bp, respectively) and are also observed in the V4 and V9 apparent failure of concerted evolution to achieve complete phylogenetic trees in [39]. Furthermore, the reticulations homogenisation, from geographic variability, from PCR within the networks suggest a weak, but nonetheless and sequencing errors are arranged around the main node in existing gene flow between inferred species. The absence of which the ‘dominant haplotype’ is located. All these, i.e. the a barcoding gap corroborates that signal, suggesting that the dominant haplotype and its surrounding peripheral haplo- genetic barriers among some members of the complex are types contribute to the definition of the species’ overall incomplete. genetic variation for this marker region. This aspect of The V4 and V9 networks allow proposing hypotheses on concerted evolution suggested by metabarcoding data, as putative new species or emerging populations, as also well as the fact that the most abundant haplotype for a confirmed in the trees obtained using the reference barcodes specific taxon corresponds to its reference barcode obtained of curvisetus species and the representative sequences of with Sanger sequencing, has been demonstrated recently by unassigned nodes. Such inference of taxa from metabarcode [80] in a temporal dataset of Chaetocerotaceae at local scale. haplotypes is just the first step of the process; the next In this context, we show that the use of a multi-copy gene is step is to try to isolate the target organism in order to not a disadvantage, but instead, that all these copies con- link the anonymous sequence to the morphology of the tribute to the evaluation of inter- and intra-species variation. Resolving cryptic species complexes in marine protists: phylogenetic haplotype networks meet global DNA. . . 1939

Eco-evo considerations of the C. curvisetus species According to the ‘everything is everywhere’ hypothesis complex [94], most microbes form populations large enough to migrate efficiently and accumulate mutations that could be C. curvisetus was originally described from the Kattegat beneficial in particular environments [95]. Speciation in the [81] and reported by [82] as a common inhabitant of the microbial world is therefore expected to involve selection Atlantic Ocean and the Baltic Sea, with peaks of abundance rather than random drift or geographical separation [95]. in summer and autumn. Hasle and Syvertsen [51] indicated Diatoms, for example, are believed to exhibit high intra- it as a cosmopolitan species mainly found in temperate and specific variability, which would be key for their adaptation warm waters and this was also confirmed by [83]. In Chi- to different environments [96]. It is possible that different nese waters, the species has been associated with harmful strains of a species already possess beneficial mutations algal blooms [84, 85], although no production of toxins is allowing them to adapt to different environments using such known to date. Instead, C. pseudocurvisetus is considered intra-specific variability [96]. Once a different environment by [51] as an inhabitant of warm waters. This finding was is reached, some strains would be favoured by natural partially confirmed by [83], in which the species was found selection and, over time, accumulate other mutations that not only in the Mediterranean Sea, the nearby Atlantic will finally differentiate them from the parental population, Ocean and the Indian Ocean, but also in the North Sea, the leading to speciation. In this context, the adaptation to latter being quite balmy in high summer. different environments would be the factor triggering spe- In general, results of our analysis using OSD and Tara ciation in diatoms. Based on the results of this study and Oceans dataset indicate that the C. curvisetus complex is data for other protists we conclude that ecological differ- cosmopolitan. Nonetheless, some species show preferences entiation is likely to facilitate speciation both in allopatric for particular environmental conditions. Furthermore, clo- [93, 97] and sympatric [98–100] conditions. sely related species often exhibit contrasting geographic In conclusion, this work shows that the study of cryptic distribution patterns mainly related to temperature pre- species complexes in marine protists can benefit from the ferences, especially if they are clearly separated in the combination of evolutionary approaches and HTS data. network (no reticulation; e.g. spp. 1 and 7). Instead, species First, molecular data from global metabarcoding datasets connected by reticulate patterns probably still experience provide an exhaustive picture of the genetic variability of gene flow (e.g. spp. 8 and 9) and exhibit more comparable the cryptic species complex under investigation. Second, distribution patterns. Despite the fact that the partitioning of the use of phylogenetic networks over trees allows a better ocean regions in Longhurst’s provinces takes into account visualisation of relationships among closely related taxa, several biogeochemical parameters, Richter et al. [86] have especially when using metabarcoding data from multi-copy shown that temperature alone is more correlated to the genes, where there is no a priori definition of intra- and distribution of larger plankton species (as diatoms) than inter-specific genetic boundaries. Third, the large spatial other environmental parameters. coverage, including sampling points from different bio- Other studies involving cryptic species in marine protists geographic regions across world’s oceans, allows the inte- have shown similar results. In the genus Skeletonema, gration of ecological traits in the delimitation of cryptic widely distributed S. costatum sensu lato consists of a spe- species. Taken together, all these analyses provide an eco- cies complex [5, 87]. Several of its species appear to be evolutionary framework for systematic biology assessments widely distributed as well, but within broad climatological of cryptic species. boundaries (cool-temperate S. japonicum; temperate to tro- pical S. tropicum) whereas others such as S. grethae seem to Data availability be regional and absent in climatologically comparable regions [13]. More in general, Hasle [88] already noticed The fasta reference sequences of the putative new species that morphologically closely related diatom species are often here identified are available on GenBank at the following found in different biogeographic regions. In Leptocylindrus, accession numbers: MW168796–MW168799. The scripts most species were found to be widespread in coastal waters and input files for generating phylogenetic networks, heat- whereas L. minimus was restricted to cold waters of the maps and distribution maps are available on figshare at Northern Hemisphere [14, 89]. Similar results were also URL: https://doi.org/10.6084/m9.figshare.13150400. Other reported for cryptic species complexes within green algae Supplementary Information is available for download on the [90–93]. Our results and the information available in the ISME website. aforementioned studies indicate that ecological traits are as reliable as phylogenetic data for the circumscription of taxa Funding WHCFK acknowledges funding from H2020 RI Cluster within a cryptic species complex, and provide a strong project EMBRIC (GA-654008) and FP7 project ASSEMBLE (GA- support for the formulation of primary species hypotheses. 227799). DDL was supported by a Ph.D. fellowship funded by the 1940 D. De Luca et al.

Stazione Zoologica Anton Dohrn of Naples (The Open University— 13. Kooistra WHCF, Sarno D, Balzano S, Gu H, Andersen RA, Stazione Zoologica Anton Dohrn Ph.D. Programme). Zingone A. Global diversity and biogeography of Skeletonema species (Bacillariophyta). Protist. 2008;159:177–93. Compliance with ethical standards 14. Nanjappa D, Audic S, Romac S, Kooistra WHCF, Zingone A. Assessment of species diversity and distribution of an ancient diatom using a DNA metabarcoding approach. PLoS fl fl Con ict of interest The authors declare that they have no con ict of ONE. 2014;9:e103810. interest. 15. Kaczmarska I, Mather L, Luddington IA, Muise F, Ehrman JM. Cryptic diversity in a cosmopolitan diatom known as Aster- Publisher’s note Springer Nature remains neutral with regard to ionellopsis glacialis (Fragilariaceae): Implications for ecology, jurisdictional claims in published maps and institutional affiliations. biogeography, and taxonomy. Am J Bot. 2014;101:267–86. 16. Zhao Y, Yi Z, Gentekaki E, Zhan A, Al-Farraj SA, Song W. Open Access This article is licensed under a Creative Commons Utility of combining morphological characters, nuclear and Attribution 4.0 International License, which permits use, sharing, mitochondrial genes: An attempt to resolve the conflicts of adaptation, distribution and reproduction in any medium or format, as species identification for ciliated protists. Mol Phylogenet Evol. long as you give appropriate credit to the original author(s) and the 2016;94:718–29. source, provide a link to the Creative Commons license, and indicate if 17. Weiner A, Aurahs R, Kurasawa A, Kitazato H, Kucera M. changes were made. The images or other third party material in this Vertical niche partitioning between cryptic sibling species of a article are included in the article’s Creative Commons license, unless cosmopolitan marine planktonic protist. Mol Ecol. indicated otherwise in a credit line to the material. If material is not 2012;21:4063–73. included in the article’s Creative Commons license and your intended 18. Lamari N, Ruggiero MV, d’Ippolito G, Kooistra WHCF, Fon- use is not permitted by statutory regulation or exceeds the permitted tana A, Montresor M. Specificity of lipoxygenase pathways use, you will need to obtain permission directly from the copyright supports species delineation in the marine diatom genus Pseudo- holder. To view a copy of this license, visit http://creativecommons. nitzschia. PLoS ONE. 2013;8:e73281. org/licenses/by/4.0/. 19. Škaloud P, Friedl T, Hallmann C, Beck A, Dal Grande F. Taxonomic revision and species delimitation of coccoid green algae currently assigned to the genus Dictyochloropsis (Tre- References bouxiophyceae, Chlorophyta). J Phycol. 2016;52:599–617. 20. de Jesus PB, Costa AL, de Castro Nunes JM, Manghisi A, Genovese G, Morabito M, et al. Species delimitation methods 1. Mayr E. Populations, species, and evolution: an abridgment of reveal cryptic diversity in the Hypnea cornuta complex (Cysto- animal species and evolution. Cambridge: Belknap Press of cloniaceae, Rhodophyta). Eur J Phycol. 2019;54:135–53. Harvard University Press; 1970. 21. Díaz-Tapia P, Ly M, Verbruggen H. Extensive cryptic diversity 2. Bickford D, Lohman DJ, Sodhi NS, Ng PKL, Meier R, Winker in the widely distributed Polysiphonia scopulorum (Rhodome- K, et al. Cryptic species as a window on diversity and con- laceae, Rhodophyta): molecular species delimitation and mor- servation. Trends Ecol Evol. 2007;22:148–55. phometric analyses. Mol Phylogenet Evol. 2020;152:106909. 3. Fišer C, Robinson CT, Malard F. Cryptic species as a window 22. Huson DH, Rupp R, Scornavacca C. Phylogenetic networks. into the paradigm shift of the species concept. Mol Ecol. Cambridge: Cambridge University Press; 2009. 2018;27:613–35. 23. Huson DH, Bryant D. Application of phylogenetic networks in 4. Struck TH, Feder JL, Bendiksby M, Birkeland S, Cerca J, evolutionary studies. Mol Biol Evol. 2006;23:254–67. Gusarov VI, et al. Finding evolutionary processes hidden in 24. Solís-Lemus C, Yang M, Ané C. Inconsistency of species tree cryptic species. Trends Ecol Evol. 2018;33:153–63. methods under gene flow. Syst Biol. 2016;65:843–51. 5. Sarno D, Kooistra WHCF, Medlin LK, Percopo I, Zingone A. 25. Deiner K, Bik HM, Mächler E, Seymour M, Lacoursière-Roussel Diversity in the genus Skeletonema (Bacillariophyceae). II. An A, Altermatt F, et al. Environmental DNA metabarcoding: assessment of the taxonomy of S. costatum-like species with the Transforming how we survey animal and plant communities. description of four new species. J Phycol. 2005;41:151–76. Mol Ecol. 2017;26:5872–95. 6. Gaonkar CC, Kooistra WHCF, Lange CB, Montresor M, Sarno 26. Pawlowski J, Audic S, Adl S, Bass D, Belbahri L, Berney C, D. Two new species in the Chaetoceros socialis complex et al. CBOL protist working group: barcoding eukaryotic rich- (Bacillariophyta): C. sporotruncatus and C. dichatoensis, and ness beyond the animal, plant, and fungal kingdoms. PLoS Biol. characterization of its relatives. J Phycol. 2017;53:889–907. 2012;10:e1001419. 7. Li Y, Boonprakob A, Gaonkar CC, Kooistra WHCF, Lange CB, 27. Trobajo R, Mann DG, Clavero E, Evans KM, Vanormelingen P, Hernández-Becerril D, et al. Diversity in the globally distributed McGregor RC. The use of partial cox1, rbcL and LSU rDNA diatom genus Chaetoceros (Bacillariophyceae): three new spe- sequences for and species identification within the cies from warm-temperate waters. PLoS ONE. 2017;12: Nitzschia palea species complex (Bacillariophyceae). Eur J e0168887. Phycol. 2010;45:413–25. 8. Finlay BJ, Clarke KJ. Ubiquitous dispersal of microbial species. 28. Decelle J, Suzuki N, Mahé F, De Vargas C, Not F. Molecular Nature. 1999;400:828. phylogeny and morphological evolution of the acantharia 9. Finlay BJ, Fenchel T. Divergent perspectives on protist species (Radiolaria). Protist. 2012;163:435–50. richness. Protist. 1999;150:229–33. 29. Stoeck T, Przybos E, Dunthorn M. The D1-D2 region of the 10. Fenchel T, Finlay BJ. The ubiquity of small species: patterns of large subunit ribosomal DNA as barcode for ciliates. Mol Ecol local and global diversity. Bioscience. 2004;54:777. Resour. 2014;14:458–68. 11. Fenchel T. Cosmopolitan microbes and their ‘cryptic’ species. 30. Moniz MBJ, Kaczmarska I. Barcoding of diatoms: nuclear Aquat Microb Ecol. 2005;41:49–54. encoded ITS revisited. Protist. 2010;161:7–34. 12. Miglietta MP, Faucci A, Santini F. Speciation in the sea: over- 31. Gile GH, Stern RF, James ER, Keeling PJ. DNA barcoding of view of the symposium and discussion of future directions. chlorarachniophytes using nucleomorph ITS sequences. J Phy- Integr Comp Biol. 2011;51:449–55. col. 2010;46:743–50. Resolving cryptic species complexes in marine protists: phylogenetic haplotype networks meet global DNA. . . 1941

32. Stern RF, Andersen RA, Jameson I, Küpper FC, Coffroth M-A, 50. Pinseel E, Janssens SB, Verleyen E, Vanormelingen P, Kohler Vaulot D, et al. Evaluating the ribosomal internal transcribed TJ, Biersma EM, et al. Global radiation in a rare biosphere soil spacer (ITS) as a candidate dinoflagellate barcode marker. PLoS diatom. Nat Commun. 2020;11:1–12. ONE. 2012;7:e42780. 51. Hasle GR, Syvertsen EE. Marine diatoms. In: Tomas CR, editor. 33. Saunders GW. Applying DNA barcoding to red macroalgae: a Identifying marine phytoplankton. San Diego: Academic Press; preliminary appraisal holds promise for future applications. 1997. pp 5–385. Philos Trans R Soc B Biol Sci. 2005;360:1879–88. 52. Kooistra WHCF, Sarno D, Hernández-Becerril DU, Assmy P, Di 34. MacGillivary ML, Kaczmarska I. Survey of the efficacy of a Prisco C, Montresor M. Comparative molecular and morpholo- short fragment of the rbcL gene as a supplemental DNA barcode gical phylogenetic analyses of taxa in the Chaetocerotaceae for diatoms. J Eukaryot Microbiol. 2011;58:529–36. (Bacillariophyta). Phycologia. 2010;49:471–500. 35. Zimmermann J, Jahn R, Gemeinholzer B. Barcoding 53. De Luca D, Sarno D, Piredda R, Kooistra WHCF. A multigene diatoms: evaluation of the V4 subregion on the 18S rRNA gene, phylogeny to infer the evolutionary history of Chaetocerotaceae including new primers and protocols. Org Divers Evol. (Bacillariophyta). Mol Phylogenet Evol. 2019;140:106575. 2011;11:173–92. 54. Longhurst AR. Toward and ecological geography of the sea. In: 36. Piredda R, Tomasino MP, D’Erchia AM, Manzari C, Pesole G, Longhurst AR, editor. Ecological geography of the sea. 2nd ed. Montresor M, et al. Diversity and temporal patterns of planktonic Cambridge: Academic Press; 2007. pp 1–17. protist assemblages at a Mediterranean Long Term Ecological 55. Stoeck T, Bass D, Nebel M, Christen R, Jones MDM, Breiner H- Research site. FEMS Microbiol Ecol. 2016;93:fiw200. W, et al. Multiple marker parallel tag environmental DNA 37. Pawlowski J, Lecroq B. Short rDNA barcodes for species sequencing reveals a highly complex eukaryotic community in identification in foraminifera. J Eukaryot Microbiol. marine anoxic water. Mol Ecol. 2010;19:21–31. 2010;57:197–205. 56. Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM. A 38. Mordret S, Piredda R, Vaulot D, Montresor M. Kooistra WHCF, method for studying protistan diversity using massively parallel Sarno D. dinoref: a curated dinoflagellate (Dinophyceae) refer- sequencing of V9 hypervariable regions of small-subunit ribo- ence database for the 18S rRNA gene. Mol Ecol Resour. somal RNA genes. PLoS ONE. 2009;4:e6372. 2018;18:974–87. 57. Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, 39. Gaonkar CC, Piredda R, Minucci C, Mann DG, Montresor M, Hollister EB, et al. Introducing mothur: open-source, platform- Sarno D, et al. Annotated 18S and 28S rDNA reference independent, community-supported software for describing and sequences of taxa in the planktonic diatom family Chaetocer- comparing microbial communities. Appl Environ Microbiol. otaceae. PLoS ONE. 2018;13:e0208929. 2009;75:7537–41. 40. Balzano S, Percopo I, Siano R, Gourvil P, Chanoine M, Marie D, 58. De Vargas C, Audic S, Tara Oceans Consortium C, Tara Oceans et al. Morphological and genetic diversity of Beaufort Sea dia- Expedition P. Total V9 rDNA information organized at the toms with high contributions from the Chaetoceros neogracilis metabarcode level for the Tara Oceans Expedition (2009–12). species complex. J Phycol. 2017;53:161–87. 2017. PANGAEA. https://doi.org/10.1594/PANGAEA.873277. 41. Kopf A, Bicak M, Kottmann R, Schnetzer J, Kostadinov I, 59. Ibarbalz FM, Henry N, Brandão MC, Martini S, Busseni G, Lehmann K, et al. The ocean sampling day consortium. Giga- Byrne H, et al. Global trends in marine plankton diversity across science. 2015;4. https://doi.org/10.1186/s13742-015-0066-5. kingdoms of life. Cell. 2019;179:1084–97. 42. Pesant S, Not F, Picheral M, Kandels-Lewis S, Le Bescot N, 60. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic Gorsky G, et al. Open science resources for the discovery and local alignment search tool. J Mol Biol. 1990;215:403–10. analysis of Tara Oceans data. Sci Data. 2015;2:150023. 61. Katoh K, Rozewicki J, Yamada KD. MAFFT online service: 43. Yau S, Lopes dos Santos A, Eikrem W, Gérikas Ribeiro C, multiple sequence alignment, interactive sequence choice and Gourvil P, Balzano S, et al. Mantoniella beaufortii and Manto- visualization. Brief Bioinform. 2019;20:1160–6. niella baffinensis sp. nov. (Mamiellales, Mamiellophyceae), two 62. Price MN, Dehal PS, Arkin AP. FastTree 2—approximately new green algal species from the high arctic. J Phycol. maximum-likelihood trees for large alignments. PLoS ONE. 2020;56:37–51. 2010;5:e9490. 44. Lopes Dos Santos A, Gourvil P, Tragin M, Noël M-H, Decelle J, 63. Han MV, Zmasek CM. phyloXML: XML for evolutionary Romac S, et al. Diversity and oceanic distribution of prasino- biology and comparative genomics. BMC Bioinform. phytes clade VII, the dominant group of green algae in oceanic 2009;10:356. waters. ISME J. 2017;11:512–28. 64. Templeton AR, Crandall KA, Sing CF. A cladistic analysis of 45. Kuwata A, Yamada K, Ichinomiya M, Yoshikawa S, Tragin M, phenotypic associations with haplotypes inferred from restriction Vaulot D, et al. Bolidophyceae, a sister picoplanktonic group of endonuclease mapping and DNA sequence data. III. diatoms—a review. Front Mar Sci. 2018;5:370. estimation. Genetics. 1992;132:619–33. 46. Segawa T, Matsuzaki R, Takeuchi N, Akiyoshi A, Navarro F, 65. Clement M, Posada D, Crandall KA. TCS: a computer program Sugiyama S, et al. Bipolar dispersal of red-snow algae. Nat to estimate gene genealogies. Mol Ecol. 2000;9:1657–9. Commun. 2018;9:1–8. 66. Leigh JW, Bryant D. popart: full‐feature software for haplotype 47. Ichinomiya M, Dos Santos AL, Gourvil P, Yoshikawa S, Kamiya network construction. Methods Ecol Evol. 2015;6:1110–6. M, Ohki K, et al. Diversity and oceanic distribution of the Par- 67. Nguyen L-T, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: males (Bolidophyceae), a picoplanktonic group closely related to a fast and effective stochastic algorithm for estimating diatoms. ISME J. 2016;10:2419–34. maximum-likelihood phylogenies. Mol Biol Evol. 48. Tragin M, Vaulot D. Novel diversity within marine Mamiello- 2015;32:268–74. phyceae (Chlorophyta) unveiled by metabarcoding. Sci Rep. 68. Kalyaanamoorthy S, Minh BQ, Wong TKF, von Haeseler A, 2019;9:1–14. Jermiin LS. ModelFinder: fast model selection for accurate 49. Morard R, Vollmar NM, Greco M, Kucera M. Unassigned phylogenetic estimates. Nat Methods. 2017;14:587–9. diversity of planktonic foraminifera from environmental 69. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. sequencing revealed as known but neglected species. PLoS MEGA6: molecular evolutionary genetics analysis version 6.0. ONE. 2019;14:e0213936. Mol Biol Evol. 2013;30:2725–9. 1942 D. De Luca et al.

70. Jukes TH, Cantor CR. Evolution of protein molecules. Mamm plankton biogeography shaped by large-scale current systems. Protein Metab. 1969;3:21–132. bioRxiv. 2019. https://doi.org/10.1101/867739. 71. Meyer CP, Paulay G. DNA barcoding: error rates based on 87. Sarno D, Kooistra WHCF, Balzano S, Hargraves PE, Zingone A. comprehensive sampling. PLoS Biol. 2005;3:e422. Diversity in the genus Skeletonema (Bacillariophyceae). III. 72. R Core Team. R: a language and environment for statistical Phylogenetic position and morphological variability of Skeleto- computing. 2019. Vienna, Austria: R Foundation for Statistical nema costatum and Skeletonema grevillei, with the description of Computing; 2019. Skeletonema ardens sp. nov. J Phycol. 2007;43:156–70. 73. McMurdie PJ, Holmes S. phyloseq: an R package for repro- 88. Hasle GR. The biogeography of some marine planktonic dia- ducible interactive analysis and graphics of microbiome census toms. Deep Sea Res Oceanogr Abstr. 1976;23:319–338, IN1- data. PLoS ONE. 2013;8:e61217. IN6. 74. Wickham H. ggplot2: elegant graphics for data analysis. New 89. Pargana A. Functional and molecular diversity of the diatom York: Springer-Verlag; 2016. https://ggplot2.tidyverse.org. family Leptocylindraceae. 2017. PhD Thesis, The Open Uni- 75. Becker A, Wilks AR. Maps: draw geographical maps. 2018. versity, Milton Keynes, UK. https://CRAN.R-project.org/package=maps. 90. Novis PM. Taxonomy of Klebsormidium (Klebsormidiales, 76. Markmann M, Tautz D. Reverse taxonomy: an approach towards Charophyceae) in New Zealand streams and the significance of determining the diversity of meiobenthic organisms based on low-pH habitats. Phycologia. 2006;45:293–301. ribosomal RNA signature sequences. Philos Trans R Soc Lond B 91. Rindi F, Guiry MD, López-Bautista JM. Distribution, morphol- Biol Sci. 2005;360:1917–24. ogy, and phylogeny of Klebsormidium (Klebsormidiales, Char- 77. López-Escardó D, Paps J, de Vargas C, Massana R, Ruiz-Trillo I, ophyceae) in urban environments in Europe. J Phycol. Del Campo J. Metabarcoding analysis on European coastal 2008;44:1529–40. samples reveals new molecular metazoan diversity. Sci Rep. 92. Rindi F, Mikhailyuk TI, Sluiman HJ, Friedl T, López-Bautista 2018;8:9106. JM. Phylogenetic relationships in Interfilum and Klebsormidium 78. Álvarez I, Wendel JF. Ribosomal ITS sequences and plant (Klebsormidiophyceae, Streptophyta). Mol Phylogenet Evol. phylogenetic inference. Mol Phylogenet Evol. 2003;29:417–34. 2011;58:218–31. 79. Alverson AJ, Kolnick L. Intragenomic nucleotide polymorphism 93. Škaloud P, Rindi F. Ecological differentiation of cryptic species among small subunit (18S) rDNA paralogs in the diatom genus within an asexual protist morphospecies: a case study of fila- Skeletonema (Bacillariophyta). J Phycol. 2005;41:1248–57. mentous green alga Klebsormidium (Streptophyta). J Eukaryot 80. Gaonkar CC, Piredda R, Sarno D, Zingone A, Montresor M, Microbiol. 2013;60:350–62. Kooistra WHCF. Species detection and delineation in the marine 94. Baas Becking LGM. Geobiologie of Inleiding tot de Mili- planktonic diatoms Chaetoceros and Bacteriastrum through eukunde. The Hague: Van Stockum & Zoon; 1934). metabarcoding: making biological sense of haplotype diversity. 95. Shapiro BJ, Leducq J-B, Mallet J. What is speciation? PLoS Environ Microbiol. 2020;22:1917–29. Genet. 2016;12:e1005860. 81. Cleve PT. Pelagisk Diatomeer från Kattegat. In: Petersen CGJ, 96. Godhe A, Rynearson T. The role of intraspecific variation in the editor. Det Videnskabelige Udbytte af Kanonbaaden ‘Hauchs’ ecological and evolutionary success of diatoms in changing Togter i de Danske Have Indefor Skagen, I. Aarene 1883–86. environments. Philos Trans R Soc Lond B Biol Sci. Kjøbenhavn: Andr. Fred. Høst & Sons Forlag; 1889. pp 53–56. 2017;372:20160399. 82. Gran HH. Den Norske Nordhaus-Expedition 1876-1878. Bota- 97. de Vargas C, Norris R, Zaninetti L, Gibb SW, Pawlowski J. nik, Protophyta: Diatomaceae, Silicoflagellata og Cilioflagellata. Molecular evidence of cryptic speciation in planktonic for- Christiania: Grøndal & Søns; 1897. aminifers and their relation to oceanic provinces. Proc Natl Acad 83. De Luca D, Kooistra WHCF, Sarno D, Gaonkar CC, Piredda R. Sci USA. 1999;96:2864–8. Global distribution and diversity of Chaetoceros (Bacillar- 98. Amato A, Kooistra WHCF, Levialdi Ghiron JH, Mann DG, iophyta, Mediophyceae): integration of classical and novel stra- Pröschold T, Montresor M. Reproductive isolation among sym- tegies. PeerJ. 2019;7:e7410. patric cryptic species in marine diatoms. Protist. 84. Wang J, Wu J. Occurrence and potential risks of harmful algal 2007;158:193–207. blooms in the East China Sea. Sci Total Environ. 99. Weisse T. Distribution and diversity of aquatic protists: an 2009;407:4012–21. evolutionary and ecological perspective. Biodivers Conserv. 85. Zhen Y, Mi T, Yu Z. Detection of several harmful algal species 2007;17:243–59. by sandwich hybridization integrated with a nuclease protection 100. Vanelslander B, Créach V, Vanormelingen P, Ernst A, Chepurnov assay. Harmful Algae. 2009;8:651–7. VA, Sahan E, et al. Ecological differentiation between sympatric 86. Richter DJ, Watteaux R, Vannier T, Leconte J, Frémont P, pseudocryptic species in the estuarine benthic diatom Navicula Reygondeau G, et al. Genomic evidence for global ocean phyllepta (Bacillariophyceae). J Phycol. 2009;45:1278–89.