Insights Into Global Planktonic Diatom Diversity: Comparisons Between Phylogenetically Meaningful Units That Account for Time
bioRxiv preprint doi: https://doi.org/10.1101/167809; this version posted July 24, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Teofil Nakov, Jeremy M. Beaulieu, and Andrew J. Alverson
Department of Biological Sciences, University of Arkansas, 1 University of Arkansas, SCEN 601, Fayetteville, AR 72701

Abstract

Metabarcoding has offered unprecedented insights into microbial diversity. In many studies, short DNA sequences are binned into consecutively higher Linnaean ranks, and ranked groups (e.g., genera) are the units of biodiversity analyses. These analyses assume that Linnaean ranks are biologically meaningful and that identically ranked groups are comparable. We used a metabarcode dataset for marine planktonic diatoms to illustrate the limits of this approach. We found that the 20 most abundant marine planktonic diatom genera ranged in age from 4 to 134 million years, indicating the non-equivalence of genera because some had more time to diversify than others. Still, species richness was only weakly correlated with genus age, highlighting variation in rates of speciation and/or extinction. Taxonomic classifications often do not reflect phylogeny, so genus-level analyses can include phylogenetically nested genera, further confounding rank-based analyses. These results underscore the indispensable role of phylogeny in understanding patterns of microbial diversity.

Keywords: diversification, metabarcoding, microbes, phylogeny

Preprint submitted to bioRxiv July 24, 2017

With potentially millions of species occupying all the world's aquatic and terrestrial biomes, microbial species diversity is notoriously difficult to discover and catalog. Traditional approaches to species discovery are time- and labor-intensive, and they miss species that cannot be cultivated in the lab [1]. The phylogenetic diversity of this undiscovered "microbial dark matter" is often characterized through community DNA sequencing of barcode genes. A typical workflow includes DNA extraction from an environmental sample, PCR amplification of a barcode fragment, and high-throughput DNA sequencing of the amplicon. Sequencing reads are clustered into operational taxonomic units (OTUs) that are subsequently binned into consecutively higher taxonomic ranks, and these ranked groups, in turn, are often the focus of biodiversity assessments [2].

Linnaean names and ranks are often taken to mean more than what they are: arbitrary taxon delimitations disconnected from evolutionary history. The treatment of named ranks as anything other than arbitrary implies that identically ranked groups are somehow comparable, encouraging comparisons of their ecology, biogeography, and species richness [3, 4, 5]. The only meaningful comparisons involve groups with comparable evolutionary histories [6]. In this sense, monophyletic groups (clades) are more likely to be biologically cohesive units, and they should have comparable species richness if they are similar in age and have diversified at similar rates [7].
Comparison of monophyletic groups, while accounting for time, provides a robust framework for detecting clades with exceptional species richness and comparing their functional, ecological, or biogeographic breadth [8].

The Tara Oceans Project sequenced 18S-V9 metabarcode fragments from plankton samples to characterize microbial communities and species richness across the world's oceans [9]. A total of 20 diatom genera accounted for nearly 99% of all diatom sequencing reads, and these genera were found to differ in relative abundance, cell size, habitat preference, geographical distribution, and species richness [2]. However, it was unclear whether these patterns deviated from expectations.

We focused our analyses on the genus-based patterns of species richness and expected that older genera would be more species rich because they have had a longer period of time to diversify [7]. We used a time-calibrated phylogenetic tree of 1,151 diatoms [10] to calculate expected species richness for the 20 most abundant marine diatom genera in the Tara Oceans survey [2]. Given the crown age of diatoms [10], relative extinction (i.e., extinction/speciation) estimated from Cenozoic fossil species [11], and a minimum approximation of total described and undescribed diatom diversity (30,000 species; [12]), we calculated a net diversification rate for diatoms (i.e., speciation minus extinction). We then used this rate to calculate the upper and lower bounds of expected OTU richness for the 20 focal diatom genera [8].

The 20 diatom genera ranged in age from 4 to 134 million years (My).
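The expected-richness calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the widely used method-of-moments birth-death approach for crown groups (of the kind described by Magallón and Sanderson, 2001), and that this corresponds to the procedure cited as [8] is itself an assumption. The function names are hypothetical. The diatom crown age and the relative extinction value used below are placeholders, since neither is stated in this section; only the 30,000-species total and the 134 My maximum genus age come from the text.

```python
import math


def net_diversification_crown(n_species, crown_age, eps):
    """Method-of-moments net diversification rate (speciation - extinction)
    for a crown group with n_species extant species, given relative
    extinction eps = extinction / speciation."""
    n = n_species
    inner = (0.5 * n * (1 - eps**2) + 2 * eps
             + 0.5 * (1 - eps) * math.sqrt(n * (n * eps**2 - 8 * eps + 2 * n * eps + n)))
    return (math.log(inner) - math.log(2)) / crown_age


def prob_at_least(k, r, t, eps):
    """P(N(t) >= k) for a clade that starts as two lineages at its crown
    node, conditional on the clade surviving to the present."""
    growth = math.exp(r * t)
    beta = (growth - 1) / (growth - eps)
    alpha = eps * beta  # probability that a single lineage is extinct by time t
    numerator = k * (1 - alpha - beta + alpha * beta) + alpha + 2 * beta - 1
    return beta ** (k - 2) * numerator / (1 + alpha)


def richness_quantile(p_tail, r, t, eps, k_max=10**9):
    """Smallest integer k with P(N(t) >= k) <= p_tail, found by bisection
    (the tail probability is non-increasing in k)."""
    lo, hi = 1, k_max
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(mid, r, t, eps) <= p_tail:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Placeholder inputs (assumptions for illustration, NOT values from the paper).
DIATOM_CROWN_AGE_MY = 200.0   # hypothetical diatom crown age
RELATIVE_EXTINCTION = 0.9     # hypothetical extinction/speciation ratio
# Values taken from the text.
TOTAL_DIATOM_SPECIES = 30_000  # minimum estimate of total diatom diversity
GENUS_CROWN_AGE_MY = 134.0     # age of the oldest focal genus

rate = net_diversification_crown(TOTAL_DIATOM_SPECIES, DIATOM_CROWN_AGE_MY,
                                 RELATIVE_EXTINCTION)
lower = richness_quantile(0.975, rate, GENUS_CROWN_AGE_MY, RELATIVE_EXTINCTION)
upper = richness_quantile(0.025, rate, GENUS_CROWN_AGE_MY, RELATIVE_EXTINCTION)
print(f"net diversification rate: {rate:.4f} per My")
print(f"95% expected richness at {GENUS_CROWN_AGE_MY:.0f} My: {lower} to {upper}")
```

Because the crown age and relative extinction are placeholders, the printed bounds should not be expected to match those reported for any genus. A genus whose observed OTU count falls outside the interval computed with the real inputs would be a candidate for unusually fast or slow diversification.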
OTU richness was only weakly correlated with clade age (r = 0.36, 95% CI = -0.1 to 0.7, df = 18, P = 0.12), with 12 of the 20 genera falling within expectations for OTU diversity given their age (Figure 1). The most abundant and OTU-rich genus, Chaetoceros, was also the oldest (Figure 1a). The birth-death diversification model predicted that Chaetoceros diversity should range between 47 and 6567 species; the Tara Oceans dataset recovered 644 OTUs, consistent with expectations for a clade of this age (Figure 1b). Some of the most diverse genera identified by metabarcoding (e.g., Corethron and Pseudo-nitzschia) had OTU richness estimates that exceeded expectations. Assuming OTUs correspond to species and that our estimates of clade age are not heavily biased, these genera have either exceptionally high speciation or low extinction rates. Identifying the drivers of these patterns might offer new mechanistic insights into phytoplankton diversification. Comparisons between OTU richness (Figure 1b) and accepted taxonomic names from AlgaeBase (Figure 1c) showed expected discrepancies for lineages with substantial diversity in benthic or freshwater habitats (e.g., Navicula; Figure 1b, B and F annotations; Figure 1c, blue bars) and were also useful in highlighting clades that are underdescribed at the species level (Figure 1c, green bars).

[Figure 1 image not reproduced. Panels: (a) crown age (Ma bp); (b) Tara swarms; (c) AlgaeBase taxa. Genera shown, top to bottom: Chaetoceros, Rhizosolenia, Navicula, Leptocylindrus, Coscinodiscus, Actinocyclus, Synedra, Thalassiosira, Guinardia, Fragilariopsis, Proboscia, Pseudo-nitzschia, Pleurosigma, Corethron, Planktoniella, Eucampia, Haslea, Attheya, Minidiscus, Minutocellus. Legend entries: polar centric, radial centric, raphid pennate, araphid pennate; * not monophyletic; B, many benthic; F, many freshwater; underdescribed; underestimated.]

Figure 1: Age and estimated taxon richness of the 20 most abundant marine planktonic diatom genera identified by the Tara Oceans metabarcode project [2]. Crown ages and uncertainty (grey bars) were estimated from 1000 bootstrap phylogenies [10] (a). Taxon richness was estimated from the number of OTU swarms in the Tara Oceans dataset (b) and the number of accepted species names in AlgaeBase [13] (c); lines delimit 95% confidence intervals of expected richness given the crown age of a clade, empirical extinction fraction, and diatom-wide estimate of the net diversification rate.

Metabarcoding identified Thalassiosira as one of the most abundant, OTU-rich, and geographically widespread marine planktonic diatom genera. A total of eight Thalassiosirales genera were detected in the Tara Oceans project (Cyclotella, Lauderia, Minidiscus, Planktoniella, Porosira, Shionodiscus, Skeletonema, and Thalassiosira), and these genera ranged