Fingerprinting Marine Macrophytes in Blue Carbon Habitats

Thesis by

Alejandra Ortega

In Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy in Bioscience

King Abdullah University of Science and Technology

Thuwal, Kingdom of Saudi Arabia

November, 2019

EXAMINATION COMMITTEE PAGE

The thesis of Alejandra Ortega is approved by the examination committee.

Committee Chairperson and Thesis Supervisor: Prof. Carlos M. Duarte

Committee Members: Prof. Mark Tester, Prof. Takashi Gojobori, and

Prof. Hugo de Boer [External]

© November, 2019

Alejandra Ortega

All Rights Reserved

ABSTRACT

Fingerprinting Marine Macrophytes in Blue Carbon Habitats

Alejandra Ortega

Seagrasses, mangroves, saltmarshes and macroalgae, the coastal vegetated habitats, offer a promising nature-based solution for climate change mitigation, as they sequester carbon in their living biomass and in marine sediments. Estimating the contribution of macrophyte organic carbon to coastal sediments is key to understanding the sources of blue carbon sequestration and to establishing adequate conservation strategies. Nevertheless, identification of marine macrophytes has been challenging and current estimates are uncertain. In this dissertation, time- and cost-efficient DNA-based methods were used to fingerprint marine macrophytes and estimate their contribution to the organic pool accumulated in blue carbon habitats. First, a suitable short-length DNA barcode from the universal 18S gene was chosen among six barcoding regions tested, as it successfully recovered degraded DNA from sediment samples and fingerprinted marine macrophyte taxa. Second, an experiment was performed to test whether the abundance of eDNA represents the content of organic carbon within the macrophytes; results supported this notion, indicating a positive correlation (R² = 0.85) between eDNA and organic carbon.

Third, using the chosen barcode, eDNA of marine macrophytes was identified from sediments of seagrass meadows and mangrove forests in the Arabian Red Sea, to further estimate contributions to the organic carbon pools. Estimates based on eDNA were compared against estimates of organic carbon based on stable isotope analyses from the same sediments; results from both methods were similar. In addition, this research provided the first quantitative evidence of the contribution of macroalgae to coastal and oceanic carbon pools. Hitherto, macroalgae have been ignored in blue carbon assessments because their fingerprinting was challenging and there was no evidence of their carbon export. The results of this dissertation demonstrate that eDNA offers unprecedented taxonomic discrimination and resolves the contribution of marine macrophytes to the organic pools in blue carbon sediments.

ACKNOWLEDGEMENTS

This doctoral research is a collaborative effort between KAUST and other institutions; I would like to thank my co-authors for their contribution to this work, especially Sarah Bachmann Øberg, Marlene Wesselmann, Intikhab Alam and Allan A. Kamau. I extend this gratitude to lab support specialists Jasmine Raja Mohamed Sait and Mongi Ennasri.

My profound appreciation goes to Nathan R. Geraldi and Rubén Díaz Rua; their guidance was limitless. This appreciation also goes to Daniel Binham for his unconditional friendship and editing skills. Finally, I thank Prof. Carlos M. Duarte for offering and advising me on this research.

TABLE OF CONTENTS

EXAMINATION COMMITTEE PAGE
COPYRIGHT PAGE
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
Dissertation synopsis
  Introduction
  Dissertation Outcome
  Publications
  References

Part I: Fingerprinting blue carbon in coastal ecosystems
Chapter 1: A DNA mini-barcode for marine macrophytes
  1.1 Abstract
  1.2 Introduction
  1.3 Methods
  1.4 Results
  1.5 Discussion
  1.6 Acknowledgements
  1.7 Supplementary Information
  1.8 eDNA extraction protocol
  1.9 References

Chapter 2: Environmental DNA estimates marine macrophyte contribution to blue carbon sediments
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
  2.4 Results and discussion
  2.5 Acknowledgements
  2.6 Supplementary Information
  2.7 Supplementary Results
  2.8 References

Part II: Fingerprinting blue carbon in the open ocean
Chapter 3: Important contribution of macroalgae to oceanic carbon sequestration
  3.1 Abstract
  3.2 Main
  3.3 Tracing macroalgae
  3.4 Macroalgae diversity in the ocean
  3.5 Export of macroalgae throughout the water column
  3.6 Implications for blue carbon assessments
  3.7 Methods
  3.8 Acknowledgements
  3.9 Supplementary Information
  3.10 References

LIST OF ABBREVIATIONS

18S rDNA     18S ribosomal DNA
δ13C         delta carbon 13
δ15N         delta nitrogen 15
μl           microliter
μm           micrometer
μmol         micromolar
ANOVA        Analysis of variance
BLASTp       Basic local alignment search tool for proteins
BOLD         Barcode of life database
bp           Base pair
CBOL         Consortium for the barcode of life
CI           Chloroform-isoamyl alcohol
CO2          Carbon dioxide
COI          Cytochrome C oxidase subunit I gene
Corg         Organic carbon
CTD          Conductivity, temperature, and depth instrument
DADA2        Divisive amplicon denoising algorithm 2
df           Degrees of freedom
DMAP         Dragon metagenomics analysis platform
DOC          Dissolved organic carbon
eDNA         Environmental DNA
F            Pseudo-F statistic
IO           Indian Ocean
ITS2         Internal transcribed spacer 2 gene
KEGG         Kyoto encyclopedia of genes and genomes
LPA          Linear polyacrylamide
matK         Maturase K plastid gene
Mg C         Megagram of carbon
MS           Mediterranean Sea
NaCl         Sodium chloride
NAO          North Atlantic Ocean
NaOH         Sodium hydroxide
nMDS         Non-metric multidimensional scaling ordination
NPO          North Pacific Ocean
OTU          Operational taxonomic units
PCR          Polymerase chain reaction
PEG          Polyethylene glycol
PERMANOVA    Permutational multivariate analysis of variance
PERMDISP     Analysis of homogeneity of multivariate dispersion
Pg C         Petagram of carbon
PO4          Phosphate
POC          Particulate organic carbon
POM          Particulate organic matter
rbcL         Ribulose bisphosphate carboxylase large subunit gene
RPM          Reads per million
RS           Red Sea
SAO          South Atlantic Ocean
SCG          Single-copy protein-encoding genes
SE           Standard error
SIMM         Stable isotope mixing model
SO           Southern Ocean
SPO          South Pacific Ocean
TE           Tris-Ethylenediaminetetraacetic acid
Tg C         Teragram of carbon
Tm           Melting temperature
trn          Intergenic non-transcribed spacer
tuf          Factor Tu gene

LIST OF FIGURES

Chapter 1
Figure 1 Visual methods for DNA barcoding
Figure 2 Phylogenetic tree of the Euka02 primer
Supplementary Figure 1 Phylogenetic tree of the 18S2 primer

Chapter 2
Figure 1 Eukaryotic taxonomic composition in blue carbon habitats
Figure 2 nMDS comparing macrophyte composition
Figure 3 Contribution of macrophytes to blue carbon habitats
Figure 4 eDNA correlation to organic carbon
Supplementary Figure 1 Comparison between intracellular and extracellular eDNA
Supplementary Figure 2 eDNA recovery through time
Supplementary Figure 3 Correlation between eDNA, DNA and organic carbon

Chapter 3
Figure 1 Assemblage of macroalgae in the ocean
Figure 2 Export of macroalgae to the deep and open ocean
Figure 3 Oceanic export of macroalgal DNA per order
Supplementary Figure 1 Rhodophyta SCG phylogeny
Supplementary Figure 2 Geographical location of sampling points

LIST OF TABLES

Chapter 1
Table 1 List of primer pairs tested for DNA barcoding
Table 2 List of species tested for DNA barcoding
Table 3 Amplification performance of the primers
Table 4 Summary of assignment for taxonomic rank
Table 5 Summary of assignment for taxonomic rank
Supplementary Table 1 Intracellular and extracellular eukaryotic

Chapter 2
Table 1 List of marine macrophytes fingerprinted in blue carbon habitats
Supplementary Table 1 List of macroalgae taxa found in Red Sea coastal habitats
Supplementary Table 2 Diversity indices of marine macrophytes
Supplementary Table 3 Composition of seagrass species
Supplementary Table 4 Macrophyte treatments pools
Supplementary Table 5 Taxonomic fingerprinting of eDNA from macrophytes

Chapter 3
Table 1 Relative abundance of macroalgal eDNA
Supplementary Table 1 Macroalgal diversity indices by oceanic region
Supplementary Table 2 PERMANOVA comparing macroalgal assemblage
Supplementary Table 3 Macroalgal diversity indices by depth
Supplementary Table 4 Macroalgal catalog for unique genes

DISSERTATION SYNOPSIS

Introduction

Marine macrophytes (macroalgae and marine angiosperms) are important carbon reserves, both as living biomass and as deposits of organic material trapped within their sediments1,2. These habitats form the most intense carbon sink in the ocean, known as blue carbon3-5. Restoration and protection of these natural carbon sinks are key strategies for mitigating climate change and slowing down the increase of atmospheric CO2 5,6.

Existing blue carbon assessments estimate that marine angiosperms (seagrasses, mangroves and saltmarshes) contribute about 50% of the carbon accumulated within their sediments5,7-9. However, there are no estimates of the contribution of macroalgae to this carbon pool, even though macroalgae form the most extensive and productive vegetated coastal habitat9-11. In contrast to marine angiosperms, there is no clear evidence of the fate of macroalgae once they are exported from their coastal habitats; hence, most assessments have ignored contributions of macroalgae to the oceanic carbon cycle11.

Recent studies highlight the potential role of macroalgae as an allochthonous carbon source, hypothesizing that large amounts of macroalgae are exported from the coast to the open ocean and to sediments of marine angiosperms9,12-14. Although there is no quantitative evidence of this export, those studies call for the inclusion of macroalgae in assessments of blue carbon budgets.

The most common method to identify sources of organic matter in blue carbon pools is the analysis of δ13C and δ15N stable isotopes2,8,15-18. However, this method cannot discriminate between lineages of primary producers that share similar isotopic signatures19. For instance, the very diverse lineages of macroalgae (Rhodophyta, Chlorophyta and Phaeophyta) present diverse metabolic pathways for carbon fixation; thus, macroalgal isotopic signatures extend along a continuum that overlaps with those of seagrasses and mangroves, leading to high uncertainty in identifying marine macrophyte contributors to organic carbon pools and in estimating their contributions20,21.
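The effect of overlapping signatures can be illustrated with a two-source linear mixing calculation. This is only a minimal sketch: it is a simplification of the stable isotope mixing models (SIMM) used in this field, the function name `source_fraction` is hypothetical, and all δ13C values are placeholders chosen for illustration, not measurements from this dissertation.

```python
def source_fraction(delta_mix, delta_a, delta_b):
    """Fraction of source A in a two-source mixture, from one isotope ratio.

    Solves delta_mix = f * delta_a + (1 - f) * delta_b for f.
    """
    if delta_a == delta_b:
        raise ValueError("sources share the same signature; f is undetermined")
    return (delta_mix - delta_b) / (delta_a - delta_b)


# Hypothetical d13C values (per mil), for illustration only.
# Well-separated sources: 1 per mil of error in the mixture moves f by 0.05.
separated_shift = abs(
    source_fraction(-20.0, -10.0, -30.0) - source_fraction(-21.0, -10.0, -30.0)
)

# Nearly overlapping sources: the same 1 per mil error moves f by 0.5.
overlapping_shift = abs(
    source_fraction(-15.0, -14.0, -16.0) - source_fraction(-16.0, -14.0, -16.0)
)

print(separated_shift, overlapping_shift)
```

As the difference between the two source signatures shrinks, the denominator approaches zero and small measurement errors translate into large errors in the estimated contribution, which is the uncertainty described above.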

Alternatively, DNA barcoding can identify species by using a short sequence of DNA from a standard gene that is common among many taxa22-25. This technique is useful for taxonomic identification and census of species, replacing classical identification based on anatomical features and visual differences among specimens22. Molecular-based methods such as DNA barcoding, combined with analysis of environmental DNA (eDNA), have the potential to identify marine macrophyte taxa in environmental samples (e.g. water column or coastal sediments)26,27. eDNA refers to the DNA shed by organisms that can be collected from their environment. eDNA is used to track rare and invasive species, to assess fish stocks and to perform routine biodiversity monitoring, replacing traditional methods that required capture and disturbance of the organisms27-29.

DNA barcoding of marine macrophytes (and plants in general) is a challenging task. In contrast to metazoans, there is no universal barcode for macrophytes, which implies the use of two or even three genes that have not been tested in all lineages26,30-35. Moreover, even when a combination of two or three genes is used, universality and species discrimination are limited to land plants; other groups such as marine angiosperms and macroalgae are not easily recovered32,36,37. Two challenges must be overcome in order to fingerprint marine macrophytes using DNA barcoding. First, the chosen barcoding region must be suitable for a wide range of taxa, including marine angiosperms and the three lineages of macroalgae. Second, the chosen primers need to amplify a short DNA region (a mini-barcode of 100-200 bp), so that fragmented DNA can be recovered from sediment samples. Hitherto, we are aware of two eDNA studies fingerprinting marine macrophytes in coastal vegetated sediments; however, no barcode has successfully discriminated a broad range of marine macrophytes. One study fingerprinted only marine angiosperms38, while the other focused only on macroalgae39.

In this doctoral dissertation I developed and used molecular-based methods to fingerprint eDNA contributions of marine macrophytes to blue carbon ecosystems, by correlating macrophyte eDNA abundance with organic carbon content. Furthermore, these eDNA-based estimates were compared against estimates of organic carbon based on stable isotope analyses. The research questions are addressed in two parts and three chapters.

Part I focuses on fingerprinting blue carbon in coastal ecosystems. In Chapter I, my co-authors and I addressed the DNA barcoding challenges and found a barcode suitable for identification of seagrasses, mangroves, macroalgae and land plants from degraded DNA recovered from sediment samples. In addition, we created a DNA reference library for Red Sea marine macrophytes.

In Chapter II, we experimentally correlated the abundance of macrophyte eDNA with known amounts of organic carbon added to a sediment mixture. We found a significant positive correlation between eDNA abundance and organic carbon content per marine macrophyte. Thus, we used our eDNA mini-barcode to fingerprint marine macrophytes and estimate their contribution to organic carbon pools in seagrass and mangrove sediments in the Arabian Red Sea. Contributions of marine macrophytes to these organic carbon pools were also estimated using the traditional δ13C and δ15N stable isotope analyses; eDNA-based estimates resembled stable isotope-based estimates. Hence, we demonstrate that eDNA is an unparalleled method for assessing blue carbon stocks.
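The kind of correlation described above can be sketched as a squared Pearson coefficient between organic carbon added and eDNA recovered. This is an illustrative sketch only: the function name `r_squared` is hypothetical, the input values are placeholders rather than data from this dissertation, and the statistical procedure actually used in Chapter II may differ.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Placeholder values, for illustration only (not data from this work):
# organic carbon added per treatment and eDNA reads recovered.
organic_carbon_mg = [1.0, 2.0, 3.0, 4.0, 5.0]
edna_reads = [120, 190, 340, 390, 520]

print(round(r_squared(organic_carbon_mg, edna_reads), 2))
```

A value close to 1 indicates that eDNA abundance tracks organic carbon content almost linearly; a value close to 0 indicates no linear relationship.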

Part II focuses on fingerprinting blue carbon in the open ocean. In Chapter III, taking advantage of metagenomes generated by two global ocean expeditions (Tara Oceans and Malaspina Circumnavigation), we fingerprinted eDNA from water samples and traced coastal export of macroalgae to the global open and deep ocean. This chapter provided the first quantitative evidence of macroalgal export and highlighted the role of macroalgae in oceanic carbon sequestration.

Dissertation Outcome

This research makes three main original contributions to science. First, we provided a DNA mini-barcode from the 18S universal gene that distinguishes photosynthetic eukaryotes from diverse phylogenetic origins and recovers degraded DNA from environmental samples. Moreover, a universal barcode such as 18S resolves most marine eukaryotic biodiversity in a single eDNA sample, allowing a parsimonious characterization of sediment communities including metazoans and primary producers.

Second, using this barcode, we demonstrate that eDNA analyses allow quantitative inferences on the relative contribution of marine macrophytes to blue carbon stocks. Thus, we advocate for inclusion of eDNA in current and future plans to manage blue carbon ecosystems. The eDNA approach not only resembles inferences based on traditional stable isotope analyses, but also provides a significant improvement in discriminating between marine macrophyte taxa; such a level of discrimination (i.e. species level) between organic carbon contributors had not been achieved in any blue carbon assessment to date. Environmental DNA is a cost- and time-effective approach that provides essential data to support and improve blue carbon strategies.

Third, we provide evidence of the important role of macroalgae as contributors to blue carbon stocks in coastal Red Sea sediments and in the global open ocean. Hitherto, no study had included macroalgae in a blue carbon assessment. Our findings demonstrate that macroalgae are exported from their coastal habitats and are an important allochthonous carbon source both in adjacent coastal sediments of seagrasses and mangroves, where macroalgae are the second-largest contributors to the organic carbon pool, and in the open and deep ocean, where macroalgae are ubiquitously present down to 4,000 m depth and up to 4,860 km away from the nearest shoreline.

Publications

- Part I, Chapter I. A DNA mini-barcode for marine macrophytes. Under review in Molecular Ecology Resources.

- Part I, Chapter II. Environmental DNA estimates marine macrophyte contribution to Blue Carbon sediments. Article submitted to Limnology & Oceanography.

- Part II, Chapter III. Important contribution of macroalgae to oceanic carbon sequestration. Published in Nature Geoscience. DOI: 10.1038/s41561-019-0421-8.

Additional publication:

Geraldi, Ortega et al. 2019. Fingerprinting blue carbon: rationale and tools to determine the source of organic carbon in marine depositional environments. Front. Mar. Sci. 6:263.

References

1 Donato, D. C. et al. Mangroves among the most carbon-rich forests in the tropics. Nat. Geosci. 4, 293-297 (2011).

2 Duarte, C. M., Kennedy, H., Marbà, N. & Hendriks, I. Assessing the capacity of seagrass meadows for carbon burial: current limitations and future strategies. Ocean Coast. Manage. 83, 32-38 (2013).

3 Duarte, C. M., Losada, I. J., Hendriks, I. E., Mazarrasa, I. & Marbà, N. The role of coastal plant communities for climate change mitigation and adaptation. Nature Climate Change 3, 961 (2013).

4 Duarte, C. M. et al. Seagrass community metabolism: Assessing the carbon sink capacity of seagrass meadows. Global Biogeochem. Cycles 24 (2010).

5 McLeod, E. et al. A blueprint for blue carbon: toward an improved understanding of the role of vegetated coastal habitats in sequestering CO2. Front. Ecol. Environ. 9, 552-560 (2011).

6 Herr, D. & Landis, E. Coastal blue carbon ecosystems. Opportunities for Nationally Determined Contributions. Policy Brief. Gland, Switzerland: IUCN. Washington, DC: TNC (2016).

7 Duarte, C. M. & Krause-Jensen, D. Export from seagrass meadows contributes to marine carbon sequestration. Front. Mar. Sci. 4, 13 (2017).

8 Kennedy, H. et al. Seagrass sediments as a global carbon sink: Isotopic constraints. Global Biogeochem. Cycles 24 (2010).

9 Krause-Jensen, D. & Duarte, C. M. Substantial role of macroalgae in marine carbon sequestration. Nat. Geosci. 9, 737-742 (2016).

10 Duarte, C. M., Middelburg, J. J. & Caraco, N. Major role of marine vegetation on the oceanic carbon cycle. Biogeosciences 1, 659-679 (2004).

11 Krause-Jensen, D. et al. Sequestration of macroalgal carbon: the elephant in the Blue Carbon room. Biol. Lett. 14, 20180236 (2018).

12 Dartnall, A. J. in Biogeography and ecology in Tasmania 171-194 (Springer, 1974).

13 Garden, C. J. & Smith, A. M. Voyages of seaweeds: The role of macroalgae in sediment transport. Sediment. Geol. 318, 1-9 (2015).

14 Garden, C. J., Currie, K., Fraser, C. I. & Waters, J. M. Rafting dispersal constrained by an oceanographic boundary. Mar. Ecol. Prog. Ser. 501, 297-302 (2014).

15 Bugalho, M. N., Barcia, P., Caldeira, M. C. & Cerdeira, J. O. Stable isotopes as ecological tracers: an efficient method for assessing the contribution of multiple sources to mixtures. Biogeosci. Disc. 5, 2425-2444 (2008).

16 Fourqurean, J. W. & Schrlau, J. E. Changes in nutrient content and stable isotope ratios of C and N during decomposition of seagrasses and mangrove leaves along a nutrient availability gradient in Florida Bay, USA. Chem. Ecol. 19, 373-390 (2003).

17 Geraldi, N. R. et al. Fingerprinting blue carbon: Rationale and tools to determine the source of organic carbon in marine depositional environments. Front. Mar. Sci. 6, 263 (2019).

18 Diefendorf, A. F., Mueller, K. E., Wing, S. L., Koch, P. L. & Freeman, K. H. Global patterns in leaf 13C discrimination and implications for studies of past and future climate. Proc. Natl. Acad. Sci. U. S. A. 107, 5738-5743 (2010).

19 Farquhar, G. D., Ehleringer, J. R. & Hubick, K. T. Carbon isotope discrimination and photosynthesis. Annu. Rev. Plant Biol. 40, 503-537 (1989).

20 Almahasheer, H. et al. Low Carbon sink capacity of Red Sea mangroves. Sci. Rep. 7, 9700 (2017).

21 Serrano, O., Almahasheer, H., Duarte, C. M. & Irigoien, X. Carbon stocks and accumulation rates in Red Sea seagrass meadows. Sci. Rep. 8, 15037 (2018).

22 Hebert, P. D. N., Cywinska, A. & Ball, S. L. Biological identifications through DNA barcodes. Proc. R. Soc. Lond., Ser. B: Biol. Sci. 270, 313-321 (2003).

23 Kress, W. J., Wurdack, K. J., Zimmer, E. A., Weigt, L. A. & Janzen, D. H. Use of DNA barcodes to identify flowering plants. Proc. Natl. Acad. Sci. U. S. A. 102, 8369-8374 (2005).

24 Newmaster, S. G., Fazekas, A. J. & Ragupathy, S. DNA barcoding in land plants: evaluation of rbcL in a multigene tiered approach. Botany 84, 335-341 (2006).

25 Hebert, P. D. N., Gregory, T. R. & Savolainen, V. The promise of DNA barcoding for taxonomy. Syst. Biol. 54, 852-859 (2005).

26 Hebert, P. D. N., Ratnasingham, S. & de Waard, J. R. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc. R. Soc. Lond., Ser. B: Biol. Sci. 270, S96-S99 (2003).

27 Evans, N. T. et al. Quantification of mesocosm fish and amphibian species diversity via environmental DNA metabarcoding. Mol. Ecol. Resour. 16, 29-41 (2016).

28 Murray, D. C. et al. DNA-Based Faecal Dietary Analysis: A Comparison of qPCR and High Throughput Sequencing Approaches. PLoS One 6, e25776 (2011).

29 Pilliod, D. S., Goldberg, C. S., Arkle, R. S., Waits, L. P. & Richardson, J. Estimating occupancy and abundance of stream amphibians using environmental DNA from filtered water samples. Can. J. Fish. Aquat. Sci. 70, 1123-1130 (2013).

30 Hollingsworth, P. M., Graham, S. W. & Little, D. P. Choosing and using a plant DNA barcode. PLoS One 6, e19254 (2011).

31 Pennisi, E. Wanted: a barcode for plants. Science 318, 190 (2007).

32 CBOL, P. W. G. A DNA barcode for land plants. Proc. Natl. Acad. Sci. U. S. A. 106, 12794-12797 (2009).

33 Kress, W. J. & Erickson, D. L. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS One 2, e508 (2007).

34 Chase, M. W. et al. A proposal for a standardised protocol to barcode all land plants. Taxon 56, 295-299 (2007).

35 Rubinoff, D., Cameron, S. & Will, K. Are plant DNA barcodes a search for the Holy Grail? Trends Ecol. Evol. 21, 1-2 (2006).

36 Kuo, L.-Y., Li, F.-W., Chiou, W.-L. & Wang, C.-N. First insights into fern matK phylogeny. Mol. Phylogenet. Evol. 59, 556-566 (2011).

37 De Groot, G. A. et al. Use of rbcL and trnL-F as a two-locus DNA barcode for identification of NW-European ferns: an ecological perspective. PLoS One 6, e16371 (2011).

38 Reef, R. et al. Using eDNA to determine the source of organic carbon in seagrass meadows. Limnol. Oceanogr. 62, 1254-1265 (2017).

39 Queirós, A. M. et al. Connected macroalgal-sediment systems: blue carbon and food webs in the deep coastal ocean. Ecol. Monogr. 89, e01366 (2019).

PART I

Fingerprinting Blue Carbon in Coastal Ecosystems

Chapter 1

A DNA mini-barcode for marine macrophytes

This chapter is under review as an article in Molecular Ecology Resources. The co-authors are Alejandra Ortega, Nathan R. Geraldi, Rubén Díaz Rua, Sarah Bachmann Øberg, Marlene Wesselmann, Dorte Krause-Jensen, and Carlos M. Duarte. Author contributions: AO and CMD conceived the research. AO, RDR, SBØ, and MW conducted the experiments. MW collected and identified Cyprus seagrasses. DKJ provided macroalgae from North Atlantic/Arctic. AO collected and identified macrophytes from the Red Sea, Cyprus, and Finland. AO and NG analyzed the data and all coauthors discussed the results. AO wrote the manuscript with input from all authors, who approved the submission.

1.1 Abstract

DNA identification of marine macrophytes is challenging, as there is no universal barcode and these taxa are poorly represented in DNA databases. Thus, few studies focus on metabarcoding of marine macrophytes from environmental samples. Here, we searched for a short barcode able to fingerprint marine macrophytes from tissue and from coastal sediments; with this mini-barcode, we created a DNA reference library.

Identification of seagrass, mangrove and marine macroalgae (Chlorophyta, Rhodophyta and Phaeophyceae) was tested using 18 primer pairs from six barcoding genes: the recommended plant barcodes rbcL, matK and trnL, plus the genes ITS2, COI and 18S.

Barcoding based on two primers from the 18S gene showed the highest universality among marine macrophytes, amplifying 95-100% of samples; amplification performance of the other barcodes was limited. A phylogeny-based approach was used to assign taxonomy, clustering the sequences with available references from databases. Macrophytes were accurately identified at the level of phylum (88%), order (76%), genus (71%) and species (23%). Nevertheless, identification at genus or species level can be improved by including more reference sequences, hitherto unavailable. Out of 86 macrophytes, only 41 and 13 had a reference sequence at genus or species level, respectively. Using the macrophyte DNA library we created, we identified 21 marine macrophyte species in sediment samples, along with 1,149 other marine eukaryotes. We recommend this 18S mini-barcode to fingerprint both marine macrophytes and other eukaryotes in a single environmental sample such as coastal sediments; this barcode is also suitable for assessment of gut contents from marine herbivores.

1.2 Introduction

Marine macrophytes, especially macroalgae, form highly productive ecosystems, with a global net primary production of about 2 Pg C year⁻¹ spread across 4 million km² in coastal areas 1. Marine macrophytes export most of their net production to the global ocean and seafloor 2-5. Exported macrophyte biomass plays important roles in marine ecosystems, such as contributing to carbon sequestration in coastal and deep-sea sediments 5, and supporting food webs beyond the macrophyte habitat 6. However, tracing the export of marine macrophytes has proved a challenge 7. Several techniques have been used to determine the provenance of primary producers in studies of marine sediments, food webs and diet analyses based on gut content. Among these, stable isotopes 13C and 15N in bulk biomass are commonly used 8,9. However, isotopic signals often overlap among primary producers, failing to provide conclusive fingerprinting of each primary producer 7-10.

Environmental DNA (eDNA) metabarcoding has great potential for the identification of marine macrophytes, and can improve understanding of carbon sequestration in marine sediments. Sampling and sequencing of eDNA, the DNA left by organisms in their surroundings, is increasingly used to assess and monitor marine biodiversity 11-14. However, marine macrophytes are underrepresented in eDNA assessments of marine environments, where metazoans are the main targets. We are aware of only one attempt to trace marine macrophytes: a pioneering study used eDNA to resolve seagrass, mangrove, saltmarsh and macroalgae in sediment samples, yet identification of macroalgae failed 15. Another study reported marine biodiversity from eDNA using ToL-metabarcoding, and although four photosynthetic eukaryote phyla were recovered, the results did not provide deeper differentiation among major macrophyte lineages 13.

Some lineages in the animal kingdom have a universal barcode for which primers are available; other lineages in the same kingdom or beyond do not have the benefit of a universal primer. This is the case for photosynthetic eukaryotes, including marine macrophytes, which present a huge genetic divergence and lack barcoding universality 16.

Eukaryota has eight major groups 17. Marine macrophytes belong to two of these groups: plants, red algae and green algae comprise several clades of Archaeplastida, and brown algae are a whole clade in the Heterokont group. In contrast, Metazoa, the target of most eDNA studies to date, is only a single clade within the Opisthokont group 17. Thus, barcode development for photosynthetic eukaryotes, and particularly for understudied groups such as marine macrophytes, is more complex than the development of barcodes for animals 16,18-30.

The Plant Working Group of the Consortium for the Barcode of Life (CBOL) suggests the use of a conserved locus combined with a faster-evolving gene region to accurately identify plants 16. Nevertheless, many of the available barcodes and primers are designed for some angiosperms and are not recoverable in other photosynthetic eukaryotes such as marine macrophytes, and particularly macroalgae 16,18-22,24-34. With the exception of some COI primer combinations that can identify several groups within Rhodophyta 35, regions such as trnH-psbA, trnL, matK, and rbcL have been tested in selected taxa of macroalgae with little PCR or sequencing success 36-40.

The limitations on marine macrophyte differentiation are mainly due to the limited scientific interest in studying these groups, reflected in the insufficient availability of macrophyte DNA resources such as barcodes and sequence libraries. Furthermore, although the quality of eDNA depends on several abiotic factors 41, eDNA quality is generally lower than the quality of DNA isolated directly from organisms 42,43; thus, metabarcoding from environmental samples relies mainly on fragmented eDNA 44. The average size of amplicons from most available plant barcodes is over 500 bp, whereas much shorter barcodes (100-300 bp) are required for amplification of fragmented eDNA 42-44.

Nevertheless, taxonomic resolution decreases as barcodes become shorter. This reduction may be an acceptable trade-off for environmental applications requiring biological assessment of broad taxonomic categories, where classification of marine macrophytes at the level of major lineages would suffice. For our purpose, differentiating primary producers in coastal sediments, identification of marine macrophytes at family, order or even phylum level is an acceptable trade-off, as these levels of taxonomic identification have never been achieved with traditional methods such as stable isotopes.
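The amplicon-length criterion discussed above can be expressed as a simple screen over the lengths listed in Table 1 of this chapter. This is a minimal sketch: the function name `mini_barcode_candidates` and its default bounds (taken from the 100-300 bp range quoted above) are illustrative assumptions, and passing this screen says nothing about amplification success, which was assessed experimentally.

```python
# Amplicon lengths (bp) as listed in Table 1 of this chapter.
AMPLICON_LENGTH_BP = {
    "18S-V4": 400, "18S1": 399, "18S2": 130, "Euka02 (18S)": 133,
    "rbcL": 146, "minirbcL F52-rbcLB": 350,
    "sgmatK1": 136, "sgmatK2": 142, "mgmatK1": 130, "mgmatK2": 128,
    "COI GF1-GR1": 740, "COI GF1-DR1": 740,
    "COI GF2-GR1": 740, "COI GF2-DR1": 740,
    "ITS2": 162, "trnL c-d": 456, "trnL g-h": 40, "trnL bryo_P6-h": 200,
}


def mini_barcode_candidates(lengths, min_bp=100, max_bp=300):
    """Return primer pairs whose amplicon length lies within [min_bp, max_bp]."""
    return sorted(name for name, bp in lengths.items() if min_bp <= bp <= max_bp)


candidates = mini_barcode_candidates(AMPLICON_LENGTH_BP)
for name in candidates:
    print(name, AMPLICON_LENGTH_BP[name])
```

Under these bounds the long COI and 18S-V4 amplicons are excluded, as is the very short trnL g-h region; tightening `max_bp` to 150 would keep only the shortest pairs.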

The lack of DNA barcodes and sequence libraries for marine macrophytes, and the need for mini-barcodes, restrict the application of eDNA analysis to fingerprint marine macrophytes in sediment samples 7,15. In this study, we searched for barcodes and primers able to fingerprint seagrasses, mangroves and red, brown and green macroalgae (Rhodophyta, Phaeophyceae and Chlorophyta, respectively); we created a DNA reference library for macrophytes, and we optimized the recovery of these macrophytes with the chosen barcode from eDNA collected in coastal sediments. We assessed both barcode universality and barcode specificity for each macrophyte group. The tested barcodes include the CBOL suggested regions rbcL 15,42, matK 16, trnL 21,44, ITS2 34 and COI 35; and the 18S gene, widely known for its universality across eukaryotes 45-47.

1.3 Methods

Primer selection

We tested the amplification of barcodes rbcL, matK, trnL, COI, ITS2 and 18S using 18 primer pairs (hereafter called primers) on DNA extracted from tissue of macroalgae, mangroves, seagrasses, and terrestrial plants (including ferns, gymnosperms and angiosperms). Primers were tested one at a time, moving on to a new primer or barcode until desirable results were obtained. Plastid regions rbcL, matK and trnL are candidates recommended by the CBOL Plant Working Group 16. rbcL and matK genes present high levels of species discrimination in angiosperms 15,28, and rbcL has also been used for identification of marine angiosperms in sediment samples 15. Although the trnL intron and the nuclear internal transcribed spacer ITS2 have low phylogenetic resolution in closely related samples, these regions have been used to identify bryophytes and gymnosperms 21. Furthermore, these barcodes perform better than others on degraded plant samples 21,34,44.

The initial aim was to find a specific barcode for each marine macrophyte group, but macroalgae amplification was challenging. Since the proposed combinations of plastid genes were less efficient beyond angiosperms 32, and since our needs do not require high taxonomic resolution and can be fulfilled with identification even at phylum level, we included the universal barcodes from the mitochondrial COI and the nuclear ribosomal 18S genes. Both regions have successfully identified some macroalgae groups, although with primers that amplify long fragments 35,39.

Of the 18 primer combinations tested, five were designed in this study, while the rest were selected from the literature (Table 1). The EcoPrimers software was used with default parameters to design the primers 48, scanning against sequences of the rbcL and matK regions in Rhodophyta, Chlorophyta, Phaeophyceae, seagrasses and mangroves available in the BOLD database (http://v3.boldsystems.org). The primers targeted regions shorter than 150 bp, following a mini-barcode approach 42.


Table 1. Primer pairs tested for identifying marine macrophytes. Primers amplify ITS2, 18S, matK, COI, rbcL, and trnL DNA regions.

| Primer pair | Primer sequence (5'-3') | Length (bp) | Tm (°C) | Reference |
|---|---|---|---|---|
| 18S-V4 | F - CCAGCA(G/C)C(C/T)GCGGTAATTCC; R - ACTTTCGTTCTTGAT(C/T)(A/G)A | 400 | 57-48 | Stoeck et al. 2010 |
| 18S1 | F - CCAGCASCYGCGGTAATTCC; R - ACTTTCGTTCTTGATYRA | 399 | 58 | This paper |
| 18S2 | F - CCAGCASCYGCGGTAATTCC; R - CCTTCYGCAGGTTCACCTA | 130 | 58 | This paper |
| Euka02 (18S) | F - TTTGTCTGSTTAATTSCG; R - CACAGACCTGTTATTGC | 133 | 53 | Guardiola et al. 2015 |
| rbcL | F - GCGGGTGTTAAAGAGTACAA; R - AGTAGAAGATTCGGCAGCTA | 146 | 59 | This paper |
| minirbcL F52-rbcLB | F - GTTGGATTCAAAGCTGGTGTTA; R - AACCYTCTTCAAAAAGGTC | 350 | 53 | Little et al. 2014, Reef et al. 2017 |
| sgmatK1 | F - TCCCGGATTCCAGATGTTCC; R - AGGAACCGAAAGAGTCTTGGA | 136 | 58 | This paper |
| sgmatK2 | F - TGTTACCTTGTTCGTTTTTGGCA; R - CTCCGGACTGCCAAAGGATT | 142 | 58 | This paper |
| mgmatK1 | F - ACTGCTTGTGAAACGGTTAA; R - ACCCCTCCGATATGATTTGA | 130 | 58 | This paper |
| mgmatK2 | F - CGCCAGTGGATTCTATGTTT; R - TCGCTCTTTTGATTTCGGAA | 128 | 58 | This paper |
| COI GF1-GR1 | F - TCAACAAATCATAAAGATATTGG; R - ACTTCTGGATGTCCAAAAAAYCA | 740 | 60 | Saunders 2005 |
| COI GF1-DR1 | F - TCAACAAATCATAAAGATATTGG; R - AAAAAYCARAATAAATGTTGA | 740 | 50 | Saunders 2005 |
| COI GF2-GR1 | F - CCAACCAYAAAGATATWGGTAC; R - ACTTCTGGATGTCCAAAAAAYCA | 740 | 58 | Saunders 2005 |
| COI GF2-DR1 | F - CCAACCAYAAAGATATWGGTAC; R - AAAAAYCARAATAAATGTTGA | 740 | 50 | Saunders 2005 |
| ITS2 | F - ATGCGATACTTGGTGTGAAT; R - GACGCTTCTCCAGACTACAAT | 162 | 56 | Yao et al. 2010 |
| trnL c-d | F - CGAAATCGGTAGACGCTACG; R - GGGGATAGAGGGACTTGAAC | 456 | 55 | Taberlet et al. 2006 |
| trnL g-h | F - GGGCAATCCTGAGCCAA; R - CCATTGAGTCTCTGCACCTATC | 40 | 55 | Taberlet et al. 2006 |
| trnL bryo_P6 -h | F - GATTCAGGGAAACTTAGGTTG; R - CCATTGAGTCTCTGCACCTATC | 200 | 55 | Epp et al. 2012, Taberlet et al. 2006 |
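Table 1 writes degenerate positions in two ways: bracketed alternatives such as (G/C) for 18S-V4, and single IUPAC ambiguity letters such as S or Y for the other primers. The following minimal Python sketch converts the bracketed form to IUPAC letters and counts how many distinct oligonucleotides a degenerate primer stands for. The function names are illustrative and are not part of any tool used in this study; only the two-base ambiguity codes needed for Table 1 are handled.

```python
import re

# Standard IUPAC two-base ambiguity codes.
IUPAC_TWO_BASE = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def brackets_to_iupac(primer):
    """Rewrite '(G/C)'-style positions as single IUPAC letters (illustrative helper)."""
    def replace(match):
        bases = frozenset(match.group(1).split("/"))
        return IUPAC_TWO_BASE[bases]

    return re.sub(r"\(([ACGT]/[ACGT])\)", replace, primer)


def degeneracy(primer):
    """Number of distinct sequences represented by a primer with two-base codes."""
    two_base_letters = set(IUPAC_TWO_BASE.values())
    return 2 ** sum(1 for base in primer if base in two_base_letters)


if __name__ == "__main__":
    forward_18s_v4 = "CCAGCA(G/C)C(C/T)GCGGTAATTCC"  # 18S-V4 forward primer from Table 1
    converted = brackets_to_iupac(forward_18s_v4)
    print(converted, degeneracy(converted))
```

Under this conversion, the 18S-V4 sequences as printed in Table 1 become identical to those listed for 18S1, which is useful to keep in mind when comparing the two rows.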


Sample collection

To create a DNA reference database, tissue of 126 macrophytes and terrestrial plants was collected along the Arabian coast of the Red Sea, in Cyprus, in Finland, and in Greenland. Macrophytes included the lineages of interest for this research: terrestrial and marine angiosperms (seagrasses and mangroves) and macroalgae (Rhodophyta, Chlorophyta, and Phaeophyceae). Seven gymnosperms and two ferns were included to test the universality of the primers for macrophytes (Table 2). Seagrasses, mangroves and other Embryophyta were identified using general botanical knowledge, while macroalgae were identified following guides 49-51, and AlgaeBase 52 when species were not available in the taxonomic keys. Individual macrophytes were collected in plastic bags; samples were frozen or dried until DNA extraction.

Sediment samples were collected in five seagrass meadows and five mangrove forests along the Arabian coast of the Red Sea (28°-18° N, 35°-41° E). These coastal habitats host the seagrass species Halodule uninervis, Halophila stipulacea, H. decipiens, H. ovalis, Thalassodendron ciliatum and Thalassia hemprichii, and the mangrove Avicennia marina. Sediments were collected using gloves and bleach-sterilized 50 ml tubes, placed into sterilized bags, kept cold and protected from the light, then frozen at -80 °C until eDNA extraction.


Table 2. Species used for testing barcodes. Questionable identification is denoted as “?” for some species. Abbreviations: NA, North Atlantic; E., Embryophyta.

| Lineage | Species | Location | Lineage | Species | Location |
|---|---|---|---|---|---|
| Chlorophyta | brownii | Red Sea | Phaeophyta | Halidrys siliquosa | NA (Arctic) |
| Chlorophyta | Caulerpa racemosa var. cylindracea | Red Sea | Phaeophyta | Laminaria digitata | NA (Arctic) |
| Chlorophyta | Caulerpa racemosa var. lamourouxii? | Red Sea | Phaeophyta | Laminaria solidungula | NA (Arctic) |
| Chlorophyta | Caulerpa serrulata | Red Sea | Phaeophyta | Padina boergesenii? | Red Sea |
| Chlorophyta | Caulerpa serrulata f. spiralis | Red Sea | Phaeophyta | Padina boryana? | Red Sea |
| Chlorophyta | Caulerpa sertularioides | Red Sea | Phaeophyta | Padina sp. | Red Sea |
| Chlorophyta | Caulerpa taxifolia | Red Sea | Phaeophyta | Punctaria glacialis | NA (Arctic) |
| Chlorophyta | Chaetomorpha linum | NA (Arctic) | Phaeophyta | Punctaria plantaginea | NA (Arctic) |
| Chlorophyta | Codium fragile | NA (Arctic) | Phaeophyta | Pylaiella varia | NA (Arctic) |
| Chlorophyta | Dictyosphaeria cavernosa | Red Sea | Phaeophyta | Saccharina latissima | NA (Arctic) |
| Chlorophyta | Enteromorpha prolifera | NA (Arctic) | Phaeophyta | Saccharina nigripes | NA (Arctic) |
| Chlorophyta | Halimeda discoidea? | Red Sea | Phaeophyta | Sargassum aspe | Red Sea |
| Chlorophyta | Halimeda incrassata | Red Sea | Phaeophyta | Sargassum ilicifolium? | Red Sea |
| Chlorophyta | Halimeda macroloba? | Red Sea | Phaeophyta | Sargassum mutium? | Red Sea |
| Chlorophyta | Halimeda sp. | Red Sea | Phaeophyta | Sargassum natans? | Red Sea |
| Chlorophyta | Ulva lactuca | NA (Arctic) | Phaeophyta | Scytosiphon complanatus | NA (Arctic) |
| Chlorophyta | Ulva? sp. | Red Sea | Phaeophyta | Stictyosiphon tortilis | NA (Arctic) |
| Chlorophyta | aegagropila | Red Sea | Phaeophyta | Turbinaria ornata var. serrata | Red Sea |
| Chlorophyta | Valonia ventricosa | Red Sea | Rhodophyta | Chondrophycus? papillosus? | Red Sea |
| Phaeophyta | Agarum clathratum | NA (Arctic) | Rhodophyta | Chondrus crispus | NA (Arctic) |
| Phaeophyta | Alaria esculenta | NA (Arctic) | Rhodophyta | Coccotylus truncatus | NA (Arctic) |
| Phaeophyta | Ascophyllum nodosum | NA (Arctic) | Rhodophyta | | NA (Arctic) |
| Phaeophyta | Chaetopteris plumosa | NA (Arctic) | Rhodophyta | Dilsea carnosa | NA (Arctic) |
| Phaeophyta | Cystoseira trinodis | Red Sea | Rhodophyta | Furcellaria lumbricalis | NA (Arctic) |
| Phaeophyta | Desmarestia aculeata | NA (Arctic) | Rhodophyta | Gracilaria sp. | Red Sea |
| Phaeophyta | Dictyosiphon foeniculaceus | NA (Arctic) | Rhodophyta | Hypnea cornuta | Red Sea |
| Phaeophyta | Dictyota crispata? | Red Sea | Rhodophyta | Hypnea sp. | Red Sea |
| Phaeophyta | Dictyota humifusa? | Red Sea | Rhodophyta | Jania pumila | Red Sea |
| Phaeophyta | Fucus evanescens | NA (Arctic) | Rhodophyta | Laurencia mcdermidiae | Red Sea |
| Phaeophyta | Fucus serratus | NA (Arctic) | Rhodophyta | Laurencia mcdermidiae? | Red Sea |
| Phaeophyta | Fucus vesiculosus | NA (Arctic) | Rhodophyta | Laurencia sp1. | Red Sea |
| Rhodophyta | Laurencia? sp2. | Red Sea | E. (Angiosperm) | Aloe vera | Red Sea |
| Rhodophyta | Laurencia? sp3. | Red Sea | E. (Angiosperm) | Callistemon viminalis | Red Sea |
| Rhodophyta | Lithothamnion? sp. | Red Sea | E. (Angiosperm) | Caryota mitis | Red Sea |
| Rhodophyta | alata | NA (Arctic) | E. (Angiosperm) | Chamaerops humilis | Red Sea |
| Rhodophyta | Odonthalia dentata | NA (Arctic) | E. (Angiosperm) | Cocos nucifera | Red Sea |
| Rhodophyta | Palmaria palmata | NA (Arctic) | E. (Angiosperm) | Cordia subcordata | Red Sea |
| Rhodophyta | Phycodrys rubens | NA (Arctic) | E. (Angiosperm) | Cymbopogon sp. | Red Sea |
| Rhodophyta | Phyllophora pseudoceranoides | NA (Arctic) | E. (Angiosperm) | Erythrina crista-galli | Red Sea |
| Rhodophyta | Polysiphonia arctica | NA (Arctic) | E. (Angiosperm) | Erythrina indica | Red Sea |
| Rhodophyta | Polysiphonia nigrescens | NA (Arctic) | E. (Angiosperm) | Eucalyptus sp. | Cyprus |
| Rhodophyta | Rhodomela confervoides | NA (Arctic) | E. (Angiosperm) | Ficus nitida | Red Sea |
| Rhodophyta | Turnerella pennyi | NA (Arctic) | E. (Angiosperm) | Halocnemum strobilaceum | Red Sea |
| Rhodophyta | Unknown sp5. (Rhodophyta) | Red Sea | E. (Angiosperm) | Nerium oleander | Red Sea |
| Rhodophyta | Unknown sp6. (Rhodophyta) | Red Sea | E. (Angiosperm) | Olea europaea | Cyprus |
| Macroalgae | Unknown sp1. | Red Sea | E. (Angiosperm) | Phoenix canariensis | Red Sea |
| Macroalgae | Unknown sp2. | Red Sea | E. (Angiosperm) | Pistacia lentiscus | Red Sea |
| Macroalgae | Unknown sp3. | Red Sea | E. (Angiosperm) | Plumeria rubra | Red Sea |
| Macroalgae | Unknown sp4. | Red Sea | E. (Angiosperm) | Punica granatum | Cyprus |
| E. (Mangrove) | Avicennia marina | Red Sea | E. (Angiosperm) | Ravenala madagascariensis | Red Sea |
| E. (Mangrove) | Rhizophora mucronata | Red Sea | E. (Angiosperm) | Suaeda monoica | Red Sea |
| E. (Seagrass) | Cymodocea nodosa | Cyprus | E. (Angiosperm) | Thevetia peruviana | Red Sea |
| E. (Seagrass) | Cymodocea rotundata | Red Sea | E. (Angiosperm) | Vitis sp. | Cyprus |
| E. (Seagrass) | Cymodocea serrulata | Red Sea | E. (Angiosperm) | Zizyphus jujuba | Red Sea |
| E. (Seagrass) | Enhalus acoroides | Red Sea | E. (Other) | Araucaria sp. | Cyprus |
| E. (Seagrass) | Halodule uninervis | Red Sea | E. (Other) | Callitris sp. | Cyprus |
| E. (Seagrass) | Halophila decipiens | Red Sea | E. (Other) | Cupressus sempervirens | Cyprus |
| E. (Seagrass) | Halophila ovalis | Red Sea | E. (Other) | Cycas revoluta | Red Sea |
| E. (Seagrass) | Halophila stipulacea | Red Sea | E. (Other) | Cyperus sp. | Red Sea |
| E. (Seagrass) | Posidonia oceanica | Cyprus | E. (Other) | Pinus sylvestris | Finland |
| E. (Seagrass) | Thalassia hemprichii | Red Sea | E. (Other) | Pinus sp. | Finland |
| E. (Seagrass) | Thalassodendron ciliatum | Red Sea | E. (Other) | Polypodium vulgare | Finland |
| E. (Angiosperm) | Adenium obesum | Red Sea | E. (Other) | Pteridium aquilinum | Finland |

DNA extraction

To remove epiphytes growing on marine macrophytes (seagrasses, mangroves and macroalgae), the tissue surface was cleaned with a scalpel and sterilized by applying bleach and ethanol for a fixed time 53. However, we noticed after sterilization that the procedure should be adjusted to each sample, as the bleach removed pigments and has been shown to remove DNA 54. For instance, green algae (e.g. Ulva lactuca) should be bleached for a shorter time (1 min) than brown algae (e.g. Turbinaria ornata), whose cell walls are more resistant to bleach sterilization. Tissue was homogenized by grinding with a pestle in a mortar with liquid nitrogen. Genomic DNA was extracted from 100 mg of homogenized tissue using the NucleoSpin 96 Plant II kit (Macherey-Nagel) according to the manufacturer's protocol.

Intracellular and extracellular environmental DNA from each of the 10 sediments was extracted using chloroform:isoamyl alcohol, following the method developed by Lever et al. 55. A step-by-step protocol of this method and some tips are provided in the Supplementary Information.

DNA amplification and sequencing

DNA extraction and PCR preparation were performed in separate areas of the laboratory and with separate material. The PCR station was UV-sterilized before each use. The annealing temperature of each primer was estimated by running a gradient PCR in triplicate (40-60 °C; optimal Tm values are reported in Table 1). PCR cocktails for each primer were prepared in five replicates of 10 μl reactions, by adding 5 μl of QIAGEN Multiplex PCR Master Mix, 1 μl of each primer (1.0 μM), 1 μl of DNA template, and 2 μl of PCR water. The thermocycler program for primer 18S-V4 was as described in Stoeck et al. 46, while for the other primers it was: 15 min at 95 °C, followed by 35 cycles of 30 s at 94 °C, 45 s at the Tm of each primer (see Table 1) and 90 s at 72 °C, and a final extension at 72 °C for 10 min. eDNA extracts were diluted at 1:1, 1:10, 1:50 or 1:100 before PCR to improve amplification performance; these dilutions reduced PCR inhibitors.
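The reaction composition above can be checked and scaled with a few lines of arithmetic. The Python sketch below assumes that "1 μl of each primer" refers to the forward and the reverse primer of a pair, which is the reading under which the listed volumes add up to 10 μl. The 10% pipetting overage is a hypothetical default and is not taken from the protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text.
# Assumption: "each primer" means the forward and the reverse primer.
REACTION_UL = {
    "QIAGEN Multiplex PCR Master Mix": 5.0,
    "forward primer (1.0 uM)": 1.0,
    "reverse primer (1.0 uM)": 1.0,
    "DNA template": 1.0,
    "PCR water": 2.0,
}


def reaction_volume_ul():
    """Total volume of a single reaction."""
    return sum(REACTION_UL.values())


def final_primer_conc_um(stock_um=1.0, primer_ul=1.0):
    """Final concentration of one primer after dilution into the reaction."""
    return stock_um * primer_ul / reaction_volume_ul()


def master_mix(n_reactions, overage=0.10):
    """Volumes to pool for n reactions; the template is added per tube.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {
        name: round(volume * factor, 2)
        for name, volume in REACTION_UL.items()
        if name != "DNA template"
    }


if __name__ == "__main__":
    print(reaction_volume_ul())
    print(final_primer_conc_um())
    print(master_mix(5))
```

With these volumes, each primer ends up at one tenth of its stock concentration in the final reaction.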

Amplification of each of the 18 primers was initially tested on a subset of 10-20 species including all lineages of marine macrophytes; additional species were added according to the amplification performance of the candidate primers (Table 3, Figure 1). Blanks for DNA/eDNA and PCR were run for each primer. The mangrove Avicennia marina was included as a PCR positive control in the initial tests for all primers; when mangrove DNA was not available, other macrophytes whose amplification had been confirmed were used as controls. PCR products were visualized on a 2% agarose gel. Successful amplicons from the best two primer candidates were excised from the gel and purified with the Wizard SV Gel and PCR Clean-up purification kit (Promega). DNA concentration was measured with a Qubit 2.0 Fluorometer (Invitrogen).

Since most of the PCR amplicons of one of the primer candidates were used up during initial sequencing tests that failed, we could not create a DNA reference library with an equal number of species for the two candidates. Thus, eDNA sequencing was performed with only one primer. DNA from tissue samples and eDNA from sediments were sequenced, separately, using the Illumina MiSeq platform. Amplicons were cleaned up and indexed following the Illumina Metagenomic Sequencing Library Preparation protocol (Illumina).

Before opting for MiSeq sequencing of the macrophyte tissue, we initially sequenced DNA using a Sanger platform, first on non-sterilized and then on sterilized samples. However, those sequencing attempts failed, as most of the sequences had multiple nucleotides per site, most likely due to interference from epiphytes growing on the surface of the marine macrophytes.

Sequencing analyses and clustering

Illumina MiSeq sequencing outputs were demultiplexed using the Illumina protocol, and primers were trimmed using Cutadapt 1.17 with default settings 56. FASTQ files were analyzed following the default DADA2 pipeline version 1.8.0 57 in RStudio version 1.1.453 58 to generate sequence variants. DADA2 inspects the read quality, preprocesses the data by filtering and trimming the sequences (default parameters, truncLen=c(110,110)), learns the error rates, eliminates redundant reads, merges paired reads, removes chimeras and then provides an amplicon sequence variant table.
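Two of the preprocessing steps listed above, truncation to a fixed length and removal of redundant reads, are simple enough to illustrate directly. The Python sketch below is only a conceptual illustration and is not the DADA2 implementation, which runs in R and additionally filters on quality and models sequencing errors. The function names are hypothetical; the truncation length of 110 is the value reported in the text.

```python
from collections import Counter


def truncate_reads(reads, trunc_len=110):
    """Drop reads shorter than trunc_len and cut the rest to that length."""
    return [read[:trunc_len] for read in reads if len(read) >= trunc_len]


def dereplicate(reads):
    """Collapse identical reads into (sequence, abundance), most abundant first."""
    return Counter(reads).most_common()


if __name__ == "__main__":
    # Toy reads, invented for illustration only.
    toy_reads = ["A" * 120, "A" * 110, "C" * 100, "A" * 109 + "G"]
    for sequence, abundance in dereplicate(truncate_reads(toy_reads)):
        print(len(sequence), abundance)
```

After truncation, reads that differed only beyond position 110 become identical and are counted together, which is why truncation and dereplication are usually considered jointly.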

DNA sequences from macrophyte tissue were compared against the SILVA database using the assignTaxonomy function in DADA2, but the taxonomic assignment was unsuccessful. Assigning species to sequences using reference databases depends on the availability of sequence libraries, yet the taxonomy of most macroalgae is undescribed, and we found that the reference databases were incomplete and could not identify the species we sequenced. As the traditional assignment techniques preclude identification of most species of macroalgae, we used a phylogeny-based approach for taxonomic assignment. The five most abundant MiSeq sequence variants per photosynthetic eukaryote were aligned (MUSCLE) along with reference sequences from a custom database of 102 macrophytes from similar species, genera, or orders. This database was based on the SILVA 132 SSURef_Nr99_tax reference database (http://www.arb-silva.de), but included only sequences that amplified in silico using the virtualPCR function with default parameters from the insect package 59 in R. The sequences were clustered in GENEIOUS version 11.1.4 (Biomatters Ltd.) with a PhyML tree using the TN93 model and 1,000 replicates 60. Phylogenetic trees were edited in FigTree 61. Among the five most abundant sequence variants, those that clustered near the expected reference taxon, or the closest available (genus, family, order), were taken as representative of the species sampled, and a DNA reference library of these macrophytes was created. Using this DNA reference library and the SILVA database, taxonomy was assigned to the eDNA sequences using the assignTaxonomy function in DADA2.
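The idea behind restricting the reference database to sequences that amplify in silico can be sketched as follows. This Python example is not the virtualPCR function used in the study; it is a simplified stand-in that requires exact matches to the primers (allowing IUPAC ambiguity codes) and returns the region between them. The template sequence is invented for illustration, while the primer sequences are those listed for Euka02 in Table 1.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement that also handles IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer)


def in_silico_amplicon(template, forward, reverse):
    """Return the sequence between exact primer matches, or None if absent."""
    pattern = (
        primer_regex(forward)
        + "(.*?)"
        + primer_regex(reverse_complement(reverse))
    )
    match = re.search(pattern, template)
    return match.group(1) if match else None


if __name__ == "__main__":
    euka02_f = "TTTGTCTGSTTAATTSCG"
    euka02_r = "CACAGACCTGTTATTGC"
    # Hypothetical template: flank + forward site + insert + reverse site + flank.
    template = "AAAA" + "TTTGTCTGCTTAATTGCG" + "ACGTACGT" + "GCAATAACAGGTCTGTG" + "CCCC"
    print(in_silico_amplicon(template, euka02_f, euka02_r))
```

Dedicated tools generally tolerate a limited number of primer mismatches, so this exact-match version would retain fewer reference sequences than a production workflow.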


Figure 1. Schematic representation of the steps followed to choose a suitable DNA barcode to distinguish major marine macrophyte lineages.

1.4 Results

Primer suitability

As none of the barcodes were lineage-specific, primer selection was based on the ability of the primers to amplify the majority of species across a wide range of macrophytes. Amplification of the 18 primers was initially tested on 10-20 species, including red, green, and brown macroalgae, and land and marine angiosperms (Table 2). According to the amplification performance, the primers that amplified all lineages were tested a second time on a broader range of samples (33-119 species; Table 3).

The nuclear ribosomal 18S barcode performed the best. The four 18S primers amplified across all macrophyte lineages (Table 3). However, two primers (18S1 and 18S-V4) amplified only 30-67% of the macrophytes, and an even lower percentage of macroalgae. Conversely, primers 18S2 (V9 region) and Euka02 (V7 region) amplified well in all lineages (95-100%, 88-119 species; Table 3). Hence, 18S2 and Euka02 were chosen for the sequencing tests.

Performance of barcodes from the plastid regions trnL, matK and rbcL was limited for macroalgae (Table 3). All primers from trnL and three of the four matK primers amplified angiosperms but failed to amplify macroalgae; overall amplification was between 21 and 29% of the species tested. The remaining matK primer (sgmatK2) amplified all lineages (85%, 35 out of 41 species), although macroalgae amplification was limited to 57% of the species tested (8 out of 14 species). One of the rbcL primers amplified in all lineages but did not work in all the samples (60%, 9 out of 15 samples), while the minirbcL primer did not amplify land angiosperms (47% of the other macrophytes, 14 out of 30 samples; Table 3).


Table 3. Amplification performance of primer pairs for each lineage of photosynthetic eukaryotes. The number of samples varied among primers and lineages. Percentages represent the species (not the total number of samples) that amplified out of the total number of species tested. Not all amplicons were sequenced. 'Land other' includes coniferous trees, ferns, and a cycad species.

| Primer pair | Total | Macroalgae | Mangrove | Seagrass | Land angiosperms | Land other |
|---|---|---|---|---|---|---|
| 18S2 | 100% (119/119) | 100% (81/81) | 100% (2/2) | 100% (7/7) | 100% (24/24) | 100% (5/5) |
| Euka02 | 94% (81/86) | 97% (64/66) | 100% (2/2) | 100% (11/11) | 100% (2/2) | 40% (2/5) |
| 18S-V4 | 67% (10/15) | 63% (5/8) | 100% (1/1) | 100% (1/1) | 60% (3/5) | - |
| 18S1 | 30% (20/67) | 12% (5/42) | 50% (1/2) | 75% (3/4) | 58% (11/19) | - |
| rbcL | 60% (9/15) | 67% (4/6) | 50% (1/2) | 25% (1/4) | 100% (3/3) | - |
| minirbcL | 47% (14/30) | 50% (7/14) | 50% (1/2) | 71% (5/7) | 0% (0/6) | 0% (0/1) |
| sgmatK2 | 85% (35/41) | 57% (8/14) | 100% (2/2) | 83% (5/6) | 100% (19/19) | - |
| sgmatK1 | 21% (7/33) | 0% (0/7) | 50% (1/2) | 50% (2/4) | 21% (4/19) | 100% (1/1) |
| mgmatK1 | 29% (4/14) | 0% (0/7) | 50% (1/2) | 0% (0/2) | 100% (3/3) | - |
| mgmatK2 | 29% (4/14) | 0% (0/7) | 50% (1/2) | 0% (0/2) | 100% (3/3) | - |
| COI GF1-GR1 | 7% (3/45) | 8% (3/39) | 0% (0/1) | 0% (0/2) | 0% (0/3) | - |
| COI GF1-DR1 | 11% (5/45) | 13% (5/39) | 0% (0/1) | 0% (0/2) | 0% (0/3) | - |
| COI GF2-GR1 | 9% (4/45) | 10% (4/39) | 0% (0/1) | 0% (0/2) | 0% (0/3) | - |
| COI GF2-DR1 | 2% (1/45) | 3% (1/39) | 0% (0/1) | 0% (0/2) | 0% (0/3) | - |
| ITS2 | 29% (4/14) | 0% (0/7) | 0% (0/1) | 0% (0/1) | 80% (4/5) | - |
| trnL c-d | 20% (2/10) | 0% (0/3) | 0% (0/1) | 0% (0/3) | 67% (2/3) | - |
| trnL g-h | 10% (1/10) | 0% (0/3) | 0% (0/1) | 0% (0/3) | 33% (1/3) | - |
| trnL bryo_P6 -h | 10% (1/10) | 0% (0/3) | 0% (0/1) | 0% (0/3) | 33% (1/3) | - |
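The selection criterion described in this section, amplification across all lineages and in most species, can be expressed as a small calculation on the counts in Table 3. The Python sketch below uses the per-lineage counts of four primer pairs copied from the table. The function names are illustrative, and treating "amplifies at least one species in every tested lineage" as the universality check is a simplifying assumption rather than the formal criterion applied in the study.

```python
# (amplified, tested) species counts per lineage, copied from Table 3.
# None marks a lineage not tested for that primer pair ("-" in the table).
TABLE3 = {
    "18S2": {"macroalgae": (81, 81), "mangrove": (2, 2), "seagrass": (7, 7),
             "land angiosperms": (24, 24), "land other": (5, 5)},
    "Euka02": {"macroalgae": (64, 66), "mangrove": (2, 2), "seagrass": (11, 11),
               "land angiosperms": (2, 2), "land other": (2, 5)},
    "rbcL": {"macroalgae": (4, 6), "mangrove": (1, 2), "seagrass": (1, 4),
             "land angiosperms": (3, 3), "land other": None},
    "ITS2": {"macroalgae": (0, 7), "mangrove": (0, 1), "seagrass": (0, 1),
             "land angiosperms": (4, 5), "land other": None},
}


def total_success(counts):
    """Return (amplified, tested, rounded percentage) over all tested lineages."""
    tested_lineages = [pair for pair in counts.values() if pair is not None]
    amplified = sum(a for a, _ in tested_lineages)
    tested = sum(t for _, t in tested_lineages)
    return amplified, tested, round(100 * amplified / tested)


def amplifies_every_tested_lineage(counts):
    """True when at least one species amplified in each lineage that was tested."""
    return all(pair[0] > 0 for pair in counts.values() if pair is not None)


if __name__ == "__main__":
    for primer, counts in TABLE3.items():
        print(primer, total_success(counts), amplifies_every_tested_lineage(counts))
```

For these four primer pairs, the totals recomputed from the per-lineage counts match the Total column of Table 3.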


The nuclear internal transcribed spacer ITS2 amplified land angiosperms exclusively, while the four primers of the mitochondrial COI gene amplified only a small percentage of macroalgae and none of the Embryophyta (2-11%; Table 3).

Amplification of environmental DNA was tested with the primers sgmatK2, the two rbcL primers, and with 18S2 and Euka02 from the 18S barcode; all barcodes amplified on eDNA dilutions at 1:1, 1:10, 1:50 or 1:100. The best performance was achieved with the 18S barcodes.

DNA reference library

MiSeq sequencing of macrophyte tissue was performed on amplicons from primers 18S2 (11 samples) and Euka02 (86 samples). Because of the higher number of samples, results for the DNA reference library focus on the sequences from the Euka02 primer. Similarly, as the DNA library was a requirement for macrophyte fingerprinting, eDNA metabarcoding of the sediments was performed on Euka02 amplicons. Nevertheless, we have successfully used both primers for different applications on eDNA from marine sediments (data not shown).

Taxonomic assignment based on blasting against the SILVA database was unsuccessful for most marine macrophytes, due to a lack of DNA references for those species in the database. Therefore, we followed a phylogeny-based approach and clustered the sequences from our 86 macrophytes along with 102 reference sequences ranging from the same species or genus, when available, to other species within the same order as the tested macrophytes.

Since the MiSeq output for each macrophyte sample contained several sequence variants, we chose the five most abundant sequence variants per sample. These sequences represented about 88% of the total reads per sample (94% in Embryophyta, 93% in Phaeophyceae, 85% in Rhodophyta, and 74% in Chlorophyta). We assumed that the actual macrophyte is among the top sequence variants, while the others could be epiphytes growing on the surface of the tissue or endophytes living within the macrophyte cells. Taxonomy was assigned to the sequence variant (or sequence variants) that clustered near the expected reference taxon. Following this approach, we created a DNA reference library for 76 out of the 86 macrophytes sequenced with the Euka02 primer.
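The selection of the five most abundant sequence variants, and the share of reads they represent, can be sketched as below. The variant names and read counts are invented for illustration and do not come from the data of this study; the function name is likewise hypothetical.

```python
def top_variants(variant_counts, k=5):
    """Return the k most abundant variants and their share of all reads."""
    ranked = sorted(variant_counts.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:k]
    total_reads = sum(variant_counts.values())
    share = sum(count for _, count in top) / total_reads
    return top, share


if __name__ == "__main__":
    # Hypothetical read counts per sequence variant for one tissue sample.
    sample = {"v1": 600, "v2": 150, "v3": 80, "v4": 30, "v5": 20, "v6": 70, "v7": 50}
    top, share = top_variants(sample)
    print([name for name, _ in top], round(100 * share))
```

In the workflow described above, each of the retained variants would then be placed in the phylogenetic tree, and only those clustering near the expected reference taxon would enter the reference library.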

Discrimination success

Barcoding with the 18S gene clearly distinguished each lineage of macrophytes. Maximum-likelihood trees from the Euka02 primer displayed high taxonomic cohesion, except for green algae (Figure 2; similar findings from the primer 18S2 are shown in Supplementary Figure 1). The embryophyte (marine angiosperms and land plants), brown algae and red algae sequences grouped in their own assemblages and clustered with the expected reference sequences (Figure 2).


[Figure 2 image: phylogenetic tree. Legend: Rhodophyta; Phaeophyceae; Chlorophyta; Embryophyta - Seagrass; Embryophyta - Mangrove; Embryophyta - Other.]

Figure 2. Phylogenetic tree of the Euka02 primer, showing differences among major marine macrophyte lineages: Chlorophyta, Rhodophyta, Phaeophyceae, seagrass, mangrove, and terrestrial plants. Species sequenced are accompanied by sequences from the SILVA reference database, labelled with Ref_ before the species name. Numbers after labels represent several individuals of the same species either sequenced here or used as reference. Letters A-B denote the case of two sequence variants chosen from the same sample.


Barcoding using the primer Euka02 accurately assigned 88% of macrophytes to their lineage: Embryophyta 95% (19 out of 20 species), Rhodophyta 81% (17 out of 21 species), and Phaeophyceae 86% (24 out of 28 species; Table 4). Although no Chlorophyta clustered with the reference sequences, most green algae (94%, 16 out of 17 species) formed their own clade apart from the other phyla. Order level was accurately assigned to 77% of the samples, while 52% of samples were assigned to the correct family (Table 4). Genus level was accurately assigned to 79% (37 out of 47) of the samples for which a reference genus was available. Four macrophytes were identified at species level, although only 15 out of 86 macrophytes had a reference sequence at this taxonomic rank.

All Embryophyta formed a single clade, and the barcoding region could separate angiosperms from gymnosperms, and monocots from eudicots. The exceptions were the seagrass Thalassia hemprichii, which clustered within the red algae, and one of the two samples of Posidonia oceanica, which remained within the phylum but clustered with gymnosperms (Figure 2); the reference sequences of these species clustered in the correct clade. Cupressus sempervirens, the sole gymnosperm sequenced here, clustered with the reference species C. tonkinensis and with Pinus taeda, which belongs to the same order (Pinales). The seagrass order (Alismatales) formed a subclade clustering most samples with all references. However, the seagrass family Hydrocharitaceae clustered in several branches, while all reference sequences from the family Cymodoceaceae clustered together. Cyperus sp. wrongly clustered within Alismatales, although both taxa are monocotyledons. The three mangrove samples were identified at species level, and the land plants were identified at genus level although species references were not available.


Table 4. Summary of sample assignment by taxonomic rank. The first percentage in genus and species rank indicates the percentage of samples that were accurately assigned over the total number of samples (either 86 or 11); the second percentage indicates the accurate assignment within the number of reference sequences provided. Thus, for instance in the case of Euka02, there were 41 macrophytes that had a genus reference; of these, 29 macrophytes (71%) matched the correct genus.

| Rank | Euka02 (n = 86) | 18S2 (n = 11) |
|---|---|---|
| Phylum | 88% | 91% |
| Order | 76% | 45% |
| Family | 54% | 45% |
| Genus | 37% (71%, 41) | 27% (30%, 10) |
| Species | 4% (23%, 13) | 0% (0%, 5) |
| Embryophyta | 95% (19/20) | 100% (3/3) |
| Rhodophyta | 81% (17/21) | 100% (3/3) |
| Phaeophyceae | 86% (24/28) | 100% (1/1) |
| Chlorophyta | 94% (16/17) | 75% (3/4) |

The Rhodophyta lineage clustered together and displayed two distinctive branches. The larger branch included only species of the subphylum Eurhodophytina, while species from several subphyla were included in the smaller branch (Figure 2). Prior to sequencing, we morphologically identified each collected species. However, we are uncertain of accurate identification a few Red

Sea macroalgae species due to lack of identification keys. The main uncertainties in Rhodophyta were species of the Laurencia complex, a genus that was recently split in four genera 62. We were unsure of four samples in this complex: three of them were left as Laurencia species, while one was morphologically identified as Chondrophycus. Nevertheless, the sequences of both Laurencia and Chondrophycus clustered next to each other, and these species could belong to the same genus.

Similarly, although the two crustose algae that we identified as Lithothamnion spp. clustered next to each other, they did not cluster with their reference genus but instead with the genus Lithophyllum. Corallinales, the order of these two genera of crustose algae, formed a distinct clade separate from the other Eurhodophytina. The red macroalgae Palmaria palmata, Dilsea carnosa (sample number 2), Turnerella pennyi, and Laurencia sp. grouped in the wrong lineages (Figure 2).

The primer Euka02 resolved Phaeophyceae as a monophyletic lineage where most of the sequences grouped either next to their reference or in the neighboring clade. The exceptions were Dictyosiphon foeniculaceus, Chaetopteris plumosa, Padina sp. and Dictyota crispata (sample number 2). The other Padina spp. and Dictyota spp. samples clustered within the lineage and next to the reference sequences. Brown algae orders clustered separately, with all Ectocarpales and all Laminariales forming a single subclade each. However, Fucales and Dictyotales formed two subclades each (Figure 2). Although a genus reference was provided for only 13 brown algae samples, 7 more samples clustered next to the other samples of the same genera. This was the case for all Saccharina and all Laminaria samples, which clustered even at species level.

In contrast, the lineage of green algae (Chlorophyta) displayed taxonomic ambiguity. All species from the reference database formed a monophyletic group separate from the species sequenced, despite belonging to the same class (Figure 2). Nevertheless, there was a certain level of identification, since the sequenced samples formed a monophyletic group and most Ulvales were separated in their own clades. Only one species (Valonia ventricosa) clustered in the wrong lineage, among Embryophyta.

There were several cases where the primer Euka02 failed to assign sequence variants to the correct taxonomic lineage. For some taxa, we had more than one sample within the same genus; when one of them was wrongly assigned, we assumed that the sequence itself failed but that the primer is still appropriate, as it accurately identified the other sample. This was the case for the Phaeophyceae Padina sp. and Dictyota crispata, the Rhodophyta Dilsea carnosa and Laurencia sp., and the seagrass Thalassia hemprichii, which were wrongly assigned. Other samples of either the same species, genus or family clustered correctly along the reference. In contrast, the Phaeophyceae Dictyosiphon foeniculaceus and Chaetopteris plumosa, and the Rhodophyta Palmaria palmata and Turnerella pennyi, had no reference to compare against; hence we cannot conclude whether the sequence was wrongly assigned or the primer was not appropriate for these species.

eDNA metabarcoding

The Euka02 primer from the universal 18S gene fingerprinted several lineages of eukaryotic organisms in environmental samples from coastal sediments, including macrophytes, metazoans, microalgae, unicellular organisms and unknown taxa (Supplementary Table 1). Marine macrophytes represented 13.3% of the total reads (560,931 out of 2,727,626 reads). This percentage differed between intracellular and extracellular eDNA: the former contributed 9.6% of the total reads, while the latter contributed only 3.7% (Supplementary Table 1). Further data interpretation was done by pooling together the reads from both intracellular and extracellular eDNA.

The resolution of marine macrophyte taxonomic identification beyond family level increased from 36% to 76% after inclusion of the created DNA reference library. This mini-barcode identified 21 marine macrophytes, 14 at genus or species level (Table 5). Five of the seven species present in the sampling sites were identified in the sediments: the seagrasses Halophila stipulacea, H. ovalis, Thalassodendron ciliatum and Halodule uninervis, and the mangrove Avicennia marina. Only two seagrass species were not recovered in the sediments; these species were not included in the created DNA reference library, as H. decipiens did not amplify with the Euka02 primer, and the sequence of Thalassia hemprichii was assigned to the wrong phylum (see Figure 1, within Rhodophyta). Avicennia marina contributed 94.8% of the reads in sediments from mangrove forests; this was the only species reported during sampling. The four recovered seagrasses contributed 30.4% of the reads in sediments of seagrass meadows (Table 5). The remaining 16 marine macrophytes from both sediment types are macroalgae: three Chlorophyta, six Phaeophyceae and seven Rhodophyta; of those, eight species are not included in the DNA library and were identified at family or order level (Table 5).


Table 5. List of marine macrophytes fingerprinted with the 18S mini-barcode using the primer Euka02. The seagrass and mangrove species listed were spotted on site during the sediment sampling.

Lineage         Species                           eDNA reads
Chlorophyta     Caulerpa serrulata f. spiralis    110
Chlorophyta     Halimeda macroloba                10,551
Phaeophyceae    Ectocarpales                      59
Phaeophyceae    Fucus vesiculosus                 1,100
Phaeophyceae    Laminariales                      274
Phaeophyceae    Padina sp.                        226
Phaeophyceae    Saccharina japonica               15
Phaeophyceae    Sargassum sp.                     1,109
Rhodophyta      Ceramiaceae                       76
Rhodophyta      Corallinales                      20
Rhodophyta      Florideophyceae                   346
Rhodophyta      Laurencia mcdermidiae             981
Rhodophyta      Lithothamnion sp.                 4,886
Rhodophyta      Neosiphonia sp2.                  6
Rhodophyta      Rhodomelaceae                     1,051
Rhodophyta      Rhodophysema elegans              58
Seagrass        Halodule uninervis                17,948
Seagrass        Halophila ovalis                  26,890
Seagrass        Halophila stipulacea              16,822
Seagrass        Thalassodendron ciliatum          10,675
Mangrove        Avicennia marina                  467,728
Total                                             560,931
In situ marine macrophytes                        385,629
% in situ marine macrophytes                      68.75
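The read counts in Table 5 can be aggregated by lineage to check the reported total. The following is a minimal sketch in Python, not the pipeline used in this study; the data structure and variable names are illustrative assumptions, and the counts are copied from Table 5.

```python
from collections import defaultdict

# (lineage, taxon, eDNA reads) as listed in Table 5.
table5 = [
    ("Chlorophyta", "Caulerpa serrulata f. spiralis", 110),
    ("Chlorophyta", "Halimeda macroloba", 10_551),
    ("Phaeophyceae", "Ectocarpales", 59),
    ("Phaeophyceae", "Fucus vesiculosus", 1_100),
    ("Phaeophyceae", "Laminariales", 274),
    ("Phaeophyceae", "Padina sp.", 226),
    ("Phaeophyceae", "Saccharina japonica", 15),
    ("Phaeophyceae", "Sargassum sp.", 1_109),
    ("Rhodophyta", "Ceramiaceae", 76),
    ("Rhodophyta", "Corallinales", 20),
    ("Rhodophyta", "Florideophyceae", 346),
    ("Rhodophyta", "Laurencia mcdermidiae", 981),
    ("Rhodophyta", "Lithothamnion sp.", 4_886),
    ("Rhodophyta", "Neosiphonia sp2.", 6),
    ("Rhodophyta", "Rhodomelaceae", 1_051),
    ("Rhodophyta", "Rhodophysema elegans", 58),
    ("Seagrass", "Halodule uninervis", 17_948),
    ("Seagrass", "Halophila ovalis", 26_890),
    ("Seagrass", "Halophila stipulacea", 16_822),
    ("Seagrass", "Thalassodendron ciliatum", 10_675),
    ("Mangrove", "Avicennia marina", 467_728),
]

# Sum the reads within each lineage.
reads_by_lineage = defaultdict(int)
for lineage, _taxon, reads in table5:
    reads_by_lineage[lineage] += reads

total_reads = sum(reads_by_lineage.values())

# Share of the macrophyte reads contributed by each lineage, in percent.
share_by_lineage = {
    lineage: round(100 * reads / total_reads, 1)
    for lineage, reads in reads_by_lineage.items()
}

print(dict(reads_by_lineage), total_reads, share_by_lineage)
```

The per-taxon counts add up to the 560,931 reads given in the Total row.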


1.5 Discussion

We tested the performance of 18 candidate primers to barcode marine macrophytes. The short length of the amplicons makes these primers suitable for use on fragmented DNA from environmental samples. Two 18S primers (Euka02 and 18S2) allowed discrimination of major lineages of photosynthetic eukaryotes, particularly differentiating between macroalgae lineages and embryophytes such as seagrasses and mangroves. The primer Euka02 was 81-95% accurate in identifying phyla, 76% accurate at order level and 54% accurate at family level (Table 4).

DNA metabarcoding is a valuable technique for assessing global biodiversity and reconstructing past diversity in both terrestrial and marine environments 14,44,63. In contrast to the universality of barcoding in animals 23, the complex origin and broad genetic divergence of photosynthetic eukaryotes have essentially limited the universality of hitherto proposed barcodes to flowering plants 16,18-30. Moreover, several studies have recognized the limitations and challenges for Embryophyta barcoding 16,21,25,26. These studies acknowledge the need to combine two or three barcodes and to look for new DNA regions that can increase taxonomical coverage and bring universality, rather than seeking a single primer pair capable of resolving all plants 16,33.

We tested whether the available barcodes for Embryophyta could distinguish among marine macrophytes. The ITS2 locus did not amplify any of the marine macrophytes tested here, in contrast to previous results for seagrass species 64,65. trnL amplified poorly on land angiosperms and did not amplify our targeted marine macrophytes. The matK and rbcL barcoding regions are well known as plant barcodes, and several studies have shown their usefulness on flowering plants, including mangrove and seagrass species 16,19,22,66. However, the performance of the matK and rbcL regions on macroalgae is rather limited and ambiguous, as confirmed here and reported earlier 15,36,38. Thus, the barcodes ITS2, trnL, matK and rbcL are limited to use on angiosperms and are not suitable for studies on marine macrophytes.

Saunders 35 pioneered efforts in the DNA barcoding of macroalgae, and developed a set of primers that amplified the COI region in Rhodophyta and Phaeophyceae, while also drawing attention to the low performance of this region on Chlorophyta 38,39. However, in contrast to Saunders, we obtained a low amplification rate using the same four primer combinations. Furthermore, the amplicons of these primers are too long (>700 bp) for metabarcoding of fragmented eDNA from sediments.

As highlighted above, the recommended barcodes rbcL, matK, trnL, ITS2 and COI focus on Embryophyta and their performance was insufficient on macroalgae. Due to these limitations, we included primers within the 18S ribosomal RNA gene, widely used in diversity screening and phylogenetic analysis of eukaryotes 67-69. Although previous studies suggest that the 18S region is inappropriate for phylogeny reconstruction 70-72, we included this barcode since our aim was to discriminate major lineages of macrophytes at least at phylum level, not to resolve phylogenetic relationships. Most samples were accurately positioned along the reference sequences in the maximum-likelihood trees for the 18S barcodes. This result renders the 18S region the best barcoding gene to resolve the major groups of marine macrophytes, compared to the other barcodes we tested.

A major drawback of the 18S barcoding was the taxonomical ambiguity for the green algae lineage. Although the samples did not cluster in the same clade as the reference sequences, 94% of Chlorophyta clustered in a single clade as a distinct lineage; most genera even clustered next to each other and separated by order. This result indicates a certain level of identification, yet barcoding of green algae is a challenge and needs improvement. Earlier studies have noted poor performance of both plastid and mitochondrial markers in some orders of the class Ulvophyceae (to which all our samples belong) and within the genus Caulerpa (same genus for five species tested here) 38,73-75. Long barcodes of selected regions (tufA, tufB, rbcL, ITS, 18S) have been useful in diatoms, red and brown algae, but the possible high incidence of introns within green algae may complicate the barcoding of Chlorophyta 36,38.

Depending on the application, barcoding is a trade-off between assigning either few taxa to species level, or many taxa to higher taxonomical levels such as phylum or order. While other DNA regions may offer higher species-discrimination power, none of the barcodes tested here amplified a broader set of unrelated photosynthetic eukaryotes than 18S. Nevertheless, a universal barcode can add significant noise when a study targets only specific taxa. In that case, specific primers are preferred to universal primers. Here, our initial aim was to obtain a unique primer for marine angiosperms and a unique primer for each macroalgae lineage, so that we could use methods such as qPCR to estimate their relative abundance. However, creating those primers was challenging and we could not find a specific primer among the tested candidates.

When attempting to design de novo macroalgae primers, we found that the scarcity of reference libraries restricts the development of unique primers, as the investigated taxa are largely absent from the databases. There is a need to increase studies and efforts to generate DNA resources, including genotyping of a broad set of species from the three lineages of macroalgae. For now, universal barcodes like the 18S gene are the most suitable option for recovering less-studied groups such as macroalgae.

Taxonomical resolution below family level is still limited for most marine macrophytes with either of the two 18S candidates. This limitation will be overcome by increasing barcoding of these taxa and including the sequences in the databases. Out of 86 species sequenced here, we could compare only 41 and 13 of the macrophytes against a reference sequence at genus and species level, respectively; the remaining macrophytes were absent from the databases and were compared only at family or order level. Considering the restriction of available reference databases, and considering the need to distinguish widely separated lineages such as macroalgae and marine angiosperms, we recommend the 18S region as a universal barcode for macrophytes. As shown here, this barcode clearly resolves major phylogenetic patterns within macrophytes at order and family level. Furthermore, this region has brought high taxonomical resolution to the animal branch of the tree of life. A universal barcode such as 18S provides a parsimonious approach to resolving not just photosynthetic organisms but most marine eukaryotic biodiversity in a single eDNA sediment sample.

1.6 Acknowledgements

This research was supported by King Abdullah University of Science and Technology through baseline support to CMD. DKJ and SBØ received support from the Independent Research Fund Denmark (8021-00222 B, ‘CARMA’). We thank Pierre Taberlet for suggesting the use of primer Euka02 and MiSeq sequencing, and Craig T. Michell for generating de novo primers. We thank Susse Wegeberg, Ole Geertz-Hansen, Karsten Dahl, Peter Stæhr and Michael Bo Rasmussen for contributing North Atlantic/Arctic species of macroalgae to our collection. Mark A. Tester identified terrestrial halophytes from the Red Sea. We thank Wajitha J. Raja Mohamed Sait and Nadia Haj Salah for their help with DNA extractions and sterilization.


1.7 Supplementary Information

[Figure: phylogenetic tree; tip labels are not recoverable from the extracted text. Color legend: Rhodophyta, Phaeophyceae, Chlorophyta, Embryophyta - Seagrass, Embryophyta - Mangrove, Embryophyta - Other.]

Supplementary Figure 1. Phylogenetic tree of the 18S barcode using the primer 18S2. Each lineage was clearly differentiated. Macrophytes sequenced are accompanied by reference sequences from the SILVA database, which are labelled with Ref_ before the species name. Numbers after labels represent several individuals of the same species either sequenced here or used as reference. Letters A-C denote the case of several sequence variants chosen from the same sequenced sample. Although there is ambiguity in barcoding of Chlorophyta (green color), as these macrophytes clustered in two clades, this primer seems to be more accurate than Euka02 for barcoding of green macroalgae, as Valonia ventricosa clustered next to its reference, and two Halimeda species clustered within reference sequences of the class.


Supplementary Table 1. Eukaryotic diversity based on intracellular and extracellular eDNA recovered from coastal sediment samples, fingerprinted with the primer Euka02 from the 18S gene. Values represent eDNA reads; Marine macrophytes include macroalgae (Chlorophyta, Rhodophyta and Phaeophyceae), mangrove and seagrass.

Lineage                Intracellular    %       Extracellular    %       Total eDNA    %
Chlorophyta            7,219            0.5     3,507            0.3     10,726        0.4
Phaeophyceae           2,466            0.16    317              0.03    2,783         0.1
Rhodophyta             4,153            0.3     3,317            0.3     7,470         0.3
Mangrove               181,846          11.5    87,001           7.5     268,847       9.9
Seagrass               66,643           4.2     5,692            0.5     72,335        2.7
Land Embryophyta       10,099           0.6     11,415           1.0     21,514        0.8
Arthropoda             109,659          7.0     88,406           7.7     198,065       7.3
Fungi                  34,169           2.2     63,645           5.5     97,814        3.6
Marine Invertebrate    501,955          31.9    488,576          42.4    990,531       36.3
Microalgae             243,986          15.5    153,207          13.3    397,193       14.6
Marine Unicellular     6,945            0.4     15,056           1.3     22,001        0.8
Unknown                334,351          21.2    158,031          13.7    492,382       18.1
Marine Vertebrate      71,729           4.6     69,346           6.0     141,075       5.2
Vertebrate             0                0       4,890            0.4     4,890         0.2
Total reads            1,575,220        100     1,152,406        100     2,727,626     100
Marine macrophytes     262,327          16.6    99,834           8.7     362,161       13.3
% among Total                           9.6                      3.7     362,161       13.3
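The "% among Total" row of Supplementary Table 1 expresses the marine macrophyte reads of each eDNA fraction as a share of all reads. The following is a minimal sketch in Python of that arithmetic, not part of the original analysis; the variable names are illustrative assumptions, and the read counts come from the table.

```python
# Marine macrophyte reads and total reads per eDNA fraction (Supplementary Table 1).
macrophyte_reads = {"intracellular": 262_327, "extracellular": 99_834}
fraction_totals = {"intracellular": 1_575_220, "extracellular": 1_152_406}

# All reads recovered with Euka02, pooling both fractions.
grand_total = sum(fraction_totals.values())

# Share of each fraction's macrophyte reads among all reads, in percent.
percent_among_total = {
    fraction: round(100 * reads / grand_total, 1)
    for fraction, reads in macrophyte_reads.items()
}

# Pooled macrophyte share after combining intracellular and extracellular reads.
pooled_macrophyte_reads = sum(macrophyte_reads.values())
pooled_percent = round(100 * pooled_macrophyte_reads / grand_total, 1)

print(grand_total, percent_among_total, pooled_macrophyte_reads, pooled_percent)
```

With these counts the two fractions account for 9.6% and 3.7% of all reads, which together give the pooled 13.3%.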


1.8 Step-by-step protocol for intracellular and extracellular environmental DNA extraction, based on Lever et al. 2015

Instruments

- HulaMixer (Invitrogen, Merelbeke, Belgium)

- Thermomixer (Eppendorf, Hamburg, Germany)

- Vortex

- Centrifuges for 15 ml and 2 ml tubes (14,000 x g)

- Vacuum centrifuge

Notes before starting

1. Prepare reagents following Lever et al. 2015

2. Make sure all materials for reagents and for extraction are sterilized with the best available method (e.g. autoclave, UV light, DNA killer)

3. Regularly clean gloves with DNA killer throughout the extraction

4. Take every precaution to reduce cross-contamination

5. Some tips are listed in the Notes.

Procedure for extraction of intracellular and extracellular eDNA

1. Weigh 0.2 g of sediment in a 15 ml tube.

Note: weigh the empty tubes beforehand. Thaw and slightly shake sediment samples for easy transfer to the new tubes. Add an extra tube to run the extraction control.

2. Sorption prevention of nucleic acids

a) Add 0.5 ml PO4 g⁻¹

Note: pH should be >9; add NaOH until the desired pH is reached. Use 10-100 µmol PO4 g⁻¹ for organic-rich and sandy sediments, and 100-1000 µmol PO4 g⁻¹ for carbonate-rich sediments.

b) Mix gently to coat the sample with PO4. Note: it works fine for 5 min, 250 rpm.

c) Add 9 volumes of 10x TE buffer

d) Gently shake in the HulaMixer at room temperature for 2 hours

e) Centrifuge for 20 min at 10,000 x g. Note: meanwhile, set up syringes and filters.

Optional for carbonate-rich samples:

f) Add 1 volume of CDM to the sample

g) Gently shake at room temperature for 1 hour

h) Add 8 volumes of 10x TE buffer

i) Gently shake at room temperature for 2 hours

j) Centrifuge for 20 min at 10,000 x g. Note: meanwhile, set up syringes and filters.

3. Extracellular and intracellular eDNA separation

a) Filter the supernatant (extracellular eDNA) through a 0.2 µm filter attached to a 5 ml syringe into a 15 ml tube, then proceed to first precipitation (step 4)

b) Sediment contains intracellular eDNA, proceed to cell lysis (step 8)

4. First precipitation

a) Add linear polyacrylamide (LPA; final concentration 20 µl per ml of sample)

b) Add 2 volumes of PEG 6000 (30% w/v with 1.6 M NaCl)

c) Homogenize by hand shaking for 30 sec, or vortex for a few seconds

d) Incubate at room temperature in dark for 2 hours

Note: safe to stop here. Samples can be incubated overnight at 4 °C.

e) Centrifuge at 14,000 x g for 60 min. Note: meanwhile, label a batch of 2 ml tubes.

f) Carefully discard the supernatant and keep pellet (pipette off or use vacuum)

Note: leave ~50 l of supernatant to avoid pellet lost.

g) Add 1 ml of 70% ethanol to the 15 ml tube, mix by pipetting

h) Transfer the sample into a 2 ml tube

i) Centrifuge 3 min at 14,000 x g, discard supernatant

j) Repeat ethanol addition (1 ml), centrifuge (5 min), discard supernatant

k) Dry pellet using vacuum centrifuge or oven (45 °C, 15-40 min). Do not over-dry

l) Dissolve pellet in 100 µl water

5. Clean-up of extracellular DNA

a) Add 500 l of lysis solution I

b) Gently shake for 1 hour at 50C (350 rpm)

Note: meanwhile, label a new batch of 2 ml tubes.

c) For organic-rich sediment add 500 l of lysis solution II and incubate a second

time for 1 hour at 65C, otherwise skip this step

d) Centrifuge for 10 min at 4C and 10,000 x g

e) Transfer supernatant to a clean tube

6. Purification (perform in a fume hood)

a) Add 1 volume (600 l) of Chloroform-isoamyl alcohol (CI)

b) Vortex at max speed for 10 sec

c) Centrifuge for 10 min at 4C and 10,000 x g.

Note: meanwhile, label a new batch of 2 ml tubes.


d) Transfer supernatant into a clean tube

Note: take top aqueous layer and leave precipitants in between layers.

e) Repeat CI addition, vortex and centrifugation

7. Second Precipitation

a) Add linear polyacrylamide (LPA; final concentration 20 µl per ml of sample)

b) Add 2 volumes of PEG 6000 (30% w/v with 1.6 M NaCl)

c) Homogenize by hand shaking for 30 sec, or vortex for a few seconds

d) Incubate at room temperature in dark for 2 hours

Note: safe to stop here. Samples can be incubated overnight at 4 °C.

e) Centrifuge at 14,000 x g for 10 min. Note: label a new batch of 2 ml tubes.

f) Carefully discard the supernatant and keep pellet (pipette off or use vacuum)

Note: leave ~50 l of supernatant to avoid pellet lost.

g) Add 650 l of 70% ethanol, mix by pipetting

h) Transfer the sample into a 2 ml tube

i) Centrifuge 2 min at 14,000 x g, discard supernatant

j) Repeat ethanol addition (1 ml), centrifuge (3 min), discard supernatant

k) Dry pellet using vacuum centrifuge or oven (45 °C, 15-20 min). Do not over-dry

l) Dissolve pellet in 100 µl PCR water or DNA elution buffer

Extracellular eDNA is ready for PCR

Continuation of intracellular eDNA extraction (from 3b)

8. Cell lysis, starting on frozen sediment in 15 ml tube

a) Add 2.5 volumes of lysis solution I (500 µl lysis solution I per 0.2 g sample)

b) Gently homogenize, then freeze sample at -80 °C for 5 min

Follow one of the below lysis protocols (LP, perform in fume hood)

I. LP1 (quick, suitable for most high-throughput applications and RNA)

- Add 2.5 volumes of cold PCI (500 µl per 0.2 g sample)

- Thaw sample

- Vortex for 10 sec at high speed

- Disrupt sediment using beads at medium-high speed in a TissueLyser (0.5-1 min)

II. LP2 (most versatile)

- Thaw sample

- Vortex for 10 sec at high speed

- Shake for 1 hour at 50C

- Disrupt sediment using beads at medium-high speed in a TissueLyser (0.5-1

min), or vortex for 10 min

Optional: to increase yield repeat cycle of freezing, thawing and heat-shaking

the sample a second time.

III. LP3 (best for organic rich samples)

- Thaw sample

- Vortex for 10 sec at high speed

- Shake for 1 hour at 50C

- Repeat cycle of freezing at -80C for 5 min, thawing and heat-shaking

- Add 2.5 volumes (500 l per 0.2 g sample) of Lysis solution II

59 60

- Repeat cycle of freezing at -80C for 5 min, thawing and heat-shaking

- Repeat a second time the cycle of freezing at -80C for 5 min, thawing and

heat-shaking (this time 1 hour at 60C)

c) Centrifuge for 15 min at 10,000 x g and 4 °C

d) Transfer supernatant to a clean tube, discard pellet and flow-through

9. Follow the extracellular protocol starting at Purification (step 6). Intracellular DNA is ready for PCR after Second Precipitation (step 7). Tip: test for best PCR performance with eDNA dilutions at 1:1, 1:10, 1:50 or 1:100; dilutions reduce PCR inhibitors carried over from the sediment.
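Several steps of the protocol above scale reagent volumes with the amount of sediment or sample. The following is a minimal sketch in Python of that arithmetic, not part of Lever et al. 2015; the function names are hypothetical, and it assumes that "1 volume" of lysis solution equals 1 µl per mg of sediment (so that 2.5 volumes for 0.2 g gives the stated 500 µl) and that the PEG volume is measured relative to the liquid sample.

```python
def lysis_solution_ul(sediment_g, volumes=2.5):
    """Lysis solution (or PCI) volume in µl for a given sediment mass in g.

    Assumption: 1 volume = 1 µl per mg of sediment, which reproduces the
    protocol's example of 500 µl per 0.2 g sample.
    """
    return round(sediment_g * 1000 * volumes, 1)


def precipitation_reagents_ul(sample_ul):
    """LPA and PEG 6000 volumes in µl for a precipitation step (steps 4 and 7).

    LPA is added at 20 µl per ml of sample; PEG 6000 is added at 2 volumes,
    assumed here to mean twice the sample volume.
    """
    lpa_ul = 20 * sample_ul / 1000
    peg_ul = 2 * sample_ul
    return {"lpa_ul": lpa_ul, "peg_ul": peg_ul}


if __name__ == "__main__":
    # Example values only: 0.2 g of sediment and 1 ml of liquid sample.
    print(lysis_solution_ul(0.2))
    print(precipitation_reagents_ul(1000))
```

The sample volumes passed to these helpers are examples; the actual volume entering each precipitation depends on how much supernatant is recovered.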


1.9 References

1 Duarte, C. M. Reviews and syntheses: Hidden forests, the role of vegetated coastal habitats in the ocean carbon budget. Biogeosciences 14, 301-310 (2017).

2 Duarte, C. M. & Cebrián, J. The fate of marine autotrophic production. Limnol. Oceanogr. 41, 1758-1766 (1996).

3 Duarte, C. M. & Krause-Jensen, D. Export from seagrass meadows contributes to marine carbon sequestration. Front. Mar. Sci. 4, 13 (2017).

4 Krause-Jensen, D. & Duarte, C. M. Substantial role of macroalgae in marine carbon sequestration. Nat. Geosci. 9, 737-742 (2016).

5 Ortega, A. et al. Important contribution of macroalgae to oceanic carbon sequestration. Nat. Geosci. 12, 748-754 (2019).

6 Baker, P. et al. Potential contribution of surface-dwelling Sargassum algae to deep-sea ecosystems in the southern North Atlantic. Deep Sea Res. (II Top. Stud. Oceanogr.) 148, 21-34 (2018).

7 Geraldi, N. R. et al. Fingerprinting blue carbon: Rationale and tools to determine the source of organic carbon in marine depositional environments. Front. Mar. Sci. 6, 263 (2019).

8 Raven, J. A. et al. Mechanistic interpretation of carbon isotope discrimination by marine macroalgae and seagrasses. Funct. Plant Biol. 29, 355-378 (2002).

9 Loneragan, N. R., Bunn, S. E. & Kellaway, D. M. Are mangroves and seagrasses sources of organic carbon for penaeid prawns in a tropical Australian estuary? A multiple stable-isotope study. Mar. Biol. 130, 289-300 (1997).

10 McKee, K., Rogers, K. & Saintilan, N. in Global change and the function and distribution of wetlands 63-96 (Springer, 2012).

11 Thomsen, P. F. & Willerslev, E. Environmental DNA–An emerging tool in conservation for monitoring past and present biodiversity. Biol. Conserv. 183, 4-18 (2015).

12 Thomsen, P. F. et al. Monitoring endangered freshwater biodiversity using environmental DNA. Mol. Ecol. 21, 2565-2573 (2012).

13 Stat, M. et al. Ecosystem biomonitoring with eDNA: metabarcoding across the tree of life in a tropical marine environment. Sci. Rep. 7, 12240 (2017).

14 Aylagas, E., Borja, Á., Muxika, I. & Rodríguez-Ezpeleta, N. Adapting metabarcoding-based benthic biomonitoring into routine marine ecological status assessment networks. Ecol. Indic. 95, 194-202 (2018).

15 Reef, R. et al. Using eDNA to determine the source of organic carbon in seagrass meadows. Limnol. Oceanogr. 62, 1254-1265 (2017).

16 CBOL Plant Working Group. A DNA barcode for land plants. Proc. Natl. Acad. Sci. U. S. A. 106, 12794-12797 (2009).

17 Baldauf, S. L. The deep roots of eukaryotes. Science 300, 1703-1706 (2003).

18 Kress, W. J., Wurdack, K. J., Zimmer, E. A., Weigt, L. A. & Janzen, D. H. Use of DNA barcodes to identify flowering plants. Proc. Natl. Acad. Sci. U. S. A. 102, 8369-8374 (2005).

19 Newmaster, S. G., Fazekas, A. J. & Ragupathy, S. DNA barcoding in land plants: evaluation of rbcL in a multigene tiered approach. Botany 84, 335-341 (2006).

20 Chang, C.-C. et al. The chloroplast genome of Phalaenopsis aphrodite (Orchidaceae): comparative analysis of evolutionary rate with that of grasses and its phylogenetic implications. Mol. Biol. Evol. 23, 279-291 (2005).

21 Taberlet, P. et al. Power and limitations of the chloroplast trn L (UAA) intron for plant DNA barcoding. Nucleic Acids Res. 35, e14-e14 (2007).

22 Yu, J., Xue, J. H. & Zhou, S. L. New universal matK primers for DNA barcoding angiosperms. J. Syst. Evol. 49, 176-181 (2011).

23 Hebert, P. D. N., Ratnasingham, S. & de Waard, J. R. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc. R. Soc. Lond., Ser. B: Biol. Sci. 270, S96-S99 (2003).

24 Pennisi, E. Wanted: a barcode for plants. Science 318, 190 (2007).

25 Hollingsworth, P. M., Graham, S. W. & Little, D. P. Choosing and using a plant DNA barcode. PloS one 6, e19254 (2011).

26 Chase, M. W. et al. A proposal for a standardised protocol to barcode all land plants. Taxon 56, 295-299 (2007).

27 Rubinoff, D., Cameron, S. & Will, K. Are plant DNA barcodes a search for the Holy Grail? Trends Ecol. Evol. 21, 1-2 (2006).

28 Lahaye, R. et al. DNA barcoding the floras of biodiversity hotspots. Proc. Natl. Acad. Sci. U. S. A. 105, 2923-2928 (2008).

29 Hebert, P. D. N., Gregory, T. R. & Savolainen, V. The promise of DNA barcoding for taxonomy. Syst. Biol. 54, 852-859 (2005).

30 Kress, W. J. & Erickson, D. L. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS one 2, e508 (2007).


31 Kuo, L.-Y., Li, F.-W., Chiou, W.-L. & Wang, C.-N. First insights into fern matK phylogeny. Mol. Phylogenet. Evol. 59, 556-566 (2011).

32 De Groot, G. A. et al. Use of rbcL and trnL-F as a two-locus DNA barcode for identification of NW-European ferns: an ecological perspective. PLoS One 6, e16371 (2011).

33 Seberg, O. & Petersen, G. How many loci does it take to DNA barcode a crocus? PLoS One 4, e4598 (2009).

34 Yao, H. et al. Use of ITS2 region as the universal DNA barcode for plants and animals. PLoS One 5, e13102 (2010).

35 Saunders, G. W. Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1879-1888 (2005).

36 Saunders, G. W. & Kucera, H. An evaluation of rbcL, tufA, UPA, LSU and ITS as DNA barcode markers for the marine green macroalgae. Cryptogamie Algol. 31, 487 (2010).

37 Du, G. et al. DNA barcoding assessment of green macroalgae in coastal zone around Qingdao, China. JOUC 13, 97-103 (2014).

38 Saunders, G. W. & McDevit, D. C. in DNA Barcodes: Methods and Protocols (eds W. John Kress & David L. Erickson) 207-222 (Humana Press, 2012).

39 McDevit, D. C. & Saunders, G. W. On the utility of DNA barcoding for species differentiation among brown macroalgae (Phaeophyceae) including a novel extraction protocol. Phycol. Res. 57, 131-141 (2009).

40 Heinecke, L. et al. Aquatic macrophyte dynamics in Lake Karakul (Eastern Pamir) over the last 29 cal ka revealed by sedimentary ancient DNA and geochemical analyses of macrofossil remains. J. Paleolimnol. 58, 403-417 (2017).

41 Barnes, M. A. et al. Environmental conditions influence eDNA persistence in aquatic systems. Environ. Sci. Technol. 48, 1819-1827 (2014).

42 Little, D. P. A DNA mini‐barcode for land plants. Mol. Ecol. Resour. 14, 437-446 (2014).

43 Hajibabaei, M. et al. A minimalist barcode can identify a specimen whose DNA is degraded. Mol. Ecol. Resour. 6, 959-964 (2006).

44 Epp, L. S. et al. New environmental metabarcodes for analysing soil DNA: potential for studying past and present ecosystems. Mol. Ecol. 21, 1821-1833 (2012).

45 Guardiola, M. et al. Deep-Sea, Deep-Sequencing: metabarcoding extracellular DNA from sediments of marine canyons. PLoS One 10, e0139633 (2015).

46 Stoeck, T. et al. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Mol. Ecol. 19, 21-31 (2010).

47 Amaral-Zettler, L. A., McCliment, E. A., Ducklow, H. W. & Huse, S. M. A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA genes. PLoS One 4, e6372 (2009).

48 Riaz, T. et al. ecoPrimers: inference of new DNA barcode markers from whole genome sequence analysis. Nucleic Acids Res. 39, e145-e145 (2011).

49 Richmond, M. D. A field guide to the seashores of Eastern Africa and the Western Indian Ocean Islands (Stockholm (Sweden) SIDA/UDSM, 2002).

50 Lieske, E., Fiedler, K. E. & Myers, R. F. Coral Reef Guide: Red Sea to Gulf of Aden, South Oman;[the Definitive Guide to Over 1200 Species of Underwater Life] (Collins, 2004).

51 Pedersen, P. M. Grønlands havalger (Epsilon. dk, 2011).

52 Guiry, M. D. AlgaeBase. World-wide electronic publication, http://www.algaebase.org (2013).

53 Coombs, J. T. & Franco, C. M. M. Isolation and identification of actinobacteria from surface-sterilized wheat roots. Appl. Environ. Microbiol. 69, 5603-5608 (2003).

54 Aires, T., Marbà, N., Serrao, E. A., Duarte, C. M. & Arnaud-Haond, S. Selective elimination of chloroplastidial DNA for metagenomics of bacteria associated with the green alga Caulerpa taxifolia (Bryopsidophyceae). J. Phycol. 48, 483-490 (2012).

55 Lever, M. A. et al. A modular method for the extraction of DNA and RNA, and the separation of DNA pools from diverse environmental sample types. Front. Microbiol. 6 (2015).

56 Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, 10-12 (2011).

57 Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581 (2016).

58 RStudio: integrated development for R (RStudio, Inc., Boston, MA, 2015).

59 Wilkinson, S. P., Davy, S. K., Bunce, M. & Stat, M. Taxonomic identification of environmental DNA with informatic sequence classification trees. PeerJ PrePrints (2018).

60 Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307-321 (2010).

61 FigTree, a graphical viewer of phylogenetic trees (2007).

62 Martin-Lescanne, J. et al. Phylogenetic analyses of the Laurencia complex (Rhodomelaceae, Ceramiales) support recognition of five genera: Chondrophycus, Laurencia, Osmundea, Palisada and Yuzurua stat. nov. Eur. J. Phycol. 45, 51-61 (2010).

63 Sigsgaard, E. E. et al. Seawater environmental DNA reflects seasonality of a coastal fish community. Mar. Biol. 164, 128 (2017).

64 Chen, S. et al. Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species. PLoS One 5, e8613 (2010).

65 Uchimura, M., Jean Faye, E., Shimada, S., Inoue, T. & Nakamura, Y. A reassessment of Halophila species (Hydrocharitaceae) diversity with special reference to Japanese representatives. Bot. Mar. 51, 258-268 (2008).

66 Lucas, C., Thangaradjou, T. & Papenbrock, J. Development of a DNA barcoding system for seagrasses: successful but not simple. PLoS One 7, e29987 (2012).

67 Chenuil, A. Choosing the right molecular genetic markers for studying biodiversity: from molecular evolution to practical aspects. Genetica 127, 101-120 (2006).

68 Field, K. G. et al. Molecular phylogeny of the animal kingdom. Science 239, 748-753 (1988).

69 Woese, C. R., Kandler, O. & Wheelis, M. L. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc. Natl. Acad. Sci. U. S. A. 87, 4576-4579 (1990).

70 Soltis, P. S. et al. The phylogeny of land plants inferred from 18S rDNA sequences: pushing the limits of rDNA signal? Mol. Biol. Evol. 16, 1774-1784 (1999).

71 Meyer, A., Todt, C., Mikkelsen, N. T. & Lieb, B. Fast evolving 18S rRNA sequences from Solenogastres (Mollusca) resist standard PCR amplification and give new insights into mollusk substitution rate heterogeneity. BMC Evol. Biol. 10, 70 (2010).

72 UNFCCC. The Paris Agreement | UNFCCC, https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement (2016).

73 Famà, P., Wysor, B., Kooistra, W. H. C. F. & Zuccarello, G. C. Molecular phylogeny of the genus Caulerpa (Caulerpales, Chlorophyta) inferred from chloroplast tufA gene. J. Phycol. 38, 1040-1050 (2002).

74 Kazi, M. A., Reddy, C. R. K. & Jha, B. Molecular phylogeny and barcoding of Caulerpa (Bryopsidales) based on the tufA, rbcL, 18S rDNA and ITS rDNA genes. PLoS One 8, e82438 (2013).

75 Zuccarello, G. C., Price, N., Verbruggen, H. & Leliaert, F. Analysis of a plastid multigene data set and the phylogenetic position of the marine macroalga Caulerpa filiformis (Chlorophyta). J. Phycol. 45, 1206-1212 (2009).

Chapter 2

Environmental DNA estimates marine macrophyte contribution to Blue Carbon sediments

This chapter was submitted to Limnology & Oceanography. The co-authors are Alejandra Ortega, Nathan R. Geraldi and Carlos M. Duarte.

Author contributions: CMD conceived the research. All authors determined sampling and eDNA associated methods, while CMD and AO designed the experiments. AO collected the samples and conducted the experiments. AO analyzed the data and wrote the manuscript with input from all authors.

2.1 Abstract

Estimation of marine macrophyte contribution to blue carbon habitats is key to understanding the dynamics of carbon sequestration. Nevertheless, identification of macrophyte carbon sources is challenging. Here, we provide high-resolution identification of 48 marine macrophytes recovered from mangrove and seagrass sediments using environmental DNA (eDNA). Our assessment includes major eDNA contributions from 40 taxa of macroalgae. In addition, our experiments showed a strong positive correlation (r = 0.92, R2 = 0.85) between macrophyte eDNA abundance and organic carbon content. The relative eDNA contributions of macrophytes were similar to their contributions of organic carbon based on stable isotopes; however, isotopes are unreliable for discriminating among macrophyte lineages. We demonstrate that eDNA is an unparalleled method for estimating the organic carbon contribution of macrophytes to blue carbon stocks. eDNA offers a new framework for blue carbon assessments and enhances the climate change mitigation strategies reinforced by the Paris Agreement.

2.2 Introduction

Marine macrophytes (mangroves, seagrasses, saltmarshes and macroalgae) form highly productive blue carbon habitats, with a great capacity to store organic carbon from multiple sources within their sediments1-3. These coastal vegetated habitats occupy less than 2% of the seafloor but account for 50% of all carbon burial in marine sediments1,4. Unfortunately, blue carbon ecosystems are threatened due to degradation and loss of vegetated marine habitats4.

During the past 50 years, a third of the global extent of these habitats has been lost, hence the carbon accumulated in their sediments over millennia is being released into the atmosphere5. Restoration and protection of these natural carbon sinks are key strategies for climate change mitigation4,6.

Habitat-forming macrophytes contribute about 50% of the carbon found in their sediments4,7, with the remaining 50% coming from other sources. Identification of these sources and estimation of their carbon contribution are relevant for blue carbon assessments and for targeting the habitats to be conserved. The main challenges in the study of blue carbon are to identify marine macrophytes and to estimate their contribution to sediments8,9. Traditional methods such as stable isotopes (δ13C and δ15N)7,8 and secondary metabolites10,11 are used to differentiate between terrestrial and marine vegetated organic matter. However, isotopic signatures and metabolic profiles overlap among macrophyte lineages (macroalgae, seagrass, mangrove and terrestrial Embryophyta), limiting the application of these methods to fingerprinting blue carbon contributors8.

Alternatively, barcoding of environmental DNA (eDNA) can identify organisms from environmental samples based on their unique DNA sequences12, and can potentially identify degraded marine macrophytes within the sediment pools8,13. Since approximately 3% of cellular organic carbon (Corg) is DNA14, eDNA-based methods also provide an approach to estimate contributions to blue carbon stocks. Environmental DNA analyses provide information on presence-absence, and can be related to traditional measures of species15-17. However, the relationship between eDNA reads and actual species abundance depends on the barcoding primers used and the target taxa; this relationship should be tested on a study-by-study basis to ensure robust inferences. Despite this limitation, eDNA analyses have the potential to fingerprint macrophyte sequences in sediments and present huge potential for assessing macrophyte contribution to blue carbon sediments8,13.

We are aware of two studies using eDNA to fingerprint blue carbon13,18, but they did not provide evidence that macrophyte eDNA abundance was associated with their contribution to sediment organic carbon pools. Moreover, the studies did not include all marine macrophytes in their assessments. One study used eDNA to confirm the presence of macroalgae in coastal sediments18, while the use of eDNA barcoding in the other was limited to identification of marine angiosperms13. The role of marine angiosperms in blue carbon habitats is well known, but the contribution of macroalgae has traditionally been ignored19. Yet, macroalgae are believed to be important contributors to carbon sequestration because they deliver allochthonous carbon to coastal and deep-ocean sediments, and are often present within angiosperm-dominated blue carbon habitats20,21. Thus, a holistic blue carbon assessment must include all marine macrophyte lineages, including macroalgae.

Here, we fingerprint the contribution of marine macrophyte taxa to the eDNA pools of blue carbon habitats in the Arabian Red Sea (28°-18° N, 34°-41° E). Using sediments amended with known amounts of nine macrophyte species, we experimentally evaluated whether eDNA sequence abundance provides a proxy for the organic carbon contribution of marine macrophytes to blue carbon sediments. We used an 18S-V7 mini-barcode22 to identify and estimate the contribution of eDNA from seagrasses, mangroves and macroalgae to sediments of Red Sea coastal vegetated habitats; these contributions were compared with inferences of contributions to carbon stocks based on published δ13C and δ15N stable isotope signatures from the same sediments23 and Red Sea macrophytes24.

2.3 Methods

Experimental recovery of marine macrophyte eDNA in sediments

Before fingerprinting naturally occurring eDNA from Red Sea coastal sediments, we experimentally recovered eDNA of targeted macrophytes from sterilized sediment. We thus validated eDNA fingerprinting and correlated eDNA sequence abundance with organic carbon content within the macrophytes. To do this, mangrove sediment was collected at KAUST (22.340436, 39.089260), spread out in a thin layer (2 cm), and then air-dried. To remove pre-existing DNA, the sediment was sterilized by autoclaving for 120 min at 121 °C, and rinsed twice with 3% bleach for 10 min. Bleach was removed by rinsing three times with nuclease-free water. Sterilized sediment was dried at 65 °C, and kept at -20 °C until mixture preparation. Targeted macrophytes were collected from mangrove and coral reef habitats of the Red Sea around KAUST. These included the seagrasses Thalassodendron ciliatum, Halophila ovalis and Halodule uninervis; the mangrove Avicennia marina; the red macroalgae Laurencia sp. and Corallinales sp.; the green macroalga Ulva sp.; and the brown macroalgae Padina sp., Sargassum sp. and Saccharina latissima. Macrophyte tissue from fronds (macroalgae) and leaves (angiosperms) was cleaned by removing visible debris and epiphytes using a scalpel, then rinsed with milli-Q water to remove seawater. Clean macrophytes were dried at 65 °C, and then kept at 4 °C until quantification of DNA and organic carbon.

Quantification of DNA and organic carbon

To correlate recovered eDNA with organic carbon, we quantified DNA extracted from tissue and organic carbon per gram of dry macrophyte. Quantification of DNA was used as a baseline to determine whether the abundance of eDNA reflected the added concentrations rather than a PCR bias. Dry macrophytes (30 mg) were homogenized with a mortar and pestle using liquid nitrogen. DNA extraction was performed on five biological replicates using the NucleoSpin Plant II kit (Macherey-Nagel) according to the manufacturer's protocol. DNA concentration (per mg extracted) was quantified for each extraction in three measurement replicates using a Qubit 2.0 Fluorometer (Quant-IT dsDNA High Sensitivity Assay kit; Invitrogen). Organic carbon content of macrophyte tissue was measured by high-temperature combustion using a CHNS/O Analyzer (2400 II, PerkinElmer). Tissue from dry macrophytes (10 mg) was previously acidified to remove carbonates, using fumes from 3 M hydrochloric acid.

Experimental design and mixture preparation

The experiment included five replicates from three different mixtures of dry macrophyte tissue and sterilized sediment. Each mixture consisted of a combination of nine macrophytes, each one added at different weights reflecting variable percentages of organic carbon content (Supplementary Table 4). Quantifying the amount of DNA and organic carbon content per dry gram of each macrophyte, and the differential percentages added to each mixture, allowed us to compare how much macrophyte eDNA was recovered in relation to the initial DNA and organic carbon present in the sediment. To attain robust and parsimonious results, data from the three mixtures were pooled and eDNA reads were compared against added organic carbon and DNA. Together, the three mixtures provided three different DNA and organic carbon percentage data points for each of the nine macrophyte taxa (Supplementary Table 4).
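The comparison of recovered eDNA reads against added organic carbon can be sketched as a Pearson correlation. The snippet below is a minimal illustration, not the analysis script used in this study; the function name and all numeric values are hypothetical placeholders and are not taken from Supplementary Table 4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholders: % organic carbon added per taxon and the
# % of eDNA reads recovered for the same taxon (NOT the study's data).
added_corg_pct = [2.0, 5.0, 8.0, 12.0, 20.0, 25.0]
edna_reads_pct = [3.0, 4.0, 10.0, 11.0, 18.0, 27.0]

r = pearson_r(added_corg_pct, edna_reads_pct)
r_squared = r ** 2  # for a simple linear fit, R2 equals r squared
print(f"r = {r:.2f}, R2 = {r_squared:.2f}")
```

Under this reading, a coefficient of determination of R2 = 0.85 corresponds to a correlation coefficient of about r = 0.92.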

The mixture pools were prepared by mixing hand-shredded tissue of all the macrophytes with 189 g of sterile mangrove sediment and eight volumes of filtered (0.22 µm) seawater. Next, five replicates per mixture were incubated in enclosed plastic boxes (10 x 20 cm), and kept in the dark to avoid eDNA degradation by light. We assumed that degraded organic matter would sink within the compacted natural mangrove sediment, and thus eDNA would be protected from sunlight degradation. The mixtures were kept at room temperature. To test for eDNA contamination, the incubation included a control sample containing only sterile sediment devoid of macrophyte tissue.

To measure eDNA persistence over time, the mixtures were sampled over six weeks. Three samples of 1 g were collected from each replicate (five per mixture) during five sampling times (T1 = day 0, T2 = day 20, T3 = day 27, T4 = day 34, T5 = day 41). Each replicate's three samples were pooled together and stored at -80 °C until eDNA extraction.

Red Sea sampling sites and sediment collection

Contribution of marine macrophytes to sediment eDNA pools was assessed by collecting sediment samples from 35 seagrass meadows and 17 mangrove forests along the Saudi Arabian coast of the Red Sea (28°-18° N, 34°-41° E; Figure 4a, 4e). Mangrove sediments were collected in monospecific forests of Avicennia marina, by far the dominant mangrove species in the Red Sea, while seagrass sediments were collected in either monospecific or mixed meadows. Seagrass species found in the study sites at the time of sampling included Thalassodendron ciliatum, Cymodocea rotundata, Halodule uninervis, Thalassia hemprichii, Halophila stipulacea, H. ovalis, and H. decipiens. Thirty-eight macroalgae species identified by visual surveys in both habitats are listed in Supplementary Table 1.

One sediment sample was collected at each mangrove and seagrass habitat, from the surface sediment (top 1 cm). Samples consisted of five biological replicates (1 ml of sediment each) randomly collected at the location and pooled together into 5 ml of sediment sample. Each replicate was collected using a modified 25 ml plastic syringe. The barrel of the syringe was cut at the bottom (where needles are attached). Open syringe barrels were inserted into the sediment surface and closed at the top with a rubber stopper, then lifted out with the sediment sample and closed at the bottom with another stopper. Samples were taken using bleach-sterilized barrels, bags and gloves. Samples were kept on ice while transported to the lab either by car or aboard the research vessel. Once in the lab, the five replicates were pooled together and frozen at -80 °C until eDNA extraction. All species of seagrass, mangrove and macroalgae found in situ were collected, and a local DNA reference library was created as reported in Ortega et al. (2019)22.

DNA extraction, genotyping and library preparation

Intracellular and extracellular environmental DNA from the Red Sea coastal sediments, as well as intracellular DNA from the experiment, were isolated following the method developed by Lever et al. (2015)25. A step-by-step protocol of this extraction method can be found in Chapter 1. eDNA was amplified in five PCR replicates of 10 μl reactions, by adding 5 μl of QIAGEN Multiplex PCR Master Mix, 1 μl of 18S-V7 forward and reverse primers (Euka0226 F-TTTGTCTGSTTAATTSCG, R-CACAGACCTGTTATTGC), 1 μl of DNA template and 2 μl of PCR water. The PCR thermocycler program was 15 min at 95 °C, followed by 35 cycles of 30 s at 94 °C, 45 s at 55 °C and 1.5 min at 72 °C, and a final extension at 72 °C for 10 min. eDNA samples were diluted at 1:1, 1:10, 1:50 or 1:100 before PCR to improve amplification performance; these dilutions reduced the concentration of PCR inhibitors. To determine successful amplification, PCR products were visualized through capillary electrophoresis using QIAxcel (Qiagen, Hilden, Germany). eDNA extraction and amplification were performed under strictly sterilized conditions and in different areas of the lab facility. eDNA extraction blanks and PCR positive and negative controls were included. Amplicons were cleaned up and indexed following the Illumina Metagenomic Sequencing Library Preparation protocol (Illumina Inc., San Diego, USA), and sequenced using the MiSeq Illumina platform at the KAUST Biological Core Lab.
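The forward primer above contains the IUPAC degenerate base S (C or G), so it stands for a small pool of oligonucleotides rather than a single sequence. As an illustration only, the sketch below expands a degenerate primer into its explicit variants and builds a regular expression for locating it in a read. The helper names and the example read are hypothetical and are not part of the pipeline used in this study.

```python
import itertools
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every explicit sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in itertools.product(*choices)]


def primer_regex(primer):
    """Compile a regex that matches any variant of the primer."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


forward = "TTTGTCTGSTTAATTSCG"  # 18S-V7 forward primer quoted in the text
variants = expand_primer(forward)
print(len(variants))  # two S positions, two options each

# Hypothetical read fragment, used only to show the regex in action.
read = "ACGT" + "TTTGTCTGCTTAATTGCG" + "AAGG"
match = primer_regex(forward).search(read)
print(match.start() if match else None)
```

In practice, primer removal was done with Cutadapt, as described under Data analyses; the sketch only makes the meaning of the degenerate base explicit.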

Data analyses

Sequencing outputs were demultiplexed following the Illumina protocol, primers were trimmed using Cutadapt27 (default settings, version 1.17), and FASTQ files were analyzed using the DADA2 package28 (version 1.8.0) in RStudio29 (version 1.1.463). DADA2 filters, dereplicates, models and corrects substitution errors, identifies chimeras, and merges paired-end reads; parameters were left at their defaults except for truncLen = c(110,110). Taxonomy was assigned to the reads using the assignTaxonomy function in DADA2 with a custom reference library (available with details at https://github.com/ngeraldi/eDNA_reference_libraries) that includes barcodes of Red Sea macrophytes22 and the SILVA 132 reference database (SSURef_Nr99_tax, http://www.arb-silva.de).

For the experiment, we assessed whether the eDNA approach allows quantitative inferences or only provides presence-absence information on contributions of macrophytes to the organic pool in sediments. To do this, we compared macrophyte eDNA recovery at lower (e.g. species, genus) and at higher taxonomic levels (e.g. phylum). One-way analysis of variance (ANOVA) was used to test for differences in the abundance of eDNA reads recovered through time, at five different sampling times during a six-week sediment incubation (T1 = day 0, T2 = day 20, T3 = day 27, T4 = day 34, T5 = day 41). Analyses were done in PAST30 and in RStudio29 (version 1.1.463).
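A one-way ANOVA of this kind compares the variance among sampling times with the variance among replicates within each time. The sketch below computes only the F statistic and its degrees of freedom from first principles. The read counts are hypothetical, and the p-value, which requires the F distribution as provided by PAST or R, is deliberately left out.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of replicates around their group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical eDNA read counts for five replicates at each sampling time.
reads_by_time = {
    "T1": [1200, 1150, 1300, 1250, 1100],
    "T2": [1180, 1220, 1090, 1310, 1240],
    "T3": [1150, 1270, 1210, 1190, 1130],
    "T4": [1230, 1160, 1280, 1120, 1200],
    "T5": [1170, 1260, 1140, 1290, 1210],
}

f_stat, df_b, df_w = one_way_anova_f(list(reads_by_time.values()))
print(f"F({df_b},{df_w}) = {f_stat:.3f}")
```

With five sampling times and five replicates each, the degrees of freedom are 4 and 20.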

For the Red Sea sediments, we analyzed the abundance of eDNA sequences from marine macrophytes along the sampling sites. To assess how taxonomic richness (at lower level) and relative eDNA abundance of marine macrophytes are distributed within the habitat sediments, we calculated Pielou's equitability (J), Dominance (D) and Shannon diversity (H) indices using PAST30. We performed Bray-Curtis non-metric multidimensional scaling ordination (nMDS) and one-way permutational multivariate analysis of variance (PERMANOVA) to elucidate differences in taxonomic composition of marine macrophytes between seagrass and mangrove sediments, and across the latitudinal gradient. Data were standardized by dividing the total of marine macrophyte eDNA reads per habitat by the number of sediment samples collected from each habitat (seagrass habitat n=35, mangrove habitat n=17); data were log-transformed before running the analyses in RStudio29 (version 1.1.463) using the vegan31 package.
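The indices and the dissimilarity measure named above can be written out explicitly. The sketch below is a minimal illustration with hypothetical read counts. It uses common conventions (natural logarithm for H, D as the sum of squared proportions, J = H / ln S); that PAST and vegan were run with exactly these conventions is an assumption, as is the log(x + 1) form of the log transformation.

```python
import math


def diversity_indices(counts):
    """Richness S, Shannon H, Dominance D and Pielou J from read counts."""
    present = [c for c in counts if c > 0]
    total = sum(present)
    if total == 0:
        raise ValueError("sample contains no reads")
    props = [c / total for c in present]
    shannon = -sum(p * math.log(p) for p in props)  # H, natural logarithm
    dominance = sum(p * p for p in props)           # D = sum of p_i squared
    richness = len(present)                         # S = taxa with reads
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return {"S": richness, "H": shannon, "D": dominance, "J": pielou}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples (same taxon order)."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical read counts for the same four taxa in two sediment samples.
seagrass_sample = [500, 300, 150, 50]
mangrove_sample = [900, 50, 50, 0]

print(diversity_indices(seagrass_sample))
print(diversity_indices(mangrove_sample))

# Log-transform before computing dissimilarities; log(x + 1) is assumed
# here so that zero counts remain defined.
log_seagrass = [math.log1p(v) for v in seagrass_sample]
log_mangrove = [math.log1p(v) for v in mangrove_sample]
print(round(bray_curtis(log_seagrass, log_mangrove), 3))
```

A sample dominated by one taxon gives D close to 1 and J close to 0, whereas an even sample gives D = 1/S and J = 1.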

δ13C and δ15N analyses

To compare with the eDNA analyses, we evaluated the stable isotopic contribution of marine macrophytes to organic matter accumulation in sediments, based on reported isotopic data from Red Sea primary producers24 and vegetated sediments23. Although Garcias-Bonet et al.32 reported stable isotopic data23 from the same locations as the eDNA samples in our study, their data integrate the top 10 cm of the sediment, whereas our study focused on the top 1 cm sediment layer. Thus, based on reported δ13C and δ15N stable isotopic data23 from surface sediments (1 cm), we analyzed the relative contribution of marine macrophytes as potential sources of organic matter using a Bayesian isotopic mixing model (SIMM)33,34. SIMM infers the relative contributions of different isotopic sources (primary producers) to a mixture (sediment pool) by estimating the probability distribution of source contributions under a rigorous Bayesian statistical framework34. Analyses were done in RStudio29 (version 1.1.463) using the package simmr35.
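The mass-balance idea underlying an isotopic mixing model can be illustrated with a much simpler, deterministic calculation. The sketch below is not the Bayesian SIMM implemented in simmr and does not propagate any uncertainty: it solves the linear mixing equations exactly for three sources and two isotopes. The source signatures and the mixture are hypothetical values chosen for illustration only.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def mixing_fractions(sources, mixture):
    """Solve f1 + f2 + f3 = 1 together with the two isotope balances.

    sources: three (d13C, d15N) tuples; mixture: one (d13C, d15N) tuple.
    Returns the three source fractions, computed with Cramer's rule.
    """
    a = [
        [1.0, 1.0, 1.0],
        [s[0] for s in sources],
        [s[1] for s in sources],
    ]
    b = [1.0, mixture[0], mixture[1]]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("source signatures are collinear; no unique solution")
    fractions = []
    for col in range(3):
        m = [row[:] for row in a]
        for row in range(3):
            m[row][col] = b[row]
        fractions.append(det3(m) / d)
    return fractions


# Hypothetical (d13C, d15N) signatures, NOT values from the cited datasets.
sources = [(-8.0, 2.0), (-27.0, 4.0), (-15.0, 6.0)]
true_fractions = [0.5, 0.2, 0.3]

# Build a synthetic sediment signature from the known fractions.
mixture = (
    sum(f * s[0] for f, s in zip(true_fractions, sources)),
    sum(f * s[1] for f, s in zip(true_fractions, sources)),
)

estimated = mixing_fractions(sources, mixture)
print([round(f, 3) for f in estimated])
```

When source signatures overlap, the determinant approaches zero and the solution becomes unstable, which is the deterministic counterpart of the high uncertainty that a Bayesian model reports for overlapping sources.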

Data availability

Sequence datasets used in this study will be published in PANGAEA.

2.4 Results and discussion

eDNA contributions were then compared with inferences of contributions to carbon stocks based on published δ13C and δ15N isotopic signatures of the same sediments23, and of Red Sea macrophytes24. Since our eDNA samples came from the sediment surface (1 cm), and the stable isotope data integrate the top 10 cm of the sediment sampled, we re-analyzed those data restricting the inferences to the top 1 cm of the sediments. We used a Bayesian isotopic mixing model (SIMM)33,34 to evaluate the relative contribution of marine macrophytes to the Corg in the sediment.

Environmental DNA fingerprints blue carbon habitats

eDNA fingerprinting clearly differentiated major lineages of marine macrophytes in Red Sea coastal sediments. Seagrasses, mangroves, macroalgae (Rhodophyta, Chlorophyta and Phaeophyta) and land plants (Embryophyta) represented 10.7% of the total eDNA sequences from blue carbon habitats (7.5% in seagrass sediments, 16% in mangrove sediments), while the rest belonged to other eukaryotes (Figure 1a, b). Intracellular and extracellular eDNA showed similar percentages in mangrove sediments (16.5% and 16.1%, respectively), but varied in seagrass sediments (9.8% and 4.1%, respectively). As expected, unicellular organisms (fungi, microalgae and others; Supplementary Figure 1) were highly abundant in the intracellular eDNA pool. Subsequent analyses included both intracellular and extracellular sequences.

Although eDNA provided a high-resolution identification of marine macrophytes, taxonomic assignment of sequences is restricted by the breadth of sequences in the reference database. We increased macrophyte identification below family level from 36% to 76% after inclusion of a barcoding reference library from Red Sea macrophytes22. We identified 48 marine macrophyte taxa in the sediments, 37 of them at genus or species level (Table 1). Macroalgae were the most diverse lineage (40 taxa), followed by seagrasses (seven taxa) and one mangrove (Table 1).

Figure 1. Taxonomic composition of eukaryotes in Red Sea blue carbon habitats. Summary of environmental DNA sequences from seagrass (a) and from mangrove (b) habitats. Other includes metazoans, fungi, and autotrophic and heterotrophic microorganisms. Macrophytes represented 10.7% of the total eDNA sequences among both mangrove and seagrass habitats. c and d, Proportion of macrophytes, by order for macroalgae and by species for seagrasses and mangroves. Data are log-transformed.

Table 1. List of marine macrophytes found in mangrove and seagrass sediments. Sediment indicates percentage of the taxon among the total sequences of marine macrophytes, while Lineage indicates percentage of the taxon among their particular macrophyte lineage. Contribution of land Embryophyta is excluded in this table.

Column groups, in order: Mangrove % (Sediment, Lineage); Seagrass % (Sediment, Lineage); Both % (Sediment, Lineage). Taxa recovered in a single habitat show only that habitat's pair followed by the Both pair.

Chlorophyta
Caulerpa serrulata f. spiralis: 0.044, 1.1; 4.488, 66.5; 2.266, 42.1
Caulerpa taxifolia: 0.011, 0.2; 0.006, 0.1
Halimeda macroloba: 3.775, 94.0; 1.813, 26.9; 2.794, 51.9
Halimeda opuntia: 0.004, 0.1; 0.002, 0.0
Halimeda sp.: 0.174, 4.3; 0.087, 1.6
Bryopsidales: 0.016, 0.4; 0.303, 4.5; 0.160, 3.0
Chaetomorpha cf. brachygona: 0.001, 0.0; 0.001, 0.0
Chaetomorpha linum: 0.005, 0.1; 0.003, 0.0
Ulva sp1.: 0.093, 1.4; 0.047, 0.9
Ulva sp2.: 0.037, 0.6; 0.019, 0.3

Phaeophyceae
Dictyota humifusa: 0.038, 1.6; 0.019, 0.3
Padina sp.: 0.079, 0.8; 0.069, 3.0; 0.074, 1.2
Stypopodium sp.: 0.059, 2.6; 0.030, 0.5
Ectocarpales: 0.003, 0.0; 0.021, 0.9; 0.012, 0.2
Fucus vesiculosus: 0.289, 12.7; 0.144, 2.3
Sargassum sp.: 10.198, 98.7; 0.856, 37.5; 5.527, 87.6
Saccharina japonica: 0.006, 0.1; 0.003, 0.0
Laminariales: 0.048, 0.5; 0.024, 0.4
Other Phaeophyceae: 0.950, 41.6; 0.475, 7.5

Rhodophyta
Anotrichium licmophorum: 0.323, 0.8; 0.161, 0.8
Ceramium sp.: 0.741, 29.3; 0.231, 0.6; 0.486, 2.3
Ceramiaceae: 0.060, 2.4; 1.300, 3.3; 0.680, 3.2
Laurencia mcdermidiae: 0.020, 0.8; 1.115, 2.8; 0.568, 2.7
Laurencia sp.: 0.833, 2.1; 0.416, 2.0
Rhodomelaceae: 0.275, 10.9; 3.037, 7.6; 1.656, 7.8
Neosiphonia sp1.: 0.071, 0.2; 0.036, 0.2
Neosiphonia sp2.: 0.786, 2.0; 0.393, 1.8
Polysiphonia sp.: 0.037, 0.1; 0.019, 0.1
Polysiphonia fucoides: 0.005, 0.2; 0.870, 2.2; 0.438, 2.1
Corallina officinalis: 0.004, 0.2; 0.069, 0.2; 0.036, 0.2
Hydrolithon samoense: 0.002, 0.1; 0.076, 0.2; 0.039, 0.2
Hydrolithon sp.: 0.046, 0.1; 0.023, 0.1
Corallinaceae: 1.063, 2.7; 0.531, 2.5
Corallinales: 0.087, 0.2; 0.043, 0.2
Furcellaria lumbricalis: 0.165, 0.4; 0.082, 0.4
Hypnea charoides: 0.034, 0.1; 0.017, 0.1
Peyssonnelia sp.: 0.097, 0.2; 0.049, 0.2
Lithothamnion sp.: 1.339, 53.0; 29.725, 74.4; 15.532, 73.1
Rhodophysema elegans: 0.010, 0.4; 0.005, 0.0
Florideophyceae: 0.070, 2.8; 0.035, 0.2

Seagrass
Halodule uninervis: 0.050, 0.9; 9.564, 19.9; 4.807, 17.8
Thalassodendron ciliatum: 2.895, 6.0; 1.448, 5.4
Halophila ovalis: 5.597, 96.1; 16.233, 33.7; 10.915, 40.4
Halophila stipulacea: 0.163, 2.8; 17.587, 36.5; 8.875, 32.9
Hydrocharitaceae: 0.108, 0.2; 0.054, 0.2
Alismatales: 1.785, 3.7; 0.893, 3.3
Posidonia oceanica: 0.011, 0.2; 0.006, 0.0

Mangrove
Avicennia marina: 77.303, 100; 2.831, 100; 40.067, 100
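The two percentages reported in Table 1 follow directly from the definitions in the caption. As a minimal sketch with hypothetical read counts (the function name and the numbers are illustrative and are not taken from the sequencing data), they could be derived for one habitat as follows:

```python
def table_percentages(reads):
    """Compute Sediment % and Lineage % for each taxon in one habitat.

    reads: {lineage: {taxon: read_count}}
    Returns {taxon: (sediment_pct, lineage_pct)}, where sediment_pct is the
    share among all macrophyte reads and lineage_pct is the share within
    the taxon's own lineage.
    """
    total = sum(sum(taxa.values()) for taxa in reads.values())
    if total == 0:
        raise ValueError("no macrophyte reads in this habitat")
    result = {}
    for lineage, taxa in reads.items():
        lineage_total = sum(taxa.values())
        for taxon, count in taxa.items():
            sediment_pct = 100.0 * count / total
            lineage_pct = 100.0 * count / lineage_total if lineage_total else 0.0
            result[taxon] = (sediment_pct, lineage_pct)
    return result


# Hypothetical read counts for one habitat (NOT the study's data).
example_reads = {
    "Chlorophyta": {"Halimeda macroloba": 300, "Ulva sp.": 100},
    "Mangrove": {"Avicennia marina": 600},
}

for taxon, (sed, lin) in table_percentages(example_reads).items():
    print(f"{taxon}: sediment {sed:.1f}%, lineage {lin:.1f}%")
```

A lineage represented by a single taxon, as is the case for the mangrove lineage, necessarily has a Lineage value of 100%.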

Macrophyte diversity in Blue Carbon ecosystems

Macrophyte eDNA taxonomic composition differed between the two blue carbon habitats (PERMANOVA p = 0.001; F(1,50) = 9.5), but not along the latitudinal gradient within each habitat (for mangrove sediments p = 0.946 and F(2,16) = 0.47; for seagrass sediments p = 0.36 and F(2,34) = 1.06; Figure 2). Nevertheless, eDNA sequences of seagrass taxa were more prevalent in seagrass meadows in the Southern Red Sea (Figure 3a), while macroalgal eDNA abundance was higher in mangrove sediments in offshore islands (Northern Red Sea islands) relative to on-shore mangrove sediments (Figure 3e). High macroalgal abundance in island sediments is consistent with available evidence of offshore export of a large portion of macroalgal net primary production21. The taxonomic composition of mangrove sediments was more even and diverse than that of seagrass sediments, despite the lower macrophyte taxonomic richness (24 versus 39 taxa in seagrass sediments; Supplementary Table 2).

Figure 2. Bray-Curtis non-metric multidimensional scaling ordination comparing macrophyte taxonomic composition between seagrass and mangrove blue carbon pools, and along the latitudinal gradient of the Arabian coast of the Red Sea (North: 28°-26°, Center: 25°-24°, South: 22°-18°).

Figure 3. Contribution of macrophytes in blue carbon habitats. a and e, macrophyte eDNA contribution per sampling site along the latitudinal gradient. Pies display proportions of macrophyte lineages (seagrass, mangrove, embryophytes and macroalgae), while stacked bars also detail the proportions of the macroalgae Chlorophyta, Rhodophyta and Phaeophyta. b and f, overall eDNA contribution of each macrophyte lineage to the sediment. c and g, δ13C and δ15N isotopic contribution of each lineage to the sediments. d and h, stable isotope mixing model showing the signature of each macrophyte lineage within the sediments; overlapping isotopic signatures show a strong negative correlation and a high level of uncertainty in the model output.

Macroalgal eDNA was more abundant in seagrass sediments (39.8% of the marine macrophyte eDNA) than in mangrove sediments (16.6%). The seagrass sediment also had 39% seagrass sequences, 18.8% Embryophyta sequences, and relatively low eDNA contributions from mangroves (2.3%; Figure 3b). As expected, results were different in mangrove sediments, where mangrove taxa contributed 75.6% of the macrophyte sequences, with smaller contributions from seagrasses (5.7%) and terrestrial plants (2%; Figure 3f). Although the proportion of each macrophyte lineage varied depending on the habitat (Figure 2, 3b, and 3f), this was not the case for the major contributing taxa within the lineages. Halimeda macroloba (Chlorophyta), Sargassum sp. (Phaeophyta), Lithothamnion sp. (Rhodophyta), Avicennia marina (mangrove) and Halophila spp. (seagrasses) were the dominant taxa in both seagrass and mangrove sediments (Table 1; Figure 1c, d).

Avicennia marina, the prevalent mangrove of the Red Sea, was recovered in all mangrove sediments. No eDNA sequences from the other Red Sea mangrove, Rhizophora mucronata, were found, although five trees were spotted in one of the mangrove forests sampled. A few sequences (0.2%) from mangrove sediments were (most likely wrongly) assigned to the seagrass Posidonia oceanica, which is endemic to the Mediterranean.

Seagrass sediments were collected in monospecific or mixed meadows, where we found seven seagrass species in situ (Supplementary Table 3). Halophila spp. were the most abundant both in the eDNA pool (70.2% of the seagrass lineage) and in situ (present in 38.9% of the meadows), followed by Halodule uninervis (19.9% of eDNA and extant in 25% of meadows). Both genera include small, fast-growing and ephemeral species that remain dormant under the sediment for long periods36,37. Clones of Halophila and Halodule extend over large areas, and a meadow can become disconnected when intermediate rhizomes decay; separated clonal areas can be wrongly considered different meadows. In contrast, most living biomass of larger species such as Thalassodendron ciliatum grows above the sediment, has slow turnover, and is easily exported elsewhere by wave action. eDNA of T. ciliatum represented only 6% of the reads, while the species occurred in 12.5% of the meadows. No eDNA was identified for Cymodocea rotundata, the least prevalent seagrass (found extant in 1.4% of meadows), nor for Thalassia hemprichii, which was present in 22.2% of the meadows. Most likely, T. hemprichii was not recovered because it lacks a barcode reference or does not amplify with the 18S primers we used. Differences between the relative contributions of seagrass species to sediment eDNA and their occurrence in the meadows may be a bias associated with PCR. Nevertheless, these differences may reflect the dynamic nature of the macrophyte community, and indicate that blue carbon estimations based only on biomass may overrepresent species that are large and easy to find. Though restricted to available reference libraries, eDNA fingerprinting provides a snapshot of the seagrass community regardless of natural dynamic turnovers.

eDNA estimates resemble stable isotope estimates

The proportions of marine macrophytes in blue carbon sediments, based on eDNA analyses, were compared with the proportions of organic carbon they contributed based on isotopic δ13C and δ15N signatures. In seagrass sediments, whereas macroalgal eDNA was similar to seagrass eDNA (39.8% vs. 39%, respectively), the Bayesian stable isotope mixing model (SIMM) identified seagrasses as the largest contributors to the organic carbon pool (45%), followed by macroalgae (32%; Figure 3b, c). Our eDNA estimate for the seagrass contribution was lower than that of the only available study for seagrass sediments, where seagrasses contributed 88% of the macrophyte eDNA pool in Moreton Bay (Australia)13. However, that estimate may be inflated, as the study used a different barcoding region (rbcL gene) and did not recover macroalgal eDNA13. A previous SIMM estimate for Red Sea meadows concluded that seagrasses contribute 41% of sediment organic carbon38, while a global estimate indicated a contribution of 51% (range 33-62%)7. These global estimates of Corg in seagrass sediments include carbon sources that overlap seagrass isotopic signatures7, and vary by latitude and meadow species39. SIMM inferences of the relative contributions of mangroves and terrestrial plants to the Corg in seagrass sediments were 15% and 8%, respectively (Figure 3c).

In mangrove sediments, SIMM inferences on major contributors were similar to eDNA inferences. The sole mangrove Avicennia marina represented 77.3% of eDNA sequence abundance, while isotopic signatures indicated a mangrove lineage contribution of around 45% to the Corg pool (Figure 3f, g). To the best of our knowledge, there are no studies fingerprinting eDNA in mangrove habitats. A previous study showed that the isotopic signatures of Red Sea mangroves were similar to those of terrestrial embryophytes, and that together they contribute 60-70% of the Corg within the habitat40. Due to this signature overlap, discriminating mangrove from embryophyte contributions to the Corg pool remains imprecise based on isotopic analyses. Based on SIMM inferences, macroalgae, seagrasses and terrestrial embryophytes contributed 23%, 20% and 12% of organic carbon, respectively (Figure 3g).


eDNA analyses indicated that habitat-forming macrophytes were important contributors to their own habitat, but not to the other habitat. Seagrass taxa contributed only 5.8% to the macrophyte eDNA pool in mangrove sediments, while mangrove taxa contributed only 2.8% to the seagrass sediments. These eDNA proportions differed from the Corg proportions based on SIMM, where seagrasses and mangroves each contributed about 25% to the sediment of the other habitat. Stable isotope analyses may therefore overrepresent the Corg contribution of allochthonous seagrass and mangrove taxa in other habitats.

Previous stable isotope studies in Red Sea coastal sediments indicated a high isotopic signature overlap between seagrass, macroalgae and seston40; between mangrove and terrestrial embryophytes38; and between macroalgae and seston38. Thus, there is uncertainty over the contribution of each taxon. Moreover, macroalgal isotopic signatures have been treated collectively along with other marine carbon sources such as seagrass epiphytes, plankton, and zooxanthellae39,40.

Macroalgae are important contributors to sediment carbon sequestration

Macroalgal lineages (Rhodophyta, Chlorophyta and Phaeophyta) cover a broad phylogenetic spectrum41 and present diverse metabolic pathways for carbon fixation42. Their contribution to sediment Corg is highly uncertain, because macroalgal carbon isotopic signatures overlap with seagrass and mangrove signatures39,40. By contrast, DNA barcoding can identify the unique sequences of macroalgae22. Nevertheless, macroalgal DNA identification is challenging22,43,44 and requires considerable improvements in the available barcode reference libraries22. Although our eDNA analyses were limited by the number of taxa in the reference sequences (8,896 macrophyte taxa), we identified 40 taxa in blue carbon sediments (Table 1), many of which were not found extant in situ (Supplementary Table 1).

Our analyses show that macroalgae are significant contributors to the eDNA pool in blue carbon habitats (33% of total marine macrophyte eDNA; Table 1), and highlight the need to include macroalgae in future blue carbon assessments19,21,45. Likewise, most macroalgae recovered in the eDNA pool were found extant in situ in soft sediments of seagrass and mangrove habitats, indicating that growth of those species is not restricted to rocky substrates as generally assumed46. In addition, our eDNA results from coastal sediments aligned with those from the open ocean based on metagenomes21. In coastal sediments, Rhodophyta, Phaeophyta and Chlorophyta contributed 64.5%, 19.1% and 16.3% of macroalgal eDNA, respectively (seagrass and mangrove were excluded from the calculation; Table 1). In the open ocean, Rhodophyta, Phaeophyta and Chlorophyta contributed 62.8%, 25.7% and 11.5% of macroalgal eDNA, respectively (calculation based on the SCG dataset from Table 1 in Ortega et al.21). These results not only indicate similarities between macroalgal contributions in coastal and open ocean systems, but also support the view that our metabarcoding results reflect actual abundance.

eDNA quantifies contribution of marine macrophytes to blue carbon

Our results demonstrate that the eDNA approach provides results consistent with stable isotope analyses, the traditional method used for blue carbon assessments, and also provides a significant improvement in discriminating between marine macrophyte taxa. Such a level of discrimination between organic carbon contributors is unprecedented in blue carbon assessments to date. Furthermore, we wanted to validate whether the proportion of eDNA sequences can be an indicator of the proportion of organic carbon for identified taxa. To do this, we experimentally added known amounts of macrophyte Corg to sterilized sediments. We then incubated and sampled the sediment mixture over two months (cf. Methods). Our results showed a significant positive relationship between the amount of organic carbon added and the abundance of eDNA sequences (Figure 4). At higher taxonomic levels (Rhodophyta, Chlorophyta, Phaeophyta and seagrass), there was a 92% correlation (R2 = 0.85) between the initial Corg added to the sediment and the recovered eDNA sequences; there was a 64% correlation (R2 = 0.41) at lower taxonomic levels (e.g. species; Figure 4; more details in Supplementary Information).
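The percentages quoted with each R2 value appear to be correlation coefficients expressed as percentages, that is, the square root of the coefficient of determination. A minimal Python sketch of that conversion, assuming a simple linear regression with a positive slope (the function name is illustrative and not part of the original analysis):

```python
import math


def correlation_percent(r_squared: float) -> float:
    """Return the correlation coefficient, as a percentage, implied by R^2.

    Assumes a simple linear regression with a positive slope,
    so that r = +sqrt(R^2).
    """
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("R^2 must lie between 0 and 1")
    return 100.0 * math.sqrt(r_squared)


# R^2 values reported in the text for the two taxonomic levels.
for label, r2 in [("higher taxonomic level", 0.85),
                  ("lower taxonomic level", 0.41)]:
    print(f"{label}: R^2 = {r2:.2f} -> r = {correlation_percent(r2):.0f}%")
```

Under this assumption, the sketch recovers the 92% and 64% figures quoted for the higher and lower taxonomic levels.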

Because the contributions of targeted taxa to the eDNA and organic carbon pools have a robust and significant relationship, roughly 1:1 in all macrophyte lineages, our results provide evidence that eDNA analyses allow quantitative inferences on the relative contributions of macrophytes to blue carbon stocks.


Figure 4. Environmental DNA correlation to organic carbon. a, data based on higher taxonomic levels of macrophyte (Rhodophyta, Chlorophyta, Phaeophyta and seagrass). Results show a significant positive correlation between recovered eDNA reads abundance and initial content of organic carbon in the sediments. b, data based on lower taxonomic levels (species, genus or order).


Several taxa that were abundant in situ were not recovered in the eDNA pool, due to insufficient barcoding reference libraries. This was the case for the seagrasses Thalassia hemprichii and Halophila decipiens, and for ubiquitous macroalgae that were not recovered at all (e.g. Turbinaria), or were resolved only to genus level despite a high species diversity in sampled locations (e.g. Laurencia spp., Caulerpa spp. and Halimeda spp.).

Those missing species underline the need to improve barcoding of marine macrophytes, and particularly the need to build local reference libraries. Despite the limitations, this is the first study that taxonomically discriminates between seagrass, mangrove and macroalgae in blue carbon stocks, resolving many taxa down to species level.

Furthermore, our results confirm the important contribution of macroalgae, systematically ignored in much of the literature, in the assessments of Corg in coastal vegetated sediments.

eDNA, blue carbon and climate change

We demonstrate that eDNA is an unparalleled method to resolve contributions to blue carbon sediments. eDNA offers not only a good complement to traditional stable isotope analyses, but also unprecedented fingerprinting resolution of marine macrophytes at low taxonomic levels. A drawback of eDNA analyses is that Corg estimates based on eDNA abundance correlations do not account for organic carbon transferred through the food web. In contrast, stable isotope analyses do trace the carbon flowing in the sediment food web (macrophyte carbon consumed by protists, prokaryotes, invertebrates, etc.), as the isotopic signature of the source producers is preserved.


The Paris Agreement supports strategies to mitigate climate change, and stimulates development of new frameworks and technologies that enhance those strategies (https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement, accessed on 4 September 2019). Coastal vegetated ecosystems offer a promising nature-based solution to climate change mitigation, as they sequester carbon in both their living biomass and their sediments6,47. Similar to many other natural carbon sinks, blue carbon ecosystems are vulnerable to anthropogenic impacts and are being lost at a high rate4. Conservation measures can prevent further losses of carbon stocks in blue carbon sediments; however, adequate carbon accounting requires that the source of this carbon be resolved. Environmental DNA is a cost- and time-effective approach that provides vital data to improve carbon accounting in support of blue carbon strategies. Hence, we advocate for the inclusion of eDNA, along with the traditional stable isotope analyses, in current and future carbon assessments in blue carbon ecosystems.

2.5 Acknowledgements

This research was funded by King Abdullah University of Science and Technology through baseline funding provided to CMD. We thank CMOR staff, R/V Thuwal crew, Andrea Anton and members of the CMD research group for help during sampling. We also thank Rubén Díaz-Rúa and Wajitha J. Raja Mohamed Sait for their help in sample processing.


2.6 Supplementary Information

Supplementary Figure 1. Comparison between intracellular and extracellular DNA from coastal sediments. Vertebrates include both terrestrial and marine vertebrates (>90%, mostly dolphins); Arthropods include mainly marine organisms. Unicellular others include both autotrophic and heterotrophic microorganisms.


Supplementary Figure 2. Environmental DNA reads of targeted marine macrophytes during the incubation experiment. a, recovery of macrophyte eDNA reads in each sampling event over the two-month experiment (mean ± SE). b, decline of macrophyte eDNA reads in the sediment pool through time.


[Supplementary Figure 3, panels a, b and c. Upper row: higher taxonomic level; lower row: lower taxonomic level. Axes: percentage of recovered eDNA reads against percentage of initial organic carbon (a) and percentage of initial DNA (b); panel c compares the percentages of DNA, organic carbon and eDNA by macrophyte taxon. Regression fits printed on the plots: upper row, y = 1.0819x - 1.0306 (R2 = 0.8462) and y = 1.3749x - 10.01 (R2 = 0.5133); lower row, y = 0.4318x - 6.313 (R2 = 0.4095) and y = 1.5177x - 5.7522 (R2 = 0.6996).]

Supplementary Figure 3. Correlation between initial organic carbon and DNA content with recovered eDNA. a, Percentage of recovered eDNA reads against the percentage of organic carbon per gram of dry tissue added to the sediment. b, Percentage of eDNA recovered reads against the percentage of quantified DNA per gram of dry tissue added to the sediment. c, rank of these percentages: DNA, organic carbon and eDNA by taxa (at higher taxonomic levels in upper panel, e.g. phylum, class; and lower taxonomic level in lower panel, e.g. species, genus).


Supplementary Table 1. List of macroalgae taxa found in Red Sea coastal vegetated habitats. Since taxonomic keys are limited for macroalgae, there are uncertainties for several morphological identifications.

Lineage       Taxon
Chlorophyta   Halimeda sp.
Chlorophyta   Halimeda discoidea?
Chlorophyta   Halimeda macroloba
Chlorophyta   Halimeda incrassata
Chlorophyta   Caulerpa racemosa var. lamourouxii?
Chlorophyta   Caulerpa racemosa var. cylindracea?
Chlorophyta   Caulerpa serrulata
Chlorophyta   Caulerpa serrulata f. spiralis
Chlorophyta   Caulerpa brownii
Chlorophyta   Caulerpa sertularioides
Chlorophyta   Caulerpa taxifolia
Chlorophyta   Ulva sp.
Chlorophyta   Valonia aegagropila
Chlorophyta   Valonia ventricosa
Chlorophyta   Dictyosphaeria cavernosa
Phaeophyta    Padina boryana?
Phaeophyta    Padina boergesenii?
Phaeophyta    Padina sp.
Phaeophyta    Turbinaria ornata var. serrata
Phaeophyta    Sargassum mutium?
Phaeophyta    Sargassum olygocystum?
Phaeophyta    Sargassum ilicifolium?
Phaeophyta    Sargassum aspe
Phaeophyta    Ascophyllum nodosum
Phaeophyta    Dictyota crispata?
Phaeophyta    Dictyota humifusa?
Rhodophyta    Hypnea cornuta
Rhodophyta    Hypnea sp.
Rhodophyta    Gracilaria sp.
Rhodophyta    Laurencia mcdermidiae
Rhodophyta    Laurencia sp.
Rhodophyta    Laurencia papillosa?
Rhodophyta    Laurencia? sp1.
Rhodophyta    Laurencia? sp2.
Rhodophyta    Laurencia? sp3.
Rhodophyta    Jania pumila
Rhodophyta    Lithothamnion? sp.
Rhodophyta    Rhodophyta sp1.
Rhodophyta    Rhodophyta sp2.


Supplementary Table 2. Diversity indices of marine macrophytes by blue carbon habitat. Relative abundance of eDNA sequences is in reads per million (RPM), standardized by the number of sampling sites in each habitat. Mangrove habitat presented the highest diversity of macrophytes in the sediments (high abundance and equitability, low dominance; mean ± SD).

Index                     Mangrove       Seagrass
Taxa n                    24             39
eDNA (RPM)                7920           10568
Dominance D               0.3 (±0.01)    0.6 (±0.02)
Shannon Diversity (H)     1.8 (±0.02)    1.0 (±0.04)
Pielou Equitability (J)   0.8 (±0.01)    0.4 (±0.02)
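For readers unfamiliar with the indices in Supplementary Table 2, the sketch below shows how dominance, Shannon diversity and Pielou equitability can be computed from per-taxon read counts. It assumes the standard definitions (D as the sum of squared proportions, H with the natural logarithm, J = H / ln S); the exact settings used for the table are not stated here, and the read counts in the example are hypothetical, not data from this study.

```python
import math


def diversity_indices(read_counts):
    """Return (dominance D, Shannon H, Pielou J) for per-taxon read counts.

    Assumed standard definitions:
      D = sum(p_i^2), H = -sum(p_i * ln p_i), J = H / ln(S),
    where p_i is the proportion of taxon i and S the number of taxa present.
    """
    counts = [c for c in read_counts if c > 0]
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one taxon must have reads")
    proportions = [c / total for c in counts]
    dominance = sum(p * p for p in proportions)
    shannon = -sum(p * math.log(p) for p in proportions)
    richness = len(proportions)
    # J is undefined for a single taxon; 0.0 is returned here by convention.
    pielou = shannon / math.log(richness) if richness > 1 else 0.0
    return dominance, shannon, pielou


# Hypothetical communities: one perfectly even, one dominated by a single taxon.
for name, counts in [("even", [250, 250, 250, 250]),
                     ("skewed", [970, 10, 10, 10])]:
    d, h, j = diversity_indices(counts)
    print(f"{name}: D = {d:.2f}, H = {h:.2f}, J = {j:.2f}")
```

An even community gives low dominance and an equitability of 1, whereas a community dominated by one taxon gives high dominance and low equitability, which is the contrast the table draws between mangrove and seagrass sediments.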


Supplementary Table 3. Seagrass species composition in each seagrass habitat sampled (1 = present, 0 = absent). As the meadows presented 1 to 4 species, Incidence indicates the occurrence of each species relative to the total number of species occurrences (72) in the meadows. Hu, Halodule uninervis; Th, Thalassia hemprichii; Ho, Halophila ovalis; Hs, Halophila stipulacea; Hd, Halophila decipiens; Tc, Thalassodendron ciliatum; Cr, Cymodocea rotundata.

Site  Latitude  Longitude  Hu    Th     Ho     Hs     Hd     Tc     Cr    Meadow
1     28.0717   34.8457    1     1      1      1      0      0      0     Mixed
2     27.9807   35.2096    0     1      0      0      0      1      0     Mixed
3     27.6545   35.2887    1     0      0      1      1      1      0     Mixed
4     27.3003   35.6398    0     0      0      0      0      1      0     Monospecific
5     26.919    36.0066    0     0      0      0      0      0      1     Monospecific
6     26.9175   36.0076    1     1      1      1      0      0      0     Mixed
7     25.9894   36.6993    1     0      0      0      1      0      0     Mixed
8     25.7601   36.7069    1     0      0      0      0      1      0     Mixed
9     25.7177   36.5832    0     0      0      1      0      0      0     Monospecific
10    25.704    36.5681    0     0      1      1      1      0      0     Mixed
11    25.6816   36.7492    1     0      0      0      0      0      0     Monospecific
12    25.6529   36.5182    0     1      0      0      0      0      0     Monospecific
13    25.5201   36.7723    1     0      0      0      0      1      0     Mixed
14    25.3617   36.9109    1     0      1      0      1      0      0     Mixed
15    25.3585   36.9091    1     0      0      0      1      0      0     Mixed
16    25.1579   37.1627    0     0      0      0      0      1      0     Monospecific
17    25.1573   37.1627    0     0      0      0      0      1      0     Monospecific
18    25.1573   37.1627    0     0      0      0      0      1      0     Monospecific
19    24.1944   37.9304    1     1      1      1      0      0      0     Mixed
20    24.1944   37.9304    1     1      1      1      0      0      0     Mixed
21    22.9336   38.8803    1     0      1      1      0      0      0     Mixed
22    22.2487   39.0717    0     0      1      0      1      0      0     Mixed
23    22.2487   39.0717    0     0      1      0      1      0      0     Mixed
24    20.16     40.217     1     1      0      0      0      0      0     Mixed
25    20.1594   40.2154    1     1      0      0      0      0      0     Mixed
26    20.1578   40.221     1     1      0      0      0      0      0     Mixed
27    19.9602   40.155     1     1      0      1      0      0      0     Mixed
28    19.613    40.6447    1     1      0      0      1      1      0     Mixed
29    19.5115   40.7347    0     1      0      0      0      0      0     Monospecific
30    19.2707   40.8981    0     1      0      0      1      0      0     Mixed
31    18.52     41.0841    0     1      0      0      0      0      0     Monospecific
32    18.2289   41.3233    0     1      0      0      0      0      0     Monospecific
33    18.1503   41.531     0     1      0      0      0      0      0     Monospecific
34    18.1185   41.5654    1     0      0      0      0      0      0     Monospecific
35    18.1175   41.5598    0     0      1      0      0      0      0     Monospecific
Occurrence among 72 in total   18    16     10     9      9      9      1
Incidence among all sites      25%   22.2%  13.9%  12.5%  12.5%  12.5%  1.4%
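The Incidence row of Supplementary Table 3 follows from the Occurrence row by dividing each count by the 72 total species occurrences. A minimal Python sketch of that arithmetic (the dictionary and function names are illustrative; the counts are copied from the table):

```python
# Occurrence counts per species, as listed in Supplementary Table 3.
occurrences = {
    "Halodule uninervis": 18,
    "Thalassia hemprichii": 16,
    "Halophila ovalis": 10,
    "Halophila stipulacea": 9,
    "Halophila decipiens": 9,
    "Thalassodendron ciliatum": 9,
    "Cymodocea rotundata": 1,
}


def incidence_percent(counts):
    """Return each species' share (%) of the total number of occurrences."""
    total = sum(counts.values())
    return {species: 100.0 * n / total for species, n in counts.items()}


incidence = incidence_percent(occurrences)
for species, pct in incidence.items():
    print(f"{species}: {pct:.1f}%")

# Combined incidence of the three Halophila species.
halophila = sum(pct for sp, pct in incidence.items() if sp.startswith("Halophila"))
print(f"Halophila spp.: {halophila:.1f}%")
```

Summing the three Halophila species gives the 38.9% quoted for the genus in the main text.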


Supplementary Table 4. Description of the macrophyte treatment pools mixed with sediment. Sediment and Tissue indicate grams of dry weight added to each treatment. % Corg and % DNA indicate the share of the total macrophyte organic carbon and DNA in the tissue pool contributed by each macrophyte. These percentages were based on quantification of organic carbon (Corg g-1 DW) and DNA (DNA g-1 DW) per gram of dry weight of each macrophyte.


Treatment pool 1 (Sediment 189.55 g; Tissue 5.73 g)
Species                    % Corg   % DNA   Corg g-1 DW   DNA g-1 DW
Avicennia marina           24.81    6.99    0.41          30.03
Corallinales sp.           0.41     2.63    0.02          41.71
Halodule uninervis         9.93     3.61    0.32          30.65
Halophila ovalis           1        2.82    0.25          187.91
Laurencia sp.              19.46    11.27   0.29          44.48
Padina sp.                 12.94    15.1    0.3           92.98
Saccharina latissima       14.87    5.82    0.33          33.89
Sargassum sp.              7.81     50.63   0.34          584.86
Ulva prolifera             3.98     1.12    0.31          23.09

Treatment pool 2 (Sediment 189.115 g; Tissue 5.54 g)
Species                    % Corg   % DNA   Corg g-1 DW   DNA g-1 DW
Avicennia marina           11.98    1.63    0.41          30.03
Corallinales sp.           0.84     2.58    0.02          41.71
Halodule uninervis         7.94     1.39    0.32          30.65
Laurencia sp.              3.94     1.1     0.29          44.48
Padina sp.                 16.77    9.44    0.3           92.98
Saccharina latissima       1.01     0.19    0.33          33.89
Sargassum sp.              25.17    78.67   0.34          584.86
Thalassodendron ciliatum   19.92    4.32    0.45          53.11
Ulva prolifera             4.99     0.68    0.31          23.09

Treatment pool 3 (Sediment 189.201 g; Tissue 5.93 g)
Species                    % Corg   % DNA   Corg g-1 DW   DNA g-1 DW
Avicennia marina           3.95     0.77    0.41          30.03
Corallinales sp.           1.64     7.26    0.02          41.71
Halodule uninervis         1.03     0.26    0.32          30.65
Halophila ovalis           7.9      15.51   0.25          187.91
Laurencia sp.              11.63    4.66    0.29          44.48
Padina sp.                 25.98    20.97   0.3           92.98
Saccharina latissima       4.96     1.34    0.33          33.89
Sargassum sp.              9.92     44.47   0.34          584.86
Thalassodendron ciliatum   15.33    4.76    0.45          53.11


Supplementary Table 5. Taxonomic fingerprinting of eDNA from macrophytes in sediment samples. n indicates whether the taxon was included in two or three treatments (each treatment included five sampling times).

Lineage       Macrophyte                 n    Species   Genus   Order   Phylum
Mangrove      Avicennia marina           15   15        15      15      15
Seagrass      Halophila ovalis           15   15        15      15      15
Phaeophyta    Sargassum sp.              15   NA        15      15      15
Seagrass      Halodule uninervis         15   6         6       15      15
Seagrass      Thalassodendron ciliatum   10   9         9       15      10
Chlorophyta   Ulva sp.                   10   NA        9       9       10
Rhodophyta    Corallinales sp.           15   NA        NA      8       15
Rhodophyta    Laurencia mcdermidiae      15   9         9       15      15
Phaeophyta    Padina sp.                 15   NA        11      11      15
Phaeophyta    Saccharina latissima       15   13        14      14      15


2.7 Supplementary Results

Recovery of eDNA in sediment

eDNA was recovered from three macrophyte mixtures. Pooling them together, a total of 7,348,001 eukaryotic reads from 3,396 sequence variants were generated; 20.7% of reads, from 848 sequence variants, belonged to marine macrophyte taxa (seagrass, mangrove, Rhodophyta, Phaeophyta, Chlorophyta). PCR controls and extraction blanks included 1% and 3% of the total eukaryotic reads, respectively. The control sample (sediment without macrophyte tissue) included 5% of the total eukaryotic reads, and 19.9% of these belonged to the mangrove Avicennia marina; although sterilized, the sediment used in all the mixtures and in the control sample was collected in a mangrove habitat. Non-targeted taxa that may come from the non-sterilized macrophyte tissue belonged to land Embryophyta (25%, seagrass and mangrove excluded), microorganisms (27.4%), marine invertebrates (14.3%) and vertebrates (12.5%; including 9.2% fish).

Fingerprinting marine macrophyte eDNA in blue carbon sediments

Targeted macrophytes were identified at their lowest taxonomic level (e.g. species) in the eDNA pool of each mixture (Supplementary Table 5), but were not necessarily detected in all five sampling events per sample throughout the incubation period. The mangrove Avicennia marina, the seagrass Halophila ovalis and the brown alga Sargassum sp. were consistently fingerprinted at the species level in all samples, followed by the seagrass Thalassodendron ciliatum, the green alga Ulva sp. and the brown alga Saccharina latissima, which were recovered at their lowest taxonomic level in most of the samples (Supplementary Table 5).

eDNA persistence over time

We evaluated the relative eDNA persistence over time from the reads of all macrophyte mixtures. There were no significant differences in the abundance of recovered marine macrophyte eDNA through time: 11.5% of reads were recovered on day 0; 17.8% on day 20; 27.8% on day 27; 21.9% on day 34; and 20.8% on day 41 (ANOVA p = 0.893, F4,130 = 0.276; Supplementary Figure 2a). Overall, the relative eDNA abundance declined exponentially through the incubation period, with 21.1% of the total eDNA abundance persisting in the sediment at the end of the experiment (Supplementary Figure 2b).

eDNA correlation to organic carbon and DNA content

We excluded Avicennia marina from the analysis due to a high incidence of eDNA reads that introduced ambiguity in the results. This ambiguity remained after correcting the abundance of reads based on the control sample, that is, after removing 19.9% of A. marina eDNA sequences in each mixture. Our results showed significant positive relationships between the percentage of recovered eDNA reads and the percentages of DNA and organic carbon added to the sediment mixture (Supplementary Figure 3). At higher taxonomic levels (e.g. phylum or class), the initial DNA and organic carbon added to the sediment showed correlations of 71% (R2 = 0.51) and 81% (R2 = 0.66), respectively, with the proportion of eDNA reads (Supplementary Figure 3a). At lower taxonomic levels (e.g. species), the correlations between eDNA reads and the initial DNA and organic carbon added to the sediment were 83% (R2 = 0.69) and 71% (R2 = 0.51), respectively (Supplementary Figure 3b).

2.8 References

1 Duarte, C. M. & Cebrián, J. The fate of marine autotrophic production. Limnol. Oceanogr. 41, 1758-1766 (1996).
2 Donato, D. C. et al. Mangroves among the most carbon-rich forests in the tropics. Nat. Geosci. 4, 293-297 (2011).
3 Fourqurean, J. W. et al. Seagrass ecosystems as a globally significant carbon stock. Nat. Geosci. 5, 505-509 (2012).
4 McLeod, E. et al. A blueprint for blue carbon: toward an improved understanding of the role of vegetated coastal habitats in sequestering CO2. Front. Ecol. Environ. 9, 552-560 (2011).
5 Pendleton, L. et al. Estimating global “blue carbon” emissions from conversion and degradation of vegetated coastal ecosystems. PLoS One 7, e43542 (2012).
6 Herr, D. & Landis, E. Coastal blue carbon ecosystems. Opportunities for Nationally Determined Contributions. Policy Brief. Gland, Switzerland: IUCN. Washington, DC: TNC (2016).
7 Kennedy, H. et al. Seagrass sediments as a global carbon sink: Isotopic constraints. Global Biogeochem. Cycles 24 (2010).
8 Geraldi, N. R. et al. Fingerprinting blue carbon: Rationale and tools to determine the source of organic carbon in marine depositional environments. Front. Mar. Sci. 6, 263 (2019).
9 Macreadie, P. I. et al. The future of Blue Carbon science. Nat. Commun. 10, 1-13 (2019).
10 Brown, M. R. The amino-acid and sugar composition of 16 species of microalgae used in mariculture. J. Exp. Mar. Bio. Ecol. 145, 79-99 (1991).
11 Zidorn, C. Secondary metabolites of seagrasses (Alismatales and Potamogetonales; Alismatidae): Chemical diversity, bioactivity, and ecological function. Phytochemistry 124, 5-28 (2016).
12 Hebert, P. D. N., Gregory, T. R. & Savolainen, V. The promise of DNA barcoding for taxonomy. Syst. Biol. 54, 852-859 (2005).
13 Reef, R. et al. Using eDNA to determine the source of organic carbon in seagrass meadows. Limnol. Oceanogr. 62, 1254-1265 (2017).
14 Landenmark, H. K. E., Forgan, D. H. & Cockell, C. S. An estimate of the total DNA in the biosphere. PLoS Biol. 13, e1002168 (2015).
15 Thomsen, P. F. et al. Environmental DNA from seawater samples correlate with trawl catches of subarctic, deepwater fishes. PLoS One 11, e0165252 (2016).
16 Pilliod, D. S., Goldberg, C. S., Arkle, R. S., Waits, L. P. & Richardson, J. Estimating occupancy and abundance of stream amphibians using environmental DNA from filtered water samples. Can. J. Fish. Aquat. Sci. 70, 1123-1130 (2013).
17 Hirai, J., Katakura, S., Kasai, H. & Nagai, S. Cryptic zooplankton diversity revealed by a metagenetic approach to monitoring metazoan communities in the coastal waters of the Okhotsk Sea, Northeastern Hokkaido. Front. Mar. Sci. 4, 379 (2017).
18 Queirós, A. M. et al. Connected macroalgal-sediment systems: blue carbon and food webs in the deep coastal ocean. Ecol. Monogr. 89, e01366 (2019).
19 Krause-Jensen, D. et al. Sequestration of macroalgal carbon: the elephant in the Blue Carbon room. Biol. Lett. 14, 20180236 (2018).
20 Raven, J. Blue carbon: past, present and future, with emphasis on macroalgae. Biol. Lett. 14, 20180336 (2018).
21 Ortega, A. et al. Important contribution of macroalgae to oceanic carbon sequestration. Nat. Geosci. 12, 748-754 (2019).
22 Ortega, A. et al. A DNA mini-barcode for marine macrophytes. Under review (2019).
23 Garcias-Bonet, N. et al. in Supplement to: Garcias-Bonet, N et al. (2019): Carbon and nitrogen concentrations, stocks, and isotopic compositions in Red Sea seagrass and mangrove sediments. Front. Mar. Sci., 6, 267, https://doi.org/10.3389/fmars.2019.00267, https://doi.org/10.1594/PANGAEA.895644 (2018).
24 Anton, A. et al. in Supplement to: Duarte, Carlos M; Delgado-Huertas, Antonio; Anton, Andrea; Carrillo-de-Albornoz, Paloma; Lopez-Sandoval, Daffne C; Agustí, Susana; Almahasheer, Hanan; Marba, Nuria; Hendriks, Iris; Krause-Jensen, Dorte; Garcias-Bonet, Neus (2018): Stable isotope (δ13C, δ15N, δ18O, δD) composition and nutrient concentration of Red Sea primary producers. Front. Mar. Sci., 5, https://doi.org/10.3389/fmars.2018.00298, https://doi.org/10.1594/PANGAEA.892848 (2018).
25 Lever, M. A. et al. A modular method for the extraction of DNA and RNA, and the separation of DNA pools from diverse environmental sample types. Front. Microbiol. 6 (2015).
26 Guardiola, M. et al. Deep-Sea, Deep-Sequencing: metabarcoding extracellular DNA from sediments of marine canyons. PLoS One 10, e0139633 (2015).
27 Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17, pp-10 (2011).
28 Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581 (2016).
29 RStudio: integrated development for R (RStudio, Inc., Boston, MA, 2015).
30 Hammer, Ø., Harper, D. A. T. & Ryan, P. D. PAST: Paleontological Statistics Software Package for Education and Data Analysis. Palaeontol. Electron. 4, 9 (2001).
31 Oksanen, J. et al. Vegan: Community ecology package. R Doc 10, 631-637 (2007).
32 Garcias-Bonet, N. et al. Carbon and nitrogen concentrations, stocks, and isotopic compositions in Red Sea seagrass and mangrove sediments. Front. Mar. Sci. 6, 267 (2019).
33 Parnell, A. C., Inger, R., Bearhop, S. & Jackson, A. L. Source partitioning using stable isotopes: coping with too much variation. PLoS One 5, e9672 (2010).
34 Phillips, D. L. et al. Best practices for use of stable isotope mixing models in food-web studies. Can. J. Zool. 92, 823-835 (2014).
35 Parnell, A. C. simmr: A Stable Isotope Mixing Model. R package version 0.4.1. https://CRAN.R-project.org/package=simmr (2019).
36 Waycott, M., Longstaff, B. J. & Mellors, J. Seagrass population dynamics and water quality in the Great Barrier Reef region: a review and future research directions. Mar. Pollut. Bull. 51, 343-350 (2005).
37 Hovey, R. K. et al. Strategy for assessing impacts in ephemeral tropical seagrasses. Mar. Pollut. Bull. 101, 594-599 (2015).
38 Serrano, O., Almahasheer, H., Duarte, C. M. & Irigoien, X. Carbon stocks and accumulation rates in Red Sea seagrass meadows. Sci. Rep. 8, 15037 (2018).
39 Miyajima, T. et al. Geographic variability in organic carbon stock and accumulation rate in sediments of East and Southeast Asian seagrass meadows. Global Biogeochem. Cycles 29, 397-415 (2015).
40 Almahasheer, H. et al. Low Carbon sink capacity of Red Sea mangroves. Sci. Rep. 7, 9700 (2017).
41 Guiry, M. D. How many species of algae are there? J. Phycol. 48, 1057-1063 (2012).
42 Kremer, B. P. Light independent carbon fixation by marine macroalgae. J. Phycol. 15, 244-247 (1979).
43 Saunders, G. W. Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1879 (2005).
44 Saunders, G. W. & Kucera, H. An evaluation of rbcL, tufA, UPA, LSU and ITS as DNA barcode markers for the marine green macroalgae. Cryptogamie Algol. 31, 487 (2010).
45 Krause-Jensen, D. & Duarte, C. M. Substantial role of macroalgae in marine carbon sequestration. Nat. Geosci. 9, 737-742 (2016).
46 Coppejans, E., Beeckman, H. & De Wit, M. in The Ecology of Mangrove and Related Ecosystems 59-75 (Springer, 1992).
47 Nellemann, C. et al. Blue carbon: the role of healthy oceans in binding carbon. A rapid response assessment (2009).


PART II

Fingerprinting Blue Carbon in the Open Ocean

Chapter 3

Important contribution of macroalgae to oceanic carbon

This chapter is published in Nature Geoscience. The full citation is: Alejandra Ortega, Nathan R. Geraldi, Intikhab Alam, Allan A. Kamau, Silvia G. Acinas, Ramiro Logares, Josep M. Gasol, Ramon Massana, Dorte Krause-Jensen and Carlos M. Duarte (2019). Important contribution of macroalgae to oceanic carbon sequestration. Nature Geoscience, 12, 748–754. doi:10.1038/s41561-019-0421-8.

Author contributions: C.M.D. and D.K.-J. conceived the research. J.M.G., S.G.A., R.L., R.M., I.A. and A.A.K. produced and curated the metagenomic data. A.O., C.M.D., N.R.G. and I.A. conducted the data analysis. A.O. and C.M.D. wrote the manuscript. All co-authors contributed to improving the manuscript and approved the submission.

3.1 Abstract

The role of macroalgae in Blue Carbon assessments has been controversial, partially due to uncertainties about the fate of exported macroalgae. Available evidence suggests that macroalgae are exported to reach the open ocean and the deep sea. Nevertheless, this evidence lacks systematic assessment. Here, we provide robust evidence of macroalgal export beyond coastal habitats. We used metagenomes and metabarcodes from the global expeditions Tara Oceans and Malaspina 2010 Circumnavigation. We discovered macroalgae worldwide at up to 5,000 km from coastal areas. We found 24 orders, most of which belong to the phylum Rhodophyta. The diversity of macroalgae was similar across oceanic regions, although the assemblage composition differed. The South Atlantic Ocean presented the highest macroalgal diversity, whereas the Red Sea was the least diverse region. The abundance of macroalgae sequences attenuated exponentially with depth at a rate of 37.3% km−1, and only 24% of macroalgae available at the surface were expected to reach the seafloor at a depth of 4,000 m. Our findings indicate that macroalgae are exported across the open and the deep ocean, suggesting that macroalgae may be an important source of allochthonous carbon, and their contribution should be considered in Blue Carbon assessments.

3.2 Main

Coastal habitats are highly productive ecosystems that contribute greatly to global carbon sequestration1,2. Seagrass meadows, salt marshes and mangrove forests have complex root systems that sequester large amounts of carbon in soft sediments within their habitat3-6. Macroalgae have been neglected in blue carbon assessments7,8, because most of them lack root systems, grow on rocky substrate, and do not accumulate carbon-rich sediments. However, macroalgae form the most extensive and productive vegetated coastal habitat, exporting over 44% of their primary production1,7,9. Calculations suggest that 25% of exported macroalgal carbon is sequestered in long-term reservoirs, such as coastal sediments and the deep sea1,7.

Based on first-order calculations7, it is hypothesized that macroalgae globally support an export of 679 Tg C year−1. Most of this carbon is remineralized or grazed in coastal environments, or cast onshore, while 14 Tg C year−1 is sequestered in coastal sediments and 152 Tg C year−1 could be sequestered in the deep sea7. Although there is a lack of empirical data, these calculations are supported by anecdotal evidence from sightings of long-distance macroalgae rafting10 and presence in deep-sea sediments7. This evidence is dominated by observations of large biomasses of brown macroalgae (Phaeophyta), whereas observations of red (Rhodophyta) and green (Chlorophyta) macroalgae are few10. This evidence imbalance could be related to lineage-specific features of the macroalgae cell wall composition and differences in cell-degradation rates11. Furthermore, most calculations of macroalgal primary production suggest that macroalgal carbon is exported as dissolved and particulate organic carbon (DOC and POC)12,13, which are not visually detectable. An inclusive method, such as the identification of macroalgal environmental DNA (eDNA), could provide evidence of macroalgal carbon export in the ocean, and may allow the required systematic and consistent assessments. eDNA is the DNA left behind by organisms in the surrounding environment, including degraded cell tissues, gametes, animal feces, and so on. As DNA comprises approximately 3% of cellular organic carbon14, the presence of macroalgal DNA in waters beyond macroalgal habitats is both an indicator of the presence of the species and evidence (not necessarily quantitative) of the export of macroalgal carbon.

Here, we examined the presence and relative abundance of Rhodophyta, Phaeophyta, and Chlorophyta macroalgal eDNA sequences in the ocean. The sequences were derived from hundreds of metagenomes generated by two global expeditions: Tara Oceans15 and Malaspina 2010 Circumnavigation16. These expeditions surveyed the global ocean from surface to 4,000 m depth, and sequenced the particulate material present in environmental water samples17,18 (see Methods). Although the expeditions primarily assessed the microbial and planktonic diversity, they also generated a global DNA resource that allows identification of multicellular eukaryotes. We exploited the potential of this eukaryotic eDNA resource to explore the presence of macroalgae in the global ocean. This holistic approach has not been attempted before; it is semiquantitative, and it provides a consistent basis for evaluating the hypothesis that macroalgal material is broadly exported across the global ocean.

We identified macroalgae using two global ocean datasets. The first included 163 metabarcodes of amplicon 18S rDNA from Tara Oceans19. The second included 417 metagenomes pooled from the Tara Oceans20 and Malaspina21 expeditions (see Methods). We used two different strategies for the second dataset: (1) a query targeting all genes, and (2) restriction of the query to the top-four single-copy protein-encoding genes (SCG) available in the gene catalogue of both expeditions. Since macroalgae taxonomy is not well covered in barcoding and genome reference libraries22, we used order instead of species as the taxonomical level for macroalgae identification.

3.3 Tracing macroalgae

Combining the results of the three independent datasets, 24 macroalgae orders were identified within the particulate organic matter (POM) of the water column. Both metagenomic approaches (all genes and SCG) delivered seventeen orders, of which ten were shared between the two approaches (Table 1). Only six orders were detected in the amplicon 18S rDNA metabarcodes, all of which were also found in the metagenomes. Rhodophyta was the most common macroalgal lineage (18 orders: 12 in all genes, 13 in SCG, and four in the 18S dataset), followed by Phaeophyta (four orders: two unique in each metagenomic approach), and Chlorophyta (two orders: two in all genes, and one shared in both SCG and 18S datasets; Table 1).

The relative abundance of macroalgal DNA varied between oceanic basins and datasets. The Mediterranean Sea presented the highest abundance of sequences in both 18S and all genes datasets, while the South Atlantic Ocean was the most abundant in the SCG dataset. The basins with the fewest sequences were the Red Sea in the 18S dataset, the Southern Ocean in all genes and the Mediterranean Sea in the SCG dataset. Similarly, the relative abundance of sequences per order differed greatly. Cyanidiales (Rhodophyta) and Ectocarpales (Phaeophyta) jointly accounted for 57% of the macroalgal sequences in both metagenomic datasets, although they were absent from the amplicon 18S dataset, whose most abundant order was Prasiolales (Chlorophyta) with 53% of all macroalgal sequences (Table 1).


Table 1. Relative abundance of macroalgal DNA. Data: SCG, single-copy genes; AG, all genes; 18S, amplicon 18S rDNA metabarcodes. IO, Indian Ocean; MS, Mediterranean Sea; NAO and SAO, North and South Atlantic Ocean; NPO and SPO, North and South Pacific Ocean; RS, Red Sea; and SO, Southern Ocean. Order abundance indicates the percentage of each order among total macroalgal DNA sequences per dataset. Order prevalence indicates the percentage of metagenomes/metabarcodes (n) where the order was present. The number (n) of metagenomes (for both SCG/all genes) or metabarcodes (18S) per region, respectively, were: 87 and 11 for the Indian Ocean; 19 and 18 for the Mediterranean Sea; 44 and 2 for the North Atlantic Ocean; 101 and 21 for the South Atlantic Ocean; 51 and 0 for the North Pacific Ocean; 88 and 18 for the South Pacific Ocean; 13 and 2 for the Red Sea; and 14 and 6 for the Southern Ocean. Cyanidiales represented 35% of the SCG dataset (reads per million ± s.e.) and was present in 72% of the SCG metagenomes. Prasiolales dominated the 18S dataset (53% of the total sequences, metagenomic Illumina tags ± s.e.) and was found in 66% of the metabarcodes.

| Data | Lineage | Order | IO | MS | NAO | NPO | RS | SAO | SO | SPO | Total | Order abundance | Order prevalence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SCG | Chlorophyta | Prasiolales | 1.3 (±0.6) | 0.9 (±0.4) | 2.2 (±0.5) | 0.7 (±0.4) | 0.3 (±0.3) | 1.6 (±0.3) | 2.5 (±1.4) | 1.2 (±0.2) | 10.66 | 11.46 | 40.05 |
| SCG | Phaeophyta | Ectocarpales | 1.1 (±0.3) | 4.1 (±1.4) | 2.1 (±0.5) | 3.7 (±1.2) | 0.9 (±0.3) | 2.1 (±0.3) | 4.3 (±1.5) | 2.1 (±0.4) | 20.35 | 21.87 | 49.40 |
| SCG | Phaeophyta | Fragilariales | 0.2 (±0.1) | 0.4 (±0.1) | 0.9 (±0.3) | 0.1 (±0.0) | 0.4 (±0.2) | 0.6 (±0.2) | - | 0.3 (±0.2) | 2.89 | 3.11 | 17.75 |
| SCG | Phaeophyta | Laminariales | 0.2 (±0.1) | - | - | 0.2 (±0.2) | - | 0.2 (±0.1) | - | - | 0.71 | 0.77 | 4.56 |
| SCG | Rhodophyta | Bangiales | - | - | - | - | - | - | 0.1 (±0.1) | - | 0.13 | 0.14 | 1.44 |
| SCG | Rhodophyta | Batrachospermales | - | - | - | - | - | 0.4 (±0.3) | 1.0 (±0.9) | - | 1.37 | 1.47 | 2.88 |
| SCG | Rhodophyta | Bonnemaisoniales | - | - | - | 0.1 (±0.1) | - | - | - | - | 0.11 | 0.12 | 0.72 |
| SCG | Rhodophyta | Ceramiales | - | - | - | - | - | - | - | - | 0.05 | 0.05 | 0.96 |
| SCG | Rhodophyta | Corallinales | 0.1 (±0.0) | 0.1 (±0.1) | 0.3 (±0.2) | 0.1 (±0.0) | - | 1.3 (±0.5) | - | 0.1 (±0.0) | 1.99 | 2.14 | 15.83 |
| SCG | Rhodophyta | Cyanidiales | 3.0 (±0.7) | 1.8 (±0.7) | 3.6 (±0.8) | 2.5 (±0.5) | 7.6 (±4.9) | 4.8 (±1.1) | 2.6 (±0.7) | 7.1 (±1.2) | 32.86 | 35.31 | 72.42 |
| SCG | Rhodophyta | Gelidiales | 0.1 (±0.0) | 0.2 (±0.2) | - | 0.1 (±0.0) | - | 0.5 (±0.2) | 0.1 (±0.1) | 0.1 (±0.0) | 1.06 | 1.14 | 6.24 |
| SCG | Rhodophyta | Gigartinales | 1.2 (±0.2) | 0.1 (±0.1) | 0.8 (±0.2) | 1.1 (±0.3) | 0.6 (±0.3) | 0.7 (±0.2) | 0.9 (±0.6) | 1.1 (±0.2) | 6.42 | 6.90 | 46.52 |
| SCG | Rhodophyta | Halymeniales | 0.1 (±0.1) | - | 0.2 (±0.1) | 0.4 (±0.2) | - | 0.3 (±0.1) | 0.2 (±0.2) | 0.2 (±0.1) | 1.47 | 1.58 | 9.35 |
| SCG | Rhodophyta | Nemaliales | 0.5 (±0.2) | 0.1 (±0.0) | 2.3 (±0.8) | 1.1 (±0.4) | 0.1 (±0.1) | 2.3 (±0.6) | 2.4 (±2.1) | 3.6 (±0.7) | 12.30 | 13.22 | 32.61 |
| SCG | Rhodophyta | Palmariales | - | - | - | - | - | - | 0.1 (±0.1) | - | 0.16 | 0.18 | 1.44 |
| SCG | Rhodophyta | Plocamiales | - | - | - | - | - | - | - | - | 0.01 | 0.01 | 0.24 |
| SCG | Rhodophyta | Rhodymeniales | - | - | - | - | - | 0.3 (±0.1) | - | 0.2 (±0.0) | 0.50 | 0.54 | 13.19 |
| AG | Chlorophyta | Prasiolales | 0.0 (±0.1) | - | 0.0 (±0.1) | - | - | 0.1 (±0.3) | 0.2 (±0.2) | - | 0.67 | 1.21 | 7.91 |
| AG | Chlorophyta | Ulvales | - | - | - | - | - | 0.1 (±0.1) | - | - | 0.16 | 0.28 | 1.44 |
| AG | Phaeophyta | Ectocarpales | 0.4 (±1.5) | 0.1 (±0.4) | 0.7 (±2.5) | 0.6 (±1.8) | 0.7 (±2.0) | 0.9 (±2.0) | 0.3 (±0.5) | 0.7 (±3.1) | 13.83 | 25.10 | 30.22 |
| AG | Phaeophyta | Fucales | - | - | - | - | - | - | - | - | 0.03 | 0.06 | 0.48 |
| AG | Phaeophyta | Laminariales | - | - | - | - | - | - | - | - | 0.04 | 0.07 | 6.71 |
| AG | Rhodophyta | Balliales | 0.4 (±0.6) | - | - | 0.1 (±0.1) | - | 0.6 (±0.8) | 0.2 (±0.3) | 0.4 (±0.6) | 2.39 | 4.33 | 6.71 |
| AG | Rhodophyta | Bangiales | 0.8 (±1.5) | 0.3 (±0.5) | 0.1 (±0.5) | 0.3 (±0.8) | 0.1 (±0.1) | 0.3 (±1.0) | 0.9 (±1.4) | 0.2 (±0.7) | 6.54 | 11.87 | 33.09 |
| AG | Rhodophyta | Ceramiales | - | - | - | - | - | - | 0.1 (±0.1) | - | 0.08 | 0.15 | 0.48 |
| AG | Rhodophyta | Corallinales | - | - | - | - | - | - | - | - | 0.02 | 0.04 | 0.48 |
| AG | Rhodophyta | Cyanidiales | 0.2 (±0.6) | 8.3 (±9.0) | 0.4 (±1.6) | 0.4 (±1.4) | 0.4 (±0.7) | 0.5 (±1.5) | 0.6 (±1.8) | 0.2 (±1.3) | 17.85 | 32.41 | 44.12 |
| AG | Rhodophyta | Gigartinales | 0.1 (±0.2) | - | 0.1 (±0.1) | 0.2 (±0.2) | - | 0.1 (±0.2) | 0.1 (±0.1) | 0.1 (±0.1) | 0.95 | 1.72 | 5.28 |
| AG | Rhodophyta | Gracilariales | 0.1 (±0.1) | - | - | 0.3 (±0.3) | - | 0.1 (±0.2) | - | - | 0.58 | 1.05 | 1.92 |
| AG | Rhodophyta | Hapalidiales | - | - | - | - | - | - | - | - | 0.04 | 0.08 | 0.48 |
| AG | Rhodophyta | Nemaliales | 0.1 (±0.2) | 0.1 (±0.2) | 0.3 (±0.9) | 0.1 (±0.3) | 0.6 (±1.3) | 0.5 (±1.2) | - | 0.1 (±0.3) | 4.38 | 7.94 | 24.22 |
| AG | Rhodophyta | Palmariales | - | - | - | 0.2 (±0.2) | 2.4 (±4.6) | - | - | - | 4.92 | 8.93 | 3.84 |
| AG | Rhodophyta | Porphyridiales | - | - | - | - | - | - | - | - | 0.02 | 0.04 | 0.24 |
| AG | Rhodophyta | Stylonematales | 0.0 (±0.1) | 0.2 (±0.3) | 0.2 (±0.5) | 0.1 (±0.2) | 0.1 (±0.2) | 0.3 (±1.0) | - | 0.1 (±0.4) | 2.60 | 4.71 | 19.42 |
| 18S | Chlorophyta | Prasiolales | 9.7 (±2.8) | 1.0 (±0.6) | 0.8 (±0.5) | - | 1.5 (±0.5) | 13.5 (±7.2) | 10.7 (±3.5) | 4.2 (±1.1) | 41.27 | 53.20 | 66.23 |
| 18S | Rhodophyta | Ceramiales | - | 16.8 (±14.6) | - | - | - | - | - | - | 16.78 | 21.63 | 12.99 |
| 18S | Rhodophyta | Gigartinales | - | 0.2 (±0.2) | - | - | - | - | - | - | 0.22 | 0.29 | 2.60 |
| 18S | Rhodophyta | Porphyridiales | 0.4 (±0.3) | 8.4 (±8.0) | 1.8 (±3.5) | - | - | 4.3 (±3.4) | 0.5 (±0.3) | 0.4 (±0.1) | 15.64 | 20.16 | 35.06 |
| 18S | Rhodophyta | Rhodymeniales | - | 3.7 (±3.5) | - | - | - | - | - | - | 3.67 | 4.73 | 3.90 |
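The two summary columns of Table 1 can be related to the raw values with a short calculation. The sketch below is only an illustration: it recomputes order abundance from the SCG "Total" column and order prevalence from a sample count. The function names are hypothetical, and the count of 302 metagenomes in the usage lines is an assumption back-calculated from the reported 72.42% prevalence and n = 417, not a value taken from the original data.

```python
# SCG "Total" column of Table 1 (reads per million), keyed by order.
SCG_TOTALS = {
    "Prasiolales": 10.66, "Ectocarpales": 20.35, "Fragilariales": 2.89,
    "Laminariales": 0.71, "Bangiales": 0.13, "Batrachospermales": 1.37,
    "Bonnemaisoniales": 0.11, "Ceramiales": 0.05, "Corallinales": 1.99,
    "Cyanidiales": 32.86, "Gelidiales": 1.06, "Gigartinales": 6.42,
    "Halymeniales": 1.47, "Nemaliales": 12.30, "Palmariales": 0.16,
    "Plocamiales": 0.01, "Rhodymeniales": 0.50,
}


def order_abundance(totals):
    """Percentage of each order among all macroalgal reads in one dataset."""
    grand_total = sum(totals.values())
    return {order: 100.0 * value / grand_total for order, value in totals.items()}


def order_prevalence(n_present, n_samples):
    """Percentage of samples (metagenomes or metabarcodes) containing the order."""
    return 100.0 * n_present / n_samples


abundance = order_abundance(SCG_TOTALS)
print(f"Cyanidiales abundance: {abundance['Cyanidiales']:.1f}%")
# Assumed count: 302 of the 417 metagenomes (back-calculated, see lead-in).
print(f"Cyanidiales prevalence: {order_prevalence(302, 417):.2f}%")
```

Because the totals in Table 1 are already rounded, recomputed percentages can differ from the published column in the last decimal place.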


Our pioneering attempt to trace macroalgal eDNA from POM in the global ocean is challenging for two reasons. First, the phylogenetic diversity of macroalgae is so great that the three lineages are as distant from each other as are mushrooms from elephants8. Second, macroalgal sequences are poorly represented in reference libraries. Metagenomic and metabarcoding identification is restricted to previously sequenced taxa that are available in published databases. Sequencing efforts on macroalgae are rather limited, with only one full genome sequenced23. Half of the 24 orders identified here are not included in the SILVA 18S rDNA reference library (http://www.arb-silva.de, accessed in July 2018). Furthermore, SILVA includes only 1,068 macroalgae species, compared with 12,471 species reported in AlgaeBase and the 27,500 described species22. SILVA under-represents green and brown macroalgae in comparison with red algae: Chlorophyta and Phaeophyceae have 46 and 84 entries for macroalgae, respectively, while Rhodophyta has 938 entries (searched in July 2018). Analogously, macroalgae do not have any single-copy protein-encoding gene reported in the EggNOG database (http://eggnogdb.embl.de, searched in February 2018), as most proteins are reported for model organisms such as Oryza sativa, Arabidopsis thaliana or Saccharomyces cerevisiae. Because of this scarcity of macroalgae reference sequences, macroalgae are underestimated (false-negatives) and the taxonomic representation of the macroalgae contributing to the POM is biased. There is a need for enhanced molecular resources for macroalgae, especially for single markers. A single marker (that is, the 18S rDNA gene) enhances accurate identification to species level, and could draw phylogenetic relationships among lineages. A robust genomic reference will allow the detection of species in the POC and dissolved organic carbon pools, enabling the use of eDNA-based approaches to assign relative contributions of species to the carbon available in the ocean.

Macroalgae taxonomical identification in all datasets was performed by matching the sequences against available DNA references. Macroalgae sequences were less abundant in the 18S metabarcoding dataset, with only 29% of orders available in the metagenomes. Thus, the metagenomes make it easier to find macroalgal DNA in the water column, given the poor and highly unbalanced representation of macroalgae in the SILVA 18S library.

Macroalgal material is likely to be exported from coastal habitats as whole thalli or fragments that either degrade progressively or are rapidly delivered to the deep sea7. Although marine eDNA decays within a few days24,25, the drifting macroalgal biomass7 is constantly leaving traces of its DNA. eDNA recovered from metagenomes is the snapshot evidence of the macroalgal biomass exported to the sampling location from the coastal habitat. However, it is uncertain whether the relative abundance of sequences per order truly reflects the contribution of each order within the macroalgal export flux. The focus on metagenomic SCGs provides a parsimonious approach for assessing relative abundance of macroalgae. A single-copy gene occurs once in the genome, accounts for a single cell, and represents one individual in microbial communities26. In multicellular organisms, the relative abundance of DNA sequences from SCGs may be scaled to the relative number of cells (and amount of biomass) available per taxon. Thus, the abundance pattern of macroalgal SCGs from different taxa may be expected to correlate with their contribution to carbon export.


Given these caveats for metagenomes, and considering that the 18S metabarcodes were limited to fewer samples, we chose the SCG dataset for further analyses of macroalgal order diversity and macroalgal biomass export in the open and deep ocean. We believe that the SCG approach is probably less biased and more informative than the other two approaches.

3.4 Macroalgal diversity in the ocean

Macroalgal taxonomic composition in the SCG dataset was similar across oceanic regions. Cyanidiales and Ectocarpales were the most ubiquitous and abundant orders across all the basins. Cyanidiales represented 35% of macroalgal DNA sequences. This result was unexpected, but may be related to the fact that Cyanidiales is the earliest Rhodophyta lineage, so sequences from other orders could share enough nucleotides with it to be identified as Cyanidiales. Nevertheless, we aligned these DNA sequences and the phylogeny separates Rhodophyta orders (Supplementary Figure 1). Furthermore, Cyanidiales is known for its metabolic capacities and its ability to colonize extreme habitats27. Ectocarpales, the most diverse order of Phaeophyta (774 species in AlgaeBase28), accounted for 22% of the DNA sequences (Table 1, Figure 1a). The Atlantic and North Pacific Oceans were the most diverse regions, while the Red Sea (the smallest basin sampled) was the least diverse (Supplementary Table 1, Figure 1a). The South Atlantic Ocean displayed the highest percentage of macroalgal DNA (17% of the total across all basins), while the lowest was found in the Mediterranean Sea and the Indian Ocean (8% each). A high abundance of macroalgae was observed poleward of 40° in both the Northern (21%) and Southern (28%) Hemisphere (Figure 1a), possibly reflecting high local production of macroalgae at these latitudes. The Arctic supports abundant macroalgae populations along its extensive rocky coastline29, and the Norwegian Atlantic current may collect significant inputs of boreal macroalgal detritus. Similarly, there is evidence of export of Antarctic kelps, brown macroalgae of the order Laminariales, that could potentially be transported over long distances by the Antarctic Circumpolar Current30. In addition, macroalgal material may be preserved longer at low water temperatures than at the warmer temperatures found at tropical latitudes31. Since many species contain air-vesicles that confer buoyancy, polar latitudes could be a dead end for macroalgal material, as has been shown to be the case for plastic accumulation driven by surface circulation32.

One-way PERMANOVA revealed significant differences in the SCG macroalgal DNA assemblage across oceans (p = 0.0001; df = 7,347; F = 4.7). The Red Sea, Indian Ocean and South Pacific Ocean were significantly different to the other oceanic regions (pairwise p < 0.005; Supplementary Table 2). Nevertheless, cluster analysis and non-metric multi-dimensional scaling (nMDS) ordination indicated similarities in the assemblages (Figure 1b-c). Most regions were above 82% similarity, with the Red Sea and the Mediterranean Sea (77% similarity) as exceptions. Differences between overall PERMANOVA and ordination indicate a dispersion effect, as confirmed with significant difference in variance between groups (analysis of homogeneity of multivariate dispersion (PERMDISP) p < 0.0001; df = 7,347; F = 5.7).
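The similarity percentages reported for the cluster analysis are Bray–Curtis similarities between regional assemblages (Figure 1b). As a minimal sketch of that index, assuming two abundance vectors with the orders in the same sequence, the function below returns the percent similarity. The function name and the toy values are illustrative only; the chapter does not state whether abundances were transformed before the index was computed, so this sketch is not expected to reproduce the published percentages.

```python
def bray_curtis_similarity(a, b):
    """Percent Bray-Curtis similarity: 100 * (1 - sum|a_i - b_i| / sum(a_i + b_i))."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must list the same orders")
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one vector must contain non-zero abundance")
    difference = sum(abs(x - y) for x, y in zip(a, b))
    return 100.0 * (1.0 - difference / total)


# Toy abundances (arbitrary units) for three orders in two hypothetical regions.
region_a = [2.0, 0.0, 2.0]
region_b = [1.0, 1.0, 2.0]
print(bray_curtis_similarity(region_a, region_b))  # 75.0
```

Identical assemblages score 100 and assemblages with no shared orders score 0, which is why the index suits comparisons of composition rather than of total abundance alone.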


Figure 1. Assemblage of macroalgae in the ocean. a, Global distribution of macroalgae. Pie sizes represent DNA abundance per region, with highest abundance in the South Pacific Ocean (17%) and lowest abundance in the Mediterranean Sea and Indian Ocean (8% each). The bar graph shows the latitudinal distribution (RPM, reads per million; means ± s.e.) of total macroalgal DNA, with 50% abundance beyond 40° N and 40° S. b, Bray–Curtis cluster showing similarity > 78% among all regions (cophenetic correlation, r = 0.8634). c, nMDS comparing macroalgal assemblage across oceanic regions, each one indicated by a shaded colour.

3.5 Export of macroalgae throughout the water column

The Malaspina expedition sampled eDNA from surface to 4,000 m, while for Tara the maximum sampling depth was 1,000 m. Consequently, analyses of oceanic macroalgal abundance include only the Malaspina dataset. The order diversity of macroalgae varied between depths (PERMANOVA p = 0.001; df = 2,352; F = 43.6; pairwise p < 0.05; PERMDISP p < 0.0001; df = 2,352; F = 31.7). The epipelagic zone (0-200 m) was the most diverse, while the least diverse was the mesopelagic zone (200-1,000 m, Supplementary Table 3). The relative abundance of macroalgal DNA (and probably macroalgal carbon) attenuates exponentially with depth at a rate of 37.3% km−1 (Figure 2a). This value is much lower than the attenuation rate of sinking POC flux in the Northeast Pacific Ocean down to 5,000 m (86% km−1, based on data from Martin et al.33). However, a lower value for the global ocean attenuation rate is expected, given the refractory nature of macroalgal carbon, which degrades more slowly than planktonic POC34. These results provide large-scale quantitative evidence of macroalgal transport to the deep sea, validating previous assumptions of vertical export7.

Most macroalgae grow in coastal areas. Exceptions are the drifting Sargassum of the Sargasso Sea, and macroalgae living on shallow oceanic seamounts35,36. Oceanic and biological processes (for example, storms and senescence) promote coastal detachment, dispersion and export of macroalgae to the open ocean7,37. In contrast to the exponential attenuation by depth, there was no difference in macroalgal abundance from the shoreline to distances up to 4,860 km (PERMANOVA p = 0.194; df = 6,223; F = 1.2; Figure 2b). This observation corroborates the estimated widespread export of macroalgal material to the open ocean, hitherto based on anecdotal evidence.


Figure 2. Export of macroalgae to the deep and open ocean. a, Vertical attenuation profile of macroalgal DNA in the Malaspina SCG dataset. The dashed line shows the fitting curve based on the power law equation of y. The calculated attenuation coefficient is indicated by b (see Methods). b, Horizontal export of macroalgal DNA from the shoreline (continent or island) to the open ocean (sampling point), based on the Tara and Malaspina epipelagic metagenomes (0–200 m depth). Data are means ± s.e., RPM, reads per million.

Rhodophyta can tolerate long periods of darkness and remain photosynthetic at great depths38,39. Furthermore, these macroalgae cover the largest geographical extent and support the largest global production38. Thus, Rhodophyta would be expected to export more material than Phaeophyta and Chlorophyta. Our results confirm these assertions: several red algae were present at great depths, and 63% of the DNA sequences belonged to Rhodophyta, compared with 26% for Phaeophyta and 11% for Chlorophyta (Table 1, Figure 3). Likewise, Rhodophyta were taxonomically richer than Phaeophyta and Chlorophyta (13 > 3 > 1 orders, respectively). This richness is expected: AlgaeBase shows a greater diversity of red algae (6,245 species, 30 orders) than brown (1,792 species, 13 orders) or green algae (546 species, 15 orders)8,22. The low oceanic richness of Chlorophyta in the exported POM could be related to morphological and biochemical features. Rhodophyta and Phaeophyta contain taxon-specific polysaccharides that provide structural complexity and recalcitrance40. Fucoidans (in brown algae) and carrageenans (in red algae) bind to the cell wall and protect it from desiccation and cell invasion by microbes, hence delaying degradation11,41-43. These features are absent in green algae41. Such recalcitrance-promoting compounds may enhance long-distance transport of Rhodophyta and Phaeophyta, as supported by their prevalence in the oceanic POM pool.
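The lineage percentages and order counts quoted above follow from summing the SCG rows of Table 1 by lineage. The sketch below shows that aggregation, using the "Order abundance" column as input. The variable and function names are illustrative, and the lineage assignments are those given in Table 1.

```python
from collections import defaultdict

# (lineage, order, order abundance in %) for the SCG rows of Table 1.
SCG_ORDER_ABUNDANCE = [
    ("Chlorophyta", "Prasiolales", 11.46),
    ("Phaeophyta", "Ectocarpales", 21.87),
    ("Phaeophyta", "Fragilariales", 3.11),
    ("Phaeophyta", "Laminariales", 0.77),
    ("Rhodophyta", "Bangiales", 0.14),
    ("Rhodophyta", "Batrachospermales", 1.47),
    ("Rhodophyta", "Bonnemaisoniales", 0.12),
    ("Rhodophyta", "Ceramiales", 0.05),
    ("Rhodophyta", "Corallinales", 2.14),
    ("Rhodophyta", "Cyanidiales", 35.31),
    ("Rhodophyta", "Gelidiales", 1.14),
    ("Rhodophyta", "Gigartinales", 6.90),
    ("Rhodophyta", "Halymeniales", 1.58),
    ("Rhodophyta", "Nemaliales", 13.22),
    ("Rhodophyta", "Palmariales", 0.18),
    ("Rhodophyta", "Plocamiales", 0.01),
    ("Rhodophyta", "Rhodymeniales", 0.54),
]


def summarize_by_lineage(rows):
    """Return ({lineage: summed abundance %}, {lineage: number of orders})."""
    shares = defaultdict(float)
    richness = defaultdict(int)
    for lineage, _order, abundance in rows:
        shares[lineage] += abundance
        richness[lineage] += 1
    return dict(shares), dict(richness)


shares, richness = summarize_by_lineage(SCG_ORDER_ABUNDANCE)
for lineage in sorted(shares, key=shares.get, reverse=True):
    print(f"{lineage}: {shares[lineage]:.0f}% of sequences, {richness[lineage]} orders")
```

With these inputs the shares round to 63%, 26% and 11%, and the order counts are 13, 3 and 1, matching the figures in the text.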


Figure 3. Oceanic export of macroalgal DNA relative abundance per order. a, Depth attenuation of macroalgae from the surface to the bathypelagic zone (4,000 m), using Malaspina metagenomes. Most orders attenuated with depth, with the exception of Prasiolales and Laminariales. b, Macroalgal DNA export from the shoreline to the open ocean, based on the Tara and Malaspina epipelagic metagenomes (0–200 m depth). The presence of macroalgae is ubiquitous in the open ocean. RPM, reads per million.


3.6 Implication for Blue Carbon assessment

Our findings demonstrate the ubiquitous presence of macroalgal DNA in the ocean up to a depth of 4,000 m, and 4,860 km away from the nearest coastline. The attenuation rate of macroalgae (37.3% km−1) implies that 69% of the macroalgal DNA available at the surface will sink below 1,000 m. Oceanic models demonstrate that the carbon reaching a depth of 1,500 m is sequestered over close-to-permanent timescales44 in terms of climate change mitigation. Hence, the macroalgal material (and organic carbon) that reaches 1,000 m (the boundary between mesopelagic and bathypelagic layers45) will be sequestered and prevented from exchanging with the atmosphere over extended timescales7,44. Moreover, 24% of macroalgal DNA sequences sinking from the surface will be expected to reach the seafloor (assuming mean oceanic depth of 3,800 m). Our results also revealed an increase in the relative abundance of Laminariales (for instance, kelp) DNA in POM between 3,000 and 4,000 m (Figure 3a), consistent with the reported bedload bulk-transport of kelp to the deep sea7. This transport is influenced by episodic storm-driven events7,46 that detach and rapidly sink macroalgae; this rapid sinking is due to the heavy rocky substrate retained by macroalgae in their holdfast. Submarine canyons support intense bedload fluxes of kelp, thereby delivering macroalgae (along with their DNA sequences and carbon) directly into the deep sea7,47-49. Through this mechanism, a larger biomass of Laminariales is delivered to the deep sea, while the remaining orders progressively degrade into smaller and smaller fragments, thus attenuating exponentially with increasing depth.
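The two percentages above can be checked with a short calculation. The sketch below assumes a simple exponential decay, F(z) = F(0) · exp(−k · z), with k = 0.373 km−1 read from the reported attenuation rate of 37.3% km−1. This functional form is an assumption made for illustration (the caption of Figure 2 refers to a power-law fit, and the fitting procedure is described in the Methods), and the function name is hypothetical.

```python
import math

ATTENUATION_PER_KM = 0.373  # assumed decay constant k, from the reported 37.3% km-1


def fraction_remaining(depth_km, k=ATTENUATION_PER_KM):
    """Fraction of surface macroalgal DNA still present at a given depth (km)."""
    return math.exp(-k * depth_km)


for depth_km in (1.0, 3.8):
    percent = 100.0 * fraction_remaining(depth_km)
    print(f"{depth_km:.1f} km: {percent:.0f}% of the surface signal")
```

Under this assumption the surface signal falls to about 69% at 1,000 m and about 24% at the mean oceanic depth of 3,800 m, which agrees with the values quoted in the text.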

While the global ocean metagenomes analyzed here were produced to explore the oceanic microbiome, the data also allow detection of eukaryotic organisms such as macroalgae. Metagenomes are an unexplored tool for fingerprinting the contributions of different organisms to POM in the ocean. This research is a first step supporting the role of macroalgae as an important allochthonous source of blue carbon sequestered in the deep sea. Our eDNA approach provides robust evidence of the widespread oceanic presence of macroalgae, and our data support the hypothesis of macroalgae export to the open and deep ocean hitherto based on estimations1,7. As DNA is also cellular organic carbon14, we infer that the presence of the taxa evidences the export of macroalgal carbon. Nevertheless, calculations of the macroalgal carbon exported to the ocean require experimentally determined ratios between carbon and DNA content per taxon, which are currently unknown. Although the ultimate fate of oceanic macroalgal material remains uncertain, it is clear that a significant fraction reaches oceanic sinks while another is grazed and degraded by bacteria, thus subsidizing oceanic food webs.

3.7 Methods

Macroalgae taxa are included in two of the eight major lineages of the Eukaryota domain50, where they belong to 4 kingdoms, 15 phyla and 54 classes22. Marine macroalgae are found in three phyla (Rhodophyta, Phaeophyta, Chlorophyta), which also contain microalgae. Chlorophyta (green algae) are closer to vascular plants than to Rhodophyta (red algae) or to Phaeophyta (brown algae), which are closer to molds than to other macroalgae22,50,51. Macroalgae groups have broad differences in cell wall composition11. Even the same order of algae can have strong divergences among genera52. Thus, macroalgae classification is very diverse, and the term macroalgae describes functional groups that are not necessarily related phylogenetically to each other even at phylum level. Identification of macroalgal DNA sequences is a challenging process: there is no universal gene marker53, and barcoding attempts are limited to certain groups53,54. 18S rDNA barcoding resources are poorly represented: only 3.8% of macroalgae are reported in the SILVA database (1,068 of 27,500 described species22, http://www.arb-silva.de, searched in July 2018). Nevertheless, the available (and limited) molecular resources based on a single gene marker are an important tool for accurately identifying macroalgae, and also for drawing phylogenetic conclusions about these taxa.

However, less strict approaches can also be used to identify marine macroalgae groups when resolution at species level is not required. Since DNA represents 3% of cellular organic carbon14, here we infer the carbon export of marine macroalgae phyla from the presence of macroalgal DNA in the water column. We investigated the occurrence of macroalgae in the open and deep ocean, using global metagenomes and metabarcodes generated by the Tara Oceans15 and Malaspina 2010 Circumnavigation16 expeditions.

Sample description

Tara sampling covered epipelagic and mesopelagic zones (5-1,000 m) across eight oceanic regions, with a total of 210 sampling stations: North Atlantic Ocean, South Atlantic Ocean, North Pacific Ocean, South Pacific Ocean, Indian Ocean, Southern Ocean, Mediterranean Sea, and Red Sea. Each location was sampled at different depths (surface water layer 3-7 m, deep chlorophyll maximum layer 30-70 m, mesopelagic zone 400-1,000 m), using a CTD and Niskin bottle rosette sampling system17,18,20. We used 243 metagenomic samples that targeted the gene pool of viral to metazoan plankton, using multiple filters to isolate distinct size-fractions of the suspended particle pool (0.1-0.22 μm 20 samples, <0.22 μm 45 samples, 0.22-0.45 μm 18 samples, 0.45-0.8 μm 21 samples, 0.22-1.6 μm 36 samples, and 0.22-3 μm 103 samples)20. We used 163 metabarcodes from the 18S rDNA aimed at piconano- to meso-plankton communities (size-fractions: <0.8 μm 28 samples, 0.22-3 μm 1 sample, 0.8-5 μm 60 samples, 0.8-20 μm 6 samples, 5-20 μm 23 samples, 20-180 μm 31 samples, and 180-2000 μm 14 samples)19. Water samples were kept at -20 °C on board and at -80 °C in the laboratory until DNA extraction, then DNA was kept at -20 °C until sequencing55. Detailed sampling and methods are available for metagenomes in Pesant et al.20, and for 18S rDNA amplicons in De Vargas et al.19.

The Malaspina expedition sampled open-ocean waters from surface to 4,018 m depth, with emphasis on the bathypelagic zone (1,000-4,000 m)16. Water samples were collected using a CTD and Niskin bottle rosette sampling system, at 70 sampling stations across the oceans and grouped into the eight oceanic regions used by Tara (see above). Filters containing the particle pools sampled in the water were flash frozen in liquid nitrogen and stored at -80 °C until DNA extraction and further sequencing56. We used 174 metagenomic21 samples that targeted free-living bacteria to picoeukaryotes and nanoeukaryotes (size-fractions: 0.2-0.8 μm 29 samples, 0.2-3 μm 100 samples, 0.8-20 μm 31 samples, and 3-20 μm 14 samples).

Together, these expeditions used massive DNA sequencing and generated hundreds of metagenomes to assess oceanic microbial and planktonic diversity18. Tara also generated a global Eukaryotic DNA resource based on 18S-V9 rDNA amplicons11. This data collection did not aim to survey macro-organisms. Nevertheless, we exploited their potential to reveal macroalgal genes. Data came from 153 Tara and 65 Malaspina sampling stations across the oceans, from the surface to 4,000 m (Supplementary Fig. 2).

We identified DNA sequences of Rhodophyta, Phaeophyta, and Chlorophyta using two datasets: (1) amplicon 18S rDNA-based metabarcodes from Tara Oceans, and (2) metagenomes that contain the whole gene pool from both Tara Oceans and Malaspina. These macroalgal DNA sequences belonged to 20,212 unique genes available in the reference gene catalogs of both expeditions (Supplementary Table 4). We believe this holistic approach has not been tried before.

Amplicon 18S data extraction

For the first dataset, denoted 18S, single amplicon reads were extracted from 163 Tara57 metabarcodes of the 18S rDNA V9 hyper-variable loop. Metabarcodes were blasted against the SILVA 18S rDNA database (SILVA Release 132, http://www.arb-silva.de). Macroalgae taxonomy is not well described22, and sequences from the 18S rDNA are scarce in the SILVA database (only 3.8% representation, searched in July 2018); thus, to avoid false-negative results, we chose order rather than species as the taxonomical level. The search was taxonomically restricted to taxa , Stramenopiles and Rhodophyta. The resulting taxonomical list was filtered manually by choosing all macroalgae orders whose sequences presented an identity percentage cut-off >90%. A cut-off of 90% is above the accepted threshold for order level (84-90%58-62).
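The manual filtering step can be pictured as a simple threshold on the identity percentage of each match. The sketch below is a hypothetical illustration, not the procedure used in this chapter: the record layout (`read_id`, `order`, `identity`) and the example hits are invented, and only the >90% identity cut-off comes from the text.

```python
IDENTITY_CUTOFF = 90.0  # percent identity; hits must exceed this value

# Invented example hits; real input would come from the BLAST search output.
example_hits = [
    {"read_id": "read_001", "order": "Prasiolales", "identity": 97.2},
    {"read_id": "read_002", "order": "Ceramiales", "identity": 91.5},
    {"read_id": "read_003", "order": "Gigartinales", "identity": 88.0},
    {"read_id": "read_004", "order": "Prasiolales", "identity": 90.0},
]


def orders_above_cutoff(hits, cutoff=IDENTITY_CUTOFF):
    """Return the sorted list of orders with at least one hit above the cut-off."""
    return sorted({hit["order"] for hit in hits if hit["identity"] > cutoff})


print(orders_above_cutoff(example_hits))  # ['Ceramiales', 'Prasiolales']
```

A strict inequality is used here because the text specifies ">90%", so a hit at exactly 90.0% is discarded in this sketch.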

Malaspina 18S rDNA metabarcodes, though available63, were excluded for two reasons: Malaspina sequenced the 18S rDNA V4 region rather than the V9 region sequenced by Tara, and the sampling and sequencing effort was much lower than in Tara. The contribution of Malaspina to the amplicon 18S dataset was limited to only 7 samples presenting any macroalgae sequence, in contrast to 78 samples from Tara.

Metagenomic data extraction

For the second dataset, denoted as metagenomes, we used 243 Tara20 and 174 Malaspina21 metagenomes to find macroalgal DNA sequences in the open ocean. We used two different strategies: (1) targeting all genes, and (2) restricting the query to the top-four single-copy protein-encoding genes (SCGs) available in the gene catalogs of both expeditions. Each strategy generated its own new dataset. Metagenomic data were analyzed using the Dragon Metagenomics analysis platform (DMAP, http://www.cbrc.kaust.edu.sa/dmap). DMAP re-annotated Tara Oceans and Malaspina metagenomic gene catalogs, keeping the original reads that are based on gene abundance for each sample (units are in reads per million, RPM).

DMAP uses the UniProt Knowledgebase as a reference database to compare genes from Tara and Malaspina’s gene catalogs. To assign taxonomy and generic functional role, DMAP runs high-throughput BLASTp searches and traverses the best hits to find their lowest common ancestor. Specific functional role is assigned using BLASTp against Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthologs from the KEGG database. This taxonomic and functional role information is indexed for all genes, and made available for lookups and sample comparisons in the Compare module of DMAP. In this module, we restricted both metagenomic strategies (all genes and SCG) to taxa Viridiplantae (DMAP filter taxID: 33090), Stramenopiles (taxID: 33634) and Rhodophyta (taxID: 2763); the search was restricted to coverage and identity percentage cut-offs greater than or equal to 90%. A higher cut-off recovers fewer sequences (false negatives). Identification of macroalgae in the metagenome dataset is based on protein similarity. Proteins are highly conserved at higher taxonomic levels; thus, a threshold of 90% is above the mean percentage identity for proteins (70%)64.
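A lowest-common-ancestor assignment over filtered best hits can be sketched as below. This is an illustrative sketch only and does not reproduce DMAP's internal implementation: the hit dictionaries, the simplified lineages and the function names are assumptions, while the thresholds (coverage and identity of at least 90%) come from the text.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of several root-to-leaf lineages."""
    shared = []
    if not lineages:
        return shared
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


def assign_taxonomy(hits, min_identity=90.0, min_coverage=90.0):
    """Keep hits passing both cut-offs, then assign the LCA of their lineages."""
    kept = [
        hit["lineage"]
        for hit in hits
        if hit["identity"] >= min_identity and hit["coverage"] >= min_coverage
    ]
    return lowest_common_ancestor(kept)


# Hypothetical best hits for one gene; lineages are simplified for illustration.
example_gene_hits = [
    {"identity": 96.0, "coverage": 98.0,
     "lineage": ["Eukaryota", "Stramenopiles", "Phaeophyceae", "Ectocarpales", "Ectocarpus siliculosus"]},
    {"identity": 93.5, "coverage": 95.0,
     "lineage": ["Eukaryota", "Stramenopiles", "Phaeophyceae", "Ectocarpales", "Scytosiphon lomentaria"]},
    {"identity": 71.0, "coverage": 99.0,   # fails the identity cut-off
     "lineage": ["Eukaryota", "Stramenopiles", "Phaeophyceae", "Laminariales", "Saccharina japonica"]},
]

assigned = assign_taxonomy(example_gene_hits)
print(" > ".join(assigned))
```

In this toy case the two retained hits disagree at species level, so the assignment stops at the order. This is the behaviour that makes order a practical rank when species-level references are incomplete.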

The SCG strategy included additional steps. Initially, we wanted to restrict the search to SCGs specifically from Chlorophyta, Rhodophyta or Phaeophyta, but there were none available in the reference database (EggNOG65, searched in February 2018). Thus, we used the KEGG Ortholog module of DMAP to search for the top-four SCGs present in Viridiplantae, Stramenopiles and Rhodophyta within the expeditions’ gene catalogues. Back in the Compare module, we individually restricted the SCG search to each of the following top-four protein-encoding genes: NADH:ubiquinone reductase (EC: 1.6.5.3), N-acetyl-gamma-glutamyl-phosphate reductase (EC: 1.2.1.38), DNA-directed RNA polymerase (EC: 2.7.7.6), and non-specific serine/threonine protein kinase (EC: 2.7.11.1).

Order was used as the level for taxonomical assignment, because most macroalgae species have an incomplete genome reference library and undescribed taxonomy22. The initial search using species as taxonomical level returned false-negative BLAST hits. For instance, when the search was restricted to a few species of the order Ectocarpales, DMAP did not return any sequence because the dataset is incomplete; sequences were returned when we searched directly for the order. The databases include unknown or uncultured sequences that are assigned to higher taxonomic ranks, e.g. order. The search generated a taxonomical list, from which we manually selected all macroalgae orders that returned sequences.


Data analyses

A list of macroalgae orders and the relative abundance of the sequences was obtained from each dataset (18S metabarcodes, metagenomes all genes and metagenomes SCG). Relative abundance is reported in the metagenomes as reads per million (RPM), and as metagenomic Illumina tags for the 18S dataset. Each sample included information on depth, size-fraction and location. To account for unequal sampling effort within each oceanic region, the relative abundance of sequences was standardized by dividing the total number of sequences of each order in each oceanic region by the number of samples within each oceanic region.
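The standardization by sampling effort can be illustrated with a short sketch. The record layout and the function `standardize_by_region` are hypothetical. The sketch also assumes that every sample of a region appears in the input at least once, since the number of samples per region is inferred from the records themselves.

```python
from collections import defaultdict


def standardize_by_region(records):
    """Divide the summed abundance of each order in a region by the region's sample count.

    records: iterable of (sample_id, region, order, abundance).
    Returns a dict keyed by (region, order).
    """
    totals = defaultdict(float)
    samples = defaultdict(set)
    for sample_id, region, order, abundance in records:
        totals[(region, order)] += abundance
        samples[region].add(sample_id)
    return {
        (region, order): total / len(samples[region])
        for (region, order), total in totals.items()
    }


# Hypothetical abundances (RPM) for three samples in two regions.
example_records = [
    ("s1", "RS", "Ectocarpales", 4.0),
    ("s2", "RS", "Ectocarpales", 2.0),
    ("s2", "RS", "Ceramiales", 1.0),
    ("s3", "IO", "Ceramiales", 5.0),
]

standardized = standardize_by_region(example_records)
```

Dividing by the number of samples rather than by the number of samples in which an order was detected keeps regions with many empty samples from appearing artificially rich.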

We performed Bray-Curtis Similarity clustering and non-metric multidimensional scaling (nMDS) ordination to elucidate differences in macroalgal assemblage among oceanic regions. One-way permutational multivariate analysis of variance (PERMANOVA) and analysis of homogeneity of multivariate dispersion (PERMDISP) based on Bray-Curtis similarities were performed to test for differences in macroalgal assemblage composition across oceanic basins; data were log-transformed prior to these analyses. These analyses were done in R using the Vegan66 package. To evaluate how taxonomic richness and relative abundance of the sequences are distributed among oceanic regions, we calculated the indices of Pielou equitability (J), Dominance (D) and Shannon (H); these indices assess evenness, dominance and diversity at order level. To compare observed order richness with estimated richness, we calculated the index CHAO 2. Indices were calculated in PAST67. Order diversity was also evaluated through the water column from surface to 4,000 m using only the Malaspina dataset; the Tara dataset is limited to 1,000 m depth.
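For readers who want to see what the indices measure, the following sketch computes Shannon H, Dominance D, Pielou J and the Bray-Curtis dissimilarity from order-level abundances. It is a plain-Python illustration with hypothetical inputs, not the Vegan or PAST code used in the study. It assumes natural logarithms for H, D defined as the sum of squared proportions, and J defined as H divided by the logarithm of the number of orders present.

```python
import math


def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over orders with non-zero abundance."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def dominance(counts):
    """Dominance D = sum(p^2); ranges from 1/S (even) to 1 (a single order)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


def pielou(counts):
    """Pielou equitability J = H / ln(S), with S the number of orders present."""
    richness = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


# Hypothetical order abundances for two regions.
even_region = [10, 10, 10, 10]
skewed_region = [37, 1, 1, 1]

print(round(shannon(even_region), 3), round(dominance(even_region), 3), round(pielou(even_region), 3))
print(round(bray_curtis(even_region, skewed_region), 3))
```

An even assemblage gives high H and J with low D, whereas an assemblage dominated by one order gives the opposite pattern, which is how the supplementary tables are read.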


The global distribution of macroalgal DNA sequences was analyzed by assessing export and relative abundance with depth from the surface to the deep ocean (vertically), and with distance from the sampling point to the closest shoreline (horizontally). Vertical export was analyzed by comparing Malaspina macroalgae sequences through the water column zones: epipelagic (0-200 m), mesopelagic (200-1,000 m) and bathypelagic zone (1,000-4,000 m). Attenuation of macroalgae sequences with depth was modeled by fitting relative abundance of each zone to a normalized power function, following the coefficient33 for particulate organic carbon flux:

$$y = a x^{b}$$

where y is macroalgae relative abundance, a is the intercept, x is the depth, and b is the macroalgae attenuation coefficient (sequences in RPM km⁻¹). Relative abundance of macroalgal DNA by depth was standardized, dividing the total number of sequences per depth category by the number of samples within each depth category (0-200 m; 500 m; 1,000 m; 2,000 m; 3,000 m; and 4,000 m; maximum depth recorded was 4,018 m).
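One common way to fit a power function of depth is ordinary least squares on log-transformed values. The sketch below takes that route as an assumption; the text does not state which fitting procedure was used. It assumes the form y = a·x^b with intercept a and exponent b, and the function name and the synthetic data are illustrative only.

```python
import math


def fit_power_function(depths, abundances):
    """Fit y = a * x**b by least squares on log(y) = log(a) + b * log(x).

    Returns (a, b). All depths and abundances must be positive.
    """
    xs = [math.log(depth) for depth in depths]
    ys = [math.log(value) for value in abundances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Synthetic data generated from a known curve, used only to check the fit.
depth_categories_m = [200, 500, 1000, 2000, 3000, 4000]
synthetic_abundance = [50.0 * depth ** -0.3 for depth in depth_categories_m]

a_hat, b_hat = fit_power_function(depth_categories_m, synthetic_abundance)
print(round(a_hat, 3), round(b_hat, 3))
```

A negative exponent indicates that relative abundance declines with depth. Fitting in log space weights proportional rather than absolute deviations, which may or may not match the original analysis.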

Horizontal export was analyzed by comparing relative abundance of macroalgae sequences with distance from the sampling point to the closest shoreline (continent or island) using one-way permutational multivariate analysis of variance (PERMANOVA).

These data consisted of Tara and Malaspina metagenomes that belong to the epipelagic zone (0-200 m). The relative abundance of macroalgal DNA was standardized, dividing the total number of sequences per distance category by the number of samples within each distance category (0-200 km; 500 km; 1,000 km; 2,000 km; 3,000 km; 4,000 km; and 5,000 km; maximum distance recorded was 4,860 km).
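The binning of epipelagic samples by distance to shore can be sketched as follows. The sketch assumes that each listed category is the upper bound of its bin (so 4,860 km falls in the 5,000 km category); this reading, the function names and the sample values are assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict

# Upper bounds of the distance categories, in km (assumed interpretation).
DISTANCE_BINS_KM = [200, 500, 1000, 2000, 3000, 4000, 5000]


def distance_category(distance_km, bins=DISTANCE_BINS_KM):
    """Return the smallest category bound that is >= the given distance."""
    index = bisect_left(bins, distance_km)
    if index == len(bins):
        raise ValueError(f"distance {distance_km} km exceeds the largest category")
    return bins[index]


def mean_abundance_per_category(samples):
    """samples: iterable of (distance_km, abundance_rpm), one entry per sample."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for distance_km, abundance in samples:
        category = distance_category(distance_km)
        totals[category] += abundance
        counts[category] += 1
    return {category: totals[category] / counts[category] for category in totals}


# Hypothetical samples: two near shore, one at the maximum recorded distance.
example_samples = [(50, 10.0), (150, 20.0), (4860, 3.0)]
per_category = mean_abundance_per_category(example_samples)
```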


Data availability

The data that support the findings of this study are available in: Pesant et al.20 (Tara Oceans metagenomes, doi: 10.1038/sdata.2015.23); De Vargas et al.57 (Tara Oceans 18S rDNA V9 metabarcodes, doi: 10.1126/science.1261605); and Zenodo (Malaspina metagenomes, doi: 10.5281/zenodo.2596829)21.

3.8 Acknowledgements

We thank Tara Oceans Consortium for data availability. This research was supported by the Malaspina 2010 Expedition, funded by the Spanish Ministry of Economy and Competitiveness through the Consolider-Ingenio program to CMD (Reference CSD2008-00077); CARMA, funded by the Independent Research Fund Denmark to DKJ (Reference 8021-00222B); and King Abdullah University of Science and Technology’s project BAS/1/1071-01-01 to CMD. We thank all the scientists and crew for their support during sample collection on the Malaspina 2010 cruise, and especially E. Borrull, C. Díez-Vives, E. Lara, D. Vaqué, G. Salazar, and F. Cornejo-Castillo for DNA sampling.

The authors are grateful to the KAUST Supercomputing Laboratory (KSL) for the resources provided.


3.9 Supplementary Material

Supplementary Figure 1. Phylogeny of Rhodophyta SCG from Malaspina dataset. Numbers after species indicate enzyme ID.


Supplementary Figure 2. Geographical location of sampling points per dataset. Malaspina and Tara Oceans (circles) are metagenomic data. 18S rDNA metabarcodes (triangles) were collected during the Tara Oceans expeditions.


Supplementary Table 1. Macroalgal diversity indices by oceanic region (mean ± SD): IO Indian Ocean, MS Mediterranean Sea; NAO, SAO North and South Atlantic Ocean; NPO, SPO North and South Pacific Ocean; RS Red Sea; and SO Southern Ocean. Relative abundance of DNA sequences is in reads per million (RPM), standardized by number of samples per basin. The South Atlantic Ocean was the most diverse region (high abundance and equitability, low dominance), while the Red Sea was the least diverse region. Estimated order richness using the CHAO 2 estimator was in agreement with observed richness: 17 orders.

| Index | IO | MS | NAO | NPO | RS | SAO | SO | SPO |
|---|---|---|---|---|---|---|---|---|
| Order n | 14 | 9 | 15 | 16 | 6 | 15 | 11 | 14 |
| DNA (RPM) | 7.8 | 7.7 | 12.7 | 10.1 | 9.7 | 15.1 | 14.2 | 15.9 |
| Dominance (D) | 0.2 (±0.01) | 0.4 (±0.01) | 0.2 (±0.01) | 0.2 (±0.01) | 0.6 (±0.02) | 0.2 (±0.01) | 0.2 (±0.01) | 0.3 (±0.01) |
| Shannon Diversity (H) | 1.8 (±0.03) | 1.4 (±0.03) | 1.9 (±0.02) | 1.8 (±0.03) | 0.8 (±0.03) | 2.1 (±0.02) | 1.8 (±0.02) | 1.6 (±0.02) |
| Pielou Equitability (J) | 0.7 (±0.01) | 0.6 (±0.01) | 0.7 (±0.01) | 0.7 (±0.01) | 0.5 (±0.02) | 0.8 (±0.01) | 0.8 (±0.01) | 0.6 (±0.01) |

Supplementary Table 2. One-Way PERMANOVA based on Bray-Curtis similarity, comparing SCG macroalgal DNA assemblage among oceanic regions. Bonferroni corrections of p values are shown in the upper right triangle of the matrix; significant differences are shown in bold. F values are shown in the lower left triangle. IO Indian Ocean, MS Mediterranean Sea; NAO, SAO North and South Atlantic Ocean; NPO, SPO North and South Pacific Ocean; RS Red Sea; and SO Southern Ocean.

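| | IO | MS | NAO | NPO | RS | SAO | SO | SPO |
|---|---|---|---|---|---|---|---|---|
| IO | - | 0.0056 | 0.0028 | 0.0028 | 0.6132 | 0.0028 | 0.0028 | 0.0028 |
| MS | 4.04 | - | 0.224 | 0.0896 | 0.5488 | 0.0028 | 0.7896 | 0.0028 |
| NAO | 7.767 | 2.838 | - | 0.7448 | 0.224 | 0.014 | 0.0952 | 0.0364 |
| NPO | 5.254 | 3.405 | 2.527 | - | 1 | 0.1288 | 0.0196 | 0.798 |
| RS | 2.163 | 2.507 | 2.999 | 1.755 | - | 0.8708 | 0.1372 | 0.0924 |
| SAO | 4.828 | 4.339 | 3.87 | 2.834 | 2.007 | - | 0.0028 | 0.0056 |
| SO | 6.963 | 2.383 | 3.554 | 4.611 | 3.136 | 5.853 | - | 0.0028 |
| SPO | 11.77 | 7.7 | 4.03 | 2.415 | 3.403 | 4.528 | 7.044 | - |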

Supplementary Table 3. Macroalgae diversity indices (mean ± SD) by depth, using the Malaspina single-copy protein-encoding gene (SCG) dataset. Relative abundance of DNA sequences is in reads per million (RPM), standardized by number of samples per depth. The highest diversity (high abundance and equitability, low dominance) was found in the epipelagic zone (0-200 m), while the mesopelagic zone (200-1,000 m) was the least diverse depth zone.

| Index | Epipelagic <200 m | Mesopelagic <500 m | Mesopelagic <1,000 m | Bathypelagic <2,000 m | Bathypelagic <3,000 m | Bathypelagic <4,000 m |
|---|---|---|---|---|---|---|
| Order n | 11 | 12 | 9 | 5 | 6 | 6 |
| DNA (RPM) | 13.5 | 11 | 7.7 | 5.1 | 5 | 3.9 |
| Dominance (D) | 0.2 (±0.01) | 0.6 (±0.02) | 0.6 (±0.02) | 0.5 (±0.02) | 0.4 (±0.02) | 0.4 (±0.01) |
| Shannon Diversity (H) | 1.8 (±0.02) | 1.0 (±0.04) | 1.0 (±0.04) | 1.0 (±0.04) | 1.2 (±0.04) | 1.1 (±0.04) |
| Pielou Equitability (J) | 0.8 (±0.01) | 0.4 (±0.02) | 0.4 (±0.02) | 0.6 (±0.02) | 0.7 (±0.02) | 0.6 (±0.02) |

Supplementary Table 4. Number of unique genes per macroalgae order available in the reference gene catalogue of both Malaspina and Tara Oceans expeditions.

| Lineage | Order | Unique genes |
|---|---|---|
| Chlorophyta | Prasiolales | 82 |
| Chlorophyta | Ulvales | 40 |
| Phaeophyta | Ectocarpales | 12,815 |
| Phaeophyta | Fragilariales | 99 |
| Phaeophyta | Fucales | 74 |
| Phaeophyta | Laminariales | 69 |
| Rhodophyta | Balliales | 1 |
| Rhodophyta | Bangiales | 434 |
| Rhodophyta | Batrachospermales | 18 |
| Rhodophyta | Bonnemaisoniales | 21 |
| Rhodophyta | Ceramiales | 230 |
| Rhodophyta | Corallinales | 83 |
| Rhodophyta | Cyanidiales | 3,510 |
| Rhodophyta | Gelidiales | 91 |
| Rhodophyta | Gigartinales | 1,972 |
| Rhodophyta | Gracilariales | 111 |
| Rhodophyta | Halymeniales | 33 |
| Rhodophyta | Hapalidiales | 3 |
| Rhodophyta | Nemaliales | 282 |
| Rhodophyta | Palmariales | 39 |
| Rhodophyta | Plocamiales | 25 |
| Rhodophyta | Porphyridiales | 80 |
| Rhodophyta | Rhodymeniales | 47 |
| Rhodophyta | Stylonematales | 53 |
| Total | | 20,212 |

3.10 References

1 Duarte, C. M. & Cebrián, J. The fate of marine autotrophic production. Limnol. Oceanogr. 41, 1758-1766 (1996).

2 Duarte, C. M. & Krause-Jensen, D. Export from seagrass meadows contributes to marine carbon sequestration. Front. Mar. Sci. 4, 13 (2017).

3 McLeod, E. et al. A blueprint for blue carbon: toward an improved understanding of the role of vegetated coastal habitats in sequestering CO2. Front. Ecol. Environ. 9, 552-560 (2011).

4 Fourqurean, J. W. et al. Seagrass ecosystems as a globally significant carbon stock. Nat. Geosci. 5, 505-509 (2012).

5 Duarte, C. M., Kennedy, H., Marbà, N. & Hendriks, I. Assessing the capacity of seagrass meadows for carbon burial: current limitations and future strategies. Ocean Coast. Manage. 83, 32-38 (2013).

6 Donato, D. C. et al. Mangroves among the most carbon-rich forests in the tropics. Nat. Geosci. 4, 293-297 (2011).

7 Krause-Jensen, D. & Duarte, C. M. Substantial role of macroalgae in marine carbon sequestration. Nat. Geosci. 9, 737-742 (2016).

8 Krause-Jensen, D. et al. Sequestration of macroalgal carbon: the elephant in the Blue Carbon room. Biol. Lett. 14, 20180236 (2018).

9 Duarte, C. M. Reviews and syntheses: Hidden forests, the role of vegetated coastal habitats in the ocean carbon budget. Biogeosciences 14, 301-310 (2017).

10 Garden, C. J. & Smith, A. M. Voyages of seaweeds: The role of macroalgae in sediment transport. Sediment. Geol. 318, 1-9 (2015).

11 Kloareg, B. & Quatrano, R. S. Structure of the cell walls of marine algae and ecophysiological functions of the matrix polysaccharides. Oceanogr. Mar. Biol. 26, 259-315 (1988).

12 Barrón, C., Apostolaki, E. T. & Duarte, C. M. Dissolved organic carbon fluxes by seagrass meadows and macroalgal beds. Front. Mar. Sci. 1, 42 (2014).

13 Krumhansl, K. A. & Scheibling, R. E. Production and fate of kelp detritus. Mar. Ecol. Prog. Ser. 467, 281-302 (2012).

14 Landenmark, H. K. E., Forgan, D. H. & Cockell, C. S. An estimate of the total DNA in the biosphere. PLoS Biol. 13, e1002168 (2015).

15 Karsenti, E. et al. A holistic approach to marine eco-systems biology. PLoS Biol. 9, e1001177 (2011).

16 Duarte, C. M. Seafaring in the 21st century: the Malaspina 2010 Circumnavigation Expedition. Limnol. Oceanogr. 24, 11-14 (2015).

17 Salazar, G. et al. Global diversity and biogeography of deep-sea pelagic prokaryotes. ISME J. 10, 596-608 (2016).

18 Hingamp, P. et al. Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. ISME J. 7, 1678-1695 (2013).

19 De Vargas, C. et al. Eukaryotic plankton diversity in the sunlit ocean. Science 348, 1261605 (2015).

20 Pesant, S. et al. Open science resources for the discovery and analysis of Tara Oceans data. Sci. Data 2, 150023 (2015).

21 Sánchez, P. et al. Dataset: Common photosynthetic enzymes from 174 metagenomes from the Malaspina Expedition 2010. Supplement to: Ortega et al. (2019) Important contribution of macroalgae to oceanic carbon sequestration, Zenodo https://doi.org/10.5281/zenodo.2596829 (2019).

22 Guiry, M. D. How many species of algae are there? J. Phycol. 48, 1057-1063 (2012).

23 Cock, J. M. et al. The Ectocarpus genome and the independent evolution of multicellularity in brown algae. Nature 465, 617 (2010).

24 Collins, R. A. et al. Persistence of environmental DNA in marine systems. Commun. Biol. 1, 185 (2018).

25 Thomsen, P. F., Kielgast, J., Iversen, L.L., Møller, P.R., Rasmussen, M., Willerslev, E. Detection of a diverse marine fish fauna using environmental DNA from seawater samples. PLoS One 7, e41732 (2012).

26 Roux, S., Enault, F., le Bronner, G. & Debroas, D. Comparison of 16S rRNA and protein-coding genes as molecular markers for assessing microbial diversity (Bacteria and Archaea) in ecosystems. FEMS Microbiol. Ecol. 78, 617-628 (2011).

27 Seckbach, J. & Chapman, D. J. Red algae in the genomic age (Springer, 2010).

28 Guiry, M. D. AlgaeBase. World-wide electronic publication, http://www.algaebase.org (2013).

29 Krause-Jensen, D. & Duarte, C. M. Expansion of vegetated coastal ecosystems in the future Arctic. Front. Mar. Sci. 1, 77 (2014).

30 Kaehler, S., Pakhomov, E. A., Kalin, R. M. & Davis, S. Trophic importance of kelp-derived suspended particulate matter in a through-flow sub-Antarctic system. Mar. Ecol. Prog. Ser. 316, 17-22 (2006).

31 Kelaher, B. P., Coleman, M. A. & Bishop, M. J. Ocean warming, but not acidification, accelerates seagrass decomposition under near-future climate scenarios. Mar. Ecol. Prog. Ser. 605, 103-110 (2018).

32 Cózar, A. et al. The Arctic Ocean as a dead end for floating plastics in the North Atlantic branch of the Thermohaline Circulation. Sci. Adv. 3 (2017).

33 Martin, J. H., Knauer, G. A., Karl, D. M. & Broenkow, W. VERTEX: carbon cycling in the northeast Pacific. Deep Sea Res. (II Top. Stud. Oceanogr.) 34, 267-285 (1987).

34 Enríquez, S., Duarte, C. M. & Sand-Jensen, K. A. J. Patterns in decomposition rates among photosynthetic organisms: the importance of detritus C:N:P content. Oecol. 94, 457-471 (1993).

35 Carpenter, E. J. & Cox, J. L. Production of pelagic Sargassum and a blue‐green epiphyte in the western Sargasso Sea. Limnol. Oceanogr. 19, 429-436 (1974).

36 Woodborne, M. W., Rogers, J. & Jarman, N. The geological significance of kelp-rafted rock along the west coast of South Africa. Geo-Mar. Lett 9, 109-118 (1989).

37 Garden, C. J., Currie, K., Fraser, C. I. & Waters, J. M. Rafting dispersal constrained by an oceanographic boundary. Mar. Ecol. Prog. Ser. 501, 297-302 (2014).

38 Gattuso, J. P. et al. Light availability in the coastal ocean: impact on the distribution of benthic photosynthetic organisms and contribution to primary production. Biogeosciences 3, 895-959 (2006).

39 Littler, M. M., Littler, D. S., Blair, S. M. & Norris, J. N. Deepest known plant life discovered on an uncharted seamount. Science 227, 57–59 (1985).

40 Trevathan-Tackett, S. M. et al. Comparison of marine macrophytes for their contributions to blue carbon sequestration. Ecology 96, 3043-3057 (2015).

41 Percival, E. The polysaccharides of green, red and brown seaweeds: their basic structure, biosynthesis and function. Br. Phycol. 14, 103-117 (1979).

42 Shukla, P. S., Borza, T., Critchley, A. T. & Prithiviraj, B. Carrageenans from red seaweeds as promoters of growth and elicitors of defense response in plants. Front. Mar. Sci. 3, 81 (2016).

43 Berteau, O. & Mulloy, B. Sulfated fucans, fresh perspectives: structures, functions, and biological properties of sulfated fucans and an overview of enzymes active toward this class of polysaccharide. Glycobiology 13, 29R-40R (2003).

44 Herzog, H., Caldeira, K. & Reilly, J. An issue of permanence: assessing the effectiveness of temporary carbon storage. Clim. Change 59, 293-310 (2003).

45 Robinson, C. et al. Mesopelagic zone ecology and biogeochemistry–a synthesis. Deep Sea Res. (II Top. Stud. Oceanogr.) 57, 1504-1518 (2010).

46 Dierssen, H. M., Zimmerman, R. C., Drake, L. A. & Burdige, D. J. Potential export of unattached benthic macroalgae to the deep sea through wind‐driven Langmuir circulation. Geophys. Res. Lett. 36 (2009).

47 De Leo, F. C., Smith, C. R., Rowden, A. A., Bowden, D. A. & Clark, M. R. Submarine canyons: hotspots of benthic biomass and productivity in the deep sea. Proc. R. Soc. Lond., Ser. B: Biol. Sci., rspb20100462 (2010).

48 Canals, M. et al. Flushing submarine canyons. Nature 444, 354 (2006).

49 Harrold, C. & Lisin, S. Radio-tracking rafts of giant kelp: local production and regional transport. J. Exp. Mar. Biol. Ecol. 130, 237-251 (1989).

50 Baldauf, S. L. The deep roots of eukaryotes. Science 300, 1703-1706 (2003).

51 Zuccarello, G. C., Price, N., Verbruggen, H. & Leliaert, F. Analysis of a plastid multigene data set and the phylogenetic position of the marine macroalga Caulerpa filiformis (Chlorophyta). J. Phycol. 45, 1206-1212 (2009).

52 Nakada, T., Misawa, K. & Nozaki, H. Molecular systematics of Volvocales (Chlorophyceae, Chlorophyta) based on exhaustive 18S rRNA phylogenetic analyses. Mol. Phylogenet. Evol. 48, 281-291 (2008).

53 Saunders, G. W. & Kucera, H. An evaluation of rbcL, tufA, UPA, LSU and ITS as DNA barcode markers for the marine green macroalgae. Cryptogamie Algol. 31, 487 (2010).

54 Saunders, G. W. Applying DNA barcoding to red macroalgae: a preliminary appraisal holds promise for future applications. Philos. Trans. R. Soc. Lond. B Biol. Sci. 360, 1879-1888 (2005).

55 Clerissi, C. et al. Unveiling of the diversity of Prasinoviruses (Phycodnaviridae) in marine samples by using high-throughput sequencing analyses of PCR-amplified DNA polymerase and major capsid protein genes. Appl. Environ. Microbiol. 80, 3150-3160 (2014).

56 Pernice, M. C. et al. Large variability of bathypelagic microbial eukaryotic communities across the world’s oceans. ISME J. 10, 945 (2015).

57 De Vargas, C., Tara Oceans Expedition, P. & Tara Oceans Consortium, C. in Supplement to: De Vargas, C. et al. (2015): First Tara Oceans V9 rDNA metabarcoding dataset, https://doi.org/10.1594/PANGAEA.843017 (2015).

58 Lanzén, A. et al. CREST–classification resources for environmental sequence tags. PLoS One 7, e49334 (2012).

59 Cole, J. R., Konstantinidis, K., Farris, R. J. & Tiedje, J. M. in Environmental molecular biology (eds Wen-Tso Liu & Janet K. Jansson) Ch. Microbial diversity and phylogeny: extending from rRNAs to genomes, 1-20 (Horizon Scientific Press, 2010).

60 Giongo, A., Davis-Richardson, A. G., Crabb, D. B. & Triplett, E. W. TaxCollector: modifying current 16S rRNA databases for the rapid classification at six taxonomic levels. Diversity 2, 1015-1025 (2010).

61 Hong, S.-H., Bunge, J., Jeon, S.-O. & Epstein, S. S. Predicting microbial species richness. Proc. Natl. Acad. Sci. U. S. A. 103, 117-122 (2006).

62 Schloss, P. D. & Handelsman, J. Status of the microbial census. Microbiol. Mol. Biol. Rev. 68, 686-691 (2004).

63 Giner, C. R. et al. Marked changes in diversity and relative activity of picoeukaryotes with depth in the global ocean. Preprint at https://www.biorxiv.org/node/177912.abstract 552604 (2019).

64 Krause, L. et al. Taxonomic composition and gene content of a methane-producing microbial community isolated from a biogas reactor. J. Biotechnol. 136, 91-101 (2008).

65 Huerta-Cepas, J. et al. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic Acids Res. 44, D286-D293 (2015).

66 Oksanen, J. et al. Vegan: Community ecology package. R Doc 10, 631-637 (2007).

67 Hammer, Ø., Harper, D. A. T. & Ryan, P. D. PAST: Paleontological Statistics Software Package for Education and Data Analysis. Palaeontol. Electron. 4, 9 (2001).
