
Metabarcoding of the 18S rRNA gene to uncover new molecular biodiversity in Metazoa and unicellular Opisthokonta

Alicia Sánchez Arroyo

DOCTORAL THESIS UPF / 2019

Thesis supervisor:

Dr. Iñaki Ruiz Trillo

DEPARTAMENT DE CIÈNCIES EXPERIMENTALS I DE LA SALUT

Cover image: Collage for Alicia's thesis, by A.S.D. (IV-2019)

For my mom, for being my example of a woman

and if there is not enough time to give her what she deserves, do you think that if I plead hard enough with the sky, my mother's soul would return to me as my daughter, so that I can give her the comfort she gave me all my life

· Rupi Kaur

Table of contents

Resumen/Abstract
Prefacio/Preface
1. INTRODUCTION
1.1. Biodiversity
1.2. Methodology to study biodiversity
1.3. Molecular markers
1.4. What defines a species?
1.5. Opisthokonta: animals, fungi and their unicellular relatives
2. OBJECTIVES
3. RESULTS
3.1. Novel diversity of deeply branching Holomycota and unicellular holozoans revealed by metabarcoding in Middle Paraná River, Argentina
3.2. Gene similarity networks from the Tara Oceans expedition unveil geographical distribution, ecological interactions, and novel diversity among unicellular relatives of animals
3.3. Hidden diversity of Acoelomorpha revealed through metabarcoding
4. DISCUSSION
4.1. Description of the potential real diversity of unicellular Opisthokonta
4.2. New diversity of Acoelomorpha and its implications for understanding early bilaterian evolution
4.3. Technical caveats of metabarcoding
4.3.1. Cut-offs and clusterization methods for OTU delineation
4.3.2. 18S reads as a (crude) proxy for species abundance
4.3.3. Differences in biodiversity recovery between 18S rRNA gene hypervariable regions
4.4. Gene similarity networks to test evolutionary theories
4.5. Future perspective for biodiversity assessments
5. CONCLUSIONS
6. REFERENCES
7. APPENDIX
7.1. Comparison between USEARCH and Swarm clustering methods
8. ACKNOWLEDGEMENTS

List of figures and tables

Figure 1. Methodology to study biodiversity
Figure 2. Nucleotide variability in the 18S rRNA gene
Figure 3. Opisthokonta tree of life
Figure 4. diversity
Figure 5. Choanoflagellatea diversity
Figure 6. diversity
Figure 7. Ichthyosporea diversity
Figure 8. Corallochytrea diversity
Figure 9. Holomycota diversity
Figure 10. Holomycota tree of life
Figure 11. Alignment of the Tara Oceans forward primer in Filasterea
Figure 12. 18S rDNA sequence similarity in different taxonomic ranks
Figure 13. Correlation between rDNA copy number and size
Figure 14. Sequence similarity networks
Figure 15. Comparison between USEARCH 97% OTUs and Swarm OTUs

Table 1. Overview of the relevance of protistan biodiversity
Table 2. Clustering thresholds and software applied
Table 3. Null and alternative hypotheses to test closeness and assortativity
Table 4. Comparison between USEARCH and Swarm clustering pipelines

Resumen Una de las principales transiciones evolutivas es el origen de la multicelularidad en animales y hongos a partir de sus ancestros unicelulares. Para entender este cambio es necesario estudiar a los organismos unicelulares más próximos evolutivamente a los animales y hongos, todos ellos constituyendo el clado Opisthokonta. Sin embargo, la diversidad real de estos linajes es prácticamente desconocida. Esta tesis analiza datos de metabarcoding del gen 18S rRNA tanto de opistocontos microscópicos como de animales con el objetivo de encontrar clados potencialmente nuevos y entender mejor su ecología. Primero describo nueva diversidad molecular en opistocontos unicelulares de un río tropical y de ecosistemas marinos de todo el mundo, algunos incluso con el potencial de ser nuevos linajes. Segundo, implemento el uso de redes de similitud como una aproximación sistemática y estadística para el estudio de nueva biodiversidad molecular. Finalmente, muestro la existencia de diversidad nueva de animales acelomorfos que nos puede ayudar a entender la transición hacia el resto de los animales bilaterales. Este trabajo, por tanto, confirma al metabarcoding como una herramienta esencial para descubrir biodiversidad desconocida y poder resolver grandes problemas evolutivos.

Abstract

One of the main evolutionary transitions is the origin of animal and fungal multicellularity from single-celled ancestors. To understand this transition, it is necessary to study the extant unicellular relatives of animals (Metazoa) and fungi, all of which together form the Opisthokonta clade. However, the real diversity of these organisms is mostly unknown. The present thesis analyses metabarcoding data from the 18S rRNA gene of both microbial opisthokonts and animals, with the objective of finding potential novel clades and better understanding their ecology. I first describe new molecular diversity in unicellular Opisthokonta from a freshwater tropical river and from global marine ecosystems, some of it with the potential to represent new lineages. Second, I implement the use of similarity networks as a systematic and statistical way to study novel molecular biodiversity. Finally, I uncover potential new diversity of acoelomorph animals that can help us understand the transition towards the rest of the bilaterian animals. Together, these data confirm the potential of metabarcoding to screen for hidden biodiversity in the tree of life and thus to address important evolutionary questions.


Prefacio

La vida en la Tierra es el resultado de millones de años de evolución y diversificación de los seres vivos. El mundo que vemos en la actualidad se mantiene gracias a la estrecha interacción de dichos organismos. Desde la antigüedad, los humanos han utilizado la biodiversidad como fuente de comida, materiales y transporte. Por tanto, la morfología era esencial para diferenciar las especies beneficiosas de las perjudiciales. Este conocimiento no solo fue importante para luchar por la supervivencia, sino también para los naturalistas que en el siglo XVII empezaron a preguntarse cómo, cuándo y por qué toda esa biodiversidad había evolucionado. Poco sabían entonces que la biodiversidad que ellos veían era una diminuta fracción de la real.

El descubrimiento de la estructura de ADN a mediados del siglo XX revolucionó la biología. Francis Crick fue el primero en sugerir el uso del ADN y otras moléculas para construir árboles evolutivos. A partir de esta idea, otros investigadores pudieron desarrollar los métodos que hicieron posible llevarla a cabo, por ejemplo Carl Woese, quien usó las filogenias moleculares de genes ribosomales que llevaron al descubrimiento de un tercer dominio de la vida.

El nuevo milenio trajo otra revolución. En 2003, Paul Hebert propuso que una secuencia de ADN podría funcionar como un código de barras que identificase cada especie. La técnica se denominó barcoding (código de barras en inglés; y posteriormente metabarcoding gracias al avance de las plataformas de secuenciación). La nueva era de la diversidad molecular había comenzado. Las especies podían estudiarse fácilmente a partir de su ADN, sin necesidad de la ardua caracterización morfológica. Estas técnicas revelaron además una enorme cantidad de diversidad microbiana que era totalmente desconocida hasta entonces.

Este es el fundamento de la presente tesis, que ha utilizado datos de metabarcoding del gen 18S RNA ribosomal para buscar diversidad desconocida. Este trabajo es un viaje sobre la ecología y biodiversidad molecular, comenzando por los conceptos básicos sobre diversidad hasta el descubrimiento de linajes potencialmente nuevos que pueden contribuir a solucionar importantes cuestiones evolutivas y a completar el árbol de la vida.


Preface

Life on Earth is the result of billions of years of evolution and diversification of living forms. The world we see now is sustained by the tight interaction among these organisms, which maintains the ecological equilibrium upon which we ultimately depend. For centuries, humans have used biodiversity as a source of food, materials and transport. Morphological characters were, therefore, essential to distinguish between beneficial and harmful species. This knowledge was important not only to humans fighting for survival, but also to the naturalists who, in the 17th century, started to wonder how, when and why all this biodiversity could have evolved. Little did they know that the picture of biodiversity they were seeing was a tiny fraction of the real one.

It was the discovery of the structure of DNA in the middle of the 20th century that shook up the field of biology. Francis Crick first planted the seed of using DNA or other molecules to build evolutionary trees. Building on this idea, other researchers developed the theory and methodology necessary to make it a reality. In this regard, Carl Woese's grand contribution was the use of molecular phylogenies based on ribosomal genes, which led to the discovery of a third domain of life.

The new millennium brought another revolution. In 2003, Paul Hebert proposed that a DNA sequence could work as a barcode that uniquely identifies each species. The technique was hence called barcoding (and, later on, metabarcoding, thanks to advances in sequencing platforms). The new era of molecular diversity had begun. Species could now be easily detected through their DNA sequences, without the time-consuming classical morphological characterization. Moreover, these techniques revealed a vast amount of completely unknown microbial diversity.

This is the foundation of the present thesis, which uses metabarcoding data from the 18S ribosomal RNA gene to look for undescribed diversity that had been overlooked in traditional biodiversity studies. This work is a journey through molecular biodiversity and ecology, from the basic concepts of diversity to the discovery of potential new lineages that can contribute to solving important evolutionary questions and completing the tree of life.


1. Introduction

1. INTRODUCTION

“Nature holds the key to our aesthetic, intellectual, cognitive and even spiritual satisfaction.”

· Edward O. Wilson


1.1. Biodiversity

The diversity of living beings (defined as biodiversity) has been a subject of study and classification since the Greeks. From the classics Aristotle (384-322 B.C.) and Carl Linnaeus (1707-1778) to the contemporary Carl Woese (1928-2012), many scientists have endeavoured to classify the vast diversity of life on Earth.

This diversity can also be quantified, a task that, although laborious, is essential for any question aiming to understand how the global ecosystem functions. The most widely accepted number of formally described species is 1.3-2 million (Roskov et al. 2014). However, estimates of the real number differ widely, ranging from 5±3 million (Costello et al. 2013) to approximately 100 million (Ehrlich & Wilson 1991; May 1992; Lambshead 1993), or even more than 1 trillion species (Locey & Lennon 2016). Biodiversity refers not only to species or to individuals within a population, but also to the genetic variation (at the gene, chromosomal or genomic level) within these species. For example, knowing the genomic features of different crop strains or species is crucial to obtain bumper harvests and antibiotics, respectively.

What biodiversity provides to humanity is immeasurable. Industries such as agriculture, cosmetics, pharmaceutics or construction, are based on either the species themselves or on the substances or molecules produced by them. Biodiversity has not just a positive economic impact on our society; it is also fundamental in ecological processes, namely climate regulation, nutrient cycling, soil fertilization, control of pests, pollination, decomposition of rotten material and waste, purification of air and water, etc.

Even though the number of eukaryotic species is enormous, knowledge about this diversity is extremely limited. Around 8,000 species were described every year in the period between 1990 and 2000 (Mora et al. 2013), and it is estimated that around 86% of extant terrestrial species and 91% of marine species are still waiting to be formally described and studied (Mora et al. 2011). These numbers greatly exceed the capacity and resources of taxonomists to tackle the problem (Blaxter 2003). Among this 'eukaryotic dark matter', unicellular eukaryotes (protists) stand out.


1.1.1. Protists

The term protist is commonly used to describe any eukaryote that does not belong to the animal, plant or fungal groups (Andersen 1998). Protists are usually single-celled organisms, although some lineages include multicellular species or life stages (Adl et al. 2018). Although protists are eukaryotes, they are not monophyletic. In addition, protists do not share any single morphological, genetic or physiological derived trait (known as a synapomorphy). This is why 'protist' is not considered a valid taxonomic term. However, for the sake of clarity and convenience, and given its frequent use in the scientific literature, I will use this term throughout this thesis.

Protists represent only 4% of catalogued species, but data from environmental samples paint a completely different picture: they account for 78% of eukaryotic richness (Pawlowski et al. 2012; del Campo et al. 2014), and predictions of the real number of protistan species rise to 10^5-10^6 (Adl et al. 2007; Mora et al. 2011). Moreover, their biomass on the planet is estimated at 4 gigatonnes of carbon¹ (60% marine, 40% terrestrial), which is double the animal biomass (Bar-On et al. 2018). Given their contribution to the ecosystem, the large number of estimated species, and the small catalogue of described species we have, it is daunting how many protists are still to be discovered.

This lack of knowledge of protist biodiversity skews our current view of the eukaryotic Tree of Life. The bias has a deep impact on our society from many angles because, although tiny, protists are fundamental in many areas that affect humans directly or indirectly (Table 1).

¹ Biomass is calculated as mass of carbon, using gigatonnes of carbon (Gt C) as the unit. 1 Gt C = 10^15 g of carbon.


Area of interest | Example | Reference
Photosynthesis | Together with cyanobacteria, pigmented protists contribute to 40% of global photosynthesis. | Corliss 2002
Trophic web | Protistan phagotrophs feeding on bacteria or other eukaryotes affect the rate of mineral circulation in the oceanic photic zone. Their metabolic waste also provides remineralized nutrients for the ecosystem. | Sherr & Sherr 2002
Industry | More than 500 algal species are directly consumed by humans. Protists also produce antibiotics, enzymes and other pharmacological molecules. | Corliss 2002
Disease | Malaria is caused by several Plasmodium species transmitted by Anopheles mosquitoes, and caused the death of 435,000 people in 2017. | WHO 2007
Biomonitoring | Diatoms and testate amoebae, among others, are commonly used as ecological indicators in freshwater or marine habitats to detect eutrophication, metal contamination, chemical pollution, coal mining or sewage. | Pawlowski et al. 2016; see review of Pawlowski et al. 2018
Detoxification and decomposition | Aspergillus tubingensis is a soil fungus capable of breaking down polyester polyurethane, one of the most polluting plastics in the world. | Khan et al. 2017
Reconstruction of the past | Test or scale fossils from protist skeletons are used to reconstruct past climatological conditions. Apart from academic knowledge, they have a direct impact on oil or gas extraction. | Armstrong & Brasier 2003
Evolution | Understanding the origin of animal multicellularity from a single-celled ancestor requires the study of the extant protists most closely related to animals. | Ruiz-Trillo et al. 2007; Sebé-Pedrós et al. 2017

Table 1. General overview of the relevance of protistan biodiversity in different areas that affect humans directly or indirectly.


1.2. Methodology to study biodiversity

To overcome the bias in our limited understanding of eukaryotic diversity, different technical approaches have been put forward during the last two centuries.

Figure 1. The study of biodiversity can be addressed using morphological or molecular characters. (A) Morphological characterization relies on phenotypic traits that distinguish different species. Characterization is typically carried out using a dichotomous key. (B) The use of DNA for species identification (barcoding) was first proposed by Paul Hebert in 2003. In his seminal paper, he proposed that one specific gene could act as a unique barcode for every species. The barcoding workflow consists of genomic DNA extraction from a given organism, amplification of a housekeeping gene (COI, ribosomal markers, etc.), Sanger sequencing, and identification. The sequence is finally added to a reference catalogue of barcodes. (C) Metabarcoding combines barcoding with High-Throughput Sequencing, which provides thousands of reads per sample. In this case, the sample comes directly from the environment (dirt, water, gut content, etc.), so there is no morphological information whatsoever. The bioinformatic pipeline has more steps because artifacts and sequencing errors are more likely to appear, and these can skew the ecological and evolutionary conclusions.


1.2.1. Morphology

Until the end of the 20th century, scientists described species based only on morphological characters. Despite breakthroughs in molecular techniques, formal description still relies on phenotypic traits. In the case of protists, these diagnostic features include cell size or shape; presence, absence, number or morphology of cilia and/or flagella; extracellular structures such as siliceous cell walls; feeding strategies (photosynthetic, heterotrophic, saprotrophic); colony formation; pigments; presence or absence of chloroplasts; etc. (Figure 1). Although this information is essential for understanding the basic biology of a given species and sets up a natural framework to work with, it runs into barriers when describing protists. First, they are small (ranging from 0.8 to 200 μm) and hence invisible to the naked eye, so diagnostic features are more difficult to study. Second, isolating and culturing protists is very laborious and does not guarantee success. Finally, classical morphology requires a high degree of taxonomic expertise, which makes studies time-consuming and limits how many of them the community can carry out.

1.2.2. Barcoding

Advances in molecular methods based on nucleic acids or proteins led several researchers to start using DNA sequences for taxon diagnosis (Arnot et al. 1993; Kurtzman 1994; Folmer et al. 1994; Wilson 1995). But it was the Canadian entomologist Paul Hebert who popularized and standardized the term barcoding, proposing that “these sequences can be viewed as genetic barcodes that are embedded in every cell” (Hebert et al. 2003a).

The basic barcoding pipeline consists of extracting genomic DNA from an individual for which morphological information is also available, amplifying the gene of interest by PCR, and obtaining the sequence with Sanger technology (Figure 1). Hebert and collaborators, seeing the potential of this methodology, launched the Consortium for the Barcode of Life (CBoL; http://www.barcodeoflife.org) with the objective of tagging every single species on Earth (Marshall 2005). Not only would this project include the DNA barcodes of already described species in a standard and easy-to-use database, but it would also allow the detection of species yet to be discovered.


Molecular techniques soon ushered in a new era of biodiversity assessments, detecting a hidden world of organisms in the most extreme ecosystems, from Antarctic deep waters (López-García et al. 2001) and oceanic ecosystems (Moon-van der Staay et al. 2001) to Spain's acidic River of Fire (Amaral Zettler et al. 2002).

1.2.3. Metabarcoding

Metabarcoding combines DNA-based species identification (barcoding) with High-Throughput Sequencing (HTS) platforms (Figure 1) (Ji et al. 2013; reviewed in Cristescu 2014). HTS platforms such as Roche-454 and Illumina provide thousands of sequences in a single run. This means that, instead of a single specimen, bulk samples taken directly from the environment (soil, water, feces, snow, guts, air, etc.) can be quickly analysed with no morphological information whatsoever. Bioinformatics carries more weight in the metabarcoding workflow than in barcoding: it involves demultiplexing (separating the metabarcoding libraries of each sample), removing spurious reads or errors, deleting chimeric sequences, and clustering reads into pragmatic molecular units for downstream analysis (see section 1.4.4. Operational Taxonomic Unit (OTU)).
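The clustering step can be illustrated with a minimal Python sketch. It assumes that reads have already been demultiplexed and cleaned, it uses a naive per-position identity on equal-length toy sequences instead of a real alignment, and the 97% threshold and function names are illustrative assumptions rather than the pipeline applied in this thesis.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions (toy measure; assumes equal lengths)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(reads, threshold=0.97):
    """Dereplicate reads, then assign each unique sequence to the first
    centroid it matches at >= threshold, most abundant sequences first."""
    abundance = Counter(reads)
    clusters = {}  # centroid sequence -> total number of reads
    for seq, count in abundance.most_common():
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            # No centroid is similar enough: this sequence seeds a new unit.
            clusters[seq] = count
    return clusters


# Hypothetical toy reads: one abundant sequence, a 1-mismatch variant,
# and one unrelated sequence.
toy_reads = ["A" * 100] * 5 + ["A" * 99 + "C"] * 2 + ["C" * 100]
otus = greedy_cluster(toy_reads)
```

In this toy input the one-mismatch variant is absorbed by the abundant centroid, while the unrelated sequence forms its own unit. Real tools differ in how they align reads and choose centroids.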

This methodology is extremely useful to study protists because it does not need diagnostic morphological features. A specific initiative was developed as part of the Consortium for the Barcode of Life in order to tackle the lack of knowledge about protists: The Protist Working Group (ProWG) (Pawlowski et al. 2012).

The advantages of molecular taxonomy, compared to classical morphology-based taxonomy, are:
● It covers a wide range of taxa in all domains of life.
● It allows a large number of samples to be analysed in a short time, enabling more complex sampling designs (seasonal; along temperature, salinity or biogeochemical gradients; biomonitoring; etc.).
● It is applicable to all life stages, avoiding the bias of mistaking different stages for different species.
● It requires little taxonomic expertise, so more researchers can contribute to increasing the knowledge of protistan diversity.
● It is a standardized method. Although many caveats in the pipeline may affect the final results, molecular diversity is reproducible.
● It is automatic, with little manual editing or curation. This has two major consequences: reproducibility and low time consumption in data analysis.

It is important to note that these three methods are complementary and none of them is a substitute for the others. As was pointed out early on, new species should be described using all possible resources: molecular data (marker genes, genomes or transcriptomes), morphology, ecology, microscopy, etc. Thus, DNA barcoding is not a full replacement for classical taxonomy (Gregory 2005).

1.2.3.1. Analysis of metabarcoding

Phylogenetic inference based on a single gene has been the traditional way to analyse metabarcoding data. Single-gene phylogenies have led to major discoveries, such as the three-domain classification system of life (Woese et al. 1990). However, advances in phylogenomics later demonstrated that a single gene is not enough to resolve deep relationships in some lineages.

Apart from the choice of gene sequence, phylogenetic methods rely heavily on taxon sampling. It is therefore important to cover the maximum diversity in the tree, including as many subgroups as possible. Extensive taxon sampling also helps to prevent artifacts such as long-branch attraction (LBA). LBA results from a high number of saturated sites in two distantly related lineages (Felsenstein 2004; Bergsten 2005): both lineages may share the same character state at a position not because of shared ancestry, but because of convergence. Thus, two fast-evolving lineages are erroneously grouped together because their sequences contain multiple sites whose states have converged. This is especially risky with DNA or RNA sequences because there are only 4 possible character states, so few changes are needed to create an LBA effect.


One serious caveat of phylogenetics using metabarcoding data is the low amount of information contained in the amplicons due to their short length. Using a dataset that combines some full-length reference gene sequences with abundant metabarcoding amplicons covering only a small region of the gene presents two major problems. First, the weak signal coming from the short amplicons reduces the reconstruction accuracy of maximum-likelihood (ML) methods, and the low statistical support obtained in these ML trees leads to feeble conclusions. Second, cutting-edge technologies, namely Illumina or PacBio, produce thousands of amplicons in a single run. Such a vast number of sequences is computationally demanding when inferring an ML tree, and the output is difficult to interpret.

To overcome the weak phylogenetic signal and the high computational cost of metabarcoding datasets, phylogenetic placement emerged as a reliable method. Two software tools currently implement this methodology in a similar way: RAxML-EPA (Berger et al. 2011) and pplacer (Matsen et al. 2010). RAxML-EPA is an algorithm implemented in RAxML, one of the most common tools for maximum-likelihood inference (Stamatakis 2006). The basic idea of phylogenetic placement is to map short metabarcoding reads onto a fixed reference tree previously constructed from curated, full-length sequences. Each amplicon (called a query) is placed independently on every branch of the tree, including the internal ones. One query may therefore have several placements across the tree, each weighted by its likelihood weight ratio (LWR). The location of the queries in the tree (on terminal or internal branches), the dispersion of the placements, and the LWR are good indicators of novel molecular biodiversity.
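As a minimal illustration of how the LWR of one query could be computed, the sketch below normalizes the likelihood of each candidate placement by the sum over all placements of that query. The branch names and log-likelihood values are hypothetical, and the function is a simplified stand-in, not the implementation used by RAxML-EPA or pplacer.

```python
import math


def likelihood_weight_ratios(log_likelihoods):
    """Convert per-branch log-likelihoods of one query into LWRs.

    LWR_i = L_i / sum_j L_j. Subtracting the maximum log-likelihood
    before exponentiating avoids numerical underflow.
    """
    max_ll = max(log_likelihoods.values())
    weights = {b: math.exp(ll - max_ll) for b, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {b: w / total for b, w in weights.items()}


# Hypothetical log-likelihoods of one query placed on three branches.
placements = {"branch_a": -1000.0, "branch_b": -1000.0, "branch_c": -1010.0}
lwr = likelihood_weight_ratios(placements)
```

Here the query is split almost evenly between two branches, the kind of dispersed placement that the text describes as informative, whereas a query with a single LWR close to 1 would be confidently assigned to one branch.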

Another method to analyse metabarcoding data is the use of gene similarity networks. The network is constructed by pairwise similarity comparison with BLAST (Camacho et al. 2009) between all the sequences in the database. In a network, every node is a sequence and the edge connecting two nodes represents the similarity (or any other property) between them. One node may have multiple edges, a clear structural difference from the bifurcating, tree-like topology. This permits the study of more complex evolutionary patterns that do not follow vertical inheritance, such as horizontal gene transfer or gene fusion (Bapteste et al. 2013; Ocaña-Pallarès et al. 2019). Another advantage is the inner structure of the network, which allows the application of graph theory properties. These properties give information about the location of the nodes in the network, their clustering into larger structures, their preferential associations and the number of edges connecting different nodes. All of them are emergent properties of the network and can be used to statistically test ecological or evolutionary theories (Corel et al. 2016; Forster et al. 2015).
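A small sketch of such a network, under simplifying assumptions, is given below. It takes pairwise hits as (query, subject, percent identity) tuples, as could be parsed from a tabular BLAST output, keeps edges above an arbitrary 90% threshold, and derives two simple graph properties: connected components and node degree. The sequence names, identities and threshold are all hypothetical.

```python
def build_network(hits, min_identity=90.0):
    """Undirected graph as an adjacency dict: nodes are sequences and
    edges join pairs whose percent identity is >= min_identity."""
    graph = {}
    for query, subject, pct_identity in hits:
        # Register both nodes even when the edge itself is filtered out.
        graph.setdefault(query, set())
        graph.setdefault(subject, set())
        if query != subject and pct_identity >= min_identity:
            graph[query].add(subject)
            graph[subject].add(query)
    return graph


def connected_components(graph):
    """Groups of nodes linked directly or indirectly by edges."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical pairwise hits between five sequences.
toy_hits = [
    ("otu1", "otu2", 98.0),
    ("otu2", "otu3", 95.0),
    ("otu3", "otu4", 80.0),  # below threshold: no edge
    ("otu4", "otu5", 99.0),
]
network = build_network(toy_hits)
components = connected_components(network)
degree = {node: len(neighbours) for node, neighbours in network.items()}
```

Unlike a tree, the resulting structure lets one node carry any number of edges, and measures such as degree or component membership can then be compared against a null expectation.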

1.3. Molecular markers

A molecular marker is a DNA section that provides information about a specific region in the genome. In the context of barcoding and metabarcoding, molecular markers are genes (or gene regions) that identify taxa.

A wide range of markers has been extensively applied to different eukaryotic lineages, depending on their capacity to discriminate between species or subgroups. For example, cytochrome oxidase I (COI) is the selected marker for animals (Hebert et al. 2003b). Land plants have been targeted using the plastid ribulose-1,5-bisphosphate carboxylase large subunit (rbcL) (Newmaster et al. 2006), whereas the ribosomal internal transcribed spacer (ITS) has been proposed as the universal marker for Fungi (Schoch et al. 2012).

The most common and widely used markers are ribosomal RNA (rRNA) genes. This rRNA is not translated into protein but rather is a structural component of the ribosomal subunits, making up 50-60% of the ribosome (Fu & Gong 2017).

1.3.1. 18S ribosomal RNA gene

Among all molecular markers, the 18S rRNA gene (18S rDNA) is the one most commonly used to study molecular biodiversity (Pawlowski et al. 2012). Its product, the 18S rRNA, is part of the small subunit of the eukaryotic ribosome (hence the gene is often called SSU), and the gene is homologous to the prokaryotic 16S rDNA. Both genes were the basis of the three-domain system to classify all living beings proposed by Woese and collaborators (Woese & Fox 1977; Woese et al. 1990), and they are still the most common markers used in metabarcoding surveys.


One caveat of the 18S rDNA is that it is present in multiple copies per genome (multi-copy gene). Moreover, the 18S copy number varies greatly across eukaryotes, ranging from a single copy in Nannochloropsis salina to more than 12,000 in Akashiwo sanguinea (Zhu et al. 2005). This has profound implications for downstream analyses, which are discussed in section 4.3.2. 18S reads as a (crude) proxy for species abundance. On the one hand, it allows researchers to perform functional experiments in single cells (Pawlowski et al. 2012). On the other hand, read counts provide information about gene abundance, not species abundance. This technical detail needs to be taken into account to avoid skewed interpretations of species abundances.
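The effect of copy number on abundance estimates can be shown with a short, purely illustrative calculation. The sketch below divides read counts by an assumed number of 18S copies per genome and renormalizes. The read counts are hypothetical, 12,000 is used as a round stand-in for "more than 12,000", and in practice copy numbers are unknown for most taxa, so this is not a correction applied in this thesis.

```python
def relative_cell_abundance(read_counts, copy_numbers):
    """Divide reads by 18S copies per genome, then normalize to proportions."""
    corrected = {
        taxon: reads / copy_numbers[taxon]
        for taxon, reads in read_counts.items()
    }
    total = sum(corrected.values())
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical read counts for the two species mentioned in the text.
reads = {"Nannochloropsis salina": 100, "Akashiwo sanguinea": 12000}
# Copy numbers taken from the text (single copy vs. roughly 12,000 copies).
copies = {"Nannochloropsis salina": 1, "Akashiwo sanguinea": 12000}

by_reads = {taxon: n / sum(reads.values()) for taxon, n in reads.items()}
by_cells = relative_cell_abundance(reads, copies)
```

In this example Akashiwo sanguinea dominates the read counts, yet after accounting for copy number Nannochloropsis salina would be the far more abundant organism, which is why reads are only a crude proxy.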

However, the advantages of the 18S rRNA gene are multiple and clearly outweigh the pitfalls:
● Universal. It is present in all eukaryotes because it is crucial for the proper structure and function of the ribosome and, ultimately, of the cell.
● Highly expressed. Ribosomal RNAs are produced to satisfy the cellular demand for ribosomes, which can vary with growth phase or even with extracellular conditions.
● Hypervariable regions. The 18S rDNA is around 1,800 bp long and is characterized by a combination of conserved and variable regions, which form the basis of molecular phylogenetics and lineage delimitation. Specifically, there are 8 hypervariable regions, numbered V1 to V9 (Figure 2) (Ki 2012; Hadziavdic et al. 2014); the region corresponding to V6 is so highly conserved in eukaryotes that it is not usually considered in molecular studies, in contrast with the prokaryotic V6. There is no consensus on the best region for biodiversity assessments (Tanabe et al. 2016). However, V4 and V9 appear to be the most suitable because they recover the highest proportion of diversity at different gene similarity thresholds (Hadziavdic et al. 2014).

Given the importance of these regions for inferring molecular novelty from metabarcoding data, I explain the most relevant ones in detail below.


1.3.1.1. V4 hypervariable region

V4 is the largest and most complex hypervariable region, usually ranging from 350 to 450 bp (Hadziavdic et al. 2014), although lengths of 230 bp or up to 520 bp have also been described (Nickrent & Sargent 1991). Some lineages also display a high substitution rate in this region (Brate et al. 2010). The V4 region starts with a very conserved 70 bp section, followed by a highly variable portion of 120 bp, and then short alternations of conserved and variable subregions (Figure 2). The combination of length and the alternation between variable and conserved sections has made V4 one of the preferred targets for studying biodiversity, detecting novel genetic diversity and building accurate phylogenies (Stoeck et al. 2010; Pernice et al. 2013; Pernice et al. 2015; Richards et al. 2015; Mahé et al. 2017).

Figure 2. Nucleotide variability in the 18S rRNA gene. Shannon entropy values of all eukaryotic alignment positions from the SILVA database along the 18S rRNA gene. Red dots mark consecutive positions where at least 90% of ≥10 nucleotides have entropy values lower than 0.2, indicating highly conserved regions. The highly variable regions of the 18S rDNA are denoted V1 to V9. Note that, due to its low entropy values, the V6 region is not usually considered in molecular studies. Source: Hadziavdic et al. 2014.
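The per-position Shannon entropy plotted in this kind of figure can be sketched as follows. The example assumes a toy alignment of equal-length sequences, ignores gaps and ambiguous bases, and uses a base-2 logarithm. The logarithm base and the helper names are assumptions, so the values are not directly comparable with the 0.2 threshold quoted from Hadziavdic et al. (2014).

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the nucleotides in one alignment column,
    ignoring gaps and any non-ACGT characters."""
    residues = [c for c in column.upper() if c in "ACGT"]
    if not residues:
        return 0.0
    n = len(residues)
    counts = Counter(residues)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column of equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*sequences)]


# Hypothetical 4-sequence alignment: two conserved columns followed by
# two variable ones.
toy_alignment = ["ACGA", "ACGC", "ACCG", "ACTT"]
entropies = alignment_entropy(toy_alignment)
```

Fully conserved columns have an entropy of zero, and a column in which all four nucleotides are equally frequent reaches the maximum of 2 bits, so runs of low values mark conserved stretches and peaks mark hypervariable regions.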

1.3.1.2. V9 hypervariable region

In contrast to the V4, the V9 region ranges from 87 to 186 bp long, 130 bp being the most common length (Figure 2) (Amaral-Zettler et al. 2009). The highest variability of this region is located in the center, spanning 60 bp (Hadziavdic et al. 2014). The V9 is so short compared to the V4 that its phylogenetic signal may be too weak to build well-supported phylogenies or to make precise taxonomic assignments. However, it compensates by recovering as much eukaryotic diversity as that identified by taxonomy, even in groups of difficult detection, such as and Foraminifera (Pawlowski et al. 2011). Therefore, the V9 region has been used in many metabarcoding studies, which highlight its great power for novelty detection and ecological assessments (Amaral-Zettler et al. 2009; Stoeck et al. 2009; Behnke et al. 2011; Edgcomb et al. 2011; Pawlowski et al. 2011; de Vargas et al. 2015).

1.4. What defines a species?

Species are fundamental units in Biology, as are cells or genes at smaller levels of organization (Mayr 1982). They have profound implications in comparative genomics, functional analysis and evolutionary reconstructions. As important as they are, defining what a species is has been, and still is, one of the most difficult endeavours in Biology. Around 24 definitions have been proposed over the last two centuries, taking into account factors ranging from isolation mechanisms to the evolutionary past of the individuals (for a review, see De Queiroz 2007 and Aldhebiani 2018). Some of them were acceptable when studying animal, plant or fungal species. However, microbes do not fit these definitions and keep challenging them. How can we differentiate among bacterial or protistan species given that they have very similar morphological features? How can a species be defined if reticulate processes, like horizontal gene transfer, conflict with the tree-like representation of evolution? The analysis of molecular instead of morphological diversity led to a collective sigh of relief and appeared to be the optimal solution because molecular markers could be applied to any organism. However, with the advance of the -omics era, we are now witnessing how this concept is still far from being clearly delimited.

In this section I will review the main species concepts and whether they can be applied to the protist diversity explored through state-of-the-art molecular techniques.

1.4.1. Biological species concept

The very first attempts to define a species grounded in Darwin’s evolutionary theory were based on the importance of reproductive isolation between individuals of distinct species. The idea of groups of similar organisms with limited gene flow between them was the most supported view in the first half of the 20th century (Cattell 1898; Dobzhansky 1951). Ernst Mayr was the most famous exponent of this classical definition. In his Systematics and the Origin of Species from the Viewpoint of a Zoologist, he stated that “species are groups of actually or potentially interbreeding natural populations, which are reproductively isolated from other such groups” (Mayr 1942). Even though subsequent proposals questioned the necessity of reproductive barriers for speciation, substantial isolation still seems to be needed in the speciation process (Coyne & Orr 2004).

This view, however, does not explain speciation events in asexual organisms (such as most protists). Nor can it be applied to plants, in which hybridization or introgression is common between distant species. Finally, it lacks a temporal dimension.

1.4.2. Evolutionary species concept

George Simpson added a temporal dimension to the biological concept in order to include asexual species and fossils (Simpson 1951). Under this concept, a species is a single lineage of ancestor-descendant populations that maintains its identity from other lineages in space and time, as well as its own evolutionary tendencies (Wiley & Lieberman 2011).

Even though it represents an advance in the conception of species, it still heavily relies on morphology and fossils, which are scarce and sometimes nonexistent in protists. It also requires sudden changes in diagnostic features, which is not necessarily the case in those lineages that undergo more constant evolution.

1.4.3. Phylogenetic species concept

The phylogenetic species concept defines a species as an irreducible group of organisms, diagnostically distinguishable from other similar groups, and within which there is a parental pattern of ancestry and descent (Donoghue 1985). Species are, therefore, the terminal branches in a tree with no detectable ramifications. This is the main difference from the evolutionary concept: the phylogenetic concept recognises the smallest groups of organisms that have experienced any evolutionary change in strictly monophyletic units. It also includes sexual and asexual species and can be applied to any organism, including protists. Therefore, this is currently the most accepted definition of species. However, because it delimits such small groups, the phylogenetic concept tends to inflate the number of species, which is not accepted by the whole scientific community. It is also heavily based on the phylogeny of the species, which is sometimes difficult to reconstruct.

1.4.4. Operational Taxonomic Unit (OTU)

Morphological traits played a major role in all three previous definitions of species. However, as David Caron and Sarah Hu pointed out, it is complicated to apply the concept of morphospecies to protists for the following reasons (Caron & Hu 2018):

● Cryptic species. Two species might have the same morphology but completely different physiology or ecology. Examples are found in chrysophytes (Bock et al. 2017) and chlorophytes (Fawley et al. 2006).

● Species with amorphous morphology. Parasites usually display odd shapes because of their dependence on the host and can thus be very difficult to relate to their closest non-parasitic relatives.

● Different mating types. One species with a certain morphotype can have several mating types. Take ciliates as an example: Tetrahymena pyriformis forms a complex with several distinct mating types that are visually identical.

● Synonymous species. These are species that undergo a complex life cycle with several morphologies at different stages. Some of these morphologies have been wrongly classified as distinct species. For instance, the species Amoeba gigantea and Megamoebomyxa argillobia were later found to be the same species as the foraminiferan Astrorhiza limicola (Cedhagen & Tendal 1989).

With the advance of metabarcoding as a massive, fast and global method to study biodiversity, many studies revealed DNA sequences that might or might not match existing sequences in reference databases. Metabarcoding neither provides nor uses any morphological information and, therefore, only genetic information is available. This is the context in which the Operational Taxonomic Unit (OTU) appeared (Blaxter et al. 2005)². The original concept was introduced by Robert Sokal and Peter Sneath back in 1963, but it has been popularized in the last two decades thanks to a large number of metabarcoding studies (Sokal & Sneath 1963; Stoeck et al. 2009; Massana et al. 2011; Pernice et al. 2013; Logares et al. 2014; Forster et al. 2016). An OTU is a practical proxy for any kind of taxon (ideally, a species) in which very similar sequences are clustered together (Blaxter et al. 2005; del Campo et al. 2014).

OTUs can be generated using two general approaches:

● Closed-reference methods: metabarcoding reads or amplicons that are similar to a reference sequence are clustered in an OTU. This methodology relies heavily on the reference database, which is widely known to be incomplete and sometimes not accurately curated. Another pitfall is that very divergent diversity is lost because its similarity to the closest reference is extremely low.

● De novo methods: reads are grouped into OTUs depending on their pairwise sequence similarity. An OTU is, therefore, an emergent feature of the data. It depends on the relative abundances within the community and it is computationally more expensive.

Among the de novo methods, the most widespread is the USEARCH program (Edgar 2013). This and other related software extend the OTU from a centroid through pairwise comparisons. An amplicon joins the OTU only if it falls within a certain similarity threshold of the centroid. Although there is no consensus on the global similarity threshold for protists, 97% has been widely used (Edgcomb et al. 2011; Massana et al. 2015). Another de novo clustering method is Swarm (Mahé et al. 2015). Unlike USEARCH, Swarm is iterative: it chooses the most abundant amplicon as the centroid and extends the OTU using a small local threshold (d). By default d is 1, which means a single nucleotide difference between two amplicons. Finally, sequence similarity networks have been used as a clustering method (Forster et al. 2016). In a network, every node is an amplicon and an edge connects two nodes if their global similarity value is above the selected score.
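The difference between a global threshold around a centroid and a small local threshold can be illustrated with a minimal Python sketch. It is not the USEARCH or Swarm implementation: the function names, the toy amplicons and their read counts are hypothetical, similarity is approximated as one minus the edit distance divided by the length of the longer sequence, and the refinements of the real programs are ignored.

```python
def edit_distance(a, b):
    """Levenshtein distance (substitutions and indels) between two sequences."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def identity(a, b):
    """Assumed similarity measure: 1 - edit distance / longer length."""
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def by_abundance(amplicons):
    """Sequences sorted from most to least abundant (ties broken alphabetically)."""
    return sorted(amplicons, key=lambda seq: (-amplicons[seq], seq))


def centroid_clustering(amplicons, threshold=0.97):
    """Greedy clustering: join the first centroid within the global threshold."""
    otus = {}  # centroid sequence -> member sequences
    for seq in by_abundance(amplicons):
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid].append(seq)
                break
        else:
            otus[seq] = [seq]  # no centroid is close enough: start a new OTU
    return list(otus.values())


def local_linkage_clustering(amplicons, d=1):
    """Iterative clustering: grow each OTU through chains of <= d differences."""
    unassigned = by_abundance(amplicons)
    otus = []
    while unassigned:
        seed = unassigned.pop(0)  # most abundant amplicon still unassigned
        otu, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [s for s in unassigned if edit_distance(current, s) <= d]
            for s in neighbours:
                unassigned.remove(s)
            otu.extend(neighbours)
            frontier.extend(neighbours)
        otus.append(otu)
    return otus


# Hypothetical 40 bp amplicons with read counts.
base = "ACGT" * 10
variant1 = base[:-1] + "A"   # 1 difference from base
variant2 = base[:-2] + "AA"  # 2 differences from base, 1 from variant1
distant = "T" * 40
amplicons = {base: 100, distant: 50, variant1: 10, variant2: 5}

centroid_otus = centroid_clustering(amplicons, threshold=0.97)
linkage_otus = local_linkage_clustering(amplicons, d=1)
print(len(centroid_otus), "OTUs with a 97% global threshold")
print(len(linkage_otus), "OTUs with a local threshold d = 1")
```

In this toy case the amplicon with two differences from the centroid is split off at the 97% global threshold (95% similarity), whereas the local threshold absorbs it through the intermediate variant, which shows how the two strategies can delimit OTUs differently from the same data.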

² OTUs are sometimes referred to in the literature as MOTUs (Molecular Operational Taxonomic Units), as Blaxter originally defined them (Blaxter et al. 2005).

Networks produce OTUs with an internal structure that can be further analysed using properties and methods of graph theory (Junker & Schreiber 2008; Forster et al. 2016).
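As a minimal sketch of this idea, the Python code below builds a similarity network from a few toy amplicons, takes each connected component as an OTU, and uses node degree as one simple graph property of the internal structure. The names, the toy sequences and the assumption that the amplicons are already aligned to the same length are hypothetical; the sketch does not reproduce the pipeline of Forster et al. 2016.

```python
from itertools import combinations


def aligned_identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(sequences, threshold):
    """Adjacency sets: an edge links amplicons with identity >= threshold."""
    graph = {name: set() for name in sequences}
    for u, v in combinations(sequences, 2):
        if aligned_identity(sequences[u], sequences[v]) >= threshold:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def connected_components(graph):
    """Each connected component of the network is taken as one OTU."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(sorted(component))
    return components


# Hypothetical 40 bp amplicons.
reference = "ACGT" * 10
sequences = {
    "amp_a": reference,
    "amp_b": reference[:-1] + "A",   # 1 mismatch to amp_a
    "amp_c": reference[:-2] + "AA",  # 2 mismatches to amp_a, 1 to amp_b
    "amp_d": "T" * 40,               # unrelated amplicon
}

network = build_network(sequences, threshold=0.97)
otus = connected_components(network)
degrees = {name: len(neighbours) for name, neighbours in network.items()}
print(otus)
print(degrees)
```

Here amp_a and amp_c are not directly connected but end up in the same OTU through amp_b, whose higher degree marks it as the most connected node of the component.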

1.5. Opisthokonta: animals, fungi and their unicellular relatives

Opisthokonta is one of the major eukaryotic superclades, composed of two conspicuous and well-known multicellular clades, animals and fungi, together with several unicellular lineages (Figure 3) (Cavalier-Smith & Chao 1995; Ruiz-Trillo et al. 2008; Paps & Ruiz-Trillo 2010; Adl et al. 2018; Torruella et al. 2012). Opisthokonta (from the Greek ὀπίσθιος (opísthios) ‘posterior’ and κοντός (kontós) ‘pole’, referring to the flagellum) is thought to have originated around 1,200 Ma (million years ago) (Eme et al. 2014). Opisthokonta forms a monophyletic clade based on two morphological synapomorphies: a single posterior flagellum in at least one phase of the life cycle (although secondarily lost in some groups), and flat mitochondrial cristae (Cavalier-Smith 1987). One molecular synapomorphy was also found: an insertion of approximately 12 amino acids in the translation elongation factor 1-alpha (EF1-alpha) (Baldauf & Palmer 1993; Steenkamp et al. 2006). The monophyly of Opisthokonta has also been supported by single-gene and multigene phylogenies (Cavalier-Smith & Chao 2003; Carr et al. 2008; Ruiz-Trillo et al. 2008; Shalchian-Tabrizi et al. 2008).

Opisthokont diversity is divided into two subgroups: Holozoa and Holomycota. Holozoa comprises animals and their unicellular relatives, namely Choanoflagellatea, Filasterea and Teretosporea (or Ichthyosporea and Pluriformea, depending on the authors) (Figure 3) (Lang et al. 2002; Shalchian-Tabrizi et al. 2008; Torruella et al. 2012; Torruella et al. 2015; Hehenberger et al. 2017; Grau-Bové et al. 2017). Holomycota (also called Nucletmycea), on the other hand, includes Fungi sensu lato, , and and alba as their unicellular closest relatives (Brown et al. 2009; Liu et al. 2009; Lara et al. 2010; López-Escardó et al. 2018a).

Figure 3. Opisthokonta tree of life. Depicted in grey are the exclusively environmental groups for which there is no morphological information. FRESCHO: freshwater choanoflagellates, MACHO: marine choanoflagellates, FRESHIP: freshwater ichthyosporeans, MAIP: marine ichthyosporeans, MAOP: marine opisthokonts. Note the uncertain position of Corallochytrium limacisporum and Syssomonas multiformis, which cluster either with ichthyosporeans, forming the Teretosporea clade (Torruella et al. 2015; Grau-Bové et al. 2017; López-Escardó and Ocaña-Pallarès, personal communication), or as sister group of Filozoa, forming the Pluriformea clade (Hehenberger et al. 2017).

1.5.1. Metazoa

Metazoa or animals are multicellular and heterotrophic organisms whose cells are differentiated into the widest range of cell types, tissues and organs of all eukaryotes. Several synapomorphies uniquely characterize this group, such as the presence of collagen in the intercellular matrix, a mitochondrial genome reduction, and through spermatozoa and ova (Dunn et al. 2014). The zygotes undergo a cellular division called cleavage, followed by gastrulation in which there is a reorganization of the and cell differentiation.

Metazoans are thought to have appeared in the Ediacaran period, around 632-630 Ma (million years ago) (Love et al. 2009; Narbonne 2005), although there is no clear consensus on this date. What is clear, though, is the presence of the basic traits of all “morphogroups” that would give rise to all extant phyla during the Cryogenian (850 to 635 Ma) (Erwin et al. 2011; Grazhdankin 2014). That is to say, the “proto” versions of , , molluscs and all the rest coincided in the Proterozoic ocean (Deline et al. 2018). The appearance of this display of animal diversity on Earth has been related to an increase in atmospheric oxygen levels (the “oxygen control hypothesis”) (Knoll & Carroll 1999; Budd & Jensen 2017), although other abiotic factors such as global surface temperature (Schwartzman 2002) or salinity (Knauth 2005) have also been put forward. In any case, the origin of animals would not have been possible without the intervention of cellular processes such as cell-cell adhesion, cell communication and cell signalling (Adell et al. 2004; Richards & Degnan 2009; Sebé-Pedrós et al. 2010). Moreover, research on the unicellular lineages most closely related to animals has revealed that some of the proteins necessary for multicellularity must have already been present in the unicellular ancestor (see the review of Sebé-Pedrós et al. 2017).

There are more than 1.5 million described animal species grouped in 30 to 40 recognised animal phyla (Zhang 2013). Estimates of the total number of animal species vary widely depending on the authors, ranging from 8 million (Mora et al. 2011) to more than 163 million (Larsen et al. 2017). Most of them are arthropods, which account for 76% of the total animal diversity, followed by molluscs (7.1%), chordates (3.8%), flatworms (1.9%) and nematodes (1.5%). Even though animals are one of the most studied groups, their phylogenetic relationships are far from being fully resolved. There are two controversial hotspots: the earliest branching phyla and the root of bilaterians.

The dawn of bilaterian animals was one of the crucial events in animal evolution. Bilaterians display several synapomorphies that have provided them with excellent capabilities to diversify extensively. Examples are the presence of the mesoderm (triblastic organisms), a bilateral symmetry that is related to the cephalization process, and circular and longitudinal musculature (Dunn et al. 2014). Therefore, knowing the extant diversity in this part of the phylogenetic tree is essential to understand the origins and evolution of the most complex animals. Traditionally, acoelomorph flatworms (Acoela and Nemertodermatida phyla) were regarded as a secondarily derived group of Platyhelminthes, but molecular data showed that they formed the earliest branching bilaterian group (Ruiz-Trillo et al. 1999; Jondelius et al. 2002). Xenoturbella species were later associated with Acoelomorpha in a new phylum: Xenacoelomorpha (Figure 4) (Cannon et al. 2016). They all have an acoelomate body cavity mainly filled with muscles; a sac-like gut with a ventral or posterior opening; a multiciliated, glandular epidermis; an extremely high regenerative capacity; and they lack a stomatogastric nervous system, a sensory apical organ and a true gonadal epithelium (Srivastava et al. 2014; Haszprunar 2015). Some authors placed this phylum at the root of bilaterians based on phylogenomics (Hejnol et al. 2009; Cannon et al. 2016; Rouse et al. 2016), whereas a combination of the mitochondrial genome, an extensive phylogenomic dataset and miRNA complements located it within deuterostomes, specifically as sister group to Ambulacraria (echinoderms and hemichordates) (Philippe et al. 2011).

Figure 4. Xenacoelomorpha diversity. Xenacoelomorpha comprises Acoela and Nemertodermatida, together with Xenoturbella. All of them are symmetrical worms that lack several features common to most other bilaterians (i.e. anus, nephridia, and circulatory system). The position of this clade as the earliest-branching bilaterians makes it a key phylum to study the early evolution of bilaterian animals. (A) Xenoturbella profunda, (B) Xenoturbella bocki, (C) Hofstenia miamia, (D) Symsagittifera roscoffensis, (E) Isodiametra pulchra, (F) Diopisthoporus psammophilus, (G) Convolutriloba longifissura, (H) Nemertinoides elongatus, (I) Meara stichopi. Adapted from Hejnol & Pang 2016.

1.5.1.1. Environmental metazoans

Animals are one of the best studied eukaryotic groups (del Campo et al. 2014). The conventional way to study their richness in a specific habitat was to apply visual-based methods to the morphological diagnostic features of animals (Deiner et al. 2017). However, the implementation of metabarcoding revealed an extremely high diversity that had not previously been detected (de Vargas et al. 2015; Leray & Knowlton 2016). In oyster reefs, for instance, it accounted for 89.1% (Leray & Knowlton 2015). This unassigned diversity was mainly located within , nematodes, platyhelminthes, gastrotrichs and acoelomorphs (López-Escardó et al. 2018b). However, even phyla with many described species, such as tunicates, were reported to contain a putative new subgroup of organisms (called MAME1 for Marine Metazoan group 1) (López-Escardó et al. 2018b). Thus, metabarcoding has shown that even in a well-known lineage such as Metazoa, it is possible to find new patterns of molecular diversity that are likely to represent new animal species.

1.5.2. Choanoflagellatea

Choanoflagellates are the closest unicellular relatives of animals (Figure 3) (Richter & King 2013). This relatedness has been known since the 19th century, based on the morphological similarity between these organisms and the choanocytes, a specific cell type of Porifera (James-Clark 1866). Molecular phylogenies later confirmed their position as the closest unicellular relatives of animals (Lang et al. 2002; Steenkamp et al. 2006; Ruiz-Trillo et al. 2008; Carr et al. 2008). There are around 350 described choanoflagellates; some species are colonial while others are solitary, but all of them share a similar morphology consisting of a spherical cell with an anterior flagellum surrounded by a collar of microvilli (Figure 5) (Richter & Nitsche 2017). They are free-living protists that prey on bacteria. Their feeding mode consists of creating a water current with the flagellum to bring bacteria to the outer surface of the collar, and then filtering the water to engulf the cells (Pettitt et al. 2002; Dayel et al. 2011).

Choanoflagellates are ubiquitous, as they have been found in fresh and marine waters all over the world (Richter & Nitsche 2017). Their presence has been reported in a wide range of ecosystems: the water column of coastal marine samples (del Campo et al. 2015), abyssal plains (Nitsche et al. 2007), sea ice (Thomsen et al. 1997), hypoxic brackish waters (Wylezich et al. 2012), and even soils (Ekelund et al. 2001; Geisen et al. 2015). Given their widespread distribution, their number of species and their role as bacterial grazers, they represent key elements in microbial food webs, comprising between 5 and 40% of nanoflagellate biomass in aquatic environments (Arndt et al. 2002).

Choanoflagellatea is divided into two large subgroups: Craspedida and Acanthoecida (Carr et al. 2017). Craspedida contains roughly 210 described species with a wide variety of morphologies and habitats. They can present a thin glycocalyx extracellular coat or an organic covering called a theca, or lack any extracellular structure (Figure 5) (Leadbeater 2015). They can also be colonial or not, and are present in both freshwater and marine ecosystems (Carr et al. 2017). The latest phylogeny divides craspedids into three clades, informally named Clade 1, Clade 2 and Clade 3, which do not correspond to the distribution of morphological traits related to the theca (Carr et al. 2017).

Figure 5. Choanoflagellatea diversity. Choanoflagellates are the closest relatives of animals. They have an ovoid or spherical cell body containing an apical flagellum, which is surrounded by a collar of microvilli. In the Craspedida subgroup, some choanoflagellates display an organic extracellular structure known as a theca. In contrast, acanthoecid specimens produce a silica-based extracellular structure called a lorica. (A-B) SEM image of (Craspedida), showing the thecate cell (A) and a rosette colony (B). Cells are usually ∼2.5 μm in diameter, with a flagellum ∼15 μm long. Courtesy of Mark Dayel and Nicole King. (C-D) SEM images of the acanthoecids Acanthoeca spectabilis empty lorica (C), and Savillea parva cell and lorica (D). Scale bars: 2 μm. Source: Leadbeater 2008. (E) SEM image of Acanthocorbis mongolica (Acanthoecida). Scale bars: 5 μm. Source: Paul 2012.

Acanthoecida, on the other hand, is a smaller group (around 120 described species) with a striking siliceous extracellular structure called a lorica (Figure 5) (Richter 2017). This lorica is thought to provide rigidity and extra volume that increase cell buoyancy in pelagic waters (Leadbeater 2008). Acanthoecids are mostly marine, with only two morphological species described in freshwater lakes: Stephanoeca arndti in Samoa (Nitsche 2014) and Acanthocorbis mongolica in Mongolia (Paul 2012).

1.5.2.1. Environmental choanoflagellates

DNA sequences from environmental choanoflagellates appeared soon after the implementation of NGS technologies. Since then, several exclusively environmental groups have been defined for which there is no morphological information (Figure 3). Within Craspedida, the FRESCHO1 (for freshwater choanoflagellates 1), FRESCHO2 and FRESCHO4 clades were identified by screening environmental sequences from the GenBank Nucleotide repository (del Campo & Ruiz-Trillo 2013). In the same study, MACHO1 (for marine choanoflagellates 1) and MACHO3 were also defined and grouped within acanthoecids, although the relationships are not well supported (del Campo & Ruiz-Trillo 2013). MACHO3 was, in fact, very abundant in another marine metabarcoding survey, where it accounted for 8.9% of the total unicellular Opisthokonta abundance (del Campo et al. 2015). Finally, the environmental groups FRESCHO3 and CladeL appeared to form another group separated from both craspedids and acanthoecids, although the low support of this topology leaves this position uncertain (Weber et al. 2012; del Campo & Ruiz-Trillo 2013).

1.5.3. Filasterea

Filasterea (Shalchian-Tabrizi et al. 2008) is the sister group of Choanozoa (the clade formed by Metazoa and Choanoflagellatea, Brunet & King 2017) and contains only four described species to date: Capsaspora owczarzaki, Ministeria vibrans, Pigoraptor vietnamica and P. chileana (Figure 3 and Figure 6). Despite being the smallest of all unicellular Holozoa groups, these species cover a wide range of habitats and lifestyles. C. owczarzaki, for example, was isolated from the haemolymph of a freshwater snail in and considered a symbiont (Owczarzak et al. 1980; Hertel et al. 2002), whereas M. vibrans was isolated as a free-living amoeba from the marine coasts of the UK and South Africa and feeds on bacteria (Tong 1997; Cavalier-Smith & Chao 2003). Pigoraptor vietnamica and P. chileana are also free-living organisms, isolated from freshwater lakes in Vietnam and Chile, respectively (Hehenberger et al. 2017).

Figure 6. Filasterea diversity. Filasterea is the closest unicellular lineage to animals, after Choanoflagellatea. This clade contains four described species to date. (A) SEM image of Ministeria vibrans. Cell size between 1-4 μm (3.5-8 μm including the radiating filopodia). Source: MulticellGenome Lab, CC by 2.0. (B) Light micrographs of Pigoraptor chileana in different stages: upper images, flagellated cells; lower images, a cell cluster and a cyst. P. vietnamica displays a similar morphology. Scale bars: 10 μm. Source: Hehenberger et al. 2017. (C) SEM image of Capsaspora owczarzaki in the aggregative multicellular stage. Scale bar: 5 μm. Source: Sebé-Pedrós et al. 2013.

The importance of filastereans lies in their phylogenetic position and their genetic content. The species C. owczarzaki is one of the closest unicellular relatives of animals and contains a rich repertoire of genes related to multicellular functions, even though it is a single-celled organism (Suga et al. 2013). Given this genetic toolkit and its ability to form a transient aggregative multicellular stage, this species is key to understanding the genomic innovations and reorganizations that were necessary for the transition towards animal multicellularity (Figure 6) (Sebé-Pedrós et al. 2013; Sebé-Pedrós et al. 2017; Parra-Acero et al. 2018).

Despite efforts to find filasterean reads in environmental surveys, only one read had been retrieved from Filasterea at the beginning of this thesis project (del Campo et al. 2015). The addition of Pigoraptor species to the ribosomal phylogenies placed the environmental group MAOP1 (see section 1.5.4.1. Environmental ichthyosporeans) within filastereans, although this position is still not clear (Hehenberger et al. 2017).

1.5.4. Ichthyosporea and Pluriformea (Teretosporea)

Ichthyosporea (also known as Mesomycetozoa or DRIPs) is a clade of osmotrophic fungi-like protists typically isolated from animals, where they live as parasites, mutualists or commensals (Figure 3 and Figure 7) (Mendoza et al. 2002). The range of hosts is wide, including , crustaceans, amphibians, birds, or even mammals (Glockling et al. 2013). Although there are also free-living species (Hassett et al. 2015), the negative impact of ichthyosporeans on the fishing and seafood industry has focused the study of their diversity on the pathogenic species (Lafferty et al. 2015). Take Sphaerothecum destruens or Ichthyophonus hoferi as examples. I. hoferi is the most common parasite of both freshwater and marine fish in the Northern Hemisphere (Hershberger et al. 2010; Gregg et al. 2012) and it can cause 8.9% mortality in clupeids (herrings and sardines, among others) (Gozlan et al. 2014). S. destruens has a prevalence of 32% in salmonid populations and around 79% in cyprinids, although its epidemiology and impact have not been gauged yet (Andreou et al. 2012; Gozlan et al. 2014).

Ichthyosporeans are single-celled protists that can form multinucleated structures called coenocytes surrounded by thick cell walls, some showing evidence of (Glockling et al. 2013). There are around 40 described ichthyosporean species, about half of which are phylotypes³ (Glockling et al. 2013). They are divided into two groups, Dermocystida and Ichthyophonida, based on morphological traits (i.e. dermocystids have a flagellated amoeba as the dispersal stage, whereas ichthyophonids have a naked crawling amoeba) (Glockling et al. 2013; Mendoza et al. 2002). The division has also been supported by phylogenetic analysis (Marshall et al. 2008; Lohr et al. 2010; Marshall & Berbee 2011).

Ichthyophonida is the most diverse group, with the highest interspecific morphological variability (del Campo & Ruiz-Trillo 2013; Marshall et al. 2008). There are spherical cells with thick walls, as displays (Jøstensen et al. 2002), or hyphal-like cells with walls, such as in Ichthyophonus (Mendoza et al. 2002) or Abeoforma whisleri (Marshall & Berbee 2011). They can be commensals or parasites, although free-living species have been described and detected in environmental surveys as well (del Campo & Ruiz-Trillo 2013; Glockling et al. 2013). The phylogenetic relationships within ichthyophonids are difficult to establish, but a consensus structure indicates that there are three subgroups:

³ A phylotype is a biological type that classifies an organism by its phylogenetic relationships, no matter the taxonomic level at which the phylotype is described. For example, phylotypes can be described at the level of species, class, OTUs at 97% genetic similarity, or homology.

The “spherical group” (Glockling et al. 2013) includes two clades. The first clade is present in fresh waters and is composed of the amphibian parasite Anurofeca richardsi (Baker et al. 1999) and the planktonic environmental LKM51 sequence (Van Hannen et al. 1999). The marine clade is composed of Creolimax fragrantissima and several Sphaeroforma species. C. fragrantissima was mainly isolated from the gut of the peanut worm Phascolosoma agassizii (Sipuncula), although other reported hosts include tunicates, sea cucumbers, and chitons (Marshall et al. 2008). There are 6 Sphaeroforma species to date (Hassett et al. 2015), S. arctica being the best studied (Jøstensen et al. 2002; Ondracka et al. 2018; Dudin et al. 2019). Both Sphaeroforma and Creolimax fragrantissima have a similar life cycle, in which a small walled spherical cell (6-8 μm) grows to form a structure called a coenocyte that can reach 25-60 μm in diameter. It is at this stage that cellularization occurs, followed by a release of the newborn .

The Eccrinales & Amoebidiales clade exhibits the richest variety in cell forms, including polarized tubular multinucleate cells resembling fungal coenocytic structures. In fact, they were long classified as Trichomycetes Fungi (Lichtwardt 1954). Amoebidium and Paramoebidium have 6 and 13 described species, respectively, and they are ecto- or endocommensals of freshwater arthropods. There are around 50 described species of Eccrinales, which are mainly found inside the guts in freshwater, marine and terrestrial habitats (Cafaro 2005). The genus Ichthyophonus is also included in this group. Ichthyophonus species are well known for the dreadful consequences their infections have on fish all over the world (Gozlan et al. 2014; Lafferty et al. 2015). If the culture medium is acidic, I. hoferi develops a coenocytic hyphae-like morphology very similar to that of Eccrinales or Amoebidiales, but with branches. If the medium has a high pH, it displays an amoeboid morphology (Okamoto et al. 1985).

The last group contains several described species together with environmental sequences. Abeoforma whisleri and Pirum gemmata were isolated from lophotrochozoan guts (Marshall & Berbee 2011). A. whisleri presents amoeboid cells that can divide without reaching the coenocyte stage. It also displays the highest variation in morphology: cells with lobose , multinucleated hyphal-like or plasmodial structures. P. gemmata was isolated from the peanut worm Phascolosoma agassizii (the same host species as C. fragrantissima) and, in contrast with A. whisleri, it releases non-motile amoebas (Marshall & Berbee 2011). Two other species are included in this group: the Tenebrio molitor symbiont or TMS (Lord et al. 2012) and Caullerya mesnii (Lohr et al. 2010), both found inside arthropod tissues, with no amoeboid stage and with morphologies different from those presented by Pirum or Abeoforma.

Figure 7. Ichthyosporea diversity. Ichthyosporea is a group of osmotrophic/saprotrophic protists that are isolated from animal tissues. There are two main groups: Ichthyophonida and Dermocystida. They all have a similar life cycle with a characteristic large spherical multicellular structure that releases dispersive amoebas or . (A) SEM image of a group of Sphaeroforma arctica cells. Cells are 30 μm in diameter. (B) Light microscopy of Pirum gemmata cells, showing their central vacuoles (CV). Scale bar: 20 μm. (C) Confocal microscopy image of Creolimax fragrantissima: nuclei are stained with DAPI (blue), and actin filaments are stained with Phalloidin (green). Cells are 30-60 μm in diameter. (D) SEM image of Creolimax fragrantissima. Cells are 30-60 μm in diameter. Source: all images are from MulticellGenome Lab, CC by 2.0; except for panel (B), which is from Marshall & Berbee 2011.

Dermocystida (a.k.a. Rhinosporideacae), on the other hand, encompasses strict vertebrate parasites, with the exception of the recently described free-living marine sediment-dweller Chromosphaera perkinsii (Glockling et al. 2013; Grau-Bové et al. 2017). Given the strict parasitism of this group, it is challenging to obtain cultures and, as a consequence, full knowledge of dermocystid biology, ecology and evolution is still lacking. The pathogenic Sphaerothecum destruens (also called rosette agent) is probably the most studied dermocystid because of its severe consequences for the fishing industry: mortality of Chinook salmon exceeded 90% in several years in the USA (Gozlan et al. 2009). Phylogenetic studies have placed S. destruens as the earliest-branching lineage in Dermocystida (Arkush et al. 2003). Another fairly well-known species is Rhinosporidium seeberi, which can also infect humans (Vilela & Mendoza 2012). It releases thousands of non-flagellated cells that mostly affect mucous membranes. Other genera, such as Dermocystidium, Amphibiocystidium, or Amphibiothecum, complete this clade (Pereira et al. 2005; Pekkarinen et al. 2003; Pascolini et al. 2003).

To this extensive group, a small clade of two described species was added: Corallochytrium limacisporum (Raghukumar 1987) and the recently characterized Syssomonas multiformis (Hehenberger et al. 2017) (Figure 8). C. limacisporum is a free-living osmotroph isolated from Indian and Hawaiian coral reefs (Raghukumar 1987; Grau-Bové et al. 2017). Its life cycle resembles that of ichthyosporeans: it starts with a single cell that undergoes multiple binary cell divisions. Daughter cells remain attached to each other until the offspring amoebas burst out. In contrast, no clear coenocyte stage has been reported. The phylogenetic position of C. limacisporum has been controversial. When first described, it was classified as a thraustochytrid (a small group of stramenopiles) (Raghukumar 1987). Later on, it was grouped with Fungi based on the AAA lysine pathway (Sumathi et al. 2006), or placed as sister group to choanoflagellates in 18S rDNA phylogenies (Cavalier-Smith & Allsopp 1996; Pereira et al. 2005; Ruiz-Trillo et al. 2006). Variations in the taxon sampling and in the genes used for the phylogeny made C. limacisporum jump across almost all possible positions within Holozoa. A more complete taxon sampling combined with transcriptomic (Torruella et al. 2015) and genomic data (Grau-Bové et al. 2017) placed this species as sister group to ichthyosporeans. Therefore, the clade formed by Ichthyosporea plus C. limacisporum was named Teretosporea (Torruella et al. 2015).

Figure 8. Corallochytrea diversity. (A-H) Light micrographs of Syssomonas multiformis: (A-C) flagellated cells, (D) amoeboflagellate, (E) amoeboid stage, (F) cyst, and (G-H) cell clusters. Source: Hehenberger et al. 2017. (I) Corallochytrium limacisporum with the nuclei stained with DAPI (blue). Courtesy of Omaya Dudin. Scale bars: 10 μm.

The phylogenetic instability of C. limacisporum was shown again when a new species was isolated from a freshwater lake in Vietnam: Syssomonas multiformis (Hehenberger et al. 2017). Both species share an amoeboid stage in their life cycle, although S. multiformis possesses a flagellum whereas C. limacisporum does not (even though it has the complete genetic toolkit for the flagellar apparatus) (Torruella et al. 2015) (Figure 8). They also differ in their feeding mode (S. multiformis preys on other eukaryotes) and habitat. In any case, transcriptomic analysis grouped this flagellated predator with C. limacisporum in the Pluriformea clade, branching as sister group to the clade formed by animals, choanoflagellates and filastereans (Hehenberger et al. 2017). Although the exact phylogenetic position and relationships remain unclear, unpublished results seem to support Teretosporea (Ocaña-Pallarès and López-Escardó, personal communication).

1.5.4.1. Environmental ichthyosporeans

Several exclusively environmental ichthyosporean groups have been retrieved from metabarcoding studies (Figure 3). For example, the freshwater ichthyosporeans 1 group (FRESHIP1) was found to be sister group to the Eccrinales & Amoebidiales clade of Ichthyophonida, although with low statistical support (del Campo & Ruiz-Trillo 2013). Marine ichthyosporeans 1 (MAIP1) branched as sister group to the Abeoforma clade with a Maximum-Likelihood support over 50% (del Campo & Ruiz-Trillo 2013). In that study it was found to be fairly abundant, accounting for 6% of the unicellular opisthokonts. A metabarcoding study in European coastal marine habitats found that MAIP1 dominated the sediment fraction in anoxic water (del Campo, Mallo, Massana, de Vargas, T. A. Richards, et al. 2015), although it was also found in marine oxic waters, in a salt lake and even once in a freshwater anoxic environment (del Campo & Ruiz-Trillo 2013). The exclusively environmental group MAOP1 (for marine opisthokonts 1), together with MAOP2 (renamed by del Campo & Ruiz-Trillo 2013 from the original uncultured clone M1_18C07 from Marshall & Berbee 2011 and Zuendorf et al. 2006), forms a clade with Corallochytrium, although the statistical support is low. MAOP1, the most abundant of the MAOPs, has been reported in oxic, micro-oxic and even anoxic environments (Romari & Vaulot 2004; Cheung et al. 2008; Amacher et al. 2009; Edgcomb et al. 2011; del Campo, Mallo, Massana, de Vargas, T. A. Richards, et al. 2015).

1.5.5. Fungi and Opisthosporidia

Fungi are, together with animals and plants, one of the best-studied eukaryotic groups because of their long and important relationship with humanity. Not only are they ultimately responsible for beer, bread and cheese production through fermentation, but they were also the main source of antibiotics until the advent of synthetic production (Willis 2018). Invasive and pathogenic fungi can also have devastating economic consequences when infecting crops, livestock or other natural resources (Galagan et al. 2005). They also sustain natural ecosystems through their primary role as decomposers of organic matter and through symbiotic relationships with bacteria, plants (including ) or animals. However, the large amount of hidden diversity, the lack of phylogenetic resolution in some subgroups, and the considerable genetic and trait losses make this group a constant challenge to study. To begin with, there is no clear synapomorphy that unites all Fungi (Richards et al. 2017). Several features were proposed in the past but have all been rejected, such as the synthesis of the amino acid lysine through the α-aminoadipate pathway (Richards et al. 2017) or the synthesis of ergosterol as an essential element of the cell membrane (Paterson 2005; Weete et al. 2010). It was later found that ergosterol metabolism is shared by multiple eukaryotes spread all over the tree of life (Adl et al. 2018), even by some close unicellular relatives of Fungi (Najle et al. 2016). Chitin was thought to be the most characteristic feature defining Fungi (Richards et al. 2017). However, it was later found that some species do not have chitin in any of their life stages. The composition of the cell wall also varies drastically, showing no clear consensus pattern. In addition, the ability to lay down chitin on the cell surface is not exclusive to Fungi; it is also present in other eukaryotes, such as Teretosporea (Torruella et al. 2015), Entamoeba (Arroyo-Begovich et al. 1980), (Kneipp et al. 1998) and some stramenopiles (Durkin et al. 2009). Finally, many of the genes related to chitin synthesis predate the radiation of Holomycota, as they were found in other eukaryotes (Mélida et al. 2013; Torruella et al. 2015), demonstrating that this character is also not valid to define all Fungi.

Classical Fungi has been divided into , Zoopagomycota, and (Figure 9 and Figure 10) (Spatafora et al. 2016). Dikarya contains Ascomycota and Basidiomycota, the most recognisable fungi with complex multicellularity (Knoll 2011). However, even in this clade, secondary regression to unicellularity is shown in all Saccharomyces species (Nguyen et al. 2017). The advance of ribosomal-based phylogenies led to the proposal of Opisthosporidia as the sister-group to “true” Fungi (Karpov et al. 2014). Opisthosporidia includes Cryptomycota (a.k.a. Rozellida or Rozellomycota) (Jones et al. 2011), Aphelida (Karpov et al. 2014) and Microsporidia (Vávra & Lukeš 2013) (Figure 9). All of these clades are intracellular parasites, presenting an amoeboid and a cystic stage, together with a specialized penetration apparatus (Karpov et al. 2014). Microsporidians have even developed a deep and strong phylogenetic association with their animal hosts (Smith 2009). The phylogenetic relationships between these groups are still far from being resolved (Richards et al. 2017).

Figure 9. Holomycota diversity. Holomycota comprises Fungi sensu lato, Opisthosporidia and several unicellular lineages. (A) Rozella allomycis (Cryptomycota) parasitizing the blastocladiomycete Allomyces macrogynus. R. allomycis is inside the host cell producing numerous spiny, brownish sporangia (arrow). (B) Planktonic chytrid. Chitin fibrils are stained with the CBD procedure. Cell diameter of 15 μm. (C) Light micrograph of Fonticula alba displaying the multicellular fruiting body. Scale bar: 100 μm. (D) Light micrograph of Nuclearia thermophila. Scale bar: 25 μm. Image source: (A-B) Grossart et al. 2015; (C) Brown et al. 2009; (D) Yoshida et al. 2009.

1.5.6. Nuclearia, Fonticula and Parvularia

A group of free-living amoebas constitutes the earliest-splitting lineage of Holomycota (Figure 9 and Figure 10). They are grouped into three genera: Nuclearia, Fonticula, and Parvularia. Nuclearia amoebas have been known since the 19th century because of their large size (10-60 μm) and their relatively high abundance in fresh and brackish waters (Patterson 1984). There are around 9 currently known Nuclearia species and, although they all share a similar morphology (a spherical cell surrounded by thin filopodia; Mikrjukov & Mylnikov 2001) (Figure 9), this genus displays a wide range of morphological traits. For example, N. delicatula is multinucleated (Blanc-Brude et al. 1955), N. simplex has a cystic stage in its life cycle, and N. rubra has a structure similar to an extracellular matrix (Patterson 1984). N. pattersoni was isolated from the gills of a freshwater fish rather than as a free-living species, which is the common trait in this group (Dyková et al. 2003). Another example is the finding of ecto- and endosymbiotic bacteria in other Nuclearia species (Dirren et al. 2014). Therefore, not only is its morphology diverse, but the genus Nuclearia also represents a good clade in which to study different ecological associations that might reveal key scenarios for the origins and diversification of Holomycota.

Fonticula alba is the only described species of the genus Fonticula. Described in the 1970s (Worley et al. 1979), it was later positioned as the sister group of Nuclearia using molecular phylogenies based on several genes (Brown et al. 2009). F. alba has a very different lifestyle compared to Nuclearia. It is also an amoeba that feeds on bacteria, but it can build an aggregative multicellular fruiting body that helps to release the spores in other locations, where new trophic amoebas will eventually germinate and start the cycle over again (Figure 9) (Brown et al. 2009).

The latest addition to this clade is the nucleariid Parvularia atlantis (López-García et al. 2018). It was first classified as a member of Nuclearia, but its phylogenetic distance from other Nuclearia species and from Fonticula alba in 18S rDNA phylogenies suggested that this amoeba was a different species. Other singularities, such as an 18S rDNA similarity lower than 95% with Nuclearia and Fonticula and the lack of the V4 and V7 insertions typical of Nuclearia, supported describing this species as a separate genus.

Figure 10. Holomycota tree of life. Schematic representation of the fungal lineages, showing the uncertainty in the earliest-branching lineages. Opisthosporidia contains Aphelida, Microsporidia and Cryptomycota (Karpov et al. 2014). The brown bar shows that the common features associated with Fungi are not found in earlier-diverging groups. Adapted from Richards et al. 2017.

1.5.6.1. Environmental Holomycota

Estimates of real fungal diversity propose 2.72 million species of Fungi (Hawksworth 2001), while the number of formally described species is 98,128 according to the 10th edition of the Dictionary of the Fungi (Kirk et al. 2008). This ‘dark matter’ has been recognized as immense and of crucial importance for understanding the evolution and ecology of Fungi, as well as for potentially finding molecules of human interest (Bass & Richards 2011; Willis 2018). The lack of knowledge is more severe for the lineages classified as early-branching fungi, whose clades represent only around 2,300 morphologically described species (Kirk et al. 2008).

Within the earliest-splitting Holomycota clade, Marine Fonticulids (MAFO) is the main environmental group found so far (del Campo & Ruiz-Trillo 2013). It contains around 90 sequences that account for 24% of the total unicellular Opisthokonta richness in European coastal waters (del Campo & Ruiz-Trillo 2013), although they have also been found in other marine environments (Edgcomb et al. 2011). MAFO are thought to be sister group of Fonticula alba, although the phylogenetic support is low (López-Escardó et al. 2018a). Other environmental groups are located within the Nuclearia clade: env NUC-1 and env NUC-2 (López-Escardó et al. 2018a). In fact, Parvularia atlantis is nested within env NUC-2, although the low statistical support leaves their phylogenetic relationships uncertain.

Regarding Opisthosporidia, one of the most consistently identified environmental groups is the Basal Clone Group 1 (Nagahama et al. 2011), which was later renamed Novel Chytrid-Like Clade 1 (NCLC1) by Richards et al. 2015. NCLC1 was initially detected in marine sediments (Tian et al. 2009), but also in deep-sea environments and in the marine water column (Richards et al. 2015). Its presence in a wide range of size fractions led the authors to suggest that it could go through a complex life cycle, displaying different cellular morphotypes or even interactions with other cells (Richards et al. 2015). Despite efforts to place it in the fungal tree of life, its position remains uncertain. It has been related to rozellids (Nagahama et al. 2011), to chytrids (Richards et al. 2015), and even placed as sister group of Ichthyosporea (Bass et al. 2007). Two other smaller groups (NCLC2 and NCLC3) were also detected within Chytridiomycota in European marine coasts (Richards et al. 2015), with NCLC3 being sister group to Rhizophydiales. Basal Clone Group 2 (BCG2) was also detected in soils and freshwater, and seemed to be sister group to all non-opisthosporidian Fungi (Bass et al. 2018; Monchy et al. 2011). Within Cryptomycota, the amount of unknown diversity is massive because there is only one characterized genus and one culturable species, Rozella allomycis (James & Berbee 2012). The environmental clade LKM11 clustered with rozellids and was detected in freshwater environments (peats and other freshwater engineered systems) (Lara et al. 2010).

New diversity has been found not only in early-branching fungal clades: some Dikarya have been found exclusively as environmental sequences (Richards et al. 2012). What is clear is that the 2,189 fungal species newly described in 2017 are not the exception but the rule, showing the immense diversity that remains to be uncovered (Willis 2018).

2. Objectives

2. OBJECTIVES

“Living most of the time in a world created mostly in one’s head, does not make for an easy passage in the real world.”

· Sydney Brenner

The main objective of my thesis was to analyse the molecular diversity of both microbial Opisthokonta and Metazoa. More specifically, I searched for potential novel clades and tried to better understand their ecology and geographical distribution.

To accomplish this objective, I used metabarcoding data of the universal marker 18S rRNA gene from different marine and freshwater datasets. My specific objectives were:

1. To find new diversity of unicellular Opisthokonta in both freshwater and marine habitats.
2. To unravel the geographical distribution of unicellular Holozoa lineages around the globe and their possible ecological roles in the ecosystems.
3. To find novel molecular diversity in Acoelomorpha flatworms, a key group for understanding the evolution of bilaterian animals.
4. To apply graph theory, through gene similarity networks, to address novel molecular biodiversity of unicellular Holozoa.

3. Results

3. RESULTS

“If there is a tree of life, it’s a small anomalous structure growing out of the web of life”

· John Dupré

To the Unseeable Animal

My daughter: "I hope there's an animal somewhere that nobody has ever seen. And I hope nobody ever sees it."

· Wendell Berry

3.1. Novel diversity of deeply branching Holomycota and unicellular holozoans revealed by metabarcoding in Middle Paraná River, Argentina

Arroyo AS, López-Escardó D, Kim E, Ruiz-Trillo I and Najle SR (2018) Novel Diversity of Deeply Branching Holomycota and Unicellular Holozoans Revealed by Metabarcoding in Middle Paraná River, Argentina. Frontiers in Ecology and Evolution 6:99. doi:10.3389/fevo.2018.00099

3.2. Gene similarity networks from the Tara Oceans expedition unveil geographical distribution, ecological interactions, and novel diversity among unicellular relatives of animals

Arroyo AS, Lannes R, de Vargas C, Bapteste E and Ruiz-Trillo I. Gene similarity networks from the Tara Oceans expedition unveil geographical distribution, ecological interactions, and novel diversity among unicellular relatives of animals. Unpublished.

Gene similarity networks from the Tara Oceans expedition unveil geographical distribution, ecological interactions, and novel diversity among unicellular relatives of animals

Alicia S. Arroyo1*, Romain Lannes2*, Colomban de Vargas, Eric Bapteste2w & Iñaki Ruiz-Trillo1,3,4w

1 Institut de Biologia Evolutiva (CSIC-Universitat Pompeu Fabra), Passeig Marítim de la Barceloneta, 37-49, 08003 Barcelona, Spain

2 Institut de Biologie Paris-Seine (IBPS), UPMC Université Paris 06, Sorbonne Universités, Paris, France

3 Departament de Genètica, Microbiologia i Estadística, Institut de Recerca de la Biodiversitat, Universitat de Barcelona, Avinguda Diagonal 643, 08028 Barcelona, Spain

4 ICREA, Pg. Lluís Companys 23, 08010 Barcelona, Spain

ABSTRACT

The Holozoa clade comprises animals and several unicellular lineages. Thus, understanding the full diversity of unicellular holozoans is essential to address the origins of animals and other evolutionary questions. However, the full diversity of these lineages is poorly known. In this study, we analysed 18S rDNA metabarcoding data from the global Tara Oceans expedition with the objective of finding new diversity within or between unicellular Holozoa lineages. We used similarity networks to overcome the low phylogenetic information contained in the metabarcoding dataset (composed of sequences from the short V9 region of the gene). We constructed similarity networks by combining two datasets, unknown environmental sequences from Tara Oceans and known reference sequences from GenBank, and blasting them all against all. We calculated network metrics to compare environmental to reference sequences. These metrics reflected the divergence between both types of sequences in a mathematical way and provided an effective way to mine the Tara Oceans dataset for evolutionarily relevant new diversity, further validated by phylogenetic placements. Our results showed that unicellular holozoans from Tara Oceans were not similar to the extant references, expanding the known diversity of these lineages. Novelties were mainly found in Acanthoecida choanoflagellates, branching off several already described subgroups. We also found 21 OTUs that did not cluster with any other existing lineage and thus could represent a new holozoan group. Moreover, we explored for the first time the geographical distribution of the extant holozoan lineages around the globe, and the ecological interactions they may have with animals. Results showed that, although ubiquitous, each lineage exhibited a different distribution pattern. We also checked for potential associations between unicellular Holozoa and animals, and identified a positive correlation between the abundance of new animal hosts and the ichthyosporean Creolimax fragrantissima, as well as for other holozoans that were previously reported as free-living. Overall, our analyses provide a fresh perspective on the diversity and ecology of unicellular holozoans, highlighting the amount of undescribed diversity in this important clade of the tree of life.

Keywords networks, metabarcoding, 18S, molecular diversity, unicellular Holozoa, novelty

INTRODUCTION

An important evolutionary question that feeds on our current knowledge of diversity is the origin of animals. To understand animal origins, we first need to have a good phylogenetic framework (Ruiz-Trillo et al., 2007). We now know that animals are closely related to several unicellular lineages, namely Choanoflagellatea, Filasterea, and Ichthyosporea, all together forming the Holozoa clade (Lang et al., 2002; Ruiz-Trillo et al., 2008; Shalchian-Tabrizi et al., 2008; Torruella et al., 2015; Grau-Bové et al., 2017). Therefore, to understand what the unicellular ancestor of animals looked like and how animals evolved from that ancestor, one must study unicellular holozoans. However, the real diversity of Holozoa is still mostly unknown (del Campo et al., 2015; Arroyo et al., 2018). Improving our knowledge about Holozoa diversity may therefore change current interpretations of the evolutionary transition towards animal multicellularity (Ruiz-Trillo et al., 2007; del Campo et al., 2014).

To fill this gap and provide a more accurate perspective on unicellular Holozoa diversity, we analysed the longest and largest marine metabarcoding dataset: the 18S ribosomal RNA gene (hereafter 18S or 18S rDNA) from the Tara Oceans expedition (de Vargas et al., 2015; Pesant et al., 2015), of which a third did not match any reference in databases (de Vargas et al., 2015). A drawback of this dataset is the absence of full-length 18S sequences, as it is composed of the relatively small V9 region (around 130 bp long), located at the end of the 18S (Hugerth et al., 2014). Because these short amplicons contain too little phylogenetic information to resolve phylogenies, we followed a different strategy to unravel new molecular diversity within Holozoa (Amaral-Zettler et al., 2009).

To overcome the issue of the limited phylogenetic signal, we decided to analyse this dataset using gene similarity networks. Networks have been preferentially applied to study ecological interactions, such as predator-prey, parasite-host or mutualism (Logares et al., 2014; Krabberød et al., 2017; Layeghifard et al., 2017; Pilosof et al., 2017; Valverde et al., 2018). They are now becoming widely adopted to explain complex evolutionary processes, such as horizontal gene transfer, gene domain fusion, and gene or genome introgression (Corel et al., 2016; Pathmanathan et al., 2018; Ocaña-Pallarès et al., 2019). To our knowledge, there are very few metabarcoding studies that used networks to describe novelty in metabarcoding datasets (Forster et al., 2015), even though this methodology offers a structure to test evolutionary questions in massive high-throughput data and to mine large datasets for sequences of interest.

The main objective of our analysis was to find novelty among unicellular holozoans. Moreover, we also analysed the geographical distribution of unicellular Holozoa and looked for co-occurrence patterns between some unicellular Holozoa parasitic lineages and their corresponding animal hosts.

We detected novel unicellular Holozoa diversity, in particular within Choanoflagellatea and Ichthyosporea. Specifically, we found unicellular Holozoa Operational Taxonomic Units (OTUs) branching off several acanthoecid subgroups (for example H), Syssomonas multiformis and Creolimax fragrantissima. We also retrieved 15 Filasterea-related OTUs, detecting this clade for the very first time in an environmental survey. Interestingly, we also identified a putative novel unicellular Holozoa group that could not be located within any other known lineage. This clade, which we tentatively named MASHOL (for MArine Small HOLozoa clade), was composed of 21 OTUs (6,244 reads in total). We also observed that the freshwater environmental group FRESCHO3 could have diverged from a marine clade, showing another marine-to-freshwater transition in choanoflagellates. Finally, our results suggested novel associations between animals and ichthyosporeans. For example, the ichthyosporean C. fragrantissima could be associated with a wider range of animal hosts than previously described. Other associations were identified between the environmental clades marine ichthyosporeans 1 (MAIP1) and marine opisthokonts 2 (MAOP2), and different animal phyla, adding other ecological dynamics to the unicellular relatives of animals.

RESULTS AND DISCUSSION

Initial datasets & network construction

The main objective of this study was to look for potential new diversity of unicellular Holozoa and to address, for the first time, the geographical distribution of the clade around the globe. We used metabarcoding data from the V9 region of the 18S rRNA gene. We combined two datasets: an environmental dataset of OTUs (Operational Taxonomic Units) and a reference dataset with known holozoan sequences. The environmental dataset came from the worldwide Tara Oceans expedition (de Vargas et al., 2015), which included a total of 1,086 samples from 210 oceanic stations, 3 water column layers and 10 size fractions (further details about sampling procedures can be found in Pesant et al., 2015). The reference dataset was built by collecting sequences from both GenBank Nucleotide and PR2 databases (see Materials and Methods).

The initial unicellular Holozoa network was built from 2,426 sequences (2,197 from Tara Oceans, 229 from the reference dataset). In the network, each node represented either an environmental OTU from Tara Oceans (hereafter ENV) or a sequence from the reference database (hereafter REF) (Figure 1). The basic structure of the network consisted of Connected Components (CCs): subgraphs of the network in which there is always a path between all nodes (Figure 1). The initial network was subsequently partitioned using increasing sequence similarity thresholds (≥85%, ≥87%, ≥90%, ≥95% and ≥97%), resulting in a more fragmented network (Figure 1). CCs could be classified in three types: CCs in which all nodes were environmental (CCENV), CCs in which all nodes were reference (CCREF) and CCs in which there were both types of nodes (CCMIX).

The topology of the network was constant across all thresholds, meaning that the number of CCENV was always the largest, followed by CCMIX and CCREF (Supplementary Figure 1), which indicated the presence of abundant divergent groups of environmental sequences.

Definition of novelty

We explored the network structure to search for molecular diversity. To do so, we calculated different metrics that are grouped in four categories:

I. Closeness centrality (Figure 1 and Supplementary Material 1): It defines to which extent a node (sequence) is central in the network. Typically, a peripheral sequence is more divergent than the rest of the nodes in the network because it shares less similarity. Therefore, we tested whether and which environmental sequences (ENV) were significantly more peripheral than reference sequences (REF), since this suggests that those ENV sequences extend the current known diversity of Holozoa.

Figure 1. Network metrics. Upper panel: once the unicellular Holozoa network was constructed, different similarity thresholds were applied to gain a more detailed structure of their diversity. Lower panels: network metrics computed in this study to address molecular novel diversity in unicellular Holozoa. A more technical explanation of closeness and assortativity can be found in Supplementary Material 1, and of BRIDES in Supplementary Figure 2.
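As a minimal sketch of the partitioning and closeness steps summarised in Figure 1, the Python code below thresholds a toy similarity network, classifies its connected components as environmental, reference or mixed, and computes a closeness value per node. It is illustrative only and is not the implementation used in this study: the edge-list input, the sequence names, the function names, the decision to keep isolated nodes as singletons, and the choice to compute closeness within each node's own connected component are all assumptions made for this example (the definition actually used is given in Supplementary Material 1).

```python
from collections import deque

def threshold_graph(edges, min_identity):
    """Build an adjacency dict, keeping only edges with identity >= min_identity.
    Nodes that lose all their edges are kept as singletons (an assumption)."""
    adj = {}
    for u, v, identity in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if identity >= min_identity:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def bfs_distances(adj, source):
    """Number of edges on the shortest path from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def connected_components(adj):
    """List of node sets, one per connected component (CC)."""
    seen, components = set(), []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components

def classify_component(component, labels):
    """Return CC_ENV, CC_REF or CC_MIX depending on the node labels in the CC."""
    kinds = {labels[node] for node in component}
    if kinds == {"ENV"}:
        return "CC_ENV"
    if kinds == {"REF"}:
        return "CC_REF"
    return "CC_MIX"

def closeness(adj, node):
    """(n - 1) / sum of distances within the node's own CC; 0.0 for a singleton."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical toy data: (sequence A, sequence B, % identity of the pairwise hit).
labels = {"env1": "ENV", "env2": "ENV", "env3": "ENV", "ref1": "REF", "ref2": "REF"}
edges = [
    ("ref1", "ref2", 98.0),
    ("ref1", "env1", 91.0),
    ("env1", "env2", 96.0),
    ("env2", "env3", 88.0),
]

for threshold in (85, 95):
    adj = threshold_graph(edges, threshold)
    kinds = sorted(classify_component(c, labels) for c in connected_components(adj))
    values = {node: round(closeness(adj, node), 2) for node in sorted(adj)}
    print(threshold, kinds, values)
```

On this toy network, raising the threshold from 85% to 95% splits the single mixed component into two environmental components and one reference component, which mirrors the fragmentation described for the real network at stricter thresholds.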

78 3. Results

II. Preferential association IV. Shortest-path distance (Figure (Assortativity, Figure 1 and 1): Shortest paths describe the Supplementary Material 1): minimal number of edges Assortativity quantifies whether between any pairs of nodes in a nodes that belong to the same network. We used these metrics category are more connected to quantify the distance between with each other rather than with ENV and REF nodes in the nodes from other categories. For graph. By definition, increasingly example, a significant preferential divergent ENV sequences will be association between ENV nodes located increasingly far from REF would indicate the existence of sequences. If ENV and REF groups of similar environmental sequences are located in distinct sequences, distinct from CCs, there is no path between sequences from already them, thus the shortest path described Holozoa. distance for such pairs of nodes III. Network comparison (path is infinite. analyses by BRIDES, (Figure 1 and Supplementary Figure 2): All these steps of graph-mining pointed It quantifies the new paths towards evolutionary relevant ENV created in an augmented network sequence candidates, for which when new sequences (e.g. ENV) phylogenetic placement could be finally are added to an original network computed (see Materials and Methods). (with only REF), as in Lord et al., 2016. In particular, this allows the evaluation of whether newly The structure of the unicellular added sequences fill in some Holozoa network shows potential gaps between the original new diversity sequences (B and S paths indicating that added sequences The general structure of the network are intermediate diversity with provided an overview of the unicellular respect to known sequences) or Holozoa diversity and the potential new fail to do so (the I path indicating diversity (Figure 2). that added sequences do not First, we computed the closeness of all present such an intermediate nodes (Figure 1, Figure 3 and diversity). 
Supplementary Material 1) to test

79 3. Results

Figure 2. Unicellular Holozoa network at ≥85% similarity threshold. Environmental nodes from Tara Oceans are depicted with triangles that are coloured according to the distance to their shortest reference sequence (right panel). Reference nodes from GenBank dataset are depicted with circles that are coloured according to the taxaonomy (left panel). Connected Components composed of only reference nodes are located in the top right corner. The novel Holozoa group described in this paper, MASHOL (for MArine Small HOLozoa), is shown in red triangles and pointed in the network with a black circle. Raw network data can be found in Supplementary Material 4.

whether the distribution of closeness peripheral than REF nodes (Wilcoxon values for REF nodes was (i) signed-rank test, p-value<0.01**) significantly different and (ii) significantly (Figure 3A) in all networks. This result higher than the distribution of closeness indicates a high amount of potential new values for ENV nodes, using Wilcoxon diversity in our unicellular Holozoa signed-rank test. The results showed dataset from Tara Oceans. Not only the that ENV nodes were significantly more closeness distributions for REF nodes


were significantly higher than those for ENV nodes, but also their shapes were different. At ≥85, ≥87 and ≥90% similarity thresholds, most closeness values of both ENV and REF distributions were low (95% confidence interval between 0.2 and 0.4, approximately), and only a few nodes presented a closeness value of 1. On the other hand, at ≥95 and ≥97% thresholds, when the network was more disconnected, the distributions of closeness values for ENV nodes were scattered along a wider range of higher closeness values (~0.2-1). This change reflected the fragmentation of the network into more, but smaller, CCs.

Next, we analysed the preferential connection between ENV nodes, which showed greater similarity between ENV sequences than between ENV and REF sequences. For every network, we computed (i) a distribution of null assortativity values by randomly shuffling the ENV and REF node labels and contrasted these values with (ii) the assortativity values of all our real networks (see Materials and Methods). All networks were significantly assortative (one sample t-test, p-value<0.01**) (Figure 3B). This tendency for intra-group preferential linkage suggests a lack of representation of oceanic Holozoa in the reference dataset before the Tara Oceans expedition, and thus stresses the high level of potential new diversity in Tara Oceans.

Moreover, we checked that these results were not trivially explained by the unequal amount of ENV sequences (2,197) compared to REF sequences (229) in our initial dataset. To do so, we repeated the same analysis using networks constructed from the same number of ENV and REF nodes (see the “Control test” section in Materials and Methods and Supplementary Table). Regarding closeness, at ≥90%, ≥95%, and ≥97% similarity thresholds we obtained the same results as for the real networks: the distributions of closeness values for REF nodes were significantly higher than those for ENV nodes. At lower


Figure 3. Network approach to the analysis of novel diversity of unicellular Holozoa. (A) Closeness distribution of reference nodes was significantly higher than that of environmental nodes. This showed that environmental nodes were located at the periphery of the connected components because they were more divergent. Two asterisks mark the significance of the Wilcoxon signed-rank test when p-value<0.01. (B) Assortativity values were significantly positive in all networks, meaning that environmental nodes tended to connect preferentially together rather than with reference nodes. (C) BRIDES analysis. Environmental OTUs from unicellular Holozoa created new paths with respect to the original reference network, as green bars show (see Supplementary Figure 2 for details about each type of path). (D) New molecular groups in Choanoflagellata. Phylogenetic placement of the OTUs that created breakthroughs and shortcuts at ≥85% similarity threshold in (C; in red) against a curated reference tree of unicellular Holozoa. We computed the placement using the RAxML-EPA algorithm with the GTR+CAT+I evolutionary model (Berger et al., 2011). Several OTUs branched off some acanthoecid clades, such as Choanoflagellate I, G and H, showing a different diversity from the extant known species. This novel molecular diversity is well supported by the high abundance of some OTUs (shown as the number in the brackets) and the good quality of their placement (Supplementary Figure 3A,B). Alignments and the full phylogenetic tree can be found in Supplementary Material 2.
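The assortativity test summarised in panel (B) can be sketched as follows. This is a minimal illustration, assuming Newman's coefficient for a categorical node label on an undirected graph, written with the standard library instead of the NetworkX routines used in the study. The toy graph, the function names and the empirical comparison against the shuffled values (in place of the one sample t-test) are assumptions made for the example.

```python
import random
from collections import Counter

def attribute_assortativity(edges, labels):
    """Newman's assortativity coefficient for a categorical label (undirected graph)."""
    mixing = Counter()
    for u, v in edges:
        # Count each undirected edge in both directions (symmetric mixing matrix).
        mixing[(labels[u], labels[v])] += 1
        mixing[(labels[v], labels[u])] += 1
    total = sum(mixing.values())
    categories = set(labels.values())
    trace = sum(mixing[(c, c)] for c in categories) / total
    # Fraction of edge ends attached to each category.
    ends = {c: sum(mixing[(c, d)] for d in categories) / total for c in categories}
    expected = sum(share ** 2 for share in ends.values())
    if expected == 1:  # every edge end carries the same label: coefficient undefined
        return float("nan")
    return (trace - expected) / (1 - expected)

def null_assortativity(edges, labels, n_shuffles=100, seed=0):
    """Assortativity after shuffling node labels while keeping the topology fixed."""
    rng = random.Random(seed)
    nodes, values = list(labels), list(labels.values())
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        null.append(attribute_assortativity(edges, dict(zip(nodes, values))))
    return null

# Hypothetical toy graph: an ENV ring and a REF ring joined by a single edge.
labels = {"e1": "ENV", "e2": "ENV", "e3": "ENV", "e4": "ENV",
          "r1": "REF", "r2": "REF", "r3": "REF", "r4": "REF"}
edges = [("e1", "e2"), ("e2", "e3"), ("e3", "e4"), ("e4", "e1"),
         ("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r4", "r1"),
         ("e1", "r1")]

observed = attribute_assortativity(edges, labels)  # 7/9 for this toy graph
null = null_assortativity(edges, labels)
share_as_high = sum(value >= observed for value in null) / len(null)
print(observed, share_as_high)
```

A positive observed value that sits at the upper extreme of the shuffled distribution corresponds to the intra-group preferential linkage reported for ENV and REF nodes.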

thresholds, however, these differences were usually non-significant, but the closeness values of REF nodes were never lower than those of ENV nodes in the evenly sampled networks. Assortativity values were also positive and significantly different for all control networks, as in the actual networks.

Overall, these metrics (closeness and assortativity) indicated that our environmental dataset of unicellular holozoans from Tara Oceans expanded the currently known diversity of this group. Moreover, we showed that these results were not an artefact of the unequal number of ENV sequences compared to REF. We then mined this molecular diversity to uncover evolutionarily relevant Holozoa groups.

Identification of a potential novel Holozoa group. New molecular diversity found in Acanthoecida (Choanoflagellatea)

To identify new groups of interest, we first performed network comparisons using BRIDES software (Figure 1, Supplementary Figure 2) (see Materials and Methods and Lord et al., 2016). This allowed us to contrast the topologies of networks built exclusively from REF nodes (original networks) with those in which ENV nodes had been included (augmented networks).

BRIDES analysis showed that ENV sequences of unicellular Holozoa from Tara Oceans created numerous new paths in the augmented similarity networks (Figure 3C), guiding the discovery of evolutionarily relevant novel


sequences. First, despite the enhanced molecular diversity provided by the Tara Oceans dataset, some REF nodes remained disconnected, indicating that the diversity of most ENV sequences was not intermediate with respect to some REF nodes. This was especially noticeable for networks built at high similarity thresholds. At ≥97% ID, the vast majority of paths were impasses (I), meaning that ENV sequences did not create bridges between REF sequences in the augmented network (Supplementary Figure 2). This is logical because, given the high level of stringency, only sequences from the most closely related holozoan lineages would connect in the CC, confirming the generally divergent nature of ENV sequences. Interestingly, when lowering the similarity threshold required to connect sequences in the networks, the proportion of impasses decreased, showing that some of these divergent ENV sequences started to connect some REF sequences. Still, at ≥85%, some Holozoa REF sequences remained disconnected, indicating that the Tara Oceans dataset did not provide evidence for intermediate groups for all known Holozoa (i.e., in terms of diversity, there remained persistent gaps within the Holozoa tree). Possible explanations for this large number of impasses are: (i) a lack of sufficient sampling effort, (ii) the presence of intermediate ENV sequences in other habitats but not in the marine water column, or (iii) the nature of the Holozoa clade, which may be comprised of some significantly divergent lineages without intermediate diversity between them.

On the other hand, breakthroughs (B) and shortcuts (S) were increasingly observed in networks at lower thresholds (Figure 3C). These two types of paths correspond to sequences that introduce either new connections in the known diversity (B) or new intermediate sequences within known groups (S). Thus, B paths indicated which ENV sequences could possibly branch in between two groups in a phylogenetic tree, whereas S paths indicated which ENV sequences could possibly branch within a group (Supplementary Figure 2). Overall, the presence of a high proportion of B and S paths (≥85% = 36.93%, ≥87% = 33.22%, ≥90% = 45.42%) suggested that Tara Oceans data hinted at the existence of oceanic clades that could help to better resolve the Holozoa phylogeny.

We corroborated this putative new diversity by performing a phylogenetic placement analysis (see Materials and Methods). We selected the OTUs that created breakthroughs and shortcuts in


the network at 85% similarity threshold (Figure 3D). These OTUs unravelled novelty within Acanthoecida, one of the two subgroups of Choanoflagellatea. A group of 6 sequences (with a total of 1,675 reads) branched off Choanoflagellate H, suggesting a potential novel environmental group of acanthoecids. Another group of 3 sequences (including one of the most abundant OTUs in the whole Tara Oceans dataset: OTU 2703, with more than 28,000 reads) appeared to be the sister group of Choanoflagellate G. The importance of this result lies in the fact that these OTUs do not cluster together with the already morphologically described Choanoflagellate G species (i.e., Acanthocorbis unguiculata, Acanthoeca spectabilis, Savillea micropora, Helgoeca nana), but branch at an internal node, showing the divergent nature of these OTUs. We also recovered the second earliest diverging acanthoecid (OTU 5953, with 7,448 reads), splitting differently from the reference sequence JQ223245, which was already identified as a divergent choanoflagellate (del Campo et al., 2015). Finally, several OTUs were clustered in freshwater environmental choanoflagellate groups, such as FRESCHO3 or FRESCHO1, which suggests a wider range of ecosystems that these species can inhabit. We confirmed the good quality of these placements by gauging the likelihood and distance between placements (Supplementary Figure 3A,B). Alignments and the full phylogenetic tree of Figure 3D can be found in Supplementary Material 2.

Our second approach to examine in detail the novelty in unicellular Holozoa was to perform a shortest-path distance analysis between every ENV node and its closest REF node (Figure 1). The longer the distance, the more divergent the ENV sequence is, because many steps are required to reach the nearest REF sequence. The most extreme case is the infinite distance, shown by ENV nodes belonging to exclusively environmental CCs. Our results showed that indirect connections to REF (when there is more than 1 step from ENV to REF) were the most abundant, ranging from 92.5% of all ENV nodes in the network at ≥85% similarity to 69.83% at ≥97% (Figure 4A). In addition, networks at higher similarity thresholds (≥95% and ≥97%) exhibited a high proportion of infinite distances (15.39% of ENV nodes at ≥95% similarity threshold; 30.56% at ≥97% similarity threshold) (Figure 4A).

We then extracted those OTUs to perform phylogenetic placement against a curated reference Holozoa tree (see


Figure 4. Potential new group of unicellular Holozoa (MASHOL) found branching off Choanoflagellatea. (A) Shortest path analysis showed that a considerable proportion of environmental nodes have infinite distance to their closest reference node (15.39% in the network at ≥95% similarity threshold; 30.56% in the network at ≥97%). These ENV nodes were not connected to any reference node whatsoever, suggesting a substantial amount of divergent diversity. (B) Phylogenetic placement of the 21 OTUs that exhibited infinite distance in the networks at ≥95% and ≥97% similarity thresholds in (A). All OTUs were allocated in internal branches, outside Choanoflagellatea and Syssomonas multiformis, depicted as a thick magenta line. The lack of high support (measured as Likelihood Weight Ratio or LWR) in the placements suggests a deep uncertainty about the exact placement of these sequences in the Holozoa tree of life (Supplementary Figure 3D). However, their narrow scattering over the tree and their clear position in internal rather than external branches open up the possibility for these OTUs to be a potential new Holozoa group that we tentatively named MASHOL (for MArine Small HOLozoa). Phylogenetic placement was carried out using the RAxML-EPA algorithm (Berger et al., 2011) under the GTR+CAT+I evolutionary model. Alignments and the full phylogenetic tree can be found in Supplementary Material 3.
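The shortest-path measure in panel (A) can be illustrated with a small sketch: a breadth-first search started from all REF nodes at once gives, for each ENV node, the number of edges to its closest REF node, and ENV nodes in exclusively environmental CCs keep an infinite distance. The code uses only the standard library, not the NetworkX calls used in the study, and the graph, node names and function name are hypothetical.

```python
import math
from collections import deque

def distance_to_closest_ref(adjacency, kinds):
    """Edge distance from every ENV node to its nearest REF node (math.inf if none)."""
    # Multi-source breadth-first search: all REF nodes start at distance 0.
    distance = {node: 0 for node, kind in kinds.items() if kind == "REF"}
    queue = deque(distance)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return {node: distance.get(node, math.inf)
            for node, kind in kinds.items() if kind == "ENV"}

# Hypothetical toy network: one mixed CC and one exclusively environmental CC.
kinds = {"ref1": "REF", "env1": "ENV", "env2": "ENV", "env3": "ENV", "env4": "ENV"}
adjacency = {"ref1": {"env1"}, "env1": {"ref1", "env2"}, "env2": {"env1"},
             "env3": {"env4"}, "env4": {"env3"}}

distances = distance_to_closest_ref(adjacency, kinds)
indirect = [n for n, d in distances.items() if 1 < d < math.inf]
infinite = [n for n, d in distances.items() if d == math.inf]
print(distances, indirect, infinite)
```

Here `env2` is an indirect connection (two steps from `ref1`), while `env3` and `env4` have infinite distance, the category from which the MASHOL candidates were drawn.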

Materials and Methods). The deepest novelty (understood as the diversity that lies in internal nodes in the tree) was observed in the networks at ≥95% and ≥97% thresholds. We performed a phylogenetic placement of this deep novelty, shown in Figure 4B. A group of 21 OTUs with a total abundance of 6,244 reads was located in the most internal branch outside Choanoflagellata, specifically scattered across the internal branches of choanoflagellates and Syssomonas multiformis. These OTUs were mainly recovered in the pico (0.8-3/5 µm) and nano (3/5-20 µm) size fractions from the Indian Ocean and Mediterranean Sea. Inspired by its uncertain phylogenetic position and small size, we tentatively named this group MASHOL (standing for MArine Small HOLozoa). The quality of the placement test revealed that the placements had very low Likelihood Weight Ratio (Supplementary Figure 3D), although all of them were located around the same internal branches in the tree. As Mahé et al., 2017 pointed out, these low-probability placements are not necessarily incorrect, but they are at a high molecular distance from the reference sequences in the tree. This result indicates that these OTUs do not really belong to any of the already known unicellular holozoan lineages, although their exact position is deeply uncertain.

Unicellular holozoans are globally distributed, with some specific lineages showing specific geographical patterns

We next evaluated the geographical distribution of unicellular Holozoa across oceans, layers of the water column, and sizes.
In general, all lineages of unicellular Holozoa were widely distributed across the world’s oceans (Figure 5A). Ichthyosporeans were the most homogeneously dispersed group across all oceans. There were, however, some exceptions. Within Choanoflagellata, for example, Acanthoecida OTUs were more abundant in the Arctic samples (60.29% of total abundance) compared to Craspedida (4.5%) (Figure 5A). These results are consistent with previous morphological studies of choanoflagellates in sea ice (Thomsen et al., 1997), although these choanoflagellates were more extensively found in the Antarctic rather than in the Arctic oceans, as we observed. OTUs assigned to Filasterea were widely distributed, but their abundance was higher in the samples coming from the South Pacific Ocean (43.37%), Red Sea (24.7%) and Indian Ocean (16.97%) (Figure 5A). OTUs related to

Figure 5. Geographical distribution of unicellular Holozoa OTUs from the Tara Oceans expedition. As depicted in the example (bottom left panel), chord diagrams show OTUs on the bottom half of the circle, and oceanic regions, depths, and fraction sizes on the upper half. Each OTU is represented by a line, whose thickness depicts the OTU’s abundance in that particular place. In general, all unicellular holozoans were widespread and located in surface or DCM layers of the water column. However, some had different preferential geographical locations (e.g., MAOP1 vs MAOP2, or Craspedida vs Acanthoecida), or fraction sizes (e.g., Ichthyophonida vs Dermocystida, or Craspedida vs Acanthoecida). Note that the thickness of each OTU is relative to the number of OTUs in each group, so comparisons between lineages are not possible. Numbers below group names indicate the number of OTUs.


Corallochytrea group were widely distributed, although the OTU with the highest abundance (OTU 30781, 248 reads) was mainly located in the North Pacific Ocean (Figure 5A). Both the Indian Ocean and the Arctic Ocean held 30% of the reads of corallochytreans (Figure 5A). On the contrary, the presence of corallochytreans in the Atlantic Ocean seemed to be insignificant. Regarding the environmental groups of marine opisthokonts 1 and 2 (MAOP1 and MAOP2, respectively), they showed a pattern of distribution similar to Choanoflagellata. MAOP2 appeared to be more abundant and to have more OTUs than MAOP1, in contrast to what had been found in coastal European waters (del Campo et al., 2015). Moreover, while MAOP1 was not found in the Arctic or the Antarctic Oceans, MAOP2 exhibited 36% of its abundance in the Arctic, expanding to the maximum the range of geographical locations in which this environmental group has been found up to now (Figure 5A) (Romari and Vaulot, 2004; Amacher et al., 2009; Edgcomb et al., 2011; Marshall and Berbee, 2011). Assortativity coefficients of geographical distribution across oceans and oceanic provinces showed positive values in all networks (Supplementary Table). Even though these values were not very high (ranging from 0.016 in the network at ≥85% similarity threshold to 0.046 in the network at ≥97%), they show a tendency of OTUs from the same geographical region to be more associated with each other, hence genetically more similar, than with OTUs from other regions.

Regarding the depth in the water column, the majority of the unicellular holozoans were preferentially located in the surface or Deep Chlorophyll Maximum (DCM) layers (Figure 5B). This tendency to be present in the upper layers of the water column was supported by the positive assortativity coefficient (Supplementary Table). Even though these are low positive numbers, they were significantly different from the random shuffled distribution (one sample t-test, p-value<0.01**), which supported the preference for shallower locations.

Finally, unicellular holozoans were recovered from a wide range of sizes (Figure 5C). For example, within Choanoflagellata, the majority of Acanthoecida abundance (69.37%) was present in the nano fraction (3/5-20 µm), followed by 19.4% in the pico fraction (0.8-3/5 µm). Filasterean reads were mainly found in meso (43.18%) and nano (46.21%) fractions. Ichthyosporeans had a different pattern of sizes according to


subgroup (Figure 5C). The distribution of Dermocystida reads was shifted towards the largest fractions (10.96%, 19.98% and 57.73% in meso, micro and nano fractions, respectively). On the contrary, the distribution of Ichthyophonida reads was shifted towards the smallest fractions (24.46% in nano and 61.97% in pico fractions). OTUs associated with Corallochytrea were preferentially found in the pico, nano and pico-nano fractions (0.8-20 µm). Finally, both MAOP groups were more present in the smallest fractions: nano (54.94%) and pico (37.81%), which differs from previous findings that showed MAOP dominating the micro fraction (del Campo and Ruiz-Trillo, 2013). Yet, these results are consistent with these authors, who already suggested that the MAOP group might be composed of species with different sizes. The group might also undergo a life cycle with several stages that include different cell sizes. The preferential location of different lineages in different size fractions can be seen in the assortativity values (Supplementary Table). In all networks, assortativity coefficients of fraction sizes were the highest among all elements considered (depths, oceanic provinces, oceans and size). These values were also significant compared to the distribution of random shuffled labels (one sample t-test, p-value<0.01**), indicating a tendency for unicellular Holozoa lineages to be retrieved from specific size fractions.

Co-occurrence of the ichthyosporean Creolimax fragrantissima and its putative animal hosts, some of them detected for the first time

Some of these unicellular species, especially the Ichthyosporea, have been previously described as animal parasites or symbionts (Mendoza et al., 2002; Glockling et al., 2013). To see whether our data could shed light on this aspect, we checked if there was any association between the presence of unicellular Holozoa and animals.

Our results showed that there were indeed significant positive and negative correlations between unicellular Holozoa and animals (Figure 6A). The strongest correlation (Spearman’s rank correlation coefficient, ρS=0.6-0.8, p<0.01**) was shown between OTUs associated with Creolimax fragrantissima and several animal phyla: Entoprocta (Barentsiidae), Mollusca (Polyplacophora), Tardigrada, and Porifera (Homoscleromorpha, Calcarea and Demospongiae). To see if we could detect associations other than monotonic and linear ones (as Spearman and


Figure 6. Co-occurrence analysis between unicellular Holozoa OTUs and animal classes from Tara Oceans. (A) Heatmap representing the Spearman’s rank correlation coefficient (ρ). The ichthyosporean symbiont Creolimax fragrantissima had the strongest correlation coefficient (ρS=0.6-0.8, p<0.01**) with several animal phyla, suggesting a wider diversity of animal hosts in which this organism can dwell. Full heatmap can be found in Supplementary Figure 4A. (B) Network depicting other possible associations, besides monotonic and linear. The environmental clades marine ichthyosporeans 1 (MAIP1) and marine opisthokonts 2 (MAOP2) were connected with several animal phyla, suggesting non-exclusive free-living lifestyles, or coincidence due to the use of same ecological resources. Full network can be found in Supplementary Figure 4B.
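The rank correlation behind panel (A) can be sketched in a few lines. This is a minimal standard-library illustration of Spearman's coefficient (Pearson correlation computed on average ranks) together with the prevalence filter of at least 3 samples; it is not the "Hmisc" and "corrplot" workflow used in the study, it omits the significance test, and the abundance vectors and function names are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def present_in_enough_samples(abundances, min_samples=3):
    """Keep only taxa detected in at least `min_samples` samples."""
    return sum(1 for a in abundances if a > 0) >= min_samples

# Hypothetical read counts of one holozoan and one animal class across six samples.
holozoan = [0, 3, 10, 25, 120, 400]
animal = [1, 2, 8, 30, 90, 95]

if present_in_enough_samples(holozoan) and present_in_enough_samples(animal):
    print(spearman(holozoan, animal))
```

Because the two toy vectors increase together sample by sample, the coefficient is 1 even though the relationship is far from linear, which is why a rank-based measure suits read-abundance data.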

Pearson describe, respectively), we used a bipartite network (Figure 6B). We corroborated the previous finding of Creolimax fragrantissima with several animal phyla, specifically with Polyplacophora (ρS=0.465), Calcarea (ρS=0.352) and Demospongiae (ρS=0.311). C. fragrantissima was isolated 27 times from guts, mostly from a sipunculid species, but also one tunicate, sea cucumber and chiton (Marshall et al., 2008). Thus, our results corroborated some symbiotic relationships (with Polyplacophora, commonly known as chiton) and suggested some other putative hosts (Entoprocta, Tardigrada, and Porifera).

We also found that the environmental group marine ichthyosporeans 1 (MAIP1) was connected to Acoelomorpha, Arthropoda (Hexapoda, Crustacea), , , Nematoda (Enoplea) and Chordata (Tunicata, Craniata). This result suggests that the environmental group MAIP1 may be associated with animal phyla and may not be exclusively free-living. Another interesting result was the interaction between MAOP2 and (ρS=0.409) or Mollusca (Cephalopoda) (ρS=0.317), which could imply that these taxa use the same resources or have some ecological interaction, as was found for other environmental groups (Lima-Mendez et al., 2015; Lambert et al., 2018).

Overall, these results suggest more complex ecological interactions between parasitic/symbiotic unicellular holozoans and animals. These biotic effects (grazing, pathogenicity, and parasitism) have been reported to explain 82% of the variability in the Tara Oceans interactome, giving a greater importance to these interspecific connections (Lima-Mendez et al., 2015). However, we do not claim that correlation implies causation. What is certain though is that metabarcoding has great power to assess diversity in its multiple forms, from pure ecological and evolutionary studies to applied conservationism, which is of vital importance in a world of threat to biodiversity.

MATERIALS & METHODS

Datasets

The initial environmental dataset was provided by the Tara Oceans consortium, which contained a total of 474,303 Operational Taxonomic Units (OTUs) from all eukaryotic clades. The reference database was obtained by merging three different databases: GenBank, PR2-Opistho and PR2_V9. First, we downloaded two databases from GenBank: nucleotide (nt) and environmental nucleotide (env_nt), as of January 25th 2018. We retrieved 18S rDNA sequences from these databases by searching them using the human 18S sequence as a query (AC139250, positions 551,257 to 553,055). This sequence had been previously confirmed to contain the Tara Oceans V9 primer sequences. BLASTn parameters were: e-value <1E-10, percentage of identity ≥60% and maximum target sequences of 9.9·10⁷ (for nt) and 9.9·10⁸ (for env_nt). From the BLASTn output, we implemented two filtering processes. In the first one, we retrieved the sequences that contained both Tara Oceans V9 primer sequences. We then trimmed the sequences to keep only the V9 region. In the second step, we kept those sequences whose length was between 80 and 120 base pairs to keep the most frequent length range of this region (Amaral-Zettler et al., 2009).

The second database, PR2-Opistho, was a well-curated and updated version of the original PR2 database for the Opisthokonta clade. This database (PR2-Opistho) was also trimmed with the Tara Oceans primer sequences to keep only the V9 region.


The third database, PR2_V9, was generated by the Tara Oceans consortium (de Vargas et al., 2015). Because both PR2-Opistho and PR2_V9 were originally generated from the PR2 database, we eliminated redundancies and kept the taxonomical annotation from the PR2-Opistho database. Finally, we combined all databases, producing a final reference database of 49,379 eukaryotic sequences.

To retrieve the unicellular Holozoa sequences, we performed a phylogenetic placement of both environmental and reference datasets against a eukaryotic reference tree, and took those that branched within Holozoa and outside the animals. A phylogenetic placement consists of mapping short amplicons (in this case, Tara Oceans OTUs) into a fixed reference tree made from full-length 18S rDNA sequences. This reference was constructed using 130 full 18S sequences that covered all eukaryotic groups. We performed the phylogenetic placement using the RAxML-EPA algorithm (Berger et al., 2011), and we selected the sequences that were placed into unicellular Holozoa using the C++ script extract_clade_placements from Genesis software v0.18.1 (Czech and Stamatakis, 2016). Therefore, the starting dataset of unicellular Holozoa contained 2,426 sequences (2,197 were environmental from Tara Oceans while 229 were reference sequences). This dataset can be found in Supplementary Material 4.

Network construction

We built the initial similarity network based on an all-against-all BLAST of the unicellular Holozoa dataset. We used BLASTn v2.7.1+ (Camacho et al., 2009), with the following options: e-value <1E-10, percentage of identity ≥85%, maximum number of HSPs 1 and maximum target sequences 3,000. We used the cleanblastp script from CompositeSearch software to filter the output in order to remove auto-loops and reciprocal connections (A-B would be the same as B-A) (Pathmanathan et al., 2018). Final networks were obtained by setting up a mutual cover threshold of ≥95% and increasing sequence similarity thresholds: ≥85%, ≥87%, ≥90%, ≥95%, and ≥97%. These networks can be found in Supplementary Material 4.

Network node annotation

To annotate taxonomically every node in the network, we performed a BLAST of the initial 2,426 holozoan sequences against the PR2-Opistho database, using the following parameters: e-value <1E-50 and ≥97% percentage of identity. Under these conditions, only 438 sequences


could be annotated. Thus, we decided to use a phylogenetic method to taxonomically assign the rest of the unannotated OTUs: the tax2tree algorithm (McDonald et al., 2012). This software requires the structure of the phylogenetic tree of both reference and unannotated sequences. Then, it assigns the taxonomy to the unannotated tips, given a file with the taxonomical information of the annotated tips. We could successfully annotate 1,503 additional sequences. Thus, a total of 1,941 sequences (78.8% of the initial dataset) could be taxonomically annotated.

Network analysis

To address the molecular diversity and novelty of unicellular Holozoa, we analysed all network metrics using the NetworkX v2.1 library on Python 3.5.1 (Hagberg et al., 2008).

Novelty assessment: preferential connection

Assortativity is a property of the network that measures the preferential connection between nodes belonging to the same group (Newman, 2003; Forster et al., 2015) (Figure 1). To compute its significance, we first calculated a distribution of null assortativity values for each network, because the null value may differ from 0, the value associated with a random distribution of the nodes in the graph (Figure 1 and Supplementary Material 1). Namely, we randomly shuffled the labels of the nodes 100 times while keeping the same network topology. For example, one ENV node (i.e., a node composed of an environmental sequence) could turn out to be ENV or REF (i.e., a node composed of a reference sequence) after the shuffling. For all these 100 random networks, we computed the assortativity, generating the distribution of assortativity values for random networks. We next computed the actual value of assortativity in the networks (Figure 3B and Supplementary Table), for each tested set of categories (ENV vs REF; IND vs MEDIT vs ARCTIC vs ANTAR vs NPAC vs SPAC vs NATL vs SATL vs REDS; SURF vs DCM vs MES vs MIX vs ZZZ; MESO vs MICRO_MESO vs MICRO vs NANO vs PICO_NANO vs PICO_MICRO vs PICO).

Control test

We performed a control test to check whether our results could be explained by the large difference in the amount of ENV sequences (2,197) compared to REF sequences (229). We randomly subsampled 10% of the original ENV sequences 100 times and combined them with the same REF dataset and


then, performed all the analyses in the same way as for the real networks.

Novelty assessment: BRIDES

BRIDES software characterizes new paths that are created when extra nodes are added to an original network (Lord et al., 2016). For every sequence similarity network, we first kept only the REF nodes (original network) and then, we added the ENV nodes of unicellular Holozoa (augmented networks) to compute BRIDES using the default parameters.

Novelty assessment: phylogenetic placement

To validate the putative novel diversity previously obtained with BRIDES and shortest-path analysis, we performed a phylogenetic placement of the OTUs into our curated reference Holozoa tree, which can be found in Supplementary Material 5. We aligned the sequences using PaPaRa with default parameters (Berger and Stamatakis, 2011) and manually examined the alignment and corrected wrong positions in Geneious v9.0.5 (Kearse et al., 2012). We then trimmed the non-homologous positions with trimAl 1.4.rev15, setting the gap threshold option at 0.2 for the alignment coming from BRIDES analysis (Capella-Gutiérrez et al., 2009). Regarding the alignment from the shortest-path analysis, the trimming was done manually, removing those positions with a mean pairwise identity over all pairs below 30%. We performed the phylogenetic placement using the RAxML-EPA algorithm (Berger et al., 2011). The final tree of Figure 4B was enhanced using iTOL (Letunic and Bork, 2016).

We validated the quality of the phylogenetic placement using the placement_histograms script from Genesis package v0.18.1 (Czech and Stamatakis, 2016). The first parameter computed was the EDPL (Expected Distance between Placement Locations). For every OTU, it calculates the weighted distance between all placement positions. In other words, EDPL quantifies the extent to which all placements from an OTU are scattered over the tree. In both groups, EDPL values were extremely small (<0.05) (Supplementary Figure 3A,C). Considering that most branches in the tree had less than 0.05 nucleotide substitutions per site, this meant that the majority of the OTUs were located within the same branch. However, the quality of these placements was not high, measured as the distribution and frequency of Likelihood Weight Ratio values (LWR). This was especially drastic in the placements of MASHOL OTUs (Supplementary Figure 3D),


which shows the uncertainty in the location of the group.

Geographical distribution

We described the geographical distribution of unicellular Holozoa lineages, as well as the distribution along the water column and size fractions, through circular layouts using the “circlize” package in RStudio (Gu et al., 2014; RStudio, 2017).

Co-occurrence patterns

To test the association between unicellular Holozoa and animal OTUs, we carried out a co-occurrence analysis. First, we filtered the dataset to keep those OTUs that were present in at least 3 samples (out of 1,086 total samples in Tara Oceans). Then, we summed up OTU abundances if these OTUs belonged to the same class in animals or the same genus/species in unicellular Holozoa. We used the “corrplot” and “Hmisc” libraries in RStudio v.1.1.383 to perform the analyses (RStudio, 2017; Wei et al., 2017; Harrell, 2019). These consist of building a correlation matrix among all pairwise comparisons and then extracting the significant relationships (Spearman’s significance<0.01**), which finally were plotted in a heatmap.

There was a possibility, however, that some associations could be neither monotonic nor linear. In that case, we would not be able to detect them using Spearman’s or Pearson’s correlation coefficients. We used instead the MICtools package (Albanese et al., 2018), which is able to identify a wider range of relationships in large datasets and assess their statistical significance. Final networks were created using Cytoscape 3.3.0 (Shannon et al., 2003).

SUPPLEMENTARY MATERIAL

The Supplementary Material of this article is available at FigShare: https://doi.org/10.6084/m9.figshare.8020427.v2

AUTHOR CONTRIBUTIONS

ASA, IRT, RL and EB designed the study. CDV provided the raw data. ASA and RL performed all analyses. ASA designed the figures and wrote the manuscript. EB and IRT supervised the project and reviewed the manuscript.

FUNDING

This work was supported by a European Research Council Consolidator Grant (ERC-2012-Co-616960), and grants (BFU2014-57779-P and BFU2017-90114-P) from Ministerio de Economía y Competitividad (MINECO), Agencia Estatal de Investigación (AEI), and Fondo Europeo de Desarrollo Regional (FEDER) to IRT. EB was funded by the European Research Council (FP7/2007-2013 Grant Agreement #615274).

CONFLICT OF INTEREST

None declared

REFERENCES

Albanese, D., Riccadonna, S., Donati, C., and Franceschi, P. (2018) A practical tool for maximal information coefficient analysis. Gigascience 7: 1–8.
Amacher, J., Neuer, S., Anderson, I., and Massana, R. (2009) Molecular approach to determine contributions of the protist community to particle flux. Deep Sea Res. Part I 56: 2206–2215.
Amaral-Zettler, L.A., McCliment, E.A., Ducklow, H.W., and Huse, S.M. (2009) A Method for Studying Protistan Diversity Using Massively Parallel Sequencing of V9 Hypervariable Regions of Small-Subunit Ribosomal RNA Genes. PLoS One 4: e6372.
Arroyo, A.S., López-Escardó, D., Kim, E., Ruiz-Trillo, I., and Najle, S.R. (2018) Novel Diversity of Deeply Branching Holomycota and Unicellular Holozoans Revealed by Metabarcoding in Middle Paraná River, Argentina. Front. Ecol. Evol. 6: 99.
Berger, S.A., Krompass, D., and Stamatakis, A. (2011) Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Syst. Biol. 60: 291–302.
Berger, S.A. and Stamatakis, A. (2011) Aligning short reads to reference alignments and trees. Bioinformatics 27: 2068–2075.
Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., and Madden, T.L. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421.
de Vargas, C., Stephane, A., Nicolas, H., Johan, D., Frederic, M., Ramiro, L., et al. (2015) Eukaryotic plankton diversity in the sunlit ocean. Science 348: 1–12.
del Campo, J., Mallo, D., Massana, R., de Vargas, C., Richards, T.A., and Ruiz-Trillo, I. (2015) Diversity and distribution of unicellular opisthokonts along the European coast analysed using high-throughput sequencing. Environ. Microbiol. n/a-n/a.
del Campo, J. and Ruiz-Trillo, I. (2013) Environmental Survey Meta-analysis Reveals Hidden Diversity among Unicellular Opisthokonts. Mol. Biol. Evol. 30: 802–805.
del Campo, J., Sieracki, M.E., Molestina,

97 3. Results

R., Keeling, P., Massana, R., and Glockling, S.L., Marshall, W.L., and Ruiz-Trillo, I. (2014) The others: our Gleason, F.H. (2013) Phylogenetic biased perspective of eukaryotic interpretations and ecological genomes. Trends Ecol. Evol. 29: potentials of the 252–259. (Ichthyosporea). Fungal Ecol. 6: Capella-Gutiérrez, S., Silla-Martínez, 237–247. J.M., and Gabaldón, T. (2009) Grau-Bové, X., Torruella, G., Donachie, trimAl: a tool for automated S., Suga, H., Leonard, G., Richards, alignment trimming in large-scale T.A., and Ruiz-Trillo, I. (2017) phylogenetic analyses. Dynamics of genomic innovation in Bioinformatics 25: 1972–1973. the unicellular ancestry of animals. Corel, E., Lopez, P., Méheust, R., and eLife 6: e26036. Bapteste, E. (2016) Network- Gu, Z., Gu, L., Eils, R., Schlesner, M., Thinking: Graphs to Analyze and Brors, B. (2014) circlize Microbial Complexity and Evolution. implements and enhances circular Trends Microbiol. 24: 224–237. visualization in R. Bioinformatics 30: Czech, L. and Stamatakis, A. (2016) 2811–2812. Genesis. A Toolkit for Working with Hagberg, A.A., Schult, D.A., and Swart, Phylogenetic Data. P.J. (2008) Exploring network https://github.com/lczech/genesis. structure, dynamics, and function Edgcomb, V., Orsi, W., Bunge, J., Jeon, using NetworkX. In, Varoquaux,G., S., Christen, R., Leslin, C., et al. Vaught,T., and Millman,J. (eds), (2011) Protistan microbial Proceedings of the 7th Python in observatory in the Cariaco Basin, Science Conference (SciPy2008). Caribbean. I. Pyrosequencing vs Pasadena, CA USA, pp. 11–15. Sanger insights into species Harrell, F.E. (2019) Hmisc: Harrell richness. ISME J. 5: 1344–1356. Miscellaneous. Forster, D., Bittner, L., Karkar, S., https://github.com/harrelfe/Hmisc. Dunthorn, M., Romac, S., Audic, S., Hugerth, L.W., Muller, E.E.L., Hu, et al. (2015) Testing ecological Y.O.O., Lebrun, L.A.M., Roume, H., theories with sequence similarity Lundin, D., et al. 
(2014) Systematic networks: marine ciliates exhibit design of 18S rRNA gene primers similar geographic dispersal for determining eukaryotic diversity patterns as multicellular organisms. in microbial consortia. PLoS One 9:. BMC Biol. 13: 1–16. Kearse, M., Moir, R., Wilson, A., Stones-

98 3. Results

Havas, S., Cheung, M., Sturrock, Lima-Mendez, G., Faust, K., Henry, N., S., et al. (2012) Geneious Basic: An Decelle, J., Colin, S., Carcillo, F., et integrated and extendable desktop al. (2015) Determinants of software platform for the community structure in the global organization and analysis of plankton interactome. Science 348: sequence data. Bioinformatics 28: 1262073–1262073. 1647–1649. Logares, R., Audic, S., Bass, D., Bittner, Krabberød, A.K., Bjorbækmo, M.F.M., L., Boutte, C., Christen, R., et al. Shalchian-Tabrizi, K., and Logares, (2014) Patterns of rare and R. (2017) Exploring the oceanic abundant marine microbial microeukaryotic interactome with eukaryotes. Curr. Biol. 24: 813–821. metaomics approaches. Aquat. Lord, E., Le Cam, M., Bapteste, É., Microb. Ecol. 79: 1–12. Méheust, R., Makarenkov, V., and Lambert, S., Tragin, M., Lozano, J.-C., Lapointe, F.J. (2016) BRIDES: A Ghiglione, J.-F., Vaulot, D., Bouget, new fast algorithm and software for F.-Y., and Galand, P.E. (2019) characterizing evolving similarity Rhythmicity of coastal marine networks using breakthroughs, picoeukaryotes, bacteria and roadblocks, impasses, detours, archaea despite irregular equals and shortcuts. PLoS One 11: environmental perturbations. ISME 5–7. J. 13: 388-401. Mahé, F., de Vargas, C., Bass, D., Lang, B.F., O’Kelly, C., Nerad, T., Gray, Czech, L., Stamatakis, A., Lara, E., M.W., and Burger, G. (2002) The et al. (2017) Parasites dominate closest unicellular relatives of hyperdiverse soil protist animals. Curr. Biol. 12: 1773–1778. communities in Neotropical Layeghifard, M., Hwang, D.M., and rainforests. Nat. Ecol. Evol. 1: 0091. Guttman, D.S. (2017) Disentangling Marshall, W.L. and Berbee, M.L. (2011) Interactions in the Microbiome: A Facing Unknowns: Living Cultures Network Perspective. Trends (Pirum gemmata gen. nov., sp. Microbiol. 25: 217–228. nov., and Abeoforma whisleri, gen. Letunic, I. and Bork, P. (2016) Interactive nov., sp. nov.) 
from Invertebrate Tree Of Life (iTOL): An online tool Digestive Tracts Represent an for phylogenetic tree display and Undescribed Clade within the annotation. Bioinformatics 44: 127– Unicellular Opisthokont Lineage 128. Ichthyosporea (Mesomycetozoea).

99 3. Results

Protist 162: 33–57. Gene Families Detection. Mol. Biol. Marshall, W.L., Celio, G., McLaughlin, Evol. 35: 252–255. D.J., and Berbee, M.L. (2008) Pesant, S., Not, F., Picheral, M., Multiple Isolations of a Culturable, Kandels-Lewis, S., Le Bescot, N., Motile Ichthyosporean Gorsky, G., et al. (2015) Open (Mesomycetozoa, Opisthokonta), science resources for the discovery Creolimax fragrantissima n. gen., n. and analysis of Tara Oceans data. sp., from Marine Invertebrate Sci. Data 2: 150023. Digestive Tracts. Protist 159: 415– Pilosof, S., Porter, M.A., Pascual, M., 433. and Kéfi, S. (2017) The multilayer McDonald, D., Price, M.N., Goodrich, J., nature of ecological networks. Nat. Nawrocki, E.P., Desantis, T.Z., Ecol. Evol. 1: 0101. Probst, A., et al. (2012) An Romari, K. and Vaulot, D. (2004) improved Greengenes taxonomy Composition and temporal with explicit ranks for ecological and variability of picoeukaryote evolutionary analyses of bacteria communities at a coastal site of the and archaea. ISME J. 6: 610–618. English Channel from 18S rDNA Mendoza, L., Taylor, J.W., and Ajello, L. sequences. Limnol. Oceanogr. 49: (2002) The Class Mesomycetozoea: 784–798. A Heterogeneous Group of RStudio, T. (2017) Rstudio: Integrated at the Animal- Development for R. Fungal Boundary. Annu. Rev. http://www.rstudio.com. Microbiol. 56: 315–344. Ruiz-Trillo, I., Burger, G., Holland, Newman, M.E.J. (2003) Mixing patterns P.W.H., King, N., Lang, B.F., Roger, in networks. Phys. Rev. E 67: 1–13. A.J., and Gray, M.W. (2007) The Ocaña-Pallarès, E., Najle, S.R., origins of multicellularity: a multi- Scazzocchio, C., and Ruiz-Trillo, I. taxon genome initiative. Trends (2019) Reticulate evolution in Genet. 23: 113–118. eukaryotes: Origin and evolution of Ruiz-Trillo, I., Roger, A.J., Burger, G., the nitrate assimilation pathway. Gray, M.W., and Lang, B.F. (2008) PLOS Genet. 15: e1007986. A Phylogenomic Investigation into Pathmanathan, J.S., Lopez, P., Lapointe, the Origin of Metazoa. Mol. Biol. 
F.-J., and Bapteste, E. (2018) Evol. 25: 664–672. CompositeSearch: A Generalized Shalchian-Tabrizi, K., Minge, M.A., Network Approach for Composite Espelund, M., Orr, R., Ruden, T.,

100 3. Results

Jakobsen, K.S., and Cavalier-Smith, ice (Northeast Water Polynya, G. T. (2008) Multigene Phylogeny of Arch. fur Protistenkd. 148: 77–114. Choanozoa and the Origin of Torruella, G., de Mendoza, A., Grau- Animals. PLoS One 3: e2098. Bové, X., Antó, M., Chaplin, M.A., Shannon, P., Markiel, A., Ozier, O., del Campo, J., et al. (2015) Baliga, N.S., Wang, J.T., Ramage, Phylogenomics Reveals D., et al. (2003) Cytoscape: A Convergent Evolution of Lifestyles Software Environment for Integrated in Close Relatives of Animals and Models of Biomolecular Interaction Fungi. Curr. Biol. 25: 1–7. Networks. Genome Res. 13: 2498– Valverde, S., Piñero, J., Corominas- 2504. Murtra, B., Montoya, J., Joppa, L., Thomsen, H.A., Garrison, D.L., and and Solé, R. (2018) The Kosman, C. (1997) architecture of mutualistic networks Choanoflagellates (Acanthoecidae, as an evolutionary spandrel. Nat. Choanoflagellida) from the Weddell Ecol. Evol. 2: 94–99. sea, Antarctica, taxonomy and Wei, T., Simko, V., Levy, M., Xie, Y., Jin, community structure with particular Y., and Zemla, J. (2017) corrplot: emphasis on the ice biota; with Visualization of a Correlation Matrix. preliminary remarks on https://github.com/taiyun/corrplot. Choanoflagellates from Arctic sea

101 3. Results

SUPPLEMENTARY FIGURES

Supplementary Figure 1. Topological metrics of each network. Connected Components (CCs) containing only environmental nodes outnumber the other CCs because the original database holds far more environmental sequences than reference sequences (2,197 environmental sequences; 230 reference sequences). The number of nodes counts only connected nodes, not singletons, which is why it decreases as the similarity threshold increases.
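The behaviour described in this caption can be illustrated with a minimal sketch. It assumes a hypothetical table of pairwise similarities (the sequence names, similarity values and thresholds below are invented for illustration and are not taken from the thesis) and counts connected nodes and connected components with a small union-find structure; it is not the network software used in the study.

```python
def network_metrics(similarities, threshold):
    """Return (connected_nodes, connected_components) for the network that
    keeps only edges whose similarity is >= threshold. Singletons (sequences
    with no surviving edge) are never added, so they are not counted."""
    parent = {}

    def find(node):
        # Follow parent links to the component root, shortening the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for (a, b), sim in similarities.items():
        if sim >= threshold:
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            parent[find(a)] = find(b)  # merge the two components

    components = {find(node) for node in parent}
    return len(parent), len(components)


# Hypothetical pairwise % similarities between reference (REF) and
# environmental (ENV) sequences.
toy_similarities = {
    ("REF_1", "ENV_1"): 97.0,
    ("ENV_1", "ENV_2"): 92.0,
    ("ENV_2", "ENV_3"): 85.0,
    ("REF_2", "ENV_4"): 99.0,
}

for threshold in (80, 90, 95):
    nodes, ccs = network_metrics(toy_similarities, threshold)
    print(f"threshold {threshold}%: {nodes} connected nodes, {ccs} CCs")
```

In this toy input, raising the threshold removes edges, so sequences drop out as singletons and the number of connected nodes shrinks, which is the pattern the caption describes.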


Supplementary Figure 2. BRIDES paths. An illustration of all BRIDES paths, together with the possible biological interpretation. Blue nodes and edges are generated by the environmental sequences (ENV), which are added to the original network only made from reference sequences (REF), depicted by black nodes and edges. We focused on the paths highlighted with a red box because they were the simplest to interpret from a biological standpoint.


Supplementary Figure 3. Phylogenetic placement validation. (A,C) The Expected Distance between Placement Locations (EDPL) indicates whether the placements of an OTU are scattered over the tree. The smaller the EDPL, the better the placement, because it is confined to a specific area of the tree. (B,D) Bar plots show the three highest Likelihood Weight Ratios (LWR) of each OTU. In (D) the distribution of the placements was left-tailed, showing the uncertainty of the placement.
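To make the EDPL idea concrete, the sketch below uses one common formulation, assumed here rather than taken from the thesis: the sum, over ordered pairs of candidate placements, of the product of their LWRs and the tree distance between them. Some descriptions sum over unordered pairs instead, which only changes the value by a factor of two. The LWRs and distances are invented toy values, and the function is a hypothetical helper, not the implementation of any placement tool.

```python
def edpl(lwrs, distances):
    """Expected distance between placement locations for one OTU.

    lwrs:      likelihood weight ratios of the candidate placements
    distances: square matrix, distances[i][j] = tree distance between
               placement i and placement j (assumed precomputed)
    """
    total = 0.0
    for i, lwr_i in enumerate(lwrs):
        for j, lwr_j in enumerate(lwrs):
            if i != j:
                total += lwr_i * lwr_j * distances[i][j]
    return total


# Toy OTU placed confidently: one dominant LWR, neighbouring branches.
confident = edpl([0.90, 0.05, 0.05],
                 [[0.0, 0.1, 0.1],
                  [0.1, 0.0, 0.2],
                  [0.1, 0.2, 0.0]])

# Toy OTU placed uncertainly: similar LWRs on distant branches.
scattered = edpl([0.40, 0.30, 0.30],
                 [[0.0, 1.5, 2.0],
                  [1.5, 0.0, 2.5],
                  [2.0, 2.5, 0.0]])

print(f"confident placement EDPL: {confident:.3f}")
print(f"scattered placement EDPL: {scattered:.3f}")
```

The scattered toy OTU yields the larger value, matching the reading of the figure: a small EDPL means the probability mass sits in one area of the tree.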


Supplementary Figure 4. Co-occurrence analysis of unicellular Holozoa OTUs and animal classes from Tara Oceans. (A) Significant correlations (Spearman’s significance<0.01**) range from negative values (brown) to positive ones (blue). The “_X” suffix after a taxon name means “unknown”. Unicellular Holozoa are depicted in red. (B) Significant correlations (Maximal Information Coefficient, MICe, between 0.08 and 0.638) displayed among unicellular Holozoa.
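The two core steps of the co-occurrence analysis (keeping OTUs present in at least 3 samples, then computing Spearman correlations between abundance profiles) can be sketched as follows. This is an illustrative pure-Python version, not the R workflow with “Hmisc” and “corrplot” used in the study; the taxon names and read counts are hypothetical, and the significance test that filters correlations at 0.01 is deliberately left out.

```python
def keep_prevalent(table, min_samples=3):
    """Keep taxa with a non-zero abundance in at least min_samples samples."""
    return {taxon: counts for taxon, counts in table.items()
            if sum(1 for c in counts if c > 0) >= min_samples}


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns NaN when either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical read counts across five samples.
toy_table = {
    "holozoan_A": [0, 2, 5, 9, 14],
    "animal_class_B": [1, 3, 4, 10, 20],
    "rare_otu": [0, 0, 7, 0, 0],  # present in one sample only: filtered out
}

kept = keep_prevalent(toy_table)
rho = spearman(kept["holozoan_A"], kept["animal_class_B"])
print(sorted(kept), round(rho, 3))
```

Because Spearman's coefficient works on ranks, it captures any monotonic relationship but misses non-monotonic ones, which is the limitation that motivated the additional MIC-based analysis in panel (B).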


3. Results

3.3. Hidden diversity of Acoelomorpha revealed through metabarcoding

Arroyo AS, López-Escardó D, de Vargas C, Ruiz-Trillo I (2016) Hidden diversity of Acoelomorpha revealed through metabarcoding. Biology Letters 12: 20160674. http://dx.doi.org/10.1098/rsbl.2016.0674


4. DISCUSSION

“Truth is born into this world only with pangs and tribulations, and every fresh truth is received unwillingly. To expect the world to receive a new truth, or even an old truth, without challenging it, is to look for one of those miracles which do not occur.”

· Alfred R. Wallace


The idea of using a DNA sequence as a barcode to identify a single species completely revolutionised the understanding of biodiversity in the following decades (Hebert et al. 2003a). The burst of barcoding and metabarcoding allowed a faster and more global perspective on this issue, unveiling a plethora of unknown biodiversity, especially in microbial lineages (López-García et al. 2001; Jones et al. 2011). The importance of revealing new diversity lies in the fact that knowledge of biodiversity is necessary to fully understand evolution.

Therefore, to address key evolutionary questions, such as the origins of animal multicellularity, we need to study the unicellular lineages closely related to animals. Structured in a taxonomic framework, the biodiversity of unicellular relatives of animals has enabled comparative genomics (Torruella et al. 2015; Grau-Bové et al. 2017; López-Escardó et al. 2017; Richter et al. 2018) and functional studies (Sebé-Pedrós et al. 2015; Parra-Acero et al. 2018; Dudin et al. 2019), giving hints on how the transition might have occurred and what the ancestors looked like. The addition of new taxa can dramatically change the current view of this transition, as was already shown when Capsaspora owczarzaki was incorporated into the comparative genomic analyses to understand animal origins (Ruiz-Trillo et al. 2004; Ruiz-Trillo et al. 2008; Shalchian-Tabrizi et al. 2008; Sebé-Pedrós et al. 2010; Sebé-Pedrós et al. 2011; Suga et al. 2013).

This is the reason why this thesis was focused on the molecular biodiversity in Opisthokonta, seeking unknown diversity within or between clades. By knowing the phylogenetic position of these organisms, as well as any geographical and ecological information, we are able to go one step further, not only to understand the diversity, but also to actually isolate the species, or obtain their genomes or transcriptomes.

In the following sections, I will first discuss the molecular novelty of unicellular Opisthokonta found in both the freshwater dataset from the Paraná River, Argentina, and the global marine Tara Oceans expedition (results 3.1 and 3.2). After explaining the specific novelty in each unicellular lineage, I will present the most remarkable findings of novel groups. This section will finish by explaining how finding new acoelomorph diversity can help us to understand the origins of bilaterian animals (result 3.3). Finally, I will move on to a more technical section, first addressing the drawbacks of metabarcoding, and then putting forward the use of gene similarity networks to better study molecular biodiversity.


4.1. Description of the potential real diversity of unicellular Opisthokonta

Unicellular opisthokonts are key for understanding the evolution and diversification of both animals and fungi (Ruiz-Trillo et al. 2007). Despite the description of several morphospecies during the last decade (Marshall et al. 2008; Hassett et al. 2015; Hehenberger et al. 2017; López-Escardó et al. 2018a), molecular surveys keep showing the incompleteness of the current Opisthokonta tree of life (del Campo & Ruiz-Trillo 2013; del Campo et al. 2015; Richards et al. 2015). From an ecological point of view, unicellular opisthokonts are not the main components of aquatic ecosystems (de Vargas et al. 2015). For example, in our Paraná River sampling, unicellular holozoans represented only 2.15% of all eukaryotes, whereas unicellular Holomycota represented 22.8%. However, they can have an important role in trophic webs, either because of their parasitic condition (Glockling et al. 2013), their role as saprotrophs and decomposers (Grossart et al. 2015), or even as key members in the planktonic food web dynamics (Sukhanova 2001; Richards et al. 2017).

Therefore, the description of the real diversity of Opisthokonta contributes both to understanding key evolutionary processes and to describing ecosystems more accurately.


4.1.1. Choanoflagellatea

Even in this extensively studied group, ~60% of the choanoflagellate diversity recovered from the Paraná River grouped with uncultured GenBank sequences.

The most interesting patterns of molecular novelty were detected in the Acanthoecida clade. While our freshwater sampling in the Paraná River yielded only 10 OTUs with 303 reads in total, the Tara Oceans dataset contained 359 acanthoecid OTUs. This result is in line with the traditional view of Acanthoecida as a typical marine clade, because only two freshwater species have been described to date: Acanthocorbis mongolica (Paul 2012) and Stephanoeca arndti (Nitsche 2014). In the Paraná freshwater sampling, these 10 OTUs were associated with marine environmental sequences, showing that perhaps several acanthoecid species can cope with differences in salinity or that there is a hidden diversity of freshwater species that have not been described. These results are in accordance with the observations of marine-freshwater transitions in Choanoflagellatea (Logares et al. 2009). Moreover, we also located some of the Tara Oceans OTUs outside already described acanthoecid subgroups, such as Choanoflagellate E, G and H.

It is important to note that only 10% of Acanthoecida species have been analyzed using molecular tools (Nitsche et al. 2016). This means that more species have been described at the morphological level than at the genetic level. This could affect our conclusions, because our novelty might not be new in the sense of an unknown morphospecies, but rather a catalogued species whose genetic barcode has not yet been matched to it. Nevertheless, our results are a first step towards bridging the gap between physical features and molecular information by providing the latter. In this particular case, a good approach would be to link the molecular information provided in this thesis with the morphological descriptions, as was successfully done for five Acanthoecida species by Nitsche and collaborators (Nitsche et al. 2016). Overall, our results confirm the extensive geographical distribution of Choanoflagellatea and reveal molecular novelty branching off described groups of Acanthoecida.


4.1.2. Filasterea

Filasterea was an intriguing clade due to its absence in metabarcoding surveys up to now. Even though only four species are formally known to be part of this group, they were isolated from completely different ecological niches (see section 1.5.3. Filasterea). Capsaspora owczarzaki, Pigoraptor chileana, and Pigoraptor vietnamica are freshwater species, whereas Ministeria vibrans is marine. Moreover, while the Pigoraptor species and M. vibrans are free-living, C. owczarzaki was isolated from the freshwater snail Biomphalaria glabrata, and was described as a symbiont (Stibbs et al. 1979; Owczarzak et al. 1980). This range of lifestyles suggested that this group would be easy to detect in environmental surveys. However, previous metabarcoding studies failed to retrieve sequences from any of these filasterean species (del Campo & Ruiz-Trillo 2013; del Campo et al. 2015). The recovery of only a single Filasterea read in European coastal marine samples was explained by a 4-5 bp mismatch of the V4 primer against the full-length 18S sequence from M. vibrans, the only marine species in the clade (del Campo et al. 2015). If primer bias were the reason, it would be possible to detect filasterean reads in sea water using another set of primers or targeting another 18S hypervariable region. Our results from the Tara Oceans expedition confirmed this hypothesis.

We recovered 15 filasterean OTUs with a total abundance of 163 reads. The alignment of the 1389F forward primer designed for the V9 region against the four described Filasterea species revealed a perfect match (Figure 11). Therefore, it seems that the V9 region works better to retrieve this group, as is the case with Amoebozoa and Foraminifera (Amaral-Zettler et al. 2009; Pawlowski et al. 2011). The accessibility of the region to the primers, given that it lies within a secondary structure, may also help to explain these results (Nickrent & Sargent 1991). Filasterean OTUs were mainly retrieved from southern-hemisphere waters, namely the South Pacific Ocean, the Red Sea, and the Indian Ocean. Unexpectedly, the reads were also retrieved from two very different size fractions: meso (180-2000 μm) and nano (3/5-20 μm). Cell sizes of the described Filasterea species fall within the nano fraction, so it is surprising to detect OTUs in the meso fraction. One possible explanation is that this new Filasterea diversity could form cell aggregates, as Capsaspora owczarzaki or Pigoraptor species do (Sebé-Pedrós et al. 2013; Hehenberger et al. 2017). However, the diameter of these aggregates usually ranges between 20 and 30 μm, which raises the possibility that our results are the consequence of larger cell clusters. Another explanation may be that these unknown filasterean species feed on larger cells to which they may attach. For example, it has been reported that M. vibrans uses a long stalk (2-16.5 μm) to enclose diatoms, cyanobacteria or detritus (Tong 1997). Pigoraptor species have been reported to feed on Parabodo caudatus (). Thus, engulfing bigger prey might get them stuck in bigger filters. Finally, it is also plausible that these new Filasterea species are simply bigger than the ones already described.

Figure 11. Alignment of the 1389F forward primer used in the Tara Oceans expedition against the catalogued filasterean species. The 100% similarity in the alignment of all sequences indicates that the V9 hypervariable region works better than the V4 to retrieve environmental data from Filasterea.
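The primer check behind this figure can be approximated with a short script that slides a primer along a target sequence and reports the site with the fewest mismatches, treating IUPAC ambiguity codes in the primer as matching any of the bases they stand for. This is a minimal sketch under stated assumptions: the primer and target strings below are invented placeholders (not the 1389F primer or real 18S sequences), only the forward strand is scanned, and gaps are not modelled.

```python
# IUPAC nucleotide codes: which bases each primer symbol accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def best_primer_site(primer, target):
    """Return (fewest_mismatches, start_index) over all ungapped positions
    of the primer along the target, or None if the target is too short."""
    primer = primer.upper()
    target = target.upper().replace("U", "T")
    best = None
    for start in range(len(target) - len(primer) + 1):
        window = target[start:start + len(primer)]
        mismatches = sum(
            1 for p, t in zip(primer, window) if t not in IUPAC.get(p, "")
        )
        if best is None or mismatches < best[0]:
            best = (mismatches, start)
    return best


# Placeholder sequences for illustration only.
toy_primer = "ACGTY"
perfect_target = "TTACGTCGG"     # contains ACGTC, matched by ACGTY
mismatched_target = "TTACCTAGG"  # closest site differs at two positions

print(best_primer_site(toy_primer, perfect_target))
print(best_primer_site(toy_primer, mismatched_target))
```

A result with zero mismatches corresponds to the perfect match shown in the figure, whereas a count of 4 to 5 mismatches would correspond to the V4 primer bias previously proposed for M. vibrans.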

Our sampling of the freshwater Paraná River, Argentina, failed to recover any filasterean read. This was unexpected because three of the four described filasterean species were isolated from freshwater environments. We checked again whether this absence could be due to a primer mismatch, as previously proposed (del Campo et al. 2015). However, both V4 and V8-9 primers aligned perfectly with the target regions. We attributed these results to a low abundance of Filasterea that the amplification protocol could not detect, to seasonal variability, or simply to the absence of Filasterea species in that environment, which would suggest that these species inhabit a more limited range of freshwater ecosystems.

Overall, our metabarcoding results indicate that new Filasterea species may exist in marine ecosystems from southern-hemisphere oceans, whereas the freshwater species could display a more “endemic” behaviour, only being found in specific habitats in certain regions. What seems clear is that the evolution of filasterean species required several marine-freshwater transitions, as has occurred in other eukaryotic lineages (Logares et al. 2009).

4.1.3. Ichthyosporea

Our results in both freshwater Paraná River and marine Tara Oceans samples showed that ichthyosporeans are well represented in the environment, the former containing up to 22.81% of the total unicellular Holozoa richness. Ichthyosporeans were distributed quite homogeneously across all oceans worldwide, not showing a specific hotspot of abundance. We also observed that marine ichthyosporean OTUs are found in different size fractions: the distribution of Dermocystida reads was shifted towards larger sizes (meso, micro and nano fractions accounting for 88.67% of the abundance), whereas Ichthyophonida distribution was shifted towards smaller sizes (nano and pico held 86.43% of the abundance).

Ichthyosporean ecology is largely unknown, even in the group whose species are well known because of the harm they inflict on animal species of economic interest (Glockling et al. 2013). Until now, these parasitic associations have been described based on morphological analyses, which could only cover a limited number of species. We shed light on this topic by describing, for the first time, possible associations between ichthyosporeans from global marine ecosystems and their putative animal hosts using metabarcoding data. To this end, we conducted a co-occurrence analysis that showed Creolimax fragrantissima to be positively associated with Entoprocta, Polyplacophora (molluscs), Tardigrada and some Porifera. This result expands the range of hosts reported for this species, which was mainly restricted to the phylum Sipunculida (peanut worms) (Marshall et al. 2008). We also observed positive associations of environmental ichthyosporeans with other animals (see below), providing more evidence of the tight interrelationship between Ichthyosporea and animals.

Regarding hidden diversity in Ichthyosporea, we detected some patterns of new diversity. For example, a group of 16 OTUs with an abundance of 754 reads was divergent from its closest reference species, Amoebidium parasiticum and Ichthyophonus irregularis, accounting for up to 18.39% of the ichthyosporean V8-9 region OTUs in the Paraná River. We named this group Amoebidium-Ichthyophonus ENV. However, it was in the already described environmental groups that we found the most interesting diversity patterns. On the one hand, we retrieved FRESHIP (freshwater ichthyosporeans) in our Paraná River dataset, representing 12.9% of the total unicellular Holozoa abundance in the V4 region. On the other hand, in the Tara Oceans dataset, we detected 18 OTUs of MAIP1 (marine ichthyosporeans) with a total abundance of 66,027 reads. This distribution shows that these environmental Ichthyosporea groups are restricted to the habitats where they were first described. This is the opposite of what we found in Choanoflagellatea, in which transitions between marine and fresh waters are fairly common: we identified several FRESCHO (freshwater choanoflagellates) OTUs in marine habitats and vice versa. MAIP1 is thought to dominate the anoxic fractions in coastal marine sediments (del Campo et al. 2015). As they were only present in the DNA template but absent from the RNA template, the authors asserted they could be dead or dormant cells whose DNA was retrieved from inside the host (del Campo et al. 2015). Our co-occurrence analysis of MAIP1 and animals supports this claim. MAIP1 was positively and significantly associated with acoelomorphs, bryozoans, crustaceans, cnidarians and enoplean nematodes. This group was also associated with tunicates and craniates, suggesting that MAIP1 may also include fish parasites, like some of their ichthyosporean relatives.

In general, we could not detect large-scale patterns of novel diversity in Ichthyosporea, i.e., a cluster of OTUs located between Dermocystida and Ichthyophonida or as the sister group to these two subgroups, as opposed to what we observed in Choanoflagellatea, Corallochytrea or even at the Holozoa level. One explanation may be that the patterns of novel diversity that we detected were restricted to high 18S variability at the species level. This is in line with the 18S diversity within phylotypes (groups of taxa defined solely by their phylogenetic relationships rather than by taxonomy), which is commonly seen in ichthyosporeans (Glockling et al. 2013).


4.1.4. Corallochytrea

Corallochytrea comprises two species: Corallochytrium limacisporum and Syssomonas multiformis (Raghukumar 1987; Grau-Bové et al. 2017; Hehenberger et al. 2017). Although the specific phylogenetic position of this clade is under debate, it is clear that these two species always cluster together in phylogenomic analyses (Grau-Bové et al. 2017; Hehenberger et al. 2017; López-Escardó and Ocaña-Pallarès, personal communication). The picture is different in 18S rDNA phylogenies, in which this marker fails to reflect the topology of the phylogenomic analyses. Using the 18S rRNA marker, the two species do not cluster together, but appear as different lineages branching as sister groups to Choanoflagellatea.

Our results revealed that OTUs associated with Syssomonas multiformis were only retrieved from the freshwater Paraná River, while OTUs associated with Corallochytrium limacisporum were only retrieved in the marine Tara Oceans expedition. This distribution is in agreement with the ecosystems from which these species were isolated (Raghukumar 1987; Hehenberger et al. 2017). Moreover, this is the first time these two species have been screened in a global metabarcoding dataset, and thus our results shed new light on their diversity and distribution.

Regarding S. multiformis, our results provide the first report of this species outside the Vietnamese freshwater lake where it was isolated (Hehenberger et al. 2017). These OTUs were the most abundant unicellular Holozoa in the Paraná River, accounting for 37.36% of the abundance in the V4 dataset and 45% in the V8-9 dataset. Interestingly, the most abundant OTU in each region (2,025 reads in V4; 5,733 reads in V8-9) displayed 99% similarity with S. multiformis and could thus be confidently assigned to this morphospecies. The rest of the OTUs, however, were less confidently associated, with an average similarity of 95%. Whether these OTUs belong to the same species, in which case S. multiformis has extensive variability in its 18S sequence, or represent closely related species remains unresolved. Nevertheless, as we found the same diversity pattern in both the V4 and the V8-9 regions, we are confident about the existence of this potential new diversity around the freshwater Syssomonas multiformis.
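The distinction drawn above between an OTU at 99% similarity and OTUs at around 95% can be sketched as a percent-identity calculation on aligned sequences. The example below is only illustrative: the sequences are invented, identity is computed over columns where neither sequence has a gap (one of several possible conventions), and the 99% cutoff in the hypothetical `classify` helper mirrors the value mentioned in the text rather than a threshold formally defined in the thesis.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def classify(identity, species_cutoff=99.0):
    """Hypothetical labelling rule: at or above the cutoff, treat the OTU as
    the reference morphospecies; below it, leave the question open."""
    if identity >= species_cutoff:
        return "assigned to reference species"
    return "related to reference, species status unresolved"


# Invented aligned fragments for illustration only.
reference = "ACGTACGTACGTACGTACGT"
otu_close = "ACGTACGTACGTACGTACGT"
otu_far = "ACGTACGAACGTACG-ACGT"

for name, seq in (("otu_close", otu_close), ("otu_far", otu_far)):
    identity = percent_identity(reference, seq)
    print(name, round(identity, 2), classify(identity))
```

A single cutoff cannot by itself separate intraspecific 18S variability from closely related species, which is why the text leaves the status of the ~95% OTUs unresolved.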

OTUs associated with the Corallochytrea group were only retrieved in the Tara Oceans expedition, showing the affinity of these organisms for marine habitats. Specifically, 10 OTUs with a total of 322 reads were detected, although they could not be confidently assigned to C. limacisporum, the only described species of the group (Raghukumar 1987; Grau-Bové et al. 2017). This implies that these OTUs are likely to be new diversity. They were mainly located in the North Pacific and the Indian Ocean, which coincides with the known distribution of C. limacisporum, first isolated from marine coral reef lagoons in the Arabian Sea and later in Hawai’i (Raghukumar 1987; Grau-Bové et al. 2017). Finally, the size fractions where they were found (pico and nano: 0.8-20 μm) are similar to the sizes documented for C. limacisporum (4.5-20 μm) and S. multiformis (7-14 μm), which supports the association of these OTUs with the Corallochytrea clade.

Overall, the hidden diversity in this clade is noteworthy. The two morphologically described species are geographically far apart and also appear to inhabit completely opposite ecosystems. If they were the only two species of the clade, a single speciation event would have led to this situation, which seems hardly plausible. Our work shows that both S. multiformis and C. limacisporum are not restricted to the habitats from which they were isolated, extending even to the Arctic Ocean. Moreover, we could detect novel diversity associated with Corallochytrea, but not directly with C. limacisporum. The future isolation of more species will be key to recovering Corallochytrea as a monophyletic group in 18S phylogenies, as happened with Filasterea (which appeared polyphyletic until the two Pigoraptor species came into play). More described corallochytrean species will also be essential to place this clade in the Holozoa tree of life with confidence, given that its position changes depending on the phylogenetic method, the number and type of positions in the alignment, or even the number and type of species in the outgroup (López-Escardó and Ocaña-Pallarès, personal communication).

4.1.5. Potential new diversity between main Opisthokonta lineages


One of the key discoveries in the present thesis is the identification of the environmental clades FRESHOL1 (for freshwater Holomycota) and FRESHOL1-related in the analysis of the eukaryotic diversity of the Paraná River. FRESHOL1 comprises 10 OTUs that are confidently placed as branching earlier than any other Opisthosporidia lineage. FRESHOL1-related also appeared in an early branching position in the tree, although with low nodal support. We retrieved these OTUs from different size fractions, which suggests that this group may be composed of cells from different stages of the life cycle (Richards et al. 2015). For example, chytrids produce flagellated zoospores, for which aquatic environments offer the most effective means of dispersal (Monchy et al. 2011). Thus, it is not surprising that other early-splitting fungal clades, such as the ones described here, could behave similarly. What is clear is the divergent nature of this group. The isolation and formal description of this diversity would be ideal. However, it is challenging because of the cryptic nature of fungal species (Yahr et al. 2016). In addition, extreme variations in morphology due to secondary losses or adaptations to specific ecological niches can also hamper the recognition of the diagnostic features necessary for a formal description. One strategy to obtain morphological information and enhance the discriminatory power against other fungal clades would be to use Fluorescence In Situ Hybridization techniques together with calcofluor-white or wheat germ lectin to target chitin (Jones et al. 2011; Richards et al. 2015). Although chitin is not a diagnostic feature of Fungi, and thus a negative result would not imply the absence of the organism, this experiment could be a first approach. Overall, these findings add more evidence that Fungi are much more than the “charismatic megaflora” we all know (Seifert 2009). Finding new fungal diversity also has profound implications for the description of what Fungi are. Since a vast amount of fungal knowledge is still lacking, we cannot be confident about what uniquely defines Fungi (Grossart et al. 2015; Richards et al. 2017).

The other main discovery in the present thesis was the environmental group that we tentatively named MASHOL (for MArine Small HOLozoa). Retrieved from the Tara Oceans marine expedition, this group is composed of 21 OTUs and a total of 6,244 reads that were phylogenetically placed at the most internal nodes between Choanoflagellatea and Syssomonas multiformis. The wide divergence of the sequences made it impossible to place them in any of the extant tips of the current reference Holozoa tree of life, suggesting that they form a novel Holozoa group. This novelty agrees with the previous observation from the Tara Oceans consortium that unclassified Opisthokonta OTUs increased the reference tree length by 400%, implying that these lineages hold a significant unknown molecular diversity (de Vargas et al. 2015). It is important to note here that the position of Syssomonas as the sister group of Choanoflagellatea, as depicted in our reference Holozoa tree, is probably an artifact of the 18S rRNA gene. Phylogenomic analyses have demonstrated that Syssomonas multiformis forms a clade with Corallochytrium limacisporum, branching off as a separate Holozoa lineage and not right after choanoflagellates (Grau-Bové et al. 2017; Hehenberger et al. 2017). However, this disparity in the topology does not affect the interpretation of this result. Because this position shows deep-rooted novelty, we propose that MASHOL may be a new unicellular Holozoa group. These OTUs were mainly located in the Indian Ocean and the Mediterranean Sea, although they were also present in other oceans. They seem to be small organisms, in the range of the pico-nano fractions (0.8 to 20 μm). Further research is needed to obtain morphological information on this group, as well as any other ecological trait that may be relevant (e.g., symbiotic relationships, pigments, presence of flagella, cell wall or coenocytic growth).


4.2. New diversity of Acoelomorpha and its implications for understanding early bilaterian evolution

We carried out a global screening of the metazoan phylum Xenacoelomorpha in marine ecosystems, including coastal and open waters, together with deep-sea benthos. Xenacoelomorpha is a clade of flatworms that comprises Acoela, Nemertodermatida, and the genus Xenoturbella, and is likely to be the sister group to the rest of Bilateria (Ruiz-Trillo et al. 2002; Hejnol et al. 2009).

As shown by the low BLAST similarity percentages against reference databases, our dataset seemed to harbour a vast amount of putative new acoelomorph diversity. These results could also be explained by the fast evolutionary rate of the 18S rRNA gene in this phylum (López-Escardó et al. 2018b), as had already been shown in Platyhelminthes (Carranza et al. 1996) and Chaetognatha (Gasmi et al. 2014). Nevertheless, we revealed deeply divergent novel clades that could not be explained by fast evolutionary rates alone. In particular, we described two clades: Deep-sea Acoela 1 and Deep-sea Acoela 2. Whilst Deep-sea Acoela 2 appeared to be located between some Acoela families, Deep-sea Acoela 1 appeared as the sister group to the rest of Acoela. The positions of both clades were strongly supported by all phylogenetic methods, especially in the case of clade 1.

Deep-sea Acoela 1 and 2 were each formed by a single OTU, and they were found in very deep sediments (below 3,500 m) of the North Pacific and the North Atlantic Oceans, respectively. A clade formed by a single OTU might cause concern because it could be an artifact rather than a real species. However, the recent discovery of four new Xenoturbella species in deep sediments demonstrates that it is highly likely to find new diversity on the sea floor (Rouse et al. 2016). In addition, it is quite possible that each of our clades comprises more than a single species, because the 18S rRNA gene underestimates the real diversity in animals (Tang et al. 2012). Together, these observations support our interpretation of both clades as new diversity in the Acoela tree of life.


In a broader perspective, our general screening of the acoelomorph biodiversity also contributes to the animal research community. This is of crucial importance because there is a lack of animal sequences in public repositories that significantly affects the assessments of animal diversity (Stefanni et al. 2018; Trebitz et al. 2015; Abad et al. 2016). Thus, our approach helps to improve current animal databases thanks to the description of two new Acoela clades, and the report of a more extensive planktonic acoelomorph diversity than previously thought. Metabarcoding is probably the best technique to study this phylum, given that morphological assessments failed to retrieve actual specimens that were, on the other hand, found through molecular data (Lejzerowicz et al. 2015).

These results can also shed light on the hypothetical last common bilaterian ancestor, known as Urbilateria. The traditional planuloid-acoeloid hypothesis envisioned the Urbilateria as a simple, benthic acoelomate, very similar to the extant xenacoelomorph species (Graff 1882; Hyman 1940). This hypothesis rests on the premise that the dramatic reductions displayed by extant xenacoelomorphs (e.g., absence of body cavity and of circulatory, respiratory and excretory systems; a single opening of the digestive system) were already exhibited by the bilaterian ancestor (Jondelius et al. 2011). However, it is far from clear what this ancestor really looked like. It is the study of the full Xenacoelomorpha biodiversity that can provide new clues about what is derived and what is ancestral in this phylum, and thus help envision the morphology and ecology of the Urbilateria (Hejnol & Pang 2016).

In this regard, our analyses suggest that new, potentially informative Acoela species remain to be discovered in the deep-sea benthos (3,500-5,000 m). Thus, a strategic sampling effort would be key to elucidating what the Urbilateria was like, where it evolved, and how the earliest bilaterian lineages emerged.


4.3. Technical caveats of metabarcoding

4.3.1. Cut-offs and clusterization methods for OTU delineation

Any metabarcoding survey requires a clustering step to generate the OTUs (see section 1.4.4. Operational Taxonomic Unit (OTU)), which is a pragmatic unit to classify molecular diversity, considered a rough proxy for species. However, there is some controversy on the exact similarity threshold that should be applied when studying eukaryotic diversity, as well as on the best clustering software.

In the present thesis, I used different thresholds and software, depending on whether I conducted the clustering myself or analysed datasets that had already been clustered (Table 2). In this section, I therefore explain the reasons behind these choices and discuss the widely accepted cut-off of 97% as the benchmark for OTU clustering in protists.

Result 3.1. Unicellular Opisthokonta from Paraná River, Argentina: 97% (USEARCH)
Result 3.2. Similarity networks of unicellular Holozoa from Tara Oceans: d = 1 (Swarm)
Result 3.3. Global marine Xenacoelomorpha metabarcoding survey: 97% (USEARCH), 99% (Qiime), d = 1 (Swarm)

Table 2. Overview of the thresholds and software used throughout the thesis to analyse metabarcoding data from unicellular Opisthokonta and animals. Whilst I conducted the pipeline to obtain the OTUs in result 3.1, I took the OTUs already clustered in results 3.2 and 3.3.

Although there is no clear consensus on the best threshold to apply to all eukaryotes (Nebel et al. 2011; Brown et al. 2015), 97% similarity has been extensively used in metabarcoding studies (Sogin et al. 2006; Stoeck et al. 2009; Massana et al. 2015; Giner et al. 2016). It has been reported that this cut-off avoids putative artifacts (Kunin et al. 2010; Behnke et al. 2011) and intra-individual polymorphic variants (Stoeck et al. 2010). This is why we chose this threshold to cluster the unicellular Opisthokonta amplicon dataset from Paraná River (Table 2). In this regard, the study of Behnke and collaborators on protists from intertidal sediments supported the selection of the 97% cut-off (Behnke et al. 2011). They reported that the number of OTUs clustered at 97% better reflected the number of reference OTUs obtained through PCR clone libraries that came from a known subset of the community (Behnke et al. 2011). Given these results, we were more confident about applying this threshold to our sampling of the freshwater eukaryotic community in Paraná River. However, I am aware that the slow evolutionary rate of the 18S rRNA gene in many protistan lineages can make this threshold insufficient to resolve species delimitation (Stoeck et al. 2010).
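
To make the idea of a fixed similarity cut-off concrete, the following is a minimal sketch of greedy centroid clustering at 97%. It is not the USEARCH implementation: it assumes pre-aligned amplicons of equal length, measures identity as the fraction of matching positions, and uses made-up toy sequences; the function names `identity`, `greedy_cluster` and `mutate` are hypothetical.

```python
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def greedy_cluster(amplicons: Dict[str, int],
                   threshold: float = 0.97) -> List[Tuple[str, List[str]]]:
    """Cluster amplicons (sequence -> read count) around abundance-sorted centroids."""
    clusters: List[Tuple[str, List[str]]] = []
    # The most abundant amplicons are visited first and become centroids.
    for seq in sorted(amplicons, key=lambda s: (-amplicons[s], s)):
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            # No centroid is similar enough: this amplicon seeds a new OTU.
            clusters.append((seq, [seq]))
    return clusters


def mutate(seq: str, positions: List[int]) -> str:
    """Toy helper: introduce a substitution at each given position."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical data: a 100-nt amplicon plus variants with 2 and 5 substitutions.
base = "ACGT" * 25
toy = {base: 50, mutate(base, [3, 40]): 5, mutate(base, [1, 10, 20, 30, 60]): 8}
otus = greedy_cluster(toy)
print([len(members) for _, members in otus])
```

In this toy case the 98%-identical variant joins the abundant centroid, whereas the 95%-identical variant seeds its own OTU; lowering the threshold merges them, which illustrates how strongly the OTU count depends on the chosen cut-off.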

In order to know the best threshold for OTU clustering, it would be necessary to analyse the 18S rDNA variability of all the species belonging to each particular clade. One good example of this strategy was conducted in ciliates, in which the authors selected 98% as the best cut-off because it included all intraspecific variability (Figure 12) (Nebel et al. 2011). Whether this percentage of similarity could be applied to the unicellular Opisthokonta remains unknown because we lack the empirical data to test it.

Figure 12. Box plots depicting the data range of pairwise 18S rRNA gene sequence similarities within taxonomic ranks in catalogued Ciliophora. Adapted from Nebel et al. 2011.

To overcome the limitations of using a fixed similarity threshold that cannot be universally applied to all lineages, we agreed to use the Swarm clustering software in result 3.2 and result 3.3 (Table 2) (Mahé et al. 2015). Swarm is an iterative procedure based on the abundance of the amplicons: the OTU grows by adding amplicons that differ by a single nucleotide until the abundance drops, which marks the boundary of the OTU. This means that some OTUs may contain 1% variability whereas others contain 4%, for example. The advantages are that the result does not depend on the input order of the amplicons, that it constructs more natural OTUs, and that the inner structure of the OTU, which works as a network, can be inspected. These advantages provided us with good reasons to use Swarm OTUs from the Tara Oceans dataset.
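
The iterative growth described above can be sketched as follows. This is a simplified illustration, not the Swarm program itself: it assumes equal-length amplicons, counts only substitutions (no insertions or deletions), and models the abundance boundary by letting an OTU grow only towards amplicons that are not more abundant than the one they are linked from. The names `one_difference` and `swarm_like` and the toy sequences are hypothetical.

```python
from collections import deque
from typing import Dict, List


def one_difference(a: str, b: str) -> bool:
    """True if two equal-length sequences differ by exactly one substitution."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def swarm_like(amplicons: Dict[str, int]) -> List[List[str]]:
    """Grow OTUs from abundant seeds by repeatedly adding one-difference neighbours."""
    unassigned = sorted(amplicons, key=lambda s: (-amplicons[s], s))
    otus: List[List[str]] = []
    while unassigned:
        seed = unassigned.pop(0)  # most abundant amplicon not yet in any OTU
        otu, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            # Only step "downhill": a neighbour must not be more abundant than current.
            linked = [s for s in unassigned
                      if one_difference(current, s)
                      and amplicons[s] <= amplicons[current]]
            for s in linked:
                unassigned.remove(s)
                otu.append(s)
                queue.append(s)
        otus.append(otu)
    return otus


# Hypothetical amplicons (sequence -> read count).
toy = {"AAAA": 100, "AAAT": 10, "AATT": 3, "TTTT": 40, "TTTA": 2}
print(swarm_like(toy))
```

In the toy data, AATT ends up in the same OTU as AAAA although the two differ at two positions, because it is reached through the intermediate AAAT. This is how the internal variability of such an OTU can exceed the single-difference step used to build it.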

Finally, we decided to test whether different clustering strategies would produce a different number and distribution of OTUs. Thus, we clustered the raw data from the Paraná River sampling using both methods: 97% with USEARCH and Swarm with the default d = 1 (see Appendix). Although Swarm clustering yielded 2.3 times more OTUs, the composition of the eukaryotic diversity did not vary significantly. Therefore, we concluded that the most decisive factor when addressing molecular diversity is not the clustering method, but the specific level at which the diversity is to be studied.

Overall, the main goal of the present work was to find novel diversity in unicellular Opisthokonta and animals at any biological level. For this reason, I considered it less important to what extent our OTUs accurately reflected real species. For instance, if the novel groups that we named FRESHOL (for freshwater Holomycota, in result 3.1) or MASHOL (for MArine Small HOLozoa, in result 3.2) had a large intraspecific variability in their 18S OTUs, it would imply that they are formed by one or a few species with different populations exhibiting a large variation in their amplicon sequences. In this scenario, populations would have adapted and evolved separately in different ecosystems. This polymorphic intraspecific variability is common in phylotypes (groups of organisms defined by their phylogenetic relationships, regardless of their taxonomic level). Phylotypes have been reported in ciliates (Nebel et al. 2011) and Ichthyosporea (Glockling et al. 2013), the latter supporting our hypothesis: the new molecular Opisthokonta group could be one phylotype with a large 18S variability. On the contrary, if the genetic novelty were interspecific, it would imply that closely related species shared the same taxonomic rank. I believe this scenario is less plausible because sister species of unicellular opisthokonts display lower 18S sequence similarities than the clustering thresholds we considered in any of our datasets.


4.3.2. 18S reads as a (crude) proxy for species abundance

Knowing the species abundance is of crucial importance in biodiversity assessments, either from an ecological perspective or when seeking new diversity. In molecular studies, as we work with the DNA sequence of the specimen, the closest approximation to the real species abundance is the number of reads belonging to one OTU. Thus, if a specific OTU is found in an environmental survey in high abundance (with many reads), confidence in the real existence of this organism increases. For example, we proposed that MASHOL (result 3.2) could be a potential new Holozoa group based, among other reasons, on its high number of reads.

However, the exact association between the abundance of the marker, in this case the 18S rRNA gene, and the abundance of the species is a matter of debate. The reason is the existence of multiple copies of the 18S rRNA gene per genome. In addition, there is an enormous variation in copy number among eukaryotes, whose apparently random pattern of distribution across the tree of life has bewildered researchers during the last 20 years (Zhu et al. 2005; Godhe et al. 2008; Prokopowich et al. 2003). Consequently, when estimating abundances with metabarcoding data, we are actually counting the abundance of the marker, not the abundance of the species. In spite of this disparity, several studies have tried to correlate rRNA gene copy number with biological features.
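
The gap between marker abundance and species abundance can be illustrated with a small calculation. The sketch below assumes, for simplicity, that reads are sampled in proportion to the total number of rDNA copies in the community; the taxa, cell counts and copy numbers are hypothetical and `expected_read_share` is an illustrative helper, not part of any published pipeline.

```python
from typing import Dict


def expected_read_share(cells: Dict[str, int],
                        copies_per_cell: Dict[str, int]) -> Dict[str, float]:
    """Expected fraction of reads per taxon if reads are proportional to rDNA copies."""
    marker_copies = {taxon: cells[taxon] * copies_per_cell[taxon] for taxon in cells}
    total = sum(marker_copies.values())
    return {taxon: n / total for taxon, n in marker_copies.items()}


# Hypothetical community: equal cell numbers, very different copy numbers.
cells = {"small_flagellate": 1000, "large_protist": 1000}
copies = {"small_flagellate": 2, "large_protist": 20000}

shares = expected_read_share(cells, copies)
for taxon, share in shares.items():
    print(f"{taxon}: {share:.4f} of reads, 0.5000 of cells")
```

Under these assumptions both taxa contribute half of the cells, yet nearly all reads come from the taxon with the higher copy number, which is why read counts are best read as marker abundance.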

For instance, the number of rDNA copies has been positively correlated with genome size (Prokopowich et al. 2003), genome stability and integrity (Ide et al. 2010), and gene expression across the genome (Paredes et al. 2011). rDNA copies have also been negatively correlated with mitochondrial DNA abundance (Gibbons et al. 2014). Other studies detected a relationship with fluctuations in environmental conditions. For example, a great variation in 18S copy numbers was associated with latitudinal, longitudinal and altitudinal gradients in plants because of temperature and fire-related stress (Bobola et al. 1992; Govindaraju & Cullis 1992), or in ciliates because of the temperature gradient (Fu & Gong 2017).

The correlation between the number of rDNA copies and the real abundance of the organisms has been addressed by comparing the abundance of the marker (number of reads) with counts of real individuals obtained by microscopy inspection, FISH probes or mock communities. This strategy proved reliable in diatoms (de Vargas et al. 2015). Another study targeting three species of Foraminifera revealed that a good correlation could indeed be drawn between the abundance of species and the proportion of reads (Weber & Pawlowski 2013). Impressively, it was even possible to correlate rDNA copy numbers with the abundance of the uncultured environmental clade MAST4 in marine waters (Rodríguez-Martínez et al. 2009). However, these studies are scarce and seem to be the exception rather than the rule. As researchers from the Tara Oceans consortium found in a broader inspection of the expedition, there was no significant correlation between V9 reads and taxa abundance in big eukaryotic lineages (de Vargas et al. 2015).

The closest estimate seems to be the correlation between rDNA copy number and cell size (measured as cell length or cell volume, depending on the authors) (Godhe et al. 2008; Zhu et al. 2005). The larger the cell, the greater the number of 18S copies it has in the genome (Figure 13). However, it is the cell volume (and not the cell length) that is better correlated with the number of V9 reads (Godhe et al. 2008; de Vargas et al. 2015). It is also important to note that not all eukaryotes are reflected in this analysis. Ciliates are usually excluded because they appear as outliers on account of the two nuclei and the genomic content typical of these species. Unfortunately, we lack this information for many other eukaryotic clades, such as Opisthokonta or Amoebozoa.

Figure 13. Correlation between rDNA copy number and organism length across all eukaryotes. The compiled data confirm that copy number tends to increase with the size of the organism: cells smaller than 5 μm generally have between 1 and 5 copies, while cells larger than 200 μm can have between 10,000 and 200,000 copies. Source: de Vargas et al. 2015.


Overall, following the association found in other eukaryotes, we can assume that 18S rDNA reads in unicellular Opisthokonta work in the same way, being a rough proxy for the cell volume (known as biovolume) of the species. The biovolume concept can be easily applied to any kind of cell, from mature cells to cysts or even extracellular structures. This is very convenient for unicellular Opisthokonta because these species display a wide range of lifestyles and cellular life stages, so being able to measure the volume of the organisms in the ecosystem rather than to count the organisms themselves is a neat and helpful approach. If we also take OTUs as a reflection of the richness in a sample, this strategy is well suited to addressing the biodiversity structure from a taxo-functional standpoint.

4.3.3. Differences in biodiversity recovery between 18S rRNA gene hypervariable regions

One of the main caveats of metabarcoding is the use of a specific region instead of the full-length 18S rRNA gene sequence. The profound consequence is the differential recovery of biodiversity depending on the region, which ultimately shows a biased picture of the community. Several studies tackled this problem by sequencing two marker regions, usually the V4 and the V9 from the 18S rRNA gene (Stoeck et al. 2010; Decelle et al. 2014; Piredda et al. 2017; Tragin et al. 2018).

Using this strategy, previous works have already shown differences in eukaryotic recovery. For example, in one investigation all main eukaryotic lineages were amplified using the V9 region, whereas the V4 failed to detect , (Excavata) and several Stramenopile clades (Stoeck et al. 2010). Foraminiferans are known to be recovered only by the V9 and not by the V4 region (Amaral-Zettler et al. 2009; Pawlowski et al. 2011; Piredda et al. 2017). Our results from the Paraná River show a completely different picture. First, the V4 region was able to detect OTUs of , Jakobida, and several Stramenopila subgroups. Second, there were V4 OTUs of Picobiliphyta, Dictyostelea, and , that the V9 failed to retrieve, showing that in our case the V9 did not work for all eukaryotes. Finally, the amount of potential new diversity was found to be 8.29% lower using the V9 rather than the V4 region.


We also observed differences in the detection of fungal clades, specifically Chytridiomycota and Cryptomycota, using either the V4 or the V8-9 regions. This result is the opposite of what has been described for freshwater Fungi (Lepère et al. 2019). Our results might be explained by the high level of endemicity that these taxa may display in freshwater ecosystems. Another explanation lies in the large amount of dark matter fungi: the diversity of undescribed fungal species that seem to be ubiquitous and abundant in the environment but for which we lack any cultures or morphological information (Grossart et al. 2015). This is especially acute in these early branching fungal clades, given that they have been either recently described, neglected in ecological studies or misplaced in the tree of life (Jones et al. 2011; Karpov et al. 2014; Grossart et al. 2015).

Several reasons have been put forward to explain this skewed recovery of biodiversity depending on the 18S region considered. One of these reasons is the completeness of the database. In GenBank there are more 18S sequences containing the V4 hypervariable region than the V9 (Stoeck et al. 2010). Therefore, when blasting an environmental query OTU against the GenBank database, few V9 sequences get a hit because this region is simply not included in many reference sequences. We verified this using the environmental group FRESHIP1 (for freshwater ichthyosporeans 1). We observed that, of three FRESHIP1 sequences in the database, only one contained the V9 region, which could explain why the abundance of this group was greater in the V4 and almost nonexistent in the V8-9 from our sampling.

Another reason is the structure and characteristics of each 18S region. The V4 is the longest and most variable 18S region (Wuyts et al. 2000). This is why it is more powerful for differentiating diversity at higher taxonomic ranks (Stoeck et al. 2010) and has been widely used in metabarcoding studies (del Campo & Ruiz-Trillo 2013; Massana et al. 2015; Simon et al. 2015). However, it is the disparity in length (in some cases its absence), the fast evolutionary rate and the secondary structure that create most of the biases in eukaryotic recovery (Vossbrinck et al. 1987; Sogin et al. 1989; Bråte et al. 2010; Behnke et al. 2011). In addition, the V4 contains 6.8 times more homopolymers⁴ than the V9 (Stoeck et al. 2010), which makes it more prone to sequencing errors. In contrast, the V9 is quite simple in structure, with a stable length and less heterogeneity (Amaral-Zettler et al. 2009; Stoeck et al. 2009), attributes that have been valued for detecting eukaryotic diversity (Mahé et al. 2017; Pawlowski et al. 2011; Stoeck et al. 2010; de Vargas et al. 2015). The V9 has also been reported to perform similarly to full-length 18S rRNA sequences at high taxonomic levels, illustrating similar community patterns among samples (Lie et al. 2014). However, the V9 does not seem sufficient to distinguish between species or genera, and its short length hampers phylogenetic reconstructions on account of the weak signal (Pawlowski et al. 2011).

⁴ Homopolymer: a run of four or more identical nucleotides in a row.
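
Using the definition of homopolymer given in the footnote above (four or more identical nucleotides in a row), the homopolymer content of a region can be counted with a short script. This is a minimal sketch; the function name `find_homopolymers` and the example sequence are made up for illustration.

```python
import re
from typing import List

# Footnote definition: four or more identical nucleotides in a row.
HOMOPOLYMER = re.compile(r"(A{4,}|C{4,}|G{4,}|T{4,})")


def find_homopolymers(sequence: str) -> List[str]:
    """Return every homopolymer run (length >= 4) found in a DNA sequence."""
    return HOMOPOLYMER.findall(sequence.upper())


# Hypothetical fragment: the GGG run is too short to count.
runs = find_homopolymers("ACGTTTTTACCCCGGGAAAAAT")
print(len(runs), runs)
```

Comparing such counts between the V4 and V9 portions of the same reference sequences would be one simple way to examine the difference in homopolymer content discussed above.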

Our results confirmed that there is no single ideal 18S hypervariable region. It is a thorough design of the methodology that can provide a more accurate picture of the community. If the original scientific question requires a detailed examination of the diversity, a suitable approach would be to combine both regions, together with different sets of primers, as was done by Zhu et al. 2018. Another possibility is the nested strategy put forward by Pawlowski and collaborators, which is explained in section 4.5. Future perspective for biodiversity assessments. Alternatively, if the original question focuses on a general overview of eukaryotic diversity, a simpler strategy using one of the regions would be enough. Our results on the unicellular Holozoa and eukaryotes of Paraná River showed that, from a broader perspective, the V4 and V8-9 regions recovered quite similar diversity. Moreover, this similar picture of the general diversity has been shown to be maintained during seasonal changes (Piredda et al. 2017). Therefore, these results imply that the depth of the study determines whether one or several hypervariable regions should be used, and how these regions must be amplified and sequenced.


4.4. Gene similarity networks to test evolutionary theories

Since Darwin's time, evolution has mainly been depicted as a tree: one ancestor splits into two different populations that eventually give rise to different species. However, thanks to the advances in cell and molecular biology, metabolism, and ecology, networks burst onto the evolutionary biology scene to explain processes that could not be interpreted using the classical bifurcating tree (Dagley & Nicholson 1970; Doolittle & Bapteste 2007; Junker & Schreiber 2008; Arnold & Fogarty 2009; Corel et al. 2016; Layeghifard et al. 2017). The adoption of this tool in ecological studies was immediate because several ecological processes were already known to have a network-like shape, such as trophic webs, pollination or parasite-host relationships. However, evolutionary biologists were more reluctant to adopt this method because it shook traditional tenets in biology, for instance the absence of lateral gene transfer in eukaryotes (Martin 2017; Leger et al. 2018).

Consequently, it is not surprising that the application of networks to find molecular novelty in metabarcoding data is practically absent, even today. To the best of our knowledge, very few metabarcoding studies have moved in this direction (Forster et al. 2015). Even these studies were more focused on testing ecological theories than on exploring the molecular novelty within the chosen clade, as we did in the present thesis. This is somewhat surprising given the advantages of networks for analysing metabarcoding data. I will now describe the main arguments for implementing similarity networks in the analysis of the Tara Oceans metabarcoding dataset of unicellular Holozoa.

Networks have a structure that emerges from their basic components. In a gene similarity network like ours, nodes (18S sequences) are connected through edges if, and only if, these nodes share a certain percentage of similarity (Figure 14). Therefore, there is a wide range of edge similarities in the same network. By increasing the stringency, the network can be progressively fragmented into Connected Components (CCs), providing a detailed examination of the diversity, for example at the intraspecific level. On the contrary, if the question concerns large clusters of diversity, lowering the threshold allows the formation of larger structures, such as giant CCs or Louvain Communities (LCs). This is the reason why we partitioned the initial network at increasing similarity thresholds (≥85%, ≥87%, ≥90%, ≥95%, ≥97%, ≥99%), to observe the unicellular Holozoa diversity under different grouping structures.
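
The effect of the similarity threshold on network fragmentation can be sketched as follows. This is an illustrative example rather than the pipeline used in result 3.2: the edge list with pairwise similarities is made up, the function name `connected_components` is hypothetical, and only connected components are computed (detecting Louvain Communities would require a dedicated community-detection routine).

```python
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (sequence A, sequence B, % similarity)


def connected_components(nodes: Iterable[str], edges: Iterable[Edge],
                         threshold: float) -> List[List[str]]:
    """Keep edges with similarity >= threshold and return the connected components."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Follow parent links to the representative, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, similarity in edges:
        if similarity >= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for n in parent:
        groups.setdefault(find(n), []).append(n)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))


# Hypothetical pairwise similarities between six 18S sequences.
nodes = ["s1", "s2", "s3", "s4", "s5", "s6"]
edges = [("s1", "s2", 99.2), ("s2", "s3", 97.5), ("s3", "s4", 91.0),
         ("s4", "s5", 98.0), ("s5", "s6", 86.0)]

thresholds = [85, 87, 90, 95, 97, 99]
counts = [len(connected_components(nodes, edges, t)) for t in thresholds]
print(dict(zip(thresholds, counts)))
```

With these toy values the network is a single component at ≥85% and splits into five components at ≥99%, mirroring how a stricter threshold zooms in on finer-grained groups.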

Figure 14. Sequence similarity networks. The initial network is constructed through an all-against-all BLAST of all sequences from the dataset. Thus, every node is a sequence (black dot). Edges connecting the nodes represent the percentage of similarity between the two sequences, depicted in a colour gradient. (A) When the threshold is higher (e.g., 97% similarity), the network is more stringent because only edges equal to or above this percentage are kept in the network. The most basic partition of the network is the connected component (CC): a set of nodes that are connected to one another, directly or indirectly, and to no other node. This detailed zoom can provide information on the molecular biodiversity at a specific or intraspecific level. (B, C) If the threshold decreases (e.g., 96% similarity), more connections are included and thus larger structures appear, such as giant connected components or Louvain Communities. These structures could be related to higher taxonomic ranks, for example genera or families. Adapted from Forster et al. 2015.

Not only is this modularity (the division of a network into CCs or LCs) essential to adjust the level of detail in the biodiversity analysis, but the application of multi-layer networks that take into account spatial and temporal dimensions could also provide enormous potential for analysing the complexity of ecological networks (Pilosof et al. 2017). Had a different sampling procedure been performed, it would have been possible to apply this multi-layer analysis to our dataset.


Finally, the network structure enables the prediction of future events in the studied system. This capability has already proved successful in predicting the effect of vaccination or quarantine actions after a human disease outbreak (Meyers et al. 2003; Proulx et al. 2005). I believe that this property is of immense value in the field of molecular biodiversity. It could be applied, for example, to predict the outcome of specific conservation plans, or to predict changes in the community after a monitoring assessment. I consider that microbial studies should exploit genetic networks more thoroughly in order to take their research potential to the next level in terms of explanation of the system (past), its description (present), and prediction of upcoming changes (future).

Apart from the network structure, the underlying mathematical framework allows statistical testing of the data (Layeghifard et al. 2017). The data can therefore be explained in the most objective way possible, which also allows comparison with other studies. Samples can be described through multiple statistical distributions, or even modelled in order to achieve predictive capabilities. In our case, we benefited from this feature when testing whether the distribution of closeness values of environmental (ENV) nodes from Tara Oceans was significantly different from the distribution of reference (REF) nodes. To this end, we used the Mann-Whitney U test⁵ (Table 3). Regarding assortativity, we randomly shuffled the labels of the nodes 100 times while keeping the same topology and connections in the network. That is to say, a node could become ENV or REF after the shuffling. We used a one-sample t-test to compare the real assortativity of the network with a known theoretical or hypothetical mean (the mean of the random distribution of assortativity values generated in the previous step) (Table 3).

⁵ The two-sample Kolmogorov-Smirnov (KS) test could also have been used to decide whether the ENV and REF distributions were statistically the same. However, the KS test is widely misused when a signed version of it is applied to test whether one distribution is significantly higher than the other (Filion 2015). Being aware of this artifactual result, we decided not to apply it to our dataset.
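
For readers unfamiliar with the closeness comparison described above, the following is a minimal, dependency-free sketch of a two-sided Mann-Whitney U test using the normal approximation with tie and continuity corrections. It is not the code used in the thesis; the closeness values are invented, and in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U statistic of x, two-sided p-value) via the normal approximation."""
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the ranks i+1 .. j shared by tied values
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0
    z = (abs(u_x - mean_u) - 0.5) / math.sqrt(var_u)  # continuity correction
    p = math.erfc(max(z, 0.0) / math.sqrt(2))  # two-sided tail probability
    return u_x, p


# Hypothetical closeness values for reference (REF) and environmental (ENV) nodes.
c_ref = [0.30, 0.32, 0.35, 0.31, 0.33, 0.34, 0.36, 0.37]
c_env = [0.10, 0.12, 0.15, 0.11, 0.13, 0.14, 0.16, 0.17]

u, p = mann_whitney_u(c_ref, c_env)
print(f"U = {u}, p = {p:.4f}")
```

A small p-value would lead to rejecting the null hypothesis that the REF and ENV closeness distributions are the same.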

        Closeness               Assortativity
Test    Mann-Whitney U test     One-sample t-test
H0      C_REF = C_ENV           m = μ
H1      C_REF ≠ C_ENV           m ≠ μ

C: closeness value; m: actual mean from the sample; μ: theoretical mean (mean of the distribution obtained from 100 random shuffles).

Table 3. Null and alternative hypotheses to test closeness and assortativity metrics in the networks from the Tara Oceans study. Regarding closeness, our dataset consisted of two distributions: an environmental distribution from the Tara Oceans dataset (blue line) and the reference distribution from GenBank (black line). The Mann-Whitney U test allows the comparison of both distributions to see whether one is significantly different from the other. Regarding assortativity, we computed the assortativity of 100 randomly shuffled networks and obtained the distribution of these values (black line). Then, we tested whether the real assortativity value of the network was significantly different from it (red dot: if A, not significantly different; if B, significantly different).
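The label-shuffling procedure for assortativity can be sketched as follows. This is an illustrative example rather than the analysis code of the study: the toy network, the function names and the random seed are hypothetical, assortativity is computed with Newman's coefficient for a categorical node label, and the t statistic treats the 100 shuffled values as the sample and the observed assortativity as the reference value, which is one possible reading of Table 3.

```python
import math
import random
import statistics


def label_assortativity(edges, labels):
    """Newman's assortativity coefficient for a categorical node label."""
    cats = sorted(set(labels.values()))
    idx = {c: k for k, c in enumerate(cats)}
    mixing = [[0.0] * len(cats) for _ in cats]
    for u, v in edges:
        # Undirected edge: count both directions so the matrix is symmetric.
        mixing[idx[labels[u]]][idx[labels[v]]] += 1
        mixing[idx[labels[v]]][idx[labels[u]]] += 1
    total = sum(map(sum, mixing))
    trace = sum(mixing[k][k] for k in range(len(cats))) / total
    ends = [sum(row) / total for row in mixing]  # share of edge ends per label
    expected = sum(a * a for a in ends)
    if expected == 1:
        raise ValueError("all edge ends carry the same label")
    return (trace - expected) / (1 - expected)


def shuffled_assortativities(edges, labels, n_shuffles=100, seed=0):
    """Assortativity after permuting labels; the topology stays untouched."""
    rng = random.Random(seed)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    result = []
    for _ in range(n_shuffles):
        rng.shuffle(values)
        result.append(label_assortativity(edges, dict(zip(nodes, values))))
    return result


def one_sample_t(sample, reference_value):
    """t statistic of a one-sample t-test against a fixed reference value."""
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    return (mean - reference_value) / (sd / math.sqrt(len(sample)))


# Hypothetical toy network: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "ENV", 1: "ENV", 2: "ENV", 3: "REF", 4: "REF", 5: "REF"}

observed = label_assortativity(edges, labels)
null_values = shuffled_assortativities(edges, labels, n_shuffles=100)
t_stat = one_sample_t(null_values, observed)
print(f"observed={observed:.3f}, t={t_stat:.2f}, df={len(null_values) - 1}")
```

The t statistic would then be compared with a t distribution with 99 degrees of freedom. An empirical permutation p-value (the fraction of shuffles at least as extreme as the observed value) is a common alternative, although it is not the test described in the text.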

Finally, we also used bipartite networks, i.e. networks that consist of two sets of nodes in which connections are only allowed between the two sets (Poulin 2010; Layeghifard et al. 2017). These are typically used to display parasite-host relationships (Poulin 2010) or mutualistic interactions (Valverde et al. 2018). By implementing this method, we could detect co-occurrence patterns between already known parasitic ichthyosporeans, such as C. fragrantissima, and more animal hosts than had previously been described (Marshall et al. 2008). Moreover, we could detect other co-occurrence patterns involving the environmental group MAIP (marine ichthyosporeans), for which there is no morphological information whatsoever. This expands the range of ecological niches that this group may occupy. In addition, this analysis ultimately conveys the idea of environmental DNA (eDNA) as a good tool for detecting ecological interactions that may be very difficult to uncover otherwise, or that would require an extensive sampling effort and expertise to obtain the same results.
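As a minimal sketch of the bipartite idea, the example below links OTUs from two separate sets only when they co-occur in the same samples. The co-occurrence measure (Jaccard index), its threshold, the OTU names and the sample identifiers are all assumptions made for illustration; the text does not specify how co-occurrence was scored in the study.

```python
def jaccard(samples_a, samples_b):
    """Jaccard index between two sets of sample identifiers."""
    union = samples_a | samples_b
    return len(samples_a & samples_b) / len(union) if union else 0.0


def bipartite_cooccurrence(group_a, group_b, min_jaccard=0.5):
    """Build a bipartite edge list between two sets of OTUs.

    Edges are only created between a node of group_a and a node of group_b,
    never within a group, which is what makes the network bipartite.
    Returns a list of (name_a, name_b, score) tuples.
    """
    edges = []
    for name_a, samples_a in group_a.items():
        for name_b, samples_b in group_b.items():
            score = jaccard(samples_a, samples_b)
            if score >= min_jaccard:
                edges.append((name_a, name_b, score))
    return edges


# Hypothetical presence data: OTU name -> samples in which it was detected.
ichthyosporean_otus = {
    "ichthyo_OTU_1": {"S1", "S2", "S3"},
    "ichthyo_OTU_2": {"S4"},
}
animal_otus = {
    "porifera_OTU": {"S1", "S2", "S3", "S5"},
    "tardigrada_OTU": {"S2", "S3"},
    "cnidaria_OTU": {"S6"},
}

for a, b, score in bipartite_cooccurrence(ichthyosporean_otus, animal_otus):
    print(f"{a} -- {b} (Jaccard = {score:.2f})")
```

Co-occurrence on its own does not demonstrate a parasitic or mutualistic relationship; it only flags candidate associations that can later be tested.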

To conclude, networks are a robust method for evolutionary biologists as well as for ecologists, as has been demonstrated in recent decades (Junker & Schreiber 2008; Arnold & Fogarty 2009; Faust & Raes 2012; Bapteste et al. 2013; Ocaña-Pallarès et al. 2019; Banerjee et al. 2018). Networks go beyond traditional methods for biodiversity assessment by addressing complex evolutionary and ecological interactions simultaneously (Corel et al. 2016; Banerjee et al. 2018; Layeghifard et al. 2017). This is why we encourage their implementation with metabarcoding data, as we did with the Tara Oceans dataset. It is not a matter of replacing phylogenetic methods, but rather of incorporating networks as a powerful complement to address such complex biological scenarios.

4.5. Future perspective for biodiversity assessments

The main objective of the present thesis was to find novel molecular diversity of unicellular Opisthokonta and Metazoa. To this end, I analysed metabarcoding data from the 18S rRNA gene using different approaches: phylogenetic inference, phylogenetic placement, and gene similarity networks. That said, metabarcoding alone is insufficient to fully describe a new protist species, which is why it should be complemented with other techniques. In this section, I describe some of the current strategies that enhance the resolution of metabarcoding surveys, and I comment on the impact that technological advances will have on the field in the near future.

Several improvements can already be implemented. For instance, to overcome the non-universality of a single marker gene for all species, the nested strategy proposed by Pawlowski and collaborators in 2012 could be applied. The first step of this strategy consists of a general screening using the V4 region of the 18S rDNA, followed by a second step using specific barcodes for every group of interest (Pawlowski et al. 2012). Another promising strategy is the combination of multiple markers in one single sequencing step. For example, when COI and 18S were targeted using 4 primer pairs in a single Illumina run, the recovery of animal biodiversity increased by between 14% and 35% compared with a single marker or a single primer pair (Zhang et al. 2018). In general, the use of several primers or markers has led to a better picture of biodiversity (Shi et al. 2011; Debroas et al. 2017) and hence could readily be applied, even if the computational cost of analyzing the data is higher.

But it is the evolution of High Throughput Sequencing (HTS) platforms that will take metabarcoding to the next level. Third-generation sequencing technologies, such as PacBio, are already able to sequence amplicons of up to 3,000 bp, although the read quality may be lower than that obtained with Illumina (Tedersoo et al. 2018). Regarding protists, PacBio has been successfully used to study the enigmatic Diphyllatea eukaryotes, yielding almost full-length 18S sequences and thus increasing the phylogenetic resolution of this class (Orr et al. 2018). It has also been used in Fungi, sequencing the ITS and flanking rRNA genes together in the same amplicon (Schlaeppi et al. 2016; Tedersoo et al. 2018). Oxford Nanopore Technologies (ONT) is also generating great expectations, given that it does not require a PCR amplification step, thereby avoiding possible biases and artifacts, and that it sequences in real time. The small and portable MinION device from ONT connects to a computer through a regular USB port, allowing the study of biodiversity in remote areas, as well as tackling illegal trade, tracking pests, or verifying food ingredients. Regarding biodiversity, MinION has already allowed the sequencing of bacterial genomes (Loman et al. 2015), as well as bacterial and eukaryotic amplicons (Srivathsan et al. 2018), which bodes well for the future of metabarcoding and the -omics era.

A further step, aimed at gaining morphological information about the organism of interest, can be taken by applying several existing techniques. One of them is Fluorescence In Situ Hybridization (FISH). This method requires the design of a specific fluorescent probe that binds to a specific gene or chromosomal region in a fixed sample (Amann & Fuchs 2008). The sample is later processed either by flow cytometry or by epifluorescence microscopy. In protists, this technique has been used by targeting chitin to count fungal cells (Richards et al. 2015), or combined with cytoskeletal and organellar immunostaining to count ciliates (Hirst et al. 2011).

Finally, in order to fully describe new diversity, it is necessary to actually isolate the organism and, ideally, culture it for further research. One isolation technique is micromanipulation (Tong 1997), which uses a capillary to draw up the cell directly from the medium while observing it under the microscope. Another is serial dilution, which was also used by Tong (1997). Finally, Fluorescence Activated Cell Sorting (FACS), which separates a population of live cells based on fluorescent labeling, is a very straightforward technique to obtain axenic cultures. For example, given a culture containing two eukaryotic species of different size and shape together with many bacteria, one could use a red fluorescent dye for mitochondria and a green dye for vacuoles. The eukaryotes would then carry these fluorescent labels whereas the bacteria, which lack mitochondria, would not show the same signal, so the separation of these two types of cells would be relatively easy. Taking size and shape into account as additional variables, the two eukaryotic species could also be separated from each other.

Overall, the work developed in this thesis provides compelling evidence of the extensive biodiversity that remains to be discovered among the closest relatives of animals. Even though a considerable number of unicellular Opisthokonta species have been formally described, genetic diversity indicates that there must be many more, even between major lineages. Metabarcoding data provide the first step towards a morphological classification of these organisms, which will hopefully lay the groundwork for further functional studies. I believe that the acquisition of basic knowledge such as that provided here can help the community answer crucial evolutionary and ecological questions, contributing to a better comprehension of the biodiversity on Earth and of how this biodiversity has ultimately evolved.

5. CONCLUSIONS

“Contrariwise,” continued Tweedledee, “if it was so, it might be; and if it were so, it would be; but as it isn’t, it ain’t. That’s logic.”

· Alice through the looking glass (Lewis Carroll)

The main conclusions of the present work are the following:

1. All big eukaryotic clades were successfully retrieved from the Paraná River metabarcoding study. Targeting both the V4 and the V8-9 regions of the 18S rRNA gene was helpful to avoid the biased recovery of each region alone.

2. Amoebozoa was the eukaryotic clade with the largest amount of putative new diversity (52.25%) in Paraná River samples, shown by the percentage of OTUs with less than 90% BLAST identity against the PR2 reference database.

3. A great proportion of unicellular Holomycota and Holozoa OTUs from Paraná River could not be assigned to any reference sequences. In Holomycota, around a third of all OTUs had less than 90% BLAST identity against the PR2 reference database. In Holozoa, 61% of OTUs were associated with environmental uncultured sequences, showing that most of the unicellular Holozoa diversity is still undescribed.

4. Freshwater Holomycota (FRESHOL1) is a novel clade formed by 10 OTUs detected in Paraná River. It is the earliest lineage to split off, before Opisthosporidia and Fungi sensu lato, with good nodal support. Given that it was found in different size fractions, this clade could have a multi-stage life cycle with two types of cells: a smaller dispersive one (cysts, spores) and a bigger one (multicellular sporangia).

5. Freshwater Holomycota-related (FRESHOL1-related) is another novel clade, formed by 2 OTUs, that splits after FRESHOL1. The low support of this clade indicates that its position is unclear, although it is very likely to be an early diverging clade in the Holomycota tree of life.

6. OTUs associated with the newly described corallochytrean species Syssomonas multiformis accounted for 41.18% of all unicellular holozoan reads in Paraná River, making it the most abundant species. This points to the wide range of freshwater geographical locations that this species inhabits.

7. The exclusively environmental choanoflagellate clade FRESCHO3 (for freshwater choanoflagellates 3) is the most abundant group in the Paraná River dataset, accounting for 15% of the total unicellular Holozoa richness and abundance. We also retrieved 8 OTUs of FRESCHO3 in the marine samples from Tara Oceans, showing that this environmental clade is not exclusive to freshwater. This result shows that even in a well-known group such as choanoflagellates, there are many more species to be discovered, in both freshwater and marine environments.

8. In the marine Tara Oceans dataset, we detected novel diversity in Acanthoecida, branching off the subgroups Choanoflagellate H and G, the latter including one of the most abundant OTUs among holozoans (28,326 reads). These results indicate that this group of choanoflagellates can harbour abundant unknown diversity clearly divergent from described species.

9. We found a group of 21 OTUs with a total of 6,244 reads in the Tara Oceans dataset that was phylogenetically placed on nodes outside any holozoan lineage, specifically between Choanoflagellatea and Syssomonas multiformis, suggesting that it is a putative new unicellular Holozoa group. As it was retrieved from the Indian Ocean and the Mediterranean Sea in the smallest size fractions (0.8-20 μm), we tentatively named it marine small Holozoa (MASHOL).

10. A total of 15 Filasterea OTUs were recovered from the Tara Oceans expedition, which is the first time that this group has been confidently detected in any environmental survey. We also showed that the V9 hypervariable region works better than the V4 to detect filasterean species through metabarcoding data. Recovered from the South Pacific Ocean, the Red Sea and the Indian Ocean, this clade shows a wider molecular diversity in marine habitats than in freshwater.

11. According to the co-occurrence analysis, OTUs assigned to the ichthyosporean species Creolimax fragrantissima were strongly associated with several animal phyla: Entoprocta (Barentsiidae), Mollusca (Polyplacophora), Tardigrada and Porifera (Homoscleromorpha, Calcarea and Demospongiae). Therefore, metabarcoding data is a useful tool to detect ecological interactions between species when there is no morphological information whatsoever.

12. Other environmental ichthyosporean clades, such as MAIP1 (for marine ichthyosporeans 1), were significantly associated with Acoelomorpha, Arthropoda, Bryozoa, Cnidaria, Nematoda, and Chordata. This result suggests that this group may not necessarily be free-living, as had previously been reported.

13. Gene similarity networks offer a robust method to explore the molecular diversity of unicellular Holozoa using metabarcoding data. This method is especially useful if the sequences contain limited phylogenetic signal, the dataset is large, or the relationships between the species are tangled. Their nested structure, and the possibility of statistically testing hypotheses, make networks an excellent complement to phylogenetic methods.

14. Plankton harbours a large diversity of unsampled acoels within already known families, showing a greater complexity in the ecology and lifestyles of these species than previously described.

15. Deep sea Acoela clade 1 is a novel, exclusively environmental group placed as the first splitting lineage of acoels with high bootstrap support. This OTU was detected at 3,678 m depth offshore California. Deep sea Acoela clade 2 branches as the sister group to Crucimusculata, the most derived acoels. Its sequences come from a sampling of fine muds at the sea bottom (4,878 m depth) in the North Atlantic Ocean.

16. These novel clades show that deep-sea sediments are key to sampling unknown, hidden Acoelomorpha taxa that are relevant to understanding the evolution and diversification of early bilaterian animals.

17. Metabarcoding is a suitable tool to first detect new molecular diversity, paving the way towards the acquisition of genomic and morphological information. Targeting several hypervariable regions and combining different primer sets in the same amplification step can enhance the outcomes. In addition, the implementation of 4th generation sequencing platforms will definitely take metabarcoding resolution to the next level.

6. REFERENCES

Abad, D. et al., 2016. Is metabarcoding suitable for estuarine plankton monitoring? A comparative study with microscopy. Marine Biology, 163(7), p.149.

Adell, T. et al., 2004. Evolution of Metazoan Cell Junction Proteins: The Scaffold Protein MAGI and the Transmembrane Receptor Tetraspanin in the Demosponge Suberites domuncula. Journal of Molecular Evolution, 59(1), pp.41–50.

Adl, S.M. et al., 2007. Diversity, Nomenclature, and Taxonomy of Protists T. Collins & J. Sullivan, eds. Systematic Biology, 56(4), pp.684–689.

Adl, S.M. et al., 2018. Revisions to the Classification, Nomenclature, and Diversity of Eukaryotes. Journal of Eukaryotic Microbiology, 66, pp.4–119.

Aldhebiani, A.Y., 2018. Species concept and speciation. Saudi Journal of Biological Sciences, 25(3), pp.437–440.

Amacher, J. et al., 2009. Molecular approach to determine contributions of the protist community to particle flux. Deep Sea Research Part I, 56(12), pp.2206–2215.

Amann, R. & Fuchs, B.M., 2008. Single-cell identification in microbial communities by improved fluorescence in situ hybridization techniques. Nature Reviews Microbiology, 6(5), pp.339–348.

Amaral-Zettler, L.A. et al., 2009. A Method for Studying Protistan Diversity Using Massively Parallel Sequencing of V9 Hypervariable Regions of Small-Subunit Ribosomal RNA Genes G. Langsley, ed. PLoS ONE, 4(7), p.e6372.

Amaral Zettler, L.A. et al., 2002. Microbiology: eukaryotic diversity in Spain’s River of Fire. Nature, 417(6885), p.137.

Andersen, R.A., 1998. What to do with Protists? Australian Systematic Botany, 11, pp.185–201.

Andreou, D. et al., 2012. Introduced Pathogens and Native Freshwater Biodiversity: A Case Study of Sphaerothecum destruens H. Browman, ed. PLoS ONE, 7(5), p.e36998.

Arkush, K.D. et al., 2003. Observations on the Life Stages of Sphaerothecum destruens n. g., n. sp., a Mesomycetozoean Fish Pathogen Formally Referred to as the Rosette Agent. The Journal of Eukaryotic Microbiology, 50(6), pp.430–438.

Armstrong, H.A. & Brasier, M.D., 2003. Microfossils H. A. Armstrong & M. D. Brasier, eds., Blackwell.

Arndt, H. et al., 2002. Functional diversity of heterotrophic flagellates in aquatic ecosystems. In B. S. C. Leadbeater & J. C. Green, eds. Flagellates: Unity, diversity and evolution. London: Taylor and Francis, pp. 240–268.

Arnold, M. & Fogarty, N., 2009. Reticulate Evolution and Marine Organisms: The Final Frontier? International Journal of Molecular Sciences, 10(9), pp.3836–3860.

Arnot, D.E., Roper, C. & Bayoumi, R.A.L., 1993. Digital codes from hypervariable tandemly repeated DNA sequences in the circumsporozoite gene can genetically barcode isolates. Molecular and Biochemical Parasitology, 61(1), pp.15–24.

Arroyo-Begovich, A. et al., 1980. Identification of the Structural Component in the Cyst Wall of Entamoeba invadens. The Journal of Parasitology, 66(5), p.735.

Baker, G.C., Beebee, T.J.C. & Ragan, M.A., 1999. Prototheca richardsi, a pathogen of anuran larvae, is related to a clade of protistan parasites near the animal-fungal divergence. Microbiology, 145(7), pp.1777–1784.

Baldauf, S.L. & Palmer, J.D., 1993. Animals and fungi are each other’s closest relatives: congruent evidence from multiple proteins. Proceedings of the National Academy of Sciences, 90(24), pp.11558–11562.

Banerjee, S., Schlaeppi, K. & van der Heijden, M.G.A., 2018. Keystone taxa as drivers of microbiome structure and functioning. Nature Reviews Microbiology, 16(9), pp.567–576.

Bapteste, E. et al., 2013. Networks: expanding evolutionary thinking. Trends in Genetics, 29(8), pp.439–441.

Bar-On, Y.M., Phillips, R. & Milo, R., 2018. The biomass distribution on Earth. Proceedings of the National Academy of Sciences, 115(25), pp.6506–6511.

Bass, D. et al., 2018. Clarifying the Relationships between Microsporidia and Cryptomycota. Journal of Eukaryotic Microbiology, pp.1–10.

Bass, D. et al., 2007. Yeast forms dominate fungal diversity in the deep oceans. Proceedings of the Royal Society B: Biological Sciences, 274(1629), pp.3069–3077.

Bass, D. & Richards, T.A., 2011. Three reasons to re-evaluate fungal diversity ‘on Earth and in the ocean.’ Fungal Biology Reviews, 25(4), pp.159–164.

Behnke, A. et al., 2011. Depicting more accurate pictures of protistan community complexity using pyrosequencing of hypervariable SSU rRNA gene regions. Environmental Microbiology, 13(2), pp.340–349.

Berger, S.A., Krompass, D. & Stamatakis, A., 2011. Performance, Accuracy, and Web Server for Evolutionary Placement of Short Sequence Reads under Maximum Likelihood. Systematic Biology, 60(3), pp.291–302.

Bergsten, J., 2005. A review of long-branch attraction. Cladistics, 21(2), pp.163–193.

Blanc-Brude, R., Skreb, Y. & Dragesco, J., 1955. Sur la biologie de Nuclearia delicatula Cienkowski. Bull. Micr. Appl., 2, pp.113–117.

Blaxter, M., 2003. Counting angels with DNA. Nature, 421(6919), pp.122–123.

Blaxter, M. et al., 2005. Defining operational taxonomic units using DNA barcode data. Philosophical transactions of the Royal Society of London. Series B, Biological sciences, 360(1462), pp.1935–1943.

Bobola, M.S., Eckert, R.T. & Klein, A.S., 1992. Restriction fragment variation in the nuclear ribosomal DNA repeat unit within and between Picea rubens and Picea mariana. Canadian Journal of Forest Research, 22(2), pp.255–263.

Bock, C., Chatzinotas, A. & Boenigk, J., 2017. Genetic diversity in chrysophytes: Comparison of different gene markers. Fottea, 17(2), pp.209–221.

Bråte, J. et al., 2010. Freshwater Perkinsea and marine-freshwater colonizations revealed by pyrosequencing and phylogeny of environmental rDNA. The ISME Journal, 4(9), pp.1144–1153.

Brown, E.A. et al., 2015. Divergence thresholds and divergent biodiversity estimates: Can metabarcoding reliably describe zooplankton communities? Ecology and Evolution, 5(11), pp.2234–2251.

Brown, M.W., Spiegel, F.W. & Silberman, J.D., 2009. Phylogeny of the “forgotten” cellular slime mold, Fonticula alba, reveals a key evolutionary branch within Opisthokonta. Molecular Biology and Evolution, 26(12), pp.2699–2709.

Brunet, T. & King, N., 2017. The Origin of Animal Multicellularity and Cell Differentiation. Developmental Cell, 43(2), pp.124–140.

Budd, G.E. & Jensen, S., 2017. The origin of the animals and a ‘Savannah’ hypothesis for early bilaterian evolution. Biological Reviews, 92(1), pp.446–473.

Cafaro, M.J., 2005. Eccrinales (Trichomycetes) are not fungi, but a clade of protists at the early divergence of animals and fungi. Molecular Phylogenetics and Evolution, 35(1), pp.21–34.

Camacho, C. et al., 2009. BLAST+: architecture and applications. BMC Bioinformatics, 10(1), p.421.

Cannon, J.T. et al., 2016. Xenacoelomorpha is the sister group to Nephrozoa. Nature, 530(7588), pp.89–93.

Caron, D.A. & Hu, S.K., 2019. Are We Overestimating Protistan Diversity in Nature? Trends in Microbiology, 27(3), pp.197–205.

Carr, M. et al., 2017. A six-gene phylogeny provides new insights into choanoflagellate evolution. Molecular Phylogenetics and Evolution, 107, pp.166–178.

Carr, M. et al., 2008. Molecular phylogeny of choanoflagellates, the sister group to Metazoa. Proceedings of the National Academy of Sciences of the United States of America, 105(43), pp.16641–6.

Carranza, S. et al., 1996. Evidence that two types of 18S rDNA coexist in the genome of Dugesia (Schmidtea) mediterranea (Platyhelminthes, Turbellaria, Tricladida). Molecular Biology and Evolution, 13(6), pp.824–832.

Cattell, J.M., 1898. The definition of species. Science, 7(178), pp.751–752.

Cavalier-Smith, T., 1987. Eukaryotes with no mitochondria. Nature, 326(6111), pp.332–333.

Cavalier-Smith, T. & Allsopp, M.T.E.P., 1996. Corallochytrium, an enigmatic non-flagellate protozoan related to choanoflagellates. European Journal of Protistology, 32(3), pp.306–310.

Cavalier-Smith, T. & Chao, E.E.-Y., 2003. Phylogeny of Choanozoa, Apusozoa, and Other Protozoa and Early Eukaryote Megaevolution. Journal of Molecular Evolution, 56(5), pp.540–563.

Cavalier-Smith, T. & Chao, E.E., 1995. The opalozoan Apusomonas is related to the common ancestor of animals, fungi, and choanoflagellates. Proceedings of the Royal Society of London. Series B: Biological Sciences, 261(1360), pp.1–6.

Cedhagen, T. & Tendal, O.S., 1989. Evidence of test detachment in Astrorhiza limicola, and two consequential synonyms: Amoeba gigantea and Megamoebomyxa argillobia (Foraminiferida). Sarsia, 74(3), pp.195–200.

Cheung, M.K. et al., 2008. Genetic diversity of picoeukaryotes in a semienclosed harbour in the subtropical western Pacific Ocean. Aquatic Microbial Ecology, 53(3), pp.295–305.

Corel, E. et al., 2016. Network-Thinking: Graphs to Analyze Microbial Complexity and Evolution. Trends in Microbiology, 24(3), pp.224–237.

Corliss, J.O., 2002. Biodiversity and biocomplexity of the protists and an overview of their significant roles in maintenance of our biosphere. Acta Protozoologica, 41(3), pp.199–219.

Costello, M.J., May, R.M. & Stork, N.E., 2013. Can We Name Earth’s Species Before They Go Extinct? Science, 339(6118), pp.413–416.

Coyne, J.A. & Orr, A.H., 2004. Speciation. Sinauer Associates, ed., Sunderland, Massachusetts.

Cristescu, M.E., 2014. From barcoding single individuals to metabarcoding biological communities: towards an integrative approach to the study of global biodiversity. Trends in Ecology & Evolution, 29(10), pp.566–571.

Dagley, S. & Nicholson, D.E., 1970. An introduction to metabolic pathways S. Dagley & D. E. Nicholson, eds., New York: Wiley.

Dayel, M.J. et al., 2011. Cell differentiation and morphogenesis in the colony-forming choanoflagellate Salpingoeca rosetta. Developmental Biology, 357(1), pp.73–82.

Debroas, D. et al., 2017. Overview of freshwater microbial eukaryotes diversity: A first analysis of publicly available metabarcoding data. FEMS Microbiology Ecology, 93(4), pp.1–14.

De Queiroz, K., 2007. Species Concepts and Species Delimitation. Systematic Biology, 56(6), pp.879–886.

de Vargas, C. et al., 2015. Eukaryotic plankton diversity in the sunlit ocean. Science, 348(6237), pp.1261605–1261605.

del Campo, J. & Ruiz-Trillo, I., 2013. Environmental Survey Meta-analysis Reveals Hidden Diversity among Unicellular Opisthokonts. Molecular Biology and Evolution, 30(4), pp.802–805.

del Campo, J. et al., 2014. The others: our biased perspective of eukaryotic genomes. Trends in Ecology & Evolution, 29(5), pp.252–259.

del Campo, J., Mallo, D., Massana, R., de Vargas, C., Richards, T.A., et al., 2015. Diversity and distribution of unicellular opisthokonts along the European coast analysed using high-throughput sequencing. Environmental Microbiology, 17(9), pp.3195–3207.

Decelle, J. et al., 2014. Intracellular Diversity of the V4 and V9 Regions of the 18S rRNA in Marine Protists (Radiolarians) Assessed by High-Throughput Sequencing C. Lovejoy, ed. PLoS ONE, 9(8), p.e104297.

Deiner, K. et al., 2017. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities. Molecular Ecology, 26(21), pp.5872–5895.

Deline, B. et al., 2018. Evolution of metazoan morphological disparity. Proceedings of the National Academy of Sciences, 115(38), pp.E8909–E8918.

Dirren, S. et al., 2014. Ménage-à-trois: The Amoeba Nuclearia sp. from Lake Zurich with its Ecto- and Endosymbiotic Bacteria. Protist, 165(5), pp.745–758.

Dobzhansky, T., 1951. Genetics and the origin of species, New York: Columbia University Press.

Donoghue, M.J., 1985. A Critique of the Biological Species Concept and Recommendations for a Phylogenetic Alternative. The Bryologist, 88(3), pp.172–181.

Doolittle, W.F.F. & Bapteste, E., 2007. Pattern pluralism and the Tree of Life hypothesis. Proceedings of the National Academy of Sciences, 104(7), pp.2043–2049.

Dudin, O. et al., 2019. A unicellular relative of animals generates an epithelium-like cell layer by actomyosin-dependent cellularization. bioRxiv, p.563726.

Dunn, C.W. et al., 2014. Animal Phylogeny and Its Evolutionary Implications. Annual Review of Ecology, Evolution, and Systematics, 45(1), pp.371–395.

Durkin, C.A., Mock, T. & Armbrust, E.V., 2009. Chitin in Diatoms and Its Association with the Cell Wall. Eukaryotic Cell, 8(7), pp.1038–1050.

Dyková, I. et al., 2003. Nuclearia pattersoni sp. n. (Filosea), a new species of amphizoic amoeba isolated from gills of roach (Rutilus rutilus), and its rickettsial endosymbiont. Folia Parasitologica, 50(3), pp.161–170.

Edgar, R.C., 2013. UPARSE: highly accurate OTU sequences from microbial amplicon reads. Nature Methods, 10(10), pp.996–998.

Edgcomb, V. et al., 2011. Protistan microbial observatory in the Cariaco Basin, Caribbean. I. Pyrosequencing vs Sanger insights into species richness. The ISME Journal, 5(8), pp.1344–1356.

Ehrlich, P.R. & Wilson, E.O., 1991. Biodiversity Studies: Science and Policy. Science, 253(5021), pp.758–762.

Ekelund, F., Rønn, R. & Griffiths, B.S., 2001. Quantitative Estimation of Flagellate Community Structure and Diversity in Soil Samples. Protist, 152(4), pp.301–314.

Eme, L. et al., 2014. On the Age of Eukaryotes: Evaluating Evidence from Fossils and Molecular Clocks. Cold Spring Harbor Perspectives in Biology, 6(8), pp.a016139–a016139.

Erwin, D.H. et al., 2011. The Cambrian Conundrum: Early Divergence and Later Ecological Success in the Early History of Animals. Science, 334(6059), pp.1091–1097.

Faust, K. & Raes, J., 2012. Microbial interactions: from networks to models. Nature Reviews Microbiology, 10(8), pp.538–550.

Fawley, M.W. et al., 2006. Evaluating the morphospecies concept in the Selenastraceae (Chlorophyceae, Chlorophyta). Journal of Phycology, 42(1), pp.142–154.

Felsenstein, J., 2004. Inferring Phylogenies, Sunderland, Massachusetts: Sinauer Associates.

Filion, G.J., 2015. The signed Kolmogorov-Smirnov test: why it should not be used. GigaScience, 4(1), p.9.

Folmer, O. et al., 1994. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates B. Hansson, ed. Molecular Marine Biology and Biotechnology, 3(5), pp.294–299.

Forster, D. et al., 2016. Comparison of three clustering approaches for detecting novel environmental microbial diversity. PeerJ, 4, p.e1692.

Forster, D. et al., 2015. Testing ecological theories with sequence similarity networks: marine ciliates exhibit similar geographic dispersal patterns as multicellular organisms. BMC Biology, 13(16), pp.1–16.

Fu, R. & Gong, J., 2017. Single Cell Analysis Linking Ribosomal (r)DNA and rRNA Copy Numbers to Cell Size and Growth Rate Provides Insights into Molecular Protistan Ecology. Journal of Eukaryotic Microbiology, 64(6), pp.885–896.

Galagan, J.E. et al., 2005. Genomics of the fungal kingdom: Insights into eukaryotic biology. Genome Research, 15(12), pp.1620–1631.

Gasmi, S. et al., 2014. Evolutionary history of Chaetognatha inferred from molecular and morphological data: a case study for body plan simplification. Frontiers in Zoology, 11(1), pp.1–25.

Geisen, S. et al., 2015. Metatranscriptomic census of active protists in soils. The ISME Journal, 9(10), pp.2178–2190.

Gibbons, J.G. et al., 2014. Ribosomal DNA copy number is coupled with gene expression variation and mitochondrial abundance in humans. Nature Communications, 5(1), p.4850.

Giner, C.R. et al., 2016. Environmental Sequencing Provides Reasonable Estimates of the Relative Abundance of Specific Picoeukaryotes. Applied and Environmental Microbiology, 82(15), pp.4757–4766.

Glockling, S.L., Marshall, W.L. & Gleason, F.H., 2013. Phylogenetic interpretations and ecological potentials of the Mesomycetozoea (Ichthyosporea). Fungal Ecology, 6(4), pp.237–247.

Godhe, A. et al., 2008. Quantification of Diatom and Dinoflagellate Biomasses in Coastal Marine Seawater Samples by Real-Time PCR. Applied and Environmental Microbiology, 74(23), pp.7174–7182.

Govindaraju, D.R. & Cullis, C.A., 1992. Ribosomal DNA variation among populations of a Pinus rigida Mill. (pitch pine) ecosystem: I. Distribution of copy numbers. Heredity, 69(2), pp.133–140.

Gozlan, R.E. et al., 2014. Current ecological understanding of fungal-like pathogens of fish: what lies beneath? Frontiers in Microbiology, 5(62), pp.1–16.

Gozlan, R.E. et al., 2009. Identification of a rosette-like agent as Sphaerothecum destruens, a multi-host fish pathogen. International Journal for Parasitology, 39(10), pp.1055–1058.

Graff, L. V., 1882. Monographie der Turbellarien I. Rhabdocoelida., Leipzig, Germany: Verlag von Wilhelm Engelmann.

Grau-Bové, X. et al., 2017. Dynamics of genomic innovation in the unicellular ancestry of animals. eLife, 6, p.e26036.

Grazhdankin, D., 2014. Patterns of Evolution of the Ediacaran Soft-Bodied Biota. Journal of Paleontology, 88(2), pp.269–283.

Gregg, J. et al., 2012. Inability to demonstrate fish-to-fish transmission of Ichthyophonus from laboratory infected Pacific herring Clupea pallasii to naïve conspecifics. Diseases of Aquatic Organisms, 99(2), pp.139–144.

Gregory, T.R., 2005. DNA barcoding does not compete with taxonomy. Nature, 434(7037), pp.1067–1067.

Grossart, H.-P. et al., 2015. Discovery of dark matter fungi in aquatic ecosystems demands a reappraisal of the phylogeny and ecology of zoosporic fungi. Fungal Ecology, 19, pp.28–38.

6. References

Hadziavdic, K. et al., 2014. Characterization of the 18S rRNA gene for designing universal eukaryote specific primers. PLoS ONE, 9(2), p.e87624.

Van Hannen, E.J. et al., 1999. Detritus-dependent development of the microbial community in an experimental system: Qualitative analysis by denaturing gradient gel electrophoresis. Applied and Environmental Microbiology, 65(6), pp.2478–2484.

Hassett, B.T., López, J.A. & Gradinger, R., 2015. Two New Species of Marine Saprotrophic Sphaeroformids in the Mesomycetozoea Isolated From the Sub-Arctic Bering Sea. Protist, 166(3), pp.310–322.

Haszprunar, G., 2015. Review of data for a morphological look on Xenacoelomorpha (Bilateria incertae sedis). Organisms Diversity and Evolution, 16(2), pp.1–27.

Hawksworth, D.L., 2001. The magnitude of fungal diversity: the 1.5 million species estimate revisited. Mycological Research, 105(12), pp.1422–1432.

Hebert, P.D.N. et al., 2003a. Biological identifications through DNA barcodes. Proceedings of the Royal Society B: Biological Sciences, 270, pp.313–321.

Hebert, P.D.N., Ratnasingham, S. & de Waard, J.R., 2003b. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proceedings of the Royal Society of London. Series B: Biological Sciences, 270, pp.S96–S99.

Hehenberger, E. et al., 2017. Novel Predators Reshape Holozoan Phylogeny and Reveal the Presence of a Two-Component Signaling System in the Ancestor of Animals. Current Biology, 27(13), pp.2043–2050.

Hejnol, A. et al., 2009. Assessing the root of bilaterian animals with scalable phylogenomic methods. Proceedings of the Royal Society B: Biological Sciences, 276(1677), pp.4261–4270.

Hejnol, A. & Pang, K., 2016. Xenacoelomorpha’s significance for understanding bilaterian evolution. Current Opinion in Genetics and Development, 39, pp.48–54.

Hershberger, P.K. et al., 2010. Amplification and transport of an endemic fish disease by an introduced species. Biological Invasions, 12(11), pp.3665–3675.

Hertel, L.A., Loker, E.S. & Bayne, C.J., 2002. The symbiont Capsaspora owczarzaki, nov. gen. nov. sp., isolated from three strains of the pulmonate snail Biomphalaria glabrata is related to members of the Mesomycetozoea. International Journal for Parasitology, 32(9), pp.1183–1191.

Hirst, M.B., Kita, K.N. & Dawson, S.C., 2011. Uncultivated microbial eukaryotic diversity: A method to link ssu rRNA gene sequences with morphology. PLoS ONE, 6(12).

Hyman, L.H., 1940. The invertebrates: Protozoa through Ctenophora: vol. 1, New York: McGraw-Hill.

Ide, S. et al., 2010. Abundance of Ribosomal RNA Gene Copies Maintains Genome Integrity. Science, 327(5966), pp.693–696.

James-Clark, H., 1866. Conclusive proofs of the animality of the ciliate sponges, and of their affinities with the Infusoria flagellata. American Journal of Science, 2, pp.320–325.

James, T.Y. & Berbee, M.L., 2012. No jacket required - new fungal lineage defies dress code. BioEssays, 34(2), pp.94–102.

Ji, Y. et al., 2013. Reliable, verifiable and efficient monitoring of biodiversity via metabarcoding. M. Holyoak, ed. Ecology Letters, 16(10), pp.1245–1257.

Jondelius, U. et al., 2011. How the worm got its pharynx: Phylogeny, classification and Bayesian assessment of character evolution in Acoela. Systematic Biology, 60(6), pp.845–871.

Jondelius, U. et al., 2002. The Nemertodermatida are basal bilaterians and not members of the Platyhelminthes. Zoologica Scripta, 31(2), pp.201–215.

Jones, M.D.M. et al., 2011. Discovery of novel intermediate forms redefines the fungal tree of life. Nature, 474(7350), pp.200–203.

Jøstensen, J.-P. et al., 2002. Molecular-phylogenetic, structural and biochemical features of a cold-adapted, marine ichthyosporean near the animal-fungal divergence, described from in vitro cultures. European Journal of Protistology, 38(2), pp.93–104.

Junker, B.H. & Schreiber, F., 2008. Analysis of biological networks. B. H. Junker & F. Schreiber, eds., Hoboken: John Wiley & Sons.

Karpov, S.A. et al., 2014. Morphology, phylogeny, and ecology of the aphelids (Aphelidea, Opisthokonta) and proposal for the new superphylum Opisthosporidia. Frontiers in Microbiology, 5, pp.1–11.

Khan, S. et al., 2017. Biodegradation of polyester polyurethane by Aspergillus tubingensis. Environmental Pollution, 225, pp.469–480.

Ki, J.-S., 2012. Hypervariable regions (V1–V9) of the dinoflagellate 18S rRNA using a large dataset for marker considerations. Journal of Applied Phycology, 24(5), pp.1035–1043.

Kirk, P.M. et al., 2008. Dictionary of the Fungi 10th ed., Wallingford (UK): CABI Publishing.

Knauth, L.P., 2005. Temperature and salinity history of the Precambrian ocean: implications for the course of microbial evolution. Palaeogeography, Palaeoclimatology, Palaeoecology, 219, pp.53–69.

Kneipp, L.F. et al., 1998. Trichomonas vaginalis and Tritrichomonas foetus: Expression of Chitin at the Cell Surface. Experimental Parasitology, 89(2), pp.195–204.

Knoll, A.H., 2011. The Multiple Origins of Complex Multicellularity. Annual Review of Earth and Planetary Sciences, 39(1), pp.217–239.

Knoll, A.H. & Carroll, S.B., 1999. Early Animal Evolution: Emerging Views from Comparative Biology and Geology. Science, 284(5423), pp.2129–2137.

Kunin, V. et al., 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environmental Microbiology, 12(1), pp.118–123.

Kurtzman, C.P., 1994. Molecular taxonomy of the yeasts. Yeast, 10(13), pp.1727–1740.

Lafferty, K.D. et al., 2015. Infectious Diseases Affect Marine Fisheries and Aquaculture Economics. Annual Review of Marine Science, 7(1), pp.471–496.

Lambshead, P.J.D., 1993. Recent developments in marine benthic biodiversity research. Oceanis, 19(6), pp.5–24.

Lang, B.F. et al., 2002. The closest unicellular relatives of animals. Current Biology, 12(20), pp.1773–1778.

Lara, E., Moreira, D. & López-García, P., 2010. The Environmental Clade LKM11 and Rozella Form the Deepest Branching Clade of Fungi. Protist, 161(1), pp.116–121.

Larsen, B.B. et al., 2017. Inordinate Fondness Multiplied and Redistributed: the Number of Species on Earth and the New Pie of Life. The Quarterly Review of Biology, 92(3), pp.229–265.

Layeghifard, M., Hwang, D.M. & Guttman, D.S., 2017. Disentangling Interactions in the Microbiome: A Network Perspective. Trends in Microbiology, 25(3), pp.217–228.

Leadbeater, B.S.C., 2008. Choanoflagellate evolution: the morphological perspective. Protistology, 5(4), pp.256–267.

Leadbeater, B.S.C., 2015. The choanoflagellates: Evolution, biology and ecology, Cambridge: Cambridge University Press.

Leger, M.M. et al., 2018. Demystifying Eukaryote Lateral Gene Transfer. BioEssays, 40(5), p.1700242.

Lejzerowicz, F. et al., 2015. High-throughput sequencing and morphology perform equally well for benthic monitoring of marine ecosystems. Scientific Reports, 5, p.13932.

Lepère, C. et al., 2019. Diversity, spatial distribution and activity of fungi in freshwater ecosystems. PeerJ, 7, p.e6247.

Leray, M. & Knowlton, N., 2016. Censusing marine eukaryotic diversity in the twenty-first century. Philosophical Transactions of the Royal Society B: Biological Sciences, 371(1702).

Leray, M. & Knowlton, N., 2015. DNA barcoding and metabarcoding of standardized samples reveal patterns of marine benthic diversity. Proceedings of the National Academy of Sciences, 112(7), pp.2076–2081.

Lichtwardt, R.W., 1954. Three Species of Eccrinales Inhabiting the Hindguts of Millipeds, with Comments on the Eccrinids as a Group. Mycologia, 46(5), pp.564–585.

Lie, A.A.Y. et al., 2014. Investigating microbial eukaryotic diversity from a global census: Insights from a comparison of pyrotag and full-length sequences of 18S rRNA genes. Applied and Environmental Microbiology, 80(14), pp.4363–4373.

Liu, Y. et al., 2009. Phylogenomic analyses predict sistergroup relationship of nucleariids and fungi and paraphyly of zygomycetes with significant support. BMC Evolutionary Biology, 9, p.272.

Locey, K.J. & Lennon, J.T., 2016. Scaling laws predict global microbial diversity. Proceedings of the National Academy of Sciences, 113(21), pp.5970–5975.

Logares, R. et al., 2009. Infrequent marine–freshwater transitions in the microbial world. Trends in Microbiology, 17(9), pp.414–422.

Logares, R. et al., 2014. Patterns of rare and abundant marine microbial eukaryotes. Current Biology, 24(8), pp.813–821.

Lohr, J.N. et al., 2010. A parasite (Caullerya mesnili) constitutes a new member of the Ichthyosporea, a group of protists near the animal-fungi divergence. Journal of Eukaryotic Microbiology, 57(4), pp.328–336.

Loman, N.J., Quick, J. & Simpson, J.T., 2015. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nature Methods, 12(8), pp.733–735.

López-Escardó, D. et al., 2017. Evaluation of single-cell genomics to address evolutionary questions using three SAGs of the choanoflagellate Monosiga brevicollis. Scientific Reports, 7(1), p.11025.

López-Escardó, D., López-García, P., et al., 2018a. Parvularia atlantis gen. et sp. nov., a Nucleariid Filose Amoeba (Holomycota, Opisthokonta). Journal of Eukaryotic Microbiology, 65(2), pp.170–179.

López-Escardó, D., Paps, J., et al., 2018b. Metabarcoding analysis on European coastal samples reveals new molecular metazoan diversity. Scientific Reports, 8(1), p.9106.

López-García, P. et al., 2001. Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature, 409(6820), pp.603–607.

Lord, J.C., Hartzer, K.L. & Kambhampati, S., 2012. A Nuptially Transmitted Ichthyosporean Symbiont of Tenebrio molitor (Coleoptera: Tenebrionidae). Journal of Eukaryotic Microbiology, 59(3), pp.246–250.

Love, G.D. et al., 2009. Fossil steroids record the appearance of Demospongiae during the Cryogenian period. Nature, 457(7230), pp.718–721.

Mahé, F. et al., 2017. Parasites dominate hyperdiverse soil protist communities in Neotropical rainforests. Nature Ecology and Evolution, 1(4).

Mahé, F. et al., 2015. Swarm v2: highly-scalable and high-resolution amplicon clustering. PeerJ, 3(e1420), pp.1–20.

Marshall, E., 2005. Will DNA Bar Codes Breathe Life Into Classification? Science, 307(5712), pp.1037–1037.

Marshall, W.L. et al., 2008. Multiple Isolations of a Culturable, Motile Ichthyosporean (Mesomycetozoa, Opisthokonta), Creolimax fragrantissima n. gen., n. sp., from Marine Invertebrate Digestive Tracts. Protist, 159(3), pp.415–433.

Marshall, W.L. & Berbee, M.L., 2011. Facing Unknowns: Living Cultures (Pirum gemmata gen. nov., sp. nov., and Abeoforma whisleri, gen. nov., sp. nov.) from Invertebrate Digestive Tracts Represent an Undescribed Clade within the Unicellular Opisthokont Lineage Ichthyosporea (Mesomycetozoea). Protist, 162(1), pp.33–57.

Martin, W.F., 2017. Too Much Eukaryote LGT. BioEssays, 39(12), p.1700115.

Massana, R. et al., 2015. Marine protist diversity in European coastal waters and sediments as revealed by high-throughput sequencing. Environmental Microbiology, 17, pp.4035–4049.

Massana, R. et al., 2011. Sequence diversity and novelty of natural assemblages of picoeukaryotes from the Indian Ocean. ISME Journal, 5(2), pp.184–195.

Matsen, F.A., Kodner, R.B. & Armbrust, E.V., 2010. pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree. BMC Bioinformatics, 11(1), p.538.

May, R.M., 1992. How many species inhabit the Earth? Scientific American, 267, pp.18–24.

Mayr, E., 1982. Speciation and Macroevolution. Evolution, 36(6), p.1119.

Mayr, E., 1942. Systematics and the Origin of Species from the Viewpoint of a Zoologist, Harvard University Press.

Mélida, H. et al., 2013. Analyses of Extracellular Carbohydrates in Oomycetes Unveil the Existence of Three Different Cell Wall Types. Eukaryotic Cell, 12(2), pp.194–203.

Mendoza, L., Taylor, J.W. & Ajello, L., 2002. The Class Mesomycetozoea: A Heterogeneous Group of Microorganisms at the Animal-Fungal Boundary. Annual Review of Microbiology, 56(1), pp.315–344.

Meyers, L.A. et al., 2003. Applying Network Theory to Epidemics: Control Measures for Mycoplasma pneumoniae Outbreaks. Emerging Infectious Diseases, 9(2), pp.204–210.

Mikrjukov, K.A. & Mylnikov, A.P., 2001. A study of the fine structure and the mitosis of a lamellicristate amoeba, Micronuclearia podoventralis gen. et sp. nov. (Nucleariidae, Rotosphaerida). European Journal of Protistology, 37(1), pp.15–24.

Monchy, S. et al., 2011. Exploring and quantifying fungal diversity in freshwater lake ecosystems using rDNA cloning/sequencing and SSU tag pyrosequencing. Environmental Microbiology, 13(6), pp.1433–1453.

Moon-van der Staay, S.Y., de Wachter, R. & Vaulot, D., 2001. Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature, 409(6820), pp.607–610.

Mora, C. et al., 2011. How Many Species Are There on Earth and in the Ocean? G. M. Mace, ed. PLoS Biology, 9(8), p.e1001127.

Mora, C., Rollo, A. & Tittensor, D.P., 2013. Comment on “Can we name earth’s species before they go extinct?” Science, 341(6143), pp.4–5.

Nagahama, T. et al., 2011. Molecular evidence that deep-branching fungi are major fungal components in deep-sea methane cold-seep sediments. Environmental Microbiology, 13(8), pp.2359–2370.

Najle, S.R. et al., 2016. Sterol metabolism in the filasterean Capsaspora owczarzaki has features that resemble both fungi and animals. Open Biology, 6, p.160029.

Narbonne, G.M., 2005. The Ediacara Biota: Neoproterozoic Origin of Animals and Their Ecosystems. Annual Review of Earth and Planetary Sciences, 33(1), pp.421–442.

Nebel, M. et al., 2011. Delimiting operational taxonomic units for assessing ciliate environmental diversity using small-subunit rRNA gene sequences. Environmental Microbiology Reports, 3(2), pp.154–158.

Newmaster, S.G., Fazekas, A.J. & Ragupathy, S., 2006. DNA barcoding in land plants: evaluation of rbcL in a multigene tiered approach. Canadian Journal of Botany, 84(3), pp.335–341.

Nguyen, T.A. et al., 2017. Innovation and constraint leading to complex multicellularity in the Ascomycota. Nature Communications, 8, p.14444.

Nickrent, D.L. & Sargent, M.L., 1991. An overview of the secondary structure of the V4 region of eukaryotic small-subunit ribosomal RNA. Nucleic Acids Research, 19(2), pp.227–235.

Nitsche, F. et al., 2007. Deep sea records of Choanoflagellates with a description of two new species. Acta Protozoologica, 46, pp.99–106.

Nitsche, F., 2014. Stephanoeca arndti spec. nov. - First cultivation success including molecular and autecological data from a freshwater acanthoecid choanoflagellate from Samoa. European Journal of Protistology, 50(4), pp.412–421.

Nitsche, F., Thomsen, H.A. & Richter, D.J., 2016. Bridging the gap between morphological species and molecular barcodes − exemplified by loricate choanoflagellates. European Journal of Protistology, 57, pp.26–37.

Ocaña-Pallarès, E. et al., 2019. Reticulate evolution in eukaryotes: Origin and evolution of the nitrate assimilation pathway. PLOS Genetics, 15(2), p.e1007986.

Okamoto, N. et al., 1985. Life history and morphology of Ichthyophonus hoferi in vitro. Fish Pathology, 20(2/3), pp.273–285.

Ondracka, A., Dudin, O. & Ruiz-Trillo, I., 2018. Decoupling of Nuclear Division Cycles and Cell Size during the Coenocytic Growth of the Ichthyosporean Sphaeroforma arctica. Current Biology, 28, pp.1–6.

Orr, R.J.S. et al., 2018. Enigmatic Diphyllatea eukaryotes: culturing and targeted PacBio RS amplicon sequencing reveals a higher order taxonomic diversity and global distribution. BMC Evolutionary Biology, 18, p.115.

Owczarzak, A., Stibbs, H.H. & Bayne, C.J., 1980. The destruction of Schistosoma mansoni mother sporocysts in vitro by amoebae isolated from Biomphalaria glabrata: an ultrastructural study. Journal of Invertebrate Pathology, 35, pp.26–33.

Paps, J. & Ruiz-Trillo, I., 2010. Animals and Their Unicellular Ancestors. In Encyclopedia of Life Sciences. Chichester, UK: John Wiley & Sons, Ltd.

Paredes, S. et al., 2011. Ribosomal DNA Deletions Modulate Genome-Wide Gene Expression: “rDNA–Sensitive” Genes and Natural Variation. PLoS Genetics, 7(4), p.e1001376.

Parra-Acero, H. et al., 2018. Transfection of Capsaspora owczarzaki, a close unicellular relative of animals. Development, 145(10), p.dev162107.

Pascolini, R. et al., 2003. Parasitism by Dermocystidium ranae in a population of Rana esculenta complex in Central Italy and description of Amphibiocystidium n. gen. Diseases of Aquatic Organisms, 56(1), pp.65–74.

Paterson, R.R.M., 2005. Fungus or bacterium and vice versa? Microbiology, 151(3), pp.641–641.

Patterson, D.J., 1984. The Genus Nuclearia (Sarcodina, Filosea): Species Composition and Characteristics of the Taxa. Archiv für Protistenkunde, 128(1–2), pp.127–139.

Paul, M., 2012. Acanthocorbis mongolica nov. spec. - Description of the first freshwater loricate choanoflagellate (Acanthoecida) from a Mongolian lake. European Journal of Protistology, 48(1), pp.1–8.

Pawlowski, J. et al., 2011. Eukaryotic richness in the abyss: Insights from pyrotag sequencing. PLoS ONE, 6(4), p.e18169.

Pawlowski, J. et al., 2012. CBOL Protist Working Group: Barcoding Eukaryotic Richness beyond the Animal, Plant, and Fungal Kingdoms. PLoS Biology, 10(11), p.e1001419.

Pawlowski, J. et al., 2016. Protist metabarcoding and environmental biomonitoring: Time for change. European Journal of Protistology, 55, pp.12–25.

Pawlowski, J. et al., 2018. The future of biotic indices in the ecogenomic era: Integrating (e)DNA metabarcoding in biological assessment of aquatic ecosystems. Science of The Total Environment, 637–638, pp.1295–1310.

Pekkarinen, M. et al., 2003. Phylogenetic Position and Ultrastructure of Two Dermocystidium Species (Ichthyosporea) from the Common Perch (Perca fluviatilis). Acta Protozoologica, 42(4), pp.287–307.

Pereira, C.N. et al., 2005. The Pathogen of Frogs Amphibiocystidium ranae Is a Member of the Order Dermocystida in the Class Mesomycetozoea. Journal of Clinical Microbiology, 43(1), pp.192–198.

Pernice, M.C. et al., 2013. General Patterns of Diversity in Major Marine Microeukaryote Lineages. PLoS ONE, 8(2).

Pettitt, M.E. et al., 2002. The hydrodynamics of filter feeding in choanoflagellates. European Journal of Protistology, 38(4), pp.313–332.

Philippe, H. et al., 2011. Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature, 470(7333), pp.255–258.

Pilosof, S. et al., 2017. The multilayer nature of ecological networks. Nature Ecology & Evolution, 1(4), p.0101.

Piredda, R. et al., 2017. Diversity and temporal patterns of planktonic protist assemblages at a Mediterranean Long Term Ecological Research site. FEMS Microbiology Ecology, 93(1), p.fiw200.

Poulin, R., 2010. Network analysis shining light on parasite ecology and diversity. Trends in Parasitology, 26(10), pp.492–498.

Prokopowich, C.D., Gregory, T.R. & Crease, T.J., 2003. The correlation between rDNA copy number and genome size in eukaryotes. Genome, 46(1), pp.48–50.

Proulx, S.R., Promislow, D.E.L. & Phillips, P.C., 2005. Network thinking in ecology and evolution. Trends in Ecology and Evolution, 20(6), pp.345–353.

Raghukumar, S., 1987. Occurrence of the Thraustochytrid, Corallochytrium limacisporum gen. et sp. nov. in the Coral Reef Lagoons of the Lakshadweep Islands in the Arabian Sea. Botanica Marina, 30(1), pp.83–90.

Richards, G.S. & Degnan, B.M., 2009. The Dawn of Developmental Signaling in the Metazoa. Cold Spring Harbor Symposia on Quantitative Biology, 74, pp.81–90.

Richards, T.A. et al., 2012. Marine Fungi: Their Ecology and Molecular Diversity. Annual Review of Marine Science, 4, pp.495–522.

Richards, T.A. et al., 2015. Molecular diversity and distribution of marine fungi across 130 European environmental samples. Proceedings of the Royal Society B: Biological Sciences, 282(1819), p.20152243.

Richards, T.A., Leonard, G. & Wideman, J.G., 2017. What Defines the “Kingdom” Fungi? Microbiology Spectrum, 5(3), p.FUNK-0044-2017.

Richter, D.J. et al., 2018. Gene family innovation, conservation and loss on the animal stem lineage. eLife, 7, pp.1–43.

Richter, D.J. & King, N., 2013. The Genomic and Cellular Foundations of Animal Origins. Annual Review of Genetics, 47, pp.509–537.

Richter, D.J. & Nitsche, F., 2017. Choanoflagellatea. In J. M. Archibald, A. G. B. Simpson, & C. H. Slamovits, eds. Handbook of the Protists. Cham: Springer, pp.1–19.

Rodríguez-Martínez, R. et al., 2009. Distribution of the uncultured protist MAST-4 in the Indian Ocean, Drake Passage and Mediterranean Sea assessed by real-time quantitative PCR. Environmental Microbiology, 11(2), pp.397–408.

Romari, K. & Vaulot, D., 2004. Composition and temporal variability of picoeukaryote communities at a coastal site of the English Channel from 18S rDNA sequences. Limnology and Oceanography, 49(3), pp.784–798.

Roskov, Y. et al., 2014. Species 2000 & ITIS Catalogue of Life, 2014 Annual Checklist. Species 2000.

Rouse, G.W. et al., 2016. New deep-sea species of Xenoturbella and the position of Xenacoelomorpha. Nature, 530(7588), pp.94–97.

Ruiz-Trillo, I. et al., 2002. A phylogenetic analysis of myosin heavy chain type II sequences corroborates that Acoela and Nemertodermatida are basal bilaterians. Proceedings of the National Academy of Sciences of the United States of America, 99(17), pp.11246–11251.

Ruiz-Trillo, I. et al., 2008. A Phylogenomic Investigation into the Origin of Metazoa. Molecular Biology and Evolution, 25(4), pp.664–672.

Ruiz-Trillo, I. et al., 1999. Acoel flatworms: earliest extant bilaterian Metazoans, not members of Platyhelminthes. Science, 283(5409), pp.1919–1923.

Ruiz-Trillo, I. et al., 2004. Capsaspora owczarzaki is an independent opisthokont lineage. Current Biology, 14(22), pp.R946–R947.

Ruiz-Trillo, I. et al., 2006. Insights into the Evolutionary Origin and Genome Architecture of the Unicellular Opisthokonts Capsaspora owczarzaki and Sphaeroforma arctica. The Journal of Eukaryotic Microbiology, 53(5), pp.379–384.

Ruiz-Trillo, I. et al., 2007. The origins of multicellularity: a multi-taxon genome initiative. Trends in Genetics, 23(3), pp.113–118.

Schlaeppi, K. et al., 2016. High-resolution community profiling of arbuscular mycorrhizal fungi. New Phytologist, 212(3), pp.780–791.

Schoch, C.L. et al., 2012. Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi. Proceedings of the National Academy of Sciences, 109(16), pp.6241–6246.

Schwartzman, D., 2002. Temperature, Biogenesis and Biospheric Self-Organization. Columbia University Press.

Sebé-Pedrós, A. et al., 2010. Ancient origin of the integrin-mediated adhesion and signaling machinery. Proceedings of the National Academy of Sciences, 107(22), pp.10142–10147.

Sebé-Pedrós, A. et al., 2011. Unexpected repertoire of metazoan transcription factors in the unicellular holozoan Capsaspora owczarzaki. Molecular Biology and Evolution, 28(3), pp.1241–1254.

Sebé-Pedrós, A. et al., 2013. Regulated aggregative multicellularity in a close unicellular relative of metazoa. eLife, 2013(2), pp.1–20.

Sebé-Pedrós, A. et al., 2015. The Dynamic Regulatory Genome of Capsaspora and the Origin of Animal Multicellularity. Cell, 165, pp.1224–1237.

Sebé-Pedrós, A., Degnan, B.M. & Ruiz-Trillo, I., 2017. The origin of Metazoa: a unicellular perspective. Nature Reviews Genetics, 18(8), pp.498–512.

Seifert, K.A., 2009. Progress towards DNA barcoding of fungi. Molecular Ecology Resources, 9(Suppl. 1), pp.83–89.

Shalchian-Tabrizi, K. et al., 2008. Multigene Phylogeny of Choanozoa and the Origin of Animals. R. Aramayo, ed. PLoS ONE, 3(5), p.e2098.

Sherr, E.B. & Sherr, B.F., 2002. Significance of predation by protists in aquatic microbial food webs. Antonie van Leeuwenhoek, 81, pp.293–308.

Shi, X.L. et al., 2011. Plastid 16S rRNA gene diversity among eukaryotic picophytoplankton sorted by flow cytometry from the South Pacific Ocean. PLoS ONE, 6(4).

Simon, M. et al., 2015. Complex communities of small protists and unexpected occurrence of typical marine lineages in shallow freshwater systems. Environmental Microbiology, 17(10), pp.3610–3627.

Simpson, G.G., 1951. The species concept. Evolution, 5(4), pp.285–298.

Smith, J.E., 2009. The ecology and evolution of microsporidian parasites. Parasitology, 136(14), p.1901.

Sogin, M. et al., 1989. Phylogenetic meaning of the kingdom concept: an unusual ribosomal RNA from Giardia lamblia. Science, 243(4887), pp.75–77.

Sogin, M.L. et al., 2006. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proceedings of the National Academy of Sciences, 103(32), pp.12115–12120.

Sokal, R.R. & Sneath, P.H.A., 1963. Principles of numerical taxonomy 1st ed., San Francisco, CA: W. H. Freeman.

Spatafora, J.W. et al., 2016. A phylum-level phylogenetic classification of zygomycete fungi based on genome-scale data. Mycologia, 108(5), pp.1028–1046.

Srivastava, M. et al., 2014. Whole-Body Acoel Regeneration Is Controlled by Wnt and Bmp-Admp Signaling. Current Biology, 24(10), pp.1107–1113.

Srivathsan, A. et al., 2018. A MinION™-based pipeline for fast and cost-effective DNA barcoding. Molecular Ecology Resources, 18(5), pp.1035–1049.

Stamatakis, A., 2006. RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22(21), pp.2688–2690.

Steenkamp, E.T., Wright, J. & Baldauf, S.L., 2006. The protistan origins of animals and fungi. Molecular Biology and Evolution, 23(1), pp.93–106.

Stefanni, S. et al., 2018. Multi-marker metabarcoding approach to study mesozooplankton at basin scale. Scientific Reports, 8(1), p.12085.

Stibbs, H.H. et al., 1979. Schistosome sporocyst-killing amoebae isolated from Biomphalaria glabrata. Journal of Invertebrate Pathology, 33, pp.159–170.

Stoeck, T. et al., 2009. Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities. BMC Biology, 7(i), p.72.

Stoeck, T. et al., 2010. Multiple marker parallel tag environmental DNA sequencing reveals a highly complex eukaryotic community in marine anoxic water. Molecular Ecology, 19(Suppl. 1), pp.21–31.

Suga, H. et al., 2013. The Capsaspora genome reveals a complex unicellular prehistory of animals. Nature Communications, 4, pp.1–9.

Sukhanova, I.N., 2001. Choanoflagellida on the southeastern Bering Sea shelf. Oceanology, 41(2), pp.227–231.

Sumathi, J.C. et al., 2006. Molecular Evidence of Fungal Signatures in the Marine Protist Corallochytrium limacisporum and its Implications in the Evolution of Animals and Fungi. Protist, 157(4), pp.363–376.

Tanabe, A.S. et al., 2016. Comparative study of the validity of three regions of the 18S-rRNA gene for massively parallel sequencing-based monitoring of the planktonic eukaryote community. Molecular Ecology Resources, 16(2), pp.402–414.

Tang, C.Q. et al., 2012. The widely used small subunit 18S rDNA molecule greatly underestimates true diversity in biodiversity surveys of the meiofauna. Proceedings of the National Academy of Sciences, 109(40), pp.16208–16212.

Tedersoo, L., Tooming-Klunderud, A. & Anslan, S., 2018. PacBio metabarcoding of Fungi and other eukaryotes: errors, biases and perspectives. New Phytologist, 217(3), pp.1370–1385.

Thomsen, H.A., Garrison, D.L. & Kosman, C., 1997. Choanoflagellates (Acanthoecidae, Choanoflagellida) from the Weddell Sea, Antarctica, taxonomy and community structure with particular emphasis on the ice biota; with preliminary remarks on Choanoflagellates from Arctic sea ice (Northeast Water Polynya, G. Archiv für Protistenkunde, 148(1–2), pp.77–114.

Tian, F. et al., 2009. Bacterial, archaeal and eukaryotic diversity in Arctic sediment as revealed by 16S rRNA and 18S rRNA gene clone libraries analysis. Polar Biology, 32(1), pp.93–103.

Tong, S.M., 1997. Heterotrophic flagellates and other protists from Southampton Water, U.K. Ophelia, 47(2), pp.71–131.

Torruella, G. et al., 2012. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Molecular Biology and Evolution, 29(2), pp.531–544.

Torruella, G. et al., 2015. Phylogenomics Reveals Convergent Evolution of Lifestyles in Close Relatives of Animals and Fungi. Current Biology, 25, pp.1–7.

Tragin, M., Zingone, A. & Vaulot, D., 2018. Comparison of coastal phytoplankton composition estimated from the V4 and V9 regions of the 18S rRNA gene with a focus on photosynthetic groups and especially Chlorophyta. Environmental Microbiology, 20(2), pp.506–520.

Trebitz, A.S. et al., 2015. Potential for DNA-based identification of Great Lakes fauna: match and mismatch between taxa inventories and DNA barcode libraries. Scientific Reports, 5(1), p.12162.

Valverde, S. et al., 2018. The architecture of mutualistic networks as an evolutionary spandrel. Nature Ecology and Evolution, 2(1), pp.94–99.

Vávra, J. & Lukeš, J., 2013. Microsporidia and ‘The Art of Living Together.’ In Advances in Parasitology. Elsevier, pp. 253–319.

Vilela, R. & Mendoza, L., 2012. The taxonomy and phylogenetics of the human and animal pathogen Rhinosporidium seeberi: A critical review. Revista Iberoamericana de Micología, 29(4), pp.185–199.

Vossbrinck, C. et al., 1987. Ribosomal RNA sequence suggests microsporidia are extremely ancient eukaryotes. Nature, 326(6111), pp.411–414.

Weber, A.A.T. & Pawlowski, J., 2013. Can Abundance of Protists Be Inferred from Sequence Data: A Case Study of Foraminifera. PLoS ONE, 8(2), pp.1–8.

Weber, F. et al., 2012. Unveiling trophic functions of uncultured protist taxa by incubation experiments in the brackish Baltic Sea. PLoS ONE, 7(7), p.e41970.

Weete, J.D., Abril, M. & Blackwell, M., 2010. Phylogenetic distribution of fungal sterols. PLoS ONE, 5(5), pp.3–8.

WHO, 2017. World Malaria Report 2017, Geneva.

Wiley, E.O. & Lieberman, B.S., 2011. Phylogenetics: Theory and Practice of Phylogenetic Systematics, Hoboken, NJ, USA: John Wiley & Sons, Inc.

Willis, K.J. ed., 2018. State of the World’s Fungi 2018, Kew Gardens, UK.

Wilson, K.H., 1995. Molecular Biology as a Tool for Taxonomy. Clinical Infectious Diseases, 20(Supplement_2), pp.S117–S121.

Woese, C.R. & Fox, G.E., 1977. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proceedings of the National Academy of Sciences of the United States of America, 74(11), pp.5088–5090.

Woese, C.R., Kandler, O. & Wheelis, M.L., 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proceedings of the National Academy of Sciences, 87(12), pp.4576–4579.

Worley, A.C., Raper, K.B. & Hohl, M., 1979. Fonticula alba: a new cellular slime mold (Acrasiomycetes). Mycologia, 71(4), pp.746–760.

Wuyts, J. et al., 2000. Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA. Nucleic Acids Research, 28(23), pp.4698–4708.

Wylezich, C. et al., 2012. Ecologically relevant choanoflagellates collected from hypoxic water masses of the Baltic Sea have untypical mitochondrial cristae. BMC Microbiology, 12(1), p.271.

Yahr, R., Schoch, C.L. & Dentinger, B.T.M., 2016. Scaling up discovery of hidden diversity in fungi: impacts of barcoding approaches. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 371(1702), pp.1–11.

Yoshida, M., Nakayama, T. & Inouye, I., 2009. Nuclearia thermophila sp. nov. (Nucleariidae), a new nucleariid species isolated from Yunoko Lake in Nikko (Japan). European Journal of Protistology, 45(2), pp.147–155.

Zhang, G.K. et al., 2018. Metabarcoding using multiplexed markers increases species detection in complex zooplankton communities. Evolutionary Applications, 11(10), pp.1901–1914.

Zhang, Z.-Q., 2013. Animal biodiversity: An update of classification and diversity in 2013. Zootaxa, 3703(1), pp.5–11.

Zhu, C. et al., 2018. Fusion DB: Assessing microbial diversity and environmental preferences via functional similarity networks. Nucleic Acids Research, 46(D1), pp.D535–D541.

Zhu, F. et al., 2005. Mapping of picoeucaryotes in marine ecosystems with quantitative PCR of the 18S rRNA gene. FEMS Microbiology Ecology, 52(1), pp.79–92.

Zuendorf, A. et al., 2006. Diversity estimates of microeukaryotes below the chemocline of the anoxic Mariager Fjord, Denmark. FEMS Microbiology Ecology, 58(3), pp.476–491.


7. APPENDIX


7.1. Comparison between USEARCH and Swarm clustering methods

As discussed in section 4.3.1 (Cut-offs and clusterization methods for OTU delineation), the choice of threshold and clustering method has an enormous impact on the picture of the community obtained. To check these possible differences, we applied two clustering strategies to our eukaryotic sampling from the Paraná River (result 3.1).

From the same raw reads, we performed identical steps: sequencing error correction, assembly, quality filtering, dereplication, and singleton elimination (Table 4). Then, we clustered the reads using two methods: (i) the UPARSE workflow from USEARCH v8.1.1861 (Edgar 2013), with a fixed threshold of 97% similarity, or (ii) Swarm clustering (Mahé et al. 2015), with a local difference of one nucleotide (d=1). This yielded 2.3 times more Swarm OTUs than USEARCH OTUs, raising the possibility that the two methods would recover different eukaryotic communities.
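The contrast between a fixed global threshold and a local d=1 linkage can be illustrated with a toy sketch. This is not the UPARSE or Swarm algorithm: both functions below are simplified stand-ins written for illustration, they assume equal-length sequences compared by Hamming distance, and all names (`greedy_threshold_clusters`, `single_linkage_clusters`, `mutate`) are hypothetical. The real tools additionally rely on alignments, abundance information and further refinement steps.

```python
from itertools import combinations

def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def greedy_threshold_clusters(seqs, min_identity_pct=97):
    """Greedy centroid clustering: a sequence joins the first centroid it
    matches at >= min_identity_pct, otherwise it founds a new cluster."""
    centroids, clusters = [], []
    for s in seqs:  # assumed to be sorted by decreasing abundance
        for i, c in enumerate(centroids):
            # Integer form of: identity >= min_identity_pct
            if hamming(s, c) * 100 <= (100 - min_identity_pct) * len(s):
                clusters[i].append(s)
                break
        else:
            centroids.append(s)
            clusters.append([s])
    return clusters

def single_linkage_clusters(seqs, d=1):
    """Link every pair of sequences differing by <= d positions and return
    the connected components (the linking idea behind d=1 clustering)."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= d:
            parent[find(i)] = find(j)
    groups = {}
    for i, s in enumerate(seqs):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())

def mutate(seq, positions):
    """Substitute the base at each given position (hypothetical helper)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)

# Four made-up 100-nt sequences: a reference plus variants with 1, 2 and 3 changes.
base = "ACGT" * 25
seqs = [base, mutate(base, [0]), mutate(base, [10, 20]), mutate(base, [30, 40, 50])]

fixed = greedy_threshold_clusters(seqs)       # all variants are >= 97% identical
linked = single_linkage_clusters(seqs, d=1)   # only the 1-change variant links
print(len(fixed), len(linked))                # prints: 1 3
```

In this constructed example the 97% threshold merges all four sequences into one cluster, whereas the d=1 linkage keeps three, which is the direction of the difference reported above. How large the difference is on real data depends on the dataset.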

The data showed, however, that in general terms the distribution and proportion of eukaryotic richness and novelty did not vary substantially (Figure 15A). The most noticeable difference was observed in Archaeplastida, in both the V4 and V8-9 regions (Figure 15B): most of the Archaeplastida diversity could not be confidently assigned with USEARCH OTUs (~60% of the OTUs had <90% BLAST similarity against the PR2 reference database). In contrast, only 2.44% (V4) and 12.15% (V8-9) of the Archaeplastida Swarm OTUs had <90% similarity. We can venture that Swarm OTUs better reflected the intraspecific variability of this group, allowing a more accurate assignment of the OTUs to their closest reference sequences in the database.
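The percentages quoted above are shares of OTUs whose best BLAST hit falls below a 90% identity cut-off. A minimal sketch of that calculation follows; the function name `percent_below` and the identity values are made up for illustration and are not data from this study.

```python
def percent_below(identities, cutoff=90.0):
    """Percentage of OTUs whose best-hit identity is below the cutoff."""
    if not identities:
        raise ValueError("need at least one identity value")
    n_below = sum(1 for value in identities if value < cutoff)
    return 100.0 * n_below / len(identities)

# Hypothetical best-hit identities (%) for five OTUs of one clade.
toy_identities = [99.1, 85.0, 92.3, 88.7, 97.5]
share = percent_below(toy_identities)
print(f"{share:.1f}% of OTUs below 90% identity")  # prints: 40.0% of OTUs below 90% identity
```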


Step                                             USEARCH (97% similarity)   Swarm (d=1)
Start                                            3,328,000                  3,328,000
Sequencing error correction (BayesHammer)        3,300,740                  3,300,740
Assembly of forward and reverse reads (PEAR)     3,251,360                  3,251,360
Quality filtering (USEARCH)                      2,413,820                  2,413,820
rDNA signature (HMM)                             2,413,640                  2,413,640
Dereplication                                    786,299                    786,299
Sorting and singleton elimination                244,041                    244,041
OTU clustering                                   8,747                      43,573
Chimera checking                                 8,367                      19,627
Final OTUs                                       8,367                      19,627
  V4                                             4,316                      10,522
  V8-9                                           4,051                      9,105

Table 4. Comparison between USEARCH and Swarm clustering pipelines applied to the same raw dataset from the Paraná River sampling (result 3.1). All steps before OTU clustering are common to both pipelines.
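The figures in Table 4 can be cross-checked with a few lines of arithmetic. The sketch below only re-uses the counts from the table; the variable names are illustrative. Counts after dereplication presumably refer to unique sequences rather than reads, so the percentages relative to the starting number are a rough guide only.

```python
# Counts shared by both pipelines (copied from Table 4).
shared_steps = [
    ("Start", 3_328_000),
    ("Sequencing error correction", 3_300_740),
    ("Assembly of forward and reverse reads", 3_251_360),
    ("Quality filtering", 2_413_820),
    ("rDNA signature", 2_413_640),
    ("Dereplication", 786_299),
    ("Sorting and singleton elimination", 244_041),
]

# Method-specific OTU counts (copied from Table 4).
otus = {
    "USEARCH": {"clustered": 8_747, "final": 8_367, "V4": 4_316, "V8-9": 4_051},
    "Swarm": {"clustered": 43_573, "final": 19_627, "V4": 10_522, "V8-9": 9_105},
}

start = shared_steps[0][1]
for name, count in shared_steps:
    print(f"{name:40s} {count:>10,d}  {100 * count / start:6.2f}% of start")

for method, n in otus.items():
    # The per-region counts should add up to the final number of OTUs.
    assert n["V4"] + n["V8-9"] == n["final"]
    removed = n["clustered"] - n["final"]
    print(f"{method}: {removed:,d} OTUs removed at chimera checking "
          f"({100 * removed / n['clustered']:.1f}% of clustered OTUs)")

ratio = otus["Swarm"]["final"] / otus["USEARCH"]["final"]
print(f"Swarm/USEARCH final OTU ratio: {ratio:.2f}")
```

The ratio of final OTUs computed this way agrees with the roughly 2.3-fold difference reported in the text.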

These results suggest that the clustering method may not be that relevant when addressing eukaryotic diversity at broad taxonomic levels, such as the major eukaryotic clades. However, if the research requires a detailed inspection of the diversity, for example at the species level, then a careful selection of the method is necessary. In that case, I would be more inclined to use Swarm OTUs, because they distinguish sequences at a very fine level. Swarm OTUs usually differ very little from one another in sequence, which allows one to select the threshold that captures all the 18S variability displayed by the studied group.


Figure 15. Comparison between USEARCH 97% OTUs and Swarm OTUs in the Paraná River sampling. (A) Eukaryotic richness (number of OTUs) in each major eukaryotic clade. (B) Distribution of percentages of BLAST identity against the PR2 reference database. Although there were more than twice as many Swarm OTUs as USEARCH OTUs, the general picture of eukaryotic diversity remained remarkably similar.


8. ACKNOWLEDGEMENTS

Agradecimientos

“If somebody offers you an amazing opportunity but you are not sure you can do it, say yes – then learn how to do it later”

· Richard Branson


My arrival at the MCG was not a conventional one. Iñaki, you will remember that first email… Your reply was the biggest (and most elegant) put-down I had ever received. An “I don’t quite see you fitting in here” that only made me more eager to show you that I did fit in. And I told him exactly that. His answer: “now this I like better!!” ?!? I was left speechless. And so, after the interview, I started in the summer of 2015 with a lab meeting on Parri’s integrins.

Iñaki, the boss, a thousand thanks for giving me the opportunity and for seeing something “beyond” the CV. I feel that you always knew me better than I knew myself. It is as if you had grasped my nature in an instant and given me wings to develop fully. And that is a real luxury. Thank you for giving me the freedom to carry out my projects the way I saw fit, for making me grow at every level, and for not firing me when I killed Colp for the third time. I have always felt very comfortable talking with you, and I am aware that this is something unique. Thank you for teaching me how to communicate, for your always constructive criticism, and for being there unconditionally, in such a close and human way. For me you have been an example of a good leader, and if I ever lead a team in the future, believe me, I will keep you very much in mind.

And what a family the rest of the MCG is. Thanks to Bovet for his patience in explaining basic bash to me, and above all to David Davis, for explaining everything that at first sounded like Greek to me and is now my daily routine. And to my Parra Macarra, thank you for warning me that the PhD was going to be damn hard haha, it did me good. You are a 10, and it has been a pleasure getting to know you over these years. And to the great Merimori, not only for making my thesis possible at a practical level, but for being such a fantastic woman. It is not easy to find so much mischief, kindness and humour in one person. It was a pleasure to have instructed you in the art of combining colours, and to have had an absolute blast with you making thesis videos.

To the seniors, thank you: Sebahtián! for trusting my judgement and for being such good vibes, che. It was a real pleasure to work with you on the Paraná paper :) And looking after that mini Sebas called Mirko has been great. To H. Casacuberta, for being an example of a woman leader and for paving the way for those of us who come next. Andrejjjjj, thanks mate for all your useful and short advice about statistics and data analysis. Dennis the menace! You are, by far, the best mountain mate I could ever think of. Also, thanks for your ideas about data analysis, and for being such a chill and good-vibe dude. Cannot wait for our summer trip to the Alps and, of course, Denali is waiting for us :) To Michelle ma belle, I am endlessly glad to have met someone so kind-hearted, so brilliant at science and so genuine. I’m glad that we still have 3 months ahead to enjoy our conversation exchanges. Omaya (Peter or Pedro), شكرا. Thanks for your immense help with my CV and motivation letters. I remember asking you for advice and coming back with strategy and motivation to approach future steps in my career. Hope we can enjoy a fondue in Switzerland at some point :)

Educaña, better known as Viejoven, it has been really, really cool to work back to back with you. If you have taught me anything, it is to be more tolerant of people’s shortcomings in cuquismo. Apart from that, thank you for your guidance on writing scripts, for the eternal discussion about which is better, Mac or Linux, and for your constructive criticism of metabarcoding, of novelty, and the final conclusion that phylogenies are a lie. To Kons, Manoli or Manolita. I had a great time teaching you the few things I had learnt from Davis. I will never forget our moñi-moments in South Africa and THE song “Σαν έρθει η μέρα που θα ανταλλάξουμε μαζί καλημέρα”. Thanks also to APP, for our chats about life, for being as big a fan of Calvin and Hobbes as I am, and for sending me sooo many good electronic music tracks.

To Núr, Núrissima, President of the International Association of Cuquismo. You are one of those people one never forgets. Thank you for your inexhaustible happiness, and for your strong, energising hugs. Thank you also for teaching me how partial sérums were validated, and for so patiently explaining all the molecular techniques that I then never used again xD

And finally, Ola, Olita, Olenka, Olenkita… 4 years of friendship to only have learnt dzień dobry. I know my Polish sucks, but it is because we were learning how to live. Thank you so much for inspiring me at so many levels, for giving me such different perspectives about life, for being by my side in the worst moments. I admire you in so many ways that I can only say it is an honour to be your friend. I might sound cheesy, but it’s 100% real. Thank you for introducing me to electronic music, for the festivals, for the high-waisted trousers and all your fashion tips. You’re also one of the funniest people I’ve ever met, so thank you for the mentos story *the best*. Thank you for showing me other ways to cope with life and giving me helpful, kind and warm advice. I will miss our long conversations at home drinking a glass bottle of wine. It has been the most enriching period of my life and somehow, I’m glad that we were broken hearts; that made us be the healing of the other.

* But there is life beyond the MCG,

Thanks to the amazing team AIRE in Paris; especially to Eric and Romain. Thanks for the immense and valuable knowledge on networks, and for your willingness to take our collaboration to the next level. Romain, I really enjoyed our never-ending conversations on Hangouts discussing science, network scripts and PhD struggles. Hope we can meet soon!

Chafina, thank you for your constant good vibes, for taking me in at your little flat in Gràcia and for the amazing trip to Japan. How epic it was when we camped at the temple and the Buddhist monks came at night to throw us out xD Thanks also to the pepitos, and to the animal people (geckos, beetles and micromammals) for the delicious calçotadas, the laughs and the outside-PhD plans that are so necessary. And of course, to the administration staff, who are all lovely and have done everything possible to make my life, my trips and my paperwork easier. To María corallo, for always taking such good care of me and for sharing with me a positive outlook on the future. To Ana Trin’I’dade, for all the great mountain plans and for always creating such a good atmosphere. Thanks also to the dog-walking gang of Montjuic, for making getting up early to walk the dog so enriching and fun; for all the encouragement, the laughs and the breakfasts!

And special thanks to María Ferrer (Marih-ana); meeting you was one of the great *before and after* moments of my life. Thank you for your points of view, for your faith in me and for always encouraging me to do science communication and to continue along this path. I will always be very grateful to you for opening the door to unknown places for me. I grew enormously at your side and I hope that, sooner or later, we will share so many good moments again.

* But there is life beyond Barcelona,

To my wonderful soulmate friends Iridia/Aliño and Chococrispi/Cristóteles. I love you so much that I do not have one nickname for each of you, but two. Our umpteen voice messages of 10 minutes each have been a real luxury. Thank you for always being there, for forgiving each other everything, for your wise advice at the most critical moments and for the hours and hours of talking until the bar closed or we lost our voices. I love you very much, and I am proud to have two people who, despite the physical distance that has separated us for 7 years, feel as close to me as when we were slogging away at uni. It has been a pleasure to evolve alongside you, and I cannot wait for us to be, before long, within reach of a beer and a tapa.

Thanks also to my Ro ro. You are one of the people I admire most in the world. I love that dark, acid humour of yours that makes us laugh at what should make us cry. To Rachel, for always being so close despite being so far away. Your annual visits to Barcelona were legendary, although we both know that durian is never entering the house again. And to Morcholo, for his constant good vibes and all the plans we have made (and will make!).

And to Guille… How glad I am that you were by my side at the beginning, encouraging me with hugs when I thought this was too big for me. Your good humour, the trips to the mountains and the beers with olives while cooking were the best dose of energy. Thank you for your eternal confidence in me; it always made me feel that I would bring out the best in myself while doing the thesis.

And of course, infinite thanks to my family. To Yami, for always cheering me on and for calming me in moments of stress. But above all, thank you for keeping me by your side on the dark nights. Now all that is left is to enjoy the wonderful sunrise stretching out at our feet. To my super dad, for being the best example of perseverance, good work and humility. Thank you for your calm perspective at critical moments, for encouraging me to think big and for always being so proud of me.

And to my mother… Thank you for your patience, your advice and your respect for my decisions. For always encouraging me to do what I loved, in the best possible way, and not to back down because it seemed difficult. Thank you for always giving me wings and for rejoicing in how high I flew. Now that I am finishing the PhD and you are retiring, let us take up the science and art project again; it would fill me with pride to be in a book alongside you.

And to Abuelo, for having appeared at such a hard time and changing my life completely.

* * To all of you who have in some way accompanied me along this path, thank you
