
Phylogenetic Diversity, Functional Convergence, and Stress Response of the Symbiotic System Between Sponges and Microorganisms

Lu Fan

A thesis submitted to the University of New South Wales for the degree of Doctor of Philosophy

April 2012

Abstract

All living multicellular organisms contain associated microorganisms, which often make substantial symbiotic contributions to host physiology, behavior and evolution. Marine sponges host dense and highly diverse communities of symbiotic microorganisms. These symbionts form stable, host-specific community structures and are often passed on across host generations. Previous investigations have provided a good understanding of the phylogenetic diversity of microbial symbionts in marine sponges. However, the functional features that underpin their symbiotic interactions are largely unknown.

This thesis uses sponges as models to address fundamental concepts of microbial symbiosis, including community diversity and assembly, metabolic interactions with the host and other symbionts, and the stability of symbiosis under perturbation. State-of-the-art techniques, including 16S rRNA gene pyro-tag-sequencing, metagenomics and metaproteomics, were applied. Specifically, a pipeline was developed to reconstruct full-length rRNA genes from metagenomic shotgun pyrosequencing data. This work showed that a substantial proportion of microbial diversity has typically been missed by PCR-based approaches. Detailed metagenomic analyses then identified the functional core of symbiont communities from taxonomically distinct sponge species. The results indicated that common functions were provided by distinct but functionally equivalent symbionts and enzymes in different sponge hosts. Moreover, the abundance of elements involved in horizontal gene transfer suggested that they play key roles in distributing core functions among co-evolving symbionts and in facilitating functional convergence at the community scale. To investigate the expression profile of sponge symbionts, metaproteogenomic analysis was conducted on the sponge Cymbastela concentrica. The analysis detected abundant expression of proteins involved in substrate transport, aerobic and anaerobic metabolism, stress response, and host-symbiont interactions. To study the stability and dynamics of the sponge holobiont under perturbation, the sponge Rhopaloeides odorabile was examined under controlled thermal stress. Dramatic changes in community structure, functional composition and gene expression were observed alongside the host stress response, suggesting that a decline in symbiotic interactions was likely to be the key factor in the loss of holobiont function.

These discoveries, combined with a theoretical framework, advanced our knowledge of sponge microbiology and provided new insights into the ecology and evolution of microbial symbiosis.

Acknowledgements

First, I would like to thank Torsten Thomas for being such a cool supervisor and helpful friend to me. I admire his extensive research knowledge, detailed research insights, and creative thinking. His constant demand for ‘German precision’ in data analysis trained me to examine every experimental result or conclusion independently and critically, and to require definitive results. Besides the training to be a hard-nosed scientist, I am also grateful for the freedom and trust he gave me to fulfill my scientific aspirations. Torsten’s wonderful personality, his positive thinking and his great sense of humor have had a great influence on me. I still remember the words he said to me before I started my PhD study: ‘The first thing to be a good scientist is to be a good person’. It has been my great honour to have had the chance to work with Torsten for the last four years.

The remarks on my research from my co-supervisor Staffan Kjelleberg have always been insightful. I appreciate his guidance throughout my PhD study, including project design, progress review and manuscript writing. Staffan’s views on the big picture of science and his endeavours in combining science and industry have had a great influence on me.

I would also like to express my gratitude to Nicole Webster, my supervisor during my visit to AIMS as well as my wonderful collaborator. Nicole was excellent in coordinating the heat-stress experiment described in this thesis and provided important data for the GeXP analysis. As a colleague, I am very impressed by her passion for her research and her love for sponges. I thank her for her wonderful arrangements for my sampling at AIMS. I really enjoyed my stay there, especially the exclusive ship trip, and I fell deeply in love with the fascinating nature of the GBR. I hope to have the chance to re-visit AIMS one day and get seasick there again.

I thank Michael Liu, my wonderful colleague and good friend. Michael contributed to the proteomic sample analyses described in this thesis. Working with him has been great fun. I sincerely thank him for all his help in life over the last four years, especially when I first came to Australia. I would like to congratulate him on his recent marriage and I wish him a bright future in his career. I also want to thank David Reynolds, who did a wonderful analysis of the ELPs described in this thesis. David is a quick learner and a very interesting person. It has been my pleasure to supervise him and I wish him joy in whatever he decides to do in the future. My thanks also go to other colleagues and collaborators who contributed directly to the analyses in this thesis, including Manuel Stark, Rachel Simister, and Ling Zhong. Special thanks to the smart super girl Kerensa McElroy, who contributed a number of neat scripts used in my bioinformatic analyses, and to Bill O’Sullivan and Susan Cooke for proofreading this thesis.

Thank you to everyone from AIMS for their support in sample collection and shipment, including Rose, Rochelle, Chris B, Raffaella, Florita, Heidi, and Shawn. I thank Martin from UNSW for his help with server operation, and Patricia and Merrick from the Queensland Museum for sponge identification.

I thank my former and present colleagues at CMB, especially Kirsty and Adam for their generous support with my scholarship, reagent ordering, sample shipment and conference travel arrangements. My experiments would not have been successful without their assistance. I cherish the memories of Lab 304, especially from the beginning of my PhD study. Thanks to Neil, Shaun, Cathy, Maria, Gee, Vickey, Sharon, Melani, Raymond, Carla, and Alex for their great help and advice while I was establishing my experiments. I am also thankful for the 'golden old days' I had with Francesco, Kathrine, Adrian, Sylvain and others, camping, surfing and playing soccer and badminton.

I am exceedingly grateful to my parents and my grandmothers. Thank you for supporting my decision to study in Australia, for the countless sacrifices you have made on my behalf over the years, and for the education you gave me to be honest, responsible and grateful. I am deeply indebted for the endless love and encouragement I receive from you. Last but not least, I would like to thank my friends in China and Australia who appreciated my work and made my PhD life more enjoyable.

Table of Contents

Abstract

Acknowledgements

Table of Contents

List of Figures

List of Tables

List of Abbreviations

Chapter One General Introduction
1.1 Common features of eukaryota-associated microorganisms
1.1.1 Diversity, distribution and functional significance of the micro-biosphere
1.1.2 Significance of eukaryote-associated microbial community and the holobiont concept
1.1.3 Loosely associated microbiota and host-restricted symbionts
1.1.4 Microbial community assembly mechanisms
1.1.5 Evolution of symbiont genomes
1.2 Sponge-microbiota holobionts
1.2.1 General background and basic biology of sponges
1.2.2 Diversity and distribution of sponge microorganisms
1.2.3 Evolution of sponge holobiont
1.2.4 Functional features in the sponge holobiont
1.3 Stability of eukaryote-microbiota symbioses and the global stress on marine holobionts
1.3.1 Global stress and disease outbreak on coral reefs
1.3.2 Complex disease syndromes of marine
1.3.3 Causative agents of disease
1.3.4 Models for the mechanism of stress-driven disease
1.4 Techniques for studying microbial ecology and their applications in microbial symbiosis research
1.4.1 Conventional techniques
1.4.2 The molecular revolution – phylogenetic-marker, gene-based surveys
1.4.3 The most recent technology revolution – meta-omics
1.4.4 Reducing the community complexity – sample fractionation and enrichment in meta-omic study
1.4.5 Techniques in conventional and contemporary sponge symbiosis research
1.5 Aims of the thesis
1.6 Chapter synopsis

Chapter Two Reconstruction of Ribosomal Genes From Metagenomic Data
2.1 Introduction
2.2 Materials and Methods
2.2.1 Simulated metagenomes and metagenomic samples
2.2.2 Reconstruction of 16S rRNA gene sequences
2.2.3 Pyrosequencing of 16S rRNA genes amplified by PCR
2.2.4 Operational taxonomic unit (OTU) analysis
2.2.5 Taxonomic classification and phylogenetic analysis
2.3 Results and Discussion
2.3.1 16S rRNA gene assembly with minimal chimera formation
2.3.2 Assembly of 16S rRNA sequences improves taxonomic classification
2.3.3 16S rRNA gene reconstruction reveals community diversity that is missed by PCR-based approaches
2.3.4 Primer bias can explain the lack of OTU detection
2.3.5 Phylogenetic analysis of the novel 16S rRNA sequences detected by the shotgun approach
2.4 Conclusion

Chapter Three Functional Equivalence and Evolutionary Convergence in Complex Communities of Sponge Microbial Symbionts
3.1 Introduction
3.2 Materials and Methods
3.2.1 Sample collection and sponge identification
3.2.2 Microbial cell enrichment, DNA extraction and sequencing
3.2.3 Phylogenetic analysis
3.2.4 Assembly, removal of eukaryotic DNA, and gene prediction
3.2.5 Functional analysis
3.2.6 Identification of differential abundance
3.2.7 ELP analysis
3.2.8 Cyanophage population analysis based on G20 proteins
3.2.9 CRISPR analysis
3.2.10 CRISPR-associated (CAS) protein analysis
3.3 Results and Discussion
3.3.1 Overview of samples and dataset
3.3.2 Sponge microbial communities possess distinct phylogenetic and taxonomic profiles
3.3.3 Functional annotation reveals shared genomic signatures in sponge symbionts
3.3.4 Nitrogen metabolism and adaptation to anaerobic conditions
3.3.5 Photosynthesis and Photoprotection
3.3.6 Nutrient utilization and nutritional interactions with the host
3.3.7 Resistance to environmental and host-specific stress
3.3.8 Regulation of cellular response
3.3.9 ELPs and their potential interaction with the host
3.3.10 Genomic evolution through HGT
3.3.11 Mechanisms in controlling excessive genetic exchange
3.4 Conclusion

Chapter Four Metaproteogenomic Analysis of the Microbial Community Associated with the Sponge Cymbastela concentrica
4.1 Introduction
4.2 Materials and Methods
4.2.1 Sponge sampling and metagenomic analyses cell separation
4.2.2 Protein extraction and preparation
4.2.3 One-Dimensional SDS polyacrylamide gel electrophoresis and in gel trypsin digestion
4.2.4 High-performance liquid chromatography and MS
4.2.5 MS/MS data analysis and database searches
4.2.6 Protein identification and validation
4.2.7 Binning of metagenomic data and comparative genomics
4.2.8 Phylogenetic analysis of 16S rRNA genes and a thaumarchaeal AmoA
4.2.9 FISH probe design and evaluation
4.3 Results and Discussion
4.3.1 Overview of the metaproteogenomic data
4.3.2 Active transport systems involved in nutrient acquisition
4.3.3 Stress response
4.3.4 Metabolism
4.3.5 Molecular symbiont-host interactions
4.3.6 Linking phylotype to function – expression profiling of an uncultured Phyllobacteriaceae-related bacterium
4.4 Conclusion

Chapter Five Phylogenetic and Functional Response of A Sponge Holobiont to Thermal Stress
5.1 Introduction
5.2 Materials and Methods
5.2.1 Sampling and experimental setup
5.2.2 mRT-qPCR analysis
5.2.3 T-RFLP analysis
5.2.4 Metagenomic sequencing
5.2.5 Metaproteomic analysis
5.2.6 Normalization of metagenomic functional abundance by SCGs
5.2.7 Statistic analyses
5.3 Results and Discussion
5.3.1 Elevated temperature results in sponge necrosis and changes in stress-related gene expression
5.3.2 Changes in structure of the microbial community
5.3.3 Symbiotic functions are lost during temperature-induced shift in community composition
5.3.4 Metaproteomic analysis reveals expression changes related to stress and symbiosis function
5.3.5 Opportunistic scavengers dominated the necrotic sponges
5.4 Conclusion

Chapter Six Conclusion and Prospective
6.1 Summary
6.2 The ‘barrier hypothesis’ – an extension of the ‘continuum hypothesis’ with reference to symbiont co-evolution
6.3 Disease mechanism of the eukaryote-microbiota holobiont
6.4 Future study of the microbial symbiosis in sponges
6.4.1 Studying the individual community members
6.4.2 Looking around the reef – embracing the era of ‘sequencing everything’

List of Figures

Fig. 1.1 Schematic representation of a sponge.

Fig. 2.1 16S rRNA gene contigs and chimeric contigs for simulated datasets.
Fig. 2.2 Taxonomic classification of assembled and unassembled shotgun 16S rRNA gene reads for simulated datasets.
Fig. 2.3 -level classification of the sponge pyro-tag-sequencing and shotgun sequencing datasets.
Fig. 2.4 Shared and unique OTUs of the PCR-based and shotgun-based sponge datasets.
Fig. 2.5 The rarefaction plots for the sponge datasets at an OTU distance of 0.01 (A), 0.03 (B) and 0.05 (C) and based on phylogenetic distance (D).
Fig. 2.6 Abundance and primer-mismatches in the top OTUs at the 0.01 phylogenetic distance level for the sponge datasets.
Fig. 2.7 Phylogenetic analysis of the 16S rRNA gene sequences missed by PCR.

Fig. 3.1 Clustering of proteins found by searching cyanophage G20 proteins.
Fig. 3.2 Clustering of Csn1 and related proteins (TIGR01865).
Fig. 3.3 Phylogenetic relationship of the sponges based on 18S rRNA sequences.
Fig. 3.4 Microbial community diversity of sponge and seawater samples.
Fig. 3.5 16S rRNA gene Maximum-Likelihood tree of Nitrosomonadaceae and Marine Group 1 Thaumarchaeota.
Fig. 3.6 Community composition at phylum level and community diversity.
Fig. 3.7 MDS plots of samples by Bray-Curtis similarity.
Fig. 3.8 Specific functions abundant in sponge-associated or planktonic microbial communities.
Fig. 3.9 Abundance of enzymes in the energy-producing (respiratory) pathways of nitrogen cycling.
Fig. 3.10 Abundance and diversity of Rubisco/Rubisco-like proteins.
Fig. 3.11 Abundance of ELPs in seawater versus sponge samples, and in free-living versus symbiotic species.
Fig. 3.12 T3SS and Sec pathway secretion prediction of ELPs.
Fig. 3.13 Sample clustering based on ELP sequence similarity.
Fig. 3.14 Cyanophage abundance based on G20 protein analysis.
Fig. 3.15 Abundance and diversity of transposases.
Fig. 3.16 Abundance of CRISPR loci and spacers.
Fig. 3.17 Abundances of CAS proteins in subfamilies.
Fig. 3.18 Sample clustering by CRISPR repeats (left) and CRISPR spacers (right).
Fig. 3.19 A novel Csn1 arrangement in a CRISPR cassette.
Fig. 3.20 Local specificity and abundance of CRISPR and their potential targets.

Fig. 4.1 Venn diagram showing the distribution of proteins identified across three sponge samples.
Fig. 4.2 Relative abundance of COG categories based on the metaproteomic and metagenomic data.
Fig. 4.3 Abundance of specific COGs in sponge-associated microbial metaproteome.
Fig. 4.4 Contigs containing an archaeal amoABC gene cluster and a thaumarchaeal 16S rRNA gene.
Fig. 4.5 Maximum-Likelihood tree of archaeal and bacterial AmoA sequences.
Fig. 4.6 16S rRNA gene Maximum-Likelihood tree of the Nitrosopumilus-like thaumarchaeon and N. maritimus SCM1 with other lineages in C1a-α Marine Group I.
Fig. 4.7 Maximum-Likelihood tree showing the Phyllobacteriaceae-phylotype and phylogenetic relationship to its closely related neighbor phylotypes.
Fig. 4.8 Comparison between expressed proteome and partial genome of the Phyllobacteriaceae-phylotype on the level of COG categories (A) and COG level (B).
Fig. 4.9 Detection of sponge-associated and the Phyllobacteriaceae-phylotype by FISH.

Fig. 5.1 Morphological changes observed in sponge clones taken on day 4.
Fig. 5.2 Sample clustering based on host expression (GeXP).
Fig. 5.3 Clustering of bacterial communities based on T-RFLP profile using Bray-Curtis similarity.
Fig. 5.4 Shift in bacterial community through phylogenetic construction from metagenomic dataset.
Fig. 5.5 Rarefaction plot showing the dynamics of community diversity in different samples.
Fig. 5.6 Sample clustering based on community functional composition and community expression.
Fig. 5.7 Specific functions abundance of metagenomes in normal and stressed sponge microbial communities.
Fig. 5.8 Protein expression for the microbial communities of healthy, intermediate and necrotic sponges as annotated by COG categories.
Fig. 5.9 Specific functions abundance of metaproteomes in normal and stressed sponge microbial communities.

Fig. 6.1 The ‘barrier hypothesis’ model in normal and stressed microbial communities.

List of Tables

Table 2.1 The simulated datasets.
Table 2.2 Reads, 16S rRNA contigs, OTUs and chimera examination of the simulated communities.
Table 2.3 The sponge metagenomic datasets.
Table 2.4 The sponge tag-sequencing data sets.
Table 2.5 16S rRNA gene contigs generated from sponge metagenomic samples.

Table 3.1 Sponge and seawater samples used in the present study.
Table 3.2 Sample information in read processing, 16S rRNA gene containing sequences, assembly, decontamination, and functional annotation.

Table 4.1 Information for the metagenomic analysis of the sponge C. concentrica.

Table 5.1 Sample collections in the temperature-shifting experiment.
Table 5.2 The GeXP analysis.
Table 5.3 Information for the metagenomic analysis of R. odorabile clones.
Table 5.4 Information for the metaproteomic analysis of R. odorabile clones.

List of Abbreviations

AHL: N-acyl homoserine lactone
AmoA: Ammonia monooxygenase subunit A
ANK: Ankyrin
CAMERA: Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis
CAS: CRISPR-associated
CDD: Conserved Domain Database
CMFSW: Calcium- and magnesium-free seawater
COG: Clusters of Orthologous Groups
CRISPR: Clustered regularly interspaced short palindromic repeats
Ctab: Counts-table
DGGE: Denaturing gradient gel electrophoresis
DMSP: Dimethylsulfoniopropionate
ELP: Eukaryotic-like protein
FCM: Flow cytometry
FISH: Fluorescence in situ hybridization
Fn3: Fibronectin type III domain
GBR: Great Barrier Reef
GDH: Glutamate dehydrogenase
HGT: Horizontal gene transfer
IMG: Integrated Microbial Genomes
LRR: Leucine-rich repeat
MDA: Multiple displacement amplification
MDS: Multidimensional scaling
MGE: Mobile genetic element
mRT-qPCR: Multiplexed reverse transcription quantitative polymerase chain reaction
MS: Mass spectrometry
MS/MS: Mass spectrometry–mass spectrometry
NCBI: National Center for Biotechnology Information
ORF: Open reading frame
OTU: Operational taxonomic unit
PCR: Polymerase chain reaction
Pfam: Protein families
PIC: Protease inhibitor cocktail
PmoA: Particulate methane monooxygenase subunit A
Ptab: Percentage-table
PTK: Protein tyrosine kinase
qPCR: Quantitative polymerase chain reaction
R-M: Restriction-modification
RDP: Ribosomal Database Project
RuBisCO: Ribulose-1,5-bisphosphate carboxylase/oxygenase
SCG: Single-copy gene
SCS: Single-cell sequencing
SDS: Sodium dodecyl sulfate
SST: Sea surface temperature
T-A: Toxin-antitoxin
T-RFLP: Terminal restriction fragment length polymorphism
TPR: Tetratricopeptide repeat
TRAP: Tripartite ATP-independent periplasmic


Chapter One General Introduction

1.1 Common features of eukaryota-associated microorganisms

1.1.1 Diversity, distribution and functional significance of the micro-biosphere

Microorganisms are the most diverse and abundant group of organisms on Earth. They are a critical component of our planet, playing an essential role in global biogeochemical cycles (1), and are therefore important members of many ecosystems, including environments associated with eukaryotic hosts (2-4).

‘Who eats what, where and when?’ is considered a key question in microbial ecology (5). Answering it requires understanding the composition, functional potential and activities of microbial communities, and tracking the stability and dynamics of these features over time and space, particularly across biogeographic regions and under global change. Such knowledge contributes to our understanding of the roles of microorganisms in global biogeochemical cycles (1), their interactions with eukaryotic hosts, including their roles in human health, and their applications in industries such as drug discovery, bioremediation and renewable energy.

The first question, ‘Who’, addresses the diversity and distribution of microorganisms on our planet. A widely accepted view is that 'everything is everywhere, but the environment selects' (6-9). This hypothesis holds that most microbial taxa are ubiquitous around the globe (10, 11), while each environment selects for a specific subset of species that can appear endemic. The ubiquity of taxa reflects the remarkable dispersal potential of microorganisms. Microbial community structure (i.e. the identity of species and their relative abundances) differs markedly among habitats, from low diversity in extremely acidic habitats (12), to moderate diversity in marine plankton (13) and high diversity in sediments (14).


Dispersal and adaptation to environmental heterogeneity determine the distribution of organisms (15). For example, marine planktonic microbial communities typically change significantly over periods of days to weeks (16). Vertically, significant changes can occur over meters or tens of meters, or even over millimeters in the sea-surface microlayer (17). Horizontally, community heterogeneity in surface waters tends to occur on scales of kilometers or tens of kilometers (16), whereas deep-sea microbial communities are often considered relatively uniform (18).

A recent discovery, made possible by high-throughput sequencing, is the 'rare biosphere' of microbial communities (19). This concept proposes that large numbers of individually rare species are present in most ecosystems, which supports the 'everything is everywhere' hypothesis. The boundary between 'common' and 'rare' organisms is fuzzy. In marine plankton, for example, common organisms typically carry out most of the metabolic activity (20, 21), while rare ones do not ordinarily affect major biogeochemical processes, such as respiration and nutrient processing (6). Instead, rare species may serve as a functional 'seed bank' (6). As suggested by the ‘insurance policy hypothesis’ (22), they may form a reservoir of functional diversity simply by dispersing and persisting, and can potentially become dominant when environmental conditions change (23). Being rare and slow-growing helps these organisms avoid major mortality processes, such as viral infection and predation (6), because viral infection rates depend directly on host abundance (‘kill the winner’) (24, 25).
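
Because the 'rare biosphere' depends on an operational boundary between 'rare' and 'common' taxa, a minimal Python sketch of such a split may help. The function `split_rare_abundant`, the 0.1% relative-abundance cut-off and the toy read counts are all hypothetical choices made for illustration, not values or code used in this thesis.

```python
from typing import Dict, Tuple

# Hypothetical cut-off: OTUs below 0.1% relative abundance are called "rare".
RARE_THRESHOLD = 0.001


def split_rare_abundant(
    counts: Dict[str, int], threshold: float = RARE_THRESHOLD
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Split OTUs into rare and common groups by relative abundance."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("community contains no reads")
    rare: Dict[str, float] = {}
    common: Dict[str, float] = {}
    for otu, n in counts.items():
        rel = n / total
        if rel < threshold:
            rare[otu] = rel
        else:
            common[otu] = rel
    return rare, common


# Toy community: two dominant OTUs plus 100 singletons (10,000 reads in total).
counts = {"OTU1": 6000, "OTU2": 3900}
counts.update({f"OTU{i}": 1 for i in range(3, 103)})
rare, common = split_rare_abundant(counts)
print(len(rare), len(common))  # number of rare and common OTUs
print(round(sum(rare.values()), 4), round(sum(common.values()), 4))  # share of reads
```

In this toy community 100 of the 102 OTUs fall below the cut-off, yet together they account for only 1% of the reads, which mirrors the pattern of many individually rare taxa alongside a few dominant ones.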

Viral particles have recently been recognized as an essential component of the global microbiosphere. Viruses are about 10-fold more abundant than cellular microorganisms (26). They also exhibit great genome variability and rapid rates of mutation and evolution (27, 28). Bacterial viruses (phages) are highly specific to their bacterial hosts (29). Global virus distribution also follows the 'everything is everywhere' hypothesis, but the relative abundance of each species is restricted by local selection (30, 31). Phages can alter the fitness of their hosts, promote genetic exchange, potentially alter the makeup and function of microbial communities (32-34), and facilitate host evolution (35, 36). Recent evidence suggests that viruses shape microbial community structure (37). The 'kill the winner' hypothesis (24) proposes that viral infection is generally host-specific and density-dependent: abundant organisms are the most susceptible to epidemic infection and consequently decline. This leads to a succession of dominant organisms over time as one after another is attacked (25, 26, 38). Viruses have been shown to control both prokaryotic and eukaryotic phytoplankton in the ocean (26, 32, 37, 39, 40), including algal blooms (41, 42). Phages are responsible for 4–50% of bacterial mortality (34, 37), although bacteria have evolved complex and sophisticated defense mechanisms (43). These include preventing phage adsorption and phage DNA entry, cleaving phage nucleic acids via restriction-modification (R-M) systems or clustered regularly interspaced short palindromic repeats (CRISPRs), and abortive infection (Abi) systems (43). Phages, in turn, have evolved counter-strategies to evade these anti-phage systems.
This ‘arms race’ and the highly specific co-evolution between phages and their bacterial hosts lead to extremely rapid genomic rearrangement and turnover of phages (32, 44), as well as of the mobile genetic elements and anti-phage components in bacterial genomes (43, 45).
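
The density dependence at the heart of the 'kill the winner' hypothesis can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a model used in this thesis: two hosts, A and B, compete Lotka-Volterra style for shared space, and only the superior competitor A has a specific phage whose infection rate scales with the product of host and phage densities. The function name `simulate`, all parameter values and the forward-Euler integration are hypothetical choices.

```python
from typing import Tuple


def simulate(phage0: float, t_end: float = 800.0, dt: float = 0.01) -> Tuple[float, float, float]:
    """Toy 'kill the winner' model: hosts A and B, phage V specific to A.

    Returns the final densities (A, B, V). All parameters are hypothetical.
    """
    r_a, r_b = 1.0, 0.5    # host growth rates
    k_a, k_b = 1.0, 0.8    # carrying capacities; k_a > k_b, so A wins alone
    adsorb = 1.0           # phage-host encounter (infection) rate
    burst = 1.0            # new phage per infection, in scaled units
    decay = 0.4            # phage decay rate
    a, b, v = 0.01, 0.01, phage0
    for _ in range(int(t_end / dt)):
        total = a + b
        infection = adsorb * a * v  # density dependent: grows with A * V
        da = r_a * a * (1.0 - total / k_a) - infection
        db = r_b * b * (1.0 - total / k_b)
        dv = burst * infection - decay * v
        # forward-Euler step
        a, b, v = a + da * dt, b + db * dt, v + dv * dt
    return a, b, v


without_phage = simulate(phage0=0.0)
with_phage = simulate(phage0=0.001)
print("without phage: A=%.3f B=%.2e" % without_phage[:2])
print("with phage:    A=%.3f B=%.3f" % with_phage[:2])
```

Without the phage, A excludes B. With the phage, A is held near the density at which phage production balances phage decay (decay / (burst × adsorb) = 0.4 with these values), which frees space for B, so in this toy setting the two hosts settle at roughly equal densities.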

1.1.2 Significance of eukaryote-associated microbial community and the holobiont concept

All living multicellular organisms contain associated microorganisms (46). The number of symbiotic microorganisms and their combined genetic information often far exceed those of their hosts. Host-associated microbial communities are also often very different from the communities in the surrounding environment (47-49). The coexistence of a host and its microbial community can have beneficial or deleterious effects on the host; interactions between the microorganisms and the host span the spectrum from parasitism through commensalism to mutualism. Recent studies have shown that symbioses contribute substantially to host physiology, behavior and evolution (46).

Research into single-symbiont systems has generated mechanistic insights into the interactions between symbiotic partners, including molecular communication and co-metabolism (50). The most prevalent relationship between the host and its symbionts, and among the symbionts themselves, is mutualism in nutrient utilization. Gut microbes, such as those of cows and termites, are well-known examples. The cow rumen contains a consortium of microbes that specialize in degrading cellulosic material (51), while microbes in the termite hindgut degrade lignocellulosic plant material and thereby provide their host with carbohydrates, amino acids, and vitamins (52). In corals, microbial symbionts are essential for providing the host with fixed carbon and nitrogen and for degrading metabolic wastes (53, 54). Moreover, microbial symbionts can be involved in host defense through the production of antibiotics (55), in host development and the normal function of the innate and adaptive immune systems (56-59), in the structural development of blood vessels (60), in the regulation of fat storage (61), and even in the development of social behavior, such as mating preference in insects (62, 63).

Because symbionts are so prevalent and indispensable, the symbiont community can be considered an integrated component of the host. For example, the human gut microbiota has been described as 'a forgotten organ' (58), and the human microbiome project has been considered a second human genome project (64). In an ecological (65, 66) or systems biology sense (67), symbionts and their host have been proposed to act as a single unit of selection (i.e. a 'superorganism'). The concept of the 'holobiont' has been used frequently in recent studies of symbiont communities. It originally arose from the study of coral symbionts as a collective term for the totality of the coral animal, the endosymbiotic zooxanthellae, and the associated endolithic algae, fungi, bacteria, archaea, and viruses (49).

1.1.3 Loosely associated microbiota and host-restricted symbionts

Although numerous microorganisms can be found in physical contact with eukaryotic hosts, these associations vary in specificity and constancy, from transient microbes colonizing the surface of green algae in the ocean (68) to the endosymbionts of insects (69). Microorganisms that form permanent, long-term relationships with a eukaryotic host are generally regarded as its symbionts. The most extreme examples of strict symbiosis are mitochondria and chloroplasts, which are believed to have evolved from symbiotic bacteria acquired during the evolutionary history of eukaryotes (70).

The strength of an association is determined by the fidelity with which the microorganism is transmitted between host generations, and is reflected in the extent to which the symbiont has co-evolved with the host (71). Symbionts can be passed to host offspring in various ways. Mitochondria and chloroplasts are usually inherited maternally through the cytoplasm (72); endosymbionts of insects are usually transmitted through eggs (73), while many obligate symbionts of sponges are transmitted through larvae (74).

In contrast, the offspring of many eukaryotes are born germ-free and must acquire their symbionts from the environment. Close contact with parents or other cohabiting adults can facilitate the transfer of symbionts to offspring within a population. For example, newborn mammals obtain their gut microbiota in the birth canal or through close physical contact with parents, family and community members (11, 75). In other cases, such as termites, larvae feed from the ambient environment and obtain microbes from the feces of adult termites. Although environmental acquisition of symbionts is less stringent than transmission through eggs, it provides more opportunity for the symbiont community to be re-established in a form that reflects environmental changes since the parental generation. This has been shown for gut microbiota responding to different diets in ruminants (76, 77) and insects (78).

1.1.4 Microbial community assembly mechanisms

Understanding the processes that structure ecological communities has long been a central aim of ecological research (79). Current theories are mostly based on studies of communities of macroorganisms. Niche theory and neutral theory are the two major types of theoretical model that aim to explain the patterns of biodiversity observed in nature (80, 81).

Niche theory assumes that ecological traits differ among the species within a community (82). Communities are therefore shaped mainly by deterministic factors such as competition and niche differentiation (83, 84). Within a niche, the species with the fastest growth rate ultimately outcompetes slower-growing species (85), so community structure reflects niche specialization.
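The claim that, within a niche, the fastest-growing species ultimately outcompetes slower growers can be made concrete with a minimal numerical sketch. It assumes two species sharing a single niche of fixed total capacity, discrete generations and constant per-generation growth factors (w_fast, w_slow); the function name and all parameter values are hypothetical illustrations, not quantities taken from the cited studies.

```python
def competitive_exclusion(p0, w_fast, w_slow, generations):
    """Return the share of a niche held by the faster-growing species, per generation.

    Hypothetical illustration: two species share one niche of fixed total
    capacity, so only their relative growth matters. Each generation both
    species grow by their own factor and the shares are renormalized.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")
    p = p0
    history = [p]
    for _ in range(generations):
        fast = p * w_fast          # growth of the faster species
        slow = (1.0 - p) * w_slow  # growth of the slower species
        p = fast / (fast + slow)   # renormalize to the fixed niche capacity
        history.append(p)
    return history


# A species starting at 1% of the niche with a 20% per-generation growth advantage.
trajectory = competitive_exclusion(p0=0.01, w_fast=1.2, w_slow=1.0, generations=100)
print(f"share after 100 generations: {trajectory[-1]:.6f}")
```

Because the odds p/(1 - p) of the faster species are multiplied by w_fast/w_slow every generation, it approaches exclusive occupancy of the niche whenever w_fast > w_slow, however small its initial share; with equal growth factors its share stays constant.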

Neutral theory focuses on the stochastic processes of birth, death, dispersal, extinction, immigration and speciation (86) and assumes that all species in a community are ecologically equivalent, with the same demographic rates (87). Community structure is thus determined by a dynamic random process (88). Each of these models has limitations that have been widely discussed in the literature (89). In general, niche theory has difficulty explaining very diverse, complex environments in which many rare taxa coexist (90, 91), whereas neutral theory is more successful in modeling complex communities such as tropical rainforests (92, 93) but may not be valid for low-diversity communities (94).
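The stochastic process assumed by neutral theory can be sketched as a zero-sum birth-death-immigration simulation of the kind used in neutral models. The local community has a fixed number of individuals; at each step one randomly chosen individual dies and is replaced either by an immigrant drawn from a metacommunity (with probability m) or by the offspring of another randomly chosen local individual. Every individual has the same demographic rates regardless of species. The function name, metacommunity weights and parameter values are hypothetical, and this is a minimal sketch rather than any specific published model.

```python
import random
from collections import Counter


def neutral_drift(initial, metacommunity, m, steps, seed=0):
    """Simulate zero-sum neutral drift in a local community (hypothetical sketch).

    initial: list of species labels, one per individual (community size is fixed).
    metacommunity: dict mapping species label -> relative abundance of immigrants.
    m: probability that a vacancy is filled by an immigrant rather than a local birth.
    """
    if len(initial) < 2:
        raise ValueError("community needs at least two individuals")
    rng = random.Random(seed)
    community = list(initial)  # copy so the caller's list is not modified
    pool = list(metacommunity)
    weights = [metacommunity[s] for s in pool]
    for _ in range(steps):
        dead = rng.randrange(len(community))  # one random death, species-blind
        if rng.random() < m:
            # Dispersal: an immigrant arrives from the metacommunity.
            replacement = rng.choices(pool, weights=weights)[0]
        else:
            # Local birth: offspring of a random surviving individual.
            parent = rng.randrange(len(community) - 1)
            if parent >= dead:
                parent += 1  # skip the individual that just died
            replacement = community[parent]
        community[dead] = replacement
    return Counter(community)


initial = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
metacommunity = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
print(neutral_drift(initial, metacommunity, m=0.0, steps=5000))  # closed community
print(neutral_drift(initial, metacommunity, m=0.1, steps=5000))  # with immigration
```

With m = 0, species lost through drift can never return; with m > 0, immigration can reintroduce them, which mirrors the role of dispersal in rescuing rare species from stochastic extinction.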

These theories, although often viewed as contradictory, are not mutually exclusive (87), and recent studies tend to combine them to explain community structure (80, 95). One example is the ‘continuum hypothesis’, which proposes that niche and neutral processes represent the two ends of a continuum from competitive to stochastic exclusion (81). According to this hypothesis, the structure of an equilibrium community depends on the complexity of the community’s niche structure (the distribution of environmental conditions) and on the extent to which each niche is saturated by complementary or functionally redundant species. Dispersal increases the chance that rare species are rescued from stochastic extinction, thereby increasing the number of redundant species and hence the neutrality of the community. This hypothesis has successfully explained the assembly of communities spanning a range of diversities (96).

Because of the high diversity of microorganisms, it is only recently that technological progress has made it possible to model microbial communities (97), including with a combined niche and neutral model (98). In general, neutral theory applies to many free-living microbial ecosystems (99) but not to the fecal microbiome (11, 99). However, few studies have considered the rules governing the formation of eukaryote-associated microbial communities (46, 68). Hosts provide symbionts with an environment distinct from that of free-living microorganisms, so host-associated microbial communities may follow related but different assembly rules that reflect their various relationships with the host. For strict symbionts that are vertically transmitted between host generations, symbiont communities are highly consistent among individuals of the same host species through time, and neutral effects are negligible (e.g. in sponge symbionts (74)). In contrast, random processes may have more impact on the structure of loosely associated microbial communities. Comparisons of the composition of such communities among host individuals suggest that both neutral effects and functional restrictions determine community assembly (68, 100-102). Another phenomenon in strict symbionts that has recently attracted attention is the existence of functionally equivalent symbionts in phylogenetically distinct hosts. For example, two endosymbionts belonging to very different bacterial phyla have similar functional repertoires and metabolic capabilities, which allow them to establish associations with divergent insect hosts feeding on similar diets (103).

1.1.5 Evolution of symbiont genomes

The inheritance of symbionts ensures their coexistence with the host over the evolutionary history of the holobiont. Depending on the nature of the association, different types of genomic change occur in microbial symbionts.

For pathogens and parasites, the relationship with the host resembles a predator-prey relationship. Continuous variation and selection drive adaptation in the host and counter-adaptation in the parasite, as described by the ‘Red Queen hypothesis’ (104). This type of co-evolution has been demonstrated between parasites and host immune systems (105), as well as between bacteriophages and phage-defense systems (43). Host-parasite coevolution determines the rate of evolution of both host and parasite (106) and plays a central role in maintaining biological diversity (107, 108).

In contrast, strict symbionts, which co-exist with their host throughout its entire life, follow a distinct pattern of genome evolution. Many strict symbiotic associations in insects have existed for millions of years (103), and the physically enclosed environment within the host shapes the evolution of symbiont genomes. Once free-living bacteria are acquired by a host and become host-restricted, genes unnecessary for the symbiotic lifestyle are lost, while those crucial for life with the host are retained. This genomic erosion involves the irreversible loss of genes from many functional categories and results in dramatically reduced genome sizes in many symbionts (109, 110). At the same time, genes underlying the nutritional contributions to the host are incorporated to support the symbiotic association (111, 112). Rapid genomic change is often mediated by the large numbers of mobile elements in the genomes of symbiotic bacteria (113), which disrupt genes that are no longer required or cause gene rearrangements that generate new, required regulatory structures or pathways (114).

One important consequence of this genomic change is the development of metabolic complementarity between symbionts and host, based on nutritional interdependence. For example, insects with nutritionally unbalanced diets have often acquired intracellular symbiotic microorganisms to supplement their diet (115). In sharpshooters, the genome of the symbiont Sulcia encodes almost complete biosynthetic pathways for eight of the ten essential amino acids, while the co-symbiont Baumannia can produce the remaining two essential amino acids as well as a large number of vitamin cofactors (116, 117).

Once the symbiotic relationship is established, genomic change in endosymbionts almost ceases. For example, some symbiont genomes that diverged 20-200 million years ago show complete colinearity among shared genes (103, 118), a level of genome stability unique among bacteria. In addition, phylogenetic analyses of symbionts shared by closely related hosts have demonstrated that, in many cases, the symbiont phylogeny is congruent with the host phylogeny, indicating that the symbiont remained associated with its host throughout the host's divergence.

1.2 Sponge-microbiota holobionts

1.2.1 General background and basic biology of sponges

Sponges (phylum Porifera) are an important component of marine and freshwater ecosystems: they are highly diverse and abundant, and they play critical functional roles, such as reef consolidation and bioerosion, provision of habitat for other invertebrates, and highly efficient filtration that links benthic and pelagic environments (119). They are sessile filter feeders that remove microorganisms from the surrounding seawater by pumping large volumes of water through their aquiferous system (120, 121, 702, 703). They are among the oldest multicellular animals (Metazoa), with fossil records dating back 600 million years (122). More than 6,000 sponge species have been identified from both shallow and deep-water habitats (74, 123).

The basic structure of sponges comprises several cell layers. The outer layer (pinacoderm) is formed by cells known as pinacocytes, which also extend along the interior canals that permeate the sponge. The pores on the sponge surface are called ostia. Inside the sponge, flagellated cells called choanocytes form a series of chambers (the choanoderm) and are responsible for pumping water into the sponge through the ostia. Feeding also takes place within these chambers, where choanocytes filter food particles (including bacteria and microalgae) from the water and transfer them to the mesohyl. The transferred particles are eventually digested in the mesohyl via phagocytosis by archaeocytes (74, 124), a group of totipotent cells capable of differentiating into any other sponge cell type. During feeding, some bacteria that are pumped into the sponge and transferred to the mesohyl may survive and establish themselves as part of the sponge-specific microbiota (74, 120). Finally, the water is expelled from the sponge through the exhalant opening, or osculum.

Despite their simple body plan, sponges are morphologically diverse, coming in many different shapes, sizes and colors. Many of these morphologies directly reflect their ecological function. For example, sponges containing photosynthetic cyanobacteria are often flat, which optimizes light capture (125, 126). Because sponge cells are totipotent, they can re-aggregate after dissociation, allowing sponges to regenerate or remodel after partial mortality. The siliceous or calcareous skeleton and collagenous tissue provide structural support for the sponge body (74).

Fig. 1.1 Schematic representation of a sponge. Arrows indicate the direction of water flow through the sponge. Adapted from ref. (74).


1.2.2 Diversity and distribution of sponge microorganisms

Sponges form permanent and highly specific symbiotic relationships with microorganisms, which are essential for the sponge’s survival and function (74). Diverse microbial communities have been identified in sponges, including many lineages known to be sponge-specific (74, 127). Microbial symbionts can comprise as much as 40% of sponge biomass (74, 128, 129), reach densities of up to 10^10 cells per gram of tissue (129), and occupy all parts of the sponge body, including the surface (pinacoderm), the extracellular matrix (mesohyl) and specialized cells (bacteriocytes) (130). Recent phylogenetic studies have suggested that marine sponges harbor specific, stable microbial communities that are distinct in composition from those of the surrounding seawater (131-135). The existence of such distinct communities is remarkable given that sponge-associated bacteria are constantly exposed to large numbers of food bacteria during filter feeding (121).

The structure of the microbial community is highly specific to the sponge species (74). This association is fairly stable in both space and time (133, 136-141) and highly resistant to external disturbance (129); its tolerance to starvation, antibiotic exposure and transplantation to different depths has been demonstrated (136, 142-144). Across different species, sponges from locations connected by ocean currents share more similar microbial communities than sponges from more isolated collection sites. For example, the microbial communities of tropical sponges are generally more similar to one another than are those of subtropical sponges (145).

A given sponge species contains a mixture of generalist and specialist microorganisms (133). Many generalist sponge microorganisms form diverse sponge-specific clusters that are phylogenetically distant from free-living strains (74, 131, 145), and many of these sponge-specific clades contain, or are related to, coral-associated microbes.

1.2.3 Evolution of sponge holobiont

The association between microorganisms and sponges dates back to the , prior to the bulk of taxonomic radiation in sponges (146), and has been maintained throughout sponge evolutionary history (147, 148). In some cases, the phylogenetic trees of obligate symbionts are congruent with those of their hosts (149-151). Symbionts in general make a significant contribution to the host’s metabolic and biochemical repertoire and possibly also to the evolutionary success of their ancient host (148).

Vertical transmission of sponge symbionts was originally proposed on the basis of microscopic observations (152-158). More recently, it has been documented in studies using a variety of molecular techniques, including denaturing gradient gel electrophoresis (DGGE), 16S rRNA gene sequencing and fluorescence in situ hybridization (FISH) (159-164, 704, 705). A recent study using 16S rRNA gene amplicon pyrosequencing demonstrated that nearly 50% of the sponge-specific sequence clusters could be found in both the adults and the larvae of the same sponge, implying vertical transmission of these groups (165). Although a high proportion of the microbial symbionts are vertically transmitted between generations through eggs, larvae (157) or even sperm (166), environmental acquisition is also an important mode of establishing the symbiont community, especially in juvenile sponges (167). The rare biosphere of seawater can act as a seed bank for sponge-specific microbes (74, 162).

The genomic characteristics of sponge symbionts with respect to their co-evolution with the host and with other symbionts have not been well studied. One investigation suggested lateral gene transfer between a and the mitochondrion of its sponge host (168). A recent study of the metagenome of the sponge Cymbastela concentrica discovered a surprisingly large number of insertion elements. Abundant mobile elements had previously been observed in intracellular symbionts of other hosts (113), where they were proposed to play important roles in adapting bacterial genomes to symbiotic relationships with their hosts.

1.2.4 Functional features in the sponge holobiont

Sponge holobionts are ideal evolutionary models for exploring the interactions of complex microbial symbiont communities with their host (74). The microbial communities associated with sponges are recognized to contribute to host biology in a variety of ways, including nutrient provision and cycling (169), transport and elimination of waste products (170, 171), chemical defense (172-174) and mechanical structure (135, 175).


The metabolic interdependency between the sponge host and its microorganisms has long been recognized. Sponges provide various metabolic intermediates or end-products as nutrient resources for symbionts (176), and symbionts also use sponges as shelter from predators or high light levels (177). In return, symbionts benefit the sponges by removing metabolic wastes and providing essential nutrients. Sulfur-oxidizing bacteria participate in the elimination of toxic metabolic by-products (170). Symbionts translocate photosynthate and fixed nitrogen to the sponge host (178, 179) and thereby contribute to the host's ecological success on nutrient-poor tropical reefs (126, 169, 180, 181). Energy gained from photosynthetic cyanobacteria can contribute to gamete and larval longevity in the water column (166), and once larvae have settled, rapid growth is required to outcompete algae and other photosynthetic organisms for substratum in illuminated areas (182).

Sponges are also a rich source of biologically active natural products with promising industrial and medical applications (74). These secondary metabolites are thought to protect sponges from pathogens, predators and biofoulers (172-174, 183-186, 706). Many of the natural products identified so far structurally resemble bacterial compounds and have therefore been proposed to be produced by associated microbial symbionts (187).

A long-standing question is how sponges discriminate food bacteria from symbionts (188, 189). Recent (meta)genomic studies revealed a high abundance of eukaryotic-like proteins (ELPs), such as tetratricopeptide repeat (TPR) and ankyrin repeat (ANK) proteins, in sponge symbionts (190, 191). These proteins were proposed to play important roles in host-symbiont interactions, potentially by mimicking host proteins to modulate host cell functions, and consequently to allow the host sponge to discriminate between food and symbiont bacteria (190).

However, the functional features of sponge symbiosis in general are still largely unknown (74, 192). Phylogenetic comparison of bacterial communities revealed a minimal core community and a large host species-specific fraction in sponges (145). This finding closely resembles results for the human gut microbiota, in which few abundant bacterial species are shared among individuals (101, 193, 194). Despite this high variability at the species level, a core of shared gene families was demonstrated in the human gut microbiome, suggesting that different combinations of species can fulfill the same functional roles (194). It is tempting to speculate that a functional core also exists among sponge microbial communities, even though they vary extensively in bacterial species composition. However, few attempts have been made to test this speculation in sponges.
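The contrast between low species-level overlap and a shared functional core can be illustrated with a small sketch. It assumes each community has been summarized as a mapping from taxon to the set of gene families that taxon encodes, and compares two communities with the Jaccard index at both levels. The taxon and gene-family labels are hypothetical, and the Jaccard index is simply one common similarity measure, not necessarily the one used in the cited studies.

```python
def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def taxon_and_function_overlap(community_1, community_2):
    """Compare two communities at the taxon level and at the gene-family level.

    Each community maps a taxon name to the set of gene families it encodes.
    Returns (taxon_similarity, function_similarity, shared_functions).
    """
    taxa_1, taxa_2 = set(community_1), set(community_2)
    functions_1 = set().union(*community_1.values())
    functions_2 = set().union(*community_2.values())
    shared = functions_1 & functions_2  # candidate functional core
    return jaccard(taxa_1, taxa_2), jaccard(functions_1, functions_2), shared


# Hypothetical sponges hosting different taxa that encode the same gene families.
sponge_a = {"taxon_1": {"fam_A", "fam_B"}, "taxon_2": {"fam_C"}}
sponge_b = {"taxon_3": {"fam_A"}, "taxon_4": {"fam_B", "fam_C"}}
taxon_sim, function_sim, core = taxon_and_function_overlap(sponge_a, sponge_b)
print(taxon_sim, function_sim, sorted(core))  # 0.0 1.0 ['fam_A', 'fam_B', 'fam_C']
```

In this toy case the two communities share no taxa yet encode identical gene-family repertoires, which is the pattern a functional core provided by different combinations of species would produce.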

1.3 Stability of eukaryote-microbiota symbioses and the global stress on marine invertebrate holobionts

1.3.1 Global stress and disease outbreak on coral reefs

The world's biosphere is changing rapidly. During the last century, the average global temperature increased by 0.6 ± 0.2°C, and it is predicted to increase by another 1.5 to 4.5°C in this century (195). Global warming leads to longer summers, a higher frequency and shorter return time of extreme events, and an increase in extreme temperatures and heat waves (196-198). In the ocean, global change is reflected in rising sea surface temperatures (199) and ocean acidification (200). At the same time, increasing anthropogenic activities such as overfishing (201, 202) and pollution, which facilitates the introduction of terrestrial pathogens to marine organisms (203), place great stress on marine ecosystems.

Consistent with these concerns, an increasing number of disease outbreaks and large-scale mortality events in marine animals have been observed (204-206). This trend is particularly pronounced for the two most important reef builders, corals and sponges (54, 207-211, 707-709). Coral bleaching has become more frequent over wide geographical scales (212, 213), one-third of all coral species are at risk of extinction, and 27% of reefs worldwide have already been lost (214). Sponges are also very sensitive to global change and perturbations (208), and large-scale mortality events in sponges have occurred during periods of unseasonably high seawater temperatures and after disease outbreaks (208, 215, 216).

The decline of coral and sponge populations greatly affects coral reefs and, consequently, the marine ecosystem. Coral bleaching and disease are primary causes of coral reef decline (214, 217-221), including the loss of reef biodiversity (222), while sponge mortality alters benthic nutrient and energy cycles (208). Overall, these effects will greatly affect food security for hundreds of millions of people who depend on reef fish (222).


1.3.2 Complex disease syndromes of marine invertebrates

In general, disease is defined as a process resulting in tissue damage or alteration of physiological function that produces visible symptoms. Under this broad definition, various abnormal signs in marine invertebrates have been reported as disease. These signs can be considered at two levels: host morphology and the structure of the symbiotic community.

Most records and descriptions of disease outbreaks start with direct visual observation of the corals or sponges, such as tissue loss, discoloration (e.g. bleaching), necrosis and degradation in corals (207, 223), and short-term tissue regression (224), absence of choanocyte chambers (225) and various other gross pathological symptoms in sponges (208). However, these signs of disease are inherently vague (223, 226). Because the mechanisms behind a disease-like syndrome are extremely complex, the same syndrome could comprise a group of distinct diseases with similar signs. For example, it is necessary to distinguish the cellular causes of whitening of coral colonies (exposure of the coral skeleton) from those of coral bleaching (loss of endosymbiotic algae or their associated pigments from host tissue). Unlike human diseases, coral and sponge diseases lack systematic classification and standardized descriptions or records, which hinders comparison between investigations and hampers the search for causative agents and disease mechanisms.

Corals and sponges possess highly dense and diverse symbiotic microbial communities, which are important to their health and function. Shifts in the symbiotic community (i.e. loss of microbial symbionts and establishment of foreign microbes, including putative pathogens) have been considered an important phenomenon related to host health (227-230). Since the development of culture-independent technologies, symbiont community shifts have been widely adopted as a sign or evidence of host stress. For example, microbial shifts were observed during tissue regression (231) and heat-induced disorder and necrosis of sponges (215, 230). However, no significant shift in the microbial community was detected in the sponge Ianthella basta, which manifested disease-like syndromes (232). Diseased corals often experience a shift in the composition of the microbial community in the mucus layer surrounding the coral, where the resident microbial community is replaced by pathogenic or opportunistic microbes, often species of the genus Vibrio (54, 233, 234). In bleached corals, zooxanthellae densities in tissue decreased by ~64% (235), and Vibrio-affiliated sequences appeared before visual signs of bleaching, thereby providing an early warning of changes in coral health (236). The decline of symbiotic microbes in diseased corals is often accompanied by a loss of antibiotic activity (55, 234).

Despite these discoveries, research that uses comprehensive case definitions to systematically characterize diseases at the gross, microscopic, immunological and microbial levels is lacking (226), and consequently the differences in symptoms, causality and possibly novel causative agents are poorly understood (207).

1.3.3 Causative agents of disease

Climate change and human-induced stress are often considered the ultimate causes of the increase in marine invertebrate diseases. Many abiotic and biotic factors have been linked to outbreaks of coral and sponge diseases (208, 235).

Stresses such as high seawater temperature (237), eutrophication (238-242), pollution (243), overfishing (201) and increased carbon dioxide inputs (244) have been linked to outbreaks of coral diseases. Temperature change is often thought to act as an environmental cofactor in pathogen infection of corals (245). Temperature elevation, which can induce an increase in reactive oxygen species (246) and lower antioxidant efficiency (247), has been linked with mass mortality (248, 249). Other relevant factors include nutrient stress (250), heavy metals (228, 251-253), high salinity (254), physical damage (255) and predation (256).

There is evidence that biotic stressors, such as pathogens (240, 257, 258) or pathogenic genes (208, 259), are increasingly introduced into corals and sponges through terrestrial runoff and sewage discharge. In corals, about six bacterial strains have been isolated and experimentally characterized as disease agents (260-266), and other studies have suggested a role for viruses and protozoans in the coral disease process (234, 267, 268). In sponges, however, only one study has identified a primary pathogen as the causative agent of a disease syndrome, in Rhopaloeides odorabile (269). This strain produces a collagenase that degrades the sponge skeletal fibers (165). Abiotic stresses can also act together with pathogens in disease development. In corals, elevated seawater temperatures can affect the frequency and severity of disease outbreaks by increasing the prevalence and virulence of pathogens, facilitating invasions of new pathogens or reducing host resistance and resilience (270).

However, because of the complexity of disease syndromes and the lack of systematic classification, identifying a direct causative agent is usually difficult. It has been speculated that disease-associated taxa may be secondary rather than primary agents in the disease process (208). Such observations illustrate the complex interplay within a holobiont and the difficulty of dissecting cause and effect in many stress-related syndromes.

Attempts to address these questions are further challenged by the difficulty of cultivating symbiotic microbes and the lack of functional information from marker-gene-based studies. Direct searches for virulence functions in disease-associated communities, as in a recent metagenomic study (233), together with systematically designed experiments that simultaneously examine the composition and function of the host and the symbiont community under controlled stress conditions, might help to disentangle some of these issues.

1.3.4 Models for the mechanism of stress-driven disease

In the last decade, many models and hypotheses have been presented to explain the mechanisms of disease in marine invertebrates.

The ‘energetic constraints’ hypothesis proposes that heat can increase the respiration rate of the host animal (271) or affect its physiological response (272, 273) (e.g. by triggering physiological and biochemical responses in the coral animal (274)). This could create an imbalance between high energy expenditure and low energy income when food availability is low during the anomalously prolonged summers caused by global warming. An increased respiration rate has been correlated with food shortage and mass mortality events in marine benthic suspension-feeding taxa, such as gorgonians and sponges (275).

Heat stress can also affect photosynthetic symbionts. In corals, it causes photoinhibition, which damages photosystem II in the zooxanthellae (276, 277) and consequently leads to the production of reactive oxygen species that cause cell and DNA damage. This results in the breakdown of the symbiotic association between the coral host and its photosynthetic zooxanthellae, that is, coral bleaching (278, 279). Similarly, stress has been shown to affect the cyanobacterial symbionts of photosynthetic sponges by reducing their photosynthetic efficiency, and to cause mortality in Mediterranean sponges (215).

Stress can also affect the production of antibiotics by symbionts. A mathematical model based on the heat-induced decline in antibiotic production in corals assumed that long-term loss of antibiotic activity would eliminate a critical component of coral defense against disease, giving pathogens an extended opportunity to infect and spread within the host and thus elevating the risk of coral bleaching, disease and mortality (280). This assumption was based on the observed loss of antibiotic-producing bacteria during bleaching (55) and the increased susceptibility of bleached corals to opportunistic pathogens (281). The model predicted a sudden detrimental shift in the coral microbial community as temperature increases, driven by the ability of invading pathogens to move towards substrate-rich locations and to compete under substrate-limited conditions (280). The predicted community shift persists long after environmental conditions have returned to normal.

Building on the holobiont concept, the ‘hologenome’ theory was developed (53, 71), evolving from the ‘coral probiotic’ hypothesis (282). A hologenome is the sum of the genetic information of a holobiont (i.e. the host genome plus the symbiotic microbiome). The key idea behind this theory is that symbiont genomes can increase the complexity and flexibility of the hologenome. Specifically, the association between host and symbionts can affect the fitness of the holobiont within its environment, for example by allowing it to derive energy from complex compounds (e.g. in insects (73, 283)). Changes in the structure of the symbiont microbiome could increase the capacity of the host to adapt to environmental change, and thus increase its fitness beyond what the host genome alone could achieve. Relatively frequent changes in microbial genomes through conjugation, transduction and DNA transformation enable rapid and substantial evolutionary adaptation via horizontal gene transfer (HGT) (284).


1.4 Techniques for studying microbial ecology and their applications in microbial symbiosis research

1.4.1 Conventional techniques

From the invention of the microscope more than 350 years ago until about two decades ago, almost all microbiological studies were based on microscopic observation and cultivation-based laboratory isolation. Progress in understanding the global taxonomic and functional diversity of microbial communities remained slow and difficult, even though many microorganisms had already been cultivated, maintained and studied in the laboratory. It has long been recognized that only a tiny fraction (<1-5%) of bacteria in situ can be grown in culture (285, 286), which prevents analysis of the metabolism of the uncultured lineages. The challenge is even greater for eukaryote-associated symbionts, as artificially recreating a host-like environment is generally impractical. To overcome this problem, an array of culture-independent methods based on molecular techniques has been developed over the last two decades, leading to two significant revolutions in microbial ecology.

1.4.2 The molecular revolution – phylogenetic marker gene-based surveys

In the last two decades, microbial phylogeny has mainly been based on amplifying and analyzing phylogenetic marker genes, specifically the small subunit ribosomal RNA gene (16S or 18S rRNA gene). The same approach has been used to examine the diversity and taxonomic composition of entire microbial communities (287, 288). 16S rRNA genes are universally found in prokaryotes and contain a mixture of conserved and variable regions, which reflect different levels of evolutionary relationship. These features have been exploited to develop a system of for bacteria, and 16S rRNA gene sequence databases have become the largest repositories of bacterial gene sequences (288). Examples include the Ribosomal Database Project (RDP) (289), Greengenes (290) and the SILVA database (291). These comprehensive databases allow 16S rRNA gene sequences to be classified and continue to contribute to microbial ecology by providing valuable taxonomic information.
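To illustrate how a reference database enables classification, the sketch below assigns a query sequence to the reference taxon with which it shares the largest fraction of 8-nucleotide words (k-mers). This is a deliberately simplified nearest-reference scheme, not the algorithm used by RDP, Greengenes or SILVA (the RDP classifier, for instance, applies a naive Bayesian model to 8-mer words); the function names, reference sequences and taxon names are made up for illustration.

```python
def kmers(sequence, k=8):
    """Return the set of overlapping k-mers (words of length k) in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def classify(query, references, k=8):
    """Assign a query to the reference taxon sharing the largest fraction of its k-mers.

    Returns (taxon, score), where score is the fraction of the query's k-mers
    found in that reference; (None, 0.0) if no k-mer is shared with any reference.
    """
    query_words = kmers(query, k)
    if not query_words:
        raise ValueError("query is shorter than k")
    best_taxon, best_score = None, 0.0
    for taxon, ref_seq in references.items():
        score = len(query_words & kmers(ref_seq, k)) / len(query_words)
        if score > best_score:
            best_taxon, best_score = taxon, score
    return best_taxon, best_score


# Made-up reference "sequences" for two hypothetical taxa (not real 16S data).
references = {
    "taxon_X": "ACGTTGCAGGCTAACGGTCAAGCTTGCAGTCCGATAGCTGACGTTAGC",
    "taxon_Y": "ATATTTAAATTTATATAAATTTAAATATTTAT",
}
query = references["taxon_X"][5:35]  # a 30-nt fragment of taxon_X
print(classify(query, references))  # ('taxon_X', 1.0)
```

Real classifiers add a confidence estimate (for example by bootstrapping over subsets of k-mers) and work down a taxonomic hierarchy, which this sketch omits.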

16S rRNA genes are usually enriched from community samples by amplification via the polymerase chain reaction (PCR) and cloning. The advent of 16S rRNA gene sequencing has revolutionized how microbial ecologists understand the bacterial and archaeal world around them (292). Currently, the vast majority of bacterial phyla are known only from 16S rRNA gene surveys and have no cultured representatives (293, 294). More recent developments in automated sequencing technologies have accelerated this trend. For example, an enormous level of diversity (the ‘rare biosphere’) was revealed by massively parallel 454 pyrosequencing of deep-sea water samples (19). Such methods also enable large-scale experiments analyzing hundreds of samples. Studies of the mammalian gut revealed high diversity and variability of the gut microbiota among individuals, while at the phylum level and are numerically the most dominant groups (295). Gut microbial composition is influenced by diet, host morphology and phylogeny.

Based on the 16S rRNA gene marker, a number of other culture-independent approaches have been developed, including community fingerprinting methods and FISH. Community DNA fingerprinting techniques, such as DGGE (296), terminal restriction fragment length polymorphism (T-RFLP) (297) and automated ribosomal intergenic spacer analysis (ARISA) (13), are widely applied to compare microbial communities and to examine community shifts across temporal and spatial scales. FISH uses probes designed from marker gene sequences to detect the presence, abundance, localization and distribution of specific groups of microorganisms in situ (298, 299). Up to seven different fluorescently labeled probes can be used simultaneously in a single experiment (300).
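The principle behind T-RFLP can be sketched with an in silico digest. Only the fragment carrying the fluorescently labeled primer end is detected, so each phylotype is reduced to the distance between that end and the first restriction site. The sketch below assumes amplicons written 5' to 3' from the labeled forward primer and uses the HhaI site (GCG^C) as an example enzyme; the function names and toy sequences are hypothetical, and real profiles also record peak heights and sizing error.

```python
def terminal_fragment_length(amplicon: str, site: str = "GCGC", cut_offset: int = 3) -> int:
    """Length of the labeled 5' terminal restriction fragment (T-RF).

    Assumes the amplicon starts at the labeled primer and that the enzyme cuts
    `cut_offset` bases into its recognition site (HhaI: GCG^C -> offset 3).
    If the site is absent, the uncut amplicon length is returned.
    """
    pos = amplicon.upper().find(site.upper())
    return len(amplicon) if pos == -1 else pos + cut_offset


def trflp_profile(amplicons: dict[str, str], site: str = "GCGC", cut_offset: int = 3) -> dict[int, int]:
    """Number of phylotypes sharing each T-RF length (peak heights ignored)."""
    profile: dict[int, int] = {}
    for seq in amplicons.values():
        length = terminal_fragment_length(seq, site, cut_offset)
        profile[length] = profile.get(length, 0) + 1
    return dict(sorted(profile.items()))


# Hypothetical toy amplicons, not real 16S rRNA sequences
toy = {"otu1": "AAGCGCTT", "otu2": "AATTAAGCGCAA", "otu3": "ATATAT", "otu4": "TTGCGCAA"}
print(trflp_profile(toy))  # {5: 2, 6: 1, 9: 1}
```

Two phylotypes that share a T-RF length (otu1 and otu4 here) collapse into a single peak, which is one reason fingerprints underestimate richness.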

Marker gene based approaches are subject to many potential technical errors and biases, which can mask a proportion of microbial diversity or produce ‘artificial species’ (301). Sources of error include multiple 16S rRNA gene copies per genome (302) and microheterogeneity between copies (303), DNA extraction and purification bias (304, 305), cloning bias (306), primer bias in PCR (307, 308), chimeric amplification (309), PCR-induced mutations (310) that depend on PCR conditions (311), and platform-specific sequencing errors (312, 313). It therefore remains an open question in environmental microbiology how much of the ‘rare biosphere’ is truly biological and how much reflects methodological errors (19, 310, 314-316).
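The copy-number bias listed above can be illustrated with a simple correction: dividing each taxon's 16S rRNA read count by its gene copy number before renormalizing gives an estimate of relative cell abundance. A minimal sketch, assuming copy numbers are already known (for example from complete genomes, as in the 16S gene copy column of Table 2.1); the taxon names and counts are hypothetical.

```python
def copy_number_corrected(read_counts: dict[str, int], copy_numbers: dict[str, int]) -> dict[str, float]:
    """Estimate relative cell abundances from 16S rRNA read counts.

    Each count is divided by the taxon's 16S rRNA gene copy number, so that
    multi-copy taxa are not over-represented, and the result is renormalized to 1.
    """
    cells = {taxon: count / copy_numbers[taxon] for taxon, count in read_counts.items()}
    total = sum(cells.values())
    return {taxon: value / total for taxon, value in cells.items()}


reads = {"taxonA": 700, "taxonB": 300}  # hypothetical raw read counts (0.7 / 0.3)
copies = {"taxonA": 7, "taxonB": 1}     # hypothetical 16S rRNA copy numbers
print(copy_number_corrected(reads, copies))  # {'taxonA': 0.25, 'taxonB': 0.75}
```

A taxon that dominates the raw reads can therefore turn out to be the minority in terms of cells.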

Another obvious limitation of marker gene based surveys is that they rarely provide functional details for a specific microorganism, as there is often no fixed link between phylogeny and function (317). High genomic variation can exist between bacterial strains with identical or nearly identical ribosomal RNA genes. Genomic analysis of multiple strains within a species has revealed the existence of a core genome (genes shared by all strains of a species) and a flexible genome (genes shared by a few strains or present only in a particular strain) (318). While the core genome of a bacterium is composed of housekeeping, regulatory, cell envelope and transport genes, the flexible (accessory) genome of the ‘pan genome’ (319) usually contains supplementary biochemical pathways and functions that are not essential for bacterial growth. However, the flexible genome, which is shaped by gene duplication, domain shuffling, sequence drift and HGT, confers selective advantages such as adaptation to different niches, antibiotic resistance or colonization of a new host. These functions are often responsible for the specific ecological roles of a strain, including virulence in pathogens, and are usually of most interest to microbial ecologists (320). Studies of the functional composition of microbial communities are therefore required to predict their specific ecological roles.
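The core/flexible distinction is a set operation over gene families. The sketch below assumes each strain has already been reduced to a set of gene-family identifiers (for example by orthologue clustering, which is not shown); the strain names and gene assignments are hypothetical.

```python
def pan_genome_partition(strain_genes: dict[str, set[str]]):
    """Split gene families into core, flexible and strain-specific sets.

    `strain_genes` maps each strain to the set of gene-family identifiers it
    carries (assumed to come from a prior orthologue clustering step).
    """
    gene_sets = list(strain_genes.values())
    pan = set().union(*gene_sets)        # all gene families in the species
    core = set.intersection(*gene_sets)  # present in every strain
    flexible = pan - core                # absent from at least one strain
    specific = {g for g in flexible if sum(g in s for s in gene_sets) == 1}
    return core, flexible, specific


# Hypothetical gene content of three strains
strains = {
    "strain1": {"dnaA", "gyrB", "rpoB", "nifH"},
    "strain2": {"dnaA", "gyrB", "rpoB", "virB4"},
    "strain3": {"dnaA", "gyrB", "rpoB", "nifH"},
}
core, flexible, specific = pan_genome_partition(strains)
print(sorted(core), sorted(flexible), sorted(specific))
# ['dnaA', 'gyrB', 'rpoB'] ['nifH', 'virB4'] ['virB4']
```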

1.4.3 The most recent technology revolution – meta-omics

Functional studies of microbial communities started with the amplification of specific functional genes (e.g. by quantitative polymerase chain reaction (qPCR) (321)) and with the functional screening (322) and sequencing of environmentally derived genomic fragments. Following the well-known Global Ocean Sampling expedition (323, 324) and other pioneering studies (12, 325-329), sequencing entire community genomes or transcriptomes and analyzing entire community proteomes became powerful, state-of-the-art approaches in microbial ecology (330). These approaches were greatly facilitated by new technologies, such as second-generation sequencing platforms (e.g. pyrosequencing, Illumina and SOLiD (331)) and high-throughput mass spectrometry (MS) analyzers (332).
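Absolute quantification of a functional gene by qPCR commonly relies on a standard curve: the threshold cycle (Ct) is linear in the logarithm of the starting copy number, Ct = m·log10(N0) + b, and the amplification efficiency follows from the slope as E = 10^(-1/m) - 1. The sketch below fits such a curve by least squares to a hypothetical dilution series and converts an unknown Ct into copies; it is a generic illustration, not the protocol of reference (321).

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ct_values) / len(ct_values)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def efficiency(slope):
    """Amplification efficiency; a slope near -3.32 means roughly 100 % (doubling per cycle)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate starting copies of an unknown sample."""
    return 10 ** ((ct - intercept) / slope)


standards = [1e3, 1e4, 1e5, 1e6]  # hypothetical dilution series (copies per reaction)
cts = [35 - 3.3 * math.log10(c) for c in standards]  # hypothetical, noise-free Ct values
slope, intercept = fit_standard_curve(standards, cts)
print(round(slope, 2), round(intercept, 2), round(efficiency(slope), 3))  # -3.3 35.0 1.009
print(round(copies_from_ct(20.15, slope, intercept)))  # 31623, i.e. 10**4.5
```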

Sequence-based metagenomic analysis uses a random shotgun sequencing approach originally developed for whole-genome sequencing. Environmental DNA extracted from a microbial community sample is sequenced, either directly or after an amplification step (e.g. multiple displacement amplification, MDA (333, 334)). Many bioinformatic tools have been developed to analyze metagenomic data, including tools for pre-processing, phylogenetic analysis, binning, assembly, coding and non-coding gene identification, functional annotation, sample comparison and data deposition (322, 335-342). The sequence-based approach represented a big step beyond the then-prevailing marker gene based surveys, as it offered a relatively unbiased view not only of community structure but also of the community’s functional potential (343).
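As one example of the pre-processing step mentioned above, pyrosequencing runs are known to contain artificial duplicate reads that start at the same position and can inflate apparent abundances. The sketch below applies simple length and ambiguity filters and keeps one read per identical 5' prefix; the thresholds and the prefix criterion are simplified, hypothetical choices rather than the procedure of any particular tool.

```python
def preprocess_reads(reads, prefix_len=50, min_len=100, max_n=0):
    """Simplified pre-processing of (read_id, sequence) pairs.

    1. Drop reads shorter than `min_len` or with more than `max_n` ambiguous N bases.
    2. Treat reads sharing an identical 5' prefix of `prefix_len` bases as
       artificial duplicates and keep only the longest of each group.
    Thresholds are illustrative. Returns the identifiers of retained reads.
    """
    kept = {}
    for read_id, seq in reads:
        seq = seq.upper()
        if len(seq) < min_len or seq.count("N") > max_n:
            continue
        key = seq[:prefix_len]
        if key not in kept or len(seq) > len(kept[key][1]):
            kept[key] = (read_id, seq)
    return [read_id for read_id, _ in kept.values()]


# Hypothetical toy reads
reads = [
    ("r1", "ACGT" * 30),           # 120 bp
    ("r2", "ACGT" * 30 + "AA"),    # same 5' prefix as r1, but longer
    ("r3", "TTTT" + "ACGT" * 29),  # different prefix
    ("r4", "N" + "ACGT" * 30),     # contains an ambiguous base
    ("r5", "ACGT" * 10),           # too short
]
print(preprocess_reads(reads))  # ['r2', 'r3']
```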

Following the initial success of metagenomic studies, many large metagenomic sequencing projects have targeted a diverse range of environments, with a particular focus on the marine environment (344, 345). Recently, the metagenomic approach has also been applied to host-associated microbial communities, such as the human microbiota (102, 194, 346-348), the bovine rumen (349, 350), freshwater (351), termite hindguts (328) and marine eukaryotes such as macroalgae (68), corals (233, 352) and sponges (190).

Many early metagenomic studies, for example the study of the acid mine drainage microbiota (12), focused on the targeted exploration of specific microbial populations. The main limitation at that time was the high cost of sequencing. With the continuing reduction of sequencing costs, however, experimental designs with appropriate replication and statistical analysis are now within reach to address fundamental questions in microbial ecology. Comparative metagenomic approaches have provided great insight into the distinct gene complements of nine different ecosystems (326), of deep and surface planktonic ocean communities (353-356), and of the gut microbiota of lean and obese twins (194). However, large-scale comparisons with explicit experimental designs and solid statistical support are still scarce in metagenomics.
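A basic statistical comparison of this kind asks whether a function is over-represented in one metagenome relative to another. A minimal sketch, assuming reads have already been annotated and counted: a two-sided Fisher's exact test on the 2x2 table of reads assigned to the function versus all other reads in each sample. The counts are hypothetical, and for real datasets a library implementation such as `scipy.stats.fisher_exact`, together with correction for multiple testing across functions, would be preferable.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are two metagenomes; columns are reads annotated to the function of
    interest and all remaining reads. The p-value sums the hypergeometric
    probabilities of all tables (with the same margins) that are no more
    likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # probability of x function reads in row 1, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))  # 0.4857
```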

While metagenomics investigates the functional potential of a microbial community, metatranscriptomics and metaproteomics can reveal which of these functions are actively used by the community under various environmental conditions. Metatranscriptomics sequences environmental cDNA, either directly or after an amplification step. As both rRNA and mRNA are reverse transcribed into cDNA, both the community structure based on 16S rRNA phylogeny (357) and the functional activity of the community at the time of sampling (358) can be analyzed (i.e. the ‘double RNA approach’ (357)). In combination with high-throughput sequencing technology, metatranscriptomics has been applied to a variety of environments (357-364), including eukaryote-associated communities (347).
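The ‘double RNA approach’ requires cDNA reads to be split into an rRNA fraction (for community structure) and a non-rRNA, putatively mRNA-derived fraction (for functional activity). The sketch below does this by counting k-mers shared with a pooled set of reference rRNA sequences; k, the sharing threshold and the toy sequences are illustrative assumptions, and dedicated tools rely on curated rRNA databases and more sensitive alignment.

```python
def kmers(seq, k):
    """Set of all k-mers in an (upper-cased) sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def split_rrna_reads(reads, rrna_refs, k=15, min_shared=3):
    """Partition (read_id, sequence) pairs into putative rRNA and non-rRNA reads.

    A read is called rRNA if it shares at least `min_shared` k-mers with the
    pooled k-mer set of the reference rRNA sequences (illustrative thresholds).
    """
    ref_kmers = set().union(*(kmers(ref, k) for ref in rrna_refs))
    rrna, non_rrna = [], []
    for read_id, seq in reads:
        shared = len(kmers(seq, k) & ref_kmers)
        (rrna if shared >= min_shared else non_rrna).append(read_id)
    return rrna, non_rrna


ref = "GATTACAGGCTTACGATCGGATCCAGTTCAAG"  # hypothetical toy "rRNA" reference
reads = [("r1", ref[5:25]), ("r2", "T" * 20)]
print(split_rrna_reads(reads, [ref]))  # (['r1'], ['r2'])
```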


Metaproteomic approaches extract the proteins of an entire community and identify them by MS, using spectral matching or de novo identification (365, 366). Protein identification is based on MS, tandem MS (MS/MS) or data from other techniques (332, 366). In addition to the rapid development of MS, the enormous increase in genomic and metagenomic data and improvements in computing power and bioinformatics provide a more solid basis for protein identification (366). Metaproteomics has been applied to various environmental samples, including host-associated microbial communities such as those of deep-sea tube worms (367, 368), the termite hindgut (369) and humans (370-372).
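Database-driven protein identification rests on comparing measured peptide masses with masses predicted from (meta)genomic sequences. The sketch below shows only the first, precursor-mass step: an in silico tryptic digest (cleavage after K or R, except before P), monoisotopic peptide masses, and matching within a ppm tolerance. It assumes unmodified residues, neutral masses rather than m/z values and no missed cleavages; real search engines additionally score MS/MS fragment spectra. The protein sequence is a hypothetical toy example.

```python
# Monoisotopic residue masses (Da); one water is added per peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR" and (i + 1 == len(protein) or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_precursor(observed_mass, proteins, tol_ppm=10.0):
    """Return (protein, peptide) pairs whose predicted mass lies within tol_ppm."""
    hits = []
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq):
            theo = peptide_mass(pep)
            if abs(observed_mass - theo) / theo * 1e6 <= tol_ppm:
                hits.append((name, pep))
    return hits


proteins = {"toy_protein": "MKWVTFISLLRPAKGEIR"}  # hypothetical sequence
print(tryptic_digest(proteins["toy_protein"]))  # ['MK', 'WVTFISLLRPAK', 'GEIR']
```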

Although meta-omic approaches provide much richer information about microbial communities than marker gene approaches, they are also subject to technological limits. First, sufficient amounts of DNA, RNA or protein are essential for successful meta-omic analysis, but these can be hard to obtain from certain samples. In metagenomics, an additional amplification step (e.g. MDA) is often used to increase the amount of DNA for sequencing. However, this step may introduce biases through reagent contamination, chimera formation and uneven amplification, which can have a significant impact on subsequent metagenomic community analysis (373). It is therefore necessary to consider whether amplification is permissible (342). RNA and proteins are easily degraded by contaminating enzymes, so specific preservation steps are required (e.g. RNAlater and protease inhibitors), while amplification of RNA is also an option (329, 358). Second, contamination from undesirable laboratory species or host tissue during sampling can overwhelm the final data and greatly bias the results. Third, although the cost of high-throughput sequencing keeps declining, sequencing every sample in replicate to sufficient depth is still often unaffordable for many research projects. Small-scale experiments are therefore useful to understand the magnitude of variation inherent in a system and to determine whether a larger sample size or greater sequencing effort is required to obtain statistically meaningful results (374). Fourth, the short, fragmented reads produced by high-throughput sequencing platforms are difficult to annotate phylogenetically or functionally. Assembly is sometimes used to obtain longer genomic contigs, although care is required to avoid mis-assemblies (342). Fifth, like marker gene approaches, meta-omic studies are subject to the sequencing errors of specific platforms. While pre-processing can reduce the effect of sequencing errors, shotgun data can also benefit from error correction during assembly. Sixth, in contrast to ribosomal RNA databases, functional annotation databases contain only currently characterized proteins and genes. In general, only a small proportion of the proteins in metagenomic datasets can be successfully annotated.
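The trade-off between sequencing depth and community complexity can be made concrete with the Lander-Waterman (Poisson) approximation. For a population contributing a fraction p of N reads of length L, with genome size G, the expected fold coverage is c = pNL/G, and the expected fraction of its genome covered at least once is 1 - e^(-c). A minimal sketch, assuming uniformly random reads; the run size (1,000,000 reads of 350 nt) and the 5 Mb genome are hypothetical example values.

```python
import math


def expected_coverage(total_reads, read_len, read_fraction, genome_size):
    """Mean fold coverage c = p * N * L / G of one population in a metagenome.

    `read_fraction` (p) is the fraction of all reads derived from the population,
    which differs from its cell abundance when genome sizes differ.
    """
    return read_fraction * total_reads * read_len / genome_size


def fraction_covered(coverage):
    """Expected fraction of the genome covered by at least one read (Poisson model)."""
    return 1 - math.exp(-coverage)


for p in (0.1, 0.01, 0.001):  # hypothetical read fractions
    c = expected_coverage(1_000_000, 350, p, 5_000_000)
    print(f"p={p}: coverage {c:.2f}x, genome covered {fraction_covered(c):.1%}")
# p=0.1: coverage 7.00x, genome covered 99.9%
# p=0.01: coverage 0.70x, genome covered 50.3%
# p=0.001: coverage 0.07x, genome covered 6.8%
```

Even a population making up 1% of the reads would, under these assumptions, have only about half of its genome sampled, which illustrates why rare community members are poorly represented in shotgun data.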

1.4.4 Reducing the community complexity – sample fractionation and enrichment in meta-omic study

While direct analysis of environmental DNA, RNA or proteins minimizes sampling bias, the high complexity of most natural microbial communities can greatly challenge downstream analysis. For symbiont samples, contamination with complex eukaryotic host DNA, RNA or protein can significantly decrease sequencing and analysis efficiency. Therefore, fractionation to obtain simplified communities with fewer species and without host cells is often required. Various strategies have been explored to simplify communities for meta-omic studies.

During sample collection, centrifugation and filtration are often used to fractionate cells by size (323). Flow cytometry (FCM) is also sometimes used to sort cells based on their size and fluorescence properties (375). Cells can be pre-fixed and labeled with rRNA-targeted fluorescent probes to increase sorting resolution. However, a limitation of FCM is its low throughput. In single-cell sequencing (SCS), often only one or a few cells are obtained, so genomic amplification is required to produce sufficient DNA for (meta)genomic sequencing (375).

During library construction, clones can be selected based on phylogenetic or functional criteria. In the first approach, clones are screened for specific phylogenetic markers, and positive clones are then sequenced by the shotgun approach (376-378). In the second approach, environmental DNA is cloned into expression vectors and propagated in suitable host strains such as Escherichia coli. Clones are then screened for specific functions (phenotypes), such as the production of bioactive compounds and natural products (379, 380), and the origin of positive clones can be inferred from phylogenetic markers or sequence compositional signatures. This approach does not require prior knowledge of gene sequences and has led to the discovery of novel genes and pathways (380).
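How many clones such screens need can be estimated with the Clarke-Carbon formula, N = ln(1 - P) / ln(1 - f), where P is the desired probability that a given locus is present in the library and f is the fraction of the cloned DNA covered by one insert. A sketch, assuming random, uniform cloning and, for a community, that f scales with the fraction of DNA contributed by the target genome; the insert size, genome size and abundance below are hypothetical. The formula ignores that a gene must lie entirely within an insert to be expressed, so it gives a lower bound for functional screens.

```python
import math


def clones_needed(prob, insert_size, genome_size, rel_abundance=1.0):
    """Clarke-Carbon estimate of library size.

    f = insert_size * rel_abundance / genome_size approximates the chance that a
    random clone carries a given locus of the target genome (uniform cloning assumed).
    """
    f = insert_size * rel_abundance / genome_size
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


# Hypothetical: 40 kb inserts, 4 Mb target genome
print(clones_needed(0.99, 40_000, 4_000_000))        # 459 (pure culture)
print(clones_needed(0.99, 40_000, 4_000_000, 0.01))  # 46050 (genome is 1 % of community DNA)
```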


During data analysis, sequences of different phylogenetic origins can be separated or decontaminated by binning (342), the process of sorting DNA sequences into groups that might represent an individual genome or genomes from closely related organisms. Binning algorithms exploit two types of information contained within a DNA sequence. Compositional binning makes use of the specific and conserved nucleotide composition of genomes, while similarity-based binning classifies sequences according to their similarity to known sequences in a reference database (381-386). Binning accuracy depends on the complexity of the community, the read length and the quality of the reference database. Although resolution and accuracy are often low, binning has been successfully used to remove eukaryotic contaminant sequences from metagenomic data (190, 328).
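Compositional binning can be sketched with tetranucleotide frequencies, a commonly used genomic signature. The sketch below assigns each fragment to the reference bin with the closest 256-dimensional frequency profile (Euclidean distance). It is a simplified, supervised illustration: real tools typically pool reverse-complement counts, need longer fragments and often cluster without references; the bin names and sequences are hypothetical.

```python
import math
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
TETRA_INDEX = {t: i for i, t in enumerate(TETRAMERS)}


def tetranucleotide_profile(seq):
    """Normalized 256-dimensional tetranucleotide frequency vector."""
    counts = [0] * len(TETRAMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = TETRA_INDEX.get(seq[i:i + 4])
        if idx is not None:  # windows with ambiguous bases are skipped
            counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_to_bins(fragments, reference_bins):
    """Assign each fragment to the reference bin with the most similar profile."""
    ref_profiles = {name: tetranucleotide_profile(s) for name, s in reference_bins.items()}
    assignment = {}
    for frag_id, seq in fragments.items():
        prof = tetranucleotide_profile(seq)
        assignment[frag_id] = min(ref_profiles, key=lambda name: distance(prof, ref_profiles[name]))
    return assignment


# Hypothetical reference bins and fragments
bins = {"GC_rich": "GCCGGCGCGGCC" * 6, "AT_rich": "ATTAATATTTAA" * 6}
frags = {"f1": "GCCGGCGCGGCC" * 3, "f2": "ATTAATATTTAA" * 3}
print(assign_to_bins(frags, bins))  # {'f1': 'GC_rich', 'f2': 'AT_rich'}
```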

1.4.5 Techniques in conventional and contemporary sponge symbiosis research

The history of research in sponge symbiosis reflects the technical development of microbiology (74). Microscopy first revealed the high abundance of microorganisms in sponges with various morphologies (130, 135, 387-389). Subsequently, morphologically diverse bacteria were isolated from various marine sponges using different cultivation approaches (132, 172, 390, 391). However, cultivation of sponge microorganisms has proved extremely difficult, probably because of nutrient interdependence between the host and its symbionts, and the species composition of cultured bacterial communities often differs from that of the uncultured community. For example, only 0.1-1% of the total community in the sponge R. odorabile was estimated to be culturable (132, 391). A later study using additional media and supplements increased the culturable proportion to 5% for two deep-water Scleritoderma spp. sponges (392). These findings further support the notion that the majority of microorganisms are not easily cultured using standard microbiological techniques (293).

Culture-independent tools have greatly accelerated the understanding of the phylogeny of sponge-associated microbes in the last two decades (reviewed in (74, 127, 129, 192, 393-395)). 16S rRNA gene based sequencing and fingerprinting approaches revealed a great diversity of sponge microorganisms and their distribution patterns with respect to stability, host specificity, space and time. The application of next-generation sequencing technologies further led to the discovery of the rare biosphere in sponge microbial communities and its potential relationship to free-living taxa (167). FISH approaches were used to describe the small-scale distribution of symbionts in sponge tissue (167, 396). The combination of 16S rRNA gene based studies and isotope experiments has linked community phylogeny to metabolism in sponge symbionts (397, 398).

Omic approaches have recently been applied in sponge microbial studies. The first genome of a sponge symbiont was determined from enriched cells of a thaumarchaeote (399, 400). The genome of a poribacterium was sequenced after fluorescence-activated cell sorting (FACS) and genomic amplification (191, 401). Two other partial genomes, from a sponge-specific clade (401) and from an alphaproteobacterium (402), were also obtained. The first metagenomic study was conducted on the sponge C. concentrica (190), and a metatranscriptomic study of another sponge was recently published (403). These analyses provided unprecedented insights into sponge symbiotic functions such as ammonium oxidation and autotrophic carbon fixation via the Wood-Ljungdahl pathway, and revealed the high abundance of transporters, transposases and ELPs.

1.5 Aims of the thesis

Previous investigations have provided a good understanding of the phylogenetic diversity of microbial symbionts in marine sponges. Recent -omic studies have also given first insights into the functional features of these symbionts. However, some fundamental questions remain to be addressed. Firstly, how does the potential bias of 16S rRNA gene based PCR approaches affect estimates of sponge microbial diversity? Secondly, is there a functional core among divergent sponge species with distinct symbiont communities? Thirdly, what is the expression profile of the sponge microbiota at the proteomic level? Finally, how does the phylogenetic, genomic and proteomic composition of the sponge holobiont change under stress? This PhD project aimed to address these questions using both conventional and state-of-the-art techniques, including 16S rRNA gene based fingerprinting and hybridization, high-throughput sequencing and MS-based proteomic analysis.


1.6 Chapter synopsis

Chapter 2 describes the development of a bioinformatic pipeline to construct nearly full-length 16S rRNA gene sequences from 454 shotgun sequencing data. Through simulation of communities with different diversities, the process was optimized with stringent assembly and data filtering and generated 16S rRNA contigs with a minimal chimera rate. Using this strategy, the microbial communities of two sponges were reconstructed from shotgun sequences and compared with the results of a pyro-tag PCR approach. About 30% of the abundant phylotypes reconstructed from metagenomic reads failed to be amplified by PCR, most likely due to primer mismatches. The assembly-based pipeline not only detected sequences belonging to previously identified sponge-specific clades, but also discovered some novel candidate clades.

In Chapter 3, an explicit experimental design was employed to determine the phylogenetic and functional profiles of microbial communities associated with six sponge species and with seawater samples. Common functions were identified in 18 sponge microbiomes, demonstrating the existence of functional equivalence. These core functions were not only consistent with the current understanding of the biological and ecological roles of sponge-associated microorganisms, but also provided insight into novel symbiont functions. Importantly, in different sponge species the core functions were provided by analogous enzymes and biosynthetic pathways. Moreover, the abundance of elements involved in HGT suggested their key role in the genomic evolution of symbionts. These data thus demonstrated evolutionary convergence in complex symbiont communities and revealed the details and mechanisms that underpin the process.

Chapter 4 describes a combined metagenomic and metaproteomic approach used to characterize the functional features of the microbial community of C. concentrica. The expression of specific transport functions for typical sponge metabolites, such as halogenated aromatics and dipeptides, was detected, demonstrating metabolic interactions between the microbial community and the host. The simultaneous operation of aerobic nitrification and anaerobic denitrification, which would aid in the removal of ammonium secreted by the sponge, was also detected. The analysis further highlighted the need for the microbial community to respond to variable environmental conditions, based on the detection of an array of stress protection proteins. Evidence is presented that molecular interactions between symbionts and their host can be mediated by a set of expressed ELPs and cell-cell mediators. Finally, some sponge-associated bacteria (e.g. a sponge-associated Phyllobacteriaceae phylotype) appeared to still be undergoing evolutionary adaptation to the sponge environment, as indicated by active mobile genetic elements.

Chapter 5 describes the use of the sponge R. odorabile and its symbiont community as an experimental model to investigate holobiont function under controlled temperature stress. Community fingerprinting, metagenomics and metaproteomics were used to characterize the structure and function of the symbiont community, and gene expression analysis was used to define the host state. The results showed that temperature stress caused a decline in the health of the sponge holobiont, characterized by the disruption of symbiotic interactions and a heat stress response in both the host and the symbionts. The symbiotic disruption included changes in metabolic interactions and other essential symbiont functions, which occurred before dramatic changes in community structure. The disturbed holobiont then offered niches that were rapidly occupied by a new set of bacteria, which lacked the capacity for symbiotic interactions and instead opportunistically scavenged the decaying host.

A general discussion of the project is given in Chapter 6. The overall conclusions of the project are summarized, models for the assembly and dynamics of symbiont communities are presented, and potential future research on sponge microbial communities is proposed.

27 Chapter Two: Reconstruction Ribosomal Genes

Chapter Two Reconstruction of Ribosomal Genes From Metagenomic Data

2.1 Introduction

Over the last two decades, PCR amplification and sequencing of 16S rRNA genes directly from environmental samples have revealed an astonishing amount of new microbial diversity (404, 405). However, as the ‘universal’ primers used in PCR are designed from already known groups of organisms, they are likely to give a skewed picture of community composition, especially for environmental samples containing divergent microbial lineages (307).
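Primer coverage is commonly evaluated by counting mismatches between a degenerate primer and its binding site in reference sequences. The sketch below does this with IUPAC ambiguity codes, using the widely used bacterial primer 27F (AGAGTTTGATCMTGGCTCAG) purely as an example; this does not imply it is the primer used in this study. Real evaluations also weight mismatches near the 3' end more heavily and consider both strands and alignment gaps, which this sketch omits.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT", "S": "CG",
    "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT", "H": "ACT",
    "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, site):
    """Mismatches between a (possibly degenerate) primer and an equal-length target site."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(base not in IUPAC[p] for p, base in zip(primer.upper(), site.upper()))


def best_binding(primer, template):
    """Fewest mismatches of the primer over all positions of the template (forward strand only)."""
    n = len(primer)
    if len(template) < n:
        raise ValueError("template shorter than primer")
    return min(count_mismatches(primer, template[i:i + n]) for i in range(len(template) - n + 1))


primer_27f = "AGAGTTTGATCMTGGCTCAG"
print(count_mismatches(primer_27f, "AGAGTTTGATCCTGGCTCAG"))  # 0 (M matches C)
print(count_mismatches(primer_27f, "AGAGTTTGATTATGGCTCAG"))  # 1 (hypothetical variant site)
```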

The direct sequencing of total environmental DNA (metagenomics) has the potential to assess the true diversity of an environment without primer bias (309, 323). Metagenomic sequences can be assigned to taxa by comparison with reference genomes, based on either sequence similarity (386, 406-408) or nucleotide composition (381-383, 385). However, such assignments are only informative when the genomes of closely related taxa are present in the reference set. As reference genomes are available for only a limited part of the tree of life (294), these taxonomic predictions are generally of low resolution (e.g. phylum or order) and hence often give an unsatisfactory description of community composition.
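Similarity-based assignment is often made robust to near-ties between distantly related references by reporting the lowest common ancestor (LCA) of all good hits, the idea used by tools such as MEGAN. The sketch below keeps hits scoring within a hypothetical 90% of the best score and returns their shared lineage; the hit list is invented for illustration. It also shows why resolution is often coarse: when the best hits come from different orders, the read can only be placed at class level.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of rank-ordered lineages (domain first)."""
    lca = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def assign_read(hits, top_fraction=0.9):
    """LCA of all hits scoring within `top_fraction` of the best hit.

    `hits` is a list of (lineage, score) pairs, e.g. from a similarity search
    against reference genomes (the cut-off is an illustrative assumption).
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    kept = [lineage for lineage, score in hits if score >= top_fraction * best]
    return lowest_common_ancestor(kept)


# Hypothetical hits for one metagenomic read
hits = [
    (["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhizobiales"], 100),
    (["Bacteria", "Proteobacteria", "Alphaproteobacteria", "Rhodobacterales"], 95),
    (["Bacteria", "Firmicutes", "Bacilli", "Bacillales"], 40),
]
print(assign_read(hits))  # ['Bacteria', 'Proteobacteria', 'Alphaproteobacteria']
```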

In contrast, several comprehensive databases exist for the 16S rRNA gene that provide detailed phylogenetic trees and allow taxonomic resolution down to the species level (6). Shotgun metagenomic datasets naturally also contain fragments of 16S rRNA genes, and these have been directly assigned to taxa through BLAST-based comparisons (323) or phylogenetic distance-based clustering (409). However, short, randomly positioned metagenomic reads may not cover the phylogenetically most informative regions of the 16S rRNA gene, which diminishes the efficiency of taxonomic assignment. Sequence assembly can potentially increase the length of the recovered 16S rRNA gene sequences (324), but low sequence coverage may limit assembly success for 16S rRNA genes, and low-stringency assemblies may produce chimeric sequences (410, 411). Recently, the EMIRGE software was released, which uses iterative mapping of short Illumina reads against reference sequences to reconstruct 16S rRNA genes (410). Although this approach can resolve sequences differing by a single nucleotide, its ability to avoid chimeras depends strongly on the quality of the reference database. Further, the EMIRGE algorithm is currently not designed for pyrosequencing reads, which contain high rates of insertion and deletion errors (e.g. in homopolymers) (312). There is thus a need for an approach that reconstructs 16S rRNA genes with high accuracy from pyrosequencing data.
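The trade-off between assembly stringency and chimera formation can be illustrated with a greedy overlap merge that joins two reads only when the suffix of one matches the prefix of the other over a minimum length and identity. The thresholds below are hypothetical, and the ungapped comparison is a deliberate simplification: it would not tolerate the homopolymer indels typical of pyrosequencing, which real assemblers handle with gapped alignment. Lowering either threshold lets reads from different organisms that share conserved 16S rRNA regions be joined, which is how chimeric contigs arise.

```python
def stringent_merge(left, right, min_overlap=40, min_identity=0.98):
    """Join two reads if a suffix of `left` overlaps a prefix of `right`.

    The longest overlap of at least `min_overlap` bases whose ungapped identity
    is >= `min_identity` is used; None is returned if no overlap qualifies.
    Thresholds are illustrative, not tuned values.
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for olen in range(min(len(left), len(right)), min_overlap - 1, -1):
        a, b = left[-olen:], right[:olen]
        identity = sum(x == y for x, y in zip(a, b)) / olen
        if identity >= min_identity:
            return left + right[olen:]
    return None


core = "ACGTTGCAAGGCTTACCGATGCATGCCGTAGCTAGGCTAACGTCA"  # hypothetical 45 bp shared region
print(stringent_merge("TTTTT" + core, core + "GGGGG"))  # TTTTT + core + GGGGG
print(stringent_merge("TTTTT" + core, "G" * 50))        # None
```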

In the present study, a strategy to reconstruct nearly full-length 16S rRNA sequences from metagenomic shotgun data was developed. Through simulation of communities with different diversities, a process of stringent assembly and data filtering was optimized to generate 16S rRNA contigs with a minimal chimera rate. The process was then applied to assess the microbial symbiont communities of two marine sponge species, and the outcome was compared with PCR-based assessments of community structure (pyro-tag-sequencing). About 30% of the abundant phylotypes reconstructed from metagenomic reads failed to be amplified by PCR, most likely due to primer mismatches.
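Once phylotypes from both methods are defined consistently, the comparison summarized above reduces to a set difference. A sketch, assuming metagenome-derived phylotypes with relative abundances, a set of phylotypes detected by pyro-tag sequencing under the same definitions, and a hypothetical abundance cut-off for "abundant"; the toy values are not data from this study.

```python
def fraction_missed_by_pcr(metagenome_abundance, pyrotag_phylotypes, min_abundance=0.01):
    """Fraction of abundant metagenome-derived phylotypes absent from the PCR data.

    `min_abundance` is an illustrative cut-off defining "abundant" phylotypes.
    """
    abundant = {p for p, ab in metagenome_abundance.items() if ab >= min_abundance}
    if not abundant:
        return 0.0
    return len(abundant - set(pyrotag_phylotypes)) / len(abundant)


shotgun = {"otu1": 0.20, "otu2": 0.10, "otu3": 0.05, "otu4": 0.001}  # hypothetical
pyrotags = {"otu1", "otu3", "otu4"}                                  # hypothetical
print(round(fraction_missed_by_pcr(shotgun, pyrotags), 3))  # 0.333
```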

2.2 Materials and Methods

2.2.1 Simulated metagenomes and metagenomic samples

Ninety complete genomes (76 bacterial and 14 archaeal) were selected as references and combined using established profiles of community diversity with high (HC), medium (MC) and low (LC) complexity (412) (Table 2.1). Genomic sequences, 16S rRNA gene sequences and gene copy numbers per genome were obtained from the Integrated Microbial Genomes (IMG) website (http://img.jgi.doe.gov/cgi-bin/w/main.cgi). Heterogeneous 16S rRNA genes within a genome were considered separately. For each complexity level, three read datasets (1,000,000 reads each, 350 nt) were simulated using empirically derived, context-based error models (GemSIM software (413)).
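The design of the simulated datasets, a fixed number of reads drawn from each reference genome according to a complexity profile (per-genome counts as listed for each dataset in Table 2.1), can be sketched as follows. This is a deliberately simplified stand-in for GemSIM: errors are uniform random substitutions at a hypothetical rate, reads are taken from the forward strand only, and the empirically derived, context-dependent error models (including pyrosequencing indels) used in the study are not reproduced. The genomes and counts in the example are toy values.

```python
import random


def simulate_reads(genomes, read_counts, read_len=350, sub_rate=0.01, seed=0):
    """Draw error-containing shotgun reads from reference genomes.

    genomes:     genome name -> sequence (each must be at least read_len long)
    read_counts: genome name -> number of reads to draw (e.g. one dataset column)
    Errors are uniform substitutions at `sub_rate`, a simplification of real error models.
    """
    rng = random.Random(seed)
    reads = []
    for name, count in read_counts.items():
        genome = genomes[name]
        for i in range(count):
            start = rng.randrange(len(genome) - read_len + 1)
            bases = list(genome[start:start + read_len])
            for j, base in enumerate(bases):
                if rng.random() < sub_rate:
                    bases[j] = rng.choice([b for b in "ACGT" if b != base])
            reads.append((f"{name}_{i}", "".join(bases)))
    rng.shuffle(reads)  # mix reads from different genomes, as in a real run
    return reads


toy_genomes = {"g1": "ACGT" * 200, "g2": "TTGCA" * 200}  # hypothetical toy genomes
reads = simulate_reads(toy_genomes, {"g1": 5, "g2": 3}, read_len=50)
print(len(reads), {len(seq) for _, seq in reads})  # 8 {50}
```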

Three environmental DNA samples for each of the two sponges C. concentrica and


Table 2.1 The simulated datasets.

Domain | Organism | IMG Taxon Object ID | NCBI Taxon ID | Genome size (bp) | 16S genes | Unique 16S genes | HC-A | HC-B | HC-C | MC-A | MC-B | MC-C | LC-A | LC-B | LC-C
Archaea | Archaeoglobus fulgidus VC-16, DSM 4304 | 638154502 | 224325 | 2178400 | 1 | 1 | 9061 | 8584 | 13352 | 7940 | 5138 | 5138 | 21752 | 6647 | 7335
Archaea | Candidatus Korarchaeum cryptofilum OPF8 | 641522611 | 374847 | 1590757 | 1 | 1 | 19075 | 9537 | 11922 | 6539 | 6539 | 7006 | 7251 | 6647 | 6724
Archaea | Desulfurococcus mucosus DSM 2162 | 649633040 | 765177 | 1314639 | 1 | 1 | 8584 | 9537 | 9061 | 6539 | 5138 | 6072 | 6647 | 6647 | 5501
Archaea | Halobacterium salinarum R1, DSM 671 | 641522631 | 478009 | 2668776 | 1 | 1 | 7153 | 8107 | 8584 | 5138 | 5605 | 3737 | 6042 | 5438 | 4890
Archaea | Ignicoccus hospitalis KIN4/I, DSM 18386 | 640753029 | 453591 | 1297538 | 1 | 1 | 8107 | 11445 | 19075 | 4671 | 5605 | 4671 | 6042 | 6647 | 6112
Archaea | Methanocaldococcus jannaschii DSM 2661 | 638154505 | 243232 | 1739916 | 2 | 1 | 8584 | 9061 | 11922 | 5605 | 5138 | 4671 | 5438 | 7251 | 4890
Archaea | Methanococcus voltae A3 | 646564549 | 456320 | 1936387 | 2 | 1 | 11445 | 10014 | 8107 | 5605 | 6072 | 4671 | 5438 | 4834 | 6112
Archaea | Methanosarcina barkeri Fusaro, DSM 804 | 637000162 | 269797 | 4873766 | 3 | 1 | 8584 | 9537 | 11445 | 5138 | 7473 | 5605 | 5438 | 7251 | 6724
Archaea | Methanospirillum hungatei JF-1 | 637000164 | 323259 | 3544738 | 4 | 1 | 11445 | 8584 | 8584 | 5605 | 5138 | 5605 | 6647 | 4230 | 5501
Archaea | Pyrobaculum aerophilum IM2 | 638154513 | 178306 | 2222430 | 1 | 1 | 11445 | 11445 | 9537 | 140121 | 129379 | 162074 | 5438 | 4230 | 7335
Archaea | Pyrobaculum islandicum DSM 4184 | 639633053 | 384616 | 1826402 | 1 | 1 | 10491 | 10968 | 10014 | 5138 | 13078 | 13078 | 4834 | 6647 | 7335
Archaea | Staphylothermus marinus F1, DSM 3639 | 640069332 | 399550 | 1570485 | 1 | 1 | 10491 | 10014 | 10014 | 5138 | 5138 | 4204 | 4834 | 7251 | 5501
Archaea | Sulfolobus islandicus Y.N.15.51 | 643692049 | 419942 | 2854410 | 1 | 1 | 10968 | 11445 | 11922 | 4204 | 5138 | 5138 | 5438 | 5438 | 8557
Archaea | Thermococcus gammatolerans EJ3 | 644736411 | 593117 | 2045438 | 1 | 1 | 10014 | 11445 | 11445 | 3737 | 4671 | 4204 | 4230 | 5438 | 6112
Bacteria | Acidobacterium sp. MP5ACTX9 | 649633002 | 696844 | 5503984 | 1 | 1 | 10014 | 11445 | 10491 | 11210 | 5138 | 4671 | 7251 | 6647 | 6724
Bacteria | Acinetobacter baumannii ATCC 17978 | 640069301 | 400667 | 4001457 | 5 | 1 | 8107 | 9061 | 11445 | 5605 | 5605 | 5138 | 6647 | 6042 | 6724
Bacteria | Actinobacillus succinogenes 130Z | 640753001 | 339671 | 2046146 | 6 | 1 | 11445 | 10968 | 9537 | 6539 | 4204 | 4671 | 7251 | 8459 | 7335
Bacteria | Agrobacterium vitis S4 | 643348505 | 311402 | 6320946 | 4 | 1 | 8584 | 11922 | 11445 | 5138 | 5138 | 4671 | 6647 | 4834 | 4279
Bacteria | Alkalilimnicola ehrlichei MLHE-1 | 637000005 | 187272 | 3272789 | 2 | 1 | 11922 | 11922 | 10491 | 5138 | 5605 | 4204 | 6647 | 4230 | 6724
Bacteria | Anabaena variabilis ATCC 29413 | 646564504 | 240292 | 7105752 | 4 | 1 | 10491 | 11445 | 9537 | 5138 | 5138 | 8407 | 6647 | 6647 | 7335
Bacteria | Anaeromyxobacter dehalogenans 2CP-C | 637000007 | 290397 | 5013479 | 2 | 1 | 11922 | 11445 | 10968 | 5138 | 3737 | 5138 | 6647 | 8459 | 6724
Bacteria | Arthrobacter sp. FB24 | 639633006 | 290399 | 5011599 | 5 | 1 | 10968 | 10968 | 8107 | 5138 | 4204 | 5138 | 6647 | 6042 | 5501
Bacteria | Bacillus pseudofirmus OF4 | 646311908 | 398511 | 4249248 | 7 | 1 | 11445 | 11922 | 10491 | 6539 | 4671 | 5138 | 7251 | 7251 | 7335
Bacteria | Bartonella quintana Toulouse | 637000028 | 283165 | 1581384 | 2 | 1 | 8107 | 11445 | 11922 | 7473 | 5138 | 6072 | 7251 | 6647 | 6724
Bacteria | Bifidobacterium longum DJO10A | 642555107 | 205913 | 2375286 | 4 | 1 | 10014 | 11922 | 11445 | 3737 | 5605 | 5605 | 6042 | 6647 | 8557
Bacteria | Bradyrhizobium sp. BTAi1 | 640427103 | 288000 | 8422430 | 2 | 1 | 11922 | 10968 | 12399 | 129379 | 47641 | 47641 | 67069 | 56193 | 128973
Bacteria | Burkholderia cenocepacia AU 1054 | 637000046 | 331271 | 7249477 | 6 | 1 | 10968 | 11445 | 12399 | 5138 | 4671 | 4671 | 6647 | 6647 | 6724
Bacteria | Burkholderia cenocepacia HI2424 | 639633014 | 331272 | 8139086 | 6 | 1 | 10968 | 11922 | 11445 | 5138 | 5138 | 6539 | 6647 | 7251 | 3667
Bacteria | Burkholderia xenovorans LB400 | 637000053 | 266265 | 9731138 | 6 | 1 | 9061 | 9061 | 9537 | 4204 | 4671 | 5138 | 5438 | 6647 | 5501
Bacteria | Campylobacter concisus 13826 | 640753009 | 360104 | 2099412 | 3 | 1 | 11445 | 11445 | 10491 | 6539 | 4671 | 3737 | 7251 | 4230 | 3667
Bacteria | Candidatus Riesia pediculicola USDA | 646564517 | 515618 | 582127 | 2 | 1 | 9537 | 10014 | 11922 | 6539 | 5605 | 5138 | 6647 | 6647 | 6724
Bacteria | Cellulophaga lytica LIM-21, DSM 7489 | 649633032 | 867900 | 3765936 | 4 | 1 | 9061 | 10968 | 7153 | 4671 | 5138 | 4671 | 6647 | 6042 | 4279
Bacteria | Chloroflexus aurantiacus J-10-fl | 641228485 | 324602 | 5193782 | 3 | 2 | 11445 | 8584 | 10968 | 5605 | 5138 | 5138 | 7855 | 7855 | 22005
Bacteria | Clostridium beijerinckii NCIMB 8052 | 640753016 | 290402 | 6000632 | 14 | 1 | 9537 | 11922 | 10014 | 6072 | 6072 | 5138 | 3625 | 6042 | 6724
Bacteria | Clostridium perfringens 13 | 637000079 | 195102 | 3085740 | 10 | 1 | 9061 | 10968 | 11445 | 8407 | 5138 | 6539 | 3021 | 6042 | 4279
Bacteria | Clostridium thermocellum ATCC 27405 | 640069309 | 203119 | 3894953 | 4 | 1 | 10491 | 11445 | 10968 | 5605 | 4671 | 5138 | 6647 | 6042 | 5501
Bacteria | Cytophaga hutchinsonii ATCC 33406 | 637000087 | 269798 | 4433218 | 3 | 1 | 9537 | 10014 | 8584 | 4671 | 5138 | 4204 | 56193 | 67069 | 56846
Bacteria | Dechloromonas aromatica RCB | 637000088 | 159087 | 4501104 | 4 | 1 | 10014 | 11922 | 11445 | 4671 | 3737 | 5138 | 6042 | 7251 | 7335
Bacteria | Desulfitobacterium hafniense DCB-2 | 643348537 | 272564 | 6083768 | 5 | 3 | 10014 | 11922 | 11445 | 4671 | 5605 | 5605 | 6647 | 5438 | 4890
Bacteria | Desulfovibrio desulfuricans G20 | 637000095 | 207559 | 3730232 | 4 | 1 | 10491 | 9537 | 25274 | 4671 | 5605 | 8407 | 6647 | 6647 | 7335
Bacteria | Dickeya zeae Ech1591 | 644736355 | 561229 | 4813854 | 7 | 1 | 11445 | 25274 | 10014 | 8407 | 4671 | 4671 | 6647 | 7251 | 6112
Bacteria | Ehrlichia canis Jake | 637000097 | 269484 | 1315030 | 1 | 1 | 10014 | 10968 | 10014 | 6539 | 5138 | 5138 | 8459 | 4834 | 6112
Bacteria | Escherichia coli O6:K15:H31 536 (UPEC) | 637000104 | 362663 | 4938920 | 7 | 2 | 9061 | 12399 | 10014 | 5138 | 4671 | 4671 | 6042 | 6042 | 8557
Bacteria | Frankia sp. CcI3 | 637000116 | 106370 | 5433628 | 2 | 1 | 10968 | 11922 | 10968 | 5138 | 5605 | 5605 | 6647 | 6647 | 6112
Bacteria | Frankia sp. EAN1pec | 641228492 | 298653 | 9081415 | 3 | 1 | 10968 | 11445 | 9537 | 5138 | 5138 | 5138 | 6647 | 6042 | 7335
Bacteria | Geobacter metallireducens GS-15 | 637000119 | 269799 | 4011182 | 3 | 3 | 10491 | 9061 | 9061 | 5138 | 5138 | 5138 | 6647 | 3625 | 6724
Bacteria | Halothiobacillus neapolitanus c2 | 646311935 | 555778 | 2582886 | 2 | 1 | 12399 | 13352 | 10968 | 4671 | 7006 | 11210 | 6042 | 6042 | 7335
Bacteria | Helicobacter pylori J99 | 637000134 | 85963 | 1643831 | 2 | 1 | 12399 | 11445 | 11922 | 4671 | 6072 | 5605 | 6042 | 6042 | 5501
Bacteria | Jannaschia sp. CCS1 | 637000137 | 290400 | 4404049 | 1 | 1 | 11922 | 8584 | 10491 | 5605 | 5138 | 5605 | 7251 | 7251 | 6724
Bacteria | Kineococcus radiotolerans SRS30216 | 640753031 | 266940 | 4893957 | 4 | 1 | 11922 | 10491 | 11445 | 5605 | 5138 | 5138 | 7251 | 7251 | 6724
Bacteria | Kribbella flavida DSM 17836 | 646311938 | 479435 | 7579488 | 2 | 1 | 10968 | 11922 | 10491 | 4671 | 4204 | 5138 | 6042 | 6042 | 5501
Bacteria | Magnetococcus sp. MC-1 | 639633036 | 156889 | 4628740 | 3 | 1 | 10968 | 11445 | 11922 | 4671 | 4671 | 6539 | 6042 | 6647 | 6724
Bacteria | Marinobacter aquaeolei VT8 | 639633037 | 351348 | 4647952 | 3 | 1 | 11445 | 12399 | 11445 | 5605 | 5138 | 5138 | 6647 | 6647 | 6724
Bacteria | Moorella thermoacetica ATCC 39073 | 637000167 | 264732 | 2628784 | 1 | 1 | 25274 | 12399 | 11445 | 13078 | 4671 | 5605 | 15106 | 6647 | 3056
Bacteria | Mycobacterium ulcerans Agy99 | 642555140 | 362242 | 5805761 | 1 | 1 | 10968 | 10968 | 9061 | 5138 | 5605 | 6539 | 5438 | 5438 | 6724
Bacteria | Nitrobacter winogradskyi Nb-255 | 637000193 | 323098 | 3402093 | 1 | 1 | 11445 | 11922 | 23367 | 4671 | 4204 | 5605 | 7251 | 7855 | 4890
Bacteria | Nitrosospira multiformis ATCC 25196 | 637000197 | 323848 | 3234309 | 1 | 1 | 11445 | 8107 | 11922 | 5605 | 5605 | 7940 | 6647 | 15106 | 7335
Bacteria | Nocardioides sp. JS614 | 639633046 | 196162 | 5394058 | 2 | 1 | 11445 | 10968 | 11922 | 5605 | 5138 | 5138 | 6647 | 5438 | 6724
Bacteria | Oenococcus oeni PSU-1 | 639633047 | 203123 | 1782786 | 2 | 1 | 9537 | 9537 | 10968 | 4671 | 4671 | 5605 | 4834 | 3021 | 7335
Bacteria | Paracoccus denitrificans PD1222 | 639633048 | 318586 | 5175736 | 3 | 1 | 11922 | 10014 | 11445 | 5138 | 7006 | 4671 | 6042 | 5438 | 4279
Bacteria | Pelobacter carbinolicus DSM 2380 | 637000204 | 338963 | 3662252 | 2 | 1 | 10014 | 7153 | 10968 | 5138 | 6539 | 5138 | 6647 | 5438 | 6112
Bacteria | Petrotoga mobilis SJ95 | 641228500 | 403833 | 2169548 | 2 | 1 | 10014 | 11445 | 10968 | 5138 | 3737 | 5138 | 5438 | 6647 | 6112
Bacteria | Polaromonas sp. JS666 | 637000208 | 296591 | 5898676 | 1 | 1 | 11445 | 10014 | 11922 | 6072 | 5605 | 5605 | 7251 | 4230 | 6724
Bacteria | Pseudoalteromonas atlantica T6c | 637000216 | 342610 | 5094958 | 5 | 1 | 12399 | 8584 | 12399 | 6072 | 4204 | 5138 | 7251 | 6647 | 6724
Bacteria | Pseudomonas putida F1 | 640427132 | 351746 | 5925059 | 6 | 1 | 12399 | 12399 | 8107 | 5605 | 6539 | 6072 | 6647 | 6647 | 4890
Bacteria | Rhizobium leguminosarum bv. trifolii WSM1325 | 644736401 | 395491 | 7418122 | 3 | 1 | 10014 | 23367 | 10968 | 4204 | 5605 | 5138 | 4834 | 5438 | 6112
Bacteria | Rhodobacter sphaeroides 2.4.1 | 640069327 | 272943 | 4603060 | 3 | 1 | 8584 | 19075 | 10968 | 3737 | 3737 | 5138 | 4834 | 8459 | 6724
Bacteria | Rhodococcus equi 103S | 649633089 | 685727 | 5043170 | 4 | 2 | 11922 | 10968 | 9061 | 5138 | 5138 | 5138 | 4834 | 6647 | 6112
Bacteria | Rhodopseudomonas palustris BisA53 | 639279312 | 316055 | 5502424 | 2 | 1 | 11922 | 11922 | 10014 | 5605 | 6539 | 5138 | 6647 | 6647 | 4279
Bacteria | Rhodopseudomonas palustris BisB18 | 637000237 | 316056 | 5513844 | 2 | 1 | 11922 | 12399 | 11922 | 52779 | 52779 | 52779 | 7855 | 6042 | 6724
Bacteria | Rhodopseudomonas palustris BisB5 | 637000238 | 316057 | 4892717 | 2 | 1 | 11922 | 11445 | 11445 | 162074 | 140121 | 129379 | 7251 | 7251 | 6112
Bacteria | Rhodopseudomonas palustris HaA2 | 637000240 | 316058 | 5331656 | 1 | 1 | 11445 | 9061 | 11445 | 5605 | 11210 | 6539 | 313595 | 313595 | 244499
Bacteria | Rhodospirillum rubrum ATCC 11170 | 637000241 | 269796 | 4406557 | 4 | 1 | 10491 | 10491 | 10968 | 47641 | 162074 | 140121 | 6647 | 7251 | 6724
Bacteria | Rubrobacter xylanophilus DSM 9941 | 637000248 | 266117 | 3299423 | 1 | 1 | 23367 | 10014 | 12399 | 7006 | 5605 | 5605 | 8459 | 3625 | 6724
Bacteria | Saccharophagus degradans 2-40 | 637000249 | 203122 | 5057531 | 2 | 1 | 11445 | 10491 | 11922 | 5138 | 3737 | 5138 | 6042 | 5438 | 7335
Bacteria | Shewanella baltica OS155 | 640069330 | 325240 | 5084318 | 10 | 2 | 11445 | 10014 | 10491 | 5605 | 8407 | 6539 | 3625 | 4834 | 4890
Bacteria | Shewanella sp. ANA-3 | 639633058 | 94122 | 5100729 | 9 | 1 | 11922 | 10491 | 10491 | 5138 | 5138 | 4671 | 6042 | 6042 | 7946
Bacteria | Shewanella sp. MR-7 | 637000260 | 60481 | 4546355 | 9 | 2 | 12399 | 10491 | 10014 | 5138 | 4671 | 6539 | 4230 | 6647 | 7946
Bacteria | Shewanella sp. W3-18-1 | 639633059 | 351745 | 4754010 | 8 | 1 | 11922 | 10014 | 8584 | 5138 | 5138 | 5138 | 6042 | 6647 | 15281
Bacteria | Sphingopyxis alaskensis RB2256 | 637000271 | 317655 | 3343420 | 1 | 1 | 11445 | 10491 | 11445 | 5138 | 8407 | 4671 | 7251 | 21752 | 6724
Bacteria | Streptococcus pyogenes M28, MGAS6180 | 637000298 | 319701 | 1897573 | 6 | 1 | 12399 | 10491 | 11445 | 5138 | 5605 | 5605 | 5438 | 7251 | 5501
Bacteria | Streptococcus thermophilus LMD-9 | 639633062 | 322159 | 1842121 | 6 | 2 | 10968 | 10491 | 8584 | 3737 | 4671 | 5605 | 4230 | 6647 | 4890
Bacteria | Syntrophobacter fumaroxidans MPOB | 639633063 | 335543 | 4848841 | 2 | 1 | 10491 | 10014 | 9061 | 4671 | 6539 | 3737 | 6647 | 6647 | 6112
Bacteria | Thermobifida fusca YX | 637000319 | 269800 | 3642249 | 4 | 1 | 9537 | 11922 | 10014 | 4204 | 5138 | 5605 | 5438 | 6042 | 6112
Bacteria | Thermotoga neapolitana DSM 4359 | 643348584 | 309803 | 1884562 | 1 | 1 | 11445 | 11445 | 11445 | 5138 | 6539 | 5138 | 4230 | 4834 | 7335
Bacteria | Thiobacillus denitrificans ATCC 25259 | 637000324 | 292415 | 2909809 | 2 | 1 | 13352 | 12399 | 12399 | 7006 | 7940 | 3737 | 8459 | 7251 | 6724
Bacteria | Thiomicrospira crunogena XCL-2 | 637000325 | 317025 | 2427734 | 3 | 1 | 10968 | 11445 | 11922 | 5138 | 5138 | 7473 | 6042 | 6647 | 6724

Bacteria Trichodesmium erythraeum IMS101 637000329 203124 7750108 2 1 11922 8107 10014 5138 5138 4204 7251 7251 6112 Bacteria Waddlia chondrophila WSU 86-1044 646564588 716544 2131905 2 1 11445 10968 11445 4204 6539 7006 4230 4834 6724 Bacteria Xylella fastidiosa M12 641522659 405440 2475130 2 1 10014 11445 12399 3737 5138 3737 4834 4834 5501 * the last nine columns show read number.

Table 2.2 Reads, 16S rRNA contigs, OTUs and chimera examination of the simulated communities.

Sample: HC-A; HC-B; HC-C; MC-A; MC-B; MC-C; LC-A; LC-B; LC-C
Reads after quality filtering: 999913; 999909; 999912; 999703; 999775; 999769; 999603; 999606; 999685
16S rRNA gene-containing reads: 1303; 1353; 1376; 984; 1112; 1153; 874; 916; 860
16S rRNA contigs > 350 nt (chimera, chimera containing > 1 contaminating read), HC; MC; LC: 130 (3, 1); 126 (7, 1); 125 (4, 3)
Reads in 16S rRNA contigs > 350 nt (chimera, chimera containing > 1 contaminating read), HC; MC; LC: 3733 (85, 15); 3005 (365, 8); 2386 (374, 150)
Filtered 16S rRNA contigs (chimera, chimera containing > 1 contaminating read), HC; MC; LC: 73 (0, 0); 53 (3, 0); 54 (3, 2)
Reads in filtered 16S rRNA contigs (chimera, chimera containing > 1 contaminating read), HC; MC; LC: 3257 (0, 0); 2610 (330, 0); 2004 (364, 140)
Length of filtered 16S rRNA contigs (min, max, mean) (nt), HC; MC; LC: 458, 1548, 1262; 574, 1529, 1127; 515, 1532, 1174
Recovered, missed, artificial OTUs (0.01): 81, 0, 0; 81, 0, 0; 81, 0, 0; 75, 1, 1; 77, 1, 1; 77, 1, 1; 80, 0, 0; 79, 0, 0; 80, 0, 0
Reads in recovered, missed, artificial OTUs (0.01): 1303, 0, 0; 1353, 0, 0; 1376, 0, 0; 978, 2, 4; 1106, 2, 2; 1148, 4, 4; 870, 0, 0; 915, 0, 0; 857, 0, 0
Recovered, missed, artificial OTUs (0.03): 74, 0, 0; 74, 0, 0; 74, 0, 0; 69, 0, 0; 72, 0, 0; 72, 0, 0; 72, 0, 0; 71, 0, 0; 72, 0, 0
Reads in recovered, missed, artificial OTUs (0.03): 1303, 0, 0; 1353, 0, 0; 1376, 0, 0; 982, 0, 0; 1108, 0, 0; 1150, 0, 0; 870, 0, 0; 915, 0, 0; 857, 0, 0
Recovered, missed, artificial OTUs (0.05): 52, 0, 0; 53, 0, 0; 52, 0, 0; 49, 0, 0; 50, 0, 0; 49, 0, 0; 49, 0, 0; 48, 0, 0; 48, 0, 0
Reads in recovered, missed, artificial OTUs (0.05): 1303, 0, 0; 1353, 0, 0; 1376, 0, 0; 982, 0, 0; 1108, 0, 0; 1150, 0, 0; 870, 0, 0; 915, 0, 0; 857, 0, 0
Contig rows give one value per community (HC; MC; LC) because reads from triplicates were pooled before assembly.

Chapter Two: Reconstruction Ribosomal Genes

Cymbastela coralliophila were obtained as described in Chapter 3. Shotgun pyrosequencing (454 Titanium) was conducted at the J. Craig Venter Institute, Rockville, USA, and the resulting average read length corresponded to that of the simulated datasets above. The shotgun sequencing data are available through the Community Cyberinfrastructure for Advanced Microbial Ecology Research and Analysis (CAMERA) website (http://camera.calit2.net/) under project accession ‘CAM_PROJ_BotanyBay’.

2.2.2 Reconstruction of 16S rRNA gene sequences

The metagenomic reads of the simulated communities and the sponge microbial communities were pre-processed with PrinSeq (414) using the settings '("minlen":"60","maxlen":"700","minqualm":"20","nsmaxp":"1","complval":"50","noniupac":"true","derep0":"true","derep1":"true","complmethod":"2","trimtails":"6","trimns":"1","trimscore":"15","trimwindow":"2","trimstep":"1","tailsite":"1","trimsite":"3","trimtype":"2","trimrule":"1")'. Metaxa (version 1.0.2) (415) was then used to identify reads containing 16S rRNA sequences. Reads (>300 nt) from triplicates were then pooled and assembled with the GS de novo Assembler 2.3 (454 Life Sciences, Branford, CT) using the ‘cDNA’ option, which is optimized for the uneven and high coverage typically expected in RNA assemblies. Default settings were used, except that 'overlap identity' was set to 99% and the options ‘reads limited to one contig’ and ‘extending low depth overlaps’ were selected. The 99% cut-off was chosen because it allows reads to overlap despite a 1% error rate, which is typical towards the end of pyrosequencing reads (413). Lower stringency (e.g. 97%, as used by Radax et al. for the assembly of 16S rRNA genes (403)) resulted in unacceptable rates of chimera formation (data not shown). After aligning contigs to the SILVA 1.08 database with SINA (291), flanking regions that were not part of the 16S rRNA gene sequences were removed. The resulting contigs were then examined for chimerism: a contig was considered chimeric if it consisted of reads from more than one strain and any of these strains shared less than 99% sequence identity with the other strains.
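The chimera criterion above can be made concrete with a short check. The sketch below is illustrative only and assumes that each read in a simulated contig carries the label of the reference strain it was drawn from, and that pairwise 16S rRNA identities between reference strains have been precomputed; the function names, the `IDENTITY` table and its values are hypothetical and not part of the original pipeline. Treating reads outside the contig's majority strain as 'contaminating' is also an assumption about how that term in Table 2.2 was counted.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example data: pairwise 16S rRNA identities between reference strains.
IDENTITY = {
    frozenset({"strainA", "strainB"}): 0.995,
    frozenset({"strainA", "strainC"}): 0.920,
    frozenset({"strainB", "strainC"}): 0.930,
}


def is_chimeric(read_strains, identity, threshold=0.99):
    """True if a contig mixes reads from strains sharing < threshold identity.

    read_strains: source-strain label for every read assembled into the contig.
    identity: dict mapping frozenset({strain1, strain2}) to fractional identity.
    """
    strains = sorted(set(read_strains))
    # A contig built from one strain, or only from near-identical strains, passes.
    return any(
        identity[frozenset(pair)] < threshold
        for pair in combinations(strains, 2)
    )


def contaminating_reads(read_strains):
    """Assumed definition: number of reads not derived from the majority strain."""
    counts = Counter(read_strains)
    if not counts:
        return 0
    return sum(counts.values()) - counts.most_common(1)[0][1]


if __name__ == "__main__":
    contig = ["strainA"] * 30 + ["strainC"]
    print(is_chimeric(contig, IDENTITY), contaminating_reads(contig))  # True 1
```

Under this reading, a contig assembled from two strains that are 99.5% identical is not counted as chimeric, which matches the tolerance the 99% overlap setting was chosen to allow.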

2.2.3 Pyrosequencing of 16S rRNA genes amplified by PCR

Amplification of the 16S rRNA gene was performed on the same DNA samples as used for shotgun sequencing. Primers 28F ‘GAGTTTGATCNTGGCTCAG’ and 519R ‘GTNTTACNGCGGCKGCTG’ were used for amplification of the variable regions V1-3. PCR and subsequent sequencing are described in ref. (416) and were performed at the Research and Testing Laboratory (Lubbock, USA). Trace data were deposited in the National Center for Biotechnology Information (NCBI) ‘Sequence Read Archive’ database under project accession SRP011939.

Analysis of the 16S rRNA tag-sequencing data was performed using Mothur v1.23.1 (301). Specifically, 'shhh.flows' was used for de-noising, 'trim.seqs (pdiffs=2, bdiffs=1, maxhomop=8, minlength=200)' was used for barcode removal and quality filtering, SINA was used for sequence alignment with the SILVA 1.08 database (291), 'screen.seqs(start=1048, minlength=245)' and 'filter.seqs (vertical=T, trump=.)' were used for alignment quality filtering, 'pre.cluster(diffs=2)' was used for further error reduction, 'chimera.uchime' was used for de novo removal of chimeric reads, and Metaxa (version 1.0.2) (415) was used to remove mitochondrial and chloroplast sequences.
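Two of the quality criteria passed to trim.seqs, the maximum homopolymer length (maxhomop=8) and the minimum read length (minlength=200), are simple to state directly. The sketch below reimplements only these two checks for illustration. It is not mothur's code, it ignores barcode and primer differences (bdiffs, pdiffs) and the flowgram-based de-noising, and it assumes that a read passes when its length is at least `minlength` and no homopolymer is longer than `maxhomop`. The function names are hypothetical.

```python
from itertools import groupby


def longest_homopolymer(seq):
    """Length of the longest run of one repeated base (0 for an empty sequence)."""
    return max((len(list(run)) for _, run in groupby(seq.upper())), default=0)


def passes_trim_filters(seq, maxhomop=8, minlength=200):
    """Illustrative length and homopolymer checks modelled on the trim.seqs settings."""
    return len(seq) >= minlength and longest_homopolymer(seq) <= maxhomop


if __name__ == "__main__":
    reads = {
        "ok": "ACGT" * 60,                         # 240 nt, longest run of 1
        "too_short": "ACGT" * 40,                  # 160 nt
        "long_homopolymer": "A" * 9 + "CGT" * 70,  # contains a run of nine A's
    }
    for name, seq in reads.items():
        print(name, passes_trim_filters(seq))
```

Long homopolymers are screened because they are where 454 pyrosequencing is most error-prone, so reads containing them are more likely to carry insertion or deletion errors.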

2.2.4 Operational taxonomic unit (OTU) analysis

For simulated data, filtered 16S rRNA contigs (with coverage of more than 10 reads and length greater than 700 nt) and 16S rRNA reads not in contigs were pooled with the 16S rRNA sequences of the reference genomes used for simulation. Redundancy within these pools was removed with CD-Hit (99% identity cut-off). PhylOTU (409) was then used to generate OTUs at 0.01, 0.03 and 0.05 phylogenetic distance cut-offs. OTUs containing both reference sequences and simulated shotgun sequences (filtered contigs or reads) were termed 'recovered'. OTUs containing only reference sequences were termed 'missed', while those containing only shotgun sequences were termed 'artificial'. OTU coverage was defined as the number of reads contained in each OTU. For the sponge samples, filtered 16S rRNA contigs (with coverage of more than 10 reads and length greater than 700 nt) and 16S rRNA reads not in contigs were pooled with the PCR-amplified tag sequences and then processed as above to generate OTUs. Diversity analysis was performed with QIIME (417), and phylogenetic distance-based rarefaction was based on the tree of non-redundant sequences generated during the PhylOTU process.
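Labelling OTUs as recovered, missed or artificial, and computing OTU coverage, amounts to simple set logic over OTU membership. A minimal sketch follows. It assumes each OTU member is recorded as a tuple of sequence ID, source ('reference' or 'shotgun') and the number of reads it represents (1 for an unassembled read, the read count for a contig, 0 for a reference genome sequence). This data layout and the function names are hypothetical, and the clustering itself (CD-Hit, PhylOTU) is not reproduced.

```python
from collections import Counter


def label_otu(members):
    """Return 'recovered', 'missed' or 'artificial' for one OTU."""
    sources = {source for _, source, _ in members}
    if sources == {"reference", "shotgun"}:
        return "recovered"
    if sources == {"reference"}:
        return "missed"
    if sources == {"shotgun"}:
        return "artificial"
    raise ValueError(f"unexpected sources: {sources}")


def otu_coverage(members):
    """OTU coverage: the number of shotgun reads contained in the OTU."""
    return sum(n_reads for _, source, n_reads in members if source == "shotgun")


def summarize(otus):
    """Count OTUs and shotgun reads per label for one set of OTUs."""
    otu_counts, read_counts = Counter(), Counter()
    for members in otus.values():
        label = label_otu(members)
        otu_counts[label] += 1
        read_counts[label] += otu_coverage(members)
    return otu_counts, read_counts


if __name__ == "__main__":
    otus = {
        "otu1": [("ref1", "reference", 0), ("contig7", "shotgun", 42)],
        "otu2": [("ref2", "reference", 0)],
        "otu3": [("read981", "shotgun", 1)],
    }
    print(summarize(otus))
```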


2.2.5 Taxonomic classification and phylogenetic analysis

16S rRNA classification was performed with the RDP Classifier 2.3 (418), except for the abundant OTUs in the sponge samples, which were classified with the Greengenes Classifier (March 6, 2012) (419) followed by manual examination. Single-copy gene (SCG)-based analysis was performed using MLTreeMap (version 2.05, ‘minimal sequence length after Gblocks’ set to 35) (406). For phylogenetic analysis, maximum-likelihood trees of the 16S rRNA gene contigs were constructed using RAxML (420) after alignment with SINA and removal of ambiguous positions with Gblocks (-t=d -b4=5 -b5=h) (421).

2.3 Results and Discussion

2.3.1 16S rRNA gene assembly with minimal chimera formation

As chimera formation was a major issue in previous assembly approaches (324, 403, 410), the occurrence of chimeric 16S rRNA contigs in the present assembly strategy was examined on simulated datasets (see Materials and Methods). Of 8,997,875 shotgun reads remaining after quality filtering, 9,931 (0.11%) contained 16S rRNA gene information (Table 2.2). After applying the assembly strategy, 125-130 contigs containing full or partial 16S rRNA genes were recovered per community (Table 2.2).

16S rRNA contigs longer than 350 nt were plotted by their length and read coverage (Fig. 2.1). Of the 381 contigs generated from the nine datasets, 14 (3.7%) were chimeric (solid circles and triangles in Fig. 2.1). Four of these chimeras could be readily detected using UChime (422) (arrows in Fig. 2.1). Eight chimeras contained only one 'contaminating' read (solid circles in Fig. 2.1), and these reads mostly aligned to highly conserved regions of the 16S rRNA gene (data not shown). To examine whether these chimeras would affect the accuracy of community structure prediction, OTUs were generated at different phylogenetic distance cut-offs (0.01, 0.03 and 0.05). In nearly all cases, all reference OTUs were recovered and no artificial OTU was generated. The only exception was the MC communities at the 0.01 OTU level, where one artificial OTU was generated and one OTU present in the reference was missed (Table 2.2). This result shows that the assembly strategy effectively recovers the true microbial community structure, especially at OTU cut-offs of 0.03 phylogenetic distance and above.

Fig. 2.1 16S rRNA gene contigs and chimeric contigs for simulated datasets. Open circles: non-chimeric contigs; solid circles: chimeric contigs containing one contaminating read; solid triangles: chimeric contigs containing more than one contaminating read. Arrows: chimeras detected by UChime. (A) HC. (B) MC. (C) LC.

Fig. 2.2 Taxonomic classification of assembled and unassembled shotgun 16S rRNA gene reads for simulated datasets. (A) HC. (B) MC. (C) LC.

To recover long 16S rRNA sequences for phylogenetic analysis and to minimize the effects of potential chimeric assembly, contigs were filtered to retain only those longer than 700 nt and with a coverage of more than 10 reads (Fig. 2.1). In addition, UChime was used for chimera removal, and sequences flanking the 16S rRNA gene were removed. This resulted in 180 contigs (mean length: 1174-1262 nt) in the nine samples, with only two (1.1%) of them containing more than one contaminating read (Table 2.2).
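The length and coverage thresholds can be sketched as a simple filter. This is a minimal illustration under the assumption that each contig is summarised by its trimmed length and the number of reads assembled into it; the `Contig` type and `filter_contigs` are hypothetical names, and the UChime chimera screen is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    length: int   # nt, after trimming sequence flanking the 16S rRNA gene
    n_reads: int  # number of reads assembled into the contig (coverage)


def filter_contigs(contigs, min_length=700, min_reads=10):
    """Keep contigs longer than min_length nt and covered by more than min_reads reads."""
    return [c for c in contigs if c.length > min_length and c.n_reads > min_reads]


if __name__ == "__main__":
    contigs = [
        Contig("contig_a", 1450, 35),
        Contig("contig_b", 620, 80),   # too short
        Contig("contig_c", 1300, 10),  # coverage not above 10 reads
    ]
    print([c.name for c in filter_contigs(contigs)])  # ['contig_a']
```

Both comparisons are strict, following the wording "longer than 700 nt" and "more than 10 reads".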

2.3.2 Assembly of 16S rRNA sequences improves taxonomic classification

Assuming that longer 16S rRNA gene sequences improve the taxonomic description of a community, the proportions of reads that could be confidently assigned using the RDP Classifier (80% confidence) before and after assembly were compared. Although all strains in the simulated datasets are represented in the RDP database, classification success of unassembled reads declined steadily towards lower taxonomic ranks, with only 60-70% assigned at the genus level. In contrast, assembled data generally showed higher classification success, and more than 80% could be confidently assigned at the genus level (Fig. 2.2). This shows a clear benefit of 16S rRNA gene assembly for taxonomic classification and suggests that the approach will also improve phylogenetic analysis (see Section 2.3.5).
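Comparing classification success before and after assembly comes down to computing, for each taxonomic rank, the fraction of reads whose assignment meets the confidence cut-off. The sketch below assumes that classifier output has already been parsed into per-rank (taxon, confidence) pairs and that each contig's assignment counts once for every read assembled into it. Both the input layout and the inclusive treatment of the 80% cut-off are assumptions, and the code does not call the RDP Classifier itself.

```python
def classified_fraction(assignments, rank, cutoff=0.8):
    """Read-weighted fraction of sequences confidently assigned at `rank`.

    assignments: list of (n_reads, {rank: (taxon, confidence)}) tuples; n_reads
    is 1 for an unassembled read and the number of assembled reads for a contig.
    """
    total = sum(n for n, _ in assignments)
    if total == 0:
        return 0.0
    confident = sum(
        n for n, ranks in assignments
        if rank in ranks and ranks[rank][1] >= cutoff
    )
    return confident / total


if __name__ == "__main__":
    unassembled = [
        (1, {"phylum": ("Proteobacteria", 1.0), "genus": ("Vibrio", 0.62)}),
        (1, {"phylum": ("Proteobacteria", 0.97), "genus": ("Vibrio", 0.91)}),
    ]
    for rank in ("phylum", "genus"):
        print(rank, classified_fraction(unassembled, rank))
```

Weighting contigs by their read counts keeps the assembled and unassembled fractions on the same per-read scale, so the two can be compared directly.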

2.3.3 16S rRNA gene reconstruction reveals community diversity that is missed by PCR-based approaches

Sponges (phylum Porifera) host complex communities of microbial symbionts, which are essential for the host's function (74). Over the last decade substantial efforts have been made to describe the phylogenetic diversity and biogeography of sponge-associated microorganisms (74, 145). However, the vast majority of sponge microbiome surveys are based on PCR amplification of the 16S rRNA gene. Only recently has one study generated 16S rRNA contigs from a shotgun-sequenced transcriptome of a sponge microbial community (403). However, this study generated relatively short contigs (729 nt on average) despite extremely high sequencing coverage (66,743 reads containing 16S rRNA gene sequences), and the low stringency used during assembly could have created many chimeras (403).

To evaluate the phylogenetic diversity captured by the 16S rRNA gene reconstruction method, six shotgun metagenomes from the two sponges C. concentrica and C. coralliophila were analyzed. From 5,322,385 quality-filtered pyrosequencing reads, 1,942 reads (0.04%) containing 16S rRNA genes were identified, and these generated 25 filtered contigs (Table 2.3). The majority of contigs were full or near-full length (Table 2.3). Community composition of the six sponge DNA samples was also assessed by PCR-amplifying and pyrosequencing the variable regions V1-3 of the 16S rRNA gene (pyro-tag-sequencing). After quality filtering, 22,392 16S rRNA gene sequences were obtained, and 1,366 unique sequences remained after pre-clustering (see Section 2.2; Table 2.4).

Table 2.3 The sponge metagenomic datasets.

Sample: Cyr-A; Cyr-B; Cyr-C; Cyn-A; Cyn-B; Cyn-C (all shotgun)
Sponge host: C. coralliophila (Cyr-A to Cyr-C); C. concentrica (Cyn-A to Cyn-C)
Raw reads: 897408; 971976; 888127; 678263; 1169872; 1323699
Average read size (nt): 387.6; 353.2; 276.8; 358.0; 408.1; 392.8
Reads after quality filtering: 859525; 898161; 788662; 660869; 1004075; 1111093
16S rRNA gene-containing reads: 282; 385; 95; 237; 530; 413
16S rRNA gene contigs > 350 nt (reads), C. coralliophila; C. concentrica: 48 (557); 66 (908)
Filtered 16S rRNA gene contigs (reads), C. coralliophila; C. concentrica: 13 (445); 12 (727)
Length of filtered 16S rRNA gene contigs (min, max, mean) (nt), C. coralliophila; C. concentrica: 1218, 1535, 1418; 493, 1517, 1251

Table 2.4 The sponge tag-sequencing data sets.

Sample: Cyr-A; Cyr-B; Cyr-C; Cyn-A; Cyn-B; Cyn-C (all PCR)
Sponge host: C. coralliophila (Cyr-A to Cyr-C); C. concentrica (Cyn-A to Cyn-C)
Raw reads: 5989; 7895; 13961; 8257; 5284; 12509
Average read size (nt): 301.1; 302.5; 305.7; 306.8; 317.2; 314.1
Reads after quality filtering: 2342; 3038; 4988; 3754; 2140; 6130
Unique sequences: 212; 179; 311; 265; 155; 244
Average size of unique sequences (nt): 269.8; 268.9; 272.2; 267.2; 271; 269.2

The community compositions derived from the pyro-tag-sequencing data, the shotgun reads with and without assembly, and SCGs (see Section 2.2) were first compared at the phylum level (Fig. 2.3). In general, more phyla were detected in the shotgun sequencing reads than in the pyro-tag-sequencing data. Specifically, the PCR-based approach using the 28F/519R primer set recovered predominantly phylotypes belonging to the cyanobacteria and , while the shotgun data also detected sequences in the , Nitrospira, Chloroflexi, and (Fig. 2.3AB). This may be due not only to potential primer bias (see Section 2.3.4), but also to the short sequences (~250 nt after quality processing, see Section 2.2, Table 2.4), which are difficult to classify. The presence of these 'missed' phyla (e.g. Chloroflexi) was also confirmed by the SCG-based search (Fig. 2.3D). However, the SCG approach also failed to detect some taxa (e.g. Nitrospira and Verrucomicrobia), which is likely due to the low number of reference genomes available for these phyla. Overall, these results show that 16S rRNA gene analysis of metagenomic datasets has a superior capacity to detect a broad range of phylogenetic diversity.

Fig. 2.3 Phylum-level classification of the sponge pyro-tag-sequencing and shotgun sequencing datasets. (A) 16S rRNA gene PCR approach. (B) Unassembled shotgun 16S rRNA gene reads. (C) Assembled shotgun 16S rRNA gene reads. (D) SCG analysis.

Fig. 2.4 Shared and unique OTUs of the PCR-based and shotgun-based sponge datasets. Circle sizes are proportional to OTU number. (A) 0.01 phylogenetic distance OTU. (B) 0.03 phylogenetic distance OTU. (C) 0.05 phylogenetic distance OTU.

The pyro-tag-sequencing data and the 16S rRNA gene reconstruction approach were then compared by generating OTUs at different phylogenetic distance cut-offs (see Section 2.2).
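Once OTUs have been built from the pooled sequences, the comparison of the two datasets reduces to per-OTU read counts from each approach. The following sketch assumes each OTU is summarised as a pair of read counts (PCR, shotgun); the function name and input layout are hypothetical. It returns the numbers of shared and approach-specific OTUs, together with the fraction of each approach's reads that falls into OTUs unique to that approach. These are the kinds of quantities shown in Fig. 2.4 and discussed in this section.

```python
def compare_approaches(otu_reads):
    """Summarise OTUs shared by, or unique to, the PCR and shotgun datasets.

    otu_reads: dict mapping OTU id -> (pcr_reads, shotgun_reads).
    """
    pcr_only = {o for o, (p, s) in otu_reads.items() if p > 0 and s == 0}
    shotgun_only = {o for o, (p, s) in otu_reads.items() if s > 0 and p == 0}
    shared = {o for o, (p, s) in otu_reads.items() if p > 0 and s > 0}

    total_pcr = sum(p for p, _ in otu_reads.values())
    total_shotgun = sum(s for _, s in otu_reads.values())
    return {
        "shared": len(shared),
        "pcr_only": len(pcr_only),
        "shotgun_only": len(shotgun_only),
        # Share of each approach's reads that sit in OTUs only that approach found.
        "pcr_reads_in_pcr_only": (
            sum(otu_reads[o][0] for o in pcr_only) / total_pcr if total_pcr else 0.0
        ),
        "shotgun_reads_in_shotgun_only": (
            sum(otu_reads[o][1] for o in shotgun_only) / total_shotgun
            if total_shotgun else 0.0
        ),
    }


if __name__ == "__main__":
    example = {"otu1": (900, 40), "otu2": (60, 0), "otu3": (0, 20), "otu4": (40, 0)}
    print(compare_approaches(example))
```

OTU counts and read fractions answer different questions. A PCR-only OTU set can be large in number yet hold few reads, which is why both are worth reporting.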


Fig. 2.5 Rarefaction plots for the sponge datasets at OTU distances of 0.01 (A), 0.03 (B) and 0.05 (C), and based on phylogenetic distance (D). The plots on the right show enlarged views of the dashed boxes in the plots on the left.


In general, the PCR-based approach produced more OTUs than the metagenome-based approach, except at the 0.05 OTU level for C. concentrica (Fig. 2.4). This is most likely due to the much higher sequencing depth of the 16S rRNA gene in the pyro-tag samples (Tables 2.3 and 2.4). Relatively few OTUs were shared between the two approaches. However, the OTUs unique to the PCR-based approach represent only a small proportion (2.5-8.3%) of all pyro-tag reads at OTU levels of 0.03 and 0.05. This shows that the majority of pyro-tag reads come from phylotypes that are also contained in the metagenomic datasets, and that the OTUs unique to the PCR-based approach either constitute low-abundance phylotypes (e.g. are part of the rare biosphere) (19) or are undetected chimeras (316). In contrast, a high proportion of the shotgun reads (~30%) belong to OTUs unique to the 16S rRNA gene reconstruction, indicating that they come from abundant organisms that were missed by the PCR-based approach. The different levels of diversity recovered by PCR analysis and metagenomic reconstruction are also reflected in the rarefaction plots (Fig. 2.5). Although the sampling depths of the shotgun samples were relatively low, the trends of their rarefaction curves, compared with those of the PCR samples, suggest a higher community diversity.
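OTU-based rarefaction curves like those in Fig. 2.5A-C are obtained by repeatedly subsampling reads without replacement and counting the OTUs observed at each depth. The sketch below illustrates this OTU-based variant only, assuming OTU read counts are available per sample. It is not the QIIME implementation used in this study and does not cover the phylogenetic-distance-based curve (Fig. 2.5D). The function name and the iteration and seed defaults are arbitrary illustrative choices.

```python
import random


def rarefaction_curve(otu_counts, depths, iterations=100, seed=0):
    """Mean number of OTUs observed when subsampling reads without replacement.

    otu_counts: dict mapping OTU id -> number of reads in one sample.
    depths: subsampling depths; depths larger than the sample are skipped.
    """
    rng = random.Random(seed)
    # Expand counts into one entry per read so that sampling is per read.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            continue
        observed = [len(set(rng.sample(pool, depth))) for _ in range(iterations)]
        curve.append((depth, sum(observed) / iterations))
    return curve


if __name__ == "__main__":
    sample = {"otu1": 120, "otu2": 30, "otu3": 5, "otu4": 1}
    for depth, mean_otus in rarefaction_curve(sample, [10, 50, 100, 150]):
        print(depth, round(mean_otus, 2))
```

A curve that is still rising at the largest depth indicates that further sequencing would likely reveal more OTUs, which is the basis for comparing the shotgun and PCR trends despite their different depths.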

2.3.4 Primer bias can explain the lack of OTU detection

To further investigate why PCR amplification failed to detect certain groups of bacteria (see Section 2.3.3), the most abundant 0.01-level OTUs (>2% in any of the 12 samples) were taxonomically classified (Fig. 2.6). OTUs assigned to the bacterial groups Robiginitomaculum, Phyllobacteriaceae_4, OCS116, Rhodobacteraceae, Rhodospirillaceae, Acinetobacter, Oceanospirillaceae, Thiotrichaceae, Vibrionaceae, PAUC26f, Sva0996 and Verrucomicrobiaceae were consistently missed or poorly recovered by PCR. Among them, eight 16S rRNA gene contigs belonging to seven 0.01-level OTUs (i.e. Robiginitomaculum, Rhodobacteraceae, Acinetobacter, Oceanospirillaceae, PAUC26f, Sva0996 and Verrucomicrobiaceae, including two contigs belonging to Sva0996) covered the entire V1-3 region of the 16S rRNA gene. Alignment of these eight contigs to the degenerate primers 28F/519R showed that seven of them had mismatches to one or both primers (asterisks in Fig. 2.6). This suggests that primer bias is one of the major reasons why the PCR-based approach missed certain OTUs (Fig. 2.4).
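Checking a contig against a degenerate primer means treating each IUPAC ambiguity code as the set of bases it allows. The reverse primer is compared with the reverse complement of its binding site, which appears on a sense-strand contig. The sketch below is a simplified, ungapped scan: it counts substitutions only and ignores indels. Because the alignment method used for Fig. 2.6 is not specified here, this should be read as an illustration of the idea rather than a reproduction. The primer sequences are those given in Section 2.2.3; the function names are hypothetical.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

PRIMER_28F = "GAGTTTGATCNTGGCTCAG"
PRIMER_519R = "GTNTTACNGCGGCKGCTG"


def reverse_complement(seq):
    """Reverse complement that preserves IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(primer, site):
    """Positions where the site base is not allowed by the degenerate primer base."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p])


def best_site(primer, contig):
    """(start, mismatches) of the best-matching ungapped window, or None if too short."""
    k = len(primer)
    best = None
    for start in range(len(contig) - k + 1):
        mm = count_mismatches(primer, contig[start:start + k])
        if best is None or mm < best[1]:
            best = (start, mm)
    return best


if __name__ == "__main__":
    # Toy sense-strand contig containing a 28F site and a 519R binding site.
    contig = "TT" + "GAGTTTGATCCTGGCTCAG" + "ACGT" * 10 + "CAGCAGCCGCGGTAATAC" + "GG"
    print("28F:", best_site(PRIMER_28F, contig))
    print("519R:", best_site(reverse_complement(PRIMER_519R), contig))
```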


Fig. 2.6 Abundance and primer mismatches in the top OTUs at the 0.01 phylogenetic distance level for the sponge datasets. Asterisk: primer-mismatch event. The size of the dots represents the relative abundance of the OTUs.

2.3.5 Phylogenetic analysis of the novel 16S rRNA sequences detected by the shotgun approach

To examine how many of the 25 16S rRNA gene contigs reconstructed from shotgun sequencing data had not previously been detected by PCR-based approaches in these two sponges, the contigs were searched against the NCBI NT database (April 7, 2012) and against the full-length 16S rRNA genes (primers 27F and 1492R) previously amplified from C. concentrica by Thomas et al. (190). Any match with a BlastN identity of >99% was considered an amplicon counterpart of a contig. While no amplicon counterparts were found in these searches for any of the 13 contigs from C. coralliophila, 10 of the 12 contigs from C. concentrica had been previously detected (Table 2.5).
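Applying the >99% identity rule to BLASTN results is easiest with tabular output (`-outfmt 6`), whose default columns begin with the query identifier, the subject identifier and the percent identity. The sketch below assumes the search results are available as that tab-separated text and collects, for each contig, the subjects that pass the cut-off. The function name is hypothetical, and the strict `>` comparison follows the wording of the criterion.

```python
import csv
import io


def amplicon_counterparts(blast_tab, min_identity=99.0):
    """Map each query contig to subject IDs with percent identity > min_identity.

    blast_tab: text of BLASTN tabular output (-outfmt 6); column 1 is the query
    ID, column 2 the subject ID and column 3 the percent identity (0-100).
    """
    hits = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, pident = row[0], row[1], float(row[2])
        if pident > min_identity:
            hits.setdefault(query, set()).add(subject)
    return hits
```

For example, `amplicon_counterparts(open("hits.tsv").read())` would return a dictionary keyed by contig name, where `hits.tsv` is a hypothetical results file. Percent identity from a local alignment does not say how much of the contig aligned, so a minimum alignment length could also be required. The criterion stated above mentions identity only, and the sketch follows it.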

Table 2.5 16S rRNA gene contigs generated from sponge metagenomic samples.

Columns: 16S rRNA contig; Length (nt); Classification; Amplicon counterpart ("-": none found)

Cyr
contig00001; 1465; Cyanobacteria; this study
contig00002; 1509; Sva0996; -
contig00003; 1481; TK10; this study
contig00004; 1518; Acinetobacter; this study
contig00005; 1445; Rhodobacteraceae; -
contig00006; 1512; BD2-7; this study
contig00007; 1535; Peptococcaceae; this study
contig00008; 1473; PAUC26f; this study
contig00009; 1442; Ruegeria; this study
contig00010; 1315; Synechococcus; this study
contig00011; 1232; Salinisphaeraceae; this study
contig00012; 1218; Nitrosococcus; this study
contig00013; 1283; BD2-11; -

Cyn
contig00001; 1508; Sva0996; AY942763, ref (190), this study
contig00002; 1459; Robiginitomaculum; AY942765, ref (190)
contig00003; 1517; Nitrosomonadaceae; ref (190), this study
contig00004; 1458; Nitrosopumilus; -
contig00005; 1516; Nitrospira; AY942775, AY942757, ref (190), this study
contig00006; 1466; Phyllobacteriaceae_2; AY942778, AY942764, ref (190), this study
contig00008; 1383; Oceanospirillaceae; -
contig00009; 1436; Aurantimonadaceae; ref (190), this study
contig00011; 777; Verrucomicrobiaceae; AY942771, AY942760, ref (190)
contig00012; 1172; OCS116; AY942776, GQ160462, ref (190)
contig00021; 493; Mesorhizobium; ref (190)
contig00022; 494; Phyllobacteriaceae_3; ref (190), this study

Among the 15 contigs without previously detected amplicon counterparts, 10 were amplified by the primers used in the present study (Fig. 2.6). Of the five remaining contigs, the one from the archaeon Nitrosopumilus is analyzed in the metaproteogenomic study of C. concentrica described in Chapter 4. The four bacterial contigs were classified as Sva0996, Rhodobacteraceae, BD2-11 and Oceanospirillaceae (Table 2.5) and then analyzed phylogenetically (Fig. 2.7). The Acidimicrobiales- and the -phylotypes are part of sponge/coral-specific clades in the Sva0996 group and the BD2-11 group, respectively (Fig. 2.7BC). The Rhodobacteraceae-phylotype branches distantly from its most closely related free-living neighbors (Fig. 2.7A). The Oceanospirillaceae-phylotype has a closely related free-living strain (Fig. 2.7D). This phylotype in the sponge C. concentrica has been consistently missed by PCR-based approaches despite the current and previous extensive sequencing efforts using different protocols and primers (133, 137, 190, 423).

2.4 Conclusion

This study shows that stringent assembly and filtering can recover nearly full-length 16S rRNA gene sequences from metagenomic datasets. Simulations of communities with various complexities showed that chimera formation is minimal and does not affect the prediction of community composition. These properties make the described approach readily applicable to existing and future metagenomic datasets. Advances in next-generation sequencing technology have in recent years led to a surge of metagenomic studies, and thousands of datasets are currently available (342, 424). The present approach should thus be useful for defining the phylogenetic diversity and community composition harbored in these metagenomic resources. It is also expected to lead to the discovery of new phylotypes that have previously eluded PCR-based detection, and the analysis of sponge symbiont communities presented here has already provided examples of this.

Pyro-tag-sequencing has become a standard approach for defining community composition and has been extensively applied in, for example, the Human Microbiome Project (425) and clinical diagnosis (426). This study shows that primer bias can have a substantial impact on the assessment of communities in terms of diversity, composition and abundance. Therefore, caution is recommended when interpreting data generated solely by PCR-based approaches, and the use of multiple approaches together with careful interpretation is suggested for truly rigorous diversity assessment. It might be worthwhile to benchmark primer choice against 16S rRNA genes reconstructed from metagenomic data before establishing routine PCR-based assays.


Fig. 2.7 Phylogenetic analysis of the 16S rRNA gene sequences missed by PCR. Bootstrap values (1,000 replicates) greater than 50% are shown as percentages. Sponge-derived sequences are in bold. Pentagram-marked sequences are from the present study. (A) Tree of the family Rhodobacteraceae, rooted to Leisingera methylohalidivorans [AY005463]. (B) Tree of the clade Sva0996, rooted to Iamia majanohamensis [AB360448]. (C) Tree of the clade BD2-11, rooted to Gemmatimonas aurantiaca [AP009153]. (D) Tree of the family Oceanospirillaceae, rooted to Comamonas composti [EF015884].

Chapter Three: Functional Convergence of Sponge Symbionts

Chapter Three Functional Equivalence and Evolutionary Convergence in Complex Communities of Sponge Microbial Symbionts

3.1 Introduction

Microorganisms form symbiotic relationships with eukaryotes across their whole evolutionary range, from simple amoebae to mammals. Symbiotic systems range in complexity from those with a single dominant microorganism (e.g. Wolbachia in insects (427) or Vibrio in squids (428)) to those with hundreds of obligate or facultative microbial symbionts (e.g. communities in the termite hindgut (429) or human colon (430)). The mechanisms that shape the structure of complex symbiont communities are largely unknown (46); however, recent work on communities of free-living microorganisms indicates that both niche and neutral effects can play a role (94, 98).

Symbiotic microorganisms form different kinds of associations with their host, from an epiphytic lifestyle on green algae (431) to intracellular symbiosis in insects (432). Symbionts can be horizontally acquired, for example from seawater in the case of green algae (431) or from food in the human gut (11), and consequently, symbionts will be recruited and selected based on their function. This niche selection can be decoupled from symbiont taxonomy, leading to functionally coherent but phylogenetically divergent communities (68, 100). However, symbionts can also be transmitted vertically through reproductive cells and larvae, as demonstrated in sponges (163, 167), insects (433) and various other animals (434). This leads to microbial communities with limited variation in taxonomy and function among host individuals of the same or related host species.

Equivalent niches may also exist in phylogenetically divergent hosts that lead similar lifestyles or have similar physiological properties. Symbionts occupying these equivalent niches in different host types might thus share functional aspects. This expectation is supported by the recent observation that two phylogenetically distinct bacterial endosymbionts found in either sharpshooters or cicadas possess analogous proteins for methionine synthesis (103). However, for complex symbiont communities it is largely unknown how prevalent functional equivalence is and whether equivalent functions are carried out by evolutionarily convergent mechanisms.

Sponges (phylum Porifera) are among the most ancient extant metazoans and form a major part of the marine benthic fauna across the world's oceans (119, 148). Sponges are an evolutionarily divergent group of species; however, they share common physiological characteristics and ecological roles, including the filter-feeding of planktonic microorganisms and particulate matter (148). Sponges also host complex communities of microbial symbionts, and extensive research over the last decade has provided a good understanding of the phylogenetic diversity and biogeography of sponge-associated microorganisms (74, 145). Symbiont communities in sponges are generally highly specific to the host species and are often consistent across time and space (74). This stable association can be explained by the vertical transmission of symbionts through larvae (162). However, it has been challenging to identify phylotypes that are common to all sponges and hence represent archetypal symbionts (145). Taken together, these characteristics make the sponge microbiome an ideal model system to test whether functional equivalence exists across divergent hosts and to study the mechanisms that shape complex symbiont communities.

Here, an explicit experimental design is employed to address this issue by analyzing the phylogenetic and functional structure of the symbiont communities of six phylogenetically divergent sponge species. Using a metagenomic approach and comparison with planktonic communities, functions common to all six sponge microbiomes were identified. These core functions cover various aspects of metabolism and, importantly, are provided in each sponge species by functionally equivalent symbionts or analogous enzymes and biosynthetic pathways. Moreover, the abundance of elements involved in horizontal gene transfer (HGT) suggests that these elements play a key role in distributing core functions between symbionts during their co-evolutionary association with their host and in facilitating functional convergence on the community scale.

50 Chapter Three: Functional Convergence of Sponge Symbionts

3.2 Materials and Methods

3.2.1 Sample collection and sponge identification

Sampling of R. odorabile, Stylissa sp. 445 and C. coralliophila occurred at Davies Reef on the Great Barrier Reef (GBR) (18°49'S, 147°38'E), and sampling of C. concentrica, Tedania anhelans and Scopalina sp. was done at Bare Island, in Botany Bay, New South Wales (33°59'S, 151°14'E). All sponges were collected by SCUBA diving at depths of 7 to 10 m on the sampling days listed in Table 3.1 and placed in ice-cold, filter-sterilized seawater. Further processing of the samples occurred in the laboratory within 15 minutes of collection. Sponges were morphologically identified by Patricia Sutcliffe and Dr Merrick Ekins at the Queensland Museum, Brisbane, Australia, and their phylogenetic relationships were further investigated using metagenome-derived 18S rRNA gene sequences.

Table 3.1 Sponge and seawater samples used in the present study.

| Sponge / seawater | Cell fraction | Sample ID | Location | Depth | Date |
| --- | --- | --- | --- | --- | --- |
| Seawater | 0.1 - 0.8 µm | SW01-A, SW01-B | Bare Island, Sydney | 2 m | 18 Oct 2006 |
| Seawater | 0.8 - 3 µm | SW08-A | Bare Island, Sydney | 2 m | 18 Oct 2006 |
| Cymbastela coralliophila | < 3 µm | Cyr-A, Cyr-B, Cyr-C | Palm Island, GBR | 7 - 10 m | 29 Jul 2009 |
| Rhopaloeides odorabile | < 3 µm | Rho-A, Rho-B, Rho-C | Palm Island, GBR | 7 - 10 m | 30 Jul 2009 |
| Stylissa sp. 445 | < 3 µm | Sty-A, Sty-B, Sty-C | Palm Island, GBR | 7 - 10 m | 1 Aug 2009 |
| Cymbastela concentrica | < 3 µm | Cyn-A, Cyn-B, Cyn-C | Bare Island, Sydney | 7 - 10 m | 15 Sep 2009 |
| Scopalina sp. | < 3 µm | Sco-A, Sco-B, Sco-C | Bare Island, Sydney | 7 - 10 m | 23 Sep 2009 |
| Tedania anhelans | Total | Ted-A, Ted-B, Ted-C | Bare Island, Sydney | 7 - 10 m | 17 Sep 2009 |

3.2.2 Microbial cell enrichment, DNA extraction and sequencing

Microorganisms were enriched from the sponges according to the methods described by Thomas et al. (190), except that the final filter cut-off used to remove eukaryotic cells was selected individually for each sponge species to meet the following criteria: 1) no preferential removal of microorganisms (checked by microscopy and DGGE), and 2) removal of as many eukaryotic cells and organelles as possible (checked by microscopy and comparative 16S/18S rRNA gene PCR). DNA was extracted from the seawater samples collected on the 0.1 µm and 0.8 µm filters and from the sponge-derived microbial cell pellets as described by Thomas et al. (190). Shotgun libraries were constructed and sequenced on the Roche 454 Titanium platform (J. Craig Venter Institute, Rockville, USA). The shotgun sequencing data are available through the CAMERA website (http://camera.calit2.net/) under project accession ‘CAM_PROJ_BotanyBay’. Raw reads were dereplicated with cd-hit-454 (435) using a similarity cut-off of 96% and requiring the alignment to cover at least 95% of the shorter replicate read.
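The dereplication step can be illustrated with a simplified sketch. cd-hit-454 targets artificial replicates produced by 454 emulsion PCR, that is, reads that begin at the same position and are (nearly) identical over the length of the shorter read. The sketch below is not cd-hit-454's algorithm: it compares reads without gaps from their first base, processes them longest first, and assigns each read to the first representative it matches at ≥96% identity. Because the whole shorter read is compared, the 95% coverage criterion is met by construction and is not modelled separately. Function names and the toy reads are hypothetical.

```python
def prefix_identity(shorter: str, longer: str) -> float:
    """Fraction of identical bases when `shorter` is laid over the start of `longer` (no gaps)."""
    if not shorter:
        return 0.0
    matches = sum(1 for a, b in zip(shorter, longer) if a == b)
    return matches / len(shorter)


def dereplicate(reads: dict[str, str], min_identity: float = 0.96) -> dict[str, str]:
    """Map every read ID to the ID of its cluster representative.

    Hypothetical, simplified stand-in for cd-hit-454: ungapped comparison from
    the first base only, greedy clustering seeded with the longest reads.
    """
    # Longest reads first, so each read is only compared with reads at least as long.
    order = sorted(reads, key=lambda rid: len(reads[rid]), reverse=True)
    representatives: list[str] = []
    assignment: dict[str, str] = {}
    for rid in order:
        for rep in representatives:
            if prefix_identity(reads[rid], reads[rep]) >= min_identity:
                assignment[rid] = rep  # artificial replicate of an existing representative
                break
        else:
            representatives.append(rid)  # no match: the read seeds a new cluster
            assignment[rid] = rid
    return assignment


reads = {
    "r1": "ACGT" * 6 + "A",  # 25 nt
    "r2": "ACGT" * 6,        # 24 nt, identical to the start of r1
    "r3": "T" * 20,          # unrelated read
}
print(dereplicate(reads))  # {'r1': 'r1', 'r2': 'r1', 'r3': 'r3'}
```

Seeding clusters with the longest reads keeps the most sequence information in the representatives passed on to assembly.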

3.2.3 Phylogenetic analysis

Extraction of 16S rRNA gene sequences from the shotgun pyrosequencing datasets, reconstruction of microbial community profiles and phylogenetic analyses were conducted using the strategy described in Chapter 2. 16S rRNA genes were classified with the RDP Classifier 2.3 (418). The community diversity of each sample was assessed in QIIME (417) by plotting richness curves based on OTU number and on total phylogenetic distance. OTUs were assigned taxonomically by manual inspection of their positions in the SILVA SSURef tree. A maximum-likelihood tree of the OTUs was constructed with RAxML (420) after alignment with SINA and removal of ambiguous positions with Gblocks (-t=d -b4=5 -b5=h) (436). The 16S rRNA gene profiles of the samples were clustered with the weighted UniFrac algorithm implemented in QIIME (437).
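The richness curves were computed in QIIME. As a hedged illustration of the underlying idea (not QIIME's implementation), the sketch below rarefies an OTU count table by repeated random subsampling without replacement and averages the number of observed OTUs at each depth. The function name, the toy counts and the 1,000-iteration default (borrowed from the Fig. 3.6 legend) are assumptions, and richness based on phylogenetic distance is not modelled.

```python
import random


def rarefaction_curve(otu_counts: dict[str, int], depths: list[int],
                      iterations: int = 1000, seed: int = 0) -> list[float]:
    """Mean number of observed OTUs after subsampling `depth` reads without replacement."""
    rng = random.Random(seed)  # fixed seed for reproducible curves
    # One entry per read, labelled with the OTU it belongs to.
    pool = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(pool):
            raise ValueError(f"depth {depth} exceeds the {len(pool)} available reads")
        observed = sum(len(set(rng.sample(pool, depth))) for _ in range(iterations))
        curve.append(observed / iterations)
    return curve


counts = {"OTU_1": 5, "OTU_2": 3, "OTU_3": 2}  # hypothetical OTU table
print(rarefaction_curve(counts, [1, 5, 10]))
```

A curve that is still rising at the full sequencing depth indicates that additional sequencing would probably reveal further OTUs.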

The single-copy gene (SCG)-based analysis of community composition was performed on assembled contigs (see Section 3.2.4) using MLTreeMap (version 2.05, with ‘minimal sequence length after Gblocks’ set to 35) (406).

3.2.4 Assembly, removal of eukaryotic DNA, and gene prediction

Dereplicated reads of each sample were assembled separately using the GS De Novo Assembler in ‘genomic’ mode with default settings. Contigs, singletons and outliers were pooled, and sequences shorter than 100 nt were removed. Because eukaryotic cells, mitochondria or plastids may not have been completely removed during microbial cell fractionation, the metagenomic data could contain eukaryotic sequences. To remove these contaminants, assembled sequences were searched against the NCBI NT database (September 15, 2010), and the results were parsed with the last common ancestor (LCA) algorithm implemented in MEGAN (v3.9) (386). All sequences assigned to a eukaryotic origin were removed according to the procedure described by Thomas et al. (190). Open reading frames (ORFs) of protein-coding genes were predicted from the filtered sequences with MetaGeneAnnotator (438). The coverage of each gene was taken to be the average coverage of the contig carrying the ORF.
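The LCA-based assignment used to flag eukaryotic sequences can be approximated as follows. This is a simplified sketch, not MEGAN's code: each BLAST hit is represented by a bit score and a root-to-leaf lineage tuple, hits scoring within a ‘top percent’ window of the best hit are retained (the 10% default is an assumption), and a sequence is flagged as eukaryotic when the lowest common ancestor of the retained lineages falls within Eukaryota. Helper names and example lineages are hypothetical.

```python
Lineage = tuple[str, ...]


def lowest_common_ancestor(lineages: list[Lineage]) -> Lineage:
    """Longest shared prefix of root-to-leaf taxonomy paths."""
    if not lineages:
        return ()
    lca: list[str] = []
    for ranks in zip(*lineages):  # walk down the ranks in parallel
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return tuple(lca)


def is_eukaryotic(hits: list[tuple[float, Lineage]], top_percent: float = 10.0) -> bool:
    """hits: (bit_score, lineage) pairs for one query sequence (assumed format)."""
    if not hits:
        return False  # no hits: left unassigned, not removed
    best = max(score for score, _ in hits)
    threshold = best * (1.0 - top_percent / 100.0)
    kept = [lineage for score, lineage in hits if score >= threshold]
    lca = lowest_common_ancestor(kept)
    return len(lca) > 0 and lca[0] == "Eukaryota"


hits = [
    (200.0, ("Eukaryota", "Metazoa", "Porifera")),
    (195.0, ("Eukaryota", "Metazoa", "Cnidaria")),
    (100.0, ("Bacteria", "Proteobacteria")),  # outside the top-percent window
]
print(is_eukaryotic(hits))  # True
```

Taking the LCA rather than the single best hit means that a sequence with near-equal bacterial and eukaryotic hits is assigned only to the root and is not removed.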

For the analysis of genes encoding ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO), photolyase, eukaryotic-like proteins (ELPs) and CRISPRs, an additional filtering step was added, involving taxonomic classification with PhymmBL V3.2 (383) against a custom-designed reference database. This database was based on the default PhymmBL reference dataset, which includes all sequenced prokaryotic genomes in the NCBI RefSeq database (March 23, 2010), supplemented with the genomes of the sponge Amphimedon queenslandica, the roundworms Brugia malayi and Caenorhabditis briggsae, the stramenopile Blastocystis hominis, the diatoms Thalassiosira pseudonana and Phaeodactylum tricornutum, and the hydrozoan Hydra magnipapillata, as well as all sequenced sponge mitochondrial genomes, because mitochondrial DNA from the sponge host can make a substantial contribution to eukaryotic contamination of the metagenomic samples. Contigs, and their corresponding protein sequences, that were classified as eukaryotic by PhymmBL with default parameters were removed.

3.2.5 Functional analysis

Pyrosequencing reads for each sample were assembled, filtered for eukaryotic sequences and functionally annotated. Specifically, predicted ORFs were translated into proteins and searched against the Clusters of Orthologous Groups (COG) database (439) using rpsBlast, and against the Protein Family A (Pfam-A) database (v24.0) (440) using Hmmer 3 (441), both with an E-value cut-off of 10⁻¹⁰. Proteins with multiple domains were counted once for each distinct domain, whereas repeats of the same domain within a protein were counted only once. Genes were also annotated against SEED/Subsystems (442) using the online MG-RAST pipeline (v2) (443) with an E-value cut-off of 10⁻¹⁰. Sample matrices were generated for the COG, Pfam and Subsystem annotations, in which the abundance of each function (e.g. a COG entry) in a sample was weighted by the coverage of the ORFs assigned to that function.
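The counting rules above can be sketched as follows: domains of different functions in one protein are counted separately, repeated copies of the same domain are counted once, and each counted hit is weighted by the coverage of its ORF. The input format, the hypothetical domain identifiers (`domA`, `domB`) and the function name are illustrative assumptions, not the pipeline actually used.

```python
from collections import defaultdict


def function_abundance(hits: list[tuple[str, str]],
                       orf_coverage: dict[str, float]) -> dict[str, float]:
    """Coverage-weighted abundance of each function in one sample.

    hits: one (orf_id, function_id) pair per domain hit, so repeats of a domain
    in the same protein appear several times.
    orf_coverage: coverage of the contig carrying each ORF.
    """
    seen: set[tuple[str, str]] = set()
    abundance: dict[str, float] = defaultdict(float)
    for orf_id, function_id in hits:
        if (orf_id, function_id) in seen:
            continue  # repeated domain within one protein: count once
        seen.add((orf_id, function_id))
        # Different functions in the same protein are each counted.
        abundance[function_id] += orf_coverage[orf_id]
    return dict(abundance)


hits = [("orf1", "domA"), ("orf1", "domA"), ("orf1", "domB"), ("orf2", "domA")]
coverage = {"orf1": 3.0, "orf2": 5.0}
print(function_abundance(hits, coverage))  # {'domA': 8.0, 'domB': 3.0}
```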


Average genome size can differ substantially between metagenomic samples and can thus bias comparisons of functional profiles (444). Several strategies have been proposed to estimate the average genome size (or number of genome copies) in metagenomic datasets (445-447); these approaches typically use the average coverage of conserved single-copy genes (SCGs) for normalization. A similar approach was used here by selecting 18 COGs (namely COG0048, COG0049, COG0087, COG0088, COG0091, COG0093, COG0094, COG0096, COG0097, COG0099, COG0100, COG0102, COG0184, COG0186, COG0256, and COG0522) from the 40 universal SCGs (448). These 18 COG entries were consistently abundant across all metagenomic samples, so the COG, Pfam and Subsystem functional matrices were normalized by the average abundance of these 18 COG entries in each sample.
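Normalizing to genome copies amounts to dividing each coverage-weighted function abundance by the mean abundance of the single-copy marker COGs in the same sample, so that a value of 1 can be read as roughly one copy per average genome. A minimal sketch, assuming abundances are held in a dictionary keyed by function identifier; only three of the marker COGs listed above are used for brevity, and `COG_X` is a hypothetical function.

```python
def copies_per_genome(abundance: dict[str, float], marker_ids: list[str]) -> dict[str, float]:
    """Divide every function's abundance by the mean abundance of the marker COGs."""
    if not marker_ids:
        raise ValueError("at least one marker COG is required")
    marker_values = [abundance.get(m, 0.0) for m in marker_ids]
    genome_copies = sum(marker_values) / len(marker_values)  # estimated genome equivalents
    if genome_copies == 0:
        raise ValueError("marker COGs not detected; cannot estimate genome copies")
    return {func: value / genome_copies for func, value in abundance.items()}


# Toy sample: three marker COGs and one hypothetical function "COG_X".
sample = {"COG0048": 10.0, "COG0049": 12.0, "COG0087": 8.0, "COG_X": 5.0}
markers = ["COG0048", "COG0049", "COG0087"]
print(copies_per_genome(sample, markers)["COG_X"])  # 0.5
```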

3.2.6 Identification of differential abundance

Differentially abundant functions were identified with a modified version of the MetaStats script. MetaStats handles two matrices: the original input counts table (Ctab) and a derived percentage table (Ptab). It uses the Ptab to run a t-test and the Ctab to handle sparse counts. In the modified script, the unnormalized sample matrix was used as the Ctab and the normalized matrix (see Section 3.2.5) as the Ptab. A function was considered differentially abundant if all of the following criteria were met: 1) the P-value was less than 0.05; 2) one group had more than three times the count of the function found in the other group; and 3) in the group with higher abundance, the normalized count of the function was greater than one copy per genome.
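A minimal sketch of the three criteria, assuming the P-value has already been obtained from MetaStats and that the fold-change and copy-number tests are applied to group means of the normalized (copies per genome) values. The averaging step and the function name are assumptions, since the text does not state exactly how replicate values were combined.

```python
from statistics import mean


def is_differential(p_value: float, group_a: list[float], group_b: list[float],
                    alpha: float = 0.05, min_fold: float = 3.0,
                    min_copies: float = 1.0) -> bool:
    """Apply the three criteria to one function.

    group_a / group_b: normalized abundances (copies per genome) in each sample
    of the two groups; comparing group means is an assumption of this sketch.
    """
    if p_value >= alpha:                      # criterion 1: P < 0.05
        return False
    high, low = sorted((mean(group_a), mean(group_b)), reverse=True)
    if high <= min_copies:                    # criterion 3: > 1 copy per genome
        return False
    return low == 0 or high / low > min_fold  # criterion 2: more than three-fold higher


sponge = [2.0, 2.5, 3.0]   # hypothetical values for one function in three sponge samples
seawater = [0.2, 0.3, 0.1]
print(is_differential(0.01, sponge, seawater))  # True
```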

Multidimensional scaling (MDS) plots of the samples based on COG, Pfam and Subsystem annotations were generated from Bray-Curtis similarity matrices in PRIMER 6 (PRIMER-E Ltd, Lutton, UK). Heatmaps were generated using Cluster 3.0 (449) and Java TreeView (450).
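For reference, the Bray-Curtis similarity between samples j and k is S = 100 × (1 − Σ|y_ij − y_ik| / Σ(y_ij + y_ik)), with both sums taken over all functions i. The sketch below computes it directly from two abundance vectors; it applies no data transformation (any pre-treatment used in PRIMER is not stated here), and the function name is hypothetical.

```python
def bray_curtis_similarity(x: list[float], y: list[float]) -> float:
    """Bray-Curtis similarity (0-100) between two abundance vectors over the same functions."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return 100.0 * (1.0 - numerator / denominator)


print(bray_curtis_similarity([2.0, 4.0], [1.0, 3.0]))  # approximately 80
```

Identical profiles give 100 and profiles with no shared functions give 0; the resulting sample-by-sample matrix is what MDS and cluster analyses take as input.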

3.2.7 ELP analysis

Proteins were searched against the Pfam-full profiles of the seven candidate ELP classes using Hmmer 3 with a bit score cut-off of 25. Proteins from contigs of potential eukaryotic origin, as predicted by PhymmBL (see Section 3.2.4), were removed. The abundance of each ELP in each sample was weighted by ORF coverage and normalized by genome copy number (see Section 3.2.5). The number of repeated ELP motifs in each protein was calculated from the Hmmer search results. Secretion signals for the type III (T3) and Sec secretory pathways were predicted with the EFFECTIVE T3 software (451). Only proteins with a complete ORF or an intact N terminus were included. If a sequence was predicted to carry both T3 and Sec signals, only the prediction with the higher score was counted. To compare ELP diversity among samples in terms of sequence similarity, ELPs were clustered at 75% identity using BlastClust (452). A representative sequence was chosen for each cluster, and a pairwise alignment was generated using ClustalW 2.0 (453). Proteins with domains homologous to the ELPs were retrieved for amoebae (Entamoeba histolytica for ANK, leucine rich repeats (LRR), NHL, and Protein tyrosine kinase (PTK), Entamoeba dispar SAW760 for LRR and NHL, Hartmannella vermiformis for NHL, and Polysphondylium pallidum for Fibronectin domain III (Fn3)), sponges (A. queenslandica for ANK and TPR, Suberites domuncula for ANK and PTK, Geodia cydonium and Ephydatia fluviatilis for Fn3), a tunicate (Oikopleura dioica for all seven protein classes), nematodes (Caenorhabditis elegans for LRR, TPR and NHL, and B. malayi for Cadherin), a fruit fly (Drosophila melanogaster for all seven protein classes) and human (Homo sapiens for all seven protein classes). Unweighted UniFrac clusters of the samples were generated with QIIME.
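BlastClust groups sequences by single-linkage clustering of pairwise BLAST matches, so two proteins can end up in one cluster through a chain of intermediates even when they do not match each other directly. The sketch below reproduces that linkage logic with a union-find structure over precomputed pairwise identities. It ignores the alignment-coverage requirement that BlastClust can also apply, and the input format and names are assumptions.

```python
def single_linkage_clusters(ids: list[str], pairs: list[tuple[str, str, float]],
                            min_identity: float = 75.0) -> list[list[str]]:
    """Group sequence IDs into single-linkage clusters.

    pairs: (id_a, id_b, percent_identity) from an all-versus-all search.
    """
    parent = {i: i for i in ids}

    def find(i: str) -> str:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, identity in pairs:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters: dict[str, list[str]] = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    return sorted(sorted(members) for members in clusters.values())


ids = ["p1", "p2", "p3", "p4"]
pairs = [("p1", "p2", 80.0), ("p2", "p3", 76.0), ("p3", "p4", 40.0)]
print(single_linkage_clusters(ids, pairs))  # [['p1', 'p2', 'p3'], ['p4']]
```

Here p1 and p3 share a cluster only through p2, which is the defining property of single linkage.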

3.2.8 Cyanophage population analysis based on G20 proteins

Representative sequences of the cyanophage capsid assembly protein G20 were obtained from the NCBI database, together with sequences from non-cyanophage T4-like phages as outgroups (454). Redundancy among these reference proteins was removed with CD-Hit at a 99% identity cut-off (455). Non-redundant sequences longer than 140 aa were used to search all predicted proteins in the 21 samples with PSI-Blast (blastpgp -b 0 -j 3 -h 0.002 -e 0.0001). The resulting hits were searched against the NCBI NR database (September 15, 2010) using BlastP (E-value cut-off of 0.0001). The five best hits for each protein were retained, and redundancy was removed with CD-Hit (identity cut-off of 99%). All sequences obtained from these two BLAST searches, together with the cyanophage G20 reference sequences, were clustered with Clans (456) at a P-value cut-off of 10⁻³⁰ (Fig. 3.1). Three groups were formed, and the portal vertex proteins (green and blue dots) were removed as false positives. The proteins represented by red and black dots in Fig. 3.1, together with the outgroup proteins, were aligned with Muscle (457), and ambiguous positions were removed using Gblocks (-t=p -b4=5 -b5=h). An approximate maximum-likelihood phylogenetic tree was then constructed using FastTree 2.1 (458).

Fig. 3.1 Clustering of proteins found by searching cyanophage G20 proteins. Black dots, canonical cyanophage G20 reference sequences; red dots, candidate cyanophage G20 proteins in the present study; green dots and blue dots, false positives (portal vertex proteins).

3.2.9 CRISPR analysis

Candidate CRISPR arrays were predicted from the assembled contigs and singletons with the online tool CRISPRFinder (459) using default settings, except that ‘Allowed mismatch between DRs’ was set to 5% and ‘Allowed mismatch for the degenerated DR’ to 20%; the predictions were then passed through a series of quality-filtering steps. Because of the complexity of the samples (short reads, and DNA potentially originating from bacteria/archaea, phages and eukaryotic contaminants), the simple ‘short interspaced repeat’ rule used to identify CRISPRs can generate many false positives, including hits to microsatellites and repeat-containing proteins. To exclude these non-CRISPR repeat sequences, stringent filtering criteria were applied. False positives arising from microsatellite regions were removed using the tandem repeat predictor Phobos (http://www.rub.de/spezzoo/cm/cm_phobos.htm) followed by a manual check. For CRISPRs containing more than two spacers, those whose longest and shortest spacers differed in length by 3 nt or more were removed. CRISPRs containing two spacers whose lengths differed by more than 1 nt were also removed. The remaining CRISPRs with more than one spacer were considered positive multi-spacer CRISPRs. Because of the fragmented nature of metagenomic sequences, many candidate CRISPRs predicted by CRISPRFinder contained only one spacer (mono-spacer CRISPRs). Only mono-spacer CRISPRs whose repeat sequences were identical to those found in positive multi-spacer CRISPRs were accepted. This stringent filtering yielded 203 CRISPRs.
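The spacer-length and repeat-matching rules can be expressed as a small filter. This sketch assumes that each candidate array is given as an identifier, its direct-repeat sequence and its list of spacer sequences. The Phobos microsatellite screen and the manual check are not modelled, and all names and toy sequences are hypothetical.

```python
def passes_spacer_length_rule(spacers: list[str]) -> bool:
    """Length-consistency rule for arrays with two or more spacers."""
    lengths = [len(s) for s in spacers]
    spread = max(lengths) - min(lengths)
    if len(spacers) > 2:
        return spread < 3   # remove if longest and shortest differ by >= 3 nt
    if len(spacers) == 2:
        return spread <= 1  # remove if the two spacers differ by > 1 nt
    return False            # mono-spacer arrays are handled separately


def filter_crisprs(candidates: list[tuple[str, str, list[str]]]) -> list[str]:
    """candidates: (array_id, repeat_sequence, spacers). Returns accepted array IDs."""
    multi = [(cid, rep) for cid, rep, sp in candidates
             if len(sp) > 1 and passes_spacer_length_rule(sp)]
    trusted_repeats = {rep for _, rep in multi}
    # Mono-spacer arrays are kept only if their repeat is identical to a trusted one.
    mono = [cid for cid, rep, sp in candidates if len(sp) == 1 and rep in trusted_repeats]
    return [cid for cid, _ in multi] + mono


candidates = [
    ("c1", "GTTTCAATCC", ["A" * 32, "C" * 33, "G" * 34]),  # spread 2 nt: kept
    ("c2", "GTTTCAATCC", ["A" * 30, "C" * 34, "G" * 32]),  # spread 4 nt: removed
    ("c3", "AAAACCCC", ["A" * 30, "C" * 32]),              # two spacers, 2 nt apart: removed
    ("c4", "GTTTCAATCC", ["T" * 35]),                      # mono-spacer, trusted repeat: kept
    ("c5", "AAAACCCC", ["T" * 35]),                        # mono-spacer, untrusted repeat: removed
]
print(filter_crisprs(candidates))  # ['c1', 'c4']
```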

Repeats and spacers were extracted from the CRISPRs and clustered by pairwise identity with BlastClust, using an identity cut-off of 50% and requiring the alignment to cover 80% of the shorter sequence. Samples were clustered according to the presence or absence of repeat/spacer clusters using Bray-Curtis similarity and group-average linkage in PRIMER 6.

The NCBI NT database (September 15, 2010) and comprehensive viral databases downloaded from CAMERA (460) (CAM_BroadPhage, BroadPhageGenomes, CBVIRIO, HFVirus, LakeLimnopolarVirome, MarineVirome, SalternMetagenome, TampaBayPhage, ViralSpring, and ViralStromatolite) were searched with BlastN (-e 0.1 -W 7 -q 3 -r 1 -G 5 -E 2 -F F) to identify potential targets of the spacers in public databases, allowing one gap and one mismatch in the query sequence. To identify potential targets of these CRISPRs within the present metagenomic samples, all spacers and repeats were searched using BlastN (-e 0.1 -W 7 -q 3 -r 1 -G 5 -E 2 -F F) against the contigs/singletons containing no CRISPR loci. Only contigs/singletons that matched spacers but not repeats were considered potential targets of the CRISPRs. Spacers and their potential targets were plotted in Cytoscape 2.8.1 (461).
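The target-selection rule (keep contigs/singletons hit by a spacer but by no repeat, and exclude sequences that carry CRISPR loci) reduces to set operations over the parsed BLAST hits. A minimal sketch, assuming the hits have already been parsed into dictionaries that map each query to the set of sequence IDs it matched; all names are hypothetical.

```python
def potential_targets(spacer_hits: dict[str, set[str]],
                      repeat_hits: dict[str, set[str]],
                      crispr_contigs: set[str]) -> dict[str, list[str]]:
    """Return {spacer_id: sorted candidate target sequences}."""
    # A sequence matched by any repeat probably carries (part of) a CRISPR array itself.
    repeat_hit_contigs = set().union(*repeat_hits.values())
    targets: dict[str, list[str]] = {}
    for spacer_id, contigs in spacer_hits.items():
        keep = sorted(c for c in contigs
                      if c not in repeat_hit_contigs and c not in crispr_contigs)
        if keep:
            targets[spacer_id] = keep
    return targets


spacer_hits = {"s1": {"ctgA", "ctgB"}, "s2": {"ctgC"}}
repeat_hits = {"r1": {"ctgB"}}
print(potential_targets(spacer_hits, repeat_hits, crispr_contigs={"ctgC"}))  # {'s1': ['ctgA']}
```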

3.2.10 CRISPR-associated (CAS) protein analysis

All ORFs were searched against the Tigrfam HMM profiles for CAS proteins (462) using Hmmer 3 (441) with a cut-off score of 25. Raw counts were normalized by genome copy (see Section 3.2.5) for each sample.


Because the Tigrfam model (TIGR01865) for the multi-domain protein Csn1 of the Nmeni subtype may also detect other proteins containing HNH domains (e.g. proteins belonging to restriction-modification (R-M) systems), protein clustering based on pairwise identity was used to remove false positives. Specifically, TIGR01865 profile hits, together with canonical Csn1 proteins, were clustered and visualized based on pairwise sequence identity using Clans (456) with a P-value cut-off of 10⁻²⁰ (-blastpath ‘blastall -p blastn -W 7 -q 3 -r 1 -G 5 -E 2 -F F’) (Fig. 3.2). Most canonical Csn1 reference proteins (black dots) formed a single group. Some of the proteins from the present samples formed two adjacent groups (blue and green dots), while the remaining proteins (red dots) generally showed high sequence divergence from one another. None of the sequences in the green or blue groups had close homologs among Csn1 proteins in the NCBI database (all hits belonged to other HNH endonuclease-domain proteins and proteins from R-M systems), so these sequences were removed. Many of the sequences represented by red dots in Fig. 3.2 had Csn1 as their best hit (by BlastP search against the NCBI NR database) and were considered candidate Csn1 proteins. Their positions in Fig. 3.2 indicate their divergence from each other and from the canonical Csn1 proteins (black dots).

Fig. 3.2 Clustering of Csn1 and related proteins (TIGR01865). Black dots, canonical Csn1 reference sequences; Red dots, candidate Csn1 proteins found in sponge samples; Green dots and Blue dots, false positives (HNH endonuclease domain containing proteins and proteins from R-M systems).


3.3 Results and Discussion

3.3.1 Overview of samples and dataset

Six sponge species were selected to cover a wide range of sponge morphologies, to include taxonomically diverse species and to encompass a broad geographic range (tropical vs. temperate) (see Table 3.1, Fig. 3.3 and Section 3.2). Seawater samples were collected as described by Thomas et al. (190). The microbiomes of 21 samples (three replicates of each of the six sponges plus three seawater samples) were sequenced with a shotgun strategy, and the resulting reads were assembled, filtered for eukaryotic sequences and annotated (see Section 3.2 and Table 3.2). In total, 8,373,475 unique predicted protein-coding sequences were obtained, an average of 398,737 per sample.

3.3.2 Sponge microbial communities possess distinct phylogenetic and taxonomic profiles

To obtain an initial characterization of the diversity of the microbial communities associated with the sponge and seawater samples, partial 16S rRNA gene sequences were reconstructed from the metagenomic datasets (see Section 3.2). Analysis of the 35 most abundant OTUs (at a 0.03 distance cut-off) revealed distinct microbial community profiles among the sponges (Fig. 3.4). Replicate samples were generally very similar, which is consistent with the established concept of sponge-specific, stable microbial associations (74). The microbial communities of the six sponges spanned a wide range of community diversity, evenness and shared phylotypes (Fig. 3.4). For example, Scopalina sp. and T. anhelans were dominated by two very similar phylotypes (less than 0.03 phylogenetic distance apart) belonging to the Nitrosomonadaceae (Figs. 3.4 and 3.5A), whereas R. odorabile and C. coralliophila had more even species distributions with limited overlap (Figs. 3.4 and 3.6D, E) (133, 167).


Fig. 3.3 Phylogenetic relationships of the sponges based on 18S rRNA gene sequences. A maximum-likelihood tree was constructed, and bootstrap values (1,000 replicates) greater than 50% are shown. The tree is rooted with the coral Acanthogorgia granulata [FJ643593]. Sponges from the present study are shown in bold. The Axinella clades are named according to ref. (463). Photos of C. coralliophila and R. odorabile were provided by Dr Heidi Luter. The photo of C. concentrica was provided by Dr Michael Taylor. Inconsistencies between sponge nomenclature and phylogeny reflect the complexity of sponge identification (710).


Table 3.2 Sample information in read processing, 16S rRNA gene containing sequences, assembly, decontamination, and functional annotation.

| Sample | SW01-A | SW01-B | SW08-A | Cyr-A | Cyr-B | Cyr-C | Rho-A | Rho-B | Rho-C | Sty-A | Sty-B | Sty-C | Cyn-A | Cyn-B | Cyn-C | Sco-A | Sco-B | Sco-C | Ted-A | Ted-B | Ted-C |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Dataset ID | BBAY01 | BBAY02 | BBAY01SM | BBAY31 | BBAY32 | BBAY33 | BBAY34 | BBAY35 | BBAY36 | BBAY37 | BBAY38 | BBAY39 | BBAY40 | BBAY41 | BBAY42 | BBAY43 | BBAY44 | BBAY45 | BBAY62 | BBAY63 | BBAY64 |
| Raw reads | 580055 | 746735 | 1073717 | 897408 | 971976 | 888127 | 949133 | 583576 | 519285 | 626753 | 464320 | 514969 | 678263 | 1169872 | 1323699 | 647813 | 1015654 | 609105 | 597168 | 611468 | 363686 |
| Average read length (nt) | 354.9 | 338.1 | 372.5 | 387.6 | 353.2 | 276.8 | 402.0 | 377.3 | 401.6 | 319.5 | 275.0 | 285.5 | 358.0 | 408.1 | 392.8 | 349.0 | 264.3 | 350.7 | 356.9 | 310.1 | 302.7 |
| Unique reads | 572731 | 737462 | 1028840 | 859525 | 898161 | 788662 | 904146 | 505357 | 506761 | 590822 | 436017 | 457425 | 660869 | 1004075 | 1111093 | 629438 | 908792 | 499316 | 472310 | 575079 | 331761 |
| 16S rRNA gene reads | 450 | 651 | 855 | 244 | 306 | 56 | 298 | 198 | 201 | 73 | 20 | 22 | 188 | 467 | 350 | 277 | 208 | 184 | 211 | 232 | 128 |
| Reads in 16S OTUs | 294 | 440 | 508 | 180 | 228 | 37 | 147 | 120 | 98 | 52 | 14 | 12 | 107 | 368 | 252 | 218 | 144 | 118 | 157 | 196 | 103 |
| Aligned reads | 183126 | 229604 | 336186 | 434407 | 471683 | 259531 | 460513 | 198840 | 180893 | 212513 | 142256 | 169357 | 347438 | 704702 | 679422 | 394585 | 507543 | 239533 | 316595 | 369205 | 182101 |
| Aligned reads (%) | 32.0% | 31.1% | 32.7% | 50.5% | 52.5% | 32.9% | 50.9% | 39.4% | 35.7% | 36.7% | 33.2% | 37.5% | 53.8% | 72.0% | 63.3% | 62.7% | 55.9% | 48.0% | 67.0% | 64.2% | 54.9% |
| Contigs > 500 nt | 8117 | 11058 | 23064 | 23889 | 18203 | 14309 | 44408 | 17286 | 19306 | 7117 | 2124 | 3023 | 11417 | 11398 | 26612 | 6922 | 8902 | 8636 | 4356 | 3622 | 2404 |
| Average length of contigs > 500 nt (nt) | 918 | 890 | 1038 | 1022 | 1285 | 710 | 902 | 900 | 836 | 941 | 711 | 742 | 1185 | 1459 | 1119 | 1160 | 1215 | 1030 | 1005 | 1095 | 1265 |
| N50 of contigs > 500 nt (nt) | 859 | 843 | 1050 | 993 | 1614 | 682 | 909 | 909 | 838 | 894 | 674 | 713 | 1345 | 2262 | 1192 | 1279 | 1461 | 1046 | 1030 | 1200 | 1598 |
| Maximum contig length (nt) | 59504 | 56677 | 33749 | 101234 | 107941 | 9624 | 52481 | 16414 | 16419 | 15909 | 7128 | 7136 | 28780 | 323086 | 39109 | 21389 | 28627 | 17816 | 15827 | 14797 | 17891 |
| Contigs > 100 nt | 16740 | 22129 | 40022 | 54483 | 39079 | 48688 | 71479 | 29504 | 32819 | 23857 | 14243 | 16470 | 22865 | 22168 | 56410 | 19077 | 22936 | 22414 | 11737 | 11167 | 8027 |
| Singletons > 100 nt | 362835 | 466957 | 656384 | 393881 | 372735 | 405414 | 416488 | 284341 | 296611 | 329277 | 248978 | 251623 | 275175 | 251668 | 360031 | 203477 | 254563 | 233729 | 131480 | 148917 | 115509 |
| Prokaryotic contigs & singletons | 312189 | 401518 | 541809 | 306688 | 296277 | 277660 | 458148 | 290530 | 303412 | 211574 | 147447 | 149459 | 212117 | 200120 | 275351 | 148431 | 174793 | 166156 | 92981 | 107452 | 80526 |
| Prokaryotic contigs & singletons (%) | 82.2% | 82.1% | 77.8% | 68.4% | 71.9% | 61.1% | 93.9% | 92.6% | 92.1% | 59.9% | 56.0% | 55.7% | 71.2% | 73.1% | 66.1% | 66.7% | 63.0% | 64.9% | 64.9% | 67.1% | 65.2% |
| Unique proteins | 397993 | 498626 | 627609 | 320969 | 312616 | 232380 | 566378 | 345825 | 369637 | 184763 | 121107 | 115832 | 215220 | 229113 | 281686 | 152849 | 176185 | 175117 | 92709 | 109457 | 78864 |
| Total proteins | 503724 | 625286 | 778971 | 488668 | 475089 | 282226 | 787435 | 427733 | 449663 | 235211 | 140259 | 140770 | 342235 | 550559 | 555480 | 343903 | 323773 | 262430 | 260009 | 255111 | 144937 |
| Proteins annotated by COG (E-value 1e-10) | 147095 | 175547 | 225095 | 110188 | 120192 | 34505 | 224936 | 116837 | 119903 | 20576 | 5998 | 6397 | 103220 | 223797 | 180529 | 99959 | 96012 | 56501 | 93834 | 85932 | 41350 |
| Proteins annotated by COG (%) | 29.2% | 28.1% | 28.9% | 22.5% | 25.3% | 12.2% | 28.6% | 27.3% | 26.7% | 8.7% | 4.3% | 4.5% | 30.2% | 40.6% | 32.5% | 29.1% | 29.7% | 21.5% | 36.1% | 33.7% | 28.5% |
| Proteins annotated by Pfam (E-value 1e-10) | 135897 | 160948 | 220514 | 123056 | 138496 | 35542 | 229853 | 119095 | 121332 | 24034 | 6584 | 6595 | 113803 | 260250 | 206000 | 112534 | 114479 | 61817 | 105494 | 100697 | 48512 |
| Proteins annotated by Pfam (%) | 27.0% | 25.7% | 28.3% | 25.2% | 29.2% | 12.6% | 29.2% | 27.8% | 27.0% | 10.2% | 4.7% | 4.7% | 33.3% | 47.3% | 37.1% | 32.7% | 35.4% | 23.6% | 40.6% | 39.5% | 33.5% |
| Proteins annotated by SEED (E-value 1e-10) | 313252 | 369728 | 415554 | 171709 | 183220 | 56199 | 312420 | 162140 | 168854 | 36410 | 11874 | 11798 | 156775 | 319211 | 262973 | 154869 | 150679 | 93418 | 142113 | 127306 | 61999 |
| Proteins annotated by SEED (%) | 62.2% | 59.1% | 53.3% | 35.1% | 38.6% | 19.9% | 39.7% | 37.9% | 37.6% | 15.5% | 8.5% | 8.4% | 45.8% | 58.0% | 47.3% | 45.0% | 46.5% | 35.6% | 54.7% | 49.9% | 42.8% |


Fig. 3.4 Microbial community diversity of sponge and seawater samples. The relative abundances of the 35 most abundant OTUs (ranked by the sum of their relative abundances across all samples) are shown. The phylogenetic distance cut-off for OTU generation is 0.03. The size of each dot reflects the relative abundance of an OTU in a sample. A maximum-likelihood tree of the OTUs is shown on the left, with bootstrap values greater than 50% (1,000 replicates). The tree is rooted with the archaeal clade. Samples are clustered based on the phylogenetic relationships of their OTUs (both the top 35 and the low-abundance OTUs) using the weighted UniFrac algorithm, with jackknife support values (in percent, 1,000 replicates) shown at nodes. ‘Low abundant OTUs’ are those not in the top 35, while ‘16S rRNA sequences not in OTUs’ are reads that failed to assemble into the contigs used for OTU generation.

Both sponge-specific and generalist bacteria, as well as marine Thaumarchaeota, were detected at varying abundances in the sponges. Two groups of abundant Marine Group I thaumarchaeal phylotypes were present in Stylissa sp. 445, R. odorabile and C. concentrica, including a dominant Cenarchaeum-like OTU in Stylissa sp. 445 (Fig. 3.4). Marine Group I Thaumarchaeota (previously classified within the phylum Crenarchaeota (464)) are often found in sponges (74) and can be subdivided into three clades, namely Group C1a-α, Group C1a-Porifera A and Group C1a-Porifera C (465). A phylogenetic tree was constructed for the four thaumarchaeal 16S rRNA gene sequences from these three sponges, including one from Stylissa sp. 445 (thaumarchaeal symbiont Subtype II) that was not captured in the OTUs (Fig. 3.5B). The dominant thaumarchaeon in Stylissa sp. 445 (Subtype I) belonged to the sponge-specific Group C1a-Porifera C, which is specifically associated with AXI2 sponges, including Stylissa sp. 445 (Fig. 3.3) (465). A filamentous thaumarchaeon from this group has previously been reported within the collagen surrounding the siliceous spicules of three Mediterranean AXI2 sponges (139). The other three sequences all fell into Group C1a-α, which contains two sequenced taxa, Nitrosopumilus maritimus SCM1 (466) and Candidatus Nitrosoarchaeum limnia (467). This group contains clones from a diverse range of habitats, including hydrothermal vents, deep-sea sediments, sponges and plankton (465), and shows no obvious host-clade specificity (465). The thaumarchaeon in R. odorabile was found to be most abundant in the pinacoderm region (468) and can be vertically transmitted through sponge larvae (164). Nevertheless, the polyphyletic nature of Group C1a-α implies that at least some of its sponge-associated members may be facultative symbionts that can be free-living or have a conditional association with sponge hosts. Such a conditional association of thaumarchaea has been observed in C. concentrica (190) (Chapter 4) and in sponges from Brazilian waters (469).

Fig. 3.5 16S rRNA gene maximum-likelihood trees of Nitrosomonadaceae and Marine Group 1 Thaumarchaeota. Bootstrap values (1,000 replicates) greater than 50% are shown. Sponge-derived sequences are shown in bold. Pentagram-marked sequences are from the present study. (A) Tree of Nitrosomonadaceae, rooted with Petrobacter succinatimandens [AY219713]. (B) Tree of Marine Group 1 Thaumarchaeota, rooted with Thermofilum pendens [X14835]. Groups are named according to ref. (465). Species/strains marked with solid circles have complete or draft genomes available.

The planktonic size fractions (0.1 to 0.8 µm and 0.8 to 3 µm) showed community profiles distinct from those of the sponge samples and were dominated by phylotypes of SAR11, SAR86, the Roseobacter clade, Flavobacteriaceae and the OCS155 Marine Group. These taxa are frequently found in seawater around the globe (324). This is consistent with the established notion that sponges contain microbial consortia distinct from those of the surrounding seawater (74).

The similarity between replicates was also confirmed by analyzing the community profiles with SCGs (Fig. 3.6A-C). Because many bacterial and archaeal genomes contain more than one copy of the 16S rRNA gene, relative abundances of phylotypes inferred from 16S rRNA gene reads may be biased (470, 471). To better quantify community composition, phylum-level profiles from 16S rRNA gene-based (assembled and unassembled; see Section 3.2) and SCG-based analyses were compared (Fig. 3.6A-C). The SCG-based analysis showed a consistent community composition among replicate samples, whereas classification of unassembled 16S rRNA gene sequences showed greater variation, probably reflecting differences in 16S rRNA gene copy number among species or strains. However, because the SCG reference database is limited, sequences belonging to the same ribotype can be mistakenly assigned to phylogenetically distant groups or even to different phyla (e.g. MLTreeMap assigned the Thaumarchaeota population to both the Crenarchaeota/Thaumarchaeota and other archaeal phyla; Fig. 3.6A-C). The assembly-based reconstruction of 16S rRNA gene sequences gave more accurate classification of highly abundant taxa than direct classification of unassembled reads, which were generally too short for confident assignment. Nevertheless, all three methods confirmed that microbial populations were highly consistent within each sponge species and within the seawater samples, but distinct between sample types.
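To illustrate the copy-number bias, the sketch below converts 16S rRNA gene read counts into cell-based relative abundances by dividing each taxon's reads by an assumed rRNA operon copy number and renormalizing. The copy numbers and taxa are hypothetical inputs, and this correction is shown only for illustration; it is not part of the workflow described here, which instead compared 16S-based and SCG-based profiles.

```python
def copy_number_corrected(read_counts: dict[str, float],
                          copies: dict[str, float]) -> dict[str, float]:
    """Relative cell abundances after dividing read counts by (assumed) 16S copy numbers."""
    cells = {taxon: n / copies[taxon] for taxon, n in read_counts.items()}
    total = sum(cells.values())
    return {taxon: value / total for taxon, value in cells.items()}


reads = {"taxon_A": 60, "taxon_B": 40}  # 60% vs 40% of 16S reads
copies = {"taxon_A": 3, "taxon_B": 1}   # hypothetical operon copy numbers
print(copy_number_corrected(reads, copies))
```

In this toy case taxon_A contributes most of the 16S reads but, after correction, makes up only about one third of the cells, which is the kind of distortion that single-copy markers avoid.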

Fig. 3.6 Sponge-associated microbial community composition at the phylum level, and community diversity. Species richness is based on 16S rRNA gene OTUs reconstructed from the metagenomic data. Rarefaction curves show the means of 1,000 rounds of jackknife subsampling. (A) Classification based on SCGs. (B) Classification of 16S rRNA gene sequences without assembly (80% confidence). (C) Classification of 16S rRNA gene OTUs constructed in the present study together with unassembled 16S rRNA gene sequences (80% confidence). (D) Species richness based on the observed OTU number. (E) Species richness based on phylogenetic distance.


Fig. 3.7 MDS plots of samples by Bray-Curtis similarity. (A) Sample MDS plots by COG annotation. (B) Sample MDS plots by Pfam annotation. (C) Sample MDS plots by Subsystem annotation.

3.3.3 Functional annotation reveals shared genomic signatures in sponge symbionts

Consistent with the phylogenetic analysis, functional annotation against the COG (439), Pfam-A (440) and SEED/Subsystem (442) databases showed that replicate samples were very similar and that each sponge species generally contained a distinct gene composition, reflecting specific ecological functions or interactions within the sponge holobiont (Fig. 3.7). The functional profiles of the community in Stylissa sp. 445 were generally more distant from those of the other samples, most likely because the dominant thaumarchaeal symbionts are poorly represented in the reference databases (Table 3.2). Interestingly, the functional profiles of the communities in Scopalina sp. and T. anhelans were quite distinct, despite both being dominated by closely related phylotypes (Figs. 3.4 and 3.5A).

Despite these host-specific functional profiles, statistical analysis (see Section 3.2) identified a range of functional features that distinguish the sponge-associated communities as a whole from those found in seawater (Fig. 3.8). The discovery of these common functions indicates similar niches in these divergent sponge hosts. The distinct phylogenetic structure of the microbial communities among these sponges further implies that these niches may be occupied by functionally equivalent symbionts with convergent genomic content. These core features are likely to be of general importance for the adaptation of microorganisms to the sponge host environment and reveal some of the basic principles underpinning the symbiotic interactions. They are discussed in the following sections.

3.3.4 Nitrogen metabolism and adaptation to anaerobic conditions

The contributions of planktonic and sediment microbial communities to the marine nitrogen cycle are well appreciated (472); however, only recently have studies revealed high rates of nitrogen metabolism in marine microorganisms associated with invertebrate hosts, especially reef-building corals and sponges (473-475). Nitrogen fixation (180), nitrification (398, 476), anaerobic ammonium oxidation (anammox) (398) and denitrification (396, 398) activities have all been analyzed separately in various marine sponges by stable isotope probing, 16S rRNA gene or functional gene analyses. In the present study, functions related to nitrogen metabolism were significantly enriched in sponge samples (Fig. 3.8).

Fig. 3.8 Specific functions abundant in sponge-associated or planktonic microbial communities. (A) Abundance of sponge/seawater specific functions by COG annotation. (B) Abundance of sponge/seawater specific functions by Pfam annotation. (C) Abundance of sponge/seawater specific functions by Subsystem annotation. The brightness (red) in the heatmap reflects abundance of a particular function in a sample (copy per genome). Samples are clustered by Bray-Curtis similarity and average general algorithm. * mostly nitrate reductase for respiration. † R-M system component Yee. ‡ CRISPR-associated protein Cas1. § mostly RecD-like DNA helicase YrrC. ¶ mostly F420-dependent N5,N10-methylenetetrahydromethanopterin reductase (EC 1.5.99.11). || unknown function. ** unknown function.

Further detailed annotation provides an opportunity to investigate aspects of the nitrogen cycle in an integrated way across a range of sponge systems. Compared to the planktonic samples, sponge-associated communities generally contained many more genes related to denitrification (Fig. 3.9). Key enzymes in the first two steps, namely the respiratory nitrate reductase (cytoplasmic NarG or periplasmic NapA) and the respiratory nitrite reductase (cytochrome cd1-containing NirS or copper-containing NirK), were present at 0.3-1.2 and 0.3-1.4 copies per genome, respectively. Interestingly, specific reductase groups were preferentially found in certain sponge symbiont communities. For example, the periplasmic NapA was abundant in the symbiont communities of Scopalina sp. and T. anhelans, while the cytoplasmic NarG was more frequently found in the other four sponges (Fig. 3.9). The ‘NarG-rich’ communities also showed a high number of nitrate/nitrite antiporters (NarK), which import nitrate into the cytoplasm and export nitrite from it (Fig. 3.9) (477). This indicates that different sponge communities use analogous pathways for denitrification, although it is not clear why these preferences for NarG or NapA exist.
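Abundances such as "0.3-1.2 copies per genome" require dividing gene counts by an estimate of how many genomes were sampled. The normalization actually used is described in Section 3.2 and is not reproduced here; the sketch below assumes a common scheme in which read hits are scaled per kilobase and divided by the mean per-kilobase coverage of a panel of single-copy genes (SCGs). All names and numbers are hypothetical.

```python
def copies_per_genome(target_hits, target_len_bp, scg_hits, scg_len_bp):
    """Estimate copies of a gene family per genome from metagenomic read counts.

    Assumed scheme (hypothetical, not taken from Section 3.2):
    target_hits    reads assigned to the gene family of interest (e.g. NarG)
    target_len_bp  typical length of that gene in bp
    scg_hits, scg_len_bp
                   read counts and lengths for a panel of single-copy genes,
                   assumed to occur exactly once in every genome.
    """
    # Reads per kilobase corrects for longer genes attracting more reads.
    scg_density = [h / (l / 1000) for h, l in zip(scg_hits, scg_len_bp)]
    # Mean SCG density approximates the coverage contributed by one genome copy.
    coverage_per_genome = sum(scg_density) / len(scg_density)
    target_density = target_hits / (target_len_bp / 1000)
    return target_density / coverage_per_genome


scg_hits = [100, 120, 80]
scg_len = [1000, 1200, 800]
print(round(copies_per_genome(360, 3600, scg_hits, scg_len), 2))  # 1.0
```

Length correction matters here because a long gene such as NarG collects proportionally more reads than a short marker gene even at equal copy number.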

Further variations in enzyme composition became apparent for the subsequent two steps of denitrification, where generally lower copy numbers of nitric oxide reductase (quinol-dependent qNor or cytochrome c-dependent cNorB) and nitrous oxide reductase (NosZ) were observed (especially in R. odorabile and C. concentrica). This suggests that at least some of the denitrifiers in the sponge microbial communities have incomplete or alternative pathways of nitrate reduction and may accumulate nitric or nitrous oxide (191, 478). Recent studies showed that the methanotrophic bacterium Candidatus Methylomirabilis oxyfera can convert nitric oxide directly to dinitrogen and oxygen (479), the latter being subsequently used for methane oxidation. As oxygen can only diffuse ~1 mm into sponge tissue, many parts of the sponge become anoxic when pumping stops (480, 481). Dismutation of nitric oxide to produce oxygen might therefore enable sponge symbionts to maintain aerobic respiration during periods when the host is not actively pumping. However, further analysis would be required to test these hypotheses.

Fig. 3.9 Abundance of enzymes in the energy-producing (respiratory) pathways of nitrogen cycling. With the exception of ammonium monooxygenase subunit A (AmoA) (Pfam annotation), abundances of enzymes are obtained from SEED/Subsystem annotation (see Section 3.2). Unit of the horizontal axis: copy per genome. Standard deviations are shown.

Based on the metabolic reconstruction, several candidate organisms are inferred to be able to perform denitrification in the sponge samples. C. concentrica contained a phylotype belonging to the family Phyllobacteriaceae (Fig. 3.4). Members of the genera Mesorhizobium and Nitratireductor in this family are capable of fixing nitrogen and reducing nitrate to nitrite, respectively (482-484). Some of the NarG genes and an assembled NarGHIY gene cluster in this sponge could be assigned to the Phyllobacteriaceae phylotype after genomic sequence binning (Chapter 4). In Scopalina sp. and T. anhelans, two closely related, uncultured phylotypes of the family Nitrosomonadaceae (Betaproteobacteria) dominated the microbial communities (Fig. 3.4). These phylotypes were also related to Nitrosomonas spp. and Nitrosospira spp., which are both ammonia-oxidizing bacteria (Fig. 3.5A). Species in these two genera may also contain NirK and cNorB, which are subject to HGT (485, 486) and are putatively responsible for nitrous oxide production (487). As ammonia monooxygenase was very rare in these two sponge metagenomes, whereas denitrification enzymes (i.e. NapA, NirK and cNorB) were abundant (Fig. 3.9), these new Nitrosomonadaceae phylotypes are most likely involved primarily in denitrification.

Genes of both bacterial (PF05145) and archaeal (PF12942) AmoA were also detected (Fig. 3.9). Abundance varied from very low in Scopalina sp. and T. anhelans to an average of more than one gene copy per genome in Stylissa sp. 445. While in some sponges only bacterial AmoA appeared to be present, in others (such as Stylissa sp. 445) orthologs of both bacterial and archaeal AmoA were found.

Oxidation of nitrite to nitrate might not be prevalent in the sponges investigated, as known nitrite-oxidizing bacteria such as Nitrospira were only detected in two C. concentrica samples and at very low abundance in R. odorabile (Fig. 3.4). However, the sequence of nitrite oxidoreductase subunit α (NxrA) is highly similar to that of NarG (488), so some of the sequences detected here might still be involved in the oxidation of nitrite. Anammox activity might be rare in these six sponges, as homologs of the hydrazine-oxidizing enzyme (Hzo) (489) and sequences belonging to known anammox bacteria within the Planctomycetes were absent. In addition, no genes for respiratory nitrite ammonification enzymes (e.g. NrfA, EC 1.7.2.2) were detected.

The gene coding for glutamate dehydrogenase (GDH) (PF05088) was overrepresented in R. odorabile, C. coralliophila and C. concentrica (Fig. 3.8B). GDH has a potentially important role in nitrogen assimilation in bacteria such as Mycobacterium smegmatis (490). Ammonium assimilation through GDH costs less energy than the ubiquitous glutamine synthetase/glutamate synthase (GS/GOGAT) pathway, which consumes ATP, and is thus used under conditions of nitrogen excess and when energy must be conserved (490, 491). The distribution of GDH in sponge bacteria therefore suggests that these host-associated taxa experience ammonium excess. However, GDH can also function in glutamate catabolism and may therefore release ammonia from natural glutamate sources, such as proteinaceous exudates from the host.
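The energetic difference between the two assimilation routes can be seen from their standard textbook stoichiometries, added here for illustration (protons omitted; NAD(P)H stands for the reduced cofactor, although GOGAT can also use reduced ferredoxin). Summing the GS and GOGAT reactions cancels glutamine and one glutamate, leaving one ATP spent per ammonium assimilated, whereas GDH spends none.

```latex
\begin{align*}
\text{GS:}\quad & \text{glutamate} + \mathrm{NH_4^+} + \mathrm{ATP} \longrightarrow \text{glutamine} + \mathrm{ADP} + \mathrm{P_i} \\
\text{GOGAT:}\quad & \text{glutamine} + \text{2-oxoglutarate} + \mathrm{NAD(P)H} \longrightarrow 2\,\text{glutamate} + \mathrm{NAD(P)^+} \\
\text{GS/GOGAT (net):}\quad & \text{2-oxoglutarate} + \mathrm{NH_4^+} + \mathrm{NAD(P)H} + \mathrm{ATP} \longrightarrow \text{glutamate} + \mathrm{NAD(P)^+} + \mathrm{ADP} + \mathrm{P_i} \\
\text{GDH:}\quad & \text{2-oxoglutarate} + \mathrm{NH_4^+} + \mathrm{NAD(P)H} \longrightarrow \text{glutamate} + \mathrm{NAD(P)^+} + \mathrm{H_2O}
\end{align*}
```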

The integrated analysis highlights that different sponges host distinct microorganisms that use different enzymes to perform equivalent functions in denitrification and ammonia oxidation. However, not every sponge community encodes the complete nitrogen cycle. Whilst respiratory nitrate reduction was ubiquitous, reflecting temporary or permanent anoxic conditions within the interior of sponges (492), many subsequent steps in denitrification reflected incomplete pathways. An alternative to dissimilatory nitrite reduction for supporting anaerobic growth may be acetate-forming fermentation. Acetyl-CoA synthetase (ADP-forming, COG1042) was abundant in all sponge samples (Fig. 3.8A) and catalyzes the major energy-conserving reaction, through substrate-level phosphorylation, during the fermentation of peptides, pyruvate and sugars to acetate (493).
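The notion of an "incomplete pathway" can be made concrete by checking each denitrification step for at least one of its alternative enzymes. In the hypothetical sketch below, the step-to-enzyme mapping follows the enzymes named in this section, but the `min_copies` cut-off and the example copy numbers are invented for illustration and are not values from Fig. 3.9.

```python
# Respiratory denitrification steps and the alternative enzymes named in the text.
DENITRIFICATION_STEPS = [
    ("NO3- -> NO2-", ("NarG", "NapA")),
    ("NO2- -> NO", ("NirK", "NirS")),
    ("NO -> N2O", ("cNorB", "qNor")),
    ("N2O -> N2", ("NosZ",)),
]


def pathway_gaps(copies, min_copies=0.1):
    """List steps whose combined enzyme abundance falls below min_copies.

    copies: enzyme name -> copies per genome in one sample.
    min_copies: hypothetical cut-off chosen for illustration only.
    """
    gaps = []
    for step, enzymes in DENITRIFICATION_STEPS:
        # Alternative enzymes are treated as functionally equivalent, so sum them.
        step_total = sum(copies.get(e, 0.0) for e in enzymes)
        if step_total < min_copies:
            gaps.append(step)
    return gaps


# Invented copy numbers, not values from Fig. 3.9.
example = {"NarG": 0.9, "NirK": 0.6, "cNorB": 0.05, "NosZ": 0.02}
print(pathway_gaps(example))  # ['NO -> N2O', 'N2O -> N2']
```

Summing alternative enzymes per step mirrors the functional-equivalence argument above: a community with NapA but no NarG still covers the first step.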

3.3.5 Photosynthesis and Photoprotection

Although sponges generally filter-feed to remove microbes or particulate organic matter from the surrounding seawater (494), phototrophy by microbial symbionts can also make a substantial contribution to the host’s growth, especially in low-nutrient and highly illuminated tropical waters (126, 495). To identify phototrophic populations among the sponge symbionts, the presence of RuBisCO was investigated and a phylogenetic analysis was performed using MLTreeMap (406). A high abundance of Form 1 and 4b RuBisCO was observed in the seawater samples (Fig. 3.10A), consistent with the potential for high carbon fixation rates in marine surface waters (496). Among the six sponges, only the tropical species C. coralliophila and Stylissa sp. 445 possessed highly abundant Form 1 RuBisCO, mostly owing to their cyanobacterial populations (Fig. 3.4 and 3.10A). The tropical sponge R. odorabile, however, possessed only RuBisCO-like proteins of the Form 4 clade, which catalyze the 2,3-diketo-5-methylthiopentyl-1-phosphate enolase reaction in the methionine salvage pathway (497). The lack of phototrophy in R. odorabile has been reported previously: photorespirometry trials, photopigment analysis and the absence of cyanobacteria in sponges from both inshore and offshore reefs clearly demonstrated that it is not a photosynthetic species (498). These observations are consistent with the morphology of these three sponges. C. coralliophila and Stylissa sp. 445 are plate- and fan-shaped, respectively, and are hence structurally optimized to harvest light energy and couple it to carbon fixation. In contrast, R. odorabile is a massive three-dimensional sponge with a dense canal system for filter-feeding (499). The bowl-shaped temperate sponge C. concentrica (500) may be morphologically optimized for phototrophic growth, but did not have a significant abundance of prokaryotic RuBisCO. However, it contains dense populations of symbiotic diatoms (133), which were mostly removed during prokaryotic cell enrichment in the present study (see Section 3.2). The data show that sponges with body shapes optimized for light harvesting conduct photosynthesis through phylogenetically diverse symbiotic populations (e.g. cyanobacteria vs. diatoms), which were distinct from the free-living phototrophs (e.g. cyanobacteria and proteobacteria) (Fig. 3.10A).

High levels of illumination can cause photo-damage and, consistent with this, a high abundance of photolyases (PF03441, COG3046, COG0415, Subsystem: DNA Repair Bacterial Photolyase) and of the key carotenoid biosynthesis enzyme phytoene dehydrogenase (COG1233, here comprising mostly CrtI-type phytoene dehydrogenase) was detected in planktonic samples (Fig. 3.8). Many phylogenetically divergent planktonic taxa had this photolyase protection mechanism (Fig. 3.10B). In contrast, photolyases were rare in the sponge communities, likely due to photoprotection provided by the sponge tissue and pigments (177). Nevertheless, some photo-stress might still occur, especially in the community of the tropical sponge Stylissa sp. 445, where a number of diverse photolyase sequences were found (Fig. 3.10B).

3.3.6 Nutrient utilization and nutritional interactions with the host

Nutritional conditions inside a host are notably different to the surrounding seawater and this was clearly reflected in the genomic composition of the sponge symbionts.

The abundance of genes coding for creatininase (creatinine amidohydrolase, EC 3.5.2.10) (PF02633) and hydantoinase/oxoprolinase (EC 3.5.2.9) (PF01968, PF02538, PF05378), which act on the carbon–nitrogen bonds of cyclic amides, demonstrates the capability of sponge symbionts to degrade and utilize metabolic intermediates such as creatinine, pyrimidines or 5-oxoproline. Creatine is a nitrogenous organic acid that uses a high-energy phosphate bond to transfer energy within the cells of eukaryotic tissues and is considered a metabolically more stable molecule than ATP (501). Most invertebrates, including sponges, synthesize or utilize creatine (502, 503), which can be converted nonenzymatically and irreversibly to creatinine in vivo (501). This host-derived creatinine is a valuable carbon and nitrogen source (504), and the two enzymes mentioned above might be crucial for its efficient utilization by sponge symbionts.

Fig. 3.10 Abundance and diversity of RuBisCO/RuBisCO-like proteins (A) and proteins in the photolyase/cryptochrome family (B) with taxonomic annotation. Size of the dots reflects abundance in gene copies per genome.

Degradation of benzoic compounds by sponge symbionts is also suggested by the abundance of enzymes from the metal-dependent hydrolase family (COG2159, PF04909), in particular in C. coralliophila, R. odorabile and T. anhelans (Fig. 3.8AB). This family includes 2-amino-3-carboxymuconate-6-semialdehyde decarboxylase, which converts α-amino-β-carboxymuconate-ε-semialdehyde to α-aminomuconate semialdehyde. This decarboxylase is involved in the 2-nitrobenzoic acid degradation pathway, which allows some prokaryotes to use 2-nitrobenzoic acid as the sole source of carbon, nitrogen and energy (505). The proteins of the glyoxalase/bleomycin resistance protein/dioxygenase superfamily (PF00903) found in C. coralliophila, R. odorabile and C. concentrica (Fig. 3.8B) were mostly extradiol ring-cleavage dioxygenases and are therefore most likely also involved in the degradation of aromatic compounds (506).

Underpinning those metabolic pathways for potentially host-derived compounds was a high abundance of transporters, including the ABC-type transporters with oligomer-binding domains (PF08402), which deliver various substrates (507), and the oligopeptide/dipeptide transporters (PF08352) of the OPN family, which specifically transport oligopeptides, dipeptides, or nickel (508) (Fig. 3.8B). Interestingly, transport proteins were among the most highly expressed functions in the metaproteomic study of the symbionts in C. concentrica (Chapter 4).

In addition to nutrient acquisition and utilization by the symbiont community, potential benefits for the host sponge were also identified. Sponge communities were enriched in ThiS (PF02597) and NMT1/THI5-like proteins (PF09084), which are associated with the synthesis of thiamine pyrophosphate (TPP), the active form of the essential vitamin B1 that animals must obtain from their diet (Fig. 3.8B) (509, 510). Genes for thiamine synthesis were also found in the sponge-associated thaumarchaeon Cenarchaeum symbiosum (400), and vitamin B12 synthesis was previously identified as an abundant function in the microbial community of C. concentrica (190) and in the genome of Poribacteria sp. (191).

In contrast to the planktonic community, sponge symbionts had very few enzymes involved in the breakdown of dimethylsulfoniopropionate (DMSP) (Fig. 3.8C). DMSP synthesis is estimated to account for ~1 to 10% of global marine primary production (511). A large fraction of planktonic bacteria, including members of the Roseobacter and SAR11 clades (Fig. 3.4), assimilate sulfur from DMSP (511-513). The metagenomic results suggested that, in contrast to coral-associated bacteria (514), metabolism of DMSP by sponge symbionts is minor, which is consistent with its concentration being generally low, often below the detection limit in marine sponges (515).

This analysis has highlighted some of the common metabolic features of sponge-associated communities and has provided new insight and hypotheses on how co-metabolism between the sponge host and microorganisms can underpin symbiosis. It is also apparent that sponge symbionts have evolved specific metabolic profiles that are distinct from those of planktonic microorganisms.

3.3.7 Resistance to environmental and host-specific stress

Sponge symbionts were enriched in proteins related to stress responses, such as the universal stress protein (USP) (PF00582) and PotD (COG0687) (Fig. 3.8AB). In E. coli, UspA is expressed in response to a wide variety of stressors, including nutrient starvation and exposure to heat, acid, heavy metals, oxidative agents, osmotic stress, antibiotics and uncouplers of oxidative phosphorylation (516). PotD is a periplasmic protein involved in the uptake of polyamines, such as putrescine, spermidine and cadaverine (517). Intracellular polyamines are linked to the fitness, survival and pathogenesis of many bacteria living in host environments (518).

Sponges or their symbionts are well known for producing antimicrobial compounds that potentially protect against fouling, predation and competition (519). Permanent symbionts need to defend themselves against this chemical stress, and the abundant predicted permeases YjgP/YjgQ (COG0795) likely contribute to this protection (Fig. 3.8A). This family contains LptF and LptG, which transport lipopolysaccharide from the inner membrane to the cell surface in Gram-negative bacteria (520). Lipopolysaccharide is located in the outer membrane and can serve as a selective permeability barrier against many toxic chemicals, such as detergents and antibiotics (521).

Filter-feeding sponges can also accumulate a high concentration of heavy metals including mercury (522). As a defense against this toxin, sponge symbionts showed a high abundance of mercuric reductase proteins (Subsystem: Mercuric Reductase), which catalyze the reduction of Hg(II) to elemental mercury Hg(0) and act in the detoxification of the immediate environment (523). Supporting this genomic prediction is the observation that all microbial isolates from an Indian Ocean sponge possessed high levels of resistance against mercury (251).

Overall the data illustrate that sponge symbionts in general have acquired resistance mechanisms that are specifically tailored to the stress experienced within their host environment.

3.3.8 Regulation of cellular response

Functions in signal transduction and regulation are overrepresented in sponge samples (Fig. 3.8).

HAMP domain-containing proteins (PF00672) act as signaling modules in two-component systems that respond to changing environmental conditions (524). HAMP domains, which occur in chemoreceptors and histidine kinases, couple motions of transmembrane helices to the activity of a downstream cytoplasmic output domain (525). Their specific role in sponge symbionts is currently unclear.

PTKs (PF07714), which function as ‘on or off’ switches in many cellular functions by modifying gene expression, were abundant in the sponge metagenomes (Fig. 3.11GH) (526). This canonical PTK family is mostly found in eukaryotes, while bacteria have developed several other types of enzymes that catalyze protein phosphorylation on tyrosine (527). Two eukaryotic-like PTKs have been found to function in signal transduction (528, 529). Why this eukaryotic-like PTK family is present in sponge symbionts is not clear.

Fig. 3.11 Abundance of ELPs in seawater versus sponge samples, and in free-living versus symbiotic species. Abundance is normalized by genome copy of each sample. Standard deviations are shown. Tests between each sponge and the seawater group are performed, respectively, at 95% CI and significant differences are marked (single asterisk, P-value ≤ 0.05 but > 0.01; double asterisks, P-value ≤ 0.01). (A-G) Seawater versus sponge samples; (H) Free-living versus symbiotic species in the IMG database (November 25, 2011).
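The caption does not name the statistical test, so the sketch below swaps in a two-sided permutation test on group means purely for illustration, then applies the caption's asterisk convention (P ≤ 0.01 gives two asterisks, 0.01 < P ≤ 0.05 gives one). The function names and the example abundances are hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=10000, seed=1):
    """Two-sided permutation test on the difference between group means."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of samples
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            hits += 1
    # The +1 terms count the observed labelling, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


def significance_marker(p):
    """Asterisk convention of the Fig. 3.11 caption."""
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


# Hypothetical ELP abundances (copies per genome).
seawater = [0.10, 0.12, 0.09, 0.11]
sponge = [0.85, 0.90, 0.80, 0.95]
p = permutation_p_value(seawater, sponge)
print(round(p, 3), significance_marker(p))
```

With only three samples per group, an exact two-sided permutation test cannot fall below P = 0.1 (2 of 20 possible relabellings), so small replicate numbers limit what such a test can detect; parametric tests are often used in that situation.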


Protein homologs to the eukaryotic male sterility protein (PF07993) were also abundant in the sponge samples (530, 531). This protein is capable of lipid biosynthesis in prokaryotes (532). Synthetic lipids can be signals for surface-associated microorganisms to regulate motility for predatory feeding (532). Many sponge microorganisms live in the mesohyl between sponge cells (130), and thus surface motility might be important for their association with the sponge host.

Other regulatory functions might be provided by the abundant ATPases (COG1373, COG4637, COG0464, COG2865), including some that act specifically in gene regulation (COG0464, COG2865) (Fig. 3.8A).

Proteins belonging to the ribosome-binding GTPase superfamily (COG1217) were abundant in the seawater samples. This protein acts as a translational GTPase and a global stress and virulence regulator, and has been implicated in diverse stress-resistance functions in different bacteria (533-538).

3.3.9 ELPs and their potential interaction with the host

A recent metagenomic analysis of the microbial community of C. concentrica found that sponge symbionts were enriched in ELPs, and in particular in proteins containing ANK repeats and TPRs (190). A similar observation was subsequently reported in a sponge- associated Poribacteria genome (191). These classes of proteins are often found in facultative or obligate symbionts and are postulated to modulate host behavior by interfering with eukaryotic protein-protein interactions that are mediated by these repeat domains. Recent work provided evidence that Legionella pneumophila used ANK-containing proteins to interfere with cytoskeletal processes of their amoebal host (539). Such molecular interference may be critical for sponge symbionts that need to escape phagocytosis by their host (190). The likely importance of these proteins was further highlighted as bacterial ANK proteins were expressed in the metaproteomic study of the sponge C. concentrica (Chapter 4).

The six sponge metagenomes were not only all rich in proteins with ANKs (COG0666, PF00023) and TPRs (COG0790, PF00515, PF07719), but also in LRR (PF00560) and NHL repeats (PF01436) (Fig. 3.8AB and Fig. 3.11). LRR proteins are essential for virulence in the pathogen Yersinia pestis (540) and can activate host cell invasion by the pathogen Listeria monocytogenes (541). NHL domains occur in a variety of proteins and potentially function in protein-protein interactions (542). Fn3 proteins (PF00041) and cadherins (PF00028), which are likely involved in adhesion to host cells, were also abundant (Fig. 3.11DE). Fibronectin proteins can bind to integrins to mediate cell-cell contact and possibly colonization (543), while cadherins can play a role in the uptake of bacteria into host eukaryotic cells (544). Many of the ELPs detected in the dataset also have predicted signal peptides and thus potentially act extracellularly on the sponge cells (Fig. 3.12). The abundant type IV secretion systems (Fig. 3.12) in sponge symbionts might also play a role in delivering these ELPs into host cells, as has been shown for L. pneumophila (539).

Fig. 3.12 T3SS and Sec pathway secretion prediction of ELPs. Standard deviations are shown. (A) ANK. (B) LRR. (C) TPR. (D) Fn3. (E) cadherin. (F) NHL. (G) PTK.

While ELPs as a class are generally more abundant in sponge microorganisms than in those of the surrounding seawater, unweighted UniFrac clustering based on pairwise alignments showed very little sequence similarity between the ELPs of different symbiont communities (Fig. 3.13). In addition, the ELPs have very limited sequence similarity to proteins from eukaryotes (including the sponge A. queenslandica), further indicating that sponge symbionts have divergent and highly specific sets of ELPs.

Together, these data show that sponge symbionts generally contain a large abundance and variety of ELPs that likely mediate interactions with their hosts, implying that each symbiont community may undertake very specific ELP-mediated interactions and communications with their host.

3.3.10 Genomic evolution through HGT

Mobile genetic elements (MGEs), such as transposons, plasmids and prophages, mediate HGT, facilitating the evolutionary adaptation of microbial populations to specific niches (545). Compared to the planktonic bacterial community, the six sponge communities investigated here also had a high abundance of systems involved in HGT, including transposases (COG3328, PF02371, PF01609, PF00872, PF05598), conjugative transfer systems (COG3451, Subsystem: Type 4 secretion and conjugative transfer, Subsystem: Conjugative transfer related cluster), and retroid elements containing reverse transcriptase (COG3344, PF00078) and integrase (PF00665) (Fig. 3.8). Genetic systems for transformation (COG0758) were also abundant in some species.

Fig. 3.13 Sample clustering based on ELP sequence similarity. Samples are clustered using a weighted UniFrac algorithm. Support values greater than 50% from 1,000 rounds of jackknife sub-sampling are marked. (A) ANK. (B) LRR. (C) TPR. (D) Fn3. (E) cadherin. (F) NHL. (G) PTK.

The potential for genetic exchange and rearrangement in sponge microbial communities was further supported by an overrepresentation of DNA recombination and repair enzymes, including RecD (COG0507), which is involved in double-strand DNA break repair; DinG (COG1199), SSL2 (COG1061) and HepA (COG0553), which are crucial for recombination repair and excision repair; and protein families involved in general DNA-modifying activities, such as the HNH endonuclease (PF01844) and the SNF2 family N-terminal domain (PF00176). These results are consistent with the genome of the sponge symbiont C. symbiosum (400) and suggest that these enzymes are essential for the stable insertion of mobile DNA into the chromosome and the repair of flanking regions in sponge symbionts.

Besides this general tendency, each species has a specific set of HGT systems. For example, R. odorabile was especially rich in transposases, conjugative elements, retroid elements and a sub-population of phages, while Stylissa sp. 445 had a lower number of transposases, but a remarkable abundance of T4-like phages (Fig. 3.8).

Despite the general abundance of bacteriophages in the marine environment, there is no information available on phage diversity in sponge systems. A total of 211 sequences encoding the T4-like phage capsid assembly protein G20 were noted here, including 107, 23 and 72 from Stylissa sp. 445, R. odorabile and seawater samples, respectively. Phylogenetic analysis revealed that all 211 G20 sequences belonged to the cyanophage group (Fig. 3.14B). When normalized, the abundance of the G20 protein in Stylissa sp. 445 averaged 7 copies per bacterial/archaeal genome (Fig. 3.14A). This abundance was correlated with the large population of cyanobacteria (mostly Synechococcus) in Stylissa sp. 445 (Fig. 3.4). T4 phages are only capable of a lytic, not a lysogenic, life cycle (546). It is therefore predicted that the cyanobacterial population size in Stylissa sp. 445 is strongly influenced by lysis, or that cyanobacterial metabolism is controlled by viral photosystem genes, as recently demonstrated in other systems (547).

The phylogenetic analysis also demonstrated that many G20 sequences formed distinct clusters with no closely related homologs in the current NCBI NR database, indicating novel T4-like cyanophages in the dataset (Fig. 3.14B). These cyanophage sequences showed no host specificity or apparent biogeography in the samples, consistent with the ‘everything is everywhere’ notion of global phage distribution (34, 548).

To further explore the hypothesis that different communities have different sets of MGEs, the samples were clustered according to the abundance of 25 detected transposases or transposon-related proteins/domains as annotated by the Pfam database (Fig. 3.15). Whilst the clustering revealed very similar transposase profiles within sponge replicates, each sponge community had its own distinct set of transposon systems. There was no evidence for biogeography (i.e. tropical communities were not more similar to each other than to the temperate-water sponges), which could indicate limited inter-community (i.e. between different sponge species) dispersal of these MGEs within a region. It is therefore proposed that transposons might play a very specific role in exchanging genetic material within communities. These mobile elements can potentially allow all community members to share traits that are specific for the adaptation to their common host (e.g. via conjugation or transformation) and consequently facilitate the evolution of functional convergence inside a symbiont community. Alternatively, transposons could be involved in disrupting non-essential genes that are no longer required for a bacterial/archaeal symbiont as it evolves into a stable association with the host (see Section 3.4).

Fig. 3.14 Cyanophage abundance based on G20 protein analysis. (A) Abundance of cyanophage based on G20 protein number. Standard deviations are shown. (B) T4-like phage populations. The Approximate-Maximum-Likelihood tree of T4 phage G20 proteins was constructed and supported by 1,000 rounds of FastTree local support values (values > 0.5 are marked). The tree is rooted to the outgroup consisting of non-cyanophage T4-like phages. Each grouped clade in gray contains at least one protein detected across samples. The number of proteins detected in each sample type is shown. Clades are named according to ref. (454).


Fig. 3.15 Abundance and diversity of transposases. The brightness (red) in the heatmap reflects abundance of a particular transposase in a sample (copy per genome). Samples are clustered by Bray-Curtis similarity and average general algorithm. Transposase entries are clustered with Euclidean distance and complete linkage.
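Fig. 3.15 combines two hierarchical clusterings: samples by Bray-Curtis distance with group-average linkage, and transposase entries by Euclidean distance with complete linkage. The naive agglomerative sketch below shows both linkage rules on toy data; it is an illustration under stated assumptions, not the software that produced the figure, and the sample names and abundances are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    """Euclidean distance, as used for clustering transposase entries."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def bray_curtis_distance(u, v):
    """Bray-Curtis dissimilarity (0-1), as used for clustering samples."""
    total = sum(a + b for a, b in zip(u, v))
    return sum(abs(a - b) for a, b in zip(u, v)) / total if total else 0.0


def _link(c1, c2, dist, linkage):
    pair_d = [dist[frozenset((a, b))] for a in c1 for b in c2]
    if linkage == "complete":
        return max(pair_d)  # farthest pair between the two clusters
    return sum(pair_d) / len(pair_d)  # group average (UPGMA)


def agglomerate(vectors, distance, linkage="average"):
    """Naive agglomerative clustering.

    vectors: name -> abundance list (same feature order for every name).
    Returns merges as (cluster_a, cluster_b, height), in merge order.
    """
    names = list(vectors)
    dist = {frozenset((a, b)): distance(vectors[a], vectors[b])
            for a, b in combinations(names, 2)}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        # Merge the closest pair of clusters under the chosen linkage rule.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: _link(pair[0], pair[1], dist, linkage))
        merges.append((c1, c2, _link(c1, c2, dist, linkage)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical copies per genome for three transposase families.
profiles = {
    "sponge_A_rep1": [1.0, 0.2, 0.0],
    "sponge_A_rep2": [0.9, 0.3, 0.0],
    "sponge_B": [0.0, 0.1, 1.2],
}
for a, b, height in agglomerate(profiles, bray_curtis_distance, "average"):
    print(a, "+", b, round(height, 3))
```

Clustering the transposase entries instead means transposing the matrix (one vector per transposase across samples) and calling `agglomerate(..., euclidean, "complete")`. The quadratic search over cluster pairs is fine for a few dozen rows but would need a priority queue or a library routine for large tables.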

3.3.11 Mechanisms in controlling excessive genetic exchange

High rates of HGT can be detrimental to the cell (549), as they erode genomic integrity (550). Moreover, the high filter-feeding rates of sponges make their symbionts particularly vulnerable to phage attack from the plankton (190). Phage infection can lead to lysis and death of the bacterial cell (37). It was therefore hypothesized that effective mechanisms are required to control excessive genetic exchange and minimize the introduction of foreign DNA in sponge microbial communities. Indeed, R-M systems, CRISPRs and CAS proteins were abundant and diverse across all sponge datasets (Fig. 3.8, 3.16 and 3.17). R-M and toxin-antitoxin (T-A) systems are also often considered selfish elements that are involved in MGE competition and an ‘arms-race’ between the chromosomes and the MGEs (551, 552). The evolutionary accumulation of these two systems in sponge symbionts further supports the postulated high rate of HGT inside sponge symbiont communities.

Fig. 3.16 Abundance of CRISPR loci and spacers. Standard deviations are shown.

Fig. 3.17 Abundances of CAS proteins in subfamilies.

It is possible that MGEs, such as plasmids or phages, acquire chromosomal R-M systems and hence become a stable part of the microbial cell. Such selfish and self-protecting features are recognized as an important mechanism for maintaining extrachromosomal elements (551, 552). T-A systems play a similar role in stabilizing selfish genetic elements. These systems generally consist of one toxin and one antidote and lead to post-segregational killing (553) or addiction (551) of the host cell. In all sponge metagenomes, a large number of Type I (COG0286, COG0610, COG4096, PF02384, PF12161, PF01420), Type II (COG0270, COG1743, COG0863, COG0338, COG4889, PF00145) and Type III (COG2189, PF04851) R-M systems were found, together with proteins from the Doc/Phd family (COG3177, PF02661), VapI (HigA) of the HigAB system (COG3093) and other T-A systems (Subsystem: Toxin-antitoxin systems (other than RelBE and MazEF)), all of which would help to stabilize an array of MGEs. Chromosomes can, however, also rid themselves of extrachromosomal ‘hitchhikers’ by acquiring the same T-A or R-M systems as the MGEs. In turn, MGEs could acquire new T-A and R-M systems that would ensure their continued propagation. This would lead to an ‘arms-race’ between the chromosomes and the MGEs, and it has been hypothesized that this results in higher numbers of T-A systems in bacterial species with high rates of HGT (554). The observed abundance of T-A and R-M systems is therefore consistent with the large number and diversity of MGEs observed, as well as with the high frequency of HGT postulated for sponge symbionts.

CRISPRs are recently discovered heritable and adaptive immune systems that provide resistance against the integration of extrachromosomal DNA. Aided by CAS proteins, CRISPRs can acquire short DNA sequences from an invading phage or plasmid and incorporate them into an array of spacer sequences (555). These spacers are then transcribed, and the resulting RNAs can hybridize with invading DNA, which is then degraded by CAS proteins with nuclease function (555). To better understand the diversity of CRISPRs in sponge-associated communities, potential CRISPR arrays were extracted from the metagenomic datasets. In total, 203 CRISPR arrays were detected in five sponge metagenomes under stringent filtering criteria, with average abundances ranging from 0.28 to 0.74 CRISPR copies per genome (Fig. 3.16). No CRISPR was detected in the Stylissa sp. 445 samples, which is likely related to the presence of abundant cyanophages in this species. CRISPRs were also virtually absent in seawater samples.

Clustering of repeats and spacers at a 50% similarity cut-off showed almost no overlap between the different sponge species (Fig. 3.18). As spacer regions are ‘historical’ records of current and past phage infection (556), the lack of overlap indicates that microbial communities from the same geographic location (e.g. the GBR) have experienced attacks by distinct viral populations. This finding is consistent with the distinct bacterial and archaeal communities within each sponge species (Fig. 3.4), as many phages have a high degree of host specificity (29). However, different replicates with very similar species composition also shared very few spacers, reflecting the dynamic nature of phage infection and suggesting that phage defense might involve small-scale temporal or spatial variation (557).

Fig. 3.18 Sample clustering by CRISPR repeats (left) and CRISPR spacers (right). Clustering is based on Bray-Curtis similarity (presence/absence) with group-average linkage.
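The sample clustering described for Figs. 3.15 and 3.18 can be illustrated with a short sketch. It assumes that the "average" clustering refers to group-average (UPGMA) linkage and computes Bray-Curtis similarity on presence/absence data, where it reduces to the Sørensen coefficient. The function names and the toy four-sample matrix are hypothetical and do not reproduce the software or data used in this study.

```python
from itertools import combinations


def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity (0-100) between two abundance vectors."""
    shared = sum(min(x, y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 100.0 * 2 * shared / total if total else 100.0


def presence_absence(counts):
    """Convert abundances to 0/1 so Bray-Curtis becomes the Sorensen index."""
    return [1 if c > 0 else 0 for c in counts]


def group_average_clustering(profiles):
    """Agglomerative clustering with group-average (UPGMA) linkage.

    `profiles` maps sample name -> vector. Returns the merge history as
    (cluster_a, cluster_b, similarity) tuples, most similar pair first.
    """
    sim = {a: {b: bray_curtis_similarity(profiles[a], profiles[b])
               for b in profiles} for a in profiles}
    clusters = [frozenset([name]) for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Mean similarity over all sample pairs spanning the two clusters
            s = sum(sim[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or s > best[2]:
                best = (c1, c2, s)
        clusters.remove(best[0])
        clusters.remove(best[1])
        clusters.append(best[0] | best[1])
        merges.append(best)
    return merges


# Toy data: rows are samples, columns are hypothetical spacer clusters
toy = {
    "S1": [2, 1, 0, 0],
    "S2": [1, 3, 0, 0],
    "S3": [0, 0, 4, 1],
    "S4": [0, 0, 1, 2],
}
profiles = {name: presence_absence(v) for name, v in toy.items()}
for left, right, s in group_average_clustering(profiles):
    print(sorted(left), sorted(right), round(s, 1))
```

The same loop would give complete linkage (as used for the transposase entries in Fig. 3.15) if the mean were replaced by the minimum pairwise similarity.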

The diversity of CRISPRs was also reflected in a variety of CAS proteins (462, 558). Based on sequenced microbial genomes, CAS proteins have been classified into a core set, eight subtype-specific groups and a RAMP-module-related group (462). Proteins from the core set and all subtypes were detected in sponge samples and occurred in different abundances in each sponge species (Fig. 3.17). Surprisingly, Cas1 and Cas2 proteins were in low abundance (or absent) in the microbial communities of Scopalina sp. and T. anhelans. Cas1 and Cas2 are considered hallmarks of a functional CRISPR array and have been proposed to perform an essential function during the spacer acquisition step (462). Interestingly, the lack of Cas1/Cas2 coincided with a high abundance of Csn1 (Fig. 3.17). Csn1 is a multi-domain protein thought to possess multiple functions that are otherwise performed by individual proteins in other subtypes. In all known genomic arrangements, the gene for Csn1 is exclusively located upstream of the genes for Cas1 and Cas2, which are themselves always upstream of and directly adjacent to the CRISPR array (462, 559). However, one contig from the T. anhelans dataset was found in which the Csn1 gene is directly upstream of a CRISPR array (Fig. 3.19). This revealed a deviation from the canonical Cas1/Cas2-based CRISPR arrangement and highlights potential variation in CRISPR structure in sponge-associated microorganisms. It is also worth noting that almost all currently known Csn1/Nmeni-containing CRISPRs are found in the genomes of pathogens and commensals (462), although their functional significance is not clear.


Fig. 3.19 A novel Csn1 arrangement in a CRISPR cassette. Contig layout is generated using Geneious 4.86 (http://www.geneious.com). Gray arrows indicate the CRISPR array.

To further explore the dynamics of the local phage populations, spacers were searched against the NCBI NT and virus databases, but no hits were found, suggesting host specificity in the local environment (560) and highlighting the largely unknown viral diversity (31). Although the sample fractionation employed in this study did not specifically target viral particles, it was still possible to identify 85 putative viral/phage sequences that matched 43 CRISPR spacers from four sponge species, with a notable number of hits found in C. concentrica (see Section 3.2 and Fig. 3.20). All spacer-phage pairs were found exclusively within the same sponge species, again indicating a high degree of host specificity. Generally, spacers were much less abundant than their targets (Fig. 3.20). Even though the protocols used in the present study specifically enriched for bacterial and archaeal cells, which likely resulted in a considerable underestimation of viruses, the results showed that a large number of phage sequences were present in the sponge and subject to potential defense by the CRISPR system.
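Matching spacers to putative phage sequences amounts to an approximate string search on both DNA strands. The matching parameters are not given in this section, so the sketch below is only an illustration: it assumes ungapped matches with at most two mismatches, and `spacer_hits` and the example sequences are hypothetical. For large datasets a BLAST-type search would normally replace this brute-force scan.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def spacer_hits(spacer, target, max_mismatches=2):
    """Find ungapped matches of a CRISPR spacer in a target sequence.

    Both strands are scanned. Returns (strand, start, mismatches) tuples;
    for the '-' strand, `start` is a position on the reverse complement.
    """
    spacer = spacer.upper()
    k = len(spacer)
    hits = []
    for strand, seq in (("+", target.upper()), ("-", reverse_complement(target))):
        for start in range(len(seq) - k + 1):
            mm = mismatches(spacer, seq[start:start + k])
            if mm <= max_mismatches:
                hits.append((strand, start, mm))
    return hits


phage_contig = "GGGACGTTAGCCC"               # hypothetical target sequence
print(spacer_hits("ACGTTAG", phage_contig))  # match on the '+' strand
print(spacer_hits("CTAACGT", phage_contig))  # match on the '-' strand
```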

3.4 Conclusion

The analysis showed that despite large phylogenetic differences, recognizable ‘core functions’ exist in symbiont communities from phylogenetically divergent, yet functionally related hosts in tropical and temperate waters. Thus communities from divergent hosts can have a degree of functional equivalence. The specific symbiotic functions identified here are not only consistent with the current understanding of the biological and ecological roles of sponge-associated microorganisms, but have also provided insight into novel symbiont functions (e.g. creatinine metabolism). Detailed metagenomic analysis in the present study has thus facilitated an understanding of the interactions of complex symbiont communities with eukaryotic hosts thereby contributing to an enhanced appreciation of the holobionts (192).


Fig. 3.20 Local specificity and abundance of CRISPR spacers and their potential targets. Edges connect spacers with their potential targets: the object on the left side of an edge represents the spacer and the object on the right side the target matched by this spacer. Object size reflects abundance in samples (from 0.0087 to 0.443 copies per genome). Ellipses denote replicate A, triangles replicate B and rectangles replicate C. Olive denotes samples of C. concentrica, red Scopalina sp., black R. odorabile and lime C. coralliophila.

Niche selection and neutral processes have both been used to model the community structures of free-living and host-associated microorganisms (46, 68, 98). The initial acquisition of symbionts, the potential for vertical symbiont transmission and the evolution of obligate relationships between sponges and microorganisms (74) have most likely had a major influence on these two types of community assembly processes. Initially, different types of free-living microorganisms likely entered into associations with sponge hosts. These early associations may have been less selective or more random (i.e. neutral), and hence different sponge species would have acquired different phylogenetic clades of microorganisms. This scenario is consistent with the distinct taxonomic profiles in different sponge species observed here and in other studies (74). As the symbiotic relationship evolved and vertical transmission occurred (162, 167), symbionts would have maintained or acquired functions that stabilize their interaction with their host. For different host species with similar functional niches, this means that symbionts would functionally converge. The detailed analyses of the six sponge microbiomes indeed revealed that many of these sponge-specific functions are fulfilled by phylogenetically distinct symbionts as well as by analogous enzymes and biosynthetic pathways (e.g. the CAS proteins or the types of nitrate reductase proteins involved in denitrification). This indicates that symbiont communities in divergent hosts have evolved different ‘genomic solutions’ to perform the same function or to occupy the same niche.

The highly abundant and diverse MGEs detected in sponge symbionts may play key roles in these evolutionary processes in three ways. Firstly, MGEs can mediate HGT and distribute essential core functions, such as stress resistance, ELPs and phage defense, among community members. This activity would have facilitated evolutionary adaptation to the specific host environment (284). As a consequence, individual genomes from different phylogenetic lineages should become more similar to each other, which is consistent with genomic observations of mammalian gut bacteria (561). Secondly, adaptation to a host environment may no longer require all the functions that free-living bacteria have, and the removal of non-essential genes can be mediated by a dramatic increase in transposon density (114). An example of this might be the loss of photolyase genes in sponge symbionts (Fig. 3.10B). Such a process would parallel the reduction in genomic functionality observed in facultative and obligate symbioses of simple systems (562). Thirdly, individual genomes could use MGEs to eliminate functions that are already provided by other genomes and whose benefit might be shared within the community. This would result in niche specialization of individual phylotypes, for example through nutritional interdependence (563). As a consequence of this specialization, different communities would evolve or utilize different members (and hence different genes) for specific tasks, which is consistent with the functional equivalence concept introduced above. At the genomic level, this would also result in greater heterogeneity, which has already been noted during the assembly of two sponge symbiont genomes (191, 400). This genomic evolution of sponge symbionts also appears to be an ongoing process, as transposable elements are actively expressed in contemporary microbial communities of C. concentrica (Chapter 4).

Future studies involving the reconstruction of genomes from metagenomic data and SCS techniques (191) would offer additional evolutionary insights into sponge symbiosis and potentially reveal further principles of functional equivalence, evolutionary convergence and specialization during the complex co-evolution with an ancient animal host.

Chapter Four Metaproteogenomic Analysis of the Microbial Community Associated with the Sponge Cymbastela concentrica

4.1 Introduction

Marine sponges represent a significant component of marine benthic communities throughout the world. Sponges harbor diverse communities of microorganisms, which often form stable and specific associations with their symbiotic host (74, 127, 564). While much progress has been made over the last decade in defining the phylogenetic diversity and patterns of sponge-associated microbial communities (74, 192), information on the function of individual symbionts or the microbial community as a whole is limited.

Examples where specific members have been assigned functional roles include cyanobacterial symbionts, which can provide photosynthetically fixed carbon to the sponge host (126), and the bacterial production of biologically active metabolites that may play a role in host defense (173, 174). The processes of nitrification/denitrification and anaerobic ammonium oxidation (Anammox) have been well investigated in the sponges Geodia barretti (398), Dysidea avara and Chondrosia reniformis (396) using stable isotope experiments, and through the identification of 16S rRNA gene sequences the Anammox process was putatively linked to in a reef sponge (565) (also reviewed by Webster and Taylor (192)). However, these approaches require a priori knowledge of the processes performed in the sponge holobiont or the establishment of an irrevocable link between microbial phylogeny and function. Together with the inherent difficulty of culturing (potentially obligate) symbionts, these limitations have meant that there is only a rudimentary understanding of the ecological functions of sponge-associated microorganisms and the nature of the host-symbiont interactions (127).

The whole-community approaches of metagenomics, metatranscriptomics and metaproteomics provide a promising avenue for exploring the function of uncultured organisms and for substantially advancing the field of sponge-microorganism symbiosis research. For example, a recent metagenomic study in the laboratory on the microbial community associated with the sponge C. concentrica led to the recognition of many novel genomic markers that could provide specific mechanisms for bacteria to persist within, and interact with, their sponge host (190). To further explore the genetic potential revealed by metagenomic datasets, metaproteomics is increasingly being employed to describe the expressed protein profiles of microbial communities. Low-diversity microbial systems, such as those of acid mine drainage (566) and lake water from Antarctica (567), as well as more complex systems in wastewater sludge (568, 569), the human microbiome (570), the hindgut microbiome (369) and the open ocean (571-573), have been studied in this way. For these systems, the combination of high-throughput protein MS with extensive metagenomic datasets has provided novel and direct insights into the functions expressed by microorganisms.

To further the understanding of microbial functions in sponges, an integrated approach combining metagenome sequencing and metaproteomics was applied to the microbial community associated with C. concentrica, an abundant marine sponge found in shallow, temperate waters of the Australian east coast. This sponge contains a stable and diverse microbial community, with predominantly uncultured phylotypes belonging to the Gammaproteobacteria, Phyllobacteriaceae, Sphingomonadales, Piscirickettsiaceae and Bdellovibrionales, amongst others (190). The results showed the expression of transport functions relevant to host-derived nutrients, aerobic and anaerobic metabolism, stress responses for adaptation to variable conditions within the sponge microbial community, as well as proteins that could facilitate a direct molecular interaction between the symbionts and the host. The data also revealed specific protein expression by a Phyllobacteriaceae bacterium and a Nitrosopumilus-like thaumarchaeon, thus linking particular functions to uncultured phylotypes.

4.2 Materials and Methods

4.2.1 Sponge sampling, cell separation and metagenomic analyses

Sponge sampling of C. concentrica, microbial cell enrichment, DNA extraction, pyrosequencing and metagenomic analyses were described in Section 3.2.


4.2.2 Protein extraction and preparation

The cell pellet of each triplicate sample was resuspended separately in 1 mL of lysis buffer containing 10 mM Tris-EDTA (pH 8.0), PIC (2 µL/mL), 0.1% sodium dodecyl sulfate (SDS; Sigma-Aldrich, Sydney, Australia) and 1 mM dithiothreitol (DTT; Sigma-Aldrich, Sydney, Australia). Cells were then disrupted on ice by sonication with a Branson Sonifier (Danbury, CT, USA) for five cycles of 30 s at 30% amplitude with 0.5 s on/off pulses. Microscopic analysis showed no intact cells after this lysis step. To desalt the protein sample, the final supernatant was transferred into a 5 kDa cut-off Amicon Ultra-15 filter unit (Millipore, MA, USA), buffer-exchanged with 3 mL of 10 mM Tris-EDTA (pH 8.0) and concentrated to a smaller volume (~200 µL). The final protein concentration of the samples was determined using a bicinchoninic acid protein assay kit (Sigma-Aldrich, Sydney, Australia).

4.2.3 One-dimensional SDS polyacrylamide gel electrophoresis and in-gel trypsin digestion

Protein samples were resuspended in appropriate volumes of SDS polyacrylamide gel electrophoresis (SDS-PAGE) sample buffer containing 187.5 mM Tris (pH 6.8), 30% glycerol, 6% SDS, 300 mM DTT, 0.03% bromophenol blue and sterile Milli-Q water. Samples were resolved on a 12% SDS gel using a Mini-PROTEAN system (Bio-Rad, Sydney, Australia) according to the protocol established by Laemmli (575). The separating gel contained 0.375 M Tris-HCl (pH 8.8), 0.1% SDS, 0.06% ammonium persulphate, 0.06% tetramethylethylenediamine and 12% acrylamide/bis-acrylamide solution (15:1 ratio, Bio-Rad, Sydney, Australia). The stacking gel consisted of 0.125 M Tris-HCl (pH 6.8), 0.3% SDS, 0.1% ammonium persulphate, 0.2% tetramethylethylenediamine and 5.5% acrylamide/bis-acrylamide solution. The gel was run at a constant current of 15 mA in running buffer containing 0.025 M Tris-HCl (pH 8.3), 0.192 M glycine and 0.1% SDS. The gel was stained with 0.25% Coomassie Blue solution in 50% methanol and 10% acetic acid for at least 4 hours and then destained in a solution containing 40% methanol and 10% acetic acid for approximately one hour. Profile images were acquired using a conventional digital camera against a white background.


Whole gel lanes were cut into equal slices using a sterile gel cutter, and each slice was washed with sterile Milli-Q water followed by 25 mM NH4HCO3 in acetonitrile to remove the Coomassie stain. The gel slices were treated separately through a series of reduction, alkylation and dehydration steps before digestion. Briefly, the slices were reduced with 10 mM DTT at 37°C for 30 minutes, alkylated in 25 mM iodoacetamide for 45 minutes and dehydrated in acetonitrile. In-gel enzymatic digestion was performed by rehydrating the gel pieces in a buffer containing 80 ng/µL trypsin (Promega, Sydney, Australia) and 25 mM NH4HCO3 at 37°C for 14 hours. Digested peptides were extracted using 1% formic acid and acetonitrile and then dried using a Savant SpeedVac concentrator (Thermo Fisher, Melbourne, Australia).

4.2.4 High-performance liquid chromatography and MS

Peptide digests were rehydrated in a buffer containing 1% formic acid and 0.05% heptafluorobutyric acid. Peptides were first separated by nano-LC using an Ultimate 3000 high-performance liquid chromatography and autosampler system (Dionex, Amsterdam, Netherlands). Samples (2.5 µL) were concentrated and desalted on a micro C18 precolumn (0.5 mm × 2 mm; Michrom Bioresources, Auburn, CA, USA) with H2O:CH3CN (98:2, 0.05% trifluoroacetic acid, TFA) at 15 µL per minute. After a 4-minute wash, the precolumn was switched (Valco 10-port valve; Dionex) into line with a fritless nano column (75 µm × ~10 cm) containing C18 media (5 µm, 200 Å Magic; Michrom) manufactured according to ref. (576). Peptides were eluted using a linear gradient from H2O:CH3CN (98:2, 0.1% formic acid) to H2O:CH3CN (64:36, 0.1% formic acid) at 250 nL per minute over 30 minutes. A high voltage (1,800 V) was applied to the low-volume tee (Upchurch Scientific, Oak Harbor, WA, USA) and the column tip was positioned ~0.5 cm from the heated capillary (T = 250°C) of an LTQ FT Ultra (Thermo Electron, Bremen, Germany) mass spectrometer. Positive ions were generated by electrospray and the LTQ FT Ultra was operated in data-dependent acquisition mode. A survey scan (m/z 350-1,750) was acquired in the FT ICR cell (resolution = 100,000 at m/z 400, with an initial accumulation target value of 1,000,000 ions in the linear ion trap). Up to the six most abundant ions (4,000 counts) with charge states of +2, +3 or +4 were sequentially isolated and fragmented within the linear ion trap using collision-induced dissociation with an activation q = 0.25 and an activation time of 30 ms at a target value of 30,000 ions. m/z ratios selected for tandem mass spectrometry (MS/MS) were dynamically excluded for 30 seconds. Peak lists were generated using Mascot Daemon/extract_msn (Matrix Science, Thermo, London, UK) with default parameters (version 2.2; Matrix Science).

Each peptide sample was subjected to three separate LC-MS/MS analyses, which resulted in a total of 108 sample runs (3 sponge samples, 12 slices per sample and 3 runs per sliced sample).

4.2.5 MS/MS data analysis and database searches

The MS/MS data from all 108 runs were analyzed using Mascot (version 2.3; Matrix Science, London, UK), Sequest (Thermo Fisher Scientific, San Jose, CA, USA; version 1.0.43.0) and X! Tandem (The GPM, thegpm.org; version 2007.01.01.1). All database searches were performed against a combined search database, generated from all predicted protein sequences from a previous Sanger-sequencing-based metagenomic study of C. concentrica (190) and from the newly generated metagenomic data (see Section 4.2.1). Mascot and X! Tandem searches used a fragment ion mass tolerance of 0.40 Da and a parent ion tolerance of 4.0 ppm; Sequest searches used a fragment ion mass tolerance of 0.50 Da and a parent ion tolerance of 3.0 ppm. Oxidation of methionine and iodoacetamide derivatives of cysteine were specified as variable modifications in Mascot, Sequest and X! Tandem; in Mascot, the acrylamide adduct of cysteine was additionally specified as a variable modification.

4.2.6 Protein identification and validation

To discriminate between false-positive and confident peptide matches, the search results for each sponge sample were pooled (36 LC-MS/MS runs per sample) and loaded into Scaffold (version Scaffold_2_00_05, Proteome Software Inc., Portland, OR, USA) as categorical samples to validate peptide and protein identifications. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the PeptideProphet algorithm (577). Protein identifications were accepted if they could be established at greater than 99.0% probability and contained at least two identified peptides. Protein probabilities were assigned using the ProteinProphet algorithm (578). Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony. The false discovery rate, as estimated by searches against a decoy database, was below 1%.
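The acceptance rules above (peptide probability > 95%, protein probability > 99%, at least two identified peptides) and a decoy-based FDR estimate can be written out as a short filter. This is a sketch only: Scaffold applies these thresholds internally, the `ProteinHit` record and its fields are hypothetical, and the FDR here is the simple ratio of accepted decoy to accepted target proteins, which may differ from the exact calculation Scaffold reports.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    accession: str
    protein_prob: float          # ProteinProphet-style probability (0-1)
    peptide_probs: List[float]   # PeptideProphet-style probabilities (0-1)
    is_decoy: bool = False


def accept(hit, min_peptide_prob=0.95, min_protein_prob=0.99, min_peptides=2):
    """Apply the acceptance thresholds (defaults mirror the text)."""
    confident = [p for p in hit.peptide_probs if p > min_peptide_prob]
    return hit.protein_prob > min_protein_prob and len(confident) >= min_peptides


def decoy_fdr(hits):
    """Estimate FDR among accepted proteins as (#decoys / #targets)."""
    accepted = [h for h in hits if accept(h)]
    decoys = sum(1 for h in accepted if h.is_decoy)
    targets = len(accepted) - decoys
    return decoys / targets if targets else 0.0


hits = [
    ProteinHit("T1", 0.999, [0.99, 0.97]),                 # accepted
    ProteinHit("T2", 0.995, [0.99]),                       # one peptide only
    ProteinHit("T3", 0.980, [0.99, 0.99]),                 # protein prob too low
    ProteinHit("T4", 0.999, [0.96, 0.98, 0.90]),           # accepted
    ProteinHit("D1", 0.999, [0.96, 0.98], is_decoy=True),  # accepted decoy
]
print([h.accession for h in hits if accept(h)], decoy_fdr(hits))
```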

4.2.7 Binning of metagenomic data and comparative genomics

Five dominant bacterial lineages, namely the Phyllobacteriaceae-type, Piscirickettsiaceae-type, AlphaBetaGamma-proteobacteria-type, Sphingomonadales-type and Deltaproteobacteria-type, were previously reconstructed by binning metagenomic sequence scaffolds generated by Sanger sequencing according to their tetranucleotide patterns (190). These five partial genomes were expanded by adding the original bin sequences and NCBI reference genomes as references to PhymmBL (383) and then classifying all new metagenomic contigs longer than 1,000 nt. Contigs assigned to any of the previously described genome bins (190) were then used to expand the partial genomes. To determine the taxonomic origin of the 765 proteins identified in the metaproteome, the protein sequences were mapped back to the five partial genomes based on protein similarity and contig (DNA) identity. For protein similarity, the 765 proteins were searched against all proteins encoded by the partial genomes using BlastP (identity cut-off 95%), which assigned 65 proteins to the Phyllobacteriaceae-type, 5 to the Piscirickettsiaceae-type and 1 to the AlphaBetaGamma-proteobacteria-type. For contig identity, the 746 contigs encoding these 765 proteins were extracted and classified against the expanded partial genomes using PhymmBL. Again, 65 proteins were assigned to the Phyllobacteriaceae-type, as well as 17 to the Piscirickettsiaceae-type, 15 to the AlphaBetaGamma-proteobacteria-type, 1 to Sphingomonas and 1 to Bdellovibrionales.
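The partial genomes were originally defined by tetranucleotide composition. As an illustration only, the sketch below computes a 256-dimensional tetranucleotide frequency profile and assigns a contig to the bin with the nearest (Euclidean) profile. This nearest-profile rule is a hypothetical simplification: it is neither the clustering used to build the original bins (190) nor PhymmBL, which combines interpolated Markov models with BLAST. The bin names and sequences are toy values.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(seq):
    """Relative frequencies of the 256 tetranucleotides in `seq`.

    Overlapping windows are counted; windows with non-ACGT characters are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        word = seq[i:i + 4]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in TETRAMERS]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def nearest_bin(contig, bin_profiles):
    """Name of the bin whose profile is closest to the contig's profile."""
    profile = tetranucleotide_profile(contig)
    return min(bin_profiles, key=lambda name: euclidean(profile, bin_profiles[name]))


# Hypothetical bin profiles built from toy sequences
bins = {
    "bin_A": tetranucleotide_profile("ACGT" * 50),
    "bin_B": tetranucleotide_profile("AATT" * 50),
}
print(nearest_bin("ACGTACGTACGTACGT", bins))
```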

Statistical pairwise comparison of the COG category profiles and individual COGs between the proteome and partial genome data was performed by resampling (n = 1,000) of COG subsamples (n = 60), as outlined by Lauro et al. (579).
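The resampling comparison can be sketched as follows. The exact procedure of Lauro et al. (579) is not reproduced here; the sketch assumes that each dataset is repeatedly subsampled without replacement (60 COG assignments, 1,000 iterations), that the proportion of one COG category is recorded for each subsample, and that enrichment is summarised as the fraction of paired iterations in which the proteome proportion does not exceed the genome proportion. The function names and toy labels are hypothetical.

```python
import random


def resampled_proportions(labels, category, n_sub=60, n_iter=1000, seed=0):
    """Proportion of `category` in `n_iter` random subsamples of size `n_sub`.

    Subsampling is without replacement, so `labels` must contain >= n_sub items.
    """
    rng = random.Random(seed)
    return [rng.sample(labels, n_sub).count(category) / n_sub for _ in range(n_iter)]


def compare_category(proteome, genome, category, n_sub=60, n_iter=1000, seed=0):
    """Mean subsample proportions and an empirical enrichment p-value."""
    prot = resampled_proportions(proteome, category, n_sub, n_iter, seed)
    gen = resampled_proportions(genome, category, n_sub, n_iter, seed + 1)
    # Fraction of paired iterations in which the proteome is NOT enriched
    p_value = sum(p <= g for p, g in zip(prot, gen)) / n_iter
    return sum(prot) / n_iter, sum(gen) / n_iter, p_value


# Toy COG category labels (O = posttranslational modification, protein
# turnover, chaperones; J = translation)
proteome_cogs = ["O"] * 40 + ["J"] * 60
genome_cogs = ["O"] * 10 + ["J"] * 90
print(compare_category(proteome_cogs, genome_cogs, "O"))
```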

4.2.8 Phylogenetic analysis of 16S rRNA genes and a thaumarchaeal AmoA

The 16S rRNA gene sequence of the Phyllobacteriaceae-phylotype was taken from the previously classified partial genome from C. concentrica (190) and was also found in the current metagenomic dataset. This sequence was aligned with SINA (v1.2.9) (291) and inserted into the latest SILVA SSU Ref tree (version 1.08) using the parsimony function and the parsimony mask (pos_var_Bacteria_94) in the ARB software package (580). Sequences of laboratory isolates belonging to the Phyllobacteriaceae-phylotype were exported, and their conserved blocks were extracted using Gblocks (version 0.91b) (436). The filtered alignments were used for phylogenetic reconstruction with RAxML (version 7.2.8) (581) using default settings, with support from rapid bootstrapping with 1,000 resamples. The same procedure was used for the phylogenetic reconstruction of a Nitrosopumilus-like thaumarchaeon, except that a selection of sequences from uncultured Marine Group I strains was also included.

For phylogenetic analysis of the AmoA protein, homologous sequences were obtained by iterative searches (10X) against the NCBI NR database (September 15, 2010) with an E-value cut-off of 10⁻⁴. Redundant sequences were removed using CD-HIT (version 4.3) (455) with a similarity cut-off of 95%. A multiple sequence alignment was generated using MUSCLE (version 3.8.31) (457). Representative sequences of bacterial AmoA and methane monooxygenase subunit A (PmoA) were selected according to the representatives used by Hallam et al. (399) and aligned separately. The three alignments of archaeal AmoA, bacterial AmoA and PmoA were then combined using profile-profile alignment in MUSCLE. Conserved blocks were selected using Gblocks, and the phylogeny was reconstructed using RAxML with empirical base frequencies drawn from the alignment.

4.2.9 FISH probe design and evaluation

Specimens were prepared for microscopy according to the method outlined by Liu et al. (582). Briefly, hybridization was performed in a hybridization chamber at 46°C for 2 hours in hybridization buffer [0.9 M NaCl, 20 mM Tris-HCl (pH 7.4), 0.01% SDS and 30% formamide] containing the fluorescent probe at a concentration of 5 ng/µL. A specific oligonucleotide probe, PHY-CY3 (5'-CTCAATCTCGCGATCTCG-3'), targeting the Phyllobacteriaceae-phylotype was designed with the Probe Design program within the ARB software package (583) and synthesized by Thermo Fisher (Germany). The target position was between 1,258 and 1,276 (E. coli numbering). A total of 303 partial 16S rRNA gene sequences related to the sponge Phyllobacteriaceae were used for the design (190); this large multiple-sequence set helps to ensure high specificity of the probe. Probe evaluation was also conducted (583) and the optimal formamide concentration determined. Slides were incubated in 50 to 100 µL of hybridization mix, depending on the size of the sections, then carefully rinsed and further incubated at 43°C in wash buffer [0.2 M Tris-HCl (pH 7.4), 5 mM EDTA, 0.01% SDS and 100 mM NaCl, X] for 25 minutes. The wash buffer was carefully rinsed off and the slides were air-dried in the dark. The anti-fading agent Citifluor (Citifluor Ltd., London) was mounted on the slides to prevent fluorochrome bleaching. Microscopy was performed with an Olympus FV1000 laser scanning microscope (inverted) at an excitation wavelength of 543 nm for the Cy3-labeled probe. Color microscopy images were acquired using the camera attached to the microscope (Olympus, Japan) and processed with Adobe Photoshop CS. The presence and distribution of bacteria in sponge tissues were examined by investigating multiple regions from at least three replicate specimens.

4.3 Results and Discussion

4.3.1 Overview of the metaproteogenomic data

Sequencing of DNA extracted from the microbial communities of three C. concentrica samples resulted in a total of 2.8 million unique sequencing reads, which assembled into 988,317 contigs or singletons longer than 100 nt. Of these, 687,588 passed a filtering procedure for eukaryotic contamination. For the three sponge samples, 342,235, 550,559 and 555,480 proteins (ORFs) were identified, respectively, and on average 235,288, 208,852, 209,592 and 284,549 proteins per sample could be annotated with CDD, COG, Pfam and SEED, respectively (Table 4.1).

The three proteomic datasets yielded a total of 765 non-redundant proteins identified from 5,275 peptide fragments. Taxonomic analysis indicated that 139 proteins were possibly of eukaryotic origin, leaving 626 proteins for further functional characterization. Of these, 367, 364 and 395 proteins were found in the three individual C. concentrica samples, respectively, and 186 proteins were common to all three samples (Fig. 4.1). Protein sequences were annotated and clustered into functional categories. A substantial proportion of the expressed proteins (34%) had no functional assignment or were hypothetical, suggesting the presence of many unrecognized functions in the sponge’s microbial community.

Table 4.1 Information for the metagenomic analysis of the sponge C. concentrica.

Sample | Cyn-A | Cyn-B | Cyn-C
Raw reads | 678263 | 1169872 | 1323699
Average read length (nt) | 358 | 408 | 393
Unique reads | 660869 | 1004075 | 1111093
Aligned reads | 347438 (53.8%) | 704702 (72.0%) | 679422 (63.3%)
Contigs > 1000 nt | 3389 | 3246 | 6925
Contigs > 500 nt | 11417 | 11398 | 26612
Average size of contigs > 500 nt (nt) | 1185 | 1459 | 1119
N50 size of contigs > 500 nt (nt) | 1345 | 2262 | 1192
Maximum size of contigs > 500 nt (nt) | 28780 | 323086 | 39109
Contigs > 100 nt | 22865 | 22168 | 56410
Singletons > 100 nt | 275175 | 251668 | 360031
Contigs and singletons of prokaryotic origin | 212117 (71.2%) | 200120 (73.1%) | 275351 (66.1%)
Unique proteins | 215220 | 229113 | 281686
Total proteins | 342235 | 550559 | 555480
Proteins annotated by CDD (1e-4) | 149902 (43.8%) | 301223 (54.7%) | 254738 (45.9%)
Proteins annotated by COG (1e-4) | 131452 (38.4%) | 271121 (49.2%) | 223983 (40.3%)
Proteins annotated by Pfam (1e-4) | 131095 (38.3%) | 270007 (49.0%) | 227674 (41.0%)
Proteins annotated by SEED (1e-4) | 182779 (53.4%) | 361986 (65.8%) | 308882 (55.6%)
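Table 4.1 reports N50 values for contigs longer than 500 nt. A minimal sketch of the usual definition follows: sort contigs from longest to shortest and report the length at which the running total first reaches half of the summed length. The length filter mirrors the table; the function name and toy lengths are illustrative, and the assembler's own reporting may differ in edge cases such as the exact length threshold.

```python
def n50(lengths, min_length=0):
    """N50 of contigs longer than `min_length`.

    Returns the length L such that contigs of length >= L together
    contain at least half of the total assembled bases (0 if no contigs).
    """
    kept = sorted((x for x in lengths if x > min_length), reverse=True)
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length
    return 0


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))                      # 8
print(n50([600, 450, 1200, 800, 2500, 300], min_length=500))  # 1200
```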

Fig. 4.1 Venn diagram showing the distribution of proteins identified across three sponge samples. S, Sample replicate.


To compare the expressed functional profile of the microbial communities with their underlying genetic potential, the metaproteomic and metagenomic datasets were analyzed based on COG functional categories (Fig. 4.2). COG categories that were relatively overrepresented in the metaproteome included carbohydrate transport and metabolism; post-translational modification, protein turnover and chaperone functions; and signal transduction. Relative underrepresentation, compared with the metagenome, was observed for the categories of coenzyme transport and metabolism; transcription; translation; and replication, recombination and repair. Further analysis at the individual COG level indicated an abundance (in terms of both proteins and peptides detected) of the chaperonin GroEL (HSP60 family; COG0459), an array of highly specific transporters (COG0747, COG0834, COG4663, COG0683, COG1653), dehydrogenases with different specificities (FabG; COG1028), and the outer membrane proteins CirA and OmpA (COG1629, COG2885) (Fig. 4.3). Characterizing the functions within each category in detail allowed the physiological properties and activities of the sponge's microbial community to be inferred.

Fig. 4.2 Relative abundance of COG categories based on the metaproteomic and metagenomic data. COG counts were normalized and are presented as the percentage of total counts in each COG category. Error bars show the standard deviation of triplicate samples, and asterisks indicate statistical significance (P < 0.05, t-test).
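Fig. 4.2 compares normalized COG category percentages between the metaproteome and the metagenome with a t-test on triplicate samples. The sketch below assumes a two-sample Student's t-test with equal variances (the caption does not state which variant was used) and, to stay dependency-free, compares the statistic with the tabulated two-tailed critical value for 4 degrees of freedom instead of computing an exact P-value; a library routine such as scipy.stats.ttest_ind would normally supply the P-value. Function names and example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-tailed critical value of Student's t for alpha = 0.05 and
# 4 degrees of freedom (3 + 3 replicates - 2).
T_CRIT_DF4 = 2.776


def relative_abundance(counts):
    """Convert raw counts per COG category into percentages of the total."""
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


def student_t(a, b):
    """Two-sample Student's t statistic, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def compare_category(proteome_pct, genome_pct):
    """Return (t, significant) for one COG category measured in triplicate."""
    if len(proteome_pct) != 3 or len(genome_pct) != 3:
        raise ValueError("T_CRIT_DF4 is only valid for triplicate samples")
    t = student_t(proteome_pct, genome_pct)
    return t, abs(t) > T_CRIT_DF4


# Hypothetical triplicate percentages for one category.
t, significant = compare_category([12.0, 11.5, 12.5], [6.0, 6.5, 5.5])
print(round(t, 2), significant)
```

Percentages from relative_abundance are computed per sample first, so each replicate contributes one value per category to the test.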


4.3.2 Active transport systems involved in nutrient acquisition

The sponge-associated community showed abundant expression of high-affinity and broad-specificity uptake systems, such as ABC transporters and tripartite ATP-independent periplasmic (TRAP) transporters. The most abundant transporter components detected in the metaproteome were periplasmic substrate-binding domains associated with ABC transporters, in particular those for amino acids. DppA (COG0747) (Fig. 4.3) is the substrate-binding component of the DppABCDEF dipeptide transport system, which has been shown to transport proline-containing dipeptides (584). Proline-containing dipeptides have previously been isolated from marine sponges (585); however, their exact source (sponge or symbiont) is not known. Notably, DppA-type transporters have not been observed at comparable abundance in the metaproteomes of planktonic bacteria (572, 573), which points to nutritional differences between the free-living and the sponge-associated environment. Dipeptide transporters of the DppA type have also been found to transport heme and heme precursors (586), indicating that these iron-containing compounds may be scavenged from the surrounding seawater or from the sponge host. Other abundant proteins detected were HisJ and LivK, the periplasmic components of the high-affinity histidine- and leucine-specific transport systems, respectively.

The overrepresentation of the COG category for carbohydrate transport and metabolism (Fig. 4.2) is principally due to a high number of proteins associated with the glycerol-3-phosphate ABC-type transporter UgpB (COG1653) and the TRAP system DctP (COG1638). UgpB is the periplasmic binding protein of the glycerol-3-phosphate uptake system (587), which transports glycerol-3-phosphate for use as a carbon and/or phosphate source. However, glycerol-3-phosphate uptake through the Ugp system cannot supply sufficient carbon for bacterial growth and instead only increases the internal phosphate concentration (588). The Ugp system is therefore ideally geared for scavenging phosphate-containing compounds (588). The DctP system is well characterized in Rhodobacter capsulatus (589) and has also been found in Wolinella succinogenes (590), where it is responsible for the transport of the C4-dicarboxylates fumarate and malate (590). Structurally, the known TRAP substrates share a carboxylate group, so hundreds of organic acids could potentially be transported by this system (591). Large numbers of evolutionarily diverse TRAP transporters have been found in marine environments, especially in the SAR11 clade (592), which suggests an important role in the transport of diverse substrates.

Fig. 4.3 Abundance of specific COGs in the sponge-associated microbial metaproteome. Grey bars indicate absolute counts at the protein level and black bars indicate absolute counts at the peptide level.

Another TRAP transporter expressed in the sponge-associated community belongs to the FcbT1 type, which has been shown in Comamonas sp. DJ-12 to transport halogenated aromatic substrates such as 4-chlorobenzoate (4-CBA) (593). This TRAP transporter is encoded in an operon together with enzymes responsible for the hydrolytic dechlorination of 4-CBA, which after dechlorination can be further metabolized to succinyl-CoA and acetyl-CoA (594). FcbT genes can also be induced by benzoate derivatives such as 4-bromobenzoate, indicating a wider potential substrate range among aromatic compounds (593). Marine holobionts, including algae (595), jellyfish (596), (597) and sponges (598), have been recognized as a rich source of naturally occurring halogenated compounds, many of which have antibiotic or antagonistic activities. A number of sponge species, such as Psammopemma sp., Psammaplysilla purpurea, Aplysina aerophoba and Dysidea herbacea, produce brominated aromatic metabolites, including bromoindoles, bromophenol (BP), polybrominated diphenyl ethers and dibromodibenzo-p-dioxins (599-602). These observations are consistent with the uptake of halogenated aromatic compounds by the sponge's symbionts. Alternatively, the bidirectional nature of TRAP transporters (603) could facilitate export, in which case the symbionts would be the actual producers of the halogenated aromatics. Either way, the data suggest a close symbiotic relationship between the sponge and its symbionts that involves the transport of halogenated aromatic compounds.

TonB-dependent transporters (TBDTs) were also expressed by the sponge-associated microbial community, as has also been observed in the microbial membrane metaproteome of samples from the South Atlantic (571). Specifically, the outer membrane receptor protein CirA (COG1629) and the outer membrane protein OmpA (COG2885) were detected. TBDTs such as CirA use energy derived from the proton motive force, transduced through TonB, to transport nutrients across the outer membrane of Gram-negative bacteria. Genome studies of bacterioplankton have shown TBDTs to be enriched among marine bacterial species (604). The transport activities of TBDTs were long thought to be restricted to iron complexes (siderophores) and vitamin B12 (cobalamin), but recent experimental and bioinformatic studies indicate that nickel, cobalt, copper, maltodextrins, sucrose, thiamin and chito-oligosaccharides are also suitable substrates (605).

Overall, the sponge-associated community clearly expresses a large number of transporters for the acquisition of various substrates, and in this respect behaves similarly to planktonic bacterial communities from oligotrophic open oceans and productive coastal ecosystems (571-573). Despite these broad similarities, there were subtle differences in transport (e.g. of dipeptides and halogenated aromatics) that likely reflect the nutrients specific to the microhabitats of the sponge.

4.3.3 Stress response

The abundance of expressed proteins associated with post-translational modification, protein turnover and chaperone functions (Fig. 4.2) reflected the detection of a high number of the chaperones GroEL (HSP60, COG0459) and DnaK (HSP70, COG0443) and the membrane protease HflC (COG0330). These chaperones and proteases are essential for the refolding or elimination of denatured or damaged proteins, which can result from stress conditions such as temperature shifts, osmotic pressure, reactive oxygen species and toxic compounds. In addition, the peptide methionine sulfoxide reductase MsrA, which repairs proteins that have been inactivated by oxidation (606), was expressed. A number of proteins annotated as heme-dependent peroxidases, which remove hydrogen peroxide, were also detected, as was the superoxide dismutase SodA, which converts superoxide radicals into hydrogen peroxide (607). Peroxiredoxin (COG0450) and glutathione-S-transferases (COG0625) were also expressed, and these proteins might be important for controlling cytoplasmic redox balance (608, 609). A choline dehydrogenase, BetA, which catalyzes the oxidation of choline to glycine betaine, was also detected (610). Betaines are potent and widely used osmolytes that maintain osmotic balance in the cytoplasm, and their production is often induced by osmotic stress.

Stress-related functions have previously been noted to be abundant in the genomes of bacteria associated with C. concentrica compared with planktonic bacteria in the surrounding water (190). The cyclic pumping activity of sponges (121) and steep local gradients (398) would expose bacteria to variable environmental conditions in terms of the availability of nutrients and electron acceptors (e.g. oxygen). The symbionts of C. concentrica therefore appear to cope with such fluctuations both evolutionarily (through an abundance of stress-related genes) and physiologically (through the expression of those genes).

4.3.4 Metabolism

The nitrogen metabolism of bacterial and archaeal symbionts is closely linked to the sponge host, which secretes and accumulates ammonium (74, 396, 398, 476). It is therefore not surprising that the membrane-bound ammonia monooxygenase subunits β and γ (AmoB and AmoC) and an ammonia transporter (AmtB) were found to be expressed in the microbial community of C. concentrica. These sequences were most closely related (BlastN identity: 92%) to those of the marine thaumarchaeon N. maritimus (466). The genes encoding AmoB and AmoC were adjacent and oriented in opposite transcriptional directions on a contig of the C. concentrica metagenome (Fig. 4.4A). The contig also contains a gene for the α subunit (AmoA) and shows striking overall synteny with a genomic region of N. maritimus (Fig. 4.4A, B). Phylogenetic analysis of AmoA further confirmed the close relationship with N. maritimus (Fig. 4.5). Putative nitric oxide reductase subunits (NorQ and NorD) are also encoded in this genomic region (Fig. 4.4A, B) and might have a role in tolerance to nitric oxide under limiting oxygen concentrations, or might allow nitric oxide to be used as an alternative electron acceptor (611).

N. maritimus belongs to the C1a-α subgroup of thaumarchaeal Marine Group I, which contains many sequences obtained from sponges and plankton (465). To define the phylogeny of the Nitrosopumilus-like thaumarchaeon in C. concentrica more precisely, a 16S rRNA gene sequence on a metagenomic contig (Fig. 4.4C) was identified and used to reconstruct its phylogenetic relationship with other selected C1a-α subgroup members. This C. concentrica-derived thaumarchaeal sequence clustered with a mix of 16S rRNA sequences from sponge symbionts and free-living archaea (Fig. 4.6). These results indicate that aerobic nitrification and ammonia transport are potentially active in C. concentrica and are carried out by a Nitrosopumilus-like thaumarchaeon that is representative of common symbiotic and planktonic archaea. They also indicate that members of this nitrifying archaeal clade could exist in either a host-associated or a free-living form.


Fig. 4.4 Contigs containing an archaeal amoABC gene cluster and a thaumarchaeal 16S rRNA gene. Gene annotations: ZnMc_MMP, zinc-dependent metalloprotease, matrix metalloproteinase subfamily; arsR, arsenic resistance operon repressor; DUF947, hypothetical protein with a DUF947 superfamily domain; gsaB, glutamate-1-semialdehyde aminotransferase (class III aminotransferase); dam, DNA adenine methylase; rps15p, 30S ribosomal protein S15P; PRK14888, PRK14888 superfamily protein; serS, seryl-tRNA synthetase class II; rps3Ae, 30S ribosomal protein S3Ae; DUF54, DUF54 superfamily protein; bchP-chlP, geranylgeranyl reductase; norQ, MoxR-like ATPase, nitric oxide reductase Q protein; norD, nitric oxide reductase D protein; PALP, pyridoxal phosphate-dependent enzyme; tbp, TATA box binding protein; ASCH/ASC-1-like, ASC-1 homology domain, ASC-1-like subfamily. Hypothetical genes are not annotated. Contig layouts were generated with Geneious 4.86 (http://www.geneious.com). (A) A contig from the Nitrosopumilus-like thaumarchaeon in C. concentrica containing the amoABC gene cluster. (B) Part of the N. maritimus SCM1 genome containing the amoABC gene cluster. (C) A contig from the C. concentrica metagenome containing a thaumarchaeal 16S rRNA gene.

Fig. 4.5 Maximum-likelihood tree of archaeal and bacterial AmoA sequences. The tree is rooted with the PmoA sequences AAQ10310, CAE47800 and AAA87220. Only bootstrap values greater than 50% (1,000 replicates) are shown at the nodes. Sponge-derived sequences are shown in bold.


Fig. 4.6 Maximum-likelihood tree based on 16S rRNA genes of the Nitrosopumilus-like thaumarchaeon, N. maritimus SCM1 and other lineages of the C1a-α subgroup of Marine Group I. Only bootstrap values greater than 50% (1,000 replicates) are shown at the nodes. Sponge-derived sequences are shown in bold.

Given that both aerobic and anaerobic conditions might exist within micro-habitats of the sponge tissue, the metaproteomic dataset was searched for functions related to anaerobiosis. Proteins annotated to COG0076 (GadB; glutamate decarboxylase and related PLP-dependent proteins), part of the glutamate-dependent acid resistance system, were found to be expressed. Glutamate-dependent acid resistance systems protect cells during anaerobic phosphate starvation, when glutamate is available, by preventing damage from weak acids produced by carbohydrate fermentation. Although no proteins directly associated with carbohydrate fermentation were detected, acetoacetate decarboxylase was expressed; this enzyme is involved in solventogenesis, in which the typical fermentation products butyric and acetic acid are converted into acetone and butanol (612). Anaerobic degradation of amines and polyamines may also occur, as expression of crotonobetainyl-CoA hydratase (CaiD, COG1024), part of the carnitine degradation pathway (613), was detected. In E. coli, this pathway, which includes the dehydration and reduction of L-carnitine to γ-butyrobetaine, is induced during anaerobic growth. The carnitine pathway has also been found to generate the osmoprotectant betaine during anaerobic respiration (613-615). Recent analysis of anaerobic carnitine reduction in E. coli has also shown that the electron transfer flavoproteins FixA and FixB are necessary for the transfer of electrons to crotonobetaine reductase (CaiA) (616), and FixA was found to be expressed in the sponge community. Overall, the data provide evidence for both aerobic and anaerobic metabolism in the sponge.

4.3.5 Molecular symbiont-host interactions

A substantial overrepresentation of ankyrin (ANK) repeat and tetratricopeptide repeat (TPR) proteins was observed in a recent metagenomic study of the bacterial communities associated with C. concentrica from this laboratory (190). The genes for these eukaryotic-like proteins (ELPs) were also clustered in the genome of an uncultured deltaproteobacterial symbiont of the same sponge (582). Genes encoding ANK proteins have been reported to be abundant in the genomes of obligate and facultative symbionts, such as Wolbachia pipientis (617), Ehrlichia canis (618), L. pneumophila (619) and Coxiella burnetii (620), and could function in mediating host–bacteria associations. This possibility was highlighted recently by a mutant study of L. pneumophila, which showed that certain ANK proteins controlled the intracellular replication of L. pneumophila within its amoebal host (619). An ANK protein (COG0666) and a TPR protein (PFAM00515) were found in the metaproteomic dataset, confirming that sponge-associated microorganisms express these proteins, potentially to mediate interactions with the sponge host, as previously proposed (190). Other proteins with roles in bacteria-eukaryote interactions were also expressed, including proteins with Hep/Hag domains. The Hep/Hag seven-residue repeat domain occurs in the majority of bacterial hemagglutinins and invasins. The adhesin YadA, which has been shown to be responsible for resistance to phagocytosis (621), was also detected in the metaproteome. Because sponges are filter feeders, the presence of such ELPs and bacteria-eukaryote mediators suggests that sponge symbionts might use these proteins to escape phagocytosis and/or to control their symbiotic relationship with their hosts.

4.3.6 Linking phylotype to function – expression profiling of an uncultured Phyllobacteriaceae-related bacterium

Binning of previous sequencing data for the C. concentrica metagenome (190) together with the current pyrosequencing dataset allowed the reconstruction of partial genomes of five uncultured sponge symbionts, namely the Phyllobacteriaceae-phylotype, the Piscirickettsiaceae-type, the AlphaBetaGamma-proteobacteria-type, the Sphingomonadales-type and the Deltaproteobacteria-type (190) (see Section 4.2). A total of 65 proteins were assigned to the Phyllobacteriaceae genome, 17 to Piscirickettsiaceae, 15 to AlphaBetaGamma-proteobacteria, one to Sphingomonadales and one to Deltaproteobacteria. As the Phyllobacteriaceae genome received the highest number of hits, the dataset for this phylotype was investigated further for its expression profile and genomic features.

Detailed phylogenetic analysis showed that the 16S rRNA gene sequence within the Phyllobacteriaceae partial genome, together with two sequences previously amplified from C. concentrica (AY942778 and AY942764), forms a distinct clade that branches deeply within the family Phyllobacteriaceae (Fig. 4.7). The large majority of Phyllobacteriaceae members are plant-associated and have been well studied with respect to their potential to promote plant growth (622). They also occupy diverse habitats, such as soil (623), water (624) and unicellular organisms (625), suggesting a remarkable capacity to adapt to different environmental niches. The sponge sequences of this study are also related to nitrogen-fixing Mesorhizobium species and denitrifying Nitratireductor species (483), which could imply that the Phyllobacteriaceae-phylotype in C. concentrica is involved in nitrogen metabolism.

Fig. 4.7 Maximum-likelihood tree based on the 16S rRNA gene, showing the Phyllobacteriaceae-phylotype and its phylogenetic relationship to closely related phylotypes. An uncultured thaumarchaeote was used as the outgroup (not shown in the tree). The scale bar indicates 0.1 nucleotide substitutions (10%) per nucleotide position.

The proteome assigned to the Phyllobacteriaceae-phylotype was compared with its partial genome, and over- and underrepresented functional COG categories were found (Fig. 4.8A), similar to the comparison between the overall metaproteome and metagenome (Fig. 4.2). The COG categories of amino acid transport and metabolism, post-translational modification, protein turnover, chaperone functions and signal transduction were also overrepresented in the expressed proteome (Fig. 4.8A). At the individual COG level, specific transport functions were overrepresented (Fig. 4.8B), with both ABC-type and TRAP-type transporters being expressed. Of particular interest was the ABC-type nitrate/sulfonate/bicarbonate transport system TauA, which can import nitrate across the cell membrane. A nitrate reductase gene cluster (NarG, NarH, NarI and NarY) was present in the partial genome of the Phyllobacteriaceae-related phylotype; however, the expressed proteins NarG and NarY in the metaproteome could not be unambiguously assigned to this organism. Nevertheless, the proteomic data suggest that denitrification is actively expressed and most likely takes place in the anaerobic inner part of the sponge (492). Based on this observation, the physical location of the Phyllobacteriaceae-phylotype within the sponge tissue was investigated using FISH. No FISH signal was observed near the outer surface of the sponge; instead, the Phyllobacteriaceae-phylotype was found to reside within the mesohyl, mainly near the choanocyte chambers, where feeding takes place in sponges (Fig. 4.9).

Fig. 4.8 Comparison between the expressed proteome and the partial genome of the Phyllobacteriaceae-phylotype at the level of COG categories (A) and individual COGs (B). Black and grey bars represent over- and underrepresentation of identified proteins, respectively. The x-axis displays the median value, with significance cut-off values of -1 and 1.

Fig. 4.9 Detection of sponge-associated bacteria and the Phyllobacteriaceae-phylotype by FISH. (a) Section of C. concentrica hybridized with a Cy3-labelled specific probe (PHY_Cy3) (green, or yellow where the signal overlapped with the red autofluorescence of the sponge tissue). Arrows indicate the Phyllobacteriaceae-phylotype within the sponge tissue (mesohyl). (b) Section of non-hybridized C. concentrica showing the autofluorescence background (red). (c) Section of C. concentrica hybridized with a Cy3-labelled general bacterial probe (EUB322_CY3) (green), with autofluorescent sponge tissue (red) and larger autofluorescent cells (yellow). Scale bar: 20 µm.
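The Fig. 4.8 caption reports a median value with cut-offs of -1 and 1 but does not define the underlying statistic. Assuming it is the median, across the replicate proteomes, of the log2 ratio between a COG's relative frequency in the proteome and in the partial genome, the over/under classification could be sketched as follows. All names and counts are hypothetical, and this is not necessarily how the figure was computed.

```python
from math import log2
from statistics import median


def log2_ratio(cog, proteome, genome):
    """log2 of a COG's relative frequency in one proteome vs. the genome."""
    p = proteome[cog] / sum(proteome.values())
    g = genome[cog] / sum(genome.values())
    return log2(p / g)


def classify_cogs(replicate_proteomes, genome, cutoff=1.0):
    """Median log2 ratio across replicates, labelled against +/- cutoff.

    Only COGs present in the genome and in every replicate are scored,
    so no pseudocounts are needed.
    """
    shared = set(genome)
    for proteome in replicate_proteomes:
        shared &= set(proteome)
    calls = {}
    for cog in sorted(shared):
        m = median(log2_ratio(cog, p, genome) for p in replicate_proteomes)
        if m >= cutoff:
            label = "over"
        elif m <= -cutoff:
            label = "under"
        else:
            label = "not significant"
        calls[cog] = (round(m, 2), label)
    return calls


# Hypothetical counts for three replicate proteomes and one partial genome.
proteomes = [
    {"COG0459": 8, "COG1653": 4, "COG0587": 4},
    {"COG0459": 6, "COG1653": 6, "COG0587": 4},
    {"COG0459": 10, "COG1653": 2, "COG0587": 4},
]
genome = {"COG0459": 10, "COG1653": 20, "COG0587": 70}
for cog, (value, label) in classify_cogs(proteomes, genome).items():
    print(cog, value, label)
```

Using the median rather than the mean keeps a single unusual replicate (such as the third proteome's low COG1653 count here) from pushing a COG across the cut-off.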

An expressed transposase was also assigned to the Phyllobacteriaceae-phylotype. Its gene was flanked by two genes associated with lipid metabolism (cyclopropane-fatty-acyl-phospholipid synthase and acyl-CoA dehydrogenase). Transposons and transposases were found to be abundant in the metagenome of C. concentrica, and these elements may be an important part of the genomic adaptation of bacteria towards a symbiotic relationship with their host (190). This observation indicates that transposase function is still active and that not all transposase genes are mere remnants of past intra- or intercellular HGT events.

4.4 Conclusion

This analysis has provided new insights into the activities of sponge-associated microbial communities and, for the Phyllobacteriaceae-phylotype, has indicated a clear link between an uncultivated phylotype and its functions. For what appears to be the first time, specific transport functions for typical sponge metabolites (e.g. halogenated aromatics and dipeptides) could be identified. While the co-existence of aerobic and anaerobic phases of the nitrogen cycle has been observed in another sponge system (398), the results presented here assign those functions to specific expressed proteins and phylotypes in C. concentrica. The analysis also indicated that the microbial communities need to respond to variable environmental conditions and hence express an array of stress-protection proteins. Finally, molecular interactions between symbionts and their host might be mediated by a set of expressed ELPs and cell-cell mediators, and some sponge-associated bacteria (e.g. the Phyllobacteriaceae-phylotype) may be undergoing an evolutionary adaptation to the sponge environment, as suggested by active mobile genetic elements.

The results presented here show that a combined metaproteogenomic approach can provide novel information on the activities and physiology of sponge-associated microbial communities. This approach should be useful not only for investigating the enormous microbial diversity found in sponges around the world, but also for investigating the functional behavior of symbiont communities in response to environmental change or host physiology (127). Such observations will be crucial for understanding the dynamic and complex interactions between sponges and their associated microbial symbionts.

Chapter Five: Sponge Holobiont Response to Heat Stress

Chapter Five Phylogenetic and Functional Response of A Sponge Holobiont to Thermal Stress

5.1 Introduction

Large-scale mortality events and increased disease outbreaks are being observed in marine animals globally (204-206). For example, one-third of all coral species are at risk of extinction and 27% of reefs have already been lost worldwide (214). Environmental factors including climate change, anthropogenic pollution, introduced species and nutrient enrichment have been linked to stress and disease in marine organisms (204). Increased sea surface temperature (SST) and ocean acidification are threatening marine pelagic and benthic fauna, particularly reef-building organisms such as corals and sponges, for which bleaching, disease and mortality have already been described (199, 200, 208).

Benthic marine invertebrates such as corals and sponges form highly complex symbioses with microorganisms, which can make it difficult to identify the mechanisms that underlie host health (74, 207). This is highlighted by studies of coral symbiosis, where bleaching can be caused by thermal impairment of zooxanthellae (276, 277) or by temperature-induced invasion of bacterial pathogens encoding virulence-related genes (263, 626-629). In addition, many studies have reported shifts in microbial community composition prior to disease development, thereby questioning the importance or involvement of specific pathogens (235, 630, 631). These observations illustrate the complex interplay within a holobiont and the difficulty of dissecting cause and effect in many stress-related syndromes. Systematic studies that simultaneously assess the composition and function of both the host and its symbiont community under controlled stress conditions should help to clarify these issues.

Sponges are an important component of marine ecosystems and play critical roles in reef consolidation, bioerosion, habitat provision and biogeochemical cycles (reviewed by Bell (119)). Sponges also form stable and often obligate associations with diverse and host-specific communities of microbial symbionts (74, 127, 192) (Chapter 3). The functions of these microbial communities appear to have co-evolved with their host to form a stable holobiont (190) (Chapter 3). Furthermore, large-scale mortality of sponges has occurred during periods of unseasonably high SSTs and after disease outbreaks (reviewed in refs. 208, 215, 216). These die-offs affect not only the survival of local sponge populations but also the fate of the many associated marine invertebrates, thus potentially impacting the ecology of entire reef ecosystems (208). Aquarium-based experiments using the Great Barrier Reef (GBR) sponge R. odorabile have demonstrated a direct link between temperature stress and sponge health: the resulting necrosis of the sponge (632) was accompanied by clear changes in the composition of its microbial symbiont community, as assessed by 16S rRNA gene-based analysis (229).

In the present study, R. odorabile and its symbiont community are used as an experimental model to investigate holobiont function under controlled temperature stress. Community fingerprinting, metagenomics and metaproteomics were used to characterize the structure and function of the symbiont community, and host gene expression analysis was used to define the host state. It is shown that temperature stress causes a decline in the health of the sponge holobiont, characterized by the disruption of symbiotic interactions and a heat stress response in both the host and the symbionts. This disruption includes changes in metabolic interactions and other essential symbiont functions, which occur prior to dramatic changes in community structure. The disturbed holobiont then offers niches that are rapidly occupied by a new set of bacteria, which lack the capacity for symbiotic interactions and instead opportunistically scavenge on the decaying host.

5.2 Materials and Methods

5.2.1 Sampling and experimental setup

Three individual R. odorabile sponges were collected by SCUBA diving from Davies Reef on the Great Barrier Reef, Australia (18.82°S, 147.65°E) in July 2009. The donor sponges were cut into 48 clones, each approximately 15 cm³ in size and weighing approximately 30 g. The clones were secured in plastic racks and left on the reef to heal for 16 weeks. They were then transported to indoor aquaria at the Australian Institute of Marine Science, Townsville, Australia. The aquarium water temperature was set to 27 ± 0.5°C to match the ambient water temperature at Davies Reef, and the clones were acclimatized for seven days. They were then randomly distributed among six 60-L tanks, each holding eight clones. Tanks were supplied with 5-µm-filtered seawater at a flow rate of 600 mL per minute and illuminated at 80 µmol quanta m⁻² s⁻¹ using 3-ft compact fluorescent tubes attached to overhead lights on a 12:12 hour light:dark cycle. Three additional R. odorabile individuals (wild-type) were collected from the same GBR site and processed directly for microbial cell enrichment.

The thermal stress experiment commenced at 8 a.m. on November 18, 2009 (day 0). The six tanks were randomly assigned to two temperature treatments (T27 and T32), with three replicate tanks (a, b and c) per treatment. Tanks in the T27 treatment were maintained at 27°C throughout the experiment, whereas seawater in the T32 tanks was gradually heated (0.2°C per hour) to the final treatment temperature of 32°C and then held at this temperature until the end of the experiment. Sampling was conducted at the time points listed in Table 5.1. At each time point, one clone was randomly selected from each tank and immediately processed for microbial cell enrichment.
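The heating ramp in the T32 treatment is simple arithmetic: a rise from 27°C to 32°C at 0.2°C per hour takes (32 - 27) / 0.2 = 25 hours. Assuming heating began at the 8 a.m. day-0 start (the text does not state this explicitly), the target temperature would have been reached at about 9 a.m. on November 19, 2009. A minimal Python sketch of the calculation follows; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def ramp_completion(start, t_initial, t_final, rate_per_hour):
    """Hours needed for a linear temperature ramp and the time it ends."""
    if rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    hours = (t_final - t_initial) / rate_per_hour
    return hours, start + timedelta(hours=hours)


# Assumes heating of the T32 tanks began at the 8 a.m. day-0 start.
hours, reached = ramp_completion(datetime(2009, 11, 18, 8, 0), 27.0, 32.0, 0.2)
print(hours, reached)
```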

Table 5.1 Sample collections in the temperature-shifting experiment.

Time point              27°C                                              32°C
Day 0 (Nov 18, 2009)    Day0-a (H), Day0-b (H), Day0-c (H)
Day 1 (Nov 20, 2009)    T27-Day1-a (H), T27-Day1-b (H), T27-Day1-c (H)    T32-Day1-a (H), T32-Day1-b (H), T32-Day1-c (H)
Day 3 (Nov 22, 2009)    T27-Day3-a (H), T27-Day3-b (H), T27-Day3-c (H)    T32-Day3-a (H), T32-Day3-b (H), T32-Day3-c (N)
Day 4 (Nov 23, 2009)    T27-Day4-a (H), T27-Day4-b (H), T27-Day4-c (H)    T32-Day4-a (N), T32-Day4-b (I), T32-Day4-c (I)

* H, healthy group; I, intermediate group; N, necrotic group.


Table 5.2 GeXP analysis of host gene expression.

Function / gene                     H                I                N                P (H vs. I)   P (H vs. N)   P (I vs. N)
Cytoskeleton/skeleton rearrangement
  α-tubulin                         0.199 ± 0.058    0.013 ± 0.008    0.012 ± 0.005    0.014†        0.015†        0.435
  β-tubulin                         0.536 ± 0.359    0.213 ± 0.015    0.177 ± 0.133    0.130         0.112         0.383
  Actin                             3.475 ± 1.458    1.904 ± 0.053    3.502 ± 2.066    0.101         0.494         0.236
  Actin-related protein 2/3         0.968 ± 0.017    0.898 ± 0.035    0.663 ± 0.166    0.088         0.116         0.142
  Gelsolin                          0.950 ± 0.101    0.546 ± 0.033    0.923 ± 1.021    0.006†        0.488         0.347
  Profilin                          0.818 ± 0.048    0.824 ± 0.067    0.907 ± 0.694    0.465         0.443         0.446
  Prolidase                         0.146 ± 0.150    0.013 ± 0.003    0.055 ± 0.042    0.133         0.206         0.197
  β-thymosin                        0.339 ± 0.019    0.175 ± 0.022    0.277 ± 0.029    0.006†        0.075         0.033†
  Villin                            0.540 ± 0.100    0.676 ± 0.102    0.783 ± 0.562    0.133         0.325         0.416
  Radial Spoke Protein              0.864 ± 0.130    0.229 ± 0.106    0.253 ± 0.111    0.007†        0.008†        0.424
Signal transduction
  Calmodulin                        0.469 ± 0.065    0.126 ± 0.039    0.159 ± 0.092    0.003†        0.036†        0.353
  YWHAQ                             0.989 ± 0.081    0.884 ± 0.080    0.805 ± 0.473    0.137         0.339         0.426
Chaperone
  Cyclophilin                       2.097 ± 0.090    1.328 ± 0.004    1.645 ± 0.228    0.002†        0.096         0.150
  Hsp70                             0.098 ± 0.029    0.361 ± 0.077    0.588 ± 0.534    0.053         0.209         0.328
  Hsp90                             2.139 ± 1.038    2.135 ± 0.184    2.041 ± 1.163    0.498         0.466         0.464
Protein synthesis/degradation
  Apoptosis-linked gene 2l          0.019 ± 0.008    0.139 ± 0.025    0.140 ± 0.042    0.038†        0.074         0.496
  Cyclophilin NIMA-interacting 4    0.040 ± 0.005    0.020 ± 0.004    0.040 ± 0.026    0.008†        0.493         0.240
  Elongation factor Tu              0.249 ± 0.027    0.118 ± 0.025    0.025 ± 0.013    0.010†        0.001†        0.037†
  Polyubiquitin                     0.691 ± 0.071    0.547 ± 0.039    0.949 ± 0.802    0.031†        0.364         0.304
  Ribosomal Protein S9              0.121 ± 0.019    0.043 ± 0.002    0.060 ± 0.005    0.009†        0.012†        0.046†
  Ubiquitin Conjugating Enzyme      0.235 ± 0.042    0.086 ± 0.004    0.392 ± 0.475    0.012†        0.361         0.265
Oxidative stress
  Ferritin                          1.594 ± 0.323    1.102 ± 0.009    1.065 ± 0.160    0.059         0.048†        0.398
Detoxification
  Glutathione-S-Transferase         0.580 ± 0.022    0.041 ± 0.012    0.029 ± 0.022    < 0.001†      < 0.001†      0.297

* Values are mean expression ± standard deviation. P-values are from pair-wise t-tests between the healthy (H), intermediate (I) and necrotic (N) groups; P-values < 0.05 are marked with †. A kanamycin resistance (KanR) transcript was used as the internal control.

Chapter Five: Sponge Holobiont Response to Heat Stress

5.2.2 mRT-qPCR analysis

To investigate the expression profiles of 23 selected host genes in sponges exposed to thermal stress, multiplexed reverse transcription quantitative polymerase chain reaction (mRT-qPCR) was performed using the GenomeLab™ GeXP Genetic Analysis System (Beckman Coulter, Fullerton, CA, USA). Genes involved in the cell stress response and in processes related to cellular homeostasis were included in the assay (Table 5.2). The kanamycin resistance gene (KanR) transcript was used as the internal control. Gene sequences and detailed protocols for multiplex design, mRNA extraction, cDNA preparation, PCR, electropherogram analysis and normalization are described by Webster et al. (In Review). The coefficient of variation among technical replicate reactions was at most 20%, consistent with values reported in other studies (633, 634). Profilin and YWHAQ were identified as the optimal pair of reference genes, and their geometric mean was used as the normalization factor.
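The normalization step described above can be illustrated with a minimal Python sketch. It assumes that expression values have already been scaled to the internal control, and it divides each target gene by the geometric mean of the two reference genes (profilin and YWHAQ); it also shows how a coefficient of variation can be computed for technical replicates. The function names and example values are hypothetical and are not taken from the GeXP software or from Webster et al.

```python
from math import prod
from statistics import mean, stdev


def coefficient_of_variation(replicates):
    """CV (%) of technical replicate measurements: sample SD / mean * 100."""
    return 100.0 * stdev(replicates) / mean(replicates)


def normalization_factor(reference_levels):
    """Geometric mean of the reference-gene levels (e.g. profilin, YWHAQ) in one sample."""
    return prod(reference_levels) ** (1.0 / len(reference_levels))


def normalize_sample(target_levels, reference_levels):
    """Divide every target-gene level by the sample's normalization factor."""
    nf = normalization_factor(reference_levels)
    return {gene: level / nf for gene, level in target_levels.items()}


# Hypothetical sample: profilin = 0.8, YWHAQ = 1.0 (values are illustrative only)
relative = normalize_sample({"calmodulin": 0.45, "GST": 0.58}, [0.8, 1.0])
```

Under the 20% criterion described above, a replicate set would be accepted when `coefficient_of_variation(...) <= 20.0`. The geometric mean is used rather than the arithmetic mean so that a single highly expressed reference gene does not dominate the factor.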

5.2.3 T-RFLP analysis

Microbial cell enrichment, DNA extraction and preparation were performed as described by Thomas et al. (190). Bacterial 16S rRNA genes were amplified using the primers 27F (5'-AGRGTTTGATCMTGGCTCAG-3') and 1492R (5'-TACGGYTACCTTGTTAYGACTT-3'). The 27F primer was fluorescently labeled at the 5' end with 6-carboxyfluorescein (6-FAM). Each 25 µL PCR reaction contained 11 µL EconoTaq PLUS 2X Master Mix (Lucigen Corporation, Middleton, WI, USA), 400 nM of each primer (Sigma-Aldrich, Castle Hill, NSW, Australia), 1 µL of 0.2 ng/µL template DNA and 11 µL Milli-Q water. Amplification consisted of an initial denaturation at 94°C (3 minutes), followed by 30 cycles of denaturation at 94°C (30 seconds), annealing at 54°C (1 minute) and extension at 72°C (2 minutes), and a final extension at 72°C for 10 minutes. PCR products were checked on a 1% agarose gel to confirm the expected size (approx. 1,450 bp) and purified using a DNA Clean & Concentrator™-5 kit (Zymo Research Corporation, Irvine, CA). PCR products were then digested with the restriction enzyme RsaI (GT^AC) in 20 µL digestion mixtures containing 0.25 U/µL RsaI (New England Biolabs, Hitchin, Hertfordshire, UK), 2 µL 10X NEB buffer 4, 100 ng DNA product and Milli-Q water. Digests were incubated at 37°C for 4 hours, followed by enzyme inactivation at 60°C for 20 minutes. Restriction digests were cleaned using the DNA Clean & Concentrator™-5 kit (Zymo Research Corporation, Irvine, CA) and then submitted for fragment analysis to the Ramaciotti Centre at the University of New South Wales (Sydney, Australia).
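Because only the 6-FAM-labeled 5' end of each amplicon is detected, the expected terminal restriction fragment (T-RF) of a given 16S rRNA gene can be predicted from its sequence. The Python sketch below illustrates this for RsaI, which cuts GT^AC; it assumes an amplicon string that begins with the 27F primer, and the function name and test sequence are hypothetical. Observed T-RF sizes may differ slightly from such in silico predictions.

```python
RSAI_SITE = "GTAC"    # RsaI recognition sequence (palindromic, so one strand suffices)
RSAI_CUT_OFFSET = 2   # RsaI cuts between T and A: GT^AC


def terminal_fragment_length(amplicon):
    """Predicted length (nt) of the 5'-labeled terminal fragment after RsaI digestion.

    The first site from the labeled 5' end defines the T-RF; if no site is
    present, the whole (uncut) labeled amplicon is detected.
    """
    seq = amplicon.upper()
    pos = seq.find(RSAI_SITE)
    if pos == -1:
        return len(seq)
    return pos + RSAI_CUT_OFFSET


# Hypothetical amplicon starting with one 27F primer variant (AGAGTTTGATCATGGCTCAG)
example = "AGAGTTTGATCATGGCTCAG" + "GTACAAAC"
print(terminal_fragment_length(example))  # 22
```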

Raw data files containing peak information were tabulated in Peak Scanner™ Software v1.0 (Life Technologies Corporation, Carlsbad, CA) using the size standard 'GS1200LIZ'. T-RFLP data were processed and analyzed with the online tool T-REX (635). The data were subjected to the following quality-control procedures: T-RF alignment (clustering threshold = 0.5) and noise filtering (based on peak area, with a standard deviation multiplier of 2). A presence/absence data matrix was then generated for analysis in Primer 6 (PRIMER-E Ltd, Lutton, UK).
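The noise-filtering step can be sketched as an iterative standard-deviation filter. Peaks whose area exceeds a multiple (here 2) of the standard deviation of the remaining peak areas, computed about zero, are accepted as true T-RFs, and the calculation is repeated on the remaining peaks until no further peak passes. This is a simplified illustration written under that assumption; the T-REX implementation may differ in detail, and the function names are hypothetical. The retained peaks are then converted to a presence/absence profile of the kind used for the Primer 6 analysis.

```python
from math import sqrt


def true_peaks(areas, multiplier=2.0):
    """Indices of peaks accepted as signal by an iterative SD-based filter."""
    remaining = list(range(len(areas)))
    accepted = []
    while remaining:
        # SD of the remaining areas about zero (noise modeled as centered on 0)
        sd = sqrt(sum(areas[i] ** 2 for i in remaining) / len(remaining))
        passed = [i for i in remaining if areas[i] > multiplier * sd]
        if not passed:
            break
        accepted.extend(passed)
        remaining = [i for i in remaining if i not in passed]
    return sorted(accepted)


def presence_absence(areas, multiplier=2.0):
    """Binary profile over aligned T-RF bins: 1 if the peak passed the filter."""
    keep = set(true_peaks(areas, multiplier))
    return [1 if i in keep else 0 for i in range(len(areas))]


# Hypothetical peak areas for six aligned T-RF bins in one sample
print(presence_absence([1000, 10, 12, 8, 11, 9]))  # [1, 0, 0, 0, 0, 0]
```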

5.2.4 Metagenomic sequencing

Metagenomic shotgun sequencing was performed on the Roche 454 Titanium platform (J. Craig Venter Institute, Rockville, USA). 16S rRNA gene reconstruction from the metagenomic shotgun data and phylogenetic analysis were performed as described in Chapter 3. The shotgun sequencing data are available through the CAMERA website (http://camera.calit2.net/) under the project accession ‘CAM_PROJ_BotanyBay’. Metagenomic read processing, assembly and functional annotation were performed as described in Chapter 4.

5.2.5 Metaproteomic analysis

Protein extraction and preparation, metaproteomic analysis by tandem MS, MS/MS data analysis and protein identification were performed as described in Chapter 4. MS spectra for the metaproteomic analysis are available through the PRIDE database (www.ebi.ac.uk/pride; accession numbers: 20986-20998). Functional abundance matrices were generated by counting the number of genes (for metagenomic data) or proteins (for metaproteomic data) in each sample that had been assigned to a particular function (e.g. a COG, Pfam or Subsystem annotation). Each count was weighted by the coverage of the gene (calculated during assembly) or by the number of peptides identified for the protein.
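As an illustration of how such a coverage- or peptide-weighted abundance table can be assembled, the Python sketch below sums, for each function, the weights of all genes (or proteins) assigned to it. It assumes one annotation per gene and uses hypothetical identifiers; it is not the pipeline used in this thesis.

```python
from collections import defaultdict


def functional_abundance(annotations, weights):
    """Weighted abundance of each function in one sample.

    annotations: gene/protein id -> function id (e.g. a COG, Pfam or Subsystem label)
    weights: gene/protein id -> contig coverage (metagenome) or peptide count (metaproteome)
    """
    table = defaultdict(float)
    for gene_id, function in annotations.items():
        table[function] += weights[gene_id]
    return dict(table)


def abundance_matrix(samples):
    """Build {sample: {function: weighted count}} from {sample: (annotations, weights)}."""
    return {name: functional_abundance(ann, w) for name, (ann, w) in samples.items()}


# Hypothetical sample with three annotated genes
ann = {"g1": "COG0012", "g2": "COG0012", "g3": "COG2200"}
cov = {"g1": 3.0, "g2": 2.0, "g3": 5.0}
print(functional_abundance(ann, cov))  # {'COG0012': 5.0, 'COG2200': 5.0}
```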

5.2.6 Normalization of metagenomic functional abundance by SCGs

Because average genome size can differ substantially between metagenomic samples and thereby bias comparisons of functional profiles (444), several strategies to estimate average genome size (or genome copy number) in metagenomic datasets have been proposed (445-447). These approaches usually calculate the average coverage of conserved single-copy genes (SCGs) and use it for normalization. A similar approach was used here: 38 COGs (COG0012, COG0016, COG0048, COG0049, COG0052, COG0080, COG0081, COG0087, COG0088, COG0090, COG0091, COG0092, COG0093, COG0094, COG0096, COG0097, COG0098, COG0099, COG0100, COG0102, COG0103, COG0124, COG0172, COG0184, COG0185, COG0186, COG0197, COG0200, COG0201, COG0202, COG0215, COG0256, COG0495, COG0522, COG0525, COG0533, COG0541 and COG0552) were selected from the 40 universal SCGs for normalization (448). These 38 COG entries were consistently abundant across all metagenomic samples, and the functional matrices of COG, Pfam and Subsystem annotation counts were therefore normalized by the average abundance of the 38 COG entries in each sample.
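The normalization can be written as a short Python sketch: the mean abundance of the 38 marker COGs is taken as the number of genome equivalents in a sample, and every functional count is divided by it to give copies per genome. The function names are hypothetical, and treating a missing marker COG as zero is an assumption of this sketch.

```python
from statistics import mean

# The 38 single-copy marker COGs listed in the text
SCG_COGS = (
    "COG0012", "COG0016", "COG0048", "COG0049", "COG0052", "COG0080",
    "COG0081", "COG0087", "COG0088", "COG0090", "COG0091", "COG0092",
    "COG0093", "COG0094", "COG0096", "COG0097", "COG0098", "COG0099",
    "COG0100", "COG0102", "COG0103", "COG0124", "COG0172", "COG0184",
    "COG0185", "COG0186", "COG0197", "COG0200", "COG0201", "COG0202",
    "COG0215", "COG0256", "COG0495", "COG0522", "COG0525", "COG0533",
    "COG0541", "COG0552",
)


def genome_equivalents(abundance):
    """Mean weighted abundance of the marker COGs (missing markers count as 0)."""
    return mean(abundance.get(cog, 0.0) for cog in SCG_COGS)


def copies_per_genome(abundance):
    """Divide every functional count in one sample by its genome equivalents."""
    ge = genome_equivalents(abundance)
    if ge == 0:
        raise ValueError("no marker COGs detected in this sample")
    return {function: count / ge for function, count in abundance.items()}


# Hypothetical sample: every marker COG at 10.0, an EAL-domain COG at 25.0
sample = {cog: 10.0 for cog in SCG_COGS}
sample["COG2200"] = 25.0
print(copies_per_genome(sample)["COG2200"])  # 2.5
```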

5.2.7 Statistical analyses

Pairwise statistical comparisons between the healthy/intermediate, intermediate/necrotic and healthy/necrotic samples were conducted using a modified R script for MetaStats (636). The MetaStats script handles two matrices: the original input matrix (Ctab) and the generated matrix (Ptab). MetaStats uses the Ptab to run a t-test and uses the Ctab to handle sparse counts. In the modified script, the non-normalized sample matrix was used as the Ctab and the normalized/standardized matrix as the Ptab. To ensure that only biologically meaningful functional differences were considered, a difference was considered significant only if all of the following criteria were met: 1) the P-value was less than 0.05; 2) the count of the function in one group was more than three times that in the other group for metagenomic data, or more than two times for metaproteomic data; and 3) in the group with the higher abundance, the normalized count of the function was greater than one copy per genome for metagenomic data, or greater than 0.5% abundance for metaproteomic data.
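The three criteria can be expressed as a small Python filter applied to each function after the t-test. The sketch assumes that the group means passed in are already normalized (copies per genome for metagenomes, percentage of peptides for metaproteomes) and that the fold difference is computed on those same means; it is a hypothetical helper, not part of MetaStats.

```python
# (minimum fold difference, minimum abundance in the higher group)
THRESHOLDS = {
    "metagenome": (3.0, 1.0),    # > 3-fold, > 1 copy per genome
    "metaproteome": (2.0, 0.5),  # > 2-fold, > 0.5% of peptides
}


def is_significant(p_value, mean_a, mean_b, data_type):
    """Return True only if a function passes all three criteria."""
    min_fold, min_abundance = THRESHOLDS[data_type]
    high, low = max(mean_a, mean_b), min(mean_a, mean_b)
    if p_value >= 0.05:                       # criterion 1: P < 0.05
        return False
    if low > 0 and high / low <= min_fold:    # criterion 2 (absent in one group passes)
        return False
    return high > min_abundance               # criterion 3: abundance floor


print(is_significant(0.01, 2.0, 0.5, "metagenome"))  # True
```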

Significantly different functions annotated by COG, Pfam, and Subsystem, respectively, were used for sample clustering using Primer 6 (PRIMER-E Ltd, Lutton, UK). Heatmaps were generated using Cluster 3.0 (449) and Java TreeView (450). MDS plots of the GeXP, T-RFLP, metagenomic and metaproteomic results were generated with Primer 6.

5.3 Results and Discussion

5.3.1 Elevated temperature results in sponge necrosis and changes in stress-related gene expression

The experimental design employed 21 R. odorabile clones that were maintained in aquaria at 27°C and 32°C (Table 5.1). All sponge clones from the 27°C treatment remained healthy throughout the experiment, whereas all clones from the 32°C treatment suffered substantial tissue necrosis (discoloration and exposure of skeletal fibers) after 3 to 4 days. Samples collected on day 4 were photographed (Fig. 5.1). While T32-Day4-a had lost most of its pinacoderm and appeared to be dead, the two other samples (T32-Day4-b and T32-Day4-c) still retained a visible amount of healthy tissue (Fig. 5.1). These observations are consistent with previous reports in which R. odorabile began to eject small amounts of cellular material and exhibited surface necrosis after 24 hours at 33°C, followed by major tissue necrosis with protrusion of skeletal fibers after three days (229). Based on these morphological observations, three sponge health states were defined: fully necrotic (T32-Day3-c and T32-Day4-a), intermediate/partially necrotic (T32-Day4-b and T32-Day4-c), and healthy (all other clones) (Table 5.1).


Fig. 5.1 Morphological changes observed in R. odorabile clones sampled on day 4. (A) T27-Day4-a. (B) T27-Day4-b. (C) T27-Day4-c. (D) T32-Day4-a. (E) T32-Day4-b. (F) T32-Day4-c. Samples T27-Day4-a, b and c represent healthy sponges; samples T32-Day4-b and c represent intermediately necrotic sponges; sample T32-Day4-a represents a necrotic sponge. Arrows indicate typical sites of tissue necrosis.

To further define the physiological state of sponges in these three groups, the expression profiles of 23 selected host genes were investigated using the GeXP Genetic Analysis System (three samples from day 4 at 27°C were used as representatives of the healthy group; see Fig. 5.2). Ordination of the overall expression profiles showed a clear distinction between the 32°C samples and the healthy sponge clones at 27°C (Fig. 5.2). The high-temperature samples were characterized by decreased expression of genes involved in cytoskeletal/skeletal structures (α- and β-tubulin, B-thymosin, radial spoke protein) and protein synthesis/degradation (cyclophilin NIMA-interacting 4, elongation factor Tu, ribosomal protein S9), which is consistent with the observed morphological changes (Table 5.2 and Fig. 5.1). Genes involved in signal transduction (calmodulin) and detoxification (glutathione-S-transferase) also showed lower expression, indicating substantial changes in gene regulation and sponge metabolism. Expression of the apoptosis-linked gene increased in sponges at 32°C compared with healthy sponges, indicating a direct molecular response to heat stress (Table 5.2). These results are consistent with recent studies of thermal stress responses in R. odorabile based on qPCR (632) and mRT-qPCR (Webster et al., In Review). In these two studies, the immediate cellular response to thermal stress also included increased production of Hsp transcripts, which, owing to the energetic cost of their expression, might compromise normal cellular functions (637).

Fig. 5.2 Sample clustering based on host gene expression (GeXP). Samples are ordinated using MDS.

Together, these experiments showed reproducible changes in morphology and gene expression in R. odorabile in response to elevated temperature. Sponges exposed to 32°C (i.e. the intermediate and necrotic groups) showed a clear molecular response related to heat stress and ultimately experienced cellular apoptosis and tissue-wide necrosis.


5.3.2 Changes in structure of the microbial community

To investigate whether the stress response of R. odorabile occurred in conjunction with a shift in the microbial population, T-RFLP analysis of the 16S rRNA genes was carried out on all 21 clones collected during the temperature exposure (Table 5.1). Sample clustering based on T-RFLP profiles revealed three groups (Fig. 5.3). The first group contained all clones from the 27°C treatment, three clones from the 32°C treatment after one day, and two clones after three days. Bacterial communities of these clones shared 60% or higher similarity, with the only exception being sample T32-Day3-a (just below 60%). This group corresponded to ‘healthy’ sponges with no visible signs of tissue degradation (Table 5.1 and Fig. 5.1). The second group consisted of two clones from the 32°C treatment after four days (T32-Day4-b and T32-Day4-c). These two clones were more than 70% similar to each other but only 50% similar to the healthy group. This group corresponded to the ‘intermediate’ sponges that still retained part of their tissue. The last T-RFLP group consisted of one clone from the 32°C treatment after three days (T32-Day3-c) and one after four days (T32-Day4-a). They shared approximately 55% similarity with each other, but only 30% similarity with the other two groups. These two samples formed the ‘necrotic’ group. A PERMANOVA test based on these samples confirmed that both time and temperature were significant factors in the microbial community shift in R. odorabile (P = 0.017 and 0.01, respectively). However, their interaction was not significant (P = 0.119), suggesting that they may act independently.
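The PERMANOVA pseudo-F statistic can be illustrated with a minimal one-way Python sketch operating on a dissimilarity matrix (for example, Bray-Curtis dissimilarity expressed as a fraction). It is a simplification: the analysis above tested two factors (time and temperature) and their interaction, whereas this sketch tests a single grouping factor by permuting group labels. The function names and the toy matrix are hypothetical.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, permutations=999, seed=1):
    """Return (pseudo-F, permutation P-value) obtained by shuffling the group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: two groups of three samples, close within and far between groups
labels = ["27C"] * 3 + ["32C"] * 3
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9) for j in range(6)]
        for i in range(6)]
f_stat, p = permanova(dist, labels)
```

With only six samples there are 20 distinct ways to split them into two groups of three, so the smallest attainable P-value in this toy case is about 0.1 regardless of the number of permutations; permutation tests therefore depend on adequate replication.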

16S rRNA genes were then reconstructed from the shotgun metagenomic data of seven representatives of the three groups and of three wild-type samples (Fig. 5.4 and Materials and Methods). Archaeal and bacterial community profiles reconstructed from the metagenomic datasets yielded 131 OTUs of 16S rRNA genes at a phylogenetic distance cut-off of 0.03 (Materials and Methods). Comparison of healthy aquarium-maintained samples with R. odorabile samples taken in the field showed high similarity between the microbial communities in the laboratory and in the natural environment (Fig. 5.4). This is consistent with previous observations on the short-term maintenance of this sponge in aquaria (638). Only minor differences in OTU abundance (e.g. Defluviicoccus, Nitrosopumilus) were observed, indicating that the experimental sponges reflect a typical symbiont system in the field.
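For illustration, the idea of a 0.03 distance cut-off can be shown with a simple greedy clustering of aligned sequences in Python. This is not necessarily the algorithm used in Chapter 3 (hierarchical methods such as average-neighbor clustering are common for 16S rRNA OTUs); the function names, the uncorrected p-distance and the gap handling are assumptions of this sketch.

```python
def p_distance(a, b):
    """Fraction of differing aligned positions, ignoring columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def greedy_otus(sequences, cutoff=0.03):
    """Assign each (name, aligned_seq) to the first OTU whose seed is within `cutoff`.

    Sequences are processed in the given order, so they would normally be
    sorted by decreasing abundance first.
    """
    seeds, otus = [], []
    for name, seq in sequences:
        for k, seed in enumerate(seeds):
            if p_distance(seq, seed) <= cutoff:
                otus[k].append(name)
                break
        else:
            seeds.append(seq)
            otus.append([name])
    return otus


# Hypothetical 100-column alignment: s2 differs from s1 at 2 sites, s3 at 10 sites
s1 = "A" * 100
s2 = "A" * 98 + "CC"
s3 = "C" * 10 + "A" * 90
print(greedy_otus([("s1", s1), ("s2", s2), ("s3", s3)]))  # [['s1', 's2'], ['s3']]
```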

Fig. 5.3 Clustering of bacterial communities based on T-RFLP profiles using Bray-Curtis similarity. Samples are clustered using the group-average algorithm. Blue, healthy group; yellow, intermediate group; red, necrotic group.
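Bray-Curtis similarity and group-average (UPGMA) clustering can be sketched in a few lines of Python. The sketch works on abundance or presence/absence vectors (for 0/1 data, Bray-Curtis similarity equals the Sørensen coefficient); the function names and profiles are hypothetical, and Primer 6 may differ in tie-breaking and output format.

```python
from itertools import combinations


def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity (%) between two non-negative profiles."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return 100.0 if den == 0 else 100.0 * (1.0 - num / den)


def group_average_clustering(profiles):
    """Agglomerative clustering with group-average (UPGMA) linkage.

    profiles: {sample_name: vector}. Returns merges as (cluster_a, cluster_b, similarity).
    """
    clusters = {name: [name] for name in profiles}

    def linkage(c1, c2):
        # mean of all pairwise similarities between members of the two clusters
        sims = [bray_curtis_similarity(profiles[a], profiles[b])
                for a in clusters[c1] for b in clusters[c2]]
        return sum(sims) / len(sims)

    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(sorted(clusters), 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters[f"({c1},{c2})"] = clusters.pop(c1) + clusters.pop(c2)
    return merges


# Hypothetical presence/absence profiles over four T-RF bins
profiles = {"H1": [1, 1, 0, 0], "H2": [1, 1, 1, 0], "N1": [0, 0, 1, 1]}
merges = group_average_clustering(profiles)
```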

The clustering of samples based on OTU abundance was consistent with the grouping into healthy, intermediate and necrotic sponges and was supported by high jackknife confidence values (Fig. 5.4). The same grouping was observed in the T-RFLP analysis (Fig. 5.3). The microbial community composition of the intermediate samples overlapped largely with that of the healthy samples, but also showed some similarity to the microbial communities of the necrotic samples. The abundance of certain sponge-associated phylotypes (e.g. BD2-11, Gp9 (BPC015), Gp10 (TK85) and Anaerolineaceae) decreased in sponges at the intermediate health state, while other phylotypes (e.g. some Proteobacteria) increased in abundance or were newly detected (Fig. 5.4). These novel and low-abundance phylotypes in the intermediate samples eventually dominated the communities of necrotic sponges and belonged to the taxa Vibrionaceae, Pseudoalteromonas, Colwelliaceae, Ferrimonas, Oceanospirillaceae-2, Endozoicomonas, the BD107 clade, Arcobacter, Marinifilum and Fusibacter. These taxonomic shifts made the microbial communities of necrotic sponges clearly distinct from those of healthy and intermediate samples. Changes in community structure were also reflected in community diversity (Fig. 5.5). In general, the intermediate samples had the highest community diversity, while the wild-type and healthy samples had moderate diversity. One of the necrotic clones (T32-Day4-a) lost most of the bacteria found in healthy sponges (Figs. 5.4 and 5.5) and had the lowest diversity.

Fig. 5.4 Shift in the R. odorabile bacterial community revealed by 16S rRNA gene reconstruction from metagenomic datasets. The relative abundance of the 50 most abundant OTUs (ranked by the sum of relative abundance across all samples) is shown. The phylogenetic distance cut-off for OTU generation is 0.03. Dot size reflects the relative abundance of an OTU in a sample. A maximum-likelihood tree of the OTUs is shown on the left, with bootstrap values greater than 50% given (1,000 replicates). The tree is rooted with the archaeon Nitrosopumilus. Samples are clustered based on the phylogenetic relationships of their OTUs (the top 50 OTUs and the low-abundance OTUs) using the weighted UniFrac algorithm; jackknife support values (in percentages, 1,000 replicates) are shown at nodes. ‘Low abundant OTUs’ are those not in the top 50, while ‘16S rRNA sequences not in OTUs’ are reads that failed to assemble into the contigs used for OTU generation. Black, wild-type group; blue, healthy group; yellow, intermediate group; red, necrotic group. γ-proteo, Gammaproteobacteria; α-proteo, Alphaproteobacteria.

Overall, exposure to elevated SST increased the microbial diversity within R. odorabile in the intermediate health state, as newly colonizing or low-abundance bacteria were able to grow alongside the existing symbionts, which were still present but at lower abundance. Diversity then decreased in the necrotic state as sponge symbionts were lost or outcompeted by other bacteria that benefited from the necrotic sponge environment (see Section 5.3.5).


Fig. 5.5 Rarefaction plot showing the dynamics of community diversity in different samples. Black, wild-type group; blue, healthy group; yellow, intermediate group; red, necrotic group. Plotted values are averages of 1,000 jackknife subsampling replicates.
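Rarefaction curves such as those in Fig. 5.5 are based on repeated random subsampling without replacement. A minimal Python sketch is shown below, assuming reads have already been assigned to OTUs; the function name and replicate count are illustrative, and the software actually used for Fig. 5.5 may compute the curves differently.

```python
import random


def rarefaction_curve(otu_labels, depths, replicates=1000, seed=0):
    """Mean number of OTUs observed at each subsampling depth.

    otu_labels: one OTU label per read; each depth must not exceed len(otu_labels).
    """
    rng = random.Random(seed)
    curve = []
    for depth in depths:
        observed = [len(set(rng.sample(otu_labels, depth))) for _ in range(replicates)]
        curve.append(sum(observed) / replicates)
    return curve


# Hypothetical sample: 10 reads from two equally abundant OTUs
reads = ["OTU1"] * 5 + ["OTU2"] * 5
print(rarefaction_curve(reads, [1, 10], replicates=200))  # [1.0, 2.0]
```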

5.3.3 Symbiotic functions are lost during temperature-induced shift in community composition

The observations above show that microbial phylotypes typically associated with R. odorabile were lost during temperature stress and subsequently replaced by a new set of microorganisms. Work on zooxanthellae in corals (639, 640) and on microbial communities in soil (641) has shown that perturbation can simply result in the replacement of resident microorganisms by functionally equivalent species that are resistant or adapted to the stress. In other cases, stress is thought to allow the introduction of pathogenic strains into the microbial communities of marine invertebrates (207, 208). To investigate whether functional changes accompany the taxonomic shift observed for R. odorabile, and to characterize the newly introduced taxa, functional annotation and comparison of the metagenomic samples were conducted (Table 5.3).

MDS ordination showed that the functional profiles of wild-type and healthy sponges from the 27°C treatment were highly similar (Fig. 5.6AB). They differed only in the abundance of WD40 repeat proteins (COG2319, PF00400) and a transposase (COG0675, PF01385) (Fig. 5.7). This again highlights that the aquarium-maintained R. odorabile are representative of wild-type sponges. Microbial communities of these native and healthy samples are rich in functions related to a symbiotic lifestyle, which are characteristic of sponge symbionts (see Chapter 3 and ref. (190)). Among these are mobile genetic elements (plasmids and transposases), R-M systems and CRISPRs, which might facilitate genetic exchange in the symbiont community, and ELPs, which could potentially be used by symbionts to manipulate their host (see Chapter 3 and ref. (190)). Other typical symbiont functions are represented by genes involved in membrane transport, substrate utilization, cell signaling, regulation, stress response mechanisms and cell-cell adhesion. All of these specific functions were still present in sponges at the intermediate health state, resulting in tight clustering of these samples with the healthy group (Fig. 5.7). In contrast, functions related to symbiosis were dramatically reduced in the communities from necrotic sponges (Fig. 5.7A), indicating that these communities are no longer composed of ‘typical’ symbionts. A few functions increased in abundance; in particular, genes encoding EAL domain proteins (COG2200) were enriched in necrotic sponges. The EAL domain degrades cyclic diguanylate (c-di-GMP), an important intracellular signaling molecule that regulates the switch between motile and surface-associated lifestyles, as well as virulence traits, in many free-living bacteria (642). Flagellar biosynthesis functions (COG1298, COG1157, COG1049, PF00771) were also over-represented in the necrotic samples, indicating that many of the newly colonizing bacteria were motile. Other pathogenic functions or virulence proteins, including those identified in temperature-stressed corals (233), were not detected in the necrotic sponge samples (Fig. 5.7).

These data demonstrate that temperature stress changes the microbial community from one with predominantly symbiotic functions to one characterized by motile bacteria capable of using c-di-GMP as a regulator of gene expression. Importantly, the genetic potential for symbiotic interactions was still largely present in the intermediate samples, yet the sponge host had already undergone a stress-related gene expression response (Fig. 5.2). It was therefore predicted that the microbial communities of sponges in the intermediate health state, while genetically similar to the communities in healthy sponges, would nevertheless have altered their expression patterns. In particular, it was hypothesized that the expression of functions related to symbiosis or temperature stress would be altered, and this was investigated with metaproteomics (Table 5.4).

Fig. 5.6 Sample clustering based on community functional composition and community expression. (A) Metagenomes annotated by COG. (B) Metagenomes annotated by Pfam. (C) Metagenomes annotated by Subsystem. (D) Metaproteomes annotated by COG. (E) Metaproteomes annotated by Pfam. (F) Metaproteomes annotated by Subsystem.

Table 5.3 Information for the metagenomic analysis of R. odorabile clones.

Sample | WT-a | WT-b | WT-c | T27-Day4-a | T27-Day4-b | T27-Day4-c | T32-Day3-c | T32-Day4-a | T32-Day4-b | T32-Day4-c
Trace file | BBAY34 | BBAY35 | BBAY36 | BBAY49 | BBAY50 | BBAY51 | BBAY52 | BBAY53 | BBAY54 | BBAY55
Raw reads | 949133 | 583576 | 519285 | 742998 | 695928 | 627630 | 537954 | 403759 | 660167 | 453805
Average read size (nt) | 402.0 | 377.3 | 401.6 | 423.4 | 428.2 | 343.5 | 316.1 | 311 | 391 | 368.1
Unique reads | 904146 | 505357 | 506761 | 716090 | 639263 | 606860 | 507711 | 399034 | 635356 | 388317
16S rRNA gene-containing reads | 298 (0.03%) | 198 (0.04%) | 201 (0.04%) | 315 (0.04%) | 267 (0.04%) | 195 (0.03%) | 496 (0.10%) | 984 (0.25%) | 270 (0.04%) | 171 (0.04%)
Aligned reads | 460513 (50.9%) | 198840 (39.4%) | 180893 (35.7%) | 397269 (55.5%) | 334502 (52.3%) | 270894 (44.6%) | 67199 (13.2%) | 225756 (56.6%) | 235761 (37.1%) | 81113 (20.9%)
Contigs > 500 nt | 44408 | 17286 | 19306 | 35262 | 34102 | 16733 | 6420 | 3796 | 28674 | 10205
Average size of contigs > 500 nt (nt) | 902 | 900 | 836 | 1002 | 894 | 1092 | 743 | 2441 | 784 | 687
N50 size of contigs > 500 nt (nt) | 909 | 909 | 838 | 1018 | 895 | 1178 | 724 | 5055 | 776 | 664
Maximum size of contigs > 500 nt (nt) | 52481 | 16414 | 16419 | 52143 | 18553 | 17003 | 6062 | 89781 | 5441 | 3749
Contigs > 100 nt | 71479 | 29504 | 32819 | 54685 | 55119 | 28502 | 11648 | 5262 | 46143 | 18175
Singletons/outliers > 100 nt | 416488 | 284341 | 296611 | 306056 | 291996 | 317341 | 407146 | 153752 | 382667 | 282041
Non-eukaryotic contigs and singletons/outliers | 486090 (99.6%) | 310735 (99%) | 326136 (99%) | 359921 (99.8%) | 346460 (99.8%) | 345197 (99.8%) | 417634 (99.7%) | 158279 (99.5%) | 428032 (99.8%) | 299667 (99.8%)
Unique predicted proteins | 597098 | 367768 | 394503 | 450176 | 433069 | 401283 | 475263 | 186518 | 518234 | 356540
Total proteins | 828348 | 457510 | 480812 | 645157 | 609660 | 515151 | 505473 | 254007 | 636935 | 396781
Proteins annotated by COG (E < 1e-10) | 230428 (27.8%) | 120728 (26.4%) | 123659 (25.7%) | 204875 (31.8%) | 181301 (29.7%) | 134597 (26.1%) | 104924 (20.8%) | 89516 (35.2%) | 175661 (27.6%) | 100811 (25.4%)
Proteins annotated by Pfam (E < 1e-10) | 219205 (26.5%) | 115740 (25.3%) | 118378 (24.6%) | 194148 (30.1%) | 173218 (28.4%) | 126218 (24.5%) | 25965 (5.1%) | 85497 (33.7%) | 162726 (25.5%) | 92386 (23.3%)
Proteins annotated by SEED (E < 1e-10) | 338201 (40.8%) | 180828 (39.5%) | 187208 (38.9%) | 290960 (45.1%) | 266252 (43.7%) | 199504 (38.7%) | 171954 (34%) | 129579 (51%) | 255930 (40.2%) | 151296 (38.1%)

* WT, wild-type.
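The N50 values in Table 5.3 summarize assembly contiguity. As a reminder of how the statistic is defined, the Python sketch below returns the length L such that contigs of length L or longer contain at least half of all assembled bases; the optional length filter mirrors the "> 500 nt" restriction used in the table. The function is a generic illustration, not the assembler's own reporting code.

```python
def n50(lengths, min_length=0):
    """N50 of contigs longer than `min_length` nt."""
    kept = sorted((n for n in lengths if n > min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs above the length threshold")
    half = sum(kept) / 2
    running = 0
    for length in kept:
        running += length
        if running >= half:
            return length


print(n50([2, 3, 4, 5, 6]))                       # 5
print(n50([600, 400, 800, 700], min_length=500))  # 700
```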

Fig. 5.7 Abundance of specific functions in the metagenomes of normal and stressed sponge microbial communities. Samples are clustered using Bray-Curtis similarity and group averages. The heatmap is plotted according to the abundance of each function (copies per genome) per sample. Black, the wild-type group. Blue, the healthy group. Orange, the intermediate group. Red, the necrotic group. (A) COG annotation. (B) Pfam annotation. (C) Subsystem annotation.


Table 5.4 Information for the metaproteomic analysis of R. odorabile clones.

Sample | T27-Day4-a | T27-Day4-b | T32-Day3-c | T32-Day4-a | T32-Day4-b | T32-Day4-c
Unique proteins | 462 | 584 | 420 | 301 | 577 | 582
Peptides | 1389 | 1727 | 1218 | 1061 | 1642 | 1713
Proteins annotated by COG (E < 1e-10) | 52 | 71 | 57 | 107 | 81 | 81
  (peptides) | 331 | 482 | 349 | 720 | 529 | 605
Proteins annotated by Pfam (E < 1e-10) | 52 | 86 | 80 | 176 | 104 | 102
  (peptides) | 427 | 625 | 536 | 1255 | 730 | 788
Proteins annotated by SEED (E < 1e-10) | 53 | 79 | 74 | 123 | 92 | 96
  (peptides) | 433 | 596 | 590 | 1718 | 885 | 956

5.3.4 Metaproteomic analysis reveals expression changes related to stress and symbiosis function

Comparison of the metaproteomic datasets showed that the expression profiles of the microbial communities of sponges in the intermediate health state were much more closely related to those of necrotic samples than to those of healthy sponges (Fig. 5.6DE). This confirms that the microbial communities of the intermediate samples had undergone substantial changes in protein expression, consistent with the trend observed in the sponge host (Fig. 5.2). Annotation of the expressed proteins based on COG categories revealed that broad functional categories such as amino acid transport and metabolism were relatively overrepresented in the healthy group, while post-translational modification was overrepresented in the intermediate group. In contrast, translation, transcription and intracellular trafficking were overrepresented in the necrotic sponges (Fig. 5.8).

Further detailed analysis revealed clear evidence of a heat-stress response in the microbial communities of intermediate samples (Fig. 5.9). Specifically, expression of the ClpA protein (COG0542, PF02861, Subsystem: Proteolysis in bacteria, ATP-dependent), which belongs to the Hsp100 heat-shock protein family, was up-regulated. Clp acts as a chaperone to stabilize and refold proteins (643) or to deliver unfolded proteins to peptidases for degradation (644). Expression of elongation factor Tu domain 2 was also increased in intermediate samples; this protein has an established chaperone-like function (645) in addition to its chain-elongation role in translation. Other potential chaperone activities were also detected (PF08406). Small increases in physiological temperature can cause protein unfolding, entanglement and nonspecific aggregation, and the microbial community of heat-treated R. odorabile appears to have experienced and responded to this thermal stress.

Fig. 5.8 Protein expression in the microbial communities of healthy, intermediate and necrotic sponges, annotated by COG categories. COG category counts were normalized by the total number of peptides per proteome.
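The normalization used for Fig. 5.8 amounts to expressing each COG category as a percentage of all peptides identified in a metaproteome. A minimal Python sketch follows; assigning one category per peptide and counting unannotated peptides in the denominator are both assumptions of this sketch, and the function name is hypothetical.

```python
from collections import Counter


def category_percentages(peptide_categories):
    """Percentage of all identified peptides assigned to each COG category.

    peptide_categories: one COG category code per peptide, or None if unannotated.
    """
    total = len(peptide_categories)
    counts = Counter(c for c in peptide_categories if c is not None)
    return {category: 100.0 * n / total for category, n in counts.items()}


print(category_percentages(["J", "J", "E", None]))  # {'J': 50.0, 'E': 25.0}
```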

Elevated temperature also decreased functions related to the metabolic interactions between symbionts and the host. Nutritional interdependence between partners in symbiosis often arises during co-evolution (see Chapter 3 and ref. (74)) and is reflected, for example, in the high abundance and diversity of transport proteins found in the metagenomes of healthy R. odorabile (Fig. 5.7) and other sponges (Chapter 3), the metaproteome of the sponge C. concentrica (Chapter 4) and the metatranscriptome of the sponge Geodia barretti (403). Consistent with these observations, transporters involved in the uptake of sugars, peptides and other substrates were found to be highly expressed in healthy R. odorabile (Fig. 5.9). Specifically, these included UgpB and RbsB for sugar utilization, and ABC-type transporters for dipeptides, oligopeptides, branched-chain amino acids and trehalose. Expression of these transporters was markedly decreased in sponges with intermediate health, reflecting a shutdown of metabolic processes and interactions within the sponge holobiont. For example, trehalose often accumulates in the cytoplasm of animal cells, including those of the sponge S. domuncula (646), to create an osmotic equilibrium with the aqueous surroundings (647, 648). Bacteria can potentially use trehalose as a carbon or energy source as well as an osmolyte (649, 650). Reduced trehalose utilization by temperature-stressed symbionts may reflect a lack of production by the host, which would lead to reduced nutrient availability for the microbial community.

Fig. 5.9 Abundance of specific functions in the metaproteomes of normal and stressed sponge microbial communities. Samples are clustered using Bray-Curtis similarity and group averages. The heatmap is plotted according to the abundance of each function (percentage of all peptides) per sample. Blue, the healthy group. Orange, the intermediate group. Red, the necrotic group. (A) COG annotation. (B) Pfam annotation. (C) Subsystem annotation.

Interactions between microbial symbionts and eukaryotic cells may also require specific proteins involved in cell-cell contact (see Chapter 3 and ref. (190)). In the microbial community of healthy sponges, the expressed fibronectin type III domains (PF00041), collagen-binding-related domains (PF05738, PF01391) and membrane proteins (COG2885, PF00691) may be related to cell-cell interactions (Fig. 5.9). Expression of these ELPs was reduced in intermediate and necrotic samples, which could impair the ability of symbionts to interact with their host. Sponge symbionts must also protect themselves against redox-active substances and phagocytosis, and the microbial communities of C. concentrica and other sponges appear to have acquired specific genes for this purpose (see Chapter 3 and ref. (190)). For example, the expression of peroxiredoxin (PF10417) and of the ELP NIPSNAP (PF07978), which has anti-macrophage activity (651), may be related to symbiont protection, and their lower abundance in intermediate samples may reduce this protection against host oxidative stress and phagocytosis.


A high abundance of various mobile genetic elements, such as conjugative plasmids and phages, was detected in the sponges C. concentrica (190) and R. odorabile (Fig. 5.7). Their expression was also altered by heat stress. The decreased expression of the high frequency of lysogenization subunit C (HflC) (COG0330, PF01145, Subsystem: YbbK, Subsystem: CBSS-316057.3.peg.659) indicated increased lysogenic activity of prophages in the microbial community (Fig. 5.9). The hflA locus of E. coli (encoding HflX, HflK and HflC) governs the lysis-lysogeny decision of bacteriophage λ by controlling the stability of the phage cII protein (652). A decrease in HflXKC would therefore favor the maintenance of prophages but inhibit the production of mature phages. This regulation may be due either to the direct effect of elevated seawater temperature or to the interrupted nutrient supply (653).

A putative mechanism for preventing sponge biofouling may involve the expression of N-acyl homoserine lactone hydrolase (AHL lactonase) (Subsystem: Quorum sensing in Yersinia), which inhibits AHL-based quorum-sensing systems (quorum quenching) (654, 655). In heat-stressed sponges, its expression was suppressed (Fig. 5.9C). This could allow colonization by bacteria that use AHL systems to control gene expression.

Other changes in expression involved 2-oxoisovalerate dehydrogenase (acylating), which participates in amino acid degradation, and aldehyde dehydrogenase (COG1012, PF00171) (656). However, their roles in sponge symbiosis are currently unclear.

Overall, these observations support a scenario in which symbionts still persist in the sponges during the intermediate health state but no longer carry out normal symbiotic functions. The disruption of these functions likely unbalances a well-tuned association based on nutritional interdependence and molecular interactions, ultimately leading to a collapse of the holobiont and necrosis of the sponge.


5.3.5 Opportunistic scavengers dominated the necrotic sponges

Consistent with the metagenomic results, no expression of known virulence-related proteins was detected in the necrotic samples (Fig. 5.9). It is therefore unlikely that the decreased health of the intermediate or necrotic sponges is a direct result of pathogenic bacteria or specific virulence mechanisms. Instead, the new taxa found in the necrotic samples appeared to have higher growth rates and metabolic activity than the native symbionts. This is supported by the high abundance of RNA polymerase (PF00562, PF04997), ribosomal protein S2 (e.g. COG0052) and translation elongation factors (e.g. COG0480), indicating high rates of transcription and translation, which are generally related to high growth rates (657). Numerous studies have shown that microbial symbionts derive energy from the oxidation of ammonium secreted by sponges (396, 565, 658-660). In the necrotic samples, however, ammonium appeared to be preferentially assimilated, presumably by the new taxa, through the expression of NADP-specific glutamate dehydrogenase (Subsystem: Arginine and Ornithine Degradation) (661). This route has a lower energy cost than the ubiquitous glutamine synthetase/glutamate synthase pathway and is often used under conditions of nitrogen excess (490, 491).
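
The growth-rate argument above rests on the share of detected protein attributed to transcription and translation machinery. A minimal sketch of that summary follows. It assumes per-family spectral counts are available for each sample. The counts are hypothetical, and the marker set simply reuses the family IDs named in the text. A higher fraction is only a proxy for faster growth, not a direct measurement.

```python
# Families named in the text as markers of transcription and translation.
GROWTH_MARKERS = {
    "PF00562": "RNA polymerase",
    "PF04997": "RNA polymerase",
    "COG0052": "ribosomal protein S2",
    "COG0480": "translation elongation factor",
}


def marker_fraction(counts: dict[str, int], markers=GROWTH_MARKERS) -> float:
    """Fraction of a sample's spectral counts assigned to growth-marker families."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for family, n in counts.items() if family in markers) / total


# Hypothetical samples; "other" stands for all non-marker families.
samples = {
    "healthy": {"PF00562": 5, "COG0480": 15, "other": 980},
    "necrotic": {"PF00562": 30, "COG0052": 20, "COG0480": 50, "other": 900},
}
for name, counts in samples.items():
    print(name, round(marker_fraction(counts), 3))
```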

A stressed sponge exhibiting minor necrosis (as observed in the intermediate samples) would be a rich source of nutrients. Bacteria that can quickly move towards the nutrient source, colonize the tissue, grow rapidly and assimilate key nutrients would quickly outcompete the native symbiont community. The observed functional gene content and protein expression are consistent with such scavenging tactics. Bursts of nutrients also occur in the marine environment during algal blooms (662), and some of the taxa newly colonizing the sponges, such as Pseudoalteromonas and Vibrio, are often associated with decaying matter and blooms (663, 664).

5.4 Conclusion

Substantial research effort is currently being directed towards understanding the increasing number of disease outbreaks in the world’s oceans, especially in corals and sponges (54, 207, 208, 215, 275, 282, 665). Based on some of these studies, mathematical modeling predicted that a temperature-induced decrease in antibiotic production in corals would cause a general shift in microbial communities and the invasion of pathogens that effectively compete for nutrients (280). The observations for R. odorabile support some aspects of this model; however, they also indicate that a decline in symbiotic interactions is likely to be a key factor in the loss of holobiont function. Specifically, there is an intermediate state in which elevated temperature changes gene expression in both the host and the symbiont community. This state is characterized by a heat-shock response and a loss of symbiotic function, and changes in community composition occur secondarily to this stress process.

Previous studies have indicated that the invasion of pathogenic microorganisms is responsible for the declining health of some marine invertebrate holobionts (207, 233, 269), while other studies found no support for this (259, 630, 631, 666, 667). The characterization of the temperature-induced decay of R. odorabile showed no evidence for virulence or pathogens, but instead supported a community shift towards motile, nutrient-scavenging bacteria that opportunistically exploit the niches opened up by the interruption of symbiosis. It is therefore proposed that these opportunists have lower host specificity than the true symbionts, a prediction consistent with the broad similarity of disease-related microorganisms found in various marine hosts, including sponges, gorgonian sea fans and corals (229, 668, 669).

The stability of free-living microbial communities in response to environmental perturbation has been explored in modeling and experimental studies, with a general consensus that community resistance and resilience are positively related to community diversity and functional redundancy (641). In contrast to these free-living systems, members of symbiont communities, especially those in obligate associations such as in sponges, are highly interdependent and may be highly specialized to particular niches (Chapter 3). This interdependence and specialization would limit functional redundancy within the community, making it unlikely that the loss of an essential symbiont would be compensated by the presence or introduction of a functionally equivalent species (280, 670). Such symbiotic systems are therefore likely to be very sensitive to environmental perturbation, as highlighted here by the R. odorabile holobiont, which collapses at 32°C, only 2-3°C above the average annual maximum temperature of its habitat on the GBR (229).
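
The link between redundancy and sensitivity can be made concrete with a deliberately simple calculation. It rests on two illustrative assumptions, neither of which was measured in this study: (i) each of k functionally equivalent species in a niche survives a perturbation independently with the same probability p, and (ii) the holobiont requires all N niches to remain functional.

```latex
P_{\text{niche}} = 1 - (1 - p)^{k}, \qquad
P_{\text{community}} = \left[\,1 - (1 - p)^{k}\,\right]^{N}

% Worked example with p = 0.5 and N = 10 niches:
% k = 1 (no redundancy):  P_community = 0.5^{10}   \approx 0.001
% k = 3 (redundancy):     P_community = 0.875^{10} \approx 0.26
```

In this toy setting, modest redundancy raises the chance that every function persists by more than two orders of magnitude. Real symbionts are not independent, so the numbers illustrate the direction of the argument rather than its size.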

Environmental perturbations, including the effects of climate change and urbanization, are likely to continue to negatively impact marine species and ocean ecosystems (204, 205, 671, 672). Controlled experiments that simultaneously analyze the structure and function of all symbiosis partners have the potential to reveal which aspects of the holobiont are most sensitive to environmental change. The multifaceted approach described here could help to discern the mechanisms behind the many models proposed for a range of marine species and their disease syndromes (54, 207, 208, 215, 275, 280, 282, 665). The impact of elevated temperature on a model sponge holobiont suggests that, in invertebrates with similarly interdependent microbial partnerships, environmental change may irreversibly disrupt the symbiosis, with significant implications for host health.


Chapter Six Conclusion and Prospective

6.1 Summary

Many microorganisms form symbiotic relationships with eukaryotes, ranging in complexity from associations with a single dominant symbiont to those with hundreds of symbiont species. Symbionts can contribute substantially to host physiology, behavior and evolution (46). Until recently, how microbial symbionts assemble into communities and how these communities respond to environmental perturbations were rarely studied, partly because of the difficulty of culturing symbionts in the laboratory. Recent technical developments in microbial ecology have not only unveiled a large diversity of microbial symbionts in nature, but also provided the opportunity to study symbiosis in more detail. The sponge-microbiota symbiosis is an ideal model for studying complex microbial symbiont communities. Extensive research over the last decade has provided a good understanding of the phylogenetic diversity and biogeography of sponge-associated microorganisms (74, 192), as well as of the phylogenetic community changes during outbreaks of sponge disease (208). However, functional descriptions of these symbiont communities have been largely missing.

Research conducted in this thesis using the sponge-microbiota model improved our understanding of the assembly and dynamics of complex symbiont communities. Specifically, a pipeline was developed to reconstruct full-length rRNA genes from pyrosequencing metagenomic shotgun data (Chapter 2). This work showed that a substantial proportion of microbial diversity was typically missed by PCR-based approaches. Chapter 3 described detailed metagenomic analyses identifying the functional core in symbiont communities from taxonomically distinct sponge species. The results indicated that common functions were provided by distinct but functionally equivalent symbionts and enzymes in different sponge hosts. Moreover, the abundance of elements involved in HGT suggested their key role in distributing core functions between symbionts and in facilitating functional convergence at the community scale. To investigate the expression profile of sponge symbionts, a metaproteogenomic analysis was conducted on the sponge C. concentrica, as described in Chapter 4. The analysis detected abundant protein expression for substrate transport, aerobic and anaerobic metabolism, stress response, and host-symbiont interactions. As described in Chapter 5, the sponge R. odorabile was then studied under controlled thermal stress. Dramatic changes in community structure, functional composition and gene expression were observed along with the host stress response, suggesting that a decline in symbiotic interactions was likely to be the key factor in the loss of holobiont function.

These discoveries substantially extend our understanding of sponge-associated microorganisms. They also provide important evidence for addressing several fundamental questions in symbiont ecology, especially the mechanisms behind symbiont community assembly and stability, which I discuss in the following two sections.

6.2 The ‘barrier hypothesis’ – an extension of the ‘continuum hypothesis’ with reference to symbiont co-evolution

Several theories have recently been proposed to explain microbial community structure. Most of them are based on niche theory, neutral theory or combinations of the two (98, 99, 673). However, these theories were designed almost exclusively for free-living microbes. Only a few experiments with specific research models have addressed the mechanisms of community assembly for host-associated microbiota, such as those on macroalgae (68), in the human gut (100-102) and in insects (103). A recent review also discussed potential factors shaping symbiont communities (46). However, a unified model is needed to describe the mechanisms of community assembly both for free-living microbes and for those engaged in different kinds of associations with hosts.

Based on the discoveries in this thesis, I attempt here to integrate a series of ecological and evolutionary theories, especially the ‘continuum hypothesis’ (81) (i.e. niche and neutrality represent the two ends of a continuum from competitive to stochastic exclusion, see Section 1.1.4), into a ‘barrier’ model that describes community assembly from free-living microbes to strict symbionts. The model comprises the following points (Fig. 6.1).

Fig. 6.1 The ‘barrier hypothesis’ model in normal and stressed microbial communities. Solid and empty arrows show immigration and emigration, respectively, with arrow width indicating the strength of migration. Squares represent niches in the community, and small symbols (triangles etc.) within them represent community members.

1) A microbial community is defined as a group of microorganisms within a certain spatial and temporal range. However, the community boundary is often fuzzy and arbitrary. For example, on the spatial scale, a soil community is often sampled as the total microbes in several grams of soil. While a symbiont community is generally defined as all microbes in or on the host body, it can also be narrowed down to those in a particular organ or structure of the host. The features of a community (e.g. species migration) are determined not only by the content of the community, but also by the abiotic and biotic environment immediately outside its boundary. In the present section, the term ‘community’ applies only to a group of microorganisms surrounded by a homogeneous environment, such as seawater.

2) Every microbial community has multiple ‘barriers’ that control the exchange (immigration and emigration) of microorganisms with the surrounding environment. The ‘immigration barriers’ of a community are usually characterized by a series of parameters, such as environmental factors (e.g. light, nutrients, temperature, pH, salinity, pressure and radiation), interactions with the host and other microorganisms (e.g. the host immune system and antibiotics), and physical separation (e.g. cell membranes of the host). The immigration barriers act as sieves (674) or niche barriers that specify which microorganisms can be included in the community.

The ‘immigration barriers’ of a community are reflected in its functional features. Specifically, a core set of functions is shared by the community members, often regardless of their phylogenetic backgrounds. This phenomenon, called ‘trait convergence’ (675), has been shown in host-associated microbial communities (68, 194), including the sponge microbes in the present study (see Chapter 3). The presence of core functions within a community is not only the result of selection, but may also be facilitated by HGT, as suggested in this thesis (see Chapters 3, 4 and 5).

Two communities in similar environments or hosts may share some of their barriers. For example, microbial communities from two different sponge species have the common requirement to interact with their host. These functionally equivalent barriers are thus reflected in the abundance of ELPs in their metagenomes (Chapter 3).

3) The ‘emigration barriers’ determine which members are kept in or lost from the community. These outgoing barriers are generally characterized by factors such as spatial isolation and the metabolic dependence of the rest of the community on a specific member. For example, some insect endosymbionts live in a physically constrained environment, and their key role in providing the host with vital vitamins prevents them from being lost (103).

4) Within a microbial community, assembly follows the rule of the ‘continuum hypothesis’, where both niche structure and population migration play a role. The relative importance of niche and neutral processes is determined by the strength of the migration barriers and the time of co-existence (co-evolution) of the community members.

As different communities possess incoming and outgoing barriers with different selection strengths, they are isolated from the surrounding microorganisms to varying extents. For example, in closed systems with strong incoming and outgoing barriers (e.g. sponge symbiosis), community assembly is determined more by niche structure. This is because an isolated environment provides sufficient evolutionary time for competitive exclusion between co-existing community members. These communities thus not only have few redundant species per niche (Fig. 6.1), but also possess highly structured, nutrient-interdependent relationships between community members (i.e. ‘trait divergence’ (676-678)). HGT may again play a key role in this niche specialization process (Chapter 3), which can result in reduced genome sizes (562).

In contrast, a planktonic community generally has few barriers with its surroundings (Fig. 6.1). It is thus open to frequent immigration and emigration (neutral processes), and its structure is largely determined by selection following a ‘priority theory’ or ‘lottery hypothesis’ (68, 679, 680). Likely due to the short time of co-existence between functionally equivalent species, these communities usually have high functional redundancy (641) (Fig. 6.1).

It should be pointed out that, although niche specialization tends to reduce microbial genome size, it does not remove the core functions determined by the incoming barriers (675) (i.e. ‘limiting similarity’ (676, 681)). Instead, the isolated environment and long co-evolutionary history enhance trait convergence. In contrast, barrier selection is blurred by frequent migration in free-living communities. This is supported by the observation in the current study that sponge symbionts contain more core functions than planktonic communities or disease-related scavengers (see Chapters 3 and 5).

5) The specialization level of a closed symbiont community is also determined by its size and its small-scale spatial structure. A spatially homogeneous symbiont community provides ample opportunity for competitive exclusion, whereas a community with strong spatial structure may permanently harbor two species with overlapping niches because they are spatially separated.
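
Point 2 describes a core set of functions shared by community members regardless of phylogeny, and shared between communities that face similar barriers. A minimal sketch of how such a core might be extracted from annotated metagenomes is given below. It assumes presence/absence of function labels per community. The labels and profiles are hypothetical, and the `min_fraction` threshold is an assumption, not the criterion used in Chapter 3. Because the comparison works on functions rather than taxa, it can reveal convergence even when the functions are encoded by different symbionts.

```python
from collections import Counter


def functional_core(profiles: dict[str, set[str]], min_fraction: float = 1.0) -> set[str]:
    """Functions present in at least `min_fraction` of the profiles.

    profiles     -- community (or sample) name -> set of detected function labels
    min_fraction -- 1.0 gives a strict core (present in every profile)
    """
    if not profiles:
        return set()
    counts = Counter(func for funcs in profiles.values() for func in funcs)
    needed = min_fraction * len(profiles)
    return {func for func, n in counts.items() if n >= needed}


# Hypothetical presence/absence profiles for three sponge-associated communities.
profiles = {
    "sponge_A": {"ammonia_oxidation", "ELP", "CRISPR", "photosynthesis"},
    "sponge_B": {"ammonia_oxidation", "ELP", "CRISPR"},
    "sponge_C": {"ammonia_oxidation", "ELP"},
}
print(sorted(functional_core(profiles)))
print(sorted(functional_core(profiles, min_fraction=0.5)))
```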

By combining niche and neutral theories, the ‘barrier hypothesis’ proposed here predicts that barrier structure and strength drive community specialization and diversity, and it highlights the key role of the co-existence history of community members in explaining community specialization. Constructing a mathematical model based on these hypotheses would be an interesting area of future research.
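
As a hypothetical starting point for such a model (not a model developed in this thesis), barrier strength can be encoded as an immigration probability m in a zero-sum, Moran-type community, in the spirit of neutral models with immigration, with a per-species fitness term added as the niche component. Community size, pool size and the fitness values are arbitrary assumptions. Spatial structure (point 5) and co-evolution are not represented.

```python
import random


def simulate(m: float, steps: int = 10_000, J: int = 100, S: int = 20, seed: int = 1) -> int:
    """Species richness of a local community after `steps` death-replacement events.

    m -- probability that a vacancy is filled by an immigrant from the
         surrounding pool (high m = weak immigration barrier)
    J -- number of individuals in the local community (fixed, zero-sum)
    S -- number of equally abundant species in the surrounding pool
    """
    rng = random.Random(seed)
    # Hypothetical per-species fitness: the niche (selection) component.
    fitness = [rng.uniform(0.5, 1.5) for _ in range(S)]
    pool = list(range(S))
    community = [rng.choice(pool) for _ in range(J)]
    for _ in range(steps):
        vacancy = rng.randrange(J)  # one individual dies at random
        if rng.random() < m:
            # Neutral process: a random immigrant crosses the barrier.
            community[vacancy] = rng.choice(pool)
        else:
            # Local reproduction weighted by fitness (the dying individual
            # may also be picked as parent, a common simplification).
            weights = [fitness[sp] for sp in community]
            community[vacancy] = rng.choices(community, weights=weights)[0]
    return len(set(community))


for m in (0.0, 0.01, 0.1, 0.5):
    print(f"m = {m:<4}  richness = {simulate(m)}")
```

With strong barriers (m near 0), selection and drift exclude most species. With weak barriers, immigration keeps many species present. Richness is used here only as a crude proxy for redundancy; a fuller model would track niches and traits explicitly.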

6.3 Disease mechanism of the eukaryote-microbiota holobiont

In Chapter 5, several explanations were proposed to describe the holobiont changes under thermal stress. Here, I extend these points into a disease model of the eukaryote-microbiota holobiont by combining the ‘barrier hypothesis’ above with the ‘holobiont’ concept and diversity-related theories of community stability.

1) Every holobiont can be considered as a homeostatic system that is maintained by metabolic interdependence and other interactions between the symbionts and the host as well as species migration at various levels. Environmental perturbations alter the barrier structure of a microbial community. For example, an increase in water temperature changed the temperature barrier of the R. odorabile holobiont (Chapter 5). If any single functional unit, such as an enzyme in the sponge cells or in any of the symbiont species, has a low tolerance to this high temperature (e.g. 32°C in Chapter 5), the holobiont’s homeostasis will be affected (Fig. 6.1). If the holobiont cannot compensate for this, then the entire holobiont may start to collapse (see point 3 below).

The shift in a barrier can affect multiple microorganisms in a holobiont as well as the eukaryotic host. For instance, elevated seawater temperature can inhibit the photosynthesis of cyanobacteria in corals (215), increase the coral's respiration rate (271), or disrupt the interaction between the symbionts and the sponge host (Chapter 5). As each functional unit in a holobiont has its own maximum temperature tolerance, the lowest tolerance might determine the threshold for the entire holobiont (e.g. 32°C for the R. odorabile holobiont), a concept captured by the metaphor that ‘the capacity of a wooden bucket is determined by its shortest board’.

The same change in a barrier (e.g. raised temperature) can result in different outcomes for two holobionts with distinct community compositions. In principle, therefore, different holobionts (e.g. different sponge species at the same location) could have different temperature thresholds. This could explain why elevated seawater temperature has triggered species- or population-specific mortality outbreaks in corals and sponges (215, 682).

2) While a shift in temperature can alter one of the community barriers, it may also affect the holobiont’s ability to produce antibiotics (55), inhibitors of quorum sensing (683, 684), anti-fouling compounds or defense molecules against pathogens. These processes further alter other immigration barriers of the holobiont. Consequently, free-living microorganisms with higher temperature tolerance may immigrate from the surrounding seawater (Fig. 6.1). At the same time, the niche structure in the declining holobiont may change dramatically. These changes can include increased availability of certain nutrients due to suppressed metabolite utilization by symbionts (see Fig. 6.1 and Chapter 5). Such a eutrophic environment would then select for microorganisms with fast growth rates and motility (see Chapter 5), as they would quickly consume the released nutrients and outcompete the remaining symbionts. This community shift can subsequently be detected by fingerprint-based technologies (see Chapter 5). However, by that point, the holobiont system is already undergoing an irreversible loss of function.

3) In contrast to symbiont communities, microbial communities with high functional redundancy, such as soil communities (641), are more resistant to stress and can recover their function during environmental change. In these communities, one niche is usually occupied by several functionally equivalent species, which may have different levels of stress tolerance (Fig. 6.1). When the environmental temperature rises, some species might be eliminated from the community, while other functionally equivalent organisms with higher temperature tolerance survive. These persisting species increase their populations to fill the niche space freed up by the declining taxa and maintain the overall functional stability of the community (Fig. 6.1).

Alternatively, species with higher temperature tolerance can be recruited from the surroundings, especially if they have few requirements for functional interdependence with other community members. A new homeostatic state is thus established, in which the community has changed its phylogenetic composition but restored its function (Fig. 6.1) (641). This would also explain why marine invertebrate holobionts are expected to be extremely sensitive to global climate change (54, 207, 208), while planktonic microbes can escape extinction during evolution (685).

However, a highly specialized symbiont community may also possess a certain level of resistance to environmental perturbation, especially when assisted by genomic recombination (cf. the ‘hologenome’ concept, Section 1.3.4). HGT may play a key role by transferring new functions from the gene pool of the rare biosphere to the dominant symbionts (19), and gene duplication can increase the copy number of a particular gene. These introduced or amplified gene functions could help the symbionts adapt to the altered environment, thus preserving community structure.
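
Points 1 and 3 can be condensed into two toy rules. In a specialized holobiont, every unit is essential and irreplaceable, so the least tolerant unit sets the threshold (the ‘shortest board’). In a redundant community, a niche keeps its function until its most tolerant occupant fails. The sketch below assumes a single sharp upper tolerance per unit and ignores acclimation, recruitment from the surroundings and HGT. The tolerance values are invented for illustration and are not measurements.

```python
def holobiont_threshold(units: dict[str, float]) -> float:
    """Failure temperature when every functional unit is essential and irreplaceable.

    The holobiont fails as soon as its least tolerant unit fails: the
    'shortest board' of the bucket.
    """
    return min(units.values())


def redundant_threshold(niches: dict[str, list[float]]) -> float:
    """Failure temperature when each niche can be filled by any of several species.

    A niche keeps its function until its most tolerant occupant fails, so the
    community fails at the minimum, over niches, of the maximum tolerance.
    """
    return min(max(tolerances) for tolerances in niches.values())


# Invented upper thermal tolerances in degrees C, for illustration only.
specialized = {"host_enzyme": 34.0, "symbiont_A": 31.5, "symbiont_B": 33.0}
redundant = {
    "nitrification": [31.5, 34.0],
    "carbon_fixation": [30.0, 33.0, 35.0],
}
print(holobiont_threshold(specialized))  # 31.5
print(redundant_threshold(redundant))    # 34.0
```

With one species per niche, the two rules give the same answer, which is why low redundancy leaves a holobiont exposed to its weakest unit.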

6.4 Future study of the microbial symbiosis in sponges

Technological revolutions and theoretical developments in microbial ecology have greatly advanced our knowledge of microbial symbiosis in sponges, as presented in this thesis and in other recent -omic studies (46, 192). These advances will facilitate future work, and two topics are discussed below.

6.4.1 Studying the individual community members

Meta-omic studies have increased our understanding of the overall functional profile of entire communities. However, these mixed ‘soups’ of gene function (686) overlook many structural details of the community. For example, without binning genomic fragments into taxonomic units, it is difficult to determine whether the enzymes in a biosynthetic pathway are encoded by a single species, by cells from divergent species (as found in endosymbionts in insects) (103), or by mobile genetic elements (such as plasmids) (687). Binning strategies have been used to characterize functional patterns of dominant species in sponge metagenomes (see Chapter 3 and ref. (190)). However, the resolution and accuracy of binning are highly dependent on community complexity and the amount of available sequence data. Comparative genomic studies of micro-heterogeneity within a symbiont population (400) and of the genomic changes during the adaptation of a symbiont to a host environment (688) require the reconstruction of individual symbiont genomes. Moreover, physiological predictions derived from genomic information could facilitate the future cultivation of sponge symbionts (689). Several techniques can be applied to obtain individual genomes of microbial symbionts, as described in Section 1.4. Besides cell enrichment (400), SCS has been applied to sponge symbionts (191). These approaches should continue to be used to obtain symbiont genomes in model sponge holobionts. Reconstruction of nearly complete genomes directly from metagenomic shotgun data by assembly is another powerful technique (12, 51, 323, 347). With sufficient sequencing depth and advanced bioinformatic processes, it is now even possible to reconstruct genomes for low-abundance species (<10% of the community) (690).
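
Composition-based binning, mentioned above, commonly relies on the observation that tetranucleotide frequencies tend to be genome-specific. The sketch below illustrates only that idea. It is not the binning procedure used in Chapter 3 or ref. (190), and it makes several simplifying assumptions: reverse complements are not merged, clustering is a greedy single pass with an arbitrary distance cut-off, and the contigs are synthetic sequences with different GC contents rather than real assembly output. Practical binners usually also use read coverage across samples.

```python
import math
import random
from itertools import product

# All 256 tetranucleotides; reverse complements are NOT merged (a simplification).
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
KMER_INDEX = {kmer: i for i, kmer in enumerate(KMERS)}


def tnf(seq: str) -> list[float]:
    """Tetranucleotide frequency vector of a sequence, normalized to sum to 1."""
    counts = [0] * len(KMERS)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        idx = KMER_INDEX.get(seq[i:i + 4])
        if idx is not None:  # windows containing N or other ambiguity codes are skipped
            counts[idx] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_bin(contigs: dict[str, str], max_dist: float) -> list[list[str]]:
    """Put each contig into the first bin whose seed profile is within max_dist."""
    bins: list[tuple[list[float], list[str]]] = []
    for name, seq in contigs.items():
        profile = tnf(seq)
        for seed_profile, members in bins:
            if euclidean(profile, seed_profile) <= max_dist:
                members.append(name)
                break
        else:  # no existing bin is close enough: this contig seeds a new bin
            bins.append((profile, [name]))
    return [members for _, members in bins]


def random_contig(rng: random.Random, gc: float, length: int = 5000) -> str:
    """Synthetic contig with a given GC content (stand-in for real contigs)."""
    at, gc_half = (1 - gc) / 2, gc / 2
    return "".join(rng.choices("ACGT", weights=[at, gc_half, gc_half, at], k=length))


rng = random.Random(42)
contigs = {
    "gc_rich_1": random_contig(rng, 0.7),
    "gc_rich_2": random_contig(rng, 0.7),
    "at_rich_1": random_contig(rng, 0.3),
    "at_rich_2": random_contig(rng, 0.3),
}
print(greedy_bin(contigs, max_dist=0.05))
```

The cut-off of 0.05 works for these synthetic 5-kb contigs only. For real data, the threshold depends on contig length and community complexity, which is the same dependence on data quantity noted above.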

Many ecologically significant functions are encoded on plasmids and viruses rather than on chromosomes (31, 691). However, these mobile elements are often undersampled in meta-omic protocols, which are mainly designed to target chromosomal DNA. This bias has been recognized, and studies specifically targeting plasmid or viral metagenomes have been performed (692-696), including for the human gut (697, 698) and corals (234). While such approaches are generally lacking in sponge microbiology, incidental observations of mobile elements, such as the discovery of novel cyanophages, were reported in Chapter 3.

While a symbiotic system comprises both the symbionts and the host, -omic studies often overlook the eukaryotic host. Aside from the extensively investigated human genome, transcriptome and proteome, few other model hosts in symbiosis research have been sequenced. The recently completed genome of the sponge A. queenslandica (402) provides a good opportunity for the systematic study of sponge symbiosis in the future.


6.4.2 Looking around the reef – embracing the era of ‘sequencing everything’

While it is common in epidemiology to study pathogens both within a host population and among hosts, current studies of sponge disease rarely consider the microbial symbionts or putative pathogens in other components of the local ecosystem. The detection of disease-related strains in sponges and corals, and the presence of putative coral pathogens in sponges (668, 669), fire worms (699), macroalgae (700) and marine sediments (701), imply that marine epidemiology is highly complex.

With the promise of decreasing sequencing costs and increasing data quality, omic-level studies are becoming the standard in microbial ecology. It can be expected that sequencing entire local ecosystems will become feasible in the near future. Using conventional and -omic techniques, the simultaneous study of seawater, sediments and the holobionts of seaweeds, sponges, corals and other invertebrates on a reef, through time and across environmental changes, will greatly facilitate our understanding of disease causality and transmission as well as of the ecological interactions mediated by microbes.

Sponge microbiology has a history of more than forty years. Despite extensive field studies and molecular surveys, the relatively limited impact of this research field on the broader scientific community does not match its great significance in ecology and evolution. This may be because of insufficient effort in theoretical exploration and model development. I hope that the discoveries and models presented in this thesis will advance future studies of sponge microbial symbiosis in light of the theoretical and technological revolutions we are currently experiencing.


References

1. Falkowski PG, Fenchel T, and Delong EF (2008) The microbial engines that drive Earth's biogeochemical cycles. Science 320:1034-9.
2. Balser T et al. (2006) Bridging the gap between micro- and macro-scale perspectives on the role of microbial communities in global change ecology. Plant and Soil 289:59-70.
3. Gutknecht J, Goodman R, and Balser T (2006) Linking soil process and microbial ecology in freshwater wetland ecosystems. Plant and Soil 289:17-34.
4. Schimel J, Balser TC, and Wallenstein M (2007) Microbial stress-response physiology and its implications for ecosystem function. Ecology 88:1386-94.
5. Neufeld JD, Wagner M, and Murrell JC (2007) Who eats what, where and when? Isotope-labelling experiments are coming of age. ISME J 1:103-10.
6. Pedrós-Alió C (2006) Marine microbial diversity: Can it be determined? Trends Microbiol 14:257-63.
7. Martiny JB et al. (2006) Microbial biogeography: putting microorganisms on the map. Nat Rev Microbiol 4:102-12.
8. Foissner W (2006) Biogeography and dispersal of micro-organisms: A review emphasizing protists. Acta Protozool 45:111-36.
9. Ramette A, and Tiedje JM (2007) Biogeography: An emerging cornerstone for understanding prokaryotic diversity, ecology, and evolution. Microb Ecol 53:197-207.
10. Lozupone CA, and Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad Sci U S A 104:11436-40.
11. Ley RE, Lozupone CA, Hamady M, Knight R, and Gordon JI (2008) Worlds within worlds: Evolution of the vertebrate gut microbiota. Nat Rev Microbiol 6:776-88.
12. Tyson GW et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428:37-43.
13. Hewson I, and Fuhrman JA (2004) Richness and diversity of bacterioplankton species along an estuarine gradient in Moreton Bay, Australia. Appl Environ Microbiol 70:3425-33.
14. Quince C, Curtis TP, and Sloan WT (2008) The rational exploration of microbial diversity. ISME J 2:997-1006.
15. Green J, and Bohannan BJ (2006) Spatial scaling of microbial biodiversity. Trends Ecol Evol 21:501-7.


16. Hewson I, Steele JA, Capone DG, and Fuhrman JA (2006) Temporal and spatial scales of variation in bacterioplankton assemblages of oligotrophic surface waters. Marine Ecology Progress Series 311:67-77.
17. Franklin MP et al. (2005) Bacterial diversity in the bacterioneuston (sea surface microlayer): the bacterioneuston through the looking glass. Environ Microbiol 7:723-36.
18. Hewson I, Steele JA, Capone DG, and Fuhrman JA (2006) Remarkable heterogeneity in meso- and bathypelagic bacterioplankton assemblage composition. Limnology and Oceanography 51:1274-83.
19. Sogin ML et al. (2006) Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci U S A 103:12115-20.
20. Cottrell MT, and Kirchman DL (2003) Contribution of major bacterial groups to bacterial biomass production (thymidine and leucine incorporation) in the Delaware estuary. Limnology and Oceanography 48:168-78.
21. Malmstrom RR, Cottrell MT, Elifantz H, and Kirchman DL (2005) Biomass production and assimilation of dissolved organic matter by SAR11 bacteria in the Northwest Atlantic Ocean. Appl Environ Microbiol 71:2979-86.
22. Yachi S, and Loreau M (1999) Biodiversity and ecosystem productivity in a fluctuating environment: the insurance hypothesis. Proc Natl Acad Sci U S A 96:1463-8.
23. Bent SJ, and Forney LJ (2008) The tragedy of the uncommon: understanding limitations in the analysis of microbial diversity. ISME J 2:689-95.
24. Thingstad TF, and Lignell R (1997) Theoretical models for the control of bacterial growth rate, abundance, diversity and carbon demand. Aquatic Microbial Ecology 13:19-27.
25. Weitz JS, Hartman H, and Levin SA (2005) Coevolutionary arms races between bacteria and bacteriophage. Proc Natl Acad Sci U S A 102:9535-40.
26. Wommack KE, and Colwell RR (2000) Virioplankton: Viruses in aquatic ecosystems. Microbiol Mol Biol Rev 64:69-114.
27. Hendrix RW (2003) Bacteriophage genomics. Curr Opin Microbiol 6:506-11.
28. Hatfull GF (2008) Bacteriophage genomics. Curr Opin Microbiol 11:447-53.
29. Lu J, Chen F, and Hodson RE (2001) Distribution, isolation, host specificity, and diversity of cyanophages infecting marine Synechococcus spp. in river estuaries. Appl Environ Microbiol 67:3285-90.
30. Srinivasiah S et al. (2008) Phages across the biosphere: contrasts of viruses in soil and aquatic environments. Res Microbiol 159:349-57.
31. Thurber RV (2009) Current insights into phage biodiversity and biogeography. Curr Opin Microbiol 12:582-7.

32. Fuhrman JA (1999) Marine viruses and their biogeochemical and ecological effects. Nature 399:541-8.
33. Bratbak G, Heldal M, Norland S, and Thingstad TF (1990) Viruses as partners in spring bloom microbial trophodynamics. Appl Environ Microbiol 56:1400-5.
34. Breitbart M, and Rohwer F (2005) Here a virus, there a virus, everywhere the same virus? Trends Microbiol 13:278-84.
35. Canchaya C, Fournous G, Chibani-Chennoufi S, Dillmann M-L, and Brüssow H (2003) Phage as agents of lateral gene transfer. Curr Opin Microbiol 6:417-24.
36. Ochman H, Lawrence JG, and Groisman EA (2000) Lateral gene transfer and the nature of bacterial innovation. Nature 405:299-304.
37. Rohwer F, and Thurber RV (2009) Viruses manipulate the marine environment. Nature 459:207-12.
38. Fuhrman JA, and Suttle CA (1993) Viruses in marine planktonic systems. Oceanography 6:51-63.
39. Suttle CA (2007) Marine viruses--major players in the global ecosystem. Nat Rev Microbiol 5:801-12.
40. Suttle CA (2005) Viruses in the sea. Nature 437:356-61.
41. Brussaard CP (2004) Viral control of phytoplankton populations--a review. J Eukaryot Microbiol 51:125-38.
42. Mühling M et al. (2005) Genetic diversity of marine Synechococcus and co-occurring cyanophage communities: evidence for viral control of phytoplankton. Environ Microbiol 7:499-508.
43. Labrie SJ, Samson JE, and Moineau S (2010) Bacteriophage resistance mechanisms. Nat Rev Microbiol 8:317-27.
44. Hoskisson PA, and Smith MC (2007) Hypervariation and phase variation in the bacteriophage 'resistome'. Curr Opin Microbiol 10:396-400.
45. Stern A, and Sorek R (2010) The phage-host arms race: shaping the evolution of microbes. Bioessays 33:43-51.
46. Robinson CJ, Bohannan BJ, and Young VB (2010) From structure to function: the ecology of host-associated microbial communities. Microbiol Mol Biol Rev 74:453-76.
47. Chelius MK, and Triplett EW (2001) The diversity of archaea and bacteria in association with the roots of Zea mays L. Microb Ecol 41:252-63.
48. Frias-Lopez J, Zerkle AL, Bonheyo GT, and Fouke BW (2002) Partitioning of bacterial communities between seawater and healthy, black band diseased, and dead coral surfaces. Appl Environ Microbiol 68:2214-28.

49. Rohwer F et al. (2002) Diversity and distribution of coral-associated bacteria. Marine Ecology Progress Series 243:1-10.
50. Dale C, and Moran NA (2006) Molecular interactions between bacterial symbionts and their hosts. Cell 126:453-65.
51. Hess M et al. (2011) Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331:463-7.
52. Moran NA, McCutcheon JP, and Nakabachi A (2008) Genomics and evolution of heritable bacterial symbionts. Annu Rev Genet 42:165-90.
53. Rosenberg E, Koren O, Reshef L, Efrony R, and Zilber-Rosenberg I (2007) The role of microorganisms in coral health, disease and evolution. Nat Rev Microbiol 5:355-62.
54. Bourne DG et al. (2009) Microbial disease and the coral holobiont. Trends Microbiol 17:554-62.
55. Ritchie KB (2006) Regulation of microbial populations by coral surface mucus and mucus-associated bacteria. Marine Ecology Progress Series 322:1-14.
56. Hooper LV, and Gordon JI (2001) Commensal host-bacterial relationships in the gut. Science 292:1115-8.
57. Braun-Fahrländer C et al. (2002) Environmental exposure to endotoxin and its relation to asthma in school-age children. N Engl J Med 347:869-77.
58. O'Hara AM, and Shanahan F (2006) The gut flora as a forgotten organ. EMBO Rep 7:688-93.
59. Ivanov II, and Littman DR (2011) Modulation of immune homeostasis by commensal bacteria. Curr Opin Microbiol 14:106-14.
60. Stappenbeck TS, Hooper LV, and Gordon JI (2002) Developmental regulation of intestinal angiogenesis by indigenous microbes via Paneth cells. Proc Natl Acad Sci U S A 99:15451-5.
61. Bäckhed F et al. (2004) The gut microbiota as an environmental factor that regulates fat storage. Proc Natl Acad Sci U S A 101:15718-23.
62. Stow A, and Beattie A (2008) Chemical and genetic defenses against disease in insect societies. Brain Behav Immun 22:1009-13.
63. Sharon G et al. (2010) Commensal bacteria play a role in mating preference of Drosophila melanogaster. Proc Natl Acad Sci U S A 107:20051-6.
64. Relman DA, and Falkow S (2001) The meaning and impact of the human genome sequence for microbiology. Trends Microbiol 9:206-8.
65. Wilson DS, and Sober E (1989) Reviving the superorganism. J Theor Biol 136:337-56.

66. Reeve HK, and Hölldobler B (2007) The emergence of a superorganism through intergroup competition. Proc Natl Acad Sci U S A 104:9736-40.
67. Goodacre R (2007) Metabolomics of a superorganism. J Nutr 137:259S-66S.
68. Burke C, Steinberg P, Rusch D, Kjelleberg S, and Thomas T (2011) Bacterial community assembly based on functional genes rather than species. Proc Natl Acad Sci U S A 108:14288-93.
69. Douglas AE (2011) Lessons from studying insect symbioses. Cell Host Microbe 10:359-67.
70. Sapp J (2007) The structure of microbial evolutionary theory. Stud Hist Philos Biol Biomed Sci 38:780-95.
71. Zilber-Rosenberg I, and Rosenberg E (2008) Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol Rev 32:723-35.
72. Gray MW, Burger G, and Lang BF (1999) Mitochondrial evolution. Science 283:1476-81.
73. Baumann P et al. (1995) Genetics, physiology, and evolutionary relationships of the genus Buchnera: intracellular symbionts of aphids. Annu Rev Microbiol 49:55-94.
74. Taylor MW, Radax R, Steger D, and Wagner M (2007) Sponge-associated microorganisms: evolution, ecology, and biotechnological potential. Microbiol Mol Biol Rev 71:295-347.
75. Mueller S et al. (2006) Differences in fecal microbiota in different European study populations in relation to age, gender, and country: a cross-sectional study. Appl Environ Microbiol 72:1027-33.
76. Weimer PJ, Waghorn GC, Odt CL, and Mertens DR (1999) Effect of diet on populations of three species of ruminal cellulolytic bacteria in lactating dairy cows. J Dairy Sci 82:122-34.
77. Russell JB, and Rychlik JL (2001) Factors that alter rumen microbial ecology. Science 292:1119-22.
78. Watanabe H, and Tokuda G (2010) Cellulolytic systems in insects. Annu Rev Entomol 55:609-32.
79. Preston FW (1948) The commonness, and rarity, of species. Ecology 29:254-83.
80. Adler PB, Hillerislambers J, and Levine JM (2007) A niche for neutrality. Ecology Letters 10:95-104.
81. Gravel D, Canham CD, Beaudet M, and Messier C (2006) Reconciling niche and neutrality: the continuum hypothesis. Ecology Letters 9:399-409.

82. Leibold MA, and McPeek MA (2006) Coexistence of the niche and neutral perspectives in community ecology. Ecology 87:1399-410.
83. Ramette A, and Tiedje JM (2007) Multiscale responses of microbial life to spatial distance and environmental heterogeneity in a patchy ecosystem. Proc Natl Acad Sci U S A 104:2761-6.
84. Silvertown J (2004) Plant coexistence and the niche. Trends Ecol Evol 19:605-11.
85. Armstrong RA, and McGehee R (1980) Competitive exclusion. Am Nat 115:151-70.
86. Bell G (2000) The distribution of abundance in neutral communities. Am Nat 155:606-17.
87. Chave J (2004) Neutral theory and community ecology. Ecology Letters 7:241-53.
88. Hubbell SP (2006) Neutral theory and the evolution of ecological equivalence. Ecology 87:1387-98.
89. Gaston KJ, and Chown SL (2005) Neutrality and the niche. Functional Ecology 19:1-6.
90. Hérault B (2007) Reconciling niche and neutrality through the Emergent Group approach. Perspectives in Plant Ecology, Evolution and Systematics 9:71-8.
91. Gewin V (2006) Beyond neutrality--ecology finds its niche. PLoS Biol 4:e278.
92. Volkov I, Banavar JR, Hubbell SP, and Maritan A (2007) Patterns of relative species abundance in rainforests and coral reefs. Nature 450:45-9.
93. Bell G (2001) Neutral macroecology. Science 293:2413-8.
94. Chisholm RA, and Pacala SW (2010) Niche and neutral models predict asymptotically equivalent species abundance distributions in high-diversity ecological communities. Proc Natl Acad Sci U S A 107:15821-5.
95. Alonso D, Etienne RS, and McKane AJ (2006) The merits of neutral theory. Trends Ecol Evol 21:451-7.
96. Chu CJ et al. (2007) On the balance between niche and neutral processes as drivers of community structure along a successional gradient: insights from alpine and sub-alpine meadow communities. Ann Bot 100:807-12.
97. Curtis TP, and Sloan WT (2005) Microbiology. Exploring microbial diversity--a vast below. Science 309:1331-3.
98. Ofiteru ID et al. (2010) Combined niche and neutral effects in a microbial wastewater treatment community. Proc Natl Acad Sci U S A 107:15345-50.

99. Sloan WT et al. (2006) Quantifying the roles of immigration and chance in shaping prokaryote community structure. Environ Microbiol 8:732-40.
100. Turnbaugh PJ, and Gordon JI (2009) The core gut microbiome, energy balance and obesity. J Physiol 587:4153-8.
101. Hamady M, and Knight R (2009) Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res 19:1141-52.
102. Qin J et al. (2010) A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464:59-65.
103. McCutcheon JP, McDonald BR, and Moran NA (2009) Convergent evolution of metabolic roles in bacterial co-symbionts of insects. Proc Natl Acad Sci U S A 106:15394-9.
104. Van Valen L (1973) A new evolutionary law. Evolutionary Theory 1:1-30.
105. Guidotti LG, and Chisari FV (2001) Noncytolytic control of viral infections by the innate and adaptive immune response. Annu Rev Immunol 19:65-91.
106. Bustamante CD et al. (2005) Natural selection on protein-coding genes in the human genome. Nature 437:1153-7.
107. Buckling A, and Rainey PB (2002) The role of parasites in sympatric and allopatric host diversification. Nature 420:496-9.
108. Laine A-L, and Tellier A (2008) Heterogeneous selection promotes maintenance of polymorphism in host–parasite interactions. Oikos 117:1281-8.
109. McCutcheon JP, McDonald BR, and Moran NA (2009) Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genet 5:e1000565.
110. Nakabachi A et al. (2006) The 160-kilobase genome of the bacterial endosymbiont Carsonella. Science 314:267.
111. Moran NA, and Mira A (2001) The process of genome shrinkage in the obligate symbiont Buchnera aphidicola. Genome Biol 2:research0054.
112. Nilsson AI et al. (2005) Bacterial genome size reduction by experimental evolution. Proc Natl Acad Sci U S A 102:12112-6.
113. Wu M et al. (2004) Phylogenomics of the reproductive parasite Wolbachia pipientis wMel: a streamlined genome overrun by mobile genetic elements. PLoS Biol 2:E69.
114. Moran NA, and Plague GR (2004) Genomic changes following host restriction in bacteria. Curr Opin Genet Dev 14:627-33.
115. Douglas AE (1989) Mycetocyte symbiosis in insects. Biol Rev Camb Philos Soc 64:409-34.

116. McCutcheon JP, and Moran NA (2007) Parallel genomic evolution and metabolic interdependence in an ancient symbiosis. Proc Natl Acad Sci U S A 104:19392-7.
117. Wu D et al. (2006) Metabolic complementarity and genomics of the dual bacterial symbiosis of sharpshooters. PLoS Biol 4:e188.
118. Tamas I et al. (2002) 50 million years of genomic stasis in endosymbiotic bacteria. Science 296:2376-9.
119. Bell JJ (2008) The functional roles of marine sponges. Estuarine, Coastal and Shelf Science 79:341-53.
120. Kennedy J, Marchesi JR, and Dobson AD (2007) Metagenomic approaches to exploit the biotechnological potential of the microbial consortia of marine sponges. Appl Microbiol Biotechnol 75:11-20.
121. Vogel S (1977) Current-induced flow through living sponges in nature. Proc Natl Acad Sci U S A 74:2069-71.
122. Borchiellini C et al. (2001) Sponge paraphyly and the origin of Metazoa. J Evol Biol 14:171-9.
123. Dayton PK (1989) Interdecadal variation in an antarctic sponge and its predators from oceanographic climate shifts. Science 245:1484-6.
124. Reiswig HM (1971) Particle feeding in natural populations of three marine demosponges. Biological Bulletin 141:568-91.
125. Sara M et al. (1998) Endosymbiosis in sponges: relevance for epigenesis and evolution. Symbiosis 25:57-70.
126. Wilkinson CR (1983) Net primary productivity in coral reef sponges. Science 219:410-2.
127. Webster NS, and Blackall LL (2009) What do we really know about sponge-microbial symbioses? ISME J 3:1-3.
128. Friedrich AB et al. (1999) Microbial diversity in the marine sponge Aplysina cavernicola (formerly Verongia cavernicola) analyzed by fluorescence in situ hybridization (FISH). Marine Biology 134:461-70.
129. Hentschel U, Usher KM, and Taylor MW (2006) Marine sponges as microbial fermenters. FEMS Microbiol Ecol 55:167-77.
130. Vacelet J, and Donadey C (1977) Electron microscope study of the association between some sponges and bacteria. J Exp Mar Bio Ecol 30:301-14.
131. Hentschel U et al. (2002) Molecular evidence for a uniform microbial community in sponges from different oceans. Appl Environ Microbiol 68:4431-40.
132. Santavy DL, Willenz P, and Colwell RR (1990) Phenotypic study of bacteria associated with the Caribbean sclerosponge, Ceratoporella nicholsoni. Appl Environ Microbiol 56:1750-62.

133. Taylor MW, Schupp PJ, Dahllöf I, Kjelleberg S, and Steinberg PD (2004) Host specificity in marine sponge-associated bacteria, and potential implications for marine microbial diversity. Environ Microbiol 6:121-30.
134. Webster NS, Wilson KJ, Blackall LL, and Hill RT (2001) Phylogenetic diversity of bacteria associated with the marine sponge Rhopaloeides odorabile. Appl Environ Microbiol 67:434-44.
135. Wilkinson CR (1978) Microbial associations in sponges. I. Ecology, physiology and microbial populations of coral reef sponges. Marine Biology 49:161-7.
136. Friedrich AB, Fischer I, Proksch P, Hacker J, and Hentschel U (2001) Temporal variation of the microbial community associated with the Mediterranean sponge Aplysina aerophoba. FEMS Microbiol Ecol 38:105-15.
137. Taylor MW, Schupp PJ, de Nys R, Kjelleberg S, and Steinberg PD (2005) Biogeography of bacteria associated with the marine sponge Cymbastela concentrica. Environ Microbiol 7:419-33.
138. Webster NS, Negri AP, Munro MM, and Battershill CN (2004) Diverse microbial communities inhabit Antarctic sponges. Environ Microbiol 6:288-300.
139. Margot, Acebal, Toril, Amils, and Fernandez Puentes (2002) Consistent association of crenarchaeal Archaea with sponges of the genus Axinella. Marine Biology 140:739-45.
140. Montalvo NF, and Hill RT (2011) Sponge-associated bacteria are strictly maintained in two closely related but geographically distant sponge hosts. Appl Environ Microbiol 77:7207-16.
141. Wichels A, Würtz S, Döpke H, Schütt C, and Gerdts G (2006) Bacterial diversity in the breadcrumb sponge Halichondria panicea (Pallas). FEMS Microbiol Ecol 56:102-18.
142. Friedrich AB, Fischer I, Proksch P, Hacker J, and Hentschel U (2001) Temporal variation of the microbial community associated with the Mediterranean sponge Aplysina aerophoba. FEMS Microbiol Ecol 38:105-15.
143. Thoms C, Horn M, Wagner M, Hentschel U, and Proksch P (2003) Monitoring microbial diversity and natural product profiles of the sponge Aplysina cavernicola following transplantation. Marine Biology 142:685-92.
144. Maldonado M, and Young CM (1998) Limits on the bathymetric distribution of keratose sponges: a field test in deep water. Marine Ecology Progress Series 174:123-39.
145. Schmitt S et al. (2011) Assessing the complex sponge microbiota: core, variable and species-specific bacterial communities in marine sponges. ISME J 6:564-76.
146. Wilkinson CR (1984) Immunological evidence for the Precambrian origin of bacterial symbioses in marine sponges. Proc Biol Sci 220:509.

147. Brunton FR, and Dixon OA (1994) Siliceous sponge-microbe biotic associations and their recurrence through the Phanerozoic as reef mound constructors. Palaios 9:370-87.
148. Taylor MW, Thacker RW, and Hentschel U (2007) Genetics. Evolutionary insights from sponges. Science 316:1854-5.
149. Erpenbeck, Breeuwer, van der Velde, and van Soest (2002) Unravelling host and symbiont phylogenies of halichondrid sponges (Demospongiae, Porifera) using a mitochondrial marker. Marine Biology 141:377-86.
150. Ridley CP et al. (2005) Speciation and biosynthetic variation in four dictyoceratid sponges and their cyanobacterial symbiont, Oscillatoria spongeliae. Chem Biol 12:397-406.
151. Thacker RW, and Starnes S (2003) Host specificity of the symbiotic cyanobacterium Oscillatoria spongeliae in marine sponges, Dysidea spp. Marine Biology 142:643-8.
152. Gaino E, Burlando B, Buffa P, and Sarà M (1987) Ultrastructural study of the mature egg of Tethya citrina Sarà and Melone (Porifera, Demospongiae). Gamete Res 16:259-65.
153. Kaye HR (1991) Sexual reproduction in four Caribbean commercial sponges. 2. Oogenesis and transfer of bacterial symbionts. Invertebrate Reproduction and Development 19:13-24.
154. Sciscioli M, Scalera Liaci L, Lepore E, Gherardi M, and Simpson TL (1991) Ultrastructural study of the mature egg of the marine sponge Stelletta grubii (Porifera Demospongiae). Mol Reprod Dev 28:346-50.
155. Sciscioli M, Lepore E, Gherardi M, and Scalera L (1994) Transfer of symbiotic bacteria in the mature oocyte of Geodia cydonium (Porifera, Demospongiae): an ultrastructural study. Cahiers de Biologie Marine 35:471-8.
156. Usher KM, Kuo J, Fromont J, and Sutton DC (2001) Vertical transmission of cyanobacterial symbionts in the marine sponge Chondrilla australiensis (Demospongiae). Hydrobiologia 461:9-13.
157. Ereskovsky AV, Gonobobleva E, and Vishnyakov A (2005) Morphological evidence for vertical transmission of symbiotic bacteria in the viviparous sponge Halisarca dujardini Johnston (Porifera, Demospongiae, Halisarcida). Marine Biology 146:869-75.
158. De Caralt S, Uriz MJ, and Wijffels RH (2007) Vertical transmission and successive location of symbiotic bacteria during embryo development and larva formation in Corticium candelabrum (Porifera: Demospongiae). Journal of the Marine Biological Association of the UK 87:1693-9.
159. Enticknap JJ, Kelly M, Peraud O, and Hill RT (2006) Characterization of a culturable alphaproteobacterial symbiont common to many marine sponges and evidence for vertical transmission via sponge larvae. Appl Environ Microbiol 72:3724-32.
160. Lee OO, Chui PY, Wong YH, Pawlik JR, and Qian PY (2009) Evidence for vertical transmission of bacterial symbionts from adult to embryo in the Caribbean sponge Svenzea zeai. Appl Environ Microbiol 75:6147-56.
161. Schmitt S, Weisz JB, Lindquist N, and Hentschel U (2007) Vertical transmission of a phylogenetically complex microbial consortium in the viviparous sponge Ircinia felix. Appl Environ Microbiol 73:2067-78.
162. Schmitt S, Angermeier H, Schiller R, Lindquist N, and Hentschel U (2008) Molecular microbial diversity survey of sponge reproductive stages and mechanistic insights into vertical transmission of microbial symbionts. Appl Environ Microbiol 74:7694-708.
163. Sharp KH, Eam B, Faulkner DJ, and Haygood MG (2007) Vertical transmission of diverse microbes in the tropical sponge Corticium sp. Appl Environ Microbiol 73:622-9.
164. Steger D et al. (2008) Diversity and mode of transmission of ammonia-oxidizing archaea in marine sponges. Environ Microbiol 10:1087-94.
165. Mukherjee J, Webster N, and Llewellyn LE (2009) Purification and characterization of a collagenolytic enzyme from a pathogen of the Great Barrier Reef sponge, Rhopaloeides odorabile. PLoS One 4:e7177.
166. Usher KM, Sutton DC, Toze S, Kuo J, and Fromont J (2005) Inter-generational transmission of microbial symbionts in the marine sponge Chondrilla australiensis (Demospongiae). Marine and Freshwater Research 56:125-31.
167. Webster NS et al. (2010) Deep sequencing reveals exceptional diversity and modes of transmission for bacterial sponge symbionts. Environ Microbiol 12:2070-82.
168. Rot C, Goldfarb I, Ilan M, and Huchon D (2006) Putative cross-kingdom horizontal gene transfer in sponge (Porifera) mitochondria. BMC Evol Biol 6:71.
169. Arillo A, Bavestrello G, Burlando B, and Sarà M (1993) Metabolic integration between symbiotic cyanobacteria and sponges: a possible mechanism. Marine Biology 117:159-62.
170. Hoffmann F et al. (2005) An anaerobic world in sponges. Geomicrobiology Journal 22:1-10.
171. Kahlert M, and Neumann D (1997) Early development of freshwater sponges under the influence of nitrite and pH. Archiv für Hydrobiologie 139:69-81.
172. Hentschel U et al. (2001) Isolation and phylogenetic analysis of bacteria with antimicrobial activities from the Mediterranean sponges Aplysina aerophoba and Aplysina cavernicola. FEMS Microbiol Ecol 35:305-12.

173. Schmidt EW, Obraztsova AY, Davidson SK, Faulkner DJ, and Haygood MG (2000) Identification of the antifungal peptide-containing symbiont of the marine sponge Theonella swinhoei as a novel δ-proteobacterium, “Candidatus Entotheonella palauensis”. Marine Biology 136:969-77.
174. Unson MD, Holland ND, and Faulkner DJ (1994) A brominated secondary metabolite synthesized by the cyanobacterial symbiont of a marine sponge and accumulation of the crystalline metabolite in the sponge tissue. Marine Biology 119:1-11.
175. Wilkinson CR (1978) Microbial associations in sponges. II. Numerical analysis of sponge and water bacterial populations. Marine Biology 49:169-76.
176. Müller WE et al. (2004) Oxygen-controlled bacterial growth in the sponge Suberites domuncula: toward a molecular understanding of the symbiotic relationships between sponge and bacteria. Appl Environ Microbiol 70:2332-41.
177. Sara M (1971) Ultrastructural aspects of the symbiosis between two species of the genus Aphanocapsa (Cyanophyceae) and Ircinia variabilis (Demospongiae). Marine Biology 11:214-21.
178. Weisz J, Hentschel U, Lindquist N, and Martens C (2007) Linking abundance and diversity of sponge-associated microbial communities to metabolic differences in host sponges. Marine Biology 152:475-83.
179. Wilkinson CR (1979) Nutrient translocation from symbiotic cyanobacteria to coral reef sponges. Coll. Int. CNRS 291:373-80.
180. Wilkinson CR, and Fay P (1979) Nitrogen fixation in coral reef sponges with symbiotic cyanobacteria. Nature 279:527-9.
181. Wilkinson CR, and Vacelet J (1979) Transplantation of marine sponges to different conditions of light and current. J Exp Mar Bio Ecol 37:91-104.
182. Wilkinson CR, and Cheshire AC (1988) Growth rate of Jamaican coral reef sponges after Hurricane Allen. Biol Bull 175:175-9.
183. Chelossi E, Milanese M, Milano A, Pronzato R, and Riccardi G (2004) Characterisation and antimicrobial activity of epibiotic bacteria from Petrosia ficiformis (Porifera, Demospongiae). J Exp Mar Bio Ecol 309:21-33.
184. Lee OO, and Qian PY (2004) Potential control of bacterial epibiosis on the surface of the sponge Mycale adhaerens. Aquatic Microbial Ecology 34:11-21.
185. Thakur NL et al. (2003) Antibacterial activity of the sponge Suberites domuncula and its primmorphs: potential basis for epibacterial chemical defense. Aquatic Microbial Ecology 31:77-83.
186. Thiel V, and Imhoff JF (2003) Phylogenetic identification of bacteria with antimicrobial activities isolated from Mediterranean sponges. Biomol Eng 20:421-3.

187. Newman DJ, and Hill RT (2006) New drugs from marine microbes: the tide is turning. J Ind Microbiol Biotechnol 33:539-44.
188. Wilkinson CR, Garrone R, and Vacelet J (1984) Marine sponges discriminate between food bacteria and bacterial symbionts: electron microscope radioautography and in situ evidence. Proc Biol Sci 220:519.
189. Wehrl M, Steinert M, and Hentschel U (2007) Bacterial uptake by the marine sponge Aplysina aerophoba. Microb Ecol 53:355-65.
190. Thomas T et al. (2010) Functional genomic signatures of sponge bacteria reveal unique and shared features of symbiosis. ISME J 4:1557-67.
191. Siegl A et al. (2011) Single-cell genomics reveals the lifestyle of Poribacteria, a candidate phylum symbiotically associated with marine sponges. ISME J 5:61-70.
192. Webster NS, and Taylor MW (2011) Marine sponges and their microbial symbionts: love and other relationships. Environ Microbiol 14:335-46.
193. Tschöp MH, Hugenholtz P, and Karp CL (2009) Getting to the core of the gut microbiome. Nat Biotechnol 27:344-6.
194. Turnbaugh PJ et al. (2009) A core gut microbiome in obese and lean twins. Nature 457:480-4.
195. Griggs DJ, and Noguer M (2002) Climate change 2001: The scientific basis. Contribution of Working Group I to the Third Assessment Report of the Intergovernmental Panel on Climate Change. Weather 57:267-9.
196. Déqué M (2007) Frequency of precipitation and temperature extremes over France in an anthropogenic scenario: Model results and statistical correction according to observed values. Global and Planetary Change 57:16-26.
197. Somot S, Sevault F, Déqué M, and Crépon M (2008) 21st century climate change scenario for the Mediterranean using a coupled atmosphere–ocean regional climate model. Global and Planetary Change 63:112-26.
198. Fischer E, and Schär C (2009) Future changes in daily summer temperature variability: driving processes and role for temperature extremes. Climate Dynamics 33:917-35.
199. Levitus S, Antonov J, and Boyer T (2005) Warming of the world ocean, 1955-2003. Geophysical Research Letters 32:L02604.
200. Hoegh-Guldberg O et al. (2007) Coral reefs under rapid climate change and ocean acidification. Science 318:1737-42.
201. Jackson JB et al. (2001) Historical overfishing and the recent collapse of coastal ecosystems. Science 293:629-37.
202. Myers RA, and Worm B (2003) Rapid worldwide depletion of predatory fish communities. Nature 423:280-3.

203. Bengtson JL et al. (1991) Antibodies to canine distemper virus in Antarctic seals. Marine Mammal Science 7:85-7.
204. Lafferty KD, Porter JW, and Ford SE (2004) Are diseases increasing in the ocean? Annu Rev Ecol Evol Syst 35:31-54.
205. Plowright RK et al. (2008) Causal inference in disease ecology: investigating ecological drivers of disease emergence. Frontiers in Ecology and the Environment 6:420-9.
206. Ward JR, and Lafferty KD (2004) The elusive baseline of marine disease: are diseases in ocean ecosystems increasing? PLoS Biol 2:E120.
207. Rosenberg E, Kushmaro A, Kramarsky-Winter E, Banin E, and Loya Y (2009) The role of microorganisms in coral bleaching. ISME J 3:139-46.
208. Webster NS (2007) Sponge disease: a global threat? Environ Microbiol 9:1363-75.
209. Hoegh-Guldberg O (1999) Climate change, coral bleaching and the future of the world's coral reefs. Marine and Freshwater Research 50:839-66.
210. Porter JW et al. (2001) Patterns of spread of coral disease in the Florida Keys. Hydrobiologia 460:1-24.
211. Brandt M (2009) The effect of species and colony size on the bleaching response of reef-building corals in the Florida Keys during the 2005 mass bleaching event. Coral Reefs 28:911-24.
212. Hoegh-Guldberg O (2004) Coral reefs in a century of rapid environmental change. Symbiosis 37:1-31.
213. Loya Y et al. (2001) Coral bleaching: the winners and the losers. Ecology Letters 4:122-31.
214. Carpenter KE et al. (2008) One-third of reef-building corals face elevated extinction risk from climate change and local impacts. Science 321:560-3.
215. Cebrian E, Uriz MJ, Garrabou J, and Ballesteros E (2011) Sponge mass mortalities in a warming Mediterranean Sea: are cyanobacteria-harboring species worse off? PLoS One 6:e20211.
216. Maldonado M, Sánchez-Tocino L, and Navarro C (2010) Recurrent disease outbreaks in corneous demosponges of the genus Ircinia: epidemic incidence and defense mechanisms. Marine Biology 157:1577-90.
217. Aronson RB, Precht WF, and Macintyre IG (1998) Extrinsic control of species replacement on a Holocene reef in Belize: the role of coral disease. Coral Reefs 17:223-30.
218. Gladfelter WB (1982) White-band disease in Acropora palmata: implications for the structure and growth of shallow reefs. Bulletin of Marine Science 32:639-43.

219. Kim K, and Harvell CD (2004) The rise and fall of a six-year coral-fungal epizootic. Am Nat 164 Suppl 5:S52-63.
220. Hodgson G (1997) Assessing coral reef health. Science 277:165.
221. Remily ER, and Richardson LL (2006) Ecological physiology of a coral pathogen and the coral reef environment. Microb Ecol 51:345-52.
222. Jones GP, McCormick MI, Srinivasan M, and Eagle JV (2004) Coral decline threatens fish biodiversity in marine reserves. Proc Natl Acad Sci U S A 101:8251-3.
223. Ainsworth TD, Kramarsky-Winter E, Loya Y, Hoegh-Guldberg O, and Fine M (2007) Coral disease diagnostics: what's between a plague and a band? Appl Environ Microbiol 73:981-92.
224. Knight P-A, and Fell PE (1987) Low salinity induces reversible tissue regression in the estuarine sponge Microciona prolifera (Ellis & Solander). J Exp Mar Bio Ecol 107:253-61.
225. Luter H, Whalan S, and Webster N (2011) The marine sponge Ianthella basta can recover from stress-induced tissue regression. Hydrobiologia 687:1-9.
226. Work TM, and Aeby GS (2006) Systematically describing gross lesions in corals. Dis Aquat Organ 70:155.
227. López-Legentil S, Erwin PM, Pawlik JR, and Song B (2010) Effects of sponge bleaching on ammonia-oxidizing Archaea: distribution and relative expression of ammonia monooxygenase genes associated with the barrel sponge Xestospongia muta. Microb Ecol 60:561-71.
228. Webster NS, Webb RI, Ridd MJ, Hill RT, and Negri AP (2001) The effects of copper on the microbial community of a coral reef sponge. Environ Microbiol 3:19-31.
229. Webster NS, Cobb RE, and Negri AP (2008) Temperature thresholds for bacterial symbiosis with a sponge. ISME J 2:830-42.
230. Webster NS, Xavier JR, Freckelton M, Motti CA, and Cobb R (2008) Shifts in microbial and chemical patterns within the marine sponge Aplysina aerophoba during a disease outbreak. Environ Microbiol 10:3366-76.
231. Thoms C, Hentschel U, Schmitt S, and Schupp P (2008) Rapid tissue reduction and recovery in the sponge Aplysinella sp. Marine Biology 156:141-53.
232. Luter HM, Whalan S, and Webster NS (2010) Exploring the role of microorganisms in the disease-like syndrome affecting the sponge Ianthella basta. Appl Environ Microbiol 76:5736-44.
233. Vega Thurber R et al. (2009) Metagenomic analysis of stressed coral holobionts. Environ Microbiol 11:2148-63.
234. Vega Thurber RL et al. (2008) Metagenomic analysis indicates that stressors induce production of herpes-like viruses in the coral Porites compressa. Proc Natl Acad Sci U S A 105:18413-8.

235. Bourne D, Iida Y, Uthicke S, and Smith-Keune C (2008) Changes in coral-associated microbial communities during a bleaching event. ISME J 2:350-63.
236. Pantos O et al. (2003) The bacterial ecology of a plague-like disease affecting the Caribbean coral Montastrea annularis. Environ Microbiol 5:370-82.
237. Jokiel PL, and Brown EK (2004) Global warming, regional trends and inshore environmental conditions influence coral bleaching in Hawaii. Global Change Biology 10:1627-41.
238. Koop K et al. (2001) ENCORE: the effect of nutrient enrichment on coral reefs. Synthesis of results and conclusions. Mar Pollut Bull 42:91-120.
239. Fabricius KE (2005) Effects of terrestrial runoff on the ecology of corals and coral reefs: review and synthesis. Mar Pollut Bull 50:125-46.
240. Voss J, and Richardson L (2006) Nutrient enrichment enhances black band disease progression in corals. Coral Reefs 25:569-76.
241. Kline DI, Kuntz NM, Breitbart M, Knowlton N, and Rohwer F (2006) Role of elevated organic carbon levels and microbial activity in coral mortality. Marine Ecology Progress Series 314:119-25.
242. Kuntz NM, Kline DI, Sandin SA, and Rohwer F (2005) Pathologies and mortality rates caused by organic carbon and nutrient stressors in three Caribbean coral species. Marine Ecology Progress Series 294:173-80.
243. Szmant A (2002) Nutrient enrichment on coral reefs: Is it a major cause of coral reef decline? Estuaries and Coasts 25:743-66.
244. Kleypas JA et al. (1999) Geochemical consequences of increased atmospheric carbon dioxide on coral reefs. Science 284:118-20.
245. Rosenberg E, and Falkovitz L (2004) The Vibrio shiloi/Oculina patagonica model system of coral bleaching. Annu Rev Microbiol 58:143-59.
246. Regoli F, Cerrano C, Chierici E, Chiantore MC, and Bavestrello G (2004) Seasonal variability of prooxidant pressure and antioxidant adaptation to symbiosis in the Mediterranean demosponge Petrosia ficiformis. Marine Ecology Progress Series 275:129-37.
247. Zocchi E et al. (2001) The temperature-signaling cascade in sponges involves a heat-gated cation channel, abscisic acid, and cyclic ADP-ribose. Proc Natl Acad Sci U S A 98:14859-64.
248. Cerrano C et al. (2000) A catastrophic mass-mortality episode of gorgonians and other organisms in the Ligurian Sea (North-western Mediterranean), summer 1999. Ecology Letters 3:284-93.
249. Vicente VP (1989) Regional commercial sponge extinctions in the West Indies: Are recent climatic changes responsible? Marine Ecology 10:179-91.

250. Bruno JF, Petes LE, Drew Harvell C, and Hettinger A (2003) Nutrient enrichment can increase the severity of coral diseases. Ecology Letters 6:1056-61.
251. Selvin J, Shanmugha Priya S, Seghal Kiran G, Thangavelu T, and Sapna Bai N (2009) Sponge-associated marine bacteria as indicators of heavy metal pollution. Microbiol Res 164:352-63.
252. Cebrian E, Agell G, Martí R, and Uriz MJ (2006) Response of the Mediterranean sponge Chondrosia reniformis Nardo to copper pollution. Environ Pollut 141:452-8.
253. Cebrian E, Martí R, Uriz JM, and Turon X (2003) Sublethal effects of contamination on the Mediterranean sponge Crambe crambe: metal accumulation and biological responses. Mar Pollut Bull 46:1273-84.
254. Smith FGW (1939) Sponge mortality at British Honduras. Nature 144:785.
255. Wulff JL (2006) Resistance vs recovery: morphological strategies of coral reef sponges. Functional Ecology 20:699-708.
256. Knowlton AL, and Highsmith RC (2005) Nudibranch-sponge feeding dynamics: Benefits of symbiont-containing sponge to Archidoris montereyensis (Cooper, 1862) and recovery of nudibranch feeding scars by Halichondria panicea (Pallas, 1766). J Exp Mar Bio Ecol 327:36-46.
257. Kaczmarsky LT, Draud M, and Williams EH (2005) Is there a relationship between proximity to sewage effluent and the prevalence of coral disease? Caribbean Journal of Science 41:124-37.
258. Denner EB et al. (2003) Aurantimonas coralicida gen. nov., sp. nov., the causative agent of white plague type II on Caribbean scleractinian corals. Int J Syst Evol Microbiol 53:1115-22.
259. Cervino JM, Winiarski-Cervino K, Polson SW, Goreau T, and Smith GW (2006) Identification of bacteria associated with a disease affecting the marine sponge Ianthella basta in New Britain, Papua New Guinea. Marine Ecology Progress Series 324:139-50.
260. Kushmaro A, Loya Y, Fine M, and Rosenberg E (1996) Bacterial infection and coral bleaching. Nature 380:396.
261. Kushmaro A et al. (1997) Bleaching of the coral Oculina patagonica by Vibrio AK-1. Marine Ecology Progress Series 147:159-65.
262. Ben-Haim Y, and Rosenberg E (2002) A novel Vibrio sp. pathogen of the coral Pocillopora damicornis. Marine Biology 141:47-55.
263. Ben-Haim Y, Zicherman-Keren M, and Rosenberg E (2003) Temperature-regulated bleaching and lysis of the coral Pocillopora damicornis by the novel pathogen Vibrio coralliilyticus. Appl Environ Microbiol 69:4236-42.
264. Ritchie KB, and Smith GW (1995) Preferential carbon utilization by surface bacterial communities from water mass, normal, and white-band diseased Acropora cervicornis. Mol Mar Biol Biotechnol 4:345-52.
265. Patterson KL et al. (2002) The etiology of white pox, a lethal disease of the Caribbean elkhorn coral, Acropora palmata. Proc Natl Acad Sci U S A 99:8725-30.
266. Cervino JM et al. (2004) Relationship of Vibrio species infection and elevated temperatures to yellow blotch/band disease in Caribbean corals. Appl Environ Microbiol 70:6855-64.
267. Wilson WH, Dale AL, Davy JE, and Davy SK (2005) An enemy within? Observations of virus-like particles in reef corals. Coral Reefs 24:145-8.
268. Patten N, Harrison P, and Mitchell J (2008) Prevalence of virus-like particles within a staghorn scleractinian coral (Acropora muricata) from the Great Barrier Reef. Coral Reefs 27:569-80.
269. Webster NS, Negri AP, Webb RI, and Hill RT (2002) A spongin-boring alpha-proteobacterium is the etiological agent of disease in the Great Barrier Reef sponge Rhopaloeides odorabile. Marine Ecology Progress Series 232:305-9.
270. Sutherland K et al. (2004) Disease and immunity in Caribbean and Indo-Pacific zooxanthellate corals. Marine Ecology Progress Series 266:273-302.
271. Coma R, Ribes M, Gili JM, and Zabala M (2002) Seasonality of in situ respiration rate in three temperate benthic suspension feeders. Limnology and Oceanography:324-31.
272. Rodolfo-Metalpa R et al. (2006) Response of zooxanthellae in symbiosis with the Mediterranean corals Cladocora caespitosa and Oculina patagonica to elevated temperatures. Marine Biology 150:45-55.
273. Ferrier-Pagès C et al. (2009) Physiological response of the symbiotic gorgonian Eunicella singularis to a long-term temperature increase. J Exp Biol 212:3007-15.
274. Lesser MP (2004) Experimental biology of coral reef ecosystems. J Exp Mar Bio Ecol 300:217-52.
275. Coma R et al. (2009) Global warming-enhanced stratification and mass mortality events in the Mediterranean. Proc Natl Acad Sci U S A 106:6176-81.
276. Warner ME, Fitt WK, and Schmidt GW (1996) The effects of elevated temperature on the photosynthetic efficiency of zooxanthellae in hospite from four different species of reef coral: a novel approach. Plant, Cell & Environment 19:291-9.
277. Warner ME, Fitt WK, and Schmidt GW (1999) Damage to photosystem II in symbiotic dinoflagellates: a determinant of coral bleaching. Proc Natl Acad Sci U S A 96:8007-12.
278. Brown BE (1997) Coral bleaching: causes and consequences. Coral Reefs 16:S129-38.
279. Lesser MP (1997) Oxidative stress causes coral bleaching during exposure to elevated temperatures. Coral Reefs 16:187-92.
280. Mao-Jones J, Ritchie KB, Jones LE, and Ellner SP (2010) How microbial community composition regulates coral disease development. PLoS Biol 8:e1000345.
281. Muller E, Rogers C, Spitzack A, and van Woesik R (2008) Bleaching increases likelihood of disease on Acropora palmata (Lamarck) in Hawksnest Bay, St John, US Virgin Islands. Coral Reefs 27:191-5.
282. Reshef L, Koren O, Loya Y, Zilber-Rosenberg I, and Rosenberg E (2006) The coral probiotic hypothesis. Environ Microbiol 8:2068-73.
283. Russell JA, Latorre A, Sabater-Muñoz B, Moya A, and Moran NA (2003) Side-stepping secondary symbionts: widespread horizontal transfer across and beyond the Aphidoidea. Mol Ecol 12:1061-75.
284. Chia N, and Goldenfeld N (2011) Dynamics of gene duplication and transposons in microbial genomes following a sudden environmental change. Phys Rev E Stat Nonlin Soft Matter Phys 83:021906.
285. Handelsman J (2004) Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev 68:669-85.
286. Fuhrman JA, and Campbell L (1998) Microbial microdiversity. Nature 393:410-1.
287. Pace NR, Stahl DA, Lane DJ, and Olsen GJ (1986) The analysis of natural microbial populations by ribosomal RNA sequences. Advances in Microbial Ecology 9:1-55.
288. Woese CR (1987) Bacterial evolution. Microbiol Rev 51:221-71.
289. Cole JR et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37:D141-5.
290. DeSantis TZ et al. (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72:5069-72.
291. Pruesse E et al. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188-96.
292. Pace NR, Stahl DA, Lane DJ, and Olsen GJ (1985) Analyzing natural microbial populations by rRNA sequences. ASM American Society for Microbiology News 51:4-12.
293. Rappé MS, and Giovannoni SJ (2003) The uncultured microbial majority. Annu Rev Microbiol 57:369-94.
294. Wu D et al. (2009) A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature 462:1056-60.
295. Ley RE et al. (2008) Evolution of mammals and their gut microbes. Science 320:1647-51.
296. Muyzer G, and Smalla K (1998) Application of denaturing gradient gel electrophoresis (DGGE) and temperature gradient gel electrophoresis (TGGE) in microbial ecology. Antonie Van Leeuwenhoek 73:127-41.
297. Schütte UM et al. (2008) Advances in the use of terminal restriction fragment length polymorphism (T-RFLP) analysis of 16S rRNA genes to characterize microbial communities. Appl Microbiol Biotechnol 80:365-80.
298. Amann R, Fuchs BM, and Behrens S (2001) The identification of microorganisms by fluorescence in situ hybridisation. Curr Opin Biotechnol 12:231-6.
299. Wagner M, Horn M, and Daims H (2003) Fluorescence in situ hybridisation for the identification and characterisation of prokaryotes. Curr Opin Microbiol 6:302-9.
300. Amann R, Snaidr J, Wagner M, Ludwig W, and Schleifer KH (1996) In situ visualization of high genetic diversity in a natural microbial community. J Bacteriol 178:3496-500.
301. Schloss PD, Gevers D, and Westcott SL (2011) Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS One 6:e27310.
302. Rainey FA, Ward-Rainey NL, Janssen PH, Hippe H, and Stackebrandt E (1996) Clostridium paradoxum DSM 7308T contains multiple 16S rRNA genes with heterogeneous intervening sequences. Microbiology 142 (Pt 8):2087-95.
303. Mylvaganam S, and Dennis PP (1992) Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics 130:399-410.
304. Carrigg C, Rice O, Kavanagh S, Collins G, and O'Flaherty V (2007) DNA extraction method affects microbial community profiles from soils and sediment. Appl Microbiol Biotechnol 77:955-64.
305. Yang ZhH, Xiao Y, Zeng GM, Xu ZhY, and Liu YSh (2007) Comparison of methods for total community DNA extraction and purification from compost. Appl Microbiol Biotechnol 74:918-25.
306. Temperton B et al. (2009) Bias in assessments of marine microbial biodiversity in fosmid libraries as evaluated by pyrosequencing. ISME J 3:792-6.
307. Hong S, Bunge J, Leslin C, Jeon S, and Epstein SS (2009) Polymerase chain reaction primers miss half of rRNA microbial diversity. ISME J 3:1365-73.
308. Caron DA, Countway PD, and Brown MV (2004) The growing contributions of molecular biology and immunology to protistan ecology: molecular signatures as ecological tools. J Eukaryot Microbiol 51:38-48.
309. Haas BJ et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21:494-504.
310. Huse SM, Huber JA, Morrison HG, Sogin ML, and Welch DM (2007) Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 8:R143.
311. Wu JY et al. (2010) Effects of polymerase, template dilution and cycle number on PCR based 16 S rRNA diversity analysis using the deep sequencing method. BMC Microbiol 10:255.
312. Margulies M et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437:376-80.
313. Kozarewa I et al. (2009) Amplification-free Illumina sequencing-library preparation facilitates improved mapping and assembly of (G+C)-biased genomes. Nat Methods 6:291-5.
314. Huse SM, Welch DM, Morrison HG, and Sogin ML (2010) Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12:1889-98.
315. Kunin V, Engelbrektson A, Ochman H, and Hugenholtz P (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118-23.
316. Quince C, Lanzen A, Davenport RJ, and Turnbaugh PJ (2011) Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38.
317. Tettelin H et al. (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome". Proc Natl Acad Sci U S A 102:13950-5.
318. Medini D, Donati C, Tettelin H, Masignani V, and Rappuoli R (2005) The microbial pan-genome. Curr Opin Genet Dev 15:589-94.
319. Kettler GC et al. (2007) Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet 3:e231.
320. Lapierre P, and Gogarten JP (2009) Estimating the size of the bacterial pan-genome. Trends Genet 25:107-10.
321. Wong K, and Xagoraraki I (2010) Quantitative PCR assays to survey the bovine adenovirus levels in environmental samples. J Appl Microbiol 109:605-12.
322. Chistoserdova L (2010) Recent progress and new challenges in metagenomics for biotechnology. Biotechnol Lett 32:1351-9.
323. Venter JC et al. (2004) Environmental genome shotgun sequencing of the Sargasso Sea. Science 304:66-74.
324. Rusch DB et al. (2007) The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 5:e77.
325. Culley AI, Lang AS, and Suttle CA (2006) Metagenomic analysis of coastal RNA virus communities. Science 312:1795-8.
326. Tringe SG et al. (2005) Comparative metagenomics of microbial communities. Science 308:554-7.
327. Gill SR et al. (2006) Metagenomic analysis of the human distal gut microbiome. Science 312:1355-9.
328. Warnecke F et al. (2007) Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature 450:560-5.
329. Frias-Lopez J et al. (2008) Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci U S A 105:3805-10.
330. Morales SE, and Holben WE (2011) Linking bacterial identities and ecosystem processes: can 'omic' analyses be more than the sum of their parts? FEMS Microbiol Ecol 75:2-16.
331. Metzker ML (2010) Sequencing technologies - the next generation. Nat Rev Genet 11:31-46.
332. Aebersold R, and Mann M (2003) Mass spectrometry-based proteomics. Nature 422:198-207.
333. Williams R et al. (2006) Amplification of complex gene libraries by emulsion PCR. Nat Methods 3:545-50.
334. Yilmaz S, Allgaier M, and Hugenholtz P (2010) Multiple displacement amplification compromises quantitative analysis of metagenomes. Nat Methods 7:943-4.
335. Kunin V, Copeland A, Lapidus A, Mavromatis K, and Hugenholtz P (2008) A bioinformatician's guide to metagenomics. Microbiol Mol Biol Rev 72:557-78.
336. Chen K, and Pachter L (2005) Bioinformatics for whole-genome shotgun sequencing of microbial communities. PLoS Comput Biol 1:106-12.
337. Cowan D et al. (2005) Metagenomic gene discovery: past, present and future. Trends Biotechnol 23:321-9.
338. Rodriguez-Brito B, Rohwer F, and Edwards RA (2006) An application of statistics to comparative metagenomics. BMC Bioinformatics 7:162.
339. Raes J, Foerstner KU, and Bork P (2007) Get the most out of your metagenome: Computational analysis of environmental sequence data. Curr Opin Microbiol 10:490-8.
340. Ye Y, and Wooley JC (2009) Metagenomics: Facts and artifacts, and computational challenges. Journal of Computer Science and Technology 25:71-81.
341. Wooley JC, Godzik A, and Friedberg I (2010) A primer on metagenomics. PLoS Comput Biol 6:e1000667.
342. Thomas T et al. (2012) Metagenomics - a guide from sampling to data analysis. Microbial Informatics and Experimentation 2:3.
343. Hugenholtz P, and Tyson GW (2008) Microbiology: Metagenomics. Nature 455:481-3.
344. Mou X, Sun S, Edwards RA, Hodson RE, and Moran MA (2008) Bacterial carbon processing by generalist species in the coastal ocean. Nature 451:708-11.
345. Walsh DA et al. (2009) Metagenome of a versatile chemolithoautotroph from expanding oceanic dead zones. Science 326:578-82.
346. Reyes A et al. (2010) Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466:334-8.
347. Turnbaugh PJ et al. (2010) Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins. Proc Natl Acad Sci U S A 107:7503-8.
348. Brüls T, and Weissenbach J (2011) The human metagenome: our other genome? Hum Mol Genet 20:R142-8.
349. Dinsdale EA et al. (2008) Functional metagenomic profiling of nine biomes. Nature 452:629-32.
350. Brulc JM et al. (2009) Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proc Natl Acad Sci U S A 106:1948-53.
351. Qi W, Nong G, Preston JF, Ben-Ami F, and Ebert D (2009) Comparative metagenomics of Daphnia symbionts. BMC Genomics 10:172.
352. Wegley L, Edwards R, Rodriguez-Brito B, Liu H, and Rohwer F (2007) Metagenomic analysis of the microbial community associated with the coral Porites astreoides. Environ Microbiol 9:2707-19.
353. DeLong EF et al. (2006) Community genomics among stratified microbial assemblages in the ocean's interior. Science 311:496-503.
354. Konstantinidis KT, Braff J, Karl DM, and DeLong EF (2009) Comparative metagenomic analysis of a microbial community residing at a depth of 4,000 meters at station ALOHA in the North Pacific subtropical gyre. Appl Environ Microbiol 75:5345-55.
355. Sharon I et al. (2011) Comparative metagenomics of microbial traits within oceanic viral communities. ISME J 5:1178-90.
356. Xie W et al. (2011) Comparative metagenomics of microbial communities inhabiting deep-sea hydrothermal vent chimneys with contrasting chemistries. ISME J 5:414-26.
357. Urich T et al. (2008) Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS One 3:e2527.
358. Gilbert JA et al. (2008) Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities. PLoS One 3:e3042.
359. Hewson I et al. (2009) In situ transcriptomic analysis of the globally important keystone N2-fixing taxon Crocosphaera watsonii. ISME J 3:618-31.
360. Klatt CG et al. (2011) Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential. ISME J 5:1262-78.
361. Liu Z et al. (2011) Metatranscriptomic analyses of chlorophototrophs of a hot-spring microbial mat. ISME J 5:1279-90.
362. Poretsky RS et al. (2009) Comparative day/night metatranscriptomic analysis of microbial communities in the North Pacific subtropical gyre. Environ Microbiol 11:1358-75.
363. Shi Y, Tyson GW, and DeLong EF (2009) Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column. Nature 459:266-9.
364. Stewart FJ, Ottesen EA, and DeLong EF (2010) Development and quantitative analyses of a universal rRNA-subtraction protocol for microbial metatranscriptomics. ISME J 4:896-907.
365. Maron PA, Ranjard L, Mougel C, and Lemanceau P (2007) Metaproteomics: a new approach for studying functional microbial ecology. Microb Ecol 53:486-93.
366. Schneider T, and Riedel K (2010) Environmental proteomics: analysis of structure and function of microbial communities. Proteomics 10:785-98.
367. Markert S et al. (2007) Physiological proteomics of the uncultured endosymbiont of Riftia pachyptila. Science 315:247-50.
368. Markert S et al. (2011) Status quo in physiological proteomics of the uncultured Riftia pachyptila endosymbiont. Proteomics 11:3106-17.
369. Burnum KE et al. (2011) Proteome insights into the symbiotic relationship between a captive colony of Nasutitermes corniger and its hindgut microbiome. ISME J 5:161-4.
370. Klaassens ES, de Vos WM, and Vaughan EE (2007) Metaproteomics approach to study the functionality of the microbiota in the human infant gastrointestinal tract. Appl Environ Microbiol 73:1388-92.
371. Verberkmoes NC et al. (2009) Shotgun metaproteomics of the human distal gut microbiota. ISME J 3:179-89.
372. Rudney JD, Xie H, Rhodus NL, Ondrey FG, and Griffin TJ (2010) A metaproteomic analysis of the human salivary microbiota by three-dimensional peptide fractionation and tandem mass spectrometry. Mol Oral Microbiol 25:38-49.
373. Abbai NS, Govender A, Shaik R, and Pillay B (2012) Pyrosequence analysis of unamplified and whole genome amplified DNA from hydrocarbon-contaminated groundwater. Mol Biotechnol 50:39-48.
374. Prosser JI (2010) Replicate or lie. Environ Microbiol 12:1806-10.
375. Palenik B, Ren Q, Tai V, and Paulsen IT (2009) Coastal Synechococcus metagenome reveals major roles for horizontal gene transfer and plasmids in population diversity. Environ Microbiol 11:349-59.
376. Stein JL, Marsh TL, Wu KY, Shizuya H, and DeLong EF (1996) Characterization of uncultivated prokaryotes: isolation and analysis of a 40-kilobase-pair genome fragment from a planktonic marine archaeon. J Bacteriol 178:591-9.
377. Béjà O et al. (2000) Construction and analysis of bacterial artificial chromosome libraries from a marine microbial assemblage. Environ Microbiol 2:516-29.
378. Rondon MR et al. (2000) Cloning the soil metagenome: a strategy for accessing the genetic and functional diversity of uncultured microorganisms. Appl Environ Microbiol 66:2541-7.
379. Banik JJ, and Brady SF (2010) Recent application of metagenomic approaches toward the discovery of antimicrobials and other bioactive small molecules. Curr Opin Microbiol 13:603-9.
380. Simon C, and Daniel R (2009) Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol 85:265-76.
381. Teeling H, Waldmann J, Lombardot T, Bauer M, and Glöckner FO (2004) TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics 5:163.
382. McHardy AC, Martín HG, Tsirigos A, Hugenholtz P, and Rigoutsos I (2007) Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods 4:63-72.
383. Brady A, and Salzberg SL (2009) Phymm and PhymmBL: Metagenomic phylogenetic classification with interpolated Markov models. Nat Methods 6:673-6.
384. Parks DH, Macdonald NJ, and Beiko RG (2011) Classifying short genomic fragments from novel lineages using composition and homology. BMC Bioinformatics 12:328.
385. Saeed I, Tang S-L, and Halgamuge SK (2011) Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition. Nucleic Acids Res 40:e34.
386. Huson DH, Auch AF, Qi J, and Schuster SC (2007) MEGAN analysis of metagenomic data. Genome Res 17:377-86.
387. Vacelet J (1975) Étude en microscopie électronique de l'association entre bactéries et spongiaires du genre Verongia (Dictyoceratida). J Microsc Biol Cell 23:271-88.
388. Vacelet J, and Gallissian M-F (1978) Virus-like particles in cells of the sponge Verongia cavernicola (Demospongiae, Dictyoceratida) and accompanying tissues changes. J Invertebr Pathol 31:246-54.
389. Wilkinson CR (1979) Bdellovibrio-like parasite of cyanobacteria symbiotic in marine sponges. Arch Microbiol 123:101-3.
390. Olson JB, Lord CC, and McCarthy PJ (2000) Improved recoverability of microbial colonies from marine sponge samples. Microb Ecol 40:139-47.
391. Webster NS, and Hill RT (2001) The culturable microbial community of the Great Barrier Reef sponge Rhopaloeides odorabile is dominated by an α-Proteobacterium. Marine Biology 138:843-51.
392. Olson JB, and McCarthy PJ (2005) Associated bacterial communities of two deep-water sponges. Aquatic Microbial Ecology 39:47-55.
393. Grozdanov L, and Hentschel U (2007) An environmental genomics perspective on the diversity and function of marine sponge-associated microbiota. Curr Opin Microbiol 10:215-20.
394. Taylor MW, Hill RT, Piel J, Thacker RW, and Hentschel U (2007) Soaking it up: the complex lives of marine sponges and their microbial associates. ISME J 1:187-90.
395. Vogel G (2008) The inner lives of sponges. Science 320:1028-30.
396. Schläppy ML et al. (2010) Evidence of nitrification and denitrification in high and low microbial abundance sponges. Marine Biology 157:593-602.
397. Southwell MW, Popp BN, and Martens CS (2008) Nitrification controls on fluxes and isotopic composition of nitrate from Florida Keys sponges. Marine Chemistry 108:96-108.
398. Hoffmann F et al. (2009) Complex nitrogen cycling in the sponge Geodia barretti. Environ Microbiol 11:2228-43.
399. Hallam SJ et al. (2006) Pathways of carbon assimilation and ammonia oxidation suggested by environmental genomic analyses of marine Crenarchaeota. PLoS Biol 4:e95.
400. Hallam SJ et al. (2006) Genomic analysis of the uncultivated marine crenarchaeote Cenarchaeum symbiosum. Proc Natl Acad Sci U S A 103:18296-301.
401. Siegl A, and Hentschel U (2010) PKS and NRPS gene clusters from microbial symbiont cells of marine sponges by whole genome amplification. Environmental Microbiology Reports 2:507-13.
402. Srivastava M et al. (2010) The Amphimedon queenslandica genome and the evolution of animal complexity. Nature 466:720-6.
403. Radax R et al. (2012) Metatranscriptomics of the marine sponge Geodia barretti: tackling phylogeny and function of its microbial community. Environ Microbiol (In Press).
404. Pace NR (1997) A molecular view of microbial diversity and the biosphere. Science 276:734-40.
405. Tringe SG, and Hugenholtz P (2008) A renaissance for the pioneering 16S rRNA gene. Curr Opin Microbiol 11:442-6.
406. Stark M, Berger SA, Stamatakis A, and von Mering C (2010) MLTreeMap--accurate Maximum Likelihood placement of environmental DNA sequences into taxonomic and functional reference phylogenies. BMC Genomics 11:461.
407. Wu M, and Eisen JA (2008) A simple, fast, and accurate method of phylogenomic inference. Genome Biol 9:R151.
408. Liu B, Gibbons T, Ghodsi M, Treangen T, and Pop M (2011) Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences. BMC Genomics 12 Suppl 2:S4.
409. Sharpton TJ et al. (2011) PhylOTU: A high-throughput procedure quantifies microbial community diversity and resolves novel taxa from metagenomic data. PLoS Comput Biol 7:e1001061.
410. Miller CS, Baker BJ, Thomas BC, Singer SW, and Banfield JF (2011) EMIRGE: Reconstruction of full length ribosomal genes from microbial community short read sequencing data. Genome Biol 12:R44.
411. Schloss PD, and Handelsman J (2005) Metagenomics for studying unculturable microorganisms: cutting the Gordian knot. Genome Biol 6:229.
412. Mavromatis K et al. (2007) Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods 4:495-500.
413. McElroy KE, Luciani F, and Thomas T (2012) GemSIM: General, Error-Model based SIMulator of next-generation sequencing data. BMC Genomics 13:74.
414. Schmieder R, and Edwards R (2011) Quality control and preprocessing of metagenomic datasets. Bioinformatics 27:863-4.
415. Bengtsson J et al. (2011) Metaxa: A software tool for automated detection and discrimination among ribosomal small subunit (12S/16S/18S) sequences of archaea, bacteria, eukaryotes, mitochondria, and chloroplasts in metagenomes and environmental sequencing datasets. Antonie Van Leeuwenhoek 100:471-5.
416. Dowd SE et al. (2008) Evaluation of the bacterial diversity in the feces of cattle using 16S rDNA bacterial tag-encoded FLX amplicon pyrosequencing (bTEFAP). BMC Microbiol 8:125.
417. Caporaso JG et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7:335-6.
418. Wang Q, Garrity GM, Tiedje JM, and Cole JR (2007) Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73:5261-7.
419. McDonald D et al. (2011) An improved Greengenes taxonomy with explicit ranks for ecological and evolutionary analyses of bacteria and archaea. ISME J 6:610-8.
420. Stamatakis A (2006) RAxML-VI-HPC: Maximum Likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22:2688-90.
421. Talavera G, and Castresana J (2007) Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56:564-77.
422. Edgar RC, Haas BJ, Clemente JC, Quince C, and Knight R (2011) UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27:2194-200.
423. Yung PY et al. (2009) Phylogenetic screening of a bacterial, metagenomic library using homing endonuclease restriction and marker insertion. Nucleic Acids Res 37:e144.
424. Simon C, and Daniel R (2011) Metagenomic analyses: Past and future trends. Appl Environ Microbiol 77:1153-61.
425. Peterson J et al. (2009) The NIH Human Microbiome Project. Genome Res 19:2317-23.
426. Siqueira JF, Fouad AF, and Rôças IN (2012) Pyrosequencing as a tool for better understanding of human microbiomes. J Oral Microbiol 4:10743.
427. Serbus LR, Casper-Lindley C, Landmann F, and Sullivan W (2008) The genetics and cell biology of Wolbachia-host interactions. Annu Rev Genet 42:683-707.
428. McFall-Ngai M (2008) Host-microbe symbiosis: the squid-Vibrio association--a naturally occurring, experimental model of animal/bacterial partnerships. Adv Exp Med Biol 635:102-12.
429. Hongoh Y (2011) Toward the functional analysis of uncultivable, symbiotic microorganisms in the termite gut. Cell Mol Life Sci 68:1311-25.
430. Marchesi JR (2010) Prokaryotic and eukaryotic diversity of the human gut. Adv Appl Microbiol 72:43-62.
431. Burke C, Thomas T, Lewis M, Steinberg P, and Kjelleberg S (2011) Composition, uniqueness and variability of the epiphytic bacterial community of the green alga Ulva australis. ISME J 5:590-600.
432. López-Sánchez MJ et al. (2009) Evolutionary convergence and nitrogen metabolism in Blattabacterium strain Bge, primary endosymbiont of the cockroach Blattella germanica. PLoS Genet 5:e1000721.
433. Moran NA, and Baumann P (2000) Bacterial endosymbionts in animals. Curr Opin Microbiol 3:270-5.
434. McFall-Ngai MJ (2002) Unseen forces: the influence of bacteria on animal development. Dev Biol 242:1-14.
435. Niu B, Fu L, Sun S, and Li W (2010) Artificial and natural duplicates in pyrosequencing reads of metagenomic data. BMC Bioinformatics 11:187.
436. Castresana J (2000) Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol 17:540-52.
437. Lozupone C, Lladser ME, Knights D, Stombaugh J, and Knight R (2010) UniFrac: An effective distance metric for microbial community comparison. ISME J 5:169-72.
438. Noguchi H, Taniguchi T, and Itoh T (2008) MetaGeneAnnotator: Detecting species-specific patterns of ribosomal binding site for precise gene prediction in anonymous prokaryotic and phage genomes. DNA Res 15:387-96.
439. Tatusov RL et al. (2003) The COG database: An updated version includes eukaryotes. BMC Bioinformatics 4:41.
440. Finn RD et al. (2010) The Pfam protein families database. Nucleic Acids Res 38:D211-22.
441. Finn RD, Clements J, and Eddy SR (2011) HMMER web server: Interactive sequence similarity searching. Nucleic Acids Res 39:W29-37.
442. Overbeek R et al. (2005) The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res 33:5691-702.
443. Meyer F et al. (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9:386.
444. Beszteri B, Temperton B, Frickenhaus S, and Giovannoni SJ (2010) Average genome size: A potential source of bias in comparative metagenomics. ISME J 4:1075-7.
445. Raes J, Korbel JO, Lercher MJ, von Mering C, and Bork P (2007) Prediction of effective genome size in metagenomic samples. Genome Biol 8:R10.
446. Angly FE et al. (2009) The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput Biol 5:e1000593.
447. Frank JA, and Sørensen SJ (2011) Quantitative metagenomic analyses based on average genome size normalization. Appl Environ Microbiol 77:2513-21.
448. Ciccarelli FD et al. (2006) Toward automatic reconstruction of a highly resolved tree of life. Science 311:1283-7.
449. de Hoon MJ, Imoto S, Nolan J, and Miyano S (2004) Open source clustering software. Bioinformatics 20:1453-4.
450. Saldanha AJ (2004) Java Treeview--extensible visualization of microarray data. Bioinformatics 20:3246-8.
451. Jehl MA, Arnold R, and Rattei T (2011) Effective--a database of predicted secreted bacterial proteins. Nucleic Acids Res 39:D591-5.
452. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215:403-10.
453. Larkin MA et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23:2947-8.
454. Wang G, Asakawa S, and Kimura M (2011) Spatial and temporal changes of cyanophage communities in paddy field soils as revealed by the capsid assembly protein gene g20. FEMS Microbiol Ecol 76:352-9.
455. Li W, and Godzik A (2006) Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22:1658-9.
456. Frickey T, and Lupas A (2004) CLANS: A Java application for visualizing protein families based on pairwise similarity. Bioinformatics 20:3702-4.
457. Edgar RC (2004) MUSCLE: Multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32:1792-7.
458. Price MN, Dehal PS, and Arkin AP (2009) FastTree: Computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol 26:1641-50.
459. Grissa I, Vergnaud G, and Pourcel C (2007) CRISPRFinder: A web tool to identify clustered regularly interspaced short palindromic repeats. Nucleic Acids Res 35:W52-7.
460. Seshadri R, Kravitz SA, Smarr L, Gilna P, and Frazier M (2007) CAMERA: A community resource for metagenomics. PLoS Biol 5:e75.
461. Shannon P et al. (2003) Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res 13:2498-504.
462. Haft DH, Selengut J, Mongodin EF, and Nelson KE (2005) A guild of 45 CRISPR-associated (Cas) protein families and multiple CRISPR/Cas subtypes exist in prokaryotic genomes. PLoS Comput Biol 1:e60.
463. Gazave E et al. (2010) Polyphyly of the genus Axinella and of the family Axinellidae (Porifera: Demospongiae). Mol Phylogenet Evol 57:35-47.
464. Brochier-Armanet C, Boussau B, Gribaldo S, and Forterre P (2008) Mesophilic Crenarchaeota: Proposal for a third archaeal phylum, the Thaumarchaeota. Nat Rev Microbiol 6:245-52.
465. Holmes B, and Blanch H (2007) Genus-specific associations of marine sponges with group I crenarchaeotes. Marine Biology 150:759-72.
466. Walker CB et al. (2010) Nitrosopumilus maritimus genome reveals unique mechanisms for nitrification and autotrophy in globally distributed marine crenarchaea. Proc Natl Acad Sci U S A 107:8818-23.
467. Blainey PC, Mosier AC, Potanina A, Francis CA, and Quake SR (2011) Genome of a low-salinity ammonia-oxidizing archaeon determined by single-cell and metagenomic analysis. PLoS One 6:e16626.
468. Webster NS, Watts JE, and Hill RT (2001) Detection and phylogenetic analysis of novel crenarchaeote and euryarchaeote 16S ribosomal RNA gene sequences from a Great Barrier Reef sponge. Mar Biotechnol (NY) 3:600-8.
469. Turque AS et al. (2010) Environmental shaping of sponge associated archaeal communities. PLoS One 5:e15774.
470. Pei AY et al. (2010) Diversity of 16S rRNA genes within individual prokaryotic genomes. Appl Environ Microbiol 76:3886-97.
471. Maslunka C, Carr E, Gürtler V, Kämpfer P, and Seviour R (2006) Estimation of ribosomal RNA operon (rrn) copy number in Acinetobacter isolates and potential of patterns of rrn operon-containing fragments for typing strains of members of this genus. Syst Appl Microbiol 29:216-28.
472. Falkowski PG (1997) Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean. Nature 387:272-5.
473. Beman JM, Roberts KJ, Wegley L, Rohwer F, and Francis CA (2007) Distribution and diversity of archaeal ammonia monooxygenase genes associated with corals. Appl Environ Microbiol 73:5642-7.
474. Kneip C, Lockhart P, Voss C, and Maier UG (2007) Nitrogen fixation in eukaryotes--new models for symbiosis. BMC Evol Biol 7:55.
475. Fiore CL, Jarett JK, Olson ND, and Lesser MP (2010) Nitrogen fixation and nitrogen transformations in marine symbioses. Trends Microbiol 18:455-63.

476. Bayer K, Schmitt S, and Hentschel U (2008) Physiology, phylogeny and in situ evidence for bacterial and archaeal nitrifiers in the marine sponge Aplysina aerophoba. Environ Microbiol 10:2942-55.
477. Moir JWB, and Wood NJ (2001) Nitrate and nitrite transport in bacteria. Cell Mol Life Sci 58:215-24.
478. Philippot L (2002) Denitrifying genes in bacterial and Archaeal genomes. Biochim Biophys Acta 1577:355-76.
479. Ettwig KF et al. (2010) Nitrite-driven anaerobic methane oxidation by oxygenic bacteria. Nature 464:543-8.
480. Hoffmann F, Rapp HT, Zöller T, and Reitner J (2003) Growth and regeneration in cultivated fragments of the boreal deep water sponge Geodia barretti Bowerbank, 1858 (Geodiidae, Tetractinellida, Demospongiae). J Biotechnol 100:109-18.
481. Hoffmann F, Larsen O, Rapp HT, and Osinga R (2005) Oxygen dynamics in choanosomal sponge explants. Marine Biology Research 1:160-3.
482. Kaneko T et al. (2000) Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti. DNA Res 7:331-8.
483. Labbe N, Parent S, and Villemur R (2004) Nitratireductor aquibiodomus gen. nov., sp. nov., a novel alpha-proteobacterium from the marine denitrification system of the Montreal Biodome (Canada). Int J Syst Evol Microbiol 54:269-73.
484. Kim KH et al. (2009) Nitratireductor basaltis sp. nov., isolated from black beach sand. Int J Syst Evol Microbiol 59:135-8.
485. Casciotti KL, and Ward BB (2005) Phylogenetic analysis of nitric oxide reductase gene homologues from aerobic ammonia-oxidizing bacteria. FEMS Microbiol Ecol 52:197-205.
486. Garbeva P, Baggs EM, and Prosser JI (2007) Phylogeny of nitrite reductase (nirK) and nitric oxide reductase (norB) genes from Nitrosospira species isolated from soil. FEMS Microbiol Lett 266:83-9.
487. Wrage N, Velthof GL, Beusichem MLV, and Oenema O (2001) Role of nitrifier denitrification in the production of nitrous oxide. Soil Biology and Biochemistry 33:1723-32.
488. Kirstein K, and Bock E (1993) Close genetic relationship between Nitrobacter hamburgensis nitrite oxidoreductase and Escherichia coli nitrate reductases. Arch Microbiol 160:447-53.
489. Schmid MC et al. (2008) Environmental detection of octahaem cytochrome c hydroxylamine/hydrazine oxidoreductase genes of aerobic and anaerobic ammonium-oxidizing bacteria. Environ Microbiol 10:3140-9.

490. Belanger AE, and Hatfull GF (1999) Exponential-phase glycogen recycling is essential for growth of Mycobacterium smegmatis. J Bacteriol 181:6670-8.
491. Harper C, Hayward D, Wiid I, and van Helden P (2008) Regulation of nitrogen metabolism in Mycobacterium tuberculosis: A comparison with mechanisms in Corynebacterium glutamicum and Streptomyces coelicolor. IUBMB Life 60:643-50.
492. Hoffmann F et al. (2008) Oxygen dynamics and transport in the Mediterranean sponge Aplysina aerophoba. Marine Biology 153:1257-64.
493. Siebers B, and Schönheit P (2005) Unusual pathways and enzymes of central carbohydrate metabolism in Archaea. Curr Opin Microbiol 8:695-705.
494. Yahel G, Sharp JH, Marie D, Häse C, and Genin A (2003) In situ feeding and element removal in the symbiont-bearing sponge Theonella swinhoei: Bulk DOC is the major source for carbon. Limnology and Oceanography 48:141-9.
495. Steindler L, Beer S, and Ilan M (2002) Photosymbiosis in intertidal and subtidal tropical sponges. Symbiosis 33:263-73.
496. Falkowski PG, Barber RT, and Smetacek V (1998) Biogeochemical controls and feedbacks on ocean primary production. Science 281:200-7.
497. Ashida H et al. (2003) A functional link between RuBisCO-like protein of Bacillus and photosynthetic RuBisCO. Science 302:286-90.
498. Bannister RJ et al. (2011) Incongruence between the distribution of a common coral reef sponge and photosynthesis. Marine Ecology-Progress Series 423:95-100.
499. Thompson JE, Murphy PT, Bergquist PR, and Evans EA (1987) Environmentally induced variation in diterpene composition of the marine sponge Rhopaloeides odorabile. Biochem Syst Ecol 15:595-606.
500. Roberts DE, Cummins SP, Davis AR, and Pangway C (1999) Evidence for symbiotic algae in sponges from temperate coastal reefs in New South Wales, Australia. Memoirs-Queensland Museum 44:493-8.
501. Wyss M, and Kaddurah-Daouk R (2000) Creatine and creatinine metabolism. Physiol Rev 80:1107-213.
502. Ellington WR, and Suzuki T (2007) Early evolution of the creatine kinase gene family and the capacity for creatine biosynthesis and membrane transport. Subcell Biochem 46:17-26.
503. Van Pilsum JF, Stephens GC, and Taylor D (1972) Distribution of creatine, guanidinoacetate and the enzymes for their biosynthesis in the animal kingdom. Implications for phylogeny. Biochem J 126:325-45.
504. Yoshimoto T et al. (2004) Crystal structures of creatininase reveal the substrate binding site and provide an insight into the catalytic mechanism. J Mol Biol 337:399-416.

505. Hasegawa Y et al. (2000) A novel degradative pathway of 2-nitrobenzoate via 3-hydroxyanthranilate in Pseudomonas fluorescens strain KU-7. FEMS Microbiol Lett 190:185-90.
506. Moran GR (2005) 4-Hydroxyphenylpyruvate dioxygenase. Arch Biochem Biophys 433:117-28.
507. Koonin EV, Wolf YI, and Aravind L (2000) Protein fold recognition using sequence profiles and its application in structural genomics. Adv Protein Chem 54:245-75.
508. Dassa E, and Bouige P (2001) The ABC of ABCS: A phylogenetic and functional classification of ABC systems in living organisms. Res Microbiol 152:211-29.
509. Webb ME, Marquet A, Mendel RR, Rébeillé F, and Smith AG (2007) Elucidating biosynthetic pathways for vitamins and cofactors. Nat Prod Rep 24:988-1008.
510. Begley TP, Chatterjee A, Hanes JW, Hazra A, and Ealick SE (2008) Cofactor biosynthesis--still yielding fascinating new biological chemistry. Curr Opin Chem Biol 12:118-25.
511. Howard EC et al. (2006) Bacterial taxa that limit sulfur flux from the ocean. Science 314:649-52.
512. Vila M et al. (2004) Use of microautoradiography combined with fluorescence in situ hybridization to determine dimethylsulfoniopropionate incorporation by marine bacterioplankton taxa. Appl Environ Microbiol 70:4648-57.
513. Malmstrom RR, Kiene RP, Cottrell MT, and Kirchman DL (2004) Contribution of SAR11 bacteria to dissolved dimethylsulfoniopropionate and amino acid uptake in the North Atlantic ocean. Appl Environ Microbiol 70:4129-35.
514. Raina JB, Dinsdale EA, Willis BL, and Bourne DG (2010) Do the organic sulfur compounds DMSP and DMS drive coral microbial associations? Trends Microbiol 18:101-8.
515. Van Alstyne K, Schupp P, and Slattery M (2006) The distribution of dimethylsulfoniopropionate in tropical Pacific coral reef invertebrates. Coral Reefs 25:321-7.
516. Nyström T, and Neidhardt FC (1994) Expression and role of the universal stress protein, UspA, of Escherichia coli during growth arrest. Mol Microbiol 11:537-44.
517. Igarashi K, and Kashiwagi K (1999) Polyamine transport in bacteria and yeast. Biochem J 344 Pt 3:633-42.
518. Shah P, and Swiatlo E (2008) A multifaceted role for polyamines in bacterial pathogens. Mol Microbiol 68:4-16.

519. Piel J (2009) Metabolites from symbiotic bacteria. Nat Prod Rep 26:338-62.
520. Ruiz N, Gronenberg LS, Kahne D, and Silhavy TJ (2008) Identification of two inner-membrane proteins required for the transport of lipopolysaccharide to the outer membrane of Escherichia coli. Proc Natl Acad Sci U S A 105:5537-42.
521. Nikaido H (2003) Molecular basis of bacterial outer membrane permeability revisited. Microbiol Mol Biol Rev 67:593-656.
522. Hansen IV, Weeks JM, and Depledge MH (1995) Accumulation of copper, zinc, cadmium and chromium by the marine sponge Halichondria panicea Pallas and the implications for biomonitoring. Mar Pollut Bull 31:133-8.
523. Barkay T, and Wagner-Döbler I (2005) Microbial transformations of mercury: potentials, challenges, and achievements in controlling mercury toxicity in the environment. Adv Appl Microbiol 57:1-52.
524. Szurmant H, White RA, and Hoch JA (2007) Sensor complexes regulating two-component signal transduction. Curr Opin Struct Biol 17:706-15.
525. Hazelbauer GL, Falke JJ, and Parkinson JS (2008) Bacterial chemoreceptors: high-performance signaling in networked arrays. Trends Biochem Sci 33:9-19.
526. Mauro LJ, and Dixon JE (1994) 'Zip codes' direct intracellular protein tyrosine phosphatases to the correct cellular 'address'. Trends Biochem Sci 19:151-5.
527. Grangeasse C, Cozzone AJ, Deutscher J, and Mijakovic I (2007) Tyrosine phosphorylation: An emerging regulatory device of bacterial physiology. Trends Biochem Sci 32:86-94.
528. Thomasson B et al. (2002) MglA, a small GTPase, interacts with a tyrosine kinase to control type IV pili-mediated motility and development of Myxococcus xanthus. Mol Microbiol 46:1399-413.
529. Zhao X, and Lam JS (2002) WaaP of Pseudomonas aeruginosa is a novel eukaryotic type protein-tyrosine kinase as well as a sugar kinase essential for the biosynthesis of core lipopolysaccharide. J Biol Chem 277:4722-30.
530. Aarts MG et al. (1997) The Arabidopsis MALE STERILITY 2 protein shares similarity with reductases in elongation/condensation complexes. Plant J 12:615-23.
531. Arbeitman MN, Fleming AA, Siegal ML, Null BH, and Baker BS (2004) A genomic analysis of Drosophila somatic sexual differentiation and its regulation. Development 131:2007-21.
532. Curtis PD, Geyer R, White DC, and Shimkets LJ (2006) Novel lipids in Myxococcus xanthus and their role in chemotaxis. Environ Microbiol 8:1935-49.
533. Owens RM et al. (2004) A dedicated translation factor controls the synthesis of the global regulator Fis. EMBO J 23:3375-85.
534. Grant AJ et al. (2003) Co-ordination of pathogenicity island expression by the BipA GTPase in enteropathogenic Escherichia coli (EPEC). Mol Microbiol 48:507-21.

535. Farris M, Grant A, Richardson TB, and O'Connor CD (1998) BipA: a tyrosine-phosphorylated GTPase that mediates interactions between enteropathogenic Escherichia coli (EPEC) and epithelial cells. Mol Microbiol 28:265-79.
536. Barker HC, Kinsella N, Jaspe A, Friedrich T, and O'Connor CD (2000) Formate protects stationary-phase Escherichia coli and Salmonella cells from killing by a cationic antimicrobial peptide. Mol Microbiol 35:1518-29.
537. Reva ON et al. (2006) Functional genomics of stress response in Pseudomonas putida KT2440. J Bacteriol 188:4079-92.
538. Kiss E, Huguet T, Poinsot V, and Batut J (2004) The typA gene is required for stress adaptation as well as for symbiosis of Sinorhizobium meliloti 1021 with certain Medicago truncatula lines. Mol Plant Microbe Interact 17:235-44.
539. Pan X, Lührmann A, Satoh A, Laskowski-Arce MA, and Roy CR (2008) Ankyrin repeat proteins comprise a diverse family of bacterial type IV effectors. Science 320:1651-4.
540. Evdokimov AG, Anderson DE, Routzahn KM, and Waugh DS (2001) Unusual molecular architecture of the Yersinia pestis cytotoxin YopM: A leucine-rich repeat protein with the shortest repeating unit. J Mol Biol 312:807-21.
541. Marino M, Braun L, Cossart P, and Ghosh P (1999) Structure of the InlB leucine-rich repeats, a domain that triggers host cell invasion by the bacterial pathogen L. monocytogenes. Mol Cell 4:1063-72.
542. Slack FJ, and Ruvkun G (1998) A novel repeat domain that is often associated with RING finger and B-box motifs. Trends Biochem Sci 23:474-5.
543. Hoffmann C, Ohlsen K, and Hauck CR (2011) Integrin-mediated uptake of fibronectin-binding bacteria. Eur J Cell Biol 90:891-6.
544. Lecuit M et al. (2000) A role for alpha- and beta-catenins in bacterial uptake. Proc Natl Acad Sci U S A 97:10008-13.
545. Moliner C, Fournier PE, and Raoult D (2010) Genome analysis of microorganisms living in amoebae reveals a melting pot of evolution. FEMS Microbiol Rev 34:281-94.
546. Tarahovsky YS, Ivanitsky GR, and Khusainov AA (1994) Lysis of Escherichia coli cells induced by bacteriophage T4. FEMS Microbiol Lett 122:195-9.
547. Sharon I et al. (2009) Photosystem I gene cassettes are present in marine virus genomes. Nature 461:258-62.
548. Short CM, and Suttle CA (2005) Nearly identical bacteriophage structural gene sequences are widely distributed in both marine and freshwater environments. Appl Environ Microbiol 71:480-6.
549. Doolittle WF, Kirkwood TB, and Dempster MA (1984) Selfish DNAs with self-restraint. Nature 307:501-2.

550. Mahillon J, and Chandler M (1998) Insertion sequences. Microbiol Mol Biol Rev 62:725-74.
551. Yarmolinsky MB (1995) Programmed cell death in bacterial populations. Science 267:836-7.
552. Garcia-Pino A et al. (2008) Doc of prophage P1 is inhibited by its antitoxin partner Phd through fold complementation. J Biol Chem 283:30821-7.
553. Gerdes K, Rasmussen PB, and Molin S (1986) Unique type of plasmid maintenance function: postsegregational killing of plasmid-free cells. Proc Natl Acad Sci U S A 83:3116-20.
554. Cooper TF, Paixão T, and Heinemann JA (2010) Within-host competition selects for plasmid-encoded toxin-antitoxin systems. Proc Biol Sci 277:3149-55.
555. Barrangou R et al. (2007) CRISPR provides acquired resistance against viruses in prokaryotes. Science 315:1709-12.
556. Vale PF, and Little TJ (2010) CRISPR-mediated phage resistance and the ghost of coevolution past. Proc Biol Sci 277:2097-103.
557. Held NL, and Whitaker RJ (2009) Viral biogeography revealed by signatures in Sulfolobus islandicus genomes. Environ Microbiol 11:457-66.
558. Makarova KS, Grishin NV, Shabalina SA, Wolf YI, and Koonin EV (2006) A putative RNA-interference-based immune system in prokaryotes: computational analysis of the predicted enzymatic machinery, functional analogies with eukaryotic RNAi, and hypothetical mechanisms of action. Biol Direct 1:7.
559. van der Oost J, Jore MM, Westra ER, Lundgren M, and Brouns SJ (2009) CRISPR-based adaptive and heritable immunity in prokaryotes. Trends Biochem Sci 34:401-7.
560. Tyson GW, and Banfield JF (2008) Rapidly evolving CRISPRs implicated in acquired resistance of microorganisms to viruses. Environ Microbiol 10:200-7.
561. Zaneveld JR, Lozupone C, Gordon JI, and Knight R (2010) Ribosomal RNA diversity predicts genome diversity in gut bacteria and their relatives. Nucleic Acids Res 38:3869-79.
562. McCutcheon JP, and Moran NA (2011) Extreme genome reduction in symbiotic bacteria. Nat Rev Microbiol 10:13-26.
563. Pfeiffer T, and Bonhoeffer S (2004) Evolution of cross-feeding in microbial populations. Am Nat 163:E126-35.
564. Zhu P, Li Q, and Wang G (2008) Unique microbial signatures of the alien Hawaiian marine sponge Suberites zeteki. Microb Ecol 55:406-14.
565. Mohamed NM, Saito K, Tal Y, and Hill RT (2010) Diversity of aerobic and anaerobic ammonia-oxidizing bacteria in marine sponges. ISME J 4:38-48.

566. Ram RJ et al. (2005) Community proteomics of a natural microbial biofilm. Science 308:1915-20.
567. Ng C et al. (2010) Metaproteogenomic analysis of a dominant green sulfur bacterium from Ace Lake, Antarctica. ISME J 4:1002-19.
568. Wilmes P et al. (2008) Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal. ISME J 2:853-64.
569. Wilmes P, Wexler M, and Bond PL (2008) Metaproteomics provides functional insight into activated sludge wastewater treatment. PLoS One 3:e1778.
570. Chen J, Ryu S, Gharib SA, Goodlett DR, and Schnapp LM (2008) Exploration of the normal human bronchoalveolar lavage fluid proteome. Proteomics Clin Appl 2:585-95.
571. Morris RM et al. (2010) Comparative metaproteomics reveals ocean-scale shifts in microbial nutrient utilization and energy transduction. ISME J 4:673-85.
572. Sowell SM et al. (2009) Transport functions dominate the SAR11 metaproteome at low-nutrient extremes in the Sargasso Sea. ISME J 3:93-105.
573. Sowell SM et al. (2011) Environmental proteomics of microbial plankton in a highly productive coastal upwelling system. ISME J 5:856-65.
574. Marchler-Bauer A et al. (2011) CDD: A Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39:D225-9.
575. Laemmli UK (1970) Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680-5.
576. Gatlin CL, Kleemann GR, Hays LG, Link AJ, and Yates JR (1998) Protein identification at the low femtomole level from silver-stained gels using a new fritless electrospray interface for liquid chromatography-microspray and nanospray mass spectrometry. Anal Biochem 263:93-101.
577. Keller A, Nesvizhskii AI, Kolker E, and Aebersold R (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74:5383-92.
578. Nesvizhskii AI, Keller A, Kolker E, and Aebersold R (2003) A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75:4646-58.
579. Lauro FM et al. (2009) The genomic basis of trophic strategy in marine bacteria. Proc Natl Acad Sci U S A 106:15527-33.
580. Ludwig W et al. (2004) ARB: a software environment for sequence data. Nucleic Acids Res 32:1363-71.
581. Stamatakis A, Hoover P, and Rougemont J (2008) A rapid bootstrap algorithm for the RAxML Web servers. Syst Biol 57:758-71.

582. Liu MY, Kjelleberg S, and Thomas T (2011) Functional genomic analysis of an uncultured δ-proteobacterium in the sponge Cymbastela concentrica. ISME J 5:427-35.
583. Hugenholtz P, Tyson GW, and Blackall LL (2002) Design and evaluation of 16S rRNA-targeted oligonucleotide probes for fluorescence in situ hybridization. Methods Mol Biol 179:29-42.
584. Olson ER, Dunyak DS, Jurss LM, and Poorman RA (1991) Identification and characterization of dppA, an Escherichia coli gene encoding a periplasmic dipeptide transport protein. J Bacteriol 173:234-44.
585. Yang B et al. (2009) Proline-containing dipeptides from a marine sponge of a Callyspongia species. Helvetica Chimica Acta 92:1112-7.
586. Létoffé S, Delepelaire P, and Wandersman C (2006) The housekeeping dipeptide permease is the Escherichia coli heme transporter and functions with two optional peptide binding proteins. Proc Natl Acad Sci U S A 103:12891-6.
587. Brzoska P, Rimmele M, Brzostek K, and Boos W (1994) The pho regulon-dependent Ugp uptake system for glycerol-3-phosphate in Escherichia coli is trans inhibited by Pi. J Bacteriol 176:15-20.
588. Boos W (1998) Binding protein-dependent ABC transport system for glycerol 3-phosphate of Escherichia coli. Methods Enzymol 292:40-51.
589. Forward JA, Behrendt MC, Wyborn NR, Cross R, and Kelly DJ (1997) TRAP transporters: a new family of periplasmic solute transport systems encoded by the dctPQM genes of Rhodobacter capsulatus and by homologs in diverse gram-negative bacteria. J Bacteriol 179:5482-93.
590. Ullmann R, Gross R, Simon J, Unden G, and Kröger A (2000) Transport of C(4)-dicarboxylates in Wolinella succinogenes. J Bacteriol 182:5757-64.
591. Mulligan C, Fischer M, and Thomas GH (2011) Tripartite ATP-independent periplasmic (TRAP) transporters in bacteria and archaea. FEMS Microbiol Rev 35:68-86.
592. Morris RM et al. (2002) SAR11 clade dominates ocean surface bacterioplankton communities. Nature 420:806-10.
593. Chae JC, Kim Y, Kim YC, Zylstra GJ, and Kim CK (2000) Genetic structure and functional implication of the fcb gene cluster for hydrolytic dechlorination of 4-chlorobenzoate from Pseudomonas sp. DJ-12. Gene 258:109-16.
594. Nichols NN, and Harwood CS (1995) Repression of 4-hydroxybenzoate transport and degradation by benzoate: a new layer of regulatory control in the Pseudomonas putida beta-ketoadipate pathway. J Bacteriol 177:7033-40.
595. Pedersén M, Saenger P, and Fries L (1974) Simple brominated phenols in . Phytochemistry 13:2273-9.

596. White RH, and Hager LP (1977) Occurrence of fatty acid chlorohydrins in jellyfish lipids. Biochemistry 16:4944-8.
597. Ashworth RB, and Cormier MJ (1967) Isolation of 2,6-dibromophenol from the marine hemichordate, biminiensis. Science 155:1558-9.
598. Schmitz FJ, and Gopichand Y (1978) (7E,13ξ,15Z)-14,16-dibromo-7,13,15-hexadecatrien-5-ynoic acid. A novel dibromo acetylenic acid from the marine sponge Xestospongia muta. Tetrahedron Letters 19:3637-40.
599. Ebel R, Brenzinger M, Kunze A, Gross HJ, and Proksch P (1997) Wound activation of protoxins in marine sponge Aplysina aerophoba. J Chem Ecol 23:1451-62.
600. Gribble GW (1999) The diversity of naturally occurring organobromine compounds. Chemical Society Reviews 28:335-46.
601. Norte M, Rodriguez ML, Fernandez JJ, Eguren L, and Estrada DM (1988) Aplysinadiene and (R, R) 5 [3, 5-dibromo-4-[(2-oxo-5-oxazolidinyl)] methoxyphenyl]-2-oxazolidinone, two novel metabolites from Aplysina aerophoba Synthes. Tetrahedron 44:4973-80.
602. Utkina NK et al. (2001) Spongiadioxins A and B, two new polybrominated dibenzo-p-dioxins from an Australian marine sponge Dysidea dendyi. J Nat Prod 64:151-3.
603. Poolman B, and Konings WN (1993) Secondary solute transport in bacteria. Biochim Biophys Acta 1183:5-39.
604. Giovannoni S, and Stingl U (2007) The importance of culturing bacterioplankton in the 'omics' age. Nat Rev Microbiol 5:820-6.
605. Schauer K, Rodionov DA, and de Reuse H (2008) New substrates for TonB-dependent transport: do we only see the 'tip of the iceberg'? Trends Biochem Sci 33:330-8.
606. Ezraty B, Aussel L, and Barras F (2005) Methionine sulfoxide reductases in prokaryotes. Biochim Biophys Acta 1703:221-9.
607. Perry JJ, Shin DS, Getzoff ED, and Tainer JA (2010) The structural biochemistry of the superoxide dismutases. Biochim Biophys Acta 1804:245-62.
608. Hofmann B, Hecht HJ, and Flohé L (2002) Peroxiredoxins. Biol Chem 383:347-64.
609. Vuilleumier S (1997) Bacterial glutathione S-transferases: What are they good for? J Bacteriol 179:1431-41.
610. Landfald B, and Strøm AR (1986) Choline-glycine betaine pathway confers a high level of osmotic tolerance in Escherichia coli. J Bacteriol 165:849-55.
611. Schmidt I, and Bock E (1997) Anaerobic ammonia oxidation with nitrogen dioxide by Nitrosomonas eutropha. Arch Microbiol 167:106-11.

612. Schaffer S, Isci N, Zickner B, and Dürre P (2002) Changes in protein synthesis and identification of proteins specifically induced during solventogenesis in Clostridium acetobutylicum. Electrophoresis 23:110-21.
613. Elssner T, Engemann C, Baumgart K, and Kleber HP (2001) Involvement of coenzyme A esters and two new enzymes, an enoyl-CoA hydratase and a CoA-transferase, in the hydration of crotonobetaine to L-carnitine by Escherichia coli. Biochemistry 40:11140-8.
614. Kleber HP (1997) Bacterial carnitine metabolism. FEMS Microbiol Lett 147:1-9.
615. Preusser A, Wagner U, Elssner T, and Kleber HP (1999) Crotonobetaine reductase from Escherichia coli consists of two proteins. Biochim Biophys Acta 1431:166-78.
616. Walt A, and Kahn ML (2002) The fixA and fixB genes are necessary for anaerobic carnitine reduction in Escherichia coli. J Bacteriol 184:4044-7.
617. Iturbe-Ormaetxe I, Burke GR, Riegler M, and O'Neill SL (2005) Distribution, expression, and motif variability of ankyrin domain genes in Wolbachia pipientis. J Bacteriol 187:5136-45.
618. Mavromatis K et al. (2006) The genome of the obligately intracellular bacterium Ehrlichia canis reveals themes of complex membrane structure and immune evasion strategies. J Bacteriol 188:4015-23.
619. Habyarimana F et al. (2008) Role for the Ankyrin eukaryotic-like genes of Legionella pneumophila in parasitism of protozoan hosts and human macrophages. Environ Microbiol 10:1460-74.
620. Voth DE et al. (2009) The Coxiella burnetii ankyrin repeat domain-containing protein family is heterogeneous, with C-terminal truncations that influence Dot/Icm-mediated secretion. J Bacteriol 191:4232-42.
621. Nummelin H et al. (2004) The Yersinia adhesin YadA collagen-binding domain structure is a novel left-handed parallel beta-roll. EMBO J 23:701-11.
622. Mantelin S et al. (2006) Emended description of the genus Phyllobacterium and description of four novel species associated with plant roots: Phyllobacterium bourgognense sp. nov., Phyllobacterium ifriqiyense sp. nov., Phyllobacterium leguminum sp. nov. and Phyllobacterium brassicacearum sp. nov. Int J Syst Evol Microbiol 56:827-39.
623. Jurado V et al. (2005) Phyllobacterium catacumbae sp. nov., a member of the order 'Rhizobiales' isolated from Roman catacombs. Int J Syst Evol Microbiol 55:1487-90.
624. Mergaert J, Boley A, Cnockaert MC, Müller WR, and Swings J (2001) Identity and potential functions of heterotrophic bacterial isolates from a continuous-upflow fixed-bed reactor for denitrification of drinking water with bacterial polyester as source of carbon and electron donor. Syst Appl Microbiol 24:303-10.
625. Alavi M, Miller T, Erlandson K, Schneider R, and Belas R (2001) Bacterial community associated with Pfiesteria-like cultures. Environ Microbiol 3:380-96.
626. Toren A, Landau L, Kushmaro A, Loya Y, and Rosenberg E (1998) Effect of temperature on adhesion of Vibrio strain AK-1 to Oculina patagonica and on coral bleaching. Appl Environ Microbiol 64:1379-84.
627. Banin E, Khare SK, Naider F, and Rosenberg E (2001) Proline-rich peptide from the coral pathogen Vibrio shiloi that inhibits photosynthesis of Zooxanthellae. Appl Environ Microbiol 67:1536-41.
628. Banin E, Vassilakos D, Orr E, Martinez RJ, and Rosenberg E (2003) Superoxide dismutase is a virulence factor produced by the coral bleaching pathogen Vibrio shiloi. Curr Microbiol 46:418-22.
629. Kimes NE et al. (2012) Temperature regulation of virulence factors in the pathogen Vibrio coralliilyticus. ISME J 6:835-46.
630. Bourne DG (2005) Microbiological assessment of a disease outbreak on corals from Magnetic Island (Great Barrier Reef, Australia). Coral Reefs 24:304-12.
631. Sussman M, Willis BL, Victor S, and Bourne DG (2008) Coral pathogens identified for White Syndrome (WS) epizootics in the Indo-Pacific. PLoS One 3:e2393.
632. Pantile R, and Webster NS (2011) Strict thermal threshold identified by quantitative PCR in the sponge Rhopaloeides odorabile. Mar Ecol Prog Ser 431:97-105.
633. Rai AJ, Kamath RM, Gerald W, and Fleisher M (2009) Analytical validation of the GeXP analyzer and design of a workflow for cancer-biomarker discovery using multiplexed gene-expression profiling. Anal Bioanal Chem 393:1505-11.
634. Souter P et al. (2011) A multilocus, temperature stress-related gene expression profile assay in Acropora millepora, a dominant reef-building coral. Mol Ecol Resour 11:328-34.
635. Culman SW, Bukowski R, Gauch HG, Cadillo-Quiroz H, and Buckley DH (2009) T-REX: Software for the processing and analysis of T-RFLP data. BMC Bioinformatics 10:171.
636. White JR, Nagarajan N, and Pop M (2009) Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5:e1000352.
637. Feder ME, and Hofmann GE (1999) Heat-shock proteins, molecular chaperones, and the stress response: evolutionary and ecological physiology. Annu Rev Physiol 61:243-82.

638. Webster NS et al. (2011) Bacterial community dynamics in the marine sponge Rhopaloeides odorabile under in situ and ex situ cultivation. Mar Biotechnol (NY) 13:296-304.
639. Rowan R, Knowlton N, Baker A, and Jara J (1997) Landscape ecology of algal symbionts creates variation in episodes of coral bleaching. Nature 388:265-9.
640. Baker AC (2001) Reef corals bleach to survive change. Nature 411:765-6.
641. Girvan MS, Campbell CD, Killham K, Prosser JI, and Glover LA (2005) Bacterial diversity promotes community stability and functional resilience after perturbation. Environ Microbiol 7:301-13.
642. Hengge R (2009) Principles of c-di-GMP signalling in bacteria. Nat Rev Microbiol 7:263-73.
643. Zavilgelsky GB, Kotova VY, Mazhul' MM, and Manukhov IV (2002) Role of Hsp70 (DnaK-DnaJ-GrpE) and Hsp100 (ClpA and ClpB) chaperones in refolding and increased thermal stability of bacterial luciferases in Escherichia coli cells. Biochemistry (Mosc) 67:986-92.
644. Weber-Ban EU, Reid BG, Miranker AD, and Horwich AL (1999) Global unfolding of a substrate protein by the Hsp100 chaperone ClpA. Nature 401:90-3.
645. Caldas TD, El Yaagoubi A, and Richarme G (1998) Chaperone properties of bacterial elongation factor EF-Tu. J Biol Chem 273:11478-82.
646. Bachinski N et al. (1997) Immediate early response of the marine sponge Suberites domuncula to heat stress: reduction of trehalose and glutathione concentrations and glutathione S-transferase activity. J Exp Mar Biol Ecol 210:129-41.
647. Dinnbier U, Limpinsel E, Schmid R, and Bakker EP (1988) Transient accumulation of potassium glutamate and its replacement by trehalose during adaptation of growing cells of Escherichia coli K-12 to elevated sodium chloride concentrations. Arch Microbiol 150:348-57.
648. Eleutherio EC, Araujo PS, and Panek AD (1993) Protective role of trehalose during heat stress in Saccharomyces cerevisiae. Cryobiology 30:591-6.
649. Zevenhuizen LP (1992) Levels of trehalose and glycogen in Arthrobacter globiformis under conditions of nutrient starvation and osmotic stress. Antonie Van Leeuwenhoek 61:61-8.
650. Jensen JB, Peters NK, and Bhuvaneswari TV (2002) Redundancy in periplasmic binding protein-dependent transport systems for trehalose, sucrose, and maltose in Sinorhizobium meliloti. J Bacteriol 184:2978-86.
651. Lee AH, Zareei MP, and Daefler S (2002) Identification of a NIPSNAP homologue as host cell target for Salmonella virulence protein SpiC. Cell Microbiol 4:739-50.
652. Noble JA et al. (1993) The Escherichia coli hflA locus encodes a putative GTP-binding protein and two membrane proteins, one of which contains a protease-like domain. Proc Natl Acad Sci U S A 90:10866-70.
653. Kourilsky P, and Knapp A (1974) Lysogenization by bacteriophage lambda. III. Multiplicity dependent phenomena occuring upon infection by lambda. Biochimie 56:1517-23.
654. Romero M, Diggle SP, Heeb S, Cámara M, and Otero A (2008) Quorum quenching activity in Anabaena sp. PCC 7120: identification of AiiC, a novel AHL-acylase. FEMS Microbiol Lett 280:73-80.
655. Leadbetter JR, and Greenberg EP (2000) Metabolism of acyl-homoserine lactone quorum-sensing signals by Variovorax paradoxus. J Bacteriol 182:6921-6.
656. Otani M, Tabata J, Ueki T, Sano K, and Inouye S (2001) Heat-shock-induced proteins from Myxococcus xanthus. J Bacteriol 183:6282-7.
657. Neidhardt FC, Ingraham JL, and Schaechter M (1990) Physiology of the bacterial cell: a molecular approach (Sinauer Associates, Sunderland, MA).
658. Jiménez E, and Ribes M (2007) Sponges as a source of dissolved inorganic nitrogen: Nitrification mediated by temperate sponges. Limnology and Oceanography 52:948-58.
659. Radax R, Hoffmann F, Rapp HT, Leininger S, and Schleper C (2012) Ammonia-oxidizing archaea as main drivers of nitrification in cold-water sponges. Environ Microbiol 14:909-23.
660. Ribes M et al. (2012) Functional convergence of microbes associated with temperate marine sponges. Environ Microbiol.
661. Bonete MJ, Perez-Pomares F, Ferrer J, and Camacho ML (1996) NAD-glutamate dehydrogenase from Halobacterium halobium: inhibition and activation by TCA intermediates and amino acids. Biochim Biophys Acta 1289:14-24.
662. Loureiro S, Reñé A, Garcés E, Camp J, and Vaqué D (2011) Harmful algal blooms (HABs), dissolved organic matter (DOM), and planktonic microbial community dynamics at a near-shore and a harbour station influenced by upwelling (SW Iberian Peninsula). Journal of Sea Research 65:401-13.
663. Alderkamp AC, van Rijssel M, and Bolhuis H (2007) Characterization of marine bacteria and the activity of their enzyme systems involved in degradation of the algal storage glucan laminarin. FEMS Microbiol Ecol 59:108-17.
664. Kelly KM, and Chistoserdov AY (2001) Phylogenetic analysis of the succession of bacterial communities in the Great South Bay (Long Island). FEMS Microbiol Ecol 35:85-95.
665. Jones RJ, Hoegh-Guldberg O, Larkum AWD, and Schreiber U (1998) Temperature-induced bleaching of corals begins with impairment of the CO2 fixation mechanism in zooxanthellae. Plant, Cell & Environment 21:1219-30.
666. Olson JB, Gochfeld DJ, and Slattery M (2006) Aplysina red band syndrome: a new threat to Caribbean sponges. Dis Aquat Organ 71:163-8.
667. Angermeier H et al. (2011) The pathology of sponge orange band disease affecting the Caribbean barrel sponge Xestospongia muta. FEMS Microbiol Ecol 75:218-30.
668. Ein-Gil N et al. (2009) Presence of Aspergillus sydowii, a pathogen of gorgonian sea fans in the marine sponge Spongia obscura. ISME J 3:752-5.
669. Negandhi K, Blackwelder P, Ereskovsky A, and Lopez J (2010) Florida reef sponges harbor coral disease-associated microbes. Symbiosis 51:117-29.
670. Allison SD, and Martiny JB (2008) Colloquium paper: Resistance, resilience, and redundancy in microbial communities. Proc Natl Acad Sci U S A 105 Suppl 1:11512-9.
671. Harvell CD et al. (1999) Emerging marine diseases--climate links and anthropogenic factors. Science 285:1505-10.
672. Harvell D et al. (2004) The rising tide of ocean diseases: Unsolved problems and research priorities. Front Ecol Environ 2:375-82.
673. Woodcock S et al. (2007) Neutral assembly of bacterial communities. FEMS Microbiol Ecol 62:171-80.
674. Keddy PA (1992) Assembly and response rules: two goals for predictive community ecology. Journal of Vegetation Science 3:157-64.
675. Lavorel S, and Garnier E (2002) Predicting changes in community composition and ecosystem functioning from plant traits: Revisiting the Holy Grail. Functional Ecology 16:545-56.
676. MacArthur R, and Levins R (1967) The limiting similarity, convergence, and divergence of coexisting species. American Naturalist 101:377-85.
677. Ackerly DD, Schwilk DW, and Webb CO (2006) Niche evolution and adaptive radiation: Testing the order of trait divergence. Ecology 87:S50-61.
678. Grime JP (2006) Trait convergence and trait divergence in herbaceous plant communities: Mechanisms and consequences. Journal of Vegetation Science 17:255-60.
679. Sale PF (1976) Reef fish lottery. Nat Hist 85:60-5.
680. Kelley SE (1989) Experimental studies of the evolutionary significance of sexual reproduction. V. A field test of the Sib-Competition Lottery hypothesis. Evolution 43:1054-65.
681. Wilson JB, and Gitay H (1995) Limitations to species coexistence: evidence for competition from field observations, using a patch model. Journal of Vegetation Science 6:369-76.
682. Bensoussan N, Romano J-C, Harmelin J-G, and Garrabou J (2010) High resolution characterization of northwest Mediterranean coastal waters thermal regimes: To better understand responses of benthic communities to climate change. Estuarine, Coastal and Shelf Science 87:431-41.
683. Rasmussen TB, and Givskov M (2006) Quorum sensing inhibitors: A bargain of effects. Microbiology 152:895-904.
684. Bauer WD, Mathesius U, and Teplitski M (2005) Eukaryotes deal with bacterial quorum sensing. ASM News 71:129-35.
685. Cermeño P (2012) Marine planktonic microbes survived climatic instabilities in the past. Proc Biol Sci 279:474-9.
686. Rosen GL et al. (2009) Signal processing for metagenomics: extracting information from the soup. Current Genomics 10:493-510.
687. Suzuki H, Sota M, Brown CJ, and Top EM (2008) Using Mahalanobis distance to compare genomic signatures between bacterial plasmids and chromosomes. Nucleic Acids Res 36:e147.
688. Moran NA, McLaughlin HJ, and Sorek R (2009) The dynamics and time scale of ongoing genomic erosion in symbiotic bacteria. Science 323:379-82.
689. Tyson GW, and Banfield JF (2005) Cultivating the uncultivated: A community genomics perspective. Trends Microbiol 13:411-5.
690. Iverson V et al. (2012) Untangling genomes from metagenomes: Revealing an uncultured class of marine Euryarchaeota. Science 335:587-90.
691. Frost LS, Leplae R, Summers AO, and Toussaint A (2005) Mobile genetic elements: The agents of open source evolution. Nat Rev Microbiol 3:722-32.
692. Jones BV, and Marchesi JR (2007) Transposon-aided capture (TRACA) of plasmids resident in the human gut mobile metagenome. Nat Methods 4:55-61.
693. Szczepanowski R et al. (2008) Insight into the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to antimicrobial drugs analysed by the 454-pyrosequencing technology. J Biotechnol 136:54-64.
694. Thurber RV, Haynes M, Breitbart M, Wegley L, and Rohwer F (2009) Laboratory procedures to generate viral metagenomes. Nat Protoc 4:470-83.
695. Desnues C et al. (2008) Biodiversity and biogeography of phages in modern stromatolites and thrombolites. Nature 452:340-3.
696. López-Bueno A et al. (2009) High diversity of the viral community from an Antarctic lake. Science 326:858-61.
697. Jones BV, Sun F, and Marchesi JR (2010) Comparative metagenomic analysis of plasmid encoded functions in the human gut microbiome. BMC Genomics 11:46.
698. Jones BV (2010) The human gut mobile metagenome: A metazoan perspective. Gut Microbes 1:415-31.
699. Sussman M, Loya Y, Fine M, and Rosenberg E (2003) The marine fireworm Hermodice carunculata is a winter reservoir and spring-summer vector for the coral-bleaching pathogen Vibrio shiloi. Environ Microbiol 5:250-5.
700. Nugues MM, Smith GW, van Hooidonk RJ, Seabra MI, and Bak RPM (2004) Algal contact as a trigger for coral disease. Ecology Letters 7:919-23.
701. Richardson LL (1997) Occurrence of the black band disease cyanobacterium on healthy corals of the Florida Keys. Bulletin of Marine Science 61:485-90.
702. Yahel G, Sharp JH, Marie D, Häse C, and Genin A (2003) In situ feeding and element removal in the symbiont-bearing sponge Theonella swinhoei: Bulk DOC is the major source for carbon. Limnology and Oceanography 48:141-9.
703. Hadas E, Marie D, Shpigel M, and Ilan M (2006) Virus predation by sponges is a new nutrient-flow pathway in coral reef food webs. Limnology and Oceanography:1548-50.
704. Oren M, Steindler L, and Ilan M (2005) Transmission, plasticity and the molecular identification of cyanobacterial symbionts in the Red Sea sponge Diacarnus erythraenus. Marine Biology 148:35-41.
705. Bergman O et al. (2011) Marine-based cultivation of Diacarnus sponges and the bacterial community composition of wild and maricultured sponges and their larvae. Mar Biotechnol (NY) 13:1169-82.
706. Pawlik JR (1998) Coral reef sponges: Do predatory fishes affect their distribution? Limnology and Oceanography:1396-9.
707. Smith FGW (1941) Sponge disease in British Honduras, and its transmission by water currents. Ecology:415-21.
708. Lauckner G (1987) Ecological effects of larval trematode infestation on littoral marine invertebrate populations. Int J Parasitol 17:391-8.
709. Gaino E, Pronzato R, Corriero G, and Buffa P (1992) Mortality of commercial sponges: Incidence in two Mediterranean areas. Italian Journal of Zoology 59:79-85.
710. Erpenbeck D, Breeuwer JAJ, Parra-Velandia FJ, and van Soest RWM (2006) Speculation with spiculation?--Three independent gene fragments and biochemical characters versus morphology in demosponge higher classification. Mol Phylogenet Evol 38:293-305.

Publications

Fan L, McElroy K, and Thomas T (2012) Reconstruction of ribosomal RNA genes from metagenomic data. PLoS One 7:e39948.

Fan L, Reynolds D, Liu M, Stark M, Kjelleberg S, Webster NS, and Thomas T (2012) Functional equivalence and evolutionary convergence in complex communities of microbial sponge symbionts. Proc Natl Acad Sci U S A (DOI: 10.1073/pnas.1203287109).

Liu M, Fan L (joint 1st author), Zhong L, Kjelleberg S, and Thomas T (2012) Metaproteogenomic analysis of a community of sponge symbionts. ISME J (DOI: 10.1038/ismej.2012.1).