DEVELOPING THE ANEMONE AIPTASIA AS A TRACTABLE MODEL FOR CNIDARIAN-DINOFLAGELLATE SYMBIOSIS: GENERATING TRANSCRIPTOMIC RESOURCES AND PROFILING EXPRESSION

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF GENETICS AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

ERIK MICHAEL LEHNERT AUGUST 2013

© 2013 by Erik Michael Lehnert. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/bv901hc0997

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

John Pringle, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Andrew Fire

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Wolf Frommer

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Gavin Sherlock

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

Coral reefs are coral-built structures that provide habitats for a disproportionately large number of marine species relative to the small percentage of the ocean that they cover. Corals, and some other cnidarians such as anemones, host dinoflagellates within the cells of their gastrodermal tissue. The dinoflagellates fix carbon photosynthetically and transfer it to the host, providing up to 90% of the host's metabolic requirements. The symbiosis between the cnidarian host and its dinoflagellate symbiont is therefore the trophic foundation on which reef ecosystems are built.

Sadly, coral reefs are threatened throughout the world by several factors: predation by crown-of-thorns starfish, disease, climate change, and ocean acidification. Among these, the climate-change-induced increase in sea-surface temperatures is perhaps the most threatening. Corals live and thrive at temperatures very close to the upper limit of their thermal tolerance. Temperatures elevated above these thermal limits can lead to a phenomenon known as coral bleaching, in which the corals expel or degrade their dinoflagellate symbionts. Following such events, the corals must either reestablish their symbiosis or die. Most current climate-change models predict that the areas in which corals currently live will frequently exceed these limits over the course of this century.

The organisms themselves provide a major challenge in coral research. Corals are mostly found in remote locations. The populations near human settlements are often the most threatened, and therefore collection is extremely limited if it is permitted at all. Very few coral species are pandemic, which means that most laboratories investigating coral biology work on different species, making experimental reproducibility (the gold standard of the scientific method) difficult. Finally, corals grow slowly and have exacting requirements with regard to water quality and light conditions.

In order to accelerate the growth of knowledge of the cellular and genetic mechanisms underlying cnidarian-dinoflagellate symbioses, we turned to an aquarium pest, the anemone Aiptasia. Aiptasia host dinoflagellates similar to those found in corals, yet they grow rapidly with little regard to water quality. They reproduce in the laboratory almost exclusively by pedal laceration (spawning can be induced, but does not occur under standard conditions), allowing the generation of large clonal lines that can be distributed to labs globally. As such, Aiptasia is a promising emerging model for cnidarian-dinoflagellate symbioses.

Despite the attraction of Aiptasia for experimental studies, however, few genomic resources existed at the beginning of my doctoral work. At the same time, new sequencing technologies were emerging that provided relatively inexpensive means of deeply sequencing transcriptomes and genomes. Therefore, I used these new technologies to sequence the transcriptome of aposymbiotic Aiptasia, of symbiotic Aiptasia, and of the dinoflagellate symbionts. I then used these resources to perform gene-expression analysis comparing symbiotic and aposymbiotic anemones, which provided numerous testable biological hypotheses about both the structural basis of the symbiosis and the downstream metabolic effects of metabolite transfer to the host.

These resources should serve as the foundation for future experiments in our laboratory and more widely in the field of coral biology.


Acknowledgements

There are many people who gave me their advice, time, support, care, and enthusiasm throughout this project. They made this work possible; I would therefore like to thank the following people (and many others).

My thesis advisor, Prof. John Pringle, allowed me the freedom to pursue projects I was passionate about. I would also like to thank the rest of my thesis committee: Prof. Andy Fire, Prof. Wolf Frommer, and Prof. Gavin Sherlock. The four of them provided me with excellent, detailed feedback throughout the process. Their advice saved me from having to learn things the hard way at several points throughout my projects.

My undergraduate advisor, Prof. Virginia Walbot, whose classes were instrumental in putting me on a path to academic research and an interest in symbiosis. Her enthusiasm and support for students will be the standard I compare myself to in my future career.

Members of my lab, past and present, who have given me help and advice throughout the years: Lisl Esherick, Dr. Veena Singla, Tamaki Bieri, Dr. Masa Onishi, Meng Wang, Shanshan Tuo, Dr. Ryuichi Nishihama, Dr. Kenichi Nakashima, Dr. Jan DeNofrio, Dr. Santiago Perez, Dr. Cory Krediet, Dr. Cawa Tran, Natalya Gallo, and Dr. Elizabeth Hambleton. I must particularly thank Dr. Matthew Burriesci; it is rare that one gets to work on a long project with one of one’s best friends. I also thank Carlo Caruso, with whom I worked closely on several projects back when Aiptasia as a model system was far more a concept than a working reality. Few lab managers give ignorant grad students permission to make as many mistakes as I was allowed, which let me learn many skills I never thought would be part of my graduate studies, especially plumbing.

My collaborators, Morgan Mouchka, Jodi Schwarz, and the Grossman and Palumbi labs, who helped expand the scope of these projects beyond what I could have seen on my own.

Our department administrator, Wendy Christiansen, who repeatedly and with the most good-natured exasperation fixed all my stupid mistakes. Janice and Larry Burriesci, who ensured I had a place to spend holidays when I couldn’t go home, always made me feel like a member of the family. Catherine D’Arcey, who started as a boss, and became an amazing and supportive friend. Raymond Von Itter, who helped with many projects while developing resources for a plant genetics course, taught me a great deal about carpentry and construction.

Finally, I must thank my family, who always supported me in my interests, even when it wasn’t clear to anyone (myself included) where they would take me. I especially thank my parents, who worked hard and gave up so much of their own time and energy so that I could have a childhood that prepared me for what I am doing today and, hopefully, what I will do in the future.

Table of Contents

Chapter 1 Introduction
    Coral reefs: ecosystems in decline
    Aiptasia: an emerging model system of cnidarian-dinoflagellate symbiosis
    Dinoflagellate biology
    The spatial organization of the cnidarian-dinoflagellate symbiosis and its implications
    Thesis overview
Chapter 2 Developing the anemone Aiptasia as a tractable model for cnidarian-dinoflagellate symbiosis: the transcriptome of aposymbiotic A. pallida
    Abstract
    Background
    Methods
        Aiptasia strain and culture
        RNA extraction and sequencing
        Read filtering and transcriptome assembly
        Transcriptome annotation
        Validation of contigs by alignment with paired-end Sanger reads
        Genomic DNA extraction and sequencing
        Heterozygous SNV detection
        Estimation of genome size
    Results and Discussion
        Sequencing and assembly of the transcriptome
        Validation and functional annotation
        Estimation of SNV frequency
        Estimation of genome size
        Identification of possible neuropeptide precursors
    Conclusions
Chapter 3 Extensive differences in gene expression between symbiotic and aposymbiotic cnidarians
    Abstract
    Background
    Materials and Methods
        Aiptasia strain and culture
        Experimental design
        RNA isolation and sequencing
        Read filtering and transcriptome assembly
        Classification of contig origin using a transcript-sorting algorithm and alignment of genomic reads
        Expression analysis by RNA-Seq
        Expression analysis by qPCR
        Bayesian phylogenetic analysis
        Unbiased screen for functional groups among the differentially expressed
    Results
        Sequencing and assembly of the transcriptome of symbiotic Aiptasia
        Classification of contigs using TopSort and comparison to genomic sequence
        Characterization and annotation of transcriptome
        Identification of differentially expressed transcripts
        Genes involved in metabolite transport
        Genes controlling certain metabolic pathways
        Genes potentially involved in host tolerance of dinoflagellates
    Discussion
        Transcriptome assembly and annotation
        Differential expression of animal genes
        Genes controlling and transport in gastrodermal and epidermal cells
        Recognition and tolerance of dinoflagellate symbionts by the host
References
Appendix 1 Supplementary Data for Chapter 3

List of Tables

Table 2-1 Properties of the libraries and sequencing runs used for transcriptome analysis
Table 2-2 Summary of the aposymbiotic Aiptasia transcriptome assembly
Table 2-3 Length dependence of BLAST alignment success
Table 2-4 Estimating transcriptome completeness by comparison to Nematostella
Table 2-5 Completeness of transcripts and sequence conservation for some involved in cellular spatial organization
Table 2-6 SNV and indel distributions in Aiptasia
Table 3-1 Summary of experimental conditions
Table 3-2 Assignment of contigs to species of origin
Table 3-3 Size distribution of the representative contigs
Table 3-4 Summary of alignments to SwissProt and nr
Table 3-5 Distribution of representative contigs among accession numbers
Table 3-6 Differential expression of cnidarian contigs
Table 3-7 Transport-related proteins that were strongly up-regulated in symbiotic anemones
Table 3-S1A Correlation between RNA-Seq and RT-qPCR measurements of differential gene expression in symbiotic relative to aposymbiotic anemones
Table 3-S1B Primer sequences and product sizes for RT-qPCR data
Table 3-S2 Transport-related genes showing differential expression in symbiotic relative to aposymbiotic anemones
Table 3-S3 Lipid-metabolism genes showing differential expression in symbiotic relative to aposymbiotic anemones
Table 3-S4 Presence or absence in the Aiptasia transcriptome of genes encoding the enzymes involved in the synthesis of particular amino acids
Table 3-S5 Genes potentially involved in host tolerance of the symbiont that are differentially expressed between symbiotic and aposymbiotic anemones
Table 3-S6 Primer sequences used for potential qPCR standards
Table 3-S7 Experimental conditions used to test gene-expression levels by qPCR
Table 3-S8 Assessment of gene-expression stability under various conditions

List of Figures

Figure 2-1 Distribution of contig lengths in the transcriptome assembly
Figure 2-2 Predicted amino acid sequences of putative neuropeptide precursors
Figure 3-1 The spatial organization of cnidarian-dinoflagellate symbiosis
Figure 3-2 Npc2-like proteins that putatively do or do not have the ability to transport cholesterol
Figure 3-3 Expression changes of genes governing β-oxidation of fatty acids
Figure 3-4 Expression changes of genes governing glutamine and glutamate metabolism
Figure 3-5 Expression changes of genes governing the metabolism of sulfur-containing amino acids and the S-adenosylmethionine (SAM) cycle
Figure 3-6 Expression changes of genes with functions that may relate to host tolerance of the symbiont
Figure 3-7 Summary of hypotheses about metabolism and metabolite transport as suggested by the gene-expression data and previously available information
Figure 3-S1 Alignments of Npc2 sequences from Aiptasia and other organisms
Figure 3-S2 Distinct but related genes whose products may be involved in host tolerance of the symbiont

Chapter 1 Introduction

Coral reefs: ecosystems in decline

Coral reefs are extensive marine ecosystems that exist largely in nutrient-poor, shallow, tropical waters. Such reefs are of great ecological, economic, and aesthetic value. Reefs benefit humans by protecting coastlines from storms and erosion [1], producing biologically active compounds for pharmaceutical research [2], sustaining fisheries [3], and sequestering carbon [4]. Despite the importance of these ecosystems, coral reefs are declining worldwide due to a complex combination of factors, many of which are anthropogenic. For example, current studies of coral loss on the Great Barrier Reef estimate that 48% of coral death is due to tropical cyclones, 42% to predation by crown-of-thorns starfish (agricultural run-off is thought to contribute to these outbreaks), and 10% to coral bleaching [5]. The potential causes of coral bleaching, and contributing anthropogenic factors, will be discussed below.

The trophic foundation of the ecosystem is the symbiosis between the corals and endosymbiotic dinoflagellates, unicellular alveolates that reside within the gastrodermal cells of the corals and transfer fixed carbon to the host. Upon exposure to certain stressors (e.g. elevated water temperature), the symbiosis can break down and the dinoflagellates can be expelled, digested, or depigmented, a process known as coral bleaching. While coral bleaching currently causes a relatively small fraction of the decline in coral coverage, it is particularly worrisome because corals currently live at temperatures very close to their upper thermal limit. The National Oceanic and Atmospheric Administration measures elevated sea surface temperatures in Degree Heating Weeks (DHW), a measure indicating the cumulative time and degrees spent above the maximum predicted summer sea surface temperature (SST). For example, a DHW of 2 can indicate two weeks at 1 °C or one week at 2 °C above the predicted maximum SST. A DHW of 4 or more is correlated with a risk of severe bleaching [6]. Global climate change models currently predict an increase in SST that could result in bleaching rates above replacement and the eventual loss of corals in many areas where they currently thrive [7].
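The DHW arithmetic described above can be illustrated with a minimal sketch. It assumes the simplified definition given in the text (each week contributes the number of degrees by which its SST exceeds the predicted summer maximum); operational products may apply additional rules, such as a rolling accumulation window or a minimum anomaly, which are not modeled here. The function name, parameter names, and temperature values are hypothetical and chosen only for illustration.

```python
def degree_heating_weeks(weekly_sst_c, max_summer_sst_c):
    """Accumulate weekly heat stress, in degree C-weeks (simplified sketch).

    Each week contributes the number of degrees C by which its SST exceeds
    the predicted summer maximum; weeks at or below the maximum add nothing.
    """
    return sum(max(0.0, sst - max_summer_sst_c) for sst in weekly_sst_c)


# Hypothetical values: predicted maximum summer SST of 29.0 degrees C.
two_weeks_at_plus_one = degree_heating_weeks([30.0, 30.0], 29.0)
one_week_at_plus_two = degree_heating_weeks([31.0], 29.0)
print(two_weeks_at_plus_one, one_week_at_plus_two)
```

Under this simplified definition, two weeks at 1 °C above the maximum and one week at 2 °C above it accumulate the same heat stress, which matches the example in the text.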

Despite the importance of this symbiosis to the coral reef ecosystem, the cellular mechanisms that govern symbiosis establishment, maintenance, and breakdown remain largely unknown. The two broad goals of my thesis work were to develop genomic and transcriptomic resources in an emerging model system of cnidarian-dinoflagellate symbiosis, the anemone Aiptasia, and to perform expression experiments to generate testable hypotheses about these mechanisms. By improving our understanding of the cellular mechanisms governing these processes, we hope in the long term to better understand and mitigate the threats to coral reefs.

Aiptasia: an emerging model system of cnidarian-dinoflagellate symbiosis

Previous studies to identify cellular processes important to cnidarian-dinoflagellate symbioses have been limited by the intractability of corals to laboratory study. Corals are notoriously difficult to grow in aquaculture, possess a calcium carbonate skeleton that renders biochemical and microscopic assays difficult, and cannot readily be induced to spawn. Due to these limitations, researchers have, out of necessity, studied samples of coral and/or anemones gathered from the wild. Using harvested coral is problematic for several reasons: (1) researchers tend to use their preferred coral species (i.e. that which is locally abundant), making it difficult to compare results from different regions; (2) corals are threatened in many environments, which requires researchers to obtain permits and limits sample mass; (3) corals are extremely genetically heterogeneous, which can make it difficult to determine whether changes in phenotype are due to environmental conditions or to the underlying genetic population structure; (4) the cost of transporting corals can be significant, and high mortality rates often occur while shipping them to laboratories. For these reasons, we and others have turned to Aiptasia spp., a small sea anemone that hosts dinoflagellates similar or identical to those of corals, as a model system to study cnidarian-dinoflagellate symbiosis.

Aiptasia are small (~0.1-3 cm tall) anemones that, in contrast to corals, are extremely simple to grow and maintain in aquaria. While their lack of a calcareous skeleton renders them a poor model for coral calcification (an area of separate ecological interest), this lack permits microscopic investigation of cells that would be much more difficult in corals. Importantly, they reproduce asexually by pedal laceration, allowing us to establish and distribute a genetically homogeneous lineage [8]. Others in our lab have developed techniques to induce spawning under laboratory conditions, enabling us to produce larvae year-round (as opposed to once per year, as is often the norm for corals); only protocols to induce settlement and metamorphosis remain to be developed before the complete life cycle is available in the lab (Perez & Pringle, 2013, in press). Our lab has currently established several strains of Aiptasia, including the one predominantly used in our experiments, CC7, clonally derived from a single animal believed to have been collected in the southeastern United States [9].

Dinoflagellate biology

Dinoflagellates are unicellular alveolates that live in the world’s oceans both as free-living plankton and, in some genera, as endosymbionts to such as and sponges. This phylum is extremely diverse, both morphologically and genetically; only roughly half of its members are photosynthetic [10]. Members of the same purported genus, Symbiodinium, despite living in similar habitats (i.e. as endosymbionts in coral cells), have been reported to have as much as 18.5-27.3% divergence at the nucleotide level of their transcripts [11]. The phylum is also notable for having distinct genomic architecture and trans-splicing of mRNAs.

Dinoflagellates have some of the largest genomes of all eukaryotic organisms, ranging from 1-5 Gb in the genus Symbiodinium to much larger sizes (e.g. ~100 Gb) in non-symbiotic dinoflagellates [12]. While it was initially hypothesized that this was due to repetitive elements within the genome, investigations of genomic rehybridization kinetics have shown rates comparable to those of other eukaryotic organisms and do not support this hypothesis [13,14]. The organization of these large genomes has been the source of some controversy; a definitive characterization is still lacking (and may well differ between different genera). The topic, briefly discussed here, has been reviewed extensively in Wisecaver & Hackett, 2011. While dinoflagellates were initially thought to lack histones based on the inability to identify nucleosome structures by EM, histone sequences have recently been found in some EST libraries [15,16]. Despite the presence of these ESTs, these histones clearly do not occupy the majority of genomic DNA and may be involved in some other function. The role of chromatin organization appears to be filled by basic nuclear proteins, referred to as histone-like proteins, similar to those found in bacteria [17].

An additional interesting feature of dinoflagellates is the addition of a conserved 22-nucleotide leader sequence to the 5’ end of mRNA [18,19]. The leader sequence is encoded in tandem gene copies, and it is thought that trans-splicing of the leader sequence is responsible for converting polycistronic mRNA to monocistronic mRNA [20]. This may provide dinoflagellates with a mechanism for regulating mRNA levels, and a means for researchers to clone full-length cDNAs from dinoflagellates.

Dinoflagellates have a complex evolutionary history; photosynthetic dinoflagellates contain a diversity of plastids. Most commonly, these plastids are surrounded by three membranes, contain chlorophyll c2, and are believed to have arisen from the secondary endosymbiosis of a red alga [21,22]. Subsequently, in some lineages, this plastid was replaced by tertiary endosymbiosis (endosymbiosis of an alga containing a secondary endosymbiont) from a cryptophyte, a haptophyte, a stramenopile, or a green alga (which is technically swapping secondary endosymbionts rather than tertiary endosymbiosis) [15,23,24].

The spatial organization of the cnidarian-dinoflagellate symbiosis and its implications

Anemones and corals have remarkably simple body plans, consisting of two tissue layers: the gastrodermis, in which the dinoflagellate symbionts reside, and the epidermis. These two layers are separated by the largely acellular, gel-like mesoglea. The first steps of symbiosis establishment remain almost completely unknown. It has been shown that some lectin-glycan interaction may be essential for symbiont recognition by the host (Wood-Charlson et al., 2006). Upon recognition, the symbiont is thought to be phagocytosed by a host gastrodermal cell and sorted to the early endosome [26]. Unlike endosomes containing phagocytosed food particles, however, the vacuole containing a dinoflagellate rarely fuses with the lysosome, but instead becomes a novel organelle (i.e. the symbiosome; see Chapter 3, Figure 1). This localization is thought to be dependent on the exclusion of Rab7 from the symbiosome membrane [29]. The dinoflagellate resides inside the symbiosome and transfers up to 90% of its photosynthate to the host [27], primarily in the form of glucose [28].

This spatial organization raises some key questions about the maintenance of the symbiosis. All nutrients needed by the dinoflagellate for basal cellular maintenance and growth must traverse the symbiosome membrane; similarly, any molecules surrendered to the host must traverse the same membrane. The epidermal tissue of the host lacks access to both the gastric cavity and nutrients from the dinoflagellate. Therefore, nutrients derived from the symbiont (or from the metabolism of said transferred nutrients) must be transported to epidermal tissue by some mechanism. We can thus consider three key questions in dinoflagellate-cnidarian symbiosis: (1) What are the identities of the metabolites, and by what mechanisms are they translocated from the dinoflagellate to the host (and vice versa)? (2) How are these nutrients employed by the host? (3) How and in what form is nutrition transported to the epidermal tissue?

Symbiotic cnidarians have been reported to receive fixed carbon from their dinoflagellate symbionts in the forms of glycerol, glucose, lipids, or amino acids. I briefly review here the evidence for these claims, which will inform my interpretation of expression data in later chapters. First, the evidence for glycerol transfer is relatively limited, having primarily been observed in the artificial condition of Symbiodinium isolated from the host [30–33]. Studies of intact symbioses have not confirmed this finding. For example, when the giant clam Tridacna, in which symbiotic dinoflagellates are extracellular, was exposed to radiolabeled bicarbonate, the label was detected in glucose but not glycerol in the hemolymph [34]. However, labeled glycerol was detected when the mantle was damaged or host homogenate was added to the isolated Symbiodinium, even though exogenously added glycerol was not converted to glucose rapidly in the hemolymph [35]. Experiments in Anemonia viridis showed that aposymbiotic anemones exposed to labeled glucose, malate, succinate, fumarate, and amino acids had labeled metabolite profiles more similar to those of symbiotic anemones incubated with radiolabeled bicarbonate than did aposymbiotic anemones exposed to labeled glycerol [36]. Finally, studies in our own lab showed that symbiotic anemones incubated with labeled bicarbonate had detectable levels of labeled glucose in the host tissue two minutes after exposure to photosynthetic conditions, while glycerol did not appear until 24 hours later [28]. On the whole, these data indicate that glucose, and not glycerol, is the major form of translocated fixed carbon in our anemones (and corals as well).

Whether or not lipids are transferred between the two partners remains an open question. Early reports of lipid drops protruding from algal cells [37,38] were later shown to have described host nuclei [39]. These reports were the most direct evidence for lipid transfer, and since they have been shown not to be applicable, the issue has remained unclear. Importantly, no experiment to our knowledge has investigated labeling patterns of fatty acids after separation of the symbionts from host tissue; the study performed in our lab was not optimized for lipophilic substances, and therefore the inability to detect them cannot be interpreted as evidence against translocation. However, neither does there appear to be strong evidence indicating that they are transferred in large quantities, if at all.

As most animals are incapable of synthesizing ‘essential’ amino acids [40], it has been hypothesized that amino acids might be synthesized by the dinoflagellates and transferred to the host [41,42]. However, none of the isotope-tracing studies mentioned above detected labeled amino acids at high levels within host tissues. Additionally, the main piece of evidence for this hypothesis, the observation that aposymbiotic animals contain higher levels of ammonia, which they must excrete into seawater, purportedly due to the lack of a nitrogen sink [43], has since been shown to be due to ‘nitrogen conservation’. The nitrogen conservation hypothesis, originally tested in the Hydra-Chlorella symbiosis (in which a green alga is endosymbiotic to a hydrozoan), states that the increase in ammonia levels is due to a carbon shortage, which is remedied by transdeamination of amino acids so that their carbon skeletons can enter the citric acid cycle. It has been shown that this increase in levels of ammonium can be blocked by incubation with inhibitors of transdeamination of amino acids [44]. Further work in Aiptasia demonstrated that ammonium elevation and excretion in aposymbiotic anemones could be decreased to a level similar to that of symbiotic anemones by supplementing the seawater with α-ketoglutarate (providing a carbon source, and thereby obviating the need for amino acid catabolism) [42]. These results invalidate the major indirect argument for the transfer of amino acids to the host, leaving us to conclude that the amounts transferred are slight, if translocation occurs at all. However, complicating this interpretation is the fact that seven amino acids (histidine, isoleucine, leucine, lysine, phenylalanine, tyrosine, and valine) were labeled in symbiotic anemones cultured with radiolabeled glucose, aspartate, and glutamate over two days, while they were not labeled in aposymbiotic animals (tryptophan, aspartate, and glutamate could not be assayed) [45]. Taken together, these results indicate that some essential amino acids may be synthesized by the dinoflagellate and transported to the host, although presumably not in large quantities.

In addition to fixed carbon being translocated to the host, some nutrients must be tendered to the dinoflagellate. All essential inorganic compounds (e.g. CO2, , nitrogen, metal ions, etc.) must be translocated across the symbiosome membrane to the dinoflagellate, but relatively little is known about the identity of the molecules or the mechanisms of transfer. Carbonic anhydrases have been implicated as a potential carbon-concentrating mechanism for the symbiosome (see Davy, Allemand, & Weis, 2012 for review). Isotope-tracing studies have also shown increased uptake of both [47] and ammonium in symbiotic anemones, with portions of these compounds ending up within the dinoflagellate [42,48–51].

The downstream use of the translocated nutrients remains almost entirely unelucidated. While presumably the fixed carbon is used to sustain basic metabolic needs of the host, it has been shown that up to 40% of the net fixed carbon can be lost to mucus production [52,53]. It is also known that fixed carbon (and presumably digested food as well) is stored in corals and anemones as lipids, in particular as wax esters, triacylglycerides (TAG), fatty acids, sterols, and polar lipids. It is worth noting that cnidarians store a relatively large quantity of carbon as wax esters (20-30% in one stony coral, Goniastrea aspera, and 9% in Anthopleura elegantissima) [54,55]. The amount of total lipid, predominantly from the pools of TAG and wax esters, decreases as corals bleach [56] and at lower depths [57]. Additionally, exposure of A. viridis to increased light levels led to increased levels of TAG in both dinoflagellates and the host, but the levels of wax esters increased in the host alone [58]. Overall, these data indicate that the carbon received from the dinoflagellate is essential to allow high levels of lipid deposition in corals. However, understanding how fixed carbon is incorporated into existing metabolic pathways – and how this is regulated – remains a major goal in coral biology.

Thesis overview

This thesis will describe the development of genomic resources in Aiptasia and their use in determining differences in transcript abundance between symbiotic and aposymbiotic anemones. The second chapter consists of a paper describing the transcriptome of the aposymbiotic anemone; I conducted all of the data collection and analysis for this paper with the exception of the program Fulcrum, which was written by Matthew Burriesci. The third chapter describes the assembly and annotation of the transcriptome of the symbiotic anemone and its endogenous symbiont, as well as the differences in gene expression between symbiotic and aposymbiotic anemones. I performed all the described analyses except the following: the program to classify contigs based on sequence features was written by Matthew Burriesci; the determination of appropriate reference transcripts for qPCR was performed by Natalya Gallo; and the collection of RNA and library syntheses for Experiment 2, the interpretation of differentially expressed genes promoting symbiont tolerance, and the RT-PCR validation were performed by Morgan Mouchka.


Chapter 2 Developing the anemone Aiptasia as a tractable model for cnidarian-dinoflagellate symbiosis: the transcriptome of aposymbiotic A. pallida

This chapter was previously published in BMC Genomics (2012), 13: 271 with the following authors: Erik M. Lehnert, Matthew S. Burriesci, and John R. Pringle.

Department of Genetics, Stanford University School of Medicine, Stanford, CA 94025 USA

The work contained in the original manuscript is my own with the exception of the development of Fulcrum, which was performed by Matthew Burriesci.

Abstract

Background: Coral reefs are hotspots of oceanic biodiversity, forming the foundation of ecosystems that are important both ecologically and for their direct practical impacts on humans. Corals are declining globally due to a number of stressors, including rising sea-surface temperatures and pollution; such stresses can lead to a breakdown of the essential symbiotic relationship between the coral host and its endosymbiotic dinoflagellates, a process known as coral bleaching. Although the environmental stresses causing this breakdown are largely known, the cellular mechanisms of symbiosis establishment, maintenance, and breakdown are still largely obscure. Investigating the symbiosis using an experimentally tractable model organism, such as the small sea anemone Aiptasia, should improve our understanding of exactly how the environmental stressors affect coral survival and growth.

Results: We assembled the transcriptome of a clonal population of adult, aposymbiotic (dinoflagellate-free) Aiptasia pallida from ~208 million reads, yielding 58,018 contigs. We demonstrated that many of these contigs represent full-length or near-full-length transcripts that encode proteins similar to those from a diverse array of pathways in other organisms, including various metabolic enzymes, cytoskeletal proteins, and neuropeptide precursors. The contigs were annotated by sequence similarity, assigned GO terms, and scanned for conserved domains. We analyzed the frequency and types of single-nucleotide variants and estimated the size of the Aiptasia genome to be ~421 Mb. The contigs and annotations are available through NCBI (Transcriptome Shotgun Assembly database, accession numbers JV077153-JV134524) and at http://pringlelab.stanford.edu/projects.html.

Conclusions: The availability of an extensive transcriptome assembly for A. pallida will facilitate analyses of gene-expression changes, identification of proteins of interest, and other studies in this important emerging model system.

Background

Coral reefs are global resources of great ecological, economic, and aesthetic value. The success of corals in their typically nutrient-poor environments is due largely to their symbiosis with dinoflagellates of the genus Symbiodinium. These algae inhabit the symbiosome (a vacuole derived from the early endosome) in gastrodermal cells of the host [26,59–61] and transfer up to 95% of their photosynthetically fixed carbon to the host [62]. Reef-building corals have recently declined worldwide, with pollution, disease, destructive fishing practices, increased sea-surface temperatures, and ocean acidification all implicated as contributory factors. Some of these environmental changes affect the symbiotic relationship between algae and host and can lead to dramatic and potentially lethal “bleaching” events, during which the algae are lost and the host may die. Bleaching events have become more frequent over the past 20 years.

Much recent research in coral biology has focused on the effects of stresses – particularly high temperature and lowered pH – on the coral holobiont (the community of living organisms making up a healthy coral), as well as on which genetic and molecular factors of the host and alga lead to differential stress responses and resilience [63–70]. However, these efforts have been impeded by the lack of an experimentally tractable system for studies of the establishment, maintenance, and breakdown of the symbiosis. Corals themselves present major logistical difficulties for laboratory investigation. They grow slowly and are difficult and costly to maintain, their calcareous skeletons make many biochemical and cell biological techniques difficult, and it can be difficult to obtain sufficient biomass to do high-throughput experiments. In addition, samples collected from the wild can have heterogeneous genetic backgrounds, causing difficulties in the application and interpretation of gene-expression studies.
To circumvent these difficulties, we and others are developing the small sea anemone Aiptasia as a model system for studies of dinoflagellate-cnidarian symbiosis [8,9]. Like corals, Aiptasia is an anthozoan (a Class in the Phylum Cnidaria) and maintains intracellular symbiotic dinoflagellates closely related to those in corals. However, unlike corals, Aiptasia is extremely hardy, grows and reproduces rapidly via asexual reproduction in the laboratory (allowing the generation of large clonal populations), and lacks a calcareous skeleton. The lack of skeletal deposition makes Aiptasia an unsuitable model for this aspect of coral biology but greatly facilitates other studies of cell biology and biochemistry. Additionally, Aiptasia can exist in an aposymbiotic (dinoflagellate-free) state or host a variety of Symbiodinium types (although not all), allowing studies of symbiosis specificity [8,71,72]. We have recently developed a protocol for the year-round induction of spawning and larvae production in laboratory-raised Aiptasia [73], which should free a variety of studies from dependence on the seasonal coral reproductive cycle and potentially open the door to genetic analysis.

Studies of the dinoflagellate-cnidarian symbiosis can take advantage of genomics approaches. For example, gene-expression studies should help to elucidate how symbiotic cnidarians respond to various stressors, whereas comparative genomics approaches using sequence data from cnidarians that are not symbiotic with dinoflagellates should help us understand how these symbioses evolved. Genomic and transcriptomic resources for cnidarians are beginning to accumulate rapidly, thanks to the advent of new sequencing technologies. Recently, the genome of Acropora digitifera, a common Indo-Pacific coral, was sequenced and assembled [74]. In addition, the genomes of two non-symbiotic cnidarians, the anemone Nematostella vectensis (an anthozoan) and the more distantly related Hydra magnipapillata (in Class Hydrozoa), have been sequenced [75,76]. Small, Sanger-sequenced EST datasets are available for several species of corals and anemones [9,77,78], as are larger 454-sequenced datasets for several corals [79–81].

As a step in the development of Aiptasia as a model system, we have performed a detailed analysis of the transcriptome of the aposymbiotic animals. Unlike previous transcriptomes in the field of symbiotic cnidarian biology, these data are derived from a clonal and easily distributed strain of anemone, greatly facilitating a straightforward comparison of experimental results between different laboratories.

Methods

Aiptasia strain and culture

All animals used were from clonal population CC7 [9], which in spawning experiments typically behaves as a male (hundreds of sperm spawns compared to three occasions on which individual polyps have produced eggs) [73]. The stock cultures were grown in a circulating artificial seawater (ASW) system at ~25ºC with 20-40 µmol photons m⁻² s⁻¹ of photosynthetically active radiation (PAR) on an ~12 h light : 12 h dark cycle and fed freshly hatched brine-shrimp nauplii approximately twice per week. Aposymbiotic animals were generated by several repetitions of the following process: cold-shocking by addition of 4ºC ASW and incubation at 4ºC for 4 h, followed by 1-2 days of treatment at ~25ºC in ASW containing the inhibitor diuron (Sigma-Aldrich) at 50 µM. After recovery for several weeks in ASW in the light (~12:12 light:dark) at ~25ºC, putatively aposymbiotic anemones were inspected by fluorescence microscopy to confirm the complete absence of dinoflagellates (whose bright chlorophyll autofluorescence is conspicuous when they are present) and were then cultured in separate tanks as described for the stock culture above.

RNA extraction and sequencing

Separate populations of animals were exposed to various conditions prior to RNA isolation in an attempt to maximize the diversity of genes expressed. Whole, medium-sized (~1 cm long) anemones were collected in three pools: (i) ~20 animals grown in control conditions; (ii) animals (2-3 per concentration and time point) exposed to bacterial lipopolysaccharide [LPS (Sigma, cat. no. L2880), which is commonly used to induce a strong innate immune response in other organisms] at 1, 10, or 100 µg/µl for 6 or 24 h; (iii) animals (2-3 per treatment) that had been exposed to a single treatment [elevated light (~250 µmol photons m⁻² s⁻¹) for 3 h; dark for 3 h; cold shock at 4ºC for 4 h; heat shock at 37ºC for 4 h; ultraviolet illumination for several minutes; starvation for one week; hyperosmolarity (1.5x normal salt concentration) or hypoosmolarity (0.3x normal salt concentration) for 30 min; exposure to 10 µM or 100 µM of the 20 standard amino acids or the sugars sucrose and D-glucose for 1 h]. Treated animals were stored in RNALater (Ambion, cat. no. AM7021) at -20ºC for later processing.

We extracted total RNA from the anemones in each pool by homogenization in TRIzol reagent (Invitrogen, cat. no. 15596-026) following the manufacturer’s protocol and using the alternative high-salt method of RNA precipitation recommended by Invitrogen to reduce proteoglycan and polysaccharide contamination. We enriched for polyadenylated RNA using the MicroPoly(A) Purist kit (Ambion, cat. no. 1919) and then fragmented the RNA using divalent cations [5 min at 94ºC in the reverse-transcriptase first-strand buffer supplied with SuperScript III reverse transcriptase (Invitrogen, cat. no. 18080044)]. cDNA was synthesized using random-hexamer primers (Invitrogen, cat. no. N8080127), ligated to Illumina PE Adaptors, size-selected, amplified, and size-selected a second time. Libraries with different insert sizes (ca. 200, 400, and 600 bp) were synthesized for each pool. Clustering and sequencing were performed by the Stanford Center for Genomics and Personalized Medicine using an Illumina Genome Analyzer IIx (GAIIX) sequencer.

Table 2-1 Properties of the libraries and sequencing runs used in transcriptome analysis

Population from which   Approximate library   Number of GAIIx   Number of GAIIx   Amount of
mRNA was derived        insert length (bp)    cycles (bp)       lanes sequenced   sequence (Gb)
Control stock           200                   76                3                 9.1
Mixed-treatment         600                   76                3                 12.8
LPS-treated             200                   101               1                 5.3
LPS-treated             400                   101               1                 4.9
LPS-treated             600                   76                1                 2.7
LPS-treated             600                   101               1                 1.8

Read filtering and transcriptome assembly

To minimize redundancy in the dataset, we used the Fulcrum program to collapse duplicate reads and return a single representative read with improved quality scores for each “read family” [82]. Reads were then filtered for quality and length. Briefly, reads were trimmed such that no nucleotide had a quality score less than 10 and no ambiguous nucleotides (N’s) remained. Any read shorter than 45 bp was then discarded. The remaining reads were combined into files based on the insert size of the library (irrespective of the prior biological treatment) and assembled using an additive multiple-k-mer (35, 39, 43, 47, 51, 55, 59, 63, and 67) approach [83,84] with the Velvet/Oases assembler (Velvet version 1.1.04 and Oases version 0.1.21) [85,86]. Oases assembled many contigs that formed "hairpins", suggesting mis-assembly caused by the presence of palindromic or near-palindromic sequences in the reads. (This problem appears to have been solved in more recent versions of Oases – version 0.1.19 and later – that were released after our study was completed.) We identified these hairpin-containing sequences and split each of them into two separate contigs.

The contigs resulting from the individual assemblies were then assembled together with the original Illumina reads using a k-mer length of 67 with the conserveLong option turned on. Both the output from this final assembly and the combined contigs from each individual assembly were merged into a single file, new hairpins were identified and processed as described above, and identical contigs were collapsed into single representatives using cd-hit-est [87]. The resulting contigs were assembled using CAP3 (requiring ≥50-bp overlap with ≥90% identity to join two contigs) to join overlapping contigs and reduce redundancy in the transcriptome dataset [88]. Contigs shorter than 200 bp were discarded as likely to be uninformative.
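The quality and length filter described above can be illustrated with a short Python sketch. This is not the pipeline that was actually used: the text does not state how the trimming was carried out, so the sketch assumes that each read is reduced to its longest contiguous stretch in which every base has a quality score of at least 10 and is not an N. The function names `trim_read` and `filter_reads` are hypothetical.

```python
MIN_QUALITY = 10   # from the text: no retained base may score below 10
MIN_LENGTH = 45    # from the text: reads shorter than 45 bp are discarded


def trim_read(seq, quals, min_quality=MIN_QUALITY):
    """Return the longest contiguous stretch with no N and no low-quality base.

    Assumption: "longest clean stretch" is one plausible reading of the
    trimming rule; the original pipeline may have trimmed differently.
    """
    best_start, best_len = 0, 0
    start = 0
    for i, (base, q) in enumerate(zip(seq, quals)):
        if base.upper() == "N" or q < min_quality:
            start = i + 1          # a bad base ends the current stretch
            continue
        length = i - start + 1
        if length > best_len:
            best_start, best_len = start, length
    end = best_start + best_len
    return seq[best_start:end], quals[best_start:end]


def filter_reads(reads, min_length=MIN_LENGTH):
    """Trim each (seq, quals) read and keep those still >= min_length."""
    kept = []
    for seq, quals in reads:
        trimmed_seq, trimmed_quals = trim_read(seq, quals)
        if len(trimmed_seq) >= min_length:
            kept.append((trimmed_seq, trimmed_quals))
    return kept
```

Under these assumptions, a read with a single N is reduced to the longer of its two flanks, and it is kept only if that flank is still at least 45 bp long.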

Transcriptome annotation

To assign putative functional roles to the transcripts, we aligned them to the NCBI Non-Redundant protein database (nr) using the blastx program from the standalone BLAST 2.2.25+ software suite with an e-value cutoff of 1e-3 [89]. Predicted protein sequences were searched for specific domains using InterProScan [90]. The blastx and InterProScan outputs were imported using the Blast2GO software package [91] and used to assign Gene Ontology (GO) terms to the predicted proteins [92].

Validation of contigs by alignment with paired-end Sanger reads

As one approach to contig validation, we aligned a set of paired-end Sanger-sequenced ESTs [9] to our transcriptome assembly using BLAT (minimum percent identity 90%) [93]. We counted the number of times the best alignments of a pair of forward and reverse Sanger reads were to the same contig but with the expected opposite orientation.
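The counting rule for read pairs can be sketched as follows. The sketch assumes that the BLAT output has already been reduced to one best hit per read; the dictionary layout and the function name `count_concordant_pairs` are hypothetical and are used only for illustration.

```python
def count_concordant_pairs(best_hits):
    """Count read pairs whose two best alignments agree.

    best_hits maps a pair ID to a tuple (forward_hit, reverse_hit); each hit
    is a (contig_id, strand) tuple with strand '+' or '-', or None when the
    read had no alignment passing the identity cutoff.
    """
    concordant = 0
    for forward_hit, reverse_hit in best_hits.values():
        if forward_hit is None or reverse_hit is None:
            continue
        same_contig = forward_hit[0] == reverse_hit[0]
        opposite_strands = forward_hit[1] != reverse_hit[1]
        if same_contig and opposite_strands:
            concordant += 1
    return concordant
```

A pair is counted only when both reads have a best hit, both hits are on the same contig, and the two strands differ.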

Genomic DNA extraction and sequencing

Genomic DNA was isolated from medium-sized aposymbiotic anemones by incubating the whole animals at 55ºC for 4 h in lysis buffer (100 mM NaCl, 50 mM Tris pH 8.0, 50 mM EDTA, 1% SDS) to which Proteinase K had been added to a final concentration of 0.77 µg/µl. The resulting solution was extracted twice with equal volumes of buffer-saturated phenol (Invitrogen, cat. no. 15513-039) and once with an equal volume of phenol/chloroform/isoamyl alcohol (25:24:1). The genomic DNA was then precipitated by ethanol, resuspended in 100 µl of TE buffer, and sheared using a Covaris Adaptive Focused Acoustics machine, following the manufacturer’s instructions, to a target size of 400 bp (10% Duty Cycle, 4 Intensity, 200 cycles per burst, 55 seconds). End-repair and adapter ligation were performed following Illumina’s instructions, and two lanes were sequenced using an Illumina HiSeq system by the Stanford Center for Genomics and Personalized Medicine.

Heterozygous SNV detection

Putative single-nucleotide variants (SNVs) were detected using CLC Genomics Workbench version 4.6 (CLC bio). Fulcrum-collapsed (see above) and quality-filtered HiSeq genomic reads were mapped against the transcriptome. After an optimal alignment was generated, it was considered valid if at least 40% of the read aligned with ≥96% agreement (at least 34 of 35 base-pairs for the average post-trimming read length of 88 bp). We used 40% rather than a higher value because the 100-bp reads could overlap exon-intron boundaries, and we do not yet have a good estimate of average exon size in Aiptasia. The 40% criterion should prevent intron sequence in the read from disallowing a valid match while still providing sufficient specificity. If a given site had a minimum of 10x coverage and ≥35% of the reads at that site contained the alternative base, we classified that base as an SNV.

To estimate the percentage of false positives among our SNV calls, we amplified genomic DNA for some of them using primers to the flanking sequences and sequenced the products using the Sanger method. We identified SNVs as positions in otherwise high-quality chromatograms where there were peaks for two different bases.
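The per-site SNV criterion (at least 10x coverage and at least 35% of reads carrying the alternative base) can be written as a small Python sketch. It does not reproduce the internals of CLC Genomics Workbench; it assumes that the bases observed at a site are already available as a list, and it considers only the most frequent non-reference base. The function name `call_snv` is hypothetical.

```python
from collections import Counter

MIN_COVERAGE = 10        # from the text: at least 10x coverage at the site
MIN_ALT_FRACTION = 0.35  # from the text: >=35% of reads carry the alternative base


def call_snv(reference_base, observed_bases):
    """Return the alternative base if the site qualifies as an SNV, else None."""
    coverage = len(observed_bases)
    if coverage < MIN_COVERAGE:
        return None
    # Assumption: only the most common non-reference base is evaluated.
    counts = Counter(b for b in observed_bases if b != reference_base)
    if not counts:
        return None
    alt_base, alt_count = counts.most_common(1)[0]
    if alt_count / coverage >= MIN_ALT_FRACTION:
        return alt_base
    return None
```

In a clonal, diploid stock a true heterozygous site is expected to show the alternative base in roughly half of the reads, so a 35% threshold leaves room for sampling noise while excluding most isolated sequencing errors.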

Estimation of genome size

Genome size was estimated by using a slightly modified version of the protocol outlined by Hu et al. [94]. We aligned two lanes of HiSeq genomic data (see above) against the assembled transcriptome using BLAT. We determined the number of bases in each read that aligned with the corresponding contig from the top hit that had no alignment gaps; where multiple hits with equal scores existed, the first hit listed was used. The numbers of aligned bases were summed for all genomic reads mapping to a given contig and divided by the contig length, giving each contig in the transcriptome an average coverage. The modal coverage of the entire contig dataset was then used to estimate the depth to which the genome had been sequenced. The total amount of sequence in the genomic reads was then divided by the estimated sequencing depth to obtain the genome size.
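The calculation described above reduces to three steps, sketched here in Python. The binning used to find the mode is an assumption (coverage values are rounded to the nearest integer), because the text does not say how the modal coverage was determined; all function names are hypothetical.

```python
from collections import Counter


def contig_coverage(aligned_bases, contig_length):
    """Average coverage of one contig: summed aligned bases / contig length."""
    return sum(aligned_bases) / contig_length


def modal_coverage(coverages):
    """Most common coverage value after rounding to the nearest integer.

    Assumption: integer bins; the original analysis may have binned differently.
    """
    rounded = Counter(round(c) for c in coverages)
    return rounded.most_common(1)[0][0]


def estimate_genome_size(total_read_bases, depth):
    """Genome size = total sequenced bases / estimated sequencing depth."""
    return total_read_bases / depth
```

With the figures reported later in this chapter (~10,100 Mb of genomic reads and a modal coverage of ~24x), `estimate_genome_size(10_100, 24)` gives about 421 Mb.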

Results and Discussion

Table 2-2 Summary of the aposymbiotic Aiptasia transcriptome assembly

Total number of contigs        58,018
Total base-pairs in contigs    44.7 Mb
Contig size range              200–13,061 bp
Median contig length           453 bp
Mean contig length             770 bp

a Contigs of <200 bp were discarded (see text).

Sequencing and assembly of the transcriptome

From a laboratory-raised, clonal population of Aiptasia pallida, we generated aposymbiotic anemones and confirmed that they were dinoflagellate-free as described in Methods. We then used these animals to produce three pools of mRNA for cDNA synthesis: mRNA from animals grown in control conditions, mRNA from LPS-treated animals, and mRNA from animals subjected to a variety of other treatments. From each pool of mRNA, we then synthesized three paired-end Illumina libraries with different insert sizes. Those libraries deemed the best quality by Bioanalyzer trace were then sequenced in separate lanes using the Illumina GAIIX system (Table 2-1). The resulting 10 lanes of sequence yielded a total of ~208 million raw pairs of reads and ~36 Gb of sequence. We used the Fulcrum program [82] on the sequence data from each lane to collapse duplicate sequences due to either PCR amplification or high coverage of abundant transcripts; this operation reduced the number of reads to ~44 million pairs. (The large reduction presumably represented overamplification of the original libraries.) After trimming to remove low-quality and adaptor sequences and removing short reads, the number of reads was further reduced to ~42 million pairs comprising ~7.4 Gb of sequence.

The collapsed, quality-filtered reads were assembled using Velvet/Oases following a multiple-k-mer approach (see Methods). The resulting assemblies were merged, resulting in an initial set of 69,402 contigs. Of these, 11,384 appeared to be due to bacterial contamination as judged by their strong similarity to known bacterial sequences. Most of these contigs were derived from the LPS-treatment libraries and probably resulted from the presence of bacterial DNA in the LPS stock. The remaining 58,018 contigs ranged from 200 bp to 13,061 bp, with a mean of 770 bp and an N50 of 1,185 bp (Table 2-2). Although the size distribution was weighted towards smaller contigs (Figure 2-1), there were 13,208 contigs with lengths >1,000 bp.

Figure 2-1 Distribution of contig lengths in the transcriptome assembly. The 11000-11399, 11800-12199, 12200-12599, and 13000-13399 ranges each contained one contig (hence log10 = 0).
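The N50 reported above is given without a definition. A minimal Python sketch of the commonly used definition follows: N50 is the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembled bases. The function name `n50` is hypothetical, and the sketch is not the script used to produce the reported value.

```python
def n50(lengths):
    """Contig length at which the cumulative sum (longest first) reaches 50%."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("n50() requires at least one contig length")
```

Because N50 weights contigs by the bases they contain, it is expected to exceed both the median and the mean contig length when the distribution is dominated by many short contigs, as it does here.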

Validation and functional annotation

We used several approaches to validate the transcriptome assembly. First, we compared it to a set of 4,833 pairs of Sanger-sequence reads from a cDNA library derived from mRNA isolated from symbiotic anemones [9]. In the preparation of this library, an effort was made to obtain full-length cDNAs, which were also size-selected to enrich for longer species (average size ≅ 1.95 kb); thus, it should be enriched for full-length or near-full-length transcripts. When we aligned these ESTs to our transcriptome assembly using BLAT, 73% (7,091) of the Sanger reads mapped to the transcriptome with identity ≥90%. The remaining 27% are likely to be sequences from Symbiodinium, from genes that are expressed only at low levels in aposymbiotic Aiptasia, or from other organisms that were present in the culture used to prepare the library for Sanger sequencing. Of the 755 Sanger read-pairs in which each read mapped to one and only one contig, 73% (551) mapped to the same contig in opposite directions. Of the additional 1,520 read-pairs with valid alignments in which one or both reads aligned to more than one contig, for 82% (1,239) there was at least one contig to which both reads aligned with opposite orientations. These data suggest that even among long transcripts, which are more likely to be fragmented in our assembly, many are represented by full-length or near-full-length contigs.

Table 2-3 Length dependence of BLAST alignment success

Contig size range     Total number of     Number of contigs in size   % of contigs in size
                      contigs in size     range with BLAST            range with BLAST
                      range               alignments                  alignments
                                          (e-value ≤ 1e-10)           (e-value ≤ 1e-10)
200 bp – 599 bp       35,424              11,356                      32%
600 bp – 999 bp       9,372               6,080                       65%
1000 bp – 1399 bp     5,239               4,113                       79%
≥ 1400 bp             7,983               7,244                       91%

To further validate and begin assigning gene functions to the assembled transcripts, we used blastx to align them to the NCBI non-redundant protein database (nr). Using an e-value cut-off of 1e-10, we found that 49.6% (28,794) of the contigs encoded predicted proteins with significant similarity to proteins in nr. In 70.0% (20,154) of these cases, the top hit was to a predicted protein from Nematostella. The large number of contigs without nr hits appears to be due mainly to the presence of many short contigs (Table 2-3), which presumably cover only non-conserved regions in the encoded proteins. The 28,794 contigs with nr hits identified only 14,479 unique protein accessions. This may be due both to the presence of multiple alternative contigs produced by Velvet/Oases for many transcripts and to the alignment of shorter contigs to different parts of the same protein. The presence of contigs that represent only a portion of a transcript, particularly from those transcripts that are expressed at low levels, makes it difficult at this time to achieve full clustering of contigs derived from alternative transcripts of the same gene (or from different regions of the same transcript), as well as to determine the number of alternative transcripts produced by each genetic locus. Using Blast2GO, we assigned GO terms based on the transcripts’ associated nr annotations (using the default e-value cutoff of the Blast2GO software, 1e-3) and the results of InterProScan [95]. We were able to assign GO terms to 14,904 contigs in our transcriptome.

Table 2-4 Estimating transcriptome completeness by comparison to Nematostella a

Pathway                                        Number of predicted   Number of       Average % of         Average
                                               proteins assigned     orthologs in    Nematostella         predicted
                                               to this pathway in    Aiptasia        coding sequence      amino acid
                                               Nematostella          transcriptome   covered by best      similarity
                                                                                     alignment            (%) b
Glycolysis and gluconeogenesis                 30                    25              89                   87
Amino-sugar and nucleotide-sugar metabolism    28                    25              68                   85
Regulation of autophagy                        13                    10              63                   85
Pentose-phosphate pathway                      18                    14              81                   89
Citrate cycle                                  28                    24              90                   89
Valine, leucine, and isoleucine degradation    37                    34              80                   88
Purine metabolism                              92                    78              62                   86
Fatty acid biosynthesis                        5                     3               62                   79

a Aiptasia orthologs to Nematostella proteins in each pathway (as identified in KEGG) were predicted by best-reciprocal-BLAST analysis (see text).
b Using BLOSUM62 similarity matrix [96].
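The best-reciprocal-BLAST criterion named in footnote a of Table 2-4 can be sketched in Python. The sketch assumes that the two BLAST searches have already been reduced to one best hit per query; the dictionaries and the function name `reciprocal_best_hits` are hypothetical and do not reproduce the scripts used for the table.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Return (a, b) pairs that are each other's best BLAST hit.

    best_a_to_b maps each query in set A (e.g. Nematostella proteins) to its
    best-scoring subject in set B (e.g. Aiptasia contigs); best_b_to_a holds
    the best hits of the reverse search.
    """
    pairs = []
    for a, b in best_a_to_b.items():
        if best_b_to_a.get(b) == a:
            pairs.append((a, b))
    return pairs
```

A protein whose best Aiptasia hit points back to a different Nematostella protein is left unpaired, which is why the number of orthologs can be smaller than the number of proteins assigned to a pathway.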

We also looked for complete or partial coding sequences of conserved genes in several cellular pathways. In one analysis, we examined the proteins predicted from the Nematostella genome sequence to be involved in eight metabolic pathways. Sequences assigned to these pathways were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) [97], and their putative orthologs in the Aiptasia transcriptome were identified by a best-reciprocal-blast approach (Table 2-4). These results suggest that our transcriptome provides complete or nearly complete coverage of many pathways and that most Nematostella proteins have orthologs that are represented by full-length or nearly full-length transcripts in our Aiptasia transcriptome.

In a second analysis, we looked individually at the Aiptasia homologues of an arbitrarily chosen functional group of proteins, namely a subset of those involved in cellular spatial organization and cytoskeletal function. The results (Table 2-5) show clearly (i) that many Aiptasia genes are represented in our transcriptome assembly by contigs that cover the entire coding region plus sequences of the 5'- and 3'-UTRs; (ii) that even some very long transcripts are represented by contigs that cover most of their lengths; (iii) that, for whatever reason(s), even some genes of moderate length are not represented by complete transcripts in our current assembly (see the septin entry in Table 2-5); and (iv) that, as expected, proteins in this functional group are closely conserved in Aiptasia as they are in other animals.

Table 2-5 Completeness of transcripts and sequence conservation for some proteins involved in cellular spatial organization

Query protein (GenBank                  Amino acids   Amino acids of   % amino-acid        Length of     Positions in
Accession Number)                       in query      query covered    sequence identity   Aiptasia      contig covered
                                        sequence      by best BLAST    (number of gaps)    contig (bp)   by best BLAST
                                                      hit(s)                                             hit
Mouse Cdc42 (P60766)                    191           1-191            92 (0)              1,399         275-847
Mouse cytoplasmic actin 1 (P60710)      375           2-375            97 (0)              1,455         117-1,247
Mouse tubulin α1B (P05213)              451           1-432            97 (0)              1,971         1,879-584
Mouse tubulin β5 (P99024)               444           1-427            98 (0)              2,592         1,529-249
Mouse septin-2 (P42208) a               361           139-329          65 (14 b)           1,979         3-617
Mouse kinesin 1 heavy chain (Q61768)    963           3-199            72 (0)              906           592-2
                                                      453-947          53 (14)             2,138         6-1,514
Mouse myosin 8 (P13542)                 1,937         23-1,903         50 (14)             7,798         7,650-2,032
Mouse dynein heavy chain 1 (Q9JHU4)     4,644         21-216           53 (4)              790           191-790
                                                      836-4,643        73 (30)             11,600        3-11,414

a Our unpublished studies have shown that as expected, the A. pallida genome encodes multiple septins and that, like other septins [98], these contain the three conserved motifs of a GTP-binding site in their N-terminal regions. We do not yet know why none of these transcripts appears in full-length form in the current transcriptome.
b The predicted A. pallida protein has a single insertion of 14 amino acids near the C-terminus of the "septin-unique element" [98].

In summary, although we are undoubtedly lacking the sequences (or at least lacking complete sequences) for some transcripts that are expressed only at low levels, in particular cell types, during particular stages of development, or under conditions to which we did not expose the anemones, it appears that the transcriptome described here contains at least partial sequences (and many full-length sequences) for the large majority of transcripts expressed in adult, aposymbiotic anemones. It will be particularly interesting to see how many additional transcripts are identified when the transcriptome of symbiotic anemones is examined.

Estimation of SNV frequency

We identified SNVs (i.e., sites of heterozygosity in our clonal Aiptasia stock) by mapping genomic data to the transcriptome (see Methods). To minimize the misclassification of sequencing errors as SNVs, we demanded that any SNV called be represented in ≥35% of the reads mapping to a region. We identified 48,404 putative SNVs (not including deletions and insertions) by mapping one lane of HiSeq genomic data (~5.5 Gb of sequence, of which ~1.3 Gb mapped to the transcriptome), and an additional 6,896 by adding a second lane of genomic data (for a total of ~10.1 Gb, of which ~2.2 Gb mapped to the transcriptome), for a total of 55,300, or 1 SNV per 808 bp. Because the additional lane roughly doubled the amount of sequence mapped but led to only an ~14% increase in the SNVs discovered, further mapping would presumably find few additional SNVs within our clonal strain of Aiptasia. The majority of SNVs we identified were transition rather than transversion mutations (Table 2-6), consistent with findings in other organisms [99] and with the previous observations for Aiptasia [9]. Additional investigation using similar methods led to the identification of 8,691 putative deletion or insertion variants (Table 2-6).
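The SNV density and the gain from the second lane follow from simple arithmetic, shown here as a short Python check. One assumption is made: the density is taken relative to the total length of the assembled contigs (44.7 Mb, Table 2-2), which the text does not state explicitly but which reproduces the reported figure.

```python
total_contig_bases = 44.7e6        # total base-pairs in contigs (Table 2-2); assumed denominator
snvs_first_lane = 48_404           # SNVs found with one lane of genomic data
snvs_added_by_second_lane = 6_896  # additional SNVs found with the second lane

total_snvs = snvs_first_lane + snvs_added_by_second_lane
bp_per_snv = total_contig_bases / total_snvs
relative_increase = snvs_added_by_second_lane / snvs_first_lane

print(total_snvs)                  # 55300
print(round(bp_per_snv))           # 808
print(f"{relative_increase:.1%}")  # 14.2%
```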

Table 2-6 SNV and indel distributions in Aiptasia

Variant type             Frequency in transcriptome
A/G Transition SNV       19,806
C/T Transition SNV       20,087
A/C Transversion SNV     6,846
G/C Transversion SNV     4,440
A/T Transversion SNV     7,408
G/T Transversion SNV     6,232
Insertion or Deletion    8,691 a [1, 5,773; 2, 1,289; 3, 862; 4, 372; 5, 165; 6, 143; 7, 57; 8, 30]

a Broken down within the brackets by the number of nucleotides involved.

To evaluate the reliability of our SNV calls, we designed primers to nine contigs in our assembly based on the following criteria. (1) The top BLAST hit was to a cnidarian, so we could be confident that we were looking at an Aiptasia-derived contig. (2) The predicted SNV was not located so close to an end of the contig that it would be within 40 bp of the primer that we were using to amplify (as this could have led to confusion from low-quality sequence near the primer sites). (3) The variant was a simple base-pair change rather than an indel (as these would have been undetectable by our method of inquiry). (4) Contigs with multiple SNVs were preferred as this enabled us to perform more tests with fewer primers. For the nine contigs, we created 12 primer pairs that would amplify regions containing a total of 17 putative SNVs. Of these 12 primer pairs, eight produced clear PCR products with single bands, encompassing a total of 11 putative SNVs. Six of these bands had the predicted sizes, and two were larger (~400 instead of 250 bp and ~1600 instead of 576 bp), presumably indicating the presence of introns. The remaining four primer pairs presumably either needed additional optimization of the PCR reactions to ensure specificity or represented regions in which the exons were separated by introns that were too long for amplification under standard PCR conditions. All eight of the PCR products were sequenced using the same primers as used for the PCR, and the SNV was considered to be validated when there were dual peaks matching the reference and variant calls at the specified location surrounded by otherwise high-quality peaks. All 11 of the SNVs tested were validated in this test, suggesting that there is only a low false-positive rate for our larger set of SNV calls.

Estimation of genome size

We aligned ~10.1 Gb of genomic reads to the transcriptome assembly (see Methods), and estimated a modal coverage of ~24x per contig. Thus, we estimate a genome size of 10,100 Mb/24 = 421 Mb. Nematostella and Acropora digitifera, the closest relatives of Aiptasia whose genomes have been sequenced, have genome sizes of ~450 Mb and ~420 Mb, respectively [75,76]. Given its apparently modest size, the Aiptasia genome could be readily sequenced using the currently available technologies.

Identification of possible neuropeptide precursors

Investigation of neuropeptide precursors and their cleavage products in other cnidarians has improved understanding of their neurological organization and development [100]. It has also provided tools for manipulation of the animals, such as increasing the rate of budding and inducing larval metamorphosis and settlement [100–103]. Attempts to induce settlement of larvae using the Hydra neuropeptide Hym-248 (pEPLPIGLW-NH2) were successful in several species of the coral genus Acropora [102,103] but not in other coral genera [103] or in Aiptasia (S. Perez, personal communication). To ask whether this failure was due to a lack of sequence similarity between Hym-248 and the neuropeptides in Aiptasia, we scanned the transcriptome for potential neuropeptide precursors of the GLW-NH2 type. We found three distinct transcripts containing repeated GLW motifs (Figure 2-2).

Ap-Npe1 MALKGQLCVILTTLLLIQCQGKSTKKENIEQHKAVQTSGAERTGSIAGELSEISE ERREAEPPQFGLWGKRQVESPIEDPQFFDKKANSFGLWGKRGNGVGLWGRSADSW SKRQDSGLGLWGRSANPGNAVGLWGKRQRGGGRRGLDAKRYANPGDGVGLWGKRQ HDFGLWGRSAEPGNPVGLWGRVADKRDEQKRQKSIGLWGRSADPQKIGLWGR

Ap-Npe2 MDMDLACLFVSSDLQTVIIPRRTTRTFNVIFEICMGCSNSSSVLLGIITVCKSEE TNKQARSMSMFASKCPPGLWCGKKRSLVMSNLKKINKLDEDASKPLDERSSMSKF AAKCPPGLWCGKKRNAPLKLQEIPEIPVTGEIMHNERPHETVDRNVHRRFMSRCP PGLWCGKKRDLVTRFANKCPPGLWCGK

Ap-Npe3 MLLAAKSFIFLIVVFTVIFHIAASTPSLKIHHKGLRVKESHNLCRGEDCAKESQE NGAVECPEGKPCLNSERIMNKKEPCTQKRCEEDEKLERKEENENCESNESGCQTK RTLNKRGKRIGSVTPCPNHLTCARSLTETEAENLGFENSLTLRGCPPGLWCKRAI EKNWRKSGRKPSRSLSGCPPGLWCKRSLSKLQARELGFESSLSLNDCPRGLWCKR KASRALNGCPPGLWCKRGLDRSTAMKMGFENDESLQGCPPGLWCKRNSCPPGLKC ARGLTEAHAKAMGYKMADSLAGCPPGLWCKRNVKSGRSDGFKVFLDQANKKPKCP PGLWCKRDVSLFDAQGSSNGNKNCPPGLWCKRDAEFESDDQQRNHKCPPGLWCKR DSRYSLCPKGLPCKRRAKKYDDVAERFVR

Figure 2-2 Predicted amino acid sequences of putative neuropeptide precursors. The transcriptome was scanned for contigs whose predicted translation products contained multiple copies of the tripeptide GLW, as found in many neuropeptides (see text). Bold face and underlined, GLW motifs plus the immediately following G, CG, or C residues; bold face without underlining, the immediately preceding CPP sequences present in many of these putative peptide precursors; underlining, XA and/or XP dipeptides immediately C-terminal to the potential cut sites (where present); red, basic residues that might direct endoprotease action.

Interestingly, the three putative precursors differ in the amino acids found immediately downstream of the GLW motif. In Ap-Npe1, this is in all 10 cases a G, suggesting that the mature peptides would indeed terminate in GLW-NH2 [100], but in Ap-Npe2, each of the four GLW motifs is followed by a CG, suggesting that the mature peptides might terminate in GLWC-NH2 (Figure 2). In Ap-Npe3, each of the nine GLW motifs is again followed by a C but without a subsequent G, so that there is no signal for C-terminal amidation of the mature peptides (perhaps this polypeptide is not actually processed to neuropeptides). Each of the putative peptides is also preceded and followed by basic residues that could serve as cutting sites for endoproteases [100], and in some cases the possible endoprotease cut sites are followed by XA and/or XP sequences that could be subject to removal by typical dipeptidylaminopeptidases [100]. Importantly, none of the peptides that might be derived from these putative precursors would be a match either for Hym-248 or for Metamorphosin-A (pEQPGLW-NH2, where the N-terminal pyroE is derived from a Q in the primary sequence), a morphogenesis-inducing peptide from the anemone Anthopleura elegantissima [100,101], suggesting that the neuropeptide(s) responsible for morphogenesis and induction of settlement differ among cnidarians.
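The comparison of residues downstream of each GLW motif can be illustrated with a short scan. This is a sketch under stated assumptions: the function names and category labels are hypothetical, and the example strings are fragments copied from the Ap-Npe1, Ap-Npe2, and Ap-Npe3 sequences shown in the figure rather than the full precursors.

```python
import re


def glw_contexts(protein):
    """Return the two residues following each GLW tripeptide."""
    seq = "".join(protein.split()).upper()  # drop the spaces used for layout
    return [seq[m.end():m.end() + 2] for m in re.finditer("GLW", seq)]


def classify_context(following):
    """Label a downstream context as discussed in the text (labels are ours)."""
    if following.startswith("G"):
        return "GLW-NH2 candidate"      # G can donate the amide group
    if following.startswith("CG"):
        return "GLWC-NH2 candidate"     # C retained, G donates the amide
    if following.startswith("C"):
        return "no amidation signal"    # C without a following G
    return "other"


fragments = {
    "Ap-Npe1": "ERREAEPPQFGLWGKRQVESPIEDPQFFDKKANSFGLWGKRGNGVGLWGRSADSW",
    "Ap-Npe2": "AAKCPPGLWCGKKRNAPLKLQEIPEIPVTGEIMHNERPHETVDRNVHRRFMSRCP PGLWCGKKR",
    "Ap-Npe3": "RGCPPGLWCKRAIEKNWRKSGRKPSRSLSGCPPGLWCKRSLSKLQ",
}
for name, fragment in fragments.items():
    print(name, [classify_context(c) for c in glw_contexts(fragment)])
```

A scan of this kind only flags candidates; whether a precursor is actually cleaved and amidated cannot be decided from sequence alone.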

Conclusions

We have assembled and characterized a reference transcriptome for adult, aposymbiotic Aiptasia pallida using the Illumina sequencing platform. We have used this resource to detect SNVs in our clonal population of anemones, estimate the genome size, and identify possible neuropeptide-encoding genes. This transcriptome will enable future studies to explore the changes in gene expression that accompany the association with dinoflagellate endosymbionts, determine how the symbiotic partners respond to a variety of stressors, further test the applicability of this model system to corals, and complete the assembly and annotation of the Aiptasia genome (for which the transcriptomic data will be essential). The contigs and their associated annotations are available through NCBI (Transcriptome Shotgun Assembly database, accession numbers JV077153-JV134524) and at http://pringlelab.stanford.edu/projects.html. The limitations of the current assembly should diminish in updated versions that incorporate additional sequence data, particularly those from symbiotic animals and from different developmental stages. Updated assemblies will be made available through both the NCBI site and our lab website.


Chapter 3 Extensive differences in gene expression between symbiotic and aposymbiotic cnidarians

This chapter is a paper submitted with the following authors: Erik M. Lehnert, Morgan E. Mouchka, Matthew S. Burriesci, Natalya D. Gallo, Jodi A. Schwarz, and John R. Pringle

Department of Genetics, Stanford University School of Medicine, Stanford, CA 94025 USA

The work contained in the original manuscript is my own with the following exceptions: Morgan Mouchka collected data, performed qPCR, and analysed differentially expressed genes involved in symbiont recognition; Matthew Burriesci developed TopSort; Natalya D. Gallo identified genes whose expression is stable for qPCR controls; Jodi A. Schwarz and John R. Pringle helped conceive of the study and contributed to writing the paper.

Abstract

Coral reefs provide habitats for a disproportionate number of marine species relative to the small area of the oceans that they occupy. The mutualism between the cnidarian animal hosts and their intracellular dinoflagellate symbionts provides the nutritional foundation for coral growth and formation of large reef structures, as algal photosynthesis can provide >90% of the host's total energy. The large-scale disruption of this symbiosis, known as ‘coral bleaching’, is due largely to anthropogenic factors and poses a major threat to the future of coral reefs. Despite the importance of this symbiosis, the cellular mechanisms involved in its establishment, maintenance, and dissolution remain largely unknown. Here we report our continued development of genomic tools to study these mechanisms in Aiptasia, a small sea anemone that is emerging as a powerful model system for studies of cnidarian-dinoflagellate symbiosis. Specifically, we report a de novo assembly of the transcriptomes both of symbiotic anemones from a clonal line and of their endogenous dinoflagellate symbionts. We then demonstrate the utility of this resource by comparing transcript abundances in these anemones to those of animals from the same clonal line but lacking dinoflagellates (aposymbiotic). This analysis led to the identification of >900 differentially expressed transcripts and has allowed us to generate testable biological hypotheses about what cellular functions are affected by symbiosis establishment. The differentially regulated transcripts include >60 encoding distinct proteins that may play roles in transporting nutrients between the symbiotic partners; many more encoding proteins functioning in several metabolic pathways, providing clues as to how the transported nutrients may be used by the partners; and several encoding proteins that may be involved in host tolerance of the dinoflagellate.

Background

Coral reefs comprise only a small part of the world’s ocean environment but are habitats for a disproportionately large fraction of all marine species. Corals are able to produce the massive and biologically rich reef habitats despite growing in nutrient-poor waters because of the energy acquired through their mutualistic symbiosis with dinoflagellates of the genus Symbiodinium. These unicellular algae inhabit the symbiosomes (endosome-derived vacuoles) of gastrodermal cells in corals and other cnidarians (Figure 1) and transfer up to 95% of their photosynthetically fixed carbon to the host [104]. Reef-building corals are declining worldwide due largely to anthropogenic causes, which include pollution, destructive fishing practices, and increasing sea-surface temperatures [5]. Such stresses can lead to coral "bleaching", in which the algae lose their photosynthetic capacity and/or are lost altogether by the host. In severe cases, bleaching can result in the death of the host. This is particularly alarming because many corals already live near the upper limits of their thermal tolerances, and most climate-change models predict that these tolerances will frequently be exceeded in the coming decades, leading to widespread coral bleaching and death and a resulting loss of the reef habitats [105]. Despite the great ecological importance of cnidarian-dinoflagellate symbioses, little is known about the cellular and molecular mechanisms by which these relationships are established, maintained, or disrupted. This situation has resulted in part from the difficulties inherent in studying corals directly [106]. Thus, we and others have turned to the small sea anemone Aiptasia, which is normally symbiotic with dinoflagellates closely related to those found in corals but offers many experimental advantages [107].
In particular, Aiptasia lacks the calcareous skeleton that renders biochemical and microscopic analyses of corals challenging, grows rapidly by asexual reproduction under standard aquarium conditions to form large clonal populations, can be induced to spawn and produce larvae throughout the year in the laboratory [73], and (importantly for this study) can be maintained indefinitely in an aposymbiotic (bleached) state so long as it is fed regularly [71].

[Figure 3-1 diagram; labels: Seawater, Epiderm, Mesoglea, Gastroderm, Symbiosome Membrane, Gastric Cavity]

Figure 3-1 The spatial organization of cnidarian-dinoflagellate symbiosis. A simplified diagram of a section of cnidarian body wall is shown. The two major tissue layers are the epiderm, which faces the outside seawater and lacks both symbionts and direct access to food in the gastric cavity, and the gastroderm, which faces the gastric cavity and may contain dinoflagellate symbionts in some of its cells. These two cell layers are separated by the largely acellular mesoglea. After phagocytosis by a host gastrodermal cell, the dinoflagellate resides within a "symbiosome" (believed to be derived from a host endosome that does not fuse with lysosomes) and transfers fixed carbon to the host.

The intracellular localization of the dinoflagellate (Figure 1) raises some key questions about regulation of the symbiosis. First, how does the host recognize, take up, and maintain appropriate symbionts without generating a deleterious immune response that could result in a failure of algal uptake, digestion of the algae after uptake, or apoptosis of the host cells? Second, what metabolites do the two organisms exchange across the symbiosome membrane, and how? It seems likely that the symbiotic state involves both transporters and regulation of metabolic pathways that are distinct from those found in aposymbiotic animals. Third, what changes in transport occur at other membranes? For example, although both gastrodermal and epidermal cells in aposymbiotic anemones presumably excrete ammonium as a toxic waste product, as do other aquatic invertebrates [108], at least some of that ammonium must be redirected to the algae in symbiotic anemones. Particularly intriguing questions are how the epidermal tissue layer is nourished (as it lacks both dinoflagellate symbionts and direct access to food particles) and whether the nature and mechanisms of this nourishment change upon the establishment of symbiosis. To begin to investigate these questions, we used RNA-Seq to generate an assembled and annotated transcriptome for symbiotic Aiptasia. This transcriptome was then used as a reference to compare global transcript abundances between symbiotic and aposymbiotic anemones. Previous studies using microarrays have identified few genes that were differentially expressed between the two states, possibly because of the insensitivity of the technology, the low ratio of infected to uninfected cells, and/or a lack of probes for the relevant genes. In contrast, we identified nearly 1,000 genes with significant expression differences, many of which were large. Many of these expression differences suggest interesting and testable biological hypotheses.

Materials and Methods

Aiptasia strain and culture

All animals were from clonal population CC7 [9], which in spawning experiments typically behaves as a male [73]. For experiments performed at Stanford, the stock cultures were grown in a circulating artificial seawater (ASW) system at ~25°C with 20-40 µmol photons m⁻² s⁻¹ of photosynthetically active radiation (PAR) on an ~12 h light : 12 h dark (12L:12D) cycle and fed freshly hatched brine-shrimp nauplii approximately twice per week. To generate aposymbiotic anemones, animals were placed in a separate polycarbonate tub and subjected to several repetitions of the following process: cold-shocking by addition of 4°C ASW and incubation at 4°C for 4 h, followed by 1-2 days of treatment at ~25°C in ASW containing the photosynthesis inhibitor diuron (Sigma-Aldrich D2425) at 50 µM (lighting approximately as above). After recovery for several weeks in ASW at ~25°C in the light (as above) with feeding (as above, with a water change on the following day), putatively aposymbiotic anemones were inspected by fluorescence microscopy to confirm the complete absence of dinoflagellates (whose bright chlorophyll autofluorescence is conspicuous when they are present). For experiments performed at Cornell, anemones were grown in incubators at 25°C in ASW in 1 L glass bowls and fed (as above) approximately three times per week. Symbiotic anemones were kept on a 12L:12D cycle at 18-22 µmol photons m⁻² s⁻¹ of PAR. Aposymbiotic animals were generated by exposing anemones under the same lighting and feeding regimen to 50 µM diuron in ASW, with daily water changes, for ~30 d or until the anemones were devoid of algae, as confirmed by fluorescence microscopy. Following bleaching, aposymbiotic anemones were maintained in the dark for ~2 years (with feeding as above) prior to experimentation.

Experimental design

Three separate experiments were performed using somewhat different conditions (Table 1). For Experiment 1 (RNA-Seq), both symbiotic and aposymbiotic anemones were held at 27°C on a 12L:12D cycle, with feeding and water changes as above, for one month before sampling to allow them to acclimate. The aposymbiotic anemones were checked immediately before sampling by fluorescence microscopy to ensure that they were still symbiont free. Anemones were collected ~2 d after the last feeding and ~5 h into the light period. Each of three biological replicates per condition consisted of two to five pooled anemones (for a total of ~35 mg wet weight); samples were stored in RNALater (Ambion AM7021) at -20°C until processing.

For Experiment 2 (RNA-Seq), both symbiotic and aposymbiotic anemones were starved for 2 weeks before sampling. Symbiotic anemones were maintained at 25°C on a 12L:12D cycle, while aposymbiotic anemones were maintained at 25°C in constant dark. Anemones were collected 9 h into the symbiotic anemones' light period. Four symbiotic or eight aposymbiotic anemones (~50 mg wet weight in each case) were pooled in each of four biological replicates per treatment, flash frozen in liquid nitrogen, and held at -80°C until processing.

For Experiment 3 (RT-qPCR), both symbiotic and aposymbiotic anemones were maintained at 25°C on a 12L:12D cycle with feeding every 2 d followed by water changes; samples were collected 2 d after the last feeding and 6 h into the light period. Four symbiotic or eight aposymbiotic anemones (~50 mg wet weight in each case) were pooled in each of four biological replicates per treatment, flash frozen in liquid nitrogen, and held at -80°C until processing.

RNA isolation and sequencing

Table 3-1 Summary of experimental conditions.

Experiment  Site      Purpose                                       Light (µmol photons m⁻² s⁻¹)  Temperature (°C)  Feeding schedule
1, Apo      Stanford  Gene expression ᵃ                             25 (12L:12D)                  27                Every 2 d
1, Sym      Stanford  Transcriptome assembly and gene expression ᵇ  25 (12L:12D)                  27                Every 2 d
2, Apo      Cornell   Gene expression ᶜ                             0                             25                Unfed 2 weeks
2, Sym      Cornell   Transcriptome assembly and gene expression ᵈ  18-22 (12L:12D)               25                Unfed 2 weeks
3, Apo      Cornell   RT-qPCR                                       18-22 (12L:12D)               25                Every 2 d
3, Sym      Cornell   RT-qPCR                                       18-22 (12L:12D)               25                Every 2 d

ᵃ ~49 million 36-bp single-end reads (Accession Number SRR612167).
ᵇ ~200 million 101-bp paired-end reads (Accession Number SRR610288) and 51 million 36-bp single-end reads (Accession Number SRR612166).
ᶜ ~80 million 101-bp paired-end reads (Accession Number SRR612165).
ᵈ ~83 million 101-bp paired-end reads (Accession Number SRR696732).

In Experiment 1, total RNA was extracted from whole anemones using the RNAqueous-4PCR Kit (Ambion AM1914) following the manufacturer’s instructions. The RNA-integrity number (RIN) of each sample was determined using an Agilent 2100 Bioanalyzer, and only samples with a RIN ≥ 9 were used. Approximately 3 µg of total RNA was processed (including a poly-A+-selection step) using the TruSeq RNA Sample Prep Kit (Illumina FC-122-1001) following the manufacturer’s instructions to produce indexed libraries. The resulting libraries were pooled based on their indices (as described in the kit instructions), and clustering and sequencing (both 101-bp paired-end reads and 36-bp single-end reads) were performed by the Stanford Center for Genomics and Personalized Medicine using an Illumina HiSeq 2000 sequencer. In Experiments 2 and 3, total RNA was extracted using the ToTALLY RNA™ Total RNA Isolation Kit (Ambion AM1910) following the manufacturer’s instructions, except that the RNA was precipitated using 0.1 volume of 3 M sodium acetate and 4 volumes of 100% ethanol. The resulting RNA was purified using the RNA Clean and Concentrator™-25 Kit (Zymo Research R1017). For RNA-Seq, the RIN of each sample was verified to be ≥9 using an Agilent 2100 Bioanalyzer, and ~4 µg of total RNA per sample was processed using the TruSeq Kit (as above) to produce indexed libraries. The resulting libraries were pooled into 8 samples per lane, and clustering and sequencing (101-bp paired-end reads) were performed by the Cornell Life Sciences Core Laboratory Center using an Illumina HiSeq 2000 sequencer. Processing of samples for Reverse Transcriptase quantitative PCR (RT-qPCR) is described below.

Read filtering and transcriptome assembly and annotation

Transcriptome assembly used all of the 101-bp paired-end reads obtained from symbiotic anemones at both Stanford and Cornell (Table 1; reads available through NCBI Short-Read Archive, accession numbers SRR610288, SRR612165, and SRR696732). Prior to assembly, the reads were processed as follows: (1) reads of <60 bp or containing ≥1 N were discarded; (2) any read for which <25 of the first 35 bases had quality scores >30 was discarded; and (3) reads were trimmed to the first position for which a sliding 4-bp window had an average quality-score of <20. The remaining read-pairs were then processed using FLASH to join reads whose ends overlapped by ≥10 bp with no mismatches [109]. Finally, adapter sequences were removed using cutadapt with default settings [110]. The processed reads were assembled in three sets due to memory constraints. Each set was assembled using an additive-multiple-k-mer approach (k-mers of 51, 59, 67, 75, 83, 91) with the Velvet/Oases assembler (Velvet version 1.1.07 and Oases version 0.2.02: [85,86]) and merged using the Oases merge function with a k-mer of 27. The final outputs of each assembly were merged with one another using the Oases merge function again. Near-identical contigs (≥99% identical over their entire lengths) were merged using UCLUST v. 5.2.32 [111]. To cluster alternative transcripts from the same gene (and presumably also transcripts from highly similar paralogs), UCLUST was used again, as follows. Contigs were aligned locally in both directions and clustered together if the alignment consisted of ≥20% of the total length of each contig and the sequence was ≥99% identical over the alignment. These parameters were chosen because they produced valid clusters on a test dataset from zebrafish in 93% of cases (with the remaining cases being mostly the near-identical paralogs common in teleosts due to genome duplication) (EML and B. Benayoun, unpublished results). To assign putative functional roles to the transcripts, we aligned them to the SwissProt protein database and the NCBI Non-Redundant Protein Database (nr) using the blastx program from the standalone BLAST 2.2.25+ software suite with an E-value cutoff of 1e-5 [89]. The results of the alignment to SwissProt were imported using the Blast2GO software package and used to assign Enzyme Codes and Gene Ontology (GO) terms to the predicted proteins [91,92].
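The three read-processing rules listed above can be sketched in a few lines. This is an illustration only, not the script actually used: the function names are hypothetical, quality scores are assumed to be already decoded to integers, and cutting the read at the start of the first low-quality window is an assumption about how "trimmed to the first position" was applied.

```python
def passes_filters(seq, quals, min_len=60, head=35, min_good=25, good_q=30):
    """Rules (1) and (2): length, ambiguous bases, and 5'-end quality."""
    if len(seq) < min_len or "N" in seq.upper():
        return False
    good = sum(1 for q in quals[:head] if q > good_q)
    return good >= min_good


def trim_by_window(seq, quals, window=4, min_mean=20):
    """Rule (3): cut at the first window whose mean quality is below min_mean."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_mean:
            return seq[:start], quals[:start]
    return seq, quals


# A 70-bp read whose last 20 bases are of low quality.
read = "ACGT" * 17 + "AC"
quals = [35] * 50 + [10] * 20
if passes_filters(read, quals):
    trimmed_seq, trimmed_quals = trim_by_window(read, quals)
    print(len(trimmed_seq))  # 49
```

Whether the minimum-length rule was applied again after trimming is not stated in the text, so the sketch leaves that step out.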

Classification of contig origin using a transcript-sorting algorithm and alignment of genomic reads

To classify contigs into those derived from Aiptasia, those from the dinoflagellate symbionts, and those from other aquarium organisms that might have been associated with the mucous coats of the isolated anemones, we developed the machine-learning program TopSort, which uses support vector machines to classify transcripts as cnidarian, dinoflagellate, fungal, or bacterial [112]. TopSort's basic principle is that if there are N features for each element in a dataset, each element can be represented as a point defined by these features in N-dimensional hyperspace. If classes of elements are distinguishable by the N features, then there should be an (N-1)-dimensional hyperplane that cuts the space such that one class can be separated from the others. The features used for TopSort were GC content; amino-acid and codon biases (where a strong BLAST hit allowed a reliable prediction of reading frame); phylogenetic classification of the five best BLAST hits to the nr database (scoring each hit as cnidarian, non-cnidarian animal, dinoflagellate, non-dinoflagellate alveolate, plant, fungus, bacteria, or none-of-the-above); and best BLAST hit to a custom database composed of the sequences of known origin that were not chosen for either the training or test set (see below and Appendix 1, Supplementary Methods). BLAST hits to the species from which the training and test sequence sets were derived were discarded so as to avoid the development of a classifier that was highly accurate on the test and training sets but useless for a novel dataset. To build the training and test sets and the custom database, we used publicly available sequences for the cnidarians Nematostella vectensis and Hydra magnipapillata; the dinoflagellates Alexandrium tamarense, A. catenella, A. ostenfeldii, A. minutum, Karlodinium micrum, Karenia brevis, and Symbiodinium strain KB8 (clade A); the fungi Saccharomyces cerevisiae, Schizosaccharomyces pombe, Aspergillus niger, and Neurospora crassa; and the bacteria Escherichia coli and Salmonella enterica (see Appendix 1, Supplementary Materials and Methods for Accession Numbers). We also included contigs from an earlier aposymbiotic Aiptasia transcriptome [113] that had ≥30 reads mapping to them from the aposymbiotic libraries produced during Experiment 1 of this study, as well as a large set of contigs from axenically cultured Clade B Symbiodinium strain SSB01 ([114]; T. Xiang and A. Grossman, personal communication).

In addition, we tested the assembled contigs for alignment to Aiptasia genomic DNA sequences. We isolated genomic DNA from aposymbiotic Aiptasia and obtained about 101 Gb of untrimmed sequence reads from six separate libraries (Accession Numbers SRR646474 and SRR606428; to be described in detail elsewhere). We aligned the genomic reads to our contigs and obtained the mean read count for each contig from the six libraries. A previous test had shown that only 20 of ~60,000 contigs assembled from RNA isolated from cultured, axenic Symbiodinium strain SSB01 ([114]; T. Xiang and A. Grossman, personal communication) had any Aiptasia genomic reads mapping to them. However, this clade B strain may have many sequence differences from the clade A strain found in CC7 anemones, so it seemed possible that low levels of Symbiodinium in our putatively aposymbiotic anemones might lead to misclassification of dinoflagellate transcripts as cnidarian. We determined that 15,499 of the 23,794 contigs classified as dinoflagellate by TopSort had zero genomic reads mapping to them, whereas the median of the mean read counts for the contigs classified by TopSort as cnidarian was ~200. Thus, we chose a mean read count of 10 as the cut-off to classify a contig as cnidarian by genomic evidence.
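The genomic-evidence call described above amounts to averaging a contig's read counts over the six libraries and comparing the mean with the cut-off. The sketch below is illustrative: the function name and the example counts are hypothetical, and treating a mean of exactly 10 as sufficient follows the "≥10" wording used for Table 3-2.

```python
from statistics import mean


def has_genomic_evidence(library_read_counts, cutoff=10):
    """True if the mean genomic read count across libraries reaches the cut-off."""
    return mean(library_read_counts) >= cutoff


# Hypothetical read counts for three contigs across six genomic libraries.
examples = {
    "contig_typical_cnidarian": [200, 180, 220, 190, 210, 200],
    "contig_no_reads": [0, 0, 0, 0, 0, 0],
    "contig_borderline": [12, 9, 10, 11, 8, 10],
}
for name, counts in examples.items():
    print(name, has_genomic_evidence(counts))
```

A cut-off of 10 sits well above the zero counts seen for most dinoflagellate contigs and well below the median of ~200 for cnidarian contigs, which is why a borderline contig is the only case where the choice matters.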

Expression analysis by RNA-Seq

The 36-bp reads (Experiment 1) were trimmed as described above. With the 101-bp paired-end reads (Experiment 2), the forward reads were shortened to 36 bp for expression analysis and then trimmed as described above. Reads were aligned using bwa to the representative contigs (i.e., the longest contig in each cluster produced by UCLUST; see above) [115]. A read was counted toward a contig only if it aligned with no errors or gaps to a unique region of the transcriptome. The R package DESeq was used to call contigs as differentially expressed if the false-discovery-rate (FDR)-adjusted P-value was ≤0.1 [116].
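To make the FDR threshold concrete, the sketch below adjusts a list of raw P-values and applies the ≤0.1 cut-off. It assumes the Benjamini-Hochberg procedure, which the text does not name explicitly; the function name and the P-values are hypothetical, and this is not a reimplementation of DESeq's statistical test.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # hypothetical per-contig P-values
adj = benjamini_hochberg(raw)
called = [p <= 0.1 for p in adj]
print(called)  # [True, True, True, False]
```

With a threshold of 0.1, roughly one in ten of the contigs called as differentially expressed would be expected to be false discoveries.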

Expression analysis by RT-qPCR

RNA was extracted and purified as described above, treated with DNase using the TURBO DNA-free kit™ (Ambion AM1907) following the manufacturer’s instructions, and diluted to a concentration of 200 ng per µl. cDNA was then synthesized using the GoScript™ Reverse Transcriptase System (Promega), following the manufacturer’s instructions. Primers (Appendix 1, Supplementary Table 1B) were designed using Primer Quest (Integrated DNA Technologies) for 29 contigs with a variety of read counts and expression patterns; four of these contigs had previously been identified as appropriate internal reference standards as described below. The predicted product sizes of 110-238 bp were confirmed by agarose-gel electrophoresis after standard PCR amplification. Primer efficiencies were determined using Real-time PCR Miner [117] and ranged from 90 to 100%. The RT-qPCR products were also sequenced (Cornell Life Sciences Core Laboratory Center), and all matched the expected product identities. To quantify transcript levels, we used a ViiA™ 7 thermocycler (Applied Biosystems) with reaction conditions as follows: 12.5 µl of 2X Power SYBR® Green Master Mix (Applied Biosystems), 200 nmol of each primer, and 18 ng of cDNA in a total volume of 25 µl. Each sample and a no-template control were run in duplicate with thermocycler parameters of 95°C for 10 min, 40 cycles of 95°C for 15 s and 60°C for 60 s, and a subsequent dissociation curve to confirm the absence of non-specific products. To confirm the absence of genomic-DNA contamination, a pool of all eight RNA samples (see above) was used as template in a separate reaction as described above except omitting the reverse transcriptase. Real-time PCR Miner was used to calculate the critical threshold (CT) of each gene from the raw fluorescence data.

To identify reliable reference standards to use for qPCR normalization, we evaluated six housekeeping genes that appeared to be plausible candidates and have indeed been used for this purpose in previous studies of cnidarian gene expression (see Supplemental Materials and Methods for details). Briefly, the expression levels of these genes were tested across a variety of experimental conditions (e.g., heat shock and cold shock) in both aposymbiotic and symbiotic anemones and evaluated for stability of expression using the software geNorm [118]. Based on this analysis, the genes encoding 60S ribosomal protein L11, 40S ribosomal protein S7, NADH dehydrogenase subunit 5, and glyceraldehyde-3-phosphate dehydrogenase were selected as standards. The stability of these genes in Experiment 3 was confirmed using geNorm prior to calculating a normalization factor from the geometric mean of their expression values ([118]; see Appendix 1, Supplementary Table 1A). The expression levels of all 29 genes were then normalized via the normalization factor, and relative expression values were calculated using the equation 1/(1 + Primer Efficiency)^CT. Log2 fold-changes in expression in symbiotic relative to aposymbiotic anemones were then calculated from the quotients of these relative expression values. The R software package was used to perform correlations between log2 fold-change data from qPCR and RNA-Seq Experiment 1.
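The relative-expression calculation can be followed step by step in the sketch below. It is a hedged illustration: the function names and CT values are hypothetical, primer efficiency is expressed as a fraction (1.0 for 100%), and dividing each gene's relative expression by the geometric mean of the reference genes' relative expression is an assumption about the order in which normalization was applied.

```python
import math


def relative_expression(ct, efficiency):
    """Relative expression from the text: 1 / (1 + efficiency) ** CT."""
    return 1.0 / (1.0 + efficiency) ** ct


def normalization_factor(reference_values):
    """Geometric mean of the reference genes' relative expression values."""
    logs = [math.log(v) for v in reference_values]
    return math.exp(sum(logs) / len(logs))


def log2_fold_change(sym_value, apo_value):
    """Log2 of symbiotic over aposymbiotic normalized expression."""
    return math.log2(sym_value / apo_value)


# Hypothetical target gene: CT 20 in symbiotic, CT 22 in aposymbiotic animals.
eff = 1.0  # 100% primer efficiency
ref_sym = normalization_factor([relative_expression(18, eff), relative_expression(19, eff)])
ref_apo = normalization_factor([relative_expression(18, eff), relative_expression(19, eff)])
sym = relative_expression(20, eff) / ref_sym
apo = relative_expression(22, eff) / ref_apo
print(round(log2_fold_change(sym, apo), 6))  # 2.0
```

With identical reference-gene values in both states, a CT difference of two cycles at 100% efficiency corresponds to a four-fold difference, or a log2 fold-change of 2.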

Bayesian phylogenetic analysis

Alignments of Npc2 proteins were generated using the MUSCLE software with its default parameters [119]. The alignments were inspected to identify regions conserved in all proteins and optimized manually over the conserved regions. We then generated a consensus phylogeny using MrBayes 3.1.2 with the following settings: prset aamodelpr = mixed and lset rates = invgamma [120]. Two separate runs were performed to ensure that identical consensus trees emerged regardless of starting conditions. The runs were terminated after 50 million generations with the average standard deviation of split frequencies ≤ 0.005.

Unbiased screening for functional groups among the differentially expressed genes

As one approach to identifying genes involved in the symbiosis, we used the Database for Annotation, Visualization and Integrated Discovery (DAVID) v6.7 [121,122]. This program performs Fisher Exact tests to determine biological processes (based on GO terms) that are significantly overrepresented among differentially expressed transcripts relative to the background transcriptome. The Functional Annotation Clustering method was employed, which clusters groups of similar biological processes and provides an enrichment score representative of the -log geometric mean of the P-values of the individual processes. Clusters were considered significantly enriched when the enrichment score was >1.3 (corresponding roughly to P < 0.05).
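The relationship between the enrichment score and the 1.3 threshold can be checked numerically. The sketch below assumes a base-10 logarithm, which is what makes a score of 1.3 correspond to P of about 0.05; the function name and P-values are hypothetical, and the code does not reproduce DAVID's clustering itself.

```python
import math


def enrichment_score(pvalues):
    """Minus log10 of the geometric mean of the P-values in one cluster."""
    mean_log10 = sum(math.log10(p) for p in pvalues) / len(pvalues)
    return -mean_log10


print(round(enrichment_score([0.05, 0.05]), 3))  # 1.301
print(round(enrichment_score([0.01, 0.1]), 3))   # 1.5
```

Because the score averages on the log scale, a cluster can pass the threshold even when some of its individual terms have P-values above 0.05.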

Results

Sequencing and assembly of the transcriptome of symbiotic Aiptasia

We isolated total RNA from a clonal population of symbiotic anemones raised under non-stressful culture conditions, enriched for poly-A+ RNA, and used this RNA to synthesize paired-end Illumina libraries, from which we obtained a total of ~345 million pairs of reads containing ~70 Gb of sequence. The raw reads were trimmed and processed as described in Materials and Methods, leaving ~228 million pairs of reads and ~45 Gb of sequence. These reads were assembled in three batches using Velvet/Oases and a multiple-k-mer approach (see Materials and Methods). The resulting assemblies were merged using the Oases merge option, and redundant contigs (≥99% identical over their entire lengths) were collapsed using UCLUST, yielding an initial set of 140,945 contigs with lengths of 102 to 32,510 bp. To estimate the number of genes represented by these contigs and choose a representative contig for each gene, we clustered contigs with good alignments (≥99% identical over ≥20% of the length of the shorter contig). This resulted in 52,717 clusters, and the longest contig from each was taken as representative for further analysis. Although 31,014 clusters contained only a single contig, 19,380 contained two to nine contigs, and 2,323 clusters contained 10 or more contigs, with a largest cluster of 230 contigs (see Discussion).

Classification of contigs using TopSort and comparison to genomic sequence

It is difficult or impossible to obtain animal RNA without contamination by RNA from the intracellular algal symbionts and (although presumably in much smaller amounts) from other organisms in the non-sterile aquarium system. To address this issue, we developed the TopSort support-vector-machines algorithm to classify contigs as putatively of cnidarian, dinoflagellate, bacterial, or fungal origin. Sequences of reliably known origin were used to create training and test sets, and each contig was scored on several metrics (see Materials and Methods). After training on the training set, the accuracy of TopSort on the test set was ~95% for contigs of 150-300 bp and >99% for contigs of >300 bp, for an overall error rate of 2-3%.

Table 3-2 Assignment of contigs to species of origin.

| Type of organism (A) | No. by TopSort^a (B) | No. with genomic evidence^b (C) | No. without genomic evidence^b (D) | False-positive rate (%)^c (E) |
|---|---|---|---|---|
| Cnidarian | 28,026 | 26,219 | 1,807 | 6.4 |
| Dinoflagellate | 23,794 | 1,126 | 22,668 | 4.7 |
| Fungi | 166 | 18 | 148 | 10.8^d |
| Bacteria | 731 | 185 | 546 | 25.3^d |

a See text.
b Genomic evidence was defined as ≥10 paired-end reads aligning from Aiptasia genomic-DNA libraries prepared from aposymbiotic anemones (see Materials and Methods).
c Classified as cnidarian by TopSort but lacking genomic evidence, or classified as non-cnidarian by TopSort but with apparent matches to Aiptasia genomic DNA.
d Many of these are presumably transcripts from contaminants that were present on the anemones from which the genomic DNA was prepared. However, the high rate of apparent false-positives among the putatively bacterial and fungal sequences probably also reflects Bayes’s Rule, whereby the ratio of false-positives to true positives is high when the a priori probability of a true positive is low.
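The Bayes's Rule argument in footnote d can be made concrete with a small worked example. The prior abundances, sensitivity, and per-class false-positive rate used here are hypothetical values chosen only to show the effect; they are not estimates for TopSort.

```python
# Worked Bayes's Rule example; all rates below are hypothetical illustrations.

def false_discovery_fraction(prior, sensitivity, false_positive_rate):
    """Fraction of positive calls that are wrong, P(not in class | called in class)."""
    true_pos = prior * sensitivity
    false_pos = (1.0 - prior) * false_positive_rate
    return false_pos / (true_pos + false_pos)

# Same classifier quality, very different prior abundance of the class.
common = false_discovery_fraction(prior=0.50, sensitivity=0.97, false_positive_rate=0.01)
rare = false_discovery_fraction(prior=0.01, sensitivity=0.97, false_positive_rate=0.01)
```

With these assumed numbers, about 1% of calls for the abundant class are wrong, whereas roughly half of the calls for the rare class are wrong, even though the classifier itself is unchanged.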

We used TopSort to classify the 52,717 representative contigs in our dataset (Table 2, column B). As expected, most contigs were classified as cnidarian or dinoflagellate. However, the 2-3% error rate of TopSort with the test dataset suggested that some hundreds of the putative cnidarian contigs might actually be dinoflagellate contigs that had been misclassified, which would be a significant problem for subsequent analyses of gene-expression differences between symbiotic and aposymbiotic animals. Thus, we also aligned reads from Aiptasia genomic-DNA sequence libraries to the transcriptome (see Materials and Methods). ~94% of the contigs classified by TopSort as cnidarian had supporting genomic evidence, as compared to only ~5% of contigs classified by TopSort as non-cnidarian (Table 2, columns C-E). These results validated the performance of TopSort in initial classification and yielded a set of 26,219 high-confidence Aiptasia contigs (henceforth referred to as "cnidarian") on which we have focused for our further analyses. In the remainder of this paper, we also use the term "dinoflagellate" to refer to the 22,668 contigs classified as dinoflagellate by TopSort and lacking matches to Aiptasia genomic DNA, and we refer to contigs for which the classifications by TopSort and genomic match conflicted as "ambiguous".
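The percentages quoted above follow directly from the counts in Table 3-2, as the short check below illustrates. The variable and function names are ours; only the counts come from the table.

```python
# Counts copied from Table 3-2: (classified by TopSort, with genomic evidence).
table = {
    "Cnidarian": (28_026, 26_219),
    "Dinoflagellate": (23_794, 1_126),
    "Fungi": (166, 18),
    "Bacteria": (731, 185),
}

def conflict_rate(organism):
    """Percent of contigs whose TopSort class conflicts with the genomic evidence."""
    total, with_evidence = table[organism]
    # For cnidarian calls the conflict is a lack of genomic evidence;
    # for every other class it is the presence of genomic evidence.
    conflicts = total - with_evidence if organism == "Cnidarian" else with_evidence
    return 100.0 * conflicts / total

rates = {name: round(conflict_rate(name), 1) for name in table}

total_contigs = sum(total for total, _ in table.values())
cnidarian_supported = 100.0 * 26_219 / 28_026
non_cnidarian_total = sum(t for n, (t, _) in table.items() if n != "Cnidarian")
non_cnidarian_with_evidence = sum(e for n, (_, e) in table.items() if n != "Cnidarian")
non_cnidarian_supported = 100.0 * non_cnidarian_with_evidence / non_cnidarian_total
```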

Characterization and annotation of transcriptome

The cnidarian contigs ranged in size up to >32 kb, with a median of 1,644 bp, whereas the dinoflagellate contigs had somewhat smaller maximum and median sizes (Table 3). The remaining contigs ("Other" in Table 3) had a size distribution similar to those of the cnidarian and dinoflagellate contigs. It is therefore unlikely that the failure to classify these contigs as cnidarian or dinoflagellate was due simply to their being shorter than average and thus more difficult to annotate by BLAST or align to genomic reads.

Table 3-3 Size distribution of the representative contigs.

| Parameter | Cnidarian | Dinoflagellate | Other^a |
|---|---|---|---|
| Number of contigs | 26,219 | 22,668 | 3,830 |
| Median contig size (bp) | 1,644 | 1,144 | 1,474 |
| Mean contig size (bp) | 2,227 | 1,355 | 1,789 |
| Minimum contig size (bp) | 106 | 108 | 102 |
| Maximum contig size (bp) | 32,510 | 20,508 | 18,089 |
| Total length of contigs (Mb) | 58 | 31 | 7 |

a Includes both the contigs classified as "ambiguous" (see text) and those classified as fungal or bacterial.
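The summary statistics in Table 3-3 are of the kind produced by the sketch below. The function name and the toy length list are hypothetical. The final lines check that the reported means, contig counts, and total lengths are mutually consistent after rounding to whole megabases.

```python
from statistics import mean, median

def contig_stats(lengths):
    """Summarize a list of contig lengths (bp)."""
    return {
        "n": len(lengths),
        "median": median(lengths),
        "mean": mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "total_bp": sum(lengths),
    }

# Hypothetical toy lengths, for illustration only.
stats = contig_stats([106, 950, 1644, 2300, 32510])

# Reported (number of contigs, mean size in bp) from Table 3-3.
reported = {"Cnidarian": (26_219, 2_227), "Dinoflagellate": (22_668, 1_355), "Other": (3_830, 1_789)}
implied_total_mb = {name: round(n * m / 1e6) for name, (n, m) in reported.items()}
```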

To assign putative functions to the representative cnidarian and dinoflagellate contigs, we used blastx to align them to SwissProt and the NCBI nr database, retaining only alignments with E-values ≤10^-5. Of the 26,219 cnidarian contigs, 16,373 (62%) had such alignments to 9,386 unique accession numbers in SwissProt (Table 4). In contrast, of the 22,668 dinoflagellate contigs, only 7,895 (~35%) had such alignments to 5,054 unique accession numbers (Table 4). Similar numbers were obtained by aligning sequences to nr (Table 4). Using Blast2GO with its default cut-off of 1e-3, we assigned GO terms based on the SwissProt annotations. We were able to assign 10,521 unique GO terms to cnidarian sequences and 5,747 unique GO terms to algal sequences.
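The E-value filter and the counting of unique accessions can be sketched as below. The record layout (contig ID, accession, E-value) and the toy hits are assumptions made for illustration; they do not reproduce the actual blastx output handling used here.

```python
# Hypothetical best-hit records: (contig_id, accession, e_value).
E_VALUE_CUTOFF = 1e-5

def summarize_hits(hits, n_contigs, cutoff=E_VALUE_CUTOFF):
    """Count aligned contigs and unique accessions among hits passing the cutoff."""
    kept = [(contig, acc) for contig, acc, e_value in hits if e_value <= cutoff]
    aligned = {contig for contig, _ in kept}
    accessions = {acc for _, acc in kept}
    return len(aligned), len(accessions), 100.0 * len(aligned) / n_contigs

toy_hits = [
    ("contig_1", "ACC_A", 1e-50),
    ("contig_2", "ACC_A", 3e-12),   # second contig hitting the same accession
    ("contig_3", "ACC_B", 2e-4),    # fails the cutoff
    ("contig_4", "ACC_C", 1e-5),    # exactly at the cutoff, retained
]
n_aligned, n_unique, pct_aligned = summarize_hits(toy_hits, n_contigs=5)

# The reported cnidarian SwissProt percentages follow from the reported counts.
pct_cnidarian_aligned = round(100 * 16_373 / 26_219)
pct_cnidarian_unique = round(100 * 9_386 / 16_373)
```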

To investigate why there were so few unique accession numbers relative to the numbers of representative contigs, we examined the distributions of contigs per accession number (Table 5). In both the cnidarian and dinoflagellate cases, ~76% of accession numbers annotated only one representative contig, and another ~14% annotated two representative contigs (as might occur with a duplicated gene or two sufficiently different alleles of the same gene). In contrast, some accession numbers were hit by much larger numbers of representative contigs (Table 5).
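A tally of this kind can be sketched with a two-level count: first contigs per accession, then accessions per contig count. The toy mapping from contigs to best-hit accessions is hypothetical; the final lines recompute the single-contig fractions from the totals in Table 3-5.

```python
from collections import Counter

def contigs_per_accession(best_hits):
    """best_hits maps contig_id -> accession of its best hit.

    Returns a histogram {k: number of accessions hit by exactly k contigs}.
    """
    per_accession = Counter(best_hits.values())
    return Counter(per_accession.values())

# Hypothetical toy data: accession A is hit twice, B once, C three times.
toy_best_hits = {"c1": "A", "c2": "A", "c3": "B", "c4": "C", "c5": "C", "c6": "C"}
histogram = contigs_per_accession(toy_best_hits)

# Fractions of accessions annotating a single contig, from Table 3-5.
cnidarian_single_pct = 100 * 7_108 / 9_386
dinoflagellate_single_pct = 100 * 3_874 / 5_054
```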


Table 3-4 Summary of alignments to SwissProt and nr.

| Classification | No. contigs | No. (%) of contigs aligned to SwissProt^a | No. (%^b) of unique accessions (SwissProt) | No. (%) of contigs aligned to nr^a | No. (%^b) of unique accessions (nr) |
|---|---|---|---|---|---|
| Cnidarian | 26,219 | 16,373 (62) | 9,386 (57) | 19,259 (74) | 11,593 (60) |
| Dinoflagellate | 22,668 | 7,895 (34) | 5,054 (64) | 11,184 (49) | 7,789 (70) |

a Alignments with E-value ≤10^-5.
b As % of all alignments.

Although there are several possible explanations for such cases (including the existence of extended gene families, complex alternative splicing, and/or somatic differentiation), we suspect that most reflect a failure of contigs derived from the same gene to cluster with the algorithm used, perhaps because of repeat structures within the genes. In any case, if we assume (as a worst-case scenario) that all such cases result from such failures to cluster, and that the failure rate was identical between the successfully annotated and unannotated contigs, then we can infer that the numbers of ‘unigenes’ (sequences derived from unique genes) present in our dataset are ~14,500 for Aiptasia and ~14,000 for Symbiodinium, representing substantial fractions of the total gene numbers expected from information on other eukaryotes (see Discussion).

Table 3-5 Distribution of representative contigs among accession numbers.^a

| No. of contigs with best blast hit to a given accession number | No. of accession numbers (cnidarian) | No. of accession numbers (dinoflagellate) |
|---|---|---|
| 1 | 7,108 | 3,874 |
| 2 | 1,332 | 703 |
| 3 to 5 | 643 | 361 |
| 6 to 10 | 190 | 85 |
| 11 to 25 | 89 | 25 |
| 26 to 50 | 14 | 3 |
| >50^b | 10 | 3 |
| Total | 9,386 | 5,054 |

a Analysis performed to investigate why there were so few unique BLAST hits relative to the numbers of representative contigs. See text for details.
b The largest numbers were 187 (cnidarian) and 71 (dinoflagellate).


Identification of differentially expressed transcripts

To compare gene expression in symbiotic relative to aposymbiotic anemones, we performed two RNA-Seq experiments using somewhat different conditions (see Materials and Methods; Table 1). In each experiment, we identified many transcripts that appeared to be differentially expressed, including many in which the changes in abundance were ≥5-fold (Table 6, columns B and C). Although the two experiments identified many of the same genes, there were also differences that probably reflect both the noise inherent in such analyses and actual differences in expression due to the different experimental conditions. However, we hypothesized that any genes involved directly in the maintenance of symbiosis (e.g., genes encoding proteins of the symbiosome) would show similar expression differences in both experiments. Therefore, we identified these contigs (Table 6, column D) and focused on them in subsequent analyses.
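Identifying contigs with similar behavior in both experiments amounts to a set intersection, sketched below. We assume here that "similar" means significant in both experiments with the same direction of change; the contig IDs and fold-changes are hypothetical. The helper at the end reproduces how the shared percentages in Table 3-6 are defined (number shared divided by the number from Experiment 1).

```python
# Hypothetical significant results: contig_id -> log2 fold-change (symbiotic / aposymbiotic).
exp1 = {"contig_1": 3.5, "contig_2": -1.2, "contig_3": 2.0, "contig_4": 0.8}
exp2 = {"contig_1": 2.9, "contig_2": 1.1, "contig_3": 4.1, "contig_5": -2.0}

def shared_same_direction(a, b):
    """Contigs significant in both experiments, split by shared direction of change."""
    both = a.keys() & b.keys()
    up = {c for c in both if a[c] > 0 and b[c] > 0}
    down = {c for c in both if a[c] < 0 and b[c] < 0}
    return up, down

shared_up, shared_down = shared_same_direction(exp1, exp2)

def percent_of_experiment_1(n_shared, n_exp1):
    """Shared contigs as a rounded percentage of the Experiment 1 count."""
    return round(100 * n_shared / n_exp1)

pct_shared_up = percent_of_experiment_1(456, 1_109)
```

In the toy data, contig_2 is significant in both experiments but changes in opposite directions, so it is excluded from both shared sets.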

Table 3-6 Differential expression of cnidarian contigs.^a

| Contig behavior^b (A) | No. of contigs, Experiment 1^c (B) | No. of contigs, Experiment 2^c (C) | Shared, total (%^d) (D) | Shared, unannotated (%^d) (E) |
|---|---|---|---|---|
| Upregulated | 1,109 | 3,093 | 456 (41) | 53 (5) |
| Upregulated ≥5-fold | 138 | 631 | 79 (57) | 17 (12)^e |
| Downregulated | 1,036 | 2,905 | 464 (45) | 48 (5) |
| Downregulated ≥5-fold | 23 | 388 | 6 (26) | 4 (17) |

a Classified as cnidarian by TopSort and confirmed by genomic match (see Table 2).
b Expression in symbiotic relative to aposymbiotic anemones. In all cases shown, the difference in expression was significant at a false-discovery-rate-adjusted P ≤ 0.1.
c For experimental conditions, see Materials and Methods and Table 1.
d The percentage in each case is the number shared divided by the number from Experiment 1.
e Four of these 17 contigs had an ORF of >100 codons.

Although our further analyses to date have also focused on transcripts with convincing annotations by blastx, it is important to note that 101 of the cnidarian transcripts that appeared to be differentially expressed in both experiments, including 21 with ≥5-fold expression changes, could not be annotated at this time (Table 6, column E). Of these transcripts, 52 (including four of the 21 with ≥5-fold expression changes) contained apparent open reading frames with ≥100 codons. Identifying the functions of these unknown proteins may be critical to understanding the structural and biochemical bases of the symbiosis.

To evaluate the reliability of the RNA-Seq data, we also performed an RT-qPCR experiment using culture conditions similar to those of Experiment 1 (Table 1). We tested 29 contigs that exhibited a range of fold-changes and read counts, including some that were of particular biological interest (Appendix 1, Supplementary Table 1). To assess the overall agreement between the RNA-Seq and RT-qPCR experiments, we determined Spearman’s rank correlation coefficient of the log2 fold-change for all contigs, excluding those that had apparently infinite changes in expression (i.e., were found only in either symbiotic or aposymbiotic anemones). The correlation coefficient of 0.96 (P-value = 3e-14) showed a strong correlation between the RNA-Seq and RT-qPCR datasets.

In what follows, we discuss several sets of cnidarian genes whose differential expression suggests testable biological hypotheses.
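The comparison between RNA-Seq and RT-qPCR can be sketched as a Spearman rank correlation computed after dropping contigs with infinite fold-changes. The implementation below is a plain-Python illustration (Pearson correlation of average ranks), not the software actually used, and the fold-change values are hypothetical.

```python
import math

def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_log2fc(rnaseq_fc, qpcr_fc):
    """Spearman correlation of log2 fold-changes, skipping infinite or non-positive values."""
    pairs = [
        (math.log2(a), math.log2(b))
        for a, b in zip(rnaseq_fc, qpcr_fc)
        if math.isfinite(a) and math.isfinite(b) and a > 0 and b > 0
    ]
    x, y = zip(*pairs)
    return pearson(ranks(x), ranks(y))

# Hypothetical fold-changes; the fourth contig was detected in one condition only.
toy_rnaseq = [2.0, 8.0, 0.5, math.inf, 16.0, 4.0]
toy_qpcr = [1.5, 6.0, 0.7, 30.0, 5.0, 2.5]
rho = spearman_log2fc(toy_rnaseq, toy_qpcr)
```

Because the statistic depends only on ranks, taking log2 does not change its value; the transformation is kept here simply to mirror the description in the text.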

Genes involved in metabolite transport

Given the intimate relationship between the symbiotic partners, transporters involved in moving metabolites between compartments seem likely to be of special importance in maintaining the symbiosis. To identify such transporters, we screened the differentially expressed transcripts associated with the GO term “P:transport” for those encoding putative transporters of small molecules. Although the GO annotation of Aiptasia is incomplete, we were able to identify 48 up-regulated and 18 down-regulated transcripts encoding putative transporters and transport-related proteins (Appendix 1, Supplementary Table 2). We focus in what follows on the 15 such proteins that were most highly up-regulated in symbiotic anemones (Table 7).

Table 3-7 Transport-related proteins that were strongly up-regulated in symbiotic anemones.^a

| Line | Fold-change^b | Fold-change^c | Locus#/Transcript# | Best Blast hit | UniProt Accession No. | Blast-hit E-value |
|---|---|---|---|---|---|---|
| 1 | 11 | 6.3 | 86800/1 | Human facilitated glucose transporter (GLUT8) | Q9NY64 | 9e-89 |
| 2 | 3.7 | n.d. | 11708/1 | Human facilitated glucose transporter (GLUT8) | Q9NY64 | 1e-88 |
| 3 | ∞ | n.d. | 36456/1 | Rabbit Na+/(glucose/myo-inositol) transporter 2 | Q28728 | 3e-104 |
| 4 | 5.8 | n.d. | 45451/1 | Drosophila lipid-droplet surface-binding protein 2 | Q9VXY7 | 2e-08 |
| 5 | 28 | 3.7 | 77179/1 | Human scavenger receptor class B member 1 | Q8WTV0 | 9e-65 |
| 6 | 44 | 57 | 125065/1 | Drosophila organic-cation (carnitine) transporter | Q9VCA2 | 6e-35 |
| 7 | 600 | 26 | 102514/1^d | Human Npc2 cholesterol transporter | P61916 | 2e-14 |
| 8 | ∞ | 29 | 58798/1 | Bovine Na+- and Cl−-dependent taurine transporter | Q9MZ34 | 1e-169 |
| 9 | 4.9 | 6.2 | 95114/1 | Mouse aromatic-amino-acid transporter 1 | Q3U9N9 | 3e-65 |
| 10 | 6.9 | n.d. | 12006/1 | Xenopus GABA and glycine transporter | Q6PF45 | 8e-60 |
| 11 | 4.3 | n.d. | 84720/1 | Fish (Tribolodon) carbonic anhydrase II | Q8UWA5 | 2e-36 |
| 12 | 13 | 2.2 | 65589/1 | Sheep aquaporin-5 | Q866S3 | 8e-37 |
| 13 | 4.3 | n.d. | 2130/2 | Pig aquaporin-3 | A9Y006 | 1e-68 |
| 14 | 130 | n.d. | 60777/1 | Zebrafish NH4+ transporter rh type b | Q7T070 | 3e-98 |
| 15 | 5.9 | 7.0 | 70728/1 | C. elegans NH4+ transporter 1 (AMT1-type) | P54145 | 6e-72 |

a Putative small-molecule transporters and some proteins of related function are arranged in the order of their discussion in the text.
b By RNA-Seq (see Appendix 1, Supplementary Table 2). The arithmetic mean of the values from Experiments 1 and 2 is shown except for transcript 77179/1 (line 5). ∞, expression was not detected in aposymbiotic animals. Transcript 77179/1 was detected in aposymbiotic anemones in Experiment 1 but not in Experiment 2, giving a nominal ∞-fold change in expression in that experiment. However, as the normalized read counts in both experiments were rather low, and the possible involvement of the 77179/1-encoded protein in lipid metabolism makes it likely to have been affected in its expression by the starvation conditions used in Experiment 2, we indicate here the more conservative value from Experiment 1 alone.
c By qPCR (see Appendix 1, Supplementary Table 1). n.d., not determined.
d Encoding putative protein Npc2D.

Transport of photosynthetically fixed carbon and other organic metabolites

Among the transcripts strongly upregulated in symbiotic anemones were two (Table 7, lines 1 and 2) that encode proteins closely related (~39% identity in amino-acid sequence) to the mammalian facilitative glucose transporter GLUT8, which localizes to the endosome membrane [123]. This localization depends on an N-terminal dileucine motif, and indeed dileucines are present at amino acids 32-33 and 26-27 of the two Aiptasia GLUT8 proteins. One or both of the Aiptasia GLUT8 proteins are thus likely to be involved in the transport of photosynthetically produced glucose across the symbiosome membrane into the host cytoplasm (see Discussion). However, it should also be noted that the transcript encoding a predicted Na+-glucose/myo-inositol co-transporter was detected only in symbiotic anemones (Table 7, line 3), while a transcript encoding a related protein was also upregulated 2.2-fold (Appendix 1, Supplementary Table 2, line 26). Interestingly, a transcript encoding a third member of this protein class was strongly downregulated in symbiotic anemones (Appendix 1, Supplementary Table 2, line 65).

Lipids may also be an important energy currency in symbiotic animals (see further discussion below), and in this regard it is interesting that the transcripts for a putative lipid-droplet surface-binding protein (potentially involved in the mobilization of stored fats for transport), a protein similar to scavenger receptor class B member 1 (related to CD36-type fatty-acid transport proteins), and a putative carnitine transporter (potentially involved in entry of fatty acids into mitochondria for degradation) were all strongly upregulated in symbiotic animals (Table 7, lines 4-6). In the last regard, it should also be noted that the transcripts for several putative acyl-carnitine transferases were also upregulated in symbiotic anemones (Appendix 1, Supplementary Table 2, lines 25, 43, and 44).
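A check for an N-terminal dileucine motif of the kind mentioned above can be sketched as a simple scan for adjacent leucines near the start of a protein. The 40-residue window and the toy sequence are assumptions made for illustration; they are not the Aiptasia GLUT8 sequences or the criterion actually applied.

```python
def dileucine_positions(protein, window=40):
    """1-based start positions of 'LL' within the first `window` residues.

    The window size is an illustrative assumption, not a published threshold.
    """
    n_term = protein[:window].upper()
    return [i + 1 for i in range(len(n_term) - 1) if n_term[i:i + 2] == "LL"]

# Hypothetical toy sequence with leucines at positions 26 and 27.
toy_protein = "MSPEDRETQP" + "A" * 15 + "LL" + "G" * 30
hits = dileucine_positions(toy_protein)
```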

Figure 3-2 Npc2-like proteins that putatively do or do not have the ability to transport cholesterol. (A) Consensus phylogenetic tree constructed from alignments (Supplementary Figure 1B) of 25 Npc2-like proteins (see Materials and Methods). Oscarella carmela, a sponge, served as the outgroup, and the single human Npc2 protein, the single mouse Npc2 protein, and one of the eight Drosophila melanogaster Npc2 proteins were included in the analysis. The cnidarian sequences included are from two corals (Acropora digitifera and Montastraea faveolata), three anemones (Aiptasia sp., Nematostella vectensis, and Anemonia viridis), and a hydrozoan (Hydra magnipapillata). The Npc2-encoding transcripts found to be upregulated in symbiotic anemones, which fall outside the clade containing the mammalian and Drosophila sequences, are shown in red with their fold-changes (Table 7, line 7 [127]). Light blue and pink shading indicate the groups of anthozoan proteins in the cladogram to which the sequence displays in B correspond. Numbers indicate the bootstrap values for the branches indicated. (B, lower) Amino acids highly conserved in animal (including some cnidarian) Npc2 proteins and thought to be involved in cholesterol binding (see text). The mammalian proteins both have the sequences F…PVK, and the Drosophila Npc2A sequence is F…PVL. (B, upper) The variety of amino acids found at the corresponding positions in members of the other protein clade. The differentially regulated Aiptasia and A. viridis proteins both have the sequence L…SID.

Also dramatically upregulated was the transcript for a member (Npc2D) of the Npc2 protein family (Table 7, line 7; Appendix 1, Supplementary Figure 1). In mammalian and Drosophila cells, Npc2 binds cholesterol in the lumen of the endosome and lysosome and transfers it to Npc1, a transmembrane protein that exports the cholesterol to other intracellular locations [124–126]. Consistent with a previous study of the anemone Anemonia viridis [127], we identified multiple transcripts encoding Npc2-like proteins in the Aiptasia transcriptome as well as in the transcriptomes of three other cnidarians. A multiple-sequence alignment and Bayesian phylogenetic analysis identified two subclades reflecting at least one duplication event in the anthozoan lineage (Figure 2A). One subclade (including the Npc2A proteins of both A. viridis and Aiptasia) clustered with the canonical Npc2 proteins found in most animals (including mammals and Drosophila), while the second subclade contained both the A. viridis [78,127] and Aiptasia Npc2D proteins that are upregulated during symbiosis. Strikingly, all of the proteins in this second subclade have sequence alterations at conserved positions in the sterol-binding site (Figure 2B). Mutations to alanine at these positions are known to disrupt cholesterol binding in human cells [128,129], raising interesting questions about the roles of these proteins in symbiotic cnidarians (see Discussion).

Other putative organic-metabolite transporters also showed large changes in expression. In particular, a transcript encoding a putative taurine transporter was detected only in symbiotic anemones (Table 7, line 8), while the transcripts for an aromatic-amino-acid transporter and a GABA/glycine transporter were upregulated 4.9- and 6.9-fold, respectively (Table 7, lines 9 and 10). Interestingly, taurine has been reported to comprise ~35% of the amino-acid pool in symbiotic Aiptasia [41], although its specific functions are not well understood. Determining the intracellular localization of the transporter identified here could provide insight into the possible function(s) of taurine. The transcripts for other putative amino-acid transporters also showed significant differences in expression between symbiotic and aposymbiotic anemones (Appendix 1, Supplementary Table 2, lines 16, 28, 30, 34, 39, 40, 45, 52, and 55), suggesting that the establishment of symbiosis produces profound changes in amino-acid transport and metabolism (see Discussion).

Transport of inorganic nutrients

CO2 is an excreted waste product for animals such as aposymbiotic anemones, but it is required for photosynthesis when dinoflagellate symbionts are present. It may not require specific transporters if it can diffuse freely across cellular membranes. However, in order to maintain a high concentration of inorganic carbon in the symbiosome, the host may need to convert CO2 to the less freely diffusing bicarbonate anion. We identified one carbonic-anhydrase gene that was upregulated 4.3-fold in symbiotic anemones (Table 7, line 11), while a second gene was downregulated 3-fold (Appendix 1, Supplementary Table 2, line 63). In addition, it is not clear that CO2 diffuses sufficiently rapidly through the relevant membranes to support efficient photosynthesis, and some studies have suggested that aquaporins may play a role in facilitating this diffusion [130,131]. Thus, it is of interest that we found two aquaporins to be up-regulated 13- and 4.3-fold in symbiotic anemones (Table 7, rows 12 and 13).

Although aposymbiotic anemones, like other aquatic animals, excrete excess (and potentially toxic) ammonium produced by amino-acid breakdown [49], symbiotic anemones need to supply nitrogen to their dinoflagellates. Thus, it was not surprising that we found differentially expressed genes encoding ammonium transporters. These genes were in both of the two major families found in animals: a "rhesus-like" gene and an "AMT-like" gene were upregulated 130- and 5.9-fold, respectively, in symbiotic anemones (Table 7, rows 14 and 15), suggesting that they might be involved with ammonium supply to the dinoflagellate, whereas another rhesus-like transporter was down-regulated 2.9-fold (Appendix 1, Supplementary Table 2, row 61), suggesting that it might be involved in ammonium excretion.

The host must also supply other inorganic nutrients to the algae. For example, phosphate and sulfate must be translocated across the symbiosome membrane either as the inorganic ions or as part of some organic metabolite. In this regard, it is of interest that we found the genes for two putative inorganic-phosphate transporters to be up-regulated ~2-fold in symbiotic anemones, a gene for a putative UDP-sugar transporter to be up-regulated 2.7-fold, and a gene for a putative sulfate transporter to be up-regulated 3.1-fold (Appendix 1, Supplementary Table 2, lines 19, 22, 27 and 42). In addition, although it is not clear why, zinc is apparently absorbed to a greater extent by symbiotic than aposymbiotic anemones [47], with increased concentrations in both animal and dinoflagellate, and we found genes for three putative zinc transporters, in two different families, to be up-regulated 1.7- to 2.6-fold (Appendix 1, Supplementary Table 2, rows 23, 36, and 46).

Genes controlling certain metabolic pathways

To explore the integration of metabolite transport with the overall regulation of metabolic pathways, we looked for the presence and coordinated regulation of genes encoding the enzymes of particular pathways that we hypothesized might be involved in the animal's response to the presence of a symbiont. For these analyses, we used the full transcriptome but only the expression data from RNA-Seq Experiment 1, because the starvation of the aposymbiotic anemones in RNA-Seq Experiment 2 seemed likely to have had a strong effect on the expression of metabolic-pathway genes.

Lipid metabolism

There appear to be systematic changes in lipid metabolism between symbiotic and aposymbiotic anemones. Four genes encoding enzymes involved in fatty-acid synthesis (acetyl-CoA carboxylase, a fatty-acid elongase, and ∆5- and ∆6-fatty-acid desaturases) were upregulated 3.5- to 6.2-fold (Appendix 1, Supplementary Table 3, lines 1-4), and at least nine genes encoding proteins putatively involved in lipid storage or its regulation were also differentially regulated (Appendix 1, Supplementary Table 3, lines 5-13). In addition, many genes involved in β-oxidation of fatty acids were upregulated in symbiotic anemones (Figure 3; Appendix 1, Supplementary Table 3, lines 17-21, 23, 24, and 30). Although some of the fold-changes were not large, the consistency is striking, and gastrodermal and epidermal cells may well differ in their expression patterns in ways that obscure the full extent of the changes in a particular cell population (see Discussion).

[Figure 3-3 diagram: pathway schematic showing fatty-acid uptake at the cell membrane (FATP1/4, SRB1), activation to acyl-CoA (ACSL4, ACSL5), carnitine-dependent transport across the outer and inner mitochondrial membranes (CPT1, CACT, CPT2), and β-oxidation (VLCAD, MCAD, SCAD, MTP, DCI, crotonase, M/SCHAD, MCKAT), annotated with fold-changes.]

Figure 3-3 Expression changes of genes governing β-oxidation of fatty acids. The diagram (adapted from [207]) shows the localization of proteins involved in fatty-acid transport and β-oxidation in relation to the membranes of the mitochondrion and cell (as known from other animal cells). Statistically significant expression changes from RNA-Seq Experiment 1 are shown where applicable; upregulation in symbiotic relative to aposymbiotic anemones is shown by positive/red numbers, and downregulation is shown by negative/blue numbers. Scavenger receptor class B member 1 (SRB1; CD36-related protein) and FATP1/4, putative fatty-acid transporters at the cell surface; ACSL4 and ACSL5, enzymes that convert free fatty acids to fatty acyl-CoA esters; CPT1, CPT2, and CACT, proteins involved in transporting fatty acyl-CoA esters across the mitochondrial membranes; VLCAD, MTP, MCAD, SCAD, M/SCHAD, and MCKAT, enzymes responsible for β-oxidation; DCI, converts fatty acids with double bonds starting at odd-numbered positions to fatty acids with double bonds starting at even-numbered positions; crotonase, hydrates double bonds that start at even-numbered positions. See Appendix 1, Supplementary Table 3, lines 14-30, for full protein names, UniProt Accession Numbers, and transcript numbers.

Finally, although the glyoxylate cycle (which allows cells to achieve a net synthesis of longer carbon chains from two-carbon units such as those derived by β-oxidation) is not generally present in animal cells, we identified genes putatively encoding its two key enzymes, isocitrate lyase and malate synthase (Appendix 1, Supplementary Table 3, lines 31 and 32), consistent with a previous report of the presence of this cycle in cnidarians [132]. Although the malate-synthase transcript showed no statistically significant differential expression, the isocitrate-lyase transcript was upregulated 3.9-fold in symbiotic anemones. Interestingly, we did not see significant upregulation of the genes encoding the enzymes responsible for metabolizing medium- and short-chain fatty acyl-CoA (MCAD, SCAD, crotonase, and M/SCHAD in Figure 3; Appendix 1, Supplementary Table 3, lines 25-28), suggesting that the metabolic change accompanying the establishment of symbiosis primarily involves long-chain and/or very-long-chain fatty acids.

Amino-acid metabolism and the SAM cycle

Consistent with previous observations [42], we found that the transcript for a putative glutamine synthetase was upregulated in symbiotic anemones, as was the transcript for a putative NADPH-dependent glutamate synthase (Figure 4). These data suggest that the epidermal cells, the gastrodermal cells, or both synthesize glutamate via a complete GS-GOGAT cycle [133] rather than (or in addition to) simply obtaining it from the dinoflagellate. We also identified both upregulated and downregulated genes that encode putative glutamate dehydrogenases (Figure 4), which normally catabolize glutamate to α-ketoglutarate and ammonium in animal cells (where the concentrations of ammonium are typically too low to allow the reverse reaction to proceed effectively). The subcellular-localization program WoLF PSORT [134] predicts that the downregulated and upregulated enzymes should localize to the mitochondria and cytosol, respectively, consistent with a previous report that corals contain both mitochondrial and cytosolic glutamate dehydrogenases [135]. It seems likely that these initially rather puzzling observations (upregulation of one glutamate dehydrogenase and downregulation of another; upregulation of enzymes both of glutamate synthesis and of glutamate breakdown) reflect the differing metabolic needs of different cell types, and/or of different compartments within the same cells, in symbiotic anemones.

[Figure 3-4 diagram: schematic of the interconversion of NH4+, α-ketoglutarate, glutamate, and glutamine by two glutamate dehydrogenases (cytosolic? and mitochondrial?; fold-changes -3.1 and 13), glutamine synthetase (3.0-fold), and glutamate synthase (1.9-fold).]

Figure 3-4 Expression changes of genes governing glutamine and glutamate metabolism. Upregulation in symbiotic relative to aposymbiotic anemones is shown by positive/red numbers, and downregulation is shown by negative/blue numbers. For UniProt and transcript numbers, see Appendix 1, Supplementary Table 4, lines 1-4. The possible localizations of the glutamate dehydrogenases are discussed in the text.

We also observed multiple changes in the expression of genes governing the metabolism of sulfur-containing amino-acids and the S-adenosylmethionine (SAM) cycle (Figure 5). Based on the failure to find a gene encoding cystathionine β-synthase (CBS) in the A. digitifera genome, it has been hypothesized that cysteine is an essential amino acid in cnidarians that must be obtained directly from either prey or the symbiont [74]. However, we found an Aiptasia transcript encoding a CBS [best BLAST hit, rabbit CBS (Q9N0V7); E-value 9e-166], suggesting that anthozoans resemble other animals in their ability to synthesize cysteine from methionine. The CBS transcript was downregulated 2.1-fold in symbiotic anemones, perhaps reflecting a decreased need for cysteine synthesis in the host because it is being supplied directly by the dinoflagellate. Conceivably in the more obligately symbiotic corals, the enzyme is never needed and the gene has been lost altogether. It should also be noted that cysteine synthesis via the CBS pathway is a drain on the homocysteine pool, which otherwise remains available for the synthesis of methionine and the SAM cycle. The concordant upregulation of four genes encoding enzymes of the SAM cycle (Figure 5) suggests that it may assume an increased importance in symbiotic animals, although the very wide range of possible methylation targets makes it difficult to guess at the precise biological significance of this regulation. The apparent switch of pathways used for synthesis of methionine from homocysteine (Figure 5) may also be related, in that it could reflect an alteration in the kinetics and/or localization of the SAM cycle.

[Figure 3-5 diagram: schematic of the SAM cycle and the metabolism of sulfur-containing amino acids, showing methionine, S-adenosylmethionine, S-adenosylhomocysteine, homocysteine, cystathionine, cysteine, and homoserine, with the enzymes MAT (1.7-fold), MS (3.4-fold), BHMT (-5.3-fold), MTHFR (11-fold), SAHH (1.8-fold), CBS (-2.1-fold), methyltransferases, CGS, HAT, and CGL.]

Figure 3-5 Expression changes of genes governing the metabolism of sulfur-containing amino acids and the S-adenosylmethionine (SAM) cycle. Upregulation in symbiotic relative to aposymbiotic anemones is shown by positive/red numbers, and downregulation is shown by negative/blue numbers. For full names of enzymes, UniProt Accession Numbers, and transcript numbers, see Supplementary Table 4, lines 7, 8, 10-14, 18, and 19. THF, tetrahydrofolate; DMG, dimethylglycine.

Interestingly, despite the presence of a CBS, we could not find a gene(s) encoding aspartokinase or homoserine dehydrogenase in the transcriptomes of symbiotic or aposymbiotic Aiptasia or in the A. digitifera genome ([74]; Appendix 1, Supplementary Table 4, lines 15-17), implying that anthozoans would be unable to achieve a net synthesis of homoserine (and hence of homocysteine and other sulfur-containing amino acids) from central metabolic intermediates. If confirmed, this would be consistent with the situation in other animals (where methionine is an amino acid essential in the diet) [40] but surprisingly inconsistent with labeling results indicating synthesis of methionine by starved, aposymbiotic anemones [45]. A related puzzle is that the Aiptasia transcriptome and the A. digitifera genome appear to contain genes encoding both a homoserine O-acetyltransferase and a cystathionine γ-synthase, which would allow the synthesis of cystathionine from homoserine, but not a cystathionine β-lyase, which in many microorganisms is responsible for the synthesis of homocysteine from cystathionine (Appendix 1, Supplementary Table 4, lines 18-20). Further studies will be needed to resolve these issues.

These questions about methionine and cysteine metabolism raised a broader question about the degree to which the amino-acid-biosynthetic capabilities of anthozoans resemble those of better-characterized animals, in which 12 of the 20 amino acids needed for protein synthesis cannot be synthesized from central-pathway intermediates and so must be obtained (directly or indirectly) from the diet. To address this question, we asked if the elements of amino-acid-biosynthetic pathways were present in the Aiptasia transcriptome. As expected, it appears that Aiptasia should be able to synthesize the eight generally nonessential amino acids from intermediates in the central metabolic pathways (Appendix 1, Supplementary Table 4, lines 1, 2, 22-36). In addition, like other animals, they should be able to synthesize arginine from ornithine via the urea cycle (Appendix 1, Supplementary Table 4, lines 37-42), although a net synthesis of ornithine and arginine would not be possible because of the apparent lack of either an acetylglutamate kinase or an ornithine acetyltransferase (Supplementary Table 4, lines 43 and 44). Similarly, although the Aiptasia transcriptome revealed genes encoding various enzymes involved in interconversions within other groups of amino acids, key enzymes needed to synthesize these groups of amino acids from central-pathway intermediates appear to be missing (Appendix 1, Supplementary Table 4, lines 15-17, 45-79). Thus, Aiptasia, like other animals, apparently must obtain 12 amino acids (or their amino-acid precursors) from their food, their dinoflagellate symbionts, or both. This conclusion is generally compatible with radiolabeling studies suggesting that leucine, isoleucine, valine, histidine, lysine, phenylalanine, and tyrosine are all translocated from the dinoflagellates to the host [45].

Genes potentially involved in host tolerance of dinoflagellates

To take an unbiased approach to the identification of other genes that might be involved in maintenance of the symbiosis, we used the DAVID program to identify biological processes (based on GO terms) that were significantly overrepresented among the differentially expressed transcripts (see Materials and Methods). Among the groups of genes identified in this way were three that are potentially involved in the animal host's tolerance of the symbiotic dinoflagellates.
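The idea of a GO term being overrepresented among the differentially expressed transcripts can be illustrated with a plain hypergeometric tail probability. The sketch below is only an illustration under stated assumptions: it is not the statistic that DAVID itself reports, the function name and its parameters are hypothetical, and the numbers in the usage lines are invented for demonstration and are not taken from this study.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_term_background, n_selected, n_term_selected):
    """Upper-tail hypergeometric probability P(X >= n_term_selected).

    n_background      -- transcripts in the background set
    n_term_background -- background transcripts annotated with the GO term
    n_selected        -- differentially expressed transcripts
    n_term_selected   -- differentially expressed transcripts with the GO term
    (all four names are hypothetical, chosen for this illustration)
    """
    upper = min(n_term_background, n_selected)
    total = comb(n_background, n_selected)
    # Sum the exact probabilities of observing k annotated transcripts
    # for every k at least as large as the observed count.
    tail = sum(
        comb(n_term_background, k)
        * comb(n_background - n_term_background, n_selected - k)
        for k in range(n_term_selected, upper + 1)
    )
    return tail / total


# Invented example: 1000 background transcripts, 50 carrying the term,
# 100 differentially expressed, 15 of which carry the term.
p_value = hypergeom_enrichment_p(1000, 50, 100, 15)
print(f"enrichment p = {p_value:.3g}")
```

A small p-value here indicates only that the term is more frequent in the selected set than expected by chance; in practice such values would also need correction for the number of GO terms tested.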

Figure 3-6 Expression changes of genes with functions that may relate to host tolerance of the symbiont. Functionally related groups of genes (by GO-term assignments) that were significantly enriched among the differentially expressed genes relative to the background transcriptome were identified as described in the text. Fold-changes are shown as expression in symbiotic anemones relative to that in aposymbiotic anemones. ¢, putatively pro-inflammatory; p, putatively anti-inflammatory; *, highly up-regulated (28-fold in Experiment 1 and detected only in symbiotic animals in Experiment 2; see also Supplementary Table 2, footnote b); u, 12-fold-change; Ë, 44-fold-change; ê, 60-fold-change. For additional details, see Supplementary Table 5.

Response to oxidative stress

As the presence of an intracellular photosynthetic symbiont presumably imposes oxidative stress on an animal host (see Discussion), it was quite surprising that of the eight differentially expressed genes identified under this GO term, six (including a catalase gene) were actually downregulated in symbiotic animals (Figure 6A). Moreover, one of the two upregulated genes encodes a predicted guanylate cyclase, which may have many functions unrelated to oxidative stress. The other upregulated gene is one of a pair encoding distinct Aiptasia proteins (Appendix 1, Supplementary Figure 2A) that had a human peroxidasin as their top blastx hit (Appendix 1, Supplementary Table 5, section A). However, a function for these proteins in coping with oxidative stress is doubtful for two reasons. First, the second gene is downregulated in symbiotic animals (Figure 6A). Second, despite the blastx results, neither of the Aiptasia proteins contains the peroxidase domain found in canonical peroxidasins with peroxidase activity [136] (Appendix 1, Supplementary Figure 2A,B).

Inflammation/tissue remodeling/response to wounding

Of the 15 differentially expressed genes associated with this cluster of GO terms, eight were downregulated and seven were upregulated in symbiotic animals (Figure 6B). Despite this heterogeneity, a suggestive pattern was observed in which genes encoding proteins whose homologues are considered pro-inflammatory were mostly downregulated, whereas the three genes encoding proteins whose homologues are considered anti-inflammatory were all upregulated (Figure 6B; Appendix 1, Supplementary Table 5). The pattern appears even stronger when it is noted that one of the three upregulated genes with a putatively pro-inflammatory function encodes just one of at least three distinct Aiptasia plasma-kallikrein homologues (Appendix 1, Supplementary Figure 2C), the other two of which are downregulated (Figure 6B), and that the upregulated ficolin may function specifically in recognition of Symbiodinium rather than just as a general activator of the complement innate-immunity pathway [Logan et al., 2010]. Thus, a downward modulation of the host's inflammatory response may contribute to allowing the persistence of dinoflagellate symbionts.

Apoptosis/cell death

Of the 13 differentially expressed genes associated with this pair of GO terms, five were downregulated in symbiotic animals but eight were upregulated, including several with large fold-changes in expression (Figure 6B). Thus, it seems possible that an increased activity of cell-death pathways may be required for the animal to cope with the presence of the symbiotic dinoflagellate even under conditions considered to be nonstressful (see Discussion).

Discussion

To explore the cellular and molecular basis of the cnidarian-dinoflagellate symbiosis, we undertook a global analysis of the transcriptomes of symbiotic and aposymbiotic Aiptasia. This study has yielded (i) extensive transcriptome assemblies for both the anemone and its symbiont; (ii) novel hypotheses about changes in metabolism and metabolite transport that may occur in the host upon symbiosis establishment; and (iii) novel hypotheses about genes that may be involved in symbiont recognition and tolerance by the host. The transcriptomes also provide a reference for future studies of gene expression under other conditions such as exposure to various stresses.

Transcriptome assembly and annotation

We have sequenced, assembled, and partially characterized the transcriptomes of both a clonal stock of symbiotic Aiptasia and the endogenous clade A symbionts present in that stock. The animal and algal transcripts were separated bioinformatically using the TopSort algorithm that we developed for this purpose and comparisons to genomic sequence obtained from fully aposymbiotic animals. Although the assemblies appear to be of high quality overall, there remain some areas where improvements could be made. For example, it remains unclear both why reads from some genes assembled into multiple contigs that clustered based on regions of nucleotide identity (up to 230 contigs in a cluster) and why a small set of accession numbers have so many representative transcripts aligning to them. As it seems unlikely that alternative splicing and gene duplications alone could explain the magnitude of the effects observed, we presume that they result from some combination of these factors and the inherent complexities of de novo transcriptome assembly. For example, two of the contig clusters with the most members encode putative actins and olfactory C proteins. In most animals, actins are encoded by families of genes and expressed at high levels, whereas olfactory C proteins are members of very large gene families. Highly abundant transcripts may have some error-containing reads that recur with sufficient frequency that they are assembled into distinct contigs, and gene families with many members could generate chimeric contigs if they have identical subsequences that are longer than the k-mers used for assembly. In addition, templates derived from more than one gene may arise during the reverse transcription or PCR steps of library preparation; the resulting fusion reads may lead to incorrect assembly of contigs, as well as increase the proportion of such contigs substantially when the Oases merge function is used (from 3.6 to 12.2% in one reported case: [137]). The question about accession numbers can perhaps be explained by similar mechanisms. One approach that might help with these problems would be to assemble the reads initially with a greater k-mer length and coverage cut-off to obtain fewer misassemblies for the most abundant transcripts and gene families, remove the reads that map to these transcripts, and then assemble the remaining lower-coverage reads with less stringent thresholds.
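The point that members of a gene family can be joined into chimeric contigs when they share identical subsequences longer than the assembly k-mer, and that a greater k-mer length reduces this risk, can be illustrated with a toy calculation. The following is a minimal sketch and not part of the assembly pipeline used here; the two sequences are invented, and the function names are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(seq_a, seq_b, k):
    """Return the k-mers present in both sequences.

    Any shared k-mer is a point at which a de Bruijn-graph assembler
    could, in principle, connect the two transcripts.
    """
    return kmers(seq_a, k) & kmers(seq_b, k)


# Two invented 'paralogs' that share one identical 11-nt stretch.
paralog_a = "ATGGCCTTAGGCTAACGTTGCA"
paralog_b = "CCGTAGGCTAACGTAAGT"

for k in (7, 12):
    n_shared = len(shared_kmers(paralog_a, paralog_b, k))
    print(f"k = {k}: {n_shared} shared k-mers")
```

With the shorter k, the identical stretch yields shared k-mers that link the two sequences; once k exceeds the length of the identical stretch, no k-mer is shared and the two sequences can no longer be joined through it.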

The assemblies could also be improved by investigating more closely the contigs for which the species of origin could not be determined with the methods used to date. To this end, the performance of TopSort could probably be improved in one or more of several ways. First, it could be retrained with a new dataset that includes transcriptome or genome sequence from an axenic Clade A Symbiodinium strain (to improve recognition of contigs from the Clade A strain resident in Aiptasia stock CC7). Second, it could be extended such that it assigns contigs not just to the four groups used to date (cnidarian, dinoflagellate, fungi, and bacteria) but also to other groups (such as diatoms and ciliates) that may be present in nontrivial amounts in the guts and/or mucous layers of the anemones; this would also require retraining with a dataset that included unequivocal sequences from those groups. Third, the classification metrics could be extended to include also other sequence features found to be specific to the phyla of interest (such as the spliced leader sequences thought to be present in many dinoflagellate transcripts [20]). In addition to improving the performance of TopSort, its assignments could also be tested further using alignment to transcriptome or genome sequence from a clonal, axenic Symbiodinium strain (preferably of Clade A), essentially as we have already done using Aiptasia genome sequence. Ultimately, however, some contigs may remain ambiguous in assignment until assembled genomes of both partner organisms are available for alignment.

Several additional issues will require more investigation as the relevant resources become available. First, the numbers of unigenes found here for both symbiotic partners (~14,000 by conservative estimate) are significantly fewer than those expected for the full genomes. For Symbiodinium, this probably reflects, at least in part, the existence of many genes that are expressed at significant levels only in free-living and/or stressed organisms. For Aiptasia, it presumably reflects the absence in the current transcriptome of genes that are expressed at significant levels only in specialized and non-abundant cell types (e.g., in nerve and muscle), at other stages in development (e.g., in embryos and larvae), or under other environmental conditions. Second, only 62 and 74% of representative Aiptasia transcripts could be annotated using SwissProt and nr, respectively, and these numbers were even lower (35 and 49%) for Symbiodinium. This incomplete annotation presumably reflects the poor representation in the databases of genes and proteins unique to these relatively understudied organisms, as well as the great phylogenetic distance of the dinoflagellates from more intensively studied groups. Finally, the Symbiodinium transcriptome reported here has not yet been analyzed in depth, in part because of a lack of informative comparisons to be made at this time. However, we should soon be able to use this transcriptome to make interesting comparisons of gene expression in this Symbiodinium strain growing in culture vs. in hospite, in this strain after exposure to various stressors, and in different Symbiodinium strains grown in this same host.

Differential expression of animal genes

Based on their consistent behavior in two separate RNA-Seq experiments done under somewhat different conditions, at least 920 genes appear to have significantly different expression between symbiotic and aposymbiotic anemones (see Appendix 1, Supplementary Material for the full list), including ≥85 for which there is an ≥5-fold change in expression. These findings show the value of a comprehensive analysis of differential expression, as earlier studies using microarrays had found much smaller numbers of differentially expressed genes [127,138–143]. We have focused our more detailed analyses to date on several groups of genes whose differential expression suggests interesting hypotheses about the biology of the symbiosis.
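The requirement of consistent behavior in two separate experiments, together with a fold-change threshold, can be pictured as a simple filter. The sketch below is illustrative only: it assumes that significance testing has already been done upstream (the actual criteria are those given in Materials and Methods and are not reproduced here), it takes log2 fold-changes as input, and the gene names, values, and function name are all hypothetical.

```python
from math import log2


def consistent_changes(exp1, exp2, min_abs_log2=0.0):
    """Keep genes whose log2 fold-changes agree in sign in both experiments.

    exp1, exp2   -- dicts mapping gene name to log2(symbiotic/aposymbiotic)
    min_abs_log2 -- minimum magnitude required in BOTH experiments
    Returns a dict mapping each retained gene to its pair of values.
    """
    kept = {}
    for gene in exp1.keys() & exp2.keys():
        a, b = exp1[gene], exp2[gene]
        same_direction = (a > 0 and b > 0) or (a < 0 and b < 0)
        if same_direction and min(abs(a), abs(b)) >= min_abs_log2:
            kept[gene] = (a, b)
    return kept


# Invented values for illustration only.
experiment_1 = {"geneA": 3.0, "geneB": -1.0, "geneC": 2.5, "geneD": 0.5}
experiment_2 = {"geneA": 2.6, "geneB": 1.2, "geneC": -2.4, "geneE": 4.0}

all_consistent = consistent_changes(experiment_1, experiment_2)
at_least_5_fold = consistent_changes(experiment_1, experiment_2, log2(5))
print(sorted(all_consistent), sorted(at_least_5_fold))
```

Requiring the threshold to be met in both experiments is a deliberately conservative choice for this sketch; other rules (for example, applying it to the mean of the two experiments) would retain more genes.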

Genes controlling metabolism and transport in gastrodermal and epidermal cells

Given the intimate relationship between the symbiotic partners (Figure 1), we were not surprised to observe substantial changes in the expression of many genes encoding proteins involved in small-molecule metabolism and transport. We present here some speculative but testable hypotheses about how these changes might reflect the establishment and maintenance of symbiosis.

Figure 3-7 Summary of hypotheses about metabolism and metabolite transport as suggested by the gene-expression data and previously available information. Letters A through N are for reference in the text. Thick arrows across membranes, transporters hypothesized to be present in those membranes; thick arrows within cells, metabolic pathways hypothesized to be important in those cells; thin arrow, presumed diffusion of Npc2-sterol complexes; Npc1(a), the anemone-derived Npc1-like protein described in the text; Npc1(d), a presumed but as-yet-unidentified sterol transporter produced by the dinoflagellate and present in its plasma membrane; ?, hypotheses that we consider to be more problematic. Not shown because of their potential complexity are the other changes in inorganic-nutrient transport (e.g., a damping of NH4+ and CO2 excretion across the apical plasma membranes of the gastrodermal cells) that are likely to occur upon the onset of symbiosis. All of these hypotheses should be testable through a combination of experiments including protein localization by immunofluorescence and/or cell fractionation, studies of separated gastrodermal and epidermal cell layers, sterol-binding experiments on Npc2 proteins, and others. See text for additional details.

Glucose transport within gastrodermal cells

As glucose appears to be the major form in which fixed carbon is transferred from the dinoflagellate to the host [28,36], it was not surprising to find the transcripts for three presumed glucose transporters among those highly upregulated in symbiotic animals (Table 7, lines 1-3). As mammalian GLUT8 is localized to the endosome membrane [123], it is likely that one or both of the Aiptasia GLUT8 orthologs localize to the symbiosome membrane (Figure 7, A), a hypothesis that should be readily testable by immunolocalization studies once appropriate antibodies are available. The putative Na+-glucose/myo-inositol co-transporters might also be involved in glucose transport across the symbiosome membrane. It should be noted that there must also be a Symbiodinium protein(s) that transports large amounts of glucose into the symbiosome lumen (Figure 7, B); it should be possible to identify the corresponding gene(s) among those expressed differentially in Symbiodinium cells growing in hospite relative to those growing in culture.

Possible glucose transport between cells

To our knowledge, there is currently no information as to whether and how the gastrodermal cells provide energy to the epidermal cells, cells in the mesoglea, and gastrodermal cells that lack dinoflagellates and/or access to nutrients from the gastric cavity, and as to whether these modes of nourishment change upon the establishment of symbiosis. Nourishment of the epidermal cells is a major issue because these cells neither contain algae nor have direct access to food, but they presumably require large amounts of energy for maintenance, reproduction, nematocyst replacement, and mucus production (which is extensive and has been reported to consume as much as 40% of the energy available to corals [52]). Thus, one or more of the upregulated glucose transporters might be found in the basolateral membranes of the gastrodermal cells (Figure 7, C and D), the basolateral membranes of the epidermal cells (Figure 7, E), or both. These questions should be resolvable by immunolocalization experiments and/or experiments in which gene expression is evaluated in separated tissue layers [127].

Fatty-acid metabolism and transport

Our previous study of the transfer of fixed carbon from the algae to the host was more comprehensive in its evaluation of polar than of nonpolar compounds [28]. Thus, it is conceivable that the upregulation of genes encoding proteins of fatty-acid metabolism and transport reflects a significant role of fatty acids in this transfer (Figure 7, F). Importantly, however, (1) we saw upregulation of genes of both fatty-acid synthesis and fatty-acid breakdown (suggesting that gastrodermal and epidermal cells may be behaving differently) and (2) in most animals, high levels of glucose (such as those expected in the cytoplasm of gastrodermal cells harboring algae) inhibit β-oxidation and stimulate fatty-acid synthesis [144]. Thus, we think it more likely that gastrodermal cells synthesize fatty acids from the glucose provided by the algae during the daytime (Figure 7, G), store them in neutral fats or wax esters, and subsequently mobilize them to serve as an energy supply at night and/or for transfer into the mesoglea (Figure 7, H) and then into the epidermal cells (Figure 7, I) [52,145]. The epidermal cells would metabolize the fatty acids by β-oxidation (Figure 7, J) to provide energy and acetyl-CoA building blocks. The apparent upregulation of the glyoxylate cycle (Figure 7, K; see Results) would be explained by the epidermal cells' need to synthesize carbohydrates, amino acids, etc., from acetyl-CoA. Testing of these hypotheses should be possible by immunolocalization of the relevant proteins, in situ mRNA hybridization, and/or examination of gene expression in separated gastrodermal and epidermal tissue layers.

Transport of sterols as building blocks or symbiosis signals

A previous study showed that the anemone A. viridis has at least two genes encoding Npc2-like proteins, one of which is in a sub-family distinct from that of the mammalian and Drosophila Npc2 proteins and was upregulated in symbiotic gastrodermal tissue [127]. We have confirmed and extended these findings by showing that Aiptasia and several other cnidarians also contain multiple genes encoding Npc2-like proteins. In a phylogenetic analysis, one of the Aiptasia proteins clustered with the mammalian and Drosophila proteins and shares with them key residues implicated in cholesterol binding. Four other Aiptasia proteins, including the one (Npc2D) whose transcript is massively upregulated in symbiotic anemones, belong to a separate subfamily that also contains the upregulated A. viridis Npc2D; members of this subfamily do not share the residues implicated in cholesterol binding (Figure 2B). In contrast, we identified only one Aiptasia gene encoding an unequivocal Npc1-like protein; its expression did not change between aposymbiotic and symbiotic animals. These observations suggest the following speculative model (Figure 7, L). A Symbiodinium-encoded sterol transporter (e.g., an Npc1-like protein) is present in the dinoflagellate plasma membrane and passes one or more dinoflagellate-synthesized sterols to the Aiptasia-encoded Npc2D, which is localized specifically to the symbiosome lumen. Npc2D in turn passes the sterol(s) to the Aiptasia-encoded Npc1 in the symbiosome membrane, which passes them to a sterol-carrier protein in the cytosol. If the same Npc1 protein functions in different membranes, in concert with all of the different Npc2 partners, and in other cell types as well as gastrodermal cells containing dinoflagellates, this could explain why its transcript is not significantly upregulated upon the onset of symbiosis. The model to this point is agnostic about what sterols might be transferred, and to what end(s), but these are also important questions. The membranes of Aiptasia, like those of other animals, presumably contain cholesterol as an essential component. This cholesterol could be obtained from food, by de novo synthesis, or by modification of one or more of the distinctive, non-cholesterol sterols (dinosterol, gorgosterols) produced in large quantities by many dinoflagellates (but not by other marine algae that have been investigated) [146]. Based on its sequence (Figure 2B), Aiptasia Npc2A seems the most likely to be involved in cholesterol traffic per se, whereas Npc2D (and Npc2B, C, and E: Figure 2) might all be involved in traffic of various other dinoflagellate-produced sterols. The latter might serve only as precursors of cholesterol, but the more intriguing possibility is that one or more of these molecules serves as the signal that a symbiosis-compatible dinoflagellate is present in the endosome/symbiosome. Although the model of Figure 7, L is speculative, it is important to note that its major features should be testable by experiments that include (i) localization to the gastrodermal and/or epidermal cell layers of the expression of the several genes, (ii) protein localization by immunofluorescence and/or cell fractionation, (iii) using purified, labeled sterols to test the binding specificities of bacterially expressed Npc2 proteins, and (iv) tests of the abilities of exogenously added Npc2 proteins to complement the loss-of-cholesterol-transport phenotype in NPC2-knockout human cells [129]. It is also worth noting that if this model is correct, Npc2D proteins would become an invaluable marker for the isolation of intact (i.e., non-ruptured) symbiosomes.

Transport and metabolism of inorganic nutrients and the coordination of nitrogen and carbon metabolism

It is clear that symbiotic cnidarians must transport CO2, NH4+, and other inorganic nutrients across the symbiosome membrane (and indeed concentrate at least some of these materials within the symbiosome lumen) in order to provide their resident dinoflagellates with these essential building blocks [49]. Thus, at least some of the many inorganic-nutrient transporters and transport-related proteins that are upregulated in symbiotic anemones (see Results) are presumably localized to the symbiosome membrane (Fig. 7, M) or lumen. However, it seems virtually certain that the onset of symbiosis also induces changes in inorganic-nutrient transport across the various plasma-membrane domains. For example, aposymbiotic anemones presumably excrete CO2 and NH4+ across the apical membranes of both epidermal and gastrodermal cells, whereas symbiotic anemones presumably reduce such excretion at least from the gastrodermal cells, and may even achieve a net uptake of both compounds from the environment. Although the possibilities are too many and complicated for ready depiction in Fig. 7, determining the cellular (gastrodermal, epidermal, or both) and intracellular (symbiosome membrane, apical plasma membrane, basolateral plasma membrane, and/or other) localizations of the differentially regulated transporters and transport-related proteins should begin to answer many of these questions with a decisiveness that has not been possible before.

The changes in NH4+ movement upon symbiosis establishment also bear upon the probable linkage between carbon and nitrogen metabolism. In this regard, two non-mutually exclusive models have been put forward. The "nitrogen-recycling model" focuses on the possibility that continued host catabolism of amino acids produces NH4+ that is supplied to the dinoflagellate, which in turn releases amino acids for use by the host (Fig. 7, N) [42,147,148]. In contrast, the "nitrogen-conservation model" focuses on the possibility that the fixed carbon provided by the dinoflagellate leads to a suppression of host amino-acid catabolism and therefore of the generation of NH4+ to be used by the dinoflagellate or excreted [44,149] [42]. Our data provide support for aspects of both models. In particular, we observed upregulation of the genes of the GS-GOGAT cycle (the first dedicated step of NH4+ assimilation in animals) in symbiotic anemones, indicating that the release of carbon from the dinoflagellate promoted synthesis, rather than catabolism, of some amino acids. [The simultaneous upregulation of a presumably catabolic glutamate dehydrogenase might be taken as countervailing evidence, but this observation is difficult to interpret without knowing in which cell type(s) and cytoplasmic compartment this upregulation occurs.] Meanwhile, we also obtained strong support for previous evidence that cnidarians, like other animals, can synthesize only eight of the 20 amino acids found in proteins from intermediates of the central metabolic pathways. The remaining 12 amino acids must thus be obtained either from food or from the dinoflagellates, and our observation that the host genes for at least nine amino-acid transporters are upregulated in symbiotic anemones (see Results) suggests strongly that the dinoflagellates indeed contribute to the host's amino-acid supply. It should be informative to determine the localizations and amino-acid specificities of these host-encoded transporters as well as identify any amino-acid transporters expressed differentially by Symbiodinium in hospite (as it is unlikely that free-living dinoflagellates would export amino acids into the surrounding seawater).

Recognition and tolerance of dinoflagellate symbionts by the host

Establishment and maintenance of the mutualistic relationship also require that the host recognize and tolerate the dinoflagellate symbionts. Thus, it was not surprising that an unbiased screen for functional groups that were enriched among the differentially expressed genes revealed three groups that might be involved in these processes, as discussed below.

Response to oxidative stress

Both a priori logic and considerable experimental evidence support the view that possession of an intracellular photosynthetic symbiont imposes oxidative stress on the host, particularly under conditions in which chloroplast damage may result in enhanced production of reactive oxygen species (ROS) [150–152]. If not detoxified, ROS can damage DNA, proteins, and lipids [153], and it is widely believed that ROS production under stress is the major trigger of symbiosis breakdown during bleaching [106,152,154–156]. Surprisingly, however, our results provide no support for this model: most of the differentially regulated genes in this GO category were actually downregulated in symbiotic animals, and the two that were upregulated do not seem likely to be involved in ROS detoxification (see Results). Previous studies of other species of anemones have also found host genes thought to be involved in ROS detoxification (copper/zinc superoxide dismutase and glutathione S-transferase) to be downregulated in symbiotic relative to aposymbiotic individuals [127,143]. Although other studies have indicated that symbiotic cnidarians have higher superoxide-dismutase activities than their aposymbiotic counterparts [157,158], it was not determined whether the enzyme was of host or dinoflagellate origin. Thus, it is possible that the hosts are protected from ROS by symbiont-generated antioxidants and can reduce the expression of their own enzymes [143]. These other studies, like our own, were conducted under conditions thought to be non-stressful, and it is possible that a different picture would emerge under stressful conditions. In that regard, however, we have also recently observed that bleaching under heat stress can occur rapidly in the dark, when photosynthetically produced ROS cannot be present [159].

Inflammation/tissue remodeling/response to wounding

Inflammation is a protective tissue response to injury or pathogens that serves to destroy, dilute, and/or wall off both the injurious agent and the injured tissue [160]. In invertebrates, including anthozoans, the inflammation-like response involves both cellular and humoral aspects, including the infiltration of immune cells such as amoebocytes and granular cells [161–163], phagocytosis and/or encapsulation of foreign material [161,163–165], and the production of cytotoxic molecules such as ROS, nitric oxide, lysozyme, antimicrobial peptides, and intermediates of the phenoloxidase cascade [162,163,166,167]. Our data suggest that the establishment of symbiosis is associated with an overall attenuation of the inflammatory response (see Results), presumably to allow the dinoflagellate to co-exist peacefully with the host rather than being attacked as a harmful invader. Other studies also support this conclusion and suggest that it may be a general feature of the means by which animal hosts accommodate symbiotic microbes. For example, when symbiotic and aposymbiotic Aiptasia were challenged with bacterial lipopolysaccharide (LPS), the former produced much less nitric oxide than did the latter [168], and in two hard coral species with inflammatory-like responses, dinoflagellate densities were lower in the "inflamed" than in the adjacent healthy tissues [162]. Similarly, successful colonization of squid light organs by symbiotic bacteria is associated with an irreversible attenuation of host nitric oxide production [169,170]. Of particular interest because of its massive upregulation in symbiotic anemones is the gene encoding scavenger receptor B class member 1 (SRB1); upregulation of SRB1 was also observed previously in symbiotic individuals of the anemone Anthopleura [143]. SRB1 is a member of the CD36 protein family and is a transmembrane cell-surface glycoprotein that has been implicated in multiple functions, including lipid transport (see Figure 2, Table 7, and associated text), cell adhesion, wound healing, apoptosis, and innate immunity [171]. Perhaps of most interest is its role in Plasmodium infection, as these apicomplexan parasites are a sister taxon to the dinoflagellates [172]. SRB1 has been shown to boost host hepatocyte permissiveness to Plasmodium infection, promote parasite development by acting as a major lipid provider, and enable adhesion between Plasmodium-infected and uninfected erythrocytes, thus allowing for movement of parasites between host cells [173–175]. It is possible that SRB1 has similar functions in the cnidarian-dinoflagellate symbiosis, and further investigation of its function by protein localization and knockdown of function should be highly informative.

Apoptosis/cell death

The possible roles of apoptosis and necrotic cell death in the breakdown of symbiosis under stress have been investigated [176–183], and a role for apoptosis in the post-phagocytic selection of compatible symbionts has also been suggested [184]. However, the possible role of apoptosis in maintenance of a stable symbiotic relationship has not been addressed experimentally. Others have suggested that apoptosis might contribute to the dynamic equilibrium between host and symbiont cell growth and proliferation that is presumably necessary to ensure a stable relationship [46,185,186], and our observation that 13 apoptosis/cell death-related genes were differentially expressed (some massively so) in symbiotic relative to aposymbiotic anemones is broadly consistent with this possibility. However, the complexity of the apoptotic pathways and the fact that a single protein can have either pro- or anti-apoptotic function depending on its localization and/or the presence/absence of other specific signals make it impossible to draw firm conclusions from gene-expression data alone. Nonetheless, it is worth noting the potential role of tumor-necrosis factor (TNF) family members and their associated proteins, which are prominent regulators of cell survival, proliferation, and differentiation in both vertebrates and invertebrates (reviewed by [187]). We found a TNF-family ligand, a TNF receptor, a receptor-associated factor, and the functionally related "growth-arrest and DNA-damage-inducible protein" all to be upregulated in symbiotic anemones (1.9-, 60-, 1.8-, and 5.1-fold, respectively). These proteins are capable of inducing caspase-dependent apoptosis via at least two different pathways [188–191], as well as of activating the multi-functional NFκB and MAPK pathways [189,190], so that they may coordinate multiple biological processes to regulate symbiotic stability. Interestingly, genes encoding TNF receptors and receptor-associated proteins were also prominent among the genes found to be upregulated in corals living under chronic, mild, heat stress [192].

References

1. Frihy OE, El Ganaini MA, El Sayed WR, Iskander MM (2004) The role of fringing coral reef in beach protection of Hurghada, Gulf of Suez, Red Sea of Egypt. Ecological Engineering 22: 17–25.

2. Carté B (1996) Biomedical potential of marine natural products. Bioscience 46: 271–286.

3. Stoeckl N, Hicks CC, Mills M, Fabricius K, Esparon M, et al. (2011) The economic value of ecosystem services in the Great Barrier Reef: our state of knowledge. Annals of the New York Academy of Sciences 1219: 113–133.

4. Kinsey D, Hopley D (1991) The significance of coral reefs as global carbon sinks—response to Greenhouse. Palaeogeography, Palaeoclimatology, Paleoecology 89: 363–377.

5. De’ath G, Fabricius KE, Sweatman H, Puotinen M (2012) The 27-year decline of coral cover on the Great Barrier Reef and its causes. Proceedings of the National Academy of Sciences of the United States of America 109: 17995–17999.

6. Liu G, Matrosova L, Penland C, Gledhill D, Eakin C, et al. (2008) NOAA Coral Reef Watch coral bleaching outlook system. Proceedings of the 11th International Coral Reef Symposium. Fort Lauderdale, Fl. pp. 7–11.

7. Donner S, Heron S, Skirving W (2009) Future scenarios: a review of modelling efforts to predict the future of coral reefs in an era of climate change. In: van Oppen M, Lough J, editors. Coral Bleaching. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 159–173.

8. Weis VM, Davy SK, Hoegh-Guldberg O, Rodriguez-Lanetty M, Pringle JR (2008) Cell biology in model systems as the key to understanding corals. Trends in Ecology & Evolution 23: 369–376.

9. Sunagawa S, Wilson EC, Thaler M, Smith ML, Caruso C, et al. (2009) Generation and analysis of transcriptomic resources for a model system on the rise: the sea anemone Aiptasia pallida and its dinoflagellate endosymbiont. BMC Genomics 10: 258.

10. Wisecaver JH, Hackett JD (2011) Dinoflagellate genome evolution. Annual Review of Microbiology 65: 369–387.

11. Ladner JT, Barshis DJ, Palumbi SR (2012) Protein evolution in two co-occurring types of Symbiodinium: an exploration into the genetic basis of thermal tolerance in Symbiodinium clade D. BMC Evolutionary Biology 12: 217.

12. Lajeunesse TC, Lambert G, Andersen RA, Coffroth MA, Galbraith DW (2005) Symbiodinium (Pyrrhophyta) genome sizes (DNA content) are smallest among dinoflagellates. Journal of Phycology 41: 880–886.

13. Allen J, Roberts TM, Loeblich A, Klotz L (1975) Characterization of the DNA from the dinoflagellate Crypthecodinium cohnii and implications for nuclear organization. Cell 6: 161–169.

14. Davies W, Jakobsen KS, Nordby O (1988) Characterization of DNA from the dinoflagellate Woloszynskia bostoniensis. Journal of Protozoology 35: 418–422.

15. Hackett JD, Scheetz TE, Yoon HS, Soares MB, Bonaldo MF, et al. (2005) Insights into a dinoflagellate genome through expressed sequence tag analysis. BMC Genomics 6: 80.

16. Bayer T, Aranda M, Sunagawa S, Yum LK, Desalvo MK, et al. (2012) Symbiodinium transcriptomes: genome insights into the dinoflagellate symbionts of reef-building corals. PLoS ONE 7: e35269.

17. Chan Y-H, Wong JTY (2007) Concentration-dependent organization of DNA by the dinoflagellate histone-like protein HCc3. Nucleic Acids Research 35: 2573–2583.

18. Lin S, Zhang H, Zhuang Y, Tran B, Gill J (2010) Spliced leader-based metatranscriptomic analyses lead to recognition of hidden genomic features in dinoflagellates. Proceedings of the National Academy of Sciences of the United States of America 107: 20033–20038.

19. Zhang H, Lin S (2009) Retrieval of missing spliced leader in dinoflagellates. PLoS ONE 4: e4129.

20. Zhang H, Hou Y, Miranda L, Campbell DA, Sturm NR, et al. (2007) Spliced leader RNA trans-splicing in dinoflagellates. Proceedings of the National Academy of Sciences of the United States of America 104: 4618–4623.

21. Zhang Z, Green B, Cavalier-Smith T (1999) Single gene circles in dinoflagellate chloroplast genomes. Nature 400: 155–159.

22. Cavalier-Smith T (2002) Chloroplast evolution: secondary symbiogenesis and multiple losses. Current Biology 12: 62–64.

23. Yoon HS, Hackett JD, Bhattacharya D (2002) A single origin of the peridinin- and fucoxanthin-containing plastids in dinoflagellates through tertiary endosymbiosis. Proceedings of the National Academy of Sciences of the United States of America 99: 11724–11729.

24. Yoon HS, Hackett JD, Van Dolah FM, Nosenko T, Lidie KL, et al. (2005) Tertiary endosymbiosis driven genome evolution in dinoflagellate algae. Molecular Biology and Evolution 22: 1299–1308.

25. Wood-Charlson EM, Hollingsworth LL, Krupp DA, Weis VM (2006) Lectin/glycan interactions play a role in recognition in a coral/dinoflagellate symbiosis. Cellular Microbiology 8: 1985–1993.

26. Fitt W, Trench R (1983) Endocytosis of the symbiotic dinoflagellate Symbiodinium microadriaticum by endodermal cells of the scyphistomae of Cassiopeia xamachana and resistance of the algae to host digestion. Journal of Cell Science 64: 195–212.

27. Muscatine L, Porter JW (1977) Reef corals: mutualistic symbioses adapted to nutrient-poor environments. Bioscience 27: 454–460.

28. Burriesci MS, Raab TK, Pringle JR (2012) Evidence that glucose is the major transferred metabolite in dinoflagellate-cnidarian symbiosis. The Journal of Experimental Biology 215: 3467–3477.

29. Chen MC, Cheng YM, Sung PJ, Kuo CE, Fang LS (2003) Molecular identification of Rab7 (ApRab7) in Aiptasia pulchella and its exclusion from phagosomes harboring zooxanthellae. Biochemical and Biophysical Research Communications 308: 586–595.

30. Suescún-Bolívar LP, Iglesias-Prieto R, Thomé PE (2012) Induction of glycerol synthesis and release in cultured Symbiodinium. PLoS ONE 7: e47182.

31. Grant A, Remond M, Hinde R (1998) Low molecular-weight factor from versipora ( ) that release and glycerol metabolism of isolated symbiotic algae. Marine Biology: 553–557.

32. Muscatine L (1967) Glycerol excretion by symbiotic algae from corals and Tridacna and its control by the host. Science 156: 516–519.

33. Grant A, People J (1997) Effects of host-tissue homogenate of the scleractinian coral Plesiastrea versipora on glycerol metabolism in isolated symbiotic dinoflagellates. Marine Biology 128: 665–670.

34. Rees T, Fitt W, Baillie B, Yellowlees D (1993) A method for temporal measurement of hemolymph composition in the giant clam symbiosis and its application to glucose and glycerol levels during a diel cycle. Limnology and Oceanography 38: 213–217.

35. Ishikura M, Adachi K, Maruyama T (1999) Zooxanthellae release glucose in the tissue of a giant clam, Tridacna crocea. Marine Biology 133: 665–673.

36. Whitehead L, Douglas A (2003) Metabolite comparisons and the identity of nutrients translocated from symbiotic algae to an animal host. Journal of Experimental Biology 206: 3149–3157.

37. Kellogg RB, Patton JS (1983) Lipid droplets, medium of energy exchange in the symbiotic anemone Condylactis gigantea: a model coral polyp. Marine Biology 75: 137–149.

38. Patton J (1983) Lipid synthesis and extrusion by freshly isolated zooxanthellae (symbiotic algae). Marine Biology 136: 131–136.

39. Muscatine L, Gates RD, LaFontaine I (1994) Do symbiotic dinoflagellates secrete lipid droplets? Limnology and Oceanography 39: 925–929.

40. Guedes RLM, Prosdocimi F, Fernandes GR, Moura LK, Ribeiro HAL, et al. (2011) Amino acids biosynthesis and nitrogen assimilation pathways: a great genomic deletion during eukaryotes evolution. BMC Genomics 12 Suppl 4: S2.

41. Swanson R, Hoegh-Guldberg O (1998) Amino acid synthesis in the symbiotic sea anemone Aiptasia pulchella. Marine Biology 131: 83–93.

42. Wang J, Douglas A (1998) Nitrogen recycling or nitrogen conservation in an alga-invertebrate symbiosis? The Journal of Experimental Biology 201: 2445–2453.

43. Wilkerson FP, Muscatine L (1984) Uptake and assimilation of dissolved inorganic nitrogen by a symbiotic sea anemone. Proceedings of the Royal Society B: Biological Sciences 221: 71–86.

44. Rees TAV (1986) The green hydra symbiosis and ammonium I. The role of the host in ammonium assimilation and its possible regulatory significance. Proceedings of the Royal Society B: Biological Sciences 229: 299–314.

45. Wang JT, Douglas AE (1999) Essential amino acid synthesis and nitrogen recycling in an alga-invertebrate symbiosis. Marine Biology 135: 219–222.

46. Davy SK, Allemand D, Weis VM (2012) Cell biology of cnidarian-dinoflagellate symbiosis. Microbiology and Molecular Biology Reviews 76: 229–261.

47. Harland A (1990) Zinc and cadmium absorption in the symbiotic anemone Anemonia viridis and the non-symbiotic anemone Actinia equina. Journal of the Marine … 70: 789–802.

48. Roberts J, Fixter L, Davies P (2001) Ammonium metabolism in the symbiotic sea anemone Anemonia viridis. Hydrobiologia 461: 25–35.

49. Pernice M, Meibom A, Van Den Heuvel A, Kopp C, Domart-Coulon I, et al. (2012) A single-cell view of ammonium assimilation in coral-dinoflagellate symbiosis. The ISME Journal: 1–11.

50. Lipschultz F, Cook C (2002) Uptake and assimilation of 15N-ammonium by the symbiotic sea anemones Bartholomea annulata and Aiptasia pallida: conservation versus recycling of nitrogen. Marine Biology 140: 489–502.

51. Roberts J, Davies P, Fixter L, Preston T (1999) Primary site and initial products of ammonium assimilation in the symbiotic sea anemone Anemonia viridis. Marine Biology 135: 223–236.

52. Crossland C, Barnes D, Borowitzka M (1980) Diurnal lipid and mucus production in the staghorn coral Acropora acuminata. Marine Biology 60: 81–90.

53. Crossland CJ (1987) In situ release of mucus and DOC-lipid from the corals Acropora variabilis and Stylophora pistillata in different light regimes. Coral Reefs 6: 35–42.

54. Oku H, Yamashiro H, Onaga K, Sakai K, Iwasaki H (2003) Seasonal changes in the content and composition of lipids in the coral Goniastrea aspera. Coral Reefs 22: 83–85.

55. Blanquet RS, Nevenzel JC, Benson AA (1979) Acetate incorporation into the lipids of the anemone Anthopleura elegantissima and its associated zooxanthellae. Marine Biology 54: 185–194.

56. Yamashiro H, Oku H, Onaga K (2005) Effect of bleaching on lipid content and composition of Okinawan corals. Fisheries Science 71: 448–453.

57. Harland A, Davies P, Fixter L (1992) Lipid content of some Caribbean corals in relation to depth and light. Marine Biology 113: 357–361.

58. Harland A, Fixter L, Davies P, Anderson R (1992) Effect of light on the total lipid content and storage lipids of the symbiotic sea anemone Anemonia viridis. Marine Biology 112: 253–258.

59. Roth E, Jeon K, Stacey G (1988) Homology in endosymbiotic systems: the term “symbiosome.”

60. Wakefield T, Kempf S (2001) Development of host-and symbiont-specific monoclonal antibodies and confirmation of the origin of the symbiosome membrane in a cnidarian-dinoflagellate symbiosis. The Biological Bulletin 200: 127–143.

61. Chen MC, Hong MC, Huang YS, Liu MC, Cheng YM, et al. (2005) ApRab11, a cnidarian homologue of the recycling regulatory protein Rab11, is involved in the establishment and maintenance of the Aiptasia-Symbiodinium endosymbiosis. Biochemical and Biophysical Research Communications 338: 1607–1616.

62. Muscatine L, Falkowski PG, Dubinsky Z (1983) Carbon budgets in symbiotic associations.

63. Richier S, Rodriguez-Lanetty M, Schnitzler CE, Weis VM (2008) Response of the symbiotic cnidarian Anthopleura elegantissima transcriptome to temperature and UV increase. Comparative Biochemistry and Physiology Part D, Genomics & Proteomics 3: 283–289.

64. Desalvo MK, Voolstra CR, Sunagawa S, Schwarz JA, Stillman JH, et al. (2008) Differential gene expression during thermal stress and bleaching in the Caribbean coral Montastraea faveolata. Molecular Ecology 17: 3952–3971.

65. Rodriguez-Lanetty M, Harii S, Hoegh-Guldberg O (2009) Early molecular responses of coral larvae to hyperthermal stress. Molecular Ecology 18: 5101–5114.

66. Seneca F, Foret S, Ball EE, Portune KJ, Voolstra CR, et al. (2010) Development and heat stress-induced transcriptomic changes during embryogenesis of the scleractinian coral Acropora palmata. Marine Genomics 3: 51–62.

67. Aranda M, Banaszak AT, Bayer T, Luyten JR, Medina M, et al. (2011) Differential sensitivity of coral larvae to natural levels of ultraviolet radiation during the onset of larval competence. Molecular Ecology 20: 2955–2972.

68. Bellantuono AJ, Hoegh-Guldberg O, Rodriguez-Lanetty M (2011) Resistance to thermal stress in corals without changes in symbiont composition. Proceedings of the Royal Society B: Biological Sciences.

69. Meyer E, Aglyamova GV, Matz MV (2011) Profiling gene expression responses of coral larvae (Acropora millepora) to elevated temperature and settlement inducers using a novel RNA-Seq procedure. Molecular Ecology 20: 3599–3616.

70. Howells EJ, Beltran VH, Larsen NW, Bay LK, Willis BL, et al. (2011) Coral thermal tolerance shaped by local adaptation of photosymbionts. Nature Climate Change 2: 116–120.

71. Schoenberg DA, Trench RK (1980) Genetic variation in Symbiodinium (=Gymnodinium) microadriaticum Freudenthal, and specificity in its symbiosis with marine invertebrates. I. Isoenzyme and soluble protein patterns of axenic cultures of Symbiodinium microadriaticum. Proceedings of the Royal Society B Biological Sciences 207: 405–427.

72. Belda-Baillie C, Baillie B, Maruyama T (2002) Specificity of a model cnidarian-dinoflagellate symbiosis. Biological Bulletin 202: 74–85.

73. Perez S (2013) Unpublished data.

74. Shinzato C, Shoguchi E, Kawashima T, Hamada M, Hisata K, et al. (2011) Using the Acropora digitifera genome to understand coral responses to environmental change. Nature 476: 320–323.

75. Putnam NH, Srivastava M, Hellsten U, Dirks B, Chapman J, et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317: 86–94.

76. Chapman JA, Kirkness EF, Simakov O, Hampson SE, Mitros T, et al. (2010) The dynamic genome of Hydra. Nature 464: 592–596.

77. Forêt S, Kassahn K, Grasso L, Hayward D, Iguchi A, et al. (2007) Genomic and microarray approaches to coral reef conservation biology. Coral Reefs 26: 475–486.

78. Sabourault C, Ganot P, Deleury E, Allemand D, Furla P (2009) Comprehensive EST analysis of the symbiotic sea anemone, Anemonia viridis. BMC Genomics 10: 333.

79. Meyer E, Aglyamova GV, Wang S, Buchanan-Carter J, Abrego D, et al. (2009) Sequencing and de novo analysis of a coral larval transcriptome using 454 GSFlx. BMC Genomics 10: 219.

80. Traylor-Knowles N, Granger BR, Lubinski TJ, Parikh JR, Garamszegi S, et al. (2011) Production of a reference transcriptome and transcriptomic database (PocilloporaBase) for the cauliflower coral, Pocillopora damicornis. BMC Genomics 12: 585.

81. Polato NR, Vera JC, Baums IB (2011) Gene discovery in the threatened Elkhorn coral: 454 sequencing of the Acropora palmata transcriptome. PLoS ONE 6: e28634.

82. Burriesci MS, Lehnert EM, Pringle JR (2012) Fulcrum: condensing redundant reads from high-throughput sequencing studies. Bioinformatics 28: 1324–1327.

83. Martin J, Bruno VM, Fang Z, Meng X, Blow M, et al. (2010) Rnnotator: an automated de novo transcriptome assembly pipeline from stranded RNA-Seq reads. BMC Genomics 11: 663.

84. Surget-Groba Y, Montoya-Burgos JI (2010) Optimization of de novo transcriptome assembly from next-generation sequencing data. Genome Research 20: 1432–1440.

85. Zerbino DR, Birney E (2008) Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18: 821–829.

86. Schulz MH, Zerbino DR, Vingron M, Birney E (2012) Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics 28: 1086–1092.

87. Li W, Godzik A (2006) Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658.

88. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Research 9: 868.

89. Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10: 421.

90. Zdobnov E, Apweiler R (2001) InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847.

91. Conesa A, Götz S, García-Gómez J, Terol J, Talón M, et al. (2005) Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21: 3674.

92. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics 25: 25–29.

93. Kent WJ (2002) BLAT--the BLAST-like alignment tool. Genome Research 12: 656–664.

94. Hu H, Bandyopadhyay P, Olivera B, Yandell M (2011) Characterization of the Conus bullatus genome and its venom-duct transcriptome. BMC Genomics 12: 60.

95. Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, et al. (2005) InterProScan: protein domains identifier. Nucleic Acids Research 33: W116–W120.

96. Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 89: 10915–10919.

97. Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) The KEGG resource for deciphering the genome. Nucleic Acids Research 32: D277–80.

98. Nishihama R, Onishi M, Pringle JR (2011) New insights into the phylogenetic distribution and evolutionary origins of the septins. Biological Chemistry 392: 681–687.

99. Wakeley J (1994) Substitution-rate variation among sites and the estimation of transition bias. Molecular Biology and Evolution 11: 436–442.

100. Grimmelikhuijzen C, Williamson M, Hansen G (2002) Neuropeptides in cnidarians. Canadian Journal of Zoology 80: 1690–1702.

101. Leitz T (1998) Metamorphosin A and related compounds: a novel family of neuropeptides with morphogenic activity. Annals of the New York Academy of Sciences 839: 105–110.

102. Iwao K, Fujisawa T, Hatta M (2002) A cnidarian neuropeptide of the GLWamide family induces metamorphosis of reef-building corals in the genus Acropora. Coral Reefs 21: 127–129.

103. Erwin P, Szmant A (2010) Settlement induction of Acropora palmata planulae by a GLW-amide neuropeptide. Coral Reefs 29: 929–939.

104. Muscatine L, Falkowski PG, Porter JW, Dubinsky Z (1984) Fate of photosynthetic fixed carbon in light- and shade-adapted colonies of the symbiotic coral Stylophora pistillata. Proceedings of the Royal Society B: Biological Sciences 222: 181–202.

105. Hughes TP, Baird AH, Bellwood DR, Card M, Connolly SR, et al. (2003) Climate change, human impacts, and the resilience of coral reefs. Science 301: 929–933.

106. Weis VM (2008) Cellular mechanisms of cnidarian bleaching: stress causes the collapse of symbiosis. The Journal of Experimental Biology 211: 3059–3066.

107. Lajeunesse T, Parkinson J, Reimer J (2012) A genetics-based description of Symbiodinium minutum sp. nov. and S. psygmophilum sp. nov. (Dinophyceae), two dinoflagellates symbiotic with Cnidaria. Journal of Phycology 48: 1380–1391.

108. Wright PA (1995) Nitrogen excretion: three end products, many physiological roles. 281: 273–281.

109. Magoc T, Salzberg SL (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27: 1–8.

110. Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal: 10–12.

111. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26: 2460–2461.

112. Burriesci MS (2011) Developing Aiptasia pallida as a tractable model system for cnidarian-dinoflagellate symbiosis: transferred metabolites and designing tools for analysis of ultra-high-throughput-sequencing data. Stanford University.

113. Lehnert EM, Burriesci MS, Pringle JR (2012) Developing the anemone Aiptasia as a tractable model for cnidarian-dinoflagellate symbiosis: the transcriptome of aposymbiotic A. pallida. BMC Genomics 13: 271.

114. Xiang T, Hambleton E, DeNofrio J, Pringle JR, Grossman A (2013) Isolation of clonal, axenic strains of the symbiotic dinoflagellate Symbiodinium and their growth and host specificity. Journal of Phycology.

115. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.

116. Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biology 11: R106.

117. Zhao S, Fernald RD (2005) Comprehensive algorithm for quantitative real-time polymerase chain reaction. Journal of Computational Biology 12: 1047–1064.

118. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, et al. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biology 3: RESEARCH0034.

119. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797.

120. Ronquist F, Huelsenbeck JP (2003) MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19: 1572–1574.

121. Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, et al. (2003) DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biology 4: P3.

122. Huang S, Yuan S, Guo L, Yu Y, Li J, et al. (2008) Genomic analysis of the immune gene repertoire of amphioxus reveals extraordinary innate complexity and diversity. Genome Research 18: 1112–1126.

123. Augustin R, Riley J, Moley KH (2005) GLUT8 contains a [DE]XXXL[LI] sorting motif and localizes to a late endosomal/lysosomal compartment. Traffic 6: 1196–1212.

124. Infante RE, Wang ML, Radhakrishnan A, Kwon HJ, Brown MS, et al. (2008) NPC2 facilitates bidirectional transfer of cholesterol between NPC1 and lipid bilayers, a step in cholesterol egress from lysosomes. Proceedings of the National Academy of Sciences of the United States of America 105: 15287–15292.

125. Frolov A, Zielinski SE, Crowley JR, Dudley-Rucker N, Schaffer JE, et al. (2003) NPC1 and NPC2 regulate cellular cholesterol homeostasis through generation of low density lipoprotein cholesterol-derived oxysterols. The Journal of Biological Chemistry 278: 25517–25525.

126. Sleat DE, Wiseman JA, El-Banna M, Price SM, Verot L, et al. (2004) Genetic evidence for nonredundant functional cooperativity between NPC1 and NPC2 in lipid transport. Proceedings of the National Academy of Sciences of the United States of America 101: 5886–5891.

127. Ganot P, Moya A, Magnone V, Allemand D, Furla P, et al. (2011) Adaptations to endosymbiosis in a cnidarian-dinoflagellate association: differential gene expression and specific gene duplications. PLoS Genetics 7: e1002187.

128. Wang ML, Motamed M, Infante RE, Abi-Mosleh L, Kwon HJ, et al. (2010) Identification of surface residues on Niemann-Pick C2 essential for hydrophobic handoff of cholesterol to NPC1 in lysosomes. Cell Metabolism 12: 166–173.

129. Ko DC, Binkley J, Sidow A, Scott MP (2003) The integrity of a cholesterol-binding pocket in Niemann-Pick C2 protein is necessary to control lysosome cholesterol levels. Proceedings of the National Academy of Sciences of the United States of America 100: 2518–2525.

130. Uehlein N, Sperling H, Heckwolf M, Kaldenhoff R (2012) The Arabidopsis aquaporin PIP1;2 rules cellular CO(2) uptake. Plant, Cell & Environment 35: 1077–1083.

131. Kaldenhoff R (2012) Mechanisms underlying CO2 diffusion in leaves. Current Opinion in Plant Biology 15: 276–281.

132. Kondrashov FA, Koonin EV, Morgunov IG, Finogenova TV, Kondrashova MN (2006) Evolution of glyoxylate cycle enzymes in Metazoa: evidence of multiple horizontal transfer events and pseudogene formation. Biology Direct 1: 31.

133. Miflin BJ, Habash DZ (2002) The role of glutamine synthetase and glutamate dehydrogenase in nitrogen assimilation and possibilities for improvement in the nitrogen utilization of crops. Journal of Experimental Botany 53: 979–987.

134. Horton P, Park K-J, Obayashi T, Fujita N, Harada H, et al. (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Research 35: W585–587.

135. Dudler N, Yellowlees D, Miller DJ (1987) Localization of two L-glutamate dehydrogenases in the coral Acropora latistella. Archives of Biochemistry and Biophysics 254: 368–371.

136. Nelson RE, Fessler LI, Takagi Y, Blumberg B, Keene DR, et al. (1994) Peroxidasin: a novel enzyme-matrix protein of Drosophila development. The EMBO Journal 13: 3438–3447.

137. Chu H-T, Hsiao WWL, Chen J-C, Yeh T-J, Tsai M-H, et al. (2013) EBARDenovo: Highly accurate de novo assembly of RNA-Seq with efficient chimera-detection. Bioinformatics: 1–7.

138. Weis VM, Levine RP (1996) Differential protein profiles reflect the different lifestyles of symbiotic and aposymbiotic Anthopleura elegantissima, a sea anemone from temperate waters. The Journal of Experimental Biology 199: 883–892.

139. Yuyama I, Watanabe T, Takei Y (2011) Profiling differential gene expression of symbiotic and aposymbiotic corals using a high coverage gene expression profiling (HiCEP) analysis. Marine Biotechnology.

140. Barneah O, Benayahu Y, Weis VM (2006) Comparative proteomics of symbiotic and aposymbiotic juvenile soft corals. Marine Biotechnology 8: 11–16.

141. Kuo J, Liang Z, Lin C (2010) Suppression subtractive hybridization identifies genes correlated to symbiotic and aposymbiotic sea anemone associated with dinoflagellate. Journal of Experimental Marine Biology and Ecology 388: 11–19.

142. Kuo J, Chen M-C, Lin C-H, Fang L-S (2004) Comparative gene expression in the symbiotic and aposymbiotic Aiptasia pulchella by expressed sequence tag analysis. Biochemical and Biophysical Research Communications 318: 176–186.

143. Rodriguez-Lanetty M, Phillips WS, Weis VM (2006) Transcriptome analysis of a cnidarian-dinoflagellate mutualism reveals complex modulation of host gene expression. BMC Genomics 7: 23.

144. Randle PJ (1998) Regulatory interactions between lipids and carbohydrates: the glucose fatty acid cycle after 35 years. Diabetes/Metabolism Research and Reviews 14: 263–283.

145. Harland AD, Navarro JC, Spencer Davies P, Fixter LM (1993) Lipids of some Caribbean and Red Sea corals: total lipid, wax esters, triglycerides and fatty acids. Marine Biology 117: 113–117.

146. Giner J-L, Wikfors GH (2011) “Dinoflagellate sterols” in marine diatoms. Phytochemistry 72: 1896–1901.

147. Cates N, McLaughlin J (1976) Differences of ammonia metabolism in symbiotic and aposymbiotic Condylactus and Cassiopea spp. Journal of Experimental Marine Biology and Ecology 21: 1–5.

148. Szmant-Froelich A, Pilson M (1977) Nitrogen excretion by colonies of the temperate coral Astrangia danae with and without zooxanthellae. Proc. 3rd int. Coral Reef Symp. Vol. 1. pp. 417–424.

149. Rees T, Ellard F (1989) Nitrogen conservation and the green hydra symbiosis. Proceedings of the Royal Society B Biological Sciences 236: 203–212.

150. D’Aoust B, White R, Wells J, Olsen D (1976) Coral-algal associations: capacity for producing and sustaining elevated oxygen tensions in situ. Undersea Biomedical Research 3: 35–40.

151. Lesser MP (1997) Oxidative stress causes coral bleaching during exposure to elevated temperatures. Coral Reefs 16: 187–192.

152. Venn A, Loram J, Douglas A (2008) Photosynthetic symbioses in animals. Journal of Experimental Botany 59: 1069–1080.

153. Lesser M (2006) Oxidative stress in marine environments: biochemistry and physiological ecology. Annual Review of Physiology 68: 253–278.

154. Jones RJ, Larkum AWD, Schreiber U (1998) Temperature-induced bleaching of corals begins with impairment of the CO2 fixation mechanism in zooxanthellae. Plant, Cell & Environment 21: 1219–1230.

155. Downs CA, Fauth JE, Halas JC, Dustan P, Bemiss J, et al. (2002) Oxidative stress and seasonal coral bleaching. Free Radical Biology and Medicine 33: 533–543.

156. Lesser MP (2011) Coral bleaching: causes and mechanisms. Coral reefs: an ecosystem in transition. Springer. pp. 405–419.

157. Dykens JA, Shick JM (1982) Oxygen production by endosymbiotic algae controls superoxide dismutase activity in their animal host. Nature 297: 579–580.

158. Furla P, Allemand D, Shick JM, Ferrier-Pagès C, Richier S, et al. (2005) The symbiotic anthozoan: a physiological chimera between alga and animal. Integrative and Comparative Biology 45: 595–604.

159. Tolleter D, Seneca F, DeNofrio J, Palumbi S, Pringle J, et al. (2013) Coral bleaching independent of photosynthetic activity. Current Biology (in press).

160. Sparks AK (1985) Synopsis of invertebrate pathology-exclusive of insects. Elsevier Science Publishers BV (Biomedical Division).

161. Patterson MJ, Landolt ML (1979) Cellular reaction to injury in the anthozoan Anthopleura elegantissima. Journal of Invertebrate Pathology 33: 189–196.

162. Palmer CV, Mydlarz LD, Willis BL (2008) Evidence of an inflammatory-like response in non-normally pigmented tissues of two scleractinian corals. Proceedings of the Royal Society B Biological Sciences 275: 2687–2693.

163. Mydlarz LD, Holthouse SF, Peters EC, Harvell CD (2008) Cellular responses in sea fan corals: granular amoebocytes react to pathogen and climate stressors. PLoS ONE 3: 9.

164. Olano CT, Bigger CH (2000) Phagocytic activities of the gorgonian coral Swiftia exserta. Journal of Invertebrate Pathology 76: 176–184.

165. Petes LE, Harvell CD, Peters EC, Webb MAH, Mullen KM (2003) Pathogens compromise reproduction and induce melanization in Caribbean sea fans. Marine Ecology Progress Series 264: 167–171.

166. Hutton DMC, Smith VJ (1996) Antibacterial properties of isolated amoebocytes from the sea anemone Actinia equina. Biological Bulletin 191: 441.

167. Perez S, Weis V (2006) Nitric oxide and cnidarian bleaching: an eviction notice mediates breakdown of a symbiosis. The Journal of Experimental Biology 209: 2804–2810.

168. Detournay O, Schnitzler CE, Poole A, Weis VM (2012) Regulation of cnidarian–dinoflagellate mutualisms: evidence that activation of a host TGFβ innate immune pathway promotes tolerance of the symbiont. Developmental & Comparative Immunology 38: 525–537.

169. Davidson SK, Koropatnick TA, Kossmehl R, Sycuro L, McFall-Ngai MJ (2004) NO means “yes” in the squid-vibrio symbiosis: nitric oxide (NO) during the initial stages of a beneficial association. Cellular Microbiology 6: 1139–1151.

170. Altura MA, Stabb E, Goldman W, Apicella M, McFall-Ngai MJ (2011) Attenuation of host NO production by MAMPs potentiates development of the host in the squid-Vibrio symbiosis. Cellular Microbiology 13: 527–537.

171. Areschoug T, Gordon S (2009) Scavenger receptors: role in innate immunity and microbial pathogenesis. Cellular Microbiology 11: 1160–1169.

172. Baldauf SL (2003) The deep roots of eukaryotes. Science 300: 1703–1706.

173. Adams Y, Smith SL, Schwartz-Albiez R, Andrews KT (2005) Carrageenans inhibit the in vitro growth of Plasmodium falciparum and cytoadhesion to CD36. Parasitology Research 97: 290–294.

174. Yalaoui S, Huby T, Franetich J-F, Gego A, Rametti A, et al. (2008) Scavenger receptor BI boosts hepatocyte permissiveness to Plasmodium infection. Cell Host & Microbe 4: 283–292.

175. Rodrigues CD, Hannus M, Prudêncio M, Martin C, Gonçalves LA, et al. (2008) Host scavenger receptor SR-BI plays a dual role in the establishment of malaria parasite liver infection. Cell Host & Microbe 4: 271–282.

176. Dunn SR, Bythell JC, Le Tissier MDA, Burnett WJ, Thomason JC (2002) Programmed cell death and cell necrosis activity during hyperthermic stress-induced bleaching of the symbiotic sea anemone Aiptasia sp. Journal of Experimental Marine Biology and Ecology 272: 29–53.

177. Dunn SR, Thomason JC, Le Tissier MDA, Bythell JC (2004) Heat stress induces different forms of cell death in sea anemones and their endosymbiotic algae depending on temperature and duration. Cell Death and Differentiation 11: 1213–1222.

178. Dunn SR, Phillips WS, Green DR, Weis VM (2007) Knockdown of actin and caspase gene expression by RNA interference in the symbiotic anemone Aiptasia pallida. The Biological Bulletin 212: 250–258.

179. Richier S, Sabourault C, Courtiade J, Zucchini N, Allemand D, et al. (2006) Oxidative stress and apoptotic events during thermal stress in the symbiotic sea anemone, Anemonia viridis. The FEBS Journal 273: 4186–4198.

180. Pernice M, Dunn S, Miard T, Dufour S (2011) Regulation of apoptotic mediators reveals dynamic responses to thermal stress in the reef building coral Acropora millepora. PLoS ONE 6: e16095.

181. Tchernov D, Kvitt H, Haramaty L, Bibby TS, Gorbunov MY, et al. (2011) Apoptosis and the selective survival of host animals following thermal bleaching in zooxanthellate corals. Proceedings of the National Academy of Sciences of the United States of America 108: 9905–9909.

182. Ainsworth TD, Wasmund K, Ukani L, Seneca F, Yellowlees D, et al. (2011) Defining the tipping point. A complex cellular life/death balance in corals in response to stress. Scientific Reports 1: 160.

183. Kvitt H, Rosenfeld H, Zandbank K, Tchernov D (2011) Regulation of apoptotic pathways by Stylophora pistillata (, Pocilloporidae) to survive thermal stress and bleaching. PLoS ONE 6: e28665.

184. Dunn SR, Weis VM (2009) Apoptosis as a post-phagocytic winnowing mechanism in a coral-dinoflagellate mutualism. Environmental Microbiology 11: 268–276.

185. Muscatine L, Pool RR (1979) Regulation of numbers of intracellular algae. Proceedings of the Royal Society B: Biological Sciences 204: 131–139.

186. Fitt WK (2000) Cellular growth of host and symbiont in a cnidarian-zooxanthellar symbiosis. The Biological Bulletin 198: 110–120.

187. Branschädel M, Boschert V, Krippner-Heidenreich A (2007) Tumour necrosis factors. Tumor Necrosis Factors. Wiley Online Library.

188. Zhang W, Bae I, Krishnaraju K, Azam N, Fan W, et al. (1999) CR6: A third member in the MyD118 and Gadd45 gene family which functions in negative growth control. Oncogene 18: 4899–4907.

189. Sinha SK, Chaudhary PM (2004) Induction of apoptosis by X-linked ectodermal dysplasia receptor via a caspase 8-dependent mechanism. The Journal of Biological Chemistry 279: 41873–41881.

190. Burkly LC, Michaelson JS, Hahm K, Jakubowski A, Zheng TS (2007) TWEAKing tissue remodeling by a multifunctional cytokine: role of TWEAK/Fn14 pathway in health and disease. Cytokine 40: 1–16.

191. Sabour Alaoui S, Dessirier V, De Araujo E, Alexaki V-I, Pelekanou V, et al. (2012) TWEAK affects keratinocyte G2/M growth arrest and induces apoptosis through the translocation of the AIF protein to the nucleus. PLoS ONE 7: e33609.

192. Barshis DJ, Ladner JT, Oliver TA, Seneca FO, Traylor-Knowles N, et al. (2013) Genomic basis for coral resilience to climate change. Proceedings of the National Academy of Sciences of the United States of America 110: 1387–1392.

193. Hooper JD, Campagnolo L, Goodarzi G, Truong TN, Stuhlmann H, et al. (2003) Mouse matriptase-2: identification, characterization and comparative mRNA expression analysis with mouse hepsin in adult and embryonic tissues. The Biochemical Journal 373: 689–702.

194. Lalmanach G, Naudin C, Lecaille F, Fritz H (2010) Kininogens: More than cysteine protease inhibitors and kinin precursors. Biochimie 92: 1568–1579.

195. Moreau ME, Garbacki N, Molinaro G, Brown NJ, Marceau F, et al. (2005) The kallikrein-kinin system: current and future pharmacological targets. Journal of Pharmacological Sciences 99: 6–38.

196. Takahashi M, Iwaki D, Kanno K, Xiong J, Matsushita M, et al. (2008) Mannose-binding lectin (MBL)-associated serine protease (MASP)-1 contributes to activation of the lectin complement pathway. Journal of Immunology 180: 6132–6138.

197. Li W-Y, Chong SSN, Huang EY, Tuan T-L (2003) Plasminogen activator/plasmin system: A major player in wound healing? Wound Repair and Regeneration 11: 239–247.

198. Smith FM, Vearing C, Lackmann M, Treutlein H, Himanen J, et al. (2004) Dissecting the EphA3/Ephrin-A5 interactions using a novel functional mutagenesis screen. The Journal of Biological Chemistry 279: 9522–9531.

199. Endo Y, Matsushita M, Fujita T (2007) Role of ficolin in innate immunity and its molecular basis. Immunobiology 212: 371–379.

200. Logan DDK, LaFlamme AC, Weis VM, Davy SK (2010) Flow-cytometric characterization of the cell-surface glycans of symbiotic dinoflagellates (Symbiodinium spp.). Journal of Phycology 46: 525–533.

201. Martin F, Penet M-F, Malergue F, Lepidi H, Dessein A, et al. (2004) Vanin-1(-/-) mice show decreased NSAID- and Schistosoma-induced intestinal inflammation associated with higher glutathione stores. Journal of Clinical Investigation 113: 591–597.

202. Kobuke K, Furukawa Y, Sugai M, Tanigaki K, Ohashi N, et al. (2001) ESDN, a novel neuropilin-like cloned from vascular cells with the longest secretory signal sequence among eukaryotes, is up-regulated after vascular injury. The Journal of Biological Chemistry 276: 34105–34114.

203. Darsigny M, Babeu J-P, Dupuis A-A, Furth EE, Seidman EG, et al. (2009) Loss of hepatocyte-nuclear-factor-4alpha affects colonic ion transport and causes chronic inflammation resembling inflammatory bowel disease in mice. PLoS ONE 4: e7609.

204. Gessi S, Merighi S, Fazzi D, Stefanelli A, Varani K, et al. (2011) Adenosine receptor targeting in health and disease. Expert Opinion on Investigational Drugs 20: 1591–1609.

205. Kenkel CD, Traylor MR, Wiedenmann J, Salih A, Matz MV (2011) Fluorescence of coral larvae predicts their settlement response to crustose coralline algae and reflects stress. Proceedings of the Royal Society B: Biological Sciences 278: 2691–2697.

206. Leggat W, Seneca F, Wasmund K, Ukani L, Yellowlees D, et al. (2011) Differential responses of the coral host and their algal symbiont to thermal stress. PLoS ONE 6: e26687.

207. Houten SM, Wanders RJA (2010) A general introduction to the biochemistry of mitochondrial fatty acid β-oxidation. Journal of Inherited Metabolic Disease 33: 469–477.

Appendix 1 Supplementary Data for Chapter 3

Lehnert et al. "Extensive differences in gene expression between symbiotic and aposymbiotic cnidarians" Supplementary Materials

Figure S1A Alignments of Npc2 sequences from Aiptasia and other organisms. Full-length alignments of selected Npc2-like proteins from Aiptasia sp. (this study), A. viridis [127], human, and D. melanogaster. Red and green dots, amino acids that are identical (red) or similar (green: I,V,L; S,T; D,E; K,R; Q,N) between Aiptasia NpcD and human Npc2; red shading, amino acids whose mutation to alanine ablates the cholesterol-binding function of Npc2 in mammalian cells [129]; blue shading, conserved cysteines used to identify conserved regions of the proteins for phylogenetic analysis; dark underline, the conserved region used for phylogenetic analysis.
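The identical/similar marking described in this legend can be illustrated with a minimal Python sketch. The similarity groups are those quoted in the legend (I,V,L; S,T; D,E; K,R; Q,N); the function names and the treatment of gap columns as unmarked are assumptions made for illustration, not the procedure actually used to draw the figure.

```python
# Similarity groups quoted from the Figure S1A legend.
SIMILAR_GROUPS = [set("IVL"), set("ST"), set("DE"), set("KR"), set("QN")]


def classify_pair(a: str, b: str) -> str:
    """Classify one alignment column as 'identical', 'similar', or 'different'."""
    if a == "-" or b == "-":
        # Assumption: columns containing a gap are never marked.
        return "different"
    if a == b:
        return "identical"
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return "similar"
    return "different"


def mark_alignment(seq1: str, seq2: str) -> list[str]:
    """Classify every column of two aligned sequences of equal length."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    return [classify_pair(a, b) for a, b in zip(seq1, seq2)]


# Example input: the last aligned segment of Aiptasia NPC2D and human NPC2
# as printed in Figure S1B.
aiptasia_npc2d = "FPAGKV-TVKAELKDQVQN--N-VLC"
human_npc2 = "YPSIKL-VVEWQLQDDKNQ--S-LFC"
marks = mark_alignment(aiptasia_npc2d, human_npc2)
print(marks.count("identical"), marks.count("similar"))
```

Keeping the groups in a single list makes it easy to substitute a different similarity scheme if one were preferred.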

B

A_digitifera_NPC2E      -KNCT--KNDDVTVESLDIN---PCSE-EP---CIFHK-GSTVSVTVAF-TPLEEVKSGE
A_digitifera_NPC2F      -KNCA--SRKYALPLKVAIN---PCTK-QP---CTLHP-GKKASIAVVV-KPLVTIRRGT
O_carmela_NPC2a         -SNCTSNPGPSTLGKTVNVTAVPPCDT-AP---CVVHQ-GESLNVTVTF-VPNVAIENFT
A_digitifera_NPC2D      -QTC---DKPSGRLNSVDVT---PCNG-NP---CVFKR-GTNETITVTF-TPNEVVSKGK
Aiptasia_NPC2E          -KDC---GSKGATIVRLDIS---PCEE-EP---CNFKT-GTTVTGTLTF-VAKEYFTSGR
N_vectensis_NPC2B       -RDC---GSQEGEIVGMDIS---PCDS-EP---CVLKR-GTSVDGSLTF-IPHEDLKRAK
Aiptasia_NPC2B          VVVL---VVVVGIVVVVDVD---QCTSDDP---CSLKR-GTNVTSTATM-IPLEEVTQAT
Aiptasia_NPC2C          -TDC---GSYLGEIHSLEVN---PCTS-DP---CVLKR-GDNMTSVISF-TPHEQVSAAK
Aiptasia_NPC2D          -KDC---GSQVGEIVSLDVT---PCTS-DP---CSLKRGGTNATVTINF-KPHEQVTQSK
N_vectensis_NPC2C       -QDC---GSKKGELISVDLT---PCSS-DP---CVIKR-GANASGVITF-IPHEVVTSSK
A_viridis_NPC2D         -KDC---GSKVGKLVSFDLS---PCSQ-DP---CIIKR-GSNATGTVTF-IPSEEVTSSK
M_faveolata_NPC2B       -ANCSDVTALEGKLISVDLT---PCPS-QP---CVFHK-GTNVTATIKF-SPEEMVTDGT
D_melanogaster_NPC2A    -SDC---GSKTGKFTRVAIE---GCDT-TK-AECILKR-NTTVSFSIDF-ALAEEATAVK
M_faveolata_NPC2A       -ADC---GSL-AKINFVDVS---PCVM-EP---CELKK-GTNESIEIQF-IPNSNITEGK
A_digitifera_NPC2B      -RDC---GNKELSPAQVIIT---PCPA-EP---CQLKK-GVNESVEVIF-KPTEVVTSAK
A_digitifera_NPC2A      -SYI---GSKESSISQVIVT---PCPA-EP---CQLKK-GVNESIEVIF-KPGEVVTSSK
Human_NPC2a             -KDC---GSVDGVIKEVNVS---PCPT-QP---CQLSK-GQSYSVNVTF-TSNIQSKSSK
Mouse_NPC2A             -KDC---GSKVGVIKEVNVS---PCPT-DP---CQLHK-GQSYSVNITF-TSGTQSQNST
Aiptasia_NPC2A          ------K-ELKSSVKRTF-IPHENVTDAE
N_vectensis_NPC2A       -KDCSG-GKGEGEIVELDIS---PCPT-QP---CTLHK-GTTVSVNITF-VPHVTLDSGK
A_viridis_NPC2A         -DDCSG-GKGKGEIEKLEII---PCPT-QP---CQLKK-GSKVQIKVTF-VPHEDLTEAT
H_magnipapillata_NPC2D  -QNC---GHLDSNTI-VSIT---PCEK-EP---CTLVR-GSNATLEIQF-KAKHFSKQLK
H_magnipapillata_NPC2B  -KPC----DMSSTVGDVAIS---PCDK-QP---CAFQR-GGSANIEISF-TAAKDADKLT
H_magnipapillata_NPC2A  -KKCTS-PASSAVIGDVIIT---PCDS-LP---CSFKR-GGSGNIKINF-QATKNNSELT
H_magnipapillata_NPC2C  -KKCSS-PASSAVVGDVVIS---PCDN-QP---CQFIR-GGNANIQIHF-QAKKDNSNIT

A_digitifera_NPC2E      LSVDAI-AFGHRLP-M--VRKE--NICEG--HGVT-----CPLEKGKKQTFTINQKVERY
A_digitifera_NPC2F      LELYGIHWLGIKFP-LS-VPNP--DICHG--YGTR-----CPMIANSRVVLSISQTLPSF
O_carmela_NPC2A         VVVHAS-VGIIHVP-YP-VTDP--NGCDTAVTGVT-----CPLKANVAVEWHHSFSVPSI
A_digitifera_NPC2D      ILLYAK-LVLGWIE-LS-LRNP--NICEG--YGLK-----CPLAKGVREELSVTERVPQV
Aiptasia_NPC2E          VKAYAV-IEGVDLP-LP-IPT---DACQG--YGLT-----CPINNGQTANFVIKQEIQAD
N_vectensis_NPC2B       LSAHAI-IDKLPLP-LP-IPS---DACQG--YGLS-----CPVDSGVKSMFKIHQAIESE
Aiptasia_NPC2B          IYMHAT-VSGITIP-ID-IPNP--NACSG--HGLS-----CPLKSGETVELSMVLEVEAK
Aiptasia_NPC2C          IDINAI-IAGSPIH-VH-IPNP--NACDG--HGLK-----CPLEKGKKVELVVSQVIRRS
Aiptasia_NPC2D          IYVYAI-IGIIPIP-LP-IPNP--DACTG--HGLT-----CPLASGKDVELVVKQSIDST
N_vectensis_NPC2C       VLAYAI-FGLIPVP-LP-LPNS--DGCKG--YGLT-----CPLKSGKQVELVFEHYIDQT
A_viridis_NPC2D         VYMYAI-IGFIPVP-LP-LPNT--DGCKG--YGLT-----CPLKSGKPDELVFSHSIDST
M_faveolata_NPC2B       LQVYGF-IEGIKTP-FP-LEQP--DACKE--HGLE-----CPLKSGVTYSLEITLAIKPA
D_melanogaster_2A       TVVHGK-VLGIEMP-FP-LANP--DACVD--SGLK-----CPLEKDESYRYTATLPVLRS
M_faveolata_NPC2A       TVVYGI-IEGVQVP-FP-VDNP--EVCKE--HGIT-----CPMPAEKTQTFKATLPVKSE
A_digitifera_NPC2B      VVIHGI-IEGVRFP-FP-FPHP--NGCKE--HGLE-----CPLKPNKEYTFKATLPVKRT
A_digitifera_NPC2A      VVVHGI-IAGVPVP-FP-ISQP--NGCED--HGLD-----CPLQPNKEYTFKATLPVKSA
Human_NPC2a             AVVHGI-LMGVPVP-FP-IPEP--DGCKS---GIN-----CPIQKDKTYSYLNKLPVKSE
Mouse_NPC2A             ALVHGI-LEGIRVP-FP-IPEP--DGCKS---GIN-----CPIQKDKVYSYLNKLPVKNE
Aiptasia_NPC2A          SSVHGK-VMGFWVP-FP-LPNA--HACKD--SGVK-----CPLVAGSKYEYSSTLDIKSA
N_vectensis_NPC2A       AIVHGV-IAGIPVP-FP-LPNA--DVCKN--SGLK-----CPLEPGTKYVYQSSLEVKTM
A_viridis_NPC2A         SVVHGE-IGGFPVP-FP-LPNS--NCCKD--SGLT-----CPLKAGQKYVYTSALDVKSE
H_magnipapillata_NPC2D  TKVYGK--LLFWVPYYN-FGKE--DSCLD--NGIT-----CPVIEDEEYSYSQSLHISKL
H_magnipapillata_NPC2B  TVVKGK-IGPIWVP-FP-LSQP--DACNN--EGLT-----CPIKSSQKYTYQYSLPISES
H_magnipapillata_NPC2A  SVVKGK-IGPLWVP-FP-LSQP--DACQN--EGIT-----CPIKDGQSYLFSYDLPISTT
H_magnipapillata_NPC2C  TIVKGK-IGPLWVP-FP-LSQP--DGCLN--DGII-----CPVKTDQQYVYSYDLPISKS

A_digitifera_NPC2E      YPPLPI-DVEAYVENDNRK----ILC
A_digitifera_NPC2F      VPMGSY-QLQAVMKDQLGR--M-VLC
O_carmela_NPC2a         APKGPVEIITWELQAPSKE--D-VAC
A_digitifera_NPC2D      LPSSTR-EVKAKLVDQNGG--T-VVC
Aiptasia_NPC2E          FPKVKL-QLKGEVMDPQGN--M-LFC
N_vectensis_NPC2B       FPVGNL-TLKAAVTDSDTS--QVVFC
Aiptasia_NPC2B          FPRGKV-ILKTELKDQAKN--D-IFC
Aiptasia_NPC2C          APPGRY-RIRTELKEQYGI--D-VFC
Aiptasia_NPC2D          FPAGKV-TVKAELKDQVQN--N-VLC
N_vectensis_NPC2C       FPTGHL-TLKAELKDQDSD--V-VIC
A_viridis_NPC2D         FPAGTV-TLKGELKDQEEN--N-IFC
M_faveolata_NPC2B       YPSIQL-VAQMDFKLPDDG--Y-LFC
D_melanogaster_2A       YPKVSV-LVKWELQDQDGA--D-IIC
M_faveolata_NPC2A       YPALQL-DVKWELHDQDAK--V-VYC
A_digitifera_NPC2B      YQDVCM-I---RLL------CSC
A_digitifera_NPC2A      YPDIKL-VVKWQLLDQNAN--S-VFC
Human_NPC2a             YPSIKL-VVEWQLQDDKNQ--S-LFC
Mouse_NPC2A             YPSIKL-VVEWKLEDDKKN--N-LFC
Aiptasia_NPC2A          YPAISV-VVKWQLQDGKGQ--D-LYC
N_vectensis_NPC2A       YPSLKL-VVRWEIQDNKNK--D-VLC
A_viridis_NPC2A         YPAIKV-VVKWEMQDKDNN--D-VFC
H_magnipapillata_NPC2D  NPKISI-PVKWLIQNEAEK--D-LVC
H_magnipapillata_NPC2B  YPKINL-PVSWELKDEKGE--S-LVC
H_magnipapillata_NPC2A  YPAISL-VVSWEIQDENGN--D-VVC
H_magnipapillata_NPC2C  YPAISV-VVSWELQDENGN--D-LVC

Figure S1B. The multiple-sequence alignment of the conserved regions used to produce the phylogenetic tree in Figure 2.


Figure S2 Distinct but related genes whose products may be involved in host tolerance of the symbiont. The transcripts differentially expressed between symbiotic and aposymbiotic anemones included two whose top blastx hit in SwissProt was a human peroxidasin and three whose top blastx hit was a mammalian plasma kallikrein (Figure 6; Supplementary Table 5). (A,B) The two Aiptasia peroxidasin-related proteins (Apr1 and Apr2) appear to represent distinct gene products with limited domain homology both to each other and to human peroxidasins. (A) ClustalW sequence alignment of the two Aiptasia proteins shows interspersed identical and different amino acids as expected from distinct gene products rather than from alternative splice products or misassembled contigs. Boxes show regions of sequence similarity between the Aiptasia proteins but not the human ones (Box 1) or among all four proteins (Boxes 2 and 3), as diagrammed in B. *, :, and . indicate identical, conserved, and semi-conserved amino acids, respectively. (B) Schematic diagram comparing protein domains found in human peroxidasins and the Aiptasia peroxidasin-related proteins using Pfam. LRR, leucine-rich repeat; I-set, immunoglobulin I-set; Peroxidase, domain with similarity to canonical peroxidases; VWC, von Willebrand factor type-C; CR, collagen triple-helix repeat; Ig 2, immunoglobulin; F5/8 type C domain (or discoidin domain), with cell-adhesion functions; Pentraxin, domain with similarity to pentraxin pattern-recognition receptors displaying Ca2+-dependent ligand binding.

Table S1A. Correlation between RNA-Seq and RT-qPCR measurements of differential gene expression in symbiotic relative to aposymbiotic anemones. a

Locus #/transcript # | Top Blast Hit | UniProt accession number | Read count b | Fold-change (RNA-Seq) | Fold-change (RT-qPCR)
58798/1 | Bovine Na+- and Cl--dependent taurine transporter | Q9MZ34 | 61 | ∞ | 29
102514/1 | Human Npc2 cholesterol transporter | P61916 | 269 | 1197 | 26
95010/1 | Mouse tumor necrosis factor receptor superfamily member 27 | Q8BX35 | 202 | 240 | 33
125065/1 | Drosophila organic-cation (carnitine) transporter | Q9VCA2 | 255 | 131 | 57
77179/1 | Human scavenger receptor class B member 1 (SRB1; CD36-related) | Q8WTV0 | 11 | 28 | 3.7
95925/1 | Bacteroides thetaiotaomicron glutamate dehydrogenase | P94598 | 852 | 13 | 2.9
86800/1 | Human facilitated glucose transporter (GLUT8) | Q9NY64 | 57 | 12 | 6.3
65589/1 | Sheep aquaporin-5 | Q866S3 | 71 | 11 | 2.2
70728/1 | C. elegans NH4+ transporter 1 (AMT1-type) | P54145 | 1382 | 6.4 | 7.0
95114/1 | Mouse aromatic-amino-acid transporter 1 | Q3U9N9 | 54 | 5.9 | 6.2
101012/1 | Bacillus halodurans isocitrate lyase | Q9K9H0 | 79 | 3.9 | 4.6
66644/1 | Human carnitine O-palmitoyltransferase 1 | P50416 | 1237 | 2.4 | 2.8
101000/1 | S. cerevisiae delta(24(24(1)))-sterol reductase | P25340 | 40 | 2.0 | ∞
105631/1 | Rat Na+- and Cl--dependent GABA transporter 1 | P23978 | 1302 | 2.0 | 1.9
125822/1 | Cerberus rynchops ficolin (collagen/fibrinogen domain containing lectin) 2 | D8VNS9 | 187 | 1.7 | 1.8
27493/1 | Salmo salar Golgi pH regulator | B5X1G3 | 61 | 1.2 | 1.3
12296/1 | 60S ribosomal protein L11 | P46222 | 280 | 1.1 | 0.9
119098/1 | Rat 40S ribosomal protein s7 | Q9ZNS1 | 94 | 1.0 | 1.1
12335/1 | Dictyostelium F-box/WD repeat-containing protein A-like protein | Q54N86 | 239 | -1.0 | -1.3
84201/1 | Metridium senile cytochrome c oxidase | Q35101 | 1784 | -1.4 | -1.4
58671/1 | Coturnix japonica glyceraldehyde-3-phosphate dehydrogenase | Q05025 | 237 | -1.4 | -1.1
77428/1 | Superoxide dismutase | P81926 | 987 | -1.6 | -1.7
21845/2 | Rat apoptosis-inducing factor mitochondrial | Q9JM53 | 769 | -1.6 | -1.1
59465/1 | Rat calmodulin-like protein 3 | Q5U206 | 679 | -1.6 | -1.5
13527/1 | Rat monocarboxylate transporter 10 | Q91Y77 | 47 | -1.7 | -1.8
12461/1 | Rat mannan-binding lectin serine protease 1 | Q8CHN8 | 223 | -3.1 | -2.8
431/2 | Human Na+/glucose 4 | Q2M3M2 | 67 | -3.2 | -2.0
1568/1 | Mouse E2F transcription factor 2 | P56931 | 136 | -3.5 | -2.1
20440/1 | Zebrafish delta-like protein c | Q9IAT6 | 2 | -∞ | -1.8

a Transcripts are arranged (top to bottom) in order of their degree of expression in symbiotic relative to aposymbiotic anemones as determined by RNA-Seq. Only the data from RNA-Seq Experiment 1 are used, because its conditions matched more closely those of the RT-qPCR experiment (see Materials and Methods and Table 1).
b The baseMean expression value as calculated by DESeq [116].
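One way to quantify the agreement between the two fold-change columns of Table S1A is a Pearson correlation on log2-transformed values; the following is a minimal sketch only. It assumes that a negative entry denotes the reciprocal ratio (for example, -2.0 read as 1/2), it omits the three rows that contain ∞ or -∞, and it is not necessarily the analysis used in Chapter 3. The function names are illustrative.

```python
import math

# (RNA-Seq fold-change, RT-qPCR fold-change) for the 26 rows of Table S1A
# in which both values are finite.
PAIRS = [
    (1197, 26), (240, 33), (131, 57), (28, 3.7), (13, 2.9), (12, 6.3),
    (11, 2.2), (6.4, 7.0), (5.9, 6.2), (3.9, 4.6), (2.4, 2.8), (2.0, 1.9),
    (1.7, 1.8), (1.2, 1.3), (1.1, 0.9), (1.0, 1.1), (-1.0, -1.3),
    (-1.4, -1.4), (-1.4, -1.1), (-1.6, -1.7), (-1.6, -1.1), (-1.6, -1.5),
    (-1.7, -1.8), (-3.1, -2.8), (-3.2, -2.0), (-3.5, -2.1),
]


def signed_log2(fold_change: float) -> float:
    """log2 of a fold-change; a negative value is read as a reciprocal ratio (assumption)."""
    if fold_change > 0:
        return math.log2(fold_change)
    return -math.log2(-fold_change)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rna_seq = [signed_log2(a) for a, _ in PAIRS]
rt_qpcr = [signed_log2(b) for _, b in PAIRS]
print(f"Pearson r on log2 fold-changes: {pearson(rna_seq, rt_qpcr):.2f}")
```

Working on the log scale keeps the few very large RNA-Seq values from dominating the coefficient.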

Table S1B. Primer sequences and product sizes for RT-qPCR data. a

Locus #/transcript # | Forward Primer | Reverse Primer | Product Size
58798/1 | AAAGATCTGCTGGCTGACCCTGA | AACACCAACCAATTGCCTCACCC | 134
102514/1 | AAGTGACCCGTGCGTTCTCAAA | TGCGTTTGGGTTGGGAATGTGT | 148
95010/1 | TTTGACATGCTGCGCGAACTGCT | AATGGCCACGACGTGTTTGAAGG | 225
125065/1 | TGTCAGTGGCGTTGCACAGTCTT | ACATTGCCAATTCTTGCGCGGT | 159
77179/1 | GAAATGGCGGAAAAAGCATA | GGTGGAAATTGTGTCCCATC | 225
95925/1 | CAAAGCCTGGACATCGACGCAAA | CAATGACACAGGCCCGCAGAAA | 194
86800/1 | AGCTGGAGGGAAGGCACCAATAA | TGGGAGCTGTCAATCAACTTGGGA | 110
65589/1 | TTTGCCGGGAACACGTGCATT | TGAGCGCCGAGTGATGTAGGA | 177
70728/1 | ACCAACGGATTCCCATTCTCGTCA | TTTGCGGGCAGCAGTGTTGTT | 110
95114/1 | TGTCGCGCTGTTGCCTTTGTT | TGGCCAAAGCAAGGCGTTTGTGA | 187
101012/1 | GGTCAGCACGCATGAAAGCATTGT | AAGCAATCCAGATGGCAAAGGCAG | 171
66644/1 | TCCAAGACCAAGTGTTGGTGGACT | TGATCCAAGTCAGGGACAGGCAAA | 110
101000/1 | TCTGTCGTGGACACTGCTGTTGA | ATCCAACCGAACTTCTCCGTGGT | 189
105631/1 | ACCGTGAACACTTCTTGAGAGCCA | GCCTCGGTTGAATGCTTTGTTCGT | 210
125822/1 | ACCTCGCGCCTTGTCCTTATCAAA | AATGGGACTGTTAAGGCGGTTCGT | 225
27493/1 | GGTTTGCTGCATCTTCACAGGTCA | AGAAACAGCTGGCGACTAAGCTCT | 133
12296/1 | AGCCAAGGTCTTGGAGCAGCTTA | TTGGGCCTCTGACAGTACAGTGAACA | 125
119098/1 | ACTGCAGTCCACGATGCTATCCTT | GTCTGTTGTGCTTTGTCGAGATGC | 125
12335/1 | TGAAACCTCCTTTCAGCCTCCCA | TCACTTCACTCATCTCGGCAGCA | 172
84201/1 | AGCAGTTGGTAAGTCTGCACAA | GTAACCATGGTAGCAGCATGAA | 105
58671/1 | AACAGCTTTGGCAGCACCTGTAGA | TGCTTTCACAGCAACCCAGAAGAC | 114
77428/1 | AAGGCAAGCGGTAACGAGGTTT | TGCTTTCCTTCTGTCAGCCCAGT | 177
21845/2 | TCATGGCAAGGACGACGAGTGAA | TCACCCATGGCAGTAAAGAGCGA | 156
59465/1 | TCGGCAGGATTGTGTCCAAGTGA | AAACGAGCGACACAACGTCAGCA | 197
13527/1 | AGACACCCAACTGTTCCTTCCCA | ACACGCCGTAAGTAAACGCCAA | 212
12461/1 | AGCAAAGGGCACGAACAACCAAC | TTGACTCGCTATGGCCGCTAACA | 125
431/2 | TGGCCTTCAACAAACCTTCACGCT | ACGTTTGTAGTCCCAGCCAGTCA | 238
1568/1 | AAGTTCGTTGGAGGGTACTGCGA | CCACCAAAGACTTCACACAGCCA | 110
20440/1 | AATGGCGGAGTTTGTCAAGACGG | TGCCGATGCATTTGCCTGAGTT | 118

a Transcripts are listed in the same order as in Supplementary Table 1A.

Table S2. Transport-related genes showing differential expression in symbiotic relative to aposymbiotic anemones. a

Line | Fold-change b | Read count c | Locus#/transcript# | Best BLAST hit | UniProt accession number | BLAST-hit E-value
1 | ∞ | 78 | 58798/1 | Bovine Na+- and Cl--dependent taurine transporter | Q9MZ34 | 1e-169
2 | ∞ | 81 | 36456/1 | Rabbit Na+/(glucose/myo-inositol) transporter 2 | Q28728 | 3e-104
3 | 600 | 659 | 102514/1 | Human Npc2 cholesterol transporter | P61916 | 2e-14
4 | 131 | 437 | 60777/1 | Zebrafish NH4+ transporter rh type b | Q7T070 | 3e-98
5 | 44 | 150 | 125065/1 | Drosophila organic-cation (carnitine) transporter | Q9VCA2 | 6e-35
6 | 28 | 11 | 77179/1 | Human scavenger receptor class B member 1 (SRB1; CD36-related) | Q8WTV0 | 9e-65
7 | 13 | 70 | 65589/1 | Sheep aquaporin-5 | Q866S3 | 8e-37
8 | 11 | 52 | 86800/1 | Human facilitated glucose transporter (GLUT8) | Q9NY64 | 9e-89
9 | 6.9 | 94 | 12006/1 | Xenopus GABA and glycine transporter | Q6PF45 | 8e-60
10 | 5.9 | 881 | 70728/1 | C. elegans NH4+ transporter 1 (AMT1-type) | P54145 | 6e-72
11 | 5.8 | 1667 | 45451/1 | Drosophila lipid-droplet surface-binding protein 2 | Q9VXY7 | 2e-08
12 | 4.9 | 45 | 95114/1 | Mouse aromatic-amino-acid transporter 1 | Q3U9N9 | 3e-65
13 | 4.3 | 198 | 84722/1 | Fish (Tribolodon) carbonic anhydrase II | Q8UWA5 | 2e-36
14 | 4.3 | 288 | 2130/2 | Pig aquaporin-3 | A9Y006 | 1e-68
15 | 3.7 | 111 | 11708/1 | Human facilitated glucose transporter (GLUT8) | Q9NY64 | 1e-88
16 | 3.6 | 2547 | 101327/1 | Rat neutral- and basic-amino-acid transporter 1 | P82252 | 2e-117
17 | 3.5 | 52 | 103419/1 | Chicken monocarboxylate transporter 4 (slc16a3) | P57788 | 1e-28
18 | 3.1 | 71 | 37788/1 | Rabbit hyperpolarization-activated cation channel 4 | Q9TV66 | 9e-129
19 | 3.1 | 707 | 56440/1 | Mouse Na+-independent SO4- transporter | Q80ZD3 | 1e-126
20 | 2.9 | 241 | 97639/1 | Bovine ABC subfamily f member 2 | Q2KJA2 | 0
21 | 2.9 | 264 | 11677/1 | Arabidopsis ABC transporter g family member 27 | Q9FT51 | 7e-67
22 | 2.7 | 35 | 26261/3 | Dictyostelium UDP-sugar transporter | Q54YK1 | 1e-32
23 | 2.6 | 108 | 49092/1 | Mouse zinc transporter 1 (znt-type) | Q60738 | 4e-70
24 | 2.4 | 61 | 15916/1 | Human major-facilitator-superfamily-domain-containing protein 12 | Q6NUT3 | 9e-37
25 | 2.4 | 991 | 66644/1 | Human carnitine O-palmitoyltransferase 1 | P50416 | 0
26 | 2.2 | 35 | 119860/1 | Mouse Na+/(glucose/myo-inositol) cotransporter 2 | Q8K0E3 | 4e-125
27 | 2.1 | 3879 | 76979/1 | Human Na+-dependent phosphate-transport protein 2b | O95436 | 2e-112
28 | 2.1 | 4804 | 109479/1 | Human neutral- and basic-amino-acid transport protein | Q07837 | 1e-49
29 | 2.1 | 554 | 12947/1 | Rat very-low-density-lipoprotein receptor | P98166 | 4e-162
30 | 2.1 | 53 | 37499/1 | Human aromatic-amino-acid transporter 1 | Q8TF71 | 6e-49
31 | 2.1 | 714 | 14877/3 | Zebrafish pyrimidine-nucleotide carrier | Q6DG32 | 1e-69
32 | 2.1 | 243 | 16360/1 | Xenopus monocarboxylate transporter 12 (slc16a12) | Q6P2X9 | 3e-35
33 | 2.0 | 88 | 22123/1 | Rat chloride channel clic-like protein | Q9WU61 | 1e-16
34 | 2.0 | 61 | 120787/1 | Mouse aromatic amino acid transporter 1 | Q3U9N9 | 7e-34
35 | 2.0 | 989 | 105631/1 | Rat Na+- and Cl--dependent GABA transporter 1 | P23978 | 9e-124
36 | 2.0 | 231 | 2338/1 | Bovine zinc transporter (zip-type) | A5D7L5 | 1e-43
37 | 2.0 | 582 | 49156/1 | Human ABC subfamily b member 1 | P08183 | 3e-108
38 | 1.9 | 1076 | 86906/1 | Human lipid-transfer protein | Q9NQZ5 | 1e-52
39 | 1.8 | 218 | 120269/1 | Mouse Na+-dependent neutral-amino-acid transporter | O88576 | 2e-117
40 | 1.8 | 1272 | 36717/5 | Rat neutral- and basic-amino-acid transporter 1 | P82252 | 2e-116
41 | 1.8 | 381 | 19286/1 | Rat v-ATPase subunit f | P50408 | 3e-41
42 | 1.8 | 211 | 81279/1 | Rat Na+-dependent phosphate transporter 1 | Q9JJP0 | 2e-71
43 | 1.7 | 105 | 82158/1 | Columba livia carnitine O-acetyltransferase | P52826 | 5e-151
44 | 1.7 | 176 | 12043/1 | Mouse carnitine O-palmitoyltransferase 2 | P52825 | 0
45 | 1.7 | 1020 | 70022/1 | Rat neutral- and basic-amino-acid transporter 1 | P82252 | 2e-128
46 | 1.7 | 193 | 126133/1 | Zebrafish zinc transporter (znt-type) | Q5PQZ3 | 2e-112
47 | 1.6 | 1058 | 33916/1 | Human transitional-ER ATPase | P55072 | 0
48 | 1.6 | 479 | 86475/1 | Xenopus peptide transporter 4 | Q68F72 | 1e-98
49 | -1.6 | 164 | 22834/1 | Chicken monocarboxylate transporter 3 | Q90632 | 2e-30
50 | -1.6 | 684 | 108875/1 | Chicken low-density-lipoprotein receptor-related protein 1 | P98157 | 0
51 | -1.6 | 404 | 28605/1 | Rat TRP cation-channel subfamily a, member 1 | Q6RI86 | 1e-115
52 | -1.8 | 1323 | 33284/1 | Mouse aromatic amino acid transporter 1 | Q3U9N9 | 3e-34
53 | -1.8 | 5989 | 71915/1 | Rabbit non-specific lipid-transfer protein | O62742 | 0
54 | -1.8 | 817 | 16745/1 | Rat plasma membrane Ca2+-transporting ATPase | P11505 | 3e-98
55 | -2.0 | 39 | 13527/1 | Rat monocarboxylate transporter 10 (aromatic amino acid transporter 1) | Q91Y77 | 6e-19
56 | -2.1 | 33 | 122320/1 | Human long-chain fatty acid transport protein 1 | Q6PCB7 | 3e-96
57 | -2.2 | 848 | 43841/1 | Chicken ovotransferrin | P02789 | 3e-47
58 | -2.3 | 4512 | 98994/1 d | Human Npc2 cholesterol transporter | P61916 | 5e-09
59 | -2.4 | 5556 | 44110/1 | Human low-density lipoprotein receptor-related protein 4 | O75096 | 8e-61
60 | -2.8 | 40 | 76106/1 | C. elegans TRP-like cation channel protein 1 | P34586 | 3e-55
61 | -2.9 | 190 | 93152/1 | Chimpanzee NH4+ transporter rh type c | Q3BCQ7 | 5e-111
62 | -2.9 | 629 | 104248/1 | Rat serotransferrin | P12346 | 2e-47
63 | -3.0 | 506 | 76019/1 | Carbonic anhydrase | P83299 | 1e-37
64 | -3.5 | 1893 | 56973/1 | Mouse organic cation carnitine transporter 3 | Q9WTN6 | 5e-31
65 | -3.5 | 301 | 432/1 | Human Na+/glucose cotransporter 4 | Q2M3M2 | 2e-130
66 | -3.8 | 325 | 129624/1 | E. coli high-affinity choline-transport protein | P0ABD0 | 2e-78

a Putative small-molecule transporters and some proteins of related function (see text) are arranged in order of their degree of differential expression in symbiotic anemones relative to aposymbiotic anemones. Positive fold-changes, expression higher in symbiotic anemones; negative fold-changes, expression higher in aposymbiotic anemones.
b The arithmetic mean of the values from the two RNA-Seq experiments, except in line 6 (transcript 77179/1). ∞, expression was not detected in aposymbiotic animals. Transcript 77179/1 was not detected in aposymbiotic anemones in Experiment 2, giving a nominal ∞-fold change in expression. However, as the normalized read counts in both experiments were rather low, and the possible involvement of the 77179/1-encoded protein in lipid metabolism makes it likely to have been affected in its expression by the starvation conditions used in Experiment 2, we report in line 6 the more conservative value from Experiment 1 alone.
c Except for line 6, the average of the baseMean expression values (as calculated by DESeq [116]) for Experiment 1 and Experiment 2. As explained in footnote b, for transcript 77179/1 (line 6), we show the value for Experiment 1 alone.
d Appears to represent a truncated version of transcript 98999/1, whose predicted protein product was used for the phylogenetic analysis of Figure 2A.
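The sign convention in footnote a and the averaging in footnote b can be sketched in Python as follows. This is an illustration under stated assumptions, not the DESeq computation itself: the inputs are taken to be normalized expression values for the two conditions, a value of zero stands for "not detected", and both function names are hypothetical.

```python
import math


def signed_fold_change(symbiotic: float, aposymbiotic: float) -> float:
    """Signed fold-change: positive if higher in symbiotic anemones, negative if higher in aposymbiotic."""
    if symbiotic < 0 or aposymbiotic < 0:
        raise ValueError("expression values must be non-negative")
    if symbiotic == 0 and aposymbiotic == 0:
        return math.nan  # not detected in either condition
    if aposymbiotic == 0:
        return math.inf  # reported as ∞ in the tables
    if symbiotic == 0:
        return -math.inf  # reported as -∞ in the tables
    ratio = symbiotic / aposymbiotic
    return ratio if ratio >= 1 else -1 / ratio


def mean_fold_change(experiment_1: float, experiment_2: float) -> float:
    """Arithmetic mean of the fold-changes from the two experiments (footnote b)."""
    return (experiment_1 + experiment_2) / 2


print(signed_fold_change(10.0, 5.0))   # higher in symbiotic anemones
print(signed_fold_change(5.0, 10.0))   # higher in aposymbiotic anemones
```

Note that an ∞ value in either experiment makes the mean ∞ as well, which is why a single-experiment value is the more conservative choice for a transcript such as 77179/1.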

Table S3. Lipid-metabolism genes showing differential expression in symbiotic relative to aposymbiotic anemones. a

Line | Metabolic Process | Putative protein function (best BLAST hit) | UniProt accession number | BLAST-hit E-value | Locus #/transcript # | Fold-change b
1 | FA synthesis | ACC1: Acetyl-CoA carboxylase (chicken) | P11029 | 0 | 26166/1 | 3.9
2 | FA synthesis | ELOVL4: Elongation-of-very-long-chain-fatty-acid protein 4 (mouse) | Q9EQC4 | 1e-23 | 4012/1 | 4.2
3 | FA synthesis | Δ5 fatty-acid desaturase (Mortierella alpina) | O74212 | 6e-39 | 120701/1 | 6.2
4 | FA synthesis | Δ6 fatty acid desaturase (human) | O95684 | 6e-46 | 92492/1 | 3.5
5 | Lipid storage | DHAPAT: dihydroxyacetone phosphate acyltransferase (human) | O15228 | 1e-95 | 8091/1 | 1.4
6 | Lipid storage | 2-acylglycerol O-acyltransferase 2-a (Xenopus) | Q2KHS5 | 8e-75 | 10118/2 | 2.7
7 | Lipid storage | 2-acylglycerol O-acyltransferase 2-a (Xenopus) | Q2KHS5 | 4e-89 | 15365/1 | -2.0
8 | Lipid storage | 2-acylglycerol O-acyltransferase 2-b (Xenopus) | Q5M7F4 | 9e-81 | 78512/1 | 2.4
9 | Lipid storage | Diacylglycerol O-acyltransferase (Mycobacterium tuberculosis) | O06795 | 1e-19 | 76581/1 | 2.1
10 | Lipid storage | AGPAT 1: 1-acyl-sn-glycerol-3-phosphate acyltransferase alpha (human) | Q99943 | 2e-23 | 67491/1 | -1.4
11 | Lipid storage regulation | Lipid-droplet surface-binding protein 2 (Drosophila) | Q9VXY7 | 2e-08 | 45451/1 | 5.8
12 | Lipase | HSL: Hormone-sensitive lipase (human) | Q05469 | 2e-84 | 13988/1 | 1.8
13 | Lipase | ATGL: Adipose triglyceride lipase (mouse) | Q8BJ56 | 9e-62 | 16411/1 | -3.3
14 | FA transport | FATP1: Fatty-acid transport protein 1 (human) | Q6PCB7 | 3e-96 | 122320/1 | -1.9
15 | FA transport | FATP4: Long-fatty-acid transport protein 4 (orangutan) | Q5RDY4 | 8e-92 | 122313/8 | -3.1
16 | FA transport | SRB1: Scavenger receptor class B member 1 (human; CD36-related protein) | Q8WTV0 | 9e-65 | 77179/1 | 28
17 | FA-CoA-ligase | ACSL4: Long-chain-fatty-acid ligase 4 (human) | O60488 | 0 | 89704/1 | 5.7
18 | FA-CoA-ligase | ACSL5: Long-chain-fatty-acid ligase 5 (human) | Q9ULC5 | 5e-116 | 106694/1 | 2.9
19 | FA β-oxidation | Organic cation (carnitine) transporter (Drosophila) | Q9VCA2 | 6e-35 | 125065/1 | 44
20 | FA β-oxidation | CPT1: Carnitine O-palmitoyltransferase 1 (human) | P50416 | 0 | 66644/1 | 2.4
21 | FA β-oxidation | CPT2: Carnitine O-palmitoyltransferase 2 (mouse) | P52825 | 0 | 12043/1 | 1.6
22 | FA β-oxidation | CACT: Carnitine acylcarnitine carrier protein (bovine) | Q08DK7 | 2e-60 | 13918/1 | n.s.
23 | FA β-oxidation | VLCAD: Very-long-chain-specific acyl-CoA dehydrogenase (bovine) | P48818 | 0 | 108541/1 | 1.6
24 | FA β-oxidation | MTP: Trifunctional enzyme (pig) | Q29554 | 0 | 33057/3 | 1.4
25 | FA β-oxidation | MCAD: Medium-chain-specific acyl-CoA dehydrogenase (bovine) | Q3SZB4 | 5e-165 | 92556/1 | n.s.
26 | FA β-oxidation | SCAD: Short-branched-chain-specific acyl-CoA dehydrogenase (Dictyostelium) | Q54RR5 | 2e-121 | 127382/1 | n.s.
27 | FA β-oxidation | Crotonase: Short-chain enoyl-CoA hydratase (Dictyostelium) | Q1ZXF1 | 4e-67 | 117452/1 | n.s.
28 | FA β-oxidation | M/SCHAD: Medium and short-chain l-3-hydroxyacyl-CoA dehydrogenase (human) | Q16836 | 5e-29 | 34949/1 | n.s.
29 | FA β-oxidation | DCI: Enoyl-Δ isomerase (human) | P42126 | 7e-33 | 55782/1 | n.s.
30 | FA β-oxidation | MCKAT: 3-ketoacyl-CoA thiolase (rat) | P13437 | 5e-145 | 56206/1 | 2.4
31 | Glyoxylate cycle | Isocitrate lyase (Bacillus halodurans) | Q9K9H0 | 3e-164 | 101012/1 | 3.9
32 | Glyoxylate cycle | Malate synthase (Myxococcus xanthus) | P95329 | 2e-141 | 22622/1 | n.s.

a Genes encoding proteins putatively involved in lipid metabolism are arranged in groups by biological process (see Chapter 3, Figure 3). FA, fatty acid.
b Because of the likelihood that the starvation conditions in Experiment 2 would affect lipid metabolism, the fold-change values from Experiment 1 are shown. Positive fold-changes, expression higher in symbiotic anemones; negative fold-changes, expression higher in aposymbiotic anemones. n.s., no significant differential expression observed.


Table S4 Presence or absence in the Aiptasia transcriptome of genes encoding the enzymes involved in the synthesis of particular amino acids. a

Line | Amino acid | Enzyme | UniProt accession number | Aiptasia locus #/transcript #
1 | Gln | Glutamine synthetase | P32288 | 104234/1
2 | Glu | Glutamate synthase | Q12680 | 60857/1
3 | Glu/Pro | NADP-specific glutamate dehydrogenase b | Q9C8I0 | 3911/1
4 | Glu/Pro | NADP-specific glutamate dehydrogenase 2 b | P39708 | 95925/1
5 | Glu/Pro | NAD-specific glutamate dehydrogenase c | P33327 | 99746/1
6 | Glu/Pro | Glutamate dehydrogenase 2 c | Q38946 | 7229/1
7 | Met | MTHFR: Methylenetetrahydrofolate reductase | Q9WU20 | 15095/1
8 | Met | MS: Methionine synthase (cobalamin-dependent) | Q99707 | 50131/1
9 | Met | Methionine synthase (cobalamin-independent) d | P05694 | 55393/1
10 | Met | BHMT: Betaine-homocysteine S-methyltransferase 1 | Q93088 | 45257/1
11 | Cys/SAM e | MAT: Methionine adenosyltransferase 1 | Q91X83 | 140402/1
12 | Cys | SAHH: S-adenosyl-L-homocysteine hydrolase | P27604 | 91092/1
13 | Cys | CBS: Cystathionine β-synthase | P32582 | 98284/1
14 | Cys | CGL: Cystathionine γ-lyase | P31373 | 7792/1
15 | Met/Cys/Thr/Ile/Lys | Aspartokinase/homoserine dehydrogenase | Q9SA18 |
16 | Met/Cys/Thr/Ile/Lys | Aspartokinase | P10869 |
17 | Met/Cys/Thr/Ile | Homoserine dehydrogenase | P31116 |
18 | Met/Cys | HAT: Homoserine O-acetyltransferase | P08465 | 70690/1
19 | Met/Cys | CGL: Cystathionine γ-synthase | P47164 | 974/1
20 | Met | Cystathionine β-lyase | P43623 |
21 | Met | Homocysteine S-methyltransferase 3 | Q8LAX0 | 4396/6
22 | Ser | D-3-phosphoglycerate dehydrogenase 1 | P40054 | 294/1
23 | Ser | Phosphoserine aminotransferase | P33330 | 57256/1
24 | Ser | Phosphoserine phosphatase | P42941 | 122485/1
25 | Ser | Catabolic L-serine/threonine dehydratase | P25379 | 21690/1
26 | Ser/Gly | Serine hydroxymethyltransferase, mitochondrial | P37292 | 11787/1
27 | Ser/Gly | Serine hydroxymethyltransferase, cytosolic | P37291 | 11787/1
28 | Gly | Alanine-glyoxylate aminotransferase 1 | P43567 | 47533/1
29 | Gly | Serine-glyoxylate aminotransferase | Q56YA5 | 47531/1
30 | Gly | Low specificity L-threonine aldolase | P37303 | 109186/1
31 | Asp/Glu/Asn | Aspartate aminotransferase, mitochondrial | Q01802 | 23248/1
32 | Asp/Glu/Asn | Aspartate aminotransferase, cytoplasmic | P46646 | 111366/1
33 | Asn | Asparagine synthetase | P49089 | 51175/1
34 | Ala | Alanine aminotransferase 1 | P52893 | 89107/1
35 | Pro | γ-glutamyl phosphate reductase | P54885 | 128220/1
36 | Pro | Pyrroline-5-carboxylate reductase | P32263 | 115939/1
37 | Arg | Carbamoyl-phosphate synthetase | P31327 | 21357/1
38 | Arg | Ornithine carbamoyltransferase | P00480 | 116500/1
39 | Arg | Argininosuccinate synthetase | P22768 | 53174/1
40 | Arg | Argininosuccinate lyase | P04076 | 29236/1
41 | Arg | Arginase-1 | P05089 | 118787/1
42 | Arg | N-acetylglutamate synthase | Q8N159 | 19094/1
43 | Arg | Acetylglutamate kinase | Q01217 |
44 | Arg | Ornithine acetyltransferase | Q04728 |
45 | Phe/Tyr | Aromatic/aminoadipate aminotransferase 1 | P53090 | 109224/1
46 | Tyr | Tyrosine aminotransferase | Q9LVY1 | 58220/1
47 | Tyr | Phenylalanine 4-hydroxylase | P00439 | 37855/1
48 | Phe/Trp | Class-II DAHP synthetase-like protein | Q9SK84 |
49 | Phe/Trp | Phospho-2-dehydro-3-deoxyheptonate aldolase, tyrosine-inhibited | P32449 |
50 | Phe/Trp | Pentafunctional AROM polypeptide | P08566 |
51 | Phe/Trp | Chorismate mutase | P32178 |
52 | Phe/Trp | Chorismate synthase | P28777 |
53 | Phe/Trp | Anthranilate synthase component 1 | P00899 |
54 | Phe/Trp | Anthranilate phosphoribosyltransferase | P07285 |
55 | Trp | Tryptophan synthase | Q42529 |
56 | His | ATP phosphoribosyltransferase | P00498 |
57 | His | Imidazole glycerol phosphate synthase hisHF | P33734 |
58 | His | Histidinol-phosphate aminotransferase | P07172 |
59 | His | Histidine biosynthesis trifunctional protein | P00815 |
60 | His | Histidinol dehydrogenase | Q9C5U8 |
61 | Val/Leu/Ile | Acetolactate synthase | P07342 | 8385/1
62 | Val/Leu/Ile | Ketol-acid reductoisomerase, mitochondrial | P06168 |
63 | Val/Leu/Ile | Dihydroxy-acid dehydratase, mitochondrial | P39522 | 127954/1
64 | Val/Leu/Ile | Branched-chain-amino-acid aminotransferase, cytosolic | P47176 | 85088/1
65 | Leu | 2-isopropylmalate synthase | P06208 |
66 | Leu | 3-isopropylmalate dehydratase | P07264 |
67 | Leu | 3-isopropylmalate dehydrogenase | P04173 |
68 | Ile | Threonine dehydratase, mitochondrial | Q9ZSS6 | 57366/1
69 | Lys | Homocitrate synthase, mitochondrial | Q12122 |
70 | Lys | Kynurenine/α-aminoadipate aminotransferase, mitochondrial | Q8N5Z0 |
71 | Lys | Homoaconitase, mitochondrial | P49367 |
72 | Lys | Homoisocitrate dehydrogenase, mitochondrial | P40495 |
73 | Lys | L-aminoadipate-semialdehyde dehydrogenase | P07702 | 127184/1
74 | Lys | Saccharopine dehydrogenase [NADP(+), L-glutamate-forming] | P38999 | 1580/1
75 | Lys | Saccharopine dehydrogenase [NAD(+), L-lysine-forming] | P38998 | 76032/1
76 | Lys | 4-hydroxy-tetrahydrodipicolinate synthase 2, chloroplastic | Q9FVC8 |
77 | Lys | Dihydrodipicolinate synthase | Q0WSN6 |
78 | Lys | Diaminopimelate decarboxylase 1 | Q949X7 | 37096/1? f
79 | Thr | Threonine synthase | P16120 | 10016/1

a The UniProt Accession Number shown is for the seed sequence used to identify the Aiptasia transcript. For an Aiptasia transcript to be listed, its best reciprocal BLAST hit (to the same species as the seed sequence) had to be the seed sequence itself or a sequence encoding a paralogous protein. Where no transcript is listed, no Aiptasia homologue of the seed sequence could be identified with confidence.
b Downregulated (transcript 3911/1) and upregulated (95925/1) in symbiotic relative to aposymbiotic anemones (see Figure 4). Both proteins had a Bacteroides thetaiotaomicron NAD(P)-utilizing glutamate dehydrogenase (UniProt P94598) as their top BLAST hit.
c No significant differential expression in symbiotic relative to aposymbiotic anemones.
d In contrast to transcript 50131/1 (MS in Figure 5), transcript 55393/1 showed no differential expression in symbiotic vs. aposymbiotic anemones.
e S-adenosyl-methionine (see Figure 5).
f Although this Aiptasia transcript met the formal criterion for inclusion (footnote a), the number of genomic reads mapping to it barely exceeded our cut-off for calling a sequence cnidarian (see Table 2), so that it may represent a contaminant rather than an Aiptasia gene encoding a homologue of this typically bacterial and plant enzyme.

Table S5 Genes potentially involved in host tolerance of the symbiont that are differentially expressed between symbiotic and aposymbiotic anemones. a

Protein (from top BLAST hit) | UniProt accession number | Locus #/transcript # | BLAST-hit E-value | Fold-change b

A. Response to oxidative stress
Catalase | P04040 | 100968/1 | 0 | -4.7
ADAM (disintegrin and metalloproteinase domain-containing protein) 9 | Q13443 | 123296/1 | 2e-62 | -3
Transient receptor potential cation channel (subfamily M, member 2) | Q91YD4 | 125627/1 | 7e-16 | -2.9
Peroxidasin-related protein 1 | Q92626 | 99631/1 | 4e-06 | -2.5
Dual oxidase 2 | Q8HZK2 | 17080/2 | 5e-27 | -2
Allene oxide synthase-lipoxygenase | O16025 | 9291/1 | 4e-38 | -1.8
Soluble guanylate cyclase 88E | Q8INF0 | 7254/1 | 1e-171 | 1.9
Peroxidasin-related protein 2 | A1KZ92 | 57146/1 | 7e-25 | 2.9

B. Inflammation/tissue remodeling/response to wounding
Transmembrane serine protease 6 c | Q9DBI0 | 81296/1 | 3e-50 | -6.1
Plasma kallikrein d | P14272 | 12789/1 | 7e-47 | -4
Mannan-binding lectin serine peptidase 1 (MASP-1) e | Q8CHN8 | 36375/1 | 5e-12 | -3
Plasminogen f | P00747 | 21286/1 | 3e-47 | -3
Plasma kallikrein d | P03952 | 84752/1 | 1e-48 | -2.9
Ephrin type-a receptor 3 g | P29320 | 6695/1 | 2e-63 | -2.8
Phospholipase A2 (isoform 4) h | Q6T179 | 3740/5 | 1e-26 | -2.4
Arachidonate 5-lipoxygenase i | P48999 | 55879/1 | 5e-28 | -1.7
Plasma kallikrein d | P26262 | 73922/1 | 8e-40 | 2.1
Ficolin 2 j | Q15485 | 62279/1 | 1e-42 | 2.2
Vanin-I k | Q58CQ9 | 48344/1 | 8e-118 | 2.6
Discoidin, CUB, and LCCL domain containing 2 l | Q91ZV2 | 80843/1 | 3e-12 | 2.7
Hepatocyte nuclear factor 4 (alpha) m | P22449 | 34830/4 | 1e-113 | 3.6
Adenosine A2b receptor n | O13076 | 66307/1 | 2e-17 | 4.2
Scavenger receptor class B member 1 o | Q8WTV0 | 77179/1 | 9e-65 | 28

C. Apoptosis/cell death
Transcription factor E2F2 | P56931 | 1568/1 | 5e-07 | -4.2
Receptor-binding cancer antigen expressed on SiSo cells | Q865S0 | 42283/1 | 4e-06 | -2.7
Tumor protein p73 | Q9JJP2 | 88973/1 | 2e-35 | -2.4
Paired box protein Pax-3 | P23760 | 46973/1 | 3e-43 | -1.8
Apoptosis-inducing factor 1 (mitochondrial) | Q9JM53 | 21845/2 | 1e-171 | -1.8
TNF (Tumor Necrosis Factor) receptor-associated factor 3 | Q13114 | 30586/1 | 2e-74 | 1.8
TNF superfamily member 12 | O43508 | 18277/1 | 1e-07 | 1.9
Kruppel-like factor 11 | O14901 | 58173/1 | 8e-51 | 2.6
G1 to S phase transition 1 | P15170 | 25564/1 | 0 | 2.8
Growth arrest and DNA damage-inducible protein (GADD45 gamma) | Q9Z111 | 55453/1 | 6e-09 | 5.1
Ribonucleoside-diphosphate reductase (small chain C) | Q9LSD0 | 18748/1 | 1e-132 | 12
Organic cation transporter | Q9VCA2 | 88336/1 | 6e-35 | 44
TNF receptor superfamily member 27 | Q8BX35 | 94982/1 | 9e-10 | 60

a The set of all transcripts displaying differential expression by RNA-Seq was analyzed to identify biological processes (based on GO terms) that were overrepresented in this set relative to the background transcriptome (see Materials and Methods). The sets of processes identified here (A, B, and C) emerged from this analysis and may be involved in host tolerance of the symbiont.
b In all but one case, the arithmetic mean of the values from the two RNA-Seq experiments is shown. For transcript 77179/1 (last line of section B), the value from Experiment 1 is shown for reasons explained in Supplementary Table 2, footnote b. Positive fold-changes, expression higher in symbiotic anemones; negative fold-changes, expression higher in aposymbiotic anemones.
c Hydrolyzes a range of proteins including type I collagen, fibronectin, and fibrinogen and may play a role in matrix-remodeling processes [193].
d Serine proteases activated by tissue injury or microbial invasion; they activate the release of potent pro-inflammatory cytokines that ultimately result in the release of effector molecules such as nitric oxide and tumor necrosis factor-α and can stimulate the complement innate-immunity system [194,195].
e Plays a role as an amplifier of the complement cascade, potentially via the activation of MASP-2 [196].
f The zymogen of plasmin; it can be activated via plasma kallikrein and functions in the breakdown of fibrin in fibrinolysis, the activation of proteases, and the modulation of cell adhesion [197].
g A receptor tyrosine kinase that binds membrane-bound ephrin family ligands residing on adjacent cells and regulates cell-cell adhesion, cytoskeletal organization, and cell migration [198].
h Releases arachidonic acid from cellular membrane phospholipids, leading to its conversion to pro-inflammatory prostaglandins via arachidonate 5-lipoxygenase [195].
i See note h.
j A lectin whose binding to microbial surface glycans can initiate activation of the complement pathway [199]; it also appears to bind to cell-surface glycans of Symbiodinium [200].
k Hydrolyzes pantetheine to pantothenic acid and cysteamine, the latter of which can lead to acute and chronic epithelial inflammation [201].
l Thought to play a role in cell adhesion and wound healing [202].
m A transcriptional regulator that is decreased in inflammatory bowel disease and protects against chemically-induced colitis in mice [203].
n Its activation appears to result in inhibition of pro-inflammatory cytokine production, and mice deficient in A2b receptors are more susceptible to intestinal inflammation [204].
o See text.

Supplementary Materials and Methods

Identification and Optimization of qPCR Standards for Aiptasia

Six housekeeping genes were selected as potential qPCR standards based on their prior use in coral studies. Gene names used here are those assigned to the Aiptasia pallida genes and differ in most cases from those used in the other organisms. The genes encoding ribosomal protein L11 (RPL11), NADH-dehydrogenase subunit 5 (NDH5), and glyceraldehyde-3-phosphate-dehydrogenase (GPD1) were reported to be stable in Porites astreoides during heat stress, settlement induction, and metamorphosis [205]. The genes encoding ribosomal protein S7 (RPS7) and adenosylhomocysteinase (AHC1) were used as standards during studies of thermal stress in Acropora aspera [206]. The β-actin gene (ACT1) was used to explore modulation of host-gene expression [143] and was used as the standard for early qPCR studies in our lab.

Primers were developed and tested for these six potential standard genes. The aposymbiotic A. pallida transcriptome [113] was searched using tblastx with sequences from Porites lobata for NDH5, P. astreoides for RPL11, Urticina eques for GPD1, Acropora millepora for RPS7, and Nematostella vectensis for AHC1. The loci identified in the A. pallida transcriptome were searched against NCBI using blastx, and all top hits were indeed the genes of interest. The identified loci were then translated using ORFPredictor, and the longest ORFs were used to identify conserved sequences by performing protein alignments in MacVector with sequences available from NCBI. Conserved sequences were then used to develop primers using PrimerQuest from Integrated DNA Technologies (IDT). Primers were tested on A. pallida cDNA and gDNA. Primers that spanned an exon-intron junction were preferentially identified for further use (Supplementary Table 6).

PCR products were cloned into a TA cloning vector, and electroporation-competent E. coli cells were transformed with the plasmids. Transformed cells were plated on ampicillin/X-Gal plates, and white/light-blue colonies were selected for colony PCR using M13 forward and reverse primers. PCR products were sequenced, and the sequences were aligned with the expected sequences from the transcriptome. All primer pairs accurately selected the sequences of interest.

Table S6 Primer sequences used for potential qPCR standards

Gene | Forward (F) primer | Reverse (R) primer
RPL11 | AGCCAAGGTCTTGGAGCAGCTTA | TTGGGCCTCTGACAGTACAGTGAACA
RPS7 | ACTGCAGTCCACGATGCTATCCTT | GTCTGTTGTGCTTTGTCGAGATGC
NDH5 | AGCAGTTGGTAAGTCTGCACAA | GTAACCATGGTAGCAGCATGAA
GPD1 | AACAGCTTTGGCAGCACCTGTAGA | TGCTTTCACAGCAACCCAGAAGAC
AHC1 | CCATTACAGCAACAACACAGGCCA | GCATCAAACGTTGGCAGATGAAGC
ACT1 | ACACCGTCTTGTCAGGAGGTTCAA | TCCACATCTGTTGGAAGGTGGACA

The six genes were then tested for their expression levels across 11 experimental conditions (Supplementary Table 7). RNA was extracted from 3-4 medium-sized anemones from each condition using a Trizol/RNeasy hybrid protocol (details available upon request). RNA integrity was checked both by using a Nanodrop and by running samples on a 2% agarose gel. For all RNA samples used, 260/280 readings were >1.9, and two clear rRNA bands were visible. For each condition, 300 ng of RNA was reverse transcribed using the Maxima® First Strand cDNA-synthesis kit for RT-qPCR (Fermentas). 17 µL of RT product was then diluted with 23 µL of H2O. 2 µL of this cDNA solution was then used for the qPCR reaction. Each qPCR well had 2 µL of cDNA, 2 µL of H2O, 5 µL of Power SYBR® Green PCR Master Mix (Applied Biosystems), and 1 µL of a primer mix containing 1.5 µM forward (F) primer and 1.5 µM reverse (R) primer.

Table S7 Experimental conditions used to test gene-expression levels by qPCR a

Conditions | CC7 Sym b | CC7 Apo c
Room Temperature (27°C) | x | x
1 h heat shock (35°C) | x | x
1.5 h heat shock (37°C) | x | x
1 h cold shock (8°C) | x | x
1 h incubation with 500 µg/mL dsRNA d (27°C) | x | x
Kept in the dark for 1 month e (27°C) | x | not done

a Except for the sample incubated in the dark, all anemones were incubated on a 12L:12D cycle with 25 µmol photons m⁻² s⁻¹ from Cool White fluorescent bulbs, and the manipulations indicated were performed during the light period.
b Symbiotic anemones (containing the endogenous population of Clade A Symbiodinium) of the CC7 clonal line of Aiptasia [9].
c Aposymbiotic CC7 animals that had been cured of their endogenous Symbiodinium by a combination of cold shock, DCMU treatment, and extended growth in the dark [113]. All anemones were screened for absence of dinoflagellates prior to use in these experiments.
d dsRNA (477 bp) synthesized for A. pallida nematogalectin gene knockdown.
e Represents a partially aposymbiotic condition.

The efficiency of each primer pair was tested across a dilution series of 1:1, 1:10, 1:100, 1:1000, and 1:10000 cDNA; the calculated efficiencies were 95-105%. Possible gDNA contamination in RNA samples was tested by running RNA-only controls; these samples showed no amplification. Standard qPCR settings were used, and an additional dissociation stage was added to test for the presence of multiple products. The dissociation stage showed only one clear peak in every case. Ct values for each of the six genes under each of the 11 conditions were analyzed using geNorm [118] to determine the relative expression stabilities of the prospective standard genes; lower M-values indicate more stable expression (Supplementary Table 8). ACT1 (M = 0.625) and AHC1 (M = 0.775) were considerably less stable in expression than the four genes shown in the table. Statistical analysis of the qPCR results also indicates that ACT1 should not be used as an expression standard in studies of symbiosis in Aiptasia because its expression differs greatly between aposymbiotic and symbiotic animals: ACT1 was significantly (p = 0.002) up-regulated in aposymbiotic (or mostly aposymbiotic) anemones relative to symbiotic anemones across all conditions. This was determined by normalizing qPCR Ct values to the two most stable standard genes (RPL11 and RPS7) and performing a Mann-Whitney test on ACT1 expression levels in aposymbiotic and symbiotic anemones.
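As an illustration of how a primer efficiency can be derived from such a dilution series, the following minimal Python sketch fits a straight line to Ct versus log10 of the relative cDNA input and converts the slope to an efficiency with the commonly used relation E = 10^(-1/slope) - 1. The function name and the Ct values are hypothetical and are not data from this study; the text does not state which software or formula was actually used.

```python
import math


def amplification_efficiency(relative_inputs, ct_values):
    """Return (slope, efficiency) from a qPCR dilution series.

    relative_inputs: relative template amounts (e.g. 1, 0.1, 0.01, ...)
    ct_values:       the Ct measured at each dilution
    The slope is the least-squares slope of Ct against log10(input);
    efficiency = 10**(-1/slope) - 1, so 1.0 corresponds to 100%.
    """
    if len(relative_inputs) != len(ct_values) or len(ct_values) < 2:
        raise ValueError("need at least two paired dilution/Ct values")
    xs = [math.log10(r) for r in relative_inputs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    if sxx == 0 or sxy == 0:
        raise ValueError("dilution series must vary and Ct must change with input")
    slope = sxy / sxx
    return slope, 10 ** (-1 / slope) - 1


# Hypothetical Ct values for the 1:1 to 1:10000 series (illustration only).
dilutions = [1, 1e-1, 1e-2, 1e-3, 1e-4]
cts = [18.0, 21.4, 24.8, 28.2, 31.6]
slope, efficiency = amplification_efficiency(dilutions, cts)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

A slope near -3.32 corresponds to a doubling of product per cycle (100% efficiency); shallower or steeper slopes give values above or below 100%, respectively.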

Table S8 Assessment of gene-expression stability under various conditions a

Gene | Protein encoded | geNorm M | Product sequence | Product length | Primer efficiency
RPL11 | Component of the 60S ribosomal subunit | 0.357 | AGCCAAGGTCTTGGAGCAGCTTACAGGCCAACAGCCTGTGTTTTCAAAAG (INTRON – 236 bp) CTCGCTACACTGTGAGATCTTTTGGAATCAGAAGGAACGAGAAGATCTCTGTTCACTGTACTGTCAGAGGCCCAA | cDNA 125 bp; gDNA 361 bp | 98%
RPS7 | Component of the 40S ribosomal subunit | 0.380 | ACTGCAGTCCACGATGCTATCCTTGAAGATCTTGTCTTTCCTAGTGAAATTGTTGGCAAAAGGATAAGAGTTAAACTTGATGGTTCACGTCTCGTTAAAGTG (INTRON – 411 bp) CATCTCGACAAAGCACAACAGAC | cDNA 125 bp; gDNA 536 bp | 97%
NDH5 | NADH-dehydrogenase subunit 5 | 0.423 | AGCAGTTGGTAAGTCTGCACAATTAGGCTTACACACTTGGTTACCGGATGCAATGGAAGGT (INTRON – 1729 bp) CCAACTCCGGTGTCTGCCTTGATTCATGCTGCTACCATGGTTAC | cDNA 105 bp; gDNA 1834 bp | 95%
GPD1 | Glyceraldehyde-3-phosphate-dehydrogenase | 0.530 | AACAGCTTTGGCAGCACCTGTAGAGGCTGGGATGATATTCTGATTGGCACCTCTACCATCACGCCATTTCT (INTRON – 567 bp) TCCCACTAGGTCCATCTACAGTCTTCTGGGTTGCTGTGAAAGCA | cDNA 114 bp; gDNA 681 bp | 95%

a Tested across the 11 experimental conditions described in Supplementary Table 7.
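For readers unfamiliar with the geNorm stability measure, the sketch below shows one way an M-value can be computed: for each candidate gene, M is the average, over all other candidate genes, of the standard deviation across samples of the log2 expression ratio between the two genes. This is an illustrative reimplementation, not the geNorm program itself. It assumes 100% amplification efficiency (so that a log2 ratio equals a Ct difference) and uses the sample standard deviation; the function name and the Ct values are hypothetical and are not the data behind the M-values reported here.

```python
from statistics import mean, stdev


def genorm_m(ct_by_gene):
    """Return {gene: M} for a dict mapping gene name -> list of Ct values.

    All lists must hold the same samples in the same order. Assuming 100%
    efficiency, log2(expression_j / expression_k) in a sample is Ct_k - Ct_j.
    M for gene j is the mean over all other genes k of the standard
    deviation of that log2 ratio across samples; lower M = more stable.
    """
    genes = list(ct_by_gene)
    lengths = {len(ct_by_gene[g]) for g in genes}
    if len(genes) < 2 or len(lengths) != 1 or lengths.pop() < 2:
        raise ValueError("need >= 2 genes, each with the same >= 2 samples")
    m_values = {}
    for j in genes:
        pairwise_sd = []
        for k in genes:
            if k == j:
                continue
            log2_ratios = [ck - cj for cj, ck in zip(ct_by_gene[j], ct_by_gene[k])]
            pairwise_sd.append(stdev(log2_ratios))
        m_values[j] = mean(pairwise_sd)
    return m_values


# Hypothetical Ct values for three genes in three samples (illustration only).
example = {
    "geneA": [20.0, 21.0, 22.0],
    "geneB": [22.0, 23.0, 24.0],  # tracks geneA exactly
    "geneC": [20.0, 23.0, 21.0],  # varies independently of the others
}
for gene, m in sorted(genorm_m(example).items(), key=lambda item: item[1]):
    print(f"{gene}: M = {m:.3f}")
```

In this toy example geneA and geneB share the lowest M because their ratio is constant across samples, whereas geneC, whose expression varies relative to both, receives the highest M.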


Accession numbers for sequences used in developing the training and test sets for TopSort Cnidarian dataset: Nematostella vectensis (AB126336.1-AB126336.1, AB450038.1- AB450044.1, AB479470.1-AB479474.1, AB495365.1-AB495368.1, AF020956.1- AF020964.1, AF085282.1-AF085283.1, AY286508.1-AY286510.1, AY339866.1- AY339873.1, AY391716.1-AY391717.1, AY465174.1-AY465182.1, AY496945.1- AY496946.1, AY496948.1-AY496949.1, AY530300.1-AY530301.1, AY687348.1- AY687350.1, AY725201.1-AY725205.1, AY730689.1-AY730697.1, DQ066724.1- DQ066725.1, DQ116032.1-DQ116034.1, DQ173687.1-DQ173698.1, DQ358699.1- DQ358704.1, DQ471325.1-DQ471326.1, DQ492688.1-DQ492689.1, DQ493899.1- DQ493901.1, DQ497246.1-DQ497247.1, DQ517920.1-DQ517928.1, DQ826414.1- DQ826417.1, DQ882654.1-DQ882656.1, EF068140.1-EF068151.1, EF173462.1- EF173463.1, EF424410.1-EF424412.1, EU092640.1-EU092641.1, EU162649.1-EU162655.1, EU394531.1-EU394532.1, EU422968.1-EU422972.1, EU877197.1-EU877198.1, FJ824849.1-FJ824851.1, GQ240844.1-GQ240851.1, GU320063.1-GU320067.1, HM004556.1-HM004558.1, HM754642.1-HM754644.1, XM_001617352.1- XM_001642094.1, U42728.2, FJ428244.1, EU289217.1, EF427936.1, DQ632751.1, DQ286294.1, DQ198160.1, AY792510.1, AY651960.1, AY534532.1, AY494080.1, AY457634.1, AY363391.1, AY226090.1, AY226076.1, AY226067.1, AY226056.1, AF540387.2, AF408421.1, AF327845.1, AB495363.1, AB274036.1, AB274034.1); Hydra magnipapillata (AB583744.1-AB583747.1, AM233901.1-AM233903.1, AM393878.1- AM393881.1, AY212265.1-AY212267.1, AY218839.1-AY218840.1, BK004161.1- BK004162.1, DQ073557.1-DQ073558.1, DQ127903.1-DQ127904.1, DQ449927.1- DQ449931.1, EU170504.1-EU170505.1, FJ156099.1-FJ156102.1, FJ177032.1-FJ177033.1, FJ196704.1-FJ196706.1, FJ200200.1-FJ200210.1, FJ205481.1-FJ205489.1, FJ236863.1- FJ236864.1, FJ496649.1-FJ496653.1, FJ517724.1-FJ517728.1, GQ856263.1-GQ856264.1, GU219979.1-GU219981.1, GU256274.1-GU256281.1, XM_002153740.1-XM_002153922.1, XM_002153924.1-XM_002154094.1, XM_002154096.1-XM_002154206.1, 
XM_002154208.1-XM_002154426.1, XM_002154428.1-XM_002154512.1, XM_002154514.1-XM_002154764.1, XM_002154766.1-XM_002154895.1, XM_002154897.1-XM_002154984.1, XM_002154986.1-XM_002155429.1, XM_002155431.1-XM_002155750.1, XM_002155752.1-XM_002156047.1, XM_002156049.1-XM_002156748.1, XM_002156750.1-XM_002157387.1, XM_002157389.1-XM_002157474.1, XM_002157476.1-XM_002158411.1, XM_002158413.1-XM_002158516.1, XM_002158518.1-XM_002158837.1, XM_002158839.1-XM_002159264.1, XM_002159266.1-XM_002159291.1, XM_002159293.1-XM_002159320.1, XM_002159322.1-XM_002159398.1, XM_002159400.1-XM_002159430.1, XM_002159432.1-XM_002159454.1, XM_002159456.1-XM_002159503.1, XM_002159505.1-XM_002159563.1, XM_002159565.1-XM_002159607.1, XM_002159609.1-XM_002159628.1, XM_002159630.1-XM_002159660.1, XM_002159662.1-XM_002159732.1, XM_002159734.1-XM_002159756.1, XM_002159758.1-XM_002159789.1, XM_002159791.1-XM_002159832.1, XM_002159834.1-XM_002159873.1, XM_002159875.1-XM_002159897.1, XM_002159899.1-XM_002159921.1, XM_002159923.1-XM_002159938.1, XM_002159940.1-XM_002159972.1, XM_002159974.1-XM_002159999.1, XM_002160001.1-XM_002160050.1, XM_002160052.1-XM_002160081.1, XM_002160083.1-XM_002160109.1, XM_002160111.1-XM_002160170.1, XM_002160172.1-XM_002160207.1,

122 XM_002160209.1-XM_002160254.1, XM_002160256.1-XM_002160282.1, XM_002160284.1-XM_002160328.1, XM_002160330.1-XM_002160388.1, XM_002160390.1-XM_002160464.1, XM_002160466.1-XM_002160488.1, XM_002160490.1-XM_002160516.1, XM_002160518.1-XM_002160520.1, XM_002160522.1-XM_002160546.1, XM_002160548.1-XM_002160549.1, XM_002160551.1-XM_002160590.1, XM_002160592.1-XM_002160609.1, XM_002160611.1-XM_002160643.1, XM_002160645.1-XM_002160677.1, XM_002160679.1-XM_002160736.1, XM_002160738.1-XM_002160762.1, XM_002160764.1-XM_002160793.1, XM_002160795.1-XM_002160812.1, XM_002160814.1-XM_002160825.1, XM_002160827.1-XM_002160916.1, XM_002160918.1-XM_002160934.1, XM_002160936.1-XM_002160939.1, XM_002160941.1-XM_002160987.1, XM_002160989.1-XM_002161016.1, XM_002161018.1-XM_002161068.1, XM_002161070.1-XM_002161111.1, XM_002161113.1-XM_002161217.1, XM_002161219.1-XM_002161238.1, XM_002161240.1-XM_002161303.1, XM_002161305.1-XM_002161357.1, XM_002161359.1-XM_002161405.1, XM_002161407.1-XM_002161440.1, XM_002161442.1-XM_002161495.1, XM_002161497.1-XM_002161541.1, XM_002161543.1-XM_002161571.1, XM_002161573.1-XM_002161614.1, XM_002161616.1-XM_002161633.1, XM_002161635.1-XM_002161742.1, XM_002161744.1-XM_002162017.1, XM_002162019.1-XM_002163255.1, XM_002163257.1-XM_002163587.1, XM_002163589.1-XM_002164597.1, XM_002164599.1-XM_002165037.1, XM_002165039.1-XM_002165179.1, XM_002165181.1-XM_002165206.1, XM_002165208.1-XM_002165386.1, XM_002165388.1-XM_002165426.1, XM_002165428.1-XM_002165450.1, XM_002165452.1-XM_002165555.1, XM_002165557.1-XM_002165581.1, XM_002165583.1-XM_002165658.1, XM_002165660.1-XM_002165664.1, XM_002165666.1-XM_002165741.1, XM_002165743.1-XM_002165874.1, XM_002165876.1-XM_002165937.1, XM_002165939.1-XM_002165968.1, XM_002165970.1-XM_002165993.1, XM_002165995.1-XM_002166009.1, XM_002166011.1-XM_002166023.1, XM_002166025.1-XM_002166044.1, XM_002166046.1-XM_002166072.1, XM_002166074.1-XM_002166099.1, XM_002166101.1-XM_002166227.1, XM_002166229.1-XM_002166249.1, 
XM_002166251.1-XM_002166268.1, XM_002166270.1-XM_002166291.1, XM_002166293.1-XM_002166400.1, XM_002166402.1-XM_002166447.1, XM_002166449.1-XM_002166477.1, XM_002166479.1-XM_002166498.1, XM_002166500.1-XM_002166520.1, XM_002166522.1-XM_002166544.1, XM_002166546.1-XM_002166566.1, XM_002166568.1-XM_002166613.1, XM_002166615.1-XM_002166665.1, XM_002166667.1-XM_002166857.1, XM_002166859.1-XM_002167064.1, XM_002167066.1-XM_002167374.1, XM_002167376.1-XM_002167520.1, XM_002167522.1-XM_002167715.1, XM_002167717.1-XM_002167771.1, XM_002167773.1-XM_002167799.1, XM_002167801.1-XM_002167815.1, XM_002167817.1-XM_002167831.1, XM_002167833.1-XM_002167845.1, XM_002167847.1-XM_002167857.1, XM_002167859.1-XM_002167904.1, XM_002167906.1-XM_002167937.1, XM_002167939.1-XM_002168075.1, XM_002168077.1-XM_002168736.1, XM_002168738.1-XM_002168749.1, XM_002168751.1-XM_002169356.1, XM_002169358.1-XM_002169408.1, XM_002169410.1-XM_002169425.1, XM_002169427.1-XM_002169976.1, XM_002169978.1-XM_002169985.1, XM_002169987.1-XM_002169990.1, XM_002169993.1-XM_002169995.1, XM_002169997.1-XM_002170003.1, XM_002170005.1-XM_002170008.1,

123 XM_002170010.1-XM_002170017.1, XM_002170019.1-XM_002170029.1, XM_002170031.1-XM_002170034.1, XM_002170036.1-XM_002170123.1, XM_002170125.1-XM_002170370.1, XM_002170372.1-XM_002170971.1, XM_002170973.1-XM_002171130.1, XM_002171132.1-XM_002171275.1, X70839.1, X70839.1, X67590.1, X67590.1, U53444.1, U53444.1, U36781.1, U36781.1, HQ184466.1, HQ184466.1, GU199337.1, GU199337.1, GQ983384.1, GQ983384.1, GQ856264.1, FN257513.1, FN257513.1, FJ823136.1, FJ823136.1, FJ222238.1, FJ222238.1, FJ154842.1, FJ154842.1, EU877199.1, EU877199.1, EU787490.1, EU787490.1, EU442372.1, EU442372.1, EU178740.1, EU015880.1, EU015880.1, EF370474.1, EF370474.1, EF010985.1, EF010985.1, DQ518873.1, DQ518873.1, DQ073560.1, DQ073560.1, DQ072591.1, DQ072591.1, AY841903.1, AY841903.1, AY458134.1, AY458134.1, AY422083.1, AY422083.1, AY372112.1, AY372112.1, AY332609.1, AY332609.1, AY225467.1, AY225467.1, AY216501.1, AY216501.1, AY213094.1, AY213094.1, AM233513.2, AM182483.1, AF307098.1, AF307098.1, AF188478.1, AF188478.1, AF043907.1, AF043907.1) Fungal dataset: Schizosaccharomyces pombe (gi|301736437-301750575|); Aspergillus niger (AJ239738.1-AJ239987.1, BE758760.1-BE760957.1, CK769166.1-CK769173.1, DR697868.1-DR710686.1, EY187740.1-EY188372.1, EY223258.1-EY254202.1); Neurospora crassa (AA574464.1-AA574465.1, AA601776.1-AA601777.1, AA738494.1- AA738501.1, AA774383.1-AA774387.1, AA897792.1-AA899039.1, AA901496.1- AA902101.1, AA908001.1-AA908006.1, AI318697.1-AI320510.1, AI320569.1-AI322045.1, AI328149.1-AI330327.1, AI391954.1-AI391955.1, AI391957.1-AI392604.1, AI397485.1- AI399633.1, AI416404.1-AI416428.1, AW708018.1-AW719192.1, AW721859.1- AW725138.1, BE900092.1-BE900100.1, BF072409.1-BF072839.1, BF739420.1- BF739760.1, BG278041.1-BG280722.1, FK707478.1-FK707538.1, GE917356.1- GE999999.1, GH000001.1-GH158787.1); Saccharomyces cerevisiae (AA417440.1- AA417500.1, AA417502.1-AA417537.1, DB636784.1-DB668630.1, EG999314.1-T17502.1, T17635.1-T36312.1, T39110.1-X78018.1, EH038222.1) Dinoflagellate 
dataset: Alexandrium tamarense (CF751845.1-CF751962.1, CF774560.1- CF774855.1, CF947047.1-CF948546.1, CK431405.1-CK433904.1, CK782344.1- CK786698.1, CV553867.1-CV555405.1), Alexandrium catenella (EX454357.1-EX464203.1, AB212072.1), Alexandrium ostenteldii (HO658038.1-HO663459.1,HO652585.1- HO658036.1), Alexandrium mitum (GW792032.1-GW792241.1, GW792243.1-GW792256.1, GW792258.1-GW792278.1, GW792280.1-GW792403.1, GW792405.1-GW792489.1, GW792491.1-GW792620.1, GW792634.1-GW792636.1, GW792645.1-GW792648.1, GW792652.1-GW792654.1, GW792655.1-GW792657.1, GW792662.1-GW792666.1, GW792680.1-GW792682.1, GW792706.1-GW792708.1, GW792710.1-GW792769.1, GW792771.1-GW792774.1, GW792776.1-GW792787.1, GW792789.1-GW792804.1, GW792805.1-GW792807.1, GW792820.1-GW792821.1, GW792823.1-GW792861.1, GW792863.1-GW792865.1, GW792871.1-GW792976.1, GW792980.1-GW792985.1, GW792988.1-GW793010.1, GW793012.1-GW793017.1, GW793019.1-GW793113.1, GW793115.1-GW793179.1, GW793182.1-GW793185.1, GW793187.1-GW793190.1, GW793193.1-GW793227.1, GW793229.1-GW793255.1, GW793257.1-GW793268.1, GW793270.1-GW793275.1, GW793277.1-GW793281.1, GW793283.1-GW793359.1, GW793361.1-GW793364.1, GW793366.1-GW793367.1, GW793369.1-GW793376.1, GW793411.1-GW793413.1, GW793552.1-GW793554.1, GW793755.1-GW793757.1, GW793766.1-GW793768.1, GW793832.1-GW793834.1, GW793846.1-GW793851.1, GW793853.1-GW793894.1, GW793896.1-GW793925.1, GW793927.1-GW793942.1, GW793944.1-GW793946.1, GW793952.1-GW793954.1, GW793962.1-GW793964.1, GW794043.1-GW794045.1, GW794143.1-GW794145.1, GW794188.1-GW794190.1,

124 GW794217.1-GW794219.1, GW794223.1-GW794225.1, GW794307.1-GW794309.1, GW794319.1-GW794321.1, GW794331.1-GW794334.1, GW794357.1-GW794359.1, GW794412.1-GW794414.1, GW794415.1-GW794417.1, GW794430.1-GW794441.1, GW794443.1-GW794448.1, GW794450.1-GW794460.1, GW794462.1-GW794488.1, GW794490.1-GW794530.1, GW794532.1-GW794623.1, GW794625.1-GW794643.1, GW794645.1-GW794711.1, GW794713.1-GW794894.1, GW794896.1-GW794988.1, GW794990.1-GW795077.1, GW795079.1-GW795089.1, GW795091.1-GW795182.1, GW795184.1-GW795186.1, GW795188.1-GW795246.1, GW795248.1-GW795278.1, GW795280.1-GW795375.1, GW795377.1-GW795395.1, GW795398.1-GW795406.1, GW795410.1-GW795412.1, GW795414.1-GW795415.1, GW795417.1-GW795419.1, GW795422.1-GW795431.1, GW795434.1-GW795444.1, GW795446.1-GW795454.1, GW795458.1-GW795466.1, GW795469.1-GW795479.1, GW795500.1-GW795502.1, GW795513.1-GW795515.1, GW795520.1-GW795543.1, GW795545.1-GW795554.1, GW795557.1-GW795566.1, GW795569.1-GW795612.1, GW795614.1-GW795637.1, GW795640.1-GW795650.1, GW795652.1-GW795662.1, GW795664.1-GW795680.1, GW795682.1-GW795752.1, GW795754.1-GW795761.1, GW795763.1-GW795999.1, GW796001.1-GW796184.1, GW796186.1-GW796252.1, GW796257.1-GW796262.1, GW796264.1-GW796293.1, GW796295.1-GW796353.1, GW796355.1-GW796486.1, GW796488.1-GW796575.1, GW796608.1-GW796610.1, GW796612.1-GW796614.1, GW796616.1-GW796618.1, GW796633.1-GW796635.1, GW796656.1-GW796658.1, GW796729.1-GW796731.1, GW796752.1-GW796754.1, GW796796.1-GW796885.1, GW796573.1, GW795518.1, GW795477.1, GW794428.1, GW793940.1, GW793844.1, GW793374.1, GW792936.1, GW792940.1, GW792960.1, GW792962.1, GW792964.1, GW792883.1, GW792885.1, GW792966.1, GW792835.1, GW792839.1, GW792851.1, GW792854.1, GW792817.1, GW792794.1, GW792704.1, GW792689.1, GW792695.1, GW792676.1, GW792678.1, GW792628.1, GW792630.1, GW792618.1); Karlodinium micrum (EC147064.1-EC163595.1); Karenia brevis (CO059029.1-CO065717.1, CO517335.1-CO517390.1, CV173737.1-CV173976.1, EX864807.1-EX878969.1, EX956452.1-EX980006.1, CV179548.1); 
Symbiodinium strain KB8 (FE537410.1- FE540062.1). Bacterial dataset: Escherichia coli strain MS 175-1 (gi|EFJ63866-EFJ68735|); Salmonella enterica (EDZ33444-EDZ37920)
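Many of the accession numbers above are written as ranges (for example, AB450038.1-AB450044.1). The following minimal Python sketch shows how such a range could be expanded into individual accessions for retrieval. It assumes that a range denotes every consecutive accession number between its two endpoints, with the same letter prefix and version suffix; the text does not state this explicitly, and the function name is hypothetical. Ranges whose endpoints do not share a prefix (such as EG999314.1-T17502.1) and the gi|...| ranges are deliberately rejected rather than guessed at.

```python
import re

# Letter prefix (optionally ending in "_", as in "XM_"), digits, ".version".
_ACCESSION = re.compile(r"^([A-Z]+_?)(\d+)\.(\d+)$")


def expand_accession_range(token):
    """Expand 'AB450038.1-AB450044.1' into a list of single accessions.

    A token without a hyphen is returned unchanged as a one-item list.
    Assumes the range is inclusive and numerically consecutive.
    """
    token = token.strip()
    if "-" not in token:
        return [token]
    start, end = (part.strip() for part in token.split("-", 1))
    m_start, m_end = _ACCESSION.match(start), _ACCESSION.match(end)
    if not m_start or not m_end:
        raise ValueError(f"unrecognized accession format: {token!r}")
    prefix, first, version = m_start.groups()
    prefix_end, last, version_end = m_end.groups()
    if (prefix, version, len(first)) != (prefix_end, version_end, len(last)):
        raise ValueError(f"endpoints are not comparable: {token!r}")
    if int(last) < int(first):
        raise ValueError(f"range runs backwards: {token!r}")
    width = len(first)  # preserve leading zeros, e.g. XM_002170010
    return [f"{prefix}{n:0{width}d}.{version}"
            for n in range(int(first), int(last) + 1)]


print(expand_accession_range("AB450038.1-AB450044.1"))
print(expand_accession_range("U42728.2"))
```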
