Genomic insights into the evolutionary origin of Myxozoa within Cnidaria

E. Sally Chang^a, Moran Neuhof^b,c, Nimrod D. Rubinstein^d, Arik Diamant^e, Hervé Philippe^f,g, Dorothée Huchon^b,1, and Paulyn Cartwright^a,1

^a Department of Ecology and Evolutionary Biology, University of Kansas, Lawrence, KS 66045
^b Department of Zoology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 6997801, Israel
^c Department of Neurobiology, George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 6997801, Israel
^d Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138
^e National Center for Mariculture, Israel Oceanographic and Limnological Research, Eilat 88112, Israel
^f CNRS, Station d’Ecologie Expérimentale du CNRS, Moulis 09200, France
^g Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Montréal, QC, Canada H3C 3J7

Edited by David M. Hillis, The University of Texas at Austin, Austin, TX, and approved October 16, 2015 (received for review June 12, 2015)

EVOLUTION

The Myxozoa comprise over 2,000 species of microscopic obligate parasites that use both invertebrate and vertebrate hosts as part of their life cycle. Although the evolutionary origin of myxozoans has been elusive, a close relationship with cnidarians, a group that includes corals, sea anemones, jellyfish, and hydroids, is supported by some phylogenetic studies and the observation that the distinctive myxozoan structure, the polar capsule, is remarkably similar to the stinging structures (nematocysts) in cnidarians. To gain insight into the extreme evolutionary transition from a free-living cnidarian to a microscopic endoparasite, we analyzed genomic and transcriptomic assemblies from two distantly related myxozoan species, Kudoa iwatai and Myxobolus cerebralis, and compared these to the transcriptome and genome of the less reduced cnidarian parasite, Polypodium hydriforme. A phylogenomic analysis, using for the first time to our knowledge, a taxonomic sampling that represents the breadth of myxozoan diversity, including four newly generated myxozoan assemblies, confirms that myxozoans are cnidarians and are a sister taxon to P. hydriforme. Estimations of genome size reveal that myxozoans have one of the smallest reported animal genomes. Gene enrichment analyses show depletion of expressed genes in categories related to development, cell differentiation, and cell–cell communication. In addition, a search for candidate genes indicates that myxozoans lack key elements of signaling pathways and transcriptional factors important for multicellular development. Our results suggest that the degeneration of the myxozoan body plan from a free-living cnidarian to a microscopic parasitic cnidarian was accompanied by extreme reduction in genome size and gene content.

Myxozoa | Cnidaria | Polypodium | parasite | genome evolution

Significance

Myxozoans are a diverse group of microscopic parasites that infect invertebrate and vertebrate hosts. The assertion that myxozoans are highly reduced cnidarians is supported by the presence of polar capsules, which resemble cnidarian stinging structures called “nematocysts.” Our study characterizes the genomes and transcriptomes of two distantly related myxozoan species, Kudoa iwatai and Myxobolus cerebralis, and another cnidarian parasite, Polypodium hydriforme. Phylogenomic analyses that use a broad sampling of myxozoan taxa confirm the position of myxozoans within Cnidaria with P. hydriforme as the sister taxon to Myxozoa. Analyses of myxozoan genomes indicate that the transition to the highly reduced body plan was accompanied by massive reduction in genome size, including depletion of genes considered hallmarks of animal multicellularity.

Author contributions: D.H. and P.C. designed research; E.S.C., D.H., and P.C. performed research; E.S.C., A.D., D.H., and P.C. contributed new reagents/analytic tools; E.S.C., M.N., N.D.R., H.P., D.H., and P.C. analyzed data; and D.H. and P.C. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Freely available online through the PNAS open access option.
Data deposition: The data reported in this paper have been deposited in the Sequence Read Archive database (accession nos. SRX704259, SRX702459, SRX554567, SRX1034928, SRX1034914, SRX570527, and SRX687102).
1To whom correspondence may be addressed. Email: [email protected] or huchond@post.tau.ac.il.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1511468112/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1511468112 | PNAS Early Edition

Obligate parasitism can lead to dramatic reduction of body plans and associated morphological structures (1, 2). One of the most spectacular examples is the microscopic Myxozoa, which spend most of their parasitic life cycle as just a few cells in size (3). The most conspicuous myxozoan cell type houses a polar capsule, which is a complex structure with an eversible tube (or filament) that is thought to facilitate attachment to the host. The polar capsule bears remarkable similarity to the stinging structures (nematocysts) of cnidarians (corals, sea anemones, jellyfish, and hydroids), suggesting that nematocysts and polar capsules are homologous and that myxozoans are related to cnidarians (4, 5).

Myxozoa are a diverse group of obligate endoparasites that comprise over 2,180 species (6). The vast majority of myxozoan species alternate between a fish and annelid host. In Myxobolus cerebralis (7), the causative agent for whirling disease in rainbow trout (8), the annelid host Tubifex tubifex releases infective actinospores (Fig. 1A), which subsequently anchor onto the fish, injecting the sporoplasm into the host tissue (9). Infective myxospores develop within the fish and are eventually ingested by the annelid where they develop into an actinospore, thereby completing the parasitic life cycle (Fig. 1A). Although the vast majority of myxozoan species are composed of just a few cells, some malacosporean myxozoans, such as Buddenbrockia plumatellae, have a complex vermiform life cycle stage (10), which is a derived trait that has been lost and regained several times within this particular lineage (11).

Another obligate cnidarian endoparasite, Polypodium hydriforme, in contrast to myxozoans, does not have a degenerate body form and instead displays conventional cnidarian-like features, including tentacles, a gut, and a mouth (Fig. 1B). P. hydriforme lies dormant as a binucleate cell within the oocytes of female acipenseriform fishes (paddlefish and sturgeon) (Fig. 1B), eventually developing into an elongated stolon, which emerges from the host’s oocyte upon spawning. Once freely living, the stolon begins to fragment into multiple individuals, each developing a mouth with which to feed (Fig. 1B). Adult P. hydriforme infect juvenile female fish to repeat the life cycle (12).

Fig. 1. Life cycles of M. cerebralis and P. hydriforme. (A) M. cerebralis alternates its development between a fish (salmonid) host and an annelid (T. tubifex) host. The myxospore is produced in the fish (Right), and the actinospore is produced in the annelid (Left). Both stages consist of just a few cells, including those housing polar capsules. (B) In P. hydriforme, the stolon stage (Top) develops inside the ovaries of its host (acipenseriform fish). Upon host spawning, P. hydriforme emerges from the host’s oocyte (Right), fragments, and lives as a free-living stage with a mouth (Left) before infecting its host.

Classification of Myxozoa and P. hydriforme has been controversial (13–15). From the first descriptions in the 1880s until relatively recently myxozoans were considered to be protists, largely due to their highly reduced, microscopic construction (16). Unlike Myxozoa, the placement of the monotypic species P. hydriforme as a cnidarian has long been proposed based on morphology (17, 18). With the advent of molecular phylogenetics, it was discovered that myxozoans are not protists, but instead are metazoans (19, 20). The first studies, based mainly on analyses of 18S rDNA, usually recovered Myxozoa as the sister taxon to P. hydriforme. However, the position of this clade was unstable. It was either placed as the sister clade to Bilateria or nested within Cnidaria, depending on taxon sampling, alignment, optimization method, and the characters considered (13, 19–23). Recent phylogenomic studies support a position of Myxozoa within Cnidaria, as the sister clade to Medusozoa (24–26). However, in these studies P. hydriforme and representatives of major lineages of myxozoan and nonmyxozoan cnidarians were notably absent. Thus, the precise phylogenetic position within Cnidaria remains uncertain.

In recent studies, myxozoans were found to possess the cnidarian-specific minicollagen and nematogalectin genes (24, 27, 28), each of which has been shown to play important roles in the nematocyst structure in Hydra (29, 30). These studies support previous morphology-based assertions that myxozoan polar capsules are homologous to cnidarian nematocysts (4, 5, 31, 32) and thus indirectly suggest a close evolutionary relationship between these two groups.

For this study, we analyzed genomic and transcriptomic assemblies from two distantly related myxosporean myxozoans, Kudoa iwatai and M. cerebralis, as well as the cnidarian parasite P. hydriforme, to gain insight into the evolutionary transition to parasitism and extreme reduction of body plans from a free-living cnidarian. First, we used these newly generated data, in conjunction with publicly available data, to determine the phylogenetic relationships between all major lineages of myxozoans (33), P. hydriforme, and other cnidarians to reconstruct the evolutionary history of endoparasitism in Cnidaria. Second, we compared genome size, gene number, gene content, and enrichment of expressed genes between myxozoans, P. hydriforme, and published cnidarian genomes to determine if the degeneration of the cnidarian body plan displayed in Myxozoa (but not in P. hydriforme) was accompanied by genome reduction and gene loss. Our findings reaffirm a cnidarian origin for myxozoans and recover them as the sister group to P. hydriforme. Analysis of genome and transcriptome assemblies reveals that the highly degenerate body plan of myxozoans coincided with extreme reduction in genome size and gene loss while retaining some genes necessary to function as an obligate parasite. By contrast, P. hydriforme, which displays many cnidarian-like morphological features, has a genome size and gene content similar to that of published cnidarian genomes.

Results

Phylogenetic Position. A phylogenomic analysis was performed using the newly generated transcriptome assemblies from K. iwatai, M. cerebralis, and P. hydriforme as well as genomic data of Enteromyxum leei and Sphaeromyxa zaharoni in conjunction with published sequences from three additional myxozoans (Buddenbrockia plumatellae, Tetracapsuloides bryosalmonae, and Thelohanellus kitauei), altogether encompassing 22 cnidarians, 38 representatives of the Metazoan diversity, and 9 unicellular opisthokont taxa. Both Bayesian analyses using the CAT model (34) and a maximum-likelihood (ML) analysis using the GTR model recovered P. hydriforme as sister to a monophyletic Myxozoa with maximal support [Bayesian posterior probability (PP) of 1.0, ML bootstrap percentage (BP) of 100]. Within a monophyletic Cnidaria (PP = 1.0, BP = 100), the Myxozoa + P. hydriforme clade was recovered as sister to the medusozoan clade with maximal support (PP = 1.0, BP = 100) (Fig. 2). The Bayesian and ML topologies differed only in the position of two taxa (Porites and Strigamia). Several analyses were conducted to evaluate the robustness of the position of the Myxozoa + P. hydriforme clade within Cnidaria. Because it has been claimed that ribosomal genes can contain a different signal from nonribosomal genes (35), phylogenetic analyses were conducted on a dataset of 41,237 amino acids, excluding ribosomal genes (Fig. S1). Additionally, because taxon sampling can affect phylogenetic inferences, phylogenetic reconstructions were performed either with only cnidarian taxa (Fig. S2) or after removing either Myxozoa or P. hydriforme. None of these analyses affected the position of Myxozoa and P. hydriforme or its position as the sister clade to Medusozoa.

Fig. 2. Phylogenetic tree generated from a matrix of 51,940 amino acid positions and 77 taxa using Bayesian inference under the CAT model. Support values are indicated only for nodes that did not receive maximal support. Bayesian posterior probabilities/ML bootstrap supports under the PROTGAMMAGTR model are given near the corresponding node. A minus sign (“−”) indicates that the corresponding node is absent from the ML bootstrap consensus tree.
[Tree figure. Labeled groups: Hydrozoa, Cubozoa, Scyphozoa, Polypodium hydriforme, Myxozoa, Octocorallia, and Hexacorallia (Cnidaria); Deuterostomia; Ecdysozoa; Lophotrochozoa; Placozoa; Porifera; Ctenophora; Choanoflagellata; Ichthyosporea. Scale bar: 0.2.]

Estimation of the Completeness of Genome and Transcriptome Assemblies. RNA libraries from M. cerebralis, K. iwatai, and P. hydriforme and DNA libraries from the latter two were sequenced using a short-read Illumina platform. Data were deposited in the National Center for Biotechnology Information (NCBI) archives (Table S1). Previously generated M. cerebralis genomic data (26) were downloaded from the NCBI (SRX208206). Assembly statistics are shown in Table 1, and size distribution of the transcriptome sequences is shown in Fig. S3. Completeness of the genome and transcriptome assemblies was estimated by determining the presence of the 248 ultra-conserved core eukaryotic genes (CEGs), obtained from the Core Eukaryotic Genes Mapping Approach (CEGMA) database (36) (Table 1). The K. iwatai genome and transcriptome assemblies recovered over 70% of the CEGs with over 1,000× estimated mean base-pair coverage. The M. cerebralis transcriptome was less complete, recovering only 39% of the CEGs. We were unable to recover any CEGs for M. cerebralis from its published genomic data, most likely due to their low coverage. Although the P. hydriforme genome assembly recovered very few CEGs due to low coverage, its transcriptome assembly recovered 90% of complete CEGs.

General Characteristics of Genomes. Genome size estimates based on overall coverage of known individual genes are shown in Table 2. These estimates suggest that the myxozoan genome of K. iwatai (22.5 Mb) is one of the smallest reported animal genomes, comparable to the genome of the recently reported parasitic nematode (∼20 Mb) (37). The K. iwatai genome is more than 20-fold smaller than the estimated size of the P. hydriforme genome (561 Mb) and the published genome of the cnidarian Nematostella vectensis (450 Mb) (38) and more than 40-fold smaller than the published estimated genome size of the cnidarian Hydra magnipapillata (1,005 Mb) (39). Although the published estimated genome size of the myxozoan Thelohanellus kitauei (188.5 Mb), which was based on K-mer distribution (40), is eightfold larger than our estimated K. iwatai genome size, it is still significantly smaller than the nonmyxozoan cnidarian genomes. As an independent test of the accuracy of genome-size estimation, we compared genome size based on the overall assembly coverage of the K. iwatai genome from two independent sequencing runs (SI Materials and Methods). This revealed a very similar genome-size estimate (23.5 Mb). Due to the low coverage of the published M. cerebralis genomic read data, it was not possible to estimate its genome size.

The number of protein-coding genes and average intron and exon sizes were estimated from the genome assemblies using MAKER2 (SI Materials and Methods). These analyses revealed that the number of protein-coding genes in the K. iwatai genome (5,533) is about 30% of those estimated in P. hydriforme (17,440), H. magnipapillata (16,839), and N. vectensis (18,000) (Table 2). In addition, the myxozoan genome appears to be much more compact, with a mean intron size of 82 bp in K. iwatai, compared with 1,163 bp in P. hydriforme, 2,673 bp in H. magnipapillata, and 799 bp in N. vectensis (Table 2).

Characteristics of Transcriptomes: Comparisons of Gene Ontology and Gene Orthology. To identify the biological pathways that have gained or lost a significant fraction of expressed genes in the myxozoans M. cerebralis and K. iwatai, and in P. hydriforme, compared with the cnidarian model species H. magnipapillata and N. vectensis, Fisher’s exact tests were used to infer enrichment and depletion in the proportion of genes present in 112 Gene Ontology (GO) categories as defined by the GOSlim list of CateGOrizer (41) (Table 3 and Dataset S1). Because the GO terms of P. hydriforme were more similar to N. vectensis and H. magnipapillata than to the myxozoans M. cerebralis and K. iwatai, the most informative comparison was between M. cerebralis and K. iwatai versus P. hydriforme, H. magnipapillata, and N. vectensis (Dataset S1, Tab 2). Of the top 20 GO categories with the highest occurrences of GO terms (Fig. 3), the expressed myxozoan genes appear to be significantly depleted (by comparison with other cnidarians) in categories that are related to development, cell differentiation, and cell-to-cell communication (Table 3), consistent with lack of a complex multicellular body in myxosporean myxozoans. By contrast, myxozoan-expressed genes have an abundance of categories such as cellular function, for which the number of genes does not differ significantly

Table 1. Assembly statistics for sequenced genomes and transcriptomes

                K. iwatai                     M. cerebralis               P. hydriforme
                Genome        Transcriptome   Genome*   Transcriptome     Genome        Transcriptome
Raw reads       167,917,062   154,253,215     NA        312,202,378       229,917,588   467,431,688
Contigs         1,637         6,528           NA        52,972            83,415        24,523
N50             40,195        1,662           NA        994               3,865         1,475
CEGs (c), %†    179/72        190/77          0/0       97/39             14/6          223/90
CEGs (p), %†    188/76        208/84          0/0       164/66            56/23         232/94

*Published ESTs.
†Number/percentage of 248 ultra-conserved CEGs. c, complete; p, partial.
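The CEG rows of Table 1 pair a count with a percentage of the 248-gene CEG set, and the N50 row summarizes contig lengths. The sketch below recomputes the percentages from the counts printed in the table and shows one common definition of N50 on a made-up list of contig lengths. The function names and the toy contig list are illustrative assumptions and are not taken from the authors' pipeline.

```python
# Counts of complete (c) and partial (p) CEGs, copied from Table 1.
TOTAL_CEGS = 248

ceg_counts = {
    "K. iwatai genome": (179, 188),
    "K. iwatai transcriptome": (190, 208),
    "M. cerebralis genome": (0, 0),
    "M. cerebralis transcriptome": (97, 164),
    "P. hydriforme genome": (14, 56),
    "P. hydriforme transcriptome": (223, 232),
}


def percent_recovered(count, total=TOTAL_CEGS):
    """Percentage of the CEG set recovered, rounded to the nearest integer."""
    return round(100 * count / total)


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases.

    This is one common definition of N50; the assembler's own report may differ
    in edge cases.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("empty assembly")


ceg_percentages = {
    name: (percent_recovered(complete), percent_recovered(partial))
    for name, (complete, partial) in ceg_counts.items()
}

# Hypothetical contig lengths, used only to illustrate the N50 definition.
toy_n50 = n50([2, 2, 2, 3, 3, 4, 8, 8])
```

Recomputing the percentages this way is a quick consistency check on a transcribed table: each rounded value should match the number printed after the slash.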

from the number observed in cnidarians (e.g., a similar number of nucleoplasm genes was found in both) (Dataset S1, Tab 2). Although these analyses were from transcriptomes, the general patterns likely reflect overall genome content as multiple life cycle stages are represented in the combined transcriptome of K. iwatai and M. cerebralis (SI Materials and Methods). To confirm this, we also performed a GO comparison analysis of the genes predicted based on genomic sequences, which revealed the same general patterns for K. iwatai, but not for P. hydriforme whose transcriptome assembly was of better quality than its genome assembly (Dataset S1 and Fig. S4).

Table 2. Estimated genome characteristics

                       Kudoa    Polypodium   Hydra (39)   Nematostella (38)
Genome size (Mb)       22.5     561          1,005        450
No. protein genes      5,533    17,440       16,839       18,000
GC content, %          28       47           29           41
Mean intron size, bp   82       1,163        2,673        799
Mean exon size, bp     102      216          218          208

Table 3. Transcriptome-based Gene Ontology categories showing depletion in myxozoans compared with other cnidarians

Category                     Myxozoa, %*    Cnidaria, %†
Cell differentiation         548/0.0234     1763/0.0302
Development                  1052/0.0450    3892/0.0667
Morphogenesis                456/0.0195     1693/0.0290
Receptor activity            46/0.00197     310/0.00531
Signal transducer activity   49/0.0002      328/0.0056

Myxozoan total number = 23,388. Cnidarian total number = 53,354. All categories had a P value ≤0.0001.
*Myxozoa = K. iwatai + M. cerebralis.
†Cnidaria = H. magnipapillata + N. vectensis + P. hydriforme.

Fig. 3. GO annotation of unigenes in transcriptomes. The top 20 GO categories are shown as a percentage of total GO terms from the transcriptome assemblies of M. cerebralis, K. iwatai, P. hydriforme, and the published protein sequences of H. magnipapillata and N. vectensis. Categories for which myxozoans present significantly fewer GO terms than other cnidarians are indicated by boldface type.
[Bar chart. Horizontal axis: % mapping to GO term (0% to 10%). Series: Myxobolus, Kudoa, Polypodium, Hydra, Nematostella. Categories: Response to stress; Morphogenesis; Signal transduction; Hydrolase activity; Cell differentiation; Cytoplasm; Organelle organization and biogenesis; Transport; Protein metabolism; Cell communication; Transferase activity; Binding; Biosynthesis; Nucleobase, nucleoside, nucleotide and nucleic acid metabolism; Development; Cell organization and biogenesis; Intracellular; Cell; Catalytic activity; Metabolism.]

Using the OrthoMCL database (42), we determined the number of orthologous groups (OG) that could be identified from our transcriptome assemblies of P. hydriforme and K. iwatai, compared with published predicted proteins from H. magnipapillata (39) and N. vectensis (38) (Fig. S5). A total of 8,021 unique OGs were recovered from H. magnipapillata, 11,162 from N. vectensis, 5,451 from P. hydriforme, and 2,735 from K. iwatai. There was more overlap in the number of OGs between H. magnipapillata, N. vectensis, and P. hydriforme than between K. iwatai and the other three, which is consistent with the significantly lower number of total OGs in K. iwatai (Fig. S5).

Analyses of Gene Pathways and Candidate Genes in the Transcriptome and Genome Assemblies. We searched for several candidate genes and gene pathways that have been previously characterized as important for cnidarian cell signaling and development in the transcriptome and genome assemblies of M. cerebralis, K. iwatai, and P. hydriforme. Using BLAST searches and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses we determined presence/absence of representatives within a gene family, but not precise orthology, within the particular families. Myxozoan genomes appear to lack key genes and signaling pathways that are present in P. hydriforme and other cnidarians (Table 4). Specifically, the conserved transcriptional factors belonging to the Hox and Runx gene families, which have been shown to be important for cnidarian patterning and cellular differentiation, respectively (43, 44), were found in neither the genomes nor transcriptomes of myxozoans, but were nevertheless present in P. hydriforme (Table 4 and Dataset S2). In addition, myxozoans appear to have lost the ligands, receptors, and most downstream elements of the Wnt- and Hedgehog-signaling pathways, which have been shown to be important for axial patterning (45) and cell signaling (46), respectively, whereas nearly all components of these pathways were recovered in P. hydriforme (Table 4 and Dataset S3). By contrast, myxozoans and P. hydriforme possess orthologs to the stem-cell markers FoxO (47) and Piwi (48), and P. hydriforme and K. iwatai appear to have the gene Hap2, which was shown to be involved in gamete fusion in Hydra (49) (Table 4 and Dataset S2). The two myxozoans and P. hydriforme were also found to have key elements of the Notch-signaling pathway, and M. cerebralis possessed some elements of the TGFβ pathway (Table 4 and Dataset S3). Notch is reported to have an important role in differentiation of stem-cell lineages (50), whereas TGFβ appears to play a more general role in cell signaling (51).

Table 4. Presence (+) or absence (−) of genes and KEGG pathways that have been characterized in other cnidarians

                 K. iwatai   M. cerebralis   Polypodium
Genes
  Hox-like       −           −               +
  Runx           −           −               +
  Piwi           +           +               +
  FoxO           +           +               +
  Hap2           +           −               +
KEGG pathways
  Wnt            −           −               +
  Hedgehog       −           −               +
  TGFβ           −           +               +
  Notch          +           +               +

Discussion

Our analyses of transcriptomic and genomic assemblies of myxozoans have yielded significant insight into the evolution of these microscopic parasites from free-living cnidarians. We report, for the first time to our knowledge, a broad phylogenomic sampling of myxozoans, including representatives from the malacosporean clade and the freshwater and marine myxosporean clades (33), as well as the only phylogenomic study to date to include P. hydriforme. In addition, we have a more comprehensive sampling of cnidarians than previous phylogenomic studies addressing the placement of myxozoans (24–26). We recover P. hydriforme as the sister taxon to Myxozoa and can confirm, with an increased sampling and thus a higher degree of confidence, the placement of this clade as the sister taxon to medusozoan cnidarians. These results are consistent with those of other molecular phylogenetic studies (19, 23), although these have been criticized as possible artifacts of long-branch attraction (21). The monophyly of Myxozoa + P. hydriforme is also supported by endoparasitism in fish, a unique cell-within-cell developmental stage, possession of a single similar type of nematocyst (19, 23), and similarity in minicollagen sequences (28). This phylogenomic pattern suggests that endoparasitism in Cnidaria was a single event that occurred at the base of Myxozoa + P. hydriforme, but that the dramatic reduction in body plan occurred following the divergence of P. hydriforme from myxozoans, as P. hydriforme retains many cnidarian features.

The Myxozoa represent an extreme example of degeneration of body plans due to parasitism. Genome and transcriptome analyses reveal that this degeneration was accompanied by massive genome reduction, with myxozoans having one of the smallest reported animal genomes. Genome size reduction included loss of many genes considered hallmarks of metazoan development, yet retention of genes necessary to function as obligate parasites, such as nematocyst-specific genes. In contrast to myxozoans, P. hydriforme has a genome similar in size, gene number, and gene content to the model system Hydra. This finding is not surprising given that, although P. hydriforme is an obligate parasite, it has maintained its cnidarian-like body plan, including epithelia, mouth, gut, and tentacles. Our study provides a robust phylogenetic hypothesis for myxozoan placement within Cnidaria, as the sister taxon to P. hydriforme, and a framework for comparative genomic studies, which should be valuable for future phylogenetic and genomic investigations of Cnidaria sensu lato.

Materials and Methods

Phylogenetic Reconstruction. Curated sequence alignments of 200 protein markers (52) were augmented with sequences from the four myxozoan species generated for this study, from P. hydriforme, and from the NCBI databases (www.ncbi.nlm.nih.gov) [GenBank and Sequence Read Archive (SRA)] as described (52). After removing positions with unreliable alignment, the final dataset included 51,940 sites and 12% missing data (Dataset S4). Bayesian tree reconstructions were conducted under the Bayesian CAT model (34) as implemented in Phylobayes MPI vs.1.5 (53) (SI Materials and Methods). The ML analysis was conducted under the PROTGAMMAGTR model, as implemented in RAxML 8.1.3 (54). Alignments and Bayesian tree have been deposited in the TreeBASE repository (55) (purl.org/phylo/treebase/phylows/study/TB2:S17743?format=html).

Specimen Collection. Collection of specimens of P. hydriforme, K. iwatai plasmodia, E. leei trophozoites, and S. zaharoni plasmodia for both genomic and transcriptomic purposes was carried out as described in ref. 28 and SI Materials and Methods. Flash-frozen M. cerebralis actinospores were kindly provided by Ron Hedrick (University of California at Davis).

Illumina Sequencing. For the M. cerebralis transcriptome assembly, RNA extraction, library preparation, and sequencing were performed as described for the P. hydriforme transcriptome (28). Library preparations and genome sequencing of P. hydriforme were carried out at the Genome Sequence Facility at the University of Kansas Medical School. P. hydriforme gDNA was sheared to a size of 350 bp, and 100 bp paired-end (PE) sequencing was performed on an Illumina HiSeq 2000. Library preparation and Illumina 100-bp PE HiSeq 2000 sequencing of K. iwatai (transcriptome and genome), S. zaharoni (genome), and E. leei (genome) was described (28). In addition to the Illumina HiSeq sequencing, the K. iwatai genomic library was also independently sequenced with an Illumina Genome Analyzer IIx platform, which produced 95,434,687 paired reads (76 bp).

Genome and Transcriptome Assemblies. The M. cerebralis transcriptome was filtered for read quality and assembled following protocols described for the P. hydriforme transcriptome (28). Genome de novo assemblies for P. hydriforme, E. leei, S. zaharoni, and K. iwatai were performed with ABySS v. 1.3.6 (56), and the transcriptome de novo assembly of K. iwatai was performed with Trinity (57). Contigs shorter than 500 and 300 bp were removed from the genomic and transcriptomic assemblies, respectively. Filtering methods for host contaminants are described in SI Materials and Methods. The accessions of the different assemblies and short read data are indicated in Table S1.

Analysis of Assembly Completeness. For each transcriptome and genome assembly, relative completeness was assessed using CEGMA, which searches for the presence of 248 ultra-conserved CEGs (58). P. hydriforme assemblies were run using default settings. Because our evidence indicates that myxozoans have unusually small intron sizes (see below), the –max_intron_size parameter for M. cerebralis and K. iwatai genomic CEGMA runs was adjusted to match the maximum intron size of the M. cerebralis intron size distribution (2,630 bp).

Estimation of Genome Size and Content. Output from the CEGMA runs was used for coverage-based estimates of the genome size for P. hydriforme and K. iwatai. We used the “dna” output files from the genomic CEGMA runs, which include the raw sequence for each region identified by CEGMA as a partial or complete core gene, as well as the 2,000 bp of sequence on each side. Because CEGs were chosen to minimize in-paralogy and therefore should be largely single-copy, mapping reads to these regions provides a simple unbiased estimate of genome coverage. For each of the genomes, the complete set of raw reads was mapped to the CEGMA output files using the Burrows-Wheeler Aligner, BWA-MEM (v.0.7.8), with default settings (59). Coverage was calculated using QualiMap 2 (60). Under the assumption that the coverage estimated from the conserved CEGMA-identified regions is representative of genome-wide coverage, the number of base pairs used in the whole read set for each assembly was divided by the calculated coverage for that species, thus providing an estimate of the genome size for each species (Table 2). In addition, an independent approach based on coverage of all contigs was used to estimate the genome size of K. iwatai (SI Materials and Methods). Genome annotation was conducted using MAKER2 (SI Materials and Methods). The OrthoMCL database (42) was used to determine the number of orthologous groups identified in the transcriptome assemblies of K. iwatai and P. hydriforme, compared with published predicted proteins in H. magnipapillata (SI Materials and Methods).

Gene Enrichment Analysis of Transcriptomes. Before conducting the gene enrichment analysis, duplicate sequences and alternative transcripts were removed from the transcriptomes of P. hydriforme and M. cerebralis using methods described in SI Materials and Methods. No redundant contigs were found in the transcriptome of K. iwatai. We used the Trinotate pipeline (SI Materials and Methods) to annotate the K. iwatai, M. cerebralis, and P. hydriforme unigenes. In particular, Trinotate searched the GO database (61) and recovered for each transcript its relevant GO terms. The GO terms of each transcriptome were reduced using the GOSlim list in CateGOrizer (41). The database of H. magnipapillata and N. vectensis protein sequences was downloaded from the Metazome website (metazome.net/). GO terms were also assigned to H. magnipapillata and N. vectensis using Trinotate (SI Materials and Methods). The GO term categories of Myxozoa and nonmyxozoan Cnidaria were compared for depletion or enrichment using Fisher’s exact tests. The significance level was corrected for multiple testing using a Bonferroni correction (specifically, α = 0.05 was corrected to α = 0.000446). Gene enrichment analysis of genomes is described in SI Materials and Methods.

Analysis of Gene Pathways and Candidate Genes. We searched assembled genomes and transcriptomes for genes and gene pathways that are considered important for cnidarian development. A multistep BLAST-based approach (62) was used for identifying candidate genes (SI Materials and Methods). All BLAST-search results are found in Dataset S2. We also assessed the completeness of candidate signaling pathways in our assemblies using KEGG (63) (SI Materials and Methods).

ACKNOWLEDGMENTS. We thank the Oklahoma Paddlefish Research Center for help with collecting P. hydriforme; Ron Hedrick for providing frozen tissue from M. cerebralis; A. Shaver for illustrations; and M. Shcheglovitova, Liran Dray, and Tamar Feldstein for technical assistance. Genome sequencing services were provided by the University of Kansas Medical School Genome Sequencing Center, the Technion Genome Center, and the Duke Center for Genomic and Computational Biology Sequencing Core Facility. Computing support was provided by the University of Kansas Information and Telecommunication Technology Center. We thank K. Jensen, J. Kelly, J. Ryan, S. Sanders, C. Schnitzler, O. Simakov, and R. Steele for discussions and two anonymous reviewers for helpful comments and suggestions. This work was supported by the National Science Foundation
This work was supported by the National Science Foundation (NSF)-Division of Environmental Biology 0953571 (to P.C.), the Israel Science Foundation (663/10 to D.H.), and the Binational Science Foundation (BSF)-NSF Joint Funding Program in International Collaborations in Organismal Biology (ICOB): NSF-IOS 1321759 (to P.C.) and BSF 2012768 (to D.H.).

1. Hoeg JT (1995) The biology and life cycle of the Rhizocephala (Cirripedia). J Mar Biol Assoc UK 75(3):517–550.
2. Kobayashi M, Furuya H, Holland PWH (1999) Dicyemids are higher animals. Nature 401(6755):762–762.
3. Kent ML, et al. (2001) Recent advances in our knowledge of the Myxozoa. J Eukaryot Microbiol 48(4):395–413.
4. Štolc A (1899) Actinomyxidies, nouveau groupe de Mesozoaires parent des Myxosporidies. Bull Intl Acad Sci Boheme 22:1–12.
5. Weill R (1938) L’interpretation des Cnidosporidies et la valeur taxonomique de leur cnidome. Leur cycle comparé à la phase larvaire des Narcomeduses Cuninides. Travaux de la Station Zoologique de Wimereaux 13:727–744.
6. Lom J, Dyková I (2006) Myxozoan genera: Definition and notes on taxonomy, life-cycle terminology and pathogenic species. Folia Parasitol (Praha) 53(1):1–36.
7. Wolf K, Markiw ME (1984) Biology contravenes taxonomy in the myxozoa: New discoveries show alternation of invertebrate and vertebrate hosts. Science 225(4669):1449–1452.
8. Hedrick RP, el-Matbouli M, Adkison MA, MacConnell E (1998) Whirling disease: Re-emergence among wild trout. Immunol Rev 166:365–376.
9. El-Matbouli M, Hoffmann RW, Mandok C (1995) Light and electron-microscopic observations on the route of the triactinomyxon-sporoplasm of Myxobolus cerebralis from epidermis into rainbow trout cartilage. J Fish Biol 46(6):919–935.
10. Monteiro AS, Okamura B, Holland PWH (2002) Orphan worm finds a home: Buddenbrockia is a myxozoan. Mol Biol Evol 19:968–971.
11. Hartikainen H, Gruhl A, Okamura B (2014) Diversification and repeated morphological transitions in endoparasitic cnidarians (Myxozoa: Malacosporea). Mol Phylogenet Evol 76:261–269.
12. Raikova EV, Suppes VC, Hoffman GL (1979) The parasitic coelenterate, Polypodium hydriforme USSOV, from the eggs of the American acipenseriform Polyodon spathula. J Parasitol 65(5):804–810.
13. Evans NM, Holder MT, Barbeitos MS, Okamura B, Cartwright P (2010) The phylogenetic position of Myxozoa: Exploring conflicting signals in phylogenomic and ribosomal datasets. Mol Biol Evol 27(12):2733–2746.
14. Foox J, Siddall ME (2015) The road to Cnidaria: History of phylogeny of the Myxozoa. J Parasitol 101(3):269–274.
15. Okamura B, Gruhl A (2015) Myxozoan Affinities and Route to Endoparasitism. Myxozoan Evolution, Ecology and Development (Springer, Berlin), pp 23–44.
16. Bütschli O (1881) Myxosporidien. Zoologischer Jahrbericht für 1:162–164.
17. Lipin A (1925) Geschlechtliche Form, Phylogenie und systematische Stellung von Polypodium hydriforme Ussov. Zool Jahrb Anat 47:541–635.
18. Raikova EV (1988) On the systematic position of Polypodium hydriforme Ussov (Coelenterata). Porifera and Cnidaria: Contemporary State and Perspectives of Investigations, eds Koltum VM, Stepanjants SD (Zoological Institute of Academy of Sciences of USSR, Leningrad), pp 116–122.
19. Siddall ME, Martin DS, Bridge D, Desser SS, Cone DK (1995) The demise of a phylum of protists: Phylogeny of Myxozoa and other parasitic cnidaria. J Parasitol 81(6):961–967.
20. Smothers JF, von Dohlen CD, Smith LH, Jr, Spall RD (1994) Molecular evidence that the myxozoan protists are metazoans. Science 265(5179):1719–1721.
21. Evans NM, Lindner A, Raikova EV, Collins AG, Cartwright P (2008) Phylogenetic placement of the enigmatic parasite, Polypodium hydriforme, within the Phylum Cnidaria. BMC Evol Biol 8(1):139.
22. Siddall ME, Whiting MF (1999) Long-branch abstractions. Cladistics 15(1):9–24.
23. Zrzavý J, Hypša V (2003) Myxozoa, Polypodium, and the origin of the Bilateria: The phylogenetic position of “Endocnidozoa” in the light of the rediscovery of Buddenbrockia. Cladistics 19(3):164–169.
24. Feng J-M, et al. (2014) New phylogenomic and comparative analyses provide corroborating evidence that Myxozoa is Cnidaria. Mol Phylogenet Evol 81(0):10–18.
25. Jiménez-Guri E, Philippe H, Okamura B, Holland PWH (2007) Buddenbrockia is a cnidarian worm. Science 317(5834):116–118.
26. Nesnidal MP, Helmkampf M, Bruchhaus I, El-Matbouli M, Hausdorf B (2013) Agent of whirling disease meets orphan worm: Phylogenomic analyses firmly place Myxozoa in Cnidaria. PLoS One 8(1):e54576.
27. Holland JW, Okamura B, Hartikainen H, Secombes CJ (2011) A novel minicollagen gene links cnidarians and myxozoans. Proc R Soc Lond B Biol Sci 278(1705):546–553.
28. Shpirer E, et al. (2014) Diversity and evolution of myxozoan minicollagens and nematogalectins. BMC Evol Biol 14(1):205.
29. Adamczyk P, et al. (2008) Minicollagen-15, a novel minicollagen isolated from Hydra, forms tubule structures in nematocysts. J Mol Biol 376(4):1008–1020.
30. Hwang JS, et al. (2010) Nematogalectin, a nematocyst protein with GlyXY and galectin domains, demonstrates nematocyte-specific alternative splicing in Hydra. Proc Natl Acad Sci USA 107(43):18539–18544.
31. Reft AJ, Daly M (2012) Morphology, distribution, and evolution of apical structure of nematocysts in hexacorallia. J Morphol 273(2):121–136.
32. Okamura B, Gruhl A, Reft AJ (2015) Cnidarian Origins of the Myxozoa. Myxozoan Evolution, Ecology and Development (Springer, Berlin), pp 45–68.
33. Fiala I, Bartosová P (2010) History of myxozoan character evolution on the basis of rDNA and EF-2 data. BMC Evol Biol 10(1):228.
34. Lartillot N, Philippe H (2004) A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 21(6):1095–1109.
35. Nosenko T, et al. (2013) Deep metazoan phylogeny: When different genes tell different stories. Mol Phylogenet Evol 67(1):223–233.
36. Parra G, Bradnam K, Ning Z, Keane T, Korf I (2009) Assessing the gene space in draft genomes. Nucleic Acids Res 37(1):289–297.
37. Burke M, et al. (2015) The plant parasite Pratylenchus coffeae carries a minimal nematode genome. Nematology 17(6):621–637.
38. Putnam NH, et al. (2007) Sea anemone genome reveals ancestral eumetazoan gene repertoire and genomic organization. Science 317(5834):86–94.
39. Chapman JA, et al. (2010) The dynamic genome of Hydra. Nature 464(7288):592–596.
40. Yang Y, et al. (2014) The genome of the myxosporean Thelohanellus kitauei shows adaptations to nutrient acquisition within its fish host. Genome Biol Evol 6(12):3182–3198.
41. Zhi-Liang H, Jie B, Reecy JM (2008) CateGOrizer: A web-based program to batch analyze gene ontology classification categories. Online J Bioinform 9(2):108–112.
42. Chen F, Mackey AJ, Stoeckert CJ, Jr, Roos DS (2006) OrthoMCL-DB: Querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res 34(Database issue, suppl 1):D363–D368.
43. Ryan JF, et al. (2006) The cnidarian-bilaterian ancestor possessed at least 56 homeoboxes: Evidence from the starlet sea anemone, Nematostella vectensis. Genome Biol 7(7):R64.
44. Sullivan JC, et al. (2008) The evolutionary origin of the Runx/CBFbeta transcription factors: Studies of the most basal metazoans. BMC Evol Biol 8(1):228.
45. Hensel K, Lotan T, Sanders SM, Cartwright P, Frank U (2014) Lineage-specific evolution of cnidarian Wnt ligands. Evol Dev 16(5):259–269.
46. Matus DQ, Magie CR, Pang K, Martindale MQ, Thomsen GH (2008) The Hedgehog gene family of the cnidarian, Nematostella vectensis, and implications for understanding metazoan Hedgehog pathway evolution. Dev Biol 313(2):501–518.
47. Boehm A-M, et al. (2012) FoxO is a critical regulator of stem cell maintenance in immortal Hydra. Proc Natl Acad Sci USA 109(48):19697–19702.
48. Juliano CE, et al. (2014) PIWI proteins and PIWI-interacting RNAs function in Hydra somatic stem cells. Proc Natl Acad Sci USA 111(1):337–342.
49. Steele RE, Dana CE (2009) Evolutionary history of the HAP2/GCS1 gene and sexual reproduction in metazoans. PLoS One 4(11):e7680.
50. Käsbauer T, et al. (2007) The Notch signaling pathway in the cnidarian Hydra. Dev Biol 303(1):376–390.
51. Hobmayer B, Rentzsch F, Holstein TW (2001) Identification and expression of HySmad1, a member of the R-Smad family of TGFbeta signal transducers, in the diploblastic metazoan Hydra. Dev Genes Evol 211(12):597–602.
52. Philippe H, et al. (2011) Acoelomorph flatworms are deuterostomes related to Xenoturbella. Nature 470(7333):255–258.
53. Lartillot N, Rodrigue N, Stubbs D, Richer J (2013) PhyloBayes MPI: Phylogenetic reconstruction with infinite mixtures of profiles in a parallel environment. Syst Biol 62(4):611–615.
54. Stamatakis A (2006) RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22(21):2688–2690.
55. Piel WH, et al. (2009) TreeBASE v. 2: A database of phylogenetic knowledge.
56. Simpson JT, et al. (2009) ABySS: A parallel assembler for short read sequence data. Genome Res 19(6):1117–1123.
57. Grabherr MG, et al. (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29(7):644–652.
58. Parra G, Bradnam K, Korf I (2007) CEGMA: A pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23(9):1061–1067.
59. Li H, Durbin R (2010) Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 26(5):589–595.
60. García-Alcalde F, et al. (2012) Qualimap: Evaluating next-generation sequencing alignment data. Bioinformatics 28(20):2678–2679.
61. Ashburner M, et al.; The Gene Ontology Consortium (2000) Gene ontology: Tool for the unification of biology. Nat Genet 25(1):25–29.
62. Altschul SF, et al. (1997) Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402.
63. Kanehisa M, et al. (2014) Data, information, knowledge and principle: Back to metabolism in KEGG. Nucleic Acids Res 42(Database issue, D1):D199–D205.
64. Diamant A, Ucko M, Paperna I, Colorni A, Lipshitz A (2005) Kudoa iwatai (Myxosporea: Multivalvulida) in wild and cultured fish in the Red Sea: Redescription and molecular phylogeny. J Parasitol 91(5):1175–1189.
65. Dray L, Neuhof M, Diamant A, Huchon D (August 8, 2014) The complete mitochondrial genome of the devil firefish Pterois miles (Bennett, 1828) (Scorpaenidae). Mitochondrial DNA, 10.3109/19401736.2014.945565.
66. Dray L, Neuhof M, Diamant A, Huchon D (2014) The complete mitochondrial genome of the gilthead seabream Sparus aurata L. (Sparidae). Mitochondrial DNA, 10.3109/19401736.2014.928861.
67. Li B, Dewey CN (2011) RSEM: Accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12(1):323.
68. Haas BJ, et al. (2013) De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc 8(8):1494–1512.
69. Huang X, Madan A (1999) CAP3: A DNA sequence assembly program. Genome Res 9(9):868–877.
70. Langmead B, Salzberg SL (2012) Fast gapped-read alignment with Bowtie 2. Nat Methods 9(4):357–359.
71. Holt C, Yandell M (2011) MAKER2: An annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatics 12:491.
72. Min XJ, Butler G, Storms R, Tsang A (2005) OrfPredictor: Predicting protein-coding regions in EST-derived sequences. Nucleic Acids Res 33(Web Server issue, suppl 2):W677–W680.
73. Bardou P, Mariette J, Escudié F, Djemiel C, Klopp C (2014) jvenn: An interactive Venn diagram viewer. BMC Bioinformatics 15(1):293.

Supporting Information

Chang et al. 10.1073/pnas.1511468112

SI Materials and Methods

Specimen Collection. Locality information is given in Table S1. Actinospores of M. cerebralis (kindly provided by R. Hedrick, University of California, Davis, CA) were collected and flash-frozen as they emerged from the annelid host (Fig. 1). For K. iwatai, plasmodia were collected from the fish host. K. iwatai plasmodia form pseudocysts encapsulated by host cells. Inside the plasmodia, the myxospores are at various stages of maturation (64). Because our RNA extractions were based on several cysts pooled together, we can assume that our RNA data represent all Kudoa life stages present in the fish. Unfortunately, the annelid host of K. iwatai is unknown. P. hydriforme was collected and flash-frozen 3–5 d after emerging from the host’s oocytes, after it had fragmented into free-living individuals (Fig. 1).

Host Contaminant Filtering. To filter the assemblies from fish contaminants, genomic sequences were obtained for Sparus aurata (E. leei and K. iwatai host) and Pterois miles (S. zaharoni host) as described (65, 66). BLAST searches were conducted to eliminate contaminating fish sequences from the genomic assemblies. Specifically, BLASTN (version 2.2.27+) searches were performed for each of the three myxozoans, using the genomic assembly sequences as query against a database of their respective fish host DNA contigs. Sequences of S. aurata available in the NCBI dbEST (www.ncbi.nlm.nih.gov/dbEST/) were also included. The BLASTN parameters that were used are: -evalue 1e-75 and -perc_identity 85 (62). All sequences that passed this threshold were considered to be contaminants. Furthermore, we performed additional BLASTN searches against the NCBI nonredundant nucleotide database (e.g., to remove other contaminants such as bacterial sequences).

To filter the K. iwatai RNA assembly from contaminants (either host RNA or other sample contaminants), we ran RSEM (67) with the default parameters and filtered the low-abundance transcripts using the filter_fasta_by_rsem_values.pl script supplied by Trinity (68) with default parameters (--fpkm_cutoff=1200 --isopct_cutoff=1.00). We then ran BLASTN with -evalue 1e-75 and -perc_identity 85 against the sequences of the contaminant database described above (mainly, S. aurata DNA contigs and ESTs available in the NCBI database). We also added the sequence of the S. aurata mitochondrial genome (65). All sequences identified were removed. We then performed BLASTN against the NCBI nucleotide database on the remaining sequences with -evalue 1e-75 and -perc_identity 80. We removed all of the contigs that matched any euteleost sequence with over 80% identity and other taxa (e.g., fungi, Drosophila) with over 90% identity. Finally, we performed a BLASTN search against the two filtered K. iwatai DNA assemblies (HiSeq and GIIx assemblies) with an -evalue 1e-5 threshold and removed sequences that could not align to any of the DNA assemblies. P. hydriforme genomic and transcriptomic sequences were filtered using sequence material from the paddlefish (Polyodon spathula) oocyte transcriptome as described (28).

Phylogenetic Reconstruction. Phylogenetic reconstructions based on the Bayesian and maximum-likelihood criteria were performed for different gene and species combinations: (i) a dataset that includes 77 species representative of the animal diversity with their closest outgroups and 200 ribosomal and nonribosomal protein genes (51,940 amino acids); (ii) a dataset that includes 77 species representative of the animal diversity with their closest outgroups and 128 nonribosomal protein genes (41,237 amino acids); (iii) a dataset that includes 30 cnidarian species and 200 ribosomal and nonribosomal protein genes (51,940 amino acids); and (iv) a dataset that includes 30 cnidarian species and 128 nonribosomal protein genes (41,237 amino acids). Additional analyses were also performed excluding either Myxozoa or P. hydriforme. For all datasets, Bayesian tree reconstructions were conducted under the CAT model (34) as implemented in PhyloBayes MPI v. 1.5 (53). For the third dataset, the CAT-GTR model, which is more computationally intensive, was also used. For the analyses of datasets 1 and 3, two independent chains were run for 10,000 cycles, and trees were saved every 10 cycles. The first 2,000 trees were discarded (burn-in). For the analysis of the second dataset, the two chains were run for 6,000 generations and sampled every 10 generations, and the first 2,000 trees were discarded. For the analysis of the fourth dataset, the two chains were run for 20,000 generations and sampled every 10 generations, and the first 5,000 trees were discarded. Chain convergence was evaluated with the bpcomp and tracecomp programs of the PhyloBayes software. The maximum and average differences observed at the end of each run were lower than 0.0005 for all analyses. Similarly, the effsize and rel_diff parameters were always higher than 30 and lower than 0.3, respectively, which indicates a correct chain convergence because our analyses investigate the topology rather than branch length and all relevant posterior probabilities = 1. The ML analyses were conducted for each dataset under the PROTGAMMAGTR model as implemented in RAxML 8.1.3 (54). Bootstrap support was computed after 250 rapid bootstrap replicates. Alignments and Bayesian trees have been deposited in the TreeBASE repository (55) (purl.org/phylo/treebase/phylows/study/TB2:S17743?format=html).

Genome Size Estimation. An independent estimate of the K. iwatai genome size was based on assembly coverage, using reads and contig sequences from two independent sequencing runs. We ran CAP3 (69) with the parameters -o 300 -p 90 (300 bp overlap between contigs and 90% identity in the overlapping sequence) on the GIIx K. iwatai DNA assembly to remove redundant contigs. A total of 952 redundant contigs were removed using CAP3. To filter the K. iwatai HiSeq DNA reads from contaminants, we created a contaminant database consisting of the Kudoa sequences identified as contaminants, the S. aurata DNA contigs obtained, and all S. aurata ESTs available in the NCBI EST database in May 2014. Bowtie2 was used with default settings to align the K. iwatai HiSeq DNA reads to the contaminant database. The --un-conc flag was used to save the reads that did not map to the contaminant database to separate paired-end FASTQ files. A total of 164,284,209 reads remained after this step. We then used Bowtie2 (70) to align the filtered paired-end HiSeq reads to the GIIx assembly with default parameters. The coverage was calculated using bedtools genomecov -d -ibam on the output of Bowtie2 (70). The average coverage-per-position of the GIIx assembly was estimated (1,391.6; SD: 1,095.4; SE: 0.2577). The genome size was then estimated by dividing the number of base pairs sequenced (filtered reads) by the coverage according to the following formula: ((number of paired reads) × (read length) × 2)/(average coverage-per-position) = (163,897,028 × 100 × 2)/1,391.6 = 23,555,192. Using this method, the genome size was thus estimated to be ∼23.5 Mbp.

Genome Annotation. Genome annotation was conducted using MAKER2 (71), incorporating the Semi-HMM-based Nucleic Acid Parser (SNAP) gene predictor software to assess gene content of the K. iwatai and P. hydriforme genome assemblies. For each assembly, MAKER2 was first run using the assembled transcriptome for each species (EST evidence), a file of the core CEGMA proteins, and a random precompiled eukaryotic HMM profile to train SNAP. The output of this training was a species-specific HMM profile for each assembly created by SNAP. In the next round, MAKER2 was used to annotate the K. iwatai and P. hydriforme genomes by using EST evidence, protein evidence for each species, and the species-specific HMM files generated in the training round. The number of genes found by MAKER2 and other annotation statistics were tabulated using the gene-stats function of the SNAP package (Table 2). Mean intron and exon sizes for H. magnipapillata and N. vectensis were calculated from the annotated scaffolds from the Joint Genome Institute. An independent estimate of intron size for K. iwatai was performed by mapping the RNA contigs onto the DNA contigs (a total of 23,393 introns were evaluated). This method revealed a similar mean intron length estimate (i.e., 85.4 bp).

Gene Enrichment Analysis of Predicted Genes from Annotated Genomes. The gene contigs longer than 300 bp, predicted by MAKER2, for the K. iwatai and P. hydriforme assemblies were annotated using the Trinotate pipeline. The GO terms provided by the Trinotate annotation (68) were then analyzed and compared with those assigned to the transcriptomes of K. iwatai and P. hydriforme and to the H. magnipapillata and N. vectensis protein sequences, as described for the gene enrichment analysis of transcriptomes.

OrthoMCL Analysis. The OrthoMCL database (42) was used to determine the number of orthologous groups identified in the transcriptome assemblies of K. iwatai and P. hydriforme, compared with published predicted proteins in H. magnipapillata and N. vectensis. ORFs were predicted and sequences translated using the OrfPredictor server (72). Predicted proteins for H. magnipapillata were downloaded from NCBI. Each protein FASTA file was uploaded to the OrthoMCL Groups web server (www.orthomcl.org/orthomcl/proteomeUpload.do). OrthoMCL analysis results in lists of OrthoGroup assignments for each protein in an assembly. These were parsed using R to create lists of unique ortholog group IDs (OGs) found in each assembly. These lists were used as input for the jvenn web server (bioinfo.genotoul.fr/jvenn/example.html) (73), which calculated the overlaps between all combinations of the lists of OGs and created a four-way Venn diagram to visualize these overlaps (Fig. S5).

Analysis of Gene Pathways and Candidate Genes. For each candidate gene, a sequence from H. magnipapillata (Dataset S2) was used as a query for performing a tblastx search (e-value cutoff: 1e-03) against the genome and transcriptome assemblies. To confirm their cnidarian identity, assembly sequences with significant hits were then BLAST-searched against the NCBI NR sequence database using the blastx algorithm. We also assessed the completeness of candidate signaling pathways in our assemblies using KEGG. Genomic and transcriptomic materials were combined into one file per species and sent through the KEGG Automatic Annotation Server (KAAS) for ortholog assignment and pathway mapping (www.genome.jp/tools/kaas/). The KAAS assigned KEGG orthology (KO) terms for each species dataset using the single-directional best-hit method against a representative eukaryotic dataset. After assignment of KO terms, completeness of the candidate pathways compared with their canonical pathway was assessed and visualized in each species using the KEGG Mapper tool (www.genome.jp/kegg/tool/map_pathway1.html).
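The coverage-based genome size estimate given under Genome Size Estimation can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' script; the function name `estimate_genome_size` is hypothetical, and the input values are the ones quoted in the formula (163,897,028 paired reads, 100-bp reads, mean coverage-per-position of 1,391.6).

```python
def estimate_genome_size(paired_reads: int, read_length: int, mean_coverage: float) -> float:
    """Estimate genome size (bp) as total sequenced bases / mean per-position coverage.

    Hypothetical helper for illustration; it mirrors the formula quoted in the text:
    ((number of paired reads) x (read length) x 2) / (average coverage-per-position).
    """
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    total_bases = paired_reads * read_length * 2  # two mates per read pair
    return total_bases / mean_coverage


# Values quoted in the K. iwatai formula.
size_bp = estimate_genome_size(163_897_028, 100, 1391.6)
print(f"Estimated genome size: {size_bp:,.0f} bp")
```

Because the estimate is a simple ratio, it scales inversely with the coverage value: any contaminant reads left in the read set, or collapsed repeats in the assembly, would shift the result accordingly.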

[Fig. S1 image: phylogenetic tree of 77 taxa, scale bar 0.2. Labeled groups: Cnidaria (Hydrozoa, Cubozoa, Scyphozoa, Polypodium hydriforme, Myxozoa, Octocorallia, Hexacorallia), Deuterostomia, Ecdysozoa, Lophotrochozoa, Placozoa, Porifera, Ctenophora, Choanoflagellata, Ichthyosporea.]

Fig. S1. Phylogenetic tree generated from a matrix of 41,237 amino acid positions, which excludes ribosomal genes, and 77 taxa using Bayesian inference under the CAT model. Support values are indicated only for nodes that did not receive maximal support. Bayesian posterior probabilities/ML bootstrap supports under the PROTGAMMAGTR are given near the corresponding node. A minus sign (“–”) indicates that the corresponding node is absent from the ML bootstrap consensus tree.

[Fig. S2 image: two phylogenetic trees (panels A and B) of 30 cnidarian taxa, scale bar 0.2. Labeled groups in both panels: Hydrozoa, Cubozoa, Scyphozoa, Polypodium hydriforme, Myxozoa, Octocorallia, Hexacorallia.]

Fig. S2. (A) Phylogenetic tree generated from a matrix of 51,940 amino acid positions and 30 cnidarian taxa using Bayesian inference under the CAT model. Support values are indicated only for nodes that did not receive maximal support. Bayesian posterior probabilities under the CAT model/Bayesian posterior probabilities under the CAT-GTR model/ML bootstrap supports under the PROTGAMMAGTR are given near the corresponding node. A minus sign (“–”) indicates that the corresponding node is absent from the ML bootstrap consensus tree. (B) Phylogenetic tree generated from a matrix that excludes ribosomal genes, comprising 41,237 amino acid positions and 30 cnidarian taxa using Bayesian inference under the CAT model. Support values are indicated only for nodes that did not receive maximal support. Bayesian posterior probabilities/ML bootstrap supports under the PROTGAMMAGTR are given near the corresponding node.

[Fig. S3 image: histogram of contig lengths (bp) against number of contigs (0 to 8,000) for Myxobolus, Kudoa, and Polypodium.]

Fig. S3. Sequence size distribution of the assembled transcriptome sequences.

[Fig. S4 image: bar chart of % mapping to GO term (0% to 10%) for Kudoa (genome), Kudoa (transcriptome), Polypodium (genome), Polypodium (transcriptome), Hydra, and Nematostella. GO categories shown: Response to stress; Morphogenesis; Signal transduction; Hydrolase activity; Cell differentiation; Cytoplasm; Organelle organization and biogenesis; Transport; Protein metabolism; Cell communication; Transferase activity; Binding; Biosynthesis; Nucleobase, nucleoside, nucleotide and nucleic acid metabolism; Development; Cell organization and biogenesis; Intracellular; Cell; Catalytic activity; Metabolism.]

Fig. S4. GO annotation of unigenes in genomes and transcriptomes. The top 20 GO categories are shown as a percentage of total GO terms from the assemblies of K. iwatai, P. hydriforme, and the published protein sequences of H. magnipapillata and N. vectensis. Categories for which K. iwatai presents significantly fewer GO terms than other cnidarians are indicated in boldface type.
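The significance calls referred to in this caption rest on Fisher's exact tests with a Bonferroni-corrected threshold, as stated in the main-text methods (α = 0.05 corrected to α = 0.000446). The sketch below shows one way such a comparison could be computed for a single GO category. It is not the authors' code: the function names are hypothetical, the example counts are invented for illustration, and the figure of 112 tests is an inference from 0.05/0.000446 rather than a number stated in the text.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same margins)
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability of a table with top-left cell x.
        return comb(row1, x) * comb(row2, col1 - x)

    observed = weight(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / total


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical counts: GO terms in one category vs. all other categories,
# for a myxozoan assembly (row 1) and a free-living cnidarian (row 2).
p_value = fisher_exact_two_sided(30, 970, 80, 920)
threshold = bonferroni_alpha(0.05, 112)  # 112 tests is an assumption, see lead-in
print(f"p = {p_value:.3g}; significant at corrected alpha: {p_value < threshold}")
```

The test weights are kept as exact integers until the final division, which avoids floating-point ties when deciding which tables are at least as extreme as the observed one.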

[Fig. S5 image: four-way Venn diagram of ortholog groups for Nematostella, Kudoa, Hydra, and Polypodium, with a bar chart titled "Size of each list" (x axis, species; y axis, # of Ortho Groups).]

Fig. S5. Comparison of OGs in myxozoan and other cnidarian transcriptomes. VENN diagram comparing OGs for the OrthoMCL database from transcriptome assemblies of K. iwatai and P. hydriforme and published predicted proteins in H. magnipapillata and N. vectensis.
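The overlap counts in a four-way Venn diagram of this kind are the sizes of the exclusive regions: OGs present in exactly one given combination of lists and absent from the rest. The sketch below illustrates that computation with plain Python sets. It is not the jvenn implementation or the authors' R parsing; the function name and the OG identifiers are hypothetical placeholders.

```python
from itertools import combinations


def exclusive_overlaps(og_lists: dict[str, set[str]]) -> dict[frozenset, int]:
    """Count OGs found in exactly each combination of lists (one Venn region each)."""
    names = list(og_lists)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            # OGs shared by every list in the combination ...
            inside = set.intersection(*(og_lists[n] for n in combo))
            # ... minus anything that also occurs in a list outside it.
            outside = set().union(*(og_lists[n] for n in names if n not in combo))
            regions[frozenset(combo)] = len(inside - outside)
    return regions


# Hypothetical OG identifiers, for illustration only.
og_lists = {
    "Hydra": {"OG1", "OG2", "OG3"},
    "Nematostella": {"OG1", "OG2", "OG4"},
    "Kudoa": {"OG1"},
    "Polypodium": {"OG1", "OG3"},
}
regions = exclusive_overlaps(og_lists)
for combo, count in regions.items():
    if count:
        print(sorted(combo), count)
```

Summing all region counts gives the size of the union of the lists, which is a quick consistency check on any published Venn diagram.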

Table S1. Sample information and accession numbers

| Species | DNA/RNA | Platform | Reads (bp PE) | BioProject | BioSample | SRA experiments | Genome/transcriptome shotgun assembly | Locality |
|---|---|---|---|---|---|---|---|---|
| K. iwatai | DNA | G2x | 76 | PRJNA261422 | SAMN03068681 | SRX704259 | JRUY00000000 | Red Sea, Eilat |
| K. iwatai | DNA | HiSeq 2000 | 100 | PRJNA261052 | SAMN03068681 | SRX702459 | JRUX00000000 | Red Sea, Eilat |
| K. iwatai | RNA | HiSeq 2000 | 100 | PRJNA248713 | SAMN02800925 | SRX554567 | GBGI00000000 | Red Sea, Eilat |
| E. leei | DNA | HiSeq 2000 | 100 | PRJNA284325 | SAMN03701405 | SRX1034928 | LDNA00000000 | Red Sea, Eilat |
| S. zaharoni | DNA | HiSeq 2000 | 100 | PRJNA284326 | SAMN03701400 | SRX1034914 | LDMZ00000000 | Red Sea, Eilat |
| M. cerebralis | RNA | HiSeq 2000 | 100 | PRJNA258474 | SAMN02998096 | | GBKL00000000 | California |
| P. hydriforme | RNA | HiSeq 2000 | 100 | PRJNA251648 | SAMN02837860 | SRX570527 | GBGH00000000 | Grand Lake State Park, OK |
| P. hydriforme | DNA | HiSeq 2000 | 100 | PRJNA259515 | SAMN02998112 | SRX687102 | | Grand Lake State Park, OK |

Other Supporting Information Files

Dataset S1 (XLSX) Dataset S2 (XLSX) Dataset S3 (PDF) Dataset S4 (XLSX)

Wnt pathway components identified in the genome and transcriptome of Kudoa iwatai through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Wnt pathway components identified in the genome and transcriptome of Myxobolus cerebralis through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Wnt pathway components identified in the genome and transcriptome of Polypodium hydriforme through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Hedgehog signaling pathway components identified in the genome and transcriptome of Kudoa iwatai through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Hedgehog signaling pathway components identified in the genome and transcriptome of Myxobolus cerebralis through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Hedgehog signaling pathway components identified in the genome and transcriptome of Polypodium hydriforme through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

TGF-β signaling pathway components identified in the genome and transcriptome of Kudoa iwatai through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

TGF-β signaling pathway components identified in the genome and transcriptome of Myxobolus cerebralis through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

TGF-β signaling pathway components identified in the genome and transcriptome of Polypodium hydriforme through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Notch signaling pathway components identified in the genome and transcriptome of Kudoa iwatai through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Notch signaling pathway components identified in the genome and transcriptome of Myxobolus cerebralis through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.

Notch signaling pathway components identified in the genome and transcriptome of Polypodium hydriforme through assignment of KEGG orthology terms. Components identified in this species are highlighted in green.