bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 A Reference Genome from the symbiotic hydrozoan, viridissima 3 4 Mayuko Hamada*,+, ++,1, Noriyuki Satoh*, Konstantin Khalturin* 5 6 7 *Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate 8 University, Onna, Okinawa 904-0495, Japan 9 10 + Ushimado Marine Institute, Okayama University, Setouchi, Okayama 701-4303, Japan 11 12 ++Zoological Institute, Kiel University, Kiel 24118, Germany 13

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

14 Short running title: 15 Green hydra Hydra viridissima genome 16 17 Keywords: 18 green hydra, Hydra viridissima A99, whole genome sequencing, de novo assembly, 19 symbiosis 20 21 Corresponding author1: 22 Mayuko Hamada 23 Ushimado Marine Institute, Okayama University, 24 Kashino 130-17, Ushimado, Setouchi, Okayama 701-4303, Japan 25 Tel: +81-869-34-5210 26 Email: [email protected] 27

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

28 ABSTRACT

29

30 Cnidarians are one of the oldest eumetazoan taxa, and are thought to be a sister group to 31 all bilaterians. In spite of comparatively simple morphology, cnidarians exhibit diverse 32 body forms and life histories. In addition, many cnidarian species establish symbiotic 33 relationships with microalgae. Various Hydra species have been employed as model 34 organisms since the 18th century. Introduction of transgenic and knock-down 35 technologies made them ideal experimental systems for studying cellular and molecular 36 mechanisms involved in regeneration, body-axis formation, senescence, symbiosis, and 37 holobiosis. In order to provide an important reference for genetic studies, the Hydra 38 magnipapillata genome was sequenced. However, the initial published version of the H. 39 magnipapillata genome did not achieve assembly continuity comparable to those of other 40 model systems, due mainly to a large number of transposable elements. For almost a 41 decade, the highly fragmented genome assembly of H. magnipapillata (scaffold 42 N50=128Kb) has remained the only genomic resource for this genus with several dozen 43 species. Here we report a draft 280-Mbp genome assembly for Hydra viridissima strain 44 A99, a symbiotic, early diverging member of the Hydra clade, with a scaffold N50 of 1.1 45 Mbp. The H. viridissima genome contains an estimated 21,476 protein-coding genes. 46 Comparative analysis of Pfam domains and orthologous proteins highlights characteristic 47 features of H. viridissima, such as diversification of innate immunity genes that are 48 important for host-symbiont interactions. Thus, the Hydra viridissima assembly provides 49 an important hydrozoan genome reference that will facilitate symbiosis research and 50 better comparisons of metazoan genome architectures. (245 words)

51

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

52 INTRODUCTION

53

54 The is an evolutionarily ancient and well-defined phylum, characterized by the

55 possession of nematocytes (Brusca et al. 2016). Cnidarian species belong to the

56 Medusozoa, which consists of the , the Scyphozoa and the Cubozoa, and the

57 Anthozoa (Fig. 1A). Although cnidarian morphology exhibits astonishingly diverse

58 forms and life styles, those of the fresh water Hydrozoa Hydra are relatively simple. It

59 possesses only a polyp stage, while other medusozoans exhibit alternation of polyp and

60 jellyfish generations. With its simple body structure and easy laboratory cultivation,

61 Hydra has been an experimental model for studying cellular and molecular mechanisms

62 underlying the formation of the body axis (Bode 2011), regeneration (Trembley et al.

63 1744; Bode 2003; Holstein et al. 2003), and also holobiotic relationships with microbiota

64 (Deines and Bosch 2016). Introduction of transgenic and knock-down technologies

65 further promoted these studies (Wittlieb et al. 2006). In order to provide genetic

66 information for these studies, the brown hydra, Hydra magnipapillata, genome was

67 sequenced (Chapman et al. 2010).

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

68 In addition to H. magnipapillata, genomes of representative species belonging

69 to each subgroup of cnidarians have been sequenced, including the other hydrozoans

70 Clytia hemisphaerica (Leclère et al. 2019), the scyphozoan jellyfish, Aurelia aurita

71 (Khalturin et al. 2019; Gold et al. 2019) and the cubozoan box jellyfish, Morbakka

72 virulenta (Khalturin et al. 2019), the anthozoans sea anemones Nematostella vectensis

73 (Putnam et al. 2007), Aiptasia (Baumgarten et al. 2015), the coral Acropora digitifera

74 (Shinzato et al. 2011) and other coral species. The quality of the initial published version

75 of the Hydra magnipapillata genome assembly was approximately 4-22x poorer than

76 those of other cnidarian genomes. A burst of transposable elements occurred in the brown

77 hydra lineage, resulting in a large genome size (Chapman et al. 2010; Wong et al. 2019)

78 and making long assembly of the genome difficult.

79 Compared to H. magnipappilata, the genome size of green hydra Hydra

80 viridissima was expected to be much smaller (Zacharias et al. 2004). H. viridissima is the

81 most basal species in the genus Hydra (Fig. 1B) (Martínez et al. 2010; Schwentner and

82 Bosch 2015), and a symbiotic species that establishes a mutualistic relationship with

83 microalgae to exchange metabolites (Fig. 1C) (Muscatine 1965; Cernichiari et al. 1969;

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

84 Mews 1980; McAuley 1991). While symbiosis with dinoflagellates is observed in many

85 marine cnidarians, such as corals, jellyfish, and sea anemones, H. viridissima harbors the

86 green alga, (Douglas and Huss 1986; Huss et al. 1993/1994; Davy et al. 2012).

87 While the two lineages have similar, simple body structures, as described

88 above, there are significant differences between green and brown hydras. However, little

89 is known about the genetics that enable green hydras to support this unique symbiosis. In

90 addition, the evolutionary trajectory of the Hydra lineage among the Medusozoa is still

91 unclear. Here we report a draft assembly of the ~284-Mbp genome of Hydra viridissima

92 strain A99 as another better reference of Hydra genomes, and demonstrate characteristics

93 of the green hydra genome, including transposable elements and genes involved in its

94 body plan and in symbiosis.

95 (497/500 words)

96

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

97 MATERIALS AND METHODS

98

99 Hydra and extraction of DNA

100 The Australian Hydra viridissima strain A99, which was kindly provided by Dr. Richard

101 Campbell, at the University of California at Irvine, was used in this study. Polyps were

102 maintained at 18°C on a 12-hour light/dark cycle and fed with Artemia two or three times

103 a week. DNA for genome sequencing were isolated from about 1000 polyps that were

104 clonally cultured. Before genomic DNA extraction, symbiotic algae in H. viridissima

105 were removed by photobleaching with 5 μM DCMU

106 (3-(3,4-dichlorophenyl)-1,1-dimethylurea), as described previously (Pardy 1976;

107 Habetha et al. 2003). To remove contamination from other organisms, polyps were

108 starved and treated with antibiotics (50 mg/L ampicillin, rifampicin, neomycin, and

109 streptomycin) for one week.

110 After several rounds of washing in sterilized culture medium, polyps were

111 lysed in DNA extraction buffer (10 mM Tris-HCl, pH 8.0, 100 mM NaCl, 25 mM EDTA,

112 pH 8.0, 0.5% SDS) and digested with 100 mg/L Proteinase K. Genomic DNA was

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

113 extracted using the standard phenol-chloroform method with 100 mg/L RNaseA

114 treatment. The quantity of DNA was determined using a NanoDrop (Thermo Fisher

115 Scientific, Waltham, MA, USA), and the quality of high molecular-weight DNA was

116 checked using agarose gel electrophoresis.

117

118 Sequencing of genomic DNA

119 In paired-end library preparations for genome sequencing, genomic DNA was

120 fragmented with a Focused-ultrasonicator M220 (Covaris Inc., Woburn, MA, USA).

121 Paired-end library (average insert size: 540 bp) and mate-pair libraries (average insert

122 sizes: 3.2, 4.6, 7.8 and 15.2 kb) were prepared using Illumina TruSeq DNA LT Sample

123 Prep Kits and Nextera Mate Pair Sample Preparation Kits (Illumina Inc., San Diego, CA,

124 USA), following the manufacturer's protocols. These libraries were quantified by

125 Real-Time PCR (Applied Biosystems StepOnePlus, Thermo Fisher Scientific) and

126 quality controlled using capillary electrophoresis on a Bioanalyzer. Genome sequencing

127 was performed using the Illumina Miseq system with 600-cycle chemistry (2 x 300 bp).

128 Genome sequencing statistics is shown in Table S1A.

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

129

130 RNA extraction and sequencing

131 Total RNA was extracted from about 1000 polyps in six different conditions (with or

132 without symbiotic algae, in light or dark conditions, and treated with antibiotics or

133 DMCU with symbiotic algae) using Trizol reagent (Thermo Fisher Scientific) and an

134 RNeasy Mini kit (QIAGEN, Hilden, Germany). Quantity of RNA were checked with a

135 NanoDrop (Thermo Fisher Scientific). Quality of total RNA was checked with a

136 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA). For mRNA-seq, libraries

137 were produced using an Illumina TruSeq Stranded mRNA Sample Prep Kit and were

138 sequenced on HiSeq 2000 instruments using 2 × 150-cycle chemistry.

139 mRNA-sequencing statistics is shown in Table S1B.

140

141 Assembly and gene prediction

142 Sequencing reads of genomic DNA were assembled using Newbler Assembler version

143 2.8 (Roche, Penzberg, Germany) and subsequent scaffolding was performed with

144 SSPACE (Boetzer et al. 2011). Gaps inside scaffolds were closed with paired-end and

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

145 mate-pair data using GapCloser of the Short Oligonucleotide Analysis Package (Luo et al.

146 2012). Then one round of Haplomerger2 processing pipeline (Huang et al. 2017) was

147 applied to eliminate redundancy in scaffolds and to merge haplotypes. For gene model

148 prediction, we used a species-specific gene prediction model that was trained based on

149 mapping of the Hydra viridissima transcriptome and raw RNAseq reads against the

150 genome assembly. Mapping and gene structure annotation were performed using PASA

151 pipeline v2.01 and were used to train AUGUSTUS software (Haas et al. 2003; Stanke et

152 al. 2006). Genome completeness was evaluated using BUSCO (Benchmarking Universal

153 Single-Copy Ortholog) (Seppey et al. 2019). RNA-Seq transcripts were mapped to the

154 genome assembly with BWA.

155

156 Genome size estimation

157 Genome size was estimated from raw paired-end reads by k-mer distribution analysis.

158 Jellyfish v2.0.0 was used to count k-mers and their frequencies (Marcais and Kingsford

159 2011). The Hydra viridissima genome size was estimated from obtained k-mer

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

160 distribution frequencies using GenomeScope web tool (Vurture et al. 2017)

161 (http://qb.cshl.edu/genomescope/).

162

163 Analysis of repetitive elements

164 Repetitive elements in the draft genome assemblies of Hydra viridissima were identified

165 de novo with RepeatScout version 1.0.5 (http://www.repeatmasker.org/RepeatModeler)

166 and RepeatMasker version 4.0.6 (http://www.repeatmasker.org). Repetitive elements

167 were filtered by length and occurrence so that only sequences longer than 50 bp and

168 present more than 10 times in the genome were retained. The resulting sets of repetitive

169 elements were annotated using BLASTN and BLASTX searches against

170 RepeatMasker.lib (35,996 nucleotide sequences) and RepeatPeps.lib (10,544 peptides)

171 bundled with RepeatMasker version 4.0.6. The results of both searches were combined,

172 and BLASTX results were given priority in cases where BLASTN and BLASTX

173 searches gave conflicting results.

174

175 Analysis of Hydra viridissima genes

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

176 For comparative analysis of H. viridissima genes among cnidarians, protein sequences

177 were obtained from Compagen (http://www.compagen.org/) for Hydra magnipapillata

178 (Chapman et al. 2010), from JGI

179 (https://genome.jgi.doe.gov/Nemve1/Nemve1.home.html) for Nematostella vectensis

180 (Putnam et al. 2007), from MARIMBA (available at http://

181 marimba.obs-vlfr.fr/organism/Clytia/hemisphaerica) for Clytia hemisphaerica (Leclère

182 et al. 2019), from the genome project website of OIST Marine Genomics Unit

183 (https://marinegenomics.oist.jp/gallery/gallery/index) for Acropora digitifera (Shinzato

184 et al. 2011), and for Morbakka virulenta and Aurelia aurita in Atlantic Ocean (Khalturin

185 et al. 2019). In comparative analyses, domain searches against the Pfam database

186 (Pfam-A.hmm) were performed using HMMER (Finn et al. 2016), and ortholog gene

187 grouping was done using OrthoFinder (Emms and Kelly 2015). To classify

188 homeodomain containing proteins, BLAST searches and phylogenetic analysis were

189 performed. Homeodomain sequences in various were obtained from Homeobox

190 Database (http://homeodb.zoo.ox.ac.uk/families.get?og=All) were used (Zhong and

191 Holland 2011).

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

192 For phylogenetic analysis, multiple alignments were produced with CLUSTALX (2.1)

193 with gap trimming (Larkin et al. 2007). Sequences of poor quality that did not align well

194 were deleted using BioEdit (Hall 1999) Phylogenetic analyses were performed using the

195 Neighbor-Joining method (Saitou and Nei 1987) in CLUSTALX with default parameters

196 (1,000 bootstrap tests and 111 seeds). Representative phylogenetic trees were drawn

197 using FigTree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/). Gene/protein ID used for

198 phylogenetic analysis are shown in the trees (Figs S2 and S3).

199

200 Data availability

201 This whole-genome shotgun sequencing project has been deposited at

202 DDBJ/ENA/GenBank under the accession QPEY00000000 (BioProject ID:

203 SAMN09635813). RNA-seq reads has been deposited at SRA of NCBI

204 (SRX6792700-SRX6792705). Genome sequences, gene models, and a genome browser

205 are also accessible at the website of the OIST Marine Genomics Unit Genome Project

206 (https://marinegenomics.oist.jp/hydraviridissima_A99). The genome browser was

207 established for assembled sequences using the JBrowser 1.12.3 (Skinner et al. 2009).

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

208 Gene annotations from the protein domain search and BLAST search are likewise shown

209 on the site. Reagents, software and datasets used in this study were listed in Reagent

210 Table. Supplemental files are available at FigShare; Figure S1. k-mer frequency

211 distribution plots in Hydra viridissima A99 genome sequence, Figure S2. Phylogenetic

212 analysis of ANTP genes, Figure S3. Phylogenetic analysis of PRD genes, Table S1.

213 Sequencing statistics for Hydra viridissima A99, Table S2. Summary of repetitive

214 sequences in the Hydra viridissima A99 genome assembly, Table S3. Number of Pfam

215 domain-containing genes in the Hydra viridissima A99 genome, Table S4. Number of

216 orthologs enriched in Hydra viridissima A99 (A) and Hydra (B), Table S5. Gene IDs of

217 ANTP genes in Hydra viridissima A99.

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

218 RESULTS AND DISCUSSION

219

220 Genome architecture of Hydra viridissima

221 Hydra viridissima appears green because of the symbiotic Chlorella, and has a smaller

222 body compared to the brown hydra Hydra magnipapillata (Fig. 1C). We decoded the

223 genome of H. viridissima strain A99, which is closely related to strain A14c, Wuerzburg

224 and J8 (Fig. 1B). We previously reported the genome of its specific symbiotic algae,

225 Chlorella sp. A99, and demonstrated the metabolic co-dependency between H.

226 viridissima A99 and the symbiont (Hamada et al. 2018).

227 The genome of H. viridissima A99 was sequenced using the Illumina platform

228 with paired-end and mate pair libraries. Statistics of sequence reads, the assembly, and

229 genome architecture are shown in Table 1. We obtained ~7,070 Mbp of paired-end

230 sequences, and 4,765, 4,769, 3,669 and 3,551 Mbp for 3.2k, 4.6k, 7.8k, and 15.2k

231 insert-size pate-pair sequences, respectively, comprising a total of ~23,826 Mbp (Table

232 S1). The size of the H. viridissima genome was estimated at ~254 Mbp using k-mer

233 analysis (k-mer=19) based on paired-end sequence data (Fig. S1). This indicates that we

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

234 achieved more than 90-fold sequence coverage of the genome. On the other hand, the

235 total length of the genome sequence assembly reached 284,265,305 bp. That is, the total

236 assembly represented a good match for the estimated genome size.

237 Although genomic DNA was extracted from a clonally propagated culture of

238 hydra polyps maintained in the laboratory, heterozygosity was comparatively high

239 (2.28% of the entire sequence) (Fig. S1). Thus, polyps originally collected from the wild

240 had a high level of heterozygosity. Repetitive sequences constituted 37.5 % of the

241 genome and the gap rate was 16.8% of the genome (Table 1; see next section). Scaffolds

242 from the present analysis numbered 2,677 and the scaffold N50 was 1.1 Mbp, with the

243 longest scaffold reaching 5.1 Mbp (Table 1). The GC content of the genome was 24.7%

244 (Table 1), suggesting that H. viridissima has an AT-rich genome. Using 67,339,858,036

245 nucleotides of RNA-sequence data (Table S1), we predicted gene models. The genome

246 was estimated to contain 21,476 protein-coding genes (Table 1). We did not find any gene

247 models with sequence similarities to the symbiotic Chlorella. The mean gene length,

248 exon length, and intron length were 7637, 209, and 838 bp, respectively (Table 1). The

249 BUSCO method supports 84% of the H. viridissima gene models as complete and

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

250 predicts that the genome accounts for 91% of the genes, including partial sequences

251 (Table 1).

252 During assembly and gene annotation, we noticed that scaffold2223, composed

253 of 18,375 base pairs (bp), contained almost the entire H. viridissima mitochondrial

254 genome. The mitochondrial genome of H. viridissima strain A99 was linear, as reported

255 by Pan et al. (2014) for the other green hydra Hydra sinensis, while the brown hydra,

256 Hydra vulgaris, mitochondrial genome is composed of two linear molecules (Pan et al.

257 2014a; Pan et al. 2014b).

258 Comparison of H. viridissima genome statistics with those of other cnidarian

259 genomes showed that the H. viridissima genome assembly is of higher quality than the H.

260 magnipapillata genome assembly (Table 1). Although the genome size of H.

261 magnipapillata is approximately 3x larger than that of H. viridissima, the number of

262 scaffolds and the scaffold N50 were ten-times better in H. viridissima (2,677 and 1.1 Mbp,

263 respectively) than in H. magnipapillata (20,914 and 0.1 Mbp, respectively) (Table 1).

264 BUSCO support for gene modeling was higher in H. viridissima (83.9%) than in H.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

265 magnipapillata (76.8%) (Table 1). Compared to H. magnipapillata, the green hydra has a

266 compact genome, with ~65% as many genes (Table 1).

267

268 Repetitive sequences in the Hydra viridissima genome

269 Although the abundance of repetitive sequences in anthozoan genomes is generally low

270 (15~17%), genomes of medusozoans and hydrozoans have comparatively high levels of

271 repetitive sequences, 57% in H. magnipapillata, 41% in Clytia, 45% in Aurelia, and 37%

272 in Morbakka (Table 1). This was also true of H. viridissima (37.5%) (Table 1). DNA

273 transposons were the most abundant type, accounting for approximately 22.41% of the

274 genome (Figure 2A, Table S2). Of these, TcMariner, CMC, Maverick and hAT were

275 largest components (Figure 2A). On the other hand, the percentages of LTR

276 retrotransposons (1.63%) and non-LTR retrotransposons (0.99%) were comparatively

277 low (Figure 2A, Table S2).

278 In comparing repetitive elements between H. viridissima and H.

279 magnipapillata, it first becomes apparent that DNA transposons (DNA/TcMar,

280 DNA/hAT and DNA/CMC) occupy a similar portion of both Hydra genomes (Figure 2).

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

281 In addition, novel and potentially cnidarian-specific repetitive elements occupy ~12% of

282 both genomes. Second, long, interspersed nuclear elements LINE/L2 (~7%) and

283 LINE/CR1 (8.4%) are large components of the H. magnipapillata genome, but they are

284 almost absent in the H. viridissima genome. It was suggested that a burst of

285 retrotransposons occurred in the brown hydra lineage and may have caused the large

286 genome size (Chapman et al. 2010; Wong et al. 2019). It is likely that the H. viridissima

287 genome represents something close to the ancestral state of the Hydra genome. Molecular

288 and evolutionary mechanisms involved in the insertion of LINE components in the H.

289 magnipapillata genome will be a subject of future studies in relation to diversification

290 and speciation within the Hydra clade.

291

292 Innate immunity-related protein genes in the Hydra viridissima genome

293 Using the Pfam-domain search method, we surveyed genes for protein domains in the H.

294 viridissima genome. We found approximately 4,500 different Pfam domains in this

295 species (Table S3), a number comparable to those of other cnidarians. With the criterion

296 that the number of domains should be ≥2x higher in the green hydra genome than those of

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

297 non-symbiotic cnidarians and that the number represents enrichment by Chi-Square test

298 (p-value<0.001), we searched for Pfam domains that are enriched in the H. viridissima

299 genome (Table 2A). Then we checked the number of H. viridissima-enriched domains in

300 the genome of the coral, Acropora digitifera, since it is also a symbiotic cnidarian. Death

301 and NACHT were the two most highly enriched domains and TIR occurred in the top 10

302 (Table 2A). Death, NACHT and TIR domains are involved in the innate immune system.

303 NACHT and TIR domains are found in the pattern-recognition receptors, Nod-like and

304 Toll-like, which are sensors for pathogen and damage-associated molecules. The Death

305 domain occurs in some pattern recognition receptors and their adaptors, which mediate

306 signals to downstream immune responses. It appears that these domains are also enriched

307 in Acropora digitifera, but not in non-symbiotic cnidarian species (Table 2A).

308 Expansion of genes for NACHT-containing proteins is supported by

309 identification of orthologous protein groups by OrthoFinder. Genes for proteins similar to

310 Nod-like receptor were the second most overrepresented group in the H. viridissima

311 genome (Table 3A, Table S4). However, these orthologs were not scored in the Acropora

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

312 genome, suggesting that the NACHT-containing proteins expanded in H. viridissima are

313 different from those expanded in Acropora.

314 In Acropora, we previously showed unique, complex domain structures of

315 proteins with NACHT or NB-ARC domains, which have similar structures and functions

316 to NACHT (Hamada et al. 2013). Thus, we further examined domain combinations of

317 NACHT/NB-ARC proteins in H. viridissima to determine whether such complex domain

318 structures are also found in H. viridissima. Basically Nod-like receptors have a tripartite

319 domain structure, consisting of effector-binding domain constituted of apoptosis-related

320 domains, such as Death and DED in the N-terminus, NACHT/NB-ARC in the center, and

321 a repeat domain that recognizes pathogen and damage-associate molecules at the

322 C-terminus. Humans have approximately 20 Nod-like receptor family proteins, and their

323 ligand recognition region is a leucine-rich repeat (LRR). On the other hand, in Nod-like

324 receptors of basal metazoans, not only LRR, but also tetratricopeptide repeats (TPR),

325 WD40 repeats, and ankyrin repeats (Ank) are found as repeat domains. We previously

326 showed that Acropora has all 4 types of Nod-like receptors, and that with LRR type is the

327 most common (Table. 4) (Hamada et al. 2013). In other cnidarians examined, only TPR

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

328 and WD40 are found as repeat domains of Nod-like receptors, suggesting loss of the other

329 types. Especially in H. viridissima, a larger number of genes for Nod-like receptors with

330 TPR were found. In addition, their domain structures in H. viridissima vary widely,

331 compared to those of H. magnipapillata (Fig. 3). For example, the domain combination

332 of NACHT/NB-ARC with DED or TIR, was found in H. viridissima, but not in H.

333 magnipapillata. In addition to NACHT-containing protein, H. viridissima has more

334 genes for TIR domain-containing proteins, including an interleukin-1 receptor (ILR),

335 which are not found in H. magnipapillata.

336 As mentioned above, diversification of pattern-recognition receptor-related

337 genes are found in both H. viridissima and Acropora digitifera. Their most significant

338 shared attribute is symbiosis, the former with Chlorella and the latter with the

339 dinoflagellate, Symbiodinium. Therefore, it is likely that the evolutionary acquisition of

340 symbiosis by certain cnidarians requires expansion and greater sophistication of innate

341 immunity genes. They may participate in recognition and maintenance of symbiotic

342 organisms in cnidarian tissues. On the other hand, the structures (e.g. repeat

343 combination) of the Nod-like receptors most abundant in green hydra and coral were

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

344 different. This indicates that species-specific adaptations to the environment occurred in

345 each of these lineages independently.

346

347 Genes enriched in the genus Hydra

348 We further examined Pfam domains overrepresented specifically in H. viridissima and

349 others present in both H. viridissima and H. magnipapillata. This was done using the

350 same criterion as above, that is, that the number of domains is ≥2x higher than those in

351 other cnidarians and that the number is enriched by Chi-Square test (p-value<0.001).

352 (Table 2B).

353 Pfam domain searches and ortholog protein grouping demonstrated that H.

354 viridissima and H. magnipapillata possess many genes encoding domains that function in

355 DNA binding. For example, genes containing transposase-related domain (DDE_3,

356 Dimer_Tnp_hAT and Transposase_mut) and DNA-binding motif (HTH: helix-turn-helix

357 and zf: zinc finger) were overrepresented in both H. viridissima and H. magnipapillata

358 (Table 2B). In addition, PAX, Endonuclease_7, RVT_3, MarR_2, some HTHs,

359 Sigma70_r4, CIDE-N, PIF-1 and RAG1, which bind DNA, were specifically

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

360 overrepresented in H. viridissima (Table 2A). Particularly, ortholog protein grouping

361 suggested that the most overrepresented gene in the H. viridissima genome was PIF1

362 ATP-dependent DNA helicase-like genes (Table 3A), which are required for

363 maintenance of genome structure. Although the functions of these genes are presently

364 unknown, they may be involved in the genome structure of Hydra, which contains many

365 transposable elements.

366 Pfam domain searches also demonstrated that genes for proteins containing

367 Sulfate_transp domain and those containing Polysacc_deac_1 domain were enriched in

368 both H. viridissima and H. magnipappilata (Table 2B). Sulfate_transp is found in the

369 sulfate permeases family, which is involved in uptake or exchange of inorganic anions,

370 such as sulfate. So far, their functions in Hydra are unknown, but they may relate to their

371 limnetic life style, which requires active ion uptake. Polysacc_deac_1 is found in

372 polysaccharide deacetylase, including chitin deacetylase, which is involved in chitin

373 metabolism. It may contribute to construction of the extracellular matrix surrounding the

374 body or structure of nematocyte, or molecular recognition events such as immune

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

375 response to pathogens with chitinous cell wall (Balasubramanian et al. 2012; Elieh Ali

376 Komi et al. 2018; Rodrigues et al. 2016).

377

378 Gene families for transcription factors and signaling molecules

379 Using Pfam-supported families, we examined the number of gene families for putative

380 transcription regulator genes and signaling molecules (Table 5), since these genes are

381 essential in development and physiology of metazoans. While major signaling pathways

382 are present in cnidarians, some specialization in Cnidaria is known. For example, Wnt

383 genes, which are important for oral-aboral body axis formation, diversified in the

384 cnidarian lineage (Kusserow et al. 2005; Khalturin et al. 2019). Table 5A shows numbers

385 of putative transcription factor genes in the H. viridissima genome. Zinc finger proteins

386 (C2H2 type) were most abundant, with 105 members, although the abundance of this

387 family has been noted in other cnidarian genomes (Khalturin et al. 2019). There were 33

388 HLH domain-containing and 50 homeobox domain-containing genes

389 (Homeodomain-containing genes of H. viridissima are discussed in the next section). A

390 similar analysis of putative signaling molecule genes showed that the H. viridissima

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

391 genome contains 16 fibroblast growth factor (FGF)-like domain genes, 11 transforming

392 growth factor-beta (TGB-b) genes, and 10 Wnt genes (Table 5B). These numbers are

393 comparable to those in H.magnipappilata. In general, the number of transcriptional factor

394 and signal molecule family members appeared similar among cnidarians, although a few

395 families such as AT_hook and Hairly-orange of transcriptional factors (Table 5A) and

396 Interleukin 3 (IL3) families of signal molecules (Table 5B) were not found in Hydra

397 genomes.

398

399 Hox and Para-Hox genes in Hydra viridissima

400 Among transcriptional factors, homeodomain-containing proteins have been intensively

401 investigated as key molecules in the developmental toolkit. They are highly diversified

402 and participate in a wide variety of developmental processes in metazoans. In particular,

403 those in Cnidarians that are shared by the common ancestors of deuterostomes and

404 protostomes are important to understand body plan evolution of bilaterians (Ferrier and

405 Holland 2001; Chourrout et al. 2006; Ferrier 2016; DuBuc et al. 2018). While many

406 orthologous genes of known homeodomain-containing proteins, including Hox and

26 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

407 ParaHox genes, have been identified in cnidarians, cnidarian-specific specializations,

408 such as loss of some homeodomain protein genes and fragmentation of the Hox cluster

409 have been reported (Kamm et al. 2006; Steele et al. 2011; Chapman et al. 2010; Leclère et

410 al. 2019). To understand the evolutionary trajectory of homeobox protein genes in the

411 Hydra lineage, we classified them into ANTP (HOXL and NKL), PRD, LIM, POU,

412 PROS, SINE, TALE, CERS, or ZF by bi-directional BLAST searches against sequences

413 of homeodomains in other animals, using HomeoDB (Zhong and Holland 2011) (Table 6,

414 Table S5) and phylogenetic analysis for ANPT- and PRD-class genes (Figs. S2 and S3),

415 referring to the Hox genes previously identified in other cnidarians (Schummer et al.

416 1992; Chourrout et al. 2006; Leclère et al. 2019; Khalturin et al. 2019).

417 In the H. viridissima genome, we identified 48 homeodomain-containing genes

418 in the genome, 5 ANTP-HOXL, 9 ANTP-NKL, 21 PRD, 4 LIM, 4 TALE, 2 SINE, 2 POU,

419 and 1 CERS; however, we failed to find CUT, HNF, PROS and ZF classes. This tendency

420 toward gene loss was shared by the two other hydrozoans, H. magnipapillata and Clytia

421 hemishpaerica (Table 6). Among cnidarians, anthozoan genomes (Nematostella and

422 Acropora) apparently contain the most homeodomain-containing genes, while

27 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

423 scyphozoans (Aurelia) and cubozoans (Morbakka) have intermediate numbers, and

424 hydrozoan genomes contain the fewest. CUT class genes were not found in medusozoan

425 genomes and all cnidarian genomes lacked PROS and ZF class genes altogether. In

426 addition, NKL genes were less abundant in Hydra and HOXL genes less abundant in

427 hydrozoans generally, than in other cnidarians.

428 H.viridissima and H. magnipapillata possess the same ANTP genes (Fig. 4,

429 Table 7, Table S5), suggesting a reason for the same body plan in these Hydra species,

430 although the body size of H. viridissima is smaller. As previously reported (Leclère et al.

431 2019; Khalturin et al. 2019; Gauchat et al. 2000; Quiquand et al. 2009), ParaHox genes

432 Gsh and Meox are present in Hydra, whereas Xlox and Cdx are missing, unlike other

433 medusozoans (Table 7, Fig. 4). On the other hand, Hox gene composition is quite similar

434 among medusozoans. They have Hox1and Hox9-14, but lack Hox2, Evx, Gbx, Mnx,

435 unlike anthozoans. Medusozoans have lost many NKL genes, Nedx, Hlx, Mnx, Msx, and

436 Lbx compared to anthozoans. In addition, Dbx, Hlx, Nk3, Nk6 and Nk7 were not found in

437 hydrozoans, nor were Nk5, Exm or Msxlx in Hydra (Table 7, Fig. 4). In addition, some

438 degree of synteny conservation of HOXL genes and NKL genes is found in Anthozoa, but

28 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

439 not in Medusozoa (Fig. 4), suggesting complete fragmentation of the homeobox gene

440 cluster in the common ancestor of medusozoans. Nematostella expresses Gbx, Hlx, Nk3

441 and Nk6 in the pharyngeal or mesenteric region (Gbx in pharyngeal endoderm (Matus et

442 al. 2006), Hlx and Nk6 in pharyngeal ectoderm, and Nk3 in nutrient-storing somatic

443 gonads in mesentery (Steinmetz et al. 2017)). Anthozoans have a pharynx and a

444 mesentery that structurally supports the pharynx and serves the site of gamete production

445 in the gastrovascular cavity, while these tissues are not found in Hydra. Loss of these

446 genes reflects the simplification of body structure in the Hydra lineage.

29 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

447 CONCLUSION

448 H. viridissima belongs to an early branching group of Hydra, which seems to have

449 maintained the ancestral state of the genome than does more derived H. magnipapillata.

450 On the other hand, the repertoire of transcriptional factor genes including

451 homeodomain-containing genes in H. viridissima is quite similar to that in H.

452 magnipapillata, reflecting the common body plan in these species. Comparative analysis

453 of transcription factor genes among cnidarians indicates gradual simplification of the

454 ANTP gene repertoire in the Hydra lineage, which is likely to reflect the simple body

455 structure of Hydra. In addition, we found diverse innate immunity genes in H. viridissima

456 genome that are also observed in coral, indicating the common feature involved in algal

457 symbiosis. The H. viridissima genome presented here provides a Hydra genome

458 comparable in quality to those of other cnidarians, including medusozoans and

459 anthozoans, which will hopefully facilitate further studies of cnidarian genes, genomes,

460 and genetics to understand basal metazoan evolution and strategies to support algal

461 symbiosis in cnidarians.

462

30 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

463 ACKNOWLEDGEMENTS

464

465 This research was supported by funding provided by Okinawa Institute of Science and

466 Technology (OIST) to the Marine Genomics Unit (NS) and by the Okayama University

467 Dispatch Project for Female Faculty members (MH). MH was supported by Japanese

468 Society for Promotion of Sciences Funding (15K07173, 18K06364). We thank Dr.

469 Steven D. Aird for editing the manuscript, Ms. Kanako Hisata (OIST) for creating the

470 genome browser, Dr. Chuya Shinzato (OIST, Tokyo University) for providing valuable

471 advice and help with genome sequencing and assembly, and Prof. Thomas C. G. Bosch

472 (Kiel University) for offering valuable discussion. We also thank the DNA Sequencing

473 Section and the IT Section of OIST for excellent technical support. Computations for this

474 work were partially performed on the NIG supercomputer at the ROIS National Institute

475 of Genetics.

476

31 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

477 FIGURE LEGENDS

478

479 Figure 1. Phylogeny and morphology of green hydra Hydra viridissima. (A)

480 Phylogenetic position of H. viridissima (red) within the phylum Cnidaria. (B)

481 Relationship of Hydra viridissima strain A99 (red) with other H. viridissima strains and

482 brown hydra species, based on phylogenetic analysis by NJ method using cytochrome c

483 oxidase subunit I (COI) gene sequences. The genomic region in H. viridissima A99 and

484 Genbank IDs in other strains used in the phylogenetic analysis are indicated. (c)

485 Photograph of H. viridissima (left) and H. magnipapillata (right). H. viridissima is

486 smaller than H. magnipapillata, and green due to symbiotic Chlorella in its endodermal

487 epithelial cells.

488

489 Figure 2. Interspersed Repeat Landscape in Hydra. Components and proportions of

490 repetitive sequences in the genome of (A) Hydra viridissima A99 and (B) H.

491 magnipapillata are shown. Classes of repeat are shown in the right column.

492

32 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

493 Figure 3. Schematic representation of domain structures of NACHT/NB-ARC or

494 TIR-domain-containing proteins identified in Hydra. The domain structures and the

495 number of NACHT/NB-ARC or TIR-domain-containing proteins in Hydra viridissima

496 A99 (Hv) and H. magnipapillata (Hm) are shown.

497

498 Figure 4. ParaHox, Hox and NK genes in cnidarians. Putative Homeobox megacluster

499 in the last common ancestor of cnidarians and bilaterians (top) and homeobox genes and

500 their cluster structures in present cnidarians are represented. ParaHox genes (green

501 boxes); Hox genes (pink boxes); NK genes (blue boxes). Empty boxes indicate lost genes.

502 Horizontal lines (black) indicate chromosome fragments.

503

33 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

504 REFERENCES

505

506 Balasubramanian, P.G., A. Beckmann, U. Warnken, M. Schnölzer, A. Schüler et al., 2012 507 Proteome of Hydra nematocyst. The Journal of biological chemistry 287 508 (13):9672-9681. 509 Baumgarten, S., O. Simakov, L.Y. Esherick, Y.J. Liew, E.M. Lehnert et al., 2015 The 510 genome of <em>Aiptasia</em>, a sea anemone model for coral 511 symbiosis. Proceedings of the National Academy of Sciences 112 (38):11893. 512 Bode, H., 2011 Axis Formation in Hydra. Annual Review of Genetics 45 (1):105-117. 513 Bode, H.R., 2003 Head regeneration in Hydra. Developmental Dynamics 226 514 (2):225-236. 515 Boetzer, M., C.V. Henkel, H.J. Jansen, D. Butler, and W. Pirovano, 2011 Scaffolding 516 pre-assembled contigs using SSPACE. Bioinformatics 27 (4):578-579. 517 Brusca, R.C., W. Moore, and S.M. Shuster, 2016 Invertebrates. Sunderland, 518 Massachusetts U.S.A.: Sinauer Associates, Inc., Publishers. 519 Cernichiari, E., L. Muscatine, and D.C. Smith, 1969 Maltose Excretion by the Symbiotic 520 Algae of Hydra viridis. Proceedings of the Royal Society of London, Series B: 521 Biological Sciences 173 (1033):557-576. 522 Chapman, J.A., E.F. Kirkness, O. Simakov, S.E. Hampson, T. Mitros et al., 2010 The 523 dynamic genome of Hydra. Nature 464 (7288):592-596. 524 Chourrout, D., F. Delsuc, P. Chourrout, R.B. Edvardsen, F. Rentzsch et al., 2006 525 Minimal ProtoHox cluster inferred from bilaterian and cnidarian Hox 526 complements. Nature 442 (7103):684-687. 527 Davy, S.K., D. Allemand, and V.M. Weis, 2012 Cell Biology of Cnidarian-Dinoflagellate 528 Symbiosis. Microbiology and Molecular Biology Reviews 76 (2):229. 529 Deines, P., and T.C.G. Bosch, 2016 Transitioning from Microbiome Composition to 530 Microbial Community Interactions: The Potential of the Metaorganism Hydra 531 as an Experimental Model. Frontiers in Microbiology 7 (1610). 532 Douglas, A.E., and V.A.R. Huss, 1986 On the characteristics and taxonomic position of 533 symbiotic Chlorella. Archives of Microbiology 145 (1):80-84.

34 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

534 DuBuc, T.Q., T.B. Stephenson, A.Q. Rock, and M.Q. Martindale, 2018 Hox and Wnt 535 pattern the primary body axis of an anthozoan cnidarian before gastrulation. 536 Nature Communications 9 (1):2007. 537 Elieh Ali Komi, D., L. Sharma, and C.S. Dela Cruz, 2018 Chitin and Its Effects on 538 Inflammatory and Immune Responses. Clinical Reviews in Allergy & 539 Immunology 54 (2):213-223. 540 Emms, D.M., and S. Kelly, 2015 OrthoFinder: solving fundamental biases in whole 541 genome comparisons dramatically improves orthogroup inference accuracy. 542 Genome Biology 16 (1):157. 543 Ferrier, D.E.K., 2016 Evolution of Homeobox Gene Clusters in Animals: The 544 Giga-Cluster and Primary vs. Secondary Clustering. Frontiers in Ecology and 545 Evolution 4 (36). 546 Ferrier, D.E.K., and P.W.H. Holland, 2001 Ancient origin of the Hox gene cluster. Nature 547 Reviews Genetics 2 (1):33-38. 548 Finn, R.D., P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry et al., 2016 The Pfam protein 549 families database: towards a more sustainable future. Nucleic Acids Research 550 44 (D1):D279-285. 551 Gauchat, D., F. Mazet, C. Berney, M. Schummer, S. Kreger et al., 2000 Evolution of 552 Antp-class genes and differential expression of Hydra 553 Hox/paraHox genes in anterior patterning. Proceedings of the National 554 Academy of Sciences 97 (9):4493-4498. 555 Gold, D.A., T. Katsuki, Y. Li, X. Yan, M. Regulski et al., 2019 The genome of the jellyfish 556 Aurelia and the evolution of complexity. Nature Ecology & Evolution 3 557 (1):96-104. 558 Haas, B.J., A.L. Delcher, S.M. Mount, J.R. Wortman, R.K. Smith Jr et al., 2003 Improving 559 the Arabidopsis genome annotation using maximal transcript alignment 560 assemblies. Nucleic Acids Research 31 (19):5654-5666. 561 Habetha, M., F. Anton-Erxleben, K. Neumann, and T.C. Bosch, 2003 The Hydra 562 viridis/Chlorella symbiosis. Growth and sexual differentiation in polyps without 563 symbionts. Zoology 106 (2):101-108. 564 Hall, 1999 BioEdit: a user-friendly biological sequence alignment editor and analysis 565 program for Windows 95/98/NT. Nucleic Acid Symposium Series 41:95-98.

35 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

566 Hamada, M., K. Schröder, J. Bathia, U. Kürn, S. Fraune et al., 2018 Metabolic 567 co-dependence drives the evolutionarily ancient Hydra-Chlorella symbiosis. 568 eLife 7:e35122. 569 Hamada, M., E. Shoguchi, C. Shinzato, T. Kawashima, D.J. Miller et al., 2013 The 570 complex NOD-like receptor repertoire of the coral Acropora digitifera includes 571 novel domain combinations. Molecular Biology and Evolution 30 (1):167-176. 572 Holstein, T.W., E. Hobmayer, and U. Technau, 2003 Cnidarians: An evolutionarily 573 conserved model system for regeneration? Developmental Dynamics 226 574 (2):257-267. 575 Huang, S., M. Kang, and A. Xu, 2017 HaploMerger2: rebuilding both haploid 576 sub-assemblies from high-heterozygosity diploid genome assembly. 577 Bioinformatics 33 (16):2577-2579. 578 Huss, V.A.R., C. Holweg, B. Seidel, V. Reich, M. Rahat et al., 1993/1994 There is an 579 ecological basis for host/symbiont specificity in Chlorella/Hydra symbioses. 580 Endocytobiosis and Cell Research 10:35-46. 581 Kamm, K., B. Schierwater, W. Jakob, S.L. Dellaporta, and D.J. Miller, 2006 Axial 582 Patterning and Diversification in the Cnidaria Predate the Hox System. 583 Current Biology 16 (9):920-926. 584 Khalturin, K., C. Shinzato, M. Khalturina, M. Hamada, M. Fujie et al., 2019 Medusozoan 585 genomes inform the evolution of the jellyfish body plan. Nature Ecology & 586 Evolution 3 (5):811-822. 587 Kusserow, A., K. Pang, C. Sturm, M. Hrouda, J. Lentfer et al., 2005 Unexpected 588 complexity of the Wnt gene family in a sea anemone. Nature 433 589 (7022):156-160. 590 Larkin, M.A., G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan et al., 2007 591 Clustal W and Clustal X version 2.0. Bioinformatics 23 (21):2947-2948. 592 Leclère, L., C. Horin, S. Chevalier, P. Lapébie, P. Dru et al., 2019 The genome of the 593 jellyfish Clytia hemisphaerica and the evolution of the cnidarian life-cycle. 594 Nature Ecology & Evolution 3 (5):801-810. 595 Luo, R., B. Liu, Y. Xie, Z. Li, W. Huang et al., 2012 SOAPdenovo2: an empirically 596 improved memory-efficient short-read de novo assembler. Gigascience 1 597 (1):18.

36 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

598 Marcais, G., and C. Kingsford, 2011 A fast, lock-free approach for efficient parallel 599 counting of occurrences of k-mers. Bioinformatics 27 (6):764-770. 600 Martínez, D.E., A.R. Iñiguez, K.M. Percell, J.B. Willner, J. Signorovitch et al., 2010 601 Phylogeny and biogeography of Hydra (Cnidaria: Hydridae) using mitochondrial 602 and nuclear DNA sequences. Molecular Phylogenetics and Evolution 57 603 (1):403-410. 604 Matus, D.Q., K. Pang, H. Marlow, C.W. Dunn, G.H. Thomsen et al., 2006 Molecular 605 evidence for deep evolutionary roots of bilaterality in animal development. 606 Proceedings of the National Academy of Sciences 103 (30):11195-11200. 607 McAuley, P.J., 1991 Amino acids as a nitrogen source for Chlorella symbiotic with 608 green hydra, pp. 369-376 in Coelenterate Biology: Recent Research on 609 Cnidaria and Ctenophora: Proceedings of the Fifth International Conference on 610 Coelenterate Biology, 1989, edited by R.B. Williams, P.F.S. Cornelius, R.G. 611 Hughes and E.A. Robson. Springer Netherlands, Dordrecht. 612 Mews, L.K., 1980 The Green Hydra Symbiosis. III. The Biotrophic transport of 613 Carbohydrate from Alga to Animal. Proceedings of the Royal Society of 614 London. Series B. Biological Sciences 209 (1176):377-401. 615 Muscatine, L., 1965 Symbiosis of hydra and algae. 3. Extracellular products of the 616 algae. Comparative Biochemistry and Physiology 16 (1):77-92. 617 Pan, H.-C., H.-Y. Fang, S.-W. Li, J.-H. Liu, Y. Wang et al., 2014a The complete 618 mitochondrial genome of Hydra vulgaris (Hydroida: Hydridae). Mitochondrial 619 DNA 25 (6):418-419. 620 Pan, H.-C., X.-C. Qian, P. Li, X.-F. Li, and A.-T. Wang, 2014b The complete 621 mitochondrial genome of Chinese green hydra, Hydra sinensis (Hydroida: 622 Hydridae). Mitochondrial DNA 25 (1):44-45. 623 Pardy, R.L., 1976 The morphology of green hydra endosymbionts as influenced by host 624 strain and host environment. Journal of Cell Science 20 (655-669). 625 Putnam, N.H., M. Srivastava, U. Hellsten, B. Dirks, J. Chapman et al., 2007 Sea 626 anemone genome reveals ancestral eumetazoan gene repertoire and genomic 627 organization. Science 317 (5834):86-94.

37 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

628 Quiquand, M., N. Yanze, J. Schmich, V. Schmid, B. Galliot et al., 2009 More constraint 629 on ParaHox than Hox gene families in early metazoan evolution. Developmental 630 Biology 328 (2):173-187. 631 Rodrigues, M., T. Ostermann, L. Kremeser, H. Lindner, C. Beisel et al., 2016 Profiling of 632 adhesive-related genes in the freshwater cnidarian Hydra magnipapillata by 633 transcriptomics and proteomics. Biofouling 32 (9):1115-1129. 634 Saitou, N., and M. Nei, 1987 The neighbor-joining method: a new method for 635 reconstructing phylogenetic trees. Molecular Biology and Evolution 4 636 (4):406-425. 637 Schummer, M., I. Scheurlen, C. Schaller, and B. Galliot, 1992 HOM/HOX homeobox 638 genes are present in hydra (Chlorohydra viridissima) and are differentially 639 expressed during regeneration. The EMBO journal 11 (5):1815-1823. 640 Schwentner, M., and T.C. Bosch, 2015 Revisiting the age, evolutionary history and 641 species level diversity of the genus Hydra (Cnidaria: Hydrozoa). Molecular 642 Phylogenetics and Evolution 91:41-55. 643 Seppey, M., M. Manni, and E.M. Zdobnov, 2019 BUSCO: Assessing Genome Assembly 644 and Annotation Completeness, pp. 227-245 in Gene Prediction: Methods and 645 Protocols, edited by M. Kollmar. Springer New York, New York, NY. 646 Shinzato, C., E. Shoguchi, T. Kawashima, M. Hamada, K. Hisata et al., 2011 Using the 647 Acropora digitifera genome to understand coral responses to environmental 648 change. Nature 476 (7360):320-323. 649 Skinner, M.E., A.V. Uzilov, L.D. Stein, C.J. Mungall, and I.H. Holmes, 2009 JBrowse: a 650 next-generation genome browser. Genome Research 19 (9):1630-1638. 651 Stanke, M., O. Keller, I. Gunduz, A. Hayes, S. Waack et al., 2006 AUGUSTUS: ab initio 652 prediction of alternative transcripts. Nucleic Acids Research 34. 653 Steele, R.E., C.N. David, and U. Technau, 2011 A genomic view of 500 million years of 654 cnidarian evolution. Trends in Genetics 27 (1):7-13. 655 Steinmetz, P.R.H., A. Aman, J.E.M. Kraus, and U. Technau, 2017 Gut-like ectodermal 656 tissue in a sea anemone challenges germ layer homology. Nature Ecology & 657 Evolution 1 (10):1535-1542.

38 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

658 Trembley, A., C. Pronk, J.v.d. Schley, and P. Lyonet, 1744 Mémoires pour servir à 659 l'histoire d'un genre de polypes d'eau douce, à bras en forme de cornes. A 660 Leide :: Chez Jean & Herman Verbeek. 661 Vurture, G.W., F.J. Sedlazeck, M. Nattestad, C.J. Underwood, H. Fang et al., 2017 662 GenomeScope: fast reference-free genome profiling from short reads. 663 Bioinformatics 33 (14):2202-2204. 664 Wittlieb, J., K. Khalturin, J.U. Lohmann, F. Anton-Erxleben, and T.C.G. Bosch, 2006 665 Transgenic <em>Hydra</em> allow <em>in vivo</em> 666 tracking of individual stem cells during morphogenesis. Proceedings of the 667 National Academy of Sciences 103 (16):6208. 668 Wong, W.Y., O. Simakov, D.M. Bridge, P. Cartwright, A.J. Bellantuono et al., 2019 669 Expansion of a single transposable element family is associated with 670 genome-size increase and radiation in the genus <em>Hydra</em>. 671 Proceedings of the National Academy of Sciences 116 (46):22915. 672 Zacharias, H., B. Anokhin, K. Khalturin, and T.C. Bosch, 2004 Genome sizes and 673 chromosomes in the basal metazoan Hydra. Zoology (Jena) 107 (3):219-227. 674 Zhong, Y.-f., and P.W.H. Holland, 2011 HomeoDB2: functional expansion of a 675 comparative homeobox gene database for evolutionary developmental biology. 676 Evolution & Development 13 (6):567-568.

677

39 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Fig. 1

A Nematostella vectensis Anthozoa Acropora digitifera

Hydra viridissima Hydra magnipapillata Hydrozoa Clytia hemisphaerica

Morbakka virulenta Cubozoa

Aurelia aurita Scyphozoa Medusozoa

B Green hydra: Hydra viridissima

429KP895129.1_H.viridissima_A14c 221KP895121.1_H.viridissima_Wuerzburg 1000 H.viridissima_A99_sca old2223_819..1277 1000 AB565114.1_H.viridissima_J8 KP895123.1_H.viridissima_A250

1000 KP895125.1_H.viridissima_M120 899 AB565135.1_H.viridissima_M10 615AB565134.1_H.viridissima_M9 1000 AB565133.1_H.viridissima_M8 AB565126.1_H.viridissima_L9 995 KP895124.1_H.viridissima_Husum 1000 1000 AB565121.1_H.viridissima_K10 KP895120.1_H.oxycnida 994 KP895119.1_H.robusta_L7 990 KP895118.1_H.oligactis_St.Petersburg KP895117.1_H.vulgaris_reg-16 1000 AB565113.1_H.vulgaris_J7 (H. magnipappilata) 0.02 Brown hydra C H. viridissima H. magnipappilata

5 mm bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Fig. 2

Repeat class: A Hydra viridissima A99 SINE/U SINE/tRNA SINE/5S 3.0 SINE 2.6% DNA/TcMar LINE/Penelope 2.5 5.7% LINE/R2 DNA/Maverick 2.8% LINE/Dong-R4 2.5% DNA/hAT 2.0 LINE/Jockey-I 4.7% DNA/CMC 62.5% LINE/R1

geno me Non- LINE/LOA 11.5% of 1.5 repetitive Novel/Cnidaria LINE/Proto2

cent LINE/L2

pe r 1.0 LINE/Rex-Babar LINE/CR1 LINE/RTE 0.5 LINE LINE/L1 0.0 LTR/ERVK 0 5 10 15 20 25 30 35 40 45 50 LTR/ERV LTR/ERV1 Kimura substitution level (CpG adjusted) LTR LTR/ERVL LTR/Gypsy LTR/Copia B LTR/Pao Hydra magnipapillata LTR/Ngaro LTR/DIRS RC/Helitron 3.00 LINE/L2 DNA/Dada

7% DNA/Zator LINE/CR1 DNA/TcMar 40.6% 8.4% DNA/Sola 2.25 Non- repetitive DNA/PiggyBac 5.3% DNA/TcMar DNA/P DNA/Merlin genom e

of 1.50 4.6% DNA/Maverick 12.7% 4.4% DNA/hAT DNA/Kolobok DNA/hAT percent DNA/CMC Novel/Cnidaria DNA/Harbinger 0.75 DNA/Ginger DNA/Crypton DNA/CMC 0.00 DNA/Academ 0 5 10 15 20 25 30 35 40 45 50 Novel/Cnidaria Simple repeats Kimura substitution level (CpG adjusted) Non-repetitive bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Fig. 3

Hv Hm NACHT/NB-ARC 83 35 Death NACHT/NB-ARC 43 2 DED NACHT/NB-ARC 1 0 DED Death NACHT/NB-ARC 1 0 NACHT/NB-ARC 3 5 APAF-1 CARD NACHT/NB-ARC WD40 1 1 NACHT/NB-ARC TPR 44 33 Death NACHT/NB-ARC TPR 50 2 NACHT/NB-ARC TPR 2 3 TIR NACHT/NB-ARC TPR 2 0 TIR TPR 1 0 TIR Death NACHT/NB-ARC 3 0 TIR NACHT/NB-ARC 5 0 MyD88 Death TIR 2 1 TIR 29 8 ILR IG IG TIR 1 0 Transmembrane Region Others 26 9 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Fig. 4

ParaHox genes Hox genes NK genes Putative Homeobox megacluster in Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx Nk2/4 Nk3 Lbx Nk1 Nk5 Not Emx Nk7 Nk6 ancestral metazoan Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Acropora digitiferaGsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Nedx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 NK2/4 NK2/4 NK2/4

Gsh Xlox Cdx Meox Meox Meox Meox Hox1 Evx Hox2 Hox2 Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Nematostella vectensis Hox9/14 Hox9/14 Nedx NK2/4 NK2/4 NK2/4 NK2/4 NK2/4 NK2/4

Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Morbakka virulenta Cdx Hox9/14 Dlx NK2/4 NK2/4 Hox9/14 Hox9/14 Aurelia auritaXlox Gsh Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK1 NK5 Not Emx Nk7 Nk6

Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Clytia hemisphaerica Hox9/14 Dlx Not Hox9/14 Hox9/14

Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Hydra magnipappilata Hox9/14 Dlx Hox9/14 Hydra viridissima A99Gsh Xlox Cdx Meox Hox1 Evx Hox2 Hox9/14 Gbx Mnx Dlx Nedx Mnx Msx NK2/4 NK3 Lbx NK1 NK5 Not Emx Nk7 Nk6 Hox9/14 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 1. Comparison of the genome assembly statistics of cridarians

Hydrozoa Scyphozoa Cubozoa Anthozoa Hydra Hydra Clytia Aurelia Morbakka Acropora Nematostella Species viridissima*1 magnipapillata*2 hemisphaerica*3 aurita*4 virulenta*4 digitifera*5 vectensis*6 Laboratory Laboratory Villefranche Baltic Sea Seto Inland Okinawa Laboratory Geographical origin strain (A99) strain (Atlantic) (Atlantic) Sea (Pacific) (Pacific) strain Genome size (Mbp) 284 852 445 377 952 420 457 Number of Scaffolds 2,677 20,914 7,644 2,710 4,538 2,420 10,804 Longest scaffold (Mbp) 5.1 0.91 2.9 4.4 14.5 2.5 3.3 Scaffold N50 (Mbp) 1.1 0.1 0.4 1.0 2.2 0.5 0.5 GC content (%) 24.73 25.41 35 34.65 31.37 39 39 Repeats (%) 37.5 57 41 44.67 37.41 12.9 26 Gap rate (%) 16.77 7.82 16.6 6.63 11.87 15.2 16.6 Number of genes 21,476 31,452 26,727 28,625 24,278 23,668 27,273 Mean gene length (bp) 7,637 6,873 5,848 10,215 21,444 8,727 4,500 Mean exon length (bp) 209 247 281 368 350 316 208 Mean intron length (bp) 838 N/A N/A 1391 3572 1057 800 BUSCO (complete) % 83.9 76.8 86.4 79.8 81.5 74.5 91.4 *1, This study. *2, Chapman et al.,2010. *3, Leclere et al., 2019. *4, Khaltunin et al., 2019. *5, Shinzato et al., 2011. *6, Putnamu et al., 2007. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 2. Number of genes with Pfam domains enriched in the Hydra viridissima genome and comparison of their number in the other cnidarian genomes.

A. Domains enriched in Hydra viridissima Domain Hv Hm Ch Aa Mv Nv Ad Chi test* Death 198 71 14 20 16 20 132 5.E-111 NACHT 161 38 42 45 23 39 458 2.E-71 PIF1 121 25 59 49 3 8 18 3.E-65 ATPase_2 64 20 28 14 13 19 36 1.E-19 PAX 61 23 3 4 23 9 8 6.E-35 TIR_2 49 10 19 24 15 17 49 1.E-11 Endonuclease_7 46 5 3 1 5 1 0 1.E-48 RAG1 42 4 1 4 0 1 0 8.E-49 TIR 41 6 3 15 11 12 36 4.E-15 RVT_3 33 1 7 10 14 1 1 2.E-20 CbiA 23 6 6 7 6 6 2 2.E-08 DUF2738 21 0 1 0 0 1 0 4.E-28 MarR_2 21 5 4 0 0 4 3 6.E-15 HTH_7 21 7 2 4 0 1 1 2.E-15 Sigma70_r4 17 5 1 1 1 0 2 6.E-14 HTH_Tnp_IS630 15 4 1 1 1 1 0 3.E-12 HTH_3 13 3 1 2 5 2 1 9.E-07 TMEM151 10 4 2 3 0 3 6 2.E-03 CIDE-N 10 2 1 3 2 2 1 9.E-05 DUF4807 7 0 1 3 1 3 7 1.E-02 DUF1294 5 1 1 2 1 0 0 1.E-02 A_deaminase 9 2 4 3 3 3 5 3.E-02 HD_4 4 1 1 1 1 0 0 4.E-02 B. Domains enriched in Hydra species Domain Hv Hm Ch Aa Mv Nv Ad Chi test* DDE_3 365 462 16 16 31 6 17 0.E+00 Dimer_Tnp_hAT 296 549 11 52 44 21 27 0.E+00 HTH_Tnp_Tc3_2 266 279 15 13 23 0 3 1.E-251 HTH_32 185 104 14 4 25 4 9 4.E-149 ANAPC3 182 166 61 36 37 46 44 1.E-68 HTH_23 166 137 6 15 33 8 20 1.E-114

zf-BED 116 83 14 9 8 19 5 3.E-78

HTH_29 95 81 7 1 19 2 5 1.E-73

HTH_psq 93 105 3 6 3 1 1 3.E-92 HTH_28 85 54 2 5 6 6 11 3.E-63 SRP_TPR_like 83 93 6 1 2 1 1 4.E-83 DUF4806 69 57 6 1 0 0 0 2.E-65 BTAD 32 37 7 3 1 3 3 2.E-22 Sigma70_r4_2 28 24 2 2 0 1 3 2.E-21 HTH_Tnp_Tc3_1 26 26 1 0 0 0 0 1.E-25 CD225 21 21 9 4 2 7 7 1.E-06 IGFBP 15 23 7 4 5 1 3 4.E-07 Sulfate_transp 15 21 2 5 4 4 5 4.E-06 DUF2961 14 49 2 3 3 1 3 4.E-27 DUF1280 13 20 4 4 0 3 2 1.E-07 Polysacc_deac_1 13 14 3 1 6 4 2 8.E-05 DUF4817 12 20 2 1 0 0 0 1.E-12 GRDP-like 9 7 2 1 2 2 3 7.E-03 Transposase_mut 8 13 1 3 1 0 0 3.E-06 CARDB 5 7 1 0 0 1 2 6.E-03 SecA_DEAD 3 9 1 1 1 0 0 8.E-04 Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera, *Chi-test: evalue of Chi-square test bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 3. Top 10 overrepresented orthologs in the Hydra viridissima genome and comparison of their gene number in the other cnidarian genomes.

A. Orthologs enriched in the Hydra viridissima genome Hv Hm Cly AUR MOR Nv Ad Chi-test* Annotation OG0000037 191 19 9 10 0 1 0 7.E-238 ATP-dependent DNA helicase PIF1 OG0000044 170 40 1 3 1 2 1 3.E-199 uncharacterized protein OG0000020 159 27 30 19 0 0 0 4.E-151 Nod-like receptor like OG0000059 90 35 9 4 15 27 0 4.E-54 Uncharacterized protein

OG0000145 72 4 0 0 0 1 0 1.E-103 Uncharacterized protein OG0000113 63 12 1 2 19 3 4 2.E-51 Uncharacterized protein OG0000166 58 9 0 0 7 2 0 2.E-63 Uncharacterized protein OG0000220 42 17 0 1 0 1 0 1.E-42 transposase OG0000151 39 11 1 3 13 1 16 5.E-23 Uncharacterized protein OG0000355 39 5 0 0 1 0 0 4.E-50 Uncharacterized protein

Hv Hm Cly AUR MOR Nv Ad Chi-test* Annotation B. OG0000009 300 288 25 19 27 3 5 2.E-262 RNA-directed DNA polymerase from mobile element jockey-like Ortho OG0000013 173 304 13 8 3 1 2 2.E-229 tigger transposable element-derived protein -like logs OG0000036 111 118 24 1 0 0 0 6.E-104 Uncharacterized protein OG0000035 93 158 1 1 7 1 1 2.E-121 Uncharacterized protein enric OG0000012 53 446 4 8 1 1 1 0.E+00 Uncharacterized protein OG0000076 44 99 0 1 1 0 0 2.E-75 zinc finger BED hed domain-containing protein like OG0000029 38 230 1 18 1 0 2 5.E-175 zinc finger MYM-type protein in 1-like Hydr OG0000214 35 26 0 0 1 0 0 3.E-34 Uncharacterized protein OG0000083 30 69 14 2 0 8 2 8.E-35 Uncharacterized protein a OG0000251 28 27 0 0 0 0 0 1.E-28 Uncharacterized protein speci es

Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera, *Chi-test: evalue of Chi-square test bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 4. The number of NACHT/NB-ARC domain-containing proteins in cnidarians and the combination of repeat domains.

Hv Hm Ch Aa Mv Nv Ad

Total 264 89 56 58 24 6 489

TPR 103 38 3 8 4 2 57 WD40 2 4 3 2 3 1 8 LRR 0 0 0 0 0 0 125 Ank 0 0 0 0 0 0 11

Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera, bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table. 5. Number of putative transcriptional factor genes (A) and signal molecule genes (B) in the Hydra viridissima genome.

A. Transcriptional genes Domain Hv Hm Ch Aa Mv Nv Ad ARID 8 7 10 10 8 5 8 AT_hook 0 0 1 2 0 0 0 bZIP_1 26 26 26 22 25 36 29 bZIP_2 25 22 31 24 27 32 17 CUT 1 0 3 1 1 2 1 DM 6 6 7 9 11 12 7 Ets 9 11 13 21 14 16 12 Forkhead 17 17 19 15 26 34 22 GATA 4 4 3 7 7 4 5 Hairy_orange 0 0 1 4 2 6 7 HLH 33 36 44 52 50 72 53 HMG_box 30 33 31 30 29 33 27 Homeobox 50 44 70 88 82 153 96 Hormone_recep 9 9 12 12 9 20 9 P53 2 1 2 1 2 3 3 PAX 61 23 3 4 23 9 8 Pou 2 3 3 4 4 6 4 RHD_DNA_bind 2 1 5 2 2 3 2 SRF-TF 2 2 4 3 2 3 1 T-box 6 7 11 10 9 14 10 TF_AP-2 1 1 2 3 2 1 2 zf-C2H2 105 123 244 233 118 169 90 zf-C2HC 1 2 3 2 3 3 4 zf-C4 9 8 11 8 9 19 12

B. Signal Domain Hv Hm Ch Aa Mv Nv Ad molecule Cbl_N 1 1 2 0 1 0 1 Cbl_N2 1 1 1 0 1 0 1 genes Cbl_N3 1 1 2 2 1 0 1 DIX 1 1 3 3 3 4 2 FGF 16 12 10 18 13 13 13 Focal_AT 1 2 1 1 1 0 0 G-alpha 29 28 29 29 32 37 22 G-gamma 2 2 1 7 5 4 3 IL3 0 0 1 0 0 0 0 PDGF 1 2 5 2 3 1 6 Phe_ZIP 1 0 1 1 1 0 1 Rabaptin 1 2 1 1 1 1 1 RGS 13 13 14 16 16 13 11 RGS-like 2 3 1 2 1 0 1 STAT_alpha 1 0 1 1 2 2 1 STAT_bind 2 1 1 1 1 1 1 STAT_int 2 1 1 0 1 0 1 TGF_beta 11 11 7 9 9 7 10 TGFb_propeptide 8 10 4 6 8 6 9 wnt 10 13 12 15 17 26 15

Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table 6. Number of genes for the subclass of homeodomain-containing proteins in cnidarians.

Medusozoa Anthozoa Hydrozoa Class Hv Hm Ch Mv Aa Nv Ad ANTP-HOXL 5 7 6 13 13 17 9 ANTP-NKL 9 8 20 21 20 65 33 PRD 21 16 18 25 28 43 31 LIM 4 4 5 5 5 5 4 TALE 4 3 5 4 10 6 5 SINE 2 2 4 5 4 6 4 POU 2 2 3 4 4 6 4 CERS 1 2 1 1 1 1 1 CUT 0 0 0 0 0 1 2 HNF 0 0 0 1 0 1 1 PROS 0 0 0 0 0 0 0 ZF 0 0 0 0 0 0 0 Total 48 44 62 79 85 151 94

Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Table. 7. Number of homeodomain-containing genes in the Hydra viridissima genome.

Class Subclass Familiy Hv Hm Ch Aa Mv Nv Ad ANTP HOXL Cdx 0 0 1 2 1 0 0 Evx 0 0 0 0 0 1 1 Gbx 0 0 0 0 0 1 1 Gsx 1 1 1 1 1 1 1

Hox1 1 1 1 1 1 1 1 Hox2 0 0 0 0 0 3 1 Hox9-13 2 3 1 4 5 3 1 Meox 1 1 0 1 1 4 1 Mnx 0 0 0 0 0 1 1 Xlox/Pdx 0 0 1 1 1 1 1 NKL Barx 1 1 0 1 1 4 2 Dbx 0 0 0 1 1 2 1 Dlx 1 2 2 1 2 1 1 Emx 0 0 1 1 1 2 1 Hhex 1 1 1 1 1 1 1 Hlx 0 0 0 0 1 3 2 Lbx 0 0 0 0 0 1 1 Msx 1 1 1 0 0 1 1 Msxlx 0 0 1 1 1 2 0 Nedx 0 0 0 0 0 2 2

Nk1 1 1 1 2 1 1 1 Nk2.1/2.2/4 1 1 1 1 3 8 4 Nk3 0 0 0 1 1 1 1 Nk5/Hmx 0 0 1 1 0 1 1 Nk6 0 0 0 1 1 1 1 Nk7 0 0 0 1 1 1 1 Noto 1 1 2 1 1 6 2 Ro 0 0 0 0 0 1 0 Tlx 0 0 1 1 1 0 0 PRD Alx 0 0 0 1 1 1 1 Arx 1 2 0 0 0 1 1 Dmbx 1 0 1 1 0 7 2 Gsc 1 1 1 1 1 1 1 Hbn 1 1 1 2 1 1 1 Otp 2 2 1 1 1 1 1 Otx 2 3 3 4 7 3 4 Pax3/7 0 0 0 0 0 2 2 Pax4/6 2 1 2 2 1 2 1 Pitx 1 1 1 1 1 1 1 Prrx 0 0 0 0 0 1 0 Rax 0 0 0 1 1 1 1 Repo 0 0 0 0 0 1 0 Uncx 1 1 1 1 1 2 2 Vsx 0 0 0 0 0 1 0 LIM Isl 0 0 1 1 1 1 0 Lhx1/5 1 0 1 1 1 1 1 Lhx2/9 0 0 1 0 1 1 1 Lhx6/8 1 1 1 1 1 1 1 Lmx 1 1 1 1 1 1 1 POU Hdx 0 0 1 0 0 0 0 Pou1 0 0 0 1 1 1 0 Pou3 0 0 0 1 1 3 2 Pou4 1 1 1 1 1 1 1 Pou6 1 1 1 1 1 1 1 SINE Six1/2 0 0 1 2 2 2 2 Six3/6 1 1 1 1 2 1 1 Six4/5 1 1 1 1 1 2 1 Irx 1 1 1 2 1 1 1 TALE Meis 1 1 1 5 1 1 1 Pbx 1 1 1 1 1 1 1 Pknox 1 0 1 1 1 1 1 Tgif 0 0 0 0 0 1 1 CERS Cers 2 0 2 1 1 1 1

CUT Onecut 0 0 0 0 0 1 1 HNF Hnf1 0 0 0 0 1 0 1 Hv: Hydra viridissima A99, Hm: Hydra magnipappilata, Ch: Clytia hemisphaerica, Mv: Morbakka virulenta, Aa: Aurelia aurita, Nv, Nematostella vectensis, Ad: Acropora digitifera bioRxiv preprint doi: https://doi.org/10.1101/2020.05.26.117382; this version posted May 29, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.