bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Title:

An ancient genome duplication in the speciose reef-building coral, Acropora

Authors: Yafei Mao1*, Chuya Shinzato1†, Noriyuki Satoh1

Affiliations:

1Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate University, Onna, Okinawa 904-0495, Japan

*Correspondence to: Yafei Mao: [email protected]

†Present address: Atmosphere and Ocean Research Institute, The University of Tokyo, Kashiwa, Chiba 277-8564, Japan

Yafei Mao: [email protected]

Chuya Shinzato: [email protected]

Noriyuki Satoh: [email protected]

Abstract:

Whole-genome duplication (WGD) has been recognized as a significant evolutionary force in the origin and diversification of vertebrates, plants, and other organisms. Acropora, one of the most speciose reef-building coral genera, responsible for creating spectacular but increasingly threatened marine ecosystems, is suspected to have originated by polyploidy, yet there is no genetic evidence to support this hypothesis. Using comprehensive phylogenomic and comparative genomic approaches, we analyzed five Acropora genomes and an Astreopora genome (Anthozoa: Acroporidae) to show that a WGD event likely occurred between 27.9 and 35.7 million years ago (Mya) in the most recent common ancestor of Acropora, concurrent with a massive worldwide coral extinction. We found that duplicated genes became highly enriched in gene regulation functions, some of which are involved in stress responses. The different functional clusters of duplicated genes are related to the divergence of gene expression patterns during development. Some gene duplications of proteinaceous toxins were generated by WGD in Acropora, in contrast to other cnidarians. Collectively, this study provides evidence for an ancient WGD event in corals and helps to explain the origin and diversification of Acropora.

Introduction

Reef-building corals contribute to tropical marine ecosystems that support innumerable marine organisms, but reefs are increasingly threatened by recent increases in seawater temperatures, pollution, and other stressors [1,2]. The Acroporidae is a family of reef-building corals in the phylum Cnidaria, one of the basal animal phyla [3-5]. Astreopora (Anthozoa: Acroporidae) is the sister genus in the acroporid lineage according to fossil records and molecular phylogenetic evidence [3,6,7]. Importantly, Acropora (Anthozoa: Acroporidae), one of the most diverse genera of reef-building corals, with more than 150 species in the Indo-Pacific, is thought to have originated from Astreopora almost 60 Mya with several species turnovers [3,5,8,9]. Investigating the evolutionary history of this group contributes to our understanding of coral reef biodiversity and conservation. Hybridization among Acropora species has been observed in the wild [10] and variable chromosome numbers have been determined in different Acropora lineages [11]. Additionally, gene duplications have been shown in several Acropora gene families [12,13]. Thus, based on their unique lifestyle, variable chromosome numbers, and complicated reticulate evolutionary history, Indo-Pacific Acropora have been hypothesized to have originated via polyploidy [10-16]. However, there is no direct molecular and genetic evidence to support this hypothesis.

Ancient whole-genome (large-scale) duplication (WGD), or paleopolyploidy, has shaped the genomes of vertebrates, green plants, and other organisms, and is usually regarded as an evolutionary landmark in the origin and diversification of organisms [17-19] (Supplementary Fig. 1). Two separate WGD events have been documented in the common ancestor of vertebrates (two rounds of WGD) [20], and another major WGD has been reported in the last common ancestor of teleost fish [21,22]. Meanwhile, living angiosperms share an ancient WGD event [23,24], and many other WGD events have been reported in major clades of angiosperms [25,26]. In addition, the two rounds of WGD in vertebrates are suggested to have occurred during the Cambrian Period, and some WGDs in plants are believed to have occurred near the Cretaceous-Tertiary boundary [17,25,27]. Thus, WGD is regarded as an important evolutionary mechanism that may reduce the risk of extinction [17,18,25]. However, WGD in Cnidaria has received less attention [17,18,28-30].

Duplicated genes created by WGD have complex fates during the subsequent diploidization [18,31]. Usually, one of the duplicated genes is silenced or lost due to redundancy of gene functions, termed “nonfunctionalization”. However, retained duplicated genes provide important sources of biological complexity and evolutionary novelty through subfunctionalization, neofunctionalization, and dosage effects [22,32]. Duplicated genes may develop complementary gene functions via subfunctionalization, evolve new functions through neofunctionalization, or be retained in complicated regulatory networks with different expression levels due to dosage effects. For instance, duplicated MADS-box genes are crucial for flower development and the origin of phenotypic novelty in plants [18,33]. Duplicated homeobox genes provide raw genetic material for vertebrate development [22,34]. In addition, toxin diversification following gene duplication has been recognized as a mechanism to enhance adaptation in animals [35,36], especially in snake venoms [37,38]. Meanwhile, toxic proteins are involved in various important processes in corals, including prey capture, protection from predators, and wound healing [39,40], but it is still unclear how gene duplications of toxic proteins evolved in corals.

Isozyme electrophoresis and restriction fragment length polymorphism (RFLP) were used to identify gene duplications in polyploids a few decades ago [41,42]. In the past ten years, next-generation sequencing (NGS) has generated a wealth of genomic data at vastly decreased cost and effort [43,44]. Three main methods were then developed to identify WGD, based on: (1) analysis of the rate of synonymous substitutions per synonymous site (dS) of duplicated genes within a genome (dS-based method) [25,45,46]; (2) phylogenetic analysis of gene families among multiple genomes (phylogenomic analysis) [23,47]; and (3) identification of synteny blocks in comparison with sister lineages without WGD (synteny analysis) [20,48,49]. The dS-based method and phylogenomic analysis only require gene family information, without genome assembly. However, the dS-based method cannot detect ancient WGD, and gene tree uncertainty usually causes bias in phylogenomic analysis. Both methods rely heavily on gene family estimation and clustering. Inaccurate gene predictions (gene models) and coarse gene family clustering algorithms can easily cause either method to miss a WGD. In contrast, synteny analysis relies heavily on genome assembly quality. Poor assembly quality can hide WGD signals, and genomes with extensive rearrangements cannot be used to detect WGD by synteny block identification. Thus, the most credible conclusions depend on complementary evidence from different methods [24,50,51].

Here, we analyzed a genome of Astreopora (Astreopora sp1) as an outgroup, as well as five Acropora genomes (A. digitifera, A. gemmifera, A. subglabra, A. echinata, and A. tenuis), to address the following questions using all three methods: (I) whether and when WGD occurred in Acropora, (II) what the fate of duplicated genes was in Acropora after the event, (III) what the gene expression patterns of duplicated genes are across five developmental stages in A. digitifera, and (IV) what role WGD played in the diversification of toxic proteins in Acropora (Supplementary Fig. 2).

Results

Clustering of gene families and calibration of the acroporid phylogenomic tree

We clustered all homologs among the six acroporid species into 19,760 gene families, of which 6,520 were shared by all six species (Fig 1a; see Additional file 1). Each Acropora genome had very few unique gene families, but Astreopora sp1 had 218, suggesting that Astreopora sp1 is genetically divergent from the five Acropora species. From the 6,520 shared gene families, 3,461 single-copy orthologs were selected. These were concatenated to reconstruct a calibrated phylogenomic tree based on the reported divergence time of Acropora (Mao et al., in revision). We found that Astreopora sp1 split from Acropora ~53.6 Mya (95% highest posterior density (HPD): 51.02-56.21 Mya) (Fig 1b; Supplementary Fig 3). This established a timescale for analyzing the timing of the subsequent WGD.

WGD identification with the dS-based method

Synonymous substitution rate (dS) analysis has been widely used to infer WGD [25,52]. We identified over 10,000 paralogous gene pairs based on sequence similarity, and over 10,000 anchor gene pairs based on synteny information, from each species (Supplementary Table 1; for details see Methods). We then calculated dS values for the paralogous gene pairs and anchor gene pairs of each species.

An ‘L-shaped’ distribution was evident in both the paralogous and anchor gene pair dS distributions of Astreopora sp1, indicating that no WGD occurred in Astreopora sp1. However, all five Acropora species displayed a similar peak in the dS distributions of both paralogous and anchor gene pairs (peak at dS between 0 and 0.3), suggesting that WGD did occur in Acropora (Fig 2a, Supplementary Figs 4, 5).

dS values of orthologous gene pairs between two pairs of species (Astreopora sp1 and A. tenuis; A. tenuis and A. digitifera) were used as proxies for the speciation times between them, according to neutral evolution theory [48,53]. We combined the dS values of paralogous gene pairs for the five Acropora species and estimated the peak of the log dS distribution (modal value = -1.82). We also estimated the distribution of orthologous gene pairs between Astreopora sp1 and A. tenuis (modal value = -0.31) and between A. tenuis and A. digitifera (modal value = -3.46). The paralog peak lies between the two ortholog peaks, indicating that the WGD occurred in Acropora after the split of Astreopora sp1 and A. tenuis (Fig 2b). In other words, an ancient WGD event likely occurred in the most recent common ancestor of Acropora. Based on the speciation time estimated in the calibrated phylogenomic tree and assuming a constant dS rate [25], we estimated that the WGD of Acropora occurred ~35.01 Mya (95% confidence interval: 31.18-35.7 Mya) (Supplementary Table 2; for details see Methods). Here, we define this event as the invertebrate α event of WGD specific to Acropora (IAsα).
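The dating step above rests on a proportionality: under a constant dS rate, the age of an event scales linearly with its modal dS. The following is a minimal Python sketch of that scaling only, not the procedure used in this study. The function name and the two dS inputs are hypothetical illustration values; only the 53.6 Mya split age is taken from the text.

```python
def scale_age_by_ds(ds_event, ds_calibration, calibration_age_mya):
    """Date an event from its modal dS, assuming a constant dS rate.

    ds_event            modal dS of the duplicate (paralog) peak
    ds_calibration      modal dS of orthologs spanning a dated species split
    calibration_age_mya age of that split in Mya

    Paralog pairs and ortholog pairs both accumulate substitutions along
    two diverging copies, so the factor of two cancels in the ratio.
    """
    if ds_event < 0 or ds_calibration <= 0 or calibration_age_mya <= 0:
        raise ValueError("dS values must be non-negative and the calibration positive")
    ds_per_my = ds_calibration / calibration_age_mya  # dS accumulated per My
    return ds_event / ds_per_my


# Hypothetical dS modes; only the 53.6 Mya split age comes from the text.
wgd_age = scale_age_by_ds(ds_event=0.30, ds_calibration=0.60,
                          calibration_age_mya=53.6)
print(f"{wgd_age:.1f} Mya")
```

With these made-up inputs the duplicate peak sits at half the calibration dS, so the inferred age is half the calibration age.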

Phylogenomic and synteny analysis of IAsα

If the proposed IAsα is correct, then when IAsα is mapped onto phylogenetic trees, the ohnologs of Acropora (paralogs created by IAsα) should form two clades relative to their orthologs in Astreopora sp1 [23,54]. In other words, the phylogenetic topology would be (((Acropora clade1) bootstrap1, (Acropora clade2) bootstrap2), Astreopora sp1), defined as the gene duplication topology (Fig 2c).
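The gene duplication topology defined above can be checked mechanically on a rooted gene tree. The sketch below is one hedged way to do so in Python, under assumptions that are ours and not the paper's: trees are nested tuples, leaves are labeled "species|gene", bootstrap support is ignored, and the two Acropora clades must share at least one species so that a plain species tree is not mistaken for a duplication.

```python
def leaves(node):
    """Return all leaf labels under a node (a leaf is a plain string)."""
    if isinstance(node, str):
        return [node]
    found = []
    for child in node:
        found.extend(leaves(child))
    return found


def species_set(node):
    """Species names under a node; labels are assumed to be 'species|gene'."""
    return {label.split("|", 1)[0] for label in leaves(node)}


def is_duplication_topology(tree, ingroup="Acropora", outgroup="Astreopora"):
    """True if tree is ((ingroup clade1, ingroup clade2), outgroup)."""
    if isinstance(tree, str) or len(tree) != 2:
        return False
    for in_side, out_side in ((tree[0], tree[1]), (tree[1], tree[0])):
        if not all(s.startswith(outgroup) for s in species_set(out_side)):
            continue
        if isinstance(in_side, str) or len(in_side) != 2:
            continue
        clade1, clade2 = species_set(in_side[0]), species_set(in_side[1])
        only_ingroup = all(s.startswith(ingroup) for s in clade1 | clade2)
        # A shared species means both copies survive in at least one genome.
        if only_ingroup and clade1 & clade2:
            return True
    return False


duplicated = ((("Acropora_tenuis|g1", "Acropora_digitifera|g2"),
               ("Acropora_tenuis|g3", "Acropora_digitifera|g4")),
              "Astreopora_sp1|g5")
single_copy = (("Acropora_tenuis|g1", "Acropora_digitifera|g2"),
               "Astreopora_sp1|g5")
print(is_duplication_topology(duplicated), is_duplication_topology(single_copy))
```

A real pipeline would additionally apply the bootstrap thresholds on both Acropora clades, which this sketch leaves out.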

We performed comprehensive phylogenomic analysis to confirm IAsα. First, we defined orthogroups as clusters of homologous genes in Acropora derived from a single gene in Astreopora sp1. Each orthogroup contained at least seven homologous genes, including at least one gene copy in each Acropora species and one gene copy in Astreopora sp1. We selected 883 orthogroups from the 19,760 gene families and reconstructed the phylogeny of each using both maximum likelihood (ML) and Bayesian methods. We found that the phylogenies of 205 orthogroups were consistent with the gene duplication topology, supporting IAsα. We further defined these 205 orthogroups as core-orthogroups (Supplementary Table 3; see Additional file 1).
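The orthogroup criteria can be read as a simple filter over gene families. The Python sketch below shows one such reading, assuming each family is a list of (species, gene) tuples; the function name, the data layout, and the rule that at least one Acropora species carries two or more copies are illustrative assumptions rather than the authors' script. Together, the three conditions imply at least seven genes per orthogroup.

```python
from collections import Counter

ACROPORA = ["A. digitifera", "A. gemmifera", "A. subglabra",
            "A. echinata", "A. tenuis"]
OUTGROUP = "Astreopora sp1"


def is_candidate_orthogroup(members):
    """members: list of (species, gene_id) tuples for one gene family."""
    copies = Counter(species for species, _ in members)
    has_outgroup = copies[OUTGROUP] >= 1
    all_acropora_present = all(copies[sp] >= 1 for sp in ACROPORA)
    # At least one Acropora species must retain two or more copies.
    some_acropora_duplicated = any(copies[sp] >= 2 for sp in ACROPORA)
    return has_outgroup and all_acropora_present and some_acropora_duplicated


# Hypothetical families for illustration only.
single_copy_family = [(sp, f"gene_{i}") for i, sp in enumerate(ACROPORA + [OUTGROUP])]
duplicated_family = single_copy_family + [("A. tenuis", "gene_extra")]
print(is_candidate_orthogroup(single_copy_family),
      is_candidate_orthogroup(duplicated_family))
```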

In particular, we found differential gene loss, retention, and duplication in Acropora lineages. For instance, the phylogeny of orthogroup 1370 (alpha-protein kinase 1-like) showed gene retention in A. subglabra, A. digitifera, and A. echinata, gene loss in A. tenuis, and an extra gene duplication in A. subglabra (Fig 2c). This implies that diversification of duplicated genes may contribute to species complexity and evolutionary innovation in Acropora [22].

To estimate the split time of the two Acropora clades, which can be regarded as the timing of IAsα, we selected from the 205 core-orthogroups 154 high-quality core-orthogroups, with bootstrap values > 70 for both Acropora clades in the ML phylogeny, and reconstructed time-calibrated phylogenies using BEAST2 [23]. However, it was difficult for the MCMC parameters to converge in 70 core-orthogroups, and we successfully dated the phylogenetic trees of only 135 high-quality core-orthogroups (see Additional file 1). Next, we estimated the distribution of inferred node ages between the two Acropora clades; the peak value was 30.78 Mya (95% confidence interval: 27.86-34.77 Mya), indicating that IAsα occurred ~30.78 Mya (Fig 2d). This result is consistent with the timing of IAsα estimated using the dS-based method.

Intergenomic co-linearity is often used to directly identify ancient WGD and to reconstruct ancestral karyotypes in vertebrates [48,53,55]. We performed intergenomic co-linearity and synteny analysis between Astreopora sp1 and A. tenuis to test IAsα. First, we found extensive co-linearity between Astreopora sp1 and A. tenuis. Second, more duplicated scaffold segments were observed in A. tenuis than in Astreopora sp1, and some duplicated regions were located on different scaffolds in A. tenuis (Supplementary Fig 6).

In summary, we established IAsα using the dS-based method and phylogenomic and synteny analyses. Moreover, we suggest that IAsα probably occurred between 27.86 and 35.7 Mya (Fig 1b).

The fate of duplicated genes originating from IAsα

Duplicated genes provide substrates for diversification and evolutionary novelty, and most of them are regulators of complex gene networks in vertebrates and plants [23,48,56]. We examined gene ontology (GO) terms for all genes in the 154 high-quality core-orthogroups to investigate the roles of genes duplicated by IAsα and found that their molecular functions were enriched in specific categories: transporter, catalytic, binding, and receptor activity, most of which are involved in gene regulation (Table 1).

Further, we identified some duplicated genes that appear to have undergone subfunctionalization or neofunctionalization, possibly contributing to stress responses of corals. dnaJ homolog subfamily B member 11-like (DNAJB) protein has been shown to be involved in heat stress responses in marine organisms [57,58]. Orthogroup 1247 (DNAJB) has two main domains (Ras and DnaJ domains) in Astreopora sp1, representing the ancestral state. Each of the two domains was independently lost in one of the duplicated genes, resulting in complementary functions of the duplicated genes after IAsα (Fig 3a and Supplementary Fig 7). Excitatory amino acid transporters may be related to symbiotic interactions in Acropora [59]. Orthogroup 1244 (excitatory amino acid transporter 1-like) was predicted to encode a six-transmembrane protein, and a high number of mutations have accumulated in both non-transmembrane and transmembrane regions, suggesting that new functions may have been generated (Fig 3b and Supplementary Fig 8). These examples suggest that genes duplicated by IAsα participate in both stress responses and symbiotic interactions in Acropora. Moreover, these results accord with previously reported patterns in the fate of duplicated genes in vertebrates and plants [17,19,23,48], indicating that IAsα possibly contributes to species complexity and diversification in Acropora.

Gene expression patterns of duplicated genes across five developmental stages in A. digitifera

To better understand the evolution of duplicated genes, we analyzed gene expression across five developmental stages in A. digitifera (blastula, gastrula, postgastrula, planula, and adult polyps) using previously published transcriptome data [60]. We identified 236 ohnologous pairs in A. digitifera from the 883 ML phylogenies (for details see Methods) and found that these pairs show distinct gene expression profiles. We divided the 236 ohnologous pairs into two clusters based on the pairwise correlation of gene expression during development (high correlation or HC: P < 0.05; no correlation or NC: P ≥ 0.05; Pearson’s correlation test): about 11% (25/236) of ohnologous pairs fell into HC and about 89% (211/236) into NC (Fig 4a). Ohnologous pairs in the HC cluster are enriched in protein kinases, while ohnologous pairs in the NC cluster are enriched in membrane transporters and ion-binding proteins (Fig 4b). This result indicates that the two clusters of ohnologous pairs potentially evolved different gene functions. Additionally, we compared dN/dS values between the HC and NC clusters to investigate selective pressure (Fig 4c), but found no significant difference between the two clusters (Mann-Whitney-Wilcoxon test, P = 0.51).
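The HC/NC split can be illustrated with a short Python sketch of Pearson's correlation test on five-stage expression profiles. It is an illustration under stated assumptions, not the authors' analysis: the function names and expression values are hypothetical, the test is taken to be two-sided (so a significant negative correlation would also count as HC), and the P value uses the closed-form t distribution with 3 degrees of freedom, which applies only when there are exactly five stages.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant profile."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def p_value_five_stages(r):
    """Two-sided P for Pearson's r with n = 5 (t distribution, 3 d.f.)."""
    if abs(r) >= 1.0:
        return 0.0
    # t = |r| * sqrt(3 / (1 - r^2)); the 3-d.f. CDF is written in x = t / sqrt(3).
    x = abs(r) / math.sqrt(1.0 - r * r)
    cdf = 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi
    return 2.0 * (1.0 - cdf)


def classify_pair(expr_a, expr_b, alpha=0.05):
    """Label an ohnologous pair 'HC' if P < alpha, otherwise 'NC'."""
    if len(expr_a) != 5 or len(expr_b) != 5:
        raise ValueError("this sketch expects exactly five stages per gene")
    return "HC" if p_value_five_stages(pearson_r(expr_a, expr_b)) < alpha else "NC"


# Hypothetical expression values across blastula ... adult polyp.
print(classify_pair([1, 2, 3, 4, 5], [2, 4, 6, 8, 10.5]))
print(classify_pair([1, 2, 3, 4, 5], [3, 1, 4, 1, 5]))
```

With only five stages, |r| must exceed roughly 0.88 to reach P < 0.05, which is one reason most pairs would be expected to fall into NC under this criterion.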

Evolution of toxic proteins in Cnidaria

Next, we investigated the role of IAsα in the diversification of toxins in Acropora. We identified about two hundred putative toxic proteins in each of the five Acropora species, and then clustered them with putative toxic proteins of Astreopora sp1 and six other cnidarian species (Hydra magnipapillata, Nematostella vectensis, Montastraea cavernosa, Porites australiensis, Porites astreoides, and Porites lobata) into 24 gene families (Supplementary Table 4; for details see Methods). Based on phylogenetic reconstruction of the gene families that contain at least 15 genes (see Additional file 1), we found that toxic proteins have undergone widespread gene duplication in Cnidaria, and that most gene duplications occurred within individual species lineages, except in Acropora (Fig 5 and Supplementary Figs 9-16). Interestingly, gene duplications occurred in the most recent common ancestor of Acropora in 9 of 15 gene families, potentially caused by WGD (IAsα). For example, in gene family-1 (Coagulation factor X), each species contains around 50 genes, except H. magnipapillata, and gene duplications occurred frequently in individual species lineages: Astreopora sp1, M. cavernosa, N. vectensis, and P. australiensis. In contrast, five gene duplications occurred in the most recent common ancestor of Acropora, presumably by WGD (Fig 5). These results indicate that IAsα contributed to the diversification of proteinaceous toxins in Acropora.

Discussion

Ancient WGD is considered a significant evolutionary factor in the origin and diversification of evolutionary lineages [17,18,28,54], but much work remains to definitively identify WGD and to understand its consequences in different evolutionary lineages. Staghorn corals of the genus Acropora, which constitute the foundation of modern coral reef ecosystems, are hypothesized to have originated through polyploidization [2,11,15]. However, there has been no genetic evidence to support this assertion. To that end, we analyzed genomes of one Astreopora and five Acropora species to address the possibility of WGD in Acropora and the functional fate of duplicated genes from that event.

To the best of our knowledge, this is the first study to report genome-scale evidence of WGD in corals (IAsα). We find that large numbers of ohnologs are retained in Acropora species and that hundreds of gene families display the gene duplication topology among the five Acropora species. In addition, our synteny analysis between Astreopora sp1 and A. tenuis directly supports IAsα. However, reconstruction of the ancestral karyotype will require genomes assembled to the chromosome level in order to fully understand gene fractionation and chromosome rearrangements in Acropora following IAsα [27,61].

Ancient WGD is usually inferred using the dS-based method, but artificial signals in dS distributions have been reported in previous studies, caused by dS saturation (dS > 1) or by poorly annotated genomes [24,52,62]. There is an extra peak in the dS distribution of anchor gene pairs in A. digitifera and A. tenuis. One possible explanation is that the extra peak is an artifact arising because few anchor gene pairs were used in the analysis. However, it could also indicate a second WGD event in Acropora. We found few orthogroups with topologies that fit two WGD events (Supplementary Fig 17). If a second WGD event occurred, the reason that its signal appeared among anchor gene pairs rather than among paralogous gene pairs may be that the paralogs generated by the second WGD have been largely lost, so that the few that remain are retained only in conserved order. Fortunately, a maximum likelihood phylogeny modeling approach was recently developed to overcome the difficulties of the dS-based method [24,62]. We used it to test whether a second WGD occurred in Acropora. The result showed that a single WGD event is the best model for Acropora and that it occurred 30.69 to 34.69 Mya (Supplementary Table 5 and Supplementary Table 6; for details see Methods). Thus, we have genome-scale evidence to support IAsα, but as yet there is no conclusive evidence to support a second WGD in Acropora.

It is crucial to accurately estimate the timing of a WGD event to understand its evolutionary consequences [23,25]. Our study estimated the timing of IAsα using both phylogenomic analysis and the dS-based method. We suggest that IAsα probably occurred between 27.86 and 35.7 Mya (Fig 1b). Interestingly, species turnover events usually coincide with extinctions [63], and one species turnover event in corals (Oligocene-Miocene transition: OMT) was suggested to have occurred from 15.97 to 33.7 Mya [8]. The timing of IAsα may correspond to a massive extinction of corals during the OMT. This finding is consistent with the hypothesis that WGD may enable organisms to escape extinction during drastic environmental changes [17] (Fig 1b).

The occurrence of IAsα raises the question of what impact it may have had upon coral evolution [15,32]. We performed GO analysis on duplicated genes and examined several duplicated gene families, showing that duplicated genes following IAsα provided raw genetic material for Acropora to diversify and are potentially crucial for stress responses. In particular, toxin diversification in Acropora was mainly generated by WGD. In addition, we focused on expression patterns of duplicated genes in A. digitifera, showing that the expression of duplicated protein kinases tends to be correlated during development. A possible explanation is that protein kinases are retained in complex signal transduction pathways via subfunctionalization or dosage effects [22,32]. In contrast, the expression of duplicated membrane proteins tends to be uncorrelated, probably because these proteins have developed different functions via neofunctionalization, as in the excitatory amino acid transporters (orthogroup 1244). However, much work is still needed to investigate the molecular mechanisms of duplicated genes and to examine these hypotheses in the diversification of Acropora [64]. For instance, previous gene functional studies have demonstrated that voltage-gated sodium channel gene paralogs, duplicated in teleosts, contributed to the acquisition of new electric organs via neofunctionalization in both mormyroid and gymnotiform electric fishes [65,66].

Our previous study proposed that adaptive radiation in Acropora was probably driven by introgression (Mao et al., in revision); thus, Acropora is the first invertebrate lineage reported to have undergone both WGD and introgression. Meanwhile, both introgression and WGD have also been reported in cichlid fish lineages [67], a famous model for adaptive radiation in vertebrates [67,68]. Both WGD and introgression are regarded as significant forces in the adaptive radiation of organisms [17,67], but we still do not understand the relationship between WGD and introgression in adaptive radiations [69].

In conclusion, this study identified an ancient WGD shared by Acropora species (IAsα) that not only provides new insights into the evolution of reef-building corals, but also offers a new model of WGD in animals.

Methods

Species information, genomic data, and gene family clustering

Data can be accessed at http://marinegenomics.oist.jp and http://comparative.reefgenomics.org/datasets.html. Information on the Acropora species in this study will be described elsewhere (Mao et al., in revision). Information about Astreopora sp1 was described previously [7], and transcriptome data across five developmental stages were described previously [60]. Protein sequences of the six species were combined and subjected to an all-against-all BLASTP search to find all orthologs and paralogs among the six species. Then, OrthoMCL was used with default settings to cluster homologs into 19,760 gene families according to sequence similarity [70].

Single-copy orthologs and reconstruction of a calibrated phylogenomic tree

A custom Python script was used to select 3,461 single-copy orthologs with only one gene copy in each species. For each single-copy ortholog, coding sequences were aligned with MAFFT [71] as described previously. Then, the concatenated sequences of the 3,461 single-copy orthologs were used to reconstruct the phylogenomic tree (species tree) with BEAST2 [72]. First, we partitioned the concatenated coding sequences by codon position. Molecular clock and tree models were linked across partitions, whereas substitution models were not. Then, divergence times were estimated using the HKY substitution model, a relaxed lognormal clock model, and a calibrated Yule prior with the divergence time estimated in our previous study (Mao et al., in revision). We ran BEAST2 three times independently, with 50 million Markov chain Monte Carlo (MCMC) generations for each run, and then used Tracer to check the log files; the effective sample size (ESS) of every parameter exceeded 200.
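The single-copy selection at the start of this subsection amounts to keeping gene families in which every species contributes exactly one gene. A minimal Python sketch of such a filter follows; it is not the custom script used in this study, and the function name, the dictionary layout, and the toy identifiers are hypothetical.

```python
def select_single_copy(families, species):
    """Keep families with exactly one gene in each listed species.

    families: dict mapping family_id -> list of (species, gene_id) tuples
    species:  iterable of species names that must all be represented
    Returns a dict mapping family_id -> {species: gene_id}.
    """
    wanted = set(species)
    selected = {}
    for family_id, members in families.items():
        genes_by_species = {}
        for sp, gene_id in members:
            genes_by_species.setdefault(sp, []).append(gene_id)
        one_copy_each = all(len(g) == 1 for g in genes_by_species.values())
        if set(genes_by_species) == wanted and one_copy_each:
            selected[family_id] = {sp: g[0] for sp, g in genes_by_species.items()}
    return selected


# Toy input with two species for brevity; identifiers are made up.
toy_families = {
    "fam1": [("A. tenuis", "t1"), ("Astreopora sp1", "s1")],
    "fam2": [("A. tenuis", "t2"), ("A. tenuis", "t3"), ("Astreopora sp1", "s2")],
    "fam3": [("A. tenuis", "t4")],
}
single_copy = select_single_copy(toy_families, ["A. tenuis", "Astreopora sp1"])
print(sorted(single_copy))
```

Only fam1 passes here: fam2 has two copies in one species and fam3 lacks the second species.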

Orthogroup selection and detection of a WGD event with dS analysis

(a) dS distributions of paralogous gene pairs

Paralogous gene pairs of each species were identified with an all-against-all BLASTP search, and OrthoMCL was then used to cluster paralogs into gene families for each species [70]. Gene families with fewer than 20 genes were used to calculate dS values. Each gene pair within a given gene family was aligned with MAFFT [71], and aligned sequences were used to calculate dS values with PAML [73]. The dS distribution of each species was plotted with a bin width of 0.02 in R [74].

(b) dS distributions of anchor gene pairs

We used MCScanX with default settings (except for match_size=3) to find anchor gene pairs based on synteny information for each species [75]. Each anchor gene pair was aligned with MAFFT [71], and the aligned sequences were used to calculate dS values with PAML [73]. The dS distribution of each species was plotted in R with a bin width of 0.02 [74].

(c) dS distributions of orthologous gene pairs

We used MCScanX with default settings (except for match_size=3) to find orthologous gene pairs based on synteny information between Astreopora sp1 and A. tenuis, and between A. tenuis and A. digitifera [75]. Each orthologous gene pair was aligned with MAFFT [71], and the aligned sequences were used to calculate dS values with PAML [73]. The dS distributions of all species were plotted in R with a bin width of 0.02 [74].
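
As a rough illustration of the paralog-pair step, the following Python sketch enumerates gene pairs within families of fewer than 20 genes and bins dS values at a width of 0.02. The study used PAML and R for these steps; the gene names and dS values here are made-up placeholders, and reading "bins=0.02" as a bin width is an assumption.

```python
from collections import Counter
from itertools import combinations

MAX_FAMILY_SIZE = 20  # families with fewer than 20 genes are kept (from the text)
BIN_WIDTH = 0.02      # assumed to be the histogram bin width ("bins=0.02" in the text)

def gene_pairs(families):
    """Yield every unordered gene pair from families with fewer than 20 genes."""
    for genes in families.values():
        if len(genes) < MAX_FAMILY_SIZE:
            yield from combinations(sorted(genes), 2)

def bin_ds(ds_values, width=BIN_WIDTH):
    """Count dS values per bin; each key is a bin index (left edge = index * width)."""
    return Counter(int(ds / width) for ds in ds_values)

# Hypothetical toy families: famA is small enough to use, famB is too large.
families = {"famA": ["g1", "g2", "g3"], "famB": [f"h{i}" for i in range(25)]}
pairs = list(gene_pairs(families))

# Hypothetical dS values; in the study these come from PAML on MAFFT alignments.
ds_by_pair = {("g1", "g2"): 0.011, ("g1", "g3"): 0.013, ("g2", "g3"): 0.25}
hist = bin_ds(ds_by_pair[p] for p in pairs)
print(pairs, dict(hist))
```

The same binning would apply to anchor and orthologous gene pairs, which differ only in how the pairs are chosen.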

Detection of a WGD event using phylogenetic analysis

A custom Python script was used to select 883 gene families as orthogroups. Each orthogroup included at least one gene copy in Astreopora, at least one gene copy in each of the five Acropora species, and at least two ohnologs in at least one of the five Acropora species. Ohnologs are defined as paralogs originating from WGD.
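
A copy-number filter of the kind described above might look like the following Python sketch. It reads "one gene copy in each of the five species" as "at least one copy", which is an assumption, and it checks copy numbers only: whether duplicated genes are true ohnologs cannot be decided by such a filter and relies on the phylogenetic analysis. All labels and data are hypothetical.

```python
from collections import Counter

ACROPORA = ["Adig", "Aech", "Agem", "Asub", "Aten"]  # hypothetical species labels
OUTGROUP = "Astr"                                    # hypothetical label for Astreopora sp1

def is_candidate_orthogroup(members):
    """Check the copy-number criteria for one gene family.

    members: list of (species, gene ID) tuples.
    """
    counts = Counter(sp for sp, _ in members)
    has_outgroup = counts[OUTGROUP] >= 1
    all_acropora_present = all(counts[sp] >= 1 for sp in ACROPORA)
    duplicated_somewhere = any(counts[sp] >= 2 for sp in ACROPORA)
    return has_outgroup and all_acropora_present and duplicated_somewhere

# Hypothetical toy families.
base = [(sp, f"{sp}_a") for sp in ACROPORA]
fam_dup = base + [("Adig", "Adig_b"), (OUTGROUP, "Astr_a")]  # duplicate in Adig, outgroup present
fam_single = base + [(OUTGROUP, "Astr_a")]                    # no duplicate
fam_no_out = base + [("Adig", "Adig_b")]                      # outgroup missing

print([is_candidate_orthogroup(f) for f in (fam_dup, fam_single, fam_no_out)])
```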

To reconstruct each of the 883 gene trees, we used MAFFT [71] to align the amino acid sequences of each single-copy ortholog. Coding sequences were then aligned with TranslatorX [76], guided by the amino acid alignments, and single-copy orthologous genes containing ambiguous 'N' bases were excluded. PartitionFinder [77] was used to find the best substitution model for RAxML (Version 8.2.2) [79] and MrBayes (Version 3.2.3) [78].


Then, 205 orthogroups whose phylogeny matched the duplication topology (Astreopora, (Acropora, Acropora)) were selected as core-orthogroups by visual inspection. The 154 high-quality core-orthogroups, in which clade bootstrap values in the ML phylogeny exceeded 70, were used for molecular dating with BEAST2, based on the calibrated phylogenomic tree. Clock and tree models were linked, whereas substitution models were not. Divergence times were estimated using the HKY substitution model, a relaxed lognormal clock model, and a calibrated Yule prior, with the divergence time from our previous study (Mao et al., in submission). We ran BEAST2 three times independently for 30 million Markov chain Monte Carlo (MCMC) generations each, and used Tracer to check the log files. In total, BEAST2 [72] yielded 135 time-calibrated phylogenies with ESS values exceeding 200.

Estimating peak values in dS distributions and in the distribution of inferred node ages with the KDE toolbox

Each distribution was estimated using the KDE toolbox in MATLAB, as described previously [25,80].

(a) Estimating peak values in dS distributions

To estimate the age of the WGD from dS distributions, we took the peak value of each orthologous gene pair dS distribution to represent the split time between the two species: 53.6 My between Astreopora sp1 and A. tenuis, and 14.69 My between A. tenuis and A. digitifera. Before applying the kde() function in the KDE toolbox, we truncated the log-transformed dS distributions (plotted as log2(dS) in Figure 2b) to avoid estimation bias due to extreme values. The dS distribution of orthologous gene pairs between Astreopora sp1 and A. tenuis was truncated to the range -1 to 1, and that between A. tenuis and A. digitifera to the range -5 to -2. The kde() function then gave peak values of -0.314 and -3.4596 for these two distributions, respectively. The distribution of Acropora paralogous gene pairs was truncated to the range -4 to 0, and its peak value was estimated as -1.8165. We also used bootstrapping to estimate the 95% confidence interval (CI) of the peak of the Acropora paralogous gene pair distribution as -2.1261 to -1.7606 (31.18 to 35.71 My). For bootstrapping, we generated 100 bootstrap samples for each distribution by sampling with replacement from the original data (49,002 values in the original distribution) with the sample() function. We estimated the maximum peak value for each of the 100 bootstrap samples, sorted these values, and used the 6th- and 95th-ranked values to define the 95% CI.
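
The conversion from peak positions to ages is not spelled out in the text. The reported numbers are, however, consistent with a linear interpolation between the two calibration peaks on the log-transformed dS axis. The Python sketch below is therefore an inference from the reported values, not a documented procedure, and the function name is a hypothetical choice.

```python
# Calibration points reported in the text: (peak of log-transformed dS, split time in My).
CAL_YOUNG = (-3.4596, 14.69)  # A. tenuis vs A. digitifera
CAL_OLD = (-0.314, 53.6)      # Astreopora sp1 vs A. tenuis

def peak_to_age(log_ds_peak, young=CAL_YOUNG, old=CAL_OLD):
    """Linearly interpolate an age (My) from a peak position (assumed conversion)."""
    slope = (old[1] - young[1]) / (old[0] - young[0])  # My per unit of log-transformed dS
    return young[1] + slope * (log_ds_peak - young[0])

wgd_age = peak_to_age(-1.8165)  # peak of the Acropora paralog distribution
ci_low = peak_to_age(-2.1261)   # lower bootstrap bound
ci_high = peak_to_age(-1.7606)  # upper bootstrap bound
print(round(wgd_age, 2), round(ci_low, 2), round(ci_high, 2))
```

Under this assumption the three values come out close to the reported 35.01 My peak and the 31.18 to 35.71 My interval.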

(b) Estimating peak values in the distribution of inferred node ages

To estimate the age of the WGD from the distribution of inferred node ages, we used the kde() function in the KDE toolbox to estimate the peak value as 30.78 My, and used bootstrapping to estimate the 95% CI as 27.86 to 34.77 My. For bootstrapping, we generated 100 bootstrap samples by sampling with replacement from the original data (135 values in the original distribution) with the sample() function. We estimated the maximum peak value for each of the 100 bootstrap samples, sorted these values, and used the 6th- and 95th-ranked values to define the 95% CI.
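
The bootstrap procedure for peak values can be sketched in Python as follows. This is not the MATLAB KDE toolbox used in the study: the sketch swaps in a plain Gaussian kernel with a normal-reference bandwidth, and the input data are simulated placeholders. Only the number of bootstrap samples (100) and the ranks (6th and 95th) are taken from the text.

```python
import math
import random
import statistics

def kde_peak(values, grid_size=100):
    """Return the grid point with the highest Gaussian kernel density estimate."""
    n = len(values)
    # Normal-reference rule-of-thumb bandwidth (an assumption, not the toolbox's method).
    bandwidth = 1.06 * statistics.stdev(values) * n ** (-1 / 5)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * i / (grid_size - 1) for i in range(grid_size)]

    def density(x):
        # Unnormalized density; normalization does not change the peak location.
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)

    return max(grid, key=density)

def bootstrap_peak_interval(values, n_boot=100, low_rank=6, high_rank=95, seed=0):
    """Resample with replacement, estimate a peak per sample, and return ranked bounds."""
    rng = random.Random(seed)
    peaks = sorted(kde_peak(rng.choices(values, k=len(values))) for _ in range(n_boot))
    return peaks[low_rank - 1], peaks[high_rank - 1]

# Simulated placeholder data standing in for 135 inferred node ages (My).
data_rng = random.Random(1)
node_ages = [data_rng.gauss(30, 5) for _ in range(135)]

peak = kde_peak(node_ages)
low, high = bootstrap_peak_interval(node_ages)
print(round(peak, 2), round(low, 2), round(high, 2))
```

The ranks are exposed as parameters so that a different percentile convention can be tried without changing the resampling code.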

Maximum likelihood approach to detect WGD with gene family count data

First, we filtered the gene family cluster data generated by OrthoMCL as described above [70]. Gene families that included only one Astreopora sp1 gene and at least one gene in each of the five Acropora species were counted. We then used the WGDgc package in R to estimate the log likelihood of models with 0, 1, 2, or 3 WGD events, with the settings dirac=1 and conditioning="twoOrMore" [62]. A likelihood ratio test (pchisq(2*(Likelihood_1-Likelihood_2), df=1, lower.tail=FALSE)) was performed to find the best model, and a single WGD event was found to fit the gene family count data best. Under the one-WGD model, we evaluated WGD ages at 4-My intervals between 18.69 and 38.69 My. The lowest log likelihood was obtained at WGD ages of 30.69 and 34.69 My.
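
For readers without R, the likelihood ratio test quoted above can be reproduced with the Python standard library, because the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)). The log-likelihood values in this sketch are hypothetical placeholders, not results from the study.

```python
import math

def lrt_pvalue_df1(loglik_1, loglik_2):
    """Likelihood ratio test with one degree of freedom.

    Mirrors pchisq(2*(loglik_1 - loglik_2), df=1, lower.tail=FALSE) in R,
    where loglik_1 belongs to the richer model.
    """
    stat = max(2.0 * (loglik_1 - loglik_2), 0.0)  # clamp tiny negative values to zero
    p_value = math.erfc(math.sqrt(stat / 2.0))    # chi-square (df=1) upper tail
    return stat, p_value

# Hypothetical log likelihoods for a one-WGD model and a no-WGD model.
stat, p_value = lrt_pvalue_df1(-98.0, -100.0)
print(stat, round(p_value, 4))
```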

Gene expression profiling analysis and dN/dS calculation

We selected 236 gene pairs of A. digitifera (ohnologous gene pairs) from 831 orthogroups. We BLASTed these ohnologous gene pairs against the gene expression data across five developmental stages [60], and these data were normalized for each developmental stage. Correlations between the expression profiles of the two genes in each ohnologous pair were calculated using Pearson's correlation in R [74]. Hierarchical clustering was performed using Pheatmap for HC cluster genes and NC cluster genes, respectively. Pairwise dN/dS ratios were calculated with codeml in PAML, based on the coding sequence alignments of ohnologous gene pairs [73]. The dN/dS distributions were plotted with ggplot2 in R, and differences between dN/dS distributions were tested for significance with a Mann-Whitney test in R [74].
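
The correlation step can be illustrated with a short Python sketch that computes Pearson's correlation between the expression profiles of two ohnologs across five developmental stages. The study performed this step in R; the expression values below are invented placeholders.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical normalized expression of one ohnologous pair across five stages.
copy_a = [1.0, 2.0, 3.0, 4.0, 5.0]
copy_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # same trend as copy_a
copy_c = [5.0, 4.0, 3.0, 2.0, 1.0]   # opposite trend

r_same = pearson(copy_a, copy_b)
r_opposite = pearson(copy_a, copy_c)
print(round(r_same, 3), round(r_opposite, 3))
```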

Evolution analysis of toxic proteins in corals

The 55 toxic proteins of A. digitifera identified in a previous study were downloaded from http://www.uniprot.org/ and used as queries. Protein sequences of Porites astreoides, Porites australiensis, Porites lobata, Montastraea cavernosa, Hydra magnipapillata, and Nematostella vectensis were downloaded from http://comparative.reefgenomics.org/datasets.html and combined with protein sequences of the six Acroporid species to create a search database.

We identified candidate toxic proteins by BLASTing the 55 toxins against the combined protein sequences with the following thresholds: e-value < 1e-20 and identity > 30%. We then used OrthoMCL to cluster the toxin candidates into 24 gene families and reconstructed their ML gene trees with ExaML and RAxML. Each gene tree was rooted at a branch or clade of query sequences.
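
The two thresholds above can be applied to BLAST results with a few lines of Python. The sketch assumes the standard 12-column tabular BLAST output (query, subject, percent identity, ..., e-value, bit score); the text does not state which output format was used, and the hit lines below are invented.

```python
MAX_EVALUE = 1e-20   # keep hits with e-value < 1e-20 (from the text)
MIN_IDENTITY = 30.0  # keep hits with identity > 30% (from the text)

def filter_hits(lines, max_evalue=MAX_EVALUE, min_identity=MIN_IDENTITY):
    """Keep tabular BLAST hits that pass both thresholds.

    Assumes 12 tab-separated columns, with percent identity in column 3
    and e-value in column 11.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if evalue < max_evalue and identity > min_identity:
            kept.append((query, subject, identity, evalue))
    return kept

# Hypothetical hits: only the first passes both thresholds.
hits = [
    "tox1\tgeneA\t45.2\t200\t100\t5\t1\t200\t1\t200\t1e-50\t180",
    "tox1\tgeneB\t28.0\t200\t140\t4\t1\t200\t1\t200\t1e-30\t120",
    "tox2\tgeneC\t60.0\t80\t30\t2\t1\t80\t1\t80\t1e-10\t60",
]
candidates = filter_hits(hits)
print(candidates)
```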

Gene ontology enrichment for duplicated genes of core-orthogroups and prediction of protein domains and transmembrane helices

We BLASTed the sequences of the 154 high-quality core-orthogroups and ohnologous gene pairs in Acropora against the UniProt database to find best hits. Identical hits within each ohnolog group were removed, and the remaining hits were used for gene enrichment analysis in DAVID [81]. We also used InterProScan [82] to predict protein domains and the TMHMM Server (v. 2.0) [83] to predict transmembrane helices from protein sequences.

Acknowledgments

This study was supported by OIST and was funded by a JSPS grant (No. 17J00557 to YFM). We thank Dr. Douglas E. Soltis and Dr. Pamela S. Soltis for their comments on and insight into this draft manuscript. We thank Dr. Steven D. Aird for editing the manuscript.

Availability of data and materials

Genome assemblies and gene models used in this study can be downloaded from http://marinegenomics.oist.jp.

Authors' contributions

YFM and NS conceived the study. CS provided genomic data. YFM performed all analyses in this study. YFM and NS wrote the initial manuscript, and all authors edited the final manuscript.

Competing interests

The authors declare no competing interests.

Additional files

Additional file 1: An Excel file with tables listing 1) the original data of gene family clusters in this study, 2) orthogroup datasets in this study, 3) node ages of core-orthogroups inferred by BEAST2 in this study, 4) data of gene expression analysis for ohnologous gene pairs, and 5) toxin gene family clusters.

References

1. Ainsworth TD, Heron SF, Ortiz JC, Mumby PJ, Grech A, et al. (2016) Climate change disables coral bleaching protection on the Great Barrier Reef. Science 352: 338-342.

2. Renema W, Pandolfi JM, Kiessling W, Bosellini FR, Klaus JS, et al. (2016) Are coral reefs victims of their own past success? Science Advances 2: e1500850.

3. Wallace CC (2012) Acroporidae of the Caribbean. Geologica Belgica 15: 388-393.

4. Richards ZT, Miller DJ, Wallace CC (2013) Molecular phylogenetics of geographically restricted Acropora species: Implications for threatened species conservation. Molecular Phylogenetics and Evolution 69: 837-851.

5. Wallace CC, Rosen BR (2006) Diverse staghorn corals (Acropora) in high-latitude Eocene assemblages: implications for the evolution of modern diversity patterns of reef corals. Proceedings of the Royal Society B: Biological Sciences 273: 975-982.

6. Fukami H, Omori M, Hatta M (2000) Phylogenetic relationships in the coral family Acroporidae, reassessed by inference from mitochondrial genes. Zoological Science 17: 689-696.

7. Suzuki G, Nomura K (2013) Species boundaries of Astreopora corals (Scleractinia, Acroporidae) inferred by mitochondrial and nuclear molecular markers. Zoological Science 30: 626-632.

8. Edinger EN, Risk MJ (1994) Oligocene-Miocene extinction and geographic restriction of Caribbean corals: roles of turbidity, temperature, and nutrients. Palaios: 576-598.

9. Renema W, Bellwood DR, Braga JC, Bromfield K, Hall R, et al. (2008) Hopping hotspots: Global shifts in marine biodiversity. Science 321: 654-657.

10. Vollmer SV, Palumbi SR (2002) Hybridization and the evolution of reef coral diversity. Science 296: 2023-2025.

11. Kenyon JC (1997) Models of reticulate evolution in the coral genus Acropora based on chromosome numbers: Parallels with plants. Evolution 51: 756-767.

12. Hamada M, Shoguchi E, Shinzato C, Kawashima T, Miller DJ, et al. (2013) The complex NOD-Like receptor repertoire of the coral Acropora digitifera includes novel domain combinations. Molecular Biology and Evolution 30: 167-176.

13. Gacesa R, Chung R, Dunn SR, Weston AJ, Jaimes-Becerra A, et al. (2015) Gene duplications are extensive and contribute significantly to the toxic proteome of nematocysts isolated from Acropora digitifera (Cnidaria: Anthozoa: Scleractinia). BMC Genomics 16.

14. van Oppen MJ, McDonald BJ, Willis B, Miller DJ (2001) The evolutionary history of the coral genus Acropora (Scleractinia, Cnidaria) based on a mitochondrial and a nuclear marker: reticulation, incomplete lineage sorting, or morphological convergence? Molecular Biology and Evolution 18: 1315-1329.

15. Willis BL, van Oppen MJH, Miller DJ, Vollmer SV, Ayre DJ (2006) The role of hybridization in the evolution of reef corals. Annual Review of Ecology Evolution and Systematics 37: 489-517.

16. Richards ZT, Hobbs JPA (2015) Hybridisation on coral reefs and the conservation of evolutionary novelty. Current Zoology 61: 132-145.

17. Van De Peer Y, Mizrachi E, Marchal K (2017) The evolutionary significance of polyploidy. Nature Reviews Genetics 18: 411-424.

18. Van de Peer Y, Maere S, Meyer A (2009) The evolutionary significance of ancient genome duplications. Nature Reviews Genetics 10: 725-732.

19. Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome evolution in plants. Current Opinion in Genetics & Development 35: 119-125.

20. Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biology 3: 1700-1708.

21. Christoffels A, Koh EGL, Chia JM, Brenner S, Aparicio S, et al. (2004) Fugu genome analysis provides evidence for a whole-genome duplication early during the evolution of ray-finned fishes. Molecular Biology and Evolution 21: 1146-1151.

22. Glasauer SMK, Neuhauss SCF (2014) Whole-genome duplication in teleost fishes and its evolutionary consequences. Molecular Genetics and Genomics 289: 1045-1060.

23. Jiao YN, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, et al. (2011) Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97-U113.

24. Tiley GP, Ane C, Burleigh JG (2016) Evaluating and Characterizing Ancient Whole-Genome Duplications in Plants with Gene Count Data. Genome Biology and Evolution 8: 1023-1037.

25. Vanneste K, Baele G, Maere S, Van de Peer Y (2014) Analysis of 41 plant genomes supports a wave of successful genome duplications in association with the Cretaceous-Paleogene boundary. Genome Research 24: 1334-1347.

26. Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, et al. (2009) Polyploidy and angiosperm diversification. American Journal of Botany 96: 336-348.

27. Smith JJ, Kuraku S, Holt C, Sauka-Spengler T, Jiang N, et al. (2013) Sequencing of the sea lamprey (Petromyzon marinus) genome provides insights into vertebrate evolution. Nature Genetics 45: 415-421.

28. Kenny NJ, Chan KW, Nong W, Qu Z, Maeso I, et al. (2017) Ancestral whole-genome duplication in the marine chelicerate horseshoe crabs (vol 116, pg 190, 2016). Heredity 119: 388-388.

29. Schwager EE, Sharma PP, Clarke T, Leite DJ, Wierschin T, et al. (2017) The house spider genome reveals an ancient whole-genome duplication during arachnid evolution. BMC Biology 15.

30. Li Z, Tiley GP, Galuska SR, Reardon CR, Kidder TI, et al. (2018) Multiple large-scale gene and genome duplications during the evolution of hexapods. Proceedings of the National Academy of Sciences: 201710791.

31. Sémon M, Wolfe KH (2007) Consequences of genome duplication. Current Opinion in Genetics & Development 17: 505-512.

32. Conant GC, Birchler JA, Pires JC (2014) Dosage, duplication, and diploidization: clarifying the interplay of multiple models for duplicate gene evolution over time. Current Opinion in Plant Biology 19: 91-98.

33. Veron AS, Kaufmann K, Bornberg-Bauer E (2006) Evidence of interaction network evolution by whole-genome duplications: a case study in MADS-box proteins. Molecular Biology and Evolution 24: 670-678.

34. Canestro C, Albalat R, Irimia M, Garcia-Fernàndez J. Impact of gene gains, losses and duplication modes on the origin and diversification of vertebrates; 2013. Elsevier. pp. 83-94.

35. Kordiš D, Gubenšek F (2000) Adaptive evolution of animal toxin multigene families. Gene 261: 43-52.

36. Kondrashov FA (2012) Gene duplication as a mechanism of genomic adaptation to a changing environment. Proceedings of the Royal Society B: Biological Sciences 279: 5048-5057.

37. Hargreaves AD, Swain MT, Hegarty MJ, Logan DW, Mulley JF (2014) Restriction and recruitment - gene duplication and the origin and evolution of snake venom toxins. Genome Biology and Evolution 6: 2088-2095.

38. Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, et al. (2013) The king cobra genome reveals dynamic gene evolution and adaptation in the snake venom system. Proceedings of the National Academy of Sciences 110: 20651-20656.

39. Armoza-Zvuloni R, Schneider A, Sher D, Shaked Y (2016) Rapid hydrogen peroxide release from the coral Stylophora pistillata during feeding and in response to chemical and physical stimuli. Scientific Reports 6: 21000.

40. Ben-Ari H, Paz M, Sher D (2018) The chemical armament of reef-building corals: inter-and intra-specific variation and the identification of an unusual actinoporin in Stylophora pistilata. Scientific Reports 8: 251.

41. Fürthauer M, Thisse B, Thisse C (1999) Three different noggin genes antagonize the activity of bone morphogenetic proteins in the zebrafish embryo. Developmental Biology 214: 181-196.

42. Stuber CW, Goodman MM (1983) Inheritance, intracellular localization, and genetic variation of phosphoglucomutase isozymes in maize (Zea mays L.). Biochemical Genetics 21: 667-689.

43. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics 17: 333.

44. Hardwick SA, Deveson IW, Mercer TR (2017) Reference standards for next-generation sequencing. Nature Reviews Genetics 18: 473.

45. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151-1155.

46. Blanc G, Hokamp K, Wolfe KH (2003) A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Research 13: 137-144.

47. Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, et al. (2006) The gain and loss of genes during 600 million years of vertebrate evolution. Genome Biology 7: R43.

48. Zhang GQ, Liu KW, Li Z, Lohaus R, Hsiao YY, et al. (2017) The Apostasia genome and the evolution of orchids. Nature 549: 379-+.

49. Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433.

50. Chen ZJ, Birchler JA (2013) Polyploid and hybrid genomics: John Wiley & Sons.

51. Soltis PS, Soltis DE (2012) Polyploidy and genome evolution: Springer.

52. Vanneste K, Van de Peer Y, Maere S (2012) Inference of genome duplications from age distributions revisited. Molecular Biology and Evolution 30: 177-190.

53. Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, et al. (2014) The rainbow trout genome provides novel insights into evolution after whole-genome duplication in vertebrates. Nature Communications 5: 3657.

54. Marcet-Houben M, Gabaldón T (2015) Beyond the whole-genome duplication: phylogenetic evidence for an ancient interspecies hybridization in the baker's yeast lineage. PLoS Biology 13: e1002220.

55. Nakatani Y, Takeda H, Kohara Y, Morishita S (2007) Reconstruction of the vertebrate ancestral genome reveals dynamic genome reorganization in early vertebrates. Genome Research 17: 1254-1265.

56. Kassahn KS, Dang VT, Wilkins SJ, Perkins AC, Ragan MA (2009) Evolution of gene function and regulatory control after whole-genome duplication: comparative analyses in vertebrates. Genome Research 19: 1404-1418.

57. Wang W, Hui JHL, Chan TF, Chu KH (2014) De novo transcriptome sequencing of the snail Echinolittorina malaccana: identification of genes responsive to thermal stress and development of genetic markers for population studies. Marine Biotechnology 16: 547-559.

58. Fujikawa T, Munakata T, Kondo S, Satoh N, Wada S (2010) Stress response in the ascidian Ciona intestinalis: transcriptional profiling of genes for the heat shock protein 70 chaperone system under heat stress and endoplasmic reticulum stress. Cell Stress & Chaperones 15: 193-204.

59. Bertucci A, Foret S, Ball EE, Miller DJ (2015) Transcriptomic differences between day and night in Acropora millepora provide new insights into metabolite exchange and light-enhanced calcification in corals. Molecular Ecology 24: 4489-4504.

60. Reyes-Bermudez A, Villar-Briones A, Ramirez-Portilla C, Hidaka M, Mikheyev AS (2016) Developmental progression in the coral Acropora digitifera is controlled by differential expression of distinct regulatory gene networks. Genome Biology and Evolution 8: 851-870.

61. Smith JJ, Keinath MC (2015) The sea lamprey meiotic map improves resolution of ancient vertebrate genome duplications. Genome Research 25: 1081-1090.

62. Rabier CE, Ta T, Ane C (2014) Detecting and locating whole genome duplications on a phylogeny: a probabilistic approach. Molecular Biology and Evolution 31: 750-762.

63. Jackson ST, Sax DF (2010) Balancing biodiversity in a changing environment: extinction debt, immigration credit and species turnover. Trends in Ecology & Evolution 25: 153-160.

64. Yasuoka Y, Shinzato C, Satoh N (2016) The mesoderm-forming gene brachyury regulates ectoderm-endoderm demarcation in the coral Acropora digitifera. Current Biology 26: 2885-2892.

65. Zakon HH, Lu Y, Zwickl DJ, Hillis DM (2006) Sodium channel genes and the evolution of diversity in communication signals of electric fishes: convergent molecular evolution. Proceedings of the National Academy of Sciences 103: 3675-3680.

66. Arnegard ME, Zwickl DJ, Lu Y, Zakon HH (2010) Old gene duplication facilitates origin and diversification of an innovative communication system twice. Proceedings of the National Academy of Sciences 107: 22172-22177.

67. Berner D, Salzburger W (2015) The genomics of organismal diversification illuminated by adaptive radiations. Trends in Genetics 31: 491-499.

68. Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, et al. (2014) Genomics and the origin of species. Nature Reviews Genetics 15: 176-192.

69. Soltis PS, Soltis DE (2009) The role of hybridization in plant speciation. Annual Review of Plant Biology 60: 561-588.

70. Li L, Stoeckert CJ, Jr., Roos DS (2003) OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Research 13: 2178-2189.

71. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30: 3059-3066.

72. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, et al. (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10: e1003537.

73. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood. Molecular Biology and Evolution 24: 1586-1591.

74. R Core Team (2013) R: A language and environment for statistical computing.

75. Wang YP, Tang HB, DeBarry JD, Tan X, Li JP, et al. (2012) MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Research 40.

76. Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. Nucleic Acids Research 38: W7-13.

77. Lanfear R, Calcott B, Ho SY, Guindon S (2012) PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Molecular Biology and Evolution 29: 1695-1701.

78. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, et al. (2012) MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Systematic Biology 61: 539-542.

79. Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30: 1312-1313.

80. Barone P (2014) Kernel density estimation via diffusion and the complex exponentials approximation problem. Quarterly of Applied Mathematics 72: 291-310.

81. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 4: 44-57.

82. Zdobnov EM, Apweiler R (2001) InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848.

83. Krogh A, Larsson B, von Heijne G, Sonnhammer ELL (2001) Predicting transmembrane protein topology with a hidden Markov model: Application to complete genomes. Journal of Molecular Biology 305: 567-580.


Figures

Figure 1. Ancient WGD in the reef-building coral Acropora (IAsα). (a) Venn diagram of shared and unique gene families in six Acroporid species. (b) A calibrated phylogenomic tree of six Acroporid species inferred from 3,461 single-copy orthologs using BEAST2. Horizontal bars on branches of the tree represent the timing of WGD in Acropora. The timing of IAsα was estimated at 35.01 Mya (95% confidence interval: 31.18-35.71 Mya) by dS-based analysis (horizontal blue bar) and at 30.78 Mya (95% confidence interval: 27.86-34.77 Mya) by phylogenomic analysis (horizontal orange bar). Grey shading represents the timing of one coral species turnover event, the Oligocene-Miocene transition (OMT), suggesting that IAsα is correlated with the OMT.


Figure 2. Identification of the ancient WGD (IAsα) and timing of the event in Acropora. (a) Frequency distribution of dS values for paralogous gene pairs in five Acropora species and one Astreopora species. Similar peaks (dS value: 0.1-0.3) in the dS distributions of the five Acropora lineages indicate that a WGD event occurred in Acropora. (Light red: A. digitifera; light yellow: A. echinata; light green: A. gemmifera; light blue: A. subglabra; light purple: A. tenuis; light cyan: Astreopora sp1.) (b) Frequency distribution of log-transformed dS values for paralogous genes in Acropora and for orthologous genes, showing that a WGD event occurred in the most recent common ancestor of Acropora. Distributions are plotted with a bin size of 0.1. (c) Hypothetical tree topology of duplicated genes in the Acroporidae and the phylogeny of one duplicated gene (alpha-protein kinase 1-like). The phylogenetic tree shows gene retention, loss, and duplication following WGD. (d) Node age distribution of IAsα. Inferred node ages from 135 phylogenies were analyzed with the KDE toolbox to show the peak at 30.78 My, represented by the black solid line. The grey lines represent density estimates from 1000 bootstraps and the black dotted line represents the corresponding 95% confidence interval (27.86-34.77 My) from 100 bootstraps.
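The peak-and-interval estimate described for panel (d) can be illustrated with a minimal Python sketch. It is not the KDE toolbox used in the study: it assumes a Gaussian kernel with Silverman's rule-of-thumb bandwidth and a percentile bootstrap, and it uses synthetic node ages (`fake_ages`) in place of the 135 inferred values, so every name and number below is hypothetical.

```python
import math
import random


def silverman_bandwidth(xs):
    """Rule-of-thumb bandwidth for a Gaussian kernel (an assumption of this sketch)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return 1.06 * sd * n ** (-1 / 5)


def kde_peak(xs, grid_step=0.25):
    """Return the grid point at which the Gaussian KDE of xs is highest."""
    h = silverman_bandwidth(xs)
    lo, hi = min(xs), max(xs)
    steps = int((hi - lo) / grid_step) + 1
    best_x, best_density = lo, -1.0
    for i in range(steps):
        g = lo + i * grid_step
        # Unnormalised density; the constant factor does not change the argmax.
        density = sum(math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs)
        if density > best_density:
            best_x, best_density = g, density
    return best_x


def bootstrap_peak_interval(xs, n_boot=100, level=0.95, seed=0):
    """Percentile interval of the KDE peak over bootstrap resamples of xs."""
    rng = random.Random(seed)
    peaks = sorted(kde_peak([rng.choice(xs) for _ in xs]) for _ in range(n_boot))
    tail = (1 - level) / 2
    lower = peaks[int(tail * (n_boot - 1))]
    upper = peaks[int(round((1 - tail) * (n_boot - 1)))]
    return lower, upper


# Synthetic stand-in for the inferred node ages (hypothetical values, in My).
rng = random.Random(42)
fake_ages = [rng.gauss(30.0, 4.0) for _ in range(135)]

peak = kde_peak(fake_ages)
ci_low, ci_high = bootstrap_peak_interval(fake_ages)
print(f"peak = {peak:.2f} My, 95% bootstrap interval = ({ci_low:.2f}, {ci_high:.2f})")
```

A data-driven bandwidth selector, as in the KDE toolbox, may place the peak slightly differently from the rule-of-thumb bandwidth used here.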


[Figure 2 image. Panel (a): dS (x-axis, 0.00 to 1.00) against number of pairs. Panel (b): log2(dS) against number of pairs for Acropora paralogs, A. tenuis vs Astreopora sp1, and A. tenuis vs A. digitifera. Panel (c): gene tree with clades Acropora_P1 and Acropora_P2, the WGD node, and Astreopora sp1 as outgroup. Panel (d): node age against frequency and density.]


Figure 3. Phylogenetic trees show duplicated genes under subfunctionalization or neofunctionalization. (a) The phylogeny of orthogroup 1247 (dnaJ homolog subfamily B member 11-like), reconstructed with MrBayes, shows duplicated genes under subfunctionalization. Bayesian posterior probabilities are shown at each node. The bottom right panel shows that both domains are present in Astreopora sp1, but each domain was independently lost in the duplicated genes under subfunctionalization in orthogroup 1247. (b) The phylogeny of orthogroup 1244 (excitatory amino acid transporter 1-like), reconstructed with MrBayes, shows duplicated genes under neofunctionalization. Bayesian posterior probabilities are shown at each node. The prediction of six transmembrane helices is shown in the bottom right.

[Figure 3 image. Panel (a): gene tree of orthogroup 1247 with legend entries Duplication, Subfunctionalization, Rab3 GTPase-activating protein catalytic subunit, and DnaJ-class molecular chaperone with C-terminal Zn finger domain; outgroup Astreopora sp1_scaffold163g18268t1. Panel (b): gene tree of orthogroup 1244; outgroup Astreopora sp1_scaffold296g24376t1.]


Figure 4. Gene expression profiling reveals evolution of duplicated genes in A. digitifera. (a) Gene expression profiling across five developmental stages (blastula: PC, gastrula: G, postgastrula: S, planula: P, and adult polyps: A) in A. digitifera. Ohnologous gene pairs fall into two clusters by expression: HC, high correlation (P<0.05); NC, no correlation (P>=0.05; Pearson's correlation test). Pearson's correlation coefficients of the ohnologous gene pairs are presented in the right panel, and lines represent the average correlation coefficient in each cluster. (b) Significant functional enrichments in the two clusters of ohnologous gene pairs (P<0.05, Fisher's exact test) indicate that divergence of gene expression is associated with gene function. Bar colors represent fold-change values of the enrichments. (c) Boxplot of dN/dS values of ohnologous gene pairs shows no significant difference between the two clusters (P=0.51, Mann-Whitney test).
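The HC/NC split in panel (a) rests on a Pearson correlation test across five developmental stages. A minimal Python sketch of such a classification follows, assuming the usual two-sided t-test with n - 2 = 3 degrees of freedom (the closed-form CDF below holds only for 3 degrees of freedom). The expression profiles are made-up numbers and the function names are illustrative, not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_cdf_df3(t):
    """CDF of Student's t distribution with exactly 3 degrees of freedom."""
    u = t / math.sqrt(3.0)
    return 0.5 + (u / (1.0 + u * u) + math.atan(u)) / math.pi


def pearson_p_five_points(r):
    """Two-sided P value for r computed from exactly five paired observations."""
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt(3.0 / (1.0 - r * r))
    return 2.0 * (1.0 - t_cdf_df3(t))


def classify_pair(expr_a, expr_b, alpha=0.05):
    """Label an ohnologous pair HC (P < alpha) or NC (P >= alpha)."""
    r = pearson_r(expr_a, expr_b)
    p = pearson_p_five_points(r)
    return ("HC" if p < alpha else "NC"), r, p


# Hypothetical expression values at the five stages (PC, G, S, P, A).
gene_1 = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_2 = [2.1, 3.9, 6.2, 7.8, 10.1]   # tracks gene_1 closely
gene_3 = [5.0, 1.0, 4.0, 2.0, 3.0]    # no clear relationship to gene_1

print(classify_pair(gene_1, gene_2))
print(classify_pair(gene_1, gene_3))
```

With only five stages, a pair must reach |r| of roughly 0.88 before the test reports P<0.05, so the NC cluster can include pairs with moderately similar profiles.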


Figure 5. Diversification of toxic proteins via gene duplications in Cnidaria. Phylogenetic analysis of Coagulation factor X in 12 Cnidarian species shows widespread gene duplications. Gene duplications that occurred in individual species lineages are indicated with red arrows, and gene duplications by WGD in Acropora are indicated with blue arches. Outer color strips represent the 12 Cnidarian species and the black strip represents non-Cnidarian species. Bootstrap values greater than 50 are shown with black dots at nodes.

[Figure 5 image. Legend: Gene Duplication; Gene Duplication by WGD. Species strips: Non-Cnidaria, H. magnipapillata, N. vectensis, M. cavernosa, P. lobata, P. australiensis, P. astreoides, Astreopora sp1, A. tenuis, A. echinata, A. digitifera, A. subglabra, A. gemmifera.]


Table 1. Gene ontology clusters of 154 high-quality core-orthogroups.

Annotation cluster             P value
Transmembrane                  1.90E-06
Death domain                   3.10E-05
G-protein coupled receptor     1.20E-04
VIT domain                     3.30E-03
Protein kinase-like domain     1.90E-02


Supplementary information

Supplementary Figures

[Figure S1 image. Time-scaled tree with tips Lancelet, Hagfish, Lamprey, Sharks, Teleosts, and Tetrapods (Chordata); Ambulacraria; Lophotrochozoa; Ecdysozoa; Acoela; Acropora, Astreopora, and Nematostella (Cnidaria); Trichoplax (Placozoa); and Porifera. WGD labels: 1R, 2R, Ts3R, Ss4R, and IAsα. Time axis ticks: 700 to 100.]

Figure S1. WGD events in evolution of the animal kingdom. The backbone and divergence times of the tree are based on various sources (e.g., Satoh, 2016). The shaded grey oval represents the uncertain position of two rounds of WGD and colored triangles represent the corresponding divergent groups. Grey triangles represent WGD and the red star represents the invertebrate WGD specific to Acropora (IAsα) reported in this study.


[Figure S2 image: flowchart with two arms, "Test for WGD and the age of WGD" and "Test for evolutionary consequence of WGD". Recoverable box text:
Inputs: genomes and gene models; six Cnidarian gene models.
Tools named: OrthoMCL, BLAST, RAxML, PAML, MrBayes, BEAST2, MCScanX, WGDgc package in R, DAVID.
Orthogroups (883): clustering homologs of all species based on sequence similarity; each group contains at least one gene copy in each Acropora species, two gene copies in one Acropora species, and one Astreopora gene.
Core-orthogroups (205): the topology of each core-orthogroup diversified with two clades from Astreopora.
High-quality core-orthogroups (154): bootstrap values of each of the two clades are greater than 70.
Node age estimation as the age of WGD (135): ESS values of all parameters are greater than 200; the distribution of inferred node ages was estimated with the KDE toolbox.
Paralogous gene pairs: clustering paralogs for each species based on sequence similarity.
Anchor gene pairs: selecting paralogs based on gene position within each species.
Ohnologous gene pair identification (236).
Analyses named: phylogenomic analysis; synonymous substitution analysis (dS-based) with peak value estimation as the age of WGD; co-linearity and synteny identification; synteny analysis; gene count data phylogenomic modeling; gene data filtering; GO analysis; gene expression analysis; correlation analysis; dN/dS calculation; gene family phylogeny analysis; toxin protein analysis in Cnidaria.]

Figure S2. Flowchart of study design. The flowchart illustrates the methods used to test the main questions in this study.
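The orthogroup criterion named in the flowchart (at least one gene copy in each Acropora species, two gene copies in one Acropora species, and one Astreopora gene) can be written as a simple filter. The Python sketch below is one reading of that criterion: it assumes "two gene copies" means at least two copies in at least one Acropora species and "one Astreopora gene" means at least one. The dictionary layout and cluster names are hypothetical and do not reflect the output format of OrthoMCL.

```python
ACROPORA = ("A. digitifera", "A. echinata", "A. gemmifera", "A. subglabra", "A. tenuis")
OUTGROUP = "Astreopora sp1"


def is_candidate_orthogroup(copy_counts):
    """Check one homolog cluster; copy_counts maps species name -> gene copy number."""
    acropora_counts = [copy_counts.get(species, 0) for species in ACROPORA]
    every_species_present = all(count >= 1 for count in acropora_counts)
    some_species_duplicated = any(count >= 2 for count in acropora_counts)
    outgroup_present = copy_counts.get(OUTGROUP, 0) >= 1
    return every_species_present and some_species_duplicated and outgroup_present


# Hypothetical clusters used only to exercise the filter.
single_copy = {species: 1 for species in ACROPORA}
clusters = {
    "og_a": {**single_copy, "A. tenuis": 2, OUTGROUP: 1},       # passes
    "og_b": {**single_copy, OUTGROUP: 1},                        # no duplicate retained
    "og_c": {**single_copy, "A. echinata": 0, "A. tenuis": 2, OUTGROUP: 1},  # species missing
    "og_d": {**single_copy, "A. digitifera": 3},                 # no Astreopora gene
}

kept = [name for name, counts in clusters.items() if is_candidate_orthogroup(counts)]
print(kept)
```

The later steps in the flowchart (tree topology and bootstrap support) need gene trees and are not covered by this count-based filter.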


Figure S3. Phylogeny of the family Acroporidae. Time-calibrated phylogenetic tree reconstructed with BEAST2, based on fossil calibration and concatenated coding sequences (7,467,066 bp in total) from 3,461 single-copy orthologous genes. Branch lengths are scaled to estimated divergence time. Posterior 95% CIs of node ages are represented by blue horizontal bars; ML bootstrap values and Bayesian posterior probabilities are shown at each node.


[Figure S4 image: six stacked histograms, one each for Astreopora sp1, A. digitifera, A. echinata, A. gemmifera, A. subglabra, and A. tenuis. x-axis: dS (0.0 to 2.0); y-axis: numbers of pairs.]

Figure S4. Frequency distribution of dS values for paralogous gene pairs in five Acropora and one Astreopora species. The distributions of dS values of paralogs, which estimate the neutral evolutionary divergence since the two paralogs diverged, are plotted with a bin size of 0.02 and show similar peaks (dS value: 0.1-0.25) in Acropora (red arrows).
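Binning dS values at a width of 0.02 and reading off the fullest bin within a window is easy to sketch. The Python below uses made-up dS values and takes the 0.1 to 0.25 search window from the caption; it illustrates the binning only and makes no claim about how dS itself was computed in the study.

```python
from collections import Counter


def ds_histogram(ds_values, bin_width=0.02, ds_max=2.0):
    """Count gene pairs per dS bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter()
    for ds in ds_values:
        if 0.0 <= ds < ds_max:
            # Rounding first guards against floating-point quotients that land
            # just below a bin edge.
            counts[int(round(ds / bin_width, 9))] += 1
    return counts


def modal_bin(counts, bin_width=0.02, window=(0.1, 0.25)):
    """Return (bin_start, count) for the fullest bin whose start lies in the window."""
    lo, hi = window
    in_window = {k: c for k, c in counts.items() if lo <= round(k * bin_width, 9) < hi}
    if not in_window:
        return None
    k = max(in_window, key=in_window.get)
    return round(k * bin_width, 9), in_window[k]


# Hypothetical dS values for paralogous gene pairs in one species.
fake_ds = [0.03, 0.11, 0.15, 0.151, 0.158, 0.17, 0.21, 0.45, 0.9, 1.7]

histogram = ds_histogram(fake_ds)
peak_start, peak_count = modal_bin(histogram)
print(f"fullest bin in window starts at dS = {peak_start}, with {peak_count} pairs")
```

A modal bin is a coarse summary; a smoothed density estimate would give a peak location that depends less on where the bin edges fall.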


[Figure S5 image: six stacked histograms, one each for Astreopora sp1, A. digitifera, A. echinata, A. gemmifera, A. subglabra, and A. tenuis. x-axis: dS (0.0 to 2.0); y-axis: numbers of pairs.]

Figure S5. Frequency distribution of dS values for anchor-gene pairs in five Acropora and one Astreopora species. Distributions of dS values of anchor paralogs, which estimate the neutral evolutionary divergence time since the paralogs diverged, are plotted with a bin size of 0.02 and show similar peaks (dS value: 0.1-0.25, red arrows) in Acropora and extra peaks in A. digitifera and A. tenuis (dS value: 0.25-0.5, blue arrows).


Figure S6. Co-linearity and synteny between Astreopora sp1 and A. tenuis. Only co-linear segments with at least 8 anchor pairs are shown, for the 100 longest scaffolds.


[Figure S7 image: multiple sequence alignment of orthogroup 1247. Sequences and residue ranges as labeled in the figure: Astreopora|scaffold163.g18268.t1 (1-1533); A. digitifera|sc0000022.g82.t1 (1-847); A. echinata|sc0000126.g23664.t1 (1-1094); A. gemmifera|sc0000347.g8194.t1 (1-965); A. subglabra|sc0000147.g25825.t1 (1-990); A. tenuis|sc0000132.g178.t1 (1-1092); Adif|sc0000236.g229.t1 (1-357); Aech|sc0000126.g23662.t1 (1-377); Agem|sc0000263.g18942.t1 (1-357); Asub|sc0000360.g2229.t1 (1-347); Aten|sc0000132.g182.t1 (1-357).]

Figure S7. Alignment of orthogroup 1247 (dnaJ homolog subfamily B member 11-like) showing the independent loss of the domain in duplicates.


Figure S8. Alignment of orthogroup 1244 (excitatory amino acid transporter 1-like) showing mutations in transmembrane and exposed regions, suggesting that new functions may have arisen. Exposed regions are shown in yellow.


Figure S9. Phylogeny of the toxic protein Ryncolin-4. The phylogeny was reconstructed using ExaML and bootstrap values are shown at each node. Gene duplications caused by WGD in Acropora are shown in cyan.


Figure S10. Phylogeny of the toxic Astacin-like metalloprotease. The phylogeny was reconstructed using ExaML and bootstrap values are shown at each node. Gene duplication caused by WGD in Acropora is shown in cyan.


Figure S11. Phylogeny of a toxic protein (putative lysosomal acid lipase/cholesteryl ester hydrolase). The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplications caused by WGD in Acropora are shown with cyan shading.


Figure S12. Phylogeny of the toxic putative endothelial lipase. The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplication by WGD in Acropora is shown in cyan.


Figure S13. Phylogeny of the toxin protein (DELTA-thalatoxin-Avl2a/DELTA-alicitoxin-Pse2a). The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplication by WGD in Acropora is shown in cyan.


Figure S14. Phylogeny of the toxic protein (DELTA-actitoxin-Aas1a). The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplication by WGD in Acropora is shown in cyan.


Figure S15. Phylogeny of the toxic protein (Phospholipase-B 81). The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplication by WGD in Acropora is shown in cyan.


Figure S16. Phylogeny of the toxic snake venom 5'-nucleotidase. The phylogeny was reconstructed using RAxML and bootstrap values are shown at each node. Gene duplication by WGD in Acropora is shown in cyan.


[Figure S17 image: gene tree of orthogroup 434 with tips from A. digitifera, A. echinata, A. gemmifera, A. subglabra, and A. tenuis, posterior probabilities at nodes, Astreopora sp1_scaffold42g6297t1 as outgroup, and a scale bar of 0.2.]

929 Figure S17. Phylogeny of orthogroup 434 (somatostatin receptor type 5-like)

930 shows duplicates under a two-WGD topology. The phylogeny was reconstructed

931 using MrBayes, and Bayesian posterior probabilities are shown at each node.

932

933

934


935 Supplementary Tables

936 Table S1. Numbers of gene pairs in the paralogous gene pairs dataset and anchor

937 gene pairs datasets.

| Species | Paralogous gene pairs (<=20 gene families) | | Anchor gene pairs (<=20 gene families) | |
|---|---|---|---|---|
| | Total numbers | | Total numbers | (0 |
| A. digitifera | 39827 | 8249 | 46559 | 1958 |
| A. echinata | 47299 | 10948 | 54956 | 2530 |
| A. gemmifera | 48051 | 11093 | 56972 | 3299 |
| A. subglabra | 50852 | 12077 | 44093 | 2380 |
| A. tenuis | 34097 | 6635 | 28073 | 1488 |
| Astreopora sp1 | 49135 | 13648 | 52481 | 3033 |

938

939 Table S2. Peak value estimation of dS distribution by KDE toolbox.

| | Astreopora sp1_A. tenuis | A. tenuis_A. digitifera | Acropora_WGD |
|---|---|---|---|
| Peak age (Mya) | 53.6 | 14.69 | 35.01458704 |
| log2(dS_paralog_peak) | -0.314 | -3.4596 | -1.8165 |
| 95%_HDP_log2(dS_paralog_peak) | (-0.22031,-0.33195) | (-3.4008,-3.5141) | (-1.7606,-2.1261) |
| 95%_HDP_Age | NA | NA | (31.18,35.7) |
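The Acropora_WGD age in Table S2 appears consistent with a linear interpolation of age against log2(dS) between the two speciation peaks. That relationship is inferred here from the tabulated numbers and is an assumption, not a method stated in this section. The sketch below, with a hypothetical helper name `age_from_log2_ds`, shows the arithmetic under that assumption.

```python
# Values copied from Table S2. The linear-in-log2(dS) rule is an assumption
# inferred from the numbers, not a documented step of the analysis.
CALIBRATION = [
    (-3.4596, 14.69),  # A. tenuis_A. digitifera: (log2 dS peak, peak age)
    (-0.314, 53.6),    # Astreopora sp1_A. tenuis: (log2 dS peak, peak age)
]


def age_from_log2_ds(log2_ds, calibration=CALIBRATION):
    """Interpolate an age from a log2(dS) peak (hypothetical helper)."""
    (x0, y0), (x1, y1) = calibration
    slope = (y1 - y0) / (x1 - x0)  # age units per unit of log2(dS)
    return y0 + slope * (log2_ds - x0)


# Peak and interval bounds of the Acropora_WGD column.
wgd_age = age_from_log2_ds(-1.8165)
wgd_interval = (age_from_log2_ds(-2.1261), age_from_log2_ds(-1.7606))
print(round(wgd_age, 2), [round(a, 2) for a in wgd_interval])
```

Under this assumption the interpolated peak agrees with the tabulated 35.01458704 to within rounding, and the interval endpoints fall close to the tabulated (31.18,35.7).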

940

941 Table S3. Numbers of gene families in orthogroups, core-orthogroups, and

942 high-quality core-orthogroups.

| | Numbers |
|---|---|
| Orthogroups | 883 |
| Core-orthogroups | 205 |
| High-quality core-orthogroups | 154 |

943


944 Table S4. Numbers of putative toxin proteins in 12 cnidarian species

| Gene family | Query_name | A. digitifera | A. echinata | A. gemmifera | A. subglabra | A. tenuis | Astreopora sp1 | H. magnipapillata | M. cavernosa | N. vectensis | P. astreoides | P. australiensis | P. lobata |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gene family_1 | Coagulation factor X | 52 | 48 | 52 | 58 | 50 | 49 | 12 | 39 | 56 | 7 | 46 | 34 |
| Gene family_2 | Ryncolin-4 | 48 | 45 | 37 | 62 | 38 | 29 | 0 | 23 | 46 | 5 | 24 | 29 |
| Gene family_3 | Astacin-like metalloprotease toxin | 28 | 23 | 25 | 26 | 30 | 33 | 36 | 20 | 60 | 6 | 31 | 14 |
| Gene family_4 | Reticulocalbin | 18 | 15 | 14 | 14 | 18 | 20 | 5 | 16 | 18 | 6 | 19 | 17 |
| Gene family_5 | Putative lysosomal acid lipase/cholesteryl ester hydrolase | 10 | 10 | 10 | 11 | 7 | 13 | 4 | 6 | 5 | 4 | 5 | 5 |
| Gene family_6 | Venom carboxylesterase-6 | 8 | 6 | 9 | 7 | 7 | 17 | 1 | 5 | 14 | 3 | 8 | 4 |
| Gene family_7 | Putative endothelial lipase | 11 | 1 | 1 | 17 | 11 | 25 | 2 | 2 | 5 | 1 | 4 | 5 |
| Gene family_8 | DELTA-thalatoxin-Avl2a/DELTA-alicitoxin-Pse2a | 9 | 5 | 7 | 12 | 11 | 13 | 0 | 2 | 5 | 2 | 2 | 0 |
| Gene family_9 | Venom phosphodiesterase 2 | 5 | 6 | 6 | 6 | 5 | 7 | 1 | 7 | 9 | 3 | 5 | 5 |
| Gene family_10 | DELTA-actitoxin-Aas1a | 3 | 4 | 5 | 5 | 4 | 3 | 0 | 1 | 0 | 1 | 5 | 4 |
| Gene family_11 | | 7 | 5 | 2 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 2 | 1 |
| Gene family_12 | Phospholipase-B 81 | 4 | 2 | 2 | 2 | 3 | 4 | 0 | 3 | 3 | 3 | 1 | 1 |
| Gene family_13 | Venom dipeptidyl peptidase 4 | 5 | 2 | 1 | 2 | 2 | 3 | 2 | 4 | 3 | 0 | 0 | 1 |
| Gene family_14 | Hyaluronidase-1 | 3 | 2 | 2 | 2 | 2 | 2 | 1 | 2 | 2 | 1 | 2 | 1 |
| Gene family_15 | Snake venom 5'-nucleotidase | 1 | 1 | 2 | 4 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 0 |

945

946 Table S5. Likelihoods of multiple-WGD hypotheses in Acropora using the WGDgc

947 method with gene count data.

| WGD event(s) | Likelihood | Likelihood Ratio Test | P_value |
|---|---|---|---|
| 0 | -38731.86 | 0 VS 1 | 7.66E-05 |
| 1 | -38724.04 | 1 VS 2 | 0.01248965 |
| 2 | -38720.92 | 2 VS 3 | 0.05990546 |
| 3 | -38719.15 | | |
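The P_values in Table S5 appear consistent with a likelihood ratio test in which twice the difference between successive likelihoods is compared against a chi-square distribution with one degree of freedom. The sketch below illustrates that calculation. It assumes the tabulated likelihoods are log-likelihoods and that one degree of freedom applies per added WGD event; neither is stated in this section, and `lrt_p_value` is a hypothetical helper name.

```python
import math

# Likelihoods copied from Table S5, assumed here to be log-likelihoods.
LOG_LIK = {0: -38731.86, 1: -38724.04, 2: -38720.92, 3: -38719.15}


def lrt_p_value(loglik_null, loglik_alt):
    """Likelihood ratio test p-value, assuming chi-square with 1 df."""
    stat = 2.0 * (loglik_alt - loglik_null)
    # Survival function of a chi-square variable with 1 df:
    # P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Compare k WGD events (null) against k + 1 events (alternative).
p_values = {(k, k + 1): lrt_p_value(LOG_LIK[k], LOG_LIK[k + 1]) for k in range(3)}
for (null, alt), p in p_values.items():
    print(f"{null} VS {alt}: p = {p:.4g}")
```

Because the tabulated likelihoods are rounded to two decimals, the recomputed values agree with the table only approximately.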

948

949 Table S6. Likelihoods of different WGD times under a single WGD event in Acropora,

950 estimated using WGDgc.

| Time of WGD | Likelihood |
|---|---|
| 18.697005 | -38724.14 |
| 22.697005 | -38724.09 |
| 26.697005 | -38724.06 |
| 30.697005 | -38724.04 |
| 34.697005 | -38724.04 |
| 38.697005 | -38724.06 |
| 42.697005 | -38724.11 |
| 46.697005 | -38724.2 |
| 50.697005 | -38724.35 |
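As a reading aid for Table S6, the following minimal sketch picks out the grid times with the highest likelihood and reports how far the likelihood varies across the grid. It only re-uses the tabulated values; the variable names are illustrative and not part of WGDgc.

```python
# (time of WGD, likelihood) pairs copied from Table S6.
GRID = [
    (18.697005, -38724.14),
    (22.697005, -38724.09),
    (26.697005, -38724.06),
    (30.697005, -38724.04),
    (34.697005, -38724.04),
    (38.697005, -38724.06),
    (42.697005, -38724.11),
    (46.697005, -38724.2),
    (50.697005, -38724.35),
]

# Highest likelihood on the grid and every time point that attains it.
best = max(lik for _, lik in GRID)
best_times = [t for t, lik in GRID if lik == best]

# Difference between the best and worst likelihood on the grid.
spread = best - min(lik for _, lik in GRID)
print(best_times, round(spread, 2))
```

Two adjacent grid points share the maximum, and the likelihood changes by well under one unit across the whole grid, so at the reported precision the surface is fairly flat around its optimum.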

951

952

953
