bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1 Title:
2 An ancient genome duplication in the speciose reef-building coral genus, Acropora
3
4 Authors: Yafei Mao1*, Chuya Shinzato1†, Noriyuki Satoh1
5 Affiliations:
6 1Marine Genomics Unit, Okinawa Institute of Science and Technology Graduate
7 University, Onna, Okinawa 904-0495, Japan
8
9 *Correspondence to: Yafei Mao: [email protected]
10 †Present address: Atmosphere and Ocean Research Institute, The University of
11 Tokyo, Kashiwa, Chiba 277-8564, Japan
12
13 Yafei Mao: [email protected]
14 Chuya Shinzato: [email protected]
15 Noriyuki Satoh: [email protected]
1 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
16 Abstract:
17 Whole-genome duplication (WGD) has been recognized as a significant evolutionary
18 force in the origin and diversification of vertebrates, plants, and other organisms.
19 Acropora, one of the most speciose reef-building coral genera, responsible for
20 creating spectacular but increasingly threatened marine ecosystems, is suspected to
21 have originated by polyploidy, yet there is no genetic evidence to support this
22 hypothesis. Using comprehensive phylogenomic and comparative genomic
23 approaches, we analyzed five Acropora genomes and an Astreopora genome
24 (Scleractinia: Acroporidae) to show that a WGD event likely occurred between 27.9
25 and 35.7 Million years ago (Mya) in the most recent common ancestor of Acropora,
26 concurrent with a massive worldwide coral extinction. We found that duplicated
27 genes became highly enriched in gene regulation functions, some of which are
28 involved in stress responses. The different functional clusters of duplicated genes are
29 related to the divergence of gene expression patterns during development. Some gene
30 duplications of proteinaceous toxins were generated by WGD in Acropora compared
31 with other Cnidarian species. Collectively, this study provides evidence for an ancient
32 WGD event in corals and it helps to explain the origin and diversification of Acropora.
2 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
33 Introduction
34 Reef-building corals contribute to tropical marine ecosystems that support
35 innumerable marine organisms, but reefs are increasingly threatened due to recent
36 increases in seawater temperatures, pollution, and other stressors [1,2]. The
37 Acroporidae is a family of reef-building corals in the phylum Cnidaria, one of the
38 basal phyla of the animal clade [3-5]. Astreopora (Anthozoa: Acroporidae) is the
39 sister genus in the acroporid lineage according to fossil records and molecular
40 phylogenetic evidence [3,6,7]. Importantly, Acropora (Anthozoa: Acroporidae), one
41 of the most diverse genera of reef-building corals, including more than 150 species in
42 Indo-Pacific Ocean, is thought to have originated from Astreopora almost 60 Mya
43 with several species turnovers [3,5,8,9]. Investigating the evolutionary history of this
44 group importantly contributes to our understanding of coral reef biodiversity and
45 conservation. Hybridization among Acropora species has been observed in the wild
46 [10] and variable chromosome numbers have been determined in different Acropora
47 lineages [11]. Additionally, gene duplications have been shown in several Acropora
48 gene families [12,13]. Thus, based on their unique lifestyle, variable chromosome
49 numbers, and complicated reticular evolutionary history, Indo-Pacific Acropora likely
50 originated via polyploidy [10-16]. However, there is no direct molecular and genetic
51 evidence to support this hypothesis.
52
53 Ancient whole (large-scale)-genome duplication (WGD), or paleopolyploidy, has
54 shaped in the genomes of vertebrates, green plants, and other organisms, and is
55 usually regarded as an evolutionary landmark in the origins and diversification of
56 organisms [17-19] (Supplementary Fig. 1). Two separate WGD events have been
57 documented in the common ancestors of vertebrates (two-rounds of WGD) [20] and
3 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
58 another major WGD has been reported in the last common ancestor of teleost fish
59 [21,22]. Meanwhile, living angiosperms share an ancient WGD event [23,24], and
60 many other WGD events have been reported in major clades of angiosperms [25,26].
61 In addition, two-rounds of WGDs in the vertebrates are suggested to have occurred
62 during the Cambrian Period, and some WGDs in plants are believed to have occurred
63 during Cretaceous-Tertiary [17,25,27]. Thus, WGD is regarded as an important
64 evolutionary way to reduce the risk of extinction [17,18,25]. However, the study of
65 WGD in Cnidaria has received less attention [17,18,28-30].
66
67 Duplicated genes created by WGD have complex fates following the time of
68 diploidization [18,31]. Usually, one of the duplicated genes is silenced or lost due to
69 redundancy of gene functions, termed “nonfunctionalization”. However, retained
70 duplicated genes provide important sources of biological complexity and evolutionary
71 novelty due to subfunctionalization, neofunctionalization, and dosage effects [22,32].
72 Duplicated genes may develop complementary gene functions via
73 subfunctionalization, or evolve new functions through neofunctionalization, or are
74 retained in complicated regulatory networks with different gene expressions due to
75 dosage effects. For instance, duplicated MADS-Box genes are crucial for flower
76 development and the origin of phenotypic novelty in plants [18,33]. Duplicated
77 homeobox genes provide raw genetic material for vertebrate development [22,34]. In
78 addition, toxin diversification following by gene duplications has been recognized as
79 a mechanism to enhance adaptation in animals [35,36], especially in snake venoms
80 [37,38]. Meanwhile, toxic proteins are involved in various important processes in
81 corals, including prey capture, protection from predators, wound-healing, etc. [39,40],
82 but it is still unclear how gene duplications of toxic proteins evolved in corals.
4 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
83
84 Isozyme electrophoresis and restriction fragment length polymorphism (RFLP) were
85 used to identify gene duplications in polyploids a few decades ago [41,42]. In the past
86 ten years, next generation sequencing (NGS) has generated a wealth of genomic data
87 at vastly decreased cost and reduced effort [43,44]. Then, three main methods were
88 developed to identify WGD, based on: 1). analysis of the rate of synonymous
89 substitutions per synonymous site (dS) of duplicated genes within a genome (dS-
90 based method) [25,45,46]; 2). phylogenetic analysis of gene families among multiple
91 genomes (Phylogenomic analysis) [23,47]; and 3). synteny block identification
92 compared with sister lineages without WGD (Synteny analysis) [20,48,49],
93 respectively. The dS-based method and phylogenomic analysis only require gene
94 family information, without genome assembly. However, the dS-based method cannot
95 detect ancient WGD, and gene tree uncertainty usually causes bias in the
96 phylogenomic analysis. Both methods rely heavily on gene family estimation and
97 clustering. Inaccurate gene predictions (gene models) and rough gene family cluster
98 algorithms can easily fail to detect WGD using either method. In contrast, the synteny
99 analysis relies heavily on the genome assembly quality. Poor assembly quality can
100 hide the WGD signals, and some genomes with huge rearrangements cannot be used
101 to detect WGD using synteny block identification. Thus, the most credible
102 conclusions depend on complementary evidence from different methods [24,50,51].
103
104 Here, we analyzed a genome of Astreopora (Astreopora sp1) as an outgroup, as well
105 as five Acropora genomes (A. digitifera, A. gemmifera, A. subgrabla, A. echinata and
106 A. tenuis) to address the following questions using all three methods; (I) whether and
107 when WGD occurred in Acropora, (II) what is the fate of duplicated genes in
5 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
108 Acropora after the event, (III) what the gene expression patterns of duplicated genes
109 cross five developmental stages in A. ditigifera, and (IV) what roles of WGD were
110 involved in the diversification of toxic proteins in Acropora (Supplementary Fig. 2).
111
6 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
112 Results
113 Cluster of gene families and calibration of the acroporid phylogenomic tree
114 We clustered all homologs among the six Acroporid species into 19,760 gene families,
115 and they shared 6,520 gene families (Fig 1a, See Additional file 1). Each Acropora
116 genome had very few unique gene families, but Astreopora sp1 had 218, suggesting
117 that Astreopora sp1 is genetically divergent from the five Acropora species. 3,461
118 single-copy orthologs were selected from 6,520 shared gene families. These were
119 concatenated to reconstruct a calibrated phylogenomic tree based on the reported
120 divergence time of Acropora (Mao et al., in revision). We found that Astreopora sp1
121 split from Acropora ~ 53.6 Mya (95% highest posterior density (HPD): 51.02 - 56.21
122 My) (Fig 1b; Supplementary Fig 3). This established a timescale to analyze the timing
123 of the subsequent WGD.
124
125 WGD identification with the dS-based method
126 Synonymous substitution rate (dS) analysis has been widely used to infer WGD
127 [25,52]. We identified over 10,000 paralogous gene pairs, based on their sequence
128 similarities as well as we identified over 10,000 anchor gene pairs, based on synteny
129 information from each species (Supplementary Table 1; for details see Methods).
130 Then we calculated dS values from paralogous gene pairs and anchor gene pairs for
131 each species.
132
133 An ‘L-shaped’ distribution was evident in both paralogous and anchor gene pair dS
134 distributions of Astreaopora sp1, illustrating that no WGD occurred in Astreaopora
135 sp1. However, all five Acropora species displayed a similar peak in dS distributions
7 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
136 of both paralogous and anchor gene pairs (peak: 0~0.3), suggesting that WGD did
137 occur in Acropora (Fig 2a, Supplementary Figs 4, 5).
138
139 dS values of orthologous gene pairs between two pairs of species (Astreopora sp1 and
140 A. tenuis; A. tenuis and A. digitifera) were estimated as the speciation time between
141 them according to neutral evolution theory [48,53]. We combined the dS values of
142 paralogous gene pairs for the five Acropora species and estimated the peak in the log
143 dS distribution (modal value = -1.82). Also, we estimated the distribution of
144 orthologous gene pairs between Astreopora sp1 and A. tenuis (modal value = -0.31)
145 and the distribution of orthologous gene pairs between A. tenuis and A. digitifera
146 (modal value = -3.46). The result indicates that the WGD occurred in Acropora after
147 the split of Astreopora sp1 and A. tenuis (Fig 2b). In other words, an ancient WGD
148 event likely occurred in the most recent common ancestor of Acropora. Based on
149 speciation time estimated in the calibrated phylogenomic tree and assuming a constant
150 dS rate [25], we estimated that the WGD of Acropora occurred ~ 35.01 Mya (95%
151 confidence interval: 31.18 - 35.7 My) (Supplementary Table 2, for details see
152 Methods). Here, we defined this event as invertebrate α event of WGD specifically in
153 Acropora (IAsα).
154
155 Phylogenomic and synteny analysis of IAsα
156 If the proposed IAsα is correct, then the ohnologs of Acropora (paralogs created by
157 IAsα) should form two clades from their orthologs in Astreopora sp1 by mapping
158 IAsα onto phylogenetic trees [23,54]. In other words, the phylogenetic topology
159 would be (((Acropora clade1) bootstrap1, (Acropora clade2) bootstrap2), Astreopora
160 sp1), defined as gene duplication topology (Fig 2c).
8 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
161
162 We performed comprehensive phylogenomic analysis to confirm IAsα. First, we
163 defined orthogroups as clusters of homologous genes in Acropora derived from a
164 single gene in Astreopora sp1. Each orthogroup contained at least seven homologous
165 genes, including at least one gene copy in each Acropora species and one gene copy
166 in Astreopora sp1. We selected 883 orthogroups from 19,760 gene families, and
167 reconstructed the phylogeny of 883 orthogroups using both Maximum likelihood (ML)
168 and Bayesian methods. We found that the phylogeny of 205 orthogroups was
169 consistent with gene duplication topology supporting IAsα. We further defined the
170 205 orthogroups as core-orthogroups (Supplementary Table 3, See Additional file 1).
171
172 In particular, we found differential gene loss, retention, and duplication in Acropora
173 lineages. For instance, the phylogeny of orthogroup 1370 (alpha-protein kinase 1-like)
174 showed gene retention in A. subgrabla, A. digitifera, and A. echinata, gene loss in A.
175 tenuis, and an extra gene duplication in A. subgrabla. This implies that diversification
176 of duplicated genes may contribute to species complexity and evolutionary innovation
177 in Acropora [22] (Fig 2c).
178
179 In order to estimate the split time of the two Acropora clades that could be regarded
180 as the timing of IAsα, we selected 154 high-quality core-orthogroups, with both
181 bootstrap values in both Acropora clades > 70 in ML phylogeny, to reconstruct a
182 time-calibrated phylogeny from the 205 core-orthogroups using BEAST2 [23].
183 However, we found that it is difficult for the parameters in MCMC to converge in 70
184 core-orthogroups, and we successfully dated the phylogenetic trees of only 135 high-
185 quality core-orthogroups (See Additional file 1). Next, we estimated the distribution
9 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
186 of inferred node ages between the two Acropora clades and the peak value was
187 estimated as 30.78 My (95% confidence interval: 27.86-34.77 My), indicating that
188 IAsα occurred at 30.78 My (Fig 2d). This result strongly supports the timing of the
189 IAsα estimated using the dS-based method.
190
191 Intergenomic co-linearity is often used to directly identify ancient WGD and to
192 reconstruct ancestral karyotypes in vertebrates [48,53,55]. We performed
193 intergenomic co-linearity and synteny analysis between Astreopora sp1 and A. tenuis
194 to support IAsα. First, we found great co-linearity between Astreopora sp1 and A.
195 tenuis. Second, more duplicated scaffold segments were observed in A. tenuis than in
196 Astreopora sp1 (Supplementary Fig 6) and we also found that some duplicated
197 regions were located in different scaffolds in A. tenuis (Supplementary Fig 6).
198
199 In summary, we clearly established IAsα using the dS-based method, phylogenomic
200 and synteny analyses. Moreover, we suggest that IAsα probably occurred between
201 27.86 and 35.7 Mya (Fig 1b).
202
203 The fate of duplicated genes originating from IAsα
204 Duplicated genes provide substrates for diversification and evolutionary novelty, and
205 most of them are regulators of complex gene networks in vertebrates and plants
206 [23,48,56]. We examined gene ontology (GO) for all genes among the 154 high-
207 quality core-orthogroups to investigate their roles in IAsα and found that their
208 molecular functions have been enriched in specific categories; transporter, catalytic,
209 binding, and receptor activity, most of which are involved in gene regulation (Table
210 1).
10 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
211
212 Further, we identified some duplicated genes under subfunctionalization and
213 neofunctionalization, possibly contributing to stress responses of corals. dnaJ
214 homolog subfamily B member 11-like (DNAJB) protein was shown to be involved in
215 heat stress responses in marine organisms [57,58]. Orthogroups 1247 (DNAJB) has
216 two main domains (Ras and Dnaj domains) in Astreopora sp1 representing the ancient
217 state. Each of the two domains was independently lost in the duplicated genes,
218 resulting in complementary functions of the duplicated genes after IAsα (Fig 3a and
219 Supplementary Figs 7). Excitatory amino acid transporters may be related to
220 symbiotic interactions in Acropora [59]. Orthogroups 1244 (excitatory amino acid
221 transporter 1-like) was predicted as a six transmembrane protein, and a high number
222 of mutations have accumulated in both untransmembrane and transmembrane regions,
223 suggesting that new functions would be generated (Fig 3b and Supplementary Figs 8).
224 These examples suggest that IAsα participates in both stress responses and symbiotic
225 interactions in Acropora. Moreover, these results accord with previous patterns of the
226 fate of duplicated genes in vertebrates and plants [17,19,23,48], indicating that the
227 IAsα possibly contributes to the species complexity and diversification in Acropora.
228
229 Gene expression patterns of duplicated genes across five developmental stages in A.
230 digitifera
231
232 To better to understand evolution of duplicated genes, gene expression analysis across
233 five developmental stages in A. digitifera (blastula, gastrula, postgastrula, planula, and
234 adult polyps) was carried out based on previous transcriptome data [60]. We
235 identified 236 ohnologous pairs in A. digitifera from 883 ML phylogeny (for details
11 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
236 see Methods) and found that these ohnologous pairs present an interesting gene
237 expression profiling. We divided 236 ohnologous pairs into two clusters based on the
238 pairwise correlation of gene expression during development (high correlation or HC:
239 P<0.05; no correlation or NC: P>=0.05; Pearson’s correlation test); 25% (25/236)
240 ohnologous pairs in HC and 75% (211/236) ohnologous pairs in NC (Fig 4a).
241 Ohnologous pairs in the HC cluster are enriched in protein kinase, while ohnologous
242 pairs in the NC cluster are enriched in membrane transporter and ion binding proteins
243 (Fig 4b). This result indicates that the two clusters of ohnologous pairs potentially
244 evolved into different gene functions. Additionally, we compared dN/dS values in
245 order to investigate selective pressure between HC and NC clusters (Fig 4c), but there
246 is no significant difference between the two clusters (Mann-Whitney-Wilcoxon Test,
247 P = 0.51).
248
249 Evolution of toxic proteins in Cnidaria
250 Next, we investigated the role of IAsα in the diversification of toxins in Acropora. We
251 identified about two hundred putative toxic proteins in each of the five Acropora
252 species, and then we clustered them with putative toxic proteins of Astreopora sp1
253 and other six Cnidarian species (Hydra magnapapillata, Nematostella vectensis,
254 Montastraea cavernosa, Porites australiensis, Porites astreoides, and Porites lobata)
255 into 24 gene families (Supplementary Table 4, for details see Methods). Based on
256 reconstruction of the gene family phylogeny, which contains at least 15 genes (See
257 Additional file 1), we found that toxic proteins have undergone widespread gene
258 duplications in Cnidaria, and most of gene duplications occurred in individual species
259 lineages, except for Acropora (Fig 5 and Supplementary Figs 9-16). Interestingly,
260 gene duplications occurred in the most recent common ancestor of Acropora in 9 of
12 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
261 15 gene families, potentially caused by WGD (IAsα). For example, in gene family-1
262 (Coagulation factor X), each species contains around 50 genes, except H.
263 magnapapillata, and gene duplications occurred frequently in individual species
264 lineages: Astreopora sp1, M. cavernosa, N. vectensis, and P. australiensis. However,
265 five gene duplications occurred in the most recent common ancestor of Acropora by
266 WGD (Fig 5). These results indicated that IAsα contributed to the diversification of
267 proteinaceous toxins in Acropora.
268
269 Discussion
270 Ancient WGD is considered as a significant evolutionary factor in the origin and
271 diversification of evolutionary lineages [17,18,28,54], but much work remains to
272 definitively identify WGD and to understand its consequences in different
273 evolutionary lineages. Staghorn corals of the genus Acropora, which constitute the
274 foundation of modern coral reef ecosystems, are hypothesized to have originated
275 through polyploidization [2,11,15]. However, there is no genetic evidence to support
276 this assertion. To that end, we analyzed genomes of one Astreopora and five
277 Acropora species to address the possibility of WGD in Acropora and the functional
278 fate of duplicated genes from that event.
279
280 To the best of our knowledge, this is the first study to report genomic-scale evidence
281 of WGD in corals (IAsα). We find that large numbers of ohnologs are retained in
282 Acropora species and hundreds of gene families display phylogenetic duplication
283 topology among the five Acropora species, meanwhile, our synteny analysis between
284 Astreopora. sp1 and A. tenuis directly supports IAsα. However, reconstruction of the
285 ancestral karyotype will necessitate genomes assembled to the chromosome level to
13 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
286 fully understanding gene fractionation and chromosome arrangements in Acropora
287 under IAsα [27,61].
288
289 Ancient WGD is usually inferred using the dS-based method, but artificial signals in
290 dS distributions have been reported in previous studies, because of dS saturation (dS
291 value > 1) or because of using poorly annotated genomes [24,52,62]. There is an extra
292 peak in the dS distribution of anchor gene pairs in A. digitifera and A. tenuis. One
293 possible explanation is that the extra peak is artifactitious because few anchor gene
294 pairs were used in the analysis. However, this could also indicate a second WGD
295 event in Acropora. We found few orthogroups with topologies that fit the two
296 proposed WGDs events (Supplementary Fig 17). If a second WGD event occurred,
297 the reason that the second WGD signal appeared among anchor gene pairs rather than
298 among paralogous gene pairs may be that the paralogs generated by the second WGD
299 have been largely lost; thus, few of them are only retained in conserved order.
300 Fortunately, a new maximum likelihood phylogeny modeling approach was recently
301 developed to overcome difficulties of the dS-based method [24,62]. We used it to test
302 whether a second WGD occurred in Acropora. The result showed that one WGD
303 event is the best model in Acropora and it occurred 30.69 to 34.69 Mya
304 (Supplementary Table 5 and Supplementary Table 6; for details see Methods). Thus,
305 we have supportive genome-scale evidence to support IAsα, but as yet, there is no
306 conclusive evidence to support a second WGD in Acropora.
307
308 It is crucial to accurately estimate the timing of a WGD event to understand its
309 evolutionary consequences [23,25]. Our study has clearly estimated the timing of
310 IAsα using both phylogenomic analysis and the dS-based method. We suggest that
14 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
311 IAsα probably occurred between 27.86 and 35.7 Mya (Fig 1b). Interestingly, species
312 turnover events usually occurred with extinctions [63], and one species turnover event
313 in corals (Oligocene-Miocene transition: OMT) was suggested to have occurred from
314 15.97 to 33.7 Mya [8]. The timing of IAsα may correspond to a massive extinction of
315 corals created by OMT. This finding supports the hypothesis that WGD may enable
316 organisms to escape extinction during drastic environmental changes [17] (Fig 1b).
317
318 The occurrence of IAsα raises the question of what impact it may have had upon coral
319 evolution [15,32]. We performed GO analysis on duplicated genes and examined
320 several duplicated gene families, showing that duplicated genes following IAsα
321 indeed provided raw genetic material for Acropora to diversify and are potentially
322 crucial for stress responses. In particular, toxin diversification in Acropora was
323 mainly generated by WGD. In addition, we focused on expression patterns of
324 duplicated genes in A. digitifera, showing that expressions of duplicated protein
325 kinases are likely to be correlated during development. A possible explanation may be
326 that protein kinases are probably retained in complex signal transduction pathways via
327 subfunctionalization or dosage effects [22,32]. However, expressions of duplicated
328 membrane proteins are likely uncorrelated probably because these proteins may have
329 developed different functions via neofunctionalization, such as excitatory amino acid
330 transporters (orthogroups 1244). However, there is still much work needed to
331 investigate molecular mechanisms of duplicated genes to examine these hypotheses in
332 the diversification of Acropora [64]. For instance, previous gene functional studies
333 have demonstrated that voltage-gated sodium channel gene paralogs, duplicated in
334 teleosts, contributed to the acquisition of new electric organs via neofunctionalization
335 in both mormyroid and gymnotiform electric fishes [65,66].
15 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
336
337 Our previous study proposed that adaptive radiation in Acropora was probably driven
338 by introgression (Mao et al., in revision); thus, Acropora is the first invertebrates
339 lineage reported to have undergone both WGD and introgression. Meanwhile, both
340 introgression and WGD have also been reported in cichlid fish lineages [67], a
341 famous model for adaptive radiation in vertebrates [67,68]. Both WGD and
342 introgression are regarded as significant forces in adaptive radiation of organisms
343 [17,67], but we still do not understand the relationship between WGD and
344 introgression in adaptive radiations [69].
345
346 In conclusion, this study identified an ancient WGD shared by Acropora species
347 (IAsα) that not only provides new insights into the evolution of reef-building corals,
348 but also expands a new model of WGD in animals.
349
350 Methods
351 Species information, genomic data and gene families cluster
352 Data can be accessed at: http://marinegenomics.oist.jp and
353 http://comparative.reefgenomics.org/datasets.html. The Acropora species information
354 in this study will be described in the paper (Mao et al., in revision). Information about
355 Astreopora sp1 was described previously [7] and transcriptome data across five
356 development stages was descried previously [60]. Protein sequences of the six species
357 were combined together to perform all-against-all BLASTP approach to find all
358 orthologs and paralogs among six species. Then, OrthoMCL was used with default
359 settings to cluster homologs into 19,760 gene families according to sequence
360 similarity [70].
16 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
361
362 Single-copy orthologs and reconstruction of a calibrated phylogenomic tree
363 A custom python script was used to select 3,461 single-copy orthologs with
364 only one gene copy in each species. For each sequence alignments of single-copy
365 orthologs, coding sequences were aligned with MAFFT [71] as described previously.
366 Then, the concatenated sequences of 3,461 single-copy orthologs were used to
367 reconstruct the phylogenomic tree (species tree) with BEAST2 [72]. First, we
368 partitioned the concatenated coding sequences by codon position. Molecular clock
369 and trees, except substitution model, were linked together. Then, divergence time was
370 estimated using the HKY substitution model, relaxed lognormal clock model, and
371 calibrated Yule prior with the divergence time estimated in our previous study (Mao
372 et al., in submission). We ran BEAST2 three times independently, 50 million Markov
373 chain Monte Carlo (MCMC) generations for each run, then we used Tracer to check
374 the log files and we found that ESS of each of parameters exceeded 200.
375
376 Orthogroup selection and detection of a WGD event with dS analysis
377 (a) dS distributions of paralogous gene pairs
378 Paralogous gene pairs of each species were identified by all-against-all
379 BLASTP approach and then OrthoMCL was used to cluster paralogs to gene families
380 for each species [70]. Gene families with fewer than 20 genes were used to calculate
381 dS values. Each gene pair within a given gene family was aligned with MAFFT [71]
382 and aligned sequences were used to calculate dS values with PAML [73]. The dS
383 distribution of each species was plotted with bins=0.02 in R [74].
384 (b) dS distributions of anchor gene pairs
17 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
385 First, we used MCScanX with default settings (except for match_size=3) to
386 find anchor gene pairs based on synteny information for each species [75]. Each
387 anchor gene pair was aligned with MAFFT [71] and aligned sequences were used to
388 calculate dS values with PAML [73]. The dS distribution of each species was plotted
389 with bins=0.02 in R [74].
390 (c) dS distributions of orthologous gene pairs
391 First, we used MCScanX with default settings (except for match_size=3) to
392 find orthologous gene pairs based on synteny information between Astreopora sp1
393 and A. tenuis, and between A. tenuis and A. digitifera [75]. Each orthologous gene
394 pair was aligned with MAFFT [71] and aligned sequences were used to calculate dS
395 values with PAML [73]. dS distributions of all species were plotted with bins=0.02 in
396 R [74].
397
398 Detection of a WGD event using phylogenetic analysis
399 A custom python script was used to select the 883 gene families, including at
400 least one gene copy in Astreopora, one gene copy in each of the five species and at
401 least two ohnologs in one of five Acropora species, as orthogroups. Ohnologs are
402 defined as paralogs originating from WGD.
403 For each of the 883 gene tree reconstructions, we used MAFFT [71] to align
404 amino acid sequences of each single-copy ortholog. We aligned coding sequences
405 with TranslatorX [76] based on amino acid alignments and we excluded the single-
406 copy orthologous genes containing ambiguous ‘N’. PartitionFinder [77] was used to
407 find the best substitution model for RAxML (Version 8.2.2) [78] and MrBayes
408 (Version 3.2.3) [79]
18 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
409 Then, 205 orthogroups, for which phylogeny matched the duplication
410 topology (Astreaopora, (Acropora, Acropora)), were selected as core-orthogroups by
411 eyes. The 154 high quality core-orthogroups, for which clades’ bootstrap values in
412 ML phylogeny exceeded 70, were used to perform molecular dating with BEAST2
413 based on the calibrated phylogenomic tree. Molecular clock and trees, except
414 substitution model, were linked together. Then, divergence time was estimated using
415 the HKY substitution model, relaxed lognormal clock model, and calibrated Yule
416 prior with the divergence time from our previous study (Mao et al., in submission).
417 We ran BEAST2 three times independently, 30 million Markov chain Monte Carlo
418 (MCMC) generations for each run. Then we used Tracer to check the log files. 135
419 time-calibrated phylogeny with ESS values exceeded 200 were carried out by
420 BEAST2 [72].
421
422 Estimating peak values in dS distributions and inferred node ages’ distribution with
423 KDE toolbox
424 Each distribution was estimated using KDE toolbox in MATLAB, as
425 described previously [25,80].
426 (a). Estimating peak values in distributions
427 To estimate the age of WGD within dS distributions, we assumed the peak
428 value in orthologous gene pair dS distributions as the split time between two species:
429 the split time between Astreopora sp1 and A. tenuis is 53.6 My, whereas that the split
430 time between A. tenuis and A. digitifera is 14.69 My. Before we used the kde()
431 function in KDE toolbox, we first truncated dS distributions to avoid estimation bias
432 due to extreme values: the dS distribution of orthologous gene pairs between
433 Astreopora sp1 and A. tenuis was truncated with a range from -1 to1 while the dS
19 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
434 distribution of orthologous gene pairs between A. tenuis and A. digitifera was
435 truncated with a range from -5 to -2. Then, we used the kde() function in KDE
436 toolbox to estimate the peak values of these two dS distributions as -0.314 and -
437 3.4596, respectively. Moreover, the distribution of Acropora paralogous gene pairs
438 was truncated with a range from -4 to 0 and we estimated the peak value of this
439 distribution as -1.8165. We also used bootstrapping to estimate 95% confidence
440 intervals (CIs) of Acropora paralogous gene pairs distribution as -1.7606 to -2.1261
441 (31.18 to 35.71 My). For bootstrapping, we generated 100 bootstrap samples for each
442 distribution by sampling with replacement from the original data distribution (49,002
443 samples in the original distribution) with the sample() function. We estimated
444 maximum peak values for each 100 bootstrap samples. Then we sorted maximum
445 peak values and values of 6th and 95th rank were used to define the 95% CI.
446 (b). Estimating peak values in distributions of inferred node age
447 To estimate the age of WGD in the distribution of inferred node ages, we used
448 the kde() function in KDE toolbox to estimate the peak value as 30.78 My, and we
449 used bootstrapping to estimate the 95% CIs as 27.86 to 34.77 My. For bootstrapping,
450 we generated 100 bootstrap samples from the distribution by sampling with
451 replacement from the original data distribution (135 samples in the original
452 distribution) with the sample() function. We estimated maximum peak values for each
453 of 100 bootstrap samples. Then, we sorted maximum peak values and values of 6th
454 and 95th rank defined the 95% CI.
455
456 Maximum likelihood approach to detect WGD with gene family count data
457 First, we filtered gene family cluster data generated by OrthoMCL described
458 above [70]. The gene family, including only one Astreopora sp1 gene and at least one
20 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
459 gene in each of the five Acropora species, was counted. Then, we used the WGDgc
460 package in R to estimate log likelihood for parameters (0, 1, 2, 3) of WGD event(s)
461 with setting (dirac=1,conditioning="twoOrMore") [62]. Then, we performed
462 likelihood ratio test (pchisq(2*(Likelihood_1-Likelihood_2), df=1,
463 lower.tail=FALSE)) to find the best model and found that one WGD event was the
464 best model to fit the gene family count data. We estimated the age of WGD on 4 My
465 intervals between 18.69 and 38.69 My under a one WGD event model. The lowest log
466 likelihood was shown at the age of WGD: 30.69 and 34.69 My.
467
468 Gene expression profiling analysis and dN/dS calculation
469 We selected 236 gene pairs of A. digitifera (ohnologous gene pairs) from 831
470 orthogroups. We BLASTed these ohnologous gene pairs against the gene expression
471 data across five developmental stages [60] and these data were normalized for each
472 developmental stage. Correlations between two ohnologous genes were performed
473 using Pearson’s correlation in R [74]. Hierarchical clustering was performed using
474 Pheatmap for HC cluster genes and NC cluster genes, respectively. Pairwise dN/dS
475 ratios were calculated with PAML using codeml based on the coding sequence
476 alignment of ohnologous gene pairs [73]. The dN/dS distribution was plotted with
477 ggplot2 in R and significance tests of differences between dN/dS distributions were
478 evaluated by a Mann-Whitney test in R [74].
479
480 Evolution analysis of toxic proteins in corals
481 The 55 toxic proteins of A. digitifera identified in the previous study were
482 downloaded from http://www.uniprot.org/ as queries. The proteins sequences of
483 Porites astreoides, Porites australiensis, Porites lobata, Montastraea cavernosa,
21 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
484 Hydra magnipapillata and Nematostella vectensis were downloaded from
485 http://comparative.reefgenomics.org/datasets.html, and combined them with protein
486 sequences of six Acroporid species to create a search database.
487 We identified candidates of toxic proteins by BLASTing the 55 toxins against
488 the combined protein sequences with settings: e-value < 1e-20 and identity > 30%.
489 Then, we used OrthoMCL to cluster candidates of toxins into 24 gene families and
490 reconstructed their ML gene trees with ExaML and RAxML. Each gene tree was
491 rooted at a branch or clade of query sequences.
492
493 Gene ontology enrichment for duplicated genes of core-orthogroups and protein
494 domains and transmembrane helices prediction
495 We BLASTed the sequences of 154 high quality core-orthogroups and
496 ohnologous gene pairs in Acropora against the UNIPROT database to find best hits.
497 Identical hits in each ohonlogs group were removed and the remaining hits were used
498 to perform gene enrichment in David [81]. We also used InterProScan [82] to predict
499 protein domains and used the TMHMM Server (v. 2.0) [83] to predict transmembrane
500 helices from protein sequences.
501
502 Acknowledgments
503 This study was supported by OIST and was funded by JSPS grants (No.
504 17J00557 to YFM). We thank Dr. Douglas E Soltis and Dr. Pamela S Soltis for their
505 comments and insight into this draft manuscript. We thank Dr. Steven D. Aird for
506 editing the manuscript.
507
508 Availability of data and materials
22 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
509 Genome assembly and gene models in this study can be downloaded from
510 http://marinegenomics.oist.jp.
511
512 Authors’ contributions
513 YFM and NS conceived the study. CS provided genomic data. YFM
514 performed all analysis in this study. YFM and NS wrote the initial manuscript and all
515 authors edited the final manuscript.
516
517 Competing Interests
518 The authors declare no competing interests
519
520 Additional files
521 Additional file 1: An Excel file with three tables listing 1) the original data of
522 gene family clusters in this study, 2) orthogroup datasets in this study, 3) node ages of
523 core-orthogrouops inferred by BEAST2 in this study, 4) data of gene expression
524 analysis for ohnologous gene pairs, and 5) toxin gene family clusters.
525
23 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
526 Reference:
527 1. Ainsworth TD, Heron SF, Ortiz JC, Mumby PJ, Grech A, et al. (2016) Climate
528 change disables coral bleaching protection on the Great Barrier Reef. Science
529 352: 338-342.
530 2. Renema W, Pandolfi JM, Kiessling W, Bosellini FR, Klaus JS, et al. (2016) Are
531 coral reefs victims of their own past success? Science advances 2: e1500850.
532 3. Wallace CC (2012) Acroporidae of the Caribbean. Geologica Belgica 15: 388-393.
533 4. Richards ZT, Miller DJ, Wallace CC (2013) Molecular phylogenetics of
534 geographically restricted Acropora species: Implications for threatened species
535 conservation. Molecular Phylogenetics and Evolution 69: 837-851.
536 5. Wallace CC, Rosen BR (2006) Diverse staghorn corals (Acropora) in high-latitude
537 Eocene assemblages: implications for the evolution of modern diversity
538 patterns of reef corals. Proceedings of the Royal Society B: Biological
539 Sciences 273: 975-982.
540 6. Fukami H, Omori M, Hatta M (2000) Phylogenetic relationships in the coral family
541 Acroporidae, reassessed by inference from mitochondrial genes. Zoological
542 Science 17: 689-696.
543 7. Suzuki G, Nomura K (2013) Species boundaries of Astreopora corals (Scleractinia,
544 Acroporidae) inferred by mitochondrial and nuclear molecular markers.
545 Zoological science 30: 626-632.
546 8. Edinger EN, Risk MJ (1994) Oligocene-Miocene extinction and geographic
547 restriction of Caribbean corals: roles of turbidity, temperature, and nutrients.
548 Palaios: 576-598.
549 9. Renema W, Bellwood DR, Braga JC, Bromfield K, Hall R, et al. (2008) Hopping
550 hotspots: Global shifts in marine Biodiversity. Science 321: 654-657.
24 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
551 10. Vollmer SV, Palumbi SR (2002) Hybridization and the evolution of reef coral
552 diversity. Science 296: 2023-2025.
553 11. Kenyon JC (1997) Models of reticulate evolution in the coral genus Acropora
554 based on chromosome numbers: Parallels with plants. Evolution 51: 756-767.
555 12. Hamada M, Shoguchi E, Shinzato C, Kawashima T, Miller DJ, et al. (2013) The
556 complex NOD-Like receptor repertoire of the coral Acropora digitifera
557 includes novel domain combinations. Molecular Biology and Evolution 30:
558 167-176.
559 13. Gacesa R, Chung R, Dunn SR, Weston AJ, Jaimes-Becerra A, et al. (2015) Gene
560 duplications are extensive and contribute significantly to the toxic proteome of
561 nematocysts isolated from Acropora digitifera (Cnidaria: Anthozoa:
562 Scleractinia). BMC Genomics 16.
563 14. van Oppen MJ, McDonald BJ, Willis B, Miller DJ (2001) The evolutionary
564 history of the coral genus Acropora (Scleractinia, Cnidaria) based on a
565 mitochondrial and a nuclear marker: reticulation, incomplete lineage sorting,
566 or morphological convergence? Molecular Biology and Evolution 18: 1315-
567 1329.
568 15. Willis BL, van Oppen MJH, Miller DJ, Vollmer SV, Ayre DJ (2006) The role of
569 hybridization in the evolution of reef corals. Annual Review of Ecology
570 Evolution and Systematics 37: 489-517.
571 16. Richards ZT, Hobbs JPA (2015) Hybridisation on coral reefs and the conservation
572 of evolutionary novelty. Current Zoology 61: 132-145.
573 17. Van De Peer Y, Mizrachi E, Marchal K (2017) The evolutionary significance of
574 polyploidy. Nature Reviews Genetics 18: 411-424.
25 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
575 18. Van de Peer Y, Maere S, Meyer A (2009) The evolutionary significance of
576 ancient genome duplications. Nature Reviews Genetics 10: 725-732.
577 19. Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome
578 evolution in plants. Current Opinion in Genetics & Development 35: 119-125.
579 20. Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the
580 ancestral vertebrate. Plos Biology 3: 1700-1708.
581 21. Christoffels A, Koh EGL, Chia JM, Brenner S, Aparicio S, et al. (2004) Fugu
582 genome analysis provides evidence for a whole-genome duplication early
583 during the evolution of ray-finned fishes. Molecular Biology and Evolution
584 21: 1146-1151.
585 22. Glasauer SMK, Neuhauss SCF (2014) Whole-genome duplication in teleost fishes
586 and its evolutionary consequences. Molecular Genetics and Genomics 289:
587 1045-1060.
588 23. Jiao YN, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, et al.
589 (2011) Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97-
590 U113.
591 24. Tiley GP, Ane C, Burleigh JG (2016) Evaluating and Characterizing Ancient
592 Whole-Genome Duplications in Plants with Gene Count Data. Genome
593 Biology and Evolution 8: 1023-1037.
594 25. Vanneste K, Baele G, Maere S, Van de Peer Y (2014) Analysis of 41 plant
595 genomes supports a wave of successful genome duplications in association
596 with the Cretaceous-Paleogene boundary. Genome Research 24: 1334-1347.
597 26. Soltis DE, Albert VA, Leebens‐Mack J, Bell CD, Paterson AH, et al. (2009)
598 Polyploidy and angiosperm diversification. American journal of botany 96:
599 336-348.
26 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
600 27. Smith JJ, Kuraku S, Holt C, Sauka-Spengler T, Jiang N, et al. (2013) Sequencing
601 of the sea lamprey (Petromyzon marinus) genome provides insights into
602 vertebrate evolution. Nature Genetics 45: 415-421.
603 28. Kenny NJ, Chan KW, Nong W, Qu Z, Maeso I, et al. (2017) Ancestral whole-
604 genome duplication in the marine chelicerate horseshoe crabs (vol 116, pg
605 190, 2016). Heredity 119: 388-388.
606 29. Schwager EE, Sharma PP, Clarke T, Leite DJ, Wierschin T, et al. (2017) The
607 house spider genome reveals an ancient whole-genome duplication during
608 arachnid evolution. BMC Biology 15.
609 30. Li Z, Tiley GP, Galuska SR, Reardon CR, Kidder TI, et al. (2018) Multiple large-
610 scale gene and genome duplications during the evolution of hexapods.
611 Proceedings of the National Academy of Sciences: 201710791.
612 31. Sémon M, Wolfe KH (2007) Consequences of genome duplication. Current
613 opinion in genetics & development 17: 505-512.
614 32. Conant GC, Birchler JA, Pires JC (2014) Dosage, duplication, and diploidization:
615 clarifying the interplay of multiple models for duplicate gene evolution over
616 time. Current Opinion in Plant Biology 19: 91-98.
617 33. Veron AS, Kaufmann K, Bornberg-Bauer E (2006) Evidence of interaction
618 network evolution by whole-genome duplications: a case study in MADS-box
619 proteins. Molecular Biology and Evolution 24: 670-678.
620 34. Canestro C, Albalat R, Irimia M, Garcia-Fernàndez J. Impact of gene gains, losses
621 and duplication modes on the origin and diversification of vertebrates; 2013.
622 Elsevier. pp. 83-94.
623 35. Kordiš D, Gubenšek F (2000) Adaptive evolution of animal toxin multigene
624 families. Gene 261: 43-52.
27 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
625 36. Kondrashov FA (2012) Gene duplication as a mechanism of genomic adaptation
626 to a changing environment. Proceedings of the Royal Society B: Biological
627 Sciences 279: 5048-5057.
628 37. Hargreaves AD, Swain MT, Hegarty MJ, Logan DW, Mulley JF (2014)
629 Restriction and recruitment—gene duplication and the origin and evolution of
630 snake venom toxins. Genome Biology and Evolution 6: 2088-2095.
631 38. Vonk FJ, Casewell NR, Henkel CV, Heimberg AM, Jansen HJ, et al. (2013) The
632 king cobra genome reveals dynamic gene evolution and adaptation in the
633 snake venom system. Proceedings of the National Academy of Sciences 110:
634 20651-20656.
635 39. Armoza-Zvuloni R, Schneider A, Sher D, Shaked Y (2016) Rapid hydrogen
636 peroxide release from the coral Stylophora pistillata during feeding and in
637 response to chemical and physical stimuli. Scientific reports 6: 21000.
638 40. Ben-Ari H, Paz M, Sher D (2018) The chemical armament of reef-building corals:
639 inter-and intra-specific variation and the identification of an unusual
640 actinoporin in Stylophora pistilata. Scientific reports 8: 251.
641 41. Fürthauer M, Thisse B, Thisse C (1999) Three different noggin genes antagonize
642 the activity of bone morphogenetic proteins in the zebrafish embryo.
643 Developmental biology 214: 181-196.
644 42. Stuber CW, Goodman MM (1983) Inheritance, intracellular localization, and
645 genetic variation of phosphoglucomutase isozymes in maize (Zea mays L.).
646 Biochemical genetics 21: 667-689.
647 43. Goodwin S, McPherson JD, McCombie WR (2016) Coming of age: ten years of
648 next-generation sequencing technologies. Nature Reviews Genetics 17: 333.
28 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
649 44. Hardwick SA, Deveson IW, Mercer TR (2017) Reference standards for next-
650 generation sequencing. Nature Reviews Genetics 18: 473.
651 45. Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate
652 genes. Science 290: 1151-1155.
653 46. Blanc G, Hokamp K, Wolfe KH (2003) A recent polyploidy superimposed on
654 older large-scale duplications in the Arabidopsis genome. Genome research
655 13: 137-144.
656 47. Blomme T, Vandepoele K, De Bodt S, Simillion C, Maere S, et al. (2006) The
657 gain and loss of genes during 600 million years of vertebrate evolution.
658 Genome biology 7: R43.
659 48. Zhang GQ, Liu KW, Li Z, Lohaus R, Hsiao YY, et al. (2017) The Apostasia
660 genome and the evolution of orchids. Nature 549: 379-+.
661 49. Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm
662 genome evolution by phylogenetic analysis of chromosomal duplication
663 events. Nature 422: 433.
664 50. Chen ZJ, Birchler JA (2013) Polyploid and hybrid genomics: John Wiley & Sons.
665 51. Soltis PS, Soltis DE (2012) Polyploidy and genome evolution: Springer.
666 52. Vanneste K, Van de Peer Y, Maere S (2012) Inference of genome duplications
667 from age distributions revisited. Molecular biology and evolution 30: 177-190.
668 53. Berthelot C, Brunet F, Chalopin D, Juanchich A, Bernard M, et al. (2014) The
669 rainbow trout genome provides novel insights into evolution after whole-
670 genome duplication in vertebrates. Nature communications 5: 3657.
671 54. Marcet-Houben M, Gabaldón T (2015) Beyond the whole-genome duplication:
672 phylogenetic evidence for an ancient interspecies hybridization in the baker's
673 yeast lineage. PLoS biology 13: e1002220.
29 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
674 55. Nakatani Y, Takeda H, Kohara Y, Morishita S (2007) Reconstruction of the
675 vertebrate ancestral genome reveals dynamic genome reorganization in early
676 vertebrates. Genome research 17: 1254-1265.
677 56. Kassahn KS, Dang VT, Wilkins SJ, Perkins AC, Ragan MA (2009) Evolution of
678 gene function and regulatory control after whole-genome duplication:
679 comparative analyses in vertebrates. Genome research 19: 1404-1418.
680 57. Wang W, Hui JHL, Chan TF, Chu KH (2014) De Novo transcriptome sequencing
681 of the snail echinolittorina malaccana: identification of genes responsive to
682 thermal stress and development of genetic markers for population studies.
683 Marine Biotechnology 16: 547-559.
684 58. Fujikawa T, Munakata T, Kondo S, Satoh N, Wada S (2010) Stress response in
685 the ascidian Ciona intestinalis: transcriptional profiling of genes for the heat
686 shock protein 70 chaperone system under heat stress and endoplasmic
687 reticulum stress. Cell Stress & Chaperones 15: 193-204.
688 59. Bertucci A, Foret S, Ball EE, Miller DJ (2015) Transcriptomic differences
689 between day and night in Acropora millepora provide new insights into
690 metabolite exchange and light-enhanced calcification in corals. Molecular
691 Ecology 24: 4489-4504.
692 60. Reyes-Bermudez A, Villar-Briones A, Ramirez-Portilla C, Hidaka M, Mikheyev
693 AS (2016) Developmental progression in the coral Acropora digitifera is
694 controlled by differential expression of distinct regulatory gene networks.
695 Genome Biology and Evolution 8: 851-870.
696 61. Smith JJ, Keinath MC (2015) The sea lamprey meiotic map improves resolution
697 of ancient vertebrate genome duplications. Genome research 25: 1081-1090.
30 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
698 62. Rabier CE, Ta T, Ane C (2014) Detecting and locating whole genome
699 duplications on a phylogeny: a probabilistic approach. Molecular Biology and
700 Evolution 31: 750-762.
701 63. Jackson ST, Sax DF (2010) Balancing biodiversity in a changing environment:
702 extinction debt, immigration credit and species turnover. Trends in Ecology &
703 Evolution 25: 153-160.
704 64. Yasuoka Y, Shinzato C, Satoh N (2016) The mesoderm-forming gene brachyury
705 regulates ectoderm-endoderm demarcation in the coral Acropora digitifera.
706 Current Biology 26: 2885-2892.
707 65. Zakon HH, Lu Y, Zwickl DJ, Hillis DM (2006) Sodium channel genes and the
708 evolution of diversity in communication signals of electric fishes: convergent
709 molecular evolution. Proceedings of the National Academy of Sciences 103:
710 3675-3680.
711 66. Arnegard ME, Zwickl DJ, Lu Y, Zakon HH (2010) Old gene duplication
712 facilitates origin and diversification of an innovative communication system
713 twice. Proceedings of the National Academy of Sciences 107: 22172-22177.
714 67. Berner D, Salzburger W (2015) The genomics of organismal diversification
715 illuminated by adaptive radiations. Trends in Genetics 31: 491-499.
716 68. Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, et al. (2014)
717 Genomics and the origin of species. Nature Reviews Genetics 15: 176-192.
718 69. Soltis PS, Soltis DE (2009) The role of hybridization in plant speciation. Annual
719 review of plant biology 60: 561-588.
720 70. Li L, Stoeckert CJ, Jr., Roos DS (2003) OrthoMCL: identification of ortholog
721 groups for eukaryotic genomes. Genome research 13: 2178-2189.
31 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
722 71. Katoh K, Misawa K, Kuma K, Miyata T (2002) MAFFT: a novel method for rapid
723 multiple sequence alignment based on fast Fourier transform. Nucleic Acids
724 Research 30: 3059-3066.
725 72. Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu C-H, et al. (2014) BEAST 2: a
726 software platform for Bayesian evolutionary analysis. PLOS Computational
727 Biology 10: e1003537.
728 73. Yang Z (2007) PAML 4: phylogenetic analysis by maximum likelihood.
729 Molecular Biology and Evolution 24: 1586-1591.
730 74. Team RC (2013) R: A language and environment for statistical computing.
731 75. Wang YP, Tang HB, DeBarry JD, Tan X, Li JP, et al. (2012) MCScanX: a toolkit
732 for detection and evolutionary analysis of gene synteny and collinearity.
733 Nucleic Acids Research 40.
734 76. Abascal F, Zardoya R, Telford MJ (2010) TranslatorX: multiple alignment of
735 nucleotide sequences guided by amino acid translations. Nucleic Acids Res
736 38: W7-13.
737 77. Lanfear R, Calcott B, Ho SY, Guindon S (2012) Partitionfinder: combined
738 selection of partitioning schemes and substitution models for phylogenetic
739 analyses. Molecular Biology and Evolution 29: 1695-1701.
740 78. Ronquist F, Teslenko M, van der Mark P, Ayres DL, Darling A, et al. (2012)
741 MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice
742 across a large model space. Systematic Biology 61: 539-542.
743 79. Stamatakis A (2014) RAxML version 8: a tool for phylogenetic analysis and post-
744 analysis of large phylogenies. Bioinformatics 30: 1312-1313.
32 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
745 80. Barone P (2014) Kernel density estimation via diffusion and the complex
746 exponentials approximation problem. Quarterly of Applied Mathematics 72:
747 291-310.
748 81. Huang DW, Sherman BT, Lempicki RA (2009) Systematic and integrative
749 analysis of large gene lists using DAVID bioinformatics resources. Nature
750 protocols 4: 44-57.
751 82. Zdobnov EM, Apweiler R (2001) InterProScan: an integration platform for the
752 signature-recognition methods in InterPro. Bioinformatics 17: 847-848.
753 83. Krogh A, Larsson B, von Heijne G, Sonnhammer ELL (2001) Predicting
754 transmembrane protein topology with a hidden Markov model: Application to
755 complete genomes. Journal of Molecular Biology 305: 567-580.
756
33 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
757 Figures
758
759 Figure 1. Ancient WGD in the reef-building coral Acropora (IAsα). (a). Venn
760 diagram of shared and unique gene families in six Acroporid species. (b). A calibrated
761 phylogenomic tree of six Acroporid species inferred from 3,461 single-copy orthologs
762 using BEAST2. Horizontal bars on branches of the tree represent the timing of WGD
763 in Acropora. The timing of IAsα was estimated at 35.01 Mya (95% confidence
764 interval: 31.18-35.7 Mya) by dS-based analysis (horizontal blue bar) and 30.78 Mya
765 (95% confidence interval: 27.86-34.77 Mya) by phylogenomic analysis (horizontal
766 orange bar). Grey shading represents the timing of one coral species turnover event,
767 the Oligocene-Miocene transition (OMT), suggesting that IAsα is correlated with
768 OMT.
769
34 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
770 Figure 2. Ancient WGD identification (IAsα) and timing of the event in Acropora.
771 (a). Frequency distribution of dS values for paralogous gene pairs in five Acropora
772 and one Astreopora species showing that a WGD event occurred in Acropora. Similar
773 peaks (dS value: 0.1-0.3) in dS distributions of five Acropora lineages indicate that a
774 WGD event occurred in Acropora. (Light red: A. digitifera; Light yellow: A. echinata;
775 Light green: A. gmmifera; Light blue: A. subglabra; Light purple: A. tenuis; Light
776 cyan: Astreopora sp1). (b). Frequency distribution of dS log values for paralogous
777 genes in Acropora and for orthologous genes showing that a WGD event occurred in
778 the most recent common ancestor of Acropora. Distributions are plotted with a bin
779 size of 0.1. (c). Hypothetical tree topology of duplicated genes in the Acroporidae and
780 the phylogeny of one duplicated gene (alpha-protein kinase 1-like). The phylogenetic
781 tree shows gene retention, loss, and duplications following with WGD. (d). Node age
782 distribution of IAsα. Inferred node ages from 135 phylogenies were analyzed with
783 KDE toolbox to show the peak at 30.78 My, represented by the black solid line. The
784 grey lines represent density estimations from 1000 bootstraps and the black dotted
785 line represents the corresponding 95% confidence interval (27.86 - 34.77 My) from
786 100 bootstraps.
35 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
a c
A. digitifera A. echinata A. subglabra_sc138g11906.t1 A. gemmifera 0.5938 A. subglabra A. tenuis 1 A. gemmifera_sc211g7084.t1 Astreopora sp1
A. subglabra_sc154g17731.t1 Acropora_P1 1 1000 A. digitifera_sc200g239.t1 1 WGD 1 A. echinata_sc239g10530.t1
Acropora_p2 1 A. tenuis_sc212g261.t1
A. subglabra_sc1g3672.t1 500 Number of pairs Asteropora sp1 1 A. digitifera_sc139g187.t1 1
A. echinata_sc19g24535.t1
Asteropora sp1_sc75g12748t1 0.04
0
0.00 0.25 0.50 0.75 1.00 b dS d
1250 25 0.03 Acropora A. tenuis VS Astreopora sp1 A. tenuis VS A. digitifera 0.025 1000 20
0.02
750 15 Density
0.015 Frequecy 500 10 Number of pairs 0.01
250 5 0.005
0 0 0 −10 −5 0 -40 -20 0 20 40 60 80 100 787 log2(dS) Node Age
788
36 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
789 Figure 3. Phylogenetic trees show duplicated genes under subfunctionalization or
790 neofunctionalization. (a). The phylogeny of orthogroup 1247 (dnaJ homolog
791 subfamily B member 11-like) reconstructed with MrBayes shows duplicated genes
792 under subfunctionalization. Bayesian posterior probabilities are shown at each node.
793 The bottom right panel shows that two domains are in Astreopora sp1, but each
794 domain was independently lost in duplicated genes under subfunctionalization in
795 orthogroups 1247. (b). The phylogeny of orthogroup 1244 (excitatory amino acid
796 transporter 1-like) reconstructed with MrBayes shows duplicated genes under
797 neofunctionalization. Bayesian posterior probabilities are shown at each node. Six
798 transmembrane helices prediction is shown in the bottom right.
a b
A. subgrabla_sc0000044g2354t1 A. subgrabla_sc0000147g25825t1
A. gemmifera_sc0000234g14931t1 A. echinata_sc0000126g23664t1 1 1 1 A. echinata_sc0000736g27804t1 A. digitifera_sc0000022g82t1 1 1
1 A. tenuis_sc0000132g178t1 A. digitifera_sc0000018g282t1 0.9324 1 1 A. gemmifera_sc0000347g8194t1 A. tenuis_sc0000066g624t1
A. tenuis_sc0000132g182t1 A. tenuis_sc0000151g68t1
1 A. gemmifera_sc0000263g18942t1 1 1 A. subgrabla_sc0000180g7605t1 0.9595 A. subgrabla_sc0000360g2229t1 1 A. gemmifera_sc0000233g14735t1
A. echinata_sc0000126g23662t1 0.9595
Duplication A. digitifera_sc0000090g24t1 1 Subfunctionalization A. digitifera_sc0000236g229t1 1 Rab3 GTPase-activating protein catalytic subunit A. echinata_sc0000156g1851t1 Astreopora sp1_scaffold163g18268t1 DnaJ-class molecular chaperone with C-terminal Zn fnger domain
0.06 Astreopora sp1_scaffold296g24376t1
0.2
799
800
37 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
801 Figure 4. Gene expression profiling reveals evolution of duplicated genes in A.
802 digitifera. (a). Gene expression profiling across five developmental stages (blastula:
803 PC, gastrula: G, postgastrula: S, planula: P, and adult polyps: A) in A. digitifera. Two
804 clusters of gene expression of ohnologous gene pairs: HC: high correlation, P<0.05;
805 NC: no correlation, P>=0.05 (Pearson’s correlation test). Pearson’s correlation
806 coefficients between two ohnologous gene pairs are presented in the right panel and
807 lines represent average values of correlation coefficients in each cluster. (b)
808 Significant functional enrichments of two clusters of ohnologues gene pairs (P<0.05,
809 Fisher’s exact test) indicate that divergence of gene expression is associated with gene
810 functions. Colors of the bar represent fold change values in enrichments. (c) Boxplot
811 of dN/dS values of ohnologous gene pairs shows no significant difference between
812 the two clusters (P=0.51, Mann-Whitney test).
813
814
38 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
815 Figure 5. Diversification of toxic proteins via gene duplications in Cnidaria.
816 Phylogenetic analysis of Coagulation factor X in 12 Cnidarian species shows widely
817 gene duplications. Gene duplication occurred in individual species lineages (red
818 arrows) and gene duplications by WGD in Acropora are indicated with blue arches.
819 Outer color strips represent 12 Cnidarian species and black strip represents Non-
820 Cnidarian species. Bootstrap values greater than 50 are shown with black dots at
821 nodes.
Non-Cnidaria Gene Duplication H. magnapapillata Gene Duplication by WGD N. vectensis M. cavernosa P. lobata P. australiensis P. astreoides Astreopora sp1 A tenuis A. echinata A. digitifera A. subgrabla A. gemmifera
822
823
824
39 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
825 Table 1. Gene ontology clusters of 154 high-quality core-orthogroups.
Annotation cluser P_Value
Transmembrane 1.90E-06
Death domain 3.10E-05
G-protein coupled receptor 1.20E-04
VIT domain 3.30E-03
Protein kinase-like domain 1.90E-02
826
40 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
827 Supplementary information
828 Supplementary Figures
Lancelet
Hagfish
Lamprey
Sharks 1R Chordata
Teleosts Ts3R Ss4R
2R
Tetrapods
Ambulacraria
Lophotrochozoa
Ecdysozoa
Acoela
Acropora IAsα Cnidaria Astreopora Nematostella Trichoplax Placozoa
Porifera
700 600 500 400 300 200 100 829
830 Figure S1. WGD events in evolution of the animal kingdom. The backbone and
831 divergence time of the tree are based on various sources (e.g., Satoh, 2016). The
832 shaded grey oval represents the uncertain position of two rounds of WGD and colored
833 triangles represent the corresponding divergent groups. Grey triangles represent WGD
834 and the red star represents invertebrate WGD specific to Acropora (IAsα) reported in
835 this study.
836
41 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Genomes & Gene models OrthoMCL & Six Cnidarian BLAST RAxML Gene models & & PAML MrBayes BLAST & BEAST2 & Two gene pairs RAxML (Ohnologues gene pairs) identifcation (236) WGDgc OrthoMCL MCScanX MCScanX Orthogroups (883) package & & in R PAML PAML (Clustering homologous of all speices based on sequence similarity and each group contains at least one gene copy in each of Acropora species, two gene copies in one Acropora species Correlation analysis and one Astreaopora gene) PAML Paralogous gene pairs Anchor gene pairs DAVID Co-linearity & Core-orthogroups Gene data flter synteny & (205) dN/dS (Clustering paralgous for each speices (Selecting paralgous based on gene idetifcation (The topology of each core-or- GO analysis Modeling position within each speices) calculation based on sequence similarity) thogroups diversifed with two clades from Astreaopora) Gene expression High quality core-orthogroups analysis (154) Peak values estimation (Bootstraps values of each two clade as the age of WGD are greater than 70)
Node age estimating as the Gene count data Synonymous substitutions GO analysis Synteny analysis age of WGD (135) phylogenomic modeling analysis (dS-based) (ESS values of all parameters are greater than 200, the distribution of inferred Gene family node age was esimated by KDE toolbox) phylogeny analysis
Phylogenomic analysis Toxin proteins analysis in Cnidaria
Test for WGD and the age of WGD Test for evolutionary consequence of WGD 837
838 Figure S2. Flowchart of study design. The flowchart illustrates methods used to test
839 the main questions in this study.
840
42 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
841
842 Figure S3. Phylogeny of the Family Acroporidae. Time-calibrated phylogenetic tree
843 reconstructed based on fossil calibration and concatenated coding sequences
844 (7,467,066 bp in total) from 3,461 single-copy orthologous genes with Beast2. Branch
845 lengths are scaled to estimated-divergence time. Posterior 95% CIs of node ages are
846 represented with blue horizontal bars as well as ML bootstrap values and Bayesian
847 posterior probabilities are shown at each node.
848
43 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
1500 Astreopora
1000
500 sp1
0
400 A. digitifera
200
0 600
400 A. echinata
200
0
Numbers of pairs 600 A. gemmifera 400
200
0
600 A. subglabra
400
200
0
300
200 A. tennuis
100
0 0.0 0.5 1.0 1.5 2.0 849 dS
850 Figure S4. Frequency distribution of dS values for paralogous gene pairs in five
851 Acropora and one Astreopora species. The distributions of dS values of paralogs,
852 estimating neutral evolutionary divergence since the two paralogs diverged, are
44 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
853 plotted with a bin size of 0.02, showing the similar peaks (dS value: 0.1-0.25) in
854 Acropora (red arrows).
855
856
45 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
200 Astreopora
100 sp1
0
75 A. digitifera
50
25
0
100 A. echinata
50 Numbers of pairs 0 200
150 A. gemmifera
100
50
0
150 A. subglabra
100
50
0
60
40 A. tennuis
20
0 0.0 0.5 1.0 1.5 2.0 857 dS
858 Figure S5. Frequency distribution of dS values for anchor-gene pairs in five
859 Acropora and one Astreopora species. Distributions of dS values of anchor paralogs,
860 estimating the neutral evolutionary divergence times since the paralogs diverged, are
861 plotted with a bin size of 0.02, showing the similar peaks (dS value: 0.1-0.25, red
46 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
862 arrows) in Acropora and extra peaks in A. digitifera and A. tenuis (dS value: 0.25-0.5,
863 blue arrows).
864
47 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
865
866 Figure S6. Co-linearity and synteny between Astreopora sp1 and A. tenuis. Only
867 co-linear segments with at least 8 anchor pairs are shown in the top length 100
868 scaffolds.
869
48 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Astreopora|scafold163.g18268.t1/1-1533 MS------GAQ FLVL------GKGLFGNFEKSKLNP IPKKKDDVD- RDGSWGVMSERISFGSVDFVVSYHHLRTRRKVLG ------NNLEDCLPDAMM EMFDM EFDFPPRAHCLSRWY GLKEFLVITPAEDADT VYSESKCHLLLSSIGIAATN 135 A. digitifera|sc0000022.g82.t1/1-847 ------A. echinata|sc0000126.g23664.t1/1-1094 M ADGDDEVDEV FEITDFTTSSEWERFAAR LEQLLIEWKLDGTNKKKST SKGK ELFNNFEK SK LNPIPHKKVEVDEREGSWGVISERISFGNANFIVSYHHLRTRKKSHGKKSKQT EDLEDC LPHAMM EMFDM EFDFPPRAHCLSRWY GLKEFVVISPAEDA EAVITESKCQ LLLSSIGIAATN 183 A. gemmifera|sc0000347.g8194.t1/1-965 M ADGDDE------GKELFNNFEKSKLNP IPH KKVEVDEREGSWGVISERISFGNANFIVSYHHLRTRKKSHGKKSKQT EDLEDC LPHAMM EMFDM EFDFPPRAHCLSRWY GLKEFVVISPAEDA EAVITESKCQ LLLSSIGIAATN 140 A. subgrabla|sc0000147.g25825.t1/1-990 M ADGDDE------GKELFNNFEKSKLNP IPH KKVEVDEREGSWGVISERISFGNANFIVSYHHLRTRKKSHGKKPKQTEDLEDCLPHAMM EMFDM EFDFPPRAHCLSRWY GLKEFVVISPAEDA EAVITESKCQ LLLSSIGIAATN 140 A. tenuis|sc0000132.g178.t1/1-1092 M ADGDEG-DEV FEITDFTTSSEWERFAAR LEQLLIEWKLDGTN-KKSASKGKELFNNFEKSKLNP IPH KKVEEDEREGSWGVMSERISFGNANFIVSYHHLRARKKSHGKKSKQT EDLEDC LPHAMM EMFDM EFDFPPRAHCLSRWY GLKEFVVISPAEDA EAVITESKCQ LLLSSIGIAATN 181 Adif|sc0000236.g229.t1/1-357 MA------2 Aech|sc0000126.g23662.t1/1-377 MA------2 Agem|sc0000263.g18942.t1/1-357 MA------2 Asub|sc0000360.g2229.t1/1-347 MA------2 Aten|sc0000132.g182.t1/1-357 MA------2
Astreopora|scafold163.g18268.t1/1-1533 SKCTVPM FVHT LGQWRRVYYGLC I GGGFRTTFQ ISHLHH IPSHYGHLAGLLEIFKDKLACPISPM HPVT LTVRFTYVLDEWTDYAWPQEPPDLSSCLECTVGVKNLESLP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPQWSVRAKM SENPQCLLGEYLGGYMTLCHRQ EST 318 A. digitifera|sc0000022.g82.t1/1-847 ------M PPVTLTVRFTYVLNEWA DY SWPQ E PPDPSSCLECKVGVSNLKT LP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPQWS LRAKM SENPQCLLGEYLRGYMTLCHRQ EST 119 A. echinata|sc0000126.g23664.t1/1-1094 SKCAVP IFAH ILQQWRR IYHGLC I GGGFRTTFHM SQ LRHIPSQYGHLAGLLDIFKDKLACPISPM PPVTLTVRFTYVLNEWA DY SWPQ E PPDPSSCLECKVGVSNLKT LP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPQWS LRAKM SENPQCLLGEYLRGYMTLCHRQ EST 366 A. gemmifera|sc0000347.g8194.t1/1-965 SKCA IPIFAH ILQQWRR IYHGLC I GGGFRTTFHM SQ LRHIPSQYGHLAGLLDIFKDKLACPISPM PPVTLTVRFTYVLNEWA DY SWPQ E PPDPSSCLECKVGVSNLKT LP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPQWS LRAKM SENPQCLLGEYLRGYMTLCHRQ EST 323 A. subgrabla|sc0000147.g25825.t1/1-990 SKCA IPIFAH ILQQWRR IYHGLC I GGGFRTTFHM SQ LRHIPSQYGHLAGLLDIFKDKLACPISPM PPVTLTVRFTYVLNEWA DY SWPQ E PPDPSSCLECKVGVSNLKT LP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPQWS LRAKM SENPQCLL------306 A. tenuis|sc0000132.g178.t1/1-1092 SKCA IPIFAHVLQQWRR IYHGLC I GGGFRTTFLM SQ LRHIPSQYGHLAGLLDIFKDKLACPISPM PPVTLTVRFTYVLNEWT DY SWPQ E PPDPSSCLECKVGVSNLKT LP FGACQDPLRELHLSTTW PCLSEEVIVDNSVYSDLDPCHAPHWS LRAKM SENPQCLLGEYLRGYITLCHRQ EST 364 Adif|sc0000236.g229.t1/1-357 ------EMHV------YL ------8 Aech|sc0000126.g23662.t1/1-377 ------QMHV------CL ------8 Agem|sc0000263.g18942.t1/1-357 ------EMHV------YF ------8 Asub|sc0000360.g2229.t1/1-347 ------EMHV------YL ------8 Aten|sc0000132.g182.t1/1-357 ------EMHV------YL ------8
Astreopora|scafold163.g18268.t1/1-1533 MQLLNM TAM DDDHNLAD ITHA LQRLTEA PPVASLSTAVSRATSRISAM GERSSNEFPIPAN LLSEILKHLFPDADQ DLEDLEIAQQ ELSFDKAENRKEAK ECNAPQ EK FR SLKSAP EDSLT FFLAVC VC IVYHNYGGLRAVAHLWQEFVLEM RYRWENDIFIPRLK LQAPNLGSCLVHQK LQM 501 A. digitifera|sc0000022.g82.t1/1-847 MQLLKNT LMDDDHNLAD ITHA LQRLTEA PPVASLSSAVSRATSRISA IGD SSLDEYP IPGK LLSEILQHLFPDADQ NLEDLETVQQ E LLFAKM ESENDAKEEN IQQDKFRSLKSAPEDSLT FFLAVG ICIVYHSYGGLKAVAHLWQEFVLEM RFRWENDIFIPRLKLQAPNLGSCLLHQK LQM 302 A. echinata|sc0000126.g23664.t1/1-1094 MQLLKNT VMDDDHNLAD ITHA LQRLTEA PPVASLSSAVSRATSRISA IGD SSLDEYP IPGK LLSEILQHLFPDADQ DLEDLETAQQ E LLFAKM ESENDAKEEN IQQDKFRSLKSAPEDSLT FFLAVG ICIVYHSYGGLKAVAHLWQEFVLEM RFRWENDIFIPRLKLQAP SLGSCLLHQK LQM 549 A. gemmifera|sc0000347.g8194.t1/1-965 MQLLKNT LMDDDHNLAD ITHA LQRLTEA PPVASLSSAVSRATSRISAM GDSSLDEYP IPGK LLSEILQHLFPDADQ DLEDLETVQQ E LLFAKM ESENDAKEET IQ------QM430 A. subgrabla|sc0000147.g25825.t1/1-990 ------DNLAD ITHA LQRLTEA PPVASLSSAVSRATSRISA IGD SSLDEYP IPGK LLSEILQHLFPDADQ DLEDLETVQQ E LLFAKM ESENDAKEEN IQ------QM401 A. tenuis|sc0000132.g178.t1/1-1092 MQLLKNT LVDDDHNLAD ITHA LQRLTEA PPVASLSSAVSRATSRISALGD SSSDEYP IPGK LLSEILQHLFPDADQ DLEDLETAQQ E LLFAKM ESENEAKEEN IQQDKFRSLKSAPEDSLT FFLAVG ICIVYHSYGGLKAVAHLWQEFVLEM RYRWENDIFIPRLKLQAPNLGSCLLHQK LQM 547 Adif|sc0000236.g229.t1/1-357 ------VSAL ------V13 Aech|sc0000126.g23662.t1/1-377 ------VSAL ------V13 Agem|sc0000263.g18942.t1/1-357 ------VSAL ------V13 Asub|sc0000360.g2229.t1/1-347 ------VSAL ------V13 Aten|sc0000132.g182.t1/1-357 ------VSAL ------V13
Astreopora|scafold163.g18268.t1/1-1533 LNCC IERKRLSK IRTASLQFATNLSKT ---RTESNVTNA EKV RRNSSNT RLGSTDSQ SSVSDDEEFFEAV ESQ EESET ESG NNT ENFKNIRGTDSDYNM DNSCSGNNT IFCKREGGK EP FGDLKLLVSGEP LIVP ITQEPAPMTEDM LEEQAEILTQLGTSVEGARVRAQMQCASLLSDM EAF 681 A. digitifera|sc0000022.g82.t1/1-847 LNCC IERKRLSK IRTASLQFSTNVNKNSGT RIP NNT SNM ESSRKSDSNT RLESTGSQ SSISDDEEFFEALENQDESEK ETRNYFENFNIEGKTNSDSDIENSNENNA SMY- KRE GGK ELFGDLKLLVSGEP L VVP ITQ ------439 A. echinata|sc0000126.g23664.t1/1-1094 LNCC IERKRLSK IRTASLQFSTNVNKNSGT RIP NNT SNM ESSRKSDSNT RLESTGSQ SSISDDEEFFEALENQDENEK ETRKYFENFNIKGKTNSDSDIENSNDNNA SMY- KRE GGK ELFGDLKLLVSGEP L VVP ITQ ------686 A. gemmifera|sc0000347.g8194.t1/1-965 LNCC IERKRLSK IRTASLQFSTNINKNSGRRIP NNT SNM ESSRKSDSNKRLESTGSQ SSISDDEEFFEALENQDESEK EARNYFENFNIEGKTKSDSDIENSNDNNA SMY- KRE GGK ELFGDLKLLVSGEP L VVP ITQ ------567 A. subgrabla|sc0000147.g25825.t1/1-990 LNCC IERKRLSK IRTASLQFSTNINKNSGRRIP NNT SNM ESSRKSDSNT RLESTGSQ SSISDDEEFFEALENQDESEK ETRNYFENFNIEGKTNSDSDIENSNDNNA SMY- KRE GGK ELFGDLKLLVSGEP L VVP ITQ ------538 A. tenuis|sc0000132.g178.t1/1-1092 LNCC IERKRLSK IRTASLQFSTNVNKNSGT RIP NNT SNM ESCRKSDSNT RLESTGSQ SSISDDEEFFEALENQDENEK ETRNYFENFNIEGKTNSDSDIENSNDNNA SMS- KRE GGR ELFGDLKLLVSGEP L VVP ITQ ------684 Adif|sc0000236.g229.t1/1-357 LSCCLQ------19 Aech|sc0000126.g23662.t1/1-377 LSCCLQ------19 Agem|sc0000263.g18942.t1/1-357 LSCCLQ------19 Asub|sc0000360.g2229.t1/1-347 LSCCLQ------19 Aten|sc0000132.g182.t1/1-357 LSCCLQ------19
Astreopora|scafold163.g18268.t1/1-1533 K AANPGC LLEDFVRWY SP FDWI LGPET EEEI EELKKLKTSHEKDSK ESSTPVT AAT SEVSLDEGWDTEEIEIPDEDFTTSDAKSLMAKAKAGNV EELQVPIKWREEGHLSVRMRIPGNMWA EVWQ SARPVPVRRQKRLFDETKEAEK EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQCA 864 A. digitifera|sc0000022.g82.t1/1-847 ------EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQSA 475 A. echinata|sc0000126.g23664.t1/1-1094 ------EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQSA 722 A. gemmifera|sc0000347.g8194.t1/1-965 ------EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQSA 603 A. subgrabla|sc0000147.g25825.t1/1-990 ------EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQSA 574 A. tenuis|sc0000132.g178.t1/1-1092 ------EPAPM TEDM LEEQAEILTQLGTSVEGARVRAQMQSA 720 Adif|sc0000236.g229.t1/1-357 ------Aech|sc0000126.g23662.t1/1-377 ------Agem|sc0000263.g18942.t1/1-357 ------Asub|sc0000360.g2229.t1/1-347 ------Aten|sc0000132.g182.t1/1-357 ------
Astreopora|scafold163.g18268.t1/1-1533 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPET EEEI EELEK LKTSHEKDSK ESSTPVT AAT SEVSLDEGWDTEEIEIPDEDFTTSDAKSLMAKAKAGNV EELQVPIKWREEGHLSIRMRIPGN MWA EVWQ SARPVPVRRQKRLFDETKEAEKVLHFLAGLKPAEVAFQ LFPVLLH AAV L 1047 A. digitifera|sc0000022.g82.t1/1-847 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPETQEEI EELER LKNNA EKASEEPSTPV AATVSEASFGEGWDTEEIEIADEDLVSKDAKSLMNEAKAGNLDQ LQAPAKWR EEGHLSLRM RIPGN MWV EVWQ SARPVPVRRQKRLFDETKEAEKVLHFLAGMKPAEVAFQ LFPVLLR AAV L 658 A. echinata|sc0000126.g23664.t1/1-1094 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPETQEEI EELER LKNNA EKASEEPSTRV AATVSEASFGEGWDTEEIEIADEDLVSKDAKSLMNEAKAGNLDQ LQAPAKWR EEGHLSLRM RIPGN MWV EVWQ SARPVPVRRQKRLFDETKEAEKVLHFLAGMKPAEVAFQ LFPVLLR AAV L 905 A. gemmifera|sc0000347.g8194.t1/1-965 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPETQEEI EELER LKNNA EKASEEPSRPVAATVSEASFGEGWDTEEIEIADEDLVSKDAKSLMNEAKAGNLDQ LQAPAKWR EEGHLSLRM RIPGN MWV EVWQ SARPVPVRRQKRLFDETKEAEKVLHFLAGMKPAEVAFQ LFPVLLR AAV L 786 A. subgrabla|sc0000147.g25825.t1/1-990 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPETQEV I EELER LKNNA EKASEEPSRPVAATVSEASFGEGWDTEEIEIADEDLVSKDAKSLMNEAKAGNLDQ LQAPAKWR EEGHLSLRM RIPGN MWV EVWQ SARPVPVRRQKRLFDETKEAEKVLHFLAGMKPAEVAFQ LFPVLLR AAV L 757 A. tenuis|sc0000132.g178.t1/1-1092 S LLSDM EAFKAANPGC LLEDFVRWY SP FDWI LGPETQEEI EELER LKNNA EKGSEEPSRPVEATVSEV SFGEGWDTEEIEIADEDLVSKDAKSLMNEAKAGNLDQ LQAPAKWR EEGHLSHRM RIPGN TWVEVWQSARPVPVRRQKRLFDETKEAEKVLHFLAGM KPAEVAFQ LFPVLLR AAV L 903 Adif|sc0000236.g229.t1/1-357 ------Aech|sc0000126.g23662.t1/1-377 ------Agem|sc0000263.g18942.t1/1-357 ------Asub|sc0000360.g2229.t1/1-347 ------Aten|sc0000132.g182.t1/1-357 ------
Astreopora|scafold163.g18268.t1/1-1533 RIEEAGARDIPAVASLLDKVLEKASK LSWAM PQD LLLC EEFVKQLQLCEL VVARAHSLR FK FQDA LSDK ETGT DKDV EHFLSALMEKPEVAVIGAGRGPAGRVLHGLFAKQ ESARMQFTVDESS-PN QAV ---AFR EQPIAHPSM SK EQVGNK ITRRDFYAILGVPRDANKNQIKRAYRKLAM 1226 A. digitifera|sc0000022.g82.t1/1-847 RIEEAGARDIPAVAFLLDKVLDKASK LSWEM PQDLTQCEEFVKQLQLCEL VVARANSLRYKFQDA LV-KGT ETDKDV EHFLSALMEKPEVTVIGAGRGPAGRVLHSLFVKQ ESARMQFGGDESPFPNQQQRHSATLPQDFPSP ------TGR EY ILRTTIPRPRPSSRPCPQRMYSVLT- 830 A. echinata|sc0000126.g23664.t1/1-1094 RIEEAGARDIPAVAFLLDKVLDKASK LSWEM PQDLTQCEEFVKQLQLCEL VVARANSLRYKFQDA LL-KGT ETDKDV EHFLSALMEKPEVTVIGAGRGPAGRVLHSLFVKQ ESARMQFGGDESPFRNQQQRHSATLPQDFPSP ------TGR EY ILRTTIPRPRPSSRPCPQRMYCVLT- 1077 A. gemmifera|sc0000347.g8194.t1/1-965 RIEEAGARDIPAVAFLLDKVLDKASK LSWEM PQDLTQCEEFVKQLQLCEL VVARANSLRYKFQDA LL-KGT ETDKDV EHFLSALMEKPEVTVIGAGRGPAGR ------SARMQFGGDESPFPNQQQRHSATLPQDFPSP ------TGR EY ILRTTIPRPRPSSRPCPQRMYCVLT- 948 A. subgrabla|sc0000147.g25825.t1/1-990 RIEEAGARDIPAVAFLLDKVLDKASK LSWEM PQDLTQCEEFVKQLQLCEL VVARANSLRYKFQDA LL-KGT ETDKDV EHFLSALMEKPEVTVIGAGRGPAGRVLHSLFVKQ ESARMQFGGDETPFPNQQV- -SWFRSCFFALLVYLIRNDDATVKENDYR IVISAITGNSHPPIPPCLRALNI937 A. tenuis|sc0000132.g178.t1/1-1092 RIEEAGARDIPAVAFLLDKVLDKASK LSWEM PQDLTQCEEFVKQLQLCEL VVARAHSLRYKFHDA LV-KGT ETDKDV EHFLSALMEKPEVTVIGAGRGPAGRVLHGLFVKQ ESARMQFGGDESPFPNQQQRHSATLPQDFPSP ------TGR EY ILRTTIPRPRPSSRPCPQRMYCVLT- 1075 Adif|sc0000236.g229.t1/1-357 ------ILAGRDFYAILGVPRDANKNQIKRAYRKLAM 50 Aech|sc0000126.g23662.t1/1-377 ------ILAGRDFYAILGVPRDANKNQIKRAYRKLAM 50 Agem|sc0000263.g18942.t1/1-357 ------ILAGRDFYAILGVPRDANKNQIKRAYRKLAM 50 Asub|sc0000360.g2229.t1/1-347 ------ILAGRDFYAILGVPRDANKNQIKRAYRKLAM 50 Aten|sc0000132.g182.t1/1-357 ------ILAGRDFYAILGVPRDANKNQIKRAYRKLAM 50
Astreopora|scafold163.g18268.t1/1-1533 KWHPDKNK EDPKAQ ER FQDLS AAY EV LTDEEKRRTYDSHGEEGLKNMNSGGGDPFDP FS------SFFGGFGFHFEGSQHSHQ REVPRGSDIHM DLEVTLEELYNGNFIEV LR FKPVTETVPGTRQCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE1389 A. digitifera|sc0000022.g82.t1/1-847 ------N------DEFR ---LAG ------AFTDDV IFQ ------847 A. echinata|sc0000126.g23664.t1/1-1094 ------N------DEFR ---LAG ------AFTDDV IFQ ------1094 A. gemmifera|sc0000347.g8194.t1/1-965 ------N------DEFR ---LAG ------AFTDDV IFQ ------965 A. subgrabla|sc0000147.g25825.t1/1-990 ISFP------FLSA------ASFS- YATPG ------LP I- PYWSGVYFE -----NNH------PQTPA------974 A. tenuis|sc0000132.g178.t1/1-1092 ------N------DEFR ---LAG ------AFTDDV IFQ ------1092 Adif|sc0000236.g229.t1/1-357 KWHPDKNKDDP KAQERFQDLSAAY EV LTDEEKRRTYDTHGEEGLKNMNSGGGDPFDP FS------SFFGGFGFHFEGAQHSHQ REVPRGSDINM DLEVTLEELYNGNFIEV SR FKPVKETISGTRKCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE213 Aech|sc0000126.g23662.t1/1-377 KWHPDKNK EDPKAQ ER FQDLS AAY EV LTDEEKRRTYDTHGEEGLKNMNSGGGDPFDP FSRFVFSGFLTK IAEGSR LELCTFFGGFGFHFEGAQHSHQ REVPRGSDINM DLEVTLEELYNGNFIEV SR FKPVKETISGTRKCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE233 Agem|sc0000263.g18942.t1/1-357 KWHPDKNK EDPKAQ ER FQDLS AAY EV LTDEEKRRTYDTHGEEGLKNMNSGGGDPFDP FS------SFFGGFGFHFEGAQHSHQ REVPRGSDINM DLEVTLEELYNGNFIEV SR FKPVKETISGTRKCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE213 Asub|sc0000360.g2229.t1/1-347 KWHPDKNK EDPKAQ ER FQDLS AAY EV LTDEEKRRTYDTHGEEGLKNMNSGGGDPFDP FS------SFFGGFGFHFEGAQHSHQ REVPRGSDINM DLEVTLEELYNGNFIEV SR FKPVKETISGTRKCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE213 Aten|sc0000132.g182.t1/1-357 KWHPDKNK EDPKAQ ER FQDLS AAY EV LTDEEKRRTYDTHGEEGLKNMNSGGGDPFDP FS------SFFGGFGFHFEGAQHSHQ REVPRGSDINM DLEVTLEELYNGNFIEV SR FKPVKETISGTRKCNCRQ EMRTTQLGPGRFQMSPVDICDECPAVTYVTKEK LLEVE213
Astreopora|scafold163.g18268.t1/1-1533 IEPGM HDGQ EYPFIAEGEPHIDGEPGDLKFM IRELRHER FQRKGDDLYTNVTITLVDALNGFEM DILHLDGHKVH VVREK ITWP GARIKKKEEGMPNYENNNVKGDLYITFDV EFPRGGLEDSEK EALK SILKQDSQQKAYNGL 1533 A. digitifera|sc0000022.g82.t1/1-847 ------A. echinata|sc0000126.g23664.t1/1-1094 ------A. gemmifera|sc0000347.g8194.t1/1-965 ------A. subgrabla|sc0000147.g25825.t1/1-990 ------LISP LSS ------KNV LCADQR------990 A. tenuis|sc0000132.g178.t1/1-1092 ------Adif|sc0000236.g229.t1/1-357 IEPGM HDGQ EYPFVAEGEPHVDGEPGDLKFI IKELRHDR FQRKGDDLYTNVTVTLLDALNGFEM DILHLDGHKVH VVREK ITWP GAKIKKKEEGMPNYENNN IKGDLY ITFDIAFPRGGLEDNDKEA LKNILKQSPNQKAYNGL 357 Aech|sc0000126.g23662.t1/1-377 IEPGM HDGQ EYPFVAEGEPHVDGEPGDLKFI IKELRHDR FQRKGDDLYTNVTVTLLDALNGFEM DILHLDGHKVH VVREK ITWP GAKIKKKEEGMPNYENNN IKGDLY ITFDIAFPRGGLEDNDKEA LKNILKQSPNQKAYNGL 377 Agem|sc0000263.g18942.t1/1-357 IEPGM HDGQ EYPFVAEGEPHVDGEPGDLKFI IKELRHDR FQRKGDDLYTNVTVTLLDALNGFEM DILHLDGHKVH VVREK ITWP GARIKKKEEGMPNYENNN IKGDLY ITFDIEFPRGGLEDNDKEA LKDILKQSPNQKAYNGL 357 Asub|sc0000360.g2229.t1/1-347 IEPGM HDGQ EYPFVAEGEPHVDGEPGDLKFI IKELRHDR FQRKGDDLYTNVTVTLLDALNGFEM DILHLDGHKVH VVREK ITWP GARIKKKEEGMPNYENNN IKGDLY ITFDIEFPRGGLEDNDKEG W ---YKSSIS ------347 870 Aten|sc0000132.g182.t1/1-357 IEPGM HDGQ EYPFVAEGEPHIDGEPGDLKFI IKELRHDR FQRKGDDLYTNVTVTLLDALNGFEM DILHLDGHKVH VVREK ITWP GARIKKKEEGMPNYENNN IKGDLY ITFDIEFPRGGLEDNDKEA LKNILKQSPNQKTYNGL 357
871 Figure S7. Alignment of orthogroup 1247 (dnaJ homolog subfamily B member
872 11-like) showing the independent loss of the domain in duplicates.
873
49 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
874
875 Figure S8. Alignment of orthogroup 1244 (excitatory amino acid transporter 1-
876 like) showing mutations on transmembrane and exposed regions, suggesting that
877 new functions would be generated. Exposed regions are shown in yellow.
878
50 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
879
880
881 Figure S9. Phylogeny of the toxic protein, Ryncolin-4. The phylogeny was
882 reconstructed using ExaML and bootstrap values are shown at each node. Gene
883 duplications caused by WGD in Acropora are shown in cyan.
884
51 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
885
886
887 Figure S10. Phylogeny of the toxic Astacin-like metalloprotease. The phylogeny
888 was reconstructed using ExaML and bootstrap values are shown at each node. Gene
889 duplication caused by WGD in Acropora is shown in cyan.
890
52 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
891
892 Figure S11. Phylogeny of a toxic protein (Putative lysosomal acid lipase/
893 cholesteryl ester hydrolase). The phylogeny was reconstructed using RAxML and
894 bootstrap values are shown at each node. Gene duplications caused by WGD in
895 Acropora are shown in cyan shadows.
896
53 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
897
898
899 Figure S12. Phylogeny of the toxic putative endothelial lipase. The phylogeny was
900 reconstructed using RAxML and bootstrap values are shown at each node. WGD in
901 Acropora is shown in cyan.
902
54 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
903
904
905 Figure S13. Phylogeny of the toxin protein (DELTA-thalatoxin-Avl2a/DELTA-
906 alicitoxin-Pse2a). The phylogeny was reconstructed using RAxML and bootstrap
907 values are shown at each node. Gene duplication by WGD in Acropora is shown in
908 cyan.
909
55 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
910
911
912 Figure S14. Phylogeny of the toxic protein (DELTA-actitoxin-Aas1a). The
913 phylogeny was reconstructed using RAxML and bootstrap values are shown at each
914 node. Gene duplication by WGD in Acropora is shown in cyan.
915
56 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
916
917
918 Figure S15. Phylogeny of the toxic protein (Phospholipase-B 81). The phylogeny
919 was reconstructed using RAxML and bootstrap values are shown at each node. Gene
920 duplication by WGD in Acropora is shown in cyan.
921
57 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
922
923
924 Figure S16. Phylogeny of the toxic snake venom 5'-nucleotidase. The phylogeny
925 was reconstructed using RAxML and bootstrap values are shown at each node. Gene
926 duplication by WGD in Acropora is shown in cyan.
927
58 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
A. gemmifera_sc0000016g26670t1
0.5172 A. echinata_sc0000003g27340t1 0.5336
0.9967 A.digitifera_sc0000001g331t1
A.subgrabla_sc0000002g21169t1 0.9984
A.tenuis_sc0000018g61t1
A.subgrabla_sc0000002g21172t1 1
A. gemmifera_sc0000016g26672t1 0.9861 0.7631
A.digitifera_sc0000001g330t1
A. echinata_sc0000003g27338t1
A.tenuis_sc0000018g60t1
1 A. gemmifera_sc0000016g26671t1 0.9598
A.subgrabla_sc0000002g21171t1 0.959
0.6434 A.subgrabla_sc0000002g21170t1
A. echinata_sc0000003g27339t1
1 A.digitifera_sc0000001g329t1
1 A. gemmifera_sc0000016g26673t1 1
0.9943 A.subgrabla_sc0000002g21173t1
A. echinata_sc0000003g27337t1
Astreopora sp1_scaffold42g6297t1 928 0.2
929 Figure S17. Phylogeny of orthogroup 434 (somatostatin receptor type 5-like)
930 shows duplicates under two WGD topology. The phylogeny was reconstructed
931 using MrBayes, and Bayesian posterior probabilities are shown at each node.
932
933
934
59 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
935 Supplementary Tables
936 Table S1. Numbers of gene pairs in the paralogous gene pairs dataset and anchor
937 gene pairs datasets.
Paragous gene pairs (<=20 gene families) Anchor gene pairs (<=20 gene families)
Total numbers Total numbers (0 A. digitifera 39827 8249 46559 1958 A.echinata 47299 10948 54956 2530 A. gemmifera 48051 11093 56972 3299 A. subgrabla 50852 12077 44093 2380 A. tenuis 34097 6635 28073 1488 Astreopora 49135 13648 52481 3033 sp1 938 939 Table S2. Peak value estimation of dS distribution by KDE toolbox. Astreopora sp1_A. A. tenuis_A. digitifera Acropora_WGD tenuis Peak age 53.6 14.69 35.01458704 log2(dS_paralog_peak) -0.314 -3.4596 -1.8165 95%_HDP_log2(dS_paralog_peak) (-0.22031,-0.33195) (-3.4008,-3.5141) (-1.7606,-2.1261) 95%_HDP_Age NA NA (31.18,35.7) 940 941 Table S3. Numbers of gene family in orthogroups, core-orthogroups and high- 942 quality core-orthogroups Numbers Orthogroups 883 Core-orthogroups 205 High-quality core-orthogroups 154 943 60 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. 944 Table S4. The number of putative toxin proteins in 12 Cnidarian species H. A. Astreo M. N. P. P. Gene A. magni P. Query_name A. digitifera A.echinata A. gemmifera subgra pora cavern vecten astreoi austra family tenuis papill lobata bla sp1 osa sis des liensis ata Gene Coagulation 52 48 52 58 50 49 12 39 56 7 46 34 family_1 factor X Gene Ryncolin-4 48 45 37 62 38 29 0 23 46 5 24 29 family_2 Astacin-like Gene metalloprote 28 23 25 26 30 33 36 20 60 6 31 14 family_3 ase toxin Gene Reticulocalb 18 15 14 14 18 20 5 16 18 6 19 17 family_4 in Putative lysosomal Gene acid 10 10 10 11 7 13 4 6 5 4 5 5 family_5 lipase/choles teryl ester hydrolase Venom Gene carboxyleste 8 6 9 7 7 17 1 5 14 3 8 4 family_6 rase-6 Putative Gene endothelial 11 1 1 17 11 25 2 2 5 1 4 5 family_7 lipase DELTA- thalatoxin- Gene Avl2a/DELT 9 5 7 12 11 13 0 2 5 2 2 0 family_8 A-alicitoxin- Pse2a Venom Gene phosphodiest 5 6 6 6 5 7 1 7 9 3 5 5 family_9 erase 2 Gene DELTA- family_1 actitoxin- 3 4 5 5 4 3 0 1 0 1 5 4 0 Aas1a Gene family_11 7 5 2 3 2 2 1 1 1 1 2 1 Gene Phospholipa family_1 4 2 2 2 3 4 0 3 3 3 1 1 se-B 81 2 Gene Venom family_1 dipeptidyl 5 2 1 2 2 3 2 4 3 0 0 1 3 peptidase 4 Gene Hyaluronida family_1 3 2 2 2 2 2 1 2 2 1 2 1 se-1 4 Gene Snake 1 1 2 4 2 2 2 1 1 1 1 0 61 bioRxiv preprint doi: https://doi.org/10.1101/366435; this version posted July 18, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. family_1 venom 5'- 5 nucleotidase 945 946 Table S5. Likelihood of multiple WGDs hypotheses in Acropora using WGDgc 947 method with gene counts data. WGD event(s) Likelihood Likelihood Ratio Test P_value 0 -38731.86 0 VS 1 7.66E-05 1 -38724.04 1 VS 2 0.01248965 2 -38720.92 2 VS 3 0.05990546 3 -38719.15 948 949 Table S6. Likelihood of different times of WGD under one WGD event in Acropora 950 using WGDgc. Time of WGD Likelihood 18.697005 -38724.14 22.697005 -38724.09 26.697005 -38724.06 30.697005 -38724.04 34.697005 -38724.04 38.697005 -38724.06 42.697005 -38724.11 46.697005 -38724.2 50.697005 -38724.35 951 952 953 62