Quick viewing(Text Mode)

Genes in Teleost Fishes

Genes in Teleost Fishes

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1

2

3 Cluster expansion of apolipoprotein D (ApoD) genes in teleost fishes

4 Langyu Gu1,2*, Canwei Xia3

5

6 1Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education,

7 Laboratory of Aquatic Science of Chongqing, School of Life Sciences, 400715, Southwest

8 University, Chongqing, China. [email protected]

9 2Zoological Institute, University of Basel, Vesalgasse 1, 4051, Basel, Switzerland.

10 3Ministry of Education Key Laboratory for Biodiversity and Ecological Engineering, College of

11 Life Sciences, Beijing Normal University, Beijing, China. [email protected]

12 13 14

15

16

17 Corresponding author:

18 *Langyu Gu

19 [email protected]

20

21

22

23

24

1

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

25 Abstract

26 Background: Gene and genome duplication play important roles in the evolution of gene function.

27 Compared to individual duplicated genes, gene clusters attract particular attentions considering their

28 frequent associations with innovation and adaptation. Here, we report for the first time the

29 expansion of the ligand (e.g., pheromone and hormone)-transporter genes, apolipoprotein D (ApoD)

30 genes in a cluster, specific to teleost fishes.

31

32 Results: The single ApoD gene in the ancestor expands in two clusters with a dynamic evolutionary

33 pattern in teleost fishes. Based on comparative genomic and transcriptomic analyses, protein 3D

34 structure comparison, evolutionary rate detection and breakpoint detection, orthologous genes show

35 conserved expression patterns. Lineage-specific duplicated genes that are under positive selection

36 evolved specific and even new expression profiles. Different duplicates show high tissue-specific

37 expression patterns (e.g., skin, eye, anal fin pigmentation patterns, gonads, gills, spleen and lower

38 pharyngeal jaw). Cluster analyses based on protein 3D structure comparisons, especially the four

39 loops at the opening side, show segregation patterns with different duplicates. Duplicated ApoD

40 genes are predicted to be associated with forkhead transcription factors and MAPK genes, and

41 they are located next to the breakpoints of genome rearrangements.

42

43 Conclusions: Here, we report the expansion of ApoD genes specific to teleost fishes in a cluster

44 manner for the first time. Neofunctionalization and subfunctionalization were observed at both

45 protein and expression levels after duplication. Evidence from different aspects, i.e. abnormal

46 expression induced disease in human, fish-specific expansion, predicted associations with forkhead

47 transcription factors and MAPK genes, highly specific expression patterns in tissues related to

48 sexual selection and adaptation, duplicated genes that are under positive selection, and their

49 locations next to breakpoints of genome rearrangement, suggests the potential advantageous roles of 2

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

50 ApoD genes in teleost fishes. Cluster expansion of ApoD genes specific to teleost fishes thus

51 provides an ideal evo-devo model for studying gene duplication, cluster maintenance and new gene

52 function emergence.

53

54 Key words

55 apolipoprotein D (ApoD), duplication, gene cluster, positive selection, breakpoints, teleost fishes

56

57 Background

58

59 Gene and genome duplication play important roles in evolution by providing new genetic

60 materials [1]. The gene copies emerging from duplication events (including whole genome

61 duplications (WGD)) can undergo different evolutionary fates, and a number of models have been

62 proposed as to what can happen after duplication [2]. In many instances, one of the duplicates

63 becomes silenced via the accumulation of deleterious mutations (i.e. pseudogenization or

64 nonfunctionalization [1]). Alternatively, the original pre-duplication function can be subdivided

65 between duplicates (i.e. subfunctionalization) [3], or one of the duplicates can gain a new function

66 (i.e. neofunctionalization) [4]. Since the probability of accumulating beneficial substitutions is

67 relatively low, examples for neofunctionalization are sparse. There are, nevertheless, examples for

68 neofunctionalization. For example, the duplication of dachshund in spiders and allies has been

69 associated with the evolution of a novel leg segment [5]; the expansion of repetitive regions in a

70 duplicated trypsinogen-like gene led to functional antifreeze glycoproteins in Antarctic notothenioid

71 fish [6]; and the duplication of opsin genes is implicated in trichromatic vision in primates [7].

72 Another selective advantage of gene duplication can be attributed to the increased numbers of gene

73 copies, e.g. in the form of gene dosage effects [8,9]. Multiple mechanisms can act together to shape

74 different phases of gene evolution after duplication [10]. 3

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

75 Gene functional changes after duplication can occur at the protein level [6,11,12]. For

76 example, the physiological division of labour between the oxygen-carrier function of haemoglobin

77 and the oxygen-storage function of myoglobin in vertebrates [13] (subfunctionalization); and the

78 acquired enhanced digestive efficiencies of duplicated gene encoding of pancreatic ribonuclease in

79 leaf monkeys [14] (neofunctionalization). However, the chance to accumulate beneficial alleles is

80 low, and thus, functional changes after duplication at the protein level are sparse. Instead, changes

81 in the expression level are more tolerable and efficient, since they do not require modification of

82 coding sequences, and they can immediately offer phenotypic consequences. For example,

83 complementary degenerative mutation in the regulatory regions of duplicated genes is a common

84 mechanism for subfunctionalization [2,15]. Many examples have provided evidence that duplicated

85 genes acquiring new expression domains are linked to neofunctionalization (e.g., dac2, a novel leg

86 segment in Arachnid [5]; elnb, bulbus arteriosus in teleost fishes [16]; and fhl2b, egg-spots in

87 cichlid fishes [17]).

88

89 In some cases, gene clusters resulting from gene duplication have attracted considerable

90 attention, such as clusters [18], globin gene clusters [19], paraHox gene clusters [20],

91 MHC gene clusters [21] and opsin gene clusters [22]. Duplicated genes in clusters are usually

92 related to innovation and adaptation [11,22,23], suggesting advantageous roles during evolution.

93 The expansion of gene clusters can be traced back to WGD and tandem duplication [23,24].

94 Compared to two rounds of WGD occurring before the split between cartilaginous and bony fish

95 [25], the ancestor of teleost fishes experienced another round of WGD (teleost genome duplication,

96 TGD) after its divergence from non-teleost actinopterygians, including Bichir, Sturgeon, bowfin

97 and spotted gar [26,27]. This extra TGD provides an additional opportunity for gene family

98 expansion in fishes [22,28–30].

99

4

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

100 Genome rearrangements have been suggested to occur frequently in teleost fishes [31]. If

101 genome rearrangements can capture locally adapted genes or antagonist sex determining genes by

102 reducing recombination, the rearranged genome can promote divergence and reproductive isolation

103 [32,33], and thus contribute to speciation and adaptation. Examples can be found in butterflies [34],

104 mosquitoes [35] and fish [36]. This occurs especially when advantageous genes are located next to

105 the breakpoints of a genome rearrangement due to its low recombination rates [37,38]. Considering

106 that the expansion of gene clusters is usually adaptive (as mentioned above) and is linked to

107 genome instability [33,39,40], it will be interesting to investigate the roles of gene clusters located

108 next to the breakpoints of genome rearrangements. However, related studies are sparse.

109

110 Here, we report, for the first time, the expansion of the gene apolipoprotein D (ApoD) in

111 teleost fishes. The ApoD gene belongs to the lipocalin superfamily of lipid transport proteins

112 [41,42]. In humans, ApoD is suggested to function as a multi-ligand, multifunctional transporter

113 (e.g., hormone and pheromone) [42,43], which is important in homeostasis and in the housekeeping

114 of many organs [43]. It is expressed in multiple tissues, most notably in the brain and testis (see e.g.

115 [42,44,45]), and it has been suggested to be involved in the central and peripheral nervous systems

116 [42]. Interestingly, the ApoD gene is duplicated in a cluster manner in two chromosomes in

117 different teleost fishes (http://www.genomicus.biologie.ens.fr/genomicus-91.01/cgi-bin/search.pl).

118 However, no detailed analyses have been reported. Here, we investigated the evolutionary history

119 of ApoD genes in fishes for the first time.

120

121 Results

122

123 1. In silico screen and phylogenetic reconstruction of ApoD genes

124 5

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

125 To investigate the expansion of ApoD genes, we performed phylogenetic reconstruction

126 with high quality assembly genome data (Figures 1A and 1B). There is one ApoD gene in

127 (Latimeria chalumnae), and two copies (A and B) in spotted gar (Lepisosteus oculatus),

128 which are located in one cluster. Different numbers of ApoD genes are located in two clusters in

129 different teleost fishes, i.e., with two copies in cavefish (Astyanax mexicanus) (A1 and A2) and in

130 pufferfish (Tetraodon nigroviridis) (B2a and A2); three copies (A1, A2, B2) in zebrafish (Danio

131 rerio) and in cod (Gadus morhua) (A1, A2 and B2b); four copies in platyfish (Xiphophorus

132 maculatus) (A1, A2, B2a and B2b); five copies in Amazon molly (Poecilia formosa) (A1, A2, B2a,

133 B2ba1, B2ba2) and in fugu (Takifugu rubripes) (A1, A2, B1, B2a, B2b); six copies in medaka

134 (Oryzias latipes) (A1, A2m1, A2m2, A2m3, B2a, B2b); and eight copies in stickleback (Gasterosteus

135 aculeatus) (A1, A2s1, A2s2, B1, B2a, B2bs1, B2bs2, B2bs3) and in tilapia (Oreochromis niloticus)

136 (A1, A2t1, A2t2, A2t3, A2t4, B1, B2a, B2b). Noticeably, although the copy B2b of cod is clustered

137 within the B2a clade based on a maximum likelihood (ML) tree, the bootstrap value is very low

138 which could be due to recent duplication (Figure 1B). Considering its gene direction and syntenic

139 with B2b genes in other species (Figure 1A), we named it copy B2b. Sequence alignment for tree

140 construction can be found in Additional file 1.

141

142 To further retrieve the evolutionary history of ApoD genes in fishes, we performed an in

143 silico screen across the whole phylogeny of teleost fishes using draft genomes (Figure 1C). No

144 ApoD gene was detected in Stylephorus chordatus. Unlike copy A1 which shows no lineage-

145 specific duplication, copy A2 exhibits variable lineage-specific copies in different fishes, with the

146 highest lineage-specific duplication present in the cichlid fish tilapia (four copies). Copy B1 is

147 absent in the clade of Gadiformes but is kept in Acanthopterygii. Species in Danio rerio, Osmerus

148 eperlanus and Parasudis fraserbrunneri possess copy B2. The co-existence of B2a band is B2

149 common in Percomorphaceae. The largest numbers of lineage-specific duplicated ApoD genes were

6

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

150 found in tilapia (copy A2), medaka (copy A2) and stickleback (copy B2b) in Percomorphaceae. The

151 predicted ApoD gene sequences can be found in Additional file 2.

152

153 To infer the relationship between ApoD gene duplication and TGD, syntenic analyses were

154 conducted, revealing that two paralogous regions in teleost fishes correspond to one ohnolog region

155 in spotted gar and in chicken (connected by the same coloured lines in Figure 2A). For example, the

156 region with yellow colour in linkage group (LG) 9 in spotted gar (left above), or in chromosome

157 (chr) 2 in chicken (left down) have paired paralogous regions (connected by yellow lines) in chr17

158 (ApoD cluster I located on) and in chr20 (ApoD cluster II located on) in medaka. Similarly, the

159 region with purple colour in LG3 in spotted gar and in chr1 in chicken has paired paralogous

160 regions (connected by purple lines) in chr17 and in chr20 in medaka.

161

162 2. ApoD clusters next to the breakpoints of genome rearrangements

163

164 Syntenic analyses clearly show that more than one chromosome in spotted gar and in

165 chicken are syntenic to the corresponding paralogous regions where ApoD clusters are located in

166 fishes. For example, chromosomes (or LGs) containing ApoD clusters (chr17 and chr 20 in medaka,

167 LG III and LG XXI in stickleback, LG9 and LG18 in tilapia, chr2 and chr24 in zebrafish, Figures

168 2A and 2B) are syntenic to chromosomes LG10, LG9, LG19, LG3 and LG14 in spotted gar and

169 chr8, chr2, chr28, chr1 and chr9 in chicken (Figures 2A and 2B). Noticeably, ApoD clusters are

170 located next to the breakpoints of genome rearrangements (Figures 2A and 2B). The rearranged

171 segments (approx. 80 kb to 200 kb syntenic with LG14 in spotted gar, Figure 2B) include ApoD

172 clusters and their neighbour genes (e.g., and2, samd7, sec62, nadkb, gpr160, skila, prkci and phc3

173 for Cluster I; otos, myeov2, and1, tmtopsb, tnk2a and tfr1b for Cluster II) (Figure 2B). Analyses of

174 available cichlid genomes further show that inversion of the segments (approx. 600 kb to 800kb) 7

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

175 with ApoD clusters as the breakpoints also occurred in the species belonging to the most species-

176 rich lineage, the haplochromine lineage in cichlid fishes, supported by split-read analysis using

177 Delly (Figure 2C).

178

179 3. Protein-protein association predictions and protein 3D structure comparisons

180

181 To assess the biological functions of different ApoD genes, protein-protein associations

182 were predicted. Protein domain architecture analyses revealed that ApoD proteins in different fishes

183 are composed of approx. 20 amino acids as the signal peptide and approx. 144 amino acids as the

184 lipocalin domain (Figure 2D). The common associations of ApoD genes are with the immunity-

185 related gene pla2g15 [47,48], the high-density lipoprotein biogenesis-related gene lcat [49],

186 different copies of zg16 related to pathogenic fungi recognition [50] and MAPK genes [51,52].

187 Different ApoD genes in teleost fishes can be subdivided into two classes based on the association

188 predictions. ApoD genes from one class (copy A2, copy B2a and copy B1) are associated with

189 forkhead transcription factors. ApoD genes from the other class (copy A1 and copy B2b) lost their

190 associations with forkhead transcription factors but are associated with the lipoprotein-related gene

191 apoa1 [53]. Noticeably, the only ApoD gene in coelacanth possesses both associations with

192 forkhead transcription factors and with apoa1. ApoD in coelacanth is also associated with genes

193 encoding ligands that can activate the Notch signalling pathway (jag1, jag2) [54], and the gene

194 belongs to the annexin family (anxaII) (Figure 2D). Noticeably, more members of forkhead

195 transcription factors and MAPK genes are associated with ApoD genes after duplication. Unlike in

196 other fishes, copy A1 in zebrafish is associated with bone resorption-related duplicated genes

197 (ostf1a, ostf1b, ostf1c) [55], the cell growth and division-related gene ppp2cb [56], the

198 neurodevelopment-related gene rab3gap2 [57] and the guanylate-binding gene gbp1 [58] (Figure

199 2D). 8

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

200

201 Homology protein structure modelling of different ApoD genes shows a conserved structure,

202 including a cup-like central part made by eight antiparallel β-sheet strands and two ends connected

203 by loops (a wide opening side formed by loops 1, 3, 5 and 7; and a narrow, closed bottom formed

204 by loops 2, 4 and 6) (Figure 3A). Interestingly, unlike the very conserved cup-like central part, the

205 loops are highly variable (Figure 3A). Cluster analyses based on the whole protein 3D structure

206 comparison can clearly segregate different duplicates, e.g., copy A1, copy A2, copy B2a and copy

207 B2b. Especially for lineage-specific duplicated genes which are clustered together. Copy B1s are

208 clustered together, nesting within the copy A2 clade (Figure 3B). The same analyses focused on

209 only the four loops (loops 1, 3, 5 and 7) at the opening side show a similar segregation pattern

210 (Figure 3C). However, cluster analyses focused on only the three loops (loops 2, 4 and 6) at the

211 bottom side did not show a segregation pattern (Figure 3D). Details about the PDB files and the

212 cluster results can be found in Additional file 3.

213

214 4. Positive selection detection on ApoD genes

215

216 Positive selection was detected in many ApoD genes before and after duplication, for

217 example, the branch of copy A2 in zebrafish, the ancestral branch of copies B2a and B2b in fugu,

218 medaka and platyfish; and, especially, in lineage-specific duplicated genes, such as in stickleback,

219 cichlid fishes, and amazon molly (Figure 4 and Additional file 4). Positive selection sites under

220 Bayes empirical Bayes (BEB) with posterior probability >95% were given in Figure 5 and

221 Additional file 4. Noticeably, most sites under positive selection are located on the highly variable

222 loops or on the connections between loops and the cup-like central part at the opening side (loops 3

223 and 5) (Figures 3A and 5). Our strategy here is relatively conservative. As we can see, if we only

224 focus on one prior hypothesis on a particular branch without doing multiple test correction, we will 9

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

225 find more branches with dN/dS that are significantly larger than 1 (p<0.05) (Additional file 4).

226 However, we chose to use our strategy to make the analyses more rigorous. Even with this strict but

227 comprehensive strategy, our results still show that positive selection occurred in branches before

228 and after ApoD gene duplication, especially for lineage-specific duplicated genes. Details about the

229 codon alignment for the PAML analyses can be found in Additional file 4.

230

231 5. Gene expression profile detection of ApoD genes in different species

232

233 Copy A1 is mainly expressed in the skin and the eye. Noticeably, copy A1 is also highly

234 expressed in the novel anal fin pigmentation patterns of a cichlid fish, Astatotilapia burtoni, which

235 is in consistent with our previous study [59]. Copy A2 shows redundant expression patterns in the

236 gills, eye, skin and gonads in zebrafish and cavefish, but its orthologous A2 genes including

237 lineage-specific duplicated ones in the species of Acanthomorphata, which we test here (cod,

238 stickleback, medaka, tilapia and A. burtoni), are expressed in the gills and in the related apparatus,

239 the lower pharyngeal jaw. Copy B2 in zebrafish shows redundant expression profiles (in the skin

240 and gonads) overlapping with the profiles of copies A1 and A2. Copies B2a and B2b show specific

241 expression patterns in different fishes of Acanthomorphata, which we test here. For example, copy

242 B2b is expressed in the ovary and in the liver in cod, while B2a and B2bs are mainly expressed in

243 the spleen and in the liver of the stickleback. The expression of B2a and B2b was not detected in

244 adult medaka, tilapia or A. burtoni. Interestingly, based on the available transcriptomic data for

245 different developmental stages of gonads in tilapia, we found that B2a and B2b are highly

246 specifically expressed at the early developmental stage of gonad tissues (five days after hatching,

247 5dah) (Figure 6). Copy B1 is highly specifically expressed in the liver in tilapia and in A. burtoni

248 but not in stickleback in which B1 is inverted (Figure 1A). Noticeably, no expression profile was

249 detected for copy B in gar, at least in the tissues we test here. Instead, expression profiles including 10

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

250 the skin, eye, gill, liver, testis and brain were detected for copy A in gar. Details can be found in

251 Figure 6 and Additional file 5.

252

253 Discussion

254

255 Dynamic evolution of ApoD genes in teleost fishes

256

257 Here, we report the cluster expansion of ApoD genes specific to teleost fishes for the first

258 time. Phylogenetic reconstruction and syntenic analyses clearly show the expansion of ApoD genes

259 in two clusters with lineage-specific tandem duplications after TGD. The interplay between genome

260 duplication and tandem duplication to prompt the expansion of the gene family has already been

261 shown in a few studies [60–62]. It has been suggested that the fixation of duplications is much more

262 common in genome regions where the rates of mutations are elevated due to the presence of

263 already-fixed duplication, which is the so-called “snow-ball” effect [63]. Both genome duplication

264 and tandem duplication can produce raw genetic materials to fuel the diversification of teleost

265 fishes at a later stage, e.g., morphological complexity and ecological niche expansion, which is

266 compatible with the “time lag model” [64]. Neofunctionalization and subfunctionalization after

267 duplication are important steps to realize this diversification, and both can be detected in ApoD

268 duplicates.

269

270 Functional divergence of ApoD genes occurred at the protein level. On the one hand, based

271 on protein-protein association predictions, although common associations with MAPK genes were

272 shared among ApoD genes, subdivided associations with forkhead transcription factors and apoa1

273 were detected in different paralogous ApoD genes, indicating subfunctionalization. Noticeably,

274 more members of forkhead transcription factors and MAPK genes are associated with ApoD genes 11

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

275 after duplication. In this case, neofunctionalization could also occur but is not limited to the dosage

276 effects. On the other hand, divergence was also detected at the protein structure level. Protein

277 structures are often related to functional divergence [65–68]. New fold can even evolve novel

278 function [69,70]. Even a few residues mutations can induce structural changes [71,72]. Therefore,

279 structure-based inference is important for understanding molecular function. Our protein 3D

280 structure modelling showed a conserved backbone conformation in spite of sequence divergence.

281 Indeed, this is a feature of the lipocalin family [66], to which ApoDs belong [41,42,73]. The cup-

282 like central part is used to transport large varieties of ligands, such as hormones and pheromones

283 [73]. Interestingly, cluster analyses based on the whole protein structure or only on the four loops at

284 the opening side can segregate different duplicates, although not for copy B1. Considering that copy

285 B1 has been lost multiple times, its function might be not necessary and could share functions with

286 other copies; thus, it is not surprising if they are clustered together within the copy A2 clade. The

287 cluster results of two other individual genes (copy B2 of zebrafish and B2b of medaka) do not affect

288 the general segregation pattern. Noticeably, unlike the four loops at the opending side, cluster

289 analyses based on only the three loops (loops 2, 4 and 6) at the bottom side did not show a clear

290 segregation pattern (Figure 3D). It has been suggested that the loops of lipocalin proteins can affect

291 ligand binding specificities, which are similar to the binding modes of antibodies [74]; and mediate

292 protein-protein interactions [66]. Their segregation with different duplicates might indicate

293 functional divergence occurred at the loops at the opening side. The evidence that most amino acids

294 under positive selection are located on the loops or the connections between loops and strands

295 (Figures 3A and 5, Additonal file 4) further indicates the potential important roles of these loops

296 during evolution. Actually, reshaping different parts of the protein structure is an efficient way to

297 produce functional divergence at the protein level in a short time [73], and this can be one of the

298 ways in which divergence at the protein level occurred in ApoD genes.

299

12

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

300 Functional divergence of ApoD genes also occurred at the expression level. Different ApoD

301 genes in zebrafish and cavefish exhibited redundant expression profiles but became more specific as

302 the numbers of tandem duplicates increased, indicating subfunctionalization. Noticeably, new

303 expression profiles were detected in duplicated paralogs, e.g., copy A1 in novel anal fin

304 pigmentation patterns and copy A2s in the lower pharyngeal jaw in A. burtoni, which is a

305 representative cichlid fish of the most species-richness haplochromine lineage. These two traits are

306 key innovations that are associated with adaptive radiation in cichlid fishes [75]. With expression

307 changes, duplication can be an important source for the emergence of novelty [76], especially if

308 they are adaptive. The relaxed selection pressure induced by the changing expression profiles can

309 even further prompt the accumulation of mutations at the protein level.

310

311 Potential advantageous roles of ApoD genes in teleost fishes

312

313 The ApoD gene has been thoroughly studied in humans and mice, and its abnormal

314 expression was reported to be related to human diseases, such as Parkinson’s disease and

315 Alzheimer’s disease [77–79]. Interestingly, as our study revealed here, ApoD dynamically expanded

316 in fishes in two clusters in different chromosomes. Furthermore, these expanded duplicates are

317 transcriptionally active. For example, they are highly expressed, and are restricted to different

318 tissues related to sexual selection (skin [80], eye [81,82], gonads, anal fin pigmentation pattern [59])

319 and adaptation (gills [83–86], spleen [87,88], lower pharyngeal jaw [75,89]). These tissues are also

320 derived from the neural crest, which is a key innovation in vertebrates [75,90–92]. Besides, many

321 duplicated ApoD genes are under positive selection, especially for lineage-specific duplicated genes.

322 Combined with their associations with MAPK genes and forkhead transcription factors, and their

323 functions as pheromone and hormone transporters, we report evidences that suggests the potential

324 importance of ApoD genes have in fishes. 13

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

325

326 Noticeably, two clusters exhibited distinct expression patterns. ApoD genes in cluster I in

327 most fishes are expressed in tissues related to sexual selection (skin, eye, gonad and anal fin

328 pigmentation), but in cluster II, ApoD genes are mainly expressed in tissues related to adaptation

329 (gills and lower pharyngeal jaw). Interestingly, both clusters were maintained with their

330 neighbouring genes and are located next to the breakpoints of genome rearangements during

331 evolution. An inversion with ApoD clusters next to the breakpoints occurred again in cichlid fishes

332 belonging to the haplochromine lineage, which is the most species-rich lineage [93]. If genome

333 rearrangements can capture locally adaptive genes, or genes related to sexual antagonism, it would

334 accelerate divergence by reducing recombination rates, thus prompting speciation and adaptation,

335 even the emergence of neo-sex chromosome [94]. Given that many ApoD genes are under positive

336 selection, their location at the breakpoints of genome rearrangements gives them more opportunities

337 to be involved in speciation and adaptation, which deserves further investigation.

338

339 Conclusions

340

341 Here, we report the cluster expansion of ApoD genes in a cluster manner specific to fishes

342 for the first time. Different evidences based on computational evolutionary analyses strongly

343 suggest the potentially advantageous roles of ApoD genes in fishes. An in-depth functional

344 characterization of ApoD genes could help consolidate a model for the study of subfunctionalization

345 and neofunctionalization. Moreover, finding the regulatory mechanisms behind the cluster

346 expansion of ApoD genes and the reason why the ApoD gene expanded specific to fishes remain

347 open questions. As more fish genomes with high assembly quality are available, especially closely

348 related species such as cichlid fishes, the roles of ApoD genes in fish speciation and adaptation

349 could be further investigated. Above all, the ApoD gene clusters reported here, provide an ideal 14

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

350 evo-devo model for studying gene duplication, cluster maintenance, and the gene regulatory

351 mechanism and their roles in speciation and adaptation.

352

353 Methods

354

355 In silico screen and phylogenetic reconstruction to infer gene duplication

356

357 To retrieve ApoD gene duplication in teleost fishes, we first extracted orthologs and

358 paralogs in fishes with available genome data from Ensembl (Release 84) [95] and the NCBI

359 database (https://www.ncbi.nlm.nih.gov/genome/). To confirm gene copy numbers, all orthologs

360 and paralogs were used as queries in a tblastx search against the corresponding genomes. For all

361 unannotated positive hits, a region spanning approx. 2 kb was extracted, and open reading frames

362 (ORF) were predicted using Augustus (http://augustus.gobics.de/) [96]. The predicted coding

363 sequences were then BLAST to the existing transcriptome database to retrieve the corresponding

364 cDNA sequences. The cDNAs were then re-mapped to the corresponding predicted genes to re-

365 check the predicted exon-intron boundary. Coding sequence of genes from human and spotted gar,

366 as well as their neighbour genes were used to perform tblastx against the genomes of lamprey

367 (Lampetra fluviatilis) and amphioxus (Branchiostoma belcheri). To infer gene duplication, a ML

368 analysis was performed with ApoD genes retrieved from available assembled genomes in RAxML

369 v8.2.10 [97] with the GTR+G model and ‘-f a’ option (which generates the optimal tree and

370 conducts 10,000 rapid bootstrap searches).

371

372 To further retrieve ApoDs in other fishes across the whole phylogeny with draft genomes

373 [21], all sequences retrieved above were used as queries in a tblastx search with the threshold e

374 value 0.001. The hit scaffolds were retrieved, and genes within scaffolds were predicted using

15

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

375 Augustus. The predicted ApoDs were then translated and re-aligned with known ApoD ORF to

376 further confirm the exon-intron boundary. All predicted orthologs and paralogs were used as a

377 query again in a tblastx search against the corresponding genome data until no more ApoD genes

378 were predicted.

379

380 Syntenic analyses and inversion detection

381

382 To further confirm the relationship between ApoD gene duplication and TGD, genes regions

383 adjacent to the duplicates and to the outgroup species that did not experience TGD (including

384 spotted gar and chicken) were retrieved. To this end, a window of 5 Mb around the ApoD clusters in

385 teleost fishes as well as the corresponding chromosomes in spotted gar and chicken were retrieved

386 from the Ensemble database and NCBI database. Syntenic analyses were performed with SyMap

387 [98]. To further detect the structural variation around ApoD genes in cichlid fishes, we retrieved the

388 available cichlid genomes raw data from [99]. The program Delly [100] based on paired-end split-

389 reads analyses was used to detect the inversion and its corresponding breakpoints of the segments

390 that contain the ApoD clusters in cichlid fishes, with tilapia as the reference.

391

392 Protein-protein interaction prediction, protein 3D structure modelling and comparison

393

394 To predict the biological functions of ApoD genes, the protein domains of ApoD genes and

395 protein-protein interactions were predicted with the Simple Modular Architecture Research Tool

396 (SMART) [101,102] and the stringdb database http://string-db.org/ [103]. To see whether there are

397 divergences at the protein level for different ApoDs, protein 3D structures were simulated with

398 Swiss-model [104–106] using the human ApoD crystal protein structure (PDB ID 2hzr) as the

399 template. The results were further visualized, evaluated and analysed with Swiss-PdbViewer [107].

16

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

400 To compare the protein structures, we first extracted and converted the corresponding information

401 in the PDB file using Swiss-PdbViewer. Protein 3D structure comparisons were then conducted

402 using Vorolign [108], which can compare closely related protein structures even when structurally

403 flexible regions exist [108]. The protein 3D structure comparisons and cluster analyes were

404 conducted with the ProCKSI-server (http://www.procksi.net/). Clustering results were further

405 visualized using Figtree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree). Copy B2ba2 of of amazon

406 molly was not included in the cluster analyses due to its relatively short sequence. The PDB files

407 were used as the input files. For the global protein structure comparisons, PDB files were extracted

408 directly from the simulation results of Swiss-model [104–106]. To only get PDB files of the loop

409 region, we revised the PDB files manually to get rid of the cup-like central part. To consider the

410 highly variable connections between the loops and the cup-like central part (Figure 3A), we also

411 included the structures of two more residues next to the loops. The resulting PDB files were further

412 checked using Swiss-PdbViewer.

413

414 Positive selection detection

415

416 To examine whether ApoD duplicates underwent adaptive sequence evolution, a branch-site

417 model was used to test positive selection affecting a few sites along the target lineages (called

418 foreground branches) in codeml within PAML [109,110]. The rates of non-synonymous to

419 synonymous substitutions (ω or dN/dS) with a priori partitions for foreground branches (PAML

420 manual) were calculated. Considering the very dynamic evolutionary pattern of ApoD genes, we

421 tested positive selection in every branch in each species. In this case, we designated each branch as

422 the foreground branch to run the branch-site model multiple times. Noticeably, there are two

423 different ways to designate the foreground branch in the branch-site model. One way is to designate

424 only the branch as the foreground branch. The other way is to designate the whole clade (all the

17

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

425 branches within the clade including the ancestral branch) as the foreground branch. We included

426 both perspectives in our data analyses (Additonal file 4). If the whole clade is under positive

427 selection, we will label all the branches within this clade including the ancestral branch with ω>1 in

428 the results (Figure 4).

429

430 All model comparisons in PAML were performed with fixed branch lengths (fix_blength =

431 2) derived under the M0 model in PAML. Alignment gaps and ambiguity characters were removed

432 (Cleandata = 1). A likelihood ratio test was used to test for statistical significance. In addition,

433 Bonferroni correction was conducted for multiple test correction [111]. The BEB was used to

434 identify sites that are under positive selection. Sites under positive selection under BEB with

435 posterior probability >0.95 are given in Figure 5 and Additional file 4.

436

437 Gene expression profile analyses

438

439 To see the expression profiles of ApoD duplicates in different fishes, raw transcriptomic

440 data from zebrafish, cavefish, cod, medaka, tilapia and A. burtoni were retrieved from NCBI

441 https://www.ncbi.nlm.nih.gov/ (Additional file 5). Raw reads were mapped to the corresponding

442 cDNAs (http://www.ensembl.org/) to calculate the RPKM (reads per kilobase per million mapped

443 reads) value. Quantitative polymerase chain reaction (qPCR) was used to detect the expression

444 profiles of different ApoD duplicates in the tissues that are not included in the existing

445 transcriptomes (Additional file 5). Prior to tissue dissection, specimens were euthanized with MS

446 222 (Sigma-Aldrich, USA) following an approved procedure (permit nr. 2317 issued by the

447 cantonal veterinary office, Switzerland; Guidelines for the Care and Use of Laboratory Animals

448 prescribed by the Regulation of Animal Experimentation of Chongqing, China). RNA isolation was

449 performed according to the TRIzol protocol (Invitrogen, USA). DNase treatment was performed

18

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

450 with the DNA-free™ Kit (Ambion, Life Technologies, USA). The RNA quantity and quality were

451 determined with a NanoDrop1000 spectrophotometer (Thermo Scientific, USA). cDNA was

452 produced using the High Capacity RNA-to-cDNA Kit (Applied Biosystems, USA). Housekeeping

453 gene elongation factor 1 alpha (elfa1) [112], ubiquitin (ubc) [113] and ribosomal protein L7 (rpl7)

454 [17] [114] were used as endogenous control. qPCR was performed on a StepOnePlusTM Real-Time

455 PCR system (Applied Biosystems, Life Technologies) using the SYBR Green Master Mix (Roche,

456 Switzerland) with an annealing temperature of 58°C and following the manufacturer’s protocols.

457 Primers are available in Additional file 6.

458

459 List of abbreviations

460 ApoD: apolipoprotein D. WGD: whole genome duplication. TGD: teleost genome duplication. ML:

461 maximum likelihood. LG: linkage group. chr: chromosome. BEB: Bayes empirical Bayes. ORF:

462 open reading frame. SMART: Simple Modular Architecture Research Tool. RPKM: Reads Per

463 Kilobase per Million mapped reads. qPCR: quantitative polymerase chain reaction. elfa1:

464 elongation factor 1 alpha. ubc: ubiquitin. rpl7: ribosomal protein L7.

465

466 Declarations

467 Ethics approval and consent to participate

468 Animal experiments reported in this study have been approved by the cantonal veterinary

469 office in Basel under permit number 2317, Switzerland; Guidelines for Care and Use of Laboratory

470 Animals prescribed by the Regulation of Animal Experimentation of Chongqing, China.

471 Consent for publication

472 Not applicable

473

19

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

474 Availability of data and material

475 The datasets supporting the results of this article are available in the Dryad repository,

476 doi:10.5061/dryad.39g63v2. https://datadryad.org/review?doi=doi:10.5061/dryad.39g63v2 [46].

477

478 Competing interests

479 The authors declare that they have no competing interests.

480

481 Funding

482 This work was supported by the PhD grant from University of Basel, Switzerland;

483 postdoctoral funding from Southwest University, China; China Postdoctoral Science Foundation

484 Grant (2018M633311); and National Natural Science Foundation of China (31802283) to LG.

485

486 Author’s contributions

487 LG discovered the expansion of ApoD gene family, did data analysis and wrote the

488 manuscript. LG did protein 3D structure comparison and cluster analyses with the help of CX.

489

490 Acknowledgements

20

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

491 We thank Prof. Walter Salzburger for his suggestions. We also thank Prof. Deshou Wang

492 for his help of sampling of tilapia and laboratory support. We particularly thank Prof. Ziheng Yang

493 for his valuable suggestions about positive selection detection using PAML. Many thanks to the

494 kind computing support from Prof. Zhengwang Zhang. We appreciate the valuable suggestions and

495 comments from the associate editor and two anonymous reviewers very much. Many thanks to

496 Dario Moser, Heinz-Georg Belting, Jing Wei and Hua Ruan for their help of fish sampling. Many

497 thanks for the help of fish dissection from Yang Zhao, Fabrizia Ronco, Attila Rüegg, Adrian

498 Indermaur, Xianbo Zhang and He Ma. Many thanks to Lukas Zimmerman, Peter Fields, Yuchen

499 Sun, Yanyan Xu, Zihui Zhang and De Chen for their help and fruitful discussions. We also would

500 like to express our thanks to Prof. Xiangjiang Zhan in the Institute of Zoology, Chinese Academy of

501 Sciences for the help of revising the rebuttal letter. Many thanks to Prof. Anders Moller in CNRS,

502 Université Paris-Sud in France, Dr. Juan Felipe Ortiz in Vanderbilt University in America for the

503 correction of English language.

504

505 Figure legends

506 Figure 1 (A) Cluster expansion of ApoD genes specific to teleost fishes after teleost-specific

507 duplication (TGD). Each arrow represents a single gene copy. Genes with red and orange colours

508 represent paralogs derived from one common ancestor. Genes with dark and light green colours

509 represent paralogs derived from one common ancestor. Phylogeny reconstruction is based on a

510 consensus fish phylogeny [22]. (B) Maximum likelihood phylogenetic tree reconstruction to infer

511 gene duplication. Bootstrap values > 50% are marked on the branch. Lineage-specific duplication

512 events were labelled. Note, although copy B2b of Gadus morhua is clustered within the B2a clade

513 (labelled with a green star), its bootstrap value is low, which could be due to recent duplication. (C)

514 A dynamic evolutionary pattern of ApoD genes across the entire phylogeny of teleost fishes. Highly 21

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

515 variable copy numbers are detected in different lineages, especially in the Paracanthopterygii

516 lineage. Stylephorus chordatus lost all ApoD genes. Compared to copy A1, copy A2 exhibits more

517 variable lineage-specific duplicates in different fishes, with the highest numbers appearing in tilapia

518 (four copies). Copy B1 is absent in the whole clade of Gadiformes. Copy B2 only shows up in

519 species Danio rerio, Osmerus eperlanus and Parasudis fraserbrunneri. The co-existence of B2a

520 and B2b is common in Percomorphaceae. The largest numbers of lineage-specific duplicated genes

521 are found in tilapia (copy A2), medaka (copy A2) and stickleback (copy B2b) in Percomorphaceae.

522

523 Figure 2 (A) Syntenic analyses between teleost fishes and spotted gar (top) and chicken (bottom),

524 respectively. The same colour between spotted gar and chicken represents orthologous

525 chromosomes. Two paralogous duplicated segments in teleost fishes (chromosomes (chrs)/linkage

526 groups (LG) indicated in red colour) can be traced back to one corresponding orthologous region in

527 spotted gar and chicken (chrs/LGs named with black colour), linked by colourful lines. Arrows

528 show the regions where apolipoprotein D (ApoD) clusters are located. (B) ApoD clusters as the

529 breakpoints of genome rearrangements before teleost-specific genome duplication (TGD). The

530 same colour between gar and chicken represents orthologous chromosomes. Neighbouring genes

531 are labelled. The red arrows show breakpoints, and the black arrows show gene direction.

532 Neighbouring genes are named according to Ensembl database. (C) Inversion with ApoD clusters as

533 the breakpoints occurred again in cichlid fishes. The haplochromine lineage, the most species-rich

534 lineage of cichlid fishes, is labelled. (D) ApoD domains and protein-protein association predictions.

535 a. Conserved domains include a single peptide of approx. 20 amino acids (AA) and a lipocalin

536 domain of approx. 144 AA. b. Different paralogs exhibited differential associations. The common

537 association is with MAPK genes. One class (copies A2, B2a and B1) is associated with multiple

538 forkhead transcription factors. The other class (copies A1 and B2b) lost this association; instead, it

22

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

539 is associated with the lipoprotein-related gene apoa1. The single ApoD gene in coelacanth

540 possesses both associations.

541

542 Figure 3 (A) Protein 3D structure modelling. Different ApoDs among species exhibited a

543 conserved 3D structure, including a cup-like central part made by eight antiparallel β-sheet strands

544 and two ends connected by loops (a wide opening part formed by loops 1, 3, 5 and 7; and a narrow,

545 closed bottom formed by loops 2, 4 and 6). Unlike the very conserved cup-like central part, the

546 loops at the two ends are highly variable. Note that most sites under positive selection are located

547 on the loops or on the connections between the loops and the cup-like central part. (B) Cluster

548 analyses based on the whole protein 3D structure. Cluster analyses can clearly segregate different

549 ApoD duplicates, including lineage-specific duplicated genes. (C) Cluster analyses based on only

550 four loops (loops 1, 3, 5 and 7) at the opening side. Cluster analyses can segregate different ApoD

551 duplicates. (D) Cluster analyses based on only three loops (loops 2, 4 and 6) at the bottom side.

552 Cluster analyses cannot segregate with different duplicates.

553

554 Figure 4 Positive selection detection of ApoD genes in different fishes under the branch-site

555 model. Many duplicated ApoD genes are under positive selection with the value of ω significantly

556 larger than 1 after Bonferroni correction, especially for lineage-specific duplicated genes. Note that

557 if the whole clade is designated as the foreground branch when detecting positive selection, each

558 branch of this clade will be labelled if its ω is significantly larger than 1. The unrooted tree was

559 used for PAML analyses, and the rooted tree is only for presentation. More details can be found in

560 Additional file 4.

561

23

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

562 Figure 5 Amino acids alignment of ApoD duplicates among species with annotation. The purple

563 arrow represents the β-strand, and the yellow arrow represents the loop. Sites under positive

564 selection under Bayes empirical Bayes (BEB) with posterior probability > 95% are labelled with the

565 red colour. Most sites under positive selection are distributed on the loops or on the connections

566 between the loop and the β-strand. More details can be found in Additional file 4. Note, the

567 positions of the sites that under positive selection in Additional file 4 are based on the codon

568 alignment used for PAML analyses (Additional file 4) instead of the amino acids alignment here.

569 But the amino acids that under positive selection are the same in Figure 5 and Additional file 4.

570

571 Figure 6 Gene expression profiles of different ApoD genes. (a) Schematic figure showing the

572 gene expression profiles of different ApoD genes in two clusters after teleost-specific duplication

573 (TGD). Each block represents a single gene copy. Phylogeny reconstruction was based on the

574 consensus fish phylogeny [22]. (b) Detailed gene expression profiles of ApoD genes among fishes.

575 The red colour represents tissues with a high expression level. Dark grey, data unavailable (either

576 because the tissue does not exist in the species, or is undetected in this study). Light grey represents

577 tissues with a low expression level. LPJ, the lower pharyngeal jaw in cichlid fishes. (c) Gene

578 expression profiles of ApoD genes in different developmental stages of gonads in tilapia based on

579 available transcriptomic data [115]. RPKM, reads per kilobase per million mapped reads; dah, days

580 after hatching.

581

582 Additional Files

583 Additional file 1 Sequence alignment for ML tree reconstruction.

584 Additional file 2 Predicted ApoD genes across the phylogeny.

585 Additional file 3 Protein PDB files and cluster analyses results.

24

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

586 Additional file 4 Sequence alignment for PAML test and statistics results.

587 Additional file 5 Raw transcriptomic data information and gene expression profiles.

588 Additional file 6 qPCR primer sequences.

589

590 References

591 1. Ohno S. Evolution by gene duplication. Springer Berlin Heidelb. 1970;

592 2. Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing

593 between models. Nat. Rev. Genet. 2010;11:97–108.

594 3. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate

595 genes by complementary, degenerative mutations. Genetics. 1999;151:1531–45.

596 4. Rastogi S, Liberles DA. Subfunctionalization of duplicated genes as a transition state to

597 neofunctionalization. BMC Evol. Biol. 2005;5:28.

598 5. Turetzek N, Pechmann M, Schomburg C, Schneider J, Prpic N-M. Neofunctionalization of a

599 Duplicate dachshund Gene Underlies the Evolution of a Novel Leg Segment in Arachnids. Mol.

600 Biol. Evol. 2016;33:109–21.

601 6. Chen L, DeVries AL, Cheng C-HC. Evolution of antifreeze glycoprotein gene from a

602 trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. 1997;94:3811–6.

603 7. Dulai KS, von Dornum M, Mollon JD, Hunt DM. The Evolution of Trichromatic Color Vision

604 by Opsin Gene Duplication in New World and Old World Primates. Genome Res. 1999;9:629–38.

605 8. Tang Y-C, Amon A. Gene copy-number alterations: a cost-benefit analysis. Cell.

606 2013;152:394–405.

607 9. Lin Z, Li W-H. Expansion of hexose transporter genes was associated with the evolution of

608 aerobic fermentation in yeasts. Mol. Biol. Evol. 2011;28:131–42.

609 10. Glasauer SMK, Neuhauss SCF. Whole-genome duplication in teleost fishes and its evolutionary

610 consequences. Mol. Genet. Genomics. 2014;289:1045–60. 25

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

611 11. Baalsrud HT, Voje KL, Tørresen OK, Solbakken MH, Matschiner M, Malmstrøm M, et al.

612 Evolution of Hemoglobin Genes in Codfishes Influenced by Ocean Depth. Sci. Rep. 2017;7:7956.

613 12. Zhang J, Zhang Y, Rosenberg HF. Adaptive evolution of a duplicated pancreatic ribonuclease

614 gene in a leaf-eating monkey. Nat. Genet. 2002;30:411–5.

615 13. Storz JF, Opazo JC, Hoffmann FG. Gene duplication, genome duplication, and the functional

616 diversification of vertebrate globins. Mol. Phylogenet. Evol. 2013;66:469–78.

617 14. Zhang J. Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat.

618 Genet. 2006;38:819–23.

619 15. Jiménez-Delgado S, Pascual-Anaya J, Garcia-Fernàndez J. Implications of duplicated cis-

620 regulatory elements in the evolution of metazoans: the DDI model or how simplicity begets novelty.

621 Brief. Funct. Genomic. Proteomic. 2009;8:266–75.

622 16. Moriyama Y, Ito F, Takeda H, Yano T, Okabe M, Kuraku S, et al. Evolution of the fish heart by

623 sub/neofunctionalization of an elastin gene. Nat. Commun. 2016;7:10397.

624 17. Santos ME, Braasch I, Boileau N, Meyer BS, Sauteur L, Böhne A, et al. The evolution of

625 cichlid fish egg-spots is linked with a cis-regulatory change. Nat. Commun. 2014;5:5149.

626 18. Carrasco AE, McGinnis W, Gehring WJ, De Robertis EM. Cloning of an X. laevis gene

627 expressed during early embryogenesis coding for a peptide region homologous to Drosophila

628 homeotic genes. Cell. 1984;37:409–14.

629 19. Proudfoot NJ, Shander MH, Manley JL, Gefter ML, Maniatis T. Structure and in vitro

630 transcription of human globin genes. Science. 1980;209:1329–36.

631 20. Brooke NM, Garcia-Fernàndez J, Holland PWH. The ParaHox gene cluster is an evolutionary

632 sister of the Hox gene cluster. Nature. 1998;392:920–2.

633 21. Malmstrøm M, Matschiner M, Tørresen OK, Star B, Snipen LG, Hansen TF, et al. Evolution of

634 the immune system influences speciation rates in teleost fishes. Nat. Genet. 2016;48:1204–10.

635 22. Cortesi F, Musilová Z, Stieb SM, Hart NS, Siebeck UE, Malmstrøm M, et al. Ancestral

26

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

636 duplications and highly dynamic opsin gene evolution in percomorph fishes. Proc. Natl. Acad. Sci.

637 2015;112:1493–8.

638 23. Garcia-Fernàndez J. The genesis and evolution of gene clusters. Nat. Rev. Genet.

639 2005;6:881–92.

640 24. Hardison RC. Evolution of hemoglobin and its genes. Cold Spring Harb. Perspect. Med.

641 2012;2:a011627.

642 25. Robinson-Rechavi M, Boussau B, Laudet V. Phylogenetic dating and characterization of gene

643 duplications in vertebrates: the cartilaginous fish reference. Mol. Biol. Evol. 2004;21:580–6.

644 26. Sato Y, Nishida M. Teleost fish with specific genome duplication as unique models of

645 vertebrate evolution. Environ. Biol. Fishes.2010;88:169–88.

646 27. Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, et al. The spotted gar

647 genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat. Genet.

648 2016;48:427–37.

649 28. Brunet FG, Volff J-N, Schartl M. Whole Genome Duplications Shaped the Tyrosine

650 Kinase Repertoire of Jawed Vertebrates. Genome Biol. Evol. 2016;8:1600–13.

651 29. Gillis WQ, St John J, Bowerman B, Schneider SQ. Whole genome duplications and expansion

652 of the vertebrate GATA gene family. BMC Evol. Biol. 2009;9:207.

653 30. Voldoire E, Brunet F, Naville M, Volff J-N, Galiana D. Expansion by whole genome

654 duplication and evolution of the in teleost fish. Vaudry H, editor. PLoS One.

655 2017;12:e0180936.

656 31. Sémon M, Wolfe KH. Rearrangement rate following the whole-genome duplication in teleosts.

657 Mol. Biol. Evol. 2007;24:860–7.

658 32. Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and speciation. Genetics.

659 2006;173:419–34.

660 33. Farré M, Micheletti D, Ruiz-Herrera A. Recombination Rates and Genomic Shuffling in Human

27

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

661 and Chimpanzee—A New Twist in the Chromosomal Speciation Theory. Mol. Biol. Evol.

662 2013;30:853–64.

663 34. Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, et al. Chromosomal

664 rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature.

665 2011;477:203–6.

666 35. Ayala D, Fontaine MC, Cohuet A, Fontenille D, Vitalis R, Simard F. Chromosomal inversions,

667 natural selection and adaptation in the malaria vector Anopheles funestus. Mol. Biol. Evol.

668 2011;28:745–58.

669 36. Barth JMI, Berg PR, Jonsson PR, Bonanomi S, Corell H, Hemmer-Hansen J, et al. Genome

670 architecture enables local adaptation of Atlantic cod despite high connectivity. Mol. Ecol.

671 2017;26:4452–66.

672 37. Stevison LS, Hoehn KB, Noor MAF. Effects of Inversions on Within- and Between-Species

673 Recombination and Divergence. Genome Biol. Evol. 2011;3:830–41.

674 38. Corbett-Detig RB. Selection on Inversion Breakpoints Favors Proximity to Pairing Sensitive

675 Sites in Drosophila melanogaster. Genetics. 2016;204:259–65.

676 39. Otto SP. The Evolutionary Consequences of Polyploidy. Cell. 2007;131:452–62.

677 40. Hufton AL, Groth D, Vingron M, Lehrach H, Poustka AJ, Panopoulou G. Early vertebrate

678 whole genome duplications were predated by a period of intense genome rearrangement. Genome

679 Res. 2008;18:1582–91.

680 41. Ayrault Jarrier M, Levy G, Polonovski J. ETUDE DES ALPHA-LIPOPROT’EINES

681 S'ERIQUES HUMAINES PAR. Bull. Soc. Chim. Biol. (Paris). 1963;45:703–13.

682 42. Rassart E, Bedirian A, Do Carmo S, Guinard O, Sirois J, Terrisse L, et al. Apolipoprotein D.

683 Biochim. Biophys. Acta - Protein Struct. Mol. Enzymol. 2000;1482:185–98.

684 43. Weech P, Provost P, Tremblay N, Camato R, Milne R, Marcel Y, et al. Apolipoprotein D—An

685 atypical apolipoprotein. Prog. Lipid Res. 1991;30:259–66.

28

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

686 44. Drayna D, Fielding C, McLean J, Baer B, Castro G, Chen E, et al. Cloning and expression of

687 human apolipoprotein D cDNA. J. Biol. Chem. 1986;261:16535–9.

688 45. Provost PR, Villeneuve L, Weech PK, Milne RW, Marcel YL, Rassart E. Localization of the

689 major sites of rabbit apolipoprotein D gene transcription by in situ hybridization. J. Lipid Res.

690 1991;32:1959–70.

691 46. Gu L, Xia C. Data from: Cluster expansion of apolipoprotein D (ApoD) genes in teleost fishes.

692 Dryad Digit. Repos. 2018. p. https://doi.org/10.5061/dryad.39g63v2.

693 47. Gilleron M, Lepore M, Layre E, Cala-De Paepe D, Mebarek N, Shayman JA, et al. Lysosomal

694 Lipases PLRP2 and LPLA2 Process Mycobacterial Multi-acylated Lipids and Generate T Cell

695 Stimulatory Antigens. Cell Chem. Biol. 2016;23:1147–56.

696 48. Bailey SD, Xie C, Do R, Montpetit A, Diaz R, Mohan V, et al. Variation at the NFATC2 Locus

697 Increases the Risk of Thiazolidinedione-Induced Edema in the Diabetes REduction Assessment

698 with ramipril and rosiglitazone Medication (DREAM) Study. Diabetes Care. 2010;33:2250–3.

699 49. Fotakis P, Kuivenhoven JA, Dafnis E, Kardassis D, Zannis VI. The Effect of Natural LCAT

700 Mutations on the Biogenesis of HDL. Biochemistry. 2015;54:3348–59.

701 50. Tateno H, Yabe R, Sato T, Shibazaki A, Shikanai T, Gonoi T, et al. Human ZG16p recognizes

702 pathogenic fungi through non-self polyvalent mannose in the digestive system. Glycobiology.

703 2012;22:210–20.

704 51. Plestant C, Anton ES. Scaling the MAPK Signaling Threshold during CNS Patterning. Dev.

705 Cell. 2013;25:221–2.

706 52. Shvartsman SY, Coppey M, Berezhkovskii AM. MAPK signaling in equations and embryos.

707 Fly. 2009;3:62–7.

708 53. Mangaraj M, Nanda R, Panda S. Apolipoprotein A-I: A Molecule of Diverse Function. Indian J.

709 Clin. Biochem. 2016;31:253–9.

710 54. Fiaschetti G, Schroeder C, Castelletti D, Arcaro A, Westermann F, Baumgartner M, et al.

29

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

711 NOTCH ligands JAG1 and JAG2 as critical pro-survival factors in childhood medulloblastoma.

712 Acta Neuropathol. Commun. 2014;2:39.

713 55. Reddy S, Devlin R, Menaa C, Nishimura R, Choi SJ, Dallas M, et al. Isolation and

714 characterization of a cDNA clone encoding a novel peptide (OSF) that enhances osteoclast

715 formation and bone resorption. J. Cell. Physiol. 1998;177:636–45.

716 56. Wong C-H, Fung Y-WW, Ng EK-O, Lee SM-Y, Waye MM-Y, Tsui SK-W. LIM domain

717 protein FHL1B interacts with PP2A catalytic β subunit - A novel cell cycle regulatory pathway.

718 FEBS Lett. 2010;584:4511–6.

719 57. Ng EL, Tang BL. Rab GTPases and their roles in brain neurons and glia. Brain Res. Rev.

720 2008;58:236–46.

721 58. Pandita E, Rajan S, Rahman S, Mullick R, Das S, Sau AK. Tetrameric assembly of hGBP1 is

722 crucial for both stimulated GMP formation and antiviral activity. Biochem. J. 2016;473:1745–57.

723 59. Gu L, Xia C. Revelation of the Genetic Basis for Convergent Innovative Anal Fin Pigmentation

724 Patterns in Cichlid Fishes. bioRxiv. 2017; doi:https://doi.org/10.1101/165217.

725 60. Hofberger JA, Nsibo DL, Govers F, Bouwmeester K, Schranz ME. A Complex Interplay of

726 Tandem- and Whole-Genome Duplication Drives Expansion of the L-Type Lectin Receptor Kinase

727 Gene Family in the Brassicaceae. Genome Biol. Evol. 2015;7:720–34.

728 61. Bellieny-Rabelo D, Oliveira AEA, Venancio TM. Impact of Whole-Genome and Tandem

729 Duplications in the Expansion and Functional Diversification of the F-Box Family in Legumes

730 (Fabaceae). Nikolaidis N, editor. PLoS One. 2013;8:e55127.

731 62. Hammoudi V, Vlachakis G, Schranz ME, van den Burg HA. Whole-genome duplications

732 followed by tandem duplications drive diversification of the protein modifier SUMO in

733 Angiosperms. New Phytol. 2016;211:172–85.

734 63. Kondrashov FA, Kondrashov AS. Role of selection in fixation of gene duplications. J. Theor.

735 Biol. 2006;239:141–51.

30

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

736 64. Schranz ME, Mohammadin S, Edger PP. Ancient whole genome duplications, novelty and

737 diversification: the WGD Radiation Lag-Time Model. Curr. Opin. Plant Biol. 2012;15:147–53.

738 65. Engel A, Gaub HE. Structure and mechanics of membrane proteins. Annu. Rev. Biochem.

739 2008;77:127–48.

740 66. Flower DR. The lipocalin protein family: structure and function. Biochem. J. 1996;318:1–14.

741 67. McLachlan AD. Protein Structure and Function. Annu. Rev. Phys. Chem. 1972;23:165–92.

742 68. Klingenberg M. Membrane protein oligomeric structure and transport function. Nature.

743 1981;290:449–54.

744 69. Friedman JM. Structure, dynamics, and reactivity in hemoglobin. Science. 1985;228:1273–80.

745 70. Ewart K V., Lin Q, Hew CL. Structure, function and evolution of antifreeze proteins. Cell. Mol.

746 Life Sci. C. 1999;55:271–83.

747 71. Arévalo-Pinzón G, Curtidor H, Muñoz M, Patarroyo MA, Bermudez A, Patarroyo ME. A single

748 amino acid change in the Plasmodium falciparum RH5 (PfRH5) human RBC binding sequence

749 modifies its structure and determines species-specific binding activity. Vaccine. 2012;30:637–46.

750 72. Schaefer C, Rost B. Predict impact of single amino acid change upon protein structure. BMC

751 Genomics. 2012;13:S4.

752 73. Flower DR. Multiple molecular recognition properties of the lipocalin protein family. J. Mol.

753 Recognit. 1995;8:185–95.

754 74. Skerra A. Engineered protein scaffolds for molecular recognition. J. Mol. Recognit.

755 2000;13:167–87.

756 75. Salzburger W. The interaction of sexually and naturally selected traits in the adaptive radiations

757 of cichlid fishes. Mol. Ecol. 2009;18:169–85.

758 76. Rogers RL, Shao L, Thornton KR. Tandem duplications lead to novel expression patterns

759 through exon shuffling in Drosophila yakuba. Begun DJ, editor. PLOS Genet. 2017;13:e1006795.

760 77. Chen Y, Jia L, Wei C, Wang F, Lv H, Jia J. Association between polymorphisms in the

31

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

761 apolipoprotein D gene and sporadic Alzheimer’s disease. Brain Res. 2008;1233:196–202.

762 78. Helisalmi S, Hiltunen M, Vepsäläinen S, Iivonen S, Corder EH, Lehtovirta M, et al. Genetic

763 variation in apolipoprotein D and Alzheimer's disease. J. Neurol. 2004;251:951–7.

764 79. Waldner A, Dassati S, Redl B, Smania N, Gandolfi M. Apolipoprotein D Concentration in

765 Human Plasma during Aging and in Parkinson’s Disease: A Cross-Sectional Study. Parkinsons.

766 Dis. 2018;2018:1–7.

767 80. Wellenreuther M, Svensson EI, Hansson B. Sexual selection and genetic colour polymorphisms

768 in animals. Mol. Ecol. 2014;23:5398–414.

769 81. CARLETON KL, PARRY JWL, BOWMAKER JK, HUNT DM, SEEHAUSEN O. Colour

770 vision and speciation in Lake Victoria cichlids of the genus Pundamilia. Mol. Ecol. 2005;14:4341–

771 53.

772 82. Flamarique IN, Bergstrom C, Cheng CL, Reimchen TE. Role of the iridescent eye in

773 stickleback female mate choice. J. Exp. Biol. 2013;216:2806–12.

774 83. Bystriansky JS, Schulte PM. Changes in gill H+-ATPase and Na+/K+-ATPase expression and

775 activity during freshwater acclimation of Atlantic salmon (Salmo salar). J. Exp. Biol.

776 2011;214:2435–42.

777 84. McCormick SD, Bradshaw D. Hormonal control of salt and water balance in vertebrates. Gen.

778 Comp. Endocrinol. 2006;147:3–8.

779 85. Sakamoto T. Growth hormone and prolactin in environmental adaptation. Zoolog. Sci.

780 2003;20:1497–8.

781 86. Foskett JK, Bern HA, Machen TE, Conner M. Chloride cells and the hormonal control of teleost

782 fish osmoregulation. J. Exp. Biol. 1983;106:255–81.

783 87. Papetti C, Harms L, Windisch HS, Frickenhaus S, Sandersfeld T, Jürgens J, et al. A first insight

784 into the spleen transcriptome of the notothenioid fish Lepidonotothen nudifrons: Resource

32

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

785 description and functional overview. Mar. Genomics. 2015;24:237–9.

786 88. Huang Y, Chain FJJ, Panchal M, Eizaguirre C, Kalbe M, Lenz TL, et al. Transcriptome

787 profiling of immune tissues reveals habitat-specific gene expression between lake and river

788 sticklebacks. Mol. Ecol. 2016;25:943–58.

789 89. Muschick M, Indermaur A, Salzburger W. Convergent evolution within an adaptive radiation of

790 cichlid fishes. Curr. Biol. 2012;22:2362–8.

791 90. Green SA, Simoes-Costa M, Bronner ME. Evolution of vertebrates as viewed from the crest.

792 Nature. 2015;520:474–82.

793 91. Barlow-Anacker AJ, Fu M, Erickson CS, Bertocchini F, Gosain A. Neural Crest Cells

794 Contribute an Astrocyte-like Glial Population to the Spleen. Sci. Rep. 2017;7:45645.

795 92. Bailey AP, Bhattacharyya S, Bronner-Fraser M, Streit A. Lens Specification Is the Ground State

796 of All Sensory Placodes, from which FGF Promotes Olfactory Identity. Dev. Cell. 2006;11:505–17.

797 93. Salzburger W, Mack T, Verheyen E, Meyer A. Out of Tanganyika: genesis, explosive

798 speciation, key-innovations and phylogeography of the haplochromine cichlid fishes. BMC Evol.

799 Biol. 2005;5:17.

800 94. Charlesworth D. Evolution of recombination rates between sex chromosomes. Philos. Trans. R.

801 Soc. B Biol. Sci. 2017;372:20160456.

802 95. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids

803 Res. 2014;42:D749–55.

804 96. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA

805 alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–44.

806 97. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large

807 phylogenies. Bioinformatics. 2014;30:1312–3.

808 98. Soderlund C, Bomhoff M, Nelson W. SyMAP v3.4: a turnkey synteny system with application

809 to plant genomes. Nucleic Acids Res. 2011;39:e68.

33

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

810 99. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, et al. The genomic substrate for

811 adaptive radiation in African cichlid fish. Nature. 2014;513:375–81.

812 100. Rausch T, Zichner T, Schlattl A, Stutz AM, Benes V, Korbel JO. DELLY: structural variant

813 discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–9.

814 101. Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015.

815 Nucleic Acids Res. 2015;43:D257–60.

816 102. Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids

817 Res. 2018;46:D493–6.

818 103. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING

819 database in 2017: quality-controlled protein–protein association networks, made broadly accessible.

820 Nucleic Acids Res. 2017;45:D362–8.

821 104. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL:

822 modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids

823 Res. 2014;42:W252–8.

824 105. Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology

825 modeling using SWISS-MODEL workspace. Nat. Protoc. 2008;4:1–13.

826 106. Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based

827 environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201.

828 107. Guex N, Peitsch MC, Schwede T. Automated comparative protein structure modeling with

829 SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis. 2009;30:S162–

73. 830 73.

831 108. Birzele F, Gewehr JE, Csaba G, Zimmer R. Vorolign--fast structural alignment using Voronoi

832 contacts. Bioinformatics. 2007;23:e205–11.

833 109. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol.

834 2007;24:1586–91.

34

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

835 110. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood.

836 Comput. Appl. Biosci. 1997;13:555–6.

837 111. Anisimova M, Yang Z. Multiple hypothesis testing to detect lineages under positive selection

838 that affects only a few sites. Mol. Biol. Evol. 2007;24:1219–28.

839 112. McCurley AT, Callard G V. Characterization of housekeeping genes in zebrafish: male-female

840 differences and effects of tissue type, developmental stage and chemical treatment. BMC Mol. Biol.

841 2008;9:102.

842 113. Hibbeler S, Scharsack JP, Becker S. Housekeeping genes for quantitative expression studies in

843 the three-spined stickleback Gasterosteus aculeatus. BMC Mol. Biol. 2008;9:18.

844 114. Zhang Z, Hu J. Development and validation of endogenous reference genes for expression

845 profiling of medaka (Oryzias latipes) exposed to endocrine disrupting chemicals by quantitative

846 real-time RT-PCR. Toxicol. Sci. 2007;95:356–68.

847 115. Tao W, Sun L, Shi H, Cheng Y, Jiang D, Fu B, et al. Integrated analysis of miRNA and

848 mRNA expression profiles in tilapia gonads at an early stage of sex differentiation. BMC

849 Genomics. 2016;17:328.

850

35

bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A1 B2B2a B2b A2 B1 Astyanax mexicanus A1 A C Dario rerio B2 A1 Danio rerio Astyanax mexicanus Gadus morhua B2b A1 Cluster I Osmerus eperlanus Gasterosteus aculeatus B2a B2bs1 B2bs2 B2bs3 A1 Borostomias antarcticus Takifugu rubripes B2a B2b A1 Parasudis fraserbrunneri Tetraodon nigroviridis B2a Guentherus altivela Oreochromis niloticus B2a B2b A1 Benthosema glaciale Percopsis transmontana TGD Oryzias latipes B2a B2b A1 Typhlichthys subterraneus Poecilia formosa B2a B2ba1 A1 B2ba2 Polymixia japonica Xiphophorus maculatus B2a B2b A1 Neoteleostei Zeus faber Cyttopsis roseus Astyanax mexicanus A2 Paracanthopterygii Stylephorus chordatus Dario rerio A2 Trachyrincus scabrus Cluster II Gadus morhua A2 Zeiogadaria Trachyrincus murrayi Gasterosteus aculeatus A2s1 A2s2 B1 Ctenosquamata Muraenolepis marmoratus Takifugu rubripes A2 B1 Bathygadus melanobranchus Macrourus berglax Tetraodon nigrovirides A2 Gadariae Malacocephalus occidentalis Oreochromis niloticus A2t4 A2t1 B1 A2t2 A2t3 Melanonus zugmayeri Oryzias latipes A2m1 A2m2 A2m3 Merluccius merluccius Poecilia formosa A2 Merluccius capensis Phycis blennoides Xiphophorus maculatus A2 Acanthomorphata Gadiformes Trisopterus minutus Merlangius merlangus Lepisosteus oculatus B A Gadidae Arctogadus glacialis Latimeria chalumnae A 500bp Gadus morhua Brosme brosme Molva molva Regalecus glesne copy A B Lampris guttatus Lepisosteus oculatus Oreochromis niloticus 100 Neolamprologus brichardi cichlids specific duplication Maylandia zebra copy A2t1 Monocentris japonica Pundamilia nyererei Astatotilapia burtoni 100 Oreochromis niloticus Acanthopterygii Rondeletia loricata 100 Pundamilia nyererei 100 Maylandia zebra copy A2t2 Astatotilapia burtoni Acanthochaenus luetkenii 100 Oreochromis niloticus Neolamprologus brichardi Holocentrus rufus 100 Maylandia zebra Astatotilapia burtoni copy A2t3 100 Neoniphon sammara OreochromisPundamilianiloticus nyererei Euacanthomorphacea 66 100 Pundamilia nyererei Maylandia zebra copy A2t4 Myripristis jacobus Astatotilapia burtoni Neolamprologus brichardi 100 Oryzias latipes A2m2 Brotula barbata Oryzias latipes 98 Oryzias latipes A2m3 100 Poecilia formosa A2m1 Lamprogrammus exutus Xiphophorus maculatus 100 Gasterosteus aculeatus copy A2 Gasterosteus aculeatusA2s1 Carapus acus 95 Tetraodon nigroyirides A2s2 Takifugu rubripes Lesueurigobius cf. sanzi Gadus morhua 97 Astyanax mexicanus Dario rerio 97 Oreochromis niloticus Thunnus albacares 100 Neolamprologus brichardi 57 Astatotilapia burtoni Maylandia zebra Percomorphaceae Selene dorsalis Pundamilia nyererei 58 Gasterosteus aculeatus copy A1 89 Takifugu rubripes Helostoma temminckii Oryzias latipes 92 100 Poecilia formosa Anabas testudineus 100 Xiphophorus maculatus 81 Astyanax mexicanusGadus morhua Parablennius parvicornis Dario rerio Oreochromis niloticus 100 Pundamilia nyererei Maylandia zebra Chromis chromis Astatotilapia burtoni Neolamprologus brichardi Pseudochromis fuscus Takifugu rubripes copy B2b 100 100 Poecilia formosaB2ba2 Xiphophorus maculatus 51 XiphophorusPoeciliamaculatusformosa B2ba1 Oryzias latipes 94 Gasterosteus aculeatusB2bs1 Poecilia formosa 100 Gasterosteus aculeatusB2bs2 Gasterosteus aculeatus B2bs3 Neolamprologus brichardi Oryzias latipes 99 Pundamilia nyererei 82 100 Maylandia zebra Oreochromis niloticus OreochromisAstatotilapia niloticusburtoni Oryzias latipes copy B2a Symphodus melops 100 100 Xiphophorus maculatus 95 Poecilia formosa 95 Tetraodon nigroyirides Spondyliosoma cantharus 57 Takifugu rubripes Gasterosteus aculeatus 97 Gadus morhua Antennarius striatus AstatotilapiaDario rerioburtoni copy B2 Pundamilia nyererei Takifugu rubripes 80 100 Maylandia zebra 64 Neolamprologus brichardi 100 Oreochromis niloticus copy B1 Tetraodon nigrovirides Gasterosteus aculeatus Takifugu rubripes Lepisosteus oculatus Gasterosteus aculeatus Latimeria chalumnae copy B 0.2 Sebastes norvegicus bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A Syntenic analysis of genome region possessing ApoD clusters in teleost fishes with gar and chicken B ApoD clusters as the breakpoints of genome rearrangement beforeTGD

LG14 LG14 LG14 LG14 a. Genome rearrangement before TGD b. ApoD genes and their neighbours at the breakpoints LG19 LG19 LG19 LG19 si:dkey elavl1 gng7 LG10 LG10 LG10 chr 2 myo9b insra sppl2 57k2.6 phc3 skila nadkbsamd7 apod pex19 zgc101744 LG10 Dario rerio arhgef18a si:dkeyprkci gpr160sec62 apod copatspan37 diras1b hiat1a dapk357k2.7 slc1a8b gadd45bb group III arhgef18a gpr160 novel LG3 LG3 LG3 LG3 atcyab novel hiat1a dapk3 prkci sec62 apod apod apod lrrcc1 ackr4a Gasterosteus aculeatus sppl2 phc3 skila nadkbsamd7apod apod ralyl novel chr20 group XXI cluster I loxl5bmyo9b insra Cluster II LG9 chr2 Cluster II Cluster II chr 17 myo9b insra novel novel prkci novel samd7apod and2 novel novel ackr4anovel gar_LG9 Cluster I Oryzias latipes LG9 LG9 LG9 hiat1a sppl2 nadkb sec62 apod apod ralyl lrrcc1 e2f5 novel novel chr17 group III LG18 chr24 arhgef18a dapk3 sppl2 Cluster I Cluster I Cluster I Cluster II LG18 nmrk2 myo9b insra phc3 skila sec62 apod apod ralyl e2f5 Oreochromis niloticus hiat1a dapk3 prkci nadkbsamd7 apod and2lrrcc1ackr4a medaka-gar stickleback-gar tilapia-gar zebrafish-gar loxl5b arhgef18a TGD ApoD clusters genome rearrangement skilb rpl22l1 lrrc34 apod myeov2 tfr1b klhl24b chr9 chr9 chr9 chr9 with ApoD clusters as the breakpoints larp4bmsl2bnceh1bnovel ghsrb tnikb cldn11b tmtopsb map3k15 chr28 chr28 chr 24 chr28 chr28 Dario rerio stag1b nceh1b pld1b eif5a novel lrrc31 mynn otos and1 tnk2accl20b dip2ca tnfsf10 fndc3bb slc7a14b sh3kbp1 chr8 rad51b um_ chr8 chr8 chr8 group XXI rdh12l mastl abi1a gad2anxa13thnsl1sam821apod otos and1 tnk2a sh3kbp1 cnksr2a Gasterosteus aculeatus arhgap21 tfr1b novel chr1 chr1 ca2 acbd5a pdss1 gpr158a apod myeov2 klhl34 chr1 chr1 cluster II novel myo3a enkur tmtopsb map3k15 um_ apod chr 20 yme1l1a anxa13 thnsl1 apod nceh1b ralyl rdh12l acbd5apdss1 gad2 sam821 myeov2novel tnk2a tfr1b chr20 group XXI LG9 chr2 Oryzias latipes Cluster II Cluster II Cluster I apbb1ip novel enkur apod otos novel novel Cluster II novel novel mastl abi1a myo3a arhgap21 tmtopsb tnfsf10 um_ chr2 chr2 chr2 chr2 LG 9 rdh12l myo3a thnsl1 sh3kbp1 cnksr2a ralyl mastl abi1aapbb1ip gpr158a sam821apod apod otos and1 novel ccl20b chr17 group III LG18 chr24 Oreochromis niloticus arhgap21 tfr1b tlr22 Cluster I Cluster I Cluster I Cluster II ca2 acbd5apdss1gad2 anxa13novel apod myeov2 yme1l1a enkur apod tmtopsb map3k15

medaka-chicken stickleback-chicken tilapia-chicken zebrafish-chicken LG10 LG14 LG19 LG3 LG9 skila nadkbsamd7 lrriq4mynn apod otos and1 novel tnk2b novel novel ccl34b Lepisosteus oculatus prkci gpr160sec62 lrrc31lrrc34novel apod myeov2 tmtopsb novel adipoqa tfr1b ccl20 ApoD clusters as the breakpoints note: the schematic lengths here do not represent the real lengths

Inversion occurred again with ApoD clusters as the breakpoints in cichlid fishes a. Schematic of ApoD protein domains D C common 5’ signal peptide lipocalin 3’ apoa1+MAPK Cluster I ca. 16AA tilapia approx. 20AA approx. 144AA Pundamilia nyererei b. Duplicated ApoD genes binding opportunities prediction platyfish MAPK+forkhead fugu haplochromine pla2g15 lcat cod Maylandia zebra A2 foxo3a stickleback copy A1 mapk9 copy A2 mapk10foxo1a zg16 medaka Astatotilapia burtoni foxo1b mapk8a LG18 foxo3b zebrafish foxo5mapk8b MAPK signalling pathway forkhead transcription factors coelacanth Oreochromis niloticus copy B1 coelacanth 40 kb pla2g15 lcat apoa1 B1 pla2g15 lcat Cluster II foxo3a mapk9 A foxo1a mapk9 jag1 mapk10 zg16 foxo1 jag2 Pundamilia nyererei foxo3bmapk8a anxa11 foxo1b mapk8 mapk8b MAPK signalling pathway MAPK signalling pathway haplochromine Maylandia zebra forkhead transcription factors forkhead transcription factors copy B2a copy B2b Astatotilapia burtoni LG9 pla2g15 lcat foxo3a B2a Oreochromis niloticus mapk9 mapk10 foxo1a mapk8b zg16 foxo3b foxo1b mapk8a foxo5 30 kb MAPK signalling pathway forkhead transcription factors zebrafish LG ApoD cluster inversion copy A1 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Protein 3D structure modelling and cluster analyses of ApoD genes

A. loop 3 B. Cluster analyses based on the whole protein 3D structure opening side loop 1 B2b B2bs2 loop 7 B2bs1 zeb_B2 loop 5 B2a B1 cup-like A2m2 center A2m3

A2t

loop 4 loop 6 bottom A1 loop 2 A2 sites under positive selection under BEB with posterior probability > 95% A2s 0.8 C. Cluster analyses based on loops 1, 3, 5, 7 at the opening side D. Cluster analyses based on loops 2, 4, 6 at the bottom B2bB2a A1 A1 B2a B1 meda_B2b A2 A1 A2

zeb_B2 A2 B2

B2b B2a B1 B2b A1 A2 0.7 B2a 0.7 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Positive selection detection under branch-site model within codeml in PAML

�>1 A1 A1 �>1 A2s2 �>1 A2s1 A1 >1 � A2 A2 B1 B2bs1 B2bs2 B2 B2b B2bs3 B2a Dario rerio 0.2 Gadus morhua 0.3 Gasterosteus aculeatus 0.2

�>1 A2t2 A2m3 A1 >1 �>1 � >1 �>1 � A2t4 A2m2 A2 �>1 A2t3 A2m1 B1 A2t1 A1 A1 B2a B2b �>1 >1 B2b B2a � B2b B1 B2a Oryzias latipes Takifugu rubripes 0.4 cichlid fishes 0.3 0.3

A1 A1 A2 A2 B2a B2a �>1B2ba1 �>1 �>1 �>1B2ba2 B2b Poecilia formosa Xiphophorus maculatus 0.3 0.3 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Amino acids alignment of ApoD genes among species with protein 3D structure annotation Dario rerio Astyanax mexicanus Gadus morhua Gasterosteus aculeatus Takifugu rubripes copy A1 Oreochromis niloticus Oryzias latipes Xiphophorus maculatus Poecilia formosa Dario rerio Astyanax mexicanus Gadus morhua Gasterosteus aculeatus_A2s1 Gasterosteus aculeatus_A2s2

Takifugu rubripes Tetraodon nigroyirides Oreochromis niloticus_A2t1 Oreochromis niloticus_A2t2 copy A2 Oreochromis niloticus_A2t3

Oreochromis niloticus_A2t4

Oryzias latipes_A2m1 Oryzias latipes_A2m2 Oryzias latipes_A2m3

Xiphophorus maculatus Poecilia formosa Gasterosteus aculeatus Takifugu rubripes Tetraodon nigroyirides copy B2a Oreochromis niloticus Oryzias latipes Poecilia formosa Xiphophorus maculatus Gadus morhua Gasterosteus aculeatus_B2bs1 Gasterosteus aculeatus_B2bs2 loop Gasterosteus aculeatus_B2bs3 Takifugu rubripes copy B2b Oreochromis niloticus strand Oryzias latipes Poecilia formosa_B2ba1

Poecilia formosa_B2ba2 sites under positive selection under Bayes empirical Bayes Xiphophorus maculatus with posterior probability > 0.95 Gasterosteus aculeatus Takifugu rubripes copy B1 Oreochromis niloticus Dario rerio copy B2 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. a. a. Expression patterns of of ApoD genes in different teleost fishes Cluster I Cluster II haplochromine Astatotilapia burtoni B2a B2b A1 A2t4 A2t2 A2t3 A2t1 B1 skin eye cichlid fishes LPJ LPJ LPJ LPJ liver anal fin pigmentation

Oreochromis niloticus B2a B2b A1 A2t4 A2t2 A2t3 A2t1 B1 5dh gonad 5dh gonad skin eye LPJ gill gill gill eye liver

Oryzias latipes B2a B2b A1 A2m1 A2m2 A2m3 Percomorphaceae eye skin eye gill gill gill ovary testis eye Acanthomorphata Gasterosteus aculeatus B2a B2bs1 B2bs2 B2bs3 A1 A2s1 A2s2 B1 spleen spleen spleen skin eye gill TGD liver liver liver Gadus morhua B2b A1 A2

ovary liver gill

Astyanax mexicanus A1 A2 eye ovary gill ovary skin eye

Dario rerio B2 A1 A2 skin skin skin testis testis ovary eye eye gill Lepisosteus oculatus B A skin eye gill liver testis brain anal fin pigmentation skin eye gill LPJ ovary testis brain spleen liver b. Oreochromis niloticus_A2t4 Astatotilapia burtoni_A2t4 A2t4 Oreochromis niloticus_A2t2 A2t2 Astatotilapia burtoni_A2t2 Oreochromis niloticus_A2t3 A2t3 c. Astatotilapia burtoni_A2t3 Oreochromis niloticus_A2t1 Highly specific expression of copy B2a and B2b Astatotilapia burtoni_A2t1 Oryzias latipes_A2m1 Oryzias latipes_A2m2 at early developmental stages of gonads in tilapia Oryzias latipes_A2m3 A2 Gasterosteus aculeatus_A2s1 Gasterosteus aculeatus_A2s2 140 Gadus morhua_A2 Dario rerio_A2 120 Astyanax mexicanus_A2 Oreochromis niloticus_A1 100 Astatotilapia burtoni_A1 Oryzias latipes_A1 Gasterosteus aculeatus_A1 80 A1 Gadus morhua_A1 Dario rerio_A1 60 Astyanax mexicanus_A1 Lepisosteus oculatus_A A 40 Oreochromis niloticus_B2a Astatotilapia burtoni_B2a 20 Oryzias latipes_B2a B2a Gasterosteus aculeatus_B2a Oreochromis niloticus_B2b 0 Astatotilapia burtoni_B2b A1 B2a B2b A2t1 A2t2 A2t3 A2t4 B1 Oryzias latipes_B2b Gasterosteus aculeatus_B2bs1 Gasterosteus aculeatus_B2bs2 B2b Gasterosteus aculeatus_B2bs3 5dahovary 5dahtestis 90dahovary 90dahtestis 180dahovary 180dahtestis Gadus morhua_B2b Dario rerio_B2 B2 Oreochromis niloticus_B1 Astatotilapia burtoni_B1 B1 Gasterosteus aculeatus_B1 Lepisosteus oculatus_B B