Genes in Teleost Fishes
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
1
2
3 Cluster expansion of apolipoprotein D (ApoD) genes in teleost fishes
4 Langyu Gu1,2*, Canwei Xia3
5
6 1Key Laboratory of Freshwater Fish Reproduction and Development, Ministry of Education,
7 Laboratory of Aquatic Science of Chongqing, School of Life Sciences, 400715, Southwest
8 University, Chongqing, China. [email protected]
9 2Zoological Institute, University of Basel, Vesalgasse 1, 4051, Basel, Switzerland.
10 3Ministry of Education Key Laboratory for Biodiversity and Ecological Engineering, College of
11 Life Sciences, Beijing Normal University, Beijing, China. [email protected]
12 13 14
15
16
17 Corresponding author:
18 *Langyu Gu
20
21
22
23
24
1
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
25 Abstract
26 Background: Gene and genome duplication play important roles in the evolution of gene function.
27 Compared to individual duplicated genes, gene clusters attract particular attentions considering their
28 frequent associations with innovation and adaptation. Here, we report for the first time the
29 expansion of the ligand (e.g., pheromone and hormone)-transporter genes, apolipoprotein D (ApoD)
30 genes in a cluster, specific to teleost fishes.
31
32 Results: The single ApoD gene in the ancestor expands in two clusters with a dynamic evolutionary
33 pattern in teleost fishes. Based on comparative genomic and transcriptomic analyses, protein 3D
34 structure comparison, evolutionary rate detection and breakpoint detection, orthologous genes show
35 conserved expression patterns. Lineage-specific duplicated genes that are under positive selection
36 evolved specific and even new expression profiles. Different duplicates show high tissue-specific
37 expression patterns (e.g., skin, eye, anal fin pigmentation patterns, gonads, gills, spleen and lower
38 pharyngeal jaw). Cluster analyses based on protein 3D structure comparisons, especially the four
39 loops at the opening side, show segregation patterns with different duplicates. Duplicated ApoD
40 genes are predicted to be associated with forkhead transcription factors and MAPK genes, and
41 they are located next to the breakpoints of genome rearrangements.
42
43 Conclusions: Here, we report the expansion of ApoD genes specific to teleost fishes in a cluster
44 manner for the first time. Neofunctionalization and subfunctionalization were observed at both
45 protein and expression levels after duplication. Evidence from different aspects, i.e. abnormal
46 expression induced disease in human, fish-specific expansion, predicted associations with forkhead
47 transcription factors and MAPK genes, highly specific expression patterns in tissues related to
48 sexual selection and adaptation, duplicated genes that are under positive selection, and their
49 locations next to breakpoints of genome rearrangement, suggests the potential advantageous roles of 2
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
50 ApoD genes in teleost fishes. Cluster expansion of ApoD genes specific to teleost fishes thus
51 provides an ideal evo-devo model for studying gene duplication, cluster maintenance and new gene
52 function emergence.
53
54 Key words
55 apolipoprotein D (ApoD), duplication, gene cluster, positive selection, breakpoints, teleost fishes
56
57 Background
58
59 Gene and genome duplication play important roles in evolution by providing new genetic
60 materials [1]. The gene copies emerging from duplication events (including whole genome
61 duplications (WGD)) can undergo different evolutionary fates, and a number of models have been
62 proposed as to what can happen after duplication [2]. In many instances, one of the duplicates
63 becomes silenced via the accumulation of deleterious mutations (i.e. pseudogenization or
64 nonfunctionalization [1]). Alternatively, the original pre-duplication function can be subdivided
65 between duplicates (i.e. subfunctionalization) [3], or one of the duplicates can gain a new function
66 (i.e. neofunctionalization) [4]. Since the probability of accumulating beneficial substitutions is
67 relatively low, examples for neofunctionalization are sparse. There are, nevertheless, examples for
68 neofunctionalization. For example, the duplication of dachshund in spiders and allies has been
69 associated with the evolution of a novel leg segment [5]; the expansion of repetitive regions in a
70 duplicated trypsinogen-like gene led to functional antifreeze glycoproteins in Antarctic notothenioid
71 fish [6]; and the duplication of opsin genes is implicated in trichromatic vision in primates [7].
72 Another selective advantage of gene duplication can be attributed to the increased numbers of gene
73 copies, e.g. in the form of gene dosage effects [8,9]. Multiple mechanisms can act together to shape
74 different phases of gene evolution after duplication [10]. 3
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
75 Gene functional changes after duplication can occur at the protein level [6,11,12]. For
76 example, the physiological division of labour between the oxygen-carrier function of haemoglobin
77 and the oxygen-storage function of myoglobin in vertebrates [13] (subfunctionalization); and the
78 acquired enhanced digestive efficiencies of duplicated gene encoding of pancreatic ribonuclease in
79 leaf monkeys [14] (neofunctionalization). However, the chance to accumulate beneficial alleles is
80 low, and thus, functional changes after duplication at the protein level are sparse. Instead, changes
81 in the expression level are more tolerable and efficient, since they do not require modification of
82 coding sequences, and they can immediately offer phenotypic consequences. For example,
83 complementary degenerative mutation in the regulatory regions of duplicated genes is a common
84 mechanism for subfunctionalization [2,15]. Many examples have provided evidence that duplicated
85 genes acquiring new expression domains are linked to neofunctionalization (e.g., dac2, a novel leg
86 segment in Arachnid [5]; elnb, bulbus arteriosus in teleost fishes [16]; and fhl2b, egg-spots in
87 cichlid fishes [17]).
88
89 In some cases, gene clusters resulting from gene duplication have attracted considerable
90 attention, such as Hox gene clusters [18], globin gene clusters [19], paraHox gene clusters [20],
91 MHC gene clusters [21] and opsin gene clusters [22]. Duplicated genes in clusters are usually
92 related to innovation and adaptation [11,22,23], suggesting advantageous roles during evolution.
93 The expansion of gene clusters can be traced back to WGD and tandem duplication [23,24].
94 Compared to two rounds of WGD occurring before the split between cartilaginous and bony fish
95 [25], the ancestor of teleost fishes experienced another round of WGD (teleost genome duplication,
96 TGD) after its divergence from non-teleost actinopterygians, including Bichir, Sturgeon, bowfin
97 and spotted gar [26,27]. This extra TGD provides an additional opportunity for gene family
98 expansion in fishes [22,28–30].
99
4
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
100 Genome rearrangements have been suggested to occur frequently in teleost fishes [31]. If
101 genome rearrangements can capture locally adapted genes or antagonist sex determining genes by
102 reducing recombination, the rearranged genome can promote divergence and reproductive isolation
103 [32,33], and thus contribute to speciation and adaptation. Examples can be found in butterflies [34],
104 mosquitoes [35] and fish [36]. This occurs especially when advantageous genes are located next to
105 the breakpoints of a genome rearrangement due to its low recombination rates [37,38]. Considering
106 that the expansion of gene clusters is usually adaptive (as mentioned above) and is linked to
107 genome instability [33,39,40], it will be interesting to investigate the roles of gene clusters located
108 next to the breakpoints of genome rearrangements. However, related studies are sparse.
109
110 Here, we report, for the first time, the expansion of the gene apolipoprotein D (ApoD) in
111 teleost fishes. The ApoD gene belongs to the lipocalin superfamily of lipid transport proteins
112 [41,42]. In humans, ApoD is suggested to function as a multi-ligand, multifunctional transporter
113 (e.g., hormone and pheromone) [42,43], which is important in homeostasis and in the housekeeping
114 of many organs [43]. It is expressed in multiple tissues, most notably in the brain and testis (see e.g.
115 [42,44,45]), and it has been suggested to be involved in the central and peripheral nervous systems
116 [42]. Interestingly, the ApoD gene is duplicated in a cluster manner in two chromosomes in
117 different teleost fishes (http://www.genomicus.biologie.ens.fr/genomicus-91.01/cgi-bin/search.pl).
118 However, no detailed analyses have been reported. Here, we investigated the evolutionary history
119 of ApoD genes in fishes for the first time.
120
121 Results
122
123 1. In silico screen and phylogenetic reconstruction of ApoD genes
124 5
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
125 To investigate the expansion of ApoD genes, we performed phylogenetic reconstruction
126 with high quality assembly genome data (Figures 1A and 1B). There is one ApoD gene in
127 coelacanth (Latimeria chalumnae), and two copies (A and B) in spotted gar (Lepisosteus oculatus),
128 which are located in one cluster. Different numbers of ApoD genes are located in two clusters in
129 different teleost fishes, i.e., with two copies in cavefish (Astyanax mexicanus) (A1 and A2) and in
130 pufferfish (Tetraodon nigroviridis) (B2a and A2); three copies (A1, A2, B2) in zebrafish (Danio
131 rerio) and in cod (Gadus morhua) (A1, A2 and B2b); four copies in platyfish (Xiphophorus
132 maculatus) (A1, A2, B2a and B2b); five copies in Amazon molly (Poecilia formosa) (A1, A2, B2a,
133 B2ba1, B2ba2) and in fugu (Takifugu rubripes) (A1, A2, B1, B2a, B2b); six copies in medaka
134 (Oryzias latipes) (A1, A2m1, A2m2, A2m3, B2a, B2b); and eight copies in stickleback (Gasterosteus
135 aculeatus) (A1, A2s1, A2s2, B1, B2a, B2bs1, B2bs2, B2bs3) and in tilapia (Oreochromis niloticus)
136 (A1, A2t1, A2t2, A2t3, A2t4, B1, B2a, B2b). Noticeably, although the copy B2b of cod is clustered
137 within the B2a clade based on a maximum likelihood (ML) tree, the bootstrap value is very low
138 which could be due to recent duplication (Figure 1B). Considering its gene direction and syntenic
139 with B2b genes in other species (Figure 1A), we named it copy B2b. Sequence alignment for tree
140 construction can be found in Additional file 1.
141
142 To further retrieve the evolutionary history of ApoD genes in fishes, we performed an in
143 silico screen across the whole phylogeny of teleost fishes using draft genomes (Figure 1C). No
144 ApoD gene was detected in Stylephorus chordatus. Unlike copy A1 which shows no lineage-
145 specific duplication, copy A2 exhibits variable lineage-specific copies in different fishes, with the
146 highest lineage-specific duplication present in the cichlid fish tilapia (four copies). Copy B1 is
147 absent in the clade of Gadiformes but is kept in Acanthopterygii. Species in Danio rerio, Osmerus
148 eperlanus and Parasudis fraserbrunneri possess copy B2. The co-existence of B2a band is B2
149 common in Percomorphaceae. The largest numbers of lineage-specific duplicated ApoD genes were
6
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
150 found in tilapia (copy A2), medaka (copy A2) and stickleback (copy B2b) in Percomorphaceae. The
151 predicted ApoD gene sequences can be found in Additional file 2.
152
153 To infer the relationship between ApoD gene duplication and TGD, syntenic analyses were
154 conducted, revealing that two paralogous regions in teleost fishes correspond to one ohnolog region
155 in spotted gar and in chicken (connected by the same coloured lines in Figure 2A). For example, the
156 region with yellow colour in linkage group (LG) 9 in spotted gar (left above), or in chromosome
157 (chr) 2 in chicken (left down) have paired paralogous regions (connected by yellow lines) in chr17
158 (ApoD cluster I located on) and in chr20 (ApoD cluster II located on) in medaka. Similarly, the
159 region with purple colour in LG3 in spotted gar and in chr1 in chicken has paired paralogous
160 regions (connected by purple lines) in chr17 and in chr20 in medaka.
161
162 2. ApoD clusters next to the breakpoints of genome rearrangements
163
164 Syntenic analyses clearly show that more than one chromosome in spotted gar and in
165 chicken are syntenic to the corresponding paralogous regions where ApoD clusters are located in
166 fishes. For example, chromosomes (or LGs) containing ApoD clusters (chr17 and chr 20 in medaka,
167 LG III and LG XXI in stickleback, LG9 and LG18 in tilapia, chr2 and chr24 in zebrafish, Figures
168 2A and 2B) are syntenic to chromosomes LG10, LG9, LG19, LG3 and LG14 in spotted gar and
169 chr8, chr2, chr28, chr1 and chr9 in chicken (Figures 2A and 2B). Noticeably, ApoD clusters are
170 located next to the breakpoints of genome rearrangements (Figures 2A and 2B). The rearranged
171 segments (approx. 80 kb to 200 kb syntenic with LG14 in spotted gar, Figure 2B) include ApoD
172 clusters and their neighbour genes (e.g., and2, samd7, sec62, nadkb, gpr160, skila, prkci and phc3
173 for Cluster I; otos, myeov2, and1, tmtopsb, tnk2a and tfr1b for Cluster II) (Figure 2B). Analyses of
174 available cichlid genomes further show that inversion of the segments (approx. 600 kb to 800kb) 7
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
175 with ApoD clusters as the breakpoints also occurred in the species belonging to the most species-
176 rich lineage, the haplochromine lineage in cichlid fishes, supported by split-read analysis using
177 Delly (Figure 2C).
178
179 3. Protein-protein association predictions and protein 3D structure comparisons
180
181 To assess the biological functions of different ApoD genes, protein-protein associations
182 were predicted. Protein domain architecture analyses revealed that ApoD proteins in different fishes
183 are composed of approx. 20 amino acids as the signal peptide and approx. 144 amino acids as the
184 lipocalin domain (Figure 2D). The common associations of ApoD genes are with the immunity-
185 related gene pla2g15 [47,48], the high-density lipoprotein biogenesis-related gene lcat [49],
186 different copies of zg16 related to pathogenic fungi recognition [50] and MAPK genes [51,52].
187 Different ApoD genes in teleost fishes can be subdivided into two classes based on the association
188 predictions. ApoD genes from one class (copy A2, copy B2a and copy B1) are associated with
189 forkhead transcription factors. ApoD genes from the other class (copy A1 and copy B2b) lost their
190 associations with forkhead transcription factors but are associated with the lipoprotein-related gene
191 apoa1 [53]. Noticeably, the only ApoD gene in coelacanth possesses both associations with
192 forkhead transcription factors and with apoa1. ApoD in coelacanth is also associated with genes
193 encoding ligands that can activate the Notch signalling pathway (jag1, jag2) [54], and the gene
194 belongs to the annexin family (anxaII) (Figure 2D). Noticeably, more members of forkhead
195 transcription factors and MAPK genes are associated with ApoD genes after duplication. Unlike in
196 other fishes, copy A1 in zebrafish is associated with bone resorption-related duplicated genes
197 (ostf1a, ostf1b, ostf1c) [55], the cell growth and division-related gene ppp2cb [56], the
198 neurodevelopment-related gene rab3gap2 [57] and the guanylate-binding gene gbp1 [58] (Figure
199 2D). 8
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
200
201 Homology protein structure modelling of different ApoD genes shows a conserved structure,
202 including a cup-like central part made by eight antiparallel β-sheet strands and two ends connected
203 by loops (a wide opening side formed by loops 1, 3, 5 and 7; and a narrow, closed bottom formed
204 by loops 2, 4 and 6) (Figure 3A). Interestingly, unlike the very conserved cup-like central part, the
205 loops are highly variable (Figure 3A). Cluster analyses based on the whole protein 3D structure
206 comparison can clearly segregate different duplicates, e.g., copy A1, copy A2, copy B2a and copy
207 B2b. Especially for lineage-specific duplicated genes which are clustered together. Copy B1s are
208 clustered together, nesting within the copy A2 clade (Figure 3B). The same analyses focused on
209 only the four loops (loops 1, 3, 5 and 7) at the opening side show a similar segregation pattern
210 (Figure 3C). However, cluster analyses focused on only the three loops (loops 2, 4 and 6) at the
211 bottom side did not show a segregation pattern (Figure 3D). Details about the PDB files and the
212 cluster results can be found in Additional file 3.
213
214 4. Positive selection detection on ApoD genes
215
216 Positive selection was detected in many ApoD genes before and after duplication, for
217 example, the branch of copy A2 in zebrafish, the ancestral branch of copies B2a and B2b in fugu,
218 medaka and platyfish; and, especially, in lineage-specific duplicated genes, such as in stickleback,
219 cichlid fishes, and amazon molly (Figure 4 and Additional file 4). Positive selection sites under
220 Bayes empirical Bayes (BEB) with posterior probability >95% were given in Figure 5 and
221 Additional file 4. Noticeably, most sites under positive selection are located on the highly variable
222 loops or on the connections between loops and the cup-like central part at the opening side (loops 3
223 and 5) (Figures 3A and 5). Our strategy here is relatively conservative. As we can see, if we only
224 focus on one prior hypothesis on a particular branch without doing multiple test correction, we will 9
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
225 find more branches with dN/dS that are significantly larger than 1 (p<0.05) (Additional file 4).
226 However, we chose to use our strategy to make the analyses more rigorous. Even with this strict but
227 comprehensive strategy, our results still show that positive selection occurred in branches before
228 and after ApoD gene duplication, especially for lineage-specific duplicated genes. Details about the
229 codon alignment for the PAML analyses can be found in Additional file 4.
230
231 5. Gene expression profile detection of ApoD genes in different species
232
233 Copy A1 is mainly expressed in the skin and the eye. Noticeably, copy A1 is also highly
234 expressed in the novel anal fin pigmentation patterns of a cichlid fish, Astatotilapia burtoni, which
235 is in consistent with our previous study [59]. Copy A2 shows redundant expression patterns in the
236 gills, eye, skin and gonads in zebrafish and cavefish, but its orthologous A2 genes including
237 lineage-specific duplicated ones in the species of Acanthomorphata, which we test here (cod,
238 stickleback, medaka, tilapia and A. burtoni), are expressed in the gills and in the related apparatus,
239 the lower pharyngeal jaw. Copy B2 in zebrafish shows redundant expression profiles (in the skin
240 and gonads) overlapping with the profiles of copies A1 and A2. Copies B2a and B2b show specific
241 expression patterns in different fishes of Acanthomorphata, which we test here. For example, copy
242 B2b is expressed in the ovary and in the liver in cod, while B2a and B2bs are mainly expressed in
243 the spleen and in the liver of the stickleback. The expression of B2a and B2b was not detected in
244 adult medaka, tilapia or A. burtoni. Interestingly, based on the available transcriptomic data for
245 different developmental stages of gonads in tilapia, we found that B2a and B2b are highly
246 specifically expressed at the early developmental stage of gonad tissues (five days after hatching,
247 5dah) (Figure 6). Copy B1 is highly specifically expressed in the liver in tilapia and in A. burtoni
248 but not in stickleback in which B1 is inverted (Figure 1A). Noticeably, no expression profile was
249 detected for copy B in gar, at least in the tissues we test here. Instead, expression profiles including 10
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
250 the skin, eye, gill, liver, testis and brain were detected for copy A in gar. Details can be found in
251 Figure 6 and Additional file 5.
252
253 Discussion
254
255 Dynamic evolution of ApoD genes in teleost fishes
256
257 Here, we report the cluster expansion of ApoD genes specific to teleost fishes for the first
258 time. Phylogenetic reconstruction and syntenic analyses clearly show the expansion of ApoD genes
259 in two clusters with lineage-specific tandem duplications after TGD. The interplay between genome
260 duplication and tandem duplication to prompt the expansion of the gene family has already been
261 shown in a few studies [60–62]. It has been suggested that the fixation of duplications is much more
262 common in genome regions where the rates of mutations are elevated due to the presence of
263 already-fixed duplication, which is the so-called “snow-ball” effect [63]. Both genome duplication
264 and tandem duplication can produce raw genetic materials to fuel the diversification of teleost
265 fishes at a later stage, e.g., morphological complexity and ecological niche expansion, which is
266 compatible with the “time lag model” [64]. Neofunctionalization and subfunctionalization after
267 duplication are important steps to realize this diversification, and both can be detected in ApoD
268 duplicates.
269
270 Functional divergence of ApoD genes occurred at the protein level. On the one hand, based
271 on protein-protein association predictions, although common associations with MAPK genes were
272 shared among ApoD genes, subdivided associations with forkhead transcription factors and apoa1
273 were detected in different paralogous ApoD genes, indicating subfunctionalization. Noticeably,
274 more members of forkhead transcription factors and MAPK genes are associated with ApoD genes 11
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
275 after duplication. In this case, neofunctionalization could also occur but is not limited to the dosage
276 effects. On the other hand, divergence was also detected at the protein structure level. Protein
277 structures are often related to functional divergence [65–68]. New fold can even evolve novel
278 function [69,70]. Even a few residues mutations can induce structural changes [71,72]. Therefore,
279 structure-based inference is important for understanding molecular function. Our protein 3D
280 structure modelling showed a conserved backbone conformation in spite of sequence divergence.
281 Indeed, this is a feature of the lipocalin family [66], to which ApoDs belong [41,42,73]. The cup-
282 like central part is used to transport large varieties of ligands, such as hormones and pheromones
283 [73]. Interestingly, cluster analyses based on the whole protein structure or only on the four loops at
284 the opening side can segregate different duplicates, although not for copy B1. Considering that copy
285 B1 has been lost multiple times, its function might be not necessary and could share functions with
286 other copies; thus, it is not surprising if they are clustered together within the copy A2 clade. The
287 cluster results of two other individual genes (copy B2 of zebrafish and B2b of medaka) do not affect
288 the general segregation pattern. Noticeably, unlike the four loops at the opending side, cluster
289 analyses based on only the three loops (loops 2, 4 and 6) at the bottom side did not show a clear
290 segregation pattern (Figure 3D). It has been suggested that the loops of lipocalin proteins can affect
291 ligand binding specificities, which are similar to the binding modes of antibodies [74]; and mediate
292 protein-protein interactions [66]. Their segregation with different duplicates might indicate
293 functional divergence occurred at the loops at the opening side. The evidence that most amino acids
294 under positive selection are located on the loops or the connections between loops and strands
295 (Figures 3A and 5, Additonal file 4) further indicates the potential important roles of these loops
296 during evolution. Actually, reshaping different parts of the protein structure is an efficient way to
297 produce functional divergence at the protein level in a short time [73], and this can be one of the
298 ways in which divergence at the protein level occurred in ApoD genes.
299
12
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
300 Functional divergence of ApoD genes also occurred at the expression level. Different ApoD
301 genes in zebrafish and cavefish exhibited redundant expression profiles but became more specific as
302 the numbers of tandem duplicates increased, indicating subfunctionalization. Noticeably, new
303 expression profiles were detected in duplicated paralogs, e.g., copy A1 in novel anal fin
304 pigmentation patterns and copy A2s in the lower pharyngeal jaw in A. burtoni, which is a
305 representative cichlid fish of the most species-richness haplochromine lineage. These two traits are
306 key innovations that are associated with adaptive radiation in cichlid fishes [75]. With expression
307 changes, duplication can be an important source for the emergence of novelty [76], especially if
308 they are adaptive. The relaxed selection pressure induced by the changing expression profiles can
309 even further prompt the accumulation of mutations at the protein level.
310
311 Potential advantageous roles of ApoD genes in teleost fishes
312
313 The ApoD gene has been thoroughly studied in humans and mice, and its abnormal
314 expression was reported to be related to human diseases, such as Parkinson’s disease and
315 Alzheimer’s disease [77–79]. Interestingly, as our study revealed here, ApoD dynamically expanded
316 in fishes in two clusters in different chromosomes. Furthermore, these expanded duplicates are
317 transcriptionally active. For example, they are highly expressed, and are restricted to different
318 tissues related to sexual selection (skin [80], eye [81,82], gonads, anal fin pigmentation pattern [59])
319 and adaptation (gills [83–86], spleen [87,88], lower pharyngeal jaw [75,89]). These tissues are also
320 derived from the neural crest, which is a key innovation in vertebrates [75,90–92]. Besides, many
321 duplicated ApoD genes are under positive selection, especially for lineage-specific duplicated genes.
322 Combined with their associations with MAPK genes and forkhead transcription factors, and their
323 functions as pheromone and hormone transporters, we report evidences that suggests the potential
324 importance of ApoD genes have in fishes. 13
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
325
326 Noticeably, two clusters exhibited distinct expression patterns. ApoD genes in cluster I in
327 most fishes are expressed in tissues related to sexual selection (skin, eye, gonad and anal fin
328 pigmentation), but in cluster II, ApoD genes are mainly expressed in tissues related to adaptation
329 (gills and lower pharyngeal jaw). Interestingly, both clusters were maintained with their
330 neighbouring genes and are located next to the breakpoints of genome rearangements during
331 evolution. An inversion with ApoD clusters next to the breakpoints occurred again in cichlid fishes
332 belonging to the haplochromine lineage, which is the most species-rich lineage [93]. If genome
333 rearrangements can capture locally adaptive genes, or genes related to sexual antagonism, it would
334 accelerate divergence by reducing recombination rates, thus prompting speciation and adaptation,
335 even the emergence of neo-sex chromosome [94]. Given that many ApoD genes are under positive
336 selection, their location at the breakpoints of genome rearrangements gives them more opportunities
337 to be involved in speciation and adaptation, which deserves further investigation.
338
339 Conclusions
340
341 Here, we report the cluster expansion of ApoD genes in a cluster manner specific to fishes
342 for the first time. Different evidences based on computational evolutionary analyses strongly
343 suggest the potentially advantageous roles of ApoD genes in fishes. An in-depth functional
344 characterization of ApoD genes could help consolidate a model for the study of subfunctionalization
345 and neofunctionalization. Moreover, finding the regulatory mechanisms behind the cluster
346 expansion of ApoD genes and the reason why the ApoD gene expanded specific to fishes remain
347 open questions. As more fish genomes with high assembly quality are available, especially closely
348 related species such as cichlid fishes, the roles of ApoD genes in fish speciation and adaptation
349 could be further investigated. Above all, the ApoD gene clusters reported here, provide an ideal 14
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
350 evo-devo model for studying gene duplication, cluster maintenance, and the gene regulatory
351 mechanism and their roles in speciation and adaptation.
352
353 Methods
354
355 In silico screen and phylogenetic reconstruction to infer gene duplication
356
357 To retrieve ApoD gene duplication in teleost fishes, we first extracted orthologs and
358 paralogs in fishes with available genome data from Ensembl (Release 84) [95] and the NCBI
359 database (https://www.ncbi.nlm.nih.gov/genome/). To confirm gene copy numbers, all orthologs
360 and paralogs were used as queries in a tblastx search against the corresponding genomes. For all
361 unannotated positive hits, a region spanning approx. 2 kb was extracted, and open reading frames
362 (ORF) were predicted using Augustus (http://augustus.gobics.de/) [96]. The predicted coding
363 sequences were then BLAST to the existing transcriptome database to retrieve the corresponding
364 cDNA sequences. The cDNAs were then re-mapped to the corresponding predicted genes to re-
365 check the predicted exon-intron boundary. Coding sequence of genes from human and spotted gar,
366 as well as their neighbour genes were used to perform tblastx against the genomes of lamprey
367 (Lampetra fluviatilis) and amphioxus (Branchiostoma belcheri). To infer gene duplication, a ML
368 analysis was performed with ApoD genes retrieved from available assembled genomes in RAxML
369 v8.2.10 [97] with the GTR+G model and ‘-f a’ option (which generates the optimal tree and
370 conducts 10,000 rapid bootstrap searches).
371
372 To further retrieve ApoDs in other fishes across the whole phylogeny with draft genomes
373 [21], all sequences retrieved above were used as queries in a tblastx search with the threshold e
374 value 0.001. The hit scaffolds were retrieved, and genes within scaffolds were predicted using
15
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
375 Augustus. The predicted ApoDs were then translated and re-aligned with known ApoD ORF to
376 further confirm the exon-intron boundary. All predicted orthologs and paralogs were used as a
377 query again in a tblastx search against the corresponding genome data until no more ApoD genes
378 were predicted.
379
380 Syntenic analyses and inversion detection
381
382 To further confirm the relationship between ApoD gene duplication and TGD, genes regions
383 adjacent to the duplicates and to the outgroup species that did not experience TGD (including
384 spotted gar and chicken) were retrieved. To this end, a window of 5 Mb around the ApoD clusters in
385 teleost fishes as well as the corresponding chromosomes in spotted gar and chicken were retrieved
386 from the Ensemble database and NCBI database. Syntenic analyses were performed with SyMap
387 [98]. To further detect the structural variation around ApoD genes in cichlid fishes, we retrieved the
388 available cichlid genomes raw data from [99]. The program Delly [100] based on paired-end split-
389 reads analyses was used to detect the inversion and its corresponding breakpoints of the segments
390 that contain the ApoD clusters in cichlid fishes, with tilapia as the reference.
391
392 Protein-protein interaction prediction, protein 3D structure modelling and comparison
393
394 To predict the biological functions of ApoD genes, the protein domains of ApoD genes and
395 protein-protein interactions were predicted with the Simple Modular Architecture Research Tool
396 (SMART) [101,102] and the stringdb database http://string-db.org/ [103]. To see whether there are
397 divergences at the protein level for different ApoDs, protein 3D structures were simulated with
398 Swiss-model [104–106] using the human ApoD crystal protein structure (PDB ID 2hzr) as the
399 template. The results were further visualized, evaluated and analysed with Swiss-PdbViewer [107].
16
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
400 To compare the protein structures, we first extracted and converted the corresponding information
401 in the PDB file using Swiss-PdbViewer. Protein 3D structure comparisons were then conducted
402 using Vorolign [108], which can compare closely related protein structures even when structurally
403 flexible regions exist [108]. The protein 3D structure comparisons and cluster analyes were
404 conducted with the ProCKSI-server (http://www.procksi.net/). Clustering results were further
405 visualized using Figtree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree). Copy B2ba2 of of amazon
406 molly was not included in the cluster analyses due to its relatively short sequence. The PDB files
407 were used as the input files. For the global protein structure comparisons, PDB files were extracted
408 directly from the simulation results of Swiss-model [104–106]. To only get PDB files of the loop
409 region, we revised the PDB files manually to get rid of the cup-like central part. To consider the
410 highly variable connections between the loops and the cup-like central part (Figure 3A), we also
411 included the structures of two more residues next to the loops. The resulting PDB files were further
412 checked using Swiss-PdbViewer.
413
414 Positive selection detection
415
416 To examine whether ApoD duplicates underwent adaptive sequence evolution, a branch-site
417 model was used to test positive selection affecting a few sites along the target lineages (called
418 foreground branches) in codeml within PAML [109,110]. The rates of non-synonymous to
419 synonymous substitutions (ω or dN/dS) with a priori partitions for foreground branches (PAML
420 manual) were calculated. Considering the very dynamic evolutionary pattern of ApoD genes, we
421 tested positive selection in every branch in each species. In this case, we designated each branch as
422 the foreground branch to run the branch-site model multiple times. Noticeably, there are two
423 different ways to designate the foreground branch in the branch-site model. One way is to designate
424 only the branch as the foreground branch. The other way is to designate the whole clade (all the
17
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
425 branches within the clade including the ancestral branch) as the foreground branch. We included
426 both perspectives in our data analyses (Additonal file 4). If the whole clade is under positive
427 selection, we will label all the branches within this clade including the ancestral branch with ω>1 in
428 the results (Figure 4).
429
430 All model comparisons in PAML were performed with fixed branch lengths (fix_blength =
431 2) derived under the M0 model in PAML. Alignment gaps and ambiguity characters were removed
432 (Cleandata = 1). A likelihood ratio test was used to test for statistical significance. In addition,
433 Bonferroni correction was conducted for multiple test correction [111]. The BEB was used to
434 identify sites that are under positive selection. Sites under positive selection under BEB with
435 posterior probability >0.95 are given in Figure 5 and Additional file 4.
436
437 Gene expression profile analyses
438
439 To see the expression profiles of ApoD duplicates in different fishes, raw transcriptomic
440 data from zebrafish, cavefish, cod, medaka, tilapia and A. burtoni were retrieved from NCBI
441 https://www.ncbi.nlm.nih.gov/ (Additional file 5). Raw reads were mapped to the corresponding
442 cDNAs (http://www.ensembl.org/) to calculate the RPKM (reads per kilobase per million mapped
443 reads) value. Quantitative polymerase chain reaction (qPCR) was used to detect the expression
444 profiles of different ApoD duplicates in the tissues that are not included in the existing
445 transcriptomes (Additional file 5). Prior to tissue dissection, specimens were euthanized with MS
446 222 (Sigma-Aldrich, USA) following an approved procedure (permit nr. 2317 issued by the
447 cantonal veterinary office, Switzerland; Guidelines for the Care and Use of Laboratory Animals
448 prescribed by the Regulation of Animal Experimentation of Chongqing, China). RNA isolation was
449 performed according to the TRIzol protocol (Invitrogen, USA). DNase treatment was performed
18
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
450 with the DNA-free™ Kit (Ambion, Life Technologies, USA). The RNA quantity and quality were
451 determined with a NanoDrop1000 spectrophotometer (Thermo Scientific, USA). cDNA was
452 produced using the High Capacity RNA-to-cDNA Kit (Applied Biosystems, USA). Housekeeping
453 gene elongation factor 1 alpha (elfa1) [112], ubiquitin (ubc) [113] and ribosomal protein L7 (rpl7)
454 [17] [114] were used as endogenous control. qPCR was performed on a StepOnePlusTM Real-Time
455 PCR system (Applied Biosystems, Life Technologies) using the SYBR Green Master Mix (Roche,
456 Switzerland) with an annealing temperature of 58°C and following the manufacturer’s protocols.
457 Primers are available in Additional file 6.
458
459 List of abbreviations
460 ApoD: apolipoprotein D. WGD: whole genome duplication. TGD: teleost genome duplication. ML:
461 maximum likelihood. LG: linkage group. chr: chromosome. BEB: Bayes empirical Bayes. ORF:
462 open reading frame. SMART: Simple Modular Architecture Research Tool. RPKM: Reads Per
463 Kilobase per Million mapped reads. qPCR: quantitative polymerase chain reaction. elfa1:
464 elongation factor 1 alpha. ubc: ubiquitin. rpl7: ribosomal protein L7.
465
466 Declarations
467 Ethics approval and consent to participate
468 Animal experiments reported in this study have been approved by the cantonal veterinary
469 office in Basel under permit number 2317, Switzerland; Guidelines for Care and Use of Laboratory
470 Animals prescribed by the Regulation of Animal Experimentation of Chongqing, China.
471 Consent for publication
472 Not applicable
473
19
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
474 Availability of data and material
475 The datasets supporting the results of this article are available in the Dryad repository,
476 doi:10.5061/dryad.39g63v2. https://datadryad.org/review?doi=doi:10.5061/dryad.39g63v2 [46].
477
478 Competing interests
479 The authors declare that they have no competing interests.
480
481 Funding
482 This work was supported by the PhD grant from University of Basel, Switzerland;
483 postdoctoral funding from Southwest University, China; China Postdoctoral Science Foundation
484 Grant (2018M633311); and National Natural Science Foundation of China (31802283) to LG.
485
486 Author’s contributions
487 LG discovered the expansion of ApoD gene family, did data analysis and wrote the
488 manuscript. LG did protein 3D structure comparison and cluster analyses with the help of CX.
489
490 Acknowledgements
20
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
491 We thank Prof. Walter Salzburger for his suggestions. We also thank Prof. Deshou Wang
492 for his help of sampling of tilapia and laboratory support. We particularly thank Prof. Ziheng Yang
493 for his valuable suggestions about positive selection detection using PAML. Many thanks to the
494 kind computing support from Prof. Zhengwang Zhang. We appreciate the valuable suggestions and
495 comments from the associate editor and two anonymous reviewers very much. Many thanks to
496 Dario Moser, Heinz-Georg Belting, Jing Wei and Hua Ruan for their help of fish sampling. Many
497 thanks for the help of fish dissection from Yang Zhao, Fabrizia Ronco, Attila Rüegg, Adrian
498 Indermaur, Xianbo Zhang and He Ma. Many thanks to Lukas Zimmerman, Peter Fields, Yuchen
499 Sun, Yanyan Xu, Zihui Zhang and De Chen for their help and fruitful discussions. We also would
500 like to express our thanks to Prof. Xiangjiang Zhan in the Institute of Zoology, Chinese Academy of
501 Sciences for the help of revising the rebuttal letter. Many thanks to Prof. Anders Moller in CNRS,
502 Université Paris-Sud in France, Dr. Juan Felipe Ortiz in Vanderbilt University in America for the
503 correction of English language.
504
505 Figure legends
506 Figure 1 (A) Cluster expansion of ApoD genes specific to teleost fishes after teleost-specific
507 duplication (TGD). Each arrow represents a single gene copy. Genes with red and orange colours
508 represent paralogs derived from one common ancestor. Genes with dark and light green colours
509 represent paralogs derived from one common ancestor. Phylogeny reconstruction is based on a
510 consensus fish phylogeny [22]. (B) Maximum likelihood phylogenetic tree reconstruction to infer
511 gene duplication. Bootstrap values > 50% are marked on the branch. Lineage-specific duplication
512 events were labelled. Note, although copy B2b of Gadus morhua is clustered within the B2a clade
513 (labelled with a green star), its bootstrap value is low, which could be due to recent duplication. (C)
514 A dynamic evolutionary pattern of ApoD genes across the entire phylogeny of teleost fishes. Highly 21
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
515 variable copy numbers are detected in different lineages, especially in the Paracanthopterygii
516 lineage. Stylephorus chordatus lost all ApoD genes. Compared to copy A1, copy A2 exhibits more
517 variable lineage-specific duplicates in different fishes, with the highest numbers appearing in tilapia
518 (four copies). Copy B1 is absent in the whole clade of Gadiformes. Copy B2 only shows up in
519 species Danio rerio, Osmerus eperlanus and Parasudis fraserbrunneri. The co-existence of B2a
520 and B2b is common in Percomorphaceae. The largest numbers of lineage-specific duplicated genes
521 are found in tilapia (copy A2), medaka (copy A2) and stickleback (copy B2b) in Percomorphaceae.
522
523 Figure 2 (A) Syntenic analyses between teleost fishes and spotted gar (top) and chicken (bottom),
524 respectively. The same colour between spotted gar and chicken represents orthologous
525 chromosomes. Two paralogous duplicated segments in teleost fishes (chromosomes (chrs)/linkage
526 groups (LG) indicated in red colour) can be traced back to one corresponding orthologous region in
527 spotted gar and chicken (chrs/LGs named with black colour), linked by colourful lines. Arrows
528 show the regions where apolipoprotein D (ApoD) clusters are located. (B) ApoD clusters as the
529 breakpoints of genome rearrangements before teleost-specific genome duplication (TGD). The
530 same colour between gar and chicken represents orthologous chromosomes. Neighbouring genes
531 are labelled. The red arrows show breakpoints, and the black arrows show gene direction.
532 Neighbouring genes are named according to Ensembl database. (C) Inversion with ApoD clusters as
533 the breakpoints occurred again in cichlid fishes. The haplochromine lineage, the most species-rich
534 lineage of cichlid fishes, is labelled. (D) ApoD domains and protein-protein association predictions.
535 a. Conserved domains include a single peptide of approx. 20 amino acids (AA) and a lipocalin
536 domain of approx. 144 AA. b. Different paralogs exhibited differential associations. The common
537 association is with MAPK genes. One class (copies A2, B2a and B1) is associated with multiple
538 forkhead transcription factors. The other class (copies A1 and B2b) lost this association; instead, it
22
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
539 is associated with the lipoprotein-related gene apoa1. The single ApoD gene in coelacanth
540 possesses both associations.
541
542 Figure 3 (A) Protein 3D structure modelling. Different ApoDs among species exhibited a
543 conserved 3D structure, including a cup-like central part made by eight antiparallel β-sheet strands
544 and two ends connected by loops (a wide opening part formed by loops 1, 3, 5 and 7; and a narrow,
545 closed bottom formed by loops 2, 4 and 6). Unlike the very conserved cup-like central part, the
546 loops at the two ends are highly variable. Note that most sites under positive selection are located
547 on the loops or on the connections between the loops and the cup-like central part. (B) Cluster
548 analyses based on the whole protein 3D structure. Cluster analyses can clearly segregate different
549 ApoD duplicates, including lineage-specific duplicated genes. (C) Cluster analyses based on only
550 four loops (loops 1, 3, 5 and 7) at the opening side. Cluster analyses can segregate different ApoD
551 duplicates. (D) Cluster analyses based on only three loops (loops 2, 4 and 6) at the bottom side.
552 Cluster analyses cannot segregate with different duplicates.
553
554 Figure 4 Positive selection detection of ApoD genes in different fishes under the branch-site
555 model. Many duplicated ApoD genes are under positive selection with the value of ω significantly
556 larger than 1 after Bonferroni correction, especially for lineage-specific duplicated genes. Note that
557 if the whole clade is designated as the foreground branch when detecting positive selection, each
558 branch of this clade will be labelled if its ω is significantly larger than 1. The unrooted tree was
559 used for PAML analyses, and the rooted tree is only for presentation. More details can be found in
560 Additional file 4.
561
23
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
562 Figure 5 Amino acids alignment of ApoD duplicates among species with annotation. The purple
563 arrow represents the β-strand, and the yellow arrow represents the loop. Sites under positive
564 selection under Bayes empirical Bayes (BEB) with posterior probability > 95% are labelled with the
565 red colour. Most sites under positive selection are distributed on the loops or on the connections
566 between the loop and the β-strand. More details can be found in Additional file 4. Note, the
567 positions of the sites that under positive selection in Additional file 4 are based on the codon
568 alignment used for PAML analyses (Additional file 4) instead of the amino acids alignment here.
569 But the amino acids that under positive selection are the same in Figure 5 and Additional file 4.
570
571 Figure 6 Gene expression profiles of different ApoD genes. (a) Schematic figure showing the
572 gene expression profiles of different ApoD genes in two clusters after teleost-specific duplication
573 (TGD). Each block represents a single gene copy. Phylogeny reconstruction was based on the
574 consensus fish phylogeny [22]. (b) Detailed gene expression profiles of ApoD genes among fishes.
575 The red colour represents tissues with a high expression level. Dark grey, data unavailable (either
576 because the tissue does not exist in the species, or is undetected in this study). Light grey represents
577 tissues with a low expression level. LPJ, the lower pharyngeal jaw in cichlid fishes. (c) Gene
578 expression profiles of ApoD genes in different developmental stages of gonads in tilapia based on
579 available transcriptomic data [115]. RPKM, reads per kilobase per million mapped reads; dah, days
580 after hatching.
581
582 Additional Files
583 Additional file 1 Sequence alignment for ML tree reconstruction.
584 Additional file 2 Predicted ApoD genes across the phylogeny.
585 Additional file 3 Protein PDB files and cluster analyses results.
24
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
586 Additional file 4 Sequence alignment for PAML test and statistics results.
587 Additional file 5 Raw transcriptomic data information and gene expression profiles.
588 Additional file 6 qPCR primer sequences.
589
590 References
591 1. Ohno S. Evolution by gene duplication. Springer Berlin Heidelb. 1970;
592 2. Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing
593 between models. Nat. Rev. Genet. 2010;11:97–108.
594 3. Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J. Preservation of duplicate
595 genes by complementary, degenerative mutations. Genetics. 1999;151:1531–45.
596 4. Rastogi S, Liberles DA. Subfunctionalization of duplicated genes as a transition state to
597 neofunctionalization. BMC Evol. Biol. 2005;5:28.
598 5. Turetzek N, Pechmann M, Schomburg C, Schneider J, Prpic N-M. Neofunctionalization of a
599 Duplicate dachshund Gene Underlies the Evolution of a Novel Leg Segment in Arachnids. Mol.
600 Biol. Evol. 2016;33:109–21.
601 6. Chen L, DeVries AL, Cheng C-HC. Evolution of antifreeze glycoprotein gene from a
602 trypsinogen gene in Antarctic notothenioid fish. Proc. Natl. Acad. Sci. 1997;94:3811–6.
603 7. Dulai KS, von Dornum M, Mollon JD, Hunt DM. The Evolution of Trichromatic Color Vision
604 by Opsin Gene Duplication in New World and Old World Primates. Genome Res. 1999;9:629–38.
605 8. Tang Y-C, Amon A. Gene copy-number alterations: a cost-benefit analysis. Cell.
606 2013;152:394–405.
607 9. Lin Z, Li W-H. Expansion of hexose transporter genes was associated with the evolution of
608 aerobic fermentation in yeasts. Mol. Biol. Evol. 2011;28:131–42.
609 10. Glasauer SMK, Neuhauss SCF. Whole-genome duplication in teleost fishes and its evolutionary
610 consequences. Mol. Genet. Genomics. 2014;289:1045–60. 25
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
611 11. Baalsrud HT, Voje KL, Tørresen OK, Solbakken MH, Matschiner M, Malmstrøm M, et al.
612 Evolution of Hemoglobin Genes in Codfishes Influenced by Ocean Depth. Sci. Rep. 2017;7:7956.
613 12. Zhang J, Zhang Y, Rosenberg HF. Adaptive evolution of a duplicated pancreatic ribonuclease
614 gene in a leaf-eating monkey. Nat. Genet. 2002;30:411–5.
615 13. Storz JF, Opazo JC, Hoffmann FG. Gene duplication, genome duplication, and the functional
616 diversification of vertebrate globins. Mol. Phylogenet. Evol. 2013;66:469–78.
617 14. Zhang J. Parallel adaptive origins of digestive RNases in Asian and African leaf monkeys. Nat.
618 Genet. 2006;38:819–23.
619 15. Jiménez-Delgado S, Pascual-Anaya J, Garcia-Fernàndez J. Implications of duplicated cis-
620 regulatory elements in the evolution of metazoans: the DDI model or how simplicity begets novelty.
621 Brief. Funct. Genomic. Proteomic. 2009;8:266–75.
622 16. Moriyama Y, Ito F, Takeda H, Yano T, Okabe M, Kuraku S, et al. Evolution of the fish heart by
623 sub/neofunctionalization of an elastin gene. Nat. Commun. 2016;7:10397.
624 17. Santos ME, Braasch I, Boileau N, Meyer BS, Sauteur L, Böhne A, et al. The evolution of
625 cichlid fish egg-spots is linked with a cis-regulatory change. Nat. Commun. 2014;5:5149.
626 18. Carrasco AE, McGinnis W, Gehring WJ, De Robertis EM. Cloning of an X. laevis gene
627 expressed during early embryogenesis coding for a peptide region homologous to Drosophila
628 homeotic genes. Cell. 1984;37:409–14.
629 19. Proudfoot NJ, Shander MH, Manley JL, Gefter ML, Maniatis T. Structure and in vitro
630 transcription of human globin genes. Science. 1980;209:1329–36.
631 20. Brooke NM, Garcia-Fernàndez J, Holland PWH. The ParaHox gene cluster is an evolutionary
632 sister of the Hox gene cluster. Nature. 1998;392:920–2.
633 21. Malmstrøm M, Matschiner M, Tørresen OK, Star B, Snipen LG, Hansen TF, et al. Evolution of
634 the immune system influences speciation rates in teleost fishes. Nat. Genet. 2016;48:1204–10.
635 22. Cortesi F, Musilová Z, Stieb SM, Hart NS, Siebeck UE, Malmstrøm M, et al. Ancestral
26
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
636 duplications and highly dynamic opsin gene evolution in percomorph fishes. Proc. Natl. Acad. Sci.
637 2015;112:1493–8.
638 23. Garcia-Fernàndez J. The genesis and evolution of homeobox gene clusters. Nat. Rev. Genet.
639 2005;6:881–92.
640 24. Hardison RC. Evolution of hemoglobin and its genes. Cold Spring Harb. Perspect. Med.
641 2012;2:a011627.
642 25. Robinson-Rechavi M, Boussau B, Laudet V. Phylogenetic dating and characterization of gene
643 duplications in vertebrates: the cartilaginous fish reference. Mol. Biol. Evol. 2004;21:580–6.
644 26. Sato Y, Nishida M. Teleost fish with specific genome duplication as unique models of
645 vertebrate evolution. Environ. Biol. Fishes.2010;88:169–88.
646 27. Braasch I, Gehrke AR, Smith JJ, Kawasaki K, Manousaki T, Pasquier J, et al. The spotted gar
647 genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nat. Genet.
648 2016;48:427–37.
649 28. Brunet FG, Volff J-N, Schartl M. Whole Genome Duplications Shaped the Receptor Tyrosine
650 Kinase Repertoire of Jawed Vertebrates. Genome Biol. Evol. 2016;8:1600–13.
651 29. Gillis WQ, St John J, Bowerman B, Schneider SQ. Whole genome duplications and expansion
652 of the vertebrate GATA transcription factor gene family. BMC Evol. Biol. 2009;9:207.
653 30. Voldoire E, Brunet F, Naville M, Volff J-N, Galiana D. Expansion by whole genome
654 duplication and evolution of the sox gene family in teleost fish. Vaudry H, editor. PLoS One.
655 2017;12:e0180936.
656 31. Sémon M, Wolfe KH. Rearrangement rate following the whole-genome duplication in teleosts.
657 Mol. Biol. Evol. 2007;24:860–7.
658 32. Kirkpatrick M, Barton N. Chromosome inversions, local adaptation and speciation. Genetics.
659 2006;173:419–34.
660 33. Farré M, Micheletti D, Ruiz-Herrera A. Recombination Rates and Genomic Shuffling in Human
27
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
661 and Chimpanzee—A New Twist in the Chromosomal Speciation Theory. Mol. Biol. Evol.
662 2013;30:853–64.
663 34. Joron M, Frezal L, Jones RT, Chamberlain NL, Lee SF, Haag CR, et al. Chromosomal
664 rearrangements maintain a polymorphic supergene controlling butterfly mimicry. Nature.
665 2011;477:203–6.
666 35. Ayala D, Fontaine MC, Cohuet A, Fontenille D, Vitalis R, Simard F. Chromosomal inversions,
667 natural selection and adaptation in the malaria vector Anopheles funestus. Mol. Biol. Evol.
668 2011;28:745–58.
669 36. Barth JMI, Berg PR, Jonsson PR, Bonanomi S, Corell H, Hemmer-Hansen J, et al. Genome
670 architecture enables local adaptation of Atlantic cod despite high connectivity. Mol. Ecol.
671 2017;26:4452–66.
672 37. Stevison LS, Hoehn KB, Noor MAF. Effects of Inversions on Within- and Between-Species
673 Recombination and Divergence. Genome Biol. Evol. 2011;3:830–41.
674 38. Corbett-Detig RB. Selection on Inversion Breakpoints Favors Proximity to Pairing Sensitive
675 Sites in Drosophila melanogaster. Genetics. 2016;204:259–65.
676 39. Otto SP. The Evolutionary Consequences of Polyploidy. Cell. 2007;131:452–62.
677 40. Hufton AL, Groth D, Vingron M, Lehrach H, Poustka AJ, Panopoulou G. Early vertebrate
678 whole genome duplications were predated by a period of intense genome rearrangement. Genome
679 Res. 2008;18:1582–91.
680 41. Ayrault Jarrier M, Levy G, Polonovski J. ETUDE DES ALPHA-LIPOPROT’EINES
681 S'ERIQUES HUMAINES PAR. Bull. Soc. Chim. Biol. (Paris). 1963;45:703–13.
682 42. Rassart E, Bedirian A, Do Carmo S, Guinard O, Sirois J, Terrisse L, et al. Apolipoprotein D.
683 Biochim. Biophys. Acta - Protein Struct. Mol. Enzymol. 2000;1482:185–98.
684 43. Weech P, Provost P, Tremblay N, Camato R, Milne R, Marcel Y, et al. Apolipoprotein D—An
685 atypical apolipoprotein. Prog. Lipid Res. 1991;30:259–66.
28
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
686 44. Drayna D, Fielding C, McLean J, Baer B, Castro G, Chen E, et al. Cloning and expression of
687 human apolipoprotein D cDNA. J. Biol. Chem. 1986;261:16535–9.
688 45. Provost PR, Villeneuve L, Weech PK, Milne RW, Marcel YL, Rassart E. Localization of the
689 major sites of rabbit apolipoprotein D gene transcription by in situ hybridization. J. Lipid Res.
690 1991;32:1959–70.
691 46. Gu L, Xia C. Data from: Cluster expansion of apolipoprotein D (ApoD) genes in teleost fishes.
692 Dryad Digit. Repos. 2018. p. https://doi.org/10.5061/dryad.39g63v2.
693 47. Gilleron M, Lepore M, Layre E, Cala-De Paepe D, Mebarek N, Shayman JA, et al. Lysosomal
694 Lipases PLRP2 and LPLA2 Process Mycobacterial Multi-acylated Lipids and Generate T Cell
695 Stimulatory Antigens. Cell Chem. Biol. 2016;23:1147–56.
696 48. Bailey SD, Xie C, Do R, Montpetit A, Diaz R, Mohan V, et al. Variation at the NFATC2 Locus
697 Increases the Risk of Thiazolidinedione-Induced Edema in the Diabetes REduction Assessment
698 with ramipril and rosiglitazone Medication (DREAM) Study. Diabetes Care. 2010;33:2250–3.
699 49. Fotakis P, Kuivenhoven JA, Dafnis E, Kardassis D, Zannis VI. The Effect of Natural LCAT
700 Mutations on the Biogenesis of HDL. Biochemistry. 2015;54:3348–59.
701 50. Tateno H, Yabe R, Sato T, Shibazaki A, Shikanai T, Gonoi T, et al. Human ZG16p recognizes
702 pathogenic fungi through non-self polyvalent mannose in the digestive system. Glycobiology.
703 2012;22:210–20.
704 51. Plestant C, Anton ES. Scaling the MAPK Signaling Threshold during CNS Patterning. Dev.
705 Cell. 2013;25:221–2.
706 52. Shvartsman SY, Coppey M, Berezhkovskii AM. MAPK signaling in equations and embryos.
707 Fly. 2009;3:62–7.
708 53. Mangaraj M, Nanda R, Panda S. Apolipoprotein A-I: A Molecule of Diverse Function. Indian J.
709 Clin. Biochem. 2016;31:253–9.
710 54. Fiaschetti G, Schroeder C, Castelletti D, Arcaro A, Westermann F, Baumgartner M, et al.
29
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
711 NOTCH ligands JAG1 and JAG2 as critical pro-survival factors in childhood medulloblastoma.
712 Acta Neuropathol. Commun. 2014;2:39.
713 55. Reddy S, Devlin R, Menaa C, Nishimura R, Choi SJ, Dallas M, et al. Isolation and
714 characterization of a cDNA clone encoding a novel peptide (OSF) that enhances osteoclast
715 formation and bone resorption. J. Cell. Physiol. 1998;177:636–45.
716 56. Wong C-H, Fung Y-WW, Ng EK-O, Lee SM-Y, Waye MM-Y, Tsui SK-W. LIM domain
717 protein FHL1B interacts with PP2A catalytic β subunit - A novel cell cycle regulatory pathway.
718 FEBS Lett. 2010;584:4511–6.
719 57. Ng EL, Tang BL. Rab GTPases and their roles in brain neurons and glia. Brain Res. Rev.
720 2008;58:236–46.
721 58. Pandita E, Rajan S, Rahman S, Mullick R, Das S, Sau AK. Tetrameric assembly of hGBP1 is
722 crucial for both stimulated GMP formation and antiviral activity. Biochem. J. 2016;473:1745–57.
723 59. Gu L, Xia C. Revelation of the Genetic Basis for Convergent Innovative Anal Fin Pigmentation
724 Patterns in Cichlid Fishes. bioRxiv. 2017; doi:https://doi.org/10.1101/165217.
725 60. Hofberger JA, Nsibo DL, Govers F, Bouwmeester K, Schranz ME. A Complex Interplay of
726 Tandem- and Whole-Genome Duplication Drives Expansion of the L-Type Lectin Receptor Kinase
727 Gene Family in the Brassicaceae. Genome Biol. Evol. 2015;7:720–34.
728 61. Bellieny-Rabelo D, Oliveira AEA, Venancio TM. Impact of Whole-Genome and Tandem
729 Duplications in the Expansion and Functional Diversification of the F-Box Family in Legumes
730 (Fabaceae). Nikolaidis N, editor. PLoS One. 2013;8:e55127.
731 62. Hammoudi V, Vlachakis G, Schranz ME, van den Burg HA. Whole-genome duplications
732 followed by tandem duplications drive diversification of the protein modifier SUMO in
733 Angiosperms. New Phytol. 2016;211:172–85.
734 63. Kondrashov FA, Kondrashov AS. Role of selection in fixation of gene duplications. J. Theor.
735 Biol. 2006;239:141–51.
30
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
736 64. Schranz ME, Mohammadin S, Edger PP. Ancient whole genome duplications, novelty and
737 diversification: the WGD Radiation Lag-Time Model. Curr. Opin. Plant Biol. 2012;15:147–53.
738 65. Engel A, Gaub HE. Structure and mechanics of membrane proteins. Annu. Rev. Biochem.
739 2008;77:127–48.
740 66. Flower DR. The lipocalin protein family: structure and function. Biochem. J. 1996;318:1–14.
741 67. McLachlan AD. Protein Structure and Function. Annu. Rev. Phys. Chem. 1972;23:165–92.
742 68. Klingenberg M. Membrane protein oligomeric structure and transport function. Nature.
743 1981;290:449–54.
744 69. Friedman JM. Structure, dynamics, and reactivity in hemoglobin. Science. 1985;228:1273–80.
745 70. Ewart K V., Lin Q, Hew CL. Structure, function and evolution of antifreeze proteins. Cell. Mol.
746 Life Sci. C. 1999;55:271–83.
747 71. Arévalo-Pinzón G, Curtidor H, Muñoz M, Patarroyo MA, Bermudez A, Patarroyo ME. A single
748 amino acid change in the Plasmodium falciparum RH5 (PfRH5) human RBC binding sequence
749 modifies its structure and determines species-specific binding activity. Vaccine. 2012;30:637–46.
750 72. Schaefer C, Rost B. Predict impact of single amino acid change upon protein structure. BMC
751 Genomics. 2012;13:S4.
752 73. Flower DR. Multiple molecular recognition properties of the lipocalin protein family. J. Mol.
753 Recognit. 1995;8:185–95.
754 74. Skerra A. Engineered protein scaffolds for molecular recognition. J. Mol. Recognit.
755 2000;13:167–87.
756 75. Salzburger W. The interaction of sexually and naturally selected traits in the adaptive radiations
757 of cichlid fishes. Mol. Ecol. 2009;18:169–85.
758 76. Rogers RL, Shao L, Thornton KR. Tandem duplications lead to novel expression patterns
759 through exon shuffling in Drosophila yakuba. Begun DJ, editor. PLOS Genet. 2017;13:e1006795.
760 77. Chen Y, Jia L, Wei C, Wang F, Lv H, Jia J. Association between polymorphisms in the
31
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
761 apolipoprotein D gene and sporadic Alzheimer’s disease. Brain Res. 2008;1233:196–202.
762 78. Helisalmi S, Hiltunen M, Vepsäläinen S, Iivonen S, Corder EH, Lehtovirta M, et al. Genetic
763 variation in apolipoprotein D and Alzheimer's disease. J. Neurol. 2004;251:951–7.
764 79. Waldner A, Dassati S, Redl B, Smania N, Gandolfi M. Apolipoprotein D Concentration in
765 Human Plasma during Aging and in Parkinson’s Disease: A Cross-Sectional Study. Parkinsons.
766 Dis. 2018;2018:1–7.
767 80. Wellenreuther M, Svensson EI, Hansson B. Sexual selection and genetic colour polymorphisms
768 in animals. Mol. Ecol. 2014;23:5398–414.
769 81. CARLETON KL, PARRY JWL, BOWMAKER JK, HUNT DM, SEEHAUSEN O. Colour
770 vision and speciation in Lake Victoria cichlids of the genus Pundamilia. Mol. Ecol. 2005;14:4341–
771 53.
772 82. Flamarique IN, Bergstrom C, Cheng CL, Reimchen TE. Role of the iridescent eye in
773 stickleback female mate choice. J. Exp. Biol. 2013;216:2806–12.
774 83. Bystriansky JS, Schulte PM. Changes in gill H+-ATPase and Na+/K+-ATPase expression and
775 activity during freshwater acclimation of Atlantic salmon (Salmo salar). J. Exp. Biol.
776 2011;214:2435–42.
777 84. McCormick SD, Bradshaw D. Hormonal control of salt and water balance in vertebrates. Gen.
778 Comp. Endocrinol. 2006;147:3–8.
779 85. Sakamoto T. Growth hormone and prolactin in environmental adaptation. Zoolog. Sci.
780 2003;20:1497–8.
781 86. Foskett JK, Bern HA, Machen TE, Conner M. Chloride cells and the hormonal control of teleost
782 fish osmoregulation. J. Exp. Biol. 1983;106:255–81.
783 87. Papetti C, Harms L, Windisch HS, Frickenhaus S, Sandersfeld T, Jürgens J, et al. A first insight
784 into the spleen transcriptome of the notothenioid fish Lepidonotothen nudifrons: Resource
32
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
785 description and functional overview. Mar. Genomics. 2015;24:237–9.
786 88. Huang Y, Chain FJJ, Panchal M, Eizaguirre C, Kalbe M, Lenz TL, et al. Transcriptome
787 profiling of immune tissues reveals habitat-specific gene expression between lake and river
788 sticklebacks. Mol. Ecol. 2016;25:943–58.
789 89. Muschick M, Indermaur A, Salzburger W. Convergent evolution within an adaptive radiation of
790 cichlid fishes. Curr. Biol. 2012;22:2362–8.
791 90. Green SA, Simoes-Costa M, Bronner ME. Evolution of vertebrates as viewed from the crest.
792 Nature. 2015;520:474–82.
793 91. Barlow-Anacker AJ, Fu M, Erickson CS, Bertocchini F, Gosain A. Neural Crest Cells
794 Contribute an Astrocyte-like Glial Population to the Spleen. Sci. Rep. 2017;7:45645.
795 92. Bailey AP, Bhattacharyya S, Bronner-Fraser M, Streit A. Lens Specification Is the Ground State
796 of All Sensory Placodes, from which FGF Promotes Olfactory Identity. Dev. Cell. 2006;11:505–17.
797 93. Salzburger W, Mack T, Verheyen E, Meyer A. Out of Tanganyika: genesis, explosive
798 speciation, key-innovations and phylogeography of the haplochromine cichlid fishes. BMC Evol.
799 Biol. 2005;5:17.
800 94. Charlesworth D. Evolution of recombination rates between sex chromosomes. Philos. Trans. R.
801 Soc. B Biol. Sci. 2017;372:20160456.
802 95. Flicek P, Amode MR, Barrell D, Beal K, Billis K, Brent S, et al. Ensembl 2014. Nucleic Acids
803 Res. 2014;42:D749–55.
804 96. Stanke M, Diekhans M, Baertsch R, Haussler D. Using native and syntenically mapped cDNA
805 alignments to improve de novo gene finding. Bioinformatics. 2008;24:637–44.
806 97. Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large
807 phylogenies. Bioinformatics. 2014;30:1312–3.
808 98. Soderlund C, Bomhoff M, Nelson W. SyMAP v3.4: a turnkey synteny system with application
809 to plant genomes. Nucleic Acids Res. 2011;39:e68.
33
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
810 99. Brawand D, Wagner CE, Li YI, Malinsky M, Keller I, Fan S, et al. The genomic substrate for
811 adaptive radiation in African cichlid fish. Nature. 2014;513:375–81.
812 100. Rausch T, Zichner T, Schlattl A, Stutz AM, Benes V, Korbel JO. DELLY: structural variant
813 discovery by integrated paired-end and split-read analysis. Bioinformatics. 2012;28:i333–9.
814 101. Letunic I, Doerks T, Bork P. SMART: recent updates, new developments and status in 2015.
815 Nucleic Acids Res. 2015;43:D257–60.
816 102. Letunic I, Bork P. 20 years of the SMART protein domain annotation resource. Nucleic Acids
817 Res. 2018;46:D493–6.
818 103. Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, et al. The STRING
819 database in 2017: quality-controlled protein–protein association networks, made broadly accessible.
820 Nucleic Acids Res. 2017;45:D362–8.
821 104. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, et al. SWISS-MODEL:
822 modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids
823 Res. 2014;42:W252–8.
824 105. Bordoli L, Kiefer F, Arnold K, Benkert P, Battey J, Schwede T. Protein structure homology
825 modeling using SWISS-MODEL workspace. Nat. Protoc. 2008;4:1–13.
826 106. Arnold K, Bordoli L, Kopp J, Schwede T. The SWISS-MODEL workspace: a web-based
827 environment for protein structure homology modelling. Bioinformatics. 2006;22:195–201.
828 107. Guex N, Peitsch MC, Schwede T. Automated comparative protein structure modeling with
829 SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis. 2009;30:S162–
73. 830 73.
831 108. Birzele F, Gewehr JE, Csaba G, Zimmer R. Vorolign--fast structural alignment using Voronoi
832 contacts. Bioinformatics. 2007;23:e205–11.
833 109. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol.
834 2007;24:1586–91.
34
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
835 110. Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood.
836 Comput. Appl. Biosci. 1997;13:555–6.
837 111. Anisimova M, Yang Z. Multiple hypothesis testing to detect lineages under positive selection
838 that affects only a few sites. Mol. Biol. Evol. 2007;24:1219–28.
839 112. McCurley AT, Callard G V. Characterization of housekeeping genes in zebrafish: male-female
840 differences and effects of tissue type, developmental stage and chemical treatment. BMC Mol. Biol.
841 2008;9:102.
842 113. Hibbeler S, Scharsack JP, Becker S. Housekeeping genes for quantitative expression studies in
843 the three-spined stickleback Gasterosteus aculeatus. BMC Mol. Biol. 2008;9:18.
844 114. Zhang Z, Hu J. Development and validation of endogenous reference genes for expression
845 profiling of medaka (Oryzias latipes) exposed to endocrine disrupting chemicals by quantitative
846 real-time RT-PCR. Toxicol. Sci. 2007;95:356–68.
847 115. Tao W, Sun L, Shi H, Cheng Y, Jiang D, Fu B, et al. Integrated analysis of miRNA and
848 mRNA expression profiles in tilapia gonads at an early stage of sex differentiation. BMC
849 Genomics. 2016;17:328.
850
35
bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A1 B2B2a B2b A2 B1 Astyanax mexicanus A1 A C Dario rerio B2 A1 Danio rerio Astyanax mexicanus Gadus morhua B2b A1 Cluster I Osmerus eperlanus Gasterosteus aculeatus B2a B2bs1 B2bs2 B2bs3 A1 Borostomias antarcticus Takifugu rubripes B2a B2b A1 Parasudis fraserbrunneri Tetraodon nigroviridis B2a Guentherus altivela Oreochromis niloticus B2a B2b A1 Benthosema glaciale Percopsis transmontana TGD Oryzias latipes B2a B2b A1 Typhlichthys subterraneus Poecilia formosa B2a B2ba1 A1 B2ba2 Polymixia japonica Xiphophorus maculatus B2a B2b A1 Neoteleostei Zeus faber Cyttopsis roseus Astyanax mexicanus A2 Paracanthopterygii Stylephorus chordatus Dario rerio A2 Trachyrincus scabrus Cluster II Gadus morhua A2 Zeiogadaria Trachyrincus murrayi Gasterosteus aculeatus A2s1 A2s2 B1 Ctenosquamata Muraenolepis marmoratus Takifugu rubripes A2 B1 Bathygadus melanobranchus Macrourus berglax Tetraodon nigrovirides A2 Gadariae Malacocephalus occidentalis Oreochromis niloticus A2t4 A2t1 B1 A2t2 A2t3 Melanonus zugmayeri Oryzias latipes A2m1 A2m2 A2m3 Merluccius merluccius Poecilia formosa A2 Merluccius capensis Phycis blennoides Xiphophorus maculatus A2 Acanthomorphata Gadiformes Trisopterus minutus Merlangius merlangus Lepisosteus oculatus B A Gadidae Arctogadus glacialis Latimeria chalumnae A 500bp Gadus morhua Brosme brosme Molva molva Regalecus glesne copy A B Lampris guttatus Lepisosteus oculatus Oreochromis niloticus 100 Neolamprologus brichardi cichlids specific duplication Maylandia zebra copy A2t1 Monocentris japonica Pundamilia nyererei Astatotilapia burtoni 100 Oreochromis niloticus Acanthopterygii Rondeletia loricata 100 Pundamilia nyererei 100 Maylandia zebra copy A2t2 Astatotilapia burtoni Acanthochaenus luetkenii 100 Oreochromis niloticus Neolamprologus brichardi Holocentrus rufus 100 Maylandia zebra Astatotilapia burtoni copy A2t3 100 Neoniphon sammara OreochromisPundamilianiloticus nyererei Euacanthomorphacea 66 100 Pundamilia nyererei Maylandia zebra copy A2t4 Myripristis jacobus Astatotilapia burtoni Neolamprologus brichardi 100 Oryzias latipes A2m2 Brotula barbata Oryzias latipes 98 Oryzias latipes A2m3 100 Poecilia formosa A2m1 Lamprogrammus exutus Xiphophorus maculatus 100 Gasterosteus aculeatus copy A2 Gasterosteus aculeatusA2s1 Carapus acus 95 Tetraodon nigroyirides A2s2 Takifugu rubripes Lesueurigobius cf. sanzi Gadus morhua 97 Astyanax mexicanus Dario rerio 97 Oreochromis niloticus Thunnus albacares 100 Neolamprologus brichardi 57 Astatotilapia burtoni Maylandia zebra Percomorphaceae Selene dorsalis Pundamilia nyererei 58 Gasterosteus aculeatus copy A1 89 Takifugu rubripes Helostoma temminckii Oryzias latipes 92 100 Poecilia formosa Anabas testudineus 100 Xiphophorus maculatus 81 Astyanax mexicanusGadus morhua Parablennius parvicornis Dario rerio Oreochromis niloticus 100 Pundamilia nyererei Maylandia zebra Chromis chromis Astatotilapia burtoni Neolamprologus brichardi Pseudochromis fuscus Takifugu rubripes copy B2b 100 100 Poecilia formosaB2ba2 Xiphophorus maculatus 51 XiphophorusPoeciliamaculatusformosa B2ba1 Oryzias latipes 94 Gasterosteus aculeatusB2bs1 Poecilia formosa 100 Gasterosteus aculeatusB2bs2 Gasterosteus aculeatus B2bs3 Neolamprologus brichardi Oryzias latipes 99 Pundamilia nyererei 82 100 Maylandia zebra Oreochromis niloticus OreochromisAstatotilapia niloticusburtoni Oryzias latipes copy B2a Symphodus melops 100 100 Xiphophorus maculatus 95 Poecilia formosa 95 Tetraodon nigroyirides Spondyliosoma cantharus 57 Takifugu rubripes Gasterosteus aculeatus 97 Gadus morhua Antennarius striatus AstatotilapiaDario rerioburtoni copy B2 Pundamilia nyererei Takifugu rubripes 80 100 Maylandia zebra 64 Neolamprologus brichardi 100 Oreochromis niloticus copy B1 Tetraodon nigrovirides Gasterosteus aculeatus Takifugu rubripes Lepisosteus oculatus Gasterosteus aculeatus Latimeria chalumnae copy B 0.2 Sebastes norvegicus bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A Syntenic analysis of genome region possessing ApoD clusters in teleost fishes with gar and chicken B ApoD clusters as the breakpoints of genome rearrangement beforeTGD
LG14 LG14 LG14 LG14 a. Genome rearrangement before TGD b. ApoD genes and their neighbours at the breakpoints LG19 LG19 LG19 LG19 si:dkey elavl1 gng7 LG10 LG10 LG10 chr 2 myo9b insra sppl2 57k2.6 phc3 skila nadkbsamd7 apod pex19 zgc101744 LG10 Dario rerio arhgef18a si:dkeyprkci gpr160sec62 apod copatspan37 diras1b hiat1a dapk357k2.7 slc1a8b gadd45bb group III arhgef18a gpr160 novel LG3 LG3 LG3 LG3 atcyab novel hiat1a dapk3 prkci sec62 apod apod apod lrrcc1 ackr4a Gasterosteus aculeatus sppl2 phc3 skila nadkbsamd7apod apod ralyl e2f5 novel chr20 group XXI cluster I loxl5bmyo9b insra Cluster II LG9 chr2 Cluster II Cluster II chr 17 myo9b insra novel novel prkci novel samd7apod and2 novel novel ackr4anovel gar_LG9 Cluster I Oryzias latipes LG9 LG9 LG9 hiat1a sppl2 nadkb sec62 apod apod ralyl lrrcc1 e2f5 novel novel chr17 group III LG18 chr24 arhgef18a dapk3 sppl2 Cluster I Cluster I Cluster I Cluster II LG18 nmrk2 myo9b insra phc3 skila sec62 apod apod ralyl e2f5 Oreochromis niloticus hiat1a dapk3 prkci nadkbsamd7 apod and2lrrcc1ackr4a medaka-gar stickleback-gar tilapia-gar zebrafish-gar loxl5b arhgef18a TGD ApoD clusters genome rearrangement skilb rpl22l1 lrrc34 apod myeov2 tfr1b klhl24b chr9 chr9 chr9 chr9 with ApoD clusters as the breakpoints larp4bmsl2bnceh1bnovel ghsrb tnikb cldn11b tmtopsb map3k15 chr28 chr28 chr 24 chr28 chr28 Dario rerio stag1b nceh1b pld1b eif5a novel lrrc31 mynn otos and1 tnk2accl20b dip2ca tnfsf10 fndc3bb slc7a14b sh3kbp1 chr8 rad51b um_ chr8 chr8 chr8 group XXI rdh12l mastl abi1a gad2anxa13thnsl1sam821apod otos and1 tnk2a sh3kbp1 cnksr2a Gasterosteus aculeatus arhgap21 tfr1b novel chr1 chr1 ca2 acbd5a pdss1 gpr158a apod myeov2 klhl34 chr1 chr1 cluster II novel myo3a enkur tmtopsb map3k15 um_ apod chr 20 yme1l1a anxa13 thnsl1 apod nceh1b ralyl rdh12l acbd5apdss1 gad2 sam821 myeov2novel tnk2a tfr1b chr20 group XXI LG9 chr2 Oryzias latipes Cluster II Cluster II Cluster I apbb1ip novel enkur apod otos novel novel irf9 Cluster II novel novel mastl abi1a myo3a arhgap21 tmtopsb tnfsf10 um_ chr2 chr2 chr2 chr2 LG 9 rdh12l myo3a thnsl1 sh3kbp1 cnksr2a ralyl mastl abi1aapbb1ip gpr158a sam821apod apod otos and1 novel ccl20b chr17 group III LG18 chr24 Oreochromis niloticus arhgap21 tfr1b tlr22 Cluster I Cluster I Cluster I Cluster II ca2 acbd5apdss1gad2 anxa13novel apod myeov2 yme1l1a enkur apod tmtopsb map3k15
medaka-chicken stickleback-chicken tilapia-chicken zebrafish-chicken LG10 LG14 LG19 LG3 LG9 skila nadkbsamd7 lrriq4mynn apod otos and1 novel tnk2b novel novel ccl34b Lepisosteus oculatus prkci gpr160sec62 lrrc31lrrc34novel apod myeov2 tmtopsb novel adipoqa tfr1b ccl20 ApoD clusters as the breakpoints note: the schematic lengths here do not represent the real lengths
Inversion occurred again with ApoD clusters as the breakpoints in cichlid fishes a. Schematic of ApoD protein domains D C common 5’ signal peptide lipocalin 3’ apoa1+MAPK Cluster I ca. 16AA tilapia approx. 20AA approx. 144AA Pundamilia nyererei b. Duplicated ApoD genes binding opportunities prediction platyfish MAPK+forkhead fugu haplochromine pla2g15 lcat cod Maylandia zebra A2 foxo3a stickleback copy A1 mapk9 copy A2 mapk10foxo1a zg16 medaka Astatotilapia burtoni foxo1b mapk8a LG18 foxo3b zebrafish foxo5mapk8b MAPK signalling pathway forkhead transcription factors coelacanth Oreochromis niloticus copy B1 coelacanth 40 kb pla2g15 lcat apoa1 B1 pla2g15 lcat Cluster II foxo3a mapk9 A foxo1a mapk9 jag1 mapk10 zg16 foxo1 foxo3 jag2 Pundamilia nyererei foxo3bmapk8a anxa11 foxo1b mapk8 mapk8b MAPK signalling pathway MAPK signalling pathway haplochromine Maylandia zebra forkhead transcription factors forkhead transcription factors copy B2a copy B2b Astatotilapia burtoni LG9 pla2g15 lcat foxo3a B2a Oreochromis niloticus mapk9 mapk10 foxo1a mapk8b zg16 foxo3b foxo1b mapk8a foxo5 30 kb MAPK signalling pathway forkhead transcription factors zebrafish LG ApoD cluster inversion copy A1 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Protein 3D structure modelling and cluster analyses of ApoD genes
A. loop 3 B. Cluster analyses based on the whole protein 3D structure opening side loop 1 B2b B2bs2 loop 7 B2bs1 zeb_B2 loop 5 B2a B1 cup-like A2m2 center A2m3
A2t
loop 4 loop 6 bottom A1 loop 2 A2 sites under positive selection under BEB with posterior probability > 95% A2s 0.8 C. Cluster analyses based on loops 1, 3, 5, 7 at the opening side D. Cluster analyses based on loops 2, 4, 6 at the bottom B2bB2a A1 A1 B2a B1 meda_B2b A2 A1 A2
zeb_B2 A2 B2
B2b B2a B1 B2b A1 A2 0.7 B2a 0.7 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.
Positive selection detection under branch-site model within codeml in PAML
�>1 A1 A1 �>1 A2s2 �>1 A2s1 A1 >1 � A2 A2 B1 B2bs1 B2bs2 B2 B2b B2bs3 B2a Dario rerio 0.2 Gadus morhua 0.3 Gasterosteus aculeatus 0.2
�>1 A2t2 A2m3 A1 >1 �>1 � >1 �>1 � A2t4 A2m2 A2 �>1 A2t3 A2m1 B1 A2t1 A1 A1 B2a B2b �>1 >1 B2b B2a � B2b B1 B2a Oryzias latipes Takifugu rubripes 0.4 cichlid fishes 0.3 0.3
A1 A1 A2 A2 B2a B2a �>1B2ba1 �>1 �>1 �>1B2ba2 B2b Poecilia formosa Xiphophorus maculatus 0.3 0.3 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Amino acids alignment of ApoD genes among species with protein 3D structure annotation Dario rerio Astyanax mexicanus Gadus morhua Gasterosteus aculeatus Takifugu rubripes copy A1 Oreochromis niloticus Oryzias latipes Xiphophorus maculatus Poecilia formosa Dario rerio Astyanax mexicanus Gadus morhua Gasterosteus aculeatus_A2s1 Gasterosteus aculeatus_A2s2
Takifugu rubripes Tetraodon nigroyirides Oreochromis niloticus_A2t1 Oreochromis niloticus_A2t2 copy A2 Oreochromis niloticus_A2t3
Oreochromis niloticus_A2t4
Oryzias latipes_A2m1 Oryzias latipes_A2m2 Oryzias latipes_A2m3
Xiphophorus maculatus Poecilia formosa Gasterosteus aculeatus Takifugu rubripes Tetraodon nigroyirides copy B2a Oreochromis niloticus Oryzias latipes Poecilia formosa Xiphophorus maculatus Gadus morhua Gasterosteus aculeatus_B2bs1 Gasterosteus aculeatus_B2bs2 loop Gasterosteus aculeatus_B2bs3 Takifugu rubripes copy B2b Oreochromis niloticus strand Oryzias latipes Poecilia formosa_B2ba1
Poecilia formosa_B2ba2 sites under positive selection under Bayes empirical Bayes Xiphophorus maculatus with posterior probability > 0.95 Gasterosteus aculeatus Takifugu rubripes copy B1 Oreochromis niloticus Dario rerio copy B2 bioRxiv preprint doi: https://doi.org/10.1101/265538; this version posted September 11, 2018. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. a. a. Expression patterns of of ApoD genes in different teleost fishes Cluster I Cluster II haplochromine Astatotilapia burtoni B2a B2b A1 A2t4 A2t2 A2t3 A2t1 B1 skin eye cichlid fishes LPJ LPJ LPJ LPJ liver anal fin pigmentation
Oreochromis niloticus B2a B2b A1 A2t4 A2t2 A2t3 A2t1 B1 5dh gonad 5dh gonad skin eye LPJ gill gill gill eye liver
Oryzias latipes B2a B2b A1 A2m1 A2m2 A2m3 Percomorphaceae eye skin eye gill gill gill ovary testis eye Acanthomorphata Gasterosteus aculeatus B2a B2bs1 B2bs2 B2bs3 A1 A2s1 A2s2 B1 spleen spleen spleen skin eye gill TGD liver liver liver Gadus morhua B2b A1 A2
ovary liver gill
Astyanax mexicanus A1 A2 eye ovary gill ovary skin eye
Dario rerio B2 A1 A2 skin skin skin testis testis ovary eye eye gill Lepisosteus oculatus B A skin eye gill liver testis brain anal fin pigmentation skin eye gill LPJ ovary testis brain spleen liver b. Oreochromis niloticus_A2t4 Astatotilapia burtoni_A2t4 A2t4 Oreochromis niloticus_A2t2 A2t2 Astatotilapia burtoni_A2t2 Oreochromis niloticus_A2t3 A2t3 c. Astatotilapia burtoni_A2t3 Oreochromis niloticus_A2t1 Highly specific expression of copy B2a and B2b Astatotilapia burtoni_A2t1 Oryzias latipes_A2m1 Oryzias latipes_A2m2 at early developmental stages of gonads in tilapia Oryzias latipes_A2m3 A2 Gasterosteus aculeatus_A2s1 Gasterosteus aculeatus_A2s2 140 Gadus morhua_A2 Dario rerio_A2 120 Astyanax mexicanus_A2 Oreochromis niloticus_A1 100 Astatotilapia burtoni_A1 Oryzias latipes_A1 Gasterosteus aculeatus_A1 80 A1 Gadus morhua_A1 Dario rerio_A1 60 Astyanax mexicanus_A1 Lepisosteus oculatus_A A 40 Oreochromis niloticus_B2a Astatotilapia burtoni_B2a 20 Oryzias latipes_B2a B2a Gasterosteus aculeatus_B2a Oreochromis niloticus_B2b 0 Astatotilapia burtoni_B2b A1 B2a B2b A2t1 A2t2 A2t3 A2t4 B1 Oryzias latipes_B2b Gasterosteus aculeatus_B2bs1 Gasterosteus aculeatus_B2bs2 B2b Gasterosteus aculeatus_B2bs3 5dahovary 5dahtestis 90dahovary 90dahtestis 180dahovary 180dahtestis Gadus morhua_B2b Dario rerio_B2 B2 Oreochromis niloticus_B1 Astatotilapia burtoni_B1 B1 Gasterosteus aculeatus_B1 Lepisosteus oculatus_B B